obinnaonyema / movie_recommendation_with_spark

Movie recommendation system. Azure Data Lake Storage is used as data store, Azure Data Factory for workflow management, Azure Databricks for feature engineering and modelling with Spark.

Python 100.00%

Introduction

In this project, data is ingested into Azure Data Lake Storage Gen2 using Azure Data Factory. The data is then loaded and prepared for machine learning in Azure Databricks, which makes Spark's distributed processing engine available for feature engineering and modelling.

Getting Started

TODO: Guide users through getting your code up and running on their own system. In this section you can talk about:

Installation process
Software dependencies
Latest releases
API references

Build and Test

Create a storage account with hierarchical namespace enabled and load the movies and ratings files into blob containers.
Create a Databricks workspace and mount the data from Azure blob storage to Databricks.
Create the feature engineering, training and prediction scripts in Databricks.
Set up an app registration and create a Key Vault to store the app secrets.
Create a Data Factory and set up a pipeline with datasets and an email notification extension.
Create a Logic App to receive the HTTP request from Data Factory and send out the email message contained in the request body.
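
The mounting step above can be sketched as follows. This is a minimal illustration rather than the project's actual notebook code: it assumes the mount authenticates with the app registration (service principal) through OAuth, and the helper names build_mount_configs and build_source_uri, the container, storage account, mount point, and secret scope names are all hypothetical placeholders. The configuration keys are the standard ABFS OAuth settings.

```python
# Sketch only: helper names and all account/container/secret names are placeholders.

def build_mount_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Return the ABFS OAuth settings for a service principal (app registration)."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def build_source_uri(container: str, storage_account: str) -> str:
    """Build the abfss:// URI of a container in a Gen2 storage account."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/"


# Inside a Databricks notebook, where `dbutils` is predefined, the mount
# itself could then look like this (scope and key names are assumptions):
#
# secret = dbutils.secrets.get(scope="movie-kv", key="sp-client-secret")
# dbutils.fs.mount(
#     source=build_source_uri("movies", "mystorageaccount"),
#     mount_point="/mnt/movies",
#     extra_configs=build_mount_configs("<client-id>", secret, "<tenant-id>"),
# )
```

Reading the client secret from a Key Vault-backed secret scope, as in the commented call, keeps the credential out of the notebook source, which matches the Key Vault step in the list.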

Contribute

Current challenges:

The scheduled trigger fails with an unknown error, while the manual trigger works.
