In this project, data is ingested into Azure Data Lake Storage Gen2 using Azure Data Factory. The data is then loaded and prepared for machine learning in Databricks to take advantage of Spark's distributed processing engine.
Setup steps:
- Create a storage account with hierarchical namespace enabled and upload the movies and ratings files into blob containers.
- Create a Databricks workspace and mount the data from Azure Blob Storage into Databricks.
- Create the feature engineering, training, and prediction scripts in Databricks.
- Set up an app registration and create a Key Vault to store the app secrets.
- Create a Data Factory and set up a pipeline with datasets and an email notification extension.
- Create a Logic App that receives an HTTP request from Data Factory and sends an email containing the message from the request body.
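The mounting step, combined with the app registration and Key Vault, can be sketched as follows. This is a minimal sketch, assuming the workspace authenticates to the storage account as the registered app (a service principal) through the ABFS OAuth settings; the helper name `build_mount_args` and all example values are hypothetical. It only builds the arguments, because `dbutils` exists only inside a Databricks notebook.

```python
def build_mount_args(storage_account, container, client_id, client_secret,
                     tenant_id, mount_name):
    """Return (source, mount_point, extra_configs) for dbutils.fs.mount.

    Hypothetical helper: uses the ABFS driver's OAuth client-credentials
    settings so the registered app (service principal) can read the container.
    """
    # abfss:// targets the Data Lake Storage Gen2 (hierarchical namespace) endpoint
    source = f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
    mount_point = f"/mnt/{mount_name}"
    extra_configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
    return source, mount_point, extra_configs


# Usage inside a Databricks notebook (assumes a Key Vault-backed secret scope;
# the scope, key, account, and container names below are hypothetical):
# secret = dbutils.secrets.get(scope="kv-scope", key="sp-client-secret")
# source, mount_point, cfg = build_mount_args(
#     "mystorageacct", "ratings", "<app-id>", secret, "<tenant-id>", "ratings")
# dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=cfg)
```

Reading the client secret from a Key Vault-backed secret scope keeps it out of the notebook source; only the secret scope and key names appear in code.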
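As an illustration of the feature engineering step, the sketch below derives two simple per-movie features (rating count and mean rating) from the ratings file. It assumes a MovieLens-style CSV with a header containing `movieId` and `rating` columns; the actual column names and the features used by this project's scripts may differ, and the function name is hypothetical. It is written in plain Python so the logic is easy to check.

```python
import csv
import io
from collections import defaultdict


def movie_rating_features(ratings_csv_text):
    """Compute per-movie rating count and mean rating.

    Assumes a CSV header with (at least) movieId and rating columns,
    as in MovieLens-style ratings files.
    """
    totals = defaultdict(float)  # sum of ratings per movie
    counts = defaultdict(int)    # number of ratings per movie
    for row in csv.DictReader(io.StringIO(ratings_csv_text)):
        movie_id = int(row["movieId"])
        totals[movie_id] += float(row["rating"])
        counts[movie_id] += 1
    return {
        movie_id: {
            "num_ratings": counts[movie_id],
            "avg_rating": totals[movie_id] / counts[movie_id],
        }
        for movie_id in counts
    }
```

In Databricks the same aggregation would typically run on a Spark DataFrame instead, for example `ratings.groupBy("movieId").agg(F.count("rating"), F.avg("rating"))` with `pyspark.sql.functions` imported as `F`, so the work is distributed across the cluster.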
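The notification hand-off between Data Factory and the Logic App can be sketched as an HTTP POST with a JSON body. This is a minimal sketch, assuming the Logic App starts with an HTTP request trigger whose request body schema expects the hypothetical fields `dataFactoryName`, `pipelineName`, `message`, and `receiver`; in the pipeline itself a Web activity would send an equivalent body, filling the first two from system variables such as `@{pipeline().DataFactory}` and `@{pipeline().Pipeline}`. The function builds the request without sending it.

```python
import json
import urllib.request


def build_notification_request(logic_app_url, data_factory, pipeline,
                               message, receiver):
    """Build (but do not send) a POST request for a Logic App HTTP trigger.

    The JSON field names are hypothetical; they must match the request
    body schema configured on the Logic App trigger.
    """
    body = {
        "dataFactoryName": data_factory,
        "pipelineName": pipeline,
        "message": message,
        "receiver": receiver,
    }
    return urllib.request.Request(
        logic_app_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send it (requires the real trigger URL from the Logic App):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Keeping the field names in one agreed schema means the Logic App's email action can reference them directly when composing the message.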
Current challenges:
- The scheduled trigger fails with an unknown error, while a manual trigger works.