- The backend can load data in bulk
- The backend can add more users and movies to the database
- The backend can search movies (by movie name or movie ID) and show their properties
- The backend can search users (by user ID) and show their properties
- Based on viewed movies, the system should recommend unwatched movies to the user
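To make the last requirement concrete, here is a minimal in-memory sketch of one possible rule: recommend movies that "co-viewers" (users who share at least one watched movie with the target user) have watched, ranked by how many co-viewers watched them. This is an illustrative assumption, not the project's actual rule; the engine itself works on User and Movie vertices in JanusGraph, and the `watched` map and `recommend` function below are hypothetical stand-ins for those vertices and edges.

```scala
// Hypothetical in-memory stand-in for the User -> Movie edges stored in JanusGraph.
val watched: Map[Int, Set[Int]] = Map(
  1 -> Set(10, 20),
  2 -> Set(10, 20, 30),
  3 -> Set(20, 40),
  4 -> Set(50)
)

/** Recommends movies the user has not watched, ranked by how many
  * co-viewers (users sharing at least one watched movie) watched them. */
def recommend(userId: Int, limit: Int): Seq[Int] = {
  val seen = watched.getOrElse(userId, Set.empty[Int])
  val coViewerMovies = watched.toSeq.collect {
    case (other, movies) if other != userId && (movies & seen).nonEmpty => movies
  }
  coViewerMovies
    .flatMap(_ -- seen)                                  // keep only unwatched movies
    .groupBy(identity)
    .map { case (movie, hits) => movie -> hits.size }    // co-viewer count per movie
    .toSeq
    .sortBy { case (movie, count) => (-count, movie) }   // most co-views first, ties by id
    .take(limit)
    .map(_._1)
}

println(recommend(1, 5))
```

In a graph database the same idea becomes a traversal (user, watched movies, other viewers, their movies), which is why a graph store is a natural fit for this requirement.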
Download the MovieLens data from https://grouplens.org/datasets/movielens/
Before the data can be ingested into the backend, it has to be reformatted. To download the already formatted files directly, use the Google Drive links below.
ratings.csv -----> https://drive.google.com/file/d/1pSwZIl4YymTQK38Kj7JMr-fAgKa7IPhg/view?usp=sharing
fullMovie.csv -----> https://drive.google.com/file/d/1OguNBtmL_LgzWYW8lykfCr3cLMdM1YvB/view?usp=sharing
users.csv -----> https://drive.google.com/file/d/15mQs9ie4GJP6OH3rtagPXcnYthdo3xAf/view?usp=sharing
If you are curious how these files were formatted, the steps are given below. I used Spark (spark-shell) for this.
Merge movies.csv with links.csv
/opt/spark/bin/spark-shell --master local
(On my system Spark is installed under /opt, so change this path to match your installation.)
val movies = spark.read.format("csv").option("header", "true").option("inferSchema", true).load("/path/to/movies.csv")
val links = spark.read.format("csv").option("header", "true").option("inferSchema", true).load("/path/to/links.csv")
movies.show()
links.show()
// Inner join on movieId adds imdbId and tmdbId to every movie row
val joinedMovieInfo = movies.join(links, Seq("movieId"), "inner")
joinedMovieInfo.show()
/**
 * Movie titles can contain ',', so a comma-separated file is not a good fit.
 * The joined data is saved as semicolon-separated values instead.
 */
val ssvWithHeaderOptions: Map[String, String] = Map("delimiter" -> ";", "header" -> "true")
// Spark writes a directory of part files; the part-*.csv file inside it holds the data
joinedMovieInfo.write.options(ssvWithHeaderOptions).csv("path/to/store/ssv")
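To check the merged output, a single line can be parsed with plain Scala. This sketch assumes the column order Spark produces for a join on `Seq("movieId")` (join key first, then the remaining movies.csv columns, then the remaining links.csv columns) and that fields are unquoted; Spark's CSV writer quotes values that contain the delimiter or quotes, which this simple splitter does not handle. `MovieRow` and `parseMovieRow` are hypothetical names, not part of the project.

```scala
// Hypothetical record for one row of the merged, semicolon-separated movie file.
// Assumed column order: movieId;title;genres;imdbId;tmdbId
final case class MovieRow(movieId: Int, title: String, genres: List[String],
                          imdbId: Int, tmdbId: Option[Int])

def parseMovieRow(line: String): MovieRow = {
  // limit = -1 keeps trailing empty fields, e.g. a missing tmdbId
  val f = line.split(";", -1)
  require(f.length == 5, s"expected 5 fields, got ${f.length}: $line")
  MovieRow(
    movieId = f(0).trim.toInt,
    title   = f(1),                     // may contain commas; ';' is the delimiter
    genres  = f(2).split('|').toList,   // MovieLens separates genres with '|'
    imdbId  = f(3).trim.toInt,
    tmdbId  = f(4).trim.toIntOption     // empty when links.csv has no TMDB id
  )
}

val row = parseMovieRow("1;Toy Story (1995);Adventure|Animation|Children;114709;862")
println(row)
```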
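To extract the user IDs from ratings.csv:
val ratings = spark.read.format("csv").option("header", "true").option("inferSchema", true).load("/path/to/ratings.csv")
// Keep only the userId column and drop duplicate IDs
val users = ratings.select("userId")
val user = users.dropDuplicates()
// coalesce(1) yields a single part file, but Spark still writes a directory named users.csv; the data is the part-*.csv file inside it
user.coalesce(1).write.csv("path/to/write/users.csv")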
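If Spark is not at hand, the same extraction (select userId, drop duplicates) can be sketched with the Scala standard library. This assumes the MovieLens ratings.csv layout `userId,movieId,rating,timestamp` with a header row; `distinctUserIds` is a hypothetical helper, and sorting the result is an extra choice that the Spark version does not make.

```scala
// Sketch: distinct user IDs from ratings.csv lines, without Spark.
// Assumes the layout "userId,movieId,rating,timestamp" with a header row.
def distinctUserIds(lines: Iterator[String]): Vector[Int] =
  lines
    .drop(1)                                       // skip the header row
    .filter(_.nonEmpty)
    .map(line => line.takeWhile(_ != ',').toInt)  // userId is the first column
    .toSet                                         // drop duplicates, like dropDuplicates()
    .toVector
    .sorted

// For the real file: distinctUserIds(scala.io.Source.fromFile("/path/to/ratings.csv").getLines())
val sample = Iterator(
  "userId,movieId,rating,timestamp",
  "1,2,3.5,1112486027",
  "1,29,3.5,1112484676",
  "2,3,4.0,974820889"
)
println(distinctUserIds(sample))
```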
Once you have the data, put it inside src/main/resources/. Make sure the file names match the ones given in the Drive links above.
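A quick pre-flight check that the input files are in place can be sketched in plain Scala. The file names come from the Drive links above; `missingFiles` is a hypothetical helper, not part of the project. Because `Files.isRegularFile` returns false for directories, this also catches the case where a Spark output folder (for example the users.csv directory) was copied instead of the part file inside it.

```scala
import java.nio.file.{Files, Path, Paths}

// File names the loader expects, taken from the Google Drive links above.
val requiredFiles = Seq("ratings.csv", "fullMovie.csv", "users.csv")

// Returns the required files that are missing from `dir` (empty means ready to load).
// A directory with the right name counts as missing: it must be a regular file.
def missingFiles(dir: Path): Seq[String] =
  requiredFiles.filterNot(name => Files.isRegularFile(dir.resolve(name)))

val missing = missingFiles(Paths.get("src/main/resources"))
if (missing.isEmpty) println("All input files are in place.")
else println(s"Missing from src/main/resources: ${missing.mkString(", ")}")
```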
Make sure Apache Cassandra and Elasticsearch are running in the background. If you run Cassandra or Elasticsearch in the cloud, configure them in config/janusgraph-cql-es.properties.
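For a cloud setup, the settings to change are usually the Cassandra and Elasticsearch hostnames. The sketch below assumes the project's file uses the standard JanusGraph keys found in the janusgraph-cql-es.properties template that ships with JanusGraph (`storage.backend`, `storage.hostname`, `index.search.backend`, `index.search.hostname`); check the actual file before relying on these names. The `template` contents, the `withRemoteHosts` helper and the hostnames are hypothetical.

```scala
import java.io.StringReader
import java.util.Properties

// Assumed contents, modelled on the JanusGraph template; the project's file may differ.
val template =
  """storage.backend=cql
    |storage.hostname=127.0.0.1
    |index.search.backend=elasticsearch
    |index.search.hostname=127.0.0.1
    |""".stripMargin

// Loads the properties and points them at remote Cassandra and Elasticsearch hosts.
def withRemoteHosts(raw: String, cassandraHost: String, esHost: String): Properties = {
  val props = new Properties()
  props.load(new StringReader(raw))
  props.setProperty("storage.hostname", cassandraHost)  // Cassandra contact point
  props.setProperty("index.search.hostname", esHost)    // Elasticsearch host
  props
}

// Hypothetical cloud hostnames, for illustration only.
val cloud = withRemoteHosts(template, "cassandra.example.com", "es.example.com")
println(cloud.getProperty("storage.hostname"))
```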
Now, we are all set to start.
Step 1: Load data into JanusGraph
Run the main method of org.annihilator.recommendation.core.LoadMovieLensData.java.
NOTE: Running this class first clears all existing data from your JanusGraph instance.
It then loads all vertices (User and Movie) and edges. Loading the complete data set (about 20 million relations) takes almost 5 hours.
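The figures above imply a throughput of roughly 20,000,000 / (5 × 3,600 s), or about 1,100 edges per second. Bulk loads into a graph database are typically committed in fixed-size batches rather than one transaction per edge. The sketch below shows that batching pattern with a hypothetical `commit` callback standing in for a JanusGraph transaction commit; it is an assumption for illustration, and LoadMovieLensData may work differently.

```scala
// Sketch of batched ingestion. `commit` is a hypothetical stand-in for committing
// a JanusGraph transaction; the project's loader may be implemented differently.
final case class Rating(userId: Int, movieId: Int, rating: Double)

def loadInBatches(ratings: Iterator[Rating], batchSize: Int)(commit: Seq[Rating] => Unit): Int = {
  require(batchSize > 0, "batchSize must be positive")
  var commits = 0
  ratings.grouped(batchSize).foreach { batch =>
    commit(batch)   // one transaction per batch instead of one per edge
    commits += 1
  }
  commits           // number of batches committed
}

// Rough throughput implied by the figures above: 20 million edges in about 5 hours.
val edgesPerSecond = 20000000.0 / (5 * 3600)
println(f"$edgesPerSecond%.0f edges/s")
```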
Step 2: Start the API
To start the API, run org.annihilator.recommendation.controller.RecommendationEngineController.java with the program arguments server config/engine.yaml. The API exposes the following endpoints:
POST /janus_engine/add_edge (org.annihilator.recommendation.controller.RecommendingEngineController)
POST /janus_engine/add_movie (org.annihilator.recommendation.controller.RecommendingEngineController)
POST /janus_engine/add_user (org.annihilator.recommendation.controller.RecommendingEngineController)
POST /janus_engine/get_edge_properties (org.annihilator.recommendation.controller.RecommendingEngineController)
POST /janus_engine/get_movie_details_by_name (org.annihilator.recommendation.controller.RecommendingEngineController)
POST /janus_engine/get_vertex_properties (org.annihilator.recommendation.controller.RecommendingEngineController)
POST /janus_engine/purge (org.annihilator.recommendation.controller.RecommendingEngineController)
POST /janus_engine/get_similar_movies (org.annihilator.recommendation.controller.RecommendingEngineController) (work in progress)
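All endpoints are POST requests. A minimal sketch of calling one from Scala with the JDK's `java.net.http` client (Java 11 or later) follows. The host and port (`localhost:8080`, Dropwizard's default application port) are assumptions; the real values come from config/engine.yaml. The JSON body and its `movieName` field are hypothetical; take the actual request shapes from the Postman collection.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Assumption: the API listens on localhost:8080; check config/engine.yaml.
val baseUrl = "http://localhost:8080/janus_engine"

// Builds a JSON POST request to one of the endpoints listed above.
def postJson(endpoint: String, json: String): HttpRequest =
  HttpRequest.newBuilder()
    .uri(URI.create(s"$baseUrl/$endpoint"))
    .header("Content-Type", "application/json")
    .POST(HttpRequest.BodyPublishers.ofString(json))
    .build()

// Sends a request and returns the response body as a string.
// Requires the API from Step 2 to be running.
def send(request: HttpRequest): HttpResponse[String] =
  HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())

// Hypothetical body: the field name is illustrative only.
val request = postJson("get_movie_details_by_name", """{"movieName": "Toy Story"}""")
// val response = send(request); println(s"${response.statusCode()} ${response.body()}")
```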
For more information, download the Postman collection and import it into Postman:
https://www.getpostman.com/collections/4c2fd9093cd8b8dc234f
All JanusGraph configuration is in config/janusgraph-cql-es.properties. Application-level configuration is in config/engine.yaml. The API log level can be changed in config.yml in the root directory.
This project can be used for recommendations, especially for study and research purposes. I always wanted a powerful rule-based recommendation engine, so here the journey begins.
I love coming up with elegant, simple solutions to the most complex problems. I find joy and satisfaction in building a solution that is intuitive and simple on the surface, yet complex and intelligent at its core.
Check out my social profiles for more information:
Website: https://arcticoak2.github.io/
LinkedIn: https://www.linkedin.com/in/abhijeet-kumar-983b57a4/
Apache License
Version 2.0, January 2004