News-Recommendation-at-scale-using-Neo4j
The Microsoft News Dataset (MIND) [1] is a sample of 1 million anonymized users and their click behaviors collected from the Microsoft News website. It includes about 15M impression logs for about 160K English news articles. This project uses it to quickly predict similar news based on user preferences and to enable rank-ordered recommendation queries personalized to each user, using collaborative filtering and KNN.
You can download the MIND dataset and read the paper on the MIND website. Download the training and validation sets of the MIND Large dataset and store them in the same directory as the two notebooks.
- The Data_Preprocessing_and_Loading.ipynb notebook preprocesses the data, models it as a graph, and loads it into a Neo4j DBMS.
- The News_Recommendation_Engine.ipynb notebook profiles the graph and implements scaled item-based collaborative filtering using Neo4j GDS with FastRP embeddings and KNN.
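To give a feel for what the KNN step computes, here is a minimal pure-Python sketch that ranks news articles by cosine similarity between embedding vectors. It is an illustration only, not the notebook's code: the notebook relies on Neo4j GDS for both the FastRP embeddings and the KNN search, whereas this sketch uses brute-force comparison. The function names, the news IDs, and the two-dimensional vectors are all made up for the example.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_similar(
    embeddings: Dict[str, List[float]], news_id: str, k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k articles most similar to news_id, best first."""
    target = embeddings[news_id]
    scored = [
        (other_id, cosine(target, vector))
        for other_id, vector in embeddings.items()
        if other_id != news_id  # never recommend the article itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real FastRP embeddings have many more dimensions.
embeddings = {
    "N1": [1.0, 0.0],
    "N2": [0.9, 0.1],
    "N3": [0.0, 1.0],
}

neighbours = top_k_similar(embeddings, "N1", k=2)
```

Here `neighbours` lists `N2` before `N3`, because the vector for `N2` points in nearly the same direction as the one for `N1`. Brute force compares every pair of articles, which is why a graph library is the better fit at the scale of MIND.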
Dependencies
- Neo4j Desktop 1.4.12 installed on your PC. (Neo4j requires 16 GB of RAM for the in-memory graph computations.)
- Add the APOC and Graph Data Science plugins when creating the DBMS in Neo4j Desktop.
- Python 3.5 or later.
- pip install neo4j (if the neo4j library already exists in your Python site-packages, uninstall it first and then reinstall it).
Make sure to close the driver at the end of the session.
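One way to guarantee the driver is closed is to wrap its use in `try`/`finally`, as in the sketch below. The helper name `run_query`, the `driver_factory` parameter, and the default URI and credentials are assumptions made for illustration; replace them with the values of your own DBMS. The `neo4j` import sits inside the function only so that the snippet loads even before the package is installed.

```python
def run_query(query, uri="bolt://localhost:7687", auth=("neo4j", "password"),
              driver_factory=None):
    """Run one Cypher query and always close the driver afterwards.

    The URI and credentials are placeholder assumptions, not project settings.
    """
    if driver_factory is None:
        # Imported lazily; requires `pip install neo4j`.
        from neo4j import GraphDatabase
        driver_factory = GraphDatabase.driver

    driver = driver_factory(uri, auth=auth)
    try:
        # The session is a context manager and closes itself on exit.
        with driver.session() as session:
            return [record.data() for record in session.run(query)]
    finally:
        # Runs even if the query raises, so the connection pool is released.
        driver.close()
```

For example, `run_query("RETURN 1 AS n")` against a running DBMS should return a list containing one dictionary per result row.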
Citations
[1] Fangzhao Wu, Ying Qiao, Jiun-Hung Chen, Chuhan Wu, Tao Qi, Jianxun Lian, Danyang Liu, Xing Xie, Jianfeng Gao, Winnie Wu and Ming Zhou. MIND: A Large-scale Dataset for News Recommendation. ACL 2020.