Algorithm that recommends the most relevant articles to IBM Watson Studio platform users.
This project requires Python and the usual data science libraries, such as:
- numpy
- pandas
- matplotlib
To view the code alongside its output, installing Anaconda and Jupyter Notebook is recommended.
Two datasets are available:
- user-item-interactions.csv records which users (user id) have interacted with which articles (article id).
- articles_community.csv provides more details about the articles.
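Before building any recommender, the interaction records are typically explored and reshaped into a user-item matrix. The sketch below is illustrative only: it assumes the interactions are loaded into a pandas DataFrame with columns named `user_id` and `article_id` (the real CSV's column names may differ), uses a small hypothetical sample instead of the actual file, and the function name `create_user_item_matrix` is hypothetical.

```python
import pandas as pd

# Hypothetical toy data: the column names user_id and article_id are assumed.
# One row per interaction, so the same (user, article) pair can repeat.
interactions = pd.DataFrame({
    "user_id":    [1, 1, 2, 2, 2, 3, 3, 4],
    "article_id": [10, 20, 10, 30, 10, 20, 40, 10],
})

def create_user_item_matrix(df):
    """Return a 0/1 DataFrame: one row per user, one column per article,
    with 1 where the user interacted with the article at least once."""
    # Keep each (user, article) pair once so repeated reads do not inflate counts
    pairs = df[["user_id", "article_id"]].drop_duplicates()
    # crosstab counts co-occurrences; after de-duplication every cell is 0 or 1
    return pd.crosstab(pairs["user_id"], pairs["article_id"])

user_item = create_user_item_matrix(interactions)

# Basic exploration: dataset size and interactions per user
n_users = interactions["user_id"].nunique()
n_articles = interactions["article_id"].nunique()
n_interactions = len(interactions)
per_user = interactions.groupby("user_id").size()

print(n_users, n_articles, n_interactions)  # 4 4 8
print(per_user.median())                     # 2.0
print(user_item.shape)                       # (4, 4)
```

Since no ratings exist, a binary matrix (interacted or not) is a natural target for the similarity and SVD steps.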
On the IBM Watson Studio platform, users read articles shared by the data science and artificial intelligence community. Their interactions with articles are recorded, but no ratings are available.
While the platform dashboard shows only the newest articles, it would be more useful to recommend the articles most relevant to each user.
The main tasks in this project are the following:
- Explore the number of users, articles and interactions in the dataset, along with the main summary statistics.
- Build functions that return the top k most popular articles.
- Build functions that find the users most similar to a given user based on past interactions, then recommend to that user the articles seen by those similar users.
- Take a machine learning approach to recommendations: use Singular Value Decomposition (SVD) to predict interactions between users and articles, make recommendations from those predictions, and discuss which methods should work well in practice.
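The "top k most popular articles" task can be sketched as a simple ranking by interaction count. This is a minimal illustration, not the project's code: the column names `user_id`/`article_id`, the toy data, and the function name `get_top_article_ids` are all assumptions.

```python
import pandas as pd

# Hypothetical toy interactions (assumed columns: user_id, article_id)
interactions = pd.DataFrame({
    "user_id":    [1, 1, 2, 2, 2, 3, 3, 4],
    "article_id": [10, 20, 10, 30, 10, 20, 40, 10],
})

def get_top_article_ids(df, k=10):
    """Return the ids of the k articles with the most interactions, most popular first."""
    # value_counts returns counts sorted in descending order
    counts = df["article_id"].value_counts()
    return counts.index[:k].tolist()

print(get_top_article_ids(interactions, k=2))  # [10, 20]
```

Here every interaction row counts, so repeat reads by one user raise an article's rank; calling `df.drop_duplicates()` first would rank by distinct users instead. Either way, the same list is returned to every user, which makes this a reasonable fallback for new users with no history.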
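The user-similarity step can be sketched on a binary user-item matrix. This sketch assumes the dot product of two users' 0/1 rows (the number of articles both have interacted with) as the similarity measure; that choice, the toy data and the function names `find_similar_users` and `user_user_recs` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy interactions (assumed columns: user_id, article_id)
interactions = pd.DataFrame({
    "user_id":    [1, 1, 2, 2, 2, 3, 3, 4],
    "article_id": [10, 20, 10, 30, 10, 20, 40, 10],
})

# Binary user-item matrix: 1 if the user interacted with the article at least once
pairs = interactions.drop_duplicates()
user_item = pd.crosstab(pairs["user_id"], pairs["article_id"])

def find_similar_users(user_id, user_item):
    """Return the other user ids ordered from most to least similar.
    Similarity = dot product of binary rows = number of shared articles."""
    sims = user_item.dot(user_item.loc[user_id]).drop(user_id)
    return sims.sort_values(ascending=False).index.tolist()

def user_user_recs(user_id, user_item, m=10):
    """Recommend up to m articles seen by the most similar users
    but not yet seen by user_id."""
    row = user_item.loc[user_id]
    seen = set(row[row > 0].index)
    recs = []
    for neighbor in find_similar_users(user_id, user_item):
        neighbor_row = user_item.loc[neighbor]
        for article in neighbor_row[neighbor_row > 0].index:
            if article not in seen and article not in recs:
                recs.append(article)
                if len(recs) == m:
                    return recs
    return recs

print(user_user_recs(4, user_item))
```

Ties in similarity (here users 1 and 2 each share one article with user 4) are broken arbitrarily by the sort; breaking them by each neighbor's total number of interactions is one possible refinement.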
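The SVD step can be sketched with NumPy. Because the user-item matrix has no missing cells (0 simply means "no interaction"), `numpy.linalg.svd` can factorize it directly; keeping only the top k singular values gives a rank-k approximation whose entries act as predicted interaction scores. The toy matrix, the function names `svd_predict` and `svd_recommend`, and the choice of k are assumptions for illustration.

```python
import numpy as np

# Hypothetical binary user-item matrix (rows: users, columns: articles)
user_item = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
], dtype=float)

def svd_predict(matrix, k):
    """Rank-k reconstruction of the matrix using its top k singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

def svd_recommend(matrix, user_idx, k, n=10):
    """Column indices of the n unseen articles with the highest predicted score."""
    scores = svd_predict(matrix, k)[user_idx].copy()
    scores[matrix[user_idx] > 0] = -np.inf  # exclude articles already seen
    order = np.argsort(-scores, kind="stable")
    return [j for j in order if np.isfinite(scores[j])][:n]

print(svd_recommend(user_item, user_idx=3, k=2, n=2))
```

With k equal to the full rank, the reconstruction reproduces the original matrix and carries no predictive signal; smaller k forces the model to generalize. One way to pick k is to compare prediction accuracy on held-out interactions.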
The dataset was provided by Udacity as part of its Data Scientist Nanodegree program. Dataset credit belongs to IBM Watson Studio.