
sparsh-ai / reco-bandit


Building recommender systems using contextual bandit methods to address the cold-start issue and enable online real-time learning

License: Apache License 2.0


reco-bandit's Introduction

RecoBandit

App 1

Thompson Sampling, Single-user Multi-product Simulation, Multi-armed Bandit

The objective of this app is to apply bandit algorithms to the recommendation problem in a simulated environment. Although in practice we would also use real data, the complexity of the recommendation problem and the associated algorithmic challenges are already apparent in this simple setting.
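As a rough illustration of the single-user, multi-product setting, the sketch below runs Beta-Bernoulli Thompson Sampling against a simulated user. It is not the app's code: the function name `thompson_sampling`, the click-through rates, and the round count are all assumptions made for this example.

```python
import random


def thompson_sampling(true_ctrs, n_rounds=5000, seed=0):
    """Simulate one user and several products with Beta-Bernoulli Thompson Sampling.

    true_ctrs holds the (hidden) click probability of each product. These
    values are made up for the simulation and are unknown to the agent.
    """
    rng = random.Random(seed)
    n_products = len(true_ctrs)
    clicks = [0] * n_products  # observed clicks per product
    skips = [0] * n_products   # observed non-clicks per product
    pulls = [0] * n_products   # how often each product was recommended

    for _ in range(n_rounds):
        # Sample a plausible click rate for each product from its Beta posterior
        # (a uniform Beta(1, 1) prior is assumed here).
        samples = [
            rng.betavariate(clicks[i] + 1, skips[i] + 1) for i in range(n_products)
        ]
        # Recommend the product whose sampled click rate is highest.
        chosen = max(range(n_products), key=lambda i: samples[i])

        # The simulated user clicks with the product's true probability.
        if rng.random() < true_ctrs[chosen]:
            clicks[chosen] += 1
        else:
            skips[chosen] += 1
        pulls[chosen] += 1

    return pulls, clicks


pulls, clicks = thompson_sampling([0.05, 0.10, 0.30])
print("recommendations per product:", pulls)
```

Because each recommendation is driven by a posterior sample rather than the current best estimate, products with few observations still get tried occasionally, while the product with the highest click rate ends up receiving most of the traffic.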

RecoBandit - Thompson Sampling Simulation

Inspired by the following works:

App 2

Multi-user Multi-product Contextual Simulation, Contextual Bandit, Vowpal Wabbit

The objective of this app is to apply contextual bandit algorithms to the recommendation problem in a simulated environment. The recommender agent quickly adapts to changing user behavior and adjusts its recommendation strategy accordingly.
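The adaptation described above can be sketched without Vowpal Wabbit. The example below swaps in a simple per-context epsilon-greedy agent with a constant step size, so it is not the algorithm the app uses. The class name, user names, products, and preference tables are hypothetical.

```python
import random


class ContextualEpsilonGreedy:
    """Tabular epsilon-greedy agent that keeps one value estimate per (context, action)."""

    def __init__(self, actions, epsilon=0.1, step_size=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.step_size = step_size  # constant step size lets old evidence fade
        self.rng = random.Random(seed)
        self.values = {}            # (context, action) -> estimated reward

    def choose(self, context):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values.get((context, a), 0.0))

    def learn(self, context, action, reward):
        key = (context, action)
        old = self.values.get(key, 0.0)
        self.values[key] = old + self.step_size * (reward - old)


def simulate(agent, preferences, n_rounds, rng):
    """Run the agent against users who reward only their preferred product."""
    users = sorted(preferences)
    total = 0.0
    for _ in range(n_rounds):
        user = rng.choice(users)
        product = agent.choose(user)
        reward = 1.0 if product == preferences[user] else 0.0
        agent.learn(user, product, reward)
        total += reward
    return total / n_rounds


rng = random.Random(1)
agent = ContextualEpsilonGreedy(["camera", "shoes", "book"])
# Phase 1: initial preferences. Phase 2: both users change their minds.
before = simulate(agent, {"user_a": "camera", "user_b": "book"}, 2000, rng)
after = simulate(agent, {"user_a": "shoes", "user_b": "camera"}, 2000, rng)
print("average reward before the change:", before)
print("average reward after the change:", after)
```

The constant step size is the design choice that matters here: a running average over all history would adapt more and more slowly, whereas exponentially discounting old rewards lets the agent recover after the preferences shift.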

VW Contextual Bandit Simulation

App 3 (next release)

Image Embeddings, Offline Learning

The objective is to recommend products and adapt the model in real time from user feedback using an actor-critic algorithm. Suppose we have observed users’ behavior and collected some of the products they clicked on. These are fed into the Actor network, which decides what the user would like to see next by producing an ideal product embedding. That embedding is compared with the embeddings of the other products, and the closest match is recommended to the user. The Critic evaluates the Actor's choices and helps it find out what went wrong.
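The matching step, where the ideal embedding is compared against the catalog, can be sketched as a nearest-neighbor lookup. Cosine similarity is an assumption (the text does not name a similarity measure), the three-dimensional vectors are invented, and `ideal` merely stands in for the output of an Actor network that is not implemented here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(ideal_embedding, catalog, exclude=()):
    """Return the product id whose embedding is closest to the ideal embedding."""
    candidates = {pid: emb for pid, emb in catalog.items() if pid not in exclude}
    return max(
        candidates,
        key=lambda pid: cosine_similarity(ideal_embedding, candidates[pid]),
    )


# Hypothetical product embeddings; in practice these might come from an image model.
catalog = {
    "sneakers": [0.9, 0.1, 0.0],
    "boots": [0.8, 0.3, 0.1],
    "novel": [0.0, 0.2, 0.9],
}
ideal = [0.7, 0.4, 0.1]  # stand-in for the Actor's "ideal product" output

print(recommend(ideal, catalog))
```

The `exclude` argument is a convenience for filtering out products the user has already clicked on, so the same item is not recommended again.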

Inspired by the following works:

App 4 (next release)

Offline Learning

The core intuition is that we cannot blindly apply RL algorithms in a production system out of the box, because the learning period would be too costly. Instead, we need to leverage the vast amount of offline training examples so that the algorithm performs as well as the current system before it is released into the online production environment. The agent is first given access to many offline training examples produced by a fixed policy. It then gets access to the online system, where it chooses the actions itself.
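A minimal sketch of this two-stage idea follows, under simplifying assumptions: there is no context, the fixed logging policy is uniformly random, and the online learner is Beta-Bernoulli Thompson Sampling. The function names and click rates are hypothetical and only illustrate the warm start; they are not taken from the planned app.

```python
import random


def warm_start(logs, n_actions):
    """Turn logged (action, reward) pairs into per-action click and no-click counts."""
    clicks = [0] * n_actions
    skips = [0] * n_actions
    for action, reward in logs:
        if reward:
            clicks[action] += 1
        else:
            skips[action] += 1
    return clicks, skips


def run_online(true_ctrs, clicks, skips, n_rounds, rng):
    """Thompson Sampling that starts from the given counts; returns the average reward."""
    clicks, skips = list(clicks), list(skips)  # do not mutate the caller's counts
    n_actions = len(true_ctrs)
    total = 0
    for _ in range(n_rounds):
        samples = [
            rng.betavariate(clicks[a] + 1, skips[a] + 1) for a in range(n_actions)
        ]
        action = max(range(n_actions), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_ctrs[action] else 0
        if reward:
            clicks[action] += 1
        else:
            skips[action] += 1
        total += reward
    return total / n_rounds


rng = random.Random(0)
true_ctrs = [0.05, 0.10, 0.30]  # made-up click rates, hidden from the agent

# Offline stage: logs produced by a fixed (here: uniformly random) policy.
logs = []
for _ in range(3000):
    action = rng.randrange(len(true_ctrs))
    logs.append((action, 1 if rng.random() < true_ctrs[action] else 0))

# Online stage: compare a cold start with a start from the logged counts.
clicks, skips = warm_start(logs, len(true_ctrs))
cold_reward = run_online(true_ctrs, [0, 0, 0], [0, 0, 0], 500, rng)
warm_reward = run_online(true_ctrs, clicks, skips, 500, rng)
print("cold start:", cold_reward, "warm start:", warm_reward)
```

With contexts or a non-uniform logging policy, simple counting is no longer enough and the logged rewards would typically need to be reweighted by the logging policy's action probabilities.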

Inspired by the following works:

What is Bandit-based Recommendation?

Traditionally, the recommendation problem was treated as a simple classification or prediction problem. However, it has since been shown to be sequential in nature. Accordingly, it can be formulated as a Markov decision process (MDP) and solved with reinforcement learning (RL) methods. In fact, recent advances in combining deep learning with traditional RL methods, i.e. deep reinforcement learning (DRL), have made it possible to apply RL to recommendation problems with massive state and action spaces.

Use case 1: Personalized recommendations

Goal: Quickly help users find products they would like to buy

In e-commerce and other digital domains, companies frequently want to offer personalized product recommendations to users. This is hard when you don’t yet know much about the customer or don’t understand which features of a product are pertinent. With limited information about which actions to take and what their payoffs will be, and with limited resources to explore the competing actions, it is hard to know what to do.

Use case 2: Online model evaluation

Goal: Compare and find the best performing recommender model

Use case 3: Personalized re-ranking

Goal: Bring the most relevant option to the top

Use case 4: Personalized feeds

Goal: Recommend a never-ending feed of items (news, products, images, music)

https://youtu.be/CgGCbmlRI3o

