Coder Social home page Coder Social logo

reinforcement-q-learning-snake-and-ladder-'s Introduction

Description

As part of a ML course, we were asked to work on an algo that meets the academic requirements for our RL exam. Thus, in a team we focused on a variant of the famous game Snake & Ladder. This variant of the game allows you to play with 3 different dice. each with its own properties for advancing. The board has shortcuts and traps of different types that allow you to gain or lose time. The game also has two modes, ”circle” which turns the course into a continuous circle requiring the player to land precisely on the last one to finish the game; or ”no circle” which offers a classic course.

##Objective

The goal is to converge to an optimal strategy (using Markov Decision Process). So leading to maximisation of the total reward r of successive steps and in our case of the Snake and Ladder to minimise the costs.

Q(s, a) ← Q(s, a) + α [r + γ min Q(s′, a′) − Q(s, a)]

Implementation Q-learning

Three initial functions Safe Matrix(nbr steps), Normal Matrix(layout, nbr steps, circle) and Risky Matrix(layout, nbr steps, circle) were constructed in order to build the transition probability matrices that the value and policy iteration needed to work. These matrices allow the transition probabilities between states to be adapted according to the location of the traps, the dice used, the structure and the game mode.

markovDecision(layout,circle) applies the value and policy iteration method to the letter. It calculates in P1, P2 and P3, the transition probability matrices of the three dice thanks to the functions mentioned above. It then constructs the rewards vectors which depend on the dice used and the location of the type 3 trap (prison). Thus, going to jail implies additional rewards. Once the necessary inputs have been provided, the function applies the value and policy iteration defined on the previous page.

It calculates the values of each state for each of the 3 dice and retains the minimum as the optimal value. In a second step, it retains the index of this minimum value to indicate which die is used to obtain this optimal value. All this is done in an iterative way until the changes between 2 iterations are too small.

Several sub-optimal policy functions have been constructed to develop dice policies that differ from the optimal policy. These policies, which will later be used to prove that these strategies are less efficient than the optimal strategy, return a vector of dice to be used in each state. For example, the function Dice1 strategy(layout, cir- cle) constructs a policy that uses only dice 1. The function Purely Random Strategy(layout, circle) constructs a purely random policy.

Simulator(layout, circle, Dice, nbr sim, start) = the algo that plays the game conditionnaly to Dice strategy imposed on it, the location of the traps and the game mode. After playing thousands of games (nbr sim), simunlator() returns the average number of turns used to finish the game for the start box.

reinforcement-q-learning-snake-and-ladder-'s People

Contributors

addae-pk avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.