q-learning-openai-taxi-v2

OpenAI Taxi-v2 game using Q-learning

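Q-learning is an off-policy reinforcement learning algorithm that seeks to find the best action to take given the current state. It is called off-policy because it learns the value of the greedy (best-action) policy while the agent actually behaves according to a different, partly exploratory policy. In this project, Q-learning has been applied to the OpenAI Gym Taxi-v2 game.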

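To update the Q-values, we use the Q-learning update rule for the action-value function Q(s, a), where s' is the state reached after taking action a in state s, r is the reward received, α is the learning rate and γ is the discount factor:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$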
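In code, a single update moves Q(s, a) a fraction α of the way toward the target r + γ · max over a' of Q(s', a'). The sketch below stores the Q-table as a plain list of rows (500 states × 6 actions); the function name `q_update` and the default values alpha=0.1 and gamma=0.6 are illustrative assumptions, not taken from this project. It also assumes the common convention of dropping the max term when the transition ends the episode.

```python
def q_update(q_table, state, action, reward, next_state, done=False,
             alpha=0.1, gamma=0.6):
    """One tabular Q-learning update, applied in place to q_table.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    # At a terminal transition there is no future value to bootstrap from.
    future = 0.0 if done else max(q_table[next_state])
    td_target = reward + gamma * future
    # Move Q(s, a) a fraction alpha of the way toward the target.
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# 500 states x 6 actions, all values start at zero.
q_table = [[0.0] * 6 for _ in range(500)]
q_update(q_table, state=0, action=1, reward=-1, next_state=100)
print(q_table[0][1])  # -0.1
```

With an all-zero table the first update simply records α times the reward; values from later in the episode propagate backwards through the max term over repeated updates.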

In this game we have 6 actions:

  • down (south)
  • up (north)
  • right (east)
  • left (west)
  • pick up (the passenger)
  • drop off (the passenger)
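Gym identifies these actions by the integers 0 to 5, in the order listed above (this is our understanding of Gym's Taxi environment; check it against the version you have installed). The `IntEnum` below is a hypothetical convenience for readable names and is not part of Gym or of this project:

```python
from enum import IntEnum


class TaxiAction(IntEnum):
    """Assumed action indices of the Taxi environment (0-5)."""
    SOUTH = 0    # move down
    NORTH = 1    # move up
    EAST = 2     # move right
    WEST = 3     # move left
    PICKUP = 4   # pick up the passenger
    DROPOFF = 5  # drop off the passenger


print(len(TaxiAction))     # 6
print(TaxiAction(4).name)  # PICKUP
```

Because `IntEnum` members compare equal to plain integers, `TaxiAction.PICKUP` can be passed wherever the environment expects an action index.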

Next, we define the states. The grid has 5×5 = 25 cells, so the taxi can be in one of 25 positions. The passenger can be waiting at one of the 4 designated pick-up/drop-off locations or be inside the taxi, which gives 5 possible passenger locations. Finally, the destination is one of the 4 designated locations. So, in total, we have 5 × 5 × 5 × 4 = 500 states.
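One way to map these four components to a single index is a mixed-radix encoding with digit sizes 5 (row), 5 (column), 5 (passenger location) and 4 (destination). The sketch assumes, as Gym's Taxi environment does to our knowledge, that passenger locations 0-3 are the four designated spots and 4 means "in the taxi"; the `encode`/`decode` helpers are written here for illustration only.

```python
def encode(taxi_row, taxi_col, pass_loc, dest_idx):
    """Pack (row, col, passenger location, destination) into one int in [0, 500)."""
    # Mixed-radix number with digit sizes 5, 5, 5, 4.
    return ((taxi_row * 5 + taxi_col) * 5 + pass_loc) * 4 + dest_idx


def decode(state):
    """Inverse of encode: recover the four state components."""
    state, dest_idx = divmod(state, 4)
    state, pass_loc = divmod(state, 5)
    taxi_row, taxi_col = divmod(state, 5)
    return taxi_row, taxi_col, pass_loc, dest_idx


print(encode(3, 1, 2, 0))  # 328
print(encode(4, 4, 4, 3))  # 499
print(decode(328))         # (3, 1, 2, 0)
```

For example, encode(3, 1, 2, 0) gives ((3·5 + 1)·5 + 2)·4 = 328, and the largest tuple (4, 4, 4, 3) maps to 499, so every state fits in the range 0 to 499.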

Each state is encoded as a single integer between 0 and 499. We also need to take rewards into account; the reward depends on the outcome of the action the agent takes:

  • if the taxi drops the passenger off at the correct destination, it is rewarded with +20 points

  • if the taxi attempts an illegal pick-up or drop-off, it is penalized with -10 points

  • every other step, including a legal pick-up, costs 1 point (-1)
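The reward scheme above can be written as a small function for reasoning about episode returns. The environment computes rewards itself; `reward_for` and the constant names are hypothetical, and the action indices assume 4 = pick up and 5 = drop off.

```python
DELIVERY_REWARD = 20   # correct drop-off at the destination
ILLEGAL_PENALTY = -10  # pick-up/drop-off attempted where it is not allowed
STEP_PENALTY = -1      # every other step

PICKUP, DROPOFF = 4, 5  # assumed action indices


def reward_for(action, illegal=False, delivered=False):
    """Reward for one step under the scheme described above."""
    if action in (PICKUP, DROPOFF) and illegal:
        return ILLEGAL_PENALTY
    if action == DROPOFF and delivered:
        return DELIVERY_REWARD
    # Moves and legal pick-ups just pay the per-step cost.
    return STEP_PENALTY


# A 12-step episode: 10 moves, a legal pick-up, then a successful drop-off.
episode = [reward_for(0)] * 10 + [reward_for(PICKUP),
                                  reward_for(DROPOFF, delivered=True)]
print(sum(episode))  # 9
```

Because every step costs 1 point, a shorter route to the destination yields a higher total, which is what pushes the agent toward efficient paths.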

An agent interacts with the environment in one of two ways. The first is to use the Q-table as a reference: look up the values of all possible actions in the current state and select the action with the highest value. This is known as exploiting, since the agent uses the information already available to it to make a decision.
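Exploiting amounts to an argmax over one row of the Q-table. A minimal sketch, assuming the Q-table is a list of rows (one per state, one column per action); the function names are illustrative:

```python
def greedy_action(q_table, state):
    """Exploit: index of the highest-valued action in this state's row."""
    row = q_table[state]
    # max over indices, keyed by value; ties go to the lowest index.
    return max(range(len(row)), key=row.__getitem__)


def greedy_policy(q_table):
    """Greedy action for every state, i.e. the policy implied by the Q-table."""
    return [greedy_action(q_table, s) for s in range(len(q_table))]


q = [[0.0, 2.5, -1.0],
     [3.0, 3.0, 1.0]]
print(greedy_policy(q))  # [1, 0]
```

Applying `greedy_action` to every state gives the greedy policy, which is what the agent follows once the table has converged.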

The second way to take an action is to act randomly, which is called exploring: instead of selecting the action with the maximum expected future reward, the agent selects an action at random. Acting randomly is important because it allows the agent to discover states and actions that it might never try during exploitation. You can balance exploration and exploitation with a parameter epsilon (ε), the probability of exploring instead of exploiting at each step.
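A common way to implement the ε balance is ε-greedy action selection: with probability ε pick a uniformly random action, otherwise pick the highest-valued one. The sketch assumes the same list-of-rows Q-table as before; `epsilon_greedy` is a hypothetical name, and the value of ε is up to you.

```python
import random


def epsilon_greedy(q_table, state, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit."""
    row = q_table[state]
    if rng.random() < epsilon:
        # Explore: any action, chosen uniformly at random.
        return rng.randrange(len(row))
    # Exploit: the action with the largest Q-value (lowest index on ties).
    return max(range(len(row)), key=row.__getitem__)


q = [[0.0, 5.0, 1.0]]
print(epsilon_greedy(q, 0, epsilon=0.0))  # 1
```

With epsilon=0.1, roughly one step in ten is exploratory. Many implementations also decay ε during training so the agent explores early and exploits later; that schedule is a design choice, not something this project specifies.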

You can launch the game to see the converged Q-table and the steps the agent takes to reach the goal:

$ python3 game.py
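game.py is not reproduced here. As a rough sketch of what a training loop of this kind could look like, the function below combines ε-greedy selection with the Q-learning update. It assumes the classic Gym interface from the Taxi-v2 era (reset() returns a state, step() returns (state, reward, done, info)); newer Gym/Gymnasium versions return extra values, so the unpacking would need adjusting. `train`, its defaults, and `Corridor` are hypothetical: `Corridor` is a five-cell stand-in environment so the sketch runs without Gym installed.

```python
import random


def train(env, n_states, n_actions, episodes=2000,
          alpha=0.1, gamma=0.6, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    Assumes the classic Gym interface: env.reset() -> state and
    env.step(action) -> (next_state, reward, done, info).
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            row = q[state]
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)  # explore
            else:
                action = max(range(n_actions), key=row.__getitem__)  # exploit
            next_state, reward, done, _ = env.step(action)
            # No bootstrapping from a terminal state.
            future = 0.0 if done else max(q[next_state])
            row[action] += alpha * (reward + gamma * future - row[action])
            state = next_state
    return q


class Corridor:
    """Hypothetical stand-in env: walk right from cell 0 to cell 4."""

    def __init__(self):
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action 0 = left (blocked at cell 0), action 1 = right
        self.pos = max(0, self.pos - 1) if action == 0 else self.pos + 1
        done = self.pos == 4
        return self.pos, (20 if done else -1), done, {}


q = train(Corridor(), n_states=5, n_actions=2, episodes=500)
print([max(range(2), key=q[s].__getitem__) for s in range(4)])  # [1, 1, 1, 1]
```

With an older Gym release installed, the same function could in principle be called as train(gym.make("Taxi-v2"), 500, 6); the hyperparameters shown are illustrative, not the project's.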

Contributors: elmar999
