
openai-gym-taxi-v2's Introduction

OpenAI-Gym-Taxi-v2

This small repo presents a reinforcement learning solution to the Taxi problem in OpenAI Gym: https://github.com/openai/gym/wiki/Leaderboard#taxi-v2

Steps to Run

  1. Clone the repo: git clone https://github.com/mostafaelhoushi/OpenAI-Gym-Taxi-v2

  2. cd to the workspace directory: cd OpenAI-Gym-Taxi-v2/workspace

  3. Run the main script: python main.py. You may add one of the following arguments to the command to specify the update method: SARSA, SARSA_MAX, EXPECTED_SARSA.
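
As a rough illustration of how the three update methods named above differ, the sketch below computes the TD target each one would use for a single transition. The function names (epsilon_greedy_probs, td_target) and the sample numbers are hypothetical and not taken from the repo; only the method names come from the README.

```python
import numpy as np

def epsilon_greedy_probs(q_values, epsilon):
    """Return epsilon-greedy action probabilities for one state's Q-values."""
    n_actions = len(q_values)
    probs = np.ones(n_actions) * epsilon / n_actions
    probs[np.argmax(q_values)] += 1.0 - epsilon
    return probs

def td_target(method, reward, next_q, next_action, epsilon, gamma=1.0):
    """Compute the TD target for one of the three update methods.

    next_q holds the Q-values of the next state; next_action is only
    used by SARSA.
    """
    if method == "SARSA":
        # Bootstrap from the action actually chosen in the next state.
        bootstrap = next_q[next_action]
    elif method == "SARSA_MAX":
        # Bootstrap from the greedy action (Q-Learning).
        bootstrap = np.max(next_q)
    elif method == "EXPECTED_SARSA":
        # Bootstrap from the expectation under the epsilon-greedy policy.
        bootstrap = np.dot(epsilon_greedy_probs(next_q, epsilon), next_q)
    else:
        raise ValueError(f"unknown update method: {method}")
    return reward + gamma * bootstrap

# Hypothetical sample transition, for illustration only.
next_q = np.array([1.0, 3.0, 2.0, 0.0])
for method in ("SARSA", "SARSA_MAX", "EXPECTED_SARSA"):
    target = td_target(method, reward=-1.0, next_q=next_q, next_action=2, epsilon=0.2)
    print(method, target)
```

In all three cases the Q-value is then moved toward the target by a step of size alpha; the methods differ only in how the next state's value is estimated.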

Source Code:

The repo contains three files in its workspace folder:

  • agent.py: The reinforcement learning agent I developed is written here. This is the only file that I have modified.
  • monitor.py: The interact function tests how well the agent learns from interaction with the environment. This file has been provided by the creators of the Udacity Reinforcement Learning Nanodegree.
  • main.py: The main file to run in the terminal to check the performance of the agent. This file has been provided by the creators of the Udacity Reinforcement Learning Nanodegree.
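
To make the division of labour between these files more concrete, here is a minimal sketch of the kind of loop an interact function might run. It is not the Udacity code: ToyEnv and RandomAgent are hypothetical stand-ins, the environment is assumed to follow the classic Gym convention where step returns (next_state, reward, done, info), and tracking the best 100-episode average is an assumption about how the performance figure could be computed.

```python
import random
from collections import deque

class ToyEnv:
    """Hypothetical 1-D corridor: start at 0, reach position 3 to finish."""

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # Action 1 moves right, anything else stays put.
        if action == 1:
            self.pos += 1
        done = self.pos >= 3
        reward = 10 if done else -1  # every non-final step costs -1
        return self.pos, reward, done, {}

class RandomAgent:
    """Hypothetical agent exposing the same two methods as agent.py."""

    def select_action(self, state):
        return random.choice([0, 1])

    def step(self, state, action, reward, next_state, done):
        pass  # a learning agent would update its Q-table here

def interact(env, agent, num_episodes=200, window=100):
    """Run episodes and return the best average reward over `window` episodes."""
    recent = deque(maxlen=window)
    best_avg = float("-inf")
    for _ in range(num_episodes):
        state = env.reset()
        total = 0
        done = False
        while not done:
            action = agent.select_action(state)
            next_state, reward, done, _ = env.step(action)
            agent.step(state, action, reward, next_state, done)
            total += reward
            state = next_state
        recent.append(total)
        if len(recent) == window:
            best_avg = max(best_avg, sum(recent) / window)
    return best_avg

random.seed(0)
print(interact(ToyEnv(), RandomAgent()))
```

The agent only needs select_action and step for a loop like this to work, which is why agent.py can be changed without touching the other two files.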

Results:

The average reward over 100 episodes is 9.2926 for Sarsa Max (a.k.a. Q-Learning) and 9.2754 for Expected Sarsa.

openai-gym-taxi-v2's People

Contributors

mostafaelhoushi

Forkers

jrapudg mulintia

openai-gym-taxi-v2's Issues

Agent.py cliffwalker

In your agent, the implementation of step doesn't really use epsilon: whenever done == False, it overwrites it with epsilon = 1.0 / (1 + i_episode). The following implementation uses epsilon and initializes self.i_episode to 1 instead. You can check it with the prints.
I also hardcoded the values of epsilon and alpha in __init__, but I like to live dangerously.

I took the idea of rewriting the code from here.

import numpy as np
from collections import defaultdict

class Agent:

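    def __init__(self, nA=6, epsilon=0.05, alpha=0.1, gamma=1):
        """ Initialize agent.

        Params
        ======
        - nA: number of actions available to the agent
        - epsilon: exploration rate (currently ignored, see hardcoded value below)
        - alpha: learning rate (currently ignored, see hardcoded value below)
        - gamma: discount factor
        """
        self.nA = nA
        self.Q = defaultdict(lambda: np.zeros(self.nA))
        self.epsilon = 1.0  # hardcoded; the epsilon argument is ignored
        self.alpha = 0.2  # hardcoded; the alpha argument is ignored
        self.gamma = gamma
        self.i_episode = 1  # episode counter, starts at 1 (was 0)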

    def get_policy_probs(self, state):
        """ Given the state, return the probability of each action.
        Params
        ======
        - state: the current state of the environment
        Returns
        =======
        - probs: an array, each element corresponds to probability of corresponding action selected
        """
        probs = np.ones(self.nA) * self.epsilon /self.nA
        probs[np.argmax(self.Q[state])] += 1 - self.epsilon
        return probs        
        
    def select_action(self, state):
        """ Given the state, select an action.

        Params
        ======
        - state: the current state of the environment

        Returns
        =======
        - action: an integer, compatible with the task's action space
        """
        #return np.random.choice(self.nA)
    
        probs = self.get_policy_probs(state)
        return np.random.choice(np.arange(self.nA), p=probs)

    def step(self, state, action, reward, next_state, done):
        """ Update the agent's knowledge, using the most recently sampled tuple.

        Params
        ======
        - state: the previous state of the environment
        - action: the agent's previous choice of action
        - reward: last reward received
        - next_state: the current state of the environment
        - done: whether the episode is complete (True or False)
        """
        #self.Q[state][action] += 1
        """
        if (done == False):
            print("\n",self.i_episode, done, self.epsilon)
            self.epsilon = 1.0 / (1.0 + self.i_episode)
            probs = self.get_policy_probs(state)
            next_action = np.random.choice(np.arange(self.nA), p=probs)
            self.Q[state][action] += self.alpha * (reward + self.gamma * np.sum(self.Q[next_state] * probs)  - self.Q[state][action])        
        else: # done == True
            print("\n",self.i_episode, done, self.epsilon)
            self.Q[state][action] += self.alpha * (reward - self.Q[state][action])
            self.i_episode +=  1             
        
        """
        if done:
            #print("\n",self.i_episode, done, self.epsilon)
            # Terminal transition: there is no next state to bootstrap from.
            self.Q[state][action] += self.alpha * (reward - self.Q[state][action])
            self.i_episode += 1
            # Decay epsilon once per episode
            # (old rule: 1.0 / (1.0 + self.i_episode) with self.i_episode = 0).
            self.epsilon = self.epsilon / self.i_episode
        else:
            #print("\n",self.i_episode, done, self.epsilon)
            # Expected Sarsa: weight next_state's Q-values by the epsilon-greedy
            # probabilities of next_state (not of the current state).
            probs = self.get_policy_probs(next_state)
            self.Q[state][action] += self.alpha * (reward + self.gamma * np.sum(self.Q[next_state] * probs) - self.Q[state][action])
        

https://github.com/mostafaelhoushi/OpenAI-Gym-Taxi-v2/blob/master/workspace/agent.py
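
One thing worth checking with the prints: the decay in the done branch (epsilon divided by i_episode, starting from i_episode = 1) does not appear to give the same values as the older rule 1.0 / (1.0 + i_episode) with i_episode starting at 0. The sketch below tabulates both under that reading of the code; the function names are hypothetical and the episode indexing is approximate.

```python
def schedule_multiplicative(num_episodes):
    """epsilon <- epsilon / i_episode, with i_episode starting at 1."""
    epsilon, i_episode = 1.0, 1
    values = [epsilon]  # epsilon used during the first episode
    for _ in range(num_episodes):
        i_episode += 1
        epsilon = epsilon / i_episode
        values.append(epsilon)
    return values

def schedule_harmonic(num_episodes):
    """epsilon = 1 / (1 + i_episode), with i_episode starting at 0."""
    return [1.0 / (1.0 + i) for i in range(num_episodes + 1)]

rows = zip(schedule_multiplicative(5), schedule_harmonic(5))
for episode, (mult, harm) in enumerate(rows):
    print(f"episode {episode}: multiplicative={mult:.6f}  harmonic={harm:.6f}")
```

Under these assumptions the first schedule shrinks like 1/n! (1, 1/2, 1/6, 1/24, ...) while the second shrinks like 1/n (1, 1/2, 1/3, 1/4, ...), so the agent stops exploring much sooner with the first. Using 1.0 / self.i_episode in the done branch would reproduce the older values, if that is what is wanted.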
