Coder Social home page Coder Social logo

dsai-hw4-mountain-car's Introduction

DSAI HW4 : mountain car v0

Agent

In this homework,I implemented a deep Q networks , which is a 4 layer DNN with relu acivation function and outputs Q-value of 2 actions(left/right).

state

I designed the state which consist of latest n observation history and actions, n is hyperparameter.

Here is the details:

o: observation (p,v)
  p is position,v is velocity a : action n: history number mentioned above.

if n = 1 :   the state is (p,v) if n > 1:   t : current time step of an epsidoe   the state is previous n-1 observation,action plus current observation (p[t-n-1],o[t-n-1],a[t-n-1].....p[t-2],v[t-2],a[t-2],p[t-1],v[t-1],a[t-1],p[t],v[t])

reward

-1 , the same as the gym environment

In the following report , I will show DQN's performance under differnet learning rate and history number n.

lr : learning rate n : history number

I tried lr=0.001,0.0001 and n = 1,2 4 agents was trained and evaluated.

Analyze

I trained lots of network but all fail to reach the goal on the right mountain in 200 steps (training with 1000 episodes , max step 200 each epsiode).

It is a hard problem with 200 max steps.It seldom success reach the goal.Because the agent always got reward -1 , which makes agent(Q-nework) hard to learn how to success,it makes no difference to all actions.They are as worse as other action because they all get -1 rewards on whole trajectory, so it took long time to train a success agent.

I train a n=1 , lr=0.001 agent with 10000 epsidoes,it succeed in the training procedure between 2000~2500 episode,but after 2500 it fails to solve the mountain car problem .Maybe DQN is not a stable learning algorithm.

Reports

Please check analyze.ipynb to see experiments figures.

evaluation results
  loss:root mean square error between r+gamma*Q(s',a') and Q(s,a)

   loss of 4 trained agent get lower and lower.

  reward sum

  most of the reward sum is -199 (the last step I forgot to add to reward sum)

evaluation results

Here is the evalation result of each agent .

Each test run 100 episode. If agent reach position > 0.5 it success. Each episode I count the sum of reward in that episode.

Following show the success rate and average reward sum of 100 episodes. Because they all fail to solve the mountain car problem.Success rate is 0 and average reward sum is -200.

History number Learning rate Success rate Avg reward sum
1 0.001 0.0 -200
1 0.0001 0.0 -200
2 0.001 0.0 -200
2 0.0001 0.0 -200

Answer of 3 questions

  1. What kind of RL algorithms did you use? value-based, policy-based, model-based? why?
    DQN,value-based,because Q learning try to find the opitmal value(discounted total reward) under a state.

  2. This algorithms is off-policy or on-policy? why? (10%) off-policy,because DQN using replay memory to update the policy ,which is not the current action the agent take.

  3. How does your algorithm solve the correlation problem in the same MDP? (10%) I use replay memory to solve the correlation problem,which will remember previous history and t training batch comes from it so learning isn't favor of latest states.

dsai-hw4-mountain-car's People

Contributors

uabharuhi avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.