
Replicating DeepMind results (rainbow) - closed, 24 comments

Kaixhin commented on July 21, 2024
Replicating DeepMind results

Comments (24)

Ashutosh-Adhikari commented on July 21, 2024

@Kaixhin Just added a pull request to https://github.com/hengyuan-hu/rainbow.git for PER code inspired by your repo.

Kaixhin commented on July 21, 2024

Closing as results on Space Invaders are good and show a clear difference against previous methods:
[plot: Space Invaders reward curve]

Enduro (still 2× the reported score, not sure why):
[plot: Enduro reward curve]

Frostbite:
[plot: Frostbite reward curve]

Kaixhin commented on July 21, 2024

@albertwujj if you check the releases you'll see that Enduro results match the paper now at just over 2k. The scores in general can be highly impacted by the total frame cap for the Atari games and the ε value used for ε-greedy during validation.
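To make the ε-greedy point concrete, here is a minimal sketch of evaluation-time action selection (not this repo's actual code - `q_net`, `select_action` and `num_actions` are placeholder names, and a plain Q-network is assumed rather than Rainbow's distributional head):

```python
import random
import torch

def select_action(q_net, state, num_actions, eps=0.001):
    # With probability eps take a uniformly random action, otherwise act greedily.
    # Reported Atari scores are sensitive to this evaluation eps (e.g. 0.001 vs 0.05).
    if random.random() < eps:
        return random.randrange(num_actions)
    with torch.no_grad():
        q_values = q_net(state.unsqueeze(0))  # add a batch dimension
        return q_values.argmax(dim=1).item()
```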

stringie commented on July 21, 2024

I believe I read somewhere that the loss should be the max over the minibatch, but I think the sum should work just as well.
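As a quick illustration (not code from either repo): sum and mean differ only by a constant factor that can be absorbed into the learning rate, whereas taking the max backpropagates through a single transition per update.

```python
import torch

# Illustrative per-sample losses for a minibatch of 32 transitions.
per_sample_loss = torch.rand(32, requires_grad=True)

loss_sum = per_sample_loss.sum()    # gradient flows to every sample
loss_mean = per_sample_loss.mean()  # same direction as the sum, scaled by 1/32
loss_max = per_sample_loss.max()    # gradient flows only to the single worst sample
```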

Kaixhin commented on July 21, 2024

Asked Matteo a few questions, placed his responses here, and will do some new runs based on this new information.

Kaixhin commented on July 21, 2024

Trying a new game - Frostbite - which already seems to work fine! Space Invaders results are still in line with previous runs, but it will take longer to see whether it reaches the same scores.

[plot: Frostbite reward curve]

stringie commented on July 21, 2024

Have you tried to see how it performs on Breakout? I'm really curious.

Kaixhin commented on July 21, 2024

I have limited resources, so not yet, but I plan to eventually. If that doesn't work I'll add the fire-on-reset wrapper to the environment and try again.
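In case it's useful, a rough sketch of such a wrapper, in the style of the common "fire on reset" wrapper from other DQN codebases (written in gym terms and against the classic gym API where reset() returns just an observation; this repo drives ALE directly, so treat it as illustrative):

```python
import gym

class FireOnReset(gym.Wrapper):
    """Press FIRE after every reset, for games like Breakout that need it to start."""

    def __init__(self, env):
        super().__init__(env)
        # Action index 1 is FIRE for the Atari games this wrapper targets.
        assert env.unwrapped.get_action_meanings()[1] == 'FIRE'

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        obs, _, done, _ = self.env.step(1)
        if done:  # extremely unlikely, but reset again just in case
            obs = self.env.reset(**kwargs)
        return obs
```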

stringie commented on July 21, 2024

Have you seen the paper on Distributed Prioritized Experience Replay? The results look amazing relative to what Rainbow can achieve. I wonder if it could also somehow be integrated into your project. Here's the paper: https://arxiv.org/pdf/1803.00933.pdf

Kaixhin commented on July 21, 2024

@stringie this issue is to track "replicating DeepMind results", so this is not relevant here. Anyway, adding extra components is a) currently outside the scope of this project and b) unhelpful while results from the original Rainbow paper still cannot be replicated.

Ashutosh-Adhikari commented on July 21, 2024

@Kaixhin is it possible to try individual components of Rainbow and check whether their performance matches the respective papers?

Kaixhin commented on July 21, 2024

@Ashutosh-Adhikari only if I had written the code in a modular fashion that allowed these kinds of experiments. There are plenty of things I've not needed to include by aiming solely for the full Rainbow model, and I don't have the capacity to refactor the code in this manner at the moment.

Ashutosh-Adhikari commented on July 21, 2024

@Kaixhin So I recently incorporated the PER from your code into a vanilla DQN and ran it on Breakout. It seems that PER is where the issue lies, as it is not behaving as claimed in the Prioritized Experience Replay paper (slightly worse than DQN while training, although more stable).

The experiments were run with a replay memory one fifth the size used in the DQN paper, i.e. 200k transitions.
But I'm not sure exactly where the issue lies.
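For reference while comparing implementations, a small sketch of the proportional prioritisation scheme from the PER paper (the function name and use of NumPy are mine, not from either repo; Rainbow's published settings use a priority exponent of 0.5 and anneal β from 0.4 to 1):

```python
import numpy as np

def per_probabilities_and_weights(priorities, alpha=0.5, beta=0.4):
    # P(i) = p_i^alpha / sum_k p_k^alpha  (proportional prioritisation)
    priorities = np.asarray(priorities, dtype=np.float64)
    probs = priorities ** alpha
    probs /= probs.sum()
    # Importance-sampling weights w_i = (N * P(i))^(-beta), normalised by max(w)
    # so that updates are only ever scaled down.
    weights = (len(priorities) * probs) ** (-beta)
    weights /= weights.max()
    return probs, weights
```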

Also, I did not quite understand the purpose of the attribute 'n' of a ReplayMemory object in memory.py, and hence the logic behind lines 109 and 116 of memory.py.

Kaixhin commented on July 21, 2024

@Ashutosh-Adhikari that's useful to know, thanks. Is any code public so that I can have a look at it?

n comes from n-step backups, so if you set n = 1 you should recover most of the original algorithms.
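A tiny sketch of the multi-step target that the logic around those lines supports (the function below is illustrative, not the repo's API): the target is the discounted sum of the next n rewards plus a bootstrapped value.

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """Discounted n-step return: sum_{k=0}^{n-1} gamma^k * r_{t+k} + gamma^n * V.

    With rewards = [r_t] (i.e. n = 1) this reduces to the usual one-step
    TD target r_t + gamma * V, recovering the original algorithms.
    """
    target = 0.0
    for k, r in enumerate(rewards):
        target += (gamma ** k) * r
    return target + (gamma ** len(rewards)) * bootstrap_value
```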

Ashutosh-Adhikari commented on July 21, 2024

@Kaixhin Thanks for the clarification regarding n. Give me a few days and I shall make the code public.

Until then: the base code I used was from https://github.com/hengyuan-hu/rainbow.git.

The way they store samples is slightly different - for example, a sample contains both the current and the next state - but that should not change the logic.

Also, I'm not sure how scaling down the replay memory would affect the relative performance of the experience-replay-related modules (any insights on this?). I think the gap between vanilla DQN and PER DQN might shrink, but vanilla DQN outperforming PER DQN should not happen (?).

Kaixhin commented on July 21, 2024

@Ashutosh-Adhikari actually PER can decrease performance on a few games. If you check Figure 6 in the appendix of the Rainbow paper you will see that removing PER from Rainbow actually improves performance on Breakout. In this case you would want to test on a game that the paper shows is clearly positively affected, such as Yars' Revenge. For normal ER I would say that /5 may be OK, but with PER I'm less certain.

Ashutosh-Adhikari commented on July 21, 2024

@Kaixhin I think that while we are looking at vanilla DQN plus PER (since we are trying to check PER as a module), we might want to focus on the PER paper, which reports the improvement in Table 6 of its appendix. Or no?

If that is the case, should scaling down the replay memory matter or not (again, in relative terms only)?

Because, I believe, when we remove PER from Double, Duelling, etc. DQN, the setup might become too complex to draw conclusions from.

Kaixhin commented on July 21, 2024

@Ashutosh-Adhikari yes, sorry - the repo you pointed to does have most of the parts of Rainbow, but if you are trying to do a comparison against just the vanilla DQN then it's best to check Figure 7 of the original PER paper, as the learning curves (which have been smoothed) are the most informative. The big caveat is that the double DQN paper introduced a new set of hyperparameters for the DQN algorithm which work better for the double DQN, but I'm pretty sure the baseline results shown in Figure 7 don't use these improved hyperparameters (whereas the PER model does). Nevertheless, I think the PER learning curves should be distinctive, so Frostbite or Space Invaders are good games for testing whether PER is working correctly.

Ashutosh-Adhikari commented on July 21, 2024

@Kaixhin I think I can check Space Invaders (in a few days) after incorporating n and keeping n = 1. However, on a second run on Breakout, the best average (over 10 episodes) validation reward was 387 for PER and 384 for vanilla DQN. The improvement is definitely not significant.
Notes:
1) Replay memory size of 200k (1e6/5).
2) Not running for more than 25M (this should not be an issue).
3) Sharing the training curve does not seem informative at this point.

[plot: Breakout validation reward curves]

Kaixhin commented on July 21, 2024

@Ashutosh-Adhikari according to Figure 7 and other stats from the paper there isn't much difference between the Breakout scores of all the different methods, so it's difficult to draw conclusions from it (the best choices are games where there should be a clear difference in learning), especially as the published results can come from averaging over several runs.

If you're still investigating, could you also plot a) the max priority and b) the priorities of the 32 samples per minibatch over time? That might provide some clues as to whether things are going wrong.
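Something along these lines might do for the logging - note that the `sample` return signature and the `max_priority` attribute below are hypothetical placeholders, so rename them to whatever the actual replay memory exposes:

```python
import csv

def log_priorities(memory, num_steps, batch_size=32, path='priorities.csv'):
    """Record the running max priority and the priorities of each sampled batch.

    Assumes (hypothetically) that memory.sample(batch_size) returns
    (batch, priorities) and that memory.max_priority holds the running maximum.
    """
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['step', 'max_priority', 'batch_priorities'])
        for step in range(num_steps):
            _, priorities = memory.sample(batch_size)
            writer.writerow([step, float(memory.max_priority),
                             [float(p) for p in priorities]])
```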

Ashutosh-Adhikari commented on July 21, 2024

@Kaixhin Yep, the way they take averages is quite different. Averaging over 100 episodes would definitely increase that value, I'm sure (law of large numbers, I think :P - when averaging over only 10 episodes, even one or two rare poor performances are enough to drag the average down).

Nevertheless, one should try it on different games, as you mentioned. So basically we can't be sure, as of now, whether there is a bug in PER or not, right?

Kaixhin commented on July 21, 2024

@Ashutosh-Adhikari averaging over more evaluation episodes would result in a better approximation to the true mean performance, it shouldn't bias the value either way. If anything it may result in smoother curves but the values should be comparable.

It's pretty hard to tell where the bug is (it could even be in PyTorch if we're really unlucky), so yes, it may not be in PER, but PER is one of the trickiest things to get right, so it's the most likely place. In my experience the best way to debug by running experiments is to pick settings where the learning curves show clear differences between the options. Unit tests would also be a good idea, but they depend on understanding the components correctly in the first place.
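As an example of such a unit test, a statistical check that sampled indices appear with frequency roughly proportional to p^α (`sample_index_fn` is a placeholder for however the real memory exposes single-index sampling, so this is a sketch rather than a test of this repo):

```python
import numpy as np

def check_sampling_frequencies(sample_index_fn, priorities, alpha=0.5,
                               draws=100_000, tol=0.02):
    """Return True if empirical frequencies match p_i^alpha / sum_k p_k^alpha."""
    expected = np.asarray(priorities, dtype=np.float64) ** alpha
    expected /= expected.sum()
    counts = np.zeros(len(priorities))
    for _ in range(draws):
        counts[sample_index_fn()] += 1
    return np.abs(counts / draws - expected).max() < tol

# Sanity check against a reference sampler before pointing it at the real memory:
ps = np.array([1.0, 2.0, 4.0, 8.0])
ref_probs = ps ** 0.5 / (ps ** 0.5).sum()
assert check_sampling_frequencies(lambda: np.random.choice(len(ps), p=ref_probs), ps)
```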

albertwujj commented on July 21, 2024

Hi Kaixhin,

Are the hyperparams you used for these results currently the default ones in main.py?

Additionally, in the 'Human-level control' DQN paper, there is this note under Extended Table 4:

[screenshot of the note under Extended Table 4 regarding the 5-minute episode limit during evaluation]

At 60 fps, 5 minutes is 18000 frames. This may be related to why your Enduro results are twice as high.
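For concreteness (the 4× action repeat below is the standard Atari frame skip from the DQN papers; a longer cap, such as the 30-minute / 108000-frame limit used in some later papers, would let an agent accumulate more reward per episode in a game like Enduro):

```python
# 5 minutes of emulator time at 60 frames per second, as in the
# 'Human-level control' evaluation protocol.
FPS = 60
CAP_MINUTES = 5
FRAME_SKIP = 4  # each agent action is repeated for 4 emulator frames

max_emulator_frames = FPS * 60 * CAP_MINUTES          # 18000 frames
max_agent_steps = max_emulator_frames // FRAME_SKIP   # 4500 agent decisions
print(max_emulator_frames, max_agent_steps)
```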

Thanks,
Albert
