Comments (24)
@Kaixhin Just added a pull request to https://github.com/hengyuan-hu/rainbow.git for PER code inspired by your repo.
from rainbow.
Closing, as results on Space Invaders are good and show a clear difference against previous methods (plot omitted).
Enduro is still 2x the reported score, not sure why (plot omitted).
@albertwujj if you check the releases you'll see that Enduro results match the paper now at just over 2k. The scores in general can be highly impacted by the total frame cap for the Atari games and the ε value used for ε-greedy during validation.
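Since the comment above notes that evaluation scores are sensitive to the ε used for ε-greedy during validation, here is a minimal sketch of ε-greedy action selection. The function name and Q-value format are illustrative assumptions, not code from the repo:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With ε = 0 this is pure greedy evaluation; even a small ε (e.g. 0.001 vs 0.05) can noticeably shift reported Atari scores.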
I believe I read somewhere that the loss should be the max over the minibatch, but I think the sum should work just as well.
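As a toy illustration of the two reductions being discussed (names are hypothetical, and real implementations would reduce a tensor of per-sample losses):

```python
def reduce_losses(per_sample_losses, mode="sum"):
    # Reduce per-sample losses to a single scalar for the optimizer.
    # "sum" weights every transition in the minibatch equally;
    # "max" backpropagates only through the worst transition.
    if mode == "sum":
        return sum(per_sample_losses)
    if mode == "max":
        return max(per_sample_losses)
    raise ValueError(f"unknown mode: {mode}")
```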
Asked Matteo a few questions, placed his responses here, and will do some new runs based on this new information.
Trying a new game, Frostbite, which already seems to work fine! Space Invaders results are still in line with previous runs, but it takes longer to see whether it reaches the same scores.
Have you tried to see how it performs on Breakout? I'm really curious.
I have limited resources so not yet but plan to eventually. If that doesn't work I'll add the fire-on-reset wrapper to the environment and try again.
Have you seen the paper on Distributed Prioritized Experience Replay? The results look amazing relative to what Rainbow can achieve. I wonder if it can also somehow be integrated in your project. Here's the paper: https://arxiv.org/pdf/1803.00933.pdf
@stringie this issue is to track "replicating DeepMind results", so this is not relevant here. Anyway, adding extra components is a) currently not within the scope of this project b) unhelpful when results from the original Rainbow paper still cannot be replicated.
@Kaixhin is it possible to try individual components of Rainbow and check whether their performance matches the respective papers?
@Ashutosh-Adhikari only if I had written the code in a modular fashion that allowed these kinds of experiments. There are plenty of things I've not needed to include by aiming straight for the full Rainbow model, and I don't have the capacity to refactor the code in this manner at the moment.
@Kaixhin So I recently incorporated the PER from your code into a vanilla DQN and ran it on Breakout. It seems PER is where the issue lies, as it is not behaving as per the claims in the Prioritized Experience Replay paper (slightly worse than DQN while training, although more stable).
The experiments were run with a replay memory size of 200k, i.e. the DQN paper's size divided by 5.
But I'm not sure where the issue lies.
Also, I did not quite understand the function of the attribute n of a ReplayMemory object in memory.py, and hence the logic behind lines 109 and 116 of memory.py.
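For reference, the proportional prioritized sampling that PER uses can be sketched without the sum-tree that real implementations (including this repo) rely on. Everything below is a simplified, hypothetical illustration, not the repo's code:

```python
import random

def sample_prioritized(priorities, batch_size, alpha=0.5, beta=0.4, rng=random):
    """Sample indices with probability proportional to priority**alpha and
    return importance-sampling weights normalised by the largest weight."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]
    idxs = rng.choices(range(len(priorities)), weights=probs, k=batch_size)
    n = len(priorities)
    weights = [(n * probs[i]) ** -beta for i in idxs]
    w_max = max((n * p) ** -beta for p in probs)  # weight of the rarest sample
    return idxs, [w / w_max for w in weights]
```

With uniform priorities this degenerates to ordinary replay sampling with all importance weights equal to 1, which is a handy sanity check when hunting for PER bugs.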
@Ashutosh-Adhikari that's useful to know, thanks. Is any code public so that I can have a look at it? n comes from n-step backups, so if you set n = 1 you should recover most of the original algorithms.
@Kaixhin Thanks for the clarification regarding n. Give me a few days, I shall make the code public.
Till then: the base code I used was from https://github.com/hengyuan-hu/rainbow.git.
The way they store samples is slightly different, e.g. a sample contains both the current and the next state, but nevertheless that should not distort the logic.
Also, I'm not sure how scaling down the replay memory would affect the relative performance of the experience-replay modules (any insights on this?). I think the performance gap between vanilla DQN and PER DQN might shrink, but vanilla DQN > PER DQN should not happen (?).
@Ashutosh-Adhikari actually PER can decrease performance on a few games. If you check Figure 6 in the appendix of the Rainbow paper, you will see that removing PER from Rainbow actually improves performance on Breakout. In this case you would want to test on a game that the paper shows is clearly positively affected, such as Yars' Revenge. For normal ER I would say that /5 may be OK, but with PER I'm less certain.
@Kaixhin I think while we are looking at vanilla DQN plus PER (since we are trying to check PER as a module), we might want to focus on the PER paper, which reports improvements in Table 6 of its appendix. Or no?
If so, should scaling down the replay memory matter (again, in relative terms only)?
Because, I believe, when we remove PER from Double, Duelling, etc. DQN, it might become too complex to draw a conclusion.
@Ashutosh-Adhikari yes, sorry, the repo you pointed to does have most of the parts of Rainbow, but if you are trying to compare against just the vanilla DQN then it's best to check Figure 7 of the original PER paper, as the (smoothed) learning curves are the most informative. The big caveat is that the double DQN paper introduced a new set of hyperparameters for the DQN algorithm which work better for the double DQN; I'm pretty sure the baseline results shown in Figure 7 don't use these improved hyperparameters (whereas the PER model does). Nevertheless, I think the PER learning curves should be distinctive, so Frostbite or Space Invaders are good games for testing whether PER is working correctly.
@Kaixhin I think I can check Space Invaders (in a few days) after incorporating n and keeping n = 1. However, on a second run for Breakout, the best average (over 10 episodes) validation reward was 387 for PER and 384 for vanilla DQN. The improvement is definitely not significant.
Notes:
1) Replay memory size of 200k (1e6/5).
2) Not running for more than 25M frames (this should not be an issue).
3) Sharing the training curve seems pointless as of now.
@Ashutosh-Adhikari according to Figure 7 and other stats from the paper, there isn't much difference between the Breakout scores for all of the different methods, so it's difficult to draw conclusions from it (the best tests are games where there should be a clear difference in learning). Especially as the published results can come from an average over several runs.
If you're still investigating, could you also plot a) the max priority and b) the priorities of the 32 samples per minibatch over time? That might provide some clues if things are going wrong.
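A minimal way to collect the two statistics suggested above, assuming the training loop exposes the current max priority and the priorities of each sampled minibatch (all names here are hypothetical):

```python
def log_priorities(history, step, max_priority, batch_priorities):
    """Append the stats worth plotting over training: the running max
    priority and summary statistics of this minibatch's priorities."""
    history.append({
        "step": step,
        "max_priority": max_priority,
        "batch_mean": sum(batch_priorities) / len(batch_priorities),
        "batch_max": max(batch_priorities),
    })
```

A max priority that grows without bound, or minibatch priorities collapsing to the same value, would both hint at a PER bug.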
@Kaixhin Yep, the way they take averages is quite different. Averaging over 100 episodes would definitely raise that value, I am sure (law of large numbers, I think :P; when averaging over only 10 episodes, even one or two rare poor performances are enough to drag the mean down).
Nevertheless, one should try it on different games as you mentioned, right?
So basically we can't yet be sure whether there is a bug in PER or not, right?
@Ashutosh-Adhikari averaging over more evaluation episodes would result in a better approximation to the true mean performance; it shouldn't bias the value either way. If anything it may result in smoother curves, but the values should be comparable.
It's pretty hard to tell where the bug is (it could even be in PyTorch if we're really unlucky), so yes, it may not be in PER, but PER is one of the trickiest things to get right, so that's the most likely place. In my experience the best way to debug by running experiments is to pick settings where the learning curves show clear differences between the options. Unit tests would also be a good idea, but they depend on understanding the components correctly in the first place.
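The point about averaging can be checked with a quick simulation: the mean over 100 episodes is not biased relative to the mean over 10, it is just less noisy. This is a self-contained toy with made-up score statistics, not data from the repo:

```python
import random
import statistics

def mean_episode_score(n_episodes, rng):
    # Hypothetical per-episode scores: noisy around a true mean of 100.
    return statistics.fmean(rng.gauss(100.0, 30.0) for _ in range(n_episodes))

rng = random.Random(0)
# Repeat each evaluation protocol many times to see the spread of the estimate.
means_10 = [mean_episode_score(10, rng) for _ in range(2000)]
means_100 = [mean_episode_score(100, rng) for _ in range(2000)]
```

Both estimators centre on the true mean of 100; only the spread differs (roughly by a factor of sqrt(10)).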
Hi Kaixhin,
Are the hyperparameters you used for these results currently the defaults in main.py?
Additionally, in the 'Human-level control' DQN paper there is a note under Extended Data Table 4 about the 5-minute evaluation cap. At 60 fps, 5 minutes is 18000 frames. This may be related to why your Enduro results are 2 times higher.
Thanks,
Albert
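Albert's arithmetic, plus the usual ALE frame skip of 4 (the frame skip is an assumption about the setup, not stated in the thread):

```python
FPS = 60
SECONDS = 5 * 60                 # a 5-minute evaluation episode
frames = FPS * SECONDS           # emulator frames within the cap
agent_steps = frames // 4        # agent decisions, assuming a frame skip of 4
```

If a codebase caps episodes in agent steps rather than emulator frames, it effectively plays 4x longer, which would inflate scores on games like Enduro where reward accrues steadily over time.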