guydav commented on August 22, 2024

Another question I'll tack on here: the default value for the target-update parameter is 8000, which matches Table 1 in the Rainbow paper, where it is reported as 32K frames.

Do you have a sense of why the data-efficient Rainbow paper, in Table 2 (Appendix E), reports the target-network update period for both Rainbow and data-efficient Rainbow as every 2000 updates?

from rainbow.

Kaixhin commented on August 22, 2024

Ah, honestly, using a fixed number of episodes is something I came up with (it makes sense, keeps the statistics simple, and also works across other environments), and I completely overlooked that evaluation detail. My default of 10 episodes could run for at most ~2x as long as DeepMind's procedure, but I feel like 5 episodes may be too few to get a good estimate. So I'm in favour of keeping what I've done, but maybe noting in the README that this differs from the original procedure. What do you think?

As noted in the footnote of the data-efficient paper, the target network update period there is counted in online network updates. Those happen only every 4 steps in the original, so 2000 updates correspond to 8000 steps between target network updates.
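The three units in play here (online updates, agent steps and emulator frames) can be converted explicitly. This is a small sketch. The function and parameter names are hypothetical. The value of 4 steps per online update comes from the footnote mentioned above, and the action repeat of 4 frames per step is the one discussed elsewhere in this thread.

```python
def target_update_in_steps_and_frames(online_updates, steps_per_update=4,
                                      action_repeat=4):
    """Convert a target-update period given in online-network updates into
    agent steps and emulator frames (parameter defaults are assumptions)."""
    agent_steps = online_updates * steps_per_update  # one update every 4 steps
    frames = agent_steps * action_repeat             # each step repeats 4 frames
    return agent_steps, frames


# 2000 updates (data-efficient Rainbow, Table 2) -> 8000 steps -> 32K frames
print(target_update_in_steps_and_frames(2000))  # (8000, 32000)
```

The same period is expressed as 2000 updates, 8000 steps (the repository's default) or 32K frames (Rainbow's Table 1), depending on the unit chosen.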

guydav commented on August 22, 2024

I think that makes sense. I don't know why you'd evaluate over a fixed number of frames rather than episodes. Maybe add a TODO to eventually implement their evaluation procedure too? It wouldn't be hard at all, but it might take a bit of someone's time.
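The two evaluation styles being compared can be sketched as follows. One runs a fixed number of episodes, so its cost varies. The other spends a fixed step budget, so the number of episodes varies. Everything here is hypothetical. `ToyEnv`, the episode lengths and the budget are only illustrative, and the agent's action selection is omitted. Discarding the final unfinished episode in the budgeted version is an assumption of this sketch, not a statement of DeepMind's exact procedure.

```python
import random
from statistics import mean


class ToyEnv:
    """Hypothetical environment: each episode lasts a random number of steps
    and yields a reward of 1 per step, so an episode's return is its length."""

    def __init__(self, rng, min_len=50, max_len=150):
        self.rng = rng
        self.min_len = min_len
        self.max_len = max_len
        self.remaining = 0

    def reset(self):
        self.remaining = self.rng.randint(self.min_len, self.max_len)

    def step(self):
        self.remaining -= 1
        return 1.0, self.remaining == 0  # (reward, done)


def evaluate_fixed_episodes(env, num_episodes):
    """Run exactly num_episodes episodes; the number of steps used varies."""
    returns, steps = [], 0
    for _ in range(num_episodes):
        env.reset()
        total, done = 0.0, False
        while not done:
            reward, done = env.step()
            total += reward
            steps += 1
        returns.append(total)
    return mean(returns), steps


def evaluate_fixed_budget(env, step_budget):
    """Run until step_budget steps are used; the number of episodes varies.
    The final, unfinished episode is discarded (an assumption of this sketch)."""
    returns, total, steps = [], 0.0, 0
    env.reset()
    while steps < step_budget:
        reward, done = env.step()
        total += reward
        steps += 1
        if done:
            returns.append(total)
            total = 0.0
            env.reset()
    return (mean(returns) if returns else float("nan")), len(returns)


env = ToyEnv(random.Random(0))
print(evaluate_fixed_episodes(env, 10))   # (mean return, steps used)
print(evaluate_fixed_budget(env, 1000))   # (mean return, episodes completed)
```

A fixed budget makes evaluation cost predictable but can leave very few completed episodes when episodes are long. A fixed episode count guarantees a sample size but not a cost, which is the trade-off behind the 10 vs. 5 episodes discussion.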

Another realization regarding the target-update bit: these 32K frames are not actually 32K unique frames, right? If I understand correctly, we repeat every action 4 times, add the max-pool of the last two frames of that repetition to the state (dropping the oldest frame from the state), and pass the 4-frame state buffer to the agent.

In other words, every pair of agent actions shares three of the four frames in the environment state, correct?
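The pipeline described above (repeat, max-pool, stack) can be sketched as follows. This is a minimal illustration, not the repository's code. `ToyEmulator`, `FrameStackEnv` and the tiny 2x2 "frames" are hypothetical, and preprocessing such as grayscale conversion and resizing is left out. Each emulator frame is filled with its own index, which shows two things. Consecutive agent states share three of their four pooled frames. Meanwhile, 8000 agent steps still consume 32K emulator frames.

```python
from collections import deque

import numpy as np

ACTION_REPEAT = 4   # each agent action is repeated for 4 emulator frames
HISTORY_LENGTH = 4  # number of pooled frames stacked into one agent state


class ToyEmulator:
    """Hypothetical stand-in for an Atari emulator: frame number t is a 2x2
    array filled with t, so you can see which frames end up in each state."""

    def __init__(self):
        self.frame_count = 0

    def act(self, action):
        self.frame_count += 1
        return np.full((2, 2), self.frame_count, dtype=np.int64)


class FrameStackEnv:
    """Sketch of the action-repeat / max-pool / frame-stack pipeline."""

    def __init__(self, emulator):
        self.emulator = emulator
        # Start from an all-zero history; deque(maxlen=...) drops the oldest
        # entry automatically whenever a new one is appended.
        self.state_buffer = deque(
            [np.zeros((2, 2), dtype=np.int64) for _ in range(HISTORY_LENGTH)],
            maxlen=HISTORY_LENGTH,
        )

    def step(self, action):
        last_two = deque(maxlen=2)  # keep only the final two raw frames
        for _ in range(ACTION_REPEAT):
            last_two.append(self.emulator.act(action))
        pooled = np.maximum(last_two[0], last_two[1])  # pixel-wise max
        self.state_buffer.append(pooled)  # oldest pooled frame falls off
        return np.stack(self.state_buffer)  # agent state, shape (4, 2, 2)


env = FrameStackEnv(ToyEmulator())
s1, s2 = env.step(0), env.step(0)
print(np.array_equal(s1[1:], s2[:3]))  # True: 3 of 4 frames are shared
```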

(Also, which paper does the max-pooling over the last two frames of the action repetition come from, if any? I'm just trying to trace all of these implementation details.)
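On the "why" behind that detail, a commonly given motivation is sprite flicker. Some Atari games draw certain objects only on alternate frames, so a single raw frame can miss them. The toy example below uses hypothetical 1x4 "screens" rather than real emulator output. It shows that a pixel-wise max over the last two frames of a repeat keeps both objects.

```python
import numpy as np


def max_pool_last_two(frames):
    """Pixel-wise maximum over the final two frames of an action repeat."""
    return np.maximum(frames[-2], frames[-1])


# Hypothetical 1x4 screens from one 4-frame action repeat: sprite A is drawn
# only on even frames (column 1), sprite B only on odd frames (column 3).
sprite_a = np.array([[0, 255, 0, 0]], dtype=np.uint8)
sprite_b = np.array([[0, 0, 0, 255]], dtype=np.uint8)
repeat_frames = [sprite_a, sprite_b, sprite_a, sprite_b]

pooled = max_pool_last_two(repeat_frames)
print(pooled.tolist())  # [[0, 255, 0, 255]]: both sprites are visible
```

Taking only the last raw frame here would show sprite B alone.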

Kaixhin commented on August 22, 2024

OK, I've added a TODO in 2188966.

You are correct. I can't remember where this was first reported in a paper, but I've spent years trying to replicate DeepMind's results, and I did a lot of digging into their Atari wrapper repositories and the DQN source code released with the Nature paper to get these implementation details.

guydav commented on August 22, 2024

Thanks for the clarification. There seems to be so much voodoo around the implementation details, which makes it quite hard to know when you can trust your results.

It's interesting that the original Rainbow paper frames it as updating every 32K frames, which, while strictly true, corresponds to far fewer distinct frames in the agent's states given the overlap.

guydav commented on August 22, 2024

@Kaixhin, it seems that most papers also evaluate with either no-op starts or human starts. Did you ever take a stab at implementing either?

Kaixhin commented on August 22, 2024

No-op starts are still used during evaluation. I haven't tried human starts, and I have no idea where you would get them from (presumably they're internal to DeepMind).

guydav commented on August 22, 2024

Ah, I see, in env.reset(). That makes sense.
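No-op starts can live inside a reset method roughly as follows. This is a minimal sketch, not the repository's `env.reset()`. `ToyEmulator`, `reset_with_noop_starts`, the no-op action index and the bound of 30 no-ops are all assumptions for illustration. Drawing between 1 and `max_noops` inclusive is likewise one choice among several.

```python
import random

NOOP_ACTION = 0  # assumed index of the no-op action in the action set
MAX_NOOPS = 30   # assumed upper bound on the number of no-ops per reset


class ToyEmulator:
    """Hypothetical emulator exposing just enough to illustrate reset()."""

    def __init__(self, episode_length=1000):
        self.episode_length = episode_length
        self.t = 0

    def reset_game(self):
        self.t = 0

    def act(self, action):
        self.t += 1
        return 0.0  # reward (unused here)

    def game_over(self):
        return self.t >= self.episode_length


def reset_with_noop_starts(emulator, rng, max_noops=MAX_NOOPS):
    """Reset the game, then take a random number of no-op actions so that
    evaluation episodes do not all begin from the identical start state."""
    emulator.reset_game()
    num_noops = rng.randint(1, max_noops)  # inclusive on both ends
    for _ in range(num_noops):
        emulator.act(NOOP_ACTION)
        if emulator.game_over():  # guard against games ending during no-ops
            emulator.reset_game()
    return num_noops


emu = ToyEmulator()
n = reset_with_noop_starts(emu, random.Random(0))
print(1 <= n <= MAX_NOOPS, emu.t == n)  # True True
```

Because the emulator itself is deterministic, the random no-op prefix is what varies the start state between evaluation episodes.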
