curl_rainbow's Introduction

CURL Rainbow

MIT License

Status: Archive (code is provided as-is, no updates expected)

This is an implementation of CURL: Contrastive Unsupervised Representations for Reinforcement Learning coupled with the Data Efficient Rainbow method for Atari games. The code by default uses the 100k timesteps benchmark and has not been tested for any other setting.

Run the following command (or bash run_curl.sh) with the game as an argument:

python3 main.py --game ms_pacman

To install all dependencies, run bash install.sh.

curl_rainbow's People

Contributors

aravindsrinivas

curl_rainbow's Issues

illegal memory access

When I run the code on a TITAN RTX with Python 3.6, CUDA 10.0.130, and PyTorch 1.4.0, I get the following error:

  File "curl_rainbow/agent.py", line 74, in act
    return (a * self.support).sum(2).argmax(1).item()
RuntimeError: CUDA error: an illegal memory access was encountered

However, the code runs without problems on CPU. Does anyone know what is going wrong here?
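
For readers unfamiliar with the line in the traceback, here is a minimal, dependency-free sketch of what `(a * self.support).sum(2).argmax(1)` appears to compute: the expected return of each action under a categorical return distribution, followed by a greedy argmax. The shapes (one state, several actions, several atoms), the toy support range, and the helper names `linspace` and `greedy_action` are assumptions made for illustration, not code from this repository.

```python
def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi (inclusive)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def greedy_action(dist, support):
    """Pick the action with the highest expected return.

    dist[action][atom] is the probability that the return of `action`
    equals support[atom]. The expected return is the probability-weighted
    sum over atoms, mirroring (a * support).sum(atom_dim).
    """
    q_values = [sum(p * z for p, z in zip(probs, support)) for probs in dist]
    best = max(range(len(q_values)), key=q_values.__getitem__)
    return best, q_values


# Toy setup (assumed values): 5 atoms spanning [-10, 10], 3 actions.
support = linspace(-10.0, 10.0, 5)
dist = [
    [0.1, 0.2, 0.4, 0.2, 0.1],  # symmetric around 0
    [0.0, 0.1, 0.2, 0.3, 0.4],  # mass on high returns
    [0.4, 0.3, 0.2, 0.1, 0.0],  # mass on low returns
]
action, q_values = greedy_action(dist, support)
print(action, q_values)
```

Note that CUDA reports kernel errors asynchronously, so the Python line named in the traceback is not necessarily the operation that caused the illegal access. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` before launching the script usually gives a more accurate stack trace.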

Where does the Query interact with the Rainbow DQN

Based on my understanding of the paper, I believe that the query-keys go into the contrastive learning objective function, while the queries go into the RL algorithm as observations. However, I am unable to find (in the code) where you send the query into the Rainbow DQN as state/observation. Can you please help me with this?
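
As background for this question, the contrastive objective described in the CURL paper can be sketched as follows. Queries and keys are encodings of two augmented views of the same batch of observations, similarity is the bilinear product q^T W k, and each query should match the key at the same batch index. This is a plain-Python illustration with hypothetical function names (`bilinear_logits`, `info_nce_loss`) and toy vectors. It is not the repository's implementation and does not by itself show where the query enters the Rainbow network.

```python
import math


def bilinear_logits(queries, keys, W):
    """Return logits[i][j] = q_i^T W k_j for lists of vectors."""
    # First compute W k_j for every key, then take dot products with each query.
    Wk = [[sum(W[r][c] * k[c] for c in range(len(k))) for r in range(len(W))]
          for k in keys]
    return [[sum(q[r] * wk[r] for r in range(len(q))) for wk in Wk]
            for q in queries]


def info_nce_loss(queries, keys, W):
    """Mean cross-entropy where the positive for query i is key i."""
    logits = bilinear_logits(queries, keys, W)
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_norm = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_norm - row[i]
    return total / len(logits)


# Toy example (assumed values): 2-dimensional embeddings, identity W.
W = [[1.0, 0.0], [0.0, 1.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]
matched_keys = [[1.0, 0.0], [0.0, 1.0]]     # key i agrees with query i
mismatched_keys = [[0.0, 1.0], [1.0, 0.0]]  # keys swapped

loss_matched = info_nce_loss(queries, matched_keys, W)
loss_mismatched = info_nce_loss(queries, mismatched_keys, W)
print(loss_matched, loss_mismatched)
```

The loss is lower when each query is most similar to its own key, which is the signal that trains the encoder.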

About Frame skip

The original CURL paper says the Atari benchmark uses a frame skip of 4.

However, the default in this code is a frame skip of 0.

Is there something I am missing?
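
To make the term concrete: frame skip (action repeat) can be applied by the emulator itself or by a wrapper that repeats each chosen action for several frames and sums the rewards. The sketch below shows the wrapper variant on a toy environment. `CountingEnv` and `ActionRepeat` are hypothetical names, and whether this repository handles frame skip this way is an assumption that should be checked against its environment code.

```python
class CountingEnv:
    """Toy environment: every frame gives reward 1.0; ends after max_frames."""

    def __init__(self, max_frames=10):
        self.max_frames = max_frames
        self.frames = 0

    def step(self, action):
        self.frames += 1
        done = self.frames >= self.max_frames
        # The observation is just the frame counter in this toy example.
        return self.frames, 1.0, done


class ActionRepeat:
    """Repeat each action `repeat` times and accumulate the reward."""

    def __init__(self, env, repeat=4):
        self.env = env
        self.repeat = repeat

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:  # stop repeating once the episode has ended
                break
        return obs, total_reward, done


env = ActionRepeat(CountingEnv(max_frames=10), repeat=4)
results = [env.step(0) for _ in range(3)]
print(results)
```

With a wrapper like this, an emulator-level frame skip of 0 and an effective frame skip of 4 are not contradictory, because one agent step still advances four emulator frames.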

T-max problem

Why did you choose T-max=100k in your experiments? I think training should run until convergence. I ran your code with T-max=800k on Pong, and CURL and Rainbow showed the same sample efficiency. Even with T-max=100k, I failed to reproduce the experimental results in the paper.

About the evaluation choice

Hi Aravind,

I like your work very much! I have one question about the evaluation. When I run your code as-is on Battle Zone, the best performance sometimes occurs in the middle of training rather than at the end; the final performance can be worse and does not reach the scores reported in the paper. Did you report the best score seen during training, or only the score at the end of training? Thank you very much!
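
The two reporting conventions in this question can give quite different numbers. A minimal sketch, using a made-up evaluation history (the steps and scores below are illustrative placeholders, not results from this code or the paper) and a hypothetical helper `summarize`:

```python
def summarize(eval_history):
    """Compare best-so-far and final evaluation scores.

    eval_history is a list of (training_step, mean_eval_score) pairs in
    chronological order.
    """
    best_step, best_score = max(eval_history, key=lambda pair: pair[1])
    final_step, final_score = eval_history[-1]
    return {
        "best_step": best_step,
        "best_score": best_score,
        "final_step": final_step,
        "final_score": final_score,
    }


# Illustrative placeholder values only.
history = [(20000, 3000.0), (40000, 9000.0), (60000, 14000.0),
           (80000, 11000.0), (100000, 8000.0)]
summary = summarize(history)
print(summary)
```

Reporting the best score selects a checkpoint using the evaluation itself, so it tends to be more optimistic than the score at the final step.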

can't reproduce on pong and hero

I ran the code with the default parameters.

However, I can't reproduce the rewards reported in the paper for Pong and Hero. To be precise, I get about -20 versus the reported -16.5 on Pong, and about 3000 versus about 6000 on Hero.

Do you have any idea why this happened? I simply ran the command below, with the game set to either pong or hero:

python main.py --game pong/hero
