aravindsrinivas / curl_rainbow

License: MIT License

Python 99.00% Shell 1.00%

curl_rainbow's Introduction

CURL Rainbow

Status: Archive (code is provided as-is, no updates expected)

This is an implementation of CURL: Contrastive Unsupervised Representations for Reinforcement Learning, coupled with the Data-Efficient Rainbow method for Atari games. By default the code targets the 100k-timestep benchmark; it has not been tested in any other setting.

Run the following command (or bash run_curl.sh) with the game as an argument:

python3 main.py --game ms_pacman

To install all dependencies, run bash install.sh.
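
To launch the same command for several games, a small helper can build the argument lists. This is only a sketch: `build_commands` is a hypothetical name, and it uses nothing beyond the `--game` flag shown above. It prints the commands rather than running them.

```python
import shlex


def build_commands(games, script="main.py", python="python3"):
    """Return one argv list per game, using only the --game flag from the README."""
    return [[python, script, "--game", game] for game in games]


if __name__ == "__main__":
    for argv in build_commands(["ms_pacman", "pong"]):
        # Printed only; pass argv to subprocess.run(argv, check=True) to launch a run.
        print(shlex.join(argv))
```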

curl_rainbow's Issues

illegal memory access

When I run the code on a TITAN RTX with Python 3.6, CUDA 10.0.130, and PyTorch 1.4.0, I get the following error:
  File "curl_rainbow/agent.py", line 74, in act
    return (a * self.support).sum(2).argmax(1).item()
RuntimeError: CUDA error: an illegal memory access was encountered

The code runs fine on CPU, though. Does anyone know what is going wrong here?
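
For readers unfamiliar with the line in the traceback, here is a plain-Python sketch of what it appears to compute. It assumes `a` holds per-action probabilities over return atoms and `self.support` holds the atom values, as in distributional (C51-style) agents. `greedy_action` is a hypothetical name, and the sketch handles a single state instead of a batch. It does not diagnose the error. CUDA errors are reported asynchronously, so the fault may come from an earlier kernel than the line shown.

```python
def greedy_action(probs, support):
    """Pick the action with the highest expected return.

    probs[a][i] is the probability of atom i for action a;
    support[i] is the return value that atom i represents.
    """
    # Expected return per action: sum over atoms of probability * atom value.
    q_values = [
        sum(p * z for p, z in zip(action_probs, support))
        for action_probs in probs
    ]
    # Index of the largest expected return (first one wins on ties).
    return max(range(len(q_values)), key=q_values.__getitem__)


if __name__ == "__main__":
    support = [-1.0, 0.0, 1.0]
    probs = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
    print(greedy_action(probs, support))
```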

Where does the Query interact with the Rainbow DQN

Based on my understanding of the paper, the query-key pairs go into the contrastive learning objective, while the queries also go into the RL algorithm as observations. However, I cannot find where in the code the query is passed to the Rainbow DQN as the state/observation. Could you please point me to it?
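
As context for the question, here is a plain-Python sketch of the contrastive side only: an InfoNCE-style loss with the bilinear similarity q^T W k described in the CURL paper, where the positive key for query i is key i. The function names are hypothetical and this is not the repository's code. It does not show where the repository passes the query to the Q-network.

```python
import math


def bilinear_logits(queries, keys, W):
    """logits[i][j] = q_i^T W k_j for every query/key pair in the batch."""
    def matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    wk = [matvec(W, k) for k in keys]  # W k_j for each key
    return [[sum(a * b for a, b in zip(q, wkj)) for wkj in wk] for q in queries]


def info_nce_loss(queries, keys, W):
    """Mean cross-entropy over rows of the logits, with label i for row i."""
    logits = bilinear_logits(queries, keys, W)
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]  # -log softmax(row)[i]
    return total / len(logits)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    q = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce_loss(q, q, identity))        # matched pairs
    print(info_nce_loss(q, q[::-1], identity))  # mismatched pairs, higher loss
```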

About Frame skip

The original CURL paper says the Atari benchmark uses a frame skip of 4.

But in this code, the default frame skip is 0.

Is there something I am missing?
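
For reference, frame skip (action repeat) usually means applying the same action for several consecutive emulator frames and summing the rewards. The sketch below illustrates that with a hypothetical toy environment. `CountingEnv` and `step_with_frame_skip` are illustrative names, not part of this repository. A frame-skip setting can live either in the emulator configuration or in a wrapper like this one. Which applies here should be checked in the repository's environment code.

```python
class CountingEnv:
    """Hypothetical toy environment: reward 1.0 per frame, ends after `length` frames."""

    def __init__(self, length=10):
        self.length = length
        self.frames = 0

    def step(self, action):
        self.frames += 1
        done = self.frames >= self.length
        return self.frames, 1.0, done  # (observation, reward, done)


def step_with_frame_skip(env, action, skip=4):
    """Repeat `action` for up to `skip` frames, summing rewards; stop early if the episode ends."""
    total_reward = 0.0
    obs, done = None, False
    for _ in range(skip):
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done


if __name__ == "__main__":
    env = CountingEnv(length=10)
    done = False
    while not done:
        obs, reward, done = step_with_frame_skip(env, action=0, skip=4)
        print(obs, reward, done)
```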

T-max problem

Why did you choose T-max=100k in your experiments? I think you should train until convergence. I ran your code with T-max=800k on Pong, and CURL and Rainbow showed the same sample efficiency. Even with T-max=100k, I failed to reproduce the experimental results in the paper.

About the evaluation choice

Hi Aravind,

I like your work very much! I have one question about the evaluation. When I run your code directly on Battle Zone, the best performance sometimes occurs in the middle of training rather than at the end, and the final performance can be worse and fall short of the scores reported in the paper. Did you report the best score seen during training, or only the score at the end? Thank you very much!
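
The two reporting conventions in question can be made concrete with a short sketch. `summarize_evaluations` is a hypothetical helper, and the scores in the usage example are placeholders, not results from this code or the paper.

```python
def summarize_evaluations(scores):
    """Return (final, best) from evaluation scores ordered by training step."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return scores[-1], max(scores)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    final, best = summarize_evaluations([100.0, 450.0, 300.0])
    print("final:", final, "best:", best)
```

Reporting the best score picks the checkpoint using the evaluation itself, so it tends to look better than the final score, as the example shows.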

can't reproduce on pong and hero

I tried the code with the default parameters you set.

But I can't reproduce the rewards on Pong and Hero reported in the paper. To be precise, I get about -20 vs. -16.5 on Pong and about 3000 vs. about 6000 on Hero.

Do you have any idea why this happens? I simply used the command:

python main.py --game pong/hero
