Coder Social home page Coder Social logo

minimal_goexplore's Introduction

Minimal Go-Explore

A Python implementation of the Go-Explore exploration algorithm without domain knowledge.

Installation

The setup for retro gym is more lengthy than the standard gym. gym comes preinstalled with many Atari environments, but retro allows you to import any ROM file you'd like. Follow the installation and usage procedures according to these links:

  • Try threaded-atari.py instead of atari.py, it's far faster when using multithreading

Demonstration

Sample from random exploration with Go-Explore

Click the image above to redirect to a 2-minute YouTube video. It shows a rendering of three environments with a random agent exploring, then returning to previously-discovered states according to the procedure of Go-Explore. These are obviously not samples of a trained agent, rather merely the random exploration agent after a bit of exploration.

Results

  • The logging and performance appears to match that of the official implementation written by the authors of the paper. On the left is the highscore reached thus far and on the right is the latest cell discovered. The agent has exploited a bug in the emulator that enables it to collect reward in the treasure room forever.

EDIT (March 16, 2021)

Looking back, there were far more efficient ways to accomplish this. Here are some ways in that I would write it differently in hindsight:

  • For one, Python's built-in hash is far better for this purpose than anything from hashlib. This is because Python's hash() function is fast and can hash the bytes of a numpy array using hash(ndarray.tobytes())
  • Some code refactoring made it much easier to read
  • I realized that I partially messed up the cell function (it's supposed to be reduced to 8 possible values, I accidentally made it 32)
  • Using a defaultdict object made it easier to automatically create a new cell if one is found, and attempted to be accessed before having been encountered in prior.
  • There were a few other components of the algorithm's logic that I misinterpreted in the initial implementation, but resolved in the latest version.

Taking these points into account, I've added v2 implementations which produce better results, for they are a more complete algorithm of the algorithm.

EDIT (March 31, 2021)

Here's a summary of the results observed in one particular run of threaded-atari-v2.py on MontezumaRevengeDeterministic-v4 (not cherry-picked):

Iterations: 0, Cells: 0, Frames: 0, Max Reward: 0
Iterations: 4184, Cells: 226, Frames: 68537, Max Reward: 100
Iterations: 10437, Cells: 582, Frames: 182623, Max Reward: 400
Iterations: 11814, Cells: 1183, Frames: 214587, Max Reward: 500
Iterations: 13176, Cells: 1261, Frames: 244967, Max Reward: 2500
Iterations: 102822, Cells: 3788, Frames: 2250169, Max Reward: 2600
Iterations: 163037, Cells: 4325, Frames: 3689907, Max Reward: 3600
Iterations: 267517, Cells: 4610, Frames: 6138934, Max Reward: 3900
Iterations: 381069, Cells: 4745, Frames: 8804395, Max Reward: 4900
Iterations: 404446, Cells: 4860, Frames: 9350689, Max Reward: 5100
Iterations: 645493, Cells: 5026, Frames: 14911490, Max Reward: 6100
Iterations: 882479, Cells: 5122, Frames: 20315900, Max Reward: 7100
Iterations: 3157501, Cells: 6637, Frames: 70733548, Max Reward: 7500
Iterations: 3611170, Cells: 6971, Frames: 80542377, Max Reward: 8300
Iterations: 3616284, Cells: 6977, Frames: 80651268, Max Reward: 9300
Iterations: 3862820, Cells: 9243, Frames: 86131224, Max Reward: 9500
Iterations: 3898876, Cells: 9294, Frames: 86925993, Max Reward: 17500
Iterations: 3970799, Cells: 9988, Frames: 88518299, Max Reward: 18500
Iterations: 3983512, Cells: 10034, Frames: 88796112, Max Reward: 19500
Iterations: 4012575, Cells: 10116, Frames: 89429866, Max Reward: 20500
Iterations: 4126501, Cells: 10648, Frames: 91943176, Max Reward: 20600
Iterations: 4129528, Cells: 10673, Frames: 92009894, Max Reward: 20900
Iterations: 4129892, Cells: 10681, Frames: 92017125, Max Reward: 21900
Iterations: 4177504, Cells: 11282, Frames: 93073347, Max Reward: 22000
Iterations: 4190708, Cells: 11352, Frames: 93360557, Max Reward: 22900
Iterations: 4348374, Cells: 12594, Frames: 96771923, Max Reward: 23000
Iterations: 4366546, Cells: 13150, Frames: 97152306, Max Reward: 23200
Iterations: 4377667, Cells: 13416, Frames: 97380833, Max Reward: 24200
Iterations: 4378041, Cells: 13421, Frames: 97388109, Max Reward: 25200
Iterations: 4406958, Cells: 13796, Frames: 97992201, Max Reward: 25500
...
Iterations: 6939517, Cells: 20224, Frames: 150848162, Max Reward: 56500

EDIT (April 22, 2021)

The goexplore directory now contains a more user-friendly implementation with improved logging and overall functionality. This is ideal for experimentation, but note that the current version does not include multithreading.

EDIT (April 26, 2021)

The goexplore directory has been moved to a separate repository

minimal_goexplore's People

Contributors

ryanrudes avatar wx-b avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.