
nowke / wumpus-rl


Wumpus World Reinforcement Learning - Deep Q Network, based on OpenAI-Gym



Wumpus RL

The codebase consists of two parts:

  1. Wumpus World environment (gym-wumpus/) - compatible with OpenAI Gym

    This wraps the code from Project 3: Using Logic to Hunt the Wumpus

  2. DQN algorithm (dqn/) - for training an agent on the Wumpus World environment

Setup Python Environment

Create a new conda environment with Python 3.7

conda create -n wumpus-rl python=3.7
conda activate wumpus-rl

Install packages

pip install -r requirements.txt
pip install -e gym-wumpus

Using Wumpus Environment

  • The sample code uses the wumpus-v0 environment, which is defined inside the gym-wumpus/ folder.
  • This assumes you have already run the pip install -e gym-wumpus command, which installs the Wumpus World environment as a package in the active Python environment.

Sample usage

>>> import gym_wumpus   # To be imported before `gym`
>>> import gym

>>> env = gym.make('wumpus-v0') # Initialize wumpus environment
>>> env.reset()
array([1, 1, 0, 0, 0, 0, 0, 0], dtype=uint32)

>>> env.render()
Scores: <Explorer>=0
  0   1   2   3   4   5    time_step=0
|---|---|---|---|---|---|
| # | # | # | # | # | # | 5
|---|---|---|---|---|---|
| # |   |   |   |   | # | 4
|---|---|---|---|---|---|
| # | W | G | P |   | # | 3
|---|---|---|---|---|---|
| # |   |   |   |   | # | 2
|---|---|---|---|---|---|
| # | ^ |   | P |   | # | 1
|---|---|---|---|---|---|
| # | # | # | # | # | # | 0
|---|---|---|---|---|---|

>>> env.step(2)  # Forward action
(array([1, 2, 0, 1, 0, 0, 0, 0], dtype=uint32), -1, False, {})

>>> env.render()
Scores: <Explorer>=-1
  0   1   2   3   4   5    time_step=1
|---|---|---|---|---|---|
| # | # | # | # | # | # | 5
|---|---|---|---|---|---|
| # |   |   |   |   | # | 4
|---|---|---|---|---|---|
| # | W | G | P |   | # | 3
|---|---|---|---|---|---|
| # | ^ |   |   |   | # | 2
|---|---|---|---|---|---|
| # |   |   | P |   | # | 1
|---|---|---|---|---|---|
| # | # | # | # | # | # | 0
|---|---|---|---|---|---|
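
The reset/step calls above can be wrapped into a small episode loop. The following is a minimal sketch, not part of the repository: `run_episode` and `make_random_policy` are hypothetical helper names, and the loop assumes the classic Gym API shown above, where `env.step` returns `(observation, reward, done, info)`. The default of 7 actions is taken from the `actions=7` argument in the DQN agent settings; the random policy is only a placeholder for a trained agent.

```python
import random


def run_episode(env, policy, max_steps=100):
    """Run one episode and return (total_reward, steps_taken).

    Assumes the classic Gym API, where `env.step(action)` returns
    `(observation, reward, done, info)`.
    """
    observation = env.reset()
    total_reward = 0
    steps = 0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


def make_random_policy(n_actions=7, seed=None):
    """Return a policy that ignores the observation and picks a random action.

    n_actions=7 is an assumption based on `actions=7` in the DQN settings.
    """
    rng = random.Random(seed)
    return lambda observation: rng.randrange(n_actions)
```

With the package installed, `run_episode(gym.make('wumpus-v0'), make_random_policy())` would play one random episode, capped at `max_steps` steps.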

Modifying gym-wumpus environment

Once modified, you can use the new environment in two ways:

  1. Reinstall the gym-wumpus package from the root of the repository
pip install -e gym-wumpus
  2. Use the environment directly from the folder (no need to reinstall after every change)
import sys
import gym
sys.path.append('gym-wumpus')  
# NOTE: This assumes you are running this in root of repository
# You can give relative paths or absolute path to `gym-wumpus` folder

from gym_wumpus.envs import WumpusWorld

env = WumpusWorld()

# You can use `env` object just as a regular `gym` environment
env.render()

# Pass `rgb_array` to the `render` method to get numpy array
# of the rendered image --> this can be used to generate GIFs
np_arr_img = env.render('rgb_array')
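
The `rgb_array` frames mentioned above can be collected over a sequence of actions and written out as a GIF. This is a hedged sketch rather than code from the repository: `collect_frames` and `save_gif` are hypothetical helpers, Pillow is an extra dependency assumed to be installed, and the frames are assumed to be `uint8` arrays of shape (height, width, 3).

```python
def collect_frames(env, actions):
    """Render one frame before the first action and one after each step.

    The caller is expected to have reset `env` already. Stops early when
    the environment reports `done`.
    """
    frames = [env.render('rgb_array')]
    for action in actions:
        _obs, _reward, done, _info = env.step(action)
        frames.append(env.render('rgb_array'))
        if done:
            break
    return frames


def save_gif(frames, path, ms_per_frame=500):
    """Write the frames to an animated GIF using Pillow.

    Assumes each frame is a uint8 numpy array of shape (H, W, 3).
    """
    from PIL import Image  # imported lazily so collect_frames has no dependencies

    images = [Image.fromarray(frame) for frame in frames]
    images[0].save(
        path,
        save_all=True,              # write every frame, not just the first
        append_images=images[1:],
        duration=ms_per_frame,      # display time per frame in milliseconds
        loop=0,                     # loop forever
    )
```

For example, after `env.reset()`, calling `save_gif(collect_frames(env, [2, 2]), 'episode.gif')` would record two Forward actions.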

Using DQN code

Files

Running code

  • Pass the env_id (for example, wumpus-v0) to the setup.sh script
  • Set the ENV_NAME variable in wumpus_dqn.py
  • Run the dqn/wumpus_dqn.py file
cd dqn
./setup.sh wumpus-v0
python wumpus_dqn.py

Run clean.sh to clear out the generated logs, models, and tests

./clean.sh

Hyperparameters

Change the hyperparameters inside the dqn/wumpus_dqn.py file:

...

EPISODES = 35000
...

agent = Agent(learning_rate=0.01, gamma=0.95,
              state_shape=env.observation_space.shape, actions=7,
              batch_size=64,
              epsilon_initial=0.9, epsilon_decay=1e-6, epsilon_final=0.01,
              replay_buffer_capacity=1000000,
              ...)

...
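
To get a feel for the three epsilon settings, it can help to compute how long exploration lasts. The README does not say how `Agent` applies `epsilon_decay`, so the sketch below assumes a linear schedule that subtracts `epsilon_decay` once per training step until `epsilon_final` is reached; if the agent decays per episode or multiplicatively, the numbers will differ. Both function names are hypothetical.

```python
def linear_epsilon(step, initial=0.9, decay=1e-6, final=0.01):
    """Epsilon after `step` updates under an ASSUMED linear schedule."""
    return max(final, initial - decay * step)


def steps_until_final(initial=0.9, decay=1e-6, final=0.01):
    """Number of updates before epsilon hits its floor (same assumption)."""
    return (initial - final) / decay


if __name__ == "__main__":
    for step in (0, 250_000, 500_000, 1_000_000):
        print(step, round(linear_epsilon(step), 4))
    print("floor reached after about", round(steps_until_final()), "steps")
```

Under this assumption the default values keep the agent exploring for roughly 890,000 steps, which is worth comparing against the total number of steps that `EPISODES` produces.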

Testing an existing model

Test any environment using an existing pretrained model by running python test_wumpus_dqn.py <env_name>. For example,

python test_wumpus.py wumpus-noise-v0

List of environments

Environment                   Noise (%)   Modified reward function   Grid #
wumpus-v0                     0           Yes                        1
wumpus-nr-v0                  0           No                         1
wumpus-noise2-v0              10          Yes                        1
wumpus-nr-noise2-v0           10          No                         1
wumpus-noise-v0               20          Yes                        1
wumpus-nr-noise-v0            20          No                         1
wumpus-l4x4_1-v0              0           Yes                        2
wumpus-l4x4_1-nr-v0           0           No                         2
wumpus-l4x4_1-noise2-v0       10          Yes                        2
wumpus-l4x4_1-nr-noise2-v0    10          No                         2
wumpus-l4x4_1-noise-v0        20          Yes                        2
wumpus-l4x4_1-nr-noise-v0     20          No                         2
wumpus-l4x4_2-v0              0           Yes                        3
wumpus-l4x4_2-nr-v0           0           No                         3
wumpus-l4x4_2-noise2-v0       10          Yes                        3
wumpus-l4x4_2-nr-noise2-v0    10          No                         3
wumpus-l4x4_2-noise-v0        20          Yes                        3
wumpus-l4x4_2-nr-noise-v0     20          No                         3
wumpus-l5x5_1-v0              0           Yes                        4
wumpus-l5x5_1-nr-v0           0           No                         4
wumpus-l5x5_1-noise2-v0       10          Yes                        4
wumpus-l5x5_1-nr-noise2-v0    10          No                         4
wumpus-l5x5_1-noise-v0        20          Yes                        4
wumpus-l5x5_1-nr-noise-v0     20          No                         4

       GRID #1                            GRID #2
  0   1   2   3   4   5            0   1   2   3   4   5
|---|---|---|---|---|---|        |---|---|---|---|---|---|
| # | # | # | # | # | # | 5      | # | # | # | # | # | # | 5
|---|---|---|---|---|---|        |---|---|---|---|---|---|
| # |   |   |   |   | # | 4      | # |   |   |   |   | # | 4
|---|---|---|---|---|---|        |---|---|---|---|---|---|
| # | W | G | P |   | # | 3      | # | W | G |   |   | # | 3
|---|---|---|---|---|---|        |---|---|---|---|---|---|
| # |   |   |   |   | # | 2      | # |   | P |   |   | # | 2
|---|---|---|---|---|---|        |---|---|---|---|---|---|
| # | ^ |   | P |   | # | 1      | # | ^ |   | P |   | # | 1
|---|---|---|---|---|---|        |---|---|---|---|---|---|
| # | # | # | # | # | # | 0      | # | # | # | # | # | # | 0
|---|---|---|---|---|---|        |---|---|---|---|---|---|

       GRID #3                           GRID #4

  0   1   2   3   4   5          0   1   2   3   4   5   6
|---|---|---|---|---|---|        |---|---|---|---|---|---|---|
| # | # | # | # | # | # | 5      | # | # | # | # | # | # | # | 6  
|---|---|---|---|---|---|        |---|---|---|---|---|---|---|
| # |   |   | W | G | # | 4      | # |   | G |   |   |   | # | 5  
|---|---|---|---|---|---|        |---|---|---|---|---|---|---|
| # |   |   | P |   | # | 3      | # |   |   | W |   |   | # | 4
|---|---|---|---|---|---|        |---|---|---|---|---|---|---|
| # |   |   |   |   | # | 2      | # |   |   | P |   |   | # | 3  
|---|---|---|---|---|---|        |---|---|---|---|---|---|---|
| # | ^ |   |   | P | # | 1      | # |   |   |   |   |   | # | 2  
|---|---|---|---|---|---|        |---|---|---|---|---|---|---|
| # | # | # | # | # | # | 0      | # | ^ |   |   | P |   | # | 1  
|---|---|---|---|---|---|        |---|---|---|---|---|---|---|
                                 | # | # | # | # | # | # | # | 0
                                 |---|---|---|---|---|---|---|  
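
The environment IDs in the list above follow a regular pattern: an optional layout name, `nr` when the modified reward function is off, and `noise2` or `noise` for 10% or 20% noise. The helper below is a small sketch that reproduces the listed names from those three properties. `build_env_id` and the two lookup tables are hypothetical and inferred only from the table; they are not part of gym-wumpus.

```python
# Layout segment used in the ID for each grid number (grid 1 has none).
LAYOUT_BY_GRID = {1: "", 2: "l4x4_1", 3: "l4x4_2", 4: "l5x5_1"}

# Noise segment used in the ID for each noise percentage.
NOISE_SEGMENT = {0: "", 10: "noise2", 20: "noise"}


def build_env_id(grid=1, noise_percent=0, modified_reward=True):
    """Compose an environment ID matching the naming pattern in the table."""
    parts = ["wumpus"]
    if LAYOUT_BY_GRID[grid]:
        parts.append(LAYOUT_BY_GRID[grid])
    if not modified_reward:
        parts.append("nr")  # "nr" marks the unmodified reward function
    if NOISE_SEGMENT[noise_percent]:
        parts.append(NOISE_SEGMENT[noise_percent])
    parts.append("v0")
    return "-".join(parts)


if __name__ == "__main__":
    print(build_env_id())                    # wumpus-v0
    print(build_env_id(2, 10, False))        # wumpus-l4x4_1-nr-noise2-v0
    print(build_env_id(4, 20, True))         # wumpus-l5x5_1-noise-v0
```

A grid number or noise level outside the table raises `KeyError`, which keeps the helper from producing IDs that are not listed.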

Tensorboard

During training, logs are written to the dqn/logs/ folder. To view them in TensorBoard, run the command below from the root of the repository:

tensorboard --logdir dqn/logs/

