uvipen / super-mario-bros-a3c-pytorch

Asynchronous Advantage Actor-Critic (A3C) algorithm for Super Mario Bros

License: MIT License

Python 100.00%
reinforcement-learning a3c pytorch gym python deep-learning super-mario-bros mario ai

super-mario-bros-a3c-pytorch's Introduction

[PYTORCH] Asynchronous Advantage Actor-Critic (A3C) for playing Super Mario Bros

Introduction

Here is my Python source code for training an agent to play Super Mario Bros, using the Asynchronous Advantage Actor-Critic (A3C) algorithm introduced in the paper Asynchronous Methods for Deep Reinforcement Learning.






Sample results

Motivation

Before I implemented this project, there were already several repositories reproducing the paper's results quite well in the common deep learning frameworks, such as TensorFlow, Keras and PyTorch. In my opinion, most of them are great. However, they tend to be overly complicated in many places, including image pre-processing, environment setup and weight initialization, which distracts the reader from the more important matters. Therefore, I decided to write cleaner code that simplifies the unimportant parts while still following the paper strictly. As you can see, with a minimal setup and a simple network initialization, as long as you implement the algorithm correctly, the agent will teach itself how to interact with the environment and gradually find its way to the final goal.

Explanation in layman's terms

If you are already familiar with reinforcement learning in general and A3C in particular, you can skip this part. I wrote it to explain what the A3C algorithm is, and how and why it works, to people who are interested in or curious about A3C or my implementation but do not yet understand the mechanism behind it. You therefore do not need any prerequisite knowledge to read this part ☺️

If you search the internet, you will find numerous articles introducing or explaining A3C, some of which even provide sample code. However, I would like to take a different approach: break the name Asynchronous Advantage Actor-Critic down into its parts and explain each of them, then put them back together.

Actor-Critic

Your agent has two parts, called the actor and the critic, and its goal is to make both of them perform better over time by exploring and exploiting the environment. Imagine a small, mischievous kid (the actor) discovering the amazing world around him, while his dad (the critic) watches over him to make sure he does not do anything dangerous. Whenever the kid does something good, his dad praises him and encourages him to repeat that action in the future; of course, when the kid does something harmful, he gets a warning from his dad. The more the kid interacts with the world and tries different actions, the more feedback, both positive and negative, he gets from his dad. The kid's goal is to collect as much positive feedback as possible, while the dad's goal is to evaluate his son's actions better and better. In other words, we have a win-win relationship between the kid and his dad, or equivalently between the actor and the critic.
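In code, the actor and the critic are usually two heads on one shared network: the actor outputs a probability for each action and the critic outputs a single score for the current state. The sketch below is only illustrative; the layer types and sizes are not the ones used in this repository, which uses a convolutional network over game frames:

    import torch.nn as nn
    import torch.nn.functional as F

    class ActorCritic(nn.Module):
        def __init__(self, num_inputs, num_actions):
            super().__init__()
            self.shared = nn.Linear(num_inputs, 128)  # shared feature extractor
            self.actor = nn.Linear(128, num_actions)  # "kid": picks the next action
            self.critic = nn.Linear(128, 1)           # "dad": scores how good the state is

        def forward(self, x):
            h = F.relu(self.shared(x))
            policy = F.softmax(self.actor(h), dim=-1)  # probability of each action
            value = self.critic(h)                     # estimated value of the state
            return policy, value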

Advantage Actor-Critic

To make the kid learn faster and more stably, the dad, instead of telling his son how good an action is, tells him how much better or worse it is compared to other actions (or to a "virtual" average action). An example is worth a thousand words. Let's compare two father-and-son pairs. The first dad gives his son 10 candies for a grade of 10 at school and 1 candy for a grade of 1. The second dad gives his son 5 candies for a grade of 10, and "punishes" a grade of 1 by not letting him watch his favorite TV series for a day. What do you think? The second dad seems a little smarter, right? Indeed, you can rarely discourage bad actions if you still "encourage" them with a small reward.
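Concretely, the dad's "how much better or worse" signal is called the advantage: the discounted return the actor actually collected minus the value the critic predicted for that state. A toy calculation with made-up numbers:

    # Rewards observed after taking an action, and the discount factor
    rewards = [1.0, 0.0, 2.0]
    gamma = 0.9

    # Discounted return actually obtained: 1.0 + 0.9*0.0 + 0.81*2.0 = 2.62
    R = sum(gamma ** t * r for t, r in enumerate(rewards))

    value_estimate = 2.0            # what the critic expected from this state
    advantage = R - value_estimate  # 0.62 > 0: better than expected, so reinforce the action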

Asynchronous Advantage Actor-Critic

If a single agent explores the environment on its own, the learning process is slow. Worse, the agent can become biased towards a particular suboptimal solution, which is undesirable. What happens if you have a group of agents that simultaneously explore different parts of the environment and periodically share their newly acquired knowledge with one another? That is exactly the idea behind Asynchronous Advantage Actor-Critic. Now the kid and his kindergarten friends go on a trip to a beautiful beach (with their teacher, of course). Their task is to build a great sand castle. Each child builds a different part of the castle, supervised by the teacher; they have different tasks but the same final goal: a strong and eye-catching castle. The teacher now plays the same role as the dad in the previous example, except that she is considerably busier 😅
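In PyTorch, the "group of agents" is a set of worker processes that all read from and write to one global model kept in shared memory. Below is a simplified structural sketch, assuming a hypothetical worker() training loop; the actual logic lives in train.py and src/process.py:

    import torch.nn as nn
    import torch.multiprocessing as mp

    def worker(rank, global_model):
        # Each worker keeps its own local copy, explores its own episodes,
        # and periodically syncs weights with / pushes gradients to global_model.
        local_model = nn.Linear(4, 2)  # placeholder network for the sketch
        local_model.load_state_dict(global_model.state_dict())
        # ... rollout and update loop would go here ...

    if __name__ == "__main__":
        global_model = nn.Linear(4, 2)  # placeholder network for the sketch
        global_model.share_memory()     # keep the weights in shared memory
        processes = [mp.Process(target=worker, args=(rank, global_model))
                     for rank in range(6)]
        for p in processes:
            p.start()
        for p in processes:
            p.join()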

How to use my code

With my code, you can do the following (example invocations follow the list):

  • Train your model by running python train.py
  • Test your trained model by running python test.py
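Both scripts also take command-line options. Judging only from option names that appear elsewhere on this page (world, stage, action_type, num_processes), a run probably looks roughly like the lines below; the exact flag names and accepted values should be checked with python train.py --help:

    python train.py --world 1 --stage 1 --num_processes 6
    python test.py --world 1 --stage 1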

Trained models

You can find some models I have trained in Super Mario Bros A3C trained models.

Requirements

  • python 3.6
  • gym
  • cv2
  • pytorch
  • numpy

Acknowledgements

At the beginning, I could only train my agent to complete 9 stages. Then @davincibj pointed out that 19 stages could be completed and sent me the trained weights. Thanks a lot for the finding!

super-mario-bros-a3c-pytorch's People

Contributors

uvipen


super-mario-bros-a3c-pytorch's Issues

implementation of optimizer

Sorry, I searched on Google but I am still confused about the GlobalAdam optimizer. Can someone help me?

        for group in self.param_groups:                             
            for p in group['params']:
                state = self.state[p]
                state['step'] = 0
                state['exp_avg'] = torch.zeros_like(p.data)
                state['exp_avg_sq'] = torch.zeros_like(p.data)

                state['exp_avg'].share_memory_()
                state['exp_avg_sq'].share_memory_()
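For context: the snippet above initializes Adam's running statistics (exp_avg, exp_avg_sq) for every parameter up front and moves them into shared memory, so all A3C worker processes update one shared optimizer state instead of each keeping its own. A GlobalAdam of this kind is typically just a thin subclass of torch.optim.Adam; here is a sketch along those lines, not necessarily identical to the repository's class:

    import torch

    class GlobalAdam(torch.optim.Adam):
        def __init__(self, params, lr=1e-4):
            super().__init__(params, lr=lr)
            for group in self.param_groups:
                for p in group['params']:
                    state = self.state[p]
                    # Pre-create Adam's per-parameter state instead of creating it lazily
                    state['step'] = 0
                    state['exp_avg'] = torch.zeros_like(p.data)
                    state['exp_avg_sq'] = torch.zeros_like(p.data)
                    # Put the moment estimates in shared memory so every worker
                    # process updates the same buffers
                    state['exp_avg'].share_memory_()
                    state['exp_avg_sq'].share_memory_()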

RuntimeError: cuda runtime error (801)

Hi, when I try to run this code using CUDA I always get this error. It works fine on the CPU, and other programs work fine using CUDA. I searched online but did not find anything useful.
Here's the entire Traceback:
THCudaCheck FAIL file=..\torch/csrc/generic/StorageSharing.cpp line=253 error=801 : operation not supported
Traceback (most recent call last):
File ".\train.py", line 84, in
train(opt)
File ".\train.py", line 73, in train
process.start()
File "C:\Users\Aarush\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\Aarush\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "C:\Users\Aarush\AppData\Local\Programs\Python\Python38\lib\multiprocessing\popen_spawn_win32.py", line 93, in init
reduction.dump(process_obj, to_child)
File "C:\Users\Aarush\AppData\Local\Programs\Python\Python38\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
File "C:\Users\Aarush\Documents\Visual Studio Code\Maching_Learning\machine-learning-env38\lib\site-packages\torch\multiprocessing\reductions.py", line 240, in reduce_tensor
event_sync_required) = storage.share_cuda()
RuntimeError: cuda runtime error (801) : operation not supported at ..\torch/csrc/generic/StorageSharing.cpp:253
(machine-learning-env38) PS C:\Users\Aarush\Documents\Visual Studio Code\Maching_Learning\Super-mario-bros-A3C-pytorch> Traceback (most recent call last):
File "", line 1, in
File "C:\Users\Aarush\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Users\Aarush\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

Training never ends?

Hi,
When I train with your code (my global steps option is 5k), the log shows:

Process 0. Episode 99
Training process 5 terminated
The code runs for 95.45s
Training process 1 terminated
Training process 3 terminated
Training process 0 terminated
Training process 2 terminated
Training process 4 terminated

and it never stops... Ctrl+C is the only way to stop the training.
Did I miss something?

Higher resolution video

I would like higher-resolution videos because the quality of the output is too low. Is there any way to output video at a higher resolution?

while loop in local test

Hi, I am wondering why there is no break statement in local_test in this script: https://github.com/uvipen/Super-mario-bros-A3C-pytorch/blob/master/src/process.py. It seems like the testing loop will run forever. How do I terminate the local test?

EOFError after training Process 0. Episode 99999

Training process 0 terminated
The code runs for 80688.54 s
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/anaconda3/envs/mario/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/anaconda3/envs/mario/lib/python3.6/site-packages/tensorboardX/event_file_writer.py", line 202, in run
data = self._queue.get(True, queue_wait_duration)
File "/home/anaconda3/envs/mario/lib/python3.6/multiprocessing/queues.py", line 108, in get
res = self._recv_bytes()
File "/home/anaconda3/envs/mario/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/home/anaconda3/envs/mario/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/anaconda3/envs/mario/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError

Does anyone know what's wrong with this? How can I solve it?

Is there any way to speed up the training?

We installed this project on our server, which has several 4090s, but the training process is very slow. Is there any way to speed it up? We used the original configuration (6 num_processes) to train stage 1-1.

I have installed gym but it says I also need to install gym_super_mario_bros; I ran "pip install gym_super_mario_bros" but it failed with the output below

Collecting gym_super_mario_bros
Using cached gym_super_mario_bros-7.4.0-py3-none-any.whl (199 kB)
Collecting nes-py>=8.1.4
Using cached nes_py-8.2.1.tar.gz (77 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: gym>=0.17.2 in d:\anaconda\lib\site-packages (from nes-py>=8.1.4->gym_super_mario_bros) (0.26.2)
Requirement already satisfied: numpy>=1.18.5 in d:\anaconda\lib\site-packages (from nes-py>=8.1.4->gym_super_mario_bros) (1.21.5)
Requirement already satisfied: pyglet<=1.5.21,>=1.4.0 in d:\anaconda\lib\site-packages (from nes-py>=8.1.4->gym_super_mario_bros) (1.5.21)
Requirement already satisfied: tqdm>=4.48.2 in d:\anaconda\lib\site-packages (from nes-py>=8.1.4->gym_super_mario_bros) (4.64.1)
Requirement already satisfied: cloudpickle>=1.2.0 in d:\anaconda\lib\site-packages (from gym>=0.17.2->nes-py>=8.1.4->gym_super_mario_bros) (2.0.0)
Requirement already satisfied: gym-notices>=0.0.4 in d:\anaconda\lib\site-packages (from gym>=0.17.2->nes-py>=8.1.4->gym_super_mario_bros) (0.0.8)
Requirement already satisfied: importlib-metadata>=4.8.0 in d:\anaconda\lib\site-packages (from gym>=0.17.2->nes-py>=8.1.4->gym_super_mario_bros) (4.11.3)
Requirement already satisfied: colorama in d:\anaconda\lib\site-packages (from tqdm>=4.48.2->nes-py>=8.1.4->gym_super_mario_bros) (0.4.5)
Requirement already satisfied: zipp>=0.5 in d:\anaconda\lib\site-packages (from importlib-metadata>=4.8.0->gym>=0.17.2->nes-py>=8.1.4->gym_super_mario_bros) (3.8.0)
Building wheels for collected packages: nes-py
Building wheel for nes-py (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [21 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-39
creating build\lib.win-amd64-cpython-39\nes_py
copying nes_py\nes_env.py -> build\lib.win-amd64-cpython-39\nes_py
copying nes_py\_image_viewer.py -> build\lib.win-amd64-cpython-39\nes_py
copying nes_py\_rom.py -> build\lib.win-amd64-cpython-39\nes_py
copying nes_py\__init__.py -> build\lib.win-amd64-cpython-39\nes_py
creating build\lib.win-amd64-cpython-39\nes_py\app
copying nes_py\app\cli.py -> build\lib.win-amd64-cpython-39\nes_py\app
copying nes_py\app\play_human.py -> build\lib.win-amd64-cpython-39\nes_py\app
copying nes_py\app\play_random.py -> build\lib.win-amd64-cpython-39\nes_py\app
copying nes_py\app\__init__.py -> build\lib.win-amd64-cpython-39\nes_py\app
creating build\lib.win-amd64-cpython-39\nes_py\wrappers
copying nes_py\wrappers\joypad_space.py -> build\lib.win-amd64-cpython-39\nes_py\wrappers
copying nes_py\wrappers\__init__.py -> build\lib.win-amd64-cpython-39\nes_py\wrappers
running build_ext
building 'nes_py.lib_nes_env' extension
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for nes-py
Running setup.py clean for nes-py
Failed to build nes-py
Installing collected packages: nes-py, gym_super_mario_bros
Running setup.py install for nes-py ... error
error: subprocess-exited-with-error

× Running setup.py install for nes-py did not run successfully.
│ exit code: 1
╰─> [23 lines of output]
running install
D:\Anaconda\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-39
creating build\lib.win-amd64-cpython-39\nes_py
copying nes_py\nes_env.py -> build\lib.win-amd64-cpython-39\nes_py
copying nes_py\_image_viewer.py -> build\lib.win-amd64-cpython-39\nes_py
copying nes_py\_rom.py -> build\lib.win-amd64-cpython-39\nes_py
copying nes_py\__init__.py -> build\lib.win-amd64-cpython-39\nes_py
creating build\lib.win-amd64-cpython-39\nes_py\app
copying nes_py\app\cli.py -> build\lib.win-amd64-cpython-39\nes_py\app
copying nes_py\app\play_human.py -> build\lib.win-amd64-cpython-39\nes_py\app
copying nes_py\app\play_random.py -> build\lib.win-amd64-cpython-39\nes_py\app
copying nes_py\app\__init__.py -> build\lib.win-amd64-cpython-39\nes_py\app
creating build\lib.win-amd64-cpython-39\nes_py\wrappers
copying nes_py\wrappers\joypad_space.py -> build\lib.win-amd64-cpython-39\nes_py\wrappers
copying nes_py\wrappers\__init__.py -> build\lib.win-amd64-cpython-39\nes_py\wrappers
running build_ext
building 'nes_py.lib_nes_env' extension
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> nes-py

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

AttributeError: 'Monitor' object has no attribute 'pipe'

Hello, I have an error when I run test.py.

python3 test.py
Traceback (most recent call last):
  File "test.py", line 68, in <module>
    test(opt)
  File "test.py", line 58, in test
    state, reward, done, info = env.step(action)
  File "/home/pluto/Super-mario-bros-A3C-pytorch/src/env.py", line 76, in step
    state, reward, done, info = self.env.step(action)
  File "/home/pluto/Super-mario-bros-A3C-pytorch/src/env.py", line 51, in step
    self.monitor.record(state)
  File "/home/pluto/Super-mario-bros-A3C-pytorch/src/env.py", line 26, in record
    self.pipe.stdin.write(image_array.tostring())
AttributeError: 'Monitor' object has no attribute 'pipe'
The error appears in the env.py file.

    try:
        self.pipe = sp.Popen(self.command, stdin=sp.PIPE, stderr=sp.PIPE)
    except FileNotFoundError:
        pass

    def record(self, image_array):
        self.pipe.stdin.write(image_array.tostring())

I tried to modify the code but it didn't work. Can you help me?
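For what it's worth, the snippet above suggests what goes wrong: when the external encoder launched via self.command (presumably ffmpeg) is not installed, the except FileNotFoundError: pass branch silently skips creating self.pipe, and record() then fails with exactly this AttributeError. Installing the encoder should fix it; alternatively, a defensive patch could look roughly like the hypothetical sketch below, which is not the repository's own fix:

    import subprocess as sp

    class Monitor:
        def __init__(self, command):
            self.command = command
            self.pipe = None  # make sure the attribute always exists
            try:
                self.pipe = sp.Popen(self.command, stdin=sp.PIPE, stderr=sp.PIPE)
            except FileNotFoundError:
                print("Video encoder not found; recording is disabled.")

        def record(self, image_array):
            if self.pipe is not None:  # skip recording instead of crashing
                self.pipe.stdin.write(image_array.tobytes())  # tobytes() is the non-deprecated tostring()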

How can I change the frame rate of the monitor?

Hi, firstly, thanks for your beautiful code. I have one tiny question: I do not know how to change the frame rate of the Mario monitor. Currently it is too high to watch Mario's movements clearly, so I want to slow it down, but I do not know where in the code I should make the change.

Looking forward to your reply :)

I have got one error

I ran train.py but it failed; the terminal gives:
ImportError: cannot import name 'BinarySpaceToDiscreteSpaceEnv'. How can I fix this error?
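For reference, recent versions of nes-py removed BinarySpaceToDiscreteSpaceEnv and replaced it with JoypadSpace, which plays the same role. Either pin an older nes-py release or swap the wrapper; an untested sketch of the swap:

    # Old import, no longer available in newer nes-py releases:
    # from nes_py.wrappers import BinarySpaceToDiscreteSpaceEnv
    from nes_py.wrappers import JoypadSpace

    import gym_super_mario_bros
    from gym_super_mario_bros.actions import SIMPLE_MOVEMENT

    env = gym_super_mario_bros.make("SuperMarioBros-1-1-v0")
    env = JoypadSpace(env, SIMPLE_MOVEMENT)  # replaces BinarySpaceToDiscreteSpaceEnv(env, SIMPLE_MOVEMENT)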

num_states = 210 (Atari environment) indicates you need 210 frames to propagate, seems infeasible?

env, num_states, num_actions = create_train_env(opt.world, opt.stage, opt.action_type)

In the first convolution layer in model.py, you specify the input channels as num_inputs. After digging through your repo, I see you're calling create_train_env() in src/env.py. For Atari environments (breakout/pong/space-force), the observation space is (210, 160, 3). This means you're picking up 210 frames before training the actual agent.
Doesn't this result in a huge training error initially, since the number of channels is significantly larger than what most A3C papers indicate (maybe 2 or 3)?
What's your justification for such a high channel count?
