minaek / ppo-pytorch

This project forked from nikhilbarhate99/ppo-pytorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch

License: MIT License

Python 100.00%

PPO-PyTorch

UPDATE [April 2021] :

  • merged discrete and continuous algorithms
  • added linear decay of action_std for continuous action spaces, to make training more stable in complex environments
  • added different learning rates for actor and critic
  • episodes, timesteps and rewards are now logged in .csv files
  • utils to plot graphs from log files
  • utils to test and make gifs from preTrained networks
  • PPO_colab.ipynb combines all the files to train / test / plot graphs / make gifs on Google Colab in a single Jupyter notebook
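The linear decay of action_std mentioned in the list above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name, the starting value, the decay step and the floor are all assumptions chosen for the example.

```python
def decay_action_std(action_std, decay_rate, min_action_std):
    """Subtract a fixed step from action_std, never going below min_action_std."""
    # Rounding keeps repeated float subtraction from accumulating noise.
    new_std = round(action_std - decay_rate, 4)
    return max(new_std, min_action_std)


# Hypothetical schedule: start at 0.6, subtract 0.05 per decay step, floor at 0.1.
std = 0.6
schedule = [std]
for _ in range(12):
    std = decay_action_std(std, decay_rate=0.05, min_action_std=0.1)
    schedule.append(std)
```

In a training loop, a step like this would typically be triggered every fixed number of timesteps, so that exploration noise shrinks as the policy improves.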

Introduction

This repository provides a minimal PyTorch implementation of Proximal Policy Optimization (PPO) with a clipped objective for OpenAI gym environments. It is intended primarily for beginners in reinforcement learning who want to understand the PPO algorithm. It can still be used for complex environments, but may require some hyperparameter tuning or changes to the code.
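For readers new to the clipped objective, here is a minimal sketch of the policy term only. It is written in plain Python rather than PyTorch so that it runs without dependencies, and it is not taken from this repository: the function name and the default eps_clip of 0.2 are assumptions for illustration.

```python
import math


def clipped_surrogate_loss(new_logprobs, old_logprobs, advantages, eps_clip=0.2):
    """Negative PPO clipped surrogate objective, averaged over samples."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps_clip, 1 + eps_clip].
        clipped_ratio = max(1.0 - eps_clip, min(ratio, 1.0 + eps_clip))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    # Negated because optimizers minimize, while the objective is maximized.
    return -total / len(advantages)


# Ratio of 1.5 with a positive advantage: the gain is capped at 1.2 * advantage.
loss_capped = clipped_surrogate_loss([math.log(1.5)], [0.0], [1.0])
# Ratio of 1.5 with a negative advantage: the unclipped (worse) term is kept.
loss_uncapped = clipped_surrogate_loss([math.log(1.5)], [0.0], [-1.0])
```

The clipping removes the incentive to move the new policy far from the old one in a single update, which is what lets PPO reuse a batch of experience for several gradient steps.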

To keep the training procedure simple :

  • For continuous environments, the standard deviation of the output action distribution (a multivariate normal with a diagonal covariance matrix) is a hyperparameter and NOT a trainable parameter. It is not learned, but it is linearly decayed during training. (action_std significantly affects performance)
  • It uses a simple Monte Carlo estimate for calculating advantages and NOT Generalized Advantage Estimation (check out the OpenAI Spinning Up implementation for that).
  • It is a single-threaded implementation, i.e. only one worker collects experience. One of the older forks of this repository has been modified to use parallel workers.
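The Monte Carlo advantage estimate mentioned in the list above can be sketched as below. This is a plain-Python illustration under stated assumptions, not the repository's code: the function names, the is_terminals flags marking episode ends, and the gamma value are hypothetical. Any normalization of the returns is left out.

```python
def discounted_returns(rewards, is_terminals, gamma=0.99):
    """Discounted reward-to-go for each step, reset at episode boundaries."""
    returns = []
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for reward, done in zip(reversed(rewards), reversed(is_terminals)):
        if done:
            running = 0.0  # nothing after a terminal step belongs to this episode
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def monte_carlo_advantages(returns, values):
    """Advantage = observed return minus the critic's value estimate."""
    return [g - v for g, v in zip(returns, values)]


# Two episodes in one buffer: the first ends at index 2, the second is one step long.
rets = discounted_returns([1.0, 1.0, 1.0, 2.0], [False, False, True, False], gamma=0.5)
advs = monte_carlo_advantages(rets, [1.0, 1.0, 1.0, 1.0])
```

Compared with Generalized Advantage Estimation, this estimate has no bias from bootstrapping on the critic but usually has higher variance, which is the trade-off accepted here for simplicity.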

A concise explanation of the PPO algorithm can be found here

Usage

  • To train a new network : run train.py
  • To test a preTrained network : run test.py
  • To plot graphs using log files : run plot_graph.py
  • To save images for gif and make gif using a preTrained network : run make_gif.py
  • All parameters and hyperparameters that control training / testing / graphs / gifs are in their respective .py files
  • PPO_colab.ipynb combines all the files in a jupyter-notebook
  • All the hyperparameters used for training (preTrained) policies are listed in the README.md in PPO_preTrained directory
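As a rough idea of what working with the .csv logs might look like, the sketch below reads a log and smooths the rewards with a moving average. It is not plot_graph.py: the column names episode, timestep and reward are an assumption about the log layout, the helper names are hypothetical, and the sample rows are made up for the example.

```python
import csv
import io


def read_rewards(log_file):
    """Return (timestep, reward) pairs from a CSV log with a header row."""
    reader = csv.DictReader(log_file)
    return [(int(row["timestep"]), float(row["reward"])) for row in reader]


def moving_average(values, window):
    """Average each value with up to window - 1 values before it."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up sample log; a real run would open the log file from disk instead.
sample_log = io.StringIO(
    "episode,timestep,reward\n"
    "1,100,10.0\n"
    "2,200,20.0\n"
    "3,300,30.0\n"
    "4,400,40.0\n"
)
points = read_rewards(sample_log)
smoothed = moving_average([reward for _, reward in points], window=2)
```

The smoothed series can then be passed to any plotting library, with the timesteps on the x-axis.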

Note :

  • If the environment runs on the CPU, use the CPU as the device for faster training. Box-2d and Roboschool run on the CPU, and training them on a GPU device will be significantly slower because data is moved between CPU and GPU often.

Citing

Please use this bibtex if you want to cite this repository in your publications :

@misc{pytorch_minimal_ppo,
    author = {Barhate, Nikhil},
    title = {Minimal PyTorch Implementation of Proximal Policy Optimization},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/nikhilbarhate99/PPO-PyTorch}},
}

Results

PPO Continuous RoboschoolHalfCheetah-v1
PPO Continuous RoboschoolHopper-v1
PPO Continuous RoboschoolWalker2d-v1
PPO Continuous BipedalWalker-v2
PPO Discrete CartPole-v1
PPO Discrete LunarLander-v2

Dependencies

Trained and Tested on:

Python 3
PyTorch
NumPy
gym

Training Environments

Box-2d
Roboschool
pybullet

Graphs and gifs

pandas
matplotlib
Pillow
