erfanmhi / deep-reinforcement-learning-cs285-pytorch


Solutions to the assignments of the Deep Reinforcement Learning course (CS285) taught at the University of California, Berkeley, implemented in PyTorch

Home Page: http://rll.berkeley.edu/deeprlcourse/

License: MIT License

Python 40.07% Jupyter Notebook 24.23% C 19.92% C++ 15.57% Makefile 0.07% Shell 0.14%
pytorch reinforcement-learning deep-learning berkeley python deep-reinforcement-learning neural-networks openai-gym mujoco deep-q-learning

deep-reinforcement-learning-cs285-pytorch's Introduction


PyTorch version of the homework assignments of the Deep Reinforcement Learning course
presented by Dr. Sergey Levine at the University of California, Berkeley

About The Project

This project aims to provide a PyTorch version of the CS285 course assignments, whose TensorFlow 1 version is already available.

Main Goals

  1. Convert all the TensorFlow 1 code to the newest version of PyTorch.
  2. The MuJoCo environment version currently used in this project is old and works only with Python < 3.6. We therefore aim to make the project compatible with newer versions of this library and, consequently, with Python >= 3.6.

Completed So Far

  1. The TensorFlow code of Homework 1, 2, 3, and 4 has been fully replaced by PyTorch.

Getting Started

This project is still under development. To run the assignments, you currently need the same libraries used by the TensorFlow version of these assignments, plus PyTorch. For a future release, we intend to move to the library versions listed in the Prerequisites section.

Prerequisites

The library versions we plan to target in a future release are as follows.

  • Python >= 3.6
  • Gym >= 0.17
  • mujoco-py >= 2.0
  • PyTorch >= 1.5.1
  • tensorboardX
  • Matplotlib
  • IPython
  • Moviepy
  • OpenCV
  • Box2d-py
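
As a rough illustration of how the minimum versions above could be checked, here is a minimal sketch. The helper names `parse_version` and `meets_minimum` are hypothetical and not part of this repository, and the mapping of "Pytorch" to the package name `torch` is an assumption made for this example. Only the entries that state a minimum version are included.

```python
import re

# Minimum versions copied from the prerequisites list above.
# The key "torch" is assumed here to stand for the PyTorch entry.
MINIMUM_VERSIONS = {
    "python": "3.6",
    "gym": "0.17",
    "mujoco-py": "2.0",
    "torch": "1.5.1",
}


def parse_version(text):
    """Turn '1.5.1' or '1.5.1+cu101' into a tuple of ints such as (1, 5, 1)."""
    # Drop any local build suffix after '+', then keep the first three numbers.
    numbers = re.findall(r"\d+", text.split("+")[0])
    return tuple(int(n) for n in numbers[:3])


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    have = parse_version(installed)
    need = parse_version(minimum)
    # Pad the shorter tuple with zeros so that '3.6' compares equal to '3.6.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

For example, `meets_minimum("1.5.0", MINIMUM_VERSIONS["torch"])` is false, because 1.5.0 is below the listed minimum of 1.5.1. The installed version strings would have to come from each library itself, for instance from its `__version__` attribute where one is provided.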

Usage

Instructions for running each assignment are given in the README file located in the corresponding homework directory.

Roadmap

See the open issues for a list of known issues.

Contributing

Unfortunately, the current version of this repository is not compatible with the latest versions of some libraries, such as TensorFlow and mujoco-py. As a result, installing the versions you need in order to contribute to this repo can be difficult. Since I have faced this problem myself, I have put together the following steps for installing the right versions of these libraries.

  1. Create a new Conda environment based on Python 3.5 with matplotlib, ipython, and pytorch installed. Then activate it.
conda create -n cs285_env python=3.5 matplotlib ipython pytorch=1.5.0
source activate cs285_env
  2. Clone this repository.
  3. Install mujoco-py:
    1. Get a MuJoCo license key file from its website.
    2. Create a .mujoco folder in your home directory and copy the provided mjpro150 directory and your license key into it:
    mkdir ~/.mujoco/
    cd <location_of_your_license_key>
    cp mjkey.txt ~/.mujoco/
    cd <this_repo>/mujoco
    cp -r mjpro150 ~/.mujoco/
    3. Add the following line to the bottom of your .bashrc file:
    export LD_LIBRARY_PATH=~/.mujoco/mjpro150/bin/
    4. Build and install mujoco-py 1.50.1.1. It can be downloaded from this link.
    tar -xzf mujoco-py-1.50.1.1.tar.gz
    cd mujoco-py-1.50.1.1
    python setup.py install
  4. Install the rest of the libraries listed in the contribution_requirements.txt file using pip:
pip install --user --requirement contribution_requirements.txt
  5. Finally, before running the scripts in a homework folder (e.g., hw1), make the 'cs285' package importable by running the following lines:
cd <path_to_hw>
pip install -e .
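
Once the steps above are done, a quick check that the key packages can be found may save some debugging time. The following is only a sketch: the helper `find_missing` is hypothetical, and the import names in `REQUIRED_MODULES` are assumptions (`mujoco_py` for mujoco-py, `torch` for PyTorch, and `cs285` for the package made visible by `pip install -e .`).

```python
import importlib.util

# Import names assumed for this sketch; adjust them to your setup.
REQUIRED_MODULES = ["torch", "mujoco_py", "gym", "cs285"]


def find_missing(module_names):
    """Return the names from module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it inside the activated cs285_env environment. The check only locates the modules; it does not confirm that mujoco-py was built correctly or that the license key is valid.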

License

Distributed under the MIT License. See LICENSE file for more information.

Contact

Erfan Miahi - @erfan_mhi - [email protected]

Project Link: https://github.com/erfanMhi/Deep-Reinforcement-Learning-CS285-Pytorch

deep-reinforcement-learning-cs285-pytorch's Issues

Noisy learning of the PyTorch implementation compared to TensorFlow in homework 2

Experiments

The following experiments were run with both the TensorFlow and PyTorch implementations:

python cs285/scripts/run_hw2_policy_gradient.py --env_name CartPole-v1 --exp_name test_pg_cartpole
python cs285/scripts/run_hw2_policy_gradient.py --env_name InvertedPendulum-v2 --exp_name test_pg_pendulum

The resulting plots of the average return and the standard deviation of the return at each step for the PyTorch implementation are as follows (the blue line is InvertedPendulum and the orange line is CartPole):

pytorch-average-return

pytorch-std-return

On the other hand, the same graphs for the Tensorflow implementation are as follows:

tensorflow-average-return

tensorflow-std-return

Problem Statement

As these graphs show, the average return of the PyTorch implementation fluctuates heavily, and the standard deviation of its return remains high even in the last steps. The return of the TensorFlow implementation, in contrast, converges to the global optimum after only a few iterations and shows little noise during training. These findings suggest that there is a problem with the PyTorch implementation. However, I checked the weight initialization, loss function, and architecture of both implementations, and they are the same. There must be some other difference in the PyTorch implementation that I have not considered so far, so I am asking contributors for help.
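
One possible way to tell a real implementation difference apart from ordinary run-to-run variance is to repeat each experiment with several random seeds and compare the per-iteration mean and spread of the average return. The sketch below only illustrates that bookkeeping. The function `summarize_returns` is hypothetical, the numbers are made up, and the use of the population standard deviation is an assumption rather than what the homework code necessarily logs.

```python
import statistics


def summarize_returns(returns_by_seed):
    """Per-iteration mean and spread of the average return across seeds.

    returns_by_seed maps a seed to the list of average returns it logged,
    one value per training iteration. All lists must have the same length.
    """
    curves = list(returns_by_seed.values())
    summary = []
    # zip(*curves) groups the values of all seeds for the same iteration.
    for values in zip(*curves):
        summary.append((statistics.mean(values), statistics.pstdev(values)))
    return summary


# Made-up numbers, for illustration only; they are not results from this repo.
example = {
    0: [20.0, 100.0, 200.0],
    1: [30.0, 140.0, 200.0],
}
for iteration, (mean, spread) in enumerate(summarize_returns(example)):
    print(iteration, mean, spread)
```

If the spread across seeds for one framework is as large as the gap between the two frameworks, a single pair of runs is not enough to conclude that one implementation is faulty.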
