MindSpore Reinforcement

View the Chinese version (README_CN)

Overview

MindSpore Reinforcement is an open-source reinforcement learning framework that supports the distributed training of agents using reinforcement learning algorithms. MindSpore Reinforcement offers a clean API abstraction for writing reinforcement learning algorithms, which decouples the algorithm from deployment and execution considerations, including the use of accelerators, the level of parallelism and the distribution of computation across a cluster of workers. MindSpore Reinforcement translates the reinforcement learning algorithm into a series of compiled computational graphs, which are then run efficiently by the MindSpore framework on CPUs, GPUs and Ascend AI processors. Its architecture is shown below:

[Figure: MindSpore Reinforcement architecture]
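As a concrete illustration of how the backend and execution mode are selected, MindSpore exposes a standard context API that programs, including the Reinforcement examples, configure before execution (a minimal sketch; the example scripts may set additional options):

import mindspore.context as context

# Compile and run computational graphs (GRAPH_MODE); device_target
# may be "CPU", "GPU", or "Ascend" depending on the available backend.
context.set_context(mode=context.GRAPH_MODE, device_target="GPU")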

Installation

MindSpore Reinforcement depends on the MindSpore training and inference framework. Therefore, please first install MindSpore following the instructions on the official website, then install MindSpore Reinforcement. You can install it from pip or from source code.

Version dependency

Due to the dependency between MindSpore Reinforcement and MindSpore, please follow the table below and install the corresponding MindSpore version from the MindSpore download page.

pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/{MindSpore-Version}/MindSpore/cpu/ubuntu_x86/mindspore-{MindSpore-Version}-cp37-cp37m-linux_x86_64.whl
| MindSpore Reinforcement Version | Branch | MindSpore Version |
|---------------------------------|--------|-------------------|
| 0.5.0                           | r0.5   | 1.8.0             |
| 0.3.0                           | r0.3   | 1.7.0             |
| 0.2.0                           | r0.2   | 1.6.0             |
| 0.1.0                           | r0.1   | 1.5.0             |

Installing from pip command

If you use the pip command, please download the whl package from the MindSpore Reinforcement page and install it:

pip install  https://ms-release.obs.cn-north-4.myhuaweicloud.com/{MindSpore_version}/Reinforcement/any/mindspore_rl-{Reinforcement_version}-py3-none-any.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
  • Installing the whl package automatically downloads the MindSpore Reinforcement dependencies (see requirements.txt for details); other dependencies must be installed manually.
  • {MindSpore_version} stands for the version of MindSpore. For the version matching relationship between MindSpore and Reinforcement, please refer to the version dependency table above.
  • {Reinforcement_version} stands for the version of Reinforcement. For example, to download version 0.1.0, fill in 1.5.0 for {MindSpore_version} and 0.1.0 for {Reinforcement_version}, as shown below.
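For example, the fully substituted command for Reinforcement 0.1.0 (built against MindSpore 1.5.0), obtained by filling the template above (please verify the exact link on the download page):

pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.5.0/Reinforcement/any/mindspore_rl-0.1.0-py3-none-any.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple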

Installing from source code

Download the source code, then enter the reinforcement directory:

git clone https://gitee.com/mindspore/reinforcement.git
cd reinforcement/
bash build.sh
pip install output/mindspore_rl-{Reinforcement_version}-py3-none-any.whl

build.sh is the build script in the reinforcement directory, and {Reinforcement_version} is the version of MindSpore Reinforcement being built.

Install dependencies

cd reinforcement && pip install -r requirements.txt

Verification

If the following command executes successfully in Python, the installation is complete.

import mindspore_rl
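Equivalently, from a shell (this runs the same import and exits silently on success):

python -c "import mindspore_rl"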

Quick Start

The algorithm examples of MindSpore Reinforcement are located under reinforcement/example/. The simple Deep Q-Learning (DQN) algorithm is used to demonstrate how to use MindSpore Reinforcement.

The first way is to run the provided script directly:

cd reinforcement/example/dqn/scripts
bash run_standalone_train.sh

The second way is to use train.py together with config.py, which allows the configuration to be modified more flexibly:

cd reinforcement/example/dqn
python train.py --episode 1000 --device_target GPU

The first way generates the log file dqn_train_log.txt in the current directory, while the second way prints log information to the screen:

Episode 0: loss is 0.396, rewards is 42.0
Episode 1: loss is 0.226, rewards is 15.0
Episode 2: loss is 0.202, rewards is 9.0
Episode 3: loss is 0.122, rewards is 15.0
Episode 4: loss is 0.107, rewards is 12.0
Episode 5: loss is 0.078, rewards is 10.0
Episode 6: loss is 0.075, rewards is 8.0
Episode 7: loss is 0.084, rewards is 12.0
Episode 8: loss is 0.069, rewards is 10.0
Episode 9: loss is 0.067, rewards is 10.0
Episode 10: loss is 0.056, rewards is 8.0
-----------------------------------------
Evaluate for episode 10 total rewards is 9.600
-----------------------------------------

For more details about the installation guide, tutorials, and APIs, see MindSpore Reinforcement API Docs.

Features

Algorithm

| Algorithm | RL Version | Discrete Action | Continuous Action | CPU | GPU | Ascend | Example Environment |
|-----------|------------|-----------------|-------------------|-----|-----|--------|---------------------|
| DQN       | >= 0.1     | ✔️              | /                 | ✔️  | ✔️  | ✔️     | CartPole-v0         |
| PPO       | >= 0.1     | /               | ✔️                | ✔️  | ✔️  | ✔️     | HalfCheetah-v2      |
| AC        | >= 0.1     | ✔️              | /                 | ✔️  | ✔️  | /      | CartPole-v0         |
| A2C       | >= 0.2     | ✔️              | /                 | ✔️  | ✔️  | /      | CartPole-v0         |
| DDPG      | >= 0.3     | /               | ✔️                | ✔️  | ✔️  | ✔️     | HalfCheetah-v2      |
| QMIX      | >= 0.5     | ✔️              | /                 | ✔️  | ✔️  | /      | SMAC                |
| SAC       | >= 0.5     | /               | ✔️                | ✔️  | ✔️  | ✔️     | HalfCheetah-v2      |

Environment

In reinforcement learning, the agent interacts with the environment and learns a policy that maximizes a numerical reward signal. The environment, as the problem to be solved, is a key element of reinforcement learning.

At present, many kinds of environments are used for reinforcement learning: Mujoco, MPE, Atari, PySC2, SMAC, TORCS, Isaac, etc. MindSpore Reinforcement currently supports the Gym and SMAC environments; as the set of algorithms grows, more environments will gradually be supported.
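For reference, the Gym environments above follow the standard reset/step interaction protocol. A minimal random-policy loop on CartPole-v0, assuming the classic Gym API (gym < 0.26), looks like this:

import gym

# One episode with the classic Gym API; the random action is a
# placeholder for a learned policy.
env = gym.make("CartPole-v0")
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    total_reward += reward
env.close()
print("Episode reward:", total_reward)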

ReplayBuffer

In reinforcement learning, the ReplayBuffer is a commonly used data storage mechanism. It stores the data obtained from the interaction between the agent and the environment. A ReplayBuffer solves the following problems:

  1. Stored historical experience can be sampled uniformly or by a certain priority, which breaks the correlation between consecutive training samples and makes the sampled data approximately independent and identically distributed.

  2. It provides temporary storage of data and improves data utilization, since each stored transition can be reused across many updates.

In general, researchers use native Python or NumPy data structures to construct a ReplayBuffer, and general-purpose reinforcement learning frameworks also provide standard API encapsulations (a minimal NumPy sketch of this host-side pattern follows below). The difference is that MindSpore implements the ReplayBuffer structure on the device. On the one hand, this reduces frequent copying of data between the host and the device when using GPU/Ascend hardware. On the other hand, expressing the ReplayBuffer as MindSpore operators allows a complete IR graph to be built, enabling MindSpore GRAPH_MODE optimizations and improving overall performance.
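Below is a minimal NumPy sketch of such a host-side uniform ReplayBuffer; the class name, shapes, and dtypes are illustrative only and do not reflect the MindSpore operator-based implementation:

import numpy as np

class UniformReplayBufferSketch:
    """Illustrative host-side FIFO replay buffer with uniform sampling."""

    def __init__(self, capacity, obs_dim):
        self.capacity = capacity
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.action = np.zeros((capacity,), dtype=np.int32)
        self.reward = np.zeros((capacity,), dtype=np.float32)
        self.next_obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.done = np.zeros((capacity,), dtype=np.bool_)
        self.index = 0   # next write position (FIFO overwrite)
        self.size = 0    # number of valid entries

    def push(self, obs, action, reward, next_obs, done):
        i = self.index
        self.obs[i], self.action[i] = obs, action
        self.reward[i], self.next_obs[i], self.done[i] = reward, next_obs, done
        self.index = (self.index + 1) % self.capacity  # first in, first out
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        # Uniform random indices break the temporal correlation of
        # consecutively stored transitions.
        idx = np.random.randint(0, self.size, size=batch_size)
        return (self.obs[idx], self.action[idx], self.reward[idx],
                self.next_obs[idx], self.done[idx])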

| Type                  | Features                                                                           | CPU | GPU | Ascend |
|-----------------------|------------------------------------------------------------------------------------|-----|-----|--------|
| UniformReplayBuffer   | 1. FIFO (first in, first out). 2. Supports batch input.                            | ✔️  | ✔️  | /      |
| PriorityReplayBuffer  | 1. Proportional-based priority strategy. 2. Uses a Sum Tree to speed up sampling.  | ✔️  | ✔️  | ✔️     |
| ReservoirReplayBuffer | Keeps an 'unbiased' sample of previous iterations.                                 | ✔️  | ✔️  | ✔️     |

Future Roadmap

This initial release of MindSpore Reinforcement contains a stable API for implementing reinforcement learning algorithms and executing computation using MindSpore's computational graphs. It currently supports semi-automatic distributed execution of algorithms and multi-agent scenarios, but does not yet support fully automatic distribution. These capabilities are planned for subsequent versions of MindSpore Reinforcement.

Community

Governance

MindSpore Open Governance

Communication

Contributions

Contributions to MindSpore are welcome. MindSpore Reinforcement is updated every three months. If you encounter any problems, please let us know in time. We appreciate all contributions; questions and changes can be submitted as issues or pull requests.

License

Apache License 2.0
