This project trains two agents to play tennis.
Environment: In this environment, two agents control rackets to bounce a ball over a net.
NOTE: This environment is a modified version of the Unity ML-Agents Tennis environment. Do NOT use the original Unity version; use one of the builds linked below.
Observation Space: The observation space consists of 8 variables corresponding to the position and velocity of the ball and racket. Each agent receives its own, local observation.
Action Space: Two continuous actions are available, corresponding to movement toward (or away from) the net, and jumping.
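As an illustration of the action format, here is a minimal random-policy sketch. It assumes that each of the two action entries should lie in [-1, 1] (the convention in the DRLND Tennis notebook, not something this README states) and that actions are passed as one row per agent. `random_actions` and `clip_actions` are hypothetical helpers, not part of this repository.

```python
import random

NUM_AGENTS = 2   # one agent per racket
ACTION_SIZE = 2  # [movement toward/away from the net, jump]


def random_actions(num_agents=NUM_AGENTS, action_size=ACTION_SIZE,
                   low=-1.0, high=1.0):
    """Sample one continuous action vector per agent, uniformly in [low, high].

    The [-1, 1] range is an assumption, not stated in this README.
    """
    return [[random.uniform(low, high) for _ in range(action_size)]
            for _ in range(num_agents)]


def clip_actions(actions, low=-1.0, high=1.0):
    """Clip arbitrary actor outputs into the assumed valid range."""
    return [[min(max(a, low), high) for a in row] for row in actions]


if __name__ == "__main__":
    print(random_actions())
    print(clip_actions([[1.7, -0.3], [-2.0, 0.5]]))  # [[1.0, -0.3], [-1.0, 0.5]]
```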
Reward: If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets a ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01. Thus, the goal of each agent is to keep the ball in play.
Solving the Environment: The task is episodic. To solve the environment, your agents must get an average score of +0.5 over 100 consecutive episodes, after taking the maximum over both agents. Specifically:
- After each episode, we add up the rewards that each agent received (without discounting) to get a score for each agent. This yields 2 (potentially different) scores. We then take the maximum of these 2 scores.
- This yields a single score for each episode.
The environment is considered solved when the average (over 100 consecutive episodes) of those scores is at least +0.5.
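A minimal sketch of this scoring rule, assuming per-step rewards are collected in one list per agent (a hypothetical layout, not necessarily how train.py stores them). `episode_score` and `first_solved_episode` are illustrative names only.

```python
from collections import deque


def episode_score(rewards_per_agent):
    """Score of one episode: undiscounted sum of rewards for each agent,
    then the maximum over agents."""
    return max(sum(rewards) for rewards in rewards_per_agent)


def first_solved_episode(episode_scores, window=100, target=0.5):
    """Return the (1-based) episode at which the average of the last `window`
    episode scores first reaches `target`, or None if it never does."""
    recent = deque(maxlen=window)  # keeps only the most recent `window` scores
    for episode, score in enumerate(episode_scores, start=1):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None


if __name__ == "__main__":
    # Agent 1 earns +0.1 then misses (-0.01); agent 2 returns the ball twice.
    print(episode_score([[0.1, -0.01], [0.1, 0.1]]))  # agent 2's 0.2 is the max
    print(first_solved_episode([0.0] * 50 + [1.0] * 100))  # 100
```

This sketch reports the episode at which the 100-episode window first reaches the target; some training scripts subtract the window length instead, so episode counts from different scripts are not always directly comparable.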
Follow the instructions in the DRLND GitHub repo to set up your Python environment:
1. Create (and activate) a new environment with Python 3.6.
- Linux or Mac:
conda create --name drlnd python=3.6
source activate drlnd
- Windows:
conda create --name drlnd python=3.6
activate drlnd
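After activating drlnd, a quick way to confirm that the interpreter you are running is the Python 3.6 one created above. `is_expected_python` is a hypothetical helper, not part of either repository.

```python
import sys


def is_expected_python(version_info=sys.version_info, expected=(3, 6)):
    """Return True if the interpreter's major.minor version matches `expected`."""
    return tuple(version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    # Run this with the activated drlnd environment's interpreter.
    print(sys.version.split()[0], is_expected_python())
```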
2. Follow the instructions in this repository to perform a minimal install of OpenAI gym.
3. Clone the repository (if you haven't already!), and navigate to the python/ folder. Then, install several dependencies:
git clone https://github.com/udacity/deep-reinforcement-learning.git
cd deep-reinforcement-learning/python
pip install .
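To confirm that the install above succeeded, a small check that reports which modules the current environment cannot find. The list of package names is an assumption about what the DRLND python/ folder installs (it includes the `unityagents` package); adjust it to match that folder's requirements. `missing_modules` is a hypothetical helper.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed set of packages; not taken from this project's code.
    required = ["numpy", "torch", "unityagents", "ipykernel"]
    missing = missing_modules(required)
    print("missing:", missing or "none")
```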
4. Create an IPython kernel for the drlnd environment:
python -m ipykernel install --user --name drlnd --display-name "drlnd"
5. Before running code in a notebook, change the kernel to match the drlnd environment by using the drop-down Kernel menu.
Open a terminal, go to the directory of your choice, and clone this project's repository:
git clone https://github.com/wjlgatech/DRL-marl-tennis.git .
1. Download the environment from one of the links below. You need only select the environment that matches your operating system:
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here
(For Windows users) Check out this link if you need help with determining if your computer is running a 32-bit version or 64-bit version of the Windows operating system.
(For AWS) If you'd like to train the agent on AWS (and have not enabled a virtual screen), then please use this link to obtain the "headless" version of the environment. You will not be able to watch the agent without enabling a virtual screen, but you will be able to train the agent. (To watch the agent, you should follow the instructions to enable a virtual screen, and then download the environment for the Linux operating system above.)
2. Place the file in the folder of the local repository cloned above, and unzip (or decompress) it.
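If you prefer to unzip from Python rather than with an archive tool, here is a minimal sketch using the standard-library `zipfile` module. The archive name `Tennis_Linux.zip` is an assumption; use whichever file you downloaded. `unzip_environment` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def unzip_environment(archive, dest="."):
    """Extract the downloaded environment archive into `dest` and
    return the sorted top-level entries it contained."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        top_level = {Path(name).parts[0] for name in zf.namelist()}
    return sorted(top_level)


if __name__ == "__main__":
    archive = Path("Tennis_Linux.zip")  # hypothetical archive name
    if archive.exists():
        print(unzip_environment(archive))
    else:
        print(f"{archive} not found; download it first")
```

Note that `zipfile` does not restore Unix permission bits, so on Linux or macOS you may need to mark the extracted binary as executable (for example with `chmod +x`) or use the system `unzip` tool instead.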
To train the agents, you can either open the Tennis.ipynb notebook and follow the instructions there:
jupyter notebook Tennis.ipynb
or train the agents from the command line with
python train.py
and then test the trained agents with
python test.py
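Evaluating trained agents amounts to playing episodes and scoring them as described in the solving criterion. A minimal sketch of one evaluation episode, assuming the `unityagents` interface used by the DRLND notebooks (`env.reset(train_mode=...)[brain_name]`, `env.step(actions)[brain_name]`, and the `vector_observations`, `rewards` and `local_done` fields). `run_episode` and `policy` are hypothetical names; this is not the contents of test.py.

```python
def run_episode(env, brain_name, policy, train_mode=False, max_steps=10000):
    """Play one episode and return its score: the maximum over agents of
    each agent's undiscounted reward sum.

    `env` is assumed to follow the unityagents UnityEnvironment interface;
    `policy` maps the per-agent observations to per-agent actions.
    """
    env_info = env.reset(train_mode=train_mode)[brain_name]
    states = env_info.vector_observations
    scores = [0.0] * len(states)
    for _ in range(max_steps):
        actions = policy(states)
        env_info = env.step(actions)[brain_name]
        states = env_info.vector_observations
        scores = [s + r for s, r in zip(scores, env_info.rewards)]
        if any(env_info.local_done):  # stop as soon as any agent reports done
            break
    return max(scores)


# Possible usage (paths and policy are placeholders):
# from unityagents import UnityEnvironment
# env = UnityEnvironment(file_name="...")  # path to the unzipped Tennis build
# brain_name = env.brain_names[0]
# score = run_episode(env, brain_name, policy=my_trained_policy)
```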
The code consists of the following files:
- Tennis.ipynb - the main notebook
- Report.ipynb - the report of this project
- maddpg_agent.py - defines the Agent that is to be trained
- maddpg_model.py - defines the MADDPG model for the Actor and Critic networks
- checkpoint_actor1.pth - the weights of the final trained Actor network
- checkpoint_critic1.pth - the weights of the final trained Critic network
- train.py - trains the MADDPG agents
- test.py - tests the performance of the trained agents
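In MADDPG generally, each actor acts on its own local observation, while the critic used during training is centralized: it sees the observations and actions of all agents. Whether maddpg_model.py wires its critic exactly this way is not stated here, so the following is only a sketch of that general idea. `critic_input` and `actor_input` are hypothetical helpers, and the sizes come from the 8-variable observation and 2-dimensional action described above.

```python
def critic_input(observations, actions):
    """Flatten every agent's observation and action into one vector, the
    input a centralized MADDPG critic conditions on (one common layout:
    all observations first, then all actions)."""
    flat_obs = [x for obs in observations for x in obs]
    flat_act = [a for act in actions for a in act]
    return flat_obs + flat_act


def actor_input(observations, agent_index):
    """A decentralized actor only sees its own agent's local observation."""
    return list(observations[agent_index])


if __name__ == "__main__":
    obs = [[0.0] * 8, [1.0] * 8]          # 2 agents x 8 observation variables
    acts = [[0.5, -0.5], [0.1, 0.9]]      # 2 agents x 2 actions
    print(len(actor_input(obs, 0)))       # 8
    print(len(critic_input(obs, acts)))   # 2 * (8 + 2) = 20
```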
The environment was solved in 5305 episodes, with an average score of 0.505 (>= 0.5).