Submission for completing the Udacity Project
The agent uses DDPG (Deep Deterministic Policy Gradient) with the following upgrades:
- Priority Replay Buffer (With Priority Tree)
- Soft updates
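As a rough illustration of the soft-update idea, the target parameters are moved a small fraction `tau` toward the local (trained) parameters after each learning step. The sketch below is not taken from this repository: the function name `soft_update`, the dict-of-lists parameter layout, and the value of `tau` are assumptions chosen to keep the example self-contained.

```python
def soft_update(target_params, local_params, tau=0.001):
    """Blend local parameters into the target parameters.

    Applies target <- tau * local + (1 - tau) * target to every value.
    Both arguments are dicts mapping a name to a list of floats
    (a hypothetical stand-in for real network weights).
    """
    for name, local_values in local_params.items():
        target_params[name] = [
            tau * local_value + (1.0 - tau) * target_value
            for local_value, target_value in zip(local_values, target_params[name])
        ]
    return target_params


# Example: with tau=0.1 the target moves 10% of the way toward the local values.
target = {"w": [0.0, 0.0]}
local = {"w": [1.0, 2.0]}
soft_update(target, local, tau=0.1)
```

A small `tau` keeps the target network changing slowly, which is the usual motivation for soft updates over periodic hard copies.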
The repository contains the weights of the trained RL agent that solves the problem, along with graphs showing the agent's progress and the point at which it solved the problem.
The DDPG agent solved the environment in 1450 episodes (fastest solution), meaning the average reward over the last 100 episodes exceeded 0.5. This took 20 minutes of actual training time, with a maximum reward of 2.8.
I let it train until the mean reward exceeded 0.7 for the following graph.
- State space = 24
- Action space = 2 (continuous)
In this environment, two agents control rackets to bounce a ball over a net. If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets a ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01. Thus, the goal of each agent is to keep the ball in play.
The environment is considered solved when the average score over 100 episodes is at least +0.5.
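The solve criterion can be sketched as a rolling mean over the most recent 100 episode scores. This is an illustrative helper, not code from this repository: the name `episodes_to_solve` is hypothetical, and it assumes each episode has already been reduced to a single scalar score.

```python
from collections import deque


def episodes_to_solve(scores, window=100, threshold=0.5):
    """Return the 1-based episode at which the rolling mean over the last
    `window` scores first reaches `threshold`, or None if it never does."""
    recent = deque(maxlen=window)  # automatically drops the oldest score
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= threshold:
            return episode
    return None


# Example: 50 episodes scoring 0.0 followed by 100 episodes scoring 1.0.
history = [0.0] * 50 + [1.0] * 100
solved_at = episodes_to_solve(history)
```

The check only fires once a full window of episodes is available, so a lucky early episode cannot mark the environment as solved.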
Agent Reward Function (dependent):
- Striker: +1 when the ball enters the opponent's goal. -0.1 when the ball enters its own team's goal. -0.001 existential penalty.
- Goalie: -1 when the ball enters its team's goal. +0.1 when the ball enters the opponent's goal. +0.001 existential bonus.
Brains: Two brains with the following observation/action space:
- Vector Observation space: 112, corresponding to 14 local ray casts, each detecting 7 possible object types, along with the object's distance. Perception is a 180 degree view from the front of the agent.
- Vector Action space: (Discrete) One branch.
  - Striker: 6 actions corresponding to forward, backward, and sideways movement, as well as rotation.
  - Goalie: 4 actions corresponding to forward, backward, and sideways movement.
- Visual Observations: None.
Reset Parameters: None
Benchmark Mean Reward (Striker & Goalie Brain): 0 (the means will be the inverse of each other and criss-cross during training)
DDPG (works); MADDPG and PPO (in process of implementation)
Vanilla ReplayBuffer, Priority Experience Replay
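For readers unfamiliar with tree-backed prioritized replay, here is a minimal sum-tree sketch. It assumes the "priority tree" is a sum tree of the kind commonly used for prioritized experience replay; the class name `SumTree` and its methods are hypothetical and do not reflect the buffer code in this repository. Leaves store priorities, internal nodes store the sum of their children, and sampling walks down from the root so that each item is drawn with probability proportional to its priority.

```python
import random


class SumTree:
    """Array-backed binary tree: leaves hold priorities, parents hold sums."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Node 1 is the root; leaves occupy indices capacity .. 2*capacity - 1.
        self.nodes = [0.0] * (2 * capacity)
        self.data = [None] * capacity
        self.next_slot = 0

    def total(self):
        """Sum of all stored priorities."""
        return self.nodes[1]

    def update(self, slot, priority):
        """Set the priority of a slot and refresh the sums above it."""
        i = slot + self.capacity
        self.nodes[i] = priority
        i //= 2
        while i >= 1:
            self.nodes[i] = self.nodes[2 * i] + self.nodes[2 * i + 1]
            i //= 2

    def add(self, item, priority):
        """Store an item, overwriting the oldest slot once the tree is full."""
        slot = self.next_slot
        self.data[slot] = item
        self.update(slot, priority)
        self.next_slot = (slot + 1) % self.capacity
        return slot

    def find(self, mass):
        """Return (slot, priority, item) for the leaf whose cumulative
        priority range contains `mass`, with 0 <= mass < total()."""
        i = 1
        while i < self.capacity:
            left = 2 * i
            if mass < self.nodes[left]:
                i = left
            else:
                mass -= self.nodes[left]
                i = left + 1
        slot = i - self.capacity
        return slot, self.nodes[i], self.data[slot]

    def sample(self):
        """Draw one item with probability proportional to its priority."""
        return self.find(random.random() * self.total())


# Example: "b" has three times the priority of "a".
tree = SumTree(4)
tree.add("a", 1.0)
tree.add("b", 3.0)
slot, priority, item = tree.sample()
```

Both `update` and `find` touch one node per tree level, so their cost grows logarithmically with the buffer size, which is the main reason to use a tree rather than scanning a flat list of priorities.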
Contains noise for DDPG, plotting, the DDPG agent configuration file, and the unity_env wrapper.
DDPG/model_weights/actor
DDPG/model_weights/critic
Clone the repository.
git clone git@github.com:MorGriffiths/Udacity_Navigation.git
cd Udacity_Navigation
Install Anaconda.
Create the Anaconda environment from the conda_requirements.txt file.
conda create --name Tennis --file conda_requirements.txt
Activate the environment. The command depends on which version of Anaconda you have:
conda activate Tennis
or
source activate Tennis
Install Unity ml-agents.
git clone https://github.com/Unity-Technologies/ml-agents.git
git -C ml-agents checkout 0.4.0b
pip install ml-agents/python/.
Install the project requirements.
pip install -r requirements.txt
Place the environment into the Environments folder. If necessary, change the path to the Unity environment in main.py.
Make sure the environment path is correctly set in main.py, then run:
cd DDPG
python main.py
See Tennis_report.md along with the performance graph and the weights.