This repository contains material related to Udacity's Deep Reinforcement Learning Nanodegree program.
The tutorials lead you through implementing various algorithms in reinforcement learning. All of the code is in PyTorch (v0.4) and Python 3.
- Dynamic Programming: Implement Dynamic Programming algorithms such as Policy Evaluation, Policy Improvement, Policy Iteration, and Value Iteration.
- Monte Carlo: Implement Monte Carlo methods for prediction and control.
- Temporal-Difference: Implement Temporal-Difference methods such as Sarsa, Q-Learning, and Expected Sarsa.
- Discretization: Learn how to discretize continuous state spaces, and solve the Mountain Car environment.
- Tile Coding: Implement a method for discretizing continuous state spaces that enables better generalization.
- Deep Q-Network: Explore how to use a Deep Q-Network (DQN) to navigate a space vehicle without crashing.
- Robotics: Use a C++ API to train reinforcement learning agents from virtual robotic simulation in 3D. (External link)
- Hill Climbing: Use hill climbing with adaptive noise scaling to balance a pole on a moving cart.
- Cross-Entropy Method: Use the cross-entropy method to train a car to navigate a steep hill.
- REINFORCE: Learn how to use Monte Carlo Policy Gradients to solve a classic control task.
- Proximal Policy Optimization: Explore how to use Proximal Policy Optimization (PPO) to solve a classic reinforcement learning task. (Coming soon!)
- Deep Deterministic Policy Gradients: Explore how to use Deep Deterministic Policy Gradients (DDPG) with OpenAI Gym environments.
- Pendulum: Use OpenAI Gym's Pendulum environment.
- BipedalWalker: Use OpenAI Gym's BipedalWalker environment.
- Finance: Train an agent to discover optimal trading strategies.
The labs and projects can be found below. All of the projects use rich simulation environments from Unity ML-Agents. In the Deep Reinforcement Learning Nanodegree program, you will receive a review of your project. These reviews are meant to give you personalized feedback and to tell you what can be improved in your code.
- The Taxi Problem: In this lab, you will train a taxi to pick up and drop off passengers.
- Navigation: In the first project, you will train an agent to collect yellow bananas while avoiding blue bananas.
- Continuous Control: In the second project, you will train a robotic arm to reach target locations.
- Collaboration and Competition: In the third project, you will train a pair of agents to play tennis!
- Cheatsheet: You are encouraged to use this PDF file to guide your study of reinforcement learning.
- Acrobot-v1 with Tile Coding and Q-Learning
- Cartpole-v0 with Hill Climbing | solved in 13 episodes
- Cartpole-v0 with REINFORCE | solved in 691 episodes
- MountainCarContinuous-v0 with Cross-Entropy Method | solved in 47 iterations
- MountainCar-v0 with Uniform-Grid Discretization and Q-Learning | solved in <50000 episodes
- Pendulum-v0 with Deep Deterministic Policy Gradients (DDPG)
- BipedalWalker-v2 with Deep Deterministic Policy Gradients (DDPG)
- CarRacing-v0 with Deep Q-Networks (DQN) | Coming soon!
- LunarLander-v2 with Deep Q-Networks (DQN) | solved in 1504 episodes
- FrozenLake-v0 with Dynamic Programming
- Blackjack-v0 with Monte Carlo Methods
- CliffWalking-v0 with Temporal-Difference Methods
To set up your python environment to run the code in this repository, follow the instructions below.
- Create (and activate) a new environment with Python 3.6.

  - Linux or Mac:

    ```
    conda create --name drlnd python=3.6
    source activate drlnd
    ```

  - Windows:

    ```
    conda create --name drlnd python=3.6
    activate drlnd
    ```
If running in Windows, ensure you have the "Build Tools for Visual Studio 2019" installed from this site. This article may also be very helpful. This was confirmed to work in Windows 10 Home.
Follow the instructions in this repository to perform a minimal install of OpenAI gym.
Clone the repository (if you haven't already!), and navigate to the python/ folder. Then, install several dependencies.

```
git clone https://github.com/udacity/deep-reinforcement-learning.git
cd deep-reinforcement-learning/python
pip install .
```
Create an IPython kernel for the drlnd environment.

```
python -m ipykernel install --user --name drlnd --display-name "drlnd"
```
Before running code in a notebook, change the kernel to match the drlnd environment by using the drop-down Kernel menu.

Come learn with us in the Deep Reinforcement Learning Nanodegree program at Udacity!
deep-reinforcement-learning's Issues
About Crawler for Continuous Control
I am trying to solve the Crawler environment in the Continuous Control task. I have read the Unity webpage and realized that there are two environments: one with a static target and one with a dynamic target. Which one is provided through the links? Thank you!
[Question] Is this compatible with Python3.8?
According to the readme, the suggested Python version is 3.6. There are a couple of features from 3.8 that I like and want to try.
Are there any hard restrictions or known issues outside 3.6?
running environment on Nvidia RTX cards
I found a small issue with the instructions for setting up the Udacity environment on a local computer that has an RTX Nvidia card (in my case the RTX 2080Ti). I needed to replace the PyTorch 0.4 version (https://github.com/udacity/deep-reinforcement-learning/blob/master/python/requirements.txt#L11) with the latest PyTorch (1.3.1); otherwise the environment just will not start.
README doesn't indicate how to contribute.
The README.md doesn't indicate how to contribute. I made a local branch to fix issue #34 but when I try to push the branch I get the following error, which indicates I don't have permission.
Please indicate in the README how to contribute.
Thanks.
"xdpyinfo" error in Discretization.ipynb
Running the first code-block in Discretization.ipynb gives:
xdpyinfo was not found, X start can not be checked! Please install xdpyinfo!
This doesn't stop the rest of the notebook from executing, but it is misleading and should be fixable with a few config commands.
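For anyone hitting this locally, one commonly used workaround (my own assumption, not something the notebook ships) is to start a virtual display before the first render call, e.g. with pyvirtualdisplay:

```python
# Hypothetical workaround: start a virtual X display so gym can render off-screen.
# Requires the system package xvfb and `pip install pyvirtualdisplay`.
from pyvirtualdisplay import Display

display = Display(visible=0, size=(1400, 900))
display.start()

import gym
env = gym.make('MountainCar-v0')
env.reset()
env.render()   # should no longer complain about a missing X display
env.close()
```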
torch requirement outdated
Can't set up the environment due to outdated requirements.
I have been trying to set this up on macOS 12.2.1 Monterey, but all of the software (unityagents, torch, etc.) is so old that it doesn't install. I used some more modern libraries, but it fails. I can run Banana.app, but it times out with:
```
E0509 13:07:17.806027000 4446950912 fork_posix.cc:76] Other threads are currently calling into gRPC, skipping fork() handlers
Mono path[0] = '/Users/kevin/tmp/udacity/drl/udacity-deep-reinforcement-learning-master/p1_navigation/Banana.app/Contents/Resources/Data/Managed'
Mono config path = '/Users/kevin/tmp/udacity/drl/udacity-deep-reinforcement-learning-master/p1_navigation/Banana.app/Contents/MonoBleedingEdge/etc'
```
Is there any way you can update this to current versions of mlagents (unityagent replacement), torch, python, etc?
vector_observations errors in the soccer2 environment
Recently, I was doing some RL experiments on the soccer2 environment, following https://github.com/udacity/deep-reinforcement-learning/blob/master/p3_collab-compet/Soccer.ipynb. However, the agents couldn't learn successfully, and I found that there are some errors in the vector_observations of the soccer2 environment.
An agent on the red team is expected to detect objects like "ball", "redGoal", "blueGoal", "wall", "redAgent", and "blueAgent". However, when I print the vector_observations, the agents never detect "redGoal" and "blueGoal"; these values are always 0. I would appreciate it if you could fix this error and add this information. It would help me a lot.
Thanks.

Banana.app Freezes on Mac
Banana.app freezes when I try to run it from my Mac with mlagents==0.4.0 installed. Has anyone tried to run Banana.app on macOS? I can't seem to get it working...
Mac Info:
System Version: macOS 10.13.6 (17G8030)
Kernel Version: Darwin 17.7.0
Boot Volume: Macintosh HD
Boot Mode: Normal

Is there an estimation as to when PPO will be finished?
The reason I ask is that I tried taking the course last year, but much of it wasn't complete yet, and I have been waiting specifically for PPO. Thank you!
Description of Reacher environment
I tried to collect the rewards at every timestep for a random agent and found that most non-zero rewards are 0.04. I also got some 0.03, 0.02, and 0.01 values, whose counts are much lower than those of 0.04. But the description says the reward at any timestep should be 0 or 0.1. Are there more details? Thanks!
tensorflow-gpu in requirements.txt?
I changed my local requirements.txt to include tensorflow-gpu==1.7.1 because "pip install ." tried to download/install the non-GPU tensorflow, even though tensorflow-gpu==1.7.1 was already installed.
Maybe requirements.txt could be changed to include tensorflow-gpu==1.7.1?
DDPG - Bipedal is not converging
In this project we clearly see that there is no learning happening: https://github.com/udacity/deep-reinforcement-learning/blob/master/ddpg-bipedal/DDPG.ipynb
This example should converge and solve the problem.

SWIG required on Windows for Box2D
When installing Box2D, it seems SWIG is a requirement on Windows.
After downloading SWIG and adding it to the Windows PATH environment variable, Box2D was successfully installed.
README doesn't reference required Visual Studio Build Tools for Windows
The OpenAI Gym installation instructions are missing a reference to the "Build Tools for Visual Studio 2019" from the following site.
https://visualstudio.microsoft.com/downloads/
I also found this by reading the following article.
https://towardsdatascience.com/how-to-install-openai-gym-in-a-windows-environment-338969e24d30

Even though this is an issue in the OpenAI Gym, a note in this README would be very helpful.
scaling output from tanh activation in the actor
```python
def act(self, state, add_noise=True):
    """Returns actions for given state as per current policy."""
    state = torch.from_numpy(state).float().to(device)
    self.actor_local.eval()
    with torch.no_grad():
        action = self.actor_local(state).cpu().data.numpy()
    self.actor_local.train()
    if add_noise:
        action += self.noise.sample()
    return np.clip(action, -1, 1)
```
The above looks like it is clipping the action to a range within -1 to 1, but Pendulum-v0 has an action range of -2 to 2, doesn't it? How does this work out?
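One way this is usually reconciled (a sketch of my own, not the notebook's code) is to let the agent keep producing actions in [-1, 1] and rescale them to the environment's action bounds just before env.step():

```python
import numpy as np

# Hypothetical helper: map an agent action in [-1, 1] onto the environment's own bounds.
# For Pendulum-v0, env.action_space.low is [-2.0] and env.action_space.high is [2.0].
def scale_action(env, action):
    low, high = env.action_space.low, env.action_space.high
    action = np.clip(action, -1.0, 1.0)                 # agent output stays in [-1, 1]
    return low + (action + 1.0) * 0.5 * (high - low)    # linear map onto [low, high]

# usage in the interaction loop:
# action = agent.act(state)
# next_state, reward, done, _ = env.step(scale_action(env, action))
```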
low computation resource utilization
I ran REINFORCE on my lab's server, which has an RTX 3090.
Strikingly, the GPU usage is only about 30 percent when running REINFORCE on one card, and at the same time the CPU usage is no greater than 80%.
Can you tell me what the real problem is? Is it a limitation of REINFORCE, or is there something wrong with my code?
I would appreciate it so much if you could help figure out what causes the problem. Thank you in advance.

OUNoise should use normal distribution
The current implementation uses random.random(), which I believe is a uniform distribution on [0, 1). This can negatively affect the exploration abilities of the DDPG agent, since the noise will have a positive bias.
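A minimal sketch of the proposed change, assuming the same OUNoise layout as in the ddpg-pendulum notebook, would replace the uniform draw with a standard normal one:

```python
import copy
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process with zero-mean Gaussian increments (sketch)."""

    def __init__(self, size, seed, mu=0.0, theta=0.15, sigma=0.2):
        self.mu = mu * np.ones(size)
        self.theta = theta
        self.sigma = sigma
        self.rng = np.random.RandomState(seed)
        self.reset()

    def reset(self):
        """Reset the internal state to the mean."""
        self.state = copy.copy(self.mu)

    def sample(self):
        """Update internal state with a normally distributed perturbation and return it."""
        x = self.state
        dx = self.theta * (self.mu - x) + self.sigma * self.rng.standard_normal(len(x))
        self.state = x + dx
        return self.state
```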
Installation on windows 10 failed
Hello,
I followed the instructions and tried to install from requirements.txt in the python folder, but unfortunately I got the following error message. Can you please help to resolve it?
Thanks,
Kim

```
(drlnd) deep-reinforcement-learning\python
pip install .
Collecting torch==0.4.0 (from unityagents==0.4.0)
Could not find a version that satisfies the requirement torch==0.4.0 (from unityagents==0.4.0) (from versions: 0.1.2, 0.1.2.post1)
No matching distribution found for torch==0.4.0 (from unityagents==0.4.0)
```

Use Pseudocount of Ones to Avoid Divide by Zero
In the Monte Carlo Solution notebook and the assignment notebook, the count dictionary (N) uses a default value of zeros. Since not all actions at a given state will get updated (especially in my case of using First-Visit MC Prediction), it is better to use a default value of ones.
To replicate the issue:
```
$ python
Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) [GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.array([2, 0]) / np.array([1, 0])
__main__:1: RuntimeWarning: invalid value encountered in true_divide
array([ 2., nan])
```
Thinking of replacing this:

```python
N = defaultdict(lambda: np.zeros(env.action_space.n))
```

with this:

```python
N = defaultdict(lambda: np.ones(env.action_space.n))
```
This implementation is from the cheatsheet. The textbook has no issue as it only mentions average(Returns()).

REINFORCE Correction
Hello,
In deep-reinforcement-learning/reinforce/REINFORCE.ipynb
R is implemented as a single value in the following code:
```python
discounts = [gamma**i for i in range(len(rewards)+1)]
R = sum([a*b for a,b in zip(discounts, rewards)])
```
However, it should be implemented as discounted values as follows:

```python
discounted_rewards = []
for t in range(len(rewards)):
    Gt = 0
    pwr = 0
    for r in rewards[t:]:
        Gt = Gt + gamma**pwr * r
        pwr = pwr + 1
    discounted_rewards.append(Gt)

policy_loss = []
for log_prob, Gt in zip(saved_log_probs, discounted_rewards):
    policy_loss.append(-log_prob * Gt)
```
This important correction is compatible with the REINFORCE algorithm and leads to faster and more stable training, as shown in the figure.
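For completeness, the remaining update step would look roughly like this (a sketch assuming the `policy_loss` list from the snippet above and an `optimizer` over the policy's parameters, as in the notebook):

```python
import torch

# Sketch: turn the per-timestep terms above into one scalar loss and update the policy.
loss = torch.stack(policy_loss).sum()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```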
AttributeError: module 'gym.envs.box2d' has no attribute 'LunarLander'
Thank you a lot for your program. Here is a problem I met when I ran deep-reinforcement-learning/dqn/solution/Deep_Q_Network_Solution.ipynb on the Google Colab platform.
As you can see, I can't make the env successfully.

ERROR: No matching distribution found for tensorflow==1.7.1 (from unityagents==0.4.0)
Hi guys,
I have a couple of questions:
- Is the TensorFlow successful installation necessary for the package? Based on #1 it might not even be required, though the reply was 2 years ago.
- In case it is required, would TF v2 do the trick? I remember that there's a bit of change in the API, but I'm not sure what exactly.
- Why is the suggestion to use conda for the environment and then install packages via pip? Doesn't conda have all required dependencies?
Below is my bash log from trying to install this project's dependencies. I initially tried pip install, then, after some investigation, I took the steps below. My OS is Ubuntu 20.04.

```
(drlnd) kretyn@junk:~/courses/DeRL/deep-reinforcement-learning/python$ pip install --upgrade pip
Collecting pip
  Downloading pip-20.1-py2.py3-none-any.whl (1.5 MB)
     |████████████████████████████████| 1.5 MB 706 kB/s
Installing collected packages: pip
Successfully installed pip-20.1
(drlnd) kretyn@junk:~/courses/DeRL/deep-reinforcement-learning/python$ python --version
Python 3.6.10 :: Anaconda, Inc.
(drlnd) kretyn@junk:~/courses/DeRL/deep-reinforcement-learning/python$ pip install .
Processing /home/kretyn/courses/DeRL/deep-reinforcement-learning/python
Requirement already satisfied: Pillow>=4.2.1 in /usr/lib/python3/dist-packages (from unityagents==0.4.0) (7.0.0)
Collecting docopt
  Using cached docopt-0.6.2.tar.gz (25 kB)
Collecting grpcio==1.11.0
  Using cached grpcio-1.11.0.tar.gz (14.2 MB)
Collecting ipykernel
  Using cached ipykernel-5.2.1-py3-none-any.whl (118 kB)
Collecting jupyter
  Using cached jupyter-1.0.0-py2.py3-none-any.whl (2.7 kB)
Collecting matplotlib
  Using cached matplotlib-3.2.1-cp38-cp38-manylinux1_x86_64.whl (12.4 MB)
Requirement already satisfied: numpy>=1.11.0 in /usr/lib/python3/dist-packages (from unityagents==0.4.0) (1.17.4)
Collecting pandas
  Using cached pandas-1.0.3-cp38-cp38-manylinux1_x86_64.whl (10.0 MB)
Collecting protobuf==3.5.2
  Using cached protobuf-3.5.2-py2.py3-none-any.whl (388 kB)
Collecting pytest>=3.2.2
  Using cached pytest-5.4.1-py3-none-any.whl (246 kB)
Requirement already satisfied: pyyaml in /usr/lib/python3/dist-packages (from unityagents==0.4.0) (5.3.1)
Requirement already satisfied: scipy in /home/kretyn/.local/lib/python3.8/site-packages (from unityagents==0.4.0) (1.4.1)
ERROR: Could not find a version that satisfies the requirement tensorflow==1.7.1 (from unityagents==0.4.0) (from versions: 2.2.0rc1, 2.2.0rc2, 2.2.0rc3, 2.2.0rc4)
ERROR: No matching distribution found for tensorflow==1.7.1 (from unityagents==0.4.0)
```
Discretization Issue when Creating Uniform Grid
In the Discretization Solution notebook, the sample [0.2, -1.9] should be mapped into grid cell [6, 3], as described before In [8]. But the output of In [8] is [5, 3] instead. I did some debugging and found that the issue is caused by the create_uniform_grid function. This notebook produces the expected result.
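For comparison, here is a minimal sketch (my own, not necessarily identical to the notebook's code) of a uniform grid plus a digitize-based discretizer; assuming a test grid with low=[-1.0, -5.0], high=[1.0, 5.0] and 10 bins per dimension, it maps [0.2, -1.9] to [6, 3]:

```python
import numpy as np

def create_uniform_grid(low, high, bins=(10, 10)):
    """Split points per dimension (interior bin edges only)."""
    return [np.linspace(low[d], high[d], bins[d] + 1)[1:-1] for d in range(len(bins))]

def discretize(sample, grid):
    """Index of the grid cell each sample dimension falls into."""
    return [int(np.digitize(s, g)) for s, g in zip(sample, grid)]

grid = create_uniform_grid([-1.0, -5.0], [1.0, 5.0])
print(discretize([0.2, -1.9], grid))  # expected: [6, 3]
```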
TypeError when passing device

I'm not sure why this is happening. I've printed the type of device and found that it's of class torch.device. The relevant code is below (unchanged from Udacity's version aside from the print statement), and the error is below that. Could it be an issue with ML-Agents?

```python
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(type(device))

class Agent():
    """Interacts with and learns from the environment."""

    def __init__(self, state_size, action_size, random_seed):
        """Initialize an Agent object.

        Params
        ======
            state_size (int): dimension of each state
            action_size (int): dimension of each action
            random_seed (int): random seed
        """
        self.state_size = state_size
        self.action_size = action_size
        self.seed = random.seed(random_seed)

        # Actor Network (w/ Target Network)
        self.actor_local = Actor(state_size, action_size, random_seed).to(device)
        self.actor_target = Actor(state_size, action_size, random_seed).to(device)
        self.actor_optimizer = optim.Adam(self.actor_local.parameters(), lr=LR_ACTOR)
```

```
Traceback (most recent call last):
  File "train.py", line 87, in <module>
    agent_1 = Agent(state_size=48, action_size=action_size, random_seed=0)
  File "C:\Users\Tester\ml-agents\ml-agents\mlagents\trainers\ddpg\ddpg_agent.py", line 39, in __init__
    self.actor_local = Actor(state_size, action_size, random_seed).to(device)
  File "C:\Users\Tester\ml-agents\ml-agents\mlagents\trainers\ddpg\model.py", line 29, in __init__
    self.fc3 = nn.Linear(fc2_units, action_size)
  File "C:\Users\Tester\AppData\Local\conda\conda\envs\ml-agents\lib\site-packages\torch\nn\modules\linear.py", line 51, in __init__
    self.weight = Parameter(torch.Tensor(out_features, in_features))
TypeError: new() received an invalid combination of arguments - got (google.protobuf.pyext._message.RepeatedScalarContainer, int), but expected one of:
 * (torch.device device)
 * (torch.Storage storage)
 * (Tensor other)
 * (tuple of ints size, torch.device device)
      didn't match because some of the arguments have invalid types: (google.protobuf.pyext._message.RepeatedScalarContainer, int)
 * (object data, torch.device device)
      didn't match because some of the arguments have invalid types: (google.protobuf.pyext._message.RepeatedScalarContainer, int)
```
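Based on the traceback, the protobuf RepeatedScalarContainer ends up as action_size in nn.Linear, which suggests it was taken directly from the brain parameters (some ML-Agents versions expose action sizes as a repeated field rather than a plain int). A hedged workaround sketch, with a hypothetical as_int helper and assumed notebook-style names:

```python
# Hypothetical helper: coerce an ML-Agents size field (plain int or protobuf
# repeated field) into the int that nn.Linear expects.
def as_int(size_field):
    try:
        return int(size_field)
    except TypeError:
        return int(size_field[0])  # repeated field with a single entry

# usage sketch with brain parameters (names assumed, not from the traceback):
# action_size = as_int(brain.vector_action_space_size)
# state_size = as_int(brain.vector_observation_space_size)
# agent_1 = Agent(state_size=state_size, action_size=action_size, random_seed=0)
```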
Banana.exe Not working
When trying to complete the p1_navigation project, I am not able to open the Banana.exe environment from either Ubuntu 20.04 or Windows 10. When I try to run the code cell listed below,

```python
env = UnityEnvironment(file_name="Banana_Windows_x86_64/Banana.exe")
```

the error that I get is the following:
```
timeout                                   Traceback (most recent call last)
~\anaconda3\envs\udacity_rl\lib\site-packages\unityagents\environment.py in __init__(self, file_name, worker_id, base_port, curriculum)
     98             self._socket.listen(1)
---> 99             self._conn, _ = self._socket.accept()
    100             self._conn.settimeout(30)

~\anaconda3\envs\udacity_rl\lib\socket.py in accept(self)
    292         """
--> 293         fd, addr = self._accept()
    294         sock = socket(self.family, self.type, self.proto, fileno=fd)

timeout: timed out

During handling of the above exception, another exception occurred:

UnityTimeOutException                     Traceback (most recent call last)
in
----> 1 env = UnityEnvironment(file_name="Banana_Windows_x86_64/Banana.exe")

~\anaconda3\envs\udacity_rl\lib\site-packages\unityagents\environment.py in __init__(self, file_name, worker_id, base_port, curriculum)
    102                 p = json.loads(p)
    103             except socket.timeout as e:
--> 104                 raise UnityTimeOutException(
    105                     "The Unity environment took too long to respond. Make sure {} does not need user interaction to "
    106                     "launch and that the Academy and the external Brain(s) are attached to objects in the Scene."

UnityTimeOutException: The Unity environment took too long to respond. Make sure Banana_Windows_x86_64/Banana does not need user interaction to launch and that the Academy and the external Brain(s) are attached to objects in the Scene.
```

Please help. Thank you.
Unable to use unityagents 0.4.0 with MLAgents 0.8.1
A possible plagiarism incident?
The code of this paper, https://github.com/WenhangBao/Multi-Agent-RL-for-Liquidation has an uncanny similarity to https://github.com/udacity/deep-reinforcement-learning/tree/master/finance. The paper has since been submitted to ICML 2019 proceeding.
Either the authors of the paper or Udacity must be the original author. Since https://github.com/udacity/deep-reinforcement-learning/tree/master/finance was committed earlier (~10 months ago), I have a tentative conclusion that the Udacity team was the original author of the code.
I am raising this issue because I have been doing research on a similar system, and we want to credit the right party.
Banana environment throws a timeout on Windows64
C:\ProgramData\Anaconda3\envs\drlnd\python.exe: No module named ipykernel
Hello everyone,
I was following the tutorial to install ML-Agents, but when executing the line:

```
python -m ipykernel install --user --name drlnd --display-name "drlnd"
```

I found this error:

```
C:\ProgramData\Anaconda3\envs\drlnd\python.exe: No module named ipykernel
```

I'm on Windows 10 and I've installed Anaconda 3 with Python 3.6.
In the bipedal walker, errors occur
File "c:\windows\system32\gym\gym\envs\box2d\bipedal_walker.py", line 383, in step
self.joints[0].motorSpeed = float(SPEED_HIP * np.sign(action[0]))
TypeError: only size-1 arrays can be converted to Python scalarsUnable to use deep reinforcement learning repository with latest MLAgents
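The traceback suggests the action passed to env.step() carries an extra batch dimension, so action[0] is itself an array. A possible workaround (a sketch under that assumption, using the ddpg-bipedal notebook's agent/env/state names):

```python
import numpy as np

# Sketch of a possible fix: BipedalWalker-v2 expects a flat array of 4 floats,
# so drop any leading batch dimension before stepping the environment.
action = agent.act(state)
action = np.asarray(action).reshape(-1)          # e.g. (1, 4) -> (4,)
next_state, reward, done, _ = env.step(action)
```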
Unable to use deep reinforcement learning repository with latest MLAgents

Until now, I have only been able to use this repository with MLAgents version 0.4. How can I use it with the latest versions of MLAgents?
Calculate correctly the fan-in for DDPG model
The DDPG model code computes fan_in = layer.weight.data.size()[0]. This is wrong, because fan-in is defined as the maximum number of input units to the layer. The weight matrix is transposed (!), thus we need to access the second component of the size, i.e. fan_in = layer.weight.data.size()[1].
See an example of a correct implementation using fan-in here: https://pytorch.org/docs/stable/_modules/torch/nn/init.html#kaiming_normal_
specifically def _calculate_fan_in_and_fan_out(tensor).
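A minimal sketch of the corrected initializer, assuming a hidden_init-style helper like the one in the DDPG model files:

```python
import numpy as np

def hidden_init(layer):
    """Symmetric uniform init range based on the layer's fan-in.

    nn.Linear stores its weight as (out_features, in_features), so the number
    of input units is size()[1], not size()[0].
    """
    fan_in = layer.weight.data.size()[1]
    lim = 1.0 / np.sqrt(fan_in)
    return (-lim, lim)

# usage, e.g. in the model's reset_parameters():
# self.fc1.weight.data.uniform_(*hidden_init(self.fc1))
```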
DDPG pendulum action scaling
In the file: deep-reinforcement-learning/ddpg-pendulum/DDPG.ipynb
In the Pendulum-v0 environment, the actions are in the range from -2.0 to +2.0.
Hence, actions must be scaled before being passed to the environment (the actor produces tanh values from -1 to 1, limiting the agent's ability to act properly):

```python
action = agent.act(state)
action *= 2.0  # add this line of code
next_state, reward, done, _ = env.step(action)
```
After I added this scaling factor, the agent converged much faster and reached a much better result.
Moreover, I used simpler networks with only 32 and 128 units instead of the 400 and 300.
Best Regards
DQL size/reshape error
Hi.
I am new to DQL. I am working with the AirSim simulator, and I coded an algorithm in Python in Visual Studio, using Keras, to teach the drone to avoid obstacles. When I launched the training, the algorithm seemed to work normally at the beginning, but after iteration 400, 1300, or 2308 (it always changes) the following error appeared.
I used the 'reshape' function in only 2 functions.
Here below is my full code.
```python
import numpy as np
import airsim
import time
import math
import tensorflow as tf
import keras
from airsim.utils import to_eularian_angles
from airsim.utils import to_quaternion
from keras.layers import Conv2D,Dense
from keras.layers import Activation
from keras.layers import MaxPool2D
from keras.layers import Dropout
from keras.layers import Input
import keras.backend as K
from keras.models import load_model
from keras import Input
from keras.layers import Flatten
from keras.activations import softmax,elu,relu
from keras.optimizers import Adam
from keras.optimizers import adam
from keras.models import Sequential
from keras.optimizers import Adam, RMSprop
from keras.models import Model
#tf.compat.v1.disable_eager_execution()
import random
from collections import deque
client=airsim.MultirotorClient()
z=-5
memory_size=10000000
#pos_0=client.getMultirotorState().kinematics_estimated.position
#state_space=[84, 84]
#action_size=3

def OurModel(state_size,action_space):
    X_input=Input(state_size,name='Input')
    X=Conv2D(filters=32,kernel_size=(8,8),strides=(4,4),padding='valid',activation='relu')(X_input)
    X=MaxPool2D(pool_size=(2,2))(X)
    X=Conv2D(filters=64,kernel_size=(4,4),strides=(2,2),padding='valid',activation='relu')(X)
    X=MaxPool2D(pool_size=(2,2))(X)
    X=Conv2D(filters=64,kernel_size=(1,1),strides=(1,1),padding='valid',activation='relu')(X)
    X=Flatten()(X)
    X=Dense(525,activation='relu')(X)
    X=Dense(300,activation='relu')(X)
    X_output=Dense(action_space,activation='softmax')(X)
    model=Model(inputs = X_input, outputs = X_output)
    model.compile(loss="mse", optimizer=RMSprop(lr=0.0005, rho=0.95, epsilon=0.01), metrics=["accuracy"])
    model.summary()
    return model
class MemoryClass():
    def __init__(self,memory_size):
        self.memory_size=memory_size
        self.buffer=deque(maxlen=memory_size)
        self.batch_size=64
        #self.start_training=20

    def add(self,experience):
        self.buffer.append(experience)

    def sample(self):
        buffer_size=len(self.buffer)
        idx=np.random.choice(np.arange(buffer_size),self.batch_size,False)
        return [self.buffer[k] for k in idx]

    def replay(self):
        batch=self.sample()
        next_states_mb=np.array([each[0] for each in batch],ndmin=3)
        actions_mb=np.array([each[1] for each in batch])
        states_mb=np.array([each[2] for each in batch],ndmin=3)
        rewards_mb=np.array([each[3] for each in batch])
        dones_mb=np.array([each[4] for each in batch])
        return next_states_mb, actions_mb, states_mb, rewards_mb,dones_mb
class Agent():
    def __init__(self):
        self.state_size=(84, 84,1)
        self.action_space=3
        #self.DQNNetwork=DQNN(state_size,action_space)
        self.model1=OurModel(self.state_size,self.action_space)
        self.memory_size=10000000
        self.memory=MemoryClass(memory_size)
        self.gamma=0.75
        self.epsilon_min=0.001
        self.epsilon=1.0
        self.epsilon_decay=0.995
        self.episodes=120
        self.max_step=120
        self.step=0
        self.count=0
        self.pos0=client.getMultirotorState().kinematics_estimated.position
        self.z=-5
        self.goal_pos=[50,50]
        self.initial_position=[0,0]
        self.initial_distance=np.sqrt((self.initial_position[0]-self.goal_pos[0])**2+(self.initial_position[1]-self.goal_pos[1])**2)
        self.batch_size=30

    def generate_state(self):
        responses = client.simGetImages([airsim.ImageRequest("0", airsim.ImageType.DepthPerspective, True, False)])
        img1d = np.array(responses[0].image_data_float, dtype=np.float)
        #img1d = 255/np.maximum(np.ones(img1d.size), img1d)
        img2d = np.reshape(img1d, (responses[0].height, responses[0].width))
        from PIL import Image
        image = Image.fromarray(img2d)
        im_final = np.array(image.resize((84, 84)).convert('L'))
        im_final=np.reshape(im_final,[*self.state_size])
        return im_final

    def load(self, name):
        self.model1 = load_model(name)

    def save(self, name):
        self.model1.save(name)

    def get_yaw(self):
        quaternions=client.getMultirotorState().kinematics_estimated.orientation
        a,b,yaw_rad=to_eularian_angles(quaternions)
        yaw_deg=math.degrees(yaw_rad)
        return yaw_deg,yaw_rad

    def rotate_left(self):
        client.moveByRollPitchYawrateZAsync(0,0,0.2,self.z,3)
        n=int(3*5)
        D=[]
        done=False
        for k in range(n):
            collision=client.simGetCollisionInfo().has_collided
            done=collision
            D.append(collision)
            time.sleep(3/(n*300))
        if True in D:
            done=True
            time.sleep(3/300)
        time.sleep(5/300)
        new_state=self.generate_state()
        return done,new_state

    def rotate_right(self):
        client.moveByRollPitchYawrateZAsync(0,0,-0.2,self.z,3)
        n=int(3*5)
        D=[]
        done=False
        for k in range(n):
            collision=client.simGetCollisionInfo().has_collided
            done=collision
            D.append(collision)
            time.sleep(3/(n*300))
        if True in D:
            done=True
            time.sleep(3/300)
        time.sleep(5/300)
        new_state=self.generate_state()
        return done,new_state

    def move_forward(self):
        yaw_deg,yaw_rad=self.get_yaw()  #need rad
        vx=math.cos(yaw_rad)*0.25
        vy=math.sin(yaw_rad)*0.25
        client.moveByVelocityAsync(vx,vy,0,10,airsim.DrivetrainType.ForwardOnly,airsim.YawMode(False))
        done=False
        n=int(10*5)
        D=[]
        done=False
        for k in range(n):
            collision=client.simGetCollisionInfo().has_collided
            D.append(collision)
            time.sleep(3.4/(n*300))
        if True in D:
            done=True
        new_state=self.generate_state()
        time.sleep(15/300)
        return done,new_state

    def step_function(self,action):
        # Returns action,new_state, done
        # Move forward 3 meters by Pitch
        done=False
        if action==0:
            done,new_state=self.move_forward()
        # Rotate to right by 20 degress
        elif action==1:
            done,new_state=self.rotate_right()
        # Rotate to left by 30 degress
        elif action==2:
            done,new_state=self.rotate_left()
        self.count+=1
        return action,new_state,done

    def compute_reward(self,done):
        reward=0.0
        pos_now=client.getMultirotorState().kinematics_estimated.position
        dist=np.sqrt((pos_now.x_val-self.goal_pos[0])**2+(pos_now.y_val-self.goal_pos[1])**2)
        print('dist: ',dist)
        if done==False and self.step<self.max_step:
            reward+=(self.initial_distance-dist)*6
            if 10<self.step<40 and dist>self.initial_distance*3/4:
                reward=-2-(self.step-10)
            elif 50<self.step<80 and dist>self.initial_distance*2/4:
                reward=-36-(self.step-50)
            elif 80<self.step<self.max_step and dist>self.initial_distance*1/4:
                reward=-80-(self.step-80)
            elif dist<3:
                reward+=650.0
        elif done==True and dist>3:
            reward-=180.0
        print('reward: ',reward)
        return reward

    def choose_action(self,state):
        r=np.random.rand()
        print('r: ',r)
        print('epsilon: ',self.epsilon)
        print()
        if r>self.epsilon and self.count>64:
            #print('predicted action')
            state=np.reshape(state,[1,*self.state_size])
            #action=np.argmax(self.DQNNetwork.OurModel.predict(state))
            action=np.argmax(self.model1.predict(state))
        else:
            action=random.randrange(self.action_space)
        return action

    def reset(self):
        client.reset()

    def initial_pos(self):
        client.enableApiControl(True)
        v=0.6
        #z0=client.getMultirotorState().kinematics_estimated.position.z_val
        #t=np.abs(z0-self.z)/v
        client.moveToZAsync(self.z,v).join()
        #time.sleep(t+1)

    def epsilon_policy(self):
        # Update epsilon
        if self.epsilon>self.epsilon_min:
            self.epsilon*=self.epsilon_decay

    def train(self):
        for episode in range(self.episodes):
            self.initial_pos()
            self.step=0
            state=self.generate_state()
            done=False
            total_reward,episode_rewards=[],[]
            while self.step<self.max_step:
                self.step+=1
                print('count:', self.count)
                choice=self.choose_action(state)
                self.epsilon_policy()
                action,next_state,done=self.step_function(choice)
                reward=self.compute_reward(done)
                episode_rewards.append(reward)
                if done==True:
                    total_reward.append(sum(episode_rewards))
                    self.memory.add([next_state,action,state,reward,done])
                    self.step=self.max_step
                    self.reset()
                    print("episode: {}, epsilon: {:.5}, total reward :{}".format(episode, self.epsilon,total_reward[-1]))
                    self.save("airsim-dqn.h5")
                else:
                    state=next_state
                    self.memory.add([next_state,action,state,reward,done])
                if len(self.memory.buffer)>64:
                    next_states_mb, actions_mb, states_mb, rewards_mb,dones_mb=self.memory.replay()
                    target = self.model1.predict(states_mb)
                    target_next = self.model1.predict(next_states_mb)
                    for k in range(len(dones_mb)):
                        if dones_mb[k]==True:
                            target[k][actions_mb[k]] = rewards_mb[k]
                        elif dones_mb[k]==False:
                            target[k][actions_mb[k]] = rewards_mb[k] + self.gamma * (np.amax(target_next[k]))
                    self.model1.fit(x=states_mb,y=target,batch_size=self.batch_size)
agent=Agent()
agent.train()
```