aidudezzz / deepbots-tutorials
Tutorials for the deepbots framework https://github.com/aidudezzz/deepbots
Make sure all deepbots-tutorials variables follow the snake_case convention.
The code gets stuck in the environment creation step for both the supervisor and emitter tutorials. I am running Windows 10 and have a pretty decent GPU.
Since we are planning a major release, the tutorials should be updated.
I'm getting the following error for emitterReceiverSchemeTutorial:
INFO: supervisorController: Starting controller: python -u supervisorController.py
Traceback (most recent call last):
  File "supervisorController.py", line 93, in <module>
    supervisor = CartPoleSupervisor()
  File "supervisorController.py", line 14, in __init__
    self.respawnRobot()
  File "supervisorController.py", line 30, in respawnRobot
    rootNode = self.supervisor.getRoot() # This gets the root of the scene tree
  File "/usr/local/webots/lib/controller/python37/controller.py", line 2888, in <lambda>
    __getattr__ = lambda self, name: _swig_getattr(self, Supervisor, name)
  File "/usr/local/webots/lib/controller/python37/controller.py", line 96, in _swig_getattr
    raise AttributeError("'%s' object has no attribute '%s'" % (class_type.__name__, name))
AttributeError: 'Supervisor' object has no attribute 'supervisor'
WARNING: 'supervisorController' controller exited with status: 1.
I'm using the master branch of deepbots and Webots 2021a.
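The traceback shows the lookup failing inside the generated `Supervisor.__getattr__`, which suggests the controller object is itself a `Supervisor` subclass and has no separate `self.supervisor` attribute. The sketch below illustrates that difference with a stand-in class; `FakeSupervisor` and the two method names are hypothetical and are not part of Webots or deepbots, and whether this matches the actual fix for the tutorial is an assumption.

```python
class FakeSupervisor:
    """Hypothetical stand-in for Webots' Supervisor; not the real API."""

    def getRoot(self):
        return "root-node"


class CartPoleSupervisor(FakeSupervisor):
    """Inherits from the supervisor, so supervisor methods live on self."""

    def respawn_robot_old(self):
        # Composition-style access: fails, because no 'supervisor'
        # attribute was ever set on this object.
        return self.supervisor.getRoot()

    def respawn_robot_new(self):
        # Inheritance-style access: call the method on self directly.
        return self.getRoot()


env = CartPoleSupervisor()

try:
    env.respawn_robot_old()
    old_style_error = None
except AttributeError as exc:
    old_style_error = str(exc)

root = env.respawn_robot_new()
print(old_style_error)
print(root)
```

If the installed deepbots version does use inheritance, replacing `self.supervisor.getRoot()` with `self.getRoot()` in the controller would be the analogous change.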
The link to the PPOAgent.py script in the CartPole beginner tutorial gives a 404 error.
I believe the link should be updated to the following URL to resolve the issue;
Thanks!
Hi, can deepbots be used in a multi robot setting? (i.e., where each robot has its own controller and interacts with the others)
Thanks,
Julio
Basic usage of the new version relies on the pre-implemented simple reset method. A tutorial should be added for resetting the "old way", i.e. by reloading/respawning the robot. This is already covered in the beginner cartpole tutorial.
# The state input is a camera image
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.models as models
import gc
from torch.distributions import Categorical
from torch import from_numpy, no_grad, save, load, tensor, clamp
from torch import float as torch_float
from torch import long as torch_long
from torch import min as torch_min
from torch.utils.data.sampler import BatchSampler, SubsetRandomSampler
import numpy as np
from torch import manual_seed
from collections import namedtuple
import torch
import tracemalloc
Transition = namedtuple('Transition', ['state', 'action', 'a_log_prob', 'reward', 'next_state'])
class PPOAgent:
    """
    PPOAgent implements the PPO RL algorithm (https://arxiv.org/abs/1707.06347).
    It works with a set of discrete actions.
    It uses the Actor and Critic neural network classes defined below.
    """
    def __init__(self, numberOfActorOutputs, clip_param=0.2, max_grad_norm=0.5, ppo_update_iters=5,
                 batch_size=8, gamma=0.99, use_cuda=False, actor_lr=0.001, critic_lr=0.003, seed=None):
        super().__init__()
        if seed is not None:
            manual_seed(seed)
        # Hyper-parameters
        self.clip_param = clip_param
        self.max_grad_norm = max_grad_norm
        self.ppo_update_iters = ppo_update_iters
        self.batch_size = batch_size
        self.gamma = gamma
        self.use_cuda = use_cuda
        # models
        self.actor_net = Actor(numberOfActorOutputs)
        self.critic_net = Critic()
        if self.use_cuda:
            self.actor_net.cuda()
            self.critic_net.cuda()
        # Create the optimizers
        self.actor_optimizer = optim.Adam(self.actor_net.parameters(), actor_lr)
        self.critic_net_optimizer = optim.Adam(self.critic_net.parameters(), critic_lr)
        # Training stats
        self.buffer = []

    def work(self, agentInput, type_="simple"):
        """
        type_ == "simple"
            Implementation for a simple forward pass.
        type_ == "selectAction"
            Implementation for the forward pass, that returns a selected action according to the probability
            distribution and its probability.
        type_ == "selectActionMax"
            Implementation for the forward pass, that returns the max selected action.
        """
        # agentInput = from_numpy(agentInput).float().unsqueeze(0)
        if self.use_cuda:
            agentInput = agentInput.cuda()
        with no_grad():
            action_prob = self.actor_net(agentInput)
        if type_ == "simple":
            output = [action_prob[0][i].data.tolist() for i in range(len(action_prob[0]))]
            return output
        elif type_ == "selectAction":
            c = Categorical(action_prob)
            action = c.sample()
            return action.item(), action_prob[:, action.item()].item()
        elif type_ == "selectActionMax":
            return np.argmax(action_prob).item(), 1.0
        else:
            raise Exception("Wrong type in agent.work(), returning input")

    def getValue(self, state):
        """
        Gets the value of the current state according to the critic model.
        :param state: agentInput
        :return: state's value
        """
        state = from_numpy(state)
        with no_grad():
            value = self.critic_net(state)
        return value.item()
    def save(self, path):
        """
        Save actor and critic models in the path provided.
        :param path: path to save the models
        :return: None
        """
        save(self.actor_net.state_dict(), path + '_actor.pkl')
        save(self.critic_net.state_dict(), path + '_critic.pkl')
        print('Model saved successfully')

    def load(self, path):
        """
        Load actor and critic models from the path provided.
        :param path: path where the models are saved
        :return: None
        """
        actor_state_dict = load(path + '_actor.pkl')
        critic_state_dict = load(path + '_critic.pkl')
        self.actor_net.load_state_dict(actor_state_dict)
        self.critic_net.load_state_dict(critic_state_dict)
        print('Model loaded successfully')
    def storeTransition(self, transition):
        """
        Stores a transition in the buffer to be used later.
        :param transition: state, action, action_prob, reward, next_state
        :return: None
        """
        self.buffer.append(transition)
        print(len(self.buffer))

    def trainStep(self, batchSize=None):
        """
        Performs a training step or update for the actor and critic models, based on transitions gathered in the
        buffer. It then resets the buffer.
        If provided with a batchSize, this is used instead of default self.batch_size
        :param: batchSize: int
        :return: None
        """
        tracemalloc.start()
        snapshot1 = tracemalloc.take_snapshot()
        if batchSize is None:
            if len(self.buffer) < self.batch_size:
                return
            batchSize = self.batch_size
        # print(self.buffer[0].state.size())
        # mm = [t.state for t in self.buffer]
        # print(type(mm))
        # state = tensor(mm[0], dtype=torch_float)
        state = tensor([t.state.numpy() for t in self.buffer], dtype=torch_float)
        state = state.squeeze()
        print(state.size())
        action = tensor([t.action for t in self.buffer], dtype=torch_long).view(-1, 1)
        reward = [t.reward for t in self.buffer]
        old_action_log_prob = tensor([t.a_log_prob for t in self.buffer], dtype=torch_float).view(-1, 1)
        # Unroll rewards
        R = 0
        Gt = []
        for r in reward[::-1]:
            R = r + self.gamma * R
            Gt.insert(0, R)
        Gt = tensor(Gt, dtype=torch_float)
        if self.use_cuda:
            state, action, old_action_log_prob = state.cuda(), action.cuda(), old_action_log_prob.cuda()
            Gt = Gt.cuda()
        for _ in range(self.ppo_update_iters):
            for index in BatchSampler(SubsetRandomSampler(range(len(self.buffer))), batchSize, False):
                # Calculate the advantage at each step
                Gt_index = Gt[index].view(-1, 1)
                V = self.critic_net(state[index])
                delta = Gt_index - V
                advantage = delta.detach()
                # Get the current prob
                action_prob = self.actor_net(state[index]).gather(1, action[index])  # new policy
                # PPO
                ratio = (action_prob / old_action_log_prob[index])
                surr1 = ratio * advantage
                surr2 = clamp(ratio, 1 - self.clip_param, 1 + self.clip_param) * advantage
                # update actor network
                action_loss = -torch_min(surr1, surr2).mean()  # MAX->MIN descent
                self.actor_optimizer.zero_grad()
                action_loss.backward()
                nn.utils.clip_grad_norm_(self.actor_net.parameters(), self.max_grad_norm)
                self.actor_optimizer.step()
                # update critic network
                value_loss = F.mse_loss(Gt_index, V)
                self.critic_net_optimizer.zero_grad()
                value_loss.backward()
                nn.utils.clip_grad_norm_(self.critic_net.parameters(), self.max_grad_norm)
                self.critic_net_optimizer.step()
        del self.buffer[:]
        gc.collect()
        print('shit')
        snapshot2 = tracemalloc.take_snapshot()
        top_stats = snapshot2.compare_to(snapshot1, 'lineno')
        print("[ Top 10 differences ]")
        for stat in top_stats[:10]:
            print(stat)
class Actor(nn.Module):
    def __init__(self, numberOfOutputs):
        super(Actor, self).__init__()
        self.resnet18 = models.resnet18(pretrained=True)
        self.fc1 = nn.Linear(1000, 32)
        self.fc2 = nn.Linear(32, 8)
        self.action_head = nn.Linear(8, numberOfOutputs)

    def forward(self, x):
        x = self.resnet18(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        action_prob = F.softmax(self.action_head(x), dim=1)
        return action_prob


class Critic(nn.Module):
    def __init__(self):
        super(Critic, self).__init__()
        self.resnet18 = models.resnet18(pretrained=True)
        self.fc1 = nn.Linear(1000, 32)
        self.fc2 = nn.Linear(32, 8)
        self.state_value = nn.Linear(8, 1)

    def forward(self, x):
        x = self.resnet18(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        value = self.state_value(x)
        return value
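The "Unroll rewards" loop in trainStep turns per-step rewards into discounted returns by walking the episode backwards. As a rough illustration, the same recurrence can be written as a standalone function in plain Python; the function name `discounted_returns` is made up for this sketch, and appending then reversing is a swapped-in variation on the posted `Gt.insert(0, R)`.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return can reuse the next one.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    # The list was built last-step-first, so flip it back to episode order.
    returns.reverse()
    return returns


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

Appending and reversing once avoids shifting the whole list on every `insert(0, ...)`, which may matter for long episodes.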
Hi! Could you please tell me where I can find the method named 'torch_min' and the corresponding folder "torch._C._VariableFunctions"?
Here is your code:
action_loss = -torch_min(surr1, surr2).mean()
After reading some sources, I learned that this function only appears in the original PyTorch code implemented in C/C++. But I still could not find this method, even after opening the PyTorch repositories. Without an exact understanding of this method, I cannot understand the loss function of the actor network in your code. Many thanks!!
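In the agent code pasted earlier on this page, the name comes from the import line `from torch import min as torch_min`, so it is an alias for `torch.min`; called with two tensors, `torch.min` returns their element-wise minimum. The sketch below mirrors the actor loss in plain Python so it can be read without PyTorch; the function name `clipped_surrogate_loss` and the sample numbers are illustrative assumptions, not part of the tutorial.

```python
def clipped_surrogate_loss(ratios, advantages, clip_param=0.2):
    """Plain-Python version of -min(surr1, surr2).mean() from PPO."""
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        # Unclipped objective: probability ratio times advantage.
        surr1 = ratio * advantage
        # Clipped objective: the ratio is limited to [1 - eps, 1 + eps].
        clipped_ratio = max(1 - clip_param, min(ratio, 1 + clip_param))
        surr2 = clipped_ratio * advantage
        # The element-wise minimum keeps the more pessimistic estimate.
        terms.append(min(surr1, surr2))
    # Negate because optimizers minimize, while PPO maximizes the objective.
    return -sum(terms) / len(terms)


loss = clipped_surrogate_loss([1.5, 0.5], [1.0, -1.0])
print(loss)
```

Taking the minimum means a large policy change cannot increase the objective beyond what the clipped ratio allows, which is what keeps each update conservative.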
Hi, sorry to trouble you again. In recent months I implemented a 'find-and-avoid' project myself, following your ideas for the algorithm and other papers I read for the reward design. However, the agent ends up going around in circles. After checking repeatedly, I think the most likely reason is reward sparsity, which commonly leads to poor exploration, so that the behavior gets stuck in a local optimum. Based on this, I would like to discuss two questions with you; I hope this will not take too much of your time!
Q1. In your experience, how much improvement can I expect if I use curiosity-driven learning to address the poor exploration?
Q2. Another difference from your implementation is the action mask, which I have not deployed yet. How much improvement could I expect from adding it?
Thank you very much! I'll wait for your reply.
Hi
I followed the instructions in the repo but got this error:
Traceback (most recent call last):
  File "supervisorController.py", line 84, in <module>
    env = CartpoleRobot()
  File "supervisorController.py", line 15, in __init__
    self.robot = self.getSelf() # Grab the robot reference from the supervisor to access various robot methods
AttributeError: 'CartpoleRobot' object has no attribute 'getSelf'
WARNING: 'supervisorController' controller exited with status: 1.
I get a NotImplementedError when I run the CartPole tutorial in Webots R2022a:
INFO: robotSupervisorControlle: Starting controller: python.exe -u robotSupervisorControlle.py
Traceback (most recent call last):
  File "C:\Users\admin\Desktop\ReinLearn\mpsees_rel\webot_world\controllers\robotSupervisorControlle\robotSupervisorControlle.py", line 107, in <module>
    newObservation, reward, done, info = env.step([selectedAction])
  File "C:\Python39\lib\site-packages\deepbots\supervisor\controllers\robot_supervisor.py", line 84, in step
    self.get_info(),
  File "C:\Python39\lib\site-packages\deepbots\supervisor\controllers\supervisor_env.py", line 117, in get_info
    raise NotImplementedError
NotImplementedError
WARNING: 'robotSupervisorControlle' controller exited with status: 1
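According to the traceback, `step` calls `self.get_info()` and the base implementation in supervisor_env.py raises NotImplementedError, which suggests the environment subclass is expected to provide its own `get_info`. A minimal sketch of that pattern follows; `FakeEnvBase` is a hypothetical stand-in rather than the deepbots class, and returning an empty dict is an assumption about what is acceptable when there is nothing extra to report.

```python
class FakeEnvBase:
    """Hypothetical stand-in for the base class named in the traceback."""

    def get_info(self):
        raise NotImplementedError

    def step(self, action):
        # The traceback shows step() calling self.get_info(); the other
        # return values are left out of this sketch.
        return self.get_info()


class CartpoleRobot(FakeEnvBase):
    def get_info(self):
        # Override the base method so step() no longer raises.
        return {}


info = CartpoleRobot().step([0])
print(info)  # {}
```

If the controller was written against older camelCase method names, checking that its overrides match the method names of the installed deepbots version may be a useful first step.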
Hi,
In the robotSupervisorController file, line 111, the code is:
agent.trainStep(batchSize=step)
Should it be:
agent.trainStep(batchSize=step + 1)
Otherwise, in the PPO_agent file, line 145, the samples are split into batches of (len - 1) and 1, which seems odd.
Thanks a lot.
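The split described in this question can be reproduced with simple arithmetic. The agent code pasted earlier on this page passes `False` as the third argument of `BatchSampler`, so an incomplete last batch is kept rather than dropped. The sketch below only models the resulting batch sizes; `batch_sizes` is a hypothetical helper, and the assumption that the buffer holds `step + 1` transitions when `trainStep` is called comes from the question, not from verified tutorial code.

```python
def batch_sizes(buffer_len, batch_size, drop_last=False):
    """Sizes of the batches a sampler would yield over buffer_len items."""
    full, remainder = divmod(buffer_len, batch_size)
    sizes = [batch_size] * full
    # With drop_last=False the leftover items form one extra, smaller batch.
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes


step = 194             # hypothetical zero-based index of the last step
buffer_len = step + 1  # assumed number of stored transitions

print(batch_sizes(buffer_len, step))      # [194, 1]
print(batch_sizes(buffer_len, step + 1))  # [195]
```

Under that assumption, `batchSize=step` yields one large batch plus a single-sample batch per update iteration, while `batchSize=step + 1` yields one batch covering the whole buffer.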
I followed the tutorial, but I don't know why poleEndPoint.getVelocity returns NaN.
Webots 2020a rev2 will fix some issues related to resetting the Webots world without resetting the controllers (https://cyberbotics.com/doc/reference/supervisor#wb_supervisor_simulation_reset).
When the new version is released, tutorials should be updated to use new deepbots capabilities (aidudezzz/deepbots#25).
Related aidudezzz/deepworlds#8
EDIT: The emitter/receiver tutorial will keep the custom reset as an example; the newest robotSupervisor tutorial uses the new reset.
AttributeError: 'CartPoleSupervisor' object has no attribute 'supervisor'
This problem occurs when I open the emitterReceiverSchemeTutorial world. I am looking for help to solve it.
I'm trying to follow the tutorial, but it seems that the link for cartpole_robot_definition.txt is not working. (Not a big problem though, since the file can be found in the full_project directory here: https://github.com/aidudezzz/deepbots-tutorials/blob/update-emitter-receiver-tutorial/robotSupervisorSchemeTutorial/full_project/controllers/robot_supervisor_controller/cartpole_robot_definition.txt)
Hi, I am using deepbots to construct an environment for my path planning project. I have successfully run the CartPole example from the tutorial. However, whenever I try to construct an action space of my own, it fails. Following the CartPole setup, I set my action space to Discrete(3) instead of Discrete(2), meaning my robot has three actions in total:
# The six elements are x-axis position, y-axis position, ps0 value, ps1 value, ps6 value, ps7 value.
self.observation_space = Box(low=np.array([-0.25, -0.88, 0.00, 0.00, 0.00, 0.00]),
                             high=np.array([0.75, 0.12, 4095.00, 4095.00, 4095.00, 4095.00]),
                             dtype=np.float64)
self.action_space = Discrete(3)
self.robot = self.getSelf()
self.leftMotor = self.getDevice("left wheel motor")
self.rightMotor = self.getDevice("right wheel motor")
self.leftMotor.setPosition(float('inf'))
self.rightMotor.setPosition(float('inf'))
self.leftMotor.setVelocity(0.0)
self.rightMotor.setVelocity(0.0)
Additionally, I followed the tutorial for the apply_action() method, in which I mapped the three actions to specific robot behaviors:
def apply_action(self, action):
    action = int(action[0])
    print("action: ", action)
    l_speed = 3.14
    r_speed = 3.14
    if action == 0:  # go straight
        l_speed = 3.14
        r_speed = 3.14
    if action == 1:  # turn right
        l_speed = 3.14
        r_speed = -3.14
    if action == 2:  # turn left
        l_speed = -3.14
        r_speed = 3.14
    self.leftMotor.setVelocity(l_speed)
    self.rightMotor.setVelocity(r_speed)
Here, the control algorithm may not be correct, but the robot should at least move randomly. During iteration, however, I found that my robot (in my project I used an e-puck robot) never moved even one step, no matter how many timesteps I set. I don't know what is wrong with my code and hope someone can help me fix it. Many thanks!
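One way to narrow this down is to check the action-to-speed mapping on its own, outside Webots. The sketch below restates the mapping from the snippet as a lookup table; `ACTION_TO_SPEEDS` and `wheel_speeds` are hypothetical names, and the sketch does not diagnose why the robot stays still, it only lets this part be ruled out.

```python
MAX_SPEED = 3.14  # wheel speed value taken from the posted snippet

# (left, right) wheel speeds for each discrete action.
ACTION_TO_SPEEDS = {
    0: (MAX_SPEED, MAX_SPEED),    # go straight
    1: (MAX_SPEED, -MAX_SPEED),   # turn right
    2: (-MAX_SPEED, MAX_SPEED),   # turn left
}


def wheel_speeds(action):
    """Map an action given as a one-element list to (left, right) speeds."""
    index = int(action[0])
    if index not in ACTION_TO_SPEEDS:
        raise ValueError(f"unknown action: {index}")
    return ACTION_TO_SPEEDS[index]


for candidate in ([0], [1], [2]):
    print(candidate, wheel_speeds(candidate))
```

If the mapping prints the expected speeds, the next things to check would be whether apply_action() is actually called on every step and whether the motor device names match the robot model.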