
deepbots-tutorials's People

Contributors

allcontributors[bot], eakirtas, tsampazk


deepbots-tutorials's Issues

Usage of snake_case

Make sure all deepbots-tutorials variables follow the snake_case convention.
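A few illustrative renames, taken from identifiers that appear in the code posted in this repo's issues (the list is an example, not exhaustive):

    # camelCase (current)          snake_case (target)
    numberOfActorOutputs       ->  number_of_actor_outputs
    agentInput                 ->  agent_input
    batchSize                  ->  batch_size
    storeTransition()          ->  store_transition()
    trainStep()                ->  train_step()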

Emitter-Receiver scheme not working

I'm getting the following error for the emitterReceiverSchemeTutorial:

INFO: supervisorController: Starting controller: python -u supervisorController.py
Traceback (most recent call last):
  File "supervisorController.py", line 93, in <module>
    supervisor = CartPoleSupervisor()
  File "supervisorController.py", line 14, in __init__
    self.respawnRobot()
  File "supervisorController.py", line 30, in respawnRobot
    rootNode = self.supervisor.getRoot()  # This gets the root of the scene tree
  File "/usr/local/webots/lib/controller/python37/controller.py", line 2888, in <lambda>
    __getattr__ = lambda self, name: _swig_getattr(self, Supervisor, name)
  File "/usr/local/webots/lib/controller/python37/controller.py", line 96, in _swig_getattr
    raise AttributeError("'%s' object has no attribute '%s'" % (class_type.__name__, name))
AttributeError: 'Supervisor' object has no attribute 'supervisor'
WARNING: 'supervisorController' controller exited with status: 1.

I'm using the master branch of deepbots and Webots 2021a.
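A likely explanation, offered as a guess rather than a confirmed fix: on the deepbots master branch the supervisor classes inherit from Webots' Supervisor directly, so there is no separate self.supervisor attribute and Supervisor methods are called on self. Under that assumption, the offending line in respawnRobot would change roughly as follows (the surrounding lines are illustrative, not the tutorial's exact code):

    def respawnRobot(self):
        if self.robot is not None:
            self.robot.remove()                        # remove the previous robot node
        rootNode = self.getRoot()                      # was: self.supervisor.getRoot()
        childrenField = rootNode.getField('children')  # the scene tree's children field
        childrenField.importMFNode(-2, "Robot.wbo")    # robot definition file; name is illustrative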

Multi-robot setting

Hi, can deepbots be used in a multi-robot setting (i.e., where each robot has its own controller and interacts with the others)?
Thanks,

Julio

Add tutorial for respawn/reset

Basic usage of the new version relies on the pre-implemented simple reset method. A tutorial should be added for resetting the "old way", i.e. via reloading/respawning the robot; this is already shown in the beginner cartpole tutorial. A rough sketch is given below.
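A rough sketch of what such a tutorial section could cover, assuming a Supervisor-based controller and a robot definition exported to a .wbo file (names, the file name, and the get_default_observation hook are illustrative, not the tutorial's exact code):

    def reset(self):
        self.respawn_robot()
        return self.get_default_observation()

    def respawn_robot(self):
        if self.robot is not None:
            self.robot.remove()                               # despawn the current robot node
        children_field = self.getRoot().getField('children')  # children of the scene tree root
        children_field.importMFNode(-2, 'Robot.wbo')          # re-import the robot from file
        self.robot = self.getFromDef('ROBOT')                 # re-acquire the node reference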

Asking for help with steadily increasing memory usage: this code has only been modified slightly, yet memory keeps growing during the training process.

# The state input is the camera image
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.models as models
import gc
from torch.distributions import Categorical
from torch import from_numpy, no_grad, save, load, tensor, clamp
from torch import float as torch_float
from torch import long as torch_long
from torch import min as torch_min
from torch.utils.data.sampler import BatchSampler, SubsetRandomSampler
import numpy as np
from torch import manual_seed
from collections import namedtuple
import torch
import tracemalloc

Transition = namedtuple('Transition', ['state', 'action', 'a_log_prob', 'reward', 'next_state'])

class PPOAgent:
"""
PPOAgent implements the PPO RL algorithm (https://arxiv.org/abs/1707.06347).
It works with a set of discrete actions.
It uses the Actor and Critic neural network classes defined below.
"""

def __init__(self, numberOfActorOutputs, clip_param=0.2, max_grad_norm=0.5, ppo_update_iters=5,
             batch_size=8, gamma=0.99, use_cuda=False, actor_lr=0.001, critic_lr=0.003, seed=None):
    super().__init__()
    if seed is not None:
        manual_seed(seed)

    # Hyper-parameters
    self.clip_param = clip_param
    self.max_grad_norm = max_grad_norm
    self.ppo_update_iters = ppo_update_iters
    self.batch_size = batch_size
    self.gamma = gamma
    self.use_cuda = use_cuda

    # models
    self.actor_net = Actor(numberOfActorOutputs)
    self.critic_net = Critic()

    if self.use_cuda:
        self.actor_net.cuda()
        self.critic_net.cuda()

    # Create the optimizers
    self.actor_optimizer = optim.Adam(self.actor_net.parameters(), actor_lr)
    self.critic_net_optimizer = optim.Adam(self.critic_net.parameters(), critic_lr)

    # Training stats
    self.buffer = []

def work(self, agentInput, type_="simple"):
    """
    type_ == "simple"
        Implementation for a simple forward pass.
    type_ == "selectAction"
        Implementation for the forward pass, that returns a selected action according to the probability
        distribution and its probability.
    type_ == "selectActionMax"
        Implementation for the forward pass, that returns the max selected action.
    """
    # agentInput = from_numpy(agentInput).float().unsqueeze(0)
    if self.use_cuda:
        agentInput = agentInput.cuda()
    with no_grad():
        action_prob = self.actor_net(agentInput)

    if type_ == "simple":
        output = [action_prob[0][i].data.tolist() for i in range(len(action_prob[0]))]
        return output
    elif type_ == "selectAction":
        c = Categorical(action_prob)
        action = c.sample()
        return action.item(), action_prob[:, action.item()].item()
    elif type_ == "selectActionMax":
        return np.argmax(action_prob).item(), 1.0
    else:
        raise Exception("Wrong type in agent.work(), returning input")

def getValue(self, state):
    """
    Gets the value of the current state according to the critic model.

    :param state: agentInput
    :return: state's value
    """
    state = from_numpy(state)
    with no_grad():
        value = self.critic_net(state)
    return value.item()

def save(self, path):
    """
    Save actor and critic models in the path provided.
    :param path: path to save the models
    :return: None
    """
    save(self.actor_net.state_dict(), path + '_actor.pkl')
    save(self.critic_net.state_dict(), path + '_critic.pkl')
    print('模型保存成功')

def load(self, path):
    """
    Load actor and critic models from the path provided.
    :param path: path where the models are saved
    :return: None
    """
    actor_state_dict = load(path + '_actor.pkl')
    critic_state_dict = load(path + '_critic.pkl')
    self.actor_net.load_state_dict(actor_state_dict)
    self.critic_net.load_state_dict(critic_state_dict)
    print('模型加载成功')

def storeTransition(self, transition):
    """
    Stores a transition in the buffer to be used later.

    :param transition: state, action, action_prob, reward, next_state
    :return: None
    """
    self.buffer.append(transition)
    print(len(self.buffer))

def trainStep(self, batchSize=None):
    """
    Performs a training step or update for the actor and critic models, based on transitions gathered in the
    buffer. It then resets the buffer.
    If provided with a batchSize, this is used instead of default self.batch_size

    :param: batchSize: int
    :return: None
    """
    tracemalloc.start()

    snapshot1 = tracemalloc.take_snapshot()

    if batchSize is None:
        if len(self.buffer) < self.batch_size:
            return
        batchSize = self.batch_size
    # print(self.buffer[0].state.size())
    # mm = [t.state for t in self.buffer]
    # print(type(mm))
    # state = tensor(mm[0], dtype=torch_float)
    state = tensor([t.state.numpy() for t in self.buffer], dtype=torch_float)
    state = state.squeeze()
    print(state.size())
    action = tensor([t.action for t in self.buffer], dtype=torch_long).view(-1, 1)
    reward = [t.reward for t in self.buffer]
    old_action_log_prob = tensor([t.a_log_prob for t in self.buffer], dtype=torch_float).view(-1, 1)

    # Unroll rewards
    R = 0
    Gt = []
    for r in reward[::-1]:
        R = r + self.gamma * R
        Gt.insert(0, R)
    Gt = tensor(Gt, dtype=torch_float)

    if self.use_cuda:
        state, action, old_action_log_prob = state.cuda(), action.cuda(), old_action_log_prob.cuda()
        Gt = Gt.cuda()

    for _ in range(self.ppo_update_iters):
        for index in BatchSampler(SubsetRandomSampler(range(len(self.buffer))), batchSize, False):
            # Calculate the advantage at each step
            Gt_index = Gt[index].view(-1, 1)
            V = self.critic_net(state[index])
            delta = Gt_index - V
            advantage = delta.detach()

            # Get the current prob
            action_prob = self.actor_net(state[index]).gather(1, action[index])  # new policy

            # PPO
            ratio = (action_prob / old_action_log_prob[index])
            surr1 = ratio * advantage
            surr2 = clamp(ratio, 1 - self.clip_param, 1 + self.clip_param) * advantage

            # update actor network
            action_loss = -torch_min(surr1, surr2).mean()  # MAX->MIN descent
            self.actor_optimizer.zero_grad()
            action_loss.backward()
            nn.utils.clip_grad_norm_(self.actor_net.parameters(), self.max_grad_norm)
            self.actor_optimizer.step()

            # update critic network
            value_loss = F.mse_loss(Gt_index, V)
            self.critic_net_optimizer.zero_grad()
            value_loss.backward()
            nn.utils.clip_grad_norm_(self.critic_net.parameters(), self.max_grad_norm)
            self.critic_net_optimizer.step()

    del self.buffer[:]
    gc.collect()
    print('shit')
    snapshot2 = tracemalloc.take_snapshot()

    top_stats = snapshot2.compare_to(snapshot1, 'lineno')

    print("[ Top 10 differences ]")
    for stat in top_stats[:10]:
        print(stat)

class Actor(nn.Module):
    # Original fully-connected actor; note that it is shadowed by the ResNet-based
    # Actor defined further below, which is the one actually instantiated by PPOAgent.
    def __init__(self, numberOfInputs, numberOfOutputs):
        super(Actor, self).__init__()
        self.fc1 = nn.Linear(numberOfInputs, 10000)
        self.fc2 = nn.Linear(10000, 10000)
        self.action_head = nn.Linear(10000, numberOfOutputs)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        action_prob = F.softmax(self.action_head(x), dim=1)
        return action_prob

class Critic(nn.Module):
    # Original fully-connected critic; shadowed by the ResNet-based Critic below.
    def __init__(self, numberOfInputs):
        super(Critic, self).__init__()
        self.fc1 = nn.Linear(numberOfInputs, 10000)
        self.fc2 = nn.Linear(10000, 10000)
        self.state_value = nn.Linear(10000, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        value = self.state_value(x)
        return value

class Actor(nn.Module):
    # Modified actor: a pretrained ResNet-18 feature extractor followed by small
    # fully-connected layers producing the action probabilities.
    def __init__(self, numberOfOutputs):
        super(Actor, self).__init__()
        self.resnet18 = models.resnet18(pretrained=True)
        self.fc1 = nn.Linear(1000, 32)
        self.fc2 = nn.Linear(32, 8)
        self.action_head = nn.Linear(8, numberOfOutputs)

    def forward(self, x):
        x = self.resnet18(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        action_prob = F.softmax(self.action_head(x), dim=1)
        return action_prob

class Critic(nn.Module):
    # Modified critic: the same ResNet-18 backbone followed by a scalar value head.
    def __init__(self):
        super(Critic, self).__init__()
        self.resnet18 = models.resnet18(pretrained=True)
        self.fc1 = nn.Linear(1000, 32)
        self.fc2 = nn.Linear(32, 8)
        self.state_value = nn.Linear(8, 1)

    def forward(self, x):
        x = self.resnet18(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        value = self.state_value(x)
        return value

The problem about the loss function of actor network

Hi! Could you please tell me where I can find the method named 'torch_min' and the corresponding module "torch._C._VariableFunctions"?
Here is your code:

action_loss = -torch_min(surr1, surr2).mean()

From what I have read, this function only appears in the original PyTorch code implemented in C/C++, but I still could not find the method even after going through the PyTorch repositories. Without a precise understanding of this method, I cannot follow the meaning of the actor network's loss function in your code. Many thanks!
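For reference, in the PPO agent code quoted in the issue above, torch_min is simply an alias created by the import "from torch import min as torch_min", i.e. it is torch.min. Called with two tensors it returns the element-wise minimum, which is what the clipped PPO surrogate objective needs. A small self-contained check:

    import torch

    surr1 = torch.tensor([0.5, 1.2, -0.3])
    surr2 = torch.tensor([0.7, 1.0, -0.1])
    print(torch.min(surr1, surr2))           # tensor([ 0.5000,  1.0000, -0.3000]), element-wise minimum
    print(-torch.min(surr1, surr2).mean())   # the actor loss: negated mean, so gradient descent maximizes the surrogate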

A doubt for behaviors that the agent finally converged

Hi, sorry to trouble you again. In recent months I implemented a 'find-and-avoid' project myself, following your approach for the algorithm parts and other papers I read for the reward parts. However, the agent ultimately just keeps going around in circles. After checking repeatedly, I think the most likely reason is reward sparsity, which commonly leads to poor exploration and causes the behavior to get stuck in a local optimum. Based on this, I hope to discuss two questions with you; I hope this will not take up too much of your time!
Q1. Based on your experience, how much improvement could I expect from deploying curiosity-driven learning to address the poor exploration?
Q2. I found that another factor differing from your setup is the action mask. I have not deployed this method yet, but may I ask how much improvement it could bring once implemented?
Thank you very much! I'll wait for your reply.

AttributeError: 'CartpoleRobot' object has no attribute 'getSelf'

Hi,
I followed the instructions in the repo but got this error:

Traceback (most recent call last):
  File "supervisorController.py", line 84, in <module>
    env = CartpoleRobot()
  File "supervisorController.py", line 15, in __init__
    self.robot = self.getSelf()  # Grab the robot reference from the supervisor to access various robot methods
AttributeError: 'CartpoleRobot' object has no attribute 'getSelf'
WARNING: 'supervisorController' controller exited with status: 1.
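A hedged checklist rather than a definitive fix: getSelf() is a Webots Supervisor method, so it is only available when the environment class ultimately inherits from Webots' Supervisor (via the deepbots robot-supervisor base class) and the Robot node running this controller has its supervisor field set to TRUE in the world file. In sketch form (import path and class name vary across deepbots versions):

    from deepbots.supervisor.controllers.robot_supervisor import RobotSupervisor  # path may vary by deepbots version

    class CartpoleRobot(RobotSupervisor):
        def __init__(self):
            super().__init__()            # without this call the Supervisor side is never initialized
            self.robot = self.getSelf()   # resolvable only if the Robot node has supervisor TRUE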

NotImplementedError in get_info; deepbots 0.1.3.dev2; tutorial

I get the NotImplementedError when I run the CartPole tutorial in Webots R2022a:

INFO: robotSupervisorControlle: Starting controller: python.exe -u robotSupervisorControlle.py
Traceback (most recent call last):
  File "C:\Users\admin\Desktop\ReinLearn\mpsees_rel\webot_world\controllers\robotSupervisorControlle\robotSupervisorControlle.py", line 107, in <module>
    newObservation, reward, done, info = env.step([selectedAction])
  File "C:\Python39\lib\site-packages\deepbots\supervisor\controllers\robot_supervisor.py", line 84, in step
    self.get_info(),
  File "C:\Python39\lib\site-packages\deepbots\supervisor\controllers\supervisor_env.py", line 117, in get_info
    raise NotImplementedError
NotImplementedError
WARNING: 'robotSupervisorControlle' controller exited with status: 1.
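Judging from the traceback alone, deepbots' base supervisor env class raises NotImplementedError for get_info(), so the environment subclass used in the tutorial world has to override it. A minimal, hedged sketch (placed inside the user's environment class; the empty dict is an assumption about what the training loop needs):

    def get_info(self):
        # The base class deliberately raises NotImplementedError here;
        # returning an empty dict is enough for the tutorial's training loop.
        return {}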

Webots 2020a rev2 changes with reset

Webots 2020a rev2 will fix some issues related to resetting the Webots world without resetting the controllers (https://cyberbotics.com/doc/reference/supervisor#wb_supervisor_simulation_reset).

When the new version is released, the tutorials should be updated to use the new deepbots capabilities (aidudezzz/deepbots#25).

Related aidudezzz/deepworlds#8

EDIT: The emitter/receiver tutorial will keep the custom reset as an example; the newest robotSupervisor tutorial uses the new reset (sketched below).
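For reference, the "new reset" boils down to the Supervisor API added around R2020a rev2. A minimal sketch, assuming a deepbots-style environment class (get_default_observation is the deepbots hook for the post-reset observation):

    def reset(self):
        self.simulationReset()           # reset the world in place, without restarting controllers
        self.simulationResetPhysics()    # also clear velocities and forces
        return self.get_default_observation()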

Comments on the robotSupervisorSchemeTutorial

AttributeError: 'CartPoleSupervisor' object has no attribute 'supervisor'

This problem occurs when I open the emitterReceiverSchemeTutorial's world, and I am looking for help to solve it.

The robot doesn't move during iterations

Hi, I am using deepbots to construct an environment for my path-planning project. I have successfully run the CartPole example from the tutorial. However, when I tried to construct an action space of my own, I kept failing. Following the same approach as CartPole, I set my action space to Discrete(3) instead of 2, meaning my robot has three actions in total:

# six elements include x-axis position, y-axis position, ps0 value, ps1 value, ps6 value, ps7 value.
self.observation_space = Box(low=np.array([-0.25, -0.88, 0.00, 0.00, 0.00, 0.00]),
                                     high=np.array([0.75, 0.12, 4095.00, 4095.00, 4095.00, 4095.00]),
                                     dtype=np.float64)

self.action_space = Discrete(3)

self.robot = self.getSelf()

self.leftMotor = self.getDevice("left wheel motor")
self.rightMotor = self.getDevice("right wheel motor")
self.leftMotor.setPosition(float('inf'))
self.rightMotor.setPosition(float('inf'))
self.leftMotor.setVelocity(0.0)
self.rightMotor.setVelocity(0.0)

Additionally, I did the same in the apply_action() method according to the tutorial, mapping the three actions to concrete robot behaviors:

def apply_action(self, action):
        action = int(action[0])
        print("action: ", action)

        l_speed = 3.14
        r_speed = 3.14

        if action == 0:     # go straight
            l_speed = 3.14
            r_speed = 3.14
        if action == 1:     # turn right
            l_speed = 3.14
            r_speed = -3.14
        if action == 2:     # turn left
            l_speed = -3.14
            r_speed = 3.14

        self.leftMotor.setVelocity(l_speed)
        self.rightMotor.setVelocity(r_speed)

The control logic may not be correct, but the robot should at least move stochastically. During the iterations, however, I found that my robot (an e-puck in my project) never moved even a single step, no matter how many timesteps I set. I don't know what is wrong with my code and hope someone can help me fix it. Many thanks!
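One way to narrow this down, offered as a hedged debugging sketch rather than a fix: run the same motor setup in a plain Webots controller, outside the RL loop, to check whether the devices themselves respond (device names as in the question, assuming an e-puck):

    from controller import Robot

    robot = Robot()
    timestep = int(robot.getBasicTimeStep())

    left = robot.getDevice("left wheel motor")
    right = robot.getDevice("right wheel motor")
    left.setPosition(float('inf'))    # velocity-control mode
    right.setPosition(float('inf'))
    left.setVelocity(3.14)
    right.setVelocity(3.14)

    while robot.step(timestep) != -1:
        pass  # the robot should drive straight; if it does here but not in the RL setup,
              # the problem is likely in how/when step() is called around apply_action()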
