
deepbots-tutorials's People

Contributors

allcontributors[bot], eakirtas, tsampazk


deepbots-tutorials's Issues

Usage of snake_case

Make sure all deepbots-tutorials variables follow the snake_case convention.
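A few illustrative renames, taken from identifiers that appear in the code posted in this repo's issues (the list is an example, not exhaustive):

    # camelCase (current)          snake_case (target)
    numberOfActorOutputs       ->  number_of_actor_outputs
    agentInput                 ->  agent_input
    batchSize                  ->  batch_size
    storeTransition()          ->  store_transition()
    trainStep()                ->  train_step()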

Emitter-Receiver scheme not working

I'm getting the following error for the emitterReceiverSchemeTutorial:

INFO: supervisorController: Starting controller: python -u supervisorController.py
Traceback (most recent call last):
  File "supervisorController.py", line 93, in <module>
    supervisor = CartPoleSupervisor()
  File "supervisorController.py", line 14, in __init__
    self.respawnRobot()
  File "supervisorController.py", line 30, in respawnRobot
    rootNode = self.supervisor.getRoot()  # This gets the root of the scene tree
  File "/usr/local/webots/lib/controller/python37/controller.py", line 2888, in <lambda>
    __getattr__ = lambda self, name: _swig_getattr(self, Supervisor, name)
  File "/usr/local/webots/lib/controller/python37/controller.py", line 96, in _swig_getattr
    raise AttributeError("'%s' object has no attribute '%s'" % (class_type.__name__, name))
AttributeError: 'Supervisor' object has no attribute 'supervisor'
WARNING: 'supervisorController' controller exited with status: 1.

I'm using the master branch of deepbots and Webots 2021a.
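A likely explanation, offered as a guess rather than a confirmed fix: on the deepbots master branch the supervisor classes inherit from Webots' Supervisor directly, so there is no separate self.supervisor attribute and Supervisor methods are called on self. Under that assumption, the offending line in respawnRobot would change roughly as follows (the surrounding lines are illustrative, not the tutorial's exact code):

    def respawnRobot(self):
        if self.robot is not None:
            self.robot.remove()                        # remove the previous robot node
        rootNode = self.getRoot()                      # was: self.supervisor.getRoot()
        childrenField = rootNode.getField('children')  # the scene tree's children field
        childrenField.importMFNode(-2, "Robot.wbo")    # robot definition file; name is illustrative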

Multi-robot setting

Hi, can deepbots be used in a multi-robot setting (i.e., where each robot has its own controller and interacts with the others)?
Thanks,

Julio

Add tutorial for respawn/reset

Basic usage of the new version relies on the pre-implemented simple reset method. A tutorial should be added for resetting the "old way", i.e. via reloading/respawning the robot; this is already shown in the beginner cartpole tutorial. A rough sketch is given below.
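A rough sketch of what such a tutorial section could cover, assuming a Supervisor-based controller and a robot definition exported to a .wbo file (names, the file name, and the get_default_observation hook are illustrative, not the tutorial's exact code):

    def reset(self):
        self.respawn_robot()
        return self.get_default_observation()

    def respawn_robot(self):
        if self.robot is not None:
            self.robot.remove()                               # despawn the current robot node
        children_field = self.getRoot().getField('children')  # children of the scene tree root
        children_field.importMFNode(-2, 'Robot.wbo')          # re-import the robot from file
        self.robot = self.getFromDef('ROBOT')                 # re-acquire the node reference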

Asking for help with steadily increasing memory usage: this code has only been modified slightly, yet memory keeps growing during the training process.

# The state input is the camera image
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.models as models
import gc
from torch.distributions import Categorical
from torch import from_numpy, no_grad, save, load, tensor, clamp
from torch import float as torch_float
from torch import long as torch_long
from torch import min as torch_min
from torch.utils.data.sampler import BatchSampler, SubsetRandomSampler
import numpy as np
from torch import manual_seed
from collections import namedtuple
import torch
import tracemalloc

Transition = namedtuple('Transition', ['state', 'action', 'a_log_prob', 'reward', 'next_state'])

class PPOAgent:
"""
PPOAgent implements the PPO RL algorithm (https://arxiv.org/abs/1707.06347).
It works with a set of discrete actions.
It uses the Actor and Critic neural network classes defined below.
"""

def __init__(self, numberOfActorOutputs, clip_param=0.2, max_grad_norm=0.5, ppo_update_iters=5,
             batch_size=8, gamma=0.99, use_cuda=False, actor_lr=0.001, critic_lr=0.003, seed=None):
    super().__init__()
    if seed is not None:
        manual_seed(seed)

    # Hyper-parameters
    self.clip_param = clip_param
    self.max_grad_norm = max_grad_norm
    self.ppo_update_iters = ppo_update_iters
    self.batch_size = batch_size
    self.gamma = gamma
    self.use_cuda = use_cuda

    # models
    self.actor_net = Actor(numberOfActorOutputs)
    self.critic_net = Critic()

    if self.use_cuda:
        self.actor_net.cuda()
        self.critic_net.cuda()

    # Create the optimizers
    self.actor_optimizer = optim.Adam(self.actor_net.parameters(), actor_lr)
    self.critic_net_optimizer = optim.Adam(self.critic_net.parameters(), critic_lr)

    # Training stats
    self.buffer = []

def work(self, agentInput, type_="simple"):
    """
    type_ == "simple"
        Implementation for a simple forward pass.
    type_ == "selectAction"
        Implementation for the forward pass, that returns a selected action according to the probability
        distribution and its probability.
    type_ == "selectActionMax"
        Implementation for the forward pass, that returns the max selected action.
    """
    # agentInput = from_numpy(agentInput).float().unsqueeze(0)
    if self.use_cuda:
        agentInput = agentInput.cuda()
    with no_grad():
        action_prob = self.actor_net(agentInput)

    if type_ == "simple":
        output = [action_prob[0][i].data.tolist() for i in range(len(action_prob[0]))]
        return output
    elif type_ == "selectAction":
        c = Categorical(action_prob)
        action = c.sample()
        return action.item(), action_prob[:, action.item()].item()
    elif type_ == "selectActionMax":
        return np.argmax(action_prob).item(), 1.0
    else:
        raise Exception("Wrong type in agent.work(), returning input")

def getValue(self, state):
    """
    Gets the value of the current state according to the critic model.

    :param state: agentInput
    :return: state's value
    """
    state = from_numpy(state)
    with no_grad():
        value = self.critic_net(state)
    return value.item()

def save(self, path):
    """
    Save actor and critic models in the path provided.
    :param path: path to save the models
    :return: None
    """
    save(self.actor_net.state_dict(), path + '_actor.pkl')
    save(self.critic_net.state_dict(), path + '_critic.pkl')
    print('模型保存成功')

def load(self, path):
    """
    Load actor and critic models from the path provided.
    :param path: path where the models are saved
    :return: None
    """
    actor_state_dict = load(path + '_actor.pkl')
    critic_state_dict = load(path + '_critic.pkl')
    self.actor_net.load_state_dict(actor_state_dict)
    self.critic_net.load_state_dict(critic_state_dict)
    print('模型加载成功')

def storeTransition(self, transition):
    """
    Stores a transition in the buffer to be used later.

    :param transition: state, action, action_prob, reward, next_state
    :return: None
    """
    self.buffer.append(transition)
    print(len(self.buffer))

def trainStep(self, batchSize=None):
    """
    Performs a training step or update for the actor and critic models, based on transitions gathered in the
    buffer. It then resets the buffer.
    If provided with a batchSize, this is used instead of default self.batch_size

    :param: batchSize: int
    :return: None
    """
    tracemalloc.start()

    snapshot1 = tracemalloc.take_snapshot()

    if batchSize is None:
        if len(self.buffer) < self.batch_size:
            return
        batchSize = self.batch_size
    # print(self.buffer[0].state.size())
    # mm = [t.state for t in self.buffer]
    # print(type(mm))
    # state = tensor(mm[0], dtype=torch_float)
    state = tensor([t.state.numpy() for t in self.buffer], dtype=torch_float)
    state = state.squeeze()
    print(state.size())
    action = tensor([t.action for t in self.buffer], dtype=torch_long).view(-1, 1)
    reward = [t.reward for t in self.buffer]
    old_action_log_prob = tensor([t.a_log_prob for t in self.buffer], dtype=torch_float).view(-1, 1)

    # Unroll rewards
    R = 0
    Gt = []
    for r in reward[::-1]:
        R = r + self.gamma * R
        Gt.insert(0, R)
    Gt = tensor(Gt, dtype=torch_float)

    if self.use_cuda:
        state, action, old_action_log_prob = state.cuda(), action.cuda(), old_action_log_prob.cuda()
        Gt = Gt.cuda()

    for _ in range(self.ppo_update_iters):
        for index in BatchSampler(SubsetRandomSampler(range(len(self.buffer))), batchSize, False):
            # Calculate the advantage at each step
            Gt_index = Gt[index].view(-1, 1)
            V = self.critic_net(state[index])
            delta = Gt_index - V
            advantage = delta.detach()

            # Get the current prob
            action_prob = self.actor_net(state[index]).gather(1, action[index])  # new policy

            # PPO
            ratio = (action_prob / old_action_log_prob[index])
            surr1 = ratio * advantage
            surr2 = clamp(ratio, 1 - self.clip_param, 1 + self.clip_param) * advantage

            # update actor network
            action_loss = -torch_min(surr1, surr2).mean()  # MAX->MIN descent
            self.actor_optimizer.zero_grad()
            action_loss.backward()
            nn.utils.clip_grad_norm_(self.actor_net.parameters(), self.max_grad_norm)
            self.actor_optimizer.step()

            # update critic network
            value_loss = F.mse_loss(Gt_index, V)
            self.critic_net_optimizer.zero_grad()
            value_loss.backward()
            nn.utils.clip_grad_norm_(self.critic_net.parameters(), self.max_grad_norm)
            self.critic_net_optimizer.step()

    del self.buffer[:]
    gc.collect()
    print('shit')
    snapshot2 = tracemalloc.take_snapshot()

    top_stats = snapshot2.compare_to(snapshot1, 'lineno')

    print("[ Top 10 differences ]")
    for stat in top_stats[:10]:
        print(stat)

class Actor(nn.Module):
    # Original fully-connected actor; note that it is shadowed by the ResNet-based
    # Actor defined further below, which is the one actually instantiated by PPOAgent.
    def __init__(self, numberOfInputs, numberOfOutputs):
        super(Actor, self).__init__()
        self.fc1 = nn.Linear(numberOfInputs, 10000)
        self.fc2 = nn.Linear(10000, 10000)
        self.action_head = nn.Linear(10000, numberOfOutputs)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        action_prob = F.softmax(self.action_head(x), dim=1)
        return action_prob

class Critic(nn.Module):
    # Original fully-connected critic; shadowed by the ResNet-based Critic below.
    def __init__(self, numberOfInputs):
        super(Critic, self).__init__()
        self.fc1 = nn.Linear(numberOfInputs, 10000)
        self.fc2 = nn.Linear(10000, 10000)
        self.state_value = nn.Linear(10000, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        value = self.state_value(x)
        return value

class Actor(nn.Module):
    # Modified actor: a pretrained ResNet-18 feature extractor followed by small
    # fully-connected layers producing the action probabilities.
    def __init__(self, numberOfOutputs):
        super(Actor, self).__init__()
        self.resnet18 = models.resnet18(pretrained=True)
        self.fc1 = nn.Linear(1000, 32)
        self.fc2 = nn.Linear(32, 8)
        self.action_head = nn.Linear(8, numberOfOutputs)

    def forward(self, x):
        x = self.resnet18(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        action_prob = F.softmax(self.action_head(x), dim=1)
        return action_prob

class Critic(nn.Module):
    # Modified critic: the same ResNet-18 backbone followed by a scalar value head.
    def __init__(self):
        super(Critic, self).__init__()
        self.resnet18 = models.resnet18(pretrained=True)
        self.fc1 = nn.Linear(1000, 32)
        self.fc2 = nn.Linear(32, 8)
        self.state_value = nn.Linear(8, 1)

    def forward(self, x):
        x = self.resnet18(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        value = self.state_value(x)
        return value

The problem about the loss function of actor network

Hi! Could you please tell me where I can find the method named 'torch_min' and the corresponding module "torch._C._VariableFunctions"?
Here is your code:

action_loss = -torch_min(surr1, surr2).mean()

From what I have read, this function only appears in the original PyTorch code implemented in C/C++, but I still could not find the method even after going through the PyTorch repositories. Without a precise understanding of this method, I cannot follow the meaning of the actor network's loss function in your code. Many thanks!
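For reference, in the PPO agent code quoted in the issue above, torch_min is simply an alias created by the import "from torch import min as torch_min", i.e. it is torch.min. Called with two tensors it returns the element-wise minimum, which is what the clipped PPO surrogate objective needs. A small self-contained check:

    import torch

    surr1 = torch.tensor([0.5, 1.2, -0.3])
    surr2 = torch.tensor([0.7, 1.0, -0.1])
    print(torch.min(surr1, surr2))           # tensor([ 0.5000,  1.0000, -0.3000]), element-wise minimum
    print(-torch.min(surr1, surr2).mean())   # the actor loss: negated mean, so gradient descent maximizes the surrogate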

A doubt for behaviors that the agent finally converged

Hi, sorry to trouble you again. In recent months I implemented a 'find-and-avoid' project myself, following your approach for the algorithm parts and other papers I read for the reward parts. However, the agent ultimately just keeps going around in circles. After checking repeatedly, I think the most likely reason is reward sparsity, which commonly leads to poor exploration and causes the behavior to get stuck in a local optimum. Based on this, I hope to discuss two questions with you; I hope this will not take up too much of your time!
Q1. Based on your experience, how much improvement could I expect from deploying curiosity-driven learning to address the poor exploration?
Q2. I found that another factor differing from your setup is the action mask. I have not deployed this method yet, but may I ask how much improvement it could bring once implemented?
Thank you very much! I'll wait for your reply.

AttributeError: 'CartpoleRobot' object has no attribute 'getSelf'

Hi,
I followed the instructions in the repo but got this error:

Traceback (most recent call last):
  File "supervisorController.py", line 84, in <module>
    env = CartpoleRobot()
  File "supervisorController.py", line 15, in __init__
    self.robot = self.getSelf()  # Grab the robot reference from the supervisor to access various robot methods
AttributeError: 'CartpoleRobot' object has no attribute 'getSelf'
WARNING: 'supervisorController' controller exited with status: 1.
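A hedged checklist rather than a definitive fix: getSelf() is a Webots Supervisor method, so it is only available when the environment class ultimately inherits from Webots' Supervisor (via the deepbots robot-supervisor base class) and the Robot node running this controller has its supervisor field set to TRUE in the world file. In sketch form (import path and class name vary across deepbots versions):

    from deepbots.supervisor.controllers.robot_supervisor import RobotSupervisor  # path may vary by deepbots version

    class CartpoleRobot(RobotSupervisor):
        def __init__(self):
            super().__init__()            # without this call the Supervisor side is never initialized
            self.robot = self.getSelf()   # resolvable only if the Robot node has supervisor TRUE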

NotImplementedError in get_info; deepbots 0.1.3.dev2; tutorial

I get the NotImplementedError when I run the CartPole tutorial in Webots R2022a:

INFO: robotSupervisorControlle: Starting controller: python.exe -u robotSupervisorControlle.py
Traceback (most recent call last):
  File "C:\Users\admin\Desktop\ReinLearn\mpsees_rel\webot_world\controllers\robotSupervisorControlle\robotSupervisorControlle.py", line 107, in <module>
    newObservation, reward, done, info = env.step([selectedAction])
  File "C:\Python39\lib\site-packages\deepbots\supervisor\controllers\robot_supervisor.py", line 84, in step
    self.get_info(),
  File "C:\Python39\lib\site-packages\deepbots\supervisor\controllers\supervisor_env.py", line 117, in get_info
    raise NotImplementedError
NotImplementedError
WARNING: 'robotSupervisorControlle' controller exited with status: 1.
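Judging from the traceback alone, deepbots' base supervisor env class raises NotImplementedError for get_info(), so the environment subclass used in the tutorial world has to override it. A minimal, hedged sketch (placed inside the user's environment class; the empty dict is an assumption about what the training loop needs):

    def get_info(self):
        # The base class deliberately raises NotImplementedError here;
        # returning an empty dict is enough for the tutorial's training loop.
        return {}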

Webots 2020a rev2 changes with reset

Webots 2020a rev2 will fix some issues related to resetting the Webots world without resetting the controllers (https://cyberbotics.com/doc/reference/supervisor#wb_supervisor_simulation_reset).

When the new version is released, the tutorials should be updated to use the new deepbots capabilities (aidudezzz/deepbots#25).

Related aidudezzz/deepworlds#8

EDIT: The emitter/receiver tutorial will keep the custom reset as an example; the newest robotSupervisor tutorial uses the new reset (sketched below).
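For reference, the "new reset" boils down to the Supervisor API added around R2020a rev2. A minimal sketch, assuming a deepbots-style environment class (get_default_observation is the deepbots hook for the post-reset observation):

    def reset(self):
        self.simulationReset()           # reset the world in place, without restarting controllers
        self.simulationResetPhysics()    # also clear velocities and forces
        return self.get_default_observation()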

Comments on the robotSupervisorSchemeTutorial

AttributeError: 'CartPoleSupervisor' object has no attribute 'supervisor'

This problem occurs when I open the emitterReceiverSchemeTutorial's world, and I am looking for help to solve it.

The robot doesn't move during iterations

Hi, I am using deepbots to construct an environment for my path-planning project. I have successfully run the CartPole example from the tutorial. However, when I tried to construct an action space of my own, I kept failing. Following the same approach as CartPole, I set my action space to Discrete(3) instead of 2, meaning my robot has three actions in total:

# six elements include x-axis position, y-axis position, ps0 value, ps1 value, ps6 value, ps7 value.
self.observation_space = Box(low=np.array([-0.25, -0.88, 0.00, 0.00, 0.00, 0.00]),
                                     high=np.array([0.75, 0.12, 4095.00, 4095.00, 4095.00, 4095.00]),
                                     dtype=np.float64)

self.action_space = Discrete(3)

self.robot = self.getSelf()

self.leftMotor = self.getDevice("left wheel motor")
self.rightMotor = self.getDevice("right wheel motor")
self.leftMotor.setPosition(float('inf'))
self.rightMotor.setPosition(float('inf'))
self.leftMotor.setVelocity(0.0)
self.rightMotor.setVelocity(0.0)

Additionally, I did the same in the apply_action() method according to the tutorial, mapping the three actions to concrete robot behaviors:

def apply_action(self, action):
        action = int(action[0])
        print("action: ", action)

        l_speed = 3.14
        r_speed = 3.14

        if action == 0:     # go straight
            l_speed = 3.14
            r_speed = 3.14
        if action == 1:     # turn right
            l_speed = 3.14
            r_speed = -3.14
        if action == 2:     # turn left
            l_speed = -3.14
            r_speed = 3.14

        self.leftMotor.setVelocity(l_speed)
        self.rightMotor.setVelocity(r_speed)

The control logic may not be correct, but the robot should at least move stochastically. During the iterations, however, I found that my robot (an e-puck in my project) never moved even a single step, no matter how many timesteps I set. I don't know what is wrong with my code and hope someone can help me fix it. Many thanks!
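One way to narrow this down, offered as a hedged debugging sketch rather than a fix: run the same motor setup in a plain Webots controller, outside the RL loop, to check whether the devices themselves respond (device names as in the question, assuming an e-puck):

    from controller import Robot

    robot = Robot()
    timestep = int(robot.getBasicTimeStep())

    left = robot.getDevice("left wheel motor")
    right = robot.getDevice("right wheel motor")
    left.setPosition(float('inf'))    # velocity-control mode
    right.setPosition(float('inf'))
    left.setVelocity(3.14)
    right.setVelocity(3.14)

    while robot.step(timestep) != -1:
        pass  # the robot should drive straight; if it does here but not in the RL setup,
              # the problem is likely in how/when step() is called around apply_action()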
