
Comments (7)

j3soon commented on June 12, 2024

The action is generated by the underlying library: Denys88/rl_games. Since we use PPO, you can follow its action sampling code: https://github.com/Denys88/rl_games/blob/d8645b2678c0d8a6e98a6e3f2b17f0ecfbff71ad/rl_games/algos_torch/models.py#L247


j3soon commented on June 12, 2024

Basically, the PPO actor models the distribution $p(a|s)$ as a Gaussian distribution. The action is randomly sampled from a Gaussian centered at mu, with standard deviation exp(logstd).

I believe the value is the output of the PPO Critic, which is only required during training to reduce the variance of returns.
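
For reference, here is a minimal PyTorch sketch of that sampling step. The names and shapes (mu, logstd, a 6-DoF action) are illustrative placeholders, not the exact rl_games API:

    import torch

    # Sample a 6-DoF action from a Gaussian parameterized by the actor's outputs.
    mu = torch.zeros(6)              # placeholder for the actor's mean output
    logstd = torch.full((6,), -1.0)  # placeholder for the learned log standard deviation
    sigma = torch.exp(logstd)

    dist = torch.distributions.Normal(mu, sigma)
    action = dist.sample()                          # stochastic action used during training
    log_prob = dist.log_prob(action).sum(dim=-1)    # needed for the PPO objective
    deterministic_action = mu                       # a common choice at deployment time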


DJT777 commented on June 12, 2024

I'd like to thank you for your time and attention to this. I typically work in computer vision, but recently my university has asked me to begin exploring the use of deep reinforcement learning for our engineering, robotics, and computer science labs. One of my roles is to provide documentation for lab operation, and your repo will be used as a basis for myself and other students to learn. Your responses are highly valuable and greatly appreciated!

Your response about the use of Gaussian distributions confirms my assumption, so thanks for confirming that for me.

I have a few more questions:

1. In deployment of our A2C model via ONNX (the deployment discussed here), can the model be used with only the current joint positions and the target coordinates? For example, set the current joint positions and the target position, and leave the rest of the state values at zero. If not, which values in the state are essential for prediction, and which, if any, can be disregarded? My understanding is that a deep neural network is a function of all of its input data, so my assumption is that the entire state needs to be properly represented.

2. When generating an action by sampling from the Gaussian distribution, is it safe to assume that each generated action becomes the "previous action" in the next state?

i.e.

  1. Generate prediction from default position
  2. Update state's "previous action" with prediction from step 1
  3. Generate new prediction with updated state

Lastly, thank you again for your attention and help.


j3soon commented on June 12, 2024

Thanks for using this repo as a basis for learning DRL & robotics! If you develop new features that would be useful to others, contributing back to this repo by opening a PR would be highly appreciated.

  1. The current observation for the Dofbot Reacher task aims to be as general as possible. Therefore, there are many redundant inputs:

        self.num_obs_dict = {
            "full": 29,
            # 6: dofbot joints position (action space)
            # 6: dofbot joints velocity
            # 3: goal position
            # 4: goal rotation
            # 4: goal relative rotation
            # 6: previous action
        }

    You can reduce the observation dimension from 29 to 9 by only using the current joint positions (6D) and the target position (3D). However, you will need to slightly modify the environment and re-train the policy. Specifically, you can set the redundant input values to zeros here (a sketch of this is shown after this list):

        # There are many redundant information for the simple Reacher task, but we'll keep them for now.
        self.obs_buf[:, 0:self.num_arm_dofs] = unscale(self.arm_dof_pos[:, :self.num_arm_dofs],
            self.arm_dof_lower_limits, self.arm_dof_upper_limits)
        self.obs_buf[:, self.num_arm_dofs:2*self.num_arm_dofs] = self.vel_obs_scale * self.arm_dof_vel[:, :self.num_arm_dofs]
        base = 2 * self.num_arm_dofs
        self.obs_buf[:, base+0:base+3] = self.goal_pos
        self.obs_buf[:, base+3:base+7] = self.goal_rot
        self.obs_buf[:, base+7:base+11] = quat_mul(self.object_rot, quat_conjugate(self.goal_rot))
        self.obs_buf[:, base+11:base+17] = self.actions

  2. If you only run the simulation in Isaac for training (i.e., no simulation during deployment), I suggest removing the previous action from the observation and re-training the model as mentioned above. I believe reducing the input to a minimum will make the debugging process much easier.
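
For illustration, here is an untested sketch of the zeroing suggested in point 1, based on the snippet above. It keeps the 29-dim observation layout but blanks out everything except the joint positions (6) and the goal position (3); the policy still needs to be re-trained after this change:

    # Untested sketch: keep the 29-dim layout, zero the redundant slots.
    self.obs_buf[:, 0:self.num_arm_dofs] = unscale(self.arm_dof_pos[:, :self.num_arm_dofs],
        self.arm_dof_lower_limits, self.arm_dof_upper_limits)
    base = 2 * self.num_arm_dofs
    self.obs_buf[:, self.num_arm_dofs:base] = 0.0    # joint velocities zeroed out
    self.obs_buf[:, base+0:base+3] = self.goal_pos   # goal position kept
    self.obs_buf[:, base+3:base+17] = 0.0            # rotations and previous action zeroed out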


DJT777 commented on June 12, 2024

I've modified the state dictionary as follows:

        self.num_obs_dict = {
            "full": 9,
            # 6: dofbot joints position (action space)
            # 3: goal position
        }

and the observation buffer as follows:

    def compute_full_observations(self, no_vel=False):
        if no_vel:
            raise NotImplementedError()
        else:
            # There are many redundant information for the simple Reacher task, but we'll keep them for now.
            self.obs_buf[:, 0:self.num_arm_dofs] = unscale(self.arm_dof_pos[:, :self.num_arm_dofs],
                self.arm_dof_lower_limits, self.arm_dof_upper_limits)
            base = self.num_arm_dofs
            self.obs_buf[:, base+0:base+3] = self.goal_pos

However, in simulation I am getting around -952 reward across 2048 robots, and performance is still not very good after 5000 epochs.

You mentioned that some environment variables need to be changed; what would those need to be?


DJT777 commented on June 12, 2024

Reducing the observation buffer to this:

        self.num_obs_dict = {
            "full": 15,
            # 6: dofbot joints position (action space)
            # 6: dofbot joints velocity
            # 3: goal position
        }

    def compute_full_observations(self, no_vel=False):
        if no_vel:
            raise NotImplementedError()
        else:
            # There are many redundant information for the simple Reacher task, but we'll keep them for now.
            self.obs_buf[:, 0:self.num_arm_dofs] = unscale(self.arm_dof_pos[:, :self.num_arm_dofs],
                                                           self.arm_dof_lower_limits, self.arm_dof_upper_limits)
            self.obs_buf[:, self.num_arm_dofs:2 * self.num_arm_dofs] = self.vel_obs_scale * self.arm_dof_vel[:, :self.num_arm_dofs]
            base = 2 * self.num_arm_dofs
            self.obs_buf[:, base + 0:base + 3] = self.goal_pos
            #print("Observation:" + str(self.obs_buf))
            with open("obs.txt", "a") as obstxt:
                obstxt.write(str(self.obs_buf))

produces good results.

However, I am unsure why removing the velocities causes bad performance.


j3soon commented on June 12, 2024

There are two potential reasons for this:

  1. The model indeed requires the velocity information to achieve good performance
  2. The first layer should have larger capacity

To rule out the second case, maybe you can try replacing the velocities with joint positions (i.e., duplicated inputs)? Maybe you can also try multiplying the duplicated joint positions by -1?
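
For example, the duplicated-input experiment could look roughly like this inside compute_full_observations (an untested sketch of the suggestion above, keeping the 15-dim layout):

    # Untested sketch: fill the velocity slots with a negated copy of the joint
    # positions, so the input stays 15-dim but carries no velocity information.
    joint_pos = unscale(self.arm_dof_pos[:, :self.num_arm_dofs],
        self.arm_dof_lower_limits, self.arm_dof_upper_limits)
    self.obs_buf[:, 0:self.num_arm_dofs] = joint_pos
    self.obs_buf[:, self.num_arm_dofs:2 * self.num_arm_dofs] = -joint_pos  # duplicated (negated) positions
    base = 2 * self.num_arm_dofs
    self.obs_buf[:, base + 0:base + 3] = self.goal_pos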

