
Comments (7)

j3soon commented on June 12, 2024

The action is generated by the underlying library: Denys88/rl_games. Since we use PPO, you can follow its action sampling code: https://github.com/Denys88/rl_games/blob/d8645b2678c0d8a6e98a6e3f2b17f0ecfbff71ad/rl_games/algos_torch/models.py#L247


j3soon commented on June 12, 2024

Basically, the PPO actor models the distribution $p(a|s)$ as a Gaussian distribution. The action is randomly sampled from a Gaussian centered at mu, with standard deviation exp(logstd).

I believe the value is the output of the PPO Critic, which is only required during training to reduce the variance of returns.
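
For reference, here is a minimal PyTorch sketch of that sampling step. The names and shapes (mu, logstd, a 6-DoF action) are illustrative placeholders, not the exact rl_games API:

    import torch

    # Sample a 6-DoF action from a Gaussian parameterized by the actor's outputs.
    mu = torch.zeros(6)              # placeholder for the actor's mean output
    logstd = torch.full((6,), -1.0)  # placeholder for the learned log standard deviation
    sigma = torch.exp(logstd)

    dist = torch.distributions.Normal(mu, sigma)
    action = dist.sample()                          # stochastic action used during training
    log_prob = dist.log_prob(action).sum(dim=-1)    # needed for the PPO objective
    deterministic_action = mu                       # a common choice at deployment time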


DJT777 commented on June 12, 2024

I'd like to thank you for your time and attention to this. I typically work in computer vision, but recently my university has asked me to begin exploring the use of deep reinforcement learning for our engineering, robotics, and computer science labs. One of my roles is to provide documentation for lab operation, and your repo will be used as a basis for myself and other students to learn. Your responses are highly valuable and greatly appreciated!

Your response about the use of Gaussian distributions confirms my assumption, so thanks for confirming that for me.

I have a few more questions:

1. In deployment of our A2C model via ONNX (the deployment discussed here), can the model be used with only the current joint positions and the target coordinates? For example, set the current joint positions and the target position, and leave the rest of the state values at zero. If not, which values in the state are essential for prediction, and which, if any, can be disregarded? My understanding is that a deep neural network is a function of all of its input data, so my assumption is that the entire state needs to be properly represented.

2. When generating an action by sampling from the Gaussian distribution, is it safe to assume that each generated action becomes the "previous action" in the next state?

i.e.

  1. Generate prediction from default position
  2. Update state's "previous action" with prediction from step 1
  3. Generate new prediction with updated state

Lastly, thank you again for your attention and help.


j3soon commented on June 12, 2024

Thanks for using this repo as a basis for learning DRL & robotics! If you develop new features that would be useful to others, contributing back to this repo by opening a PR would be highly appreciated.

  1. The current observation for the Dofbot Reacher task aims to be as general as possible. Therefore, there are many redundant inputs:

        self.num_obs_dict = {
            "full": 29,
            # 6: dofbot joints position (action space)
            # 6: dofbot joints velocity
            # 3: goal position
            # 4: goal rotation
            # 4: goal relative rotation
            # 6: previous action
        }

    You can reduce the observation dimension from 29 to 9 by only using the current joint positions (6D) and the target position (3D). However, you will need to slightly modify the environment and re-train the policy. Specifically, you can set the redundant input values to zeros here (a sketch of this is shown after this list):

        # There are many redundant information for the simple Reacher task, but we'll keep them for now.
        self.obs_buf[:, 0:self.num_arm_dofs] = unscale(self.arm_dof_pos[:, :self.num_arm_dofs],
            self.arm_dof_lower_limits, self.arm_dof_upper_limits)
        self.obs_buf[:, self.num_arm_dofs:2*self.num_arm_dofs] = self.vel_obs_scale * self.arm_dof_vel[:, :self.num_arm_dofs]
        base = 2 * self.num_arm_dofs
        self.obs_buf[:, base+0:base+3] = self.goal_pos
        self.obs_buf[:, base+3:base+7] = self.goal_rot
        self.obs_buf[:, base+7:base+11] = quat_mul(self.object_rot, quat_conjugate(self.goal_rot))
        self.obs_buf[:, base+11:base+17] = self.actions

  2. If you only run the simulation in Isaac for training (i.e., no simulation during deployment), I suggest removing the previous action from the observation and re-training the model as mentioned above. I believe reducing the input to a minimum will make the debugging process much easier.
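
For illustration, here is an untested sketch of the zeroing suggested in point 1, based on the snippet above. It keeps the 29-dim observation layout but blanks out everything except the joint positions (6) and the goal position (3); the policy still needs to be re-trained after this change:

    # Untested sketch: keep the 29-dim layout, zero the redundant slots.
    self.obs_buf[:, 0:self.num_arm_dofs] = unscale(self.arm_dof_pos[:, :self.num_arm_dofs],
        self.arm_dof_lower_limits, self.arm_dof_upper_limits)
    base = 2 * self.num_arm_dofs
    self.obs_buf[:, self.num_arm_dofs:base] = 0.0    # joint velocities zeroed out
    self.obs_buf[:, base+0:base+3] = self.goal_pos   # goal position kept
    self.obs_buf[:, base+3:base+17] = 0.0            # rotations and previous action zeroed out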


DJT777 commented on June 12, 2024

I've modified the state dictionary as follows:

        self.num_obs_dict = {
            "full": 9,
            # 6: dofbot joints position (action space)
            # 3: goal position
        }

and the observation buffer as follows:

    def compute_full_observations(self, no_vel=False):
        if no_vel:
            raise NotImplementedError()
        else:
            # There are many redundant information for the simple Reacher task, but we'll keep them for now.
            self.obs_buf[:, 0:self.num_arm_dofs] = unscale(self.arm_dof_pos[:, :self.num_arm_dofs],
                self.arm_dof_lower_limits, self.arm_dof_upper_limits)
            base = self.num_arm_dofs
            self.obs_buf[:, base+0:base+3] = self.goal_pos

However, in simulation I am getting around -952 reward across 2048 robots, and performance is still not very good after 5000 epochs.

You mentioned that some environment variables need to be changed; what would those need to be?


DJT777 commented on June 12, 2024

Reducing the observation buffer to this:

        self.num_obs_dict = {
            "full": 15,
            # 6: dofbot joints position (action space)
            # 6: dofbot joints velocity
            # 3: goal position
        }

    def compute_full_observations(self, no_vel=False):
        if no_vel:
            raise NotImplementedError()
        else:
            # There are many redundant information for the simple Reacher task, but we'll keep them for now.
            self.obs_buf[:, 0:self.num_arm_dofs] = unscale(self.arm_dof_pos[:, :self.num_arm_dofs],
                                                           self.arm_dof_lower_limits, self.arm_dof_upper_limits)
            self.obs_buf[:, self.num_arm_dofs:2 * self.num_arm_dofs] = self.vel_obs_scale * self.arm_dof_vel[:, :self.num_arm_dofs]
            base = 2 * self.num_arm_dofs
            self.obs_buf[:, base + 0:base + 3] = self.goal_pos
            #print("Observation:" + str(self.obs_buf))
            with open("obs.txt", "a") as obstxt:
                obstxt.write(str(self.obs_buf))

produces good results.

However, I am unsure why removing the velocities causes bad performance.


j3soon commented on June 12, 2024

There are two potential reasons for this:

  1. The model indeed requires the velocity information to achieve good performance
  2. The first layer should have larger capacity

To rule out the second case, maybe you can try replacing the velocities with joint positions (i.e., duplicated inputs)? Maybe you can also try multiplying the duplicated joint positions by -1?
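
For example, the duplicated-input experiment could look roughly like this inside compute_full_observations (an untested sketch of the suggestion above, keeping the 15-dim layout):

    # Untested sketch: fill the velocity slots with a negated copy of the joint
    # positions, so the input stays 15-dim but carries no velocity information.
    joint_pos = unscale(self.arm_dof_pos[:, :self.num_arm_dofs],
        self.arm_dof_lower_limits, self.arm_dof_upper_limits)
    self.obs_buf[:, 0:self.num_arm_dofs] = joint_pos
    self.obs_buf[:, self.num_arm_dofs:2 * self.num_arm_dofs] = -joint_pos  # duplicated (negated) positions
    base = 2 * self.num_arm_dofs
    self.obs_buf[:, base + 0:base + 3] = self.goal_pos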

