denisyarats / exorl

98.0 3.0 8.0 61 KB

ExORL: Exploratory Data for Offline Reinforcement Learning

Home Page: https://sites.google.com/view/exorl

License: MIT License

Python 99.72% Shell 0.28%

python control reinforcement-learning unsupevised offline-rl deep-learning pytorch off-policy mujoco model-free

exorl's People

ethanluoyc souradip-chakraborty adityagudimella aos55 fuyw linamez tongendong dankhap

exorl's Issues

Why is the implementation of point mass different from the original dmc one?

In https://github.com/denisyarats/exorl/blob/main/custom_dmc_tasks/point_mass_maze.py#L155, why are the following modifications made relative to the original dmc point_mass environment?

physics.data.qpos[0] = np.random.uniform(-0.29, -0.15)
physics.data.qpos[1] = np.random.uniform(0.15, 0.29)
physics.named.data.geom_xpos['target'][:] = self._target
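
To make the question concrete, here is a minimal sketch of what the first two quoted lines do on their own: they draw the starting position uniformly from a small square with negative x and positive y. The helper name `sample_initial_position` and the use of a NumPy `Generator` are assumptions made for illustration and are not part of the repository; the third quoted line (pinning the `target` geom to `self._target`) needs a MuJoCo `physics` object and is not reproduced here.

```python
import numpy as np


def sample_initial_position(rng):
    """Hypothetical helper mirroring the two quoted qpos assignments."""
    # x is drawn from [-0.29, -0.15), y from [0.15, 0.29), as in the issue.
    x = rng.uniform(-0.29, -0.15)
    y = rng.uniform(0.15, 0.29)
    return np.array([x, y])


rng = np.random.default_rng(0)
starts = np.stack([sample_initial_position(rng) for _ in range(1000)])

# Every sampled start lies inside the same small square.
print(starts.min(axis=0), starts.max(axis=0))
```

Under these assumptions every episode begins in the same corner region, which is the behavior the issue asks about.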

Inquiry on Data Collection Step

Is it possible to release the code for data collection? Is each dataset collected using one seed or multiple seeds? Thanks in advance!

Why do action, reward and terminal flag use next_observation's index?

Hi,
I was going through the code and could not understand one thing. When sampling a tuple from the replay buffer, why do action, reward, and discount use index idx, the same index as next_observation? Shouldn't they use idx - 1, like observation?

exorl/replay_buffer.py

Lines 93 to 98 in a3fb07a

    
           idx = np.random.randint(0, episode_len(episode)) + 1 
        
           obs = episode['observation'][idx - 1] 
        
           action = episode['action'][idx] 
        
           next_obs = episode['observation'][idx] 
        
           reward = episode['reward'][idx] 
        
           discount = episode['discount'][idx] * self._discount
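
One possible reading of the quoted lines, sketched under a stated assumption: each stored step `t` holds the observation reached at step `t` together with the action that produced it and the reward received on arrival, so step 0 (the reset) carries only placeholder action and reward values. Under that layout, `action[idx]` is the action taken in `observation[idx - 1]`. The toy episode builder, the `episode_len` definition, and the `gamma` parameter below are illustrative assumptions, not the repository's code.

```python
import numpy as np


def episode_len(episode):
    # Assumed helper: transitions = stored steps minus the initial reset step.
    return next(iter(episode.values())).shape[0] - 1


def make_toy_episode(num_transitions):
    """Build a toy episode whose values encode their own step index."""
    steps = num_transitions + 1  # one extra entry for the reset step
    obs = np.arange(steps, dtype=np.float32).reshape(steps, 1)
    action = np.zeros((steps, 1), dtype=np.float32)  # index 0 is a placeholder
    reward = np.zeros((steps, 1), dtype=np.float32)  # index 0 is a placeholder
    discount = np.ones((steps, 1), dtype=np.float32)
    for t in range(1, steps):
        action[t] = 10.0 * t   # action taken in obs[t - 1], stored with step t
        reward[t] = 100.0 * t  # reward received on arriving at obs[t]
    return dict(observation=obs, action=action, reward=reward, discount=discount)


def sample_transition(episode, rng, gamma=0.99):
    # idx ranges over [1, episode_len], so the placeholder at index 0 is never used.
    idx = rng.integers(0, episode_len(episode)) + 1
    obs = episode['observation'][idx - 1]
    action = episode['action'][idx]
    next_obs = episode['observation'][idx]
    reward = episode['reward'][idx]
    discount = episode['discount'][idx] * gamma
    return obs, action, reward, discount, next_obs
```

If the episodes were instead stored with the action taken from each observation at the same index, the `idx - 1` indexing suggested in the question would be the right one; which layout applies depends on how the data was written.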

Please add data collection code to repo

Hi, I tried to replicate the results from the paper from scratch: I collected data with URL Benchmark, then used your repo to relabel it and train offline, but I could not reproduce the paper's results. Please add the data collection step to this repo so the results can be fully replicated. Thanks
