danijar / daydreamer

DayDreamer: World Models for Physical Robot Learning

Home Page: https://danijar.com/daydreamer

reinforcement-learning robotics world-models

daydreamer's Introduction

DayDreamer: World Models for Physical Robot Learning

Official implementation of the DayDreamer algorithm in TensorFlow 2.

DayDreamer Robots

If you find this code useful, please reference it in your paper:

@article{wu2022daydreamer,
  title={DayDreamer: World Models for Physical Robot Learning},
  author={Wu, Philipp and Escontrela, Alejandro and Hafner, Danijar and Goldberg, Ken and Abbeel, Pieter},
  journal={Conference on Robot Learning},
  year={2022}
}

Method

DayDreamer learns a world model and an actor critic behavior to train robots from small amounts of experience in the real world, without using simulators. At a high level, DayDreamer consists of two processes. The actor process interacts with the environment and stores experiences into the replay buffer. The learner samples data from the replay buffer to train the world model, and then uses imagined predictions of the world model to train the behavior.
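
This split can be pictured with the minimal sketch below. It only illustrates the data flow; the names (actor_loop, learner_loop, world_model.train, world_model.imagine, actor_critic.train) are placeholders rather than the repository's actual API, which runs the two sides as separate processes connected through a replay server.

import random
import threading

replay_buffer = []                 # stand-in for the replay server
buffer_lock = threading.Lock()

def actor_loop(collect_episode):
    # Interact with the real environment and store each finished episode.
    while True:
        episode = collect_episode()
        with buffer_lock:
            replay_buffer.append(episode)

def learner_loop(world_model, actor_critic, batch_size=16):
    # Sample replayed experience, train the world model, then train the
    # behavior on imagined rollouts produced by the world model.
    while True:
        with buffer_lock:
            if len(replay_buffer) < batch_size:
                continue
            batch = random.sample(replay_buffer, batch_size)
        states = world_model.train(batch)
        trajectories = world_model.imagine(actor_critic.policy, states)
        actor_critic.train(trajectories)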

DayDreamer Model

To learn from proprioceptive and visual inputs alike, the world model fuses the sensory inputs of the same time step together into a compact discrete representation. A recurrent neural network predicts the sequence of these representations given actions. From the resulting recurrent states and representations, DayDreamer reconstructs its inputs and predicts rewards and episode ends.
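
As a rough illustration, the pieces described above could be laid out with plain Keras layers as in the sketch below. The input shapes (64x64x3 camera images, a 16-dimensional proprioceptive vector) and layer sizes are assumptions for illustration only, not the repository's actual world-model implementation.

import tensorflow as tf

# Encoders fuse the sensory inputs of one time step into a compact code.
# In the full model this fused code is a discrete (categorical) representation.
image_encoder = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 4, strides=2, activation='elu'),
    tf.keras.layers.Flatten(),
])
proprio_encoder = tf.keras.layers.Dense(128, activation='elu')
fuse = tf.keras.layers.Dense(1024, activation='elu')

# Recurrent core that predicts the sequence of representations given actions.
rnn = tf.keras.layers.GRUCell(512)

# Heads that reconstruct the inputs and predict rewards and episode ends.
image_decoder = tf.keras.layers.Dense(64 * 64 * 3)
proprio_decoder = tf.keras.layers.Dense(16)
reward_head = tf.keras.layers.Dense(1)
continue_head = tf.keras.layers.Dense(1, activation='sigmoid')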

Given the world model, the actor critic learns farsighted behaviors using on-policy reinforcement learning purely inside the representation space of the world model.
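
Conceptually, that behavior-learning step looks like the sketch below. The function names and the simple discounted return are stand-ins; the actual implementation uses lambda-returns and more careful value estimation.

def discounted_returns(rewards, values, discount=0.99):
    # Bootstrap from the critic's value of the last imagined state, then
    # accumulate predicted rewards backwards through the imagined rollout.
    ret = values[-1]
    returns = []
    for reward in reversed(rewards):
        ret = reward + discount * ret
        returns.append(ret)
    return list(reversed(returns))

def train_behavior(world_model, actor, critic, start_states, horizon=15):
    # Roll out the current policy inside the world model; no robot steps here.
    states, actions, rewards = world_model.imagine(actor, start_states, horizon)
    values = [critic(state) for state in states]
    returns = discounted_returns(rewards, values)
    actor.update(states, actions, returns)   # maximize imagined returns
    critic.update(states, returns)           # regress values toward returns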

For more information, see the project website: https://danijar.com/daydreamer

Setup

pip install tensorflow tensorflow_probability ruamel.yaml cloudpickle

Instructions

To run DayDreamer, open two terminals to execute the commands for the learner and the actor in parallel. To view metrics, point TensorBoard at the log directory. For more information, also see the DreamerV2 repository.
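
For example, with the log directory used in the commands below:

tensorboard --logdir ~/logdir/run1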

A1 Robot:

rm -rf ~/logdir/run1
CUDA_VISIBLE_DEVICES=0 python embodied/agents/dreamerv2plus/train.py --configs a1 --task a1_sim --run learning --tf.platform gpu --logdir ~/logdir/run1
CUDA_VISIBLE_DEVICES=1 python embodied/agents/dreamerv2plus/train.py --configs a1 --task a1_real --run acting --tf.platform gpu --env.kbreset True --imag_horizon 1 --replay_chunk 8 --replay_fixed.minlen 32 --logdir ~/logdir/run1

XArm Robot:

rm -rf ~/logdir/run1
CUDA_VISIBLE_DEVICES=0 python embodied/agents/dreamerv2plus/train.py --configs xarm --run learning --task xarm_dummy --tf.platform gpu --logdir ~/logdir/run1
CUDA_VISIBLE_DEVICES=-1 python embodied/agents/dreamerv2plus/train.py --configs xarm --run acting --task xarm_real --env.kbreset True --tf.platform cpu --tf.jit False --logdir ~/logdir/run1

UR5 Robot:

rm -rf ~/logdir/run11
CUDA_VISIBLE_DEVICES=0 python embodied/agents/dreamerv2plus/train.py --configs ur5 --run learning --task ur5_dummy --tf.platform gpu --logdir ~/logdir/run11
CUDA_VISIBLE_DEVICES=1 python embodied/agents/dreamerv2plus/train.py --configs ur5 --run acting --task ur5_real --env.kbreset True --tf.platform cpu --tf.jit False --logdir ~/logdir/run11

Questions

Please open an issue on GitHub.


daydreamer's Issues

Guideline on real robot

Hi,
awesome project - thanks for sharing your code.

Is there a step-by-step guideline on how to apply the algorithm to a real robot such as a quadruped, including which type of quadruped to buy and how to build the interface between the quadruped and DayDreamer?

Thanks

Pytorch Version

How can we get a PyTorch version of the DayDreamer algorithm?

Setting gripper speed to 10000 may break the gripper

Thank you for sharing the code!

We are also testing this repository's xArm part, but we noticed that xarm-python-SDK has some problems.
In this code, the gripper speed is set to 10000, but the actual maximum speed is below this value. The problem is that xarm-python-SDK does not seem to enforce an upper threshold on the gripper speed, so this may cause unknown behavior.

self._arm.set_gripper_speed(10000)

We found it is better to set it to around 4500.
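
For reference, the corresponding change would look roughly like the line below, where 4500 is the empirical value reported in this issue rather than a documented SDK limit:

self._arm.set_gripper_speed(4500)  # lower speed suggested in this issue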

May I know how you configure the network between the A1 and the remote PC?

May I know how you configure the network between the A1 and the remote PC?
I got this project working by configuring a router with a LAN of 192.168.123.0/24, which is on the same network segment as the A1 (I connect the PC and the A1 to the router), but after training for several minutes the network connection broke, so getting it to work again is a difficult problem.
Thank you Sir.

No module named "usb"

My Python version is 3.8.15 and my usb version is 0.0.83.dev0. When I run

CUDA_VISIBLE_DEVICES=0 python embodied/agents/dreamerv2plus/train.py --configs ur5 --run learning --task ur5_dummy --tf.platform gpu --logdir ~/logdir/run11

it fails with the module error above.

I have it working but the pygame window never updates

Thanks for posting this. I have a Go1 and am doing active research with NNs and locomotion on it using motion capture. I would like to reproduce your work, but things seem to just hang.

In one window I have this:

........

Encoder CNN shapes: {}
Encoder MLP shapes: {'vector': (16,)}
Decoder CNN shapes: {}
Decoder MLP shapes: {'vector': (16,)}
Synced last 0/0 trajectories.
Synced last 0/0 trajectories.
Logdir /home/bizon/logdir/run1
Initializing training replay...
Waiting for episodes.
Replay server listening on *:2222
Waiting for episodes.
Waiting for episodes.
Waiting for episodes.
Waiting for episodes.
Waiting for episodes.
Waiting for episodes.
Waiting for episodes.
Waiting for episodes.
Waiting for episodes.
Waiting for episodes.
Waiting for episodes.
Waiting for episodes.
Waiting for episodes.
Waiting for episodes.
Waiting for episodes.
Waiting for episodes.
Waiting for episodes.

Exception in thread Thread-2:
Traceback (most recent call last):
File "/home/bizon/anaconda3/envs/rldda1/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/bizon/anaconda3/envs/rldda1/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/bizon/eric/daydreamer/embodied/replay/store.py", line 290, in _server
self[key] = traj
File "/home/bizon/eric/daydreamer/embodied/replay/store.py", line 272, in setitem
self.store[key] = traj
File "/home/bizon/eric/daydreamer/embodied/replay/store.py", line 231, in setitem
self.store[key] = traj
File "/home/bizon/eric/daydreamer/embodied/replay/store.py", line 188, in setitem
self.disk_store[key] = traj
File "/home/bizon/eric/daydreamer/embodied/replay/store.py", line 98, in setitem
filename = self._format(key, traj)
File "/home/bizon/eric/daydreamer/embodied/replay/store.py", line 144, in _format
reward = str(int(traj['reward'].sum())).replace('-', 'm')
ValueError: cannot convert float NaN to integer
Initializing agent...
Tracing train function.
Found 6727954 model parameters.

.......

Tracing train function.
Initialization done.
Existing checkpoint not found.
Saving checkpoint: /home/bizon/logdir/run1/agent.pkl
Saving module with 341 tensors and 30344083 parameters.
Existing checkpoint not found.
Saving checkpoint: /home/bizon/logdir/run1/learner.pkl
Waiting for train data prefill (0/5000.0)...
Waiting for train data prefill (0/5000.0)...
Waiting for train data prefill (0/5000.0)...
Waiting for train data prefill (0/5000.0)...
Waiting for train data prefill (0/5000.0)...
Waiting for train data prefill (0/5000.0)...
Waiting for train data prefill (0/5000.0)...
Waiting for train data prefill (0/5000.0)...
Waiting for train data prefill (0/5000.0)...
Waiting for train data prefill (0/5000.0)...
Waiting for train data prefill (0/5000.0)...
Waiting for train data prefill (0/5000.0)...
Waiting for train data prefill (0/5000.0)...

In the other I have this:

......

/home/bizon/eric/daydreamer/motion_imitation/robots/a1.py:190: RuntimeWarning: invalid value encountered in arccos
theta_knee = -np.arccos(
argv[0]=
UDP Initialized. Port: 8080
Stand up reset called! -1 None
Stand up reset called! -1 None
Stand up reset called! 0.0 None
Encoder CNN shapes: {}
Encoder MLP shapes: {'vector': (16,)}
Decoder CNN shapes: {}
Decoder MLP shapes: {'vector': (16,)}
Using remote store via ZMQ on localhost:2222
Waiting for response from localhost:2222...
Connection to localhost:2222 successful!
Logdir: /home/bizon/logdir/run1
Existing checkpoint not found.
Saving checkpoint: /home/bizon/logdir/run1/worker0/actor.pkl
Fill dataset (5000.0 steps, 1 episode).
Stand up reset called! -1 None
Stand up reset called! 0.0 None
Episode has 250 steps and return nan.
[12550] episode/length 250 / episode/score nan / episode/average_reward nan / episode/reward_rate 0 / replay/replay_episodes 1 / replay/replay_ep_length 251 / replay/replay_ep_return nan

And the pygame window is saying "train.py" is not responding. Any ideas? What version of pygame are you using?

How to run it on my robot?

Hi,
awesome project - thanks for sharing your code.

I would love to try the algorithm on the hexapod I recently built. What are the necessary steps to do that? Create a custom Gym wrapper? Is there a guideline on how to find the correct config for such a custom setup/robot?

Thanks
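
As a starting point for the custom-wrapper question above, a generic Gym-style environment skeleton might look like the sketch below. The observation and action sizes are placeholders, and since DayDreamer's environments live under the repository's embodied package rather than plain Gym, an adapter to that interface would still be needed.

import gym
import numpy as np

class HexapodEnv(gym.Env):
    # Placeholder sizes: 18 joint commands and a 36-dimensional sensor vector.
    def __init__(self):
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(18,), dtype=np.float32)
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(36,), dtype=np.float32)

    def reset(self):
        # Move the robot to its starting pose and return the first observation.
        return np.zeros(36, dtype=np.float32)

    def step(self, action):
        # Send the action to the robot, read the sensors, compute the task reward.
        obs = np.zeros(36, dtype=np.float32)
        reward = 0.0
        done = False
        return obs, reward, done, {}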

setup.py is required.

I want to try this paper's method, but I ran into many dependency problems, for example with gym==0.18.0, sonnet, boost/ptr.h, and so on.

What environment and embedded computing platform are used?

Hi,@danijar

Great project - thanks for sharing your code.

I am a newbie learning Dreamer, so please forgive my ignorance: what environment does DayDreamer need, and what model of embedded computing platform does the robot dog in the article use? We have a Jetson Nano and don't know if that's enough.

Looking forward to your response, thanks.

Library requirements

Could you please specify the required versions of the libraries (tensorflow, tensorflow-probability, and numpy)?

Train ratio on physical robots

Hi Danijar, first of all, thank you so much for making the code publicly available! Let me ask a question regarding train ratio on physical robots.

In the previous Dreamer implementations for simulation, I see a mechanism to control the ratio of training to data collection, e.g. config.train_every in V1 and V2 and config.run.train_ratio in V3. Also, Fig. 6 (a) from the DreamerV3 paper shows that different train ratios lead to distinct sample efficiency, although every configuration shown in the figure eventually succeeds, which means the algorithm is fairly robust to the train ratio. Still, I think we may want to do a hyperparameter search over the train ratio to find an optimal value for an environment, which is also suggested by the varying train ratios used for different environments in Table A.1 of DreamerV3.

On the other hand, to the best of my knowledge, this mechanism for controlling the train ratio is absent in this repository (and also in the DreamerV3-based experiments, where --run.train_ratio is set to -1). Considering that the learning process is faster than the acting process on real robots, I assume the train ratio would be high, although I don't have an exact measurement or a way to adjust it for now. Referring to Fig. 6 (a), a higher train ratio tends to give higher sample efficiency, but an uncontrollably and unexpectedly high train ratio might result in performance degradation. For example, the world model might overfit to the current replay buffer, which in turn would damage the overall performance.

I wonder whether you had similar concerns about the train ratio when training physical robots, and what you think about exploring different train ratios on physical robots.

Thanks. :)
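
For what it's worth, a hypothetical throttle on the learner side could look like the sketch below; the step counters, the target ratio, and the function names are assumptions and not part of this repository.

import time

def throttled_learner_loop(train_step, env_steps, grad_updates, train_ratio=0.5):
    # Only take another gradient update while updates / env steps stays below
    # the target ratio; otherwise wait for the actor to collect more data.
    while True:
        if grad_updates() < train_ratio * env_steps():
            train_step()
        else:
            time.sleep(0.01)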
