kchua / handful-of-trials
Experiment code for "Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models"
License: MIT License
Thanks for the great work.
I would like to ask about the execution time for each environment.
For example, I am running the Cartpole MBExperiment. On my computer, the average action selection time is about 3 to 4 seconds. Is that normal?
Thanks.
The log file has two separate parameters called rewards and returns. What are the differences between the two?
Are there mathematical equations to define the two terms?
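For reference, a common convention is that rewards are the per-step values r_t collected during a rollout, and the return is their sum over the rollout, R = r_0 + r_1 + ... + r_{T-1}. Whether the rewards and returns entries in the log file follow exactly this convention is an assumption here, not something confirmed in this thread. A minimal sketch with made-up numbers:

```python
# Hypothetical per-step rewards from one short rollout (made-up numbers).
rewards = [1.0, 0.5, -0.25, 2.0]

# Undiscounted return: the sum of the per-step rewards.
episode_return = sum(rewards)

# Discounted variant, shown only for comparison; gamma is an assumed value.
gamma = 0.5
discounted_return = sum(r * gamma ** t for t, r in enumerate(rewards))

print(episode_return, discounted_return)
```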
Hi Kurtland,
Thanks! This is great work! A quick question: does this support multiple GPUs? I don't see any settings that let me specify the number of GPUs used for training. If not, is there a plan to add it? If there is no such plan, where should I look to add the support myself?
-C
I'm trying to understand the cost function for the reacher task which uses the function here.
However, I can't find a reference for what each index of the observation is in the Reacher task in the OpenAI documentation.
Can someone help explain what this function is computing? Also does anyone have a better source of documentation on the state observations in the Reacher task?
Hi, thank you so much for this work.
I would like to ask how to use this with other gym-like environments.
For example, suppose I create a robot simulation environment with an OpenAI Gym-like API (step, start, etc.). How could I adapt that environment to your implementation?
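As a rough sketch rather than the repository's documented procedure: the configuration printed in the HalfCheetah log on this page lists per-environment hooks named obs_preproc, obs_postproc, targ_proc, obs_cost_fn and ac_cost_fn, so a new environment would presumably need its own versions of them. The toy environment, the choice of predicting state changes, and both cost functions below are assumptions made for illustration; the real config modules may expect different signatures (for example TensorFlow tensors instead of NumPy arrays).

```python
import numpy as np

class ToyPointEnv:
    """Hypothetical gym-like environment: a point pushed along a line."""

    def reset(self):
        self.state = np.zeros(2)  # [position, velocity]
        return self.state.copy()

    def step(self, action):
        pos, vel = self.state
        vel = vel + 0.1 * float(action[0])
        pos = pos + 0.1 * vel
        self.state = np.array([pos, vel])
        reward = -(pos - 1.0) ** 2  # higher reward near position 1.0
        return self.state.copy(), reward, False, {}

# Hook names come from the printed config; the bodies are assumptions.
def obs_preproc(obs):
    return obs  # feed the raw observation to the model

def targ_proc(obs, next_obs):
    return next_obs - obs  # train the model to predict the state change

def obs_postproc(obs, pred):
    return obs + pred  # turn a predicted change into the next observation

def obs_cost_fn(obs):
    return (obs[..., 0] - 1.0) ** 2  # squared distance from the goal position

def ac_cost_fn(acs):
    return 0.01 * np.sum(np.square(acs), axis=-1)  # small control penalty

env = ToyPointEnv()
obs = env.reset()
next_obs, reward, done, info = env.step(np.array([1.0]))
target = targ_proc(obs, next_obs)
reconstructed = obs_postproc(obs, target)
```

The one property worth checking in any such adaptation is that obs_postproc undoes targ_proc, so that a perfect model prediction reproduces the true next observation.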
Dear author,
In your environments, for example HalfCheetah, you access the reward function and use reward_run and reward_ctrl separately instead of the original reward returned by step(action), and you also add reward_run to the first dimension of the state.
Is there any config option that allows evaluating on the original environment provided by OpenAI Gym?
Thank you so much for your support!
Is the following line written as intended?
Thank you for your great code.
I have a question about the initial variance of CEM.
According to your code, you use the following initial variance:
self.init_var = np.tile(np.square(self.ac_ub - self.ac_lb) / 16, [self.plan_hor])
I am trying to implement CEM for MBRLHalfCheetah-v0 using the true model, as you did in your paper.
However, I got poor results. Did you use this parameter for all tasks?
I did get a good result for the CartPole task.
Also, how did you set the CEM parameters to get stable results?
"CEM": { "popsize": 500, "num_elites": 50, "max_iters": 5, "alpha": 0.1 }
Thank you for your help.
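To make the role of these settings concrete, here is a minimal NumPy sketch of a cross-entropy method loop that starts from the same initial variance, (ub - lb)^2 / 16, which corresponds to an initial standard deviation of a quarter of the action range. The toy quadratic cost, the clipping of samples to the bounds, and the reading of alpha as the weight kept on the previous mean and variance are assumptions for illustration; this is not the repository's optimizer.

```python
import numpy as np

def cem_minimize(cost_fn, lb, ub, popsize=500, num_elites=50,
                 max_iters=5, alpha=0.1, seed=0):
    """Minimize cost_fn over the box [lb, ub] with a basic CEM loop."""
    rng = np.random.default_rng(seed)
    mean = (lb + ub) / 2.0
    var = np.square(ub - lb) / 16.0  # initial std = (ub - lb) / 4
    for _ in range(max_iters):
        # Sample candidates from the current Gaussian and keep them in bounds.
        samples = rng.normal(mean, np.sqrt(var), size=(popsize, mean.size))
        samples = np.clip(samples, lb, ub)
        costs = np.array([cost_fn(s) for s in samples])
        # Refit the Gaussian to the lowest-cost candidates.
        elites = samples[np.argsort(costs)[:num_elites]]
        # Assumption: alpha is the weight on the old estimate.
        mean = alpha * mean + (1 - alpha) * elites.mean(axis=0)
        var = alpha * var + (1 - alpha) * elites.var(axis=0)
    return mean

# Toy problem: a 30-step sequence of 1-D actions in [-1, 1], optimum at 0.3.
plan_hor = 30
ac_lb, ac_ub = np.array([-1.0]), np.array([1.0])
lb, ub = np.tile(ac_lb, [plan_hor]), np.tile(ac_ub, [plan_hor])

def toy_cost(actions):
    return float(np.sum((actions - 0.3) ** 2))

best = cem_minimize(toy_cost, lb, ub)
print(toy_cost(best))
```

With a 30-dimensional search space and only five iterations, the mean may move toward the optimum without fully converging, so results can plausibly be sensitive to popsize, num_elites, and the initial variance on harder tasks.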
Hi!
I ran the environment using the latest pre-built Docker image. When training on an RTX 3080 Ti, I get the following failure:
python scripts/mbexp.py -env halfcheetah
2022-08-12 09:10:09.936602: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-08-12 09:10:10.312749: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-12 09:10:10.312892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: NVIDIA GeForce RTX 3080 Ti major: 8 minor: 6 memoryClockRate(GHz): 1.71
pciBusID: 0000:07:00.0
totalMemory: 11.76GiB freeMemory: 11.52GiB
2022-08-12 09:10:10.312909: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2022-08-12 09:13:36.133367: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-08-12 09:13:36.133405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2022-08-12 09:13:36.133414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2022-08-12 09:13:36.133525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11135 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:07:00.0, compute capability: 8.6)
{'ctrl_cfg': {'env': <dmbrl.env.half_cheetah.HalfCheetahEnv object at 0x7f8c91ede160>,
'opt_cfg': {'ac_cost_fn': <function HalfCheetahConfigModule.ac_cost_fn at 0x7f8c32140488>,
'cfg': {'alpha': 0.1,
'max_iters': 5,
'num_elites': 50,
'popsize': 500},
'mode': 'CEM',
'obs_cost_fn': <function HalfCheetahConfigModule.obs_cost_fn at 0x7f8c32140400>,
'plan_hor': 30},
'prop_cfg': {'mode': 'TSinf',
'model_init_cfg': {'model_class': <class 'dmbrl.modeling.models.BNN.BNN'>,
'model_constructor': <bound method HalfCheetahConfigModule.nn_constructor of <halfcheetah.HalfCheetahConfigModule object at 0x7f8c32138748>>,
'num_nets': 5},
'model_train_cfg': {'epochs': 5},
'npart': 20,
'obs_postproc': <function HalfCheetahConfigModule.obs_postproc at 0x7f8c321402f0>,
'obs_preproc': <function HalfCheetahConfigModule.obs_preproc at 0x7f8c32140268>,
'targ_proc': <function HalfCheetahConfigModule.targ_proc at 0x7f8c32140378>}},
'exp_cfg': {'exp_cfg': {'nrollouts_per_iter': 1, 'ntrain_iters': 300},
'log_cfg': {'logdir': 'log'},
'sim_cfg': {'env': <dmbrl.env.half_cheetah.HalfCheetahEnv object at 0x7f8c91ede160>,
'task_hor': 1000}}}
Created an ensemble of 5 neural networks with variance predictions.
Created an MPC controller, prop mode TSinf, 20 particles.
Trajectory prediction logging is disabled.
Average action selection time: 1.0358095169067383e-05
Rollout length: 1000
Network training: 0%| | 0/5 [00:00<?, ?epoch(s)/s]
2022-08-12 09:15:02.909158: E tensorflow/stream_executor/cuda/cuda_blas.cc:647] failed to run cuBLAS routine cublasSgemmBatched: CUBLAS_STATUS_EXECUTION_FAILED
2022-08-12 09:15:02.909199: E tensorflow/stream_executor/cuda/cuda_blas.cc:2505] Internal: failed BLAS call, see log for details
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[5,32,24], b.shape=[5,24,200], m=32, n=200, k=24, batch_size=5
[[Node: model_1/MatMul = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model_1/truediv, model/Layer0/FC_weights/read)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "scripts/mbexp.py", line 44, in <module>
main(args.env, "MPC", args.ctrl_arg, args.override, args.logdir)
File "scripts/mbexp.py", line 29, in main
exp.run_experiment()
File "/workspace/dmbrl/misc/MBExp.py", line 96, in run_experiment
[sample["rewards"] for sample in samples]
File "/workspace/dmbrl/controllers/MPC.py", line 180, in train
self.model.train(self.train_in, self.train_targs, **self.model_train_cfg)
File "/workspace/dmbrl/modeling/models/BNN.py", line 260, in train
feed_dict={self.sy_train_in: inputs[batch_idxs], self.sy_train_targ: targets[batch_idxs]}
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[5,32,24], b.shape=[5,24,200], m=32, n=200, k=24, batch_size=5
[[Node: model_1/MatMul = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model_1/truediv, model/Layer0/FC_weights/read)]]
Caused by op 'model_1/MatMul', defined at:
File "scripts/mbexp.py", line 44, in <module>
main(args.env, "MPC", args.ctrl_arg, args.override, args.logdir)
File "scripts/mbexp.py", line 22, in main
cfg.exp_cfg.exp_cfg.policy = MPC(cfg.ctrl_cfg)
File "/workspace/dmbrl/controllers/MPC.py", line 90, in __init__
)(params.prop_cfg.model_init_cfg)
File "/workspace/dmbrl/config/halfcheetah.py", line 83, in nn_constructor
model.finalize(tf.train.AdamOptimizer, {"learning_rate": 0.001})
File "/workspace/dmbrl/modeling/models/BNN.py", line 180, in finalize
train_loss = tf.reduce_sum(self._compile_losses(self.sy_train_in, self.sy_train_targ, inc_var_loss=True))
File "/workspace/dmbrl/modeling/models/BNN.py", line 436, in _compile_losses
mean, log_var = self._compile_outputs(inputs, ret_log_var=True)
File "/workspace/dmbrl/modeling/models/BNN.py", line 408, in _compile_outputs
cur_out = layer.compute_output_tensor(cur_out)
File "/workspace/dmbrl/modeling/layers/FC.py", line 71, in compute_output_tensor
raw_output = tf.matmul(input_tensor, self.weights) + self.biases
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 1976, in matmul
a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1236, in batch_mat_mul
"BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3414, in create_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1740, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[5,32,24], b.shape=[5,24,200], m=32, n=200, k=24, batch_size=5
[[Node: model_1/MatMul = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model_1/truediv, model/Layer0/FC_weights/read)]]
Cloning the repository and running pip install -r requirements.txt from scratch results in this error:
Downloading/unpacking dotmap==1.2.20 (from -r requirements.txt (line 1))
Downloading dotmap-1.2.20.tar.gz
Running setup.py (path:/home/jhoang/dev/handful-of-trials/env/build/dotmap/setup.py) egg_info for package dotmap
Downloading/unpacking future==0.16.0 (from -r requirements.txt (line 2))
Downloading future-0.16.0.tar.gz (824kB): 824kB downloaded
Running setup.py (path:/home/jhoang/dev/handful-of-trials/env/build/future/setup.py) egg_info for package future
warning: no files found matching '*.au' under directory 'tests'
warning: no files found matching '*.gif' under directory 'tests'
warning: no files found matching '*.txt' under directory 'tests'
Downloading/unpacking gpflow==1.1.0 (from -r requirements.txt (line 3))
Could not find a version that satisfies the requirement gpflow==1.1.0 (from -r requirements.txt (line 3)) (from versions: 1.0.0, 1.0.0, 1.1.1, 1.2.0)
Cleaning up...
No distributions matching the version for gpflow==1.1.0 (from -r requirements.txt (line 3))
I'm assuming 1.1.1 should also work; I just wanted to point this out.
https://github.com/kchua/handful-of-trials/blob/master/dmbrl/modeling/models/BNN.py#L414-L415
Intuitively, the Gaussian variance could be modeled directly from the output of the last layer, but you introduced max_logvar and min_logvar. Could you explain why the variance is implemented this way?
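For readers following the question, one common way to keep a predicted log-variance inside a range is a pair of softplus clamps, which behave like a clip but keep a nonzero gradient near the bounds. The sketch below is plain Python and illustrates that general technique under assumed bound values; it is not a copy of the linked lines.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def bound_logvar(raw, min_logvar=-10.0, max_logvar=0.5):
    """Smoothly squash raw into roughly [min_logvar, max_logvar]."""
    logvar = max_logvar - softplus(max_logvar - raw)     # soft upper bound
    logvar = min_logvar + softplus(logvar - min_logvar)  # soft lower bound
    return logvar

# Values far outside the range are pulled to the bounds; values inside
# the range pass through almost unchanged.
for raw in (-50.0, -3.0, 0.0, 50.0):
    print(raw, bound_logvar(raw))
```

Because the clamps are smooth, the result can overshoot a bound by a very small amount, which is why the range is described as approximate.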
If sigma is < 1e-12, you set sigma to 1, which does not do any scaling. Shouldn't it be
sigma[sigma < 1e-12] = 1e-12 ?
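A small NumPy sketch of the two options may help frame the question. The data and variable names are made up, with one constant feature so that its standard deviation is zero. With a floor of 1.0 the constant column is simply left unscaled, while a floor of 1e-12 divides by a tiny number, so any later deviation in that column is magnified enormously.

```python
import numpy as np

# Made-up training inputs: the second column is constant, so its std is 0.
data = np.array([[1.0, 5.0],
                 [2.0, 5.0],
                 [3.0, 5.0]])
mu = data.mean(axis=0)
sigma = data.std(axis=0)

sigma_one = sigma.copy()
sigma_one[sigma_one < 1e-12] = 1.0      # behaviour described in the question

sigma_tiny = sigma.copy()
sigma_tiny[sigma_tiny < 1e-12] = 1e-12  # alternative proposed in the question

# A new input whose "constant" feature is off by a small amount.
x = np.array([2.0, 5.001])
scaled_one = (x - mu) / sigma_one    # second entry stays about 0.001
scaled_tiny = (x - mu) / sigma_tiny  # second entry is about 1e9

print(scaled_one, scaled_tiny)
```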
In MBExp.py, is there a reason you're saving logs.mat to the base log directory and not to iter_dir? It seems like this overwrites logs.mat at each iteration.
savemat(
    os.path.join(self.logdir, "logs.mat"),  # <-- This line
    {
        "observations": traj_obs,
        "actions": traj_acs,
        "returns": traj_rets,
        "rewards": traj_rews
    }
)
I would think you'd instead want
savemat(
    os.path.join(iter_dir, "logs.mat"),
    ...
but maybe I'm missing something.
The docker image that is provided (kchua/handful-of-trials) can run the mbexp.py script, but cannot run the render.py script. The error is as follows:
-> python scripts/render.py -env cartpole -logdir log/ -model-dir log/2021-11-06--06\:49\:30/
2022-04-21 09:37:34.268473: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-04-21 09:37:34.346063: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-04-21 09:37:34.346510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: NVIDIA GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.63GiB
2022-04-21 09:37:34.346552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2022-04-21 09:37:34.538159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-04-21 09:37:34.538212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2022-04-21 09:37:34.538222: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2022-04-21 09:37:34.538359: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3359 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
{'ctrl_cfg': {'env': <dmbrl.env.cartpole.CartpoleEnv object at 0x7fb8eb83fd68>,
'opt_cfg': {'ac_cost_fn': <function CartpoleConfigModule.ac_cost_fn at 0x7fb8eb8438c8>,
'cfg': {'alpha': 0.1,
'max_iters': 5,
'num_elites': 40,
'popsize': 400},
'mode': 'CEM',
'obs_cost_fn': <function CartpoleConfigModule.obs_cost_fn at 0x7fb8eb843840>,
'plan_hor': 25},
'prop_cfg': {'mode': 'TSinf',
'model_init_cfg': {'load_model': True,
'model_class': <class 'dmbrl.modeling.models.BNN.BNN'>,
'model_constructor': <bound method CartpoleConfigModule.nn_constructor of <cartpole.CartpoleConfigModule object at 0x7fb8eb83f9e8>>,
'model_dir': 'log/2021-11-06--06:49:30/',
'num_nets': 5},
'model_pretrained': True,
'model_train_cfg': {'epochs': 5},
'npart': 20,
'obs_postproc': <function CartpoleConfigModule.obs_postproc at 0x7fb8eb843730>,
'obs_preproc': <function CartpoleConfigModule.obs_preproc at 0x7fb8eb8436a8>,
'targ_proc': <function CartpoleConfigModule.targ_proc at 0x7fb8eb8437b8>}},
'exp_cfg': {'exp_cfg': {'ninit_rollouts': 0,
'nrollouts_per_iter': 1,
'ntrain_iters': 1},
'log_cfg': {'logdir': 'log/', 'nrecord': 1},
'sim_cfg': {'env': <dmbrl.env.cartpole.CartpoleEnv object at 0x7fb8eb83fd68>,
'task_hor': 200}}}
Model loaded from log/2021-11-06--06:49:30/.
Created an ensemble of 5 neural networks with variance predictions.
Created an MPC controller, prop mode TSinf, 20 particles.
Trajectory prediction logging is disabled.
####################################################################
Starting training iteration 1.
ERROR:mujoco_py.mjviewer:GLFW error: 65543, desc: b'GLX: Failed to create context: BadValue (integer parameter out of range for operation)'
ERROR:mujoco_py.mjviewer:GLFW error: 65537, desc: b'The GLFW library is not initialized'
ERROR:mujoco_py.mjviewer:GLFW error: 65537, desc: b'The GLFW library is not initialized'
ERROR:mujoco_py.mjviewer:GLFW error: 65537, desc: b'The GLFW library is not initialized'
ERROR:mujoco_py.mjviewer:GLFW error: 65537, desc: b'The GLFW library is not initialized'
Average action selection time: 1.2227837240695953
Rollout length: 200
Rewards obtained: [179.53440146325102]
ERROR:mujoco_py.mjviewer:GLFW error: 65537, desc: b'The GLFW library is not initialized'
ERROR:mujoco_py.mjviewer:GLFW error: 65537, desc: b'The GLFW library is not initialized'
Exception ignored in: <bound method Env.__del__ of <dmbrl.env.cartpole.CartpoleEnv object at 0x7fb8eb83fd68>>
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/gym/core.py", line 203, in __del__
self.close()
File "/usr/local/lib/python3.5/dist-packages/gym/core.py", line 164, in close
self.render(close=True)
File "/usr/local/lib/python3.5/dist-packages/gym/core.py", line 150, in render
return self._render(mode=mode, close=close)
File "/usr/local/lib/python3.5/dist-packages/gym/envs/mujoco/mujoco_env.py", line 105, in _render
self._get_viewer().finish()
File "/usr/local/lib/python3.5/dist-packages/mujoco_py/mjviewer.py", line 325, in finish
glfw.destroy_window(self.window)
File "/usr/local/lib/python3.5/dist-packages/mujoco_py/glfw.py", line 809, in destroy_window
window_addr = ctypes.cast(ctypes.pointer(window),
TypeError: _type_ must have storage info
I ran this program on our university's GPU server, which runs Ubuntu. The log folder contains only a single folder, with a name of the form "2GU49Z~4", and it is zero KB. I tried adding -o ctrl_cfg.log_cfg.save_all_models True -o ctrl_cfg.log_cfg.log_traj_preds True -o ctrl_cfg.log_cfg.log_particles True at the end of the command, but it made no difference.