kchua / handful-of-trials

Experiment code for "Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models"

License: MIT License

Languages: Python 97.50%, Jupyter Notebook 1.46%, Dockerfile 1.05%
Topics: model-based-rl, reinforcement-learning

handful-of-trials's Issues

About the running time of experiment.

Thanks for great work.

I would like to ask about the execution time for each environment.

For example, I am running the Cartpole MBExperiment. On my computer, the average action selection time is about 3-4 seconds. Is that normal?

Thanks.
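
A rough way to see why planning can be slow: with CEM-based MPC, each action requires pushing many candidate action sequences through the learned model. The sketch below counts single-step model predictions per action as an upper bound, assuming every CEM iteration rolls out `popsize` candidates with `npart` particles each for `plan_hor` steps. The function name is hypothetical, and the numbers are the cartpole settings printed in the render log further down this page.

```python
def model_steps_per_action(popsize, max_iters, plan_hor, npart):
    """Count single-step model predictions needed to choose one action.

    Assumes each CEM iteration propagates popsize * npart trajectories
    for plan_hor steps (an assumption about the planner, not a measurement).
    """
    return popsize * max_iters * plan_hor * npart

# Cartpole values taken from the config dump in the render log on this page.
steps = model_steps_per_action(popsize=400, max_iters=5, plan_hor=25, npart=20)
print(steps)  # 1000000
```

At about a million predictions per action, timings on the order of seconds are plausible on modest hardware; the render log on this page reports roughly 1.22 s per action for cartpole on a GTX 1050.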

Equations for rewards and returns

The log file has two separate parameters called rewards and returns. What are the differences between the two?
Are there mathematical equations to define the two terms?
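
A minimal sketch of one common convention, assuming `rewards` holds the per-step rewards of each rollout and `returns` holds one number per rollout, the undiscounted sum of those rewards (return = r_0 + r_1 + ... + r_{T-1}). This is an assumption about the log layout, not something confirmed in this thread, and the data below is made up.

```python
def rollout_return(rewards):
    """Undiscounted return: the sum of per-step rewards in one rollout."""
    return sum(rewards)

# Hypothetical rollouts with made-up per-step rewards.
rollouts = [[1.0, 0.5, 0.25], [0.0, 2.0]]
returns = [rollout_return(r) for r in rollouts]
print(returns)  # [1.75, 2.0]
```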

Multi-GPU support?

Hi Kurtland,

Thanks! This is great work! A quick question: does this support multiple GPUs? I don't see any settings that let me specify the number of GPUs used for training. If not, is there a plan to add it? And if there is no such plan, where should I look to add the support myself?

-C

More information on reacher task

I'm trying to understand the cost function for the reacher task which uses the function here.
However, I can't find a reference for what each index of the observation is in the Reacher task in the OpenAI documentation.

Can someone help explain what this function is computing? Also, does anyone have a better source of documentation on the state observations in the Reacher task?

Using with other RL env

Hi, thank you so much for this work.

I would like to ask how to use this with other gym-like environments.

For example, I have created a robot simulation env with an OpenAI Gym-style API (step, start, etc.). How could I adapt that env to your implementation?
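
As a starting point, the config dumps printed elsewhere on this page list the per-environment hooks the controller receives: `obs_preproc`, `obs_postproc`, `targ_proc`, `obs_cost_fn`, `ac_cost_fn`, plus `plan_hor` and `task_hor`. The skeleton below is a hypothetical sketch of such a set of hooks for a custom env. The class name, the numeric values, the cost definitions, and the choice to predict state deltas are all assumptions, not the repository's actual code.

```python
class MyRobotConfigModule:
    """Hypothetical config skeleton; hook names mirror the config dumps on this page."""
    TASK_HORIZON = 200  # would feed sim_cfg.task_hor (value is made up)
    PLAN_HOR = 25       # would feed opt_cfg.plan_hor (value is made up)

    @staticmethod
    def obs_preproc(obs):
        # Transform an observation before it enters the model (identity here).
        return obs

    @staticmethod
    def targ_proc(obs, next_obs):
        # Assumption: the model is trained to predict the state delta.
        return [n - o for o, n in zip(obs, next_obs)]

    @staticmethod
    def obs_postproc(obs, pred):
        # Inverse of targ_proc: turn a predicted delta back into a next state.
        return [o + p for o, p in zip(obs, pred)]

    @staticmethod
    def obs_cost_fn(obs):
        # Made-up cost: squared distance of the state from the origin.
        return sum(o ** 2 for o in obs)

    @staticmethod
    def ac_cost_fn(acs):
        # Made-up control penalty.
        return 0.01 * sum(a ** 2 for a in acs)

cfg = MyRobotConfigModule
delta = cfg.targ_proc([1.0, 2.0], [1.5, 1.0])
print(cfg.obs_postproc([1.0, 2.0], delta))  # [1.5, 1.0]
```

The one property worth checking for any such pair is that `obs_postproc` undoes `targ_proc`, as the last two lines do.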

Running experiment on OpenAi Gym Mujoco Environment

Dear author,
In your environments, for example HalfCheetah, you access the reward function and use reward_run and reward_ctrl separately instead of using the original reward returned by step(action), and you also add reward_run to the first dimension of the state.

Is there a config option that allows us to evaluate on the original env provided by OpenAI Gym?

Thank you so much for your support!

About Initial Variance of CEM

Thank you for your great code.
I have a question about the initial variance of CEM.
According to your code, you set the initial variance as follows:

self.init_var = np.tile(np.square(self.ac_ub - self.ac_lb) / 16, [self.plan_hor])

I am trying to implement CEM for MBRLHalfCheetah-v0 using the true model, as you did in your paper.
However, I got poor results. Did you use this parameter for all tasks?
I did get a good result for the CartPole task.

Also, how did you set the CEM parameters to get stable results?

"CEM": { "popsize": 500, "num_elites": 50, "max_iters": 5, "alpha": 0.1 }

Thank you for your help.
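
For reference, a variance of (ub - lb)^2 / 16 corresponds to a standard deviation of (ub - lb) / 4, so with the mean at the midpoint each bound sits two standard deviations away. The sketch below is a minimal pure-Python CEM loop using that initial variance and the hyperparameters quoted above, run on a toy cost. It is not the repository's optimizer: the function name is hypothetical, samples are simply clipped to the bounds, and `alpha` is assumed to weight the previous estimate.

```python
import random

def cem(cost_fn, lb, ub, horizon, popsize=500, num_elites=50,
        max_iters=5, alpha=0.1, seed=0):
    """Cross-entropy method over a sequence of scalar actions in [lb, ub]."""
    rng = random.Random(seed)
    mean = [(lb + ub) / 2.0] * horizon
    # Same initial variance as the quoted line: std = (ub - lb) / 4.
    var = [(ub - lb) ** 2 / 16.0] * horizon
    for _ in range(max_iters):
        # Sample candidates and clip them to the bounds (a simplification).
        samples = [
            [min(ub, max(lb, rng.gauss(m, v ** 0.5))) for m, v in zip(mean, var)]
            for _ in range(popsize)
        ]
        # Keep the lowest-cost candidates.
        elites = sorted(samples, key=cost_fn)[:num_elites]
        for t in range(horizon):
            col = [e[t] for e in elites]
            new_mean = sum(col) / num_elites
            new_var = sum((x - new_mean) ** 2 for x in col) / num_elites
            # In this sketch alpha weights the old estimate, so 0.1 updates quickly.
            mean[t] = alpha * mean[t] + (1 - alpha) * new_mean
            var[t] = alpha * var[t] + (1 - alpha) * new_var
    return mean

# Toy cost: squared distance of every action from 0.5.
plan = cem(lambda acs: sum((a - 0.5) ** 2 for a in acs), lb=-1.0, ub=1.0, horizon=3)
print([round(a, 2) for a in plan])
```

On a longer horizon or a higher-dimensional action space, the same population covers the search space far more thinly, which is one plausible reason a setting that works on CartPole can struggle on HalfCheetah.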

Failed to reproduce with Docker

Hi!

I ran the environment using the latest pre-built Docker image. When training on an RTX 3080 Ti, I get the following failure:

python scripts/mbexp.py -env halfcheetah
2022-08-12 09:10:09.936602: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-08-12 09:10:10.312749: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-12 09:10:10.312892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties: 
name: NVIDIA GeForce RTX 3080 Ti major: 8 minor: 6 memoryClockRate(GHz): 1.71
pciBusID: 0000:07:00.0                                   
totalMemory: 11.76GiB freeMemory: 11.52GiB                                                                        
2022-08-12 09:10:10.312909: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2022-08-12 09:13:36.133367: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-08-12 09:13:36.133405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 
2022-08-12 09:13:36.133414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N 
2022-08-12 09:13:36.133525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11135 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:07:00.0, compute capability: 8.6)
{'ctrl_cfg': {'env': <dmbrl.env.half_cheetah.HalfCheetahEnv object at 0x7f8c91ede160>,
              'opt_cfg': {'ac_cost_fn': <function HalfCheetahConfigModule.ac_cost_fn at 0x7f8c32140488>,     
                          'cfg': {'alpha': 0.1,                                                                   
                                  'max_iters': 5,                                                                 
                                  'num_elites': 50,                                                               
                                  'popsize': 500},
                          'mode': 'CEM',                                                                          
                          'obs_cost_fn': <function HalfCheetahConfigModule.obs_cost_fn at 0x7f8c32140400>,
                          'plan_hor': 30},                                                                        
              'prop_cfg': {'mode': 'TSinf',           
                           'model_init_cfg': {'model_class': <class 'dmbrl.modeling.models.BNN.BNN'>,             
                                              'model_constructor': <bound method HalfCheetahConfigModule.nn_constructor of <halfcheetah.HalfCheetahConfigModule object at 0x7f8c32138748>>,
                                              'num_nets': 5},                                                                                                                                                                        
                           'model_train_cfg': {'epochs': 5},
                           'npart': 20,                                                                           
                           'obs_postproc': <function HalfCheetahConfigModule.obs_postproc at 0x7f8c321402f0>,
                           'obs_preproc': <function HalfCheetahConfigModule.obs_preproc at 0x7f8c32140268>,
                           'targ_proc': <function HalfCheetahConfigModule.targ_proc at 0x7f8c32140378>}},
 'exp_cfg': {'exp_cfg': {'nrollouts_per_iter': 1, 'ntrain_iters': 300},
             'log_cfg': {'logdir': 'log'},                                                                                                                                                                                           
             'sim_cfg': {'env': <dmbrl.env.half_cheetah.HalfCheetahEnv object at 0x7f8c91ede160>,                                                                                                                                    
                         'task_hor': 1000}}} 
Created an ensemble of 5 neural networks with variance predictions.
Created an MPC controller, prop mode TSinf, 20 particles. 
Trajectory prediction logging is disabled.
Average action selection time:  1.0358095169067383e-05
Rollout length:  1000
Network training:   0%|                                                               | 0/5 [00:00<?, ?epoch(s)/s]
2022-08-12 09:15:02.909158: E tensorflow/stream_executor/cuda/cuda_blas.cc:647] failed to run cuBLAS routine cublasSgemmBatched: CUBLAS_STATUS_EXECUTION_FAILED
2022-08-12 09:15:02.909199: E tensorflow/stream_executor/cuda/cuda_blas.cc:2505] Internal: failed BLAS call, see log for details
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[5,32,24], b.shape=[5,24,200], m=32, n=200, k=24, batch_size=5
         [[Node: model_1/MatMul = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model_1/truediv, model/Layer0/FC_weights/read)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "scripts/mbexp.py", line 44, in <module>
    main(args.env, "MPC", args.ctrl_arg, args.override, args.logdir)
  File "scripts/mbexp.py", line 29, in main
    exp.run_experiment()
  File "/workspace/dmbrl/misc/MBExp.py", line 96, in run_experiment
    [sample["rewards"] for sample in samples]
  File "/workspace/dmbrl/controllers/MPC.py", line 180, in train
    self.model.train(self.train_in, self.train_targs, **self.model_train_cfg)
  File "/workspace/dmbrl/modeling/models/BNN.py", line 260, in train
    feed_dict={self.sy_train_in: inputs[batch_idxs], self.sy_train_targ: targets[batch_idxs]}
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[5,32,24], b.shape=[5,24,200], m=32, n=200, k=24, batch_size=5
         [[Node: model_1/MatMul = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model_1/truediv, model/Layer0/FC_weights/read)]]

Caused by op 'model_1/MatMul', defined at:
  File "scripts/mbexp.py", line 44, in <module>
    main(args.env, "MPC", args.ctrl_arg, args.override, args.logdir)
  File "scripts/mbexp.py", line 22, in main
    cfg.exp_cfg.exp_cfg.policy = MPC(cfg.ctrl_cfg)
  File "/workspace/dmbrl/controllers/MPC.py", line 90, in __init__
    )(params.prop_cfg.model_init_cfg)
  File "/workspace/dmbrl/config/halfcheetah.py", line 83, in nn_constructor
    model.finalize(tf.train.AdamOptimizer, {"learning_rate": 0.001})
  File "/workspace/dmbrl/modeling/models/BNN.py", line 180, in finalize
    train_loss = tf.reduce_sum(self._compile_losses(self.sy_train_in, self.sy_train_targ, inc_var_loss=True))
  File "/workspace/dmbrl/modeling/models/BNN.py", line 436, in _compile_losses
    mean, log_var = self._compile_outputs(inputs, ret_log_var=True)
  File "/workspace/dmbrl/modeling/models/BNN.py", line 408, in _compile_outputs
    cur_out = layer.compute_output_tensor(cur_out)
  File "/workspace/dmbrl/modeling/layers/FC.py", line 71, in compute_output_tensor
    raw_output = tf.matmul(input_tensor, self.weights) + self.biases
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 1976, in matmul
    a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1236, in batch_mat_mul
    "BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3414, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1740, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[5,32,24], b.shape=[5,24,200], m=32, n=200, k=24, batch_size=5
         [[Node: model_1/MatMul = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model_1/truediv, model/Layer0/FC_weights/read)]]
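
One possible cause, offered as a hypothesis rather than a confirmed diagnosis: the log reports compute capability 8.6 (Ampere), which needs a CUDA 11 era toolchain, while the Python 3.5 paths in the traceback suggest the image ships a much older TensorFlow and CUDA build. The sketch below only extracts the compute capability from a device line like the one in the log so that this can be checked quickly. The helper name and the 8.0 threshold are assumptions for illustration.

```python
import re

def compute_capability(log_line):
    """Extract (major, minor) from a TensorFlow device log line, or None."""
    match = re.search(r"compute capability: (\d+)\.(\d+)", log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

line = ("physical GPU (device: 0, name: NVIDIA GeForce RTX 3080 Ti, "
        "pci bus id: 0000:07:00.0, compute capability: 8.6)")
cc = compute_capability(line)
# Assumption: capability 8.x (Ampere) requires CUDA 11 or newer.
needs_cuda_11 = cc is not None and cc >= (8, 0)
print(cc, needs_cuda_11)  # (8, 6) True
```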

Could not find a version that satisfies the requirement gpflow==1.1.0

Cloning the repository and running pip install -r requirements.txt from scratch results in this error:

Downloading/unpacking dotmap==1.2.20 (from -r requirements.txt (line 1))
  Downloading dotmap-1.2.20.tar.gz
  Running setup.py (path:/home/jhoang/dev/handful-of-trials/env/build/dotmap/setup.py) egg_info for package dotmap

Downloading/unpacking future==0.16.0 (from -r requirements.txt (line 2))
  Downloading future-0.16.0.tar.gz (824kB): 824kB downloaded
  Running setup.py (path:/home/jhoang/dev/handful-of-trials/env/build/future/setup.py) egg_info for package future

    warning: no files found matching '*.au' under directory 'tests'
    warning: no files found matching '*.gif' under directory 'tests'
    warning: no files found matching '*.txt' under directory 'tests'
Downloading/unpacking gpflow==1.1.0 (from -r requirements.txt (line 3))
  Could not find a version that satisfies the requirement gpflow==1.1.0 (from -r requirements.txt (line 3)) (from versions: 1.0.0, 1.0.0, 1.1.1, 1.2.0)
Cleaning up...
No distributions matching the version for gpflow==1.1.0 (from -r requirements.txt (line 3))

I'm assuming 1.1.1 should also work, just wanted to throw this out there

Don't overwrite logs

In MBExp.py, is there a reason you're saving logs.mat to the base log directory and not to iter_dir? It seems like this overwrites logs.mat at each iteration.

            savemat(
                os.path.join(self.logdir, "logs.mat"),  # <-- This line
                {
                    "observations": traj_obs,
                    "actions": traj_acs,
                    "returns": traj_rets,
                    "rewards": traj_rews
                }
            )

I would think you'd instead want

savemat(
    os.path.join(iter_dir, "logs.mat"),
    ...

but maybe I'm missing something.
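
A minimal sketch of the per-iteration layout proposed above, using only the standard library. The helper name and the `train_iter%d` directory scheme are hypothetical. If the lists passed to savemat accumulate across iterations (not verified here), the overwritten file would still hold everything, and per-iteration files would only matter as snapshots.

```python
import os
import tempfile

def iteration_log_path(logdir, iteration):
    """Build a per-iteration logs.mat path so each iteration keeps its own file."""
    # "train_iter%d" is a hypothetical directory naming scheme.
    iter_dir = os.path.join(logdir, "train_iter%d" % iteration)
    os.makedirs(iter_dir, exist_ok=True)
    return os.path.join(iter_dir, "logs.mat")

with tempfile.TemporaryDirectory() as logdir:
    paths = [iteration_log_path(logdir, i) for i in (1, 2)]
    distinct = len(set(paths)) == 2
    print(distinct)  # True
```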

Dockerfile does not support mujoco rendering

The docker image that is provided (kchua/handful-of-trials) can run the mbexp.py script, but cannot run the render.py script. The error is as follows:

-> python scripts/render.py -env cartpole -logdir log/ -model-dir log/2021-11-06--06\:49\:30/
2022-04-21 09:37:34.268473: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-04-21 09:37:34.346063: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-04-21 09:37:34.346510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties: 
name: NVIDIA GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.63GiB
2022-04-21 09:37:34.346552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2022-04-21 09:37:34.538159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-04-21 09:37:34.538212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 
2022-04-21 09:37:34.538222: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N 
2022-04-21 09:37:34.538359: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3359 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
{'ctrl_cfg': {'env': <dmbrl.env.cartpole.CartpoleEnv object at 0x7fb8eb83fd68>,
              'opt_cfg': {'ac_cost_fn': <function CartpoleConfigModule.ac_cost_fn at 0x7fb8eb8438c8>,
                          'cfg': {'alpha': 0.1,
                                  'max_iters': 5,
                                  'num_elites': 40,
                                  'popsize': 400},
                          'mode': 'CEM',
                          'obs_cost_fn': <function CartpoleConfigModule.obs_cost_fn at 0x7fb8eb843840>,
                          'plan_hor': 25},
              'prop_cfg': {'mode': 'TSinf',
                           'model_init_cfg': {'load_model': True,
                                              'model_class': <class 'dmbrl.modeling.models.BNN.BNN'>,
                                              'model_constructor': <bound method CartpoleConfigModule.nn_constructor of <cartpole.CartpoleConfigModule object at 0x7fb8eb83f9e8>>,
                                              'model_dir': 'log/2021-11-06--06:49:30/',
                                              'num_nets': 5},
                           'model_pretrained': True,
                           'model_train_cfg': {'epochs': 5},
                           'npart': 20,
                           'obs_postproc': <function CartpoleConfigModule.obs_postproc at 0x7fb8eb843730>,
                           'obs_preproc': <function CartpoleConfigModule.obs_preproc at 0x7fb8eb8436a8>,
                           'targ_proc': <function CartpoleConfigModule.targ_proc at 0x7fb8eb8437b8>}},
 'exp_cfg': {'exp_cfg': {'ninit_rollouts': 0,
                         'nrollouts_per_iter': 1,
                         'ntrain_iters': 1},
             'log_cfg': {'logdir': 'log/', 'nrecord': 1},
             'sim_cfg': {'env': <dmbrl.env.cartpole.CartpoleEnv object at 0x7fb8eb83fd68>,
                         'task_hor': 200}}}
Model loaded from log/2021-11-06--06:49:30/.
Created an ensemble of 5 neural networks with variance predictions.
Created an MPC controller, prop mode TSinf, 20 particles. 
Trajectory prediction logging is disabled.
####################################################################
Starting training iteration 1.
ERROR:mujoco_py.mjviewer:GLFW error: 65543, desc: b'GLX: Failed to create context: BadValue (integer parameter out of range for operation)'
ERROR:mujoco_py.mjviewer:GLFW error: 65537, desc: b'The GLFW library is not initialized'
ERROR:mujoco_py.mjviewer:GLFW error: 65537, desc: b'The GLFW library is not initialized'
ERROR:mujoco_py.mjviewer:GLFW error: 65537, desc: b'The GLFW library is not initialized'
ERROR:mujoco_py.mjviewer:GLFW error: 65537, desc: b'The GLFW library is not initialized'
Average action selection time:  1.2227837240695953
Rollout length:  200
Rewards obtained: [179.53440146325102]
ERROR:mujoco_py.mjviewer:GLFW error: 65537, desc: b'The GLFW library is not initialized'
ERROR:mujoco_py.mjviewer:GLFW error: 65537, desc: b'The GLFW library is not initialized'
Exception ignored in: <bound method Env.__del__ of <dmbrl.env.cartpole.CartpoleEnv object at 0x7fb8eb83fd68>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/gym/core.py", line 203, in __del__
    self.close()
  File "/usr/local/lib/python3.5/dist-packages/gym/core.py", line 164, in close
    self.render(close=True)
  File "/usr/local/lib/python3.5/dist-packages/gym/core.py", line 150, in render
    return self._render(mode=mode, close=close)
  File "/usr/local/lib/python3.5/dist-packages/gym/envs/mujoco/mujoco_env.py", line 105, in _render
    self._get_viewer().finish()
  File "/usr/local/lib/python3.5/dist-packages/mujoco_py/mjviewer.py", line 325, in finish
    glfw.destroy_window(self.window)
  File "/usr/local/lib/python3.5/dist-packages/mujoco_py/glfw.py", line 809, in destroy_window
    window_addr = ctypes.cast(ctypes.pointer(window),
TypeError: _type_ must have storage info

No log has been saved

I ran this program on our university's GPU server, which runs Ubuntu. The log folder contains only a single folder with a name of the form "2GU49Z~4", and it is zero KB. I tried adding -o ctrl_cfg.log_cfg.save_all_models True -o ctrl_cfg.log_cfg.log_traj_preds True -o ctrl_cfg.log_cfg.log_particles True at the end of the command, but it made no difference.
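
One hypothesis, not a confirmed diagnosis: run directories elsewhere on this page are named like `2021-11-06--06:49:30`, and colons are not valid in Windows file names, so a share browsed from Windows may display a mangled short name such as `2GU49Z~4` with no visible contents. If that is what is happening, a colon-free timestamp avoids it. The helper below is a hypothetical sketch, not part of the repository.

```python
import time

def safe_run_dirname(t=None):
    """Timestamped directory name without ':' so it is valid on Windows shares."""
    # t is seconds since the epoch; None means the current time.
    return time.strftime("%Y-%m-%d--%H-%M-%S", time.localtime(t))

name = safe_run_dirname(0)
print(":" in name)  # False
```

Listing the log directory directly on the server (for example over SSH) is a quick way to check whether the files are actually there.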
