tianheyu927 / mopo

Code for MOPO: Model-based Offline Policy Optimization

License: MIT License

Python 99.95% Shell 0.05%

mopo's Issues

In the current code, spectral normalization does not seem to be properly applied

When an FC layer is added, the BNN class stores a copy of the layer rather than the layer itself:

https://github.com/tianheyu927/mopo/blob/master/mopo/models/bnn.py#L141

The copy, in turn, is built from the layer's repr:
https://github.com/tianheyu927/mopo/blob/master/mopo/models/fc.py#L132
https://github.com/tianheyu927/mopo/blob/master/mopo/models/fc.py#L51

However, the repr does not include the sn attribute, so the copy falls back to the default value sn=False.

Even if I add sn to the repr, TensorFlow raises an error because of the duplicate name 'u'.
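
The first half of this report can be reproduced without TensorFlow. The sketch below is a toy stand-in, not the repository's FC class: the class names, constructor arguments, and the eval-based copy are assumptions made for illustration. It shows how rebuilding an object from its repr silently resets any argument the repr leaves out, and how listing the argument restores it. It does not address the duplicate-name error for 'u'.

```python
class FC:
    """Toy layer (hypothetical); only the constructor arguments matter here."""

    def __init__(self, output_dim, activation=None, sn=False):
        self.output_dim = output_dim
        self.activation = activation
        self.sn = sn  # spectral-normalization flag

    def __repr__(self):
        # Mirrors the reported problem: `sn` is missing from the repr.
        return "FC({!r}, activation={!r})".format(self.output_dim, self.activation)

    def copy(self):
        # Rebuild the layer from its repr; any argument the repr omits
        # falls back to the constructor default.
        cls = type(self)
        return eval(repr(self), {cls.__name__: cls})


class FCFixed(FC):
    """Same toy layer, but the repr lists every constructor argument."""

    def __repr__(self):
        return "FCFixed({!r}, activation={!r}, sn={!r})".format(
            self.output_dim, self.activation, self.sn)


original = FC(200, activation="swish", sn=True)
fixed = FCFixed(200, activation="swish", sn=True)

print(original.copy().sn)  # False: the flag was lost in the copy
print(fixed.copy().sn)     # True: the flag survives the copy
```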

Running the example gives: "ModuleNotFoundError: No module named 'mopo.env'"

It looks to me like this module is indeed missing.
Traceback:

(mopo) capybara@blg4101:~/mopo$ mopo run_local examples.development --config=examples.config.d4rl.halfcheetah_mixed --gpus=1 --trial-gpus=1
WARNING: Logging before flag parsing goes to stderr.
W0909 03:02:30.004770 47483036384896 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

running build_ext
building 'mujoco_py.cymj' extension
gcc -pthread -B /home/capybara/anaconda3/envs/mopo/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/mujoco_py -I/home/capybara/.mujoco/mujoco200/include -I/home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/numpy/core/include -I/home/capybara/anaconda3/envs/mopo/include/python3.6m -c /home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/mujoco_py/cymj.c -o /home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.10_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/mujoco_py/cymj.o -fopenmp -w
gcc -pthread -B /home/capybara/anaconda3/envs/mopo/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/mujoco_py -I/home/capybara/.mujoco/mujoco200/include -I/home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/numpy/core/include -I/home/capybara/anaconda3/envs/mopo/include/python3.6m -c /home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/mujoco_py/gl/osmesashim.c -o /home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.10_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/mujoco_py/gl/osmesashim.o -fopenmp -w
creating /home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.10_36_linuxcpuextensionbuilder/lib.linux-x86_64-3.6
creating /home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.10_36_linuxcpuextensionbuilder/lib.linux-x86_64-3.6/mujoco_py
gcc -pthread -shared -B /home/capybara/anaconda3/envs/mopo/compiler_compat -L/home/capybara/anaconda3/envs/mopo/lib -Wl,-rpath=/home/capybara/anaconda3/envs/mopo/lib -Wl,--no-as-needed -Wl,--sysroot=/ /home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.10_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/mujoco_py/cymj.o /home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.10_36_linuxcpuextensionbuilder/temp.linux-x86_64-3.6/home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/mujoco_py/gl/osmesashim.o -L/home/capybara/.mujoco/mujoco200/bin -Wl,-R/home/capybara/.mujoco/mujoco200/bin -lmujoco200 -lglewosmesa -lOSMesa -lGL -o /home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/mujoco_py/generated/_pyxbld_2.0.2.10_36_linuxcpuextensionbuilder/lib.linux-x86_64-3.6/mujoco_py/cymj.cpython-36m-x86_64-linux-gnu.so -fopenmp
Warning: Flow failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'flow'
Warning: CARLA failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'carla'
Traceback (most recent call last):
  File "/home/capybara/anaconda3/envs/mopo/bin/mopo", line 33, in <module>
    sys.exit(load_entry_point('mopo', 'console_scripts', 'mopo')())
  File "/home/capybara/mopo/softlearning/scripts/console_scripts.py", line 202, in main
    return cli()
  File "/home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/capybara/anaconda3/envs/mopo/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/capybara/mopo/softlearning/scripts/console_scripts.py", line 71, in run_example_local_cmd
    return run_example_local(example_module_name, example_argv)
  File "/home/capybara/mopo/examples/instrument.py", line 209, in run_example_local
    example_args = example_module.get_parser().parse_args(example_argv)
  File "/home/capybara/mopo/examples/development/__init__.py", line 37, in get_parser
    from examples.utils import get_parser
  File "/home/capybara/mopo/examples/utils.py", line 9, in <module>
    import softlearning.environments.utils as env_utils
  File "/home/capybara/mopo/softlearning/environments/utils.py", line 6, in <module>
    import mopo.env as env_overwrite
ModuleNotFoundError: No module named 'mopo.env'

What's the difference between MOPO-no_penalty and MBPO?

Hi, thanks so much for your nice work and code. I have a question about the results in Table 3, specifically the MOPO-no_pen and MBPO results. Could you clarify the difference between these two experiments? On page 19, the paper says: "For simplicity, we use MBPO, which essentially MOPO without reward penalty, for this ablation study". If so, what causes the performance gap in Table 3?

MOPO predict

Do you have a code example showing how to predict with and evaluate MOPO after fitting the model?

Question about the numbers reported in the paper

Hi, I really appreciate your open-source code. My question is how the performance numbers in the paper are reported.

For example, in Table 1, do you use the maximum evaluation return over the course of training, or the last evaluation return? The policy's return varies widely from iteration to iteration.

Thanks,
Yue
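
To make the question concrete, here is a minimal sketch of how the two reporting conventions can diverge. The return values are made up for illustration and are not results from the paper; the function name and the "mean of the last k evaluations" option are assumptions added for comparison.

```python
from statistics import mean


def summarize_returns(eval_returns, last_k=3):
    """Summarize a sequence of per-iteration evaluation returns."""
    return {
        "max": max(eval_returns),                       # best evaluation seen
        "last": eval_returns[-1],                       # final evaluation
        "mean_last_k": mean(eval_returns[-last_k:]),    # smoothed final value
    }


# Made-up, high-variance evaluation curve (illustrative only).
returns = [1200.0, 3400.0, 2100.0, 3900.0, 2500.0]
summary = summarize_returns(returns)
print(summary)
```

With a noisy curve like this one, the maximum is well above the final value, which is why the reporting convention matters when comparing numbers.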

Open the HIV dataset

Hi, Tianhe. Could you please release the HIV dataset buffer collected by the SAC agent?

Thanks a lot.

The viskit dependencies were updated in July 2021

Since viskit's dependencies were updated in July 2021, make sure you install the package versions that were supported before that update when installing viskit.
vitchyr/viskit@5411264

Any plan to support tensorflow-gpu 2 or 2.7?

Hello!
Do you have any plan to support tensorflow-gpu 2?

Because of my Python and CUDA versions, I have to use mopo with tensorflow-gpu 2.7, so I made the usual modifications to port it from 1.14.0 to 2.7. Before I added a tape, it reported that a tape is required:

raise ValueError("tape is required when a Tensor loss is passed. "
ValueError: tape is required when a Tensor loss is passed. Received: loss=Tensor("BNN_1/add_9:0", shape=(), dtype=float32), tape=None.

But once I added var_list=self.optvars, tape=tf.GradientTape() in bnn.py, it raised the error below:

  File "/code/com//mopo/mopo/models/bnn.py", line 256, in finalize
    self.train_op = self.optimizer.minimize(train_loss, var_list=self.optvars, tape=tf.GradientTape())
  File "/root/miniconda3/envs/mopo/lib/python3.9/site-packages/keras/optimizer_v2/optimizer_v2.py", line 532, in minimize
    return self.apply_gradients(grads_and_vars, name=name)
  File "/root/miniconda3/envs/mopo/lib/python3.9/site-packages/keras/optimizer_v2/optimizer_v2.py", line 633, in apply_gradients
    grads_and_vars = optimizer_utils.filter_empty_gradients(grads_and_vars)
  File "/root/miniconda3/envs/mopo/lib/python3.9/site-packages/keras/optimizer_v2/utils.py", line 73, in filter_empty_gradients
    raise ValueError(f"No gradients provided for any variable: {variable}. "

ValueError: No gradients provided for any variable: (['BNN/Layer0_mean/FC_weights:0', 'BNN/Layer0_mean/FC_biases:0', 'BNN/Layer1_mean/FC_weights:0', 'BNN/Layer1_mean/FC_biases:0', 'BNN/Layer2_mean/FC_weights:0', 'BNN/Layer2_mean/FC_biases:0', 'BNN/Layer3_mean/FC_weights:0', 'BNN/Layer3_mean/FC_biases:0', 'BNN/Layer4_mean/FC_weights:0', 'BNN/Layer4_mean/FC_biases:0', 'BNN/Layer0_var/FC_weights:0', 'BNN/Layer0_var/FC_biases:0', 'BNN/max_log_var:0', 'BNN/min_log_var:0'],). Provided grads_and_vars is ((None, <tf.Variable 'BNN/Layer0_mean/FC_weights:0' shape=(7, 14, 200) dtype=float32>), (None, <tf.Variable 'BNN/Layer0_mean/FC_biases:0' shape=(7, 1, 200) dtype=float32>), (None, <tf.Variable 'BNN/Layer1_mean/FC_weights:0' shape=(7, 200, 200) dtype=float32>), (None, <tf.Variable 'BNN/Layer1_mean/FC_biases:0' shape=(7, 1, 200) dtype=float32>), (None, <tf.Variable 'BNN/Layer2_mean/FC_weights:0' shape=(7, 200, 200) dtype=float32>), (None, <tf.Variable 'BNN/Layer2_mean/FC_biases:0' shape=(7, 1, 200) dtype=float32>), (None, <tf.Variable 'BNN/Layer3_mean/FC_weights:0' shape=(7, 200, 200) dtype=float32>), (None, <tf.Variable 'BNN/Layer3_mean/FC_biases:0' shape=(7, 1, 200) dtype=float32>), (None, <tf.Variable 'BNN/Layer4_mean/FC_weights:0' shape=(7, 200, 267) dtype=float32>), (None, <tf.Variable 'BNN/Layer4_mean/FC_biases:0' shape=(7, 1, 267) dtype=float32>), (None, <tf.Variable 'BNN/Layer0_var/FC_weights:0' shape=(7, 200, 267) dtype=float32>), (None, <tf.Variable 'BNN/Layer0_var/FC_biases:0' shape=(7, 1, 267) dtype=float32>), (None, <tf.Variable 'BNN/max_log_var:0' shape=(1, 267) dtype=float32>), (None, <tf.Variable 'BNN/min_log_var:0' shape=(1, 267) dtype=float32>)).

Do you have any idea how to solve this problem? And do you have any plan to support tensorflow-gpu 2?

Thank you very much!

No module named 'examples.instrument'

Hi, first of all, thanks for sharing the code of your paper.

I ran into many problems during installation, because viskit's dependencies are incompatible with the ones specified in your library (e.g., the NumPy version). I worked around this by removing some of the version specifications from your library's installer and from viskit's.

For this reason, I list all the packages in the mopo conda environment, with their versions, at the end of this message.

When I launch mopo run_local examples.development --config=examples.config.d4rl.halfcheetah_mixed --gpus=1 --trial-gpus=1 from the main folder, I get the following stack trace:

Traceback (most recent call last):
  File "/home/samuele/miniconda3/envs/mopo/bin/mopo", line 33, in <module>
    sys.exit(load_entry_point('mopo', 'console_scripts', 'mopo')())
  File "/home/samuele/miniconda3/envs/mopo/bin/mopo", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/home/samuele/miniconda3/envs/mopo/lib/python3.6/site-packages/importlib_metadata/__init__.py", line 100, in load
    module = import_module(match.group('module'))
  File "/home/samuele/miniconda3/envs/mopo/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/samuele/Projects/mopo/softlearning/scripts/console_scripts.py", line 26, in <module>
    from examples.instrument import (
ModuleNotFoundError: No module named 'examples.instrument'

I launched the command from different folders (e.g., the main folder and the mopo folder), and I always get the same error.

Details of my environment:

aiohttp-cors              0.7.0                    pypi_0    pypi
aioredis                  1.3.1                    pypi_0    pypi
asn1crypto                1.4.0                      py_0  
astor                     0.8.1                    pypi_0    pypi
astunparse                1.6.3                    pypi_0    pypi
async-timeout             3.0.1                    pypi_0    pypi
atari-py                  0.2.6                    pypi_0    pypi
atomicwrites              1.4.0                    pypi_0    pypi
attrs                     20.3.0                   pypi_0    pypi
blessings                 1.7                      pypi_0    pypi
boto3                     1.17.14                  pypi_0    pypi
botocore                  1.20.14                  pypi_0    pypi
brotlipy                  0.7.0           py36h27cfd23_1003  
ca-certificates           2021.1.19            h06a4308_0  
cachetools                4.2.1                    pypi_0    pypi
certifi                   2020.12.5        py36h06a4308_0  
cffi                      1.14.5                   pypi_0    pypi
chardet                   3.0.4                    pypi_0    pypi
click                     7.1.2                    pypi_0    pypi
cloudpickle               1.6.0                    pypi_0    pypi
colorama                  0.4.4                    pypi_0    pypi
colorful                  0.5.4                    pypi_0    pypi
conda                     4.9.2            py36h06a4308_0  
conda-package-handling    1.7.2            py36h03888b9_0  
contextvars               2.4                      pypi_0    pypi
cryptography              3.4.6                    pypi_0    pypi
cython                    0.29.22                  pypi_0    pypi
d4rl                      1.1                      pypi_0    pypi
dask                      2021.2.0                 pypi_0    pypi
dataclasses               0.8                      pypi_0    pypi
decorator                 4.4.2                    pypi_0    pypi
deepdiff                  5.2.3                    pypi_0    pypi
dm-control                0.0.355168290            pypi_0    pypi
dm-env                    1.4                      pypi_0    pypi
dm-tree                   0.1.5                    pypi_0    pypi
dotmap                    1.3.23                   pypi_0    pypi
filelock                  3.0.12                   pypi_0    pypi
flask                     1.0.2                    pypi_0    pypi
flatbuffers               1.12                     pypi_0    pypi
funcsigs                  1.0.2                    pypi_0    pypi
gast                      0.3.3                    pypi_0    pypi
gitdb                     4.0.5                    pypi_0    pypi
gitdb2                    4.0.2                    pypi_0    pypi
gitpython                 3.1.13                   pypi_0    pypi
glfw                      2.0.0                    pypi_0    pypi
google-api-core           1.26.0                   pypi_0    pypi
google-api-python-client  1.12.8                   pypi_0    pypi
google-auth               1.27.0                   pypi_0    pypi
google-auth-httplib2      0.0.4                    pypi_0    pypi
google-auth-oauthlib      0.4.2                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
googleapis-common-protos  1.52.0                   pypi_0    pypi
gpflow                    2.1.4                    pypi_0    pypi
gpustat                   0.6.0                    pypi_0    pypi
grpcio                    1.32.0                   pypi_0    pypi
gtimer                    1.0.0b5                  pypi_0    pypi
gym                       0.18.0                   pypi_0    pypi
h5py                      2.10.0                   pypi_0    pypi
hiredis                   1.1.0                    pypi_0    pypi
httplib2                  0.19.0                   pypi_0    pypi
idna                      2.10               pyhd3eb1b0_0  
idna-ssl                  1.1.0                    pypi_0    pypi
immutables                0.15                     pypi_0    pypi
importlib-metadata        3.6.0                    pypi_0    pypi
iniconfig                 1.1.1                    pypi_0    pypi
itsdangerous              1.1.0                    pypi_0    pypi
jmespath                  0.10.0                   pypi_0    pypi
joblib                    1.0.1                    pypi_0    pypi
jsonschema                3.2.0                    pypi_0    pypi
keras-applications        1.0.8                    pypi_0    pypi
keras-preprocessing       1.1.2                    pypi_0    pypi
kiwisolver                1.3.1                    pypi_0    pypi
labmaze                   1.0.3                    pypi_0    pypi
libedit                   3.1.20191231         h14c3975_1  
libffi                    3.2.1             hf484d3e_1007  
libgcc-ng                 9.1.0                hdf63c60_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
lockfile                  0.12.2                   pypi_0    pypi
lxml                      4.6.2                    pypi_0    pypi
lz4                       3.1.3                    pypi_0    pypi
markdown                  3.3.3                    pypi_0    pypi
matplotlib                2.0.2                    pypi_0    pypi
mjrl                      1.0.0                    pypi_0    pypi
mopo                      0.0.1                     dev_0    <develop>
more-itertools            8.7.0                    pypi_0    pypi
msgpack                   1.0.2                    pypi_0    pypi
multidict                 5.1.0                    pypi_0    pypi
multipledispatch          0.6.0                    pypi_0    pypi
multiworld                0.0.0                    pypi_0    pypi
ncurses                   6.2                  he6710b0_1  
networkx                  2.5                      pypi_0    pypi
numpy                     1.16.0                   pypi_0    pypi
nvidia-ml-py3             7.352.0                  pypi_0    pypi
oauthlib                  3.1.0                    pypi_0    pypi
opencensus                0.7.12                   pypi_0    pypi
opencensus-context        0.1.2                    pypi_0    pypi
opencv-python-headless    4.3.0.36                 pypi_0    pypi
openssl                   1.0.2u               h7b6447c_0  
opt-einsum                3.3.0                    pypi_0    pypi
ordered-set               4.0.2                    pypi_0    pypi
packaging                 20.9                     pypi_0    pypi
pandas                    1.1.5                    pypi_0    pypi
patchelf                  0.9                  he6710b0_3  
pip                       21.0.1           py36h06a4308_0  
plotly                    4.0.0                    pypi_0    pypi
pluggy                    0.13.1                   pypi_0    pypi
prometheus-client         0.9.0                    pypi_0    pypi
protobuf                  3.15.2                   pypi_0    pypi
psutil                    5.8.0                    pypi_0    pypi
py                        1.10.0                   pypi_0    pypi
py-spy                    0.3.4                    pypi_0    pypi
pyasn1                    0.4.8                    pypi_0    pypi
pyasn1-modules            0.2.8                    pypi_0    pypi
pybullet                  3.0.8                    pypi_0    pypi
pycosat                   0.6.3            py36h27cfd23_0  
pycparser                 2.20                       py_2  
pygame                    2.0.1                    pypi_0    pypi
pyglet                    1.5.0                    pypi_0    pypi
pyopengl                  3.1.5                    pypi_0    pypi
pyopenssl                 20.0.1                   pypi_0    pypi
pyrsistent                0.17.3                   pypi_0    pypi
pysocks                   1.7.1            py36h06a4308_0  
pytest                    6.2.2                    pypi_0    pypi
python                    3.6.5                hc3d631a_2  
pytz                      2021.1                   pypi_0    pypi
pywavelets                1.1.1                    pypi_0    pypi
pyyaml                    5.4.1                    pypi_0    pypi
ray                       1.2.0                    pypi_0    pypi
readline                  7.0                  h7b6447c_5  
redis                     3.5.3                    pypi_0    pypi
requests                  2.25.1             pyhd3eb1b0_0  
requests-oauthlib         1.3.0                    pypi_0    pypi
retrying                  1.3.3                    pypi_0    pypi
rsa                       4.7.2                    pypi_0    pypi
ruamel_yaml               0.15.87          py36h7b6447c_1  
s3transfer                0.3.4                    pypi_0    pypi
scikit-image              0.17.2                   pypi_0    pypi
scikit-learn              0.24.1                   pypi_0    pypi
scipy                     1.5.4                    pypi_0    pypi
serializable              0.1.0                    pypi_0    pypi
setproctitle              1.2.2                    pypi_0    pypi
setuptools                52.0.0           py36h06a4308_0  
six                       1.15.0           py36h06a4308_0  
smmap                     3.0.5                    pypi_0    pypi
smmap2                    3.0.1                    pypi_0    pypi
sqlite                    3.33.0               h62c20be_0  
tabulate                  0.8.9                    pypi_0    pypi
tensorboard               2.4.1                    pypi_0    pypi
tensorboard-plugin-wit    1.8.0                    pypi_0    pypi
tensorboardx              2.1                      pypi_0    pypi
tensorflow                2.4.1                    pypi_0    pypi
tensorflow-estimator      2.4.0                    pypi_0    pypi
tensorflow-gpu            2.4.1                    pypi_0    pypi
tensorflow-probability    0.12.1                   pypi_0    pypi
termcolor                 1.1.0                    pypi_0    pypi
threadpoolctl             2.1.0                    pypi_0    pypi
tifffile                  2020.9.3                 pypi_0    pypi
tk                        8.6.10               hbc83047_0  
toml                      0.10.2                   pypi_0    pypi
toolz                     0.11.1                   pypi_0    pypi
tqdm                      4.57.0                   pypi_0    pypi
typing-extensions         3.7.4.3                  pypi_0    pypi
uritemplate               3.0.1                    pypi_0    pypi
urllib3                   1.26.3             pyhd3eb1b0_0  
viskit                    0.1                       dev_0    <develop>
werkzeug                  1.0.1                    pypi_0    pypi
wheel                     0.36.2             pyhd3eb1b0_0  
wrapt                     1.12.1                   pypi_0    pypi
xz                        5.2.5                h7b6447c_0  
yaml                      0.2.5                h7b6447c_0  
yarl                      1.6.3                    pypi_0    pypi
zipp                      3.4.0                    pypi_0    pypi
zlib                      1.2.11               h7b6447c_3

Thanks in advance!
Best regards,
Samuele.

installing viskit

This line of the installation instructions didn't work for me:
pip install -e viskit

Does this just mean that you should clone viskit somewhere else and install it with pip install -e?

e.g.
cd ~
git clone git@github.com:vitchyr/viskit.git
cd viskit
pip install -e .

What is the difference between MOPO and SAC?

I recently found a PyTorch implementation of MOPO (https://github.com/junming-yang/mopo-pytorch), but I cannot find the difference between MOPO and SAC. Is the only difference that the data are sampled from the rollout replay buffer generated by the learned dynamics?
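
For context, the MOPO paper describes one more ingredient besides training on model rollouts: the model-predicted reward is reduced by an uncertainty penalty before the transition is stored. The sketch below is my reading of that idea, not code from either repository. The function name, the input layout, and the choice of uncertainty estimate (the largest norm of the predicted standard deviation across ensemble members) are assumptions for illustration.

```python
import math


def penalized_reward(reward, ensemble_stds, penalty_coeff):
    """Return reward - penalty_coeff * u, where u is an uncertainty estimate.

    ensemble_stds holds one predicted standard-deviation vector per
    ensemble member for the same (state, action) pair (assumed layout).
    """
    # Assumed uncertainty estimate: largest L2 norm over ensemble members.
    uncertainty = max(
        math.sqrt(sum(s * s for s in stds)) for stds in ensemble_stds
    )
    return reward - penalty_coeff * uncertainty


stds = [[0.3, 0.4], [0.6, 0.8]]  # made-up predictions from two ensemble members

print(penalized_reward(1.0, stds, penalty_coeff=1.0))  # penalized reward
print(penalized_reward(1.0, stds, penalty_coeff=0.0))  # no penalty: reward unchanged
```

With a penalty coefficient of zero the reward passes through unchanged, which corresponds to the "without reward penalty" setting mentioned in the ablation question above.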

ModuleNotFoundError: No module named 'ray.autoscaler.commands'

(base) hzx@hzx-System-Product-Name:~/mopo$ conda env create -f environment/gpu-env.yml
Collecting package metadata (repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
current version: 22.9.0
latest version: 23.10.0

Please update conda by running

$ conda update -n base -c defaults conda

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Installing pip dependencies: | Ran pip subprocess with arguments:
['/home/hzx/anaconda3/envs/mopo/bin/python', '-m', 'pip', 'install', '-U', '-r', '/home/hzx/mopo/environment/condaenv.dv0e6q7u.requirements.txt']
Pip subprocess output:
Collecting git+https://github.com/vitchyr/multiworld.git@d76b3dae2e8cbca02924f93d6cc0239c552f6408 (from -r /home/hzx/mopo/environment/requirements.txt (line 53))
Cloning https://github.com/vitchyr/multiworld.git (to revision d76b3dae2e8cbca02924f93d6cc0239c552f6408) to /tmp/pip-req-build-l357d2x_

Pip subprocess error:
Running command git clone -q https://github.com/vitchyr/multiworld.git /tmp/pip-req-build-l357d2x_
error: RPC failed; curl 92 HTTP/2 stream 0 was not closed cleanly: CANCEL (err 8)
fatal: the remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
WARNING: Discarding git+https://github.com/vitchyr/multiworld.git@d76b3dae2e8cbca02924f93d6cc0239c552f6408. Command errored out with exit status 128: git clone -q https://github.com/vitchyr/multiworld.git /tmp/pip-req-build-l357d2x_ Check the logs for full command output.
ERROR: Command errored out with exit status 128: git clone -q https://github.com/vitchyr/multiworld.git /tmp/pip-req-build-l357d2x_ Check the logs for full command output.

failed

CondaEnvException: Pip failed

Why is MOPO given access to a terminal function in rollout generation?

Hi, thanks for sharing your great work. However, I am confused about the rollout generation process.

As far as I can see in the code, the agent has access to a predefined terminal function that cuts off unrealistic states. Does this assumption generally hold across the broad range of offline RL settings? To my understanding, in the offline setting the agent should only have access to a fixed dataset and nothing else. This feels a little like cheating to me, especially since the paper argues that one of the differences between MOPO and MOReL is that MOPO's soft penalty, rather than a hard termination, allows the agent to take risky actions.

Besides, if MOPO really needs the terminal function, why not learn one with a neural network? I have seen many model-based works on Atari games that use a learned terminal function.