
offlinerl's Introduction

OfflineRL

OfflineRL is a repository of offline RL algorithms (offline reinforcement learning, also known as batch reinforcement learning).

Re-implemented Algorithms

Model-free methods

  • CRR: Wang, Ziyu, et al. “Critic Regularized Regression.” Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 7768–7778. paper
  • CQL: Kumar, Aviral, et al. “Conservative Q-Learning for Offline Reinforcement Learning.” Advances in Neural Information Processing Systems, vol. 33, 2020. paper code
  • PLAS: Zhou, Wenxuan, et al. “PLAS: Latent Action Space for Offline Reinforcement Learning.” ArXiv Preprint ArXiv:2011.07213, 2020. website paper code
  • BCQ: Fujimoto, Scott, et al. “Off-Policy Deep Reinforcement Learning without Exploration.” International Conference on Machine Learning, 2019, pp. 2052–2062. paper code
  • EDAC: An, Gaon, et al. "Uncertainty-based offline reinforcement learning with diversified q-ensemble." Advances in neural information processing systems 34 (2021): 7436-7447. paper code
  • MCQ: Lyu, Jiafei, et al. "Mildly conservative q-learning for offline reinforcement learning." Advances in Neural Information Processing Systems 35 (2022): 1711-1724. paper code
  • TD3BC: Fujimoto, Scott, and Shixiang Shane Gu. "A minimalist approach to offline reinforcement learning." Advances in neural information processing systems 34 (2021): 20132-20145. paper code
  • PRDC: Ran, Yuhang, et al. “Policy Regularization with Dataset Constraint for Offline Reinforcement Learning.” International Conference on Machine Learning, 2023, pp. 28701-28717. paper code

Model-based methods

  • BREMEN: Matsushima, Tatsuya, et al. “Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization.” International Conference on Learning Representations, 2021. paper code
  • COMBO: Yu, Tianhe, et al. "COMBO: Conservative Offline Model-Based Policy Optimization." arXiv preprint arXiv:2102.08363 (2021). paper
  • MOPO: Yu, Tianhe, et al. “MOPO: Model-Based Offline Policy Optimization.” Advances in Neural Information Processing Systems, vol. 33, 2020. paper code
  • MAPLE: Xiong-Hui Chen, et al. "MAPLE: Offline Model-based Adaptable Policy Learning". Advances in Neural Information Processing Systems, vol. 34, 2021. paper code
  • MOBILE: Yihao Sun, et al. "Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning". Proceedings of the 40th International Conference on Machine Learning, PMLR 202:33177-33194, 2023. paper code
  • RAMBO: Rigter, Marc, Bruno Lacerda, and Nick Hawes. "Rambo-rl: Robust adversarial model-based offline reinforcement learning." Advances in neural information processing systems 35 (2022): 16082-16097. paper code

Install Datasets

NeoRL

git clone https://agit.ai/Polixir/neorl.git
cd neorl
pip install -e .

For more details on use, please see neorl.

D4RL (Optional)

pip install git+https://github.com/rail-berkeley/d4rl@master#egg=d4rl

For more details on use, please see d4rl.

Install offlinerl

pip install -e .

Example

# Training on the HalfCheetah-v3-L-9 task using default parameters of the cql algorithm
python examples/train_task.py --algo_name=cql --exp_name=halfcheetah --task HalfCheetah-v3 --task_data_type low --task_train_num 100

# Training on the SafetyHalfCheetah task using default parameters of the mcq algorithm
python examples/train_task.py --algo_name=mcq --exp_name=SafetyHalfCheetah --task SafetyHalfCheetah

# Parameter search in the default parameter space using the cql algorithm in the HalfCheetah-v3-L-9 task
python examples/train_tune.py --algo_name=cql --exp_name=halfcheetah --task HalfCheetah-v3 --task_data_type low --task_train_num 100

# Parameter search in the default parameter space using the mcq algorithm on the SafetyHalfCheetah task
python examples/train_tune.py --algo_name=mcq --exp_name=SafetyHalfCheetah --task SafetyHalfCheetah

# Training on the D4RL halfcheetah-medium task using default parameters of the cql algorithm (D4RL needs to be installed)
python examples/train_d4rl.py --algo_name=cql --exp_name=d4rl-halfcheetah-medium-cql --task d4rl-halfcheetah-medium-v0

Parameters:

  • algo_name: Algorithm name. Currently bc, cql, plas, bcq, and mopo are available.
  • exp_name: Experiment name, used for easy visualization with aim.
  • task: Task name; see neorl for details.
  • task_data_type: Data quality level. In neorl, each task collects data using low-, medium-, and high-level policies.
  • task_train_num: Number of training trajectories. For each task, neorl provides up to 10000 training trajectories.

View experimental results

We use Aim to store and visualize results. Aim is an experiment logger that makes it easy to manage thousands of experiments. For more details, see aim.

To visualize results in this repository:

cd offlinerl_tmp
aim up

Then you can see the results on http://127.0.0.1:43800.

Model-based Running Example

# Tune and save the transition models
python examples/model_tune.py --algo_name bc_model --exp_name neorl-RandomFrictionHopper-model --task RandomFrictionHopper
# Training MOPO and load the best transition model
python examples/train_task.py --algo_name mopo --exp_name neorl-safecheetah-mopo-new --task SafetyHalfCheetah --dynamics_path best_run_id

# Training COMBO and load the best transition model
python examples/train_task.py --algo_name combo --exp_name neorl-safecheetah-combo-new --task SafetyHalfCheetah --dynamics_path best_run_id

# Training RAMBO and load the best transition model
python examples/train_task.py --algo_name rambo --exp_name neorl-safecheetah-rambo-new --task SafetyHalfCheetah --dynamics_path best_run_id

# Training MOBILE and load the best transition model
python examples/train_task.py --algo_name mobile --exp_name neorl-safecheetah-mobile-new --task SafetyHalfCheetah --dynamics_path best_run_id

offlinerl's People

Contributors

drzero0, eyounx, hyc6668378, icaruswizard, mzktbyjc2016, songyigao, typoverflow, yihaosun1124


offlinerl's Issues

The torch version issue

Hi there,

After getting everything installed, I ran the provided script and hit this error:

File "/home/hsy/PycharmProjects/OfflineRL/offlinerl/utils/net/tanhpolicy.py", line 29, in __init__ self.mode = torch.tanh(normal_mean) AttributeError: can't set attribute.

Pycharm also highlights the error on that line as: Property 'mode' cannot be set
Can you look into this issue and maybe provide your torch version?

Currently I am using Torch 1.13.0.

Thanks!
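For reference, a minimal sketch of why this assignment can fail on recent torch versions: around torch >= 1.12, torch.distributions.Distribution defines `mode` as a read-only property, so a subclass cannot simply assign self.mode in __init__. The TanhNormal below is illustrative only, not the repo's actual class:

# Illustrative reproduction of the "can't set attribute" error and one workaround.
# Assumes the policy distribution subclasses torch.distributions, where `mode`
# became a read-only property (torch >= 1.12).
import torch
from torch.distributions import Normal, TransformedDistribution, TanhTransform

class TanhNormal(TransformedDistribution):
    def __init__(self, normal_mean, normal_std):
        super().__init__(Normal(normal_mean, normal_std), TanhTransform())
        # self.mode = torch.tanh(normal_mean)  # raises AttributeError on torch >= 1.12
        self._mode = torch.tanh(normal_mean)   # workaround: store under another name

    @property
    def mode(self):                            # expose it via an overriding property
        return self._mode

dist = TanhNormal(torch.zeros(3), torch.ones(3))
print(dist.mode)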

When I use this to train, an AttributeError occurs


Usage of Session.init is deprecated!
Traceback (most recent call last):
File "examples/train_d4rl.py", line 19, in
fire.Fire(run_algo)
File "/root/anaconda3/envs/off/lib/python3.7/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/root/anaconda3/envs/off/lib/python3.7/site-packages/fire/core.py", line 471, in _Fire
target=component.name)
File "/root/anaconda3/envs/off/lib/python3.7/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "examples/train_d4rl.py", line 12, in run_algo
algo_trainer = algo_trainer_obj(algo_init, algo_config)
File "/root/git/OfflineRL/offlinerl/algo/modelfree/cql.py", line 100, in init
super(AlgoTrainer, self).init(args)
File "/root/git/OfflineRL/offlinerl/algo/base.py", line 29, in init
self.index_path = self.exp_logger.repo.index_path
AttributeError: 'Session' object has no attribute 'repo'

Environment setup walkthrough

Create and activate a Conda environment

Create a Conda environment with Python 3.7, because that is the last version supported by TensorFlow 1.x; newer Python versions only work with TensorFlow 2.0+, whose large breaking changes make much older code unusable.

conda create -n offline python=3.7

Re-initialize conda.

conda init

Activate the environment you just created.

conda activate offline

TensorFlow and PyTorch installation

Run nvidia-smi to see the highest CUDA version the driver supports. Since I am renting a cloud server, I do not need to install the driver myself; for driver installation, please refer to other guides.

(offline) root@autodl-container-a129119e3c-3de27f6e:~/offline# nvidia-smi
Wed Nov 15 14:50:21 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57       Driver Version: 515.57       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:57:00.0 Off |                  N/A |
| 49%   28C    P8    30W / 350W |      0MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

TensorFlow

Install TensorFlow first.

You can check the link here for the CUDA and cuDNN versions corresponding to each TensorFlow release. In general it is best to follow the official instructions exactly, but since we will install a fairly recent PyTorch later, install CUDA 10.2.

First check whether they are already installed:

# Check the CUDA version in the platform's built-in image
ldconfig -p | grep cuda
# Check the cuDNN version in the platform's built-in image
ldconfig -p | grep cudnn

Nothing is installed here; either way it does not really matter, since we still need to install our own.

First search for the available cudatoolkit versions:

$ conda search cudatoolkit
Loading channels: done
# Name                       Version           Build  Channel             
......
cudatoolkit                  10.2.89      hfd86e86_0  anaconda/pkgs/main  
cudatoolkit                  10.2.89      hfd86e86_0  pkgs/main           
cudatoolkit                  10.2.89      hfd86e86_1  anaconda/pkgs/main  
cudatoolkit                  10.2.89      hfd86e86_1  pkgs/main           
cudatoolkit                 11.0.221      h6bb024c_0  anaconda/pkgs/main  
cudatoolkit                 11.0.221      h6bb024c_0  pkgs/main           
......       

Install version 10.2:

conda install cudatoolkit==10.2

Next, install cuDNN:

conda search cudnn

Based on the CUDA version shown in the build column, choose cuDNN 7.6.5:

conda install cudnn==7.6.5

Then, if you like, check the installation:

conda list | grep cudatoolkit
conda list | grep cudnn

Next, install tensorflow_gpu 1.15:

pip install tensorflow_gpu==1.15.5

PyTorch

Next, install PyTorch.

Looking it up on the official website:

The corresponding installation command is:

conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=10.2 -c pytorch

This is because polixir/OfflineRL uses the torch.nn.init.trunc_normal_ function. If you had installed CUDA 10.0 earlier, the PyTorch builds for CUDA 10.0 only go up to 1.2.0.

Searching the commit history of torch.nn.init, the function was added around May 2020.

Matching that date against the releases, the PyTorch version must be at least 1.5.1.

PyTorch 1.5.1 already supports CUDA 10.2, and since newer releases also bring performance optimizations, you might as well go straight to the highest PyTorch version that CUDA 10.2 supports.
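Before moving on, a quick sanity check (illustrative only, not part of the repo) that the installed PyTorch actually provides trunc_normal_:

# Check that this PyTorch build is new enough for OfflineRL, which calls
# torch.nn.init.trunc_normal_ (added around torch 1.5/1.6).
import torch
import torch.nn as nn

print(torch.__version__)
print(hasattr(nn.init, "trunc_normal_"))  # should print True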

MuJoCo

First install MuJoCo and mujoco-py; version 200 is used here.

Install the dependencies first:

sudo apt update
sudo apt-get install build-essential libgl1-mesa-dev libglew-dev libsdl2-dev libsdl2-image-dev libglm-dev libfreetype6-dev libglfw3-dev libglfw3 patchelf libosmesa6-dev

Then install some packages needed for reinforcement learning. A setuptools version that is too new makes some installs fail, and a cython version that is too new cannot compile mujoco-py:

pip install setuptools==63.2.0
pip install cython==0.29
pip install swig
pip install mujoco-py==2.0.2.13

This step will raise an error telling you that MuJoCo must be installed in a specific location:

You appear to be missing MuJoCo.  We expected to find the file here: 
	/root/.mujoco/mujoco200

This package only provides python bindings, the library must be installed separately.

Please follow the instructions on the README to install MuJoCo
	https://github.com/openai/mujoco-py#install-mujoco

Which can be downloaded from the website
	https://www.roboti.us/index.html

Download MuJoCo itself here.

Download the license here. Actually, after DeepMind acquired MuJoCo, Gym dropped the mujoco-py requirement in one of its updates. Direct download link.

I recommend transferring the files with FileZilla, which is faster; you can try configuring everything locally first and then copy it to the server to install.

Then configure the environment variables:

vim ~/.bashrc

Add at the end:

# Mujoco
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/root/.mujoco/mujoco200/bin
export MUJOCO_KEY_PATH=~/.mujoco${MUJOCO_KEY_PATH}
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia

Then type :wq (save and quit), run source ~/.bashrc, and reopen the shell.

pip3 install -r requirements.txt
pip3 install -r requirements.dev.txt
pip install -e .

Then run the installation commands:

pip install mujoco-py==2.0.2.13
pip install gym==0.19

If mujoco-py cannot be installed normally (quite likely), manually download the source from the GitHub releases to the server, then run:

pip install -r requirements.txt
pip install -r requirements.dev.txt
pip3 install -e .

That should do it. You can also try satisfying requirements.dev.txt locally first; after that, pip should be able to install it directly.

D4RL

Install a few prerequisite packages first:

pip install absl-py
pip install matplotlib

Install mjrl first:

git clone https://github.com/aravindr93/mjrl.git
cd mjrl
pip install -e .

Then install d4rl:

cd ..
git clone https://github.com/Farama-Foundation/d4rl.git
cd d4rl

Delete the trailing part of the mjrl entry in d4rl's setup requirements; otherwise it will not install properly.

Then run the install command:

pip install -e .

OfflineRL

Install neorl and OfflineRL:

cd ..
git clone https://agit.ai/Polixir/neorl.git
cd neorl
pip install -e .

cd ..
git clone https://github.com/polixir/OfflineRL.git
cd OfflineRL
pip install -e .

Note that before installing OfflineRL, these three places need to be modified:

I forgot the reason for changing fire.

scikit-learn: because the sklearn package name has been deprecated.

ray: because without the change you run into the problem in #7.

Now try running it:

(offline) root@autodl-container-a129119e3c-3de27f6e:~/offline/OfflineRL# python examples/train_d4rl.py --algo_name=mopo --exp_name=d4rl-halfcheetah-medium-mopo --task d4rl-halfcheetah-medium-v0
Traceback (most recent call last):
......
  File "/root/offline/OfflineRL/offlinerl/config/algo/cql_config.py", line 10, in <module>
    device = 'cuda'+":"+str(select_free_cuda()) if torch.cuda.is_available() else 'cpu'
  File "/root/offline/OfflineRL/offlinerl/utils/exp.py", line 26, in select_free_cuda
    return np.argmax(memory_gpu)
......
ValueError: attempt to get argmax of an empty sequence

This problem is probably rare; since I only have one GPU, I simply hard-coded the device to 0.
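A sketch of a more defensive picker is below; the function name matches the helper in offlinerl/utils/exp.py seen in the traceback, but treat the body as an assumption rather than a drop-in patch:

# Hedged sketch: pick the GPU with the most free memory, falling back to 0
# when nvidia-smi is unavailable or reports nothing (single-GPU or CPU boxes),
# which is exactly the "argmax of an empty sequence" case above.
import subprocess
import numpy as np

def select_free_cuda(default=0):
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
            text=True)
        memory_gpu = [int(x) for x in out.strip().splitlines() if x.strip()]
        if not memory_gpu:
            return default
        return int(np.argmax(memory_gpu))
    except Exception:
        return default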

Still a version problem. I did not pin protobuf to a specific version earlier because there were a bunch of conflicts; install it last so it overwrites the others.

......
File "/root/miniconda3/envs/offline/lib/python3.7/site-packages/google/protobuf/descriptor.py", line 561, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates


pip install protobuf==3.19.5

Try running again:

(offline) root@autodl-container-a129119e3c-3de27f6e:~/offline/OfflineRL# python examples/train_d4rl.py --algo_name=cql --exp_name=d4rl-halfcheetah-medium-cql --task d4rl-halfcheetah-medium-v0
Warning: Flow failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'flow'
Warning: CARLA failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'carla'
pybullet build time: May 20 2022 19:43:01
2023-11-15 at 17:58:37.381 | INFO | Use cql algorithm!
......
Traceback (most recent call last):
  File "examples/train_d4rl.py", line 19, in <module>
    fire.Fire(run_algo)
......
  File "/root/offline/OfflineRL/offlinerl/evaluation/neorl.py", line 39, in test_one_trail_sp_local
    action = policy.get_action(state).reshape(-1, act_dim)
......

RuntimeError: mat1 and mat2 shapes cannot be multiplied (17x1 and 17x256)

See #4, thanks @linhlpv.

You can also see warnings at runtime about missing CARLA and Flow; that is because, according to the official d4rl documentation, these two libraries have to be installed separately.
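As the warning text itself suggests, these messages can be silenced by setting D4RL_SUPPRESS_IMPORT_ERROR before importing d4rl, for example (illustrative):

# Silence the optional CARLA/Flow import warnings; must be set before importing d4rl.
import os
os.environ["D4RL_SUPPRESS_IMPORT_ERROR"] = "1"
import d4rl  # noqa: F401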

Other problems (unresolved)

2023-11-15 19:30:18,405 WARNING utils.py:538 -- Detecting docker specified CPUs. In previous versions of Ray, CPU detection in containers was incorrect. Please ensure that Ray has enough CPUs allocated. As a temporary workaround to revert to the prior behavior, set `RAY_USE_MULTIPROCESSING_CPU_COUNT=1` as an env var before starting Ray. Set the env var: `RAY_DISABLE_DOCKER_CPU_WARNING=1` to mute this warning.
2023-11-15 19:30:19,649 WARNING function_runner.py:599 -- Function checkpointing is disabled. This may result in unexpected behavior when using checkpointing features or certain schedulers. To enable, set the train function arguments to be `func(config, checkpoint_dir=None)`.
2023-11-15 19:30:19,655 INFO logger.py:618 -- pip install "ray[tune]" to see TensorBoard files.
2023-11-15 19:30:19,656 WARNING callback.py:126 -- The TensorboardX logger cannot be instantiated because either TensorboardX or one of it's dependencies is not installed. Please make sure you have the latest version of TensorboardX installed: `pip install -U tensorboardx`
2023-11-15 19:34:04,050 INFO utils.py:519 -- Detected RAY_USE_MULTIPROCESSING_CPU_COUNT=1: Using multiprocessing.cpu_count() to detect the number of CPUs. This may be inconsistent when used inside docker. To correctly detect CPUs, unset the env var: `RAY_USE_MULTIPROCESSING_CPU_COUNT`.
2023-11-15 19:34:15,814 WARNING function_runner.py:599 -- Function checkpointing is disabled. This may result in unexpected behavior when using checkpointing features or certain schedulers. To enable, set the train function arguments to be `func(config, checkpoint_dir=None)`.
2023-11-15 19:34:15,823 INFO logger.py:618 -- pip install "ray[tune]" to see TensorBoard files.
2023-11-15 19:34:15,824 WARNING callback.py:126 -- The TensorboardX logger cannot be instantiated because either TensorboardX or one of it's dependencies is not installed. Please make sure you have the latest version of TensorboardX installed: `pip install -U tensorboardx`
2023-11-15 19:34:15,825 WARNING trial_runner.py:288 -- The maximum number of pending trials has been automatically set to the number of available cluster CPUs, which is high (140 CPUs/pending trials). If you're running an experiment with a large number of trials, this could lead to scheduling overhead. In this case, consider setting the `TUNE_MAX_PENDING_TRIALS_PG` environment variable to the desired maximum number of concurrent trials.

Problem with d4rl env

  1. Suggest changing 'sklearn' to 'scikit-learn' in setup.py's install_requires; with only sklearn installed, the `from sklearn.preprocessing import MinMaxScaler` step in data.py raises an error.
  2. The test_on_real_env function in evaluation/neorl.py contains the check `if "sp" or "sales" in env._name`, but d4rl environments do not seem to have a `_name` attribute, so it raises an error (see the sketch after this list).
  3. ray==1.2 raises the following error:

     Traceback (most recent call last):
       File "/home/luofm/Utils/pyenv/offlinerl/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 323, in <module>
         loop.run_until_complete(agent.run())
       File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
         return future.result()
       File "/home/luofm/Utils/pyenv/offlinerl/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 138, in run
         modules = self._load_modules()
       File "/home/luofm/Utils/pyenv/offlinerl/lib/python3.8/site-packages/ray/new_dashboard/agent.py", line 92, in _load_modules
         c = cls(self)
       File "/home/luofm/Utils/pyenv/offlinerl/lib/python3.8/site-packages/ray/new_dashboard/modules/reporter/reporter_agent.py", line 72, in __init__
         self._metrics_agent = MetricsAgent(dashboard_agent.metrics_export_port)
       File "/home/luofm/Utils/pyenv/offlinerl/lib/python3.8/site-packages/ray/metrics_agent.py", line 74, in __init__
         prometheus_exporter.new_stats_exporter(
       File "/home/luofm/Utils/pyenv/offlinerl/lib/python3.8/site-packages/ray/prometheus_exporter.py", line 333, in new_stats_exporter
         exporter = PrometheusStatsExporter(
       File "/home/luofm/Utils/pyenv/offlinerl/lib/python3.8/site-packages/ray/prometheus_exporter.py", line 266, in __init__
         self.serve_http()
       File "/home/luofm/Utils/pyenv/offlinerl/lib/python3.8/site-packages/ray/prometheus_exporter.py", line 320, in serve_http
         start_http_server(
       File "/home/luofm/Utils/pyenv/offlinerl/lib/python3.8/site-packages/prometheus_client/exposition.py", line 169, in start_wsgi_server
         TmpServer.address_family, addr = _get_best_family(addr, port)
       File "/home/luofm/Utils/pyenv/offlinerl/lib/python3.8/site-packages/prometheus_client/exposition.py", line 158, in _get_best_family
         infos = socket.getaddrinfo(address, port)
       File "/usr/lib/python3.8/socket.py", line 918, in getaddrinfo
         for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
     socket.gaierror: [Errno -2] Name or service not known

     The error above still occurs with ray==1.5.0, but does not occur with ray==1.0.0.
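For point 2, a hedged sketch of what the check presumably meant (the original condition is always true, because the non-empty literal "sp" is truthy, and d4rl environments lack `_name`):

# Hypothetical defensive rewrite of the check in evaluation/neorl.py.
def is_sp_or_sales(env):
    env_name = getattr(env, "_name", "")        # d4rl envs have no `_name` attribute
    return "sp" in env_name or "sales" in env_name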

OfflineRL version: 08f51b5
System: Ubuntu 20.04.1 LTS x86_64
python==3.8.10
Command run:

python examples/train_d4rl.py --algo_name=mopo --exp_name=d4rl-halfcheetah-medium-mopo --task d4rl-halfcheetah-medium-v2

When I run the example, I get a RuntimeError: mat1 and mat2 shapes cannot be multiplied (18x1 and 18x256)

When I run the command
python examples/train_task.py --algo_name=mopo --exp_name=halfcheetah --task HalfCheetah-v3 --task_data_type low --task_train_num 2
it shows:

File "examples/train_task.py", line 19, in <module>
   fire.Fire(run_algo)
 File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/mujoco_py/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
   component_trace = _Fire(component, args, parsed_flag_args, context, name)
 File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/mujoco_py/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
   component, remaining_args = _CallAndUpdateTrace(
 File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/mujoco_py/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
   component = fn(*varargs, **kwargs)
 File "examples/train_task.py", line 16, in run_algo
   algo_trainer.train(train_buffer, val_buffer, callback_fn=callback)
 File "/media/lksgcc/new_disk/lk_git/3_Reinforcement_Learning/3_2_Offline_Learning/OfflineRL/offlinerl/algo/modelbase/mopo.py", line 94, in train
   self.train_policy(train_buffer, val_buffer, self.transition, callback_fn)
 File "/media/lksgcc/new_disk/lk_git/3_Reinforcement_Learning/3_2_Offline_Learning/OfflineRL/offlinerl/algo/modelbase/mopo.py", line 206, in train_policy
   res = callback_fn(self.get_policy())
 File "/media/lksgcc/new_disk/lk_git/3_Reinforcement_Learning/3_2_Offline_Learning/OfflineRL/offlinerl/evaluation/__init__.py", line 80, in __call__
   eval_res.update(test_on_real_env(policy, self.env, number_of_runs=self.number_of_runs))
 File "/media/lksgcc/new_disk/lk_git/3_Reinforcement_Learning/3_2_Offline_Learning/OfflineRL/offlinerl/evaluation/neorl.py", line 54, in test_on_real_env
   results = [test_one_trail_sp_local(env, policy) for _ in range(number_of_runs)]
 File "/media/lksgcc/new_disk/lk_git/3_Reinforcement_Learning/3_2_Offline_Learning/OfflineRL/offlinerl/evaluation/neorl.py", line 54, in <listcomp>
   results = [test_one_trail_sp_local(env, policy) for _ in range(number_of_runs)]
 File "/media/lksgcc/new_disk/lk_git/3_Reinforcement_Learning/3_2_Offline_Learning/OfflineRL/offlinerl/evaluation/neorl.py", line 39, in test_one_trail_sp_local
   action = policy.get_action(state).reshape(-1, act_dim)
 File "/media/lksgcc/new_disk/lk_git/3_Reinforcement_Learning/3_2_Offline_Learning/OfflineRL/offlinerl/utils/net/common.py", line 33, in get_action
   act = to_array_as(self.policy_infer(obs_tensor), obs)
 File "/media/lksgcc/new_disk/lk_git/3_Reinforcement_Learning/3_2_Offline_Learning/OfflineRL/offlinerl/utils/net/tanhpolicy.py", line 164, in policy_infer
   return self(obs).mode
 File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/mujoco_py/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
   return forward_call(*input, **kwargs)
 File "/media/lksgcc/new_disk/lk_git/3_Reinforcement_Learning/3_2_Offline_Learning/OfflineRL/offlinerl/utils/net/tanhpolicy.py", line 147, in forward
   logits, h = self.preprocess(obs, state)
 File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/mujoco_py/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
   return forward_call(*input, **kwargs)
 File "/media/lksgcc/new_disk/lk_git/3_Reinforcement_Learning/3_2_Offline_Learning/OfflineRL/offlinerl/utils/net/common.py", line 113, in forward
   logits = self.model(s)
 File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/mujoco_py/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
   return forward_call(*input, **kwargs)
 File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/mujoco_py/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
   input = module(input)
 File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/mujoco_py/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
   return forward_call(*input, **kwargs)
 File "/home/lksgcc/.pyenv/versions/anaconda3-5.0.1/envs/mujoco_py/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 103, in forward
   return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (18x1 and 18x256)

Other algos also show the same error. Thanks for solving this problem!

AttributeError: 'Session' object has no attribute 'repo'

I ran the command python examples/train_d4rl.py --algo_name=cql --exp_name=d4rl-halfcheetah-medium-cql --task d4rl-halfcheetah-medium-v0.
It failed with the following error:
File "examples/../offlinerl/algo/base.py", line 30, in __init__
if self.exp_logger.repo is not None: # a naive fix of aim exp_logger.repo is None
AttributeError: 'Session' object has no attribute 'repo'

[Solved] Something went wrong in `get_repo` & `aim up` doesn't work well

$ python examples/train_task.py --algo_name=cql --exp_name=halfcheetah --task HalfCheetah-v3 --task_data_type low --task_train_num 100
2023-11-03 at 16:35:56.112 | INFO | Use cql algorithm!
running build_ext
2023-11-03 at 16:35:56.618 | INFO | obs shape: (1000, 18)
......
......
2023-11-03 at 16:35:56.870 | INFO | Init AlgoTrainer
Something went wrong in `get_repo`. The process will continue to execute.
`get_repo`: branch name must be at least 2 characters and contain only latin letters, numbers, dash and underscore
Something went wrong in `__init__`. The process will continue to execute.
`__init__`: 'NoneType' object has no attribute 'path'
Something went wrong in `track`. The process will continue to execute.
`track`: session is closed

I tried to follow the setup requirements ("aim==2.0.27"), but some problems still occurred.

After investigation, the error was traced to line 306 of the Session class in the aim package.

Amend the file "/home/username/anaconda3/envs/env_name/lib/python3.8/site-packages/aim/sdk/session/session.py"

if path is not None:
    repo = AimRepo(path)
    if not repo.exists():
        if not repo.init():
            raise ValueError('can not create repo `{}`'.format(path))
    repo = AimRepo(path, branch_name, commit_hash)

Replace the second line of the code above with `repo = AimRepo(path, branch_name, commit_hash)` (or delete it), but it still does not work very well.

After about 6 hours of a tortuous process 😵, I finally got a somewhat haphazard environment that works well. 🥳

The following table shows some versions of libraries that I think are worth emphasizing.

Name Version
python 3.8.18
cython 0.29
setuptools 63.2.0
pysqlite3 0.5.2
pip 23.3.1
ray 1.12.0
aim 2.3.0
fire 0.3.0
glfw 2.6.2
numpy 1.20.3
protobuf 3.19.0
torch 2.1.0
scikit-learn 1.3.2
mujoco-py 1.50.1.0
swig 4.1.1
gym 0.19.0

The following table is the environment profile after I deleted some unimportant libraries for your reference. 🤝I hope it can be helpful to you!

# Name Version
_libgcc_mutex 0.1
_openmp_mutex 5.1
absl-py 2.0.0
aim 2.3.0
aimrecords 0.0.7
aimrocks 0.0.7
aiofiles 23.2.1
aiohttp 3.7.4
aiohttp-cors 0.7.0
aioredis 1.3.1
aiosignal 1.3.1
asttokens 2.4.1
async-exit-stack 1.0.1
async-generator 1.10
async-timeout 3.0.1
atari-py 0.2.6
attrdict 2.0.1
attrs 23.1.0
box2d-py 2.3.5
cython 0.29
decorator 4.4.2
deprecated 1.2.14
distlib 0.3.7
dm-tree 0.1.8
docker 6.1.3
filelock 3.13.1
fire 0.3.0
flask 1.1.2
glfw 2.6.2
google-api-core 2.12.0
google-auth 1.35.0
google-auth-oauthlib 0.4.6
googleapis-common-protos 1.61.0
gtimer 1.0.0b5
gunicorn 20.1.0
gym 0.19.0
gym-notices 0.0.8
libgcc-ng 11.2.0
libgomp 11.2.0
libstdcxx-ng 11.2.0
lockfile 0.12.2
mujoco-py 1.50.1.0
multidict 6.0.4
neorl 0.3.1
networkx 3.1
numpy 1.20.3
numpydoc 1.6.0
oauthlib 3.2.2
offlinerl 0.0.1
opencensus 0.11.3
opencv-python 4.8.1.78
openssl 3.0.11
pandas 2.0.3
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pillow 10.0.1
pip 23.3.1
protobuf 3.19.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pycparser 2.21
pydantic 1.10.13
pygame 2.1.0
pyglet 2.0.9
pygments 2.16.1
pyopengl 3.1.7
pyopengl-accelerate 3.1.7
pyparsing 3.1.1
pyrser 0.2.0
pysqlite3 0.5.2
python 3.8.18
ray 1.12.0
readline 8.2
redis 4.1.4
rsa 4.9
scikit-learn 1.3.2
scipy 1.10.1
setuptools 63.2.0
sphinx 7.1.2
sqlalchemy 1.4.13
sqlite 3.41.2
stack-data 0.6.3
starlette 0.14.2
swig 4.1.1
sympy 1.12
tabulate 0.9.0
tensorboard 2.3.0
tensorboardx 2.6.2.2
termcolor 2.3.0
threadpoolctl 3.2.0
tk 8.6.12
tomli 2.0.1
torch 2.1.0
tqdm 4.66.1
urllib3 2.0.7
