paddlepaddle / parl
A high-performance distributed training framework for Reinforcement Learning
Home Page: https://parl.readthedocs.io/
License: Apache License 2.0
I ran the default IMPALA example with 2 actors.
Environment:
GPU: MX150
CUDA: 10.02.89
paddlepaddle-gpu 1.6.3.post107
parl 1.2.1
The error is as follows:
[02-29 17:50:42 MainThread @train.py:148] Waiting for 2 remote actors to connect.
[02-29 17:50:42 MainThread @train.py:152] Remote actor count: 1
[02-29 17:50:42 MainThread @train.py:152] Remote actor count: 2
Exception in thread Thread-5:
Traceback (most recent call last):
File "/home/xtq/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/xtq/anaconda3/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/home/xtq/PARL/examples/IMPALA/train.py", line 163, in run_remote_sample
remote_actor = Actor(self.config)
File "/home/xtq/anaconda3/lib/python3.6/site-packages/parl/remote/remote_decorator.py", line 127, in __init__
raise RemoteError('__init__', traceback_str)
parl.remote.exceptions.RemoteError: [PARL remote error when calling function `__init__`]:
No module named 'atari_model'
traceback:
Traceback (most recent call last):
File "/home/xtq/anaconda3/lib/python3.6/site-packages/parl/remote/job.py", line 298, in wait_for_connection
cls = cloudpickle.loads(message[1])
ModuleNotFoundError: No module named 'atari_model'
Exception in thread Thread-4:
Traceback (most recent call last):
File "/home/xtq/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/xtq/anaconda3/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/home/xtq/PARL/examples/IMPALA/train.py", line 163, in run_remote_sample
remote_actor = Actor(self.config)
File "/home/xtq/anaconda3/lib/python3.6/site-packages/parl/remote/remote_decorator.py", line 127, in __init__
raise RemoteError('__init__', traceback_str)
parl.remote.exceptions.RemoteError: [PARL remote error when calling function `__init__`]:
No module named 'atari_model'
traceback:
Traceback (most recent call last):
File "/home/xtq/anaconda3/lib/python3.6/site-packages/parl/remote/job.py", line 298, in wait_for_connection
cls = cloudpickle.loads(message[1])
ModuleNotFoundError: No module named 'atari_model'
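The root cause is that the remote job deserializes the Actor class by reference and cannot `import atari_model` on its side. A minimal, framework-free illustration of the failure mode (the suggested fix of launching the training script from the directory containing atari_model.py, so PARL can ship it to the workers, is an assumption, not verified against PARL's file distribution):

```python
import pickle

# A class pickled *by reference* stores only "module.ClassName";
# whoever unpickles it must be able to import that module. This
# hand-written protocol-0 payload references atari_model.AtariModel,
# which this process cannot import -- reproducing the error above.
def load_remote_class():
    by_ref = b"catari_model\nAtariModel\n."  # GLOBAL opcode + STOP
    try:
        pickle.loads(by_ref)
        return "loaded"
    except ModuleNotFoundError as exc:
        return str(exc)

print(load_remote_class())  # -> No module named 'atari_model'
```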
When running sh scripts/train_difficulty1.sh ./low_speed_model, a PARL remote error occurred. It looks like the local variable 'reward_footstep_0' is referenced before assignment.
[12-21 10:39:48 Thread-1 @train.py:287] saving models
[12-21 10:39:48 Thread-1 @train.py:290] saving rpm
Exception in thread Thread-4:
Traceback (most recent call last):
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "train.py", line 190, in run_remote_sample
obs, reward, done, info = remote_actor.step(action)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/parl/remote/remote_decorator.py", line 189, in wrapper
raise RemoteError(attr, error_str)
parl.remote.exceptions.RemoteError: [PARL remote error when calling function step
]:
local variable 'reward_footstep_0' referenced before assignment
traceback:
Traceback (most recent call last):
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/parl/remote/job.py", line 379, in single_task
ret = getattr(obj, function_name)(*args, **kwargs)
File "/home/luo/PARL/examples/NeurIPS2019-Learn-to-Move-Challenge/actor.py", line 56, in step
return self.env.step(action, project=False)
File "/tmp/tmpm9llqji1/env_wrapper.py", line 219, in step
obs, r, done, info = self.env.step(action, **kwargs)
File "/tmp/tmpm9llqji1/env_wrapper.py", line 51, in step
return self.env.step(action, **kwargs)
File "/tmp/tmpm9llqji1/env_wrapper.py", line 68, in step
obs, reward, done, info = self.env.step(action, **kwargs)
File "/tmp/tmpm9llqji1/env_wrapper.py", line 119, in step
obs, r, done, info = self.env.step(action, **kwargs)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/osim/env/osim.py", line 562, in step
_, reward, done, info = super(L2M2019Env, self).step(action_mapped, project=project, obs_as_dict=obs_as_dict)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/osim/env/osim.py", line 356, in step
return [ obs, self.get_reward(), self.is_done() or (self.osim_model.istep >= self.spec.timestep_limit), {} ]
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/osim/env/osim.py", line 764, in get_reward
return self.get_reward_1()
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/osim/env/osim.py", line 821, in get_reward_1
reward += reward_footstep_0 + 10
UnboundLocalError: local variable 'reward_footstep_0' referenced before assignment
I worked on Ubuntu 16.04 with a Titan XP GPU.
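For context, the UnboundLocalError above is the classic pattern of a variable that is assigned only inside a conditional branch but read unconditionally. A minimal sketch of what presumably happens inside osim's get_reward_1 (names borrowed from the log; the real reward logic is more involved):

```python
# Sketch of the failure mode: reward_footstep_0 is only bound on one
# branch, but the final accumulation reads it unconditionally.
def get_reward_sketch(footstep_just_completed):
    reward = 0.0
    if footstep_just_completed:
        reward_footstep_0 = 10.0       # only assigned on this branch
    reward += reward_footstep_0 + 10   # UnboundLocalError otherwise
    return reward
```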
batch_norm has two accessible params, "bias" and "scale", as well as two hidden params, "mean" and "var". The current code won't copy "mean" and "var".
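A hedged, framework-agnostic sketch of the fix (parameter names here are hypothetical; in Paddle the running statistics live in the non-trainable variable set): copy the full state, not just the trainable subset.

```python
# Copy *all* state from src to dst: trainable params ("scale", "bias")
# plus the hidden running statistics ("mean", "var") that a
# trainable-only copy silently skips.
def sync_bn_params(src, dst, include_stats=True):
    for name, value in src.items():
        is_stat = name.endswith(("mean", "var"))
        if is_stat and not include_stats:
            continue  # the buggy behavior: stats are left stale
        dst[name] = value
    return dst
```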
Environment: Windows 7.
I trained a maze task locally with the DQN algorithm; training reached the optimal solution, and I saved the model.
When I rerun the program and load the model, the result is no longer optimal.
The effect can be seen here:
https://github.com/kosoraYintai/PARL-Sample/blob/master/dqn_dnn/README.md
I don't know where the mistake is.
None of the DDPG, DQN, or PPO examples show how to save and load a model. Could you provide a code template?
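As a stop-gap, here is a generic checkpoint template. This is *not* the PARL API; it is a framework-agnostic sketch using pickle (the helper names are made up), just to show the save-after-training / load-before-evaluation round trip that the examples omit:

```python
import os
import pickle
import tempfile

# Hypothetical helpers: persist the trained parameter dict after
# training, reload it before evaluation, and verify the round trip.
def save_model(params, path):
    with open(path, "wb") as f:
        pickle.dump(params, f)

def load_model(path):
    with open(path, "rb") as f:
        return pickle.load(f)

params = {"w": [0.1, 0.2], "b": [0.0]}
path = os.path.join(tempfile.mkdtemp(), "dqn_maze.ckpt")
save_model(params, path)
restored = load_model(path)
```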
Both PPO and SAC need to compute action probabilities, but they go about it differently. PPO keeps a separate trainable parameter as the (log) variance and hand-codes the probability and KL computations, while SAC has the network output both mean and variance, builds the action distribution with layers.Normal, samples with Normal.sample(), and computes the KL and action log-probability with Normal.kl_divergence(other) and Normal.log_prob(action). What is the difference between these two approaches?
PPO:
    def _calc_kl(self, means, logvars, old_means, old_logvars):
        log_det_cov_old = layers.reduce_sum(old_logvars)
        log_det_cov_new = layers.reduce_sum(logvars)
        tr_old_new = layers.reduce_sum(layers.exp(old_logvars - logvars))
        kl = 0.5 * (layers.reduce_sum(
            layers.square(means - old_means) / layers.exp(logvars), dim=1) + (
                log_det_cov_new - log_det_cov_old) + tr_old_new - self.act_dim)
        return kl
SAC:
    def sample(self, obs):
        mean, log_std = self.actor.policy(obs)
        std = layers.exp(log_std)
        normal = Normal(mean, std)
        x_t = normal.sample([1])[0]
        y_t = layers.tanh(x_t)
        action = y_t * self.max_action
        log_prob = normal.log_prob(x_t)
        log_prob -= layers.log(self.max_action * (1 - layers.pow(y_t, 2)) + epsilon)
        log_prob = layers.reduce_sum(log_prob, dim=1, keep_dim=True)
        log_prob = layers.squeeze(log_prob, axes=[1])
        return action, log_prob
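For what it's worth, the hand-written PPO formula is exactly the closed-form KL(old‖new) between diagonal Gaussians, so the two approaches should agree mathematically. A pure-Python sanity check for the 1-D case:

```python
import math

# The PPO-style formula from the snippet above, specialized to one
# dimension (act_dim = 1), with v = exp(logvar).
def kl_ppo_style(mean, logvar, old_mean, old_logvar):
    act_dim = 1
    tr_old_new = math.exp(old_logvar - logvar)
    return 0.5 * ((mean - old_mean) ** 2 / math.exp(logvar)
                  + (logvar - old_logvar) + tr_old_new - act_dim)

# Textbook KL(N(m0, v0) || N(m1, v1)) for scalar Gaussians.
def kl_closed_form(m0, v0, m1, v1):
    return math.log(math.sqrt(v1 / v0)) + (v0 + (m0 - m1) ** 2) / (2 * v1) - 0.5

old_mean, old_logvar, mean, logvar = 0.3, -0.2, 0.8, 0.5
a = kl_ppo_style(mean, logvar, old_mean, old_logvar)
b = kl_closed_form(old_mean, math.exp(old_logvar), mean, math.exp(logvar))
```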
Currently, remote actors in PARL record no logs, which is inconvenient for debugging.
Add logs for remote actors and display them in the front end (Web UI).
We are looking for interns with a machine learning background and strong programming skills.
We hope you can commit at least three months to the internship position.
If you are interested in machine learning and its industrial applications, please email your CV to [email protected].
Work city: Shenzhen, China.
ERROR: parl 1.2.3 has requirement flask==1.0.4, but you'll have flask 1.1.1 which is incompatible.
ERROR: paddlehub 1.6.1 has requirement flask>=1.1.0, but you'll have flask 1.0.4 which is incompatible.
I built a cluster successfully, but when I run the example and the Recommended Practice, it returns an error.
[02-27 09:23:17 MainThread @logger.py:224] Argv: a.test.py
[02-27 09:23:37 Thread-2 @client.py:244] ERR [xparl] lost connection with a job, current actor num: 0
I tested it on Ubuntu 18.04, Ubuntu 16.04, and WSL.
A locale-setting error occurred when I ran parl on Windows.
C:\Users\mxfeng>xparl start --cpu_num 64 --port 8010 --monitor_port 7777
[09-26 15:08:16 MainThread @logger.py:217] Argv: C:\Users\mxfeng\Anaconda3\envs\dist_deep_rl\Scripts\xparl.exe start --cpu_num 64 --port 8010 --monitor_port 7777
Traceback (most recent call last):
File "c:\users\mxfeng\anaconda3\envs\dist_deep_rl\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\mxfeng\anaconda3\envs\dist_deep_rl\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\mxfeng\Anaconda3\envs\dist_deep_rl\Scripts\xparl.exe\__main__.py", line 5, in <module>
File "c:\users\mxfeng\anaconda3\envs\dist_deep_rl\lib\site-packages\parl\remote\scripts.py", line 36, in <module>
locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
File "c:\users\mxfeng\anaconda3\envs\dist_deep_rl\lib\locale.py", line 598, in setlocale
return _setlocale(category, locale)
locale.Error: unsupported locale setting
I know how to fix this bug on Ubuntu, but I have no idea how to fix it on Windows.
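One possible workaround (an assumption, not a verified PARL patch) is to fall back to the system default locale when "en_US.UTF-8" is unavailable, as it is on many Windows installs:

```python
import locale

# Try the UTF-8 locale first; if the platform rejects it (as Windows
# often does), fall back to the user's default locale instead of
# crashing with locale.Error.
def set_locale_with_fallback():
    for candidate in ("en_US.UTF-8", ""):
        try:
            return locale.setlocale(locale.LC_ALL, candidate)
        except locale.Error:
            continue
    return None
```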
I ran the following in the /PARL/examples/QuickStart directory:
python train.py
W1105 17:30:22.264058 15205 init.cc:212] *** Aborted at 1572946222 (unix time) try "date -d @1572946222" if you are using GNU date ***
W1105 17:30:22.266913 15205 init.cc:212] PC: @ 0x0 (unknown)
W1105 17:30:22.267369 15205 init.cc:212] *** SIGSEGV (@0x0) received by PID 15205 (TID 0x7f9838abb740) from PID 0; stack trace: ***
W1105 17:30:22.269906 15205 init.cc:212] @ 0x7f98386a8130 (unknown)
W1105 17:30:22.270401 15205 init.cc:212] @ 0x7f97c49c6a96 pybind11::detail::make_new_python_type()
W1105 17:30:22.270716 15205 init.cc:212] @ 0x7f97c49c81f8 pybind11::detail::generic_type::initialize()
W1105 17:30:22.271059 15205 init.cc:212] @ 0x7f97c4b4e552 ZN8pybind115enum_IN10onnx_torch20TensorProto_DataTypeEEC1IJEEERKNS_6handleEPKcDpRKT
W1105 17:30:22.271669 15205 init.cc:212] @ 0x7f97c4b46848 torch::onnx::initONNXBindings()
W1105 17:30:22.272096 15205 init.cc:212] @ 0x7f97c482fc28 initModule()
W1105 17:30:22.275000 15205 init.cc:212] @ 0x7f9838cef335 _PyImport_LoadDynamicModuleWithSpec
W1105 17:30:22.278131 15205 init.cc:212] @ 0x7f9838cef540 _imp_create_dynamic
W1105 17:30:22.281011 15205 init.cc:212] @ 0x7f9838bec711 PyCFunction_Call
W1105 17:30:22.283577 15205 init.cc:212] @ 0x7f9838c9a4ad _PyEval_EvalFrameDefault
W1105 17:30:22.285887 15205 init.cc:212] @ 0x7f9838c698e4 _PyEval_EvalCodeWithName
W1105 17:30:22.288254 15205 init.cc:212] @ 0x7f9838c6a771 fast_function
W1105 17:30:22.290830 15205 init.cc:212] @ 0x7f9838c70505 call_function
W1105 17:30:22.293635 15205 init.cc:212] @ 0x7f9838c9538a _PyEval_EvalFrameDefault
W1105 17:30:22.296411 15205 init.cc:212] @ 0x7f9838c6a53b fast_function
W1105 17:30:22.298804 15205 init.cc:212] @ 0x7f9838c70505 call_function
W1105 17:30:22.301241 15205 init.cc:212] @ 0x7f9838c9538a _PyEval_EvalFrameDefault
W1105 17:30:22.303527 15205 init.cc:212] @ 0x7f9838c6a53b fast_function
W1105 17:30:22.305831 15205 init.cc:212] @ 0x7f9838c70505 call_function
W1105 17:30:22.308269 15205 init.cc:212] @ 0x7f9838c9538a _PyEval_EvalFrameDefault
W1105 17:30:22.310564 15205 init.cc:212] @ 0x7f9838c6a53b fast_function
W1105 17:30:22.312858 15205 init.cc:212] @ 0x7f9838c70505 call_function
W1105 17:30:22.315277 15205 init.cc:212] @ 0x7f9838c9538a _PyEval_EvalFrameDefault
W1105 17:30:22.317591 15205 init.cc:212] @ 0x7f9838c6a53b fast_function
W1105 17:30:22.319972 15205 init.cc:212] @ 0x7f9838c70505 call_function
W1105 17:30:22.322440 15205 init.cc:212] @ 0x7f9838c9538a _PyEval_EvalFrameDefault
W1105 17:30:22.324836 15205 init.cc:212] @ 0x7f9838c6abab _PyFunction_FastCallDict
W1105 17:30:22.327309 15205 init.cc:212] @ 0x7f9838be9b0f _PyObject_FastCallDict
W1105 17:30:22.329708 15205 init.cc:212] @ 0x7f9838c2b810 _PyObject_CallMethodIdObjArgs
W1105 17:30:22.332087 15205 init.cc:212] @ 0x7f9838be0b10 PyImport_ImportModuleLevelObject
W1105 17:30:22.334471 15205 init.cc:212] @ 0x7f9838c97a8b _PyEval_EvalFrameDefault
W1105 17:30:22.336856 15205 init.cc:212] @ 0x7f9838c6b289 PyEval_EvalCodeEx
Segmentation fault
What could be the cause? Thanks.
The paddle version is: https://paddle-wheel.bj.bcebos.com/1.6.0-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-1.6.0.post97-cp36-cp36m-linux_x86_64.whl
Saving and loading model parameters works in my local experiments, but on AIStudio the following problems occur:
1. After calling fluid.io.save_params(), the indices in the parameter file names change instead of starting from 0.
2. After copying the parameters to a local machine, they cannot be loaded as-is; only after renaming them to start from 0 does fluid.io.load_params() succeed locally.
3. However, with or without renaming, calling fluid.io.load_params() on AIStudio always fails, and the indices change on every load (e.g. the 12 and 16 in the screenshot).
Project link:
https://aistudio.baidu.com/aistudio/projectdetail/63441?_=1560765585066
It contains the error messages; please take a look.
How should I import a multi-agent env or custom envs?
When running sh scripts/train_difficulty1.sh ./low_speed_model
in /PARL/examples/NeurIPS2019-Learn-to-Move-Challenge, inexplicable errors occurred (shown below). Can anyone help me? Thanks in advance!
(opensim-rl) luo@idserver:~/PARL/examples/NeurIPS2019-Learn-to-Move-Challenge$ sh scripts/train_difficulty1.sh ./low_speed_model
/home/luo/anaconda3/envs/opensim-rl/bin/python
[12-16 23:08:12 MainThread @logger.py:224] Argv: train.py --actor_num 300 --difficulty 1 --penalty_coeff 3.0 --logdir ./output/difficulty1 --restore_model_path ./low_speed_model
/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/opensim/simbody.py:15: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
[12-16 23:08:12 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
[12-16 23:08:13 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
W1216 23:08:14.078102 6084 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 8.0
W1216 23:08:14.081565 6084 device_context.cc:267] device: 0, cuDNN Version: 7.5.
[12-16 23:08:16 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/compiler.py:239: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
""")
WARNING:root:
You can try our memory optimize feature to save your memory usage:
# create a build_strategy variable to set memory optimize option
build_strategy = compiler.BuildStrategy()
build_strategy.enable_inplace = True
build_strategy.memory_optimize = True
# pass the build_strategy to with_data_parallel API
compiled_prog = compiler.CompiledProgram(main).with_data_parallel(
loss_name=loss.name, build_strategy=build_strategy)
!!! Memory optimize is our experimental feature !!!
some variables may be removed/reused internal to save memory usage,
in order to fetch the right value of the fetch_list, please set the
persistable property to true for each variable in fetch_list
# Sample
conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None)
# if you need to fetch conv1, then:
conv1.persistable = True
I1216 23:08:16.079864 6084 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 4. And the Program will be copied 4 copies
I1216 23:08:17.081748 6084 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
[12-16 23:08:17 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
[the same DeprecationWarning and memory-optimize warning are printed again]
I1216 23:08:17.209542 6084 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 4. And the Program will be copied 4 copies
I1216 23:08:17.324332 6084 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
[12-16 23:08:17 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
[the same DeprecationWarning and memory-optimize warning are printed again]
share_vars_from is set, scope is ignored.
I1216 23:08:17.525264 6084 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 4. And the Program will be copied 4 copies
I1216 23:08:17.640771 6084 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
[12-16 23:08:17 MainThread @train.py:303] restore model from ./low_speed_model
Traceback (most recent call last):
File "train.py", line 327, in <module>
learner = Learner(args)
File "train.py", line 85, in __init__
self.restore(args.restore_model_path)
File "train.py", line 304, in restore
self.agent.restore(model_path)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/parl/core/fluid/agent.py", line 221, in restore
filename=filename)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/io.py", line 699, in load_params
filename=filename)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/io.py", line 611, in load_vars
filename=filename)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/io.py", line 648, in load_vars
executor.run(load_prog)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/executor.py", line 651, in run
use_program_cache=use_program_cache)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/executor.py", line 749, in run
exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: Invoke operator load_combine error.
Python Callstacks:
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/framework.py", line 1771, in append_op
attrs=kwargs.get("attrs", None))
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/io.py", line 647, in load_vars
attrs={'file_path': os.path.join(load_dirname, filename)})
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/io.py", line 611, in load_vars
filename=filename)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/io.py", line 699, in load_params
filename=filename)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/parl/core/fluid/agent.py", line 221, in restore
filename=filename)
File "train.py", line 304, in restore
self.agent.restore(model_path)
File "train.py", line 85, in __init__
self.restore(args.restore_model_path)
File "train.py", line 327, in <module>
learner = Learner(args)
C++ Callstacks:
tensor version 3393762800 is not supported. at [/paddle/paddle/fluid/framework/lod_tensor.cc:256]
PaddlePaddle Call Stacks:
0 0x7efdba6c1f10p void paddle::platform::EnforceNotMet::Init<char const*>(char const*, char const*, int) + 352
1 0x7efdba6c2289p paddle::platform::EnforceNotMet::EnforceNotMet(std::exception_ptr::exception_ptr, char const*, int) + 137
2 0x7efdbc38c7d4p paddle::framework::DeserializeFromStream(std::istream&, paddle::framework::LoDTensor*, paddle::platform::DeviceContext const&) + 724
3 0x7efdbb35e480p paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, float>::LoadParamsFromBuffer(paddle::framework::ExecutionContext const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, std::istream*, bool, std::vector<std::string, std::allocatorstd::string > const&) const + 352
4 0x7efdbb35edfep paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const + 798
5 0x7efdbb35f273p std::Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, int>, paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, signed char>, paddle::operators::LoadCombineOpKernel<paddle::platform::CUDADeviceContext, long> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::M_invoke(std::Any_data const&, paddle::framework::ExecutionContext const&) + 35
6 0x7efdbc7411e7p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const + 375
7 0x7efdbc7415c1p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 529
8 0x7efdbc73ebbcp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 332
9 0x7efdba84cd0ep paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 382
10 0x7efdba84fdafp paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocatorstd::string > const&, bool) + 143
11 0x7efdba6b359dp
12 0x7efdba6f4826p
13 0x7efe81ea2df2p _PyCFunction_FastCallDict + 258
14 0x7efe81f282bbp
15 0x7efe81f2b15dp _PyEval_EvalFrameDefault + 9981
16 0x7efe81f26a60p
17 0x7efe81f2848ap
18 0x7efe81f2a8ddp _PyEval_EvalFrameDefault + 7805
19 0x7efe81f26a60p
20 0x7efe81f2848ap
21 0x7efe81f2b15dp _PyEval_EvalFrameDefault + 9981
22 0x7efe81f26a60p
23 0x7efe81f2848ap
24 0x7efe81f2a8ddp _PyEval_EvalFrameDefault + 7805
25 0x7efe81f26a60p
26 0x7efe81f2848ap
27 0x7efe81f2a8ddp _PyEval_EvalFrameDefault + 7805
28 0x7efe81f26a60p
29 0x7efe81f2848ap
30 0x7efe81f2a8ddp _PyEval_EvalFrameDefault + 7805
31 0x7efe81f26a60p
32 0x7efe81f2848ap
33 0x7efe81f2b15dp _PyEval_EvalFrameDefault + 9981
34 0x7efe81f25e74p
35 0x7efe81f285e8p
36 0x7efe81f2b15dp _PyEval_EvalFrameDefault + 9981
37 0x7efe81f25e74p
38 0x7efe81f26e75p _PyFunction_FastCallDict + 645
39 0x7efe81e4bba6p _PyObject_FastCallDict + 358
40 0x7efe81e4bdfcp _PyObject_Call_Prepend + 204
41 0x7efe81e4be96p PyObject_Call + 86
42 0x7efe81ec4233p
43 0x7efe81eb9d4cp
44 0x7efe81e4badep _PyObject_FastCallDict + 158
45 0x7efe81f282bbp
46 0x7efe81f2b15dp _PyEval_EvalFrameDefault + 9981
47 0x7efe81f26a60p
48 0x7efe81f26ee3p PyEval_EvalCodeEx + 99
49 0x7efe81f26f2bp PyEval_EvalCode + 59
50 0x7efe81f596c0p PyRun_FileExFlags + 304
51 0x7efe81f5ac83p PyRun_SimpleFileExFlags + 371
52 0x7efe81f760b5p Py_Main + 3621
53 0x400c1dp main + 365
54 0x7efe80f01830p __libc_start_main + 240
55 0x4009e9p
I modified the IMPALA algorithm and it now fails at runtime with the errors below. I don't know how to track down the problem.
[03-02 14:47:41 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 1
[03-02 14:47:41 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 1
[03-02 14:47:41 MainThread @train.py:154] Waiting for 1 remote actors to connect.
[03-02 14:47:41 MainThread @train.py:158] Remote actor count: 1
Error: /paddle/paddle/fluid/operators/elementwise/elementwise_op_function.cu.h:88 Assertion `y_[id] != 0` failed. InvalidArgumentError: Integer division by zero encountered in divide.Please check.
......
Error: /paddle/paddle/fluid/operators/elementwise/elementwise_op_function.cu.h:88 Assertion `y_[id] != 0` failed. InvalidArgumentError: Integer division by zero encountered in divide.Please check.
F0302 14:47:44.609232 23176 device_context.cc:328] cudaStreamSynchronize unspecified launch failure errno: 4
*** Check failure stack trace: ***
@ 0x7f1e600d138d google::LogMessage::Fail()
@ 0x7f1e600d4e3c google::LogMessage::SendToLog()
@ 0x7f1e600d0eb3 google::LogMessage::Flush()
@ 0x7f1e600d634e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f1e629ee8f7 paddle::platform::CUDADeviceContext::Wait()
@ 0x7f1e601184fb paddle::framework::Executor::RunPreparedContext()
@ 0x7f1e60a5c312 paddle::operators::RecurrentOp::RunImpl()
@ 0x7f1e6295783c paddle::framework::OperatorBase::Run()
@ 0x7f1e601184c6 paddle::framework::Executor::RunPreparedContext()
@ 0x7f1e6011bd0f paddle::framework::Executor::Run()
@ 0x7f1e5ff5609d _ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL22pybind11_init_core_avxERNS_6moduleEEUlRNS2_9framework8ExecutorERKNS6_11ProgramDescEPNS6_5ScopeEibbRKSt6vectorISsSaISsEEE103_vIS8_SB_SD_ibbSI_EINS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNES10_
@ 0x7f1e5ff9edd1 pybind11::cpp_function::dispatcher()
@ 0x7f1eb6d6a302 _PyCFunction_FastCallDict
@ 0x7f1eb6def95b call_function
@ 0x7f1eb6df2d40 _PyEval_EvalFrameDefault
@ 0x7f1eb6dee100 _PyEval_EvalCodeWithName
@ 0x7f1eb6defb2a call_function
@ 0x7f1eb6df32cc _PyEval_EvalFrameDefault
@ 0x7f1eb6dee100 _PyEval_EvalCodeWithName
@ 0x7f1eb6defb2a call_function
@ 0x7f1eb6df32cc _PyEval_EvalFrameDefault
@ 0x7f1eb6dee100 _PyEval_EvalCodeWithName
@ 0x7f1eb6defb2a call_function
@ 0x7f1eb6df32cc _PyEval_EvalFrameDefault
@ 0x7f1eb6ded514 _PyFunction_FastCall
@ 0x7f1eb6defc88 call_function
@ 0x7f1eb6df2d40 _PyEval_EvalFrameDefault
@ 0x7f1eb6ded514 _PyFunction_FastCall
@ 0x7f1eb6dee515 _PyFunction_FastCallDict
@ 0x7f1eb6d12ce6 _PyObject_FastCallDict
@ 0x7f1eb6d12f3c _PyObject_Call_Prepend
@ 0x7f1eb6d12fd6 PyObject_Call
Aborted
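The assertion fires when an elementwise divide in the modified graph receives a zero divisor (for example a normalizer or count that can be 0; which tensor it is here is not known from the log). A hedged, framework-free sketch of the usual guard:

```python
# Clamp the denominator away from zero before dividing; eps is an
# arbitrary small constant (an assumption -- tune it for your value
# range, or mask out the zero entries instead).
def safe_divide(numerator, denominator, eps=1e-8):
    if abs(denominator) < eps:
        denominator = eps if denominator >= 0 else -eps
    return numerator / denominator
```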
Currently, the unit tests take around 25 minutes on TeamCity because they run one by one. We can speed this up by running multiple tests concurrently; that is, use ctest -j10 in .teamcity/build.sh. But we need to change our code to make it executable in parallel.
I looked at the DDPG and PPO examples and two details are unclear to me.
1. Both DDPG and PPO clip and then apply action_mapping during training, but only apply action_mapping during testing, as shown below:

# train
action = agent.policy_sample(obs)
action = np.clip(action, -1.0, 1.0)
action = action_mapping(action, env.action_space.low[0],
                        env.action_space.high[0])

# test
action = agent.policy_predict(obs)
action = action_mapping(action, env.action_space.low[0],
                        env.action_space.high[0])

Since neither the agent nor the model layer clips the action, is the predicted action guaranteed to lie in (-1.0, 1.0) at test time? If not, does applying action_mapping directly introduce a bias?
2. DDPG and PPO add Gaussian noise at different points.
DDPG adds the noise outside the model:

# Add exploration noise, and clip to [-1.0, 1.0]
action = np.clip(np.random.normal(action, 1.0), -1.0, 1.0)
action = action_mapping(action, env.action_space.low[0],
                        env.action_space.high[0])

PPO adds the noise inside the model:

def sample(self, obs):
    means, logvars = self.policy(obs)
    sampled_act = means + (
        layers.exp(logvars / 2.0) *  # stddev
        layers.gaussian_random(shape=(self.act_dim, ), dtype='float32'))
    return sampled_act

Are these two strategies the result of tuning experience, or are both usable? Also, would clipping the action directly in the model layer with fluid.clip(action, low, high) work as well?
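For reference, a minimal sketch of the linear map that action_mapping presumably applies, taking [-1, 1] onto [low, high]. It also shows the risk question 1 raises: without clipping, a prediction outside [-1, 1] maps outside the action bounds.

```python
# Linearly map an action from [-1, 1] to [low, high]. Values outside
# [-1, 1] land outside the bounds -- hence the np.clip during training.
def action_mapping(action, low, high):
    return low + (action + 1.0) * 0.5 * (high - low)
```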
请问如何实现以下的模型呢?我已经有了一个初步的设计(在GA3C
的基础上进行的修改):
现在有以下的几个疑问:
self.learn_exe = fluid.ParallelExecutor(
    use_cuda=use_cuda,
    main_program=self.learn_program,
    build_strategy=build_strategy,
    exec_strategy=exec_strategy)
# Fetch training data from the parallel environments; this yields data for base, pc, vr and rp
self.base_sample_exes = []
for _ in range(base_predict_thread_num):
    with fluid.scope_guard(fluid.global_scope().new_scope()):
        pe = fluid.ParallelExecutor(
            use_cuda=use_cuda,
            main_program=self.base_sample_program,
            build_strategy=build_strategy,
            exec_strategy=exec_strategy)
        self.base_sample_exes.append(pe)
self.pc_sample_exes = []
for _ in range(pc_predict_thread_num):
    with fluid.scope_guard(fluid.global_scope().new_scope()):
        pe = fluid.ParallelExecutor(
            use_cuda=use_cuda,
            main_program=self.pc_sample_program,
            build_strategy=build_strategy,
            exec_strategy=exec_strategy)
        self.pc_sample_exes.append(pe)
self.vr_sample_exes = []
for _ in range(vr_predict_thread_num):
    with fluid.scope_guard(fluid.global_scope().new_scope()):
        pe = fluid.ParallelExecutor(
            use_cuda=use_cuda,
            main_program=self.vr_sample_program,
            build_strategy=build_strategy,
            exec_strategy=exec_strategy)
        self.vr_sample_exes.append(pe)
# Probably programs used to obtain the output names
self.base_sample_program = fluid.Program()
self.vr_sample_program = fluid.Program()
self.pc_sample_program = fluid.Program()
# Build the programs for base, vr, rp and pc
# self.predict_program = fluid.Program()
self.learn_program = fluid.Program()
with fluid.program_guard(self.base_sample_program):
    # input states
    base_states = layers.data(name='base_states', shape=self.obs_shape, dtype='float32')
    base_sample_actions, base_values = self.alg.sample(base_states)
    # outputs: action index and value function. Why are their names needed?
    self.base_sample_outputs = [base_sample_actions.name, base_values.name]
with fluid.program_guard(self.pc_sample_program):
    pc_states = layers.data(name='pc_states', shape=self.obs_shape, dtype='float32')
    pc_q, pc_q_max = self.alg.predict_pc_q_and_pc_q_max(pc_states)
    # outputs: action index and value function. Why are their names needed?
    self.pc_outputs = [pc_q_max.name]
with fluid.program_guard(self.vr_sample_program):
    vr_states = layers.data(name='vr_states', shape=self.obs_shape, dtype='float32')
    vr_v = self.alg.predict_vr_value(vr_states)
    # outputs: action index and value function. Why are their names needed?
    self.vr_outputs = [vr_v.name]
with fluid.program_guard(self.learn_program):
    base_states = layers.data(name='base_states', shape=self.obs_shape, dtype='float32')
    # this algorithm does the one-hot encoding itself
    base_actions = layers.data(name='base_actions', shape=[6], dtype='float32')
    base_R = layers.data(name='base_R', shape=[], dtype='float32')
    base_values = layers.data(name='base_values', shape=[], dtype='float32')
    # data for the auxiliary tasks
    pc_states = layers.data(name='pc_states', shape=self.obs_shape, dtype='float32')
    pc_R = layers.data(name='pc_R', shape=[20, 20], dtype='float32')
    pc_actions = layers.data(name='pc_actions', shape=[6], dtype='float32')
    vr_states = layers.data(name='vr_states', shape=self.obs_shape, dtype='float32')
    vr_R = layers.data(name='vr_R', shape=[], dtype='float32')  # vr
    rp_states = layers.data(name='rp_states', shape=self.obs_shape, dtype='float32')
    rp_C = layers.data(name='rp_C', shape=[3], dtype='float32')  # rp
    lr = layers.data(name='lr', shape=[1], dtype='float32', append_batch_size=False)
    entropy_coeff = layers.data(name='entropy_coeff', shape=[], dtype='float32')
    # wrap the training data
    self.learn_reader = fluid.layers.create_py_reader_by_data(
        capacity=32,
        feed_list=[
            base_states, base_actions, base_R, base_values,
            # data for the other auxiliary tasks
            pc_states, pc_actions, pc_R,
            vr_states, vr_R,
            rp_states, rp_C,
            # data for training the network
            lr, entropy_coeff
        ])
    base_states, base_actions, base_R, base_values, pc_states, pc_actions, pc_R, vr_states, vr_R, rp_states, rp_C, lr, entropy_coeff = fluid.layers.read_file(self.learn_reader)
    total_loss, pi_loss, vf_loss, entropy, pc_loss, vr_loss, rp_loss = self.alg.learn(
        base_states, base_actions, base_R, base_values,
        # training data
        pc_states, pc_R, pc_actions,
        vr_states, vr_R,
        rp_states, rp_C,
        lr, entropy_coeff)
    self.learn_outputs = [
        total_loss.name, pi_loss.name, vf_loss.name, entropy.name,
        pc_loss.name, vr_loss.name, rp_loss.name
    ]
Currently we are using the env name NoFrameSkip-v4 for training.
PARL/examples/A2C/a2c_config.py
Line 21 in ee3e8dc
Hello, I would like to use PARL for parallel computation and reinforcement learning training on a computer cluster. The cluster has two parts: a GPU cluster running Linux, which mainly runs the RL algorithms, and a CPU cluster running Windows, which mainly runs the simulation environments. The simulation environments communicate with the RL algorithms via gRPC. Is using PARL feasible under this architecture? Thanks!
When running sh scripts/train_difficulty1.sh ./low_speed_model
in /PARL/examples/NeurIPS2019-Learn-to-Move-Challenge, an AssertionError occurred.
`(opensim-rl) luo@idserver:~/PARL/examples/NeurIPS2019-Learn-to-Move-Challenge$ sh scripts/train_difficulty1.sh ./low_speed_model
/home/luo/anaconda3/envs/opensim-rl/bin/python
[12-15 11:41:49 MainThread @logger.py:224] Argv: train.py --actor_num 300 --difficulty 1 --penalty_coeff 3.0 --logdir ./output/difficulty1 --restore_model_path ./low_speed_model
/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/opensim/simbody.py:15: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
[12-15 11:41:49 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
[12-15 11:41:49 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
W1215 11:41:50.501416 31581 device_context.cc:235] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 10.0
W1215 11:41:50.504907 31581 device_context.cc:243] device: 0, cuDNN Version: 7.5.
[12-15 11:41:51 MainThread @machine_info.py:86] nvidia-smi -L found gpu count: 4
Traceback (most recent call last):
File "train.py", line 327, in <module>
learner = Learner(args)
File "train.py", line 78, in __init__
self.agent = OpenSimAgent(algorithm, OBS_DIM, ACT_DIM)
File "/home/luo/PARL/examples/NeurIPS2019-Learn-to-Move-Challenge/opensim_agent.py", line 40, in __init__
build_strategy=build_strategy)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/parallel_executor.py", line 201, in __init__
if share_vars_from else None)
File "/home/luo/anaconda3/envs/opensim-rl/lib/python3.6/site-packages/paddle/fluid/compiler.py", line 244, in with_data_parallel
assert self._loss_name is not None, "The loss_name should be set here."
AssertionError: The loss_name should be set here.
`
Does anybody know how to solve that? Thanks in advance.
We should manage all tests using cmake so that all the tests can be run using ctest, similar to https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/CMakeLists.txt
Hello, I saw in Zuo Qing's ZQCNN article that you got it running on a PC; I am also new to ZQCNN. I have finished cmake .. and make in the ZQCNN directory on Ubuntu, but I don't know what to do next, or how to use this library to run recognition on images.
export CUDA_VISIBLE_DEVICES="0"
>>> from parl.utils import machine_info
>>> machine_info.is_gpu_available()
True
The return value of machine_info.is_gpu_available() is supposed to be False on macOS, where no GPU is installed.
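The expected semantics can be sketched with a hypothetical helper (this is an illustration, not the actual code in parl/utils/machine_info.py): the result should be False whenever no CUDA device exists, regardless of CUDA_VISIBLE_DEVICES.

```python
import os

# Hypothetical sketch of the desired check, not PARL's implementation:
# report a GPU only when the machine actually has CUDA devices and
# CUDA_VISIBLE_DEVICES does not hide all of them.
def gpu_available(device_count, env=None):
    env = os.environ if env is None else env
    if env.get("CUDA_VISIBLE_DEVICES") == "":  # all devices hidden
        return False
    return device_count > 0                    # 0 on a Mac without CUDA

assert gpu_available(0, {"CUDA_VISIBLE_DEVICES": "0"}) is False  # the reported bug case
```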
MAML-RL is an important class of RL algorithms.
However, MAML seems to be incompatible with the current PARL framework.
Please see: https://github.com/cbfinn/maml_rl
After modifying code based on IMPALA, I get the following error when running it. Other algorithms run without problems.
--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::operators::ReadOp::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
3 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
4 paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool)
5 paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool)
------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
File "/home/xtq/anaconda3/lib/python3.6/site-packages/paddle/fluid/framework.py", line 2488, in append_op
attrs=kwargs.get("attrs", None))
File "/home/xtq/anaconda3/lib/python3.6/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
return self.main_program.current_block().append_op(*args, **kwargs)
File "/home/xtq/anaconda3/lib/python3.6/site-packages/paddle/fluid/layers/io.py", line 872, in read_file
type='read', inputs={'Reader': [reader]}, outputs={'Out': out})
File "/home/xtq/DVTrace v2.3.1/atari_agent.py", line 74, in build_program
self.learn_reader)
File "/home/xtq/anaconda3/lib/python3.6/site-packages/parl/core/fluid/agent.py", line 87, in __init__
self.build_program()
File "/home/xtq/DVTrace v2.3.1/atari_agent.py", line 29, in __init__
super(AtariAgent, self).__init__(algorithm)
File "train.py", line 65, in __init__
self.learn_data_provider)
File "train.py", line 276, in <module>
learner = Learner(config)
----------------------
Error Message Summary:
----------------------
Error: Paddle internal Check failed. (Please help us create a new issue, here we need to find the developer to add a user friendly error message)
[Hint: Expected ins.size() == out_arg_names.size(), but received ins.size():9 != out_arg_names.size():8.] at (/paddle/paddle/fluid/operators/reader/read_op.cc:92)
[operator < read > error]
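The hint (ins.size():9 != out_arg_names.size():8) indicates the py_reader was fed a different number of variables than the number unpacked from fluid.layers.read_file. A framework-free sketch of the arity check worth doing in build_program (the variable names below are illustrative, not taken from the reporter's code):

```python
# The read op requires len(feed_list) == number of read_file outputs;
# a 9-vs-8 mismatch produces exactly the reported error.
feed_list = ["obs", "actions", "behaviour_logits", "rewards", "dones",
             "lr", "entropy_coeff", "vf_coeff", "extra_input"]   # 9 fed in
read_outputs = ["obs", "actions", "behaviour_logits", "rewards", "dones",
                "lr", "entropy_coeff", "vf_coeff"]               # only 8 read back

def reader_arity_ok(feed, outputs):
    return len(feed) == len(outputs)

assert not reader_arity_ok(feed_list, read_outputs)  # reproduces the mismatch
```

In other words, compare the feed_list passed to create_py_reader_by_data against the tuple you unpack from read_file; after a modification like the one described, they can easily drift out of sync by one entry.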
I see that several target implementations are split into a server part and a client part. Does that mean one runs on the server and the other on the client? The client shows the target's running speed, but the server does not. How should I understand this part?
Can we add an example illustrating how to save and restore a trained model?
I think this is a basic feature of an RL library, and it would help newbies like me use PARL in our projects.
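The general pattern such an example could show is a checkpoint round-trip, sketched here with stdlib pickle as a stand-in for agent.save()/agent.restore() (the PARL method names appear later in this thread; the parameter values below are toy data):

```python
import os
import pickle
import tempfile

# Toy stand-in for a trained agent's parameters.
params = {"fc1.w": [0.1, 0.2], "fc1.b": [0.0]}

ckpt = os.path.join(tempfile.mkdtemp(), "model.ckpt")
with open(ckpt, "wb") as f:   # analogous to agent.save('./model.ckpt')
    pickle.dump(params, f)

with open(ckpt, "rb") as f:   # analogous to agent.restore('./model.ckpt')
    restored = pickle.load(f)

assert restored == params      # the round-trip preserves the weights
```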
I tried to follow the example in https://parl.readthedocs.io/en/latest/parallel_training/setup.html to set up my Windows machine as a cluster. However, the xparl start command failed, and it seems the get_ip_address function implemented in parl/utils/machine_info.py doesn't support Windows.
My questions are threefold:
In the Winning Solution for NIPS2018: AI for Prosthetics Challenge, Part 3 (training in a random-velocity environment for the round-2 evaluation), what does ensemble_num mean? Is it an idea similar to A3C, or does it train ensemble_num models in parallel and then, at test time, evaluate each model separately and pick the best one?
When running python3 simulator_server.py --port 8030 --ensemble_num 1
within NeurIPS2018-AI-for-Prosthetics-Challenge
, I got the following error:
share_vars_from is set, scope is ignored.
Traceback (most recent call last):
File "simulator_server.py", line 331, in <module>
simulator_server = SimulatorServer()
File "simulator_server.py", line 84, in __init__
self.agent = OpenSimAgent(alg, OBS_DIM, ACT_DIM, args.ensemble_num)
File "/home/rsa-key-20191010/test/PARL/examples/NeurIPS2018-AI-for-Prosthetics-Challenge/opensim
_agent.py", line 64, in __init__
share_vars_parallel_executor=self.learn_pe[i])
File "/home/rsa-key-20191010/test/PARL/examples/NeurIPS2018-AI-for-Prosthetics-Challenge/multi_h
ead_ddpg.py", line 121, in sync_target
share_vars_parallel_executor=share_vars_parallel_executor)
File "/home/davidzhenggd/.local/lib/python3.5/site-packages/parl/core/fluid/model.py", line 182,
in sync_weights_to
self._cached_fluid_executor.run(fetch_list=[])
File "/home/davidzhenggd/.local/lib/python3.5/site-packages/paddle/fluid/parallel_executor.py",
line 280, in run
return_numpy=return_numpy)
File "/home/davidzhenggd/.local/lib/python3.5/site-packages/paddle/fluid/executor.py", line 664,
in run
program._compile(scope, self.place)
File "/home/davidzhenggd/.local/lib/python3.5/site-packages/paddle/fluid/compiler.py", line 376,
in _compile
scope=self._scope)
File "/home/davidzhenggd/.local/lib/python3.5/site-packages/paddle/fluid/compiler.py", line 284,
in _compile_data_parallel
"share_vars_from is not compiled and run, so there is no "
ValueError: share_vars_from is not compiled and run, so there is no var to share.
Do you know how to resolve this issue? Thanks in advance!
PARL/parl/layers/layer_wrappers.py
Line 156 in e48f67a
# save the parameters to ./model.ckpt
self.agent.save('./model.ckpt')
I have saved model successfully!
And I want to restore model and render.
algorithm = IMPALA(
    model,
    sample_batch_steps=config['sample_batch_steps'],
    gamma=config['gamma'],
    vf_loss_coeff=config['vf_loss_coeff'],
    clip_rho_threshold=config['clip_rho_threshold'],
    clip_pg_rho_threshold=config['clip_pg_rho_threshold'])
agent = MPEAgent(algorithm, obs_shape, act_dim)
agent.restore('./model.ckpt')
Then the TypeError happened:
agent.restore('./model.ckpt')
File "/home/tianqi/anaconda3/lib/python3.6/site-packages/parl/core/fluid/agent.py", line 221, in restore
filename=filename)
File "/home/tianqi/anaconda3/lib/python3.6/site-packages/paddle/fluid/io.py", line 798, in load_params
filename=filename)
File "/home/tianqi/anaconda3/lib/python3.6/site-packages/paddle/fluid/io.py", line 675, in load_vars
raise TypeError("program's type should be Program")
TypeError: program's type should be Program
I cannot run the IMPALA algorithm in my Docker conda environment.
My Docker container is built on nvidia/cuda:18.04 with Anaconda 5.3.0. I created an environment named dist-rl, installing python=3.7, paddlepaddle-gpu=1.5.2 and cudatoolkit=10.0 via conda, and installing parl, gym[atari] and opencv-python via pip.
When I run python train.py
after starting the CPU cluster with xparl start --port 8010 --cpu_num 5
(I also changed the number of CPUs in impala_config.py), it produced the following errors:
It seems the main error is paddle.fluid.core_avx.EnforceNotMet: Invoke operator elementwise_mul error,
but I don't know how to deal with it.
Thanks very much~
I want to implement COMA with PARL, and I use two fluid.Program() instances to train the critic and the actor respectively. However, I met two errors related to the optimizer.
code:
def learn(self, obs, actions, last_actions, q_vals, lr):
    """
    Args:
        obs: [4*env*batch, time, 84]
        actions: [4*env*batch, time, 1]
        last_actions: [4*env*batch, time, 1]
        q_vals: [env*batch, 4, time, 22]
        lr: float scalar of learning rate.
    """
    mac_out = []
    hidden_state = None
    pre_cell = None
    obs_batch = self._build_actor_inputs(obs, last_actions)  # [4*env*batch, time, 106]
    for t in range(obs_batch.shape[1]):
        obs_ = layers.slice(obs_batch, axes=[1], starts=[t], ends=[t + 1])  # [4*env*batch, 106]
        if hidden_state is None:
            hidden_state, pre_cell = self.model.init_hidden_state(obs_)  # [4*env*batch, 64]
        logits, hidden_state, pre_cell = self.model.policy(obs_, hidden_state, pre_cell)  # [4*env*batch, 22]
        mac_out.append(logits)  # [time, 4*env*batch, 22]
    mac_out = layers.stack(mac_out, axis=1)  # [4*env*batch, time, 22]
    # Calculate the baseline
    q_vals = layers.reshape(q_vals, [-1, self.action_dim])  # [4*env*batch*time, 22]
    pi = layers.reshape(mac_out, [-1, self.action_dim])  # [4*env*batch*time, 22]
    baseline = layers.reduce_sum(pi * q_vals, dim=-1, keep_dim=True)  # [4*env*batch*time, 1]
    # Calculate the policy gradient
    actions_for_one_hot = layers.reshape(actions, [-1, 1])  # [4*env*batch*time, 1]
    actions_one_hot = layers.one_hot(actions_for_one_hot, self.action_dim)  # [4*env*batch*time, 22]
    q_taken = layers.reduce_sum(actions_one_hot * q_vals, dim=-1, keep_dim=True)  # [4*env*batch*time, 1]
    pi_taken = layers.reduce_sum(actions_one_hot * pi, dim=-1, keep_dim=True)  # [4*env*batch*time, 1]
    log_pi_taken = layers.log(pi_taken)  # [4*env*batch*time, 1]
    advantages = (q_taken - baseline)
    coma_loss = layers.reduce_sum(advantages * log_pi_taken)  # [1]
    # Optimise agents
    fluid.clip.set_gradient_clip(
        clip=fluid.clip.GradientClipByGlobalNorm(clip_norm=self.grad_norm_clip))
    optimizer = fluid.optimizer.RMSPropOptimizer(lr, rho=self.optim_alpha, epsilon=self.optim_eps)
    optimizer.minimize(coma_loss)  # error
    return coma_loss
error:
line 300, in learn
optimizer.minimize(total_loss)
File "", line 2, in minimize
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in impl
return wrapped_func(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/dygraph/base.py", line 87, in impl
return func(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 594, in minimize
no_grad_set=no_grad_set)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 493, in backward
no_grad_set, callbacks)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/backward.py", line 578, in append_backward
append_backward_vars(root_block, fwd_op_num, grad_to_var, grad_info_map)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/backward.py", line 392, in append_backward_vars
op_desc.infer_shape(block.desc)
paddle.fluid.core_avx.EnforceNotMet: Input(C@GRAD) should not be null at [/home/teamcity/work/ef54dc8a5b211854/paddle/fluid/operators/lstm_unit_op.cc:88]
code:
def _train_critic(self, obs, actions, last_actions, rewards, targets, lr_critic):
    """
    :param obs: [4*env*batch, time, 84]
    :param actions: [4*env*batch, time, 1]
    :param last_actions: [4*env*batch, time, 1]
    :param rewards: [env*batch, time]
    :param targets: [env*batch, 4, time]
    :return: q_vals, critic_train_stats [env*batch, 4, time, 22]
    """
    # init state
    batch = self._build_critic_inputs(obs, actions, last_actions)  # [env*batch, time, 452]
    actions_one_hot = layers.one_hot(actions, self.action_dim)  # [4*env*batch, time, 22]
    actions_one_hot = layers.reshape(actions_one_hot, [-1, 4, batch.shape[-2], self.action_dim])  # [env*batch, 4, time, 22]
    # Optimise agents
    fluid.clip.set_gradient_clip(
        clip=fluid.clip.GradientClipByGlobalNorm(clip_norm=self.grad_norm_clip))
    optimizer = fluid.optimizer.RMSPropOptimizer(lr_critic, rho=self.optim_alpha, epsilon=self.optim_eps)
    critic_train_stats = {
        "critic_loss": [],
        "td_error_abs": [],
        "target_mean": [],
        "q_taken_mean": []
    }
    q_vals_list = []
    for t in range(rewards.shape[1]):  # time
        obs_ = batch[:, t]  # [env*batch, 452]
        q_t = self.model.value(obs_)  # [env*batch, 22]
        q_t = layers.reshape(q_t, [q_t.shape[0], 1, q_t.shape[-1]])
        q_t = layers.expand(q_t, [1, 4, 1])  # [env*batch, 4, 22]
        q_taken = layers.reduce_sum(q_t * actions_one_hot[:, :, t, :], dim=-1)  # [env*batch, 4]
        q_t_taken = targets[:, :, t]  # [env*batch, 4]
        td_error = q_taken - q_t_taken  # [env*batch, 4]
        q_vals_list.append(q_t)  # [env*batch, 4, 22]
        loss = layers.reduce_sum(td_error ** 2)  # [1]
        optimizer.minimize(loss)
        critic_train_stats["critic_loss"].append(loss)
        critic_train_stats['td_error_abs'].append(td_error)
        critic_train_stats['q_taken_mean'].append(q_taken)
        critic_train_stats['target_mean'].append(q_t_taken)
    q_vals = layers.stack(q_vals_list, axis=2)  # [env*batch, 4, time, 22]
    for key in critic_train_stats.keys():
        critic_train_stats[key] = layers.reduce_sum(layers.stack(critic_train_stats[key]))
    return q_vals, critic_train_stats
error:
File ,line 150, in learn
q_vals, critic_train_stats = self._train_critic(obs, actions, last_actions, rewards, targets, lr_critic)
File line 116, in train_critic
optimizer.minimize(loss)
File "", line 2, in minimize
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in impl
return wrapped_func(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/dygraph/base.py", line 87, in impl
return func(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 594, in minimize
no_grad_set=no_grad_set)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 493, in backward
no_grad_set, callbacks)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/backward.py", line 571, in append_backward
input_grad_names_set=input_grad_names_set)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/paddle/fluid/backward.py", line 310, in append_backward_ops
op.desc, cpt.to_text(no_grad_dict[block.idx]), grad_sub_block_list)
paddle.fluid.core_avx.EnforceNotMet: grad_op_maker should not be null
Operator GradOpMaker has not been registered. at [/home/teamcity/work/ef54dc8a5b211854/paddle/fluid/framework/op_info.h:69]
How can I solve this?
Thanks!
Where are the PyTorch examples?
`paddle.fluid.core_avx.EnforceNotMet:
0 std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2 paddle::platform::GetCUDADeviceCount()
PaddleCheckError: cudaGetDeviceCount failed in paddle::platform::GetCUDADeviceCountImpl, error code : 30, Please see detail in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038: unknown error at [/paddle/paddle/fluid/platform/gpu_info.cc:67]
`
This problem happens occasionally; it goes away after I reboot the computer.
What should I do?
PARL/parl/layers/layer_wrappers.py
Line 262 in 21a9efe
Should we add h_0/c_0 parameters, like http://www.paddlepaddle.org/docs/0.14.0/api/fluid/en/layers.html#dynamic-lstm?
As titled.
After N rounds of hyperparameter tuning, my model that solves the 8-puzzle with Rainbow (Double-Q + SegmentTree + Advantage) has finally finished training. Without knowing how to search for the optimal solution, it trained from scratch using only the reward signal and automatically generated lookup-table code. To my surprise, it shot to Top 1 on LeetCode and became the default submission! (Although local comparison against an A* solver gives 98.6% accuracy, that didn't stop it from getting AC, heh.)
So a series of questions that have troubled me for over three years are finally resolved:
1. Many Zhihu influencers claim that once you have machine learning algorithms, a passing familiarity with the classic "Introduction to Algorithms" is enough; careful study is unnecessary because the two are barely related.
2. 99% of Maimai posts claim that grinding algorithm problems is only for passing interviews and is useless at work.
3. How can machine learning solve classic graph-search and discrete-optimization problems? And conversely, how can classic dynamic programming and tree recursion optimize deep learning models?
With PARL, these doubts really did melt away: just pursue both with equal rigor, solving more problems and studying more models.
Finally, a shout-out for PARL:
PARL, the deep reinforcement learning framework, is a bridge between classic data structures & algorithms and emerging deep learning algorithms, and an excellent choice for algorithm engineers!
O(∩_∩)O
Dear Baidu developers, when will the A2C algorithm framework be released?
Can Paddle Mobile be used with PARL? What modifications would be needed?