davidhershey / feudal_networks
An implementation of FeUdal Networks for Hierarchical Reinforcement Learning as published: https://arxiv.org/abs/1703.01161
License: MIT License
Actually it's done the way it should - only the code is different in both places
I have been going through the code for a long time, but I still can't run it successfully. train.py seems to have some problems.
Does this code achieve the benchmarks given in the paper? I modified this to work on my system, but it doesn't converge after running it for a few days.
test_init (tests.test_policies.test_lstm_policy.TestLSTMPolicy) ... ERROR
Traceback (most recent call last):
File "/Users/user/Desktop/feudal_networks/tests/test_policies/test_feudal_batch_processor.py", line 154, in test_intrinsic_reward_and_gsum_calculation
b = Batch(obs, a, returns, terminal, s, g, features)
TypeError: __new__() takes exactly 7 arguments (8 given)
Not sure why there are more arguments than it can take.
Is this expected or are specific settings/libraries/etc required in order to run passing tests?
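The message is consistent with a namedtuple that declares six fields being built with seven values (in the Python 2 wording, the class itself counts as one extra argument). A minimal sketch of that mismatch, using hypothetical field names rather than the repository's actual `Batch` definition:

```python
from collections import namedtuple

# Hypothetical six-field tuple; the real Batch in the repository may differ.
Batch = namedtuple("Batch", ["obs", "a", "returns", "terminal", "s", "g"])


def try_build(*values):
    """Return the built tuple, or the TypeError raised for a bad argument count."""
    try:
        return Batch(*values)
    except TypeError as exc:
        return exc


ok = try_build(1, 2, 3, 4, 5, 6)       # six values: matches the six fields
bad = try_build(1, 2, 3, 4, 5, 6, 7)   # seven values: one more than declared
print(type(ok).__name__, type(bad).__name__)
```

If this is what is happening, the test and the `Batch` definition are probably from different revisions: either the test passes a field the tuple no longer has, or the tuple is missing a field the test expects.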
When I run it by typing train.py, I get this:
Executing the following commands:
mkdir -p /tmp/pong
echo /usr/bin/python train.py > /tmp/pong/cmd.sh
kill $( lsof -i:12345 -t ) > /dev/null 2>&1
kill $( lsof -i:12222-12223 -t ) > /dev/null 2>&1
tmux kill-session -t a3c
tmux new-session -s a3c -n ps -d bash
tmux new-window -t a3c -n w-0 bash
tmux new-window -t a3c -n tb bash
tmux new-window -t a3c -n htop bash
sleep 1
tmux send-keys -t a3c:ps 'CUDA_VISIBLE_DEVICES= /usr/bin/python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v4 --num-workers 1 --job-name ps' Enter
tmux send-keys -t a3c:w-0 'CUDA_VISIBLE_DEVICES= /usr/bin/python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v4 --num-workers 1 --job-name worker --task 0 --remotes 1 --policy lstm' Enter
tmux send-keys -t a3c:tb 'tensorboard --logdir /tmp/pong --port 12345' Enter
tmux send-keys -t a3c:htop htop Enter
Use tmux attach -t a3c
to watch process output
Use tmux kill-session -t a3c
to kill the job
Point your browser to http://localhost:12345 to see Tensorboard
I don't know how tmux works, but there is no sign of an error.
What did I do wrong?
From the paper:
"... in fact the proper form for the transition policy gradient arrived at in eqn. 10."
manager_loss = -tf.reduce_sum((self.r-cutoff_vf_manager)*dcos) (from the code)
Why not implement eqn. 10?
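For reference, the quoted line weights a similarity term by the Manager's advantage and sums over the batch. A minimal pure-Python sketch of that computation, assuming `dcos` is the cosine similarity between the state change and the goal at each step; the function names and toy inputs are illustrative and not taken from the repository:

```python
import math


def cosine_similarity(u, v, eps=1e-8):
    """Cosine of the angle between vectors u and v (eps guards against zero norms)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + eps)


def manager_loss(returns, values, state_diffs, goals):
    """Negative sum of (return - value) * d_cos, mirroring the quoted line."""
    total = 0.0
    for r, v, ds, g in zip(returns, values, state_diffs, goals):
        total += (r - v) * cosine_similarity(ds, g)
    return -total


# Toy batch of two steps with 2-dimensional states and goals (assumed values).
loss = manager_loss(
    returns=[1.0, 0.5],
    values=[0.2, 0.7],
    state_diffs=[[1.0, 0.0], [0.0, 1.0]],
    goals=[[1.0, 0.0], [0.0, -1.0]],
)
print(loss)
```

In this sketch, minimizing the loss pushes the goal toward the observed state change when the advantage is positive and away from it when the advantage is negative.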
Hi, I would like to use your project, but I ran into trouble setting "--policy feudal". I can run python train.py directly and it works normally with the default "--policy lstm". But when I add this parameter, as in python train.py --policy feudal, I get the following output:
[2018-04-19 22:01:28,989] Events directory: /tmp/pong/train_0
[2018-04-19 22:01:29,342] Starting session. If this hangs, we're mostly likely w
aiting to connect to the parameter server. One common cause is that the paramete
r server DNS name isn't resolving yet, or is misspecified.
2018-04-19 22:01:29.431565: I tensorflow/core/distributed_runtime/master_session
.cc:998] Start master session 0f5becf7698cbfb7 with config: intra_op_parallelism
_threads: 1 device_filters: "/job:ps" device_filters: "/job:worker/task:0/cpu:0"
inter_op_parallelism_threads: 2
Traceback (most recent call last):
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/client/session.py", line 1327, in _do_call
return fn(*args)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/client/session.py", line 1306, in _run_fn
status, run_metadata)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/contextlib.py", l
ine 88, in exit
next(self.gen)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok
_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: Key global/FeUdal/worker/
rnn/basic_lstm_cell/bias/Adam_1 not found in checkpoint
[[Node: save/RestoreV2_55 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:
ps/replica:0/task:0/cpu:0"](_recv_save/Const_0_S1, save/RestoreV2_55/tensor_name
s, save/RestoreV2_55/shape_and_slices)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "worker.py", line 174, in
tf.app.run()
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "worker.py", line 166, in main
run(args, server)
File "worker.py", line 94, in run
with sv.managed_session(server.target, config=config) as sess, sess.as_defau
lt():
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/contextlib.py", l
ine 81, in enter
return next(self.gen)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/supervisor.py", line 964, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/supervisor.py", line 792, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/six
.py", line 686, in reraise
raise value
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/supervisor.py", line 953, in managed_session
start_standard_services=start_standard_services)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/supervisor.py", line 708, in prepare_or_wait_for_session
init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/session_manager.py", line 273, in prepare_session
config=config)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/session_manager.py", line 205, in _restore_checkpoint
saver.restore(sess, ckpt.model_checkpoint_path)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/saver.py", line 1560, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key global/FeUdal/worker/
rnn/basic_lstm_cell/bias/Adam_1 not found in checkpoint
[[Node: save/RestoreV2_55 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:
ps/replica:0/task:0/cpu:0"](_recv_save/Const_0_S1, save/RestoreV2_55/tensor_name
s, save/RestoreV2_55/shape_and_slices)]]
Caused by op 'save/RestoreV2_55', defined at:
File "worker.py", line 174, in
tf.app.run()
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "worker.py", line 166, in main
run(args, server)
File "worker.py", line 50, in run
saver = FastSaver(variables_to_save)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/saver.py", line 1140, in init
self.build()
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/saver.py", line 1172, in build
filename=self._filename)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/saver.py", line 688, in build
restore_sequentially, reshape)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/saver.py", line 407, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/saver.py", line 247, in restore_op
[spec.tensor.dtype])[0])
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/ops/gen_io_ops.py", line 663, in restore_v2
dtypes=dtypes, name=name)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/framework/ops.py", line 1204, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-
access
NotFoundError (see above for traceback): Key global/FeUdal/worker/rnn/ba[26/480]
_cell/bias/Adam_1 not found in checkpoint
[[Node: save/RestoreV2_55 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:
ps/replica:0/task:0/cpu:0"](_recv_save/Const_0_S1, save/RestoreV2_55/tensor_name
s, save/RestoreV2_55/shape_and_slices)]]
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'report_uninitialized_variables/boolean_mask/Gather:0' shape=(?,) dty
pe=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
['File "worker.py", line 174, in \n tf.app.run()', 'File "/home/xunti
an2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/plat
form/app.py", line 48, in run\n _sys.exit(main(_sys.argv[:1] + flags_passthro
ugh))', 'File "worker.py", line 166, in main\n run(args, server)', 'File "wor
ker.py", line 77, in run\n ready_op=tf.report_uninitialized_variables(variabl
es_to_save),', 'File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/sit
e-packages/tensorflow/python/util/tf_should_use.py", line 175, in wrapped\n r
eturn _add_should_use_warning(fn(*args, **kwargs))', 'File "/home/xuntian2/anaco
nda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/util/tf_shoul
d_use.py", line 144, in _add_should_use_warning\n wrapped = TFShouldUseWarni$
gWrapper(x)', 'File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site
-packages/tensorflow/python/util/tf_should_use.py", line 101, in init\n s
tack = [s.strip() for s in traceback.format_stack()]']
==================================
[2018-04-19 22:01:29,676] ==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'report_uninitialized_variables/boolean_mask/Gather:0' shape=(?,) dty
pe=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
['File "worker.py", line 174, in \n tf.app.run()', 'File "/home/xunti
an2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/plat
form/app.py", line 48, in run\n _sys.exit(main(_sys.argv[:1] + flags_passthro
ugh))', 'File "worker.py", line 166, in main\n run(args, server)', 'File "wor
ker.py", line 77, in run\n ready_op=tf.report_uninitialized_variables(variabl
es_to_save),', 'File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/sit
e-packages/tensorflow/python/util/tf_should_use.py", line 175, in wrapped\n r
eturn _add_should_use_warning(fn(*args, **kwargs))', 'File "/home/xuntian2/anaco
nda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/util/tf_shoul
d_use.py", line 144, in _add_should_use_warning\n wrapped = TFShouldUseWarnin
gWrapper(x)', 'File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site
-packages/tensorflow/python/util/tf_should_use.py", line 101, in init\n s
tack = [s.strip() for s in traceback.format_stack()]']
Could you please tell me what the problem is? Thanks a lot.
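The traceback fails inside `saver.restore`, that is, while loading an existing checkpoint. One plausible cause (an assumption, not confirmed in this thread) is that the log directory still holds a checkpoint written by the earlier `--policy lstm` run, which contains no `FeUdal` variables. A small sketch for checking a log directory for leftover checkpoint state before switching policies; `find_checkpoint_dirs` is a hypothetical helper, and `/tmp/pong` is the log directory shown in the output above:

```python
import os


def find_checkpoint_dirs(log_dir):
    """Return sorted directories under log_dir that contain a 'checkpoint' state file."""
    found = []
    for root, _dirs, files in os.walk(log_dir):
        # TensorFlow savers record the latest checkpoint in a file named 'checkpoint'.
        if "checkpoint" in files:
            found.append(root)
    return sorted(found)


# Reports nothing if the directory is missing or holds no checkpoint state.
for path in find_checkpoint_dirs("/tmp/pong"):
    print("existing checkpoint state in:", path)
```

If a directory is reported, starting the feudal run with a fresh log directory (worker.py takes `--log-dir`, as the launcher commands show) or moving the old one aside should avoid restoring a checkpoint from a different model.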
I have been trying to get the 'feudal' policy to work on the 'PongDeterministic-v4' environment, but I have had no luck. The 'lstm' policy seems to work for me, but if I change it to 'feudal' the episode rewards do not increase even after 8 hours of training with 1 worker; they are stuck at -20, on both the 'master' branch and the 'dilated_fix' branch.
I saw the other issues mentioning that it doesn't achieve the benchmarks from the paper, but is it supposed to work on Pong at least? Or am I doing something wrong?
Right after eq. (7) in the paper, the authors describe V_t as a function of x_t. However, in the code it is a function of g_hat (feudal_policy.py -> _build_manager()):
self.manager_vf = self._build_value(g_hat)
Shouldn't it be a function of x_t?