davidhershey / feudal_networks

An implementation of FeUdal Networks for Hierarchical Reinforcement Learning, as published at https://arxiv.org/abs/1703.01161

License: MIT License

Python 100.00%

feudal_networks's Issues

Expand dimensions?

https://github.com/dmakian/feudal_networks/blob/5c988023d87206a739b44cd59f60556c7de4e289/feudal_networks/policies/feudal_policy.py#L110

Actually, it's done the way it should be; the code just differs between the two places:

https://github.com/dmakian/feudal_networks/blob/5c988023d87206a739b44cd59f60556c7de4e289/feudal_networks/policies/feudal_policy.py#L95

How can I run this project?

I have been going through the code for a long time, but I still can't run it successfully. train.py seems to have some problems.

Does code achieve benchmarks

Does this code achieve the benchmarks given in the paper? I modified this to work on my system, but it doesn't converge after running it for a few days.

more arguments ??

test_init (tests.test_policies.test_lstm_policy.TestLSTMPolicy) ... ERROR

======================================================================
ERROR: test_intrinsic_reward_and_gsum_calculation (tests.test_policies.test_feudal_batch_processor.TestFeudalBatchProcessor)

Traceback (most recent call last):
File "/Users/user/Desktop/feudal_networks/tests/test_policies/test_feudal_batch_processor.py", line 154, in test_intrinsic_reward_and_gsum_calculation
b = Batch(obs, a, returns, terminal, s, g, features)
TypeError: __new__() takes exactly 7 arguments (8 given)

Not sure why there are more arguments than it can take.
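The wording of this error is typical of a namedtuple being called with one value too many: `__new__` counts the class itself as an argument, so "takes exactly 7 arguments (8 given)" would correspond to six fields receiving seven values. Below is a minimal sketch of that situation. It assumes `Batch` is a namedtuple, and the six field names are hypothetical stand-ins, not the library's actual definition.

```python
from collections import namedtuple

# Hypothetical six-field definition; the real Batch in the repository may differ.
Batch = namedtuple("Batch", ["obs", "a", "returns", "terminal", "s", "features"])

try:
    # Seven positional values for six fields, mirroring the failing test call.
    Batch(1, 2, 3, 4, 5, 6, 7)
    error_message = None
except TypeError as exc:
    # The exact wording depends on the Python version.
    error_message = str(exc)

print(len(Batch._fields), error_message)
```

If this is what is happening, the test and the `Batch` definition are probably out of sync, for example because the test expects an extra `g` field that the installed version of the code does not define.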

The majority of the tests seem to fail.

Is this expected or are specific settings/libraries/etc required in order to run passing tests?

Could you explain how to run it?

When I run it by typing train.py, I get this:

Executing the following commands:
mkdir -p /tmp/pong
echo /usr/bin/python train.py > /tmp/pong/cmd.sh
kill $( lsof -i:12345 -t ) > /dev/null 2>&1
kill $( lsof -i:12222-12223 -t ) > /dev/null 2>&1
tmux kill-session -t a3c
tmux new-session -s a3c -n ps -d bash
tmux new-window -t a3c -n w-0 bash
tmux new-window -t a3c -n tb bash
tmux new-window -t a3c -n htop bash
sleep 1
tmux send-keys -t a3c:ps 'CUDA_VISIBLE_DEVICES= /usr/bin/python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v4 --num-workers 1 --job-name ps' Enter
tmux send-keys -t a3c:w-0 'CUDA_VISIBLE_DEVICES= /usr/bin/python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v4 --num-workers 1 --job-name worker --task 0 --remotes 1 --policy lstm' Enter
tmux send-keys -t a3c:tb 'tensorboard --logdir /tmp/pong --port 12345' Enter
tmux send-keys -t a3c:htop htop Enter

Use tmux attach -t a3c to watch process output
Use tmux kill-session -t a3c to kill the job
Point your browser to http://localhost:12345 to see Tensorboard

I don't know how tmux works, but there is no sign of an error.
What did I do wrong?

Transition Policy Gradients

From the paper:
"in fact the proper form for the transition policy gradient arrived at in eqn. 10."

From the code:
manager_loss = -tf.reduce_sum((self.r-cutoff_vf_manager)*dcos)
Why not implement eqn. 10?
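For context, here is a sketch of how the two equations relate, reconstructed from the paper's argument around eqns. 7 to 10. The notation follows the paper. Treat it as a reading aid under the paper's modelling assumption, not as a statement about what this repository intends.

```latex
% Eq. 7: the Manager update proposed in the paper
\nabla g_t = A^M_t \, \nabla_\theta \, d_{\cos}\!\left(s_{t+c} - s_t,\; g_t(\theta)\right),
\qquad A^M_t = R_t - V^M_t(x_t, \theta)

% Eq. 10: the transition policy gradient
\nabla_\theta \pi^{TP}_t
  = \mathbb{E}\!\left[\left(R_t - V(s_t)\right)
    \nabla_\theta \log p\!\left(s_{t+c} \mid s_t, \mu(s_t, \theta)\right)\right]

% Modelling assumption: the direction s_{t+c} - s_t follows a von Mises-Fisher
% distribution whose mean direction is the goal g_t
p\!\left(s_{t+c} \mid s_t, g_t\right) \propto e^{\, d_{\cos}(s_{t+c} - s_t,\; g_t)}
\;\Longrightarrow\;
\nabla_\theta \log p = \nabla_\theta \, d_{\cos}\!\left(s_{t+c} - s_t,\; g_t(\theta)\right)
```

The last step holds as long as the normalising constant does not depend on \theta. Under that assumption, eqn. 10 reduces to the form of eqn. 7, so a loss of the shape -(R - V) * dcos summed over the batch has the same gradient, provided the advantage term is treated as a constant. That appears to be what the quoted manager_loss line computes, assuming self.r - cutoff_vf_manager plays the role of the advantage.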

trouble in "--policy feudal"

Hi, I would like to use your project, but I ran into trouble when setting "--policy feudal". I can run python train.py directly and it works normally with the default "--policy lstm". But when I add this parameter, as in python train.py --policy feudal, I get the following output:

[2018-04-19 22:01:28,989] Events directory: /tmp/pong/train_0
[2018-04-19 22:01:29,342] Starting session. If this hangs, we're mostly likely waiting to connect to the parameter server. One common cause is that the parameter server DNS name isn't resolving yet, or is misspecified.
2018-04-19 22:01:29.431565: I tensorflow/core/distributed_runtime/master_session.cc:998] Start master session 0f5becf7698cbfb7 with config: intra_op_parallelism_threads: 1 device_filters: "/job:ps" device_filters: "/job:worker/task:0/cpu:0"
inter_op_parallelism_threads: 2
Traceback (most recent call last):
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
return fn(*args)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1306, in _run_fn
status, run_metadata)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/contextlib.py", line 88, in __exit__
next(self.gen)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: Key global/FeUdal/worker/rnn/basic_lstm_cell/bias/Adam_1 not found in checkpoint
[[Node: save/RestoreV2_55 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:ps/replica:0/task:0/cpu:0"](_recv_save/Const_0_S1, save/RestoreV2_55/tensor_names, save/RestoreV2_55/shape_and_slices)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "worker.py", line 174, in <module>
tf.app.run()
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "worker.py", line 166, in main
run(args, server)
File "worker.py", line 94, in run
with sv.managed_session(server.target, config=config) as sess, sess.as_default():
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/supervisor.py", line 964, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/supervisor.py", line 792, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/six
.py", line 686, in reraise
raise value
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/supervisor.py", line 953, in managed_session
start_standard_services=start_standard_services)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/supervisor.py", line 708, in prepare_or_wait_for_session
init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/session_manager.py", line 273, in prepare_session
config=config)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/session_manager.py", line 205, in _restore_checkpoint
saver.restore(sess, ckpt.model_checkpoint_path)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/saver.py", line 1560, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key global/FeUdal/worker/
rnn/basic_lstm_cell/bias/Adam_1 not found in checkpoint
[[Node: save/RestoreV2_55 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:
ps/replica:0/task:0/cpu:0"](_recv_save/Const_0_S1, save/RestoreV2_55/tensor_name
s, save/RestoreV2_55/shape_and_slices)]]
Caused by op 'save/RestoreV2_55', defined at:
File "worker.py", line 174, in <module>
tf.app.run()
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "worker.py", line 166, in main
run(args, server)
File "worker.py", line 50, in run
saver = FastSaver(variables_to_save)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1140, in __init__
self.build()
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/saver.py", line 1172, in build
filename=self._filename)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/saver.py", line 688, in build
restore_sequentially, reshape)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/saver.py", line 407, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/training/saver.py", line 247, in restore_op
[spec.tensor.dtype])[0])
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/ops/gen_io_ops.py", line 663, in restore_v2
dtypes=dtypes, name=name)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/ten
sorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
NotFoundError (see above for traceback): Key global/FeUdal/worker/rnn/basic_lstm_cell/bias/Adam_1 not found in checkpoint
[[Node: save/RestoreV2_55 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:ps/replica:0/task:0/cpu:0"](_recv_save/Const_0_S1, save/RestoreV2_55/tensor_names, save/RestoreV2_55/shape_and_slices)]]
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'report_uninitialized_variables/boolean_mask/Gather:0' shape=(?,) dty
pe=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
['File "worker.py", line 174, in <module>\n tf.app.run()', 'File "/home/xunti
an2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/plat
form/app.py", line 48, in run\n _sys.exit(main(_sys.argv[:1] + flags_passthro
ugh))', 'File "worker.py", line 166, in main\n run(args, server)', 'File "wor
ker.py", line 77, in run\n ready_op=tf.report_uninitialized_variables(variabl
es_to_save),', 'File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/sit
e-packages/tensorflow/python/util/tf_should_use.py", line 175, in wrapped\n r
eturn _add_should_use_warning(fn(*args, **kwargs))', 'File "/home/xuntian2/anaco
nda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/util/tf_shoul
d_use.py", line 144, in _add_should_use_warning\n wrapped = TFShouldUseWarnin
gWrapper(x)', 'File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site
-packages/tensorflow/python/util/tf_should_use.py", line 101, in __init__\n s
tack = [s.strip() for s in traceback.format_stack()]']
==================================
[2018-04-19 22:01:29,676] ==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'report_uninitialized_variables/boolean_mask/Gather:0' shape=(?,) dty
pe=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
['File "worker.py", line 174, in <module>\n tf.app.run()', 'File "/home/xunti
an2/anaconda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/plat
form/app.py", line 48, in run\n _sys.exit(main(_sys.argv[:1] + flags_passthro
ugh))', 'File "worker.py", line 166, in main\n run(args, server)', 'File "wor
ker.py", line 77, in run\n ready_op=tf.report_uninitialized_variables(variabl
es_to_save),', 'File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/sit
e-packages/tensorflow/python/util/tf_should_use.py", line 175, in wrapped\n r
eturn _add_should_use_warning(fn(*args, **kwargs))', 'File "/home/xuntian2/anaco
nda2/envs/fedal_tf16/lib/python3.6/site-packages/tensorflow/python/util/tf_shoul
d_use.py", line 144, in _add_should_use_warning\n wrapped = TFShouldUseWarnin
gWrapper(x)', 'File "/home/xuntian2/anaconda2/envs/fedal_tf16/lib/python3.6/site
-packages/tensorflow/python/util/tf_should_use.py", line 101, in __init__\n s
tack = [s.strip() for s in traceback.format_stack()]']

Could you please tell me what the problem is? Thanks a lot.
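One possible reading of this traceback (an assumption, not something confirmed in the thread): the log directory /tmp/pong still holds a checkpoint written by the earlier run with the default lstm policy, and the saver built for the feudal policy then looks for variables such as global/FeUdal/worker/... that were never saved there. If so, starting from a log directory without an old checkpoint should avoid the restore. The sketch below uses only the standard library. The helper names are hypothetical, and it relies only on the fact that a TensorFlow saver writes an index file named `checkpoint` next to its checkpoint files.

```python
import os
import tempfile


def find_checkpoint_dirs(log_dir):
    """Return directories under log_dir that contain a 'checkpoint' index file."""
    found = []
    for root, _dirs, files in os.walk(log_dir):
        if "checkpoint" in files:
            found.append(root)
    return sorted(found)


def archive_log_dir(log_dir, suffix=".stale"):
    """Rename log_dir out of the way so the next run starts without old checkpoints."""
    target = log_dir + suffix
    n = 0
    while os.path.exists(target):
        n += 1
        target = "{}{}.{}".format(log_dir, suffix, n)
    os.rename(log_dir, target)
    return target


# Demonstration on a throwaway directory laid out like /tmp/pong/train.
demo_root = tempfile.mkdtemp()
demo_log_dir = os.path.join(demo_root, "pong")
os.makedirs(os.path.join(demo_log_dir, "train"))
open(os.path.join(demo_log_dir, "train", "checkpoint"), "w").close()

stale = find_checkpoint_dirs(demo_log_dir)
if stale:
    archived_to = archive_log_dir(demo_log_dir)
```

Renaming rather than deleting keeps the lstm run available for later inspection. Pointing the new run at a different log directory would have the same effect.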

Feudal policy on PongDeterministic-v4

I have been trying to get the 'feudal' policy to work on the 'PongDeterministic-v4' environment, but I have had no luck. The 'lstm' policy seems to work for me, but if I change it to 'feudal' the episode rewards do not increase even after 8 hours of training with 1 worker. They stay stuck at -20, on both the 'master' branch and the 'dilated_fix' branch.

I saw the other issues mentioning that it doesn't achieve the benchmarks from the paper, but is it supposed to work on Pong at least? Or am I doing something wrong?

Shouldn't manager_vf be function of x_t?

Right after eq. (7) in the paper, the authors define V_t as a function of x_t. However, in the code it is a function of g_hat (feudal_policy.py -> _build_manager()):
self.manager_vf = self._build_value(g_hat)
Shouldn't it be a function of x_t?
