self-imitation-learning's Introduction

This repository is a TensorFlow implementation of Self-Imitation Learning (ICML 2018).

@inproceedings{Oh2018SIL,
  title={Self-Imitation Learning},
  author={Junhyuk Oh and Yijie Guo and Satinder Singh and Honglak Lee},
  booktitle={ICML},
  year={2018}
}

Our code is based on OpenAI Baselines.

Training

The following command runs A2C+SIL on Atari games:

python baselines/a2c/run_atari_sil.py --env FreewayNoFrameskip-v4

The following command runs PPO+SIL on MuJoCo tasks:

python baselines/ppo2/run_mujoco_sil.py --env Ant-v2 --num-timesteps 10000000 --lr 5e-05

self-imitation-learning's People

Contributors

20chase, andrewliao11, christopherhesse, cxxgtxy, definitelyuncertain, guoyijie, jonasschneider, joschu, junhyukoh, linzichuan, louiehelm, matthiasplappert, mirceamironenco, mkarutz, ngc92, olegklimov, omoindrot, ppwwyyxx, pzhokhov, quanvuong, ryanjulian, shakenes, siemanko, stevenschmatz, tiagosgc, uidilr, unixpickle, welinder, yenchenlin, zach-nervana

self-imitation-learning's Issues

SIL Value update

In the paper, the SIL value loss is defined as 0.5 * max(0, (R-V))^2. However, in the code, the value loss is defined as follows:
self.vf_loss = tf.reduce_sum(self.W * v_estimate * tf.stop_gradient(delta)) / self.num_samples
which means that the value loss is 0.5 * V * clip((V-R), -5, 0).
What is the advantage of this implementation? Thanks.
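
One way to read the code's formulation, offered as a sketch rather than the authors' confirmed rationale: because delta sits inside tf.stop_gradient, the derivative of V * stop_gradient(delta) with respect to V is delta itself, which equals the derivative of 0.5 * max(0, R-V)^2 whenever the clip is inactive. The helper names below are hypothetical, and the clip bound of 5 is taken from the expression quoted in the question.

```python
def paper_loss(v, r):
    """SIL value loss as written in the paper: 0.5 * max(0, R - V)^2."""
    return 0.5 * max(0.0, r - v) ** 2


def paper_grad(v, r, h=1e-6):
    """Central finite-difference estimate of d(paper_loss)/dV."""
    return (paper_loss(v + h, r) - paper_loss(v - h, r)) / (2 * h)


def surrogate_grad(v, r, clip=5.0):
    """d/dV of V * stop_gradient(delta), with delta = clip(V - R, -clip, 0).

    delta is held constant by the stop-gradient, so the derivative is delta.
    """
    return min(0.0, max(-clip, v - r))


if __name__ == "__main__":
    # (V, R) pairs: clip inactive, return below value, clip active.
    for v, r in [(1.0, 3.0), (3.0, 1.0), (1.0, 10.0)]:
        print(v, r, round(paper_grad(v, r), 4), surrogate_grad(v, r))
```

The two gradients agree while R - V stays within the clip bound. Once R - V exceeds the bound, the surrogate gradient saturates at the bound while the gradient of the paper's loss keeps growing, so under this reading the clip caps the size of each value update.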

Policy 'lstm' doesn't work

Hello.

I first changed the default policy in run_atari_sil.py to 'lstm':

parser.add_argument('--policy', help='Policy architecture', choices=['cnn', 'lstm', 'lnlstm'], default='lstm')

Then I ran A2C+SIL on an Atari game:

python baselines/a2c/run_atari_sil.py --env BreakoutNoFrameskip-v4

I got the following error:

Logging to /tmp/a2c
2018-12-25 14:46:34.107377: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
WARNING:tensorflow:From e:\output\python_output\hardrlwithyoutube\self-imitation-learning-master\baselines\common\distributions.py:148: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

Traceback (most recent call last):
  File "E:\Output\Python_output\HardRLWithYoutube\venv_self-imitation-learning-master\lib\site-packages\tensorflow\python\framework\ops.py", line 1628, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension size must be evenly divisible by 15 but is 8192 for 'model_2/Reshape_1' (op: 'Reshape') with input shapes: [16,512], [3] and with input tensors computed as partial shapes: input[1] = [3,5,?].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "baselines/a2c/run_atari_sil.py", line 38, in <module>
    main()
  File "baselines/a2c/run_atari_sil.py", line 35, in main
    num_env=16)
  File "baselines/a2c/run_atari_sil.py", line 20, in train
    sil_update=sil_update, sil_beta=sil_beta)
  File "e:\output\python_output\hardrlwithyoutube\self-imitation-learning-master\baselines\a2c\a2c_sil.py", line 161, in learn
    max_grad_norm=max_grad_norm, lr=lr, alpha=alpha, epsilon=epsilon, total_timesteps=total_timesteps, lrschedule=lrschedule, sil_update=sil_update, sil_beta=sil_beta)
  File "e:\output\python_output\hardrlwithyoutube\self-imitation-learning-master\baselines\a2c\a2c_sil.py", line 35, in __init__
    sil_model = policy(sess, ob_space, ac_space, nenvs, nsteps, reuse=True)
  File "e:\output\python_output\hardrlwithyoutube\self-imitation-learning-master\baselines\a2c\policies.py", line 66, in __init__
    xs = batch_to_seq(h, nenv, nsteps)
  File "e:\output\python_output\hardrlwithyoutube\self-imitation-learning-master\baselines\a2c\utils.py", line 74, in batch_to_seq
    h = tf.reshape(h, [nbatch, nsteps, -1])
  File "E:\Output\Python_output\HardRLWithYoutube\venv_self-imitation-learning-master\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 7759, in reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
  File "E:\Output\Python_output\HardRLWithYoutube\venv_self-imitation-learning-master\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "E:\Output\Python_output\HardRLWithYoutube\venv_self-imitation-learning-master\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "E:\Output\Python_output\HardRLWithYoutube\venv_self-imitation-learning-master\lib\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
    op_def=op_def)
  File "E:\Output\Python_output\HardRLWithYoutube\venv_self-imitation-learning-master\lib\site-packages\tensorflow\python\framework\ops.py", line 1792, in __init__
    control_input_ops)
  File "E:\Output\Python_output\HardRLWithYoutube\venv_self-imitation-learning-master\lib\site-packages\tensorflow\python\framework\ops.py", line 1631, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimension size must be evenly divisible by 15 but is 8192 for 'model_2/Reshape_1' (op: 'Reshape') with input shapes: [16,512], [3] and with input tensors computed as partial shapes: input[1] = [3,5,?].

What can I do to fix this? Thank you very much!
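
The arithmetic behind the error message can be checked directly. The sketch below uses only numbers from the log: the tensor has shape [16, 512] (8192 elements) and the requested shape is [3, 5, ?]. The helper name is hypothetical. One possible reading, not confirmed in this thread, is that the 3 comes from an integer division of 16 by nsteps = 5 inside the LSTM policy, which would mean the policy expects its batch argument to be nenvs * nsteps rather than nenvs.

```python
from math import prod


def can_reshape(input_shape, leading_dims):
    """Return True if a tensor of input_shape fits the shape [*leading_dims, -1]."""
    return prod(input_shape) % prod(leading_dims) == 0


if __name__ == "__main__":
    # Values from the traceback: 16 * 512 = 8192 elements, target [3, 5, -1].
    print(can_reshape((16, 512), (3, 5)))
    # Hypothetical alternative: a batch of nenvs * nsteps = 80 rows, target [16, 5, -1].
    print(can_reshape((80, 512), (16, 5)))
```

The first check fails because 8192 is not a multiple of 15, which matches the reported ValueError. The second shape is only an illustration of a consistent combination, not a tested fix for the repository.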

np.sign(rewards)

Is there a reason that SIL trains on np.sign(reward) rather than on the raw rewards?
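
For reference, the sketch below shows what sign-based reward clipping does to a list of raw rewards. It is written in plain Python with hypothetical helper names; np.sign applies the same mapping elementwise. The reward values are made up for illustration.

```python
def sign(x):
    """Return -1.0, 0.0, or 1.0 according to the sign of x (same mapping as np.sign)."""
    return float((x > 0) - (x < 0))


def clip_rewards(rewards):
    """Replace each raw reward by its sign, discarding its magnitude."""
    return [sign(r) for r in rewards]


if __name__ == "__main__":
    raw = [0.0, 1.0, 10.0, 400.0, -5.0]  # made-up raw rewards
    print(clip_rewards(raw))
```

After clipping, a reward of 1 and a reward of 400 are indistinguishable, so returns computed from clipped rewards count rewarding events rather than measure their size.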

How do the policy and the value function share the same parameters $\theta$?

Thanks for this paper.

In the third section (the last line of the right-hand column on the second page), you state that

$\pi_{\theta}, V_{\theta}(s)$ are the policy (i.e. actor) and the value function parameterized by $\theta$ .

I want to know how the policy and the value function use the same parameters $\theta$.

Looking forward to your answers. Thanks in advance.
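
A common way to realize a shared $\theta$ in actor-critic code is a network with a shared trunk and two output heads, so that $\theta$ denotes the union of all weights. The sketch below is illustrative only: the class name, layer sizes, and initialization are assumptions and are not taken from this repository or the paper.

```python
import math
import random


class SharedActorCritic:
    """Toy actor-critic: one shared hidden layer feeding a policy head and a value head."""

    def __init__(self, n_obs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # theta is the full parameter set; both outputs depend on "shared".
        self.theta = {
            "shared": init(n_hidden, n_obs),
            "pi_head": init(n_actions, n_hidden),
            "v_head": init(1, n_hidden),
        }

    @staticmethod
    def _matvec(w, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    def forward(self, obs):
        """Return (action probabilities, state value) for one observation."""
        h = [math.tanh(z) for z in self._matvec(self.theta["shared"], obs)]
        logits = self._matvec(self.theta["pi_head"], h)
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        value = self._matvec(self.theta["v_head"], h)[0]
        return probs, value


if __name__ == "__main__":
    net = SharedActorCritic(n_obs=4, n_hidden=8, n_actions=3)
    print(net.forward([0.1, -0.2, 0.3, 0.0]))
```

In this arrangement a gradient step on either the policy loss or the value loss updates the shared layer, which is one sense in which both functions are "parameterized by $\theta$".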

entropy in SIL policy loss

The SIL policy loss in the paper has no entropy term. Why does the code include one?

self.loss = self.pg_loss - entropy * self.w_entropy
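
The line above can be read as a policy-gradient term plus an entropy bonus scaled by w_entropy. The sketch below is a plain-Python illustration of that composition; the function names and numeric values are hypothetical, and it makes no claim about the default weight used in the repository.

```python
import math


def categorical_entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sil_policy_loss(pg_loss, entropy, w_entropy):
    """Mirror of `self.loss = self.pg_loss - entropy * self.w_entropy`."""
    return pg_loss - entropy * w_entropy


if __name__ == "__main__":
    ent = categorical_entropy([0.5, 0.5])
    print(sil_policy_loss(0.8, ent, 0.0))   # weight 0: entropy has no effect
    print(sil_policy_loss(0.8, ent, 0.01))  # positive weight lowers the loss
```

With w_entropy set to 0 the expression reduces to pg_loss alone, which is the form without an entropy term that the question attributes to the paper.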

Key-Door-Treasure

I do not see a way to replicate the grid world experiment from the paper using the code available in this repository. Is there a way? If not, could you please publish the code?
