ptf_code's People

Contributors

tianpeiyang

ptf_code's Issues

Unable to reproduce Reacher ptf-ppo results.

Hi, I am trying to reproduce ptf-ppo on the Reacher domain. I ran the command provided in README.md:
CUDA_VISIBLE_DEVICES=4 python main.py -a ptf_ppo -c ptf_ppo_conf -g reacher -d reacher_conf -n 10000 -e 1000 -s 2 -o adam n_layer_a_1=256 n_layer_c_1=256 learning_rate_a=3e-4 learning_rate_c=3e-4 learning_rate_o=1e-3 learning_rate_t=1e-3 e_greedy=0.95 e_greedy_increment=1e-2 replace_target_iter=1000 reward_decay=0.99 option_model_path=['source_policies/reacher/t1/model','source_policies/reacher/t2/model','source_policies/reacher/t3/model','source_policies/reacher/t4/model'] learning_step=10000 save_per_episodes=1000 task=hard c1=0.001 source_policy=a3c clip_value=10 batch_size=300 option_batch_size=16 reward_normalize=True done_reward=10 option_layer_1=20

According to the results in the paper, it should eventually reach a return of about 60, but I am not getting the expected outcome. It only reaches a final return of about 15, as shown in the attached figure (the blue line was run for 10e4 episodes). I have not modified any code, and it would be great if you could look into it.

image

In the paper, why is the loss term added in the gradient part?

image

Hi, I'm a beginner in Reinforcement Learning.

As illustrated in the picture above (Line 18), I wonder why the cross-entropy loss term is added when calculating the gradients. To my knowledge, a new auxiliary loss is generally added to the existing loss function, rather than added during the gradient calculation.

Can you give a detailed explanation?
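
A general observation that may be relevant to this question, independent of the paper's specifics: differentiation is linear, so adding a weighted auxiliary term to the loss and then differentiating gives the same update as differentiating each term and summing the gradients. The minimal sketch below checks this numerically. The functions `main_loss` and `aux_loss` and the value of `weight` are hypothetical stand-ins chosen only for illustration; they are not taken from the paper or from the ptf_code repository, and whether this equivalence is what Line 18 intends is for the authors to confirm.

```python
# Hypothetical scalar losses; NOT the losses used in the PTF paper or code.
def main_loss(theta):
    return (theta - 3.0) ** 2


def aux_loss(theta):
    return theta ** 4


def numerical_grad(f, theta, eps=1e-6):
    """Central finite-difference estimate of df/dtheta."""
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)


weight = 0.001  # hypothetical weight on the auxiliary term


def total_loss(theta):
    # Option A: add the auxiliary term to the loss, then differentiate.
    return main_loss(theta) + weight * aux_loss(theta)


theta = 1.5
grad_of_sum = numerical_grad(total_loss, theta)

# Option B: differentiate each term separately, then add the gradients.
sum_of_grads = numerical_grad(main_loss, theta) + weight * numerical_grad(aux_loss, theta)

# The two options agree up to floating-point error.
print(abs(grad_of_sum - sum_of_grads) < 1e-6)
```

If the two terms share the same parameters, the two ways of writing the update describe the same step; they only differ when a term is applied to a different subset of parameters.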

When will the source code be released?

I have read your paper and noticed that the GitHub address (https://github.com/PTF-transfer/Code_PTF) is not available now, so I searched on GitHub and got here. However, it seems that the source code has not been released yet. May I ask if and when you will release it?

Can you release commands for training the source policies?

Hi there! Thank you for releasing the code.

I am trying to run PTF on my own environments, and I am now trying to train source policies with your code. It would be of great help if you could provide the commands for training the source policies contained in the codebase, so that I can refer to the hyper-parameter settings.

As my environment is built upon MuJoCo and has continuous action spaces, I think the commands for training Reacher's source policies would help me the most. Thank you very much!
