tianpeiyang / ptf_code


Source code for paper: Efficient deep reinforcement learning via adaptive policy transfer

Python 100.00%


ptf_code's Issues

Unable to reproduce Reacher ptf-ppo results.

Hi, I am trying to reproduce ptf-ppo on the Reacher domain. I ran the command provided in README.md:
CUDA_VISIBLE_DEVICES=4 python main.py -a ptf_ppo -c ptf_ppo_conf -g reacher -d reacher_conf -n 10000 -e 1000 -s 2 -o adam n_layer_a_1=256 n_layer_c_1=256 learning_rate_a=3e-4 learning_rate_c=3e-4 learning_rate_o=1e-3 learning_rate_t=1e-3 e_greedy=0.95 e_greedy_increment=1e-2 replace_target_iter=1000 reward_decay=0.99 option_model_path=['source_policies/reacher/t1/model','source_policies/reacher/t2/model','source_policies/reacher/t3/model','source_policies/reacher/t4/model'] learning_step=10000 save_per_episodes=1000 task=hard c1=0.001 source_policy=a3c clip_value=10 batch_size=300 option_batch_size=16 reward_normalize=True done_reward=10 option_layer_1=20

According to the results in the paper, it should eventually reach a return of about 60, but I am not getting the expected outcome. It only reaches a final return of about 15, as shown in the attached figure (the blue line was run for 10e4 episodes). I have not modified any code, and it would be great if you could look into it.

Could you share this code?

I was deeply inspired by the paper Efficient Deep Reinforcement Learning through Policy Transfer, but there is no accompanying code. Would it be possible for you to share it?

In the paper, why is the loss term added in the gradient part?

Hi, I'm a beginner in reinforcement learning.

As illustrated in the picture above (Line 18), I wonder why the cross-entropy loss term is added while calculating the gradients. To my knowledge, a new auxiliary loss is generally added to the existing loss function, rather than being added at the gradient step.

Can you give a detailed explanation?
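
A note on the distinction raised in this question: differentiation is linear, so adding the gradient of a weighted auxiliary term to the gradient of the main loss gives the same update as adding the weighted term to the loss and differentiating once. Whether that is the intended reading of Line 18 in the paper is not confirmed here. The sketch below only checks the general identity numerically, using two toy scalar losses and an arbitrary weight `C`, all of which are assumptions made for illustration and not taken from the paper or the repository.

```python
# Illustrative only: toy scalar losses, NOT the losses used in the paper.

def main_loss(theta):
    # Hypothetical stand-in for the primary RL loss.
    return (theta - 2.0) ** 2


def aux_loss(theta):
    # Hypothetical stand-in for an auxiliary (e.g. cross-entropy-style) term.
    return (theta + 1.0) ** 2


def grad(f, theta, eps=1e-6):
    # Central finite-difference estimate of df/dtheta.
    return (f(theta + eps) - f(theta - eps)) / (2.0 * eps)


C = 0.5      # arbitrary weight on the auxiliary term (assumption)
theta = 0.3  # arbitrary parameter value (assumption)

# Option A: add the auxiliary term to the loss, then differentiate once.
grad_of_sum = grad(lambda t: main_loss(t) + C * aux_loss(t), theta)

# Option B: differentiate each term separately, then add the gradients.
sum_of_grads = grad(main_loss, theta) + C * grad(aux_loss, theta)

# The two options agree up to floating-point error.
print(abs(grad_of_sum - sum_of_grads) < 1e-6)
```

Under these assumptions, writing the extra term in the gradient line or in the loss line describes the same parameter update, provided both terms are differentiated with respect to the same parameters.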

When will this be open-sourced?

When will the source code be released?

I have read your paper and noticed that the GitHub address (https://github.com/PTF-transfer/Code_PTF) is no longer available, so I searched GitHub and ended up here. However, it seems that the source code has not been released yet. May I ask if and when you will release it?

Why L

Can you release commands for training the source policies?

Hi there! Thank you for releasing the code.

I am trying to run PTF on my own environments, and I am currently trying to train source policies with your code. It would be a great help if you could provide the commands for training the source policies contained in the codebase, so that I can refer to the hyper-parameter settings.

As my environment is built upon MuJoCo and has continuous action spaces, I think the commands for training Reacher's source policies would help me the most. Thank you very much!
