tianpeiyang / ptf_code
Source code for paper: Efficient deep reinforcement learning via adaptive policy transfer
Hi, I am trying to reproduce ptf-ppo on the Reacher domain. I ran the command provided in README.md:
CUDA_VISIBLE_DEVICES=4 python main.py -a ptf_ppo -c ptf_ppo_conf -g reacher -d reacher_conf -n 10000 -e 1000 -s 2 -o adam n_layer_a_1=256 n_layer_c_1=256 learning_rate_a=3e-4 learning_rate_c=3e-4 learning_rate_o=1e-3 learning_rate_t=1e-3 e_greedy=0.95 e_greedy_increment=1e-2 replace_target_iter=1000 reward_decay=0.99 option_model_path=['source_policies/reacher/t1/model','source_policies/reacher/t2/model','source_policies/reacher/t3/model','source_policies/reacher/t4/model'] learning_step=10000 save_per_episodes=1000 task=hard c1=0.001 source_policy=a3c clip_value=10 batch_size=300 option_batch_size=16 reward_normalize=True done_reward=10 option_layer_1=20
According to the results in the paper, it should eventually reach a return of about 60, but I am not getting the expected outcome. It only reaches a final return of about 15, as shown in the attached figure (the blue line was run for 10e4 episodes). I have not modified any code, and it would be great if you could look into it.
I was deeply inspired by the paper Efficient Deep Reinforcement Learning through Policy Transfer, but unfortunately there is no accompanying code. Would you be willing to share the code?
Hi, I'm a beginner in reinforcement learning.
As the picture above illustrates (Line 18), I wonder why the cross-entropy loss term is added while the gradients are being calculated. To my knowledge, a new auxiliary loss is generally added to the existing loss function, rather than added at the gradient-calculation step.
Can you give a detailed explanation?
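One possible reading, offered as a sketch and not as the authors' answer: differentiation is linear, so adding the gradient of the cross-entropy term to the gradient of the original loss gives the same update as differentiating the sum of the two losses. The toy losses, the weight c, and all function names below are hypothetical stand-ins and are not taken from the PTF code.

```python
import math


def base_loss(theta):
    # Hypothetical stand-in for the original policy loss.
    return (theta - 2.0) ** 2


def aux_loss(theta):
    # Hypothetical stand-in for a cross-entropy term: -log(sigmoid(theta)).
    return math.log(1.0 + math.exp(-theta))


def num_grad(f, theta, eps=1e-6):
    # Central finite-difference estimate of df/dtheta.
    return (f(theta + eps) - f(theta - eps)) / (2.0 * eps)


def grad_of_sum(theta, c):
    # Differentiate the combined loss: base + c * aux.
    return num_grad(lambda t: base_loss(t) + c * aux_loss(t), theta)


def sum_of_grads(theta, c):
    # Differentiate each loss separately, then add the gradients.
    return num_grad(base_loss, theta) + c * num_grad(aux_loss, theta)


if __name__ == "__main__":
    theta, c = 0.5, 0.001  # c is an arbitrary illustrative weight
    print(grad_of_sum(theta, c), sum_of_grads(theta, c))
```

Under these assumptions the two printed values agree up to finite-difference error, so writing the auxiliary term at the gradient step or at the loss step describes the same update.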
When will the code be open-sourced?
I have read your paper and noticed that the GitHub address (https://github.com/PTF-transfer/Code_PTF) is no longer available, so I searched GitHub and ended up here. However, it seems that the source code has not been released yet. May I ask whether and when you will release it?
Hi there! Thank you for releasing the code.
I am trying to run PTF on my own environments, and I am now trying to train source policies with your code. It would be a great help if you could provide the commands for training the source policies included in the codebase, so that I can refer to the hyper-parameter settings.
As my environment is built on MuJoCo and has continuous action spaces, I think the commands for training Reacher's source policies would help me the most. Thank you very much!