haarnoja / softqlearning
Reinforcement Learning with Deep Energy-Based Policies
Home Page: https://arxiv.org/abs/1702.08165
Hi!
Thanks a lot for making the code available!
One question: the paper discusses composing Q-functions by averaging them w.r.t. the number of sub-tasks (equation 6). The pusher example (pusher_combine) appears to add the Q-functions without any averaging. Is this correct, or did I miss something?
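To make the question concrete, here is a minimal sketch of how summing and averaging differ for a Boltzmann policy. The Q-values and the `boltzmann` helper are hypothetical and are not taken from pusher_combine. The sketch only illustrates that the sum equals the average multiplied by the number of sub-tasks, so both rank actions identically, while the summed version behaves like the averaged one at a lower temperature.

```python
import math

def boltzmann(q_values, temperature=1.0):
    """Softmax policy with pi(a) proportional to exp(Q(a) / temperature)."""
    scaled = [q / temperature for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical Q-values of two sub-tasks over three discrete actions.
q_task_1 = [1.0, 2.0, 0.5]
q_task_2 = [0.0, 1.5, 2.5]
n_tasks = 2

q_sum = [a + b for a, b in zip(q_task_1, q_task_2)]
q_avg = [q / n_tasks for q in q_sum]

pi_sum = boltzmann(q_sum)
pi_avg = boltzmann(q_avg)
# Averaged Q-values at temperature 1 / n_tasks reproduce the summed policy.
pi_avg_sharp = boltzmann(q_avg, temperature=1.0 / n_tasks)

print("sum:      ", pi_sum)
print("average:  ", pi_avg)
print("avg, T=1/N:", pi_avg_sharp)
```

Under these assumptions the greedy action is the same in both cases; only the sharpness of the resulting stochastic policy changes.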
Error importing pywrap_tensorflow.
Steps to reproduce:
The instruction manual did not produce a clean working model for me.
The blog post has more experiments than this distribution includes (ant, compositionality, etc.):
http://bair.berkeley.edu/blog/2017/10/06/soft-q-learning/
Could we get that code as well?
Thanks for such a great paper and code!!!
Hi,
First of all, thanks for this inspiring work!
In softqlearning/softqlearning/algorithms/sql.py (line 164 in 59c0bbb), it seems to me that the action is sampled from a uniform distribution when estimating V_{soft}.
In Sec. 3.2 of your original paper, it is stated that
"For q_a we have more options. A convenient choice is a uniform distribution. However, this choice can scale poorly to high dimensions. A better choice is to use the current policy, which produces an unbiased estimate of the soft value as can be confirmed by substitution."
Have you experimented with sampling from the current policy to estimate V? Or, how well does the uniform distribution do in practice, especially in higher-dimensional cases?
Thanks,
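For readers following the question, the estimator under discussion can be sketched as an importance-sampled log-integral, V = log E_{a ~ q}[exp(Q(a)) / q(a)], here at temperature 1 and for a single fixed state. Everything in the block is an illustrative assumption: the one-dimensional action range [-1, 1], the toy `q_fn`, and the helper name `soft_value_estimate` are not taken from sql.py. The uniform proposal is compared against the closed-form value of the integral for this toy Q.

```python
import math
import random

def soft_value_estimate(q_fn, sample_action, log_prob, n_samples, rng):
    """Estimate log integral exp(Q(a)) da via importance sampling."""
    log_weights = []
    for _ in range(n_samples):
        a = sample_action(rng)
        # log of the importance weight exp(Q(a)) / q(a)
        log_weights.append(q_fn(a) - log_prob(a))
    m = max(log_weights)  # log-sum-exp trick for numerical stability
    mean_weight = sum(math.exp(w - m) for w in log_weights) / n_samples
    return m + math.log(mean_weight)

def q_fn(a):
    # Toy Q-function for one fixed state (hypothetical).
    return -a * a

rng = random.Random(0)
v_uniform = soft_value_estimate(
    q_fn,
    sample_action=lambda r: r.uniform(-1.0, 1.0),
    log_prob=lambda a: -math.log(2.0),  # uniform density on [-1, 1] is 1/2
    n_samples=20000,
    rng=rng,
)

# Closed form for this toy case: the integral of exp(-a^2) over [-1, 1]
# equals sqrt(pi) * erf(1).
v_exact = math.log(math.sqrt(math.pi) * math.erf(1.0))
print(v_uniform, v_exact)
```

Swapping in a different proposal only requires replacing `sample_action` and `log_prob` with a matching pair. In one dimension the uniform proposal works well; the concern quoted from the paper is about how the variance of the weights grows as the action dimension increases.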
Hi, how can I record videos for various Gym envs?
Thanks!
We are currently finding that MuJoCo 1.3 does not run on any newer OSX versions (see openai/mujoco-py#36).
Are you planning on supporting 1.5 anytime soon?
The temperature parameter (alpha) is missing from the TD updates. For example
As a quick fix to change the temperature, you can set scale_reward = 1 / temperature and alpha = 1, which has an equivalent effect, as discussed on page 2.
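The equivalence can be checked numerically on a toy problem. The sketch below is not the repository's code: it runs tabular soft Q-iteration on a hypothetical two-state, two-action deterministic MDP, once with temperature alpha and the original rewards, and once with rewards multiplied by 1 / alpha and alpha = 1. If the claim holds, the second Q-function equals the first divided by alpha, so both induce the same Boltzmann policy.

```python
import math

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def soft_q_iteration(rewards, next_state, alpha, gamma=0.9, n_iters=50):
    """Tabular soft Q-iteration with deterministic transitions.

    Q(s, a) = r(s, a) + gamma * V(s'),
    V(s)    = alpha * log sum_a exp(Q(s, a) / alpha).
    """
    n_states, n_actions = len(rewards), len(rewards[0])
    v = [0.0] * n_states
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_iters):
        q = [[rewards[s][a] + gamma * v[next_state[s][a]]
              for a in range(n_actions)] for s in range(n_states)]
        v = [alpha * logsumexp([q_sa / alpha for q_sa in q[s]])
             for s in range(n_states)]
    return q

# Hypothetical toy MDP: rewards[s][a] and next_state[s][a].
rewards = [[1.0, 0.0], [0.5, 2.0]]
next_state = [[0, 1], [0, 1]]
temperature = 0.25

# Variant 1: explicit temperature in the soft value.
q_temp = soft_q_iteration(rewards, next_state, alpha=temperature)

# Variant 2: the quick fix, rewards scaled by 1 / temperature and alpha = 1.
scaled = [[r / temperature for r in row] for row in rewards]
q_scaled = soft_q_iteration(scaled, next_state, alpha=1.0)

max_gap = max(abs(q_temp[s][a] / temperature - q_scaled[s][a])
              for s in range(2) for a in range(2))
print("largest difference:", max_gap)
```

The gap should be at the level of floating-point rounding, since the two recursions match step by step when started from V = 0.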
I had to do the following things to get this to work for me, and I think most people will as well. Maybe you could add these changes to the repository.
Cloned the latest rllab and copied mujoco_py to the softqlearning/rllab dir:
cp -rp rllab/rllab/mujoco_py ~/clones/softqlearning/rllab/
mkdir /home/gary/clones/softqlearning/vendor/mujoco
cp /home/gary/mjpro131/bin/libmujoco131.so /home/gary/clones/softqlearning/vendor/mujoco
cp /home/gary/mjpro131/bin/libglfw.so.3 /home/gary/clones/softqlearning/vendor/mujoco
cp /home/gary/mjpro131/bin/mjkey.txt /home/gary/clones/softqlearning/rllab/mujoco_py/../../vendor/mujoco
Added the following to rllab/config.py:
TF_GPU_ALLOW_GROWTH = True
TF_GPU_MEM_FRAC = .1
TF_LOG_DEVICE_PLACEMENT = False
TF_USE_GPU = True
I'm trying SQL on a simple manipulator reaching task. The agent quickly learns to get to the vicinity of the goal, but then the learning curve plateaus and the agent never quite gets to the goal. Some of my hyperparameters are
Is there something I can do to improve this? Thanks.
I installed rllab correctly and cloned softqlearning into a different folder. When I tried to run an example file, e.g. python softqlearning/scripts/learn_swimmer.py, I got:
File "softqlearning/scripts/learn_swimmer.py", line 10, in
from rllab.tf.envs.base import TfEnv
ImportError: No module named 'rllab.tf'
It seems that I need to add something to the path?
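One way to narrow this down, as a rough sketch, is to check which modules Python can actually resolve from the environment the script runs in. The helper name `is_importable` is hypothetical. Whether the fix is a `PYTHONPATH` change or a different rllab checkout is not established in this thread; the check only tells you whether the installed `rllab` contains a `tf` subpackage at all.

```python
import importlib.util
import sys

def is_importable(name):
    """Return True if `name` can be located on the current sys.path."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `rllab`) is itself missing.
        return False

for module in ("rllab", "rllab.tf", "rllab.tf.envs.base"):
    print(module, "->", is_importable(module))

# Directories Python searches, in order; a second rllab copy earlier
# in this list would shadow the one you installed.
print(sys.path)
```

If `rllab` resolves but `rllab.tf` does not, the copy being picked up lacks that subpackage, which points at the rllab version or location rather than at the script.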
Hi, it seems to me the policy is not using a target network as stated in the paper: https://github.com/haarnoja/softqlearning/blob/aca29d2aee66c44ee052a298f049a22fa14792a5/softqlearning/algos/softqlearning.py#L380
Am I missing something here?