
softqlearning's People

Contributors

azhou42, haarnoja, henry-zhang-bohan

softqlearning's Issues

Pusher Combine

Hi!

Thanks a lot for making the code available!

One question: the paper discusses composing Q-functions by averaging them with respect to the number of sub-tasks (equation 6), but the pusher example (pusher_combine) appears to add the Q-functions without any averaging. Is this correct, or did I miss something?
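For concreteness, a minimal sketch of the two compositions under discussion (the averaged form of equation 6 versus a plain sum), using made-up per-task Q-values:

import numpy as np

# Made-up values of the individual sub-task Q-functions at one (state, action) pair.
q_per_task = np.array([1.7, -0.3, 2.1])

q_avg = q_per_task.mean()  # composition as written in equation 6: average over sub-tasks
q_sum = q_per_task.sum()   # composition as it appears in pusher_combine: plain sum

# The two differ only by the constant factor 1 / num_tasks, which effectively
# rescales the temperature of the resulting energy-based policy exp(Q(s, a)).
print(q_avg, q_sum)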

pywrap_tensorflow error

Error importing pywrap_tensorflow.

Steps to reproduce:

  1. run a simple swimmer experiment

Following the installation instructions did not produce a clean working setup for me.

action distribution for estimating V

Hi,
First of all, thanks for this inspiring work!

In

next_value = tf.reduce_logsumexp(q_value_targets, axis=1)

it seems to me that actions are sampled from a uniform distribution when estimating V_{soft}.

In Sec. 3.2. of your original paper, it is stated that

For q_a we have more options. A convenient choice is a uniform distribution. However, this choice can scale poorly to high dimensions. A better choice is to use the current policy, which produces an unbiased estimate of the soft value, as can be confirmed by substitution.

Have you experimented with sampling from the current policy to estimate V? Or, how well does the uniform distribution do in practice, especially in higher-dimensional cases?
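For reference, a minimal sketch of the two Monte-Carlo estimators being discussed, treating V_soft(s) = log ∫ exp(Q(s, a)) da (alpha = 1); the function and argument names are placeholders, not the repository's API:

import numpy as np
from scipy.special import logsumexp

def v_soft_uniform(q_at_uniform_actions, action_dim):
    # Actions drawn uniformly from [-1, 1]^action_dim:
    # V_soft(s) ~= logsumexp(Q_i) - log(n) + action_dim * log(2).
    # The constant offsets are often dropped, as in the line of code above,
    # since they do not affect the policy or the Q-function gradients.
    n = len(q_at_uniform_actions)
    return logsumexp(q_at_uniform_actions) - np.log(n) + action_dim * np.log(2.0)

def v_soft_policy(q_at_policy_actions, log_pi):
    # Actions drawn from the current policy, with log pi(a|s) at those actions
    # supplied for the importance correction:
    # V_soft(s) ~= logsumexp(Q_i - log pi_i) - log(n).
    n = len(q_at_policy_actions)
    return logsumexp(np.asarray(q_at_policy_actions) - np.asarray(log_pi)) - np.log(n)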

thanks,

Needs some more work to distribute

I had to do the following things to get this working, and I think most people will as well. Maybe you could add these changes to the setup instructions.

Cloned the latest rllab and copied mujoco_py into the softqlearning/rllab dir:
cp -rp rllab/rllab/mujoco_py ~/clones/softqlearning/rllab/
mkdir /home/gary/clones/softqlearning/vendor/mujoco
cp /home/gary/mjpro131/bin/libmujoco131.so /home/gary/clones/softqlearning/vendor/mujoco
cp /home/gary/mjpro131/bin/libglfw.so.3 /home/gary/clones/softqlearning/vendor/mujoco
cp /home/gary/mjpro131/bin/mjkey.txt /home/gary/clones/softqlearning/rllab/mujoco_py/../../vendor/mujoco

Added the following to rllab/config.py:
TF_GPU_ALLOW_GROWTH = True
TF_GPU_MEM_FRAC = .1
TF_LOG_DEVICE_PLACEMENT = False
TF_USE_GPU = True
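Presumably these flags are read when rllab constructs its TensorFlow session; for reference, they correspond to the standard TF1 session options. The mapping below is only an illustration, not rllab's actual launcher code:

import tensorflow as tf  # TensorFlow 1.x API

config = tf.ConfigProto(log_device_placement=False)       # TF_LOG_DEVICE_PLACEMENT
config.gpu_options.allow_growth = True                     # TF_GPU_ALLOW_GROWTH
config.gpu_options.per_process_gpu_memory_fraction = 0.1   # TF_GPU_MEM_FRAC
session = tf.Session(config=config)                        # used only if TF_USE_GPU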

Suboptimal policy

I'm trying SQL on a simple manipulator reaching task. The agent quickly learns to get to the vicinity of the goal, but then the learning curve plateaus and the agent never quite gets to the goal. Some of my hyperparameters are:

  • policy learning rate 0.0005
  • Q learning rate 0.001
  • reward scale 20
  • alpha 1.0

Is there something I can do to improve this? Thanks.
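For what it's worth, in soft Q-learning the reward scale acts as an inverse temperature, so a plateau near the goal can mean the entropy term dominates the scaled reward once the agent is close. A toy sketch of how the scale sharpens or flattens the energy-based policy (made-up Q-values, illustrative only):

import numpy as np
from scipy.special import logsumexp

# Toy Q-values for a handful of candidate actions at one state.
q = np.array([1.0, 0.9, 0.2, -0.5])

def energy_based_policy(q_values, reward_scale):
    # Scaling rewards (and hence Q) by reward_scale is equivalent to lowering
    # the temperature of pi(a|s) ~ exp(Q(s, a)).
    logits = reward_scale * q_values
    return np.exp(logits - logsumexp(logits))

print(energy_based_policy(q, reward_scale=1.0))   # broad, exploratory policy
print(energy_based_policy(q, reward_scale=20.0))  # nearly deterministic around the argmax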

Should I add path in order to run example script?

I installed rllab correctly and cloned softqlearning into a different folder. When I tried to run an example file, e.g. python softqlearning/scripts/learn_swimmer.py, I got:

File "softqlearning/scripts/learn_swimmer.py", line 10, in
from rllab.tf.envs.base import TfEnv
ImportError: No module named 'rllab.tf'

It seems that I need to add something to the path?
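If this is simply a path problem, one way to check is to put the expected directories on the Python path before the import; a minimal sketch with placeholder paths (note that the rllab copy that ends up on the path must actually contain the rllab.tf subpackage):

import sys

# Placeholder paths -- point these at the actual clone locations.
sys.path.insert(0, '/path/to/softqlearning')
sys.path.insert(0, '/path/to/rllab')

# Should now resolve, provided the rllab copy on the path provides rllab.tf.
from rllab.tf.envs.base import TfEnv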

Not using target network for policy

Hi, it seems to me that the policy is not using a target network as stated in the paper: https://github.com/haarnoja/softqlearning/blob/aca29d2aee66c44ee052a298f049a22fa14792a5/softqlearning/algos/softqlearning.py#L380

Am I missing something here?
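For context, a minimal sketch of what maintaining a target network generally looks like in TF1; this is illustrative only, not the repository's implementation, and the variable names are placeholders. The question above is whether the policy's update should read from such target variables rather than the live ones.

import tensorflow as tf  # TensorFlow 1.x API

def make_target_update_op(source_vars, target_vars, tau=0.01):
    # Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target.
    # Setting tau = 1.0 and running the op every fixed number of steps
    # recovers a hard parameter copy.
    return tf.group(*[tf.assign(t, tau * s + (1.0 - tau) * t)
                      for s, t in zip(source_vars, target_vars)])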
