Comments (5)
The latest refactor removes the temperature coefficient (alpha). To adjust the temperature, you can change reward_scale instead.
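Why reward_scale can stand in for alpha: dividing the soft Bellman equation by alpha shows that Q / alpha is the soft Q-function for temperature 1 and rewards r / alpha. So reward_scale = 1 / alpha should give the same Boltzmann policy as temperature alpha. The sketch below checks this numerically with tabular soft Q-iteration on a made-up two-state MDP. The MDP, the function names, and the assumption that reward_scale simply multiplies every reward before the backup are illustrative, not taken from the softqlearning code.

```python
import math

# Hypothetical 2-state, 2-action deterministic MDP (not from softqlearning).
REWARDS = [[1.0, 0.0], [0.0, 2.0]]   # REWARDS[s][a]
NEXT_STATE = [[0, 1], [0, 1]]        # NEXT_STATE[s][a]
GAMMA = 0.9


def logsumexp(xs):
    # Shift by the max so no exponent is positive (avoids overflow).
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def soft_q_iteration(alpha, reward_scale, n_iters=500):
    """Tabular soft Q-iteration (assumed form):
    Q(s, a) = reward_scale * r(s, a) + GAMMA * V(s'),
    V(s)    = alpha * logsumexp(Q(s, .) / alpha)."""
    n_states, n_actions = len(REWARDS), len(REWARDS[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_iters):
        v = [alpha * logsumexp([x / alpha for x in q[s]]) for s in range(n_states)]
        q = [[reward_scale * REWARDS[s][a] + GAMMA * v[NEXT_STATE[s][a]]
              for a in range(n_actions)]
             for s in range(n_states)]
    return q


def boltzmann_policy(q, alpha):
    """pi(a | s) proportional to exp(Q(s, a) / alpha)."""
    policy = []
    for row in q:
        z = logsumexp([x / alpha for x in row])
        policy.append([math.exp(x / alpha - z) for x in row])
    return policy


alpha = 0.5
# Temperature alpha with unscaled rewards...
pi_temperature = boltzmann_policy(soft_q_iteration(alpha, reward_scale=1.0), alpha)
# ...versus temperature 1 with rewards scaled by 1 / alpha.
pi_reward_scale = boltzmann_policy(soft_q_iteration(1.0, reward_scale=1.0 / alpha), 1.0)
# The two policies agree up to floating-point error.
```

The design choice here is plain tabular iteration rather than the sampled, function-approximation setting of the actual library, so the equivalence can be checked exactly.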
from softqlearning.
Sorry to re-raise this issue.
I have implemented a version of soft Q-learning and find it convenient to have a separate alpha. I think alpha effectively acts like the entropy coefficient in policy gradient methods, which can be annealed without affecting the critic's value iteration.
In my experience, annealing alpha to zero was a little problematic because of how it enters the value function, V = alpha * log sum exp(Q / alpha): a naive implementation of this expression obviously fails as alpha -> 0, since exp(Q / alpha) overflows. How did you fix this? If you'd like to share your code, I'd be happy to merge your PR :).
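A minimal sketch of the failure mode and the standard log-sum-exp fix, assuming a small discrete set of Q-values held in a Python list (softqlearning itself works with sampled continuous actions, so this is only an illustration). The function names are hypothetical. Shifting by max(Q) keeps every exponent non-positive, and the stable form approaches max(Q), the hard Bellman backup, as alpha -> 0.

```python
import math


def soft_value_naive(q_values, alpha):
    # Direct translation of V = alpha * log(sum(exp(Q / alpha))).
    # math.exp raises OverflowError once Q / alpha exceeds roughly 709.
    return alpha * math.log(sum(math.exp(q / alpha) for q in q_values))


def soft_value_stable(q_values, alpha):
    # Log-sum-exp trick: V = max(Q) + alpha * log(sum(exp((Q - max(Q)) / alpha))).
    # Every exponent is <= 0, so exp() can only underflow to 0.0, never overflow.
    q_max = max(q_values)
    return q_max + alpha * math.log(sum(math.exp((q - q_max) / alpha) for q in q_values))


q_values = [1.0, 2.0, 3.0]
print(soft_value_stable(q_values, 1.0))    # ~3.408, same as the naive version
print(soft_value_stable(q_values, 1e-3))   # 3.0, i.e. max(Q)
try:
    soft_value_naive(q_values, 1e-3)       # exp(1000) already overflows
except OverflowError as err:
    print("naive version failed:", err)
```

This removes the overflow, but it does not by itself address other instabilities that may appear at small alpha during training.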
That's right. In my experiments, if alpha is annealed below a threshold (around 0.08), training becomes numerically unstable. However, alpha can take a much larger value at the beginning of training to encourage exploration.
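A hypothetical schedule matching the behaviour described above: start with a large alpha for exploration, anneal it linearly, and never go below the ~0.08 floor. The function name, the linear shape, and the defaults for alpha_start and decay_steps are made-up illustrations; only the 0.08 floor comes from the comment.

```python
def annealed_alpha(step, alpha_start=1.0, alpha_min=0.08, decay_steps=100_000):
    """Linearly anneal alpha from alpha_start to alpha_min over decay_steps,
    then hold it at alpha_min.

    alpha_min defaults to 0.08, the instability threshold reported above;
    alpha_start and decay_steps are illustrative values only.
    """
    if step >= decay_steps:
        return alpha_min
    frac = step / decay_steps
    return alpha_start + frac * (alpha_min - alpha_start)


for step in (0, 50_000, 100_000, 200_000):
    print(step, annealed_alpha(step))
```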
The code is currently in a private branch of hobotrl. I think its structure is very different from this repo, so it would be hard to merge. I created a gist with the relevant code pieces.
alpha_exploration could be an object of a subclass of Python float whose value changes each time it is evaluated as a float.
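A minimal sketch of the float-subclass idea, assuming Python 3. VariableFloat, StepCounter, and the exponential schedule are hypothetical names and choices, not the hobotrl code. One caveat: floats are immutable, so only an explicit float(x) call goes through __float__ and picks up the current value; ordinary arithmetic on the object uses the value captured at construction.

```python
class VariableFloat(float):
    """Hypothetical float subclass whose float() value is recomputed on each call.

    The value stored at construction is only a snapshot; it is what ordinary
    arithmetic (x * 2, x + 1) sees. An explicit float(x) dispatches to
    __float__ and returns the current value of value_fn().
    """

    def __new__(cls, value_fn):
        obj = super().__new__(cls, value_fn())
        obj._value_fn = value_fn
        return obj

    def __float__(self):
        return float(self._value_fn())


class StepCounter:
    """Hypothetical training-step counter shared with the schedule."""

    def __init__(self):
        self.step = 0


counter = StepCounter()
# Exponential decay from 1.0, clamped at 0.08 (decay rate chosen arbitrarily).
alpha_exploration = VariableFloat(lambda: max(0.08, 0.999 ** counter.step))

print(float(alpha_exploration))   # 1.0
counter.step = 1000
print(float(alpha_exploration))   # about 0.368
counter.step = 10_000
print(float(alpha_exploration))   # 0.08
print(alpha_exploration * 1.0)    # 1.0: arithmetic still uses the snapshot
```

Because the object is still an instance of float, it can be passed anywhere a float is expected, but code that should see the annealed value has to call float() on it explicitly.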
I've pushed to hobotrl for your reference.