Comments (4)
Yes, this is true, and I like your proposal on how to specify initial parameters. Currently, however, the mean and variance are estimated by the output of a linear layer, so I don't see offhand how to integrate the start value in a sensible way (unless the entire process is rearranged). Let me know if you have suggestions; I will definitely think about it.
Regarding your comment about rescaling the policy: I fully agree that it's not a particularly good solution, more of a first attempt. A good solution is probably to just provide another distribution type for the case of bounded action values. Moreover, one should be able to provide custom implementations of distributions. However, I don't think we will integrate this into the network part, since this would mean that everyone has to take care of it when defining a network. Ideally we would want to hide all this from users who don't want to get into it (while still making it possible for others, where feasible).
from tensorforce.
If you want to use the distribution parameters as initial guesses for the parameterization, this would mean initializing the weights of the linear unit to zero and the bias to the desired standard deviation.
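A minimal sketch of that idea, in plain Python rather than Tensorforce's actual code: with all weights at zero, a linear unit's output equals its bias for any input, so the bias acts as the initial parameter value. The function name `linear` and the numbers are illustrative assumptions.

```python
def linear(inputs, weights, bias):
    """Output of a single linear unit: w . x + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


initial_value = 0.1          # desired initial parameter value (illustrative)
weights = [0.0, 0.0, 0.0]    # zero weights: the input has no effect at first
bias = initial_value

# Two very different inputs produce the same output before any training.
outputs = [linear(x, weights, bias) for x in ([1.0, 2.0, 3.0], [-5.0, 0.5, 9.0])]
```

Once training updates the weights, the output starts to depend on the input again, so the given value only serves as a starting point.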
Mhm, okay, I see the point of making it easy to use without that. But there should be a convenient way for a user to define the entire policy (including the parameterization of the standard deviation) without having to worry about implementing the KL-divergence stuff. I guess right now the way to do so is to inherit from the `Gaussian` class and override `create_tf_operations`. Not sure if that's the best solution.
Oh, my bad, yes, that's rather straightforward. :-)
It is a good solution, but maybe one that can be improved on by, for instance, providing functions `estimate_mean()` and `estimate_stddev()`. Custom implementations are actually something where user-friendliness can be improved in general, for instance, by providing a good inheritance interface. Thanks for pointing this out.
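One way such an inheritance interface could look, as a rough sketch in plain Python and not Tensorforce's actual API: a base class implements the KL divergence once, and a subclass only supplies `estimate_mean()` and `estimate_stddev()`. The class names `GaussianBase` and `ConstantGaussian` and the `state` argument are hypothetical.

```python
import math
from abc import ABC, abstractmethod


class GaussianBase(ABC):
    """Hypothetical base class: subclasses only say how parameters are estimated."""

    @abstractmethod
    def estimate_mean(self, state):
        ...

    @abstractmethod
    def estimate_stddev(self, state):
        ...

    def kl_divergence(self, other, state):
        """KL(self || other) for univariate Gaussians at the given state."""
        m1, s1 = self.estimate_mean(state), self.estimate_stddev(state)
        m2, s2 = other.estimate_mean(state), other.estimate_stddev(state)
        return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2.0 * s2 ** 2) - 0.5


class ConstantGaussian(GaussianBase):
    """Example subclass with state-independent parameters."""

    def __init__(self, mean, stddev):
        self.mean = mean
        self.stddev = stddev

    def estimate_mean(self, state):
        return self.mean

    def estimate_stddev(self, state):
        return self.stddev


p = ConstantGaussian(mean=0.5, stddev=1.0)
q = ConstantGaussian(mean=0.0, stddev=1.0)
kl_pq = p.kl_divergence(q, state=None)
kl_pp = p.kl_divergence(p, state=None)
```

With this split, a user who wants a different parameterization of the standard deviation writes only the two estimate methods and inherits the KL computation unchanged.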
I modified the action/distribution interface, so `Gaussian(mean=..., log_stddev=...)` works now (by setting the weights to zero and the bias to the given value, as you suggested). Moreover, the action definition can optionally contain a `distribution` value (where the value for `type` could also be a custom class `MODULE.distr.CustomDistribution`), for instance:
    dict(
        continuous=True,
        distribution=dict(
            type='gaussian',
            mean=0.5,
            log_stddev=0.1
        )
    )
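How a `type` value might be resolved to a class can be sketched as follows. This is an assumption about the general pattern, not Tensorforce's implementation: the `Gaussian` stand-in, the `BUILTIN` registry, and `build_distribution` are hypothetical names, and only the standard-library `importlib` is used.

```python
import importlib


class Gaussian:
    """Stand-in for a built-in distribution (not Tensorforce's real class)."""

    def __init__(self, mean=0.0, log_stddev=0.0):
        self.mean = mean
        self.log_stddev = log_stddev


BUILTIN = {'gaussian': Gaussian}  # hypothetical registry of built-in names


def build_distribution(spec):
    """Instantiate a distribution from a dict with 'type' plus constructor kwargs."""
    kwargs = dict(spec)  # copy so the caller's spec is not modified
    type_name = kwargs.pop('type')
    if type_name in BUILTIN:
        cls = BUILTIN[type_name]
    else:
        # Treat anything else as a dotted path 'package.module.ClassName'.
        module_name, _, class_name = type_name.rpartition('.')
        cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**kwargs)


action = dict(
    continuous=True,
    distribution=dict(type='gaussian', mean=0.5, log_stddev=0.1),
)
dist = build_distribution(action['distribution'])
```

The remaining entries of the `distribution` dict are passed to the constructor as keyword arguments, so a custom class only needs to accept the parameters it is configured with.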