
Comments (4)

AlexKuhnle commented on July 21, 2024

Yes, this is true, and I like your proposal for specifying initial parameters. Currently, however, the mean and variance are estimated by the output of a linear layer, so I don't know offhand how to integrate the start value in a sensible way (unless the entire process is rearranged). Let me know if you have suggestions; I will definitely think about it.

Regarding your comment about rescaling the policy: I fully agree that it's not a particularly good solution, more of a first attempt. A better solution is probably to provide another distribution type for the case of bounded action values. Moreover, one should be able to provide custom implementations of distributions. However, I don't think we will integrate this into the network part, since this would mean that everyone has to take care of it when defining a network. Ideally, we want to hide all this from users who don't want to get into it, while still giving others the possibility where we can.
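
One possible shape for such a bounded distribution type, sketched here only as an illustration: a Beta distribution on [0, 1] mapped affinely onto the action range. The choice of Beta, the function name `sample_bounded_action` and all parameter values are assumptions made for this sketch, not something decided in this thread or taken from Tensorforce.

```python
import numpy as np


def sample_bounded_action(alpha, beta, min_value, max_value, rng):
    """Draw from Beta(alpha, beta) on [0, 1] and map it affinely to [min_value, max_value]."""
    unit_sample = rng.beta(alpha, beta)
    return min_value + (max_value - min_value) * unit_sample


# Hypothetical shape parameters and bounds, chosen only for illustration.
rng = np.random.default_rng(0)
actions = np.array(
    [sample_bounded_action(2.0, 5.0, -1.0, 1.0, rng) for _ in range(1000)]
)
```

Because the samples never leave the interval, no clipping or rescaling of the policy output is needed afterwards.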

from tensorforce.

befelix commented on July 21, 2024

If you want to use the distribution parameters as initial guesses for the parameterization, this would mean initializing the weights of the linear unit to zero and the bias to the desired standard deviation.
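
A minimal sketch of that initialization in plain NumPy (not Tensorforce code). The layer sizes, the helper `linear` and the starting value are hypothetical, and the layer is assumed here to output the log of the standard deviation:

```python
import numpy as np


def linear(x, weights, bias):
    """Plain linear layer: x @ weights + bias."""
    return x @ weights + bias


# Hypothetical sizes and starting value, chosen only for illustration.
num_inputs, num_actions = 4, 2
initial_log_stddev = 0.1

# Zero weights remove any dependence on the input at initialization;
# the bias then carries the desired starting value.
weights = np.zeros((num_inputs, num_actions))
bias = np.full(num_actions, initial_log_stddev)

# Three arbitrary input states.
states = np.random.default_rng(0).normal(size=(3, num_inputs))
log_stddev = linear(states, weights, bias)
```

Every row of `log_stddev` equals the bias regardless of the state, so the policy starts at the requested value and only becomes state-dependent once training moves the weights away from zero.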

Mhm, okay, I see the point of keeping it easy to use without having to deal with this. But there should be a convenient way for a user to define the entire policy (including the parameterization of the standard deviation) without having to worry about implementing the KL-divergence stuff. I guess right now the way to do so is to inherit from the Gaussian class and override create_tf_operations. Not sure if that's the best solution.


AlexKuhnle commented on July 21, 2024

Oh, my bad, yes, that's rather straightforward. :-)

It is a good solution, but maybe one that can be improved on by, for instance, providing functions estimate_mean() and estimate_stddev(). Custom implementations are actually something where user-friendliness can be improved in general, for instance, by providing a good inheritance interface. Thanks for pointing this out.
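
A rough sketch of what such an inheritance interface could look like, in plain Python with scalar states. The class layout, the `parameters` and `kl_divergence` helpers and the subclass `StateScaledGaussian` are hypothetical and do not reflect Tensorforce's actual code; only the names estimate_mean() and estimate_stddev() come from the suggestion above.

```python
import math


class Gaussian:
    """Hypothetical base class; not Tensorforce's actual interface."""

    def estimate_mean(self, state):
        return 0.0

    def estimate_stddev(self, state):
        return 1.0

    def parameters(self, state):
        return self.estimate_mean(state), self.estimate_stddev(state)

    def kl_divergence(self, state, other):
        """KL(self || other) for univariate Gaussians, in closed form."""
        mean1, stddev1 = self.parameters(state)
        mean2, stddev2 = other.parameters(state)
        return (
            math.log(stddev2 / stddev1)
            + (stddev1 ** 2 + (mean1 - mean2) ** 2) / (2.0 * stddev2 ** 2)
            - 0.5
        )


class StateScaledGaussian(Gaussian):
    """Custom policy: only the parameterization is overridden."""

    def estimate_mean(self, state):
        return 0.5 * state

    def estimate_stddev(self, state):
        # Softplus keeps the standard deviation strictly positive.
        return math.log1p(math.exp(state))


base = Gaussian()
custom = StateScaledGaussian()
kl_same = custom.kl_divergence(2.0, custom)
kl_diff = custom.kl_divergence(2.0, base)
```

The subclass only describes how the mean and standard deviation are computed; the KL divergence is inherited unchanged from the base class.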


AlexKuhnle commented on July 21, 2024

I modified the action/distribution interface, so Gaussian(mean=..., log_stddev=...) works now (by setting the weights to zero and the bias to the given value, as you suggested). Moreover, the action definition can optionally contain a distribution value (where the value for type could also be a custom class such as MODULE.distr.CustomDistribution), for instance:

dict(
    continuous=True,
    distribution=dict(
        type='gaussian',
        mean=0.5,
        log_stddev=0.1
    )
)
