
Comments (3)

AlexKuhnle commented on July 21, 2024

First, the problem with TRPO should be fixed now. Second, you're absolutely right, this is unexpected behavior, and what you suggest seems like a good solution. I will open an issue to track the config and the min/max value problem, and will close this one.

from tensorforce.

AlexKuhnle commented on July 21, 2024

Hey,

First, thanks for reporting the issue. The first problem you encounter is most likely due to a bug in our current implementation of TRPO with multiple actions, which should hopefully be fixed in the next 1-2 days. I'll let you know.

I'm not sure about the second exception you get -- what are you redefining afterwards so that it works? In any case, the problem with min_value and max_value that you mention afterwards is currently a general one. The feature, which I think makes sense in general, is already supported by the action interface, but it is not yet supported by the action distributions etc., so it is essentially just ignored. This is because so far we only provide a Gaussian as a continuous distribution, which does not naturally define min/max values. Does it work nevertheless, if you ignore the out-of-bound values?
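To illustrate why a Gaussian does not respect bounds on its own, here is a minimal standalone sketch using only the Python standard library. It is not Tensorforce code: `sample_clipped` and the chosen bounds are hypothetical, and clipping is only one possible workaround on the caller's side, not something the library is stated to do.

```python
import random


def sample_clipped(mean, stddev, min_value, max_value, rng):
    """Draw one Gaussian sample and return (raw, clipped) values.

    Hypothetical helper for illustration; not part of Tensorforce.
    """
    raw = rng.gauss(mean, stddev)
    clipped = min(max(raw, min_value), max_value)
    return raw, clipped


rng = random.Random(0)
samples = [sample_clipped(0.0, 2.0, -1.0, 1.0, rng) for _ in range(1000)]

# Count raw samples that a Gaussian produced outside [-1, 1].
out_of_bounds = sum(1 for raw, _ in samples if raw < -1.0 or raw > 1.0)
print("raw samples outside the bounds:", out_of_bounds)
```

Note that clipping changes the effective action distribution, so it is a pragmatic fix for an environment that rejects out-of-bound actions rather than a replacement for a properly bounded distribution.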


JannesKlaas commented on July 21, 2024

Regarding the second issue, I looked at it further: Configuration objects are changed when they are used to create an agent. This makes them unusable when creating the next agent.
Here is how the issue shows itself:

import numpy as np

# Define config (Configuration, TRPOAgent and layered_network_builder come from tensorforce)
config = Configuration(
    batch_size=100,
    states=dict(shape=(10,), type='float'),
    actions=dict(continuous=False, num_actions=2),
    network=layered_network_builder([dict(type='dense', size=50), dict(type='dense', size=50)])
)

# Define first agent (works)
agent = TRPOAgent(config=config)

# Define second agent from the same config (also works)
agent1 = TRPOAgent(config=config)

# Define state
state = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# First agent acts (works)
agent.act(state)

# Second agent acts (crashes)
agent1.act(state)

I looked into the agent code and I think I found the issue. The code creating the agent modifies the configuration passed along. Before declaring the agent, print(config) prints:
{actions={continuous=False, num_actions=2}, states={type=float, shape=(10,)}, batch_size=100, network=<function layered_network_builder.<locals>.network_builder at 0x111b8d598>}

after

agent = TRPOAgent(config=config)

print(config) outputs:
{device=None, cg_iterations=20, optimizer=None, cg_damping=0.001, log_level=info, network=<function layered_network_builder.<locals>.network_builder at 0x111b8d598>, global_model=False, exploration=None, normalize_advantage=False, max_kl_divergence=0.001, preprocessing=None, discount=0.97, states={state={type=float, shape=(10,)}}, session=None, distributed=False, line_search_steps=20, batch_size=100, actions={action={continuous=False, num_actions=2}}, tf_summary=None, learning_rate=0.0001, generalized_advantage_estimation=False, tf_saver=False, baseline=None, gae_lambda=0.97, override_line_search=False}

At several points, the agent class directly modifies the config object that was passed in. This causes problems when the same config is reused later, for example to create a second agent. A better approach would probably be to create a copy of the config before modifying it.
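The copy-before-modify idea can be sketched as follows. This is a standalone illustration, not the Tensorforce implementation: `SketchAgent`, its `DEFAULTS`, and the use of a plain dict in place of `Configuration` are all assumptions made for the example. The wrapping of `states` and `actions` mimics what the printed config above shows after agent creation.

```python
import copy


class SketchAgent:
    """Hypothetical agent that fills in defaults without touching the caller's config."""

    # Illustrative defaults only; values taken from the printed config above.
    DEFAULTS = {"discount": 0.97, "learning_rate": 0.0001}

    def __init__(self, config):
        # Deep copy so nested dicts (states, actions) are not shared with the caller.
        self.config = copy.deepcopy(config)
        for key, value in self.DEFAULTS.items():
            self.config.setdefault(key, value)
        # Wrap the single state/action specs under a name, on the copy only.
        self.config["states"] = {"state": self.config["states"]}
        self.config["actions"] = {"action": self.config["actions"]}


config = {
    "batch_size": 100,
    "states": {"shape": (10,), "type": "float"},
    "actions": {"continuous": False, "num_actions": 2},
}
original = copy.deepcopy(config)

agent = SketchAgent(config)
agent1 = SketchAgent(config)

# The caller's config is unchanged, so it can be reused for further agents.
print(config == original)
```

Because each agent works on its own copy, the second agent sees the same unwrapped `states` and `actions` entries as the first one did.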

