Hi, first of all, thanks for the hard work that is going into this project. You ar

Issues with multiple continuous actions about tensorforce HOT 3 CLOSED

JannesKlaas commented on July 21, 2024

Issues with multiple continuous actions

from tensorforce.

Comments (3)

AlexKuhnle commented on July 21, 2024

First, the problem with TRPO should be fixed now. Second, you're absolutely right, this is unexpected behavior, and what you suggest seems like a good solution. I will open an issue to track the config and the min/max value problem, and will close this one.


AlexKuhnle commented on July 21, 2024

Hey,

First, thanks for reporting the issue. The first problem you encounter is most likely due to a bug in our current implementation of TRPO with multiple actions, which should hopefully be fixed in the next 1-2 days. I'll let you know.

I'm not sure about the second exception you get -- what are you redefining afterwards such that it works? Anyway, the problem with min_value and max_value that you mention afterwards is currently a general one. Although the feature, which I guess makes sense generally, is already supported in the action interface, it is not yet supported for action distributions etc., so it is essentially just ignored. This is because so far we only provide a Gaussian as the continuous distribution, which does not naturally define min/max values. Does it work nevertheless, ignoring the out-of-bound values?
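Until bounds are handled by the distribution itself, one user-side workaround is to clip each sampled action before it reaches the environment. The following is only a minimal sketch for a scalar action: `clip_action` and `sample_gaussian_action` are hypothetical helpers written for illustration and are not part of tensorforce.

```python
import random


def clip_action(value, min_value, max_value):
    """Clamp a sampled action into [min_value, max_value]."""
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")
    return max(min_value, min(max_value, value))


def sample_gaussian_action(mean, stddev, min_value, max_value, rng=random):
    """Draw from an unbounded Gaussian, then clip to the allowed range."""
    raw = rng.gauss(mean, stddev)
    return clip_action(raw, min_value, max_value)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    actions = [sample_gaussian_action(0.0, 2.0, -1.0, 1.0, rng) for _ in range(5)]
    print(actions)
```

Clipping changes the effective action distribution, because probability mass piles up at the bounds, so this is a stopgap rather than a replacement for a properly bounded distribution.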


JannesKlaas commented on July 21, 2024

Regarding the second issue, I looked at it further: Configuration objects are changed when they are used to create an agent. This makes them unusable when creating the next agent.
Here is how the issue shows itself:

import numpy as np

from tensorforce import Configuration
from tensorforce.agents import TRPOAgent
from tensorforce.core.networks import layered_network_builder

# Define config
config = Configuration(
    batch_size=100,
    states=dict(shape=(10,), type='float'),
    actions=dict(continuous=False, num_actions=2),
    network=layered_network_builder([dict(type='dense', size=50), dict(type='dense', size=50)])
)

# Define first agent (works)
agent = TRPOAgent(config=config)

# Define second agent from the same, now modified, config (also works)
agent1 = TRPOAgent(config=config)

# Define state
state = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# First agent acts (works)
agent.act(state)

# Second agent acts (crashes)
agent1.act(state)

I looked into the agent code and I think I found the issue. The code that creates the agent modifies the configuration passed to it. Before creating the agent, print(config) prints:
{actions={continuous=False, num_actions=2}, states={type=float, shape=(10,)}, batch_size=100, network=<function layered_network_builder.<locals>.network_builder at 0x111b8d598>}

After

agent = TRPOAgent(config=config)

print(config) outputs:
{device=None, cg_iterations=20, optimizer=None, cg_damping=0.001, log_level=info, network=<function layered_network_builder.<locals>.network_builder at 0x111b8d598>, global_model=False, exploration=None, normalize_advantage=False, max_kl_divergence=0.001, preprocessing=None, discount=0.97, states={state={type=float, shape=(10,)}}, session=None, distributed=False, line_search_steps=20, batch_size=100, actions={action={continuous=False, num_actions=2}}, tf_summary=None, learning_rate=0.0001, generalized_advantage_estimation=False, tf_saver=False, baseline=None, gae_lambda=0.97, override_line_search=False}

At several points, the agent class directly modifies the config passed to it. This leads to problems when the same config is used again later. A better way would probably be to create a copy of the config before modifying it.
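The copy-before-modify idea can be sketched as follows. This is an illustration under assumptions, not tensorforce's actual code: `ExampleAgent` is a hypothetical stand-in, a plain dict stands in for the Configuration object, and the two defaults are taken from the printed output above.

```python
import copy


class ExampleAgent:
    """Hypothetical stand-in for an agent; not tensorforce's TRPOAgent."""

    DEFAULTS = {"discount": 0.97, "learning_rate": 0.0001}

    def __init__(self, config):
        # Work on a private deep copy so the caller's object is never mutated.
        self.config = copy.deepcopy(config)
        # Fill in defaults only on the private copy.
        for key, value in self.DEFAULTS.items():
            self.config.setdefault(key, value)
        # Wrap the single action spec, mirroring the restructuring seen
        # in the printed config after agent creation.
        self.config["actions"] = {"action": self.config["actions"]}


config = {"batch_size": 100, "actions": {"continuous": False, "num_actions": 2}}
snapshot = copy.deepcopy(config)

first = ExampleAgent(config)
second = ExampleAgent(config)

# The caller's config is unchanged, so it can be reused for the second agent.
print(config == snapshot)
```

A deep copy is used rather than a shallow one because the nested states and actions dicts are the parts that get rewritten. Whether every object stored in a real config can be deep-copied (for example a session handle) would need to be checked.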
