Comments (3)
First, the problem with TRPO should be fixed now. Second, you're absolutely right, this is unexpected behavior, and what you suggest seems like a good solution. I will open an issue to track the config and the min/max value problem, and will close this one.
Hey,
First, thanks for reporting the issue. The first problem you encounter is most likely due to a bug in our current implementation of TRPO with multiple actions, which should hopefully be fixed in the next 1-2 days. I'll let you know.
I'm not sure about the second exception you get -- what are you redefining afterwards such that it works? In any case, the problem with min_value and max_value that you mention afterwards is currently a general one. The feature, which I think makes sense in general, is already supported in the action interface, but it is not yet supported by the action distributions etc., so it is essentially just ignored. This is because we so far only provide a Gaussian as continuous distribution, which does not naturally define min/max values. Does it nevertheless work if you ignore the out-of-bound values?
Regarding the second issue, I looked into it further: Configuration objects are mutated when they are used to create an agent, which makes them unusable for creating a second agent.
Here is how the issue shows itself:
# Imports (assumed for the tensorforce 0.x API used here)
import numpy as np
from tensorforce import Configuration
from tensorforce.agents import TRPOAgent
from tensorforce.core.networks import layered_network_builder

# Define config
config = Configuration(
    batch_size=100,
    states=dict(shape=(10,), type='float'),
    actions=dict(continuous=False, num_actions=2),
    network=layered_network_builder([dict(type='dense', size=50), dict(type='dense', size=50)])
)
# Define first agent (works)
agent = TRPOAgent(config=config)
# Define second agent (also works)
agent1 = TRPOAgent(config=config)
# Define state
state = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# First agent acts (works)
agent.act(state)
# Second agent acts (crashes)
agent1.act(state)
I looked into the agent code and I think I found the cause: the code constructing the agent modifies the configuration passed to it. Before constructing the agent, print(config) outputs:
{actions={continuous=False, num_actions=2}, states={type=float, shape=(10,)}, batch_size=100, network=<function layered_network_builder.<locals>.network_builder at 0x111b8d598>}
After agent = TRPOAgent(config=config), print(config) outputs:
{device=None, cg_iterations=20, optimizer=None, cg_damping=0.001, log_level=info, network=<function layered_network_builder.<locals>.network_builder at 0x111b8d598>, global_model=False, exploration=None, normalize_advantage=False, max_kl_divergence=0.001, preprocessing=None, discount=0.97, states={state={type=float, shape=(10,)}}, session=None, distributed=False, line_search_steps=20, batch_size=100, actions={action={continuous=False, num_actions=2}}, tf_summary=None, learning_rate=0.0001, generalized_advantage_estimation=False, tf_saver=False, baseline=None, gae_lambda=0.97, override_line_search=False}
At several points, the agent class directly modifies the config object it was passed, which causes problems whenever that config is reused later. A better approach would probably be to copy the config before modifying it.
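The suggested fix can be sketched with a toy stand-in (the Config and MutatingAgent classes below are illustrative, not tensorforce's actual classes): each agent gets its own deep copy, so the constructor's in-place changes never leak back into the caller's object.

```python
import copy

class Config(dict):
    # Stand-in for tensorforce's Configuration; assumed dict-like for illustration.
    pass

class MutatingAgent:
    # Toy agent whose constructor mutates the config it receives,
    # mimicking the behavior described in this issue.
    def __init__(self, config):
        config['states'] = {'state': config.pop('states')}  # in-place rewrite
        self.config = config

config = Config(states={'type': 'float', 'shape': (10,)})

# Workaround: hand each agent its own deep copy of the config.
agent = MutatingAgent(copy.deepcopy(config))
agent1 = MutatingAgent(copy.deepcopy(config))  # still works

# The caller's original config is untouched.
assert config['states'] == {'type': 'float', 'shape': (10,)}
```

Doing the deepcopy inside the agent constructor itself would fix this for all callers at once.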