llnl / abmarl
Agent Based Modeling and Reinforcement Learning
License: Other
Every experiment must be reproducible; otherwise the environments are of little worth for research because we cannot compare runs properly. There needs to be a random number generator that gets a seed for each environment and agent so that one can run identical experiments when needed. The particle environment can be used as an example: it seeds the random number generators for the action spaces as well. This is a huge thing in RL.
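A minimal sketch of the idea, assuming gym-style spaces; `seed_everything` is a hypothetical helper, not an existing Abmarl function:

```python
import numpy as np

# Minimal sketch: derive every source of randomness from one master seed
# so that two runs with the same seed produce identical episodes.
def seed_everything(sim, seed):
    rng = np.random.default_rng(seed)
    sim.rng = rng   # environment-level randomness (resets, dynamics)
    for agent in sim.agents.values():
        # gym spaces carry their own RNGs; the particle environment
        # seeds its action spaces the same way
        agent.action_space.seed(int(rng.integers(2**31)))
        agent.observation_space.seed(int(rng.integers(2**31)))
```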
Support communication in grid-mode observations. This will likely mean that the agents need to have a full observation space, even if they are created with partial observability. Then we need to create "fog" for regions that the agent cannot see. When communication occurs, that fog is "lifted" and the agent can observe that region of space.
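A minimal numpy sketch of the fog idea; the fog value and function names are illustrative:

```python
import numpy as np

# Every agent holds a full-grid observation, fogged outside its view;
# fusing a communicating teammate's observation lifts the fog.
FOG = -1

def fogged_obs(full_grid, agent_pos, view_range):
    obs = np.full_like(full_grid, FOG)
    x, y = agent_pos
    lo_x, hi_x = max(0, x - view_range), min(full_grid.shape[0], x + view_range + 1)
    lo_y, hi_y = max(0, y - view_range), min(full_grid.shape[1], y + view_range + 1)
    obs[lo_x:hi_x, lo_y:hi_y] = full_grid[lo_x:hi_x, lo_y:hi_y]
    return obs

def lift_fog(receiver_obs, sender_obs):
    # wherever the receiver is fogged but the sender is not, lift the fog
    return np.where(receiver_obs == FOG, sender_obs, receiver_obs)
```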
Consolidation of all open efforts related to the predator prey environment.
- Grid mode communication (#6): How do the observations change in grid mode with communications?
- Distance observation mode, see edge (#8): Should the agents see the edge of the map in distance observation mode?
- Stochasticity (#51, #36): Stochasticity in the observations and actions, especially action effectiveness.
- Broadcast communication (#65): All agents within some distance can receive the message, with some randomness. In addition, all agents within some distance (which can be larger than the message distance) can see the broadcaster's location.
Highly-componentized design of environments
PredatorPreyEnv currently processes some interactions between the predators and prey. Each agent can move around on the grid and the predators can attack the prey. The prey can also harvest resources. This could be split up into 3 components: a movement handler, an agent-agent-interaction handler, and an agent-environment-interaction handler. This is a significant redesign, so it should be considered very carefully and not as part of the main development push.
Add the ability to specify agent start positions
Agents are currently just started randomly in the grid. For reproducibility, including in the test suite, we should add the ability to specify the agents' starting locations, as in the sketch below.
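A hedged sketch of what that could look like; the `initial_positions` parameter is hypothetical:

```python
import numpy as np

# Agents with a specified start get it; the rest are placed randomly as today.
def place_agents(agents, region, rng, initial_positions=None):
    initial_positions = initial_positions or {}
    for agent_id, agent in agents.items():
        if agent_id in initial_positions:
            agent.position = np.array(initial_positions[agent_id])
        else:
            agent.position = rng.integers(0, region, size=2)
```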
Agent Health
Each agent is given some health. This health degenerates slowly over time as the agent grows hungrier. The health can also decrease if the agent is attacked. If the health reaches zero, then the agent dies. Health can also increase by eating resources: for prey, that means foraging; for predators, that means eating prey.
In the future, if health increases above some maximum limit, the agent can reproduce.
Attack Strength
We already have attack range, which is the number of squares away at which an attack is effective. Now we want attack strength, which indicates how much health the attacked agent will lose when attacked. Prey also have an attack strength, in the sense of how much of the resource they deplete when they eat it.
Entropy
The amount of health the agent loses each time it takes an action.
Revival
The amount of health the agent will receive when it consumes a resource.
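A sketch tying health, entropy, revival, and attack strength together; class and attribute names are placeholders, not settled API:

```python
from dataclasses import dataclass

@dataclass
class HealthAgent:
    health: float = 1.0
    max_health: float = 1.0
    entropy: float = 0.05         # health lost each time the agent acts
    revival: float = 0.2          # health gained from consuming a resource
    attack_strength: float = 0.3  # health the attacked agent loses
    is_alive: bool = True

def step_health(agent, attacked_by=None, ate_resource=False):
    agent.health -= agent.entropy                 # hunger
    if attacked_by is not None:
        agent.health -= attacked_by.attack_strength
    if ate_resource:
        agent.health = min(agent.health + agent.revival, agent.max_health)
    if agent.health <= 0:
        agent.is_alive = False                    # the agent dies
```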
How can we add agents mid-simulation? Some important factors to think about: the "all done" condition and the way the managers loop over the agents. The components allow agents in `self.agents` that are not acting and not observing agents. However, the manager assumes that every agent in this list is an actual agent (acting and observing; trainable).
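A hedged sketch of the distinction the manager would need to make; the filter is illustrative, not the current manager code:

```python
# The manager could loop only over trainable agents (those with both an
# action space and an observation space), so state-only entries in
# self.agents don't break the cycle or the "all done" check.
def trainable_agents(sim):
    return {
        agent_id: agent for agent_id, agent in sim.agents.items()
        if getattr(agent, 'action_space', None) is not None
        and getattr(agent, 'observation_space', None) is not None
    }
```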
In distance observation mode, how will the agent know that it is near the edge of the region?
In Epic #10
Position observations produce the relative position of other agents in the grid, and they also produce the relative position of boundaries. Perhaps it would be better to break the boundaries observation out into its own channel, separate from the agents (and from the resources).
In Epic #12
Agent classes are really just dictionaries, and they're all pretty much the same. We should modify the dataclass decorator to give agents a `configured` function automatically and use this decorator everywhere we define an Agent, saving us from writing so much boilerplate code.
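A minimal sketch of the decorator idea; we assume the `configured` check simply verifies that every field has been set:

```python
from dataclasses import dataclass, fields

# Wrap dataclass so that every Agent class automatically gets a
# `configured` property (decorator and class names are illustrative).
def agent_class(cls):
    cls = dataclass(cls)
    def configured(self):
        return all(getattr(self, f.name) is not None for f in fields(self))
    cls.configured = property(configured)
    return cls

@agent_class
class MovingAgent:
    id: str = None
    move_range: int = None

assert MovingAgent(id='agent0', move_range=1).configured
assert not MovingAgent(id='agent0').configured
```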
In epic #12
Include Continuous Movement Actor Component from Particle Env.
Movement is represented as velocity in x and y. These agents have a max speed. Velocity is damped, like friction, which we can treat as an entropy on the velocity. Velocity is updated by acceleration according to `dv = a * dt`. Position is then updated by `dp = v * dt`. `dt` is a scaling on the velocity and position updates, which we can normalize away.
Actions can be discrete UDLR, which translates to x, y acceleration vectors with magnitude equal to `move_speed`. Actions can also directly be the x, y acceleration vector.
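A minimal sketch of those dynamics with `dt` normalized to 1; the damping factor and attribute names are illustrative:

```python
import numpy as np

UDLR = {
    0: np.array([0.0, 1.0]),    # up
    1: np.array([0.0, -1.0]),   # down
    2: np.array([-1.0, 0.0]),   # left
    3: np.array([1.0, 0.0]),    # right
}

def process_move(agent, action, damping=0.25, discrete=True):
    # Discrete UDLR maps to an acceleration vector with magnitude
    # move_speed; otherwise the action is the x, y acceleration directly.
    accel = UDLR[action] * agent.move_speed if discrete else np.asarray(action)
    agent.velocity = (1.0 - damping) * agent.velocity + accel   # dv = a * dt
    speed = np.linalg.norm(agent.velocity)
    if speed > agent.max_speed:                                 # cap at max speed
        agent.velocity *= agent.max_speed / speed
    agent.position = agent.position + agent.velocity            # dp = v * dt
```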
In Epic #39
Some components are almost exactly the same, with very small differences. For example, take a look at the attacking components: team-based attack is almost the same as non-team-based attack, just with one extra check on the team. Is there a way to use inheritance or some other code design to capture and reduce this duplication? A sketch follows.
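A hedged sketch of the inheritance idea, not the current component code: the base actor owns the shared logic and the team variant overrides a single validity hook.

```python
import numpy as np

class AttackActor:
    def in_range(self, attacker, target):
        # Chebyshev distance on grid positions (numpy arrays assumed)
        return np.abs(attacker.position - target.position).max() \
            <= attacker.attack_range

    def valid_target(self, attacker, target):
        return target.is_alive and self.in_range(attacker, target)

    def process_attack(self, attacker, agents):
        # return the first valid target, or None if the attack misses
        for target in agents.values():
            if target is not attacker and self.valid_target(attacker, target):
                return target
        return None

class TeamAttackActor(AttackActor):
    def valid_target(self, attacker, target):
        # the only difference: one extra check on the team
        return target.team != attacker.team \
            and super().valid_target(attacker, target)
```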
The life and death component was prototyped in the hackathon. Get that feature into its own component and resolve the bug where all the agents take the same actions. Create an example demonstrating reproduction with this feature.
In epic #12
PredatorPreyEnv currently processes some interactions between the predators and prey. Each agent can move around on the grid and the predators can attack the prey. With #53, the prey can also harvest resources. This could be split up into 3 components: a movement handler, an agent-agent-interaction handler, and an agent-environment-interaction handler. This is a significant redesign, so it should be considered very carefully and not as part of the main development push.
The core of the environment is a grid with agents in cells. So each agent has a position.
Features are then layered onto this grid in the form of wrappers:
Two important questions:
In Epic #10
Using wrappers for each feature is poor design. We are using component-based design instead.
Add support for movement and positions in 3 dimensions.
We should convert the Corridor and MultiCorridor example environments to use components.
Agents can collide with each other and with landmarks.
Sometimes, that collision results in "bouncing" the "entities" away from each other. Collisions occur when agents are within some distance of each other. Keep in mind that agents have sizes that affect the collision calculation.
In Epic #39
Following the initial efforts in #12, we can try to improve the design of the components by framing them in terms of the part of the state that they control. For example, suppose an environment's state can be broken down into the following:
Consider that there are "sub-state handlers" that are made up of certain components. For example, we could have:
Then, we can consider that an AgentBasedSimulation is composed of the "sub-state handlers".
In Epic #12
We need to explicitly indicate when agents should be acting agents, observing agents, or just state agents. For some agent classes, we've combined the actions and observations together. For example, a `SpeedAngle` agent is given a speed and angle action space by the actor and an observation space by the observer. A `LifeAgent` is given an observation space by the observer. However, we sometimes don't want agents to have certain actions and/or observations while still having the state. For example, `VelocityAgent` should split out the acceleration capabilities so that we can take advantage of the velocity state without giving an action space (such as for landmarks/dumb agents). We have a good example of this with resources, where we have separated `HarvestAgent` and `ResourceObservingAgent` because we don't want Predators to harvest but we do want them to observe resources. We should do this for all agents.
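A minimal sketch of the split: state-only base classes with acting and observing capabilities mixed in separately. `Mover` and `Landmark` are illustrative names; `VelocityAgent` and `AcceleratingAgent` come from the discussion above.

```python
class VelocityAgent:                 # state only: fine for landmarks/dumb agents
    def __init__(self, max_speed=1.0, **kwargs):
        super().__init__(**kwargs)
        self.max_speed = max_speed

class AcceleratingAgent(VelocityAgent):   # adds the acting capability
    def __init__(self, max_acceleration=0.25, **kwargs):
        super().__init__(**kwargs)
        self.max_acceleration = max_acceleration

class VelocityObservingAgent:        # adds the observing capability
    pass

class Mover(AcceleratingAgent, VelocityObservingAgent):
    """Trainable agent: acts and observes."""

class Landmark(VelocityAgent):
    """Keeps the velocity state without any action space."""
```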
It would be nice to have wrappers that can convert gym and MultiAgent environments to our AgentEnvironment interface so that all the wrappers can be used. We should also have wrappers to go back out to those types.
An important question to answer: an Abmarl simulation is composed of Agents in an Agent Based Simulation managed by a Simulation Manager. In a gym environment, the conversion to an ABS is simple because there is only a single agent; however, the other environment types allow multiple agents, so the mapping is not as clear.
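For the single-agent direction, a minimal sketch might look like the following, assuming the classic gym `reset`/`step` API and Abmarl's getter-style simulation interface; the wrapper and its names are illustrative, not an existing Abmarl class:

```python
class GymToABS:
    def __init__(self, gym_env, agent_id='agent'):
        self.env = gym_env
        self.agent_id = agent_id   # the single gym agent maps to one ABS agent

    def reset(self):
        self._obs = self.env.reset()

    def step(self, action_dict):
        self._obs, self._reward, self._done, self._info = \
            self.env.step(action_dict[self.agent_id])

    def get_obs(self, agent_id, **kwargs):
        return self._obs

    def get_reward(self, agent_id, **kwargs):
        return self._reward

    def get_done(self, agent_id, **kwargs):
        return self._done

    def get_all_done(self, **kwargs):
        return self._done   # single agent: its done is the episode's done

    def get_info(self, agent_id, **kwargs):
        return self._info
```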
TensorFlow cannot work with integer Boxes because TensorFlow's random uniform op cannot do the necessary broadcast with integers.
The current workaround is to use `np.float` offset by 0.5, have the component floor the input/output, and convert the array to int with `ndarray.astype(int)`. A potential fix could be a component wrapper.
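One reading of that workaround as code, a sketch rather than the actual component; we assume the 0.5 offset shifts the integer bounds up so that flooring recovers the original range:

```python
import numpy as np
from gym.spaces import Box

def int_box_as_float(low, high, shape):
    # float Box covering [low + 0.5, high + 0.5]
    return Box(low + 0.5, high + 0.5, shape, np.float32)

def floor_to_int(sample):
    # floor the input/output and convert with ndarray.astype(int)
    return np.floor(sample).astype(int)
```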
The Monte Carlo algorithms only work with the gym environment interface, which is made explicit in #62. We should enable multi-agent Monte Carlo learning. We can approach it in the following way:
This can be broken into Speed and Angle agents. Then we can merge the accelerating part with AcceleratingAgent below and the actors and states can determine how to use that information (velocity updater vs speed-angle updater).
Originally posted by @rusu24edward in #50 (comment)
Build components for continuous movement and build an example environment.
Add the communication component.
In Epic #12
The whole thing is more complex than how I handled it so far... if we do it properly we would have to do all these things in this picture... so 3 steps: get position without collision, rollback to point where the circles collide, and move the rest of the way in the right direction... if we don't do the last step, the traveled distance between two timesteps would be shortened...
There might also be a case where we don't even detect a collision because maybe they move perpendicular and after the timestep it looks as though they just passed through each other... so if we want to make it super correct we would have to calculate rays or something... very annoying... on the other hand we could just ignore the cases where objects pass through each other and stop after undoing the overlap as we are doing so far...
we can also just ignore it and make smaller steps so the overlaps are much smaller. I think this is probably the best solution, either smaller time intervals or lower velocities. I would do smaller time intervals for smoother movement but make the agent decision only after every couple of frames (this is also done in Atari games for example where a decision is only made every 4th frame).
This looks a lot like what we do. They completely ignore overlap and it looks nice in the gif
https://github.com/xnx/collision
This guy describes the continuous collision detection and finding the point where two objects collide.
https://www.toptal.com/game/video-game-physics-part-ii-collision-detection-for-solid-objects
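A minimal sketch of the overlap-undo approach discussed above: move first, then push overlapping circles apart along the line of centers. No rollback to the exact contact time and no ray casting for tunneling; smaller time intervals keep the overlaps small. Attribute names are illustrative.

```python
import numpy as np

def resolve_overlap(a, b):
    delta = b.position - a.position
    dist = np.linalg.norm(delta)
    overlap = a.size + b.size - dist
    if overlap > 0 and dist > 0:
        push = (delta / dist) * (overlap / 2)
        a.position = a.position - push   # push the entities apart symmetrically
        b.position = b.position + push
```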
Add communication channels to the environment. Things to consider sharing:
It currently just prints out a failure:

```
>>> abmarl
Traceback (most recent call last):
  File "/Users/rusu1/.virtual_envs/abmarl/bin/abmarl", line 11, in <module>
    load_entry_point('abmarl', 'console_scripts', 'abmarl')()
  File "/Users/rusu1/abmarl/abmarl/scripts/scripts.py", line 33, in cli
    path_config = os.path.join(os.getcwd(), parameters.configuration)
AttributeError: 'Namespace' object has no attribute 'configuration'

>>> abmarl --help
usage: abmarl [-h] {train,analyze,play} ...

Train, analyze, and play MARL policies.

positional arguments:
  {train,analyze,play}
    train     Train MARL policies
    analyze   Analyze MARL policies
    play      Play MARL policies

optional arguments:
  -h, --help  show this help message and exit

Example usage for training:
    abmarl train my_experiment.py

Example usage for analysis:
    abmarl analyze my_experiment_directory/ my_analysis_script.py

Example usage for playing:
    abmarl play my_experiment_directory/ --some-args
```
Some of the components' instance checks can be loosened a bit, and we should explore the best way to do this.
Box observations must be np.array to work with RLlib. This is pretty straightforward when the array has multiple elements, but it is not intuitive when there is only a single output. Rather than changing all scalar values to 1-element arrays, we can just create a wrapper that does the conversion for us.
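A hedged sketch of such a wrapper, assuming dict observations; this is illustrative, not an existing Abmarl wrapper:

```python
import numpy as np

class ScalarToArrayWrapper:
    def __init__(self, observer):
        self.observer = observer

    def get_obs(self, agent, **kwargs):
        obs = self.observer.get_obs(agent, **kwargs)
        # pass everything through unchanged except scalars, which become
        # 1-element arrays so RLlib's Box handling works
        return {
            key: np.asarray([value]) if np.isscalar(value) else value
            for key, value in obs.items()
        }
```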
Add stochasticity to the observations and actions. Observations can be filtered through Bernoulli distributions whose success probabilities are correlated with distance. For attack actions, we want to correlate effectiveness with distance and with the number of observable prey.
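A minimal sketch of what that filtering could look like; the linear decay is an assumption, since the issue only asks that the probabilities correlate with distance and prey count:

```python
def is_observed(rng, distance, view_range):
    p = max(0.0, 1.0 - distance / (view_range + 1))
    return rng.random() < p          # Bernoulli draw: farther means less likely

def attack_succeeds(rng, distance, attack_range, num_observable_prey):
    p = max(0.0, 1.0 - distance / (attack_range + 1))
    return rng.random() < p / max(1, num_observable_prey)
```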
In epic #10
Make a Read the Docs page with all the documentation and add a highlights page that showcases what people have done using this software.
If the actor does not receive an input, it should handle the null case itself rather than requiring the environment to do it. For example, instead of `actor.process_move(agent, action.get('move', np.zeros(2)))`, we could just do `actor.process_move(agent, action.get('move'))`, and the `process_move` function can have logic for dealing with no move.
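A minimal sketch of that convention; the actor class is illustrative:

```python
import numpy as np

class MoveActor:
    def process_move(self, agent, move):
        # the actor owns the null case, so callers can pass
        # action.get('move') without a default
        if move is None:
            move = np.zeros(2)   # no input means no movement
        agent.position = agent.position + move
```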
Fundamentals are position, team, and life. These attribute agents should just be one class, and we default the parameters as needed.
It is not entirely clear what GridPositionComponent does. It keeps track of agents' positions, yes, but then it also has some capabilities of generating observations of other agents based on that position. As a result, if we want to change how observations are made, then we need a new class, or a subclass. This becomes more ambiguous when we start to think about obstacles in the grid.
Furthermore, we still struggle to define how env features should be observed in conjunction with the agents' positions, such as resources.
Network model. Each node has a set of preferences--likes and dislikes. Each node can also choose to share something, either its own opinion or something that it hears from its neighbors. Nodes can choose to break links and add links (dynamics are simpler than voting model because the node doesn't change teams). Nodes might choose to do so based on what they hear from various agents. For example, if a node always hears something it doesn't like from another node, then it may choose to break that link and select a new one from the network.
The idea here is that we can study the polarization of a population under different reward functions.
Extensions:
In Epic #2
Design of the movement component is a bit ambiguous. On the one hand, it seems like the movement component should take the position and movement and output the desired new location, without constraints. Then, the environment can decide what to do about the movement output (e.g. if the new location is out of bounds, then don't move there). Currently, movement component does this constraint internally, checking for out of bounds movement, which it can do because it has the region. But as we add more constraints, such as landmarks and other agents' locations, the movement component will need to be updated to handle all these constraints. The question is: does it make sense for movement components to handle this internally, or is it better for movement to just process the desired location and let the environment or another component process the constraints? This has parallel considerations with the attack component with the question of processing the agents' health.
In epic #12
Attack Types
- Binary: `Box(0, 1, (2 * attack_range + 1, 2 * attack_range + 1), int)`, a binary mask over the N = (2 * attack_range + 1)^2 cells in the attack window.
- Without replacement: with i going from 1 to K, the action space would be `MultiDiscrete(N+1, N, N-1, ...)`, up to K entries. Each element in the attack vector would be converted to a grid cell or no attack.
- With replacement: there are (N+1)^K choices, where we have N+1 because the agent can choose not to use one of its K attacks. The action space would be `MultiDiscrete(N+1, N+1, N+1, ...)`, up to K entries. Each element of the attack vector would be converted to a grid cell or no attack.
- Attack mapping: the `attack_mapping` parameter specifies which agent types can attack other agent types. We use this in the simulation dynamics already, and I think I can make an attack actor that makes each channel explicit. This would use the binary attack logic under the hood.
Variations
Each of the attack types above can have variations. For example,
We can have many more variations, but this is a good starting point.
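A minimal sketch of the action spaces named above, built with gym spaces; `attack_range` and `K` are illustrative values, and N is the number of cells in the attack window:

```python
from gym.spaces import Box, MultiDiscrete

attack_range, K = 2, 3
N = (2 * attack_range + 1) ** 2

binary_mask = Box(0, 1, (2 * attack_range + 1, 2 * attack_range + 1), int)
without_replacement = MultiDiscrete([N + 1 - i for i in range(K)])  # N+1, N, N-1, ...
with_replacement = MultiDiscrete([N + 1] * K)                       # (N+1)^K choices
```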
Design
Position state is a bit clumsy right now. We have a Grid representation and two Continuous representations, one of which uses velocity and another which uses speed and angle. In addition, the corridor example can reference agents based on their positions by mapping from cell to agent object. NetLogo unifies position between grid and continuous spaces by tracking an agent's continuous position and mapping that agent to a grid cell based on its position. We should do this too. We should also do the mapping that corridor has.
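A minimal sketch of the NetLogo-style unification, assuming the continuous position is the single source of truth and the grid cell (plus a cell-to-agents lookup like corridor's) is derived from it; names are illustrative:

```python
import numpy as np

def grid_cell(position):
    return tuple(np.floor(position).astype(int))

def build_cell_map(agents):
    cell_map = {}
    for agent in agents.values():
        cell_map.setdefault(grid_cell(agent.position), []).append(agent)
    return cell_map
```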
The plume has a concentration source with wind that pushes the concentration in a certain direction. This is set at the beginning of the episode and never modified.
Parameters: strength, diffusion factors in y and z, noise, and upper and lower bounds for the concentrations.
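A hedged sketch only: the issue names the parameters but not the formula. A Gaussian-style profile consistent with those parameters might look like:

```python
import numpy as np

def concentration(y, z, strength, sigma_y, sigma_z, rng, noise=0.0,
                  lower=0.0, upper=1.0):
    # Gaussian spread in y and z (the formula is our assumption)
    c = strength * np.exp(-y**2 / (2 * sigma_y**2) - z**2 / (2 * sigma_z**2))
    c += rng.normal(0.0, noise) if noise > 0 else 0.0
    return float(np.clip(c, lower, upper))   # clamp to the concentration bounds
```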
In Epic #39
Attack strength is specified by `AttackingAgent`, but it is not used by the corresponding component. It is used by the environment, which can modify the agents' health based on the attacking agent's strength. Perhaps we should add some attacking components that have this feature?
I think in total there are 4 cases to be considered.
2 and 4 are variants of the existing behavior in predator prey with the difference that in one of them the step size can be learned as well (it’s like a cheap way to regulate velocity).
1 and 3 however both operate on a more “realistic” basis in that they let agents move and interact with other objects in a continuous space which makes sense for collisions and such things. I think both of 1+2 are common ways of controlling movement and if possible we should have them both.
`AgentObservingAgent` allows the observer to mask the agent's observation through the `agent_view` parameter, resulting in partial observability. This is an awkward design, and we should try to rework it. Perhaps we can use composition through wrappers on the observation component itself?
Bring over the Particle Plume env from the dance repo to this repo. Create all the necessary components and the examples scenarios.
Supported actions and features in particle-plume:
Components breakdown:
Scenarios:
Broadcasting agent with a broadcast range. Agents within this range will receive the message. If the receiving agent is on the same team, its observation will fuse the observation from the broadcasting agent as appropriate. For example, if the broadcasting agent sends position, team, and life information but the receiving agent only observes position and life channels, then the receiving agent will only fuse the channels it supports. If the receiving agent is on another team, then it will only receive the state information of the broadcasting agent, again only in the observation channels that it supports.
Actor: agent can choose to broadcast
State: which agent is broadcasting this step
Observer: Update obs as described above. This will be implemented as a wrapper, and should go after partial observation wrappers.
Does not support "grid" observation style. See #5, #6, #19 for thoughts there.
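A minimal sketch of the fusion rule, assuming dict-of-channels observations with an "unknown" placeholder; the names and placeholder value are assumptions. For a receiver on another team, `incoming` would carry only the broadcaster's own state rather than its full observation.

```python
import numpy as np

UNKNOWN = -1

def fuse_broadcast(receiver_obs, incoming):
    for channel, values in incoming.items():
        if channel in receiver_obs:   # fuse only channels the receiver supports
            mine = receiver_obs[channel]
            receiver_obs[channel] = np.where(mine == UNKNOWN, values, mine)
    return receiver_obs
```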
In Epic #10
Support that some teams can/cannot attack other teams.
It might be nice to give states a "render" function that allows them to turn the state into an image. This is what we were doing before #26, and the design we had hinted at what is necessary for all the renderers to work together to create a single composite image. Currently, the examples just render per use case.
Agents can share observations in DistanceMode using the CommunicationWrapper.
Support agents sharing their observations with other agents. Some thoughts on how to do this:
Based on the above ideas, there are two approaches that make the most sense to me:
+ -------- + ------------------------------- + ------------------ + ------------------------ + -------------------- +
| timestep | agent0 obs | agent0 action | agent1 obs | agent1 action |
+ -------- + ------------------------------- + ------------------ + ------------------------ + -------------------- +
| 0 | augmented_obs[] | request agent1 obs | incoming_request[] | some_action |
| 1 | augmented_obs[] | some_action | incoming_request[agent0] | fulfill/deny request |
| 2 | augmented_obs[agent1 fulfilled] | some_action | incoming_request[] | some_action |
+ -------- + ------------------------------- + ------------------ + ------------------------ + -------------------- +
Receiving messages does directly affect the agent's own observation space.
+ -------- + ------------------------------- + ------------------ + ------------------------------ + -------------------- +
| timestep | agent0 obs | agent0 action | agent1 obs | agent1 action |
+ -------- + ------------------------------- + ------------------ + ------------------------------ + -------------------- +
| 0 | broadcasted_messages[] | broadcast message | broadcasted_messages[] | some_action |
| 1 | broadcasted_messages[] | some_action | broadcasted_messages[agent0] | trust/ignore message |
| 2 | broadcasted_messages[] | some_action | broadcasted_messages[] | some_action |
+ -------- + ------------------------------- + ------------------ + ------------------------------ + -------------------- +
Sending messages does not directly affect the agent's own observation space. The agent's observation space is affected by others sending it messages.
In Epic #4
Dead agents can still be observed, which we don't want since they should be effectively removed from the simulation. We cannot remove them from the agent dict without doing a deepcopy at each reset. The observers right now force the agent to be alive in order to be observed. What if we just set their position to None so that the observers ignore them?
Include landmarks in the environment.
Landmarks have a position and a velocity (and a color for rendering). Some may also have a specific id/type that the agent needs to observe if it needs to go to a specific landmark. They are similar to agents, but they don't decide actions. Agents can collide with landmarks.
Is the landmark's movement/position affected by a collision with the agent? Or is it treated as having "infinite mass" so that only the agents bounce off of it? What happens if two landmarks collide?
In Epic #39
This epic ticket tracks what example simulations we implement in Abmarl. Here is the list, each new game should spawn a new ticket in this epic: