mnoukhov / emergent-compete

Code for Emergent Communication under Competition (AAMAS 2021)

License: GNU General Public License v3.0

Languages: Jupyter Notebook 98.78%, Python 1.19%, Shell 0.02%
Topics: deep-learning, emergent-communication

emergent-compete's People

Contributors: mnoukhov

Forkers: tianyu-z

emergent-compete's Issues

achieve 100% accuracy on last round in simple setup

To test the usefulness of the state "context", we have a receiver agent (e.g. deterministic) paired with a sender that simply sends its input as a message. The receiver is then just trying to guess the bias based on repeated interactions and, with gradient-based methods, should reach 100% accuracy on the later rounds.
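A toy sketch of this setup (function names and the squared-error objective are assumptions for illustration, not the repo's code): the sender forwards its input verbatim as the message, the target is the message plus a fixed hidden bias, and a gradient-based receiver recovers the bias exactly from repeated rounds.

```python
# Toy sketch (hypothetical names, squared-error loss assumed): a receiver
# estimates a fixed hidden bias from repeated (message, target) rounds by
# gradient descent, which drives later-round error to zero.

def estimate_bias(rounds, lr=0.1, steps=200):
    """Gradient descent on squared error; `rounds` is [(message, target), ...]."""
    b = 0.0
    for _ in range(steps):
        for msg, target in rounds:
            pred = msg + b                  # receiver's guess under its bias estimate
            b -= lr * 2 * (pred - target)   # d/db of (pred - target)**2
    return b

# Sender sends its input verbatim; target = input + bias (bias = 5 here).
rounds = [(1.0, 6.0), (2.0, 7.0), (3.0, 8.0)]
print(round(estimate_bias(rounds), 6))  # → 5.0
```

Because the bias is constant across rounds, each update contracts the error by a fixed factor, so the estimate converges exactly and later-round accuracy hits 100%.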

current situation with a deterministic receiver and bias in [0, 10] shows accuracy is not near 100% (see the `diff` plot), but the current state is indeed helpful to some extent (see per-round rewards in the `round` plot)

need to change:

  • the state
  • how states are accumulated

look into:

  • better loss function?
    (cos has bad gradients at certain points)

Get 100% accuracy using only states

Following #1, now try to achieve 100% accuracy without being given the extra state variable prev_diff (the previous signed distance to the target). Instead, use previous reward states to infer the same information.

A single previous reward state does not give this information because it is unsigned (e.g. guess = 7, distance = 1 means the true target was either 6 or 8), so use two previous states, which should be enough to disambiguate.
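A minimal sketch of the disambiguation argument (plain Python, names are illustrative): one unsigned distance leaves two candidate targets, but intersecting the candidate sets from two different guesses usually leaves exactly one.

```python
# Sketch of disambiguating a target from unsigned distances (illustrative
# names; assumes the target is static across the two rounds).

def candidate_targets(guess, distance):
    """Targets consistent with one unsigned distance from a guess."""
    return {guess - distance, guess + distance}

def disambiguate(guess1, dist1, guess2, dist2):
    """Intersect candidate sets from two (guess, unsigned distance) pairs."""
    return candidate_targets(guess1, dist1) & candidate_targets(guess2, dist2)

print(sorted(candidate_targets(7, 1)))   # → [6, 8]  (one round is ambiguous)
print(sorted(disambiguate(7, 1, 5, 1)))  # → [6]     (two rounds pin it down)
```

One caveat the issue implies: two rounds with the same guess give the same candidate set, so the two guesses must differ for the intersection to shrink.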

Does Selfish Communication Emerge?

Does a protocol always emerge in the biased game when we expect it to?

We expect communication to emerge when the reward with communication is greater than the reward without it. In our specific game this translates to bias < 2*pi, i.e. not fully competitive

loss function doesn't work without good initialization

the current format of actions and loss is:

  • actions and targets correspond to points on the edge of a circle
  • the loss function is the cosine of the angle between the action and the target

the issue is that the output of the network can be arbitrarily large or small, and the loss for some prediction x is the same as the loss for x + k*2pi for any integer k. This means the gradient can pull some input x toward one minimum while pulling a nearby input x + eps toward a different minimum.

one way to fix this has been to initialize the network to strictly positive values that guarantee all outputs are in the range [0, circumference], but this will be much harder with a larger model, so perhaps it makes sense to change the loss function or the output parameterization to be resilient to this

ideas:

  • sigmoid(output) * (circumference + 2k) - k, so that every required output point in [0, circumference] sits at least k inside the reachable range [-k, circumference + k], and none requires the sigmoid to saturate (i.e. an infinite pre-activation)
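A minimal sketch of the bounded-output idea above (plain Python; `circumference` and the margin `k` are placeholder values): squashing the raw network output with a sigmoid keeps every prediction within one period of the circle, so the periodic cosine loss can no longer pull nearby inputs toward minima 2*pi apart.

```python
# Sketch of the proposed bounded output head (illustrative values:
# circumference = 2*pi, margin k = 0.5).
import math

def bounded_output(raw, circumference=2 * math.pi, k=0.5):
    """sigmoid(raw) * (circumference + 2k) - k maps any real into [-k, circumference + k]."""
    sigmoid = 1.0 / (1.0 + math.exp(-raw))
    return sigmoid * (circumference + 2 * k) - k

def cosine_loss(action, target):
    """Minimal (-1) when action and target coincide on the circle."""
    return -math.cos(action - target)

# Even extreme raw scores stay within one period, and every point of
# [0, circumference] is reachable without a saturated sigmoid.
for raw in (-100.0, 0.0, 100.0):
    out = bounded_output(raw)
    assert -0.5 <= out <= 2 * math.pi + 0.5
print(round(bounded_output(0.0), 6))  # midpoint of the range: → 3.141593
```

The margin k matters because sigmoid only reaches 0 and 1 at infinite pre-activations; widening the range by 2k means the endpoints 0 and circumference correspond to finite raw outputs.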

Outperform one-shot baseline with RL

now that #3 is solved for a deterministic policy learned with backprop, try to do the same thing learning with RL

the current A2C method only matches the one-shot baseline (see the `a2c-reward` and `a2c-round` plots), but results across rounds look promising. Test some hyperparameter tuning to see if performance improves; otherwise maybe switch to DDPG?

Logging with Tensorboard

It seems that the right way of logging with tensorboard could be

  1. separate writers for each agent with different comments (/agent_name/)
  2. custom scalars for logging multiple round rewards on the same chart

currently not updating live for unknown reasons
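A sketch of both ideas with PyTorch's SummaryWriter (assuming torch is installed; the agent names and reward tags are illustrative, not the repo's): one writer per agent distinguished by `comment`, plus an `add_custom_scalars` layout that overlays all per-round reward tags on a single "Multiline" chart. Calling `flush()` after each write is also one thing worth trying for the live-update problem.

```python
# Sketch of per-agent writers + a custom-scalars layout (agent and tag
# names are assumptions for illustration).
from torch.utils.tensorboard import SummaryWriter

N_ROUNDS = 5

# 1. separate writers per agent, distinguished by the run comment
writers = {name: SummaryWriter(comment=f"/{name}/") for name in ("sender", "recver")}

# 2. custom-scalars layout: all per-round reward tags on one Multiline chart
layout = {
    "rewards": {
        "per round": ["Multiline", [f"reward/round_{r}" for r in range(N_ROUNDS)]],
    },
}
for w in writers.values():
    w.add_custom_scalars(layout)

# during training: each agent logs one scalar per round, then flushes so
# the TensorBoard UI sees new events promptly
for step in range(3):
    for name, w in writers.items():
        for r in range(N_ROUNDS):
            w.add_scalar(f"reward/round_{r}", 0.0, step)
        w.flush()

for w in writers.values():
    w.close()
```

Custom scalars appear under TensorBoard's "Custom Scalars" tab; the regular "Scalars" tab still shows each `reward/round_{r}` tag on its own chart.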
