mnoukhov / emergent-compete

Code for Emergent Communication under Competition (AAMAS 2021)

License: GNU General Public License v3.0

Languages: Jupyter Notebook 98.78%, Python 1.19%, Shell 0.02%
Topics: deep-learning, emergent-communication

emergent-compete's People

Contributors: mnoukhov

Forkers: tianyu-z

emergent-compete's Issues

achieve 100% accuracy on last round in simple setup

To test the usefulness of the state "context", we have a receiver agent (e.g. deterministic) paired with a sender that simply sends its input as a message. The receiver is then just trying to guess the bias based on repeated interactions and, with gradient-based methods, should reach 100% accuracy on the later rounds.
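A toy sketch of this setup (function names and the squared-error objective are assumptions for illustration, not the repo's code): the sender forwards its input verbatim as the message, the target is the message plus a fixed hidden bias, and a gradient-based receiver recovers the bias exactly from repeated rounds.

```python
# Toy sketch (hypothetical names, squared-error loss assumed): a receiver
# estimates a fixed hidden bias from repeated (message, target) rounds by
# gradient descent, which drives later-round error to zero.

def estimate_bias(rounds, lr=0.1, steps=200):
    """Gradient descent on squared error; `rounds` is [(message, target), ...]."""
    b = 0.0
    for _ in range(steps):
        for msg, target in rounds:
            pred = msg + b                  # receiver's guess under its bias estimate
            b -= lr * 2 * (pred - target)   # d/db of (pred - target)**2
    return b

# Sender sends its input verbatim; target = input + bias (bias = 5 here).
rounds = [(1.0, 6.0), (2.0, 7.0), (3.0, 8.0)]
print(round(estimate_bias(rounds), 6))  # → 5.0
```

Because the bias is constant across rounds, each update contracts the error by a fixed factor, so the estimate converges exactly and later-round accuracy hits 100%.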

current situation with a deterministic receiver and bias in [0, 10] shows accuracy is not near 100% (see the `diff` plot), but the current state is indeed helpful to some extent (see per-round rewards in the `round` plot)

need to change:

  • the state
  • how states are accumulated

look into:

  • better loss function?
    (cos has bad gradients at certain points)

Get 100% accuracy using only states

Following #1, now try to achieve 100% accuracy without being given the extra state variable prev_diff (the previous signed distance to the target). Instead, use previous reward states to infer the same information.

A single previous reward state does not give this information because it is unsigned (e.g. guess = 7, distance = 1 means the true target was either 6 or 8), so use two previous states, which should be enough to disambiguate.
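A minimal sketch of the disambiguation argument (plain Python, names are illustrative): one unsigned distance leaves two candidate targets, but intersecting the candidate sets from two different guesses usually leaves exactly one.

```python
# Sketch of disambiguating a target from unsigned distances (illustrative
# names; assumes the target is static across the two rounds).

def candidate_targets(guess, distance):
    """Targets consistent with one unsigned distance from a guess."""
    return {guess - distance, guess + distance}

def disambiguate(guess1, dist1, guess2, dist2):
    """Intersect candidate sets from two (guess, unsigned distance) pairs."""
    return candidate_targets(guess1, dist1) & candidate_targets(guess2, dist2)

print(sorted(candidate_targets(7, 1)))   # → [6, 8]  (one round is ambiguous)
print(sorted(disambiguate(7, 1, 5, 1)))  # → [6]     (two rounds pin it down)
```

One caveat the issue implies: two rounds with the same guess give the same candidate set, so the two guesses must differ for the intersection to shrink.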

Does Selfish Communication Emerge?

Does a protocol always emerge in the biased game when we expect it to?

We expect communication to emerge when the reward with communication is greater than the reward without it. In our specific game this translates to bias < 2*pi, i.e. not fully competitive

loss function doesn't work without good initialization

the current format of actions and loss is:

  • actions and targets correspond to points on the edge of a circle
  • the loss function is the cosine of the angle between the action and the target

the issue is that the output of the network can be arbitrarily large or small, and the loss for some prediction x is the same as the loss for x + k*2pi for any integer k. This means the gradient can pull some input x toward one minimum while pulling a nearby input x + eps toward a different minimum.

one way to fix this has been to initialize the network to strictly positive values that guarantee all outputs are in the range [0, circumference], but this will be much harder with a larger model, so perhaps it makes sense to change the loss function or the output parameterization to be resilient to this

ideas:

  • sigmoid(output) * (circumference + 2k) - k, so that every required output point in [0, circumference] sits at least k inside the reachable range [-k, circumference + k], and none requires the sigmoid to saturate (i.e. an infinite pre-activation)
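A minimal sketch of the bounded-output idea above (plain Python; `circumference` and the margin `k` are placeholder values): squashing the raw network output with a sigmoid keeps every prediction within one period of the circle, so the periodic cosine loss can no longer pull nearby inputs toward minima 2*pi apart.

```python
# Sketch of the proposed bounded output head (illustrative values:
# circumference = 2*pi, margin k = 0.5).
import math

def bounded_output(raw, circumference=2 * math.pi, k=0.5):
    """sigmoid(raw) * (circumference + 2k) - k maps any real into [-k, circumference + k]."""
    sigmoid = 1.0 / (1.0 + math.exp(-raw))
    return sigmoid * (circumference + 2 * k) - k

def cosine_loss(action, target):
    """Minimal (-1) when action and target coincide on the circle."""
    return -math.cos(action - target)

# Even extreme raw scores stay within one period, and every point of
# [0, circumference] is reachable without a saturated sigmoid.
for raw in (-100.0, 0.0, 100.0):
    out = bounded_output(raw)
    assert -0.5 <= out <= 2 * math.pi + 0.5
print(round(bounded_output(0.0), 6))  # midpoint of the range: → 3.141593
```

The margin k matters because sigmoid only reaches 0 and 1 at infinite pre-activations; widening the range by 2k means the endpoints 0 and circumference correspond to finite raw outputs.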

Outperform one-shot baseline with RL

now that #3 is solved for a deterministic policy learned with backprop, try to do the same thing learning with RL

the current A2C method only matches the one-shot baseline (see the `a2c-reward` and `a2c-round` plots), but results across rounds look promising. Test some hyperparameter tuning to see if performance improves; otherwise maybe switch to DDPG?

Logging with Tensorboard

It seems that the right way of logging with tensorboard could be

  1. separate writers for each agent with different comments (/agent_name/)
  2. custom scalars for logging multiple round rewards on the same chart

currently not updating live for unknown reasons
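A sketch of both ideas with PyTorch's SummaryWriter (assuming torch is installed; the agent names and reward tags are illustrative, not the repo's): one writer per agent distinguished by `comment`, plus an `add_custom_scalars` layout that overlays all per-round reward tags on a single "Multiline" chart. Calling `flush()` after each write is also one thing worth trying for the live-update problem.

```python
# Sketch of per-agent writers + a custom-scalars layout (agent and tag
# names are assumptions for illustration).
from torch.utils.tensorboard import SummaryWriter

N_ROUNDS = 5

# 1. separate writers per agent, distinguished by the run comment
writers = {name: SummaryWriter(comment=f"/{name}/") for name in ("sender", "recver")}

# 2. custom-scalars layout: all per-round reward tags on one Multiline chart
layout = {
    "rewards": {
        "per round": ["Multiline", [f"reward/round_{r}" for r in range(N_ROUNDS)]],
    },
}
for w in writers.values():
    w.add_custom_scalars(layout)

# during training: each agent logs one scalar per round, then flushes so
# the TensorBoard UI sees new events promptly
for step in range(3):
    for name, w in writers.items():
        for r in range(N_ROUNDS):
            w.add_scalar(f"reward/round_{r}", 0.0, step)
        w.flush()

for w in writers.values():
    w.close()
```

Custom scalars appear under TensorBoard's "Custom Scalars" tab; the regular "Scalars" tab still shows each `reward/round_{r}` tag on its own chart.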
