endolith / elsim Goto Github PK

Election Simulator 3000: Simulates a variety of elections and voting methods

Home Page: https://endolith.github.io/elsim

License: MIT License

Python 100.00%

condorcet condorcet-methods elections monte-carlo simulation voting-methods approval-voting democracy voting borda-count

elsim's Introduction

Election Simulator 3000

This is a library of functions for simulating thousands of elections held using different voting methods (Borda count, Approval voting, etc.) under different voter models (impartial culture, spatial model, etc.) and estimating various metrics from them (Social Utility Efficiency = Voter Satisfaction Efficiency = VSE, Condorcet Efficiency, likelihood of Condorcet cycles, etc.)

For example, it can be used to reproduce Figure 1 from Merrill 1984:

Or the table of Effectiveness from Weber 1977:

	Standard	Vote-for-half	Borda
2	81.37	81.71	81.41
3	75.10	75.00	86.53
4	69.90	79.92	89.47
5	65.02	79.09	91.34
6	61.08	81.20	92.61
10	50.78	82.94	95.35
255	12.78	86.37	99.80

See /examples folder for more on what it can do, such as reproductions of previous research.

Goals

Fast (~25,000 elections per second on Core i7-9750H)
Flexible
Well-documented, easily-used and improved upon by other people
Well-tested and bug-free
Able to reproduce peer-reviewed research

Requirements

See requirements.txt. As of this README, it includes numpy and scipy for the simulations, tabulate for printing example tables, joblib for parallelizing extreme examples, and pytest, hypothesis, and pytest-cov for running the tests. All should be installable through conda.

Optionally, elsim can use numba for speed. If not available, the code will still run, just more slowly.

Installation

One possibility is to install with pip:

pip install git+https://github.com/endolith/elsim.git

Documentation

Currently just the docstrings of the submodules and functions themselves, in numpydoc format. Now being rendered at https://endolith.github.io/elsim/

Usage

Specify an election with three candidates (0, 1, 2), where two voters rank candidates 0 > 2 > 1, two voters rank candidates 1 > 2 > 0, and one ranks candidates 2 > 0 > 1:

>>> election = [[0, 2, 1],
...             [0, 2, 1],
...             [1, 2, 0],
...             [1, 2, 0],
...             [2, 0, 1]]

Calculate the winner using Black's method:

>>> from elsim.methods import black
>>> black(election)
2

Candidate 2 is the Condorcet winner, and wins under Black's method.

Submodules and chained functions

Originally, the functions in submodules were meant to be chained together in a simple flow:

A function from elsim.elections takes parameters as input (number of candidates, number of voters, dispersion in spatial model, etc.) and produces an array of utilities (each voter's appraisal of each candidate).
Then a function from elsim.strategies converts each voter's utilities into a ballot.
Then a function from elsim.methods counts the collection of ballots and chooses a winner.

flowchart LR
    Parameters -- Election --> Utilities
    Utilities -- Strategy --> Ballots
    Ballots -- Method --> Winner

However, while implementing many different types of simulations, it has become more complicated. Some functions produce intermediate results, while others skip over multiple steps. I'm no longer sure the best way to organize these functions into submodules. Here is a diagram showing the flow of every function currently in the submodules:

%%{ init: { 'flowchart': { 'curve': 'monotoneX' } } }%%
flowchart LR
    %% elections.py
    Parameters -- <code>normal_electorate</code> --> Positions[Spatial positions]
    Positions -- <code>normed_dist_utilities</code> --> Utilities
    Parameters -- <code>random_utilities</code> --> Utilities
    Parameters -- <code>impartial_culture</code> --> ranked_ballots

    %% strategies.py
    Utilities -- <code>approval_optimal</code> --> approval_ballots
    Utilities -- <code>vote_for_k</code> --> approval_ballots
    Utilities -- <code>honest_normed_scores</code> --> score_ballots
    Utilities -- <code>honest_rankings</code> --> ranked_ballots

    subgraph Ballots
        approval_ballots[Approval ballots]
        score_ballots[Score ballots]
        ranked_ballots[Ranked ballots]
    end

    %% approval.py
    approval_ballots -- <code>approval</code> --> Winner
    score_ballots -- <code>combined_approval</code> --> Winner

    %% condorcet.py (moved out of order so it renders with fewer line collisions)
    ranked_ballots -- <code>ranked_election_to_matrix</code> --> Matrix
    Matrix -- <code>condorcet_from_matrix</code> --> Winner
    ranked_ballots -- <code>condorcet</code> --> Winner

    %% black.py
    ranked_ballots -- <code>black</code> --> Winner

    %% borda.py
    ranked_ballots -- <code>borda</code> --> Winner

    %% coombs.py
    ranked_ballots -- <code>coombs</code> --> Winner

    %% fptp.py
    ranked_ballots -- <code>fptp</code> --> Winner
    ranked_ballots -- <code>sntv</code> --> Winner

    %% irv.py
    ranked_ballots -- <code>irv</code> --> Winner

    %% runoff.py
    ranked_ballots -- <code>runoff</code> --> Winner

    %% score.py
    score_ballots -- <code>score</code> --> Winner

    %% star.py
    score_ballots -- <code>star</code> --> Winner
    score_ballots -- <code>matrix_from_scores</code> --> Matrix

    %% utility_winner.py
    Utilities -- <code>utility_winner</code> --> Winner

Tests

Tests can be run by installing the testing dependencies and then running pytest in the project folder.

Bugs / Requests

File issues on the GitHub issue tracker.

Similar projects

Election simulators

1D:
- Election Methods in Pictures ("Voteline") by Ka-Ping Yee - 5 candidates, normal/uniform/bimodal distribution
- Comparing Voting Systems for a Normal Distribution of Voters - Wolfram demonstration clone
2D:
- Voting Simulation Visualizations by Ka-Ping Yee - "Yee diagrams"
- Elections On The Plane - Yee diagrams by Brian Olson
- Yee Pictures illustrating voting method behavior by Warren D. Smith
ND:
- Voter Satisfaction Efficiency by Jameson Quinn

Voting system implementations

See the list on Electowiki

elsim's People

Contributors

Stargazers

Watchers

Forkers

ozcan-durak masiarek cdchushig arendpeter

elsim's Issues

runoff can use the tally functions from IRV

Add uncertainty to estimates

"Estimated variance of estimated probability"

https://gist.github.com/endolith/9e73ea30511f0befb9b336526f34d0a6

https://stats.stackexchange.com/q/430409/11633

https://stats.stackexchange.com/q/576491/11633

Fix package structure, absolute imports

Currently the overall package structure is what I want:

import elsim

elsim.__dir__()
Out[3]: 
[...
 'strategies',
 'elections',
 'methods']

As is the methods sub-package:

elsim.methods.__dir__()
Out[4]: 
[...
 '_common',
 'approval',
 'combined_approval',
 'condorcet',
 'borda',
 'black',
 'condorcet_from_matrix',
 'ranked_election_to_matrix',
 'coombs',
 'fptp',
 'irv',
 'runoff',
 'score',
 'star',
 'utility_winner']

But the other two include cruft that shouldn't be exposed to the user, such as np and cdist, and elections includes itself:

elsim.elections.__dir__()
Out[5]: 
[...
 'elections',
 'numbers',
 'np',
 'cdist',
 'honest_rankings',
 'elections_rng',
 'check_random_state',
 'random_utilities',
 'impartial_culture',
 'normal_electorate',
 'normed_dist_utilities']

Also the __init__.py files have relative imports, despite this being mildly recommended against by PEP8:

Absolute imports are recommended, as they are usually more readable and tend to be better behaved (or at least give better error messages) if the import system is incorrectly configured (such as when a directory inside a package ends up on sys.path):

from . import elections, strategies, methods

Though there are a lot of other packages that do it this way.

I've always been confused about the best way to do this stuff.

Add Approval+Runoff

https://en.wikipedia.org/wiki/Unified_primary

Vectorize methods

I think all of these functions could accept 3D arrays of shape (n_voters, n_cands, n_elections) or (n_elections, n_voters, n_cands), whichever makes more sense, and return a list of winners for each election.

np.array((impartial_culture(5, 4), impartial_culture(5, 4)))
Out[16]:
array([[[1, 2, 0, 3],
		[2, 1, 3, 0],
		[3, 0, 1, 2],
		[3, 1, 2, 0],
		[2, 3, 0, 1]],

	   [[0, 2, 3, 1],
		[2, 3, 1, 0],
		[1, 3, 2, 0],
		[2, 1, 0, 3],
		[0, 2, 3, 1]]], dtype=uint8)

_.shape
Out[17]: (2, 5, 4)

Use docrep to simplify docstrings?

https://docrep.readthedocs.io/en/latest/

Replace

    utilities : array_like
         ….

with

...     %(honest_rankings.parameters.utilities)s'''

and have the description just live in one place?

add normal() or spatial_model() to reproduce the rest of Merrill's results

Remove discrepancy from winner distribution plots

The ideal winner is relative to the voters, not relative to the random number generator. It should remove any random bias before plotting.

Add docs

Copy a template that includes sphinx like

https://github.com/seanfisk/python-project-template

https://github.com/Kwpolska/python-project-template

Switch ranked ballot representation?

Currently it's using the representation that matches how we often write ballots:

A > B > C
B > C > A
B > A > C

becomes

0, 1, 2
1, 2, 0
1, 0, 2

Where rows are voters, columns are rankings, and cells contain candidate IDs.

But this format doesn't allow for expressions of indifference, and for elimination methods, this requires either shuffling cells around or separate pointers for each row and is a bit clunky.

The other convention is more general because it allows indifference:

1, 2, 3
3, 1, 2
2, 1, 3

Here the rows are voters, columns are candidates, and cells store rankings. Does this also simplify or speed up the calculation of elimination methods? Borda just becomes a column sum, for instance.

Make sure I'm not killing performance in using smaller dtypes

Winner distribution plots TODO

Publish the winner distribution plot scripts
Compare optimal approval strategy with vote-for-half, etc.
Requested simulations:

Requested simulations:

Bimodal voters.
- Instead of 10k voters normally distributed with a mean of Zero, can you input two normal curves of 5k each, with means of +.5 and -.5? That seems to me to be something approximating the politically active in well established two-party systems.
- Alternately, use the FPTP distribution of winners (divided by 10, to get back to 10k voters) from the 0.7 or 0.5 dispersion with FPTP, which seems to approximate what I expect the previous would come out to.
Additional Methods to be included:
- Open Partisan Primaries (i.e., two round FPTP, but with the first round only allowing voters to vote for the one candidate closest to them, and instead of picking the top two, pick the top one from each party distribution)
- Closed Partisan Primaries (basically FPTP internal to each group of 5k voters & C candidates, with a runoff between the two winners)
- Score
  and, just out of curiosity:
- 3-2-1
Different "Turnouts" for multi-round methods. Probably only relevant for Bimodal, if at all.
- For Primary type methods, have fewer voters in the primary. Say, 3.5k-4k per distribution for the primary, but the full 5k for the General.
- For "Runoff" type, have fewer in the Runoff than General

For Score/STAR:

Direct comparisons between Score and STAR, using the input parameters.
For Score and STAR, vary the range. Ranges I'm particularly interested in are:
- The 0-5 range the STAR folks are advocating
- 10 point (1-10) and 11 point (0-10), to see if there's a meaningful difference for having a median point.
- 5 point (converting to 4.0, no + or - grades)
- 13 point (converting 4.0+, with [A+,A,A-,...,D,D-,F] corresponding to a single point)
- 15 point (4.0+, with additional values corresponding to a hypothetical F+ and F-, in case anyone uses them, which, for the purposes of reporting, would be 1/3, and -1/3, respectively)

https://www.reddit.com/r/EndFPTP/comments/yqavco/the_us_forward_party_now_includes_approval_star/j03926b/

Parallelize all examples

This is "embarrassingly parallel" but how to implement it while also reusing the generated data to avoid wasting time calculating that?

Create a nice API

My plan is to create a bunch of example scripts reproducing papers in computational social choice theory, and then after getting a bunch of features implemented, figure out the best way to package them all up in a nice API. I have no clue what it would look like, though.

Should be able to specify a number of voters, or a set/list of numbers of voters
Should be able to specify a number of candidates, or a set/list of numbers of candidates
Should be able to specify a single voting system or a set of them
Should be able to support strategies
Should it always calculate the Cartesian product of the above sets? No, because of examples like https://github.com/endolith/elsim/blob/master/examples/merrill_1984_table_2.py
- But it should be easy to specify that you want the Cartesian product of the input sets...
Parallelization should be automatic, but user should be able to override the backend to turn it off, compute it on a cluster, or whatever
etc.

Split strategies from elections/methods

strategies should be its own submodule, which takes utilities and generates ballots.