pbstark / shangrla

Sets of Half-Average Nulls Generate Risk-Limiting Audits: tools for assertion-based risk-limiting election audits

License: GNU Affero General Public License v3.0

Python 100.00%

shangrla's People

Contributors: bsheehan-sf-rla, dvukcevic, pbstark, spertus, wannabesmith

shangrla's Issues

Where is ballot_comparison module?

The tests and notebooks related to SUITE and hybrid audits fail for lack of a ballot_comparison module. I presume that a dependency is missing.

# python Code/suite_tools.py
Traceback (most recent call last):
  File "Code/suite_tools.py", line 11, in <module>
    from ballot_comparison import ballot_comparison_pvalue
ModuleNotFoundError: No module named 'ballot_comparison'

Note also that there are some extraneous files like
Code/.ipynb_checkpoints/hybrid-audit-example-3-checkpoint.ipynb
that should be removed and avoided via an addition to .gitignore.

kaplan_martingale returns nan

There are some cases where kaplan_martingale returns nan.

Steps to reproduce

s = [1, 0, 1, 1, 0, 0, 1] 
TestNonnegMean.kaplan_martingale(s, N=7, t=3/7, random_order=True)

returns

array([1.66666667, 0.72222222, 1.13888889, 2.64722222,        nan,
              nan,        nan])

I think that this was due to c values of infinity being sent to integral_from_roots.

In addition, changing the last observed data point results in the entire martingale changing.

Steps to reproduce

s = [1, 0, 1, 1, 0, 0, 0]                                                   
TestNonnegMean.kaplan_martingale(s, N=7, t=3/7, random_order=True)

returns

(1.0, array([1., 1., 1., 1., 1., 1., 1.]))

ONEAudit implementation questions

  • For Dominion as used by San Francisco, treat tabulator scan batches as the ONEAudit batches for pooling assorters.
    • Treat any ballot in counting group 1 (cards cast in-person) as a pooled CVR, pooled over its (tabulator, batch) combination.
  • For other jurisdictions, we need the flexibility to use ONEAudit reporting batches that do not correspond to scan batches, for instance the votes in a contest that were cast in person.
    • Add a function in the class CVR that assigns tabulation groups based on the contests the CVR contains?
  • The implementation uses a Contest-level flag for whether to use ONEAudit (via the Audit.AUDIT_TYPE.ONEAUDIT class constant), but it might make sense to specify the audit strategy at the Audit level or the Stratum level instead.
  • Need a way to estimate sample sizes. Can use the actual CVRs as MVRs together with the ONEAudit CVRs in a simulation, perhaps with additional hypothetical errors in the actual CVRs.
  • Need to work through the sampling when use_style == True.
    • Sample from CVRs that contain the contest. If a selected CVR is in a pool batch, select a random card from the pool batch that contains it.
    • Every card in a batch for which at least one CVR contains the contest should be treated as if it contains the contest: if it does not, the MVR should record a non-vote (rather than treating the card as an error for not containing the contest, as use_style without ONEAudit would). This could be done by adding the contest (with an empty vote dict) to every CVR in the pool batch to which the selected CVR belongs that did not already contain the contest.** Existing functions automatically take care of the rest.

**A "scoped" defaultdict could do that, if such a construct exists, but it's also straightforward to code.
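The padding step described above could be sketched roughly as follows. This is a hypothetical dict-based stand-in, not the SHANGRLA CVR class: each CVR is represented as a dict with a "votes" mapping from contest id to a vote dict.

```python
def pad_pool_with_contest(pool_cvrs, contest_id):
    # If any CVR in the pool batch contains the contest, add the contest
    # (with an empty vote dict, i.e., a non-vote) to every CVR in the pool
    # that did not already contain it.
    if any(contest_id in cvr["votes"] for cvr in pool_cvrs):
        for cvr in pool_cvrs:
            cvr["votes"].setdefault(contest_id, {})
    return pool_cvrs
```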

kaplan_markov function is broken

The kaplan_markov function currently doesn't work: find_p_values evaluates
any(x < 0) while x is a plain Python list, which raises a TypeError.

Wrapping the definition of d with np.array(...) in find_p_values on line 1442 of assertion_audit_utils fixes this issue.
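The failure and the proposed fix can be sketched in isolation (d here is a stand-in for the list built in find_p_values):

```python
import numpy as np

d = [0.1, -0.2, 0.3]   # a plain Python list
# any(d < 0)           # TypeError: '<' not supported between list and int
d = np.array(d)        # the proposed fix: wrap the list in an ndarray
print(any(d < 0))      # elementwise comparison now works -> True
```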

Welford's Algorithm in NonnegMean

I compared the implementation of Welford's algorithm for calculating the running mean in NonnegMean.shrink_trunc to an existing implementation online, and I found some discrepancies in the results.

The specific lines I'm referring to can be found here:

# Welford's algorithm for running mean and running sd
mj = [x[0]]
sdj = [0]
for i, xj in enumerate(x[1:]):
    mj.append(mj[-1] + (xj - mj[-1]) / (i + 1))
    sdj.append(sdj[-1] + (xj - mj[-2]) * (xj - mj[-1]))
sdj = np.sqrt(sdj / j)
# end of Welford's algorithm.

Extracting this code into a short function, we have:

import numpy as np

def shrink_trunc_welford(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    j = np.arange(1, len(x) + 1)
    mj = [x[0]]
    sdj = [0]
    for i, xj in enumerate(x[1:]):
        mj.append(mj[-1] + (xj - mj[-1]) / (i + 1))
        sdj.append(sdj[-1] + (xj - mj[-2]) * (xj - mj[-1]))
    sdj = np.sqrt(sdj / j)
    return mj, sdj

Then, a quick test:

x = np.array([1.5,2.5,1.7,1.0,1.2,2.1,1.4,1.8])
m, sd = shrink_trunc_welford(x)
print(m)
# [1.5, 2.5, 2.1, 1.7333333333333334, 1.6, 1.7000000000000002, 1.6500000000000001, 1.6714285714285715]
print(np.mean(x))
# 1.65

It seems the currently implemented running mean overshoots the true mean of the sample.

I'm not sure whether this is expected or not, but I compared this to an existing implementation in the welford package on pypi:

from welford import Welford

welford = Welford()
m = []
for xi in x:
    welford.add(xi)
    m.append(float(welford.mean))
print(m)
# [1.5, 2.0, 1.9, 1.6749999999999998, 1.5799999999999998, 1.6666666666666665, 1.6285714285714283, 1.65]

This implementation does not overshoot the sample mean. If this is not intended, let me know and I can easily tweak the implementation accordingly.
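For reference, here is a sketch of a possible fix, assuming the discrepancy comes from an off-by-one in the divisor: since x[0] already seeds the running mean and enumerate starts i at 0, the update for the (i+2)-th observation should divide by i + 2, not i + 1.

```python
import numpy as np

def shrink_trunc_welford_fixed(x):
    # Welford running mean and sd, with the divisor corrected to i + 2
    mj = [x[0]]
    sdj = [0.0]
    for i, xj in enumerate(x[1:]):
        mj.append(mj[-1] + (xj - mj[-1]) / (i + 2))
        sdj.append(sdj[-1] + (xj - mj[-2]) * (xj - mj[-1]))
    j = np.arange(1, len(x) + 1)
    return np.array(mj), np.sqrt(np.array(sdj) / j)

x = np.array([1.5, 2.5, 1.7, 1.0, 1.2, 2.1, 1.4, 1.8])
m, sd = shrink_trunc_welford_fixed(x)
# m[-1] now agrees with np.mean(x) up to floating-point error
```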

Also, it appears the algorithm is implemented again in agrapa, although the code is slightly different (it calculates the running variance rather than the sd):

# Welford's algorithm for running mean and running sd
mj = [x[0]]
sdj2 = [0]
for i, xj in enumerate(x[1:]):
    mj.append(mj[-1] + (xj - mj[-1]) / (i + 1))
    sdj2.append(sdj2[-1] + (xj - mj[-2]) * (xj - mj[-1]))
sdj2 = sdj2 / j
# end of Welford's algorithm.

Impact of RuntimeWarning: overflow encountered in Notebook and tests?

In the published notebook assertion-RLA.ipynb there are some warnings displayed:

Find initial sample size

/opt/tljh/user/lib/python3.6/site-packages/numpy/core/fromnumeric.py:90: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)

Find measured risks for all assertions

/opt/shared/Code/assertion_audit_utils.py:984: RuntimeWarning: overflow encountered in double_scalars
  a[k+1,j] += 0 if j==0 else (1-c[k])*(j/(k+1))*a[k,j-1]
/opt/shared/Code/assertion_audit_utils.py:983: RuntimeWarning: overflow encountered in double_scalars
  a[k+1,j] = -c[k]*((k+1-j)/(k+1))*a[k,j]
/opt/shared/Code/assertion_audit_utils.py:1048: RuntimeWarning: invalid value encountered in multiply
  mart_vec = np.multiply(Y_norm,integrals)

I also see these among the warnings displayed when running pytest assertion_audit_utils.py.

Should we worry about these? Fix them? Suppress them somehow?

Break loop in new_sample_size

The while loop in new_sample_size needs to break at some point. The obvious choice is to break when the population size is reached; this is certainly correct when sampling without replacement. If the population size is reached without confirming the outcome, raise a message? When sampling with replacement, we could add a parameter to break once a certain fraction (potentially greater than 1) of the population has been sampled; for now, take that fraction to be 1.
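The proposed termination logic could look something like this hypothetical sketch (new_sample_size_bounded, confirmed, and max_fraction are illustrative names, not the SHANGRLA API):

```python
def new_sample_size_bounded(confirmed, population_size, max_fraction=1.0):
    # Grow the sample until the outcome is confirmed or the cap is reached.
    # max_fraction may exceed 1 when sampling with replacement.
    cap = int(max_fraction * population_size)
    n = 0
    while not confirmed(n):
        n += 1
        if n >= cap:
            print(f"reached cap of {cap} cards without confirming the outcome")
            n = cap
            break
    return n
```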

initial sample size returns 1

Somehow the calculation of the initial sample size broke when ballot polling was introduced. Presumably, the incremental sample size calculation is also wrong. The sample size calculations should also terminate when the size reaches the appropriate population size (max_cards or contest['cards']).

Looking for oc_cvrs.zip

Hi All:

I'm working with Vanessa Teague et al. to understand SHANGRLA and evaluate porting it.

I'm trying to run examples/OC_example.ipynb in the SHANGRLA Python repo. It references RLA files that are not in the repo, apparently from the Orange County 2022 election:

'cvr_file': '/Users/amanda/Downloads/oc_cvrs.zip',
#'cvr_file': '/Users/Jake/Desktop/oc_cvrs.zip',

Are these available somewhere that I can download?

If not, is there another example data set I can use to run a complete SHANGRLA workflow?

Thanks,
John (https://github.com/JohnLCaron)

incremental sample size: condition on the observed sample

1. Condition on the observed sample.
2. Make an assumption about the error rate for the future sample (default: the observed rate in the sample so far).
3. Find the quantile or expected incremental sample size for each assertion; sum the maximum sampling probabilities across as-yet-unsampled cards.
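The summing step above could be sketched as follows (a hypothetical helper, not existing SHANGRLA code; p_by_assertion[a][i] is assumed to be the probability that as-yet-unsampled card i is drawn for assertion a):

```python
import numpy as np

def expected_incremental_size(p_by_assertion):
    # A card is audited if *any* assertion selects it, so an upper bound on
    # the expected incremental sample size sums the per-card maxima of the
    # sampling probabilities across assertions.
    p = np.asarray(list(p_by_assertion.values()))
    return float(np.max(p, axis=0).sum())
```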

Bug in find_p_values()?

Line 1609 of assertion_audit_utils.py calls overstatement_assorter() with a "cards" argument (which was inserted in commit #37). However, overstatement_assorter() has no cards parameter, so a TypeError is thrown.
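The failure mode can be reproduced in miniature (the function body below is a hypothetical stand-in; only the cards-free signature matters):

```python
def overstatement_assorter(mvr, cvr):
    # stand-in: the real function takes no `cards` parameter
    return 0.5

try:
    overstatement_assorter({}, {}, cards=100)   # as find_p_values calls it
except TypeError as err:
    print(err)   # ... got an unexpected keyword argument 'cards'
```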
