Contributors: acouch, davipo, dwillis, nbdavies, warwickmm

openelections-data-wi's Issues

Republican State Senate results not in CSV for 2008-09-09

CSV file: 20080909__wi__primary__ward.csv

This should, for example, include Chad Fradette as a state senate candidate in Green Bay. It's present in the source data (Republican_2008_FallElection_StateSenator_WardbyWard.xls). Not sure where it's getting lost along the way.

Elections only available in PDF

These elections have results for some or all offices that are available only in PDF:

  • 2005-02-15 (ID 443)
  • 2004-11-02 (ID 444)
  • 2004-09-14 (ID 445)
  • 2004-04-06 (ID 446)
  • 2004-02-17 (ID 447)
  • 2003-07-22 (ID 685)
  • 2003-04-29 (ID 1756)

I pressed WEC to get Excel versions of these added to their site. Their spokesperson said that since they've gone through a couple reorganizations since these elections occurred, they no longer have the database that these files were produced out of. So for them to produce and host Excel files for these elections, they'd have to extract results from the PDFs.

We'll have to figure out how to process these files. I've tried Tabula and Python PDF libraries, but everything I've managed to produce has been lossy, messy, or both.

Here's an example input file:
http://elections.wi.gov/sites/default/files/elecSpec03_wbw_assm18.pdf

St Croix and Saint Croix should be the same county

In the county column, it's called St. Croix.

The circuit court office name is inconsistent:

  • Saint Croix County Circuit Court, Branch x (6 elections)
  • St Croix County Circuit Court, Branch x (4 elections)

For the county's DA, the office name is also a mix:

  • St. Croix County District Attorney (4 elections)
  • Saint Croix County District Attorney (3 elections)

For consumers of this data that might slice it by office name, it'd be nice to coalesce these spellings so that they don't have to.

St. Croix would be the best name to standardize to, so that it matches the county column. There are 13 files total that have the other two variants (Saint Croix/St Croix) that should show up as modified when a fix is in place.
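A minimal sketch of that normalization, assuming it runs as a post-processing step on office names (the pattern and function name are hypothetical, not taken from parser.py). Since the parser currently strips periods from office titles, a step like this would have to run after that stripping, or its result would be undone.

```python
import re

# Matches "Saint Croix", "St Croix" and "St. Croix" as whole words.
ST_CROIX = re.compile(r"\b(?:Saint|St\.?)\s+Croix\b")


def normalize_st_croix(office: str) -> str:
    """Rewrite every St. Croix spelling in an office name to "St. Croix"."""
    return ST_CROIX.sub("St. Croix", office)


print(normalize_st_croix("Saint Croix County Circuit Court, Branch 2"))
# St. Croix County Circuit Court, Branch 2
```

Already-correct names pass through unchanged, so the step is safe to rerun over all files.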

Recounted elections ought to use recount results

Elections with recounts (found by following the portal link to WEC's site and looking for links with "recount" in the text):

  • 2016-11-08
  • 2014-08-12
  • 2013-04-02
  • 2012-06-05
  • 2011-04-05
  • 2010-11-02
  • 2010-09-14
  • 2008-11-04
  • 2006-11-07
  • 2004-09-14
  • 2003-07-22
  • 2002-11-05
  • 2002-02-19 (now in metadata)

La Crosse County Misspelled

This came to light when I tried to run the load utility in the openelections-core repo, which tries to match county names from the CSVs in this repo to OCD IDs.

It failed, because there is no "Lacrosse" county in Wisconsin, but "Lacrosse" appears in about 1/3 of our CSV data, and "La Crosse" (correct) in the other 2/3.

Some office names also include the misspelled county name, such as circuit court and district attorney offices.
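A sketch of a pre-load check that would have flagged this, plus a whole-word fix that also reaches office names. The `known` argument would need to hold Wisconsin's 72 county names (not listed here); the correction table and function names are hypothetical.

```python
import re

# Hypothetical correction table: misspelled name -> official county name.
COUNTY_FIXES = {"Lacrosse": "La Crosse"}


def fix_county_names(text: str) -> str:
    """Replace known county misspellings, matching whole words only."""
    for wrong, right in COUNTY_FIXES.items():
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    return text


def unknown_counties(counties, known):
    """Return the sorted county values that are not in the `known` set."""
    return sorted(set(counties) - set(known))


print(fix_county_names("Lacrosse County District Attorney"))
# La Crosse County District Attorney
```

Running `unknown_counties` over the county column of every results file before loading would surface any other spellings that have no OCD ID match.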

"reporting units" to ward-level results

Wisconsin doesn't actually publish official ward (precinct)-level results: they're rolled up into "reporting units", each of which may include multiple wards. (Bigger cities are required to report ward by ward.)

The LTSB disaggregates the GAB/Wisconsin Elections Commission official results and publishes them as CSVs and as shapefiles. The LTSB version is only an estimate: they allocate each reporting unit's results to its wards proportionally, because there is no better data. (They don't take into account different turnout levels in different wards, which would be nice.)

I've asked for the code they use for this, but they said they weren't able to share it. I gather that some of it is built into their district-management utility, and some of it they just edit manually.

I put together some Jupyter notebooks that I think replicate how the LTSB processes the GAB results. They'd be pretty easy to turn into scripts, but there'd be a lot of code to handle special cases for each election. This is what someone would have to do to map any of the current election results in this repo.

Long-term, OpenElections might want to think about publishing the Wisconsin results the way the LTSB does, so they're at least sort-of "ward" level. Alternatively, including a set of files that map "reporting_unit" -> (list of FIPS codes that make up that reporting unit) would keep the original results as reported while making them easier to map.
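A minimal sketch of proportional allocation for a single reporting unit. The per-ward weight is an assumption (for example ward population or registered voters); the LTSB's actual weighting isn't documented here. The ward keys could be the codes from a "reporting_unit" -> wards mapping file. Like the LTSB estimates, this ignores turnout differences between wards.

```python
def allocate_to_wards(unit_votes, ward_weights):
    """Split one reporting unit's votes across its wards, proportionally.

    unit_votes:   votes reported for the whole reporting unit
    ward_weights: dict of ward id -> weight (assumed, e.g. ward population)
    Returns a dict of ward id -> estimated votes (may be fractional).
    """
    if not ward_weights:
        raise ValueError("reporting unit has no wards")
    total_weight = sum(ward_weights.values())
    if total_weight == 0:
        # No basis for a proportional split: divide evenly instead.
        share = unit_votes / len(ward_weights)
        return {ward: share for ward in ward_weights}
    return {
        ward: unit_votes * weight / total_weight
        for ward, weight in ward_weights.items()
    }


# Hypothetical reporting unit "Wards 1-2" with made-up weights.
print(allocate_to_wards(90, {"Ward 1": 100, "Ward 2": 200}))
# {'Ward 1': 30.0, 'Ward 2': 60.0}
```

Rounding the estimates to whole votes would need a rule (such as largest remainder) so the ward figures still add up to the reported unit total.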

Primary election data duplicated in 2012-11-06 general election file

id 409 general election file Ward%20by%20Ward_11.6.12.xls includes results for the District 33 special primary election that occurred on the same date.

This data is duplicated in the id 410 special election file:
Ward%20Results-11.6.12-Sen%2033%20Spec%20Pri.xls

Results file 20121106__wi__general__ward.csv contains all of the data in
20121106__wi__special__primary__ward.csv

Should the duplicated data be removed from 20121106__wi__general__ward.csv?
We might print a warning when detecting primary data in a general election file. (Primary election office names end in a party name.)
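A sketch of that warning, assuming primary office names end in a party name, as in "Recall State Senate-21 - Democratic" from election 1830. The party list is an assumption and would need to match the names that actually appear in the source files.

```python
import warnings

# Assumed party suffixes; extend to match the names used in the input files.
PARTY_NAMES = ("Democratic", "Republican", "Libertarian", "Constitution", "Wisconsin Greens")


def primary_offices(offices, parties=PARTY_NAMES):
    """Return the distinct office names that end in a party name."""
    return sorted({office for office in offices if office.rstrip().endswith(parties)})


def warn_if_primary(offices, filename):
    """Warn if a general-election file seems to contain primary results."""
    found = primary_offices(offices)
    if found:
        warnings.warn(f"{filename}: possible primary results: {found}")
    return found
```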

total votes is 0 in 2012-08-14 D.A. primary election results

The input file for this election has an unusual format. It is not simple to compute total votes for each reporting unit. This calculation has not been implemented. The total votes field in the results for county D.A.s is currently 0.

Results file is 20120814__wi__primary__ward.csv

Handle 2000-11-07 input files with single-line headers

We've seen single-line headers in just one election so far, id 1845 (results in 20001107__wi__general__ward.csv). Two of the six input files have a single-line header:
001107_US_SEN_SORT.xls
001107_PRES_SORT.xls
The rest have the usual two-line headers.

In the single-line headers, the party name is appended to the name of each candidate. To separate these, we can look for known party names. (Splitting off the last word does not work for two-word parties, such as "Wisconsin Greens".)
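A sketch of that split, trying longer party names first so a two-word party like "Wisconsin Greens" is matched whole. The party list is an assumption (in practice it would come from the parties seen in the input files, in the same case they use there), and the candidate name in the usage line is made up.

```python
# Assumed party names; in practice these would come from the input files.
KNOWN_PARTIES = ["Democratic", "Republican", "Libertarian", "Constitution", "Wisconsin Greens"]


def split_candidate_party(header, parties=KNOWN_PARTIES):
    """Split a single-line header cell "<candidate> <party>" into its parts.

    Longer party names are tried first, so a multi-word party is never cut
    short by a shorter name that happens to be its suffix.
    Returns (candidate, party), with party None if no known party matches.
    """
    for party in sorted(parties, key=len, reverse=True):
        if header.endswith(" " + party):
            return header[: -(len(party) + 1)].strip(), party
    return header.strip(), None


print(split_candidate_party("Jane Doe Wisconsin Greens"))
# ('Jane Doe', 'Wisconsin Greens')
```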

Party Oddities in Source Files

Some of these are things we could follow up on with WEC.

  • In 20140218__wi__primary__ward, all the candidates (for two circuit court seats) are labeled with party IND. It should be NP.
  • 20130402__wi__general__ward - same problem (party IND should be NP)
  • 20130219__wi__primary__ward - same problem (party IND should be NP)
  • 20110405__wi__general__ward.csv - candidates do not have parties listed. Both candidates should be NP.

2011-04-05 election

The metadata shows election 422 (on 2011-04-05) as a special primary, but the current direct links are for judicial results.

What happened on April 5, 2011 was two things:

  1. General (non-special) elections for circuit court, court of appeals, and supreme court.
  2. Special primary elections for assembly districts 60 and 94.

The source files currently in election 422 should be moved over to election 421, making sure that election 421 doesn't then include the same results twice. And then this will be the source file for election 422:
http://elections.wi.gov/sites/default/files/page/wxw_assm_60_94_pdf_16250.pdf

District sometimes lost in pre-2010 files with truncated columns

This test is currently failing:

    | party | candidate                     | county    | office                                    | district  | ward                                          | votes | total |
    | LIB   | Scattering                    | Marquette | State Senate                              | 14        | TOWN OF BUFFALO Wards 1 & 2                   | 1     | 1     |

Because the column is in fact blank in the CSV output:

    # 20080909__wi__primary__ward.csv
    Marquette,Town Of Buffalo Wards 1 & 2,State Senate,,1,LIB,Scattering,1

That's because the table in the source file sits in the middle of a pre-2010 file in which some of the columns are cut off.
We would normally extract the district from the office name column, but it isn't present here. We're copying over the office name from the previous chunk of the file, which is correct. But carrying over the district number would be incorrect: the previous section of the file is for district 12, and the next section is for district 16.

Party is missing from many tests

Our parser captures party data from many input files, but many of our tests don't check this field. We test it mainly for primary elections, but also for several general elections. Because party data is in our results files, it should be tested.

(The party field is needed to distinguish "Scattering" records in primary elections.)

To add party data to the tests, it should be looked up manually in the input files, to guarantee an independent test. Currently about 74 tests need party data. (109 tests specify party, about 97 are for nonpartisan offices.)

Please edit the wi-elections.feature.csv file to add party data. (The older format wi-elections.feature file may be removed, since it is not used by the new faster testing program.)

Add district field to tests

We should be testing the district field. It contains a district number for House, Court of Appeals, State Senate, and State Assembly offices.

Also, without the district field, it is difficult to find the data for a (failed) test, because the data is ordered by office, which (in the input) includes the district.

Elections Without Source Files/Direct Links

  • ID 448: 2004-01-27 special (Assembly District 17)
  • ID 664: 2003-11-18 recall general (State Senate District 6)
  • ID 674: 2003-10-21 recall primary (State Senate District 6)
  • ID 689: 2003-06-24 special primary (Assembly Districts 21, 71)

These are listed in the elections index API, but the Wisconsin Elections Commission doesn't currently have results online for them. The Wisconsin Blue Book's results aren't as detailed as those from the City of Milwaukee:

http://city.milwaukee.gov/January2720041720.htm
http://city.milwaukee.gov/November1820031723.htm
http://city.milwaukee.gov/October2120031722.htm
http://city.milwaukee.gov/June2420031726.htm

I think all of these elections (for state legislative seats) took place within the city limits. But the city's results don't get any more granular than candidate totals (county-wide, given that they're also within Milwaukee County).

Milwaukee County's election results online don't go as far back as these races. (There is an offer on their site to provide "more detailed results", for a price of $0.50 per "page".)

We can check with WEC on why these aren't listed on the state-level site.

Extra spaces in office, ward, and candidate fields

The first data example in Issue #37 shows a double space in the office name. Searching for more, I found 3172 double spaces in 25 results files. These are in the input data, but should be removed for consistency.
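A sketch of the cleanup, assuming it's applied to each output row (as a dict keyed by the CSV column names) before writing. It only touches the three fields named above; the constant and function names are hypothetical.

```python
import re

# Fields where double spaces were found.
TEXT_FIELDS = ("office", "ward", "candidate")


def collapse_spaces(value: str) -> str:
    """Collapse runs of whitespace into a single space and trim the ends."""
    return re.sub(r"\s+", " ", value).strip()


def clean_row(row: dict) -> dict:
    """Return a copy of a CSV row dict with its text fields de-spaced."""
    return {
        key: collapse_spaces(value) if key in TEXT_FIELDS and isinstance(value, str) else value
        for key, value in row.items()
    }


print(collapse_spaces("Portage County Circuit  Court Branch 2"))
# Portage County Circuit Court Branch 2
```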

Adding ward-level data via Open Data Portal/ArcGIS

Wisconsin has released election data in CSV format at a finer resolution than what OpenElections currently offers. Currently, wards are frequently lumped together in the OpenElections data (see ward names like "Town Of Merton Ward 1-3,7-9"), whereas the Wisconsin data on the ArcGIS website is broken out by individual wards. I'd be more than happy to try to integrate this data with what's currently available from OpenElections if anyone's interested. The only caveat is that individual candidate names are not included with the ArcGIS data, though that may be fixable.

Office title strips periods, which disrupts county names

This is coming up with regard to things like district attorney and circuit court elections. (Which I guess aren't part of the typical openelections scope...) Currently the office name includes the county name for these county offices. But periods get stripped out of the office, so we end up with "St Croix County Circuit Court".

Maybe that's okay, maybe not. We could try not stripping out periods and see what problems that causes. :-) I haven't spotted where this is happening in parser.py though.

Wisconsin 2014 general elections data missing an Assembly district?

It looks like Assembly district 99 got dropped in the data as checked in:

In [1]: import pandas as pd

In [2]: df = pd.read_csv("20141104__wi__general_ward.csv")

In [3]: df.columns
Out[3]: 
Index(['county', 'ward', 'office', 'district', 'total votes', 'party',
       'candidate', 'votes'],
      dtype='object')

In [4]: df.loc[df['office'] == 'Assembly']['district'].unique()
Out[4]: 
array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
        12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.,  21.,  22.,
        23.,  24.,  25.,  26.,  27.,  28.,  29.,  30.,  31.,  32.,  33.,
        34.,  35.,  36.,  37.,  38.,  39.,  40.,  41.,  42.,  43.,  44.,
        45.,  46.,  47.,  48.,  49.,  50.,  51.,  52.,  53.,  54.,  55.,
        56.,  57.,  58.,  59.,  60.,  61.,  62.,  63.,  64.,  65.,  66.,
        67.,  68.,  69.,  70.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,
        78.,  79.,  80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,
        89.,  90.,  91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.])

In [5]: len(df.loc[df['office'] == 'Assembly']['district'].unique())
Out[5]: 98

Grabbing the API metadata from http://openelections.net/api/v1/election/?format=json&limit=0&state__postal=WI and checking the .xlsx the GAB supplies, I do see AD 99:

"direct_links": [
"http://www.gab.wi.gov/sites/default/files/11.4.2014%20Election%20Results%20-%20all%20offices%20w%20x%20w%20report.xlsx"
],
"end_date": "2014-11-04",
"gov": true,
"house": true,
"id": 1574,

If this is an off-by-one error somewhere in the parser, other elections might be missing data too.
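A sketch of a check that could be run against each results file to catch this kind of gap. It assumes every district of the office is on the ballot, which holds for the Assembly (all 99 seats are elected every two years) but not for the State Senate, whose terms are staggered. It only iterates over values, so it accepts a pandas Series like the one above; the function name is hypothetical.

```python
import math


def missing_districts(districts, expected_count):
    """Return district numbers in 1..expected_count that never appear.

    `districts` may be any iterable of numbers, such as a pandas Series of
    floats (1.0, 2.0, ...); blank (None or NaN) values are skipped.
    """
    found = set()
    for d in districts:
        if d is None or (isinstance(d, float) and math.isnan(d)):
            continue
        found.add(int(d))
    return sorted(set(range(1, expected_count + 1)) - found)


# Hypothetical usage with the DataFrame above; for the districts printed
# there (1 through 98) this would report [99]:
# missing_districts(df.loc[df['office'] == 'Assembly', 'district'], 99)
```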

Election source files that include more than one election

For election 1573 on 2015-02-17, WEC has only one portal page and one ward-level Excel results file:

http://elections.wi.gov/sites/default/files/Spring%20Primary%202.17.15%20Results%20by%20Ward%20Report.xlsx

This file includes results for a partisan primary for state senate, and the non-partisan primary for judicial offices. These two things should be defined as two separate elections, but there's only the one source file.

So one of the following would be needed:

  • WEC would have to create a separate Excel file and/or webpage.
  • We would need to be able to feed the same source file for both elections, and specify somewhere (in metadata?) which tabs of the file belong to which election.
  • Use metadata to define one election as containing state legislators, and the other as not containing state legislators (having no checkboxes for office checked). Then only include results in the election's output file that match the expected offices.
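The third option might look roughly like the sketch below: the election's metadata says which offices it covers, and rows from the shared source file are routed by office name. The office set is an assumption; office spellings vary between files in this repo (e.g. "Assembly" vs "State Assembly").

```python
# Assumed state legislative office names; spellings vary between files.
LEGISLATIVE_OFFICES = {"State Senate", "State Assembly", "Assembly"}


def split_by_office(rows, offices=LEGISLATIVE_OFFICES):
    """Split result rows into (legislative, other) by their "office" value."""
    legislative, other = [], []
    for row in rows:
        target = legislative if row["office"] in offices else other
        target.append(row)
    return legislative, other
```

Each election's output file would then be written from only one of the two lists.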

Scattering in primaries (post-2010) lacks party

Check out test failures here: https://travis-ci.org/nbdavies/openelections-data-wi/builds/262484146
(These came up because I added tests for elections that were previously untested.)

An example failing assertion:

  Scenario Outline: Tests -- @42.3 20150217__wi__special__primary__ward.csv 
    When I visit the election file
    And I search for CON party candidate Scattering running for State Senate in the VILLAGE OF KEWASKUM WD 6 in Fond du Lac
      Assertion Failed: No record found with expected values
      Captured stdout:
      Expecting Con, Scattering, State Senate, Village Of Kewaskum Wd 6, Fond Du Lac

If the test didn't specify a party here, it would fail because more than one row matches (Scattering appears in each party's primary).

Fix capitalization in results data

Currently we titlecase all text fields (county, office, ward, candidate). This corrupts many spellings:
  • county: Fond du Lac
  • ward: McFarland, McKinley, Prairie du Sac, Prairie du Chien, Fond du Lac, ...
  • candidate: McCain, de Felice, VanDierendonck, LaDuke, FitzGerald, MaryAnn, "Ben Olson, III", ...

Our test data also has inconsistent capitalization. Current test code titlecases test data before comparing to results. When results capitalization is corrected, titlecasing should be removed, and test data corrected so it tests proper capitalization of results.
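The corruption comes from how title-casing works: Python's str.title(), for instance, lowercases every letter after the first in each word, so "McCain" becomes "Mccain". A sketch of a gentler rule, assuming source values are either all caps or already correctly cased: leave mixed-case values alone, title-case only all-caps ones, then patch known words from a (hypothetical) fix-up table.

```python
def smart_title(value, word_fixes):
    """Title-case an all-caps value, keeping mixed-case values as they are.

    word_fixes maps a title-cased word to its correct spelling,
    e.g. {"Du": "du", "Mccain": "McCain"} (hypothetical table).
    """
    if any(ch.islower() for ch in value):
        return value  # already mixed case in the source: trust it
    words = value.title().split(" ")
    return " ".join(word_fixes.get(word, word) for word in words)


print("McCain".title())                          # Mccain
print(smart_title("McCain", {}))                 # McCain
print(smart_title("FOND DU LAC", {"Du": "du"}))  # Fond du Lac
```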

2008-09-09 State Assembly Results are cumulative/redundant

For the 2008-09-09 election, the direct links for State Assembly results are:

Contrary to what the file names imply, each of these files includes the State Assembly results for all parties. This leads to double-reporting in our CSV output.

Either we should include only one of these files for processing, or we would need to build an exception into the parser: this would be the only election where we don't process all of the direct links.

Broken links in Metadata API

Running fetch.py in the repo to download source files from the "direct_links" URLs in the API shows that some of those links are now broken. This could be due to things getting shifted around in Wisconsin's transition from the GAB to the Elections Commission.

```
Downloading data/docview.asp?docid=6472&locid=47
Could not download file http://elections.state.wi.us/docview.asp?docid=6472&locid=47, status code 404
Downloading data/CxC_fallpri04_recount_dem_assm36.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/CxC_fallpri04_recount_dem_assm36.pdf , status code 404
Downloading data/WxW%20Fall%20Pri04_dem%20assm36%20recount.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/WxW%20Fall%20Pri04_dem%20assm36%20recount.pdf , status code 404
Downloading data/CxC%20results_fall04%20primary_con.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/CxC%20results_fall04%20primary_con.pdf , status code 404
Downloading data/WxW%20fall%20primary%2004con.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/WxW%20fall%20primary%2004con.pdf , status code 404
Downloading data/CxC%20results_fall04_primary_dem.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/CxC%20results_fall04_primary_dem.pdf , status code 404
Downloading data/WxW%20fall%20primary%2004dem.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/WxW%20fall%20primary%2004dem.pdf , status code 404
Downloading data/CxC%20results_fall04%20primary_ind.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/CxC%20results_fall04%20primary_ind.pdf , status code 404
Downloading data/WxW%20fall%20primary%2004ind.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/WxW%20fall%20primary%2004ind.pdf , status code 404
Downloading data/CxC%20results_fall04_primary_lib.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/CxC%20results_fall04_primary_lib.pdf , status code 404
Downloading data/WxW%20Fall%20primary%2004lib.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/WxW%20Fall%20primary%2004lib.pdf , status code 404
Downloading data/CxC%20results_fall04%20primary_rep.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/CxC%20results_fall04%20primary_rep.pdf , status code 404
Downloading data/WxW%20fall%20primary%2004rep.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/WxW%20fall%20primary%2004rep.pdf , status code 404
Downloading data/CxC%20results_fall04_primary_wgr.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/CxC%20results_fall04_primary_wgr.pdf , status code 404
Downloading data/WxW%20fall%20primary%2004wgr.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/WxW%20fall%20primary%2004wgr.pdf , status code 404
Downloading data/docview.asp?docid=1444&locid=47
Could not download file http://elections.state.wi.us/docview.asp?docid=1444&locid=47, status code 404
Downloading data/docview.asp?docid=1480&locid=47
Could not download file http://elections.state.wi.us/docview.asp?docid=1480&locid=47, status code 404
no results for id: 674
no results for id: 685
no results for id: 689
```

2012-05-08 includes office "Recall State Senate-21 - Democratic" (should be "State Senate")

For context, election 1830 (date 2012-05-08) is a recall primary election for governor, lieutenant governor, and state senate.

The office name for the state senate results in our output is currently "Recall State Senate-21 - Democratic" and should just be "State Senate".

Here's an example output row:

Racine,"Village Of Mount Pleasant Wards 10,11,12,15",Recall State Senate-21 - Democratic,,1256,DEM,Tamra Varebrook,390

"21" belongs in the District column instead, which is currently empty. We're already picking up "DEM" for the party.

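A sketch of splitting that office string apart. The pattern is inferred from this single example, so it is an assumption that other recall office names follow the same "<office>-<district> - <party>" shape. The party is parsed only so it can be dropped, since it's already captured as DEM; the names here are hypothetical.

```python
import re

# Shape inferred from one example: "[Recall ]<office>-<district>[ - <party>]".
OFFICE_PATTERN = re.compile(
    r"^(?:Recall\s+)?(?P<office>.+?)-(?P<district>\d+)(?:\s+-\s+(?P<party>.+))?$"
)


def split_office(raw):
    """Split a raw office string into (office, district, party).

    Strings that don't fit the pattern are returned unchanged,
    with district and party set to None.
    """
    match = OFFICE_PATTERN.match(raw.strip())
    if not match:
        return raw, None, None
    return match.group("office").strip(), match.group("district"), match.group("party")


print(split_office("Recall State Senate-21 - Democratic"))
# ('State Senate', '21', 'Democratic')
```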
Circuit court branch with/without comma

According to @davipo:

Some Circuit Court office names use a comma before Branch, and some do not. There are examples without commas in our tests for 2016 and 2017.

An example without the comma, in 2016/20160216__wi__primary__ward.csv:

Portage,Town Of Alban Ward 1,Portage County Circuit  Court Branch 2,,107,NP,Trish Baker,40

An example with the comma, in 2014/20140401__wi__general__ward.csv:

Barron,Town Of Almena Wards 1-2,"Barron County Circuit Court, Branch 2",,69,NP,J. Michael Bitney,69

There are about 31,000 lines of output without the comma vs 166,000 with comma, so I think "with comma" wins?
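If "with comma" is the standard, a sketch of the rewrite, which also collapses the double space seen in the Portage example. The function name is hypothetical.

```python
import re


def normalize_branch(office: str) -> str:
    """Use "Circuit Court, Branch N" consistently in circuit court office names."""
    office = " ".join(office.split())  # collapse repeated whitespace
    return re.sub(r"Circuit Court,? Branch\b", "Circuit Court, Branch", office)


print(normalize_branch("Portage County Circuit  Court Branch 2"))
# Portage County Circuit Court, Branch 2
```

Names that already have the comma come through unchanged.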
