openelections / openelections-data-wi
Pre-processed election results for Wisconsin elections
CSV file: 20080909__wi__primary_ward.csv
This should, for example, include Chad Fradette as a state senate candidate in Green Bay. It's present in the source data (Republican_2008_FallElection_StateSenator_WardbyWard.xls). Not sure where it's getting lost along the way.
These elections have results for some or all offices that are available only in PDF:
I pressed WEC to get Excel versions of these added to their site. Their spokesperson said that since they've gone through a couple reorganizations since these elections occurred, they no longer have the database that these files were produced out of. So for them to produce and host Excel files for these elections, they'd have to extract results from the PDFs.
We'll have to figure out how to process these files. I've tried Tabula and PDF python libraries, but everything I've managed to produce has been lossy and/or messy.
Here's an example input file:
http://elections.wi.gov/sites/default/files/elecSpec03_wbw_assm18.pdf
In election 411, these linked files contain the same results:
"http://www.gab.wi.gov/sites/default/files/Ward%20x%20Ward%20Results_8.14.12%20primary_Assembly.xls",
"http://www.gab.wi.gov/sites/default/files/Ward%20x%20Ward%20Results_8.14.12%20primary_D.A..xls",
The second file should probably contain results for district attorney (based on the file name anyway), but it contains the same Assembly results as the first file.
In the county column, it's called St. Croix.
The circuit court office name is inconsistent:
Saint Croix County Circuit Court, Branch x (6 elections)
St Croix County Circuit Court, Branch x (4 elections)
For the county's DA, the office name is also a mix:
St. Croix County District Attorney (4 elections)
Saint Croix County District Attorney (3 elections)
For consumers of this data that might slice it by office name, it'd be nice to coalesce these spellings so that they don't have to.
St. Croix would be the best name to standardize to, so that it matches the county column. There are 13 files total that have the other two variants (Saint Croix/St Croix) that should show up as modified when a fix is in place.
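A minimal sketch of how the spellings could be coalesced. The helper name `normalize_st_croix` is hypothetical and not part of parser.py; it only assumes the three variants listed above.

```python
import re

# Matches "Saint Croix", "St Croix" and "St. Croix" (any capitalization).
_ST_CROIX = re.compile(r"\b(?:Saint|St\.?)\s+Croix\b", re.IGNORECASE)


def normalize_st_croix(office: str) -> str:
    """Rewrite every St. Croix variant so it matches the county column."""
    return _ST_CROIX.sub("St. Croix", office)
```

Applying this to the office column would leave rows that already say "St. Croix" unchanged, so only the 13 affected files should show up as modified.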
Elections with recounts involved (from following the portal link to WEC's site and looking for links with "recount" visible):
2016-11-08
2014-08-12
2013-04-02
2012-06-05
2011-04-05
2010-11-02
2010-09-14
2008-11-04
2006-11-07
2004-09-14
2003-07-22
2002-11-05
2002-02-19 (now in metadata)
The API for the election index currently doesn't have 2016 elections:
http://openelections.net/api/v1/election/?format=json&limit=0&state__postal=WI
Then we can include those in the parser.
This came to light when I tried to run the load utility in the openelections-core repo. This tried to match county names from the CSVs in this repo to OCD IDs.
It failed, because there is no "Lacrosse" county in Wisconsin, but "Lacrosse" appears in about 1/3 of our CSV data, and "La Crosse" (correct) in the other 2/3.
There are also office names that include the incorrect county name, including circuit court and district attorney.
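One possible cleanup, as a sketch only. `COUNTY_FIXES` and `fix_county_names` are hypothetical names, and the only correction taken from this issue is "Lacrosse"; the row layout (a dict with `county` and `office` keys) is assumed from the CSV headers.

```python
# Hypothetical correction table; only the "Lacrosse" entry comes from this issue.
COUNTY_FIXES = {"Lacrosse": "La Crosse"}


def fix_county_names(row: dict) -> dict:
    """Return a copy of the row with known county misspellings corrected."""
    fixed = dict(row)
    for wrong, right in COUNTY_FIXES.items():
        if fixed.get("county") == wrong:
            fixed["county"] = right
        # Office names such as "Lacrosse County District Attorney"
        office = fixed.get("office", "")
        if office.startswith(wrong + " County"):
            fixed["office"] = right + office[len(wrong):]
    return fixed
```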
Wisconsin doesn't actually publish official ward (precinct)-level results - they're rolled up into "Reporting Units" which might have multiple wards included. (Bigger cities are required to report ward-by-ward.)
The LTSB disaggregates the GAB/Wisconsin Elections Commission official results and publishes them as CSVs and as Shapefiles. The LTSB version is just estimates - they proportionally allocate per-ward results from the "reporting units" because there is no better data. (They don't take into account different turnout levels of different wards, which would be nice)
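To make the idea concrete, here is a rough sketch of proportional allocation. It is not the LTSB's code (which isn't public); the weights (for example ward population) and the largest-remainder rounding are my own assumptions, chosen so the ward numbers still add up to the reporting unit total.

```python
def allocate_votes(total_votes: int, ward_weights: dict) -> dict:
    """Split a reporting unit's vote total across wards in proportion to weights."""
    weight_sum = sum(ward_weights.values())
    if weight_sum <= 0:
        raise ValueError("ward weights must sum to a positive number")
    # Exact (fractional) share for each ward
    exact = {w: total_votes * wt / weight_sum for w, wt in ward_weights.items()}
    # Start from the whole-number part of each share
    alloc = {w: int(x) for w, x in exact.items()}
    # Hand out the leftover votes to the wards with the largest remainders
    leftover = total_votes - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda w: exact[w] - alloc[w], reverse=True)
    for w in by_remainder[:leftover]:
        alloc[w] += 1
    return alloc
```

Like the LTSB estimates, this ignores turnout differences between wards unless the weights themselves reflect turnout.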
Anyway, I've asked for the code for how they do that but they said they weren't able to share - some of it I guess is built into their district-management utility and some of it I get the sense that they just manually edit.
I put together some Jupyter notebooks that I think replicate how the LTSB processes the GAB results. They'd be pretty easy to turn into some scripts, but there'd be a bunch of code to handle special cases for each election. This is what someone would have to do if they wanted to map any of the current election results in this repo.
Long-term OpenElections might want to think about publishing the Wisconsin results like the LTSB does, so they're at least sort-of "ward" level. Alternatively, including a set of files that has "reporting_unit" -> (list of FIPS codes that make up that reporting unit) keeps both the original results as-reported but makes it a bit easier to map.
id 409 general election file Ward%20by%20Ward_11.6.12.xls includes results for the District 33 special primary election that occurred on the same date.
This data is duplicated in the id 410 special election file:
Ward%20Results-11.6.12-Sen%2033%20Spec%20Pri.xls
Results file 20121106__wi__general__ward.csv contains all of the data in
20121106__wi__special__primary__ward.csv
Should the duplicated data be removed from 20121106__wi__general__ward.csv ?
We might print a warning when detecting primary data in a general election file. (Primary election office names end in a party name.)
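A sketch of what that warning could look like. `PARTY_NAMES` is an assumed list (the real one would come from wherever the parser keeps its party table), and both function names are hypothetical.

```python
import warnings

# Assumed party names; not taken from parser.py.
PARTY_NAMES = ("Democratic", "Republican", "Libertarian", "Constitution", "Wisconsin Green")


def looks_like_primary_office(office: str) -> bool:
    """Primary election office names end in a party name."""
    return office.strip().endswith(PARTY_NAMES)


def warn_on_primary_rows(offices, filename: str) -> list:
    """Warn if a general election file contains primary-style office names."""
    flagged = sorted({o for o in offices if looks_like_primary_office(o)})
    if flagged and "general" in filename:
        warnings.warn(
            f"{filename}: {len(flagged)} office name(s) look like primary results"
        )
    return flagged
```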
The input file for this election has an unusual format. It is not simple to compute total votes for each reporting unit. This calculation has not been implemented. The total votes field in the results for county D.A.s is currently 0.
Results file is 20120814__wi__primary__ward.csv
We've seen single-line headers in just one election so far, id 1845 (results in 20001107__wi__general__ward.csv). Two of the six input files have a single-line header:
001107_US_SEN_SORT.xls
001107_PRES_SORT.xls
The rest have the usual two-line headers.
In the single-line headers, the party name is appended to the name of each candidate. To separate these, we can look for known party names. (Splitting off the last word does not work for two-word parties, such as "Wisconsin Greens".)
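Roughly what I have in mind, as a sketch. `KNOWN_PARTIES` is an assumed list (only "Wisconsin Greens" is taken from the files above), and `split_candidate_party` is a hypothetical name.

```python
# Assumed party list; only "Wisconsin Greens" is confirmed by the input files.
KNOWN_PARTIES = ["Wisconsin Greens", "Democratic", "Republican", "Libertarian", "Independent"]


def split_candidate_party(header: str):
    """Split 'Candidate Name Party Name' into (candidate, party)."""
    text = " ".join(header.split())  # normalize whitespace
    # Try longer party names first so a two-word party is matched whole
    for party in sorted(KNOWN_PARTIES, key=len, reverse=True):
        if text.endswith(" " + party):
            return text[: -len(party)].rstrip(), party
    return text, ""  # no known party found
```

With a made-up header, `split_candidate_party("Jane Doe Wisconsin Greens")` gives `("Jane Doe", "Wisconsin Greens")`, where splitting off the last word would have produced the party "Greens".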
Maybe we can add something like this in tests/features/steps/steps.py?
Most input files use the second format.
Input files for two elections use the first format:
Current results for id 409 have "District Attorney" in office field, county is missing.
Some of these are things we could follow up on with WEC.
The metadata shows election 422 (on 2011-04-05) as a special primary, but the current direct links are for judicial results.
What happened on April 5, 2011 was two things:
The source files currently in election 422 should be moved over to election 421, making sure that election 421 doesn't then include the same results twice. And then this will be the source file for election 422:
http://elections.wi.gov/sites/default/files/page/wxw_assm_60_94_pdf_16250.pdf
This test is currently failing:
| party | candidate | county | office | district | ward | votes | total |
| LIB | Scattering | Marquette | State Senate | 14 | TOWN OF BUFFALO Wards 1 & 2 | 1 | 1 |
Because the column is in fact blank in the CSV output:
# 20080909__wi__primary__ward.csv
Marquette,Town Of Buffalo Wards 1 & 2,State Senate,,1,LIB,Scattering,1
And that's because the table in the source file is one in the middle of a pre-2010 file with some of the columns cut off:
We would normally extract the district from the office name column, but it isn't present here. We're copying over the office name from the previous chunk of the file, which is correct. But carrying over the district number would be incorrect: the previous section of the file is for district 12, and the next section is for district 16.
Our parser captures party data from many input files, but many of our tests don't check this field. We test it mainly for primary elections, but also for several general elections. Because party data is in our results files, it should be tested.
(The party field is needed to distinguish "Scattering" records in primary elections.)
To add party data to the tests, it should be looked up manually in the input files, to guarantee an independent test. Currently about 74 tests need party data. (109 tests specify party, about 97 are for nonpartisan offices.)
Please edit the wi-elections.feature.csv file to add party data. (The older format wi-elections.feature file may be removed, since it is not used by the new faster testing program.)
We should be testing the district field. It contains a district number for House, Court of Appeals, State Senate, and State Assembly offices.
Also, without the district field, it is difficult to find the data for a (failed) test, because the data is ordered by office, which (in the input) includes the district.
This is something I can probably help with, but just wanted to get it on the list.
ID 448: 2004-01-27 special (Assembly District 17)
664: 2003-11-18 recall general (State Senate District 6)
674: 2003-10-21 recall primary (State Senate District 6)
689: 2003-06-24 special primary (Assembly Districts 21, 71)
These are listed in the elections index API. Currently Wisconsin Elections Commission doesn't have results online for these. The Wisconsin Blue Book doesn't have as detailed results as the City of Milwaukee:
http://city.milwaukee.gov/January2720041720.htm
http://city.milwaukee.gov/November1820031723.htm
http://city.milwaukee.gov/October2120031722.htm
http://city.milwaukee.gov/June2420031726.htm
I think all of these elections (for statewide offices) occurred within the city limits. But the city's results don't get any more granular than candidate totals (county-wide, given that they're also within Milwaukee County).
Milwaukee County's election results online don't go as far back as these races. (There is an offer on their site to provide "more detailed results", for a price of $0.50 per "page".)
We can check with WEC on why these aren't listed on the state-level site.
There are examples of this inconsistency in our test file.
The first data example in Issue #37 shows a double space in the office name. Searching for more, I found 3172 double spaces in 25 results files. These are in the input data, but should be removed for consistency.
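The cleanup itself could be a one-liner applied to each text field. A sketch, with a hypothetical helper name:

```python
import re


def collapse_spaces(value: str) -> str:
    """Replace runs of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", value).strip()
```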
Wisconsin has released election data in CSV format at a finer resolution than what OpenElections currently offers. Currently, wards are frequently lumped together in the OpenElections data (see ward names like "Town Of Merton Ward 1-3,7-9"), whereas the Wisconsin data on the ArcGIS website is broken out by individual wards. I'd be more than happy to try to integrate this data with what's currently available from OpenElections if anyone's interested. The only caveat is that individual candidate names are not included with the ArcGIS data, though that may be fixable.
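As a first step toward matching the two datasets, the lumped ward names would need to be expanded into individual ward numbers. A sketch, assuming the ward numbers always come at the end of the name and use only digits, commas, ampersands, and hyphen ranges (names like "Ward 1A" are not handled):

```python
import re


def expand_ward_numbers(ward_name: str) -> list:
    """'Town Of Merton Ward 1-3,7-9' -> [1, 2, 3, 7, 8, 9]"""
    match = re.search(r"\d[\d\s,&-]*$", ward_name.strip())
    if not match:
        return []
    numbers = []
    for part in re.split(r"[,&]", match.group(0)):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as "1-3" covers both endpoints
            start, end = (int(p) for p in part.split("-", 1))
            numbers.extend(range(start, end + 1))
        else:
            numbers.append(int(part))
    return numbers
```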
This is coming up with regard to things like district attorney and circuit court elections. (Which I guess aren't part of the typical openelections scope...) Currently the office name includes the county name for these county offices. But periods get stripped out of the office, so we end up with "St Croix County Circuit Court".
Maybe that's okay, maybe not. We could try not stripping out periods and see what problems that causes. :-) I haven't spotted where this is happening in parser.py though.
It looks like Assembly district 99 got dropped in the data as checked in:
In [1]: import pandas as pd
In [2]: df = pd.read_csv("20141104__wi__general_ward.csv")
In [3]: df.columns
Out[3]:
Index(['county', 'ward', 'office', 'district', 'total votes', 'party',
'candidate', 'votes'],
dtype='object')
In [4]: df.loc[df['office'] == 'Assembly']['district'].unique()
Out[4]:
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11.,
12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22.,
23., 24., 25., 26., 27., 28., 29., 30., 31., 32., 33.,
34., 35., 36., 37., 38., 39., 40., 41., 42., 43., 44.,
45., 46., 47., 48., 49., 50., 51., 52., 53., 54., 55.,
56., 57., 58., 59., 60., 61., 62., 63., 64., 65., 66.,
67., 68., 69., 70., 71., 72., 73., 74., 75., 76., 77.,
78., 79., 80., 81., 82., 83., 84., 85., 86., 87., 88.,
89., 90., 91., 92., 93., 94., 95., 96., 97., 98.])
In [5]: len(df.loc[df['office'] == 'Assembly']['district'].unique())
Out[5]: 98
Grabbing the API metadata from http://openelections.net/api/v1/election/?format=json&limit=0&state__postal=WI and checking the .xlsx the GAB supplies I do see AD 99:
"direct_links": [
"http://www.gab.wi.gov/sites/default/files/11.4.2014%20Election%20Results%20-%20all%20offices%20w%20x%20w%20report.xlsx"
],
"end_date": "2014-11-04",
"gov": true,
"house": true,
"id": 1574,
Not sure what other elections might be missing data in case this is an off-by-one error in the parser somewhere...
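A quick way to run the same check on other files, as a sketch. It assumes 99 Assembly districts and takes whatever iterable of district values you pull out of the CSV (NaN values are skipped); the function name is hypothetical.

```python
def missing_districts(found, expected_count: int = 99) -> list:
    """Return the district numbers in 1..expected_count that never appear."""
    # d == d is False only for NaN, so blank districts are ignored
    present = {int(d) for d in found if d == d}
    return sorted(set(range(1, expected_count + 1)) - present)


# With pandas this could be called as:
#   missing_districts(df.loc[df['office'] == 'Assembly', 'district'].unique())
```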
For election 1573 on 2015-02-17, WEC has only one portal page and one ward-level Excel results file:
This file includes results for a partisan primary for state senate, and the non-partisan primary for judicial offices. These two things should be defined as two separate elections, but there's only the one source file.
So one of the following would be needed:
Check out test failures here: https://travis-ci.org/nbdavies/openelections-data-wi/builds/262484146
(These came up because I added tests for elections that were previously untested.)
An example failing assertion:
Scenario Outline: Tests -- @42.3 20150217__wi__special__primary__ward.csv
When I visit the election file
And I search for CON party candidate Scattering running for State Senate in the VILLAGE OF KEWASKUM WD 6 in Fond du Lac
Assertion Failed: No record found with expected values
Captured stdout:
Expecting Con, Scattering, State Senate, Village Of Kewaskum Wd 6, Fond Du Lac
If the test didn't specify a party for this, it would fail for more than one matching row (since Scattering appears across parties' primaries).
Currently 2004\20041102__wi__general__ward.csv includes only presidential and US Senate results.
More seems to be available (in PDFs) here:
https://elections.wi.gov/elections-voting/results/2004/fall-general
Currently we titlecase all text fields (county, office, ward, candidate). This corrupts many spellings:
county: Fond du Lac
ward: McFarland, McKinley, Prairie du Sac, Prairie du Chien, Fond du Lac, ...
candidate: McCain, de Felice, VanDierendonck, LaDuke, FitzGerald, MaryAnn, "Ben Olson, III", ...
Our test data also has inconsistent capitalization. Current test code titlecases test data before comparing to results. When results capitalization is corrected, titlecasing should be removed, and test data corrected so it tests proper capitalization of results.
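To illustrate the problem and one possible direction: Python's `str.title()` lowercases everything after the first letter of each word, so "McCain" becomes "Mccain". The sketch below only recapitalizes words that are entirely upper or lower case and leaves mixed-case words alone. The particle and suffix lists are assumptions drawn from the examples above, and all-caps input like "MCCAIN" still can't be recovered this way.

```python
LOWERCASE_PARTICLES = {"du", "de"}   # assumed, from "Fond du Lac", "de Felice"
KEEP_UPPER = {"II", "III", "IV"}     # assumed name suffixes


def gentle_title(text: str) -> str:
    """Capitalize all-caps or all-lowercase words; keep mixed-case words as-is."""
    words = []
    for word in text.split(" "):
        if word in KEEP_UPPER:
            words.append(word)
        elif word.lower() in LOWERCASE_PARTICLES:
            words.append(word.lower())
        elif word.isupper() or word.islower():
            words.append(word.capitalize())
        else:
            words.append(word)  # mixed case (or no letters): leave untouched
    return " ".join(words)
```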
For the 2008-09-09 election, the direct links for State Assembly results are:
Contrary to what the file names imply, each of these files includes the state assembly results for all parties. This leads to double-reporting in our CSV output.
Either we should include only one of these files for processing, or we would need to build this one exception into the parser, to be the only time where we don't process all of the direct links for an election.
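A third option, sketched below, would be to drop exact duplicate rows after parsing all of the files. This assumes the duplicated files produce byte-identical output rows, which I haven't verified; the function name is hypothetical.

```python
def drop_duplicate_rows(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # rows are lists of column values
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

The downside is that it would silently hide this kind of source problem in other elections, so a warning when duplicates are dropped might be worth adding.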
Running fetch.py in the repo to download source files from the "direct_link" urls in the API includes some links that are now broken. Could be due to things getting shifted around in Wisconsin's transition from the GAB to the Elections Commission.
`
Downloading data/docview.asp?docid=6472&locid=47
Could not download file http://elections.state.wi.us/docview.asp?docid=6472&locid=47, status code 404
Downloading data/CxC_fallpri04_recount_dem_assm36.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/CxC_fallpri04_recount_dem_assm36.pdf , status code 404
Downloading data/WxW%20Fall%20Pri04_dem%20assm36%20recount.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/WxW%20Fall%20Pri04_dem%20assm36%20recount.pdf , status code 404
Downloading data/CxC%20results_fall04%20primary_con.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/CxC%20results_fall04%20primary_con.pdf , status code 404
Downloading data/WxW%20fall%20primary%2004con.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/WxW%20fall%20primary%2004con.pdf , status code 404
Downloading data/CxC%20results_fall04_primary_dem.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/CxC%20results_fall04_primary_dem.pdf , status code 404
Downloading data/WxW%20fall%20primary%2004dem.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/WxW%20fall%20primary%2004dem.pdf , status code 404
Downloading data/CxC%20results_fall04%20primary_ind.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/CxC%20results_fall04%20primary_ind.pdf , status code 404
Downloading data/WxW%20fall%20primary%2004ind.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/WxW%20fall%20primary%2004ind.pdf , status code 404
Downloading data/CxC%20results_fall04_primary_lib.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/CxC%20results_fall04_primary_lib.pdf , status code 404
Downloading data/WxW%20Fall%20primary%2004lib.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/WxW%20Fall%20primary%2004lib.pdf , status code 404
Downloading data/CxC%20results_fall04%20primary_rep.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/CxC%20results_fall04%20primary_rep.pdf , status code 404
Downloading data/WxW%20fall%20primary%2004rep.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/WxW%20fall%20primary%2004rep.pdf , status code 404
Downloading data/CxC%20results_fall04_primary_wgr.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/CxC%20results_fall04_primary_wgr.pdf , status code 404
Downloading data/WxW%20fall%20primary%2004wgr.pdf
Could not download file http://www.gab.wi.gov/sites/default/files/WxW%20fall%20primary%2004wgr.pdf , status code 404
Downloading data/docview.asp?docid=1444&locid=47
Could not download file http://elections.state.wi.us/docview.asp?docid=1444&locid=47, status code 404
Downloading data/docview.asp?docid=1480&locid=47
Could not download file http://elections.state.wi.us/docview.asp?docid=1480&locid=47, status code 404
no results for id: 674
no results for id: 685
no results for id: 689
`
For context, election 1830 (date 2012-05-08) is a recall primary election for governor, lieutenant governor, and state senate.
The office name for the state senate results in our output is currently "Recall State Senate-21 - Democratic" and should just be "State Senate".
Here's an example output row:
Racine,"Village Of Mount Pleasant Wards 10,11,12,15",Recall State Senate-21 - Democratic,,1256,DEM,Tamra Varebrook,390
"21" belongs in the District column instead, which is currently empty. We're already picking up "DEM" for the party.
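A sketch of how the office string could be split, assuming the recall office names all follow the "Recall <office>-<district> - <party>" pattern seen in the example row. Names without a district (for example the governor rows) fall through unchanged. The function name is hypothetical.

```python
import re

_OFFICE = re.compile(
    r"^(?:Recall\s+)?(?P<office>.+?)-(?P<district>\d+)\s+-\s+(?P<party>\w+)$"
)


def split_recall_office(raw: str):
    """'Recall State Senate-21 - Democratic' -> ('State Senate', '21', 'Democratic')"""
    m = _OFFICE.match(raw.strip())
    if not m:
        return raw, "", ""  # pattern not recognized; leave the office as-is
    return m.group("office"), m.group("district"), m.group("party")
```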
These offices require a district: House, State Senate, State Assembly, Court Of Appeals. Parser should check that a district is present in records with these offices, and not present for others.
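Something like the following could do it, as a sketch. The office set is taken from the list above; the function name and the way problems are reported (a message string or None) are just placeholders.

```python
DISTRICT_OFFICES = {"House", "State Senate", "State Assembly", "Court Of Appeals"}


def district_problem(office: str, district) -> "str | None":
    """Return a description of the problem, or None if the record looks fine."""
    has_district = bool(str(district).strip())
    if office in DISTRICT_OFFICES and not has_district:
        return f"missing district for {office}"
    if office not in DISTRICT_OFFICES and has_district:
        return f"unexpected district {district!r} for {office}"
    return None
```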
According to @davipo:
Some Circuit Court office names use a comma before Branch, and some do not. There are examples without commas in our tests for 2016 and 2017.
An example without the comma, in 2016/20160216__wi__primary__ward.csv:
Portage,Town Of Alban Ward 1,Portage County Circuit Court Branch 2,,107,NP,Trish Baker,40
An example with the comma, in 2014/20140401__wi__general__ward.csv:
Barron,Town Of Almena Wards 1-2,"Barron County Circuit Court, Branch 2",,69,NP,J. Michael Bitney,69
There are about 31,000 lines of output without the comma vs 166,000 with comma, so I think "with comma" wins?
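If "with comma" wins, the normalization could be as small as this sketch (hypothetical helper name; it assumes "Branch" always directly follows "Circuit Court"):

```python
import re


def add_branch_comma(office: str) -> str:
    """'... Circuit Court Branch 2' -> '... Circuit Court, Branch 2'"""
    return re.sub(r"(Circuit Court),?\s+(Branch\b)", r"\1, \2", office)
```

Names that already have the comma come out unchanged, so it should be safe to apply to every file.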