alarm-redist / census-2020

Joined 2020 Census and election files for redistricting.

Home Page: https://alarm-redist.github.io/posts/2021-08-10-census-2020/

License: Other

census-2020's Introduction

2020 Redistricting Data Files

Christopher T. Kenny and Cory McCartan

License: CC BY-SA 4.0 License: MIT

This repository provides precinct-level demographic and election data from the 2020 decennial census and the Voting and Election Science Team (VEST), tidied and joined together using 2020 precinct boundaries. Where 2020 precinct boundaries are not available, Census block-level data is provided instead, and where no VEST data is available, only demographic information is provided. Code to generate the data from these sources is included; the entire workflow is open-source and reproducible.

Getting the data

The easiest way to get the data is to download it from our website. You can also download a ZIP of all the data here.

However, if you want to work with a specific set of states, or wish to join the data to a precinct shapefile, you can use the alarmdata package.

Using the data

Please make sure to cite the Voting and Election Science Team (CC-4.0) and the U.S. Census Bureau. Consult the license for information on modifying and sharing the data and/or code.

  • For redistricting and voting rights analysis, we recommend the redist package.
  • For pre-processing and tidying data for redistricting analysis, we recommend the geomander package.
  • For more custom tabulations of the 2020 census data, we recommend the PL94171 package.
  • For general-purpose census data processing, we recommend the censable package.
  • For alternate data unaffected by Census differential privacy, you may want to consider FCC block-level estimates, available using the blockpop package.

Data Format

Each data table contains several identification columns, a set of census-derived demographic columns, and a set of VEST-derived election columns.

  • GEOID20 is the unique identifier for a precinct or Census block. The state and county of the precinct or block are also provided.

  • Census variables are prefixed with pop_ or vap_, depending on whether they are for the entire population or the voting-age population. Suffixes refer to racial and ethnic categories, as follows:

    • _hisp: Hispanic or Latino (of any race)
    • _white: White alone, not Hispanic or Latino
    • _black: Black or African American alone, not Hispanic or Latino
    • _aian: American Indian and Alaska Native alone, not Hispanic or Latino
    • _asian: Asian alone, not Hispanic or Latino
    • _nhpi: Native Hawaiian and Other Pacific Islander alone, not Hispanic or Latino
    • _other: Some Other Race alone, not Hispanic or Latino
    • _two: Population of two or more races, not Hispanic or Latino
  • Election variables consist of average vote counts for Democratic and Republican candidates. The adv_## and arv_## columns report the average vote count in the ## year election, across all statewide races contested by both parties. The ndv and nrv columns further average the vote counts across all available election years. For specific statewide races, you may download the files in vest-2020/ and join them to the data using the GEOID20 column. Additional election data is provided with the following naming convention: off_yr_par_can where:

    • off indicates the three-letter office abbreviation. Possible choices are:
      • pre: President
      • uss: United States Senate
      • gov: Governor
      • atg: Attorney General
      • sos: Secretary of State
    • yr indicates the year of the election
    • par indicates the party
      • rep: Republican
      • dem: Democratic
    • can indicates the first three letters of the candidate's last name
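
The naming convention above can be unpacked programmatically. The following is a minimal sketch in base R, not code from this repository: the helper name parse_election_cols() is hypothetical, and it assumes two-digit years refer to the 2000s. The example column names pre_16_rep_tru and pre_20_dem_bid are taken from the California block file.

```r
# Split election column names of the form off_yr_par_can into their parts.
# parse_election_cols() is a hypothetical helper, not part of any package.
parse_election_cols <- function(cols) {
    parts <- strsplit(cols, "_", fixed = TRUE)
    # Keep only names with exactly four underscore-separated pieces;
    # this skips columns such as adv_16 or pop_white
    ok <- lengths(parts) == 4
    if (!any(ok)) return(data.frame())
    mat <- do.call(rbind, parts[ok])
    data.frame(
        column    = cols[ok],
        office    = mat[, 1],
        year      = 2000L + as.integer(mat[, 2]),  # assumption: years are 20xx
        party     = mat[, 3],
        candidate = mat[, 4],
        stringsAsFactors = FALSE
    )
}

parsed <- parse_election_cols(
    c("pre_16_rep_tru", "pre_20_dem_bid", "adv_16", "pop_white")
)
print(parsed)
```

Only the two race-specific columns are returned; the averaged columns (adv_##, arv_##) and the census columns do not match the four-part pattern and are dropped.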

Technical notes

To produce election data using 2020 precinct boundaries, election results were projected down to the 2010 block level using voting-age population as weights. Results for 2020 blocks were then estimated using 2010 blocks and the land-use-based crosswalk files from VEST. Finally, 2020 blocks were aggregated to 2020 Census VTDs using the Census Bureau's 2020 block assignment files.
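
The first step, projecting precinct results down to blocks with voting-age population as weights, can be illustrated with a small base R sketch. This is not the repository's implementation: the helper disaggregate() is hypothetical, all numbers are made up, and the equal-split fallback for precincts with zero total weight is an assumption made here for illustration.

```r
# Toy data (made up): two precincts and five blocks.
# block_precinct maps each block to the precinct that contains it.
precinct_votes <- c(A = 100, B = 60)
block_precinct <- c("A", "A", "A", "B", "B")
block_vap      <- c(50, 30, 20, 0, 0)

# Hypothetical helper: split each precinct total across its blocks in
# proportion to the weights. When a precinct's weights sum to zero,
# fall back to an equal split (an assumption, not the real pipeline).
disaggregate <- function(value, wts, group) {
    group_total <- ave(wts, group, FUN = sum)     # total weight per precinct
    group_n     <- ave(wts, group, FUN = length)  # number of blocks per precinct
    share <- ifelse(group_total > 0, wts / group_total, 1 / group_n)
    unname(value[group]) * share
}

block_votes <- disaggregate(precinct_votes, block_vap, block_precinct)
print(block_votes)

# Summing block estimates back up recovers the precinct totals
print(tapply(block_votes, block_precinct, sum))
```

Because each block receives a share of its precinct's total, the estimates are generally fractional, which is why vote columns in the data are not always whole numbers.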

census-2020's Issues

Disaggregation method

Hello! I'm looking to learn a little more about your disaggregation process from precincts to blocks. In your code 00_build_vest.R, it looks like you build the 2010 block file at line 77:

dec <- build_dec('block', state = state, geometry = FALSE, groups = 'all', year = 2010)

and then assign the votes down based on the 2010 VAP ratio at lines 86-89:

elec_at_2010 <- elec_at_2010 %>%
    mutate(!!election := estimate_down(
        value = vest[[election]], wts = dec[['vap']], group = match_list
    ))

I wanted to confirm that you do not take intra-Census data into account for the disaggregation process (e.g. for 2018 elections, disaggregating on some sort of 2018 demographic, like a voter file). As I understand it, you use one ratio per block, dependent on which Census year and demographic are used (in build_dec and estimate_down, respectively). If you do integrate intra-decennial data, please let me know where I could find it in the code!

download_redistricting_file

Thanks for the great data and associated packages. If I'm not mistaken, the namespace of the download() function was unclear in the original function. I also found that creating the res object is unnecessary. The downside of my version below is that it adds dependencies on {here} and {RCurl}, but I think the overall code is less dense and more readable.

# Download the joined census/VEST file for one state into `folder`.
# Prefers the VTD-level file and falls back to the block-level file.
# Requires the {RCurl} and {here} packages.
download_redistricting_file <- function(abbr, folder) {
    abbr <- tolower(abbr)

    base_url <- paste0("https://raw.githubusercontent.com/alarm-redist/census-2020/",
                       "main/census-vest-2020/")
    url_vtd <- paste0(base_url, abbr, "_2020_vtd.csv")
    url_block <- paste0(base_url, abbr, "_2020_block.csv")

    # Use the VTD file when it exists; otherwise fall back to blocks
    url <- if (RCurl::url.exists(url_vtd)) url_vtd else url_block

    # Namespace here::here() explicitly so the function works without library(here)
    path <- here::here(folder, basename(url))

    utils::download.file(url, path)
}

more detailed readme.md

You've published your R code. For people who don't know R, though, could the readme.md please describe the code's logic in more detail? Thanks.

States without 2016 Presidential vote

Only two states are missing the 2016 Presidential vote:

  • West Virginia
  • Mississippi

This is not really a bug, because VEST, MGGG, and openelections don't seem to have that data either. I'm just flagging it here in case someone finds a solution. It would be great to have all 50 states.

It is worth noting that the NYT Upshot has 2020 precinct results in GeoJSON for WV and MS. They don't seem to have 2016 results for those states, though.

more votes in a VTD than population

In New Mexico, there are voting districts that have more votes than voting-age population (or than total population). This holds when joining back to the race-specific file. Some examples are below, where adrv_18 is the sum of the average Democratic and average Republican votes and sen_18 is the number of votes counted in the 2018 Senate race in New Mexico.

Should this be possible?

# A tibble: 52 × 7
       GEOID20   pop   vap adv_18 arv_18 adrv_18 sen_18
         <dbl> <dbl> <dbl>  <dbl>  <dbl>   <dbl>  <dbl>
 1 35027000028   178   151   187    440.    627.   465.
 2 35001000567   197   151   244.   202.    446.   330.
 3 35001000368   421   339   476.   351.    827.   615.
 4 35049000123   413   396   822.   138.    960.   698.
 5 35051000009   153   123   146.   142.    288.   214.
 6 35001000380   491   387   458.   373.    831.   625.
 7 35049000078   835   693  1248    269.   1517.  1097.
 8 35049000120   554   495   847.   168.   1016.   735.
 9 35001000537   514   436   523    348.    871.   645.
10 35049000122   480   437   728.   114.    842    607 
# … with 42 more rows
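
A check like the one described above can be sketched in base R. This is only an illustration: it re-enters the first three rows of the table as displayed (so the values are rounded), and the column votes_per_vap is a name introduced here, not a column in the data.

```r
# First three rows of the table above, as displayed (values are rounded)
nm <- data.frame(
    GEOID20 = c("35027000028", "35001000567", "35001000368"),
    pop     = c(178, 197, 421),
    vap     = c(151, 151, 339),
    adv_18  = c(187, 244, 476),
    arv_18  = c(440, 202, 351),
    stringsAsFactors = FALSE
)

# Sum of the average Democratic and Republican votes, as in adrv_18
nm$adrv_18 <- nm$adv_18 + nm$arv_18

# Two-party votes per voting-age resident; values above 1 mean more
# votes than voting-age population (votes_per_vap is a hypothetical name)
nm$votes_per_vap <- nm$adrv_18 / nm$vap

flagged <- nm[nm$adrv_18 > nm$vap,
              c("GEOID20", "vap", "adrv_18", "votes_per_vap")]
print(flagged)
```

GEOID20 is stored as character in this sketch so that identifiers in states with a leading-zero FIPS code are not altered when read as numbers.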

Blocks with zero population in California

About 25 percent of rows in the CA data have 0 population. 11 percent of rows have 0 population and 0 election vote tallies. Every CA county has these zero blocks to some extent. What should we make of these blocks?

suppressPackageStartupMessages(library(dplyr))
library(readr)


gh_url <- "https://github.com/alarm-redist/census-2020/raw/main/"
ca_alarm <- read_csv(paste0(gh_url, "census-vest-2020/ca_2020_block.csv"),
                     show_col_types = FALSE)

nrow(ca_alarm) # total rows
#> [1] 519723

sum(ca_alarm$pop == 0) # rows with 0 population
#> [1] 142132
sum(ca_alarm$vap == 0) # rows with 0 vap
#> [1] 144154

# row indices of rows with no presidential vote data
no_pres_rows <- which(with(ca_alarm,
    pre_16_rep_tru == 0 & pre_16_dem_cli == 0 &
    pre_20_rep_tru == 0 & pre_20_dem_bid == 0))
length(no_pres_rows) # rows with no election data
#> [1] 187930

sum(ca_alarm$pop[no_pres_rows] == 0) # rows with no election data AND no population
#> [1] 59249

Created on 2021-11-04 by the reprex package (v2.0.1)

Oregon and Hawaii also have similar zero pop blocks, though to a lesser extent.
