This project forked from corymccartan/birdie

Bayesian Instrumental Regression for Disparity Estimation

Home Page: http://corymccartan.com/birdie/

License: GNU General Public License v3.0

C++ 44.38% C 0.19% R 55.03% Stan 0.40%

BIRDiE: Estimating disparities when race is not observed

Bayesian Instrumental Regression for Disparity Estimation (BIRDiE) is a class of Bayesian models for accurately estimating conditional distributions by race, using Bayesian Improved Surname Geocoding (BISG) probability estimates of individual race. This package implements BIRDiE as described in McCartan, Goldin, Ho and Imai (2022). It also implements standard BISG and an improved measurement-error BISG model as described in Imai, Olivella, and Rosenman (2022).

Installation

BIRDiE is not yet available on CRAN. You can install the latest version of the package with:

install.packages("birdie", repos = "https://corymccartan.r-universe.dev")

You can also install the development version with:

# install.packages("remotes")
remotes::install_github("CoryMcCartan/birdie")

Basic Usage

A basic analysis has two steps. First, compute BISG probability estimates with the bisg() or bisg_me() functions, or with any other probabilistic race prediction tool. Then, estimate the distribution of an outcome variable by race using the birdie() function.

library(birdie)

data(pseudo_vf)

head(pseudo_vf)
#> # A tibble: 6 × 4
#>   last_name zip   race  turnout
#>   <fct>     <fct> <fct> <fct>  
#> 1 BEAVER    28748 white yes    
#> 2 WILLIAMS  28144 black no     
#> 3 ROSEN     28270 white yes    
#> 4 SMITH     28677 black yes    
#> 5 FAY       28748 white no     
#> 6 CHURCH    28215 white yes

To compute BISG probabilities, you provide the last name and (optionally) geography variables as part of a formula.

r_probs = bisg(~ nm(last_name) + zip(zip), data=pseudo_vf)

head(r_probs)
#> # A tibble: 6 × 6
#>   pr_white pr_black pr_hisp pr_asian  pr_aian pr_other
#>      <dbl>    <dbl>   <dbl>    <dbl>    <dbl>    <dbl>
#> 1    0.956  0.00371  0.0103 0.000674 0.00886    0.0202
#> 2    0.162  0.795    0.0122 0.00102  0.000873   0.0292
#> 3    0.943  0.00378  0.0218 0.0107   0.000386   0.0202
#> 4    0.569  0.365    0.0302 0.00114  0.00108    0.0339
#> 5    0.971  0.00118  0.0131 0.00149  0.00118    0.0125
#> 6    0.524  0.315    0.0909 0.00598  0.00255    0.0610
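To give a feel for what these probabilities represent, here is a minimal base-R sketch of the general BISG idea: Bayes' rule combines a surname-based race distribution with how common the geography is within each race. The input numbers are hypothetical, chosen only to make the arithmetic easy to follow. The sketch is not the package's internal implementation, and bisg() draws on real reference tables instead.

```r
# Hypothetical inputs for one individual (not taken from any real table).
# Pr(race | surname): race distribution among people with this surname.
p_race_given_surname <- c(white = 0.60, black = 0.30, hisp = 0.10)
# Pr(geography | race): share of each racial group living in this ZIP code.
p_geo_given_race <- c(white = 0.002, black = 0.008, hisp = 0.004)

# Bayes' rule: Pr(race | surname, geography) is proportional to
# Pr(race | surname) * Pr(geography | race).
unnormalized <- p_race_given_surname * p_geo_given_race
posterior <- unnormalized / sum(unnormalized)

print(round(posterior, 3))
```

With these made-up inputs, the geography shifts most of the probability from white to black, because this ZIP code is assumed to be four times as common among Black residents as among white residents.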

Computing regression estimates requires specifying a model structure. Here, we’ll use a Categorical-Dirichlet regression model that lets the relationship between turnout and race vary by ZIP code. This is the “no-pooling” model from McCartan et al. We’ll use Gibbs sampling for inference, which will also let us capture the uncertainty in our estimates.

fit = birdie(r_probs, turnout ~ proc_zip(zip), data=pseudo_vf, 
             family=cat_dir(), algorithm="gibbs")
#> Using weakly informative empirical Bayes prior for Pr(Y | R)
#> This message is displayed once every 8 hours.

print(fit)
#> Categorical-Dirichlet BIRDiE model
#> Formula: turnout ~ proc_zip(zip)
#>    Data: pseudo_vf
#> Number of obs: 5,000
#> Estimated distribution:
#>     white black  hisp asian  aian other
#> no  0.293  0.34 0.372 0.569 0.685 0.499
#> yes 0.707  0.66 0.628 0.431 0.315 0.501

The proc_zip() function fills in missing ZIP codes, among other things. We can extract the estimated conditional distributions with coef(). We can also get updated BISG probabilities that additionally condition on turnout using fitted(). Additional functions allow us to extract a tidy version of our estimates (tidy()) and visualize the estimated distributions (plot()).

coef(fit)
#>         white     black      hisp     asian      aian     other
#> no  0.2934753 0.3403649 0.3720582 0.5687325 0.6847874 0.4994076
#> yes 0.7065247 0.6596351 0.6279418 0.4312675 0.3152126 0.5005924

head(fitted(fit))
#> # A tibble: 6 × 6
#>   pr_white pr_black pr_hisp pr_asian  pr_aian pr_other
#>      <dbl>    <dbl>   <dbl>    <dbl>    <dbl>    <dbl>
#> 1   0.961   0.00349 0.0101  0.000523 0.00577    0.0195
#> 2   0.0765  0.893   0.00814 0.00102  0.00106    0.0207
#> 3   0.932   0.00542 0.0287  0.00538  0.000384   0.0286
#> 4   0.587   0.352   0.0260  0.000833 0.000783   0.0335
#> 5   0.945   0.00224 0.0219  0.00368  0.00334    0.0238
#> 6   0.528   0.324   0.0895  0.00379  0.00143    0.0538
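The idea behind conditioning on the outcome can be sketched in base R as a second Bayes update: the BISG probabilities act as a prior, and the estimated Pr(turnout | race) acts as a likelihood. The sketch below uses the second row of r_probs and the overall "no" row of coef(fit) printed above. It is an illustration only and should not be expected to reproduce the fitted() values, since the model fitted above lets the turnout-race relationship vary by ZIP code while this sketch uses a single overall estimate.

```r
races <- c("white", "black", "hisp", "asian", "aian", "other")

# BISG probabilities for individual 2 (copied from head(r_probs) above).
prior <- setNames(c(0.162, 0.795, 0.0122, 0.00102, 0.000873, 0.0292), races)

# Overall Pr(turnout = "no" | race), copied from coef(fit) above.
# Individual 2 did not vote, so the "no" row is the relevant likelihood.
p_no_given_race <- setNames(
  c(0.2934753, 0.3403649, 0.3720582, 0.5687325, 0.6847874, 0.4994076),
  races
)

# Bayes update: Pr(race | turnout, name, ZIP) is proportional to
# Pr(turnout | race) * Pr(race | name, ZIP).
unnormalized <- prior * p_no_given_race
updated <- unnormalized / sum(unnormalized)

print(round(updated, 4))
```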

tidy(fit)
#> # A tibble: 12 × 3
#>    turnout race  estimate
#>    <chr>   <chr>    <dbl>
#>  1 no      white    0.293
#>  2 yes     white    0.707
#>  3 no      black    0.340
#>  4 yes     black    0.660
#>  5 no      hisp     0.372
#>  6 yes     hisp     0.628
#>  7 no      asian    0.569
#>  8 yes     asian    0.431
#>  9 no      aian     0.685
#> 10 yes     aian     0.315
#> 11 no      other    0.499
#> 12 yes     other    0.501

plot(fit)
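Once the estimates are in tidy form, a disparity is simply a difference between rows. As a minimal sketch, the snippet below rebuilds the "yes" rows of the tidy(fit) output above as a plain data frame (so it runs without the package) and computes the estimated gap in turnout rates between white and Black voters. The object name est is an assumption made for illustration.

```r
# Estimated Pr(turnout = "yes" | race), copied from the tidy(fit) output above.
est <- data.frame(
  turnout  = "yes",
  race     = c("white", "black", "hisp", "asian", "aian", "other"),
  estimate = c(0.707, 0.660, 0.628, 0.431, 0.315, 0.501)
)

# Estimated white-Black gap in turnout rates, in proportion points.
gap <- est$estimate[est$race == "white"] - est$estimate[est$race == "black"]
print(round(gap, 3))
```

This is a point estimate only. It carries no measure of uncertainty.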

A more detailed introduction to the method and software package can be found on the Get Started page.
