
purging: An R package for addressing mediation effects among independent variables


Why use purging?

Although there are some great packages for mediation analysis, the intuition for why it is needed is often unclear, especially to newer graduate students. This set of files therefore provides a quick, intuitive overview of mediation and offers a simple method for "purging" variables for use in multivariate analysis. The result of these files is an R package I recently released on CRAN. See the purging.R file for the script covering several functional forms, and the purging example.R script for an applied example using real data (see examples below).

Suppose we are interested in whether membership on a committee tied to a specific issue domain influences the likelihood of sponsoring legislation on that issue. In the American context, where representational responsibilities permeate legislative behavior, district characteristics such as employment in related industries likely influence self-selection onto issue-specific committees in the first place, and we would also expect them to influence the likelihood of related bill sponsorship. In this context we therefore have a mediation model: employment/industry (indirect) -> committee membership (direct) -> sponsorship. We would want to purge committee membership of the effects of district employment/industry to observe the "pure" effect of committee membership on the likelihood of related sponsorship.

Or consider an example from a different realm. Suppose women's level of labor force participation determines their level of contraceptive use, and the effect of female labor force participation on fertility is indirect, essentially filtered through its impact on contraceptive use. Once we control for contraceptive use, the effect of labor force participation seen in the simple bivariate model (lm1 in the example code, purging example.R) goes away. In other words, the effect of labor force participation on fertility is likely indirect and filtered through contraceptive use, which also means the two variables are highly correlated. Many thanks to Scott Basinger and Patrick Shea (University of Houston) for the basis of this second example and for the UN data from their graduate stats labs, which gave me the idea of developing an R package dedicated to this issue.
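The pattern described above can be reproduced with a small simulation. This is only a sketch on made-up data, not the UN data: the variable names and effect sizes are assumptions chosen so that the outcome depends on the indirect variable only through the mediator.

```r
# Simulated data for illustration only (hypothetical effect sizes).
set.seed(42)
n <- 1000
labor         <- rnorm(n)                      # indirect variable
contraception <- 0.8 * labor + rnorm(n)        # mediator, driven by labor
fertility     <- -1.0 * contraception + rnorm(n)  # outcome depends only on the mediator

# Bivariate model: labor appears to have a clear negative effect.
lm1 <- lm(fertility ~ labor)

# Controlling for the mediator: the labor coefficient shrinks toward zero.
lm2 <- lm(fertility ~ labor + contraception)

coef(lm1)["labor"]
coef(lm2)["labor"]

# The two predictors are also strongly correlated, as the text notes.
cor(labor, contraception)
```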

These two examples offer simple ways of thinking about mediation effects (e.g., labor force (indirect) -> contraception (direct) -> fertility). If we run into this problem, a simple solution is "purging". The method has three steps. First, regress the direct variable (here, contraceptive use) on the indirect variable (here, labor force participation). Second, store the residuals from that regression, which represent the variation in contraceptive use that remains after accounting for labor force participation. Third, add the stored residuals to the updated specification as their own "purged variable". Essentially, purging yields a new direct variable that is uncorrelated with the indirect variable. Each variable then explains unique variance in the DV of interest (you can check this in several ways, such as comparing correlation coefficients or comparing R^2 across specifications). See the code file, purging.R, for the example using real data from the United Nations Development Programme.
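The three steps can be sketched in base R as follows. This is a minimal illustration on simulated data with assumed names (purged_cont, first_stage); it is not taken from purging.R and does not use the UN data.

```r
# Simulated data for illustration only (hypothetical effect sizes).
set.seed(1)
n <- 1000
labor         <- rnorm(n)
contraception <- 0.8 * labor + rnorm(n)
fertility     <- -1.0 * contraception + rnorm(n)
df <- data.frame(labor, contraception, fertility)

# Step 1: regress the direct variable on the indirect variable.
first_stage <- lm(contraception ~ labor, data = df)

# Step 2: store the residuals as the purged variable.
df$purged_cont <- residuals(first_stage)

# Step 3: use the purged variable in the updated specification.
fit_purged <- lm(fertility ~ labor + purged_cont, data = df)

# Check: the purged variable is uncorrelated with the indirect variable
# (zero up to floating-point error).
cor(df$purged_cont, df$labor)

# Check: R^2 is unchanged relative to the unpurged specification.
fit_raw <- lm(fertility ~ labor + contraception, data = df)
c(purged = summary(fit_purged)$r.squared, raw = summary(fit_raw)$r.squared)
```

Because the purged variable is orthogonal to labor by construction, the labor coefficient in fit_purged equals the coefficient from the simple bivariate regression of fertility on labor.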

The UN.csv file is a small dataset based on the 2005 Human Development Report of the United Nations Development Programme, with all data from 2003. Variables include: Human Development Index (HDI: combining female life expectancy at birth (Life), educational attainment, and income per capita); fertility rate (Fert: births per adult female); percentage of women using contraception (Cont); technology adoption, measured as cell phone subscribers per 1000 people (Cell) and internet subscribers per 1000 people (Inter); per capita gross domestic product in US dollars (GDP); carbon dioxide emissions per capita in metric tons (CO2); female adult literacy rate (Liter); and adult women in the labor force per 100 men (FemEc).

Feel free to reach out if anything is unclear or if you want to chat more about mediation models/analysis, causal inference, etc. Once the intuition is mastered, check out some great work on mediation from many folks, including Kosuke Imai (Princeton), Luke Keele (Georgetown), and several others. See Imai's mediation site as a sound starting place with code, papers, and more.

How do I use purging?

Researchers should use purging if they are concerned that mediation may be "canceling out" the effect of one variable because of another when both are included in a single multivariate analysis. The idea behind the package, then, is to generate a new direct-impact variable for use in the analysis, purged of the effects of the indirect variable, simply by passing the name of the data frame, the direct variable, and the indirect variable to the function. Calling the function generates a new object (i.e., the variable), which can then be added to a data frame using the $ operator in R with the following line of code: df$purged.var <- purged.var.
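The workflow can be sketched without the package as follows. The helper purge_lm_sketch is a hypothetical stand-in that mimics the interface described above (data frame, direct column name, indirect column name); it is an assumption for illustration, not the package's source code.

```r
# Hypothetical stand-in for a linear purge; NOT the package's implementation.
purge_lm_sketch <- function(data, direct, indirect) {
  # Build the formula direct ~ indirect from the two column names.
  first_stage <- lm(reformulate(indirect, response = direct), data = data)
  # Return the residuals as a plain numeric vector.
  unname(residuals(first_stage))
}

# Small made-up data set (values chosen so the fit is not perfect).
df <- data.frame(A = c(2, 4, 3, 7, 8, 6, 9, 12, 10, 13), B = 1:10)

# Generate the purged variable, then attach it with the $ operator.
purged.var <- purge_lm_sketch(df, "A", "B")
df$purged.var <- purged.var

head(df)
```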

Importantly, the package supports several functional forms, depending on the mediating variable in question: linear for continuous data, logit and probit for binary data, and Poisson and negative binomial for event count data. The functional form is given after the purge. prefix in the function name (e.g., purge.logit). The following three examples correspond to each type of data.

## Load the package (assumes it is already installed from CRAN)
library(purging)

## First, linear/continuous example
df <- data.frame(A = 1:10, B = 2:11) # create some continuous data

purge.lm(df, "A", "B") # df = data frame; "A" = column treated as "direct"; "B" = column treated as "indirect"

## Second, logit/binary example
df <- data.frame(A = rep(0:1, 20), B = 1:20) # create some binary response data

purge.logit(df, "A", "B") # same syntax as above; to use the probit version, replace `.logit` with `.probit`

## Third, Poisson/counts example
df <- data.frame(A = c(1,1,1,1,1,2,2,2,3,4), B = 1:10) # create some count data

purge.poisson(df, "A", "B") # same syntax as above; to use the negative binomial version, replace `.poisson` with `.negbin`
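For intuition about the non-linear forms, the same residual idea can be sketched with base R's glm(). This is an assumption-laden illustration on simulated data: the choice of response-scale residuals and the column names A_bin and A_count are mine, and the package may compute its purged variables differently.

```r
# Simulated data for illustration only; column names are hypothetical.
set.seed(7)
n <- 300
df <- data.frame(B = rnorm(n))
df$A_bin   <- rbinom(n, size = 1, prob = plogis(0.5 * df$B))  # binary "direct" variable
df$A_count <- rpois(n, lambda = exp(0.3 * df$B))              # count "direct" variable

# Binary case: logit first stage (use link = "probit" for a probit first stage).
logit_fit <- glm(A_bin ~ B, family = binomial(link = "logit"), data = df)
# Assumption: response-scale residuals (observed minus fitted probability).
purged_logit <- residuals(logit_fit, type = "response")

# Count case: Poisson first stage.
pois_fit <- glm(A_count ~ B, family = poisson(link = "log"), data = df)
# Assumption: response-scale residuals (observed minus fitted mean).
purged_pois <- residuals(pois_fit, type = "response")

# With these canonical links, the response residuals are uncorrelated
# with B in-sample, up to the tolerance of the fitting algorithm.
cor(purged_logit, df$B)
cor(purged_pois, df$B)
```

A negative binomial first stage would need an additional fitting function such as MASS::glm.nb, which is omitted here to keep the sketch in base R.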

How do I get purging?

You can download the package and documentation at CRAN. If you have any questions or find any bugs requiring fixing, please feel free to contact me.

How do I cite purging?

A formal paper is being prepared and is hopefully coming soon. Stand by...

But for now, you can manually cite it:

@Manual{waggoner2018purging,
title = {purging: An R package for addressing mediation effects among independent variables},
author = {Philip D. Waggoner},
year = {2018},
note = {R package version 1.0.0}
}

Thanks and enjoy!
