Income inequality between men and women is a pernicious issue in the United States. From the national soccer team to C-suites, the push for equal pay continues. Sources cite various statistics, such as wage gaps within the same job or the gendered composition of industries, to describe the inequality. This raises two underlying questions: Is there a significant difference in income between men and women? If so, what factors drive the difference?
The National Longitudinal Survey of Youth, 1979 cohort (NLSY79) is a rich source for answering these questions. The NLSY79 dataset contains survey responses from thousands of individuals who were surveyed every one or two years from 1979 through 2012. Respondents were between 14 and 22 years old when first surveyed in 1979, and the dataset has 12,686 observations before cleaning.
This project addresses the questions above by analyzing the NLSY79 dataset. It presents statistical evidence of an income difference between genders, even within the same profession and industry. The project is structured as follows:
- Importing the Dataset
- Variables Considered for Analysis and Rationale
- Data Cleaning and Imputation
- Recoding Factors
- Exploratory Analysis of the Variables
- Regression Analysis
- Findings & Conclusion
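
As a rough illustration of what the regression step in the outline is meant to capture, the sketch below fits an ordinary least squares model in which income depends on a gender indicator while an industry indicator is held fixed. It is not the project's actual code: the records are invented toy values (not NLSY79 data), and the names `records`, `female`, `industry`, and `gender_gap` are assumptions made for this example.

```python
import numpy as np

# Toy records invented purely for illustration (NOT NLSY79 data):
# (annual income, is_female, industry_code)
records = [
    (52000, 0, 0), (48000, 1, 0), (50000, 0, 0), (45000, 1, 0),
    (61000, 0, 1), (56000, 1, 1), (63000, 0, 1), (58000, 1, 1),
]

income = np.array([r[0] for r in records], dtype=float)
female = np.array([r[1] for r in records], dtype=float)
industry = np.array([r[2] for r in records], dtype=float)

# Design matrix: intercept, gender indicator, industry indicator.
X = np.column_stack([np.ones_like(income), female, industry])

# Ordinary least squares fit: income ~ 1 + female + industry.
coef, *_ = np.linalg.lstsq(X, income, rcond=None)

# Coefficient on the gender indicator: the estimated income difference
# for women relative to men, holding industry constant.
gender_gap = coef[1]
print(f"Estimated gap within industry: {gender_gap:.0f}")
```

A negative coefficient on the gender indicator means lower estimated income for women than for men in the same industry. A full analysis would also need standard errors and further controls before calling a gap significant.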
Note: The files in this repository were uploaded retrospectively, after the project was completed.
Project Date: Fall 2019