The data comprise roughly 25,000 records for males between the ages of 18 and 70 who are full-time workers. Several variables are given for each subject: years of education and job experience, college graduate (yes, no), working in or near a city (yes, no), US region (midwest, northeast, south, west), commuting distance, number of employees in the company, and race (African American, Caucasian, Other). The response variable is weekly wages (in dollars). The data were collected many decades ago, so the wages are low compared to current levels. The data set salary.txt is included in this directory.
We are interested in whether average male wages differ by a statistically significant amount across the three race classes. Specifically, answer the following research questions:
- Do African American males have statistically different wages compared to Caucasian males?
- Do African American males have statistically different wages compared to all other males?
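One possible starting point for both questions is a two-sample comparison of mean wages. The sketch below uses only the Python standard library and computes Welch's t statistic with a normal approximation for the p-value, which should be reasonable for groups of this size. Everything in it is an assumption for illustration: the wages are synthetic (not read from salary.txt), the function name `welch_test` is hypothetical, and "all other males" is interpreted here as everyone who is not African American.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def welch_test(a, b):
    """Return (difference in means, Welch t statistic, approximate two-sided p-value)."""
    mean_diff = fmean(a) - fmean(b)
    # Welch standard error: does not assume equal variances in the two groups.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = mean_diff / se
    # Normal approximation to the t distribution; only appropriate for large groups.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return mean_diff, t, p


# Synthetic weekly wages for illustration only. These are NOT the real data;
# the group means and spreads below are made up.
rng = random.Random(0)
wages = {
    "African American": [rng.gauss(450, 150) for _ in range(2000)],
    "Caucasian": [rng.gauss(500, 150) for _ in range(2000)],
    "Other": [rng.gauss(480, 150) for _ in range(2000)],
}

aa = wages["African American"]

# Question 1: African American vs. Caucasian.
q1 = welch_test(aa, wages["Caucasian"])

# Question 2: African American vs. everyone else (assumed interpretation).
q2 = welch_test(aa, wages["Caucasian"] + wages["Other"])

for label, (diff, t, p) in (("vs. Caucasian", q1), ("vs. all others", q2)):
    print(f"{label}: mean difference = {diff:.1f}, t = {t:.2f}, p = {p:.3g}")
```

A raw comparison like this ignores education, experience, region, and the other variables, so it answers only whether the unadjusted averages differ.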
You may answer these questions in whatever way you like. Data scientists have a wide range of tools in their toolkit, which means you will need to decide on the best way to approach the problem. Some ideas are linear regression, logistic regression, and k-nearest neighbors. Please include whatever graphs, visualizations, or key results you used to help formulate a response to these questions. In terms of code, I recommend using Python or R; however, you are welcome to use other languages if you see fit.
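As a rough illustration of the linear regression idea, the sketch below fits ordinary least squares with NumPy, using Caucasian as the reference class so that the coefficient on the African American indicator estimates the wage gap after adjusting for education. The data are synthetic and the coefficients used to generate them are invented; the variable names (`education`, `is_aa`, `is_other`) are assumptions and do not reflect the actual column names or layout of salary.txt.

```python
import numpy as np

# Synthetic data for illustration only; NOT the real salary.txt records.
rng = np.random.default_rng(0)
n = 5000
race = rng.choice(["African American", "Caucasian", "Other"], size=n, p=[0.2, 0.6, 0.2])
education = rng.uniform(8, 20, size=n)  # years of education (made up)

# Dummy-code race with Caucasian as the reference class.
is_aa = (race == "African American").astype(float)
is_other = (race == "Other").astype(float)

# Made-up data-generating process: a 50 dollar gap for African American workers.
wage = 200 + 30 * education - 50 * is_aa - 20 * is_other + rng.normal(0, 100, size=n)

# Design matrix: intercept, education, and the two race indicators.
X = np.column_stack([np.ones(n), education, is_aa, is_other])
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)

# Classical OLS standard errors (assumes homoskedastic, independent errors).
residuals = wage - X @ beta
sigma2 = residuals @ residuals / (n - X.shape[1])
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta / std_err

for name, b, se, t in zip(["intercept", "education", "is_aa", "is_other"], beta, std_err, t_stats):
    print(f"{name:>10}: coef = {b:8.2f}, se = {se:6.2f}, t = {t:7.2f}")
```

For the second research question, one option under the same assumptions is to drop `is_other` so that the reference class becomes all non-African American males. On the real data, the remaining variables (experience, region, city, and so on) could be added as further columns of `X`.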