natvalenz / fertility

Machine learning applied to the Fertility and Women's Labor Supply data set to predict whether someone will want more kids based on their age, ethnicity, weeks worked, and the gender of their first child.

Jupyter Notebook 100.00%
decision-trees fertility knn logistic-regression machine-learning-algorithms random-forest

Fertility and Women's Labor Supply

Research question

Are KNN, logistic regression, decision trees, and random forest good models for predicting whether someone will want more kids based on their age, ethnicity, weeks worked, and the gender of their first child?

Source

https://vincentarelbundock.github.io/Rdatasets/doc/AER/Fertility.html

Dataset

Cross-section data from the 1980 US Census on married women aged 21–35 with two or more children. A data frame containing 254,654 observations (30,000 in the smaller Fertility2 subset) on 8 variables.

Variables

  • morekids factor. Does the mother have more than 2 children?
  • gender1 factor indicating gender of first child.
  • gender2 factor indicating gender of second child.
  • age age of mother at census.
  • afam factor. Is the mother African-American?
  • hispanic factor. Is the mother Hispanic?
  • other factor. Is the mother's ethnicity neither African-American, Hispanic, nor Caucasian? (see the source documentation)
  • work number of weeks in which the mother worked in 1979.
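
As a rough illustration of how these variables could be turned into model inputs, the sketch below encodes one record into numeric features for the predictors named in the research question. It is not taken from the notebook: the function name `encode_row`, the feature order, and the factor spellings ("yes"/"no", "male"/"female") are assumptions, and the example record is made up.

```python
# Hypothetical encoding of one Fertility record into numeric features.
# The factor levels ("yes"/"no", "male"/"female") are assumptions about
# how the CSV export spells them; check the actual file before relying on this.

FEATURES = ["age", "work", "gender1", "afam", "hispanic", "other"]


def encode_row(row):
    """Return (features, label) for one record given as a dict of strings."""
    features = [
        float(row["age"]),                          # age of mother at census
        float(row["work"]),                         # weeks worked in 1979
        1.0 if row["gender1"] == "male" else 0.0,   # first child is male
        1.0 if row["afam"] == "yes" else 0.0,
        1.0 if row["hispanic"] == "yes" else 0.0,
        1.0 if row["other"] == "yes" else 0.0,
    ]
    label = 1 if row["morekids"] == "yes" else 0    # target: more than 2 children
    return features, label


# Made-up example record, for illustration only.
example = {
    "morekids": "no", "gender1": "male", "gender2": "female",
    "age": "27", "afam": "no", "hispanic": "no", "other": "no", "work": "0",
}
x, y = encode_row(example)
print(dict(zip(FEATURES, x)), y)
```

gender2 is left out here only because the research question does not mention it; adding it would follow the same pattern as gender1.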

Compare

Is there overfitting? Which model is better for classifying this dataset, and why?

Decision tree classifier and a random forest classifier

Neither model appears to overfit: the accuracy scores on the training and test sets are similar. The two models also reach similar accuracy, so the decision tree appears to be the better choice because it is simpler and uses fewer resources. Even after normalizing the data and tuning hyperparameters, the metrics are not promising: the highest accuracy achieved was 63%.
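
One way to make this comparison concrete is to look at two numbers side by side: the gap between training and test accuracy (a small gap suggests little overfitting) and the accuracy of always predicting the most common class (a model that barely beats it has learned little). The sketch below uses plain Python with made-up labels and predictions; the helper names and the train accuracy of 0.72 are hypothetical and do not come from the notebook.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy of always predicting the most common label."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


def overfit_gap(train_acc, test_acc):
    """Positive values mean the model does better on data it has seen."""
    return train_acc - test_acc


# Made-up labels and predictions, for illustration only.
y_test = [0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 0, 1, 1, 0, 0, 0, 0, 0, 1]

test_acc = accuracy(y_test, y_pred)
print("test accuracy:", test_acc)
print("majority baseline:", majority_baseline(y_test))
print("gap vs. hypothetical train accuracy of 0.72:", overfit_gap(0.72, test_acc))
```

Applied to the real predictions, the same two checks would show how far a 63% accuracy sits above the share of the majority class in morekids.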

KNN Classifier

The KNN model does not appear to overfit, and it performs slightly worse than the random forest and decision tree models.

Logistic Regression

The model does not appear to overfit, and 60% accuracy isn't terrible, but the pseudo R^2 is very low. A closer look at the work variable shows an imbalance: a high number of individuals worked 0 weeks. Are these stay-at-home moms, and should this be captured in a separate variable?
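
Accuracy and pseudo R^2 measure different things, which is why one can look acceptable while the other is low. The sketch below assumes the reported value is McFadden's pseudo R^2 (the README does not say which variant was used) and computes it as 1 - LL(model) / LL(null), where the null model predicts the overall base rate for every row. The labels and probabilities are made up, and the function names are hypothetical.

```python
import math


def log_likelihood(y_true, p_pred):
    """Bernoulli log-likelihood of 0/1 labels under predicted probabilities.

    Probabilities must lie strictly between 0 and 1.
    """
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, p_pred)
    )


def mcfadden_r2(y_true, p_pred):
    """1 - LL(model) / LL(null), where the null model predicts the base rate."""
    base_rate = sum(y_true) / len(y_true)
    ll_null = log_likelihood(y_true, [base_rate] * len(y_true))
    return 1.0 - log_likelihood(y_true, p_pred) / ll_null


# Made-up labels and predicted probabilities, for illustration only.
y = [0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
p = [0.35, 0.40, 0.45, 0.55, 0.45, 0.30, 0.60, 0.40, 0.35, 0.55]

# Accuracy only checks which side of 0.5 each probability falls on.
accuracy = sum((prob >= 0.5) == bool(label) for label, prob in zip(y, p)) / len(y)
print("accuracy:", accuracy)
print("McFadden pseudo R^2:", round(mcfadden_r2(y, p), 3))
```

In this toy example most predictions land on the correct side of 0.5, yet the probabilities stay close to the base rate, so the log-likelihood improves only modestly over the null model and the pseudo R^2 remains small.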
