Finding Donors for CharityML Project

Introduction

This project is part of the Udacity Data Scientist Nanodegree Program, which consists of:

  • Term 1
    • Supervised Learning
    • Deep Learning
    • Unsupervised Learning
  • Term 2
    • Write A Data Science Blog Post
    • Disaster Response Pipelines
    • Recommendation Engines

The goal of this project is to apply supervised learning techniques to data collected for the U.S. census to help CharityML (a fictitious charity organization) identify the people most likely to donate to their cause. Udacity provided template code in the finding_donors.ipynb notebook, along with the visuals.py module, which contains out-of-the-box code needed for the visualizations.

Software and libraries

This project uses Python 3.10.4. The most important packages are listed in requirements.txt.

To create the virtual environment you can run python -m venv .venv.

I am providing a simplified version of requirements.txt and letting pip handle the dependencies to avoid maintenance overhead.

To create a complete requirements file you can run pip freeze > requirements.txt, and to install all the Python packages listed in it you can run pip install -r requirements.txt.

To set up a new environment and install all requirements, go to the others folder and run setup.cmd.

Data

The modified census dataset consists of approximately 32,000 data points, each with 13 features. It is a modified version of the dataset published in the paper "Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid" by Ron Kohavi. The paper is available online, and the original dataset is hosted on UCI.

  • census.csv: census dataset

Features

  • age: Age
  • workclass: Working Class (Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked)
  • education_level: Level of Education (Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool)
  • education-num: Number of educational years completed
  • marital-status: Marital status (Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse)
  • occupation: Work Occupation (Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces)
  • relationship: Relationship Status (Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried)
  • race: Race (White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black)
  • sex: Sex (Female, Male)
  • capital-gain: Monetary Capital Gains
  • capital-loss: Monetary Capital Losses
  • hours-per-week: Average Hours Per Week Worked
  • native-country: Native Country (United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands)

Target variable

  • income: Income Class (<=50K, >50K)
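One common way to prepare data shaped like this for scikit-learn is to map the target to 0/1 and one-hot encode the categorical columns. The sketch below is only an illustration under that assumption: the tiny DataFrame is hand-made (it is not read from census.csv), and the notebook may preprocess the data differently.

```python
import pandas as pd

# Tiny hand-made sample that mimics a few census.csv columns.
# The rows are illustrative only and are NOT taken from the real file.
sample = pd.DataFrame({
    "age": [39, 50, 28],
    "workclass": ["State-gov", "Private", "Private"],
    "sex": ["Male", "Male", "Female"],
    "hours-per-week": [40, 13, 40],
    "income": ["<=50K", "<=50K", ">50K"],
})

# Encode the target: 1 for ">50K", 0 for "<=50K".
target = (sample["income"] == ">50K").astype(int)

# One-hot encode the categorical features; numeric columns pass through unchanged.
features = pd.get_dummies(sample.drop(columns="income"))

print(features.columns.tolist())
print(target.tolist())
```

With the real dataset, the same two steps would be applied to the DataFrame returned by pd.read_csv on census.csv.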

Workflow

The code is provided in the Jupyter Notebook finding_donors.ipynb. It reads the data from DATA_FOLDER and then, step by step, explores the data, trains three models and compares their performance.
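As a rough illustration of the "train three models and compare" step, here is a minimal sketch using the three model families named in the Results section. It assumes default hyperparameters and uses a synthetic, imbalanced dataset from make_classification as a stand-in for the census data, so the scores it prints say nothing about the real project results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the census data (NOT the real dataset).
X, y = make_classification(
    n_samples=500, n_features=13, weights=[0.75, 0.25], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Default settings are an assumption; the notebook may configure the models differently.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Support Vector Machine": SVC(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))

# Print the models from best to worst F-1 score.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: F-1 = {score:.3f}")
```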

Testing

From the project folder run pytest

To run a single test: pytest .\tests\test_configuration.py::test_create_folders

Code styling

PEP 8 is the style guide for Python code, and it's good practice to follow it to ensure your code is readable and consistent.

To check and format my code according to PEP 8 I am using:

  • pycodestyle: tool to check the code against PEP 8 conventions.
  • autopep8: tool to automatically format Python code according to PEP 8 standards.

To run pycodestyle on all files in the project folder and create a report: pycodestyle --statistics --count . > code_styling/report.txt

To run autopep8 on all files in the project folder: autopep8 --recursive --in-place .

I prefer to check and update one file at a time because the recursive commands above also affect the files in .venv\. For example:

pycodestyle .\utils\configuration.py > .\code_styling\configuration_report.txt

autopep8 --in-place .\utils\configuration.py

You can also go to the code_styling folder and run format_and_lint.cmd.

Running the code

You can open finding_donors.ipynb, run each cell and check the results.

You can also run the command ipython -c "%run finding_donors.ipynb".

To convert the notebook to HTML format, run jupyter nbconvert finding_donors.ipynb --to html.

Results

I have compared the results of three models:

  • Decision Trees
  • Support Vector Machines
  • AdaBoost

Because the dataset is fairly imbalanced, I used the F-1 score to evaluate the results. As explained wonderfully in this post, accuracy is not a suitable metric for imbalanced datasets.
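A small worked example may help show why accuracy can mislead on imbalanced data. The confusion-matrix counts below are hypothetical (they are not results from this project): a model that never predicts the positive class still reaches 75% accuracy, while its F-1 score is zero.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1(tp, fp, fn, tn):
    """Harmonic mean of precision and recall; tn is unused because F-1 ignores true negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical test set: 1000 people, 250 of whom earn >50K.
# Model A always predicts "<=50K".
always_negative = dict(tp=0, fp=0, fn=250, tn=750)
# Model B finds 150 of the 250 positives, with 50 false alarms.
useful = dict(tp=150, fp=50, fn=100, tn=700)

for name, counts in [("always negative", always_negative), ("useful", useful)]:
    print(f"{name}: accuracy = {accuracy(**counts):.3f}, F-1 = {f1(**counts):.3f}")
```

Accuracy separates the two models by only ten points (0.75 versus 0.85), whereas the F-1 score (0 versus about 0.667) makes the difference obvious.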

AdaBoost has the best F-1 score, and using GridSearchCV I tuned the model, further improving its performance.
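For readers unfamiliar with GridSearchCV, the tuning step could look roughly like the sketch below. The parameter grid, the number of folds and the synthetic data are all assumptions made for illustration; the values actually searched in the notebook are not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in for the census data (NOT the real dataset).
X, y = make_classification(
    n_samples=300, n_features=13, weights=[0.75, 0.25], random_state=0
)

# Hypothetical grid, chosen small so the example runs quickly.
param_grid = {"n_estimators": [25, 50], "learning_rate": [0.5, 1.0]}

# Cross-validated search that ranks parameter combinations by F-1 score.
search = GridSearchCV(
    AdaBoostClassifier(random_state=0),
    param_grid,
    scoring=make_scorer(f1_score),
    cv=3,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated F-1:", round(search.best_score_, 3))
```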

Finally, I extracted the feature importances.
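Fitted scikit-learn ensemble models such as AdaBoostClassifier expose a feature_importances_ attribute, so one plausible way to rank features is shown in the sketch below. It assumes synthetic data and placeholder feature names; with the real dataset, the names would be the encoded census column names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic stand-in for the census data (NOT the real dataset).
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

model = AdaBoostClassifier(random_state=0).fit(X, y)

# One importance value per input feature.
importances = model.feature_importances_

# Indices of the five most important features, highest first.
top5 = np.argsort(importances)[::-1][:5]
for idx in top5:
    print(f"{feature_names[idx]}: {importances[idx]:.3f}")
```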

List of activities

In the TODO file you can find the list of tasks and ongoing activities.

Licensing and acknowledgements

Thanks to Udacity for the dataset.

Outro

I hope this repository was interesting, and thank you for taking the time to check it out. On my Medium you can find a more in-depth story, and on my Blogspot you can find the same post in Italian. Let me know if you have any questions, and if you like the content that I create, feel free to buy me a coffee.
