tito421 / adsproject

This project forked from alephcero/adsproject

Project for Data Science trying to infer household income based on household characteristics for census data

License: GNU General Public License v3.0

Jupyter Notebook 99.63% Python 0.37%

adsproject's Introduction

Applied Data Science 2016 Final Project

New York University


Predicting Household Income in Buenos Aires City

Team:

  • Ilan Reinstein
  • Fernando Melchor
  • Felipe Gonzales
  • Nicolas Metallo

ABSTRACT

Understanding demographics is a crucial part of the policy-making process, but because of cost and complexity, income is not usually measured in the census in Latin American countries. Our work addresses this issue by developing a predictive model for household income down to the census block level. We extend previous work on the problem by ECLAC (Economic Commission for Latin America and the Caribbean) in three ways: (1) We develop scripts to access and handle REDATAM (data retrieval software) census data. (2) We train our model with survey data from both INDEC (National Institute of Statistics and Census of Argentina) and the Buenos Aires City Government, and we compare several feature reduction methods, which identify education, occupation and number of people in the household as the most relevant attributes for our prediction model. (3) We compare our model's predictions against publicly available income data from press releases, maps showing the location of slums and historically low-income neighborhoods, and real estate prices that we scraped from the Internet. In the end, our model shows a high correlation with survey data, although it overestimates income for lower-income city departments and underestimates it for high-income city departments. This model is highly representative of wealth distribution at a granular level and could potentially be used for progressive taxation, public services provision, real estate valuation and social policy making.

DATA SOURCES

  • Permanent Household Survey (EPH) Q3 2010 (INDEC)
  • National Census of Population, Homes and Households 2010 (INDEC)
  • Annual Household Survey 2010 in Buenos Aires City, conducted by the General Directorate of Statistics and Census.

IPYTHON NOTEBOOKS

    1. Model_by_Individual.ipynb = Prediction model for individual data from the Permanent Household Survey (EPH)
    2. Merge_Invididual_to_Household.ipynb = Relating individual data to household data
    3. Model Evaluation and Selection.ipynb = Final prediction model and validation

HELPER FUNCTIONS

REFERENCES

Variable names (original = changed)

  • CODUSU = CODUSU
  • NRO-HOGAR = NRO-HOGAR
  • COMPONENTE = COMPONENTE
  • AGLOMERADO = AGLOMERADO
  • PONDERA = PONDERA
  • CH03 = familyRelation
  • CH04 = female
  • CH06 = age
  • CH12 = schoolYear
  • CH13 = finishedYear
  • CH14 = lastYear
  • ESTADO = activity
  • CAT_OCUP = empCond
  • CAT_INAC = unempCond
  • ITF = ITF
  • IPCF = IPCF
  • P47T = P47T
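
The renaming listed above could be applied along the following lines. This is a minimal sketch using only the Python standard library, not the project's own code: the helper `rename_columns`, the constant `INDIVIDUAL_RENAMES` and the two sample rows are hypothetical, and the sample values are made up for illustration. Columns that keep their original name (CODUSU, COMPONENTE, ITF and so on) are simply left out of the mapping.

```python
import csv
import io

# Mapping copied from the list above: original EPH name -> name used in the project.
INDIVIDUAL_RENAMES = {
    "CH03": "familyRelation",
    "CH04": "female",
    "CH06": "age",
    "CH12": "schoolYear",
    "CH13": "finishedYear",
    "CH14": "lastYear",
    "ESTADO": "activity",
    "CAT_OCUP": "empCond",
    "CAT_INAC": "unempCond",
}


def rename_columns(rows, renames):
    """Return new row dicts with keys renamed; unmapped keys are kept unchanged."""
    return [
        {renames.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


# Hypothetical two-row sample standing in for an EPH individual file.
sample = io.StringIO(
    "CODUSU,COMPONENTE,CH04,CH06,ESTADO\n"
    "A1,1,2,34,1\n"
    "A1,2,1,36,3\n"
)
rows = list(csv.DictReader(sample))
renamed = rename_columns(rows, INDIVIDUAL_RENAMES)
print(sorted(renamed[0].keys()))
```

Only the column names change here; recoding the values (for example turning CH04 into a 0/1 indicator) would be a separate step.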

Creating the household CSV file: see the example in Using_getEPH.PY

Variable names (original = changed)

  • CODUSU = CODUSU
  • NRO_HOGAR = NRO_HOGAR
  • REGION = REGION
  • PONDERA = PONDERA
  • IV1 = HomeType
  • IV1_ESP = HomeTypeesp
  • IV2 = RoomsNumber
  • IV3 = FloorMaterial
  • IV3_ESP = FloorMaterialesp
  • IV4 = RoofMaterial
  • IV5 = RoofCoat
  • IV6 = Water
  • IV7 = WaterType
  • IV7_ESP = WaterTypeesp
  • IV8 = Toilet
  • IV9 = ToiletLocation
  • IV10 = ToiletType
  • IV11 = Sewer
  • IV12_1 = DumpSites
  • IV12_2 = Flooding
  • IV12_3 = EmergencyLoc
  • II1 = UsableTotalRooms
  • II2 = SleepingRooms
  • II3 = OfficeRooms
  • II3_1 = OnlyWork
  • II4_1 = Kitchen
  • II4_2 = Sink
  • II4_3 = Garage
  • II7 = Ownership
  • II7_ESP = Ownershipesp
  • II8 = CookingCombustible
  • II8_ESP = CookingCombustibleesp
  • II9 = BathroomUse
  • V1 = Working
  • IX_TOT = HouseMembers
  • IX_MEN10 = Memberless10
  • IX_MAYEQ10 = Membermore10
  • ITF = TotalHouseHoldIncome
  • VII1_1 = DomesticService1
  • VII1_2 = DomesticService2
  • VII2_1 = DomesticService3
  • VII2_2 = DomesticService4
  • VII2_3 = DomesticService5
  • VII2_4 = DomesticService6
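
Both variable lists include CODUSU and the household number, which suggests how individual rows can be related to household rows. The sketch below illustrates that idea under stated assumptions and is not the logic of the project's notebooks: it assumes that the pair (CODUSU, NRO_HOGAR) identifies a household and that both files spell the second key NRO_HOGAR. The functions `household_key` and `attach_individual_summary`, the derived columns `individualRows` and `maxAge`, and all sample values are hypothetical.

```python
from collections import defaultdict


def household_key(row):
    """Key assumed to identify a household: dwelling code plus household number."""
    return (row["CODUSU"], row["NRO_HOGAR"])


def attach_individual_summary(households, individuals):
    """Return household rows extended with aggregates of their individual rows."""
    members = defaultdict(list)
    for person in individuals:
        members[household_key(person)].append(person)

    merged = []
    for home in households:
        people = members.get(household_key(home), [])
        ages = [int(person["age"]) for person in people]
        merged.append({
            **home,
            "individualRows": len(people),           # hypothetical derived column
            "maxAge": max(ages) if ages else None,   # hypothetical derived column
        })
    return merged


# Made-up sample rows using the renamed columns from the lists above.
households = [
    {"CODUSU": "A1", "NRO_HOGAR": "1", "HouseMembers": "2"},
    {"CODUSU": "B7", "NRO_HOGAR": "1", "HouseMembers": "1"},
]
individuals = [
    {"CODUSU": "A1", "NRO_HOGAR": "1", "COMPONENTE": "1", "age": "34"},
    {"CODUSU": "A1", "NRO_HOGAR": "1", "COMPONENTE": "2", "age": "36"},
    {"CODUSU": "B7", "NRO_HOGAR": "1", "COMPONENTE": "1", "age": "61"},
]
merged = attach_individual_summary(households, individuals)
for row in merged:
    print(row)
```

Grouping the individuals once in a dictionary keeps the merge to a single pass over each file, and households with no matching individuals are kept with a count of zero rather than dropped.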

Contributors

  • alephcero
  • nicolasmetallo
  • ilanreinstein
  • fernandomelchor
