pycps's Introduction

PyCPS

A Python package for working with the Current Population Survey (CPS).

Documentation is available on Read the Docs.

Neither the Census Bureau nor the NBER provides a clean, RESTful API for getting CPS data. This makes working with the CPS a pain, and reproducibility nearly impossible.

What does it do?

PyCPS provides a few related functions:

  1. Downloading data dictionaries and monthly data files
  2. Standardizing variables across months
  3. Merging to create time series

Installation

  • From pip: pip install pycps

  • From source:

    git clone https://github.com/TomAugspurger/pycps
    cd pycps
    pip install .
    pip install -r requirements.txt
    pip install git+https://github.com/PyTables/PyTables
    python setup.py install
    

Dependencies

See requirements.txt

Python

Developed with Python 3; aims to be compatible with Python 2.

pycps's Issues

BUG/COMPAT: python2 doesn't substitute paths in readsettings

FAIL: test_substitue (pycps.pycps.tests.test_parsers.TestReaderSettings)

Traceback (most recent call last):
  File "/Users/tom/Envs/build/lib/python2.7/site-packages/pycps/pycps/tests/test_parsers.py", line 67, in test_substitue
    self.assertEqual(result, expected)
AssertionError: u'{data_path}/data_dictionaries/' != 'data/data_dictionaries/'
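The issue does not show where the substitution goes wrong. One common cause of this symptom is a check like isinstance(value, str), which skips unicode values on Python 2 (for example, strings returned by json.loads). The sketch below illustrates a version-agnostic substitution under that assumption; substitute_paths and the settings keys are hypothetical names, not the pycps API.

```python
import json

try:
    string_types = (basestring,)  # Python 2: matches both str and unicode
except NameError:
    string_types = (str,)  # Python 3: basestring does not exist


def substitute_paths(settings):
    """Fill '{key}' placeholders in string values from the other settings.

    Hypothetical helper: a single pass, so placeholders that expand to
    further placeholders are not resolved.
    """
    out = dict(settings)
    for key, value in settings.items():
        if isinstance(value, string_types):
            out[key] = value.format(**settings)
    return out


# json.loads returns unicode strings on Python 2, which is what the
# u'...' in the failing assertion suggests the test received.
settings = json.loads(
    '{"data_path": "data", "dd_path": "{data_path}/data_dictionaries/"}'
)
result = substitute_paths(settings)
```

With these inputs, result["dd_path"] is 'data/data_dictionaries/' on either interpreter.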

Data Dictionary Errors

Compiling a list here. If these are fixed upstream, the workarounds needn't be applied.

File      Column  Line Number  Description                 Reported  Status
1992-01   31      221          Gap in DD                   No        To Report
1992-01   157     755          Gap in DD                   No        To Report
1992-01   259     1304         Gap in DD                   No        To Report
1992-01   171     2803         Gap in DD                   No        To Report
1994-01   135     749          Wrong start column          No        To Report
1994-01   679     3910         Wrong start column          No        To Report
1994-04   135     713          Wrong start column          No        To Report
1994-04   679     3874         Wrong start column          No        To Report
1995-06   135     665          Wrong start column          No        To Report
1994-04   679     3826         Wrong start column          No        To Report
1998-01   149     ~757         Gap in DD                   Yes       Waiting
1998-01   535     ~3130        Gap in DD                   Yes       Waiting
1998-01   556     ~3162        Gap in DD                   Yes       Waiting
1998-01   632     ~3313        Gap in DD                   Yes       Waiting
1998-01   680     ~3427        Gap in DD                   Yes       Waiting
1998-01   786     ~3733        Gap in DD                   Yes       Waiting
2004-05   410     2438         Wrong Start; should be 410  Yes       Waiting
2005-08   411     2421         Wrong Start; should be 410  Yes       Waiting
2009-01   399     4930         Wrong Width: Should be 19   Yes       Waiting
2012-05*  114     954          Duplicate at col 114        Yes       Waiting
2012-05   637     4243         Gap in DD                   Yes       Waiting
  • 2012-05*: the file does say "Starting in Feb. 2005" on line 114, which seems like a copy-paste error.

In addition to the above:

  • New questions were added in November 2005 (Katrina). This should result in a new data dictionary.
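The two most frequent problems listed above, gaps and wrong start columns, can in principle be detected mechanically once a data dictionary has been parsed into (name, start, width) records. The following is a minimal sketch of such a check, not the parser pycps uses; find_layout_problems, the field names, and the column positions are all made up for illustration.

```python
def find_layout_problems(fields):
    """Return (name, kind, expected_start, actual_start) for each problem.

    `fields` is a list of (name, start, width) tuples with 1-based start
    columns, sorted by start. A field that begins after the previous one
    ends is a gap; one that begins before it ends is an overlap, which is
    how a wrong start column or wrong width would show up.
    """
    problems = []
    expected = fields[0][1]
    for name, start, width in fields:
        if start > expected:
            problems.append((name, "gap", expected, start))
        elif start < expected:
            problems.append((name, "overlap", expected, start))
        expected = start + width
    return problems


# Made-up layout, not taken from any real CPS data dictionary.
fields = [
    ("FIELD_A", 1, 15),
    ("FIELD_B", 16, 2),
    ("FIELD_C", 20, 2),  # columns 18-19 are unaccounted for
    ("FIELD_D", 21, 3),  # starts inside FIELD_C
]
problems = find_layout_problems(fields)
```

Here problems contains one "gap" entry for FIELD_C and one "overlap" entry for FIELD_D. A report like this could be compared against the table to confirm when an upstream fix makes a workaround unnecessary.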

REF: Refactor Data Adjustment

Right now the data adjustments all live in pycps.parsers.fixup_by_dd. Do something similar to the dd adjustments.

This will need a different naming scheme, since the functions should be generic.
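One way to get generic, reusable adjustments is a small registry: each function is registered under a descriptive name, and a separate table maps each data dictionary to the adjustments it needs. This is only a sketch of that idea. The adjustment names, the BY_DD mapping, and the list-of-dicts row format are assumptions made for illustration and do not reflect how fixup_by_dd is organized.

```python
ADJUSTMENTS = {}  # generic adjustment name -> function


def adjustment(func):
    """Register a generic adjustment under its function name."""
    ADJUSTMENTS[func.__name__] = func
    return func


@adjustment
def rescale_weights(rows, column="weight", factor=10000.0):
    """Divide an implied-decimal column by `factor` (hypothetical example)."""
    return [dict(row, **{column: row[column] / factor}) for row in rows]


@adjustment
def expand_two_digit_year(rows, column="year", century=1900):
    """Turn a two-digit year into a four-digit one (hypothetical example)."""
    return [dict(row, **{column: row[column] + century}) for row in rows]


# Hypothetical mapping: data dictionary id -> adjustments to apply, in order.
BY_DD = {
    "1994-01": ["expand_two_digit_year", "rescale_weights"],
}


def apply_adjustments(rows, dd_name):
    """Apply every adjustment registered for `dd_name`; none if unknown."""
    for name in BY_DD.get(dd_name, []):
        rows = ADJUSTMENTS[name](rows)
    return rows


rows = [{"year": 94, "weight": 25000}]
adjusted = apply_adjustments(rows, "1994-01")
```

Because the functions are named for what they do rather than for the file they fix, the same adjustment can be listed under several data dictionaries.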

ENH: Setup logging

Need to log:

  • every print statement.
  • every row dropped due to NaNs, etc.
  • data transformations? e.g. binning a year?
  • writes to stores
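A minimal sketch of what this could look like with the standard library logging module, covering the dropped-rows case. The logger name "pycps", the configure_logging helper, and the drop_missing function are assumptions for illustration. Rows are plain dicts here to keep the example self-contained.

```python
import logging

logger = logging.getLogger("pycps")  # assumed logger name


def configure_logging(level=logging.INFO):
    """Send log records to stderr with a timestamped format."""
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(level)


def drop_missing(rows, required):
    """Drop rows missing any required key, logging each dropped row."""
    kept = []
    for i, row in enumerate(rows):
        missing = [key for key in required if row.get(key) is None]
        if missing:
            logger.warning("dropping row %d: missing %s", i, missing)
        else:
            kept.append(row)
    logger.info("kept %d of %d rows", len(kept), len(rows))
    return kept


configure_logging()
rows = [{"id": 1, "age": 30}, {"id": 2, "age": None}]
kept = drop_missing(rows, required=["id", "age"])
```

Existing print statements could be replaced by logger.info calls in the same way, and writes to stores could log the target key at the point of the write.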
