pslmodels / microdf

Analysis tools for working with survey microdata as DataFrames.

Home Page: http://pslmodels.github.io/microdf

License: MIT License

Python 100.00%
microdata pandas tax-calculator survey-microdata dataframes analysis psl-cataloged

microdf's Introduction

microdf

Disclaimer: MicroSeries and MicroDataFrame are experimental features and may not consider weights after performing some operations. See open issues.

Installation

Install with:

pip install git+https://github.com/PSLmodels/microdf.git

Questions

Contact the maintainer, Max Ghenis ([email protected]).

Citation

You may cite the source of your analysis as "microdf release #.#.#, author's calculations."

microdf's People

Contributors

jdebacker, maxghenis, nikhilwoodruff, peter-metz

microdf's Issues

Consistent weighting API

Currently, the functions in weighted.py have a mix of APIs, e.g.:

  • weighted_sum(df, col, w)
  • weighted_quantile(values, quantiles, sample_weight)

Make this consistent.
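
One possible shape for a consistent API is to give every function the same leading (df, col, w) arguments. The sketch below is an illustration only, not microdf's current code: the weighted_median wrapper and the quantiles argument position are assumptions, and df can be any mapping from column name to array-like, such as a pandas DataFrame.

```python
import numpy as np


def weighted_sum(df, col, w):
    """Sum of df[col], weighted by df[w]."""
    values = np.asarray(df[col], dtype=float)
    weights = np.asarray(df[w], dtype=float)
    return float(np.sum(values * weights))


def weighted_quantile(df, col, w, quantiles):
    """Interpolated weighted quantile(s) of df[col], weighted by df[w].

    Assumed approach: interpolate over the midpoint cumulative weight
    share of each sorted value.
    """
    values = np.asarray(df[col], dtype=float)
    weights = np.asarray(df[w], dtype=float)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    # Midpoint cumulative weight share of each sorted value.
    cum_share = (np.cumsum(weights) - 0.5 * weights) / weights.sum()
    return np.interp(quantiles, cum_share, values)


def weighted_median(df, col, w):
    """Weighted median, expressed through the same (df, col, w) signature."""
    return weighted_quantile(df, col, w, 0.5)
```

With this layout, callers pass column names everywhere and never need to remember which functions take raw arrays.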

Make weighted percentiles match unweighted percentiles of stacked data

weighted_quantile, which also powers weighted_median, comes from a Stack Overflow (SO) answer. It runs efficiently, but its results don't exactly match unweighted percentiles of stacked data. For example (per a comment on that SO answer), the following two should be equivalent but are not:

mdf.weighted_quantile([1, 2], 0.5, [1, 3])  # 1.75
np.median([1, 2, 2, 2])  # 2

They differ because the weighted approach interpolates between values around the specified quantile.

I added a note about this to the docstring and test in https://github.com/MaxGhenis/microdf/pull/49, but it would be good to find a solution, at least as an option. I asked on SO whether this is possible; it may not be, short of literally replicating the values according to the weights.
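
For reference, the "literal replication" fallback can be sketched as follows. This is a hypothetical helper, not part of microdf, and it assumes non-negative integer weights. Memory use grows with the sum of the weights, so it is unlikely to be practical for real survey weights.

```python
import numpy as np


def stacked_quantile(values, quantiles, int_weights):
    """Quantile of the data after repeating each value int_weights times.

    Hypothetical helper; assumes non-negative integer weights.
    """
    stacked = np.repeat(
        np.asarray(values, dtype=float), np.asarray(int_weights, dtype=int)
    )
    return np.quantile(stacked, quantiles)


# Matches np.median([1, 2, 2, 2]) for the example above.
print(stacked_quantile([1, 2], 0.5, [1, 3]))
```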

Create roadmap

PSL criteria (see #13) include:

Projects SHOULD have a public roadmap.

Consider splitting out taxcalc features

microdf supports data and tasks that include but are not limited to taxcalc, and it's increasingly general. Given taxcalc's size and large dependency set, ideally it would be an optional dependency. While pip supports optional dependencies, they're not yet implemented in conda (conda/conda#7502).

Another option is creating a new package like microdf-taxcalc which includes both microdf and taxcalc as dependencies, and includes functions like calc_df.

The current setup is fine for now; this issue is a placeholder for future consideration.
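
Whichever packaging route is chosen, functions that need taxcalc could import it lazily so that the rest of the package works without it. A minimal sketch, assuming a hypothetical require_optional helper and a hypothetical pip extra named microdf[taxcalc] (neither exists in the package today):

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency, or explain how to install it.

    Both this helper and the `extra` label are hypothetical.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} is required for this function. Install it "
            f"separately, or via a pip extra such as microdf[{extra}]."
        ) from err
```

A function like calc_df would then start with a call such as require_optional("taxcalc", "taxcalc") instead of relying on a module-level import.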

Include matplotlib stylesheet

A bunch of my plots, some of which would go here, use this style:

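import matplotlib as mpl
import matplotlib.font_manager
import seaborn as sns

# Notebook shell command: download Roboto into matplotlib's font directory.
!wget https://github.com/MaxGhenis/random/raw/master/Roboto-Regular.ttf -P /usr/local/lib/python3.6/dist-packages/matplotlib/mpl-data/fonts/ttf
# Rebuild the font cache so that matplotlib finds Roboto. _rebuild() is a
# private function and may not exist in every matplotlib version.
mpl.font_manager._rebuild()

sns.set_style('white')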
DPI = 200
mpl.rc('savefig', dpi=DPI)
mpl.rcParams['figure.dpi'] = DPI
mpl.rcParams['figure.figsize'] = 6.4, 4.8  # Default.
mpl.rcParams['font.sans-serif'] = 'Roboto'
mpl.rcParams['font.family'] = 'sans-serif'

# Set title text color to dark gray (https://material.io/color) not black.
TITLE_COLOR = '#212121'
mpl.rcParams['text.color'] = TITLE_COLOR

# Axis titles and tick marks are medium gray.
AXIS_COLOR = '#757575'
mpl.rcParams['axes.labelcolor'] = AXIS_COLOR
mpl.rcParams['xtick.color'] = AXIS_COLOR
mpl.rcParams['ytick.color'] = AXIS_COLOR

Ideally this would be part of the package.

See https://stackoverflow.com/questions/31559225/how-to-ship-or-distribute-a-matplotlib-stylesheet
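
As an alternative to shipping a .mplstyle file, the settings above could live in the package as a plain dict. This is only a sketch: MICRODF_STYLE and set_plot_style are hypothetical names, the seaborn 'white' style is left out, and the Roboto font still has to be installed separately.

```python
# Hypothetical names; the keys are standard matplotlib rcParams.
DPI = 200
TITLE_COLOR = "#212121"  # Dark gray for titles and other text.
AXIS_COLOR = "#757575"  # Medium gray for axis labels and ticks.

MICRODF_STYLE = {
    "savefig.dpi": DPI,
    "figure.dpi": DPI,
    "figure.figsize": (6.4, 4.8),
    "font.family": "sans-serif",
    "font.sans-serif": ["Roboto"],
    "text.color": TITLE_COLOR,
    "axes.labelcolor": AXIS_COLOR,
    "xtick.color": AXIS_COLOR,
    "ytick.color": AXIS_COLOR,
}


def set_plot_style():
    """Apply MICRODF_STYLE to matplotlib's global rcParams."""
    # Imported lazily so the settings can be inspected without matplotlib.
    import matplotlib as mpl

    mpl.rcParams.update(MICRODF_STYLE)
```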

Fix collections warning

From pytest:

==================================================== warnings summary ====================================================
/home/mghenis/miniconda3/lib/python3.7/site-packages/numba/types/containers.py:3
  /home/mghenis/miniconda3/lib/python3.7/site-packages/numba/types/containers.py:3: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Iterable

-- Docs: https://docs.pytest.org/en/latest/warnings.html

Create test generating outputs

To meet PSL criteria (#13):

At least one test MUST generate key outputs from source materials, the test MUST be run with every new version, and the outputs of the test MUST be checked into the repository.

This could be a summary table from calc_df (Tax-Calculator) or something from another dataset like SCF which can be easily downloaded.

Allow for pre-divided weights in add_weighted_metrics

For example, to weight something by XTOT, you currently have to pre-calculate XTOT_m = XTOT * s006 and then do:

add_weighted_metrics(df, metric_vars, w='XTOT_m', divisor=1, suffix='_XTOT_m')

And then you're left with a superfluous XTOT_m_XTOT_m column.

Make this easier.

Add function to calculate tax from a MTR schedule

Basic tax calculation:

import pandas as pd

def tax_from_mtrs(val, brackets, rates):
    """Calculate tax liability from a marginal tax rate schedule.

    Args:
        val: Value to assess tax on, e.g. wealth or income (list or Series).
        brackets: Left side of each bracket (list or Series).
        rates: Rate corresponding to each bracket.
    """
    df_tax = pd.DataFrame({'brackets': brackets, 'rates': rates})
    # Tax owed on all lower brackets for a value exactly at each threshold.
    df_tax['base_tax'] = df_tax.brackets.\
        sub(df_tax.brackets.shift(fill_value=0)).\
        mul(df_tax.rates.shift(fill_value=0)).cumsum()
    # Row of the bracket that each value falls into.
    rows = df_tax.brackets.searchsorted(val, side='right') - 1
    income_bracket_df = df_tax.loc[rows].reset_index(drop=True)
    # Base tax plus the marginal rate on the amount above the threshold.
    return pd.Series(val).sub(income_bracket_df.brackets).\
        mul(income_bracket_df.rates).add(income_bracket_df.base_tax)
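
A scalar, loop-based version may be useful as a cross-check in tests. The helper below is a hypothetical sketch, not part of microdf; it adds up the tax owed within each bracket, and the bracket and rate values in the usage lines are illustrative.

```python
def tax_from_schedule(value, brackets, rates):
    """Tax on a single value, summed bracket by bracket (hypothetical helper)."""
    tax = 0.0
    for i, (lower, rate) in enumerate(zip(brackets, rates)):
        # The top bracket has no upper bound.
        upper = brackets[i + 1] if i + 1 < len(brackets) else float("inf")
        if value > lower:
            tax += (min(value, upper) - lower) * rate
    return tax


brackets = [0, 50e6, 1e9]
rates = [0, 0.02, 0.03]
print(tax_from_schedule(100e6, brackets, rates))
```

Values below the bottom threshold fall into no bracket and get zero tax in this sketch.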

Add Colab version of example notebooks

These will have installation at the top:

!pip install git+https://github.com/PSLmodels/Tax-Calculator
!pip install git+https://github.com/MaxGhenis/microdf

Could also be used for QuantEcon Notes.

Add disclaimer

PSL criteria (see #13) include:

Projects SHOULD include a disclaimer.

See e.g. Tax-Calculator's disclaimer:

Results will change as Tax-Calculator data and logic improve. A
fundamental reason for adopting open-source methods in this project
is so that people from all backgrounds can contribute to the models
that our society uses to assess economic policy; when
community-contributed improvements are incorporated, the model will
produce different results.

Add dollar_axis

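import matplotlib as mpl
import matplotlib.ticker  # Makes mpl.ticker available.

def dollar_axis(axis):
    # Format tick labels as whole dollars with thousands separators.
    axis.set_major_formatter(
        mpl.ticker.FuncFormatter(lambda x, p: '$' + format(int(x), ',')))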

Add inequality metrics

Create a new inequality.py script with inequality metrics:

  • Share from the top and bottom x percent (make general and also for top 50, 10, 1, 0.1)
  • Ratio of these, e.g. T10/B50 as used in World Inequality Database
  • gini (move from utils.py)
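
A rough sketch of how the share and ratio metrics might look. The function names and signatures are hypothetical, and the approach is an assumption: the unit that straddles the cutoff is counted partially, in proportion to its weight.

```python
import numpy as np


def top_share(values, weights, top_frac):
    """Share of the weighted total held by the top `top_frac` of the population."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(-values)  # Highest values first.
    v, w = values[order], weights[order]
    target = top_frac * w.sum()  # Population weight inside the top group.
    cum_before = np.cumsum(w) - w  # Weight accumulated before each unit.
    # Weight of each unit inside the top group (partial at the boundary).
    w_in = np.clip(target - cum_before, 0, w)
    return float((v * w_in).sum() / (v * w).sum())


def bottom_share(values, weights, bottom_frac):
    """Share of the weighted total held by the bottom `bottom_frac`."""
    return 1 - top_share(values, weights, 1 - bottom_frac)


def top_bottom_ratio(values, weights, top_frac, bottom_frac):
    """Ratio of top to bottom shares, e.g. T10/B50 with 0.1 and 0.5."""
    return top_share(values, weights, top_frac) / bottom_share(
        values, weights, bottom_frac
    )
```

For example, top_bottom_ratio(values, weights, 0.1, 0.5) would give the T10/B50 ratio.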

Add PSL_catalog.json

PSL criteria (#13) include:

A PSL_catalog.json configuration file to be used for cataloging these criteria MUST be included in the project's repository. Specific instructions for creating this file can be found in the Catalog-Builder Documentation.

mtr() doesn't work with values below the bottom bracket threshold

For example, the below produces NaN values for negative networth.

WARREN_RATES = [0, 0.02, 0.03]  # 0%, 2%, 3%.

WARREN_BRACKETS = [0,
                   50e6,  # First $50 million.
                   1e9]   # Over $1 billion.

mdf.mtr(wealth.networth, WARREN_BRACKETS, WARREN_RATES)

Workaround is to start the brackets below zero, e.g.:

WARREN_BRACKETS = [-np.inf,
                   50e6,  # First $50 million.
                   1e9]   # Over $1 billion.

But it could be nice to have an option to do this automatically in mtr (which is also used in tax_from_mtrs).
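
A minimal sketch of what such an option might do, using plain Python lists rather than the actual mtr implementation. Both helpers are hypothetical; marginal_rate returns NaN below the bottom threshold to mirror the behavior described above.

```python
import bisect


def with_open_bottom_bracket(brackets):
    """Copy of brackets with the first threshold replaced by -inf."""
    return [float("-inf")] + list(brackets[1:])


def marginal_rate(value, brackets, rates):
    """Rate of the bracket containing value; NaN if below the bottom threshold."""
    i = bisect.bisect_right(brackets, value) - 1
    return rates[i] if i >= 0 else float("nan")


WARREN_RATES = [0, 0.02, 0.03]
WARREN_BRACKETS = [0, 50e6, 1e9]

print(marginal_rate(-5, WARREN_BRACKETS, WARREN_RATES))
print(marginal_rate(-5, with_open_bottom_bracket(WARREN_BRACKETS), WARREN_RATES))
```

The first call shows the NaN case; the second treats negative net worth as part of the bottom bracket.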
