tommyod / generalized-additive-models

Generalized Additive Models in Python.

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
data-science gam glm statistical-inference statistical-models statistics

generalized-additive-models's Introduction

generalized-additive-models

Generalized Additive Models (GAMs) in Python.

About

GAMs are uniquely placed on the continuum between interpretability and predictive power. In many applications they perform almost as well as more complex models while remaining highly interpretable.

  • GAMs extend linear regression by allowing non-linear relationships between features and the target.
  • The model is still additive, but link functions and multivariate splines facilitate a broad class of models.
  • While GAMs are likely outperformed by non-additive models (e.g. boosted trees), GAMs are extremely interpretable.

Read more about GAMs:

A GAM is a statistical model in which the target variable depends on unknown smooth functions of the features, and interest focuses on inference about these smooth functions.

An exponential family distribution is specified for the target Y (e.g. Normal, Binomial or Poisson) along with a link function g (for example the identity or log function) relating the expected value of Y to the predictor variables.
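
The description above corresponds to the standard textbook form of a GAM, written out here as a reminder. The symbols $\beta_0$ (an intercept), $f_j$ (the unknown smooth functions) and $x_j$ (the features) are notation introduced for this sketch and are not taken from the package documentation:

$$g(\mathbb{E}[Y]) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_p(x_p)$$

With the identity link and a Normal distribution this reduces to an additive regression model, and if every $f_j$ is linear it reduces to ordinary linear regression.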

Installation

Install using pip:

pip install generalized-additive-models

Example

from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score
from generalized_additive_models import GAM, Spline, Categorical

# Load data
data = load_diabetes(as_frame=True)
df, y = data.data, data.target

# Create model
terms = Spline("bp") + Spline("bmi", constraint="increasing") + Categorical("sex")
gam = GAM(terms)

# Cross validate
scores = cross_val_score(gam, df, y, scoring="r2")
print(scores) # array([0.26, 0.4 , 0.41, 0.35, 0.42])

Go to Read the Docs to see full documentation.

Contributing and development

Contributions are very welcome. You can correct spelling mistakes, write documentation, clean up code, implement new features, etc.

Some guidelines for development:

  • Code must comply with the standard. See the GitHub action pipeline for more information.
  • If possible, use existing algorithms from numpy, scipy and scikit-learn.
  • Write tests, especially regression tests if a bug is fixed.
  • Take backward compatibility seriously. API changes require good reason.

Installation for local development:

pip install -e '.[dev,lint,doc]'

Create documentation locally:

sudo apt install pandoc
sphinx-build docs _built_docs/html -W -a -E --keep-going
sphinx-autobuild docs _built_docs/html -v -j "auto" --watch generalized_additive_models

Once the version has been incremented, the commit must be tagged and the tag pushed in order to publish to PyPI:

git tag -a v0.1.0 -m "Version 0.1.0" b22724c
git push origin v0.1.0

Citing

TODO

generalized-additive-models's People

Contributors

tommyod

generalized-additive-models's Issues

Help with documentation setup

If someone reading this is knowledgeable with Sphinx: I could use some help reviewing the existing setup for the Sphinx documentation and improving upon it.

Ideally, I would want:

  • A clean, nice setup. No unnecessary complexity, but not bare-bones either.
  • A review of the conf.py file and the existing setup structure.
  • Improvement suggestions that align with the first point above.

Diagnostics plots

Additive Huber and generalization

We could create an additive model that minimizes

$$f(g(X \beta) - y) + \alpha \lVert P \beta \rVert$$

where $g$ is the inverse link function as usual.
The function $f$ is usually $x \mapsto x^2$, but here we could use Huber, pseudohuber, or let the user subclass and define the function.

Instances of this loss function structure:

  • Least squares (GAM with normal distribution)
  • Huber regression
  • Percentile regression (unsure about optimization of this...)
  • Pseudo-huber
  • $\ell_p$-norms
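
As a rough illustration of the loss structure proposed above, the following sketch writes a few candidate functions $f$ with NumPy and plugs them into the objective. Everything here is an assumption made for illustration: the function names, the `delta` threshold parameter, summing $f$ over the residuals, and using the Euclidean norm for the penalty. None of it is part of the package's API.

```python
import numpy as np


def squared(r):
    """Least-squares loss, f(r) = r^2."""
    return r**2


def huber(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    abs_r = np.abs(r)
    return np.where(abs_r <= delta, 0.5 * r**2, delta * (abs_r - 0.5 * delta))


def pseudo_huber(r, delta=1.0):
    """Pseudo-Huber loss: a smooth approximation of the Huber loss."""
    return delta**2 * (np.sqrt(1.0 + (r / delta) ** 2) - 1.0)


def objective(beta, X, y, P, f=squared, alpha=1.0, inverse_link=lambda eta: eta):
    """Sum of f over residuals plus a penalty on P @ beta (hypothetical helper)."""
    residuals = inverse_link(X @ beta) - y
    return np.sum(f(residuals)) + alpha * np.linalg.norm(P @ beta)
```

Note that `huber` uses the conventional factor of one half in its quadratic part, so it is not on the same scale as `squared`. A user-defined loss would only need to be a callable that maps an array of residuals to an array of the same shape.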

Add Code of Conduct and Contributing guidance

Hi @tommyod, I looked into your repository and it is structurally awesome. But since you requested contributions in your issues I personally think that adding code of conduct and Contributing guidelines is very important. Hope you understand it. So it would be a lot better if you add those.

Thank You!

Tensor of categoricals

I want fit and transform over a Tensor of categoricals to behave like fitting each categorical.

import matplotlib.pyplot as plt
import numpy as np

from generalized_additive_models import GAM, Categorical, Spline, Tensor
from generalized_additive_models.datasets import load_powerlifters

# Load data and filter it
df = load_powerlifters().rename(columns=lambda s:s.removeprefix("best3").removesuffix("kg"))

# Melt dataframe so each exercise (squat, bench, deadlift) ends up in a row
df = df.melt(id_vars=["sex", "age", "bodyweight"], value_vars=["squat", "bench", "deadlift"], value_name="lifted", var_name="exercise")

# Predict total weight lifted, given age, bodyweight and sex
target = df["lifted"]
age = Spline("age", penalty=1e3, num_splines=8)
bodyweight = Spline("bodyweight", penalty=1e3, num_splines=8)  # the "kg" suffix was stripped by the rename above
sex = Categorical("sex", penalty=0)
sex.fit_transform(df)
array([[1., 0.],
       [1., 0.],
       [0., 1.],
       ...,
       [1., 0.],
       [1., 0.],
       [1., 0.]])

However:

sex_ex_exercise = Tensor([Categorical(feature='sex', penalty=0), Categorical(feature='exercise', penalty=0)])

sex_ex_exercise.fit_transform(df)
Out[149]: 
array([[-1.51653018e-01, -1.51653018e-01,  1.33333333e+00,
        -5.15013649e-01, -5.15013649e-01,  4.18554080e-14],
       [-1.51653018e-01, -1.51653018e-01,  1.33333333e+00,
        -5.15013649e-01, -5.15013649e-01,  4.18554080e-14],
       [-4.84986351e-01, -4.84986351e-01,  2.50910404e-14,
        -1.81680315e-01, -1.81680315e-01,  1.33333333e+00],
       ...,
       [-1.51653018e-01,  1.33333333e+00, -1.51653018e-01,
        -5.15013649e-01, -1.36557432e-14, -5.15013649e-01],
       [-1.51653018e-01,  1.33333333e+00, -1.51653018e-01,
        -5.15013649e-01, -1.36557432e-14, -5.15013649e-01],
       [-1.51653018e-01,  1.33333333e+00, -1.51653018e-01,
        -5.15013649e-01, -1.36557432e-14, -5.15013649e-01]])
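
One plausible reading of the behavior requested above is that the tensor of two categoricals should yield a plain 0/1 indicator matrix with one column per combination of levels. The sketch below builds such a matrix with NumPy as a row-wise Kronecker product of two one-hot encodings. The helper names and the toy data are hypothetical, and this is not how the package's `Tensor` is implemented.

```python
import numpy as np


def one_hot(values):
    """One-hot encode a 1-D array; columns follow the sorted unique levels."""
    levels, codes = np.unique(values, return_inverse=True)
    return np.eye(len(levels))[codes], levels


def row_wise_kronecker(A, B):
    """For each row i, form the Kronecker product of A[i] and B[i]."""
    return np.einsum("ij,ik->ijk", A, B).reshape(A.shape[0], -1)


# Toy data standing in for the "sex" and "exercise" columns
sex = np.array(["M", "M", "F", "M"])
exercise = np.array(["squat", "bench", "deadlift", "bench"])

S, sex_levels = one_hot(sex)            # shape (4, 2)
E, exercise_levels = one_hot(exercise)  # shape (4, 3)
T = row_wise_kronecker(S, E)            # shape (4, 6)
```

Each row of `T` contains a single 1, in the column for that row's (sex, exercise) combination, which mirrors what `Categorical("sex").fit_transform(df)` returns for a single feature.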
