pybols's Introduction

Batch Ordinary Least Squares regression

An OLS regression that lets you iterate over your training data in batches. Useful when a standard implementation of linear regression does not fit into memory, as this library is considerably more memory efficient. It expects a vector of your dependent variable y and a column-ordered design matrix X of your independent variables. X needs to have the same shape for each iteration/update. The library does not calculate an intercept, i.e. the data has to be centered already or you have to add a dummy column of ones to your data. It naturally supports multi-processing, as the heavy lifting is done with numpy. Inspired by the answer of Chris Taylor on Stack Overflow.
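To make the memory argument concrete, here is a minimal sketch of one common way to fit OLS in batches: accumulate the small matrices X'X and X'y per batch and solve the normal equations at the end. The class `BatchOLSSketch` and its method names are hypothetical and use only numpy; whether bols works exactly like this internally is an assumption, not something this README states.

```python
import numpy as np


class BatchOLSSketch:
    """Illustrative batched OLS; NOT the bols implementation."""

    def __init__(self):
        self.xtx = None  # running sum of X.T @ X, shape (k, k)
        self.xty = None  # running sum of X.T @ y, shape (k,)

    def update(self, y, X):
        # Each batch only contributes a k x k matrix and a length-k vector,
        # so memory use does not grow with the number of rows seen so far.
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        if self.xtx is None:
            k = X.shape[1]
            self.xtx = np.zeros((k, k))
            self.xty = np.zeros(k)
        self.xtx += X.T @ X
        self.xty += X.T @ y

    def coefficients(self):
        # Solve the normal equations (X'X) beta = X'y; no intercept is added.
        return np.linalg.solve(self.xtx, self.xty)


rng = np.random.default_rng(0)
X = rng.random((600, 2))
y = rng.random(600)

sketch = BatchOLSSketch()
for start in range(0, 600, 200):
    sketch.update(y[start:start + 200], X[start:start + 200])

beta_batched = sketch.coefficients()
# Reference fit on the full data set for comparison.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Up to floating-point error, the batched coefficients should agree with the full-data fit, since the sums of X'X and X'y over batches equal the full-data products.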

Installation

The library can be installed straight from PyPI.

pip install bols

The only dependencies are numpy and scipy and the library should work with all Python versions >= 3.6.

Usage

First generate some data.

>>> import numpy as np

>>> data_y = np.random.random_sample((15000,))
>>> data_x0 = np.random.random_sample((15000,))
>>> data_x1 = np.random.random_sample((15000,))
>>> data = np.column_stack((data_y, data_x0, data_x1))
>>> y_a = data[0:5000, 0]
>>> y_b = data[5000:10000, 0]
>>> y_c = data[10000:15000, 0]
>>> data_a = data[0:5000, 1:]
>>> data_b = data[5000:10000, 1:]
>>> data_c = data[10000:15000, 1:]

Then fit a model. Pass one iterable with the batches of your dependent variable and one iterable with the matching batches of your independent variables. The only limitation is that all batches need to be of the same size.
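If the data starts out as one large array, the equally sized batches can be cut with a small helper. The function `iter_batches` below is a hypothetical illustration and not part of bols; it drops trailing rows that do not fill a whole batch, on the assumption that this is an acceptable way to meet the same-size requirement.

```python
import numpy as np


def iter_batches(array, batch_size):
    """Yield consecutive, equally sized row blocks of `array`.

    Trailing rows that do not fill a whole batch are dropped, because
    all batches are required to have the same size.
    """
    n_full = array.shape[0] // batch_size
    for i in range(n_full):
        yield array[i * batch_size:(i + 1) * batch_size]


# 11 rows with columns (y, x0, x1); with batch_size=5 the last row is dropped.
data = np.arange(33.0).reshape(11, 3)
y_batches = [b[:, 0] for b in iter_batches(data, 5)]
x_batches = [b[:, 1:] for b in iter_batches(data, 5)]
```

The two lists have the same form as the arguments in the model.batch call in this section. Whether generators can be passed directly instead of lists is not stated in this README, so the sketch materialises them as lists.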

>>> from bols import BOLS
>>> model = BOLS()
>>> model.batch([y_a, y_b], [data_a, data_b]) 

We can then also use the fitted model to predict unseen data.

>>> model.predict(data_c)
array([0.27206   , 0.42766053, 0.63881539, ..., 0.39375078, 0.44824941,
       0.4866372 ])

Alternatively, we can also update our model with new batches in the future.

>>> model.batch([y_c], [data_c])

We can also get a number of useful statistics about the regression with model.get_statistics(verbose=True), where verbose determines whether the method only returns the statistics or prints them as well.

>>> model.get_statistics(verbose=True)
OLS Regression Results

F:    13564.635
P>|F|:      0.0

  Variable    Coef.    Standard Error       t    P>|t|
----------  -------  ----------------  ------  -------
         0    0.429             0.007  58.225    0.000
         1    0.433             0.007  58.926    0.000

result(names=[0, 1], F=13564.634855542236, F_p_value=0.0, R2=0.6439835739923647, RMSE=0.3463917044171118, beta=array([0.42915076, 0.43304183]), se_beta=array([0.00737053, 0.00734891]), beta_p_value=array([0., 0.]))
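For readers who want to see where such figures can come from, the following sketch derives coefficients, standard errors, t values, R2 and F for a no-intercept OLS from accumulated sums alone (X'X, X'y, y'y and the row count). The function `ols_summary` is hypothetical and the formulas are the textbook ones with an uncentered R2; the values printed above appear consistent with them for n = 15000 and k = 2, but that the library computes them this way is an assumption.

```python
import numpy as np


def ols_summary(xtx, xty, yty, n):
    """No-intercept OLS statistics from accumulated sums only (a sketch)."""
    k = xtx.shape[0]
    beta = np.linalg.solve(xtx, xty)
    # Residual sum of squares: y'y - beta'X'y, valid at the OLS solution.
    ssr = yty - beta @ xty
    sigma2 = ssr / (n - k)  # residual variance estimate
    se_beta = np.sqrt(sigma2 * np.diag(np.linalg.inv(xtx)))
    r2 = 1.0 - ssr / yty  # uncentered R^2, since no intercept is fitted
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k))
    return beta, se_beta, beta / se_beta, r2, f_stat


rng = np.random.default_rng(1)
X = rng.random((15000, 2))
y = rng.random(15000)
beta, se_beta, t, r2, f_stat = ols_summary(X.T @ X, X.T @ y, y @ y, len(y))
```

Because only X'X, X'y, y'y and n are needed, these statistics can be refreshed after every batch without revisiting earlier data.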

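Even though our data is purely random, both the regression and the coefficients are statistically significant. This is expected here: the data is not centered and no intercept is fitted, so the coefficients pick up the non-zero mean of y. It is up to the user to make sure linear regression is an appropriate model for the data, for example by examining the residuals (model.errors).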
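The effect of the missing intercept can be reproduced with plain numpy. The sketch below fits the same kind of uniform random data once without and once with a column of ones; it uses np.linalg.lstsq rather than bols, so it illustrates the statistics, not the library's API.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 15000
y = rng.random(n)
X = rng.random((n, 2))

# Without an intercept the coefficients absorb the non-zero mean of y.
beta_no_intercept, *_ = np.linalg.lstsq(X, y, rcond=None)

# Add a column of ones so the first coefficient acts as the intercept.
X_const = np.column_stack((np.ones(n), X))
beta_with_intercept, *_ = np.linalg.lstsq(X_const, y, rcond=None)

intercept, slopes = beta_with_intercept[0], beta_with_intercept[1:]
```

With the constant column in place, the intercept should land near the mean of y (about 0.5) and the slopes near zero, which is what one would expect for unrelated random variables. The same constant column could be appended to each batch before it is passed to the model.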

Tests

The package is tested against the linear regression implementations of both sklearn and statsmodels. These two packages are therefore additional dependencies for running the tests.

Development

Running nix-shell default.nix drops you into a development shell with all dependencies already installed.

Contributors

cstich
