koaning / scikit-lego

Extra blocks for scikit-learn pipelines.

Home Page: https://koaning.github.io/scikit-lego/

License: MIT License

Makefile 0.09% Python 99.91%
scikit-learn machine-learning common-sense

scikit-lego's Issues

[FEATURE] Time Series Split with gap and column parameter

Time Series Split with a gap parameter between train and testing

We want a gap between the blue and the red segments, to simulate that in production you have to wait x days before you can build a target that looks x days ahead (e.g. when you want to predict a value x days from now).
image

Also, sometimes you have multiple samples per day; the current scikit-learn implementation doesn't support specifying a date column.
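
A minimal sketch of what such a splitter could look like, assuming the dates are available as an array with one entry per row. The name `date_gap_split` and its parameters are hypothetical, not an existing scikit-learn or scikit-lego API. Note that scikit-learn's own `TimeSeriesSplit` has a `gap` parameter in recent versions, but it counts samples rather than dates.

```python
import numpy as np


def date_gap_split(dates, n_splits=3, gap_days=1):
    """Yield (train_idx, test_idx) pairs, splitting on unique dates.

    Hypothetical helper: each test fold is a block of consecutive unique
    dates, and training rows must be more than `gap_days` days older than
    the first test date.
    """
    dates = np.asarray(dates, dtype="datetime64[D]")
    unique_dates = np.unique(dates)  # sorted
    blocks = np.array_split(unique_dates, n_splits + 1)
    gap = np.timedelta64(gap_days, "D")
    for block in blocks[1:]:
        test_start = block[0]
        train_idx = np.flatnonzero(dates < test_start - gap)
        test_idx = np.flatnonzero(np.isin(dates, block))
        yield train_idx, test_idx


# Six days with two samples per day.
dates = np.repeat(np.arange("2024-01-01", "2024-01-07", dtype="datetime64[D]"), 2)
splits = list(date_gap_split(dates, n_splits=2, gap_days=1))
```

Because the split works on dates, all samples from the same day always end up on the same side of the split.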

feature request: only apply random noise in `RandomAdder` to training data

Currently, RandomAdder adds noise to data both at training and at prediction time. This causes predictions to become non-deterministic and it offers no clear benefit in most cases I can think of.

I suggest changing the default behaviour of the transformer to only add random noise to the train data and optionally through a constructor flag also to the prediction data.
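
One way this could work is sketched below, assuming the transformer recognises the training data by fingerprinting it in `fit`. The class `TrainOnlyNoiseAdder` and the `add_at_predict` flag are hypothetical names for illustration, not the actual `RandomAdder` API.

```python
import hashlib

import numpy as np


class TrainOnlyNoiseAdder:
    """Hypothetical sketch: add Gaussian noise only to the data seen in fit."""

    def __init__(self, noise=1.0, random_state=0, add_at_predict=False):
        self.noise = noise
        self.random_state = random_state
        self.add_at_predict = add_at_predict

    @staticmethod
    def _fingerprint(X):
        # Shape plus a hash of the raw values identifies the training array.
        return X.shape, hashlib.sha256(X.tobytes()).hexdigest()

    def fit(self, X, y=None):
        self.train_fingerprint_ = self._fingerprint(np.asarray(X, dtype=float))
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        is_train = self._fingerprint(X) == self.train_fingerprint_
        if not (is_train or self.add_at_predict):
            return X  # new data passes through unchanged, so predictions stay deterministic
        rng = np.random.default_rng(self.random_state)
        return X + rng.normal(0.0, self.noise, size=X.shape)


X_train = np.arange(6.0).reshape(3, 2)
X_new = np.ones((3, 2))
adder = TrainOnlyNoiseAdder(noise=0.5).fit(X_train)
```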

feature request: timeseries features

it might be nice to be able to accept a datetime column and to generate lots of relevant features from it that can be used in an sklearn pipeline.

think: day_of_week, hour, etc.
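
A rough sketch of the kind of features meant here, using the pandas `.dt` accessor. The function name `add_datetime_features` and the generated column names are made up for illustration.

```python
import pandas as pd


def add_datetime_features(df, column):
    """Return a copy of df with calendar features derived from `column`."""
    out = df.copy()
    ts = pd.to_datetime(out[column])
    out[f"{column}_day_of_week"] = ts.dt.dayofweek  # Monday is 0
    out[f"{column}_hour"] = ts.dt.hour
    out[f"{column}_month"] = ts.dt.month
    return out


df = pd.DataFrame({"ts": ["2024-01-01 13:30:00"]})
features = add_datetime_features(df, "ts")
```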

[FEATURE] PandasPipeTransformer

It might be cool to have things that are a "lambda" in pandas like this:

df.pipe(func, kw1="a", kw2="b")

to be used in a pipeline.
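
For what it's worth, scikit-learn's `FunctionTransformer` already gets close to this through its `kw_args` parameter. A small sketch, where `add_ratio` is a hypothetical pipe-style function that takes a DataFrame and returns a DataFrame:

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer


def add_ratio(df, num, den):
    """Hypothetical pipe-style function: DataFrame in, DataFrame out."""
    return df.assign(ratio=df[num] / df[den])


# Equivalent in spirit to df.pipe(add_ratio, num="a", den="b").
pipe_step = FunctionTransformer(add_ratio, kw_args={"num": "a", "den": "b"})

df = pd.DataFrame({"a": [1.0, 4.0], "b": [2.0, 8.0]})
result = pipe_step.fit_transform(df)
```

A dedicated transformer could still add value on top of this, for example by checking that the input really is a DataFrame.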

feature request: decay model

The idea is to pass a parameter decay that will automatically decay past features using exponential decay such that the sample_weights param can be optimised in a grid search.

It might be good to discuss what other methods of feature decay we might want.
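
As a starting point for that discussion, exponential decay could be expressed as sample weights like this. It is only a sketch: `exponential_decay_weights` is a hypothetical helper and it assumes the rows are ordered from oldest to newest.

```python
import numpy as np


def exponential_decay_weights(n_samples, decay=0.99):
    """Weights for rows ordered oldest to newest; the newest row gets weight 1."""
    age = np.arange(n_samples - 1, -1, -1)  # the newest row has age 0
    return decay ** age


weights = exponential_decay_weights(3, decay=0.5)
```

The resulting array can be passed as `sample_weight` to estimators whose `fit` accepts it, and `decay` then becomes a value to tune in a grid search.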

[FEATURE] Statsmodels wrapper class

This would be useful if you want to use statsmodels, for example for regression, in an sklearn pipeline.

Example

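import statsmodels.api as sm
from sklearn.base import BaseEstimator, RegressorMixin


class SMWrapper(BaseEstimator, RegressorMixin):
    """A universal sklearn-style wrapper for statsmodels regressors."""

    def __init__(self, model_class, fit_intercept=True, sample_weight=None):
        self.model_class = model_class
        self.fit_intercept = fit_intercept
        self.sample_weight = sample_weight

    def fit(self, X, y):
        if self.fit_intercept:
            # has_constant="add" always adds the column, even if X looks constant
            X = sm.add_constant(X, has_constant="add")
        # statsmodels models take (endog, exog). Weights are only forwarded when
        # given, under the name `weights` that weighted models such as sm.WLS expect.
        kwargs = {} if self.sample_weight is None else {"weights": self.sample_weight}
        self.model_ = self.model_class(y, X, **kwargs)
        # Elastic net regularized fit -> fit_regularized
        # self.results_ = self.model_.fit_regularized(alpha=10, L1_wt=0.5)
        self.results_ = self.model_.fit()
        return self  # sklearn convention: fit returns the estimator

    def predict(self, X):
        if self.fit_intercept:
            X = sm.add_constant(X, has_constant="add")
        return self.results_.predict(X)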

[FEATURE] Feature selector by name

When you use pandas, you want to quickly specify which features to keep and/or which ones to drop.

Example

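from sklearn.base import BaseEstimator, TransformerMixin


class FeatureSelector(BaseEstimator, TransformerMixin):
    def __init__(self, keep=None, drop=None):
        self.keep = keep
        self.drop = drop

    def fit(self, X, y=None):
        # Resolve the column list once, at fit time.
        if self.keep:
            self.feature_names_ = list(self.keep)
        else:
            # Keep the original column order; a set difference would scramble it.
            drop = set(self.drop or [])
            self.feature_names_ = [col for col in X.columns if col not in drop]
        return self

    def transform(self, X):
        return X[self.feature_names_]

    def get_feature_names(self):
        return self.feature_names_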

feature request: EstimatorTransformer

is it possible to make a transformer that takes the output of an estimator and adds that to the values that are used for prediction? do we want it?
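
It does seem possible. Below is a minimal sketch, assuming the wrapped estimator has a `predict` method; the class name `PredictionAppender` is hypothetical and this is not necessarily how it should end up in the library.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.linear_model import LinearRegression


class PredictionAppender(BaseEstimator, TransformerMixin):
    """Hypothetical sketch: append an estimator's predictions as extra columns."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # Fit a clone so the estimator passed to the constructor stays untouched.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        preds = np.asarray(self.estimator_.predict(X)).reshape(len(X), -1)
        return np.hstack([np.asarray(X), preds])


X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 3.0, 5.0])
out = PredictionAppender(LinearRegression()).fit(X, y).transform(X)
```

One thing to keep in mind: on the training data the appended predictions are in-sample, so a downstream model may trust them more than it should.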

feature request: RBF Features

this is like the repeating RBF features except that this ... won't repeat. it will simply span the entire space of a variable.
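
A sketch of the idea for a single column, assuming Gaussian bumps with evenly spaced centers between the minimum and maximum of the variable. The function name and the default width (the spacing between centers) are assumptions, not a settled design.

```python
import numpy as np


def rbf_features(x, n_centers=5, width=None):
    """Map a 1-D array to Gaussian bumps with centers spread over [min(x), max(x)].

    Assumes n_centers >= 2 and that x is not constant.
    """
    x = np.asarray(x, dtype=float)
    centers = np.linspace(x.min(), x.max(), n_centers)
    if width is None:
        width = (x.max() - x.min()) / (n_centers - 1)  # spacing between centers
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width**2))


x = np.linspace(0, 10, 11)
feats = rbf_features(x, n_centers=3)
```

In a real transformer the centers and width would be learned in `fit` and reused in `transform`, rather than recomputed from whatever data is passed in.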

feature request: state-space models

Add state-space models, in discrete form:

x(k+1) = A * x(k) + B * u(k)
y(k) = C * x(k) + D * u(k)

where:
x(k) - internal state vector at timestamp k
u(k) - input vector at timestamp k
y(k) - output at timestamp k

Initial implementation would be with a given size of state vector x (e.g. you know the dimension of the underlying system). Second iteration could also estimate the length of this vector x, but that's prob not doable in a single day.

Must admit: I haven't seen many use-cases that would be best solved using a state-space model and thus wonder how useful this can be. Also, I haven't seen many use-cases in general.
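
To make the equations concrete, here is a small simulation of the discrete system with known matrices. It only covers the forward pass; estimating A, B, C and D from data is the hard part and is not shown. The function name `simulate_state_space` is hypothetical.

```python
import numpy as np


def simulate_state_space(A, B, C, D, u, x0):
    """Simulate x(k+1) = A x(k) + B u(k), y(k) = C x(k) + D u(k) for each row of u."""
    A, B, C, D = (np.asarray(m, dtype=float) for m in (A, B, C, D))
    x = np.asarray(x0, dtype=float)
    outputs = []
    for u_k in np.asarray(u, dtype=float):
        outputs.append(C @ x + D @ u_k)  # output at step k
        x = A @ x + B @ u_k  # state at step k + 1
    return np.array(outputs)


# One-dimensional state, constant input of 1.
y = simulate_state_space(
    A=[[0.5]], B=[[1.0]], C=[[1.0]], D=[[0.0]], u=np.ones((3, 1)), x0=[0.0]
)
```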

feature request: FeatureSmoother/ConstrainedSmoother

image

it might be epic if you could smooth out every column in X with regard to y as a transformer before it goes into an estimator. when looking at the loe(w)ess model this seems to be exactly what i want. not sure if it is super useful tho.

feature request: BoosterPipeline

The idea is to have a pipeline where you might have more than one model in sequence. Model 2 would try to improve on the residuals of Model 1 and so forth.
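
A minimal sketch of the residual idea for regressors, written as two plain functions rather than a pipeline class. The names `fit_residual_chain` and `predict_residual_chain` are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


def fit_residual_chain(models, X, y):
    """Fit each model on the residuals left by the models before it."""
    fitted, residual = [], np.asarray(y, dtype=float)
    for model in models:
        fitted_model = clone(model).fit(X, residual)
        residual = residual - fitted_model.predict(X)
        fitted.append(fitted_model)
    return fitted


def predict_residual_chain(fitted, X):
    """The chain's prediction is the sum of the individual predictions."""
    return np.sum([m.predict(X) for m in fitted], axis=0)


X = np.linspace(0, 1, 20).reshape(-1, 1)
y = 2 * X.ravel() + (X.ravel() > 0.5)  # linear trend plus a step
chain = fit_residual_chain(
    [LinearRegression(), DecisionTreeRegressor(max_depth=2, random_state=0)], X, y
)
mse_first = np.mean((y - chain[0].predict(X)) ** 2)
mse_chain = np.mean((y - predict_residual_chain(chain, X)) ** 2)
```

Here the linear model picks up the trend and the tree is left to deal with the step it could not capture.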

[FEATURE] DebugFeatureUnion

Similar to #46, but here for the FeatureUnion.

Description:
Have a log statement in between the steps of a feature union.

feature request: grouped model

sometimes you'd like to group the dataset into separate parts and run a model in each part. the idea of this model would be that you can add a classifier/regressor of your own but this model will make sure it gets run per group.

note that this model could actually work quite well in combination with a sklearn.dummy model.
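
A rough sketch of the per-group idea with pandas, using a `sklearn.dummy` regressor as the inner model. The helper names `fit_per_group` and `predict_per_group` are hypothetical, and a real implementation would need a fallback for groups that were not seen during fitting.

```python
import pandas as pd
from sklearn.base import clone
from sklearn.dummy import DummyRegressor


def fit_per_group(df, group_col, feature_cols, target_col, estimator):
    """Fit one clone of `estimator` per value of `group_col`."""
    return {
        key: clone(estimator).fit(part[feature_cols], part[target_col])
        for key, part in df.groupby(group_col)
    }


def predict_per_group(models, df, group_col, feature_cols):
    """Predict each row with the model fitted for that row's group."""
    preds = pd.Series(index=df.index, dtype=float)
    for key, part in df.groupby(group_col):
        preds.loc[part.index] = models[key].predict(part[feature_cols])
    return preds


df = pd.DataFrame(
    {"g": ["a", "a", "b", "b"], "x": [1, 2, 3, 4], "y": [1.0, 3.0, 10.0, 20.0]}
)
models = fit_per_group(df, "g", ["x"], "y", DummyRegressor(strategy="mean"))
preds = predict_per_group(models, df, "g", ["x"])
```

With the dummy model this boils down to predicting the mean of each group.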

[FEATURE] Feature capping

Use case:

In ML, features sometimes have extremely large or even infinite values (np.inf). We want to cap those values with a feature transformer.

Parameters:

  • features to cap (and/or features not to cap?)
  • min value
  • max value
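
The parameters above map fairly directly onto `np.clip`, which also handles infinite values. A sketch for NumPy arrays, where the function name `cap_features` and the `columns` argument are assumptions:

```python
import numpy as np


def cap_features(X, min_value, max_value, columns=None):
    """Clip selected columns of a 2-D array to [min_value, max_value]."""
    X = np.array(X, dtype=float)  # copy, so the input is left untouched
    cols = slice(None) if columns is None else list(columns)
    X[:, cols] = np.clip(X[:, cols], min_value, max_value)
    return X


X = [[np.inf, 1.0], [-np.inf, 500.0]]
capped_first = cap_features(X, -10, 10, columns=[0])
capped_all = cap_features(X, -10, 10)
```

Note that NaN values pass through `np.clip` unchanged, so they would need separate handling.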

feature request: monotonic models

it would be awesome if you could specify (per column) if the feature should be monotonically increasing, decreasing, updownup, downupdown or free. forcing this in a simple linear regression would already be kind of sweet.
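
For the simple linear regression case, monotonicity per column comes down to a sign constraint on that column's coefficient, which can be sketched with `scipy.optimize.lsq_linear`. The function `fit_monotonic_linear` and its `directions` argument are hypothetical, and the updownup and downupdown shapes are not covered by this.

```python
import numpy as np
from scipy.optimize import lsq_linear


def fit_monotonic_linear(X, y, directions):
    """Least squares with sign-constrained slopes.

    directions[j] is +1 (increasing), -1 (decreasing) or 0 (free) for column j.
    Returns (intercept, coefficients).
    """
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # first column is the free intercept
    lower = np.array([-np.inf] + [0.0 if d > 0 else -np.inf for d in directions])
    upper = np.array([np.inf] + [0.0 if d < 0 else np.inf for d in directions])
    result = lsq_linear(A, np.asarray(y, dtype=float), bounds=(lower, upper))
    return result.x[0], result.x[1:]


X = np.arange(10.0).reshape(-1, 1)
y = -2 * X.ravel() + 1
free_intercept, free_coef = fit_monotonic_linear(X, y, directions=[0])
inc_intercept, inc_coef = fit_monotonic_linear(X, y, directions=[1])
```

In this example the data is decreasing, so forcing an increasing relationship pushes the slope to the boundary at zero.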
