koaning / scikit-lego
Extra blocks for scikit-learn pipelines.
Home Page: https://koaning.github.io/scikit-lego/
License: MIT License
It might be nice to have a variant of the voting classifier. One that goes a bit further than mere voting and takes the uncertainty of the separate classifiers into account.
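A minimal sketch of what that could look like, averaging `predict_proba` outputs so that a doubtful classifier (flat probabilities) moves the vote less than a confident one. The `ProbabilisticVoter` name and design are assumptions, not a settled API:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone


class ProbabilisticVoter(BaseEstimator, ClassifierMixin):
    """Votes with predicted probabilities instead of hard labels."""

    def __init__(self, estimators):
        self.estimators = estimators

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.estimators_ = [clone(e).fit(X, y) for e in self.estimators]
        return self

    def predict_proba(self, X):
        # confident classifiers push the average harder than doubtful ones
        return np.mean([e.predict_proba(X) for e in self.estimators_], axis=0)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
```

Note that scikit-learn's `VotingClassifier(voting="soft")` already does this averaging; the interesting extension would be weighting each vote by a proper per-classifier uncertainty estimate.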
Selects columns based on a name. Accepts an Iterable(str) or a str (which is converted to an iterable of length 1).
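A sketch of such a selector under those assumptions:

```python
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Selects columns of a pd.DataFrame by name."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # a single string is promoted to a list of length 1
        columns = [self.columns] if isinstance(self.columns, str) else list(self.columns)
        return X[columns]
```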
Time Series Split with a gap parameter between train and test.
Between the blue and the red we want to have a gap, to simulate that in production you need to wait x days before creating your target that looks x days ahead (e.g. the case where you want to predict the value in x days).
Also, sometimes you have multiple samples per day; the current scikit-learn implementation doesn't support specifying a date column.
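A sketch of the gap part, trimming the tail of each training fold (class and parameter names are guesses):

```python
from sklearn.model_selection import TimeSeriesSplit


class GapTimeSeriesSplit:
    """TimeSeriesSplit that keeps `gap` samples between train and test."""

    def __init__(self, n_splits=5, gap=0):
        self.cv = TimeSeriesSplit(n_splits=n_splits)
        self.gap = gap

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.cv.get_n_splits(X, y, groups)

    def split(self, X, y=None, groups=None):
        for train_idx, test_idx in self.cv.split(X, y, groups):
            # drop the last `gap` train indices: the model never sees
            # the waiting period just before the test window
            yield train_idx[: len(train_idx) - self.gap], test_idx
```

Recent scikit-learn versions (0.24+) have since grown a `gap` argument on TimeSeriesSplit itself; the date-column part would still need custom work.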
Currently, RandomAdder adds noise to the data both at training and at prediction time. This makes predictions non-deterministic and offers no clear benefit in most cases I can think of.
I suggest changing the default behaviour of the transformer to only add random noise to the training data and, optionally via a constructor flag, to the prediction data as well.
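A sketch of that default, using the fact that training goes through fit_transform while prediction only calls transform (the flag name is a guess):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils import check_random_state


class RandomAdder(BaseEstimator, TransformerMixin):
    def __init__(self, noise=1.0, transform_test=False, random_state=None):
        self.noise = noise
        self.transform_test = transform_test
        self.random_state = random_state

    def fit(self, X, y=None):
        return self

    def fit_transform(self, X, y=None):
        # fit_transform only runs at training time: always add noise here
        self.fit(X, y)
        rs = check_random_state(self.random_state)
        return X + rs.normal(0, self.noise, size=np.shape(X))

    def transform(self, X):
        # transform runs at prediction time: pass through unless asked
        if self.transform_test:
            rs = check_random_state(self.random_state)
            return X + rs.normal(0, self.noise, size=np.shape(X))
        return X
```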
It might be nice to be able to accept a datetime column and to generate lots of relevant features from it that can be used in an sklearn pipeline.
Think: day_of_week, hour, etc.
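A sketch, assuming the input is a single-column DataFrame holding the datetime (the exact feature set is a guess):

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DatetimeFeaturizer(BaseEstimator, TransformerMixin):
    """Expands a single datetime column into calendar features."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        dt = pd.to_datetime(X.iloc[:, 0])
        return pd.DataFrame({
            "day_of_week": dt.dt.dayofweek,
            "hour": dt.dt.hour,
            "month": dt.dt.month,
            "is_weekend": (dt.dt.dayofweek >= 5).astype(int),
        })
```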
Selecting Pandas columns by their dtype
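A sketch, leaning on pandas' own select_dtypes:

```python
from sklearn.base import BaseEstimator, TransformerMixin


class DtypeSelector(BaseEstimator, TransformerMixin):
    """Keeps only columns whose dtype matches, e.g. np.number."""

    def __init__(self, include):
        self.include = include

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # pandas does the heavy lifting here
        return X.select_dtypes(include=self.include)
```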
It might be cool to have things that are a "lambda" in pandas, like this:
df.pipe(func, kw1="a", kw2="b")
to be applied in a pipeline.
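A sketch of a transformer with pipe semantics (names are made up):

```python
from sklearn.base import BaseEstimator, TransformerMixin


class PandasPipeTransformer(BaseEstimator, TransformerMixin):
    """Applies df.pipe(func, **kw_args) as a pipeline step."""

    def __init__(self, func, kw_args=None):
        self.func = func
        self.kw_args = kw_args

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X.pipe(self.func, **(self.kw_args or {}))
```

For what it's worth, scikit-learn's FunctionTransformer(func, validate=False, kw_args={...}) already gets close, since func(X, **kw_args) is exactly what X.pipe(func, **kw_args) does.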
dev dependencies in `./setup.`
pip install sphinx
pip install sphinx_rtd_theme
Not every user will appreciate python logging, so it might make sense to have a page in the documentation that gives an example of how a DebugPipeline might allow you to discover a bug.
It is probably a good idea to link to the blog post that inspired the feature too: https://tomaugspurger.github.io/method-chaining
A wrapper class to log a fit/transform/fit_transform of an estimator or a transformer
Please add basic information for the documentation.
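A sketch of what such a wrapper could log (the class name and messages are placeholders):

```python
import logging

from sklearn.base import BaseEstimator, TransformerMixin

logger = logging.getLogger(__name__)


class LoggingWrapper(BaseEstimator, TransformerMixin):
    """Wraps an estimator/transformer and logs every fit/transform call."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None):
        logger.info("fit: X has shape %s", getattr(X, "shape", "?"))
        self.estimator.fit(X, y)
        return self

    def transform(self, X):
        Xt = self.estimator.transform(X)
        logger.info("transform: %s -> %s", X.shape, Xt.shape)
        return Xt
```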
The idea is to pass a parameter decay that will automatically decay past samples using exponential decay, such that the sample_weights param can be optimised in a grid search.
It might be good to discuss what other methods of decay we might want; preferably this is a setting.
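A sketch of the meta-estimator idea, assuming rows are ordered oldest-first and the wrapped model accepts sample_weight (class and parameter names are guesses):

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone


class DecayEstimator(BaseEstimator, RegressorMixin):
    """Fits the wrapped model with exponentially decayed sample weights."""

    def __init__(self, model, decay=0.999):
        self.model = model
        self.decay = decay

    def fit(self, X, y):
        n = len(y)
        # the most recent sample gets weight 1, older ones decay ** age
        weights = self.decay ** np.arange(n - 1, -1, -1)
        self.model_ = clone(self.model).fit(X, y, sample_weight=weights)
        return self

    def predict(self, X):
        return self.model_.predict(X)
```

With `decay` exposed as a constructor parameter, it can go straight into a grid search.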
This is Travis, but with support for Windows. https://www.appveyor.com/pricing/
If you want to use statsmodels, for example for regression in a sklearn pipeline:
Example
```python
import statsmodels.api as sm
from sklearn.base import BaseEstimator, RegressorMixin


class SMWrapper(BaseEstimator, RegressorMixin):
    """A universal sklearn-style wrapper for statsmodels regressors."""

    def __init__(self, model_class, fit_intercept=True, sample_weight=None):
        self.model_class = model_class
        self.fit_intercept = fit_intercept
        self.sample_weight = sample_weight

    def fit(self, X, y):
        if self.fit_intercept:
            X = sm.add_constant(X)
        if self.sample_weight is not None:
            # the keyword depends on the model class, e.g. sm.WLS expects `weights`
            self.model_ = self.model_class(y, X, weights=self.sample_weight)
        else:
            self.model_ = self.model_class(y, X)
        # for an elastic-net regularized fit, use fit_regularized instead:
        # self.results_ = self.model_.fit_regularized(alpha=10, L1_wt=0.5)
        self.results_ = self.model_.fit()
        return self  # sklearn requires fit to return self

    def predict(self, X):
        if self.fit_intercept:
            X = sm.add_constant(X)
        return self.results_.predict(X)
```
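The wrapper then drops into a pipeline like any other regressor, e.g.:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("ols", SMWrapper(sm.OLS))])
```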
When you use pandas, you want to quickly specify which features to keep and/or which ones you want to drop.
Example
```python
from sklearn.base import BaseEstimator, TransformerMixin


class FeatureSelector(BaseEstimator, TransformerMixin):
    """Keeps or drops pandas columns by name."""

    def __init__(self, keep=None, drop=None):
        self.keep = keep
        self.drop = drop

    def fit(self, X, y=None):
        if self.keep is not None:
            self.feature_names_ = list(self.keep)
        else:
            # preserve the original column order when dropping
            self.feature_names_ = [c for c in X.columns if c not in set(self.drop or [])]
        return self

    def transform(self, X):
        return X[self.feature_names_]

    def get_feature_names(self):
        return self.feature_names_
```
Is it possible to make a transformer that takes the output of an estimator and adds it to the values used for prediction? Do we want it?
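It is possible; a sketch (names made up):

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone


class EstimatorTransformer(BaseEstimator, TransformerMixin):
    """Uses an estimator's predictions as a feature column."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        # the wrapped model's predictions become a feature column
        return self.estimator_.predict(X).reshape(-1, 1)
```

Wrapped in a FeatureUnion, the prediction column gets concatenated onto the original features.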
This is like the repeating RBF features, except that this one... won't repeat. It will simply span the entire space of a variable.
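A sketch, placing evenly spaced Gaussian bumps across the observed range (parameter names are guesses):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class RBFFeatures(BaseEstimator, TransformerMixin):
    """Places n_centers Gaussian bumps across the observed range of x."""

    def __init__(self, n_centers=10, width=1.0):
        self.n_centers = n_centers
        self.width = width

    def fit(self, X, y=None):
        x = np.asarray(X).ravel()
        self.centers_ = np.linspace(x.min(), x.max(), self.n_centers)
        self.scale_ = self.width * (x.max() - x.min()) / self.n_centers
        return self

    def transform(self, X):
        x = np.asarray(X).ravel()
        # one smooth bump per center, spanning the whole variable range
        return np.exp(-(((x[:, None] - self.centers_[None, :]) / self.scale_) ** 2))
```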
Add state-space models, in discrete form:
x(k+1) = A * x(k) + B * u(k)
y(k) = C * x(k) + D * u(k)
where:
x(k) - internal state vector at timestep k
u(k) - input vector at timestep k
y(k) - output at timestep k
An initial implementation would be with a given size of the state vector x (e.g. you know the dimension of the underlying system). A second iteration could also estimate the length of this vector x, but that's probably not doable in a single day.
I must admit: I haven't seen many use-cases that would be best solved using a state-space model and thus wonder how useful this can be. Then again, I haven't seen many use-cases in general.
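Simulating the recursion with known matrices is the easy half; a sketch of that part (estimating A, B, C, D from data, i.e. system identification, is where the real work would be):

```python
import numpy as np


def simulate_state_space(A, B, C, D, u, x0):
    """Rolls forward: y(k) = C x(k) + D u(k), then x(k+1) = A x(k) + B u(k)."""
    x = np.asarray(x0, dtype=float)
    ys = []
    for u_k in u:
        ys.append(C @ x + D @ u_k)  # output at timestep k
        x = A @ x + B @ u_k         # advance the internal state
    return np.array(ys)
```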
The idea is to force the effect of the parameters of the model to be either increasing or decreasing.
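For a linear model, one way to get this is to bound the coefficient signs. A sketch with scipy; the signs encoding (+1 increasing, -1 decreasing, 0 free) is my own:

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5])

# per-coefficient sign constraints: +1 increasing, -1 decreasing, 0 free
signs = np.array([1, -1, 0])
lower = np.where(signs > 0, 0.0, -np.inf)
upper = np.where(signs < 0, 0.0, np.inf)

coef = lsq_linear(X, y, bounds=(lower, upper)).x  # bounded least squares
```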
Once #21 is merged there is a randomregression; this is a dummy model used for benchmarking.
The idea is to have a pipeline where you might have more than one model in sequence. Model 2 would try to improve on the residuals of Model 1, and so forth.
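A sketch of such a chain (boosting-flavoured; names made up):

```python
from sklearn.base import BaseEstimator, RegressorMixin, clone


class ResidualChain(BaseEstimator, RegressorMixin):
    """Each model is fit on the residuals left by the previous ones."""

    def __init__(self, models):
        self.models = models

    def fit(self, X, y):
        residual = y
        self.fitted_ = []
        for model in self.models:
            m = clone(model).fit(X, residual)
            residual = residual - m.predict(X)
            self.fitted_.append(m)
        return self

    def predict(self, X):
        # the chain's prediction is the sum of all partial fits
        return sum(m.predict(X) for m in self.fitted_)
```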
Similar to #46, but here for the FeatureUnion.
Description:
Have a log statement in between the steps of a feature union.
A pipeline that logs extra information before/after every step, which is useful for debugging.
Sometimes you'd like to group the dataset into separate parts and run a model on each part. The idea of this model would be that you can bring a classifier/regressor of your own, but this model will make sure it gets run per group.
Note that this model could actually work quite well in combination with a sklearn.dummy model.
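A sketch of the grouped estimator, assuming a pandas DataFrame with a group column (names are guesses):

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, clone


class GroupedPredictor(BaseEstimator):
    """Fits one clone of the inner estimator per value of a group column."""

    def __init__(self, estimator, group_col):
        self.estimator = estimator
        self.group_col = group_col

    def fit(self, X, y):
        y = pd.Series(np.asarray(y), index=X.index)
        self.models_ = {
            group: clone(self.estimator).fit(
                subset.drop(columns=self.group_col), y.loc[subset.index]
            )
            for group, subset in X.groupby(self.group_col)
        }
        return self

    def predict(self, X):
        preds = pd.Series(index=X.index, dtype=float)
        for group, subset in X.groupby(self.group_col):
            preds.loc[subset.index] = self.models_[group].predict(
                subset.drop(columns=self.group_col)
            )
        return preds.to_numpy()
```

Plugging in sklearn.dummy.DummyRegressor would indeed give per-group baselines almost for free.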
nuff said.
me: plz dont do this
also me: but it so funny
This is a bit of a silly request, but it is an interesting idea to cross-validate on clusters instead of k-folds.
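Not that silly at all; a quick sketch using cluster labels as the CV groups:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

X, y = make_classification(n_samples=500, random_state=42)

# every cluster takes one turn as the held-out "fold"
groups = KMeans(n_clusters=5, random_state=42).fit_predict(X)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, groups=groups, cv=LeaveOneGroupOut()
)
```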
A decorator that logs how a pandas dataframe is modified. Information:
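A sketch of such a decorator, logging the shape before and after each step (exactly which information to log is open):

```python
import functools
import logging

logger = logging.getLogger(__name__)


def log_step(func):
    """Logs the shape change made by a dataframe -> dataframe function."""

    @functools.wraps(func)
    def wrapper(df, *args, **kwargs):
        result = func(df, *args, **kwargs)
        logger.info("%s: shape %s -> %s", func.__name__, df.shape, result.shape)
        return result

    return wrapper
```

This plays nicely with the method-chaining style: decorate each step and every `df.pipe(step)` call logs itself.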
Training a umap/tsne embedding before passing the data to the algorithm would be nice.
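A sketch, assuming the umap-learn package; UMAP works here where t-SNE doesn't, because scikit-learn's TSNE has no transform method:

```python
import umap  # assumes the umap-learn package
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# UMAP exposes fit/transform, so it can sit in front of any estimator
pipe = Pipeline([
    ("embed", umap.UMAP(n_components=2, random_state=42)),
    ("clf", KNeighborsClassifier()),
])
```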
Selects columns from a pd.DataFrame or np.ndarray based on indices. Can be used in a sklearn Pipeline.
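A sketch covering both input types:

```python
from sklearn.base import BaseEstimator, TransformerMixin


class IndexSelector(BaseEstimator, TransformerMixin):
    """Selects columns of a DataFrame or ndarray by integer position."""

    def __init__(self, indices):
        self.indices = indices

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        if hasattr(X, "iloc"):       # pd.DataFrame
            return X.iloc[:, self.indices]
        return X[:, self.indices]    # np.ndarray
```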
Use case:
In ML, sometimes your features have extremely large values or even infinite values (np.inf); we want to cap those values with a feature transformer.
Parameters:
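A sketch of such a capper, assuming quantile-based caps learned at fit time (the quantile parameter is my guess):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class OutlierCapper(BaseEstimator, TransformerMixin):
    """Clips every feature to quantile caps learned on the training data."""

    def __init__(self, quantile=0.99):
        self.quantile = quantile

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # treat +/-inf as missing so they don't wreck the quantiles
        X = np.where(np.isfinite(X), X, np.nan)
        self.lower_ = np.nanquantile(X, 1 - self.quantile, axis=0)
        self.upper_ = np.nanquantile(X, self.quantile, axis=0)
        return self

    def transform(self, X):
        # clip also maps +/-inf onto the learned caps
        return np.clip(np.asarray(X, dtype=float), self.lower_, self.upper_)
```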
It would be awesome if you could specify (per column) whether the feature should be monotonically increasing, decreasing, up-down-up, down-up-down or free. Forcing this in a simple linear regression would already be kind of sweet.
The make_pipeline function uses the (standard) Pipeline of sklearn. Refactor this function such that another pipeline, like DebugPipeline, can be passed as the preferred pipeline.
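A sketch of the refactor; it leans on sklearn's private _name_estimators helper, which is what make_pipeline itself uses:

```python
from sklearn.pipeline import Pipeline, _name_estimators


def make_pipeline(*steps, pipeline_class=Pipeline, **kwargs):
    """Like sklearn's make_pipeline, but the Pipeline class is pluggable."""
    return pipeline_class(_name_estimators(steps), **kwargs)
```

Calling make_pipeline(StandardScaler(), LinearRegression(), pipeline_class=DebugPipeline) would then hand back a DebugPipeline.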
After the unit tests have run we get about 433 warnings. Let's investigate this.
Feature generation that can be used for time series. A trick from the London talk.
A conference paper that was downloaded 700+ times can't be wrong.
https://link.springer.com/chapter/10.1007/978-3-319-00651-2_19
I've always wanted to have loess regression in python. R has a cool version of it, but in python it has always been missing. This would be a great model to host here.
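A sketch of a lowess-backed regressor using statsmodels' implementation, with simple interpolation for unseen points (the class is my own wrapping, not an existing API):

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from statsmodels.nonparametric.smoothers_lowess import lowess


class LowessRegressor(BaseEstimator, RegressorMixin):
    """Loess/lowess for 1-d inputs, interpolating between smoothed points."""

    def __init__(self, frac=0.3):
        self.frac = frac

    def fit(self, X, y):
        x = np.asarray(X).ravel()
        # lowess returns the smoothed curve as an (n, 2) array sorted by x
        smoothed = lowess(y, x, frac=self.frac)
        self.x_, self.y_ = smoothed[:, 0], smoothed[:, 1]
        return self

    def predict(self, X):
        # linear interpolation between the smoothed training points
        return np.interp(np.asarray(X).ravel(), self.x_, self.y_)
```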
The EstimatorTransformer is complicated enough to warrant an .rst document. Might be nice to check whether we can automatically test this as well.
As an alternative to isolation forests.
LagAdder(colname, lags)
or LagAdder(idx_col, lags)
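A sketch for the LagAdder(colname, lags) flavour; the idx_col variant would shift against a date index instead:

```python
from sklearn.base import BaseEstimator, TransformerMixin


class LagAdder(BaseEstimator, TransformerMixin):
    """Adds shifted copies of a column as new lag features."""

    def __init__(self, colname, lags):
        self.colname = colname
        self.lags = lags

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()
        for lag in self.lags:
            # lag k holds the value of the column k rows earlier
            X[f"{self.colname}_lag_{lag}"] = X[self.colname].shift(lag)
        return X
```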