Comments (8)

koaning commented on June 26, 2024

I think we can drop the TransformerMixin here if we want it to be more like a resampler.

Then again, part of me would also be "ok" with dropping this feature from this library. Doing sampling stuff in a pipeline really requires imblearn and I'm not sure if I like the idea of adding imblearn as a dependency for scikit-lego.

from scikit-lego.

koaning commented on June 26, 2024

That does feel better, yeah. Let's do that and update the docs accordingly.

koaning commented on June 26, 2024

@FBruzzesi so this came up during the code sprint. It indeed seems that our OutlierRemover doesn't have a transform method. I also just noticed that we only allow X as input.

I really forgot what we had in mind when we designed this one. But I'm wondering what we might want to do with it. @MBrouns do you remember? Is there a reason why we called it transform_train? Maybe related to the fact that we cannot really use it in a pipeline because it changes the shape of X?

FBruzzesi commented on June 26, 2024

It should be related to #342.
The TL;DR is that the scikit-learn Pipeline would not filter y, so this would not work with supervised learning.
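
To illustrate the point, here is a minimal sketch of what goes wrong when a pipeline step drops rows. `DropLargeRows` and its `threshold` parameter are hypothetical names made up for this example; they are not the scikit-lego OutlierRemover. The step can only return a filtered X, so the final estimator receives an X and a y of different lengths.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class DropLargeRows(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that drops rows whose first feature exceeds a threshold."""

    def __init__(self, threshold=10.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        # Nothing to learn in this toy example.
        return self

    def transform(self, X):
        X = np.asarray(X)
        # Only X is filtered here; a transformer has no way to hand back a filtered y.
        return X[X[:, 0] <= self.threshold]


X = np.array([[1.0], [2.0], [3.0], [100.0]])  # the last row is an obvious outlier
y = np.array([1.0, 2.0, 3.0, 4.0])

pipe = Pipeline([("drop", DropLargeRows()), ("model", LinearRegression())])

try:
    pipe.fit(X, y)
    error_message = None
except ValueError as exc:
    # X now has 3 rows while y still has 4, so the final estimator rejects the input.
    error_message = str(exc)

print(error_message)
```

The same mismatch would show up with any supervised final step, which is presumably why the method was not exposed as a plain transform.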

MBrouns commented on June 26, 2024

Yea I think that's indeed it. I'm up for calling it resample to make it work with the imblearn folk

FBruzzesi commented on June 26, 2024

I am certainly not a user of imblearn. I tried to play around with it and it seems not to be so straightforward.
Curiously enough, one of the user guides explaining how to create a custom sampler implements outlier detection.

To add more details: when adding resample and fit_resample methods, I end up with both fit_resample and transform implemented (the latter due to inheritance), which the imblearn Pipeline seems not to like.

FBruzzesi commented on June 26, 2024

I definitely agree on not adding imblearn as a dependency. I honestly like the idea of having such a feature, but maybe this is the wrong place for it.

from scikit-lego.

koaning commented on June 26, 2024

We can also just make a utility function that just removes outliers. Something like:

X_new, y_new = remove_outliers(estimator, X, y)

Wouldn't this be the simplest/cleanest?
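
A rough sketch of what such a helper could look like, under stated assumptions: the thread only proposes the call signature, so the body below is a guess and not the scikit-lego implementation. It assumes the estimator follows the scikit-learn outlier-detector convention, where predict returns -1 for outliers and +1 for inliers (as IsolationForest does). `ZScoreDetector` is a hypothetical stand-in that keeps the example dependent on numpy only.

```python
import numpy as np


def remove_outliers(estimator, X, y):
    """Fit an outlier detector on X and return only the rows it labels as inliers."""
    X = np.asarray(X)
    y = np.asarray(y)
    # Assumed convention: predict returns -1 for outliers and +1 for inliers.
    labels = estimator.fit(X).predict(X)
    mask = labels != -1
    # The same mask is applied to X and y, so they stay aligned.
    return X[mask], y[mask]


class ZScoreDetector:
    """Hypothetical stand-in detector: flags rows with any feature far from the mean."""

    def __init__(self, threshold=1.5):
        self.threshold = threshold

    def fit(self, X, y=None):
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        return self

    def predict(self, X):
        z = np.abs((X - self.mean_) / self.std_)
        return np.where((z > self.threshold).any(axis=1), -1, 1)


X = np.array([[1.0], [2.0], [3.0], [100.0]])  # the last row is an obvious outlier
y = np.array([0, 1, 0, 1])

X_new, y_new = remove_outliers(ZScoreDetector(), X, y)
print(X_new.shape, y_new.shape)  # (3, 1) (3,)
```

Because the function returns both arrays, it sidesteps the Pipeline limitation entirely: the filtering happens once, before the data reaches any pipeline.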
