Comments (8)
I think we can drop the `TransformerMixin` here if we want it to be more like a resampler.
Then again, part of me would also be "ok" with dropping this feature from this library. Doing sampling stuff in a pipeline really requires `imblearn`, and I'm not sure if I like the idea of adding `imblearn` as a dependency for scikit-lego.
from scikit-lego.
That does feel better, yeah. Let's do that and update the docs accordingly.
@FBruzzesi so this came up during the code sprint. It indeed seems that our `OutlierRemover` doesn't have a `transform` method. And I also just noticed that we only allow `X` as input.
I really forgot what we had in mind when we designed this one. But I'm wondering what we might want to do with it. @MBrouns do you remember? Is there a reason why we called it `transform_train`? Maybe related to the fact that we cannot really use it in a pipeline because it changes the shape of `X`?
It should be related to #342.
The TL;DR is that the scikit-learn `Pipeline` would not filter `y`, and this would not work with supervised learning.
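To make the problem concrete, here is a minimal sketch. `DropLargeRows` is a hypothetical stand-in written for this comment, not the actual `OutlierRemover`; it only assumes a transformer whose `transform` returns fewer rows than it receives. The scikit-learn `Pipeline` passes the shortened `X` on to the final estimator together with the original `y`, so the supervised fit should fail on the length mismatch.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class DropLargeRows(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that drops rows with any |value| above a cutoff."""

    def __init__(self, cutoff=10.0):
        self.cutoff = cutoff

    def fit(self, X, y=None):
        # Nothing to learn in this toy example.
        return self

    def transform(self, X):
        X = np.asarray(X)
        # Rows are removed from X only; transform never sees y.
        return X[(np.abs(X) <= self.cutoff).all(axis=1)]


X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [50.0]])
y = np.array([0, 0, 0, 1, 1, 1])

pipe = Pipeline([("drop", DropLargeRows()), ("clf", LogisticRegression())])

try:
    pipe.fit(X, y)
    error_message = None
except ValueError as exc:
    # The classifier receives 5 rows of X but all 6 labels.
    error_message = str(exc)

print(error_message)
```

The same mismatch would show up for any transformer that changes the number of rows, which is presumably why the method was kept out of the regular `transform` path.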
Yea I think that's indeed it. I'm up for calling it `resample` to make it work with the imblearn folk.
I am certainly not a user of imblearn. I tried to play around with it and it seems not to be so straightforward.
Curiously enough, one of the user guides explaining how to create a custom sampler implements an outlier detection.
To add more details: when adding `resample` and `fit_resample` methods, I end up with the issue of having both `fit_resample` and `transform` implemented (the latter due to inheritance), which the imblearn `Pipeline` seems not to like.
I definitely agree on not adding `imblearn` as a dependency. I honestly like the idea of having such a feature, but maybe this is the wrong place for it.
We can also just make a utility function that just removes outliers. Something like:
X_new, y_new = remove_outliers(estimator, X, y)
Wouldn't this be the simplest/cleanest?
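A rough sketch of what such a helper could look like, under a few assumptions: `remove_outliers` does not exist in the library yet, the estimator is already fitted, and its `predict` follows the scikit-learn outlier-detection convention of returning `-1` for outliers and `1` for inliers. `ZScoreDetector` is a toy detector made up purely so the snippet runs with NumPy alone.

```python
import numpy as np


def remove_outliers(estimator, X, y=None):
    """Drop rows that a fitted outlier detector flags as outliers.

    Hypothetical helper: assumes `estimator.predict(X)` returns -1 for
    outliers and 1 for inliers (the scikit-learn convention).
    """
    X = np.asarray(X)
    mask = np.asarray(estimator.predict(X)) != -1
    if y is None:
        return X[mask]
    # Apply the same mask to y so X and y stay aligned.
    return X[mask], np.asarray(y)[mask]


class ZScoreDetector:
    """Toy stand-in detector, for illustration only."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        return self

    def predict(self, X):
        z = np.abs((np.asarray(X, dtype=float) - self.mean_) / self.std_)
        # A row is an outlier if any feature exceeds the z-score threshold.
        return np.where((z > self.threshold).any(axis=1), -1, 1)


X = np.array([[0.0], [0.1], [-0.1], [0.2], [-0.2], [50.0]])
y = np.array([0, 1, 0, 1, 0, 1])

detector = ZScoreDetector(threshold=2.0).fit(X)
X_new, y_new = remove_outliers(detector, X, y)
print(X_new.shape, y_new.shape)
```

Because the function filters `X` and `y` together and lives outside any `Pipeline`, it sidesteps both the `y` alignment issue and the `imblearn` dependency, at the cost of being a manual preprocessing step.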