Comments (5)
This sounds useful enough. Could you give an example of what the resulting pipeline might look like?
Also, does this not exist elsewhere? It seems like `sklearn.compose.ColumnTransformer` is still in experimental mode.
from scikit-lego.
I suppose a transformer such as this one is what I am thinking of.
import warnings

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class PandasTypeSelector(BaseEstimator, TransformerMixin):
    def __init__(self, dtype):
        self.dtype = dtype

    def fit(self, X, y=None):
        # Save the column names for later checking
        self.type_columns = list(X.select_dtypes(include=[self.dtype]))
        return self

    def transform(self, X):
        assert isinstance(X, pd.DataFrame)
        transformed_frame = X.select_dtypes(include=[self.dtype])
        # Warn if the selected columns differ from those seen during fit
        if set(list(transformed_frame)) != set(self.type_columns):
            warnings.warn(f'Columns of type {self.dtype} were not equal during fit and transform')
        return transformed_frame
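To make the question about the resulting pipeline concrete, here is a minimal sketch of how such a selector might be combined with standard scikit-learn steps. The toy data, the column names and the choice of `StandardScaler` and `OneHotEncoder` are assumptions made for illustration, not something proposed in this thread; the selector itself is repeated only so the sketch runs on its own.

```python
import warnings

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class PandasTypeSelector(BaseEstimator, TransformerMixin):
    """The selector from the comment above, repeated so this sketch is self-contained."""

    def __init__(self, dtype):
        self.dtype = dtype

    def fit(self, X, y=None):
        self.type_columns = list(X.select_dtypes(include=[self.dtype]))
        return self

    def transform(self, X):
        assert isinstance(X, pd.DataFrame)
        transformed_frame = X.select_dtypes(include=[self.dtype])
        if set(transformed_frame) != set(self.type_columns):
            warnings.warn(f"Columns of type {self.dtype} were not equal during fit and transform")
        return transformed_frame


# Hypothetical toy data: two numeric columns and one categorical column.
df = pd.DataFrame({
    "age": [23, 35, 41, 58],
    "income": [1800.0, 3200.0, 2900.0, 4100.0],
    "city": pd.Categorical(["Utrecht", "Delft", "Utrecht", "Leiden"]),
})

# One branch per dtype: select the columns first, then apply a dtype-specific step.
numeric_branch = Pipeline([
    ("select", PandasTypeSelector("number")),
    ("scale", StandardScaler()),
])
categorical_branch = Pipeline([
    ("select", PandasTypeSelector("category")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# FeatureUnion stacks the outputs of both branches side by side.
features = FeatureUnion([
    ("numeric", numeric_branch),
    ("categorical", categorical_branch),
])

X_out = features.fit_transform(df)
print(X_out.shape)
```

The union could then be placed in front of any estimator inside another `Pipeline`. Whether this is preferable to `ColumnTransformer` is a separate question that the sketch does not try to settle.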
- A `ValueError` might be better than a mere assert, but this looks good to me.
- I think `sklearn` prefers to have an `is_fitted_` param around. Might be in the Mixin. Either way, it might be cool to initialize `self.dtype` as `None` in the `__init__`.
- Any idea on how you might want to test this?
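The first two suggestions could look roughly like the sketch below. It makes several assumptions: the class name `CheckedTypeSelector` and the helper `_check_frame` are hypothetical, and the fitted-state check uses scikit-learn's `check_is_fitted` on a trailing-underscore attribute rather than a literal `is_fitted_` flag. The `dtype=None` default is left out because the thread does not settle when the dtype would be determined in that case.

```python
import warnings

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


class CheckedTypeSelector(BaseEstimator, TransformerMixin):
    """Hypothetical variant of the selector with explicit input and fit checks."""

    def __init__(self, dtype):
        self.dtype = dtype

    @staticmethod
    def _check_frame(X):
        # Raise a ValueError instead of relying on assert, which is
        # skipped when Python runs with the -O flag.
        if not isinstance(X, pd.DataFrame):
            raise ValueError(f"Expected a pandas DataFrame, got {type(X).__name__}")

    def fit(self, X, y=None):
        self._check_frame(X)
        # Trailing underscore marks an attribute that is learned during fit.
        self.type_columns_ = list(X.select_dtypes(include=[self.dtype]))
        return self

    def transform(self, X):
        # Raises NotFittedError if fit has not been called yet.
        check_is_fitted(self, "type_columns_")
        self._check_frame(X)
        selected = X.select_dtypes(include=[self.dtype])
        if set(selected) != set(self.type_columns_):
            warnings.warn(f"Columns of type {self.dtype} were not equal during fit and transform")
        return selected


# Hypothetical sample frame, used only to show the three behaviours.
frame = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
selector = CheckedTypeSelector("number")

try:
    selector.transform(frame)
except NotFittedError:
    print("transform before fit raises NotFittedError")

selector.fit(frame)

try:
    selector.transform([[1, 2]])
except ValueError as error:
    print(f"non-DataFrame input raises ValueError: {error}")

print(list(selector.transform(frame).columns))
```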
- Good point, I agree!
- I believe you mean this: https://goo.gl/ZAdbv7. At what point would you like to determine the dtype if it will not be in the init?
- For testing, a simple test with a small sample dataframe will be enough I assume. We do not have to test the .select_dtypes function of Pandas.
There are some basic tests in there that check things like "a transformer should not change the number of rows", and you might consider adding a test for "a transformer should not change the order of the input/output".
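A small-dataframe test along the lines discussed above might look like the following sketch. The test names, the `make_frame` helper and the sample data are assumptions for illustration and are not the existing scikit-lego test helpers; the selector is repeated so the block runs on its own. The functions use plain asserts, so they can be collected by pytest or called directly.

```python
import warnings

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class PandasTypeSelector(BaseEstimator, TransformerMixin):
    """The selector from earlier in the thread, repeated so the tests are self-contained."""

    def __init__(self, dtype):
        self.dtype = dtype

    def fit(self, X, y=None):
        self.type_columns = list(X.select_dtypes(include=[self.dtype]))
        return self

    def transform(self, X):
        assert isinstance(X, pd.DataFrame)
        transformed_frame = X.select_dtypes(include=[self.dtype])
        if set(transformed_frame) != set(self.type_columns):
            warnings.warn(f"Columns of type {self.dtype} were not equal during fit and transform")
        return transformed_frame


def make_frame():
    # Hypothetical sample data with deliberately unsorted values.
    return pd.DataFrame({"a": [3, 1, 2], "b": ["x", "y", "z"], "c": [0.3, 0.1, 0.2]})


def test_selects_only_requested_dtype():
    out = PandasTypeSelector("number").fit(make_frame()).transform(make_frame())
    assert list(out.columns) == ["a", "c"]


def test_keeps_number_of_rows():
    frame = make_frame()
    out = PandasTypeSelector("number").fit(frame).transform(frame)
    assert len(out) == len(frame)


def test_keeps_row_order():
    frame = make_frame()
    out = PandasTypeSelector("number").fit(frame).transform(frame)
    assert out.index.equals(frame.index)
    assert out["a"].tolist() == frame["a"].tolist()


def test_warns_when_columns_differ_from_fit():
    frame = make_frame()
    selector = PandasTypeSelector("number").fit(frame)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Dropping a numeric column makes the selection differ from fit time.
        selector.transform(frame.drop(columns=["c"]))
    assert len(caught) == 1
```

As noted in the thread, none of these checks exercise `select_dtypes` itself; they only cover the behaviour the transformer adds around it.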