ottogroup / dstoolbox


Tools that make working with scikit-learn and pandas easier.

License: Apache License 2.0

Shell 0.07% Python 26.01% Jupyter Notebook 73.92%

machine-learning pandas scikit-learn

dstoolbox's People

benjaminbossan jcriscione vishalbelsare dnouri afcarl stenpiren benjamin-ny lofifnc almajo scieneers-jw

dstoolbox's Issues

Should LabelEncoder class have `inverse` as a parameter?

I'm using PipelineY with sklearn's LabelEncoder like so:

    return PipelineY([...], y_transformer=LabelEncoder())

Firstly, does this seem reasonable? If it does, then I think it would also make sense to have PipelineY.predict always use the inverse transform for y. I'm trying to use a scorer with this pipeline, and it fails because the scorer gives me no way to pass kwargs on to predict. I have to resort to creating my own scorer:

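    from sklearn.metrics import accuracy_score

    def scorer(clf, X, y, score_func=accuracy_score):
        # inverse=True asks predict for labels in their original, un-encoded form
        y_pred = clf.predict(X, inverse=True)
        return score_func(y, y_pred)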

However, if PipelineY allowed me to specify use_inverse_for_predict=True or similar, at instantiation time, to signal that the behavior for predict should be to use the inverse transform by default, then I could go back to using sklearn's stock scorers.
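To illustrate the requested behavior, here is a minimal sketch that uses only scikit-learn, not dstoolbox. The class name InverseOnPredict is hypothetical and is not how PipelineY is implemented. It only shows that a stock scorer works once predict applies the inverse transform by default.

```python
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.metrics import get_scorer
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier


class InverseOnPredict(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper (not part of dstoolbox): encode y on fit, decode on predict."""

    def __init__(self, estimator, y_transformer):
        self.estimator = estimator
        self.y_transformer = y_transformer

    def fit(self, X, y):
        # Fit fresh copies so the objects passed to __init__ stay untouched.
        self.y_transformer_ = clone(self.y_transformer).fit(y)
        y_encoded = self.y_transformer_.transform(y)
        self.estimator_ = clone(self.estimator).fit(X, y_encoded)
        # Expose the original labels, as scikit-learn classifiers do.
        self.classes_ = self.y_transformer_.classes_
        return self

    def predict(self, X):
        # Always map encoded predictions back to the original labels.
        y_encoded = self.estimator_.predict(X)
        return self.y_transformer_.inverse_transform(y_encoded)


X = [[0.0], [0.1], [0.9], [1.0]]
y = ["cat", "cat", "dog", "dog"]

clf = InverseOnPredict(DecisionTreeClassifier(random_state=0), LabelEncoder()).fit(X, y)

# A stock scorer only ever calls clf.predict(X), with no extra kwargs.
accuracy = get_scorer("accuracy")(clf, X, y)
```

Because predict already returns the original labels, no custom scorer is needed.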

No proper description on pypi

It seems that there is no longer a proper description on PyPI:

https://pypi.org/project/dstoolbox/
https://pypi.python.org/pypi/dstoolbox

Not sure whether this is related to the recent changes in PyPI or whether a change in dstoolbox caused this.

Missing MANIFEST.in

The current PyPI build is broken because VERSION is missing, which in turn is caused by the missing MANIFEST.in.

`ItemSelector` should accept a tuple of slices

This does not work but should:

    ItemSelector(np.s_[:10, :5]).fit_transform(np.zeros((12, 18)))
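For reference, np.s_[:10, :5] is simply a tuple of two slice objects, and NumPy arrays accept such a tuple directly as an index. The sketch below uses a hypothetical SliceSelector class, not dstoolbox's ItemSelector, to show the expected result under that assumption.

```python
import numpy as np


class SliceSelector:
    """Hypothetical stand-in for ItemSelector that passes the key straight to the array."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        # A tuple of slices indexes one axis per slice: rows first, then columns.
        return np.asarray(X)[self.key]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


key = np.s_[:10, :5]
selected = SliceSelector(key).fit_transform(np.zeros((12, 18)))
print(selected.shape)  # (10, 5)
```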

Misleading error message

The error message says that ItemSelector with string keys only works with DataFrames, when in fact string keys also work with dicts.

ColumnTransformer and DataFrame

It seems that the new(ish) ColumnTransformer class also has the problem that it doesn't return a DataFrame. Are there any plans to add that to this library?

Alternatively, is anyone aware of a solution?

Inconsistent version requirement for scikit-learn

Forgot to update this line:

dstoolbox/setup.py

Line 11 in ed7097a

'scikit-learn>=0.21,<0.23dev0',

Calling `add_timing` or `shed_timing` twice on `TimedPipeline`

Each call to add_timing adds another print statement, so the same timing is printed several times. There should never be more than one.

When shed_timing is called more than once, an error is raised. It should just do nothing.
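The desired idempotent behavior could look roughly like the following sketch. TimedSteps and _timed are hypothetical names, and this is not how TimedPipeline is implemented. The idea is only that add_timing skips callables that are already wrapped and shed_timing leaves unwrapped ones alone.

```python
import functools
import time


def _timed(func, sink):
    """Wrap func so that each call appends (name, seconds) to sink."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            sink.append((func.__name__, time.perf_counter() - start))

    wrapper._untimed = func  # remember the original so it can be restored
    return wrapper


class TimedSteps:
    """Hypothetical stand-in for a pipeline whose steps can be timed."""

    def __init__(self, steps):
        self.steps = dict(steps)  # name -> callable
        self.sink = []            # collected timing records

    def add_timing(self):
        for name, func in self.steps.items():
            if not hasattr(func, "_untimed"):  # already wrapped: do nothing
                self.steps[name] = _timed(func, self.sink)

    def shed_timing(self):
        for name, func in self.steps.items():
            # Falls back to func itself when it was never wrapped.
            self.steps[name] = getattr(func, "_untimed", func)

    def run(self, x):
        for func in self.steps.values():
            x = func(x)
        return x


def double(x):
    return 2 * x


def increment(x):
    return x + 1


pipe = TimedSteps([("double", double), ("increment", increment)])
pipe.add_timing()
pipe.add_timing()                 # second call must not wrap again
result = pipe.run(3)
records_after_run = len(pipe.sink)

pipe.shed_timing()
pipe.shed_timing()                # second call must not raise
pipe.run(3)
records_after_shed = len(pipe.sink)
```

With this design, one run of two steps produces exactly two timing records no matter how often add_timing was called, and no records at all once timing has been shed.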

Strict requirements prevent package upgrade

The strictly pinned requirement versions trigger a warning when dstoolbox is upgraded alongside more recent dependencies, such as scikit-learn 0.19.2:

dstoolbox 0.6.2 has requirement scikit-learn==0.19, but you'll have scikit-learn 0.19.2 which is incompatible.
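The warning can be reproduced with the packaging library, assuming it is installed (pip uses the same specifier logic internally). The range specifier shown here is only one illustrative alternative to the exact pin, not the project's chosen fix.

```python
from packaging.specifiers import SpecifierSet

installed = "0.19.2"

exact_pin = SpecifierSet("==0.19")            # the requirement from the warning
version_range = SpecifierSet(">=0.19,<0.20")  # illustrative looser alternative

pin_ok = exact_pin.contains(installed)        # False: 0.19.2 is not 0.19
range_ok = version_range.contains(installed)  # True: any 0.19.x release matches

print(pin_ok, range_ok)
```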

dstoolbox version 0.9.1 dependency issue with scikit-learn

Error message in the terminal when installing this module with sudo pip install dstoolbox:

Collecting scikit-learn<0.23dev0,>=0.21 (from dstoolbox)
Could not find a version that satisfies the requirement scikit-learn<0.23dev0,>=0.21 (from dstoolbox) (from versions: 0.9, 0.10, 0.11, 0.12, 0.12.1, 0.13, 0.13.1, 0.14, 0.14.1, 0.15.0b1, 0.15.0b2, 0.15.0, 0.15.1, 0.15.2, 0.16b1, 0.16.0, 0.16.1, 0.17b1, 0.17, 0.17.1, 0.18rc2, 0.18, 0.18.1, 0.18.2, 0.19b2, 0.19.0, 0.19.1, 0.19.2, 0.20rc1, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.20.4, 0.21rc2)
No matching distribution found for scikit-learn<0.23dev0,>=0.21 (from dstoolbox)

I tried sudo pip install dstoolbox==0.9.0 and it works.

[bug] dstoolbox has errors with sklearn 0.19

There have been some interface changes in sklearn that lead to errors in dstoolbox. For instance, _transform_one, as used by DataFrameFeatureUnion, now accepts only 3 arguments instead of 4. Presumably, there are other areas that might fail.

setup.py and requirements.txt have inconsistent dependencies

setup.py requires scikit-learn>=0.20,<0.21dev0 while the requirements.txt lists scikit-learn>=0.21,<0.23dev0, which appear to be mutually exclusive. It looks like the latter was updated recently while the former has not been updated for a year and a half.
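A quick way to confirm that the two constraints cannot both be satisfied is to check a few released scikit-learn versions against each of them. This is a sketch that assumes the packaging library is installed. The candidate list is a small illustrative sample, not the full release history.

```python
from packaging.specifiers import SpecifierSet

setup_py = SpecifierSet(">=0.20,<0.21dev0")
requirements_txt = SpecifierSet(">=0.21,<0.23dev0")

# A few real scikit-learn releases, chosen for illustration only.
candidates = ["0.19.2", "0.20.4", "0.21.3", "0.22.1"]

ok_for_setup = [v for v in candidates if setup_py.contains(v)]
ok_for_requirements = [v for v in candidates if requirements_txt.contains(v)]
ok_for_both = [v for v in ok_for_setup if v in ok_for_requirements]

print(ok_for_setup)         # ['0.20.4']
print(ok_for_requirements)  # ['0.21.3', '0.22.1']
print(ok_for_both)          # []
```

The first range ends before 0.21 and the second starts at 0.21, so no version can fall in both.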