ottogroup / dstoolbox

Tools that make working with scikit-learn and pandas easier.

License: Apache License 2.0

Shell 0.07% Python 26.01% Jupyter Notebook 73.92%
machine-learning pandas scikit-learn

dstoolbox's People

Contributors

alattner, almajo, benjamin-ny, benjamin-work, benjaminbossan, dependabot[bot], dnouri, gutzbenj, jfeigl-ottogroup, lockihh, ottogroup-com, ottonemo, scieneers-jw

dstoolbox's Issues

Should LabelEncoder class have `inverse` as a parameter?

I'm using PipelineY with sklearn's LabelEncoder like so:

    return PipelineY([...], y_transformer=LabelEncoder())

Firstly, does this seem reasonable? If it does, then I think it would also make sense to have PipelineY.predict always use the inverse transform for y. I'm trying to use a scorer with this pipeline, and it fails because I can't pass kwargs for predict through the scorer. I have to resort to creating my own scorer:

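    from sklearn.metrics import accuracy_score

    def scorer(clf, X, y, score_func=accuracy_score):
        # Ask predict for the inverse-transformed (original) labels.
        y_pred = clf.predict(X, inverse=True)
        return score_func(y, y_pred)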

However, if PipelineY allowed me to specify use_inverse_for_predict=True or similar, at instantiation time, to signal that the behavior for predict should be to use the inverse transform by default, then I could go back to using sklearn's stock scorers.
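The requested behaviour can be sketched with plain scikit-learn. This is only an illustration of the idea, not dstoolbox code: the class name `InverseByDefaultClassifier` is hypothetical, and it stands in for a PipelineY whose predict always applies the inverse transform. Because predict takes no extra keyword arguments, a stock scorer can call it directly.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer
from sklearn.preprocessing import LabelEncoder


class InverseByDefaultClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper (not part of dstoolbox).

    Encodes y with a LabelEncoder during fit and always decodes
    the predictions back to the original labels in predict.
    """

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.y_transformer_ = LabelEncoder()
        y_encoded = self.y_transformer_.fit_transform(y)
        self.estimator_ = clone(self.estimator).fit(X, y_encoded)
        self.classes_ = self.y_transformer_.classes_
        return self

    def predict(self, X):
        # Inverse transform is applied unconditionally.
        y_encoded = self.estimator_.predict(X)
        return self.y_transformer_.inverse_transform(y_encoded)


# Toy data with string labels, chosen to be trivially separable.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array(["cat", "cat", "dog", "dog"])

clf = InverseByDefaultClassifier(LogisticRegression()).fit(X, y)
score = get_scorer("accuracy")(clf, X, y)  # stock scorer, no custom kwargs
print(clf.predict(X), score)
```

The trade-off of this design is that the encoded predictions are no longer reachable through predict; a flag set at instantiation time, as proposed above, would keep both options available.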

Missing manifest.in

The current PyPI build is broken because the VERSION file is missing from the distribution, which in turn is caused by the missing MANIFEST.in.

Misleading error message

Here the error message says that ItemSelector with string keys only works with DataFrames, when in fact it also works with dicts.

ColumnTransformer and DataFrame

It seems that the new(ish) ColumnTransformer class has the same problem: it doesn't return a DataFrame. Are there any plans to add that to this library?

Alternatively, is anyone aware of a solution?

Strict requirements prevent package upgrade

The strictly pinned requirement versions yield a warning when dstoolbox is installed alongside more recent dependencies, such as scikit-learn 0.19.2:

    dstoolbox 0.6.2 has requirement scikit-learn==0.19, but you'll have scikit-learn 0.19.2 which is incompatible.
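The warning follows from how version specifiers are matched. As a small sketch, assuming the third-party `packaging` library is installed, the exact pin `==0.19` rejects 0.19.2, while a range such as `>=0.19,<0.20` (an illustrative alternative, not what dstoolbox declares) would accept it.

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

installed = Version("0.19.2")

# The pin reported in the warning above.
exact_pin = SpecifierSet("==0.19")
# A looser range, shown only for comparison (an assumption, not the project's choice).
compatible_range = SpecifierSet(">=0.19,<0.20")

pin_ok = exact_pin.contains(installed)
range_ok = compatible_range.contains(installed)
print(pin_ok, range_ok)
```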

dstoolbox version 0.9.1 dependency issue with scikit-learn

Error message in the terminal when installing this module with sudo pip install dstoolbox:

    Collecting scikit-learn<0.23dev0,>=0.21 (from dstoolbox)
    Could not find a version that satisfies the requirement scikit-learn<0.23dev0,>=0.21 (from dstoolbox) (from versions: 0.9, 0.10, 0.11, 0.12, 0.12.1, 0.13, 0.13.1, 0.14, 0.14.1, 0.15.0b1, 0.15.0b2, 0.15.0, 0.15.1, 0.15.2, 0.16b1, 0.16.0, 0.16.1, 0.17b1, 0.17, 0.17.1, 0.18rc2, 0.18, 0.18.1, 0.18.2, 0.19b2, 0.19.0, 0.19.1, 0.19.2, 0.20rc1, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.20.4, 0.21rc2)
    No matching distribution found for scikit-learn<0.23dev0,>=0.21 (from dstoolbox)

I tried sudo pip install dstoolbox==0.9.0 and it worked.
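To see why pip reports no match, the versions it lists can be checked against the requirement. This is a sketch assuming the third-party `packaging` library. Only the tail of the list from the log is used, since every earlier version is older still. The newest candidate, 0.21rc2, is a release candidate that sorts before 0.21 and therefore fails `>=0.21`.

```python
from packaging.specifiers import SpecifierSet

# Requirement taken from the pip output above.
requirement = SpecifierSet(">=0.21,<0.23dev0")

# Tail of the "from versions" list in the pip output above.
available = ["0.19.2", "0.20rc1", "0.20.0", "0.20.1", "0.20.2",
             "0.20.3", "0.20.4", "0.21rc2"]

matching = list(requirement.filter(available))
print(matching)
```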

[bug] dstoolbox has errors with sklearn 0.19

There have been some interface changes in sklearn that lead to errors in dstoolbox. For instance, _transform_one, as used by DataFrameFeatureUnion, now accepts only 3 arguments instead of 4. Presumably, there are other areas that might fail.

setup.py and requirements.txt have inconsistent dependencies

setup.py requires scikit-learn>=0.20,<0.21dev0 while the requirements.txt lists scikit-learn>=0.21,<0.23dev0, which appear to be mutually exclusive. It looks like the latter was updated recently while the former has not been updated for a year and a half.
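The conflict can be checked mechanically. As a sketch, assuming the third-party `packaging` library, combining the two specifier sets and filtering a few candidate versions shows that none satisfies both. The candidate list is illustrative and not exhaustive.

```python
from packaging.specifiers import SpecifierSet

# Specifiers quoted from setup.py and requirements.txt above.
setup_spec = SpecifierSet(">=0.20,<0.21dev0")
requirements_spec = SpecifierSet(">=0.21,<0.23dev0")

# "&" yields a set that requires both constraints at once.
combined = setup_spec & requirements_spec

# Illustrative candidates spanning both ranges.
candidates = ["0.20.0", "0.20.4", "0.21.0", "0.21.3", "0.22.2"]

only_setup = list(setup_spec.filter(candidates))
only_requirements = list(requirements_spec.filter(candidates))
both = list(combined.filter(candidates))
print(only_setup, only_requirements, both)
```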
