ottogroup / dstoolbox
Tools that make working with scikit-learn and pandas easier.
License: Apache License 2.0
I'm using PipelineY with sklearn's LabelEncoder like so:
return PipelineY([...], y_transformer=LabelEncoder())
Firstly, does this seem reasonable? If it does, then I think it would also make sense to have Pipeline.predict always use the inverse transform for y. I'm trying to use a scorer with this pipeline, and it fails, because I can't pass kwargs for predict through the scorer. I have to resort to creating my own scorer:
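from sklearn.metrics import accuracy_score

def scorer(clf, X, y, score_func=accuracy_score):
    # Ask the pipeline for predictions in the original label space.
    y_pred = clf.predict(X, inverse=True)
    return score_func(y, y_pred)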
However, if PipelineY allowed me to specify use_inverse_for_predict=True or similar at instantiation time, to signal that predict should use the inverse transform by default, then I could go back to using sklearn's stock scorers.
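The request above can be illustrated with a minimal sketch that does not use dstoolbox at all. InversePredictClassifier is a hypothetical wrapper written only for this example; it assumes the wrapped y_transformer offers fit_transform and inverse_transform, as LabelEncoder does. Because predict always returns labels in the original space, a stock scorer can be applied without extra kwargs.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer
from sklearn.preprocessing import LabelEncoder


class InversePredictClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper (not part of dstoolbox): encodes y for
    fitting and always decodes predictions back to the original labels."""

    def __init__(self, estimator, y_transformer):
        self.estimator = estimator
        self.y_transformer = y_transformer

    def fit(self, X, y):
        # Fit fresh clones so the objects passed to __init__ stay untouched.
        self.y_transformer_ = clone(self.y_transformer)
        y_encoded = self.y_transformer_.fit_transform(y)
        self.estimator_ = clone(self.estimator).fit(X, y_encoded)
        self.classes_ = self.y_transformer_.classes_
        return self

    def predict(self, X):
        # The inverse transform is applied unconditionally.
        encoded = self.estimator_.predict(X)
        return self.y_transformer_.inverse_transform(encoded)


X = np.array([[0.0], [1.0], [10.0], [11.0]])
y = np.array(["cat", "cat", "dog", "dog"])

clf = InversePredictClassifier(LogisticRegression(), LabelEncoder()).fit(X, y)
# The stock scorer calls clf.predict(X) with no additional arguments.
accuracy = get_scorer("accuracy")(clf, X, y)
```

A constructor flag on PipelineY, as proposed in the issue, would achieve the same effect without an extra wrapper class.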
It seems that there no longer is a proper description on pypi:
https://pypi.org/project/dstoolbox/
https://pypi.python.org/pypi/dstoolbox
Not sure if this is related to the recent changes in pypi or whether a change in dstoolbox caused this.
The current pypi build is broken because VERSION is missing, which in turn is caused by the missing MANIFEST.in.
This does not work but should:
ItemSelector(np.s_[:10, :5]).fit_transform(np.zeros((12, 18)))
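For reference, this is what the same key selects with plain NumPy indexing. The sketch only shows the result one would presumably expect from ItemSelector here; it does not call dstoolbox.

```python
import numpy as np

X = np.zeros((12, 18))
key = np.s_[:10, :5]  # tuple of two slices: first 10 rows, first 5 columns

# Plain NumPy accepts the tuple key directly.
selected = X[key]
```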
Here the error message says that ItemSelector with string keys only works with DataFrames when really they also work with dicts.
It seems that the new(ish) ColumnTransformer class also has the problem that it doesn't return a DataFrame. Are there any plans to add that to this library?
Alternatively, is anyone aware of a solution?
Forgot to update this line:
Line 11 in ed7097a
Each time add_timing is called, an additional print statement is made. There should not be more than one.
When shed_timing is called more than once, an error is raised. It should just do nothing.
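The behavior requested in these two reports can be sketched as an idempotent wrap and unwrap. This is not the dstoolbox implementation: the signatures, the sink parameter, and the _is_timed and _original attributes are all assumptions made for this illustration.

```python
import functools
import time


def add_timing(obj, method_name="predict", sink=print):
    """Hypothetical sketch: wrap obj.<method_name> so each call reports
    its duration. A second call on an already timed method is a no-op."""
    method = getattr(obj, method_name)
    if getattr(method, "_is_timed", False):
        return  # already wrapped, so do not stack another report

    @functools.wraps(method)
    def timed(*args, **kwargs):
        start = time.perf_counter()
        try:
            return method(*args, **kwargs)
        finally:
            sink(f"{method_name} took {time.perf_counter() - start:.4f}s")

    timed._is_timed = True
    timed._original = method
    setattr(obj, method_name, timed)


def shed_timing(obj, method_name="predict"):
    """Hypothetical sketch: restore the untimed method. Calling this on
    a method that is not timed does nothing instead of raising."""
    method = getattr(obj, method_name)
    if getattr(method, "_is_timed", False):
        setattr(obj, method_name, method._original)
```

Marking the wrapper itself, rather than keeping a separate registry, keeps the check local to the object being timed.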
The strictly pinned requirement versions yield a warning when upgrading dstoolbox alongside more recent dependencies, like sklearn 0.19.2:
dstoolbox 0.6.2 has requirement scikit-learn==0.19, but you'll have scikit-learn 0.19.2 which is incompatible.
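The warning can be reproduced with the packaging library, which follows the same version specifier rules as pip. In the sketch below, the range ">=0.19,<0.20" is only an illustrative looser alternative, not the project's actual fix.

```python
from packaging.specifiers import SpecifierSet

installed = "0.19.2"

exact_pin = SpecifierSet("==0.19")               # pin from the warning above
compatible_range = SpecifierSet(">=0.19,<0.20")  # assumed looser alternative

# "==0.19" matches 0.19 and 0.19.0, but not later patch releases.
pin_accepts = exact_pin.contains(installed)
range_accepts = compatible_range.contains(installed)
```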
Error message on Terminal while installing this module using sudo pip install dstoolbox:
Collecting scikit-learn<0.23dev0,>=0.21 (from dstoolbox)
Could not find a version that satisfies the requirement scikit-learn<0.23dev0,>=0.21 (from dstoolbox) (from versions: 0.9, 0.10, 0.11, 0.12, 0.12.1, 0.13, 0.13.1, 0.14, 0.14.1, 0.15.0b1, 0.15.0b2, 0.15.0, 0.15.1, 0.15.2, 0.16b1, 0.16.0, 0.16.1, 0.17b1, 0.17, 0.17.1, 0.18rc2, 0.18, 0.18.1, 0.18.2, 0.19b2, 0.19.0, 0.19.1, 0.19.2, 0.20rc1, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.20.4, 0.21rc2)
No matching distribution found for scikit-learn<0.23dev0,>=0.21 (from dstoolbox)
I tried sudo pip install dstoolbox==0.9.0 and it works.
There have been some interface changes in sklearn that lead to errors in dstoolbox. For instance, _transform_one, as used by DataFrameFeatureUnion, now accepts only 3 arguments instead of 4. Presumably, there are other areas that might fail.
setup.py requires scikit-learn>=0.20,<0.21dev0 while the requirements.txt lists scikit-learn>=0.21,<0.23dev0, which appear to be mutually exclusive. It looks like the latter was updated recently while the former has not been updated for a year and a half.
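A quick way to check that the two constraints cannot be satisfied together is to intersect them with the packaging library. The candidate version list below is made up for illustration only.

```python
from packaging.specifiers import SpecifierSet

setup_spec = SpecifierSet(">=0.20,<0.21dev0")         # from setup.py
requirements_spec = SpecifierSet(">=0.21,<0.23dev0")  # from requirements.txt

# "&" combines both sets, so a version must satisfy every clause.
combined = setup_spec & requirements_spec

# Illustrative candidates, not a complete list of releases.
candidates = ["0.20.0", "0.20.4", "0.21.0", "0.21.3", "0.22.2"]
satisfying_both = list(combined.filter(candidates))
```

Each candidate satisfies at most one of the two specifier sets, so the filtered list comes back empty.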