machinalis / featureforge
A set of tools for creating and testing machine learning features, with a scikit-learn compatible API
License: Other
When testing, if you assign a plain function to BaseFeatureFixture.feature it gets transformed[0] to a bound method and therefore gets called with a "self" argument (along with the fixture). This behavior will almost always cause a type error in the feature being tested.
[0] Actually the function is untouched but getattr will always return it as a bound method.
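The binding behavior described above is plain Python and can be reproduced without featureforge. The sketch below uses hypothetical stand-ins (`double`, `PlainFixture`, `StaticFixture`, `call_feature`) rather than the real `BaseFeatureFixture`. Wrapping the function in `staticmethod` is one common workaround; the issue does not say whether the project recommends it.

```python
def double(value):
    # A plain one-argument function standing in for a feature (hypothetical).
    return value * 2


class PlainFixture:
    # Stored as a class attribute, the function is returned as a bound
    # method when it is looked up through an instance.
    feature = double


class StaticFixture:
    # staticmethod() disables the binding, so only the value is passed.
    feature = staticmethod(double)


def call_feature(fixture, value):
    # Mimics a test harness that calls self.feature(value).
    return fixture.feature(value)


try:
    call_feature(PlainFixture(), 21)
    plain_error = None
except TypeError as exc:
    # double() received the fixture instance plus the value.
    plain_error = exc

static_result = call_feature(StaticFixture(), 21)
```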
If you have a Feature producing instances of decimal.Decimal, the flattener will fail to process it.
In a very long experiment, I would like to be able to submit results incrementally. This is useful if the experiment fails later, or if I want to run queries to see how it is going.
While testing, if your feature doesn't define an input_schema, then test_fuzz fails with a NotImplementedError.
IMHO, if the feature doesn't define an input schema, then test_fuzz shouldn't run at all.
This code from the documentation is not working because of this:
>>> from featureforge.experimentation.stats_manager import StatsManager
>>> sm = StatsManager(None, 'Your-database-name')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/francolq/.virtualenvs/lq-research/local/lib/python2.7/site-packages/featureforge/experimentation/stats_manager.py", line 62, in __init__
self.booking_delta = timedelta(seconds=booking_duration)
TypeError: unsupported type for timedelta seconds component: NoneType
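The traceback shows `None` reaching `timedelta(seconds=...)`. That part can be reproduced with the standard library alone. In the sketch below, `make_booking_delta` is a hypothetical helper that mirrors only the failing line, not the actual `StatsManager` constructor.

```python
from datetime import timedelta


def make_booking_delta(booking_duration):
    # Hypothetical helper mirroring the failing line from the traceback.
    return timedelta(seconds=booking_duration)


try:
    make_booking_delta(None)
    error = None
except TypeError as exc:
    # timedelta rejects None for its seconds component.
    error = exc

# A numeric duration works as expected.
delta = make_booking_delta(60)
```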
Whenever a feature output / input check fails, there's no indication as to which feature is to blame.
It's necessary to know this in an environment with tens of features or more.
The default behavior is to just log some info about each error and proceed.
For debugging, it's useful to be able to have errors raised immediately instead of only logged.
It's customary when providing APIs for runners to accept an optional argv argument to use instead of sys.argv. This makes it easier to build custom runners or to override or default arguments. It also makes the runner's argument parsing easier to unit test.
As an example of this API pattern in other places, you can take a look at https://github.com/docopt/docopt#api or https://docs.python.org/2/library/unittest.html#unittest.main
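A minimal sketch of the optional-argv pattern, using `argparse` from the standard library as a stand-in for whatever parser the runner actually uses. The `main` function, the `config` argument, and the `--dbname` option are hypothetical and not part of featureforge.

```python
import argparse
import sys


def main(argv=None):
    # Fall back to the real command line only when no argv is supplied.
    if argv is None:
        argv = sys.argv[1:]
    parser = argparse.ArgumentParser(description="Hypothetical experiment runner")
    parser.add_argument("config", help="path to a configuration file")
    parser.add_argument("--dbname", default="experiments")
    return parser.parse_args(argv)


# A custom runner or a unit test can pass its own argument list.
options = main(["experiment.json", "--dbname", "mydb"])
```

Because the argument list is injected, a test can exercise the parsing without patching `sys.argv`.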
root@5da98a0113fa:/# pip install featureforge
bash: pip: command not found
root@5da98a0113fa:/# pip3 install featureforge
Downloading/unpacking featureforge
Downloading featureforge-0.1.6.tar.gz
Running setup.py (path:/tmp/pip_build_root/featureforge/setup.py) egg_info for package featureforge
Traceback (most recent call last):
File "<string>", line 17, in <module>
File "/tmp/pip_build_root/featureforge/setup.py", line 11, in <module>
long_description = open(os.path.join(base_path, 'README.rst')).read()
File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1383: ordinal not in range(128)
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 17, in <module>
File "/tmp/pip_build_root/featureforge/setup.py", line 11, in <module>
long_description = open(os.path.join(base_path, 'README.rst')).read()
File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1383: ordinal not in range(128)
----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /tmp/pip_build_root/featureforge
Storing debug log for failure in /root/.pip/pip.log
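The traceback points at `open(...).read()` decoding README.rst with the ASCII codec. A common remedy is to pass the encoding explicitly; the sketch below assumes the README is UTF-8 and uses a temporary file rather than the real one. `read_text` is a hypothetical helper, and the issue does not say how the project actually fixed this.

```python
import io
import os
import tempfile


def read_text(path):
    # An explicit encoding avoids depending on the locale's default codec.
    with io.open(path, encoding="utf-8") as handle:
        return handle.read()


# Temporary stand-in for README.rst; U+00A0 is encoded as bytes 0xc2 0xa0.
tmp_dir = tempfile.mkdtemp()
readme_path = os.path.join(tmp_dir, "README.rst")
with io.open(readme_path, "wb") as handle:
    handle.write(u"featureforge\u00a0docs".encode("utf-8"))

long_description = read_text(readme_path)
```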
If you have a Feature producing instances of long integers, the flattener will fail to process it.
The following script consumes all 4 GB of RAM on my laptop:
from featureforge.vectorizer import Vectorizer
data = [i for i in range(20000)]
feature = lambda x: str(x)
vectorizer = Vectorizer([feature])
X = vectorizer.fit_transform(data, None)
I suspect this is a bug.
There is a failing test: test_feature_flattener.TestFeatureMappingFlattener. It is related to the fact that schema validates the integer 1 as a str without raising an error.
At https://github.com/machinalis/featureforge/blob/develop/featureforge/flattener.py#L197, TypeErrors are caught. The call to next() triggers feature evaluation, and when the call to a feature raises a TypeError, that exception is caught and a ValueError("Cannot fit with an empty dataset") is raised instead.
There's no reason to re-raise the exception with that message; the original exception should propagate.
From what I see in the history, the TypeError guard was there to catch possible problems when calling iter(...), but the call to iter() was moved outside the protected block.
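The masking effect can be illustrated with a simplified sketch. This is not the code in flattener.py: `broken_feature`, `first_row_masked`, `first_row_narrow`, and `outcome` are hypothetical, and the exact set of exceptions the real guard catches is an assumption. The sketch only contrasts a guard that catches TypeError around next() with one that catches just the empty-dataset case.

```python
def broken_feature(value):
    # Hypothetical feature with a bug: int + str raises TypeError.
    return value + "suffix"


def first_row_masked(data, feature):
    rows = (feature(item) for item in iter(data))
    try:
        return next(rows)
    except (TypeError, StopIteration):
        # The feature's own TypeError is swallowed and misreported here.
        raise ValueError("Cannot fit with an empty dataset")


def first_row_narrow(data, feature):
    rows = (feature(item) for item in iter(data))
    try:
        return next(rows)
    except StopIteration:
        # Only a genuinely empty dataset is reported as empty.
        raise ValueError("Cannot fit with an empty dataset")


def outcome(func, data):
    # Returns the name of the exception type raised by func.
    try:
        func(data, broken_feature)
    except Exception as exc:
        return type(exc).__name__


masked = outcome(first_row_masked, [1, 2, 3])
narrow = outcome(first_row_narrow, [1, 2, 3])
empty = outcome(first_row_narrow, [])
```

With the narrow guard, the feature's TypeError reaches the caller unchanged, while an empty dataset still produces the ValueError.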