src-d / modelforge
Python library to share machine learning models easily and reliably.
License: Apache License 2.0
Example: https://github.com/src-d/style-analyzer/blob/master/.travis.yml
We should not drop 3.4 here.
As noted in https://github.com/src-d/backlog/issues/1205#issuecomment-400283991, we need to properly document the way Modelforge works ATM, as well as how to use it both internally and externally.
Although the MODELFORGE_ALWAYS_SIGNOFF environment variable is set to True, the index is committed without a DCO:
➜ ~ modelforge publish -f .modelforge/bot_detection/bot_detection.asdf --meta .modelforge/bot_detection/template_meta.json
INFO:21f9:GitIndex:Cached index is not up to date, pulling warenlg/models
Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
INFO:21f9:generic:Reading /home/waren/.modelforge/bot_detection/bot_detection.asdf (100.0 kB)...
INFO:21f9:gcs-backend:Connecting to the bucket...
INFO:21f9:gcs-backend:Uploading bot-detection from /home/waren/.modelforge/bot_detection/bot_detection.asdf...
[################################] 98304/100278 - 00:00:00
INFO:21f9:publish_model:Uploaded as https://storage.googleapis.com/models.cdn.sourced.tech/models%2Fbot-detection%2F599cf161-8e51-44ad-a576-3dd1518afb80.asdf
INFO:21f9:publish_model:Updating the models index...
INFO:21f9:GitIndex:Loaded /home/waren/.local/lib/python3.6/site-packages/modelforge/templates/template_model.md.jinja2
INFO:21f9:GitIndex:Loaded /home/waren/.local/lib/python3.6/site-packages/modelforge/templates/template_readme.md.jinja2
INFO:21f9:GitIndex:Added /home/waren/.modelforge/cache/source{d}/warenlg/models/bot-detection/599cf161-8e51-44ad-a576-3dd1518afb80.md
INFO:21f9:GitIndex:Updated /home/waren/.modelforge/cache/source{d}/warenlg/models/README.md
INFO:21f9:GitIndex:Writing the new index.json ...
INFO:21f9:GitIndex:Committing the index without DCO
INFO:21f9:GitIndex:Pushing the updated index ...
Push to ssh://[email protected]/warenlg/models successful.
INFO:21f9:publish_model:Successfully published
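Indeed, the signoff value can never be None here (https://github.com/src-d/modelforge/blob/master/modelforge/index.py#L51), because it is parsed from the command-line arguments with action="store_true" (https://github.com/src-d/modelforge/blob/master/modelforge/__main__.py#L37), which defaults to False when the flag is absent. The check for None therefore presumably never falls back to the environment variable.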
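The behaviour can be reproduced with argparse alone. The sketch below is not modelforge's code: parse_signoff and resolve_signoff are hypothetical helpers, and the way the value of MODELFORGE_ALWAYS_SIGNOFF is interpreted is an assumption. It only illustrates that a store_true flag yields False rather than None unless default=None is passed explicitly, so a None check works only with that default.

```python
import argparse
import os


def parse_signoff(argv, default=False):
    """Parse a --signoff flag; `default` mimics the argparse default in use."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--signoff", action="store_true", default=default)
    return parser.parse_args(argv).signoff


def resolve_signoff(signoff, environ=None):
    """Hypothetical helper: fall back to the environment only when unset."""
    if environ is None:
        environ = os.environ
    if signoff is None:
        # Assumption: "1", "true" and "yes" (any case) all enable the sign-off.
        value = environ.get("MODELFORGE_ALWAYS_SIGNOFF", "")
        return value.lower() in ("1", "true", "yes")
    return signoff


env = {"MODELFORGE_ALWAYS_SIGNOFF": "True"}
# Default store_true behaviour: the flag is False, so the environment is ignored.
print(resolve_signoff(parse_signoff([]), env))
# With default=None the absent flag stays None and the environment applies.
print(resolve_signoff(parse_signoff([], default=None), env))
```

Passing default=None to the argument definition is one possible fix; treating False as "not set" would be another, at the cost of not being able to override the environment from the command line.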
A function decorator was added in 9111b35 to check whether log messages end with a dot.
It does work, since it raises an error whenever such a log message appears:
➜ ~ modelforge publish -f .modelforge/bot_detection/bot_detection.asdf --meta .modelforge/bot_detection/template_meta.json
INFO:0189:generic:Reading /home/waren/.modelforge/bot_detection/bot_detection.asdf (100.0 kB)...
INFO:0189:gcs-backend:Connecting to the bucket...
INFO:0189:gcs-backend:Uploading bot-detection from /home/waren/.modelforge/bot_detection/bot_detection.asdf...
[################################] 98304/100278 - 00:00:00
INFO:0189:publish_model:Uploaded as https://storage.googleapis.com/models.cdn.sourced.tech/models%2Fbot-detection%2F599cf161-8e51-44ad-a576-3dd1518afb80.asdf
INFO:0189:publish_model:Updating the models index...
INFO:0189:GitIndex:Loaded /home/waren/.local/lib/python3.6/site-packages/modelforge/templates/template_model.md.jinja2
INFO:0189:GitIndex:Loaded /home/waren/.local/lib/python3.6/site-packages/modelforge/templates/template_readme.md.jinja2
INFO:0189:GitIndex:Added /home/waren/.modelforge/cache/source{d}/warenlg/models/bot-detection/599cf161-8e51-44ad-a576-3dd1518afb80.md
INFO:0189:GitIndex:Updated /home/waren/.modelforge/cache/source{d}/warenlg/models/README.md
INFO:0189:GitIndex:Writing the new index.json ...
INFO:0189:GitIndex:Committing the index without DCO
INFO:0189:GitIndex:Pushing the updated index ...
Push to ssh://[email protected]/warenlg/models successful.
Traceback (most recent call last):
File "/home/waren/.local/bin/modelforge", line 11, in <module>
sys.exit(main())
File "/home/waren/.local/lib/python3.6/site-packages/modelforge/__main__.py", line 122, in main
return handler(args)
File "/home/waren/.local/lib/python3.6/site-packages/modelforge/backends.py", line 93, in wrapped_supply_backend
return func(args, backend, log)
File "/home/waren/.local/lib/python3.6/site-packages/modelforge/registry.py", line 84, in publish_model
log.info("Successfully published.")
File "/usr/lib/python3.6/logging/__init__.py", line 1308, in info
self._log(INFO, msg, args, **kwargs)
File "/usr/lib/python3.6/logging/__init__.py", line 1444, in _log
self.handle(record)
File "/usr/lib/python3.6/logging/__init__.py", line 1454, in handle
self.callHandlers(record)
File "/usr/lib/python3.6/logging/__init__.py", line 1516, in callHandlers
hdlr.handle(record)
File "/usr/lib/python3.6/logging/__init__.py", line 865, in handle
self.emit(record)
File "/home/waren/.local/lib/python3.6/site-packages/modelforge/slogging.py", line 75, in decorated_with_check_trailing_dot
(record.name, msg))
AssertionError: Log message is not allowed to have a trailing dot: publish_model: "Successfully published."
So, let's fix those log messages once and for all.
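For reference, a check of this kind can be sketched as a decorator around a handler's emit() method. This is an illustration based on the traceback above, not the actual contents of slogging.py: CollectingHandler is a hypothetical handler, and the rule that an ellipsis is accepted is an assumption drawn from the "Updating the models index..." message passing in the log.

```python
import functools
import logging


def check_trailing_dot(emit):
    """Wrap a handler's emit() so messages ending with a single dot fail."""
    @functools.wraps(emit)
    def decorated_with_check_trailing_dot(self, record):
        msg = record.msg
        # Assumption: an ellipsis ("...") is allowed, a lone final dot is not.
        if isinstance(msg, str) and msg.endswith(".") and not msg.endswith(".."):
            raise AssertionError(
                'Log message is not allowed to have a trailing dot: %s: "%s"'
                % (record.name, msg))
        return emit(self, record)
    return decorated_with_check_trailing_dot


class CollectingHandler(logging.Handler):
    """Hypothetical handler that stores messages instead of printing them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    @check_trailing_dot
    def emit(self, record):
        self.messages.append(record.getMessage())


log = logging.getLogger("publish_model")
log.setLevel(logging.INFO)
log.propagate = False
handler = CollectingHandler()
log.addHandler(handler)
log.info("Updating the models index...")  # accepted: ends with an ellipsis
log.info("Successfully published")        # accepted once the dot is removed
```

Because the assertion is raised inside emit(), it propagates to the caller of log.info(), which matches the traceback above.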
I ran into problems with multiprocessing inside a loaded modelforge model. They occur in both lazy and non-lazy load modes.
Here is a code sample that reproduces the error:
from multiprocessing import Pool
import pickle
import traceback

import numpy
from modelforge import Model


class NumpyArray(Model):
    NAME = "numpy_array"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.array = numpy.random.normal(size=(4, 4))

    def _generate_tree(self):
        tree = self.__dict__.copy()
        for key in vars(Model()):
            del tree[key]
        return tree

    def _load_tree(self, tree):
        self.__dict__.update(tree)

    def pickle_test(self, path: str):
        with open(path, "wb") as out:
            pickle.dump(self, out)

    def mult(self, coeff: float):
        return self.array * coeff

    def multithread_test(self):
        coeffs = numpy.random.normal(size=16)
        with Pool(4) as pool:
            results = pool.map(self.mult, coeffs)
        return sum(results)
Test non-lazy mode:
arr_obj = NumpyArray()
arr_obj.save("numpy_array.asdf")
new_arr_obj = NumpyArray()
new_arr_obj.load("numpy_array.asdf", lazy=False)
new_arr_obj.pickle_test("array.pkl")
Here is the output:
TypeError Traceback (most recent call last)
<ipython-input-148-3f72419a5166> in <module>()
----> 1 new_arr_obj.pickle_test("array.pkl")
<ipython-input-142-d11f51c103b9> in pickle_test(self, path)
24 def pickle_test(self, path: str):
25 with open(path, "wb") as out:
---> 26 pickle.dump(self, out)
27
28 def mult(self, coeff: float):
TypeError: cannot serialize '_io.BufferedReader' object
The same happens with multiprocessing:
new_arr_obj.multithread_test()
Gets:
TypeError Traceback (most recent call last)
<ipython-input-149-e6fc3a006712> in <module>()
4 new_arr_obj = NumpyArray()
5 new_arr_obj.load("numpy_array.asdf", lazy=False)
----> 6 new_arr_obj.multithread_test()
<ipython-input-142-d11f51c103b9> in multithread_test(self)
32 coeffs = numpy.random.normal(size=16)
33 with Pool(4) as pool:
---> 34 results = pool.map(self.mult, coeffs)
35 return sum(results)
36
/usr/lib/python3.5/multiprocessing/pool.py in map(self, func, iterable, chunksize)
258 in a list that is returned.
259 '''
--> 260 return self._map_async(func, iterable, mapstar, chunksize).get()
261
262 def starmap(self, func, iterable, chunksize=None):
/usr/lib/python3.5/multiprocessing/pool.py in get(self, timeout)
606 return self._value
607 else:
--> 608 raise self._value
609
610 def _set(self, i, obj):
/usr/lib/python3.5/multiprocessing/pool.py in _handle_tasks(taskqueue, put, outqueue, pool, cache)
383 break
384 try:
--> 385 put(task)
386 except Exception as e:
387 job, ind = task[:2]
/usr/lib/python3.5/multiprocessing/connection.py in send(self, obj)
204 self._check_closed()
205 self._check_writable()
--> 206 self._send_bytes(ForkingPickler.dumps(obj))
207
208 def recv_bytes(self, maxlength=None):
/usr/lib/python3.5/multiprocessing/reduction.py in dumps(cls, obj, protocol)
48 def dumps(cls, obj, protocol=None):
49 buf = io.BytesIO()
---> 50 cls(buf, protocol).dump(obj)
51 return buf.getbuffer()
52
TypeError: cannot serialize '_io.BufferedReader' object
The same happens in lazy mode. Calling these functions on the original class instance works fine.
This fixes the problem locally (can be done inside _load_tree()):
new_arr_obj.array = numpy.array(new_arr_obj.array)
new_arr_obj.multithread_test()
new_arr_obj.pickle_test("array.pkl")
It passes, but non-lazy loading of numpy arrays looks like it is meant to work out of the box.
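Why copying the array helps can be illustrated without modelforge. The sketch below assumes that the loaded array still references the open file it was read from, which is what the '_io.BufferedReader' message suggests; is_picklable and materialize are hypothetical helpers, and a plain dict with an open file stands in for the model's tree.

```python
import io
import os
import pickle
import tempfile


def is_picklable(obj):
    """Return True if pickle can serialize obj, False on TypeError."""
    try:
        pickle.dumps(obj)
    except TypeError:
        return False
    return True


def materialize(tree):
    """Copy a tree, replacing open binary readers by the bytes they contain."""
    result = {}
    for key, value in tree.items():
        if isinstance(value, io.BufferedReader):
            value.seek(0)
            result[key] = value.read()  # plain bytes, no file handle attached
        else:
            result[key] = value
    return result


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\x01\x02\x03")
reader = open(tmp.name, "rb")
try:
    tree = {"array": reader}  # stands in for a file-backed array
    print(is_picklable(tree))               # the open reader blocks pickling
    print(is_picklable(materialize(tree)))  # the in-memory copy does not
finally:
    reader.close()
    os.unlink(tmp.name)
```

Calling numpy.array() on the loaded attribute plays the role of materialize() here: it produces an in-memory copy that no longer depends on the file handle.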
When GitHub authentication fails (for example, when given bad credentials), modelforge's cache is left in an unstable state. Further attempts to upload a model fail, even with the right credentials, with the following error:
Traceback (most recent call last):
File "/home/tristan/.pyenv/versions/3.6.0/bin/modelforge", line 10, in <module>
sys.exit(main())
File "/home/tristan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/modelforge/__main__.py", line 122, in main
return handler(args)
File "/home/tristan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/modelforge/registry.py", line 97, in list_models
password=args.password, cache=args.cache, log_level=args.log_level)
File "/home/tristan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/modelforge/index.py", line 83, in __init__
self.fetch()
File "/home/tristan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/modelforge/index.py", line 110, in fetch
if self._are_local_and_remote_heads_different():
File "/home/tristan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/modelforge/index.py", line 250, in _are_local_and_remote_heads_different
local_head = Repo(self.cached_repo).head()
File "/home/tristan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dulwich/repo.py", line 459, in head
return self.refs[b'HEAD']
File "/home/tristan/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dulwich/refs.py", line 284, in __getitem__
raise KeyError(name)
KeyError: b'HEAD'
Manually deleting the cache solves the problem.
Version: 0.12.1
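Until this is fixed in the library, the manual workaround can be automated. The following is a minimal sketch under stated assumptions: fetch_with_cache_reset and fake_fetch are hypothetical names, and the code treats any KeyError raised by the fetch step as a sign of a broken cached clone. It does not use modelforge's or dulwich's API.

```python
import os
import shutil
import tempfile


def fetch_with_cache_reset(cache_dir, fetch):
    """Call fetch(cache_dir); on KeyError wipe the cache and retry once."""
    try:
        return fetch(cache_dir)
    except KeyError:
        # Assumption: a KeyError such as b'HEAD' means the cached clone is broken.
        shutil.rmtree(cache_dir, ignore_errors=True)
        os.makedirs(cache_dir)
        return fetch(cache_dir)


def fake_fetch(cache_dir):
    """Hypothetical fetcher: fails while a 'broken' marker file is present."""
    if os.path.exists(os.path.join(cache_dir, "broken")):
        raise KeyError(b"HEAD")
    return "fetched"


cache = tempfile.mkdtemp()
open(os.path.join(cache, "broken"), "w").close()  # simulate a half-made clone
print(fetch_with_cache_reset(cache, fake_fetch))
shutil.rmtree(cache)
```

A cleaner fix would remove the partially created cache at the moment authentication fails, so that the next run starts from scratch.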
Using modelforge-0.5.1a0 in jupyter notebook 5.0.0
When importing and instantiating the DocumentFrequencies model from sourced/ml in a Jupyter notebook:
from sourced.ml.models.df import DocumentFrequencies
DocumentFrequencies()
the following error is raised:
AttributeError Traceback (most recent call last)
/usr/local/lib/python3.5/dist-packages/IPython/core/formatters.py in __call__(self, obj)
691 type_pprinters=self.type_printers,
692 deferred_pprinters=self.deferred_printers)
--> 693 printer.pretty(obj)
694 printer.flush()
695 return stream.getvalue()
/usr/local/lib/python3.5/dist-packages/IPython/lib/pretty.py in pretty(self, obj)
378 if callable(meth):
379 return meth(obj, self, cycle)
--> 380 return _default_pprint(obj, self, cycle)
381 finally:
382 self.end_group()
/usr/local/lib/python3.5/dist-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
493 if _safe_getattr(klass, '__repr__', None) is not object.__repr__:
494 # A user-provided repr. Find newlines and replace them with p.break_()
--> 495 _repr_pprint(obj, p, cycle)
496 return
497 p.begin_group(1, '<')
/usr/local/lib/python3.5/dist-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
691 """A pprint that just redirects to the normal repr function."""
692 # Find newlines and replace them with p.break_()
--> 693 output = repr(obj)
694 for idx,output_line in enumerate(output.splitlines()):
695 if idx:
/usr/local/lib/python3.5/dist-packages/modelforge/model.py in __repr__(self)
150 created_at = metaprop("created_at")
151 version = metaprop("version")
--> 152 parent = metaprop("parent")
153 license = metaprop("license")
154
AttributeError: module '__main__' has no attribute '__file__'
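The error message indicates that __file__ is read from the __main__ module, which has no such attribute in a notebook or interactive session. A defensive lookup could look like the sketch below; module_file and the "<interactive>" placeholder are hypothetical, and how modelforge's __repr__ actually uses the value is not shown in the traceback.

```python
import os
import sys
import types


def module_file(module_name, default="<interactive>"):
    """Return the file of a loaded module, or `default` when it has none."""
    module = sys.modules.get(module_name)
    return getattr(module, "__file__", None) or default


# A module created on the fly has no __file__, like __main__ in a notebook.
sys.modules["fake_notebook_main"] = types.ModuleType("fake_notebook_main")
print(module_file("fake_notebook_main"))
print(module_file("os"))
```

Using getattr with a default avoids the AttributeError while still reporting the real path for models defined in regular modules.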
When running style-analyzer's tests with modelforge 0.11.0, I hit some errors due to the model's description, such as:
Traceback (most recent call last):
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/lookout/style/typos/tests/test_ranking.py", line 69, in test_save_load
print(ranker)
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/modelforge/model.py", line 274, in __str__
" ".join("%s==%s" % tuple(p) for p in self.environment["packages"])
TypeError: 'NoneType' object is not subscriptable
and
File "lookout/style/format/tests/test_analyzer.py", line 151, in test_train_cutoff_labels
self.assertIn("javascript", model1, str(model1))
File "/home/waren/.local/lib/python3.6/site-packages/modelforge/model.py", line 270, in __str__
meta["created_at"] = format_datetime(meta["created_at"])
File "/home/waren/.local/lib/python3.6/site-packages/modelforge/meta.py", line 68, in format_datetime
return dt.strftime("%Y-%m-%d %H:%M:%S%z")
AttributeError: 'NoneType' object has no attribute 'strftime'
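Both failures come from formatting metadata that is still None. A minimal sketch of None-tolerant formatting follows; the two expressions are taken from the tracebacks above, while format_packages, the "unknown" placeholder and the empty-string fallback are assumptions rather than modelforge's behaviour.

```python
from datetime import datetime, timezone


def format_datetime(dt):
    """Format a datetime as in the traceback, tolerating a missing value."""
    if dt is None:
        return "unknown"  # hypothetical placeholder for unset metadata
    return dt.strftime("%Y-%m-%d %H:%M:%S%z")


def format_packages(environment):
    """Join (name, version) pairs, tolerating a missing environment."""
    if not environment or not environment.get("packages"):
        return ""
    return " ".join("%s==%s" % tuple(p) for p in environment["packages"])


print(format_datetime(None))
print(format_datetime(datetime(2018, 1, 2, 3, 4, 5, tzinfo=timezone.utc)))
print(format_packages(None))
print(format_packages({"packages": [("numpy", "1.15.0"), ("asdf", "2.1.0")]}))
```

Whether __str__ should tolerate empty metadata or the metadata should always be populated before printing is a design choice; the sketch only covers the former.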
Spitballing some ideas, @vmarkovtsev WDYT:
tools.py, registry.py)
modelforgecfg.py file or edit a given pre-existing bash/text file
amend), by uploading a modified meta.json. It should be able to edit either a specific model, or a series of models
src-d/models
Model