ipazc / mldatahub

Machine Learning data HUB for storing datasets.

License: GNU General Public License v3.0

Languages: Python 100.00%

Topics: python3, ming, linux, mongodb, flask, flask-restful, dataset, backend, machine-learning

mldatahub's People

Contributors

ipazc

Forkers

bigrlab

mldatahub's Issues

Allow tags to host dict or list objects inside.

Tags for datasets or elements currently accept only strings as components; complex objects such as dicts or lists are not allowed. Allowing more complex objects might be useful when using the search options of the mongo engine.

Storage is so slow

It would be a good idea to add a cache layer to the storage and save everything in chunked files of a fixed size. Writes should not be synchronous; they should be performed in the background. To be multiprocess compliant, the storage should keep each file's status in the db.
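
A minimal sketch of the idea, under stated assumptions: the class `BackgroundStorage` and its methods are hypothetical and not part of mldatahub, plain dicts stand in for both the slow backend and the db that would hold the file status, and chunking into fixed-size files is left out. It only shows writes being acknowledged from a cache while a worker thread persists them in the background.

```python
import queue
import threading


class BackgroundStorage:
    """Hypothetical cache layer: put() returns at once, a worker persists later."""

    def __init__(self):
        self._cache = {}      # file_id -> content, served immediately
        self._persisted = {}  # stand-in for the slow storage backend
        self._status = {}     # file_id -> "pending" or "stored" (would live in the db)
        self._queue = queue.Queue()
        worker = threading.Thread(target=self._worker, daemon=True)
        worker.start()

    def put(self, file_id, content):
        # Record the content and its status, then hand the write to the worker.
        self._cache[file_id] = content
        self._status[file_id] = "pending"
        self._queue.put(file_id)

    def get(self, file_id):
        # Prefer the cache; fall back to the persisted copy.
        if file_id in self._cache:
            return self._cache[file_id]
        return self._persisted[file_id]

    def status(self, file_id):
        return self._status[file_id]

    def flush(self):
        # Block until every queued write has been persisted.
        self._queue.join()

    def _worker(self):
        while True:
            file_id = self._queue.get()
            self._persisted[file_id] = self._cache[file_id]
            self._status[file_id] = "stored"
            self._queue.task_done()
```

A caller would use `put()` and `get()` as before and call `flush()` only when it needs to be sure the data has reached the backend, for example before shutdown.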

Forked dataset's elements may change their ID on a modification

When a forked element or its parent is modified, all the forked elements change their ID.
This is a problem because the client expects elements to keep their ID. The error can be reproduced as follows:

>>> from dhub.datasets import Datasets
>>> datasets = Datasets()
>>> dataset = datasets.add_dataset("example")
>>> dataset.add_element(title="This is element1", content=b"none")
>>> fork = dataset.fork("example2")
>>> forked_element = fork[0]
>>> print(forked_element.get_content())
b'none'
>>> forked_element.set_title("changed")
>>> print(forked_element.get_content())
KeyError exception raised.

OS: Ubuntu 14.04 x86_64
Interpreter: Python3.4
DHub version: 0.0.14

Options should be filtered.

The options passed when retrieving elements should be expressed in a custom language and translated on the server.
This will decouple the options from the Mongo backend and also add some security to the request.
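
One possible reading of this proposal, sketched under assumptions: a small whitelist translator that maps client-facing option names to MongoDB query operators and rejects everything else. The operator names (`eq`, `gt`, ...), the allowed fields, and the function `translate_options` are all hypothetical and not taken from mldatahub.

```python
# Hypothetical whitelist: client-facing operator name -> MongoDB query operator.
ALLOWED_OPERATORS = {
    "eq": "$eq",
    "ne": "$ne",
    "gt": "$gt",
    "lt": "$lt",
    "in": "$in",
}

# Hypothetical set of element fields that clients may filter on.
ALLOWED_FIELDS = {"title", "description", "tags"}


def _is_plain(value):
    """Accept only scalars or lists of scalars, never nested dicts."""
    scalars = (str, int, float, bool)
    if isinstance(value, list):
        return all(isinstance(v, scalars) for v in value)
    return isinstance(value, scalars)


def translate_options(options):
    """Translate {"field": {"op": value}} into a MongoDB filter document.

    Raises ValueError for any field, operator, or value outside the
    whitelists, so raw operators such as "$where" never reach the backend.
    """
    query = {}
    for field, conditions in options.items():
        if field not in ALLOWED_FIELDS:
            raise ValueError("Field not allowed: {!r}".format(field))
        if not isinstance(conditions, dict):
            raise ValueError("Conditions for {!r} must be a dict".format(field))
        translated = {}
        for op, value in conditions.items():
            if op not in ALLOWED_OPERATORS:
                raise ValueError("Operator not allowed: {!r}".format(op))
            if not _is_plain(value):
                raise ValueError("Value not allowed for {!r}".format(op))
            translated[ALLOWED_OPERATORS[op]] = value
        query[field] = translated
    return query
```

With this sketch, `translate_options({"title": {"eq": "cat"}})` yields the filter `{"title": {"$eq": "cat"}}`, while a request carrying `"$where"` is rejected before any query is built.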

The clear command should be a single call to the server

Currently, the clear command forwards each key to delete to the server. When the dataset is huge, the process is very slow and requires many requests, because the keys are retrieved in chunks of PageSize length and the deletion is also limited by PageSize.
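
A rough sketch of what a single-call clear could look like on the server side, assuming a pymongo-style collection whose `delete_many` returns a result with a `deleted_count` attribute. The function `clear_dataset`, the `dataset_id` field name, and the `InMemoryCollection` stand-in are hypothetical; the stand-in exists only so the sketch runs without a MongoDB server.

```python
from collections import namedtuple

# Mimics the shape of the result object returned by a pymongo-style delete_many.
DeleteResult = namedtuple("DeleteResult", ["deleted_count"])


def clear_dataset(collection, dataset_id):
    """Delete every element of a dataset with one bulk operation."""
    # Assumption: elements reference their dataset through a "dataset_id" field.
    result = collection.delete_many({"dataset_id": dataset_id})
    return result.deleted_count


class InMemoryCollection:
    """Tiny stand-in for a Mongo collection, for illustration only."""

    def __init__(self, documents):
        self.documents = list(documents)
        self.calls = 0  # counts how many delete operations were issued

    def delete_many(self, query):
        self.calls += 1
        # Keep a document if at least one queried field does not match.
        kept = [d for d in self.documents
                if any(d.get(k) != v for k, v in query.items())]
        deleted = len(self.documents) - len(kept)
        self.documents = kept
        return DeleteResult(deleted)
```

The point of the design is that the number of requests no longer depends on the dataset size or on PageSize: the client sends one clear request and the server issues one bulk delete.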

Pagination might be slow

The pagination for elements in Mongo might be slow when using skip().
An interesting way of solving this problem is referenced here.

Letting the user send the last element's id from the current page, rather than the page number, in order to get the next page might speed up the results.
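
The last-id approach can be sketched as follows. This is an illustration under assumptions, not the project's implementation: `next_page_filter` and `fetch_page` are hypothetical names, and `fetch_page` pages over an in-memory list so the example runs without a database. Against a real collection the same filter would typically be combined with a sort on `_id` and a limit of the page size instead of `skip()`.

```python
def next_page_filter(last_id=None):
    """Build a Mongo-style filter selecting elements after the last seen id."""
    if last_id is None:
        return {}  # first page: no lower bound
    return {"_id": {"$gt": last_id}}


def fetch_page(elements, page_size, last_id=None):
    """In-memory stand-in for: find(filter), sorted by _id, limited to page_size."""
    ordered = sorted(elements, key=lambda e: e["_id"])
    if last_id is not None:
        ordered = [e for e in ordered if e["_id"] > last_id]
    return ordered[:page_size]


# Usage: the client keeps the id of the last element it received.
elements = [{"_id": i} for i in range(1, 8)]
page1 = fetch_page(elements, 3)
page2 = fetch_page(elements, 3, last_id=page1[-1]["_id"])
```

Because the query starts from an indexed `_id` bound, the server does not need to walk past all the skipped documents to reach a late page.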

File size limit exception should be bubbled to the client through the REST API call

DatasetElementFactory(token, dataset).edit_elements(crafted_request)
  File "/usr/local/lib/python3.5/dist-packages/mldatahub-0.0.1-py3.5.egg/mldatahub/factory/dataset_element_factory.py", line 224, in edit_elements
    files_refs = {element_id: file for element_id, file in zip(elements_ids, self.storage.put_files_contents(elements_content))}
  File "/usr/local/lib/python3.5/dist-packages/mldatahub-0.0.1-py3.5.egg/mldatahub/storage/remote/mongo_storage.py", line 65, in put_files_contents
    raise Exception("File size limit of {} Bytes exceeded".format(FILE_SIZE_LIMIT))
Exception: File size limit of 16777216 Bytes exceeded
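
One way the exception could reach the client, sketched under assumptions: a dedicated exception type is raised by the storage and converted into a `(body, status)` pair, which is the shape a Flask-RESTful resource method can return. The names `FileSizeLimitExceeded`, `put_file_content`, and `handle_edit_request` are hypothetical, and HTTP 413 (Payload Too Large) is a suggested status code, not something the project specifies.

```python
FILE_SIZE_LIMIT = 16 * 1024 * 1024  # 16777216 bytes, the limit shown in the traceback


class FileSizeLimitExceeded(Exception):
    """Hypothetical dedicated exception, so the API layer can catch it by type."""


def put_file_content(content):
    """Stand-in for the storage call; rejects content above the limit."""
    if len(content) > FILE_SIZE_LIMIT:
        raise FileSizeLimitExceeded(
            "File size limit of {} Bytes exceeded".format(FILE_SIZE_LIMIT))
    return len(content)


def handle_edit_request(content):
    """Stand-in for a resource method: returns a (body, HTTP status) pair."""
    try:
        stored = put_file_content(content)
    except FileSizeLimitExceeded as ex:
        # Report the problem to the client instead of failing with a generic error.
        return {"message": str(ex)}, 413
    return {"stored_bytes": stored}, 200
```

Catching a specific exception type, rather than the bare `Exception` raised in the traceback, lets the API distinguish a client error from an unexpected server failure.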
