lyst / shovel
License: Apache License 2.0
When submitting a dataset, enforce a README that describes the source of data.
Listing more than 1000 files requires multiple calls to AWS. Do you really need that many individual files? Bundling them may be a better idea.
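To make the cost concrete: S3 list operations return at most 1,000 keys per response, so a client has to follow continuation tokens to see everything. The sketch below assumes the listing is backed by S3. It uses a hypothetical `fetch_page` callable and an in-memory stand-in rather than shovel's real code or a real AWS client, and only illustrates how the number of calls grows with the number of files.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical page-fetching callable (not part of shovel or boto3):
# takes a continuation token (None for the first page) and returns
# (keys, next_token), where next_token is None on the last page.
PageFetcher = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def iter_all_keys(fetch_page: PageFetcher) -> Iterator[str]:
    """Yield every key, following continuation tokens until exhausted."""
    token: Optional[str] = None
    while True:
        keys, token = fetch_page(token)
        yield from keys
        if token is None:
            break


def make_fake_fetcher(
    all_keys: List[str], page_size: int = 1000
) -> Tuple[PageFetcher, Dict[str, int]]:
    """In-memory stand-in for a paginated listing API, for illustration only."""
    calls = {"count": 0}

    def fetch_page(token: Optional[str]) -> Tuple[List[str], Optional[str]]:
        calls["count"] += 1  # one increment per simulated remote call
        start = int(token) if token is not None else 0
        end = start + page_size
        next_token = str(end) if end < len(all_keys) else None
        return all_keys[start:end], next_token

    return fetch_page, calls
```

With 2,500 keys and a page size of 1,000, the fake fetcher is called three times; a single bundled archive would need one listing entry.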
Datasets can be large...
Without this, it is hard to know where to store a dataset.
Git LFS has some nice properties, but doesn't really map well to large datasets used for analysis. A git model of checking in all resources is good for reproducibility, but it is nice to separate the data from the code.
A proposed future direction for shovel is to support shovel <git command>, where shovel intercepts some commands and swaps out parts of their behaviour. This can likely be done using git hooks, so it may be possible to initialise those hooks and then use git directly.
One benefit of the shovel model over LFS is that it lets you version datasets separately from a git repo and share them across multiple repos. To preserve that, the git hooks would need to inspect the state of the filesystem and manage the dig and bury steps of shovel.
These thoughts are very undeveloped.
There's already a Python project called shovel (currently >600 stars on GitHub):
https://github.com/seomoz/shovel
This causes a problem for pip install and module imports:
pip install git+https://github.com/lyst/shovel.git#egg=shovel
Requirement already satisfied: shovel from git+https://github.com/lyst/shovel.git#egg=shovel
See https://www.python.org/dev/peps/pep-0423/ for naming guidance. One option is to rename shovel to lyst.shovel.
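When two distributions provide the same top-level module name, it helps to check which one Python will actually import. The sketch below uses the standard library's `importlib.util.find_spec`; the helper name `module_origin` is made up for illustration and is not part of either shovel project.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None if absent.

    For a top-level name, find_spec locates the module without importing it.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Shows which installed "shovel" (if any) an `import shovel` would pick up.
    print(module_origin("shovel"))
```

If the printed path points into the other project's installation, `import shovel` is resolving to the wrong package.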
To better support git status, versioning, etc., the actual version of a dataset should live in a shovel file in the root directory. This allows peek to check the version that the file system thinks it should be, and bury to update the file in a way that shows up in git status.
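A minimal sketch of what such a version file could look like, assuming a JSON file named `shovel.json` that maps dataset names to versions. The file name, the format, and the helper names are all hypothetical; the proposal above only says "a shovel file in the root directory".

```python
import json
from pathlib import Path
from typing import Dict

# Hypothetical file name and format, chosen only for this sketch.
VERSION_FILE = "shovel.json"


def read_pinned_versions(root: Path) -> Dict[str, str]:
    """Return the dataset -> version mapping stored in the repo root (empty if none)."""
    path = root / VERSION_FILE
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def pin_version(root: Path, dataset: str, version: str) -> None:
    """Record a dataset version so that the change shows up in `git status`."""
    versions = read_pinned_versions(root)
    versions[dataset] = version
    # Sorted keys and a trailing newline keep the resulting git diff small and stable.
    text = json.dumps(versions, indent=2, sort_keys=True) + "\n"
    (root / VERSION_FILE).write_text(text)
```

In this sketch a peek-like command would call `read_pinned_versions`, and a bury-like command would call `pin_version` after a successful upload, leaving a one-line diff to commit alongside the code.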
This is in response to #20
It is common to want to re-run a notebook that buries data. It should be possible to leave the command in place and have it not error if the data has already been buried.
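One way to get this behaviour is to compare a content hash before writing: the same data under the same key is a no-op, and only different data under an existing key is an error. The sketch below is an assumption about how that could work, not shovel's implementation. The `store` dict stands in for the remote storage, and `bury_idempotent` and `VersionConflict` are hypothetical names.

```python
import hashlib
from typing import Dict


class VersionConflict(Exception):
    """Raised when a key already exists with different content (hypothetical)."""


def bury_idempotent(store: Dict[str, str], key: str, payload: bytes) -> bool:
    """Store a digest of payload under key.

    Returns True if the payload was newly written, False if identical content
    was already present. Raises VersionConflict if the key exists with
    different content. `store` is an in-memory stand-in for remote storage.
    """
    digest = hashlib.sha256(payload).hexdigest()
    existing = store.get(key)
    if existing is None:
        store[key] = digest
        return True
    if existing == digest:
        return False  # already buried with the same content: safe to re-run
    raise VersionConflict(f"{key} already exists with different content")
```

Re-running a notebook cell that calls this with unchanged data returns False instead of raising, while an accidental overwrite of a published version still fails loudly.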