Comments (4)
It looks like the `smudge` and `clean` filters from git attributes (https://git-scm.com/docs/gitattributes) could be used for managing the files, and a pre-push hook would be suitable for actually uploading them.
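The clean side can be illustrated with a minimal Python sketch of a clean filter. It replaces a data file's contents with a short text stub recording the file's MD5 and size, so git only ever stores the stub. The stub format, the script name `shovel_filter.py` and the `--clean` flag are hypothetical and not part of shovel. Only the filter mechanism comes from git: it pipes each matching file to the filter's stdin and stores whatever the filter writes to stdout.

```python
import hashlib
import sys
from typing import BinaryIO


def make_stub(src: BinaryIO, chunk_size: int = 1 << 20) -> bytes:
    """Hash a stream in chunks and return a small text stub (hypothetical format)."""
    md5 = hashlib.md5()
    size = 0
    # Read in chunks so large data files never have to fit in memory at once.
    for chunk in iter(lambda: src.read(chunk_size), b""):
        md5.update(chunk)
        size += len(chunk)
    return f"shovel-stub v0\nmd5 {md5.hexdigest()}\nsize {size}\n".encode("ascii")


def clean(src: BinaryIO, dst: BinaryIO) -> None:
    """Clean filter: git stores whatever is written to dst instead of the file."""
    dst.write(make_stub(src))


if __name__ == "__main__" and sys.argv[1:] == ["--clean"]:
    # git pipes the working-tree file to stdin and reads the result from stdout.
    clean(sys.stdin.buffer, sys.stdout.buffer)
    sys.stdout.buffer.flush()
```

Assuming the script sits at the repository root, you could wire it up with `git config filter.shovel.clean "python shovel_filter.py --clean"` and a `.gitattributes` line such as `data/** filter=shovel`. A matching smudge filter (not shown) would handle the reverse direction on checkout.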
You would still need a non-git interface, since you may have a project which isn't version-controlled. Usually this won't be the case, but it will happen sometimes. For example, you might want to use shovel to fetch some data for a quick analysis. So then you have two interfaces? Perhaps I'm misunderstanding.
That makes sense. I was imagining shovel would stay the same, but it would be possible to set it up with hooks so the dig and bury commands are called for you. Unlike LFS, which tries to make it look like the files are in the repo, this would make it clear they are in a pit.
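A hook that calls bury for you might look roughly like the Python sketch below. The input handling follows git's documented pre-push protocol. The hook receives the remote name and URL as arguments and one `<local ref> <local sha> <remote ref> <remote sha>` line per ref on stdin, and a non-zero exit status aborts the push. The `bury` callback is a hypothetical stand-in for however shovel's bury would be invoked; it is not shovel's actual API.

```python
import sys
from typing import Callable, Iterable, List, NamedTuple


class RefUpdate(NamedTuple):
    local_ref: str
    local_sha: str
    remote_ref: str
    remote_sha: str


def parse_updates(lines: Iterable[str]) -> List[RefUpdate]:
    """Parse the '<local ref> <local sha> <remote ref> <remote sha>' lines from git."""
    updates = []
    for line in lines:
        fields = line.split()
        if len(fields) == 4:
            updates.append(RefUpdate(*fields))
    return updates


def run_hook(lines: Iterable[str], bury: Callable[[], None]) -> int:
    """Return the hook's exit status; git aborts the push if it is non-zero."""
    updates = parse_updates(lines)
    # An all-zero local sha means a ref deletion, which pushes no new data.
    if not any(set(u.local_sha) != {"0"} for u in updates):
        return 0
    try:
        bury()  # hypothetical: upload shovel-managed files to the pit
    except Exception as exc:
        print(f"pre-push: upload to pit failed: {exc}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    # git invokes the hook as: pre-push <remote name> <remote url>
    def placeholder_bury() -> None:
        print("pre-push: shovel bury would run here", file=sys.stderr)

    sys.exit(run_hook(sys.stdin, placeholder_bury))
```

Because bury is passed in as a callback, the same upload code could back both the plain CLI and the hooks. That may address the concern about ending up with two separate interfaces.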
So, in addition to what exists already, inside a git repo:
```sh
cd data
shovel init .  # registers this directory (perhaps in repo-root/.shovel) so the hooks know which directories are under shovel control
git add .      # the clean filter calculates the MD5 of each file and writes the relevant metadata into, e.g., a .shovel file
git commit     # if shovel gets a local cache (which it may in future), a pre-commit hook copies the files there, keyed by MD5
git push       # the pre-push hook ensures the files have been uploaded to the S3 pit
```
Or something along those lines. It's probably worth taking a lot of inspiration from Git LFS.
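The pre-commit step in the list above, copying files into a local cache keyed by their MD5, is essentially a content-addressed store. Below is a minimal Python sketch under assumed conventions. The cache directory and the two-character fan-out (borrowed from how git lays out loose objects) are hypothetical choices, not anything shovel currently does.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks."""
    md5 = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def add_to_cache(path: Path, cache_dir: Path) -> Path:
    """Copy `path` into a content-addressed cache and return the cached location."""
    digest = md5_of(path)
    # Fan out on the first two hex characters so no single directory gets huge.
    target = cache_dir / digest[:2] / digest
    if not target.exists():  # identical content is stored only once
        target.parent.mkdir(parents=True, exist_ok=True)
        tmp = target.with_suffix(".tmp")
        shutil.copyfile(path, tmp)
        tmp.replace(target)  # rename into place so readers never see a partial copy
    return target
```

Because the key is the content hash, re-committing an unchanged file is a no-op, and two files with identical content share one cache entry.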
The problem to solve here is that I currently add `data/` to `.gitignore` so git doesn't try to check the files in. It would be preferable to have a good way to check whether my data is in sync, both with the pit and with the version referenced in the code. So maybe if peek checked the MD5s etc. (as it is intended to), we would be nearly there anyway. If datasets got their config from a metadata file rather than having it hard-coded in the Python code, shovel could always check that the data is in sync with the current code version, and bumping the version would show up in `git status`.
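A local version of such a sync check could be sketched in Python as follows. It assumes each dataset root holds a `.shovel` metadata file in a hypothetical JSON format such as `{"version": "1.0.0", "files": {"a.csv": "<md5>"}}`. The `version` field is not used by the check; it is there so that, with the file committed, a bump shows up in `git status`. The function reports files that are missing, modified, or not listed. Comparing against the pit itself, which is what peek is meant to do, is left out because it depends on shovel's storage layout.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks."""
    md5 = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def find_mismatches(dataset_dir: Path, manifest_name: str = ".shovel") -> List[str]:
    """List differences between local files and a (hypothetical) JSON manifest."""
    manifest = json.loads((dataset_dir / manifest_name).read_text())
    expected: Dict[str, str] = manifest["files"]  # relative path -> MD5
    problems = []
    for rel, digest in sorted(expected.items()):
        path = dataset_dir / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif md5_of(path) != digest:
            problems.append(f"modified: {rel}")
    # Files on disk that the manifest does not mention at all.
    local = {
        p.relative_to(dataset_dir).as_posix()
        for p in dataset_dir.rglob("*")
        if p.is_file() and p.name != manifest_name
    }
    for rel in sorted(local - set(expected)):
        problems.append(f"untracked: {rel}")
    return problems
```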
Related Issues (18)
- Require a README for all datasets
- Add force bury
- Add tests for peek
- Add tests for force bury
- When force burying, check for locally deleted files and remove from remote.
- peek local_path should not be keyword only
- Add ignore_exists argument to bury
- Validate version numbers are incremental
- Read dataset params from a `.shovel` file in the dataset root
- Add progress indicator to bury and dig
- Add warning if too many files in dataset
- Project Name Collision - Requirement already satisfied
- Support 'latest' version when `dig`ing
- Add install instructions to README
- Add LICENCE
- Add CI
- Add peek method to inspect the default pit.