
s3git: git for Cloud Storage. Distributed Version Control for Data. Create decentralized and versioned repos that scale infinitely to 100s of millions of files. Clone huge PB-scale repos on your local SSD to make changes, commit and push back. Oh yeah, it dedupes too and offers directory versioning.

Home Page: http://s3git.org

License: Apache License 2.0

Language: Go (100%) · Topics: cloud-storage, git, version-control, decentralized, distributed

s3git's Introduction

s3git: git for Cloud Storage
(or Version Control for Data)

Join the chat at https://gitter.im/s3git/s3git

s3git applies the git philosophy to Cloud Storage. If you know git, you will know how to use s3git!

s3git is a simple CLI tool that allows you to create a distributed, decentralized and versioned repository. It scales limitlessly to 100s of millions of files and PBs of storage and stores your data safely in S3. Yet huge repos can be cloned on the SSD of your laptop for making local changes, committing and pushing back.

Exactly like git, s3git does not require any server-side components: just download and run the executable. It is built on the Go package s3git-go, which can be used from other applications as well. Or see the Python module or Ruby gem.

Use cases for s3git

  • Build and Release Management (see example with all Kubernetes releases).
  • DevOps Scenarios
  • Data Consolidation
  • Analytics
  • Photo and Video storage

See use cases for a detailed description of each.

Download binaries

DISCLAIMER: These are PRE-RELEASE binaries -- use at your own peril for now

OSX

Download s3git from https://github.com/s3git/s3git/releases/download/v0.9.2/s3git-darwin-amd64

$ mkdir s3git && cd s3git
$ wget -q -O s3git https://github.com/s3git/s3git/releases/download/v0.9.2/s3git-darwin-amd64
$ chmod +x s3git
$ export PATH=$PATH:${PWD}   # Add current dir where s3git has been downloaded to
$ s3git

Linux

Download s3git from https://github.com/s3git/s3git/releases/download/v0.9.2/s3git-linux-amd64

$ mkdir s3git && cd s3git
$ wget -q -O s3git https://github.com/s3git/s3git/releases/download/v0.9.2/s3git-linux-amd64
$ chmod +x s3git
$ export PATH=$PATH:${PWD}   # Add current dir where s3git has been downloaded to
$ s3git

Windows

Download s3git.exe from https://github.com/s3git/s3git/releases/download/v0.9.1/s3git.exe

C:\Users\Username\Downloads> s3git.exe

Building from source

Build instructions are as follows (see install golang for setting up a working Go environment):

$ go get -d github.com/s3git/s3git
$ cd $GOPATH/src/github.com/s3git/s3git 
$ go install
$ s3git

BLAKE2 Tree Hashing and Storage Format

Read here how s3git uses the BLAKE2 Tree hashing mode for both deduplicated and hydrated storage (and see here for info on BLAKE2 at scale).
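As a rough illustration of the tree-hashing idea (a hedged sketch only: it does not reproduce s3git's actual BLAKE2 tree parameters, and the 1 MiB chunk size is an assumption for the example; s3git's default block size is 5 MB): the input is split into fixed-size chunks, each chunk gets a 64-byte BLAKE2b leaf hash, and the root hash is computed over the concatenation of the leaf hashes.

```python
import hashlib

CHUNK_SIZE = 1 << 20  # illustrative only; s3git's default block size is 5 MB

def leaf_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Hash each fixed-size chunk with a 64-byte BLAKE2b digest."""
    return [
        hashlib.blake2b(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]

def root_hash(data: bytes) -> str:
    """Root hash = BLAKE2b over the concatenation of the leaf hashes."""
    return hashlib.blake2b(b"".join(leaf_hashes(data))).hexdigest()
```

Because identical chunks yield identical leaf hashes, two files that share chunks can share the corresponding objects in storage, which is the basis for deduplication.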

Example workflow

Here is a simple workflow to create a new repository and populate it with some data:

$ mkdir s3git-repo && cd s3git-repo
$ s3git init
Initialized empty s3git repository in ...
$ # Just stream in some text
$ echo "hello s3git" | s3git add
Added: 18e622875a89cede0d7019b2c8afecf8928c21eac18ec51e38a8e6b829b82c3ef306dec34227929fa77b1c7c329b3d4e50ed9e72dc4dc885be0932d3f28d7053
$ # Add some more files
$ s3git add "*.mp4"
$ # Commit and log
$ s3git commit -m "My first commit"
$ s3git log --pretty

Push to cloud storage

$ # Add remote back end and push to it
$ s3git remote add "primary" -r s3://s3git-playground -a "AKIAJYNT4FCBFWDQPERQ" -s "OVcWH7ZREUGhZJJAqMq4GVaKDKGW6XyKl80qYvkW"
$ s3git push
$ # Read back content
$ s3git cat 18e6
hello s3git

Note: Do not store any important info in the s3git-playground bucket. It will be auto-deleted within 24 hours.

Directory versioning

You can also use s3git for directory versioning. This lets you 'capture' the full state of a directory (including its subdirectories) coherently, and later go back to any previous version of that state, not just of an individual file. Think of it as a Time Machine for whole directories rather than single files.

So instead of 'saving' a directory by making a full copy named 'MyFolder-v2' (then 'MyFolder-v3', etc.), you capture the state of the directory and give the version a meaningful message ("Changed color to red"), making it easy to find the version you are looking for later.

In addition, you can discard any uncommitted changes and go back to the last version that you captured. This means that, after committing, you can mess around in a directory and rest assured that you can always restore its original state.

If you push your repository to the cloud, you get an automatic backup and can easily collaborate with other people.

Lastly, it of course works with huge binary data too, not just with small text files as in the following 'demo' example:

$ mkdir dir-versioning && cd dir-versioning
$ s3git init .
$ # Just create a single file
$ echo "First line" > text.txt && ls -l
-rw-rw-r-- 1 ec2-user ec2-user 11 May 25 09:06 text.txt
$ #
$ # Create initial snapshot
$ s3git snapshot create -m "Initial snapshot" .
$ # Add new line to initial file and create another file
$ echo "Second line" >> text.txt && echo "Another file" > text2.txt && ls -l
-rw-rw-r-- 1 ec2-user ec2-user 23 May 25 09:08 text.txt
-rw-rw-r-- 1 ec2-user ec2-user 13 May 25 09:08 text2.txt
$ s3git snapshot status .
     New: /home/ec2-user/dir-versioning/text2.txt
Modified: /home/ec2-user/dir-versioning/text.txt
$ #
$ # Create second snapshot
$ s3git snapshot create -m "Second snapshot" .
$ s3git log --pretty
3a4c3466264904fed3d52a1744fb1865b21beae1a79e374660aa231e889de41191009afb4795b61fdba9c156 Second snapshot
77a8e169853a7480c9a738c293478c9923532f56fcd02e3276142a1a29ac7f0006b5dff65d5ca245255f09fa Initial snapshot
$ more text.txt
First line
Second line
$ more text2.txt
Another file
$ #
$ # Go back one version in time
$ s3git snapshot checkout . HEAD^
$ more text.txt
First line
$ more text2.txt
text2.txt: No such file or directory
$ #
$ # Switch back to latest revision
$ s3git snapshot checkout .
$ more text2.txt
Another file

Note that snapshotting works for all files in the directory, including any subdirectories. Click the following link for a more elaborate repository that includes all releases of the Kubernetes project.

Clone the YFCC100M dataset

Clone a large repo with 100 million files totaling 11.5 TB in size (Multimedia Commons), yet requiring only 7 GB local disk space.

(Note that this takes about 7 minutes on an SSD-equipped MacBook Pro with a 500 Mbit/s download connection, so on less powerful hardware you may want to skip to the next section (or, if you are unsure about the 7 GB of local disk space, try a df -h . first). Then again, it is quite a few files...)

$ s3git clone s3://s3git-100m -a "AKIAI26TSIF6JIMMDSPQ" -s "5NvshAhI0KMz5Gbqkp7WNqXYlnjBjkf9IaJD75x7"
Cloning into ...
Done. Totaling 97,974,749 objects.
$ cd s3git-100m
$ # List all files starting with '123456'
$ s3git ls 123456
12345649755b9f489df2470838a76c9df1d4ee85e864b15cf328441bd12fdfc23d5b95f8abffb9406f4cdf05306b082d3773f0f05090766272e2e8c8b8df5997
123456629a711c83c28dc63f0bc77ca597c695a19e498334a68e4236db18df84a2cdd964180ab2fcf04cbacd0f26eb345e09e6f9c6957a8fb069d558cadf287e
123456675eaecb4a2984f2849d3b8c53e55dd76102a2093cbca3e61668a3dd4e8f148a32c41235ab01e70003d4262ead484d9158803a1f8d74e6acad37a7a296
123456e6c21c054744742d482960353f586e16d33384f7c42373b908f7a7bd08b18768d429e01a0070fadc2c037ef83eef27453fc96d1625e704dd62931be2d1
$ s3git cat cafebad > olympic.jpg
$ # List and count total nr of files
$ s3git ls | wc -l
97974749

Fork that repo

Below is an example for alice and bob working together on a repository.

$ mkdir alice && cd alice
alice $ s3git clone s3://s3git-spoon-knife -a "AKIAJYNT4FCBFWDQPERQ" -s "OVcWH7ZREUGhZJJAqMq4GVaKDKGW6XyKl80qYvkW"
Cloning into .../alice/s3git-spoon-knife
Done. Totaling 0 objects.
alice $ cd s3git-spoon-knife
alice $ # add a file filled with zeros
alice $ dd if=/dev/zero count=1 | s3git add
Added: 3ad6df690177a56092cb1ac7e9690dcabcac23cf10fee594030c7075ccd9c5e38adbaf58103cf573b156d114452b94aa79b980d9413331e22a8c95aa6fb60f4e
alice $ # add 9 more files (with random content)
alice $ for n in {1..9}; do dd if=/dev/urandom count=1 | s3git add; done
alice $ # commit
alice $ s3git commit -m "Commit from alice"
alice $ # and push
alice $ s3git push

Clone it again as bob on a different computer/different directory/different universe:

$ mkdir bob && cd bob
bob $ s3git clone s3://s3git-spoon-knife -a "AKIAJYNT4FCBFWDQPERQ" -s "OVcWH7ZREUGhZJJAqMq4GVaKDKGW6XyKl80qYvkW"
Cloning into .../bob/s3git-spoon-knife
Done. Totaling 10 objects.
bob $ cd s3git-spoon-knife
bob $ # Check if we can access our empty file
bob $ s3git cat 3ad6 | hexdump
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
00000200
bob $ # add another 10 files
bob $ for n in {1..10}; do dd if=/dev/urandom count=1 | s3git add; done
bob $ # commit
bob $ s3git commit -m "Commit from bob"
bob $ # and push back
bob $ s3git push

Switch back to alice again to pull the new content:

alice $ s3git pull
Done. Totaling 20 objects.
alice $ s3git log --pretty
3f67a4789e2a820546745c6fa40307aa490b7167f7de770f118900a28e6afe8d3c3ec8d170a19977cf415d6b6c5acb78d7595c825b39f7c8b20b471a84cfbee0 Commit from bob
a48cf36af2211e350ec2b05c98e9e3e63439acd1e9e01a8cb2b46e0e0d65f1625239bd1f89ab33771c485f3e6f1d67f119566523a1034e06adc89408a74c4bb3 Commit from alice

Note: Do not store any important info in the s3git-spoon-knife bucket. It will be auto-deleted within 24 hours.

Here is a nice screen recording:

asciicast

Happy forking!

You may be wondering about concurrent behaviour from multiple users; as the log above shows, the commits from alice and bob both made it into the repository without conflict.

Integration with Minio

Instead of S3 you can happily use the Minio server, for example the public server at https://play.minio.io:9000. Just make sure you have a bucket created using mc (example below uses s3git-test):

$ mkdir minio-test && cd minio-test
$ s3git init 
$ s3git remote add "primary" -r s3://s3git-test -a "Q3AM3UQ867SPQQA43P2F" -s "zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG" -e "https://play.minio.io:9000"
$ echo "hello minio" | s3git add
Added: c7bb516db796df8dcc824aec05db911031ab3ac1e5ff847838065eeeb52d4410b4d57f8df2e55d14af0b7b1d28362de1176cd51892d7cbcaaefb2cd3f616342f
$ s3git commit -m "Commit for minio test"
$ s3git push
Pushing 1 / 1 [==============================================================================================================================] 100.00 % 0

and clone it

$ s3git clone s3://s3git-test -a "Q3AM3UQ867SPQQA43P2F" -s "zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG" -e "https://play.minio.io:9000"
Cloning into .../s3git-test
Done. Totaling 1 object.
$ cd s3git-test/
$ s3git ls
c7bb516db796df8dcc824aec05db911031ab3ac1e5ff847838065eeeb52d4410b4d57f8df2e55d14af0b7b1d28362de1176cd51892d7cbcaaefb2cd3f616342f
$ s3git cat c7bb
hello minio
$ s3git log --pretty
6eb708ec7dfd75d9d6a063e2febf16bab3c7a163e203fc677c8a9178889bac012d6b3fcda56b1eb160b1be7fa56eb08985422ed879f220d42a0e6ec80c5735ea Commit for minio test

Contributions

Contributions are welcome! Please see CONTRIBUTING.md.

Key features

  • Easy: Use a workflow and syntax that you already know and love

  • Fast: Lightning fast operation, especially on large files and huge repositories

  • Infinite scalability: Stop worrying about maximum repository sizes and have the ability to grow indefinitely

  • Work from local SSD: Make a huge cloud disk appear like a local drive

  • Instant sync: Push local changes and pull down instantly on other clones

  • Versioning: Keep previous versions safe and have the ability to undo or go back in time

  • Forking: Ability to make many variants by forking

  • Verifiable: Be sure that you have everything and be tamper-proof (“data has not been messed with”)

  • Deduplication: Do not store the same data twice

  • Simplicity: Simple by design and provide one way to accomplish tasks

Command Line Help

$ s3git help
s3git applies the git philosophy to Cloud Storage. If you know git, you will know how to use s3git.

s3git is a simple CLI tool that allows you to create a distributed, decentralized and versioned repository.
It scales limitlessly to 100s of millions of files and PBs of storage and stores your data safely in S3.
Yet huge repos can be cloned on the SSD of your laptop for making local changes, committing and pushing back.

Usage:
  s3git [command]

Available Commands:
  add         Add stream or file(s) to the repository
  cat         Read a file from the repository
  clone       Clone a repository into a new directory
  commit      Commit the changes in the repository
  init        Create an empty repository
  log         Show commit log
  ls          List files in the repository
  pull        Update local repository
  push        Update remote repositories
  remote      Manage remote repositories
  snapshot    Manage snapshots
  status      Show changes in repository

Flags:
  -h, --help[=false]: help for s3git

Use "s3git [command] --help" for more information about a command.

License

s3git is released under the Apache License v2.0. You can find the complete text in the file LICENSE.

FAQ

Q: Is s3git compatible with git at the binary level?
A: No. git is optimized for text content, with powerful diffing and compressed storage, whereas s3git focuses on large repos of primarily non-text blobs backed by cloud storage like S3.
Q: Do you support encryption?
A: No. However, it is trivial to encrypt data before streaming it into s3git add, e.g. pipe it through openssl enc or similar.
Q: Do you support zipping?
A: No. Again, it is trivial to zip data before streaming it into s3git add, e.g. pipe it through zip -r - . or similar.
Q: Why don't you provide a FUSE interface?
A: Supporting FUSE would mean introducing a lot of POSIX-related complexity that we would rather avoid.
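The zip and encryption answers above come down to the same pattern: transform the data yourself, then stream the result into s3git add. A minimal sketch of the zip case (Python, in-memory only; the actual piping into s3git is left out):

```python
import io
import zipfile

def zip_bytes(files: dict[str, bytes]) -> bytes:
    """Pack {name: content} into an in-memory zip archive.
    The returned bytes could then be streamed into `s3git add`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()
```

Note that, as one of the issues below points out, compressing or encrypting before adding defeats chunk-level deduplication, since even small changes alter the entire transformed stream.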

s3git's People

Contributors

fwessels · gitter-badger · harshavardhana · thejacobtaylor · tzz


s3git's Issues

project abandoned?!?

Hi.

no updates in ~4 years?!? so:

  • is the project abandoned?
  • is there any maintainer fork known / available?
  • any alternative known / used by someone?
    • maybe git-annex?!?

Push progress

Is there a way to show push progress? It seems that at the moment it just sits there showing 100% while the file is being uploaded.

i0x71@debian:~/gitty$ s3git push
Pushing 1 / 1 [====================================================================================================================================] 100.00%0

Compression

When we compress a file before adding it, this prevents deduplication (the same goes for encryption). It would be nice to compress data at the storage level. Is this planned?

Project named in a misleading manner

This project does not appear to implement the git protocol. I would suggest renaming the project, as people may be confused by the current name.

Usage question

Is the s3git-go package in a separate repo?

This is great. I have been using minio to do exactly the same thing, funnily enough.

Does this support merging, though?

Panic in Viper library during s3git remote add

Created a user with IAM, they did not have any permissions yet. Tried to run s3git remote add with latest code. panic...

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x50 pc=0x407e3e8]

goroutine 1 [running]:
panic(0x46b69e0, 0xc8200120c0)
/usr/local/Cellar/go/1.6.2/libexec/src/runtime/panic.go:481 +0x3e6
github.com/spf13/viper.pflagValue.HasChanged(0x0, 0xc820079bf0)
/Users/jacob/work/src/github.com/spf13/viper/flags.go:41 +0x8
github.com/spf13/viper.(Viper).find(0xc8200e62a0, 0x47ad060, 0x8, 0x0, 0x0)
/Users/jacob/work/src/github.com/spf13/viper/viper.go:738 +0x14a
github.com/spf13/viper.(Viper).Get(0xc8200e62a0, 0x47ad060, 0x8, 0x0, 0x0)
/Users/jacob/work/src/github.com/spf13/viper/viper.go:461 +0x10a
github.com/spf13/viper.(Viper).GetString(0xc8200e62a0, 0x47ad060, 0x8, 0x0, 0x0)
/Users/jacob/work/src/github.com/spf13/viper/viper.go:539 +0x41
github.com/spf13/viper.GetString(0x47ad060, 0x8, 0x0, 0x0)
/Users/jacob/work/src/github.com/spf13/viper/viper.go:537 +0x43
github.com/s3git/s3git/cmd.glob.func10(0x4b31aa0, 0xc82019c930, 0x1, 0x7)
/Users/jacob/work/src/github.com/s3git/s3git/cmd/remote.go:51 +0x147
github.com/spf13/cobra.(Command).execute(0x4b31aa0, 0xc82019c8c0, 0x7, 0x7, 0x0, 0x0)
/Users/jacob/work/src/github.com/spf13/cobra/command.go:603 +0x896
github.com/spf13/cobra.(Command).ExecuteC(0x4b32100, 0x4b31aa0, 0x0, 0x0)
/Users/jacob/work/src/github.com/spf13/cobra/command.go:689 +0x55c
github.com/spf13/cobra.(Command).Execute(0x4b32100, 0x0, 0x0)
/Users/jacob/work/src/github.com/spf13/cobra/command.go:648 +0x2d
github.com/s3git/s3git/cmd.Execute()
/Users/jacob/work/src/github.com/s3git/s3git/cmd/root.go:54 +0x23
main.main()
/Users/jacob/work/src/github.com/s3git/s3git/main.go:67 +0x8ec

Branching / Merging

Is branching supported?

Also, what happens if two users have modified the same file? Is there an interactive merge?

doesn't compile on Windows 10

Hi.

  • go version go1.14.2 windows/amd64
  • Windows 10, 1909
C:\Users\me\go\src\github.com\s3git\s3git>go install
# github.com/bmatsuo/lmdb-go/lmdb
mdb.c: In function 'mdb_env_setup_locks':
mdb.c:4853:17: warning: implicit declaration of function 'pthread_mutexattr_setrobust'; did you mean 'pthread_mutexattr_settype'? [-Wimplicit-function-declaration]
   if (!rc) rc = pthread_mutexattr_setrobust(&mattr, PTHREAD_MUTEX_ROBUST);
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~
                 pthread_mutexattr_settype
mdb.c:4853:53: error: 'PTHREAD_MUTEX_ROBUST' undeclared (first use in this function); did you mean 'PTHREAD_MUTEX_DEFAULT'?
   if (!rc) rc = pthread_mutexattr_setrobust(&mattr, PTHREAD_MUTEX_ROBUST);
                                                     ^~~~~~~~~~~~~~~~~~~~
                                                     PTHREAD_MUTEX_DEFAULT
mdb.c:4853:53: note: each undeclared identifier is reported only once for each function it appears in
mdb.c: In function 'mdb_mutex_failed':
mdb.c:362:37: warning: implicit declaration of function 'pthread_mutex_consistent'; did you mean 'pthread_mutex_init'? [-Wimplicit-function-declaration]
 #define mdb_mutex_consistent(mutex) pthread_mutex_consistent(mutex)
                                     ^
mdb.c:10193:10: note: in expansion of macro 'mdb_mutex_consistent'
    rc2 = mdb_mutex_consistent(mutex);
          ^~~~~~~~~~~~~~~~~~~~
mdb.c: In function 'mdb_cursor_put':
mdb.c:6725:9: warning: this statement may fall through [-Wimplicit-fallthrough=]
      if (SIZELEFT(fp) < offset) {
         ^
mdb.c:6730:5: note: here
     case MDB_CURRENT:
     ^~~~
go: failed to remove work dir: GetFileInformationByHandle C:\Users\me\AppData\Local\Temp\go-build863785342\NUL: Incorrect function.

Where are the real files located?

Hi,
I tried this tool with Minio, following every step in the description but using an image instead of text, and the image exists in the Minio folder. But when I copied this repo there was nothing in it, even though s3git ls displays that the files exist. What is wrong?

Google, Azure compatibility?

Google and Azure both use essentially the S3 protocol. Could you please indicate whether s3git is compatible with those cloud services?

Support partial cloning with ability to commit on partial clones

s3git mentions the potential for creating very large repositories.

Is it possible to clone only a part (subtree or even specific files) of those large repositories, and create new commits using that partial clone?

If it is already possible, that would be a good addition to the documented use case. If not, please consider this a feature request.

is possible restore a single file?

Hello, good day, I've a noob question: with git is possible restore a single file to a specific commit , with diff and reset command but these arguments are not available in s3git, how do you restore a single file?..thank you!!!...

s3git snapshots read entire (potentially huge) data files into memory with probability 1 in 64

When used to commit snapshots, s3git checks to see whether each file is stored in the snapshot deduped format. Described here

The code that performs this check for each file in the snapshot implements a quick "prefilter" using the length of each file:

// Check whether the size contains at least two keys or is a multiple of 64 bytes
if stat.Size() < KeySize*2 || stat.Size() & (KeySize-1) != 0 {
	return false, "", nil, nil
}

Any file that meets these criteria (a 1 in 64 chance) is then read entirely into memory to attempt to verify that the presumed root hash (the last 64 bytes of the file) is the hash of the concatenation of all of the bytes that precede it.

Although it may seem safe to read a valid deduped file into memory like this (an 8TB file will only produce a 100MB deduped file), the problem is that the very purpose of this check is to determine if this is in fact a deduped file or not. If it is not, then it may be a huge file (e.g. an exactly 8TB file) that meets the criteria above of len >= 128 && (len & 63) == 0.

This is obviously bad, and probably two things need to change to fix it.

  1. For safety and stability the memory used by this process needs to be limited. The use of ioutil.ReadFile(filename) (usually a bad code smell) needs to be replaced with functionality that reads the file in chunks, streaming them into the hash calculation, so the memory use is safely bounded.

  2. For performance some more stringent test needs to be made to inexpensively reject virtually all non-deduped datafiles, to avoid unnecessarily reading/hashing huge files:

    • A unique "magic number" of sufficient length could be written to the beginning (or end) of a valid deduped file and cheaply checked. This would obviously be a breaking change to the data format, but it would prevent unnecessarily reading/hashing 1 out of every 64 files.
    • Or perhaps there could be some upper-limit on the size of a valid deduped file, implying an upper-limit on the number of blocks that can belong to one file. A limit of 16 million blocks (1 GB deduped) would allow individual file sizes of ~88TB with the default blocksize. If you are dealing with truly huge files, the default block size of 5MB is probably too small, so by increasing that, you should be able to manage this limit far into the future. This limit would eliminate all files > 1GB as possible deduped files without breaking the format itself.

I'm sure there are other possible solutions I haven't immediately thought of...

Anyway, I consider this to be a serious issue that makes the current "snapshot" function unsuitable for any dataset with large-ish files (say... bigger than 5-10% of available RAM).
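Fix 1 above could be sketched as follows (a sketch in Python using plain hashlib.blake2b as a stand-in; s3git itself is Go and computes the root via the BLAKE2 tree mode, so this only demonstrates the bounded-memory streaming pattern): stream everything except the trailing 64 bytes through the hash in fixed-size reads, then compare the digest against the trailer.

```python
import hashlib
import os

KEY_SIZE = 64
READ_CHUNK = 1 << 20  # 1 MiB per read keeps memory bounded

def presumed_root_matches(filename: str) -> bool:
    """Check whether the last 64 bytes of the file equal the BLAKE2b hash
    of everything preceding them, without loading the file into memory."""
    size = os.stat(filename).st_size
    # Same prefilter as the snippet above: at least two keys,
    # and a multiple of the 64-byte key size.
    if size < KEY_SIZE * 2 or size & (KEY_SIZE - 1) != 0:
        return False
    h = hashlib.blake2b()
    remaining = size - KEY_SIZE
    with open(filename, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(READ_CHUNK, remaining))
            if not chunk:
                return False  # file shrank while reading
            h.update(chunk)
            remaining -= len(chunk)
        trailer = f.read(KEY_SIZE)
    return h.digest() == trailer
```

Peak memory here is one read buffer plus hash state, regardless of whether the candidate turns out to be an 8 TB data file or a genuine deduped file.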

Add INI file support (to Viper) for reading aws config files

Would be handy to read credentials from ~/.aws/config and ~/.aws/credentials config files in combination with a --profile option to allow easy switching between profiles (that are presumably already in place for AWS).

This needs INI support in Viper, which is used in combination with Cobra for command line flags.

Add as a homebrew package

In my opinion it would be a good idea to add s3git as a homebrew package. This is surely done very quickly and OSX developers would definitely benefit from it.

What do you guys think about that idea?

crashes on remote add cmd

I'm using s3git on OSX, compiled from current master, and it crashes on the remote add command. After reverting to 0.9.2, it works well.

s3git remote add origin -r "s3://127.0.0.1:9000/minio/testbucket"  -a "IR7NL8BW98Q3TBN3PW8N" -s "fDhNIrJOkfCb2PbDIXCyvF7+qltlTA3mrCOwTEQ9" --endpoint="127.0.0.1:9000"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x45e1b05]

goroutine 1 [running]:
github.com/spf13/viper.pflagValue.HasChanged(0x0, 0xc0001a0cc0)
	/Users/chenshaoyue/Desktop/go/src/github.com/spf13/viper/flags.go:41 +0x5
github.com/spf13/viper.(*Viper).find(0xc0001f6000, 0x478f7d2, 0x8, 0x8, 0x18)
	/Users/chenshaoyue/Desktop/go/src/github.com/spf13/viper/viper.go:945 +0xd5d
github.com/spf13/viper.(*Viper).Get(0xc0001f6000, 0x478f7d2, 0x8, 0x478d701, 0xc0000bf8e0)
	/Users/chenshaoyue/Desktop/go/src/github.com/spf13/viper/viper.go:632 +0x7c
github.com/spf13/viper.(*Viper).GetString(0xc0001f6000, 0x478f7d2, 0x8, 0x0, 0x0)
	/Users/chenshaoyue/Desktop/go/src/github.com/spf13/viper/viper.go:687 +0x3f
github.com/spf13/viper.GetString(0x478f7d2, 0x8, 0xc0000bf8e0, 0x0)
	/Users/chenshaoyue/Desktop/go/src/github.com/spf13/viper/viper.go:685 +0x41
github.com/s3git/s3git/cmd.glob..func10(0x4d56b80, 0xc00012a400, 0x1, 0x8)
	/Users/chenshaoyue/Desktop/go/src/github.com/s3git/s3git/cmd/remote.go:51 +0xb0
github.com/spf13/cobra.(*Command).execute(0x4d56b80, 0xc00012a380, 0x8, 0x8, 0x4d56b80, 0xc00012a380)
	/Users/chenshaoyue/Desktop/go/src/github.com/spf13/cobra/command.go:766 +0x2cc
github.com/spf13/cobra.(*Command).ExecuteC(0x4d572a0, 0x487df70, 0x4678ea0, 0xc0001fa1e0)
	/Users/chenshaoyue/Desktop/go/src/github.com/spf13/cobra/command.go:852 +0x2fd
github.com/spf13/cobra.(*Command).Execute(0x4d572a0, 0x4881220, 0xc0001a1b00)
	/Users/chenshaoyue/Desktop/go/src/github.com/spf13/cobra/command.go:800 +0x2b
github.com/s3git/s3git/cmd.Execute()
	/Users/chenshaoyue/Desktop/go/src/github.com/s3git/s3git/cmd/root.go:54 +0x2d
main.main()
	/Users/chenshaoyue/Desktop/go/src/github.com/s3git/s3git/main.go:67 +0x40b

cannot find package "github.com/hashicorp/hcl/hcl/printer"

The following error occurred when attempting to build.

takuya@takuya-MacBookPro2012mid ~ % go get -d github.com/s3git/s3git
cannot find package "github.com/hashicorp/hcl/hcl/printer" in any of:
	/usr/local/Cellar/go/1.15.8/libexec/src/github.com/hashicorp/hcl/hcl/printer (from $GOROOT)
	/Users/takuya/go/src/github.com/hashicorp/hcl/hcl/printer (from $GOPATH)
takuya@takuya-MacBookPro2012mid ~ % go version
go version go1.15.8 darwin/amd64

Put Access Key and Secret Key in a dot file

A suggestion. It could be a good idea to put the Access Key and Secret Key in a configuration file, such as a dot file ( eg .s3git.cfg ) . So we would not need to expose this information whenever you execute a command.

Snapshots that differ only by renamed or duplicated files can't be committed.

Hi thanks for this package.

First the bug: In snapshot mode, if the only difference between the current state and the previous snapshot is the addition of a duplicate file, the snapshot will fail to complete, even though the directory state has been updated (through the addition of the duplicated file).

Repro:

mkdir dup-test
cd dup-test
s3git init
# Initialized empty s3git repository in <directory>
head -n 100000 /dev/urandom > file1.bin
s3git snapshot create . -m 'Initial version'
# [commit <long hash>]
cp file1.bin file1.bin.bak
s3git snapshot create . -m 'Added backup file'
# No changes to snapshot
s3git log -p
# <long hash> Initial version

I also have a couple of quick questions:

How do you enable the rolling hash deduplication? It does not appear to be on by default. If I continue the example above by modifying the end and then the beginning of the file:

du -sh .s3git
# 24M	.s3git
echo 'woot' | cat file1.bin - > file1.bin.bak
s3git snapshot create . -m 'Added post-wooted backup file'
# [commit <long hash>]
du -sh .s3git
# 29M	.s3git         # The last chunk changed, as expected
echo 'woot' | cat - file1.bin > file1.bin.bak2
s3git snapshot create . -m 'Added pre-wooted backup file'
# [commit <long hash>]
du -sh .s3git
# 53M	.s3git         # Every chunk changed, NOT as expected

It appears that appending to files will be deduplicated, but prepending (or otherwise modifying) the file will not be. That doesn't fit my definition of "rolling hash" (e.g. how rsync or rabin file chunking work). Is this implemented? If so, how to enable it?
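For reference, the behaviour the reporter expects is content-defined chunking: boundaries are placed wherever a rolling hash over a small window hits a target value, so an insertion only disturbs the chunks around it and the boundaries resynchronize afterwards. A toy sketch (illustration only; this is not s3git's chunking code, and all constants are made up for the example):

```python
# Toy Rabin-Karp-style content-defined chunker.
BASE = 257
MOD = 1 << 32
WINDOW = 48              # rolling-hash window in bytes
MASK = (1 << 12) - 1     # cut where hash & MASK == 0 (~4 KiB average)
POW = pow(BASE, WINDOW, MOD)

def cdc_chunks(data: bytes, min_size: int = 1024, max_size: int = 1 << 16):
    """Split data at content-defined boundaries; returns a list of chunks."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = (h * BASE + b) % MOD            # slide the new byte in
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * POW) % MOD  # slide the old byte out
        length = i - start + 1
        if (length >= min_size and (h & MASK) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Prepending data shifts every byte position, but because the cut decision depends only on the last few dozen bytes of content, the boundaries realign shortly after the insertion point and all later chunks come out byte-identical, so they deduplicate.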

Finally, a general question that may be answered automatically by your response, but I'm curious about the status of this package. Is it being maintained? Are there plans to move forward beyond the "pre-release" and "use at your own peril (for now)" stage? It looks like a tremendously useful package that is currently more fully baked in comparison to the newer Noms or Dat projects, which have somewhat overlapping goals and approaches...

Thanks in advance for your timely response!

Is the s3git project still active? - s3git future

I'm wondering if this project is still active? The last commit was 2 years ago. I really like s3git and appreciate your effort, but I'm curious about the future of this project.

Not returning an error on invalid auth

Is it expected behavior not to return an error on invalid auth credentials?
I just spent 10 minutes trying to figure out why clone makes an empty directory after my commit 😄

i0x71@debian:~$ s3git clone s3://gitty -a 'blah' -s 'blah' -e "http://xxxxxxx:9000"
Cloning into /home/i0x71/gitty
Done. Totaling 0 objects.
