s3md5's Introduction

s3md5

Bash script to calculate Etag/S3 MD5 sum for very big files uploaded using multipart S3 API

Description

Calculates the ETag/S3 MD5 sum of a file, using the same algorithm that S3 uses for multipart uploads. Especially useful for files larger than 5 GB uploaded through the multipart S3 API. You can check file integrity by comparing the S3 ETag with the value returned by 's3md5 <size> <file>'.
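
The multipart ETag scheme referred to above can be sketched as follows: each part is hashed with MD5, the raw 16-byte digests are concatenated and hashed again, and the part count is appended after a dash. This is only an illustration, not the script's actual implementation. The function names `hex_to_bin` and `multipart_etag` are hypothetical, and the sketch assumes `md5sum`, `dd`, `mktemp` and a shell that supports `local`.

```shell
# Hypothetical helper: write the raw bytes for a hex string (e.g. an MD5 digest).
hex_to_bin() {
  local hex="$1" pair
  while [ -n "$hex" ]; do
    pair=${hex%"${hex#??}"}   # first two hex characters
    hex=${hex#??}             # the rest of the string
    # Convert the pair to a three-digit octal escape and let printf emit the byte.
    printf "\\$(printf '%03o' "0x$pair")"
  done
}

# Hypothetical helper: multipart_etag <chunk_size_mb> <file>
multipart_etag() {
  local chunk_bytes=$(( $1 * 1024 * 1024 ))
  local file="$2" size parts part sum digests
  size=$(( $(wc -c < "$file") ))
  # Assumption: empty files are out of scope for this sketch.
  [ "$size" -gt 0 ] || { echo "empty file not handled" >&2; return 1; }
  # Ceiling division on the byte size, so the loop always terminates.
  parts=$(( (size + chunk_bytes - 1) / chunk_bytes ))
  digests=$(mktemp) || return 1
  part=0
  while [ "$part" -lt "$parts" ]; do
    # MD5 of one part, appended to the digest list in binary form.
    sum=$(dd if="$file" bs="$chunk_bytes" skip="$part" count=1 2>/dev/null | md5sum | cut -d' ' -f1)
    hex_to_bin "$sum" >> "$digests"
    part=$(( part + 1 ))
  done
  # MD5 of the concatenated binary digests, followed by "-<number of parts>".
  echo "$(md5sum < "$digests" | cut -d' ' -f1)-$parts"
  rm -f "$digests"
}
```

Calling `multipart_etag 15 "myfile.dat"` would print a value of the form `<32 hex digits>-<parts>`. The file name is quoted so that paths containing spaces survive.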

Usage

Usage : $APP <size> <file>

- size : Multipart chunk size in MB
- file : Calculate Etag of this file

Example

  • Use a 15 MB chunk size (the default in the s3cmd .s3cfg config file)

    ~> s3md5 15 myfile.dat
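
For an integrity check, the computed value is then compared with the ETag that S3 reports. A minimal sketch, assuming the S3 ETag arrives wrapped in double quotes, as it does in HTTP response headers. The function name `etag_matches` is hypothetical, and how the two values are obtained is left to the caller.

```shell
# Hypothetical helper: etag_matches <etag_from_s3> <etag_computed_locally>
# Returns success (exit status 0) when both values are equal after
# stripping one pair of surrounding double quotes from the S3 value.
etag_matches() {
  local remote="$1" computed="$2"
  remote=${remote#\"}   # drop a leading double quote, if any
  remote=${remote%\"}   # drop a trailing double quote, if any
  [ "$remote" = "$computed" ]
}
```

A caller might use it as `etag_matches "$remote_etag" "$local_etag" && echo "OK"`, where both variables are placeholders for values gathered elsewhere.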

LICENSE

s3md5 is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

s3md5 is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with s3md5. For license details, see the LICENSE file. You can also read the GPLv3 on the GNU Licenses page.

AUTHOR

Copyright (C) 2013
Antonio Espinosa
Email : aespinosa at teachnova dot com
Twitter : @antespi
LinkedIn : Antonio Espinosa
Web : Teachnova

s3md5's People

Contributors

antespi, carlomendola, jamesoff, tamsky

s3md5's Issues

file names with spaces in them

Passing files with spaces in their names breaks the tool:

s3md5 15 "/path/to/file with spaces"
du: cannot access ‘/path/to/file’: No such file or directory

A small fix is below.

--- s3md5-master/s3md5  2014-08-06 09:05:31.000000000 +0000
+++ /usr/bin/s3md5      2014-08-22 16:34:08.061000000 +0000
@@ -163,7 +163,7 @@
   $ECHO_BIN $1 | cut -d'-' -f 2
 }
 get_filesize_in_bytes() {
-  du -m $1 | cut -f 1
+  du -m "${1}" | cut -f 1
 }
 get_chunk_size() {
   local filesize=$1
@@ -179,7 +179,7 @@

   SIZE="$1"
   FILE="$2"
-  local FILESIZE=`get_filesize_in_bytes $FILE`
+  local FILESIZE=`get_filesize_in_bytes "${FILE}"`
   if [ $DEBUG -eq 1 ]; then
     $ECHO_BIN "File size: $FILESIZE MB"
   fi
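
The effect that the patch above addresses can be reproduced in isolation. In this small sketch, `count_args` is a hypothetical helper that reports how many arguments it received; an unquoted variable is split on whitespace, while a quoted one is passed through as a single argument.

```shell
# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

FILE="/path/to/file with spaces"

count_args $FILE       # unquoted: split into 3 arguments, prints 3
count_args "${FILE}"   # quoted: one argument, prints 1
```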

wrong temp file name

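The assignments should use double quotes. With single quotes, $$ is not expanded to the process ID, so every run uses the same literal file names.
This can cause trouble when several instances run at the same time.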

ERROR_FILE='/tmp/s3md5-error-$$.out'
SUM_FILE='/tmp/s3md5-md5sumlist-$$.out'
BIN_FILE='/tmp/s3md5-md5bin-$$.out'

=>

ERROR_FILE="/tmp/s3md5-error-$$.out"
SUM_FILE="/tmp/s3md5-md5sumlist-$$.out"
BIN_FILE="/tmp/s3md5-md5bin-$$.out"
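
The difference between the two quoting styles can be checked directly. The sketch below also shows `mktemp` as one possible alternative for unique temporary files; that part is a suggestion on top of the issue, not something the report proposes, and the variable names are illustrative.

```shell
single='/tmp/s3md5-error-$$.out'   # single quotes: "$$" stays literal
double="/tmp/s3md5-error-$$.out"   # double quotes: "$$" becomes the shell's PID

echo "$single"   # ends in -$$.out for every run
echo "$double"   # ends in -<pid>.out, different per process

# Alternative (an assumption, not from the report): let mktemp pick a unique name.
tmpfile=$(mktemp "${TMPDIR:-/tmp}/s3md5-error-XXXXXX")
echo "$tmpfile"
rm -f "$tmpfile"
```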

Automatically calculate the chunk size from the etag

I am not entirely sure this is correct but it worked to calculate the chunk size for my files:

chunks=$(echo "$etag" | cut -d'-' -f 2); filesize=$(du -b "$file" | cut -f 1); echo "($filesize / (1024 * 1024)) / $chunks + 1" | bc

So with that maybe your script can add an option like s3md5 -etag $etag filename so you don't need to provide the chunk size.

I'll try to make the time to do a PR.
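
One caveat with this idea is that the part count alone does not always pin down a single chunk size. For example, a 100 MiB file splits into 7 parts with either 15 MB or 16 MB chunks. The sketch below prints the smallest and largest whole-MB chunk sizes that would produce the part count found in the ETag, so that each candidate can be tried in turn. The function name `candidate_chunk_sizes_mb` is hypothetical, and the file size is passed in bytes.

```shell
# Hypothetical helper: candidate_chunk_sizes_mb <etag> <file_size_in_bytes>
# Prints "<min> <max>" in MB, or only "<min>" for a single-part ETag
# (any chunk size at or above <min> then gives one part).
candidate_chunk_sizes_mb() {
  local etag="$1" size="$2" mib=$(( 1024 * 1024 )) parts min max
  etag=${etag%\"}                       # tolerate a trailing double quote
  case "$etag" in *-*) ;; *) return 1 ;; esac   # not a multipart ETag
  parts="${etag##*-}"
  # Smallest chunk size c (in MB) with ceil(size / c) == parts.
  min=$(( (size + parts * mib - 1) / (parts * mib) ))
  if [ "$parts" -gt 1 ]; then
    # Largest chunk size that still needs more than (parts - 1) parts.
    max=$(( (size + (parts - 1) * mib - 1) / ((parts - 1) * mib) - 1 ))
    echo "$min $max"    # if max < min, no whole-MB chunk size fits
  else
    echo "$min"
  fi
}

candidate_chunk_sizes_mb "abc-7" 104857600   # 100 MiB in 7 parts: prints "15 16"
```

The ETag `abc-7` is a shortened placeholder; only the part after the dash is used.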

Runs forever on small files

Running against a 1.1MB file with a 50MB chunk setting I get...

SUM for part 1 (0 to 50 MB) ... OK - 2f3a23a755356961d9882ea224f98041
SUM for part 2 (50 to 100 MB) ... OK - d41d8cd98f00b204e9800998ecf8427e
SUM for part 3 (100 to 150 MB) ... OK - d41d8cd98f00b204e9800998ecf8427e
SUM for part 4 (150 to 200 MB) ... OK - d41d8cd98f00b204e9800998ecf8427e
SUM for part 5 (200 to 250 MB) ... OK - d41d8cd98f00b204e9800998ecf8427e
SUM for part 6 (250 to 300 MB) ... OK - d41d8cd98f00b204e9800998ecf8427e
SUM for part 7 (300 to 350 MB) ... OK - d41d8cd98f00b204e9800998ecf8427e
SUM for part 8 (350 to 400 MB) ... OK - d41d8cd98f00b204e9800998ecf8427e
SUM for part 9 (400 to 450 MB) ... OK - d41d8cd98f00b204e9800998ecf8427e
SUM for part 10 (450 to 500 MB) ... OK - d41d8cd98f00b204e9800998ecf8427e
SUM for part 11 (500 to 550 MB) ... OK - d41d8cd98f00b204e9800998ecf8427e
SUM for part 12 (550 to 600 MB) ... OK - d41d8cd98f00b204e9800998ecf8427e
SUM for part 13 (600 to 650 MB) ... OK - d41d8cd98f00b204e9800998ecf8427e
SUM for part 14 (650 to 700 MB) ... OK - d41d8cd98f00b204e9800998ecf8427e
SUM for part 15 (700 to 750 MB) ... OK - d41d8cd98f00b204e9800998ecf8427e

to infinity. Debugging now, but just reporting the issue as a heads up.
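
Note that d41d8cd98f00b204e9800998ecf8427e is the MD5 of empty input, so every part after the first in the log above is hashing nothing. One way to bound the loop, sketched here with a hypothetical `part_count` helper, is to derive the number of parts from the file size in bytes with a ceiling division. This is not necessarily the fix the project adopted.

```shell
# Hypothetical helper: part_count <file_size_in_bytes> <chunk_size_mb>
part_count() {
  local size="$1" chunk_bytes=$(( $2 * 1024 * 1024 ))
  # Ceiling division: a partial last chunk still counts as one part.
  echo $(( (size + chunk_bytes - 1) / chunk_bytes ))
}

part_count 1153434 50   # a ~1.1 MB file with 50 MB chunks: prints 1
```

A loop that runs from 1 to this count stops after the first part for the file in the report.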
