larrabee / s3sync
Really fast sync tool for S3
License: GNU General Public License v3.0
Using s3sync to copy objects to a new Ceph cluster. The copied objects are empty: I can GET them and download them locally, but they contain no data.
I have tried using Wasabi and another Ceph cluster as the source.
Thoughts?
@larrabee, would it be possible to add a point-in-time option to your utility?
We have searched high and low for a good S3 sync tool, but none do what we need.
It seems that this tool is missing an equivalent of s3cmd's --no-check-certificate flag, and it would be nice to have.
I'm authenticating with AWS via Okta (SAML2). This works fine with the stock aws client, but I had to tinker on a fork to get it working for s3sync. The only issue is that right now s3sync always provides a blank third argument for the AWS credentials:
JonPeel@574204e?diff=unified#diff-44bbcc9d983da65f32aa64529eb190e2R45
The commit history there is not clean and is buggy; I'm just using it to highlight where I needed changes. I could clean it up for a PR. Would you want a command line argument (which is the route I went) or some other means of specifying this?
Jon
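For context: in aws-sdk-go, the third argument to credentials.NewStaticCredentials is the session token, which is exactly what an STS/SAML flow needs to pass through. A minimal sketch of the SDK call (all credential values, the region, and the wiring are placeholders, not s3sync's actual code):

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// NewStaticCredentials takes (key, secret, token); passing "" as the
	// token is what breaks STS/SAML-issued temporary credentials.
	creds := credentials.NewStaticCredentials(
		"AKIA...",         // access key id (placeholder)
		"secret...",       // secret access key (placeholder)
		"FQoGZXIvYXdz...", // session token from the SAML/STS exchange (placeholder)
	)
	sess := session.Must(session.NewSession(&aws.Config{
		Region:      aws.String("us-east-1"), // placeholder region
		Credentials: creds,
	}))
	svc := s3.New(sess)
	out, err := svc.ListBuckets(&s3.ListBucketsInput{})
	fmt.Println(out, err)
}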
I'm syncing a folder that has about 500k objects. A few of these objects are about 1GB each. As soon as s3sync gets to them, it bombs with OOM. The sync ran with 30 workers on an instance with 4GB RAM and no swap.
It would help if the sync streamed file contents instead of reading them wholesale into memory, at least for big files.
thanks.
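For what it's worth, streaming can be sketched with aws-sdk-go's s3manager, which uploads from any io.Reader in fixed-size parts instead of buffering whole objects. This is an illustration of the requested behavior under assumed bucket/key names, not s3sync's actual pipeline:

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess := session.Must(session.NewSession())
	src := s3.New(sess)

	// Stream the source object instead of reading it fully into memory.
	obj, err := src.GetObject(&s3.GetObjectInput{
		Bucket: aws.String("source-bucket"), // placeholder
		Key:    aws.String("big/file.bin"),  // placeholder
	})
	if err != nil {
		log.Fatal(err)
	}
	defer obj.Body.Close()

	// The uploader reads obj.Body in PartSize chunks, so peak memory per
	// transfer stays roughly at PartSize * Concurrency, not object size.
	up := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = 5 * 1024 * 1024
		u.Concurrency = 2
	})
	_, err = up.Upload(&s3manager.UploadInput{
		Bucket: aws.String("target-bucket"), // placeholder
		Key:    aws.String("big/file.bin"),
		Body:   obj.Body,
	})
	if err != nil {
		log.Fatal(err)
	}
}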
EDIT for clarity:
I'm able to use the aws CLI and keys like this:
aws --endpoint-url https://s3.amazonaws.com s3 sync s3://bucket/key s3://bucket/other_key
But when I try something similar here
s3sync --tk KEY --ts SECRET --tr us-west-1 --sk KEY --ss SECRET --sr us-west-1 --se https://us-west-1.s3.amazonaws.com --te https://us-west-1.s3.amazonaws.com s3://BUCKET/key/ s3://BUCKET/OTHER_KEY/ -d
I get an error saying the authorization header is malformed:
Synced: 0; Skipped: 0; Failed: 0; Total processed: 0onHeaderMalformed: The authorization header is malformed; the region 'us-west-1' is wrong; expecting 'us-east-1'
I'm 1000% sure my bucket is in us-west-1. The command works with aws cli. The keys work. The S3 bucket is in us-west-1. Any idea what the issue might be?
/nfs/DS12BayD/Imgsrv/@juno/1509/150924XX/1129US00/1129US07/CMP/1129US07_Tag.csv
I am not able to sync anything with special characters in the path, such as the file above.
Syncing from S3 to Ceph using the following command. The result does not print any meaningful debug or sync log. Note that the DEBUG output only shows up after letting the command run for a little while before SIGINT. This has been tried without the progress bar as well. Tested on both Linux and Mac with identical results:
$ sw_vers
ProductName: Mac OS X
ProductVersion: 10.15.5
BuildVersion: 19F101
$ cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="8.1 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.1"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.1 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.1:GA"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.1
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.1"
s3sync --sync-log --debug -p --filter-modified --tk REDACTED --ts REDACTED --sk REDACTED --ss REDACTED --te 'http://ceph.internal:8080' -w 12 s3://folder/daydate=2020-06-13/ s3://folder/daydate=2020-06-13/
INFO[0000] Starting sync
WARN[0017] Receive signal: interrupt, terminating
DEBU[0019] Pipeline step: ListSource finished
DEBU[0022] Pipeline step: FilterObjectsModified finished
DEBU[0022] Pipeline step: LoadObjData finished
DEBU[0022] Pipeline step: UploadObj finished
DEBU[0022] Pipeline step: Logger finished
DEBU[0022] Pipeline step: Terminator finished
DEBU[0022] All pipeline steps finished
DEBU[0022] Pipeline terminated
INFO[0022] 0 ListSource: Input: 0; Output: 7000 (308 obj/sec); Errors: 0
INFO[0022] 1 FilterObjectsModified: Input: 7000; Output: 0 (0 obj/sec); Errors: 0
INFO[0022] 2 LoadObjData: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0022] 3 UploadObj: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0022] 4 Logger: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0022] 5 Terminator: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0022] Duration: 22.756570531s
WARN[0022] Sync Aborted
The Ceph version is Mimic, which does not support ListObjectsV2 (this has been a constant source of frustration). There may be other APIs that are not fully or properly supported that might be causing issues.
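For reference, Mimic-era RGW does support the original v1 listing call, so a fallback would look roughly like this in aws-sdk-go (the endpoint and prefix are taken from the report above; the bucket name and the rest of the setup are assumptions):

package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{
		Endpoint:         aws.String("http://ceph.internal:8080"), // from the report above
		S3ForcePathStyle: aws.Bool(true),
	}))
	svc := s3.New(sess)

	// ListObjectsPages uses the v1 ListObjects API, which Mimic-era RGW
	// implements, instead of ListObjectsV2.
	err := svc.ListObjectsPages(&s3.ListObjectsInput{
		Bucket: aws.String("folder"), // placeholder bucket
		Prefix: aws.String("daydate=2020-06-13/"),
	}, func(page *s3.ListObjectsOutput, lastPage bool) bool {
		for _, obj := range page.Contents {
			fmt.Println(*obj.Key)
		}
		return true // keep paging
	})
	if err != nil {
		log.Fatal(err)
	}
}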
s3sync works for most S3 files, but for a few it fails with NoSuchKey: The specified key does not exist. Is there anything to fix in the S3 permissions, or am I missing an option with s3sync?
error log:
ERRO[0000] Sync error: pipeline step: 1 (LoadObjData) failed with error: object: content/directpath/health-check/abd/details/js.properties sync error: NoSuchKey: The specified key does not exist.
status code: 404, request id: xxxxxxxxxxxxxx, host id: v5dwefwefwefmkwmfkwa98y79790978ZKpjUNCAkKhE+8697fqbdqtdgbxbx0=, terminating
Built the program using go build.
The requirement was to sync around 7 GB of data from one S3 bucket to another.
I started the sync using the command below:
./s3sync --sk "<source_key>" --ss "<source_secret>" --st "<source_token>" --tk "<target_key>" --ts "<target_secret>" --tt "<target_token>" -w 128 s3://<source_bucket_name>/ s3://<target_bucket_name>/
The sync ran for a while and then stopped with the error message:
"tcp connection closed"
Hey,
I have experienced a memory leak. This is my scenario:
I am syncing 20 TB of data from one bucket to another.
The server running the command has 36 cores and 72 GB RAM. I am running with the following parameters:
./s3sync --debug --sync-log --sync-progress --sk $SK --ss $SS --sr $SR --tk $TK --ts $TS --tr $TR -w 768 s3://SOURCEBUCKET/FOLDERACCESS/ s3://TARGETBUCKET
This command runs for a few minutes, then the RAM usage grows rapidly until the command is killed. I could use supervisor to respawn the task, but I would like to know at what point the binary does garbage collection.
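As general Go background (not s3sync-specific): the runtime collects whenever the heap grows GOGC percent (default 100) beyond the live set of the previous cycle, so with 768 workers buffering object data the peaks between cycles can be huge. A hedged sketch of making collection more aggressive:

package main

import "runtime/debug"

func main() {
	// Equivalent to running with GOGC=30: trigger a collection once the
	// heap grows 30% past the previous live set, trading CPU for peak RSS.
	debug.SetGCPercent(30)

	// ... start the sync pipeline here ...
}

The same effect is available without a rebuild by exporting GOGC=30 before starting the binary.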
I just played around with s3sync and noticed that it doesn't run standalone on a server without the aws CLI installed on the machine. It looks like it ignores the --sk and --ss options.
When running s3sync without the aws CLI installed, this results in an error:
DEBU[0002] Putting obj <some_file> failed with err: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
How to reproduce:
./s3sync --sk --ss --sr us-west-2 --tr eu-central-1 -w 128 s3://source-bucket s3://target-bucket -d -r 10
It would be great if there were an option to not sync files that exist at the destination and are newer than the source file. I think this is even the default behavior of aws s3 sync. This would really help with incremental syncing. --filter-modified seems to actually do double work, instead of saving work, in the case of incremental syncing.
Hello!
Very nice project, but I wonder, how can I specify the storage class?
In other words, I have something like this for s3cmd:
s3cmd --storage-class=COLD --delete-removed --acl-private --guess-mime-type --skip-existing sync /backup/ s3://backup/
Is it possible to reproduce the same action with s3sync?
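For comparison, in aws-sdk-go the storage class is just a field on the upload request, so an s3sync option could map onto something like this (bucket, key, and file path are placeholders):

package main

import (
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := s3.New(sess)

	f, err := os.Open("/backup/dump.tar.gz") // placeholder file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// StorageClass is set per PutObject request; valid values include
	// STANDARD, STANDARD_IA, ONEZONE_IA, GLACIER, DEEP_ARCHIVE.
	_, err = svc.PutObject(&s3.PutObjectInput{
		Bucket:       aws.String("backup"),      // placeholder
		Key:          aws.String("dump.tar.gz"), // placeholder
		Body:         f,
		StorageClass: aws.String(s3.StorageClassStandardIa),
	})
	if err != nil {
		log.Fatal(err)
	}
}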
Unless I'm missing something, it appears that the AWS credential chain is not implemented correctly. For our purposes, it is essential that we can grant rights using an EC2 Instance Role. This is correctly configured, as I can use aws s3 ls to show my buckets/objects without ~/.aws existing at all.
Unfortunately, this doesn't work with s3sync. Instead we get the following error:
INFO[0000] Starting sync
DEBU[0000] S3 listing failed with error: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Synced: 0; Skipped: 0; Failed: 0; Total processed: 0
Avg syncing speed: 0 obj/sec; Avg listing speed: 0 obj/sec
FATA[0003] Listing objects failed: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
The official AWS Go SDK documentation confirms that this should work out of the box, as long as the application implements the credential chain correctly: https://github.com/aws/aws-sdk-go#configuring-credentials
It appears that s3sync does not specify EC2 Instance Roles as a credential provider:
Lines 39 to 43 in 418ed11
It looks like this should include:
Example of ChainProvider to be used with an EnvProvider and EC2RoleProvider. In this example EnvProvider will first check if any credentials are available via the environment variables. If there are none ChainProvider will check the next Provider in the list, EC2RoleProvider in this case. If EC2RoleProvider does not return any credentials ChainProvider will return the error ErrNoValidProvidersFoundInChain
// Imports assumed: "github.com/aws/aws-sdk-go/aws", ".../aws/credentials",
// ".../aws/credentials/ec2rolecreds", ".../aws/ec2metadata",
// ".../aws/session", ".../service/ec2"; sess is an earlier *session.Session.
creds := credentials.NewChainCredentials(
	[]credentials.Provider{
		// Environment variables are checked first...
		&credentials.EnvProvider{},
		// ...then the EC2 instance role via the metadata service.
		&ec2rolecreds.EC2RoleProvider{
			Client: ec2metadata.New(sess),
		},
	})
// Usage of ChainCredentials with aws.Config
svc := ec2.New(session.Must(session.NewSession(&aws.Config{
	Credentials: creds,
})))
from: https://docs.aws.amazon.com/sdk-for-go/api/aws/credentials/#ChainProvider
Hi,
Thanks for sharing this tool. It looks great!
One of my colleagues used s3sync. What he observed is that it is very fast! But it takes an equal amount of time to resume if the copy process is somehow canceled. For example, it takes half an hour to copy 900,000 out of 1 million objects; when the copy is canceled, s3sync takes another half an hour to compare the 900,000 objects before copying the 100,000 that are left.
Would you mind if I ask whether we are missing something here?
Thanks for your time.
I am using version 2.25. When I run the sync with these parameters
./s3sync -d --sync-log --sk **** --ss *** --se hostaddr -w 20 s3://test fs://./data/
it works for a while, but after some downloads I get this error:
DEBU[0050] Recv pipeline err: object: )gFE5OB2/11.xo1dBK1OamWlMJsb.rnd sync error: unexpected EOF
ERRO[0050] Sync error: pipeline step: 1 (LoadObjData) failed with error: object: )gFE5OB2/11.xo1dBK1OamWlMJsb.rnd sync error: unexpected EOF, terminating
The same happens on version 2.30:
INFO[0901] Sync file Content-Type=application/octet-stream key=9zabR6AR/17.yXJzkzaeeS9gDKpn.rnd size=10485760
DEBU[0903] Recv pipeline err: object: 9Agq8lXt/82.Lit0v8qT2Ptof0H).rnd sync error: unexpected EOF
ERRO[0903] Sync error: pipeline step: 1 (LoadObjData) failed with error: object: 9Agq8lXt/82.Lit0v8qT2Ptof0H).rnd sync error: unexpected EOF, terminating
It is also not good that there are no details about the error (debug log enabled, retry enabled, sync-log enabled).
Thanks in advance
INFO[0000] Starting sync
ERRO[0500] Sync error: pipeline step: 1 (LoadObjData) failed with error: object: bucket_name/xyz123.tif sync error: read tcp 172.31.31.193:40284->52.216.139.13:443: read: connection reset by peer, terminating
INFO[0500] 0 ListSource: Input: 0; Output: 5000 (10 obj/sec); Errors: 0
INFO[0500] 1 LoadObjData: Input: 5000; Output: 3692 (7 obj/sec); Errors: 1
INFO[0500] 2 ACLUpdater: Input: 3692; Output: 3692 (7 obj/sec); Errors: 0
INFO[0500] 3 UploadObj: Input: 3692; Output: 3657 (7 obj/sec); Errors: 0
INFO[0500] 4 Terminator: Input: 3657; Output: 0 (0 obj/sec); Errors: 0
INFO[0500] Duration: 8m20.697078919s
ERRO[0500] Sync Failed
Latest version:
$ ./s3sync --help
Really fast sync tool for S3
Version: 1.14, commit: 418ed11, built at: 2019-04-18T14:46:19Z
Command line used:
./s3sync -sk xxx -ss yyy -sr us-west-1 -tk aaa -ts bbb -tr us-west-2 -w 256 -d s3://foo/ s3://bar/
Tool does nothing:
INFO[0000] Starting sync
Synced: 0; Skipped: 0; Failed: 0; Total processed: 0
Avg syncing speed: 0 obj/sec; Avg listing speed: 0 obj/sec
INFO[0001] Sync finished successfully
INFO[0001] Synced: 0; Skipped: 0; Failed: 0; Total processed: 0
INFO[0001] Avg syncing speed: 0 obj/sec; Avg listing speed: 0 obj/sec; Duration: 1 sec
Hey, I am trying to configure an AWS sync between 2 buckets in different regions, and this tool only validates the source but ignores the target.
I am running the command in a bash script.
This is what I run:
export TK=TARGETKEY
export TS=TARGETSECRET
export TR=TARGETREGION
export SK=SOURCEKEY
export SS=SOURCESECRET
export SR=SOURCEREGION
./s3sync_bck1 --debug --sync-log --sync-progress --sk $SK --ss $SS --sr $SR --tk $TK --ts $TS --tr $TR -w 128 s3://SOURCEBUCKET s3://TARGETBUCKET
When I run this I get:
INFO[0000] Starting sync
DEBU[0001] Listing bucket finished
DEBU[0001] Pipeline step: ListSource finished
DEBU[0001] Pipeline step: LoadObjData finished
DEBU[0001] Pipeline step: UploadObj finished
DEBU[0001] Pipeline step: ACLUpdater finished
DEBU[0001] Pipeline step: Logger finished
DEBU[0001] Pipeline step: Terminator finished
DEBU[0001] All pipeline steps finished
DEBU[0001] Pipeline terminated
INFO[0001] 0 ListSource: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 1 LoadObjData: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 2 ACLUpdater: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 3 UploadObj: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 4 Logger: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 5 Terminator: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] Duration: 1.870510625s
INFO[0001] Sync Done
My bucket has not been synced, and if I change the keys for my destination, they have no effect. A correct key and an incorrect key give the same result.
Why is s3sync not working, bearing in mind this is a straightforward setup? Is there anything I am missing that I should know about?
I'm trying to get s3sync to upload non-existing files at a reasonable speed.
s3sync --tk "$AWS_ACCESS_KEY_ID" --ts "$AWS_SECRET_ACCESS_KEY" --tt "$AWS_SESSION_TOKEN" --tr "us-west-2" --filter-not-exist -w 128 -p /source/Small/ s3://bucket/Small/
Adding the --filter-not-exist option causes the objects processed to drop from ~150-200 objects/sec to just 5 objects/sec. I get that you now need to do an object-exists check against S3. Increasing the -w does seem to help. Is there any way to speed things up?
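For context, an existence filter generally costs one HEAD request per object, and the usual way to claw back throughput is to run those checks with high concurrency. A rough sketch of the idea (bucket and keys are placeholders, independent of s3sync's internals):

package main

import (
	"fmt"
	"sync"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := s3.New(sess)
	keys := []string{"Small/a.jpg", "Small/b.jpg"} // placeholders

	sem := make(chan struct{}, 64) // allow 64 HEAD requests in flight
	var wg sync.WaitGroup
	for _, k := range keys {
		wg.Add(1)
		go func(key string) {
			defer wg.Done()
			sem <- struct{}{}
			defer func() { <-sem }()
			// HeadObject returns a 404/NotFound error for missing keys,
			// so "err != nil" roughly means "does not exist yet".
			_, err := svc.HeadObject(&s3.HeadObjectInput{
				Bucket: aws.String("bucket"), // placeholder
				Key:    aws.String(key),
			})
			if err != nil {
				fmt.Println("needs upload:", key)
			}
		}(k)
	}
	wg.Wait()
}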
Hi,
s3sync is super impressive to me regarding its speed and active maintenance; thanks for open-sourcing this tool.
Line 138 in 44119b3
According to this line, I'm thinking of the following use case: when syncing starts, the target 100MB file on the FS is truncated to 0 bytes at the beginning. Is there any chance the following cases happen after syncing?
1. io.Copy() fails for some reason, and the target file remains at 0 bytes.
2. A reader opens the file before io.Copy() finishes its job and unexpectedly gets only partial content, e.g. 50MB of 101MB.
Thanks.
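One common way to avoid both hazards is to download into a temporary file and rename it into place only on success, since rename is atomic on POSIX filesystems. A sketch of that pattern (an illustration of the fix, not what s3sync currently does):

package main

import (
	"io"
	"log"
	"os"
	"path/filepath"
)

// writeAtomic streams src into path via a temp file in the same directory,
// so readers either see the old complete file or the new complete file.
func writeAtomic(path string, src io.Reader) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".s3sync-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // no-op after a successful rename

	if _, err := io.Copy(tmp, src); err != nil {
		tmp.Close()
		return err // the target file is untouched on failure
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

func main() {
	if err := writeAtomic("/tmp/out.bin", os.Stdin); err != nil {
		log.Fatal(err)
	}
}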
Thanks for creating this great tool!
I attempted to use s3sync with a setup that requires an AssumeRole to access the target bucket, but apparently this is not supported (yet)?
More specifically, the setup looks like this:
Account 12345: source-bucket
Account 67890: destination-bucket
{
"Sid": "DelegateS3Access",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::DESTINATION_ACCOUNT_ID:root"
},
"Action": [
"s3:ListBucket",
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::destination-bucket/*",
"arn:aws:s3:::destination-bucket"
]
}
to allow the destination account to list and pull the files.
export AWS_PROFILE=my.account.test
aws s3 sync s3://source-bucket s3://destination-bucket
and it was syncing the objects quite happily ... just not as fast as I'd like to ;-)
My question, or potential feature request, is to be able to use s3sync in this setup.
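For reference, aws-sdk-go makes cross-account access fairly mechanical via the stscreds package, so target-side AssumeRole support could look roughly like this (the role ARN and bucket are placeholders):

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials/stscreds"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// Base session from the source account's profile/environment.
	sess := session.Must(session.NewSession())

	// Temporary credentials for the destination account, refreshed
	// automatically before they expire.
	creds := stscreds.NewCredentials(sess,
		"arn:aws:iam::67890:role/s3sync-target") // placeholder role ARN

	target := s3.New(sess, &aws.Config{Credentials: creds})
	out, err := target.ListObjectsV2(&s3.ListObjectsV2Input{
		Bucket: aws.String("destination-bucket"), // placeholder
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("listed %d objects", len(out.Contents))
}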
Hi,
I have a case where I cannot access the PARENT BUCKET, but I can list the CHILD OBJECTS, i.e.:
[root@localhost ~]# s3cmd ls s3:///DIRECTORY
ERROR: Access to bucket '' was denied
ERROR: S3 error: 403 (AccessDenied): Access Denied
If I put a trailing slash, I have access to the CHILD OBJECTS in the bucket:
[root@localhost ~]# s3cmd ls s3:///DIRECTORY/
DIR s3:///DIRECTORY/CONTENT1/
DIR s3:///DIRECTORY/CONTENT2/
DIR s3:///DIRECTORY/CONTENT3/
DIR s3:///DIRECTORY/CONTENT4/
DIR s3:///DIRECTORY/CONTENT5/
Given this scenario, when I try to run s3sync I get the error below, since s3:///DIRECTORY/ is my source bucket. The trailing slash is very important.
This is what I get when I run s3sync:
DEBU[0001] S3 listing failed with error: AccessDenied: Access Denied
status code: 403, request id: 91DF1DF0618DA652, host id: B646GOQYS6s9ISVvTvUJqYjWpBmg91j+vrXGvI9nNzDMZEjdsYT7CCsKV1f9nToaLRTSn+UQR54=
DEBU[0001] Pipeline step: ListSource finished
DEBU[0001] Recv pipeline err: AccessDenied: Access Denied
status code: 403, request id: 91DF1DF0618DA652, host id: B646GOQYS6s9ISVvTvUJqYjWpBmg91j+vrXGvI9nNzDMZEjdsYT7CCsKV1f9nToaLRTSn+UQR54=
ERRO[0001] Sync error: pipeline step: 0 (ListSource) failed with error: AccessDenied: Access Denied
status code: 403, request id: 91DF1DF0618DA652, host id: B646GOQYS6s9ISVvTvUJqYjWpBmg91j+vrXGvI9nNzDMZEjdsYT7CCsKV1f9nToaLRTSn+UQR54=, terminating
DEBU[0001] Pipeline step: LoadObjData finished
DEBU[0001] Pipeline step: ACLUpdater finished
DEBU[0001] Pipeline step: UploadObj finished
DEBU[0001] Pipeline step: Logger finished
DEBU[0001] Pipeline step: Terminator finished
DEBU[0001] All pipeline steps finished
DEBU[0001] Pipeline terminated
INFO[0001] 0 ListSource: Input: 0; Output: 0 (0 obj/sec); Errors: 1
INFO[0001] 1 LoadObjData: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 2 ACLUpdater: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 3 UploadObj: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 4 Logger: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 5 Terminator: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] Duration: 1.125876588s
ERRO[0001] Sync Failed
Once I start a sync I can't kill it with Ctrl+C; instead I have to use kill -9 to stop it.
root@raspberrypi1:~# s3sync --version
VersionId: 2.30, commit: e73ac455ec3101f6f0e45453559adb24ce8ab2da, built at: 2021-04-20T14:58:16Z
root@raspberrypi1:~# s3sync \
--w 30 \
--tk ****************** \
--ts ****************** \
--tr ****************** \
--te ****************** \
fs:///tmp/ \
s3://test \
--sync-log
INFO[0000] Starting sync
^CWARN[0000] Receive signal: interrupt, terminating
^CWARN[0000] Receive signal: interrupt, terminating
^CWARN[0000] Receive signal: interrupt, terminating
^CWARN[0000] Receive signal: interrupt, terminating
INFO[0000] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/PI size=25550
^CWARN[0000] Receive signal: interrupt, terminating
^CWARN[0000] Receive signal: interrupt, terminating
^CWARN[0000] Receive signal: interrupt, terminating
INFO[0001] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/TRMB size=49357
^CWARN[0001] Receive signal: interrupt, terminating
INFO[0001] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/CMBM size=31311
^CWARN[0001] Receive signal: interrupt, terminating
^CWARN[0001] Receive signal: interrupt, terminating
^CWARN[0001] Receive signal: interrupt, terminating
^CWARN[0001] Receive signal: interrupt, terminating
INFO[0001] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/PLAB size=27209
^CWARN[0001] Receive signal: interrupt, terminating
^CWARN[0001] Receive signal: interrupt, terminating
^CWARN[0001] Receive signal: interrupt, terminating
INFO[0001] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/CDK size=33491
^CWARN[0001] Receive signal: interrupt, terminating
INFO[0001] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/MKSI size=30689
INFO[0001] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/HCAT size=30486
INFO[0001] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/SSYS size=48670
INFO[0001] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/OSPN size=35094
INFO[0001] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/NVEC size=3488
^Z
Hi, I'm facing the issue below with the newest release version:
time="2019-04-02T08:31:45Z" level=info msg="Starting sync\n"
time="2019-04-02T08:31:45Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:46Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:47Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:48Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:49Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:50Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:51Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:53Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:54Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:55Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:56Z" level=fatal msg="Listing objects failed: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008\n"
Here is my command:
./s3sync --tr ap-southeast-1 --tk xxxxxxxxx --ts xxxxxxxxx \
--sr ap-southeast-1 --sk xxxxxxxxx --ss xxxxxxxxx \
-w 128 s3://old-bucket s3://new-bucket-d -r 10
Is there something I did wrong?
It would be cool to have an optional flag that compares the MD5 checksum between locations before syncing; this would avoid a lot of wasted time re-syncing a directory.
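For what it's worth, for single-part uploads the S3 ETag is the hex MD5 of the object body, so the comparison can be sketched as below (multipart objects have composite ETags with a -N suffix and would need separate handling; all names are placeholders):

package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
	"log"
	"os"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	f, err := os.Open("/backup/file.bin") // placeholder local file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	h := md5.New()
	if _, err := io.Copy(h, f); err != nil {
		log.Fatal(err)
	}
	local := hex.EncodeToString(h.Sum(nil))

	svc := s3.New(session.Must(session.NewSession()))
	head, err := svc.HeadObject(&s3.HeadObjectInput{
		Bucket: aws.String("backup"),   // placeholder
		Key:    aws.String("file.bin"), // placeholder
	})
	if err != nil {
		log.Fatal(err)
	}
	remote := strings.Trim(*head.ETag, `"`)

	// Equal hashes => the object can be skipped (single-part uploads only).
	fmt.Println("skip:", local == remote)
}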
It'd be amazing if this could be made into a Go package so that it can be imported into another project!
Hi All,
It's a wonderful project for syncing S3 data from one location to another. I just have a few questions:
Is it possible to get how many bytes of data were transferred?
Is it possible to get at what speed the data was transferred?
Thanks,
Kannan V
Can I also sync the Content-Type when I sync S3 to S3, like aws s3 sync?
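For context, in a server-side copy the Content-Type only changes when the metadata directive says so; a hedged aws-sdk-go sketch (bucket and key names are placeholders):

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	svc := s3.New(session.Must(session.NewSession()))

	// MetadataDirective=REPLACE lets the copy set its own Content-Type;
	// with COPY (the default) the source object's metadata is kept as-is.
	_, err := svc.CopyObject(&s3.CopyObjectInput{
		Bucket:            aws.String("target-bucket"),                // placeholder
		Key:               aws.String("path/file.json"),               // placeholder
		CopySource:        aws.String("source-bucket/path/file.json"), // placeholder
		ContentType:       aws.String("application/json"),
		MetadataDirective: aws.String(s3.MetadataDirectiveReplace),
	})
	if err != nil {
		log.Fatal(err)
	}
}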
Not all the data I'm syncing is mine; I can view most of it, but I occasionally hit a permission denied error. The default for godirwalk seems to be to stop the walk. Can we add an error handler? It might need a bit more thinking than the quick change below.
Jon
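The snippet referenced above didn't survive here, but as a hedged reconstruction of the idea: godirwalk exposes an ErrorCallback that can turn a permission error into a skip instead of halting the walk. Something like this (the walk root is a placeholder):

package main

import (
	"fmt"
	"log"

	"github.com/karrick/godirwalk"
)

func main() {
	err := godirwalk.Walk("/source/dir", &godirwalk.Options{ // placeholder root
		Callback: func(path string, de *godirwalk.Dirent) error {
			fmt.Println(path) // ... feed path into the sync pipeline ...
			return nil
		},
		// Without this, the first unreadable entry halts the whole walk.
		ErrorCallback: func(path string, err error) godirwalk.ErrorAction {
			log.Printf("skipping %s: %v", path, err)
			return godirwalk.SkipNode
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}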
I'm copying files from one bucket to another, and some of my files got broken on the service provider's side, returning an HTTP 500 error instead of the file.
I want to know the file names that s3sync failed to copy, but I can't get them even with debug and logs turned on, using this command:
s3sync --debug --sync-log --sync-progress s3://from/broken_file_name s3://to
I get these results in the console:
INFO[0000] Starting sync
DEBU[0000] Listing bucket finished
DEBU[0000] Pipeline step: ListSource finished
DEBU[0001] Pipeline step: LoadObjData finished
DEBU[0001] Pipeline step: ACLUpdater finished
DEBU[0001] Recv pipeline err: InternalError:
status code: 500, request id: , host id:
ERRO[0001] Sync error: pipeline step: 1 (LoadObjData) failed with error: InternalError:
status code: 500, request id: , host id: , terminating
INFO[0001] 0 ListSource: Input: 0; Output: 1 (1 obj/sec); Errors: 0
INFO[0001] 1 LoadObjData: Input: 1; Output: 0 (0 obj/sec); Errors: 1
INFO[0001] 2 ACLUpdater: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 3 UploadObj: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 4 Logger: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 5 Terminator: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] Duration: 1.21028043s
Is it possible to improve the error messages?
Error syncing two buckets in different accounts:
ERRO[0000] Sync error: pipeline step: 0 (ListSource) failed with error: SignatureDoesNotMatch: The request signature we calculated does not match the signature you provided. Check your key and signing method.
I tried this command in the AWS China region but got a certificate issue:
docker run --rm -ti larrabee/s3sync --tk xxxxxx --ts xxxxxxxxxxxxxx --sr cn-northwest-1 --tr cn-north-1 --sk xxxxx --ss xxxxxxx -w 128 s3://s3-service-dev/folder/ s3://s3-dev-rawdata/csv/cpoc/s3-service-dev/
ERRO[0000] Sync error: pipeline step: 0 (ListSource) failed with error: RequestError: send request failed
caused by: Get https://s3.cn-northwest-1.amazonaws.com.cn/s3-service-dev?encoding-type=url&max-keys=1000&prefix=s3-service-dev%2F: x509: certificate signed by unknown authority, terminating
Does this tool support the China region?
For each of the packages that you have created, i.e., pipeline, collection, and storage: would it be possible for you to improve the documentation as to how each of these could be used? Having comments compatible with godoc would be really nice! I'm looking to build my own CLI around the packages you created, very similar to what you have done in ./cli, except that I'd like to add the ability to push metrics. At this point I'm looking at re-using a lot of your cli code to get this accomplished 😄
The awscli s3 command lets a user put / get objects with spaces in the name. This is unfortunate, since it allows badly named files to be put into S3 simply by quoting the name.
For example:
aws s3 ls 's3://some-bucket/prefix/this file has spaces in the name(1).pdf'
s3sync does not handle this situation well, and exits with errors:
DEBU[0002] Putting obj prefix/this file has spaces in the name(1).pdf failed with err: mkdir /path/to/prefix/this: not a directory
Since we have a collection of arbitrarily named files that come from sources outside our control, we can't guarantee that the files we wish to copy do not contain spaces or other problematic characters in their names. I haven't had an opportunity to test it, but I suspect this problem would occur when putting/getting from local or S3 alike.
level=error msg="Sync error: pipeline step: 3 (UploadObj) failed with error: object: aaa.xml sync error: AccessDenied: Access Denied\n\tstatus code: 403, request id: 57064C841C470CAA, host id: iOGzu8d04mtG5r8U4+IACYtpkFcywDBn0BmRB/mP06mXhunPYsqWux3j9tO0GXD5AFBV2G2PxOg=,
The error occurs in my docker container running the s3sync binary from the command line. I can complete the copy operation with aws s3 sync on the same objects, but s3sync fails. It seems all the permissions are in place...
It seems that the current implementation doesn't do any metadata checking between the source and destination objects? Objects that haven't changed at the source shouldn't have to be transferred to the destination, especially when dealing with large data files. 🤔
I would love to be able to delete files at the destination that do not exist in the source. I need to sync a large number of files but also clean up old files. So far this is the fastest tool I have found; I just wish there was a destructive option.
I am thinking about learning some Go so I can implement it myself (a rough sketch of the idea follows below), but that would take a long time.
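A hedged sketch of the idea: list both sides, then batch-delete destination keys that are missing from the source. Names are placeholders, and real code would stream the listings rather than hold full key sets in memory:

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func listKeys(svc *s3.S3, bucket string) map[string]bool {
	keys := map[string]bool{}
	err := svc.ListObjectsV2Pages(&s3.ListObjectsV2Input{
		Bucket: aws.String(bucket),
	}, func(p *s3.ListObjectsV2Output, lastPage bool) bool {
		for _, o := range p.Contents {
			keys[*o.Key] = true
		}
		return true
	})
	if err != nil {
		log.Fatal(err)
	}
	return keys
}

func main() {
	svc := s3.New(session.Must(session.NewSession()))
	src := listKeys(svc, "source-bucket") // placeholder
	dst := listKeys(svc, "target-bucket") // placeholder

	var doomed []*s3.ObjectIdentifier
	for k := range dst {
		if !src[k] {
			doomed = append(doomed, &s3.ObjectIdentifier{Key: aws.String(k)})
		}
	}
	// DeleteObjects accepts up to 1000 keys per call; chunking elided here.
	if len(doomed) > 0 {
		_, err := svc.DeleteObjects(&s3.DeleteObjectsInput{
			Bucket: aws.String("target-bucket"),
			Delete: &s3.Delete{Objects: doomed},
		})
		if err != nil {
			log.Fatal(err)
		}
	}
}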
Hello!
First, thank you so much for providing s3sync! It fills a particular but significant niche for those of us that can't make use of the awscli "copy" command for whatever reason - either because the selection criteria are too coarse or, as in our case, because we are moving resources between s3 and other technologies, such as ceph.
One request for your releases, can you provide an arm64 release for macOS as well?
Thanks again!
I tried to build it on Windows.
The EXE builds fine, but when trying to sync a local dir (with subdirs), the FS path handling is not able to locate any files on the local file system.
s3sync --tk zz --ts yyyy -w 128 fs://e:/some/path/ s3://bucket-in-s3/path/
The error I get is:
sync error: open e:\\some\\path\\e:\\some\\path\\folder\\somefile.pdf: The filename, directory name, or volume label syntax is incorrect., terminating
See how the path is appended twice...
What am I doing wrong?
hi,
I'm getting strange errors while trying to sync files from a very large bucket (~300 GB):
During the sync process, it seems that s3sync very often creates a phantom file on the local FS with the same path and name as a bucket subdir (?); this file is not present in the bucket.
This behaviour makes s3sync unable to copy the files into the destination directory, because there is already a file named like the dir itself...
Below are the recurring logs that I'm getting for all the files in the /lots_img/ subfolder.
WARN[7141] Failed to sync object: /lots_img/7/501/810.jpg, error: mkdir /home/media/lots_img/7: not a directory, skipping
DEBU[7141] Recv pipeline err: object: /lots_img/7/503/146.jpg sync error: mkdir /home/media/lots_img/7: not a directory
...
This is the command I used to sync from the S3 bucket:
./s3sync -p -d -f skip --sk <key> --ss <secret> --sr <region> -w 128 s3://<bucket> fs:///<folder>
thank you in advance,
Ken
Is it possible to use this tool with S3-compatible storage such as seeweb.it, or only with AWS S3?
To get s3sync to work/compile on OpenBSD, you have to replace syscall.ENODATA with syscall.ENOENT in two places in storage/storage-fs.go.
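A hedged sketch of how that could be made portable with build tags instead of hand-editing: pull the errno into a per-OS constant (the file names and the errNoXattr identifier are hypothetical, not from the repo):

// errno_default.go (hypothetical file)
//go:build !openbsd

package storage

import "syscall"

// errNoXattr is the "no such attribute" errno used by the FS storage code.
const errNoXattr = syscall.ENODATA

// errno_openbsd.go (hypothetical file)
//go:build openbsd

package storage

import "syscall"

// OpenBSD has no ENODATA; ENOENT is the closest equivalent here.
const errNoXattr = syscall.ENOENT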
Impressed by the speed!
s3cmd in Python took me ~45 min for 1.2 GB, while s3sync handled 2.4 GB in ~6 min (1 CPU & 16 workers)!!
Hi, I'm just attempting to use this tool, as it's the only one I've found with an option like --filter-after-mtime, but I'm not having much success:
s3sync --sk XXXXXXXXXXXXXXXXXXXXXX -ss YYYYYYYYYYYYY --sr eu-west-1 -w 4 s3://my-bucket/directory/ fs:///tmp/s3sync/
INFO[0000] Starting sync
ERRO[0000] Sync error: pipeline step: 1 (LoadObjData) failed with error: object: sync error: NoSuchKey: The specified key does not exist.
status code: 404, request id: 46C3367B66F3E7F0, host id: 1buVDVaLEkZYJ79Kfz84J5zlRQZdgCA0bhgxQh7luygIWGne/YfQfLMZ0flJzfdrpYcgaP+AcL8=, terminating
INFO[0000] 0 ListSource: Input: 0; Output: 1000 (2109 obj/sec); Errors: 0
INFO[0000] 1 LoadObjData: Input: 1000; Output: 1 (2 obj/sec); Errors: 1
INFO[0000] 2 ACLUpdater: Input: 1; Output: 1 (2 obj/sec); Errors: 0
INFO[0000] 3 UploadObj: Input: 1; Output: 1 (2 obj/sec); Errors: 0
INFO[0000] 4 Terminator: Input: 1; Output: 0 (0 obj/sec); Errors: 0
INFO[0000] Duration: 474.239263ms
ERRO[0000] Sync Failed
One object from the source "directory" is copied successfully, but no more.
Hey there,
I'm trying to migrate a cronjob from my OSX laptop to my Pi, but am running into the classic cannot execute binary file: Exec format error from which the Pi suffers from time to time.
I tried to build from source (and tried to use docker), but am getting errors with my Go installation. Those errors aren't tied to s3sync, but I can't compile Go on my Raspberry Pi, with all signs pointing to a reinstall of Ubuntu, which I'd strongly like to avoid.
My Raspberry Pi is the 3B+ with aarch64.
Thanks!
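As a possible workaround (standard Go toolchain behavior, nothing s3sync-specific): Go cross-compiles without any extra toolchain, so the binary can be built on the laptop and copied over. Assuming a 64-bit OS on the Pi:

GOOS=linux GOARCH=arm64 go build -o s3sync-arm64 .
# copy the binary to the Pi (host name is a placeholder)
scp s3sync-arm64 pi@raspberrypi:~/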
I get the following output/error on certain filenames which contain special characters:
INFO[0000] Starting sync
ERRO[0027] Sync error: pipeline step: 2 (UploadObj) failed with error: object: B�IBHIN GALLAGHER_Certificate_of_Achievement_653192.pdf sync error: InvalidURI: Couldn't parse the specified URI.
Expected (from checksums.txt):
c6af6cf9d38b1b1ce478ade6b4ea13bb1a6f7663e4a4acb7ed287521ea856b62  s3sync_1.14_Linux_x86_64.tar.gz
Actual:
3b6fc2028088844664a2c91c9f79569ec6d765fc4658828a9bdcb33e93d2bac7  s3sync_1.14_Linux_x86_64.tar.gz