larrabee / s3sync
Really fast sync tool for S3
License: GNU General Public License v3.0
Using s3sync to copy objects to a new Ceph cluster. The copied objects are empty: I can GET them and download them locally, but they contain no data.
I have tried using Wasabi and another Ceph cluster as the source.
Thoughts?
@larrabee, would it be possible to add a point-in-time option to your utility?
We have searched high and low for a good S3 sync tool, but none do what we need.
It seems that this tool is missing an equivalent of s3cmd's --no-check-certificate flag, and it would be nice to have.
I'm authenticating with AWS via Okta (SAML2). This works fine with the stock aws client, but I had to tinker on a fork to get it working for s3sync. The only issue is that right now s3sync always provides a blank third argument for the AWS credentials:
JonPeel@574204e?diff=unified#diff-44bbcc9d983da65f32aa64529eb190e2R45
The commit history there is not clean and is buggy; I'm just using it to highlight where I needed changes. I could clean it up for a PR. Would you want a command line argument (which is the route I went) or some other means of specifying this?
Jon
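For context: in aws-sdk-go, the third argument to credentials.NewStaticCredentials is the session token, which is exactly what an STS/SAML flow needs to pass through. A minimal sketch of the SDK call (all credential values, the region, and the wiring are placeholders, not s3sync's actual code):

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// NewStaticCredentials takes (key, secret, token); passing "" as the
	// token is what breaks STS/SAML-issued temporary credentials.
	creds := credentials.NewStaticCredentials(
		"AKIA...",         // access key id (placeholder)
		"secret...",       // secret access key (placeholder)
		"FQoGZXIvYXdz...", // session token from the SAML/STS exchange (placeholder)
	)
	sess := session.Must(session.NewSession(&aws.Config{
		Region:      aws.String("us-east-1"), // placeholder region
		Credentials: creds,
	}))
	svc := s3.New(sess)
	out, err := svc.ListBuckets(&s3.ListBucketsInput{})
	fmt.Println(out, err)
}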
I'm syncing a folder that has about 500k objects. A few of these objects are about 1GB each. As soon as s3sync gets to them, it bombs with OOM. The sync ran with 30 workers on an instance with 4GB RAM and no swap.
It would help if the sync streamed file contents instead of reading them wholesale into memory, at least for big files.
thanks.
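For what it's worth, streaming can be sketched with aws-sdk-go's s3manager, which uploads from any io.Reader in fixed-size parts instead of buffering whole objects. This is an illustration of the requested behavior under assumed bucket/key names, not s3sync's actual pipeline:

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess := session.Must(session.NewSession())
	src := s3.New(sess)

	// Stream the source object instead of reading it fully into memory.
	obj, err := src.GetObject(&s3.GetObjectInput{
		Bucket: aws.String("source-bucket"), // placeholder
		Key:    aws.String("big/file.bin"),  // placeholder
	})
	if err != nil {
		log.Fatal(err)
	}
	defer obj.Body.Close()

	// The uploader reads obj.Body in PartSize chunks, so peak memory per
	// transfer stays roughly at PartSize * Concurrency, not object size.
	up := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = 5 * 1024 * 1024
		u.Concurrency = 2
	})
	_, err = up.Upload(&s3manager.UploadInput{
		Bucket: aws.String("target-bucket"), // placeholder
		Key:    aws.String("big/file.bin"),
		Body:   obj.Body,
	})
	if err != nil {
		log.Fatal(err)
	}
}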
EDIT for clarity:
I'm able to use the aws CLI and keys like this:
aws --endpoint-url https://s3.amazonaws.com s3 sync s3://bucket/key s3://bucket/other_key
But when I try something similar here
s3sync --tk KEY --ts SECRET --tr us-west-1 --sk KEY --ss SECRET --sr us-west-1 --se https://us-west-1.s3.amazonaws.com --te https://us-west-1.s3.amazonaws.com s3://BUCKET/key/ s3://BUCKET/OTHER_KEY/ -d
I get an error saying the authorization header is malformed:
Synced: 0; Skipped: 0; Failed: 0; Total processed: 0onHeaderMalformed: The authorization header is malformed; the region 'us-west-1' is wrong; expecting 'us-east-1'
I'm 1000% sure my bucket is in us-west-1. The command works with aws cli. The keys work. The S3 bucket is in us-west-1. Any idea what the issue might be?
/nfs/DS12BayD/Imgsrv/@juno/1509/150924XX/1129US00/1129US07/CMP/1129US07_Tag.csv
I am not able to sync anything with special characters in the path, such as the file above.
Syncing from S3 to Ceph using the following command. The result does not print any meaningful debug or sync log. Note that the DEBUG output only shows up after letting the command run for a little while before SIGINT. This has been tried without the progress bar as well. Tested on both Linux and Mac with identical results:
$ sw_vers
ProductName: Mac OS X
ProductVersion: 10.15.5
BuildVersion: 19F101
$ cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="8.1 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.1"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.1 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.1:GA"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.1
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.1"
s3sync --sync-log --debug -p --filter-modified --tk REDACTED --ts REDACTED --sk REDACTED --ss REDACTED --te 'http://ceph.internal:8080' -w 12 s3://folder/daydate=2020-06-13/ s3://folder/daydate=2020-06-13/
INFO[0000] Starting sync
WARN[0017] Receive signal: interrupt, terminating
DEBU[0019] Pipeline step: ListSource finished
DEBU[0022] Pipeline step: FilterObjectsModified finished
DEBU[0022] Pipeline step: LoadObjData finished
DEBU[0022] Pipeline step: UploadObj finished
DEBU[0022] Pipeline step: Logger finished
DEBU[0022] Pipeline step: Terminator finished
DEBU[0022] All pipeline steps finished
DEBU[0022] Pipeline terminated
INFO[0022] 0 ListSource: Input: 0; Output: 7000 (308 obj/sec); Errors: 0
INFO[0022] 1 FilterObjectsModified: Input: 7000; Output: 0 (0 obj/sec); Errors: 0
INFO[0022] 2 LoadObjData: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0022] 3 UploadObj: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0022] 4 Logger: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0022] 5 Terminator: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0022] Duration: 22.756570531s
WARN[0022] Sync Aborted
The Ceph version is Mimic, which does not support ListObjectsV2 (this has been a constant source of frustration). There may be other APIs that are not fully or properly supported that might be causing issues.
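For reference, Mimic-era RGW does support the original v1 listing call, so a fallback would look roughly like this in aws-sdk-go (the endpoint and prefix are taken from the report above; the bucket name and the rest of the setup are assumptions):

package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{
		Endpoint:         aws.String("http://ceph.internal:8080"), // from the report above
		S3ForcePathStyle: aws.Bool(true),
	}))
	svc := s3.New(sess)

	// ListObjectsPages uses the v1 ListObjects API, which Mimic-era RGW
	// implements, instead of ListObjectsV2.
	err := svc.ListObjectsPages(&s3.ListObjectsInput{
		Bucket: aws.String("folder"), // placeholder bucket
		Prefix: aws.String("daydate=2020-06-13/"),
	}, func(page *s3.ListObjectsOutput, lastPage bool) bool {
		for _, obj := range page.Contents {
			fmt.Println(*obj.Key)
		}
		return true // keep paging
	})
	if err != nil {
		log.Fatal(err)
	}
}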
s3sync works for most S3 files, but for a few it fails with NoSuchKey: The specified key does not exist. Is there anything to fix in the S3 permissions, or am I missing an option with s3sync?
error log:
ERRO[0000] Sync error: pipeline step: 1 (LoadObjData) failed with error: object: content/directpath/health-check/abd/details/js.properties sync error: NoSuchKey: The specified key does not exist.
status code: 404, request id: xxxxxxxxxxxxxx, host id: v5dwefwefwefmkwmfkwa98y79790978ZKpjUNCAkKhE+8697fqbdqtdgbxbx0=, terminating
Built the program using go build.
The requirement was to sync around 7 GB of data from one S3 bucket to another.
I started the sync using the command below:
./s3sync --sk "<source_key>" --ss "<source_secret>" --st "<source_token>" --tk "<target_key>" --ts "<target_secret>" --tt "<target_token>" -w 128 s3://<source_bucket_name>/ s3://<target_bucket_name>/
The sync ran for a while and then stopped with the error message:
"tcp connection closed"
Hey,
I have experienced a memory leak. This is my scenario:
I am syncing 20 TB of data from one bucket to another.
The server running the command has 36 cores and 72 GB RAM. I am running with the following parameters:
./s3sync --debug --sync-log --sync-progress --sk $SK --ss $SS --sr $SR --tk $TK --ts $TS --tr $TR -w 768 s3://SOURCEBUCKET/FOLDERACCESS/ s3://TARGETBUCKET
This command runs for a few minutes, then the RAM usage grows rapidly until the command is killed. I could use supervisor to respawn the task, but I would like to know at what point the binary does garbage collection.
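As general Go background (not s3sync-specific): the runtime collects whenever the heap grows GOGC percent (default 100) beyond the live set of the previous cycle, so with 768 workers buffering object data the peaks between cycles can be huge. A hedged sketch of making collection more aggressive:

package main

import "runtime/debug"

func main() {
	// Equivalent to running with GOGC=30: trigger a collection once the
	// heap grows 30% past the previous live set, trading CPU for peak RSS.
	debug.SetGCPercent(30)

	// ... start the sync pipeline here ...
}

The same effect is available without a rebuild by exporting GOGC=30 before starting the binary.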
I just played around with s3sync and noticed that it doesn't run standalone on a server without the aws CLI installed on the machine. It looks like it ignores the --sk and --ss options.
When running s3sync without the aws CLI installed, this results in an error:
DEBU[0002] Putting obj <some_file> failed with err: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
How to reproduce:
./s3sync --sk --ss --sr us-west-2 --tr eu-central-1 -w 128 s3://source-bucket s3://target-bucket -d -r 10
It would be great if there were an option to not sync files that exist at the destination and are newer than the source file. I think this is even the default behavior of aws s3 sync. This would really help with incremental syncing. --filter-modified seems to actually do double work, instead of saving work, in the case of incremental syncing.
Hello!
Very nice project, but I wonder, how can I specify the storage class?
In other words, I have something like this for s3cmd:
s3cmd --storage-class=COLD --delete-removed --acl-private --guess-mime-type --skip-existing sync /backup/ s3://backup/
Is it possible to reproduce the same action with s3sync?
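For comparison, in aws-sdk-go the storage class is just a field on the upload request, so an s3sync option could map onto something like this (bucket, key, and file path are placeholders):

package main

import (
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := s3.New(sess)

	f, err := os.Open("/backup/dump.tar.gz") // placeholder file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// StorageClass is set per PutObject request; valid values include
	// STANDARD, STANDARD_IA, ONEZONE_IA, GLACIER, DEEP_ARCHIVE.
	_, err = svc.PutObject(&s3.PutObjectInput{
		Bucket:       aws.String("backup"),      // placeholder
		Key:          aws.String("dump.tar.gz"), // placeholder
		Body:         f,
		StorageClass: aws.String(s3.StorageClassStandardIa),
	})
	if err != nil {
		log.Fatal(err)
	}
}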
Unless I'm missing something, it appears that the AWS credential chain is not implemented correctly. For our purposes, it is essential that we can grant rights using an EC2 Instance Role. This is correctly configured, as I can use aws s3 ls to show my buckets/objects without ~/.aws existing at all.
Unfortunately, this doesn't work with s3sync. Instead we get the following error:
INFO[0000] Starting sync
DEBU[0000] S3 listing failed with error: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Synced: 0; Skipped: 0; Failed: 0; Total processed: 0
Avg syncing speed: 0 obj/sec; Avg listing speed: 0 obj/sec
FATA[0003] Listing objects failed: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
The official AWS Go SDK documentation confirms that this should work out of the box, as long as the application implements the credential chain correctly: https://github.com/aws/aws-sdk-go#configuring-credentials
It appears that s3sync does not specify EC2 Instance Roles as a credential provider:
Lines 39 to 43 in 418ed11
It looks like this should include:
Example of ChainProvider to be used with an EnvProvider and EC2RoleProvider. In this example EnvProvider will first check if any credentials are available via the environment variables. If there are none ChainProvider will check the next Provider in the list, EC2RoleProvider in this case. If EC2RoleProvider does not return any credentials ChainProvider will return the error ErrNoValidProvidersFoundInChain
// Imports assumed: "github.com/aws/aws-sdk-go/aws", ".../aws/credentials",
// ".../aws/credentials/ec2rolecreds", ".../aws/ec2metadata",
// ".../aws/session", ".../service/ec2"; sess is an earlier *session.Session.
creds := credentials.NewChainCredentials(
	[]credentials.Provider{
		// Environment variables are checked first...
		&credentials.EnvProvider{},
		// ...then the EC2 instance role via the metadata service.
		&ec2rolecreds.EC2RoleProvider{
			Client: ec2metadata.New(sess),
		},
	})
// Usage of ChainCredentials with aws.Config
svc := ec2.New(session.Must(session.NewSession(&aws.Config{
	Credentials: creds,
})))
from: https://docs.aws.amazon.com/sdk-for-go/api/aws/credentials/#ChainProvider
Hi,
Thanks for sharing this tool. It looks great!
One of my colleagues used s3sync. What he observed is that it is very fast! But it takes an equal amount of time to resume if the copy process is somehow canceled. For example, it takes half an hour to copy 900,000 out of 1 million objects; when the copy is canceled, s3sync takes another half an hour to compare the 900,000 objects before copying the 100,000 that are left.
Would you mind if I ask whether we are missing something here?
Thanks for your time.
I am using version 2.25. When I run the sync with these parameters
./s3sync -d --sync-log --sk **** --ss *** --se hostaddr -w 20 s3://test fs://./data/
it works for a while, but after some downloads I get this error:
DEBU[0050] Recv pipeline err: object: )gFE5OB2/11.xo1dBK1OamWlMJsb.rnd sync error: unexpected EOF
ERRO[0050] Sync error: pipeline step: 1 (LoadObjData) failed with error: object: )gFE5OB2/11.xo1dBK1OamWlMJsb.rnd sync error: unexpected EOF, terminating
The same happens on version 2.30:
INFO[0901] Sync file Content-Type=application/octet-stream key=9zabR6AR/17.yXJzkzaeeS9gDKpn.rnd size=10485760
DEBU[0903] Recv pipeline err: object: 9Agq8lXt/82.Lit0v8qT2Ptof0H).rnd sync error: unexpected EOF
ERRO[0903] Sync error: pipeline step: 1 (LoadObjData) failed with error: object: 9Agq8lXt/82.Lit0v8qT2Ptof0H).rnd sync error: unexpected EOF, terminating
It is also not good that there are no details about the error (debug log enabled, retry enabled, sync-log enabled).
Thanks in advance
INFO[0000] Starting sync
ERRO[0500] Sync error: pipeline step: 1 (LoadObjData) failed with error: object: bucket_name/xyz123.tif sync error: read tcp 172.31.31.193:40284->52.216.139.13:443: read: connection reset by peer, terminating
INFO[0500] 0 ListSource: Input: 0; Output: 5000 (10 obj/sec); Errors: 0
INFO[0500] 1 LoadObjData: Input: 5000; Output: 3692 (7 obj/sec); Errors: 1
INFO[0500] 2 ACLUpdater: Input: 3692; Output: 3692 (7 obj/sec); Errors: 0
INFO[0500] 3 UploadObj: Input: 3692; Output: 3657 (7 obj/sec); Errors: 0
INFO[0500] 4 Terminator: Input: 3657; Output: 0 (0 obj/sec); Errors: 0
INFO[0500] Duration: 8m20.697078919s
ERRO[0500] Sync Failed
Latest version:
$ ./s3sync --help
Really fast sync tool for S3
Version: 1.14, commit: 418ed11, built at: 2019-04-18T14:46:19Z
Command line used:
./s3sync -sk xxx -ss yyy -sr us-west-1 -tk aaa -ts bbb -tr us-west-2 -w 256 -d s3://foo/ s3://bar/
Tool does nothing:
INFO[0000] Starting sync
Synced: 0; Skipped: 0; Failed: 0; Total processed: 0
Avg syncing speed: 0 obj/sec; Avg listing speed: 0 obj/sec
INFO[0001] Sync finished successfully
INFO[0001] Synced: 0; Skipped: 0; Failed: 0; Total processed: 0
INFO[0001] Avg syncing speed: 0 obj/sec; Avg listing speed: 0 obj/sec; Duration: 1 sec
Hey, I am trying to configure an AWS sync between 2 buckets in different regions, and this tool only validates the source but ignores the target.
I am running the command in a bash script.
This is what I run:
export TK=TARGETKEY
export TS=TARGETSECRET
export TR=TARGETREGION
export SK=SOURCEKEY
export SS=SOURCESECRET
export SR=SOURCEREGION
./s3sync_bck1 --debug --sync-log --sync-progress --sk $SK --ss $SS --sr $SR --tk $TK --ts $TS --tr $TR -w 128 s3://SOURCEBUCKET s3://TARGETBUCKET
When I run this I get:
INFO[0000] Starting sync
DEBU[0001] Listing bucket finished
DEBU[0001] Pipeline step: ListSource finished
DEBU[0001] Pipeline step: LoadObjData finished
DEBU[0001] Pipeline step: UploadObj finished
DEBU[0001] Pipeline step: ACLUpdater finished
DEBU[0001] Pipeline step: Logger finished
DEBU[0001] Pipeline step: Terminator finished
DEBU[0001] All pipeline steps finished
DEBU[0001] Pipeline terminated
INFO[0001] 0 ListSource: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 1 LoadObjData: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 2 ACLUpdater: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 3 UploadObj: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 4 Logger: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 5 Terminator: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] Duration: 1.870510625s
INFO[0001] Sync Done
My bucket has not been synced, and if I change the keys for my destination, they have no effect. A correct key and an incorrect key give the same result.
Why is s3sync not working, bearing in mind this is a straightforward setup? Is there anything I am missing that I should know about?
I'm trying to get s3sync to upload non-existing files at a reasonable speed.
s3sync --tk "$AWS_ACCESS_KEY_ID" --ts "$AWS_SECRET_ACCESS_KEY" --tt "$AWS_SESSION_TOKEN" --tr "us-west-2" --filter-not-exist -w 128 -p /source/Small/ s3://bucket/Small/
Adding the --filter-not-exist option causes the objects processed to drop from ~150-200 objects/sec to just 5 objects/sec. I get that you now need to do an object-exists check against S3. Increasing the -w does seem to help. Is there any way to speed things up?
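For context, an existence filter generally costs one HEAD request per object, and the usual way to claw back throughput is to run those checks with high concurrency. A rough sketch of the idea (bucket and keys are placeholders, independent of s3sync's internals):

package main

import (
	"fmt"
	"sync"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := s3.New(sess)
	keys := []string{"Small/a.jpg", "Small/b.jpg"} // placeholders

	sem := make(chan struct{}, 64) // allow 64 HEAD requests in flight
	var wg sync.WaitGroup
	for _, k := range keys {
		wg.Add(1)
		go func(key string) {
			defer wg.Done()
			sem <- struct{}{}
			defer func() { <-sem }()
			// HeadObject returns a 404/NotFound error for missing keys,
			// so "err != nil" roughly means "does not exist yet".
			_, err := svc.HeadObject(&s3.HeadObjectInput{
				Bucket: aws.String("bucket"), // placeholder
				Key:    aws.String(key),
			})
			if err != nil {
				fmt.Println("needs upload:", key)
			}
		}(k)
	}
	wg.Wait()
}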
Hi,
s3sync is super impressive to me regarding its speed and active maintenance; thanks for open-sourcing this tool.
Line 138 in 44119b3
According to this line, I'm thinking of the following use case: when syncing starts, the target 100MB file on the FS is truncated to 0 bytes at the beginning. Is there any chance the following cases happen after syncing?
1. io.Copy() fails for some reason, and the target file remains at 0 bytes.
2. A reader opens the file before io.Copy() finishes its job and unexpectedly gets only partial content, e.g. 50MB of 101MB.
Thanks.
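One common way to avoid both hazards is to download into a temporary file and rename it into place only on success, since rename is atomic on POSIX filesystems. A sketch of that pattern (an illustration of the fix, not what s3sync currently does):

package main

import (
	"io"
	"log"
	"os"
	"path/filepath"
)

// writeAtomic streams src into path via a temp file in the same directory,
// so readers either see the old complete file or the new complete file.
func writeAtomic(path string, src io.Reader) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".s3sync-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // no-op after a successful rename

	if _, err := io.Copy(tmp, src); err != nil {
		tmp.Close()
		return err // the target file is untouched on failure
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

func main() {
	if err := writeAtomic("/tmp/out.bin", os.Stdin); err != nil {
		log.Fatal(err)
	}
}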
Thanks for creating this great tool!
I attempted to use s3sync with a setup that requires an AssumeRole to access the target bucket, but apparently this is not supported (yet)?
More specifically, the setup looks like this:
Account 12345: source-bucket
Account 67890: destination-bucket
{
"Sid": "DelegateS3Access",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::DESTINATION_ACCOUNT_ID:root"
},
"Action": [
"s3:ListBucket",
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::destination-bucket/*",
"arn:aws:s3:::destination-bucket"
]
}
to allow the destination account to list and pull the files.
export AWS_PROFILE=my.account.test
aws s3 sync s3://source-bucket s3://destination-bucket
and it was syncing the objects quite happily ... just not as fast as I'd like to ;-)
My question, or potential feature request, is to be able to use s3sync in this setup.
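For reference, aws-sdk-go makes cross-account access fairly mechanical via the stscreds package, so target-side AssumeRole support could look roughly like this (the role ARN and bucket are placeholders):

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials/stscreds"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// Base session from the source account's profile/environment.
	sess := session.Must(session.NewSession())

	// Temporary credentials for the destination account, refreshed
	// automatically before they expire.
	creds := stscreds.NewCredentials(sess,
		"arn:aws:iam::67890:role/s3sync-target") // placeholder role ARN

	target := s3.New(sess, &aws.Config{Credentials: creds})
	out, err := target.ListObjectsV2(&s3.ListObjectsV2Input{
		Bucket: aws.String("destination-bucket"), // placeholder
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("listed %d objects", len(out.Contents))
}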
Hi,
I have a case where I cannot access the PARENT BUCKET, but I can list the CHILD OBJECTS, i.e.:
[root@localhost ~]# s3cmd ls s3:///DIRECTORY
ERROR: Access to bucket '' was denied
ERROR: S3 error: 403 (AccessDenied): Access Denied
If I put a trailing slash, I have access to the CHILD OBJECTS in the bucket:
[root@localhost ~]# s3cmd ls s3:///DIRECTORY/
DIR s3:///DIRECTORY/CONTENT1/
DIR s3:///DIRECTORY/CONTENT2/
DIR s3:///DIRECTORY/CONTENT3/
DIR s3:///DIRECTORY/CONTENT4/
DIR s3:///DIRECTORY/CONTENT5/
Given this scenario, when I try to run s3sync I get the error below, since s3:///DIRECTORY/ is my source bucket. The trailing slash is very important.
This is what I get when I run s3sync:
DEBU[0001] S3 listing failed with error: AccessDenied: Access Denied
status code: 403, request id: 91DF1DF0618DA652, host id: B646GOQYS6s9ISVvTvUJqYjWpBmg91j+vrXGvI9nNzDMZEjdsYT7CCsKV1f9nToaLRTSn+UQR54=
DEBU[0001] Pipeline step: ListSource finished
DEBU[0001] Recv pipeline err: AccessDenied: Access Denied
status code: 403, request id: 91DF1DF0618DA652, host id: B646GOQYS6s9ISVvTvUJqYjWpBmg91j+vrXGvI9nNzDMZEjdsYT7CCsKV1f9nToaLRTSn+UQR54=
ERRO[0001] Sync error: pipeline step: 0 (ListSource) failed with error: AccessDenied: Access Denied
status code: 403, request id: 91DF1DF0618DA652, host id: B646GOQYS6s9ISVvTvUJqYjWpBmg91j+vrXGvI9nNzDMZEjdsYT7CCsKV1f9nToaLRTSn+UQR54=, terminating
DEBU[0001] Pipeline step: LoadObjData finished
DEBU[0001] Pipeline step: ACLUpdater finished
DEBU[0001] Pipeline step: UploadObj finished
DEBU[0001] Pipeline step: Logger finished
DEBU[0001] Pipeline step: Terminator finished
DEBU[0001] All pipeline steps finished
DEBU[0001] Pipeline terminated
INFO[0001] 0 ListSource: Input: 0; Output: 0 (0 obj/sec); Errors: 1
INFO[0001] 1 LoadObjData: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 2 ACLUpdater: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 3 UploadObj: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 4 Logger: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 5 Terminator: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] Duration: 1.125876588s
ERRO[0001] Sync Failed
Once I start a sync I can't kill it with Ctrl+C; instead I have to use kill -9 to stop it.
root@raspberrypi1:~# s3sync --version
VersionId: 2.30, commit: e73ac455ec3101f6f0e45453559adb24ce8ab2da, built at: 2021-04-20T14:58:16Z
root@raspberrypi1:~# s3sync \
--w 30 \
--tk ****************** \
--ts ****************** \
--tr ****************** \
--te ****************** \
fs:///tmp/ \
s3://test \
--sync-log
INFO[0000] Starting sync
^CWARN[0000] Receive signal: interrupt, terminating
^CWARN[0000] Receive signal: interrupt, terminating
^CWARN[0000] Receive signal: interrupt, terminating
^CWARN[0000] Receive signal: interrupt, terminating
INFO[0000] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/PI size=25550
^CWARN[0000] Receive signal: interrupt, terminating
^CWARN[0000] Receive signal: interrupt, terminating
^CWARN[0000] Receive signal: interrupt, terminating
INFO[0001] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/TRMB size=49357
^CWARN[0001] Receive signal: interrupt, terminating
INFO[0001] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/CMBM size=31311
^CWARN[0001] Receive signal: interrupt, terminating
^CWARN[0001] Receive signal: interrupt, terminating
^CWARN[0001] Receive signal: interrupt, terminating
^CWARN[0001] Receive signal: interrupt, terminating
INFO[0001] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/PLAB size=27209
^CWARN[0001] Receive signal: interrupt, terminating
^CWARN[0001] Receive signal: interrupt, terminating
^CWARN[0001] Receive signal: interrupt, terminating
INFO[0001] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/CDK size=33491
^CWARN[0001] Receive signal: interrupt, terminating
INFO[0001] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/MKSI size=30689
INFO[0001] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/HCAT size=30486
INFO[0001] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/SSYS size=48670
INFO[0001] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/OSPN size=35094
INFO[0001] Sync file Content-Type= key=TIME_SERIES_INTRADAY_V2/1min/2021-02-04/NVEC size=3488
^Z
Hi, I'm facing the issue below with the newest release version:
time="2019-04-02T08:31:45Z" level=info msg="Starting sync\n"
time="2019-04-02T08:31:45Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:46Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:47Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:48Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:49Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:50Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:51Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:53Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:54Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:55Z" level=debug msg="S3 listing failed with error: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008"
time="2019-04-02T08:31:56Z" level=fatal msg="Listing objects failed: SerializationError: failed to decode REST XML response\ncaused by: XML syntax error on line 2: illegal character code U+0008\n"
Here is my command:
./s3sync --tr ap-southeast-1 --tk xxxxxxxxx --ts xxxxxxxxx \
--sr ap-southeast-1 --sk xxxxxxxxx --ss xxxxxxxxx \
-w 128 s3://old-bucket s3://new-bucket-d -r 10
Is there something I did wrong?
It would be cool to have an optional flag that compares the MD5 checksum between locations before syncing; this would avoid a lot of wasted time re-syncing a directory.
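For what it's worth, for single-part uploads the S3 ETag is the hex MD5 of the object body, so the comparison can be sketched as below (multipart objects have composite ETags with a -N suffix and would need separate handling; all names are placeholders):

package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
	"log"
	"os"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	f, err := os.Open("/backup/file.bin") // placeholder local file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	h := md5.New()
	if _, err := io.Copy(h, f); err != nil {
		log.Fatal(err)
	}
	local := hex.EncodeToString(h.Sum(nil))

	svc := s3.New(session.Must(session.NewSession()))
	head, err := svc.HeadObject(&s3.HeadObjectInput{
		Bucket: aws.String("backup"),   // placeholder
		Key:    aws.String("file.bin"), // placeholder
	})
	if err != nil {
		log.Fatal(err)
	}
	remote := strings.Trim(*head.ETag, `"`)

	// Equal hashes => the object can be skipped (single-part uploads only).
	fmt.Println("skip:", local == remote)
}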
It'd be amazing if this could be made into a Go package so that it can be imported into another project!
Hi All,
It's a wonderful project for syncing S3 data from one location to another. I just have a few questions:
Is it possible to get how many bytes of data were transferred?
Is it possible to get at what speed the data was transferred?
Thanks,
Kannan V
Can I also sync the Content-Type when I sync S3 to S3, like aws s3 sync?
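For context, in a server-side copy the Content-Type only changes when the metadata directive says so; a hedged aws-sdk-go sketch (bucket and key names are placeholders):

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	svc := s3.New(session.Must(session.NewSession()))

	// MetadataDirective=REPLACE lets the copy set its own Content-Type;
	// with COPY (the default) the source object's metadata is kept as-is.
	_, err := svc.CopyObject(&s3.CopyObjectInput{
		Bucket:            aws.String("target-bucket"),                // placeholder
		Key:               aws.String("path/file.json"),               // placeholder
		CopySource:        aws.String("source-bucket/path/file.json"), // placeholder
		ContentType:       aws.String("application/json"),
		MetadataDirective: aws.String(s3.MetadataDirectiveReplace),
	})
	if err != nil {
		log.Fatal(err)
	}
}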
Not all the data I'm syncing is mine; I can view most of it, but I occasionally hit a permission denied error. The default for godirwalk seems to be to stop the walk. Can we add an error handler? It might need a bit more thinking than the quick change below.
Jon
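The snippet referenced above didn't survive here, but as a hedged reconstruction of the idea: godirwalk exposes an ErrorCallback that can turn a permission error into a skip instead of halting the walk. Something like this (the walk root is a placeholder):

package main

import (
	"fmt"
	"log"

	"github.com/karrick/godirwalk"
)

func main() {
	err := godirwalk.Walk("/source/dir", &godirwalk.Options{ // placeholder root
		Callback: func(path string, de *godirwalk.Dirent) error {
			fmt.Println(path) // ... feed path into the sync pipeline ...
			return nil
		},
		// Without this, the first unreadable entry halts the whole walk.
		ErrorCallback: func(path string, err error) godirwalk.ErrorAction {
			log.Printf("skipping %s: %v", path, err)
			return godirwalk.SkipNode
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}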
I'm copying files from one bucket to another, and some of my files got broken on the service provider's side, returning an HTTP 500 error instead of the file.
I want to know the file names that s3sync failed to copy, but I can't get them even with debug and logs turned on, using this command:
s3sync --debug --sync-log --sync-progress s3://from/broken_file_name s3://to
I get these results in the console:
INFO[0000] Starting sync
DEBU[0000] Listing bucket finished
DEBU[0000] Pipeline step: ListSource finished
DEBU[0001] Pipeline step: LoadObjData finished
DEBU[0001] Pipeline step: ACLUpdater finished
DEBU[0001] Recv pipeline err: InternalError:
status code: 500, request id: , host id:
ERRO[0001] Sync error: pipeline step: 1 (LoadObjData) failed with error: InternalError:
status code: 500, request id: , host id: , terminating
INFO[0001] 0 ListSource: Input: 0; Output: 1 (1 obj/sec); Errors: 0
INFO[0001] 1 LoadObjData: Input: 1; Output: 0 (0 obj/sec); Errors: 1
INFO[0001] 2 ACLUpdater: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 3 UploadObj: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 4 Logger: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] 5 Terminator: Input: 0; Output: 0 (0 obj/sec); Errors: 0
INFO[0001] Duration: 1.21028043s
Is it possible to improve the error messages?
Error syncing two buckets in different accounts:
ERRO[0000] Sync error: pipeline step: 0 (ListSource) failed with error: SignatureDoesNotMatch: The request signature we calculated does not match the signature you provided. Check your key and signing method.
I tried this command in the AWS China region but got a certificate issue:
docker run --rm -ti larrabee/s3sync --tk xxxxxx --ts xxxxxxxxxxxxxx --sr cn-northwest-1 --tr cn-north-1 --sk xxxxx --ss xxxxxxx -w 128 s3://s3-service-dev/folder/ s3://s3-dev-rawdata/csv/cpoc/s3-service-dev/
ERRO[0000] Sync error: pipeline step: 0 (ListSource) failed with error: RequestError: send request failed
caused by: Get https://s3.cn-northwest-1.amazonaws.com.cn/s3-service-dev?encoding-type=url&max-keys=1000&prefix=s3-service-dev%2F: x509: certificate signed by unknown authority, terminating
Does this tool support the China region?
For each of the packages that you have created, i.e., pipeline, collection, and storage: would it be possible for you to improve the documentation as to how each of these could be used? Having comments compatible with godoc would be really nice! I'm looking to build my own CLI around the packages you created, very similar to what you have done in ./cli, except that I'd like to add the ability to push metrics. At this point I'm looking at re-using a lot of your cli code to get this accomplished 😄
The awscli s3 command lets a user put / get objects with spaces in the name. This is unfortunate, since it allows badly named files to be put into S3 simply by quoting the name.
For example:
aws s3 ls 's3://some-bucket/prefix/this file has spaces in the name(1).pdf'
s3sync does not handle this situation well, and exits with errors:
DEBU[0002] Putting obj prefix/this file has spaces in the name(1).pdf failed with err: mkdir /path/to/prefix/this: not a directory
Since we have a collection of arbitrarily named files that come from sources outside our control, we can't guarantee that the files we wish to copy do not contain spaces or other problematic characters in their names. I haven't had an opportunity to test it, but I suspect this problem would occur when putting/getting from local or S3 alike.
level=error msg="Sync error: pipeline step: 3 (UploadObj) failed with error: object: aaa.xml sync error: AccessDenied: Access Denied\n\tstatus code: 403, request id: 57064C841C470CAA, host id: iOGzu8d04mtG5r8U4+IACYtpkFcywDBn0BmRB/mP06mXhunPYsqWux3j9tO0GXD5AFBV2G2PxOg=,
The error occurs in my docker container running the s3sync binary from the command line. I can complete the copy operation with aws s3 sync on the same objects, but s3sync fails. It seems all the permissions are in place...
It seems that the current implementation doesn't do any metadata checking between the source and destination objects? Objects that haven't changed at the source shouldn't have to be transferred to the destination, especially when dealing with large data files. 🤔
I would love to be able to delete files at the destination that do not exist in the source. I need to sync a large number of files but also clean up old files. So far this is the fastest tool I have found; I just wish there was a destructive option.
I am thinking about learning some Go so I can implement it myself (a rough sketch of the idea follows below), but that would take a long time.
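A hedged sketch of the idea: list both sides, then batch-delete destination keys that are missing from the source. Names are placeholders, and real code would stream the listings rather than hold full key sets in memory:

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func listKeys(svc *s3.S3, bucket string) map[string]bool {
	keys := map[string]bool{}
	err := svc.ListObjectsV2Pages(&s3.ListObjectsV2Input{
		Bucket: aws.String(bucket),
	}, func(p *s3.ListObjectsV2Output, lastPage bool) bool {
		for _, o := range p.Contents {
			keys[*o.Key] = true
		}
		return true
	})
	if err != nil {
		log.Fatal(err)
	}
	return keys
}

func main() {
	svc := s3.New(session.Must(session.NewSession()))
	src := listKeys(svc, "source-bucket") // placeholder
	dst := listKeys(svc, "target-bucket") // placeholder

	var doomed []*s3.ObjectIdentifier
	for k := range dst {
		if !src[k] {
			doomed = append(doomed, &s3.ObjectIdentifier{Key: aws.String(k)})
		}
	}
	// DeleteObjects accepts up to 1000 keys per call; chunking elided here.
	if len(doomed) > 0 {
		_, err := svc.DeleteObjects(&s3.DeleteObjectsInput{
			Bucket: aws.String("target-bucket"),
			Delete: &s3.Delete{Objects: doomed},
		})
		if err != nil {
			log.Fatal(err)
		}
	}
}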
Hello!
First, thank you so much for providing s3sync! It fills a particular but significant niche for those of us that can't make use of the awscli "copy" command for whatever reason - either because the selection criteria are too coarse or, as in our case, because we are moving resources between s3 and other technologies, such as ceph.
One request for your releases, can you provide an arm64 release for macOS as well?
Thanks again!
I tried to build it on Windows.
The EXE builds fine, but when trying to sync a local dir (with subdirs), the FS path handling is not able to locate any files on the local file system.
s3sync --tk zz --ts yyyy -w 128 fs://e:/some/path/ s3://bucket-in-s3/path/
The error I get is:
sync error: open e:\\some\\path\\e:\\some\\path\\folder\\somefile.pdf: The filename, directory name, or volume label syntax is incorrect., terminating
See how the path is appended twice...
What am I doing wrong?
hi,
I'm getting strange errors while trying to sync files from a very large bucket (~300 GB):
During the sync process, it seems that s3sync very often creates a phantom file on the local FS with the same path and name as a bucket subdir (?); this file is not present in the bucket.
This behaviour makes s3sync unable to copy the files into the destination directory, because there is already a file named like the dir itself...
Below are the recurring logs that I'm getting for all the files in the /lots_img/ subfolder.
WARN[7141] Failed to sync object: /lots_img/7/501/810.jpg, error: mkdir /home/media/lots_img/7: not a directory, skipping
DEBU[7141] Recv pipeline err: object: /lots_img/7/503/146.jpg sync error: mkdir /home/media/lots_img/7: not a directory
...
This is the command I used to sync from the S3 bucket:
./s3sync -p -d -f skip --sk <key> --ss <secret> --sr <region> -w 128 s3://<bucket> fs:///<folder>
thank you in advance,
Ken
Is it possible to use this tool with S3-compatible storage such as seeweb.it, or only with AWS S3?
To get s3sync to work/compile on OpenBSD, you have to replace syscall.ENODATA with syscall.ENOENT in two places in storage/storage-fs.go.
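A hedged sketch of how that could be made portable with build tags instead of hand-editing: pull the errno into a per-OS constant (the file names and the errNoXattr identifier are hypothetical, not from the repo):

// errno_default.go (hypothetical file)
//go:build !openbsd

package storage

import "syscall"

// errNoXattr is the "no such attribute" errno used by the FS storage code.
const errNoXattr = syscall.ENODATA

// errno_openbsd.go (hypothetical file)
//go:build openbsd

package storage

import "syscall"

// OpenBSD has no ENODATA; ENOENT is the closest equivalent here.
const errNoXattr = syscall.ENOENT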
Impressed by the speed!
s3cmd in Python took me ~45 min for 1.2 GB, while s3sync handled 2.4 GB in ~6 min (1 CPU & 16 workers)!!
Hi, I'm just attempting to use this tool, as it's the only one I've found with an option like --filter-after-mtime, but I'm not having much success:
s3sync --sk XXXXXXXXXXXXXXXXXXXXXX -ss YYYYYYYYYYYYY --sr eu-west-1 -w 4 s3://my-bucket/directory/ fs:///tmp/s3sync/
INFO[0000] Starting sync
ERRO[0000] Sync error: pipeline step: 1 (LoadObjData) failed with error: object: sync error: NoSuchKey: The specified key does not exist.
status code: 404, request id: 46C3367B66F3E7F0, host id: 1buVDVaLEkZYJ79Kfz84J5zlRQZdgCA0bhgxQh7luygIWGne/YfQfLMZ0flJzfdrpYcgaP+AcL8=, terminating
INFO[0000] 0 ListSource: Input: 0; Output: 1000 (2109 obj/sec); Errors: 0
INFO[0000] 1 LoadObjData: Input: 1000; Output: 1 (2 obj/sec); Errors: 1
INFO[0000] 2 ACLUpdater: Input: 1; Output: 1 (2 obj/sec); Errors: 0
INFO[0000] 3 UploadObj: Input: 1; Output: 1 (2 obj/sec); Errors: 0
INFO[0000] 4 Terminator: Input: 1; Output: 0 (0 obj/sec); Errors: 0
INFO[0000] Duration: 474.239263ms
ERRO[0000] Sync Failed
One object from the source "directory" is copied successfully, but no more.
Hey there,
I'm trying to migrate a cronjob from my OSX laptop to my Pi, but am running into the classic cannot execute binary file: Exec format error from which the Pi suffers from time to time.
I tried to build from source (and tried to use docker), but am getting errors with my Go installation. Those errors aren't tied to s3sync, but I can't compile Go on my Raspberry Pi, with all signs pointing to a reinstall of Ubuntu, which I'd strongly like to avoid.
My Raspberry Pi is the 3B+ with aarch64.
Thanks!
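As a possible workaround (standard Go toolchain behavior, nothing s3sync-specific): Go cross-compiles without any extra toolchain, so the binary can be built on the laptop and copied over. Assuming a 64-bit OS on the Pi:

GOOS=linux GOARCH=arm64 go build -o s3sync-arm64 .
# copy the binary to the Pi (host name is a placeholder)
scp s3sync-arm64 pi@raspberrypi:~/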
I get the following output/error on certain filenames which contain special characters:
INFO[0000] Starting sync
ERRO[0027] Sync error: pipeline step: 2 (UploadObj) failed with error: object: B�IBHIN GALLAGHER_Certificate_of_Achievement_653192.pdf sync error: InvalidURI: Couldn't parse the specified URI.
Expected (from checksums.txt):
c6af6cf9d38b1b1ce478ade6b4ea13bb1a6f7663e4a4acb7ed287521ea856b62  s3sync_1.14_Linux_x86_64.tar.gz
Actual:
3b6fc2028088844664a2c91c9f79569ec6d765fc4658828a9bdcb33e93d2bac7  s3sync_1.14_Linux_x86_64.tar.gz