knox-mpu's Introduction

knox-mpu

A Node.js client designed to make large file uploads to Amazon S3 via the MultiPartUpload API simple and easy. It's built on top of the excellent Knox library from the guys over at LearnBoost.

Features

  • Simple and easy to use
  • Pipe either a file, or a stream directly to S3 (No need to know the content length first!)
  • Automatically separates a file/stream into appropriate sized segments for upload
  • Asynchronous uploading of segments
  • Handy events to track your upload progress

Planned

  • Better error handling (reuploading failed parts, etc)

Installing

Installation is done via npm by running npm install knox-mpu-alt

Examples

Uploading a stream

To upload a stream, simply pass the stream when constructing the MultiPartUpload. The upload will then listen to the stream and create parts from the incoming data. When a part reaches the minimum part size, knox-mpu will attempt to upload it to S3.

// Create a Knox client first
var knox = require('knox'),
    MultiPartUpload = require('knox-mpu-alt');

var client = knox.createClient({ ... }),
    upload = null;


upload = new MultiPartUpload(
            {
                client: client,
                objectName: 'destination.txt', // Amazon S3 object name
                stream: stream
            },
            // Callback handler
            function(err, body) {
                // If successful, will return body, containing Location, Bucket, Key, ETag and size of the object
                /*
                  {
                      Location: 'http://Example-Bucket.s3.amazonaws.com/destination.txt',
                      Bucket: 'Example-Bucket',
                      Key: 'destination.txt',
                      ETag: '"3858f62230ac3c915f300c664312c11f-9"',
                      size: 7242880
                  }
                */
            }
        );

Uploading a file

To upload a file, pass the path to the file in the constructor. Knox-mpu will split the file into parts and upload them.

// Create a Knox client first
var knox = require('knox'),
    MultiPartUpload = require('knox-mpu-alt');

var client = knox.createClient({ ... }),
    upload = null;


upload = new MultiPartUpload(
            {
                client: client,
                objectName: 'destination.txt', // Amazon S3 object name
                file: ... // path to the file
            },
            // Callback handler
            function(err, body) {
                // If successful, will return body, containing Location, Bucket, Key, ETag and size of the object
                /*
                  {
                      Location: 'http://Example-Bucket.s3.amazonaws.com/destination.txt',
                      Bucket: 'Example-Bucket',
                      Key: 'destination.txt',
                      ETag: '"3858f62230ac3c915f300c664312c11f-9"',
                      size: 7242880
                  }
                */
            }
        );

Options

The following options can be passed to the MultiPartUpload constructor -

  • client Required The knox client to use for this upload request
  • objectName Required The destination object name/path on S3 for this upload
  • stream The stream to upload (required if file is not being supplied)
  • file The path to the file (required if stream is not being supplied)
  • headers Any additional headers to include on the requests
  • partSize The minimum size of the parts to upload (defaults to 5MB).
  • batchSize The maximum number of concurrent parts that can be uploading at any one time (default is 4)
  • maxUploadSize The maximum size of the file to upload (defaults to infinity). Useful if there is a stream with unknown length.
  • noDisk If true, parts will be kept in memory instead of written to temp files (defaults to false).
  • maxRetries The number of times to retry a failed part upload (default is 0, i.e. no retries).

Events

The MultiPartUpload will emit a number of events (see the wiring sketch after this list) -

  • initiated Emitted when the multi part upload has been initiated, and received an upload ID. Passes the upload id through as the first argument to the event
  • uploading Emitted each time a part starts uploading. The part id is passed as the first argument.
  • uploaded Emitted each time a part finishes uploading. Passes through an object containing the part id and Amazon ETag for the uploaded part.
  • failed Emitted each time a part upload fails. Passes an object containing the part id and error message
  • completed Emitted when the upload has completed successfully. Contains the object information from Amazon S3 (location, bucket, key and ETag)
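
A minimal sketch wiring these up, assuming an upload constructed as in the examples above; the exact shape of each payload is only loosely specified here, so the handlers simply log what they receive:

// Sketch: listening to the upload lifecycle events listed above.
upload.on('initiated', function(uploadId) {
    console.log('Initiated upload:', uploadId);
});

upload.on('uploading', function(partId) {
    console.log('Uploading part', partId);
});

upload.on('uploaded', function(part) {
    // Contains the part id and the Amazon ETag for the finished part
    console.log('Uploaded part:', part);
});

upload.on('failed', function(part) {
    console.error('Part failed:', part);
});

upload.on('completed', function(info) {
    // Contains the object information from Amazon S3
    console.log('Upload completed:', info);
});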

knox-mpu's People

Contributors

aarsilv, amdstorm, anthonyfig, chad3814, corbanb, dustmason, efexen, jrnt30, kulkarnih, kylegetson, lahabana, mikermcneil, nathanoehlman, philleski, sgress454, shakkhar, strathausen


knox-mpu's Issues

Invalid upload ID error is not explicit enough

I was getting an error from Knox when creating an upload that said 'Invalid upload ID'. Looking at the code, this occurs when the response from AWS is missing the field UploadId:

if (!body.UploadId) return callback('Invalid upload ID');

The error message suggests that this is a user error (i.e. that the user passed in an invalid upload id), when really the error is that the response from AWS is missing an upload id.

I inserted some logging to figure out the cause of the problem, and in my case it was due to an invalid AWS Access Key Id. Here's the body of the response:

{ Code: 'InvalidAccessKeyId',
  Message: 'The AWS Access Key Id you provided does not exist in our records.',
  RequestId: <some string>,
  HostId: <some string>,
  AWSAccessKeyId: <my bad key>}

It would be nice to either see this body when the error occurs or at least have a more explicit error message that suggests the cause of the issue. I'm happy to submit a pull request with a fix if you have an idea of how you'd want to fix it.
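
One possible shape for such a fix, sketched here rather than taken from the library's code, is to fold the AWS error fields into the message:

// Sketch only, not the library's actual code: surface the parsed AWS
// error body (e.g. Code and Message) instead of a generic string.
if (!body.UploadId) {
    var reason = (body.Code || body.Message)
        ? body.Code + ': ' + body.Message
        : 'response did not contain an UploadId';
    return callback('Unable to initiate upload - ' + reason);
}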

Error event wrongly documented

Readme documents the error event as Emitted each time a part upload fails. Passes an object containing the part id and error message.

This event is never emitted; looking at the source, it seems that this event's proper name is failed.

TypeError: Object function MultiPartUpload

Hi, when I use the same S3 config as Knox, I get this error with knox-mpu: var client = knox.createClient({ TypeError: Object function MultiPartUpload

I've been searching a bit. When I give the bucket name in createClient(), should I use just the bucket name, or the bucket name with the S3 endpoint ".eu-west-1"? Neither of these configs works.

Thanks!
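
For what it's worth, Knox generally expects the bare bucket name, with the region supplied as a separate option; a sketch with placeholder credentials (region support depends on your Knox version):

var knox = require('knox');

// Sketch: pass the bare bucket name; the region goes in a separate
// option rather than being appended to the bucket name.
var client = knox.createClient({
    key: '<aws-access-key>',   // placeholder
    secret: '<aws-secret>',    // placeholder
    bucket: 'my-bucket',       // bare name, no '.eu-west-1' suffix
    region: 'eu-west-1'
});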

Make the callback optional

Really cool package!
However, it seems that the constructor's callback is compulsory. For example:

  var upload = new MultiPartUpload({
                client: s3,
                objectName: objectKey, // Amazon S3 object name
                stream: stream
  });
  upload.on('initiated', function(uploadId) {
    console.error("Job:" + resource.id + " Initiated upload id:", uploadId);
  });

  upload.on('error', function(err) {
    console.error("Job:" + resource.id + " Error when uploading:", err);
  });

  upload.on('completed', function(data) {
    console.log("Job:" + resource.id + " Successful upload to: " + data.Location);
  });

Outputs:

Job:1367235434743 Initiated upload id: 7omHblfHVFppTCpFlF90QOFwKB8wErOns0TAYvyCpLv1e2woS1lz2g7h2r0hxmEft2yr.TK.m6yA0T5aHL1Nng--
Job:1367235434743 Successful upload to: https://bucket.s3.amazonaws.com/1367235434743

./node_modules/knox-mpu/lib/multipartupload.js:246
            return callback(null, body);
                   ^
TypeError: undefined is not a function
    at MultiPartUpload._completeUploads (./node_modules/knox-mpu/lib/multipartupload.js:246:20)

It would make sense to make the callback optional for this type of use case, no?
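
A sketch of what the change might look like inside the constructor (hypothetical, not the library's current code):

// Sketch only: default the callback to a no-op when it isn't supplied,
// so internal calls like callback(null, body) don't throw.
function MultiPartUpload(opts, callback) {
    callback = callback || function() {};
    // ... rest of the constructor unchanged
}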

Upload stream to s3 failed

I've used this module for storing to S3; for some files in my test suite it works, but with other files it doesn't.

I receive {"part":1,"message":"Upload failed with status code 400"} .. what could be a good point to debug?

best, tom
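
A reasonable starting point is the failed event documented above, which carries the part id and error message; a sketch:

// Sketch: log the failed-part payload; the part id tells you which
// byte range of the file to inspect, and comparing failing vs.
// succeeding files (size, content type) can narrow down the cause.
upload.on('failed', function(part) {
    console.error('Part %d failed: %s', part.part, part.message);
});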

problem with header modifications for public upload

Hi, I'm successfully using mpu and have code like this:

upload = new mpu {
          client:       @s3client
          objectName:   fname
          file:         lname
        }, (err, res) -> ### buncha stuff here ###

These uploads always work.

However, as soon as I use a header designed to make my file public-readable, I get a vague error. Here's my modification; the only difference is the introduction of the header:

public_header = {
      'x-amz-acl':    'public-read'
}
upload = new mpu {
          client:       @s3client
          objectName:   fname
          file:         lname
          headers:      public_header
        }, (err, res) -> ### buncha stuff here ###

I'm getting an err which I print:

Unable to initiate stream upload

This doesn't tell me much. Have I misused mpu, perhaps?

Thanks!

Release 0.1.7

Please, can someone release 2014's code as a 0.1.7 release?
The 0.1.6 release only bundles code up to December 2013.

Thanks ;-).

Track Progress

Is there any way to track the progress of the upload? By progress I mean completed percentage (i.e. 34%, 38%, etc.).

Thanks,
Anthony
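
There is no built-in percentage, but for a file of known size you can approximate one from the uploaded events together with the partSize option. A coarse (per-part, not per-byte) sketch, assuming one uploaded event fires per part:

var fs = require('fs');

// Sketch: partSize must match the value passed to MultiPartUpload
// (5MB is the documented default); the file path is a placeholder.
var filePath = '/path/to/file';
var partSize = 5 * 1024 * 1024;
var totalParts = Math.ceil(fs.statSync(filePath).size / partSize);
var uploadedParts = 0;

upload.on('uploaded', function(part) {
    uploadedParts++;
    console.log('Progress: ' + Math.round(100 * uploadedParts / totalParts) + '%');
});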

ECONNRESET when uploading a large file.

I'm getting ECONNRESET errors when uploading a 350MB file with knox-mpu.

In particular:

{
    "part": 1,
    "message": {
        "code": "ECONNRESET",
        "errno": "ECONNRESET",
        "syscall": "read"
    }
}

The part specified is different each time but is always between 1 and 4. (I am using the default batchSize of 4.)

Is there any other info that would be helpful in debugging this?
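
The documented maxRetries and batchSize options are aimed at exactly this kind of transient failure (availability depends on your knox-mpu version); a sketch with placeholder names:

var upload = new MultiPartUpload({
    client: client,
    objectName: 'large-file.bin',    // hypothetical name
    file: '/path/to/350mb-file',     // hypothetical path
    batchSize: 2,   // fewer concurrent parts eases connection pressure
    maxRetries: 3   // re-attempt a failed part up to 3 times
}, function(err, body) {
    if (err) return console.error('Upload failed:', err);
    console.log('Uploaded to', body.Location);
});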

Create tags for release since 0.1.7

Can you please create the appropriate tags for releases 0.1.7, 0.1.8, 0.1.9, 0.1.10, 0.1.11 and 0.2.0?

It will allow us to npm install them ;-).
Thanks.

Keys with unicode characters are not properly uploaded

When attempting to upload a file with unicode characters in the object name, knox-mpu will have trouble generating a proper authorization signature due to the interaction with the Knox client/auth.

Ex.

new MultiPartUpload({
  stream: readStream,
  objectName: 'someBucket/стра%на 1.csv',
  client: knoxClient,
}, function(err, data){
   console.log(err);
});

limit concurrent requests

Need a way to limit the number of concurrent requests. This works great for files under a few GB, but when I tried pushing up a file that was almost 200GB, it created 50 files of 4+GB each (the size I specified) and uploaded them all at once. This obviously didn't work, and consumed the network's bandwidth rather quickly.
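
The batchSize option documented above caps how many parts are in flight at once (it may postdate the version this report was filed against); a sketch with placeholder names:

var upload = new MultiPartUpload({
    client: client,
    objectName: 'huge-file.bin',          // hypothetical name
    file: '/path/to/200gb-file',          // hypothetical path
    partSize: 4 * 1024 * 1024 * 1024,     // ~4GB parts, as in the report
    batchSize: 2                          // at most 2 parts in flight
}, function(err, body) { /* ... */ });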

World Readable Uploads

Is there an easy way to have the multipart uploads world readable as part of the upload process?
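
Passing the ACL through the documented headers option is the usual approach; a sketch with placeholder names (though note the x-amz-acl issues reported elsewhere on this page):

var upload = new MultiPartUpload({
    client: client,
    objectName: 'public-file.txt',            // hypothetical name
    file: '/path/to/file',                    // hypothetical path
    headers: { 'x-amz-acl': 'public-read' }   // world-readable object
}, function(err, body) { /* ... */ });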

Reduced redundancy?

Is it possible to use knox-mpu to specify the x-amz-storage-class param listed in the Amazon MPU docs? I would like to upload files using REDUCED_REDUNDANCY.
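
There is no dedicated option, but the documented headers option should let you set it, assuming the header survives the initiate request; a sketch with placeholder names:

var upload = new MultiPartUpload({
    client: client,
    objectName: 'expendable-file.bin',    // hypothetical name
    file: '/path/to/file',                // hypothetical path
    headers: { 'x-amz-storage-class': 'REDUCED_REDUNDANCY' }
}, function(err, body) { /* ... */ });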

Throws unhandled exception

Has anyone else experienced an unhandled ENOENT issue? I'm guessing I may be passing an invalid id in somewhere, but the real issue I'm identifying is the need to catch this error and hand it back as the err argument. I'll take a look myself in a few minutes if I can't trace the issue to anything in my code, but if anyone has thoughts on where I should look, I'd very much appreciate it :)

Thanks!
Mike

Copying of all headers for PUT request invalid in some cases

PR #39 has introduced an issue when using the x-amz-acl header. When using this header in conjunction with the mpu upload, the request will fail with a message of "The XML you provided was not well-formed or did not validate against our published schema"

I believe this is due to a limited subset of headers being valid for the PUT object request for the multipart upload.
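
One possible mitigation, sketched here and not necessarily the fix the library adopted: whitelist the headers that are forwarded to the per-part PUT requests, keeping ACL-style headers on the initiate request only.

// Sketch only: forward just a safe subset of the user's headers to
// the per-part PUT requests; which headers are "safe" is an assumption.
function partHeaders(userHeaders) {
    var allowed = ['content-type', 'content-md5'];
    var out = {};
    Object.keys(userHeaders || {}).forEach(function(name) {
        if (allowed.indexOf(name.toLowerCase()) !== -1) {
            out[name] = userHeaders[name];
        }
    });
    return out;
}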

Emit an `incomplete` event when maxRetries is exceeded

If a part fails to upload, a failed event is emitted.
A new upload is then launched for this part until maxRetries is reached.

But if every upload attempt for a part fails, the complete event is triggered anyway.
This event is misleading, because the upload is reported as done while the file is incomplete.
For example, if there is only one part, the complete event will be triggered with a size of 0.

Replacing this event with done and triggering an incomplete event would make more sense.
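
A sketch of the proposed behaviour, using the names suggested in this issue rather than the library's current API:

// Sketch of the proposal: after all parts settle, emit `incomplete`
// if any part exhausted its retries, instead of signalling success.
// `failedParts` is a hypothetical array of parts that never succeeded.
if (failedParts.length > 0) {
    this.emit('incomplete', { failed: failedParts });
} else {
    this.emit('completed', body);
}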

Completes before parts are finished uploading

I'm having trouble tracking down an issue. I'm trying to upload a 7020205724 byte file with each part size set to 20971520 bytes. This results in 335 parts needing to be uploaded.

The issue is that it seems to lose track of unfinished uploads and pushes the completion XML to S3 before all parts are there. This results in a corrupted file.

I'm logging the completion and finish events to a file for debug purposes and end up with something like this (notice that it thinks there are only 328 parts in the finished upload).

Finished uploading file with ETag "84403d6a7224f82f49a1b3a0731b612d-328"
Finished uploading part 329 with ETag "aaec8a124b4699bd74293ba91bca886d".
Finished uploading part 335 with ETag "92d99d83507e880cdd1d65fd29e3874b".
Finished uploading part 330 with ETag "c3d5efe9d6732a1aaa58235921734977".
Finished uploading part 331 with ETag "7477b87c1b48f5cb8a7fead3148a5b61".
Finished uploading part 333 with ETag "08e759f7e6319bd7a29395cff5094401".
Finished uploading part 334 with ETag "611a4c95827d795a09ebee185a879d10".
Finished uploading part 332 with ETag "3dd6a210a0f5a8fc70d9b13d5656c2bb".

Any thoughts? I've been looking through the code and can't quite pinpoint what is happening.

Stream upload to S3 fails quoting unsupported HTTP version.

I'm trying to stream a file to an S3 instance. I get the generic error "Unable to initiate stream upload [Invalid upload ID". But the response from S3 indicates an unsupported HTTP version.

<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>HttpVersionNotSupported</Code>
<Message>The HTTP version specified is not supported.</Message>
<RequestId>788D2319F7FE4EAF</RequestId>
</Error>
