
dicer's Introduction

Description

A very fast streaming multipart parser for node.js.

Benchmarks can be found here.

Requirements

Install

npm install dicer

Examples

  • Parse an HTTP form upload
const { inspect } = require('util');
const http = require('http');

const Dicer = require('dicer');

// Quick and dirty way to parse multipart boundary
const RE_BOUNDARY =
  /^multipart\/.+?(?:; boundary=(?:(?:"(.+)")|(?:([^\s]+))))$/i;
const HTML = Buffer.from(`
  <html><head></head><body>
    <form method="POST" enctype="multipart/form-data">
      <input type="text" name="textfield"><br />
      <input type="file" name="filefield"><br />
      <input type="submit">
    </form>
  </body></html>
`);
const PORT = 8080;

http.createServer((req, res) => {
  let m;
  if (req.method === 'POST'
      && req.headers['content-type']
      && (m = RE_BOUNDARY.exec(req.headers['content-type']))) {
    const d = new Dicer({ boundary: m[1] || m[2] });

    d.on('part', (p) => {
      console.log('New part!');
      p.on('header', (header) => {
        for (const h in header) {
          console.log(
            `Part header: k: ${inspect(h)}, v: ${inspect(header[h])}`
          );
        }
      });
      p.on('data', (data) => {
        console.log(`Part data: ${inspect(data.toString())}`);
      });
      p.on('end', () => {
        console.log('End of part\n');
      });
    });
    d.on('finish', () => {
      console.log('End of parts');
      res.writeHead(200);
      res.end('Form submission successful!');
    });
    req.pipe(d);
  } else if (req.method === 'GET' && req.url === '/') {
    res.writeHead(200);
    res.end(HTML);
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(PORT, () => {
  console.log(`Listening for requests on port ${PORT}`);
});
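The boundary extraction used in the server above can be tried in isolation. The sketch below wraps the same regular expression in a helper; `getBoundary` is a hypothetical name for illustration and is not part of dicer.

```javascript
// Same pattern as RE_BOUNDARY in the example above.
const RE_BOUNDARY =
  /^multipart\/.+?(?:; boundary=(?:(?:"(.+)")|(?:([^\s]+))))$/i;

// Hypothetical helper (not part of dicer): returns the boundary string,
// or null when the header is missing or is not a multipart type.
function getBoundary(contentType) {
  if (typeof contentType !== 'string') return null;
  const m = RE_BOUNDARY.exec(contentType);
  // m[1] is the quoted form, m[2] the unquoted form.
  return m ? (m[1] || m[2]) : null;
}

console.log(getBoundary('multipart/form-data; boundary=----abc123'));
console.log(getBoundary('multipart/mixed; boundary="END OF PART"'));
console.log(getBoundary('text/plain'));
```

As the comment in the example says, this is a quick and dirty approach: the pattern expects the boundary to be the last parameter, so a header such as `multipart/form-data; boundary=abc; charset=utf-8` is not matched.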

API

Dicer is a Writable stream

Dicer (special) events

  • finish() - Emitted when all parts have been parsed and the Dicer instance has been ended.

  • part(< PartStream >stream) - Emitted when a new part has been found.

  • preamble(< PartStream >stream) - Emitted for preamble if you should happen to need it (can usually be ignored).

  • trailer(< Buffer >data) - Emitted when trailing data was found after the terminating boundary (as with the preamble, this can usually be ignored too).

Dicer methods

  • (constructor)(< object >config) - Creates and returns a new Dicer instance with the following valid config settings:

    • boundary - string - This is the boundary used to detect the beginning of a new part.

    • headerFirst - boolean - If true, preamble header parsing will be performed first.

    • maxHeaderPairs - integer - The maximum number of header key=>value pairs to parse. Default: 2000 (same as node's http).

  • setBoundary(< string >boundary) - (void) - Sets the boundary to use for parsing and performs some initialization needed for parsing. You should only need to use this if you set headerFirst to true in the constructor and are parsing the boundary from the preamble header.

PartStream is a Readable stream

PartStream (special) events

  • header(< object >header) - An object containing the header for this particular part. Each property value is an array of one or more string values.
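Because each property value of the header object is an array of strings, reading a single header usually means taking the first array element. A minimal sketch, assuming header names appear as lowercased keys (an assumption, not stated above); `fieldName` is a hypothetical helper, not part of dicer.

```javascript
// Hypothetical helper (not part of dicer): extract the form field name from
// a part's header object, where every value is an array of strings.
// Assumption: header names are stored as lowercased keys.
function fieldName(header) {
  const values = header['content-disposition'];
  if (!Array.isArray(values) || values.length === 0) return null;
  // Match `; name="..."` without also matching `; filename="..."`.
  const m = /;\s*name="([^"]*)"/i.exec(values[0]);
  return m ? m[1] : null;
}

// Example header object shaped as described above.
const exampleHeader = {
  'content-disposition': ['form-data; name="filefield"; filename="a.txt"'],
  'content-type': ['text/plain'],
};
console.log(fieldName(exampleHeader));
```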

dicer's People

Contributors

briangreenery, ehennum, mscdex

dicer's Issues

Is it still maintained?

Thank you for this package!

The last relevant commit is from 2021, so I'm not sure it's still active.

Constructing multipart requests

This library supports parsing multipart requests very well, but I don't see any routines for constructing such requests. What do you think of adding this function? Or did you intend for this to go purely in one direction?
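For reference, building a multipart body does not need much code. The following is a minimal sketch using only Node's `Buffer`; `buildMultipart` is a hypothetical function, not something dicer provides, and it does no validation of the boundary or the header values.

```javascript
// Hypothetical helper (not part of dicer): serialize parts into a multipart
// body. Each part is { headers: { name: value }, body: string | Buffer }.
function buildMultipart(boundary, parts) {
  const chunks = [];
  for (const { headers, body } of parts) {
    // Each part starts with a delimiter line.
    chunks.push(Buffer.from(`--${boundary}\r\n`));
    for (const [name, value] of Object.entries(headers)) {
      chunks.push(Buffer.from(`${name}: ${value}\r\n`));
    }
    // A blank line separates the part headers from the part body.
    chunks.push(Buffer.from('\r\n'));
    chunks.push(Buffer.isBuffer(body) ? body : Buffer.from(String(body)));
    chunks.push(Buffer.from('\r\n'));
  }
  // The terminating boundary carries a trailing "--".
  chunks.push(Buffer.from(`--${boundary}--\r\n`));
  return Buffer.concat(chunks);
}

const payload = buildMultipart('myboundary', [
  { headers: { 'Content-Type': 'text/plain' }, body: 'hello' },
]);
console.log(JSON.stringify(payload.toString()));
```

The caller is responsible for choosing a boundary that does not occur inside any part body.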

Last part gets discarded when terminating boundary is missing

TL;DR> The last part of a stream is discarded when a terminating boundary is missing. In the case when a Content-Length is known for the part (e.g. Motion JPEG streams over HTTP), we propose sending on the last part when the length of the part matches the Content-Length in the last boundary.

We are using dicer to transform a http multipart Motion JPEG stream into individual JPEG frames. During the creation of the testsuite for our 'MjpegReader' we created a sample. This will give our CI bot something to work on without requiring access to a camera. We created the sample.mjpeg using curl, e.g.

curl -m 1 http://somecamera/video.cgi > out.mjpg

The -m 1 switch 'cuts' the download after 1 second, which results in a captured mjpeg stream of 1 second and 15 frames.

Feeding this sample to dicer causes the last frame to drop. After investigating, we found that dicer is expecting an ending boundary, e.g.

--myboundary--

as per RFC 1341 (http://www.w3.org/Protocols/rfc1341/7_2_Multipart.html)

We noticed that your test suite has a test case for this scenario, but it is limited to multipart form data for an uploaded file, which differs from our use case of streaming MJPEG data.

In our use case, connections will drop, so the terminating boundary will most likely always be missing. We have tried listening to the emitted events, but we were not able to coax dicer into giving us that (potentially complete) data.

But since we know the Content-Length of the part in question, we (or perhaps even Dicer itself) can ascertain that the part is complete and still send it on anyway.

--myboundary
Content-Type: image/jpeg
Content-Length: 68463

<partdata>

Insights, feedback and comments are very welcome.
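The completeness check proposed in this issue could be sketched as follows. This is only an illustration of the idea, not dicer behaviour: `isPartComplete` is a hypothetical helper, the header object is assumed to use lowercased keys with array values, and `bytesReceived` is assumed to be the sum of the chunk lengths seen on the part's 'data' events.

```javascript
// Hypothetical helper (not part of dicer): decide whether a part whose
// terminating boundary never arrived is complete anyway, by comparing the
// number of bytes received with the part's own Content-Length header.
function isPartComplete(header, bytesReceived) {
  const values = header['content-length'];
  if (!Array.isArray(values) || values.length === 0) return false;
  const expected = Number.parseInt(values[0], 10);
  return Number.isInteger(expected)
    && expected >= 0
    && bytesReceived === expected;
}

const frameHeader = {
  'content-type': ['image/jpeg'],
  'content-length': ['68463'],
};
console.log(isPartComplete(frameHeader, 68463));
console.log(isPartComplete(frameHeader, 51200));
```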

Error: Part terminated early due to unexpected end of multipart data

Hi,
I get this error often, but I don't know the exact scenario that triggers it.
Any idea how to catch this error?

Error: Part terminated early due to unexpected end of multipart data
    at /srv/storage-server/node_modules/dicer/lib/Dicer.js:65:36
    at _combinedTickCallback (internal/process/next_tick.js:131:7)
    at process._tickDomainCallback (internal/process/next_tick.js:218:9)
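One way to avoid an uncaught exception is to attach an 'error' listener to every part stream. The sketch below assumes the error is emitted on the part stream rather than on the Dicer instance; the stack trace above points into Dicer.js but does not confirm this. `watchPart` is a hypothetical helper, and a plain EventEmitter stands in for a real part so the snippet runs without dicer installed.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper (not part of dicer): collect a part's data and report
// either success or failure exactly once through a callback.
function watchPart(part, done) {
  const chunks = [];
  let finished = false;
  const finish = (err) => {
    if (finished) return;
    finished = true;
    done(err, Buffer.concat(chunks));
  };
  part.on('data', (chunk) => chunks.push(chunk));
  // Without an 'error' listener, an emitted error is thrown instead.
  part.on('error', (err) => finish(err));
  part.on('end', () => finish(null));
}

// Stand-in for a part stream, for demonstration only.
const fakePart = new EventEmitter();
watchPart(fakePart, (err, data) => {
  console.log(err
    ? `Part failed after ${data.length} bytes: ${err.message}`
    : `Part ok: ${data.length} bytes`);
});
fakePart.emit('data', Buffer.from('abc'));
fakePart.emit('error', new Error('Part terminated early'));
fakePart.emit('end');
```

With a real Dicer instance, the same helper would be called from the 'part' handler, for example `d.on('part', (p) => watchPart(p, callback))`.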

I am not able to parse 'multipart/mixed' content like this

I am able to parse this, but why does the part not give me the headers and the content separately? See the parsed content below.

--END_OF_PART
Content-Length: 337
Content-Type: application/http
content-id: 1
content-transfer-encoding: binary


POST https://www.googleapis.com/drive/v3/files/<var class="apiparam">fileId</var>/permissions?fields=id
Authorization: Bearer <var class="apiparam">authorization_token</var>
Content-Length: 70
Content-Type: application/json; charset=UTF-8


{
  "emailAddress":"[email protected]",
  "role":"writer",
  "type":"user"
}
--END_OF_PART
Content-Length: 353
Content-Type: application/http
content-id: 2
content-transfer-encoding: binary


POST https://www.googleapis.com/drive/v3/files/<var class="apiparam">fileId</var>/permissions?fields=id&sendNotificationEmail=false
Authorization: Bearer <var class="apiparam">authorization_token</var>
Content-Length: 58
Content-Type: application/json; charset=UTF-8


{
   "domain":"appsrocks.com",
   "role":"reader",
   "type":"domain"
}
--END_OF_PART--

The part I am getting looks like this:

'\r\nPOST https://www.googleapis.com/drive/v3/files/<var class="apiparam">fileId/permissions?fields=id\r\nAuthorization: Bearer <var class="apiparam">authorization_token\r\nContent-Length: 70\r\nContent-Type: application/json; charset=UTF-8\r\n\r\n\r\n{\r\n "emailAddress":"[email protected]",\r\n "role":"writer",\r\n "type":"user"\r\n}'

How can I get the part's headers, params, and body separately?

Can you tell me what the issue with dicer is?
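In the sample above, each part body is itself an embedded HTTP request (Content-Type: application/http), so the request line, inner headers, and JSON payload all arrive as part data and need a second parsing step. A minimal sketch of that second step, assuming CRLF line endings as in the output shown; `parseEmbeddedHttp` is a hypothetical helper, not part of dicer.

```javascript
// Hypothetical helper (not part of dicer): split the text of an
// application/http part into request line, headers, and body.
function parseEmbeddedHttp(raw) {
  // The sample has extra blank lines before the request line and the body.
  const text = raw.replace(/^(?:\r\n)+/, '');
  const split = text.indexOf('\r\n\r\n');
  const head = split === -1 ? text : text.slice(0, split);
  const body = split === -1
    ? ''
    : text.slice(split + 4).replace(/^(?:\r\n)+/, '');
  const [requestLine, ...headerLines] = head.split('\r\n');
  const [method, url] = requestLine.split(' ');
  const headers = {};
  for (const line of headerLines) {
    const idx = line.indexOf(':');
    if (idx === -1) continue; // skip malformed header lines
    headers[line.slice(0, idx).trim().toLowerCase()] =
      line.slice(idx + 1).trim();
  }
  return { method, url, headers, body };
}

// Input shaped like the part data shown above (URL and token are made up).
const sample = '\r\nPOST https://example.invalid/files/abc/permissions?fields=id'
  + '\r\nAuthorization: Bearer token'
  + '\r\nContent-Type: application/json; charset=UTF-8'
  + '\r\n\r\n\r\n{"role":"writer","type":"user"}';
console.log(parseEmbeddedHttp(sample));
```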

Vulnerability Fix

Hi guys,

anyone working on this https://security.snyk.io/vuln/SNYK-JS-DICER-2311764 ?

Affected versions of this package are vulnerable to Denial of Service (DoS). A malicious attacker can send a modified form to the server and crash the nodejs service. An attacker could send the payload again and again so that the service continuously crashes.

dicer writable never ends if bad multipart

stream.pipe(dicer); never fires the dicer 'end' event if broken multipart data is passed.

I had to write this code to solve the problem:

stream.on('data', function (chunk) {
    dicer.write(chunk);
});

dicer.on('end', function () {
    dicer.__ended__ = true;
   //  ok
});

stream.on('end', function () {
    dicer.end();
    if ( dicer.__ended__ ) {
        return;  
    }

    // ERROR
});

I expected a SyntaxError or some other error on dicer's 'error' event.

Wish: shortcut method for file uploads

Hello,

Thanks for writing a streaming upload parser for Node.

I think a nice addition to the module would be a shortcut method for the common case of processing a file upload field. Something like this perhaps:

    d.onFileHeader('my-file-upload-field-name', function (header) { ... });
    d.onFileData('my-file-upload-field-name', function (data) { ... });

So, if you are looking for a part with a file upload named 'my-file-upload-field-name', you can just declare that, with a bit less syntactic overhead.
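A shortcut like this can be approximated in user code on top of the documented 'part' and 'header' events. The sketch below is only an illustration: `onFileField` is a hypothetical wrapper, not a dicer method, and it assumes lowercased header keys and that every part emits 'header' before its data. EventEmitters stand in for Dicer and the part so the snippet runs without dicer installed.

```javascript
const { EventEmitter } = require('events');

// Hypothetical wrapper (not part of dicer): call handlers only for the part
// whose Content-Disposition names the given form field.
function onFileField(dicer, fieldName, handlers) {
  dicer.on('part', (part) => {
    part.on('header', (header) => {
      const disposition = (header['content-disposition'] || [])[0] || '';
      const m = /;\s*name="([^"]*)"/i.exec(disposition);
      if (m && m[1] === fieldName) {
        if (handlers.onHeader) handlers.onHeader(header);
        part.on('data', (data) => {
          if (handlers.onData) handlers.onData(data);
        });
      } else {
        // Drain parts we do not care about so parsing can continue.
        part.resume();
      }
    });
  });
}

// Demonstration with stand-ins for a Dicer instance and a part stream.
const fakeDicer = new EventEmitter();
onFileField(fakeDicer, 'my-file-upload-field-name', {
  onHeader: (header) => console.log('file header:', header),
  onData: (data) => console.log('file data:', data.toString()),
});
const fakeFilePart = new EventEmitter();
fakeFilePart.resume = () => {};
fakeDicer.emit('part', fakeFilePart);
fakeFilePart.emit('header', {
  'content-disposition': [
    'form-data; name="my-file-upload-field-name"; filename="a.txt"',
  ],
});
fakeFilePart.emit('data', Buffer.from('file contents'));
```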

Empty part will hang the process

If the multipart data looks like this:

POST /member.php?mod=register&inajax=1 HTTP/1.1
Host: domainExample
Accept: text/html, application/xhtml+xml, */*
Connection: Keep-Alive
Content-Length: 522
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryzca7IDMnT6QwqBp7
Referer: http://domainExample/member.php?mod=register
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)

------WebKitFormBoundaryzca7IDMnT6QwqBp7
Content-Disposition: form-data; name="regsubmit"

yes
------WebKitFormBoundaryzca7IDMnT6QwqBp7
------WebKitFormBoundaryzca7IDMnT6QwqBp7
Content-Disposition: form-data; name="referer"

http://domainExample/./
------WebKitFormBoundaryzca7IDMnT6QwqBp7
Content-Disposition: form-data; name="activationauth"


------WebKitFormBoundaryzca7IDMnT6QwqBp7
Content-Disposition: form-data; name="seccodemodid"


member::register
------WebKitFormBoundaryzca7IDMnT6QwqBp7--

The second part is empty, so the self._parts count ends up less than the number of times the this._part.on('end', function() {}) handler is emitted. This causes the process to hang.

Dicer hangs when used with `stream/promises`' `pipeline`

Test case:

const { Readable } = require('stream');
const { pipeline } = require('stream/promises');
const Dicer = require('dicer');

async function main() {
  const r = new Readable({ read() {} });
  const d = new Dicer({ boundary: 'a' });

  d.on('part', async (part) => {
    part.resume();
  });

  r.push(`--a\r\nA: 1\r\nB: 1\r\n\r\n123\r\n--a\r\n\r\n456\r\n--a--\r\n`);
  setImmediate(() => {
    r.push(null);
  });

  const timer = setTimeout(() => {
    throw new Error('Should be canceled');
  }, 2000);

  await pipeline(r, d);

  clearTimeout(timer);
}

main();

Denial Of Service (DoS) Vulnerability

Hi,

Veracode finds the following vulnerability in all available versions of the library.

CVE-2022-24434
Denial Of Service (DoS): dicer is vulnerable to denial of service. The vulnerability exists in parseHeader function in HeaderParser.js due to the use of a variable h which allows an attacker to modify and send the form to server and crash the service.

Special 'end' from Readme is actually 'finish'

The README.md suggests that Dicer will emit a special end event "when all parts have been parsed". Upon reading the code, it seems that in reality a finish event is emitted. Is the README.md wrong, or have I misunderstood something?
