
Fast byteLength() (encoding, open, 4 comments)

jamiebuilds commented on August 12, 2024

Fast byteLength()


Comments (4)

jakearchibald commented on August 12, 2024

@jamiebuilds can you give use case(s) where you only care about the byte length and don't need the encoded data?


jamiebuilds commented on August 12, 2024

@jakearchibald I work on an end-to-end encrypted messaging app where the server can't inspect the types of payloads being sent between clients, so there are many places where we need to enforce a maximum byte length on the client to prevent abuse that overloads client apps.

Right now we mostly do encode the data into Node buffers, but we found that it would be more efficient to catch these cases earlier and have the option of dropping payloads that are too large before we start doing anything with the data.

After implementing some of this though, I actually found an even better way of doing this:

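function maxLimitCheck(maxByteSize: number) {
  const encoder = new TextEncoder()
  // 4 bytes of slack: if the input is over the limit, the next code point
  // (at most 4 bytes) still fits, so `written` ends up > maxByteSize
  const maxSizeArray = new Uint8Array(maxByteSize + 4)
  return (input: string): boolean => {
    // encodeInto stops when the buffer is full, so it never writes more
    // than maxByteSize + 4 bytes, however long the input is
    return encoder.encodeInto(input, maxSizeArray).written <= maxByteSize
  }
}

const check = maxLimitCheck(5e6) // 5MB

check("a".repeat(5)) // true
check("a".repeat(5e6)) // true (exactly 5e6 bytes)
check("a".repeat(5e6 - 1) + "¢") // false ("¢" is 2 bytes, so 5e6 + 1 bytes)
check("a".repeat(5e6 + 1)) // false
check("a".repeat(2 ** 29 - 24)) // false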

Testing this out in my benchmark repo, with the max size array set to a few different limits:

./benchmarks/blob.js:                        4.8 ops/sec (±0.1, p=0.001, o=0/10)
./benchmarks/buffer.js:                     54.5 ops/sec (±3.0, p=0.001, o=0/10)
./benchmarks/implementation.js:              0.7 ops/sec (±0.0, p=0.001, o=0/10)
./benchmarks/textencoder.js:                11.9 ops/sec (±1.0, p=0.001, o=0/10)

5MB limit:     6’318.7 ops/sec (±743.3, p=0.001, o=8/100) severe outliers=6
50MB limit:      551.8 ops/sec (±7.6, p=0.001, o=7/100) severe outliers=4
500MB limit:      51.5 ops/sec (±4.6, p=0.001, o=6/100) severe outliers=4

I still believe this is a useful function to have: there are more than 10k results for Buffer.byteLength( on GitHub (looking around, most of them seem to pass in strings, although the API accepts Buffers and other typed arrays too).

It also seems like a lot of people are using it for Content-Length headers.


WebReflection commented on August 12, 2024

I'm not 100% sure this is correct, but it's also pretty slow, and I'm starting to wonder whether the slowness comes directly from iterating the string's internal code points:

"use strict"
module.exports = (input) => {
  let total = 0;
  for (const c of input) {
    const p = c.codePointAt(0);
    if (p < 0x80) total += 1;
    else if (p < 0x800) total += 2;
    else total += (p & 0xD800) ? 4 : 3;
  }
  return total;
};

Results on my laptop:

./benchmarks/blob.js:           405’174.6 ops/sec (±5’563.9, p=0.001, o=6/100) severe outliers=2
./benchmarks/buffer.js:         45’447’421.7 ops/sec (±659’453.6, p=0.001, o=0/100)
./benchmarks/codepoint.js:      15’096’778.8 ops/sec (±185’463.1, p=0.001, o=0/100)
./benchmarks/implementation.js: 65’565’103.6 ops/sec (±1’127’578.0, p=0.001, o=4/100) severe outliers=2
./benchmarks/textencoder.js:    2’698’465.4 ops/sec (±97’198.0, p=0.001, o=0/100)
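The sketch below does the same count over UTF-16 code units with charCodeAt instead of for...of, so no one-character string is created per iteration. It has not been run through the benchmarks above, so any speedup is an assumption. utf8ByteLength is a hypothetical name. The sketch assumes TextEncoder's behaviour of replacing lone surrogates with U+FFFD, which takes 3 bytes:

```typescript
// Hypothetical helper: counts the UTF-8 bytes TextEncoder would produce,
// without allocating a buffer or creating a string iterator.
function utf8ByteLength(input: string): number {
  let total = 0;
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i);
    if (unit < 0x80) {
      total += 1;
    } else if (unit < 0x800) {
      total += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < input.length) {
      const next = input.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Valid surrogate pair: one supplementary code point, 4 bytes.
        total += 4;
        i++; // skip the low surrogate we just consumed
      } else {
        // Lone high surrogate: encoded as U+FFFD (3 bytes).
        total += 3;
      }
    } else {
      // Rest of the BMP, lone low surrogates, and a high surrogate at the
      // very end of the string (the last two become U+FFFD): 3 bytes each.
      total += 3;
    }
  }
  return total;
}
```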


WebReflection commented on August 12, 2024

I did some extra tests to verify whether buffer creation is the cause of the slowdown, and these results confirm it:

With a new buffer each time:

"use strict"
let input = require("../input")
let encoder = new TextEncoder()
module.exports = () => {
  // size as worst case scenario
  const ui8Array = new Uint8Array(input.length * 4);
  return encoder.encodeInto(input, ui8Array).written;
}

This is still faster than encoder.encode(input).byteLength:

./benchmarks/textencoder.js:    3’329’442.3 ops/sec (±174’287.7, p=0.001, o=0/100)

Now, without creating a new buffer on each call:

"use strict"
let input = require("../input")
let encoder = new TextEncoder()
// size as worst case scenario
const ui8Array = new Uint8Array(input.length * 4);
module.exports = () => {
  return encoder.encodeInto(input, ui8Array).written;
}

The result is better than the code-point loop:

./benchmarks/textencoder.js:    23’922’510.0 ops/sec (±547’321.7, p=0.001, o=4/100) severe outliers=3
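The second snippet above reuses one buffer but sizes it for a single fixed input. For inputs of varying length, one way to keep the allocation-free path is to cache a scratch buffer and grow it only when needed. The following is a minimal sketch. Utf8LengthCounter is a hypothetical name. The sketch sizes the buffer at 3 bytes per UTF-16 code unit, which is the UTF-8 worst case:

```typescript
// Hypothetical helper: keeps one scratch buffer alive across calls and only
// reallocates when an input could need more room than it currently has.
class Utf8LengthCounter {
  private readonly encoder = new TextEncoder();
  private scratch = new Uint8Array(1024);

  byteLength(input: string): number {
    // 3 bytes per UTF-16 code unit is the worst case
    // (a surrogate pair is 2 units -> 4 bytes, i.e. 2 bytes per unit).
    const needed = input.length * 3;
    if (needed > this.scratch.length) {
      // Grow geometrically so a run of slightly larger inputs
      // doesn't reallocate on every call.
      this.scratch = new Uint8Array(Math.max(needed, this.scratch.length * 2));
    }
    return this.encoder.encodeInto(input, this.scratch).written;
  }
}
```

The trade-off is memory. The buffer stays as large as the biggest input seen so far. For multi-megabyte payloads, a size cap (falling back to a fresh allocation or to a limit check like maxLimitCheck) may be preferable.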

I suppose a method that only counts the byte length would make it possible to get performance closer to the Node.js Buffer result.
