crowlKats commented on August 12, 2024

I agree about the byte stream; that would definitely be great. I didn't take that concept into consideration when opening this issue, so I guess this issue is redundant.
One reason LineStream is Uint8Array-based was to align it with a utility we have that does the equivalent with Deno's Deno.Reader and Deno.Writer, so that there is a 1-to-1 mapping and people can effortlessly switch over to the WHATWG streams-based one. Looking back on it with this issue in mind, though, I see that this might have been a mistake.

from encoding.

annevk commented on August 12, 2024

cc @ricea @MattiasBuelens

MattiasBuelens commented on August 12, 2024

I think this is intentional. Readable byte streams also don't allow enqueuing empty chunks, and it's likely that we'll want to make TextEncoderStream.readable a proper byte stream in the future:

const rs = new ReadableStream({
  type: "bytes",
  start(controller) {
    controller.enqueue(new Uint8Array(0)); // throws
  }
});
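
The difference between the two stream types can be checked directly. The sketch below (the helper name `emptyEnqueueThrows` is hypothetical) constructs a default stream and a `type: "bytes"` stream and tries to enqueue a zero-length `Uint8Array` in each; per the Streams spec, only the byte stream's controller rejects it, with a TypeError.

```javascript
// Hypothetical helper: reports whether enqueuing a zero-length Uint8Array
// throws for a ReadableStream of the given type ("bytes" or undefined).
function emptyEnqueueThrows(type) {
  let threw = false;
  new ReadableStream({
    type,
    // start() runs synchronously inside the constructor, so `threw` is
    // already set by the time this function returns.
    start(controller) {
      try {
        controller.enqueue(new Uint8Array(0));
      } catch {
        // Catching here keeps the constructor itself from throwing.
        threw = true;
      }
    },
  });
  return threw;
}

console.log(emptyEnqueueThrows(undefined)); // false: default streams accept empty chunks
console.log(emptyEnqueueThrows("bytes")); // true: byte streams reject them
```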

We also don't want to enqueue empty strings if we're in the middle of a multi-byte character:

const { readable, writable } = new TextDecoderStream("utf-8");
const reader = readable.getReader();
const writer = writable.getWriter();
const readPromise = reader.read();
writer.write(new Uint8Array([0xF0, 0x9F, 0x99]));
// readPromise is still pending
writer.write(new Uint8Array([0x82]));
const { done, value } = await readPromise;
// -> value == "🙂"
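
The same rule can be illustrated with a hand-rolled decoding transform. This is a minimal sketch, not how any engine implements TextDecoderStream; `createDecoderTransform` and `decodeChunks` are hypothetical helpers. It relies on `TextDecoder`'s `{ stream: true }` mode, which keeps an incomplete multi-byte sequence buffered internally and returns an empty string, and it simply skips enqueuing that empty output.

```javascript
// Hypothetical decoding transform that, like TextDecoderStream,
// never enqueues empty strings.
function createDecoderTransform(label = "utf-8") {
  const decoder = new TextDecoder(label);
  return new TransformStream({
    transform(chunk, controller) {
      // stream: true keeps a partial multi-byte sequence inside the decoder.
      const text = decoder.decode(chunk, { stream: true });
      // Nothing to emit when the chunk ended mid-character.
      if (text !== "") controller.enqueue(text);
    },
    flush(controller) {
      // Final call without stream: true flushes any leftover bytes.
      const text = decoder.decode();
      if (text !== "") controller.enqueue(text);
    },
  });
}

// Writes the byte chunks in order and collects every emitted string.
async function decodeChunks(byteChunks, label) {
  const { readable, writable } = createDecoderTransform(label);
  const writer = writable.getWriter();
  const reader = readable.getReader();
  // Write concurrently with reading so backpressure cannot stall us.
  const writing = (async () => {
    for (const bytes of byteChunks) await writer.write(bytes);
    await writer.close();
  })();
  const out = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    out.push(value);
  }
  await writing;
  return out;
}

// A single chunk "🙂"; nothing is enqueued for the first, incomplete write.
decodeChunks([new Uint8Array([0xf0, 0x9f, 0x99]), new Uint8Array([0x82])])
  .then((chunks) => console.log(chunks));
```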

I'm a bit surprised by Deno's LineStream design. I would expect a transform stream that splits text by line delimiters to accept strings as input and produce strings as output. Instead, it looks like it uses raw byte chunks as both input and output?

That means that LineStream is making an assumption about the text encoding, right? How exactly is that supposed to deal with multi-byte text encodings like UTF-16? For example:

new TextDecoder("utf-16").decode(new Uint8Array([0x41, 0x00, 0x0A, 0x00, 0x42, 0x00]));
// -> "A\nB"
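
To see why splitting raw bytes on 0x0A bakes in an encoding assumption, consider a character whose UTF-16 encoding contains a 0x0A byte without being a newline. The sketch below uses two hypothetical helpers: `encodeUtf16le` (needed because `TextEncoder` only produces UTF-8) and `countNewlineBytes`. U+010A ("Ċ") encodes as the bytes 0x0A 0x01 in UTF-16LE, so a byte-level splitter would see a line break that is not there.

```javascript
// Hypothetical helper: encodes a JS string as UTF-16LE bytes. JS strings are
// already sequences of UTF-16 code units, so each unit becomes two bytes.
function encodeUtf16le(str) {
  const bytes = new Uint8Array(str.length * 2);
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    bytes[2 * i] = unit & 0xff; // low byte first (little-endian)
    bytes[2 * i + 1] = unit >> 8; // then the high byte
  }
  return bytes;
}

// What a byte-based splitter treats as line breaks: every 0x0A byte.
function countNewlineBytes(bytes) {
  return bytes.filter((b) => b === 0x0a).length;
}

const text = "\u010A"; // "Ċ": contains no newline character
console.log(text.includes("\n")); // false
console.log(countNewlineBytes(encodeUtf16le(text))); // 1: the bytes are 0x0A 0x01
console.log(new TextDecoder("utf-16").decode(encodeUtf16le(text)) === text); // true
```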

I would expect you to first run these chunks through a TextDecoderStream and then split by line delimiters:

const readable = new ReadableStream({
  start(controller) {
    controller.enqueue(new Uint8Array([0x41, 0x00, 0x0A, 0x00, 0x42, 0x00]));
    controller.close();
  }
});

readable
  .pipeThrough(new TextDecoderStream("utf-16"))
  .pipeThrough(new LineStream());
// -> stream with chunks "A" and "B"
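
A string-based splitter of the kind described above could look roughly like this. It is a minimal sketch, not Deno's LineStream: `createTextLineSplitter` and `collect` are hypothetical names, and it assumes only `\n` and `\r\n` count as line delimiters.

```javascript
// Hypothetical string-in/string-out line splitter (not Deno's LineStream).
// Treats "\n" and "\r\n" as delimiters and buffers any partial line until
// the next chunk (or the end of the stream) arrives.
function createTextLineSplitter() {
  let pending = "";
  return new TransformStream({
    transform(chunk, controller) {
      const lines = (pending + chunk).split(/\r?\n/);
      // The last piece has no delimiter after it yet, so keep it for later.
      pending = lines.pop();
      for (const line of lines) controller.enqueue(line);
    },
    flush(controller) {
      // Emit a final line that was not terminated by a delimiter.
      if (pending !== "") controller.enqueue(pending);
    },
  });
}

// Reads every chunk from a stream into an array.
async function collect(stream) {
  const reader = stream.getReader();
  const chunks = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) return chunks;
    chunks.push(value);
  }
}

const utf16Bytes = new ReadableStream({
  start(controller) {
    controller.enqueue(new Uint8Array([0x41, 0x00, 0x0a, 0x00, 0x42, 0x00]));
    controller.close();
  },
});

// Decode first, then split: logs the lines "A" and "B".
collect(
  utf16Bytes
    .pipeThrough(new TextDecoderStream("utf-16"))
    .pipeThrough(createTextLineSplitter()),
).then((lines) => console.log(lines));
```

Splitting after decoding means the delimiter search only ever sees whole characters, whatever byte encoding the input used.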

Instead, with Deno's current LineStream, you get garbage:

readable
  .pipeThrough(new LineStream())
  .pipeThrough(new TextDecoderStream("utf-16"));
// -> stream with chunks "A", "䈀" and "�"
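
The garbage output can be reproduced with a byte-level splitter. The sketch below only approximates a Uint8Array-based LineStream (it is not Deno's code); `createByteLineSplitter`, `collect`, and `utf16Source` are hypothetical helpers. Because the split happens on the raw 0x0A byte, the second "line" starts with the high byte (0x00) of the UTF-16LE newline code unit, and every following code unit is shifted by one byte.

```javascript
// Hypothetical byte-level splitter approximating a Uint8Array-based
// LineStream (NOT Deno's actual code). It cuts at every 0x0A byte, which is
// only correct for ASCII-compatible encodings such as UTF-8.
function createByteLineSplitter() {
  let pending = new Uint8Array(0);
  return new TransformStream({
    transform(chunk, controller) {
      // Prepend bytes left over from the previous chunk.
      const buf = new Uint8Array(pending.length + chunk.length);
      buf.set(pending, 0);
      buf.set(chunk, pending.length);
      let start = 0;
      for (let i = 0; i < buf.length; i++) {
        if (buf[i] === 0x0a) {
          controller.enqueue(buf.slice(start, i)); // line without the 0x0A
          start = i + 1;
        }
      }
      pending = buf.slice(start);
    },
    flush(controller) {
      if (pending.length > 0) controller.enqueue(pending);
    },
  });
}

// Reads every chunk from a stream into an array.
async function collect(stream) {
  const reader = stream.getReader();
  const chunks = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) return chunks;
    chunks.push(value);
  }
}

// A factory, so each pipeline gets its own unlocked source stream.
function utf16Source() {
  return new ReadableStream({
    start(controller) {
      controller.enqueue(new Uint8Array([0x41, 0x00, 0x0a, 0x00, 0x42, 0x00]));
      controller.close();
    },
  });
}

// Split first, then decode: "A", "䈀" (U+4200), then "�" for the dangling byte.
collect(
  utf16Source()
    .pipeThrough(createByteLineSplitter())
    .pipeThrough(new TextDecoderStream("utf-16")),
).then((chunks) => console.log(chunks));
```

Using a `utf16Source()` factory instead of one shared `readable` is deliberate: `pipeThrough()` locks its source stream, so piping the same stream a second time throws a TypeError.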

crowlKats commented on August 12, 2024

Closing, as nothing is wrong with the spec; the problem is rather a badly implemented utility.
