Comments (4)
I agree with the byte stream idea; that would definitely be great. I didn't take that concept into consideration when opening this issue, so I guess this issue is redundant.
One reason for LineStream being Uint8Array-based was to align with a utility we have that does the equivalent with Deno's Deno.Reader & Deno.Writer, so that there is a 1-to-1 mapping and people can effortlessly change over to the WHATWG streams-based one. Looking back on it with this issue in mind, I see that might have been a mistake.
from encoding.
I think this is intentional. Readable byte streams also don't allow enqueuing empty chunks, and it's likely that we'll want to make TextEncoderStream.readable a proper byte stream in the future:
const rs = new ReadableStream({
  type: "bytes",
  start(controller) {
    controller.enqueue(new Uint8Array(0)); // throws
  }
});
We also don't want to enqueue empty strings if we're in the middle of a multi-byte character:
const { readable, writable } = new TextDecoderStream("utf-8");
const reader = readable.getReader();
const writer = writable.getWriter();
const readPromise = reader.read();
writer.write(new Uint8Array([0xF0, 0x9F, 0x99]));
// readPromise is still pending
writer.write(new Uint8Array([0x82]));
const { done, value } = await readPromise;
// -> value == "🙂"
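The same buffering can be seen without streams. A minimal sketch, assuming only the standard TextDecoder with its `stream: true` option (the variable names are just for illustration): the decoder returns an empty string while the character is incomplete, and that is the chunk TextDecoderStream declines to enqueue.

```javascript
// Decode the four UTF-8 bytes of U+1F642 in two steps.
const utf8Decoder = new TextDecoder("utf-8");

// First three bytes: the character is incomplete, so nothing is produced yet.
const partial = utf8Decoder.decode(new Uint8Array([0xF0, 0x9F, 0x99]), { stream: true });

// Final byte: the buffered bytes plus this one form the full character.
const completed = utf8Decoder.decode(new Uint8Array([0x82]), { stream: true });

console.log(JSON.stringify(partial));   // ""
console.log(completed);                 // 🙂
```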
I'm a bit surprised by Deno's LineStream design. I would expect a transform stream that splits text by line delimiters to accept strings as input and produce strings as output. Instead, it looks like it uses raw byte chunks as both input and output?
That means that LineStream is making an assumption about the text encoding, right? How exactly is that supposed to deal with multi-byte text encodings like utf-16? For example:
new TextDecoder("utf-16").decode(new Uint8Array([0x41, 0x00, 0x0A, 0x00, 0x42, 0x00]));
// -> "A\nB"
I would expect you to first run these chunks through a TextDecoderStream, and then split by line delimiters:
const readable = new ReadableStream({
  start(controller) {
    controller.enqueue(new Uint8Array([0x41, 0x00, 0x0A, 0x00, 0x42, 0x00]));
    controller.close();
  }
});
readable
  .pipeThrough(new TextDecoderStream("utf-16"))
  .pipeThrough(new LineStream());
// -> stream with chunks "A" and "B"
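For illustration, a string-in, string-out splitter along those lines could look roughly like this. It is only a sketch, not Deno's LineStream: `takeLines` and `createTextLineStream` are hypothetical names, and it splits on "\n" only (no "\r\n" handling).

```javascript
// Hypothetical helper: split already-decoded text on "\n", returning the
// complete lines and the unterminated remainder to carry into the next chunk.
function takeLines(pending, chunk) {
  const parts = (pending + chunk).split("\n");
  const rest = parts.pop(); // text after the last "\n", possibly ""
  return { lines: parts, rest };
}

// Hypothetical transform stream built on takeLines: strings in, strings out.
function createTextLineStream() {
  let pending = "";
  return new TransformStream({
    transform(chunk, controller) {
      const { lines, rest } = takeLines(pending, chunk);
      pending = rest;
      for (const line of lines) controller.enqueue(line);
    },
    flush(controller) {
      // Emit a final line that had no trailing "\n".
      if (pending !== "") controller.enqueue(pending);
    },
  });
}
```

Because it only ever sees decoded strings, a stream like this makes no assumption about the byte encoding; that is left entirely to the TextDecoderStream placed in front of it.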
Instead, with Deno's current LineStream, you get garbage:
readable
  .pipeThrough(new LineStream())
  .pipeThrough(new TextDecoderStream("utf-16"));
// -> stream with chunks "A", "䈀" and "�"
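To see where the garbage comes from, here is a small sketch that mimics a byte-level split by hand. It assumes the splitter cuts at the first 0x0A byte and drops it, which is my reading of the behaviour described above rather than a quote of Deno's implementation.

```javascript
// UTF-16LE bytes for "A\nB".
const utf16Bytes = new Uint8Array([0x41, 0x00, 0x0A, 0x00, 0x42, 0x00]);

// Cut at the first 0x0A byte, as a byte-oriented line splitter would.
const cut = utf16Bytes.indexOf(0x0A);
const beforeNewline = utf16Bytes.subarray(0, cut);  // [0x41, 0x00]
const afterNewline = utf16Bytes.subarray(cut + 1);  // [0x00, 0x42, 0x00]

// Decode the two pieces as a stream, then flush.
const utf16Decoder = new TextDecoder("utf-16le");
const decoded = [
  utf16Decoder.decode(beforeNewline, { stream: true }), // "A"
  utf16Decoder.decode(afterNewline, { stream: true }),  // 0x00 0x42 reads as U+4200
  utf16Decoder.decode(),                                // dangling 0x00 becomes U+FFFD
];
console.log(decoded);
```

The 0x00 that belonged to the newline code unit is left behind, so every later code unit is shifted by one byte.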
Closing, as nothing is wrong with the spec; the problem is rather a badly implemented utility.