Comments (18)
The figure for iso2022jp-encode-href-errors-misc ought to be 79.
from encoding.
List of bugs raised:
- https://bugs.webkit.org/show_bug.cgi?id=159887
- https://bugzilla.mozilla.org/show_bug.cgi?id=1285398
- https://bugs.chromium.org/p/chromium/issues/detail?id=626399
- https://developer.microsoft.com/en-us/microsoft-edge/platform/issues/8202212/
from encoding.
Chromium:
- form/href-encoding-misc: Out of 93, 30 characters (Cf, default ignorable) share the same cause as #58, #59, #61, #62. The rest seem to be half-width Katakana. I remember raising an issue about this somewhere (and taking action), but I couldn't find it.
- form/href-encoding: 373 characters. Mostly CJK Ideographs. Chromium treats them as not covered by ISO-2022-JP. Need to investigate. ISO-2022-JP in ICU (used by Chrome) shares its table with Shift_JIS (and Shift_JIS in Chromium passes the tests).
from encoding.
@jungshik is this the bug report about half-width katakana that you were looking for?
https://bugs.chromium.org/p/chromium/issues/detail?id=544402&thanks=544402&ts=1445064020
from encoding.
@r12a, no, that's not what I had in mind. I remember doing something (at least filing a bug) to take a look at what you reported here, half-width Katakana in ISO-2022-JP, but I couldn't find it.
If you have the same issue in UTF-8 at http://r12a.github.io/uniview/?block=halfwidth_and_fullwidth_forms (which I couldn't reproduce on my Mac Chrome in the past; I didn't try it today), it cannot be related to the encoding conversion.
from encoding.
Two decode expectations for malformed sequences seem wrong:
Fail escape start: 1B 65 79 56 1B 28 42 assert_equals: expected "�eByV" but got "�eyV"
Fail escape: 1B 24 65 79 56 1B 28 42 assert_equals: expected "�e$yV" but got "�$eyV"
from encoding.
Firefox Nightly 56 is much improved, but encoding errors still has 63 failures and decoding errors has 2 failures.
from encoding.
Today and yesterday i updated the results at https://www.w3.org/International/tests/repo/results/encoding-dbl-byte.en#iso2022jp for Firefox, FNightly, Chrome, and Canary. The latest summary is:
from encoding.
Firefox Nightly 56 is much improved, but encoding errors still has 63 failures and decoding errors has 2 failures.
Per earlier comment, the decoder error handling failures are test suite bugs.
The encoder failures are due to the test suite not having been updated to account for the spec change to half-width katakana handling.
from encoding.
The encoder failures are due to the test suite not having been updated to account for the spec change to half-width katakana handling.
@hsivonen i updated the encoder algorithm used by the tests. I haven't updated the results page yet, but you can run the tests from https://www.w3.org/International/tests/repo/results/encoding-dbl-byte.en#iso2022jp (click on the title in the left column).
The result of that fix is that Chrome and Safari now pass iso2022jp-encode-form-errors-misc.html cleanly. Firefox however still sticks on 8 characters (not katakana), so i'm guessing that may be a FF bug(?)
from encoding.
Two decode expectations for malformed sequences seem wrong:
Fail escape start: 1B 65 79 56 1B 28 42 assert_equals: expected "�eByV" but got "�eyV"
Fail escape: 1B 24 65 79 56 1B 28 42 assert_equals: expected "�e$yV" but got "�$eyV"
@hsivonen @annevk i fixed the first, which was indeed a bug. (Wrong expectations.)
However, the second test you mention above still fails in FF, Chrome and Safari.
I suspect it may be a bug in the decoder algorithm at https://encoding.spec.whatwg.org/#iso-2022-jp-decoder
Step escape.8 says: "Prepend lead and byte to stream."
To get what the browsers are actually returning, i think it needs to say
"Prepend byte and lead to stream"
Or perhaps better, specify explicitly the order in which those two should end up when prepended.
from encoding.
I should mention that i tested those in FF 55.
from encoding.
What it means is that they're to be prepended together in the order given, as specified at https://encoding.spec.whatwg.org/#concept-stream-prepend. I'm not sure how you can read it any other way.
from encoding.
It would seem weird to prepend lead and then prepend byte (aka prepend byte and lead) as that would have them be in the reverse order when you read that stream.
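The two readings under discussion can be compared with a minimal sketch. This is only an illustration, not the spec's or any browser's code: the stream is modelled as a Python `deque`, and the helper names `prepend_together` and `prepend_one_at_a_time` are hypothetical.

```python
from collections import deque


def prepend_together(stream: deque, tokens: list) -> None:
    """Insert all tokens, in the given order, before the first token in the stream."""
    # extendleft() pushes items one by one, so reverse first to keep the order.
    stream.extendleft(reversed(tokens))


def prepend_one_at_a_time(stream: deque, tokens: list) -> None:
    """The other reading: prepend the first token, then prepend the second."""
    for token in tokens:
        stream.appendleft(token)


lead, byte = 0x24, 0x65          # '$' and 'e' from the failing test
rest = [0x79, 0x56]              # 'y' and 'V', still unread

together = deque(rest)
prepend_together(together, [lead, byte])

one_by_one = deque(rest)
prepend_one_at_a_time(one_by_one, [lead, byte])

print(bytes(together).decode("ascii"))    # lead is read first
print(bytes(one_by_one).decode("ascii"))  # byte is read first
```

With the first reading the stream is read back as lead then byte, which matches the original byte order of the input; the second reading reverses the pair.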
from encoding.
encoding_rs went into Firefox 56, so testing 56 is more useful than testing 55.
I see one failure (escape: 1B 24 65 79 56 1B 28 42 | assert_equals: expected "�e$yV" but got "�$eyV") in Firefox Nightly. This is clearly a test case bug with the test case having e and $ reversed compared to the ASCII interpretation of the input bytes.
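To make the ASCII interpretation concrete, here is a rough sketch of the relevant part of the ISO-2022-JP decoder, written from a reading of https://encoding.spec.whatwg.org/#iso-2022-jp-decoder. It is not encoding_rs or any browser's code. It models only the ASCII, escape start and escape states, always replaces errors with U+FFFD, and raises on escape sequences that would switch to another state. The names `decode_ascii_only` and `UNSUPPORTED` are made up for this example.

```python
from collections import deque

ESC = 0x1B
REPLACEMENT = "\ufffd"
# Valid escape sequences that switch to states this sketch does not model.
UNSUPPORTED = {(0x28, 0x4A), (0x28, 0x49), (0x24, 0x40), (0x24, 0x42)}


def decode_ascii_only(data: bytes) -> str:
    """Decode with only the ASCII, escape start and escape states (sketch)."""
    stream = deque(data)
    out = []
    state = "ascii"
    lead = 0x00
    output_flag = False
    while True:
        byte = stream.popleft() if stream else None  # None marks end of input
        if state == "ascii":
            if byte is None:
                break
            if byte == ESC:
                state = "escape start"
            elif byte <= 0x7F and byte not in (0x0E, 0x0F):
                output_flag = False
                out.append(chr(byte))
            else:
                output_flag = False
                out.append(REPLACEMENT)
        elif state == "escape start":
            if byte in (0x24, 0x28):
                lead = byte
                state = "escape"
            else:
                # Not an escape sequence: put the byte back and report an error.
                if byte is not None:
                    stream.appendleft(byte)
                output_flag = False
                state = "ascii"
                out.append(REPLACEMENT)
        else:  # state == "escape"
            saved_lead, lead = lead, 0x00
            if (saved_lead, byte) == (0x28, 0x42):
                # ESC ( B switches to ASCII; two switches with no output
                # in between count as an error.
                state = "ascii"
                if output_flag:
                    out.append(REPLACEMENT)
                output_flag = True
            elif (saved_lead, byte) in UNSUPPORTED:
                raise NotImplementedError("only the ASCII state is modelled")
            else:
                # Prepend lead and byte together, so lead is read first.
                tokens = [saved_lead] if byte is None else [saved_lead, byte]
                stream.extendleft(reversed(tokens))
                output_flag = False
                state = "ascii"
                out.append(REPLACEMENT)
    return "".join(out)


escape_start = decode_ascii_only(bytes.fromhex("1B 65 79 56 1B 28 42"))
escape = decode_ascii_only(bytes.fromhex("1B 24 65 79 56 1B 28 42"))
```

Under these assumptions `escape_start` is U+FFFD followed by `eyV`, and `escape` is U+FFFD followed by `$eyV`, which are the "got" values in the two reported failures.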
from encoding.
What it means is that they're to be prepended together in the order given, as specified at https://encoding.spec.whatwg.org/#concept-stream-prepend. I'm not sure how you can read it any other way.
I read "those tokens must be inserted, in given order" as "insert the first one, and then insert the second one". Depends whether 'in given order' refers to 'insert' or 'the tokens'. I read it as "prepend lead, then byte to stream". (I'll admit that i was following the wording rather than the deep logic of what was going on.)
Something like you just said may be clearer, eg. Prepend lead and byte together to stream.
Anyway, i'll fix it.
from encoding.
Now that Firefox passes all these tests and a year has passed, I'm happy to consider this done. A new issue would also be less noisy at this point, were one warranted.
If you want to pursue changing the wording here I'd be open to that by the way, but let's discuss that in a new issue.
from encoding.
2. form/href-encoding: 373 characters. Mostly CJK Ideographs. Chromium treats them as not covered by ISO-2022-JP. Need t
It's now fixed in Chromium's ToT. It'll be included in the next Canary and in Chrome 72 (which will turn stable in January; dev/beta before that). https://crbug.com/901255
from encoding.