Comments (5)
This, as you correctly identified in your comment on another issue, is likely a byte order mark (BOM), which terminates the processing. (By the way, there is no such encoding as DOS-UTF8; there is only UTF-8.)
Unfortunately, I don't think I want to add special handling for it in rust-xml. Ideally this should be handled at a low level by the I/O streams library, and rust-xml should consume properly decoded streams of characters. There is an encoding library, rust-encoding, which could probably serve as a basis for this, but I'm not sure whether it provides proper streams or only buffered decoding.
Another reason I don't want to add BOM handling directly to the library is that a BOM should not be used in UTF-8 encoded files; it is not even recommended by the W3C.
For now, your best bet is to drop the first character from the buffer if it is a BOM. However, as far as I'm aware, Rust's Buffer does not provide a way to "unread" characters (though that would be quite natural), so you will have to either read your document into memory entirely and slice off the first three bytes (the UTF-8 BOM is the byte sequence EF BB BF), or write your own Buffer or even Reader implementation that filters out the first few bytes if they are a UTF-8 BOM.
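Both workarounds described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it is written against the current std::io::Read trait rather than the 2014-era Buffer and Reader traits mentioned in the comment, and the names strip_bom and skip_bom are hypothetical helpers, not part of rust-xml or xml-rs.

```rust
use std::io::{self, Cursor, Read};

/// The UTF-8 encoding of U+FEFF (the byte order mark).
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// In-memory variant (hypothetical helper): returns `bytes` without a
/// leading UTF-8 BOM, or `bytes` unchanged if there is none.
fn strip_bom(bytes: &[u8]) -> &[u8] {
    if bytes.starts_with(&UTF8_BOM) {
        &bytes[UTF8_BOM.len()..]
    } else {
        bytes
    }
}

/// Streaming variant (hypothetical helper): peeks at up to three bytes of
/// `reader`. If they are a BOM they are dropped; otherwise they are
/// replayed in front of the rest of the stream, emulating an "unread".
fn skip_bom<R: Read>(mut reader: R) -> io::Result<impl Read> {
    let mut head = [0u8; 3];
    let mut filled = 0;
    // `read` may return fewer bytes than requested, so loop until the
    // buffer is full or the stream ends.
    while filled < head.len() {
        match reader.read(&mut head[filled..])? {
            0 => break,
            n => filled += n,
        }
    }
    // Replay nothing if the prefix was a BOM, otherwise everything we read.
    let keep = if head[..filled] == UTF8_BOM[..] { 0 } else { filled };
    Ok(Cursor::new(head).take(keep as u64).chain(reader))
}
```

For a document already held in memory, strip_bom(&bytes) gives a slice that can be passed to the parser. For a file or socket, the value returned by skip_bom(file)? implements Read, so it can be handed to anything that accepts a reader.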
from xml-rs.
Hi,
have you mentioned this to the Rust developers?
Philippe
@Phiroc, there was an RFC on something like that, but it was postponed. I have now looked more closely at rust-encoding, and it seems to have facilities that would allow implementing streaming decoding, but I'm not sure I'm able to implement such a thing, at least not right now.
There is a new library, encoding_rs, which explicitly supports streaming encoding and decoding. I guess it could be used for this.
Latin-1, ASCII, and UTF-16 are now supported.
I'm not sure whether there's demand for other encodings, such as legacy code pages or non-Unicode CJK encodings. Please open a new issue if you need other encodings.