Comments (4)
This is a tough one. It would basically require the VCardReader to iterate
over the vCard one byte at a time so that it could change the character
encoding of the property value when it finds a "CHARSET" parameter.
Original comment by mike.angstadt
on 22 Nov 2014 at 4:20
- Changed title: Property values with encodings different from the input stream's encoding not read correctly (vCard 2.1)
from ez-vcard.
One workaround is to read the file with an ISO-8859-1 reader and re-encode the value afterwards:
InputStream is = new FileInputStream("test/resources/Herr Steve Jobs.vcf");
InputStreamReader isr = new InputStreamReader(is, StandardCharsets.ISO_8859_1);
List<VCard> vCards = Ezvcard.parse(isr).all();
// ISO-8859-1 maps every byte to exactly one char, so this recovers the original bytes
byte[] raw = vCards.get(0).getAddresses().get(0).getStreetAddress().getBytes(StandardCharsets.ISO_8859_1);
String charset = vCards.get(0).getAddresses().get(0).getCharset();
return new String(raw, charset);
This way I can trick the reader (which is char-based and multibyte-aware) into reading single
bytes, and then decode those bytes with the encoding given in the parameter.
But you are right, the proper fix would be to change the inner logic in the
library from reading chars to streaming bytes.
The method parse(Reader reader) shouldn't exist. At the end of a line you would
convert the value bytes that were read into a string, using the encoding from the
CHARSET parameter, or UTF-8 (vCard 3.0+), or the system encoding (old vCards without
the parameter).
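To illustrate the idea, here is a minimal sketch of decoding one already-unfolded property line from raw bytes. It assumes the property name and parameters are ASCII-compatible and that no quoted parameter value contains a colon. The names LineValueDecoder and decodeValue are hypothetical and not part of ez-vcard.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LineValueDecoder {
    /**
     * Decodes the value of one unfolded property line, e.g. "ADR;CHARSET=UTF-8:...".
     * The part before the first ':' is read as ASCII; the rest is decoded with the
     * CHARSET parameter if present, otherwise with the given fallback.
     */
    static String decodeValue(byte[] line, Charset fallback) {
        int colon = -1;
        for (int i = 0; i < line.length; i++) {
            if (line[i] == ':') {
                colon = i;
                break;
            }
        }
        if (colon < 0) {
            throw new IllegalArgumentException("no ':' in property line");
        }

        // Property name and parameters are assumed to be plain ASCII.
        String header = new String(line, 0, colon, StandardCharsets.US_ASCII);
        Charset charset = fallback;
        for (String part : header.split(";")) {
            if (part.regionMatches(true, 0, "CHARSET=", 0, 8)) {
                charset = Charset.forName(part.substring(8));
            }
        }

        // Only now are the value bytes turned into characters.
        return new String(line, colon + 1, line.length - colon - 1, charset);
    }
}
```

The fallback argument is where the caller would pass UTF-8 for vCard 3.0+ or the system encoding for older vCards.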
I'm surprised nobody has noticed this game-breaking bug so far. Or perhaps no
non-English-speaking programmer has reported it and they looked for another library
instead. If this were on GitHub, I'd probably send you a pull request if you don't want
to fix it yourself.
Original comment by [email protected]
on 22 Nov 2014 at 6:16
A patch would be most welcome. :) The code would need to be clean and easy to
read, of course. :P
I was always under the impression that the CHARSET parameter was intended to be
used in conjunction with quoted-printable values, so that the parser knows which
character encoding to use when decoding the quoted-printable string.
Are you creating this vCard yourself or is Outlook generating it? If you are
creating it yourself, one work-around would be to encode the property value in
quoted-printable encoding.
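For reference, this is roughly how CHARSET and quoted-printable fit together: the "=XX" escapes are first turned back into raw bytes, and only then are the bytes decoded with the named character set. The sketch below is a simplified illustration (it ignores soft line breaks and assumes all unescaped characters are ASCII); QuotedPrintableSketch is a hypothetical name, not ez-vcard code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

final class QuotedPrintableSketch {
    /** Decodes "=XX" escapes into bytes, then decodes the bytes with the given charset. */
    static String decode(String encoded, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '=' && i + 2 < encoded.length()) {
                // Two hex digits after '=' form one raw byte.
                out.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                // Unescaped characters are assumed to be plain ASCII.
                out.write((byte) c);
            }
        }
        return new String(out.toByteArray(), charset);
    }
}
```

With this, a value such as "M=C3=BCller" decodes correctly when CHARSET is UTF-8, and "M=FCller" decodes to the same text when CHARSET is ISO-8859-1, regardless of the encoding of the file as a whole.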
Original comment by mike.angstadt
on 22 Nov 2014 at 10:58
I got this to work by modifying the parser to read one byte at a time instead of one char at a time. But it only works for ASCII-compatible character sets because each byte had to be cast to a char in order to parse the vCard syntax (the property name, parameters, etc). If the vCard file as a whole is encoded in something like UTF-16 (which encodes each character in 2 bytes instead of 1), it fails.
This "cast byte to char" approach also fails for a number of more obscure character sets. I tested this by looping through all available character sets supported by the JVM and saving a vCard file in each one. I then attempted to read the file using the "cast byte to char" approach, and many failed.
String s = "BEGIN:VCARD\r\nVERSION:4.0\r\nFN:Name\r\nEND:VCARD\r\n";
for (Charset c : Charset.availableCharsets().values()) {
File file = new File("temp.vcf");
FileOutputStream out = new FileOutputStream(file);
BufferedWriter w = new BufferedWriter(new OutputStreamWriter(out, c));
w.write(s);
w.close();
//read the vCard file...
}
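A quicker way to predict which character sets will break the "cast byte to char" approach is to check whether a character set encodes the vCard syntax to the same bytes as US-ASCII. The sketch below does that for a sample string; it only proves compatibility for the characters in the sample, and AsciiCompatibility is a hypothetical helper name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class AsciiCompatibility {
    /** Returns true if the charset encodes the ASCII sample to the same bytes as US-ASCII. */
    static boolean encodesLikeAscii(Charset charset, String asciiSample) {
        if (!charset.canEncode()) {
            // Decode-only charsets cannot be used to write the sample at all.
            return false;
        }
        byte[] expected = asciiSample.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(expected, asciiSample.getBytes(charset));
    }

    public static void main(String[] args) {
        String sample = "BEGIN:VCARD\r\nVERSION:4.0\r\nFN:Name\r\nEND:VCARD\r\n";
        for (Charset c : Charset.availableCharsets().values()) {
            if (!encodesLikeAscii(c, sample)) {
                System.out.println("Not ASCII-compatible for this sample: " + c.name());
            }
        }
    }
}
```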
The workaround described by David only works if the vCard file is encoded in an ASCII-compatible character encoding (which ISO-8859-1 is). If the file is encoded in, say, UTF-16, it fails because the parser is trying to parse the file using ISO-8859-1, which is not compatible with UTF-16.
The problem boils down to this: How do you parse a file that contains text encoded in multiple character encodings? This is not something that happens often. 99.99% of the time, a text file is encoded in a single character set, not multiple.
The Reader class does not let you switch character encodings mid-stream. _Therefore, the only way to do this is to treat the vCard file as a binary file and manually convert each byte to a character as it is read off the stream, switching character sets when the property value is reached._ How this is done, I don't know. It might be possible using the CharsetDecoder class.
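As a rough sketch of what the CharsetDecoder route could look like: once the parser knows which bytes make up a property value, it can decode just that slice with a decoder for the property's own character set. This assumes the value's byte range has already been located, which is the hard part and is not shown here. SliceDecoder is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

final class SliceDecoder {
    /** Decodes data[offset, offset + length) with the given charset, failing on invalid bytes. */
    static String decode(byte[] data, int offset, int length, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                // Report bad input instead of silently substituting U+FFFD.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(data, offset, length)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("bytes are not valid " + charset.name(), e);
        }
    }
}
```

Reporting malformed input makes a wrong CHARSET guess visible as an exception rather than as replacement characters in the parsed value.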
I tried wrapping the raw InputStream in a new Reader object when the property value was reached, but that didn't work. For some reason, its read() method returned -1, even though the stream has not ended. This problem can be demonstrated as follows:
@Test
public void multiple_readers() throws Exception {
String s = "Hello world!";
byte[] b = s.getBytes("UTF-8");
ByteArrayInputStream in = new ByteArrayInputStream(b);
Reader r1 = new InputStreamReader(in, "UTF-8");
Reader r2 = new InputStreamReader(in, "UTF-8");
assertEquals('H', r1.read());
assertEquals('e', r2.read()); //fails
}
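A likely explanation: the InputStreamReader documentation notes that more bytes may be read ahead from the underlying stream than are needed for the current read. With an input this short, the first reader can drain the whole stream into its internal buffer, so the second reader sees end-of-stream. The sketch below makes that visible; ReadAheadDemo is a hypothetical name, and the exact read-ahead amount depends on the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ReadAheadDemo {
    /** Returns { first char from r1, bytes left in the stream afterwards, first result from r2 }. */
    static int[] run() {
        try {
            byte[] b = "Hello world!".getBytes(StandardCharsets.UTF_8);
            ByteArrayInputStream in = new ByteArrayInputStream(b);

            Reader r1 = new InputStreamReader(in, StandardCharsets.UTF_8);
            int first = r1.read();          // asks for one char, but r1 may buffer many bytes
            int left = in.available();      // how much of the stream r1 left behind

            Reader r2 = new InputStreamReader(in, StandardCharsets.UTF_8);
            int second = r2.read();         // reads from whatever is left in the stream
            return new int[] { first, left, second };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("r1.read() = " + r[0] + ", bytes left = " + r[1] + ", r2.read() = " + r[2]);
    }
}
```

So switching encodings by layering a second Reader on the same InputStream cannot work reliably; the bytes for the property value have to be pulled from the stream directly.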