Comments (9)
I'm unable to reproduce this. In irb:
>> $KCODE = 'u'
=> "u"
>> require 'htmlentities'
=> true
>> coder = HTMLEntities.new
=> #<HTMLEntities:0x7f58e53a71f0 @flavor="xhtml1">
>> coder.decode('FRANÇOIS')
=> "FRANÇOIS"
>> coder.decode('François')
=> "François"
If you can supply a minimal test case, I'll investigate.
from htmlentities.
Do you know of a way for me to be absolutely certain that I am feeding UTF-8 to the decoder? I have a unit test on that section of my app that also passes; however, when I run the app in full, I get this error. I don't discount a problem further down the line, but my investigations so far lead me to suspect either the decoder or what I'm feeding it.
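One way to approach the question above is to test whether the raw bytes form valid UTF-8 before they reach the decoder. The sketch below assumes Ruby 1.9 or later, where strings carry an encoding (Ruby 1.8.7, as used in this thread, has no `valid_encoding?`). The helper name `utf8_bytes?` is hypothetical and not part of htmlentities.

```ruby
# Hypothetical helper: true if the bytes of str are well-formed UTF-8.
# Requires Ruby 1.9+ (String#force_encoding, String#valid_encoding?).
def utf8_bytes?(str)
  # Work on a copy so the caller's string keeps its original encoding tag.
  str.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

utf8   = "FRAN\xC3\x87OIS".b  # Ç as the two UTF-8 bytes 0xC3 0x87
latin1 = "FRAN\xC7OIS".b      # Ç as the single Latin-1 byte 0xC7

puts utf8_bytes?(utf8)    # true
puts utf8_bytes?(latin1)  # false
```

A passing check does not prove the data was meant to be UTF-8 (pure ASCII passes, for instance), but Latin-1 text containing accented characters will usually fail it.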
from htmlentities.
UPDATE: In script/console, I get this:
>> $KCODE='u'
=> "u"
>> coder = HTMLEntities.new
=> #<HTMLEntities:0x2aaaac8274f0 @flavor="xhtml1">
>> coder.decode('FRANÇOIS')
=> "FRANÃOIS"
>> coder.decode('François')
=> "François"
In plain irb, the require 'htmlentities' didn't work; I probably need the full path.
We're using htmlentities 4.2.0 under Rails 2.3.8 and Ruby 1.8.7.
from htmlentities.
It looks like your terminal is not UTF-8. That's a separate problem.
from htmlentities.
That's what I thought too, but locale seems to think otherwise:
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
from htmlentities.
UPDATE: I found that further down the line in our code, I am trying to decode hex entities created by Nokogiri. The entity for Ç is the one that fails to decode properly: I'm getting à instead of Ç.
from htmlentities.
locale tells you what your programs are using, but it doesn't tell you what the terminal emulator is doing. (Which one are you using, by the way?)
However, let's try a different tack. Regardless of what your terminal is doing, the bytes should be the same. In irb:
>> coder.decode("ç").unpack("C*")
=> [195, 167]
>> coder.decode("Ç").unpack("C*")
=> [195, 135]
>> coder.decode("Ç").unpack("C*")
=> [195, 135]
What results do you get?
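For reference, the byte values above can be reproduced without htmlentities. The sketch below assumes Ruby 1.9 or later and a UTF-8 source file. It compares the UTF-8 and ISO-8859-1 encodings of Ç, then shows what happens when the UTF-8 bytes are misread as ISO-8859-1. That misreading is one plausible explanation for output like "FRANÃOIS", not a confirmed diagnosis.

```ruby
c_cedilla = "\u00C7"  # Ç, held as a UTF-8 string

# The same character has different byte sequences in the two encodings.
utf8_bytes   = c_cedilla.encode("UTF-8").unpack("C*")
latin1_bytes = c_cedilla.encode("ISO-8859-1").unpack("C*")
p utf8_bytes    # [195, 135]
p latin1_bytes  # [199]

# Misread the two UTF-8 bytes as two separate ISO-8859-1 characters.
misread = c_cedilla.dup.force_encoding("ISO-8859-1").encode("UTF-8")
p misread.unpack("U*")  # [195, 135], i.e. U+00C3 (Ã) and U+0087 (a control character)
```

The second code point is an invisible control character, which would explain why only Ã appears on screen.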
from htmlentities.
I'm using PuTTY in xterm mode.
I ran your example and got the same results. I've looked into this further and have determined that my tests are inadequate and misleading. It looks like my problem is that I have been feeding Latin-encoded input to your decoder, and this is the source of my difficulties. If I make sure that it is UTF-8, I get the proper decoding.
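A minimal sketch of the fix described above, assuming Ruby 1.9 or later and assuming that any input which is not valid UTF-8 is ISO-8859-1 (the thread does not say which Latin encoding was involved). The helper name `to_utf8` is hypothetical and not part of htmlentities.

```ruby
# Hypothetical helper: return a UTF-8 copy of raw.
# Assumption: bytes that are not valid UTF-8 are ISO-8859-1.
def to_utf8(raw)
  candidate = raw.dup.force_encoding(Encoding::UTF_8)
  return candidate if candidate.valid_encoding?

  # Reinterpret the bytes as ISO-8859-1, then transcode to UTF-8.
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

latin1 = "Fran\xE7ois".b  # ç as the single Latin-1 byte 0xE7
fixed  = to_utf8(latin1)

p fixed.encoding       # #<Encoding:UTF-8>
p fixed.unpack("C*")   # [70, 114, 97, 110, 195, 167, 111, 105, 115]
```

The result could then be passed to coder.decode as in the earlier examples. Input that is already valid UTF-8 passes through unchanged.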
Thank you for your time. Sorry to have wasted it.
from htmlentities.
Ah, yes, doing UTF-8 on Windows is usually harder than it should be. Glad you've got a bit closer to the problem.
from htmlentities.