Comments (9)

threedaymonk commented on August 16, 2024

I'm unable to reproduce this. In irb:

>> $KCODE = 'u'
=> "u"
>> require 'htmlentities'
=> true
>> coder = HTMLEntities.new
=> #<HTMLEntities:0x7f58e53a71f0 @flavor="xhtml1">
>> coder.decode('FRAN&Ccedil;OIS')
=> "FRANÇOIS"
>> coder.decode('Fran&ccedil;ois')
=> "François"

If you can supply a minimal test case, I'll investigate.

from htmlentities.

wksmall commented on August 16, 2024

Do you know of a way for me to be absolutely certain that I am feeding UTF-8 to the decoder? I have a unit test on that section of my app that also passes; however, when I run the app in full, I get this error. I don't for an instant discount a problem further down the line, but my investigations so far have led me to suspect the decoder or what I'm feeding it.

wksmall commented on August 16, 2024

UPDATE: In script/console, I get this:
>> $KCODE='u'
=> "u"
>> coder = HTMLEntities.new
=> #<HTMLEntities:0x2aaaac8274f0 @flavor="xhtml1">
>> coder.decode('FRANÇOIS')
=> "FRANÃOIS"
>> coder.decode('François')
=> "François"

In irb, the require 'htmlentities' didn't work. I probably need the full path.

We're using htmlentities 4.2.0 under Rails 2.3.8 and Ruby 1.8.7.

threedaymonk commented on August 16, 2024

It looks like your terminal is not UTF-8. That's a separate problem.

wksmall commented on August 16, 2024

That's what I thought too, but locale seems to think otherwise.

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

wksmall commented on August 16, 2024

UPDATE: I found that further down the line in our code, I am trying to decode hex entities created by Nokogiri. It is &#xC7; that is failing to decode properly. I'm getting à instead of Ç.

threedaymonk commented on August 16, 2024

locale tells you what your programs are using, but it doesn't tell you what the terminal emulator is doing. (Which one are you using, by the way?)

However, let's try a different tack. Regardless of what your terminal is doing, the bytes should be the same. In irb:

>> coder.decode("&ccedil;").unpack("C*")
=> [195, 167]
>> coder.decode("&Ccedil;").unpack("C*")
=> [195, 135]
>> coder.decode("&#xC7;").unpack("C*")
=> [195, 135]

What results do you get?
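
For comparison, the same kind of byte-level check can be run on plain strings without the gem. This is only a sketch and assumes Ruby 1.9 or later (the Ruby 1.8.7 used in this thread has no String#encode); it also assumes the "other" encoding in play is ISO-8859-1, which the thread does not confirm.

```ruby
# Assumption: Ruby 1.9+ and ISO-8859-1 as the non-UTF-8 encoding.
utf8   = [0xC7].pack("U")            # "Ç" built from its code point, encoded as UTF-8
latin1 = utf8.encode("ISO-8859-1")   # the same character as a single Latin-1 byte

utf8_bytes   = utf8.unpack("C*")     # two bytes when encoded as UTF-8
latin1_bytes = latin1.unpack("C*")   # one byte when encoded as Latin-1

p utf8_bytes
p latin1_bytes
```

If the bytes reaching the decoder look like the second array rather than the first, the input is not UTF-8, whatever the terminal displays.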

wksmall commented on August 16, 2024

I'm using PuTTY in xterm mode.

I ran your example and got the same results. I've looked into this further and have determined that my tests are inadequate and misleading. It looks like my problem is that I have been feeding Latin-encoded text to your decoder, and this is the source of my difficulties. If I make sure that it is UTF-8, I get the proper decoding.

Thank you for your time. Sorry to have wasted it.

threedaymonk commented on August 16, 2024

Ah, yes, doing UTF-8 on Windows is usually harder than it should be. Glad you've got a bit closer to the problem.
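
The resolution reached above (make sure the input is UTF-8 before handing it to the decoder) might be sketched as follows. The helper name to_utf8 is hypothetical and not part of htmlentities; the sketch assumes Ruby 2.0 or later and assumes that any input that is not valid UTF-8 is ISO-8859-1, which may not hold for every data source.

```ruby
# Hypothetical helper, not part of htmlentities.
# Assumptions: Ruby 2.0+, and non-UTF-8 input is ISO-8859-1 (Latin-1).
def to_utf8(bytes)
  candidate = bytes.dup.force_encoding("UTF-8")
  # Already well-formed UTF-8: return it unchanged.
  return candidate if candidate.valid_encoding?
  # Otherwise reinterpret the bytes as Latin-1 and transcode to UTF-8.
  bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")
end

latin1_input = "FRAN\xC7OIS".b   # raw Latin-1 bytes, like the failing input
utf8_input   = "FRAN\u00C7OIS"   # the same text, already UTF-8

fixed = to_utf8(latin1_input)
same  = to_utf8(utf8_input)

p fixed.unpack("C*")
p same == utf8_input
```

The result of to_utf8 would then be passed to coder.decode. Checking validity first avoids double-encoding text that is already UTF-8.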