Comments (22)
What settings? How?
from kramdown.
Could you please run the following:
ruby -e 'puts Encoding.default_external'
ruby -e 'puts Encoding.default_internal'
The solution is probably that the input needs to be converted to UTF-8 before being parsed. Not sure, though, if the output needs to be converted back...
$ ruby -e 'puts Encoding.default_external'
CP850
$ ruby -e 'puts Encoding.default_internal'
Regarding "The solution is probably that the input needs to be converted to UTF-8 before being parsed": I have also tried converting files saved as UTF-8, with the same input as described. The same thing happens.
Yeah, that's because your environment's external encoding is CP850, which means that all files read and all text input get labeled with the encoding CP850...
What I meant by "The solution..." is that kramdown needs to do this internally, i.e. convert the input to UTF-8. I will look at how the CSV library in Ruby's stdlib does this and will probably follow in its footsteps.
Can I work around this? I know nothing about Ruby; I just have it installed so I can use kramdown and Sass, but it sounds like a bad thing to have the "environment external encoding" set to anything other than UTF-8.
You can try calling kramdown in the following way until I can fix this (note that the input has to be valid UTF-8):
ruby --external-encoding UTF-8 -S kramdown
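For scripts you control, the same effect can be had per read instead of for the whole interpreter, because File.read accepts an encoding: option. A minimal sketch, assuming the Markdown file really is UTF-8; read_utf8 is a hypothetical helper, not part of kramdown, and it does not change how the kramdown executable itself reads files:

```ruby
require 'tmpdir'

# Hypothetical helper (not part of kramdown): read a file as UTF-8 no matter
# what Encoding.default_external is (e.g. CP850 in a Windows console).
def read_utf8(path)
  File.read(path, encoding: 'UTF-8')
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'sample.md')
  File.binwrite(path, "aa\u00E5\n")   # the UTF-8 bytes of "aaå\n"
  text = read_utf8(path)
  puts text.encoding                  # => UTF-8
  puts text.valid_encoding?           # => true
end
```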
Thanks, that works.
Possibly related:
C:\Users\Simon\Desktop>ruby aaå.rb
Hello, World!
C:\Users\Simon\Desktop>kramdown aaå.rb
C:/Ruby193/lib/ruby/gems/1.9.1/gems/kramdown-0.14.2/bin/kramdown:72:in `read': No such file or directory - aaå.rb (Errno::ENOENT)
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/kramdown-0.14.2/bin/kramdown:72:in `<top (required)>'
from C:/Ruby193/bin/kramdown:23:in `load'
from C:/Ruby193/bin/kramdown:23:in `<main>'
You may also want to globally change the default encoding to UTF-8 if that is what you use. See for example http://stackoverflow.com/questions/469163/how-to-set-the-default-encoding-in-windows-xp and http://stackoverflow.com/questions/11806512/ruby-1-9-wrong-file-encoding-on-windows.
And yes, this is likely related. However, this is a general problem: if you use UTF-8, you should set it as the encoding for your computer, because otherwise the CP850 encoding will always cause trouble.
Setting LANG fixes it for me:
LANG=en_US.CP850 kramdown aaå.rb
/usr/lib/ruby/gems/1.9.1/gems/kramdown-0.14.2/bin/kramdown:72:in `read': No such file or directory - aaå.rb (Errno::ENOENT)
from /usr/lib/ruby/gems/1.9.1/gems/kramdown-0.14.2/bin/kramdown:72:in `<top (required)>'
from /usr/bin/kramdown:23:in `load'
from /usr/bin/kramdown:23:in `<main>'
LANG=en_US.UTF-8 kramdown aaå.rb
<p>puts `Hello, World!'</p>
I ran into this problem today because Tilt doesn't handle encoding properly (rtomayko/tilt#75). This is how I patched the problem, in case it's useful to anyone.
Ruby 2
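require 'kramdown'

# Relabel the adapted source as UTF-8 (force_encoding does not transcode the bytes).
module Kramdown::Parser
  module EncodingFix
    def adapt_source(source)
      super.force_encoding('UTF-8')
    end
  end
  class Base; prepend EncodingFix; end
end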
Ruby <= 1.9
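require 'kramdown'

# Same fix without Module#prepend (added in Ruby 2.0): wrap the original method.
module Kramdown::Parser
  class Base
    alias old_adapt_source adapt_source
    def adapt_source(source)
      old_adapt_source(source).force_encoding('UTF-8')
    end
  end
end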
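Note that both patches use force_encoding, which only relabels the bytes and never converts them. That is the right tool when a string already holds UTF-8 bytes under the wrong label (as with Tilt here), but not when the bytes really are CP850; then String#encode is needed. A small sketch contrasting the two on a two-byte CP850 string chosen for illustration:

```ruby
# "aå" as stored in CP850: 'a' is 0x61 and 'å' is 0x86.
cp850 = [0x61, 0x86].pack('C*').force_encoding('CP850')

# force_encoding keeps the bytes and only swaps the label, so 0x86 is now broken UTF-8.
relabeled = cp850.dup.force_encoding('UTF-8')
puts relabeled.valid_encoding?   # => false

# encode transcodes each character: 'å' becomes the UTF-8 bytes 0xC3 0xA5.
transcoded = cp850.encode('UTF-8')
puts transcoded == "a\u00E5"     # => true
puts transcoded.bytes.inspect    # => [97, 195, 165]
```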
So, this does seem to be a bit more complicated... or funny, depending on how you look at it.
I took the example "å" from @lydell, put it in a file on Windows 7 with Notepad, and selected ANSI encoding. So, what does one expect here? I expected that the file would contain CP850-encoded characters, because that seems to be the encoding on the Windows 7 command line. But when you look up what ANSI encoding means, you see that it is actually CP-1252. So the file gets saved in CP-1252 format.
On the command line, Ruby reads it in as CP-850 (because this is the external encoding) and then outputs the result as CP-850, which leads to å becoming õ ... which is just not right.
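The mix-up can be reproduced in a few lines: the same byte means a different character in each code page. A sketch, assuming the file holds the single Windows-1252 byte for å:

```ruby
# Notepad's "ANSI" is Windows-1252 (Ruby also accepts the alias CP1252).
puts Encoding.find('CP1252') == Encoding::Windows_1252   # => true

# In Windows-1252, 'å' is stored as the single byte 0xE5.
saved = "\u00E5".encode('Windows-1252')
puts saved.bytes.inspect                                  # => [229]

# The console's external encoding is CP850, so Ruby labels those bytes CP850.
misread = saved.dup.force_encoding('CP850')

# Same byte, different character: it no longer converts back to 'å'.
puts misread.encode('UTF-8') == "\u00E5"                  # => false

# CP850 itself stores 'å' as a different byte.
puts "\u00E5".encode('CP850').bytes.inspect               # => [134]
```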
So... you are basically screwed on Windows if you expect a sane default environment because the command line encoding differs from the GUI encoding (or however one wants to phrase that).
However, since there is also still a bug in kramdown, I will fix it by converting the source string to UTF-8 in Kramdown::Parser::Base.adapt_source and converting the result back to the original encoding of the string in Kramdown::Converter::Base.convert. The back-conversion is not really needed in the most common use cases, because on terminal output or when writing to files Ruby automatically transcodes strings to the external encoding. However, when the string is further processed in Ruby, the caller probably expects a string in the same encoding as the one passed in.
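The round trip described above can be sketched in plain Ruby. This is an illustration only: convert_preserving_encoding is a hypothetical name, and the string interpolation stands in for the real parsing and conversion; it is not kramdown's actual code.

```ruby
# Hypothetical sketch: parse in UTF-8, hand the result back in the caller's encoding.
def convert_preserving_encoding(source)
  original = source.encoding
  unless source.valid_encoding?
    raise ArgumentError, "source is not valid #{original}"
  end
  utf8 = source.encode('UTF-8')     # work internally in UTF-8
  html = "<p>#{utf8.chomp}</p>\n"   # stand-in for parsing and converting
  html.encode(original)             # back to the encoding the caller gave us
end

input  = "aa\u00E5\n".encode('CP850')
output = convert_preserving_encoding(input)
puts output.encoding == Encoding::CP850   # => true
```

Checking valid_encoding? first means a mislabeled input fails loudly instead of producing garbled output.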
And the result of all this? If you save a file on Windows with a CP-850 encoding, kramdown will now work correctly. Just remember that saving a file in Notepad with the ANSI encoding does not mean CP-850 but CP-1252 (or WINDOWS-1252 as it is known to ruby)!
Coming to you with the next release of kramdown which will be the (spoiler alert) 1.0.0 😄
The problem with the input file name can't be solved by kramdown, since it is a general problem (Question: Is the encoding of file system paths different from the external encoding on the Windows 7 cmd command line? Answer: Yes, it seems so. Solution: For your sake and mine, please just set the default encoding to UTF-8 everywhere and use UTF-8 everywhere).
I just tried 1.0.0. I would like to confirm that the original test case now works! Thanks!
Converting a kramdown file with UTF-8 (without BOM) encoding now works out of the box, without changing any settings or typing extra things on the command line. Great!
However, the "possibly related" issue still persists, with the same error. @svnpenn's LANG fix does not work for me:
$ LANG=en-US.UTF-8 kramdown aaå.rb
c:/Ruby193/lib/ruby/1.9.1/optparse.rb:1351:in `===': invalid byte sequence in UTF-8 (ArgumentError)
from c:/Ruby193/lib/ruby/1.9.1/optparse.rb:1351:in `block in parse_in_order'
from c:/Ruby193/lib/ruby/1.9.1/optparse.rb:1347:in `catch'
from c:/Ruby193/lib/ruby/1.9.1/optparse.rb:1347:in `parse_in_order'
from c:/Ruby193/lib/ruby/1.9.1/optparse.rb:1341:in `order!'
from c:/Ruby193/lib/ruby/1.9.1/optparse.rb:1432:in `permute!'
from c:/Ruby193/lib/ruby/1.9.1/optparse.rb:1453:in `parse!'
from c:/Ruby193/lib/ruby/gems/1.9.1/gems/kramdown-1.0.0/bin/kramdown:16:in `<top (required)>'
from c:/Ruby193/bin/kramdown:23:in `load'
from c:/Ruby193/bin/kramdown:23:in `<main>'
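This traceback looks consistent with the file name arriving in ARGV labeled as UTF-8 while its bytes are in the console's code page: Ruby raises ArgumentError as soon as a regular expression is matched against such a string, and OptionParser appears to dispatch arguments with regexp matches. A sketch reproducing that, plus one possible repair; the assumption that the bytes are Windows-1252 and the repair_arg helper are both hypothetical:

```ruby
# Simulate a file name whose bytes are Windows-1252 ("aaå.rb") but whose label is UTF-8.
arg = "aa\u00E5.rb".encode('Windows-1252').force_encoding('UTF-8')
puts arg.valid_encoding?            # => false

begin
  arg =~ /\A--/                     # the kind of regexp match OptionParser performs
rescue ArgumentError => e
  puts e.message                    # => invalid byte sequence in UTF-8
end

# Hypothetical repair: relabel with the encoding the bytes are really in, then transcode.
def repair_arg(arg, actual = 'Windows-1252')
  return arg if arg.valid_encoding?
  arg.dup.force_encoding(actual).encode('UTF-8')
end

puts repair_arg(arg) == "aa\u00E5.rb"   # => true
```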
Is it related, or should a new issue be opened? Or should I change my settings? If the latter, what exactly should I change? (The answer should be added to kramdown's installation instructions for Windows.) Why can Ruby find files with Unicode characters in their names, but kramdown can't?
Anyway, thanks for the quick solution to the main problem! (For the time being, I can just avoid Unicode characters in my file names, or rename the files temporarily.)
There are still known problems with Ruby, Windows and Unicode path names (see, for example, http://bugs.ruby-lang.org/issues/1685). However, they may not apply to this situation.
You should be able to work around this by using ruby --external-encoding UTF-8. However, you need to be aware that this assumes that the content files for kramdown are also in UTF-8 and not CP850!
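Since --external-encoding UTF-8 only changes how Ruby labels what it reads, a CP850 file would come in as broken UTF-8. A quick sanity check can catch that before converting. A sketch with a hypothetical utf8_file? helper; note that it can only rule files out, since a pure-ASCII file is valid in both encodings:

```ruby
require 'tmpdir'

# Hypothetical helper: true if the file's bytes form valid UTF-8.
def utf8_file?(path)
  File.read(path, encoding: 'UTF-8').valid_encoding?
end

Dir.mktmpdir do |dir|
  utf8  = File.join(dir, 'utf8.md')
  cp850 = File.join(dir, 'cp850.md')
  File.binwrite(utf8, "aa\u00E5\n")                   # UTF-8 bytes
  File.binwrite(cp850, "aa\u00E5\n".encode('CP850'))  # 'å' is the single byte 0x86
  puts utf8_file?(utf8)    # => true
  puts utf8_file?(cp850)   # => false
end
```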
I searched a bit and found the chcp command, with which you can change the code page used by cmd.exe. Code page 65001 can be used for UTF-8. You should also change the console font from the raster font to something else (Lucida Console works fine for me).
After changing to code page 1252 (chcp 1252), I was able to execute the following command:
C:\temp>chcp 850
Aktive Codepage: 850.
C:\temp>kramdown ä.txt
C:/Ruby193/lib/ruby/gems/1.9.1/gems/kramdown-1.0.0/bin/kramdown:59:in `read': No such file or directory - ä.txt (Errno::ENOENT)
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/kramdown-1.0.0/bin/kramdown:59:in `<top (required)>'
from C:/Ruby193/bin/kramdown:23:in `load'
from C:/Ruby193/bin/kramdown:23:in `<main>'
C:\temp>chcp 1252
Aktive Codepage: 1252.
C:\temp>kramdown ä.txt
<p>aaä</p>
C:\temp>
Please note that the content of the ä.txt file was in Windows 1252 encoding and not UTF-8. If it were in UTF-8, the output would have been <p>aaÃ¤</p> (because Ruby would interpret the characters as being Windows 1252 encoded before giving them to kramdown).
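The UTF-8 case can be reproduced directly. A sketch, assuming the file holds the UTF-8 bytes for "aaä" and Ruby labels them Windows-1252 because code page 1252 is active:

```ruby
# The UTF-8 bytes of "aaä": 'ä' is 0xC3 0xA4.
content = "aa\u00E4"

# With code page 1252 active, Ruby labels those bytes Windows-1252 ...
as_1252 = content.dup.force_encoding('Windows-1252')

# ... so each byte becomes its own character: 0xC3 is 'Ã' and 0xA4 is '¤'.
puts as_1252.length                               # => 4
puts as_1252.encode('UTF-8') == "aa\u00C3\u00A4"  # => true
```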
And the moral of the story? I don't think I can provide you with a general solution. You need to make sure that the proper code page is set on the command line and that all your files and their contents are encoded in this code page. I'm sorry I can't help more, but I don't really use Windows that often.
@lydell you typed
LANG=en-US.UTF-8 kramdown aaå.rb
whereas I typed
LANG=en_US.UTF-8 kramdown aaå.rb
Notice the underscore.
Does using LANG=... on Windows really work?
@gettalong with Cygwin
Ah, yes, of course, there it should work. But I don't think it works with the Windows Command Shell.
Ah, I'm so used to seeing en-US that I just couldn't see that underscore … However, I still got the same result :(
LANG=... does not work with Windows Command Shell, that's right. The discussion might have been a bit confusing, since I've sometimes used Windows Command Shell and sometimes (mostly) Git bash (I don't know if LANG=... works there).
In Windows Command Shell, running chcp 1252 before running kramdown aaå.rb works! aaå.rb is saved with UTF-8 encoding, though. And using chcp 65001 did not work! I'm now very confused …
Unfortunately, the chcp command does not work in Git bash, which is what I use most. Luckily, I found a solution: cat aaå.rb | kramdown.
It seems like Sass has the same problem:
$ sass aaå.rb
Errno::ENOENT: No such file or directory - aaÕ.rb
Use --trace for backtrace.
I also tried feeding the aaå.rb file to half a dozen Node.js-based compilers. All of them found it and used it correctly. So it really seems to be a Ruby thing.
To sum up, thanks for your efforts! The important thing is that the main issue is resolved. I will continue to experiment with this, because I really want Windows users to be able to enjoy kramdown :)
If you change the code page using chcp, you basically change what Ruby sees as the encoding of its environment. I referred to this as the default external encoding (use ruby -e "puts Encoding.default_external" to output it).
So if you change the code page to 1252, kramdown/Ruby can now read the file name correctly, because the encodings match (the default external encoding and the encoding of the file name). However, since the content of the file is in a different encoding than the file name, the output from kramdown is garbled...
(side note: I don't really know how file system paths work on Windows and whether they are encoded in UTF-8 or Windows 1252, I'm just interpreting the data).
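Ruby keeps several related encodings apart, and a few lines can show which ones a given console hands it, including the separate "filesystem" encoding Ruby uses for path names. A sketch for diagnosis only; the printed values depend entirely on the machine and the active code page:

```ruby
# Print the encodings Ruby derived from its environment.
# Values differ per machine; on a Windows console they typically follow chcp.
puts "default_external: #{Encoding.default_external}"
puts "default_internal: #{Encoding.default_internal.inspect}"
puts "locale:           #{Encoding.find('locale')}"
puts "filesystem:       #{Encoding.find('filesystem')}"
```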
What exactly is "Git bash"? If it is based on Cygwin, the LANG trick from @svnpenn should work!
Also just found http://stackoverflow.com/questions/2050973/what-encoding-are-filenames-in-ntfs-stored-as
After reading this, it seems that Ruby is using the ANSI version of the fopen system call, because it works if the external encoding is Windows 1252 but not if it is UTF-8. So this could probably be fixed by always converting the file name to the proper ANSI encoding when passing it to a Ruby file method.
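That idea can be sketched as a small helper. It is hypothetical and untested on Windows: ansi_path is not a Ruby or kramdown method, and Windows-1252 is an assumed ANSI code page (the real one depends on the system locale). Characters outside that code page cannot be represented at all, which is one more reason to prefer UTF-8 everywhere.

```ruby
# Hypothetical helper: transcode a UTF-8 file name to the (assumed) ANSI code page
# before handing it to a Ruby file method.
def ansi_path(path, ansi = 'Windows-1252')
  path.encode(ansi)
rescue Encoding::UndefinedConversionError
  raise ArgumentError, "#{path.inspect} cannot be represented in #{ansi}"
end

p ansi_path("aa\u00E5.rb").bytes   # => [97, 97, 229, 46, 114, 98]
```

On Windows one would then try something like File.read(ansi_path(name)); whether Ruby 1.9.3 accepts such a path is not verified here.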
@gettalong "Git Bash" is essentially MinGW, and yes it is based on Cygwin.