rdoc failed when parsing UTF-8 script [CLOSED]

ruby commented on May 18, 2024
rdoc failed when parsing UTF-8 script

from rdoc.

Comments (6)

majioa commented on May 18, 2024

Sorry, the patch is the following:

--- ruby_lex.rb 2011-03-19 03:10:32.340220572 +0300
+++ /usr/share/ruby/1.9/rdoc/ruby_lex.rb    2011-03-19 04:12:27.188068372 +0300
@@ -196,7 +196,7 @@ class RDoc::RubyLex
   end
 
   def peek_equal?(str)
-    chrs = str.split(//)
+    chrs = str.unpack('U*').map do |x| x.chr(Encoding::UTF_8) end
     until @rests.size >= chrs.size
       return false unless buf_input
     end
@@ -221,7 +221,7 @@ class RDoc::RubyLex
     prompt
     line = @input.call
     return nil unless line
-    @rests.concat line.split(//)
+    @rests.concat(line.unpack('U*').map do |x| x.chr(Encoding::UTF_8) end)
     true
   end
   private :buf_input
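
A minimal sketch of what the patch changes, assuming the lexer receives its input as raw bytes tagged ASCII-8BIT (the variable names are illustrative and not taken from RDoc). On such a string, split(//) yields one element per byte, while unpack('U*') decodes the bytes as UTF-8 code points:

```ruby
# Raw bytes as they would arrive from a binary read (tagged ASCII-8BIT).
raw = "цѣль".b

# split(//) on a binary string cuts between bytes, not characters.
bytes = raw.split(//)

# unpack('U*') decodes UTF-8 code points regardless of the string's tag;
# Integer#chr with an explicit encoding turns each one back into a character.
chars = raw.unpack('U*').map { |cp| cp.chr(Encoding::UTF_8) }

puts bytes.size   # number of bytes
puts chars.size   # number of characters
puts chars.join
```

Note that unpack('U*') raises ArgumentError on malformed UTF-8, so this only helps when the input really is UTF-8.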

drbrain commented on May 18, 2024

What was the input file that fails without this patch?

Did it fail on ruby 1.9 or ruby 1.8?

majioa commented on May 18, 2024

Ruby 1.9. Without the patch, if an input file contains UTF-8 characters, rdoc throws an exception (raised from the Regexp class).

drbrain commented on May 18, 2024

I can't reproduce. In order to apply your patch I need a test file that fails with a specific ruby and RDoc version.

Given this file:

# coding: utf-8

module Euler
  def ℇ
    Math::E
  end

  def π
    Math::PI
  end

  def identity
    ℇ ** (i * π) + 1 == 0
  end
end

I have no error:

$ rdoc --version
rdoc 3.5.3
$ ruby -v
ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-darwin10.6.0]
$ rdoc euler.rb
Parsing sources...
100% [ 1/ 1]  euler.rb

Generating Darkfish format into /Users/ehodel/tmp/doc...

Files:      1

Classes:    0 (0 undocumented)
Modules:    1 (1 undocumented)
Constants:  0 (0 undocumented)
Attributes: 0 (0 undocumented)
Methods:    3 (3 undocumented)

Total:      4 (4 undocumented)
  0.00% documented

Elapsed: 0.1s

majioa commented on May 18, 2024

The error is raised when a source file contains UTF-8 characters but does not explicitly declare UTF-8 as its encoding. Example (note that the encoding is switched by passing the -KU option to ruby):

$ cat 1.rb
#!/usr/bin/ruby -KU
class Test
      attr_reader :истокъ, :цѣль
end
$ rdoc --ri 1.rb
...
incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)

I guess this results from the following code:

    @identifier_re ||= if defined? Encoding then
                         eval '/[\p{Alnum}_]/u'
                       else
                         eval '/[\w\x80-\xff]/'
                       end

Here the encoding of @identifier_re is explicitly set to UTF-8. But since the file is read with a non-UTF-8 encoding and contains UTF-8 characters (bytes >= 0x80), the regexp match raises the exception. Maybe it would be good to forcibly convert the file to UTF-8? The patch tries to fix it.
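
The mismatch can be sketched outside rdoc like this. It assumes the file content ends up as a string tagged ASCII-8BIT, which is what a binary read produces; the variable names are illustrative only. The exact wording of the error message varies between Ruby versions, so the sketch only looks at the exception class:

```ruby
# UTF-8 bytes tagged as ASCII-8BIT, as after a binary read with no coding comment.
raw = "attr_reader :истокъ".b

# Same pattern as quoted above; the /u flag fixes the regexp to UTF-8.
identifier_re = /[\p{Alnum}_]/u

mismatch = nil
begin
  raw =~ identifier_re
rescue Encoding::CompatibilityError => e
  mismatch = e   # "incompatible encoding regexp match ..."
end

# Retagging a copy as UTF-8 changes no bytes, but the same regexp now matches.
utf8 = raw.dup.force_encoding(Encoding::UTF_8)
last_char = utf8[-1]
matched = !(last_char =~ identifier_re).nil?

puts mismatch.class
puts matched
```

A binary string that holds only ASCII bytes does not trigger the error, which may be why short ASCII-only files never show the problem.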

majioa commented on May 18, 2024

Now I have three more incremental patches for you.

The first one fixes the same problem described earlier:

--- /usr/share/ruby/1.9/rdoc/rdoc.rb    2011-03-20 04:41:53.556174053 +0300
+++ rdoc.rb     2011-03-20 04:41:24.176224970 +0300
@@ -406,12 +406,12 @@ The internal error was:
   def read_file_contents(filename)
     content = open filename, "rb" do |f| f.read end

-    utf8 = content.sub!(/\A\xef\xbb\xbf/, '')
     if defined? Encoding then
       if /coding[=:]\s*([^\s;]+)/i =~ content[%r"\A(?:#!.*\n)?.*\n"]
         enc = ::Encoding.find($1)
       end
-      if enc ||= (Encoding::UTF_8 if utf8)
+      re = Regexp.new('\A\xef\xbb\xbf'.force_encoding(content.encoding))
+      if enc ||= (Encoding::UTF_8 if content.sub!(re, ''))
         content.force_encoding(enc)
       end
     end

Reproduce as described earlier.
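
For comparison, a byte-order mark can also be detected without any regexp, which sidesteps the regexp encoding question entirely. This is an alternative sketch, not what the patch does; strip_utf8_bom and UTF8_BOM are hypothetical names that do not exist in RDoc:

```ruby
# The three BOM bytes, tagged as binary so they compare byte for byte.
UTF8_BOM = "\xEF\xBB\xBF".b.freeze

# Hypothetical helper: returns [content_without_bom, had_bom].
def strip_utf8_bom(content)
  bytes = content.b                       # binary copy, whatever the source tag was
  return [content, false] unless bytes.start_with?(UTF8_BOM)
  [bytes.byteslice(3, bytes.bytesize - 3), true]
end

raw = "\xEF\xBB\xBFputs 1\n".b
body, had_bom = strip_utf8_bom(raw)

# Mirrors the fallback in read_file_contents: a BOM implies UTF-8.
body.force_encoding(Encoding::UTF_8) if had_bom

puts had_bom
puts body.encoding
```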

This fixes the case where t.text is nil (which leads to an exception), as it was in my case:

--- markup.rb   2011-03-20 00:29:06.192185818 +0300
+++ /usr/share/ruby/1.9/rdoc/generator/markup.rb    2011-03-20 00:29:23.200625435 +0300
@@ -111,6 +111,8 @@ class RDoc::AnyMethod
                 nil
               end
 
+      next unless t.text
+
       text = CGI.escapeHTML t.text
 
       if style
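
The effect of the added guard can be sketched in isolation. Token here is a hypothetical stand-in for the lexer tokens that markup_code iterates over, not an RDoc class:

```ruby
require 'cgi'

# Hypothetical stand-in for a lexer token; only the text attribute matters here.
Token = Struct.new(:text)

tokens = [Token.new('a < b'), Token.new(nil), Token.new('x && y')]

escaped = []
tokens.each do |t|
  next unless t.text                   # the guard added by the patch
  escaped << CGI.escapeHTML(t.text)    # never reached with nil
end

puts escaped.inspect
```

Skipping the token hides the symptom; why a token ends up with nil text in the first place is a separate question that the trace below does not answer.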

I couldn't reproduce this with a simple example; I can only show a trace log:

$ rake install --trace
...
ERROR:  While generating documentation for prekhlag-
... MESSAGE:   Error while evaluating /usr/share/ruby/1.9/rdoc/generator/template/darkfish/classpage.rhtml: undefined method `gsub' for nil:NilClass (at...")
... RDOC args: --op /usr/share/ruby/gems/1.9/doc/prekhlag-/rdoc lib LICENSE.txt README.rdoc --title prekhlag- Documentation --quiet
        /usr/share/ruby/1.9/cgi/util.rb:34:in `escapeHTML'
        /usr/share/ruby/1.9/rdoc/generator/markup.rb:114:in `block in markup_code'
        /usr/share/ruby/1.9/rdoc/generator/markup.rb:96:in `each'
        /usr/share/ruby/1.9/rdoc/generator/markup.rb:96:in `markup_code'
        /usr/share/ruby/1.9/rdoc/generator/template/darkfish/classpage.rhtml:250:in `block (4 levels) in generate_class_files'
        /usr/share/ruby/1.9/rdoc/generator/template/darkfish/classpage.rhtml:224:in `each'
        /usr/share/ruby/1.9/rdoc/generator/template/darkfish/classpage.rhtml:224:in `block (3 levels) in generate_class_files'
        /usr/share/ruby/1.9/rdoc/generator/template/darkfish/classpage.rhtml:219:in `each'
        /usr/share/ruby/1.9/rdoc/generator/template/darkfish/classpage.rhtml:219:in `block (2 levels) in generate_class_files'
        /usr/share/ruby/1.9/rdoc/generator/template/darkfish/classpage.rhtml:217:in `each'
        /usr/share/ruby/1.9/rdoc/generator/template/darkfish/classpage.rhtml:217:in `block in generate_class_files'
        /usr/share/ruby/1.9/erb.rb:753:in `eval'
        /usr/share/ruby/1.9/erb.rb:753:in `result'
        /usr/share/ruby/1.9/rdoc/generator/darkfish.rb:346:in `render_template'
        /usr/share/ruby/1.9/rdoc/generator/darkfish.rb:266:in `block in generate_class_files'
        /usr/share/ruby/1.9/rdoc/generator/darkfish.rb:259:in `each'
        /usr/share/ruby/1.9/rdoc/generator/darkfish.rb:259:in `generate_class_files'
        /usr/share/ruby/1.9/rdoc/generator/darkfish.rb:180:in `generate'
        /usr/share/ruby/1.9/rdoc/rdoc.rb:392:in `block in document'
        /usr/share/ruby/1.9/rdoc/rdoc.rb:388:in `chdir'
        /usr/share/ruby/1.9/rdoc/rdoc.rb:388:in `document'
        /usr/local/share/ruby/site_ruby/1.9/rubygems/doc_manager.rb:189:in `run_rdoc'
        /usr/local/share/ruby/site_ruby/1.9/rubygems/doc_manager.rb:144:in `install_rdoc'
        /usr/local/share/ruby/site_ruby/1.9/rubygems/doc_manager.rb:130:in `generate_rdoc'
        /usr/local/share/ruby/site_ruby/1.9/rubygems/commands/install_command.rb:155:in `block in execute'
        /usr/local/share/ruby/site_ruby/1.9/rubygems/commands/install_command.rb:154:in `each'
        /usr/local/share/ruby/site_ruby/1.9/rubygems/commands/install_command.rb:154:in `execute'
        /usr/local/share/ruby/site_ruby/1.9/rubygems/command.rb:278:in `invoke'
        /usr/local/share/ruby/site_ruby/1.9/rubygems/command_manager.rb:133:in `process_args'
        /usr/local/share/ruby/site_ruby/1.9/rubygems/command_manager.rb:103:in `run'
        /usr/local/share/ruby/site_ruby/1.9/rubygems/gem_runner.rb:63:in `run'
        /usr/bin/gem:21:in `<main>'
Successfully installed prekhlag-
1 gem installed
Installing ri documentation for prekhlag-...
Installing RDoc documentation for prekhlag-...

When the encoding is specified by a 'coding' word inside the Ruby script, written in the format Ruby uses to print an encoding, like:

#!/usr/bin/ruby
#<Encoding:UTF-8>

rdoc fails on the encoding value (while the Ruby parser accepts it). This patch fixes it:

--- rdoc.rb 2011-03-20 04:41:24.176224970 +0300
+++ /usr/share/ruby/1.9/rdoc/rdoc.rb    2011-03-20 04:59:43.896061415 +0300
@@ -407,7 +407,7 @@ The internal error was:
     content = open filename, "rb" do |f| f.read end
 
     if defined? Encoding then
-      if /coding[=:]\s*([^\s;]+)/i =~ content[%r"\A(?:#!.*\n)?.*\n"]
+      if /coding[=:]\s*([\w\d_:()\-\.\/]+)/i =~ content[%r"\A(?:#!.*\n)?.*\n"]
         enc = ::Encoding.find($1)
       end
       re = Regexp.new('\A\xef\xbb\xbf'.force_encoding(content.encoding))
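
The difference between the two patterns can be checked on the header from the example above. The two regexps are copied from the diff; everything else is an illustrative sketch:

```ruby
header = "#!/usr/bin/ruby\n#<Encoding:UTF-8>\n"

# Same slice rdoc inspects: an optional shebang line plus one more line.
first_lines = header[%r"\A(?:#!.*\n)?.*\n"]

old_re = /coding[=:]\s*([^\s;]+)/i
new_re = /coding[=:]\s*([\w\d_:()\-\.\/]+)/i

old_name = first_lines[old_re, 1]   # keeps the trailing ">"
new_name = first_lines[new_re, 1]   # stops before ">"

# Encoding.find raises ArgumentError for a name it does not know.
old_found = begin
  Encoding.find(old_name)
rescue ArgumentError
  nil
end
new_found = Encoding.find(new_name)

puts old_name
puts new_name
puts new_found
```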
