Coder Social home page Coder Social logo

Comments (9)

tagomoris avatar tagomoris commented on May 27, 2024

@topac Could you add versions of ruby & msgpack-ruby, and stack trace?

from msgpack-ruby.

topac avatar topac commented on May 27, 2024

@tagomoris You're right.

  • ruby 2.2.3p173 (2015-08-18 revision 51636) [x86_64-darwin14]
  • MessagePack::VERSION is 0.7.6
  • test.rb:33:in readpartial': Invalid argument @ io_getpartial - /tmp/testmsgpack1 (Errno::EINVAL)
    from test.rb:33:in load' from test.rb:33:in '`

from msgpack-ruby.

tagomoris avatar tagomoris commented on May 27, 2024

Thank you. I'll check it, but give me time for it.

from msgpack-ruby.

tagomoris avatar tagomoris commented on May 27, 2024

Ah, got it. It's a kind of this issue: msgpack/msgpack#214
MessagePack supports a string/binary which has a length in 32bit integer (== 2GB in 32bit signed int).
It's the limitation of msgpack spec, not bug of msgpack-ruby.

It might be a problem that error message should be more clear, or #to_msgpack should show any errors.

from msgpack-ruby.

topac avatar topac commented on May 27, 2024

@tagomoris I think it's related to the fact that is loading from a file. In fact, skipping the part which write string to file and loading directly it does not trigger the error:

# ......

begin
  print "to_msgpack..."
  string = {"a" => "foo", "b" => 42, "c" => generate_input_string}.to_msgpack
  puts "done"

  print "loading..."
  MessagePack.load(string) # instead of MessagePack.load(File.open(TMP_FILE_PATH))
  puts "done"
ensure
  FileUtils.rm_f(TMP_FILE_PATH)
end

from msgpack-ruby.

chrisseaton avatar chrisseaton commented on May 27, 2024

This bug is still there.

The workaround is MessagePack.load(string) # instead of MessagePack.load(File.open(TMP_FILE_PATH)), as suggested, and that works.

The issue is not msgpack/msgpack#214, it's that msgpack tries to do readpartial on the entire string at once. readpartial calls the read system call, which does not allow reading more than INTMAX bytes, returning the error code EINVAL as expected, which is what we see.

msgpack's buffer needs to be engineered to not read more than INTMAX bytes at a time.

require 'msgpack'
require 'fileutils'
require 'securerandom'

TMP_FILE_PATH = "/tmp/testmsgpack1"
MAX_SIZE = 1_000_000 * 3_000 # 3gb
# note that no error is raised if for example MAX_SIZE = 2mb

def generate_random_string
  string = "".force_encoding("BINARY")
  loop do
    string << (SecureRandom.random_bytes(2048)*100)
    return string if string.bytesize >= MAX_SIZE
  end
end

big_string = generate_random_string

begin
  # Create an hash with 3 keys, the value of the key "c" is a binary string of 3gb
  print "to_msgpack..."
  string = {"a" => "foo", "b" => 42, "c" => big_string}.to_msgpack
  puts "done"

  # Write the msgpack to disk
  print "write to disk..."
  File.open(TMP_FILE_PATH, "wb") do |f|
    curr = 0
    loop do
      buffer = string[curr..curr+20000-1]
      break if buffer.to_s.bytesize == 0
      curr += f.write(buffer)
    end
  end
  puts "done"

  file = File.open(TMP_FILE_PATH)
  file.readpartial(MAX_SIZE, big_string) # this is the underlying issue
  file.close
ensure
  FileUtils.rm_f(TMP_FILE_PATH)
end

from msgpack-ruby.

chrisseaton avatar chrisseaton commented on May 27, 2024

Seems like some IO uses this buffer size option, which has a sensible default, and some try to read it all

% grep io_partial_read_method **/*.c
ext/msgpack/buffer.c:        b->io_buffer = rb_funcall(b->io, b->io_partial_read_method, 1, SIZET2NUM(b->io_buffer_size));
ext/msgpack/buffer.c:        VALUE ret = rb_funcall(b->io, b->io_partial_read_method, 2, SIZET2NUM(b->io_buffer_size), b->io_buffer);
ext/msgpack/buffer.c:        VALUE ret = rb_funcall(b->io, b->io_partial_read_method, 2, SIZET2NUM(length), string);
ext/msgpack/buffer.c:    VALUE ret = rb_funcall(b->io, b->io_partial_read_method, 2, SIZET2NUM(length), b->io_buffer);
ext/msgpack/buffer.c:    VALUE ret = rb_funcall(b->io, b->io_partial_read_method, 2, SIZET2NUM(length), b->io_buffer);
ext/msgpack/buffer_class.c:    b->io_partial_read_method = get_partial_read_method(io);

from msgpack-ruby.

chrisseaton avatar chrisseaton commented on May 27, 2024

read_raw_body_begin to read_raw_body_cont to msgpack_buffer_read_to_string to _msgpack_buffer_read_from_io_to_string, which ignores the buffer size.

from msgpack-ruby.

chrisseaton avatar chrisseaton commented on May 27, 2024

I think I have a fix - merge #289 before I open it as a PR.

diff --git a/ext/msgpack/buffer.c b/ext/msgpack/buffer.c
index 9cfe0c7..15fc892 100644
--- a/ext/msgpack/buffer.c
+++ b/ext/msgpack/buffer.c
@@ -613,9 +613,11 @@ size_t _msgpack_buffer_feed_from_io(msgpack_buffer_t* b)
 
 size_t _msgpack_buffer_read_from_io_to_string(msgpack_buffer_t* b, VALUE string, size_t length)
 {
+#define MIN(x, y) (((x) < (y)) ? (x) : (y))
+
     if(RSTRING_LEN(string) == 0) {
         /* direct read */
-        VALUE ret = rb_funcall(b->io, b->io_partial_read_method, 2, SIZET2NUM(length), string);
+        VALUE ret = rb_funcall(b->io, b->io_partial_read_method, 2, SIZET2NUM(MIN(b->io_buffer_size, length)), string);
         if(ret == Qnil) {
             return 0;
         }
@@ -627,7 +629,7 @@ size_t _msgpack_buffer_read_from_io_to_string(msgpack_buffer_t* b, VALUE string,
         b->io_buffer = rb_str_buf_new(0);
     }
 
-    VALUE ret = rb_funcall(b->io, b->io_partial_read_method, 2, SIZET2NUM(length), b->io_buffer);
+    VALUE ret = rb_funcall(b->io, b->io_partial_read_method, 2, SIZET2NUM(MIN(b->io_buffer_size, length)), b->io_buffer);
     if(ret == Qnil) {
         return 0;
     }
@@ -635,6 +637,8 @@ size_t _msgpack_buffer_read_from_io_to_string(msgpack_buffer_t* b, VALUE string,
 
     rb_str_buf_cat(string, (const void*)RSTRING_PTR(b->io_buffer), rl);
     return rl;
+
+#undef MIN
 }
 
 size_t _msgpack_buffer_skip_from_io(msgpack_buffer_t* b, size_t length)

from msgpack-ruby.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.