

penguinpowernz commented on September 26, 2024

OK, I did a bit of a test because I had some files of varying sizes lying around. All the figures are in MB and were measured on ruby-2.2.2:

size       loaded     compress   diff      change
1.07       13.04      14.61      1.57      10.74%
52.64      64.61      95.21      30.60     32.13%
73.49      85.47      123.97     38.50     31.05%
99.82      111.79     147.03     35.24     23.96%
155.45     167.42     202.38     34.96     17.27%
330.60     342.52     429.77     87.25     20.30%
493.55     505.53     615.88     110.35    17.91%

The columns are:

  1. size of the file
  2. memory use after loading the file into a local var
  3. memory use after compressing the contents of the local var and storing that to a different local var
  4. the difference between the two memory sizes (i.e. the memory used by this library)
  5. the difference as a percentage of the memory use after compression (column 3), i.e. the share of total memory attributable to this library
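Columns 4 and 5 can be recomputed from columns 2 and 3 as a sanity check. The numbers appear to fit change = diff / compress × 100, truncated (not rounded) to two decimals; that formula is inferred from the table, not stated in the thread. `ROWS` and `derive` are illustrative names, not part of the benchmark scripts:

```ruby
# Rows copied from the table: [size, loaded, compress, diff, change].
ROWS = [
  [1.07,   13.04,  14.61,  1.57,   10.74],
  [52.64,  64.61,  95.21,  30.60,  32.13],
  [73.49,  85.47,  123.97, 38.50,  31.05],
  [99.82,  111.79, 147.03, 35.24,  23.96],
  [155.45, 167.42, 202.38, 34.96,  17.27],
  [330.60, 342.52, 429.77, 87.25,  20.30],
  [493.55, 505.53, 615.88, 110.35, 17.91],
].freeze

# Recompute diff (MB) and change (%) from the two RSS readings.
# Assumption: change = diff / compress * 100, truncated to 2 decimals.
def derive(loaded, compress)
  diff   = (compress - loaded).round(2)
  change = (diff / compress * 10_000).floor / 100.0
  [diff, change]
end

ROWS.each do |size, loaded, compress, diff, change|
  got    = derive(loaded, compress)
  status = got == [diff, change] ? "ok" : "mismatch"
  puts format("%-8.2f diff=%-7.2f change=%-6.2f%% %s", size, got[0], got[1], status)
end
```

Rounding instead of truncating would give 10.75 and 32.14 for the first two rows, which is why truncation looks like the better fit.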

So I'm not sure how useful those numbers are, but maybe someone can infer a better relationship between them than I can.

Here's the 'benchmark' code:

testmem:

#!/usr/bin/env ruby

APP_PATH = File.dirname(File.dirname(File.expand_path(__FILE__)))
$LOAD_PATH.unshift(File.join(APP_PATH, "lib"))

base = 'testdir'

puts "%-10s %-10s %-10s %s" % ["size", "loaded", "compress", "diff"]

# only use the files ending in these numbers (selected in order of filesize)
[2, 8, 1, 3, 4, 9, 6].each do |i|
  fn = "#{base}/file.#{i}"
  system("bin/testmem2", fn)  # pass fn as an argument rather than through a shell
end

testmem2:

#!/usr/bin/env ruby

APP_PATH = File.dirname(File.dirname(File.expand_path(__FILE__)))
$LOAD_PATH.unshift(File.join(APP_PATH, "lib"))

require 'xz'
require 'mem_info'

fn = ARGV[0]
abort "usage: testmem2 FILE" if fn.nil?
abort "ERROR: no such file: #{fn}" unless File.exist?(fn)

# get the file size in MB
size     = (File.size(fn).to_f / 1024 / 1024).round(2)

# load the contents and check the memory usage
contents = File.read(fn)
premem   = (MemInfo.rss.to_f / 1024).round(2)

# compress the contents and check the memory usage
c        = XZ.compress(contents)
mem      = (MemInfo.rss.to_f / 1024).round(2)

# calculate how much the XZ lib used to do its thang
# (this also counts the compressed string c, which is still held in memory)
diff     = mem - premem

# print the report
puts "%-10.2f %-10.2f %-10.2f %0.2f" % [size, premem, mem, diff]

mem_info.rb:

module MemInfo
  # This uses backticks to figure out the pagesize, but only once
  # when loading this module.
  # You might want to move this into some kind of initializer
  # that is loaded when your app starts and not when autoload
  # loads this module.
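  KERNEL_PAGE_SIZE = `getconf PAGESIZE`.chomp.to_i rescue 4096
  # /proc/self always refers to the reading process, so this stays correct
  # in a forked child (a PID captured at load time would not).
  STATM_PATH       = "/proc/self/statm"
  STATM_FOUND      = File.exist?(STATM_PATH)

  # Resident set size in KiB: the second field of statm is resident pages.
  # Returns 0 where /proc is unavailable (e.g. macOS).
  def self.rss
    STATM_FOUND ? (File.read(STATM_PATH).split(' ')[1].to_i * KERNEL_PAGE_SIZE) / 1024 : 0
  end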
end
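Linux also reports the resident set size directly on the `VmRSS:` line of `/proc/self/status`, already in kB, which avoids the page-size lookup. A minimal sketch, assuming a Linux procfs; `vm_rss_kib` and `current_rss_kib` are hypothetical helpers, not part of ruby-xz or the MemInfo module above:

```ruby
# Linux only: /proc/self/status contains a line like "VmRSS:\t   13352 kB".
STATUS_PATH = "/proc/self/status"

# Hypothetical helper: extract the VmRSS value (in kB) from status text.
# Returns 0 if no VmRSS line is present.
def vm_rss_kib(status_text)
  line = status_text.each_line.find { |l| l.start_with?("VmRSS:") }
  line ? line.split[1].to_i : 0
end

# Hypothetical helper: current process RSS in kB, or 0 without procfs.
def current_rss_kib
  File.exist?(STATUS_PATH) ? vm_rss_kib(File.read(STATUS_PATH)) : 0
end

puts "RSS: #{current_rss_kib} kB"
```

Dividing the result by 1024 gives MB, the same conversion testmem2 applies to MemInfo.rss.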

from ruby-xz.

Quintus commented on September 26, 2024

Sorry for the delay; I still do not know why GitHub doesn’t send me emails when someone opens issues on my projects.

What constitutes a big amount of data?

It’s relative to the amount of RAM your box has. The compress method holds the entire string in memory before passing it to liblzma, so if you, for example, read a 42 GiB video file into memory in one chunk, you may run out of memory because you don’t have sufficient RAM. Granted, you will already run out of memory when you call File.read on the 42 GiB file, but I thought it was worth mentioning in the documentation.

compress_stream does not suffer from this, because it reads from the IO object only in small chunks. So even if you only have 4 GiB of RAM available, you can still compress a 42 GiB file, because it is processed chunk by chunk rather than all in one go.
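The chunk-by-chunk principle can be sketched with Ruby's bundled Zlib standing in for liblzma, since Zlib ships with the standard library. This illustrates the idea only; it is not how ruby-xz implements compress_stream. `deflate_stream` and `CHUNK_SIZE` are hypothetical names, and 64 KiB is an arbitrary chunk size:

```ruby
require "zlib"
require "stringio"

CHUNK_SIZE = 64 * 1024 # arbitrary; only this much input is held at a time

# Hypothetical helper: read `input` (any IO) in fixed-size chunks, feed each
# chunk to a streaming compressor, and write compressed bytes to `output`.
def deflate_stream(input, output, chunk_size = CHUNK_SIZE)
  deflater = Zlib::Deflate.new
  # IO#read(n) returns nil at EOF, which ends the loop.
  while (chunk = input.read(chunk_size))
    output.write(deflater.deflate(chunk))
  end
  output.write(deflater.finish) # flush whatever the compressor still buffers
  output
ensure
  deflater.close if deflater
end

data = "hello xz " * 50_000
out  = deflate_stream(StringIO.new(data), StringIO.new("".b))
puts "#{data.bytesize} bytes -> #{out.string.bytesize} bytes"
```

With real files, e.g. `File.open(src, "rb") { |i| File.open(dst, "wb") { |o| deflate_stream(i, o) } }`, the process holds roughly one chunk plus the compressor's internal state at any moment, whatever the input size. The StringIO demo above keeps its output in memory and so does not show that bound.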

Vale,
Quintus


penguinpowernz commented on September 26, 2024

Ah, of course, that makes sense.

Cheers
