Comments (3)
OK did a bit of a test cos I had some files of varying sizes lying around. All the figures are in MB and run on ruby-2.2.2:
size       loaded     compress   diff       change
1.07       13.04      14.61      1.57       10.74%
52.64      64.61      95.21      30.60      32.13%
73.49      85.47      123.97     38.50      31.05%
99.82      111.79     147.03     35.24      23.96%
155.45     167.42     202.38     34.96      17.27%
330.60     342.52     429.77     87.25      20.30%
493.55     505.53     615.88     110.35     17.91%
The columns are:
- size of the file
- memory use after loading the file into a local var
- memory use after compressing the contents of the local var and storing that to a different local var
- the difference between the two memory sizes (i.e. the memory used by this library)
- the difference as a percentage of the memory use after compression (i.e. the share of total memory used by this library)
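As a quick sanity check, the last two columns can be recomputed from the first three. This is only a sketch: the `change` formula (diff divided by the post-compression figure, truncated to two decimals) is inferred from the posted numbers rather than stated anywhere, and `ROWS`, `truncate2` and `derived_columns` are names made up for illustration.

```ruby
# Rows copied from the table above: [size, loaded, compress], all in MB.
ROWS = [
  [1.07, 13.04, 14.61],
  [52.64, 64.61, 95.21],
  [73.49, 85.47, 123.97],
  [99.82, 111.79, 147.03],
  [155.45, 167.42, 202.38],
  [330.60, 342.52, 429.77],
  [493.55, 505.53, 615.88]
].freeze

# Truncate (rather than round) to two decimals.
# Assumption: truncation is what reproduces the posted percentages.
def truncate2(value)
  (value * 100).floor / 100.0
end

# Returns [diff, change] for one row:
#   diff   = memory after compressing minus memory after loading
#   change = diff as a percentage of the memory after compressing
def derived_columns(loaded, compress)
  diff = (compress - loaded).round(2)
  change = truncate2(diff / compress * 100)
  [diff, change]
end

ROWS.each do |size, loaded, compress|
  diff, change = derived_columns(loaded, compress)
  puts format("%-10.2f %-10.2f %-10.2f %-10.2f %.2f%%",
              size, loaded, compress, diff, change)
end
```

Note that `loaded` minus `size` is roughly 12 MB in every row, which is presumably the interpreter's baseline footprint before anything is compressed.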
So I'm not sure how much use those numbers are, but maybe someone can infer a better relationship between them than I can.
Here's the 'benchmark' code:
testmem:
#!/usr/bin/env ruby
APP_PATH = File.dirname(File.dirname(File.expand_path(__FILE__)))
$LOAD_PATH.unshift(File.join(APP_PATH, "lib"))

base = 'testdir'

puts "%-10s %-10s %-10s %s" % ["size", "loaded", "compress", "diff"]

# only use the files ending in these numbers (selected in order of filesize)
[2, 8, 1, 3, 4, 9, 6].each do |i|
  fn = "#{base}/file.#{i}"
  system "bin/testmem2 #{fn}"
end
testmem2:
#!/usr/bin/env ruby
APP_PATH = File.dirname(File.dirname(File.expand_path(__FILE__)))
$LOAD_PATH.unshift(File.join(APP_PATH, "lib"))

require 'xz'
require 'mem_info'

fn = ARGV[0]
# the nil check avoids a TypeError when no argument is given
abort "ERROR: no such file" unless fn && File.exist?(fn)

# get the file size in MB
size = (File.size(fn).to_f / 1024 / 1024).round(2)

# load the contents and check the memory usage (MemInfo.rss is in KB)
contents = File.read(fn)
premem = (MemInfo.rss.to_f / 1024).round(2)

# compress the contents and check the memory usage
c = XZ.compress(contents)
mem = (MemInfo.rss.to_f / 1024).round(2)

# calculate how much the XZ lib used to do its thang
diff = mem - premem

# print the report
puts "%-10.2f %-10.2f %-10.2f %0.2f" % [size, premem, mem, diff]
mem_info.rb:
module MemInfo
  # This uses backticks to figure out the pagesize, but only once
  # when loading this module.
  # You might want to move this into some kind of initializer
  # that is loaded when your app starts and not when autoload
  # loads this module.
  KERNEL_PAGE_SIZE = `getconf PAGESIZE`.chomp.to_i rescue 4096
  STATM_PATH = "/proc/#{Process.pid}/statm"
  STATM_FOUND = File.exist?(STATM_PATH)

  # Resident set size in KB: the second field of statm is the number of
  # resident pages. Returns 0 when /proc is not available.
  def self.rss
    STATM_FOUND ? (File.read(STATM_PATH).split(' ')[1].to_i * KERNEL_PAGE_SIZE) / 1024 : 0
  end
end
from ruby-xz.
Sorry for the delay, I still do not know why GitHub doesn’t send me emails when someone opens issues on my projects.
What constitutes a big amount of data?
It’s relative to the amount of RAM your box has. The compress method holds the entire string in memory before passing it to liblzma, so if you read, for example, a 42 GiB video file into memory in one chunk, you may run out of memory because you don’t have sufficient RAM. Granted, you would already run out of memory when calling File.read on the 42 GiB file, but I thought it was worth mentioning in the documentation.
compress_stream does not suffer from this, because it reads from the IO object only in small chunks. So if you only have 4 GiB of RAM available, you can still compress a 42 GiB file, because it is processed chunk by chunk rather than all in one go.
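The chunk-by-chunk idea can be sketched without touching this library at all. The example below uses Ruby's bundled Zlib purely as a stand-in for liblzma, so it shows the pattern and not the ruby-xz API; `compress_in_chunks` and `CHUNK_SIZE` are hypothetical names, and the chunk size is an arbitrary choice.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical chunk size, picked only for illustration.
CHUNK_SIZE = 64 * 1024

# Reads `input` in fixed-size chunks and writes the compressed bytes to
# `output`, so only one chunk of input is held in memory at a time.
# Zlib::Deflate stands in for the LZMA encoder here.
def compress_in_chunks(input, output, chunk_size = CHUNK_SIZE)
  deflater = Zlib::Deflate.new
  # IO#read(n) returns nil at end of input, which ends the loop
  while (chunk = input.read(chunk_size))
    output.write(deflater.deflate(chunk))
  end
  # flush whatever the encoder is still buffering
  output.write(deflater.finish)
ensure
  deflater&.close
end

# Usage: StringIO objects stand in for real files or sockets.
source = StringIO.new("sample data " * 100_000)
sink = StringIO.new(String.new) # binary buffer for the compressed bytes
compress_in_chunks(source, sink)
puts "#{source.size} bytes in, #{sink.size} bytes out"
```

With a real file you would pass `File.open(path, 'rb')` as the input, and peak memory would then depend on the chunk size rather than on the file size.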
Vale,
Quintus
from ruby-xz.
Ah of course, that makes sense.
Cheers
from ruby-xz.