woahdae / simple_xlsx_reader Goto Github PK

Ruby xlsx reader to parse cell values into plain ruby primitives and dates/times.

License: MIT License

Ruby 100.00%

simple_xlsx_reader's Introduction

SimpleXlsxReader

A fast xlsx reader for Ruby that parses xlsx cell values into plain ruby primitives and dates/times.

This is not a rewrite of excel in Ruby. Font styles, for example, are parsed to determine whether a cell is a number or a date, then forgotten. We just want to get the data, and get out!

Summary (now with stream parsing):

doc = SimpleXlsxReader.open('/path/to/workbook.xlsx')
doc.sheets # => [<#SXR::Sheet>, ...]
doc.sheets.first.name # 'Sheet1'
rows = doc.sheet.first.rows # <SXR::Document::RowsProxy>
rows.each # an <Enumerator> ready to chain or stream
rows.each {} # Streams the rows to your block
rows.each(headers: true) {} # Streams row-hashes
rows.each(headers: {id: /ID/}) {} # finds & maps headers, streams
rows.slurp # Slurps rows into memory as a 2D array

That's the gist of it!

Why?

Accurate

This project was started years ago, primarily because other Ruby xlsx parsers didn't import data with the correct types. Numbers as strings, dates as numbers, hyperlinks with inaccessible URLs, or - subtly buggy - simple dates as DateTime objects. If your app uses a timezone offset, depending on what timezone and what time of day you load the xlsx file, your dates might end up a day off! SimpleXlsxReader understands all these correctly.

Idiomatic

Many Ruby xlsx parsers seem to be inspired more by Excel than Ruby, frankly. SimpleXlsxReader strives to be fairly idiomatic Ruby:

# quick example having fun w/ ruby
doc = SimpleXlsxReader.open(path_or_io)
doc.sheets.first.rows.each(headers: {id: /ID/})
  .with_index.with_object({}) do |(row, index), acc|
    acc[row[:id]] = index
end

Now faster

Finally, as of v2.0, SimpleXlsxReader is the fastest and most memory-efficient parser. Previously this project couldn't reasonably load anything over ~10k rows. Other parsers could load 100k+ rows, but were still taking ~1gb RSS to do so, even "streaming," which seemed excessive. So a SAX implementation was born. See performance for details.

Usage

Streaming

SimpleXlsxReader is performant by default - If you use rows.each {|row| ...} it will stream the XLSX rows to your block without loading either the sheet XML or the full sheet data into memory.

You can also chain rows.each with other Enumerable functions without triggering a slurp, and you have lots of ways to find and map headers while streaming.

If you had an excel sheet representing this data:

| Hero ID | Hero Name  | Location     |
| 13576   | Samus Aran | Planet Zebes |
| 117     | John Halo  | Ring World   |
| 9704133 | Iron Man   | Planet Earth |

Get a handle on the rows proxy:

rows = SimpleXlsxReader.open('suited_heroes.xlsx').sheets.first.rows

Simple streaming (kinda boring):

rows.each { |row| ... }

Streaming with headers, and how about a little enumerable chaining:

# Map of hero names by ID: { 117 => 'John Halo', ... }

rows.each(headers: true).with_object({}) do |row, acc|
  acc[row['Hero ID']] = row['Hero Name']
end

Sometimes though you have some junk at the top of your spreadsheet:

| Unofficial Report  |                        |              |
| Dont tell Nintendo | Yes "John Halo" I know |              |
|                    |                        |              |
| Hero ID            | Hero Name              | Location     |
| 13576              | Samus Aran             | Planet Zebes |
| 117                | John Halo              | Ring World   |
| 9704133            | Iron Man               | Planet Earth |

For this, headers can be a hash whose keys replace headers and whose values help find the correct header row:

# Same map of hero names by ID: { 117 => 'John Halo', ... }

rows.each(headers: {id: /ID/, name: /Name/}).with_object({}) do |row, acc|
  acc[row[:id]] = row[:name]
end

If your header-to-attribute mapping is more complicated than key/value, you can do the mapping elsewhere, but use a block to find the header row:

# Example roughly analogous to some production code mapping a single spreadsheet
# across many objects. Might be a simpler way now that we have the headers-hash
# feature.

object_map = { Hero => { id: 'Hero ID', name: 'Hero Name', location: 'Location' } }

HEADERS = ['Hero ID', 'Hero Name', 'Location']

rows.each(headers: ->(row) { (HEADERS & row).any? }) do |row|
  object_map.each_pair do |klass, attribute_map|
    attributes =
      attribute_map.each_pair.with_object({}) do |(key, header), attrs|
        attrs[key] = row[header]
      end

    klass.new(attributes)
  end
end

Slurping

To make SimpleXlsxReader rows act like an array, for use with legacy SimpleXlsxReader apps or otherwise, we still support slurping the whole array into memory. The good news is even when doing this, the xlsx worksheet & shared string files are never loaded as a (big) Nokogiri doc, so that's nice.

By default, to prevent accidental slurping, <RowsProxy> will throw an exception if you try to access it with array methods like [] and shift without explicitly slurping first. You can slurp either by calling rows.slurp or globally by setting SimpleXlsxReader.configuration.auto_slurp = true.

Once slurped, enumerable methods on rows will use the slurped data (i.e. not re-parse the sheet), and those Array-like methods will work.

We don't support all Array methods, just the few we have used in real projects, as we transition towards streaming instead.

Load Errors

By default, cell load errors (ex. if a date cell contains the string 'hello') result in a SimpleXlsxReader::CellLoadError.

If you would like to provide better error feedback to your users, you can set SimpleXlsxReader.configuration.catch_cell_load_errors = true, and load errors will instead be inserted into Sheet#load_errors keyed by [rownum, colnum]:

{
  [rownum, colnum] => '[error]'
}

Performance

SimpleXlsxReader is (as of this writing) the fastest and most memory efficient Ruby xlsx parser.

Recent updates here have focused on large spreadsheets with especially non-unique strings in sheets using xlsx' shared strings feature (Excel-generated spreadsheets always use this). Other projects have implemented streaming parsers for the sheet data, but currently none stream while loading the shared strings file, which is the second-largest file in an xlsx archive and can represent millions of strings in large files.

For more details, see my fork of @shkm's excel benchmark project, but here's the summary:

1mb excel file, 10,000 rows of sample "sales records" with a fair amount of non-unique strings (ran on an M1 Macbook Pro):

Gem	Parses/second	RSS Increase	Allocated Mem	Retained Mem	Allocated Objects	Retained Objects
simple_xlsx_reader	1.13	36.94mb	614.51mb	1.13kb	8796275	3
roo	0.75	74.0mb	164.47mb	2.18kb	2128396	4
creek	0.65	107.55mb	581.38mb	3.3kb	7240760	16
xsv	0.61	75.66mb	2127.42mb	3.66kb	5922563	10
rubyxl	0.27	373.52mb	716.7mb	2.18kb	10612577	4

Here is a benchmark for the "worst" file I've seen, a 26mb file whose shared strings represent 10% of the archive (note, MemoryProfiler has too much overhead to reasonably measure allocations so that analysis was left off, and we just measure total time for one parse):

Gem	Time	RSS Increase
simple_xlsx_reader	28.71s	148.77mb
roo	40.25s	1322.08mb
xsv	45.82s	391.27mb
creek	60.63s	886.81mb
rubyxl	238.68s	9136.3mb

Installation

Add this line to your application's Gemfile:

gem 'simple_xlsx_reader'

And then execute:

$ bundle

Or install it yourself as:

$ gem install simple_xlsx_reader

Versioning

This project follows semantic versioning 1.0

Contributing

Remember to write tests, think about edge cases, and run the existing suite.

The full suite contains a performance test that on an M1 MBP runs the final large file in about five seconds. Check out that test before & after your change to check for performance changes.

Then, the standard stuff:

Fork this project
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
Create new Pull Request

simple_xlsx_reader's People

Contributors

Stargazers

Watchers

Forkers

johnmcwilliams marktorrance timrauhut leadbaxter pk4media ebeigarts filiperocha quoideneuf dphase zilverline serg123e sibylus deltreey nebiolabs usawal eritiro seako missiondata netbrick raksonibs zeumo jamesadney angeloashmore brentmc79 sharesight blackli0nxx lsq gocarrot jayecao til msk-yv enpnetwork narnach reisbalans-tech rajanverma-me philippgoecke jrodrigosm robbevp kalrover enzymecorp vinosmith cafonso laurent-martin dlackty iq-scm codemole jgarlick bolandross hbriggs olleolleolle myabc

simple_xlsx_reader's Issues

need skip type casting

some cell is needed original string.

in my case cell data is floating number(currency style), type is nil, but style is fixnum.
this case cell data is casting to integer.

optional skip type casting(not shared string) will be useful another case.
how do you think about this?

disappear string for long text

original text
皮革制品 - 皮革制品需避免接触水分(雨水，水，汗液等)，油，湿气及阳光直射。佩戴及携带时，可能与其它色彩的产品摩擦引起移染，因此与其它产品(特别是 JEANS 等服装)接触时敬请注意。 - 若不小心弄湿，请立即除去水渍，将商品的损伤最小化。 - 天然皮革表面上的伤痕及斑点，并不是缺陷而是使用天然素材的证据。 - 根据素材特性，因使用会产生磨耗或掉色，此情况与产品的耐久性无关。 - 浅色材质，易发生被外部深色物体染色的可能性，使用时请多加注意。丝巾、皮带及饰品没有专门可以维修的部分，请多加注意。

read by simple_xlsx_reader
请多加注意。丝巾、皮带及饰品没有专门可以维修的部分，请多加注意。

how to fix?

sheet name and sheet mismatch when using ruby 1.8.7

Hash in 1.9 is enumerate in insertion order, but in 1.8.7 is not.

In SimpleXlsxReader::Document::Mapper::load_sheets

Percentage 100 times too small

Using the linked to XLSX, generated in Excel for Mac, the percentages are being cast as 100 times too small.

http://flmisc.s3.amazonaws.com/multiple_field_type.xlsx

The result is
= #SimpleXlsxReader::Document:0x000001012a55d8 @file_path="Dropbox/clients/oslo_data/multiple_field_type.xlsx", @sheets=[#struct SimpleXlsxReader::Document::Sheet name="Sheet1", rows=[["Name", "Class", "Score % ", "Score Actual", "Salary", "My vals", nil], ["Ben", "A", 0.0087, "98", "300", 0.0087, nil], ["Oskar", "A", 0.005600000000000001, "78", "500", 0.005600000000000001, nil], ["Jasmine", "B", 0.006999999999999999, "87", "350.2", 0.006999999999999999, nil]]]

The 0.0087 should be 0.87

It appears that the division by 100 in the casting by format is not required?

Is this a Mac only issue or a general one?

Thanks, Ben

TypeError: {:keyword_init=>true} is not a symbol in new 2.0.0-pre

simple_xlsx_reader/loader.rb:22:in `new': {:keyword_init=>true} is not a symbol (TypeError)

this is on ruby 2.4.5 - perhaps you can make the gem compatible for the upgrade-challenged?
something along these lines might help? alexford/twilito#4

Originally posted by @woahdae in #25 (comment)

Should not treat all cells with the "Generic" style as strings

Hi!

I've realized that, when reading the contents of a cell, if the cell's type is missing and the cell's style is "Generic", the gem treats the cell's value as if it was a string. Line 874 in test/simple_xlsx_reader_test.rb says it all: it "reads 'Generic' cells as strings" do.

This is actually a problem: when reading a spreadsheet with numeric values that have no type and a "Generic" style, the gem does not convert them to numbers, but leaves them as strings. This happens, for example, if you create a spreadsheet (with just numbers and no formatting) in Google Sheets and download it as an .xlsx file. If you try to read the file with the gem, you'll see that all the numbers are read as if they were strings.

I have created a (currently failing) test to show the behavior: https://github.com/jrodrigosm/simple_xlsx_reader/tree/treat-typeless-cells-as-numbers-by-default . This is the diff:

diff --git a/test/simple_xlsx_reader_test.rb b/test/simple_xlsx_reader_test.rb
index 20724cd..cc159dc 100644
--- a/test/simple_xlsx_reader_test.rb
+++ b/test/simple_xlsx_reader_test.rb
@@ -818,6 +818,10 @@ describe SimpleXlsxReader do
                 <c r='I1' s='0'>
                   <v>GUI-made hyperlink</v>
                 </c>
+
+                <c r='J1' s='0'>
+                  <v>1</v>
+                </c>
               </row>
             </sheetData>
 
@@ -916,6 +920,10 @@ describe SimpleXlsxReader do
         )
       )
     end
+    
+    it "reads 'Generic' cells with numbers as numbers" do
+      _(@row[9]).must_equal 1
+    end
   end
 
   describe 'parsing documents with blank rows' do

I think the gem might be interpreting the "Generic" style incorrectly. According to the standard, if the cell's type is missing, it should be treated as a number by default (see section 18.18.11 of the ECMA-376 standard, and see also line 3918 in section A.2). The standard also says that a "Generic" format can perfectly be applied to numbers (section 18.8.30).

I think the gem's behavior should probably be tweaked so that, if a cell's type is missing (which means, by default, that it is a number), and its style is "Generic", the cell value is NOT parsed as a string, but as a number (Integer, Float) or a date.

Would you agree with this interpretation?

If so, would you accept a PR?

Use case-insensitive sheet matching

Some XLSX creators create sheet names with a capital-S which causes the regex on:

https://github.com/woahdae/simple_xlsx_reader/blob/master/lib/simple_xlsx_reader.rb#L122

to fail. The solution is easy: just implement the /i regex flag like so:

entry.name.match(/^xl.worksheets.sheet([0-9]*)\.xml$/i

I could submit a PR if you want, but its a one-character change. For now I monkey-patched it.

ArgumentError: path name contains null byte on v2.x

After updating to the version v2.0.0 I'm receiving the following error on my RSpec test when calling the SimpleXlsxReader.parse:

RSpec.describe FooController do
  render_views

  it 'exports' do
    get :export, format: :xlsx

    SimpleXlsxReader.parse(response.body)
  end
end

ArgumentError: path name contains null byte

Error when opening .xlsx files

I hope this message finds you well. Firstly, I'd like to express my gratitude for the efforts put into maintaining the SimpleXlsxReader gem – it's been a valuable tool for many.

I've come across an issue while utilizing the gem that I believe needs your attention. The problem arises when I attempt to run the following code snippet:

doc = SimpleXlsxReader.open("tmp/filename.xlsx")

This line of code triggers an error message: RuntimeError: Zip::ZipFile.open_buffer expects an argument of class String or IO. Found: File. After investigating the source of the issue, it seems that the root cause lies within the loader.rb file, precisely at this line.

My solution to this problem was to modify the line SimpleXlsxReader::Zip.open_buffer(string_or_io) to SimpleXlsxReader::Zip.open(string_or_io). This change effectively resolves the issue and allows the code to run smoothly.

I believe this small tweak could prevent similar errors for other users as well. I wanted to bring this to your attention so that it can be reviewed and addressed appropriately.

Special and long words, such as German words, are not supported in the new version.

In the 1.0.5 version, we could read the column that has an object value, but in the new version, we can't read them.

Zlib error ('buffer error') while inflating when reading from string

To reproduce:

Change the load from string test to actually load from string: #57

Workaround:

I haven't had time to dig into the issue. However commenting the following lines in lib/simple_xlsx_reader/loader/sheet_parser.rb appears to workaround the issue.

if xrels_file&.grep(/hyperlink/)&.any?
  xrels_file.rewind
  load_gui_hyperlinks # represented as hyperlinks_by_cell
end

Parsing empty cell causes an exception

I know it says that since version 0.9.1 the problem with parsing empty cells has been solved. However, this being my first time using the simple_xlsx_reader gem and the very first problem I've encountered is that it bombs out on row 0, column 2 (3rd column) of my spreadsheet.

The only thing I can say about it is that my data is "irregular" in that the rows at the top of the spreadsheet have fewer columns than subsequent rows. Sort of like this:

| A | B | C | D | E | F | <---- columns in spreadsheet
1 | X | X | <---- row 1 ('X' marks where I have data)
2 | X |
3
4 | X | X | X | X | X | X |
5 | X | X | X | X | X | X |
6 | X | X | X | X | X | X |
7
8 | X | X |
9 | X |

etc. You get the idea.

I will probably be cloning and fixing it for myself. I'd be happy to give you the changes I make, but I haven't ever done a pull request or whatever, so if you'd help guide me or I can just tell you what I changed, or whatever works.

I just need to get it working.

Unable to find a header

So for some reason, this works

xlsx.sheets.first.rows.each(headers: {last_name: /last name|nom/i, first_name: /first name|prénom|prenom/i, email: /email/i})

I get my header[:first_name]

but this doesn't :

xlsx.sheets.first.rows.each(headers: {first_name: /first name|prénom|prenom/i, last_name: /last name|nom/i, email: /email/i})

It cannot find prénom or prenom for some reason.

Parse as string

In my sheet I have some value than is transformed in date or number.
How can I force all to be parsed as a string?

Performance vs RubyXL

I think your gem is much cleaner and than RubyXL. Furthermore, it parses dates correctly. Only the performance is much behind RubyXL. I parsed 6000 lines, 11 columns and got:

RubyXL: 8s
SimpleXlsx: 35s

Is there any hint, to speedup parsing?

Thanks a lot,
Joachim

NoMethodError: undefined method 'at_xpath' for nil:NilClass

Hi,
I have some troubles while parsing a file since yesterday. The file opens well on Microsoft Excel.
Should a new version of Microsoft Excel cause that bug?

NoMethodError: undefined method `at_xpath' for nil:NilClass
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:192:in `last_cell_label'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:204:in `sheet_dimensions'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:127:in `parse_sheet'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:109:in `block in load_sheets'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:108:in `each'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:108:in `each_with_index'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:108:in `each'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:108:in `map'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:108:in `load_sheets'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:40:in `sheets'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:29:in `tap'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:29:in `open'

This bug appears when I try to parse the file BulletinSearch.xlsx from http://www.microsoft.com/en-us/download/details.aspx?id=36982

Thanks for help

Empty cells should return "nil"?

In case there is an empty cell, an exception is raised. I can disable this with
SimpleXlsxReader.configuration.catch_cell_load_errors = true. Then I get an empty string on an empty cell.

I am not sure, if this is the right way. Why not just return "nil" on empty cells?

Thanks a lot,
Joachim

Memory Consumption Too High

Hi,

when I read from a 2.1mb excel file with approx 65k rows, the read process allocates about 500mb of RAM. Pretty high for such a small file. Could the be reduced? Maybe by writing tmp files from the zip to disk or using a streaming parser?

Interface to read from an IO object

I have an XLSX document already in memory (I'm testing a web controller than renders a spreadsheet). To parse it, I currently need to write it out to disk then pass the filename into SimpleXlsxReader.open. I'd like to be able to just parse it directly from memory.

Memory Leak

I believe there is a memory leak. The test I did is describe below.

s = SimpleXlsxReader.open('big.xlsx')
s = nil
GC.start

The memory consuption was ~2,000MB before the open. With the file open, it went to ~3,400MB. After the garbage collector, it went to ~3,200MB

Default rake task fails

When I run rake with no arguments, i.e. trying to run default rake task, which in your case runs all tests, I get errors:

➜  simple_xlsx_reader git:(rubyzip-1-0-compat) rake
Warning: you should require 'minitest/autorun' instead.
Warning: or add 'gem "minitest"' before 'require "minitest/autorun"'
From:
  /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/minitest-5.0.6/lib/minitest/spec.rb:3:in `<top (required)>'
  /Users/valentin/Projects/personal/simple_xlsx_reader/test/test_helper.rb:2:in `<top (required)>'
  /Users/valentin/Projects/personal/simple_xlsx_reader/test/performance_test.rb:1:in `<top (required)>'
  /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:10:in `block (2 levels) in <main>'
  /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:9:in `each'
  /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:9:in `block in <main>'
  /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:4:in `select'
  /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:4:in `<main>'
/Users/valentin/Projects/personal/simple_xlsx_reader/test/performance_test.rb:99:in `block in <top (required)>': undefined method `bench_performance_linear' for SimpleXlsxReader:Class (NoMethodError)
    from /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/minitest-5.0.6/lib/minitest/spec.rb:70:in `class_eval'
    from /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/minitest-5.0.6/lib/minitest/spec.rb:70:in `describe'
    from /Users/valentin/Projects/personal/simple_xlsx_reader/test/performance_test.rb:4:in `<top (required)>'
    from /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
    from /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
    from /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:10:in `block (2 levels) in <main>'
    from /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:9:in `each'
    from /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:9:in `block in <main>'
    from /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:4:in `select'
    from /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:4:in `<main>'
rake aborted!
Command failed with status (1): [ruby -I"lib:test" -I"/Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib" "/Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb" "test/**/*_test.rb" ]

Tasks: TOP => default => test
(See full trace by running task with --trace)

Please update the README with details of how to properly run your tests.

NoMethodError: undefined method `parse' for nil:NilClass / sheet_parser=nil

Trying to parse this: https://www.ercot.com/files/docs/2023/02/07/IntGenbyFuel2023.xlsx

Mirrored, as ercot.com geoblocks at least Europe: IntGenbyFuel2023.xlsx

doc = SimpleXlsxReader.open('IntGenbyFuel2023.xlsx')
rows = doc.sheets.first.rows
=> #<SimpleXlsxReader::Document::RowsProxy:0x00007f3df5bb2c60 @load_errors={}, @sheet_parser=nil, @slurped=nil>
rows.first
NoMethodError: undefined method `parse' for nil:NilClass
from /home/morgan/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/simple_xlsx_reader-5.0.0/lib/simple_xlsx_reader/document.rb:97:in `each'

@sheet_parser is nil for all sheets

Decimal Number Rounding Issue

Hi @woahdae,
I recently found an issue when trying to read an excel file that contains several of float numbers as below
.

Issue found:
Some of the numbers below the 'use General format' row will have incorrect format when i call the reader. However, when i applied 'Accounting' format, the numbers below 'use Accounting' format' row has been imported correctly.
Here is the result that I have printed out.

Expected behaviour: the imported float number values should be the same as the one in excel sheet.

My implementation:

    filepath = "#{Rails.root}/app/assets/Book1.xlsx"
    
    doc = SimpleXlsxReader.open(filepath)
    puts doc.sheets.first.name
    puts doc.sheets.first.rows

Ruby version: 2.6.3
Rails version: 5.2.0

Unable to read the exported xlsx file extension excel file

Hi,

Would like to ask an issue whereby I used a tool called GrapeCity.Spreadbuilder.Workbook to export out excel files but unable to read through ruby. It gives the following error.
C:/Ruby23-x64/lib/ruby/gems/2.3.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:193:in last_cell_label': undefined method at_xpath' for nil:NilClass (NoMethodError)
Did you mean? at_exit

I found out that the XML info needed is missing, in this case is the dimension info.
Is there a solution I can refer to solve this?

Sample link for reference on how I write my code to export
http://helpcentral.componentone.com/netHelp/AR9/GrapeCity.ActiveReports.Export.Excel.v9~GrapeCity.SpreadBuilder.Workbook.html

Thanks.

Compatibility with RubyZip 1.0 broken

it fails when trying to require zip/zip

Documentation usage wrong

In the Idiomatic section, the documentation specifies:
doc = SimpleXlsxReader.open(path_or_io)
but looking into the source code the method that receives io is parse

Thanks for the amazing gem

Fatally high memory usage when opening large spreadsheets

Repro steps:

Create a simple workbook with several worksheets, and around a million total rows. The resultant .xlsx file is around 27MiB in size.
Open it with the following code:

require 'simple_xlsx_reader'
file = File.open('large.xlsx')
doc = SimpleXlsxReader.open(file)

Observe the Ruby process consumes all available RAM on the system (6.0GiB), then the system locks up necessitating a hard reboot. Presumably it's paging like mad as the drive light is constantly on.

Details:

ruby 2.2.1p85 (2015-02-26 revision 49769) [x86_64-linux]
simple_xlsx_reader (1.0.2)
nokogiri (1.6.6.2)
rubyzip (1.1.7)

Rounding

I have a spreadsheet that contains float values formatted as integers. When I read those values though simple_xlsx_reader it returns integers instead of floats, but that's not the real problem, the thing is that it rounds differently.

Any advice? Maybe a way of force simple_xlsx_reader to always read numbers as float.

simple_xlsx_reader output:
139702
35519
3488

REF!

113544
120874
92441
10262
11017
13339
4923

No way to open a XLSX locked by password

Hi, I just wondering if there is a way to open a xlsx spreadsheet which is locked by password.
I even have the password to open that file but I can't find a method in this gem that I could use to open the file.
Something like: SimpleXlsxReader.open(xlsx_filename, password)

There is something to workaround this use case.

enumerator.reverse with ruby 1.9.3

I have a ruby 1.9.3 (windows) install. This gem errors out in the following method:

      def column_letter_to_number(column_letter)
        pow = -1
        column_letter.codepoints.reverse.inject(0) do |acc, charcode|
          pow += 1
          acc + 26**pow * (charcode - 64)
        end
      end

For 1.9.3, the iterator needs to have to_a, as in column_letter.codepoints.to_a.reverse.inject(0). Once that is in place, all is good with the tests.

If simple_xlsx_reader is meant for ruby 2+, then I didn't get that from any dependencies. Sorry.