Coder Social home page Coder Social logo

simple_xlsx_reader's Introduction

SimpleXlsxReader

A fast xlsx reader for Ruby that parses xlsx cell values into plain ruby primitives and dates/times.

This is not a rewrite of excel in Ruby. Font styles, for example, are parsed to determine whether a cell is a number or a date, then forgotten. We just want to get the data, and get out!

Summary (now with stream parsing):

doc = SimpleXlsxReader.open('/path/to/workbook.xlsx')
doc.sheets # => [<#SXR::Sheet>, ...]
doc.sheets.first.name # 'Sheet1'
rows = doc.sheet.first.rows # <SXR::Document::RowsProxy>
rows.each # an <Enumerator> ready to chain or stream
rows.each {} # Streams the rows to your block
rows.each(headers: true) {} # Streams row-hashes
rows.each(headers: {id: /ID/}) {} # finds & maps headers, streams
rows.slurp # Slurps rows into memory as a 2D array

That's the gist of it!

See also the Document object.

Why?

Accurate

This project was started years ago, primarily because other Ruby xlsx parsers didn't import data with the correct types. Numbers as strings, dates as numbers, hyperlinks with inaccessible URLs, or - subtly buggy - simple dates as DateTime objects. If your app uses a timezone offset, depending on what timezone and what time of day you load the xlsx file, your dates might end up a day off! SimpleXlsxReader understands all these correctly.

Idiomatic

Many Ruby xlsx parsers seem to be inspired more by Excel than Ruby, frankly. SimpleXlsxReader strives to be fairly idiomatic Ruby:

# quick example having fun w/ ruby
doc = SimpleXlsxReader.open(path_or_io)
doc.sheets.first.rows.each(headers: {id: /ID/})
  .with_index.with_object({}) do |(row, index), acc|
    acc[row[:id]] = index
end

Now faster

Finally, as of v2.0, SimpleXlsxReader is the fastest and most memory-efficient parser. Previously this project couldn't reasonably load anything over ~10k rows. Other parsers could load 100k+ rows, but were still taking ~1gb RSS to do so, even "streaming," which seemed excessive. So a SAX implementation was born. See performance for details.

Usage

Streaming

SimpleXlsxReader is performant by default - If you use rows.each {|row| ...} it will stream the XLSX rows to your block without loading either the sheet XML or the full sheet data into memory.

You can also chain rows.each with other Enumerable functions without triggering a slurp, and you have lots of ways to find and map headers while streaming.

If you had an excel sheet representing this data:

| Hero ID | Hero Name  | Location     |
| 13576   | Samus Aran | Planet Zebes |
| 117     | John Halo  | Ring World   |
| 9704133 | Iron Man   | Planet Earth |

Get a handle on the rows proxy:

rows = SimpleXlsxReader.open('suited_heroes.xlsx').sheets.first.rows

Simple streaming (kinda boring):

rows.each { |row| ... }

Streaming with headers, and how about a little enumerable chaining:

# Map of hero names by ID: { 117 => 'John Halo', ... }

rows.each(headers: true).with_object({}) do |row, acc|
  acc[row['Hero ID']] = row['Hero Name']
end

Sometimes though you have some junk at the top of your spreadsheet:

| Unofficial Report  |                        |              |
| Dont tell Nintendo | Yes "John Halo" I know |              |
|                    |                        |              |
| Hero ID            | Hero Name              | Location     |
| 13576              | Samus Aran             | Planet Zebes |
| 117                | John Halo              | Ring World   |
| 9704133            | Iron Man               | Planet Earth |

For this, headers can be a hash whose keys replace headers and whose values help find the correct header row:

# Same map of hero names by ID: { 117 => 'John Halo', ... }

rows.each(headers: {id: /ID/, name: /Name/}).with_object({}) do |row, acc|
  acc[row[:id]] = row[:name]
end

If your header-to-attribute mapping is more complicated than key/value, you can do the mapping elsewhere, but use a block to find the header row:

# Example roughly analogous to some production code mapping a single spreadsheet
# across many objects. Might be a simpler way now that we have the headers-hash
# feature.

object_map = { Hero => { id: 'Hero ID', name: 'Hero Name', location: 'Location' } }

HEADERS = ['Hero ID', 'Hero Name', 'Location']

rows.each(headers: ->(row) { (HEADERS & row).any? }) do |row|
  object_map.each_pair do |klass, attribute_map|
    attributes =
      attribute_map.each_pair.with_object({}) do |(key, header), attrs|
        attrs[key] = row[header]
      end

    klass.new(attributes)
  end
end

Slurping

To make SimpleXlsxReader rows act like an array, for use with legacy SimpleXlsxReader apps or otherwise, we still support slurping the whole array into memory. The good news is even when doing this, the xlsx worksheet & shared string files are never loaded as a (big) Nokogiri doc, so that's nice.

By default, to prevent accidental slurping, <RowsProxy> will throw an exception if you try to access it with array methods like [] and shift without explicitly slurping first. You can slurp either by calling rows.slurp or globally by setting SimpleXlsxReader.configuration.auto_slurp = true.

Once slurped, enumerable methods on rows will use the slurped data (i.e. not re-parse the sheet), and those Array-like methods will work.

We don't support all Array methods, just the few we have used in real projects, as we transition towards streaming instead.

Load Errors

By default, cell load errors (ex. if a date cell contains the string 'hello') result in a SimpleXlsxReader::CellLoadError.

If you would like to provide better error feedback to your users, you can set SimpleXlsxReader.configuration.catch_cell_load_errors = true, and load errors will instead be inserted into Sheet#load_errors keyed by [rownum, colnum]:

{
  [rownum, colnum] => '[error]'
}

Performance

SimpleXlsxReader is (as of this writing) the fastest and most memory efficient Ruby xlsx parser.

Recent updates here have focused on large spreadsheets with especially non-unique strings in sheets using xlsx' shared strings feature (Excel-generated spreadsheets always use this). Other projects have implemented streaming parsers for the sheet data, but currently none stream while loading the shared strings file, which is the second-largest file in an xlsx archive and can represent millions of strings in large files.

For more details, see my fork of @shkm's excel benchmark project, but here's the summary:

1mb excel file, 10,000 rows of sample "sales records" with a fair amount of non-unique strings (ran on an M1 Macbook Pro):

Gem Parses/second RSS Increase Allocated Mem Retained Mem Allocated Objects Retained Objects
simple_xlsx_reader 1.13 36.94mb 614.51mb 1.13kb 8796275 3
roo 0.75 74.0mb 164.47mb 2.18kb 2128396 4
creek 0.65 107.55mb 581.38mb 3.3kb 7240760 16
xsv 0.61 75.66mb 2127.42mb 3.66kb 5922563 10
rubyxl 0.27 373.52mb 716.7mb 2.18kb 10612577 4

Here is a benchmark for the "worst" file I've seen, a 26mb file whose shared strings represent 10% of the archive (note, MemoryProfiler has too much overhead to reasonably measure allocations so that analysis was left off, and we just measure total time for one parse):

Gem Time RSS Increase
simple_xlsx_reader 28.71s 148.77mb
roo 40.25s 1322.08mb
xsv 45.82s 391.27mb
creek 60.63s 886.81mb
rubyxl 238.68s 9136.3mb

Installation

Add this line to your application's Gemfile:

gem 'simple_xlsx_reader'

And then execute:

$ bundle

Or install it yourself as:

$ gem install simple_xlsx_reader

Versioning

This project follows semantic versioning 1.0

Contributing

Remember to write tests, think about edge cases, and run the existing suite.

The full suite contains a performance test that on an M1 MBP runs the final large file in about five seconds. Check out that test before & after your change to check for performance changes.

Then, the standard stuff:

  1. Fork this project
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

simple_xlsx_reader's People

Contributors

bforma avatar bwlang avatar codemole avatar dlackty avatar eritiro avatar kalsan avatar marilynfranklin avatar olleolleolle avatar quoideneuf avatar robbevp avatar rodrigoserranodekuple avatar strnadj avatar taichi-ishitani avatar til avatar woahdae avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

simple_xlsx_reader's Issues

need skip type casting

some cell is needed original string.

in my case cell data is floating number(currency style), type is nil, but style is fixnum.
this case cell data is casting to integer.

optional skip type casting(not shared string) will be useful another case.
how do you think about this?

disappear string for long text

original text
皮革制品 - 皮革制品需避免接触水分(雨水,水,汗液等),油,湿气及阳光直射。佩戴及携带时,可能与其它色彩的产品摩擦引起移染,因此与其它产品(特别是 JEANS 等服装)接触时敬请注意。 - 若不小心弄湿,请立即除去水渍,将商品的损伤最小化。 - 天然皮革表面上的伤痕及斑点,并不是缺陷而是使用天然素材的证据。 - 根据素材特性,因使用会产生磨耗或掉色,此情况与产品的耐久性无关。 - 浅色材质,易发生被外部深色物体染色的可能性,使用时请多加注意。 丝巾、皮带及饰品没有专门可以维修的部分,请多加注意。

->

read by simple_xlsx_reader
请多加注意。 丝巾、皮带及饰品没有专门可以维修的部分,请多加注意。

how to fix?

Percentage 100 times too small

Using the linked to XLSX, generated in Excel for Mac, the percentages are being cast as 100 times too small.

http://flmisc.s3.amazonaws.com/multiple_field_type.xlsx

The result is
= #SimpleXlsxReader::Document:0x000001012a55d8 @file_path="Dropbox/clients/oslo_data/multiple_field_type.xlsx", @sheets=[#struct SimpleXlsxReader::Document::Sheet name="Sheet1", rows=[["Name", "Class", "Score % ", "Score Actual", "Salary", "My vals", nil], ["Ben", "A", 0.0087, "98", "300", 0.0087, nil], ["Oskar", "A", 0.005600000000000001, "78", "500", 0.005600000000000001, nil], ["Jasmine", "B", 0.006999999999999999, "87", "350.2", 0.006999999999999999, nil]]]

The 0.0087 should be 0.87

It appears that the division by 100 in the casting by format is not required?

Is this a Mac only issue or a general one?

Thanks, Ben

Should not treat all cells with the "Generic" style as strings

Hi!

I've realized that, when reading the contents of a cell, if the cell's type is missing and the cell's style is "Generic", the gem treats the cell's value as if it was a string. Line 874 in test/simple_xlsx_reader_test.rb says it all: it "reads 'Generic' cells as strings" do.

This is actually a problem: when reading a spreadsheet with numeric values that have no type and a "Generic" style, the gem does not convert them to numbers, but leaves them as strings. This happens, for example, if you create a spreadsheet (with just numbers and no formatting) in Google Sheets and download it as an .xlsx file. If you try to read the file with the gem, you'll see that all the numbers are read as if they were strings.

I have created a (currently failing) test to show the behavior: https://github.com/jrodrigosm/simple_xlsx_reader/tree/treat-typeless-cells-as-numbers-by-default . This is the diff:

diff --git a/test/simple_xlsx_reader_test.rb b/test/simple_xlsx_reader_test.rb
index 20724cd..cc159dc 100644
--- a/test/simple_xlsx_reader_test.rb
+++ b/test/simple_xlsx_reader_test.rb
@@ -818,6 +818,10 @@ describe SimpleXlsxReader do
                 <c r='I1' s='0'>
                   <v>GUI-made hyperlink</v>
                 </c>
+
+                <c r='J1' s='0'>
+                  <v>1</v>
+                </c>
               </row>
             </sheetData>
 
@@ -916,6 +920,10 @@ describe SimpleXlsxReader do
         )
       )
     end
+    
+    it "reads 'Generic' cells with numbers as numbers" do
+      _(@row[9]).must_equal 1
+    end
   end
 
   describe 'parsing documents with blank rows' do

I think the gem might be interpreting the "Generic" style incorrectly. According to the standard, if the cell's type is missing, it should be treated as a number by default (see section 18.18.11 of the ECMA-376 standard, and see also line 3918 in section A.2). The standard also says that a "Generic" format can perfectly be applied to numbers (section 18.8.30).

I think the gem's behavior should probably be tweaked so that, if a cell's type is missing (which means, by default, that it is a number), and its style is "Generic", the cell value is NOT parsed as a string, but as a number (Integer, Float) or a date.

Would you agree with this interpretation?

If so, would you accept a PR?

ArgumentError: path name contains null byte on v2.x

After updating to the version v2.0.0 I'm receiving the following error on my RSpec test when calling the SimpleXlsxReader.parse:

RSpec.describe FooController do
  render_views

  it 'exports' do
    get :export, format: :xlsx

    SimpleXlsxReader.parse(response.body)
  end
end
ArgumentError: path name contains null byte

Error when opening .xlsx files

I hope this message finds you well. Firstly, I'd like to express my gratitude for the efforts put into maintaining the SimpleXlsxReader gem – it's been a valuable tool for many.

I've come across an issue while utilizing the gem that I believe needs your attention. The problem arises when I attempt to run the following code snippet:

doc = SimpleXlsxReader.open("tmp/filename.xlsx")

This line of code triggers an error message: RuntimeError: Zip::ZipFile.open_buffer expects an argument of class String or IO. Found: File. After investigating the source of the issue, it seems that the root cause lies within the loader.rb file, precisely at this line.

My solution to this problem was to modify the line SimpleXlsxReader::Zip.open_buffer(string_or_io) to SimpleXlsxReader::Zip.open(string_or_io). This change effectively resolves the issue and allows the code to run smoothly.

I believe this small tweak could prevent similar errors for other users as well. I wanted to bring this to your attention so that it can be reviewed and addressed appropriately.

Zlib error ('buffer error') while inflating when reading from string

To reproduce:

  • Change the load from string test to actually load from string: #57

Workaround:

I haven't had time to dig into the issue. However commenting the following lines in lib/simple_xlsx_reader/loader/sheet_parser.rb appears to workaround the issue.

if xrels_file&.grep(/hyperlink/)&.any?
  xrels_file.rewind
  load_gui_hyperlinks # represented as hyperlinks_by_cell
end

Parsing empty cell causes an exception

I know it says that since version 0.9.1 the problem with parsing empty cells has been solved. However, this being my first time using the simple_xlsx_reader gem and the very first problem I've encountered is that it bombs out on row 0, column 2 (3rd column) of my spreadsheet.

The only thing I can say about it is that my data is "irregular" in that the rows at the top of the spreadsheet have fewer columns than subsequent rows. Sort of like this:

| A | B | C | D | E | F | <---- columns in spreadsheet
1 | X | X | <---- row 1 ('X' marks where I have data)
2 | X |
3
4 | X | X | X | X | X | X |
5 | X | X | X | X | X | X |
6 | X | X | X | X | X | X |
7
8 | X | X |
9 | X |

etc. You get the idea.

I will probably be cloning and fixing it for myself. I'd be happy to give you the changes I make, but I haven't ever done a pull request or whatever, so if you'd help guide me or I can just tell you what I changed, or whatever works.

I just need to get it working.

Unable to find a header

So for some reason, this works

xlsx.sheets.first.rows.each(headers: {last_name: /last name|nom/i, first_name: /first name|prénom|prenom/i, email: /email/i})

I get my header[:first_name]

but this doesn't :

xlsx.sheets.first.rows.each(headers: {first_name: /first name|prénom|prenom/i, last_name: /last name|nom/i, email: /email/i})

It cannot find prénom or prenom for some reason.

Parse as string

In my sheet I have some value than is transformed in date or number.
How can I force all to be parsed as a string?

Performance vs RubyXL

I think your gem is much cleaner and than RubyXL. Furthermore, it parses dates correctly. Only the performance is much behind RubyXL. I parsed 6000 lines, 11 columns and got:

RubyXL: 8s
SimpleXlsx: 35s

Is there any hint, to speedup parsing?

Thanks a lot,
Joachim

NoMethodError: undefined method 'at_xpath' for nil:NilClass

Hi,
I have some troubles while parsing a file since yesterday. The file opens well on Microsoft Excel.
Should a new version of Microsoft Excel cause that bug?

NoMethodError: undefined method `at_xpath' for nil:NilClass
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:192:in `last_cell_label'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:204:in `sheet_dimensions'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:127:in `parse_sheet'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:109:in `block in load_sheets'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:108:in `each'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:108:in `each_with_index'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:108:in `each'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:108:in `map'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:108:in `load_sheets'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:40:in `sheets'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:29:in `tap'
/Users/me/.rvm/gems/ruby-2.2.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:29:in `open'

This bug appears when I try to parse the file BulletinSearch.xlsx from http://www.microsoft.com/en-us/download/details.aspx?id=36982

Thanks for help

Empty cells should return "nil"?

In case there is an empty cell, an exception is raised. I can disable this with
SimpleXlsxReader.configuration.catch_cell_load_errors = true. Then I get an empty string on an empty cell.

I am not sure, if this is the right way. Why not just return "nil" on empty cells?

Thanks a lot,
Joachim

Memory Consumption Too High

Hi,

when I read from a 2.1mb excel file with approx 65k rows, the read process allocates about 500mb of RAM. Pretty high for such a small file. Could the be reduced? Maybe by writing tmp files from the zip to disk or using a streaming parser?

Interface to read from an IO object

I have an XLSX document already in memory (I'm testing a web controller than renders a spreadsheet). To parse it, I currently need to write it out to disk then pass the filename into SimpleXlsxReader.open. I'd like to be able to just parse it directly from memory.

Memory Leak

I believe there is a memory leak. The test I did is describe below.

s = SimpleXlsxReader.open('big.xlsx')
s = nil
GC.start

The memory consuption was ~2,000MB before the open. With the file open, it went to ~3,400MB. After the garbage collector, it went to ~3,200MB

Default rake task fails

When I run rake with no arguments, i.e. trying to run default rake task, which in your case runs all tests, I get errors:

➜  simple_xlsx_reader git:(rubyzip-1-0-compat) rake
Warning: you should require 'minitest/autorun' instead.
Warning: or add 'gem "minitest"' before 'require "minitest/autorun"'
From:
  /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/minitest-5.0.6/lib/minitest/spec.rb:3:in `<top (required)>'
  /Users/valentin/Projects/personal/simple_xlsx_reader/test/test_helper.rb:2:in `<top (required)>'
  /Users/valentin/Projects/personal/simple_xlsx_reader/test/performance_test.rb:1:in `<top (required)>'
  /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:10:in `block (2 levels) in <main>'
  /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:9:in `each'
  /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:9:in `block in <main>'
  /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:4:in `select'
  /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:4:in `<main>'
/Users/valentin/Projects/personal/simple_xlsx_reader/test/performance_test.rb:99:in `block in <top (required)>': undefined method `bench_performance_linear' for SimpleXlsxReader:Class (NoMethodError)
    from /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/minitest-5.0.6/lib/minitest/spec.rb:70:in `class_eval'
    from /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/minitest-5.0.6/lib/minitest/spec.rb:70:in `describe'
    from /Users/valentin/Projects/personal/simple_xlsx_reader/test/performance_test.rb:4:in `<top (required)>'
    from /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
    from /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
    from /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:10:in `block (2 levels) in <main>'
    from /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:9:in `each'
    from /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:9:in `block in <main>'
    from /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:4:in `select'
    from /Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb:4:in `<main>'
rake aborted!
Command failed with status (1): [ruby -I"lib:test" -I"/Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib" "/Users/valentin/.rbenv/versions/1.9.3-p448/lib/ruby/gems/1.9.1/gems/rake-10.1.0/lib/rake/rake_test_loader.rb" "test/**/*_test.rb" ]

Tasks: TOP => default => test
(See full trace by running task with --trace)

Please update the README with details of how to properly run your tests.

NoMethodError: undefined method `parse' for nil:NilClass / sheet_parser=nil

Trying to parse this: https://www.ercot.com/files/docs/2023/02/07/IntGenbyFuel2023.xlsx

Mirrored, as ercot.com geoblocks at least Europe: IntGenbyFuel2023.xlsx

doc = SimpleXlsxReader.open('IntGenbyFuel2023.xlsx')
rows = doc.sheets.first.rows
=> #<SimpleXlsxReader::Document::RowsProxy:0x00007f3df5bb2c60 @load_errors={}, @sheet_parser=nil, @slurped=nil>
rows.first
NoMethodError: undefined method `parse' for nil:NilClass
from /home/morgan/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/simple_xlsx_reader-5.0.0/lib/simple_xlsx_reader/document.rb:97:in `each'

@sheet_parser is nil for all sheets

Decimal Number Rounding Issue

Hi @woahdae,
I recently found an issue when trying to read an excel file that contains several of float numbers as below
Screen Shot 2020-04-01 at 7 51 00 pm.

Issue found:
Some of the numbers below the 'use General format' row will have incorrect format when i call the reader. However, when i applied 'Accounting' format, the numbers below 'use Accounting' format' row has been imported correctly.
Here is the result that I have printed out.

Screen Shot 2020-04-01 at 8 23 27 pm

Expected behaviour: the imported float number values should be the same as the one in excel sheet.

My implementation:

    filepath = "#{Rails.root}/app/assets/Book1.xlsx"
    
    doc = SimpleXlsxReader.open(filepath)
    puts doc.sheets.first.name
    puts doc.sheets.first.rows

Ruby version: 2.6.3
Rails version: 5.2.0

Unable to read the exported xlsx file extension excel file

Hi,

Would like to ask an issue whereby I used a tool called GrapeCity.Spreadbuilder.Workbook to export out excel files but unable to read through ruby. It gives the following error.
C:/Ruby23-x64/lib/ruby/gems/2.3.0/gems/simple_xlsx_reader-1.0.2/lib/simple_xlsx_reader.rb:193:in last_cell_label': undefined method at_xpath' for nil:NilClass (NoMethodError)
Did you mean? at_exit

I found out that the XML info needed is missing, in this case is the dimension info.
Is there a solution I can refer to solve this?

Sample link for reference on how I write my code to export
http://helpcentral.componentone.com/netHelp/AR9/GrapeCity.ActiveReports.Export.Excel.v9~GrapeCity.SpreadBuilder.Workbook.html

Thanks.

Documentation usage wrong

In the Idiomatic section, the documentation specifies:
doc = SimpleXlsxReader.open(path_or_io)
but looking into the source code the method that receives io is parse

Thanks for the amazing gem

Fatally high memory usage when opening large spreadsheets

Repro steps:

  1. Create a simple workbook with several worksheets, and around a million total rows. The resultant .xlsx file is around 27MiB in size.

  2. Open it with the following code:

require 'simple_xlsx_reader'
file = File.open('large.xlsx')
doc = SimpleXlsxReader.open(file)
  1. Observe the Ruby process consumes all available RAM on the system (6.0GiB), then the system locks up necessitating a hard reboot. Presumably it's paging like mad as the drive light is constantly on.

Details:

ruby 2.2.1p85 (2015-02-26 revision 49769) [x86_64-linux]
simple_xlsx_reader (1.0.2)
nokogiri (1.6.6.2)
rubyzip (1.1.7)

Rounding

I have a spreadsheet that contains float values formatted as integers. When I read those values though simple_xlsx_reader it returns integers instead of floats, but that's not the real problem, the thing is that it rounds differently.

Any advice? Maybe a way of force simple_xlsx_reader to always read numbers as float.

simple_xlsx_reader output:
139702
35519
3488

REF!

113544
120874
92441
10262
11017
13339
4923

captura de tela 2014-10-16 as 11 34 55

No way to open a XLSX locked by password

Hi, I just wondering if there is a way to open a xlsx spreadsheet which is locked by password.
I even have the password to open that file but I can't find a method in this gem that I could use to open the file.
Something like: SimpleXlsxReader.open(xlsx_filename, password)

There is something to workaround this use case.

enumerator.reverse with ruby 1.9.3

I have a ruby 1.9.3 (windows) install. This gem errors out in the following method:

      def column_letter_to_number(column_letter)
        pow = -1
        column_letter.codepoints.reverse.inject(0) do |acc, charcode|
          pow += 1
          acc + 26**pow * (charcode - 64)
        end
      end

For 1.9.3, the iterator needs to have to_a, as in column_letter.codepoints.to_a.reverse.inject(0). Once that is in place, all is good with the tests.

If simple_xlsx_reader is meant for ruby 2+, then I didn't get that from any dependencies. Sorry.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.