levilansing commented on July 26, 2024

For kicks I tried to read a 125MB xlsx file (1 sheet, plain data, no formatting). After about a minute and 18GB of RAM allocated, I got an OOM error from Nokogiri.

from simple_xlsx_reader.

woahdae commented on July 26, 2024

This has been addressed in https://github.com/woahdae/simple_xlsx_reader/tree/2.0.0-pre. I'd love feedback on the changes if anyone here still uses this gem.

woahdae commented on July 26, 2024

Fixed in 2.0, just released.

EtienneDepaulis commented on July 26, 2024

@chbach there is another gem that does this: https://github.com/weshatheleopard/rubyX
It has the same dependencies; however, we found its API hard to use for our simple use case (importing Excel files with 100+ lines).

chbach commented on July 26, 2024

Thanks, I’ll give it a shot. However, I like the simplicity and API of this gem.

EtienneDepaulis commented on July 26, 2024

This is exactly why we switched to this gem this morning :)

woahdae commented on July 26, 2024

I'd love to make it more memory efficient (or accept a pull request to that effect!), but there are some inherent tradeoffs WRT Excel vs., say, CSV. I'll check out the linked gem and see how it works, but to address a couple of issues:

  1. xlsx is an archive format using cross-referenced XML files, and Ruby strings aren't small. Once the files are unarchived and represented as Nokogiri nodes, and given Ruby's memory allocation strategy, which tends to double its allocation when more memory is needed, I'm not shocked that a 2MB file can turn into big in-memory data.

  2. xlsx cross-references a few internal files to represent a sheet, most notably the "shared strings," which makes a streaming parser non-obvious to implement.
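
To make the cross-referencing concrete: in an xlsx sheet, a cell whose t attribute is "s" does not hold its text. Its <v> element holds an index into the shared strings table (xl/sharedStrings.xml). The sketch below is illustrative only. It assumes tiny hand-written XML strings with namespaces stripped, skips the zip extraction step entirely, and uses Ruby's stdlib REXML in place of Nokogiri. It reads both parts as full DOM trees, which is the memory-hungry pattern from point 1. The names read_sheet_dom and resolve_cell are hypothetical.

```ruby
require 'rexml/document'

# Hypothetical, trimmed-down parts of an .xlsx archive (namespaces omitted).
# In a real file these are xl/sharedStrings.xml and xl/worksheets/sheet1.xml
# inside a zip archive and would have to be extracted first.
SHARED_STRINGS_XML = '<sst><si><t>name</t></si><si><t>age</t></si><si><t>Alice</t></si></sst>'
SHEET_XML = <<~XML
  <worksheet><sheetData>
    <row r="1"><c r="A1" t="s"><v>0</v></c><c r="B1" t="s"><v>1</v></c></row>
    <row r="2"><c r="A2" t="s"><v>2</v></c><c r="B2"><v>30</v></c></row>
  </sheetData></worksheet>
XML

# Turn a cell's raw <v> text into a Ruby value. For t="s" the text is an
# index into the shared strings table; other types (booleans, dates,
# inline strings, errors) are not handled specially in this sketch.
def resolve_cell(type, raw, shared_strings)
  return nil if raw.nil?
  return shared_strings.fetch(Integer(raw, 10)) if type == 's'

  Integer(raw, 10, exception: false) || Float(raw, exception: false) || raw
end

# DOM approach: both parts are fully materialized as node trees before any
# row is returned, so memory grows with the whole sheet, not one row.
def read_sheet_dom(sheet_xml, shared_strings_xml)
  strings = []
  REXML::Document.new(shared_strings_xml).elements.each('sst/si') do |si|
    # Assumes a plain <t> child; rich-text runs (<r><t>) become nil here,
    # which still keeps the indices aligned.
    strings << si.elements['t']&.text
  end

  rows = []
  REXML::Document.new(sheet_xml).elements.each('worksheet/sheetData/row') do |row|
    cells = []
    row.elements.each('c') do |c|
      cells << resolve_cell(c.attributes['t'], c.elements['v']&.text, strings)
    end
    rows << cells
  end
  rows
end

p read_sheet_dom(SHEET_XML, SHARED_STRINGS_XML)
# prints [["name", "age"], ["Alice", 30]]
```

Note that a row can't be turned into values until the shared strings table is available, which is why a streaming reader can't simply process the sheet file on its own.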

One win could be to switch to Ox in general, and especially to use its SAX callback API for the main sheet. I think we'd still have to load the shared strings into memory, though.
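
The SAX idea can be sketched with Ruby's stdlib REXML stream parser. It is swapped in for Ox here only so the example needs no extra gem, and Ox's own SAX API looks different. The sketch assumes the shared strings are already loaded into an array, as suggested above, and that the sheet XML has already been pulled out of the zip. SheetRowListener and each_row are hypothetical names, and only a few cell types are handled.

```ruby
require 'rexml/document'
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'

# SAX-style listener: holds only the current row and hands each completed
# row to a callback instead of building a node tree for the whole sheet.
class SheetRowListener
  include REXML::StreamListener

  def initialize(shared_strings, &on_row)
    @shared_strings = shared_strings # loaded up front; only the sheet streams
    @on_row = on_row
    @row = nil
    @cell_type = nil
    @value = nil
    @in_value = false
  end

  def tag_start(name, attrs)
    case name
    when 'row' then @row = []
    when 'c'
      @cell_type = attrs.to_h['t'] # attrs may be a Hash or [key, value] pairs
      @value = nil
    when 'v'
      @in_value = true
      @value = +''
    end
  end

  def text(text)
    @value << text if @in_value
  end

  def tag_end(name)
    case name
    when 'v' then @in_value = false
    when 'c' then @row << resolve(@cell_type, @value)
    when 'row'
      @on_row.call(@row)
      @row = nil
    end
  end

  private

  # t="s" cells hold an index into the shared strings table.
  def resolve(type, raw)
    return nil if raw.nil?
    return @shared_strings.fetch(Integer(raw, 10)) if type == 's'

    Integer(raw, 10, exception: false) || Float(raw, exception: false) || raw
  end
end

# Hypothetical helper: stream a sheet's XML (String or IO), yielding one row
# at a time.
def each_row(sheet_xml, shared_strings, &block)
  listener = SheetRowListener.new(shared_strings, &block)
  REXML::Parsers::StreamParser.new(sheet_xml, listener).parse
end

SAMPLE_SHEET = <<~XML
  <worksheet><sheetData>
    <row r="1"><c r="A1" t="s"><v>0</v></c><c r="B1"><v>42</v></c></row>
    <row r="2"><c r="A2" t="s"><v>1</v></c><c r="B2"><v>3.5</v></c></row>
  </sheetData></worksheet>
XML

each_row(SAMPLE_SHEET, %w[widget gadget]) { |row| p row }
# prints ["widget", 42] and then ["gadget", 3.5]
```

Apart from the input itself, the listener keeps only the shared strings table and the current row, rather than a node tree for the whole sheet. That is the tradeoff described above: the sheet can stream, but the shared strings still sit in memory.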

On Thursday, April 28, 2016, Etienne Depaulis wrote:

This is exactly why we switched to this gem this morning :)

darkseid commented on July 26, 2024

Guys, I'm having the same issue: a 96MB xlsx file (only plain data) allocates about 20GB of RAM, and then I get the same OOM error from Nokogiri: "Nokogiri::XML::XPath::SyntaxError: Memory allocation failed : growing nodeset hit limit: growing nodeset hit limit". Any advice? I'm considering moving to good old CSV...
