Comments (8)
For kicks I tried to read a 125MB xlsx file (1 sheet, plain data, no formatting). After about a minute and 18GB of RAM allocated, I got an OOM error from Nokogiri.
from simple_xlsx_reader.
This has been addressed in https://github.com/woahdae/simple_xlsx_reader/tree/2.0.0-pre. I'd love feedback on the changes if anyone here still uses this gem.
Fixed in 2.0, just released.
@chbach there is another gem which does that: https://github.com/weshatheleopard/rubyX
It has the same dependencies; however, we found its API hard to use for our simple use case (importing Excel files of 100+ lines).
Thanks, I’ll give it a shot. However, I like the simplicity and API of this gem.
This is exactly why we switched to this gem this morning :)
I'd love to make it more memory efficient (or accept a pull request to that
effect!) but there are some inherent tradeoffs with Excel vs. say CSV. I'll
check out the linked gem and see how it works, but to address a couple of
issues:
- xlsx is an archive format using cross-referenced XML files, and Ruby
  strings aren't small. Once unarchived and represented as Nokogiri nodes,
  and given Ruby's memory allocation strategy, which tends to double its
  allocation when more memory is needed, I'm not shocked that a 2 MB file
  can turn into a large amount of in-memory data.
- xlsx cross-references a few internal files to represent a sheet, most
  notably the "shared strings," which makes a streaming parser non-obvious
  in terms of implementation.
One win could be to switch to ox in general, and especially to use its SAX
callback API for the main sheet. I think we'd still have to load shared
strings into memory though.
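
To make the streaming idea concrete, here is a minimal sketch of the two-pass approach described above: load the shared strings into an array first, then stream the sheet XML and emit one row at a time. It is not the gem's implementation. It assumes the XML has already been extracted from the archive, and it swaps in REXML's stream listener for the ox or Nokogiri SAX API so the example has no extra dependencies. The class names, `each_row`, and the sample XML are all hypothetical.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Pass 1: collect the text of every <si> entry, in document order.
# This table still has to live in memory, as noted above.
class SharedStringsListener
  include REXML::StreamListener
  attr_reader :strings

  def initialize
    @strings = []
    @in_t = false
  end

  def tag_start(name, _attrs)
    @strings << +'' if name == 'si'
    @in_t = true if name == 't'
  end

  def text(value)
    @strings.last << value if @in_t && @strings.last
  end

  def tag_end(name)
    @in_t = false if name == 't'
  end
end

# Pass 2: stream the sheet and hand each finished row to a block,
# so only one row of cells is held at a time.
class SheetListener
  include REXML::StreamListener

  def initialize(shared_strings, &on_row)
    @shared = shared_strings
    @on_row = on_row
    @in_v = false
  end

  def tag_start(name, attrs)
    case name
    when 'row'
      @row = []
    when 'c'
      @type = attrs['t'] # "s" marks a shared-string index
      @value = nil
    when 'v'
      @in_v = true
      @value = +''
    end
  end

  def text(value)
    @value << value if @in_v
  end

  def tag_end(name)
    case name
    when 'v'   then @in_v = false
    when 'c'   then @row << (@type == 's' && @value ? @shared[@value.to_i] : @value)
    when 'row' then @on_row.call(@row)
    end
  end
end

# Hypothetical helper: both arguments may be Strings or IO objects.
def each_row(sheet_xml, shared_xml, &block)
  shared = SharedStringsListener.new
  REXML::Document.parse_stream(shared_xml, shared)
  REXML::Document.parse_stream(sheet_xml, SheetListener.new(shared.strings, &block))
end

# Tiny made-up sample standing in for sharedStrings.xml and sheet1.xml.
SHARED_XML = '<sst><si><t>name</t></si><si><t>widget</t></si></sst>'
SHEET_XML = <<~XML
  <worksheet><sheetData>
    <row r="1"><c r="A1" t="s"><v>0</v></c><c r="B1"><v>42</v></c></row>
    <row r="2"><c r="A2" t="s"><v>1</v></c><c r="B2"><v>7</v></c></row>
  </sheetData></worksheet>
XML

each_row(SHEET_XML, SHARED_XML) { |row| p row }
```

The sketch leaves cell values as raw strings and ignores styles, dates, and inline strings; a real reader would need type casting on top of this.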
Guys, I'm having the same issue: a 96MB xlsx file (only plain data) allocates about 20GB of RAM, and then I get the same OOM error from Nokogiri: "Nokogiri::XML::XPath::SyntaxError: Memory allocation failed : growing nodeset hit limit: growing nodeset hit limit". Any advice? I'm considering moving to the good old CSV format...