ffalt / xlsx-extract
nodejs lib for extracting data from XLSX files
License: MIT License
Is there a way to pause reading until I have finished a preprocessing task?
var XLSX = require("xlsx-extract").XLSX;
new XLSX()
  .extract("../excel_source/Software List_1.2.xlsm", { sheet_name: "Products", ignore_header: 0 }) // or sheet_name or sheet_nr
  .on("sheet", function (sheet) {
    console.log("sheet", sheet); // sheet is an array [sheetname, sheetid, sheetnr]
  })
  .on("row", function (row) {
    // Can we do a "continue" or something similar here in order to read the next row?
  });
Thanks
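One way to approach the question above, independent of the library, is to buffer incoming rows until the preprocessing task has finished and then replay them in order. The sketch below is a minimal illustration under that assumption: `createRowGate` is a hypothetical helper, and a plain `EventEmitter` stands in for the extractor. It does not pause the underlying read, so every row that arrives early is held in memory.

```javascript
const { EventEmitter } = require("events");

// Hypothetical helper: holds rows back until open() is called,
// then replays the buffered rows in arrival order.
function createRowGate(handleRow) {
  const pending = [];
  let isOpen = false;
  return {
    push(row) {
      if (isOpen) handleRow(row);
      else pending.push(row);
    },
    open() {
      isOpen = true;
      while (pending.length > 0) handleRow(pending.shift());
    },
  };
}

// Stand-in for the extractor: any emitter that fires "row" events.
const source = new EventEmitter();
const processed = [];
const gate = createRowGate((row) => processed.push(row));
source.on("row", (row) => gate.push(row));

source.emit("row", ["a", 1]); // arrives before preprocessing is done
source.emit("row", ["b", 2]);
gate.open();                  // preprocessing finished: replay buffered rows
source.emit("row", ["c", 3]); // handled immediately from now on
```

With a real extractor, the `"row"` handler would call `gate.push(row)` and the preprocessing code would call `gate.open()` when it completes.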
Opening an Excel xlsx file throws Error: end of central directory record signature not found
Hi!
How can I get only the headers? I have many files with 200K records each, but I only need the header names.
Thanks!
I have an application that receives an Excel from a streaming interface (AWS S3).
I would love a library that can
Take an Excel stream as input
Emit an event on each row (or batch of rows) for processing
xlsx-stream-reader does these things, but it is slow
Hi,
I'm looking for an Excel stream reader and found this one, which is very nice.
But I also found that phonetic information is included in the output.
Is there any way to ignore it?
Sometimes the lib produces empty strings instead of the actual strings. I can get the right values by changing the following lines
parser.on('text', function (txt) {
if (addvalue) {
cell.val = (cell.val ? cell.val : '') + txt;
}
});
with these
parser.on('text', function (txt) {
console.log(txt);
//if (addvalue) {
cell.val = (cell.val ? cell.val : '') + txt;
//}
});
You can look at the attached file:
CompetitiveReport_A.xlsx
I am importing a large data set with currency values up to the tenth decimal place. Currently the library simply disregards the extra digits, but I want an option to keep the extra precision. I will submit a PR with the option enabled.
Hello!
I'm trying to use xlsx-extract on Windows 7.
I've installed the package with the command:
D:\Business\Projects\Smart>npm install xlsx-extract --msvs_version=2013
The output is the following:
[email protected] install D:\Business\Projects\Smart\node_modules\xlsx-extract\node_modules\node-expat
node-gyp rebuild
D:\Business\Projects\Smart\node_modules\xlsx-extract\node_modules\node-expat>if not defined npm_config_node_gyp (node "C:\Program Files\nodejs\node_modules\npm\bin\node-gyp-bin
....\node_modules\node-gyp\bin\node-gyp.js" rebuild ) else (rebuild)
xmlparse.c
xmltok.c
xmlrole.c
......\deps\libexpat\lib\xmlparse.c(1844): warning C4244: 'return' : conversion from '__int64' to 'XML_Index', possible loss of data [D:\Business\Projects\Smart\node_modules\x
lsx-extract\node_modules\node-expat\build\deps\libexpat\expat.vcxproj]
expat.vcxproj -> D:\Business\Projects\Smart\node_modules\xlsx-extract\node_modules\node-expat\build\Release\libexpat.lib
node-expat.cc
..\node-expat.cc(138): warning C4267: 'argument' : conversion from 'size_t' to 'int', possible loss of data [D:\Business\Projects\Smart\node_modules\xlsx-extract\node_modules\no
de-expat\build\node_expat.vcxproj]
Creating library D:\Business\Projects\Smart\node_modules\xlsx-extract\node_modules\node-expat\build\Release\node_expat.lib and object D:\Business\Projects\Smart\node_module
s\xlsx-extract\node_modules\node-expat\build\Release\node_expat.exp
Generating code
Finished generating code
node_expat.vcxproj -> D:\Business\Projects\Smart\node_modules\xlsx-extract\node_modules\node-expat\build\Release\node_expat.node
[email protected] node_modules\xlsx-extract
[email protected] ([email protected])
[email protected] ([email protected], [email protected], [email protected], [email protected], [email protected], [email protected])
[email protected] ([email protected], [email protected], [email protected])
So, no errors.
But once I launched my application, I got the error:
D:\Business\Projects\Smart\node_modules\xlsx-extract\node_modules\node-expat\node_modules\bindings\bindings.js:83
throw e
^
Error: Cannot resolve file or directory D:\Business\Projects\Smart\node_modules\xlsx-extract\node_modules\node-expat\build\node_expat.node in D:\Business\Projects\Smart\node_mod
ules\xlsx-extract\node_modules\node-expat\node_modules\bindings
at D:\Business\Projects\Smart\node_modules\enhanced-require\node_modules\enhanced-resolve\lib\Resolver.js:63:31
at D:\Business\Projects\Smart\node_modules\enhanced-require\node_modules\enhanced-resolve\lib\Resolver.js:166:15
at D:\Business\Projects\Smart\node_modules\enhanced-require\node_modules\enhanced-resolve\lib\Resolver.js:59:4
at D:\Business\Projects\Smart\node_modules\enhanced-require\node_modules\enhanced-resolve\node_modules\tapable\lib\Tapable.js:134:6
at Tapable.<anonymous> (D:\Business\Projects\Smart\node_modules\enhanced-require\node_modules\enhanced-resolve\lib\DirectoryDefaultFilePlugin.js:16:51)
at Storage.provide (D:\Business\Projects\Smart\node_modules\enhanced-require\node_modules\enhanced-resolve\lib\CachedInputFileSystem.js:52:20)
at CachedInputFileSystem.stat (D:\Business\Projects\Smart\node_modules\enhanced-require\node_modules\enhanced-resolve\lib\CachedInputFileSystem.js:126:20)
at Tapable.<anonymous> (D:\Business\Projects\Smart\node_modules\enhanced-require\node_modules\enhanced-resolve\lib\DirectoryDefaultFilePlugin.js:15:6)
at Tapable.applyPluginsParallelBailResult (D:\Business\Projects\Smart\node_modules\enhanced-require\node_modules\enhanced-resolve\node_modules\tapable\lib\Tapable.js:139:14)
at Tapable.<anonymous> (D:\Business\Projects\Smart\node_modules\enhanced-require\node_modules\enhanced-resolve\lib\Resolver.js:57:8)
at Tapable.Resolver.forEachBail (D:\Business\Projects\Smart\node_modules\enhanced-require\node_modules\enhanced-resolve\lib\Resolver.js:171:3)
at Tapable.doResolve (D:\Business\Projects\Smart\node_modules\enhanced-require\node_modules\enhanced-resolve\lib\Resolver.js:56:7)
at Tapable.resolve (D:\Business\Projects\Smart\node_modules\enhanced-require\node_modules\enhanced-resolve\lib\Resolver.js:41:14)
at Tapable.resolveSync (D:\Business\Projects\Smart\node_modules\enhanced-require\node_modules\enhanced-resolve\lib\Resolver.js:16:7)
at ModuleFactory.createSync (D:\Business\Projects\Smart\node_modules\enhanced-require\lib\ModuleFactory.js:50:40)
at RequireContext.theRequire (D:\Business\Projects\Smart\node_modules\enhanced-require\lib\RequireContext.js:114:40)
Please advise how to fix this.
Thank you.
I have a use-case for parsing xlsx documents which are zipped with STORE compression (i.e. uncompressed).
node-unzip does not support this, but node-unzip-2 does.
node-unzip-2 is a fork of the former, and it's hot-swappable.
Not an issue, so much as hoping to start a discussion.
I'm currently trying to use xlsx-extract in a context where native node modules just don't seem to be an option (Azure's AWS Lambda knockoff, though the browser would be another example). As a result, the dependency on node-expat is problematic.
I've made changes (see my fork) to replace the node-expat dependency with sax (pure JS). It looks like I'm not the first person to try this.
Given that sax is significantly slower than expat, it doesn't seem like just merging these changes back into xlsx-extract is a good idea, but it would be nice to unify the code bases somewhat.
I think that something along these lines would be a nice place to get to, though it might need a little work to set up an adapter pattern:
xlsx-extract-core (BYO XML parser)
xlsx-extract (depends on xlsx-extract-core, uses node-expat, no name change for backward compatibility)
xlsx-extract-sax (depends on xlsx-extract-core, uses sax, although the name is already taken on npm)
Is this something you'd be interested in taking if I do the initial legwork?
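To make the adapter idea concrete, here is a rough sketch of what a parser-agnostic core could look like. Everything in it is an assumption for illustration: the adapter contract (`on("open" | "text" | "close")` plus `write`), the names `collectCellValues` and `createScriptedAdapter`, and the choice to collect the text of `v` elements (the cell value element in sheet XML). The scripted adapter replays pre-tokenized events instead of parsing XML; a real adapter would presumably forward the element and text callbacks of node-expat or sax to the same three events.

```javascript
const { EventEmitter } = require("events");

// Hypothetical core: depends only on the adapter contract, not on a parser.
// Contract assumed here: adapter.on("open", name), adapter.on("text", txt),
// adapter.on("close", name), adapter.write(chunk).
function collectCellValues(adapter) {
  const values = [];
  let inValue = false;
  let current = "";
  adapter.on("open", (name) => {
    if (name === "v") {
      inValue = true;
      current = "";
    }
  });
  adapter.on("text", (txt) => {
    // Text may arrive in several chunks, so accumulate it.
    if (inValue) current += txt;
  });
  adapter.on("close", (name) => {
    if (name === "v") {
      inValue = false;
      values.push(current);
    }
  });
  return values;
}

// Hypothetical test double: write() takes [event, payload] pairs
// rather than XML text, so no XML parser is needed for this sketch.
function createScriptedAdapter() {
  const emitter = new EventEmitter();
  return {
    on: (event, fn) => emitter.on(event, fn),
    write: (events) => events.forEach(([type, payload]) => emitter.emit(type, payload)),
  };
}

const adapter = createScriptedAdapter();
const values = collectCellValues(adapter);
adapter.write([
  ["open", "c"], ["open", "v"], ["text", "1"], ["close", "v"], ["close", "c"],
  ["open", "c"], ["open", "v"], ["text", "hel"], ["text", "lo"], ["close", "v"], ["close", "c"],
]);
```

The point of the split is that the core never requires a native module itself; which parser gets wrapped becomes the choice of the package that depends on it.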
We have the following snippet in our codebase
new XLSX()
.extract(inFile, { sheet_name: mapping.sheetName })
.on('row', row => writer.write(row))
.on('error', error => {
console.error(error);
process.exit(1);
})
.on('end', () => {
// xlsx-extract might send the 'end' event before it is done reading rows.
// Wait 1.5 seconds before assuming that processing is done.
// If [Error: write after end] is observed during import, it's probably
// writer.end() being run too soon.
setTimeout(() => {
console.log('OK');
writer.end();
}, 1500);
});
We do this to reduce how often the following error occurs:
events.js:141
throw er; // Unhandled 'error' event
^
Error: write after end
at writeAfterEnd (_stream_writable.js:159:12)
at WriteStream.Writable.write (_stream_writable.js:204:5)
at EsBulkImportWriter.writeln (/Users/tomasfagerbekk/Repos/HO21-datakilder/utilities/dist/EsBulkImportWriter.js:78:31)
at EsBulkImportWriter.write (/Users/tomasfagerbekk/Repos/HO21-datakilder/utilities/dist/EsBulkImportWriter.js:71:22)
at XLSX.<anonymous> (/Users/tomasfagerbekk/Repos/HO21-datakilder/utilities/dist/bin/preparebulk.js:36:27)
at emitOne (events.js:77:13)
at XLSX.emit (events.js:169:7)
It seems to us that 'row' events are being emitted after the 'end' event in some cases (we do large bulk imports). Do you know how we can fix this without the ugly workaround (which sometimes still breaks), or where we might have gone wrong?
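A possible refinement of the timeout workaround, offered as a sketch rather than a fix: instead of waiting a fixed 1.5 seconds after 'end', restart the wait every time another row arrives, so the writer is closed only once the stream has been quiet for the whole delay. `createIdleCloser` and its `touch`/`arm` methods are hypothetical names, and the `timers` parameter exists only so the timer can be replaced in tests. A row that arrives after the quiet period would still hit the closed writer.

```javascript
// Hypothetical helper: calls onIdle once no touch() has happened
// for delayMs after arm() was called.
function createIdleCloser(onIdle, delayMs, timers = {
  setTimeout: (fn, ms) => setTimeout(fn, ms),
  clearTimeout: (handle) => clearTimeout(handle),
}) {
  let handle = null;
  let armed = false;
  let done = false;

  const restart = () => {
    if (handle !== null) timers.clearTimeout(handle);
    handle = timers.setTimeout(() => {
      handle = null;
      done = true;
      onIdle();
    }, delayMs);
  };

  return {
    // Call on every 'row': postpones closing while rows keep arriving.
    touch() {
      if (armed && !done) restart();
    },
    // Call on 'end': start waiting for the stream to go quiet.
    arm() {
      if (!done) {
        armed = true;
        restart();
      }
    },
  };
}
```

In the snippet from the issue, the 'row' handler would call `closer.touch()` after `writer.write(row)`, and the 'end' handler would call `closer.arm()`, with `onIdle` running `writer.end()`.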
Cell values containing a % character are always returned as 0.