skale-me / node-parquet
NodeJS module to access apache parquet format files
License: Apache License 2.0
Would it be possible for ParquetWriter to write to a node stream? It looks to me like parquet-cpp supports various streams.
Installed and compiled on Amazon Linux. Whenever I try to use int96 to store a timestamp, node crashes:
Using Node 8.x.x and 6.x.x
/home/ec2-user/.nvm/versions/node/v8.1.4/bin/node[15317]: ../src/node_buffer.cc:220:char* node::Buffer::Data(v8::Local<v8::Object>): Assertion `obj->IsArrayBufferView()' failed.
1: node::Abort() [/home/ec2-user/.nvm/versions/node/v8.1.4/bin/node]
2: node::Assert(char const* const (*) [4]) [/home/ec2-user/.nvm/versions/node/v8.1.4/bin/node]
3: node::Buffer::Length(v8::Local<v8::Value>) [/home/ec2-user/.nvm/versions/node/v8.1.4/bin/node]
4: 0x7f2e12e39d41 [/home/ec2-user/bidder/node_modules/node-parquet/build/Release/parquet.node]
5: ParquetWriter::Write(Nan::FunctionCallbackInfo<v8::Value> const&) [/home/ec2-user/bidder/node_modules/node-parquet/build/Release/parquet.node]
6: 0x7f2e12e39b57 [/home/ec2-user/bidder/node_modules/node-parquet/build/Release/parquet.node]
7: v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) [/home/ec2-user/.nvm/versions/node/v8.1.4/bin/node]
8: 0xb43f48 [/home/ec2-user/.nvm/versions/node/v8.1.4/bin/node]
9: v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) [/home/ec2-user/.nvm/versions/node/v8.1.4/bin/node]
10: 0x2efe9ab840bd
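The trace fails a Buffer assertion inside ParquetWriter::Write, which suggests the int96 column received something other than a Buffer. Below is a minimal sketch of one way to build such a value. It assumes (unconfirmed) that the writer accepts a 12-byte Buffer for int96 columns and that the file should follow the common Impala/Hive layout: 8 bytes of nanoseconds within the day followed by a 4-byte Julian day, both little-endian. The helper name toInt96Timestamp is hypothetical.

```javascript
// Hypothetical helper, not part of node-parquet.
// Encodes a Date as a 12-byte INT96 timestamp (Impala/Hive convention):
//   bytes 0-7  = nanoseconds since midnight UTC (little-endian)
//   bytes 8-11 = Julian day number (little-endian)
const JULIAN_DAY_OF_UNIX_EPOCH = 2440588;
const MS_PER_DAY = 86400000;
const TWO_POW_32 = 4294967296;

function toInt96Timestamp(date) {
  const ms = date.getTime();
  const days = Math.floor(ms / MS_PER_DAY);
  // At most about 8.64e13, well below 2^53, so the arithmetic is exact.
  const nanosOfDay = (ms - days * MS_PER_DAY) * 1e6;
  const buf = Buffer.alloc(12);
  buf.writeUInt32LE(nanosOfDay % TWO_POW_32, 0);             // low 32 bits
  buf.writeUInt32LE(Math.floor(nanosOfDay / TWO_POW_32), 4); // high 32 bits
  buf.writeUInt32LE(days + JULIAN_DAY_OF_UNIX_EPOCH, 8);
  return buf;
}

const encoded = toInt96Timestamp(new Date(0));
console.log(encoded.length, encoded.readUInt32LE(8));
```

The two 32-bit writes avoid BigInt, so the sketch also runs on the older Node versions mentioned in the report.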
I'm having trouble building node-parquet on my MacOS system:
Darwin Apollo.local 16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 15 17:36:27 PDT 2017; root:xnu-3789.70.16~2/RELEASE_X86_64 x86_64
Everything appears to build fine until the node-gyp rebuild step:
> node-gyp rebuild
CXX(target) Release/obj.target/parquet/src/parquet_binding.o
In file included from ../src/parquet_binding.cc:3:
In file included from ../src/parquet_reader.h:8:
In file included from ../deps/parquet-cpp/src/parquet/api/reader.h:22:
../deps/parquet-cpp/src/parquet/column_reader.h:22:10: fatal error: 'cstdint' file not found
#include <cstdint>
^
1 error generated.
make: *** [Release/obj.target/parquet/src/parquet_binding.o] Error 1
gyp ERR! build error
gyp ERR! stack Error: `make` failed with exit code: 2
gyp ERR! stack at ChildProcess.onExit (~/.nvm/versions/node/v6.11.0/lib/node_modules/npm/node_modules/node-gyp/lib/build.js:285:23)
gyp ERR! stack at emitTwo (events.js:106:13)
gyp ERR! stack at ChildProcess.emit (events.js:191:7)
gyp ERR! stack at Process.ChildProcess._handle.onexit (internal/child_process.js:215:12)
gyp ERR! System Darwin 16.7.0
gyp ERR! command "~/.nvm/versions/node/v6.11.0/bin/node" "~/.nvm/versions/node/v6.11.0/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd node-parquet
gyp ERR! node -v v6.11.0
gyp ERR! node-gyp -v v3.6.0
gyp ERR! not ok
npm ERR! code ELIFECYCLE
npm ERR! errno 1
I have g++, cmake, boost, and thrift all installed. I've even tried upgrading to newer versions of cmake and boost building from source, re-installing packages, and everything else I could think of.
StackOverflow seems to think this "cstdint" package is included in a "tr1" folder, and proposes a solution: https://stackoverflow.com/questions/10116724/clang-os-x-lion-cannot-find-cstdint
however, the proposed solution doesn't work for me either.
Any help getting this to build would be greatly appreciated.
First of all, thank you for creating a Node module that performs this very special task and for sharing it with us!
I've found two different conditions that cause the host program to hang indefinitely (in Node 8.9.0), never reporting an error or any hint of what the issue could be. I debugged these problems through a couple of arduous debug sessions:
null/undefined instead of empty Array indexes for optional fields.
Each of these is not necessarily a problem, especially if it is documented. The problem is that this module does not throw errors; instead it hangs.
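Until the module reports such input itself, a caller could normalise rows before writing. The sketch below converts null/undefined cells into empty array slots. The helper name toSparseRow is hypothetical, and whether the writer treats a trailing empty slot the same as a shorter row has not been checked.

```javascript
// Hypothetical helper, not part of node-parquet.
// Copies a row, leaving null/undefined cells as empty array slots (holes).
function toSparseRow(row) {
  const sparse = new Array(row.length);
  row.forEach((value, i) => {
    if (value !== null && value !== undefined) sparse[i] = value;
  });
  return sparse;
}

const row = toSparseRow([null, 1234, false, undefined]);
console.log(0 in row, 1 in row, row.length);
```

The `in` operator distinguishes a real hole from a slot that holds undefined, which is the difference the report describes.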
Hi,
I wanted to use this wonderful module in AWS Lambda. The key blocker is that when I compile the node-parquet module, the whole thing is over 400 MB; unfortunately, AWS Lambda allows uploads of ~240 MB max per Lambda function.
I was wondering whether there is any possibility of slimming the whole output down, or is this what we get?
In any case, I'm looking through the makefiles to understand whether I can do something on my own.
Thanks for your time!
We require a fast parquet writer in Node.js. Can anyone update this module? We are ready to fund the work.
I ran "sudo apt-get install -y bison flex libssl-dev libboost-dev libboost-system-dev libboost-filesystem-dev libboost-regex-dev"
before installing node-parquet.
But I cannot get past the next step.
Please check the error logs.
[email protected] preinstall /usr/local/globalcdn/playground/nodeParquet/node_modules/node-parquet
./build_parquet-cpp.sh
CMake Error at CMakeLists.txt:19 (cmake_minimum_required):
CMake 3.2.0 or higher is required. You are running version 2.8.12.2
-- Configuring incomplete, errors occurred!
npm WARN [email protected] No description
npm WARN [email protected] No repository field.
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] preinstall: ./build_parquet-cpp.sh
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] preinstall script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
npm ERR! A complete log of this run can be found in:
npm ERR! /root/.npm/_logs/2018-05-30T06_53_50_947Z-debug.log
0 info it worked if it ends with ok
1 verbose cli [ '/usr/local/bin/node',
1 verbose cli '/usr/local/bin/npm',
1 verbose cli 'install',
1 verbose cli '--save',
1 verbose cli 'node-parquet' ]
2 info using [email protected]
3 info using [email protected]
4 verbose npm-session 008f3bb4f7d551b2
5 silly install loadCurrentTree
6 silly install readLocalPackageData
7 http fetch GET 304 https://registry.npmjs.org/node-parquet 817ms (from cache)
8 silly pacote tag manifest for node-parquet@latest fetched in 854ms
9 silly install loadIdealTree
10 silly install cloneCurrentTreeToIdealTree
11 silly install loadShrinkwrap
12 silly install loadAllDepsIntoIdealTree
13 silly resolveWithNewModule [email protected] checking installable status
14 http fetch GET 304 https://registry.npmjs.org/minimist 126ms (from cache)
15 http fetch GET 304 https://registry.npmjs.org/nan 127ms (from cache)
16 silly pacote range manifest for minimist@^1.2.0 fetched in 130ms
17 silly resolveWithNewModule [email protected] checking installable status
18 silly pacote range manifest for nan@^2.10.0 fetched in 131ms
19 silly resolveWithNewModule [email protected] checking installable status
20 http fetch GET 304 https://registry.npmjs.org/hexdump-nodejs 697ms (from cache)
21 silly pacote range manifest for hexdump-nodejs@^0.1.0 fetched in 700ms
22 silly resolveWithNewModule [email protected] checking installable status
23 silly currentTree [email protected]
24 silly idealTree [email protected]
24 silly idealTree ├── [email protected]
24 silly idealTree ├── [email protected]
24 silly idealTree ├─┬ [email protected]
24 silly idealTree │ └── [email protected]
24 silly idealTree └── [email protected]
25 silly install generateActionsToTake
26 silly diffTrees action count 5
27 silly diffTrees add [email protected]
28 silly diffTrees add [email protected]
29 silly diffTrees add [email protected]
30 silly diffTrees add [email protected]
31 silly diffTrees add [email protected]
32 silly decomposeActions action count 40
33 silly decomposeActions fetch [email protected]
34 silly decomposeActions extract [email protected]
35 silly decomposeActions preinstall [email protected]
36 silly decomposeActions build [email protected]
37 silly decomposeActions install [email protected]
38 silly decomposeActions postinstall [email protected]
39 silly decomposeActions finalize [email protected]
40 silly decomposeActions refresh-package-json [email protected]
41 silly decomposeActions fetch [email protected]
42 silly decomposeActions extract [email protected]
43 silly decomposeActions preinstall [email protected]
44 silly decomposeActions build [email protected]
45 silly decomposeActions install [email protected]
46 silly decomposeActions postinstall [email protected]
47 silly decomposeActions finalize [email protected]
48 silly decomposeActions refresh-package-json [email protected]
49 silly decomposeActions fetch [email protected]
50 silly decomposeActions extract [email protected]
51 silly decomposeActions preinstall [email protected]
52 silly decomposeActions build [email protected]
53 silly decomposeActions install [email protected]
54 silly decomposeActions postinstall [email protected]
55 silly decomposeActions finalize [email protected]
56 silly decomposeActions refresh-package-json [email protected]
57 silly decomposeActions fetch [email protected]
58 silly decomposeActions extract [email protected]
59 silly decomposeActions preinstall [email protected]
60 silly decomposeActions build [email protected]
61 silly decomposeActions install [email protected]
62 silly decomposeActions postinstall [email protected]
63 silly decomposeActions finalize [email protected]
64 silly decomposeActions refresh-package-json [email protected]
65 silly decomposeActions fetch [email protected]
66 silly decomposeActions extract [email protected]
67 silly decomposeActions preinstall [email protected]
68 silly decomposeActions build [email protected]
69 silly decomposeActions install [email protected]
70 silly decomposeActions postinstall [email protected]
71 silly decomposeActions finalize [email protected]
72 silly decomposeActions refresh-package-json [email protected]
73 silly install executeActions
74 silly doSerial global-install 40
75 verbose correctMkdir /root/.npm/_locks correctMkdir not in flight; initializing
76 verbose lock using /root/.npm/_locks/staging-846bdfdb6908b49a.lock for /usr/local/globalcdn/playground/nodeParquet/node_modules/.staging
77 silly doParallel extract 40
78 silly extract [email protected]
79 silly pacote trying hexdump-nodejs@https://registry.npmjs.org/hexdump-nodejs/-/hexdump-nodejs-0.1.0.tgz by hash: sha1-W2KB2R3YjHnfpRtC8I2sTML5rpI=
80 silly extract [email protected]
81 silly pacote trying minimist@https://registry.npmjs.org/minimist/-/minimist-1.2.0.tgz by hash: sha1-o1AIsg9BOD7sH7kU9M1d95omQoQ=
82 silly extract [email protected]
83 silly pacote trying nan@https://registry.npmjs.org/nan/-/nan-2.10.0.tgz by hash: sha512-bAdJv7fBLhWC+/Bls0Oza+mvTaNQtP+1RyhhhvD95pgUJz6XM5IzgmxOkItJ9tkoCiplvAnXI1tNmmUD/eScyA==
84 silly extract [email protected]
85 silly pacote trying node-parquet@https://registry.npmjs.org/node-parquet/-/node-parquet-0.2.7.tgz by hash: sha512-m9OySE3WfBgkTQ+lH8SC9cbrmBPgBSbGSG9hhrQACaqnyQFXJXuutqEeCIxo/2We5iuguCFsfpqqnjfCvPxGMg==
86 silly extract [email protected]
87 silly pacote trying varint@https://registry.npmjs.org/varint/-/varint-5.0.0.tgz by hash: sha1-2Ca4n3SQcy+rwMDtaT7Uddyynr8=
88 silly pacote hexdump-nodejs@https://registry.npmjs.org/hexdump-nodejs/-/hexdump-nodejs-0.1.0.tgz extracted to /usr/local/globalcdn/playground/nodeParquet/node_modules/.staging/hexdump-nodejs-1072ae2d by content address 113ms
89 silly pacote varint@https://registry.npmjs.org/varint/-/varint-5.0.0.tgz extracted to /usr/local/globalcdn/playground/nodeParquet/node_modules/.staging/varint-4786d7ab by content address 116ms
90 silly pacote minimist@https://registry.npmjs.org/minimist/-/minimist-1.2.0.tgz extracted to /usr/local/globalcdn/playground/nodeParquet/node_modules/.staging/minimist-1906643f by content address 128ms
91 silly pacote nan@https://registry.npmjs.org/nan/-/nan-2.10.0.tgz extracted to /usr/local/globalcdn/playground/nodeParquet/node_modules/.staging/nan-85e1df4c by content address 142ms
92 silly pacote node-parquet@https://registry.npmjs.org/node-parquet/-/node-parquet-0.2.7.tgz extracted to /usr/local/globalcdn/playground/nodeParquet/node_modules/.staging/node-parquet-76a6ccb4 by content address 232ms
93 silly doReverseSerial unbuild 40
94 silly doSerial remove 40
95 silly doSerial move 40
96 silly doSerial finalize 40
97 silly finalize /usr/local/globalcdn/playground/nodeParquet/node_modules/hexdump-nodejs
98 silly finalize /usr/local/globalcdn/playground/nodeParquet/node_modules/minimist
99 silly finalize /usr/local/globalcdn/playground/nodeParquet/node_modules/node-parquet/node_modules/nan
100 silly finalize /usr/local/globalcdn/playground/nodeParquet/node_modules/varint
101 silly finalize /usr/local/globalcdn/playground/nodeParquet/node_modules/node-parquet
102 silly doParallel refresh-package-json 40
103 silly refresh-package-json /usr/local/globalcdn/playground/nodeParquet/node_modules/hexdump-nodejs
104 silly refresh-package-json /usr/local/globalcdn/playground/nodeParquet/node_modules/minimist
105 silly refresh-package-json /usr/local/globalcdn/playground/nodeParquet/node_modules/node-parquet/node_modules/nan
106 silly refresh-package-json /usr/local/globalcdn/playground/nodeParquet/node_modules/varint
107 silly refresh-package-json /usr/local/globalcdn/playground/nodeParquet/node_modules/node-parquet
108 silly doParallel preinstall 40
109 silly preinstall [email protected]
110 info lifecycle [email protected]preinstall: [email protected]preinstall: [email protected]
111 silly preinstall [email protected]
112 info lifecycle [email protected]
113 silly preinstall [email protected]
114 info lifecycle [email protected]preinstall: [email protected]preinstall: [email protected]
115 silly preinstall [email protected]
116 info lifecycle [email protected]
117 silly preinstall [email protected]
118 info lifecycle [email protected]preinstall: [email protected]preinstall: unsafe-perm in lifecycle false
119 verbose lifecycle [email protected]
120 verbose lifecycle [email protected]preinstall: PATH: /usr/local/lib/node_modules/npm/node_modules/npm-lifecycle/node-gyp-bin:/usr/local/globalcdn/playground/nodeParquet/node_modules/node-parquet/node_modules/.bin:/usr/local/globalcdn/playground/nodeParquet/node_modules/.bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/gamespreinstall: CWD: /usr/local/globalcdn/playground/nodeParquet/node_modules/node-parquet
121 verbose lifecycle [email protected]
122 silly lifecycle [email protected]preinstall: Args: [ '-c', './build_parquet-cpp.sh' ]preinstall: Returned: code: 1 signal: null
123 silly lifecycle [email protected]
124 info lifecycle [email protected]~preinstall: Failed to exec preinstall script
125 verbose unlock done using /root/.npm/_locks/staging-846bdfdb6908b49a.lock for /usr/local/globalcdn/playground/nodeParquet/node_modules/.staging
126 silly saveTree [email protected]
126 silly saveTree └─┬ [email protected]
126 silly saveTree ├── [email protected]
126 silly saveTree ├── [email protected]
126 silly saveTree ├── [email protected]
126 silly saveTree └── [email protected]
127 warn [email protected] No description
128 warn [email protected] No repository field.
129 verbose stack Error: [email protected] preinstall: ./build_parquet-cpp.sh
129 verbose stack Exit status 1
129 verbose stack at EventEmitter. (/usr/local/lib/node_modules/npm/node_modules/npm-lifecycle/index.js:285:16)
129 verbose stack at emitTwo (events.js:126:13)
129 verbose stack at EventEmitter.emit (events.js:214:7)
129 verbose stack at ChildProcess. (/usr/local/lib/node_modules/npm/node_modules/npm-lifecycle/lib/spawn.js:55:14)
129 verbose stack at emitTwo (events.js:126:13)
129 verbose stack at ChildProcess.emit (events.js:214:7)
129 verbose stack at maybeClose (internal/child_process.js:925:16)
129 verbose stack at Process.ChildProcess._handle.onexit (internal/child_process.js:209:5)
130 verbose pkgid [email protected]
131 verbose cwd /usr/local/globalcdn/playground/nodeParquet
132 verbose Linux 3.13.0-74-generic
133 verbose argv "/usr/local/bin/node" "/usr/local/bin/npm" "install" "--save" "node-parquet"
134 verbose node v8.11.1
135 verbose npm v5.6.0
136 error code ELIFECYCLE
137 error errno 1
138 error [email protected] preinstall: ./build_parquet-cpp.sh
138 error Exit status 1
139 error Failed at the [email protected] preinstall script.
139 error This is probably not a problem with npm. There is likely additional logging output above.
140 verbose exit [ 1, true ]
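The root cause in the log above is the CMake line: the build requires 3.2.0 or higher, and 2.8.12.2 is installed. Below is a small sketch of a pre-flight check that compares such version strings. The function name versionAtLeast is hypothetical, and how the installed version string is obtained (for example, by parsing the output of cmake --version) is left to the caller.

```javascript
// Hypothetical pre-flight check, not part of node-parquet.
// Compares dotted version strings numerically, component by component.
function versionAtLeast(actual, required) {
  const a = actual.split('.').map(Number);
  const r = required.split('.').map(Number);
  const len = Math.max(a.length, r.length);
  for (let i = 0; i < len; i++) {
    const x = a[i] || 0; // missing components count as 0
    const y = r[i] || 0;
    if (x !== y) return x > y;
  }
  return true; // all components equal
}

console.log(versionAtLeast('2.8.12.2', '3.2.0')); // the version from this log
console.log(versionAtLeast('3.12.2', '3.2.0'));
```

Comparing component by component matters here: a plain string comparison would rank "3.12.2" below "3.2.0".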
Hi,
I'm trying to use the ParquetWriter to write to a parquet file one line at a time. I noticed that node memory usage continues to grow with each call to writer.write(rows). I am processing a very large file and the memory usage grows beyond my machine's limits. Since I am reading and writing one row at a time, it seems like the memory usage should stay constant. Is there a workaround for this?
Thanks,
David
In some cases my node dies with following message:
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
Reproducible on Ubuntu 16.04.4 LTS
Node.js v6.10.3 (downloaded as binaries)
node-parquet 0.2.6 (installed via npm with all dev-dependencies)
I can fetch more information if needed. This is my dev environment.
P.S. Thanks for parquet for node!!
Hi,
I'm trying to install node-parquet, using "npm install node-parquet", on macOS Sierra.
I installed all the dependencies, but I'm getting this error:
[100%] Built target parquet_static
[email protected] install /Volumes/Data/Desenvolvimento/repositories/web/parquet_reader/node_modules/node-parquet
node-gyp rebuild
CXX(target) Release/obj.target/parquet/src/parquet_binding.o
In file included from ../src/parquet_binding.cc:3:
In file included from ../src/parquet_reader.h:8:
In file included from ../deps/parquet-cpp/src/parquet/api/reader.h:22:
../deps/parquet-cpp/src/parquet/column/reader.h:22:10: fatal error: 'cstdint' file not found
#include <cstdint>
^
1 error generated.
make: *** [Release/obj.target/parquet/src/parquet_binding.o] Error 1
gyp ERR! build error
First: thank you for building this super-useful tool. Definitely comes in handy when needing to run quick diffs against two parquet files.
Numeric filenames cause javascript errors. Since my parquet files are output by hive, they have numeric names.
If you run the following code:
parquet head 00000
You'll get the following error message:
cat 0
/home/youruser/.nodenv/versions/6.11.2/lib/node_modules/node-parquet/bin/parquet.js:54
const reader = new parquet.ParquetReader(file);
^
TypeError: wrong argument
at TypeError (native)
at cat (/home/sroeca/.nodenv/versions/6.11.2/lib/node_modules/node-parquet/bin/parquet.js:54:18)
at Object.<anonymous> (/home/sroeca/.nodenv/versions/6.11.2/lib/node_modules/node-parquet/bin/parquet.js:43:5)
at Module._compile (module.js:570:32)
at Object.Module._extensions..js (module.js:579:10)
at Module.load (module.js:487:32)
at tryModuleLoad (module.js:446:12)
at Function.Module._load (module.js:438:3)
at Module.runMain (module.js:604:10)
at run (bootstrap_node.js:389:7)
At present, the simple workaround is to rename the files to a non-numeric value. This is mildly cumbersome.
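The "cat 0" line in the output suggests that the file name 00000 reached the reader as the number 0, which is what generic argument parsers tend to do with numeric-looking values. The sketch below shows the effect and one way around it. It is an illustration only, not the actual parquet.js code, and the helper name positionalArgs is hypothetical.

```javascript
// Hypothetical illustration, not the actual parquet.js code.
// Numeric coercion loses the original file name: '00000' collapses to a number.
console.log(Number('00000'));

// Reading positional arguments as plain strings keeps the name intact.
function positionalArgs(argv) {
  return argv.filter((arg) => !arg.startsWith('-'));
}

const [command, file] = positionalArgs(['head', '00000']);
console.log(command, typeof file, file);
```

A path such as ./00000 does not look numeric, so passing the file that way may avoid the rename. This has not been verified against the CLI.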
Hi Mark,
I started getting strange results when converting to Parquet and back using your module and your example:
Here is the code I'm using:
var parquet = require('node-parquet');
var schema = {
small_int: {type: 'int32'},
big_int: {type: 'int64'},
name: {type: 'byte_array'}
};
var data = [
[ 13, 1111, 'hello world r'],
[ 2, 2234, 'hello world 1'],
[ 3, 2334, 'hello world 2'],
[ 4, 1223, 'hello world 3']
];
var writer = new parquet.ParquetWriter('/tmp/my_file.parquet', schema);
writer.write(data);
writer.close();
And this is the code I'm using to read the Parquet file:
var fs = require('fs');
var parquet = require('node-parquet');
var file = '/tmp/my_file.parquet';
var reader = new parquet.ParquetReader(file);
console.log(reader.info());
console.log(reader.rows());
reader.close();
And this is the result I'm getting:
{ version: 0,
createdBy: 'parquet-cpp version 1.0.0',
rowGroups: 1,
columns: 3,
rows: 4 }
[ [ undefined, 1111, 'hello world r' ],
[ 2, 2234, 'hello world 1' ],
[ 3, 2334, 'hello world 2' ],
[ 4, 1223, 'hello world 3' ] ]
As you can see the number 13 is shown as undefined. If I add a more complex schema more integers are shown as undefined.
I'm running AWS Linux, Node 8.2.1
Any idea?
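To pin down which cells are affected in a larger schema, the written rows can be compared with the rows read back. The sketch below uses the data from this report; the helper name diffRows is hypothetical. It uses strict equality, so a value that comes back with a different type is also reported.

```javascript
// Hypothetical helper, not part of node-parquet.
// Returns [rowIndex, columnIndex] pairs where the rows read back
// differ from the rows that were written.
function diffRows(written, read) {
  const mismatches = [];
  written.forEach((row, r) => {
    row.forEach((value, c) => {
      const actual = read[r] ? read[r][c] : undefined;
      if (actual !== value) mismatches.push([r, c]);
    });
  });
  return mismatches;
}

const written = [
  [13, 1111, 'hello world r'],
  [2, 2234, 'hello world 1'],
];
// Values as reported by reader.rows() in the output above.
const readBack = [
  [undefined, 1111, 'hello world r'],
  [2, 2234, 'hello world 1'],
];
console.log(JSON.stringify(diffRows(written, readBack)));
```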
Hi,
I am trying to use parquet because of AWS Athena.
However, there was a problem with the installation.
I need help.
I ran brew install cmake
before npm install:
~ brew install cmake
Warning: cmake 3.12.2 is already installed and up-to-date
➜ assistant git:(113-admin_006-post-s3-athena-parquet) ✗ npm i node-parquet
> [email protected] preinstall /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet
> ./build_parquet-cpp.sh
-- The C compiler identification is AppleClang 9.1.0.9020039
-- The CXX compiler identification is AppleClang 9.1.0.9020039
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/local/bin/pkg-config (found version "0.29.2")
clang-tidy not found
clang-format not found
-- Compiler id: AppleClang
Selected compiler clang 4.0
-- Performing Test CXX_SUPPORTS_SSE3
-- Performing Test CXX_SUPPORTS_SSE3 - Success
-- Performing Test CXX_SUPPORTS_ALTIVEC
-- Performing Test CXX_SUPPORTS_ALTIVEC - Success
Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})
-- Build Type: RELEASE
-- Boost version: 1.67.0
-- Found the following Boost libraries:
-- regex
-- Boost include dir: /usr/local/include
-- Boost libraries: /usr/local/lib/libboost_regex-mt.dylib
-- THRIFT_HOME:
-- Thrift compiler/libraries NOT found: (THRIFT_INCLUDE_DIR-NOTFOUND, THRIFT_STATIC_LIB-NOTFOUND). Looked in system search paths.
-- Thrift include dir: /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet/build_deps/parquet-cpp/thrift_ep/src/thrift_ep-install/include
-- Thrift static library: /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet/build_deps/parquet-cpp/thrift_ep/src/thrift_ep-install/lib/libthrift.a
-- Thrift compiler: /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet/build_deps/parquet-cpp/thrift_ep/src/thrift_ep-install/bin/thrift
-- Thrift version:
-- Checking for module 'arrow'
-- No package 'arrow' found
-- Could not find the Arrow library. Looked for headers in , and for libs in
-- Building Apache Arrow from commit: 501d60e918bd4d10c429ab34e0b8e8a87dffb732
-- CMAKE_CXX_FLAGS: -Qunused-arguments -O3 -DNDEBUG -Wall -std=c++11 -stdlib=libc++
-- Found cpplint executable at /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet/deps/parquet-cpp/build-support/cpplint.py
-- Configuring done
CMake Warning (dev):
Policy CMP0068 is not set: RPATH settings on macOS do not affect
install_name. Run "cmake --help-policy CMP0068" for policy details. Use
the cmake_policy command to set the policy and suppress this warning.
For compatibility with older versions of CMake, the install_name fields for
the following targets are still affected by RPATH settings:
parquet_shared
This warning is for project developers. Use -Wno-dev to suppress it.
-- Generating done
-- Build files have been written to: /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet/build_deps/parquet-cpp
Scanning dependencies of target thrift_ep
[ 1%] Creating directories for 'thrift_ep'
[ 3%] Performing download step (download, verify and extract) for 'thrift_ep'
-- thrift_ep download command succeeded. See also /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet/build_deps/parquet-cpp/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-download-*.log
[ 5%] No patch step for 'thrift_ep'
[ 7%] No update step for 'thrift_ep'
[ 9%] Performing configure step for 'thrift_ep'
-- thrift_ep configure command succeeded. See also /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet/build_deps/parquet-cpp/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-configure-*.log
[ 10%] Performing build step for 'thrift_ep'
CMake Error at /Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet/build_deps/parquet-cpp/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-build-RELEASE.cmake:16 (message):
Command failed: 2
'/Applications/Xcode.app/Contents/Developer/usr/bin/make'
See also
/Users/hongjinho/Documents/cosmee/assistant/node_modules/node-parquet/build_deps/parquet-cpp/thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-build-*.log
make[2]: *** [thrift_ep-prefix/src/thrift_ep-stamp/thrift_ep-build] Error 1
make[1]: *** [CMakeFiles/thrift_ep.dir/all] Error 2
make: *** [all] Error 2
npm WARN assistant No description
npm WARN assistant No repository field.
npm WARN assistant No license field.
npm ERR! code ELIFECYCLE
npm ERR! errno 2
npm ERR! [email protected] preinstall: `./build_parquet-cpp.sh`
npm ERR! Exit status 2
npm ERR!
npm ERR! Failed at the [email protected] preinstall script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
npm ERR! A complete log of this run can be found in:
npm ERR! /Users/hongjinho/.npm/_logs/2018-09-15T18_38_47_846Z-debug.log
The following code generates a SIGSEGV system error:
var parquet = require('node-parquet');
var schema = { string: {type: 'byte_array'}, };
var f = new parquet.ParquetWriter(__dirname + '/t1.parquet', schema);
f.write([
[ "hello" ], // Ok
[ [ 4 ] ] // Fault => crash
]);
f.close();
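Until the writer rejects such input itself, a guard on the JavaScript side can turn the crash into an ordinary exception. The sketch below covers only the types that appear in the examples on this page. The helper name validateRows is hypothetical, and the assumption that byte_array accepts both strings and Buffers is unconfirmed.

```javascript
// Hypothetical guard, not part of node-parquet.
// Per-type checks for the schema types used in the examples on this page.
const CHECKS = {
  int32: (v) => Number.isInteger(v),
  int64: (v) => Number.isInteger(v),
  bool: (v) => typeof v === 'boolean',
  // Assumption: byte_array accepts strings and Buffers.
  byte_array: (v) => typeof v === 'string' || Buffer.isBuffer(v),
};

function validateRows(schema, rows) {
  const columns = Object.keys(schema);
  rows.forEach((row, r) => {
    if (!Array.isArray(row)) throw new TypeError(`row ${r} is not an array`);
    columns.forEach((name, c) => {
      if (!(c in row)) return; // empty slot: left for the writer to handle
      const type = schema[name].type;
      const check = CHECKS[type];
      if (check && !check(row[c])) {
        throw new TypeError(`row ${r}, column "${name}": expected ${type}`);
      }
    });
  });
}

const schema = { string: { type: 'byte_array' } };
try {
  validateRows(schema, [['hello'], [[4]]]);
} catch (err) {
  console.log(err.message);
}
```

Calling validateRows(schema, rows) before f.write(rows) throws on the second row of the reproduction instead of reaching native code.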
I have a very large amount of NoSQL data. I want to read the data as a stream and just pass the schema and the stream. I will upload the result to S3 as a parquet file. Because of the amount of data, I can't store it locally, so I don't want to hold the file in memory or on disk. Please advise.
Hi,
We desperately need to parse parquet-formatted files on a Node server to get some meaningful information out of them. We believe this module is the best fit for our needs.
If you could tell us when we can expect the initial working version of this module, that would be very helpful.
Thanks,
Basil
When using the group type I get an error on the number of values in the column.
E.g. from the example given in the documentation, this message is returned:
Error: Column 2 had 7 while previous column had 2
So it seems like the nested values are treated as a single value column and therefore it thinks the column has 7 rows, when it in reality has two.
Based on the example in the readme:
var parquet = require('node-parquet');
var schema = {
small_int: {type: 'int32', optional: true},
big_int: {type: 'int64'},
my_boolean: {type: 'bool'},
name: {type: 'byte_array', optional: true},
};
var data = [
[ 1, 23234, true, 'hello world'],
[ , 1234, false, ],
];
var writer = new parquet.ParquetWriter('my_file.parquet', schema);
writer.write(data);
writer.close();
Here will be the output:
$ ./node_modules/node-parquet/bin/parquet.js cat ./my_file.parquet
[1,23234,true,"hello world"]
[null,1234,false,null]
If I write it three times, as below:
writer.write(data);
writer.write(data);
writer.write(data);
writer.close();
The output is all nulls after the first write:
$ ./node_modules/node-parquet/bin/parquet.js cat ./my_file.parquet
[1,23234,true,"hello world"]
[null,1234,false,null]
[null,null,null,null]
[null,null,null,null]
[null,null,null,null]
[null,null,null,null]
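Since the report shows that a single write call produces correct output, one thing to try is to gather all rows first and call write once. The sketch below only builds the combined array. The helper name combineBatches is hypothetical, and whether one large write avoids the null rows has not been verified.

```javascript
// Hypothetical helper, not part of node-parquet.
// Flattens several batches of rows into one array so the writer is called once.
function combineBatches(batches) {
  return batches.reduce((all, batch) => all.concat(batch), []);
}

const data = [
  [1, 23234, true, 'hello world'],
  [, 1234, false],
];
const rows = combineBatches([data, data, data]);
console.log(rows.length);
// With the writer from the report above:
//   writer.write(rows);
//   writer.close();
```

The rows are copied by reference, so the empty slots used for optional fields are preserved.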
Getting this on Fedora, with all requirements listed in the readme installed:
CXX(target) Release/obj.target/parquet/src/parquet_binding.o
In file included from ../deps/parquet-cpp/src/parquet/api/reader.h:22:0,
from ../src/parquet_reader.h:8,
from ../src/parquet_binding.cc:3:
../deps/parquet-cpp/src/parquet/column_reader.h:29:33: fatal error: arrow/util/bit-util.h: No such file or directory
#include <arrow/util/bit-util.h>
My use-case is that we have a bunch of Parquet files in S3 I'm operating over in batches. While it works fine to download things to the local file system before reading and then deleting them after I'm done, it would be nicer if I could cut out the file system completely and pass the reader object a Buffer instance.
(and apologies if this already exists but I just didn't spot it in the docs)