Comments (8)
The best suggestion is to use the TypeScript types to guide you.
This is working for me with 0.5.0
import { tableFromArrays, tableToIPC } from "apache-arrow";
import * as parquet from "parquet-wasm/node/arrow1";
import { writeFileSync } from "fs";

// Create Arrow Table in JS
const LENGTH = 2000;
const rainAmounts = Float32Array.from({ length: LENGTH }, () =>
  Number((Math.random() * 20).toFixed(1))
);
const rainDates = Array.from(
  { length: LENGTH },
  (_, i) => new Date(Date.now() - 1000 * 60 * 60 * 24 * i)
);
const rainfall = tableFromArrays({
  precipitation: rainAmounts,
  date: rainDates,
});

// Write Arrow Table to Parquet
const writerProperties = new parquet.WriterPropertiesBuilder()
  .setCompression(parquet.Compression.ZSTD)
  .build();
const arrowWasmTable = parquet.Table.fromIPCStream(
  tableToIPC(rainfall, "stream")
);
const parquetBuffer = parquet.writeParquet(arrowWasmTable, writerProperties);
writeFileSync("out.parquet", parquetBuffer);
I can verify that the file loads correctly in Python
from parquet-wasm.
The entire table is empty. readParquet does not return a Uint8Array, it returns a Table object, so you need to call a method to convert that table to IPC bytes first. It should be something like tableFromIPC(readParquet().intoIPCStream()). The types will guide you.
I think the right abstraction is a class like ParquetFile from the pyarrow world in Python that only reads the metadata. We don't have something like that today, but it might come in the next release.
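For context on why a metadata-only reader is feasible: the Parquet format stores the file metadata (including the schema) in a footer at the end of the file, followed by a 4-byte little-endian metadata length and the magic bytes "PAR1". Below is a minimal sketch of locating that footer from the file's last 8 bytes. The helper name parquetFooterLength is hypothetical and not part of parquet-wasm, and the sketch only finds the footer size; it does not decode the Thrift-encoded metadata.

```typescript
// Trailer layout per the Parquet format:
//   [file metadata (Thrift)] [4-byte little-endian metadata length] ["PAR1"]
const PARQUET_MAGIC = [0x50, 0x41, 0x52, 0x31]; // ASCII "PAR1"

/**
 * Given the last 8 (or more) bytes of a Parquet file, return the byte length
 * of the footer metadata. Hypothetical helper, not a parquet-wasm API.
 */
function parquetFooterLength(tail: Uint8Array): number {
  if (tail.length < 8) {
    throw new Error("need at least the last 8 bytes of the file");
  }
  const end = tail.length;
  // The final 4 bytes must be the magic marker.
  for (let i = 0; i < 4; i++) {
    if (tail[end - 4 + i] !== PARQUET_MAGIC[i]) {
      throw new Error("not a Parquet file: trailing magic bytes are missing");
    }
  }
  // The 4 bytes before the magic hold the metadata length (little-endian).
  const view = new DataView(tail.buffer, tail.byteOffset, tail.byteLength);
  return view.getUint32(end - 8, true);
}

// Usage with a synthetic trailer: metadata length 1234, then "PAR1".
const trailer = new Uint8Array(8);
new DataView(trailer.buffer).setUint32(0, 1234, true);
trailer.set(PARQUET_MAGIC, 4);
console.log(parquetFooterLength(trailer)); // 1234
```

With a real file, one would read only the last 8 bytes from disk, then read that many metadata bytes immediately before the trailer, instead of loading the whole file into memory.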
The API changed after 0.5.0. In 0.5 the Table object doesn't exist. You can look at the 0.5.0 README for an example: https://github.com/kylebarron/parquet-wasm/tree/v0.5.0?tab=readme-ov-file#example
Thanks for this hint (it is really good that you respond to issues so fast 👍). Now I have another problem, this time with the writeParquet function. The following lines cause trouble:
const uintArr = tableToIPC(rainfall, 'stream');
const parquetBuffer = writeParquet(
  uintArr, // this should be a table
  writerProperties
);
as writeParquet is expecting a Table:
Argument of type 'Uint8Array' is not assignable to parameter of type 'Table'.
Type 'Uint8Array' is missing the following properties from type 'Table': free, recordBatch, toFFI, intoFFI, and 3 more.
So I tried
let writerProperties = new WriterPropertiesBuilder();
writerProperties = writerProperties.setCompression(Compression.ZSTD);
const props = writerProperties.build();
const uintArr = tableToIPC(rainfall, 'stream');
const arrTable = Table.fromIPCStream(uintArr);
const parquetBuffer = writeParquet(
  arrTable,
  props
);
and imported from 'parquet-wasm/node/arrow1', which compiles. However, this produces an empty schema. So the question is: how do I call writeParquet with the return value of tableToIPC(rainfall, 'stream')?
BTW: I changed the apache-arrow version to 13.0.0, as this version is also used in parquet-wasm 0.5.0.
Hello Kyle, thanks for your support and time :). Indeed, your code works, and it turned out that the code I posted earlier works as well. However, the schema is empty when I inspect the table by setting a breakpoint after calling 'tableFromIPC(readParquet(parquetBuffer));'.
Perfect, 'tableFromIPC(readParquet(parquetBuffer).intoIPCStream())' worked.
BTW: Do you have an example of using 'readParquetStream'?
Background: I have a huge Parquet file (~500 MB) and I only want to read, e.g., the first row or the schema (I don't know whether 'readParquetStream' is the right function for that).
If I do this with the stream:
readParquetStream('file:///C:/Users/marcel.tiator/Projekte/IDE/IDETest4/example.parquet').then((value) => {
  console.log('test');
});
I get the following runtime error:
2024-03-12T09:04:47.226Z root ERROR RuntimeError: unreachable
at wasm://wasm/0132be12:wasm-function[2356]:0x3158b9
at wasm://wasm/0132be12:wasm-function[4456]:0x393d5b
at wasm://wasm/0132be12:wasm-function[3105]:0x35a45b
at wasm://wasm/0132be12:wasm-function[90]:0xbde8c
at wasm://wasm/0132be12:wasm-function[2045]:0x2e9c24
at wasm://wasm/0132be12:wasm-function[4892]:0x39bbde
at __wbg_adapter_28 (...)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
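The thread does not identify the cause of this error. One possibility, and it is only an assumption here, is that readParquetStream fetches its input over HTTP and cannot open a file:// URL, in which case the failure would surface as an opaque wasm trap like the one above. Under that assumption, a small guard such as the following sketch would fail early with a readable message instead. The helper name requireHttpUrl is hypothetical and not part of parquet-wasm.

```typescript
/**
 * Return the URL unchanged if it uses http or https; otherwise throw.
 * Hypothetical helper: it assumes, without confirmation from the thread,
 * that the streaming reader only supports HTTP(S) sources.
 */
function requireHttpUrl(url: string): string {
  if (!/^https?:\/\//i.test(url)) {
    throw new Error(`expected an http(s) URL, got: ${url}`);
  }
  return url;
}

// Example: an https URL passes through unchanged.
console.log(requireHttpUrl("https://example.com/data.parquet"));

// A file:// URL is rejected before it ever reaches the wasm module.
try {
  requireHttpUrl("file:///tmp/example.parquet");
} catch (err) {
  console.log((err as Error).message);
}
```

If the assumption holds, a local file would instead be read into a Uint8Array and passed to readParquet, as in the examples earlier in the thread.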