
kylebarron commented on September 22, 2024

My best suggestion is to let the TypeScript types guide you.

This is working for me with 0.5.0

```typescript
import { tableFromArrays, tableToIPC } from "apache-arrow";
import * as parquet from "parquet-wasm/node/arrow1";
import { writeFileSync } from "fs";

// Create Arrow Table in JS
const LENGTH = 2000;
const rainAmounts = Float32Array.from({ length: LENGTH }, () =>
  Number((Math.random() * 20).toFixed(1))
);

const rainDates = Array.from(
  { length: LENGTH },
  (_, i) => new Date(Date.now() - 1000 * 60 * 60 * 24 * i)
);

const rainfall = tableFromArrays({
  precipitation: rainAmounts,
  date: rainDates,
});

// Write Arrow Table to Parquet
const writerProperties = new parquet.WriterPropertiesBuilder()
  .setCompression(parquet.Compression.ZSTD)
  .build();
const arrowWasmTable = parquet.Table.fromIPCStream(
  tableToIPC(rainfall, "stream")
);
const parquetBuffer = parquet.writeParquet(arrowWasmTable, writerProperties);
writeFileSync("out.parquet", parquetBuffer);
```

I can verify that the file loads correctly in Python.


kylebarron commented on September 22, 2024

The entire table is empty because readParquet does not return a Uint8Array; it returns a Table object, so you first need to call a method that converts that table to IPC bytes. It should be something like tableFromIPC(readParquet().intoIPCStream()). The types will guide you.


kylebarron commented on September 22, 2024

I think the right abstraction is a class like pyarrow's ParquetFile in Python, which reads only the metadata. We don't have anything like that today, but it might arrive in the next release.
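A metadata-only reader is cheap because of how a Parquet file is laid out: it ends with the Thrift-encoded file metadata (schema, row groups), then a 4-byte little-endian length of that metadata, then the 4-byte magic "PAR1". The sketch below is not part of parquet-wasm; parquetFooterLength is a hypothetical helper that takes the last bytes of a file, checks the magic, and returns the metadata length, which tells a reader how many more bytes it must fetch to reach the schema.

```typescript
// Hypothetical helper, NOT a parquet-wasm API. It relies only on the Parquet
// file layout, which ends with:
//   <Thrift-encoded FileMetaData> <4-byte little-endian length> "PAR1"
const PARQUET_MAGIC = [0x50, 0x41, 0x52, 0x31]; // ASCII "PAR1"

/**
 * Given (at least) the last 8 bytes of a Parquet file, return the length in
 * bytes of the FileMetaData block that sits just before those 8 bytes.
 */
function parquetFooterLength(tail: Uint8Array): number {
  if (tail.byteLength < 8) {
    throw new Error("need at least the last 8 bytes of the file");
  }
  // The final 4 bytes must be the "PAR1" magic.
  const magicStart = tail.byteLength - 4;
  const hasMagic = PARQUET_MAGIC.every((b, i) => tail[magicStart + i] === b);
  if (!hasMagic) {
    throw new Error('not a Parquet file: missing trailing "PAR1" magic');
  }
  // The 4 bytes before the magic hold the metadata length (little-endian).
  // The DataView honors byteOffset in case `tail` is a view into a larger buffer.
  const view = new DataView(tail.buffer, tail.byteOffset, tail.byteLength);
  return view.getUint32(tail.byteLength - 8, true);
}
```

With that length in hand, a metadata-only reader needs just two small reads (for example, two HTTP range requests), no matter how large the data pages are.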


kylebarron commented on September 22, 2024

The API changed after 0.5.0; in 0.5 the Table object doesn't exist. See the 0.5.0 README for an example: https://github.com/kylebarron/parquet-wasm/tree/v0.5.0?tab=readme-ov-file#example


MaTiAtSIE commented on September 22, 2024

Thanks for the hint (it's great that you answer issues so quickly 👍). Now I have another problem with the writeParquet function. The following lines cause trouble:

```typescript
const uintArr = tableToIPC(rainfall, 'stream');
const parquetBuffer = writeParquet(
  uintArr, // this should be a table
  writerProperties
);
```

because writeParquet expects a Table:

```
Argument of type 'Uint8Array' is not assignable to parameter of type 'Table'.
  Type 'Uint8Array' is missing the following properties from type 'Table': free, recordBatch, toFFI, intoFFI, and 3 more.
```

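So I tried:

```typescript
import { tableToIPC } from 'apache-arrow';
import { Compression, Table, WriterPropertiesBuilder, writeParquet } from 'parquet-wasm/node/arrow1';

let writerProperties = new WriterPropertiesBuilder();
writerProperties = writerProperties.setCompression(Compression.ZSTD);
const props = writerProperties.build();
const uintArr = tableToIPC(rainfall, 'stream');
const arrTable = Table.fromIPCStream(uintArr);
const parquetBuffer = writeParquet(
    arrTable,
    props
);
```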

with everything imported from 'parquet-wasm/node/arrow1', which compiles. However, this produces an empty schema. So my question is: how do I call writeParquet with the output of tableToIPC(rainfall, 'stream')?

BTW: I changed the apache-arrow version to 13.0.0, since parquet-wasm 0.5.0 also uses that version.


MaTiAtSIE commented on September 22, 2024

Hello Kyle, thanks for your support and time :). Your code does work, and it turns out the code I posted earlier works as well. However, the schema is empty when I inspect the table at a breakpoint after calling 'tableFromIPC(readParquet(parquetBuffer));'.


MaTiAtSIE commented on September 22, 2024

Perfect, 'tableFromIPC(readParquet(parquetBuffer).intoIPCStream())' worked.


MaTiAtSIE commented on September 22, 2024

BTW: Do you have an example of using 'readParquetStream'?
Background: I have a huge Parquet file (~500 MB) and only want to read, e.g., the first row or the schema (I don't know whether 'readParquetStream' is the right function for that).

If I call readParquetStream like this:

```typescript
readParquetStream('file:///C:/Users/marcel.tiator/Projekte/IDE/IDETest4/example.parquet').then((value) => {
  console.log('test');
});
```

I get the following runtime error:

```
2024-03-12T09:04:47.226Z root ERROR RuntimeError: unreachable
    at wasm://wasm/0132be12:wasm-function[2356]:0x3158b9
    at wasm://wasm/0132be12:wasm-function[4456]:0x393d5b
    at wasm://wasm/0132be12:wasm-function[3105]:0x35a45b
    at wasm://wasm/0132be12:wasm-function[90]:0xbde8c
    at wasm://wasm/0132be12:wasm-function[2045]:0x2e9c24
    at wasm://wasm/0132be12:wasm-function[4892]:0x39bbde
    at __wbg_adapter_28 (...)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
```
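For a local file in Node, one way to get at the schema without touching ~500 MB of data is to read only the file's footer. The sketch below takes a plain local path rather than a file:/// URL; whether readParquetStream accepts file:/// URLs in Node is not settled in this thread, so treat that as an open assumption. readParquetFooterBytes is a hypothetical helper built only on Node's fs module, not a parquet-wasm API. It returns the raw Thrift-encoded FileMetaData and does not decode it, so a Thrift compact-protocol decoder (not shown) would still be needed to turn those bytes into a schema.

```typescript
import * as fs from "node:fs";

// Hypothetical helper, NOT part of parquet-wasm. It reads only the footer of a
// local Parquet file, whose layout ends with:
//   <Thrift-encoded FileMetaData> <4-byte little-endian length> "PAR1"
// The returned bytes are still Thrift-encoded (decoding is not shown here).
function readParquetFooterBytes(filePath: string): Uint8Array {
  const fd = fs.openSync(filePath, "r");
  try {
    const size = fs.fstatSync(fd).size;
    // Smallest valid layout: "PAR1" + metadata + 4-byte length + "PAR1".
    if (size < 12) throw new Error("file too small to be Parquet");

    // 1) Read only the last 8 bytes: metadata length + trailing magic.
    const tail = Buffer.alloc(8);
    fs.readSync(fd, tail, 0, 8, size - 8);
    if (tail.toString("ascii", 4, 8) !== "PAR1") {
      throw new Error('missing trailing "PAR1" magic');
    }
    const metaLen = tail.readUInt32LE(0);
    if (metaLen > size - 12) throw new Error("metadata length exceeds file size");

    // 2) Read just the metadata block that precedes those 8 bytes.
    const meta = Buffer.alloc(metaLen);
    const bytesRead = fs.readSync(fd, meta, 0, metaLen, size - 8 - metaLen);
    if (bytesRead !== metaLen) throw new Error("short read of file metadata");
    return new Uint8Array(meta.buffer, meta.byteOffset, meta.byteLength);
  } finally {
    fs.closeSync(fd);
  }
}
```

Called as readParquetFooterBytes("example.parquet"), this performs two small reads whose total size is 8 bytes plus the metadata length, independent of how much row data the file holds.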
