Comments (6)
I think the issue here is that Arrow schema metadata and Parquet key-value file metadata are technically different concepts, so I assume that the Rust parquet crate does not automatically write the table metadata onto the Parquet file.
Other libraries include Arrow table schema metadata onto the Parquet key-value metadata, so maybe we should do the same here when writing.
I am able to open and view the geo metadata key in pyarrow
Is this on the table schema or the Parquet metadata? They're two different things.
The Parquet key-value metadata is accessible with pyarrow.parquet.read_metadata(...).metadata.get(b'geo'), while the Arrow schema is stored separately in the Parquet file and is accessible with pyarrow.parquet.read_schema(...).metadata.get(b'geo'). I'm guessing only the latter one exists in the Parquet file you're writing.
from parquet-wasm.
Thanks for such a quick response! And nice guess, I just checked and you're absolutely correct: only pyarrow.parquet.read_schema(...).metadata.get(b'geo') exists in the file I've written. So would the solution be to write the geo metadata to the Parquet metadata rather than the Arrow schema? Is that possible at the moment? Thanks again!
I think we just need to implement this method:
parquet-wasm/src/writer_properties.rs
Lines 145 to 154 in ef8ca3b
Can you test from this branch #503? There are developer docs here for building.
Usage should be something like:
import {
  WriterProperties,
  WriterPropertiesBuilder,
} from "./pkg/esm/parquet_wasm.js";

let props = new Map<string, string>();
props.set("geo", "...");
let writerProps = new WriterPropertiesBuilder()
  .setKeyValueMetadata(props)
  .build();
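
For anyone filling in the "..." placeholder above, here is a minimal sketch of building the value for the "geo" key. It assumes the file should follow the GeoParquet 1.0.0 metadata layout (a JSON string with `version`, `primary_column`, and `columns`, where each column declares an `encoding` and `geometry_types`). The helper name `buildGeoMetadata` and the column name `geometry` are hypothetical and only used for illustration; they are not part of parquet-wasm.

```typescript
// Shape of one geometry column entry, assuming the GeoParquet 1.0.0 layout.
interface GeoColumnMetadata {
  encoding: "WKB";
  geometry_types: string[]; // an empty array means the types are unknown or mixed
}

// Shape of the file-level "geo" metadata, assuming the GeoParquet 1.0.0 layout.
interface GeoMetadata {
  version: string;
  primary_column: string;
  columns: Record<string, GeoColumnMetadata>;
}

// Hypothetical helper (not a parquet-wasm API): serializes the "geo" metadata
// for a single WKB-encoded geometry column into a JSON string.
function buildGeoMetadata(
  geometryColumn: string,
  geometryTypes: string[] = [],
): string {
  const meta: GeoMetadata = {
    version: "1.0.0",
    primary_column: geometryColumn,
    columns: {
      [geometryColumn]: { encoding: "WKB", geometry_types: geometryTypes },
    },
  };
  return JSON.stringify(meta);
}

// Key-value pairs intended for the Parquet file metadata.
// "geometry" is an assumed column name; use the name of your own WKB column.
const keyValueMetadata = new Map<string, string>();
keyValueMetadata.set("geo", buildGeoMetadata("geometry", ["Point"]));
```

The resulting map would then be passed to `setKeyValueMetadata(...)` in place of `props` in the snippet above, assuming the builder method from the branch behaves as described in this thread.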
Sorry for the delay - took a while to figure out how to build/run, etc. Good learning experience haha! It works perfectly, thanks Kyle! If I can help by contributing any documentation, etc. when it is merged into the main branch, let me know! Happy to close this now if you are :)
Awesome, good to hear! Ideally most people will be writing GeoParquet via the geoarrow-wasm set of tools, like @geoarrow/geoparquet-wasm.
const wkb = geos.geosGeomToWKB(geomPtr) // returns a WKB buffer
Having two sets of Wasm bundles is a lot of code for the user to download and means that you need to have memory copies out of one Wasm memory space and then into the other's.
But alas, for now, if it works that's good!
A doc example would be welcome! You can include markdown in the /// in the Rust code here:
parquet-wasm/src/writer_properties.rs
Line 160 in ba3c161
That gets copied into the generated .d.ts doc comments.