Comments (6)
Wasm in the browser generally has a 2GB memory limit, and the library currently keeps two copies of the data in memory, so there's certainly some limit. The easiest way around it is to read the Parquet file in chunks.
from parquet-wasm.
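Here is a minimal budget check that puts rough numbers on that limit. It rests on an assumption stated up front: peak Wasm memory is taken to be about `copies` times the file size (two by default, matching "two copies"), compared against a 2 GiB limit. The helper names are hypothetical, real overhead varies, and the decoded Arrow data can be larger than the compressed Parquet file.

```javascript
// Hypothetical helpers: rough check of whether a file can be decoded in one call.
// Assumption: peak Wasm memory ≈ fileBytes * copies, against a ~2 GiB browser limit.
const WASM_LIMIT_BYTES = 2 * 1024 ** 3; // 2 GiB

// Estimated peak bytes if `copies` copies of the data are alive at once.
function estimatePeakBytes(fileBytes, copies = 2) {
  return fileBytes * copies;
}

// True if the estimated peak stays within the limit.
function fitsInWasmMemory(fileBytes, copies = 2, limit = WASM_LIMIT_BYTES) {
  return estimatePeakBytes(fileBytes, copies) <= limit;
}

// Largest file size (bytes) that fits under the same assumption.
function maxFileBytes(copies = 2, limit = WASM_LIMIT_BYTES) {
  return Math.floor(limit / copies);
}
```

Under this assumption `maxFileBytes()` returns 1073741824, roughly 1 GiB of Parquet per single-shot decode. Reading smaller chunks keeps each call well under that ceiling.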
Do you mean the `readParquet()` function?
In my code:
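```javascript
// Assumes readParquet has already been imported from parquet-wasm
// (the exact entry point depends on the package version and bundler).
async function parquetData() {
  const response = await fetch("http://localhost:3000/large.parquet");
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.arrayBuffer(); // the entire file is buffered in JS memory
}

parquetData()
  .then((data) => {
    const parquetBytes = new Uint8Array(data);
    // The whole file is handed to Wasm and decoded in a single call.
    const decodedArrowBytes = readParquet(parquetBytes);
    ...
  })
  .catch((error) => {
    console.log("--- error\n");
    console.error(error);
  });
```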
The error occurs on the `readParquet(parquetBytes)` call.
Yeah, you're storing the whole Parquet file in memory and reading it all at once, so it's not surprising that it runs out of memory on a large file. There are chunked APIs for reading only specific chunks at a time, and you can do that as part of the fetch directly too.
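Reading "as part of the fetch" relies on HTTP range requests. The sketch below uses only the standard Fetch API and the Parquet file layout (a file ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`) to fetch just the footer metadata. The helper names are hypothetical and this is not parquet-wasm's own API. It assumes the server honours `Range` headers and that you already know the file size (for example from `Content-Length`).

```javascript
// Hypothetical helpers illustrating ranged reads; not part of parquet-wasm.
const PARQUET_MAGIC = "PAR1";

// Fetch bytes [start, end] (inclusive) using an HTTP Range request.
async function fetchRange(url, start, end) {
  const response = await fetch(url, { headers: { Range: `bytes=${start}-${end}` } });
  if (response.status !== 206) {
    throw new Error(`Expected 206 Partial Content, got ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}

// Parse the last 8 bytes of a Parquet file: 4-byte LE metadata length + "PAR1".
function parseFooterLength(tail8) {
  if (tail8.length !== 8) throw new Error("need exactly 8 bytes");
  const magic = String.fromCharCode(...tail8.subarray(4, 8));
  if (magic !== PARQUET_MAGIC) throw new Error("not a Parquet file (missing PAR1)");
  const view = new DataView(tail8.buffer, tail8.byteOffset, 8);
  return view.getUint32(0, true); // little-endian
}

// Inclusive byte range of the footer metadata, given file size and its length.
function metadataRange(fileSize, metadataLength) {
  const end = fileSize - 8 - 1; // byte just before the length field
  return [end - metadataLength + 1, end];
}

// Two small requests instead of downloading the whole file.
async function fetchParquetMetadataBytes(url, fileSize) {
  const tail = await fetchRange(url, fileSize - 8, fileSize - 1);
  const [start, end] = metadataRange(fileSize, parseFooterLength(tail));
  return fetchRange(url, start, end);
}
```

The metadata bytes are Thrift-encoded and list every row group with its byte offsets, which is what lets a reader issue further range requests for only the chunks it needs. Throwing on a plain `200` response is a deliberate choice: it means the server ignored the `Range` header and is sending the full file.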
Thank you for the answer!
I'm trying to load a very large number of points (100 million is not the upper limit) into deck.gl using your module, and I'm trying to figure out the best way to do it. I don't want to split the file into multiple parts. Could you suggest a good way to read Parquet in chunks?
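One way to avoid holding everything at once is to decode one chunk at a time and copy just the coordinates into a single preallocated typed array that deck.gl can consume as binary attributes. This sketch assumes the total row count is known up front (Parquet metadata records it) and that each decoded chunk yields hypothetical same-length `xs`/`ys` numeric arrays. How you obtain those chunks from parquet-wasm depends on the version and is not shown. The returned `{ length, attributes: { getPosition: { value, size } } }` object follows deck.gl's binary data format.

```javascript
// Hypothetical accumulator: copies coordinates chunk by chunk into one
// Float32Array, so only one chunk's decoded data needs to be alive at a time.
function createPositionBuffer(totalRows) {
  const positions = new Float32Array(totalRows * 2); // x, y interleaved
  let filled = 0; // rows written so far

  return {
    // xs, ys: same-length numeric arrays from one decoded chunk (assumed shape).
    append(xs, ys) {
      if (xs.length !== ys.length) throw new Error("xs and ys length mismatch");
      if (filled + xs.length > totalRows) throw new Error("more rows than expected");
      for (let i = 0; i < xs.length; i++) {
        positions[(filled + i) * 2] = xs[i];
        positions[(filled + i) * 2 + 1] = ys[i];
      }
      filled += xs.length;
    },
    // Binary data object in the shape deck.gl accepts for a layer's `data`.
    toDeckData() {
      return {
        length: filled,
        attributes: { getPosition: { value: positions.subarray(0, filled * 2), size: 2 } },
      };
    },
  };
}
```

In practice you would call `append` once per decoded chunk (for example inside a `for await` loop over your chunk source), then pass `toDeckData()` as the layer's `data`. Float32Array halves memory compared with Float64Array at the cost of coordinate precision.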
There's no way to read only part of a column if the file wasn't originally written with multiple chunks (row groups) internally.
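A Parquet reader can only skip data at row-group granularity. The hypothetical helper below takes per-row-group row counts (which a reader gets from the file metadata) and returns the indices of the row groups that must be read to cover a requested row range. With a single row group, any range maps to the whole file.

```javascript
// Hypothetical helper: which row groups overlap rows [startRow, endRow)?
// rowGroupSizes[i] is the number of rows in row group i (from file metadata).
function rowGroupsForRange(rowGroupSizes, startRow, endRow) {
  const needed = [];
  let groupStart = 0;
  for (let i = 0; i < rowGroupSizes.length; i++) {
    const groupEnd = groupStart + rowGroupSizes[i]; // exclusive
    // Skip empty groups; keep any group overlapping the half-open interval.
    if (rowGroupSizes[i] > 0 && groupStart < endRow && groupEnd > startRow) {
      needed.push(i);
    }
    groupStart = groupEnd;
  }
  return needed;
}
```

For example, `rowGroupsForRange([10, 10, 10, 10], 15, 25)` returns `[1, 2]`, while a single-group file always returns `[0]`. Row groups live inside one file, so writing the file with a smaller row-group size (for example the `row_group_size` argument of pyarrow's `write_table`) enables partial reads without splitting it into multiple files.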
@kylebarron Thank you very much for your answers!