Comments (10)
I will look into this.
Are there any NULLs or empty geometries in the dataframe you are using?
from h3ronpy.
So far I have been unable to reproduce this error. Could you provide a test-dataset which triggers this error? Maybe a small subset of your dataframe is sufficient.
What version of pyarrow are you using?
Thanks, I now can reproduce it.
Fixing this will take some time I guess. The error originates in this check calling this. Not sure how this could be triggered without a bug in pyarrow.
Hmm. Maybe it is memory related. I am using https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.select, which creates a new table. This should have little effect on RAM usage because Arrow only copies pointers to the data. The take afterwards then makes memory usage explode. When I process your dataset in batches of at most 10000 rows, it works. When computing all of it at once, the error occurs and my 32 GB machine does not have much RAM left. If that is the reason, the error message is quite bad ;) At least you can work around the problem using batches. I will look into how to make the process less memory intensive.
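The batching workaround mentioned above could look roughly like the following. This is only a minimal sketch, not code from h3ronpy: `batch_slices`, `process_in_batches` and the `convert` callable are hypothetical names, and `convert` stands in for whatever conversion step is exhausting memory.

```python
from typing import Iterator


def batch_slices(n_rows: int, batch_size: int = 10_000) -> Iterator[slice]:
    """Yield consecutive slices covering range(n_rows), each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_rows, batch_size):
        yield slice(start, min(start + batch_size, n_rows))


def process_in_batches(rows, convert, batch_size: int = 10_000) -> list:
    """Apply convert to one batch at a time and collect the per-batch results.

    rows:    any sequence supporting len() and slicing (a plain list here;
             for a DataFrame one would slice with .iloc instead).
    convert: hypothetical placeholder for the memory-hungry conversion step.
    """
    results = []
    for sl in batch_slices(len(rows), batch_size):
        # Only one batch is handed to convert at a time, which bounds the
        # size of the intermediate data it has to build.
        results.append(convert(rows[sl]))
    return results
```

With a GeoDataFrame, the same slices could be applied via `gdf.iloc[sl]` and the per-batch results concatenated afterwards. Whether 10000 rows is a good batch size depends on the geometries and the target resolution.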
@nmandery No worries Nico. I appreciate your prompt investigation of the issue and your suggestions about how to proceed. Thank you!
Thank you @nmandery - I'm not seeing any NaNs or empty geometries in the dataset.
Here's a link to the exact dataset I'm using: https://web.tresorit.com/l/J3zSS#XkNiAArLbMRKOWoCaL1XpA
The PyArrow version installed in my virtual environment is 14.0.1.
That's great news @nmandery - no worries on the timing. I can explore other pathways in the meantime. Just glad I could contribute something here that will help make the library better.
I wouldn't be surprised. I'm breaking things left and right over here. ;-)
I've been experimenting a bit with h3pandas as well, and I'm able to get to the stage where I've got a GeoDataFrame with the cell geometries. When I do a GeoPandas dissolve operation to group cells into larger geometries, memory usage seems to grow without bound and crashes my machine. All this to say that it seems challenging to integrate multiple attribute layers at high resolution and national scale.
I know the feeling of running low on resources when dealing with large numbers of H3 cells ;-)
I suppose for your case you may want to try to work in batches. In the past I often used batches defined by low-resolution H3-cells (r=4 may work here).
I am afraid there is little I can do here to improve things. To give the user at least a hint that memory exhaustion may be the issue, I added a warning in #41.
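Batching by low-resolution parent cells, as suggested above, might be sketched like this. The helper `group_by_parent` and its `parent_of` argument are hypothetical and not part of h3ronpy. The sketch assumes the caller supplies a function that maps a cell to its parent at the chosen resolution.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def group_by_parent(
    cells: Iterable[T],
    parent_of: Callable[[T], Hashable],
) -> Dict[Hashable, List[T]]:
    """Group cells into batches keyed by their low-resolution parent.

    parent_of is supplied by the caller (hypothetical here); it should return
    the parent cell at the batching resolution, e.g. r=4.
    """
    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for cell in cells:
        groups[parent_of(cell)].append(cell)
    return dict(groups)
```

With h3-py version 4 installed, `parent_of` could be something like `lambda c: h3.cell_to_parent(c, 4)`. Each resulting group can then be processed and written out separately, so only the cells under one parent need to be held in memory at a time.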