Comments (10)

nmandery commented on June 11, 2024

I will look into this.

Are there any NULLs or empty geometries in the dataframe you are using?

from h3ronpy.

nmandery commented on June 11, 2024

So far I have been unable to reproduce this error. Could you provide a test-dataset which triggers this error? Maybe a small subset of your dataframe is sufficient.

What version of pyarrow are you using?

nmandery commented on June 11, 2024

Thanks, I now can reproduce it.

Fixing this will take some time I guess. The error originates in this check calling this. Not sure how this could be triggered without a bug in pyarrow.

nmandery commented on June 11, 2024

Hmm. Maybe it is memory related. I am using https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.select, which creates a new table. This should have little effect on RAM usage, since Arrow only copies pointers to the data. The subsequent take then makes memory usage explode. When I process your dataset in batches of at most 10000 rows, it works. When computing all of it at once, the error occurs and my 32 GB machine does not have much RAM left. If that is the reason, the error message is quite bad ;) At least you can work around the problem using batches. I will look into how to make the process less memory intensive.
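The batching workaround described above could be sketched roughly as follows. This is a minimal illustration, not h3ronpy's own code: `batch_bounds` and `process_in_batches` are hypothetical helper names, and `convert` is a stand-in for whatever conversion call is applied to each slice of the dataframe. The batch size of 10000 is taken from the comment above.

```python
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batch_bounds(n_rows: int, batch_size: int = 10_000) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) row offsets that cover n_rows in chunks of batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)


def process_in_batches(
    rows: Sequence[T],
    convert: Callable[[Sequence[T]], Sequence[R]],
    batch_size: int = 10_000,
) -> List[R]:
    """Apply `convert` (hypothetical stand-in) to one slice at a time.

    Only one slice is handed to `convert` per call, so the peak memory
    used inside the conversion is bounded by the batch size.
    """
    results: List[R] = []
    for start, stop in batch_bounds(len(rows), batch_size):
        results.extend(convert(rows[start:stop]))
    return results
```

With a pandas or GeoPandas dataframe, the same offsets could be used as `df.iloc[start:stop]` and the per-batch results concatenated afterwards.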

diehl commented on June 11, 2024

@nmandery No worries Nico. I appreciate your prompt investigation of the issue and your suggestions about how to proceed. Thank you!

diehl commented on June 11, 2024

Thank you @nmandery - I'm not seeing any NaNs or empty geometries in the dataset.

diehl commented on June 11, 2024

Here's a link to the exact dataset I'm using: https://web.tresorit.com/l/J3zSS#XkNiAArLbMRKOWoCaL1XpA

The PyArrow version installed in my virtual environment is 14.0.1.

diehl commented on June 11, 2024

That's great news @nmandery - no worries on the timing. I can explore other pathways in the meantime. Just glad I could contribute something here that will help make the library better.

diehl commented on June 11, 2024

I wouldn't be surprised. I'm breaking things left and right over here. ;-)

I've been experimenting a bit with h3pandas as well, and I'm able to get to the stage where I've got a GeoDataFrame with the cell geometries. When I do a GeoPandas dissolve operation to group cells into larger geometries, memory usage seems to grow without bound and crashes my machine. All this to say that it seems challenging to integrate multiple attribute layers at high resolution and national scale.

nmandery commented on June 11, 2024

I know the feeling of running low on resources when dealing with large numbers of H3 cells ;-)

I suppose for your case you may want to try to work in batches. In the past I often used batches defined by low-resolution H3-cells (r=4 may work here).

I am afraid there is little I can do here to improve things. To give the user at least a bit of a hint that memory exhaustion may be the issue, I added a warning in #41.
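One way to read the suggestion of batches defined by low-resolution H3 cells is to group the high-resolution cells by their ancestor at a coarse resolution and handle one group at a time. The sketch below is only an illustration under that reading: `group_by_parent` is a hypothetical helper, and `parent_of` is whatever function maps a cell to its coarse parent.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

C = TypeVar("C")


def group_by_parent(
    cells: Iterable[C],
    parent_of: Callable[[C], Hashable],
) -> Dict[Hashable, List[C]]:
    """Group cells into batches keyed by their low-resolution parent.

    `parent_of` is supplied by the caller (an assumption of this sketch);
    each resulting list can then be processed as an independent batch.
    """
    groups: Dict[Hashable, List[C]] = defaultdict(list)
    for cell in cells:
        groups[parent_of(cell)].append(cell)
    return dict(groups)
```

Assuming h3-py 4.x is installed, something like `lambda cell: h3.cell_to_parent(cell, 4)` could serve as `parent_of` for the r=4 case mentioned above. Batches formed this way are spatially compact, which may also help with operations such as dissolve that merge neighbouring cells.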
