
h3ronpy's Introduction

h3ronpy

A data science toolkit for the H3 geospatial grid.

PyPI · ReadTheDocs · DOI

This library is not a substitute for the official Python h3 library; instead it provides higher-level functions on top of H3 and integrations with common dataframe libraries.

Documentation is available at https://h3ronpy.readthedocs.io/.

Features

  • H3 algorithms provided by the performant h3o library.
  • Built on Apache Arrow and pyarrow for efficient data handling.
  • Dedicated APIs for the pandas and polars dataframe libraries. The pandas support includes geopandas.
  • Multi-threaded conversion of raster data to the H3 grid using numpy arrays.
  • Multi-threaded conversion of vector data, including geopandas GeoDataFrames and any object which supports the Python __geo_interface__ protocol (shapely, geojson, ...) — see the short sketch below.
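
A minimal usage sketch of the geopandas integration described above (the file name and resolution are hypothetical placeholders):

import geopandas as gpd
from h3ronpy.pandas.vector import geodataframe_to_cells

gdf = gpd.read_file("regions.gpkg")       # any polygon dataset in WGS84
cells_df = geodataframe_to_cells(gdf, 7)  # one row per covering H3 cell at resolution 7
print(cells_df.head())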

Most parts of this library aim to perform well. Benchmarking the conversion of 1000 uint64 cell values to strings using

  • a simplistic list comprehension calling h3-py h3_to_string
  • a numpy vectorized (numpy.vectorize) variant of h3-py h3_to_string
  • the cells_to_string function of this library (release build)

leads to the following result on a standard laptop:

---------------------------------------------------------------------------------------------- benchmark: 3 tests ---------------------------------------------------------------------------------------------
Name (time in us)                           Min                 Max                Mean            StdDev              Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_cells_to_string                    48.4710 (1.0)       75.5000 (1.0)       52.4252 (1.0)      1.5461 (1.0)       52.0330 (1.0)      0.4890 (1.0)       307;448       19.0748 (1.0)        4090           1
test_h3_to_string_python_list          290.5460 (5.99)     325.8180 (4.32)     297.5644 (5.68)     4.8769 (3.15)     296.1350 (5.69)     8.2420 (16.85)       806;4        3.3606 (0.18)       2863           1
test_h3_to_string_numpy_vectorized     352.9870 (7.28)     393.8450 (5.22)     360.1159 (6.87)     3.7195 (2.41)     359.4820 (6.91)     3.8420 (7.86)      447;131        2.7769 (0.15)       2334           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean

The benchmark implementation can be found in tests/polars/test_benches.py and uses pytest-benchmark.
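
For orientation, a minimal sketch of the three approaches being compared (not the exact benchmark code; the h3-py v3 function name h3_to_string and the h3ronpy.polars import path are assumptions based on the test location):

import numpy as np
import h3
from h3ronpy.polars import cells_to_string

cells = np.full(1000, 0x801ffffffffffff, dtype=np.uint64)  # 1000 identical, valid cells

strings_list = [h3.h3_to_string(int(c)) for c in cells]  # 1. simplistic list comprehension
strings_np = np.vectorize(h3.h3_to_string)(cells)        # 2. numpy.vectorize variant
strings_fast = cells_to_string(cells)                    # 3. this library, multi-threaded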

Limitations

Not all functionality of the H3 grid is wrapped by this library; the current feature set was implemented when there was a need and the time for it. As an open-source library, new features can be requested in the form of GitHub issues or contributed via pull requests.

License

MIT

h3ronpy's People

Contributors

bielstela, manevillef, nmandery


h3ronpy's Issues

Question: Best way to convert Polars series of Lat/Lng to H3 cells

Hi @nmandery, I have a large polars DataFrame with a struct column of WGS84 latitude and longitude. I want to map the points to their corresponding H3 cells.

Naturally, using the Uber h3 Python library with pl.struct(['lat', 'lng']).apply() is quite slow. I found your library, but it looks like it's only made to work with geo-type data. Is there a workaround other than polars -> geopandas -> polars?
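
A minimal answer sketch, assuming a struct column named "coords" with "lat"/"lng" fields and the (lat, lng, resolution) argument order of coordinates_to_cells:

import polars as pl
from h3ronpy.polars.vector import coordinates_to_cells

df = pl.DataFrame({"coords": [{"lat": 40.7128, "lng": -74.0060}]})
cells = coordinates_to_cells(
    df["coords"].struct.field("lat"),
    df["coords"].struct.field("lng"),
    8,  # target H3 resolution
)
df = df.with_columns(cells.alias("h3index"))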

geodataframe_to_cells issue

@nmandery when executing

geodataframe_to_cells(gdf, H3_RESOLUTION, containment_mode=ContainmentMode.IntersectsBoundary)

shouldn't at least one H3 cell always be returned, regardless of resolution? I'm trying to compute the set of H3 cells that completely covers the geometries in gdf. I have an example running in a notebook where an empty GeoDataFrame is returned if the H3 resolution goes below a certain threshold. This is not what I expected when the IntersectsBoundary containment mode is selected.

ArrowIndexError: Negative buffer slice length

I executed the following command on a dataset I'm working with and saw the following error. Could anyone provide some guidance on what might be the issue here? Any pointers would be greatly appreciated!

ga_cells_df = geodataframe_to_cells(ga_gdf, 10)
---------------------------------------------------------------------------
ArrowIndexError                           Traceback (most recent call last)
File <timed exec>:1

File ~/opt/miniconda3/envs/viz-prototyping/lib/python3.11/site-packages/h3ronpy/pandas/vector.py:124, in geodataframe_to_cells(gdf, resolution, containment_mode, compact, cell_column_name, all_intersecting)
    116 cells = _av.wkb_to_cells(
    117     gdf.geometry.to_wkb(),
    118     resolution,
   (...)
    121     all_intersecting=all_intersecting,
    122 )
    123 table = pa.Table.from_pandas(pd.DataFrame(gdf.drop(columns="geometry"))).append_column(cell_column_name, cells)
--> 124 return _arrow_util.explode_table_include_null(table, cell_column_name).to_pandas().reset_index(drop=True)

File ~/opt/miniconda3/envs/viz-prototyping/lib/python3.11/site-packages/h3ronpy/arrow/util.py:10, in explode_table_include_null(table, column)
      8 other_columns.remove(column)
      9 indices = pc.list_parent_indices(pc.fill_null(table[column], [None]))
---> 10 result = table.select(other_columns).take(indices)
     11 result = result.append_column(
     12     pa.field(column, table.schema.field(column).type.value_type),
     13     pc.list_flatten(pc.fill_null(table[column], [None])),
     14 )
     15 return result

File ~/opt/miniconda3/envs/viz-prototyping/lib/python3.11/site-packages/pyarrow/table.pxi:2005, in pyarrow.lib._Tabular.take()

File ~/opt/miniconda3/envs/viz-prototyping/lib/python3.11/site-packages/pyarrow/compute.py:486, in take(data, indices, boundscheck, memory_pool)
    446 """
    447 Select values (or records) from array- or table-like data given integer
    448 selection indices.
   (...)
    483 ]
    484 """
    485 options = TakeOptions(boundscheck=boundscheck)
--> 486 return call_function('take', [data, indices], options, memory_pool)

File ~/opt/miniconda3/envs/viz-prototyping/lib/python3.11/site-packages/pyarrow/_compute.pyx:590, in pyarrow._compute.call_function()

File ~/opt/miniconda3/envs/viz-prototyping/lib/python3.11/site-packages/pyarrow/_compute.pyx:385, in pyarrow._compute.Function.call()

File ~/opt/miniconda3/envs/viz-prototyping/lib/python3.11/site-packages/pyarrow/error.pxi:154, in pyarrow.lib.pyarrow_internal_check_status()

File ~/opt/miniconda3/envs/viz-prototyping/lib/python3.11/site-packages/pyarrow/error.pxi:91, in pyarrow.lib.check_status()

ArrowIndexError: Negative buffer slice length

h3ronpy: Python 3.7 build issue

Hi. Firstly, this is a great library for working with H3 indexes.
However, I ran into a weird error when trying to pip install h3ronpy on Python 3.7.
The following are the last lines of the error:

Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error

× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [19 lines of output]
Checking for Rust toolchain....
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in
main()
File "/home/airflow/.local/lib/python3.7/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/home/airflow/.local/lib/python3.7/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 149, in prepare_metadata_for_build_wheel
return hook(metadata_directory, config_settings)
File "/tmp/pip-build-env-4_vsukfz/overlay/lib/python3.7/site-packages/maturin/init.py", line 140, in prepare_metadata_for_build_wheel
output = subprocess.check_output(["cargo", "--version"]).decode(
File "/usr/local/lib/python3.7/subprocess.py", line 411, in check_output
**kwargs).stdout
File "/usr/local/lib/python3.7/subprocess.py", line 488, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/local/lib/python3.7/subprocess.py", line 800, in init
restore_signals, start_new_session)
File "/usr/local/lib/python3.7/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'cargo'
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

I found this weird because I could install the package on Python 3.9 without any issues.

Even though the error says Permission denied: 'cargo', I think there is a Python dependency issue.

Thanks!

Illegal instruction (core dumped) on Ubuntu

Hi Nico,

when attempting to use the module, the process crashes with an illegal instruction and a core dump. (Possibly related to #26)

>>> import geopandas as gpd
>>> from h3ronpy.pandas.vector import geodataframe_to_cells
>>> shape = gpd.read_file("admin_bavaria.gpkg")
>>> shape
     GID_1 GID_0  COUNTRY  NAME_1 VARNAME_1 NL_NAME_1     TYPE_1   ENGTYPE_1 CC_1 HASC_1  ISO_1  area_adm1                                           geometry
0  DEU.2_1   DEU  Germany  Bayern   Bavaria        NA  Freistaat  Free State   09  DE.BY  DE-BY  70531.935  MULTIPOLYGON (((8.92500 50.10593, 8.92493 50.1...
>>> df = geodataframe_to_cells(shape,3)
Illegal instruction (core dumped)

I'm using h3ronpy version 0.19.1 and installed it with pip version 22.0.2. The OS is Ubuntu:

Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04

However, when installing the package from source, the error does not occur.

Do you have an idea how to fix this issue when using pip install?

Best,
Johanna

NaNs not indexed even when nodata_value is set to a numerical value

Hi,

I have a raster where a large area is represented by NaN values. However, the NaNs have a meaning (it is not possible to get a result in that area), so I would like to index them.

When using raster_to_dataframe, regardless of the value I give to nodata_value, NaNs are not indexed.

Is there anything I can do to be able to index the NaN areas?

Thanks
Jorge
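
A possible workaround sketch (an assumption, not a documented feature of this library): replace the NaNs with a finite sentinel value before conversion and map it back afterwards. The import path matches recent h3ronpy versions; the "value" column name is assumed.

import numpy as np
from affine import Affine
from h3ronpy.pandas.raster import raster_to_dataframe

SENTINEL = -9999.0
band = np.array([[1.0, np.nan], [2.0, 3.0]], dtype="float64")  # toy raster containing a NaN
transform = Affine(0.5, 0.0, 10.0, 0.0, -0.5, 50.0)            # arbitrary WGS84 placement

filled = np.where(np.isnan(band), SENTINEL, band)
df = raster_to_dataframe(filled, transform, 8)
df.loc[df["value"] == SENTINEL, "value"] = np.nan  # restore NaN after indexing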

raster_to_geodataframe: inconsistent hex resolution when plotting

I am encountering an issue with inconsistent hexagon resolution when converting rasters with this library (H3ron / Uber H3). Hexagons generated within the same image exhibit varying sizes, hindering accurate data analysis and interpretation.

I request your assistance in resolving this matter and achieving a uniform hex resolution throughout the image. Any guidance or support you can provide would be greatly appreciated. I am also willing to share examples or code snippets to aid in troubleshooting.

(Screenshot attached: 2023-05-23.)

Create `change_resolution_list` function

There are multiple functions for changing cell resolutions, but it would be useful to have one implementation which:

  • returns the changed cells in the same order of the input
  • allows handling invalid input by setting the validity of the output
  • outputs a List array to allow the user some flexibility

Conversion from ndarray to h3 DataFrame throws unspecified error in h3ron.h3ron

Hi there!

Using raster.raster_to_dataframe throws an error (RuntimeError: operation failed) in h3ronpy's raster.py at

values, indexes = func(in_raster, _get_transform(transform), h3_resolution, axis_order, compacted, nodata_value)

The above line calls the raster_to_h3 function in raster.rs.

The raster_to_h3 function instantiates the H3Converter class from h3ron_ndarray, and there the to_h3 function is called (link to GitHub source).

The error thrown is defined in h3ron.h3ron.src.error.rs, and the to_h3 function creates a HashMap in `let mut h3_map = HashMap::default();`, imported from h3ron.h3ron.src.collections.mod.rs (see this line).

But I don't know how the error is thrown or where I went off path. I am new to Rust and would be glad for pointers on how to resolve this :)

It does not happen with every GeoTIFF, but it crashes on Sentinel satellite data. I set up a Jupyter notebook for others to reproduce my error: https://github.com/vanyabrucker/h3ronpy-issue-21

Thanks!

OSError: exception: access violation reading 0x00000000000000A0

I get this error while testing the uploaded example. It appears to happen in this line:
vegetation_h3_df.plot(column="value", linewidth=0.2, edgecolor="black", **vegetation_plot_args)

CALLBACK error:


OSError Traceback (most recent call last)
in
6
7 print("plotting ... this may take a bit")
----> 8 vegetation_h3_df.plot(column="value", linewidth=0.2, edgecolor="black", **vegetation_plot_args)
9 pyplot.show()

~\AppData\Roaming\Python\Python38\site-packages\geopandas\plotting.py in call(self, *args, **kwargs)
948 kind = kwargs.pop("kind", "geo")
949 if kind == "geo":
--> 950 return plot_dataframe(data, *args, **kwargs)
951 if kind in self._pandas_kinds:
952 # Access pandas plots

~\AppData\Roaming\Python\Python38\site-packages\geopandas\plotting.py in plot_dataframe(df, column, cmap, color, ax, cax, categorical, legend, scheme, k, vmin, vmax, markersize, figsize, legend_kwds, categories, classification_kwds, missing_kwds, aspect, **style_kwds)
663 if aspect == "auto":
664 if df.crs and df.crs.is_geographic:
--> 665 bounds = df.total_bounds
666 y_coord = np.mean([bounds[1], bounds[3]])
667 ax.set_aspect(1 / np.cos(y_coord * np.pi / 180))

~\AppData\Roaming\Python\Python38\site-packages\geopandas\base.py in total_bounds(self)
2582 array([ 0., -1., 3., 2.])
2583 """
-> 2584 return GeometryArray(self.geometry.values).total_bounds
2585
2586 @Property

~\AppData\Roaming\Python\Python38\site-packages\geopandas\array.py in total_bounds(self)
913 # TODO with numpy >= 1.15, the 'initial' argument can be used
914 return np.array([np.nan, np.nan, np.nan, np.nan])
--> 915 b = self.bounds
916 return np.array(
917 (

~\AppData\Roaming\Python\Python38\site-packages\geopandas\array.py in bounds(self)
905 @Property
906 def bounds(self):
--> 907 return vectorized.bounds(self.data)
908
909 @Property

~\AppData\Roaming\Python\Python38\site-packages\geopandas_vectorized.py in bounds(data)
935 # as those return an empty tuple, not resulting in a 2D array
936 bounds = np.array(
--> 937 [
938 geom.bounds
939 if not (geom is None or geom.is_empty)

~\AppData\Roaming\Python\Python38\site-packages\geopandas_vectorized.py in (.0)
936 bounds = np.array(
937 [
--> 938 geom.bounds
939 if not (geom is None or geom.is_empty)
940 else (np.nan, np.nan, np.nan, np.nan)

~\AppData\Roaming\Python\Python38\site-packages\shapely\geometry\base.py in bounds(self)
473 return ()
474 else:
--> 475 return self.impl['bounds'](self)
476
477 @Property

~\AppData\Roaming\Python\Python38\site-packages\shapely\coords.py in call(self, this)
185 def call(self, this):
186 self._validate(this)
--> 187 env = this.envelope
188 if env.geom_type == 'Point':
189 return env.bounds

~\AppData\Roaming\Python\Python38\site-packages\shapely\geometry\base.py in envelope(self)
498 def envelope(self):
499 """A figure that envelopes the geometry"""
--> 500 return geom_factory(self.impl['envelope'](self))
501
502 @Property

~\AppData\Roaming\Python\Python38\site-packages\shapely\topology.py in call(self, this, *args)
78 def call(self, this, *args):
79 self._validate(this)
---> 80 return self.fn(this._geom, *args)

OSError: exception: access violation reading 0x00000000000000A0

Any ideas for this?? Thank you!!

Points aren't properly parsed into H3 cells

I was testing the library with different geometries, and I found out that points aren't properly matched to their H3 cells. I could filter out the points and parse them using the official H3 bindings, but maybe there is an option to parse them properly in this library so everything stays in one package. I haven't tested linestrings / multilinestrings and geometry collections yet.

from h3ronpy.pandas.vector import geometry_to_cells
from shapely.geometry import Point
import h3

# Manhattan Central Park
point = Point(-73.9575, 40.7938)

h3.int_to_str(geometry_to_cells(point, 8)[0])
# 8875588a83fffff - random cell near Null Island (0, 0)

h3.latlng_to_cell(point.y, point.x, 8)
# 882a1008d7fffff - proper cell

Import failure on M1 MacBook Pro

Importing h3ronpy in a Jupyter notebook (version 7) is causing a kernel failure on my M1 MacBook Pro. I'm not totally surprised now that I'm no longer running my Python stack in emulation mode. I dropped down to a terminal and did the import at an IPython prompt and got the following message:

zsh: illegal hardware instruction  ipython

The import is working without issue on my Intel-based iMac so I'll proceed there, but wanted to share this.

integration with the `polars` api

Good morning. First and foremost, congrats on this library, it is a joy to use! I've been playing around with the polars functions and wondered whether those could somehow be used with the expressions API in polars.

To group by parent cell one can do (maybe not the best approach at all):

(
    df.with_columns(
        pl.col("h3index").map_batches(lambda x: change_resolution(x, h3res))
    )
    .group_by("h3index")
    .agg(pl.col("value").sum())
)

But if we could somehow make change_resolution part of the expressions API, this could be done like:

(
    df.with_columns(
        pl.col("h3index").h3.change_resolution(h3res)
    )
    .group_by("h3index")
    .agg(pl.col("value").sum())
)

My first question is whether my assumption is correct that the polars functions in this library must be treated as user-defined functions in order to integrate them into polars, or whether there is a better way to use h3ronpy's functions in polars that I'm not seeing.

Hmm, maybe this is more of a polars question than an h3ronpy one, but anyway, here it is!
Thanks!
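
One possible answer sketch: from polars' point of view these are indeed user-defined functions, but they can be wrapped in a custom expression namespace so the second spelling works. This uses polars' register_expr_namespace API; the namespace and column names are illustrative:

import polars as pl
from h3ronpy.polars import change_resolution

@pl.api.register_expr_namespace("h3")
class H3Expr:
    def __init__(self, expr: pl.Expr):
        self._expr = expr

    def change_resolution(self, resolution: int) -> pl.Expr:
        # still map_batches under the hood, just nicer to call
        return self._expr.map_batches(lambda s: change_resolution(s, resolution))

# df.with_columns(pl.col("h3index").h3.change_resolution(h3res))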

Bounds check fail for global raster

Hi! Sometimes I run into the same issue with global rasters that trigger the Input array spans more than the bounds of WGS84 - input needs to be in WGS84 projection with lat/lon coordinates exception. In this example I'm using a global population raster from SEDAC that has this shape and transform:

>>> src.shape
(4320, 8640)
>>> src.transform
Affine(0.0416666666666667, 0.0, -180.0,
       0.0, -0.0416666666666667, 89.99999999999994)

Looking at the source of check_wgs84_bounds I can't see any issue with the checks, but when I do the computation manually the floating-point curse strikes:

>>> transform.a * shape[1]
360.0000000000003
>>> transform.a * shape[0]
180.00000000000014

Both values will fail the check in check_wgs84_bounds. I did not debug the Rust code, but I bet something like this is happening under the hood. Do you know of any workaround on the user inputs to avoid this issue (like casting the transform to ints or something)? What I normally do is clip a few border pixels, but that is not ideal.

Thank you!!
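
A speculative workaround sketch on the user side (an assumption, not a fix in h3ronpy itself): clamp the pixel sizes so that width * |a| and height * |e| cannot overshoot the WGS84 extent before passing the transform on. This only mitigates the accumulated rounding shown above.

from affine import Affine

def snap_to_wgs84(t: Affine, width: int, height: int) -> Affine:
    a = min(abs(t.a), 360.0 / width)    # pixel width in degrees
    e = min(abs(t.e), 180.0 / height)   # pixel height in degrees
    return Affine(a if t.a >= 0 else -a, t.b, t.c,
                  t.d, e if t.e >= 0 else -e, t.f)

# snapped = snap_to_wgs84(src.transform, src.width, src.height)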

cells_parse fails due to unexpected array type when using polars

>>> import polars as pl
>>> from h3ronpy.polars import cells_parse
>>> cells_parse(pl.Series(["801ffffffffffff"]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nico/.cache/pypoetry/virtualenvs/basp-ingest-bc7lia5F-py3.11/lib/python3.10/site-packages/h3ronpy/polars/__init__.py", line 20, in wrapper
    result = func(*args, **kw)
  File "/home/nico/.cache/pypoetry/virtualenvs/basp-ingest-bc7lia5F-py3.11/lib/python3.10/site-packages/h3ronpy/arrow/__init__.py", line 66, in cells_parse
    return op.cells_parse(_to_arrow_array(arr, pa.utf8()), set_failing_to_invalid=set_failing_to_invalid)
ValueError: Expected arrow2::array::utf8::Utf8Array<i32>, found arrow array of type LargeUtf8
>>> cells_parse(pl.Series(["801ffffffffffff"]).to_arrow())
shape: (1,)
Series: '' [u64]
[
        577023702256844799
]

Rebuild on top of h3o and arrow2

Use arrow2 to exchange arrays with Python. Migrate to h3o. Make use of h3arrow and rasterh3.

By building on top of Arrow, we can also provide direct support for polars without requiring pandas for most functionality.

Reverse (H3 set → raster)

Could this library do the reverse operation of the raster conversion? I.e., given a set of H3 indices (at the same, or perhaps at mixed, resolutions) and some property (e.g. elevation), produce a raster output.

Naïvely, this could be done through conversion of the H3 set to a consistent resolution, conversion to geo boundaries, and then rasterisation. However, I'm particularly thinking about a hypothetically efficient approach that avoids expanding the compacted (mixed-resolution) set of H3 indices to a common resolution before rasterisation (if this is actually possible).

I think one way to approach this would be to find the set of pixels covering the region (extent of the input H3 set), compute each pixel's H3 index at the highest appropriate resolution (the highest resolution cell in the input set), and then for each H3 cell in the input set, find the pixels that intersect using the H3 API. (There'd be undefined behaviour if the input H3 set included overlaps.) This avoids having to uncompact the input set, but whether or not that's actually less efficient than computing the H3 index for each pixel is unclear to me.

I'm not even convinced that this idea makes sense, given H3's hierarchical non-containment (children at >2 resolutions higher than a parent may be entirely outside of the (grand)parent's boundary). But since the conversion raster → H3 set is possible and sensible, I assume the reverse is, too.
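
For reference, a sketch of the naive route only (uncompact, convert to boundaries, rasterise), using h3-py v4 and rasterio; none of this is an h3ronpy API, and the hypothetical helper below ignores the efficiency question raised above:

import h3
import rasterio.features
from rasterio.transform import from_bounds
from shapely.geometry import Polygon

def cells_to_raster(cell_values, resolution, width, height, bounds):
    # cell_values: dict mapping H3 cell (str) -> value; bounds: (west, south, east, north)
    shapes = []
    for cell, value in cell_values.items():
        children = [cell] if h3.get_resolution(cell) >= resolution else h3.cell_to_children(cell, resolution)
        for child in children:
            ring = [(lng, lat) for lat, lng in h3.cell_to_boundary(child)]
            shapes.append((Polygon(ring), value))
    transform = from_bounds(*bounds, width, height)
    return rasterio.features.rasterize(shapes, out_shape=(height, width), transform=transform)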

Question: Lazy coordinates_to_cells

I am a new polars user and I am curious how to use the coordinates_to_cells function in a lazy context.

If I do what I think needs to be done, I get the error TypeError: 'Expr' object is not iterable. I can achieve my goal the eager way, but I'm hoping I can do this with the lazy API.

import polars as pl
from h3ronpy.polars.vector import coordinates_to_cells

# Sample Polars DataFrame with latitude and longitude
data = {
    "x": [-74.0060, -118.2437, -87.6298],  # 'x' for longitude
    "y": [40.7128, 34.0522, 41.8781],  # 'y' for latitude
}

res = 8
df = (
    pl.DataFrame(data)
    .lazy()
    .with_columns(
        coordinates_to_cells(pl.col("x"), pl.col("y"), resarray=res)
        .h3.cells_to_string()
        .alias(f"h3_{res}")
    )
)
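
A sketch of one way to defer the call (an assumption, not a documented recipe): wrap coordinates_to_cells in map_batches over a struct of the two columns, so the conversion only runs when the LazyFrame is collected:

import polars as pl
from h3ronpy.polars.vector import coordinates_to_cells

res = 8
lf = (
    pl.LazyFrame({
        "x": [-74.0060, -118.2437, -87.6298],  # longitude
        "y": [40.7128, 34.0522, 41.8781],      # latitude
    })
    .with_columns(
        pl.struct(["x", "y"])
        .map_batches(
            lambda s: coordinates_to_cells(
                s.struct.field("y"),  # latitude
                s.struct.field("x"),  # longitude
                res,
            )
        )
        .alias(f"h3_{res}")
    )
)
print(lf.collect())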
