Experimental C and C++ implementation of the GeoArrow specification

Home Page: http://geoarrow.org/geoarrow-c/

License: Apache License 2.0

CMake 0.97% C 63.79% C++ 32.01% Python 1.17% Cython 1.84% Shell 0.23%

geoarrow-c's Introduction

geoarrow-c

The geoarrow C library is a geospatial type system and generic coordinate-shuffling library written in C with bindings in C++, R, and Python. The library supports well-known binary (WKB), well-known text (WKT, ISO), and geoarrow encodings as Arrow extension types, with all possible mutual conversions, including support for Z, M, and ZM geometries.

The library currently implements version 0.1.0 of the GeoArrow specification. The easiest way to get started with GeoArrow is to use the Python bindings, which currently use geoarrow-c under the hood for most operations.

geoarrow-c's People

Contributors

anthonynorth, jorisvandenbossche, paleolimbot

geoarrow-c's Issues

[python] README example doesn't work

From the project README:

import geoarrow.pyarrow as ga

ga.point()
# PointType(geoarrow.point)
ga.point().storage_type
# StructType(struct<x: double, y: double>)
ga.as_geoarrow(["POINT (0 1)"])
# PointArray:PointType(geoarrow.point)[1]
# <POINT (0 1)>

After installing with pip (pip install geoarrow-c), the example from the project README fails with an import error:

Traceback (most recent call last):
  File "/some/path/myscript.py", line 1, in <module>
    import geoarrow.pyarrow as ga
ModuleNotFoundError: No module named 'geoarrow.pyarrow'

Importing geoarrow.c instead yields an attribute error:

Traceback (most recent call last):
  File "/some/path/myscript.py", line 3, in <module>
    ga.point()
AttributeError: module 'geoarrow.c' has no attribute 'point'

Expose array offsets and lengths in `geobuffers()` output

Currently, calling geobuffers() on a sliced array returns buffers that don't necessarily line up with the logical content of the array (i.e., non-zero offsets are not supported). This is consistent with pyarrow.Array.buffers but is confusing, since there is no way to check the "all offsets are 0" assumption or to get the information required to apply the offsets. Exposing the offsets and lengths should be done by another method (maybe geoslices()?).
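
To illustrate the mismatch, here is a minimal pure-Python sketch. It is not geoarrow-c or pyarrow code: the SlicedArray class and its buffers() and to_list() methods are hypothetical stand-ins. Slicing shares the underlying buffer and only records an offset and a length, so a caller who sees only the raw buffer cannot recover the logical content.

```python
from typing import List


class SlicedArray:
    """Hypothetical stand-in for an Arrow-style array: shared buffer + offset + length."""

    def __init__(self, buffer: List[float], offset: int = 0, length: int = -1):
        self.buffer = buffer
        self.offset = offset
        # A negative length means "everything from offset to the end".
        self.length = len(buffer) - offset if length < 0 else length

    def slice(self, start: int, length: int) -> "SlicedArray":
        # Zero-copy: the buffer is shared, only offset and length change.
        return SlicedArray(self.buffer, self.offset + start, length)

    def buffers(self) -> List[float]:
        # Mirrors the behavior described above: the raw buffer ignores the offset.
        return self.buffer

    def to_list(self) -> List[float]:
        # Applying offset and length recovers the logical content.
        return self.buffer[self.offset : self.offset + self.length]


xs = SlicedArray([0.0, 1.0, 2.0, 3.0])
sliced = xs.slice(1, 2)
print(sliced.buffers())  # the full, unsliced buffer
print(sliced.to_list())  # only the logical elements of the slice
```

In this sketch, the offset and length attributes play the role of the extra information that a method like the proposed geoslices() would need to expose.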

[python] Prevent low-level classes from being instantiated (segfaults)

A user should probably never do this, but we can make __init__ non-callable to prevent the crash below:

In [22]: from geoarrow.c._lib import CVectorType

In [23]: CVectorType()
Out[23]: terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_S_construct null not valid
Aborted (core dumped)
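
One way to guard against this can be sketched in plain Python. The class below is a hypothetical stand-in (CVectorTypeSketch is not the real geoarrow.c._lib.CVectorType, which is a compiled extension type, so the actual fix may look different). The idea is that direct construction raises a TypeError, while a factory method bypasses __init__ and fills in the instance itself.

```python
class CVectorTypeSketch:
    """Hypothetical pure-Python stand-in for a low-level wrapper class."""

    def __init__(self, *args, **kwargs):
        # Direct construction would leave the wrapped object uninitialized,
        # so fail loudly here instead of crashing later.
        raise TypeError(
            "CVectorTypeSketch cannot be instantiated directly; use Make()"
        )

    @classmethod
    def Make(cls, geometry_type, dimensions):
        # Bypass __init__ and populate the instance ourselves.
        obj = cls.__new__(cls)
        obj.geometry_type = geometry_type
        obj.dimensions = dimensions
        return obj


point_type = CVectorTypeSketch.Make("POINT", "XY")

try:
    CVectorTypeSketch()
    direct_construction_failed = False
except TypeError:
    direct_construction_failed = True
```

With this pattern, a stray CVectorTypeSketch() call produces an ordinary Python exception rather than terminating the interpreter.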

Looking forward to conda or pip installation

Nice work! It looks like you've written WKT and WKB serializers into geoarrow, which is a much-needed feature over at cuSpatial. If there were a way to install this and the Python bindings via a dependency manager (pip, preferably, or conda), making this a dependency of cuSpatial would be at the top of my list. I'll be watching closely and waiting!

`GeoArrowBuilder` should be able to handle WKT and WKB arrays

Currently, an attempt to GeoArrowBuilderInitXXX() from a WKB or WKT array fails. This leads to awkward code for clients that want to support generic conversion using the visitor pattern (I tried this and immediately forgot that it wasn't implemented!).

Add support for 64-bit offsets

Currently, all offset types are int32_t, so iterating over LARGE_WKT and LARGE_WKB is not possible. We need to:

  • Ensure that the offsets member of the GeoArrowArrayView can handle int64_t.
  • Implement a visitor + writer for LARGE_WKB and LARGE_WKT. Nothing about this is inherently difficult, except that doing it in C is awkward because there are no templates.

While we're changing the GeoArrowArrayView + places that access it, it's worth making a few other changes to ease the cost of updating when adding a few other features:

  • Ensure that serialized types can be represented by a GeoArrowArrayView. Right now there's no place to put the data member, and geobuffers() does not work for serialized types.
  • Ensure that the coordinate buffer value of double can expand to fit more types, probably by making this type a union. This will ensure that a future version of geoarrow-c that supports float coords won't be source-breaking.
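
As a rough illustration of why the offset width matters for serialized arrays: offsets are cumulative byte positions, so the last offset equals the total size of the data buffer and must fit in the offset type. The sketch below is plain Python with a hypothetical helper name (build_offsets); it is not part of geoarrow-c.

```python
from itertools import accumulate
from typing import Iterable, List, Tuple

INT32_MAX = 2**31 - 1


def build_offsets(item_sizes: Iterable[int]) -> Tuple[str, List[int]]:
    """Return (required_offset_width, offsets) for items of the given byte sizes."""
    # Offsets start at 0 and accumulate the size of each item.
    offsets = [0, *accumulate(item_sizes)]
    # The last offset is the total byte count; it decides the required width.
    width = "int32" if offsets[-1] <= INT32_MAX else "int64"
    return width, offsets


small = build_offsets([5, 7])
large_width, _ = build_offsets([INT32_MAX, 1])
```

Here small needs only 32-bit offsets, while the second call crosses the int32_t limit by a single byte and therefore requires the 64-bit (LARGE_WKB/LARGE_WKT) layout.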

Implement amalgamation for single-file geoarrow.c/geoarrow.h distribution

nanoarrow goes to considerable effort to make it very, very easy to copy/paste files into another project and use them as-is. There is nothing inherent about geoarrow-c that would prevent this and I think it would be good to do this and test this early to ensure there is a clear path to "using geoarrow-c".

In this scenario it may be worth separating the kernels as optional components: not everybody needs kernels (notably, extensions that vendor geoarrow-c and implement kernels themselves).
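
For readers unfamiliar with the term, amalgamation generally means inlining a project's local includes into one translation unit. The following is a minimal sketch of that idea in Python, not the script nanoarrow uses. The amalgamate() function and the file names in the example are hypothetical, and a real build would also have to deal with include guards, licenses, and namespacing.

```python
import re
from typing import Dict, List, Set

# Matches local includes such as: #include "util.h"
LOCAL_INCLUDE = re.compile(r'^\s*#include\s+"([^"]+)"\s*$')


def amalgamate(entry: str, sources: Dict[str, str]) -> str:
    """Inline local #include "..." lines recursively, each file at most once."""
    seen: Set[str] = set()
    out: List[str] = []

    def emit(name: str) -> None:
        if name in seen:
            return  # already inlined; behaves like an include guard
        seen.add(name)
        for line in sources[name].splitlines():
            match = LOCAL_INCLUDE.match(line)
            if match and match.group(1) in sources:
                emit(match.group(1))
            else:
                # System includes and ordinary code pass through unchanged.
                out.append(line)

    emit(entry)
    return "\n".join(out) + "\n"


# Hypothetical file contents, used only to exercise the sketch.
example_sources = {
    "geoarrow.h": "#define GEOARROW_OK 0",
    "util.h": '#include "geoarrow.h"\nint util(void);',
    "geoarrow.c": (
        '#include "geoarrow.h"\n'
        '#include "util.h"\n'
        "#include <stdint.h>\n"
        "int util(void) { return GEOARROW_OK; }"
    ),
}
single_file = amalgamate("geoarrow.c", example_sources)
```

Keeping kernels out of the dictionary passed to such a script is one simple way the "optional components" idea could be expressed.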

[R]: Export wk reader & writer S3 methods?

Could we export wk_handle.geoarrow_array and wk_writer.geoarrow_array S3 methods? This will allow for integration with {wk} readers, writers and filters.

Geoarrow arrays don't currently have their own class. I'm assuming this is planned?

I think this is all we need (once the geoarrow_array class exists)

wk_handle.geoarrow_array <- function(handleable, handler, size = NA_integer_, ...) {
  handler <- wk::as_wk_handler(handler)
  geoarrow::geoarrow_handle(handleable, handler, size)
}

wk_writer.geoarrow_array <- function(handleable, schema = NULL) {
  # what is an appropriate default?
  if (is.null(schema)) {
    schema <- geoarrow::infer_geoarrow_schema(handleable)
  }
  geoarrow::geoarrow_writer(schema)
}

wk::wk_count(my_geoarrow_array)

[R] Add GeoParquet reader implementation

Like the previous geoarrow, we should provide read_geoparquet() (which reads to data.frame with geometry as geoarrow_vctr) and read_geoparquet_sf() (which reads directly to sf). I believe the current geoarrow is set up better to handle this (or will be following #83), and there is a Python implementation to draw on which has pretty good coverage of the possible GeoParquet permutations: https://github.com/geoarrow/geoarrow-python/blob/main/geoarrow-pyarrow/src/geoarrow/pyarrow/io.py#L84-L189

Failing tests in python bindings

I'm on a Mac M2 if that's related to anything. I have apache-arrow C++ installed via homebrew if that's relevant/used at all.

cd ~/tmp
git clone https://github.com/geoarrow/geoarrow-c
cd geoarrow-c
git checkout 68d67e5
cd python
virtualenv env
source ./env/bin/activate
for d in geoarrow-c geoarrow-pyarrow geoarrow-pandas; do
    cd $d && pip install ".[test]" && cd ..
done
> cd geoarrow-c && pytest
================================================================ test session starts ================================================================
platform darwin -- Python 3.11.1, pytest-7.4.2, pluggy-1.3.0
rootdir: /Users/kyle/tmp/geoarrow-c/python/geoarrow-c
plugins: anyio-3.6.2
collected 9 items

tests/test_geoarrow_lib.py ..F.....                                                                                                           [ 88%]
tests/test_import.py .                                                                                                                        [100%]

===================================================================== FAILURES ======================================================================
________________________________________________________________ test_c_vector_type _________________________________________________________________

    def test_c_vector_type():
        type_obj = lib.CVectorType.Make(
            ga.GeometryType.POINT, ga.Dimensions.XY, ga.CoordType.SEPARATE
        )

        assert type_obj.geometry_type == ga.GeometryType.POINT
        assert type_obj.dimensions == ga.Dimensions.XY
        assert type_obj.coord_type == ga.CoordType.SEPARATE

        schema = type_obj.to_schema()
        type_obj2 = lib.CVectorType.FromExtension(schema)
        assert type_obj2 == type_obj

        pa_type = pa.DataType._import_from_c(schema._addr())
        pa_type_expected = pa.struct(
            [pa.field("x", pa.float64()), pa.field("y", pa.float64())]
        )

        # Depending on how the tests are run, the extension type might be
        # registered here.
        if isinstance(pa_type, pa.ExtensionType):
>           assert pa_type.storage_type == pa_type_expected
E           assert FixedSizeListType(fixed_size_list<xy: double>[2]) == StructType(struct<x: double, y: double>)
E            +  where FixedSizeListType(fixed_size_list<xy: double>[2]) = PointType(FixedSizeListType(fixed_size_list<xy: double>[2])).storage_type

tests/test_geoarrow_lib.py:52: AssertionError
============================================================== short test summary info ==============================================================
FAILED tests/test_geoarrow_lib.py::test_c_vector_type - assert FixedSizeListType(fixed_size_list<xy: double>[2]) == StructType(struct<x: double, y: double>)
============================================================ 1 failed, 8 passed in 0.41s ============================================================
> cd geoarrow-pyarrow && pytest
================================================================ test session starts ================================================================
platform darwin -- Python 3.11.1, pytest-7.4.2, pluggy-1.3.0
rootdir: /Users/kyle/tmp/geoarrow-c/python/geoarrow-pyarrow
plugins: anyio-3.6.2
collected 0 items / 3 errors

====================================================================== ERRORS =======================================================================
______________________________________________________ ERROR collecting tests/test_compute.py _______________________________________________________
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/runner.py:341: in from_call
    result: Optional[TResult] = func()
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/runner.py:372: in <lambda>
    call = CallInfo.from_call(lambda: list(collector.collect()), "collect")
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/python.py:531: in collect
    self._inject_setup_module_fixture()
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/python.py:545: in _inject_setup_module_fixture
    self.obj, ("setUpModule", "setup_module")
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/python.py:310: in obj
    self._obj = obj = self._getobj()
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/python.py:528: in _getobj
    return self._importtestmodule()
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/python.py:617: in _importtestmodule
    mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/pathlib.py:567: in import_path
    importlib.import_module(module_name)
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1206: in _gcd_import
    ???
<frozen importlib._bootstrap>:1178: in _find_and_load
    ???
<frozen importlib._bootstrap>:1149: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:690: in _load_unlocked
    ???
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
    exec(co, module.__dict__)
tests/test_compute.py:7: in <module>
    import geoarrow.pyarrow as ga
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/geoarrow/pyarrow/__init__.py:94: in <module>
    register_extension_types()
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/geoarrow/pyarrow/_type.py:694: in register_extension_types
    raise RuntimeError("Failed to register one or more extension types")
E   RuntimeError: Failed to register one or more extension types
______________________________________________________ ERROR collecting tests/test_dataset.py _______________________________________________________
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/runner.py:341: in from_call
    result: Optional[TResult] = func()
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/runner.py:372: in <lambda>
    call = CallInfo.from_call(lambda: list(collector.collect()), "collect")
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/python.py:531: in collect
    self._inject_setup_module_fixture()
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/python.py:545: in _inject_setup_module_fixture
    self.obj, ("setUpModule", "setup_module")
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/python.py:310: in obj
    self._obj = obj = self._getobj()
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/python.py:528: in _getobj
    return self._importtestmodule()
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/python.py:617: in _importtestmodule
    mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/pathlib.py:567: in import_path
    importlib.import_module(module_name)
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1206: in _gcd_import
    ???
<frozen importlib._bootstrap>:1178: in _find_and_load
    ???
<frozen importlib._bootstrap>:1149: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:690: in _load_unlocked
    ???
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
    exec(co, module.__dict__)
tests/test_dataset.py:8: in <module>
    import geoarrow.pyarrow as ga
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/geoarrow/pyarrow/__init__.py:94: in <module>
    register_extension_types()
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/geoarrow/pyarrow/_type.py:694: in register_extension_types
    raise RuntimeError("Failed to register one or more extension types")
E   RuntimeError: Failed to register one or more extension types
______________________________________________________ ERROR collecting tests/test_pyarrow.py _______________________________________________________
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/runner.py:341: in from_call
    result: Optional[TResult] = func()
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/runner.py:372: in <lambda>
    call = CallInfo.from_call(lambda: list(collector.collect()), "collect")
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/python.py:531: in collect
    self._inject_setup_module_fixture()
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/python.py:545: in _inject_setup_module_fixture
    self.obj, ("setUpModule", "setup_module")
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/python.py:310: in obj
    self._obj = obj = self._getobj()
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/python.py:528: in _getobj
    return self._importtestmodule()
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/python.py:617: in _importtestmodule
    mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/pathlib.py:567: in import_path
    importlib.import_module(module_name)
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1206: in _gcd_import
    ???
<frozen importlib._bootstrap>:1178: in _find_and_load
    ???
<frozen importlib._bootstrap>:1149: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:690: in _load_unlocked
    ???
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
    exec(co, module.__dict__)
tests/test_pyarrow.py:9: in <module>
    import geoarrow.pyarrow as ga
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/geoarrow/pyarrow/__init__.py:94: in <module>
    register_extension_types()
/Users/kyle/.pyenv/versions/3.11.1/lib/python3.11/site-packages/geoarrow/pyarrow/_type.py:694: in register_extension_types
    raise RuntimeError("Failed to register one or more extension types")
E   RuntimeError: Failed to register one or more extension types
============================================================== short test summary info ==============================================================
ERROR tests/test_compute.py - RuntimeError: Failed to register one or more extension types
ERROR tests/test_dataset.py - RuntimeError: Failed to register one or more extension types
ERROR tests/test_pyarrow.py - RuntimeError: Failed to register one or more extension types
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 3 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================================= 3 errors in 0.65s =================================================================

Implement `const char* GeoArrowInferExtensionName(const struct ArrowSchema*)`

There's a good deal of code in Python that would have to be duplicated in R to support inferring the extension name from a storage array. The inference is basically:

  • Binary/LargeBinary -> geoarrow.wkb
  • String/LargeString -> geoarrow.wkt
  • Struct/FixedSizeList -> geoarrow.point
  • List<Struct/FixedSizeList> -> geoarrow.multipoint
  • List<List<Struct/FixedSizeList>> -> geoarrow.multilinestring
  • List<List<List<Struct/FixedSizeList>>> -> geoarrow.multipolygon

I don't think it needs to perform all-out validation, just enough to infer what the extension name would be if it were a valid extension storage type. Then it can be passed to ArrowSchemaInitFromStorage() to do the actual validation.
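
The inference rules listed above can be sketched in a few lines of Python. This is only an illustration of the mapping, not the proposed C function: storage types are described here with hypothetical nested tuples such as ("list", ("struct",)), whereas real code would inspect an ArrowSchema. Like the proposal, it does no validation beyond what is needed to pick a name.

```python
from typing import Optional

# Serialized storage types map directly to an extension name.
SERIALIZED = {
    "binary": "geoarrow.wkb",
    "large_binary": "geoarrow.wkb",
    "string": "geoarrow.wkt",
    "large_string": "geoarrow.wkt",
}

# Native types are chosen by how many list levels wrap the coordinate type.
BY_LIST_DEPTH = [
    "geoarrow.point",
    "geoarrow.multipoint",
    "geoarrow.multilinestring",
    "geoarrow.multipolygon",
]


def infer_extension_name(storage: tuple) -> Optional[str]:
    """Return the inferred extension name, or None if nothing matches."""
    if storage[0] in SERIALIZED:
        return SERIALIZED[storage[0]]

    # Peel off list levels, counting them as we go.
    depth = 0
    while storage[0] == "list":
        if len(storage) != 2:
            return None  # a list must describe exactly one child
        storage = storage[1]
        depth += 1

    if storage[0] in ("struct", "fixed_size_list") and depth < len(BY_LIST_DEPTH):
        return BY_LIST_DEPTH[depth]
    return None


name = infer_extension_name(("list", ("list", ("fixed_size_list",))))
```

Returning None for anything unrecognized leaves the actual validation to a later step, in line with the suggestion above.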

Modularize kernel implementations

Currently all kernels are in the same file in one totally massive mess of function pointers. This is a bad example of how to implement a kernel, which should mostly be a one-kernel-per-file situation. It also makes it more difficult to add more kernels or add features/options to existing kernels.

[R] processing nanoarrow streams with geoarrow column

I have the ability to return a nanoarrow stream with a geoarrow geometry array. I'd like to be able to take the stream and turn it into a tabular data structure (data.frame-esque of any variety). Is it possible with geoarrow as it is today, to take this and turn it into a data.frame with a geoarrow array column?

Below is a small reprex using an R package I'm trying to develop using arrow-rs and geoarrow-rs:
https://github.com/JosiahParry/serde_esri

library(serdesri)

url <- "https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/ACS_Population_by_Race_and_Hispanic_Origin_Boundaries/FeatureServer/2/query?where=1=1&outFields=objectid&resultRecordCount=10&f=json"

req <- httr2::request(url)
resp <- httr2::req_perform(req)
json <- httr2::resp_body_string(resp)

stream <- parse_esri_json_str(json, 2)
stream
#> <nanoarrow_array_stream struct<OBJECTID: int64, geometry: geoarrow.polygon{large_list<rings: large_list<vertices: fixed_size_list(2)<xy: double>>>}>>
#>  $ get_schema:function ()  
#>  $ get_next  :function (schema = x$get_schema(), validate = TRUE)  
#>  $ release   :function ()

df <- as.data.frame(stream)
#> Warning in warn_unregistered_extension_type(x): geometry: Converting unknown
#> extension geoarrow.polygon{large_list<rings: large_list<vertices:
#> fixed_size_list(2)<xy: double>>>} as storage type
#> Warning in warn_unregistered_extension_type(storage): geometry: Converting
#> unknown extension geoarrow.polygon{large_list<rings: large_list<vertices:
#> fixed_size_list(2)<xy: double>>>} as storage type

str(df, 1)
#> 'data.frame':    10 obs. of  2 variables:
#>  $ OBJECTID: num  1 2 3 4 5 6 7 8 9 10
#>  $ geometry: list<list<list<dbl>>> [1:10]
