
geoarrow-python's Introduction

GeoArrow for Python

The GeoArrow Python packages provide an implementation of the GeoArrow specification that integrates with pyarrow and pandas. The bindings enable reading and writing Arrow-friendly formats (e.g., Parquet, Arrow Stream, Arrow File) and provide general-purpose tools for converting coordinates among the GeoArrow, WKT, and WKB encodings.

Installation

Python bindings for GeoArrow are available on PyPI. You can install them with:

pip install geoarrow-pyarrow geoarrow-pandas

You can install the latest development versions with:

pip install "git+https://github.com/geoarrow/geoarrow-python.git#egg=geoarrow-pyarrow&subdirectory=geoarrow-pyarrow"
pip install "git+https://github.com/geoarrow/geoarrow-python.git#egg=geoarrow-pandas&subdirectory=geoarrow-pandas"

If you can import the namespaces, you're good to go!

import geoarrow.pyarrow as ga
import geoarrow.pandas as _

Examples

You can create geoarrow-encoded pyarrow.Arrays with as_geoarrow():

ga.as_geoarrow(["POINT (0 1)"])
PointArray:PointType(geoarrow.point)[1]
<POINT (0 1)>

This will work with:

  • An existing array created by geoarrow
  • A geopandas.GeoSeries
  • A pyarrow.Array or pyarrow.ChunkedArray (geoarrow text interpreted as well-known text; binary interpreted as well-known binary)
  • Anything that pyarrow.array() will convert to a text or binary array

If there is no common geometry type among elements of the input, as_geoarrow() will fall back to well-known binary encoding. To explicitly convert to well-known text or binary, use as_wkt() or as_wkb().

Alternatively, you can construct GeoArrow arrays directly from a series of buffers as described in the specification:

import numpy as np

ga.point().from_geobuffers(
    None,
    np.array([1.0, 2.0, 3.0]),
    np.array([3.0, 4.0, 5.0])
)
PointArray:PointType(geoarrow.point)[3]
<POINT (1 3)>
<POINT (2 4)>
<POINT (3 5)>
ga.point().with_coord_type(ga.CoordType.INTERLEAVED).from_geobuffers(
    None,
    np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
)
PointArray:PointType(interleaved geoarrow.point)[3]
<POINT (1 2)>
<POINT (3 4)>
<POINT (5 6)>

Importing geoarrow.pyarrow registers the GeoArrow extension types with pyarrow, so you can read and write Arrow streams, Arrow files, and Parquet files that contain GeoArrow extension types. A number of these files are available from the geoarrow-data repository.

import urllib.request
from pyarrow import feather

url = "https://github.com/geoarrow/geoarrow-data/releases/download/v0.1.0/ns-water-basin_line.arrow"
local_filename, headers = urllib.request.urlretrieve(url)
feather.read_table(local_filename).schema
OBJECTID: int64
FEAT_CODE: string
LINE_CLASS: int32
MISCID_1: string
MISCNAME_1: string
MISCID_2: string
MISCNAME_2: string
HID: string
MISCID_3: string
MISCNAME_3: string
MISCID_4: string
MISCNAME_4: string
SHAPE_LEN: double
geometry: extension<geoarrow.multilinestring<MultiLinestringType>>

The as_geoarrow() function can accept a geopandas.GeoSeries as input:

import geopandas

url = "https://github.com/geoarrow/geoarrow-data/releases/download/v0.1.0/ns-water-basin_line.fgb.zip"
df = geopandas.read_file(url)
array = ga.as_geoarrow(df.geometry)
array
MultiLinestringArray:MultiLinestringType(geoarrow.multilinestring <{"$schema":"https://proj.org/schema...>)[255]
<MULTILINESTRING ((648686.210534334 5099183.050480807, 648626.2095...>
<MULTILINESTRING ((687688.0166642987 5117030.253445747, 686766.217...>
<MULTILINESTRING ((631355.7058094738 5122893.354471898, 631364.529...>
<MULTILINESTRING ((665166.2114203956 5138643.056812348, 665146.211...>
<MULTILINESTRING ((673606.2114490251 5162963.061371056, 673606.211...>
...245 values...
<MULTILINESTRING ((681672.817898342 5078602.646958541, 681866.2179...>
<MULTILINESTRING ((414868.0669037141 5093041.933686847, 414793.966...>
<MULTILINESTRING ((414868.0669037141 5093041.933686847, 414829.866...>
<MULTILINESTRING ((414868.0669037141 5093041.933686847, 414937.366...>
<MULTILINESTRING ((648686.210534334 5099183.050480807, 648866.2105...>

You can convert back to geopandas using to_geopandas():

ga.to_geopandas(array)
0      MULTILINESTRING ((648686.211 5099183.050, 6486...
1      MULTILINESTRING ((687688.017 5117030.253, 6867...
2      MULTILINESTRING ((631355.706 5122893.354, 6313...
3      MULTILINESTRING ((665166.211 5138643.057, 6651...
4      MULTILINESTRING ((673606.211 5162963.061, 6736...
                             ...
250    MULTILINESTRING ((681672.818 5078602.647, 6818...
251    MULTILINESTRING ((414868.067 5093041.934, 4147...
252    MULTILINESTRING ((414868.067 5093041.934, 4148...
253    MULTILINESTRING ((414868.067 5093041.934, 4149...
254    MULTILINESTRING ((648686.211 5099183.050, 6488...
Length: 255, dtype: geometry

Pandas integration

The geoarrow-pandas package provides an extension array that wraps geoarrow memory and an accessor that provides pandas-friendly wrappers around the compute functions available in geoarrow.pyarrow.

import geoarrow.pandas as _
import pandas as pd

df = pd.read_feather("https://github.com/geoarrow/geoarrow-data/releases/download/v0.1.0/ns-water-basin_point.arrow")
df.geometry.geoarrow.format_wkt().head(5)
0     MULTIPOINT (277022.6936181751 4820886.609673489)
1     MULTIPOINT (315701.2552756762 4855051.378571571)
2    MULTIPOINT (255728.65994492616 4851022.107901295)
3     MULTIPOINT (245206.7841665779 4895609.409696873)
4    MULTIPOINT (337143.18135472975 4860312.288760258)
dtype: string[pyarrow]

Building

Python bindings for geoarrow are managed with setuptools. This means you can build the project using:

git clone https://github.com/geoarrow/geoarrow-python.git
pip install -e geoarrow-pyarrow/ geoarrow-pandas/

Tests use pytest:

pytest

geoarrow-python's People

Contributors: kylebarron, paleolimbot

geoarrow-python's Issues

Writing GeoParquet from GeoPandas using GeoArrow encoded columns?

I've been trying to figure out whether this package enables writing GeoParquet files with GeoArrow geometry columns, but I haven't yet found a way. Is this supported, or will that support come with GeoParquet 1.1, when GeoArrow is supposed to be supported in that spec?

If that is the case, what other ways do we have of writing GeoArrow columns to GeoParquet? Is translating via ogr2ogr with GEOMETRY_ENCODING=GEOARROW the best option?

to_geopandas method returns an error

Hi!

This is sort of a continuation of #16. When I try to convert the original dataset with the to_geopandas method I get the error below. Is there anything I am doing improperly?

import geopandas as gpd
import pyarrow.parquet as pa
from pyarrow.parquet import read_table
import shapely
import geoarrow.pyarrow as ga

tb = read_table(r"/home/parquet/buildings.parquet")
dataset = ga.dataset(tb,geometry_columns=["geometry"])

gp = ga.to_geopandas(dataset.to_table())
Traceback (most recent call last):
File "", line 1, in
File "/home/venv/lib/python3.10/site-packages/geoarrow/pyarrow/_compute.py", line 592, in to_geopandas
wkb_array_or_chunked = as_wkb(obj)
File "/home/venv/lib/python3.10/site-packages/geoarrow/pyarrow/_compute.py", line 267, in as_wkb
return as_geoarrow(obj, _type.wkb())
File "/home/venv/lib/python3.10/site-packages/geoarrow/pyarrow/_compute.py", line 280, in as_geoarrow
obj = obj_as_array_or_chunked(obj)
File "/home/venv/lib/python3.10/site-packages/geoarrow/pyarrow/_compute.py", line 30, in obj_as_array_or_chunked
return array(obj_in, validate=False)
File "/home/venv/lib/python3.10/site-packages/geoarrow/pyarrow/_array.py", line 152, in array
arr = pa.array(obj, *args, **kwargs)
File "pyarrow/array.pxi", line 327, in pyarrow.lib.array
File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array
File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not convert <pyarrow.lib.ChunkedArray object at 0x7f021c19b740>
[
[
1,
2,
3,
4,
5,
...
65532,
65533,
65534,
65535,
65536
],
[
65537,
65538,
65539,
65540,
65541,
...
131068,
131069,
131070,
131071,
131072
],
...,
[
196609,
196610,
196611,
196612,
196613,
...
262140,
262141,
262142,
262143,
262144
],
[
262145,
262146,
262147,
262148,
262149,
...
318059,
318060,
318061,
318062,
318063
]
] with type pyarrow.lib.ChunkedArray: did not recognize Python value type when inferring an Arrow data type

Function filter fragments returns the whole dataset

Hi! Thank you for the work you do for the community. I don't know how far along this project is, but I have installed the libraries. I imported a GeoParquet file (one generated with QGIS, the other with geopandas) and it reads it in. I might have understood it wrong from the code, but would the geometry input to the filter_fragments method of the GeoDataset class be used as a filter, so that you read in only the data that intersects the bounding box of that input? If that is so, I have tried the following:

import geopandas as gpd
import pyarrow.parquet as pa
from pyarrow.parquet import read_table
import shapely
import geoarrow.pyarrow as ga

tb = read_table(r"/home/parquet/buildings.parquet")
dataset = ga.dataset(tb, geometry_columns=["geometry"])

gpdf_mask = gpd.read_file("/home/shapefiles/area_1.gpkg")
bnds = gpdf_mask.iloc[0].geometry.wkt
x = dataset.filter_fragments(bnds)

I have tried with datasets in both EPSG:4326 and EPSG:25833. When I run len(x.to_table()), the length of the result is the same as the length of the original dataset. The geometries are stored as WKB.

I have used the following files to do the testing. I don't know if it is a bug or if it is something I am doing wrong.

from geoarrow to pyarrow without geopandas?

Is this the best way to get a pyarrow table from a GeoTable (without pandas, of course)?

pa.RecordBatchReader._import_from_c_capsule(counties.__arrow_c_stream__()).read_all()

where counties is my GeoTable.

And then, as a related question: do you know whether that copies the data, since it's reading it in batches?

Python GeoArrow Module Proposal

Python GeoArrow Module Proposal

The strength of Arrow is in its interoperability, and therefore I think it's worthwhile to discuss how to ensure all the pieces around geoarrow-python fit together really well.

(It's possible this should be written RFC-style as a PR to a docs folder in this repo?)

Goals:

  • Modular: the user can install what they need and choose which dependencies they want.
  • Interoperable: the user can use C-based and Rust-based (and more? CUDA?) modules together smoothly.
  • Extensible: future developers can build on top of geoarrow-c and/or geoarrow-rust and largely reuse their Python bindings without having to create new ones from scratch.
  • Strongly typed: a method like convex_hull should always return a PolygonArray instead of a generic GeometryArray that the user can't "see into".
  • Static typing support: at least minimal typing support and IDE autocompletion where possible.
  • No strict pyarrow dependency: at least in the longer term, users should not be required to use pyarrow, even though it's likely the vast majority will.

This proposal is based around the new Arrow PyCapsule Interface, which allows libraries to safely share data without memory leaks and without going through pyarrow. This is implemented in pyarrow as of v14+, work is underway to add it to arrow-rs, and presumably nanoarrow support is not too hard to implement.

Primarily Functional API

A functional API makes it easy to take in data without knowing its provenance. Implementations may choose to also implement methods on classes if desired to improve the API usability, but nothing should be implemented solely as a method.

Data Structures

These are the data structure concepts that I think need to be first-class. Each core implementation will implement classes that conform to one of these.

GeometryArray

This is a logical array of contiguous memory that conforms to the GeoArrow spec. I envision there being PointArray, LineStringArray, etc. classes that are all subclasses of this.

This object should have an __arrow_c_array__ member that conforms to the PyCapsule interface. The exported ArrowSchema must include extension type information (an extension name of geoarrow.* and optionally extension metadata).

Whether the array uses small or large list offsets internally does not matter, but the implementation should respect the requested_schema parameter of the PyCapsule interface when exporting.

GeometryStorageArray?

In geoarrow-rs I've tried to make a distinction between "storage" types (i.e. WKB and WKT) and "analysis" types (i.e. anything zero-copy). This is partially to nudge users not to store data as WKB and operate directly on the WKB repeatedly. Do we want to make any spec-level distinction between storage and analysis arrays? Should every operation accept storage types? I think it should be fine for a function to declare it'll accept only non-storage types, and direct a user to call, say, parse_wkb.

ChunkedGeometryArray

I believe that chunked arrays need to be a first-class data concept. Chunking is core to the Arrow and Parquet ecosystems, and handling something like unary_union, which requires the entire column as input to a single kernel, requires understanding some type of chunked input. I envision there being ChunkedPointArray, ChunkedLineStringArray, etc. classes that are all subclasses of this.

This should have an __arrow_c_stream__ member. The ArrowSchema must represent a valid GeoArrow geometry type and must include extension type information (at least a name of geoarrow.* and optionally extension metadata).

This stream should be compatible with Dewey's existing kernel structure that allows for pushing a sequence of arrays into the kernel.

(It looks like pyarrow doesn't implement __arrow_c_stream__ for a ChunkedArray? To me it seems natural for it to exist on a ChunkedArray... I'd be happy to open an issue.)

GeometryTable

For operations like joins, kernels need to be aware not only of geometries but also of attribute columns.

This should have an __arrow_c_stream__ member. The ArrowSchema must be a struct type that includes all fields in the table. The ArrowArray must be a struct array that includes all arrays in the table. At least one child of the ArrowSchema must have GeoArrow extension type information (an extension name of geoarrow.* and optionally extension metadata).

Future proofing

Spatial indexes can be serialized within a table or geometry array by having a struct containing the geometry column and a binary-typed run end array holding the bytes of the index (pending geoarrow discussion).

Not sure what other future proofing to consider.

Module hierarchy

General things to consider:

  • How much appetite is there for a monorepo-based approach? I.e., for shapely interop, would you rather have an optional dependency on shapely from geoarrow.pyarrow, or a separate, very minimal library geoarrow.shapely? (Personally, I could go either way, but if geoarrow.shapely isn't likely to change often, I might lean towards a separate module...?)
  • We presumably can't have import cycles across submodules
  • Versioning? I have to say I don't love requiring all libraries to be at the same version number, like the general Arrow libraries do.

geoarrow.pyarrow

  • Pyarrow-based extension type classes
  • Does not have any external dependencies other than pyarrow
  • Holds and registers pyarrow extension types and extension arrays for all classes.

geoarrow.pandas

  • depends on geoarrow-pyarrow, pyarrow, pandas
  • should it have required or optional dependencies on other submodules for operations on arrays?

geoarrow.shapely

  • Contains two main functions for I/O between geoarrow geometry arrays and shapely using the shapely to/from ragged array implementation.

    import numpy as np
    from numpy.typing import NDArray
    
    def to_shapely(
        array: ArrowArrayExportable | ArrowStreamExportable
    ) -> NDArray[np.object_]: ...
    def from_shapely(
        arr: NDArray[np.object_],
        *,
        maxchunk: int
    ) -> ArrowArrayExportable | ArrowStreamExportable: ...
  • from_shapely returns pyarrow-based extension arrays. Longer term it also takes a parameter for the max chunk size.

  • depends on geoarrow-pyarrow, shapely

geoarrow.gdal

Wrapper around pyogrio?

geoarrow.c

I'll let Dewey give his thoughts here.

  • Dependency free?

geoarrow.rust.core

  • standalone classes, PointArray, LineStringArray, etc
  • future: chunked classes
  • no python dependencies?
  • includes pure-rust algorithms that don't require a c extension module
  • Question: if I don't have Python dependencies, what do I return? Should I wrap my own versions of a Float64Array and assume the user will call pyarrow.array() on the result? Or should I depend on pyarrow in the short term?

geoarrow.rust.proj, geoarrow.rust.geos

  • Adds C-based dependencies that may not be desired in geoarrow.rust.core.
  • Rust dependency on geoarrow-rs but no python dependencies
  • Only functional, no methods on classes (can't add methods to external objects)

Downsides

  • leaks implementation details: does the user want/need to know what's implemented in rust vs c? Or is that ok because we're targeting advanced users here (and libraries that build on top of geoarrow.* will handle making it simple for end users)?
  • Multiple copies of geometry array definitions, e.g. geoarrow.pyarrow.PointArray, geoarrow.c.PointArray, geoarrow.rust.core.PointArray. This is, in some ways, unfortunate, but it allows users to control dependencies closely, and it's unavoidable unless functions returned bare PyCapsule objects?
  • Explosion of implementations: function definition in rust, geoarrow.rust.core, geoarrow.pandas, geopolars

Static Typing

A full proposal for static typing is out of the scope of this proposal (and some operations just won't be possible to type accurately).

A few methods will be amenable to generics, as shown below. But ideally every function can be given a return type that matches one of the Arrow PyCapsule protocols. At least in the Rust implementation, I'd like to have type stubs that accurately return type classes (though sadly I'll still have to write the .pyi type stubs by hand).

from typing import Protocol, Tuple, TypeVar, reveal_type


class ArrowArrayExportable(Protocol):
    def __arrow_c_array__(
        self, requested_schema: object | None = None
    ) -> Tuple[object, object]:
        ...


class ArrowStreamExportable(Protocol):
    def __arrow_c_stream__(self, requested_schema: object | None = None) -> object:
        ...


ArrayT = TypeVar("ArrayT", bound=ArrowArrayExportable)
StreamT = TypeVar("StreamT", bound=ArrowStreamExportable)


class PointArray:
    def __arrow_c_array__(
        self, requested_schema: object | None = None
    ) -> Tuple[object, object]:
        ...


class ChunkedPointArray:
    def __arrow_c_stream__(self, requested_schema: object | None = None) -> object:
        ...


def translate(array: ArrayT | StreamT, x: float, y: float) -> ArrayT | StreamT:
    ...


p = PointArray()
p2 = translate(p, 1, 1)
reveal_type(p2)
# Type of "p2" is "PointArray"

cp = ChunkedPointArray()
cp2 = translate(cp, 1, 1)
reveal_type(cp2)
# Type of "cp2" is "ChunkedPointArray"

Error initializing a geoarrow table from pyarrow.lib.ChunkedArray

I am trying to load a CSV into a geoarrow table manually using pyarrow, but got an error:

import gzip
import geoarrow.pyarrow as ga
import pyarrow.csv as pv

with gzip.open("/Users/x/data/points_s2_level_4_gzip/397_buildings.csv.gz") as fp:
    table = pv.read_csv(fp)

points = ga.point().from_geobuffers(None, table["latitude"], y=table["longitude"])

(Screenshot of the error attached: 2023-10-27 at 10:15:48 AM)

[BUG] Failed to initialize GeoArrowSchemaView

Hi, I'm adding this issue because Kyle kindly asked 😅

I tried to use geoarrow.rust.core in combination with lonboard to display some data.
It didn't work (the geometry has mixed types); however, the exception in the stack trace came from geoarrow-pyarrow and geoarrow-c underneath.

File for tests: monaco_nofilter_noclip_compact.zip

Stacktrace:

from geoarrow.rust.core import read_parquet
from lonboard import viz

table = read_parquet(osm_data_path)
viz(table)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[52], line 1
----> 1 viz(table)

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:150, in viz(data, scatterplot_kwargs, path_kwargs, polygon_kwargs, map_kwargs)
    138     layers = [
    139         create_layer_from_data_input(
    140             item,
   (...)
    146         for i, item in enumerate(data)
    147     ]
    148 else:
    149     layers = [
--> 150         create_layer_from_data_input(
    151             data,
    152             _viz_color=color_ordering[0],
    153             scatterplot_kwargs=scatterplot_kwargs,
    154             path_kwargs=path_kwargs,
    155             polygon_kwargs=polygon_kwargs,
    156         )
    157     ]
    159 map_kwargs = {} if not map_kwargs else map_kwargs
    161 if "basemap_style" not in map_kwargs.keys():

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:204, in create_layer_from_data_input(data, **kwargs)
    202 if hasattr(data, "__arrow_c_stream__"):
    203     data = cast("ArrowStreamExportable", data)
--> 204     return _viz_geoarrow_table(pa.table(data), **kwargs)
    206 # Anything with __geo_interface__
    207 if hasattr(data, "__geo_interface__"):

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/table.pxi:5221, in pyarrow.lib.table()

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/ipc.pxi:880, in pyarrow.lib.RecordBatchReader._import_from_c_capsule()

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/error.pxi:154, in pyarrow.lib.pyarrow_internal_check_status()

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/error.pxi:88, in pyarrow.lib.check_status()

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/geoarrow/pyarrow/_type.py:58, in GeometryExtensionType.__arrow_ext_deserialize__(cls, storage_type, serialized)
     55 schema = lib.SchemaHolder()
     56 storage_type._export_to_c(schema._addr())
---> 58 c_vector_type = lib.CVectorType.FromStorage(
     59     schema, cls._extension_name.encode("UTF-8"), serialized
     60 )
     62 return cls(c_vector_type)

File src/geoarrow/c/_lib.pyx:496, in geoarrow.c._lib.CVectorType.FromStorage()

File src/geoarrow/c/_lib.pyx:375, in geoarrow.c._lib.CVectorType._move_from_ctype()

ValueError: Failed to initialize GeoArrowSchemaView: Expected valid list type for coord parent 1 for extension 'geoarrow.multipoint'

Filter geometries based on type

Hi, I'm wondering if it would be possible to have a WkbType column and filter out geometries based on a given type (Point, LineString, Polygon etc). There are some compute functions available, there even is unique_geometry_types, but I'm not sure if any of those could help me in my use case.

Pandas integration does not symmetrically store and load with feather format

I am playing around with the geoarrow.pandas integration and found something odd: if I load a data frame containing a geometry column, it will successfully load and display the geometry correctly, but I am unable to do anything with it. Anything I try (e.g. df.geometry.geoarrow.*) produces the following error:

TypeError: Can't create geoarrow.array from Arrow array of type None

I created the file like this:

import geoarrow.pyarrow as ga
import geoarrow.pandas as _
import pandas as pd
import numpy as np

points = np.random.rand(1 << 20, 2)

df = pd.DataFrame({
    "geometry": ga.point().from_geobuffers(
        None,
        points[:, 0],
        points[:, 1]
    )
})

df.to_feather('points.feather')

and I load the file like this

import geoarrow.pyarrow as ga
import geoarrow.pandas as _
import pandas as pd

df = pd.read_feather("points.feather")

# Example operations that produce the above error
df.astype({ 'geometry': 'geoarrow.wkt' })
x, y = df.geometry.geoarrow.point_coords()
# etc.

Request for Interleaved Coordinate Format Support in GeoArrow Specification "FixedSizeList of interleaved values (i.e., [x, y, x, y, ...])"

Problem Statement

The GeoArrow specification currently supports coordinates encoded either as a Struct array, storing the coordinate values as separate arrays (i.e., x: [x, x, ...], y: [y, y, ...]), or as a FixedSizeList of interleaved values (i.e., [x, y, x, y, ...]). When integrating the Arrow file into deck.gl, the binary format requires the interleaved coordinate format. With the current ga.as_geoarrow defaulting to Struct array encoding, there is a need to create the interleaved array from the individual x and y arrays, which introduces some overhead on the frontend.

Feature Request

Is there a way to obtain the interleaved format from the as_geoarrow function? Or a trick I am missing to do this with the current functionalities?

Thank you for all the amazing work on this project. Super cool!

Question about a new release

Hello,

I was wondering if you could publish a new release of the library in the near future.
I'd love to use the new functions (especially I/O from and to the GeoParquet format) in my Python library, but putting a git development URL in dependencies isn't really sustainable in the long run 😄

Read from cloud storage using geoarrow.pyarrow.dataset

Hi!

I was wondering if geoarrow-python has support for reading GeoParquet that is stored on Azure Blob Storage? I know pyarrow has it, but I'm uncertain whether geoarrow-python does. If it does, could you show an example of some sort?

Thank you !
