Coder Social home page Coder Social logo

geoxarray's Introduction

Geoxarray

image

image

image

image

pre-commit.ci status

Geolocation utilities for xarray objects. Geoxarray is meant to bring together all of the features and conversions needed by various python packages working with geolocation xarray objects. This means being able to convert between various coordinate system implementations (rasterio, cartopy, pyresample, NetCDF CF grid mapping, etc). It also means providing basic access to properties of the geolocation information like bounding boxes.

Installation

The geoxarray library is available on PyPI and can be installed with pip:

pip install geoxarray

For the most recent development versions of geoxarray, it can be installed directly from the root of the source directory:

pip install -e .

Or to install into an existing conda-based environment:

.. code-block:: bash

conda install -c conda-forge geoxarray

Dependencies

Besides the xarray dependency, the geoxarray package uses CRS objects from the pyproj library. Additionally, geoxarray has a lot of optional dependencies when it comes to converting to other libraries' CRS or geolocation objects. These libraries include, but may not be limited to:

  • rasterio
  • cartopy
  • pyresample

Relationship with rioxarray

At the time of writing, rioxarray is an independent project whose features related to CRS and dimension handling are very similar if not exactly the same as geoxarray. Rioxarray existed first and paved the way to show how CRS information can be handled in an xarray-friendly way. Much of geoxarray is inspired by rioxarray, if not copied directly. Portions of code copied from rioxarray are noted in docstrings for that code and are under the Apache License of rioxarray which has been copied as LICENSE_rioxarray in the geoxarray package and repository.

Development Status

Geoxarray is actively being developed as a side project. Additions and modifications are done as developers have time. If you would like to contribute, suggest features, or discuss anything else please file a bug on github.

Features

See the documentation website for how-tos, concepts, and API documentation.

Parse various formats of storing Coordinate Reference System information:

import geoxarray

pyproj_crs = my_data_arr.geo.crs

Write CRS information in a CF-compatible way or add CRS information to an object that didn't have any:

import geoxarray

new_dataset = my_dataset.geo.write_crs("EPSG:4326", inplace=False)

geoxarray's People

Contributors

dependabot[bot] avatar djhoese avatar mraspaud avatar pre-commit-ci[bot] avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

geoxarray's Issues

MNT: Stop using ci-helpers in appveyor.yml

To whom it may concern,

If you are using https://github.com/astropy/ci-helpers in your appveyor.yml , please know that the Astropy project has dropped active development/support for Appveyor CI. If it still works, good for you, because we did not remove the relevant files (yet). But if it ever stops working, we have no plans to fix anything for Appveyor CI. Please consider using native Windows support other CI, e.g., Travis CI (see https://docs.travis-ci.com/user/reference/windows/). We apologize for any inconvenience caused.

If this issue is opened in error or irrelevant to you, feel free to close. Thank you.

xref astropy/ci-helpers#464

Use new proj.4 to manage projections

https://gdalbarn.com. This is mostly a placeholder. Ideally pyproj would be updated to support the new functionality. But, it all depends on if this is a dependency you want to add. Apparently it will support WKT2. Just food for thought ๐Ÿ”

Consider any lessons learned by CF group for defining swaths

I just found this old-ish video about CF convention changes related to geo-data and there is a small section at the end about defining swaths: https://youtu.be/79e3GC74y_w?t=2865

Although the video doesn't necessary suggest anything new that could be used for geoxarray, I'm creating this issue as a reminder to look in to what the final accepted CF convention is for this to see if it is is something xarray or geoxarray should parse and adopt for the in-memory representation of non-uniform (not gridded) swath data.

Side note: I wonder if I could make an object that mostly acts like a DataArray and can be used in a .coords coordinate without being computed. Basically, trick xarray in to thinking it isn't something that can be computed and use a simple hash (or similar) for equality checking.

Project Status

A lot of things have been changing in the scientific python ecosystem regarding geolocated data. Based on some of these changes, I consider this project to be in limbo until other projects settle on what they support and can do. Basically, I'm not sure I can dedicate time to this project when I know that large chunks of it may change in the near future (2019). Below I'll describe things I'm aware of. If you know of any other activities by other related projects that may affect this project, please comment below. Feel free to ask questions here as well.

Pangeo

@rabernat of the Pangeo community has been gathering information from various "gridding" packages in the python community. The beginning of this discussion is in pangeo-data/pangeo#356 and has since continued over video conference meetings and the Pangeo gitter. The goal of these conversations has been to get all of the various uses of "grids", what they are, and how they are defined. If efforts from the various available packages can be combined, then they should be. @rabernat is planning a blog post on the topic. Hopefully that can be linked here when it is complete.

IMO geoxarray is one avenue that could be taken to have one package that assists all of these efforts. However, my original goals for geoxarray and the type of data is was meant to support are simpler than the user cases I've heard of so far through these discussions.

PROJ versus WKT versus Other

The PROJ C library, pyproj, rasterio, libgeotiff, and gdal have been seeing a lot of changes regarding how a coordinate reference system (CRS) can be represented, where these representations get analyzed, and what the best defaults for libraries and users are. If I recall correctly @snowman2 was a big proponent of using WKT (Well Known Text) over the PROJ.4 string (or I guess it should just be "PROJ" now). If PROJ string do not properly/completely describe a CRS then it may be best to use something else that does. If the community (see libraries above) are leaning towards using WKT over PROJ then perhaps geoxarray should use that internally.

These libraries, or maybe just PROJ, have transitioned to different types of objects (CRS maybe?) to represent a CRS. If this can serve every purpose that I wanted the pycrs project to serve then I don't see a reason to use pycrs except for the "pure python" aspect. If everyone else is using PROJ or some other low-level library then so should geoxarray. I've been using PROJ strings for the last 8 or so years so I'm open to suggestions from others and feedback on anything I'm misunderstanding.

2D dimensions in xarray

There have been some efforts to better support 2D dimensions/coordinates in xarray. This would be a big deal for geoxarray for ungridded/non-uniform data (2D longitude/latitude arrays). From what I've been pointed to through the pangeo discussions I see:

Now while this doesn't affect gridded data, which should be using Affine-like descriptions or 1D x/y dimensions, this still has a big impact on what people can do with xarray and what I would want geoxarray to be able to do.

</braindump>

Design: active versus passive coordinate assignment

I'm going to start making issues here on this repository so pydata/xarray#2288 doesn't get any longer. This issue is regarding what the best and most useful way is for geoxarray to add or provide information to xarray objects. This library will likely be heavily influenced by metpy's handling (@dopplershift) of CF metadata: https://github.com/Unidata/MetPy/blob/master/metpy/xarray.py#L152. From talking with @dopplershift on gitter, he mentioned @shoyer had suggested adding a crs coordinate to DataArrays.

The main question is should geoxarray assign things to the user provided xarray object (what I'm calling "active" behavior) or should geoxarray make things available for other libraries to use by calculating things when needed ("passive"). I haven't fully laid out the design for the passive behavior, that's what this issue is for.

Active

Basic behavior:

  • Requires user to request the change (ex. new_ds = my_dataset.metpy.parse_cf())
  • Existing coordinates and attribute may be changed (x and y dimensions) to match expectations and "standardize" the data object
  • Coordinates will be added (my_data_arr.coords['crs'] = CRS(...))
  • Anyone using the new DataArray or Dataset can assume that x and y coordinates are in meters and that a crs coordinate defines the CRS.

Pros:

  • Arithmetic and merging takes the CRS coordinate in to account
  • Makes the data more useable in the xarray model without knowing of the parsing library (metpy/geoxarray)

Cons:

  • Requires changing metadata that the user may have wanted use
  • Cannot handle Datasets with multiple CRSs.

Passive

Basic behavior:

  • CRS and coordinate information is calculated/determined when requested (ex. my_data_arr.geo.crs where geo is the geoxarray accessor)
  • Information is cached and stored in the xarray accessor
  • Any special handling of metadata can be configured (ex. my_data_arr.geo.set_xy_dims('x2', 'y2')) where handling all variants of dimension/coordinate names would be difficult otherwise.

Pros:

  • Doesn't modify the data the user is using
  • Handles DataArrays separately so multiple CRSes in the same Dataset are allowed (technically)

Cons:

  • No handling of CRS in arithmetic and merging
  • Requires the using library (cartopy, etc) to know stuff about geoxarray or for geoxarray to pass information in a way that the library understands. This is technically needed by both.

Example usages

See also MetPy's example.

import xarray as xr
import geoxarray as gxr

my_dataset = xr.open_dataset('/path/to/file.nc')

# active (taken from metpy)
data_var = my_dataset.geo.parse_cf('Temperature')
...
ax = fig.add_subplot(1, 1, 1, projection=data_var.geo.cartopy_crs)
x = data_var.x
y = data_var.y
ax.imshow(im_data, extent=(x.min(), x.max(), y.min(), y.max()),
          cmap='RdBu', origin='lower' if y[0] < y[-1] else 'upper')

# passive

data_var = my_dataset['Temperature']
ax = fig.add_subplot(1, 1, 1, projection=data_var.geo.cartopy_crs)
x = data_var.geo.x
y = data_var.geo.y
ax.imshow(im_data, extent=(x.min(), x.max(), y.min(), y.max()),
          cmap='RdBu', origin='lower' if y[0] < y[-1] else 'upper')

Conclusion

I like active better I think even though passive was my original plan for solving things. I think both methods require more information from the user in special cases (accepting non-CF/non-standard coordinate/dimension names). Both require the user or using library to expect certain things about the data or require the user to know how to pass things to that library (cartopy_crs, etc). Some of the metpy CRS handling may also be handled by the pycrs library.

My biggest flexibility issue was the active method not being able to handle multiple CRS in one Dataset, but realize now that the user just needs to split the dataset up by CRS.

Obviously feedback and concerns are welcome (@mraspaud, @pnuu, @leouieda, @fmaussion, @snowman2, @karimbahgat).

Collaboration with salem package

@fmaussion How did your salem package never come up in pydata/xarray#2288? Do you see an overlap with the geoxarray package? What are the goals of salem and where do you see it going in the future? Should geoxarray's proposed functionality be absorbed by xarray?

For reference for anyone else: https://salem.readthedocs.io/en/latest/index.html

Also @fmaussion we (pytroll group) or salem may be able to benefit from collaborating between salem and our pyresample and pycoast packages. Both packages probably need xarray accessors and better interfaces overall, but they work.
https://pyresample.readthedocs.io/en/latest/
https://pycoast.readthedocs.io/en/latest/

CC: @mraspaud @pnuu

custom `numpy` dtypes

I wonder if it would be worth investigating if it is possible to implement a custom numpy dtype (see NEP 40-43), similar to geopandas' GeometryArray? We can probably try to interface with odc-geo for this.

There's not a lot of documentation so far (besides the NEPs), but there is a experimentation repository that can serve as a reference. It is written in C, though.

Ideas for crs accessor to xarray

Hey @djhoese !

I write down here some thoughts I had for the design of an interface to facilitate the reading and writing of CF-compliant netCDF with CRS information of structured grids. Do you think other people would be interested to finalize this accessor?

Such accessor could be placed in a lightweight separate package and:

  • Within pytroll, it could replace lot of code in satpy CFWriter and/or be used by pyresample to retrieve the AreaDefinition from dataset.
  • In rioxarray, it could simplify and improve the rioxarray.rioxarray.XRasterBase class
  • Facilitate user life when dealing with the creation of CF-compliant netCDF regarding the CRS and area information
  • Improve compatibility across the CRS python ecosystem

Regarding package dependencies, pyproj, rasterio, pyresample and cartopy would be only optional.

Currently, I just have coded stuff for the CF-Writer utility (available here).
Currently, these codes enable to write CF-compliant netCDF (with respect to CRS) which are displayed correctly also on QGIS.

The remaining code could be "borrowed" from rioxarray, pyresample and xgeo

Here below the interface design I thought:

CF-Writer utility

ds.crs.remove() # remove crs coordinates, grid_mapping and coordinates variable attributes 
ds.crs.add(crs, grid_mapping_name="spatial_ref", inplace=False)  # Add CF-compliant info 

CF-Reader utility

da.crs.is_swath()          # Or is_non_uniform_grid/area
da.crs.is_area()           # Or is_uniform_grid/area
da.crs.to_pyproj_crs(area_crs=True)    # If 2 crs(area and WGS84 for lon/lat) return only area
da.crs.to_rasterio_crs(area_crs=True)  # If 2 crs(area and WGS84 for lon/lat) return only area
da.crs.to_pyresample_area()            # If exists
da.crs.to_pyresample_swath()
da.crs.to_cartopy_projection()

CRS information

da.crs.is_geographic()   
da.crs.is_projected()   
da.crs.is_geostationary()        # Not in rasterio and pyproj crs 
da.crs.is_polar()                # Not in rasterio and pyproj crs 

Area Information

Area Bounds (lower left, upper right)

  • pyproj area_of_use.bounds: [west, south, east, north]
  • rasterio.transform.from_bounds: [west, south, east, north]
  • rasterio.BoundingBox: [left, bottom, right, top]
  • rioxarray: [left, bottom, right, top]
  • pyresample area_extent: [x_lower_left, y_lower_left, x_upper_right, y_upper_right]
  • cartopy projection bounds: [west, east, south, north] (code here) follow another convention
  • matplotlib imshow extent: [left, right, bottom, top] follow another convention
    --> Available only for area, not swath !
da.crs.area_bounds    #  (or area_extent)

Area shape, resolution, and transform

da.crs.shape           # (width, height)(x, y) or (heigth, width) (y, x)
da.crs.resolution      # (x, y)  (only for uniform area)
da.crs.transform       # rioxarray, rasterio, gdal  (only for uniform area)

Bouding Box [x_min, x_max, y_min, y_max]

  • cartopy set_extent: [x_min, x_max, y_min, y_max)
da.crs.bbox              # For uniform_area same as area_bounds (but rearranged). For non-uniform area it computes min/max of the boundary
da.crs.bbox_lonlat       # For non-uniform area (i.e. swath) same as bbox. For uniform area need to project boundary coords. If polar wrapping area or crossing antimeridian, enforce -180, 180 on x ? 
da.crs.to_shapely_bounds # [x_min, y_min, x_max, y_min]

Design for CRS extension

Continuation from: corteva/rioxarray#4 (comment)

Related to #2 & #8

Here are the basic needs for rioxarray and probably other geospatial projects as well:

  1. A way to retrieve a pyproj.CRS based on the information in the file. (I think pyproj.CRS.from_cf() will be useful here)

    It would be useful to have an attribute such as:

    xdf.geo.crs
    

    See:

  2. A way to set a pyproj.CRS if it does not exist/is incorrect. (I think pyproj.CRS.from_cf() and pyproj.CRS.from_user_input() will be useful here)

    It would be useful to be able to set the attribute such as:

    Idf.geo.crs = "epsg:4326"
    xdf.geo.crs_from_cf(...)
    

    See xgeo projection setter

  3. A way to update the CRS information on the Dataset/DataArray in a CF compliant way. (I think pyproj.CRS.to_cf() will be useful here)

    Something like:

    xdf_cf = xdf.geo.write_crs(version="WKT1_GDAL")
    

    Also see add_spatial_ref

Handling of CRS objects in coordinates

I started looking more at 2D coordinate arrays (especially when they are dask arrays) and using a pyproj CRS object as a coordinate and ran in to a weird situation. I'm hoping @snowman2 or someone else can help me decipher what is happening here.

from pyproj import CRS
import xarray as xr
import dask.array as da

crs1 = CRS.from_string('+proj=lcc +datum=WGS84 +lon_0=-95 +lat_0=25 +lat_1=25')
crs2 = CRS.from_string('+proj=lcc +datum=WGS84 +lon_0=-95 +lat_0=35 +lat_1=35')

a = xr.DataArray(da.zeros((5, 5), chunks=2), dims=('y', 'x'), coords={'yc': (('y', 'x'), da.zeros((5, 5), chunks=3)), 'xc': (('y', 'x'), da.zeros((5, 5), chunks=3)), 'crs': crs1})
b = xr.DataArray(da.zeros((5, 5), chunks=2), dims=('y', 'x'), coords={'yc': (('y', 'x'), da.zeros((5, 5), chunks=3)), 'xc': (('y', 'x'), da.zeros((5, 5), chunks=3)), 'crs': crs2})
c = xr.DataArray(da.zeros((5, 5), chunks=2), dims=('y', 'x'), coords={'yc': (('y', 'x'), da.zeros((5, 5), chunks=3)), 'xc': (('y', 'x'), da.zeros((5, 5), chunks=3)), 'crs': crs1})

# Adding DataArrays with different CRS coordinates
a + b

# Results in:
# <xarray.DataArray 'zeros-e5723e7f9121b7ac546f61c19dabe786' (y: 5, x: 5)>
# dask.array<shape=(5, 5), dtype=float64, chunksize=(2, 2)>
# Coordinates:
#     yc       (y, x) float64 dask.array<shape=(5, 5), chunksize=(3, 3)>
#     xc       (y, x) float64 dask.array<shape=(5, 5), chunksize=(3, 3)>
# Dimensions without coordinates: y, x

# Adding DataArrays with the same CRS coordinates
a + c

# Results in:
# <xarray.DataArray 'zeros-e5723e7f9121b7ac546f61c19dabe786' (y: 5, x: 5)>
# dask.array<shape=(5, 5), dtype=float64, chunksize=(2, 2)>
# Coordinates:
#     yc       (y, x) float64 dask.array<shape=(5, 5), chunksize=(3, 3)>
#     xc       (y, x) float64 dask.array<shape=(5, 5), chunksize=(3, 3)>
#     crs      object +proj=lcc +datum=WGS84 +lon_0=-95 +lat_0=25 +lat_1=25 +type=crs
# Dimensions without coordinates: y, x

I would have expected a + b to not be allowed since they differ and are not equal. Thoughts? Is this expected?

State of this package

Is geoxarray a fully functional package?
The docs doesn't even show how to reproject a dataarray between two crs. I would like to test this feature, if it is a feature.

Renaming 'master' to 'main'

Hi everyone,

I'm hoping to start work on this project again. I've been renaming a lot of my projects to use a "main" branch instead of "master" and decided to do the same for geoxarray. The term is offensive (needlessly as other branch names would do) and at this point it is inconsistent with a large number of projects and future github projects (which now default to main). If you have an issue with this change please feel free to contact me directly. I'm just making this issue to publicly say "yes, I did it and I did it on purpose".

Now let's see what I can turn this project into...

Add `sel` and `loc` methods for selection by standard names

One concept that came up while talking at SciPy 2019 with @dopplershift and @snowman2 was the feature in metpy to do data_arr.sel(x=slice(...), y=slice(...)) even though the dimensions of the DataArray may not be named with the preferred names.

One thing that @snowman2 talked about was a similar method he has in rioxarray that in addition to allowing the dimension name mapping it will also reorder the slices for x and y dimensions if they are "backwards". This usually happens if the data is flipped for some reason from normal +x (right) and +y (up) orientation. The method doesn't affect the data but still allows users to select a subset of the data. Without this no data would be returned.

I wanted to write this down before I forget.

Design differences with rioxarray

@snowman2 I'm getting back into this project and I'm hoping you can help me with some lessons learned from rioxarray and help guide how I implement things. I've been reviewing parts of rioxarray's accessor, specifically CRS/grid_mapping handling and I have at least one question and hope to continue the discussion here for anything else that comes up. There is a good chance that I'll be copying large chunks of rioxarray (let's be nice and say "extracting") for CRS handling...if that's OK with you and you don't advise against it.

Question 1: I noticed you use (read and write) some things to .encoding like "grid_mapping" (https://github.com/corteva/rioxarray/blob/33c0059ae97f108d631dad2ede721f5a0d4c18dc/rioxarray/rioxarray.py#L385-L387). Is there a functional reason to putting things like this in .encoding or are you just trying to cache things in a less visible place compared to .coords? Does xarray actually do something with those values being in .encoding?

Question 2: I think I've asked this in other places, but if you ignore backwards compatibility, what things would you have done differently in rioxarray as far as dimension and CRS handling?

Related #7, #8

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.