Coder Social home page Coder Social logo

Comments (8)

snowman2 avatar snowman2 commented on June 9, 2024

This is a great start for the discussion. I like how you have clearly laid out everything and have some great points. I have a couple of items to add to your comments that could potentially aid in the discussion.

  1. From my experience, the active method can support both geographic and projected coordinate systems. You just specify the projection in the crs coordinate connected to the grid_mapping attribute and put the correct coordinates in x and y and it works. One alternative I have seen to differentiate that the system is to put the coordinates in longitute and latitude variables. This is a nice alternative as it makes it more clear that it is a geographic coordinate system. However, it is not required as the crs coordinate can store the projection information and would be just fine without.

  2. With a slightly more complex design, the active method could support multiple projections in the same dataset via the grid_mapping attribute on each variable and additional crs coordinates. There is probably a more elegant method, but this is just food for thought at the moment.

Here is an example of how multiple CRS in the same dataset might work in the active method:

<xarray.Dataset>
Dimensions:      (x: 65, y: 31, x1: 30, y1: 30)
Coordinates:
  * x            (x) float64 ...
  * y            (y) float64  ...
  * x1           (x1) float64 ...
  * y1           (y1) float64  ...
    time         datetime64[ns] ...
    crs          int64 ...
    crs1          int64 ...
Data variables:
    ndvi         (y, x) float64 ...
    ndvi1         (y1, x1) float64 ...
Attributes:

And on the variables:

<xarray.DataArray 'ndvi' (y: 31, x: 65)>
array([[ ...]])
Coordinates:
  * x            (x) float64 ...
  * y            (y) float64 ...
    time         datetime64[ns] ...
    crs          int64 0
Attributes:
    grid_mapping:  crs
<xarray.DataArray 'ndvi1` (y1: 30, x1: 30)>
array([[ ...]])
Coordinates:
  * x1           (x1) float64 ...
  * y1            (y1) float64 ...
    time         datetime64[ns] ...
    crs1          int64 0
Attributes:
    grid_mapping:  crs1

from geoxarray.

djhoese avatar djhoese commented on June 9, 2024

@snowman2 Thanks. For your first point, metpy currently does this by creating a CRS stating it is handling lat/lon data. Really it looks like it is trying to see if the coordinates for a variable have coordinates that are lon/lat variables: https://github.com/Unidata/MetPy/blob/master/metpy/xarray.py#L187

For 2, I could see one big issue with this being that attributes tend to disappear by default in xarray so any arithmetic on these DataArrays/Dataset and the grid_mapping attribute would disappear. Right?

from geoxarray.

djhoese avatar djhoese commented on June 9, 2024

Also related: Unidata/MetPy#960

from geoxarray.

snowman2 avatar snowman2 commented on June 9, 2024

I could see one big issue with this being that attributes tend to disappear by default in xarray so any arithmetic on these DataArrays/Dataset and the grid_mapping attribute would disappear. Right?

Yep, that is definitely something to be wary about. If you choose to go down this route, you would have to have it well documented and maybe raise warnings/exceptions if it is missing when more than one crs like coordinate is present. If you want to produce a CF compliant file, the grid_mapping attribute will definitely have to be there. 🤔

from geoxarray.

jthielen avatar jthielen commented on June 9, 2024

@djhoese MetPy's handling of simultaneous x/y and longitude/latitude might not be optimal right now. Currently, the line you referenced should only come into play if the grid_mapping on a variable is missing (either the variable corresponding to the attribute or the attribute itself) and longitude and latitude coordinates are found. Otherwise, the attributes of the variable corresponding to the grid_mapping are used directly create the CFProjection object, which becomes the crs coordinate (see here).

In terms of coordinate recognition, if both projection x/y coordinates and lon/lat coordinates are present for a variable, MetPy will try to default to the projection coordinates, and if those are not unique, then to dimension coordinates, as in this function:

https://github.com/Unidata/MetPy/blob/master/metpy/xarray.py#L345-L368

I'm not sure if this is the best approach, but this is what we had settled on this past summer.

from geoxarray.

fmaussion avatar fmaussion commented on June 9, 2024

Thanks for the summary @djhoese ! My 2 cents:

  • I think you won't be able to avoid active handling altogether - especially if you want to support less conventional datasets. I know of modelling products which have parsable CRS at the dataset level only (not the variable level). If you want to support these, you'll have to actively add (meta)data to all variables of the dataset beforehand.
  • an elegant way to solve this (I believe) it to offer a dataset constructor at the library level: gxr.open_geodataset() (or simply: gxr.open_dataset). This is how salem does it.
  • The active version doesn't need to be too intrusive, i.e. if possible add things as opposed to change things.
  • As mentioned earlier, the CRS info has to be stored as a coordinate, not an attr. I don't see much downsides with this approach though.

from geoxarray.

djhoese avatar djhoese commented on June 9, 2024
  • For your first point: understood. Things like these are why the SatPy library exists for satellite instrument data. People keep inventing new standards even within standards ("your file says it follows CF conventions but why are your flag meanings only in a 'description' attribute?").
  • Agreed. Although, since geoxarray will probably be limited to only geolocation information parsing I could see having an open_dataset that does extra stuff that is only available in that function being an issue. For example, if another library provides an open_dataset to handle some specific file format but doesn't handle geolocation then geoxarray would need to provide a way to parse the geolocation post-open. This may be obvious, just thinking "out loud".
  • I guess that would be the goal and if the user calls a "parse" like method then I suppose there could be keyword arguments for what it is allowed to do with the understanding by the user that future operations may not work perfectly. For example, you tell geoxarray not to add/overwrite x/y coordinates then you run the risk of xarray merging things in undesired ways.
  • The only downside is as I said earlier, that if the coordinate is called crs in every DataArray then putting data in to a Dataset is not allowed by xarray because of conflicting CRS coordinates. This is mainly an issue for CF-compliant NetCDF files where they could have multiple "grid_mapping" variables.

I guess an extension of this issue's general question is: should users be expected to know what "structure" their data is in? Should they know that geolocation can be retrieved from conventional CF variables (open_cf_dataset) or should geoxarray be able to figure it out? I think geoxarray could probably figure things out in most cases and either raise an exception or produce a warning or maybe silently not provide any geolocation information if it can't figure it out. Geoxarray can ultimately provide a way to customize how it parses geolocation.

from geoxarray.

djhoese avatar djhoese commented on June 9, 2024

I'm going to close this issue as I think in #24 and the initial 0.1.0 version (now released), geoxarray follows kind of an half-way design between active and passive behavior. On one hand, doing .geo.crs will try its best to find "standard" CRS information. On the other, geoxarray needs some in-memory way of storing CRS information that doesn't get lost between operations...or at least not easily lost. @snowman2's experience with rioxarray is a huge benefit to geoxarray when it comes to this point and I've followed those lessons learned in the current version of geoxarray. Mainly:

  1. When asked to store CRS information (via .geo.write_crs), store it in a CF-compatible manner.
  2. Store the grid mapping variable as a coordinate variable in .coords.
  3. Store the name of the grid mapping variable to use for each DataArray in the .encoding of that DataArray.
  4. Use crs_wkt as a fallback for CRS cases that CF doesn't completely define while still being CF compliant.
  5. Replicate that WKT to a spatial_ref attribute in the grid mapping variable for GDAL compatibility.

Until I/we/geoxarray have more use cases (data loading all the way to final usage), I don't think we can say for certain which way each possible feature of geoxarray should be implemented. The basic functionality have an inplace keyword argument gets around a lot of the "hhhmmm should we really be modifying the user's object?" type of confusion/decisions.

People can feel free to continue to comment here, but in my opinion specific cases should probably be discussed in new issues.

from geoxarray.

Related Issues (18)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.