Coder Social home page Coder Social logo

Comments (9)

andersy005 avatar andersy005 commented on August 18, 2024

Yeah +1 on moving things upstream.

From a quick look, it seems like esmlab basically adds time-aware weighted groupby/resampling; time bounds consistency; and time units/calendargpropagation (pydata/xarray#1614).

For the last one, it would be good to document how much of the table in pydata/xarray#1614 is implemented and what's left.

Originally posted by @dcherian in #156 (comment)

from esmlab.

andersy005 avatar andersy005 commented on August 18, 2024

@dcherian

From a quick look, it seems like esmlab basically adds time-aware weighted groupby/resampling; time bounds consistency; and time units/calendargpropagation

This is correct.

We are relying on the time bounds for all time-aware operations.

  1. Decoding Time: Internally, esmlab decodes time it by using the time bounds variable. Xarray doesn't seem to be aware of time-bounds variable when decoding time:
In [1]: import xarray as xr                                                                                                                        

In [2]: import numpy as np                                                                                                                         

   ...:     ds['variable_1'] = xr.DataArray( 
   ...:         np.append( 
   ...:             np.zeros([12, 2, 2], dtype='float32'), np.ones([12, 2, 2], dtype='float32'), axis=0 
   ...:         ), 
   ...:         dims=['time', 'lat', 'lon'], 
   ...:     ) 
   ...:     ds['variable_2'] = xr.DataArray( 
   ...:         np.append( 
   ...:             np.ones([12, 2, 2], dtype='float32'), np.zeros([12, 2, 2], dtype='float32'), axis=0 
   ...:         ), 
   ...:         dims=['time', 'lat', 'lon'], 
   ...:     ) 
   ...:     ds.time.attrs['units'] = 'days since 0001-01-01 00:00:00' 
   ...:     ds.time.attrs['calendar'] = 'noleap' 
   ...:     ds.time.attrs['bounds'] = 'time_bound' 
   ...:     return ds.copy(True) 
   ...:                                                                                                                                            

In [4]: ds = create_dataset()                                                                                                                      

In [5]: ds                                                                                                                                         
Out[5]: 
<xarray.Dataset>
Dimensions:     (d2: 2, lat: 2, lon: 2, time: 24)
Coordinates:
  * lat         (lat) int64 0 1
  * lon         (lon) int64 0 1
  * time        (time) float64 31.0 59.0 90.0 120.0 ... 638.0 669.0 699.0 730.0
  * d2          (d2) int64 0 1
Data variables:
    time_bound  (time, d2) float64 0.0 31.0 31.0 59.0 ... 699.0 699.0 730.0
    variable_1  (time, lat, lon) float32 0.0 0.0 0.0 0.0 0.0 ... 1.0 1.0 1.0 1.0
    variable_2  (time, lat, lon) float32 1.0 1.0 1.0 1.0 1.0 ... 0.0 0.0 0.0 0.0

In [6]: xr.decode_cf(ds)                                                                                                                           
Out[6]: 
<xarray.Dataset>
Dimensions:     (d2: 2, lat: 2, lon: 2, time: 24)
Coordinates:
  * lat         (lat) int64 0 1
  * lon         (lon) int64 0 1
  * time        (time) object 0001-02-01 00:00:00 ... 0003-01-01 00:00:00
  * d2          (d2) int64 0 1
Data variables:
    time_bound  (time, d2) object ...
    variable_1  (time, lat, lon) float32 ...
    variable_2  (time, lat, lon) float32 ...

Notice how the time axis is shifted to the right by one month.

Ideally, you would want the time to be:

In [10]: import cftime            
In [14]: cftime.num2date(ds.time_bound.mean(dim='d2'), units='days since 0001-01-01 00:00:00', calendar='noleap')                                  
Out[14]: 
array([cftime.DatetimeNoLeap(0001-01-16 12:00:00),
       cftime.DatetimeNoLeap(0001-02-15 00:00:00),
       cftime.DatetimeNoLeap(0001-03-16 12:00:00),
       cftime.DatetimeNoLeap(0001-04-16 00:00:00),
       cftime.DatetimeNoLeap(0001-05-16 12:00:00),
       cftime.DatetimeNoLeap(0001-06-16 00:00:00),
       cftime.DatetimeNoLeap(0001-07-16 12:00:00),
       cftime.DatetimeNoLeap(0001-08-16 12:00:00),
       cftime.DatetimeNoLeap(0001-09-16 00:00:00),
       cftime.DatetimeNoLeap(0001-10-16 12:00:00),
       cftime.DatetimeNoLeap(0001-11-16 00:00:00),
       cftime.DatetimeNoLeap(0001-12-16 12:00:00),
       cftime.DatetimeNoLeap(0002-01-16 12:00:00),
       cftime.DatetimeNoLeap(0002-02-15 00:00:00),
       cftime.DatetimeNoLeap(0002-03-16 12:00:00),
       cftime.DatetimeNoLeap(0002-04-16 00:00:00),
       cftime.DatetimeNoLeap(0002-05-16 12:00:00),
       cftime.DatetimeNoLeap(0002-06-16 00:00:00),
       cftime.DatetimeNoLeap(0002-07-16 12:00:00),
       cftime.DatetimeNoLeap(0002-08-16 12:00:00),
       cftime.DatetimeNoLeap(0002-09-16 00:00:00),
       cftime.DatetimeNoLeap(0002-10-16 12:00:00),
       cftime.DatetimeNoLeap(0002-11-16 00:00:00),
       cftime.DatetimeNoLeap(0002-12-16 12:00:00)], dtype=object)

Is this something that is reasonable to implement in xarray or is too domain-specific?

  1. Wrong results when applying ops such as resample() on the time axis. Because of the way xarray decodes the time, the user ends up getting wrong results when they try to do some operations along the time axis:

    • For instance, when applying annual resampling on the above dataset, we end up getting three years instead of two years:
In [19]: ds.resample(time='A').mean()                                                                                                              
Out[19]: 
<xarray.Dataset>
Dimensions:     (d2: 2, lat: 2, lon: 2, time: 3)
Coordinates:
  * time        (time) object 0001-12-31 00:00:00 ... 0003-12-31 00:00:00
  * lat         (lat) int64 0 1
  * d2          (d2) int64 0 1
  * lon         (lon) int64 0 1
Data variables:
    variable_1  (time, lat, lon) float32 0.0 0.0 0.0 0.0 ... 1.0 1.0 1.0 1.0
    variable_2  (time, lat, lon) float32 1.0 1.0 1.0 1.0 ... 0.0 0.0 0.0 0.0
  1. xarray discards the time_bounds variable. As a user, I would expect xarray to keep the time_bounds variable, and generate new values for it accordingly. For instance:
In [26]: esmlab.resample(create_dataset(), freq='ann')                                                                                             
Out[26]: 
<xarray.Dataset>
Dimensions:     (d2: 2, lat: 2, lon: 2, time: 2)
Coordinates:
  * lat         (lat) int64 0 1
  * lon         (lon) int64 0 1
  * time        (time) float64 181.7 546.7
  * d2          (d2) int64 0 1
Data variables:
    time_bound  (d2, time) float64 0.0 365.0 365.0 730.0
    variable_1  (time, lat, lon) float64 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0
    variable_2  (time, lat, lon) float64 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0

Is it reasonable to have this in xarray or is domain specific (i.e. should we handle this in esmlab?) ?

from esmlab.

andersy005 avatar andersy005 commented on August 18, 2024

xarray discards the time_bounds variable. As a user, I would expect xarray to keep the time_bounds variable, and generate new values for it accordingly.

#142 proposes a method to generate the new time-coordinate.

from esmlab.

dcherian avatar dcherian commented on August 18, 2024
  1. Is this something that is reasonable to implement in xarray or is too domain-specific?

    It is inconvenient but I don't think its wrong. The file is intentionally saved with time at the end of the month. Why wasn't it saved with the midpoint of time bounds in the first place? In any case, this won't fly (pydata/xarray#2231 (comment)). xarray basically does not deal with relationships between variables and that is unlikely to change. Our only exception (I think) is making sure bounds variables are encoded and decoded with the same units as the grid variable.

  2. As above.

  3. This looks like a bug to me. As a data variable, time_bounds should not be getting dropped. (coordinate variables get dropped pydata/xarray#2944 but I think that should change). Your code is incomplete (missing def create_dataset...). can you update that; i'll take a look.

Some of this might get solved eventually (pydata/xarray#1475) when xarray adds more fancy indexes but that's a longer term project.

I should be clear: esmlab adds very useful capabilities and a package like it is definitely required. It's just that general things like machinery for weighted operations should be moved upstream.

from esmlab.

andersy005 avatar andersy005 commented on August 18, 2024

Your code is incomplete (missing def create_dataset...). can you update that; i'll take a look.

My bad. I partially pasted the code. Here's the complete version:

import xarray as xr
import numpy as np
def create_dataset():
    start_date = np.array([0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334], dtype=np.float64)
    start_date = np.append(start_date, start_date + 365)
    end_date = np.array([31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365], dtype=np.float64)
    end_date = np.append(end_date, end_date + 365)
    ds = xr.Dataset(coords={'time': 24, 'lat': 2, 'lon': 2, 'd2': 2})
    ds['time'] = xr.DataArray(end_date, dims='time')
    ds['lat'] = xr.DataArray([0, 1], dims='lat')
    ds['lon'] = xr.DataArray([0, 1], dims='lon')
    ds['d2'] = xr.DataArray([0, 1], dims='d2')
    ds['time_bound'] = xr.DataArray(
        np.array([start_date, end_date]).transpose(), dims=['time', 'd2']
    )
    ds['variable_1'] = xr.DataArray(
        np.append(
            np.zeros([12, 2, 2], dtype='float32'), np.ones([12, 2, 2], dtype='float32'), axis=0
        ),
        dims=['time', 'lat', 'lon'],
    )
    ds['variable_2'] = xr.DataArray(
        np.append(
            np.ones([12, 2, 2], dtype='float32'), np.zeros([12, 2, 2], dtype='float32'), axis=0
        ),
        dims=['time', 'lat', 'lon'],
    )
    ds.time.encoding['units'] = 'days since 0001-01-01 00:00:00'
    ds.time.encoding['calendar'] = 'noleap'
    ds.time.encoding['bounds'] = 'time_bound'
    return ds

from esmlab.

andersy005 avatar andersy005 commented on August 18, 2024

Why wasn't it saved with the midpoint of time bounds in the first place?

I agree with you. This is a CESM issue and not a general issue, and it makes more sense to address it at the source.

from esmlab.

andersy005 avatar andersy005 commented on August 18, 2024

Or we could just have a separate minimal package to handle the strange behaviors/inconsistencies of CESM output that are problematic when using xarray.

from esmlab.

dcherian avatar dcherian commented on August 18, 2024

pydata/xarray#2922 could use some testing if someone's got the time

from esmlab.

andersy005 avatar andersy005 commented on August 18, 2024

@dcherian, I took pydata/xarray#2922 for a test drive. From the results I am getting, it looks good to me:

Screen Shot 2019-12-11 at 11 49 29 AM

from esmlab.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.