metoffice / cube_helper

A Python module for easier manipulation of Cubes with Iris

Home Page: https://cube-helper.readthedocs.io/

License: BSD 3-Clause "New" or "Revised" License

Language: Python (100.00%)
Topics: iris, python, science

cube_helper's People

Contributors: jonseddon, synapticarbors, theelectricflock

cube_helper's Issues

Filter only latest version of files

In the DRS directory structure there can be multiple versions of the same dataset. The name of the files will be identical in all versions and the only way to distinguish between them is from the directory path. Provide a way to only return files in the most recent version of a variable.
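A minimal sketch of what such a filter could look like, assuming the DRS version appears as a date-stamped path component such as v20190927 (the helper name latest_version_paths is hypothetical, not part of cube_helper):

```python
import re
from pathlib import PurePosixPath


def latest_version_paths(paths):
    """Keep only files under the most recent DRS version directory.

    Illustrative sketch: assumes the version is a date-stamped path
    component such as 'v20190927', so lexical order is chronological.
    """
    by_version = {}
    for path in paths:
        parts = PurePosixPath(path).parts
        version = next((p for p in parts if re.fullmatch(r'v\d{8}', p)), None)
        by_version.setdefault(version, []).append(path)
    # Date-stamped version strings sort chronologically as plain text.
    latest = max(v for v in by_version if v is not None)
    return by_version[latest]
```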

Add automatic fixing of known data

An example FGOALS file CMIP6/CMIP/CAS/FGOALS-f3-L/historical/r1i1p1f1/Amon/psl/gr/v20190927/psl_Amon_FGOALS-f3-L_historical_r1i1p1f1_gr_185001-201412.nc has the following first four latitude bounds:

[-89.9, -89.1],
[-88.9, -88.1],
[-87.9, -87.1],
[-86.9, -86.1],

This can be fixed with the Iris code:

# Discard the faulty bounds and let Iris recompute contiguous bounds
# from the coordinate points.
cube.coord('latitude').bounds = None
cube.coord('latitude').guess_bounds()
cube.coord('longitude').bounds = None
cube.coord('longitude').guess_bounds()

There are dangers in applying this to all data with non-contiguous latitude bounds, but we know that this fix is safe and needs applying for affected FGOALS files.

Consider adding a cube_helper function like:

ch.correct_known_issues(cube)

Users could call this if they wish (I don't believe that it should be added to ch.load() unless there is an option that is normally turned off, e.g. ch.load(<paths>, fix_known=False)). It would check the file's model name and experiment and apply any known fixes, such as the four lines above, to affected FGOALS files.
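A sketch of what such a function could look like; dispatching on the CMIP6 source_id global attribute is an assumption, and only the four-line bounds repair itself comes from the issue above:

```python
def correct_known_issues(cube):
    """Apply dataset-specific fixes to a cube.

    Illustrative sketch only: dispatching on the CMIP6 'source_id'
    global attribute is an assumption; the bounds repair is the
    FGOALS-f3-L fix described above.
    """
    if cube.attributes.get('source_id') == 'FGOALS-f3-L':
        # Discard the faulty bounds and let Iris recompute them.
        for name in ('latitude', 'longitude'):
            coord = cube.coord(name)
            coord.bounds = None
            coord.guess_bounds()
    return cube
```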

test_equaliser failing

Despite not having been altered, test_equalise_time_units is failing; it looks like a change in Iris might have caused it.

Repository size

The repository is quite large (721 MB) because of old data files that have since been removed from the working tree but are kept in the history (see .git/objects). Is there any way to prune these from the history? I suspect not, because the full history would then be lost. This isn't a major problem, as these old large files aren't in the releases that most users will download.

No cubes doesn't raise an error

If Iris tries to load an empty directory then it raises an exception:

>>> cubes = iris.load('/some/dir/*')
...
OSError: One or more of the files specified did not exist:
    * "/some/dir/*" didn't match any files

But cube_helper returns a string:

>>> cube = ch.load('/scratch/jseddon/sandbox/wibble/*')
>>> type(cube) 
<class 'str'>
>>> cube
'No cubes found'

This could confuse users, as the lack of an error may lead them to assume that cube_helper's load was successful. Should cube_helper raise an exception rather than return a string?

Create a library of test functions

I think that there's some repetition of setUp-type code in the tests. Could this be moved into a tests/common.py file and imported from there?

Constrained loading fails if more than one variable in a file

var_con = iris.Constraint(cube_func=(lambda c: c.var_name == 'vo'))
vo = ch.load(['nemo_ay652o_1m_19500101-19500201_grid-V.nc'], constraints=var_con) 

fails with

ConstraintMismatchError: failed to merge into a single cube.
  cube.long_name differs: 'VS' != 'VV'
  cube.var_name differs: 'vso' != 'v2o'
  cube.units differs: Unit('unknown') != Unit('m/s')
  cube.attributes keys differ: 'invalid_units'

because in load_from_filelist() line 283:

--> 284                                           iris.load_cube(paths[0])):

iris.load_cube() will fail because there are multiple variables in the file. A new test should be introduced that uses a file containing multiple variables. This isn't like CMIP6 data, but many users outside of CMIP6 analysis could work this way.

Correct Iris's faulty generation of altitude bounds when reading files with a hybrid height coordinate

There is a long running discussion around whether the bounds on the vertical coordinate are being appropriately set on CMORised data; PCMDI/cmor#177 and SciTools/iris#3678.

At present, any Iris v2 code will give spurious bounds, as it won't read in the b_bounds (a.k.a. sigma bounds) due to the lack of variable attributes. This leads to invalid altitude bounds over orography, as the point values are used instead.

Time constraints on ch.load alter the time origin

Using a constraint on time when loading a cube with ch.load() results in cube_helper altering the time origin of the resultant cube, e.g.:
>>> historical_constraint = iris.Constraint(time = lambda cell: cell.point.year > 1925 and cell.point.year < 2013)
>>> cube = ch.load(hist_fnames, constraints=historical_constraint)

ch.load() then reports:

cube dim coordinates differ:
latitude coords var_name inconsistent
longitude coords var_name inconsistent
time coords long_name inconsistent

cube attributes differ:
history attribute inconsistent
tracking_id attribute inconsistent
creation_date attribute inconsistent

cube time coordinates differ:
time start date inconsistent

Deleting history attribute from cubes
Deleting tracking_id attribute from cubes
Deleting creation_date attribute from cubes

New time origin set to days since 1920-01-01 00:00:00

_redirect_stdout breaks logging testing.

_redirect_stdout, a method made to capture the logged output that users are presented with for testing purposes, stops working after being used once. This means that only one test can accurately pass.

This is causing issues with testing.

Address numpy deprecations

Running pytest -vv gives the following summary of warnings, which it would be good to address:

~/conda/envs/iris3/lib/python3.7/site-packages/iris/fileformats/_ff.py:819
  ~/conda/envs/iris3/lib/python3.7/site-packages/iris/fileformats/_ff.py:819: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    def _parse_binary_stream(file_like, dtype=np.float, count=-1):

~/conda/envs/iris3/lib/python3.7/site-packages/pyke/knowledge_engine.py:28
  ~/conda/envs/iris3/lib/python3.7/site-packages/pyke/knowledge_engine.py:28: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

tests/test_cube_equaliser.py: 105 warnings
tests/test_cube_help.py: 243 warnings
tests/test_cube_loader.py: 79 warnings
  ~/conda/envs/iris3/lib/python3.7/site-packages/iris/fileformats/netcdf.py:439: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    var = variable[keys]

tests/test_cube_equaliser.py: 72 warnings
tests/test_cube_help.py: 180 warnings
tests/test_cube_loader.py: 66 warnings
  ~/conda/envs/iris3/lib/python3.7/site-packages/iris/fileformats/cf.py:186: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    return self.cf_data.__getitem__(key)
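The warnings above originate in dependencies rather than in cube_helper itself, but wherever these aliases appear the fix is mechanical, since np.float and np.bool were plain re-exports of the Python builtins. A small illustration:

```python
import numpy as np

# The deprecated aliases were re-exports of the builtins, so the builtin
# works wherever the alias did; the explicit numpy scalar type is
# available when a fixed width was actually intended.
legacy_dtype = float              # instead of dtype=np.float
flags = np.zeros(3, dtype=bool)   # instead of dtype=np.bool
wide = np.float64(1.5)            # when np.float really meant a numpy scalar
```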

issues with extracting bounded sections of cubes (decouple _fix_partial_datetime())

This issue proposes a method to extract data between two specified points along the time bounds. This could potentially be quite complicated, as it would mean de-coupling the lambda construction.

This will be done eventually, but it would also require a similar lambda construction for the bounds function. A solution like the one below was originally proposed.

def extract_bounds(cube, lower_bound, upper_bound):
    constraint = iris.Constraint(
        time=lambda cell: lower_bound <= cell.point <= upper_bound)
    return extract(cube, constraint)
