metoffice / cube_helper

A Python module for easier manipulation of Cubes with Iris

Home Page: https://cube-helper.readthedocs.io/

License: BSD 3-Clause "New" or "Revised" License

Language: Python (100.00%)
Topics: iris, python, science

cube_helper's People

Contributors: jonseddon, synapticarbors, theelectricflock

cube_helper's Issues

Filter only latest version of files

In the DRS directory structure there can be multiple versions of the same dataset. The name of the files will be identical in all versions and the only way to distinguish between them is from the directory path. Provide a way to only return files in the most recent version of a variable.
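A minimal sketch of what such a filter could look like, assuming the DRS version appears as a date-stamped path component such as v20190927 (the helper name latest_version_paths is hypothetical, not part of cube_helper):

```python
import re
from pathlib import PurePosixPath


def latest_version_paths(paths):
    """Keep only files under the most recent DRS version directory.

    Illustrative sketch: assumes the version is a date-stamped path
    component such as 'v20190927', so lexical order is chronological.
    """
    by_version = {}
    for path in paths:
        parts = PurePosixPath(path).parts
        version = next((p for p in parts if re.fullmatch(r'v\d{8}', p)), None)
        by_version.setdefault(version, []).append(path)
    # Date-stamped version strings sort chronologically as plain text.
    latest = max(v for v in by_version if v is not None)
    return by_version[latest]
```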

Add automatic fixing of known data

An example FGOALS file CMIP6/CMIP/CAS/FGOALS-f3-L/historical/r1i1p1f1/Amon/psl/gr/v20190927/psl_Amon_FGOALS-f3-L_historical_r1i1p1f1_gr_185001-201412.nc has the following first four latitude bounds:

[-89.9, -89.1],
[-88.9, -88.1],
[-87.9, -87.1],
[-86.9, -86.1],

This can be fixed with the Iris code:

# Discard the faulty bounds and let Iris recompute contiguous bounds
# from the coordinate points.
cube.coord('latitude').bounds = None
cube.coord('latitude').guess_bounds()
cube.coord('longitude').bounds = None
cube.coord('longitude').guess_bounds()

There are dangers in applying this to all data with non-contiguous latitude bounds, but we know that this fix is safe and needs applying for affected FGOALS files.

Consider adding a cube_helper function like:

ch.correct_known_issues(cube)

Users could call this if they wish (I don't believe that it should be added to ch.load() unless there is an option that is normally turned off, e.g. ch.load(<paths>, fix_known=False)). It would check the file's model name and experiment and apply any known fixes, such as the four lines above, to affected FGOALS files.
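A sketch of what such a function could look like; dispatching on the CMIP6 source_id global attribute is an assumption, and only the four-line bounds repair itself comes from the issue above:

```python
def correct_known_issues(cube):
    """Apply dataset-specific fixes to a cube.

    Illustrative sketch only: dispatching on the CMIP6 'source_id'
    global attribute is an assumption; the bounds repair is the
    FGOALS-f3-L fix described above.
    """
    if cube.attributes.get('source_id') == 'FGOALS-f3-L':
        # Discard the faulty bounds and let Iris recompute them.
        for name in ('latitude', 'longitude'):
            coord = cube.coord(name)
            coord.bounds = None
            coord.guess_bounds()
    return cube
```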

test_equaliser failing

Despite not having been altered, test_equalise_time_units is failing; it looks like a change in Iris might have caused it.

Repository size

The repository is quite large (721 MB) because of old data files that have since been removed from the working tree but are kept in the history (see .git/objects). Is there any way to prune these from the history? I suspect not, because the full history would then be lost. This isn't a major problem, as these old large files aren't in the releases that most users will download.

No cubes doesn't raise an error

If Iris tries to load an empty directory then it raises an exception:

>>> cubes = iris.load('/some/dir/*')
...
OSError: One or more of the files specified did not exist:
    * "/some/dir/*" didn't match any files

But cube_helper returns a string:

>>> cube = ch.load('/scratch/jseddon/sandbox/wibble/*')
>>> type(cube) 
<class 'str'>
>>> cube
'No cubes found'

This could confuse users, as the lack of an error may lead them to assume that cube_helper's load was successful. Should cube_helper raise an exception rather than return a string?

Create a library of test functions

I think that there's some repetition of setUp-type code in the tests. Could this be moved into a tests/common.py file and imported from there?

Constrained loading fails if more than one variable in a file

var_con = iris.Constraint(cube_func=(lambda c: c.var_name == 'vo'))
vo = ch.load(['nemo_ay652o_1m_19500101-19500201_grid-V.nc'], constraints=var_con) 

fails with

ConstraintMismatchError: failed to merge into a single cube.
  cube.long_name differs: 'VS' != 'VV'
  cube.var_name differs: 'vso' != 'v2o'
  cube.units differs: Unit('unknown') != Unit('m/s')
  cube.attributes keys differ: 'invalid_units'

because in load_from_filelist() line 283:

--> 284                                           iris.load_cube(paths[0])):

iris.load_cube() will fail because there are multiple variables in the file. A new test should be introduced that uses a file containing multiple variables. This isn't like CMIP6 data, but many users outside of CMIP6 analysis could work this way.

Correct Iris's faulty generation of altitude bounds when reading files with a hybrid height coordinate

There is a long running discussion around whether the bounds on the vertical coordinate are being appropriately set on CMORised data; PCMDI/cmor#177 and SciTools/iris#3678.

At present, any Iris v2 code will give spurious bounds, as it won't read in the b_bounds (a.k.a. sigma bounds) due to the lack of variable attributes. This leads to invalid altitude bounds over orography, as the point values are used instead.

Time constraints on ch.load alter the time origin

Using a constraint on time when loading a cube with ch.load() results in cube_helper altering the time origin of the resultant cube, e.g.:
>>> historical_constraint = iris.Constraint(time = lambda cell: cell.point.year > 1925 and cell.point.year < 2013)
>>> cube = ch.load(hist_fnames, constraints=historical_constraint)

ch.load() then reports:

cube dim coordinates differ:
latitude coords var_name inconsistent
longitude coords var_name inconsistent
time coords long_name inconsistent

cube attributes differ:
history attribute inconsistent
tracking_id attribute inconsistent
creation_date attribute inconsistent

cube time coordinates differ:
time start date inconsistent

Deleting history attribute from cubes
Deleting tracking_id attribute from cubes
Deleting creation_date attribute from cubes

New time origin set to days since 1920-01-01 00:00:00

_redirect_stdout breaks logging testing.

_redirect_stdout, a method made to capture the logged output that users are presented with for testing purposes, stops working after being used once. This means that only one test can accurately pass.

This is causing issues with testing.

Address numpy deprecations

Running pytest -vv gives the following summary of warnings, which it would be good to address:

~/conda/envs/iris3/lib/python3.7/site-packages/iris/fileformats/_ff.py:819
  ~/conda/envs/iris3/lib/python3.7/site-packages/iris/fileformats/_ff.py:819: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    def _parse_binary_stream(file_like, dtype=np.float, count=-1):

~/conda/envs/iris3/lib/python3.7/site-packages/pyke/knowledge_engine.py:28
  ~/conda/envs/iris3/lib/python3.7/site-packages/pyke/knowledge_engine.py:28: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

tests/test_cube_equaliser.py: 105 warnings
tests/test_cube_help.py: 243 warnings
tests/test_cube_loader.py: 79 warnings
  ~/conda/envs/iris3/lib/python3.7/site-packages/iris/fileformats/netcdf.py:439: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    var = variable[keys]

tests/test_cube_equaliser.py: 72 warnings
tests/test_cube_help.py: 180 warnings
tests/test_cube_loader.py: 66 warnings
  ~/conda/envs/iris3/lib/python3.7/site-packages/iris/fileformats/cf.py:186: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    return self.cf_data.__getitem__(key)
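The warnings above originate in dependencies rather than in cube_helper itself, but wherever these aliases appear the fix is mechanical, since np.float and np.bool were plain re-exports of the Python builtins. A small illustration:

```python
import numpy as np

# The deprecated aliases were re-exports of the builtins, so the builtin
# works wherever the alias did; the explicit numpy scalar type is
# available when a fixed width was actually intended.
legacy_dtype = float              # instead of dtype=np.float
flags = np.zeros(3, dtype=bool)   # instead of dtype=np.bool
wide = np.float64(1.5)            # when np.float really meant a numpy scalar
```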

issues with extracting bounded sections of cubes (decouple _fix_partial_datetime())

This issue proposes a method to extract data between two specified points along the time bounds. This could potentially be quite complicated, as it would mean de-coupling the lambda construction.

This will be done eventually, but it would also require a similar lambda construction for the bounds function. A solution like the one below was originally proposed.

def extract_bounds(cube, lower_bound, upper_bound):
    constraint = iris.Constraint(
        time=lambda cell: lower_bound <= cell.point <= upper_bound)
    return extract(cube, constraint)
