Comments (10)
We can keep it as a long-term goal. I think testing with real data is easier to implement and indeed more reliable, but it requires a lot of computation time. Therefore I think we should first focus on getting a reliable testing procedure based on real data, but it would be good to augment this with small synthetic data tests later. We could then, for example, run the real data tests before every release and the synthetic data tests more often.
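One possible way to split the two kinds of tests is sketched below, using only the standard library. The environment variables `RUN_REAL_DATA_TESTS` and `REAL_DATA_DIR`, and the functions `synthetic_timeseries` and `annual_mean`, are hypothetical names made up for illustration; none of this is existing ESMValTool code.

```python
import os
import unittest

# Hypothetical switch: export RUN_REAL_DATA_TESTS=1 before a release.
RUN_REAL_DATA = os.environ.get("RUN_REAL_DATA_TESTS") == "1"


def synthetic_timeseries(n_steps=12, start=280.0, step=0.5):
    """Return a tiny made-up temperature series, cheap to build on every run."""
    return [start + step * i for i in range(n_steps)]


def annual_mean(values):
    """Stand-in for the function under test (an assumption, not real code)."""
    return sum(values) / len(values)


class TestSynthetic(unittest.TestCase):
    """Fast tests on synthetic data; intended to run on every commit."""

    def test_annual_mean(self):
        self.assertAlmostEqual(annual_mean(synthetic_timeseries()), 282.75)


@unittest.skipUnless(RUN_REAL_DATA, "real-data tests run only before a release")
class TestRealData(unittest.TestCase):
    """Slow tests on real data; skipped unless explicitly enabled."""

    def test_data_directory_exists(self):
        # REAL_DATA_DIR is a hypothetical variable pointing at the real data.
        self.assertTrue(os.path.isdir(os.environ.get("REAL_DATA_DIR", "")))
```

With this layout the synthetic tests always run, while the real-data class is reported as skipped unless the switch is set.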
from esmvaltool.
Also useful for integration tests of the backend.
from esmvaltool.
We could use data from iris-sample from @bjlittle. Does that also include CMOR-compliant examples?
from esmvaltool.
There is a package for artificial data generation written by @bulli92 and me. It's in an early stage but it might be worth a look:
https://github.com/pygeo/dummydata
from esmvaltool.
Great! That looks exactly like what we need :-)
Would it make sense to contribute to that to generate CMIP5 compatible dummy data?
from esmvaltool.
I think so! We made a fork of this repository here:
https://github.com/ESMValGroup/dummydata
So we can continue with the development based on this fork.
from esmvaltool.
In summary, with regard to test data in iris, we have the following:
- we have iris-sample-data, which isn't really used for testing; rather, it is more user-focused and is also heavily used in our documentation to drive examples. It's intentionally lightweight, and is an optionally importable package.
- we have iris-test-data, which is used to drive our testing. This is a resource that contains richer test data in a variety of data formats. We consciously work hard not to bloat this repo with big files, so real-world data tends to be decimated in some form to make it practical but still maintain its testing utility (for whatever purpose it is intended for).
- we have synthetic test stock cubes, which are various iris cubes generated at runtime, each of which is a minimalist cube with enough data payload and cube metadata to drive specific tests. Such stock cubes are cheap and we try to use these as much as possible ... but sometimes you need real-ish data, hence the need for the other options.
- for unit tests we heavily use mock, to avoid the need for loading data from file. Although this is useful, using mock tightly couples your test code to your code base, so if the code changes you have the maintenance burden of realigning the associated mock tests.
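The trade-off in the last point can be illustrated with a minimal sketch using the standard library's `unittest.mock`. The functions `load_values` and `mean_from_file` are hypothetical and are not part of iris; the example only assumes that the code under test delegates file access to a helper.

```python
from unittest import mock


def load_values(path):
    """Hypothetical loader; real code would read values from a file."""
    raise OSError(f"cannot read {path}")


def mean_from_file(path, loader=load_values):
    """Code under test: load values via the helper and average them."""
    values = loader(path)
    return sum(values) / len(values)


def test_mean_from_file():
    # Replace the loader with a mock so no file has to exist on disk.
    fake_loader = mock.Mock(return_value=[1.0, 2.0, 3.0])
    assert mean_from_file("dummy.nc", loader=fake_loader) == 2.0
    # This check encodes *how* the helper is called, not just the result.
    # If mean_from_file later calls the loader differently, the test must
    # be realigned even though the computed mean is unchanged.
    fake_loader.assert_called_once_with("dummy.nc")
```

The first assertion tests behaviour; the second tests the call signature, which is where the maintenance burden described above comes from.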
With regard to testing itself in iris, we have a collection of helper mixin Test classes that we heavily use in our testing framework for unit and integration tests. Have a look through the main IrisTest_nometa class, where we've added extra assert methods to complement unittest (we're not yet using pytest). With regard to netCDF, we have a means of representing a netCDF file using CDL (see assertCDL) and CML (see assertCML), which may be options for ESMValTool, depending on what you want to do ... or there is the philosophy of not comparing at the file format level, but at the cube level.
HTH
from esmvaltool.
@bouweandela @jvegasbsc @valeriupredoi can we close this?
At the last workshop we discussed the problem and concluded that we need real data to perform reliable tests.
from esmvaltool.
@bouweandela @nielsdrost now that we actually have sample data and a bot, do you think this should still be open?
from esmvaltool.
Yes, because this issue is about having test data for running diagnostics. I think the ESMValTool_sample_data package is great for testing ESMValCore, but probably not realistic enough for running diagnostics with because
- Not all required variables are present
- Not all required experiments are present
- It seems likely that diagnostic scripts have not been designed to be run with a small sample of the real data
from esmvaltool.