To run the diagnostic tests, we need some sort of data to run them on.


Synthetic / Sample data needed for diagnostic tests about esmvaltool HOT 10 OPEN

esmvalgroup commented on August 18, 2024

Synthetic / Sample data needed for diagnostic tests

from esmvaltool.

Comments (10)

bouweandela commented on August 18, 2024 1

We can keep it as a long-term goal. I think testing with real data is easier to implement and indeed more reliable, but it requires a lot of computation time. Therefore I think we should first focus on getting a reliable testing procedure based on real data, and later augment this with small synthetic data tests. We could then, e.g., run the real data tests before every release and the synthetic data tests more often.
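A minimal sketch of how such a split could look, using only the standard library's unittest. The environment variable name `ESMVALTOOL_RUN_REAL_DATA_TESTS` and the toy `global_mean` diagnostic are assumptions made up for illustration; they are not part of ESMValTool.

```python
import os
import unittest

# Hypothetical switch: set ESMVALTOOL_RUN_REAL_DATA_TESTS=1 to enable the slow tests.
RUN_REAL_DATA = os.environ.get("ESMVALTOOL_RUN_REAL_DATA_TESTS") == "1"


def global_mean(values):
    """Toy stand-in for a diagnostic: the arithmetic mean of a sequence."""
    return sum(values) / len(values)


class TestWithSyntheticData(unittest.TestCase):
    """Cheap tests that could run on every commit."""

    def test_global_mean(self):
        self.assertAlmostEqual(global_mean([1.0, 2.0, 3.0]), 2.0)


@unittest.skipUnless(RUN_REAL_DATA, "real-data tests only run before a release")
class TestWithRealData(unittest.TestCase):
    """Expensive tests; the body is a placeholder for loading real model output."""

    def test_placeholder(self):
        self.skipTest("real data loading is not part of this sketch")
```

With this layout the synthetic tests always run, while the real-data class is skipped unless the switch is set, for example in a pre-release job.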


nielsdrost commented on August 18, 2024

Also useful for integration tests of the backend.


nielsdrost commented on August 18, 2024

We could use data from iris-sample from @bjlittle. Does that also include CMOR compliant examples?


bjoernbroetz commented on August 18, 2024

There is a package for artificial data generation written by @bulli92 and me. It's in an early stage but it might be worth a look:
https://github.com/pygeo/dummydata


nielsdrost commented on August 18, 2024

Great! That looks exactly like what we need :-)

Would it make sense to contribute to that to generate CMIP5 compatible dummy data?


bjoernbroetz commented on August 18, 2024

I think so! We made a fork of this repository here:
https://github.com/ESMValGroup/dummydata
So we can continue with the development based on this fork.


bjlittle commented on August 18, 2024

In summary, with regards to test data in iris, we have the following:

we have iris-sample-data, which isn't really used for testing; rather, it is more user focused and is also heavily used in our documentation to drive examples. It's intentionally lightweight, and is an optionally importable package.
we have iris-test-data, which is used to drive our testing. This is a resource that contains richer test data in a variety of data formats. We consciously work hard not to bloat this repo with big files, so real-world data tends to be decimated in some form to make it practical while still maintaining its testing utility (for whatever purpose it is intended).
we have synthetic test stock cubes, which are various different iris cubes that are generated at runtime, each of which is a minimalist cube with enough data payload and cube metadata to drive specific tests. Such stock cubes are cheap and we try to use these as much as possible ... but sometimes you need real-ish data, hence the need for the other options.
for unit tests we heavily used mock, to avoid the need for loading data from file. Although this is useful, using mock tightly couples your test code to your code base, so if the code changes you have the maintenance burden of realigning the associated mock tests.
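To illustrate the idea of cheap data generated at runtime, here is a rough sketch in plain NumPy. It is not the iris stock cube code: `make_stock_field` and `zonal_mean` are hypothetical helpers, and the dictionary is only a stand-in for a cube with coordinates and a small data payload.

```python
import numpy as np


def make_stock_field(n_time=3, n_lat=4, n_lon=8, seed=0):
    """Return a small synthetic (time, lat, lon) field with coordinate arrays.

    Hypothetical helper: a plain-NumPy stand-in for an iris stock cube.
    """
    rng = np.random.default_rng(seed)
    lat = np.linspace(-90.0, 90.0, n_lat)
    lon = np.linspace(0.0, 360.0, n_lon, endpoint=False)
    # Smooth temperature-like pattern (warm equator, cold poles) plus small noise.
    base = 288.0 - 40.0 * np.sin(np.deg2rad(lat)) ** 2
    noise = rng.normal(0.0, 0.5, (n_time, n_lat, n_lon))
    data = base[np.newaxis, :, np.newaxis] + noise
    return {"time": np.arange(n_time), "lat": lat, "lon": lon, "tas": data}


def zonal_mean(field):
    """Toy diagnostic: average over the longitude axis."""
    return field["tas"].mean(axis=-1)


field = make_stock_field()
result = zonal_mean(field)  # shape (n_time, n_lat)
```

Because the generator is seeded, the same field is produced on every run, which keeps tests built on it reproducible without storing any files.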

With regard to testing itself in iris, we have a collection of helper mixin Test classes that we heavily use in our testing framework for unit and integration tests. Have a look through the main IrisTest_nometa class, where we've added extra assert methods to complement unittest (we're not yet using pytest). With regard to netCDF, we have a means of representing a netCDF file using CDL (see assertCDL) and CML (see assertCML), which may be options for ESMValTool, depending on what you want to do ... or there is the philosophy of not comparing at the file format level, but at the cube level.
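The mixin pattern with an extra assert method that compares a text representation against a reference can be sketched as follows. This is not the iris implementation of assertCDL or assertCML; `summarise`, `SummaryAssertMixin` and `assertSummaryEqual` are hypothetical names, and the "key = value" text format is invented for the example.

```python
import unittest


def summarise(metadata):
    """Render a metadata dict as sorted 'key = value' lines (hypothetical format)."""
    return "\n".join(f"{key} = {metadata[key]}" for key in sorted(metadata))


class SummaryAssertMixin:
    """Adds an assert method comparing an object's text summary to a reference."""

    def assertSummaryEqual(self, metadata, reference_text):
        # assertMultiLineEqual prints a readable diff when the texts differ.
        self.assertMultiLineEqual(summarise(metadata), reference_text.strip())


class TestMetadata(SummaryAssertMixin, unittest.TestCase):
    def test_summary(self):
        metadata = {"units": "K", "standard_name": "air_temperature"}
        reference = """
standard_name = air_temperature
units = K
"""
        self.assertSummaryEqual(metadata, reference)
```

Comparing a text summary keeps failures readable as a diff, at the cost of tying the test to the chosen text format.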

HTH


mattiarighi commented on August 18, 2024

@bouweandela @jvegasbsc @valeriupredoi can we close this?
At the last workshop we discussed the problem and concluded that we need real data to perform reliable tests.


valeriupredoi commented on August 18, 2024

@bouweandela @nielsdrost now that we actually have sample data and a bot, do you think this should still be open?


bouweandela commented on August 18, 2024

Yes, because this issue is about having test data for running diagnostics. I think the ESMValTool_sample_data package is great for testing ESMValCore, but probably not realistic enough for running diagnostics, because:

Not all required variables are present
Not all required experiments are present
It seems likely that diagnostic scripts have not been designed to be run with a small sample of the real data

