
Comments (10)

bouweandela commented on August 18, 2024

We can keep it as a long-term goal. I think testing with real data is easier to implement and indeed more reliable, but it requires a lot of computation time. Therefore I think we should first focus on getting a reliable testing procedure based on real data, but it would be good to augment this later with small synthetic data tests. We could then, e.g., run the real data tests before every release and the synthetic data tests more often.
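
One way to split the two kinds of tests could look like the sketch below. It uses only the standard library's unittest; the environment variable name RUN_REAL_DATA_TESTS, the toy area_mean function, and both test classes are assumptions made for illustration and are not part of ESMValTool.

```python
import os
import unittest

# Hypothetical switch; the name RUN_REAL_DATA_TESTS is an assumption for this sketch.
RUN_REAL_DATA = os.environ.get("RUN_REAL_DATA_TESTS") == "1"


def area_mean(values):
    """Toy stand-in for a diagnostic: arithmetic mean of a flat list."""
    return sum(values) / len(values)


class TestSynthetic(unittest.TestCase):
    """Cheap test on synthetic data; intended to run on every commit."""

    def test_area_mean(self):
        self.assertEqual(area_mean([1.0, 2.0, 3.0]), 2.0)


@unittest.skipUnless(RUN_REAL_DATA, "set RUN_REAL_DATA_TESTS=1 to run real data tests")
class TestRealData(unittest.TestCase):
    """Expensive test; intended to run only before a release."""

    def test_placeholder(self):
        # A real test would load observational or model data here.
        self.assertTrue(RUN_REAL_DATA)
```

With this layout the synthetic tests always run, and the real data tests are skipped unless the switch is set explicitly, for example in a release job.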

from esmvaltool.

nielsdrost commented on August 18, 2024

Also useful for integration tests of the backend.


nielsdrost commented on August 18, 2024

We could use data from iris-sample from @bjlittle. Does that also include CMOR-compliant examples?


bjoernbroetz commented on August 18, 2024

There is a package for artificial data generation written by @bulli92 and me. It's in an early stage but it might be worth a look:
https://github.com/pygeo/dummydata


nielsdrost commented on August 18, 2024

Great! That looks exactly like what we need :-)

Would it make sense to contribute to that package so it can generate CMIP5-compatible dummy data?


bjoernbroetz commented on August 18, 2024

I think so! We made a fork of this repository here:
https://github.com/ESMValGroup/dummydata
So we can continue with the development based on this fork.


bjlittle commented on August 18, 2024

In summary, with regard to test data in iris, we have the following:

  • we have iris-sample-data, which isn't really used for testing, rather it is more user focused and is also heavily used in our documentation to drive examples. It's intentionally light-weight, and is an optionally importable package.
  • we have iris-test-data, which is used to drive our testing. This is a resource that contains richer test data in a variety of data formats. We consciously work hard not to bloat this repo with big files, so real-world data tends to be decimated in some form to make it practical while still maintaining its testing utility (for whatever purpose it is intended).
  • we have synthetic test stock cubes, which are various iris cubes generated at runtime, each of which is a minimalist cube with enough data payload and cube metadata to drive specific tests. Such stock cubes are cheap and we try to use these as much as possible ... but sometimes you need real-ish data, hence the need for the other options.
  • for unit tests we heavily used mock, to avoid the need for loading data from file. Although this is useful, using mock tightly couples your test code to your code base, so if the code changes you have the maintenance burden of realigning the associated mock tests.
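
The last two options above can be illustrated with a minimal sketch. It does not use iris itself: make_stock_field, load_field, and global_mean are hypothetical names invented for this example, and a plain dictionary stands in for a stock cube. Only unittest.mock from the standard library is used.

```python
from unittest import mock


def make_stock_field(n_lat=3, n_lon=4, fill=1.0):
    """Build a tiny synthetic field at runtime (hypothetical stand-in for a stock cube)."""
    return {
        "var_name": "tas",
        "units": "K",
        "data": [[fill for _ in range(n_lon)] for _ in range(n_lat)],
    }


def load_field(path):
    """Hypothetical loader that would read a real file from disk."""
    raise IOError(f"no test data available at {path}")


def global_mean(path, loader=None):
    """Hypothetical diagnostic: mean over all grid points of a loaded field."""
    field = (loader or load_field)(path)
    values = [v for row in field["data"] for v in row]
    return sum(values) / len(values)


# The mock replaces the file loader and hands back synthetic data instead.
fake_loader = mock.Mock(return_value=make_stock_field(fill=280.0))
mean = global_mean("ignored.nc", loader=fake_loader)

# This assertion is where the coupling shows: it pins down how global_mean
# calls its loader, so a refactoring of that call breaks the test.
fake_loader.assert_called_once_with("ignored.nc")
```

The synthetic field keeps the test fast and free of file access, while the call assertion is an example of the maintenance burden mentioned above.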

With regard to testing itself in iris, we have a collection of helper mixin Test classes that we use heavily in our testing framework for unit and integration tests. Have a look through the main IrisTest_nometa class, where we've added extra assert methods to complement unittest (we're not yet using pytest). With regard to netCDF, we have a means of representing a netCDF file using CDL (see assertCDL) and CML (see assertCML), which may be options for ESMValTool, depending on what you want to do ... or there is the philosophy of not comparing at the file format level, but at the cube level.
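
As a rough illustration of the mixin idea and of comparing at the object level rather than the file level, here is a sketch built on plain unittest. It is not the iris implementation: FieldAssertMixin, assertFieldAlmostEqual, and anomaly are hypothetical names, and a dictionary again stands in for a cube.

```python
import math
import unittest


class FieldAssertMixin:
    """Hypothetical mixin adding a field-level assert to unittest.TestCase."""

    def assertFieldAlmostEqual(self, actual, expected, rel_tol=1e-9, abs_tol=1e-12):
        # Compare metadata first so a failure names the mismatching key.
        for key in ("var_name", "units"):
            self.assertEqual(actual[key], expected[key], msg=f"metadata '{key}' differs")
        self.assertEqual(len(actual["data"]), len(expected["data"]), msg="data length differs")
        # Then compare the data payload value by value, within a tolerance.
        for i, (a, e) in enumerate(zip(actual["data"], expected["data"])):
            close = math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
            self.assertTrue(close, msg=f"data[{i}]: {a} != {e}")


def anomaly(field, reference):
    """Hypothetical function under test: subtract a reference value from the data."""
    return {
        "var_name": field["var_name"],
        "units": field["units"],
        "data": [value - reference for value in field["data"]],
    }


class TestAnomaly(FieldAssertMixin, unittest.TestCase):
    def test_anomaly(self):
        field = {"var_name": "tas", "units": "K", "data": [280.0, 281.5, 279.0]}
        expected = {"var_name": "tas", "units": "K", "data": [0.0, 1.5, -1.0]}
        self.assertFieldAlmostEqual(anomaly(field, reference=280.0), expected)
```

Because the comparison happens on the in-memory object, the test does not depend on how the result would be serialised to a file.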

HTH


mattiarighi commented on August 18, 2024

@bouweandela @jvegasbsc @valeriupredoi can we close this?
At the last workshop we discussed the problem and concluded that we need real data to perform reliable tests.


valeriupredoi commented on August 18, 2024

@bouweandela @nielsdrost now that we actually have sample data and a bot, do you think this should still be open?


bouweandela commented on August 18, 2024

Yes, because this issue is about having test data for running diagnostics. I think the ESMValTool_sample_data package is great for testing ESMValCore, but it is probably not realistic enough for running diagnostics, because:

  • Not all required variables are present
  • Not all required experiments are present
  • It seems likely that diagnostic scripts have not been designed to be run with a small sample of the real data

