esmvalgroup / esmvaltool Goto Github PK

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP

License: Apache License 2.0

NCL 48.12% R 7.38% Python 40.75% Shell 0.02% Emacs Lisp 1.62% Dockerfile 0.01% XSLT 0.08% Julia 0.23% TeX 1.81%

esmvaltool's Introduction

Introduction

ESMValTool is a community-developed climate model diagnostics and evaluation software package, driven both by computational performance and scientific accuracy and reproducibility. ESMValTool is open to both users and developers, encouraging open exchange of diagnostic source code and evaluation results from the Coupled Model Intercomparison Project CMIP ensemble. For a comprehensive introduction to ESMValTool please visit our documentation page.

Running esmvaltool

Diagnostics from ESMValTool are run using recipe files that contain pointers to the requested data types, directives for the preprocessing steps that data will be subject to, and directives for the actual diagnostics that will be run with the now preprocessed data. Data preprocessing is done via the ESMValCore package, a pure Python, highly-optimized scientific library, developed by the ESMValTool core developers, and that performs a number of common analysis tasks such as regridding, masking, levels extraction etc. Diagnostics are written in a variety of programming languages (Python, NCL, R, Julia) and are developed by the wider scientific community, and included after a scientific and technical review process.

Input data

ESMValTool can run with the following types of data as input:

CMIP6
CMIP5
CMIP3
observational and re-analysis datasets
obs4MIPs
ana4mips
CORDEX (work in progress)

Getting started

Please see getting started on our instance of Read the Docs as well as ESMValTool tutorial. The tutorial is a set of lessons that together teach skills needed to work with ESMValTool in climate-related domains.

Getting help

The easiest way to get help, if you cannot find the answer in the documentation in our docs, is to open an issue on GitHub.

Contributing

If you would like to contribute a new diagnostic or feature, please have a look at our contribution guidelines.

esmvaltool's People

Contributors

Stargazers

Watchers

Forkers

lbdreyer bjlittle mibotz gavin971 koldunovn subond c3s-magic cehbrecht atmoschris zklaus kurkutesa jrsuckert iremav ccarouge irenecionni wk1984 lestermk xigrug riddernina annakue nmanuben amarjiitpandde ledm katjaweigel nperezzanon rchg ruthlorenz bianh522 yifatdzigan huangynj martinini18 weilin2018 khadassah jdha nozop sciencewiki jutzhang fridunn znicholls ipelupessy jeromaerts aperezpredictia apv154 leontavares joakimkjellsson aytacpacal antsia-github lmxb jgcri minalspatil thomasremke egor7 zhiyijiang lijunde cchen-aos cguz tomastorsvik-tools ssmithclimate chrisbrierley karinvdwiel xinzhou-1 nielsdrost jvegreg sverhoeven nicewasabi esmvalbottestgroup lukasbrunner morobking tylov-climate xiajz agile-lee henry-leexy malininae forschung tjiputra lbdiaz emmadd jnegrel kmcgarry17 nvt-1009 chauncy-git bsc-es cffbots l5d1l5 shuyunina dhohn liyang0711 lprra rbeucher marpyr jon-lillis janstreffing noije yanchunhefork climatepals amy-defnet trellixvulnteam jyqjyx nicolasdettling yanchunhe

esmvaltool's Issues

stickler-ci way to verbose

Our stickler-ci setup produces seemingly endless notifications on a pull request (see e.g. #74 ).

This probably means we are a lot of issues. But, this spam flood is not helping.

Could we try to tune down the notifications?

PEP328 multi-line and relative/absolute imports

Did we want to refactor with a view to adopting PEP328 within ESMValTool?

Dockerhub settings need updating

Building the 1.1.0 docker containers does not work atm, as they relied on a specific branch (docker) to exist. As a result of #67, the development branch now will contain the docker setup as well.

We should also add the "develop", "master" and perhaps "refactor_backend" branches to dockerhub so it automatically builds images for these. master and refactor_backend do not currently contain a valid docker setup.

Lastly, Circle-CI also needs a docker container. I propose we build it from the "develop" branch, where it lives in .circleci/Dockerfile. This should be tagged with the current date.

Resulting in the following setup:

Type	Name	Dockerfile Location	Docker Tag Name
branch	development	/docker/1.1.0/centos/7/Dockerfile	1.1.0-centos7
branch	development	/docker/1.1.0/debian/8.5/Dockerfile	1.1.0-debian-8.5
branch	development	/docker/1.1.0/ubuntu/xenial/Dockerfile	1.1.0-xenial
branch	development	/Dockerfile	development
branch	development	/.circleci/Dockerfile	dependencies-20170611

And once the docker setup makes it into master and/or the refactoring branches:

Type	Name	Dockerfile Location	Docker Tag Name
branch	master	/Dockerfile	latest
branch	REFACTORING_backend	/Dockerfile	refactoring-backend

How to deal with different CMOR tables from different projects

Variable names, dimensions and metadata are not always defined consistently across the different projects.
Differences have been reported, for example, in sea-ice concentration and ozone between CMIP5 and CMIP6.

Enable stickler-ci

Could we enable the stickler-ci service for the repo please.

This will perform automated lint and pep8 tests using flake8 and pep8 on all pull-requests. The service uses sensible defaults but is easily customisable, see here.

I don't have sufficient repo privileges to enable this service.

Ping owner @veyring

Manupulations on data from GeoData object

I am trying to run namelist_lauer17rse.xml diagnostic. When enter the sst calculations get the following error:

Traceback (most recent call last):
  File "main.py", line 209, in <module>
    launcher_arguments=currDiag.get_launcher_arguments())
  File "./interface_scripts/projects.py", line 3021, in run_executable
    exit_on_warning)
  File "./interface_scripts/launchers.py", line 290, in execute
    self._execute_script(python_executable, project_info, verbosity, exit_on_warning)
  File "./interface_scripts/launchers.py", line 326, in _execute_script
    usr_script.main(project_info)
  File "./diag_scripts/sst_ESACCI.py", line 107, in main
    Diag.run_diagnostic()
  File "./diag_scripts/aux/LMU_ESACCI-diagnostics/sst_diagnostic.py", line 28, in run_diagnostic
    super(SeaSurfaceTemperatureDiagnostic, self).run_diagnostic(globmeants=self.cfg.globmeants, portrait=self.cfg.portrait, globmeandiff=self.cfg.globmeandiff, trend=self.cfg.trend)
  File "./diag_scripts/aux/LMU_ESACCI-diagnostics/diagnostic.py", line 698, in run_diagnostic
    self._portrait_statistic(self.refname)
  File "./diag_scripts/aux/LMU_ESACCI-diagnostics/diagnostic.py", line 880, in _portrait_statistic
    self._stat_r_data = self._p_stat(self._ref_data,self._ts)
  File "./diag_scripts/aux/LMU_ESACCI-diagnostics/diagnostic.py", line 859, in _p_stat
    _std_data=np.sqrt((D2.fldmean()-_mean_data**2))
TypeError: unsupported operand type(s) for ** or pow(): 'GeoData' and 'int'

So there is a line in diagnostic.py :

_std_data=np.sqrt((D2.fldmean()-_mean_data**2))

that try to do operations on the instance of the GeoData object. Removing the power operator does not really help, since then we try to subtract GeoData from GeoData and that also fail.

I use the latest version of the geoval, 0.1.5. Maybe @bulli92 can suggest a quick fix?

namelist xml to yaml

Is there motivation and consensus from the core developers to re-implement the namelists in yaml rather than xml? i.e. see PyYAML, which is available on pypi and conda-forge

Has this been considered or explored previously? It seems to me that there would be very tangible benefits to using yaml, which is certainly more user friendly than xml and it would allow us to replace the existing namelist xml parsing in ESMValTool

Auto-pep8ify the code

The code is very much not pep8 at the moment. Perhaps we can fix this.

We do need a decent amount of testing first.

use argparse instead of getopt in main.py

The standard library argparse module is easier to use than the getopt library and produces cleaner code.

If there are no objections, I would be happy to implement this.

Use standard python logging

ESMValTool uses a custom logging infrastructure. This is perhaps not needed, python has standard logging with all sorts of nice features.

Running the ESMValTool got problem

When I saw the ESMValTool on the web https://www.wcrp-climate.org/wgcm-cmip/wgcm-cmip6 , I tried to use it. As a new user, I read the ESMValTool v1.1 User’s and Developer’s Guide and learn the Chapter 6 Running the ESMValTool. So I edit the “namelist configuration file”like below and create the same file.

./work/ working directory

            <usrpath category="userDirectory" type="output" id="PLOTPATH">
                    <path>./work/plots/</path>
                    <description>directory for output plots</description>
            </usrpath>

            <usrpath category="userDirectory" type="output" id="CLIMOPATH">
                    <path>./work/climo/</path>
                    <description>directory for output files</description>
            </usrpath>

            <usrpath category="userDirectory" type="output" id="REGRPATH">
                    <path>./work/regridding/</path>
                    <description>directory for regridding files</description>
            </usrpath>

            <usrpath category="simulation" type="input" id="MODELPATH">
                    <path>./data/modeldata/</path>
                    <description>root directory of model data</description>
            </usrpath>

            <usrpath category="observation" type="input" id="OBSPATH">
                    <path>./data/obsdata/</path>
                    <description>root directory of observational data</description>
            </usrpath>

            <usrpath category="observation" type="input" id="OBSPATH2">
                    <path>./data/obsdata2/</path> 
                    <description>alternative root directory of observational data</description>
            </usrpath>

            <usrpath category="observation" type="input" id="RAWOBSPATH">
                    <path>./data/rawobsdata/</path>
                    <description>root directory of raw observational data (for reformat_obs scripts)</description>
            </usrpath>

    </pathCollection>

Secondly, I download the data from https://esgf-data.dkrz.de/search/cmip5-dkrz/ and put one of data in the file
"/data/modeldata/"
The data's name is "ta_Amon_MPI-ESM-LR_historical_r1i1p1_200001-200512.nc" according to section of"nml/namelist_MyDiag.xml"
Then, I run the tool "python main.py nml/namelist_MyDiag.nml".
The error is "No input files found in" just like below
................
PY info: NAMELIST = namelist_MyDiag.xml
PY info: WORKDIR = /public/home/xiongfl/cmip_tool/ESMValTool-master/work
PY info: CLIMODIR = /public/home/xiongfl/cmip_tool/ESMValTool-master/work/climo
PY info: PLOTDIR = /public/home/xiongfl/cmip_tool/ESMValTool-master/work/plots
PY info: LOGFILE = /public/home/xiongfl/cmip_tool/ESMValTool-master/work/ref-sacknows_MyDiag.log
PY info: _____________________________________________________________
PY info:
PY info: Starting the Earth System Model Evaluation Tool v1.1.0 at time: 2017-04-27 -- 21:37:02...
PY info:
PY info: MODEL = MPI-ESM-LR (CMIP5_ETHZ)
PY info: No input files found for ta (T3M) as ./data/modeldata//ETHZ_CMIP5/historical/Amon/ta/MPI-ESM-LR/r1i1p1/ta_Amon_MPI-ESM-LR_historical_r1i1p1*.nc
Traceback (most recent call last):
File "main.py", line 168, in
project_info)
File "./interface_scripts/diagdef.py", line 369, in select_base_vars
infile)
IOError: [Errno 2] No input files found in : u'./data/modeldata//ETHZ_CMIP5/historical/Amon/MyVar/MPI-ESM-LR/r1i1p1/MyVar_Amon_MPI-ESM-LR_historical_r1i1p1*.nc'

So my question is how to put the data in right path?

More informative error message when python script can't be imported

At present if there is a problem when importing python script the exception is raised that is not very informative and looks like this:

Traceback (most recent call last):
  File "main.py", line 209, in <module>
    launcher_arguments=currDiag.get_launcher_arguments())
  File "./interface_scripts/projects.py", line 3021, in run_executable
    exit_on_warning)
  File "./interface_scripts/launchers.py", line 289, in execute
    self._execute_script(python_executable, project_info, verbosity, exit_on_warning)
  File "./interface_scripts/launchers.py", line 314, in _execute_script
    raise ValueError('The script %s can not be imported!' % python_executable)
ValueError: The script ./diag_scripts/sst_ESACCI.py can not be imported!

So there is no particular reason on why the script can't be imported and hence it's quite hard to debug. In my case, for example, the python module were missing, but from the error message there were no way to understand that. I suggest changing this part of launchers.py ether by adding at least

 print("Error while importing python script:", sys.exc_info())

that in my case would lead to:

('Error while importing python script:', (<type 'exceptions.ImportError'>, ImportError('No module named geoval.core.data',), <traceback object at 0x7f42ea5e29e0>))

or if you don't mind additional import use traceback module:

import traceback
...
try:
    exec cmd
except:
     print(cmd)
     print(traceback.format_exc())
     raise ValueError('The script %s can not be imported!' % python_executable)

that lead to:

import sst_ESACCI.py
Traceback (most recent call last):
  File "test.py", line 19, in <module>
    exec cmd
  File "<string>", line 1, in <module>
  File "./diag_scripts/sst_ESACCI.py", line 42, in <module>
    from sst_diagnostic import SeaSurfaceTemperatureDiagnostic
  File "./diag_scripts/aux/LMU_ESACCI-diagnostics/sst_diagnostic.py", line 1, in <module>
    from diagnostic import *
  File "./diag_scripts/aux/LMU_ESACCI-diagnostics/diagnostic.py", line 17, in <module>
    from geoval.core.data import GeoData
ImportError: No module named geoval.core.data

Traceback (most recent call last):
  File "test.py", line 23, in <module>
    raise ValueError('The script %s can not be imported!' % python_executable)
ValueError: The script sst_ESACCI.py can not be imported!

that is way easier to debug. I can do PR with this fix.

Default linear regridder and nearest regridder extrapolation mode

At the moment, the backend regrid function supports the 'linear' and 'nearest' schemes in order to perform bilinear interpolation regridding and nearest neighbour regridding.

For both of these schemes, the regridder uses the default extrapolation_mode - see the associated docs iris.analysis.Linear and iris.analysis.Nearest for further details.

In both cases, the default extrapolation_mode may not be appropriate, given the users expectation and their data e.g. @valeriupredoi used the linear regridder on a limited domain dataset of air_temperature in units of K over North America i.e. iris.sample_data_path('E1_north_america.nc'), and regridded to a 30x30 global target grid ... needless to say, the resultant regridded cube contained extrapolated data points below 0K. Bummer.

The simple fix (on my advice) was for him to change the default linear regridder to be Linear(extrapolation_mode="mask") in his branch. This indeed, resolved the issue.

So that all said,

do we want to change the default extrapolation_mode for both linear and nearest schemes? And if so, to what? I propose mask.
do we want users to be able to specify the extrapolation_mode in their namelist? And if so, the parser, the orchestrator (or what ever it's called these days) and the API to the backend regrid function need to change to support this.

Thoughts?

Input and Output folder structure for new preprocessor/backend

There are multiple reasons to have a look at the data folders used by ESMValTool:

The new preprocessor/backend will need a place to keep pre-processed files.
The interface file is hindering using ESMValTool as a service (see #3 )
Namelists are now sometimes writing to the same folder, causing problems in case of parallelization.

@axel-lauer and I brainstormed on a plan. We came up with the following proposal:

Folders:
/rawobsdata: Raw observation files in for example csv format.
/obsdata: Processed observation files in NetCDF format.
/modeldata: Model data files in NetCDF format.
/cache: cache of pre-processor/backend.
/output/$NAMELIST_NAME/: output of namelist_name (one folder per namelist)

the output folder of each name list will be created by the workflow manager (main.py) and contain:

plots created by this namelist.
nc files created by this namelist.
log file for the namelist
a file per diagnostic containing all the settings needed to run the diagnostic. This contains all the information currently in:
- the "interface file"
- the diagnostic config files
- the namelist
- the environment variables

The output folder of a namelist will also be used as the current-working-directory of a diagnostic, so it can simply write output without worrying too much about paths. The "settings" file will be passes as the first and only command line argument to the tool. The settings file will be in a format that makes sense for the diagnostic. Possibly a yaml file for Python and R diagnostics, and a ncl file for ncl diagnostics.

This should solve a few issues at once, including problems with provenance, not being able to run multiple diagnostics at once, output files in the (possibly read only) esmvaltool installation folder, etc.

Documentation: which docstring standard?

Many of the current python classes has no docstrings yet. To be able to easily generate also the documentation for individual classes/functions or entire modules, we need to agree on some convention for the the docstring formats.

https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt

There are several ways to do that. Details here
a) docstrings like typically expected by Spinx
b) Google like docstrings
c) Numpy like docstrings

Personally I think option a) is quite odd. In other projects I am using Numpy like style (option c), but we should somehow agree on how we do it.

Please comment

Naming convention for preprocessed files

Additional tag for names of preprocessed output files

Here is an idea for a replacement of the "field type" tag used in the names of the output files produced by the backend:

t<tsteps>z<lev>y<tag>x<tag>

with

t = time dimension
z = vertical dimension (level)
y = horizonal dimension 1 (latitude)
x = horizontal dimension 2 (longitude)

t: <tsteps> gives the number of time steps in the output file; together with start and end year this should provide the possibility to distinguish between different time resolutions (e.g. monthly, daily, 3-hourly) for otherwise identical parameters; example: t360 = 360 time steps in the preprocessed output file
z: <lev> is only specified if a specific level has been extracted; suboptions would by
- <pnnn> for pressure levels, e.g. p850 for 850 hPa
- <mnnn> for model levels, e.g. m5 for the 5th model level
- <znnn> for altitude, e.g. z12 for 12 km
y and x: <tag> specifies special operations such as zonal averages, latitude belts, etc.
- <m> for mean
  example: xm = zonal mean

Examples:

Missing dimensions are simply omitted.

t360zyx = 360 time steps (e.g. 10 years of monthly means), full 3-dim (+ time) field
t360yx = 2-dim (+ time) field
t360zp850yx = pressure level 850 hPa extracted from 3-dim (+ time) field
t360zyxm = zonal means (2-dim (+ time) field)
t360z = vertical profile at specific location (e.g. station measurement), no x- and y-dimensions

conda-forge recipe

#We should make the public ESMValTool easily available to the community by authoring a recipe on staged-recipes of conda-forge.

This will allow us to easily install the project as:

$ conda install -c conda-forge esmvaltool

ESMValTool Docker container

Did we want to consider developing an automated build of an ESMValTool docker image on Docker Hub, triggered by PR changes to a github controlled Dockerfile (centos:6 base image ?) with all the environment dependencies containerized?

NCL can't create netCDF file that already exists

When I use force_processing option and the reformatted file already exists, I get the following error:

NCL ERROR MESSAGE: fatal:Could not create (/mnt/lustre01/work/ab0995/a270088/ESMV/climo/CMIP5/CMIP5_Amon_MM_AWI-CM_r1i1p1_T2Ms_pr-mmday_2008-2040.nc)

The reformatting is triggered by this call from reformat.py:

    if ((not os.path.isfile(project_info['TEMPORARY']['outfile_fullpath']))
            or project_info['GLOBAL']['force_processing']):

        info("  Calling " + reformat_script + " to check/reformat model data",
             verbosity,
             required_verbosity=1)

        projects.run_executable(reformat_script, project_info, verbosity,
exit_on_warning)

ESMValTool/interface_scripts/reformat.py

Lines 172 to 181 in af4f2aa

    
           # Execute the ncl reformat script 
        
           if ((not os.path.isfile(project_info['TEMPORARY']['outfile_fullpath'])) 
        
                   or project_info['GLOBAL']['force_processing']): 
        
               info("  Calling " + reformat_script + " to check/reformat model data", 
        
                    verbosity, 
        
                    required_verbosity=1) 
        
               projects.run_executable(reformat_script, project_info, verbosity, 
        
                                       exit_on_warning)

And the error is happened when executing this NCL code (data_handling.ncl):

    ;; Output data to file
    info_output("adding file " + out_file, verbosity, 2)
    fout = addfile(out_file, "c")
    filedimdef(fout, "time", -1, True)
    fout->$variable$ = new_data

ESMValTool/interface_scripts/data_handling.ncl

Lines 123 to 127 in af4f2aa

    
           ;; Output data to file 
        
           info_output("adding file " + out_file, verbosity, 2) 
        
           fout = addfile(out_file, "c") 
        
           filedimdef(fout, "time", -1, True) 
        
           fout->$variable$ = new_data

Looks like NCL do not support recreation of files that already exist.

So my question is it is a bug or feature? :) If this behavior is not intended, I would suggest removing the file with os.remove() in the reformat.py.

Synchronization of "development" branches between ESMValTool and ESMValTool-private

Are there plans to sync the branches?

My problem is - I would like to start developing diagnostic in the private repository (ESMValTool-private), but I need features already added to "development" branch of the public one (ESMValTool). I can re-implement the features in the private branch, but it's kind of ugly.

Ping @mattiarighi

ESMValTool CI

As developers we need continual-integration testing using either travis or circleci to run the unit and integration tests automatically as part of each PR.

In our workflow we should insist that no PR is merged without the CI tests passing - this should be a minimum criterion.

ESMValTool setup.py

As a developer I want a setup.py script to build and install ESMValTool.

This will also, for example, allow me as a developer to perform a python setup.py develop into a conda environment, as part of the development workflow.

Using setup.py can also be used as a hook to commence the stock of unit/integration tests with

$ python setyp.py test

python-stratify

I've already tagged a pre-release of python-stratify (now needed within ESMValTool for vertical interpolation) but I still need to create a conda-forge recipe so that it is available to install in our development conda environments - it also would be helpful for our docker image creation/maintenance.

use editorconfig

EditorConfig helps developers define and maintain consistent coding styles between different editors and IDEs. The EditorConfig project consists of a file format for defining coding styles and a collection of text editor plugins that enable editors to read the file format and adhere to defined styles. EditorConfig files are easily readable and they work nicely with version control systems.

I have added the .editorconfig with configuration in editorconfig branch here.
Once the .editorconfig file is added to to repository it will be recognised by the editor or IDE and any opened file will be automatically reformatted. To avoid a series of uncontrolled reformatting going on it'll be good to run at the beginning batch reformatting of all the files. This can be for example done with editorconfig-tools npm package.

python functions for info / error / warning messages to stdout

Messages to stdout in python are currently handled by two functions in interface_scripts/auxiliary.py: info (depending on a user-specified verbosity level) and error. A function for warning messages is not available.

These functions shall be extended and all print statements in the python code replaced by them.

A required feature is the ability to print the calling function/script together with the message, something like:

info(script/function, string, verbosity, required_verbosity)
warning(script/function, string, exit_on_warning)
error(script/function, string)

interface communication files should not be written into interface_data folder

Currently the ESMValTool writes temp files used for Python/NCL communication into the interface_data folder of the ESMValTool installation.

The folder name for these temp files should be configurable for each ESMValTool run. The installation folder of ESMValTool should be read-only.

integration of ESGF coupling module

The existing ESGF coupling module should be integrated into the ESMValTool core.

Convert config.ini to config.yml

For consistency with other configuration files in the tool, config.ini shall be converted to yml.
This will also allow specifying the class-dependent rootpaths as dictionaries.

namelists should point to config_private.xml

Currently they all point to DLR version of configuration file.

Index error in /interface_scripts/projects.py

I am trying to run the ESMValTool at IITM(India) HPC Aaditya and the following error is coming from the file /interface_scripts/projects.py

Traceback (most recent call last):
File "main.py", line 180, in
reformat.cmor_reformat(currProject, project_info, base_var, model)
File "./interface_scripts/reformat.py", line 177, in cmor_reformat
exit_on_warning)
File "./interface_scripts/projects.py", line 2708, in run_executable
write_data_interface(string_to_execute, project_info)
File "./interface_scripts/projects.py", line 2687, in write_data_interface
currInterface = vars(data_interface)suffix.title() + '_data_interface'
File "./interface_scripts/data_interface.py", line 321, in init
Data_interface.init(self, project_info)
File "./interface_scripts/data_interface.py", line 75, in init
= self.get_interface_fileinfo(project_info)
File "./interface_scripts/data_interface.py", line 139, in get_interface_fileinfo
exp=self.exp))
File "./interface_scripts/projects.py", line 288, in get_figure_file_names
msd = self.get_model_sections(model)
File "./interface_scripts/projects.py", line 30, in get_model_sections
for modelpart in self.model_specifiers]
File "./interface_scripts/projects.py", line 22, in get_model_subsection
model.split_entries()[self.model_specifiers.index(model_section)]
IndexError: list index out of range

Can you please let me know the fix for this issue

Installation issues have been reported, especially in getting NCL.

ESGF_CMIP5 class overwrites climofiles for different model versions

I took the models MPI-ESM-LR and MPI-ESM-MR on DKRZ in the ESGF_CMIP5 class, and for both models the modelname used for the climofiles or plots is MPI-M. In the end, there is just one climofile with the modelname MPI-M.

Overlapping development between projects

Hi.

We realised this week that ESMValTool has two diagnostics for blocking coming from projects, one from MAGIC and another one from PRIMAVERA with no coordination between them. In fact, it was even worse: the blocking for PRIMAVERA was already a merged version of two diagnostics developed separately.

I think this kind of issue is going to become more common in the future and that we need a strategy to improve coordination between projects.

My suggestion is;

Create a GitHub project for each research project involved in the tool
Create an issue for each diagnostic or metric that is going to be added, including if possible the due date for it.

Once the diagnostics are added there are two possibilities:

No conflicts: Bingo!
There is already a version on the tool or under development: a decision has to be made. This can lead to joint developments between projects, enhancements to existing diagnostics or even to include independent alternatives if we see a good reason for that.

What do you think?

pylint support

We should add pylint support (either in documentation, with a script, in setup.py, ....)

pep8 code style checking for python

The coding style of Python code according to PEP8 rules can be checked by the flake8 tool:

http://flake8.pycqa.org/en/latest/index.html

pep8 rules can be customized for your project. To have a consistent coding style helps to avoid merge conflicts just due to formatting changes (sometimes automatically done by your editor).

To have a chance to cope with pep8 rules you need to configure your editor. See example:

http://birdhouse.readthedocs.io/en/latest/dev_guide.html#python-code-style

pep8 checks can be integrated in continuous integration platforms like travis. See example from the pywps project:

https://github.com/geopython/pywps/blob/develop/.travis.yml

Automated tests ... toward CI

I have seen that the draft I developed for testing individual diagnostics (tests/test_diagnostics) using synthetic data has already been moved from my original SVN branch to the current refactoring branch.

I am a bit puzzled about that, as the code was not ready yet to be merged.

Thus my question is now
a) who was doing that integration
b) what would be the baseline for the further testing. Should I branch off from the current refactoring backend branch?

This is quite essential if we want to use the testing for continuous integration with services like travis-ci

Best, Alex

PEP440 version identification

Did we want to adopt the PEP440 version identification standard for ESMValTool or more of a semantic versioning approach?

PEP440 makes sense to me, and if there is consensus did we want to also consider use of versioneer?

config_private_DLR-PA2.xml in many namelists

I think in the release version it should be just, config_private.xmlotherwise might be misleading for new users.

Move all machine-dependent path settings to a developer's configuration file

Machine-dependent paths for the different project classes (CMIP5, OBS, EMAC etc) are defined in a DataFinder class in data_finder.py.

Since more and more settings will be added in the future, it is desirable to have them in a configuration file without need to change the code (also in view of the ESGF coupling).

This configuration file is to be considered as a developer's config, while the config.ini (later config.yml) is the user config.

get_wks with epsi produces filename.epsi.epsi

Choosing epsi as output file type and using get_wks (plot_scripts/ncl/aux_plotting.ncl) produces output files, where epsi is attached two times: filename.epsi.epsi
It is attached by get_outfile_name and once again by the ncl intrinsic function gsn_open_wks.

Suggestion to resolve this issue:
Change in get_wks line : wks = gsn_open_wks(file_type, basename(outfile) )

Documentation

Documentation should be done in the future using Sphinx and continuous integration.

Could some of the admins please enable ReadTheDocs as service, the we can further go ahead with the automatic compilation of the documentation

Backend Milestone

Should we commit to a development backend milestone for the project?

If so, then create one for the project here and we then also need to agree on a date for the milestone.

Milestones can then be added to relevant issues and pull-requests, which makes it easy to see what is on-going and outstanding with respect to the milestone.

files only differing in using upper or lower case name

There are 8 ncl files that have version with upper and lower case named. This causes issues on non case sensitive file systems. Following files are an issue:
reformat_scripts/cmor/CMOR_NBP.dat
variable_defs/CSOIL.ncl
variable_defs/CVEG.ncl
variable_defs/FGCO2.ncl
variable_defs/GPP.ncl
variable_defs/LAI.ncl
variable_defs/NBP.ncl
variable_defs/O2_onelev.ncl

Synthetic / Sample data needed for diagnostic tests

For running the diagnostic tests we need some sort of data to run these diagnostics on. Could be based on a few simple models, completely synthetic, or otherwise.

Gitter on ESMValGroup

It would be nice to have a gitter channel on the organization level to "chat" among users and developers.

Coverage reporting setup in Codacy

We should add test coverage reporting to Codacy so can see how we are doing with testing.

https://support.codacy.com/hc/en-us/articles/207279819-Coverage
https://github.com/codacy/python-codacy-coverage#setup

As we use multiple languages this could get rather complex. I suggest we start by only computing coverage for python.

As a side note, it seems our Coday setup is in "bjoernbroetz". Perhaps we should migrate it to "ESMValGroup"

Constants and their units for cube calculations

There should be a central source for constant values within the ESMValTool.
- one source might be scipy.constants

>>>from scipy import constants
>>>constants.value('Avogadro constant'), constants.unit('Avogadro constant')
6.022140857e+23, 'mol^-1'

- less common constants should be defined centrally

For cube operations involving constants there should be a way to automatically generate the unit of the resulting cube.
- cf_units (iris dependency) has the functionality to parse and combine units

>>>import cf_units
>>>g = 9.81
>>>g_unit = cf_units.Unit('m s^-2')
>>>mw_air = 100
>>>mw_air_unit = cf_units.Unit('g mol^-1')
>>>g * mw_air, g_unit * mw_air_unit
>>>print(g * mw_air, g_unit * mw_air_unit)
(981.0, Unit('0.001 meter-kilogram-second^-2-mole^-1'))

With a small class to prevent modifications to the constants this could then look like this:

class _Const(object):
    def __init__(self, value, unit):
        self._value = value
        self._unit = unit
    
    @property
    def value(self):
        return self._value
    
    @value.setter
    def value(self, x):
        raise TypeError
    
    @property
    def unit(self):
        return self._unit
    
    @unit.setter
    def unit(self, x):
        raise TypeError

>>>G = _Const(9.80665, 'm s^-2')
>>>print(G.value, G.unit)
9.80665 m s^-2
>>>G.value = 3
TypeError: Not allowed to change the value of a constant.
>>>G.unit = 'kg'
TypeError: Not allowed to change the unit of a constant.

new_cube = cube * G.value
new_cube.unit = cube.units * G.unit

Depreciated 'commands' module since python 2.6

Get the following error:
Traceback (most recent call last):
File "main.py", line 36, in
from auxiliary import info, error, print_header, ncl_version_check
File "./interface_scripts/auxiliary.py", line 3, in
import commands
ModuleNotFoundError: No module named 'commands'

using python3. O.k, this is not supported yet, but the commands module is actually depreciated since python 2.6. It should be replaced by subprocess.

Docker setup for different NCL versions

It would be great to have docker containers for different NCL versions. NCL 6.3 is not supported, but we should setup a container for NCL 6.2 and NCL 6.4.

This would enable us to run unit tests and diagnostic tests in both environments.

Is related also to #68 .
@nielsdrost , could you take care of this?

using NCL from conda

NCL (version 6.3.0) can now be installed via conda:

https://www.ncl.ucar.edu/Download/conda.shtml

The ESMValTool requirements can be installed with a conda environment.yml file:

https://conda.io/docs/using/envs.html#share-an-environment

	# Execute the ncl reformat script
	if ((not os.path.isfile(project_info['TEMPORARY']['outfile_fullpath']))
	or project_info['GLOBAL']['force_processing']):

	info(" Calling " + reformat_script + " to check/reformat model data",
	verbosity,
	required_verbosity=1)

	projects.run_executable(reformat_script, project_info, verbosity,
	exit_on_warning)

	;; Output data to file
	info_output("adding file " + out_file, verbosity, 2)
	fout = addfile(out_file, "c")
	filedimdef(fout, "time", -1, True)
	fout->$variable$ = new_data