
CliMAF's Introduction

CliMAF : a Climate Model Assessment Framework

The CliMAF documentation is available at [Readthedocs](http://climaf.readthedocs.org/), which includes installation instructions.

The aim of CliMAF is to enable easy, collaborative development of climate model output assessment suites by climate scientists with varied IT backgrounds, and ultimately to share such suites for the benefit of climate science.

It is basically a Python-scriptable way to process CF-compliant NetCDF climate model outputs by piping arbitrary executables, with easy access to datasets and automated caching of results.

CliMAF was designed by Stéphane Sénési, who also developed most of the code before version 2.



CliMAF's Issues

An EOF operator for CliMAF and multi-variable outputs

The computation of EOFs comes with an issue: we normally have two outputs from an EOF decomposition (the eigenvectors and the principal components). We might also need to have access to the eigenvalues (explained variance).
Therefore two strategies are possible:

  • we keep the rule that CliMAF only handles one-variable output files, and we find a way to produce the EOFs, the principal components and the eigenvalues as separate outputs (CDO might be able to do that; a sketch of this route follows after this list)
  • we produce one NetCDF file with more than one variable (in this case, three)

Discussion is open!
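For the first route, here is a minimal sketch of a multi-output operator declaration. It assumes CliMAF's secondary-output syntax for scripts (${out_<name>}) and CDO's eof operator, which writes the eigenvalues to its first output file and the EOF patterns to its second; the operator name cdo_eof and the neof parameter are illustrative.

from climaf.api import *

# Declare a CDO-based EOF operator with one main output (the eigenvalues)
# and one secondary output (the EOF patterns); 'neof' is a call-time parameter.
cscript('cdo_eof', 'cdo eof,${neof} ${in} ${out} ${out_pattern}')

# Usage sketch:
# d = ds(project='CMIP5', variable='tas', ...)
# eigenvalues = cdo_eof(d, neof=5)   # main output
# patterns = eigenvalues.pattern     # secondary output, if exposed as an attribute
# The principal components would still need 'cdo eofcoeff', which writes one
# file per EOF and therefore needs further thought.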

Automatic cache management

When the cache becomes heavily loaded with results, the time spent scanning the index can be of the same order as the time spent actually computing the result (especially when producing big atlases).

Here are some ideas for implementing automatic, smart cache management (applied, for instance, at the end of a routine CliMAF script).

Each time we do a cfile (only when it leads to a new result), we keep:

  • the time spent by CliMAF to scan the index
  • the time spent to obtain the result

Using this information, we could clean the cache (say, with user-provided instructions at the end of the CliMAF script) by removing:

  • the files that have a longer 'search time' than 'execution time'
  • the files that have a longer 'search time' than a user-provided threshold (say, 1s)

We could also decide that we don't want more than XX files in the cache (to bound disk usage), keeping the XX files with the longest 'execution time' and removing all the others.
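A hypothetical sketch of the cleaning policy described above; the record structure (path, search_time, exec_time) and the function name are illustrative, not part of the current cache API.

from collections import namedtuple

CacheRecord = namedtuple("CacheRecord", ["path", "search_time", "exec_time"])

def clean_cache(records, search_time_threshold=1.0, max_files=None):
    """Return the cache files to remove, following the rules listed above."""
    to_remove = set()
    for rec in records:
        # Rule 1: scanning the index costs more than recomputing the result
        if rec.search_time > rec.exec_time:
            to_remove.add(rec.path)
        # Rule 2: the search time exceeds a user-provided threshold
        elif rec.search_time > search_time_threshold:
            to_remove.add(rec.path)
    if max_files is not None:
        # Rule 3: keep only the max_files results that were costliest to compute
        kept = {r.path for r in sorted(records, key=lambda r: r.exec_time,
                                       reverse=True)[:max_files]}
        to_remove.update(r.path for r in records if r.path not in kept)
    return sorted(to_remove)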

Outputs and logs written to current directory

I'm using CliMAF in a web processing service, and the CliMAF executable is started in a directory where it might not have write permission (it runs as an unprivileged service user). CliMAF writes log files (and probably more) to the current directory, which fails in my service use case.

The output paths that require write permission should be made configurable (logs/, temp/, outputs/, ...).
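A hypothetical sketch of one way to do this, using an environment variable; the CLIMAF_LOG_DIR name and the logging setup are illustrative, not existing CliMAF options.

import logging
import os

# Write logs to a user-chosen directory, falling back to the current one
log_dir = os.environ.get("CLIMAF_LOG_DIR", os.getcwd())
if not os.path.isdir(log_dir):
    os.makedirs(log_dir)
logging.basicConfig(filename=os.path.join(log_dir, "climaf.log"),
                    level=logging.INFO)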

Interface to Drakkar CDFTools

CliMAF should provide an interface to the following Drakkar CDFTools: cdfmean, cdfheatc, cdftransport, cdfsection, cdfmxlheatc and cdfstd.

Fix for daily CMIP5 datasets

I've encountered difficulties reaching daily datasets in the CMIP5 archive:

summary(ds(project = 'CMIP5', variable='pr', model = 'GFDL-CM3', experiment = 'historical', frequency = 'daily', period = '19900101-19901231'))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-3564c6456b32> in <module>()
      9 #crm(pattern='ensemble_ts_plot')
     10 #ncdump(clim_average(ds(variable='tas', **cmip_dict), 'JJA'))
---> 11 summary(ds(variable='pr', **cmip_dict))
     12 #if 'daily' in test.crs:
     13 #    print 'ok'

/home/jservon/Evaluation/CliMAF/climaf_installs/climaf_1.0.3_CESMEP/climaf/functions.pyc in summary(dat)
    348             print '--'
    349     elif isinstance(dat,classes.cdataset):
--> 350         if not dat.baseFiles():
    351             print '-- No file found for:'
    352         else:

/home/jservon/Evaluation/CliMAF/climaf_installs/climaf_1.0.3_CESMEP/climaf/classes.pyc in baseFiles(self, force)
    446                 if filenameVar : dic["filenameVar"]=filenameVar
    447             clogger.debug("Looking with dic=%s"%`dic`)
--> 448             self.files=dataloc.selectLocalFiles(**dic)
    449         return self.files
    450 

/home/jservon/Evaluation/CliMAF/climaf_installs/climaf_1.0.3_CESMEP/climaf/dataloc.py in selectLocalFiles(**kwargs)
    196             rep.extend(selectEmFiles(**kwargs2))
    197         elif (org == "CMIP5_DRS") :
--> 198             rep.extend(selectCmip5DrsFiles(urls,**kwargs2))
    199         elif (org == "generic") :
    200             rep.extend(selectGenericFiles(urls, **kwargs2))

/home/jservon/Evaluation/CliMAF/climaf_installs/climaf_1.0.3_CESMEP/climaf/dataloc.py in selectCmip5DrsFiles(urls, **kwargs)
    517                     #if freqd in ['daily','day']:
    518                     #   regex=r'^.*([0-9]{4}[0-9]{2}[0-9]{2}-[0-9]{4}[0-9]{2}[0-9]{2}).nc$'
--> 519                     fileperiod=init_period(re.sub(regex,r'\1',f))
    520                     if (fileperiod and period.intersects(fileperiod)) :
    521                         rep.append(f)

/home/jservon/Evaluation/CliMAF/climaf_installs/climaf_1.0.3_CESMEP/climaf/period.pyc in init_period(dates)
    160     start=(4-len(start))*"0"+start
    161     # TBD : check that start actually matches a date
--> 162     syear  =int(start[0:4])
    163     smonth =int(start[4:6])  if len(start) > 5  else 1
    164     sday   =int(start[6:8])  if len(start) > 7  else 1

ValueError: invalid literal for int() with base 10: '/pro'

It actually comes from dataloc.py: the 'regex' used there is only valid for monthly datasets. I've tried the following patch in dataloc.py (line 516) and it works:
replace:

                    regex=r'^.*([0-9]{4}[0-9]{2}-[0-9]{4}[0-9]{2}).nc$'

with:

                    if freqd in ['monthly','mo']:
                       # monthly file names end in YYYYMM-YYYYMM.nc
                       regex=r'^.*([0-9]{4}[0-9]{2}-[0-9]{4}[0-9]{2}).nc$'
                    if freqd in ['daily','day']:
                       # daily file names end in YYYYMMDD-YYYYMMDD.nc
                       regex=r'^.*([0-9]{4}[0-9]{2}[0-9]{2}-[0-9]{4}[0-9]{2}[0-9]{2}).nc$'

I will set up a PR as soon as possible.
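For reference, a quick standalone check of the two patterns; the daily file name below is an illustrative example of CMIP5 daily naming, and the monthly one is taken from another issue on this page.

import re

monthly_regex = r'^.*([0-9]{4}[0-9]{2}-[0-9]{4}[0-9]{2}).nc$'
daily_regex = r'^.*([0-9]{4}[0-9]{2}[0-9]{2}-[0-9]{4}[0-9]{2}[0-9]{2}).nc$'

monthly_file = "pr_Amon_IPSL-CM5A-MR_historical_r1i1p1_185001-200512.nc"
daily_file = "pr_day_GFDL-CM3_historical_r1i1p1_19900101-19941231.nc"

print(re.sub(monthly_regex, r'\1', monthly_file))  # -> 185001-200512
print(re.sub(daily_regex, r'\1', daily_file))      # -> 19900101-19941231
# Note: the monthly pattern does not match daily file names, so re.sub returns
# them unchanged, which is what makes init_period() fail in the traceback above.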

Beta-testing -- Adding realm to the dataloc class

When exploring the dataloc functionality with the example available in cmip5drs.py, I got two files for the following request:

urls_CMIP5_Ciclad=["/prodigfs/esg"]
dataloc(organization="CMIP5_DRS", url=urls_CMIP5_Ciclad)
cdef("frequency","monthly") ;  cdef("project","CMIP5")
tas1pc=ds(model="IPSL-CM5A-MR", experiment="historical", variable="pr", period="1860-1961")
files=tas1pc.selectFiles()
print files

Here is what I get:

/prodigfs/esg/CMIP5/merge/IPSL/IPSL-CM5A-MR/historical/mon/atmos/Amon/r1i1p1/v20111119/pr/pr_Amon_IPSL-CM5A-MR_historical_r1i1p1_185001-200512.nc
/prodigfs/esg/CMIP5/merge/IPSL/IPSL-CM5A-MR/historical/mon/ocean/Omon/r1i1p1/v20111119/pr/pr_Omon_IPSL-CM5A-MR_historical_r1i1p1_185001-200512.nc

I've tried adding cdef("realm","atmos") but it didn't change the result.

cdef("frequency","monthly") ;  cdef("project","CMIP5") ; cdef("realm","atmos")

If you confirm that this is relevant, I'll try to make my first contribution to CliMAF by adding "realm" to dataloc.

Make used data archive configurable

Currently one has to edit the Python module site_settings.py to change or add the data archive used by CliMAF. It would be nice if the data archive could be configured without editing the code.
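A sketch of configuring the archive from the user side instead, reusing the dataloc/cdef calls shown in other issues on this page; the CLIMAF_CMIP5_ROOT variable name is illustrative.

import os
from climaf.api import *

# Point CliMAF at a site-specific archive root without editing site_settings.py
cmip5_root = os.environ.get("CLIMAF_CMIP5_ROOT", "/prodigfs/esg")
dataloc(organization="CMIP5_DRS", url=[cmip5_root])
cdef("project", "CMIP5")
cdef("frequency", "monthly")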

Ensembles on multiple attributes?

Today, we can easily build an ensemble over multiple values of one attribute using cdataset.explore('ensemble').
It would definitely be interesting to build ensembles over multiple attributes: model and realization, institute, driving_model and model (for CORDEX/RCM projects).
One issue is the naming of the members. An answer is to name each member with the values of its attributes, separated by '_' (or a user-provided separator?); a small sketch of this rule follows after the list below.
Example: CNRM-CM5_r1i1p1

The behaviour of explore('ensemble') could be:

  • cdataset.explore('ensemble') returns an error if more than one attribute has multiple values, and also returns the list of those attributes
  • cdataset.explore('ensemble', build_ensemble_on=['model','realization']) forces building the ensemble on the multiple values of model and realization and names the members model_realization; it returns an error similar to the above if attributes other than those provided to build_ensemble_on (here model and realization) also have multiple values
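A hypothetical illustration of the member-naming rule only (not CliMAF API): join the values of the attributes the ensemble is built on, in order, with a configurable separator.

def member_label(attributes, build_ensemble_on, separator="_"):
    """Build a member name such as 'CNRM-CM5_r1i1p1'."""
    return separator.join(str(attributes[att]) for att in build_ensemble_on)

print(member_label({"model": "CNRM-CM5", "realization": "r1i1p1"},
                   build_ensemble_on=["model", "realization"]))
# -> CNRM-CM5_r1i1p1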

Need to handle multi-variable datasets

It would be useful to let a dataset be multi-variable, in order to cope with data file organizations where variables are grouped in files (such as the Nemo model diagnostics). This would save file operations when such groups of variables are used together by some operators. It would however break the regularity of the CliMAF dataset model.

last_XXY now exists! what about first_XXY?

first_XXY is the last option left (period='last_XXY' and period='*' are already available) that could be transferred from the CESMEP modules (time_manager) to cdataset.explore.
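A hypothetical sketch of how period='first_XXY' could be resolved against the period actually available for a dataset; the helper name and the simple 'YYYY-YYYY' period representation are illustrative only.

import re

def resolve_first(period_spec, available_start_year, available_end_year):
    """Turn e.g. 'first_30Y' into a 'YYYY-YYYY' string clipped to availability."""
    match = re.match(r"^first_(\d+)Y$", period_spec)
    if not match:
        raise ValueError("not a first_XXY specification: %s" % period_spec)
    nyears = int(match.group(1))
    end_year = min(available_start_year + nyears - 1, available_end_year)
    return "%d-%d" % (available_start_year, end_year)

print(resolve_first("first_30Y", 1850, 2005))  # -> 1850-1879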

Dealing with a 5 dimensional dataset (CMIP6 msftyz)

The variable msftyz in CMIP6 is 5-dimensional (x, y, olevel, time, and basin, with 3 values for basin).
At the moment mcdo.sh (and consequently ds()) can't work on a 5-dimensional dataset (this is not supported by CDO, and not in their short-term plans).
Adding an ncks step to collapse the basin dimension would allow reducing to 4 dimensions within mcdo, but the feasibility is yet to be explored...
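A sketch of the ncks/ncwa pre-processing step evoked above: select one basin, then drop the now-degenerate basin dimension so the file becomes 4-dimensional before mcdo/CDO handle it. The dimension name 'basin', the chosen index and the file names are illustrative assumptions.

import subprocess

infile, outfile = "msftyz_in.nc", "msftyz_4d.nc"
basin_index = 0  # e.g. the first of the three basins

# Extract a single basin (keeps a degenerate 'basin' dimension of size 1)
subprocess.check_call(["ncks", "-O", "-d", "basin,%d" % basin_index,
                       infile, "tmp_basin.nc"])
# Average over the size-1 'basin' dimension, which effectively removes it
subprocess.check_call(["ncwa", "-O", "-a", "basin", "tmp_basin.nc", outfile])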

Saving the 'url' that matches the file found in ds()

Having access to the pattern that matched the file(s) found with ds() would allow:

  • automatically filling in the missing values of the keywords
  • automatically handling all the projects in the period manager (C-ESM-EP), avoiding hard-coding

a CliMAF server would be useful

CliMAF allows deriving basic and advanced results, and it can cache them. When the data is already computed and present in the cache, the main part of the response time is due to loading the software. Implementing a CliMAF server, with a light client communicating with it through RPC, would significantly improve the response time.
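A minimal sketch of the idea, using the standard library XML-RPC modules (Python 3 names) to keep a long-lived process where CliMAF is already imported; the service layout and function name are illustrative assumptions, not an existing CliMAF component.

from xmlrpc.server import SimpleXMLRPCServer
from climaf.api import *

def evaluate(crs_expression):
    """Evaluate a CliMAF expression passed as a string; return the cache file path."""
    # eval() is acceptable here only for a trusted, local client
    return cfile(eval(crs_expression))

server = SimpleXMLRPCServer(("localhost", 8765), allow_none=True)
server.register_function(evaluate, "evaluate")
server.serve_forever()

# A light client would then be, in a separate process:
#   from xmlrpc.client import ServerProxy
#   proxy = ServerProxy("http://localhost:8765")
#   path = proxy.evaluate("ds(project='CMIP5', variable='tas', period='1980-2000')")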

Need a punchy notebook for popularizing CliMAF unique features

There is extensive documentation, with many examples of CliMAF use. It is however mostly reference documentation, which is dry at first glance. The front page of the doc should give access to a rather short document, the HTML version of a punchy notebook, which would exemplify CliMAF's most unique features from the point of view of a scientist using it. It could also provide links to the relevant chapters of the doc for further reference on those features.

Problem converting a CliMAF object to MA

I've been working on the possibility of mixing CDAT and CliMAF, but for the moment I have difficulties converting a CliMAF object to an MA (masked array):

jservon@ciclad-ng:~/Evaluation/CliMAF> python
Python 2.7.4 (default, Apr 22 2014, 14:55:23)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> from climaf.api import *
Cache set to /data/jservon/climaf_cache
>>> cdef('project','CMIP5')
>>> cdef('experiment','historical')
>>> cdef('frequency','monthly')
>>> dataloc(organization="CMIP5_DRS",url=['/prodigfs/esg/'])
<climaf.dataloc.dataloc instance at 0x7f47c10a1c68>
>>> dat=ds(model='IPSL-CM5A-LR',
...        rip='r1i1p1',
...        variable='tas',
...        period='1980-2000',
...        )
>>> dat.baseFiles()
'/prodigfs/esg/CMIP5/merge/IPSL/IPSL-CM5A-LR/historical/mon/atmos/Amon/r1i1p1/v20110406/tas/tas_Amon_IPSL-CM5A-LR_historical_r1i1p1_185001-200512.nc'
>>> test = cMA(dat)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ssenesi/climaf/climaf/api.py", line 160, in cMA
    return climaf.driver.ceval(obj,format='MaskedArray',deep=deep)
  File "/home/ssenesi/climaf/climaf/driver.py", line 180, in ceval
    rep=ceval(extract,userflags=userflags,format=format,deep=deep,recurse_list=recurse_list)
  File "/home/ssenesi/climaf/climaf/driver.py", line 267, in ceval
    return cread(file)
  File "/home/ssenesi/climaf/climaf/driver.py", line 545, in cread
    if varname is None: varname=varOfFile(datafile)
  File "/home/ssenesi/climaf/climaf/netcdfbasics.py", line 14, in varOfFile
    if (filevar not in fileobj.dimensions) and not re.findall("^time_",filevar) :
NameError: global name 're' is not defined
>>>

@senesis any thought?
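From the traceback, this looks like a missing import in climaf/netcdfbasics.py: varOfFile() calls re.findall() but 're' is not defined in that module. Assuming that reading, the fix would be a one-line addition at the top of the module:

# climaf/netcdfbasics.py
import re  # needed by varOfFile(), which calls re.findall()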

Check the actually available period

We need to add (or confirm) a check in explore that ensures that the CliMAF dataset actually covers the period requested by the user.
If the period is not fully available, it would be very useful to update the .period attribute of the ds object (from explore('resolve') for instance).
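A sketch of such a check, assuming that explore('resolve') returns a dataset whose attributes (including .period) reflect what is actually on disk, as suggested above; the dataset attributes used in the call are illustrative.

from climaf.api import *

requested = ds(project='CMIP5', model='IPSL-CM5A-LR', experiment='historical',
               variable='tas', period='1980-2020')
resolved = requested.explore('resolve')

if str(resolved.period) != str(requested.period):
    print("Requested %s but only %s is available; updating .period"
          % (requested.period, resolved.period))
    requested.period = resolved.period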

Need for more basic functions in module html

The html_table_line(s) functions are fine for tables. There is however a need for a more basic function, which would take two arguments, a CliMAF object of type figure and a label, and return the HTML code for a link from that label to the CliMAF cache file for the figure.
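A sketch of such a helper, assuming cfile() returns the path of the cached figure file (as it is used elsewhere in these issues); the function name html_link is a proposal, not an existing member of the html module.

from climaf.api import *

def html_link(figure_object, label):
    """Return an HTML link from 'label' to the cached file of a CliMAF figure."""
    return '<a href="%s">%s</a>' % (cfile(figure_object), label)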

Latitude and longitude names in gplot.ncl

We have to find a way to make the script gplot.ncl less sensitive to the names of the dimensions, notably the spatial dimensions 'lat' and 'lon'. At the moment, if the input file has dimensions called 'LON' and 'LAT', the script fails with:

fatal:["Execute.c":5861]:variable (lat) is not in file (ffile)
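The fix itself belongs in gplot.ncl, but the lookup logic is simple: match candidate names case-insensitively instead of hard-coding 'lat' and 'lon'. Illustrated here in Python with netCDF4; the file name is an example.

from netCDF4 import Dataset

def find_dimension(nc, candidates):
    """Return the first dimension whose lower-cased name is in candidates."""
    for name in nc.dimensions:
        if name.lower() in candidates:
            return name
    raise KeyError("no dimension matching %s" % (candidates,))

nc = Dataset("example.nc")
lat_name = find_dimension(nc, ("lat", "latitude"))
lon_name = find_dimension(nc, ("lon", "longitude"))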

operators 'lines' and 'plot' should be smarter re. time axis

Tick marks should be smartly adapted to the duration of the time period. When datasets do not cover the same time period, the user should be able to choose whether the time axes should be aligned to the same origin or simply span the union of all time periods.

CliMAF and provenance

CliMAF should record, in the history attribute of its NetCDF output files, the list of basic data files used upstream of the computation, together with their creation dates and possibly their tracking IDs and checksums.
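A sketch of appending such provenance to the global history attribute of a result file with netCDF4; the md5 checksum and the exact set of attributes recorded here are illustrative assumptions.

import datetime
import hashlib
from netCDF4 import Dataset

def record_provenance(result_file, input_files):
    """Append upstream data file info to the 'history' attribute of result_file."""
    entries = []
    for path in input_files:
        with open(path, "rb") as f:
            checksum = hashlib.md5(f.read()).hexdigest()
        with Dataset(path) as src:
            tracking_id = getattr(src, "tracking_id", "unknown")
        entries.append("%s (tracking_id=%s, md5=%s)" % (path, tracking_id, checksum))
    with Dataset(result_file, "a") as out:
        stamp = "%s: CliMAF inputs: %s" % (datetime.datetime.now().isoformat(),
                                           "; ".join(entries))
        out.history = stamp + "\n" + getattr(out, "history", "")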

More examples on how to plug scripts

We should add some simple example scripts in the different languages (Python, NCL, R, Ferret...) to provide a simple basis for developing a script that can be plugged into CliMAF.
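A minimal sketch of such a Python example: a script that reads one NetCDF file, applies a trivial operation and writes one output file, so it fits the ${in}/${out} calling convention of cscript. The script name, operator name and variable handling are illustrative, and coordinate variables are omitted for brevity.

# my_offset.py: add a constant to one variable of a NetCDF file
import sys
from netCDF4 import Dataset

infile, outfile, varname, offset = (sys.argv[1], sys.argv[2],
                                    sys.argv[3], float(sys.argv[4]))

with Dataset(infile) as src, Dataset(outfile, "w") as dst:
    # copy dimensions, then write the shifted variable
    for name, dim in src.dimensions.items():
        dst.createDimension(name, None if dim.isunlimited() else len(dim))
    var = src.variables[varname]
    out = dst.createVariable(varname, var.dtype, var.dimensions)
    out[:] = var[:] + offset

# On the CliMAF side, the script would be declared and used roughly as:
#   cscript('add_offset', 'python my_offset.py ${in} ${out} ${var} ${offset}')
#   result = add_offset(some_dataset, offset=2.5)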
