rigoudyg / climaf
CliMAF - a Climate Model Analysis Framework - doc at: http://climaf.readthedocs.org/
License: Other
period='first_XXY' is the last option left (period='last_XXY' and period='*' are already available) that could be transferred from the CESMEP modules (time_manager) to cdataset.explore.
This would show that CliMAF is mature enough for most use cases.
It would be useful to let datasets be multi-variable, in order to cope with data file organizations where variables are grouped in files (such as for NEMO model diagnostics). This would save file operations when such groups of variables are used together by some operator. It would, however, break the regularity of the CliMAF dataset model.
Currently one has to edit the Python module site_settings.py to change or add the data archives used by CliMAF. It would be nice if the data archives could be configured without editing the source.
When exploring the dataloc functionality with the example available in cmip5drs.py, I got two files for the following request:
urls_CMIP5_Ciclad=["/prodigfs/esg"]
dataloc(organization="CMIP5_DRS", url=urls_CMIP5_Ciclad)
cdef("frequency","monthly") ; cdef("project","CMIP5")
tas1pc=ds(model="IPSL-CM5A-MR", experiment="historical", variable="pr", period="1860-1961")
files=tas1pc.selectFiles()
print files
Here is what I get:
/prodigfs/esg/CMIP5/merge/IPSL/IPSL-CM5A-MR/historical/mon/atmos/Amon/r1i1p1/v20111119/pr/pr_Amon_IPSL-CM5A-MR_historical_r1i1p1_185001-200512.nc
/prodigfs/esg/CMIP5/merge/IPSL/IPSL-CM5A-MR/historical/mon/ocean/Omon/r1i1p1/v20111119/pr/pr_Omon_IPSL-CM5A-MR_historical_r1i1p1_185001-200512.nc
I've tried adding cdef("realm","atmos"), but it didn't change the result:
cdef("frequency","monthly") ; cdef("project","CMIP5") ; cdef("realm","atmos")
If you confirm that this is relevant, I'll try to make my first contribution to CliMAF by adding "realm" to dataloc.
I've encountered difficulties reaching daily datasets on the CMIP5 archive:
summary(ds(project = 'CMIP5', variable='pr', model = 'GFDL-CM3', experiment = 'historical', frequency = 'daily', period = '19900101-19901231'))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-3564c6456b32> in <module>()
9 #crm(pattern='ensemble_ts_plot')
10 #ncdump(clim_average(ds(variable='tas', **cmip_dict), 'JJA'))
---> 11 summary(ds(variable='pr', **cmip_dict))
12 #if 'daily' in test.crs:
13 # print 'ok'
/home/jservon/Evaluation/CliMAF/climaf_installs/climaf_1.0.3_CESMEP/climaf/functions.pyc in summary(dat)
348 print '--'
349 elif isinstance(dat,classes.cdataset):
--> 350 if not dat.baseFiles():
351 print '-- No file found for:'
352 else:
/home/jservon/Evaluation/CliMAF/climaf_installs/climaf_1.0.3_CESMEP/climaf/classes.pyc in baseFiles(self, force)
446 if filenameVar : dic["filenameVar"]=filenameVar
447 clogger.debug("Looking with dic=%s"%`dic`)
--> 448 self.files=dataloc.selectLocalFiles(**dic)
449 return self.files
450
/home/jservon/Evaluation/CliMAF/climaf_installs/climaf_1.0.3_CESMEP/climaf/dataloc.py in selectLocalFiles(**kwargs)
196 rep.extend(selectEmFiles(**kwargs2))
197 elif (org == "CMIP5_DRS") :
--> 198 rep.extend(selectCmip5DrsFiles(urls,**kwargs2))
199 elif (org == "generic") :
200 rep.extend(selectGenericFiles(urls, **kwargs2))
/home/jservon/Evaluation/CliMAF/climaf_installs/climaf_1.0.3_CESMEP/climaf/dataloc.py in selectCmip5DrsFiles(urls, **kwargs)
517 #if freqd in ['daily','day']:
518 # regex=r'^.*([0-9]{4}[0-9]{2}[0-9]{2}-[0-9]{4}[0-9]{2}[0-9]{2}).nc$'
--> 519 fileperiod=init_period(re.sub(regex,r'\1',f))
520 if (fileperiod and period.intersects(fileperiod)) :
521 rep.append(f)
/home/jservon/Evaluation/CliMAF/climaf_installs/climaf_1.0.3_CESMEP/climaf/period.pyc in init_period(dates)
160 start=(4-len(start))*"0"+start
161 # TBD : check that start actually matches a date
--> 162 syear =int(start[0:4])
163 smonth =int(start[4:6]) if len(start) > 5 else 1
164 sday =int(start[6:8]) if len(start) > 7 else 1
ValueError: invalid literal for int() with base 10: '/pro'
It actually comes from dataloc.py: the provided 'regex' is only valid for monthly datasets. I've tried this patch in dataloc.py (around line 516) and it works:
replace:
regex=r'^.*([0-9]{4}[0-9]{2}-[0-9]{4}[0-9]{2}).nc$'
with:
if freqd in ['monthly','mo']:
    regex=r'^.*([0-9]{4}[0-9]{2}-[0-9]{4}[0-9]{2}).nc$'
elif freqd in ['daily','day']:
    regex=r'^.*([0-9]{4}[0-9]{2}[0-9]{2}-[0-9]{4}[0-9]{2}[0-9]{2}).nc$'
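For illustration, the frequency-dependent regexes can be exercised standalone (the function name and wrapper are just for this sketch; the real code lives in selectCmip5DrsFiles):

```python
import re

def file_period(filename, freqd):
    """Extract the period string from a CMIP5 DRS filename, using the
    frequency-dependent regexes from the patch above (sketch only)."""
    if freqd in ['monthly', 'mo']:
        regex = r'^.*([0-9]{4}[0-9]{2}-[0-9]{4}[0-9]{2}).nc$'
    elif freqd in ['daily', 'day']:
        regex = r'^.*([0-9]{4}[0-9]{2}[0-9]{2}-[0-9]{4}[0-9]{2}[0-9]{2}).nc$'
    else:
        return None
    m = re.match(regex, filename)
    return m.group(1) if m else None
```

Note that the monthly regex does not match daily filenames at all, which is exactly why init_period ended up receiving a full path and failed with ValueError.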
I will set up a PR as soon as possible.
When the cache becomes heavily loaded with results, the time spent scanning the index can be of the same order as the time spent actually computing the result (especially when producing big atlases).
Here are some ideas for implementing automatic, smart cache management (applied, for instance, at the end of a CliMAF routine script).
Each time we do a cfile (only when it leads to a new result), we keep:
Using this information, we could clean the cache (say, with user-provided instructions at the end of the CliMAF script) by removing:
We could also cap the number of files in the cache (to limit disk usage) at some number XX: keep the XX files with the longest 'execution time' and remove all the others.
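The last idea could be sketched as follows (the index format here is hypothetical, not CliMAF's real cache index):

```python
def prune_by_count(index, max_files):
    """Keep the max_files entries with the longest execution time;
    return the paths of the entries to remove.
    `index` is a list of dicts like {'path': ..., 'exec_time': ...}."""
    by_cost = sorted(index, key=lambda e: e['exec_time'], reverse=True)
    return [e['path'] for e in by_cost[max_files:]]
```

Removing the cheap-to-recompute results first preserves the cache entries that save the most wall-clock time.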
This would allow using e.g. 'ncdump -h' from inside CliMAF. The text output wouldn't be managed by CliMAF, only displayed.
Tick marks should be smartly adapted to the duration of the time period. When datasets do not cover the same time period, the user should be able to choose whether the time axes should be aligned to the same origin or simply span the union of all time periods.
We need to add (or confirm) a check in explore that ensures the CliMAF dataset actually covers the period requested by the user.
If the period is not fully available, it would be very useful to update the .period attribute of the ds object (from explore('resolve'), for instance).
This is needed at least for the 'plot' operator, for a secondary scalar input field.
Today, we can easily build an ensemble over multiple values of one attribute using cdataset.explore('ensemble').
It would definitely be interesting to build ensembles over multiple attributes: model and realization; institute; driving_model and model (for CORDEX/RCM projects).
One issue is the naming of the members; one answer is to name each member with the values of its attributes, separated by '_' (or a user-provided separator?).
Example: CNRM-CM5_r1i1p1
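A naming helper along those lines might look like this sketch (function and argument names are hypothetical):

```python
def member_label(attrs, keys, sep='_'):
    """Build a member name by joining the values of several attributes
    with a separator, e.g. model and realization."""
    return sep.join(str(attrs[k]) for k in keys)
```

With attrs={'model': 'CNRM-CM5', 'realization': 'r1i1p1'} and keys=['model', 'realization'], this yields the member name 'CNRM-CM5_r1i1p1' from the example above.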
The behaviour of explore('ensemble') could be:
I'm using CliMAF in a web processing service, and the CliMAF executable is started in a directory where it might not have write permission (it runs as an unprivileged service user). CliMAF writes log files (and probably more) to the current directory, which fails in my service use case.
Output paths with expected write permission should be made configurable (logs/, temp/, outputs/, ...).
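One way to make such paths configurable is via environment variables with sensible defaults (a sketch; the variable name CLIMAF_LOG_DIR is hypothetical, not an existing CliMAF setting):

```python
import os

def writable_dir(envvar, default):
    """Return the directory configured via `envvar` (falling back to
    `default`), creating it if it does not exist yet."""
    path = os.environ.get(envvar, default)
    if not os.path.isdir(path):
        os.makedirs(path)
    return path

# e.g. logdir = writable_dir('CLIMAF_LOG_DIR',
#                            os.path.expanduser('~/.climaf/logs'))
```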
This is to cover cases where, depending on the input parameters, an operator will or won't output a given, secondary, field.
A short notebook showing how to use cMA, or how to get the file name and open it in Python with netCDF4 (or any other NetCDF library).
This is due to a limitation of the HDF5 installation at CNRM, which is not thread-safe.
It would be preferable that alternate packages be supported too (e.g. netCDF4, scipy.io.netcdf, ...).
Python 3.x (starting with 3.5) is used more and more. CliMAF currently supports only Python 2.7. It would be nice if CliMAF could support both 2.7 and 3.x (>=3.6).
Compatibility can be achieved by using six:
https://pythonhosted.org/six/
One can also use a compat.py module to handle 2.7/3.x compatibility, for example:
https://github.com/geopython/pywps/blob/master/pywps/_compat.py
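In the spirit of pywps' _compat.py, a compat module isolates the 2.7/3.x differences in one place (a sketch, not CliMAF code):

```python
# compat.py sketch: gather the 2.7/3.x differences behind stable names.
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    string_types = (str, unicode)  # noqa: F821 -- name exists on py2 only
    from StringIO import StringIO
else:
    string_types = (str,)
    from io import StringIO
```

The rest of the codebase then imports string_types and StringIO from compat instead of branching on the interpreter version everywhere.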
climaf currently has no setup.py and cannot be installed using pip.
I just quickly hacked a setup.py for climaf in my fork:
https://github.com/cehbrecht/climaf/blob/pingudev/setup.py
This should be done in a cleaner way (scripts?).
I'm using the fork to build a conda package:
Data users should be provided with an easy way to query the ES-Doc errata system for the datasets they are using.
CliMAF allows deriving basic and advanced results, and it can cache them. When the data is already computed and held in the cache, the main part of the response time is due to loading the software. Implementing a CliMAF server with a light client communicating through RPC would significantly improve the response time.
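A minimal sketch of the idea (Python 3, stub only): a long-lived server that has already paid the import cost, answering evaluation requests over RPC. The evaluate function below is a stand-in; a real server would call CliMAF's evaluation machinery and return the cache file path.

```python
from xmlrpc.server import SimpleXMLRPCServer

def evaluate(crs_expression):
    """Stub standing in for a real CliMAF evaluation: a real server would
    evaluate the CRS expression and return the cache file path."""
    return 'cachefile-for-' + crs_expression

if __name__ == '__main__':
    # The light client then only needs xmlrpc.client, which loads fast.
    server = SimpleXMLRPCServer(('localhost', 8765))
    server.register_function(evaluate)
    server.serve_forever()
```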
I've been working on the possibility of mixing CDAT and CliMAF. But for the moment I have difficulties converting a CliMAF object to a MaskedArray (MA):
jservon@ciclad-ng:~/Evaluation/CliMAF> python
Python 2.7.4 (default, Apr 22 2014, 14:55:23)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> from climaf.api import *
Cache set to /data/jservon/climaf_cache
>>> cdef('project','CMIP5')
>>> cdef('experiment','historical')
>>> cdef('frequency','monthly')
>>> dataloc(organization="CMIP5_DRS",url=['/prodigfs/esg/'])
<climaf.dataloc.dataloc instance at 0x7f47c10a1c68>
>>> dat=ds(model='IPSL-CM5A-LR',
... rip='r1i1p1',
... variable='tas',
... period='1980-2000',
... )
>>> dat.baseFiles()
'/prodigfs/esg/CMIP5/merge/IPSL/IPSL-CM5A-LR/historical/mon/atmos/Amon/r1i1p1/v20110406/tas/tas_Amon_IPSL-CM5A-LR_historical_r1i1p1_185001-200512.nc'
>>> test = cMA(dat)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ssenesi/climaf/climaf/api.py", line 160, in cMA
return climaf.driver.ceval(obj,format='MaskedArray',deep=deep)
File "/home/ssenesi/climaf/climaf/driver.py", line 180, in ceval
rep=ceval(extract,userflags=userflags,format=format,deep=deep,recurse_list=recurse_list)
File "/home/ssenesi/climaf/climaf/driver.py", line 267, in ceval
return cread(file)
File "/home/ssenesi/climaf/climaf/driver.py", line 545, in cread
if varname is None: varname=varOfFile(datafile)
File "/home/ssenesi/climaf/climaf/netcdfbasics.py", line 14, in varOfFile
if (filevar not in fileobj.dimensions) and not re.findall("^time_",filevar) :
NameError: global name 're' is not defined
>>>
@senesis any thought?
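The NameError points to a missing `import re` at the top of climaf/netcdfbasics.py; a sketch of the failing check with the import in place (the helper name is hypothetical, the check mirrors the line from the traceback):

```python
import re  # the import apparently missing from netcdfbasics.py

def is_time_variable(filevar):
    """Mirrors the check that raised the NameError in varOfFile."""
    return bool(re.findall('^time_', filevar))
```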
The computation of EOFs comes with an issue: an EOF decomposition normally has two outputs (the eigenvectors and the principal components), and we might also need access to the eigenvalues (explained variance).
Therefore two strategies are possible:
The variable msftyz in CMIP6 is 5-dimensional (x, y, olevel, time, and basin).
At the moment mcdo.sh (and consequently ds()) can't work on a 5-dimensional dataset (not supported by CDO, and not in their short-term plans).
Adding an ncks step to collapse the basin dimension would allow reducing to 4 dimensions from mcdo, but the feasibility is yet to be explored...
We should add some simple example scripts in the different languages (Python, NCL, R, Ferret, ...) to provide a simple basis for developing a script that can be plugged into CliMAF.
Olivier: 4.5 s for plotting a 2D field is too long.
The html_table_line(s) functions are fine for tables. There is, however, a need for a more basic function, which would take two arguments, a CliMAF object of type figure and a label, and return the HTML code for a link from that label to the CliMAF cache file for the figure.
CliMAF should interface to the following Drakkar CDFTools : cdfmean, cdfheatc, cdftransport, cdfsection, cdfmxlheatc, cdfstd
In that case, it should provide a single file with variable names suffixed by the member label.
This is in order to cope with cases where, for a given variable, the corresponding filename may be formed either using the variable name or using 'filenameVar', another string which is declared using 'calias()'.
We have to find a way to make the script gplot.ncl less sensitive to the names of the dimensions, notably the spatial dimensions 'lat' and 'lon'. At the moment, if the input file has dimensions called 'LON' and 'LAT', the script returns an error:
fatal:["Execute.c":5861]:variable (lat) is not in file (ffile)
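One hedged approach (sketched in Python rather than NCL, with a hypothetical alias table) is to normalize dimension names before the file reaches gplot.ncl:

```python
# Hypothetical alias table; extend it as new naming conventions appear.
DIM_ALIASES = {
    'lat': ('lat', 'latitude', 'nav_lat'),
    'lon': ('lon', 'longitude', 'nav_lon'),
}

def canonical_dim(name):
    """Map a file's dimension name onto the canonical 'lat'/'lon',
    case-insensitively; other names pass through unchanged."""
    lowered = name.lower()
    for canon, aliases in DIM_ALIASES.items():
        if lowered in aliases:
            return canon
    return name
```

A pre-processing step could rename dimensions according to this mapping (e.g. with ncrename), so gplot.ncl always sees 'lat' and 'lon'.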
Having access to the pattern that matches the file(s) found with ds() would allow:
I ran into an issue with temp folders in mcdo.sh:
https://github.com/senesis/climaf/blob/3e1762ec788674b470d895b15aa398184c77bb4a/scripts/mcdo.sh#L25
It creates the temp folder in the current folder, which might be write-protected. The following patch worked for me:
$ mktemp -t climaf_mcdo -d
OR
$ mktemp -d /tmp/climaf_mcdo_XXXXXX
See also #85.
There is a need to read a dataset from a file and work with it at once, without configuring a 'CliMAF project' for that (issue originally reported by Jerome).
CliMAF should record, in the history attribute of NetCDF files, the list of basic data files used upstream of the computation, together with their creation dates and maybe their tracking IDs and checksums.
There is extensive documentation and many examples for CliMAF use. It is, however, mostly reference documentation, which is dry at first glance. The front page of the doc should give access to a short document, the HTML version of a punchy notebook, which would exemplify CliMAF's most unique features from the point of view of a scientist user. It could also link to the various chapters of the doc for further reference on those features.
We should develop a way to read a file-system 'audit file', in order to avoid globbing very large directories such as /bdd on Ciclad.