Code associated with cmip5 analysis
durack1 / cmip5
https://goo.gl/JF52fu - DRS (Data Reference Syntax) comparisons between MIPs
The read directory block in make_cmip5_xmls.py here
Compare list_outfiles and list_outfiles_paths to the previous version of the xml database before initiating xml creation. This way, if no changes are to be made, the job will complete rather than loading up machines and the network over a weekend.
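A minimal sketch of such a pre-check. It assumes the previous run's lists were saved with pickle as a two-element list; the function name scan_unchanged and that pickle layout are hypothetical and not taken from make_cmip5_xml.py.

```python
import pickle


def scan_unchanged(previous_pickle, list_outfiles, list_outfiles_paths):
    """Return True if the current scan matches the one stored in previous_pickle.

    Assumption (hypothetical layout): the pickle holds a two-element list,
    [list_outfiles, list_outfiles_paths].
    """
    try:
        with open(previous_pickle, 'rb') as handle:
            old_outfiles, old_paths = pickle.load(handle)
    except (IOError, OSError, EOFError, TypeError, ValueError,
            pickle.UnpicklingError):
        # No usable previous database: report a change so xmls are rebuilt
        return False
    # Scan order from parallel processes is not guaranteed, so compare sorted pairs
    old = sorted(zip(old_outfiles, old_paths))
    new = sorted(zip(list_outfiles, list_outfiles_paths))
    return old == new
```

If this returns True the driver could log "no changes" and exit before any purge or cdscan call is made.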
make_cmip5_xml.py may need to interrogate Jeff Painter's database of files, which provides an update as to whether any new files (or partial files) have changed since the last run
Deal with non-warning errors: CDAT/cdat#1512
Tweaks required at https://github.com/durack1/cmip5/blob/master/make_cmip5_xml.py#l493-l508
make_cmip5_xml.py has become difficult to test, so extracting the functions and rewriting it as a driver script that loads them would be a prudent way forward. It would also make code tweaks much easier to test.
MPI-MR loses lat/lon coordinates somewhere - chase down where this info is lost, and whether it is a UV-CDAT bug or needs hard-coding
Python upgrade is tripping over poorly formed dictionaries
/work/cmip5/cron.sh: line 21: activate: No such file or directory
File "/work/cmip5/make_cmip5_xml.py", line 356
'adm07.cmcc.it',
^
SyntaxError: invalid syntax
Most data has been migrated across to the CSS03 hardware; the scan dir list should be updated to reflect this change
Branch times, indirect effects and other factors identifiable from Table 9.A.1 (AR5 - Chap9)
Add: hfls, hfss, pflw, rlds, rlus, rsds, rsus, sic, sim, sit, snc, snd, snw, tpf, tsl variables and check to ensure that variables found in Amon and Omon are correctly included in each
From: "Po-Chedley, Stephen" @pochedls
Date: Tuesday, November 7, 2017 at 4:11 PM
To: "Durack, Paul J."
Subject: Biogeochemistry variables
Hi Paul,
I was hoping you could add the following search in your cdscan cron:
variables: dissic, ph
experiments: historical, rcp*
frequency: monthly
realm: ocnBgchem
If this is a pain, let me know and I can find another path forward.
As a side note, as I was looking for this data, I wrote python script that uses the ESGF API to generate wget scripts. If this could ever be useful to you, let me know.
Thanks,
Steve
The standard paths of data/scratch also include a new path of cmip5 - this needs to be added to the upper level search list and scan code tested to ensure files are being retrieved
The code needs to be updated to prevent purge/rm attempts at the spliced historical-rcp85 subdirs that are owned by @doutriaux1 - these subdirs are not updated as part of the xml scan.
** Generating new *.xml files **
rm: cannot remove `historical-rcp85/atm/mo/rsdt/cmip5.CESM1-WACCM.historical-rcp85.r3i1p1.mo.atm.Amon.rsdt.ver-v20130314.latestX.xml': Permission denied
rm: cannot remove `historical-rcp85/atm/mo/rsdt/cmip5.IPSL-CM5A-LR.historical-rcp85.r3i1p1.mo.atm.Amon.rsdt.ver-1.latestX.xml': Permission denied
...
The log files and pkl files have started to take up significant disk space:
151122_120003
** Write mode - new *.xml files will be written **
master pid20829
p1 pid: 20862
p2 pid: 20867
p3 pid: 20873
p4 pid: 20882
p5 pid: 20889
p6 pid: 20899
p7 pid: 20906
p8 pid: 20912
pathToFile - Exception: list index out of range /cmip5_css01/scratch/cmip5/output1/MRI
pathToFile - Exception: list index out of range /cmip5_css02/scratch/cmip5/output1/CCCma/CanESM2/jfyfe_special
p9 pid: 20923
gdo2_scratch scan complete.. 0 paths total; 0 output files to be written (107 vars sampled)
gdo2_data scan complete.. 1897 paths total; 882 output files to be written (107 vars sampled)
css01_scratch scan complete.. 106484 paths total; 46351 output files to be written (107 vars sampled)
css01_data scan complete.. 9054 paths total; 5549 output files to be written (107 vars sampled)
css02_scratch scan complete.. 113255 paths total; 53797 output files to be written (107 vars sampled)
css02_data scan complete.. 85076 paths total; 42756 output files to be written (107 vars sampled)
css02_cmip5 scan complete.. 41 paths total; 21 output files to be written (107 vars sampled)
css02_gc scan complete.. 15222 paths total; 9156 output files to be written (107 vars sampled)
css01_gc scan complete.. 202 paths total; 129 output files to be written (107 vars sampled)
Traceback (most recent call last):
File "/work/cmip5/make_cmip5_xml.py", line 822, in <module>
logFile = logFiles[logCount]
IndexError: list index out of range
** Crash while trying to create a new directory: /work/cmip5/1pctCO2/atm/mo_new/pr
Issue may be resolved by a pre-check for directory existence. Atomic operations are likely the cause of this.
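A minimal sketch of a directory creation that tolerates a race. It assumes the crash comes from several scan processes trying to create the same directory at once, which the log does not confirm; the helper name ensure_dir is hypothetical. A separate existence check followed by a create still leaves a window between the two calls, so the sketch attempts the create and accepts "already exists".

```python
import errno
import os


def ensure_dir(path):
    """Create path (and parents) if needed; tolerate another process creating it first."""
    try:
        os.makedirs(path)
    except OSError as err:
        # EEXIST is acceptable only if what exists really is a directory
        if err.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

Calling ensure_dir twice on the same path is harmless, while a regular file sitting at that path still raises.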
From: Zelinka, Mark
Date: Tuesday, March 10, 2015 at 9:09 AM
To: "Zhou, Chen", "Paul J. Durack"
Subject: Re: XML files for PiControl clcalipso
Hi Paul,
Just to clarify / modify this request:
variable: clcalipso
experiments: piControl and abrupt4xCO2
realm: atmos
freq: monthly
table: cfMon
Thanks!
Mark
@painter1 has just alerted me that a new directory structure will be created during the process of scratch data sanitation.
Existing directories with problematic data such as:
/cmip5_css01/scratch/cmip5/output1/MRI
/cmip5_css01/scratch/cmip5/output1/LASG-CESS/
will become
/cmip5_css01/scratch/_gc/*/cmip5/output1/MRI
/cmip5_css01/scratch/_gc/to_keep/cmip5/output1/LASG-CESS/
These additional paths will need to be added to the existing search, as 105k of the 145k local xmls point to data in scratch on either css01 or css02
list_vars
PID of master process to logfile contents (not just filename)/sendmail output
cmip[5-6]/output[0-9] placeholder for index version (and variable) - so that paths are recognized rather than hard-coded
PID test for existing run - prevent xml over runs
** Processing xml: cmip5.EC-EARTH.rcp45.r14i1p1.mo.ocn.Omon.so.ver-v20120307.latestX.WARN2.xml
25yr annual calculation for cmip5.EC-EARTH.rcp45.r14i1p1.mo.ocn.Omon.so.ver-v20120307.latestX.WARN2.xml
** Outfile: cmip5.EC-EARTH.rcp45.r14i1p1.an.so.ver-v20120307.2006-2009.nc being processed **
2006-1-1 0:0:0.0 2009-1-1 0:0:0.0
(0, 1097)
** Processing annual means for 2006 to 2009 **
(1097, 42, 292, 362)
id: time
Designated a time axis.
units: days since 2006-01-01 00:00:00
Length: 1097
First: 0.0
Last: 1096.0
Other axis attributes:
calendar: gregorian
axis: T
Python id: 0x7f1f594275d0
['000', '01', 2006-1-1 0:0:0.0]
['1096', '05', 2009-1-1 0:0:0.0]
Traceback (most recent call last):
File ".//make_cmip_annualMeans3D.py", line 292, in <module>
dan = cdu.YEAR(d)
File "/usr/local/uvcdat/2015-04-09/lib/python2.7/site-packages/cdutil/times.py", line 1291, in get
m = mergeTime(s,statusbar=statusbar,fill_value=getattr(slab,'fill_value',1.e20))
File "/usr/local/uvcdat/2015-04-09/lib/python2.7/site-packages/cdutil/times.py", line 199, in mergeTime
raise Exception,err
Exception: Error in merging process : duplicate time point
2006-7-2 12:0:0.0 is duplicated, cannot merge[]
And same issue with ** Processing xml: cmip5.EC-EARTH.rcp45.r13i1p1.mo.ocn.Omon.thetao.ver-v20120303.latestX.WARN2.xml
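A small sketch for spotting duplicated time points before the annual-mean step is attempted. It works on a plain sequence of time values and makes no cdutil calls; the function name duplicate_time_points is hypothetical, and how the values are pulled from the time axis is left to the caller.

```python
from collections import Counter


def duplicate_time_points(time_values):
    """Return the sorted time values that occur more than once."""
    counts = Counter(time_values)
    return sorted(value for value, count in counts.items() if count > 1)
```

For example, duplicate_time_points([0.0, 15.5, 45.0, 45.0, 74.5]) returns [45.0], which could be logged alongside the xml name so the offending files can be tracked down.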
Leaving mrsofc and sftgif out of the fx_vars declaration caused the following subdirs to be written:
[durack1@oceanonly ~]$ find /work/cmip5/*/*/fx_new -maxdepth 0
/work/cmip5/1pctCO2/land/fx_new
/work/cmip5/abrupt4xCO2/land/fx_new
/work/cmip5/amip4K/land/fx_new
/work/cmip5/amip4xCO2/land/fx_new
/work/cmip5/amipFuture/land/fx_new
/work/cmip5/amip/land/fx_new
/work/cmip5/historicalExt/land/fx_new
/work/cmip5/historicalGHG/land/fx_new
/work/cmip5/historical/land/fx_new
/work/cmip5/historicalMisc/land/fx_new
/work/cmip5/historicalNat/land/fx_new
/work/cmip5/past1000/land/fx_new
/work/cmip5/piControl/land/fx_new
/work/cmip5/rcp26/land/fx_new
/work/cmip5/rcp45/land/fx_new
/work/cmip5/rcp60/land/fx_new
/work/cmip5/rcp85/land/fx_new
These need to be systematically cleaned up
There may be some utility in capturing the creation_date and tracking_id global attributes of all or some of the files which the xmls span; these could be included in the YYMMDD_HHMMSS_list_outfiles.pickle file
The current version (4cbde80) of make_cmip5_xml.py attempts to purge all existing files in */*/mo; this is yielding the following errors for non-cdscan-ed directories:
Cron <duro@crunchy> /work/cmip5/cron.sh ; # Start Sunday 12pm
171224_120004
** Write mode - new *.xml files will be written **
master pid31168
UV-CDAT: /usr/local/uvcdat/2014-03-31/bin/python
p1 pid: 31191
p2 pid: 31198
p3 pid: 31206
p4 pid: 31221
p5 pid: 31230
p6 pid: 31238
p7 pid: 31251
p8 pid: 31259
pathToFile - Exception: list index out of range /cmip5_css01/scratch/cmip5/output1/MRI
pathToFile - Exception: list index out of range /cmip5_css02/scratch/cmip5/output1/CCCma/CanESM2/jfyfe_special
p9 pid: 31274
gdo2_scratch scan complete.. 0 paths total; 0 output files to be written (120 vars sampled)
gdo2_data scan complete.. 12148 paths total; 6139 output files to be written (120 vars sampled)
css01_scratch scan complete.. 34900 paths total; 19317 output files to be written (120 vars sampled)
css01_data scan complete.. 80140 paths total; 37684 output files to be written (120 vars sampled)
css02_scratch scan complete.. 114576 paths total; 59847 output files to be written (120 vars sampled)
css02_data scan complete.. 86899 paths total; 47379 output files to be written (120 vars sampled)
css02_cmip5 scan complete.. 41 paths total; 35 output files to be written (120 vars sampled)
css02_gc scan complete.. 15222 paths total; 9912 output files to be written (120 vars sampled)
css01_gc scan complete.. 202 paths total; 134 output files to be written (120 vars sampled)
** make_cmip5_xml.py run (PID: 31168) starting, querying for existing previous process **
** previous make_cmip5_xml.py run (PID: 21586) not found, continuing current process **
** Updating 175922 existing *.xml files **
** Generating new *.xml files **
rm: cannot remove `historical-rcp85/atm/mo/rsdt/cmip5.CESM1-WACCM.historical-rcp85.r3i1p1.mo.atm.Amon.rsdt.ver-v20130314.latestX.xml': Permission denied
rm: cannot remove `historical-rcp85/atm/mo/rsdt/cmip5.IPSL-CM5A-LR.historical-rcp85.r3i1p1.mo.atm.Amon.rsdt.ver-1.latestX.xml': Permission denied
rm: cannot remove `historical-rcp85/atm/mo/rsdt/cmip5.CESM1-WACCM.historical-rcp85.r2i1p1.mo.atm.Amon.rsdt.ver-v20130314.latestX.xml': Permission denied
rm: cannot remove `historical-rcp85/atm/mo/rsdt/cmip5.CCSM4.historical-rcp85.r1i1p1.mo.atm.Amon.rsdt.ver-v20130426.latestX.xml': Permission denied
The */*/mo should be replaced by $experiment/*/mo to get around this issue
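A sketch of the per-experiment pattern, assuming the experiment/realm/mo/variable/file.xml layout seen in the rm errors above. The function name xmls_to_purge and the idea of passing in an explicit experiments list are hypothetical; the sketch only lists candidates and deletes nothing.

```python
import glob
import os


def xmls_to_purge(root, experiments):
    """List *.xml files under root/<experiment>/*/mo/*/ for the named experiments only."""
    targets = []
    for experiment in experiments:
        # Assumed layout: <experiment>/<realm>/mo/<variable>/<file>.xml
        pattern = os.path.join(root, experiment, '*', 'mo', '*', '*.xml')
        targets.extend(glob.glob(pattern))
    return sorted(targets)
```

Because spliced directories such as historical-rcp85 are never named in the list, their files are never matched and no rm is attempted on them.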
Currently the code purges problem xmls; instead, change the behaviour so that files that trigger a warning are renamed to include -WARN in the filename.
This ensures that existing data is indexed, but leaves the user to decide whether to use it without further investigation.
An existing example of such an issue is below:
[durack1@oceanonly ~]$ more /work/cmip5/_logs/140822_120004_make_cmip5_xml-oceanonly-threads40-PID20836.log | grep /cmip5_css02/scratch/cmip5/output1/MOHC/HadGEM2-ES/piControl/mon/atmos/Amon/r1i1p1/v20130114/tas
** 0001830 140825_162356 274928.01s PROBLEM 2 (cdscan error - 'Warning, file tas_Amon_HadGEM2-ES_piControl_r1i1p1_208412-209910.nc, dimension time overlaps file tas_Amon_HadGEM2-ES_piControl_r1i1p1_209812-212311.nc') indexing /cmip5_css02/scratch/cmip5/output1/MOHC/HadGEM2-ES/piControl/mon/atmos/Amon/r1i1p1/v20130114/tas **
** 0001831 140825_162356 274928.48s PROBLEM 2 (cdscan error - 'Warning, file tasmax_Amon_HadGEM2-ES_piControl_r1i1p1_208412-209910.nc, dimension time overlaps file tasmax_Amon_HadGEM2-ES_piControl_r1i1p1_209812-212311.nc') indexing /cmip5_css02/scratch/cmip5/output1/MOHC/HadGEM2-ES/piControl/mon/atmos/Amon/r1i1p1/v20130114/tasmax **
** 0001866 140825_162952 275083.37s PROBLEM 2 (cdscan error - 'Warning, file tasmin_Amon_HadGEM2-ES_piControl_r1i1p1_208412-209910.nc, dimension time overlaps file tasmin_Amon_HadGEM2-ES_piControl_r1i1p1_209812-212311.nc') indexing /cmip5_css02/scratch/cmip5/output1/MOHC/HadGEM2-ES/piControl/mon/atmos/Amon/r1i1p1/v20130114/tasmin **
[durack1@oceanonly ~]$
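A minimal sketch of the rename. The helper name mark_warning and the choice to place the tag directly before the .xml extension are assumptions, not the behaviour of make_cmip5_xml.py.

```python
import os


def mark_warning(xml_path, tag='-WARN'):
    """Rename xml_path so the tag sits just before the file extension."""
    stem, ext = os.path.splitext(xml_path)
    if stem.endswith(tag):
        return xml_path  # already marked, nothing to do
    flagged = stem + tag + ext  # assumed placement of the tag
    os.rename(xml_path, flagged)
    return flagged
```

The function returns the new path, so the caller can record it in the log next to the cdscan warning that triggered the rename.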
Currently pickle dictionary archives and the main file have the extension *.cpklz *.cpkl.gz and *.cpkl respectively.
*.logz files
PID query will need to check for *.log.gz rather than *.log files
[durack1@oceanonly cmip5]$ make_cmip5_xml.py report
160402_200052
** Report mode - no *.xml files will be written **
master pid2932
UV-CDAT: /usr/local/uvcdat/2016-02-17/bin/python
p1 pid: 2943
p2 pid: 2948
p3 pid: 2954
p4 pid: 2964
p5 pid: 2971
p6 pid: 2977
p7 pid: 2990
p8 pid: 3004
p9 pid: 3011
gdo2_scratch scan complete.. 0 paths total; 0 output files to be written (118 vars sampled)
gdo2_data scan complete.. 2222 paths total; 1184 output files to be written (118 vars sampled)
pathToFile - Exception: list index out of range /cmip5_css01/scratch/cmip5/output1/MRI
[u'', u'cmip5_css01', u'data', u'cmip5', u'CNRM-CERFACS', u'CNRM-CM5', u'amip', u'mon', u'land', u'Lmon', u'r1i1p1', u'v20111018', u'mrsos']
Process Process-6:
Traceback (most recent call last):
File "/usr/local/uvcdat/2016-02-17/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/local/uvcdat/2016-02-17/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File ".//make_cmip5_xml.py", line 282, in pathToFile
if 'HadGEM2-AO' in model and experiment in ['historical','rcp26','rcp45','rcp60','rcp85']:
UnboundLocalError: local variable 'model' referenced before assignment
css01_scratch scan complete.. 46267 paths total; 25482 output files to be written (118 vars sampled)
Traceback (most recent call last):
File ".//make_cmip5_xml.py", line 711, in <module>
[css01_data_outfiles,css01_data_outfiles_paths,time_since_start,i1,i2,len_vars] = queue5.get_nowait()
File "<string>", line 2, in get_nowait
File "/usr/local/uvcdat/2016-02-17/lib/python2.7/multiprocessing/managers.py", line 774, in _callmethod
raise convert_to_error(kind, result)
Queue.Empty
Process Process-4:
Traceback (most recent call last):
File "/usr/local/uvcdat/2016-02-17/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/local/uvcdat/2016-02-17/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File ".//make_cmip5_xml.py", line 311, in pathToFile
queue1.put_nowait([data_outfiles,data_outfiles_paths,time_since_start,i1,i2,len_vars]) ; # Queue
File "<string>", line 2, in put_nowait
File "/usr/local/uvcdat/2016-02-17/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
self._connect()
File "/usr/local/uvcdat/2016-02-17/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/local/uvcdat/2016-02-17/lib/python2.7/multiprocessing/connection.py", line 169, in Client
c = SocketClient(address)
File "/usr/local/uvcdat/2016-02-17/lib/python2.7/multiprocessing/connection.py", line 308, in SocketClient
s.connect(address)
File "/usr/local/uvcdat/2016-02-17/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory
pathToFile - Exception: list index out of range /cmip5_css02/scratch/cmip5/output1/CCCma/CanESM2/jfyfe_special
Process Process-5:
Traceback (most recent call last):
File "/usr/local/uvcdat/2016-02-17/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/local/uvcdat/2016-02-17/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File ".//make_cmip5_xml.py", line 311, in pathToFile
queue1.put_nowait([data_outfiles,data_outfiles_paths,time_since_start,i1,i2,len_vars]) ; # Queue
File "<string>", line 2, in put_nowait
File "/usr/local/uvcdat/2016-02-17/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
self._connect()
File "/usr/local/uvcdat/2016-02-17/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/local/uvcdat/2016-02-17/lib/python2.7/multiprocessing/connection.py", line 169, in Client
c = SocketClient(address)
File "/usr/local/uvcdat/2016-02-17/lib/python2.7/multiprocessing/connection.py", line 308, in SocketClient
s.connect(address)
File "/usr/local/uvcdat/2016-02-17/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory
[durack1@oceanonly cmip5]$
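The UnboundLocalError above suggests that model is only assigned when the path splits as expected, although that is an inference from the traceback rather than from the source. A hedged sketch of a parser that reads the facets from the end of the path and returns None for paths that are too shallow, so the caller can skip them; the function name parse_drs_path and the facet key names are hypothetical.

```python
def parse_drs_path(path):
    """Split a CMIP5 data path into its trailing DRS facets, or return None.

    Facets are read from the end of the path, so differing prefixes
    (.../data/cmip5/... versus .../scratch/cmip5/output1/...) do not matter.
    Only the depth is checked; the facet values themselves are not validated.
    """
    # Key names are illustrative; the order follows the split path in the log above
    keys = ['institute', 'model', 'experiment', 'frequency', 'realm',
            'table', 'ensemble', 'version', 'variable']
    parts = [part for part in path.split('/') if part]
    if len(parts) < len(keys):
        return None  # too shallow to be a variable directory
    return dict(zip(keys, parts[-len(keys):]))
```

With this shape, a path such as /cmip5_css01/scratch/cmip5/output1/MRI yields None instead of an IndexError, and the HadGEM2-AO test can be guarded by checking for None first.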
From: "Anderson, Gemma Jayne"
Date: Friday, January 8, 2016 at 10:23 AM
To: Paul Durack
Subject: Soil Moisture
Hi Paul,
I was wondering if there is a way to get soil moisture as an output of the (amip/amipFuture/amip4xC02) cmip5 models or if you knew where I could find it? The variable is “mrso”.
Many thanks,
Gemma
Current duplicate removal is non-discriminatory - files contained in /data subdirs should be prioritized
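A sketch of a duplicate removal that discriminates by location. It assumes duplicates are identified by an identical output filename and that a path counts as "data" when it contains /data/; both assumptions, and the function name prefer_data_paths, are hypothetical.

```python
def prefer_data_paths(outfiles, paths):
    """Keep one path per output file, choosing a /data/ path over any other."""
    chosen = {}
    for outfile, path in zip(outfiles, paths):
        current = chosen.get(outfile)
        # Take the first path seen, then upgrade only from non-data to data
        if current is None or ('/data/' in path and '/data/' not in current):
            chosen[outfile] = path
    return chosen
```

When two candidates are both in /data (or both in scratch), the first one encountered is kept, which matches a non-discriminatory removal for those cases.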
Current paths scanned include:
/cmip5_css01/data/cmip5/
/cmip5_css01/scratch/cmip5/
/cmip5_css02/data/cmip5/
/cmip5_css02/scratch/cmip5/
/cmip5_gdo2/data/cmip5/
/cmip5_gdo2/scratch/cmip5/
/cmip5_css02/cmip5/data/cmip5/
/cmip5_css02/scratch/_gc/
From: "Ames, Sasha"
Date: Thursday, April 21, 2016 at 4:32 PM
To: "Durack , Paul J."
Subject: Re: removal of IPSL and INM duplicates on gdo2
Hi Paul,
It turns out that much of the data didn’t delete because permissions weren’t set and its owned
by a user who left the lab. We’ll get that addressed. As there’s nothing left on that that’s
not duplicated anyway, I’d suggest just removing that mount from your list of cdscans.
Thanks,
Sasha
There are pathologies where the "final" directory contains problem files:
>>> (path,dirs,files) = os.walk('/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/abrupt4xCO2/mon/atmos/Amon/r1i1p1/v1/hus','false')
>>> path
('/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/abrupt4xCO2/mon/atmos/Amon/r1i1p1/v1/hus', ['bad1', 'bad0'], ['hus_Amon_FGOALS-s2_abrupt4xCO2_r1i1p1_185001-199912.nc'])
>>> files
('/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/abrupt4xCO2/mon/atmos/Amon/r1i1p1/v1/hus/bad0', [], ['hus_Amon_FGOALS-s2_abrupt4xCO2_r1i1p1_185001-199912.nc'])
>>> files != []
True
>>> dirs == []
False
>>> dirs
('/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/abrupt4xCO2/mon/atmos/Amon/r1i1p1/v1/hus/bad1', [], ['hus_Amon_FGOALS-s2_abrupt4xCO2_r1i1p1_185001-199912.nc'])
>>>
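The session above suggests that a test such as dirs == [] fails to recognise a final directory once stray subdirectories like bad0 and bad1 appear in it. A hedged sketch that instead qualifies a directory on its own *.nc files and does not descend further; the function name data_dirs is hypothetical, and whether the bad subdirs should be skipped entirely is an assumption.

```python
import os


def data_dirs(top):
    """Yield directories under top that directly contain *.nc files.

    A directory qualifies on its own files, whether or not it also has
    subdirectories; once it qualifies, its subdirectories are not visited.
    """
    for path, dirs, files in os.walk(top):
        if any(name.endswith('.nc') for name in files):
            dirs[:] = []  # prune in place so os.walk skips e.g. bad0, bad1
            yield path
```

The in-place assignment to dirs relies on os.walk running top-down, which is its default.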
From: "Bonfils, Celine J. W" @bonfils2
Date: Monday, January 25, 2016 at 3:01 PM
To: "Durack , Paul J."
Cc: "Bonfils, Celine J. W", "Taylor, Karl Taylor" @taylor13
Subject: Re: stomatal conductance, AMIPFUTURE, sstClim4xCO2
Dear Paul,
We had a brainstorming session this morning with Karl, and it seems that we have a way to explore the vegetation response using sstClim and sstClim4xCO2 experiments.
Would it be possible to add the sstClim and sstClim4xCO2 outputs on the /work/cmip5/list?
I also see that "gpp" (photosynthesis) is a possible output. Could we add it as well for all AMIP/AMIPFuture/AMIP4xCO2/sstClim/sstClim4xCO2? That would be super helpful!
Thanks a lot, I know it is a lot to add. Let me know what how I can help.
Cheers,
Celine
PS: Karl, I think we can already compute the net Energy at TOA with: rsus, rsut, rlus and rout.