roocs / 34e-mngmt
Management of 34e
License: BSD 2-Clause "Simplified" License
As subject.
Maybe use the same as CDS, but also consider whether, if we are using the GEO stack, we should use the most common GIS standard.
NOTE: our client library can easily map from CDS representation to our own standard.
Create an ElasticSearch index of the "character" of the C3S "data pool" of CMIP5 data.
The index should record all the "characteristics" identified as being important at the level of the ESGF Dataset.
Try the default cookie cutter.
Try the CEDA cookie cutter:
https://github.com/cedadev/cookiecutter-pypackage
Input data chunking:
memory-profiler package to check that the memory limit is being respected
Output data chunking:
subset in clisops and daops to use new solution. See: https://github.com/roocs/daops/blob/master/daops/__init__.py
Can we just use: from clisops import logging ?
We have defaults hard-coded in rook subset process:
https://github.com/roocs/rook/blob/master/rook/processes/wps_subset.py
These need to be removed for area, time and level. You cannot set them to None - it breaks the code.
So I think the best fix would be to set the defaults to '' and then parse them later with:
    kwargs = parameterise.parameterise(
        collection=collection,
        time=request.inputs['time'][0].data or None,
        level=request.inputs['level'][0].data or None,
        area=request.inputs['area'][0].data or None)
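The "or None" idiom works because an empty string is falsy in Python. A minimal sketch of that normalisation, independent of PyWPS; the helper name normalise_inputs and the sample values are hypothetical and only illustrate the idea:

```python
def normalise_inputs(raw):
    """Map empty-string defaults to None, leaving real values untouched."""
    return {name: (value or None) for name, value in raw.items()}


# Hypothetical request values: time was supplied, area and level were left at ''.
raw_inputs = {"time": "2085-01-01/2120-12-30", "area": "", "level": ""}
kwargs = normalise_inputs(raw_inputs)
print(kwargs)
```

One caveat of this idiom: any falsy value (such as 0) would also become None, so it only suits inputs that arrive as strings.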
Assuming that master is ready on all our repositories:
cedadev/release-maker
bumpversion to update.
HISTORY.md file and any references to version
.pypirc file to authenticate to PyPI
roocs-utils/roocs-releases/, named like: roocs-suite-release-<YYYY>-<MM>-<DD><x>.yml - where <x> is a letter used if more than one in a single day
requirements.txt files to use actual releases.
requirements_dev.txt file that just lists our packages by name (and we can install them with pip -e ... when developing).
daops library:
daops test suite but injecting real data when we want to run in OCI
We should do some testing of the production web service to see how it stands up to receiving many requests. These requests should not be data requests; they should poll quick end-points such as:
We could use Locust to do this:
An example of using it for one of our services is here:
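Separately from that example, the polling idea can be sketched without any dependency. This is not Locust: it is a plain thread-pool stand-in, and the URLs, the fake_fetch function and the request counts are all assumptions chosen so the sketch runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def poll(fetch, url):
    """Call fetch(url) once and return (url, ok, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        fetch(url)
        ok = True
    except Exception:
        ok = False
    return url, ok, time.perf_counter() - start


def run_load(fetch, urls, requests_per_url=10, workers=8):
    """Fire many quick requests concurrently and collect the outcomes."""
    jobs = [url for url in urls for _ in range(requests_per_url)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: poll(fetch, u), jobs))


def fake_fetch(url):
    """Stand-in for a real HTTP GET so the sketch runs offline."""
    if "broken" in url:
        raise RuntimeError("simulated failure")
    return "ok"


# Placeholder URLs, not real service end-points.
urls = ["http://example.invalid/status", "http://example.invalid/broken"]
results = run_load(fake_fetch, urls, requests_per_url=5)
failures = [r for r in results if not r[1]]
print(len(results), "requests,", len(failures), "failed")
```

To aim this at a real service, fake_fetch could be swapped for a function that performs an HTTP GET; Locust would add ramp-up control and reporting on top of the same idea.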
E.g.:
Error is:
<ows:ExceptionText>Process error: method=wps_subset.py._handler, line=80, msg=Input longitude bounds ([-30. 125.]) cross the 0 degree meridian but dataset longitudes are all positive.</ows:ExceptionText>
Not needed at present so closing.
Input file:
ncdump -h /group_workspaces/jasmin2/cp4cds1/vol1/data/c3s-cmip5/output1/MOHC/HadGEM2-ES/rcp85/3hr/atmos/3hr/r1i1p1/vas/files/20111002/vas_3hr_HadGEM2-ES_rcp85_r1i1p1_202512010300-203012010000.nc | grep _FillValue
vas:_FillValue = 1.e+20f ;
Output files:
ncdump -h vas_3hr_HadGEM2-ES_rcp85_r1i1p1_20280101-20301023.nc | grep _FillValue
height:_FillValue = NaN ;
lat:_FillValue = NaN ;
lon:_FillValue = NaN ;
lat_bnds:_FillValue = NaN ;
lon_bnds:_FillValue = NaN ;
vas:_FillValue = 1.e+20f ;
We need to check whether this might be an issue. It is only adding _FillValue attributes where they don't exist in the input.
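If the extra attributes do turn out to matter, and assuming the output is written with xarray, one option is an encoding that sets _FillValue to None for variables that had none in the input (xarray treats that as "do not write a fill value"). The sketch below only builds that encoding dict in plain Python; the function name and the attribute mapping are hypothetical.

```python
def fill_value_encoding(input_attrs):
    """Return an encoding dict that disables _FillValue where the input had none.

    input_attrs maps variable name -> dict of that variable's attributes
    as found in the input file.
    """
    return {
        name: {"_FillValue": None}
        for name, attrs in input_attrs.items()
        if "_FillValue" not in attrs
    }


# Attributes mirroring the ncdump output above: only vas has a fill value.
input_attrs = {
    "height": {},
    "lat": {},
    "lon": {},
    "lat_bnds": {},
    "lon_bnds": {},
    "vas": {"_FillValue": 1.0e20},
}
encoding = fill_value_encoding(input_attrs)
print(sorted(encoding))
```

With xarray, the result could then be passed as the encoding argument of Dataset.to_netcdf.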
Things To Do:
Gather all the existing content that might live in the INI files:
dachar/utils/options.py, dachar/config.py
roocs_utils/config.py
settings.py?
Place the project-specific variables into a file in a sub-directory of the Python package, e.g. "roocs_utils/etc/roocs.ini".
Update the setup.py to include a line like: package_data={'roocs_utils': ['etc/roocs.ini']},
Allow the user to specify the location of their own INI file(s), based on the content of the env var ROOCS_CONFIG, which is a colon-separated string of paths to config files.
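A minimal sketch of the loading order described above, using only the standard library: the packaged INI is read first, then each path in ROOCS_CONFIG, so later files override earlier ones. The function name load_config, the [demo] section and its options are assumptions for illustration, and temporary files stand in for roocs.ini.

```python
import configparser
import os
import tempfile


def load_config(default_ini, env_var="ROOCS_CONFIG"):
    """Read the packaged INI first, then any user files named in env_var.

    Later files override earlier ones, so user settings win.
    """
    paths = [default_ini]
    paths += [p for p in os.environ.get(env_var, "").split(":") if p]
    parser = configparser.ConfigParser()
    parser.read(paths)  # files that do not exist are skipped silently
    return parser


# Demo with temporary files standing in for roocs.ini and a user override.
# The section and option names are made up for illustration.
tmp = tempfile.mkdtemp()
default_ini = os.path.join(tmp, "roocs.ini")
user_ini = os.path.join(tmp, "user.ini")
with open(default_ini, "w") as f:
    f.write("[demo]\nbase_dir = /default\nchunk_limit = 100\n")
with open(user_ini, "w") as f:
    f.write("[demo]\nbase_dir = /override\n")

os.environ["ROOCS_CONFIG"] = user_ini
config = load_config(default_ini)
print(config["demo"]["base_dir"], config["demo"]["chunk_limit"])
```

Options the user file does not mention keep their packaged defaults, which is why chunk_limit survives the override here.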
Write a document (in Markdown format - as used in GitHub) called:
Investigations into Characterisation of Data sets
Include the following sections:
Reboot sorted out my problems, but also needed to restart everything:
wps1:
munge
slurmctld
nginx
rook:
batch[123]:
munge
slurmd
Also /run/pywps is removed on wps1.
Need to test and fix all this.
Develop a tool to work through each dimension of the multi-dimensional structure of the CMIP5 data pool. The tool should analyse each dimension and identify outliers/anomalies in each case, and report them.
Done :-)
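One simple way such a tool could flag anomalies is to count the values seen along a dimension and report those held by only a small fraction of datasets. This is a sketch under assumptions, not the implemented tool: the record layout, the 20% threshold and the dataset ids are invented for illustration.

```python
from collections import Counter


def find_outliers(records, dimension, threshold=0.2):
    """Return {value: [dataset ids]} for values held by fewer than `threshold` of records."""
    counts = Counter(rec[dimension] for rec in records)
    total = len(records)
    rare = {value for value, n in counts.items() if n / total < threshold}
    outliers = {}
    for rec in records:
        if rec[dimension] in rare:
            outliers.setdefault(rec[dimension], []).append(rec["dsid"])
    return outliers


# Made-up characterisation records; the dsid values are placeholders.
records = [
    {"dsid": f"ds{i}", "calendar": "360_day", "array_dims": "time lat lon"}
    for i in range(9)
]
records.append({"dsid": "ds9", "calendar": "gregorian", "array_dims": "time lat lon"})

for dim in ("calendar", "array_dims"):
    print(dim, find_outliers(records, dim))
```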
Let's test out how easy it would be to use xr.roll on the ds before we send it to clisops.ops.core.
Example:
Error is:
<ows:ExceptionText>Process error: method=wps_subset.py._handler, line=80, msg=Input longitude bounds ([-30. 125.]) cross the 0 degree meridian but dataset longitudes are all positive.</ows:ExceptionText>
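The arithmetic behind the proposed roll can be sketched with plain lists: shift longitudes from 0..360 to -180..180, then rotate both the coordinate and the data so the axis stays ascending. In xarray this would presumably be done with the Dataset.roll method; the function below is a hypothetical stand-in that assumes the input longitudes are ascending.

```python
def roll_longitudes(lons, values):
    """Convert ascending 0..360 longitudes to -180..180 and roll data to match."""
    wrapped = [lon - 360.0 if lon >= 180.0 else lon for lon in lons]
    # The first wrapped (negative) longitude marks where the rotated axis starts.
    shift = next((i for i, lon in enumerate(wrapped) if lon < 0), 0)
    rolled_lons = wrapped[shift:] + wrapped[:shift]
    rolled_values = values[shift:] + values[:shift]
    return rolled_lons, rolled_values


lons = [0.0, 90.0, 180.0, 270.0]
values = ["a", "b", "c", "d"]
new_lons, new_values = roll_longitudes(lons, values)
print(new_lons, new_values)
```

After the roll, a request such as [-30, 125] falls inside a single ascending longitude axis instead of crossing the 0 degree meridian.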
As empty repo initially, with same ability to use/override roocs.ini.
No issue so closing.
auto is working.
Set up a notebook with rooki to run checks against the demo service (using C/I).
clisops:
    def subset(
        ds,
        time=None,
        area=None,
        level=None,
        output_dir=None,
        output_type="netcdf",
        split_method="time:auto",
        file_namer="standard"):
        ...
daops:
    def subset(
        collection,
        time=None,
        area=None,
        level=None,
        output_dir=None,
        output_type="netcdf",
        split_method="time:auto",
        file_namer="standard"):
        ...
rook:
    def subset(
        collection,
        time=None,
        area=None,
        level=None):
        ...
Look into all repositories/tools that capture problems/issues in CMIP5, CORDEX and CMIP6.
shape of main variable)
CDS Team would like to see:
    import esgf_wps

    subset = {
        'name': 'subset',
        'arguments': {
            'period': '185001-190012',
            'area': [-10, -10, 10, 10]}
    }
    average = {
        'name': 'average',
        'arguments': {
            'axes': ['time', 'latitude']}
    }
    retrieve = {
        'name': 'retrieve',
        'arguments': {
            'ensemble_member': 'r6i1p2',
            'format': 'zip',
            'model': 'giss_e2_h',
            'variable': 'geopotential_height',
            'experiment': 'historical'}
    }
    # Each step wraps the step that feeds it: retrieve -> subset -> average.
    result = esgf_wps.orchestrate(
        'projections-cmip5-monthly-pressure-levels',
        {
            '_service': (average,
                {'_service': (subset,
                    {'_service': (retrieve,)})})
        },
        'download.zip')
Note: Currently we have:
    import cdsapi

    c = cdsapi.Client()
    c.retrieve(
        'projections-cmip5-monthly-pressure-levels',
        {
            'ensemble_member': 'r6i1p2',
            'format': 'zip',
            'model': 'giss_e2_h',
            'variable': 'geopotential_height',
            'experiment': 'historical',
            'period': '185001-190012',
        },
        'download.zip')
split_method look in the APIs?
Default of "time:auto" is defined in section clisops:write of config.
The initial deployment will be on port 80.
This has been updated in the playbook.
Take the current version of the daops library, add in fix functions to apply fixes for the character of specific datasets. Also write tests and run them.
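One possible shape for such fix functions is a registry keyed by dataset id, with each fix being a small function applied in order. This is an illustrative sketch only and not the daops API: the names FIXES, register_fix and apply_fixes, the dataset ids and the example fix are all hypothetical.

```python
FIXES = {}


def register_fix(dsid):
    """Decorator that records a fix function for one dataset id."""
    def wrapper(func):
        FIXES.setdefault(dsid, []).append(func)
        return func
    return wrapper


@register_fix("example.dataset.id")  # placeholder id, not a real dataset
def drop_level_dim(data):
    """Hypothetical fix: remove an unwanted 'level' entry from the dims list."""
    return {**data, "dims": [d for d in data["dims"] if d != "level"]}


def apply_fixes(dsid, data):
    """Apply every registered fix for dsid, in registration order."""
    for fix in FIXES.get(dsid, []):
        data = fix(data)
    return data


fixed = apply_fixes("example.dataset.id", {"dims": ["time", "level", "lat", "lon"]})
untouched = apply_fixes("another.dataset.id", {"dims": ["time", "level"]})
print(fixed, untouched)
```

Keeping each fix as a pure function of its input makes the requested tests straightforward: one input, one expected output per fix.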
Draw a diagram of the software architecture needed to scan lots of CMIP5 ESGF data sets (in parallel on LOTUS), and capture the character in an appropriate format/database.
Capture and log any exceptions when we cannot read a dataset and/or cannot capture a specific characteristic.
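A minimal sketch of that behaviour with the standard logging module: each characteristic is extracted inside its own try block, so one failure is logged and recorded without stopping the scan. The extractor functions, the logger name and the dataset id are assumptions for illustration.

```python
import logging

logger = logging.getLogger("characterise")


def characterise(dsid, dataset, extractors):
    """Run each extractor; log failures instead of aborting the scan."""
    character, errors = {}, {}
    for name, extract in extractors.items():
        try:
            character[name] = extract(dataset)
        except Exception as exc:
            # logger.exception records the traceback along with the message.
            logger.exception("Could not capture %r for %s", name, dsid)
            errors[name] = repr(exc)
    return character, errors


extractors = {
    "var_id": lambda ds: ds["var_id"],
    "calendar": lambda ds: ds["calendar"],  # missing below, so this one fails
}
character, errors = characterise("example.dataset.id", {"var_id": "prsn"}, extractors)
print(character, errors)
```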
Points to consider regarding the capture of time information:
time information recorded in our characterisation includes the calendar.
Would need to carefully farm out connections to download server(s) to manage bandwidth.
First idea for an inventory:
$ more c3s-cmip5.yml
- basedir: /group_workspaces/jasmin2/cp4cds1/vol1/data
  project: c3s-cmip5
- path: c3s-cmip5/output1/MOHC/HadGEM2-ES/rcp45/day/atmos/day/r1i1p1/prsn/v20111128
  dsid: c3s-cmip5.output1.MOHC.HadGEM2-ES.rcp45.day.atmos.day.r1i1p1.prsn.v20111128
  var_id: prsn
  array_dims: time lat lon
  array_shape: 3600+ 145 192
  time: 2005-12-01T12:00:00 2015-11-30T12:00:00
  area: -180 90 180 -90
  facets:
    - model: ...
    - experiment: ...
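In the entry above, the dsid is the path with "/" replaced by ".", so an inventory record can be derived from the path plus a few scanned values. A small sketch, with hypothetical function names and a fixed time length used in place of the open-ended "3600+":

```python
def path_to_dsid(path):
    """Derive the dataset id by swapping path separators for dots."""
    return path.strip("/").replace("/", ".")


def make_record(path, var_id, array_dims, array_shape):
    """Build one inventory entry as a plain dict, ready to be dumped as YAML."""
    return {
        "path": path,
        "dsid": path_to_dsid(path),
        "var_id": var_id,
        "array_dims": " ".join(array_dims),
        "array_shape": " ".join(str(n) for n in array_shape),
    }


record = make_record(
    "c3s-cmip5/output1/MOHC/HadGEM2-ES/rcp45/day/atmos/day/r1i1p1/prsn/v20111128",
    "prsn",
    ["time", "lat", "lon"],
    [3600, 145, 192],
)
print(record["dsid"])
```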
Now implemented.
daops-tester from a cookie-cutter template
Update existing notebooks
Create a Python API that can be used to interrogate the analyses from the "characterisation" process.