34e-mngmt's Issues

Standardise bounding box order

Maybe use the same order as the CDS, but also consider: if we are using the GEO stack, should we use the most common GIS standard?

NOTE: our client library can easily map from CDS representation to our own standard.

Create "character" index for C3S CMIP5

Create an ElasticSearch index of the "character" of the C3S "data pool" of CMIP5 data.

The index should record all the "characteristics" identified as being important at the level of the ESGF Dataset.

Create scalable solution for data-chunking

Input data chunking:

  • See: roocs/clisops#27
  • Implement the suggested solution using dask.
  • @ellesmith88: Run tests with the memory-profiler package to check that the memory limit is being respected
    • Create a unit test in clisops that uses the memory profiler and captures the memory usage
      • it should then check the memory did not (greatly) exceed the dask chunk limit
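The memory check described above could be sketched with the stdlib's tracemalloc as a stand-in for the memory-profiler package (the helper name and limit-checking logic here are assumptions, not existing clisops code):

```python
import tracemalloc

def check_peak_memory(func, limit_bytes):
    # Run func and report whether its peak Python allocation stayed
    # under limit_bytes (a stand-in for the dask chunk limit).
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak <= limit_bytes

# A fake "operation" allocating ~1 MB passes a 10 MB limit:
assert check_peak_memory(lambda: bytes(1_000_000), 10_000_000)
```

A real unit test would wrap the clisops subset call in place of the lambda and allow a small tolerance, since the limit may be exceeded slightly.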

Output data chunking:

  • Need a setting for the size limit for output netCDF files - or a function to manage it
  • Prototype file-naming rules for naming the output files:
    • Default: output_001.nc, output_002.nc etc
    • Renamer rules: for CMIP5
  • Fully implement file-namer:
    • Integrate file-naming with templates from CONFIG files (in roocs-utils config):
    • Add CMIP6 and CORDEX - and add more tests to check they work
  • Modify subset in clisops and daops to use new solution.
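The default naming rule above (output_001.nc, output_002.nc, etc.) could be sketched as follows; the helper name is hypothetical, and project-specific templates (CMIP5, CMIP6, CORDEX) would come from the roocs-utils config:

```python
def standard_namer(index, template="output_{:03d}.nc"):
    # Default rule: zero-padded counter. Project-specific renamer rules
    # would swap in a template built from dataset facets.
    return template.format(index)

standard_namer(1)   # 'output_001.nc'
standard_namer(12)  # 'output_012.nc'
```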

Remove default values in rook subset process

We have defaults hard-coded in rook subset process:

https://github.com/roocs/rook/blob/master/rook/processes/wps_subset.py

These need to be removed for area, time and level. They cannot simply be set to None - that breaks the code.

So I think the best fix would be to set the defaults to '' and then parse them later with:

        kwargs = parameterise.parameterise(collection=collection,
                                           time=request.inputs['time'][0].data or None,
                                           level=request.inputs['level'][0].data or None,
                                           area=request.inputs['area'][0].data or None)

Create system for automating releases

Assuming that master is ready on all our repositories:

  • Create a new library from cookie-cutter template
  • Push to github repository: cedadev/release-maker
  • In the repo, write a simple package that will:
    1. Read a YAML manifest file for each release collection.
    2. For each release in the collection, record:
       • package repo name/location
       • tag name
       • release name
       • release notes
    3. Parse the manifest and validate it
    4. Publish releases to GitHub
    5. Publish releases to PyPI
  • Pull each package from github, run bumpversion to update.
    • bumpversion should create git commit
    • update the HISTORY.md file and any references to version
    • on push to github:
      • initially, when testing with dummy releases: use PyPI API to publish
      • once tested, use: Travis-CI could run tests and on-success: push to PyPI
  • Base this on work in: https://github.com/agstephens/api-playground
    • Use a github access token to authenticate to github
    • Use a .pypirc file to authenticate to PyPI
  • Need to agree the YAML format, keep it simple
  • Put the history of YAML files in: roocs-utils/roocs-releases/, named like:
    • roocs-suite-release-<YYYY>-<MM>-<DD><x>.yml - where <x> is a letter used if more than one in a single day
  • Update all our requirements.txt files to use actual releases.
  • Update all our packages to include a requirements_dev.txt file that just lists our packages by name (and we can install them with pip -e ... when developing).
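A minimal sketch of the "parse the manifest and validate it" step, assuming a simple (not yet agreed) structure with one record per release, shown here as the parsed Python structure rather than raw YAML:

```python
# Assumed field names - the YAML format still needs to be agreed.
REQUIRED_FIELDS = {"repo", "tag", "name", "notes"}

def validate_manifest(manifest):
    # manifest: parsed YAML, e.g. {"releases": [{"repo": ..., "tag": ...}]}
    releases = manifest.get("releases", [])
    if not releases:
        raise ValueError("manifest contains no releases")
    for i, release in enumerate(releases):
        missing = REQUIRED_FIELDS - set(release)
        if missing:
            raise ValueError(f"release {i} is missing fields: {sorted(missing)}")
    return True

validate_manifest({"releases": [
    {"repo": "roocs/clisops", "tag": "v0.1.0",
     "name": "clisops 0.1.0", "notes": "First release."}]})
```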

Set up GitHub presence

  • Create "roocs" GitHub organisation because "csorr" was taken. "roocs" nominally stands for "Robust Operations On Climate Simulations", but it is really just a useful short namespace to hang our work off.
  • Create a Project: https://github.com/orgs/roocs/projects/1
  • Invite all internal people to it.
  • Create initial repos
  • Create wiki page to describe how we will manage the project
    • Using the project:
      • All issues from all "roocs" repositories should be included in the project.
      • If we depend on an issue in an external repository then we should create an issue in our "roocs" project and link it to the external issue.
      • Technical discussions: where to document them?
      • Detailed technical discussions should take place in GitHub issues (not in Slack).
    • Code management:
      • Use PRs to enforce some level of code review
      • Use Codacy on GitHub repositories (for automated checking of errors)
      • Mandate use of PEP8 or Black
    • Code testing:
      • Use tests in Birdhouse WPS
      • Most testing will be in daops library:
        • Use Pytest fixtures to parametrise tests.
        • Re-use the daops test suite, but inject real data when we want to run in OCI
      • Minimum of 80% code coverage in unit tests
    • Documentation:
      • Need good, clear public documentation in GitHub Pages.
      • Need examples of usage of each tool/library
      • Encourage inclusion of Jupyter Notebooks to demonstrate usage
  • Share the info with people
  • Put in initial tasks as discussed in the meeting

Implement or deploy appropriate Exception raises so that client gets info back

E.g.:

http://rook-wps1.ceda.ac.uk/wps?service=WPS&version=1.0.0&request=Execute&identifier=subset&storeExecuteResponse=false&status=false&datainputs=collection=c3s-cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.tas.latest;time=2008-01-01/2015-01-01;area=-30,-40,125,20

Error is:

<ows:ExceptionText>Process error: method=wps_subset.py._handler, line=80, msg=Input longitude bounds ([-30. 125.]) cross the 0 degree meridian but dataset longitudes are all positive.</ows:ExceptionText>
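A minimal sketch of the pattern, not rook's actual code: raise a dedicated, user-facing exception in the handler so the message reaches the client's exception report rather than being swallowed (the class and function names here are hypothetical):

```python
class ProcessError(Exception):
    """User-facing error whose message should reach the WPS client."""

def handle_subset(area):
    # area = (minx, miny, maxx, maxy); a crude stand-in for the
    # clisops longitude-bounds check that produced the error above.
    lon_min, lon_max = area[0], area[2]
    if lon_min < 0 <= lon_max:
        raise ProcessError(
            f"Input longitude bounds ([{lon_min}. {lon_max}.]) cross the "
            "0 degree meridian but dataset longitudes are all positive.")
    return area
```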

Consider argument names

  • "time": GOOD
  • "space" or "bbox": use OGC convention
  • "level": ?
  • "data_ref": need to consider whether "data_ref(s)", "resource(s)", "data" or other is best terminology.

Open access to CDS and DKRZ

  • Request IP list/ranges from DKRZ and ECMWF
  • Liaise with SCD to check the service
  • Write service documentation
  • Modify iptables firewall settings to allow access
  • Request that the site firewall is opened up to allow access
  • Check connectivity

Investigate _FillValues in output files

Input file:

ncdump -h   /group_workspaces/jasmin2/cp4cds1/vol1/data/c3s-cmip5/output1/MOHC/HadGEM2-ES/rcp85/3hr/atmos/3hr/r1i1p1/vas/files/20111002/vas_3hr_HadGEM2-ES_rcp85_r1i1p1_202512010300-203012010000.nc | grep _FillValue
                vas:_FillValue = 1.e+20f ;

Output files:

ncdump -h vas_3hr_HadGEM2-ES_rcp85_r1i1p1_20280101-20301023.nc | grep _FillValue
                height:_FillValue = NaN ;
                lat:_FillValue = NaN ;
                lon:_FillValue = NaN ;
                lat_bnds:_FillValue = NaN ;
                lon_bnds:_FillValue = NaN ;
                vas:_FillValue = 1.e+20f ;

We need to check whether this might be an issue. It is only adding _FillValue attributes where they don't exist in the input.
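If we do need to suppress the added attributes, xarray's to_netcdf accepts a per-variable encoding in which _FillValue can be set to None. A stdlib-only sketch of building that encoding (the helper name is hypothetical):

```python
def no_fill_encoding(var_names):
    # Build an encoding dict that tells xarray's to_netcdf not to add
    # a _FillValue attribute to these (coordinate/bounds) variables.
    return {name: {"_FillValue": None} for name in var_names}

coords = ["height", "lat", "lon", "lat_bnds", "lon_bnds"]
encoding = no_fill_encoding(coords)
# ds.to_netcdf("output.nc", encoding=encoding)  # ds: the xarray Dataset
```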

Set up GeoHealthCheck service at DKRZ

Things To Do:

  • DKRZ to set up a new VM for this
  • Deploy GeoHealthCheck
  • Set up an Nginx proxy
  • Get a certificate for HTTPS
  • Open ports 80 and 443
  • Configure GeoHealthCheck to monitor the CEDA rook WPS

Deploy cluster at CEDA

  • Prepare VMs
  • Prepare playbooks
  • Run playbooks
  • Test system
  • Ensure C3S CMIP5 Data Pool is available
  • Ensure output directory is scalable
  • Check Slurm is functional

Enable overriding of project-specific configs via environment variable

  • Gather all the existing content that might live in the INI files:

    • most of dachar/utils/options.py,
    • maybe some of dachar/config.py
    • roocs_utils/config.py
    • stuff in settings.py?
  • Place the project-specific variables into a sub-directory of the python package, e.g. "roocs_utils/etc/roocs.ini" file.

  • Update the setup.py to include line like: package_data={'roocs_utils': ['etc/roocs.ini']},

  • Allow the user to specify the location of their own INI file(s), based on the content of the env var ROOCS_CONFIG which is a colon-separated string of paths to config files.
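A sketch of the env-var handling described above; the helper name and the precedence (later files override earlier ones, packaged default as fallback) are assumptions:

```python
import os

def get_config_paths(default="roocs_utils/etc/roocs.ini"):
    # ROOCS_CONFIG is a colon-separated string of paths to INI files.
    # Assumption: fall back to the packaged default when it is unset.
    value = os.environ.get("ROOCS_CONFIG", "")
    paths = [p for p in value.split(":") if p]
    return paths or [default]

os.environ["ROOCS_CONFIG"] = "/etc/roocs.ini:/home/user/roocs.ini"
get_config_paths()  # ['/etc/roocs.ini', '/home/user/roocs.ini']
```

The returned paths could then be fed straight to configparser.ConfigParser().read(), which already applies last-file-wins overriding.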

Write a document outlining the characterisation investigations

Write a document (in Markdown format - as used in GitHub) called:

Investigations into Characterisation of Data sets

Include the following sections:

  • Overview - i.e. what we are trying to achieve and why
  • Scope of investigation - limitations and focus of this investigation
  • Results - what we found out in the data
  • Analysis and proposed approach - a summary of which characteristics we need to extract

Server reboot does not restart all the services

Rebooting sorted out my problems, but I also needed to restart everything:

wps1:
  munge
  slurmctld
  nginx
  rook

batch[123]:
  munge
  slurmd

Also /run/pywps is removed on wps1.

Need to test and fix all this.

Analyse content of C3S CMIP5 "character" index

Develop a tool to work through each dimension of the multi-dimensional structure of the CMIP5 data pool. The tool should analyse each dimension and identify outliers/anomalies in each case, and report them.

Can we roll this dataset?

Let's test out how easy it would be to:

  1. Transform the box, or
  2. Use xr.roll on the ds before we send it to clisops.ops.core

Example:

http://rook-wps1.ceda.ac.uk/wps?service=WPS&version=1.0.0&request=Execute&identifier=subset&storeExecuteResponse=false&status=false&datainputs=collection=c3s-cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.tas.latest;time=2008-01-01/2015-01-01;area=-30,-40,125,20

Error is:

<ows:ExceptionText>Process error: method=wps_subset.py._handler, line=80, msg=Input longitude bounds ([-30. 125.]) cross the 0 degree meridian but dataset longitudes are all positive.</ows:ExceptionText>
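Option 1 (transform the box) could be sketched as below; the helper is hypothetical, and the wrapped result shows why option 2 (xr.roll on the dataset) may still be needed:

```python
def bbox_to_0_360(lon_min, lat_min, lon_max, lat_max):
    # Map [-180, 180] longitudes into a [0, 360] frame, to match
    # datasets whose longitudes are all positive.
    return lon_min % 360, lat_min, lon_max % 360, lat_max

bbox_to_0_360(-30, -40, 125, 20)  # (330, -40, 125, 20)
# lon_min > lon_max after the transform, so the request still wraps:
# we would either split it into two boxes or roll the dataset
# (e.g. ds.roll(lon=..., roll_coords=True)) before subsetting.
```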

Testing and monitoring the service and repos

Set up a notebook with rooki that runs checks against the demo service (using CI).

  • example notebooks in rooki (with real IPs?)
  • could use PAVICS system
  • or just call pytest (with extension for notebooks)
  • via cron or running on Travis C/I

Unify APIs based on updates to clisops

clisops:

def subset(
    ds,
    time=None,
    area=None,
    level=None,
    output_dir=None,
    output_type="netcdf",
    split_method="time:auto",
    file_namer="standard"): ...

daops:

def subset(
    collection,
    time=None,
    area=None,
    level=None,
    output_dir=None,
    output_type="netcdf",
    split_method="time:auto",
    file_namer="standard"): ...

rook:

def subset(
    collection,
    time=None,
    area=None,
    level=None): ...

Review all examples we can find of problematic "character"

Look into all repositories/tools that capture problems/issues in CMIP5, CORDEX and CMIP6.

  • Record all characteristics in the files/datasets that are responsible for causing errors.
  • Categorise all the errors in terms of the characteristics that we need to record
  • Document a list of all characteristics that our characterisation store should record (e.g.: shape of main variable)

CDS API - basic overview

CDS Team would like to see:

import esgf_wps

subset = {
    'name': 'subset',
    'arguments': {
        'period': '195001-190012',
        'area': [-10, -10, 10, 10]}
}

average = {
    'name': 'average',
    'arguments': {
         'axes': ['time', 'latitude']}
}

retrieve = {
    'name': 'retrieve',
    'arguments': {
        'ensemble_member': 'r6i1p2',
        'format': 'zip',
        'model': 'giss_e2_h',
        'variable': 'geopotential_height',
        'experiment': 'historical'}
}

result = esgf_wps.orchestrate(
    'projections-cmip5-monthly-pressure-levels',
    {'_service': (average,
        {'_service': (subset,
            {'_service': (retrieve, ...)})})},
    'download.zip')

Note: Currently we have:

import cdsapi

c = cdsapi.Client()

c.retrieve(
    'projections-cmip5-monthly-pressure-levels',
    {
        'ensemble_member': 'r6i1p2',
        'format': 'zip',
        'model': 'giss_e2_h',
        'variable': 'geopotential_height',
        'experiment': 'historical',
        'period': '185001-190012',
    },
    'download.zip')

Implement "split_method" parameter in APIs

How should split_method look in the APIs?

  • All options (for rook, daops and clisops):
    • time:auto [DEFAULT]
    • time:month - split output files by month
    • time:year - split output files by year
    • time:decade - split output files by decade

The default of "time:auto" is defined in the clisops:write section of the config.
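A sketch of how the parameter could be validated across the three APIs (a hypothetical parser, not existing code):

```python
VALID_FREQS = {"auto", "month", "year", "decade"}

def parse_split_method(value="time:auto"):
    # Expected shape: "time:<auto|month|year|decade>".
    dim, _, freq = value.partition(":")
    if dim != "time" or freq not in VALID_FREQS:
        raise ValueError(f"Unsupported split_method: {value!r}")
    return dim, freq

parse_split_method("time:year")  # ('time', 'year')
```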

Create a plan for scanning character from _some_ CMIP5 data

Draw a diagram of the software architecture needed to scan lots of CMIP5 ESGF data sets (in parallel on LOTUS), and capture the character in an appropriate format/database.

Capture and log any exceptions when we cannot read a dataset and/or cannot capture a specific characteristic.

Considerations for characterisation of time information

Points to consider regarding the capture of time information:

  • We need to ensure that time information recorded in our characterisation includes the calendar.
  • Start and end date.
  • Length of time axis.
  • When we have read in a multi-file dataset we need to check that the resulting Dataset does not include multiple time definitions - which would indicate inconsistencies in the time definition between files.
  • Not sure if we want to capture/estimate the frequency.
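The points above could be captured in a record like this; the field names are assumptions, and a real implementation would read the values (and check for multiple time definitions) from the opened xarray Dataset:

```python
from datetime import datetime

def characterise_time(times, calendar):
    # Hypothetical characterisation record for the time axis.
    return {
        "calendar": calendar,
        "start": times[0].isoformat(),
        "end": times[-1].isoformat(),
        "length": len(times),
    }

characterise_time(
    [datetime(2005, 12, 1, 12), datetime(2015, 11, 30, 12)], "360_day")
```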

Create an inventory of data sets

First idea for an inventory:

$ more c3s-cmip5.yml
- basedir: /group_workspaces/jasmin2/cp4cds1/vol1/data
  project: c3s-cmip5

- path: c3s-cmip5/output1/MOHC/HadGEM2-ES/rcp45/day/atmos/day/r1i1p1/prsn/v20111128
  dsid: c3s-cmip5.output1.MOHC.HadGEM2-ES.rcp45.day.atmos.day.r1i1p1.prsn.v20111128
  var_id: prsn
  array_dims: time lat lon
  array_shape: 3600+ 145 192
  time: 2005-12-01T12:00:00 2015-11-30T12:00:00
  area: -180 90 180 -90
  facets: 
     - model: ...
     - experiment: ...

Create new `daops-tester` repository

  • Create a new repository called daops-tester from a cookie-cutter template
  • Create a simple inventory of our "c3s-cmip5" holdings
  • Create a library that can:
    • select a dataset
    • based on the shape and dims information, decide which type of subset to extract
    • extract a subset through daops
    • analyse the subset to check that it complies with the subset specifier
  • Log everything so that we can analyse the outputs
  • Build a LOTUS layer to submit all the jobs to our batch system.

Tighten up restrictions on deployment system

  • set password for db user in postgres and redeploy
  • limit postgres server access in iptables
  • limit postgres server access in pg_hba.conf
  • limit postgres server access in postgresql.conf
  • limit server access to slurm and postgres in iptables
