
xcube-cds's People

Contributors

forman, mattfung, pont-us, toniof


Forkers

trellixvulnteam

xcube-cds's Issues

Remove constant-valued parameters from open params schema

Currently, the open parameter schemas include some optional parameters which only have one allowed value -- for instance crs in both ERA5 and Soil Moisture, which can only be WGS84. These should be removed from the schemas to make things easier for UI generation code.
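The pruning step could look something like the following sketch. This is not the actual xcube-cds code; it assumes a plain JSON Schema dict with `properties` and an optional `required` list, and treats a single-valued `enum` as the marker for a constant-valued parameter:

```python
def prune_constant_params(schema: dict) -> dict:
    # Hypothetical helper: drop optional properties whose schema admits
    # only one value (e.g. crs with enum ["WGS84"]), keeping required ones.
    required = set(schema.get("required", []))
    pruned = dict(schema)
    pruned["properties"] = {
        name: subschema
        for name, subschema in schema.get("properties", {}).items()
        if name in required or len(subschema.get("enum", [None, None])) > 1
    }
    return pruned
```

A UI generator fed the pruned schema then only renders parameters the user can actually vary.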

ERA5 monthly means by hour of day: time not monotonically increasing

For the reanalysis-era5-land-monthly-means:monthly_averaged_reanalysis_by_hour_of_day dataset (and, probably, the similarly structured reanalysis-era5-single-levels-monthly-means:monthly_averaged_ensemble_members_by_hour_of_day and reanalysis-era5-single-levels-monthly-means:monthly_averaged_reanalysis_by_hour_of_day), data are sometimes returned with time not monotonically increasing, causing them to fail cube validation.
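The fix is presumably to re-sort the returned data along the time axis before validation; for an xarray dataset that would be `ds.sortby('time')`. A pure-Python sketch of the check-then-sort logic:

```python
def ensure_monotonic(times, values):
    # Sketch only: return the series re-ordered so timestamps strictly
    # increase, as cube validation requires. With xarray, ds.sortby('time')
    # achieves the same thing.
    if all(a < b for a, b in zip(times, times[1:])):
        return list(times), list(values)  # already valid
    order = sorted(range(len(times)), key=lambda i: times[i])
    return [times[i] for i in order], [values[i] for i in order]
```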

Setup.py requirements break xcube docker build

Description

When building xcube in Docker using the default xcube 0.6.1 Dockerfile, installation of the xcube-cds plugin fails due to the xcube requirements entry in its setup.py file.

Perhaps the setup.py requirements should be commented out, as in the xcube-sh plugin.

Info

Error message when running Docker build

source activate xcube && cd xcube-cds && python setup.py develop && sed "s/- xcube/# - xcube/g" -i environment.yml && mamba env update -n xcube
running develop
running egg_info
creating xcube_cds.egg-info
writing xcube_cds.egg-info/PKG-INFO
writing dependency_links to xcube_cds.egg-info/dependency_links.txt
writing requirements to xcube_cds.egg-info/requires.txt
writing top-level names to xcube_cds.egg-info/top_level.txt
writing manifest file 'xcube_cds.egg-info/SOURCES.txt'
reading manifest file 'xcube_cds.egg-info/SOURCES.txt'
writing manifest file 'xcube_cds.egg-info/SOURCES.txt'
running build_ext
Creating /opt/conda/envs/xcube/lib/python3.8/site-packages/xcube-cds.egg-link (link to .)
Adding xcube-cds 0.6.1.dev0 to easy-install.pth file

Installed /xcube/xcube-cds
Processing dependencies for xcube-cds==0.6.1.dev0
Searching for cdsapi>=0.2.7
Reading https://pypi.org/simple/cdsapi/
Downloading https://files.pythonhosted.org/packages/d6/9e/952b99737b2dfc56229306abdd8f353b9114480db29e379ad2a621b12e6b/cdsapi-0.4.0.tar.gz#sha256=e2fc7c06c18810b2dea38522e4d41470ab91607a155df7de6a6a8bceea90b39d
Best match: cdsapi 0.4.0
Processing cdsapi-0.4.0.tar.gz
Writing /tmp/easy_install-mfu_wykp/cdsapi-0.4.0/setup.cfg
Running cdsapi-0.4.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-mfu_wykp/cdsapi-0.4.0/egg-dist-tmp-u9y3tkvv
warning: no files found matching 'LICENSE'
warning: no files found matching '*.in' under directory 'tests'
Moving cdsapi-0.4.0-py3.8.egg to /opt/conda/envs/xcube/lib/python3.8/site-packages
Adding cdsapi 0.4.0 to easy-install.pth file

Installed /opt/conda/envs/xcube/lib/python3.8/site-packages/cdsapi-0.4.0-py3.8.egg
Searching for xcube>=0.5.0
Reading https://pypi.org/simple/xcube/
Couldn't find index page for 'xcube' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.org/simple/
No local packages or working download links found for xcube>=0.5.0
error: Could not find suitable distribution for Requirement.parse('xcube>=0.5.0')

Update ERA5 notebook with improved plots

The Soil Moisture demo notebook has far better-looking plots than its ERA5 counterpart (tailored orthographic projections with superimposed grid and coastlines). These improvements should be ported back to the ERA5 notebook.

Don't try to install conda-forge xcube in AppVeyor build

Co-ordinated releases between xcube and xcube-cds (and other plugins) are complicated by the versioning: xcube-cds 0.x typically depends on xcube 0.x, but specifying xcube >= 0.x in the environment.yml breaks the build until the xcube 0.x conda-forge package becomes available -- meaning that xcube 0.x has to be both released and conda-packaged before xcube-cds 0.x can be tested.

The xcube-cds AppVeyor config already goes part of the way to resolving this problem: it actually installs xcube from the head of the master branch. However, before doing this, it also installs its own environment -- which includes the latest (and possibly as yet non-existent) xcube version -- then removes the conda-installed xcube. It would make more sense to do the following:

  1. Clone the xcube repository, create an xcube conda environment (containing xcube's dependencies but not xcube itself), and install xcube directly from the repository.
  2. Create a modified xcube-cds environment.yml which doesn't contain xcube as a dependency.
  3. Update the xcube environment using this modified environment.yml.
  4. Install xcube-cds from the repository.

This ensures that all xcube-cds's transitive dependencies through xcube are satisfied but avoids trying to install xcube itself through conda.

Fix JSON Schema and dateutil deprecation warnings

Running the test suite produces the following warnings:

test/test_store.py: 16 warnings
  /home/pont/loc/envs/xcube-latest/lib/python3.7/site-packages/jsonschema/validators.py:931: DeprecationWarning: The types argument is deprecated. Provide a type_checker to jsonschema.validators.extend instead.
    validator = cls(schema, *args, **kwargs)

test/test_store.py: 39 warnings
  /home/pont/loc/envs/xcube-latest/lib/python3.7/site-packages/dateutil/rrule.py:476: DeprecationWarning: Using both 'count' and 'until' is inconsistent with RFC 5545 and has been deprecated in dateutil. Future versions will raise an error.
    "raise an error.", DeprecationWarning)

These should be fixed before the libraries in question upgrade them to errors.
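For the jsonschema warning, the deprecated `types=` argument can be replaced by extending a validator with a type checker. A sketch covering the tuples-as-arrays case that appears elsewhere in this issue list (`types=dict(array=(list, tuple))`):

```python
import jsonschema
from jsonschema import validators

# Redefine "array" so that tuples are also sanctioned as JSON arrays,
# replacing the deprecated `types=dict(array=(list, tuple))` argument.
type_checker = jsonschema.Draft7Validator.TYPE_CHECKER.redefine(
    "array", lambda checker, instance: isinstance(instance, (list, tuple))
)
TupleArrayValidator = validators.extend(
    jsonschema.Draft7Validator, type_checker=type_checker
)

# Validates a tuple without emitting a DeprecationWarning
TupleArrayValidator({"type": "array"}).validate((1, 2, 3))
```

The dateutil warning requires rephrasing the `rrule` call to use either `count` or `until`, not both.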

Replace Travis CI configuration with an AppVeyor equivalent

Travis builds are no longer working due to their new pricing model, and DCS4COP is switching to AppVeyor for CI. The xcube-cds .travis.yml configuration therefore needs to be converted into an equivalent AppVeyor configuration, and AppVeyor needs to be configured to build automatically on pushes to this repository.

xcube gen ui shows end dates of 1970-01-01 for CDS Store datasets

When the CDS Store is used in the xcube gen UI, the default end date for the data is set to 1970-01-01 (before the start date!). This is because, for continually updating datasets, the CDS Store gives an end date of None (= null in JSON), with the intended semantics of ‘the present’. The gen UI, on the other hand, evidently interprets this None as ‘the Unix epoch’.

This is unfortunately an undefined point in the API specification: the store conventions document currently makes no mention of how a None value in time_range should be interpreted. To fix this bug, the following steps are necessary:

  1. Agree on the semantics of None values in time_range (for both start and end dates), or prohibit them entirely.
  2. Document the decision in the store conventions document.
  3. Update the CDS Store, the gen UI, or both, to conform to the clarified specification.
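As a sketch of the convention step 1 might settle on (this is a proposal, not agreed semantics), a None end date could be resolved to the current date at open time:

```python
from datetime import date

def resolve_time_range(start, end, today=None):
    # Hypothetical convention: for continually updating datasets, a None
    # end date means 'the present', not the Unix epoch.
    today = today or date.today().isoformat()
    return start, (end if end is not None else today)
```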

Switch from mamba to micromamba for AppVeyor builds

Micromamba now seems stable enough for everyday use in CI builds, and it doesn't need to be installed with conda. We should use this in the AppVeyor build configuration, since currently about 30% of the build time consists merely of installing miniconda, then using miniconda to install mamba!

Remove soil moisture subsetting feature

This issue is effectively the inverse of #28.

After further discussion, it has been decided that stores should not implement geographical subsetting for datasets which can't be subsetted via the store's backend. This is the case for the CDS Store's Soil Moisture dataset, which is not subsettable in the CDS API. Subsetting will instead be carried out by the gen2 feature, so the CDS Store should revert to offering only a fixed [-180, -90, 180, 90] bbox for soil moisture. Removal of the bbox parameter itself (along with other constant-valued open parameters) is covered in Issue #36.

Provide version number in xcube_cds.__version__

Currently the version number is only provided in xcube_cds.version.version, which is difficult for users to find. In Python, the standard location is <packagename>.__version__. xcube_cds should follow this standard by providing the version number in xcube_cds.__version__.

Implement minor API updates

  • Make DataDescriptor.data_vars a Mapping[str, VariableDescriptor] (currently it's List[VariableDescriptor]).
  • Rename constructor parameter cds_api_url to endpoint_url.
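The first change amounts to re-keying the existing list by variable name. A sketch using a minimal stand-in for xcube's VariableDescriptor class:

```python
from typing import Mapping, NamedTuple

class VariableDescriptor(NamedTuple):
    # Minimal stand-in for xcube's VariableDescriptor, for illustration only
    name: str
    dtype: str

def to_data_vars_mapping(variables) -> Mapping[str, VariableDescriptor]:
    # Convert the current List[VariableDescriptor] to a mapping keyed
    # by variable name
    return {v.name: v for v in variables}
```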

Different results for variable names when using store.get_open_data_params_schema(data_id) or store.describe_data(data_id)

When I take a look at the variable names of a dataset via the xcube-cds plugin, I get different results from store.get_open_data_params_schema(data_id) and store.describe_data(data_id).

To reproduce:

from xcube.core.store import new_data_store

store = new_data_store('cds')
data_id = 'reanalysis-era5-single-levels-monthly-means:monthly_averaged_reanalysis'

store.get_open_data_params_schema(data_id)

(screenshot of output attached)

store.describe_data(data_id)

(screenshot of output attached)

Is this intentional? When I compare both ways of accessing data variables with the xcube-sh plugin, both methods give the same result.

ERA5 monthly means by hour of day: data poorly structured

For the reanalysis-era5-land-monthly-means:monthly_averaged_reanalysis_by_hour_of_day dataset (and, probably, the similarly structured reanalysis-era5-single-levels-monthly-means:monthly_averaged_ensemble_members_by_hour_of_day and reanalysis-era5-single-levels-monthly-means:monthly_averaged_reanalysis_by_hour_of_day), the data structure is more or less unchanged from what the CDS API returns. Unfortunately, this is not a helpful structure for these data. Ideally, each of the 24 possible "hour of day" values would be implemented as a channel -- that is, you could select any subset of x (0 <= x <= 24) hours and receive x parallel time series at monthly resolution, with the timestamps placed in the middle of each month.

Instead, what we get is a single time series with 24 hourly values on the first day of every month, representing the averages for that month for the corresponding hours. (These values aren't monotonically increasing -- see Issue #5 -- but this doesn't significantly complicate the task of restructuring the data.)
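The restructuring described above could be sketched as follows. This is illustrative only: it regroups a flat series of per-hour monthly averages into one series per hour-of-day channel, using day 15 as a stand-in for "the middle of each month":

```python
from collections import defaultdict
from datetime import datetime

def split_by_hour(times, values):
    # Regroup 24 hourly values per month into one monthly series per
    # hour-of-day 'channel'; sorting first also fixes the monotonicity
    # problem from Issue #5.
    channels = defaultdict(list)
    for t, v in sorted(zip(times, values)):
        mid_month = t.replace(day=15, hour=0)  # approximate mid-month stamp
        channels[t.hour].append((mid_month, v))
    return dict(channels)
```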

CDS credential parameters can't be passed to new_data_store

When attempting to use the CDS store via the xcube.core.store.new_data_store method:

> cds = new_data_store('cds', endpoint_url='https://cds.climate.copernicus.eu/api/v2', cds_api_key='12345:abcd1234-1234-5678-abcd-123456789abc')

...
ValidationError: Additional properties are not allowed ('cds_api_key', 'endpoint_url' were unexpected)
...

However, these parameters do work when supplied directly to the CDSDataStore constructor.

Soil moisture requests failing due to CDS API changes

The CDS server has recently changed the API for the soil moisture dataset: the valid version specifiers are no longer

'v201706.0.0', 'v201812.0.0', 'v201812.0.1', 'v201912.0.0'

but

'v201706', 'v201812', 'v201912', 'v202012'.

This results in an error when an old-style version specifier is supplied (or when none is supplied and the default is used), and new-style version specifiers can't be used, since they are rejected by validation against the old JSON Schema.
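A minimal sketch of how old-style specifiers could be mapped to the new form (a hypothetical helper, not part of the current codebase; the schema itself would also need updating):

```python
def normalize_version(version: str) -> str:
    # Map an old-style specifier like 'v201912.0.0' to the new-style
    # 'v201912' now expected by the CDS API; new-style values pass through.
    return version.split('.')[0]
```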

Steps to reproduce:

from xcube.core.store import new_data_store
cds = new_data_store('cds')
ds = cds.open_data(
    'satellite-soil-moisture:volumetric:monthly',
    time_range=['2015-01-01', '2015-12-31']
)

Result:

Exception: the request you have submitted is not valid. Value 'v201912.0.0' not valid for parameter 'version', valid values are: v201706, v201812, v201912, v202012.

The variable names have changed too.

Increase the required CDS API library version

Currently, the environment file for xcube-cds specifies cdsapi >=0.2.7. Newer versions are available: 0.3.1 has been out since November 2020, and 0.5.1 has just been released on GitHub -- I assume that a conda-forge release will follow shortly.

It's hard to determine exactly what improvements have been made since 0.2.7, since there's no changelog and the commit messages are very brief. But updating the minimum required version to 0.5.1 makes sense nevertheless, since it reduces the chance of incompatibility with the Copernicus server and simplifies the environment solving step during installation of the conda environment.

Add options to control output verbosity / warnings

In use, the CDS plugin writes a lot of logging information and warnings to the standard output and/or error (see attached screenshot). It should have verbose and/or suppress_warnings options (or similar) to allow these to be switched off.


Soil Moisture: spurious "Request too large" errors

Consider, for example, the following request:

generated_cube = cds_store.open_data(
    'satellite-soil-moisture:volumetric:monthly',
    variable_names=['volumetric_surface_soil_moisture'],
    time_range=['2015-01-01', '2016-01-31']
)

This requests a modest thirteen months at monthly resolution, but at the time of writing results in the following error from the CDS API backend:

2021-04-12 13:53:46,053 ERROR Message: the request you have submitted is not valid
2021-04-12 13:53:46,054 ERROR Reason:  Request too large. Requesting 17856 items, limit is 12000

This request worked as recently as 2021-02-18, and the relevant parts of the CDS Store code have not changed since then, so evidently something has changed in the CDS server's implementation of its undocumented API.

Closer examination reveals that the actual request sent by the CDS store includes the following:

'time' = ['00:00', '01:00', '02:00', '03:00', '04:00', '05:00', '06:00', '07:00', '08:00', '09:00', '10:00', '11:00', '12:00', '13:00', '14:00', '15:00', '16:00', '17:00', '18:00', '19:00', '20:00', '21:00', '22:00', '23:00'],
'day' = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31']

This should not be there for a monthly average request. Presumably it was just ignored by previous versions of the CDS API. Removing the time parameter and setting day appropriately (according to the requested time-span and averaging period) would probably fix the bug.
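A sketch of the proposed fix (illustrative only; the actual `day` values would need to be derived from the requested time span and averaging period, with '01' assumed here as the monthly product's placeholder day):

```python
def prune_monthly_request(request: dict) -> dict:
    # For a monthly-average request, drop the hourly 'time' subsetting
    # entirely and restrict 'day' to the single value assumed to be used
    # by the monthly product ('01' is an assumption, not confirmed).
    request = dict(request)  # don't mutate the caller's dict
    request.pop('time', None)
    if 'day' in request:
        request['day'] = ['01']
    return request
```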

Implement consistent syntax and semantics for time coverage attributes

Currently, the time_coverage_start and time_coverage_end attributes of the output dataset are produced by the combine_netcdf_time_limits method, which calculates them as the minimum and maximum, respectively, of the corresponding attributes in the set of NetCDF files downloaded from the CDS API. Unfortunately, both the syntax and semantics of the values returned by the CDS API appear to be inconsistent. In one soil moisture test, time_coverage_start is 2014-12-31T12:00:00Z (corresponding to the start of the requested time period); in another, with different parameters, time_coverage_start is 19910805T000000Z. Not only is the syntax here a different dialect of ISO 8601, but the actual time represented has no relation to the requested period -- rather, it seems to be the coverage start for the entire product!

In principle, it would not be difficult for the CDS store to recalculate the values of time_coverage_start and time_coverage_end directly from the data itself. The desired semantics are presumably the first of those mentioned above (i.e. covering the requested period rather than the whole product), but we should document this officially, and also decide:

  1. The syntax/format for these attributes.
  2. Whether the attributes are necessary at all. They don't seem to be part of the CF conventions. If they're not needed, the simplest solution would be to erase them.
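If the attributes are kept, recalculating them from the data could be as simple as the following sketch (assuming the first semantics above and one fixed ISO 8601 dialect; the choice of format is illustrative, not decided):

```python
from datetime import datetime

def time_coverage_attrs(times):
    # Recompute coverage directly from the dataset's own time values,
    # in a single consistent ISO 8601 dialect, instead of trusting the
    # inconsistent attributes in the downloaded NetCDF files.
    fmt = '%Y-%m-%dT%H:%M:%SZ'
    return {
        'time_coverage_start': min(times).strftime(fmt),
        'time_coverage_end': max(times).strftime(fmt),
    }
```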

Extract archives more safely

Pull request #64 applies an automated fix for CVE-2007-4559, an archive unpacking vulnerability in tarfile's extractall method. There is not much danger in the contexts in which this is called in xcube-cds, but it would be as well to implement the changes. The PR isn't suitable for merging as-is, mainly due to code duplication. For each of the three calls to extractall in the codebase (in request-one-var-per-file.py, satellite_sea_ice_thickness.py, and satellite_soil_moisture.py), it suggests the following replacement:

import os

def is_within_directory(directory, target):
    # True only if `target` resolves to a path inside `directory`
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    prefix = os.path.commonprefix([abs_directory, abs_target])
    return prefix == abs_directory

def safe_extract(tar, path=".", members=None, *, numeric_owner=False):
    # Refuse to extract any member whose path would escape `path`
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise Exception("Attempted Path Traversal in Tar File")
    tar.extractall(path, members, numeric_owner=numeric_owner)

safe_extract(tf, args.output_dir)

This, or something similar, should be factored out into a utility function and called at the appropriate points.

ValidationError: '5 is greater than the maximum of -180' when requesting a cube with a spatial subset for several datasets

When requesting a CDS cube with a spatial subset, a ValidationError occurs for some dataset IDs: ValidationError: '5 is greater than the maximum of -180'.
This happens for the following dataset IDs:

  • satellite-soil-moisture:saturation:daily
  • satellite-soil-moisture:saturation:10-day
  • satellite-soil-moisture:saturation:monthly
  • satellite-soil-moisture:volumetric:daily
  • satellite-soil-moisture:volumetric:10-day
  • satellite-soil-moisture:volumetric:monthly

To reproduce:

from xcube.core.store import find_data_store_extensions
from xcube.core.store import get_data_store_params_schema
from xcube.core.store import new_data_store

store = new_data_store('cds')

dataset = store.open_data('satellite-soil-moisture:saturation:daily', 
                          variable_names=['soil_moisture_saturation'], 
                          bbox=[5, 35, 6, 36], 
                          spatial_res=0.25, 
                          time_range=['1978-11-01', '1978-11-06'])
dataset

The stacktrace:

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
<ipython-input-6-82c0cb0543ee> in <module>
      3                           bbox=[5, 35, 6, 36],
      4                           spatial_res=0.25,
----> 5                           time_range=['1978-11-01', '1978-11-06'])
      6 dataset

~/Desktop/projects/xcube-cds/xcube_cds/store.py in open_data(self, data_id, opener_id, **open_params)
    768         self._assert_valid_opener_id(opener_id)
    769         self._validate_data_id(data_id)
--> 770         return super().open_data(data_id, **open_params)
    771 
    772     ###########################################################################

~/Desktop/projects/xcube-cds/xcube_cds/store.py in open_data(self, data_id, **open_params)
    417 
    418         schema = self.get_open_data_params_schema(data_id)
--> 419         schema.validate_instance(open_params)
    420         handler = self._handler_registry[data_id]
    421 

~/Desktop/projects/xcube/xcube/util/jsonschema.py in validate_instance(self, instance)
    101                             format_checker=jsonschema.draft7_format_checker,
    102                             # We have to explicitly sanction tuples as arrays.
--> 103                             types=dict(array=(list, tuple)))
    104 
    105     def to_instance(self, value: Any) -> Any:

~/miniconda3/envs/xcube/lib/python3.7/site-packages/jsonschema/validators.py in validate(instance, schema, cls, *args, **kwargs)
    932     error = exceptions.best_match(validator.iter_errors(instance))
    933     if error is not None:
--> 934         raise error
    935 
    936 

ValidationError: 5 is greater than the maximum of -180

Failed validating 'maximum' in schema['properties']['bbox']['items'][0]:
    {'maximum': -180, 'minimum': -180, 'type': 'number'}

On instance['bbox'][0]:
    5
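The failing schema declares both minimum and maximum of -180 for the first bbox item, so any valid longitude is rejected. Presumably (this is the standard [min_lon, min_lat, max_lon, max_lat] convention, not confirmed against the fixed code) the four items should declare bounds like:

```python
def lon_lat_bbox_schema():
    # Intended per-item bounds for a [min_lon, min_lat, max_lon, max_lat]
    # bbox; the failing schema instead has maximum -180 for the first item.
    lon = {'type': 'number', 'minimum': -180, 'maximum': 180}
    lat = {'type': 'number', 'minimum': -90, 'maximum': 90}
    return {'type': 'array', 'items': [lon, lat, lon, lat]}
```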

Allow API URL and key to be passed directly to store and opener

Currently, xcube-cds does not allow direct specification of the CDS API URL and key; the cdsapi Client class instead reads them directly from a configuration file or environment variables. It should be possible to pass the API URL and key directly to the CDSDataOpener and CDSDataStore constructor in cases where the configuration file and environment variable settings are not present or need to be overridden.
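The precedence could be sketched as follows (the class and parameter names are illustrative, not the actual xcube-cds API; CDSAPI_URL and CDSAPI_KEY are the environment variables the cdsapi client reads):

```python
import os

class CDSCredentials:
    # Sketch: explicit constructor arguments override the cdsapi
    # environment variables; the cdsapi config file fallback is omitted
    # here for brevity.
    def __init__(self, endpoint_url=None, cds_api_key=None):
        self.endpoint_url = endpoint_url or os.environ.get('CDSAPI_URL')
        self.cds_api_key = cds_api_key or os.environ.get('CDSAPI_KEY')
```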
