dcs4cop / xcube

xcube is a Python package for generating and exploiting data cubes powered by xarray, dask, and zarr.

Home Page: https://xcube.readthedocs.io/

License: MIT License

Python 99.77% Dockerfile 0.08% Shell 0.04% HTML 0.10%

xcube's People

Contributors

alicebalfanz, dzelge, edd3x, forman, gunbra32, pont-us, rabaneda, ruchimotwanibc, tejasmorbagal, thomasstorm, tiagoams, toniof, ymoisan


xcube's Issues

Clamping of values for bbox in xcube-grid

In xcube-grid the user passes a bounding box which is then adjusted to a grid of a certain resolution. Currently no exception is raised when invalid longitudes or latitudes are passed.
Possible solution:
Add a flag which allows the user to decide whether to clamp the values of the bounding box to the valid range, set by default to false.

Important: Consider the use case where a bounding box is placed across the antimeridian.
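Such a flag could be sketched as follows (a hypothetical adjust_bbox helper, not the actual xcube-grid API; a box crossing the antimeridian would still need dedicated handling):

```python
import math

def adjust_bbox(lon_min, lat_min, lon_max, lat_max, res, clamp=False):
    """Snap a bounding box outward to a grid of resolution *res*.

    With clamp=True, out-of-range coordinates are clipped to the valid
    lon/lat range; with clamp=False (the default) a ValueError is raised.
    Note: boxes crossing the antimeridian (lon_min > lon_max) would need
    separate handling.
    """
    if clamp:
        lon_min = max(-180.0, min(180.0, lon_min))
        lon_max = max(-180.0, min(180.0, lon_max))
        lat_min = max(-90.0, min(90.0, lat_min))
        lat_max = max(-90.0, min(90.0, lat_max))
    elif not (-180.0 <= lon_min <= 180.0 and -180.0 <= lon_max <= 180.0
              and -90.0 <= lat_min <= 90.0 and -90.0 <= lat_max <= 90.0):
        raise ValueError("bounding box out of range")
    # Snap the box outward so it fully covers the requested area.
    return (math.floor(lon_min / res) * res,
            math.floor(lat_min / res) * res,
            math.ceil(lon_max / res) * res,
            math.ceil(lat_max / res) * res)
```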

Cannot load plugins v.2

Describe the bug
Plugin loading is currently only triggered by executing xcube.cli. When testing the xcube gen plugins, the plugins need to be loaded when executing xcube.api.gen.gen as well; otherwise an import error is observed.

xcube plugin loading shall only occur when xcube code is executed, not while its modules are imported: at the moment this only happens when code of xcube.cli is executed, but it would be needed within xcube.api.gen.gen as well.

This issue is related to #49.
This issue is related to closed issue #62

Allow xcube serve to use SNAP specific color maps

The user wants a non-linear color display, to map specific values to predefined colors.
This could be solved by making xcube serve able to import SNAP-specific color maps provided by the user.

Possibility of logarithmic scale for mapping

The possibility of mapping one or more variables to colors using a logarithmic scale should be included. This is particularly important for parameters like chlorophyll.
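For illustration, a logarithmic color scale amounts to normalizing a value to [0, 1] on a log axis before the colormap lookup (a minimal pure-Python sketch; in practice matplotlib's colors.LogNorm provides the same behaviour):

```python
import math

def log_normalize(value, vmin, vmax):
    """Map *value* to [0, 1] on a logarithmic scale between *vmin* and
    *vmax* (both > 0), e.g. before looking up a color in a colormap.
    Values at or below zero map to 0. Useful for parameters like
    chlorophyll that span several orders of magnitude."""
    if vmin <= 0 or vmax <= vmin:
        raise ValueError("log scale requires 0 < vmin < vmax")
    if value <= 0:
        return 0.0
    t = (math.log10(value) - math.log10(vmin)) / (math.log10(vmax) - math.log10(vmin))
    return min(1.0, max(0.0, t))
```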

Move specific `xcube gen` input processors into separate repos

xcube gen input processor implementations are often very specific with respect to the supported data input format. Some of them require physical datasets for testing to be included in the sources. xcube on the other hand provides a generic API and CLI and does not rely on physical test datasets.

Input processors are also developed by different development teams and should therefore have different repositories with own responsibilities and issues.

Specific input processors should therefore be moved into separate repositories.

QC tools for data cubes

We need some CLI commands and API functions that help perform basic QC on data cubes:

  • CLI: xcube verify <path> to verify that a cube has a valid structure; later maybe also xcube validate <path> <ref-data> for actual content QC.
    Validate value ranges, all-empty variables, and illegal non-monotonically increasing coordinates (time!).
  • API:
    • a function to validate a dataset and generate a validation report: validate_cube(dataset)
    • a function to assert that a given dataset is a valid data cube: assert_cube(dataset), to be used in other API functions where valid cubes are expected as input
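The proposed API pair could be sketched as follows (the names validate_cube and assert_cube come from this issue, while the concrete checks shown are only illustrative; *dataset* is assumed to be an xarray.Dataset-like object exposing .dims and .coords):

```python
def validate_cube(dataset):
    """Return a list of problem reports for *dataset*; empty means valid.
    Illustrative checks only: required dimensions and a strictly
    monotonically increasing time coordinate."""
    report = []
    for dim in ("time", "lat", "lon"):
        if dim not in dataset.dims:
            report.append(f"missing dimension {dim!r}")
    if "time" in dataset.coords:
        time = dataset.coords["time"].values
        if any(t1 <= t0 for t0, t1 in zip(time, time[1:])):
            report.append("'time' coordinate is not strictly monotonically increasing")
    return report

def assert_cube(dataset):
    """Raise ValueError if *dataset* is not a valid data cube."""
    report = validate_cube(dataset)
    if report:
        raise ValueError("dataset is not a valid cube: " + "; ".join(report))
```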

Allow xcube server to have any URL prefix

Is your feature request related to a problem? Please describe.
We currently always include the version number in the URL. This makes it harder to make changes during development, as clients need to be reconfigured on version changes.

Describe the solution you'd like

Example:

xcube serve --prefix "api/dev/latest" ....

or

xcube serve --prefix "dcs4cop/api/${version}" ....

CLI for temporal aggregation (Level-3 cubes)

Is your feature request related to a problem? Please describe.

Temporal aggregation can be time-consuming when done on the fly, e.g. from an xcube server configuration or from notebooks. Therefore a tool is needed that persists a temporally aggregated cube.

Describe the solution you'd like

New (click) CLI command, API already exists.

Restructure and clarify code base

One main CLI "xcube", many sub-commands

Simplify top-level structure:

  • api/ - data cubes API
  • cli/ - data cubes CLI
  • util/ - framework and implementation helpers
  • version.py

In detail:

grid/ --> api/grid.py, cli/grid.py
genl2c/ --> api/gen/, cli/gen.py
genl3/ --> api/tagg.py,  cli/tagg.py

cli.py --> cli/
config.py --> util/
constants.py --> util/
dsio --> util/
dsutil --> util/,  api/
expression.py --> util/expression.py
maskset.py --> util/
objreg.py --> util/
reproject.py --> api/reproj.py
types.py --> DEL
version.py --> OK

Default input processor

We currently must specify a class name for the input processor to be used for each input dataset.

If datasets are already in xcube's "standard format", that is:

  • have dimensions lat, lon, and optionally time of length 1;
  • have coordinate variables lat[lat], lon[lon], time[time] (opt.), time_bnds[time, 2] (opt.);
  • have any data variables of form <var>[time, lat, lon], or <var>[lat, lon] if the time coordinate variable is missing;
  • have the global attribute pair time_coverage_start, time_coverage_end (or the pair time_start, time_stop, among others) if the time coordinate variable is missing,

then we could use a default input processor that would be configurable w.r.t. the variables to be processed, and which mask to apply, etc.
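The format check such a default input processor would perform could be sketched like this (a hypothetical helper over simplified inputs: *dims* is the dataset's dimension-size mapping and *data_var_dims* maps variable names to their dimension tuples):

```python
def is_standard_format(dims, data_var_dims):
    """Check whether a dataset already matches xcube's "standard format":
    lat/lon dimensions, an optional time dimension of length 1, and all
    data variables shaped [time, lat, lon] (or [lat, lon] without time)."""
    if "lat" not in dims or "lon" not in dims:
        return False
    has_time = "time" in dims
    if has_time and dims["time"] != 1:
        return False
    expected = ("time", "lat", "lon") if has_time else ("lat", "lon")
    return all(var_dims == expected for var_dims in data_var_dims.values())
```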

Tool for cube rechunking and compression

A tool is needed that reads an existing data cube and writes a new cube with identical data but with different data chunking and compression.

This is similar to #14 , but allows applying new chunking and compression to existing cubes.

xcube gen generates duplicates in time and unsorted time dimension for S2+

Describe the bug
xcube gen generates two identical time slices with S2+ plugin. In the input data there is only one input file. In addition, the time dimension is not sorted.

To Reproduce
The cube has been produced in xcube-gen with the following command.
nohup xcube gen --append -v CHL_GILERSON2010_GLOBAL,KdPAR,SPM_VITOnir_SCHELDT,TUR_NECHAD2009_GLOBAL_665 -c /home/xcube/projects/xcube-services/dcs4cop/xcube-gen-configs/dcs4cop-gen_BC_config_S2.yml /data/EOdata/related/DCS4COP/cube_input/VITO/2017//.nc > nohup.out &
The resulting zarr file is still on xcube-gen at /home/xcube/xcube-output/ dcs4cop-bc-s2-sns-l2c-v1.zarr
Note that this issue might be related to the fact that the generation has been interrupted and started again with the same output name.


Making data cube generation robust against paused process

When the appending process is stopped, there is no way to know at which point it was stopped. One therefore has to restart the cube generation from scratch in order not to have duplicates in the data cube. It would be nice to have a robust solution that checks in the existing data cube whether an input file has already been used for appending. If the xcube generator recognizes an input file, it proceeds to the next one, until an input file is found that has not yet been used for appending to the data cube.
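The resume logic could be sketched like this (hypothetical select_inputs and time_of helpers, not existing xcube API):

```python
def select_inputs(existing_times, inputs, time_of):
    """Return only those inputs not yet represented in the cube.

    *inputs* is an iterable of input file paths, *time_of* a
    (hypothetical) function extracting each file's time stamp, and
    *existing_times* the time coordinate values already in the cube.
    Files whose time stamp is already present are skipped, so an
    interrupted generation run can be restarted without duplicates."""
    seen = set(existing_times)
    return [path for path in inputs if time_of(path) not in seen]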

yaml.load results in error when using config file for xcube gen

Describe the bug
When trying to use xcube gen with a config file, an error message appears:

calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details. config_dict = yaml.load(fp)

The solution is to change xcube/api/gen/config.py line 57 into config_dict = yaml.safe_load(fp)

Rename input processor to input transformer

The term "input processor" is confusing for EO scientists, as it makes them believe that some thematic EO data products are generated in this step.

It has been agreed that the term "input transformer" better describes the nature of this component, namely to prepare an input (an L1 or L2 EO data file) for appending it to a data cube, which usually requires time stamp identification, variable selection, masking, and reprojection of the data to the target SRS.

Therefore

  • rename class InputProcessor to InputTransformer
  • rename its method process to transform
  • rename configuration parameter input_proc to input_transformer in API and CLI
  • adjust API and CLI docs

xcube gen to accept a text file with file names as input

Is your feature request related to a problem? Please describe.
related to #33

Describe the solution you'd like
Users shall have the possibility to provide the files sorted in the correct order in a text file if a simple sorting by file names would lead to a wrongly ordered cube.

Describe alternatives you've considered
Rather than leaving it to the user to provide a correctly sorted file list, xcube gen could check that t_{n+1} > t_n before appending a time slice to the cube, and if needed re-arrange the cube.
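The monotonicity check described above could be as simple as (hypothetical helper, illustrating the t_{n+1} > t_n rule):

```python
def check_append_order(cube_times, slice_time):
    """Return True if *slice_time* may be appended to a cube whose time
    coordinate already contains *cube_times*: the new time stamp must be
    strictly greater than the last one (an empty cube accepts anything)."""
    return not cube_times or slice_time > cube_times[-1]
```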

Configurable per-dataset caches

Many of the caches used in WMTS are defined in the ServiceContext class. All of them cache data for dataset-variable combinations. It would be more effective and extensible if we had a single extensible cache object for each dataset. The controllers can then decide what information to put into each cache, e.g.:

  • Computed tile grid definitions
  • Computed data tiles
  • Computed RGB tiles

When a dataset is closed, the per-dataset tile cache object is released.

In the xcube-server configuration, we specify for each dataset what should be cached.

It may make sense to not cache anything at all, especially when a web service system architecture incorporates helper services such as memcached.
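A minimal sketch of such a per-dataset cache object (a hypothetical DatasetCache class, not the existing ServiceContext code): controllers decide which keys to store, and closing the dataset releases everything at once.

```python
class DatasetCache:
    """Single extensible cache per dataset. Controllers choose the keys,
    e.g. "tile_grid/<var>", "tiles/<var>/<z>/<y>/<x>", "rgb/<z>/<y>/<x>";
    release() drops all entries when the dataset is closed."""

    def __init__(self):
        self._entries = {}

    def get_or_compute(self, key, compute):
        """Return the cached value for *key*, computing and storing it
        on first access."""
        if key not in self._entries:
            self._entries[key] = compute()
        return self._entries[key]

    def release(self):
        """Drop all cached entries, e.g. when the dataset is closed."""
        self._entries.clear()
```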

Zarr spec for pyramid-type data

At yesterday's zarr community call there was some discussion about a zarr spec extension for storing pyramid-level data for the next version of the zarr spec. Since you are working with level data in xcube represented by zarr, I thought you might be interested to follow, and maybe comment, on this issue: zarr-developers/zarr-specs#23


Allow filtering by assigning a variable related to a polygon

Is your feature request related to a problem? Please describe.
Assign to a given polygon a dedicated list of variables (to be displayed in the viewer) from the variables available in the generated cube.

The idea is that we do not show at all places all the available layers for the same variable, as we do now in the viewer.

Enable distributed cube computing

Is your feature request related to a problem? Please describe.

xcube serve supports datasets that are computed on the fly, e.g. by applying temporal aggregation. This requires high data throughput and CPU resources, so requests based on computed datasets often respond too slowly and then time out. Such computations may be much faster if they are distributed on a dedicated cluster.

xcube gen may be parallelized so that individual input files (usually spatial time slices) are transformed in a distributed way and the results are then combined chronologically into the desired cube.

The same may apply to other xcube commands that perform heavy computing on chunks of data.

Describe the solution you'd like

Add option to xcube CLI commands that configure how the command is executed in a distributed manner. The option is TBD.

Describe alternatives you've considered

None, besides using larger machines.

Additional context

As we use xarray and xarray uses dask, the solution should be based on Dask Distributed.

New naming of data cube types

The naming of the data cube levels should give a better indication of the type. One type of generated cube is spatially regular with original time stamps, and therefore temporally irregular. The second type is both spatially and temporally regular. At the moment the naming of the types is too similar to data processing levels in earth observation.

Existing propositions are the addition of 'tirr' for time irregular and 'treg' for time regular.

Following the naming decision, the code and documentation needs to be adjusted.

Case insensitive WMTS KVP parameters

In the dcs4cop viewer, the WMTS GetTile request is by default launched using the TIME dimension parameter. For Landsat, Sentinel and CMEMS layers this isn't a problem.
Only for the OLCI layers is the response not what we expected: a GetTile request with the uppercase TIME parameter returns the default image (I guess the latest available date), regardless of the time value.
It looks like the uppercase TIME parameter is skipped by the WMTS service. However, the OLCI WMTS service works correctly when the time parameter is written in lowercase.

To fulfill the OGC WMTS standard, can you adapt the service so that the WMTS parameters are case-insensitive?
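Folding the KVP parameter names to a canonical case on arrival would be enough; a minimal sketch (hypothetical normalize_kvp helper applied before parameter lookup):

```python
def normalize_kvp(params):
    """Lower-case KVP parameter names so TIME, Time and time are treated
    the same, as KVP parameter names are case-insensitive in OGC web
    services. *params* maps raw query-string names to values."""
    return {name.lower(): value for name, value in params.items()}
```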


Integrate xcube-server in xcube

Is your feature request related to a problem? Please describe.

We don't want to maintain two packages that actually require the same environment.

Describe the solution you'd like

  • Integrate all xcube-server sources and resources into xcube repo
  • Move xcube-server CLI into xcube.cli.serve.py
  • Move xcube-server production code into new xcube.webapi
  • Move xcube-server test code into new test.webapi
  • Make xcube-server callable as CLI xcube serve
  • Move generally useful functions (time-series!) into xcube.api
  • Move useful utilities into xcube.util
  • Remove code duplications, make use of xcube.util
  • Move open xcube-server issues into xcube issue tracker
  • Close xcube-server repo

Describe alternatives you've considered

See dcs4cop/xcube-server#37

Add the Ability to Convert Variables to Dimensions

Is your feature request related to a problem? Please describe.

We need the ability to convert a dataset variable into a dimension as requested by Norman.

Describe the solution you'd like

Something like this:

ds2 = xr.concat([ds[var_name] for var_name in ds.data_vars], "var")
var_coords = xr.DataArray([var_name for var_name in ds.data_vars], dims=["var"])
ds2 = ds2.assign_coords(var=var_coords)

False cube creation when output name is not set using xcube-genl2c

When the user does not set an output name for the generated data cube when using xcube-genl2c, the output is one output file per input file. This is because the default output name depends on the input file name, which changes when more than one input file is used for the cube generation. The default name should instead be a fixed name that does not change.

Time-series web API should also return uncertainty

The time-series RESTful API (/ts/...) should also return uncertainty values when there is a related ancillary variable. Hence, in addition, it should return values of data variables

  • named <base>_<suffix> for a given data variable named <base>, where <suffix> is one of stdev, uncert, error, or
  • listed in the value of the ancillary_variables attribute of a data variable; see section Ancillary Data in the CF Conventions.
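Resolving the candidate uncertainty variables could be sketched as follows (a hypothetical find_uncertainty_vars helper; the suffix list and the CF ancillary_variables attribute come from the two rules above):

```python
SUFFIXES = ("stdev", "uncert", "error")

def find_uncertainty_vars(var_name, all_vars, attrs):
    """Collect candidate uncertainty variables for *var_name*:
    first by the <base>_<suffix> naming convention, then from the CF
    'ancillary_variables' attribute (a space-separated name list).
    *all_vars* is the set of variable names in the dataset, *attrs*
    the data variable's attribute mapping."""
    found = [f"{var_name}_{s}" for s in SUFFIXES if f"{var_name}_{s}" in all_vars]
    for name in attrs.get("ancillary_variables", "").split():
        if name in all_vars and name not in found:
            found.append(name)
    return found
```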

Provide functionality for

  • points and
  • other geometries.

Additional context

See dcs4cop/xcube-viewer#19

Allow creating spatial pyramid levels

xcube-server is encountering massive performance problems when low-res tiles are created from a spatially hi-res dataset, especially when its chunking is not ideal for tile extraction in the spatial dimensions. This is because spatial resolution levels are computed on the fly. For the lowest-resolution (level zero) tiles, all hi-res data need to be read.

We need a data format that allows xcube-server to read from spatial pyramid levels, if they exist, and a tool that can generate spatial pyramid levels from hi-res datasets.

Format suggestion

Let some/file/path/bigdata.zarr be the path to a hi-res dataset; then the physical representation of the spatial pyramid with 8 levels could be as follows:

    - some/file/path/bigdata.zarr
    - some/file/path/bigdata.levels/
      - 0.lnk        # contains link to original dataset at spatial resolution res0
      - 1.zarr/     # First downsampled level with res = res0 * 2^1
      - 2.zarr/     # Second downsampled level with res = res0 * 2^2
      ...
      - 7.zarr/     # Seventh downsampled level with res = res0 * 2^7

All levels have the same chunking. At the highest level, the number of chunks in one of the spatial dimensions is one; all other levels have multiple chunks in the spatial dimensions.

Another possibility is that all the levels go into a single Zarr dataset.
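Assuming each level halves the spatial resolution (res = res0 * 2^i, level 0 being the original dataset), the number of levels needed until the lowest-resolution level fits into a single tile can be computed as follows (illustrative sketch only):

```python
import math

def num_levels(size, tile_size):
    """Number of pyramid levels needed so that the lowest-resolution
    level is no larger than one tile in the given spatial dimension.
    Each level halves the size (rounding up), matching res = res0 * 2^i."""
    levels = 1
    while size > tile_size:
        size = math.ceil(size / 2)
        levels += 1
    return levels
```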

xcube gen sorts input list and therefore might append wrong time order

The order in which input files are appended to the cube is relevant.
When input file names are first differentiated by e.g. A and B, the input files are sorted by these characters instead of by time stamp. When submitting a list of input files that is already sorted by time stamp (ignoring the characters A and B), xcube gen sorts the list internally and creates a wrong cube, first appending all input data with A and then with B.

Expected behavior
xcube gen should not re-sort the input files. It should use the order given by the user.

Cannot load plugins

Importing package xcube.api.gen triggers plugin loading.
But xcube gen plugins require importing it.
Hence they observe an import error.

xcube plugin loading shall only occur when xcube code is executed, not while its modules are imported.

This issue is related to #49.

Enable requests by passing lat and lon (needed for App)

Enable requests by passing lat and lon, so that the server returns the suitable dataset to the user. This is needed for requests passed by the app. Difficulty: what happens when the user requests a lat and lon that lie outside the bounding box of every data cube region included in the server?
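A naive lookup over the configured dataset bounding boxes might look like this (a hypothetical find_dataset helper; it returns the first match and None when the point lies outside all regions, which is exactly the open question above):

```python
def find_dataset(datasets, lon, lat):
    """Return the name of the first dataset whose bounding box contains
    the point (lon, lat), or None if no region contains it. *datasets*
    maps dataset names to (lon_min, lat_min, lon_max, lat_max) boxes.
    Overlapping regions and the no-match case still need a policy."""
    for name, (lon_min, lat_min, lon_max, lat_max) in datasets.items():
        if lon_min <= lon <= lon_max and lat_min <= lat <= lat_max:
            return name
    return None
```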

xcube gen: indexes along dimension 'y' are not equal

When using xcube gen with processed variables that use flag values, we now get errors such as

step 2 of 9: computing variables...
Internal error: failed computing valid mask for 'rrs_560' from expression  'np.logical_not(PIXEL_CLASSIFY_FLAGS.F_INVALID)': indexes along dimension 'y' are not equal

The error happens in code that used to run without problems. Must be due to a change in xarray or deeper.

Here is the full traceback:

step 1 of 9: pre-processing dataset...
  pre-processing dataset completed in 1.6446000000058802e-05 seconds
step 2 of 9: computing variables...
Internal error: failed computing valid mask for 'rrs_560' from expression 'np.logical_not(PIXEL_CLASSIFY_FLAGS.F_INVALID)': indexes along dimension 'y' are not equal
Traceback (most recent call last):
  File "d:\projects\xcube\xcube\util\expression.py", line 49, in compute_expr
    result = eval(expr, namespace, None)
  File "<string>", line 1, in <module>
  File "d:\projects\xcube\xcube\util\maskset.py", line 98, in __getattr__
    return self.get_mask(name)
  File "d:\projects\xcube\xcube\util\maskset.py", line 128, in get_mask
    mask_var = mask_var.where((flag_var & flag_mask) != 0, 0)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\xarray\core\common.py", line 859, in where
    return ops.where_method(self, cond, other)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\xarray\core\ops.py", line 191, in where_method
    keep_attrs=True)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\xarray\core\computation.py", line 969, in apply_ufunc
    keep_attrs=keep_attrs)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\xarray\core\computation.py", line 209, in apply_dataarray_vfunc
    raise_on_invalid=False)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\xarray\core\alignment.py", line 217, in deep_align
    exclude=exclude)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\xarray\core\alignment.py", line 132, in align
    .format(dim))
ValueError: indexes along dimension 'y' are not equal

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:/Projects/xcube/xcube/cli/cli.py", line 257, in main
    exit_code = cli.main(args=args, obj=ctx_obj, standalone_mode=False)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\click\core.py", line 717, in main
    rv = self.invoke(ctx)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\click\core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "D:\Miniconda3\envs\xcube\lib\site-packages\click\core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\click\core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "d:\projects\xcube\xcube\cli\gen.py", line 126, in gen
    **config)
  File "d:\projects\xcube\xcube\api\gen\gen.py", line 142, in gen_cube
    monitor)
  File "d:\projects\xcube\xcube\api\gen\gen.py", line 284, in _process_l2_input
    dataset = transform(dataset)
  File "d:\projects\xcube\xcube\api\gen\gen.py", line 207, in step2
    return compute_dataset(dataset, processed_variables=processed_variables)
  File "d:\projects\xcube\xcube\api\compute.py", line 113, in compute_dataset
    errors=errors)
  File "d:\projects\xcube\xcube\util\expression.py", line 32, in compute_array_expr
    return compute_expr(expr, namespace=namespace, errors=errors, result_name=result_name)
  File "d:\projects\xcube\xcube\util\expression.py", line 57, in compute_expr
    raise ValueError(msg) from e
ValueError: failed computing valid mask for 'rrs_560' from expression 'np.logical_not(PIXEL_CLASSIFY_FLAGS.F_INVALID)': indexes along dimension 'y' are not equal

Improve performance of Time-Series

The performance of the time-series commands is slow, leading to time-outs. This has been observed for all four endpoints.

GET /ts/{dataset}/{variable}/point
POST /ts/{dataset}/{variable}/geometry
POST /ts/{dataset}/{variable}/geometries
POST /ts/{dataset}/{variable}/features
