dcs4cop / xcube

xcube is a Python package for generating and exploiting data cubes powered by xarray, dask, and zarr.

Home Page: https://xcube.readthedocs.io/

License: MIT License

Python 99.77% Dockerfile 0.08% Shell 0.04% HTML 0.10%

xcube's People

Contributors

alicebalfanz, dzelge, edd3x, forman, gunbra32, pont-us, rabaneda, ruchimotwanibc, tejasmorbagal, thomasstorm, tiagoams, toniof, ymoisan


xcube's Issues

Clamping of values for bbox in xcube-grid

In xcube-grid the user passes a bounding box which is then adjusted to a grid of a certain resolution. Currently no exception is raised when invalid longitudes or latitudes are passed.
Possible solution:
Add a flag which allows the user to decide whether to clamp the values of the bounding box to the valid range, set by default to false.

Important: Consider the use case where a bounding box is placed across the antimeridian.
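Such a flag could be sketched as follows (a hypothetical adjust_bbox helper, not the actual xcube-grid API; a box crossing the antimeridian would still need dedicated handling):

```python
import math

def adjust_bbox(lon_min, lat_min, lon_max, lat_max, res, clamp=False):
    """Snap a bounding box outward to a grid of resolution *res*.

    With clamp=True, out-of-range coordinates are clipped to the valid
    lon/lat range; with clamp=False (the default) a ValueError is raised.
    Note: boxes crossing the antimeridian (lon_min > lon_max) would need
    separate handling.
    """
    if clamp:
        lon_min = max(-180.0, min(180.0, lon_min))
        lon_max = max(-180.0, min(180.0, lon_max))
        lat_min = max(-90.0, min(90.0, lat_min))
        lat_max = max(-90.0, min(90.0, lat_max))
    elif not (-180.0 <= lon_min <= 180.0 and -180.0 <= lon_max <= 180.0
              and -90.0 <= lat_min <= 90.0 and -90.0 <= lat_max <= 90.0):
        raise ValueError("bounding box out of range")
    # Snap the box outward so it fully covers the requested area.
    return (math.floor(lon_min / res) * res,
            math.floor(lat_min / res) * res,
            math.ceil(lon_max / res) * res,
            math.ceil(lat_max / res) * res)
```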

Cannot load plugins v.2

Describe the bug
Plugin loading is currently only triggered by executing xcube.cli. When testing the xcube gen plugins, the plugins need to be loaded when executing xcube.api.gen.gen as well; otherwise an import error is observed.

xcube plugin loading shall only occur when xcube code is executed, not while its modules are imported: at the moment this only happens when code of xcube.cli is executed, but it would be needed within xcube.api.gen.gen as well.

This issue is related to #49.
This issue is related to closed issue #62

Allow xcube serve to use SNAP specific color maps

The user wants a non-linear color display, to map specific values to predefined colors.
This could be solved by making xcube serve able to import SNAP-specific color maps provided by the user.

Possibility of logarithmic scale for mapping

The possibility of mapping one or more variables to colors using a logarithmic scale should be included. This is particularly important for parameters like chlorophyll.
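For illustration, a logarithmic color scale amounts to normalizing a value to [0, 1] on a log axis before the colormap lookup (a minimal pure-Python sketch; in practice matplotlib's colors.LogNorm provides the same behaviour):

```python
import math

def log_normalize(value, vmin, vmax):
    """Map *value* to [0, 1] on a logarithmic scale between *vmin* and
    *vmax* (both > 0), e.g. before looking up a color in a colormap.
    Values at or below zero map to 0. Useful for parameters like
    chlorophyll that span several orders of magnitude."""
    if vmin <= 0 or vmax <= vmin:
        raise ValueError("log scale requires 0 < vmin < vmax")
    if value <= 0:
        return 0.0
    t = (math.log10(value) - math.log10(vmin)) / (math.log10(vmax) - math.log10(vmin))
    return min(1.0, max(0.0, t))
```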

Move specific `xcube gen` input processors into separate repos

xcube gen input processor implementations are often very specific with respect to the supported data input format. Some of them require physical datasets for testing to be included in the sources. xcube on the other hand provides a generic API and CLI and does not rely on physical test datasets.

Input processors are also developed by different development teams and should therefore have different repositories with own responsibilities and issues.

Specific input processors should therefore be moved into separate repositories.

QC tools for data cubes

We need some CLI commands and API functions that help perform basic QC on data cubes:

  • CLI: xcube verify <path> to verify that a cube has a valid structure; later maybe also xcube validate <path> <ref-data> for actual content QC.
    Validate value ranges, all-empty variables, and illegal non-monotonically increasing coordinates (time!).
  • API:
    • a function to validate a dataset and generate a validation report: validate_cube(dataset)
    • a function to assert that a given dataset is a valid data cube: assert_cube(dataset), to be used in other API functions where valid cubes are expected as input
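The proposed API pair could be sketched as follows (the names validate_cube and assert_cube come from this issue, while the concrete checks shown are only illustrative; *dataset* is assumed to be an xarray.Dataset-like object exposing .dims and .coords):

```python
def validate_cube(dataset):
    """Return a list of problem reports for *dataset*; empty means valid.
    Illustrative checks only: required dimensions and a strictly
    monotonically increasing time coordinate."""
    report = []
    for dim in ("time", "lat", "lon"):
        if dim not in dataset.dims:
            report.append(f"missing dimension {dim!r}")
    if "time" in dataset.coords:
        time = dataset.coords["time"].values
        if any(t1 <= t0 for t0, t1 in zip(time, time[1:])):
            report.append("'time' coordinate is not strictly monotonically increasing")
    return report

def assert_cube(dataset):
    """Raise ValueError if *dataset* is not a valid data cube."""
    report = validate_cube(dataset)
    if report:
        raise ValueError("dataset is not a valid cube: " + "; ".join(report))
```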

Allow xcube server to have any URL prefix

Is your feature request related to a problem? Please describe.
We currently always include the version number in the URL. This makes it harder to make changes during development, as clients need to be reconfigured on version changes.

Describe the solution you'd like

Example:

xcube serve --prefix "api/dev/latest" ....

or

xcube serve --prefix "dcs4cop/api/${version}" ....

CLI for temporal aggregation (Level-3 cubes)

Is your feature request related to a problem? Please describe.

Temporal aggregation can be time-consuming when done on the fly, e.g. from an xcube server configuration or from notebooks. Therefore a tool is needed that persists a temporally aggregated cube.

Describe the solution you'd like

New (click) CLI command, API already exists.

Restructure and clarify code base

One main CLI "xcube", many sub-commands

Simplify top-level structure:

  • api/ - data cubes API
  • cli/ - data cubes CLI
  • util/ - framework and implementation helpers
  • version.py

In detail:

grid/ --> api/grid.py, cli/grid.py
genl2c/ --> api/gen/, cli/gen.py
genl3/ --> api/tagg.py,  cli/tagg.py

cli.py --> cli/
config.py --> util/
constants.py --> util/
dsio --> util/
dsutil --> util/,  api/
expression.py --> util/expression.py
maskset.py --> util/
objreg.py --> util/
reproject.py --> api/reproj.py
types.py --> DEL
version.py --> OK

Default input processor

We currently must specify a class name for the input processor to be used for each input dataset.

If datasets are already in xcube's "standard format", that is:

  • have dimensions lat, lon, and optionally time of length 1;
  • have coordinate variables lat[lat], lon[lon], time[time] (opt.), time_bnds[time, 2] (opt.);
  • have any data variables of form <var>[time, lat, lon], or <var>[lat, lon] if the time coordinate variable is missing;
  • have the global attribute pair time_coverage_start, time_coverage_end (or the pair time_start, time_stop, among others) if the time coordinate variable is missing,

then we could use a default input processor that would be configurable w.r.t. the variables to be processed, and which mask to apply, etc.
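The format check such a default input processor would perform could be sketched like this (a hypothetical helper over simplified inputs: *dims* is the dataset's dimension-size mapping and *data_var_dims* maps variable names to their dimension tuples):

```python
def is_standard_format(dims, data_var_dims):
    """Check whether a dataset already matches xcube's "standard format":
    lat/lon dimensions, an optional time dimension of length 1, and all
    data variables shaped [time, lat, lon] (or [lat, lon] without time)."""
    if "lat" not in dims or "lon" not in dims:
        return False
    has_time = "time" in dims
    if has_time and dims["time"] != 1:
        return False
    expected = ("time", "lat", "lon") if has_time else ("lat", "lon")
    return all(var_dims == expected for var_dims in data_var_dims.values())
```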

Tool for cube rechunking and compression

A tool is needed that reads an existing data cube and writes a new cube with identical data but with different data chunking and compression.

This is similar to #14 , but allows applying new chunking and compression to existing cubes.

xcube gen generates duplicates in time and unsorted time dimension for S2+

Describe the bug
xcube gen generates two identical time slices with S2+ plugin. In the input data there is only one input file. In addition, the time dimension is not sorted.

To Reproduce
The cube has been produced in xcube-gen with the following command.
nohup xcube gen --append -v CHL_GILERSON2010_GLOBAL,KdPAR,SPM_VITOnir_SCHELDT,TUR_NECHAD2009_GLOBAL_665 -c /home/xcube/projects/xcube-services/dcs4cop/xcube-gen-configs/dcs4cop-gen_BC_config_S2.yml /data/EOdata/related/DCS4COP/cube_input/VITO/2017//.nc > nohup.out &
The resulting zarr file is still on xcube-gen at /home/xcube/xcube-output/ dcs4cop-bc-s2-sns-l2c-v1.zarr
Note that this issue might be related to the fact that the generation has been interrupted and started again with the same output name.


Making data cube generation robust against paused process

When the appending process is stopped, there is no way to know at which point it was stopped. One therefore has to restart the cube generation from scratch in order not to have duplicates in the data cube. It would be nice to have a robust solution that checks in the existing data cube whether an input file has already been used for appending. If the xcube generator recognizes an input file, it proceeds to the next one, until an input file is found that has not yet been used for appending to the data cube.
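The resume logic could be sketched like this (hypothetical select_inputs and time_of helpers, not existing xcube API):

```python
def select_inputs(existing_times, inputs, time_of):
    """Return only those inputs not yet represented in the cube.

    *inputs* is an iterable of input file paths, *time_of* a
    (hypothetical) function extracting each file's time stamp, and
    *existing_times* the time coordinate values already in the cube.
    Files whose time stamp is already present are skipped, so an
    interrupted generation run can be restarted without duplicates."""
    seen = set(existing_times)
    return [path for path in inputs if time_of(path) not in seen]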

yaml.load results in error when using config file for xcube gen

Describe the bug
When trying to use xcube gen with a config file, an error message appears:

calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details. config_dict = yaml.load(fp)

The solution is to change xcube/api/gen/config.py line 57 into config_dict = yaml.safe_load(fp)

Rename input processor to input transformer

The term "input processor" is confusing for EO scientists, as it makes them believe that some thematic EO data products are generated in this step.

It has been agreed that the term "input transformer" better describes the nature of this component, namely to prepare an input (an L1 or L2 EO data file) for appending it to a data cube, which usually requires time stamp identification, variable selection, masking, and reprojection of the data to the target SRS.

Therefore

  • rename class InputProcessor to InputTransformer
  • rename its method process to transform
  • rename configuration parameter input_proc to input_transformer in API and CLI
  • adjust API and CLI docs

xcube gen to accept a text file with file names as input

Is your feature request related to a problem? Please describe.
related to #33

Describe the solution you'd like
Users shall have the possibility to provide the files sorted in the correct order in a text file if a simple sorting by file names would lead to a wrongly ordered cube.

Describe alternatives you've considered
Rather than leaving it to the user to provide a correctly sorted file list, xcube gen could check that t_{n+1} > t_n before appending a time slice to the cube, and if needed re-arrange the cube.
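The monotonicity check described above could be as simple as (hypothetical helper, illustrating the t_{n+1} > t_n rule):

```python
def check_append_order(cube_times, slice_time):
    """Return True if *slice_time* may be appended to a cube whose time
    coordinate already contains *cube_times*: the new time stamp must be
    strictly greater than the last one (an empty cube accepts anything)."""
    return not cube_times or slice_time > cube_times[-1]
```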

Configurable per-dataset caches

Many of the caches used in WMTS are defined in the ServiceContext class. All of them cache data for dataset-variable combinations. It would be more effective and extensible if we had a single extensible cache object for each dataset. The controllers can then decide what information to put into each cache, e.g.:

  • Computed tile grid definitions
  • Computed data tiles
  • Computed RGB tiles

When a dataset is closed, the per-dataset tile cache object is released.

In the xcube-server configuration, we specify for each dataset what should be cached.

It may make sense to not cache anything at all, especially when a web service system architecture incorporates helper services such as memcached.
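A minimal sketch of such a per-dataset cache object (a hypothetical DatasetCache class, not the existing ServiceContext code): controllers decide which keys to store, and closing the dataset releases everything at once.

```python
class DatasetCache:
    """Single extensible cache per dataset. Controllers choose the keys,
    e.g. "tile_grid/<var>", "tiles/<var>/<z>/<y>/<x>", "rgb/<z>/<y>/<x>";
    release() drops all entries when the dataset is closed."""

    def __init__(self):
        self._entries = {}

    def get_or_compute(self, key, compute):
        """Return the cached value for *key*, computing and storing it
        on first access."""
        if key not in self._entries:
            self._entries[key] = compute()
        return self._entries[key]

    def release(self):
        """Drop all cached entries, e.g. when the dataset is closed."""
        self._entries.clear()
```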

Zarr spec for pyramid-type data

At yesterday's zarr community call there was some discussion about a zarr spec extension for storing pyramid-level data for the next version of the zarr spec. Since you are working with level data in xcube represented by zarr, I thought you might be interested to follow, and maybe comment, on this issue: zarr-developers/zarr-specs#23


Allow filtering by assigning a variable related to a polygon

Is your feature request related to a problem? Please describe.
Assign to a given polygon a dedicated list of variables (to be displayed in the viewer) from the variables available in the generated cube.

The idea is that we do not show at all places all the available layers for the same variable, as we do now in the viewer.

Enable distributed cube computing

Is your feature request related to a problem? Please describe.

xcube serve supports datasets that are computed on the fly, e.g. by applying temporal aggregation. This requires high data throughput and CPU resources, so requests based on computed datasets often respond too slowly and then time out. Such computations may be much faster if they are distributed on a dedicated cluster.

xcube gen may be parallelized so that individual input files (usually spatial time slices) are transformed in a distributed way and the results are then combined chronologically into the desired cube.

The same may apply to other xcube commands that perform heavy computing on chunks of data.

Describe the solution you'd like

Add option to xcube CLI commands that configure how the command is executed in a distributed manner. The option is TBD.

Describe alternatives you've considered

None, besides using larger machines.

Additional context

As we use xarray and xarray uses dask, the solution should be based on Dask Distributed.

New naming of data cube types

The naming of the data cube levels should give a better indication of the type. One type of generated cube is spatially regular with original time stamps, and therefore temporally irregular. The second type is both spatially and temporally regular. At the moment the naming of the types is too similar to data processing levels in earth observation.

Existing propositions are the addition of 'tirr' for time irregular and 'treg' for time regular.

Following the naming decision, the code and documentation needs to be adjusted.

Case insensitive WMTS KVP parameters

In the dcs4cop viewer, the WMTS GetTile request is by default launched using the TIME dimension parameter. For Landsat, Sentinel and CMEMS layers this isn't a problem.
Only for the OLCI layers is the response not what we expected: a GetTile request with the uppercase TIME parameter returns the default image (I guess the latest available date), regardless of the time value.
It looks like the uppercase TIME parameter is skipped by the WMTS service. However, the OLCI WMTS service works correctly when the time parameter is written in lowercase.

To fulfill the OGC WMTS standard, can you adapt the service so that the WMTS parameters are case-insensitive?
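Folding the KVP parameter names to a canonical case on arrival would be enough; a minimal sketch (hypothetical normalize_kvp helper applied before parameter lookup):

```python
def normalize_kvp(params):
    """Lower-case KVP parameter names so TIME, Time and time are treated
    the same, as KVP parameter names are case-insensitive in OGC web
    services. *params* maps raw query-string names to values."""
    return {name.lower(): value for name, value in params.items()}
```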


Integrate xcube-server in xcube

Is your feature request related to a problem? Please describe.

We don't want to maintain two packages that actually require the same environment.

Describe the solution you'd like

  • Integrate all xcube-server sources and resources into xcube repo
  • Move xcube-server CLI into xcube.cli.serve.py
  • Move xcube-server production code into new xcube.webapi
  • Move xcube-server test code into new test.webapi
  • Make xcube-server callable as CLI xcube serve
  • Move generally useful functions (time-series!) into xcube.api
  • Move useful utilities into xcube.util
  • Remove code duplications, make use of xcube.util
  • Move open xcube-server issues into xcube issue tracker
  • Close xcube-server repo

Describe alternatives you've considered

See dcs4cop/xcube-server#37

Add the Ability to Convert Variables to Dimensions

Is your feature request related to a problem? Please describe.

We need the ability to convert a dataset variable into a dimension as requested by Norman.

Describe the solution you'd like

Something like this:

ds2 = xr.concat([ds[var_name] for var_name in ds.data_vars], "var")
var_coords = xr.DataArray([var_name for var_name in ds.data_vars], dims=["var"])
ds2 = ds2.assign_coords(var=var_coords)

False cube creation when output name is not set using xcube-genl2c

When the user does not set an output name for the generated data cube when using xcube-genl2c, the output is one output file per input file. This is because the default output name depends on the input file name, which changes when more than one input file is used for the cube generation. The default name should instead be a fixed name that does not change.

Time-series web API should also return uncertainty

The time-series RESTful API (/ts/...) should also return uncertainty values when there is a related ancillary variable. Hence, in addition, it should return values of data variables

  • named <base>_<suffix> for a given data variable named <base>, where <suffix> is one of stdev, uncert, error, or
  • listed in the value of the ancillary_variables attribute of a data variable; see section Ancillary Data in the CF Conventions.
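Resolving the candidate uncertainty variables could be sketched as follows (a hypothetical find_uncertainty_vars helper; the suffix list and the CF ancillary_variables attribute come from the two rules above):

```python
SUFFIXES = ("stdev", "uncert", "error")

def find_uncertainty_vars(var_name, all_vars, attrs):
    """Collect candidate uncertainty variables for *var_name*:
    first by the <base>_<suffix> naming convention, then from the CF
    'ancillary_variables' attribute (a space-separated name list).
    *all_vars* is the set of variable names in the dataset, *attrs*
    the data variable's attribute mapping."""
    found = [f"{var_name}_{s}" for s in SUFFIXES if f"{var_name}_{s}" in all_vars]
    for name in attrs.get("ancillary_variables", "").split():
        if name in all_vars and name not in found:
            found.append(name)
    return found
```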

Provide functionality for

  • points and
  • other geometries.

Additional context

See dcs4cop/xcube-viewer#19

Allow creating spatial pyramid levels

xcube-server is encountering massive performance problems when low-res tiles are created from a spatially hi-res dataset, especially when its chunking is not ideal for tile extraction in the spatial dimensions. This is because spatial resolution levels are computed on the fly. For the lowest-resolution (level zero) tiles, all hi-res data need to be read.

We need a data format that allows xcube-server to read from spatial pyramid levels, if they exist, and a tool that can generate spatial pyramid levels from hi-res datasets.

Format suggestion

Let some/file/path/bigdata.zarr be the path to a hi-res dataset; then the physical representation of the spatial pyramid with 8 levels could be as follows:

    - some/file/path/bigdata.zarr
    - some/file/path/bigdata.levels/
      - 0.lnk        # contains link to original dataset at spatial resolution res0
      - 1.zarr/     # First downsampled level with res = res0 * 2^1
      - 2.zarr/     # Second downsampled level with res = res0 * 2^2
      ...
      - 7.zarr/     # Seventh downsampled level with res = res0 * 2^7

All levels have the same chunking. At the highest level, the number of chunks in one of the spatial dimensions is one; all other levels have multiple chunks in the spatial dimensions.

Another possibility is that all the levels go into a single Zarr dataset.
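Assuming each level halves the spatial resolution (res = res0 * 2^i, level 0 being the original dataset), the number of levels needed until the lowest-resolution level fits into a single tile can be computed as follows (illustrative sketch only):

```python
import math

def num_levels(size, tile_size):
    """Number of pyramid levels needed so that the lowest-resolution
    level is no larger than one tile in the given spatial dimension.
    Each level halves the size (rounding up), matching res = res0 * 2^i."""
    levels = 1
    while size > tile_size:
        size = math.ceil(size / 2)
        levels += 1
    return levels
```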

xcube gen sorts input list and therefore might append wrong time order

The order in which input files are appended to the cube is relevant.
When input file names are first differentiated by e.g. A and B, the input files are sorted by these characters instead of by time stamp. When submitting a list of input files that is already sorted by time stamp (ignoring the characters A and B), xcube gen sorts the list internally and creates a wrong cube, first appending all input data with A and then with B.

Expected behavior
xcube gen should not re-sort the input files. It should use the order given by the user.

Cannot load plugins

Importing package xcube.api.gen triggers plugin loading.
But xcube gen plugins require importing it.
Hence they observe an import error.

xcube plugin loading shall only occur when xcube code is executed, not while its modules are imported.

This issue is related to #49.

Enable requests by passing lat and lon (needed for App)

Enable requests by passing lat and lon, so that the server returns the suitable dataset to the user. This is needed for requests passed by the app. Difficulty: what happens when the user requests a lat and lon that lie outside the bounding box of every data cube region included in the server?
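A naive lookup over the configured dataset bounding boxes might look like this (a hypothetical find_dataset helper; it returns the first match and None when the point lies outside all regions, which is exactly the open question above):

```python
def find_dataset(datasets, lon, lat):
    """Return the name of the first dataset whose bounding box contains
    the point (lon, lat), or None if no region contains it. *datasets*
    maps dataset names to (lon_min, lat_min, lon_max, lat_max) boxes.
    Overlapping regions and the no-match case still need a policy."""
    for name, (lon_min, lat_min, lon_max, lat_max) in datasets.items():
        if lon_min <= lon <= lon_max and lat_min <= lat <= lat_max:
            return name
    return None
```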

xcube gen: indexes along dimension 'y' are not equal

When using xcube gen with processed variables that use flag values, we now get errors such as

step 2 of 9: computing variables...
Internal error: failed computing valid mask for 'rrs_560' from expression  'np.logical_not(PIXEL_CLASSIFY_FLAGS.F_INVALID)': indexes along dimension 'y' are not equal

The error happens in code that used to run without problems. Must be due to a change in xarray or deeper.

Here is the full traceback:

step 1 of 9: pre-processing dataset...
  pre-processing dataset completed in 1.6446000000058802e-05 seconds
step 2 of 9: computing variables...
Internal error: failed computing valid mask for 'rrs_560' from expression 'np.logical_not(PIXEL_CLASSIFY_FLAGS.F_INVALID)': indexes along dimension 'y' are not equal
Traceback (most recent call last):
  File "d:\projects\xcube\xcube\util\expression.py", line 49, in compute_expr
    result = eval(expr, namespace, None)
  File "<string>", line 1, in <module>
  File "d:\projects\xcube\xcube\util\maskset.py", line 98, in __getattr__
    return self.get_mask(name)
  File "d:\projects\xcube\xcube\util\maskset.py", line 128, in get_mask
    mask_var = mask_var.where((flag_var & flag_mask) != 0, 0)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\xarray\core\common.py", line 859, in where
    return ops.where_method(self, cond, other)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\xarray\core\ops.py", line 191, in where_method
    keep_attrs=True)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\xarray\core\computation.py", line 969, in apply_ufunc
    keep_attrs=keep_attrs)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\xarray\core\computation.py", line 209, in apply_dataarray_vfunc
    raise_on_invalid=False)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\xarray\core\alignment.py", line 217, in deep_align
    exclude=exclude)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\xarray\core\alignment.py", line 132, in align
    .format(dim))
ValueError: indexes along dimension 'y' are not equal

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:/Projects/xcube/xcube/cli/cli.py", line 257, in main
    exit_code = cli.main(args=args, obj=ctx_obj, standalone_mode=False)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\click\core.py", line 717, in main
    rv = self.invoke(ctx)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\click\core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "D:\Miniconda3\envs\xcube\lib\site-packages\click\core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "D:\Miniconda3\envs\xcube\lib\site-packages\click\core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "d:\projects\xcube\xcube\cli\gen.py", line 126, in gen
    **config)
  File "d:\projects\xcube\xcube\api\gen\gen.py", line 142, in gen_cube
    monitor)
  File "d:\projects\xcube\xcube\api\gen\gen.py", line 284, in _process_l2_input
    dataset = transform(dataset)
  File "d:\projects\xcube\xcube\api\gen\gen.py", line 207, in step2
    return compute_dataset(dataset, processed_variables=processed_variables)
  File "d:\projects\xcube\xcube\api\compute.py", line 113, in compute_dataset
    errors=errors)
  File "d:\projects\xcube\xcube\util\expression.py", line 32, in compute_array_expr
    return compute_expr(expr, namespace=namespace, errors=errors, result_name=result_name)
  File "d:\projects\xcube\xcube\util\expression.py", line 57, in compute_expr
    raise ValueError(msg) from e
ValueError: failed computing valid mask for 'rrs_560' from expression 'np.logical_not(PIXEL_CLASSIFY_FLAGS.F_INVALID)': indexes along dimension 'y' are not equal

Improve performance of Time-Series

The performance of the time-series commands is slow, leading to time-outs. This has been observed for all four endpoints.

GET /ts/{dataset}/{variable}/point
POST /ts/{dataset}/{variable}/geometry
POST /ts/{dataset}/{variable}/geometries
POST /ts/{dataset}/{variable}/features
