
openEO Backend in R for proof-of-concept

A reference implementation of the openEO core API, written in R as a proof-of-concept and utilizing the plumber package as a lightweight web server. The web server is not final and is not optimized with regard to security. The goal of this package is to provide a simple, local, openEO-conformant server back-end implementing API version 0.3.1.

Installation

Install the package using install_github from the devtools package. If you run R on Windows, the packages are installed from pre-built binaries, but on a Linux distribution the packages are compiled. This means that on a Linux OS you need to install some required system libraries first. For Ubuntu these are:

sudo apt-get -y install libgdal-dev libcurl4-gnutls-dev libssl-dev libssh2-1-dev libsodium-dev gdal-bin libudunits2-dev

On Windows it is also highly recommended to have GDAL installed and configured in the system's PATH environment variable.

After that you can install the R packages by running:

library(devtools)
install_github(repo="Open-EO/openeo-r-backend",ref="master")

When the back-end is started for the first time, or whenever the demo data is not present, the data is downloaded from an external source. The demo data set contains two small spatio-temporal raster data sets: an NDVI raster time series calculated from Landsat-8, and a small spatial subset of Sentinel-2 data.

Getting Started

After loading the package, you can create a server object by calling createServerInstance(). The server object is then directly available as openeo.server in the global environment (.GlobalEnv). This object performs all server-related tasks, such as managing references to the server's data and processes, and starting the server.

The openeo.server can be customized by configuring a ServerConfig object and passing it to the createServerInstance function. The configuration object has attributes such as data.path and workspaces.path that point to the demo data and to the folder where user data and job results are stored. By default the downloaded demo data is stored in the subfolder data under config$workspaces.path.

Note: please omit the trailing '/' from your directory paths. If workspaces.path is not set explicitly, the server looks for and/or stores the created data in the current working directory (getwd()).

You then need to load the demo data and processes for the server, or develop and register your own processes and products. If you haven't already, loadDemo() will download the sample data for you and store it in the subfolder data of the workspaces.path. If you are starting the server for the first time, you should also create a user first.

library(openEO.R.Backend)

config = ServerConfig()
config$workspaces.path = "path/to/back-end/workspace"
config$mapserver.url = "http://url/to/mapserver" #e.g. http://localhost:8080/cgi-bin/mapserv
config$rudfservice.url = "http://url/to/r-udf-service" #e.g. http://localhost:8010/udf

createServerInstance(configuration = config)

openeo.server$initEnvironmentDefault() # initialize the default server environment
openeo.server$initializeDatabase() # set up the server database
openeo.server$createUser(user_name="test", password="test") # only created if it does not exist yet
openeo.server$loadDemo() # download (if necessary) and register the demo data and processes

openeo.server$startup()

To stop the server, terminate the R session process (e.g. with CTRL + C).

When you want to use the server operationally, meaning you have already created your user and just want to start the server, it is advisable to store the code above (without the createUser command) in a separate R file and run it from the command line:

R -f path/to/your_file.R

Additional Requirements

If you want to use the R-UDF web service implementation, you also need to install and run the r-udf-service. If you want to use the preliminary web service support, you also need to install MapServer.

Docker installation

As an alternative to the installation on a local machine, you can run the R back-end in Docker. We provide a docker-compose file that takes care of most of the work. Make sure you can run docker-compose on the target machine, then run the following command to set up the base server and the actual R back-end. It is important to build the baseserver before the openeo-r-server, because it contains the basic server configuration for the application server (openeo-rserver).

docker-compose up -d
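If you want to enforce the build order explicitly, the services can also be built one by one before starting (the service names are taken from the description above; verify them against the repository's docker-compose.yml):

docker-compose build baseserver
docker-compose build openeo-rserver
docker-compose up -d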

Note: Starting with back-end version 0.3.1-X we also provide Docker images for the r-server with demo data and for the r-udf-service on Docker Hub under openeor.

Authentication / Authorization Behavior

This local back-end distinguishes three levels of access, depending on the called endpoint (see the API reference): open access, basic authorization, and bearer-token authorization. All metadata services that support exploration of data, processes and other functionality are open access. Basic authorization is currently used for the authentication service (login), and bearer-token authorization is applied to all services that are linked to a user, such as the user workspace and job and service handling.

This means that you need to use the proper HTTP headers in your requests: Authorization: Basic <encoded_credentials> for the login process, and Authorization: Bearer <token> for the other authorized services. For the bearer-token authorization you send the token that you retrieved at login.
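The following is a minimal sketch using the httr package (the host URL and the demo credentials test/test are assumptions, and the endpoint paths follow the openEO API v0.3 convention; adapt them to your instance):

library(httr)

host = "http://localhost:8000/api"

# 1. login with HTTP Basic authorization to obtain a bearer token
login = GET(paste0(host, "/credentials/basic"),
            authenticate("test", "test", type = "basic"))
token = content(login)$access_token

# 2. call a user-scoped service with the bearer token, e.g. list your jobs
jobs = GET(paste0(host, "/jobs"),
           add_headers(Authorization = paste("Bearer", token)))
content(jobs)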

Process Graphs for Proof-of-Concept

Endpoint: POST /preview or POST /jobs
Query-Configuration: Authorization with Bearer-Token
{
    "process_graph": {
      "process_id": "min_time",
      "imagery": {
        "process_id": "NDVI",
        "imagery": {
          "process_id": "filter_bbox",
          "imagery": {
            "process_id": "filter_daterange",
            "imagery": {
              "process_id": "get_collection",
              "name": "sentinel2_subset"
            },
            "extent": ["2017-04-01T00:00:00Z", "2017-05-31T00:00:00Z"]
          },
          "extent": {
            "west": 700000,
            "south": 7898000,
            "east": 702960,
            "north": 7900000,
            "crs": "EPSG:32734"
          }
        },
        "nir": "B8",
        "red": "B4"
      }
    },
    "output": {
        "format": "GTiff"
    }
}
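A minimal sketch of submitting this process graph with the httr package (assuming the JSON above is stored in a file named process_graph.json and token holds the bearer token retrieved at login; both names are hypothetical):

library(httr)

response = POST("http://localhost:8000/api/preview",
                add_headers(Authorization = paste("Bearer", token)),
                content_type_json(),
                body = upload_file("process_graph.json"))

# store the returned GTiff on disk
writeBin(content(response, "raw"), "min_time_ndvi.tif")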
Prerequisites: An uploaded "polygons.geojson" file in the user's workspace (PUT /users/{user_id}/files/ — see the upload sketch after the process graph below)
Endpoint: POST /jobs or POST /preview
Query-Configuration: Authorization with Bearer-Token
{
    "process_graph": {
      "process_id": "zonal_statistics",
      "imagery": {
        "process_id": "filter_bbox",
        "imagery": {
          "process_id": "filter_bands",
          "imagery": {
            "process_id": "filter_daterange",
            "imagery": {
              "process_id": "get_collection",
              "name": "sentinel2_subset"
            },
            "extent": ["2017-01-01T00:00:00Z", "2017-05-31T23:59:59Z"]
          },
          "bands": "B8"
        },
        "extent": {
          "west": 22.8994,
          "south": -19.0099,
          "east": 22.9282,
          "north": -18.9825
        }
      },
      "regions": "/uc3/polygons.geojson",
      "func": "mean"
    },
    "output": {
        "format": "GPKG"
    }
}
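A minimal sketch of the prerequisite file upload with httr (assuming the user_id test and that the local file polygons.geojson should end up at uc3/polygons.geojson in the user workspace; both are assumptions):

library(httr)

PUT("http://localhost:8000/api/users/test/files/uc3/polygons.geojson",
    add_headers(Authorization = paste("Bearer", token)),
    body = upload_file("polygons.geojson"))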

If you are interested, check the openeo-r-client examples for reference.



Issues

Implement the "services" feature

I have postponed the feature for services like WMS and WCS for now, because R doesn't offer any packages to create such a service. Currently, I'm working on an integration of a MapServer service running within the Docker network.
The idea is to create a map file for MapServer from R that links to the processed file output. In order to allow MapServer to access the data, we create a shared volume of the back-end workspace.

Capabilities: Methods are not returned in spec-compliant format

According to the spec, the available method(s) for each endpoint are specified in an array of strings named methods.

The current implementation (in the API_v0.3.0_changes branch) differs from that in two ways:

  • The attribute is named method (missing plural s)
  • If there is only one method available for the endpoint, the content is a plain string instead of an array of strings with one item (see the sketch below)
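A plausible cause for the second point (an assumption, not verified against the code) is jsonlite's auto-unboxing, which turns length-1 vectors into scalars; wrapping the vector in I() keeps it an array:

library(jsonlite)

toJSON(list(path = "/collections", methods = "GET"), auto_unbox = TRUE)
# {"path":"/collections","methods":"GET"}   <- scalar string, not spec-compliant
toJSON(list(path = "/collections", methods = I("GET")), auto_unbox = TRUE)
# {"path":"/collections","methods":["GET"]} <- array of strings, as required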

Wrong configuration in server_start.R

When I run the Docker container of this repository (develop branch), I receive the following warnings and error:

Warning messages:
1: replacing previous import ‘magrittr::extract’ by ‘raster::extract’ when loading ‘openEO.R.Backend’ 
2: replacing previous import ‘dplyr::select’ by ‘raster::select’ when loading ‘openEO.R.Backend’ 
3: replacing previous import ‘dplyr::intersect’ by ‘raster::intersect’ when loading ‘openEO.R.Backend’ 
4: replacing previous import ‘dplyr::union’ by ‘raster::union’ when loading ‘openEO.R.Backend’ 
> 
> config = ServerConfig()
> config$workspaces.path = "/var/openeo/workspace"
> config$initEnvironmentDefault()
Error: attempt to apply non-function
Execution halted

Originally posted by @pramitghosh in #15 (comment)

POST job with evaluate "sync" returns a http status 400

The Jupyter notebook from Open-EO/openeo-r-client shows an error when submitting a job with "sync" to the local R back-end. Reason: the value restriction of the parameter "evaluate" does not include "sync".

A fix will only be applied to versions 0.1.X (< 0.2), since from 0.2 on the API version v0.0.2 will be used, which has a separate endpoint for task execution.

Add dimension information to "Collection"

For the test implementation of UDFs we need information on collections that specifies the dimensionality, both for passing data to a UDF and for reading the data from a UDF back into the system.

Each process offered by this back-end also needs to update this dimensionality information when it modifies the collection.

Note: collections are the backbone of data storage in the R back-end, and currently it is assumed that the data always has all dimensions (space, time, bands).
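For illustration, such a descriptor could be as simple as a named list attached to each collection (the names below are purely illustrative, not taken from the code base):

dimensions = list(
  spatial = c("x", "y"),  # raster axes
  temporal = "t",         # time stamps of the scenes
  bands = c("B4", "B8")   # spectral bands
)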

Endpoints shouldn't have a trailing slash

Some endpoint URLs have a trailing slash, others do not (e.g. /services/ vs. /service_types). The latter version should be used consistently for all endpoints, because that is how the spec defines them.

URL encoding for dots in file paths required

The R server requires the dot in file paths (e.g. /users/me/files/image.png) to be URL-encoded. This does not follow the RFC standard for characters that need to be URL-encoded. Either the R back-end should stop requiring the dot to be URL-encoded or, if this is not possible at all, an issue should be opened for the Core API to require the dot to be URL-encoded.
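For reference, a standard-conforming encoder leaves the dot untouched, because it is an unreserved character:

utils::URLencode("image.png", reserved = TRUE)
# [1] "image.png"  - the dot is not percent-encoded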

Collections are not returned correctly

I'm pretty sure that I managed to register at least one dataset correctly (see #18 about how I did that), but the response from /collections/ is:

{"collections":[{"name":{},"title":{},"description":{},"license":{},"extent":{},"links":{}}],"links":{"rel":"self","href":"http://localhost:8000/api/collections"}}

I.e. there is an item in the array (which is positive, because before I manually registered the data the array was completely empty), but it hasn't got any useful information...

Demo data and processes are not being registered

I downloaded the demo data and processes via loadDemo() and the download worked (I have both the landsat7 and sentinel2 folders). But it appears that the data and processes are not being registered -- the corresponding arrays returned by the /collections and /processes endpoints are empty.

Inspecting the openeo.server object in R revealed that $data and $processes are both empty lists of length 0...

I manually executed data.R and called the loadSentinel2Data and loadLandsat7Dataset functions. The first failed with an error:

Error in if (nchar(fname) == 0) stop("empty file name") : missing value where TRUE/FALSE needed

The second worked, so I at least have one dataset in data, but this should work automatically (or be done with a registerDemo() function). And the processes are still completely empty...

Sending the same content leads to different results

I am sending two JSON process graphs with the same content, but a different ordering of the entries. Unfortunately, the requests produce different results. Why is that? Is the order of the JSON entries important? Am I missing something else?

Example 1 - can be processed successfully to a GTiff:

{
  "process_id": "calculate_ndvi",
  "args": {
    "imagery": {
      "process_id": "filter_daterange",
      "args": {
        "imagery": {
          "product_id": "sentinel2_subset"
        },
        "from": "2017-04-01",
        "to": "2017-05-01"
      }
    },
    "red": 4,
    "nir": 8
  }
} 

Example 2 - gives an HTTP 500 with the message Error in getCollectionFromImageryStatement(imagery): no collection element found in function call:

{
  "process_id":"calculate_ndvi",
  "args":{
    "red":4,
    "nir":8,
    "imagery":{
      "process_id":"filter_daterange",
      "args":{
        "imagery":{
          "product_id":"sentinel2_subset"
        },
        "from":"2017-04-01",
        "to":"2017-05-01"
      }
    }
  }
}

Error in filter_bbox

After creating a job (with POST /jobs), the back-end (local instance) returns the following error:

<simpleError: lexical error: invalid char in json text.
only 0's may be mixed with nega
(right here) ------^

This also appears if I send it via Postman.
It seems like the process graph is somehow wrong, but it works on other back-ends and JSON validators.
Process Graph:
{ "process_graph": {"process_id": "min_time", "imagery": {"process_id": "NDVI", "red": "B8A", "imagery": {"process_id": "filter_daterange", "extent": ["2017-01-01T00:00:00Z", "2017-01-31T23:59:59Z"], "imagery": {"process_id": "filter_bbox", "extent": {"north": -18.9825, "south": -19, "crs": "EPSG: 4326", "west": 16.138916, "east": 16.524124}, "imagery": {"process_id": "get_collection", "name": "sentinel2_subset"}}}, "nir": "B4"}}}

Functionality for the first example of proof-of-concept

It would be useful to have an overview of what is available and what is still missing in order to be able to perform the first example of the proof-of-concept.

The data is there, the processes seem to be there now, and generating the process graph in the client works, but I can't get a result from the back-end (possibly due to issue #1). Is it intended to be functional already? Are lazy and synchronous processing methods implemented, or not yet?

It would help to also include some sort of status overview in the repository (in the readme or a separate file).

Requesting OPTIONS on some endpoints fails with HTTP error 500

OPTIONS / works
OPTIONS /collections works
OPTIONS /collections/sentinel2_subset fails with status code 500

response body:

{
    "error": [
        "500 - Internal server error"
    ],
    "message": [
        "Error in stri_match_first_regex(str, regex, ...): Error in {min,max} interval. (U_REGEX_BAD_INTERVAL)\n"
    ]
}

RStudio console outputs the same:

<simpleError in stri_match_first_regex(str, regex, ...): Error in {min,max} interval. (U_REGEX_BAD_INTERVAL)>

Similarly:
OPTIONS /processes fails
OPTIONS /jobs fails

Reproducible on both Windows and Ubuntu
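A plausible minimal reproduction (an assumption, not verified against the code): if an endpoint path template containing literal curly braces is handed to stringi as a regular expression, ICU parses {name} as a malformed repetition interval and raises exactly this error:

library(stringi)

stri_match_first_regex("/collections/sentinel2_subset", "/collections/{name}")
# Error in {min,max} interval. (U_REGEX_BAD_INTERVAL)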

Move large data files out of the repository

The openeo.server$loadDemo() function should download and load the demo files, instead of the files living in the repository and being downloaded all over again on each update of the back-end. The sooner this is done the better, since it is an invasive procedure that involves rewriting the whole history of the repository, which will cause issues on forks (if any). It would also save users quite a bit of time every time they update the back-end package.
