sdmx-rest's Introduction

Overview

This repository is used to maintain the SDMX REST API.

The API allows implementers to offer programmatic access to statistical data and metadata over HTTP.

This repository contains:

sdmx-rest's People

Contributors

airosa, amattioc, dosse, ioggstream, olavtenbosch, sosna, stratosn, tzaphkiel

sdmx-rest's Issues

Support language queries

From the list of features expected for SDMX 3.0: allow web service clients to request a specific language match in the RESTful API.

Note: Is this more than simply documenting how content negotiation can be used for language resources?
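If language support turns out to be mostly a matter of content negotiation, a service could choose the language of names and descriptions from the standard HTTP Accept-Language header. The Python sketch below is only an illustration: `parse_accept_language` and `choose_language` are hypothetical helpers, the sketch assumes the service knows which languages it holds for a resource, and it implements a simplified matching (q-values, exact tags and primary subtags) rather than the full RFC 4647 algorithms.

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept-Language header into (tag, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((tag.strip().lower(), q))
    # sorted() is stable, so equal weights keep the client's original order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def choose_language(header: str, available: List[str], default: str) -> str:
    """Return the best available language for the client, or the default."""
    available_lower = {lang.lower(): lang for lang in available}
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag == "*":
            return default
        if tag in available_lower:
            return available_lower[tag]
        # Fall back to the primary subtag, e.g. "fr-CH" -> "fr".
        primary = tag.split("-")[0]
        if primary in available_lower:
            return available_lower[primary]
    return default


print(choose_language("fr-CH, fr;q=0.9, en;q=0.8", ["en", "fr"], "en"))  # fr
```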

Improve documentation about start and endPeriod

Handling of the start and end periods might depend on the values of the collection attribute (such as beginning, middle, average or end of period).

This is not reflected in the documentation at the moment, but it would be valuable information for implementers.

Retrieve several Items from an ItemScheme

Proposed changes:

Additional parameter used for identifying a resource, for item scheme types:
SDMX uses the item scheme pattern to model SDMX collections of items. These are:

  • categoryscheme
  • conceptscheme
  • codelist
  • organisationscheme
  • agencyscheme
  • dataproviderscheme
  • dataconsumerscheme
  • organisationunitscheme
  • reportingtaxonomy

For these collections, it is possible to use a 4th parameter for identifying resources. The rules for the 3 other parameters, as defined in the section above, remain valid.

Parameter: itemList
Type: A concatenation of strings compliant with the SDMX common:NestedNCNameIDType for conceptscheme and agencyscheme, or with the SDMX common:NestedIDType in all other cases. The OR operator is supported using the + character. For example, the following string can be used to retrieve the codes for Germany, United States and Japan from the REF_AREA codelist: DE+US+JP.
Description: The ids of the items to be returned.

This 4th parameter is used as follows:

protocol://ws-entry-point/resource/agencyID/resourceID/version/itemID+itemID
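To make the 4th parameter concrete, here is a minimal Python sketch of how a service might split such a path into its components. The helper name, the returned keys and the example IDs (ECB, CL_AREA) are hypothetical, and the sketch assumes the entry point has already been stripped from the path. In a URL path, + is not normally decoded as a space (that convention belongs to form-encoded query strings), so it can be split on directly.

```python
def parse_item_scheme_path(path: str) -> dict:
    """Split resource/agencyID/resourceID/version[/itemID+itemID] into parts.

    Hypothetical helper: assumes the entry point is already stripped.
    """
    segments = [s for s in path.strip("/").split("/") if s]
    if len(segments) not in (4, 5):
        raise ValueError(f"expected 4 or 5 path segments, got {len(segments)}: {path!r}")
    resource, agency_id, resource_id, version = segments[:4]
    # The optional 4th parameter holds item IDs joined by '+', the OR operator.
    items = segments[4].split("+") if len(segments) == 5 else []
    return {
        "resource": resource,
        "agencyID": agency_id,
        "resourceID": resource_id,
        "version": version,
        "items": items,
    }


print(parse_item_scheme_path("codelist/ECB/CL_AREA/1.0/DE+US+JP")["items"])
```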

Add support for mapping to the HTTP 403 status code

SDMX web services sometimes need to throw an exception because a syntactically valid request is not allowed within a particular context. For example, a registry client could send a request to update a final artefact, which is an operation that is not allowed by the standard.

The best HTTP status code in this case seems to be 403. As defined in the HTTP specification, 403 means that “the server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated. If the request method was not HEAD and the server wishes to make public why the request has not been fulfilled, it SHOULD describe the reason for the refusal in the entity”.

If it is agreed to support this, there are 2 options:

  1. Introduce a new SDMX client error and map it to the 403 HTTP status code.
  2. Map the existing SDMX client error with code 150 to the 403 HTTP status code rather than the current 400.

Add support for updatedAfter to metadata queries

When querying for data, a client has the possibility to indicate that it wants to retrieve what has changed since a certain point in time.

It would be nice to add a similar feature to metadata queries.

The fact that metadata messages do not have the equivalent of a dataset action might be an issue, though.

Uppercases are still used for keywords defined in the RESTful API

The RESTful API offers the possibility to supply default values for some of the query fields, such as ‘latest’ (to get the latest version currently in production) and ‘all’. Initially, these keywords were in uppercase but, following the feedback received, it was decided to opt for lowercase instead. Unfortunately, some occurrences of the uppercase variants remain on pages 19, 20 and 23 of the SDMX web services guidelines.

Add a new section about best practices

Following several requests, it has been proposed to add a section about web services best practices. The section should contain useful tips and tricks for both data providers and data consumers.

400 Bad syntax or 403 Semantic error

Hi,
I was going through the SDMX RESTful error handling wiki page and stumbled upon this line:
Semantic error (403): If your request is syntactically correct but fails a semantic validation, a 403 code will be returned.
This is confusing, given what www.w3.org says about the HTTP 403 code and what the SDMX 2.1 Web Services Guidelines state in Paragraph 5.8, SDMX to HTTP Error Mapping:
150 semantic error - 400 bad syntax

So, my questions are:

  1. Is Semantic error (403) just a misprint here: https://github.com/sdmx-twg/sdmx-rest/wiki/Error-handling? I guess that it should be Semantic error (400).
  2. What am I supposed to return when the client is authenticated but has not been authorized to access the requested data set? Based on the HTTP 1.1 Status Code Definitions, I should use 403 Forbidden.

Thank you,
Denis

Problem with the regular expression for the KeyType defined in SDMXRestTypes.xsd

The regular expression used for KeyType (SDMXRestTypes.xsd) marks some valid queries as invalid (such as this one: M+D..EUR.SP00.A) while also failing to catch some invalid queries (such as: B.U2.EUR.RT.MM.EURIBOR1MD_.HST&startPeriod=2011-03-21).

Replace
([A-z0-9_@$-]+)(\.(([A-z0-9_@$-]+)(\+([A-z0-9_@$-]+))*)?)*
with
([A-Za-z0-9_@$-]+([+][A-Za-z0-9_@$-]+)*)?([.]([A-Za-z0-9_@$-]+([+][A-Za-z0-9_@$-]+)*)?)*
in SDMXRestTypes.xsd
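The proposed pattern can be checked against the two examples from this issue with Python's re module. The sketch assumes the replacement pattern is the one written in the code; since XSD patterns are implicitly anchored, it uses re.fullmatch to mimic that behaviour.

```python
import re

# Proposed KeyType pattern: dot-separated dimension values, each possibly
# empty (wildcard) or a '+'-separated list of codes (OR operator).
KEY_TYPE = re.compile(
    r"([A-Za-z0-9_@$-]+([+][A-Za-z0-9_@$-]+)*)?"
    r"([.]([A-Za-z0-9_@$-]+([+][A-Za-z0-9_@$-]+)*)?)*"
)


def is_valid_key(key: str) -> bool:
    # fullmatch anchors the pattern at both ends, like an XSD pattern facet.
    return KEY_TYPE.fullmatch(key) is not None


print(is_valid_key("M+D..EUR.SP00.A"))  # True
print(is_valid_key("B.U2.EUR.RT.MM.EURIBOR1MD_.HST&startPeriod=2011-03-21"))  # False
```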

023-Group all structural metadata queries under a structure resource

It is proposed to group all structural metadata queries under a structure resource to ease the understanding of the API. After this has been done, the API would offer only 4 main resources:

  • data: to retrieve statistical data
  • metadata: to retrieve reference metadata
  • schema: to retrieve schemas that can be used to validate data
  • structure: to retrieve structural metadata

The new structure query would be as follows:
resource/type/agency/id/version (and of course resource/type/agency/id/version/item, for item scheme queries), where:

  • resource would take a fixed value (i.e. structure)
  • type would be the artefact type, as defined by the SDMX IM (i.e. codelist, datastructure, dataflow, etc.)
  • The rest (agency/id/version) would not change compared to the current version
  • Each of the path parameters (i.e. all except resource in the definition above) accepts the all keyword (in addition, version also accepts latest)

So, for example, to get the latest version of the SDMX CL_FREQ codelist, one would do:
https://entry-point/structure/codelist/SDMX/CL_FREQ/latest (latest could be omitted as it is the default value if not supplied)

To get the latest version of all structures maintained by SDMX, use the following:
https://entry-point/structure/all/SDMX/all/latest or simply https://entry-point/structure/all/SDMX, as all and latest are the default values for the id and version path parameters respectively.
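A small Python sketch of a client-side helper for the proposed structure resource. `structure_url` is hypothetical; it follows the defaults stated above (all for id, latest for version) and drops them when they are trailing, so it reproduces the two example URLs.

```python
from typing import Optional


def structure_url(entry_point: str, artefact_type: str, agency: str,
                  artefact_id: str = "all", version: str = "latest",
                  item: Optional[str] = None) -> str:
    """Build a query for the proposed /structure resource (hypothetical helper)."""
    segments = [artefact_type, agency, artefact_id, version]
    if item is not None:
        # Item scheme queries: resource/type/agency/id/version/item
        segments.append(item)
    elif version == "latest":
        # 'latest' (version) and 'all' (id) are the defaults, so trailing
        # default segments can be dropped to get the shortest equivalent URL.
        segments.pop()
        if artefact_id == "all":
            segments.pop()
    return "/".join([entry_point.rstrip("/"), "structure", *segments])


print(structure_url("https://entry-point", "codelist", "SDMX", "CL_FREQ"))
```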

Retrieve a valid set of codes after constraints resolution

The server would resolve the constraints for big datasets that a client would not be able to process itself: the client provides a restriction on one or more dimensions (CubeRegion), and the service then returns the valid set of codes (a partial codelist) for the other dimensions.
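One way to picture the requested server-side resolution, as a minimal Python sketch: it assumes the service can enumerate the series keys that actually exist for the dataflow (in practice this would be a database query) and models the CubeRegion restriction as a mapping from dimension ID to allowed codes. The function name and the example codes are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def valid_codes(
    dimensions: List[str],
    series_keys: Iterable[Tuple[str, ...]],
    restriction: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """For each unrestricted dimension, return the codes that still occur
    in at least one existing series key compatible with the restriction."""
    positions = {dim: i for i, dim in enumerate(dimensions)}
    result: Dict[str, Set[str]] = {dim: set() for dim in dimensions if dim not in restriction}
    for key in series_keys:
        # Keep the key only if every restricted dimension has an allowed code.
        if all(key[positions[dim]] in allowed for dim, allowed in restriction.items()):
            for dim in result:
                result[dim].add(key[positions[dim]])
    return result


keys = [("M", "DE", "EUR"), ("A", "DE", "USD"), ("M", "FR", "EUR")]
print(valid_codes(["FREQ", "REF_AREA", "CURRENCY"], keys, {"FREQ": {"M"}}))
```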

Usage examples for the metadata resource.

While there are many examples of how to use the /data resource, I could not find any for /metadata. The SDMX Web Services Guidelines have only information and examples for /data (see section 4.4.3).

Please point me to any resources that explain how to retrieve reference metadata using REST, and especially how to use the ‘key’ parameter as defined by the wadl.

Thanks.

002-Support reference metadata

In order to allow easily retrieving referential metadata attached to any structural artefact or to data, it is proposed to extend the Rest API in the following way (this extension was already implemented by MT):

Prefix any data and structure query with the path parameter 'metadata', e.g.:

protocol://ws-entry-point/codelist/IMF/CL_FREQ - returns codelist Frequency

protocol://ws-entry-point/metadata/codelist/IMF/CL_FREQ - returns any reference metadata for the codelist Frequency (or any codes contained in the Codelist)

protocol://ws-entry-point/data/BIS,EXR,1.0/N.AR. - returns exchange rate dataset

protocol://ws-entry-point/metadata/data/BIS,EXR,1.0/N..AR. - returns the referential metadata for the dataset and for any series or observations contained in the dataset

This new syntax should replace the current more restrictive syntax:
protocol://ws-entry-point/metadata/flowRef/key/providerRef

Add VTL resources

VTL defines a few additional MaintainableArtefacts (e.g. TransformationScheme, RulesetScheme, etc.). They should be added as new resource types to the documentation and the OpenAPI definition.

updatedAfter documentation improvement

The current documentation for the data query parameter updatedAfter could be improved to specify what the response message should look like, ideally with an example.

Some example documentation (if it is correct...):

  • A dataset with action Replace containing all the observations that have been revised since the updatedAfter date.
  • A dataset with action Delete containing all the observations that have been deleted since the updatedAfter date.
  • A dataset with action Append containing all the observations that have been added since the updatedAfter date.

A sample response (provided the above is correct):

Request URL: rest/data/ESTAT,SSTSCONS_PROD_A,1.0/....NS0040../ALL/?detail=full&dimensionAtObservation=TIME_PERIOD&updatedAfter=2015-04-01T14%3A15%3A30

Response:
<?xml version="1.0" encoding="utf-8"?>
<message:StructureSpecificData xmlns:ss="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/structurespecific" xmlns:footer="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer" xmlns:ns1="urn:sdmx:org.sdmx.infomodel.datastructure.DataStructure=ESTAT:STS(2.0):ObsLevelDim:TIME_PERIOD" xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message" xmlns:common="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xml="http://www.w3.org/XML/1998/namespace">
    <message:Header>
        <message:ID>IDTEST</message:ID>
        <message:Test>false</message:Test>
        <message:Prepared>2015-12-17T14:11:48</message:Prepared>
        <message:Sender id="ZZ9" />
        <message:Structure structureID="ESTAT_STS_2_0" namespace="urn:sdmx:org.sdmx.infomodel.datastructure.DataStructure=ESTAT:STS(2.0):ObsLevelDim:TIME_PERIOD" dimensionAtObservation="TIME_PERIOD">
            <common:Structure>
                <Ref agencyID="ESTAT" id="STS" version="2.0" />
            </common:Structure>
        </message:Structure>
    </message:Header>
    <message:DataSet action="Replace" ss:dataScope="DataStructure" xsi:type="ns1:DataSetType" ss:structureRef="ESTAT_STS_2_0">
        <Series FREQ="M" REF_AREA="DE" ADJUSTMENT="N" STS_INDICATOR="PROD" STS_ACTIVITY="NS0040" STS_INSTITUTION="1" STS_BASE_YEAR="2000" TIME_FORMAT="P1M">
            <Obs TIME_PERIOD="2012-02" OBS_VALUE="0" OBS_STATUS="A" />
            <Obs TIME_PERIOD="2012-03" OBS_VALUE="1" OBS_STATUS="A" />
            <Obs TIME_PERIOD="2012-04" OBS_VALUE="2" OBS_STATUS="A" />
            <Obs TIME_PERIOD="2012-05" OBS_VALUE="3" OBS_STATUS="A" />
            <Obs TIME_PERIOD="2012-06" OBS_VALUE="4" OBS_STATUS="A" />
        </Series>
    </message:DataSet>
    <message:DataSet action="Delete" ss:dataScope="DataStructure" xsi:type="ns1:DataSetType" ss:structureRef="ESTAT_STS_2_0">
        <Series FREQ="M" REF_AREA="DE" ADJUSTMENT="N" STS_INDICATOR="PROD" STS_ACTIVITY="NS0040" STS_INSTITUTION="1" STS_BASE_YEAR="2000">
            <Obs TIME_PERIOD="2006-02" OBS_VALUE="NaN" />
            <Obs TIME_PERIOD="2006-03" OBS_VALUE="NaN" />
            <Obs TIME_PERIOD="2006-04" OBS_VALUE="NaN" />
            <Obs TIME_PERIOD="2006-05" OBS_VALUE="NaN" />
            <Obs TIME_PERIOD="2006-06" OBS_VALUE="NaN" />
        </Series>
    </message:DataSet>
</message:StructureSpecificData>
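On the client side, a message like the one above could be applied to a local copy of the data roughly as follows. This Python sketch rests on several assumptions: it treats Replace and Append datasets as upserts, Delete datasets as removals, ignores any other action, and needs the list of dimension IDs (which a real client would take from the DSD) because series elements also carry attributes such as TIME_FORMAT. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

MESSAGE_NS = "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"

# Local store: (series key, time period) -> observation value
Store = Dict[Tuple[Tuple[Optional[str], ...], Optional[str]], Optional[str]]


def apply_update(store: Store, xml_text: str, dimensions: List[str],
                 time_dim: str = "TIME_PERIOD") -> Store:
    """Apply the datasets of a structure-specific data message to 'store'."""
    root = ET.fromstring(xml_text)
    for dataset in root.iter(f"{{{MESSAGE_NS}}}DataSet"):
        action = dataset.get("action")
        # Series and Obs are unqualified in this message, hence no namespace.
        for series in dataset.iter("Series"):
            # Build the series key from the dimensions only, so that
            # series-level attributes (e.g. TIME_FORMAT) do not change it.
            series_key = tuple(series.get(dim) for dim in dimensions)
            for obs in series.iter("Obs"):
                key = (series_key, obs.get(time_dim))
                if action == "Delete":
                    store.pop(key, None)
                elif action in ("Replace", "Append"):
                    store[key] = obs.get("OBS_VALUE")
    return store
```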

Metadata queries: Unclear applicability and meaning of references attribute

There is a documentation issue with the section related to the applicability and meaning of the references attribute (for metadata queries). Some of the artefacts to be returned are missing from the table. For example, in case the matching artefact is a codelist, the table only mentions hierarchical codelists in the column for references, but other types would be valid as well (such as Categorisation, Process, ConceptScheme, DataStructureDefinition, MetadataStructureDefinition and StructureSet).

Furthermore, it would be nice to distinguish between the artefacts referenced by the matching artefact and the artefacts that reference the matching artefact.

Need to improve the documentation of the dimensionAtObservation attribute

The RESTful API lets clients ask the service to package the observations in 3 different ways: as time series, as cross-sections or as a flat list of observations. This is achieved using the dimensionAtObservation parameter. However, the documentation does not specify what the default should be if the client does not use this parameter.

It is proposed to:

  • Remove the footnote attached to the description of the dimensionAtObservation parameter.
  • Amend the documentation of the dimensionAtObservation parameter as follows: "The ID of the dimension to be attached at the observation level. This parameter allows the client to indicate how the data should be packaged by the service. The options are "TIME_PERIOD" (a timeseries view of the data), the ID of any other dimension used in that dataflow (a cross-sectional view of the data) or the keyword "AllDimensions" (a "flat" view of the data where the observations are not grouped, neither in time series, nor in sections). In case this parameter is not set, the service is expected to:
    • Default to TimeDimension, if the data structure definition has one;
    • If not, default to MeasureDimension, if the data structure definition has one;
    • If none of the above is true, default to AllDimensions.
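The proposed default rule is small enough to express directly. In this Python sketch, `DataStructure` is a hypothetical stand-in for a DSD that only records the IDs of its time and measure dimensions, if any; the function names are also illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataStructure:
    """Hypothetical, minimal view of a DSD: only what the rule needs."""
    time_dimension: Optional[str] = None     # e.g. "TIME_PERIOD"
    measure_dimension: Optional[str] = None  # ID of the MeasureDimension, if any


def default_dimension_at_observation(dsd: DataStructure) -> str:
    if dsd.time_dimension is not None:
        return dsd.time_dimension      # time series view
    if dsd.measure_dimension is not None:
        return dsd.measure_dimension   # cross-sectional view by measure
    return "AllDimensions"             # flat view


def resolve_dimension_at_observation(requested: Optional[str], dsd: DataStructure) -> str:
    # An explicit parameter always wins; the defaults apply only when it is absent.
    return requested if requested else default_dimension_at_observation(dsd)
```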

Version the API

Should we propose a way to version the API? It's considered good practice, and it brings additional guarantees to clients.

Given SDMX's adoption of semantic versioning, we could limit the versioning to the MAJOR segment of the version (e.g. v1, v2, v3, etc.), since MINOR and PATCH changes should not have an impact on clients.

Review options for detail parameter

Review the information returned with the detail parameter (full, stubs, ...) for both SOAP and REST, which today differ (full, stubs, completestubs, referencestubs, allstubs). There is a general need to return more information with stubs than currently specified (e.g. isFinal, structureURL, urn, isExternalReference and sometimes also annotations (with CategoryScheme_node_order)). A means to get descriptions, isFinal and annotations in REST without the Items of the ItemScheme is needed.

Add support for partial references

The SDMX RESTful API currently features a detail parameter. When references are requested for a structural metadata query, all available information for these references will be returned, unless the value referencestubs is used.

SDMX item schemes (codelists, concept schemes, category schemes, etc.) feature an isPartial attribute, but this functionality is currently not used in the RESTful API. It would be nice to add a referencepartial value (or something along these lines) to the detail parameter, so that only the items relevant in a certain context get returned.

Let's take for example a query for a DSD and its children (i.e. codelists and concept schemes). When using full (the default), all concepts contained in the concept schemes will be returned, including those not used in the DSD. When using referencestubs, no concept will be returned. If we add referencepartial, we could retrieve the concepts used in the DSD and only those, i.e. only the ones relevant to the context of the query. It would be the same for other item schemes, such as when retrieving the codelists referenced by hierarchical codelists or content constraints.
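A minimal Python sketch of what referencepartial could do for the DSD example: keep only the concepts the DSD actually uses and flag the resulting scheme as partial. Representing the scheme as a dictionary of item IDs to names and passing the concept IDs used by the DSD explicitly are simplifying assumptions; the function name is hypothetical.

```python
from typing import Dict, Iterable


def partial_scheme(scheme_items: Dict[str, str], used_ids: Iterable[str]) -> dict:
    """Keep only the items used in the current context (e.g. by a DSD)."""
    used = set(used_ids)
    kept = {item_id: name for item_id, name in scheme_items.items() if item_id in used}
    # isPartial is true whenever at least one item of the full scheme was left out.
    return {"isPartial": len(kept) < len(scheme_items), "items": kept}


concepts = {"FREQ": "Frequency", "REF_AREA": "Reference area",
            "OBS_VALUE": "Observation value", "DECIMALS": "Decimals"}
print(partial_scheme(concepts, ["FREQ", "REF_AREA", "OBS_VALUE"]))
```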

New format & version parameters (to avoid custom solutions)

A short introduction to the issue: we (SDMX RI) received a request to specify the format/version in the URL, and at the same time we see different solutions/approaches from various organisations, e.g. ISTAT (some kind of wrapper service for StructureSpecificData) and the Global Registry (a version parameter).

Based on the above, I assume there are use cases where setting the HTTP Accept header is not easy or not possible, for example in browsers without a REST plugin, or with software that doesn't support it and cannot easily be modified.

In order to avoid having different custom solutions for specifying format/version without a HTTP Accept header, it might be better to have a common approach, i.e. one or two new parameters for data and structure requests in SDMX REST.
For example:

  • format: whatever [format].[xml/json/whatever] is supported via Accept (without the MIME parts and version), e.g. format=structurespecificdata.xml
  • version: the SDMX version, i.e. whatever [version] is supported via Accept
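As an illustration, a service could map the two proposed parameters onto the media type it would otherwise read from the Accept header. The Python sketch assumes media types of the form application/vnd.sdmx.<format>+<syntax>;version=<version> (as in application/vnd.sdmx.structurespecificdata+xml;version=2.1); the function name is hypothetical.

```python
def accept_header(format_param: str, version_param: str) -> str:
    """Translate the proposed format/version query parameters into an
    SDMX media type equivalent to an Accept header value."""
    name, sep, syntax = format_param.rpartition(".")
    if not sep or not name or not syntax:
        raise ValueError(f"expected <format>.<syntax>, got {format_param!r}")
    return f"application/vnd.sdmx.{name}+{syntax};version={version_param}"


print(accept_header("structurespecificdata.xml", "2.1"))
# application/vnd.sdmx.structurespecificdata+xml;version=2.1
```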

WADL file cannot be imported in testing tools because of the resource_type construct

The SDMX WADL file makes use of the WADL resource_type construct. This is very useful in order to define common behaviour that can then be inherited by various resources. This is used in the SDMX WADL file in order to define the parameters common to all types of metadata resources.

Unfortunately, various testing tools don't support resource_type.

SDMX RESTful API 2.1 WADL does not allow hierarchical agency ids

In the WADL file the type of the agencyID parameter is NCNameIDType. This does not allow dots in the agencyID. However agency ids may optionally contain dots (e.g. SDMX.SECRETARIAT.TWG is a valid agency id).

Change the type of agencyID parameter to NestedNCNameIDType.
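The difference between the two types can be illustrated in Python. The patterns below are approximations written for this example, assuming NCNameIDType is a letter followed by letters, digits, underscores or hyphens, and NestedNCNameIDType is a dot-separated sequence of such IDs; check them against the SDMX common schema before relying on them.

```python
import re

# Approximations of the SDMX schema patterns (anchored via fullmatch).
NCNAME_ID = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")
NESTED_NCNAME_ID = re.compile(r"[A-Za-z][A-Za-z0-9_-]*(\.[A-Za-z][A-Za-z0-9_-]*)*")

agency = "SDMX.SECRETARIAT.TWG"
print(bool(NCNAME_ID.fullmatch(agency)))         # False
print(bool(NESTED_NCNAME_ID.fullmatch(agency)))  # True
```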

Support Registration Query

Currently, the only SDMX artefact that cannot be retrieved via REST is the data registration. Retrieving it would be useful for applications that wish to know where the data is located, enabling them to browse the metadata and then get the related datasets via the Registration message.

A proposed solution is to be able to retrieve registrations by:

  • Agency/Id

  • Dataflow / Provision Agreement

  • Data Provider

For input to this discussion, we have added support into the Fusion Registry, please refer to section 3.6 in the following document

Harmonize referencing of metadata

@egreising proposed to check whether referencing of structural metadata could be harmonized.

At the moment, we have the following:

  • Structural queries: /type/agency/id/version (e.g. /codelist/SDMX/CL_FREQ/1.0)
  • Structure references (e.g. dataflowRef): /data/agency,id,version/key/providerRef (e.g. /data/ECB,EXR,1.0/A.CHF.EUR.SP00.A/ECB)

There are reasons why these are handled differently but it can nevertheless be confusing.

027-Clarity-Status code for SDMX error 150?

In 4_7_errors.md, shouldn't the HTTP status code for the SDMX error 150 "Semantic error" be "422 Unprocessable Entity" (rather than the current "403 Forbidden")?

The SDMX description is:

   Semantic error - 150
   A web service should return this error when a request is 
   syntactically correct but fails a semantic validation or 
   violates agreed business rules.

See https://tools.ietf.org/html/rfc4918#section-11.2 :

   The 422 (Unprocessable Entity) status code means the server
   understands the content type of the request entity (hence a
   415(Unsupported Media Type) status code is inappropriate), and the
   syntax of the request entity is correct (thus a 400 (Bad Request)
   status code is inappropriate) but was unable to process the contained
   instructions.  For example, this error condition may occur if an XML
   request body contains well-formed (i.e., syntactically correct), but
   semantically erroneous, XML instructions."

A sub-question:
How should the server behave if a data query contains both valid and invalid codes in the key parameter? Should the service return the available data for the valid codes only and ignore the others, or refuse to execute the query and return an SDMX 150 error? It would be worth clarifying this in the standard so that client applications can expect consistent behaviour.

Problem with the regular expressions used in the SDMX-ML schemas and SDMXRestTypes.xsd

Many regular expressions in the SDMX-ML schemas use ‘A-z0-9’ as a shortcut for lowercase and uppercase letters, and numbers. However, this is incorrect and will in fact allow invalid characters, such as ‘[’. For example, according to the schemas, ‘A[’ is a valid IDType, although it should not be (the only allowed characters are ‘A-Z, a-z, @, 0-9, _, -, $’). The correct character range is: a-zA-Z0-9.
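The effect of the A-z shortcut is easy to reproduce with Python's re module, since A-z spans every ASCII character from A (65) to z (122), including [, \, ], ^ and the backtick. The sketch compares the two character classes; it only illustrates the range problem and is not the full IDType pattern.

```python
import re

BROKEN = re.compile(r"[A-z0-9@_$-]+")    # A-z covers ASCII 65..122, not just letters
FIXED = re.compile(r"[a-zA-Z0-9@_$-]+")  # only letters, digits, @ _ $ -

for candidate in ["A[", "A^", "AB_1"]:
    print(candidate, bool(BROKEN.fullmatch(candidate)), bool(FIXED.fullmatch(candidate)))
```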

Get data as they were at a certain point in time

The SDMX 2.1 RESTful web services currently allow clients to:

  • Get the data matching a query;
  • Get the data matching a query if it has changed since a certain point in time (using the HTTP If-Modified-Since header);
  • Get what has changed since a certain point in time (using the updatedAfter parameter);
  • Get the full history of how data evolved over time (using the includeHistory parameter).

However, it does not offer a direct way to get the data as they were at a certain point in time (there is an indirect way, using the includeHistory parameter but the work is then left to the client).

A new parameter (e.g. snapshot) could be added, where a timestamp can be supplied and the web service would then return the data matching the query as they were at that point in time (for example, as they were when a report was published).
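For reference, this is roughly the work currently left to the client when using includeHistory: replaying the history up to the requested timestamp. The Python sketch assumes each history entry is a (dissemination time, action, period, value) tuple; that representation, the function name and any values used with it are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# One history entry: (dissemination time, action, period, value).
Event = Tuple[datetime, str, str, Optional[float]]


def snapshot(history: List[Event], as_of: datetime) -> Dict[str, Optional[float]]:
    """Replay the history up to 'as_of' and return the observations
    as they were published at that point in time."""
    state: Dict[str, Optional[float]] = {}
    # sorted() is stable, so entries of one dissemination keep their order.
    for when, action, period, value in sorted(history, key=lambda e: e[0]):
        if when > as_of:
            break
        if action == "Delete":
            state.pop(period, None)
        else:  # Replace (or Append): the latest published value wins
            state[period] = value
    return state
```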

Support Retrieval of Content Constraints by Type

An SDMX Content Constraint can have 2 distinct flavours: actual and allowable. Each flavour can fall into one of two categories (KeySet or CubeRegion).

In a way, a Content Constraint should be at least 2 different maintainable objects, which can be retrieved independently, but if they remain as one object type in the next version of the standard, it would be useful at the API level to be able to distinguish between them.

In the same way that the REST API allows the user to query for organisationscheme (which brings back agency, data provider, and data consumer schemes), it would be nice to add two new resource parameters to the API:

allowableconstraint
actualconstraint

It is interesting to note that actualconstraint may even be somewhat redundant given other requests (e.g. #40). The actual constraint says what data actually exists; this could be a keyset (list of series) or a cube region. However, for the former (keyset) the response is incredibly verbose, and a much less verbose (and more flexible) client-to-server query is one for all the data with detail=seriesKeysOnly. The latter (cube region) may lose importance once bug #40 is resolved. For this reason, the Fusion Registry has already repurposed actualconstraint as a way to create sub-cube (pre-defined query) definitions, a use case we have been asked to support on many occasions. The actual constraint is a useful mechanism for clients to build pre-selections for large dataflows, hence our motivation to distinguish between constraints used to restrict content for submission and those used for dissemination.

027-Clarity-Behaviour in case of mixed semantically valid and invalid requests

How should the SDMX server respond when the request contains both valid and invalid SDMX (identifiable) artefact IDs?

Examples ('A' is a valid ID, 'AA' is not a valid ID):

Should the server ignore the semantically invalid IDs and return only the resources for the valid IDs, or should it refuse to serve the whole request by returning a 422 error?
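The two behaviours can be contrasted in a short Python sketch. `resolve_ids` and `SemanticError` are hypothetical; the strict branch is where a service would return SDMX error 150 with whichever HTTP status code the standard settles on (400, 403 and 422 are all discussed in these issues).

```python
from typing import Dict, Iterable, List


class SemanticError(Exception):
    """Hypothetical exception a service would map to SDMX error 150."""


def resolve_ids(requested: Iterable[str], known: Dict[str, dict], strict: bool) -> List[dict]:
    requested = list(requested)
    unknown = [i for i in requested if i not in known]
    if unknown and strict:
        # Strict policy: refuse the whole request.
        raise SemanticError(f"unknown IDs: {', '.join(unknown)}")
    # Lenient policy: silently drop the unknown IDs.
    return [known[i] for i in requested if i in known]
```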

latest observation...

Hi!
I have written some SDMX visualisation software that puts SDMX data on a map. Instead of using a drop-down menu to select the time, I walk the tree of the dataset and pick out the latest time value. It seems a shame that I have to specify a time period range (for example 2008-2014) and then, for each full key, take only the latest observation for that country, display it on the map... and then discard the rest of the observations...
It's not important that I get the latest observation for each full key; I just thought it would be more efficient to be able to query for the latest observation rather than discarding the previous ones....
this is a minor issue =D
Cheers
JG

Extend the REST API to support query of Categorisation by TargetCategory

Currently, SOAP supports Categorisation queries on the Categorisation TargetCategory via the CategorisationWhere type, but REST does not. This feature would allow a REST API consumer to benefit from the power of Categorisation in SDMX. The CategoryScheme is often used as a base discovery structure, from which navigation branches to a search once a member category has been chosen. In REST this is not possible. This is a major limitation, as it impacts clients wishing to browse data using a category scheme.

A path parameter could be added to the metadata queries of the RESTful API. This parameter would accept the id of the item, in an item scheme, to be returned.

For example, currently, the following syntax can be used to retrieve a category scheme:

https://ws-entry-point/categoryscheme/AGENCY_ID/SCHEME_ID/VERSION

In order to retrieve only a particular category (e.g.: ID = XYZ), the following syntax could be used:

https://ws-entry-point/categoryscheme/AGENCY_ID/SCHEME_ID/VERSION/XYZ

Nested ids should be supported:

https://ws-entry-point/categoryscheme/AGENCY_ID/SCHEME_ID/VERSION/XYZ.A.B

When the references parameter is not used, the returned item scheme should only contain the matching item (including the parents in case a nested id is used). The isPartial attribute should be used and its value should be true.

When the references parameter is used, artefacts using/used by the matching item should be returned.

For example, the following would retrieve, along with the matching categoryscheme/category, all the categorisations categorising the matching category:

https://ws-entry-point/categoryscheme/AGENCY_ID/SCHEME_ID/VERSION/XYZ?references=categorisation

As another example, the following would retrieve all the DSDs referencing the supplied concept:
https://ws-entry-point/conceptscheme/SDMX/CROSS_DOMAIN_CONCEPTS/1.0/FREQ?references=datastructure
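For a nested id, the partial item scheme has to keep the matching category and all of its parents. This Python sketch assumes a hypothetical in-memory model where each category ID maps to a dictionary with an optional children entry; it returns the full nested ids of the items to keep.

```python
from typing import Dict, List


def partial_category_path(scheme: Dict[str, dict], nested_id: str) -> List[str]:
    """Return the nested ids to keep for e.g. 'XYZ.A.B': the matching
    category plus all of its parents, outermost first."""
    kept: List[str] = []
    level = scheme
    for part in nested_id.split("."):
        if part not in level:
            parent = ".".join(kept) or "root"
            raise KeyError(f"no category {part!r} under {parent}")
        kept.append(part)
        level = level[part].get("children", {})
    return [".".join(kept[: i + 1]) for i in range(len(kept))]


scheme = {"XYZ": {"children": {"A": {"children": {"B": {}}}}}}
print(partial_category_path(scheme, "XYZ.A.B"))
```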

Add option to query for number of data points to be delivered

In many use cases, knowing in advance the number of observations to be delivered by a data query can be very useful, as it allows you to warn the user about "no data available" or excessive-volume (e.g. greater than a certain threshold) situations without having to wait for the actual data query to complete.
It is worth noting that in most databases a "count" query is much cheaper than retrieving the data itself.
The proposal is to add a new possible value, "count", for the "detail" parameter in the data query. When "detail=count" is specified, the service returns the number of observations that the query would return.
For the returned message, the simplest way would be to return just the number of observations, regardless of the message format in effect. This has the advantage of being format independent and, as a matter of fact, if somebody is querying for the count, nothing else really matters. Besides, it will be faster, as the writer has no need to format a message. In JSON especially, there is no need to query for the structural metadata to include.
Otherwise, if the message structure is kept for the information returned, the count would require defining new optional elements in the message header of XML and in the meta of JSON formats to hold the count, and some way would need to be agreed for the CSV data message to return the count.
TF3 (Message formats) should advise on these points.

Add support for retrieving hierarchies or mappings

Since version 1.1.0 of the SDMX RESTful API, it is possible to retrieve individual items within item schemes (such as a particular concept within a concept scheme). This becomes particularly powerful when used in combination with the references parameter, for example to see where a particular concept or code is used.

The functionality described above applies only to item schemes, but there are other types of collections in SDMX, such as hierarchical codelists and structure sets. It would be nice to extend the feature to these types of collections as well, so as to be able to retrieve, for example, a particular hierarchy within a hierarchical codelist.

Add support for retrieving history

It can be a requirement for some SDMX web services to return previous versions of the data, as they were disseminated in the past (“history” or “timeline” functionality). Although it is possible to represent the result set in one SDMX-ML data message, the current version of the SDMX RESTful API does not offer a parameter to activate this functionality.

It is proposed to add a query string parameter, named includeHistory, which accepts boolean values and defaults to false.

The returned SDMX-ML data message should contain one or two datasets per data dissemination, depending on whether a dissemination also deleted observations from the data warehouse.

For example, let’s say that, for a particular series, there were, so far, 3 disseminations:

  1. In February, there was the initial dissemination, with 2 periods: 2011-12 and 2012-01.
  2. In March, the decision was taken to delete all observations before 2012 (so, 2011-12). In addition, a new observation was published for 2012-02.
  3. In April, the value for February was revised, and the value for March was published.

In this scenario, it is proposed that the web service returns one data message containing 4 datasets:

  1. The first dataset will contain the data disseminated in February, so 2 observations (2011-12 and 2012-01). The dataset action flag will be Replace.
  2. The second dataset will contain the new data disseminated in March. It will contain one observation (2012-02). The dataset action flag will also be Replace.
  3. The third dataset will contain the deleted data, removed with the March dissemination. It will contain one observation (2011-12). The dataset action flag will be Delete.
  4. The fourth dataset will contain the data disseminated in April. It will contain the revised observation (2012-02) and the new one (2012-03). The dataset action flag will be Replace.
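The dataset sequence in this example can be derived mechanically from a log of disseminations. The sketch below is only an illustration, assuming a hypothetical in-memory record of each dissemination (`Dissemination`, holding the published and deleted periods) and made-up observation values; none of these names come from the SDMX specification. For each dissemination it emits a Replace dataset for new or revised observations and, if anything was removed, a Delete dataset carrying the last disseminated values of the deleted periods.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Dissemination:
    """Hypothetical record of one dissemination of a single series."""
    month: str
    published: Dict[str, float]                     # new or revised observations: period -> value
    deleted: Set[str] = field(default_factory=set)  # periods removed by this dissemination


@dataclass
class DataSet:
    action: str                      # dataset action flag: "Replace" or "Delete"
    observations: Dict[str, float]


def history_datasets(disseminations: List[Dissemination]) -> List[DataSet]:
    """Return one or two datasets per dissemination, oldest first."""
    current: Dict[str, float] = {}   # state of the series after each dissemination
    result: List[DataSet] = []
    for d in disseminations:
        if d.published:
            result.append(DataSet("Replace", dict(d.published)))
            current.update(d.published)
        if d.deleted:
            # Report deleted observations with their last disseminated values.
            removed = {period: current.pop(period) for period in sorted(d.deleted)}
            result.append(DataSet("Delete", removed))
    return result


# The February/March/April scenario above, with made-up values.
log = [
    Dissemination("February", {"2011-12": 1.0, "2012-01": 2.0}),
    Dissemination("March", {"2012-02": 3.0}, deleted={"2011-12"}),
    Dissemination("April", {"2012-02": 3.5, "2012-03": 4.0}),
]
for ds in history_datasets(log):
    print(ds.action, sorted(ds.observations))
# Replace ['2011-12', '2012-01']
# Replace ['2012-02']
# Delete ['2011-12']
# Replace ['2012-02', '2012-03']
```

Emitting the Replace dataset before the Delete dataset for the same dissemination mirrors the order in the proposal (datasets 2 and 3 for March).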

This proposal affects section 07 of the SDMX 2.1 documentation package and the accompanying WADL file.

Document "informational" status codes

Section 4.8 of the SDMX RESTful specification documents the various error codes that implementers can return and maps them to their respective HTTP status codes.

Non-error status codes (such as 200 OK or 304 Not Modified) could also be added, for documentation purposes, since clients are expected to be able to handle these as well.
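As an illustration of what such documentation would cover, the sketch below shows a client reacting to both non-error and error codes. It uses only Python's standard `http.HTTPStatus` enumeration; the `client_action` helper and the wording of each action are hypothetical and not taken from the SDMX specification.

```python
from http import HTTPStatus

# Illustrative only: non-error codes a client should be prepared to handle,
# alongside the error codes already mapped in the specification.
NON_ERROR_ACTIONS = {
    HTTPStatus.OK: "process the SDMX message in the response body",
    HTTPStatus.NOT_MODIFIED: "reuse the locally cached copy; nothing has changed",
}


def client_action(status: int) -> str:
    """Describe how a client might react to an HTTP status code."""
    code = HTTPStatus(status)  # raises ValueError for codes unknown to Python
    if code in NON_ERROR_ACTIONS:
        return f"{code.value} {code.phrase}: {NON_ERROR_ACTIONS[code]}"
    if 400 <= code < 600:
        return f"{code.value} {code.phrase}: handle as an SDMX error"
    return f"{code.value} {code.phrase}: not covered by this sketch"


for status in (200, 304, 404):
    print(client_action(status))
```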

Document useful HTTP headers

The Content-Negotiation section of the documentation covers the HTTP Accept and Accept-Encoding headers. However, other useful HTTP headers (such as If-Modified-Since, Accept-Language and Vary) are not documented. These headers are of course not SDMX-specific, but it could still be useful for implementers if the documentation at least mentioned them.
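A minimal sketch of how a client might combine these headers, assuming a hypothetical helper `build_request_headers` (not part of any SDMX library). If-Modified-Since lets the server answer 304 Not Modified when nothing changed since the client's last fetch, and Accept-Language requests a preferred language. Vary is a response header: a server whose output depends on Accept, Accept-Language or Accept-Encoding lists them there so that caches keep the variants apart.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Optional


def build_request_headers(
    accept: str,
    last_fetched: Optional[datetime] = None,
    languages: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble request headers for an SDMX query (hypothetical helper)."""
    headers = {
        "Accept": accept,           # requested SDMX format
        "Accept-Encoding": "gzip",  # allow a compressed response
    }
    if last_fetched is not None:
        # HTTP dates must be expressed in GMT; a naive datetime is treated as local time.
        headers["If-Modified-Since"] = format_datetime(
            last_fetched.astimezone(timezone.utc), usegmt=True
        )
    if languages:
        headers["Accept-Language"] = languages  # e.g. "fr, en;q=0.8"
    return headers


headers = build_request_headers(
    "application/xml",
    last_fetched=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    languages="fr, en;q=0.8",
)
print(headers["If-Modified-Since"])  # Tue, 02 Jan 2024 03:04:05 GMT
```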

Allow slicing of data messages

Currently, the SDMX RESTful web service standard offers one way to restrict the number of observations: the firstNObservations and lastNObservations URL parameters. However, these apply to each series individually rather than to the whole data message, so they do not help when the client needs to limit the message size or split the request into smaller, more manageable chunks.

The HTTP protocol offers a possible solution, as it includes a way to request only a specific range of data. RFC 7233 states:

The "Range" header field on a GET request modifies the method semantics to request transfer of only one or more subranges of the selected representation data, rather than the entire selected representation data.

It works in the following way:

  1. The server uses the HTTP header field "Accept-Ranges" to indicate that it supports range requests for the target resource, and which range units it accepts, e.g. bytes:
    Accept-Ranges: bytes

  2. The client uses the "Range" header field in the GET request to indicate the requested subrange, e.g. for the first 1000 bytes (with 0-based indexes):
    Range: bytes=0-999
    And for the next 1000 bytes:
    Range: bytes=1000-1999

  3. If the range request is satisfiable, the server returns the 206 (Partial Content) status code, indicating that it is successfully fulfilling the range request by transferring one or more parts of the selected representation that correspond to the satisfiable ranges found in the request's "Range" header field. It also returns a "Content-Range" header field describing which range of the selected representation is enclosed, e.g.
    Content-Range: bytes 0-999/1234
    Note: 1234 is the total number of bytes in the complete resource; or
    Content-Range: bytes 0-999/*
    Note: * indicates that the total number of bytes in the complete resource is unknown to the server; or
    Content-Range: bytes 0-566/567
    Note: e.g. when bytes 0-999 were requested but the complete resource is only 567 bytes long.
    If no requested range can be satisfied, the server instead returns a 416 (Range Not Satisfiable) response and a "Content-Range" header field with the unsatisfied-range value "*", e.g.
    Content-Range: bytes */1234

Also see here and here, for examples of requests for multiple sub-ranges.
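For a single range, the header values in the steps above can be built and parsed with a few lines of standard-library code. The helpers `range_header` and `parse_content_range` below are hypothetical names used for illustration; this sketch does not handle multi-range (multipart/byteranges) responses.

```python
import re
from typing import NamedTuple, Optional


class ContentRange(NamedTuple):
    unit: str                       # range unit, e.g. "bytes"
    first: Optional[int]            # None for an unsatisfied range ("*/length")
    last: Optional[int]
    complete_length: Optional[int]  # None when the server sent "*" (unknown length)


_SATISFIED = re.compile(r"^(\w+) (\d+)-(\d+)/(\d+|\*)$")
_UNSATISFIED = re.compile(r"^(\w+) \*/(\d+)$")


def range_header(first: int, last: int, unit: str = "bytes") -> str:
    """Build a Range header value for the inclusive, 0-based range first..last."""
    if first < 0 or last < first:
        raise ValueError("invalid range")
    return f"{unit}={first}-{last}"


def parse_content_range(value: str) -> ContentRange:
    """Parse the Content-Range header of a 206 or 416 response."""
    value = value.strip()
    m = _SATISFIED.match(value)
    if m:
        unit, first, last, length = m.groups()
        return ContentRange(unit, int(first), int(last),
                            None if length == "*" else int(length))
    m = _UNSATISFIED.match(value)
    if m:
        return ContentRange(m.group(1), None, None, int(m.group(2)))
    raise ValueError(f"malformed Content-Range: {value!r}")


print(range_header(1000, 1999))              # bytes=1000-1999
print(parse_content_range("bytes 0-999/*"))
# ContentRange(unit='bytes', first=0, last=999, complete_length=None)
```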

However

Since the bytes unit is not appropriate for our purposes, in .Stat Suite we have tested the HTTP Range header approach with a different range unit: values. While this works in most environments, some users have reported issues in AWS hosting scenarios. Indeed, some cloud hosting services do not allow range units that have not been registered with IANA.

We therefore decided to use a non-standard (proprietary) X-Range HTTP header instead of Range. This worked in all environments. The specification we have used is as follows:

  1. The server uses the HTTP header field "Accept-Ranges" to indicate that it supports range requests for the target resource, with the range unit values:
    Accept-Ranges: values

  2. The client uses the "X-Range" header field in the GET request to indicate the subrange of the selected representation data, e.g. for the first 1000 SDMX observation values:
    X-Range: values=0-999
    And for the next 1000 values:
    X-Range: values=1000-1999

  3. If the range request is satisfiable, the server returns the 206 (Partial Content) status code, indicating that it is successfully fulfilling the range request by transferring one or more parts of the selected representation that correspond to the satisfiable ranges found in the request's "X-Range" header field. It also returns a "Content-Range" header field describing which range of the selected representation is enclosed, e.g.
    Content-Range: values 0-999/1234
    Note: 1234 is the total number of values in the complete resource; or
    Content-Range: values 0-999/*
    Note: * indicates that the total number of values in the complete resource is unknown to the server; or
    Content-Range: values 0-566/567
    Note: e.g. when values 0-999 were requested but the complete resource contains only 567 values.
    If no requested range can be satisfied, the server instead returns a 416 (Range Not Satisfiable) response and a "Content-Range" header field with the unsatisfied-range value "*", e.g.
    Content-Range: values */1234
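From the client side, these rules amount to a pagination loop. The sketch below assumes a hypothetical `fetch` callable that sends one GET request with the given X-Range value and returns the status code, response headers and decoded observations; it is not the .Stat Suite client. It stops on 416, when the announced total is reached, or on a short page when the total is "*", and accepts a plain 200 response from a server that ignores X-Range. A small in-memory fake server is included so the loop can be exercised without a network.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical fetch signature: X-Range value -> (status, headers, observations).
Fetch = Callable[[str], Tuple[int, Dict[str, str], List[object]]]

_CONTENT_RANGE = re.compile(r"^values (\d+)-(\d+)/(\d+|\*)$")


def fetch_all_values(fetch: Fetch, page_size: int = 1000) -> List[object]:
    """Download a data message page by page using the X-Range header."""
    results: List[object] = []
    start = 0
    while True:
        status, headers, chunk = fetch(f"values={start}-{start + page_size - 1}")
        if status == 416:   # start lies beyond the last value: done
            return results
        if status == 200:   # server ignored X-Range and sent everything
            return chunk
        if status != 206:
            raise RuntimeError(f"unexpected status {status}")
        m = _CONTENT_RANGE.match(headers["Content-Range"])
        if m is None:
            raise ValueError(f"unexpected Content-Range: {headers['Content-Range']!r}")
        first, last, total = int(m.group(1)), int(m.group(2)), m.group(3)
        results.extend(chunk)
        start = last + 1
        if total != "*" and start >= int(total):
            return results  # reached the announced total
        if last - first + 1 < page_size:
            return results  # short page: no more values (total unknown)


def make_fake_server(data: List[object], announce_total: bool = True) -> Fetch:
    """In-memory stand-in for a server implementing the X-Range rules above."""
    def fetch(x_range: str) -> Tuple[int, Dict[str, str], List[object]]:
        a, b = map(int, x_range.split("=")[1].split("-"))
        if a >= len(data):
            return 416, {"Content-Range": f"values */{len(data)}"}, []
        b = min(b, len(data) - 1)
        total = str(len(data)) if announce_total else "*"
        return 206, {"Content-Range": f"values {a}-{b}/{total}"}, data[a:b + 1]
    return fetch


observations = fetch_all_values(make_fake_server(list(range(2500))))
print(len(observations))  # 2500
```

Because each page is requested independently, the pages only fit together if the server returns the observations in a deterministic order.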

Note

This ticket should be addressed together with the ticket "Allow ordering the data query results by dimensions", since slicing/pagination only really makes sense if the order of the observations is deterministic.
