sdmx-json's People

Contributors

airosa, dosse, sosna, xd-deng

sdmx-json's Issues

Is the samples DSD correct?

Hi.

I am new to SDMX JSON and have a question regarding the DSD referenced in the samples. Each sample refers to the following dataflow:

https://sdw-wsrest.ecb.europa.eu/service/dataflow/ECB/EXR/1.0

Which corresponds to the following DSD:

https://sdw-wsrest.ecb.europa.eu/service/datastructure/ECB/ECB_EXR1/1.0

Looking at this data structure, we can see that it has a number of mandatory attributes that are not referenced in the data files, e.g. TIME_FORMAT, COLLECTION, DECIMALS.

Has this DSD been updated since the samples were created, or did the samples ignore these attributes for simplicity? If another DSD was used, can I get a copy of it?

Thanks,
Phil

Attachment of annotations

Issue received on the SDMX users forum:

"In the spec, there is an annotations array, this array can contain multiple annotations,

In sdmx-ml, annotations can be attached to either dataset, series or observation,

In sdmx-json, is the fact that an annotation is attached to an observation lost in the translation from sdmx-ml to sdmx-json? I can't see any way to express in sdmx-json that an annotation is attached to an observation. All the annotations are grouped together in the array, with no field on the observation to denote that a particular annotation is associated with that observation.

Thanks in advance for clearing things up for me.."

Improve documentation for errors

The documentation should clearly state that, although the error types are fixed, the error messages are fully customizable by the service providers.

document identifier for json document

Hi!
Do you think you could include an identifier for the document type and the version of sdmx-json inside the document? This is so that I can take a string and interrogate it to check whether it is an sdmx-json object, without parsing and checking the entire contents. Just something simple to distinguish between data and structure, and (possibly) between sdmx-json versions.
Thanks!
JG

Errors and omissions in the full example with comments

The commented full example from the documentation contains a few simple typos (e.g. observations open with a curly bracket but end with a square bracket) and is therefore not valid JSON, even if the comments are removed. It also lacks any examples of links.

Inconsistency in 'attributeRelationship' definition in structure message

On behalf of Kyle from MT:

Hello there,
I am currently creating JSON readers and writers for the SDMX-JSON standard defined on https://github.com/sdmx-twg/sdmx-json/blob/develop/structure-message/docs/1-sdmx-json-field-guide.md
While reading the schema, I noticed an issue that is preventing me from continuing. It concerns the Attribute type, which is part of the DSD.
The issue is with the 'attributeRelationship' type. According to the schema, everything except 'none' is a URN, so the group is referenced by the URN of the group.

This is the same for the dimensions and the PM.
The official documentation backs this up by saying that these are URNs:
https://github.com/sdmx-twg/sdmx-json/blob/develop/structure-message/docs/1-sdmx-json-field-guide.md#attributerelationship
'One or more URN references to (a) local GroupKey Descriptor(s)'
'One or more URN references to (a) local dimension(s).'
'Urn reference to a primary measure locally' etc...
However, the example given seems to use the IDs of the dimension, group and PM. This makes sense, I guess, as I do not see much point in specifying the full URN for a local object like the dimension or the group (but maybe this is provision for a future development where you can cross-reference dimensions).
Currently, we use IDs like the given example.
Can I clarify which one is right?

Is it the URN of the Group, Dimension, PM, etc... or is it the ID of the objects?

Many thanks,
Kyle

Issue with handling of roles in the SDMX-JSON schema?

Since SDMX 2.1, component roles are handled by linking to concepts defined in the SDMX cross-domain concepts.

This can be seen in the SDMX-JSON sample files, via the role attribute attached to some of the dimensions and attributes. For example, there are 4 roles defined in samples/exr/exr-time-series.json.

However, the SDMX-JSON schema appears to limit the allowed roles to time and measure, which does not seem to be in line with the handling of roles in SDMX 2.1. As a side note, it also means that, currently, validation of the sample files fails.

Allow links also in components

Currently only the three top-level objects (header, structure, dataset) allow links. Links could well be useful also in other objects inside the structure. Particularly components (dimensions and attributes) could benefit from links to code lists etc. Links could also be added to all objects in the message so that everything in the message would be "linkable".

Define Registration Message

To complete support for JSON in SDMX, please consider adding the Data Registration as an SDMX-JSON format.

Please see here for an example of a SDMX Data Registration

With an example here in JSON format

Correct references to ECB web services

The field guide and the sample files sometimes reference the previous version of the ECB web services. This previous version was written before the release of the SDMX 2.1 RESTful API and therefore the links used in the documentation won't work.

datamessage/samples/exr-time-series.json: missing id in values of series level attribute 'TITLE'

It seems that the values of the 'TITLE' attribute lack the IDs from the underlying codelist. See the following snippet taken from the file:

"dataSet": [],
"series": [
  {
    "id": "TITLE",
    "name": "Series title",
    "role": "TITLE",
    "values": [
      {
        "name": "New Zealand dollar (NZD)"
      }, {
        "name": "Russian rouble (RUB)"
      }
    ]
  }
],
"observation": [
  {
    "id": "OBS_STATUS",
    "name": "Observation status",
    "role": "OBS_STATUS",
    "values": [
      {
        "id": "A",
        "name": "Normal value"
      }
    ]
  }
]
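A check like the following could flag such values automatically. This is only a minimal sketch: the wrapping object and the helper name `values_without_id` are assumptions made for illustration, and whether a missing id is actually an error depends on whether the attribute is coded.

```python
import json

# Fragment modelled on the snippet above; the enclosing object is an assumption.
attributes = json.loads("""
{
  "series": [
    {"id": "TITLE", "name": "Series title",
     "values": [{"name": "New Zealand dollar (NZD)"},
                {"name": "Russian rouble (RUB)"}]}
  ],
  "observation": [
    {"id": "OBS_STATUS", "name": "Observation status",
     "values": [{"id": "A", "name": "Normal value"}]}
  ]
}
""")


def values_without_id(attrs):
    """Return (level, component id, value position) for each value lacking an id."""
    missing = []
    for level, components in attrs.items():
        for component in components:
            for pos, value in enumerate(component.get("values", [])):
                if "id" not in value:
                    missing.append((level, component["id"], pos))
    return missing


missing = values_without_id(attributes)
print(missing)
```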

ID missing from schema (contact)

In the JSON schema (https://raw.githubusercontent.com/sdmx-twg/sdmx-json/develop/structure-message/tools/schemas/1.0/sdmx-json-structure-schema.json), the contactType defines properties for things like telephone, x400, etc., but it does not define ID.

Contact is not identifiable but does contain an ID. The mistake might have been to assume that anything with an ID inherits the attribute from the identifiable type, whereas contact does not do this and explicitly defines it on the element.

The samples provided do give an ID.

Is the schema correct and ID has been removed, or is it just a mistake?

SDMX-Json data messages DataStructure/Dataflow reference/URN

Hello,

For SDMX-JSON datasets, would it be possible to have a mandatory property under the structure or dataSet object containing the unique identification of the Dataflow or DSD in some form, e.g. a reference or a URN?

It could be useful for cases where we need to know the DSD and either

  • A link to a dataflow or datastructure might not exist
  • A link to a dataflow or datastructure might not be accessible
  • There might be more than one link to a dataflow and/or datastructure.
  • In case the link is not accessible the URL might not be fully SDMX REST compliant or not easily parsed.
  • For some reason we need to use the DSD/Dataflow from a specific source.

Disclaimer: To avoid any misunderstandings. Opinions expressed here are solely my own and do not express the views or opinions of anyone else.

Only numbers are allowed as observation values

According to the SDMX-JSON schema, only numbers are accepted in the observation array.

This looks odd as, typically, this should be driven by the DSD. For generic schemas (i.e. schemas not derived from a DSD), the convention so far was to use the most generic type, i.e. string. This is for example what is done in the SDMX-ML Generic Data schemas:

<xs:complexContent>
    <xs:restriction base="BaseValueType">
        <xs:attribute name="id" type="common:NCNameIDType" use="optional" fixed="OBS_VALUE">
            <xs:annotation>
                <xs:documentation>The id attribute contains a fixed reference to the primary measure component of the data structure definition.</xs:documentation>
            </xs:annotation>
        </xs:attribute>
        <xs:attribute name="value" type="xs:string" use="required"/>                                             
    </xs:restriction>
</xs:complexContent>

Is the intention to limit SDMX-JSON to dataflows where OBS_VALUE must be a number or is this a bug in the spec?

Non-coded attribute values in SDMX-JSON data message

The specification needs to be clarified or enhanced regarding how non-coded attribute values are to be returned in the SDMX-JSON data message.
Possibilities:

  1. using the id property of the component value --> issue: it is confusing to use id for something that is not coded
  2. using the name/name(s) properties of the component value --> issue: non-coded attribute values are non-localised and it is confusing to receive them in localised properties

Note: The SDMX-ML message uses value for non-coded attribute values in data messages.

Null values for attributes not allowed by SDMX-JSON schema

null is not allowed by the SDMX-JSON schema, neither for the array of attribute values nor the observations.

This violates the field guide, which indicates that null should be supported, as in the example below, copied from the guide:

{
	"action": "Information",
	"reportingBegin": "2012-05-04",
	"reportingEnd": "2012-06-01",
	"validFrom": "2012-01-01T10:00:00Z",
	"validTo": "2013-01-01T10:00:00Z",
	"publicationYear": "2005",
	"publicationPeriod": "2005-Q1",
	"links": [
		# links array #
	],
	"annotations": [ 3, 42 ],
	"attributes": [ 0, null, 0 ],
	"series": {
		# series object #
	}
}

As can be seen there, the attributes property allows null to be used: "attributes": [ 0, null, 0 ]

I set the severity to major, as it's affecting the normative part of the spec (i.e. the schema).

Regular expressions for fields

Hi!
Do you think you could include the regular expressions that field types should adhere to, and which fields each regular expression applies to? The main ones are:

Type                  Regex                                  Used in
IDType                [A-z0-9_@$-]+                          data elements, IDs
NestedNCNameIDType    [A-z][A-z0-9_-](.[A-z][A-z0-9-])*      AgencyID
NestedIDType          [A-z][A-z0-9-](.[A-z][A-z0-9-]_)*      containerId
VersionType           [0-9]+(.[0-9]+)*                       version

IDs can also be "all", and version can be "*" or "latest".
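To illustrate how a client might apply such patterns, here is a rough sketch covering only IDType and VersionType, since the two nested patterns above appear to have lost characters. The helper names are hypothetical. Two deliberate deviations are assumptions on my part: `[A-Za-z]` is used instead of `[A-z]`, because the range `A-z` also matches the characters between `Z` and `a` (such as `[` and `^`), and the dot in the version pattern is escaped so that it only matches a literal dot.

```python
import re

# Assumed patterns, adapted from the list above (see the caveats in the text).
ID_TYPE = re.compile(r"[A-Za-z0-9_@$-]+")
VERSION_TYPE = re.compile(r"[0-9]+(\.[0-9]+)*")


def is_valid_id(text):
    """True for the keyword 'all' or anything matching the assumed IDType."""
    return text == "all" or ID_TYPE.fullmatch(text) is not None


def is_valid_version(text):
    """True for '*', 'latest', or a dotted numeric version such as 1.0."""
    return text in ("*", "latest") or VERSION_TYPE.fullmatch(text) is not None


print(is_valid_id("ECB_EXR1"), is_valid_version("1.0"), is_valid_version("1."))
```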

Also return original AttributeRelationship (attachment level) in data message

In order to allow displaying an intelligent data table with the related attributes at the right position (table header, rows, columns or cells) it would be necessary to know the original attachment level of attributes, because they are so far replicated at series or observation level.
Today an additional request for the structure is necessary.

The proposal is to add a "relationship" property to the "component" object (for attributes only) in which the relationship is specified: None, (which) Dimensions + (which) AttachmentGroups?, Group, PrimaryMeasure (=all dimensions).

JSON structure: array VS generic members

Hello -
when attempting to parse the returned data format (members of "dataSets", see exr-time-series.json), I noted that neither series nor observations are arrays. Is there a rational explanation for this design choice?

I.e. instead of

{ "dataSets": [
        {
            "action": "Information",
            "series": {
                "0": {
                    "attributes": [0],
                    "observations": {
                        "0": [1.5931, 0],
                        "1": [1.5925, 0]
                    }
                },
                "1": {
                    "attributes": [1],
                    "observations": {
                        "0": [40.3426, 0],
                        "1": [40.3000, 0]
                    }
                }
            }
        }
    ]
}

the format could try to avoid using index numbers as keys

{"dataSets": [
    {
        "action": "Information",
        "series": [
            {
                "id": "0",
                "attributes": [0],
                "observations": [
                    {
                        "id": 0,
                        "values": [1.5931, 0]
                    },
                    {
                        "id": 1,
                        "values": [1.5925, 0]
                    }
                ]
            },
            {
                "id": "1",
                "attributes": [1],
                "observations": [
                    {
                        "id": 0,
                        "values": [40.3426, 0]
                    },
                    {
                        "id": 1,
                        "values": [40.3000, 0]
                    }
                ]
            }]
    }]
}

As a consequence, this would allow generating a class pattern to match the structure of the dataSets Array, e.g.

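// Field names must match the JSON keys of the proposed format (note "values").
case class Obs(id: Int, values: Seq[Double])
case class Series(id: String,
                  attributes: Seq[Int],
                  observations: List[Obs])
case class Dataset(action: String, series: List[Series])
case class DatasetParam(dataSets: List[Dataset])

Using circe, the returned JSON object could be parsed using a derived decoder:

import io.circe.Decoder
import io.circe.generic.auto._ // supplies decoders for the nested case classes
import io.circe.generic.semiauto.deriveDecoder

object DatasetParam {
  implicit val decodeDatasetParam: Decoder[DatasetParam] =
    deriveDecoder[DatasetParam]
}
// jsondoc is a String holding a document in the array-based format above.
io.circe.parser.decode[DatasetParam](jsondoc).foreach(println)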
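Until the format changes, a client can normalise the current index-keyed objects into the array form on its own side. The following is a minimal sketch of that conversion; the function name `to_array_form` is hypothetical, and it assumes observation keys are plain integers as in the example above, which may not hold for every message layout.

```python
def to_array_form(data_set):
    """Convert index-keyed series/observations objects into lists of objects."""
    series_list = []
    for series_key, series in data_set["series"].items():
        observations = [
            {"id": int(obs_key), "values": values}
            for obs_key, values in series["observations"].items()
        ]
        series_list.append({
            "id": series_key,
            "attributes": series["attributes"],
            "observations": observations,
        })
    return {"action": data_set["action"], "series": series_list}


# Data taken from the first example in this issue.
current = {
    "action": "Information",
    "series": {
        "0": {"attributes": [0],
              "observations": {"0": [1.5931, 0], "1": [1.5925, 0]}},
        "1": {"attributes": [1],
              "observations": {"0": [40.3426, 0], "1": [40.3000, 0]}},
    },
}
converted = to_array_form(current)
print(converted["series"][1]["observations"][0])
```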

Browse by Topic promotes out of date workflow

browse-by-topic.md Query 2: Find data in the selected dataflow, using concept filters

This step promotes using the structure query for dataflow and referencing constraints to build up a picture of what data is available. This mechanism is out of date and should not be the promoted mechanism for this use case. The mechanism that supports this use case is the Data Availability API, which is part of the SDMX REST query specification and is documented in the following places:

https://github.com/sdmx-twg/sdmx-rest/blob/master/v2_1/ws/rest/docs/4_6_1_other_queries.md
https://github.com/sdmx-twg/sdmx-rest/wiki

The data availability mechanism was put in place to solve a number of issues with the currently documented process. The main one is that the REST API for structures does not distinguish between a constraint used to restrict the data that may be reported and a constraint describing the data that exists. The response to the currently documented process may therefore contain much more information than the client requires, leaving the client to pick through the response and interpret it. In addition, the structure API tends to return fixed structures that are maintained by a user, so it was deemed better to have a separate API that returns dynamically created structural metadata based on the data that exists. In other words, the constraint does not really exist and is not physically maintained; it is created from the data that exists. This allows for dynamic behaviour, such as asking ‘Which countries have data for this dataflow?’ or ‘If I select this frequency, which indicators are valid?’.

Therefore, this document should be modified to promote the API which was created specifically to solve this use case.

Document usage of relative URIs in the response messages

Documentation should clearly state that relative URIs are allowed and explain the impact of providing relative URIs in the response. For example SDMX-JSON files cannot be used for archiving purposes anymore (standalone docs).

Example of a relative URI:

"links": [
  {
    "href": "/ws/data/ECB_ICP1/M.PT.N.071100.4.INX",
    "rel": "request"
  }
]
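A client that wants to archive a message as a standalone document could resolve relative links before storing it. The sketch below uses `urllib.parse.urljoin` from the Python standard library; the base URL is an assumption for illustration (in practice it would be the URL the message was retrieved from), and `absolutize_links` is a hypothetical helper name.

```python
from urllib.parse import urljoin


def absolutize_links(links, base_url):
    """Return copies of the link objects with href resolved against base_url."""
    return [{**link, "href": urljoin(base_url, link["href"])} for link in links]


links = [{"href": "/ws/data/ECB_ICP1/M.PT.N.071100.4.INX", "rel": "request"}]
# Assumed request URL; not taken from the specification.
base_url = "https://www.myorg.org/ws/data/ECB_ICP1"
absolute = absolutize_links(links, base_url)
print(absolute[0]["href"])
```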

Improve documentation for concept roles (geography)

The documentation about the concept roles could be improved, and the emphasis on the geographical dimension could be stronger (a very popular use case). The current documentation is the following:

String nullable. Defines the component role(s), if any. Roles are represented by the id of a concept defined as SDMX cross-domain concept. Several of the concepts defined as SDMX cross-domain concepts are useful for data visualisation, such as for example, the series title, the unit of measure, the number of decimals to be displayed, the reference area (e.g. when using maps), the period of time to which the measured observation refers, etc. It is recommended to identify any component that can be useful for data visualisation purposes by using the appropriate SDMX cross-domain concept as role.

Here is a direct link to the latest cross-domain concepts from the Global Registry:

https://registry.sdmx.org/ws/rest/conceptscheme/SDMX/CROSS_DOMAIN_CONCEPTS/LATEST/?detail=full&references=none&version=2.1

annotations, how to attach to dataset, series, observation

Hi!
I have never actually seen an sdmx-ml file with annotations on the dataset, series or observation, even though they all extend from AnnotableType. I have seen lots of KeyFamilies and DataStructureTypes with annotations on them. I was wondering what the method is for determining which object the annotation is attached to (dataset, series or observation, and possibly not keyfamily)?
Thanks

Improve documentation on extending objects with custom fields

The JSON objects in the specification are extendable (similar to the XML formats, where other namespaces are allowed). This is implied in the JSON Schema but not mentioned in the specification. In practice, this means that the response may contain custom fields that are not part of the specification.

For example the following error object with a custom field would be valid:

"errors": [
  {
    "code": 150,
    "message": "Invalid number of dimensions in the key parameter",
    "wsCustomErrorCode": 39272
  }
]
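For client implementers, the practical consequence is to read the fields they know and tolerate the rest. A minimal sketch, assuming that `code` and `message` are the only fields the client cares about (the set `KNOWN_ERROR_FIELDS` and the helper `split_error` are hypothetical names):

```python
# Fields this hypothetical client understands; anything else is treated as custom.
KNOWN_ERROR_FIELDS = {"code", "message"}


def split_error(error):
    """Separate the fields the client understands from provider extensions."""
    known = {k: v for k, v in error.items() if k in KNOWN_ERROR_FIELDS}
    custom = {k: v for k, v in error.items() if k not in KNOWN_ERROR_FIELDS}
    return known, custom


error = {
    "code": 150,
    "message": "Invalid number of dimensions in the key parameter",
    "wsCustomErrorCode": 39272,
}
known, custom = split_error(error)
print(known["code"], custom)
```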

Simplify DSD Dimensions

After agreeing on keeping one way of defining Dimensions within a DSD (see here) we need to specify the Dimension definition in JSON.

In addition, it has to be decided how to manage the dataset formats that will be available. Considering the deprecation of the TimeDimension, it seems that the time-series-specific dataset formats are no longer relevant. We need to assess whether this affects the JSON dataset.

Bugs in the id of 2 schema files

Json format proposal for CategoryScheme and Categories

In this issue I would like to suggest a JSON format for CategoryScheme and Categories.

Like AgencyScheme or Categorisation, CategoryScheme and Categories will be added to the "references" object.

Unlike the XML format, where the hierarchy of Categories is represented by nesting, I suggest a flat representation with a 'parent' link to establish the relationship between categories.

Here is an example of the parent relationship inside a Category:

"urn:sdmx:org.sdmx.infomodel.categoryscheme.Category=ESTAT:ESTAT_DATAFLOWS_SCHEME(1.2).14.14200": {
  "id": "14200",
  "name": "Normalisation of information systems",
  "urn": "urn:sdmx:org.sdmx.infomodel.categoryscheme.Category=ESTAT:ESTAT_DATAFLOWS_SCHEME(1.2).14.14200",
  "parent": {
    "urn": "urn:sdmx:org.sdmx.infomodel.categoryscheme.Category=ESTAT:ESTAT_DATAFLOWS_SCHEME(1.2).14"
  }
}
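With the flat representation, a client that needs the hierarchy can rebuild it from the 'parent' links. The following is a rough sketch under the proposed format; the parent category entry and its name are made up for illustration, and `children_by_parent` is a hypothetical helper.

```python
from collections import defaultdict

SCHEME = ("urn:sdmx:org.sdmx.infomodel.categoryscheme.Category="
          "ESTAT:ESTAT_DATAFLOWS_SCHEME(1.2)")

categories = {
    # Hypothetical root entry, added only so the example has a parent to point to.
    SCHEME + ".14": {"id": "14", "name": "Hypothetical parent category"},
    SCHEME + ".14.14200": {
        "id": "14200",
        "name": "Normalisation of information systems",
        "parent": {"urn": SCHEME + ".14"},
    },
}


def children_by_parent(flat):
    """Group category URNs by parent URN; root categories are grouped under None."""
    tree = defaultdict(list)
    for urn, category in flat.items():
        parent = category.get("parent")
        tree[parent["urn"] if parent else None].append(urn)
    return dict(tree)


children = children_by_parent(categories)
print(children[SCHEME + ".14"])
```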

Full example:
From the source file: CATEGORY_SCHEME_ESTAT_DATAFLOWS_SCHEME_annotations.xml.txt
We will get: CATEGORY_SCHEME_ESTAT_DATAFLOWS_SCHEME_annotations.xml-output.json.txt

Migrate the documentation to readthedocs.org

Decision has been taken to migrate the documentation to readthedocs.org.

We should take this opportunity to consolidate and improve the current documentation.
This repository will detail the JSON format part of the full documentation, which is glued together in sdmx-im.

Annotations as object

Feedback from our web developers:
"Would it be possible to change the format of the « annotations » in structure messages to make it an object? This is especially for performance reasons, since with an object you can access directly the needed content, while with an array you need to first run through it until you find the target."
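As a client-side workaround with the current array format, the array can be indexed once after parsing so that later lookups are direct. A minimal sketch, assuming each annotation of interest carries an `id` (annotations without one are skipped); the sample annotations and the helper `index_annotations` are hypothetical.

```python
def index_annotations(annotations):
    """Build an id -> annotation lookup in a single pass over the array."""
    return {a["id"]: a for a in annotations if "id" in a}


# Hypothetical annotations, for illustration only.
annotations = [
    {"id": "NOTE_1", "title": "Methodology", "text": "Hypothetical text"},
    {"title": "Annotation without an id"},
    {"id": "NOTE_2", "title": "Source", "text": "Another hypothetical text"},
]
by_id = index_annotations(annotations)
print(by_id["NOTE_2"]["title"])
```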

How to determine the dimension order when keyPosition is missing for a particular dimension?

Hi,

the field guide says about the keyPosition field:

"Number nullable. Indicates the position of the dimension in the key, starting at 0. This field should not be supplied for attributes and it may also be
omitted for dimensions. This field could be used to build the "key" parameter string (i.e. D.USD.EUR.SP00.A) for data queries, whenever the order of the
dimensions cannot easily be derived from the structural metadata information available in the data message."

This definition suffers from obvious weaknesses. It grants data providers way too much room for discretion to the extent that keyPosition may be NULL or missing for any dimension. There is no clear-cut rule when it is NULL or missing. And why should it be NULL or missing in the first place? Minimizing file size cannot seriously be invoked as an argument.

What’s the point in establishing all these cases and distinctions for client implementors to handle when the DSD necessarily specifies the position of every dimension? And what does “easily derived” mean?

Aside from key validation as mentioned in the above excerpt, data consumers should benefit from client software that preserves dimension order when exporting data to analytic tools such as R, Pandas or Excel. This use case should be mentioned in the field guide as well.

To give an example: the test data file exr-cross-section.json omits the keyPosition for the TIME_PERIOD dimension. How can client implementors correctly determine the position of TIME_PERIOD, which is needed to properly export a dataset comprising dimensions at dataset, series and/or observation level, or to validate a key based on a structure-only DataMessage? Well, there could be some heuristics such as the exclusion principle. But what if more than one dimension has no keyPosition? Is the “easy derivation” criterion not fulfilled in this case, as derivation would be somewhat difficult?
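To make the heuristic concrete, here is a sketch of what a client has to do today: place the dimensions that carry a keyPosition, and give a single unplaced dimension the only free slot. The function name `ordered_dimension_ids` is hypothetical, and the positions in the sample data are assumptions, not values copied from exr-cross-section.json.

```python
def ordered_dimension_ids(dimensions):
    """Return dimension ids in key order.

    Dimensions carrying a keyPosition are placed directly. If exactly one
    dimension lacks a position, it takes the only free slot (the exclusion
    principle). Anything else is reported as ambiguous.
    """
    placed = {}
    unplaced = []
    for dim in dimensions:
        pos = dim.get("keyPosition")
        if pos is None:
            unplaced.append(dim["id"])
        else:
            placed[pos] = dim["id"]
    free = sorted(set(range(len(dimensions))) - set(placed))
    if len(unplaced) > 1 or len(free) != len(unplaced):
        raise ValueError("dimension order cannot be derived from keyPosition")
    if unplaced:
        placed[free[0]] = unplaced[0]
    return [placed[i] for i in range(len(dimensions))]


# Assumed positions for illustration; TIME_PERIOD has no keyPosition.
dimensions = [
    {"id": "FREQ", "keyPosition": 0},
    {"id": "CURRENCY", "keyPosition": 1},
    {"id": "TIME_PERIOD"},
    {"id": "CURRENCY_DENOM", "keyPosition": 2},
    {"id": "EXR_TYPE", "keyPosition": 3},
    {"id": "EXR_SUFFIX", "keyPosition": 4},
]
ordered = ordered_dimension_ids(dimensions)
print(ordered)
```

As soon as two dimensions lack a keyPosition, the function can only raise, which is exactly the ambiguity described above.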

All this is unclear and subject to speculation and uncertainty. Having worked on a variety of standards, I doubt that things should be this way.

As the author of pandaSDMX, an SDMX client for Python, I struggle with all this ambiguity and humbly suggest rethinking it.

I am happy to prepare a PR for this once there is sufficient support for making keyPosition mandatory and non-nullable for every dimension, and for forbidding it for attributes. Even the latter point is not clearly stated ("should").

Leo

Are the observations in the cross-section JSON example correct?

In the cross-section JSON example:

https://github.com/sdmx-twg/sdmx-json/blob/master/data-message/samples/exr/exr-cross-section.json

The attributes at the observation level are presented in the order: TITLE and OBS_STATUS (lines 102-123). So TITLE is index 0 and OBS_STATUS is index 1. Title has 2 values (NZD and RUB) whereas OBS_STATUS only has 1 (A).

The observations (lines 134-135) show that the 2nd attribute is incrementing rather than the first. Is this a mistake, or have I made a mistake in my understanding of the observation section?
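For reference, this is how I would expect an observation array to be decoded under my reading: the first element is the value and each following element is an index into the values of the observation-level attribute declared at the same position. The observation arrays below are hypothetical (they are not copied from the sample file), the value names come from the time-series sample, and `decode_observation` is an illustrative helper, so treat this as a sketch of the interpretation in question rather than of the specification.

```python
def decode_observation(observation, attributes):
    """Split an observation array into its value and resolved attribute names."""
    value, *indices = observation
    decoded = {"value": value}
    for attribute, index in zip(attributes, indices):
        decoded[attribute["id"]] = (
            None if index is None else attribute["values"][index]["name"]
        )
    return decoded


# Observation-level attributes in the declared order: TITLE first, then OBS_STATUS.
attributes = [
    {"id": "TITLE", "values": [{"name": "New Zealand dollar (NZD)"},
                               {"name": "Russian rouble (RUB)"}]},
    {"id": "OBS_STATUS", "values": [{"id": "A", "name": "Normal value"}]},
]

# Hypothetical observation arrays: [value, TITLE index, OBS_STATUS index].
first = decode_observation([1.5931, 0, 0], attributes)
second = decode_observation([40.3426, 1, 0], attributes)
print(first["TITLE"], "|", second["TITLE"])
```

Under this reading, an observation whose second attribute index is 1 would point past the single OBS_STATUS value.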

properties for language used in structure and possibly data SDMX JSON messages

In this ticket it is proposed to add new optional properties for indicating the default language of the message and any alternative language used for specific text.

SDMX-JSON messages contain only one name/description per nameable artefact and annotation, but they do not indicate the following:

  1. The default language used for all text in the document. This could be used in cases where Content-Language information is not available or contains multiple languages. This property could, for example, be added under header, e.g.
    "header" : {..., "defaultLanguage" : "fr", ... }
  2. The alternative language used when a text was not available in the language from (1). Another (optional) property could be added to nameable objects (DSD, Concept, Code) and/or annotations. This property could indicate either that no text exists because it was not available, or which alternative language was used. E.g.

"name" : "Frequency", "alternativeLanguage": { "name" : "en" }
or
"name" : "FREQ Stub", "alternativeLanguage": { "missingText" : "name" }

To avoid repeating such a property in all codes of a possibly big codelist, the property could also be set once:

"name" : "Frequency", "alternativeLanguage": { "name" : "en", "child" : "de" }, "values": [ { "id": "M", "name": "monatlich" } ]

Restrict type in links to media types

It could be useful to restrict type in links to media types. This is effectively the same as in RFC 5988:

The "type" parameter, when present, is a hint indicating what the
media type of the result of dereferencing the link should be. Note
that this is only a hint; for example, it does not override the
Content-Type header of a HTTP response obtained by actually following
the link. There MUST NOT be more than one type parameter in a link-
value.

For example, the following would be valid:

"links": [
  {
    "href": "http://www.myorg.org/ws/data/ECB_ICP1/M.PT.N.071100.4.INX",
    "rel": "request",
    "type": "application/json"
  }
]

The following would be invalid:

"links": [
  {
    "href": "http://www.myorg.org/ws/data/ECB_ICP1/M.PT.N.071100.4.INX",
    "rel": "request",
    "type": "some random text"
  }
]
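A validator could approximate this restriction with a simple pattern check. The sketch below only tests for the `type/subtype` shape with a simplified token character set of my own choosing; it does not handle media type parameters (such as `; charset=utf-8`) and does not check that the type is registered. `has_valid_type` is a hypothetical helper name.

```python
import re

# Simplified token pattern (an assumption, looser than the formal grammar).
TOKEN = r"[A-Za-z0-9!#$&^_.+-]+"
MEDIA_TYPE = re.compile(rf"{TOKEN}/{TOKEN}")


def has_valid_type(link):
    """True if the link has no type, or its type looks like type/subtype."""
    link_type = link.get("type")
    return link_type is None or MEDIA_TYPE.fullmatch(link_type) is not None


valid_link = {
    "href": "http://www.myorg.org/ws/data/ECB_ICP1/M.PT.N.071100.4.INX",
    "rel": "request",
    "type": "application/json",
}
invalid_link = {**valid_link, "type": "some random text"}
print(has_valid_type(valid_link), has_valid_type(invalid_link))
```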

Add support for links and HATEOAS

It has been proposed to add support for links with predefined semantics in SDMX-JSON.

The core idea is that a service offers links with predefined semantics and the clients are then free to resolve the links if they understand the semantics.

In SDMX-JSON, it was proposed to add a link object with href and rel as mandatory attributes, title and type as optional attributes.
