WIS2 Topic Hierarchy
- View drafts: https://wmo-im.github.io/wis2-topic-hierarchy
Home Page: https://wmo-im.github.io/wis2-topic-hierarchy
License: Apache License 2.0
The GET request used to check the existence of a registry/code fails with the following error:
HTTP Status 415 – Unsupported Media Type
Type: Status Report
Message: Unsupported Media Type
Description: The origin server is refusing to service the request because the payload is in a format not supported by this method on the target resource.
As discussed, I think that maintenance (GitHub), publication (WMO Codes Registry) and validation (Broker) of the subcategories shall apply to core data only. Subcategories associated with recommended data are not in scope.
If we are in agreement, then we should clearly document this.
However, there are use cases where a particular domain might want to define subcategories for certain types of recommended data, e.g. NWP highly-recommended, and we should consider options for supporting the normalization of these codes in a feasible manner.
For testing purposes it would be valuable to have a mock country and centre-id.
Initial proposal of:
Having the version at the top of the tree is a problem: any change anywhere in the topic hierarchy that requires a version increment will force all topic publishers to update their configuration.
See below an example where it is decided to rename "Ocean" to "Marine" in one of the sub-domains. In that case the version would be incremented from a to b, and all message publishers would be required to update their destination topic configuration at the same time, because the old topic paths would no longer be valid.
This means that a centre publishing only messages in the hydrology domain would potentially have to update its topic paths because the Ocean subtree has been modified. Many message consumers could also be impacted.
Here is an example with the original tree:
Here would be the modified tree with Marine:
Because the version is incremented and sits at the top of the tree, all publishers will have to update their configuration in order to publish messages on the correct new topics (their topic paths have changed).
This is an unnecessary big-bang approach, when only the publishers (and consumers) impacted by the change should have to modify their systems.
My proposal would be to remove the version number from the topic hierarchy. Changes and version numbers should be managed outside of the tree implementation and announced/communicated on the main WIS2 web presence (Global Catalogue, the web site around the Global Broker if there is one). A parallel phase would follow, during which both the old and new trees are supported by the Broker for a given period; the dead branch would then be removed at the end of the parallel message distribution.
Cases to manage are:
Below is an example where core would be removed from the tree and all its sub-topics reattached to data.
Here is the original tree with the new structure (sub-topics attached to data) maintained in parallel.
Then the core topic is deleted at the end of the parallel period:
In that case, all the publishers not impacted by the tree branch change can continue publishing without changing their configuration.
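The parallel phase described above could be sketched as a small topic-rewriting step. This is only an illustration under stated assumptions: the helper names `reattach_core_subtopics` and `parallel_topics`, the sample topic string, and the idea that messages are republished on both topics during the parallel period are hypothetical and not taken from any WIS2 specification.

```python
def reattach_core_subtopics(topic: str) -> str:
    """Drop a 'core' level that directly follows 'data' (hypothetical restructuring)."""
    levels = topic.split("/")
    kept = []
    for i, level in enumerate(levels):
        if level == "core" and i > 0 and levels[i - 1] == "data":
            continue  # sub-topics of 'core' are reattached to 'data'
        kept.append(level)
    return "/".join(kept)


def parallel_topics(topic: str, parallel_phase_active: bool) -> list[str]:
    """Return the topics a message would be distributed on.

    During the parallel phase both the old and the new branch are served;
    afterwards only the new branch remains.
    """
    new_topic = reattach_core_subtopics(topic)
    if parallel_phase_active and new_topic != topic:
        return [topic, new_topic]
    return [new_topic]


# Illustrative topic only (no version level, as proposed above).
old = "origin/wis2/ca-eccc-msc/data/core/weather/surface-based-observations/synop"
print(parallel_topics(old, parallel_phase_active=True))
print(parallel_topics(old, parallel_phase_active=False))
```

A publisher whose topics do not contain the changed branch gets the same topic back in both phases, which is the point of the proposal: only the affected publishers and consumers need to act.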
Did I forget something? Thoughts?
@tomkralidis @jsieland @amilan17 @josusky @antje-s @solson-nws ....
In general, it seems to me that we are going to have a lot of surprises with the topic hierarchy once it is implemented, and we should not yet make it part of any "standards" or official WMO paper, from which it would be very difficult to change afterwards.
This should be done once the Pilot or first implementation phase has been completed.
Through the development of sub-discipline categories many questions are arising and I think that the domain specific communities would benefit from more guidance on TH rules for the Level 9+ values.
https://github.com/wmo-im/wis2-topic-hierarchy/wiki/Change-Management.
https://github.com/wmo-im/wis2-topic-hierarchy/wiki/Business-rules
Other Related issues:
wmo-im/wis2-guide#38
#21
As discussed at TT-WISMD 2024-04-22, create a crosswalk from AHL to WTH topic/subtopic. The output should be a configuration (CSV, etc.) that will be added to the WTH publication workflow and release artifacts.
@efucile / @antje-s to provide an update / first pass for TT-WISMD 2024-06-03.
cc @wmo-im/tt-wismd
As discussed at TT-WISMD 2023-09 face-to-face, as well as W2AT 2023-09-18
WTH levels 4/5 describe country and centre-id as follows:
- country: Lower case representation of ISO3166 3-letter code. Includes extensions for partner organizations
- centre-id: Acronym as specified by member and endorsed by the PR of the country and by WMO
Remove country and define the centre-id on reverse hostname notation (starting with TLD) into a single compound level. In other words, the citation authority based on the Internet domain name of the issuing centre.
W2AT 2023-11-14:
In the context of monitoring and metrics, it was decided that reports/metrics/alerts would occur in a WIS2 system message bus.
As a result, it was decided to remove report from the notification-type level of WTH.
already in use on WIS2:
origin/a/wis2/hk-hko-swic/data/core/weather/advisories-warnings
Noting that it will remain flexible to change the topic hierarchy during the WIS2 pilot phase, TT-NWPMD meeting (2023.06.13) agreed to submit the current draft of topic hierarchy to TT-WISMD for integration into the main WIS2 topic hierarchy.
The topic hierarchy of NWP is here https://github.com/wmo-im/tt-nwpmd/tree/main/
I hope that I have misunderstood something and that this can be resolved easily by updating my understanding of the WIS architecture, but I have a couple of points to raise on the topic hierarchy.
I have been looking at the WIS2 topic hierarchy structure, which is meant to help users find datasets and filter the data topics by subject. Thinking about how it could be implemented, it looks to me that its complexity will be a very large barrier to entry, or could lead users to ignore it completely.
Another point is that the topic hierarchy could lead to a very complex implementation for the main broker, which would have to reflect the entire hierarchy; in addition, maintaining good performance could be extremely challenging.
Below are the points that I have been trying to develop:
Large discovery/domain information in the topic hierarchy will be counter-productive in helping users understand what data is available and how to find relevant data
A quick calculation, taking the first 8 levels and assuming around 195 countries and 20 centres per country on average (which is probably below the real number):
I end up with 2x1x1x195x20x4x2x8 = 499,200 branches for the first 8 levels and, for the total tree with 3 further levels of 5 sub-disciplines each, 2x1x1x195x20x4x2x8x5x5x5 = 62,400,000 topics. The assumptions may be too generous, but reducing the problem by a factor of 100 would lead to the same conclusion.
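The arithmetic above can be reproduced in a few lines. The factors are exactly those quoted in the estimate; only 195 (countries) and 20 (centres per country) are labelled in the text, so the remaining factors are kept as given without assigning them to specific levels.

```python
from math import prod

# Factors for the first 8 levels, as quoted in the estimate above.
# 195 = assumed number of countries, 20 = assumed centres per country.
first_eight_levels = [2, 1, 1, 195, 20, 4, 2, 8]

# Three further levels with 5 sub-disciplines each (assumption from the text).
sub_discipline_levels = [5, 5, 5]

branches = prod(first_eight_levels)
topics = branches * prod(sub_discipline_levels)

print(f"branches for the first 8 levels: {branches:,}")  # 499,200
print(f"topics for the full tree:        {topics:,}")    # 62,400,000
print(f"reduced by a factor of 100:      {topics // 100:,}")  # 624,000
```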
From the discovery/usability point of view, this is a large obstacle if the intention is for users to understand the topic hierarchy and use it to find the data they are interested in.
Users will most probably not find their way and might simply use + or # wildcards at many levels to receive some data.
They could then be overwhelmed by the number of messages received, and the main brokers could be overloaded by such subscriptions and by the number of clients subscribing to many topics.
This is why I am questioning the purpose of providing so much semantic and discovery information in the topic hierarchy and making it so deep.
Additionally, if the intention is to help users understand what data is available, why do we have 8 levels of technical (version, WIS2) and political information before the domain information?
At the very least the topic hierarchy should be reversed but, in my opinion, it should mostly be simplified.
If the answer to the questions above is that the catalogue will provide the discovery services to find the data, then there is no need to create such a complex topic hierarchy structure, which will make the implementation very complex and challenging for users.
Potential performance issues and challenges for implementation
Another point is the performance of a system that will have to replicate and manage the distribution of 62 million topics, some of them with a very high distribution frequency. This certainly leads to a large-scale system, and tests at that scale should be performed to assess whether the products on the market (HiveMQ, RabbitMQ, Mosquitto, Amazon MQTT service) can cope easily with it.
It should also be noted that this complex hierarchy forces users to use wildcards (+, #), which will make the system even more demanding in terms of resources (tables in memory, on disk, or in databases to resolve the wildcards and maintain the multiple subscriptions of thousands of users).
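To make the wildcard concern more concrete, here is a minimal sketch of MQTT topic-filter matching, assuming the standard meaning of `+` (exactly one level) and `#` (all remaining levels, including the parent). It is a simplified illustration, not a broker implementation: it ignores `$`-prefixed topics and does not check that `#` appears only as the last level. The function name `matches` is an assumption; the two topics are taken from examples elsewhere in this tracker.

```python
def matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard: everything from here on matches
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level differs
    return len(filter_levels) == len(topic_levels)


TOPICS = [
    "origin/a/wis2/ca-eccc-msc/data/core/weather/surface-based-observations/synop",
    "origin/a/wis2/hk-hko-swic/data/core/weather/advisories-warnings",
]

broad_filter = "origin/a/wis2/+/data/core/weather/#"
print(sum(matches(broad_filter, t) for t in TOPICS))  # 2
```

A broker has to evaluate every published topic against every active filter like `broad_filter`, which is why many broad subscriptions across a very deep tree can become expensive.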
Proposal for a way forward
I would propose to re-think the topic hierarchy and go back to the initial requirements:
How should the topic hierarchy be organised to focus on such requirements?
Here are some leads that could help solve the issue without leading to a difficult full-scale implementation:
Another proposal would be to implement a large-scale prototype simulating the load and the number of topics to be created and reflected on the main brokers.
What do you think? Comments?
Level 3 of the topic hierarchy is named wis2 and defines a fixed value of wis2.
Should we consider renaming the level itself (not the fixed / single value of wis2) to something like project or system for some "future proofness" (essentially renaming topic-hierarchy/wis2.csv to topic-hierarchy/project.csv, topic-hierarchy/system.csv, or something else)?
proposal circa Feb 2024, see branch
According to issue #1 , ECMWF will be associated with RAVI, but we need to have the centre id in the topic hierarchy also.
(from @golfvert; manually moved from https://github.com/wmo-cop/wis2-topic-hierarchy/issues/1)
According to https://www.iso.org/glossary-for-iso-3166.html
User-assigned codes - If users need code elements to represent country names not included in ISO 3166-1, the series of letters AA, QM to QZ, XA to XZ, and ZZ, and the series AAA to AAZ, QMA to QZZ, XAA to XZZ, and ZZA to ZZZ respectively, and the series of numbers 900 to 999 are available.
NOTE: Please be advised that the above series of codes are not universal, those code elements are not compatible between different entities.
For Partner Organisations without a country code (e.g. ECMWF, EUMETSAT, ESA, ...) we can choose a 3-letter acronym from the user-assigned list.
XPO seems to be used by Interpol (for Interpol travel documents), so what do we choose: APO? AAA? ZZZ? That would then mean (e.g.) ZZZ.ECMWF in the topic tree.
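A quick way to test candidate codes against the user-assigned alpha-3 series quoted above (AAA to AAZ, QMA to QZZ, XAA to XZZ, ZZA to ZZZ) is sketched below. The function name is hypothetical, and the check covers only the 3-letter ranges as quoted from the ISO glossary.

```python
# User-assigned alpha-3 series, as quoted from the ISO 3166 glossary above.
USER_ASSIGNED_ALPHA3_RANGES = [
    ("AAA", "AAZ"),
    ("QMA", "QZZ"),
    ("XAA", "XZZ"),
    ("ZZA", "ZZZ"),
]


def is_user_assigned_alpha3(code: str) -> bool:
    """Return True if a 3-letter code falls within a user-assigned series."""
    code = code.upper()
    if len(code) != 3 or not (code.isascii() and code.isalpha()):
        return False
    # Equal-length uppercase ASCII strings compare in alphabetical order.
    return any(low <= code <= high for low, high in USER_ASSIGNED_ALPHA3_RANGES)


for candidate in ["XPO", "APO", "AAA", "ZZZ"]:
    print(candidate, is_user_assigned_alpha3(candidate))
```

Going by the quoted ranges alone, XPO, AAA and ZZZ fall within a user-assigned series, while APO does not (it lies outside AAA to AAZ).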
See standard/sections/clause_7_normative_text.adoc, line 19:
The representation is encoded as a simple text string of values in each topic level separated by a /. For example, origin/a/wis2/data/ca-eccc-msc/core/weather/surface-based-observations/synop or origin/a/wis2/data/ca-eccc-msc/recommended/atmospheric-composition/experimental/space-based-observation/geostationary/solar-flares.
Should be centre-id before notification-type.
Currently the country code is in upper case
https://github.com/wmo-im/wis2-topic-hierarchy/blob/main/topic-hierarchy/country.csv
Should we change to lower case for the topic values to have everything in lower case or not?
WTH should provide guidance (requirements or recommendations) on how centres should craft their centre-id value. Examples:
my-centre-wis2node
cc @golfvert
As I understand it so far,
The questions are:
These questions arose during the NWPMetadata workshop in January.
As discussed at TT-WISMD 2023-09-13, rename https://github.com/wmo-im/wis2-topic-hierarchy/blob/main/topic-hierarchy/resource-type.csv to https://github.com/wmo-im/wis2-topic-hierarchy/blob/main/topic-hierarchy/notification-type.csv given the clash of the "resource-type" concept with WCMP2 codelists. This is a working level change to the inner workings of the CSV files here on GitHub.
On reflection, I think we should rename it to simply type.csv. Thoughts?
cc @gaubert @jsieland @antje-s @josusky @solson-nws @Amienshxq @McDonald-Ian @amilan17 @david-i-berry
We need to figure out how the Topic Hierarchy will be represented in the WMO Codes Registry, the format of the TH for other applications, e.g. the GB and GDC validation....
As discussed in https://github.com/wmo-im/tt-wismd/wiki/Meeting-2023-06-22#actions, define how WTH will be documented (loosely related to #26).
The WIS2 Global Registry manages WIS2 services from members, part of which includes centre-id definitions.
As part of WTH and WIS2 development, WTH manages centre-id's in https://github.com/wmo-im/wis2-topic-hierarchy/blob/main/topic-hierarchy/centre-id.csv
#129 is intended to sync with the GR; however, we can see some inconsistencies (Description wording, etc.).
We need to establish a clear working level workflow that results in consistent and quality centre-id entries with a single source of truth which is synchronized accordingly.
(TBD whether principles are put forth in the Guide or internal workings of GR).
<centre name>,<global service type>, for example: Deutscher Wetterdienst (Germany), Global Cache Service
For @wmo-im/tt-wismd discussion.
Levels 1 - 7 are organized as flat CSVs, but at some point the sub-topics will need to branch off. For discussion, below are some screenshots of the current organization and some alternative options.
During the TT-NWPMetadata there were questions about how the versions of the topic hierarchy are managed. We should have it documented and also discuss with architecture team.
updated: 31 May 2023
...hope it is ok to ask directly here.
Should the value for version not be adjusted to "v04" instead of "a"?
see https://github.com/wmo-im/wis2-topic-hierarchy/blob/main/topic-hierarchy/version.csv
use case:
ideas:
Level 4 (country) is currently defined as:
Lower case representation of ISO3166 3-letter code. Includes extensions for partner organizations
We need to clarify the role of the country in the documentation so that the topic hierarchy's country level is clearly defined for users. Options:
It is critical to finalise the topic hierarchy for SYNOP and TEMP to be able to make the wis2box usable by the NCs. This is the primary type of data they are exchanging.
W2AT 2023-11-14:
experimental for each earth-system-discipline-subcategory, where subtopics are not to be validated by WIS2 global services
In the WIS-Guide a monitor topic is mentioned, e.g. under 2.7.3.1 "...Global Broker will not discard the message but will send a message on the monitor topic hierarchy to inform the originating centre and its GISC." and under 2.7.4.1 "...Global Cache decides not to cache data it should behave as though the cache property is set to false and send a message on the monitor topic hierarchy to inform the originating centre and its GISC.". Should we add the monitor value to WTH even if it is a separate subtree, so that everyone is aware of it and for clarity?
Currently the notes state that it is the "Alphabetic version of the topic hierarchy", but as far as I know it was originally meant to be the version of the message format. I think it would be good to use the version of the message format, because it would then be possible to introduce a new message format and let all consumers switch independently.
As discussed with @golfvert, we need to clarify whether "partial" topics are deemed valid and can/should be used or not.
For example, origin/a/wis2/ca-eccc-msc/data/core/weather/surface-based-observations/synop is a valid topic.
Should origin/a/wis2/ca-eccc-msc/data/core/weather/surface-based-observations be considered a valid topic as well? That is, a topic without a leaf?
We would need to update the specification to be clear in this regard (and decide whether Requirement 1B needs to be updated/augmented). In addition, we would need to update the artefacts made available on schemas.wmo.int for the WTH CV bundle/lookup.
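The distinction being asked about can be sketched as follows: a topic is "complete" if it is a known leaf, "partial" if it is a proper prefix (on level boundaries) of a known leaf, and "unknown" otherwise. The names `KNOWN_TOPICS` and `classify` are hypothetical, and the single entry stands in for the real WTH bundle; whether partial topics are valid is precisely the open question here.

```python
# Stand-in for the full list of leaf topics (hypothetical; one example from above).
KNOWN_TOPICS = {
    "origin/a/wis2/ca-eccc-msc/data/core/weather/surface-based-observations/synop",
}


def classify(topic: str, known: set[str] = KNOWN_TOPICS) -> str:
    """Classify a topic as 'complete', 'partial' or 'unknown'."""
    topic = topic.rstrip("/")
    if topic in known:
        return "complete"
    # Require a level boundary so 'surface-based' does not match
    # 'surface-based-observations'.
    prefix = topic + "/"
    if any(leaf.startswith(prefix) for leaf in known):
        return "partial"
    return "unknown"


base = "origin/a/wis2/ca-eccc-msc/data/core/weather"
print(classify(base + "/surface-based-observations/synop"))  # complete
print(classify(base + "/surface-based-observations"))        # partial
print(classify(base + "/surface-based"))                     # unknown
```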
Ensure consistency of WTH URLs and resources outside of the Manual on WIS. Below are lists of what exists today in WTH.
Dear colleagues, when looking at the centre-id.csv I noticed that at first it seems to separate hierarchy levels by hyphens, e.g. for DWD:
de-dwd: <country>-<institution>
But that does not seem to be the case in, for example,
de-dwd-gts-to-wis2
where the last 3 items seem to be one name, but the hyphens suggest further hierarchy levels,
or in
fr-meteo-france: after the country, only "meteo" would be the institution when machine-parsing with a hyphen as separator.
As far as I understood, the scheme urn:wmo:md:{centre_id}:{local_identifier} offers the opportunity to parse the origin of a dataset without opening it. In the examples above, hyphens as hierarchy level separators are mixed with hyphens as part of names. That will make automatic parsing of the data source ambiguous.
Best regards,
Hella Riede (DWD)
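The ambiguity raised above can be illustrated with a naive parser that treats every hyphen as a level separator. The function name `naive_parse` is hypothetical; the three centre-id values are the ones quoted in the issue.

```python
def naive_parse(centre_id: str) -> tuple[str, list[str]]:
    """Split a centre-id on every hyphen: first part as country, rest as 'levels'."""
    country, _, rest = centre_id.partition("-")
    return country, rest.split("-") if rest else []


for centre_id in ["de-dwd", "de-dwd-gts-to-wis2", "fr-meteo-france"]:
    country, parts = naive_parse(centre_id)
    print(f"{centre_id!r}: country={country!r}, remaining parts={parts}")
```

Only the split at the first hyphen is unambiguous. Nothing in the string itself tells a parser whether the remaining parts of `fr-meteo-france` form one institution name or two levels, so any further structure would have to come from a lookup table or a different separator.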
Currently, we have a GitHub Actions CI that uses pywcmp's bundle workflow to publish a first pass/working level JSON file of all topics.
There are a few issues with this workflow:
Assigning to @antje-s and @josusky; additional help is welcome (cc @wmo-im/tt-wismd)
It needs to be clear that the 'country' is for the location of the data center, not the geographic location of the dataset.
requested by Simon Elliott (EUMETSAT)
As part of transition from GTS to WIS2, it is important to be able to clearly articulate what data is coming from GTS vs. WIS2.
Right now this content is just a table in the readme
Level | Name | Notes |
---|---|---|
1 | channel | Location of where the data originates from (data providers [origin] or global services [cache]) |
2 | version | Alphabetical version of the topic hierarchy |
3 | system | Fixed value of wis2 for WIS2 |
4 | country | Lower case representation of ISO3166 3-letter code. Includes extensions for partner organizations |
5 | centre-id | Acronym as specified by member and endorsed by the PR of the country and by WMO |
6 | resource-type | WIS2 resource types (data, metadata, reports [from monitoring activities]) |
7 | data-policy | Data policy as defined by the WMO Unified Data Policy. core data are available from the Global Caches with open access on a free and unrestricted basis. Notifications for core and recommended data are available by subscription to Global Brokers. recommended data are downloaded from the original NC/DCPC and may require authentication/authorisation |
8 | earth-system-discipline | As per Annex 1 of resolution 1 Cg-Ext-2021 |
9 | earth-system-discipline-subcategory | As proposed by domain experts and further approved by INFCOM |
Add all non-commercial satellites with a status of operational, standby or commissioning, as defined in the OSCAR Satellite DB, using the following topics.
satellite-topics-final-draft.xlsx
* weather > space-based-observations > orbit-type > sensor-type > satellite-name > data-type
* space-weather > space-based-observations > orbit-type > sensor-type > satellite-name > data-type
There is a constraint between countries and centre-id's where one country may have 0..n associated centre-id's.
We need to represent this relationship to prevent possibilities such as:
origin/a/wis2/usa/eccc-msc/...
We need to represent this relationship to be able to map centre-ids to countries.
One option is to add a parent column to centre-id.csv, which would help express this constraint between these two levels.
W2AT 2023-11-14:
Action to add to specification as a requirement.
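A minimal sketch of how a parent column could be used to reject mismatched country/centre-id pairs such as the `usa/eccc-msc` example above. The column names `Name` and `Parent`, the inline CSV content and the function names are assumptions for illustration; they do not reflect the actual layout of centre-id.csv.

```python
import csv
import io

# Hypothetical centre-id.csv content with an added parent (country) column.
CENTRE_ID_CSV = """Name,Parent
eccc-msc,can
"""


def load_parents(csv_text: str) -> dict[str, str]:
    """Map each centre-id to its parent country code."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Name"]: row["Parent"] for row in reader}


def is_valid_pair(country: str, centre_id: str, parents: dict[str, str]) -> bool:
    """Return True only if the centre-id is registered under the given country."""
    return parents.get(centre_id) == country


parents = load_parents(CENTRE_ID_CSV)
print(is_valid_pair("can", "eccc-msc", parents))  # True
print(is_valid_pair("usa", "eccc-msc", parents))  # False
```

Because each centre-id row carries exactly one parent, a country can still have zero or many centre-ids, which matches the 0..n relationship described above.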
Proposed initial topics for climate domain (and mapping of GTS headers, see #147).
name | description | source |
---|---|---|
monthly | Monthly values from land stations, e.g. CLIMAT | |
daily | Daily values from land stations, e.g. DAYCLI | |
sub-daily | Reprocessed hourly and other sub-daily observations from land stations | |