wcmp's Issues

Present KPIs to TT-GISC

TT-GISC is going to have a meeting in Offenbach on 26-29 August 2019. It is very important that the activity of defining KPIs for the metadata is done in coordination with TT-GISC. We should finalise a first draft and present it at the meeting, with the aim of deciding how TT-GISC will act on the KPI proposal.
We should also decide whether we are proposing KPIs or guidance. At the moment my impression is that we are not developing the KPIs with a metric that will allow us to measure the status of the metadata and their improvement.

KPIs Definition

Please post a message here once you have updated your assigned KPI.
Please feel free to add additional KPIs.

Use of GitHub for the team

@wmo-im/tt-wismd I started this issue because I am trying to find a good way to apply an agile methodology to our working practice.
I have created projects to group the tasks and provide an agile board. This will make the tasks more visible, and we can also add milestones. Please have a look in the Projects tab.

Create a codelist KPI

For every value associated with a codelist, it should be verified that the value belongs to the referenced codelist. A minimal sketch of such a check is shown below.
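
A minimal Python sketch of such a codelist membership check, assuming the codelists have been loaded somewhere (e.g. from the tables published on codes.wmo.int); all names here are illustrative:

    # Hypothetical codelists, e.g. loaded from the tables on codes.wmo.int
    CODELISTS = {
        "MD_ScopeCode": {"dataset", "series", "service"},
        "MD_RestrictionCode": {"copyright", "license", "otherRestrictions"},
    }

    def check_codelist_value(codelist_name, value):
        """Return True if value is a member of the referenced codelist."""
        members = CODELISTS.get(codelist_name)
        if members is None:
            raise KeyError(f"unknown codelist: {codelist_name}")
        return value in members

    assert check_codelist_value("MD_ScopeCode", "dataset")
    assert not check_codelist_value("MD_RestrictionCode", "top-secret")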

teleconference 23.1.2020

Hello @wmo-im/tt-wismd
we are going to have a teleconference on Thursday to review progress on the KPIs. Please have a look at the KPIs assigned to you.
Below is the info for the BlueJeans teleconference.

Meeting URL
https://bluejeans.com/497597571?src=join_info

Meeting ID
497 597 571

Want to dial in from a phone?

Dial one of the following numbers:
+1.408.740.7256 (US (San Jose))
+1.888.240.2560 (US Toll Free)
(see all numbers - https://www.bluejeans.com/premium-numbers)

Enter the meeting ID and passcode followed by #

Connecting from a room system?
Dial: bjn.vc or 199.48.152.152 and enter your meeting ID & passcode

WCMP Migration to GitHub

  • Import of the schema and code tables into GitHub (CSV, TTL (SKOS) for provision on codes.wmo.int?) and their governance
  • Add documentation (wiki pages) as necessary (for the XSD schema, ...)?
  • Migration of the WCMP wiki (including the specification)?
  • Migration of guidance + documentation?

Definition of Metadata Quality KPIs

  • Drivers: WIS GUI, WIS2, usability, discovery, the need to handle granularity, OSCAR to WIS, etc.
  • Scope: identifying quality KPIs - how, priorities, etc.
  • Initial sources for defining the KPIs (WCMP 1.3, guidance documentation, WMO wiki, NOAA Schematron, ...)

Proposal for a WMO Metadata quality KPI Scoring

This issue summarises a first draft proposal for a scoring algorithm qualifying each individual metadata record (or the information provided regarding the product's information content).

This scoring mechanism is greatly inspired by the NOAA rubric (https://data.noaa.gov/metaview/page?xml=NOAA/NESDIS/NCDC/Geoportal//iso/xml/C00589.xml&view=rubricv2/recordHTML&header=none) and the Qualys SSL Labs "SSL Certificate" evaluation scoring explained here: https://github.com/ssllabs/research/wiki/SSL-Server-Rating-Guide

An example evaluation results page can be checked here:

https://www.ssllabs.com/ssltest/index.html

KPI Types

During the discussion and creation of the KPIs, and looking at other scoring engines for metadata quality, three categories of KPIs are emerging:

  • Mandatory: groups all the rules necessary to comply with WMO Core Profile 1.3 (compliance with the metadata schema and with the additional rules)

  • Required: groups all the additional KPIs (the ones currently defined by TT-WMD) that have been judged necessary to create a meaningful metadata record

  • Recommended: an optional category that is not required, but that makes a metadata record stand out from the rest

Overall score mechanism

Following Qualys SSL Labs, a two-level strategy can be adopted to evaluate a metadata record.

The overall score can be between 0 and 100 and is mapped to a letter grade (A to F) with the following translation table.

Letter grade translation

Numerical score    Grade
score >= 80        A
score >= 65        B
score >= 50        C
score >= 35        D
score >= 20        E
score < 20         F

This score will be calculated only if the metadata record is compliant with WMO Core Profile 1.3. If it is not compliant, a special letter (U, as in uncompliant) should be assigned. A sketch of the grade mapping follows.
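
A minimal Python sketch of this mapping, taken directly from the translation table above (the function name is illustrative):

    def letter_grade(score, compliant=True):
        """Map a 0-100 score to a letter grade per the translation table.

        A record that is not compliant with WMO Core Profile 1.3 gets
        the special grade 'U' regardless of its numerical score.
        """
        if not compliant:
            return "U"
        for threshold, grade in ((80, "A"), (65, "B"), (50, "C"), (35, "D"), (20, "E")):
            if score >= threshold:
                return grade
        return "F"

    assert letter_grade(66) == "B"
    assert letter_grade(66, compliant=False) == "U"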

Score computation

To compute the overall score out of 100, the different KPIs are split into categories. Here is an initial proposal for the categories:

  • Identification Info (Title, Abstract)

  • Coverage (temporal, geographical)

  • Distribution information

    • Formats
  • Data Policy

  • Enhancements

    • Broken links

    • Graphic Overview

Inside each category there will be required and recommended KPIs. A score of 0 on a required KPI gives the whole category a 0.

For instance, a 0 on the Title KPI would give the Identification Info category a 0.

Each category will then receive a weight, for instance (numbers are to be defined):

Category                       Weight    Example score for a record
C1 Identification Info         20%       85/100
C2 Coverage                    25%       75/100
C3 Distribution Information    25%       55/100
C4 Data Policy                 15%       45/100
C5 Enhancements                15%       65/100

Assuming the category scores are each out of 100 and the weights sum to 1, the overall score for the example is:

(C1 x 0.20) + (C2 x 0.25) + (C3 x 0.25) + (C4 x 0.15) + (C5 x 0.15) = 66, which would be a B.
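
The same computation as a small Python sketch, using the illustrative weights and scores from the table above:

    # Illustrative weights from the table above; they must sum to 1.
    WEIGHTS = {
        "identification_info": 0.20,
        "coverage": 0.25,
        "distribution_information": 0.25,
        "data_policy": 0.15,
        "enhancements": 0.15,
    }

    def overall_score(category_scores):
        """Weighted sum of per-category scores, each out of 100."""
        return sum(WEIGHTS[name] * score for name, score in category_scores.items())

    example = {
        "identification_info": 85,
        "coverage": 75,
        "distribution_information": 55,
        "data_policy": 45,
        "enhancements": 65,
    }
    assert abs(overall_score(example) - 66.0) < 1e-9  # grade B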

Individual Category Scoring

A category should contain multiple KPI tests and should return a score out of 100.

Different strategies can be applied; for example, a weighting similar to the one used for the overall score:

Category A:

  • KPI 1 50%

  • KPI 2 25%

  • KPI 3 25%

Alternatively, where some KPI results should be favoured and others discouraged, that preference can affect the score.

For instance, for the data policy, imagine that only data policies with anchors are authorised, but non-anchor data policies are tolerated for a transition period, and the metadata record contains both:

  • Anchor data policy = 100%

  • Non-anchor data policy = 75%

To define the total score, take the best encoding (anchor DP), add the non-anchor DP score, and divide by 2; see the sketch below.
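
A Python sketch of category scoring under these rules; the field names and the 50/50 split are illustrative (dividing by 2 is equivalent to equal weights):

    def category_score(required, recommended, weights):
        """Weighted category score; any required KPI at 0 zeroes the category."""
        if any(score == 0 for score in required.values()):
            return 0.0
        kpis = {**required, **recommended}
        return sum(weights[name] * kpis[name] for name in kpis)

    # Data policy example from above: (100 + 75) / 2 = 87.5
    assert category_score(
        required={"anchor_dp": 100},
        recommended={"non_anchor_dp": 75},
        weights={"anchor_dp": 0.5, "non_anchor_dp": 0.5},
    ) == 87.5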

Non-Compliance Information

For each category and KPI, every individual error should be presented: an error title, the XML element that was evaluated, the value found, and the value expected.

The NOAA rubric can serve as a model for this.
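
A sketch of one such error report as a Python data structure (all field names and the example values are illustrative):

    from dataclasses import dataclass

    @dataclass
    class NonCompliance:
        """One individual error: what was checked, found, and expected."""
        title: str     # error title
        xpath: str     # XML element evaluated
        found: str     # value found in the record
        expected: str  # value expected

    err = NonCompliance(
        title="Invalid restriction code",
        xpath="//gmd:MD_RestrictionCode",
        found="top-secret",
        expected="a value from the MD_RestrictionCode codelist",
    )
    print(f"{err.title}: {err.xpath} = {err.found!r}, expected {err.expected}")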

Example of the Qualys SSL Test website

(Screenshot: Qualys SSL Labs test result page, captured 2020-01-23.)

implement WCMP validator via pywcmp

pywcmp was initially implemented in 2013 as part of MSC DCPC requirements for WMO WIS. Update the pywcmp functionality as per below:

Features:

  • composable: can be used as a library by a downstream Python application or command line tooling
  • implements ATS
  • implements KPIs
  • implements scoring rubric
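
A sketch of the composable library use, assuming an API along the lines of pywcmp's documented test-suite class for WCMP 1.3 (the exact module path and class name may differ between versions):

    from lxml import etree
    from pywcmp.ats import WMOCoreMetadataProfileTestSuite13

    # Parse a WCMP 1.3 record and run the Abstract Test Suite against it.
    exml = etree.parse("md-example.xml")
    ts = WMOCoreMetadataProfileTestSuite13(exml)
    ts.run_tests()  # raises an exception listing failed assertions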

WIS Catalogue statistics

Dear all,
I created a set of tools/tests in the attached file "testWISCatalog.py" and ran it over the whole WIS catalogue (DWD GISC snapshot from 18 November 2019).

WMO Rubric Records
I converted the whole catalogue with http://www.ngdc.noaa.gov/metadata/published/xsl/wmoRubricReport.xsl and extracted the WMO Profile 1.3 score. I did this because I hoped it would help me identify good entries for #45.
For this, you can use the wmoRubric and wmoRubricScore commands.
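
For reference, such a conversion can be reproduced for a single record with lxml's XSLT support (file names are placeholders for local copies):

    from lxml import etree

    # Apply the NOAA WMO rubric stylesheet to one metadata record.
    transform = etree.XSLT(etree.parse("wmoRubricReport.xsl"))
    record = etree.parse("example-record.xml")
    report = transform(record)  # rubric report as a result tree

    with open("rubric-report.html", "wb") as f:
        f.write(etree.tostring(report, pretty_print=True))
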
Summary:

  • There are 102827 entries in total
  • the highest possible score is 14/22 ("Encoding Compliance" and "Global Exchange Compliance" are just empty rules)
Entries    %      Score
13397      13%    14/22
6322       6.1%   13/22
2465       2.4%   12/22
1608       1.6%   11/22
25459      24.8%  10/22
1428       1.4%   9/22
52140      50.7%  8/22
8          0.0%   7/22

Statistics over constraints
The "dir" and "files" tests check the presence of the following elements, defined by XPath:

  • //gmd:descriptiveKeywords[gmd:MD_Keywords/gmd:type/gmd:MD_KeywordTypeCode/text()="dataCentre"]/gmd:MD_Keywords/gmd:keyword/gco:CharacterString/text()
  • //gmd:resourceConstraints/gmd:MD_Constraints/gmd:useLimitation/gco:CharacterString/text()
  • //gmd:resourceConstraints/gmd:MD_LegalConstraints/gmd:accessConstraints/gmd:MD_RestrictionCode/text()
  • //gmd:resourceConstraints/gmd:MD_LegalConstraints/gmd:useConstraints/gmd:MD_RestrictionCode/text()
  • //gmd:resourceConstraints/gmd:MD_LegalConstraints/gmd:otherConstraints/gco:CharacterString/text()

evaluated within the element /gmd:MD_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification.

The result is xpathStatDir.csv, where you can apply filters in e.g. Excel for closer analysis; a sketch of such a check is shown below.
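
One way to run such XPath counts over a record with lxml; the namespace URIs are the standard ISO 19139 ones, and the XPath list is abbreviated:

    from lxml import etree

    NS = {
        "gmd": "http://www.isotc211.org/2005/gmd",
        "gco": "http://www.isotc211.org/2005/gco",
    }

    XPATHS = [
        "//gmd:resourceConstraints/gmd:MD_Constraints/gmd:useLimitation"
        "/gco:CharacterString/text()",
        "//gmd:resourceConstraints/gmd:MD_LegalConstraints/gmd:accessConstraints"
        "/gmd:MD_RestrictionCode/text()",
    ]

    def constraint_stats(path):
        """Count how many matches each constraint XPath has in one record."""
        doc = etree.parse(path)
        return {xp: len(doc.xpath(xp, namespaces=NS)) for xp in XPATHS}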
During this analysis, I identified the following issues:

  1. There are 29933 entries (out of 102827) which use the default namespace for gmd. This conflicts with:
    6.2.1 Each WIS discovery metadata record shall name explicitly all namespaces used within the record: use of default namespaces is prohibited.
    Some of these entries define the gmd namespace as the default namespace and, in addition, also under the gmd prefix. I guess they are generated automatically.
  2. There are 718 entries which don't define all namespaces used in the XML document.

Duplicate entries
During the testing, we realised that there are a lot of entries which differ only by the time or the station they relate to.

  1. 346 entries starting with "METOP-A.GOME2.L3.O3.VCD.NRT.GDP-4." differ only by the datetime.
  2. 2980 entries starting with "de.dwd.mosmix." differ only by the related location.
  3. A lot of entries are not well formatted, e.g. entries starting with "de.pangaea.dataset.": humans need a formatting tool for reading, comparing, etc. Maybe there should at least be a recommendation to format them.

I know that these duplicates are sometimes necessary, but it seems to me that sometimes they are not. We are going to define KPIs that check a particular entry on its own, but maybe it would also be useful to define a KPI which tests one entry against the rest of the catalogue.
I started to write a "compare" command which should compare entries by content. The first attempt was to group them by name prefixes, but that is not such a straightforward approach; maybe some brute-force algorithm comparing all files should be used instead. I am not sure my thoughts about this are right. The intention is to "tidy up" the WIS catalogue. What do you think? (A sketch of a cheap first pass is shown below.)
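
A possible first pass in Python, under the assumption that exact duplicates modulo whitespace are worth flagging before any deeper element-by-element comparison (all names illustrative):

    import hashlib
    from collections import defaultdict
    from pathlib import Path

    def duplicate_groups(catalogue_dir):
        """Group records whose content is identical after collapsing whitespace."""
        groups = defaultdict(list)
        for path in Path(catalogue_dir).glob("*.xml"):
            normalized = b" ".join(path.read_bytes().split())
            digest = hashlib.sha256(normalized).hexdigest()
            groups[digest].append(path.name)
        return {d: names for d, names in groups.items() if len(names) > 1}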

I wish you a Merry Christmas.
Jan

Provide list of mandatory elements

Find a list of mandatory elements and use them as a metric in the performance indicator. For example, the DataDistributionScope (GlobalExchange, RegionalExchange, OriginatingCenter) is crucial to the WIS Core Cache. A sketch of such a presence check follows.
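
A Python sketch of a mandatory-elements metric; the XPaths here are illustrative guesses, not the definitive WCMP 1.3 locations (the real list is the deliverable of this issue):

    from lxml import etree

    NS = {
        "gmd": "http://www.isotc211.org/2005/gmd",
        "gco": "http://www.isotc211.org/2005/gco",
    }

    # Illustrative mapping of mandatory elements to XPaths.
    MANDATORY = {
        "fileIdentifier": "//gmd:fileIdentifier/gco:CharacterString",
        "title": "//gmd:identificationInfo//gmd:title/gco:CharacterString",
        "distributionScope":
            "//gmd:descriptiveKeywords//gmd:keyword/gco:CharacterString"
            "[text()='GlobalExchange' or text()='RegionalExchange'"
            " or text()='OriginatingCenter']",
    }

    def mandatory_score(path):
        """Fraction of mandatory elements present in one record."""
        doc = etree.parse(path)
        present = sum(bool(doc.xpath(xp, namespaces=NS)) for xp in MANDATORY.values())
        return present / len(MANDATORY)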

WCMP Migration to ISO 19115-3

  1. Scope of activities needed:
    • Based on drivers: additional content that would be newly highlighted in a revision of the WCMP
    • Benefits of 19115-3
    • Mapping from 19115:2003 to 19115-3 (2016/8)
    • Other dependencies?
  2. Development of a document setting out background, drivers, issues (#5)
  3. Assessment of GISCs' capacity to handle full 19115-3 records
  4. Assessment of issues/needs for GISCs to handle current and new 19115, and services metadata

WMO Rubric

Put the example and documentation on GitHub.
