wcmp's Issues

Present KPIs to TT-GISC

TT-GISC is going to have a meeting in Offenbach on 26-29 August 2019. It is very important that the activity of defining KPIs for the metadata is done in coordination with TT-GISC. We should finalise a first draft and present it at the meeting, with the aim of deciding how TT-GISC will act on the KPI proposal.
We should also decide whether we are proposing KPIs or guidance. At the moment my impression is that we are not developing the KPIs with a metric that will allow us to measure the status of the metadata and their improvement.

KPIs Definition

Please post a message here once you have updated your assigned KPI.
Please feel free to add additional KPIs.

Use of GitHub for the team

@wmo-im/tt-wismd I started this issue because I am trying to find a good way to apply an agile methodology to our working practice.
I have created projects to group the tasks and provide an agile board. This will make the tasks more visible, and we can also add milestones. Please have a look in the Projects tab.

Create a codelist KPI

For every value associated with a codelist, it should be verified that the value belongs to the referenced codelist. A minimal sketch of such a check is shown below.
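
A minimal Python sketch of such a codelist membership check, assuming the codelists have been loaded somewhere (e.g. from the tables published on codes.wmo.int); all names here are illustrative:

    # Hypothetical codelists, e.g. loaded from the tables on codes.wmo.int
    CODELISTS = {
        "MD_ScopeCode": {"dataset", "series", "service"},
        "MD_RestrictionCode": {"copyright", "license", "otherRestrictions"},
    }

    def check_codelist_value(codelist_name, value):
        """Return True if value is a member of the referenced codelist."""
        members = CODELISTS.get(codelist_name)
        if members is None:
            raise KeyError(f"unknown codelist: {codelist_name}")
        return value in members

    assert check_codelist_value("MD_ScopeCode", "dataset")
    assert not check_codelist_value("MD_RestrictionCode", "top-secret")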

teleconference 23.1.2020

Hello @wmo-im/tt-wismd
we are going to have a teleconference on Thursday to review progress on the KPIs. Please have a look at the KPIs assigned to you.
Below is the info for the BlueJeans teleconference.

Meeting URL
https://bluejeans.com/497597571?src=join_info

Meeting ID
497 597 571

Want to dial in from a phone?

Dial one of the following numbers:
+1.408.740.7256 (US (San Jose))
+1.888.240.2560 (US Toll Free)
(see all numbers - https://www.bluejeans.com/premium-numbers)

Enter the meeting ID and passcode followed by #

Connecting from a room system?
Dial: bjn.vc or 199.48.152.152 and enter your meeting ID & passcode

WCMP Migration to GitHub

  • Import of the schema and code tables into GitHub (CSV, TTL (SKOS) for provision on codes.wmo.int?) and their governance
  • Add documentation (wiki pages) as necessary (for the XSD schema, ...)?
  • Migration of the WCMP wiki (including the specification)?
  • Migration of guidance + documentation?

Definition of Metadata Quality KPIs

  • Drivers: WIS GUI, WIS2, usability, discovery, the need to handle granularity, OSCAR to WIS, etc.
  • Scope: identifying quality KPIs - how, priorities, etc.
  • Initial sources for defining the KPIs (WCMP 1.3, guidance documentation, WMO wiki, NOAA Schematron, ...)

Proposal for a WMO Metadata quality KPI Scoring

This issue summarises a first draft proposal for a scoring algorithm qualifying each individual metadata record (or the information provided regarding the product's information content).

This scoring mechanism is greatly inspired by the NOAA rubric (https://data.noaa.gov/metaview/page?xml=NOAA/NESDIS/NCDC/Geoportal//iso/xml/C00589.xml&view=rubricv2/recordHTML&header=none) and the Qualys SSL Labs "SSL Certificate" evaluation scoring explained here: https://github.com/ssllabs/research/wiki/SSL-Server-Rating-Guide

An example evaluation results page can be checked here:

https://www.ssllabs.com/ssltest/index.html

KPI Types

During the discussion and creation of the KPIs, and looking at other scoring engines for metadata quality, three categories of KPIs are emerging:

  • Mandatory: groups all the rules necessary to comply with WMO Core Profile 1.3 (compliance with the metadata schema and with the additional rules)

  • Required: groups all the additional KPIs (the ones currently defined by TT-WMD) that have been judged necessary to create a meaningful metadata record

  • Recommended: an optional category that is not required, but that makes a metadata record stand out from the rest

Overall score mechanism

Following Qualys SSL Labs, a two-level strategy can be adopted to evaluate a metadata record.

The overall score can be between 0 and 100 and is mapped to a letter grade (A to F) with the following translation table.

Letter grade translation

Numerical score    Grade
score >= 80        A
score >= 65        B
score >= 50        C
score >= 35        D
score >= 20        E
score < 20         F

This score will be calculated only if the metadata record is compliant with WMO Core Profile 1.3. If it is not compliant, a special letter (U, as in uncompliant) should be assigned. A sketch of the grade mapping follows.
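
A minimal Python sketch of this mapping, taken directly from the translation table above (the function name is illustrative):

    def letter_grade(score, compliant=True):
        """Map a 0-100 score to a letter grade per the translation table.

        A record that is not compliant with WMO Core Profile 1.3 gets
        the special grade 'U' regardless of its numerical score.
        """
        if not compliant:
            return "U"
        for threshold, grade in ((80, "A"), (65, "B"), (50, "C"), (35, "D"), (20, "E")):
            if score >= threshold:
                return grade
        return "F"

    assert letter_grade(66) == "B"
    assert letter_grade(66, compliant=False) == "U"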

Score computation

To compute the overall score out of 100, the different KPIs are split into categories. Here is an initial proposal for the categories:

  • Identification Info (Title, Abstract)

  • Coverage (temporal, geographical)

  • Distribution information

    • Formats
  • Data Policy

  • Enhancements

    • Broken links

    • Graphic Overview

Inside each category there will be required and recommended KPIs. A score of 0 on a required KPI gives the whole category a 0.

For instance, a 0 on the Title KPI would give the Identification Info category a 0.

Each category will then receive a weight, for instance (numbers are to be defined):

Category                       Weight    Example score for a record
C1 Identification Info         20%       85/100
C2 Coverage                    25%       75/100
C3 Distribution Information    25%       55/100
C4 Data Policy                 15%       45/100
C5 Enhancements                15%       65/100

Assuming the category scores are each out of 100 and the weights sum to 1, the overall score for the example is:

(C1 x 0.20) + (C2 x 0.25) + (C3 x 0.25) + (C4 x 0.15) + (C5 x 0.15) = 66, which would be a B.
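
The same computation as a small Python sketch, using the illustrative weights and scores from the table above:

    # Illustrative weights from the table above; they must sum to 1.
    WEIGHTS = {
        "identification_info": 0.20,
        "coverage": 0.25,
        "distribution_information": 0.25,
        "data_policy": 0.15,
        "enhancements": 0.15,
    }

    def overall_score(category_scores):
        """Weighted sum of per-category scores, each out of 100."""
        return sum(WEIGHTS[name] * score for name, score in category_scores.items())

    example = {
        "identification_info": 85,
        "coverage": 75,
        "distribution_information": 55,
        "data_policy": 45,
        "enhancements": 65,
    }
    assert abs(overall_score(example) - 66.0) < 1e-9  # grade B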

Individual Category Scoring

A category should contain multiple KPI tests and should return a score out of 100.

Different strategies can be applied; for example, a weighting similar to the one used for the overall score:

Category A:

  • KPI 1 50%

  • KPI 2 25%

  • KPI 3 25%

Alternatively, where some KPI results should be favoured and others discouraged, that preference can affect the score.

For instance, for the data policy, imagine that only data policies with anchors are authorised, but non-anchor data policies are tolerated for a transition period, and the metadata record contains both:

  • Anchor data policy = 100%

  • Non-anchor data policy = 75%

To define the total score, take the best encoding (anchor DP), add the non-anchor DP score, and divide by 2; see the sketch below.
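
A Python sketch of category scoring under these rules; the field names and the 50/50 split are illustrative (dividing by 2 is equivalent to equal weights):

    def category_score(required, recommended, weights):
        """Weighted category score; any required KPI at 0 zeroes the category."""
        if any(score == 0 for score in required.values()):
            return 0.0
        kpis = {**required, **recommended}
        return sum(weights[name] * kpis[name] for name in kpis)

    # Data policy example from above: (100 + 75) / 2 = 87.5
    assert category_score(
        required={"anchor_dp": 100},
        recommended={"non_anchor_dp": 75},
        weights={"anchor_dp": 0.5, "non_anchor_dp": 0.5},
    ) == 87.5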

Non-Compliance Information

For each category and KPI, every individual error should be presented: an error title, the XML element that was evaluated, the value found, and the value expected.

The NOAA rubric can serve as a model for this.
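
A sketch of one such error report as a Python data structure (all field names and the example values are illustrative):

    from dataclasses import dataclass

    @dataclass
    class NonCompliance:
        """One individual error: what was checked, found, and expected."""
        title: str     # error title
        xpath: str     # XML element evaluated
        found: str     # value found in the record
        expected: str  # value expected

    err = NonCompliance(
        title="Invalid restriction code",
        xpath="//gmd:MD_RestrictionCode",
        found="top-secret",
        expected="a value from the MD_RestrictionCode codelist",
    )
    print(f"{err.title}: {err.xpath} = {err.found!r}, expected {err.expected}")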

Example of the Qualys SSL Test website

(Screenshot: Qualys SSL Labs test result page, captured 2020-01-23.)

implement WCMP validator via pywcmp

pywcmp was initially implemented in 2013 as part of MSC DCPC requirements for WMO WIS. Update the pywcmp functionality as per below:

Features:

  • composable: can be used as a library by a downstream Python application or command line tooling
  • implements ATS
  • implements KPIs
  • implements scoring rubric
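
A sketch of the composable library use, assuming an API along the lines of pywcmp's documented test-suite class for WCMP 1.3 (the exact module path and class name may differ between versions):

    from lxml import etree
    from pywcmp.ats import WMOCoreMetadataProfileTestSuite13

    # Parse a WCMP 1.3 record and run the Abstract Test Suite against it.
    exml = etree.parse("md-example.xml")
    ts = WMOCoreMetadataProfileTestSuite13(exml)
    ts.run_tests()  # raises an exception listing failed assertions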

WIS Catalogue statistics

Dear all,
I created a set of tools/tests in the attached file "testWISCatalog.py" and ran it over the whole WIS catalogue (DWD GISC snapshot from 18 November 2019).

WMO Rubric Records
I converted the whole catalogue with http://www.ngdc.noaa.gov/metadata/published/xsl/wmoRubricReport.xsl and extracted the WMO Profile 1.3 score. I did this because I hoped it would help me identify good entries for #45.
For this, you can use the wmoRubric and wmoRubricScore commands.
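
For reference, such a conversion can be reproduced for a single record with lxml's XSLT support (file names are placeholders for local copies):

    from lxml import etree

    # Apply the NOAA WMO rubric stylesheet to one metadata record.
    transform = etree.XSLT(etree.parse("wmoRubricReport.xsl"))
    record = etree.parse("example-record.xml")
    report = transform(record)  # rubric report as a result tree

    with open("rubric-report.html", "wb") as f:
        f.write(etree.tostring(report, pretty_print=True))
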
Summary:

  • There are 102827 entries in total
  • the highest possible score is 14/22 ("Encoding Compliance" and "Global Exchange Compliance" are just empty rules)
Entries    %      Score
13397      13%    14/22
6322       6.1%   13/22
2465       2.4%   12/22
1608       1.6%   11/22
25459      24.8%  10/22
1428       1.4%   9/22
52140      50.7%  8/22
8          0.0%   7/22

Statistics over constraints
The "dir" and "files" tests check the presence of the following elements, defined by XPath:

  • //gmd:descriptiveKeywords[gmd:MD_Keywords/gmd:type/gmd:MD_KeywordTypeCode/text()="dataCentre"]/gmd:MD_Keywords/gmd:keyword/gco:CharacterString/text()
  • //gmd:resourceConstraints/gmd:MD_Constraints/gmd:useLimitation/gco:CharacterString/text()
  • //gmd:resourceConstraints/gmd:MD_LegalConstraints/gmd:accessConstraints/gmd:MD_RestrictionCode/text()
  • //gmd:resourceConstraints/gmd:MD_LegalConstraints/gmd:useConstraints/gmd:MD_RestrictionCode/text()
  • //gmd:resourceConstraints/gmd:MD_LegalConstraints/gmd:otherConstraints/gco:CharacterString/text()

evaluated within the element /gmd:MD_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification.

The result is xpathStatDir.csv, where you can apply filters in e.g. Excel for closer analysis; a sketch of such a check is shown below.
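
One way to run such XPath counts over a record with lxml; the namespace URIs are the standard ISO 19139 ones, and the XPath list is abbreviated:

    from lxml import etree

    NS = {
        "gmd": "http://www.isotc211.org/2005/gmd",
        "gco": "http://www.isotc211.org/2005/gco",
    }

    XPATHS = [
        "//gmd:resourceConstraints/gmd:MD_Constraints/gmd:useLimitation"
        "/gco:CharacterString/text()",
        "//gmd:resourceConstraints/gmd:MD_LegalConstraints/gmd:accessConstraints"
        "/gmd:MD_RestrictionCode/text()",
    ]

    def constraint_stats(path):
        """Count how many matches each constraint XPath has in one record."""
        doc = etree.parse(path)
        return {xp: len(doc.xpath(xp, namespaces=NS)) for xp in XPATHS}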
During this analysis, I identified the following issues:

  1. There are 29933 entries (out of 102827) which use the default namespace for gmd. This conflicts with:
    6.2.1 Each WIS discovery metadata record shall name explicitly all namespaces used within the record: use of default namespaces is prohibited.
    Some of these entries define the gmd namespace as the default namespace and, in addition, also under the gmd prefix. I guess they are generated automatically.
  2. There are 718 entries which don't define all namespaces used in the XML document.

Duplicate entries
During the testing, we realised that there are a lot of entries which differ only by the time or the station they relate to.

  1. 346 entries starting with "METOP-A.GOME2.L3.O3.VCD.NRT.GDP-4." differ only by the datetime.
  2. 2980 entries starting with "de.dwd.mosmix." differ only by the related location.
  3. A lot of entries are not well formatted, e.g. entries starting with "de.pangaea.dataset.": humans need a formatting tool for reading, comparing, etc. Maybe there should at least be a recommendation to format them.

I know that these duplicates are sometimes necessary, but it seems to me that sometimes they are not. We are going to define KPIs that check a particular entry on its own, but maybe it would also be useful to define a KPI which tests one entry against the rest of the catalogue.
I started to write a "compare" command which should compare entries by content. The first attempt was to group them by name prefixes, but that is not such a straightforward approach; maybe some brute-force algorithm comparing all files should be used instead. I am not sure my thoughts about this are right. The intention is to "tidy up" the WIS catalogue. What do you think? (A sketch of a cheap first pass is shown below.)
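
A possible first pass in Python, under the assumption that exact duplicates modulo whitespace are worth flagging before any deeper element-by-element comparison (all names illustrative):

    import hashlib
    from collections import defaultdict
    from pathlib import Path

    def duplicate_groups(catalogue_dir):
        """Group records whose content is identical after collapsing whitespace."""
        groups = defaultdict(list)
        for path in Path(catalogue_dir).glob("*.xml"):
            normalized = b" ".join(path.read_bytes().split())
            digest = hashlib.sha256(normalized).hexdigest()
            groups[digest].append(path.name)
        return {d: names for d, names in groups.items() if len(names) > 1}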

I wish you a Merry Christmas.
Jan

Provide list of mandatory elements

Find a list of mandatory elements and use them as a metric in the performance indicator. For example, the DataDistributionScope (GlobalExchange, RegionalExchange, OriginatingCenter) is crucial to the WIS Core Cache. A sketch of such a presence check follows.
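
A Python sketch of a mandatory-elements metric; the XPaths here are illustrative guesses, not the definitive WCMP 1.3 locations (the real list is the deliverable of this issue):

    from lxml import etree

    NS = {
        "gmd": "http://www.isotc211.org/2005/gmd",
        "gco": "http://www.isotc211.org/2005/gco",
    }

    # Illustrative mapping of mandatory elements to XPaths.
    MANDATORY = {
        "fileIdentifier": "//gmd:fileIdentifier/gco:CharacterString",
        "title": "//gmd:identificationInfo//gmd:title/gco:CharacterString",
        "distributionScope":
            "//gmd:descriptiveKeywords//gmd:keyword/gco:CharacterString"
            "[text()='GlobalExchange' or text()='RegionalExchange'"
            " or text()='OriginatingCenter']",
    }

    def mandatory_score(path):
        """Fraction of mandatory elements present in one record."""
        doc = etree.parse(path)
        present = sum(bool(doc.xpath(xp, namespaces=NS)) for xp in MANDATORY.values())
        return present / len(MANDATORY)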

WCMP Migration to ISO 19115-3

  1. Scope of activities needed:
    • Based on drivers: additional content that would be newly highlighted in a revision of the WCMP
    • Benefits of 19115-3
    • Mapping from 19115:2003 to 19115-3 (2016/8)
    • Other dependencies?
  2. Development of a document setting out background, drivers, issues (#5)
  3. Assessment of GISCs' capacity to handle full 19115-3 records
  4. Assessment of issues/needs for GISCs to handle current and new 19115, and services metadata

WMO Rubric

Put the example and documentation on GitHub.
