semiceu / dcat-ap_shacl

DCAT-AP SHACL constraint definitions

Home Page: https://github.com/SEMICeu/dcat-ap_shacl

XSLT 98.98% CSS 0.85% JavaScript 0.17%

shacl application-profile dcat dcat-ap rdf validation shapefile

dcat-ap_shacl's Introduction

Archiving note: the SHACL definitions are part of the DCAT-AP specifications.

DCAT-AP SHACL constraint definitions

This repository holds the SHACL specifications for the DCAT Application Profile (DCAT-AP). The Application Profiles (AP) offered here represent a set of integrity constraints used to check the physical and logical correctness or rationality of a given dataset.

The rules of thumb page offers insights into how the shapes were created.

The Source Structure

./shacl folder contains the latest version of the official DCAT-AP SHACL expressions.
./resources folder contains the latest version of the controlled vocabularies required by dcat-ap-mdr-vocabularies.shapes.rdf. Note that, in order to validate against controlled vocabularies, all of them need to be loaded as part of the data graph.
./test folder contains RDF unit tests that can be executed with the free version of TopBraid Composer. Currently they offer partial coverage of DCAT-AP; more unit tests shall be created from real data.
./dev folder contains experimental work in progress and other sources. For example, shapeview.ui.ttlx can be used to generate the DCAT-AP HTML documentation from its SHACL implementation.
./documentation folder contains the static HTML generated automatically using the SWP technology. The source code for generating this documentation is in ./dev/shapeview.app

The scope of the current release

dcat-ap.shapes.ttl validates instance shapes with respect to:
- cardinality constraints, both for properties that have a minimum occurrence greater than or equal to 1 (for example, there needs to be at least one occurrence of the mandatory property title of a Dataset) and for properties that have an explicit maximum occurrence (for example, there can be a maximum of one licence for a Distribution)
- restrictions concerning the ranges of properties (expressed as sh:nodeKind), so that no property, whether mandatory, recommended or optional, is given a value that violates its range declaration.
dcat-ap-mandatory-classes.shapes.ttl validates that data includes the mandatory classes.
dcat-ap-mdr-vocabularies.shapes.ttl validates that values of a property are taken from the mandatory controlled vocabularies
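
As an illustration of the two kinds of constraint described above, here is a minimal sketch in Turtle. It is not copied from dcat-ap.shapes.ttl: the `ex:` namespace and shape names are hypothetical, and it uses the `sh:path` syntax of the final SHACL Recommendation rather than the `sh:predicate` form discussed in the issues.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/shapes#> .   # hypothetical namespace

# Minimum cardinality: a Dataset needs at least one title,
# and the title must be a literal (range check via sh:nodeKind).
ex:DatasetShape a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path dct:title ;
        sh:minCount 1 ;
        sh:nodeKind sh:Literal ;
    ] .

# Maximum cardinality: a Distribution has at most one licence,
# and the licence must be a resource, not a literal.
ex:DistributionShape a sh:NodeShape ;
    sh:targetClass dcat:Distribution ;
    sh:property [
        sh:path dct:license ;
        sh:maxCount 1 ;
        sh:nodeKind sh:BlankNodeOrIRI ;
    ] .
```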

Usage Notes

To execute these shape files you can use the shacl-cl tool, built on top of TopQuadrant's implementation of the SHACL API, or any other implementation of the SHACL standard.

dcat-ap_shacl's Issues

sh:name question

Very minor remark (probably due to the software tool being used I guess)

dcat-ap.shapes.ttl line 206 reads sh:name "Catalog"@en.
It appears to be the only shape with a sh:name (and also the only literal with a language tag); adms:Identifier, for example, does not use it. Should this be set on the other shapes (for viewing / reporting purposes)?

On a related note, sh:name is not used on foaf:name (line 651), which seems to be the only sh:property not setting a sh:name.

remove the validation of adms:changeType

The ADMS Change Type Vocabulary was never created.

Readability of the RDF/XML file

The file would benefit from commentary, at least to indicate the various sections. Maybe even an index to show where the various sections begin?
It could be good to organise lists alphabetically, e.g. the list of namespaces at the start. Also, there is no obvious order in the properties in the description of the vocabulary in lines 42-168; maybe they could be grouped together alphabetically per namespace?
It’s a bit hard to figure out, but it seems that the class shape of the Distribution is described inside the property shape for dataset-sample. The class shape for Dataset starts at line 208 and ends at line 626, Distribution from line 237 to 424, CatalogRecord 627-692 and Catalog 639-818. Should the class shape of Distribution not be on the same hierarchical level as the shapes for Dataset, Catalog and CatalogRecord?
The property shapes under the classes do not seem to be ordered in any way. It could be useful to do the mandatory ones first, then the recommended ones and the optional ones last.

why sh:class rdfs:Resource ?

A few shapes in dcat-ap.shapes.ttl contain the rule "sh:class rdfs:Resource"
Which seems odd, since rdfs:Resource is really anything...

Can this just be ignored / removed?
Or replaced with sh:nodeKind sh:IRIOrLiteral (assuming that's what the rule is about)?
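
A sketch of the replacement suggested above, assuming the intent of the rule is only "the value must be an IRI or a literal". The `ex:` namespace, the shape name and the property `ex:someProperty` are hypothetical placeholders. Note that under plain SHACL semantics `sh:class` only passes when the value has a matching `rdf:type` in the data graph (directly or through `rdfs:subClassOf`), so `sh:class rdfs:Resource` is not automatically satisfied by every node.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/shapes#> .   # hypothetical namespace

ex:SomeShape a sh:NodeShape ;
    sh:property [
        sh:path ex:someProperty ;        # hypothetical property
        # instead of: sh:class rdfs:Resource ;
        sh:nodeKind sh:IRIOrLiteral ;    # accepts IRIs and literals, rejects blank nodes
    ] .
```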

schema:startDate minCount

In dcat-ap.shapes.ttl (line 138) minCount is 1 for startDate, so that would mean that a start date is always mandatory ?

Which makes sense, but DCAT-AP 1.1 spec itself allows a cardinality of 0..1 ("4.11 Period of Time"), so I'd suggest to remove the minCount (and/or take this into account for the next DCAT-AP revision)
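
A minimal sketch of the relaxed constraint, assuming the 0..1 cardinality quoted from the specification. The `ex:` namespace and shape name are hypothetical, and the `sh:path` syntax of the final SHACL Recommendation is used.

```turtle
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix schema: <http://schema.org/> .
@prefix ex:     <http://example.org/shapes#> .   # hypothetical namespace

ex:PeriodOfTimeShape a sh:NodeShape ;
    sh:targetClass dct:PeriodOfTime ;
    sh:property [
        sh:path schema:startDate ;
        # no sh:minCount: the start date is optional (0..1)
        sh:maxCount 1 ;
    ] .
```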

sh:Violation or sh:Warning

Raised by Håvard Mikkelsen Ottestad:

Everything is sh:Violation. I used sh:Warning for things that were optional/recommended, for instance with minCount 1 for dcat:distribution. This approach may not be the best though.
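
The approach described above could look roughly like this. It is a sketch only: the `ex:` namespace, the shape name and the message text are hypothetical, while `sh:severity` and `sh:Warning` are standard SHACL terms.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/shapes#> .   # hypothetical namespace

ex:DatasetRecommendedShape a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path dcat:distribution ;
        sh:minCount 1 ;
        # reported as a warning instead of the default sh:Violation
        sh:severity sh:Warning ;
        sh:message "A Dataset should have at least one distribution."@en ;
    ] .
```

A validation report then still lists the missing distribution, but a consumer can filter results by `sh:resultSeverity`.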

to be done

Document where to find what (version of old standard and tool support vs new standard )
It would be nice in the future to have :warnings and :info constraints in separate files [recommended and optional fields for the future]
Delete all non essential statements

TODO: by 5th of July
TODO: send a message with the latest version

#16 - check if mandatory or not

#17 -

#18 - double check it works

#20 - make sure they are URIs only;

#21 - point out that it is for SKOS-AP, thank for the suggestion,

#22 - remove type check and just make sure it is a URI

#23 - comment that it is not a problem

#24 - tool problem

#25 - relax the min card to 0 for startDate to comply with the specification

#26 - delete all non essential statements, check and remove all sh:name, sh:description, (but not sh:message)

#27 - not implementing, similar to #21

double implementations

We need:

one implementation for old standard with reliable tool support
one implementation with the latest standard specification (with maybe shaky tool support)

related to #17

Class validation (explicit)

You declare a violation in property shape dataset-conformsTo if the object (sh:class) is not a http://purl.org/dc/terms/Standard.

```
dcat:Dataset
  rdf:type sh:Shape ;
  sh:property [
      sh:predicate dcterms:conformsTo ;
      sh:class dcterms:Standard ;
      sh:name "conforms to" ;
      sh:severity sh:Violation ;
    ] ;
```

My [Makx] question is how you can validate this in a case where the object does not explicitly declare itself to be a dct:Standard.

In addition, in OWL terms, doesn’t something become a dct:Standard as a result of the fact that it is made the object of dct:conformsTo? I guess the question applies to all class validations.
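
To make the question concrete, here is a small data-graph sketch (the `ex:` resources are hypothetical). A SHACL processor evaluates `sh:class` against the `rdf:type` triples actually present in the data graph (also following `rdfs:subClassOf`); it does not by itself infer types from `rdfs:range` or OWL axioms.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/data#> .   # hypothetical namespace

ex:dataset1 a dcat:Dataset ;
    dct:conformsTo ex:standardA , ex:standardB .

# Explicitly typed, so it satisfies sh:class dct:Standard.
ex:standardA a dct:Standard .

# ex:standardB has no rdf:type triple in the data graph, so the
# sh:class constraint reports a result for it, even though the range
# of dct:conformsTo would let an RDFS or OWL reasoner infer the type.
```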

DCTerms relations missing in the rdf distribution

Surprisingly, a range of DCT relations are listed in the documentation on the DC site but do not appear in any of the RDF distributions I have found.

Using these properties without importing them has a consequence: by default, the editors add their definitions to the source file.

The list of relations is the following:

    <!-- http://purl.org/dc/terms/conformsTo -->
    <owl:ObjectProperty rdf:about="http://purl.org/dc/terms/conformsTo"/>

    <!-- http://purl.org/dc/terms/creator -->
    <owl:ObjectProperty rdf:about="http://purl.org/dc/terms/creator"/>

    <!-- http://purl.org/dc/terms/license -->
    <owl:ObjectProperty rdf:about="http://purl.org/dc/terms/license"/>

    <!-- http://purl.org/dc/terms/publisher -->
    <owl:ObjectProperty rdf:about="http://purl.org/dc/terms/publisher"/>

    <!-- http://purl.org/dc/terms/relation -->
    <owl:ObjectProperty rdf:about="http://purl.org/dc/terms/relation"/>

    <!-- http://purl.org/dc/terms/rightsHolder -->
    <owl:ObjectProperty rdf:about="http://purl.org/dc/terms/rightsHolder"/>

    <!-- http://purl.org/dc/terms/accessRights -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/accessRights"/>

    <!-- http://purl.org/dc/terms/accrualPeriodicity -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/accrualPeriodicity"/>

    <!-- http://purl.org/dc/terms/conformsTo -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/conformsTo"/>

    <!-- http://purl.org/dc/terms/description -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/description"/>

    <!-- http://purl.org/dc/terms/format -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/format"/>

    <!-- http://purl.org/dc/terms/hasPart -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/hasPart"/>

    <!-- http://purl.org/dc/terms/hasVersion -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/hasVersion"/>

    <!-- http://purl.org/dc/terms/isPartOf -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/isPartOf"/>

    <!-- http://purl.org/dc/terms/isVersionOf -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/isVersionOf"/>

    <!-- http://purl.org/dc/terms/issued -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/issued"/>

    <!-- http://purl.org/dc/terms/language -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/language"/>

    <!-- http://purl.org/dc/terms/license -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/license"/>

    <!-- http://purl.org/dc/terms/modified -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/modified"/>

    <!-- http://purl.org/dc/terms/provenance -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/provenance"/>

    <!-- http://purl.org/dc/terms/publisher -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/publisher"/>

    <!-- http://purl.org/dc/terms/relation -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/relation"/>

    <!-- http://purl.org/dc/terms/rights -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/rights"/>

    <!-- http://purl.org/dc/terms/source -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/source"/>

    <!-- http://purl.org/dc/terms/spatial -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/spatial"/>

    <!-- http://purl.org/dc/terms/temporal -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/temporal"/>

    <!-- http://purl.org/dc/terms/title -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/title"/>

    <!-- http://purl.org/dc/terms/type -->
    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/type"/>

    <!-- http://schema.org/endDate -->
    <owl:NamedIndividual rdf:about="http://schema.org/endDate"/>

    <!-- http://schema.org/startDate -->
    <owl:NamedIndividual rdf:about="http://schema.org/startDate"/>

dcat:mediaType value constraint

To my knowledge, IANA does not provide a list of URIs for MIME types. Is there an alternative to that?
Otherwise we cannot set a validation constraint on the dcat:mediaType controlled vocabulary.

tosh namespace

Probably auto-inserted. It seems that the shacl files don't really use TopBraid's tosh, so I'd suggest to remove the tosh namespace prefix from the files.

shex

Really interesting work! Quick q: did you consider shex at all, or make any investigations into the relative expressivity / usability of these two formats?

implement recommended and optional constraints

It would be nice in the future to have :warnings and :info constraints in separate files [recommended and optional fields for the future]

Publisher defined as skos:Concept?

Raised by Martín Álvarez Espinar:

In dcat-ap-mdr-vocabularies.shapes.ttl, dcat:Catalog and dcat:Dataset publishers are treated as IRIs of type skos:Concept. In this case I suppose all the publishers are listed in the Corporate body NAL, right? If not, the class restriction (skos:Concept) and the node kind (which should be sh:BlankNodeOrIRI) seem odd.

vCard vocabulary

Raised by Paul Hermans:
The vcard:Kind class is used, but in TopBraid Composer the vCard ontology loaded is still the previous one, where vcard:Kind doesn’t exist.
This is related to the fact, as you probably know, that vCard is being substantially refactored, but still using the same ns identifier.
I don’t have wisdom on this one, but it can be very confusing.

Non essential statements (Asset metadata)

You have added quite a lot of non-essential assertions, e.g. org:memberOf, foaf:homepage, vs:term_status, odrs:copyrightNotice etc. Also, not all of the declared namespace prefixes are used in the file, e.g. geo, dc, gr, wot etc. Maybe we should limit the contents to the bare minimum to make the file smaller and easier to follow.

adms:changeType does not exist in ADMS vocabulary

adms:changeType does not exist in the ADMS vocabulary, therefore we cannot create a verifiable controlled vocabulary constraint.

For reference, see the DCAT-AP document v1.1, page 18.

test compatibility with official SHACL recommendations (from 20/07/2017)

The language variant used to express the DCAT-AP shapes may differ slightly from the official release. Therefore it shall be tested.

Shape dcat:Dataset

I don't see the following properties included:

dcat:keyword
dct:identifier
adms:versionNotes

inverse Path constraint

    :MandatoryDataset
      rdf:type sh:NodeShape ;
      rdfs:comment "Mandatory dataset " ;
      rdfs:label "Mandatory dataset" ;
      sh:property [
          sh:minCount 1 ;
          sh:path [
              sh:inversePath rdf:type ;
            ] ;
        ] ;
      sh:targetNode "dcat:Dataset" ;
    .

The targetNode should not be a string.
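
A sketch of the corrected shape, with `sh:targetNode` pointing at the class IRI instead of a string literal. The default prefix is bound to a hypothetical example namespace; everything else follows the shape quoted above.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix :     <http://example.org/shapes#> .   # hypothetical namespace

:MandatoryDataset a sh:NodeShape ;
    rdfs:label "Mandatory dataset" ;
    # The focus node is the class IRI itself, not the string "dcat:Dataset".
    sh:targetNode dcat:Dataset ;
    sh:property [
        # Follow rdf:type backwards: the values are the instances of dcat:Dataset.
        sh:path [ sh:inversePath rdf:type ] ;
        # At least one instance must exist in the data graph.
        sh:minCount 1 ;
    ] .
```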

older version of SHACL?

I see in the latest proposed recommendation

2017-01-19: Removed sh:predicate in favor of using sh:path only (ISSUE-217).

This SHACL file is still using sh:predicate
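
For illustration, a property constraint written in the working-draft style (shown as comments) next to its equivalent in the syntax of the final Recommendation. The `ex:` namespace and shape name are hypothetical, and the explicit `sh:targetClass` is an assumption about how the shape would be attached to dcat:Dataset.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/shapes#> .   # hypothetical namespace

# Working-draft style, as used in the file at the time:
#   dcat:Dataset rdf:type sh:Shape ;
#     sh:property [ sh:predicate dct:conformsTo ; sh:class dct:Standard ] .

# Same constraint with sh:path instead of sh:predicate:
ex:DatasetShape a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path dct:conformsTo ;
        sh:class dct:Standard ;
    ] .
```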

test suite

Are there plans to develop a test suite?

Importing external vocabularies

Raised by Paul Hermans:
In the OWL/SHACL all used external vocabs are imported, which is good.
However, some of those imported ones use other vocabs without importing them (e.g. dcat itself).
It uses e.g. classes from http://purl.org/dc/dcmitype/ but without importing.
It might be an option to import these second-line vocabs in the application profile vocab itself.
This shows up more nicely in some IDEs.

Use of sh:hasValue

Raised by Håvard Mikkelsen Ottestad:

The way GeoNamesRestriction etc. are modelled with sh:hasValue won’t work (AFAIK). It’ll just force one of your frequencies to be the node http://publications.europa.eu/resource/authority/frequency
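
One possible alternative, sketched under assumptions: instead of requiring the property value to be the scheme itself, require each value to be a concept that is `skos:inScheme` the frequency scheme. The `ex:` namespace and shape names are hypothetical, and the use of `dct:accrualPeriodicity` is inferred from the mention of frequencies. The check only works if the controlled vocabulary is loaded into the data graph.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/shapes#> .   # hypothetical namespace

ex:DatasetFrequencyShape a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path dct:accrualPeriodicity ;
        # every frequency value must conform to the restriction below
        sh:node ex:FrequencyRestriction ;
    ] .

ex:FrequencyRestriction a sh:NodeShape ;
    sh:property [
        sh:path skos:inScheme ;
        # here sh:hasValue is applied to the concept's scheme, not to the property value
        sh:hasValue <http://publications.europa.eu/resource/authority/frequency> ;
    ] .
```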

Limiting number of prefLabels

Raised by Håvard Mikkelsen Ottestad:

Another rule I’ve looked at is limiting the number of prefLabels. By design you shouldn’t have two prefLabels unless they have different languages, but you can have a prefLabel with a language and one without at the same time. This can be done by using a combination of sh:uniqueLang and maxCount. I discussed this on the SHACL mailing lists and got two solutions (that were rather complicated): https://lists.w3.org/Archives/Public/public-rdf-shapes/2017Jun/0000.html https://lists.w3.org/Archives/Public/public-rdf-shapes/2017May/0042.html
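
A simplified sketch of such a rule, which is not necessarily either of the solutions from the mailing-list threads. It combines `sh:uniqueLang` (at most one label per language tag) with a qualified count on plain `xsd:string` literals (at most one label without a language tag), swapping in `sh:qualifiedMaxCount` for a plain `sh:maxCount`. The `ex:` namespace and shape name are hypothetical.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/shapes#> .   # hypothetical namespace

ex:ConceptLabelShape a sh:NodeShape ;
    sh:targetClass skos:Concept ;
    # No two prefLabels may share the same language tag.
    sh:property [
        sh:path skos:prefLabel ;
        sh:uniqueLang true ;
    ] ;
    # At most one prefLabel without a language tag
    # (language-tagged literals have datatype rdf:langString, so they are not counted).
    sh:property [
        sh:path skos:prefLabel ;
        sh:qualifiedValueShape [ sh:datatype xsd:string ] ;
        sh:qualifiedMaxCount 1 ;
    ] .
```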

vcard:Kind fn mandatory property ?

Maybe a vcard:fn should be added to the vcard:Kind shape in dcat-ap.shapes.ttl

The vCard / RDF spec https://www.w3.org/TR/vcard-rdf/ (part 2.3) mentions that fn or hasFN is a (actually, the only) mandatory vCard property

Something like:

sh:property [
    sh:predicate vcard:fn ;
    sh:minCount 1 ;
    sh:nodeKind sh:Literal ;
] ;

Mandatory class test

Shall we check for the presence of mandatory class instances in the data-set and, if so, which ones?

Currently there are three shapes, :MandatoryAgent, :MandatoryCatalog and :MandatoryDataset, that check for the presence of at least one instance of foaf:Agent, dcat:Catalog and dcat:Dataset.

Shall we keep or remove them?

PS: this leads me to a broader question: when we validate a data-set, what do we want to verify?

that it conforms to the shape file or
that it is not violating a set of shapes
