semiceu / dcat-ap_shacl

DCAT-AP SHACL constraint definitions

Home Page: https://github.com/SEMICeu/dcat-ap_shacl

XSLT 98.98% CSS 0.85% JavaScript 0.17%
shacl application-profile dcat dcat-ap rdf validation shapefile

dcat-ap_shacl's Introduction

Archiving note: the shacl definitions are part of DCAT-AP specifications.

DCAT-AP SHACL constraint definitions

The Application Profiles (AP) offered here are sets of integrity constraints used to check the physical and logical correctness or rationality of a given dataset. They are the SHACL specifications for the DCAT Application Profile.

The rules of thumb page offers insights into how the shapes were created.

The Source Structure

  • ./shacl folder contains the latest version of the official DCAT-AP SHACL expression.
  • ./resources folder contains the latest version of the controlled vocabularies required by dcat-ap-mdr-vocabularies.shapes.rdf. Note that in order to validate against controlled vocabularies, all of them need to be loaded as part of the data graph.
  • ./test folder contains RDF unit tests that can be executed with the free version of TopBraid Composer. Currently they offer partial coverage for DCAT-AP; more unit tests shall be created from real data.
  • ./dev folder contains experimental work in progress and other sources. For example, shapeview.ui.ttlx can be used to generate the DCAT-AP HTML documentation from its SHACL implementation.
  • ./documentation folder contains the static HTML generated automatically using the SWP technology. The source code for generating this document is in ./dev/shapeview.app

The scope of the current release

  • dcat-ap.shapes.ttl validates instance shapes with respect to:

    • cardinality constraints, both for properties that have a minimum occurrence greater than or equal to 1 (for example, there needs to be at least one occurrence of the mandatory property title of a Dataset) and for properties that have an explicit maximum occurrence (for example, there can be a maximum of one licence for a Distribution)
    • restrictions concerning the ranges of properties (expressed as sh:nodeKind), so that values violating range declarations are reported irrespective of whether the property is mandatory, recommended or optional.

  • dcat-ap-mandatory-classes.shapes.ttl validates that the data includes the mandatory classes.

  • dcat-ap-mdr-vocabularies.shapes.ttl validates that values of a property are taken from the mandatory controlled vocabularies.
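
As an illustration of the two kinds of constraint described for dcat-ap.shapes.ttl, the following is a minimal sketch written against the final SHACL Recommendation (sh:path). It is not copied from the repository: the ex: namespace and the shape names are hypothetical, and the node kinds chosen are assumptions made for the example.

```turtle
@prefix sh:      <http://www.w3.org/ns/shacl#> .
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/shapes#> .   # hypothetical namespace

# Minimum cardinality: every dcat:Dataset needs at least one title.
ex:DatasetShape
    a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path dcterms:title ;
        sh:minCount 1 ;
        sh:nodeKind sh:Literal ;          # range check expressed as a node kind
    ] .

# Maximum cardinality: a dcat:Distribution has at most one licence.
ex:DistributionShape
    a sh:NodeShape ;
    sh:targetClass dcat:Distribution ;
    sh:property [
        sh:path dcterms:license ;
        sh:maxCount 1 ;
        sh:nodeKind sh:BlankNodeOrIRI ;   # assumed: the licence is a resource, not a literal
    ] .
```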

Usage Notes

To execute these shape files you can use the shacl-cl tool, built on top of TopQuadrant's implementation of the SHACL API, or any other implementation of the SHACL standard.

dcat-ap_shacl's People

Contributors

andrea-perego, bertvannuffelen, costezki, emidiostani, espinr

dcat-ap_shacl's Issues

sh:name question

Very minor remark (probably due to the software tool being used I guess)

dcat-ap.shapes.ttl line 206 reads: sh:name "Catalog"@en.
It appears to be the only shape with a sh:name (and also the only literal with a language tag); adms:Identifier, for example, does not use it. Should this be set on the other shapes (for viewing / reporting purposes)?

On a related note, sh:name is not used on foaf:name (line 651), which seems to be the only sh:property not setting a sh:name

Readability of the RDF/XML file

  • The file would benefit from commentary, at least to indicate the various sections. Maybe even an index to show where the various sections begin?
  • It could be good to organise lists alphabetically, e.g. the list of namespaces at the start. Also, there is no obvious order in the properties in the description of the vocabulary in lines 42-168; maybe they could be grouped together alphabetically per namespace?
  • It’s a bit hard to figure out, but it seems that the class shape of the Distribution is described inside the property shape for dataset-sample. The class shape for Dataset starts at line 208 and ends at line 626, Distribution from line 237 to 424, CatalogRecord 627-692 and Catalog 639-818. Should the class shape of Distribution not be on the same hierarchical level as the shapes for Dataset, Catalog and CatalogRecord?
  • The property shapes under the classes do not seem to be ordered in any way. It could be useful to do the mandatory ones first, then the recommended ones and the optional ones last.

why sh:class rdfs:Resource ?

A few shapes in dcat-ap.shapes.ttl contain the rule "sh:class rdfs:Resource"
Which seems odd, since rdfs:Resource is really anything...

Can this just be ignored / removed ?
Or replaced with sh:nodeKind sh:IRIOrLiteral (assuming that's what the rule is about)?
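
To make the suggestion concrete, here is a hedged sketch of the two variants. A SHACL processor that performs no RDFS inference only accepts a value for sh:class rdfs:Resource if the data graph explicitly types it (directly or through rdfs:subClassOf) as rdfs:Resource, whereas sh:nodeKind looks only at the kind of node. The ex: namespace, the shape names and the choice of dcterms:relation are assumptions made for illustration.

```turtle
@prefix sh:      <http://www.w3.org/ns/shacl#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/shapes#> .   # hypothetical namespace

# Variant in question: without inference, values must carry an explicit
# rdf:type rdfs:Resource triple to pass.
ex:RelationWithClass
    a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path dcterms:relation ;
        sh:class rdfs:Resource ;
    ] .

# Suggested alternative: constrain only the kind of node.
ex:RelationWithNodeKind
    a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path dcterms:relation ;
        sh:nodeKind sh:IRIOrLiteral ;
    ] .
```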

schema:startDate minCount

In dcat-ap.shapes.ttl (line 138) minCount is 1 for startDate, so that would mean that a start date is always mandatory ?

Which makes sense, but DCAT-AP 1.1 spec itself allows a cardinality of 0..1 ("4.11 Period of Time"), so I'd suggest to remove the minCount (and/or take this into account for the next DCAT-AP revision)
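
A sketch of what the relaxed constraint could look like, assuming the shape targets dcterms:PeriodOfTime and uses the final SHACL syntax (sh:path); the ex: namespace and the shape name are hypothetical.

```turtle
@prefix sh:      <http://www.w3.org/ns/shacl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix schema:  <http://schema.org/> .
@prefix ex:      <http://example.org/shapes#> .   # hypothetical namespace

ex:PeriodOfTimeShape
    a sh:NodeShape ;
    sh:targetClass dcterms:PeriodOfTime ;
    sh:property [
        sh:path schema:startDate ;
        sh:maxCount 1 ;     # cardinality 0..1: no sh:minCount is declared
    ] .
```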

sh:Violation or sh:Warning

Raised by Håvard Mikkelsen Ottestad:

Everything is sh:Violation. I used sh:Warning for things that were optional/recommended, for instance with minCount 1 for dcat:distribution. This approach may not be the best though.
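
The approach described above can be sketched as follows. When no sh:severity is given, SHACL treats a result as sh:Violation; declaring sh:Warning on the property shape downgrades it. The ex: namespace, the shape name and the message text are hypothetical.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/shapes#> .   # hypothetical namespace

ex:DatasetRecommendedShape
    a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path dcat:distribution ;
        sh:minCount 1 ;
        sh:severity sh:Warning ;    # reported, but not as a violation
        sh:message "A dataset should have at least one distribution."@en ;
    ] .
```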

to be done

  • Document where to find what (version of old standard and tool support vs new standard )
  • It would be nice in the future to have :warnings and :info constraints in separate files [recommended and optional fields for the future]
  • Delete all non essential statements

TODO: by 5th of July
TODO: send a message with the latest version

#16 - check if mandatory or not

#17 -

#18 - double check it works

#20 - make sure they are URIs only;

#21 - point out that it is for SKOS-AP, thank for suggestion,

#22 - remove type check and just make sure it is an URI

#23 - comment that it is not a problem

#24 - tool problem

#25 - relax the min card to 0 for startDate to comply with the specification

#26 - delete all non essential statements, check and remove all sh:name, sh:description, (but not sh:message)

#27 - not implementing, similar to #21

double implementations

We need:

  • one implementation for old standard with reliable tool support
  • one implementation with the latest standard specification (with maybe shaky tool support)

related to #17

Class validation (explicit)

You declare a violation in property shape dataset-conformsTo if the object (sh:class) is not a http://purl.org/dc/terms/Standard.

dcat:Dataset
  rdf:type sh:Shape ;
  sh:property [
      sh:predicate dcterms:conformsTo ;
      sh:class dcterms:Standard ;
      sh:name "conforms to" ;
      sh:severity sh:Violation ;
    ] ;

My [Makx] question is how you can validate this in a case where the object does not explicitly declare itself to be a dct:Standard?

In addition, in OWL terms, doesn’t something become a dct:Standard as a result of the fact that it is made the object of dct:conformsTo? I guess the question applies to all class validations.
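
The question can be illustrated with a small data graph. SHACL validation does not apply RDFS or OWL inference unless entailment is performed beforehand, so sh:class only sees type information that is explicitly present in the data graph. The ex: resources below are hypothetical.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/data#> .   # hypothetical namespace

ex:dataset1
    a dcat:Dataset ;
    dcterms:conformsTo ex:standardA , ex:standardB .

# Explicitly typed, so it satisfies sh:class dcterms:Standard.
ex:standardA a dcterms:Standard .

# ex:standardB carries no rdf:type triple. Without inference, a SHACL
# processor reports it against sh:class dcterms:Standard, even though an
# RDFS or OWL reasoner could derive the type from the range of
# dcterms:conformsTo.
```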

DCTerms relations missing in the rdf distribution

Surprisingly, a range of DCT relations are listed in the documentation on the DC site but do not appear in any of the RDF distributions I have found.

Using these properties without importing them has a consequence: the editors add their definitions to the source file by default.

The list of relations is the following:

 <!-- http://purl.org/dc/terms/conformsTo -->

    <owl:ObjectProperty rdf:about="http://purl.org/dc/terms/conformsTo"/>
    
    <!-- http://purl.org/dc/terms/creator -->

    <owl:ObjectProperty rdf:about="http://purl.org/dc/terms/creator"/>
    
    <!-- http://purl.org/dc/terms/license -->

    <owl:ObjectProperty rdf:about="http://purl.org/dc/terms/license"/>
    
    <!-- http://purl.org/dc/terms/publisher -->

    <owl:ObjectProperty rdf:about="http://purl.org/dc/terms/publisher"/>
    
    <!-- http://purl.org/dc/terms/relation -->

    <owl:ObjectProperty rdf:about="http://purl.org/dc/terms/relation"/>
    
    <!-- http://purl.org/dc/terms/rightsHolder -->

    <owl:ObjectProperty rdf:about="http://purl.org/dc/terms/rightsHolder"/>
    <!-- http://purl.org/dc/terms/accessRights -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/accessRights"/>
    


    <!-- http://purl.org/dc/terms/accrualPeriodicity -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/accrualPeriodicity"/>
    


    <!-- http://purl.org/dc/terms/conformsTo -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/conformsTo"/>
    


    <!-- http://purl.org/dc/terms/description -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/description"/>
    


    <!-- http://purl.org/dc/terms/format -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/format"/>
    


    <!-- http://purl.org/dc/terms/hasPart -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/hasPart"/>
    


    <!-- http://purl.org/dc/terms/hasVersion -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/hasVersion"/>
    


    <!-- http://purl.org/dc/terms/isPartOf -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/isPartOf"/>
    


    <!-- http://purl.org/dc/terms/isVersionOf -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/isVersionOf"/>
    


    <!-- http://purl.org/dc/terms/issued -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/issued"/>
    


    <!-- http://purl.org/dc/terms/language -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/language"/>
    


    <!-- http://purl.org/dc/terms/license -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/license"/>
    


    <!-- http://purl.org/dc/terms/modified -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/modified"/>
    


    <!-- http://purl.org/dc/terms/provenance -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/provenance"/>
    


    <!-- http://purl.org/dc/terms/publisher -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/publisher"/>
    


    <!-- http://purl.org/dc/terms/relation -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/relation"/>
    


    <!-- http://purl.org/dc/terms/rights -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/rights"/>
    


    <!-- http://purl.org/dc/terms/source -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/source"/>
    


    <!-- http://purl.org/dc/terms/spatial -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/spatial"/>
    


    <!-- http://purl.org/dc/terms/temporal -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/temporal"/>
    


    <!-- http://purl.org/dc/terms/title -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/title"/>
    


    <!-- http://purl.org/dc/terms/type -->

    <owl:NamedIndividual rdf:about="http://purl.org/dc/terms/type"/>
    


    <!-- http://schema.org/endDate -->

    <owl:NamedIndividual rdf:about="http://schema.org/endDate"/>
    


    <!-- http://schema.org/startDate -->

    <owl:NamedIndividual rdf:about="http://schema.org/startDate"/>

dcat:mediaType value constraint

To my knowledge IANA does not provide a list of URIs for MIME types. Is there an alternative to that?
Otherwise we cannot set a validation constraint on the dcat:mediaType controlled vocabulary.

tosh namespace

Probably auto-inserted. It seems that the shacl files don't really use TopBraid's tosh, so I'd suggest to remove the tosh namespace prefix from the files.

shex

Really interesting work! Quick q: did you consider shex at all, or make any investigations into the relative expressivity / usability of these two formats?

Publisher defined as skos:Concept?

Raised by Martín Álvarez Espinar:

In dcat-ap-mdr-vocabularies.shapes.ttl,
dcat:Catalog and dcat:Dataset publishers are considered to be IRIs typed as skos:Concept. In this case I suppose all the publishers are listed in the Corporate body NAL, right? If not, both the class restriction (skos:Concept) and the node type (which should be sh:BlankNodeOrIRI) seem odd.

vCard vocabulary

Raised by Paul Hermans:
The vcard:Kind class is used, but in TopBraid Composer the vCard ontology loaded is still the previous one where vCard:Kind doesn’t exist.
This is related to the fact, as you probably know, that vCard is being substantially refactored, but still using the same ns identifier.
I don’t have wisdom on this one, but it can be very confusing.

Non essential statements (Asset metadata)

You have added quite a lot of non-essential assertions, e.g. org:memberOf, foaf:homepage, vs:term_status, odrs:copyrightNotice etc. Also, not all of the declared namespace prefixes are used in the file, e.g. geo, dc, gr, wot etc. Maybe we should limit the contents to the bare minimum to make the file smaller and easier to follow.

Shape dcat:Dataset

I don't see following properties included:

  • dcat:keyword
  • dct:identifier
  • adms:versionNotes

inverse Path constraint

:MandatoryDataset
  rdf:type sh:NodeShape ;
  rdfs:comment "Mandatory dataset " ;
  rdfs:label "Mandatory dataset" ;
  sh:property [
      sh:minCount 1 ;
      sh:path [
          sh:inversePath rdf:type ;
        ] ;
    ] ;
  sh:targetNode "dcat:Dataset" ;
.

The targetNode should not be a string.
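
A sketch of the corrected shape, with the target given as an IRI rather than a string literal. The focus node is then the class dcat:Dataset itself, and the inverse rdf:type path requires at least one resource in the data graph to be typed as dcat:Dataset. The namespace bound to the empty prefix is hypothetical.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix :     <http://example.org/shapes#> .   # hypothetical namespace

:MandatoryDataset
    rdf:type sh:NodeShape ;
    rdfs:label "Mandatory dataset" ;
    sh:targetNode dcat:Dataset ;            # an IRI, not the string "dcat:Dataset"
    sh:property [
        sh:path [ sh:inversePath rdf:type ] ;
        sh:minCount 1 ;                     # at least one instance must exist
    ] .
```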

older version of SHACL?

I see in the latest proposed recommendation

2017-01-19: Removed sh:predicate in favor of using sh:path only (ISSUE-217).

This SHACL file is still using sh:predicate
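
As a hedged sketch of the migration, the dataset-conformsTo property shape quoted in the class validation issue could be rewritten for the final SHACL Recommendation by replacing sh:predicate with sh:path and typing the shape as sh:NodeShape. The ex: shape name and the use of sh:targetClass are assumptions; the repository may organise its shapes differently.

```turtle
@prefix sh:      <http://www.w3.org/ns/shacl#> .
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/shapes#> .   # hypothetical namespace

ex:DatasetShape
    a sh:NodeShape ;                         # replaces the older sh:Shape
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path dcterms:conformsTo ;         # replaces sh:predicate
        sh:class dcterms:Standard ;
        sh:name "conforms to" ;
        sh:severity sh:Violation ;
    ] .
```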

test suite

Are there plans to develop a test suite?

Importing external vocabularies

Raised by Paul Hermans:
In the OWL/SHACL all used external vocabs are imported which is good.
However some of those imported ones use other vocabs without importing those (e.g. dcat itself).
It uses e.g. classes from http://purl.org/dc/dcmitype/ but without importing.
It might be an option to import these second line vocabs in the application profile vocab itself.
This shows up nicer in some IDE’s.

Limiting number of prefLabels

Raised by Håvard Mikkelsen Ottestad:

Another rule I’ve looked at is limiting the number of prefLabels. By design you shouldn’t have two prefLabels unless they have different languages, but you can have a prefLabel with a language and one without at the same time. This can be done by using a combination of sh:uniqueLang and maxCount. I discussed this on the SHACL mailing lists and got two solutions (that were rather complicated): https://lists.w3.org/Archives/Public/public-rdf-shapes/2017Jun/0000.html https://lists.w3.org/Archives/Public/public-rdf-shapes/2017May/0042.html
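
One possible way to express this rule, offered as a sketch and not necessarily either of the solutions from the linked threads: sh:uniqueLang covers the language-tagged labels, and a qualified count limits the labels without a language tag (plain xsd:string literals). The ex: namespace and the shape name are hypothetical.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/shapes#> .   # hypothetical namespace

ex:PrefLabelShape
    a sh:NodeShape ;
    sh:targetClass skos:Concept ;
    # At most one prefLabel per language tag.
    sh:property [
        sh:path skos:prefLabel ;
        sh:uniqueLang true ;
    ] ;
    # At most one prefLabel without a language tag.
    sh:property [
        sh:path skos:prefLabel ;
        sh:qualifiedValueShape [ sh:datatype xsd:string ] ;
        sh:qualifiedMaxCount 1 ;
    ] .
```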

vcard:Kind fn mandatory property ?

Maybe a vcard:fn should be added to the vcard:Kind shape in dcat-ap.shapes.ttl

The vCard / RDF spec https://www.w3.org/TR/vcard-rdf/ (part 2.3) mentions that fn or hasFN is a (actually, the only) mandatory vCard property

Something like:

sh:property [
sh:predicate vcard:fn ;
sh:minCount 1 ;
sh:nodeKind sh:Literal ;
] ;

Mandatory class test

Shall we check for presence of mandatory class instances in the data-set and if so which ones?

Currently there are three shapes, :MandatoryAgent, :MandatoryCatalog and :MandatoryDataset, that check for the presence of at least one instance of foaf:Agent, dcat:Catalog and dcat:Dataset respectively.

Shall we keep or remove them?

PS: this leads me to a broader question: when we validate a data-set, what do we want to verify?

  • that it conforms to the shape file, or
  • that it does not violate a set of shapes

Handling of external resources

Raised by Håvard Mikkelsen Ottestad:

How are you going to handle external resources in general, for instance for licenses? I have not found a good way of handling this, and ended up loading in a bunch of external rdf files so that the user can reference any resources in those files.

Spelling errors

Some spelling errors, e.g. dcterms:RigthsStatement, line 60 and 63
