semiceu / style-guide

SEMIC style guide to create reusable vocabularies and application profiles

Home Page: https://semiceu.github.io/style-guide/

License: Creative Commons Attribution 4.0 International

Handlebars 3.13% Makefile 1.51% CSS 94.08% TypeScript 1.28%

style-guide's Introduction

SEMIC Style Guide

SEMIC style guide to create reusable vocabularies

Contributing

You are more than welcome to help expand and mature this project.

When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with the owners of this repository before making a change.

Please note that we adhere to a Code of Conduct; please follow it in all your interactions with the project.

Licence

The documents, such as reports and specifications, are licenced under a CC BY 4.0 licence.

The source code and other scripts are licenced under the EUPL v1.2 licence.

Self-assessment & Validation

Wondering how well your own semantic data specifications adhere to the rules listed in the Style Guide? The Interoperability Test Bed offers the SEMIC Style Guide Validator as a service. The SEMIC Style Guide Validator allows the user to upload a model in either UML (XMI), SHACL or OWL format and validate it against the rules expressed in the Style Guide. The Style Guide Validator is available in the following formats:

Documentation on the Style Guide validator can be found on Joinup or on the respective pages on the ITB.

style-guide's People


bertvannuffelen natasasofou

style-guide's Issues

Artefact generation

Quote from https://semiceu.github.io/style-guide/public-review/arhitectural-clarifications.html

Relations between the artefacts are depicted in the figure above. 
The conceptual model is the source from which a) the data shapes, 
b) the formal ontology and c) the data specification document can 
be generated.

This may not necessarily be true, as the conceptual model would not contain enough detail
to generate technical artefacts. We already address this issue in modelling by using different
models: Conceptual, Logical, Physical. As a result, I would propose considering another
level below the conceptual one that could be used as a single source of truth to generate,
perhaps even automatically, the other artefacts.

Note: It could be that you already consider all the necessary information to be available
in the conceptual model. For me, the conceptual model captures only the core entities,
with little detail.

Bad subclass example

In § Explicit depiction of external dependencies. In the example, Person inherits from foaf:Person. This contradicts a rule I think I saw elsewhere in the Style Guide, which says that subclassing without added semantic value is not allowed. That is the case here: a Person cannot be a specialisation of a Person.

Chapters 7.9, Abstract classes + 7.17 Element stereotypes

There is a conflict between these sections. 7.9 suggests an ability of marking classes abstract, but 7.17 advises against using any kind of stereotypes.
There is a valid use-case for expressing abstract classes, but this can be done with SHACL constraints. On the conceptual side, abstract classes are not a coherent concept: every OWL class - be it as abstract as possible - will encompass every individual of its subclasses. Thus, every OWL instance is also a member of owl:Thing, even though Thing could itself be described as an "abstract class".

rdfs:label vs skos:prefLabel

This one is more of a developer concern.
Quote:

In OWL, - the original label is adopted as is (e.g. either rdfs:label or skos:prefLabel) - 
the new label is added as an alternative label (e.g. skos:altLabel)

Imagine a situation where I would like to use the OWL to generate a visual for a user. As I see it,
I have to check for both original labels (rdfs:label and skos:prefLabel). In addition, which one
should I choose when both are available? Is there a benefit to having two label predicates
for a class in a lightweight ontology?

I know this may not be a concern from data modeling standpoint, yet it can be for
a developer that has to work with the lightweight-ontology.
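
To make the developer concern concrete, here is a minimal sketch of the lookup a consumer would have to write. The precedence order (skos:prefLabel, then rdfs:label, then skos:altLabel) is purely an assumption for illustration; the Style Guide does not define one, and the `pick_label` helper and the dictionary layout are hypothetical.

```python
from typing import Optional

# Assumed precedence for illustration only; the Style Guide does not prescribe an order.
LABEL_PRECEDENCE = ("skos:prefLabel", "rdfs:label", "skos:altLabel")


def pick_label(annotations: dict[str, dict[str, str]], lang: str = "en") -> Optional[str]:
    """Return the first label found for `lang`, following LABEL_PRECEDENCE.

    `annotations` maps a label predicate to a {language tag: text} dictionary.
    """
    for predicate in LABEL_PRECEDENCE:
        text = annotations.get(predicate, {}).get(lang)
        if text:
            return text
    return None


# Hypothetical class that carries both "original label" predicates.
person = {
    "rdfs:label": {"en": "Person"},
    "skos:prefLabel": {"en": "Natural person"},
}
print(pick_label(person))  # Natural person
```

If only one predicate were allowed for the original label, the precedence table and the fallback loop would not be needed at all.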

Provide better titles for the rules (MC-R2 and MC-R3, and perhaps others)

Provide titles to rules/conventions that are "context-free" (i.e. they make sense even when read outside of the surrounding text/page where they appear), and suggestive (i.e. provide an appropriate summarization of the rule).

E.g. the current titles of MC-R2 ("On goals and scope") and MC-R3 ("On modularisation") are not very good titles. We could make them "Recommendations on goals and scope" and "Recommendations on modularisation", or, if you want to be more specific, "Define a clear goal and scope" and "Consider breaking up complex domain models in modules".

Other titles that could be improved, include: GC-R4, GC-R5, PC-R6, but in general all rule titles should be reviewed and improved where necessary.

IRIs (Chapter 7.3, Element names and URIs)

In our system, we use IRIs, as they are the basis of all the primary LD standards, including the upcoming ones (RDF-star, SPARQL-star), and also of HTML5.
Support for Unicode is de facto expected for all modern systems.
We recognize the HTTP request limitation of having to percent-encode IRIs to URIs, that punycoded domains further complicate things, and that displayed IRIs can be used in homograph attacks. These are, however, things that we cannot escape when operating in a larger environment of linked data. Thus, we see no harm in minting IRIs: in our experience most IRIs are URIs anyway, and in cases where Unicode glyphs are needed, there are good justifications (e.g. utilising the set of glyphs in European languages).
When minting, we can ensure that no collisions with percent-encoded versions can occur, and that only a well-defined subset of Unicode is allowed (there is no good reason to allow all of the Unicode glyphs allowed in e.g. NCNames).
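
As an illustration of the percent-encoding and punycode points above, the following sketch maps an IRI to its URI form and uses that mapping to detect collisions at minting time. It is a simplification under stated assumptions: it relies on Python's built-in `idna` codec (IDNA 2003), ignores userinfo, and the `iri_to_uri` and `collides` helpers as well as the example IRIs are hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched when encoding; "%" is kept so existing escapes are not re-encoded.
SAFE_PATH = "/%:@!$&'()*+,;=-._~"
SAFE_QUERY = "/?%:@!$&'()*+,;=-._~"


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI: punycode the host, percent-encode non-ASCII as UTF-8."""
    parts = urlsplit(iri)
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=SAFE_PATH),
        quote(parts.query, safe=SAFE_QUERY),
        quote(parts.fragment, safe=SAFE_QUERY),
    ))


def collides(iri_a: str, iri_b: str) -> bool:
    """Two distinct identifiers collide if they map to the same URI."""
    return iri_a != iri_b and iri_to_uri(iri_a) == iri_to_uri(iri_b)


print(iri_to_uri("http://example.org/id/Gebäude"))
print(collides("http://example.org/id/Gebäude", "http://example.org/id/Geb%C3%A4ude"))
```

A minting service could run a check like `collides` against already registered identifiers before accepting a new IRI.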

Contents navigation bars need to be improved

Make sure that all "Contents" navigation bars (that appear on the right hand side of the pages), are complete and accurate.

For one, the "Contents" of the Guidelines and conventions page is incomplete (it lists only half of the available sub-pages).

It would also be very valuable to have links to the individual rules in the "Contents" navigation bar. It would be nice, perhaps, to add them as third-level list elements in the "Contents" bar on the "Guidelines and conventions" overview page, if this would not be too ugly, complicated, or impossible. Regardless of that, they should at least appear on the individual subpages (e.g. General conventions, Conceptual model conventions (UML)), where there is currently no navigation at all that would allow one to jump directly to a specific rule.

rdfs:subPropertyOf missing in SC-R2

In SC-R2 rdfs:subPropertyOf is missing. However, its usage is suggested multiple times in Clarifications on "reuse", which I agree with.

Provide a consistent ordering of guideline groups (especially concerning the "Methodology Conventions" group)

The "Methodology Conventions" group is listed in a different order (as second) on the "Guidelines and Conventions" page than in other places (navigation bars, index page, etc., where it is listed in the next to last position).

We should do our best to be consistent with the ordering of the rule groups across the entire Style Guide. The question is: What is the best place for the "Methodology conventions" section to be listed at?
Is it the next to last position (just before the Publication Conventions), the second position (after the "General conventions", but before "Conceptual model conventions"), or perhaps should it be the very first group?

I think that the first position is perhaps the most appropriate one for the "Methodology Conventions" group, especially if we look at the rules MC-R1 ("Follow a methodology") and MC-R2 ("Scope and goals definition"), which should happen at the very beginning of the data specification development process, but MC-R3 ("Modularisation") might not be relevant in many use cases, so it might seem inappropriate to some to have this rule listed before the rules in the "General conventions" section.

@costezki, @bertvannuffelen, @EmidioStani, @NatasaSofou, @pfragkou or anyone else interested in this topic, please let me know what you think, in the comments below, and I am happy to implement whatever solution ends up as the clear winner.

URI vs IRI

Rule Acronym: PC-R5 states:
Any URI identifiable resource devised in a data specification shall be dereferenceable.
Later there is
Dereferencing means that one can use the URI as an URL to retrieve related information back.

Why not require use of URL in the first place?
Is this done so IRIs are not used to identify resources?

Does "Guidelines and conventions" contain all rules?

It seems that the section "Guidelines and conventions" contains all rules from the individual groups (General conventions, Conceptual model conventions (UML), ...). There are even links to those sections, e.g. https://semiceu.github.io/style-guide/public-review/guidelines-and-conventions.html#_general_conventions but the link from the menu goes to: https://semiceu.github.io/style-guide/public-review/gc-general-conventions.html

As a result it is not clear whether the "Guidelines and conventions" contains all rules or not.

UML Composition for Application Profiles

UML class diagrams often use composition relations (the black lozenge) to express that the individual at the other end of the relation can’t live without its owner, e.g. a bike has two wheels, – when the bike dies, so do the wheels.

This notion of life cycle is missing from SEMIC, probably because it's not a conceptual consideration.

UML modellers will need explaining why the different aggregation types are absent. Secondly, the life cycle does become relevant when creating an AP. I would like SHACL to warn me that I define a bike without wheels and vice versa.

Suggestions:

Consider introducing the black lozenge connector or
Justify in the style guide the absence of the composition relation because it's equivalent to an attribute in terms of life cycle
Introduce a SHACL shape that checks that a part (e.g. a wheel) has an owner (the bicycle).

Final remark; imho SEMIC may wish to state that the white lozenge is equivalent to the plain association
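
To show what the third suggestion would check, here is a plain-Python sketch that mirrors the closed-world test such a SHACL shape would perform (in SHACL itself this might be expressed with an inverse path and a minimum count, but that is my assumption, not something the Style Guide states). The triples, the `ex:` names and the `parts_without_owner` helper are all hypothetical.

```python
Triple = tuple[str, str, str]


def parts_without_owner(triples: set[Triple], part_class: str, owner_property: str) -> set[str]:
    """Return every instance of `part_class` that nothing points to via `owner_property`."""
    parts = {s for s, p, o in triples if p == "rdf:type" and o == part_class}
    owned = {o for s, p, o in triples if p == owner_property}
    return parts - owned


# Hypothetical instance data: wheel2 has no bicycle that owns it.
data = {
    ("ex:bike1", "rdf:type", "ex:Bicycle"),
    ("ex:wheel1", "rdf:type", "ex:Wheel"),
    ("ex:wheel2", "rdf:type", "ex:Wheel"),
    ("ex:bike1", "ex:hasWheel", "ex:wheel1"),
}
print(parts_without_owner(data, "ex:Wheel", "ex:hasWheel"))  # {'ex:wheel2'}
```

The opposite direction (a bike without wheels) would be the same kind of check with the roles of subject and object swapped.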

No class datatypes

In § Attribute definition and usage. As with issue #x, no explanation is given. If the explanation is that it makes diagrams simpler, that is not true: using a class as a datatype simplifies most diagrams, as there is no association line on the diagram. And the toolchain turns associations into object properties anyhow.

A technical comment/question to Human-readable form (PC-R6)

Human-readable form (PC-R6) says "Each artefact shall have a corresponding human-readable form representing the model documentation". An artefact according to Data specification and artefact types may be among other things a formal ontology expressed in OWL 2 or data shapes expressed in SHACL, which according to PC-R6 therefore also shall have human-readable form?

Or does PC-R6 actually want to say "Each data specification shall have a corresponding human-readable form representing the model documentation"?

Cf., however, Issue #55 about human-readable documentation of shapes.

Atomic datatypes

In § Attribute definition and usage. It is fine that you would like to stick with atomic datatypes, but the reason is not explained. Structured datatypes are rather common, e.g. Geometry is definitely not a class: you would not normally assign an identifier to a geometry, as it is identified by its values.

Issues with HTML rendering of the style guide

There are some rendering issues in the HTML version of the style guide on GitHub pages:

https://semiceu.github.io/style-guide/

I list below those I have identified so far.

Images are not displayed, and the corresponding markup is shown as plain text - e.g., §2.2:

image::../img/uml.png[The main components of the eProcurement ontology and their relation to each other and the UML conceptual model,scaledwidth=99.0%]

References to bibliographic entries and cross-references for sections and figures are not interpreted - see, e.g., §2.2:

The relations between them are represented in Figure #fig:components[1] where each component is represented as an UML package.
...
These formal modules are derived from the conceptual model through model transformations as described in Section #sec:process-approach following the rules laid out in .

Mathcal commands are not interpreted, and shown as plain text - see, e.g., §6.2:

Description logics provide a concise language for OWL axioms and expressions. DLs are characterised by their expressive features. The description logic that supports all class expressions with (>, \bot, \sqcap, \sqcup, \neg, \exists) and (\forall) is known as (\mathcal{ALC}) (which originally used to be an abbreviation for Attribute Language with Complement). For a formal introduction into DL please consult .

The "References" section lists just citation keys, but not the full bibliographic entries.

Optional diagram

In § Data specification and artefact types. I would agree that the kind of diagram would be optional, but not adding a diagram to at least the Application Profile is over-estimating the ability of the reader to see how all the different elements of the model work together to represent the domain.

Prefixes

In § Namespaces and prefixes in element names. Good idea to allow the use of prefixes. But it is not explained how or where to declare these prefixes. I say this based on my current knowledge of the toolchain, which is the one we use at OSLO and not the one used by SEMIC (although both are derived from the same source).

OWL vs OWL 2

To quote a single site (https://semiceu.github.io/style-guide/public-review/arhitectural-clarifications.html):
This is addressed with lightweight ontologies expressed in OWL 2.
formal ontology expressed in OWL

I assume it is still the same version and not OWL 1. Or is this intentional,
with OWL 1 and OWL 2 to be used for different purposes?

Some editorial comments

1. Section Attribute definition and usage, Examples: the text says that the ePO ontology uses the "type" attribute converted to a dependency connector, but the figure doesn't show this.
2. Section Multiplicity of attributes and connectors, Examples:
2a. [0..\*] --> [0..*]
2b. The text says "right" and "left", but the figures are aligned vertically.
3. Section Limited (OWL 2) expressivity, second last paragraph before the examples, see the bold and italic part of the following sentence, which should be corrected: "The need for setting property domain and range constraints shall be is better fulfilled by the data shapes expressed in SHACL language."
4. Section URI dereferencing, second last paragraph, this sentence is incomplete: "The other representations/formats."
5. Section References, the link for [puri-gov-eu] points to joinup in general, not that particular publication.

Chapter 9.4, Shape definitions

Embedding PropertyShapes into NodeShapes may result in problems with reusability and a limitation of SHACL expressivity.
There is no need to limit the use-cases of validation only to contexts where data is in tabular/relational form. Some PropertyShapes can be extremely reusable (for example a shape validating person's national identification number) in many contexts, and thus it is not justified that each application profile reimplements the same validation constraint, instead of using single nationally defined constraint that can be managed and updated in a centralised manner when needed (e.g. the ongoing NiN-update process in Finland).
Our proposal is to treat PropertyShapes as having an identity and being easily reusable in multiple Nodeshapes in multiple application profiles.
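
A small sketch of what we mean by property shapes having an identity. It is plain Python rather than SHACL, and everything in it is hypothetical: the class names, the IRIs, and in particular the identifier pattern, which is a placeholder and not any real national format.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PropertyShape:
    """A property shape with its own identity (IRI), defined once and shared."""
    iri: str
    path: str
    pattern: str


@dataclass
class NodeShape:
    iri: str
    target_class: str
    properties: list[PropertyShape] = field(default_factory=list)

    def violations(self, record: dict[str, str]) -> list[str]:
        """Return the IRIs of the property shapes that `record` fails."""
        return [
            shape.iri
            for shape in self.properties
            if re.fullmatch(shape.pattern, record.get(shape.path, "")) is None
        ]


# Centrally managed shape (hypothetical); the pattern is a placeholder only.
NATIONAL_ID = PropertyShape(
    iri="http://example.org/shapes#NationalIdShape",
    path="ex:nationalId",
    pattern=r"[0-9]{6}-[0-9]{4}",
)

# Two node shapes from two different application profiles reuse it by reference.
citizen_shape = NodeShape("http://example.org/ap1#CitizenShape", "ex:Citizen", [NATIONAL_ID])
patient_shape = NodeShape("http://example.org/ap2#PatientShape", "ex:Patient", [NATIONAL_ID])

print(citizen_shape.violations({"ex:nationalId": "010190-1234"}))  # []
print(patient_shape.violations({"ex:nationalId": "not-an-id"}))
```

Because both node shapes point at the same `NATIONAL_ID` object, a change to the constraint is made once and takes effect in every profile that references it.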

Data examples

It would be interesting to add some recommendations concerning the provision of data examples. As it turns out, data examples significantly enhance the understanding of a data model. These could be provided as artefacts in JSON-LD and/or other formats.

Blog Post: XML Mapping - Question 4: Intrinsic modularity

In the context of the webinar on the review of the Core Vocabularies and the Style Guide blog post on XML Schema the following question was raised:

Is the below proposed approach of structuring Core Vocabularies in different files useful? [Link to slides] [Link to Blog post]

The feedback below was provided during the webinar:

Having different files is useful, however a consolidated presentation is also fine.

The community members are encouraged to use this issue as a platform to provide additional feedback on this question.

Lightweight ontologies (chapter 2.2)

The Style Guide (chapter 2.2, Ontology) states:

In the SEMIC context, we only consider lightweight ontologies [defined in rule SC-R2]

We are afraid that this is too lightweight and limits expressivity too much. The limitation is perhaps due to restriction of UML notation, but makes later usage of data models more challenging.

In our approach, we are going to use OWL profiles that already narrow expressivity, meaning OWL 2 profiles (EL/QL/RL) that are already quite narrow.
Limiting the expressivity of OWL further loses expressive power which is useful in mapping ontologies together, enriching data with inferencing and performing semantic mapping between datasets.
By relying on the "lightweight" approach, we can easily end up in a situation where users merely tag data with commonly interpreted resources. This will make data machine-readable but not machine-interpretable to any meaningful extent, similar to how e.g. DC Terms functions to a large extent.
The primary use for semantic interoperability is the fact that we are harnessing computation to make explicit the implicit links and features within data structures, and to allow for clear separation of what has been asserted and what can be derived from the data. Thus rich inferencing allows us to avoid situations where derived records would have to be stored and managed as separate assertions and not as dynamic inferred views into the primary records.

Conceptual model as single source of truth (chapters 2.1 and 7.1)

We think that UML is not a solid enough basis for conceptual modelling. We understand that UML has been chosen, e.g., because its graphical presentation may be easier for business users to comprehend. But this may result in problems in later phases of the process. Therefore, we suggest an approach where the concepts are expressed in OWL, and the tool visualises them automatically or semi-automatically. The visualisation can be presented in a UML-like notation. In short, the single source of truth should be based on a formal model, from which the different representations are derived.

SKOS vocabularies are conceptual models, as are OWL ontologies. SKOS is not logically rigorous to the same extent as OWL (due to the nature of skos:closeMatch, skos:broadMatch etc. properties).
The UML specification is not meant to be a formal verified specification in the sense of being logically internally coherent. It contains conceptual gaps (open definitions) and a wide latitude for interpretation. It is also a very heavy specification and not conceptually "lightweight" at all. In addition, the majority of UML modelers are not well-versed in MOF and have a rather pragmatic OOP-like interpretation of the principal structures. Many modelers only model with a specific stereotype for e.g. producing XML Schemas and see a direct 1:1 mapping between schemas and the UML Class diagram. There are nevertheless major gaps and differences in how class diagrams (and their limits) are interpreted. For example, many assume that polyhierarchies are disallowed by base UML because in their modeling domain it is prohibited. All in all, UML is not a conceptually stable basis, except when interpreting it with a very narrow common denominator in the user-base.
The single source of truth should be based on a formal model, from which the different representations are derived. A single source of truth should be formal, because we do not want to rely on differing conceptual interpretations of the base model. Instead, we want axioms on top of which we can form models and consistently end up with the same interpretations. OWL is based on description logics, which provide this kind of truth. Granted, there is always interpretation in how OWL constructs are applied to a domain, but the constructs themselves are mathematical entities and thus their structural semantics are unambiguous.
There is a fundamental recurring misunderstanding between UML modelers and the RDF domain in what a "class" encompasses. If our formal vocabulary is RDFS/OWL (as proposed by SEMIC), then we are talking about sets, not templates for types. We cannot give the wrong impression to modelers that e.g. attributes do not have an identity, whereas in OWL each DatatypeProperty is always an atomic individual and not owned by any class.
We must not confuse an application paradigm (e.g. OOP programming or data quality/lifecycle rules) with what the data conceptually is (i.e. what kind of information it is). The conceptual OWL model level must not attempt to enforce instance-related constraints on conceptual definitions on the kinds of instances. Naturally, in instance data (ABox) we should require that a room must always be composed in exactly 1 building and that its life-cycle is dependent on that of the building. But this is separate from the conceptual essentialist definition of a room (we can model an ontology where the necessary condition for something to be a room is that it is in exactly one building). Here we must also remember that, due to the OWA, only inconsistencies can be used to "invalidate" instance data; a missing link between a specific room and a building is just that - missing data and not a cause of concern on the ontological side.
The whole purpose of a formal conceptual model is to enrich the data, and this is separate from the purpose of an application profile used to validate instance data. The first is primarily descriptive, latter prescriptive (as said also in the Style Guide). The point is that the latter also operates within the conceptual frame provided by the first one, so here as well the possibilities for rich inferencing provide us opportunities to validate data that has not been explicitly tagged as belonging to a certain class or having certain properties - those classes or properties might surface during inferencing, allowing us to make the application profiles simpler and more universal.
Using UML as a single source of truth is risky also in that sense that the explicit assertions in UML models are not always transmitted 1:1 between modeling applications, as the XMI spec doesn't encompass unambiguously all possible use-cases, and applications interpret it in varying ways. It usually works, but it is not a reliable transfer format.
UML - particularly a comprehensive reliable XMI support - relies on costly modeling applications, which raises an adoption barrier for users. On the RDF side, there is a multitude of FOSS and freeware versions of commercial tools available, most of which rely on well-established libraries (TQ SHACL API, RDFLib etc.). Additionally, Turtle is an easy lightweight format for interpreting the models in addition to graphical representations. With UML we would be constrained purely to the graphical model, as the XML representations are not user-friendly.

Expanding existing controlled vocabularies

A request that comes through CPSV-AP: SEMICeu/CPSV-AP#79

The current style guide explains how to model and use them; would it be possible to suggest how to expand them?

Blog Post: XML Mapping - Question 2: How to define data types

In the context of the webinar on the review of the Core Vocabularies and the Style Guide blog post on XML Schema the following question was raised:

Is it important to bring a mapping like the Literal expression below within the XML schema? Are there any Member States using SAWSDL, as Finland? [Link to slides] [Link to Blog post]

The feedback below was provided during the webinar:

A preference for the usage of SAWSDL is expressed.
In the case of the Netherlands CPSV-AP is used as a blueprint, therefore the need for an XSD does not exist.

The community members are encouraged to use this issue as a platform to provide additional feedback on this question.

Transformation from UML to RDF shall consider the definition of reuse

The transformation script shall consider re-used terms, and external terms must not be generated in the OWL artefact.
However, in the SHACL artefact, the reused terms might need to appear as they may play a role in the validation.
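
A minimal sketch of the intended behaviour, assuming the transformation script can tell reused terms from its own by namespace. The `split_for_artefacts` helper and the `http://example.org/myvocab#` namespace are hypothetical and do not describe the actual SEMIC toolchain.

```python
# Hypothetical namespace of the data specification being generated.
OWN_NAMESPACE = "http://example.org/myvocab#"


def split_for_artefacts(term_iris: list[str], own_namespace: str) -> dict[str, list[str]]:
    """Decide which terms each generated artefact may mention.

    OWL artefact: only terms minted in `own_namespace` (reused terms are not re-declared).
    SHACL artefact: all terms, since reused ones may still take part in validation.
    """
    own_terms = [iri for iri in term_iris if iri.startswith(own_namespace)]
    return {"owl": own_terms, "shacl": list(term_iris)}


model_terms = [
    "http://example.org/myvocab#Permit",
    "http://xmlns.com/foaf/0.1/Person",
    "http://purl.org/dc/terms/title",
]
artefacts = split_for_artefacts(model_terms, OWN_NAMESPACE)
print(artefacts["owl"])    # ['http://example.org/myvocab#Permit']
print(artefacts["shacl"])  # all three terms
```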

Multiple classification

It turns out that generalization sets are rather important in conceptual modeling. It should be clear that there can exist several kinds of subclassifications that do not exclude each other; e.g. an Organization can be a PublicOrganization and at the same time a RegisteredOrganization (or not). This allows double typing of data.

Blog Post: XML Mapping - Question 1: Need for metadata

In the context of the webinar on the review of the Core Vocabularies and the Style Guide blog post on XML Schema the following question was raised:

Currently SEMIC does not provide an XML schema. What is the minimum metadata needed for an XML schema aside from versioning? [Link to slides] [Link to Blog post]

The feedback below was provided during the webinar:

The responsible party or contact point is mentioned as a key metadata element.

The community members are encouraged to use this issue as a platform to provide additional feedback on this question.

Fixes to References

https://semiceu.github.io/style-guide/1.0.0/references.html

sort them
provide a link in each reference
remove the highlighting anchor from reference [vocab] VOCABULARIES (2015).

Refine the "open and closed world assumption" recommendation

Regarding the last paragraph of https://semiceu.github.io/style-guide/public-review/gc-data-shape-conventions.html#sec:dsc-r3 :

This may have an impact, especially for larger vocabularies (such as the eProcurement ontology), on how the data shapes are organised. As data shapes may be used to suggest how the data may be fragmented and how it shall not.

I would welcome more details on this topic. Could the style guide elaborate on how the eProcurement ontology organizes its shapes? The paragraph is not explicit on this. How, concretely, can data shapes be used to suggest how the data may be fragmented?

We are often seeing situations where it is necessary to design two levels of shapes:

Shapes to validate/describe single datasources, where each datasource holds a part of the data
Shapes encoding the complete application profile, once all datasources have been merged

Does the style guide offer any suggestion on how these two levels can or should be articulated? Can this be a single SHACL file, with extensions of that file in which certain constraints are deactivated (using sh:deactivated)? Or should these two levels be maintained separately?
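
To illustrate the "single definition with deactivated constraints" option, here is a plain-Python sketch that mirrors the idea behind sh:deactivated: the complete profile is defined once, and a per-datasource variant is derived by switching off the constraints that the source cannot satisfy on its own. The class, the helpers and the choice of properties are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PropertyConstraint:
    path: str
    min_count: int
    deactivated: bool = False  # plays the role of sh:deactivated


# Level 2: the complete application profile, valid once all datasources are merged.
FULL_PROFILE = (
    PropertyConstraint("dct:title", 1),
    PropertyConstraint("dcat:distribution", 1),
)


def for_source(profile, provided_paths):
    """Level 1: derive the shapes for one datasource by deactivating what it does not provide."""
    return tuple(replace(c, deactivated=c.path not in provided_paths) for c in profile)


def violations(record, shapes):
    """Return the paths whose active minimum-count constraint is not met."""
    return [
        c.path
        for c in shapes
        if not c.deactivated and len(record.get(c.path, [])) < c.min_count
    ]


partial_record = {"dct:title": ["Air quality"]}
print(violations(partial_record, for_source(FULL_PROFILE, {"dct:title"})))  # []
print(violations(partial_record, FULL_PROFILE))  # ['dcat:distribution']
```

With this arrangement the two levels cannot drift apart, because the per-source shapes are computed from the full profile rather than maintained by hand.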

Names of enumerations

In § Case sensitivity and charset. Why should the name of an enumeration start with a lowercase letter? In the UML metamodel an enumeration is a specialisation of the classifier Datatype. Datatype names start with a capital, so why not enumerations?

Announcement of work on an RDF-based approach to creation of Application Profiles

As a feedback to the blog post on Application Profiles I would like to announce, that for some time already, based on the SEMIC Style Guide, in our research group at Charles University, we are also working on an RDF-based way of defining Application Profiles and their relationships, with a long-term goal of supporting coherent change propagation among them, as they evolve.

Our aim is to integrate this approach with Dataspecer, our already existing (open source, free to use) tool that generates consistent data specification technical artefacts for multiple data formats, following the standards and keeping the data mappable to RDF, including JSON Schema, JSON-LD context, XSD schema, XSLTs for XML->RDF, RDF->XML, CSV on the Web descriptors including RDF mapping, SHACL shapes, Bikeshed based specification document, etc., based on, among others, RDFS-based vocabularies and data structure definitions created in Dataspecer, reusing them.

Our approach differs slightly from the one identified in the blog post in how the application profiles are represented. We work with a new RDF vocabulary, complementing PROF, and we do not rely on SHACL to represent the entity profiles. Instead, we aim to generate the SHACL shapes from our representation, as SHACL is a vocabulary for validation, not for profile definition. To illustrate the idea, here is a preliminary conceptual model of the vocabulary:

We are open to any kind of collaboration/discussion on this topic.
One of our first goals is to represent the Czech DCAT-AP-CZ profile of DCAT-AP (including the relevant subset of DCAT default profile and DCAT-AP) in this way as a proof-of-concept, during 2024.

Specification of the documentation of an application profile

SHACL Play can generate documentation from a SHACL definition of an application profile (https://shacl-play.sparna.fr/play/doc).
This documentation contains property tables with the following columns:

property label
property URI
expected value (can contain either a list of allowed values, a reference to a class, to a node shape, or simply IRI / Literal)
cardinalities
description

Does the style guide have any recommendation on how best to provide human-readable documentation of shapes, especially regarding how the properties table should be generated?

Provide an example for the data shape definitions (DSC-R4)

Suggestion from @NatasaSofou (slightly rephrased), regarding the changes that we made to the DSC-R4: Shape definition conventions to address issue #75 :
"I think this change is a good addition to the Datashape definitions. However, I think that an example in TTL could further clarify the added text. It can be inline or at the end of the description."

Error in SEMIC AP profile

The DCAT model example used to illustrate the Application Profile model gives names to associations, which goes against the rule that target roles should be named instead of connectors.

This probably affects the profile model requiring below addition whereas - at least - the association meta-class should be removed

AP Modelling - simplify the profile

The AP Modelling blog and associated UML model demonstrates a DCAT and DCAT-AP profile. This allows masking classes and tailoring properties to a specific application.

Toggling, i.e. masking or activating the classes for a specific AP by means of a stereotype may not be needed. One could argue that it's easier to create either

an "AP container" class that refers to the selected AP concepts. This is tried and tested in an XSD context, where the container class turns into a root element of an XML document - which is close to an information exchange use case. or
an AP package where active classes are specialised from the ones in the ontology. An AP would be created from the package's XMI. or
a diagram that offers an "AP view". Only the classes in the diagram are used in the AP.

Subsidiary questions;

the class application profile inherits from TerminologyProfile::ScopeIdentifier and Profile Constraints::view specification. How and where are these defined ?
did you investigate existing technology profiles such as the ODM ?
the Class meta-class has a boolean attribute isActive which supposedly indicates that this concept is used in the AP. Is this orthogonal to the "Mandatory/Optional" toggles on the Class ? There already is an "active" switch on the class (see right panel).

Use subclassing of skos:Concept

When working with and defining the use of concepts, define a subclass of skos:Concept to represent the concepts (or enumerations) you need. So instead of just referring to skos:Concept we define a well defined class, making it easier to refer to in an explicit and unambiguous way.
Examples of classes defined as subclasses of skos:Concept:
From Organization Ontology (ORG):
https://www.w3.org/TR/vocab-org/#class-role
From Data Catalog Vocabulary (DCAT):
https://www.w3.org/ns/dcat#Role
From Data Quality Vocabulary (DQV):
https://www.w3.org/TR/vocab-dqv/#dqv:Dimension
https://www.w3.org/TR/vocab-dqv/#dqv:Category

Concept versus Class

In § What is a conceptual model?, you are quick to substitute the notion of Concept with the notion of Class. It would maybe be good to explain that Classes are a sort of extension of Concepts, adding things like attributes to further describe the characteristics of instances of some Concept.

RDFS vs OWL

https://semiceu.github.io/style-guide/1.0.0/gc-semantic-conventions.html

I find R1 and R2 kind of contradictory: one says no RDFS but OWL2, the other says cut down OWL2 to RDFS minus domain/range (and add Object vs Datatype properties).

Furthermore, RDFS is a subset of OWL2. So I think these 2 rules should not put RDFS and OWL in opposition, but simply enumerate the RDFS and OWL constructs recommended for use

Bad class inheritance example

In § Class inheritance. The attributes and associations of PublicOrganization are all attributes of org:Organization. So they should be visualised in that class and not in the subclass. The association unitOf/hasUnit is also incorrect: it should be between Organization and OrganizationalUnit.

Sharing Italy ongoing work on vocabularies technical writing guidelines

Italy is working on technical guidelines to ease the collaborative creation of vocabularies.
Guidelines inherit various best practices from source code management, such as:

automatic linting and style guidelines;
consistent formatting;
ease of pipeline creations.

Guidelines can be google-translated from https://teamdigitale.github.io/dati-semantic-guida-ndc-docs/

In parallel, a template repository for semantic assets (ontologies, vocabularies, data schemas) is published on

https://github.com/teamdigitale/dati-semantic-cookiecutter

The repository uses tools to validate and standardize content at each commit, and further checks can be implemented.

Domain & Range

In § Fixed UML interpretation. Maybe explain why domain & range declarations are left out of the OWL ontology (and kept for the SHACL shapes). I agree with the idea but think it would be good to explain why (e.g. greater flexibility in choice of range when using the vocabulary for different application profiles).

Namespace organisation

Rule CMC-R3 states that

All UML element names should be fit for URI generation with clear namespace organisation.

I've seen two approaches in SEMIC:

The styleguide suggests tagging elements with URI information. E.g. data property addressArea has tag uri with value http://w3.org/ns/locn#AddressArea.
Model2owl uses no tags but needs an external configuration file namespaces.xml that maps prefixes to URIs.

This model2owl approach seems to contradict some design premises:

the UML model is a single source of truth where I would expect to find everything needed to transform the UML into RDF.
encoding prefixes into package names implies that packages are namespaces. Does this match the idea that packages merely organise concepts but don't carry meaning ?
a notation such as DCAT::dcat::Catalog is confusing: which is a package and which is namespace ?

I suggest that the rule become binding, e.g.

Replace

All UML Element names should be fit for URI generation with clear namespace organisation.

with

All UML Element names must be fit for URI generation with clear namespace organisation.

With additional hints for developers such as

An entity can have a uri tag.
A package can define a base-uri that trickles down to the entities inside
Entities use the base uri from the containing package. Entities can override the uri
Entity uri is a concatenation of base-uri and element name
Connectors that have no uri tag use the base uri from the source-class.
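
The hints above could be sketched roughly as follows. This is only an illustration of the proposed resolution order, not how any existing SEMIC tool works: the class names Package and Element, the functions resolve_uri and resolve_connector_uri, and the example.org URIs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Package:
    name: str
    base_uri: Optional[str] = None      # hypothetical "base-uri" tag on the package
    parent: Optional["Package"] = None  # enclosing package, if any

    def effective_base_uri(self) -> Optional[str]:
        # The base URI trickles down from the nearest package that defines one.
        if self.base_uri is not None:
            return self.base_uri
        return self.parent.effective_base_uri() if self.parent else None


@dataclass
class Element:
    name: str
    package: Package
    uri: Optional[str] = None           # explicit "uri" tag, overrides the base URI


def resolve_uri(element: Element) -> str:
    # An explicit uri tag wins; otherwise concatenate base URI and element name.
    if element.uri is not None:
        return element.uri
    base = element.package.effective_base_uri()
    if base is None:
        raise ValueError(f"no base-uri available for {element.name}")
    return base + element.name


def resolve_connector_uri(name: str, source: Element, uri: Optional[str] = None) -> str:
    # A connector without a uri tag uses the base URI of its source class.
    if uri is not None:
        return uri
    base = source.package.effective_base_uri()
    if base is None:
        raise ValueError(f"no base-uri available for connector {name}")
    return base + name
```

With a package tagged base-uri http://example.org/core#, a class Address inside it (or inside an untagged sub-package) would resolve to http://example.org/core#Address unless it carries its own uri tag.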

Alternative model2owl-style (which has drawbacks described above):

The namespace map can be stored as a note in the model (clunky but it works)
If a package name matches a known prefix then everything inside the package is allocated to that namespace

Blog Post: XML Mapping - Question 3: Reusing existing XML schemas

In the context of the webinar on the review of the Core Vocabularies and the Style Guide blog post on XML Schema the following question was raised:

What kind of approach should be considered when integrating external namespaces that do not have a respective XML schema? [Link to slides] [Link to Blog post]

The feedback below was provided during the webinar:

The SDG XSD is available and can be used as inspiration.
The community requested to take OWL/XML into consideration.

The community members are encouraged to use this issue as a platform to provide additional feedback on this question.

Style guide should encourage provision of examples/instance diagrams

Conceptual data models are hard to understand, especially for business stakeholders.
From my experience, an efficient way to convey the intended usage of a data model to business stakeholders or to developers is to provide examples / instance diagrams, along with the conceptual model and the ontology implementation.

I am concerned that the SEMIC style guide nowhere addresses the question of providing examples, e.g. it is not listed in the artifact types (https://semiceu.github.io/style-guide/public-review/arhitectural-clarifications.html#sec:artefact-types)

UML does offer ways to design instance diagrams, I think.

Were instance diagrams deliberately left out of scope ?

Encoding of "complex business rules" in the shapes

We are often facing the situation where a SHACL application profile needs to encode "business rules" that go beyond what the SHACL core constraints can express. Typical real-world examples are:

"All instances of class X with dcterms:type = A or B must have the value M, N, O, or P in the property Z"
"On instances of class X with property P1 referring to an entity where dcterms:type = A, property P1 is mandatory, otherwise it is not"

These can be encoded as an additional "business rule" layer in the application profile using SHACL SPARQL constraints (https://www.w3.org/TR/shacl/#sparql-constraints)
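
To make the logic of the first rule concrete, here is a small sketch in plain Python rather than SHACL. It only restates the conditional structure that a SPARQL constraint would have to capture; the dictionary representation of an instance and the function name violates_rule are assumptions made for illustration.

```python
from typing import Any, Mapping

# Placeholder values taken from the wording of the first rule above.
TRIGGER_TYPES = {"A", "B"}
ALLOWED_Z_VALUES = {"M", "N", "O", "P"}


def violates_rule(instance: Mapping[str, Any]) -> bool:
    """Return True when an instance breaks the first business rule."""
    # The rule only applies to instances of class X ...
    if instance.get("rdf:type") != "X":
        return False
    # ... whose dcterms:type is A or B.
    if instance.get("dcterms:type") not in TRIGGER_TYPES:
        return False
    # For those, property Z must hold one of the allowed values.
    return instance.get("Z") not in ALLOWED_Z_VALUES
```

The point is that the constraint on Z depends on the value of another property, which is the kind of conditional that tends to push a profile towards a separate rule layer.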

Is this in scope of the style guide ?

AP model stereotypes need refinement and mapping to shacl

The Application Profile defines mandatory, recommended, and optional elements.

It makes sense to discuss the meaning of the stereotypes in terms of SHACL messages that non-conformance would generate. This allows one to hammer out the intention.

What does it mean when a mandatory attribute has multiplicity [0..n]? Mandatory, optional, arity 0..* aren’t orthogonal.
o There should be a “nillable” requirement to deal with absent attributes. For instance the statement “this wheel vehicle has null wheels” means “this wheel vehicle has an unknown number of wheels”. This is like "entry left blank on purpose".
o Situations arise where insight grows and facts are added by-and-by. Some datasets use nulls to indicate “I’m aware that this attribute must be informed but at this stage its value is unknown”. The SHACL message should reflect this by issuing a warning like “The attribute is blank. Confirm that this is intended”. Upon maturity, these messages should all be gone. Thus, a wheel vehicle has a <mandatory,nillable> attribute “numberOfWheels” and an individual “MyMotor” would have numberOfWheels=null which at a later stage would turn to numberOfWheels=3.
o A model attribute <optional,nillable> numberOfWheels [0..1] can express:

numberOfWheels = null means “will be informed later”
numberOfWheels absent means “irrelevant for this application”
numberOfWheels = 2 asserts “2 wheels”
numberOfWheels = 0 asserts “no wheels”
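
The four cases above can be told apart in code as long as "absent" and "null" are kept distinct. A minimal sketch, assuming a record is a plain dictionary in which a missing key means absent and None stands for null (the function name interpret is hypothetical):

```python
from typing import Any, Mapping


def interpret(record: Mapping[str, Any], attribute: str) -> str:
    """Describe what an <optional,nillable> attribute expresses in a record."""
    if attribute not in record:
        # Key missing entirely: the attribute is absent.
        return "irrelevant for this application"
    value = record[attribute]
    if value is None:
        # Key present but null: left blank on purpose.
        return "will be informed later"
    # Any concrete value, including 0, is an assertion.
    return f"asserts {value}"
```

Note that 0 is treated as an assertion and not as a blank, which is why the check uses "is None" rather than truthiness.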

How does one express nillability in RDF ?
An application may need mutual exclusivity such as “for an individual person, provide either age or date of birth”. Defining Attributes age[0..1] and dateOfBirth[0..1] is too lax and stereotyping doesn’t help either.
o sh:xone can express this in SHACL but UML can't. In a previous project, we simply attached a UML note saying "XOR".
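
For illustration, the "exactly one of" check that such an XOR note stands for could look like this in plain Python. It is a simplified stand-in that only tests for presence of a value, not a reimplementation of sh:xone; the function name exactly_one and the dictionary representation are assumptions.

```python
from typing import Any, Iterable, Mapping


def exactly_one(record: Mapping[str, Any], attributes: Iterable[str]) -> bool:
    """Return True when exactly one of the given attributes has a value."""
    present = [name for name in attributes if record.get(name) is not None]
    return len(present) == 1
```

A person record with only age, or only dateOfBirth, passes; a record with both or with neither fails, which is what age[0..1] and dateOfBirth[0..1] alone cannot enforce.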

Here's my suggestion for the set of requirement levels, note that I dropped "optional" because I'm unaware of the difference between arity [0..n] and optional :

HTML page title is missing

We need to add a title to the HTML page, so that people looking at the tabs would know that this is about the SEMIC Style Guide.

Currently it looks like this:
