Comments (8)
Quick and practical advice for this and similar data validation tasks:
You need to produce your own SHACL shape definitions, either written by hand or derived from the RDFS or OWL vocabularies and, where needed, manually augmented. Neither OWL nor RDFS is particularly suited to defining constraints on the domain or range of properties.
After all, that is the entire motivation for the notion of data shapes and shape languages like SHACL; worthwhile reading is e.g. Tim Berners-Lee's piece Linked Data Shapes, Forms and Footprints.
from pyshacl.
It looks to me like you've stumbled into an undesired extension of schema:nutrition, and you're looking for why it's not prevented by SHACL. The short answer is, this particular use is part of the open-world behavior of RDF (not necessarily just OWL), and hasn't been closed off by any closed-world rule from SHACL. schema.org's rdf:Property definitions are written in a way that gently suggests what classes they should be associated with, but using structural predicates that don't trigger behaviors from RDFS or OWL inferencing engines.
First, on trying to flag this as an error from inferencing: The ontology you linked is not an OWL ontology, and the nearby OWL transcoding does not include any owl:disjointWith statements. Without owl:disjointWith, even if the properties were defined in a way supporting OWL inferencing, you would not find any errors from OWL inferencing by using schema:nutrition on something unexpected, like a schema:Concert. The thing using schema:nutrition would just become a schema:Concert and either a schema:MenuItem or schema:Recipe.
Neither OWL nor RDFS inferencing using the schema.org ontology (i.e. the RDF Schema or OWL Ontology) would cause new @type values to be inferred. schema.org's properties in the RDF Schema eschew rdfs:domain, instead using schema:domainIncludes which (IIRC) has no entailment (/inference) semantics to generate new RDF triples.
And, while the OWL transcoding has rdfs:domain statements, they all (IIRC - a little hard to grep at a skim) tie to owl:unionOf anonymous classes, which need an OWL reasoner to whittle down to a named class from other axioms describing the object. But again, this still won't raise an error for you, because the OWL transcoding does not include any owl:disjointWith statements that would let you arrive at a logical inconsistency.
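To make the difference between rdfs:domain and schema:domainIncludes concrete, here is a minimal plain-Python sketch of the RDFS domain entailment rule (rdfs2). It is only an illustration: triples are tuples of prefixed strings rather than rdflib terms, the ex: identifiers are made up, and the first schema is a hypothetical variant that schema.org does not publish.

```python
# Toy illustration only: triples are tuples of prefixed strings, not rdflib terms.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def apply_rdfs_domain(schema, data):
    """RDFS rule rdfs2: (p rdfs:domain C) and (s p o) entail (s rdf:type C)."""
    domains = {(s, o) for (s, p, o) in schema if p == RDFS_DOMAIN}
    inferred = set()
    for subject, predicate, _obj in data:
        for prop, cls in domains:
            if predicate == prop:
                inferred.add((subject, RDF_TYPE, cls))
    # Report only triples that were not already asserted.
    return inferred - set(data)


# Hypothetical data: a concert that carries schema:nutrition.
data = {
    ("ex:concert1", RDF_TYPE, "schema:Concert"),
    ("ex:concert1", "schema:nutrition", "ex:nutrition1"),
}

# Hypothetical schema that uses rdfs:domain directly.
with_rdfs_domain = {("schema:nutrition", RDFS_DOMAIN, "schema:Recipe")}

# The style schema.org's RDF Schema uses; rdfs2 does not look at this predicate.
with_domain_includes = {
    ("schema:nutrition", "schema:domainIncludes", "schema:Recipe"),
    ("schema:nutrition", "schema:domainIncludes", "schema:MenuItem"),
}

inferred_rdfs = apply_rdfs_domain(with_rdfs_domain, data)
inferred_includes = apply_rdfs_domain(with_domain_includes, data)
```

With rdfs:domain the concert silently gains an extra type and nothing is reported as wrong; with schema:domainIncludes nothing is inferred at all. Neither case produces an error on its own.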
In a purposefully-unimaginative and closed-world style, I could declare that the set of concerts and set of cooking recipes are disjoint. But, that's not encoded in schema.org, so I'd need to write a SHACL rule for my own data. And, respecting imagination and open-world modeling, someone could probably link a video to a counterexample, with a rock band playing while someone's baking a cake. Depending on any other ontology foundations your graph is built on, that concert could rightly get a nutrition value.
If you wanted to be really careful with your usage of schema:nutrition and make sure it only appears on MenuItem or Recipe, you would need this shape:
<urn:example:schema-nutrition-subjects-shape>
    a sh:NodeShape ;
    sh:or (
        [
            a sh:NodeShape ;
            sh:class schema:MenuItem ;
        ]
        [
            a sh:NodeShape ;
            sh:class schema:Recipe ;
        ]
    ) ;
    sh:targetSubjectsOf schema:nutrition ;
    .
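As a rough picture of what this shape checks, the following plain-Python sketch mimics its logic over tuples of prefixed strings. It is not how pyshacl evaluates shapes, and the ex: identifiers and the sample data are made up for illustration. Focus nodes are the subjects of schema:nutrition (the sh:targetSubjectsOf part), and each one must be an instance of schema:MenuItem or schema:Recipe, directly or through rdfs:subClassOf (the sh:or over sh:class part).

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
ALLOWED_CLASSES = {"schema:MenuItem", "schema:Recipe"}


def superclasses(cls, triples):
    """Return cls plus every class reachable through rdfs:subClassOf."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(o for s, p, o in triples if s == current and p == SUBCLASS_OF)
    return seen


def nutrition_violations(triples):
    """Subjects of schema:nutrition that are neither a MenuItem nor a Recipe."""
    focus_nodes = {s for s, p, _o in triples if p == "schema:nutrition"}
    violations = set()
    for node in focus_nodes:
        reachable = set()
        for s, p, o in triples:
            if s == node and p == RDF_TYPE:
                reachable |= superclasses(o, triples)
        if not reachable & ALLOWED_CLASSES:
            violations.add(node)
    return violations


# Hypothetical data graph: one valid use and one invalid use of schema:nutrition.
triples = {
    ("ex:recipe1", RDF_TYPE, "schema:Recipe"),
    ("ex:recipe1", "schema:nutrition", "ex:n1"),
    ("ex:product1", RDF_TYPE, "schema:Product"),
    ("ex:product1", "schema:nutrition", "ex:n2"),
}

violations = nutrition_violations(triples)
```

Here only ex:product1 is reported, which matches the "nutrition is not a property of Product" behavior asked about in this thread.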
from pyshacl.
Thank you for the prompt reply!
Thanks, it was fun to write and think through.
If I understand it right, in a few words, schema.org defines what a class is but does not define what a class is not. [...]
Yes.
[...] Is there a quick way to assume the owl:disjointWith relation with all other classes unless specified by the author?
This will likely be very difficult when you start considering subclasses.
When I mentioned "Other ontology foundations," I was alluding to how some other ontologies rely on some foundational ontology that divides "Everything" into subsets, and sometimes those subsets are disjoint, sometimes they aren't. For instance, one division drawn from the philosophical literature is endurants vs. perdurants, which behave differently with how they relate to time. (Take thing X. Freeze time. Is X wholly contained in that time slice---an endurant---or must you look outside that time slice to have the full definition of X---a perdurant? My body is an endurant. My life is a perdurant.)
Endurants and perdurants are disjoint. If you had those in your ontology near the top (near owl:Thing), you'd probably pick up that a rock concert, which would be an event which would be a perdurant, can't also be a recipe, which is wholly containable in a time slice and is thus an endurant. So, schema:nutrition on a concert would fail because of a disjointness several superclasses up.
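A small sketch of that reasoning, in plain Python rather than with an OWL reasoner: the class names below are hypothetical (schema.org has no endurant/perdurant split), and the check simply expands each asserted type to its superclasses and looks for a pair declared disjoint.

```python
# Hypothetical mini-ontology: the ex: names are illustrative, not part of schema.org.
SUBCLASS_OF = {
    "ex:Concert": {"ex:Event"},
    "ex:Event": {"ex:Perdurant"},
    "ex:Recipe": {"ex:Endurant"},
}
DISJOINT_PAIRS = [frozenset({"ex:Endurant", "ex:Perdurant"})]


def ancestors(cls):
    """Return cls together with all of its superclasses."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def disjointness_clashes(asserted_types):
    """List the disjoint pairs that one individual would belong to simultaneously."""
    expanded = set()
    for cls in asserted_types:
        expanded |= ancestors(cls)
    return [pair for pair in DISJOINT_PAIRS if pair <= expanded]


# A concert that was also inferred to be a recipe (e.g. via a domain axiom).
clash = disjointness_clashes({"ex:Concert", "ex:Recipe"})
no_clash = disjointness_clashes({"ex:Concert"})
```

The clash only appears because the disjointness was declared near the top of the hierarchy; without that declaration the two types coexist without complaint.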
Placing disjointness statements in an ontology is unlikely to be quick, or to remain quick once tried.
The use case is this:
- I generated a markup and I want to validate it (similar to https://validator.schema.org/ but automatically).
- The schema.org validator is able to say that "nutrition" is not a property of "Product" or any superclass of "Product".
The shape I sketched would satisfy this use case.
If you wanted to be really careful with your usage of schema:nutrition and make sure it only appears on MenuItem or Recipe, you would need this shape:
In the shape graph, nutrition only appears in MenuItem or Recipe through:
schema:Recipe
    a rdfs:Class ;
    a sh:NodeShape ;
    rdfs:comment "A recipe. For dietary restrictions covered by the recipe, a few common restrictions are enumerated via [[suitableForDiet]]. The [[keywords]] property can also be used to add more detail."^^rdf:HTML ;
    rdfs:label "Recipe" ;
    rdfs:subClassOf schema:HowTo ;
    sh:property schema:Recipe-cookTime ;
    sh:property schema:Recipe-cookingMethod ;
    sh:property schema:Recipe-ingredients ;
    sh:property schema:Recipe-nutrition ;
    sh:property schema:Recipe-recipeCategory ;
    sh:property schema:Recipe-recipeCuisine ;
    sh:property schema:Recipe-recipeIngredient ;
    sh:property schema:Recipe-recipeInstructions ;
    sh:property schema:Recipe-recipeYield ;
    sh:property schema:Recipe-suitableForDiet ;
schema:Recipe-nutrition
    a sh:PropertyShape ;
    sh:path schema:nutrition ;
    sh:class schema:NutritionInformation ;
    sh:description "Nutrition information about the recipe or menu item."^^rdf:HTML ;
    sh:name "nutrition" ;
These shapes prescribe how schema:nutrition behaves when on schema:Recipe. To check that schema:nutrition is on a schema:Recipe, you'd need to orient your shape around the predicate schema:nutrition, not the subject schema:Recipe. That's why the shape I sketched uses sh:targetSubjectsOf.
from pyshacl.
Unfortunately, IRI typo detection is another hard problem in RDF because of the open-world nature. Think of it from the future-proofing perspective. schema:pouet doesn't exist today. (I'm blindly assuming that, but the URL does 404 at the moment.) If you write some mechanism to flag schema:pouet as an error today, it'd be appropriate for now. But say in a year, schema:pouet exists, and is added as an isolated new property, with no other risk from the perspective of the schema.org maintainers. Now your schema:pouet detector would flag new valid data as wrong.
Also, syntax-check - the way you wrote "pouet": "pouet" does not actually resolve to an RDF triple, because it's not part of a JSON-LD context dictionary (being a non-existent term). My recollection is RDFLib currently silently drops such JSON that doesn't function as JSON-LD. (A string-literal can't be a predicate...except in n-triples, IIRC? In any case, while I forget the detailed reasoning and history, what you wrote would silently drop.) "schema:pouet": "1234" would enact the "User invented something by typo" scenario you wanted.
FWIW, with an ontology community I work with, we introduced a "Concept typo checker" as part of an extension to pyshacl, here. My current understanding of the state of the RDF world is that a general concept typo checker across all namespaces is not possible because of the open-world assumption, but a "fixed set" of concepts can be constructed for specific ontologies, though it has to be kept aligned with the ontologies' versions.
It might also be possible to do such a concept set with SKOS Concept Schemes if you want a SHACL-oriented solution. But, again, this set of skos:inScheme statements has to be an artifact maintained in some way tied to an ontology's specific version, because the set of concepts will likely grow over time.
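The "fixed set" idea can be sketched in a few lines of plain Python. This is not the checker mentioned above; the namespace IRI, the tiny term list, and the function name are all assumptions made for illustration, and a real list would have to be generated from one specific release of the vocabulary.

```python
# Assumed namespace IRI; adjust it to whatever your data actually uses.
SCHEMA_NS = "https://schema.org/"

# Illustrative excerpt only. A real set must be regenerated per vocabulary version.
KNOWN_TERMS = frozenset({"Recipe", "MenuItem", "nutrition", "name"})


def unknown_terms(iris, namespace=SCHEMA_NS, known=KNOWN_TERMS):
    """Return IRIs inside `namespace` whose local name is not in `known`."""
    return sorted(
        iri
        for iri in set(iris)
        if iri.startswith(namespace) and iri[len(namespace):] not in known
    )


used_iris = [
    SCHEMA_NS + "Recipe",
    SCHEMA_NS + "nutrition",
    SCHEMA_NS + "pouet",
    "http://example.org/other",  # outside the namespace, so never flagged
]

suspects = unknown_terms(used_iris)
```

IRIs from other namespaces are deliberately left alone, which is the open-world caveat in practice: the check can only speak for the vocabulary version its term list was built from.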
from pyshacl.
After a while, I figured out a quick and dirty way to perform the type checking under the closed-world assumption (CWA):
- Recursively bring all parents' properties to the child class, then close the definition with sh:closed true.
- Here is the Python function that works with this version of the schema.org shapes.
from rdflib import BNode, ConjunctiveGraph, Literal, URIRef


def close_ontology(graph: ConjunctiveGraph):
    """Load an input SHACL shape graph and close each shape
    by bringing all properties from the parent classes to the current class shape,
    then add sh:closed at the end.
    """
    query = """
    SELECT DISTINCT ?shape ?parentShape ?parentProp WHERE {
        ?shape a <http://www.w3.org/ns/shacl#NodeShape> ;
            a <http://www.w3.org/2000/01/rdf-schema#Class> ;
            <http://www.w3.org/2000/01/rdf-schema#subClassOf>* ?parentShape .
        ?parentShape <http://www.w3.org/ns/shacl#property> ?parentProp .
        FILTER(?parentShape != ?shape)
    }
    """
    # Materialize the results first, because the loop below modifies the graph being queried.
    results = list(graph.query(query))
    visited_shapes = set()
    for result in results:
        shape = result.get("shape")
        parent_prop = result.get("parentProp")
        # Copy the parent's property shape onto the child and mark the child as closed.
        graph.add((shape, URIRef("http://www.w3.org/ns/shacl#property"), parent_prop))
        graph.add((shape, URIRef("http://www.w3.org/ns/shacl#closed"), Literal(True)))
        # subj sh:ignoredProperties ( rdf:type owl:sameAs )
        # https://www.w3.org/TR/turtle/#collections
        if shape not in visited_shapes:
            ignored_props = graph.collection(BNode())
            ignored_props += [
                URIRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
                URIRef("http://www.w3.org/2002/07/owl#sameAs"),
            ]
            graph.add((shape, URIRef("http://www.w3.org/ns/shacl#ignoredProperties"), ignored_props.uri))
            visited_shapes.add(shape)
    # Replace xsd:float with xsd:double
    for prop in graph.subjects(URIRef("http://www.w3.org/ns/shacl#datatype"), URIRef("http://www.w3.org/2001/XMLSchema#float")):
        graph.set((prop, URIRef("http://www.w3.org/ns/shacl#datatype"), URIRef("http://www.w3.org/2001/XMLSchema#double")))
    return graph
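The effect of closing the shapes this way can be pictured with a plain-Python sketch that needs no RDF library. The class hierarchy and property lists below are a tiny illustrative excerpt, not the real schema.org shapes, and the helper names are hypothetical: a class may use its own properties plus everything inherited from its ancestors, and anything else (apart from the ignored properties) is a violation.

```python
# Illustrative excerpt of a class hierarchy and per-class properties.
PARENTS = {
    "schema:Recipe": {"schema:HowTo"},
    "schema:HowTo": {"schema:CreativeWork"},
    "schema:CreativeWork": {"schema:Thing"},
    "schema:Product": {"schema:Thing"},
}
OWN_PROPERTIES = {
    "schema:Thing": {"schema:name"},
    "schema:Recipe": {"schema:nutrition", "schema:cookTime"},
    "schema:Product": {"schema:brand"},
}
# Same role as sh:ignoredProperties in the function above.
IGNORED = {"rdf:type", "owl:sameAs"}


def allowed_properties(cls):
    """Collect the properties of cls and of all its ancestors."""
    allowed, seen, stack = set(), set(), [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        allowed |= OWN_PROPERTIES.get(current, set())
        stack.extend(PARENTS.get(current, ()))
    return allowed


def closed_violations(cls, used_properties):
    """Properties used on an instance of cls that a closed shape would reject."""
    return sorted(set(used_properties) - allowed_properties(cls) - IGNORED)


product_issues = closed_violations(
    "schema:Product", ["rdf:type", "schema:name", "schema:nutrition"]
)
recipe_issues = closed_violations(
    "schema:Recipe", ["rdf:type", "schema:name", "schema:nutrition"]
)
```

This also shows the trade-off of the approach: any property missing from the shape graph, including one added to the vocabulary later, is rejected until the shapes are regenerated.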
from pyshacl.
Thank you for the prompt reply!
If I understand it right, in a few words, schema.org defines what a class is but does not define what a class is not.
Is there a quick way to assume the owl:disjointWith relation with all other classes unless specified by the author?
The use case is this:
- I generated a markup and I want to validate it (similar to https://validator.schema.org/ but automatically).
- The schema.org validator is able to say that "nutrition" is not a property of "Product" or any superclass of "Product".
If you wanted to be really careful with your usage of schema:nutrition and make sure it only appears on MenuItem or Recipe, you would need this shape:
In the shape graph, nutrition only appears in MenuItem or Recipe through:
schema:Recipe
    a rdfs:Class ;
    a sh:NodeShape ;
    rdfs:comment "A recipe. For dietary restrictions covered by the recipe, a few common restrictions are enumerated via [[suitableForDiet]]. The [[keywords]] property can also be used to add more detail."^^rdf:HTML ;
    rdfs:label "Recipe" ;
    rdfs:subClassOf schema:HowTo ;
    sh:property schema:Recipe-cookTime ;
    sh:property schema:Recipe-cookingMethod ;
    sh:property schema:Recipe-ingredients ;
    sh:property schema:Recipe-nutrition ;
    sh:property schema:Recipe-recipeCategory ;
    sh:property schema:Recipe-recipeCuisine ;
    sh:property schema:Recipe-recipeIngredient ;
    sh:property schema:Recipe-recipeInstructions ;
    sh:property schema:Recipe-recipeYield ;
    sh:property schema:Recipe-suitableForDiet ;
schema:Recipe-nutrition
    a sh:PropertyShape ;
    sh:path schema:nutrition ;
    sh:class schema:NutritionInformation ;
    sh:description "Nutrition information about the recipe or menu item."^^rdf:HTML ;
    sh:name "nutrition" ;
from pyshacl.
The shape I sketched would satisfy this use case.
I just updated the example markup to include a rubbish field:
"pouet": "pouet"
Since I never know what the end user might type, I can't anticipate every misadventure with a shape.
from pyshacl.
FYI: Some more background from a schema.org perspective of the problem is in schemaorg/schemaorg#3408 (comment).
from pyshacl.