semiceu / dcat-ap
This is the issue tracker for the maintenance of DCAT-AP
Home Page: https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:accrualPeriodicity:
• Usage: 44.98%
• Issue: some datasets are using the URIs of the MDR Frequencies NAL while other datasets use the label of the same authority list.
• Is there a need for guidelines to harmonise usage across implementations?
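As an illustration of the two patterns described above, here is a minimal Turtle sketch. The dataset IRIs are hypothetical, and the frequency IRI is assumed to be the ANNUAL concept of the Publications Office frequency authority list.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# Hypothetical dataset: frequency given as a URI from the authority list
# (assumed concept IRI).
<http://example.org/dataset/1> a dcat:Dataset ;
    dct:accrualPeriodicity <http://publications.europa.eu/resource/authority/frequency/ANNUAL> .

# Hypothetical dataset: frequency given as a plain label, which cannot be
# matched against the authority list without string comparison.
<http://example.org/dataset/2> a dcat:Dataset ;
    dct:accrualPeriodicity "annual" .
```

A guideline could state which of the two forms is expected, so that harvesters do not need to map labels back to URIs.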
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:title and dct:format:
• Usage: 77.17% and 12.17% respectively
• Issue: Some distributions use dct:title instead of dct:format to provide information about the format of the distribution.
• Is there a need for guidelines to explain how to use dct:title and dct:format?
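The difference could be sketched in Turtle as follows. The distribution IRIs and titles are hypothetical, and the file-type IRI is assumed to be the CSV concept of the Publications Office file type authority list.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# Hypothetical distribution where the format is only visible in the title.
<http://example.org/distribution/1> a dcat:Distribution ;
    dcat:accessURL <http://example.org/files/budget.csv> ;
    dct:title "CSV" .

# Hypothetical distribution with a descriptive title and a separate format
# (assumed file-type concept IRI).
<http://example.org/distribution/2> a dcat:Distribution ;
    dcat:accessURL <http://example.org/files/budget.csv> ;
    dct:title "Budget 2018 (tabular export)"@en ;
    dct:format <http://publications.europa.eu/resource/authority/file-type/CSV> .
```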
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:description (mandatory property)
In a project for the French ministry of Health I am reusing DCAT-AP SHACL shapes.
The current SHACL shapes (at least the mandatory ones) apply only to exactly the classes indicated. If they used sh:targetClass, the shapes would also apply to subclasses (declared with rdfs:subClassOf in the model), which would improve the reusability of the SHACL shapes.
The alternative would be to define the subclasses directly in the shapes, but this is not good for modularity.
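A minimal sketch of the idea, assuming a hypothetical shape and a hypothetical subclass in the ex: namespace (neither is taken from the published DCAT-AP shapes). With sh:targetClass, a SHACL processor also selects instances of subclasses, provided the rdfs:subClassOf triple is present in the data graph being validated.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .

# Hypothetical shape targeting dcat:Dataset and its subclasses.
ex:DatasetShape a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path dct:title ;
        sh:minCount 1 ;
    ] .

# Hypothetical subclass declared by the reusing model.
ex:HealthDataset rdfs:subClassOf dcat:Dataset .

# Typed only with the subclass, yet still in scope of ex:DatasetShape.
ex:dataset1 a ex:HealthDataset ;
    dct:title "Hospital admissions"@en .
```

The shapes file itself does not need to mention ex:HealthDataset, which is what keeps the shapes modular.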
By Bert van Nuffelen, created from comment at #4:
In general I like the number of fields, as most of them have good definitions and are there to support particular use cases.
They are rightly mostly annotated as optional, since providing good values for them (and maintaining those values) requires effort.
For me, the statistics shown in each of the issues indicate two problems:
a) the definition did not fit well, and hence the property was not used
b) the data could not be collected (for whatever reason)
Reducing the properties to those that most portals provide has a potential impact on the usability of the catalogue.
I think we should consider who the target audiences for the catalogue are and then see whether the data provided helps to achieve that purpose. Today's open data portals are mostly oriented towards casual visitors who want to know what data exists and browse through it a bit. The catalogues are seldom used as a navigator by technical audiences, who would use the catalogues as their source for combining the actual data.
To support that, technical properties such as checksums, formats, sizes and API descriptions are needed.
And maybe this is worth a discussion: how can DCAT-AP handle the different dissemination forms more consistently and provide a high-level description for them, so that this data integration can be realized?
It would be great if we could come to an extension approach so that this can also lead to good interoperability. I am aware that this has its limits from the perspective of DCAT-AP, but from the perspective of developers, self-descriptive services/distributions are what they ask for. In my opinion, it would be fantastic if it were clarified how this bridge is made.
The order of startdate and enddate in dct:PeriodOfTime should be changed
in the diagram
https://github.com/SEMICeu/DCAT-AP/blob/master/releases/1.2/Draft/DCAT-AP_1.2.png
(as once reported and agreed on in
https://joinup.ec.europa.eu/discussion/order-attributes-class-diagram-dctperiodoftime
)
The DCAT-AP usage analysis on the European Data Portal showed the following:
adms:sample
• Usage: 0%
• Should these properties be removed/withdrawn/deprecated?
The following errors have been found in the current JSON-LD context:
The DCAT-AP usage analysis on the European Data Portal showed the following:
dcat:byteSize:
• Usage: near 0%
• Should these properties be removed/withdrawn/deprecated?
The following properties are affected
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:hasVersion:
• Usage: 0%
• Should these properties be removed/withdrawn/deprecated?
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:conformsTo:
• Usage: near 0%
• Should these properties be removed/withdrawn/deprecated?
Def:
An auxiliary to save mandatory attribution license texts.
Datatype: rdfs:Literal
Reasoning for change:
Attribution licenses require information on how to cite the attribution correctly. This information was added in the dcatde: namespace; this property should be added to W3C DCAT.
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:format and dcat:mediaType
• Usage: 12.17% and 25.52% respectively
• Issue: dct:format is a recommended property but there are Distributions that use the optional property dcat:mediaType, which is more restrictive (IANA media type).
• Is there a need for guidelines explaining how to use dct:format and dcat:mediaType similar to guideline “How to use accessURL and downloadURL?”, e.g. if specific media type(s) can be provided, duplicate value of the media type(s) in both dcat:mediaType and dct:format?
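The duplication option raised in the question could look like the following Turtle sketch. The distribution IRI is hypothetical; the media type and file-type IRIs are assumed to follow the IANA and Publications Office patterns.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# Hypothetical distribution carrying both properties.
<http://example.org/distribution/3> a dcat:Distribution ;
    dcat:accessURL <http://example.org/files/stations.csv> ;
    # Recommended property: file format (assumed authority-list IRI).
    dct:format <http://publications.europa.eu/resource/authority/file-type/CSV> ;
    # Optional, more restrictive property: IANA media type (assumed IRI pattern).
    dcat:mediaType <https://www.iana.org/assignments/media-types/text/csv> .
```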
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:isVersionOf:
• Usage: 0%
• Should these properties be removed/withdrawn/deprecated?
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:source:
• Usage: 0%
• Should these properties be removed/withdrawn/deprecated?
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:type:
• Usage: 0%
• Should these properties be removed/withdrawn/deprecated?
The DCAT-AP usage analysis on the European Data Portal showed the following:
adms:versionNotes:
• Usage: 0%
• Should these properties be removed/withdrawn/deprecated?
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:conformsTo:
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:publisher and dcat:contactPoint
• Usage: 43.85% and 45.44% respectively; 5.2% use both properties
• How could we promote the usage of these recommended properties, in addition to the guideline “How are publisher and contact point modelled”?
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:language:
• Usage: near 0%
• Should these properties be removed/withdrawn/deprecated?
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:title:
The DCAT-AP usage analysis on the European Data Portal showed the following:
owl:versionInfo:
• Usage: 0%
• Should these properties be removed/withdrawn/deprecated?
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:language
• Usage: 0%
• Should this property be removed/withdrawn/deprecated?
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:rights
• Usage: 0%
• Should this property be removed/withdrawn/deprecated?
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:spatial
• Usage: 83.75% with almost all countries represented
• Should this property be made recommended?
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:description:
Hi,
In the context of the Flemish Open Data portal we would like to have advice on the creation of a specific instance of a license document.
Q: Is there a LicenseDocument vocabulary which allows to describe a license document by extending an existing license with instances of clauses?
The example case is as follows: the Flemish government has defined 2 licenses which state in short: you can use the data if you
a) mention the source, or
b) pay a fee
Now we would like to create a license document that instantiates the general license document with the specific name of the source (or the fee).
Pointers to how we best address this topic are welcome.
Bert
The DCAT-AP usage analysis on the European Data Portal showed the following:
foaf:page:
• Usage: 0%
• Should these properties be removed/withdrawn/deprecated?
To make the SHACL shapes easier to reuse, it would be nice if the property restrictions were written as IRIs instead of blank nodes, so that reusers can disable certain shapes that do not fit their application profile.
E.g.
dcat:Distribution
    rdf:type sh:NodeShape ;
    sh:property dcat:accessURLShape .
dcat:accessURLShape
    rdf:type sh:PropertyShape ;
    sh:path dcat:accessURL ;
    sh:class rdfs:Resource ;
    sh:minCount 1 ;
    sh:severity sh:Violation .
Such a shape requires that the access URL be explicitly declared as an rdfs:Resource, whereas most of the time just a URL would be enough.
Because the property shape has an IRI, reusers could deactivate it for their needs:
dcat:accessURLShape
    sh:deactivated true .
When working on https://www.dcat-ap.de/ we encountered a missing link between the binary file described in the distribution and its checksum value in RDF:
If you serialize into RDF, that is, if you do not rely on the hierarchical structure of XML, it is not clear which checksum value is associated with which file in which distribution. The problem occurs because the XML processor cannot guarantee the order of elements, because Distributions and Checksums are two independent classes with a currently missing link, and because rdf:nodeID is optional, so it is unclear what to put into it to create this linkage.
Perhaps using the dataset's dcterms:identifier or the distribution's accessURL (following the logic in https://joinup.ec.europa.eu/release/dcat-ap-how-use-identifiers-datasets-and-distributions ) as the rdf:nodeID of the Checksum class might be a workaround. Could this be added when reworking the SPDX vocabulary, or otherwise be considered in the current ISA² DCAT-AP review?
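For comparison, here is a sketch of how such a link might be expressed in Turtle, assuming the spdx:checksum property is used on the distribution to point at the checksum node. The distribution IRI is hypothetical, and the checksum value is simply the SHA-1 of empty input, used as a placeholder.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix spdx: <http://spdx.org/rdf/terms#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical distribution; the nested blank node is the object of
# spdx:checksum, so the link is an explicit triple and does not depend
# on element order in any serialization.
<http://example.org/distribution/4> a dcat:Distribution ;
    dcat:accessURL <http://example.org/files/data.zip> ;
    spdx:checksum [
        a spdx:Checksum ;
        spdx:algorithm spdx:checksumAlgorithm_sha1 ;
        # Placeholder value: SHA-1 of empty input.
        spdx:checksumValue "da39a3ee5e6b4b0d3255bfef95601890afd80709"^^xsd:hexBinary
    ] .
```

Whether this addresses the RDF/XML situation described above depends on how the producing system writes the checksum nodes; the sketch only shows the triple that carries the association.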
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:relation:
• Usage: 0%
• Should these properties be removed/withdrawn/deprecated?
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:language:
Def:
Uses a list of guaranteed availabilities, like
http://dcat-ap.de/def/plannedAvailability/temporary
http://dcat-ap.de/def/plannedAvailability/experimental
http://dcat-ap.de/def/plannedAvailability/available
http://dcat-ap.de/def/plannedAvailability/stable
Datatype: rdfs:Resource
Reasoning for change:
This concept was added to dcat-ap.de and seems to be a good extension for dcat-ap itself; perhaps even W3C dcat.
Indicating a dataset's planned lifetime alongside its other metadata seems a good idea for fostering reuse and for decoupling the dataset metadata from its technical publishing channel.
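A sketch of how a dataset might carry such a value. The property name ex:plannedAvailability and the dataset IRI are hypothetical placeholders; only the value IRI comes from the list above.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/> .

# Hypothetical dataset; ex:plannedAvailability stands in for whatever
# property name would be adopted.
ex:dataset2 a dcat:Dataset ;
    ex:plannedAvailability <http://dcat-ap.de/def/plannedAvailability/stable> .
```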
The DCAT-AP usage analysis on the European Data Portal showed the following:
adms:sample:
• Usage: 0%
• Should these properties be removed/withdrawn/deprecated?
dct:isPartOf and dct:hasPart
• Usage: 0%
• Should this property be removed/withdrawn/deprecated?
Def: The temporal granularity is the temporal resolution of the contained data.
Datatype: rdfs:Literal (string), enumeration of: second, minute, hour, day, week, month, quarter, year, 5-years
Reasoning for change:
This metadata element was already present in the former CKAN / OGD schema; it is needed in some domains, e.g. the statistical domain or for describing fiscal data. Note: it is not the same as temporal coverage.
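If the property were adopted, the enumeration could be checked with a SHACL constraint along these lines. This is only a sketch: the shape IRI and the property name ex:temporalGranularity are hypothetical.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .

# Hypothetical shape restricting the literal to the proposed values.
ex:TemporalGranularityShape a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path ex:temporalGranularity ;
        sh:datatype xsd:string ;
        sh:in ( "second" "minute" "hour" "day" "week"
                "month" "quarter" "year" "5-years" ) ;
    ] .
```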
Def:
Person, who maintains a dataset; is responsible for it and its publication
Datatype: rdfs:Literal string
Reasoning for change:
The maintainer (such as the curator) cares for the dataset and is in charge of keeping its metadata up to date. This concept was added in dcat-ap.de because the existing ones were found not to be appropriate.
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:rights:
• Usage: near 0%
• Should these properties be removed/withdrawn/deprecated?
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:source:
Bert van Nuffelen, TenForce
Why is the cardinality for dct:type on licenceDocument set to max 1? For some licenses more than one concept from the ADMS:LicenceType controlled vocabulary can apply.
The DCAT-AP usage analysis on the European Data Portal showed the following:
spdx:checksum:
• Usage: near 0%
• Should these properties be removed/withdrawn/deprecated?
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:accessRights:
• Usage: 0%
• Should these properties be removed/withdrawn/deprecated?
The DCAT-AP usage analysis on the European Data Portal showed the following:
dct:spatial
• Usage: 76.79%
• Should this property be made recommended?
Reason:
adms:changeType does not exist in the ADMS vocabulary, so a verifiable controlled vocabulary constraint cannot be created for it.
Def:
Dataset contributor as a person or an organisation who, in addition to the creator, has given a relevant input to the dataset.
Datatype: rdfs:Literal
Reasoning for change:
As in Germany we need additional roles, we reintroduced the existing Dublin core property “contributor”.
The DCAT-AP usage analysis on the European Data Portal showed the following:
dcat:themeTaxonomy
• Usage: 100% of the catalogues queried by the European Data Portal
• Should this property be made mandatory?