kg-construct / rml-cc

RML-CC: Collections and Containers definitions for RML

Home Page: https://w3id.org/rml/cc/spec

License: Creative Commons Attribution 4.0 International

HTML 99.01% JavaScript 0.22% CSS 0.06% Python 0.72%

ontology rdf rdfs rml rml-mapping shacl

rml-cc's People

Contributors

bjdmeest anaigmo dachafra

rml-cc's Issues

Domain and range of `rml:strategy`

In section 3.2.2 (rml:strategy), the following domain and range are defined for rml:strategy:

The domain of rml:strategy is rml:GatherMap.

The range of rml:strategy is an IRI.

In general, I think we should be cautious about specifying the domain. In this case, I can imagine that other constructs may also need to specify a strategy. I think it's best to leave the domain open for this property.

The range is specified to be an IRI. This should be a class. I think it makes sense to define rml:Strategy as a class for the strategy constants.

Revisit introductory examples and use current example as more complex examples

Replace the introductory example with a simpler JSON example
Split the old example into two: one as a simple example (introducing new concepts such as providing identification to lists and containers) and one to illustrate the behavior of joins.

E.g., the example currently contains identifiers, which may be too complex for a simple example.

Rename strategies to start with lowercase

Rename rml:Append and rml:CartesianProduct to rml:append and rml:cartesianProduct to stick to the common rule that only class names start with a capital letter.

Testcases: add tests to cover everything

Would be nice to have test cases covering the complete specification:

Validating SHACL shapes from #32
Implementations validating their code
...

empty values collection-containers Vs core spec

I think we had this discussion in the past, but I'd like to bring it up again. I see in this spec that we have rml:allowEmptyListAndContainer. Does this refer to an empty list or container, or to an empty cell/element/object in the list? If the former, should we clarify? If the latter, shouldn't we use the same property as the core specification, i.e. handle all term maps in the same way, e.g. with rml:allowEmpty?

all cases of lists

@chrdebru listed multiple cases of lists

1-collecting-values-from-the-same-term-map
2-collecting-values-from-different-term-maps
collecting the values from different term maps (simple)
collecting values from a reference object maps
2b-collecting-values-from-different-term-maps-with-multi-valued-term-maps
3-processing-empty-collections-and-containers
4-nested-collections-and-containers
5-collections-and-containers-as-subjects
6-identifying-collections-and-containers

I think so far the Excel sheet of kg-construct/rml-core#26 covers only 1, 3 and 5, or at least it definitely doesn't cover the cases where the lists are generated from different term maps; that still needs to be incorporated. I leave the outline of cases here for future reference.

(It also needs to be disambiguated whether term maps are indeed meant, rather than expression maps, or both.)

Create example of a gather map in a predicate map

While we have not yet come up with an example, nothing would prevent us from generating the following:

:a :b :c .

:b a rdf:Bag ;
    rdf:_1 :foo ;
    rdf:_2 :bar .

And the same holds for lists, where the IRI used as the predicate is also the IRI of the first cons-pair. Should we add an example to demonstrate this possibility?

Two ontology NS prefixes in the documentation

The following file ./ontology/documentation/sections/introduction-en.html has two ontology NS prefixes. Shouldn't we use rml: for the second? I do not want to propose a change, as maybe @anaigmo used a WIDOCO config file.

Example of lists and containers as subjects

Create an example of identifying lists and containers with a template or reference

Limiting the use of rr:column in the documentation.

Add a note about rr:column using an example. The section should be self-contained so that it can be removed in case rr:column is removed from the core specification.

Add example mixing use of fields and gather maps

This example should be added only when the fields specification is released.
In any case, this issue does not prevent us from releasing a first version of the specification.

See existing example: #10 (comment)

null lists

this question was raised by @chrdebru

how do we handle the generation of null lists?

this issue might be related to kg-construct/rml-core#16

Containers/Collections generation from multiple "term maps" or a single "term map" (multivalues)

how do we handle the two cases?

Containers/Collections may be generated from different "term maps" or from a single "term map" that returns multiple values. How do we handle the two cases? What are their similarities/differences?

Suggestions for extra test cases

test case that generates lists containing multiple occurrences of the same value. (see also kg-construct/rml-core#121)
test cases that test the generation of collections and containers in combination with graph maps
test case for generating a container across iterations (currently only lists are covered)

Replace rr: with rml:

Some classes and properties still use prefix rr:, replace them with rml: as prefix.

Example: rr:TermMap

Create examples of strategies

Create examples about strategies using an example with names (to demonstrate the utility of a cartesian product). Also, note that the cartesian product generates multiple lists/containers; therefore, they are identified by iteration + some "sentinel value."
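
A minimal sketch of the difference between the two strategies, written in plain Python rather than RML. The function names `append` and `cartesian_product` and the sample names are hypothetical, and the sketch assumes each term map has already been evaluated to a list of values for one iteration.

```python
from itertools import product


def append(term_map_values):
    """Concatenate the values of all term maps into a single list."""
    return [value for values in term_map_values for value in values]


def cartesian_product(term_map_values):
    """Build one list per combination, taking one value from each term map."""
    return [list(combo) for combo in product(*term_map_values)]


# Hypothetical iteration: two multi-valued term maps (first names, last names).
first_names = ["Ann", "Bob"]
last_names = ["Smith", "Jones"]

single_list = append([first_names, last_names])
many_lists = cartesian_product([first_names, last_names])
```

Here `append` yields one list of four names, whereas `cartesian_product` yields four two-element lists for the same iteration, which is why each generated list or container needs something beyond the iteration to identify it.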

do we want / can we generate collections & containers in the place of the subject map?

Ambiguity in "A rml:GatherMap MAY have exactly one rml:strategy property."

Is it me, or can

"A rml:GatherMap MAY have exactly one rml:strategy property."

be interpreted as "it can have multiple"?

Isn't the following wording more precise?
"A rml:GatherMap MUST have at most one rml:strategy property."

Use RFC 9535 JSONPath compliant query expressions in spec and test cases

We should follow the proposed standard RFC 9535 JSONPath for all expressions used in examples in the spec and in test cases.

Most notably this requires all expressions to start with a $ as per https://www.rfc-editor.org/rfc/rfc9535.html#section-2.2.1.
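
A rough way to find the expressions that need updating is to look for queries that do not begin with the root identifier. The sketch below checks only the leading `$`, not full RFC 9535 conformance; the helper names and the sample expressions are hypothetical.

```python
def is_rooted(expression: str) -> bool:
    """Return True if the expression starts with the root identifier `$`."""
    return expression.startswith("$")


def unrooted(expressions):
    """Return the expressions that lack a leading `$`, in input order."""
    return [expression for expression in expressions if not is_rooted(expression)]


# Hypothetical expressions as they might appear in examples or test cases.
sample = ["$.values[*]", "values[*].parentId", "$"]
needs_update = unrooted(sample)
```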

Building a collection/container through multiple iterations with self-join

For my clarification, could you provide an example if I want to convert

{
  "values": [
    {
      "parentId": "a",
      "values": ["1", "2", "3"]
    },
    {
      "parentId": "a",
      "values": ["4", "5", "6"]
    },
    {
      "parentId": "b",
      "values": ["7", "8", "9"]
    }
  ]
}

into

<list/a> rdf:_1 ("1" "2" "3") ; rdf:_2 ("4" "5" "6") .
<list/b> rdf:_1 ("7" "8" "9") .
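
To pin down the expected grouping independently of any mapping syntax, here is a minimal Python sketch. It does not answer how the RML mapping should be written; it only reproduces the desired output from the input records. The helper names `group_lists` and `to_turtle` are hypothetical.

```python
from collections import defaultdict

# The inner records of the input document.
records = [
    {"parentId": "a", "values": ["1", "2", "3"]},
    {"parentId": "a", "values": ["4", "5", "6"]},
    {"parentId": "b", "values": ["7", "8", "9"]},
]


def group_lists(records):
    """Collect the `values` arrays of all records sharing a parentId, in input order."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["parentId"]].append(record["values"])
    return dict(grouped)


def to_turtle(grouped):
    """Render each group as a container whose members are RDF lists."""
    lines = []
    for parent_id, lists in grouped.items():
        members = []
        for n, values in enumerate(lists, start=1):
            items = " ".join(f'"{value}"' for value in values)
            members.append(f"rdf:_{n} ({items})")
        lines.append(f"<list/{parent_id}> " + " ; ".join(members) + " .")
    return lines


turtle_lines = to_turtle(group_lists(records))
```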

rr:column Vs rml:reference --> R2RML Vs RML extension

I'm wondering if the collection-containers specification extends RML or R2RML. I see that rr:column is used so it gives me the impression that it extends R2RML but could we phrase it in a way that covers both?

Deploy on Github Pages

Settings > Pages > Deploy from docs folder
Make a rendered version in the docs folder

owl:unionOf should be owl:oneOf

The property gatherAs has as a range one of the following: bag, list, ... Not the class of bags union list union ...

<http://w3id.org/rml/gatherAs> rdf:type owl:ObjectProperty ;
                               rdfs:domain <http://w3id.org/rml/GatherMap> ;
                               rdfs:range [ rdf:type owl:Class ;
                                            owl:oneOf ( rdf:Alt
                                                          rdf:Bag
                                                          rdf:List
                                                          rdf:Seq
                                                        )
                                          ] ;
                               rdfs:comment "Relates a GatherMap with the desired result type of collection or container."@en ;
                               rdfs:isDefinedBy <http://w3id.org/rml/cc/> ;
                               rdfs:label "gather as" .

Metadata of the spec has issues

this is minor, but just here to keep track and not forget

There are some https://https:// typos
most of the URIs don't resolve
No link to https://github.com/kg-construct/collection-containers-spec/issues

Re-structure the repo and change name

New structure:

folder spec with all the resources for the specification (the current content of the repo)*
folder ontology, which I think is coming from #31
folder shapes, coming from #32
folder test-cases, for #33

Other changes:

The new name of the repo should be: rml-cc
*Remember that all specs need to be conformant with W3C: https://respec.org/docs/, an example could be https://kg-construct.github.io/rml-star/spec/docs/

Changing the iterator inside an object map

@frmichel proposed the ability to change the iterator inside a term map. This would allow one to "manipulate" the "input" prior to applying the term map. A use case could be to flatten a list of values, for instance. Examples were included in the context of RML containers and collections, BUT I believe that this proposal would need to be discussed in the RML spec. What do you think?

Visual overview of the spec

I'm wondering whether a visual overview of this extension would make sense to include, to get the gist very quickly,

something like below

classDiagram
  direction BT

  class GatherMap {
    @type TermMap

    TermMap[] rml:gather 1
    [rml:Append rml:CartesianProduct] rml:strategy 0..1 rml:Append
    [rdf:Seq, rdf:Bag, rdf:Alt, rdf:List] rml:gatherAs 1
    xsd:boolean rml:allowEmptyListAndContainer 0..1 "false"
    }

should the collection/container be identified by an IRI?

Question posed by @chrdebru .

It is not explicitly mentioned in the spec, but in most cases it is a blank node.

There are cases of RDF containers that have IRIs, but it is not common.

Containers/Collections from multiple "term maps" where each term map may return multiple values

How do we handle such cases? Do we generate the Cartesian product?

@chrdebru and @frmichel seem to think that this is a good idea but to be discussed.

Examples of valid RDF, but non-well formed lists and containers

Add examples of templates and references to identify lists and containers that lead to invalid lists and containers but are valid RDF. These examples are useful to indicate the pitfalls.
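
As a sketch of what "valid RDF but not well-formed" can mean for lists, the following Python check walks a list and verifies that every node has exactly one rdf:first and one rdf:rest. Representing triples as tuples of strings, and the function name `is_well_formed_list`, are assumptions made for illustration only.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def is_well_formed_list(triples, head):
    """Check that every node reachable from `head` has exactly one rdf:first
    and one rdf:rest, and that the chain ends in rdf:nil without cycles."""
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            return False  # the rdf:rest chain loops back on itself
        seen.add(node)
        firsts = [o for s, p, o in triples if s == node and p == RDF_FIRST]
        rests = [o for s, p, o in triples if s == node and p == RDF_REST]
        if len(firsts) != 1 or len(rests) != 1:
            return False
        node = rests[0]
    return True


good = [
    ("_:b0", RDF_FIRST, "1"), ("_:b0", RDF_REST, "_:b1"),
    ("_:b1", RDF_FIRST, "2"), ("_:b1", RDF_REST, RDF_NIL),
]
# Same head node produced twice, e.g. by an identifier that is not unique per list.
bad = good + [
    ("_:b0", RDF_FIRST, "3"), ("_:b0", RDF_REST, "_:b2"),
    ("_:b2", RDF_FIRST, "4"), ("_:b2", RDF_REST, RDF_NIL),
]
```

Both triple sets are valid RDF, but only `good` passes the check; in `bad` the head node carries two rdf:first and two rdf:rest values.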

namespace rml or other?

In the case of the source/target spec, we use a different namespace, but here we still use rml. What would be the best strategy? I leave it as a comment here, but whatever we decide should hold for all specs.

Remove or update the goatcounter.com URL

This spec is copied from the Target one, but keeps the goatcounter URL in the dev.html to count visitors.

https://github.com/kg-construct/collection-containers-spec/blob/main/dev.html#L209

Goatcounter is a privacy safe alternative for Google Analytics.
Please update the URL, otherwise the visitor counter will be wrong soon :)

Building collections/containers in multiple iterations lead to ill-formed collections/containers?

@andimou @chrdebru @dachafra @pmaria, here is a summary of what I tried to explain during the call.

In the current specification, we have assumed that creating a list/container through two separate iterations would yield the property rdf:first/rdf:_1 twice, leading to an ill-formed collection/container. The given example generates this:

_:b0    rdf:first   1 , 3 ;
        rdf:rest    ( 2 ) ;
        rdf:rest    ( 4 ) .

whereas we were willing to generate this:

_:b0    rdf:first   1 ;
        rdf:rest    ( 2 3 4 ) .

Nevertheless, I wonder whether this is a conceptual issue or just an implementation issue. For now, I'd favor the latter: I think this is a question about how the RDF library that is being used will behave.

If so, instead of assuming that the implementation will behave wrongly, the specification could simply state what the right behavior must be: whenever, during an iteration, we create a collection/container that happens to already exist (the head node IRI or BN id already exists), then the processor needs to append the term(s) to the existing ones.

For a list, that means replacing the existing "rdf:rest rdf:nil" with "rdf:rest [ rdf:first ... ; rdf:rest rdf:nil ]."
For a container, that means adding new "rdf:_n+1" terms, assuming that the currently last element is rdf:_n.
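
The proposed behavior can be sketched in Python as follows. This is only one possible approach, assuming the processor keeps the members of each collection/container in memory, keyed by head node, and serializes them after the last iteration instead of rewriting "rdf:rest rdf:nil" in place. All names (`append_members`, `as_list`, `as_container`, `store`) are hypothetical.

```python
def append_members(store, head, new_members):
    """Append members to the collection/container identified by `head`,
    creating it if it does not exist yet."""
    store.setdefault(head, []).extend(new_members)


def as_list(members):
    """Serialize the members as a Turtle collection."""
    return "( " + " ".join(str(member) for member in members) + " )"


def as_container(head, members):
    """Serialize the members with consecutive rdf:_n membership properties."""
    props = " ; ".join(f"rdf:_{n} {member}" for n, member in enumerate(members, start=1))
    return f"{head} {props} ."


store = {}
append_members(store, "_:b0", [1, 2])  # first iteration
append_members(store, "_:b0", [3, 4])  # second iteration, same head node
```

With this approach, `as_list(store["_:b0"])` gives a single list of four members, and `as_container` numbers the members of the second iteration after those of the first.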

Is this feasible in terms of implementation, or am I missing a conceptual hurdle?

misspellings on 2 diagrams

https://kg-construct.github.io/rml-resources/portal/

rml:cartessianProduct is misspelled: should be rml:cartesianProduct (one s)

https://kg-construct.github.io/rml-cc/spec/docs/#fig-graphical-overview-of-rml-s-vocabulary-to-generate-rdf-collections-and-containers

rml:Append should be rml:append (lowercase)
rml:CartesianProduct should be rml:cartesianProduct (lowercase and one s)

Update README

README still points to the Target spec 😅