linkml / linkml-model

Link Modeling Language (LinkML) model

Home Page: https://linkml.github.io/linkml-model/docs/

Makefile 1.45% Shell 0.31% Python 97.79% Dockerfile 0.16% HTML 0.30%

linkml semantic-web json json-schema graph-ql metamodel yaml data-integration metadata uml

linkml-model's Introduction

LinkML - Linked Data Modeling Language

LinkML is a linked data modeling language following object-oriented and ontological principles. LinkML models are typically authored in YAML, and can be converted to other schema representation formats such as JSON or RDF.

This repo holds the tools for generating and working with LinkML. For the LinkML schema (metamodel), please see https://github.com/linkml/linkml-model

The complete documentation for LinkML can be found here:

linkml.io/linkml

linkml-model's Issues

Automate deployment of docs

As part of the migration to poetry, we switched off auto-deployment of docs

This should be brought back, see the old workflow:

https://github.com/linkml/linkml-model/blob/813d73abaf1271a6fddf22556270549bd33ac408/.github/workflows/main.yaml

IMPORTANT check with @cmungall before merging any PR here. The way this site works is different from other schema sites

we don't use gh-pages
https://linkml.io/linkml-model/ serves up everything in the repo
the w3id PURLs rely on this structure - don't break it

e.g.

wget -vvv https://w3id.org/linkml/meta.yaml
...
Location: https://linkml.github.io/linkml-model/linkml_model/model/schema/meta.yaml [following]

wget -vvv https://w3id.org/linkml/meta.owl
...
Location: https://linkml.github.io/linkml-model/linkml_model/owl/meta.owl.ttl [following]

Define semantics for referential integrity

A non-inlined class range may be intended as:

a reference to an object in the same document, database, or graph
a reference to an object in an external document, database, or graph

For external references, the mechanism for dereferencing will vary depending on substrate:

for rdf, the reference MUST be a URI. If the URI is a URL then the URL may be dereferenceable by conneg
for json/yaml docs, there would need to be a bridge from URIs to e.g. JSONPointer
for sql there would need to be a way to bind URIs to other databases

In all cases some kind of configuration would need to be supplied.

Additionally, the interpretation may be one of:

the reference MUST be present
the reference SHOULD be present

Furthermore, the semantics of deleting could be one of:

no action
cascading delete
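
To make the options above concrete, here is a minimal sketch for the same-document case only. Everything in it (`Store`, `Presence`, `OnDelete`) is a hypothetical name invented for illustration, not part of LinkML or its runtime; external references would need the per-substrate configuration described above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Presence(Enum):
    MUST = "must"      # a dangling reference is reported as an error
    SHOULD = "should"  # a dangling reference is reported as a warning


class OnDelete(Enum):
    NO_ACTION = "no_action"
    CASCADE = "cascade"


@dataclass
class Store:
    """Toy in-memory document: object id -> id of the object it references (or None)."""
    refs: Dict[str, Optional[str]] = field(default_factory=dict)

    def check(self, presence: Presence) -> Tuple[List[str], List[str]]:
        """Return (errors, warnings) for references that do not resolve in this store."""
        dangling = [
            f"{src} -> {dst}"
            for src, dst in self.refs.items()
            if dst is not None and dst not in self.refs
        ]
        return (dangling, []) if presence is Presence.MUST else ([], dangling)

    def delete(self, obj_id: str, on_delete: OnDelete) -> None:
        """Delete an object; with CASCADE, also delete every object that refers to it."""
        self.refs.pop(obj_id, None)
        if on_delete is OnDelete.CASCADE:
            for src in [s for s, d in self.refs.items() if d == obj_id]:
                self.delete(src, on_delete)


store = Store({"org:1": None, "person:1": "org:1", "person:2": "org:9"})
print(store.check(Presence.SHOULD))   # person:2 points at a missing object
store.delete("org:1", OnDelete.CASCADE)
print(sorted(store.refs))             # person:1 was removed along with org:1
```

With `NO_ACTION`, deleting `org:1` would leave `person:1` holding a dangling reference, which a later `check` would then report as an error or a warning depending on the chosen interpretation.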

`array: null` for any shaped arrays not allowed

I believe we arrived at a place where this would represent an array with any shape:

classes:
  MyClass:
    attributes:
      an_array:
        range: int
        array:

but currently the range of the array slot is a non-nullable array_expression:

linkml-model/linkml_model/model/schema/meta.yaml

Line 1430 in e53a511

range: array_expression

edit: wait obviously you can make metamodel attributes optional too my bad. i'll PR

Not sure how to express this in the schema, my first instinct would be something like this:

any_of:
- range: array_expression
- range: null

but I'm not sure if that's valid.

Another option to be able to differentiate between an explicit null like that and an any shaped array might be to make the syntax

array: Any

which would make the metamodel more straightforward:

any_of:
- range: array_expression
- range: Anything
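
For what it's worth, a consumer would have to tell three cases apart: no `array` key at all, `array` present but empty (a YAML parser typically loads that as null), and `array` with an actual expression. A rough sketch over already-parsed attribute dicts follows. `describe_array` is a made-up helper, and the `exact_number_dimensions` key is only there as an illustrative constraint.

```python
from typing import Any, Dict

_MISSING = object()  # sentinel: the `array` key is not present at all


def describe_array(attribute: Dict[str, Any]) -> str:
    """Classify an attribute by how its `array` metaslot is set."""
    value = attribute.get("array", _MISSING)
    if value is _MISSING:
        return "not an array"
    # Either spelling discussed above: an explicit null, or the literal "Any".
    if value is None or value == "Any":
        return "array of any shape"
    return "array with constrained shape"


print(describe_array({"range": "int"}))
print(describe_array({"range": "int", "array": None}))
print(describe_array({"range": "int", "array": "Any"}))
print(describe_array({"range": "int", "array": {"exact_number_dimensions": 2}}))
```

The sentinel is the point: a plain `attribute.get("array")` cannot distinguish a missing key from an explicit null, which is the ambiguity the `array: Any` spelling would avoid.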

URIs for types do not resolve

E.g.

https://linkml.github.io/linkml-model/docs/types/Datetime/

gives the URI as

https://w3id.org/linkml/Datetime

but this doesn't resolve

Attach inlined metaslot to both 'slot definition' and 'slot expression'

This ticket is related to the discussion here: linkml/linkml#664

"From it looks like indeed the inline metaslots were deliberately attached to slot_definition, but in future metamodel versions these become slot-expression slots, so they can be used in expressions as well as named slots"

We should change the domain of inlined metaslot such that inlined can be used in 'slot definition' as well as 'slot expression'.

linkml.io home page is empty

https://linkml.io/linkml-model/docs/home/ just says "Introduction
about my_schema"

I think it should have some brief explanation of what linkml is (as on https://github.com/linkml/linkml) and a link to the github repo.

btw I wanted to replace the "bug" label with the "documentation" label on this issue but I'm not allowed to edit labels.

Need a link to the RE grammar used in slot patterns

We need to add documentation about the RE grammar in slot patterns

Need build artifacts in distribution

At the moment, the only files that we publish in pypi are the contents of the linkml_model directory. We also need to include the graphql, json, jsonld, jsonschema, model, owl, rdf, and shex directories, so that all of these artifacts can be accessed locally by other packages.

We can't just add these to setup.cfg as packages because that would end up adding "json", "owl", etc. to the Python site-packages, so we need to build the following directory structure:

docs/
linkml_model/
    __init__.py      --> imports all the structures from python below
    graphql/
    json/
    jsonld/
    jsonschema/
    python/          --> everything from linkml_model except the init file. Add a new blank __init__ underneath:
        __init__.py
        annotations.py
        extensions.py
        linkml_files.py
        mappings.py
        meta.py
        README.md
        types.py
    owl/
    rdf/
    shex/
    model/
tests/
.gitignore
...

For the short term, it would be cool if a setup.py expert could map the existing structure into something like the above. In the longer term, we probably need to go to this source structure (?)

XML allows xsd:minInclusive and xsd:maxInclusive to apply to dates as well as numbers

As per https://www.w3.org/TR/xmlschema-2/#dc-minInclusive. In DataHarmonizer we allow minimum and maximum date ranges this way.

I was wondering if linkml:minimum_value and maximum_value could also be loosened to include date min max?
Currently linkml has:

Description: for slots with ranges of type number, the value must be equal to or lower than this
Range: Integer
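
As a rough illustration of what a loosened check could look like, the same inclusive comparison works for numbers and for dates in Python. This is only a sketch: `in_range` is a hypothetical helper, not LinkML validator code, and it assumes the value and its bounds are of the same kind (all numbers or all dates).

```python
from datetime import date
from typing import Optional, Union

Bound = Union[int, float, date]


def in_range(value: Bound,
             minimum: Optional[Bound] = None,
             maximum: Optional[Bound] = None) -> bool:
    """Inclusive range check that works for numbers and for dates alike."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


print(in_range(7, minimum=0, maximum=10))                     # True
print(in_range(date.fromisoformat("2021-06-15"),
               minimum=date(2021, 1, 1),
               maximum=date(2021, 12, 31)))                   # True
print(in_range(date(2022, 1, 1), maximum=date(2021, 12, 31)))  # False
```

Mixing kinds (say, a date value against an integer minimum) raises a `TypeError` here, so a real implementation would presumably need to check the slot's range before comparing.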

Runtime footprint needs to be reduced

The pypi image currently requires the whole of biolinkml (soon to be linkml). While the whole package is needed to generate the output, the runtime portion should be cut down to the minimal libraries needed to support YAMLRoot and its relatives. This update should be done once we get linkml fully split into three components (model, runtime and development packages)

Add example instances of example schemas

This repo uses the linkml-example-runner framework, including examples of the schema, and testing those examples

https://github.com/linkml/linkml-model/tree/main/tests/input/examples

but this repo is special: it is a metamodel, so we need to go one level down. For each example schema, we want to include example instances

Proposal for enhancement to "id_prefixes"

id_prefixes currently says "the identifier of this class or slot must begin with one of the URIs referenced by this prefix".

This sort of implies that a prefix can reference more than one URI. I'm hoping that we are dealing with a model where every prefix maps to exactly one URI (note, however, that the reverse may not necessarily be true... I need to check whether we guarantee uniqueness of URIs per prefix)

when it comes to actually validating data, I would think that the following:

classes:
    HighClass:
        id_prefixes:
            - NCIt
            - SCT

Would assert that a YAML or JSON representation of the id of an instance of HighClass would necessarily start with "NCIt:" or "SCT:", while an RDF instance would start with https://nci.....org/ncit/... or http://snomed.org/id/.

What I would propose, however, is that we extend the definition of id_prefixes to support the following:

classes:
   HighClass:
       id_prefixes:
          NCIt:
          SCT:

Which would be the same as the above. We would extend the definition slightly, to allow:

classes:
    HighClass:
       id_prefixes:
           NCIt: ^C\d{5,6}$
           SCT: ^\d{6,18}$

Which would assert that the local name of a CURIE or URI must begin with "C" followed by 5 or 6 digits if the prefix is NCIt, or must be a 6 to 18 digit number if the prefix is SCT.

This would be a minimal change to the LinkML model itself and, since the loaders do not yet do anything with ID prefixes, no additions would be needed there.

Questions:

Do we really need the "^...$" pattern or can we assume them?
Would we ever want two or more patterns and, if so, would we want something of the form

    id_prefixes:
        NCIt:
          - C\d{5}
          - M\d{7}

or would "(C\d{5}|M\d{7})" be ok?

It should be noted that SNOMED CT, in particular, includes a check digit and other formatting information that isn't expressible as a simple RE. Should we provide a hook for future use that names an algorithm, or just let it slide?

My suggested answers are: 1) Assume them, 2) single RE is fine and 3) nah - not now
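
A minimal sketch of how a validator might apply the proposal under those suggested answers, for the CURIE case only. The `ID_PREFIXES` mapping and `valid_curie` function are hypothetical names for illustration, since the loaders do not implement any of this today.

```python
import re
from typing import Dict, Optional

# Hypothetical parsed form of the proposed mapping:
# prefix -> local-name pattern, or None when any local name is allowed.
ID_PREFIXES: Dict[str, Optional[str]] = {
    "NCIt": r"C\d{5,6}",
    "SCT": r"\d{6,18}",
}


def valid_curie(curie: str, id_prefixes: Dict[str, Optional[str]]) -> bool:
    """Check a CURIE against the allowed prefixes and optional local-name patterns."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in id_prefixes:
        return False
    pattern = id_prefixes[prefix]
    # re.fullmatch anchors at both ends, so "^...$" can be assumed rather than written.
    return pattern is None or re.fullmatch(pattern, local) is not None


print(valid_curie("NCIt:C12345", ID_PREFIXES))  # True
print(valid_curie("NCIt:C1234", ID_PREFIXES))   # False: only 4 digits
print(valid_curie("SCT:123456", ID_PREFIXES))   # True
print(valid_curie("HP:0000001", ID_PREFIXES))   # False: prefix not listed
```

A single alternation such as `C\d{5}|M\d{7}` also works with `re.fullmatch`, which is one argument for not needing a list of patterns per prefix.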

Move to model version 2.0.0

The URL changes and associated structure are breaking changes. Migrate to model version 2.0.0

move this repo to poetry

already made some simplifications to the Pipenv setup here: #102

Need to map a specific version identifier to a source

The current code in linkml_model/linkml_files.py maps a version identifier (e.g. "v0.0.1") to the SHA associated with the version. The SHA, however, doesn't give us the state of any file at that point in time, just the files that have changed. We need to add some code to linkml_files.py that allows us to go from:

GITHUB_PATH_FOR(Source.META, Format.NATIVE_JSONLD, "v0.0.1") to the version of jsonld/meta.model.context.jsonld at the point that the version tag was added.
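
One possible direction, sketched under assumptions: GitHub's raw-content host serves a file as it existed at a given ref, so if the version identifier is also a tag name, a URL can be built from the tag directly. `raw_url_for` below is a hypothetical helper, not the existing `GITHUB_PATH_FOR`, and whether a given tag and path actually exist in the repo is not checked.

```python
from urllib.parse import quote

# Assumption: the version identifier is also a git tag in this repository.
RAW_BASE = "https://raw.githubusercontent.com/linkml/linkml-model"


def raw_url_for(tag: str, path: str) -> str:
    """Build a raw-content URL for `path` as of `tag` (no network access here)."""
    return f"{RAW_BASE}/{quote(tag, safe='')}/{quote(path)}"


print(raw_url_for("v0.0.1", "jsonld/meta.model.context.jsonld"))
```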

what the heck is 11179 with respect to permissible value meanings?

Please create issues in main linkml/linkml repo, not here

To help us keep track of issues, please create all LinkML issues in https://github.com/linkml/linkml/issues rather than here. Thanks!

Add `value_presence` slot to `slot_expression`

Similar to slots like equals_string and equals_number, the value_presence slot would be useful for describing class rules, i.e. expressing rules like "if slot a has any value, then slot b must also have a value".
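
To illustrate the kind of rule this would enable, here is a small sketch over plain dicts. It is not LinkML syntax or validator code: `has_value` and `check_rule` are hypothetical helpers, and treating `None` and an empty list as "no value" is an assumption.

```python
from typing import Any, Dict


def has_value(instance: Dict[str, Any], slot: str) -> bool:
    """Treat a slot as present when it is set to something other than None or []."""
    value = instance.get(slot)
    return value is not None and value != []


def check_rule(instance: Dict[str, Any], if_present: str, then_present: str) -> bool:
    """Rule: if `if_present` has any value, `then_present` must have one too."""
    return not has_value(instance, if_present) or has_value(instance, then_present)


print(check_rule({"a": 1, "b": "x"}, "a", "b"))  # True: both present
print(check_rule({"a": 1}, "a", "b"))            # False: a is set, b is missing
print(check_rule({"b": "x"}, "a", "b"))          # True: rule does not apply
```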

Unit tests needed

A generic set of unit tests need to be added to:

Verify that all of the generated python actually works
Verify that all of the other artifacts are syntactically correct.

With the exception of GraphQL and the docs directory, part 2 can be realized by using the json and RDF loaders once they are separated into runtime

Fix unit tests for httpd test

The code in https://github.com/linkml/linkml-model/blob/main/tests/test_rewrite_rules/test_rewrite_rules.py needs to be updated to test the rules in https://github.com/linkml/linkml-model/tree/main/tests/test_rewrite_rules/httpd

linkml_model/__init__.py needs relative paths

We believe that things will work a lot better if the __init__.py file has relative paths to the types, meta, etc. We say "believe" because we won't know whether there are other issues w/ relative paths until we start using it.

`types` doesn't work as a relative path in python

from extensions import Extension
from types import Boolean

imports Extension as expected. types, however, is also the name of a python library and it appears that the python interpreter gives precedence to builtin libraries, resulting in the following error:

Traceback (most recent call last):
ImportError: cannot import name 'Boolean' from 'types' (/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/types.py)

An obvious alternative would be to use relative paths:

from .extensions import Extension
from .types import Boolean

But this is known to have issues -- one needs to have established a base before this works.

The other alternative, absolute paths:

from linkml_model.extensions import Extension
from linkml_model.types import Boolean

But this would require that the base (linkml_model in this case) be passed as an argument to the generator -- a bit of a challenge.
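
A quick way to confirm which module a bare name resolves to is sketched below. The `defines` helper is made up for this check, and the comments assume the working directory contains no local `types.py` that gets picked up instead.

```python
import importlib


def defines(module_name: str, attribute: str) -> bool:
    """Import `module_name` the way a bare import would and test for `attribute`."""
    module = importlib.import_module(module_name)
    return hasattr(module, attribute)


# The standard-library `types` module has no Boolean, which matches the ImportError above.
print(defines("types", "Boolean"))
# It does define its own names, so the bare import clearly found the stdlib module.
print(defines("types", "SimpleNamespace"))
```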

Add testing for py37, py39, and py310

Right now the github actions are only running on py38. The workflow mixes together the code that runs testing and the code that does commits, so I don't want to mess that up by sending a PR that adds all of these in a strategy. Maybe there's a tricky way to use the if: entry to only run the commit steps on py38

Need a PR unit test

There are currently no unit tests for a PR