In evaluating how the Common Workflow Language (<a href="https://github.com/common-wor

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Provide official mapping of alternate human-readable terms (edamontology, OPEN)

edamontology commented on May 18, 2024

Provide official mapping of alternate human-readable terms

from edamontology.

Comments (13)

matuskalas commented on May 18, 2024

I'd suggest exploring options for - and analysing possible consequences of - auto-generating version-specific (i.e. long-term unstable) alternative dereferenceable URIs in the form of something like:

http://edamontology.org/1.16/Environmental_omics
or
http://edamontology.org/EDAM_1.16.owl#Environmental_omics


joncison commented on May 18, 2024

Hi @tetron @pekrau @mr-c @matuskalas @veitveit

To keep this thread alive and stimulate discussion ....

In EDAM_dev.owl, I systematically added a hasHumanReadableId annotation to all non-deprecated Topic concepts, e.g.
Nucleic acids hasHumanReadableId Nucleic_acids

These IDs are identical to the concept label, except that

spaces are replaced by "_"
other non-alphabetic characters (in practice only commas) are removed
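
The two rules above can be sketched in a few lines of Python. This is only an illustration of the transform as described in this comment, not the script actually used to annotate EDAM_dev.owl; the function name `human_readable_id` is hypothetical. Note that following the rule literally also drops digits and hyphens, which may not be wanted if the same approach is later applied to formats.

```python
import re


def human_readable_id(label: str) -> str:
    """Derive an ID from a concept label (hypothetical helper).

    Spaces become "_"; every other non-alphabetic character is removed,
    exactly as the two rules in the comment describe.
    """
    with_underscores = label.replace(" ", "_")
    # Keep only ASCII letters and the underscores introduced above.
    return re.sub(r"[^A-Za-z_]", "", with_underscores)


print(human_readable_id("Nucleic acids"))
```

For the example given in the comment, the label "Nucleic acids" yields `Nucleic_acids`.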

It will take some server-side magic before URLs such as http://edamontology.org/Nucleic_acids resolve to anything (@matuskalas ?)

We have the issue of a term appearing in >1 subontology, e.g. Sequence alignment (http://edamontology.org/data_0863, http://edamontology.org/operation_0292). I suspect this again can be handled server-side, i.e. serving a metadata blob for disambiguation.
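
As a rough sketch of what such server-side disambiguation might look like: the resolver below returns either a single URI or a list of candidates when a human-readable ID matches concepts in more than one subontology. The function names and the shape of the returned dictionary are assumptions made for illustration; only the two "Sequence alignment" URIs come from this thread.

```python
from collections import defaultdict

# Illustrative records only; both URIs are the ones quoted in the comment.
CONCEPTS = [
    ("Sequence alignment", "http://edamontology.org/data_0863"),
    ("Sequence alignment", "http://edamontology.org/operation_0292"),
]


def build_index(concepts):
    """Map each human-readable ID to every concept URI that shares its label."""
    index = defaultdict(list)
    for label, uri in concepts:
        index[label.replace(" ", "_")].append(uri)
    return dict(index)


def resolve(index, readable_id):
    """Return one URI if the ID is unique, else the candidates to choose from."""
    uris = index.get(readable_id, [])
    if len(uris) == 1:
        return {"status": "unique", "uri": uris[0]}
    if uris:
        return {"status": "ambiguous", "candidates": sorted(uris)}
    return {"status": "not_found"}


print(resolve(build_index(CONCEPTS), "Sequence_alignment"))
```

A real server could map the three outcomes onto a redirect, a disambiguation page, and a 404, but that mapping is not specified anywhere in this discussion.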

Just take what I did as an experiment. Once we agree a general direction, we could do it for formats, too. The reason I get to this now (apart from never having enough time for EDAM :-/) is because the EDAM is maturing nicely now (esp. Topics and Formats) making human-readable IDs more timely.

PS. OWLgeeky bit:

    <owl:AnnotationProperty rdf:about="&mirrors;ID_0888">
        <rdfs:label>hasHumanReadableId</rdfs:label>
        <rdfs:subPropertyOf rdf:resource="&oboInOwl;hasAlternativeId"/>
    </owl:AnnotationProperty>


joncison commented on May 18, 2024

This info is available in the OWL file (IDs are associated with a term and synonyms) and also, I think, in the CSV download from BioPortal. I suspect this isn't quite what you want though, so again, please provide a mock-up / example file format and we'll look at it. Cheers!


pekrau commented on May 18, 2024

I have never quite understood why ontologies use numerical identifiers at all, even less why they are seen as the canonical identifiers. The point of an ontology is to define the terms and put them into context, as far as I understand it. The numerical identifiers are really like database foreign keys, which are (or ought to be) meaningless outside the context of the database instance.


tetron commented on May 18, 2024

@jongithub For background, here's the original discussion with @hmenager : https://groups.google.com/forum/#!topic/common-workflow-language/EZlxSPndtSQ

The basic problem is that we want a stable set of human-readable identifiers, suitable for use in a text file format that may be written and consumed by humans, which map to canonical EDAM numeric terms. @hmenager has stated that the existing rdfs:labels are not stable as a means of mapping to the numeric terms. That leaves three options: EDAM provides an official stable mapping; we create an unofficial vocabulary from the existing rdfs:labels (which will create a maintenance burden and likely lag behind EDAM development); or we use a different ontology with a more user-friendly vocabulary.


joncison commented on May 18, 2024

EDAM is an ontology of unique concepts identified by URIs; both a concept and its URI ID persist (remain stable) between versions. Everything else (including the preferred label (term), synonyms, definition, etc.) is a property of the concept and not guaranteed to be stable.

With that said, we are of course aiming for term stability and are much closer to this in some areas (notably Format and Data->Identifier) than others.

The reason absolute term stability cannot be guaranteed, e.g. for the operation and data branches, is that the choice of a term and its synonyms must reflect the current vernacular, i.e. the terms actually in use by real scientists, and this is volatile. Maintaining the right assignment will involve programmatic analysis (e.g. Google Trends) and input from scientific domain specialists. While we can hope - with funded plans in ELIXIR - for improved versions soon, the process must necessarily be subject to continual update. No ontology can ever be finished.

As for the idea of creating a new vocabulary, this is not a good idea: 1) it contributes to the mess and usability problems of UNcontrolled vocabularies, 2) there's an unsustainable mapping to maintain, 3) it misses an opportunity to pool resources and make EDAM better. Suffice it to say I know of no serious production ontologies that do this anymore, and there have been many failed attempts in the past (even by well-resourced institutes). In short, it'd be a retrograde step at precisely the moment that ELIXIR is moving towards a common vocabulary.

So I hope instead we can find a practical solution to get what you need (let's keep on discussing ...)


mr-c commented on May 18, 2024

For file formats I believe we have agreement from the ELIXIR-DK meeting in Amsterdam that IANA Media Types are an excellent stable identifier.

For data format subtype concepts & non-format concepts, what about a human friendly string with a version number attached?


joncison commented on May 18, 2024

I have some reservations about whether bio formats are really well enough defined to support the normal expectations of a media type.

Nonetheless, I'd welcome IANA bioinformatics media types, so long as 1) the effort to define them is coordinated with ELIXIR / EDAM (which is funded to provide a comprehensive catalogue of bioinformatics formats) and 2) the IANA metadata and EDAM format definitions are cross-linked. We risk disjoint or duplicate efforts otherwise. I'd very much welcome someone (e.g. OBF) coordinating / providing format specifications / documentation where these do not exist.

As for human-friendly, stable and persistent EDAM concept IDs, these can easily be created on all EDAM concepts, e.g. using an alternative_id attribute or some such. I imagine a little work is needed to ensure they're resolvable (after some trivial transform). They'd normally be identical to the concept label but, in contrast, would be guaranteed to never change. So we'd have to be very careful that we get them right. But it's no problem to create these.

I expect the version would be an integral part of the label, e.g. GFF3. Or if you meant the EDAM version when the concept was created, this is defined in the "Created in" annotation.

As for turn-around time, we're aiming for monthly releases in 2016, with a roadmap, which I hope will help.


veitveit commented on May 18, 2024

Can auto-generated URIs like the ones above be easily implemented? If yes, then I suggest we go for them. As the version is given in these URIs, they will always be mappable to the actual IDs.


joncison commented on May 18, 2024

I agree, this would be very nice. What (some) folk really want is this of course:

http://edamontology.org/Environmental_omics

which could be done where we're certain (ahem...) of stability of preferred labels. This is fairly easy for formats and topics but less so for operation and data.


matuskalas commented on May 18, 2024

What is this &mirrors;ID_0888???


matuskalas commented on May 18, 2024

This will be done systematically in the new EDAM API. A version will always have to be provided when using labels as IDs and, for as long as non-unique (duplicated) labels across EDAM sub-ontologies are insisted on, a sub-ontology too, e.g. http://edamontology.org/1.21/topic/Nucleic_acids
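
To make the proposed URI shape concrete, here is a minimal Python sketch that assembles a version- and sub-ontology-qualified URI from a label. It only mirrors the pattern of the example URL above; the function name is hypothetical and nothing here reflects the actual EDAM API, which was not yet in place at the time of writing.

```python
from urllib.parse import quote

BASE = "http://edamontology.org"
# The four EDAM sub-ontologies.
SUBONTOLOGIES = {"topic", "operation", "data", "format"}


def versioned_label_uri(version: str, subontology: str, label: str) -> str:
    """Build a URI of the form BASE/<version>/<subontology>/<label> (hypothetical)."""
    if subontology not in SUBONTOLOGIES:
        raise ValueError(f"unknown sub-ontology: {subontology!r}")
    # Spaces become underscores; anything else unsafe is percent-encoded.
    segment = quote(label.replace(" ", "_"), safe="")
    return f"{BASE}/{version}/{subontology}/{segment}"


print(versioned_label_uri("1.21", "topic", "Nucleic acids"))
```

Because both the version and the sub-ontology are part of the path, a label such as "Sequence alignment" that exists in more than one sub-ontology maps to distinct URIs.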

After the web API is in place, there will be no need to copy and paste the label into a hasHumanReadableId and thus open a new source of inconsistencies.

@pekrau Ontologies use numeric IDs in order to enable version-independent identification (URIs).


joncison commented on May 18, 2024

"&mirrors;ID_0888" looks like a weirdness introduced by me using Protege. It would be good to check in the file what ID pattern is used for annotations, and whether it is consistent.

