Comments (13)

matuskalas commented on May 18, 2024

I'd suggest exploring options for - and analysing the possible consequences of - auto-generating version-specific (i.e. long-term unstable) alternative dereferenceable URIs in the form of something like:

http://edamontology.org/1.16/Environmental_omics
or
http://edamontology.org/EDAM_1.16.owl#Environmental_omics
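
For illustration only, a minimal sketch of how the two URI shapes above could be generated from a release number and a label-derived ID. The function names and the `BASE` constant are hypothetical and not part of any existing EDAM tooling.

```python
# Base URI as it appears in the examples above.
BASE = "http://edamontology.org"


def versioned_path_uri(version: str, concept_id: str) -> str:
    """First form: the release number is a path segment."""
    return f"{BASE}/{version}/{concept_id}"


def versioned_fragment_uri(version: str, concept_id: str) -> str:
    """Second form: the release number is part of the OWL file name
    and the concept is addressed as a fragment."""
    return f"{BASE}/EDAM_{version}.owl#{concept_id}"


if __name__ == "__main__":
    print(versioned_path_uri("1.16", "Environmental_omics"))
    print(versioned_fragment_uri("1.16", "Environmental_omics"))
```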

from edamontology.

joncison commented on May 18, 2024

Hi @tetron @pekrau @mr-c @matuskalas @veitveit

To keep this thread alive and stimulate discussion ....

In EDAM_dev.owl, I systematically added a hasHumanReadableId annotation to all non-deprecated Topic concepts, e.g.
Nucleic acids hasHumanReadableId Nucleic_acids

These IDs are identical to the concept label, except that

  • spaces replaced by "_"
  • other non-alphabetic characters (in practice only "," (commas)) removed
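
The two rules above can be sketched roughly as follows. This is an illustration rather than the script actually used; the function name is hypothetical, and it applies the "remove non-alphabetic characters" rule literally, so digits would be dropped as well.

```python
def human_readable_id(label: str) -> str:
    """Derive a human-readable ID from a concept label."""
    # Rule 1: spaces are replaced by underscores.
    with_underscores = label.strip().replace(" ", "_")
    # Rule 2: every other non-alphabetic character is removed
    # (the underscores introduced by rule 1 are kept).
    return "".join(ch for ch in with_underscores if ch.isalpha() or ch == "_")


if __name__ == "__main__":
    print(human_readable_id("Nucleic acids"))
```

If labels containing digits should keep them, the filter would need `ch.isalnum()` instead; that is a design choice the rules above leave open.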

It will take some server-side magic before URLs such as http://edamontology.org/Nucleic_acids resolve to anything (@matuskalas ?)

We have the issue of a term appearing in >1 subontology, e.g. Sequence alignment (http://edamontology.org/data_0863, http://edamontology.org/operation_0292). I suspect this again can be handled server-side, i.e. serving a metadata blob for disambiguation.
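
One way the server-side disambiguation might look, as a rough sketch only: the `LABEL_INDEX` table, the `resolve` function and the status strings are all hypothetical, and the only concept URIs used are the two quoted above.

```python
# Hypothetical index: human-readable ID -> every concept URI with that label.
LABEL_INDEX = {
    "Sequence_alignment": [
        "http://edamontology.org/data_0863",
        "http://edamontology.org/operation_0292",
    ],
}


def resolve(human_id: str, index: dict = LABEL_INDEX) -> dict:
    """Return what a server could answer for a label-based URL."""
    candidates = index.get(human_id, [])
    if not candidates:
        return {"status": "not_found", "candidates": []}
    if len(candidates) == 1:
        # Unambiguous: the server could simply redirect.
        return {"status": "redirect", "candidates": candidates}
    # Label used in more than one subontology: return all candidates
    # so the client (or a human) can pick one.
    return {"status": "ambiguous", "candidates": candidates}


if __name__ == "__main__":
    print(resolve("Sequence_alignment"))
```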

Just take what I did as an experiment. Once we agree on a general direction, we could do it for formats, too. The reason I'm getting to this now (apart from never having enough time for EDAM :-/) is that EDAM is maturing nicely (esp. Topics and Formats), making human-readable IDs more timely.

PS. OWLgeeky bit:

    <owl:AnnotationProperty rdf:about="&mirrors;ID_0888">
        <rdfs:label>hasHumanReadableId</rdfs:label>
        <rdfs:subPropertyOf rdf:resource="&oboInOwl;hasAlternativeId"/>
    </owl:AnnotationProperty>        

joncison commented on May 18, 2024

This info is available in the OWL file (IDs are associated with a term and synonyms) and also, I think, in the CSV download from BioPortal. I suspect this isn't quite what you want though, so again, please provide a mock-up / example file format and we'll look at it. Cheers!

pekrau commented on May 18, 2024

I have never quite understood why ontologies use numerical identifiers at all, even less why they are seen as the canonical identifiers. The point of an ontology is to define the terms and put them into context, as far as I understand it. The numerical identifiers are really like database foreign keys, which (ought to) be meaningless outside the context of the database instance.

tetron commented on May 18, 2024

@jongithub For background, here's the original discussion with @hmenager : https://groups.google.com/forum/#!topic/common-workflow-language/EZlxSPndtSQ

The basic problem is that we want a stable set of human-readable identifiers that are suitable for use in a text file format that may be written and consumed by humans, and which map to canonical EDAM numeric terms. @hmenager has stated that the existing rdfs:labels are not stable as a means of mapping to the numeric terms, which leaves three options: EDAM provides an official stable mapping; we create an unofficial vocabulary from the existing rdfs:labels (which will create a maintenance burden and likely lag behind EDAM development); or we use a different ontology with a more user-friendly vocabulary.

joncison commented on May 18, 2024

EDAM is an ontology of unique concepts identified by URIs; both a concept and its URI ID persist (remain stable) between versions. Everything else (including the preferred label (term), synonyms, definition, etc.) is a property of the concept and is not guaranteed to be stable.

With that said, we are of course aiming for term stability and are much closer to this in some areas (notably Format and Data->Identifier) than others.

Absolute term stability cannot be guaranteed, e.g. for the operation and data branches, because the choice of a term and its synonyms must reflect the current vernacular, i.e. the terms actually in use by real scientists, and this is volatile. Maintaining the right assignment will involve programmatic analysis (e.g. Google Trends) and input from scientific domain specialists. While we can hope - with funded plans in ELIXIR - for improved versions soon, the process must necessarily be subject to continual update. No ontology can ever be finished.

As for the idea of creating a new vocabulary, this is not a good idea: 1) it contributes to the mess and usability problems of UNcontrolled vocabularies, 2) there's an unsustainable mapping to maintain, 3) it misses an opportunity to pool resources and make EDAM better. Suffice it to say I know of no serious production ontologies that do this anymore, and there have been many failed attempts in the past (even by well-resourced institutes). In short, it'd be a retrograde step at precisely the moment that ELIXIR is moving towards a common vocabulary.

So I hope instead we can find a practical solution to get what you need (let's keep on discussing ...)

mr-c commented on May 18, 2024

For file formats I believe we have agreement from the ELIXIR-DK meeting in Amsterdam that IANA Media Types are an excellent stable identifier.

For data format subtype concepts & non-format concepts, what about a human friendly string with a version number attached?

joncison commented on May 18, 2024

I have some reservations about whether bio formats are really well enough defined to meet the normal expectations of a media type.

Nonetheless, I'd welcome IANA bioinformatics media types, so long as 1) the effort to define them is coordinated with ELIXIR / EDAM (which is funded to provide a comprehensive catalogue of bioinformatics formats) and 2) the IANA metadata and EDAM format definitions are cross-linked. We risk disjoint or duplicate efforts otherwise. I'd welcome very much someone e.g. OBF to coordinate / provide format specifications / documentation where these do not exist.

As for human-friendly, stable and persistent EDAM concept IDs, these can easily be created on all EDAM concepts, e.g. using an alternative_id attribute or some such. I imagine a little work is needed to ensure they're resolvable (after some trivial transform). They'd normally be identical to the concept label but, in contrast, would be guaranteed never to change. So we'd have to be very careful that we get them right. But it's no problem to create these.

The version I expect would be an integral part of the label, e.g. GFF3. Or if you meant the EDAM version when the concept was created, this is defined in the "Created in" annotation.

As for turn-around time, we're aiming for monthly releases in 2016, with a roadmap, which I hope will help.

veitveit commented on May 18, 2024

Can auto-generated URIs like the ones above be easily implemented? If yes, then I suggest we go for them. As the version is given in these URIs, they will always be mappable to the actual IDs.

joncison commented on May 18, 2024

I agree, this would be very nice. What (some) folk really want is this of course:

http://edamontology.org/Environmental_omics

which could be done where we're certain (ahem...) of stability of preferred labels. This is fairly easy for formats and topics but less so for operation and data.

matuskalas commented on May 18, 2024

What is this mirrors;ID_0888???

matuskalas commented on May 18, 2024

This will be done systematically in the new EDAM API. A version will always have to be provided when using labels as IDs and, as long as non-unique (duplicated) labels across EDAM sub-ontologies are insisted upon, a sub-ontology too, e.g. http://edamontology.org/1.21/topic/Nucleic_acids
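
To make the URI shape concrete, here is a minimal sketch of how a client or server could split such a label-based URI into its version, sub-ontology and label parts. The `LabelRef` type and `parse_label_uri` function are hypothetical; the planned EDAM API is not described in this thread and may well look different.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class LabelRef(NamedTuple):
    version: str      # e.g. "1.21"
    subontology: str  # e.g. "topic"
    label_id: str     # e.g. "Nucleic_acids"


def parse_label_uri(uri: str) -> LabelRef:
    """Split <base>/<version>/<subontology>/<label> into its parts."""
    parts = urlparse(uri).path.strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected /<version>/<subontology>/<label>, got: {uri}")
    return LabelRef(*parts)


if __name__ == "__main__":
    print(parse_label_uri("http://edamontology.org/1.21/topic/Nucleic_acids"))
```

A URI that lacks the version or sub-ontology segment is rejected rather than guessed at, which matches the requirement that both always be provided.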

After the web API is in place, there will be no need for copy&pasting the label into a hasHumanReadableId and thus opening a new source of inconsistencies.

@pekrau Ontologies use numeric IDs in order to enable version-independent identification (URIs).

joncison commented on May 18, 2024

"&mirrors;ID_0888" looks like a weirdness introduced by me when using Protege. It would be good to check in the file which ID pattern is used for annotations and whether it is consistent.
