Comments (13)
I'd suggest exploring options for, and analysing the possible consequences of, auto-generating version-specific (i.e. unstable in the long term) alternative dereferenceable URIs in the form of something like:
http://edamontology.org/1.16/Environmental_omics
or
http://edamontology.org/EDAM_1.16.owl#Environmental_omics
from edamontology.
Hi @tetron @pekrau @mr-c @matuskalas @veitveit
To keep this thread alive and stimulate discussion ....
In EDAM_dev.owl, I systematically added a hasHumanReadableId annotation to all non-deprecated Topic concepts, e.g.
Nucleic acids hasHumanReadableId Nucleic_acids
These IDs are identical to the concept label, except that
- spaces replaced by "_"
- other non-alphabetic characters (in practice only "," (commas)) removed
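A minimal sketch of the two rules above, read literally. The function name is hypothetical and not part of EDAM. Note that a literal reading also drops digits and hyphens, which may or may not be intended, since in practice only commas occur.

```python
import re


def label_to_human_readable_id(label: str) -> str:
    """Derive a human-readable ID from a concept label (hypothetical helper)."""
    # Rule 1: spaces are replaced by underscores.
    with_underscores = label.replace(" ", "_")
    # Rule 2: every other non-alphabetic character is removed.
    # Assumption: "alphabetic" means ASCII letters; "_" is kept because rule 1 produced it.
    return re.sub(r"[^A-Za-z_]", "", with_underscores)


print(label_to_human_readable_id("Nucleic acids"))  # prints Nucleic_acids
```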
It will take some server-side magic before URLs such as http://edamontology.org/Nucleic_acids resolve to anything (@matuskalas ?)
We have the issue of a term appearing in >1 subontology, e.g. Sequence alignment (http://edamontology.org/data_0863, http://edamontology.org/operation_0292). I suspect this again can be handled server-side, i.e. serving a metadata blob for disambiguation.
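One way the server-side disambiguation could look, as a rough sketch only. The index structure, the function names and the shape of the returned "metadata blob" are all assumptions for illustration; only the two Sequence alignment URIs come from this thread.

```python
from collections import defaultdict

# (label, concept URI) pairs; both URIs are the ones quoted in this comment.
CONCEPTS = [
    ("Sequence alignment", "http://edamontology.org/data_0863"),
    ("Sequence alignment", "http://edamontology.org/operation_0292"),
]


def build_index(concepts):
    """Map each human-readable ID to every concept URI that carries that label."""
    index = defaultdict(list)
    for label, uri in concepts:
        index[label.replace(" ", "_")].append(uri)
    return dict(index)


def resolve(index, human_readable_id):
    """Return a redirect target, or a disambiguation blob when several concepts match."""
    matches = index.get(human_readable_id, [])
    if len(matches) == 1:
        return {"status": "redirect", "target": matches[0]}
    if matches:
        # Hypothetical blob: the client (or a human) picks one of the candidates.
        return {"status": "ambiguous", "candidates": matches}
    return {"status": "not_found"}


index = build_index(CONCEPTS)
print(resolve(index, "Sequence_alignment")["status"])  # prints ambiguous
```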
Just take what I did as an experiment. Once we agree on a general direction, we could do it for formats, too. The reason I'm getting to this now (apart from never having enough time for EDAM :-/) is that EDAM is maturing nicely (esp. Topics and Formats), making human-readable IDs more timely.
PS. OWLgeeky bit:
<owl:AnnotationProperty rdf:about="&mirrors;ID_0888">
<rdfs:label>hasHumanReadableId</rdfs:label>
<rdfs:subPropertyOf rdf:resource="&oboInOwl;hasAlternativeId"/>
</owl:AnnotationProperty>
from edamontology.
This info is available in the OWL file (IDs are associated with a term and synonyms) and also, I think, in the CSV download from BioPortal. I suspect this isn't quite what you want though, so again, please provide a mock-up / example file format and we'll look at it. Cheers!
from edamontology.
I have never quite understood why ontologies use numerical identifiers at all, even less why they are seen as the canonical identifiers. The point of an ontology is to define the terms and put them into context, as far as I understand it. The numerical identifiers are really like database foreign keys, which are (or ought to be) meaningless outside the context of the database instance.
from edamontology.
@jongithub For background, here's the original discussion with @hmenager : https://groups.google.com/forum/#!topic/common-workflow-language/EZlxSPndtSQ
The basic problem is that we want a stable set of human-readable identifiers, suitable for use in a text file format that may be written and consumed by humans, which map to canonical EDAM numeric terms. @hmenager has stated that the existing rdfs:labels are not stable as a means of mapping to the numeric terms. That leaves three options: EDAM provides an official stable mapping, we create an unofficial vocabulary from the existing rdfs:labels (which will create a maintenance burden and likely lag behind EDAM development), or we use a different ontology with a more user-friendly vocabulary.
from edamontology.
EDAM is an ontology of unique concepts identified by URIs; both a concept and its URI persist (remain stable) between versions. Everything else, including the preferred label (term), synonyms, definition, etc., is a property of the concept and not guaranteed to be stable.
With that said, we are of course aiming for term stability and are much closer to this in some areas (notably Format and Data->Identifier) than others.
Absolute term stability cannot be guaranteed, e.g. for the operation and data branches, because the choice of a term and its synonyms must reflect the current vernacular, i.e. the terms actually in use by real scientists, and this is volatile. Maintaining the right assignment will involve programmatic analysis (e.g. Google Trends) and input from scientific domain specialists. While we can hope - with funded plans in ELIXIR - for improved versions soon, the process must necessarily be subject to continual update. No ontology can ever be finished.
As for the idea of creating a new vocabulary, this is not a good idea: 1) it contributes to the mess and usability problems of UNcontrolled vocabularies, 2) there's an unsustainable mapping to maintain, 3) it misses an opportunity to pool resources and make EDAM better. Suffice it to say I know of no serious production ontologies that do this anymore, and there have been many failed attempts in the past (even by well-resourced institutes). In short, it'd be a retrograde step at precisely the moment that ELIXIR is moving towards a common vocabulary.
So I hope instead we can find a practical solution to get what you need (let's keep on discussing ...)
from edamontology.
For file formats I believe we have agreement from the ELIXIR-DK meeting in Amsterdam that IANA Media Types are an excellent stable identifier.
For data format subtype concepts & non-format concepts, what about a human friendly string with a version number attached?
from edamontology.
I have some reservations whether bio formats are really that well defined to support the normal expectations of a media type.
Nonetheless, I'd welcome IANA bioinformatics media types, so long as 1) the effort to define them is coordinated with ELIXIR / EDAM (which is funded to provide a comprehensive catalogue of bioinformatics formats) and 2) the IANA metadata and EDAM format definitions are cross-linked. We risk disjoint or duplicate efforts otherwise. I'd welcome very much someone e.g. OBF to coordinate / provide format specifications / documentation where these do not exist.
As for human-friendly, stable and persistent EDAM concept IDs, these can easily be created on all EDAM concepts, e.g. using an alternative_id attribute or some such. I imagine a little work is needed to ensure they're resolvable (after some trivial transform). They'd normally be identical to the concept label but, in contrast, would be guaranteed to never change. So we'd have to be very careful that we get them right. But it's no problem to create these.
The version I expect would be an integral part of the label, e.g. GFF3. Or if you meant the EDAM version when the concept was created, this is defined in the "Created in" annotation.
As for turn-around time, we're aiming for monthly releases in 2016, with a roadmap, which I hope will help.
from edamontology.
Can auto-generated URIs like the ones above be easily implemented? If yes, then I suggest we go for them. As the version is given in these URIs, they will always be mappable to the actual IDs.
from edamontology.
I agree, this would be very nice. What (some) folk really want is this of course:
http://edamontology.org/Environmental_omics
which could be done where we're certain (ahem...) of stability of preferred labels. This is fairly easy for formats and topics but less so for operation and data.
from edamontology.
What is this mirrors;ID_0888???
from edamontology.
This will be done systematically in the new EDAM API. A version will always have to be provided when using labels as IDs and, as long as non-unique (duplicated) labels across EDAM sub-ontologies are insisted on, a sub-ontology too, e.g. http://edamontology.org/1.21/topic/Nucleic_acids
After the web API is in place, there will be no need to copy and paste the label into a hasHumanReadableId annotation, which would open up a new source of inconsistencies.
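To illustrate the URI pattern from the example above, here is a small sketch that builds and parses such version/sub-ontology/label URIs. The function names and the set of accepted sub-ontology names are assumptions for illustration; this is not the EDAM API.

```python
BASE = "http://edamontology.org"
# Assumption: the four sub-ontology names as they would appear in the URI path.
SUBONTOLOGIES = {"topic", "operation", "data", "format"}


def versioned_label_uri(version: str, subontology: str, label_id: str) -> str:
    """Build a version-specific, label-based URI (hypothetical helper)."""
    if subontology not in SUBONTOLOGIES:
        raise ValueError(f"unknown sub-ontology: {subontology!r}")
    return f"{BASE}/{version}/{subontology}/{label_id}"


def parse_versioned_label_uri(uri: str) -> tuple:
    """Split such a URI back into (version, subontology, label_id)."""
    prefix = BASE + "/"
    if not uri.startswith(prefix):
        raise ValueError(f"not an EDAM URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or parts[1] not in SUBONTOLOGIES:
        raise ValueError(f"not a versioned label URI: {uri!r}")
    return tuple(parts)


print(versioned_label_uri("1.21", "topic", "Nucleic_acids"))
# prints http://edamontology.org/1.21/topic/Nucleic_acids
```

Because the version and sub-ontology are both part of the path, a label that is duplicated across sub-ontologies or renamed in a later release still maps to exactly one concept.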
@pekrau Ontologies use numeric IDs in order to enable version-independent identification (URIs).
from edamontology.
"&mirrors;ID_0888"
looks like an oddity that I introduced by using Protege. It would be good to check in the file which ID pattern is used for annotations and whether it is consistent.
from edamontology.