Comments (12)
@rdmpage
I'm also a firm believer in Occam's Razor, so IMHO the ontology we should be looking for is the simplest one that satisfies the competency questions, not a more elaborate one, whether driven by philosophy or moral objectives.
from tnc.
Here is a dataset that might be useful to play with:
Agricultural Research Council: Catalogue of Afrotropical Bees. http://doi.org/10.15468/u9ezbh
Accessed via http://www.gbif.org/dataset/da38f103-4410-43d1-b716-ea6b1b92bbac on 2016-10-26
It includes many possible pieces that could be connected using the existing TCS model. It is one of the datasets I played around with and described in this blog post in the section called "Taxon core with Occurrence, TypesAndSpecimen, Distribution, Reference, and Description extensions: Catalogue of Afrotropical Bees". There were a few additional comments about the dataset in the following post.
In my messing around, I used some of the TCS properties in my graph model (described in the post). The triples (as RDF/Turtle) can be downloaded here, but since my purpose was to use as many DwC terms as possible rather than to fully implement TCS, the dataset should probably be re-mapped to more fully embody the TCS graph model. (The mapping files that I used are here but probably won't make sense to anyone who hasn't already messed with Guid-O-Matic.) I don't have time to try re-mapping it myself right now, but if this line of inquiry continues for long enough, I might be able to work on it in a few weeks.
Oh, ho! I see that I put an example record here!
Names Test 3: Is a scientific name a homonym (either within a Code or across Codes)?
Here we test one of the classical "hemihomonyms", that is, a name which occurs in two Codes. Agathis montana is both the name of a wasp and the name of a tree. So, a simple query would be to see how many Codes have a given name:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX tn: <http://rs.tdwg.org/ontology/voc/TaxonName#>
PREFIX tcom: <http://rs.tdwg.org/ontology/voc/Common#>
SELECT *
WHERE {
?thing tn:nameComplete "Agathis montana" .
?thing tn:nomenclaturalCode ?code .
}
giving:
thing | code |
---|---|
urn:lsid:ipni.org:names:92693-1:1.1.2.1.1.1.2.1.1.1 | http://rs.tdwg.org/ontology/voc/TaxonName#botanical |
urn:lsid:organismnames.com:name:1407520 | http://rs.tdwg.org/ontology/voc/TaxonName#ICZN |
urn:lsid:organismnames.com:name:1953681 | http://rs.tdwg.org/ontology/voc/TaxonName#ICZN |
Note that we have two zoological names because ION (the source of the names) has two records for Agathis montana (the same problem bedevils IPNI). So, we need to be a little cleverer:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX tn: <http://rs.tdwg.org/ontology/voc/TaxonName#>
PREFIX tcom: <http://rs.tdwg.org/ontology/voc/Common#>
SELECT (COUNT(DISTINCT ?code) AS ?count)
WHERE {
?thing tn:nameComplete "Agathis montana" .
?thing tn:nomenclaturalCode ?code .
}
This query asks how many distinct Codes contain Agathis montana, and the answer is:
row | count |
---|---|
1 | 2 |
Two Codes contain Agathis montana, so it is a cross-Code homonym. TCS +1
Testing for homonyms within a Code is going to get a little messy given the number of duplicates some data sources contain, so we might want to test using publications, taxon authorship, or, in an ideal world, type specimens.
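The cross-Code check above comes down to counting distinct Codes rather than records. As a minimal sketch in Python (not part of the original test; the function names are made up for illustration, and the rows are the three result rows shown above), the same logic looks roughly like this:

```python
# Result rows copied from the first query: (name identifier, nomenclatural Code).
rows = [
    ("urn:lsid:ipni.org:names:92693-1:1.1.2.1.1.1.2.1.1.1",
     "http://rs.tdwg.org/ontology/voc/TaxonName#botanical"),
    ("urn:lsid:organismnames.com:name:1407520",
     "http://rs.tdwg.org/ontology/voc/TaxonName#ICZN"),
    ("urn:lsid:organismnames.com:name:1953681",
     "http://rs.tdwg.org/ontology/voc/TaxonName#ICZN"),
]


def count_distinct_codes(rows):
    """Mirror COUNT(DISTINCT ?code): duplicate records within a Code collapse."""
    return len({code for _thing, code in rows})


def is_cross_code_homonym(rows):
    """A name string that appears under more than one Code is a cross-Code homonym."""
    return count_distinct_codes(rows) > 1


print(len(rows))                    # 3 records
print(count_distinct_codes(rows))   # 2 Codes
print(is_cross_code_homonym(rows))  # True
```

The two ION records fall under the same Code, so they do not inflate the count. Telling true within-Code homonyms apart from duplicates would need extra fields (authorship, publication, types), which this sketch does not attempt.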
Names Test 5: What objective (Code-governed) synonyms exist for a scientific name?
One way to tackle this is to use basionym relationships, if the name database has them. IPNI and IndexFungorum do (although they are probably incomplete).
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX tn: <http://rs.tdwg.org/ontology/voc/TaxonName#>
PREFIX tcom: <http://rs.tdwg.org/ontology/voc/Common#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT *
WHERE {
?thing tn:nameComplete "Agathis montana" .
?thing owl:versionInfo ?versionInfo .
BIND(IRI(REPLACE( STR(?thing),CONCAT(":", ?versionInfo),"" )) AS ?iri).
{
?name tn:hasBasionym ?iri .
?name tn:nameComplete ?nameComplete .
}
}
This query is a mess because IPNI's RDF is, in a word, buggered. They append a version identifier to the name's LSID, which makes cross-linking within the data almost impossible. A great example of what happens when you design outputs without thinking about users (sigh). So we have to mess about with the name id to get the query to work. The query also works in only one direction (i.e., which names have the query name as their basionym). We'd want to go in the other direction as well (which names are linked to the basionym of the query name), but IPNI's RDF prevents this. IndexFungorum is probably OK for this sort of query. ION is clueless about basionyms, so zoologists miss out.
Here's the result:
thing | versionInfo | iri | name | nameComplete |
---|---|---|---|---|
urn:lsid:ipni.org:names:92693-1:1.1.2.1.1.1.2.1.1.1 | 1.1.2.1.1.1.2.1.1.1 | urn:lsid:ipni.org:names:92693-1 | urn:lsid:ipni.org:names:77076253-1:1.2 | Salisburyodendron montanum |
So, Salisburyodendron montanum is an objective synonym of Agathis montana. TCS +1, IPNI -1
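The identifier juggling in the BIND clause is easier to see outside SPARQL. Here is a rough Python sketch of the same idea, assuming the version is always appended as a final `:<version>` segment, as it is in the two IPNI identifiers in the result table. The function name `strip_lsid_version` is hypothetical, and unlike SPARQL's REPLACE (which treats its pattern as a regular expression and matches anywhere in the string) it only removes a literal trailing suffix:

```python
def strip_lsid_version(lsid: str, version: str) -> str:
    """Return the LSID without a trailing ':<version>' segment, if present."""
    suffix = ":" + version
    if version and lsid.endswith(suffix):
        return lsid[: -len(suffix)]
    return lsid


# Identifiers and version strings taken from the result table above.
print(strip_lsid_version(
    "urn:lsid:ipni.org:names:92693-1:1.1.2.1.1.1.2.1.1.1",
    "1.1.2.1.1.1.2.1.1.1",
))  # urn:lsid:ipni.org:names:92693-1
```

With the version removed, the identifier matches the unversioned form that the tn:hasBasionym links point at.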
Taxonomy Test 6: How do the circumscriptions of the same scientific name by two different authorities compare to each other?
This one is for @nfranz, taken from Fig. 1 of https://doi.org/10.1093/sysbio/syw023, where we have two taxon concepts both named Microcebus murinus.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX tn: <http://rs.tdwg.org/ontology/voc/TaxonName#>
PREFIX tcom: <http://rs.tdwg.org/ontology/voc/Common#>
PREFIX tc: <http://rs.tdwg.org/ontology/voc/TaxonConcept#>
SELECT *
WHERE {
VALUES ?namestring { "Microcebus murinus" }
?concept1 tc:nameString ?namestring .
?concept1 tc:accordingToString ?accordingto1 .
?concept2 tc:nameString ?namestring .
?concept2 tc:accordingToString ?accordingto2 .
?relationship tc:fromTaxon ?concept1 .
?relationship tc:toTaxon ?concept2 .
?relationship tc:relationshipCategory ?relationship_type .
FILTER(?concept1 != ?concept2)
}
This gives this result:
namestring | concept1 | accordingto1 | concept2 | accordingto2 | relationship | relationship_type |
---|---|---|---|---|---|---|
Microcebus murinus | http://kg-fuseki.sloppy.zone/tc/1993_Microcebus_murinus | MSW2 | http://kg-fuseki.sloppy.zone/tc/2005_Microcebus_murinus | MSW3 | http://kg-fuseki.sloppy.zone/tc/1993-2005 | http://rs.tdwg.org/ontology/voc/TaxonConcept#Includes |
Microcebus murinus | http://kg-fuseki.sloppy.zone/tc/2005_Microcebus_murinus | MSW3 | http://kg-fuseki.sloppy.zone/tc/1993_Microcebus_murinus | MSW2 | http://kg-fuseki.sloppy.zone/tc/2005-1993 | http://rs.tdwg.org/ontology/voc/TaxonConcept#IsIncludedIn |
So the 1993 concept of Microcebus murinus is a larger taxon than the 2005 concept of Microcebus murinus. So TCS +1. Note that we could also express these relationships using the RCC5 terms in http://openbiodiv.net/
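The two result rows state the same fact from both directions, so a simple sanity check is that each assertion agrees with its reverse. The sketch below is only an illustration: the `INVERSE` table is an assumption about how the Includes and IsIncludedIn categories relate (it is not read from the TDWG ontology), and `find_inconsistencies` is a hypothetical helper.

```python
TC = "http://rs.tdwg.org/ontology/voc/TaxonConcept#"
BASE = "http://kg-fuseki.sloppy.zone/tc/"

# Assumed inverse pairs for the two categories seen in the result table.
INVERSE = {
    TC + "Includes": TC + "IsIncludedIn",
    TC + "IsIncludedIn": TC + "Includes",
}

# (from concept, to concept, relationship category), copied from the result rows.
relationships = [
    (BASE + "1993_Microcebus_murinus", BASE + "2005_Microcebus_murinus", TC + "Includes"),
    (BASE + "2005_Microcebus_murinus", BASE + "1993_Microcebus_murinus", TC + "IsIncludedIn"),
]


def find_inconsistencies(relationships):
    """List (from, to) pairs whose reverse assertion is not the expected inverse."""
    asserted = {(src, dst): cat for src, dst, cat in relationships}
    problems = []
    for (src, dst), cat in asserted.items():
        reverse = asserted.get((dst, src))
        expected = INVERSE.get(cat)
        # Only flag pairs where both directions are asserted and the inverse is known.
        if reverse is not None and expected is not None and reverse != expected:
            problems.append((src, dst))
    return problems


print(find_inconsistencies(relationships))  # []
```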
Very nice Rod. But these are competency questions for an information system designed to look and behave much like TCS. Systems like APNI, AFD, IPNI, ITIS, CoL+, etc. … the TDWG ontology, TCS itself. We’ve had SPARQL services running off tn:views over APNI/APC and AFD for the past 8 years (currently disabled for system migration, sorry) with almost zero interest. Unusable by most clients, shunned by aggregators. Maybe it was just a sign of the times, and it's yet to have its day. I'm still hopeful. For RDF at least, the power of Linked Open Data to simply implement complex services - like Taxon Name resolution, and for queries across datasets for example - has been well demonstrated. Though for TCS, we now use a local NSL model.
Like most contributors to this discussion, we are custodians/developers of existing infrastructure, and the question as to how we might model the domain is by now already well determined (for this current iteration). We have offered TCS and the TDWG ontology for export for many years, but clients generally need to do what we do with these data, and that is just not possible using any of these standards. Delivery is always a compromise. Loss of information, a high barrier for understanding, the lack of adequate semantics, inappropriate generalisations, the “name”, “taxon”, “taxon concept” align/argument ... all contribute to a very poor standing on the reusability index.
Reusability, Interchange, knowing that the data delivered will be reasonably well understood, and represented correctly when it shows up elsewhere. These are the competency questions we are looking for now. A vocabulary for names and classifications, enabling lossless interchange of data (import, export) and good support for their discovery and extract.
At one level, between systems, for users like @rdmpage, a TCS+2 will very likely be the go. But when we deliver data it more often goes to support the taxonomic process, or into local lookup services, reused in controlled vocabularies, for checklist maintenance - into systems that work with the names of taxa. I would like to think that both use cases are possible with a TCS2 modelled as an application profile/ontology over a basic TDWG Names and Trees vocabulary.
Thanks Greg, I think there are two things here.
From my perspective the failure of previous attempts rests on several things: the expectation that users would use multiple SPARQL endpoints, the poor quality of the RDF (most of it not linked in any meaningful sense, just an RDF serialisation of data silos), the lack of rich content that people actually want (e.g., the absence of the literature), etc. I would argue that if we create properly linked data we can build rich clients on top of a centralised SPARQL server. My GBIF challenge entry, https://ozymandias-demo.herokuapp.com, is a proof of concept. It is built on the LSID TCS vocabulary (supplemented by a vocabulary from @frmichel that handles things TCS makes awkward to do) and http://schema.org
What isn't clear to me is whether the previous failures are due to:
- limitations or complexity of TCS (is TCS comprehensible, does it do what we want?)
- limitations in the available data (we have lots of RDF for names, most of it problematic and weakly connected, if at all)
- insufficient interest in the problem TCS was meant to solve (people have created massive, heavily used databases without TCS; maybe it's not actually needed?).
You write:
Reusability, Interchange, knowing that the data delivered will be reasonably well understood, and represented correctly when it shows up elsewhere. These are the competency questions we are looking for now. A vocabulary for names and classifications, enabling lossless interchange of data (import, export) and good support for their discovery and extract.
For the sake of argument I'm asserting that if we used TCS and had properly described and linked data, we could do all this. Note that I'm not saying that I necessarily believe this, I'm simply asking whether it's possible. In other words, if we have good tools and documentation based on TCS can we achieve the goals you outline?
Can we agree to refer to the ratified standard, which is an XML Schema, as TCS, and to the TCS ideas ported to RDF as the TDWG Ontology? I find the current usage confusing.
Re: #7 (comment). Excellent, looks great. Thanks, @rdmpage
P.s.: In this paper https://www.researchgate.net/publication/252228152_Perspectives_Towards_a_language_for_mapping_relationships_among_taxonomic_concepts, page 9, Table 3, I listed a number of terms that in my view should mostly/somehow find their way into an updated TCS, because they are useful. For instance, with TCS2 we should be able to express, in the case of "splitting", that {2005.TCL1 + 2005.TCL2 + 2005.TCL3} == 1993.TCL4. Where (e.g.) the taxonomic name Microcebus murinus may participate both in TCL1 and TCL4.
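To make the "splitting" case concrete, one can treat each concept as a set of members and check that the 2005 concepts are disjoint and together make up the 1993 concept. This is only a sketch under that set-based reading: the concept labels come from the example above, but the member identifiers (`s1` to `s6`) and the helper `is_split_of` are invented for illustration.

```python
def is_split_of(parent, children):
    """True if the children are pairwise disjoint and their union equals the parent."""
    union = frozenset().union(*children)
    # Children are pairwise disjoint exactly when no member is counted twice.
    disjoint = sum(len(child) for child in children) == len(union)
    return disjoint and union == frozenset(parent)


# Hypothetical members; only the concept labels are from the example above.
concepts = {
    "1993.TCL4": frozenset({"s1", "s2", "s3", "s4", "s5", "s6"}),
    "2005.TCL1": frozenset({"s1", "s2"}),
    "2005.TCL2": frozenset({"s3", "s4"}),
    "2005.TCL3": frozenset({"s5", "s6"}),
}

parts = [concepts["2005.TCL1"], concepts["2005.TCL2"], concepts["2005.TCL3"]]
print(is_split_of(concepts["1993.TCL4"], parts))  # True
```

Each 2005 concept on its own is merely included in 1993.TCL4. It is the sum of the three that is congruent with it, and that is the statement the pairwise relationship categories cannot express directly.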
@rdmpage - thanks. A counterpoint here would be that all these terms are spatial, and hence compatible with and informative for spatial logic reasoning. Some are shortcuts for convenient human use, yes, and not representing them would be fine for reasoning purposes.
Another way of saying this: the terms give someone an opportunity to "creatively" assert regions of congruence between classifications where such instances of congruence may not be very obvious. To paraphrase an example: "Take away one concept in classification 1 from this parent and add it to that parent, and then you have congruence otherwise with classification 2". Maximizing opportunities to express congruence (RCC-5: ==), in turn, allows reasoning approaches to be maximally "greedy" in terms of deducing other spatial relationships between classifications through transitivity rules. In that context, it helps to have a more diverse relationship vocabulary.
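A small sketch of why congruence is so useful for this kind of deduction: composing `==` with any relationship returns that relationship unchanged, so every asserted congruence carries information across classifications without loss. The code below covers only a few entries of the RCC-5 composition table (congruence, plus chains of inclusion in the same direction) and returns `None` for everything else. The symbols and the function `compose` are illustrative choices, not an existing reasoner's API, and the concept labels in the usage lines are hypothetical.

```python
EQ, INCLUDES, INCLUDED_IN = "==", ">", "<"


def compose(r_ab, r_bc):
    """Given A r_ab B and B r_bc C, return the A-to-C relationship if this sketch can derive it."""
    if r_ab == EQ:
        return r_bc          # A == B, so A relates to C exactly as B does
    if r_bc == EQ:
        return r_ab          # B == C, so A relates to C exactly as it does to B
    if r_ab == r_bc and r_ab in (INCLUDES, INCLUDED_IN):
        return r_ab          # inclusion is transitive in a single direction
    return None              # not derived by this partial sketch


# Hypothetical chain: 1993.X == 2005.Y and 2005.Y > 2005.Z, hence 1993.X > 2005.Z.
print(compose(EQ, INCLUDES))           # >
print(compose(INCLUDES, INCLUDED_IN))  # None
```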