TCS is just fine as it is [tnc, CLOSED]

tdwg commented on July 17, 2024
TCS is just fine as it is

from tnc.

Comments (12)

hlapp commented on July 17, 2024

@rdmpage 👍 to your proposed approach. It brings us firmly back to first defining concretely what the competency questions are (such as in the form of queries and expected results), and then determining the ontology that can satisfy them.

I'm also a firm believer in Occam's Razor, so IMHO the ontology we should be looking for is the simplest one that satisfies the competency questions, not a more elaborate one, whether driven by philosophy or moral objectives.

baskaufs commented on July 17, 2024

Here is a dataset that might be useful to play with:

Agricultural Research Council: Catalogue of Afrotropical Bees. http://doi.org/10.15468/u9ezbh
Accessed via http://www.gbif.org/dataset/da38f103-4410-43d1-b716-ea6b1b92bbac on 2016-10-26

It includes many possible pieces that could be connected using the existing TCS model. It is one of the datasets I played around with and described in this blog post in the section called "Taxon core with Occurrence, TypesAndSpecimen, Distribution, Reference, and Description extensions: Catalogue of Afrotropical Bees". There were a few additional comments about the dataset in the following post.

In my messing around, I used some of the TCS properties in my graph model (described in the post). The triples (as RDF/Turtle) can be downloaded here, but since my purpose was to use as many DwC terms as possible rather than to fully implement TCS, the dataset should probably be re-mapped to more fully embody the TCS graph model. (The mapping files that I used are here but probably won't make sense to anyone who hasn't already messed with Guid-O-Matic.) I don't have time to try re-mapping it myself right now, but if this line of inquiry continues for long enough, I might be able to work on it in a few weeks.

Oh, ho! I see that I put an example record here!

rdmpage commented on July 17, 2024

Names Test 3: Is a scientific name a homonym (either within a Code or across Codes)?

Here we test one of the classical "hemihomonyms", that is, a name which occurs in two Codes. Agathis montana is both the name of a wasp and the name of a tree. So, a simple query would be to see how many Codes have a given name:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX tn: <http://rs.tdwg.org/ontology/voc/TaxonName#>
PREFIX tcom: <http://rs.tdwg.org/ontology/voc/Common#>

SELECT *
WHERE { 
  ?thing tn:nameComplete "Agathis montana" .
  ?thing tn:nomenclaturalCode ?code .
} 

giving:

| thing | code |
| --- | --- |
| urn:lsid:ipni.org:names:92693-1:1.1.2.1.1.1.2.1.1.1 | http://rs.tdwg.org/ontology/voc/TaxonName#botanical |
| urn:lsid:organismnames.com:name:1407520 | http://rs.tdwg.org/ontology/voc/TaxonName#ICZN |
| urn:lsid:organismnames.com:name:1953681 | http://rs.tdwg.org/ontology/voc/TaxonName#ICZN |

Note that we have two zoological names because ION (the source of the names) has two records for Agathis montana (the same problem bedevils IPNI). So, we need to be a little cleverer:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX tn: <http://rs.tdwg.org/ontology/voc/TaxonName#>
PREFIX tcom: <http://rs.tdwg.org/ontology/voc/Common#>

SELECT (COUNT(DISTINCT ?code) AS ?count)
WHERE { 
  ?thing tn:nameComplete "Agathis montana" .
  ?thing tn:nomenclaturalCode ?code .
} 

This query asks how many distinct Codes contain Agathis montana, and the answer is:

| row | count |
| --- | --- |
| 1 | 2 |

So two Codes contain Agathis montana, which makes it a cross-Code homonym. TCS +1
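The same check can be reproduced client-side on the rows returned by the first query. This is only a minimal sketch in Python, using the three result rows shown above; the function names `distinct_codes` and `is_cross_code_homonym` are hypothetical and not part of TCS or of any endpoint.

```python
# Result rows (thing, code) copied from the output of the first query above.
ROWS = [
    ("urn:lsid:ipni.org:names:92693-1:1.1.2.1.1.1.2.1.1.1",
     "http://rs.tdwg.org/ontology/voc/TaxonName#botanical"),
    ("urn:lsid:organismnames.com:name:1407520",
     "http://rs.tdwg.org/ontology/voc/TaxonName#ICZN"),
    ("urn:lsid:organismnames.com:name:1953681",
     "http://rs.tdwg.org/ontology/voc/TaxonName#ICZN"),
]


def distinct_codes(rows):
    """Return the set of nomenclatural Codes, ignoring duplicate name records."""
    return {code for _thing, code in rows}


def is_cross_code_homonym(rows):
    """Treat a name string as a cross-Code homonym if it occurs under more than one Code."""
    return len(distinct_codes(rows)) > 1


if __name__ == "__main__":
    print(len(distinct_codes(ROWS)))
    print(is_cross_code_homonym(ROWS))
```

Collapsing to a set plays the role of COUNT(DISTINCT ?code): the duplicate ION records no longer inflate the count.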

Testing for homonyms within a Code is going to get a little messy given the number of duplicates some data sources contain, so we might want to test using publications, taxon authorship, or, in an ideal world, type specimens.

rdmpage commented on July 17, 2024

Names Test 5: What objective (Code-governed) synonyms exist for a scientific name?

One way to tackle this is if the name database has basionym relationships. IPNI and IndexFungorum do (although probably not complete).

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX tn: <http://rs.tdwg.org/ontology/voc/TaxonName#>
PREFIX tcom: <http://rs.tdwg.org/ontology/voc/Common#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT *
WHERE { 
  ?thing tn:nameComplete "Agathis montana" .
  ?thing owl:versionInfo ?versionInfo .
  BIND(IRI(REPLACE( STR(?thing),CONCAT(":", ?versionInfo),"" )) AS ?iri). 
  {
    ?name tn:hasBasionym ?iri .
    ?name tn:nameComplete ?nameComplete .
  }
} 

This query is a mess because IPNI's RDF is, in a word, buggered. They append a version identifier to the name's LSID, which makes cross-linking within the data almost impossible. A great example of what happens when you design outputs without thinking about users (sigh). So we have to mess about with the name id to get the query to work. The query also works in only one direction (i.e., which names have the query name as their basionym); we'd want to go in the other direction as well (which names are linked to the basionym of the query name), but IPNI's RDF prevents this. IndexFungorum is probably OK for this sort of query. ION is clueless about basionyms, so zoologists miss out.

Here's the result:

| thing | versionInfo | iri | name | nameComplete |
| --- | --- | --- | --- | --- |
| urn:lsid:ipni.org:names:92693-1:1.1.2.1.1.1.2.1.1.1 | 1.1.2.1.1.1.2.1.1.1 | urn:lsid:ipni.org:names:92693-1 | urn:lsid:ipni.org:names:77076253-1:1.2 | Salisburyodendron montanum |

So, Salisburyodendron montanum is an objective synonym of Agathis montana. TCS +1, IPNI -1
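The identifier fix-up that the BIND/REPLACE line performs can be sketched outside SPARQL. One caveat worth noting: SPARQL's REPLACE treats its pattern as a regular expression, so the dots in the version string match any character and the match is not anchored to the end of the LSID. The sketch below instead strips the literal suffix. The function name `strip_version` is hypothetical, and the inputs are taken from the result row above.

```python
def strip_version(lsid: str, version_info: str) -> str:
    """Remove a trailing ':<version>' from a versioned LSID, if present.

    Uses a literal suffix comparison rather than a regular expression,
    so dots in the version string are not treated as wildcards.
    """
    suffix = ":" + version_info
    if version_info and lsid.endswith(suffix):
        return lsid[: -len(suffix)]
    return lsid


if __name__ == "__main__":
    thing = "urn:lsid:ipni.org:names:92693-1:1.1.2.1.1.1.2.1.1.1"
    print(strip_version(thing, "1.1.2.1.1.1.2.1.1.1"))
```

Applied to the `thing` value in the result row, this yields the same unversioned LSID that appears in the `iri` column.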

rdmpage commented on July 17, 2024

Taxonomy Test 6: How do the circumscriptions of the same scientific name by two different authorities compare to each other?

This one is for @nfranz, taken from Fig. 1 of https://doi.org/10.1093/sysbio/syw023, where we have two taxon concepts both named Microcebus murinus.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX tn: <http://rs.tdwg.org/ontology/voc/TaxonName#>
PREFIX tcom: <http://rs.tdwg.org/ontology/voc/Common#>
PREFIX tc: <http://rs.tdwg.org/ontology/voc/TaxonConcept#>

SELECT *
WHERE { 
  VALUES ?namestring { "Microcebus murinus" }
  ?concept1 tc:nameString ?namestring .
  ?concept1 tc:accordingToString ?accordingto1 .

  ?concept2 tc:nameString ?namestring .
  ?concept2 tc:accordingToString ?accordingto2 .
  
  ?relationship tc:fromTaxon ?concept1 .
  ?relationship tc:toTaxon ?concept2 .
  ?relationship tc:relationshipCategory ?relationship_type .
  
  FILTER(?concept1 != ?concept2)
} 

This gives this result:

| namestring | concept1 | accordingto1 | concept2 | accordingto2 | relationship | relationship_type |
| --- | --- | --- | --- | --- | --- | --- |
| Microcebus murinus | http://kg-fuseki.sloppy.zone/tc/1993_Microcebus_murinus | MSW2 | http://kg-fuseki.sloppy.zone/tc/2005_Microcebus_murinus | MSW3 | http://kg-fuseki.sloppy.zone/tc/1993-2005 | http://rs.tdwg.org/ontology/voc/TaxonConcept#Includes |
| Microcebus murinus | http://kg-fuseki.sloppy.zone/tc/2005_Microcebus_murinus | MSW3 | http://kg-fuseki.sloppy.zone/tc/1993_Microcebus_murinus | MSW2 | http://kg-fuseki.sloppy.zone/tc/2005-1993 | http://rs.tdwg.org/ontology/voc/TaxonConcept#IsIncludedIn |

So the 1993 concept of Microcebus murinus (MSW2) is a larger taxon than the 2005 concept of Microcebus murinus (MSW3). So TCS +1. Note that we could also express these relationships using the RCC5 terms in http://openbiodiv.net/
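Since each relationship is asserted in both directions, one quick sanity check is that the two directions agree. A minimal sketch in Python follows, assuming that Includes and IsIncludedIn are inverses of each other (the pairing in `INVERSE` is my assumption, based only on the two result rows); `inconsistent_pairs` is a hypothetical helper, and the concept identifiers are shortened to their local names.

```python
TC = "http://rs.tdwg.org/ontology/voc/TaxonConcept#"

# Assumed inverse pairs; only these two categories appear in the result above.
INVERSE = {
    TC + "Includes": TC + "IsIncludedIn",
    TC + "IsIncludedIn": TC + "Includes",
}

# (from concept, to concept, relationship category), taken from the two result rows.
ROWS = [
    ("1993_Microcebus_murinus", "2005_Microcebus_murinus", TC + "Includes"),
    ("2005_Microcebus_murinus", "1993_Microcebus_murinus", TC + "IsIncludedIn"),
]


def inconsistent_pairs(rows):
    """Return assertions whose reverse-direction assertion is not the inverse category."""
    asserted = {(src, dst): cat for src, dst, cat in rows}
    problems = []
    for (src, dst), cat in asserted.items():
        reverse = asserted.get((dst, src))
        if reverse is not None and INVERSE.get(cat) != reverse:
            problems.append((src, dst, cat, reverse))
    return problems


if __name__ == "__main__":
    print(inconsistent_pairs(ROWS))
```

For the two rows above the list of problems is empty, i.e. the 1993 and 2005 assertions are mutually consistent.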

ghwhitbread commented on July 17, 2024

Very nice, Rod. But these are competency questions for an information system designed to look and behave much like TCS. Systems like APNI, AFD, IPNI, ITIS, CoL+, etc. … the TDWG ontology, TCS itself. We’ve had SPARQL services running off tn:views over APNI/APC and AFD for the past 8 years (currently disabled for system migration, sorry) with almost zero interest. Unusable by most clients, shunned by aggregators. Maybe it was just a sign of the times, and it's yet to have its day. I'm still hopeful. For RDF at least, the power of Linked Open Data to simply implement complex services (like Taxon Name resolution, and queries across datasets, for example) has been well demonstrated. Though for TCS, we now use a local NSL model.

Like most contributors to this discussion we are custodians/developers of existing infrastructure, and the question as to how we might model the domain is by now already well determined (for this current iteration). We have offered TCS and the TDWG ontology for export for many years, but clients generally need to do what we do with these data, and that is just not possible using any of these standards. Delivery is always a compromise. Loss of information, a high barrier for understanding, the lack of adequate semantics, inappropriate generalisations, the “name”, “taxon”, “taxon concept” align/argument ... all contribute to a very poor standing on the reusability index.

Reusability, Interchange, knowing that the data delivered will be reasonably well understood, and represented correctly when it shows up elsewhere. These are the competency questions we are looking for now. A vocabulary for names and classifications, enabling lossless interchange of data (import, export) and good support for their discovery and extract.

At one level, between systems, for users like @rdmpage, a TCS+2 will very likely be the go. But when we deliver data it more often goes to support the taxonomic process, or into local lookup services, reused in controlled vocabularies, for checklist maintenance - into systems that work with the names of taxa. I would like to think that both use cases are possible with a TCS2 modelled as an application profile/ontology over a basic TDWG Names and Trees vocabulary.

rdmpage commented on July 17, 2024

Thanks Greg, I think there are two things here.

From my perspective the failure of previous attempts rests on several things: the expectation that users would use multiple SPARQL endpoints, the poor quality of the RDF (most of it not linked in any meaningful sense, just an RDF serialisation of data silos), the lack of rich content that people actually want (e.g., the absence of the literature), etc. I would argue that if we create properly linked data we can build rich clients on top of a centralised SPARQL server. My GBIF challenge entry is a proof of concept https://ozymandias-demo.herokuapp.com and this is built on the LSID TCS vocabulary (supplemented by a vocabulary by @frmichel that handles things TCS makes awkward to do) and http://schema.org

What isn't clear to me is whether the previous failures are due to:

  1. limitations or complexity of TCS (is TCS comprehensible, does it do what we want?)
  2. limitations in the available data (we have lots of RDF for names, most of it problematic and weakly connected, if at all)
  3. insufficient interest in the problem TCS was meant to solve (people have created massive, heavily used databases without TCS; maybe it's not actually needed?).

You write:

"Reusability, Interchange, knowing that the data delivered will be reasonably well understood, and represented correctly when it shows up elsewhere. These are the competency questions we are looking for now. A vocabulary for names and classifications, enabling lossless interchange of data (import, export) and good support for their discovery and extract."

For the sake of argument I'm asserting that if we used TCS and had properly described and linked data, we could do all this. Note that I'm not saying that I necessarily believe this, I'm simply asking whether it's possible. In other words, if we have good tools and documentation based on TCS can we achieve the goals you outline?

mdoering commented on July 17, 2024

Can we agree to refer to the ratified standard which is an XML Schema as TCS and to the TCS ideas ported to RDF as the TDWG Ontology? I find this confusing.

rdmpage commented on July 17, 2024

nfranz commented on July 17, 2024

Re: #7 (comment). Excellent, looks great. Thanks, @rdmpage

P.s.: In this paper https://www.researchgate.net/publication/252228152_Perspectives_Towards_a_language_for_mapping_relationships_among_taxonomic_concepts, page 9, Table 3, I listed a number of terms that in my view should mostly/somehow find their way into an updated TCS, because they are useful. For instance, with TCS2 we should be able to express, in the case of "splitting", that {2005.TCL1 + 2005.TCL2 + 2005.TCL3} == 1993.TCL4. Where (e.g.) the taxonomic name Microcebus murinus may participate both in TCL1 and TCL4.
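To make the "splitting" statement concrete, here is a minimal sketch in Python that treats each taxon concept as a set of members and classifies a pair of concepts into one of the five RCC-5 relations. Everything here is an assumption for illustration: the member labels `m1`..`m3` are placeholders rather than real data, the function `rcc5` is hypothetical, and apart from `==` the relation symbols are just one possible notation.

```python
def rcc5(a: frozenset, b: frozenset) -> str:
    """Classify two non-empty member sets into one of the five RCC-5 relations."""
    if a == b:
        return "=="   # congruent
    if a < b:
        return "<"    # a is properly included in b
    if a > b:
        return ">"    # a properly includes b
    if a & b:
        return "><"   # overlapping, neither includes the other
    return "!"        # disjoint


# Hypothetical circumscriptions: the member labels are placeholders, not real data.
TCL1 = frozenset({"m1"})                # 2005.TCL1
TCL2 = frozenset({"m2"})                # 2005.TCL2
TCL3 = frozenset({"m3"})                # 2005.TCL3
TCL4 = frozenset({"m1", "m2", "m3"})    # 1993.TCL4

if __name__ == "__main__":
    # The union of the three 2005 concepts against the single 1993 concept.
    print(rcc5(TCL1 | TCL2 | TCL3, TCL4))
    # One 2005 concept on its own against the 1993 concept.
    print(rcc5(TCL1, TCL4))
```

Under these placeholder circumscriptions the union of the three 2005 concepts is congruent with 1993.TCL4, while each 2005 concept taken alone is properly included in it.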

rdmpage commented on July 17, 2024

nfranz commented on July 17, 2024

@rdmpage - thanks. A counterpoint here would be that all these terms are spatial, and hence compatible with and informative for spatial logic reasoning. Some are shortcuts for convenient human use, yes, and not representing them would be fine for reasoning purposes.

Another way of saying this: the terms give someone an opportunity to "creatively" assert regions of congruence between classifications where such instances of congruence may not be very obvious. To paraphrase an example: "Take away one concept in classification 1 from this parent and add it to that parent, and then you have congruence otherwise with classification 2". Maximizing opportunities to express congruence (RCC-5: ==), in turn, allows reasoning approaches to be maximally "greedy" in terms of deducing other spatial relationships between classifications through transitivity rules. In that context, it helps to have a more diverse relationship vocabulary.
