Define scope of TCS2 about tnc HOT 9 CLOSED

tdwg commented on July 17, 2024
Define scope of TCS2

from tnc.

Comments (9)

deepreef commented on July 17, 2024

I like this approach, except that "identifying and representing author objects" should be embedded into the reference/literature components. Authorities of names and concepts (i.e., the sec authors) should be inherited from linked references (literature citations). I would say very simply that the scope of TCS2 should be about Names of organisms and their relationships to defined or implied Circumscriptions and Classifications through References. To parse that out a bit (capitalization intentional):

Names are text-string labels applied to organisms. I assume we agree that the scope of names includes those that fall under the three major Codes (i.e., Linnean-style names). Does it also include semi-Linnean-style names (e.g., provisional names assigned to morphospecies and other taxa lacking formal scientific names, in combination with a Linnean-style genus or higher-rank name, in the form of "Aus sp123")? Does it include names governed by the Code for viruses? What about vernacular names (especially important for birds)? The scope in this sense is important because it points to what sorts of properties and relationships of "Names" TCS2 needs to accommodate. Perhaps we start with the three main Codes (Linnean-style names), then expand it with extensions to accommodate other kinds of names? The properties of the Code-governed names include the vocabularies for ranks, links to type specimens, links to References in which relevant nomenclatural acts occur, etc.

Circumscriptions are either defined (through sets of characters, character states, enumerated individuals, etc.) or implied (through synonymies, links to other circumscriptions, context, etc.), and represent asserted sets of organisms to which names are applied. The assumption is that the circumscribed set of organisms share certain properties that can be referenced in aggregate through the name (with the name serving as a proxy for the set of organisms). Going back to earlier discussions, I think it's safe to say that "Circumscriptions" is synonymous with "Taxonomic Concepts", but the word Circumscription is perhaps more precise and less burdened with all the confusion, alternate interpretations and other forms of intellectual baggage that "Taxonomic Concepts" carries. Identifications/determinations of organisms (e.g., occurrence records in GBIF) are probably the most ubiquitous external entities that connect to Circumscriptions (through Names), but all sorts of other things are linked to these Circumscriptions, such as roles as disease vectors, assertions about evolutionary relationships, and basically almost anything else people document about biology in a generalized way. Circumscriptions can be thought of as nodes in a tree, and their properties are the collective set of properties of all other nodes and branches "below" them (i.e., towards the leaves).

Classifications are assertions about hierarchical placements and arrangements of Circumscriptions with respect to each other. This can be thought of as Linnean-style hierarchies, or in the case of trees, it includes all the nodes and branches "above" a particular Node (i.e., towards the trunk).
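
The tree picture in the two paragraphs above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Node` class, the function names, and the taxon labels ("Aidae", "Aus bus", ...) are all made up here and are not TCS2 terms.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One node in a tree of Circumscriptions (hypothetical structure)."""
    label: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, label: str) -> "Node":
        child = Node(label, parent=self)
        self.children.append(child)
        return child


def circumscription(node: Node) -> List[str]:
    """The node plus everything 'below' it (towards the leaves)."""
    members = [node.label]
    for child in node.children:
        members.extend(circumscription(child))
    return members


def classification(node: Node) -> List[str]:
    """Everything 'above' the node (towards the trunk), nearest first."""
    lineage = []
    current = node.parent
    while current is not None:
        lineage.append(current.label)
        current = current.parent
    return lineage


# Invented example taxa.
family = Node("Aidae")
genus = family.add_child("Aus")
species_1 = genus.add_child("Aus bus")
species_2 = genus.add_child("Aus cus")

print(circumscription(genus))     # the genus and both species
print(classification(species_1))  # genus first, then family
```

The same node answers both questions: walking down gives what it circumscribes, walking up gives where it is classified.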

References include all manner of publications, plus unpublished documents, and subsections of both. Names, Circumscriptions and Classifications only exist because they are asserted through References. Therefore, all of them fundamentally link (anchor to) a particular Reference. The details for how References are represented through structured information are outside the scope of TCS2; but because References are so fundamental to the other core components of TCS2 (Names, Circumscriptions, and Classifications), we need to minimally define them as such within TCS2, and ensure that the right granularity of properties for References are captured (e.g., date precision for nomenclature; subsection granularity for nomenclatural acts credited to one author-team that occur within a Reference unit traditionally represented in a bibliographic citation that is credited to a different author team). Critically, all authorship credits/citations for names (nomenclatural authorities) and Circumscription assertions (sec authorship) should be inherited through links to References. That is, there should be no direct links between Names, Circumscriptions, or Classifications to Authors. The links need to be to References, through which the authorships are inherited.
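
A minimal Python sketch of the "no direct links to Authors" idea, assuming hypothetical class and field names (`Reference`, `Name`, `Circumscription`, `reference_id`) and invented data; none of these are defined by TCS2:

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Reference:
    """A publication, unpublished document, or subsection of either."""
    reference_id: str
    authors: Tuple[str, ...]  # authorship is stored here and nowhere else
    year: int


@dataclass(frozen=True)
class Name:
    """A label that points at the Reference holding its nomenclatural act."""
    label: str
    reference_id: str


@dataclass(frozen=True)
class Circumscription:
    """An asserted set of organisms: a Name 'sec' some Reference."""
    name: Name
    reference_id: str


def authorship(reference_id: str, references: Dict[str, Reference]) -> str:
    """Derive an authority string by following the link to the Reference."""
    ref = references[reference_id]
    return f"{' & '.join(ref.authors)}, {ref.year}"


# Invented data, for illustration only.
references = {
    "ref:1": Reference("ref:1", ("Smith",), 1900),
    "ref:2": Reference("ref:2", ("Jones", "Lee"), 2010),
}
name = Name("Aus bus", "ref:1")
concept = Circumscription(name, "ref:2")

nomenclatural_authority = authorship(name.reference_id, references)
sec_authority = authorship(concept.reference_id, references)
print(f"{name.label} {nomenclatural_authority} sec. {sec_authority}")
```

Neither `Name` nor `Circumscription` carries an author field; both authority strings are computed from the linked Reference.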

Wow.... that ended up being WAY longer than I intended it to be! But perhaps the first and simplest question for scoping TCS2 is the Names part. Once we decide how narrow or broad the scope of names we want it to accommodate should be, the rest of the scoping should follow more easily.

rdmpage commented on July 17, 2024

@mdoering Regarding your list

  1. structured name objects, nomenclatural relations and their status according to the codes?
  2. core vocabularies such as ranks?
  3. identifying and representing author objects?
  4. type specimens, species and subsequent designations?
  5. how does TCS2 deal with literature citations?

I would make a plea, that for (3) and (5) we defer to existing vocabularies, in particular http://schema.org/, e.g. https://schema.org/Person and https://schema.org/ScholarlyArticle. This avoids domain-specific vocabularies and aligns what is done here with things going on in the wider world. For more fine grained citation I'd argue that the W3C annotation working group provides pretty much everything needed.

That leaves 1, 2, and 4. The next question is why the existing TDWG LSID vocabulary, already used by ION, Index Fungorum and IPNI, is not sufficient for our purposes. IPNI, for example, uses it to describe not just names but their typification (i.e., 4).

I think it's a reasonable null hypothesis that it's pretty much what we need, subject to some tweaks. If it's not, how about we clearly state why it's not?
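
For points (3) and (5), a rough sketch of what deferring to schema.org could look like, written as a Python dict serialised to JSON-LD. The `Person` and `ScholarlyArticle` types and the `name`, `author` and `datePublished` properties are schema.org terms; the `example.org` identifiers and the bibliographic values are invented for illustration.

```python
import json

# Hypothetical author record using the schema.org Person type.
author = {
    "@type": "Person",
    "@id": "https://example.org/agent/1",
    "name": "A. Author",
}

# Hypothetical literature citation using the schema.org ScholarlyArticle type.
article = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "@id": "https://example.org/reference/1",
    "name": "A revision of the genus Aus",
    "datePublished": "1900-05-01",
    "author": [author],
}

print(json.dumps(article, indent=2))
```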

deepreef commented on July 17, 2024

I completely agree with @rdmpage on 3 & 5 (authors are part of the reference vocabularies, which are out of scope for TCS2, except perhaps as some sort of "verbatim" capture).

I also tend to agree with @rdmpage on the TDWG LSID vocabularies, except I need to review them. DwC terms cover most of what we need (as noted, with some tweaks).

mdoering commented on July 17, 2024

I am wondering about authors because we do have domain specific interests and properties in them.
Also TDWG has a history of standards for authors and literature if you check the prior-standards. Databases with nomenclatural authors like IPNI usually track area of interest or collections codes they used to deposit types: http://beta.ipni.org/?q=author%20surname%3AMiller

If authors and citations are not covered, should TCS2 at least recommend something existing? To be used for data exchange, we need at least a more comprehensive specification somewhere. That is also something the old TCS was lacking: citations were left outside. To reach interoperability, I think we would have to nail things down a bit more. That, to me, is a major reason why DwC archives are much more in use than TCS, combined with the simplicity and interoperability of delimited text, of course, which is reason number one.

rdmpage commented on July 17, 2024

@mdoering I don't see anything in the IPNI search that necessarily requires a domain-specific vocabulary, and in many ways these are things that could be derived from a query rather than being stated in a database. For example https://ozymandias-demo.herokuapp.com/?uri=https://biodiversity.org.au/afd/publication/%23creator/r-mesibov "knows" that Bob Mesibov works on millipedes (POLYDESMIDA) by doing a SPARQL query over the names he has published and the ALA taxonomic classification.
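
To illustrate "derived from a query rather than stated in a database", here is a small in-memory Python sketch. It is not the SPARQL query the demo runs; the dictionaries, identifiers and taxon names are all invented, and a real implementation would query a triple store instead.

```python
from collections import Counter

# Invented data: names published per author, plus a parent/rank lookup
# standing in for a taxonomic classification.
published_names = {"author:1": ["Aus bus", "Aus cus", "Dus eus"]}
parent_of = {
    "Aus bus": "Aus", "Aus cus": "Aus", "Aus": "Aidae", "Aidae": "Aiformes",
    "Dus eus": "Dus", "Dus": "Didae", "Didae": "Diformes",
}
rank_of = {"Aiformes": "order", "Diformes": "order"}


def order_of(taxon):
    """Walk up the classification until a taxon of rank 'order' is found."""
    current = taxon
    while current is not None:
        if rank_of.get(current) == "order":
            return current
        current = parent_of.get(current)
    return None


def areas_of_interest(author_id):
    """Count the orders of the names an author has published."""
    counts = Counter(order_of(name) for name in published_names[author_id])
    counts.pop(None, None)  # drop names that could not be placed
    return counts.most_common()


print(areas_of_interest("author:1"))
```

Nothing about the author's interests is stored; it falls out of the names and the classification.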

We are also heading towards a situation where every taxonomist is likely to have a Wikidata entry and/or ORCID (the Wikispecies folks are adding taxonomists like crazy), and many Wikidata entries are linked to identifiers such as IPNI author ids. So I think we can use Wikidata as a way to help define scope. If it's in Wikidata we can stop ;)

jgerbracht commented on July 17, 2024

This all sounds great, I would add two things though, coming at this from a concept/circumscription perspective as opposed to a name perspective.
6. How does TCS2 define a taxonomic concept
7. How will names and name usages be mapped to taxonomic concepts

@deepreef I would add to your "Classifications are assertions about hierarchical placements and arrangements of Circumscriptions with respect to each other": "and names to be applied to these Circumscriptions".

deepreef commented on July 17, 2024

@mdoering : Literature/Reference data and authors as objects are vital to taxonomy and nomenclature. My point (which I think was also @rdmpage 's point - but not certain) was that they (and their vocabularies) ought to be managed outside the scope of TCS2. The parts within scope for TCS2 vocabularies would be limited to something like referenceID and perhaps things like verbatimReferenceCitation and verbatimAuthorship. One of the mistakes that taxonomic databases often make (in my opinion) is to try to link names/TNUs/concepts directly to authors, rather than indirectly to authors via Reference instances (which can include units of "Reference" that are more granular than what are traditionally cited in bibliographies, such as individual taxon treatments).

Stepping back to TDWG in general, there are definitely some domain-specific issues that we need to deal with that are not always addressed by library-based vocabularies or other vocabularies. Some examples include:

Reference instances:

  • more granular units of Reference instances (such as individual treatments, as sub-components to more traditional units like journal article or book chapter).
  • more specific kinds of dates associated with references, which differ somewhat from traditional publication/library dating requirements

Agent instances:

  • more robust capabilities for aliases (synonyms)
  • treating agents as defined entities separately from the names applied to the agents (related to previous one)

Authorship (relationship of Agent to Reference) instances:

  • non-traditional roles of Agents with respect to References, such as "ex." authorships
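
The three groups of examples above could be modelled roughly as follows. This is a sketch only: the class names, the `role` values and the rule that a treatment without its own authorships falls back to its containing Reference are assumptions made for illustration, not anything agreed in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agent:
    """An agent as an entity, separate from the name strings applied to it."""
    agent_id: str
    aliases: frozenset


@dataclass(frozen=True)
class Authorship:
    """Relationship of an Agent to a Reference."""
    agent_id: str
    role: str = "author"  # hypothetical role vocabulary, e.g. "author", "ex"


@dataclass(frozen=True)
class Reference:
    reference_id: str
    kind: str  # e.g. "article" or the more granular "treatment"
    parent_id: Optional[str] = None
    authorships: tuple = ()


def effective_authorships(reference_id, references):
    """Use a unit's own authorships, else climb to the containing unit."""
    ref = references[reference_id]
    while not ref.authorships and ref.parent_id is not None:
        ref = references[ref.parent_id]
    return ref.authorships


def find_agent(name_string, agents):
    """Resolve any alias to the single agent it denotes."""
    for agent in agents:
        if name_string in agent.aliases:
            return agent
    return None


# Invented data.
agents = [
    Agent("agent:1", frozenset({"J. Smith", "Smith, J.", "John Smith"})),
    Agent("agent:2", frozenset({"A. Jones"})),
    Agent("agent:3", frozenset({"B. Lee"})),
]
references = {
    "ref:10": Reference("ref:10", "article",
                        authorships=(Authorship("agent:1"), Authorship("agent:2"))),
    "ref:10a": Reference("ref:10a", "treatment", parent_id="ref:10",
                         authorships=(Authorship("agent:3"),)),
    "ref:10b": Reference("ref:10b", "treatment", parent_id="ref:10"),
}

print([a.agent_id for a in effective_authorships("ref:10a", references)])
print([a.agent_id for a in effective_authorships("ref:10b", references)])
```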

TDWG spent several years trying to develop a domain-specific standard for References, and got most of the way there, but it never matured to the point where it was adopted. Existing library standards didn't quite cover all our particular needs, and usually included way too much library-specific detail. Probably the best is the NLM/NCBI Journal Publishing DTD, and the TaxPub extension -- but of course we also have to deal with kinds of publications (and unpublished documents) besides those that appear within Journals.

The TDWG efforts on literature didn't delve too deeply into authors, but I think that is reasonably covered by the FOAF vocabulary.

deepreef commented on July 17, 2024

@jgerbracht :

  6. How does TCS2 define a taxonomic concept
    As I mentioned above, I think we should completely avoid the term "taxonomic concept" and focus instead on the more explicit and less baggage-laden "Circumscription". Are there aspects of your notion of "taxonomic concept" that do not fall within the scope of how we would define a "Circumscription"?
  7. How will names and name usages be mapped to taxonomic concepts

I think TNUs have a 1:1 relationship with defined or implied circumscriptions, so if we accept that "taxonomic concept"="circumscription", then the mapping is self-evident. Similarly, the relationship between "names" and TNUs is very-well defined (for just about every definition of a "name"), so that should be relatively straightforward as well.

@jgerbracht wrote: "@deepreef I would add to your 'Classifications are assertions about hierarchical placements and arrangements of Circumscriptions with respect to each other' and names to be applied to these Circumscriptions."

I think the way that names are applied to Circumscriptions is different from Classifications. Basically, the relationship between names and Circumscriptions includes all the heterotypic synonymy stuff and homotypic synonyms involving replacement names (i.e., how many name-bearing type specimens fall within a particular Circumscription). Classifications simply deal with the hierarchical arrangement of the Circumscriptions with respect to each other. The one exception is for names below the rank of genus, where the "name" also incorporates elements of classification. This is the homotypic synonymy stuff (excluding replacement names), where the name itself serves two functions (one in labeling the Circumscription, and one in asserting a hierarchical classification).
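
A toy Python sketch of the "which name-bearing types fall within a Circumscription" view of synonymy. The `Name` class, the specimen identifiers and the names are invented, and real synonymy involves more than this set-membership test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    label: str
    type_specimen: str  # identifier of the name-bearing type


def applicable_names(circumscribed_specimens, names):
    """Names whose type specimen falls within the circumscribed set."""
    return [n for n in names if n.type_specimen in circumscribed_specimens]


def synonym_kind(a, b):
    """Same type: homotypic. Different types: heterotypic."""
    return "homotypic" if a.type_specimen == b.type_specimen else "heterotypic"


# Invented data. "Dus bus" is a new combination sharing the type of "Aus bus".
aus_bus = Name("Aus bus", "T1")
aus_cus = Name("Aus cus", "T2")
dus_bus = Name("Dus bus", "T1")
aus_dus = Name("Aus dus", "T3")
all_names = [aus_bus, aus_cus, dus_bus, aus_dus]

# A Circumscription enumerated as a set of specimens (two types, one non-type).
circumscription = {"T1", "T2", "S5"}

print([n.label for n in applicable_names(circumscription, all_names)])
print(synonym_kind(aus_bus, aus_cus))
print(synonym_kind(aus_bus, dus_bus))
```

In this sketch "Aus bus" and "Dus bus" label the same type but differ in the genus part, which is the classification element carried inside names below the rank of genus.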

nielsklazenga commented on July 17, 2024

We need to make sure that the outcomes of this discussion make it into the new specification. In the 6/7 Nov. meeting we decided to start working on (as in move properties into) the TaxonomicName and TaxonomicNameUsage classes. We decided on these names in this meeting, but they are really the result of the discussions in issue #1 and the 17/18 Sep. meeting.

While recognising the importance of the Relationship class, we decided to leave that for the new year and focus on the above-mentioned two classes first.

We might add Author (or Agent) and Reference (Document?) as auxiliary classes that we don't define ourselves – nor do we define most of their properties – but still need. I would like to suggest the Bibliographic Ontology (BIBO) as another option to consider for the references (@ghwhitbread always says that the NSL is a bibliographic system).

I think the W3C Web Annotation Vocabulary and Data Model could be very useful, especially for the relationship assertions.
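
As a rough idea of how a relationship assertion might be expressed with the Web Annotation Data Model, here is a Python dict serialised as JSON-LD. The `@context` URL, the `Annotation`, `SpecificResource` and `TextualBody` types and the `linking` and `classifying` motivations come from the W3C model. The `example.org` identifiers, the relationship label and the choice to put the related usage in the body are assumptions for illustration.

```python
import json

annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "https://example.org/annotation/1",  # hypothetical identifier
    "type": "Annotation",
    "motivation": "linking",
    # The TaxonomicNameUsage the assertion is about (hypothetical IRI).
    "target": "https://example.org/tnu/A",
    "body": [
        {
            # The related TaxonomicNameUsage (hypothetical IRI).
            "type": "SpecificResource",
            "source": "https://example.org/tnu/B",
            "purpose": "linking",
        },
        {
            # Hypothetical relationship label; a real vocabulary is undecided.
            "type": "TextualBody",
            "value": "is congruent with",
            "purpose": "classifying",
        },
    ],
}

print(json.dumps(annotation, indent=2))
```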
