ecologicaltraitdata / ets

A Data Standard for Ecological Trait Datasets

Home Page: https://ecologicaltraitdata.github.io/ETS/

License: Creative Commons Attribution 4.0 International

R 0.01% HTML 96.98% TeX 0.66% JavaScript 1.29% CSS 1.06%
trait-ontology ecology traits community-ecology morphometrics

ets's People

Contributors

davidfichtmueller, nadjasimons

ets's Issues

reference for bexis template

@aostrow
A sub-folder 'bexis' now holds information for the BExIS trait template. The XML template in this folder should be updated every time the BExIS template is updated to the current version of ETS.

You could also drop your script for producing the XML template here, or we could create an R-based script that produces a new version of the template whenever ETS reaches a new release.

fix author and ownership fields

The metadata fields should be defined more clearly. The current authorLastname and authorFirstname fields were aimed at compatibility with TRY. However, they are not sufficient to capture, e.g., a contact e-mail address. Maybe an ISO standard could be considered here?

Several typos in the definitions/comments

In v0.7 I noticed several spelling issues in the definitions or comments of the terms. I wasn't sure whether to modify the .csv file and make a pull request or list them here.

One general question is whether the text is written in American English (AE) or British English (BE). So far it is a mixture of both, but it should be consistent.

Words that are currently written in BE are:

  • in definition of measurementValue: "Standardised" instead of "Standardized"
  • in definition of author: "organisation" instead of "organization"
  • in comment of measurementMethod: "standardised" instead of "standardized"
  • in comment of eventDate: "analysing" instead of "analyzing"
  • in comment of valueType: "behaviour" instead of "behavior"
  • in comment of valueType: "specialisation" instead of "specialization"

Words that are currently written in AE are:

  • in definition of decimalLongitude: "center" instead of "centre"
  • in definition of decimalLatitude: "center" instead of "centre"
  • in comment of valueType: "color" instead of "colour"

One of these two blocks should be corrected.
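A check like the following could flag mixed spelling automatically. It is only a minimal sketch: the word pairs are the ones listed above, and the function names `find_spelling_variants` and `is_mixed` are made up for illustration.

```python
import re

# British -> American pairs taken from the two lists above.
BE_TO_AE = {
    "standardised": "standardized",
    "organisation": "organization",
    "analysing": "analyzing",
    "behaviour": "behavior",
    "specialisation": "specialization",
    "centre": "center",
    "colour": "color",
}


def find_spelling_variants(text):
    """Return (british_hits, american_hits) found in text, ignoring case."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    british = sorted(w for w in BE_TO_AE if w in words)
    american = sorted(w for w in BE_TO_AE.values() if w in words)
    return british, american


def is_mixed(text):
    """True if the text contains both British and American variants."""
    british, american = find_spelling_variants(text)
    return bool(british) and bool(american)
```

Running `is_mixed` over the concatenated definitions and comments would show whether any inconsistency is left after a correction round.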

General spelling errors:

  • in definition of scientificName: "continueity" instead of "continuity"
  • in definition of references: "preferrably" instead of "preferably"
  • in definition of preparations: "occurence" instead of "occurrence"
  • in comment of verbatimLocality: "darwin core" instead of "DarwinCore"
  • in comment of BEPlotID: "exploratories" instead of "Exploratories"
  • in comment of relationSource: "precice" instead of "precise"
  • several cases of double spaces, they should be removed with Search & Replace

Errors in dates for DateIssued and DateModified

Most dates in the document are formatted in the American notation of month/day/year. There are, however, three cases that use the format day/month/year, causing ambiguity and wrong dates.

Two terms use a DateIssued of "5/9/2017", which is supposed to be the fifth of September. Both terms have the version "v0.4". The other v0.4 terms have issue dates from 8/24/2017 to 10/24/2017. The v0.3 terms were all created on 7/7/2017.

Two terms have a DateModified of "1/11/2017", which is supposed to be the first of November: the terms were all created on 7/7/2017, so a modification on January 11 would predate their creation.

Seven terms have a DateModified of "15/11/2017", which is an obvious format error (there is no 15th month), so there is no ambiguity here.

To avoid such errors in the future, I would suggest switching to the ISO 8601 format of YYYY-MM-DD (related XKCD).
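The conversion could be sketched as below. This is an illustration, not an existing ETS script: the function name `to_iso` and the `assume` parameter are hypothetical, and ambiguous inputs still need a human decision.

```python
from datetime import date


def to_iso(raw, assume="mdy"):
    """Convert 'a/b/yyyy' to ISO 8601 and report whether the input was ambiguous.

    A value is ambiguous when both leading numbers could be a month
    (both <= 12) and they differ. 'assume' decides how such values are read.
    """
    a, b, year = (int(part) for part in raw.strip().split("/"))
    ambiguous = a <= 12 and b <= 12 and a != b
    if a > 12:
        day, month = a, b          # can only be day/month/year
    elif b > 12:
        month, day = a, b          # can only be month/day/year
    elif assume == "mdy":
        month, day = a, b
    else:
        day, month = a, b
    return date(year, month, day).isoformat(), ambiguous


print(to_iso("8/24/2017"))   # ('2017-08-24', False)
print(to_iso("15/11/2017"))  # ('2017-11-15', False)
print(to_iso("5/9/2017"))    # ('2017-05-09', True): flagged for manual review
```

Values flagged as ambiguous, such as "5/9/2017" and "1/11/2017", can then be resolved by hand using the version and creation-date context described above.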

Define Terms for 'basisOfRecord'

In discussion with David of GFBio, we figured that this can be simplified considerably. The controlled vocabulary could be listed in the 'Definition' field, with an unambiguous definition of the terms we are adding to the DwC options, i.e. 'ExpertKnowledge', 'MuseumSpecimen'.

deprecated terms, add 'replacedBy' entries

After renaming the double record terms (#23), the terms traitNameStd, traitValueStd, traitUnitStd and scientificNameStd should list their new versions traitName, traitValue, traitUnit and scientificName in the field replacedBy.
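As a rough sketch, the replacedBy entries could be filled from a lookup table. The record layout (a dict with a `term` key) and the function name `fill_replaced_by` are assumptions for illustration; only the four term pairs come from this issue.

```python
# Deprecated term -> replacement, as listed in this issue.
REPLACED_BY = {
    "traitNameStd": "traitName",
    "traitValueStd": "traitValue",
    "traitUnitStd": "traitUnit",
    "scientificNameStd": "scientificName",
}


def fill_replaced_by(terms):
    """Return copies of the term records with 'replacedBy' set where applicable.

    Each record is assumed to be a dict with at least a 'term' key.
    Records for terms that are not deprecated are copied unchanged.
    """
    updated = []
    for term in terms:
        record = dict(term)
        target = REPLACED_BY.get(record["term"])
        if target is not None:
            record["replacedBy"] = target
        updated.append(record)
    return updated
```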

relative links on docs

All documentation pages should use relative links to point to the terms (and link back to the top of page), to work on gfbio.org and github.com alike.

obtain DOI for v0.9

A release branch is created for v0.9 that includes only

  • the owl file
  • LICENSE and CONTRIBUTING information

This will be used to obtain a new DOI from ZENODO.org.

split docs page for development and GFBio release

The human-readable GFBio release is a static website that includes the full vocabulary, instructions for contribution (About page), and the best practice guidelines and examples. The version history shows only the change log, not the static websites of previous versions.

The development branch maintains its own website for preview (hosted on GitHub Pages), which shows the current version of the terms as well as viewports for all previous versions. This site will have a big disclaimer that this is not the official release channel and that users of the standard should refer to gfbio.org.

define URI scheme

We should define the URI Scheme of the terms.

First of all, see #3.
Second, we have several options for setting our own URI addresses.

  1. The current solution is provided by the GitHub Pages URL, extended by the hashtagged ID, which works as a direct link to a first-level header. The GitHub URL is pretty long and might put off users because it does not look institutional.
  2. We could forward the entire page to a different domain, e.g. one based on https://www.bexis.uni-jena.de/ or our own domain. Running our own domain would provide short URIs and also brand our project better. But it also requires some maintenance and money (for a domain name hosting service). If maintenance stops and no one pays for the domain in the future, the URIs will break.
  3. The URIs could be forwarded to our GitHub site through GFBio's Terminology server. This has the advantage that the URIs could be free of a hashtag and could be moved to a different place in the future. The server could also relay the request to the website if it comes from a web browser, or to the RDF if it comes from software. It would have a base URL like this: http://terminologies.gfbio.org/terms/ETS/

In my opinion the last option is the most sustainable solution. It seemed to me that it could be solved in the same process as importing the RDF ontology, and GFBio would be very open to doing this.

Any opinions here? Otherwise I would push forward with option 3.
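To illustrate the relay idea in option 3, here is a minimal sketch of how a server might decide between the website and the RDF based on the HTTP Accept header. It says nothing about how the GFBio Terminology server actually works; the function name `choose_target` is hypothetical, and quality values (`q=`) are deliberately ignored to keep the sketch short.

```python
RDF_TYPES = ("application/rdf+xml", "text/turtle", "application/ld+json")
HTML_TYPES = ("text/html", "application/xhtml+xml")


def choose_target(accept_header):
    """Return 'rdf' or 'html' depending on the first recognised media type.

    Media types are checked in the order the client listed them.
    Unknown or empty headers fall back to the human-readable website.
    """
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in RDF_TYPES:
            return "rdf"
        if media_type in HTML_TYPES:
            return "html"
    return "html"


print(choose_target("text/html,application/xhtml+xml,*/*;q=0.8"))  # html
print(choose_target("text/turtle"))                                # rdf
```

With such a relay, the same hashtag-free URI could serve both human readers and software clients.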

harmonize URI scheme

URIs often do not match the case of the Term. In fact, spelling is fairly inconsistent. The URI root is http://terminologies.gfbio.org/terms/ETS/ and is complemented by a string that is either identical to the Term itself (with mixed-case spelling), as in valueType, or modified to a lowercase string, as in traitvaluestd. This irregularity can cause wrong API calls or broken references to the human-readable website.
The URI resolves to the human-readable website only if it is correctly mapped to the lowercase URL of the style https://terminologies.gfbio.org/terms/ets/pages/#term.
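One way to make the scheme predictable is to derive both addresses from the Term with a single rule each. The sketch below assumes that the term URI keeps the Term's case and that the page anchor is the lowercased Term; both rules and all function names are assumptions for illustration, not the agreed scheme.

```python
TERM_ROOT = "http://terminologies.gfbio.org/terms/ETS/"
PAGE_ROOT = "https://terminologies.gfbio.org/terms/ets/pages/"


def term_uri(term):
    """URI of a term, keeping the Term's own mixed-case spelling."""
    return TERM_ROOT + term


def page_url(term):
    """URL of the human-readable entry, using a lowercase anchor."""
    return PAGE_ROOT + "#" + term.lower()


def has_consistent_case(term, uri):
    """True if a stored URI matches the one derived from the Term."""
    return uri == term_uri(term)


print(term_uri("valueType"))  # http://terminologies.gfbio.org/terms/ETS/valueType
print(page_url("valueType"))  # https://terminologies.gfbio.org/terms/ets/pages/#valuetype
```

Running `has_consistent_case` over all terms would list every URI that deviates from the chosen rule.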

translate into compliant format

The csv file now contains the original information on the trait data standard. For integration in GFBio we need to provide SKOS or OWL format. Also, OBO could be a more human-readable way of storing the primary source, which could then be converted into the other formats by an R script.
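A conversion from the csv to SKOS could look roughly like the sketch below, which writes Turtle using only the standard library. The column names `Term` and `Definition`, the base URI passed in by the caller, and the function names are assumptions for illustration; a real export would carry more fields and should be validated with an RDF tool.

```python
import csv
import io


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for a Turtle string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def csv_to_skos_turtle(csv_text, base_uri):
    """Turn csv rows with 'Term' and 'Definition' columns into SKOS concepts."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for row in csv.DictReader(io.StringIO(csv_text)):
        term = row["Term"].strip()
        definition = escape_literal(row["Definition"].strip())
        lines.append(f"<{base_uri}{term}> a skos:Concept ;")
        lines.append(f'    skos:prefLabel "{term}" ;')
        lines.append(f'    skos:definition "{definition}"@en .')
        lines.append("")
    return "\n".join(lines)


example = 'Term,Definition\nunitID,Pointer to a unit ontology\n'
print(csv_to_skos_turtle(example, "http://example.org/ets/"))
```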

rename double records: xStd > y and x > verbatimY

The reviewer comments pointed out that the common way of keeping original author entries along with standardized entries is to have a 'verbatim' term next to the standard term, i.e. we will rename terms or change definitions:

  • traitNameStd > traitName
  • traitName > verbatimTraitName
  • traitValue > verbatimTraitValue
  • traitValueStd > traitValue
  • traitUnit > verbatimTraitUnit
  • traitUnitStd > traitUnit
  • scientificNameStd > scientificName
  • scientificName > verbatimScientificName

This has some messy side effects on the compatibility of the current and future versions of ETS. But at this stage it is possible without doing much harm. Any thoughts on this?
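One of the side effects is that the renames overlap: traitName is both an old name and a new name. A migration of existing column headers therefore has to apply all renames in a single pass. The sketch below shows this; the function name `rename_columns` is hypothetical, and the mapping is the list above.

```python
# Old name -> new name, exactly as listed in this issue.
RENAMES = {
    "traitNameStd": "traitName",
    "traitName": "verbatimTraitName",
    "traitValue": "verbatimTraitValue",
    "traitValueStd": "traitValue",
    "traitUnit": "verbatimTraitUnit",
    "traitUnitStd": "traitUnit",
    "scientificNameStd": "scientificName",
    "scientificName": "verbatimScientificName",
}


def rename_columns(columns):
    """Map each old column name to its new name in one pass.

    Every name is looked up once against the original mapping, so
    'traitNameStd' becomes 'traitName' and is not renamed a second time.
    """
    return [RENAMES.get(name, name) for name in columns]


print(rename_columns(["traitName", "traitNameStd", "measurementMethod"]))
# ['verbatimTraitName', 'traitName', 'measurementMethod']
```

Applying the renames one after another instead would turn traitNameStd into traitName and then into verbatimTraitName, or the reverse, depending on the order.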

unitID

Add a term unitID to contain a pointer to a metric ontology.

separate inherited and refined terms

  1. Terms that are not defined by us, but are just duplicates of DwC terms, should keep the DwC URI in the Identifier field.
  2. Terms refined by us, but based on DwC terms, should link to the DwC term in the 'Refine' field.
  3. Terms that are added by us, without being based on DwC or any other ontology, should leave the 'Refine' field empty.
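The three cases can be told apart mechanically from the Identifier and 'Refine' fields. The following is a minimal sketch under the assumption that inherited terms carry a URI in the Darwin Core terms namespace; the function name `classify_term` and the category labels are made up for illustration.

```python
# Namespace of Darwin Core terms.
DWC_PREFIX = "http://rs.tdwg.org/dwc/terms/"


def classify_term(identifier, refines):
    """Classify a term as 'inherited', 'refined' or 'new'.

    identifier: content of the Identifier field (the term's URI)
    refines:    content of the 'Refine' field, possibly empty
    """
    if identifier.startswith(DWC_PREFIX):
        return "inherited"   # case 1: plain duplicate of a DwC term
    if refines.strip():
        return "refined"     # case 2: our term, linked to the term it refines
    return "new"             # case 3: our term, not based on another ontology
```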

Versioning scheme for terms and entire standard

Changes on both git page and csv

  • There is no version number in many definitions, should it be v0.1 by default?
  • date issued is missing for several terms (e.g. MeasurementOrFact, taxonID) - check if this is ok
  • Some of the date issued or date modified are in different formats (month/day or day/month)
  • In Traitlist : all valueTypes are missing
  • scientificName - Definition : "Original character string provided as species name by the data owner (kept for reference and continueity)" [typo: should be "continuity"]
  • occurrenceID - Definition : a bona fide global unique identifier [maybe not clear what "bona fide" means?]
  • warnings, relationSource, eventID: put capital letters at the beginning of sentences (for harmonisation)
  • datasetID and statisticalMethod - Comment : "e.g." could be "Examples" (for harmonisation)
  • taxonRank, kingdom, phylum, class, order, family, genus: some have a date issued and some have a date modified but the date is the same. Should they be all date issued?
  • author - Comment : Examples: [remove or fill the field]
  • some fields have semicolons in their description, this caused me quite some problems when reading the csv: replace them?
  • in valueType, only "Date" has a capital letter
  • year, month, day - Description: description is not harmonised: e.g. "The four-digit year, at which" vs "The ordinal month, in which" vs "The integer day of the month on which"
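On the semicolon problem: assuming it came from a reader that treats `;` as a delimiter, replacing the semicolons is not the only fix. Quoting every field on export and reading with an explicit comma delimiter keeps the descriptions intact. This is sketched below with Python's standard `csv` module; the helper names and the example row are made up.

```python
import csv
import io


def write_rows(rows):
    """Write rows as comma-separated text with every field quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def read_rows(text):
    """Read comma-separated text; semicolons inside fields are preserved."""
    return list(csv.reader(io.StringIO(text), delimiter=","))


rows = [["Term", "Description"], ["exampleTerm", "first part; second part"]]
assert read_rows(write_rows(rows)) == rows
```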

Mismatches: in git but not csv or vice versa

  • in csv some description fields have a space before the beginning of the text (in locationID, eventDate, statisticalMethod, Exploratory)
  • Maybe not a problem, but csv is not in the same order as git: csv is ordered Traitdata, Taxon, MeasurementOrFact; git is ordered Traitdata, Metadata, Traitlist, ...

add CONTRIBUTING.md

This file should include instructions for how to contribute. It should be referenced from the README and will automatically be highlighted on the issues page.

Latin-1 character issues for CRAN

On 11.04.2019 at 08:00, Prof Brian Ripley wrote:

This concerns packages

[...] traitdataform [...]

which are failing their checks in a strict Latin-1 locale: see the debian-clang results. (Several of these seem to stem from vcr.)

On Linux, such a locale can be ensured via LC_CTYPE=en_US (which may need installing for distros that micro-package). AFAWK it cannot be done on Windows.

  1. The character in don't is an (ASCII) apostrophe, not a right quote:

don’t

(with a right quote) is used in packages [...] (and others not failing).

  2. en and em dashes are not portable, found in packages
    [...] traitdataform [...] .

  3. Using \uxxxx coding for non-ASCII chars in R character strings should help in some cases (see 'Writing R Extensions').

Please correct before May 10 to safely retain the package on CRAN.
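To locate the offending characters before resubmitting, a small scan over the source files can help. The sketch below is in Python rather than R and is not part of traitdataform; the function names are hypothetical. It reports every non-ASCII character and shows the \uxxxx form mentioned in the e-mail.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for every non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            if ord(char) > 127:
                hits.append((line_no, col, char))
    return hits


def escape_non_ascii(text):
    """Replace non-ASCII characters with \\uxxxx (or \\Uxxxxxxxx) escapes."""
    out = []
    for char in text:
        code = ord(char)
        if code <= 127:
            out.append(char)
        elif code <= 0xFFFF:
            out.append("\\u%04x" % code)
        else:
            out.append("\\U%08x" % code)
    return "".join(out)


print(find_non_ascii("don\u2019t"))    # [(1, 4, '’')]
print(escape_non_ascii("don\u2019t"))  # don\u2019t
```

Escapes like these are only valid inside R character strings; in comments and documentation the characters have to be replaced by ASCII equivalents instead.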

publish OWL on GFBio

Once the csv is updated,

  • update metadata, e.g. license, contributors, etc.
  • generate the OWL
  • add scripts for producing the OWL to the repository
  • publish ontology on GFBio
