traitecoevo / APD
The Australian Plant Traits Dictionary
Home Page: https://traitecoevo.github.io/APD/
Once we've posted a proper release, we should add something in the header (possibly navbar) linking to
While building the trait correspondence network, I noticed some issues in the correspondence with GIFT traits (I still have to do the same for BIEN and TRY).
I checked that both the GIFT trait codes and the GIFT trait names provided by AusTraits exist in GIFT, and that each provided GIFT trait name matches its provided GIFT trait code.
The script I used is below, but I'll first detail my findings.
- For trait_0030020 & trait_0030015, the GIFT_close column contains multiple traits on a single line. Is it on purpose? Because other matched traits span multiple lines.
- For trait_0030215, there is a typo in the GIFT_exact name, as it is referenced as "Fuiting time", missing an r.
- For trait_0030020, there is a typo in the GIFT code 'leaf_thorns_1 [GIFT:4:14.1]', which should be 'leaf_thorns_1 [GIFT:4.14.1]'.
- For trait_0030060, GIFT_close matches with GIFT 1.4.1 (Climber_1) while it should match with GIFT 3.4.1 (Reproduction_sexual_1). This was the trait that triggered my systematic search for potential mismatches, as I obtained in the correspondence network a much larger connected component than expected, with traits that shouldn't be matching.
Maybe you could use an adaptation of the script below to perform semi-automated quality checks when updating the APD?
For the sake of completeness, I'll try performing the same checks for TRY and BIEN.
library("dplyr")
gift_trait_meta = GIFT::GIFT_traits_meta()
apd_gift_detailed = tibble::as_tibble(read.csv("APD_traits_input.csv")) |>
  select(identifier:label, starts_with("GIFT")) |>
  rename(trait_id = identifier) |>
  tidyr::pivot_longer(
    starts_with("GIFT"), names_to = "match_type", values_to = "matched_trait"
  ) |>
  filter(matched_trait != "") |>
  mutate(
    # Split for traits that have multiple matches on one line
    split_traits = purrr::map(stringr::str_split(matched_trait, ";"), trimws),
    # Extract GIFT trait name
    extracted_trait = purrr::map(
      split_traits, \(x) stringr::str_extract(x, "^(.*)\\s\\[", group = 1)
    ),
    # Extract GIFT trait code
    extracted_code = purrr::map(
      split_traits, \(x) stringr::str_extract(x, "\\[GIFT:(.+)\\]", group = 1)
    ),
    # Get level
    gift_lvl = purrr::map(
      extracted_code, \(x) stringr::str_count(x, stringr::fixed(".")) + 1L
    )
  ) |>
  # Put everything in a tidy format
  tidyr::unnest(split_traits:gift_lvl)
## Level 2 traits
# Matching code at level 2
apd_gift_lvl2 = apd_gift_detailed |>
  filter(gift_lvl == 2) |>
  left_join(
    gift_trait_meta |>
      distinct(Lvl2, Trait1),
    by = c(extracted_code = "Lvl2")
  )
# Problematic traits
apd_gift_lvl2 |>
  filter((extracted_trait != Trait1) | is.na(Trait1))
## Level 3 traits
apd_gift_lvl3 = apd_gift_detailed |>
  filter(gift_lvl == 3) |>
  left_join(
    gift_trait_meta |>
      distinct(Lvl3, Trait2),
    by = c(extracted_code = "Lvl3")
  )
# Problematic traits
apd_gift_lvl3 |>
  filter((extracted_trait != Trait2) | is.na(Trait2))
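The joins above only flag a code once it fails to match GIFT, so a purely syntactic check could catch typos such as 'GIFT:4:14.1' earlier. What follows is a minimal sketch in base R, not part of the APD repository. The helper name `check_gift_code_format` is hypothetical, and the check assumes that every GIFT code consists of integers separated by dots (as in 4.14.1).

```r
# Hypothetical helper (not part of the APD repository): flag matched-trait
# strings whose GIFT code is not made of integers separated by dots.
# Assumption: every valid GIFT code looks like "4", "4.14" or "4.14.1".
check_gift_code_format <- function(matched_trait) {
  # Pull out whatever sits between "[GIFT:" and the closing "]"
  code <- sub("^.*\\[GIFT:([^]]*)\\].*$", "\\1", matched_trait)
  # Strings without a "[GIFT:...]" tag are left unchanged by sub(); mark them NA
  code[!grepl("\\[GIFT:[^]]*\\]", matched_trait)] <- NA_character_
  # NA codes are reported as not well formed, because grepl() returns FALSE for NA
  well_formed <- grepl("^[0-9]+(\\.[0-9]+)*$", code)
  data.frame(matched_trait, code, well_formed, stringsAsFactors = FALSE)
}

examples <- c(
  "leaf_thorns_1 [GIFT:4:14.1]",  # typo reported in the findings above
  "leaf_thorns_1 [GIFT:4.14.1]"   # corrected form
)
res <- check_gift_code_format(examples)

# Rows that would need manual review
res[!res$well_formed, ]
```

A check like this only validates the shape of the code. Whether the code exists in GIFT and matches the given trait name still needs the joins against the GIFT metadata.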
Files/folders within the TERN GitHub repository that are good examples of what we might like to create for the APT:
https://ternaustralia.github.io/ontology_tern/
https://github.com/ternaustralia/dawe-rlp-vocabs/blob/1b4e3daa611deae25ec90445be27[…]9c12cc/vocab_files/categorical_collections/luts/growth-form.ttl
https://github.com/ternaustralia/dawe-rlp-vocabs
https://linkeddata.tern.org.au/viewers/dawe-vocabs
Similarly to #28. Let's look at the correspondence with TRY.
I've performed a similar matching of codes and names in TRY, and found few typos (see the detailed script below).
trait_0030810 has two traits matching on GIFT_close.
library("dplyr")
apd_try_detailed = tibble::as_tibble(read.csv("APD_traits_input.csv")) |>
  select(identifier:label, starts_with("TRY")) |>
  rename(trait_id = identifier) |>
  tidyr::pivot_longer(
    starts_with("TRY"), names_to = "match_type", values_to = "matched_trait"
  ) |>
  filter(matched_trait != "") |>
  mutate(
    # Split for traits that have multiple matches on one line
    split_traits = purrr::map(stringr::str_split(matched_trait, ";"), trimws),
    # Extract TRY trait name
    extracted_trait = purrr::map(
      split_traits, \(x) stringr::str_extract(x, "^(.*)\\s\\[", group = 1)
    ),
    # Extract TRY trait code (TraitID) as a number
    extracted_code = purrr::map(
      split_traits, \(x) stringr::str_extract(x, "\\[TRY:(.+)\\]", group = 1) |>
        as.numeric()
    )
  ) |>
  tidyr::unnest(split_traits:extracted_code)
# try_traits is not defined in this snippet; it is assumed to be the TRY trait
# list, loaded beforehand as a data frame with TraitID and Trait columns
apd_try_smaller = apd_try_detailed |>
  # Look up the TRY name that corresponds to the provided code
  left_join(
    try_traits |>
      distinct(TraitID, name_matched_on_code = Trait),
    by = c(extracted_code = "TraitID")
  ) |>
  # Look up the TRY code that corresponds to the provided name
  left_join(
    try_traits |>
      distinct(code_matched_on_name = TraitID, Trait),
    by = c(extracted_trait = "Trait")
  ) |>
  select(trait, extracted_trait, extracted_code, name_matched_on_code, code_matched_on_name)
# Provided name differs from the TRY name for the provided code
apd_try_smaller |>
  filter(extracted_trait != name_matched_on_code)
# Provided code differs from the TRY code for the provided name
apd_try_smaller |>
  filter(extracted_code != code_matched_on_name)
</details>
As mentioned in PR #24, I'm using the raw APD_traits_input.csv to get trait correspondence across databases.
I noticed that some entries in the TRY columns are malformed (or at least non-standard).
I'm unsure how to tackle these, so I'd rather open an issue about them.
My routine is the following:
apd_try_traits = read.csv("APD_traits_input.csv") |>
  tibble::as_tibble() |>
  select(
    trait_id = identifier, trait, label, contains("BIEN"), contains("GIFT"),
    contains("TRY")
  ) |>
  # Get all traits for which there is an equivalent in TRY
  filter(if_any(contains("TRY"), \(x) x != "")) |>
  select(trait_id, trait, label, contains("TRY")) |>
  # Making data tidy
  tidyr::pivot_longer(
    contains("TRY"), names_to = "match_type", values_to = "match_value"
  ) |>
  filter(match_value != "") |>
  # Extract TRY TraitIDs
  mutate(
    extracted_trait = match_value |>
      stringr::str_extract_all("\\[TRY:\\d+\\]") |>
      purrr::map(stringr::str_remove, "\\[TRY:") |>
      purrr::map(stringr::str_remove, "\\]"),
    match_type = stringr::str_extract(match_type, "[:alpha:]+"),
    # Count number of match traits
    length_extracted = purrr::map_int(extracted_trait, length)
  )
If I count the number of matched traits given the columns I get the following:
> apd_try_traits |>
+ count(length_extracted)
# A tibble: 4 × 2
  length_extracted     n
             <int> <int>
1                0     5
2                1   316
3                2     7
4                3     1
So 5 AusTraits traits with non-empty columns have 0 matches given my extraction of TRY IDs.
Looking at the strings in those columns, I get:
> apd_try_traits |>
+ filter(length_extracted == 0) |>
+ pull(match_value)
[1] "specific leaf area [TO:0000562] (https://www.try-db.org/de/de.php)"
[2] "Leaf epidermis cell area; Leaf mesophyll cell area [TRY:338; 573] (https://www.try-db.org/de/de.php)"
[3] "Bark thickness [TRY:24, TRY:3355, TRY:3356] (https://www.try-db.org/de/de.php)"
[4] "Bark thickness [TRY:24, 3355, 3356] (https://www.try-db.org/de/de.php)"
[5] "plant lifespan and age of first flowering [LEDA:1.3] (https://www.try-db.org/de/de.php)"
The first entry points to a Trait Ontology (TO) term, not to a TRY trait.
For leaf epidermis cell area and bark thickness, the problem is how the TRY IDs are written: several IDs share one pair of brackets instead of each having its own [TRY:<ID>] tag. Also, Bark thickness is written in two different ways?!
For plant lifespan, the entry links to a LEDA trait. Is this relevant here?
I've checked, and these issues propagate to the RDF file.
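Until the entries are standardised, the extraction step could be made tolerant of the bracket styles listed above. The following is a minimal sketch in base R under stated assumptions. The helper name `extract_try_ids` is hypothetical, and it assumes that every run of digits inside a bracket starting with "TRY:" is a TRY TraitID. The example strings are copied from the output above.

```r
# Hypothetical helper (not part of the APD repository): extract every TRY
# TraitID from a mapping string, whichever separator style is used.
extract_try_ids <- function(match_value) {
  # Keep only bracketed tags that start with "TRY:", e.g. "[TRY:24, 3355, 3356]"
  tags <- regmatches(match_value, gregexpr("\\[TRY:[^]]*\\]", match_value))
  # Assumption: inside such a tag, every run of digits is a TraitID
  lapply(tags, function(tag) {
    as.integer(unlist(regmatches(tag, gregexpr("[0-9]+", tag))))
  })
}

match_value <- c(
  "specific leaf area [TO:0000562] (https://www.try-db.org/de/de.php)",
  "Leaf epidermis cell area; Leaf mesophyll cell area [TRY:338; 573] (https://www.try-db.org/de/de.php)",
  "Bark thickness [TRY:24, TRY:3355, TRY:3356] (https://www.try-db.org/de/de.php)",
  "Bark thickness [TRY:24, 3355, 3356] (https://www.try-db.org/de/de.php)",
  "plant lifespan and age of first flowering [LEDA:1.3] (https://www.try-db.org/de/de.php)"
)
ids <- extract_try_ids(match_value)

# Number of TRY TraitIDs found per entry
lengths(ids)
```

Entries tagged with TO or LEDA still yield no TRY ID, which seems desirable: they stay visible as zero-match rows for manual review rather than being silently mapped.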
Still need to add in metadata for APD. What follows is what Rowan coded into one of the drafts for ARDC RVA:
<https://github.com/traitecoevo/APD>
a rdfs:Resource , skos:ConceptScheme , owl:Ontology , <http://terminologies.gfbio.org/terms/ETS/TraitData> ;
rdfs:label "AusTraits"@en ;
rdfs:seeAlso "https://github.com/traitecoevo/APD_values"^^xsd:anyURI ;
dcterms:description "AusTraits is an open-source, harmonized database of Australian plant trait data. It synthesises data on nearly 500 traits across more than 30,000 taxa from field campaigns, published literature, taxonomic monographs, and individual taxon descriptions. Begun in 2016 as an initiative between three lab groups, it has grown to be the largest collation of plant trait data for Australian plants. AusTraits integrates plant trait data collected by researchers from diverse disciplines, including functional plant biology, plant physiology, plant taxonomy, and conservation biology. By harmonizing and error checking values, linking all AusTraits data entries to detailed metadata, and documenting trait and trait values definitions, AusTraits is a resource researchers can trust and use for their research agendas with minimal additional filtering or manipulations."@en ;
dcterms:license "https://creativecommons.org/licenses/by/4.0/"^^xsd:anyURI ;
dcterms:publisher "https://austraits.org/"^^xsd:anyURI ;
dcterms:title "AusTraits"@en ;
skos:hasTopConcept <https://github.com/traitecoevo/APD#0000000> ;
skos:prefLabel "AusTraits"@en .
trait concept: seed production inter-annual variability (i.e. masting)
reference: https://doi.org/10.1071/BT22043 (Wright, B. R., Franklin, D. C., & Fensham, R. J. (2022). The ecology, evolution and management of mast reproduction in Australian plants. Australian Journal of Botany, 70(8), 509-530.)
available data: some in manuscript
suggested by: Matt White