traitecoevo / apcalign

R package for accessing, matching and updating species names of Australian flora

Home Page: https://traitecoevo.github.io/APCalign/

License: Other

R 100.00%

r-package

apcalign's People

snubian rubysaltbush garytruong yangsophieee

apcalign's Issues

Add argument for align_taxa() function to specify output location

The update_taxonomy() function takes an argument output = x to specify the location and name of the .csv file written by that function, so that the csv file is not just written to the working directory.

Is it possible to add a similar output = x argument to the align_taxa() function, which currently automatically writes "taxonomic_updates.csv" to the working directory?
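
A minimal sketch of what such an argument could look like. The helper `write_alignment()` and the toy table are hypothetical and only illustrate the idea; they are not the package's actual API.

```r
# Hypothetical helper: write an alignment table to a user-chosen path
# instead of always writing "taxonomic_updates.csv" to the working directory.
write_alignment <- function(aligned, output = "taxonomic_updates.csv") {
  # Create the target directory if it does not exist yet
  dir.create(dirname(output), recursive = TRUE, showWarnings = FALSE)
  utils::write.csv(aligned, output, row.names = FALSE)
  invisible(output)
}

# Toy data standing in for the table returned by align_taxa()
aligned <- data.frame(
  original_name = "Dryandra preissii",
  aligned_name = "Dryandra preissii"
)

out_path <- file.path(tempdir(), "alignments", "taxonomic_updates.csv")
write_alignment(aligned, output = out_path)
```

Keeping the current file name as the default would leave existing scripts unchanged.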

carry a column for fuzzy fixes (yes/no) back to the top level and return this info to the user

update_taxonomy doesn't handle short species lists

update_taxonomy(c("Dryandra preissii","Banksia acuminata"))

and

Passing arguments `max_distance_abs` and `max_distance_rel` across all function calls

For align_taxa(), are the arguments (max_distance_abs = 3, max_distance_rel = 0.2) actually being used? I suspect the arguments aren't being passed appropriately.

Looks like align_taxa() calls match_taxa() which calls fuzzy_match(max_distance_abs, max_distance_rel) - many many many times.

Should I try passing max_distance_abs and max_distance_rel across all calls?

Originally posted by @fontikar in #64 (comment)
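
A rough sketch of passing the thresholds explicitly through each layer. The `*_sketch` functions are hypothetical stand-ins for the package internals, and the matching rule (edit distance from `utils::adist()`, relative to the length of the query) is an assumption made for illustration.

```r
# Hypothetical stand-ins for the internal functions; only the
# argument-passing pattern is the point here.
fuzzy_match_sketch <- function(name, candidates, max_distance_abs, max_distance_rel) {
  d <- drop(utils::adist(name, candidates))
  ok <- d <= max_distance_abs & d / nchar(name) <= max_distance_rel
  if (!any(ok)) return(NA_character_)
  candidates[ok][which.min(d[ok])]
}

match_taxa_sketch <- function(names, candidates, max_distance_abs, max_distance_rel) {
  vapply(names, fuzzy_match_sketch, character(1),
         candidates = candidates,
         max_distance_abs = max_distance_abs,
         max_distance_rel = max_distance_rel,
         USE.NAMES = FALSE)
}

align_taxa_sketch <- function(names, candidates,
                              max_distance_abs = 3, max_distance_rel = 0.2) {
  # Forward the user's thresholds explicitly rather than relying on
  # defaults hard-coded further down the call chain
  match_taxa_sketch(names, candidates, max_distance_abs, max_distance_rel)
}

candidates <- c("Banksia integrifolia", "Acacia longifolia")
strict <- align_taxa_sketch("Banksia integrifolai", candidates, max_distance_abs = 0)
loose <- align_taxa_sketch("Banksia integrifolai", candidates)
```

With `max_distance_abs = 0` the misspelt name stays unmatched, which is a quick way to confirm that the value set at the top level really reaches the matcher.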

Error in taxonomic lookup function

Hello!

I am trying to use this package and have run into a strange error.

I downloaded the two packages (ausflora and datastorr) from GitHub and ran the following code from one of the examples:

create_taxonomic_update_lookup(c("Banksia integrifolia","Acacia longifolia","Commersonia rosea"),full=FALSE)

The code yields the following error message which I think has something to do with a bug in the APC dataset:

trying URL 'https://github.com/traitecoevo/ausflora/releases/download/0.0.2.9000/apc.parquet'
Content type 'application/octet-stream' length 10895737 bytes (10.4 MB)
downloaded 10.4 MB

Error: IOError: Couldn't deserialize thrift: TProtocolException: Invalid data

Please let me know if this is the case, and if so when it might be fixed.

The R version I am using is
R version 4.3.0

Cheers,
Sam

fuzzy matching full names, genera, binomials, trinomials in appropriate order

Lizzy has figured this out for austraits but needs to be generalized. code below

handling one-to-many binomial matches

not sure what we want to do there

Readme

Need a readme, documenting installation and usage

Basic readme examples:

Data is downloaded from https://biodiversity.org.au/nsl/services/export/index They don’t currently hold a link to the actual download file.

Advice on acknowledgements and citations from Anne Fuchs:

Data is provided as CC-BY3. Also include the attribute ccAttributionIRI with the data. This provides a link back to the source data.

APNI

“Australian Plant Name Index (continuously updated), Centre of Australian National Biodiversity Research, www.biodiversity.org.au/nsl/services/apni (date of extract)”,

APC (taxon file): is from APC which changes constantly. The file downloaded corresponds to the tree version in your file. If you look at the ccAttributionIRI the following part of the URI https://id.biodiversity.org.au/tree/{id} is a resolvable identifier back to the version that was used for this download. So suggest a citation of

“Australian Plant Census, Centre of Australian National Biodiversity Research, Council of Heads of Australasian Herbaria, {date} https://id.biodiversity.org.au/tree/{id}”

fuzzy match non-native species returns different species

Sending Eucalyptus deglupta to create_taxonomic_update_lookup function returns a fuzzy match taxonomic synonym to Eucalyptus decepta which is a completely different species. Eucalyptus deglupta does not appear to be considered native by the APC, and 'deglupta' and 'decepta' are close enough for fuzzy matching.

Maybe have a native check before doing the name matching? Using the native_anywhere_in_australia function returns Eucalyptus deglupta as False.
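
To see why the two names fall inside the fuzzy-matching window, here is a small sketch using `utils::adist()`. It assumes the thresholds mentioned elsewhere in these issues (max_distance_abs = 3, max_distance_rel = 0.2) and assumes the relative distance is taken against the length of the query; the package may compute it differently.

```r
query <- "Eucalyptus deglupta"
candidate <- "Eucalyptus decepta"

# Levenshtein edit distance between the two full names
d_abs <- drop(utils::adist(query, candidate))
# Distance relative to the query length (assumed definition)
d_rel <- d_abs / nchar(query)

# Would the pair pass under the assumed default thresholds?
within_defaults <- d_abs <= 3 && d_rel <= 0.2
# A stricter absolute threshold would reject it
within_stricter <- d_abs <= 2 && d_rel <= 0.2
```

Under these assumptions the pair passes the defaults but not the stricter setting, so tightening the thresholds is one option alongside a native check.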

Subsetting APC to States

Code is here to parse the taxonDistribution column, but it could be organized for particular use cases.

library(tidyverse)
library(stringr)

apc <- read_csv("data/APC-taxon-2022-02-14-5132.csv")
apc_species <- filter(apc, taxonRank == "Species",taxonomicStatus=="accepted")

#separate the states
sep_state_data <-
  str_split(unique(apc_species$taxonDistribution), ",")

#get unique places
all_codes <- unique(str_trim(unlist(sep_state_data)))
apc_places <- unique(word(all_codes[!is.na(all_codes)], 1, 1))

#make a table to fill in
data.frame(col.names = apc_places)
species_df <- tibble(species = apc_species$scientificName)
for (i in 1:length(apc_places)) {
  species_df <- bind_cols(species_df, NA)
}
names(species_df) <- c("species", apc_places)

#look for all possible entries after each state
state_parse_and_add_column <- function(species_df, state, apc_species){
  print(all_codes[grepl(state,all_codes)]) # checking for weird ones
  species_df[,state] <- case_when(
    grepl(paste0("\\b",state," \\(uncertain origin\\)"), apc_species$taxonDistribution) ~ "uncertain origin",
    grepl(paste0("\\b",state," \\(naturalised\\)"), apc_species$taxonDistribution) ~ "naturalised",
    grepl(paste0("\\b",state," \\(doubtfully naturalised\\)"), apc_species$taxonDistribution) ~ "doubtfully naturalised",
    grepl(paste0("\\b",state," \\(native and naturalised\\)"), apc_species$taxonDistribution) ~ "native and naturalised",
    grepl(paste0("\\b",state," \\(formerly naturalised\\)"), apc_species$taxonDistribution) ~ "formerly naturalised",
    grepl(paste0("\\b",state," \\(presumed extinct\\)"), apc_species$taxonDistribution) ~ "presumed extinct",
    grepl(paste0("\\b",state," \\(native and doubtfully naturalised\\)"), apc_species$taxonDistribution) ~ "native and doubtfully naturalised",
    grepl(paste0("\\b",state," \\(native and uncertain origin\\)"), apc_species$taxonDistribution) ~ "native and uncertain origin",
    grepl(paste0("\\b",state), apc_species$taxonDistribution) ~ "native", #no entry = native, it's important this is last in the list
    TRUE ~ "not present"
  )
  return(species_df)
}

#bug checking
#species_df<-state_parse_and_add_column(species_df,"LHI",apc_species)
#species_df<-state_parse_and_add_column(species_df,"HI",apc_species)

#go through the states one by one
for (i in 1:length(apc_places)){
  species_df <- state_parse_and_add_column(species_df,apc_places[i],apc_species)
}

write_csv(species_df,"data/states_islands_species_list.csv")
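
To sanity-check the pattern order used above on a single record, here is a small base R sketch. The distribution string is made up, `classify_state()` is a hypothetical helper, and only three of the origin labels are included.

```r
# Toy distribution string in the style of the taxonDistribution column (made up)
distribution <- "WA, NT (naturalised), LHI (native and naturalised)"

classify_state <- function(state, distribution) {
  # Qualified labels are checked first; a bare state code implies "native"
  labels <- c("naturalised", "native and naturalised", "uncertain origin")
  for (label in labels) {
    pattern <- paste0("\\b", state, " \\(", label, "\\)")
    if (grepl(pattern, distribution)) return(label)
  }
  if (grepl(paste0("\\b", state, "\\b"), distribution)) return("native")
  "not present"
}

states <- c("WA", "NT", "LHI", "HI", "Qld")
result <- vapply(states, classify_state, character(1), distribution = distribution)
```

The `\\b` word boundary is what keeps "HI" from matching inside "LHI", which is the case the commented-out bug check targets.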

Explainer on origin status

Can we provide some kinda explanation for these different terms? @wcornwell said these are calculated from raw data. Maybe worth sticking in an article.Rmd for the pkgdown website.

We need the methods on how these are calculated pls 😄

library(dplyr)
library(purrr)
library(janitor)

status_matrix |>
  select(-species) |>
  flatten_chr() |>
  tabyl()
#>  flatten_chr(select(status_matrix, -species))      n      percent
#>                        doubtfully naturalised   1120 2.371003e-03
#>                          formerly naturalised    277 5.863998e-04
#>                                        native  40336 8.538997e-02
#>             native and doubtfully naturalised      9 1.905270e-05
#>                        native and naturalised    136 2.879075e-04
#>                   native and uncertain origin      2 4.233933e-06
#>                                   naturalised   8765 1.855521e-02
#>                                   not present 421606 8.925258e-01
#>                              presumed extinct    101 2.138136e-04
#>                              uncertain origin     22 4.657327e-05

pkgdown website and expanding documentation

Will work on website branch for this!

Create pkgdown website
Include workflow diagrams in vignette
Prerender vignette to speed up loading resources
Tidy up output print(n = 6) in vignette
Improve README (Best to point to pkgdown tutorials)
More detail on APC, APNI (difference between them)
- https://ibis-cloud.atlassian.net/wiki/spaces/NP/pages/1154383919/NSL+Name+export+format

Option to name files differently

When loading files via datastorr, can we use a consistent name, even if the file name changes?

e.g. APNI-names-2020-05-14-1341.csv -> APNI?
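
One way this could work is to derive the label from the file name. This is only a sketch; `resource_label()` is a hypothetical helper and it assumes every export file starts with the resource name followed by a hyphen.

```r
resource_label <- function(file_name) {
  # Keep everything before the first hyphen of the file name
  sub("-.*$", "", basename(file_name))
}

labels <- resource_label(c(
  "APNI-names-2020-05-14-1341.csv",
  "data/APC-taxon-2022-02-14-5132.csv"
))
```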

Broken image link on pkgdown in Data Providers article page

https://traitecoevo.github.io/ausflora/articles/data-providers.html

Related to:

possible diversity plotting function

idea from @fontikar 👍

library(ausflora)
library(tidyverse)

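resources <- load_taxonomic_resources()

# `resources` has no default here: `resources = resources` would fail with a
# recursive default argument reference when the argument is omitted
plot_taxa_heat_map <- function(taxa, resources) {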
  ss <- create_species_state_origin_matrix(resources = resources)
  
  ss %>%
    pivot_longer(2:19, names_to = "State") %>%
    filter(grepl(taxa, species)) %>%
    filter(value != "not present") %>%
    filter(
      value %in% c(
        "native",
        "presumed extinct",
        "naturalised",
        "formerly naturalised",
        "doubtfully naturalised"
      )
    ) %>%
    filter(State %in% c("WA", "Qld", "NT", "NSW", "Vic", "Tas", "SA", "ACT")) %>%
    group_by(State, value) %>%
    summarise (`number of species` = n()) %>%
    ggplot(aes(x = State, y = value, fill = `number of species`)) +
    geom_tile(color = "black") +
    scale_fill_gradient2(
      low = "#075AFF",
      mid = "#FFFFCC",
      high = "#FF0000"
    ) +
    coord_fixed() + ggtitle(paste(taxa, " species"))
}

create_lookup function

code from @dfalster :

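# Assumes dplyr and stringr are attached (e.g. via library(tidyverse))
create_lookup <- function(species_list, fuzzy_matching = FALSE, ver = "0.0.1.9000") {
  # Load the requested version of the data (previously hard-coded to "0.0.1.9000")
  tmp <- dataset_access_function(ver)

  # Align the unique input names
  aligned_data <-
    unique(species_list) %>%
    align_taxa(fuzzy_matching = fuzzy_matching, ver = ver)

  # Update the aligned names to current taxonomy
  aligned_species_list_tmp <-
    aligned_data$aligned_name %>% update_taxonomy()

  # Join the updates back to the original names and add a genus column
  aligned_species_list <-
    aligned_data %>%
    select(original_name, aligned_name) %>%
    left_join(aligned_species_list_tmp, by = "aligned_name", multiple = "first") %>%
    filter(!is.na(taxonIDClean)) %>%
    mutate(genus = word(canonicalName, 1, 1))

  return(aligned_species_list)
}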

testing edge cases

Reorganising R/

Currently, unclear which files refer to which functions.
Best practice is to name each .R file after its main function, e.g. align_taxa.R
Sub functions i.e. from switch() or helper functions are best stored under the main .R

Will work on new branch clean-r

Feedback on design and workflow from APC contributors

Check they are happy with use of the data and workflow for reconciling taxon names

Handling NA in original_name

Trying with test data where there are NA in species, align_taxa will throw error if you don't drop NA!

library(tidyverse)

remotes::install_github("traitecoevo/ausflora", ref = "vignette") 
#> Skipping install of 'ausflora' from a github remote, the SHA1 (12bd620b) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(ausflora) 

dim(gbif_lite)
#> [1] 129   7

gbif_lite
#> # A tibble: 129 × 7
#>    species       infraspecificepithet taxonrank decimalLongitude decimalLatitude
#>    <chr>         <chr>                <chr>                <dbl>           <dbl>
#>  1 Tetratheca c… <NA>                 SPECIES               145.           -37.4
#>  2 Peganum harm… <NA>                 SPECIES               139.           -33.3
#>  3 Calotis mult… <NA>                 SPECIES               115.           -24.3
#>  4 Leptospermum… <NA>                 SPECIES               151.           -34.0
#>  5 Lepidosperma… <NA>                 SPECIES               142.           -37.3
#>  6 Enneapogon p… <NA>                 SPECIES               129.           -17.8
#>  7 Acacia verti… <NA>                 SPECIES               144.           -38.6
#>  8 Banksia serr… <NA>                 SPECIES               149.           -37.8
#>  9 Glischrocary… <NA>                 SPECIES               136.           -34.3
#> 10 Senna artemi… artemisioides        SUBSPECI…             142.           -25.9
#> # ℹ 119 more rows
#> # ℹ 2 more variables: scientificname <chr>, verbatimscientificname <chr>

resources <- load_taxonomic_resources(stable_or_current_data = "stable")
#> Loading resources...
#> ...done

gbif_lite |> 
 # tidyr::drop_na(species) |>  
  dplyr::pull(species) |> 
  align_taxa(resources = resources)
#> Checking alignments of 129 taxa
#>   -> 0 names already matched; 0 names checked but without a match; 122 taxa yet to be checked
#> Error in `dplyr::mutate()`:
#> ℹ In argument: `fuzzy_match_genus = fuzzy_match_genera(genus,
#>   resources$genera_accepted$canonicalName)`.
#> Caused by error in `purrr::map_chr()`:
#> ℹ In index: 72.
#> Caused by error in `if (words_in_text > 1) ...`:
#> ! missing value where TRUE/FALSE needed
#> Backtrace:
#>      ▆
#>   1. ├─ausflora::align_taxa(dplyr::pull(gbif_lite, species), resources = resources)
#>   2. │ └─ausflora:::match_taxa(taxa, resources)
#>   3. │   └─taxa$tocheck %>% ...
#>   4. ├─dplyr::mutate(...)
#>   5. ├─dplyr:::mutate.data.frame(...)
#>   6. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
#>   7. │   ├─base::withCallingHandlers(...)
#>   8. │   └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
#>   9. │     └─mask$eval_all_mutate(quo)
#>  10. │       └─dplyr (local) eval()
#>  11. ├─ausflora (local) fuzzy_match_genera(genus, resources$genera_accepted$canonicalName)
#>  12. │ └─purrr::map_chr(x, ~fuzzy_match(.x, y, 2, 0.35, n_allowed = 1))
#>  13. │   └─purrr:::map_("character", .x, .f, ..., .progress = .progress)
#>  14. │     ├─purrr:::with_indexed_errors(...)
#>  15. │     │ └─base::withCallingHandlers(...)
#>  16. │     ├─purrr:::call_with_cleanup(...)
#>  17. │     └─ausflora (local) .f(.x[[i]], ...)
#>  18. │       └─ausflora:::fuzzy_match(.x, y, 2, 0.35, n_allowed = 1)
#>  19. └─base::.handleSimpleError(...)
#>  20.   └─purrr (local) h(simpleError(msg, call))
#>  21.     └─cli::cli_abort(...)
#>  22.       └─rlang::abort(...)

gbif_lite |> 
  tidyr::drop_na(species) |>  
  dplyr::pull(species) |> 
  align_taxa(resources = resources)
#> Checking alignments of 127 taxa
#>   -> 0 names already matched; 0 names checked but without a match; 121 taxa yet to be checked
#> # A tibble: 121 × 28
#>    original_name    cleaned_name aligned_name source known checked stripped_name
#>    <chr>            <chr>        <chr>        <chr>  <lgl> <lgl>   <chr>        
#>  1 Tetratheca cili… Tetratheca … Tetratheca … <NA>   TRUE  TRUE    tetratheca c…
#>  2 Peganum harmala  Peganum har… Peganum har… <NA>   TRUE  TRUE    peganum harm…
#>  3 Calotis multica… Calotis mul… Calotis mul… <NA>   TRUE  TRUE    calotis mult…
#>  4 Leptospermum tr… Leptospermu… Leptospermu… <NA>   TRUE  TRUE    leptospermum…
#>  5 Lepidosperma la… Lepidosperm… Lepidosperm… <NA>   TRUE  TRUE    lepidosperma…
#>  6 Enneapogon poly… Enneapogon … Enneapogon … <NA>   TRUE  TRUE    enneapogon p…
#>  7 Acacia verticil… Acacia vert… Acacia vert… <NA>   TRUE  TRUE    acacia verti…
#>  8 Banksia serrata  Banksia ser… Banksia ser… <NA>   TRUE  TRUE    banksia serr…
#>  9 Glischrocaryon … Glischrocar… Glischrocar… <NA>   TRUE  TRUE    glischrocary…
#> 10 Senna artemisio… Senna artem… Senna artem… <NA>   TRUE  TRUE    senna artemi…
#> # ℹ 111 more rows
#> # ℹ 21 more variables: stripped_name2 <chr>, trinomial <chr>, binomial <chr>,
#> #   genus <chr>, aligned_reason <chr>, fuzzy_match_genus <chr>,
#> #   fuzzy_match_genus_known <chr>, fuzzy_match_genus_APNI <chr>,
#> #   fuzzy_match_binomial <chr>, fuzzy_match_binomial_APC_known <chr>,
#> #   fuzzy_match_trinomial <chr>, fuzzy_match_trinomial_known <chr>,
#> #   fuzzy_match_cleaned_APC <chr>, fuzzy_match_cleaned_APC_known <chr>, …

Created on 2023-07-19 with reprex v2.0.2
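
Until align_taxa() handles missing values itself, a guard like the following could sit in front of it. This is a sketch only; `split_missing_names()` is a hypothetical helper, not part of the package.

```r
# Hypothetical pre-processing step: separate missing or empty names
# before alignment so they never reach the fuzzy matcher.
split_missing_names <- function(names) {
  missing <- is.na(names) | trimws(names) == ""
  list(
    to_align = unique(names[!missing]),  # names safe to pass on
    n_dropped = sum(missing)             # how many entries were set aside
  )
}

species <- c("Banksia serrata", NA, "Peganum harmala", NA, "Banksia serrata")
checked <- split_missing_names(species)
```

Reporting `n_dropped` in a message would tell the user why the output has fewer rows than the input.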

Unmatched species - unsure of reason

A small number of species from my test dataset didn't match using apcnames but did match when I searched for them on https://biodiversity.org.au/nsl/services/APC using the predictive text function. I'm unsure whether these didn't match because of the version of APC used by apcnames or because of some sort of syntax issue. All were species with a "sp" in the middle of the species name (see below).

current align_taxa not adding or subtracting words after the species as desired

e.g. Agrostis mulleriana dwarf form should become Agrostis muelleriana

Agrostis aff_hyemalis needs to be Agrostis sp. aff. hiemalis

Instructions for new releases

This also depends on what happens with #34 in large part, but should remember to do this

Better package name

If we progress, we may want a more enticing package name. What about ausflora, austaxa, aus_plant_taxa?

Thoughts @wcornwell ?

Match "Genus sp." identifications

Currently it looks like a large number of unmatched species from my dataset are genus-level identifications (i.e. Senna sp.). This makes sense given the APC search itself currently doesn't return anything for sp.'s, unless you give it something explicitly genus only.

Is it possible to convert "Genus sp." searches to genus-level searches and return a genus-level value for them, such that e.g. a Dryandra sp. entry would be returned as Banksia sp.?
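
The first half of this (reducing "Genus sp." to a genus-level query) could be sketched with a regular expression. `to_genus_level()` is a hypothetical helper; mapping Dryandra to Banksia would still need a genus-level synonym lookup, which is not shown.

```r
to_genus_level <- function(names) {
  # Strip a trailing "sp", "sp." or "spp." so only the genus remains
  sub("\\s+spp?\\.?$", "", trimws(names))
}

genus_queries <- to_genus_level(c("Senna sp.", "Dryandra sp", "Banksia serrata"))
```

Names without a trailing "sp." marker pass through unchanged.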

Use contentID to retrieve and cache data

After meeting with @cboettig, we learnt about https://github.com/cboettig/contentid

This looks like a promising option for locally caching downloads. Plays nice with zenodo.

attempt to retrieve a subspecies using 'ssp.' annotation only retrieves the species

If you run this code:

create_taxonomic_update_lookup(
c(
"Banksia integrifolia integrifolia"
),
resources = resources
)

it retrieves Banksia integrifolia integrifolia as expected. Similarly, asking for Banksia integrifolia subsp. integrifolia also works

But if I ask for Banksia integrifolia ssp. integrifolia, it only retrieves the species name, not the subspecies

So 'ssp.' needs to be entered as an accepted notation for subspecies

Would have to check for other infraspecific ranks too, eg

v. or var.
form. or forma. or f.
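
A possible pre-cleaning step is sketched below. `standardise_ranks()` is hypothetical, "Aus bus ... cus" are placeholder names, and it assumes "subsp.", "var." and "f." are the forms the matcher expects.

```r
standardise_ranks <- function(names) {
  # Each pattern requires whitespace on both sides so that epithets
  # merely starting with these letters are left alone
  names <- gsub("\\s(ssp|subsp)\\.?\\s", " subsp. ", names)
  names <- gsub("\\s(v|var)\\.?\\s", " var. ", names)
  names <- gsub("\\s(f|form|forma)\\.?\\s", " f. ", names)
  names
}

cleaned_ranks <- standardise_ranks(c(
  "Banksia integrifolia ssp. integrifolia",
  "Banksia integrifolia subsp. integrifolia",
  "Aus bus v. cus",
  "Aus bus forma. cus"
))
```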

Include column for subclass in update_taxonomy output

Hello! I was wondering if it's possible to include a column in the output from update_taxonomy that gives a taxon's subclass, in addition to the column for family etc?
Subclass is quite useful e.g. when you are filtering a list for Magnoliidae only, to remove non-flowering plants.

I suspect that inserting "subclass" at line 323 in clean_names.R might achieve this, I will have a go at this and see if it works.

Non-unique aligned reason (match_06)

As I was writing the vignette, I noticed there are 2 match_06

match_06. Automatic alignment with synonymous term among accepted canonical names in APC (2023-07-20)
match_06. Automatic alignment with synonymous term among known canonical names APC (2023-07-20)

Can these be collapsed into one, or delineated as match_06A and match_06B to retain their nuances?

library(tidyverse)
library(janitor)
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test
library(ausflora) 

resources <- load_taxonomic_resources(stable_or_current_data = "stable")
#> Loading resources...
#> Warning: Error in curl::curl_fetch_memory(url, handle = handle): Timeout was reached: [hash-archive.carlboettiger.info] Operation timed out after 2002 milliseconds with 0 bytes received
#> ...done

aligned_gbif_taxa <- gbif_lite |> 
  tidyr::drop_na(species) |>  
  dplyr::pull(species) |> 
  align_taxa(resources = resources)
#> Checking alignments of 127 taxa
#>   -> 0 names already matched; 0 names checked but without a match; 121 taxa yet to be checked

aligned_gbif_taxa |> 
  pull(aligned_reason) |> 
  tabyl() |> 
  tibble()
#> # A tibble: 5 × 3
#>   `pull(aligned_gbif_taxa, aligned_reason)`                            n percent
#>   <chr>                                                            <int>   <dbl>
#> 1 match_06. Automatic alignment with synonymous term among accept…   112 0.926  
#> 2 match_06. Automatic alignment with synonymous term among known …     6 0.0496 
#> 3 match_08. Automatic alignment with synonymous name in APNI (202…     1 0.00826
#> 4 match_14. Automatic alignment with species-level canonical name…     1 0.00826
#> 5 match_20. Rewording name to be recognised as genus rank, with g…     1 0.00826

Created on 2023-07-20 with reprex v2.0.2

Cleaning s lat from names before matching

A few points in my data set didn't match because they had "s lat" in the species names, rather than "sensu lato" or "s.l.". Could this be added to the regular expressions for cleaning_names before matching?
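
Something along these lines might cover it. This is a sketch; `strip_sensu_lato()` is a hypothetical helper, "Aus bus" is a placeholder name, and the package's own cleaning expressions may be organised differently.

```r
strip_sensu_lato <- function(names) {
  # Remove "sensu lato", "s.l.", "s. lat." and "s lat" (case-insensitive)
  pattern <- "\\s+(sensu\\s+lato|s\\.?\\s*l\\.|s\\.?\\s+lat\\.?)(\\s|$)"
  cleaned <- gsub(pattern, " ", names, ignore.case = TRUE, perl = TRUE)
  # Collapse any doubled spaces and trim the ends
  trimws(gsub("\\s+", " ", cleaned))
}

stripped <- strip_sensu_lato(c(
  "Aus bus s lat", "Aus bus s.l.", "Aus bus sensu lato", "Aus bus"
))
```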

Need a new package name

Hi team,

Anne Fuchs (@afuchs1) advised "On a different topic, are you aware that the acronym of “AusFlora” is being used for the Flora of Australia https://ausflora.net/ and <www.ausflora.org.au> will this become confusing in the long term as the two products are somewhat different."

So suggest we find a new package name.

How about ausfloralign?

Suggestions welcome @wcornwell @rubysaltbush @ehwenk!

Use cases

I'll put some time into updating this package sometime soon. It would be helpful to better understand use cases. If anyone could document use cases for the package, that would be very helpful. Particularly if they're not met by current functionality.

Thoughts @wcornwell @rubysaltbush @eflower @ehwenk

Thanks!

naming of columns is confusing

specifically aligned_name versus canonical_name.

i agree with @yangsophieee point here

Option to pull up-to-date APC and APNI

this could be as simple as

read_csv("https://biodiversity.org.au/nsl/services/export/taxonCsv")

Add option to "turn off" fuzzy matching in align_taxa?

Hello!

Would it be possible to add an argument in align_taxa to set fuzzy matching to FALSE?

Perhaps it is an unusual use case but I'm trying to run apcnames on a list of species that includes both Australian and a large number of international taxa. I'm only interested in the Australian taxa, but as the APC is the best source of info on what taxa are found in Australia I thought apcnames might be a good way to separate Aus from international taxa. Unfortunately though the fuzzy matching in align_taxa coerced a large number of genera/species not found in Aus into adjacent Australian taxa.

I've figured out a workaround but thought perhaps it could be useful to have an argument to turn fuzzy matching in align_taxa on or off? Or could this be dealt with by changing the max_distance settings?

what should update_taxonomy("Acacia sp.",resources=resources) return?

Should "taxonomic_resources" be unloaded from the global environment when finished?

The update_taxonomy function loads a large list of 6 variables into the global environment while it's running. Is it possible to unload this taxonomic_resources list when the function finishes if it's no longer needed?
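
One alternative to the global environment is a private cache environment that can be cleared on request. This is a sketch of that general pattern, not the package's implementation; `get_resources()`, `clear_resources()` and `fake_loader()` are hypothetical names.

```r
# Private environment used as a cache instead of the global environment
.resource_cache <- new.env(parent = emptyenv())

get_resources <- function(loader) {
  # Load only on first use; later calls reuse the cached object
  if (is.null(.resource_cache$resources)) {
    .resource_cache$resources <- loader()
  }
  .resource_cache$resources
}

clear_resources <- function() {
  # Drop everything held in the cache so the memory can be reclaimed
  rm(list = ls(.resource_cache), envir = .resource_cache)
  invisible(NULL)
}

# Toy loader standing in for the real (slow) loading step
n_loads <- 0
fake_loader <- function() {
  n_loads <<- n_loads + 1
  list(APC = "toy")
}

r1 <- get_resources(fake_loader)
r2 <- get_resources(fake_loader)
```

Calling `clear_resources()` afterwards releases the cached list without touching the user's workspace.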

Speeding up loading tax resources

Have done some investigation on this and it looks like it's contentid::register which is slow

should be resolved once #25 and #34 are addressed, but opening this in case they are not.

`update_taxonomy` returns some duplicate rows

The update_taxonomy function returned duplicate rows for two species in my data set - both orchids matched via APNI (Caladenia tentaculata and Pterostylis aff. nana). I suspect this is because of problematic taxonomy for these taxa leading to duplicated records in APNI. The duplicate rows are identical in each column so I have removed them by calling dplyr::distinct on the output from update_taxonomy, not sure if this could/should be worked into the function or not.

reduce the size of APNI file?

If I understand right we're only using APNI when the name is not in APC, so APNI could be reduced to just the rows that don't duplicate APC?

which would take us from 132385 to 29059 rows
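
The filtering itself could be as simple as the sketch below. It uses toy rows and assumes canonicalName is the column to compare on; the real tables may need a different key.

```r
# Toy tables standing in for the APC and APNI data
apc <- data.frame(canonicalName = c("Banksia serrata", "Acacia longifolia"))
apni <- data.frame(
  canonicalName = c("Banksia serrata", "Acacia longifolia", "Dryandra preissii")
)

# Keep only APNI rows whose name does not already appear in APC
apni_only <- apni[!apni$canonicalName %in% apc$canonicalName, , drop = FALSE]
```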

testing at scale

going to just leave this in an issue for the moment

return authority in create lookup functions

invalid UTF-8 fail

only load APC/APNI files once per call

move loading to top level

Archive APC and APNI on Zenodo

Currently we're storing copies of data in GitHub releases but ideally we'd archive copies of the APC/APNI outputs on Zenodo. These files will then be drawn upon by the package.

I checked in with @afuchs1, who's part of the team preparing and publishing the data. They have been thinking along similar lines and are currently working towards this goal, so we can stay tuned for updates.

Discussion on roles on ausflora

I'd like to update the DESCRIPTION file at some point (not urgent!)

Leaving this here for reference

I would assume @ehwenk did all the heavy lifting for this package! 💪
Would Anna Monroe and Anne Fuchs like to be named on package as key players in data distribution 📊
Same with Carl with contentID 🗄️
and everyone wonderful found here

False response to native_anywhere_in_aus versus NA

When running the function ausflora::native_anywhere_in_australia(), inputting a mis-spelt taxon returns FALSE for the outcome native_anywhere_in_aus.

For example, ausflora::native_anywhere_in_australia("Banks").

Could this instead return, as an example, not in list or NA?
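
A sketch of the three-valued behaviour being asked for, using a toy lookup. `native_lookup_sketch()` and the `known` vector are hypothetical; the real function works from the APC distribution data.

```r
native_lookup_sketch <- function(names, known) {
  # `known` is a named logical vector: TRUE = native somewhere, FALSE = not native.
  # Indexing by a name that is absent from `known` yields NA rather than FALSE.
  unname(known[names])
}

# Toy lookup table
known <- c("Banksia serrata" = TRUE, "Eucalyptus deglupta" = FALSE)

status <- native_lookup_sketch(
  c("Banksia serrata", "Banks", "Eucalyptus deglupta"),
  known
)
```

Returning NA for "Banks" separates "not found in the list" from "found, but not native".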

dplyr::bind_rows error when running `update_taxonomy`

Hello!

Tried installing current Github version of ausflora to run on a new dataset, but got the following dplyr::bind_rows error when trying to run update_taxonomy

> updated_sisspec <- ausflora::update_taxonomy(aligned_sisspec$aligned_name)
Error in `dplyr::bind_rows()`:
! Can't combine `..1$source` <character> and `..2$source` <logical>.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/vctrs_error_incompatible_type>
Error in `dplyr::bind_rows()`:
! Can't combine `..1$source` <character> and `..2$source` <logical>.
---
Backtrace:
 1. ausflora::update_taxonomy(aligned_sisspec$aligned_name)
 5. dplyr::bind_rows(taxa_APC, taxa_APNI)
 8. vctrs::vec_rbind(!!!dots, .names_to = .id)

I'm not sure if this is a problem caused by a particular name in my list of names
sister_species.csv ?
I've tried removing the few non-Australian taxa that returned NAs for aligned_name but this did not fix it.

Create package documentation

Just creating this issue so I can tag my commits and PR.

Documentation we need:

package level {ausflora} usethis::use_package_doc
intro to ausflora vignette usethis::use_vignette
explainer on taxa matching algorithm usethis::use_article accessed via pkgdown website but not via R using vignette("ausflora"). Articles are suited for longer form documentation
explainer on how caching works in {ausflora}

NA in source from `align_taxa` but not in `update_taxonomy`

Source is NA in align_taxa but seems to return nicely in update_taxonomy

user_data <- tibble( my_species_names = c(
  "Eucalyptus regnans", "Acacia melanoxylon",
  "Banksia integrifolia", "Commersonia rosea",
  "Not a species"),other_trait_data = rnorm(5))

aligned<- align_taxa(user_data$my_species_names, resources = resources)

aligned |> select(source)

updated <- aligned$aligned_name |> update_taxonomy(resources = resources)

updated |> select(source)
