forestgeo / fgeo.tool

[R-package on CRAN] General purpose tools for ForestGEO Packages

Home Page: https://forestgeo.github.io/fgeo.tool

License: Other

R 99.81% Shell 0.19%

fgeo forestgeo tree ecology dynamics tools utils miscelaneas

fgeo.tool's Introduction

Import and manipulate ForestGEO data

fgeo.tool helps you to import and manipulate ForestGEO data.

Installation

Install the latest stable version of fgeo.tool from CRAN with:

install.packages("fgeo.tool")

Install the development version of fgeo.tool from GitHub with:

# install.packages("devtools")
devtools::install_github("forestgeo/fgeo.tool")

Or install all fgeo packages in one step.

Example

library(fgeo.tool)
#> 
#> Attaching package: 'fgeo.tool'
#> The following object is masked from 'package:stats':
#> 
#>     filter
# Helps access data for examples
library(fgeo.x)

example_path() allows you to access datasets stored in your R libraries.

example_path()
#>  [1] "csv"           "mixed_files"   "rdata"         "rdata_one"    
#>  [5] "rds"           "taxa.csv"      "tsv"           "vft_4quad.csv"
#>  [9] "view"          "weird"         "xl"

(vft_file <- example_path("view/vft_4quad.csv"))
#> [1] "/usr/local/lib/R/site-library/fgeo.x/extdata/view/vft_4quad.csv"

read_vft() and read_taxa() import a ViewFullTable and ViewTaxonomy from .tsv or .csv files.

read_vft(vft_file)
#> # A tibble: 500 × 32
#>     DBHID PlotName PlotID Family   Genus Speci…¹ Mnemo…² Subsp…³ Speci…⁴ Subsp…⁵
#>     <int> <chr>     <int> <chr>    <chr> <chr>   <chr>   <chr>     <int> <chr>  
#>  1 385164 luquillo      1 Rubiace… Psyc… brachi… PSYBRA  <NA>        185 <NA>   
#>  2 385261 luquillo      1 Urticac… Cecr… schreb… CECSCH  <NA>         74 <NA>   
#>  3 384600 luquillo      1 Rubiace… Psyc… brachi… PSYBRA  <NA>        185 <NA>   
#>  4 608789 luquillo      1 Rubiace… Psyc… berter… PSYBER  <NA>        184 <NA>   
#>  5 388579 luquillo      1 Arecace… Pres… acumin… PREMON  <NA>        182 <NA>   
#>  6 384626 luquillo      1 Araliac… Sche… moroto… SCHMOR  <NA>        196 <NA>   
#>  7 410958 luquillo      1 Rubiace… Psyc… brachi… PSYBRA  <NA>        185 <NA>   
#>  8 385102 luquillo      1 Piperac… Piper glabre… PIPGLA  <NA>        174 <NA>   
#>  9 353163 luquillo      1 Arecace… Pres… acumin… PREMON  <NA>        182 <NA>   
#> 10 481018 luquillo      1 Salicac… Case… arborea CASARB  <NA>         70 <NA>   
#> # … with 490 more rows, 22 more variables: QuadratName <chr>, QuadratID <int>,
#> #   PX <dbl>, PY <dbl>, QX <dbl>, QY <dbl>, TreeID <int>, Tag <chr>,
#> #   StemID <int>, StemNumber <int>, StemTag <int>, PrimaryStem <chr>,
#> #   CensusID <int>, PlotCensusNumber <int>, DBH <dbl>, HOM <dbl>,
#> #   ExactDate <date>, Date <int>, ListOfTSM <chr>, HighHOM <int>,
#> #   LargeStem <chr>, Status <chr>, and abbreviated variable names ¹SpeciesName,
#> #   ²Mnemonic, ³Subspecies, ⁴SpeciesID, ⁵SubspeciesID
#> # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

pick_dbh_under(), drop_status() and friends pick and drop rows from a ForestGEO ViewFullTable or census table.

tree5 <- fgeo.x::tree5

tree5 %>% 
  pick_dbh_under(100)
#> # A tibble: 18 × 19
#>    treeID stemID tag    StemTag sp     quadrat    gx    gy Measu…¹ Censu…²   dbh
#>     <int>  <int> <chr>  <chr>   <chr>  <chr>   <dbl> <dbl>   <int>   <int> <dbl>
#>  1   7624 160987 108958 175325  TRIPAL 722     139.  425.   486675       5  10.9
#>  2  19930 117849 123493 165576  CASARB 425      61.3 496.   471979       5  23.6
#>  3  31702  39793 22889  22889   SLOBER 304      53.8  73.8  447307       5  67  
#>  4  35355  44026 27538  27538   SLOBER 1106    203.  110.   449169       5  50  
#>  5  39705  48888 33371  33370   CASSYL 1010    184.  194.   451067       5  67  
#>  6  57380 155867 66962  171649  SLOBER 1414    274.  279.   459427       5  16.6
#>  7  95656 129113 131519 131519  OCOLEU 402      79.7  22.8  474157       5  23.6
#>  8  96051 129565 132348 132348  HIRRUG 1403    278    40.6  474523       5  12.9
#>  9  96963 130553 134707 134707  TETBAL 610     114.  182.   475236       5  18.6
#> 10 115310 150789 165286 165286  MANBID 225      24.0 497.   483175       5  14.6
#> 11 121424 158579 170701 170701  CASSYL 811     146.  218.   484785       5  20.2
#> 12 121689 158871 171277 171277  INGLAU 515      84.2 285.   485077       5  13.4
#> 13 121953 159139 171809 171809  PSYBRA 1318    247.  354.   485345       5  14  
#> 14 124522 162698 174224 174224  CASSYL 1411    279.  210.   488386       5  13.1
#> 15 125038 163236 175335 175335  CASSYL 822     153.  426.   488924       5  14.5
#> 16 126087     NA 177394 <NA>    CASARB 521      89.8 408.       NA      NA  NA  
#> 17 126803     NA 178513 <NA>    PSYBER 622     113.  426        NA      NA  NA  
#> 18 126934     NA 178763 <NA>    MICRAC 324      47   480.       NA      NA  NA  
#> # … with 8 more variables: pom <chr>, hom <dbl>, ExactDate <date>,
#> #   DFstatus <chr>, codes <chr>, nostems <dbl>, status <chr>, date <dbl>, and
#> #   abbreviated variable names ¹MeasureID, ²CensusID
#> # ℹ Use `colnames()` to see all variable names

pick_main_stem() and pick_main_stemid() pick the main stem or main stemid(s) of each tree in each census.

stem <- download_data("luquillo_stem6_random")

dim(stem)
#> [1] 1320   19
dim(pick_main_stem(stem))
#> Warning: The `add` argument of `group_by()` is deprecated as of dplyr 1.0.0.
#> Please use the `.add` argument instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
#> [1] 1000   19

add_status_tree() adds the column status_tree based on the status of all stems of each tree.

stem %>% 
  select(CensusID, treeID, stemID, status) %>% 
  add_status_tree()
#> # A tibble: 1,320 × 5
#>    CensusID treeID stemID status status_tree
#>       <int>  <int>  <int> <chr>  <chr>      
#>  1        6    104    143 A      A          
#>  2        6    119    158 A      A          
#>  3       NA    180    222 G      A          
#>  4       NA    180    223 G      A          
#>  5        6    180    224 G      A          
#>  6        6    180    225 A      A          
#>  7        6    602    736 A      A          
#>  8        6    631    775 A      A          
#>  9        6    647    793 A      A          
#> 10        6   1086   1339 A      A          
#> # … with 1,310 more rows
#> # ℹ Use `print(n = ...)` to see more rows

add_index() and friends add columns to a ForestGEO-like dataframe.

stem %>% 
  select(gx, gy) %>% 
  add_index()
#> Guessing: plotdim = c(320, 500)
#> * If guess is wrong, provide the correct argument `plotdim`
#> # A tibble: 1,320 × 3
#>       gx    gy index
#>    <dbl> <dbl> <dbl>
#>  1  10.3  245.    13
#>  2 183.   410.   246
#>  3 165.   410.   221
#>  4 165.   410.   221
#>  5 165.   410.   221
#>  6 165.   410.   221
#>  7 149.   414.   196
#>  8  38.3  245.    38
#>  9 143.   411.   196
#> 10  68.9  253.    88
#> # … with 1,310 more rows
#> # ℹ Use `print(n = ...)` to see more rows

Get started with fgeo

fgeo.tool's People

Forkers

ayushranjan1 overstreeth sainirock61 helixcn fdbesanto2 jimhester

fgeo.tool's Issues

Improve by_group() (or groupwise())

# I want to make any function work with grouped data. I propose
# by_group(.data, .f, ...). Like map(), it applies .f() repeatedly, but to each
# group of .data rather than to each column, and to all of .data if it is
# ungrouped. Aliases: grouply(), groupwise().

# History of this feature, including the upcoming dplyr::nest_by():
# https://community.rstudio.com/t/is-nest-mutate-map-unnest-really-the-best-alternative-to-dplyr-do/11009/7?u=mauro_lepore

library(tidyverse)

# E.g.
# nest_by_groups(mtcars)
# nest_by_groups(group_by(mtcars, cyl))
nest_by_groups <- function(.data) {
  g <- group_vars(.data)
  .data %>% 
    tibble::as.tibble() %>% 
    tibble::add_column(.nest_id = dplyr::group_indices(.)) %>% 
    dplyr::ungroup() %>%
    tidyr::nest(-.nest_id)
}

by_group <- function(.data, .f, ...) {
  .data %>% 
    nest_by_groups() %>% 
    mutate(data = map(.data$data, .f, ...)) %>% 
    tidyr::unnest() %>% 
    dplyr::select(-.nest_id)
}

# E.g. 
first_row <- function(.x, to_chr = FALSE) {
  first <- .x[1, ]
  if (to_chr) {
    first[] <- lapply(first, as.character)
  }
  
  tibble::as.tibble(first)
}
mtcars %>% 
  by_group(first_row, to_chr = TRUE)
#> # A tibble: 1 x 11
#>   mpg   cyl   disp  hp    drat  wt    qsec  vs    am    gear  carb 
#>   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 21    6     160   110   3.9   2.62  16.46 0     1     4     4

mtcars %>% 
  group_by(cyl) %>% 
  by_group(first_row, to_chr = TRUE)
#> # A tibble: 3 x 11
#>   mpg   cyl   disp  hp    drat  wt    qsec  vs    am    gear  carb 
#>   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 21    6     160   110   3.9   2.62  16.46 0     1     4     4    
#> 2 22.8  4     108   93    3.85  2.32  18.61 1     1     4     1    
#> 3 18.7  8     360   175   3.15  3.44  17.02 0     0     3     2

Created on 2018-07-16 by the reprex package (v0.2.0.9000).
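For comparison, here is a minimal base-R sketch of the same idea. It assumes the grouping columns are passed explicitly as a character vector (the hypothetical `groups` argument) instead of being read from a grouped tibble, and that `.f` returns a data frame. In current dplyr, `group_modify()` covers much of this use case.

```r
# Hypothetical helper, not part of fgeo.tool: apply .f() to each group of rows
# defined by the columns named in `groups`, or to all rows if `groups` is empty.
by_group_base <- function(.data, groups = character(0), .f, ...) {
  if (length(groups) == 0) {
    return(.f(.data, ...))
  }
  # One data frame per combination of group values; empty combinations dropped
  pieces <- split(.data, .data[groups], drop = TRUE)
  results <- lapply(pieces, .f, ...)
  out <- do.call(rbind, results)
  rownames(out) <- NULL
  out
}

first_row <- function(.x) .x[1, , drop = FALSE]

by_group_base(mtcars, .f = first_row)         # one row
by_group_base(mtcars, "cyl", .f = first_row)  # one row per value of cyl
```

Groups come back in sorted order of their values (cyl 4, 6, 8), which may differ from the order produced by the nest-based version.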

Create tree table from stem table

Pull the code out of rtbl?

Rewrite? E.g.

pick_stem_max_dbh <- function(.data) {
  # Allow input as ViewFullTable, stem and tree table
  old <- names(.data)
  names(.data) <- tolower(old)
  fgeo.base::check_crucial_names(.data, c("treeid", "stemid", "dbh"))
  
  # TODO: Check for unique censusid or group by census id 
  
  # TODO:
  # if (tree_already) {return(stats::setNames(.data, old))}
  
  .data <- tibble::rowid_to_column(.data)
  .data <- dplyr::group_by(.data, .data$treeid)
  .data <- dplyr::arrange(.data, dplyr::desc(.data$dbh))
  .data <- dplyr::ungroup(.data)
  
  out <- dplyr::distinct(.data, .data$treeid, .keep_all = TRUE)
  # Recover original order
  out <- dplyr::select(dplyr::arrange(out, .data$rowid), -.data$rowid)
  
  stats::setNames(out, old)
}

df <- tibble::tibble(
  treeID = c(1, 1, 1, 2, 2, 2),
  stemID = letters[c(1, 2, 3, 1, 2, 3)],
  dbh = c(1, 2, NA, 4, 5, NA)
)
df
#> # A tibble: 6 x 3
#>   treeID stemID   dbh
#>    <dbl> <chr>  <dbl>
#> 1      1 a          1
#> 2      1 b          2
#> 3      1 c         NA
#> 4      2 a          4
#> 5      2 b          5
#> 6      2 c         NA

pick_stem_max_dbh(df)
#> # A tibble: 2 x 3
#>   treeID stemID   dbh
#>    <dbl> <chr>  <dbl>
#> 1      1 b          2
#> 2      2 b          5

Created on 2018-06-28 by the reprex package (v0.2.0).
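The same selection can be sketched in base R without grouping: sort by tree and decreasing dbh, keep the first row of each tree, then restore the original row order. This is a hypothetical variant, not the rewrite proposed above, and it assumes the columns are named exactly `treeID` and `dbh`.

```r
# Hypothetical base-R variant of pick_stem_max_dbh(); assumes columns named
# exactly `treeID` and `dbh`.
pick_stem_max_dbh_base <- function(.data) {
  # Sort by tree, then by decreasing dbh; NA dbh sorts last within each tree
  ord <- order(.data$treeID, -.data$dbh)
  # Keep the first (largest-dbh) row of each tree
  keep <- ord[!duplicated(.data$treeID[ord])]
  # Return the kept rows in their original order
  .data[sort(keep), , drop = FALSE]
}

df <- data.frame(
  treeID = c(1, 1, 1, 2, 2, 2),
  stemID = letters[c(1, 2, 3, 1, 2, 3)],
  dbh = c(1, 2, NA, 4, 5, NA)
)
pick_stem_max_dbh_base(df)
```

As in the reprex, stem `b` is kept for both trees.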

In add_var() don't change original names.

In csv_to_df(), check that the data frames can be row-bound.
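One way to run that check before binding, sketched in base R: compare the column names and column classes of every data frame, ignoring column order. The helper name `check_bindable()` is hypothetical.

```r
# Hypothetical helper: stop unless every data frame in `dfs` has the same
# column names and column classes as the first one (column order is ignored).
check_bindable <- function(dfs) {
  signature <- function(x) {
    x <- x[order(names(x))]
    vapply(x, function(col) class(col)[1], character(1))
  }
  sigs <- lapply(dfs, signature)
  same <- vapply(sigs, identical, logical(1), sigs[[1]])
  if (!all(same)) {
    stop("These data frames don't match the first one: ",
         paste(which(!same), collapse = ", "))
  }
  invisible(dfs)
}

ok <- list(data.frame(a = 1, b = "x"), data.frame(b = "y", a = 2))
check_bindable(ok)
```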

This may be removed via #61

Update fgeo_pkgs in fgeo.tool::fgeo_package_deps()

thank @rick_pack2

# Example data reconstructed from the printed output below
df <- data.frame(x = c(NA, 1), y = c(2, NA))
replace_all_na <- function(x, filler = 0) {
  replace(x, is.na(x), filler)
}
replace_all_na(df, 0)
#>   x y
#> 1 0 2
#> 2 1 0
replace_all_na(df, "missing")
#>         x       y
#> 1 missing       2
#> 2       1 missing

Write helpers to categorize data

From @maurolepore on August 31, 2017 16:19

Gabriel proposed developing a friendly way to categorize (cut) numeric variables. Important ones include:

from dbh to size category.
from gx and gy to quadrat: see add_quad().
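As a starting point for the dbh-to-size-category helper, base R's cut() already does the binning. In this sketch the breaks (0, 100 and 300) and the class labels are placeholders, not ForestGEO conventions.

```r
# Hypothetical size classes; breaks and labels are placeholders.
# right = FALSE makes each class include its lower bound, e.g. [100, 300).
dbh <- c(5, 10, 99.9, 100, 350, NA)
size <- cut(
  dbh,
  breaks = c(0, 100, 300, Inf),
  labels = c("small", "medium", "large"),
  right = FALSE
)
data.frame(dbh, size)
```

Missing dbh values stay `NA` in the resulting factor.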

suppressPackageStartupMessages(library(fgeo))
x <- tibble::tibble(gx = c(0, 50, 999.9, 1000), gy = gx/2)
add_quad(x)
#> Gessing: plotdim = c(1000, 500)
#>   * If guess is wrong, provide the correct argument `plotdim`
#> # A tibble: 4 x 3
#>      gx    gy quad 
#>   <dbl> <dbl> <chr>
#> 1    0     0  0101 
#> 2   50    25  0302 
#> 3 1000.  500. 5025 
#> 4 1000   500  NANA
add_quad(x, start = 0)
#> Gessing: plotdim = c(1000, 500)
#>   * If guess is wrong, provide the correct argument `plotdim`
#> # A tibble: 4 x 3
#>      gx    gy quad 
#>   <dbl> <dbl> <chr>
#> 1    0     0  0000 
#> 2   50    25  0201 
#> 3 1000.  500. 4924 
#> 4 1000   500  NANA

Created on 2018-07-02 by the reprex package (v0.2.0).

Copied from original issue: forestgeo/fgeo.abundance#46
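The quadrat codes in the add_quad() output above are consistent with 20 x 20 m quadrats: column = floor(gx / 20) + start and row = floor(gy / 20) + start, each padded to two digits. A minimal base-R sketch of that arithmetic follows. The function name and the `gridsize = 20` default are assumptions, and unlike add_quad() it returns `NA` rather than `"NANA"` for points on or beyond the plot edge.

```r
# Hypothetical sketch, not fgeo.tool::add_quad(): assumes square quadrats of
# side `gridsize` and a known plot size `plotdim` = c(x, y).
gxgy_to_quad <- function(gx, gy, plotdim = c(1000, 500), gridsize = 20,
                         start = 1) {
  outside <- gx < 0 | gy < 0 | gx >= plotdim[1] | gy >= plotdim[2]
  col <- floor(gx / gridsize) + start
  row <- floor(gy / gridsize) + start
  quad <- sprintf("%02d%02d", as.integer(col), as.integer(row))
  quad[outside] <- NA_character_
  quad
}

gx <- c(0, 50, 999.9, 1000)
gxgy_to_quad(gx, gx / 2)
gxgy_to_quad(gx, gx / 2, start = 0)
```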

In fgeo.tool::add_status_tree() check for TreeID instead of tag

First understand why there may be more than one tag per TreeID (forestgeo/fgeo.data#24).

Suzanne said it was an error.

Consider removing keep_alive_stem() et al

Can I replace it with fgeo.base::keep_*()?

Rename topic filter_status to drop_dead for discovery with fgeo_help()

select2()

select2 <- function(.data, ...) {
  dots <- rlang::list2(...)
  select(.data, !!!text_exprs(dots))
}

text_exprs <- function(...) {
  rlang::parse_exprs(semicolon(...))
}

semicolon <- function(...) {
  paste0(..., collapse = "; ")
}

library(tidyverse)

mtcars <- as.tibble(mtcars)

text_vars <- c("mpg", "am")
select2(mtcars, text_vars)
#> # A tibble: 32 x 2
#>      mpg    am
#>  * <dbl> <dbl>
#>  1  21       1
#>  2  21       1
#>  3  22.8     1
#>  4  21.4     0
#>  5  18.7     0
#>  6  18.1     0
#>  7  14.3     0
#>  8  24.4     0
#>  9  22.8     0
#> 10  19.2     0
#> # ... with 22 more rows

# If multiple strings, separate vars with comma
text_vars <- c("mpg", "am", "carb")
select2(mtcars, text_vars)
#> # A tibble: 32 x 3
#>      mpg    am  carb
#>  * <dbl> <dbl> <dbl>
#>  1  21       1     4
#>  2  21       1     4
#>  3  22.8     1     1
#>  4  21.4     0     1
#>  5  18.7     0     2
#>  6  18.1     0     1
#>  7  14.3     0     4
#>  8  24.4     0     2
#>  9  22.8     0     2
#> 10  19.2     0     4
#> # ... with 22 more rows

# If single string, separate with semicolon
text_vars <- c("mpg; am; carb")
select2(mtcars, text_vars)
#> # A tibble: 32 x 3
#>      mpg    am  carb
#>  * <dbl> <dbl> <dbl>
#>  1  21       1     4
#>  2  21       1     4
#>  3  22.8     1     1
#>  4  21.4     0     1
#>  5  18.7     0     2
#>  6  18.1     0     1
#>  7  14.3     0     4
#>  8  24.4     0     2
#>  9  22.8     0     2
#> 10  19.2     0     4
#> # ... with 22 more rows

# With a single string you can do everything you can do with bare expressions
text <- 'mpg:cyl; matches("df"); everything()'
select2(mtcars, text)
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>  * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ... with 22 more rows
# Same
library(rlang)
#> 
#> Attaching package: 'rlang'
#> The following objects are masked from 'package:purrr':
#> 
#>     %@%, %||%, as_function, flatten, flatten_chr, flatten_dbl,
#>     flatten_int, flatten_lgl, invoke, list_along, modify, prepend,
#>     rep_along, splice
bare <- exprs(mpg:cyl, matches("df"), everything())
select(mtcars, !!!bare)
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>  * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ... with 22 more rows

Created on 2018-04-29 by the reprex package (v0.2.0).

Add to Family: filters of `tree_status`: Rename rm_dead_twice

To make it consistent with:

not_dead_twice(x)
tree_not_dead_twice(x)

tree_status(x, status = "not_dead_twice")

reexport the most important dplyr verbs

reexport the main dplyr verbs

Consider reexporting the main verbs and common tools.

pro:

everything is available with library(fgeo)
con:
conflicts will show up when loading tidyverse
makes it harder to discover the tidyverse

main verbs

filter
select
arrange
summarize
mutate

also

group_by
tibble
tribble
as_tibble
count
add_count

Pull out to fgeo.dvlp the family of functions for developers

Rename restructure_elev to reflect uncertainty

This relates to #16

Consider:
restructure_if_need()
restructure_if_should()
restructure_if_needed()

may_restructure()
restructure_maybe()
restructure_possibly()

Write a function to identify and then homogenize fgeo tables

homogenize may have a method for objects of class elevation.

Fix byyr_abundance() to count only one individual per Tag, not per StemID

Here the output should be 1, not 2.

library(fgeo)
#> -- Attaching packages -------------------------------------------- fgeo 0.0.0.9000 --
#> v bciex           0.0.0.9000     v fgeo.demography 0.0.0.9000
#> v fgeo.abundance  0.0.0.9004     v fgeo.habitat    0.0.0.9006
#> v fgeo.base       0.0.0.9001     v fgeo.map        0.0.0.9204
#> v fgeo.data       0.0.0.9002     v fgeo.tool       0.0.0.9003
#> 

vft <- data.frame(
  StemID = c("1", "2"),
  Tag = c("0001", "0001"),
  PlotName = "p",
  Status = c("alive", "alive"),
  DBH = c(10, 100),
  ExactDate = c("2000-01-01", "2000-01-01"),
  PlotCensusNumber = c(1, 1),
  CensusID = c(1, 1),
  Genus = c("A", "A"),
  SpeciesName = c("a", "a"),
  Family = "f",
  stringsAsFactors = FALSE
)
vft
#>   StemID  Tag PlotName Status DBH  ExactDate PlotCensusNumber CensusID
#> 1      1 0001        p  alive  10 2000-01-01                1        1
#> 2      2 0001        p  alive 100 2000-01-01                1        1
#>   Genus SpeciesName Family
#> 1     A           a      f
#> 2     A           a      f

# First pick the data you want
pick1 <- pick_plotname(vft, "p")
#> Using: p.

pick2 <- drop_dead_trees_by_cns(pick1)
#> Calculating tree-status (from stem `Status`) by `PlotCensusNumber`.
#> Warning: No observation has .status = dead
#>   * Detected values: alive
#> Dropping rows where `Status = dead`.
#> Warning: No observation has .status = dead
#>   * Detected values: alive
pick3 <- pick_dbh_min(pick2, 10)
pick3
#> # A tibble: 2 x 12
#>   StemID Tag   PlotName Status   DBH ExactDate  PlotCensusNumber CensusID
#>   <chr>  <chr> <chr>    <chr>  <dbl> <chr>                 <dbl>    <dbl>
#> 1 1      0001  p        alive     10 2000-01-01                1        1
#> 2 2      0001  p        alive    100 2000-01-01                1        1
#> # ... with 4 more variables: Genus <chr>, SpeciesName <chr>, Family <chr>,
#> #   status_tree <chr>

byyr_abundance(pick3)
#> # A tibble: 1 x 3
#>   species Family `2000`
#>   <chr>   <chr>   <dbl>
#> 1 A a     f           2

Created on 2018-06-20 by the reprex package (v0.2.0).
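The intended count (one individual per Tag) can be sketched in base R by reducing the data to unique species-year-Tag combinations before counting. This is a hypothetical sketch, not byyr_abundance() itself; it assumes `ExactDate` is parseable by as.Date() and builds the species name by pasting `Genus` and `SpeciesName`.

```r
# Hypothetical sketch, not byyr_abundance(): count distinct Tags (trees) per
# species, family and year, so that multi-stem trees count once.
abundance_by_year <- function(vft) {
  vft$year <- format(as.Date(vft$ExactDate), "%Y")
  vft$species <- paste(vft$Genus, vft$SpeciesName)
  # One row per tree per species per year
  trees <- unique(vft[c("species", "Family", "year", "Tag")])
  aggregate(
    list(n = trees$Tag),
    by = trees[c("species", "Family", "year")],
    FUN = length
  )
}

vft <- data.frame(
  StemID = c("1", "2"),
  Tag = c("0001", "0001"),
  Genus = c("A", "A"),
  SpeciesName = c("a", "a"),
  Family = "f",
  ExactDate = c("2000-01-01", "2000-01-01"),
  stringsAsFactors = FALSE
)
abundance_by_year(vft)
```

With the two stems of tag `0001`, `n` is 1 for year 2000.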

Write function to read database output

Wrap as read_vft()

(Similar to read_taxa()).

readr::read_tsv(
  file, 
  col_types = fgeo.tool::type_taxa(), 
  na = c("", "NA", "NULL")
)

to_wide() and to_long() may be useful generics

It would spread or gather whatever makes sense for each fgeo class.

Add add_qxqy()

New data structure has QX and QY variables
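If QX and QY are coordinates within each quadrat, they can be derived from the plot-level coordinates with the modulo operator. This is a hypothetical sketch: it assumes square quadrats of side `gridsize` (20 by default) aligned with the plot origin, and the name `add_qxqy_base()` is made up for illustration.

```r
# Hypothetical sketch: within-quadrat coordinates from plot-level gx/gy,
# assuming square quadrats of side `gridsize` aligned with the plot origin.
add_qxqy_base <- function(.data, gridsize = 20) {
  .data$QX <- .data$gx %% gridsize
  .data$QY <- .data$gy %% gridsize
  .data
}

x <- data.frame(gx = c(0, 25, 45.5), gy = c(10, 40, 59))
add_qxqy_base(x)
```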

map_tag() fails with numeric tags

Clean flag_multiple() and multiple_var().

DRY. Some functions seem to duplicate functions in fgeo.tool.

flag_multiple() is being used but flag_if() and multiple_var() are not. Check whether either is a better alternative to flag_multiple() and similar functions in fgeo.tool. Then clean up the rest.

Handle lowering and restoring names via attributes

Attach old names to the object in question, via attributes.
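A minimal sketch of the idea, assuming a hypothetical pair of helpers, `lower_names()` and `restore_names()`, and an attribute called `names_old`. Custom attributes are not guaranteed to survive every data-frame operation, so check this with the verbs you use.

```r
# Hypothetical helpers: lowercase names, keeping the originals in an attribute
lower_names <- function(x) {
  old <- names(x)
  names(x) <- tolower(old)
  attr(x, "names_old") <- old
  x
}

# Restore the original names, if they were stored, and drop the attribute
restore_names <- function(x) {
  old <- attr(x, "names_old")
  if (is.null(old)) {
    return(x)
  }
  stopifnot(length(old) == length(names(x)))
  names(x) <- old
  attr(x, "names_old") <- NULL
  x
}

df <- data.frame(TreeID = 1, DBH = 2)
low <- lower_names(df)
names(low)
names(restore_names(low))
```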

FastField Forms spreadsheet export - missing required sheets


Brief description of the problem
When using FastField Forms for a re-census, we have had submissions missing one or more of the required sheets: 'new_secondary_stems', 'recruits', and 'original_stems'. If no recruits are recorded in a quadrat, no 'recruits' sheet is exported. Likewise, if a recruit is found after submission and added by itself to a new file, that file lacks the 'new_secondary_stems' and 'original_stems' sheets. This breaks the code, and the files cannot be compiled. Is there a solution to this, or should I reach out to FastField Forms or add the sheets ourselves?

> ######## Compile the separate worksheets from the quadrat workbooks into new excel files

> files.directory <- "C:/Users/shuej/Dropbox (Smithsonian)/Field_Form_Data_Entry/HF/uploads"

> files.export <- "C:/Users/shuej/Dropbox (Smithsonian)/Field_Form_Data_Entry/HF/exports/test"
> xl_sheets_to_xl(files.directory, files.export, first_census = FALSE)
Error: Data should contain these sheets:
original_stems, new_secondary_stems, recruits, root
* Missing sheets: recruits
In addition: Warning messages:
1: `new_secondary_stems` has cero rows. 
2: Filling every cero-row dataframe with NAs (new_secondary_stems). 
3: `new_secondary_stems` has cero rows. 
4: Filling every cero-row dataframe with NAs (new_secondary_stems).
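One possible workaround is to add an empty data frame for every required sheet that a workbook lacks, before compiling. The warning "Adding missing sheets" in the next issue suggests a later version of the package does something similar; the sketch below is hypothetical and is not the package's code.

```r
# Hypothetical workaround: add an empty data frame for each required sheet
# missing from a named list of sheets read from one workbook.
add_missing_sheets <- function(sheets, required) {
  missing <- setdiff(required, names(sheets))
  if (length(missing) > 0) {
    warning("Adding missing sheets: ", paste(missing, collapse = ", "))
    sheets[missing] <- rep(list(data.frame()), length(missing))
  }
  sheets
}

required <- c("original_stems", "new_secondary_stems", "recruits", "root")
sheets <- list(original_stems = data.frame(tag = 1), root = data.frame(id = 1))
str(suppressWarnings(add_missing_sheets(sheets, required)))
```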

error with xlff_to_dfs - repeating sheet name

After running the FastField Forms files through the xlff_to_xl function, we attempted to run the new xlff_to_dfs function. It failed with the following error:

> first_week_hf <- xlff_to_dfs("C:/Users/shuej/Dropbox (Smithsonian)/Field_Form_Data_Entry/HF/exports/test")
Error: Column `sheet` must have a unique name
In addition: Warning message:
Adding missing sheets: original_stems, new_secondary_stems, recruits, [root.]

The name of the sheet for each stem repeats multiple times depending upon which sheet the stem originated from.

Document abundance_tree() rather than count_distinc(), or document both separately

I want to index abundance.

In fill_na(), add a "See also" reference to dplyr::coalesce()

In restructure_elev(), acknowledge David

Write a function that stores attribute names_old and works with dplyr

Fix add_vars() to work with PX, PY. Don't use old data as a model.

Add nms_*() functions to handle names.

generalize warn_duplicated_treeid() to flag_duplicated_var()

generalize warn_duplicated_treeid to flag_duplicated_var(flag) (function factory):
- flag can be message(), warning(), stop()
- var could be any .data$var, maybe use tidyeval
- var is used to group by and to summarize
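A minimal base-R sketch of the proposed function factory: `flag` is any of message(), warning() or stop(), `var` is the column checked for duplicates, and the hypothetical `by` argument names grouping columns within which duplicates are checked. Passing column names as strings instead of using tidyeval is a simplification.

```r
# Hypothetical sketch of flag_duplicated_var(): returns a checker that calls
# `flag` if `var` has duplicated values (within groups defined by `by`).
flag_duplicated_var <- function(flag = warning) {
  force(flag)
  function(.data, var, by = NULL) {
    is_dup <- duplicated(.data[c(by, var)])
    if (any(is_dup)) {
      dups <- unique(.data[[var]][is_dup])
      flag("Duplicated `", var, "`: ", paste(dups, collapse = ", "))
    }
    invisible(.data)
  }
}

warn_duplicated_treeid <- flag_duplicated_var(warning)
df <- data.frame(treeID = c(1, 1, 2), CensusID = c(1, 2, 1))
warn_duplicated_treeid(df, "treeID")                   # warns: treeID 1
warn_duplicated_treeid(df, "treeID", by = "CensusID")  # silent
```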

Update the variables dictionary: create a Google worksheet with the names of the different fgeo tables

Get the names of all tables
Import the available variable's definitions
Merge the available definitions with the names of each table
Push all data into a Google worksheet data_dictionary
Ask database team to fill the gaps
Create an app to quickly inspect variable names? Or it could just be the google worksheet

Identify each table by its names. Check the difference between names to understand what identifies each table uniquely
Check what names intersect
What variables are the same, regardless of their names? See the definitions in the CFTS package
What variables are not the same
What variables matter -- the crucial names of all functions I have developed (search for crucial_names())

...
fgeo_classify(x)
If (names of x match vft) as.vft(x)
If ... as.stem(x)

vft <- function(x, ...) structure(x, class = c("vft", tibble?, "data.frame"))

as.fgeo
as_fgeo?

homo_nms.vft
homo_nms.stem
...
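The classify-then-dispatch idea in these notes can be sketched with S3 classes. Everything here is hypothetical: the "crucial" names used to recognize each table are assumptions taken from column names shown earlier on this page, and both homo_nms() methods merely lowercase names as a placeholder for table-specific homogenization.

```r
# Hypothetical sketch: tag a table with a class based on its names
fgeo_classify <- function(x) {
  if (all(c("Tag", "DBH", "Status", "PlotCensusNumber") %in% names(x))) {
    class(x) <- c("vft", class(x))
  } else if (all(c("treeID", "stemID", "dbh", "status") %in% names(x))) {
    class(x) <- c("stem", class(x))
  }
  x
}

# Generic with one method per table class (placeholder behavior)
homo_nms <- function(x, ...) UseMethod("homo_nms")
homo_nms.default <- function(x, ...) x
homo_nms.vft <- function(x, ...) {
  names(x) <- tolower(names(x))
  x
}
homo_nms.stem <- homo_nms.vft

v <- fgeo_classify(
  data.frame(Tag = "1", DBH = 10, Status = "alive", PlotCensusNumber = 1)
)
class(v)
names(homo_nms(v))
```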

Remove keep_alive_stem() and keep_alive_tree()

See fgeo.base::keep_status() and friends.

fill_na()

Rename replace_all_na() to fill_na(), which is more specific. "Replace" is a general term from the implementation domain (it is inspired by replace()), while fill_na() belongs to the problem domain.

Consider adding a method for lists.

fill_na <- function(x, filler) {
  x[is.na(x)] <- filler
  x
}
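One way to add a list method is to turn fill_na() into an S3 generic whose list method recurses into each element. This is a sketch under that assumption. Data frames carry the class "data.frame", so they dispatch to the default method, where `x[is.na(x)] <- filler` already works.

```r
# Hypothetical S3 version of fill_na() with a recursive method for lists
fill_na <- function(x, filler) UseMethod("fill_na")

fill_na.default <- function(x, filler) {
  x[is.na(x)] <- filler
  x
}

# Plain lists: fill each element, recursing into nested lists
fill_na.list <- function(x, filler) {
  lapply(x, fill_na, filler = filler)
}

l <- list(a = c(1, NA), b = list(c = NA, d = "x"))
str(fill_na(l, 0))
```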

Diversity

E.g. from qo00 in maurolepore/hrv

diversity_by_sample <- wrangle() %>%
  dplyr::group_by(sample_name) %>%
  dplyr::mutate(
    shannon = vegan::diversity(abun_pcnt, "shannon", MARGIN = 2),
    invsimpson = vegan::diversity(abun_pcnt, "invsimpson", MARGIN = 2),
    simpson = vegan::diversity(abun_pcnt, "simpson", MARGIN = 2),
    richness = length(sample_name)
  ) %>%
  dplyr::ungroup() %>%
  dplyr::select(
    time_interval, collected_from, sample_name, species, shannon:richness
  ) %>%
  # Keep only one row per sample -- not one per sample per species
  dplyr::select(-species) %>%
  unique() %>%
  # Transform to long format to allow facetting by metric(key)
  tidyr::gather(key = metric, value = value, shannon:richness)
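The indices above can also be computed without vegan. With p_i the proportion of species i in a sample, Shannon is -sum(p_i log p_i), Simpson is 1 - sum(p_i^2) and inverse Simpson is 1 / sum(p_i^2). The sketch below is hypothetical; note that its `richness` counts species with non-zero abundance, whereas the pipeline above counts rows per sample.

```r
# Hypothetical base-R sketch: diversity indices for one sample's abundances
diversity_indices <- function(abundance) {
  abundance <- abundance[abundance > 0]
  p <- abundance / sum(abundance)
  d <- sum(p^2)
  c(
    shannon = -sum(p * log(p)),
    simpson = 1 - d,
    invsimpson = 1 / d,
    richness = length(p)
  )
}

samples <- data.frame(
  sample_name = c("s1", "s1", "s2", "s2", "s2"),
  abundance = c(5, 5, 1, 1, 2)
)
# One row per sample, one column per index
t(sapply(split(samples$abundance, samples$sample_name), diversity_indices))
```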

tree_status(x, status = "not_dead")
tree_not_dead(x)
tree_status(x, status = "*")
tree_*(x)

stem_status(x, status = "not_dead")
stem_not_dead(x)
stem_status(x, status = "*")
stem_*(x)

as_fgeo(): wrap rename() to provide an API of the form as_fgeo(expect_nm1 = NULL, expect_nm2 = NULL, ...)

It could output a message with a table showing which renames were made.

  old_name      new_name
-----------------------------
"old name"    "new name"
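A base-R sketch of that API, under these assumptions: each argument is named after the expected fgeo name, its value is the current column name (or NULL to skip it), and the report is printed with message(). This is a hypothetical illustration, not an existing function.

```r
# Hypothetical sketch of as_fgeo(): rename columns and report the changes
as_fgeo <- function(.data, ...) {
  renames <- unlist(list(...))  # names = expected names; NULL entries dropped
  if (length(renames) == 0) {
    return(.data)
  }
  missing <- setdiff(renames, names(.data))
  if (length(missing) > 0) {
    stop("Can't find these columns: ", paste(missing, collapse = ", "))
  }
  names(.data)[match(renames, names(.data))] <- names(renames)
  report <- data.frame(old_name = unname(renames), new_name = names(renames))
  message(paste(utils::capture.output(print(report, row.names = FALSE)),
                collapse = "\n"))
  .data
}

df <- data.frame(x = 1, y = 2)
as_fgeo(df, gx = "x", gy = NULL)
```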