asci's Introduction

ASCI

Marcus W. Beck, Robert Butler (maintainer), Susanna Theroux, Quynh-Thi Ho

R package materials to calculate the Algal Stream Condition Index (ASCI) based on pMMI scores using diatom, soft-bodied algae, or a hybrid approach. A link to the ASCI manuscript can be found here: https://www.sciencedirect.com/science/article/pii/S1470160X20303587

Installation

Install the package as follows:

install.packages('devtools')
library(devtools)
install_github('SCCWRP/ASCI')
library(ASCI)

Usage

The sample files demo_algae_tax and demo_station are included to demonstrate the correct formats for the input data. The demo_algae_tax file is a data.frame of taxonomic data in long format, with one row per taxon record within a sample, so a single sample spans several rows. The demo_station file is a data.frame of GIS predictors in wide format, with one row per station. See the help files for more information (e.g., ?demo_algae_tax). Also see the help files for chkinp() and calcgis() for the requirements each file must meet to work with the ASCI.

head(demo_algae_tax)
## # A tibble: 6 x 7
##   StationCode SampleDate          Replicate SampleTypeCode BAResult  Result
##   <chr>       <dttm>                  <dbl> <chr>             <dbl>   <dbl>
## 1 909M24937   2016-06-22 00:00:00         1 Macroalgae           NA  1.21e9
## 2 909M24937   2016-06-22 00:00:00         1 Epiphyte             15 NA     
## 3 909M24937   2016-06-22 00:00:00         1 Epiphyte             83 NA     
## 4 909M24937   2016-06-22 00:00:00         1 Epiphyte              2 NA     
## 5 909M24937   2016-06-22 00:00:00         1 Integrated            5 NA     
## 6 909M24937   2016-06-22 00:00:00         1 Integrated           53 NA     
## # … with 1 more variable: FinalID <chr>
head(demo_station)
## # A tibble: 3 x 27
##   StationCode CondQR50 SITE_ELEV TEMP_00_09 KFCT_AVE  AtmCa PPT_00_09 MAX_ELEV
##   <chr>          <int>     <int>      <int>    <dbl>  <dbl>     <dbl>    <int>
## 1 404M07357         NA       199       2456    0.278 0.0554    55570.      783
## 2 801M16916         NA       197       2685    0.185 0.0652    25406.     3480
## 3 909M24937         NA       582       2442    0.202 0.106     37972.     1980
## # … with 19 more variables: CaO_Mean <dbl>, MgO_Mean <dbl>, S_Mean <dbl>,
## #   UCS_Mean <dbl>, LPREM_mean <dbl>, AtmMg <dbl>, AtmSO4 <dbl>, MINP_WS <dbl>,
## #   MEANP_WS <dbl>, SumAve_P <dbl>, TMAX_WS <dbl>, XWD_WS <dbl>,
## #   MAXWD_WS <dbl>, LST32AVE <dbl>, BDH_AVE <dbl>, PRMH_AVE <dbl>, PSA6C <chr>,
## #   XerMtn <lgl>, AREA_SQKM <dbl>

The output is in a wide format.

demo_results <- ASCI(demo_algae_tax, demo_station)
demo_results
## # A tibble: 3 x 84
##   SampleID StationCode SampleDate          Replicate SampleType D_ValveCount
##   <chr>    <chr>       <dttm>                  <dbl> <chr>             <int>
## 1 404M073… 404M07357   2016-06-13 00:00:00         1 Integrate…          600
## 2 801M169… 801M16916   2016-05-25 00:00:00         1 Microalga…          600
## 3 909M249… 909M24937   2016-06-22 00:00:00         1 Macroalga…          600
## # … with 78 more variables: S_EntityCount <int>, S_Biovolume <dbl>,
## #   D_NumberTaxa <int>, S_NumberTaxa <int>, H_NumberTaxa <int>,
## #   UnrecognizedTaxa <chr>, D_ASCI <dbl>, S_ASCI <dbl>, H_ASCI <dbl>,
## #   D_cnt.spp.most.tol_pct_att <dbl>, D_cnt.spp.most.tol_pred <dbl>,
## #   D_cnt.spp.most.tol_raw <int>, D_cnt.spp.most.tol_scr <dbl>,
## #   D_EpiRho.richness_pct_att <dbl>, D_EpiRho.richness_pred <dbl>,
## #   D_EpiRho.richness_raw <int>, D_EpiRho.richness_scr <dbl>,
## #   D_prop.spp.IndicatorClass_TN_low_pct_att <dbl>,
## #   D_prop.spp.IndicatorClass_TN_low_pred <dbl>,
## #   D_prop.spp.IndicatorClass_TN_low_raw <dbl>,
## #   D_prop.spp.IndicatorClass_TN_low_scr <dbl>,
## #   D_prop.spp.Planktonic_pct_att <dbl>, D_prop.spp.Planktonic_pred <dbl>,
## #   D_prop.spp.Planktonic_raw <dbl>, D_prop.spp.Planktonic_scr <dbl>,
## #   D_prop.spp.Trophic.E_pct_att <dbl>, D_prop.spp.Trophic.E_pred <dbl>,
## #   D_prop.spp.Trophic.E_raw <dbl>, D_prop.spp.Trophic.E_scr <dbl>,
## #   D_Salinity.BF.richness_pct_att <dbl>, D_Salinity.BF.richness_pred <dbl>,
## #   D_Salinity.BF.richness_raw <int>, D_Salinity.BF.richness_scr <dbl>,
## #   S_prop.spp.IndicatorClass_DOC_high_pct_att <dbl>,
## #   S_prop.spp.IndicatorClass_DOC_high_raw <dbl>,
## #   S_prop.spp.IndicatorClass_DOC_high_scr <dbl>,
## #   S_prop.spp.IndicatorClass_NonRef_pct_att <dbl>,
## #   S_prop.spp.IndicatorClass_NonRef_raw <dbl>,
## #   S_prop.spp.IndicatorClass_NonRef_scr <dbl>,
## #   S_prop.spp.IndicatorClass_TP_high_pct_att <dbl>,
## #   S_prop.spp.IndicatorClass_TP_high_raw <dbl>,
## #   S_prop.spp.IndicatorClass_TP_high_scr <dbl>, S_prop.spp.ZHR_pct_att <dbl>,
## #   S_prop.spp.ZHR_raw <dbl>, S_prop.spp.ZHR_scr <dbl>,
## #   H_cnt.spp.IndicatorClass_TP_high_pct_att <dbl>,
## #   H_cnt.spp.IndicatorClass_TP_high_pred <dbl>,
## #   H_cnt.spp.IndicatorClass_TP_high_raw <int>,
## #   H_cnt.spp.IndicatorClass_TP_high_scr <dbl>,
## #   H_cnt.spp.most.tol_pct_att <dbl>, H_cnt.spp.most.tol_pred <dbl>,
## #   H_cnt.spp.most.tol_raw <int>, H_cnt.spp.most.tol_scr <dbl>,
## #   H_EpiRho.richness_pct_att <dbl>, H_EpiRho.richness_pred <dbl>,
## #   H_EpiRho.richness_raw <int>, H_EpiRho.richness_scr <dbl>,
## #   H_OxyRed.DO_30.richness_pct_att <dbl>, H_OxyRed.DO_30.richness_pred <dbl>,
## #   H_OxyRed.DO_30.richness_raw <int>, H_OxyRed.DO_30.richness_scr <dbl>,
## #   H_prop.spp.Planktonic_pct_att <dbl>, H_prop.spp.Planktonic_pred <dbl>,
## #   H_prop.spp.Planktonic_raw <dbl>, H_prop.spp.Planktonic_scr <dbl>,
## #   H_prop.spp.Trophic.E_pct_att <dbl>, H_prop.spp.Trophic.E_pred <dbl>,
## #   H_prop.spp.Trophic.E_raw <dbl>, H_prop.spp.Trophic.E_scr <dbl>,
## #   H_prop.spp.ZHR_pct_att <dbl>, H_prop.spp.ZHR_raw <dbl>,
## #   H_prop.spp.ZHR_scr <dbl>, H_Salinity.BF.richness_pct_att <dbl>,
## #   H_Salinity.BF.richness_pred <dbl>, H_Salinity.BF.richness_raw <int>,
## #   H_Salinity.BF.richness_scr <dbl>, Comments <chr>, version_number <chr>

FAQ

Missing data

If only one algal assemblage is submitted (e.g., no soft-bodied algae taxa), the metrics and indices for the missing assemblage will return NA, and a warning message will be issued. However, if an assemblage is submitted but none of its taxa are attributed for a given metric, that metric will receive the worst possible score. With few exceptions, missing values in the station data are not allowed.

Bad or Missing Field Names

All required field names must be present in input files. Please be sure to match the field names provided above. Although we have implemented scripts to make the inputs case-insensitive, we recommend conforming to the capitalizations shown above.

Stations with Catchments that Include Parts in Mexico

Portions of some streams include areas in Mexico. Because the geodatabases used to calculate ASCI predictors do not currently include this area, the ASCI cannot be calculated properly for these sites. The geodatabases will be updated within the next few months. In the interim, we make the following recommendations: If more than 90% of the area of a watershed is within California, treat the state boundary as the edge of the watershed and calculate the predictors accordingly. However, you should interpret these results with caution, particularly if the portion within Mexico contains substantially different natural features. For watersheds that are less than 90% within California, we recommend using the Southern California Algal Indices of Biotic Integrity (Fetscher et al. 2014) as a substitute index.

Unrecognized Taxa

Novel or misspelled species names will not be recognized by the calculator and will be output as unrecognized taxa. Users should modify these species in agreement with the SWAMP species lists and re-run the calculator. The calculator STE currently reflects the 2019 SWAMP lookup lists and is available to view HERE.

Metadata

Resources: SOP
Contact: Susanna Theroux

asci's Issues

reformat results output

see file at "Z:/MarcusBeck/Copy of Rafi's Q&D guide to ASCI report_v2.xls" and meeting notes from 8/6

prop.spp.ZHR

In mmifun.R, there were many concerning things I recently saw
There was one metric, prop.spp.ZHR which gets calculated via mmicalmetrics, that shows up in mmifun, in the sba.metrics and hybrid.metrics dataframes.

The sba.metrics and hybrid.metrics dataframes had mutate functions on them that were making an assumption that they had a column named prop.spp.ZHR_raw, when in fact it was just prop.spp.ZHR

taxrmv not found

In the end of chkinp.R, in the master branch,

Notice that taxrmv is only defined when there are differences between the taxa finalid column and the STE finalid column

What if all finalIDs in the taxa dataframe are in the STE Dataframe? Then the program will crash

It happened to me. Don't want to fix it on the master branch without permission
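
One way to avoid the crash described above is to define the variable before the conditional branch, so it exists even when every FinalID is found in the STE. This is a minimal base R sketch under that assumption; the helper name find_unrecognized and the toy vectors are hypothetical and are not taken from chkinp.R.

```r
# Hypothetical helper (not the package's code): returns the FinalIDs that
# are absent from the lookup. The result is always a character vector,
# possibly empty, so later code never refers to an undefined object.
find_unrecognized <- function(taxa_ids, ste_ids) {
  taxrmv <- character(0)                      # defined up front
  missing_ids <- setdiff(taxa_ids, ste_ids)   # IDs not in the lookup
  if (length(missing_ids) > 0) {
    taxrmv <- missing_ids
  }
  taxrmv
}

# Toy values for illustration only
ste_ids <- c("Taxon A", "Taxon B")
find_unrecognized(c("Taxon A", "Taxon B"), ste_ids)  # empty vector, no crash
find_unrecognized(c("Taxon A", "Taxon X"), ste_ids)  # "Taxon X"
```

Code that uses taxrmv afterwards can then test length(taxrmv) > 0 instead of relying on the object existing.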

Error messages for single assemblage or wrong assemblage

The package should return an error message if:

  1. User has SBA names in an “integrated” fraction
  2. User has diatom names in “microalgae” or “macroalgae” or “qualitative” fractions
  3. The ASCI output should only return D_ASCI if only “integrated” fraction is submitted (no H_ASCI calculated). Also, only S_ASCI calculated if only “microalgae” and/or ”macroalgae” fractions are submitted. H_ASCI only returned if BOTH “integrated” + micro/macroalgae fractions are submitted.
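
The rule in item 3 can be sketched as a small base R function. This is only an illustration: the function name indices_for_fractions is hypothetical, fraction names are compared case-insensitively as an assumption, and returning all three indices when both fraction types are present is my reading of the request rather than confirmed package behavior.

```r
# Hypothetical helper: decide which indices to report from the
# SampleTypeCode values submitted for one sample.
indices_for_fractions <- function(sample_types) {
  types <- tolower(unique(sample_types))
  has_diatom <- "integrated" %in% types
  has_soft <- any(c("microalgae", "macroalgae") %in% types)

  out <- character(0)
  if (has_diatom) out <- c(out, "D_ASCI")
  if (has_soft) out <- c(out, "S_ASCI")
  # The hybrid index needs both the diatom and the soft-bodied fractions
  if (has_diatom && has_soft) out <- c(out, "H_ASCI")
  out
}

indices_for_fractions("Integrated")                   # "D_ASCI"
indices_for_fractions(c("Microalgae", "Macroalgae"))  # "S_ASCI"
indices_for_fractions(c("Integrated", "Macroalgae"))  # "D_ASCI" "S_ASCI" "H_ASCI"
```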

Case insensitive for all metadata headers

From a beta tester: "The ASCI calculator did not recognize 'UCS_mean', which is the output from Tyler's tool. I changed it to 'UCS_Mean' and it worked"

Would be great to allow for case insensitive metadata headers to be accepted (e.g. UCS_mean, UCS_MEAN, UCS_MeAN, etc).
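
A minimal base R sketch of the requested behavior: match incoming column names to the expected names without regard to case, then rewrite them to the canonical capitalization. The function name standardize_names and the toy data frame are hypothetical; only the field names StationCode and UCS_Mean come from the package's demo data.

```r
# Hypothetical helper: rename columns to their canonical capitalization
# using a case-insensitive match. Columns with no match are left alone.
standardize_names <- function(dat, canonical) {
  idx <- match(tolower(names(dat)), tolower(canonical))
  found <- !is.na(idx)
  names(dat)[found] <- canonical[idx[found]]
  dat
}

# Toy station data with the capitalization reported by the beta tester
stations <- data.frame(stationcode = "909M24937", UCS_mean = 1.5)
stations <- standardize_names(stations, c("StationCode", "UCS_Mean"))
names(stations)  # "StationCode" "UCS_Mean"
```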

Changes to oescr and pmmiscr functions

Some of these are long term, and others are quickies:

  1. Drop null outputs
  2. Drop BC outputs (or make that a toggle option, default to suppressed)
  3. Add percentiles: round(pnorm(OoverE, mean, sd), digits=2)
  4. Let's go for some consistency with the CSCI reporting names (e.g., call this a "core" report, rather than "Scores")
  5. We need a few more reports added:
    5a. A site by group probability matrix (like CSCI's Suppl1_grps report)
    5b. A sample-otu-count-captureprob flat file (like CSCI's Suppl1_oe report) (use the same field names to the extent possible)
  6. Add StationCode and SampleID as variables to each relevant report
  7. For the pMMI outputs, we need each metric to have raw, predicted, and scored values (like CSCI's Suppl1_mmi report)
  8. Integrate the OE and MMI results into a single report. Have the user provide options on which indices to return
  9. Because there's a subsampling step, we should return the subsampled results
    9a. Because there's a subsampling step, we may want to follow the CSCI's example and do this iteratively (and return the appropriate results). This is a Susie decision, I think, but it would be good for her to know if that's a huge pain in the butt.
    9b. Because there's a subsampling option, we should give the user the ability to set a seed. (Or we can prompt them in documentation)
  10. We'll need to think about appropriate QA, like we have for the CSCI. Again, a Susie decision. But here are a few things I think are certain:
    -Diatom count
    -SBA count
    -% ambiguous individuals (for O/E)
    -% ambiguous taxa (for O/E)
  11. In general, we should think about the kinds of error-checking we need, with the intention of creating informative error messages. Some examples:
    -Checking that you have all the required fields in the inputs
    -Checking that all sites in the taxonomy data show up in the stations data.
    -Making the taxonomic data case-insensitive

Ok, that's all for now.
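
The percentile in item 3 can be sketched as follows. The expression is the one given in the list; the wrapper name score_percentile and the reference mean and standard deviation used in the example are hypothetical placeholders, not calibrated ASCI values.

```r
# Percentile of a score under a normal distribution described by a
# reference mean and standard deviation (both are assumed inputs here).
score_percentile <- function(score, ref_mean, ref_sd) {
  round(pnorm(score, mean = ref_mean, sd = ref_sd), digits = 2)
}

# Toy values: scores one sd below, at, and one sd above the reference mean
score_percentile(c(0.8, 1.0, 1.2), ref_mean = 1, ref_sd = 0.2)
# 0.16 0.50 0.84
```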

Rename NumberTaxa

I noticed another concerning thing in mmifun.R

There is a part where it is doing some stuff with hybrid.metrics and sba.metrics and it is saying

%>%
rename(
#NumberTaxa = richness_raw
NumberTaxa = 10
)

It didn't like this at all, saying that "there are only 9 columns"

As a temporary "fix" I commented it out and it gets past that part
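
Renaming by column position breaks as soon as the column count changes, which matches the "only 9 columns" message. A hedged alternative is to rename by name and skip the rename when the column is absent. The sketch below uses base R; the helper rename_if_present is hypothetical, and using richness_raw as the source column follows the commented-out line above rather than any confirmed design.

```r
# Hypothetical helper: rename a column by name, doing nothing if the
# column is not present (instead of failing on a fixed position).
rename_if_present <- function(dat, old, new) {
  hit <- names(dat) == old
  if (any(hit)) names(dat)[hit] <- new
  dat
}

# Toy metrics table for illustration
metrics <- data.frame(SampleID = "S1", richness_raw = 12L)
names(rename_if_present(metrics, "richness_raw", "NumberTaxa"))
# "SampleID" "NumberTaxa"
```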

Accept PSA6, PSA6c, PSA6C

Input stations data may have "PSA6", "PSA6c", or "PSA6C" and they should all be accepted and used for the XerMtn calculation.

Issue with devtools

From Marco: I was running R 3.4.1 but updated to R 3.5.1 because the ASCI calculator says I need R 3.5.0 or later. My problem, though, is I cannot install 'devtools'. I have spent hours searching the internet to try to solve my problem and Heili was unable to help. Have you had any issues updating? Some of the packages (e.g., processx, callr, cachem, memoise) have a binary version earlier than the source version, and the ASCI package requires the later source version but my system will not update accordingly. I am including some screen shots so you can see some of the errors. If I cannot figure this out, I cannot calculate ASCI for Andy. Any help would be much appreciated.

Update STE

Need to update STE to include new Final IDs from SWAMP/CEDEN and also a feature to accept new and old taxa names and calculate scores with appropriate trait attributes.

stringr package not loading

Users are reporting an error because the stringr package is not loading correctly, so str_trim is not working.

Convert Zeros to NAs

If BAResult or Result column has a "0", convert to NA before proceeding with calculations.
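
A minimal base R sketch of this conversion, assuming the two columns are named exactly BAResult and Result as in demo_algae_tax. The function name zero_to_na and the toy data are hypothetical.

```r
# Hypothetical helper: replace zeros with NA in the abundance columns.
# which() skips existing NA values, so they are left untouched.
zero_to_na <- function(dat, cols = c("BAResult", "Result")) {
  for (col in intersect(cols, names(dat))) {
    dat[[col]][which(dat[[col]] == 0)] <- NA
  }
  dat
}

# Toy data for illustration
taxa <- data.frame(BAResult = c(15, 0, NA), Result = c(NA, 0, 1.21e9))
cleaned <- zero_to_na(taxa)
cleaned
```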

check some calculations for reformatted output

Line number links are below, provided by description of what it's supposed to do:

  • Diatom, soft-bodied and algae data subsets, are we subsetting these correctly (i.e., diatoms are integrated, etc.)
  • Percent attributed taxa relevant for each metric
  • Check biovolume and sample counts
  • Check percent unrecognized taxa calculations

    ASCI/R/ASCI.R

    Line 85 in 6ccc821

    UnrecognizedTaxa = paste0(setdiff(FinalID, STE$FinalID), collapse = '|')

Add more informative percent attributed

Right now percent attributed is not very informative, since it is essentially the same as the metric itself (e.g. proportion BCG12 taxa). What would be more useful is the percent attributed to reflect the percentage of total taxa with any version of the attribute (e.g. BCG 1,2,3,4,5,6).

Taxa counts are counting taxa twice if in two different SampleTypeCodes

We need to make Taxa Counts (S_NumberTaxa,D_NumberTaxa, etc) occur on a dataframe that does not include SampleTypeCode. e.g:

Right now:
Epiphyte Species A
Microalgae Species A
Microalgae Species B
S_NumberTaxa == 3

It should be:
Epiphyte Species A
Microalgae Species A
Microalgae Species B
S_NumberTaxa == 2
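
The intended count can be sketched in base R by dropping SampleTypeCode and de-duplicating before counting. The data frame below reproduces the three-row example above; the SampleID value is a made-up placeholder.

```r
# Toy data matching the example: Species A appears in two fractions
taxa <- data.frame(
  SampleID = "S1",
  SampleTypeCode = c("Epiphyte", "Microalgae", "Microalgae"),
  FinalID = c("Species A", "Species A", "Species B"),
  stringsAsFactors = FALSE
)

# Keep one row per sample and taxon, ignoring SampleTypeCode
dedup <- unique(taxa[, c("SampleID", "FinalID")])

# Count distinct taxa per sample
counts <- aggregate(FinalID ~ SampleID, data = dedup, FUN = length)
names(counts)[2] <- "S_NumberTaxa"
counts$S_NumberTaxa  # 2
```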

NULL randomForest object

In the "robert" branch this shows up in mmifun.R on line 292, (as of now)

hybrid.predmet <- stationid %>%
mutate(
# hybrid.cnt.spp.IndicatorClass_TP_high is supposed to be a randomForest model object thing
# However, right now it is saying that it is NULL........
cnt.spp.IndicatorClass_TP_high_pred = predict(rfmods$hybrid.cnt.spp.IndicatorClass_TP_high, newdata = .[, c("PPT_00_09", "KFCT_AVE")]),

Here rfmods$hybrid.cnt.spp.IndicatorClass_TP_high should be a randomForest object, but for some reason it is coming up as NULL, so the script fails: the predict function was expecting a randomForest object and got NULL instead.

S_EntityCount calculations should ignore diatom taxa

I believe the S_EntityCount calculations (lines 77-84 in the ASCI.R script) do not ignore diatom taxa, but they should. The way to fix this would be to drop any taxa with Phylum == Bacillariophyta before summing the BAResult column.

  • From an email from Susie
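
The suggested fix can be sketched in base R as below. It assumes the taxonomy table already carries a Phylum column (for example, joined from the STE lookup); the function name soft_entity_count and the toy rows are hypothetical, and keeping rows with a missing Phylum is a choice made here, not something the issue specifies.

```r
# Hypothetical helper: sum BAResult after dropping diatoms
# (Phylum == "Bacillariophyta"). Rows with a missing Phylum are kept.
soft_entity_count <- function(dat) {
  keep <- is.na(dat$Phylum) | dat$Phylum != "Bacillariophyta"
  sum(dat$BAResult[keep], na.rm = TRUE)
}

# Toy data: one diatom row that should not contribute to the count
taxa <- data.frame(
  Phylum = c("Bacillariophyta", "Chlorophyta", "Cyanobacteria"),
  BAResult = c(600, 15, 83),
  stringsAsFactors = FALSE
)
soft_entity_count(taxa)  # 98
```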

Rewrite checks for XerMtn and PSA6C

I want to re-do some code in calcgis.R in the ASCI package.

Basically, the idea is this:
The ASCI function needs XerMtn.
If the station data that they gave the function is missing XerMtn, then XerMtn may be calculated from PSA6C.

PSA6C values must be one of these
PSA6C_values <- c(
'NC','North Coast',
'DM','Deserts Modoc',
'CH','Chaparral',
'CV','Central Valley',
'SC','South Coast',
'SN','Sierra Nevada'
)

If the value is the long name, it must be converted to the abbreviation.

code to derive XerMtn from PSA6C is already in calcgis.R. It is very straightforward

XerMtn should be either a 0 or a 1.

There is much to derive from this comment, but that will have to be figured out and put into the check
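
The name handling described above can be sketched in base R: convert long names to abbreviations, accept abbreviations in any case, and stop on anything else. The function name normalize_psa6c is hypothetical, and the sketch deliberately leaves out the XerMtn derivation itself, since that rule lives in calcgis.R and is not reproduced here.

```r
# Long name -> abbreviation, using the six values listed above
psa6c_lookup <- c(
  "North Coast" = "NC", "Deserts Modoc" = "DM", "Chaparral" = "CH",
  "Central Valley" = "CV", "South Coast" = "SC", "Sierra Nevada" = "SN"
)

# Hypothetical helper: return abbreviations, or stop on unknown values
# (missing values are treated as unknown in this sketch).
normalize_psa6c <- function(x) {
  x <- trimws(x)
  long <- match(tolower(x), tolower(names(psa6c_lookup)))
  out <- ifelse(is.na(long), toupper(x), unname(psa6c_lookup)[long])
  bad <- !(out %in% psa6c_lookup)
  if (any(bad)) {
    stop("Unrecognized PSA6C value(s): ", paste(unique(x[bad]), collapse = ", "))
  }
  out
}

normalize_psa6c(c("Chaparral", "cv", "SN"))  # "CH" "CV" "SN"
```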

Descriptive PSA6C entries

From a beta-tester: "PSA6C is populated with text values (e.g., Chaparral, Central Valley) rather than abbreviations (e.g., CH, CV) when coming out of Tyler's tool. I suggest adding the text values to your crosswalk or conversion table so XerMtn can handle both options since there will be historic and future metrics calculated."

NC==North Coast
DM==Deserts Modoc
CH==Chaparral
CV==Central Valley
SC==South Coast
SN==Sierra Nevada

Negative diatom valve counts

When a sample has no "integrated" fraction, diatom valve count should equal 0. Instead, the ASCI calculator is returning negative values. This section of code likely needs to be modified (sorry I can't edit pipes!). Probably lines 68-74 in ASCI.R.

Demo data not running

Getting the following error message when running the demo data:
Error in prop.spp.BCG12/richness :
non-numeric argument to binary operator

Thanks!

Handling missing info

Request from RB9:

  1. Ability to have not a perfect match between the GIS and taxa data -- i.e. not have to have GIS file match all taxa file sample IDs.
  2. Run scripts when there is missing taxonomy data (e.g. diatom but no sba data submitted), just add flags.
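
The first request could be handled by warning about unmatched stations and continuing with the rest, rather than stopping. The base R sketch below illustrates that idea only; the function name check_station_match is hypothetical, and dropping the unmatched rows is one possible policy, not the package's documented behavior.

```r
# Hypothetical helper: warn about taxonomy rows whose StationCode has no
# station (GIS) record, then return only the rows that can be scored.
check_station_match <- function(taxa, stations) {
  missing_gis <- setdiff(unique(taxa$StationCode), stations$StationCode)
  if (length(missing_gis) > 0) {
    warning("No station data for: ", paste(missing_gis, collapse = ", "))
  }
  taxa[!(taxa$StationCode %in% missing_gis), , drop = FALSE]
}

# Toy inputs: the second station has no GIS record
taxa <- data.frame(StationCode = c("909M24937", "404M07357"),
                   BAResult = c(15, 83), stringsAsFactors = FALSE)
stations <- data.frame(StationCode = "909M24937", stringsAsFactors = FALSE)

kept <- suppressWarnings(check_station_match(taxa, stations))
kept$StationCode  # "909M24937"
```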

does a seed need to be set?

In the commit a05ab21, the file OE.caret.load.and.source.R was removed but it included a call to set.seed. Is this required or an artifact from index dev?
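
For context on what set.seed does in general: any step that draws random numbers, such as subsampling, gives the same result on every run only if the seed is fixed first. The base R sketch below shows that effect with a hypothetical draw_subsample helper; it says nothing about whether the current ASCI code still contains a random step.

```r
# Hypothetical helper: draw n ids at random after fixing the seed
draw_subsample <- function(ids, n, seed) {
  set.seed(seed)
  sample(ids, size = n)
}

a <- draw_subsample(1:1000, 10, seed = 42)
b <- draw_subsample(1:1000, 10, seed = 42)
identical(a, b)  # TRUE: same seed, same draw
```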

Error in as.POSIXct.numeric(value) Origin must be supplied

There is an issue with the package where if you try to run ASCI on a site that has only SBA or only diatom data, it errors out in mmifun.R

I figured this out by running ASCI on a site that had only diatom data, and then did traceback()

Specifically there is a function called chkmt that gets defined within the mmifun.R, and this is the place where it errors out at.

Apparently, if a site has only diatom data, it errors out at chkmt(bugs.sba)

The other way is true as well. If a site only has SBA data, it errors out at chkmt(bugs.d)

I attached example excel files. it is a site that has only diatom data. So you can download these files and run ASCI with them and maybe you will be able to see the issue.

onlydiatom-taxa.xlsx
onlydiatom-station.xlsx

Fix non-ASCII characters in rdata files for build

devtools::check() returns the following:

  • checking data for non-ASCII characters ... WARNING
    Warning: found non-ASCII strings
    'Terpsino% musica' in object 'pmmilkup'
    'Anabaenaf' in object 'pmmilkup'
    'ChlorotetraTdron' in object 'pmmilkup'
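
To locate the offending entries before cleaning them, one option is to flag any string that contains a code point above 127. This is a base R sketch; the function name has_non_ascii is hypothetical, and the accented example string is a toy value, not a confirmed entry of pmmilkup.

```r
# Hypothetical helper: TRUE for each string containing a non-ASCII character
has_non_ascii <- function(x) {
  vapply(
    x,
    function(s) any(utf8ToInt(enc2utf8(s)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
}

# Toy values; the escape keeps this script itself ASCII-only
ids <- c("Anabaena", "Terpsino\u00eb musica")
ids[has_non_ascii(ids)]  # only the accented name is returned
```

Applied to the character columns of a lookup table, this would list the values that need to be re-encoded or escaped before the data file is saved again.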

Version number

Version number is still 1.1.1. Need to update to 2.1.x....
