Issue discussions for asci from sccwrp

Verify XerMtn and CondQR50 only calculate if missing in input

Verify that code uses XerMtn and CondQR50 if they are supplied by the user. If absent, code calculates them.
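
The behaviour to verify could look like the following minimal sketch. `fill_if_missing` and the placeholder `compute` functions are hypothetical names used only for illustration, not functions from the ASCI package. The sketch treats a column as "supplied" whenever it exists in the input, and does not handle a column that is present but entirely NA.

```r
# Hypothetical helper: fill a column only when it is absent from the input.
# `compute` stands in for whatever calculation the package already performs.
fill_if_missing <- function(station, column, compute) {
  if (!column %in% names(station)) {
    station[[column]] <- compute(station)
  }
  station
}

station <- data.frame(StationCode = c("A", "B"), XerMtn = c(0, 1))

# XerMtn is supplied, so the user's values are kept untouched.
with_xermtn <- fill_if_missing(station, "XerMtn", function(d) rep(NA_real_, nrow(d)))

# CondQR50 is absent, so the placeholder calculation fills it in.
with_cond <- fill_if_missing(station, "CondQR50", function(d) rep(NA_real_, nrow(d)))
```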

Update output example for demo data in readme

Rewrite checks for XerMtn and PSA6C

I want to re-do some code in calcgis.R in the ASCI package.

Basically, the idea is this:
The ASCI function needs XerMtn.
If the station data that they gave the function is missing XerMtn, then XerMtn may be calculated from PSA6C.

PSA6C values must be one of these:
PSA6C_values <- c(
'NC','North Coast',
'DM','Deserts Modoc',
'CH','Chaparral',
'CV','Central Valley',
'SC','South Coast',
'SN','Sierra Nevada'
)

If the value is the long name, it must be converted to the abbreviation.

Code to derive XerMtn from PSA6C is already in calcgis.R. It is very straightforward.

XerMtn should be either a 0 or a 1.

There is much to derive from this comment, but that will have to be figured out and put into the check
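
A rough sketch of the PSA6C part of such a check follows. The lookup is built from the value list in this issue. `normalize_psa6c` and `check_xermtn` are hypothetical helper names, and the actual PSA6C-to-XerMtn derivation is left to the existing code in calcgis.R because it is not spelled out here.

```r
# Lookup from long name to abbreviation, taken from the list in this issue.
psa6c_lookup <- c(
  "North Coast"    = "NC",
  "Deserts Modoc"  = "DM",
  "Chaparral"      = "CH",
  "Central Valley" = "CV",
  "South Coast"    = "SC",
  "Sierra Nevada"  = "SN"
)

# Hypothetical helper: return abbreviations, erroring on unknown values.
normalize_psa6c <- function(x) {
  x <- trimws(as.character(x))
  is_long <- x %in% names(psa6c_lookup)
  x[is_long] <- unname(psa6c_lookup[x[is_long]])
  bad <- setdiff(x, psa6c_lookup)
  if (length(bad) > 0) {
    stop("Unrecognized PSA6C value(s): ", paste(bad, collapse = ", "))
  }
  x
}

# Hypothetical check that XerMtn holds only 0 or 1 (NA is rejected too).
check_xermtn <- function(x) {
  if (!all(x %in% c(0, 1))) stop("XerMtn must be 0 or 1")
  invisible(TRUE)
}

psa6c_clean <- normalize_psa6c(c("Chaparral", "CV", "Sierra Nevada"))
```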

Convert Zeros to NAs

If BAResult or Result column has a "0", convert to NA before proceeding with calculations.
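
One possible way to do this, as a sketch only: `zeros_to_na` is a hypothetical helper, and it assumes the result columns are numeric (a character column would only match the exact string "0").

```r
# Hypothetical helper: replace zeros with NA in whichever result columns exist.
zeros_to_na <- function(taxa, cols = c("BAResult", "Result")) {
  for (col in intersect(cols, names(taxa))) {
    v <- taxa[[col]]
    v[!is.na(v) & v == 0] <- NA
    taxa[[col]] <- v
  }
  taxa
}

# Toy input, not real data.
taxa <- data.frame(
  FinalID  = c("a", "b", "c"),
  BAResult = c(5, 0, NA),
  Result   = c(0, 2, 1)
)
cleaned <- zeros_to_na(taxa)
```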

Add more informative percent attributed

Right now percent attributed is not very informative, since it is essentially the same as the metric itself (e.g. proportion BCG12 taxa). What would be more useful is the percent attributed to reflect the percentage of total taxa with any version of the attribute (e.g. BCG 1,2,3,4,5,6).
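
To make the difference concrete, here is a toy calculation. The data frame and the variable names are made up for illustration and do not reflect the package's internal structures.

```r
# Toy sample: one row per taxon; BCG is NA when the taxon has no BCG assignment.
taxa <- data.frame(
  FinalID = c("t1", "t2", "t3", "t4", "t5"),
  BCG     = c(1, 2, 4, NA, NA)
)
richness <- length(unique(taxa$FinalID))

# The metric itself: proportion of taxa in BCG classes 1 or 2.
prop_spp_BCG12 <- sum(taxa$BCG %in% c(1, 2)) / richness

# Proposed percent attributed: share of taxa with any BCG class assigned.
pcnt_attributed_BCG <- sum(!is.na(taxa$BCG)) / richness
```

With these five taxa the metric uses 2 of 5 taxa, while 3 of 5 carry any BCG class, so the two numbers now tell the user different things.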

document S4 classes and methods

use S4 methods for asci objects

this will make the attributes more transparent

taxrmv not found

In the end of chkinp.R, in the master branch,

Notice that taxrmv is only defined when there are differences between the taxa finalid column and the STE finalid column

What if all finalIDs in the taxa dataframe are in the STE Dataframe? Then the program will crash

It happened to me. Don't want to fix it on the master branch without permission
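
A possible fix, sketched without having the chkinp.R code in front of me: define taxrmv unconditionally, so it is an empty vector rather than undefined when everything matches. The IDs below are toy values.

```r
# Toy FinalID vectors; every taxa ID is present in the STE list.
taxa_ids <- c("Taxon A", "Taxon B")
ste_ids  <- c("Taxon A", "Taxon B", "Taxon C")

# setdiff returns character(0) when every FinalID is found, so taxrmv always exists.
taxrmv <- setdiff(taxa_ids, ste_ids)

if (length(taxrmv) > 0) {
  message("Removing unrecognized taxa: ", paste(taxrmv, collapse = ", "))
}

# Dropping an empty set of taxa leaves the data unchanged.
kept <- taxa_ids[!taxa_ids %in% taxrmv]
```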

create asci s4 method for tabular summary

This should report accuracy, precision, and responsiveness of each index, will require NULL data as slots

Changes to oescr and pmmiscr functions

Some of these are long term, and others are quickies:

Drop null outputs
Drop BC outputs (or make that a toggle option, default to suppressed)
Add percentiles: round(pnorm(OoverE, mean, sd), digits=2)
Let's go for some consistency with the CSCI reporting names (e.g., call this a "core" report, rather than "Scores")
We need a few more reports added:
4.a: A site by group probability matrix (like CSCI's Suppl1_grps report)
4.b: A sample-otu-count-captureprob flat file (like CSCI's Suppl1_oe report) (use the same field names to the extent possible)
Add StationCode and SampleID as variables to each relevant report
For the pMMI outputs, we need each metric to have raw, predicted, and scored values (like CSCI's Suppl1_mmi report)
Integrate the OE and MMI results into a single report. Let the user choose which indices to return
Because there's a subsampling step, we should return the subsampled results
8a. Because there's a subsampling step, we may want to follow the CSCI's example and do this iteratively (and return the appropriate results). This is a Susie decision, I think, but it would be good for her to know if that's a huge pain in the butt.
8b: Because there's a subsampling option, we should give the user the ability to set a seed. (Or we can prompt them in documentation)
We'll need to think about appropriate QA, like we have for the CSCI. Again, a Susie decision. But here are a few things I think are certain:
-Diatom count
-SBA count
-% ambiguous individuals (for O/E)
-% ambiguous taxa (for O/E)
In general, we should think about the kinds of error-checking we need, with the intention of creating informative error messages. Some examples:
-Checking that you have all the required fields in the inputs
-Checking that all sites in the taxonomy data show up in the stations data.
-Making the taxonomic data case-insensitive

Ok, that's all for now.
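
Two of the items above, the percentile calculation and the user-settable seed, can be sketched as follows. The reference mean and sd are placeholder values, and `subsample_ids` is a hypothetical helper, not the package's subsampling routine.

```r
# Placeholder reference-distribution parameters; real values would come from
# the calibration data.
ref_mean <- 1
ref_sd   <- 0.2

OoverE <- c(0.6, 1.0, 1.4)
percentile <- round(pnorm(OoverE, mean = ref_mean, sd = ref_sd), digits = 2)

# Hypothetical subsampling helper with an optional seed for reproducibility.
subsample_ids <- function(ids, size, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  sample(ids, size)
}

first  <- subsample_ids(1:100, 10, seed = 42)
second <- subsample_ids(1:100, 10, seed = 42)
```

Calling the helper twice with the same seed gives the same subsample. Leaving `seed = NULL` keeps the current random behaviour.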

StationCode, SampleDate, Replicate Columns in output

The StationCode, SampleDate, and Replicate columns are not being generated correctly in the output of the ASCI function

Descriptive PSA6C entries

From a beta-tester: "PSA6C is populated with text values (e.g., Chaparral, Central Valley) rather than abbreviations (e.g., CH, CV) when coming out of Tyler's tool. I suggest adding the text values to your crosswalk or conversion table so XerMtn can handle both options since there will be historic and future metrics calculated."

NC==North Coast
DM==Deserts Modoc
CH==Chaparral
CV==Central Valley
SC==South Coast
SN==Sierra Nevada

Option to export long format

SMC-ify function that can be run separate from ASCI function to make long format.

Keep output fields consistent even with missing data

Request to keep all output fields consistent even when one assemblage (diatoms/SBA) is missing. Compare results from taxonomy files OneAssemblage.csv and TwoAssemblage.csv to illustrate issue.

Crashes on Diatoms and Hybrid when there is only one site

fix tax argument in asci function

It works now but all indices are calculated and results are filtered. It should suppress calculation on the back end.

Handling missing info

Request from RB9:

Ability to have not a perfect match between the GIS and taxa data -- i.e. not have to have GIS file match all taxa file sample IDs.
Run scripts when there is missing taxonomy data (e.g. diatom but no sba data submitted), just add flags.

check some calculations for reformatted output

Line number links are below, each preceded by a description of what the code is supposed to do:

Diatom, soft-bodied and algae data subsets: are we subsetting these correctly (i.e., diatoms are integrated, etc.)?
- diatoms
  
  ASCI/R/mmifun.R
  
  Line 50 in 6ccc821
  
  SampleTypeCode == 'Integrated'
- soft-bodied
  
  ASCI/R/mmifun.R
  
  Line 57 in 6ccc821
  
  SampleTypeCode != 'Integrated'
- hybrid
  
  ASCI/R/mmifun.R
  
  Line 68 in 6ccc821
  
  smpid <- intersect(bugs.d$SampleID, bugs.sba$SampleID)
Percent attributed taxa relevant for each metric
- diatoms
  
  ASCI/R/mmifun.R
  
  Line 118 in 6ccc821
  
  mutate(
- soft-bodied
  
  ASCI/R/mmifun.R
  
  Line 135 in 6ccc821
  
  pcnt.attributed.BCG5 = cnt.spp.BCG5/richness,
- hybrid
  
  ASCI/R/mmifun.R
  
  Line 149 in 6ccc821
  
  pcnt.attributed.HiTolerance = cnt.ind.most.tol/richness,
Check biovolume and sample counts
- diatoms
  
  ASCI/R/ASCI.R
  
  Line 75 in 6ccc821
  
  D_ValveCount = sum(BAResult, na.rm = T)
- soft-bodied
  
  ASCI/R/ASCI.R
  
  Line 83 in 6ccc821
  
  S_EntityCount = sum(BAResult, na.rm = T),
Check percent unrecognized taxa calculations

ASCI/R/ASCI.R

Line 85 in 6ccc821

UnrecognizedTaxa = paste0(setdiff(FinalID, STE$FinalID), collapse = '|')

remove sqldf dependency in pmmi_calcmetrics

replace with tidyr

Taxa counts are counting taxa twice if in two different SampleTypeCodes

We need to make Taxa Counts (S_NumberTaxa,D_NumberTaxa, etc) occur on a dataframe that does not include SampleTypeCode. e.g:

Right now:
Epiphyte Species A
Microalgae Species A
Microalgae Species B
S_NumberTaxa == 3

It should be:
Epiphyte Species A
Microalgae Species A
Microalgae Species B
S_NumberTaxa == 2
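
The example above can be reproduced in base R as a small sketch. The data frame is a toy version of the three rows listed, and the actual package code may structure this differently.

```r
# Toy version of the example: Species A appears under two SampleTypeCodes.
sba <- data.frame(
  SampleID       = "S1",
  SampleTypeCode = c("Epiphyte", "Microalgae", "Microalgae"),
  FinalID        = c("Species A", "Species A", "Species B")
)

# Counting rows counts Species A twice.
row_count <- nrow(sba)

# Counting distinct FinalID values per sample ignores SampleTypeCode.
S_NumberTaxa <- tapply(sba$FinalID, sba$SampleID, function(x) length(unique(x)))
```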

prop.spp.ZHR

I recently saw several concerning things in mmifun.R.
One metric, prop.spp.ZHR, which is calculated via mmicalmetrics, shows up in mmifun in the sba.metrics and hybrid.metrics dataframes.

The mutate calls on the sba.metrics and hybrid.metrics dataframes assumed a column named prop.spp.ZHR_raw, when in fact the column was just prop.spp.ZHR.

Case insensitive for all metadata headers

From a beta tester: "The ASCI calculator did not recognize 'UCS_mean', which is the output from Tyler's tool. I changed it to 'UCS_Mean' and it worked"

Would be great to allow for case insensitive metadata headers to be accepted (e.g. UCS_mean, UCS_MEAN, UCS_MeAN, etc).
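
One way this could work, as a sketch: match incoming headers to the expected spellings while ignoring case, then rename. `standardize_headers` is a hypothetical helper, and the `expected` vector here is only a two-column illustration, not the full list of required fields.

```r
# Hypothetical helper: rename columns to the expected spelling, ignoring case.
standardize_headers <- function(dat, expected) {
  idx <- match(tolower(names(dat)), tolower(expected))
  found <- !is.na(idx)
  names(dat)[found] <- expected[idx[found]]
  dat
}

station <- data.frame(StationCode = "A", UCS_mean = 1.5)
station <- standardize_headers(station, c("StationCode", "UCS_Mean"))
```

Columns that match nothing in `expected` are left as they are.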

reformat results output

see file at "Z:/MarcusBeck/Copy of Rafi's Q&D guide to ASCI report_v2.xls" and meeting notes from 8/6

Accept PSA6, PSA6c, PSA6C

Input stations data may have "PSA6", "PSA6c", or "PSA6C" and they should all be accepted and used for the XerMtn calculation.
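
A minimal sketch of accepting all three spellings: `rename_psa6` is a hypothetical helper that maps whichever alias is present onto a single column name before XerMtn is derived.

```r
# Hypothetical helper: map any accepted alias onto the single name PSA6C.
rename_psa6 <- function(station, aliases = c("PSA6", "PSA6c", "PSA6C")) {
  hit <- which(names(station) %in% aliases)
  if (length(hit) > 1) stop("More than one PSA6 column found")
  if (length(hit) == 1) names(station)[hit] <- "PSA6C"
  station
}

station <- data.frame(StationCode = "A", PSA6 = "CH")
station <- rename_psa6(station)
```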

Error in as.POSIXct.numeric(value) Origin must be supplied

There is an issue with the package where if you try to run ASCI on a site that has only SBA or only diatom data, it errors out in mmifun.R

I figured this out by running ASCI on a site that had only diatom data, and then did traceback()

Specifically, there is a function called chkmt that is defined within mmifun.R, and this is where it errors out.

Apparently, if a site has only diatom data, it errors out at chkmt(bugs.sba)

The other way is true as well. If a site only has SBA data, it errors out at chkmt(bugs.d)

I attached example Excel files for a site that has only diatom data. You can download these files and run ASCI with them to reproduce the issue.

onlydiatom-taxa.xlsx
onlydiatom-station.xlsx

Negative diatom valve counts

When a sample has no "integrated" fraction, diatom valve count should equal 0. Instead, the ASCI calculator is returning negative values. This section of code likely needs to be modified (sorry I can't edit pipes!). Probably lines 68-74 in ASCI.R.

does a seed need to be set?

In the commit a05ab21, the file OE.caret.load.and.source.R was removed but it included a call to set.seed. Is this required or an artifact from index dev?

Add output descriptions

Add descriptions for output columns to ReadMe

The build is not passing

It has something to do with a warning message about joining character vectors and factors

NULL randomForest object

In the "robert" branch this shows up in mmifun.R on line 292, (as of now)

hybrid.predmet <- stationid %>%
mutate(
# hybrid.cnt.spp.IndicatorClass_TP_high is supposed to be a randomForest model object thing
# However, right now it is saying that it is NULL........
cnt.spp.IndicatorClass_TP_high_pred = predict(rfmods$hybrid.cnt.spp.IndicatorClass_TP_high, newdata = .[, c("PPT_00_09", "KFCT_AVE")]),

rfmods$hybrid.cnt.spp.IndicatorClass_TP_high should be a randomForest object, but for some reason it is coming up as NULL, and the script fails because the predict function expected a randomForest object but got NULL.
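
I don't know the cause yet, but note that `$` on a list silently returns NULL when the name does not exist, so a renamed or missing model would produce exactly this symptom. A guard like the hypothetical `get_model` below would at least turn it into an informative error. The list contents are stand-ins, not real models.

```r
# Hypothetical guard: fail early with a clear message when a model is missing.
get_model <- function(mods, name) {
  mod <- mods[[name]]
  if (is.null(mod)) {
    stop("Model '", name, "' not found; available: ",
         paste(names(mods), collapse = ", "))
  }
  mod
}

# Stand-in list: the element name differs from the one being requested.
rfmods <- list(hybrid.some.other.metric = "placeholder model")

# `$` gives NULL with no warning for a name that is not in the list.
silent_null <- rfmods$hybrid.cnt.spp.IndicatorClass_TP_high

result <- try(get_model(rfmods, "hybrid.cnt.spp.IndicatorClass_TP_high"), silent = TRUE)
```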

Calculate Hybrid even if they are missing either diatom or SBA

Susie wants H_ scores to be calculated even if one of the algae types is missing. Or rather, give the user an option to calculate Hybrid with a missing assemblage, or not.

stringr package not loading

Users are reporting errors because the stringr package is not loading correctly, so str_trim is not working.

S_EntityCount calculations should ignore diatom taxa

I believe the S_EntityCount calculations (lines 77-84 in the ASCI.R script) do not ignore diatom taxa, but they should. The way to modify this would be to drop any taxa with Phylum == Bacillariophyta before summing the BAResult column.

From an email from Susie
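
The suggested change could look roughly like this in base R. The data frame is a toy example with made-up counts, and the real code in ASCI.R uses pipes, so this is only a sketch of the logic.

```r
# Toy taxa table with one diatom row and two soft-bodied rows.
taxa <- data.frame(
  SampleID = c("S1", "S1", "S1"),
  Phylum   = c("Bacillariophyta", "Chlorophyta", "Cyanobacteria"),
  BAResult = c(600, 10, 5)
)

# Drop diatoms first; %in% keeps rows with a missing Phylum instead of
# turning them into NA rows.
sba_only <- taxa[!taxa$Phylum %in% "Bacillariophyta", ]

# Sum BAResult per sample over the remaining rows.
S_EntityCount <- tapply(sba_only$BAResult, sba_only$SampleID, sum, na.rm = TRUE)
```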

Achnanthes subhudsonis var kraeuselii

Achnanthes subhudsonis var kraeuselii should be Achnanthidium subhudsonis var kraeuselii (as it appears in MTL)

Run ASCI with either diatoms or SBA alone

Allow users to submit single assemblage and calculate scores. Include flag.

SBA data submitted, but result value is Zero

Susie found a bug: if a sample has soft-bodied data but the results are all zero, the organisms still get counted in S_NumberTaxa (not in H_NumberTaxa though...)

Add GIS metric inclusion

There's one metric that needs some point-based GIS data, @stheroux let us know when you're ready to pull the trigger on that!

Version number

Version number is still 1.1.1. Need to update to 2.1.x....

Demo data not running

Getting the following error message when running the demo data:
Error in prop.spp.BCG12/richness :
non-numeric argument to binary operator

Thanks!

Fix non-ASCII characters in rdata files for build

devtools::check() returns the following:

checking data for non-ASCII characters ... WARNING
Warning: found non-ASCII strings
'Terpsino% musica' in object 'pmmilkup'
'Anabaenaf' in object 'pmmilkup'
'ChlorotetraTdron' in object 'pmmilkup'

Check calculation of pmmi metric for proportion species at least do_50

c8c0db1 This metric was previously returning a warning because of incorrect logical comparison.

Issue with devtools

From Marco: I was running R 3.4.1 but updated to R 3.5.1 because the ASCI calculator says I need R 3.5.0 or later. My problem, though, is I cannot install 'devtools'. I have spent hours searching the internet to try to solve my problem and Heili was unable to help. Have you had any issues updating? Some of the packages (e.g., processx, callr, cachem, memoise) have a binary version earlier than the source version, and the ASCI package requires the later source version but my system will not update accordingly. I am including some screen shots so you can see some of the errors. If I cannot figure this out, I cannot calculate ASCI for Andy. Any help would be much appreciated.

User has SBA names in an “integrated” fraction
User has diatom names in “microalgae” or “macroalgae” or “qualitative” fractions
The ASCI output should only return D_ASCI if only “integrated” fraction is submitted (no H_ASCI calculated). Also, only S_ASCI calculated if only “microalgae” and/or ”macroalgae” fractions are submitted. H_ASCI only returned if BOTH “integrated” + micro/macroalgae fractions are submitted.
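
The rule in the last paragraph can be written as a small decision function. `indices_for_sample` is a hypothetical helper; it only looks at which fractions were submitted and ignores the "qualitative" fraction, since the rule above does not say how that should count.

```r
# Hypothetical helper: decide which indices to return from the submitted fractions.
indices_for_sample <- function(sample_type_codes) {
  codes <- tolower(sample_type_codes)
  has_diatom <- any(codes == "integrated")
  has_sba    <- any(codes %in% c("microalgae", "macroalgae"))
  # Indexing with FALSE yields character(0), so absent indices drop out.
  c("D_ASCI"[has_diatom], "S_ASCI"[has_sba], "H_ASCI"[has_diatom && has_sba])
}

only_diatom <- indices_for_sample("Integrated")
only_sba    <- indices_for_sample(c("Microalgae", "Macroalgae"))
both        <- indices_for_sample(c("Integrated", "Macroalgae"))
```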

Rename NumberTaxa

I noticed another concerning thing in mmifun.R

There is a part where it is doing some stuff with hybrid.metrics and sba.metrics and it is saying

%>%
rename(
#NumberTaxa = richness_raw
NumberTaxa = 10
)

It didn't like this at all, saying that "there are only 9 columns"

As a temporary "fix" I commented it out and it gets past that part
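
The message suggests that rename is treating the 10 as a column position, which fails when the dataframe has only 9 columns. Renaming by name is more robust to columns moving around. Below is a base R sketch with a hypothetical `rename_if_present` helper. Whether richness_raw is the right source column is an assumption taken from the commented-out line.

```r
# Hypothetical helper: rename a column by name, with a clear error if absent.
rename_if_present <- function(dat, old, new) {
  if (!old %in% names(dat)) {
    stop("Column '", old, "' not found; columns are: ",
         paste(names(dat), collapse = ", "))
  }
  names(dat)[names(dat) == old] <- new
  dat
}

# Toy metrics table, not real output.
sba.metrics <- data.frame(SampleID = "S1", richness_raw = 12)
sba.metrics <- rename_if_present(sba.metrics, "richness_raw", "NumberTaxa")
```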

change checkinput for diatoms

change filter(Class %in% 'Bacillariophyceae') to Phylum== 'Bacillariophyta'
