
asci's Issues

Rewrite checks for XerMtn and PSA6C

I want to re-do some code in calcgis.R in the ASCI package.

Basically, the idea is this:
The ASCI function needs XerMtn.
If the station data passed to the function is missing XerMtn, then XerMtn may be calculated from PSA6C.

PSA6C values must be one of these:
PSA6C_values <- c(
'NC','North Coast',
'DM','Deserts Modoc',
'CH','Chaparral',
'CV','Central Valley',
'SC','South Coast',
'SN','Sierra Nevada'
)

If the value is the long name, it must be converted to the abbreviation.

The code to derive XerMtn from PSA6C is already in calcgis.R, and it is very straightforward.

XerMtn should be either a 0 or a 1.

There is more to work out from this comment, and whatever comes of it will need to go into the check.
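A minimal sketch of the PSA6C part of the check, using only base R. The crosswalk comes from the list above; the helper names `normalize_psa6c` and `xermtn_is_valid` are hypothetical and are not in calcgis.R. The existing XerMtn derivation is not reproduced here.

```r
# Crosswalk from long names to abbreviations, taken from the list above.
psa6c_lookup <- c(
  "North Coast"    = "NC",
  "Deserts Modoc"  = "DM",
  "Chaparral"      = "CH",
  "Central Valley" = "CV",
  "South Coast"    = "SC",
  "Sierra Nevada"  = "SN"
)

# Hypothetical helper: convert long names to abbreviations and
# stop with an informative message on anything unrecognized.
normalize_psa6c <- function(x) {
  x <- trimws(as.character(x))
  is_long <- x %in% names(psa6c_lookup)
  x[is_long] <- unname(psa6c_lookup[x[is_long]])
  bad <- setdiff(unique(x[!is.na(x)]), psa6c_lookup)
  if (length(bad) > 0) {
    stop("Unrecognized PSA6C value(s): ", paste(bad, collapse = ", "))
  }
  x
}

# Hypothetical helper: TRUE when every non-missing XerMtn is 0 or 1.
xermtn_is_valid <- function(x) {
  all(x[!is.na(x)] %in% c(0, 1))
}

normalize_psa6c(c("Chaparral", "CV", "Sierra Nevada"))
```

Whether missing PSA6C values should pass through as NA or raise an error is left open here; the sketch passes them through.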

Convert Zeros to NAs

If BAResult or Result column has a "0", convert to NA before proceeding with calculations.
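One way this could look in base R. The helper name `zeros_to_na` and the example data frame are made up for illustration; only the column names BAResult and Result come from the issue.

```r
# Hypothetical helper: replace zeros with NA in whichever of the
# named columns are present in the taxa data.
zeros_to_na <- function(taxa, cols = c("BAResult", "Result")) {
  for (col in intersect(cols, names(taxa))) {
    is_zero <- !is.na(taxa[[col]]) & taxa[[col]] == 0
    taxa[[col]][is_zero] <- NA
  }
  taxa
}

# Example data (invented for this sketch)
taxa <- data.frame(
  FinalID  = c("A", "B", "C"),
  BAResult = c(0, 5, NA),
  Result   = c(2, 0, 1)
)
cleaned <- zeros_to_na(taxa)
```

If the columns are read in as text, the comparison still matches the string "0" but not variants such as "0.0", so converting to numeric first may be safer.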

Add more informative percent attributed

Right now percent attributed is not very informative, since it is essentially the same as the metric itself (e.g. proportion BCG12 taxa). What would be more useful is the percent attributed to reflect the percentage of total taxa with any version of the attribute (e.g. BCG 1,2,3,4,5,6).
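A small worked example of the distinction, assuming each taxon carries a BCG value or NA when it has no BCG attribute. The vector and the denominator (all taxa in the sample) are assumptions for illustration, not the package's actual calculation.

```r
# One entry per taxon in a sample; NA means no BCG attribute at all.
bcg <- c(1, 2, NA, 4, 6, NA, 2, 3)

# The metric itself: proportion of taxa that are BCG 1 or 2.
prop_bcg12 <- sum(bcg %in% c(1, 2)) / length(bcg)

# Proposed percent attributed: share of taxa with any BCG value (1-6).
pct_attributed <- 100 * sum(!is.na(bcg)) / length(bcg)
```

Here the metric is 0.375 while percent attributed is 75, so the second number says something the first does not.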

taxrmv not found

At the end of chkinp.R, on the master branch:

Notice that taxrmv is only defined when there are differences between the taxa FinalID column and the STE FinalID column.

What if all FinalIDs in the taxa dataframe are in the STE dataframe? Then taxrmv is never defined and the program will crash.

It happened to me. I don't want to fix it on the master branch without permission.
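A possible fix, sketched without knowing exactly how chkinp.R is written: define taxrmv unconditionally so it is an empty vector when everything matches. The example IDs below are invented.

```r
# Example IDs (invented); in the package these would come from the
# taxa input and the STE lookup.
taxa_ids <- c("Navicula", "Nitzschia")
ste_ids  <- c("Navicula", "Nitzschia", "Cladophora")

# Always defined: character(0) when every taxa FinalID is in STE.
taxrmv <- setdiff(taxa_ids, ste_ids)

if (length(taxrmv) > 0) {
  message("FinalIDs not found in STE: ", paste(taxrmv, collapse = ", "))
}
```

Downstream code can then test `length(taxrmv)` instead of assuming the object exists.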

Changes to oescr and pmmiscr functions

Some of these are long term, and others are quickies:

  1. Drop null outputs
  2. Drop BC outputs (or make that a toggle option, default to suppressed)
  3. Add percentiles: round(pnorm(OoverE, mean, sd), digits=2)
  4. Let's go for some consistency with the CSCI reporting names (e.g., call this a "core" report, rather than "Scores")
  5. We need a few more reports added:
    5a: A site by group probability matrix (like CSCI's Suppl1_grps report)
    5b: A sample-otu-count-captureprob flat file (like CSCI's Suppl1_oe report) (use the same field names to the extent possible)
  6. Add StationCode and SampleID as variables to each relevant report
  7. For the pMMI outputs, we need each metric to have raw, predicted, and scored values (like CSCI's Suppl1_mmi report)
  8. Integrate the OE and MMI results into a single report. Have the user provide options on which indices to return
  9. Because there's a subsampling step, we should return the subsampled results
    9a: Because there's a subsampling step, we may want to follow the CSCI's example and do this iteratively (and return the appropriate results). This is a Susie decision, I think, but it would be good for her to know if that's a huge pain in the butt.
    9b: Because there's a subsampling option, we should give the user the ability to set a seed. (Or we can prompt them in documentation)
  10. We'll need to think about appropriate QA, like we have for the CSCI. Again, a Susie decision. But here are a few things I think are certain:
    -Diatom count
    -SBA count
    -% ambiguous individuals (for O/E)
    -% ambiguous taxa (for O/E)
  11. In general, we should think about the kinds of error-checking we need, with the intention of creating informative error messages. Some examples:
    -Checking that you have all the required fields in the inputs
    -Checking that all sites in the taxonomy data show up in the stations data.
    -Making the taxonomic data case-insensitive

Ok, that's all for now.
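To make items 3 and 11 concrete, here is a rough base R sketch. The function names and the reference mean and sd in the example call are placeholders; the real reference values would come from the index development data.

```r
# Item 3: percentile of an O/E score against a normal distribution
# with the given reference mean and sd.
oe_percentile <- function(OoverE, ref_mean, ref_sd) {
  round(pnorm(OoverE, mean = ref_mean, sd = ref_sd), digits = 2)
}

# Item 11: stop with an informative message when required fields
# are missing from an input data frame.
check_required_fields <- function(df, required) {
  missing_fields <- setdiff(required, names(df))
  if (length(missing_fields) > 0) {
    stop("Missing required field(s): ",
         paste(missing_fields, collapse = ", "))
  }
  invisible(TRUE)
}

# Example calls with placeholder values
oe_percentile(1, ref_mean = 1, ref_sd = 0.2)
check_required_fields(
  data.frame(StationCode = "A", SampleID = "A_1"),
  required = c("StationCode", "SampleID")
)
```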

Descriptive PSA6C entries

From a beta-tester: "PSA6C is populated with text values (e.g., Chaparral, Central Valley) rather than abbreviations (e.g., CH, CV) when coming out of Tyler's tool. I suggest adding the text values to your crosswalk or conversion table so XerMtn can handle both options since there will be historic and future metrics calculated."

NC==North Coast
DM==Deserts Modoc
CH==Chaparral
CV==Central Valley
SC==South Coast
SN==Sierra Nevada

Handling missing info

Request from RB9:

  1. Ability to have not a perfect match between the GIS and taxa data -- i.e. not have to have GIS file match all taxa file sample IDs.
  2. Run scripts when there is missing taxonomy data (e.g. diatom but no sba data submitted), just add flags.
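A sketch of how the matching could be relaxed: score only the samples present in both inputs and flag the rest instead of stopping. The sample IDs, column names, and flag wording are invented for illustration.

```r
# Example sample IDs (invented)
taxa_samples <- c("S1", "S2", "S3")
gis_samples  <- c("S2", "S3", "S4")

# One row per sample seen in either input, with a flag column.
flags <- data.frame(
  SampleID = union(taxa_samples, gis_samples),
  stringsAsFactors = FALSE
)
flags$HasTaxa <- flags$SampleID %in% taxa_samples
flags$HasGIS  <- flags$SampleID %in% gis_samples
flags$Flag <- ifelse(
  flags$HasTaxa & flags$HasGIS, "",
  ifelse(flags$HasTaxa, "Missing GIS data", "Missing taxonomy data")
)

# Only these samples would go on to scoring.
scorable <- flags$SampleID[flags$HasTaxa & flags$HasGIS]
```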

check some calculations for reformatted output

Line-number links are below, each with a description of what the code is supposed to do:

  • Diatom, soft-bodied and algae data subsets, are we subsetting these correctly (i.e., diatoms are integrated, etc.)
  • Percent attributed taxa relevant for each metric
  • Check biovolume and sample counts
  • Check percent unrecognized taxa calculations

    ASCI/R/ASCI.R

    Line 85 in 6ccc821

    UnrecognizedTaxa = paste0(setdiff(FinalID, STE$FinalID), collapse = '|')

Taxa counts are counting taxa twice if in two different SampleTypeCodes

We need to calculate taxa counts (S_NumberTaxa, D_NumberTaxa, etc.) on a dataframe that does not include SampleTypeCode. For example:

Right now:
Epiphyte Species A
Microalgae Species A
Microalgae Species B
S_NumberTaxa == 3

It should be:
Epiphyte Species A
Microalgae Species A
Microalgae Species B
S_NumberTaxa == 2
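The example above, reproduced in base R. The data frame is built from the three rows shown; the SampleID value is invented. Counting distinct FinalID values per sample, ignoring SampleTypeCode, gives the intended result.

```r
sba <- data.frame(
  SampleID       = "S1",
  SampleTypeCode = c("Epiphyte", "Microalgae", "Microalgae"),
  FinalID        = c("Species A", "Species A", "Species B"),
  stringsAsFactors = FALSE
)

# Counting distinct SampleTypeCode/FinalID pairs double counts Species A.
count_with_type <- nrow(unique(sba[, c("SampleTypeCode", "FinalID")]))

# Counting distinct FinalID per sample does not.
S_NumberTaxa <- tapply(sba$FinalID, sba$SampleID,
                       function(x) length(unique(x)))
```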

prop.spp.ZHR

In mmifun.R, I recently saw several concerning things.
One metric, prop.spp.ZHR, which is calculated via mmicalmetrics, shows up in mmifun in the sba.metrics and hybrid.metrics dataframes.

The sba.metrics and hybrid.metrics dataframes had mutate functions on them that assumed a column named prop.spp.ZHR_raw, when in fact the column was just prop.spp.ZHR.

Case insensitive for all metadata headers

From a beta tester: "The ASCI calculator did not recognize 'UCS_mean', which is the output from Tyler's tool. I changed it to 'UCS_Mean' and it worked"

It would be great to accept metadata headers case-insensitively (e.g. UCS_mean, UCS_MEAN, UCS_MeAN, etc.).
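One way to do this is to match incoming headers to the expected names after lowercasing both, then rename to the expected spelling. The helper name `standardize_headers` is hypothetical, and the expected-name vector here is a two-column stand-in for the real list.

```r
# Hypothetical helper: rename columns to the expected spelling when
# they match ignoring case; unmatched columns are left alone.
standardize_headers <- function(df, expected) {
  idx <- match(tolower(names(df)), tolower(expected))
  found <- !is.na(idx)
  names(df)[found] <- expected[idx[found]]
  df
}

stations <- data.frame(StationCode = "A", UCS_mean = 1.5)
stations <- standardize_headers(stations, c("StationCode", "UCS_Mean"))
names(stations)
```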

reformat results output

See the file at "Z:/MarcusBeck/Copy of Rafi's Q&D guide to ASCI report_v2.xls" and the meeting notes from 8/6.

Accept PSA6, PSA6c, PSA6C

Input stations data may have "PSA6", "PSA6c", or "PSA6C", and they should all be accepted and used for the XerMtn calculation.
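A small sketch of accepting the three spellings by renaming whichever one is present to PSA6C before the XerMtn step. The helper name `rename_psa6c` is hypothetical; stopping when more than one variant is present is an assumption about the desired behavior.

```r
psa_aliases <- c("PSA6", "PSA6c", "PSA6C")

# Hypothetical helper: standardize the PSA column name to "PSA6C".
rename_psa6c <- function(stations) {
  hit <- which(names(stations) %in% psa_aliases)
  if (length(hit) > 1) {
    stop("More than one PSA6 column found: ",
         paste(names(stations)[hit], collapse = ", "))
  }
  if (length(hit) == 1) {
    names(stations)[hit] <- "PSA6C"
  }
  stations
}

stations <- rename_psa6c(data.frame(StationCode = "A", PSA6c = "CH"))
names(stations)
```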

Error in as.POSIXct.numeric(value) Origin must be supplied

There is an issue with the package where if you try to run ASCI on a site that has only SBA or only diatom data, it errors out in mmifun.R

I figured this out by running ASCI on a site that had only diatom data and then calling traceback().

Specifically, there is a function called chkmt that gets defined within mmifun.R, and this is where it errors out.

Apparently, if a site has only diatom data, it errors out at chkmt(bugs.sba).

The reverse is true as well. If a site has only SBA data, it errors out at chkmt(bugs.d).

I attached example Excel files for a site that has only diatom data. You can download these files and run ASCI with them, and maybe you will be able to see the issue.

onlydiatom-taxa.xlsx
onlydiatom-station.xlsx

Negative diatom valve counts

When a sample has no "integrated" fraction, diatom valve count should equal 0. Instead, the ASCI calculator is returning negative values. This section of code likely needs to be modified (sorry I can't edit pipes!). Probably lines 68-74 in ASCI.R.

does a seed need to be set?

In the commit a05ab21, the file OE.caret.load.and.source.R was removed but it included a call to set.seed. Is this required or an artifact from index dev?

NULL randomForest object

In the "robert" branch, this shows up in mmifun.R on line 292 (as of now):

hybrid.predmet <- stationid %>%
  mutate(
    # hybrid.cnt.spp.IndicatorClass_TP_high is supposed to be a randomForest model object
    # However, right now it is saying that it is NULL
    cnt.spp.IndicatorClass_TP_high_pred = predict(rfmods$hybrid.cnt.spp.IndicatorClass_TP_high, newdata = .[, c("PPT_00_09", "KFCT_AVE")]),

rfmods$hybrid.cnt.spp.IndicatorClass_TP_high should be a randomForest object, but for some reason it is coming up as NULL. The script then fails because predict was expecting a randomForest object and got NULL instead.
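This does not explain why the model is missing, but a guard like the following would at least fail early with a clearer message. The helper name `get_model` is hypothetical, and the list below is a stand-in for the real rfmods object.

```r
# Hypothetical helper: fetch a model by exact name, or stop and list
# what is actually available.
get_model <- function(models, name) {
  mod <- models[[name]]
  if (is.null(mod)) {
    stop("Model '", name, "' not found in rfmods. Available: ",
         paste(names(models), collapse = ", "))
  }
  mod
}

# Stand-in for rfmods; the real list holds randomForest objects.
rfmods_example <- list(some.other.model = "placeholder")

result <- try(
  get_model(rfmods_example, "hybrid.cnt.spp.IndicatorClass_TP_high"),
  silent = TRUE
)
```

Listing the available names in the error message may also show whether the model was saved under a slightly different name.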

stringr package not loading

Users are reporting an error because the stringr package is not loading correctly, so str_trim is not working.
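Two possible workarounds, sketched here as an assumption rather than a diagnosis: call the function with its namespace (`stringr::str_trim`) so it does not depend on the package being attached, or fall back to base R's `trimws` when stringr is not installed. The wrapper name `trim_ws` is made up.

```r
# Hypothetical wrapper: use stringr when it is installed, otherwise
# fall back to base R.
trim_ws <- function(x) {
  if (requireNamespace("stringr", quietly = TRUE)) {
    stringr::str_trim(x)
  } else {
    trimws(x)
  }
}

trim_ws(c(" Navicula ", "Nitzschia  "))
```

If the package itself uses stringr, listing it under Imports in DESCRIPTION is the usual way to make sure it is installed alongside the package.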

S_EntityCount calculations should ignore diatom taxa

I believe the S_EntityCount calculation (lines 77-84 in the ASCI.R script) does not ignore any diatom taxa, but it should. The way to modify this would be to drop any taxa with Phylum == Bacillariophyta before summing the BAResult column.

  • From an email from Susie
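A base R sketch of the suggested change, on invented example data. Only the column names Phylum and BAResult and the value Bacillariophyta come from the issue; how the actual pipe in ASCI.R is structured is not assumed.

```r
# Example data (invented)
sba_taxa <- data.frame(
  SampleID = c("S1", "S1", "S1", "S2"),
  Phylum   = c("Chlorophyta", "Bacillariophyta", "Cyanobacteria",
               "Bacillariophyta"),
  BAResult = c(10, 200, 5, 50),
  stringsAsFactors = FALSE
)

# Drop diatoms first; %in% also keeps rows where Phylum is NA.
non_diatoms <- sba_taxa[!(sba_taxa$Phylum %in% "Bacillariophyta"), ]

# Sum BAResult per sample over the remaining taxa.
S_EntityCount <- tapply(non_diatoms$BAResult, non_diatoms$SampleID,
                        sum, na.rm = TRUE)
```

Note that a sample containing only diatoms (S2 here) drops out of the result entirely, so it may need to be added back as 0 or NA depending on what the report expects.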

Version number

Version number is still 1.1.1. It needs to be updated to 2.1.x.

Demo data not running

Getting the following error message when running the demo data:
Error in prop.spp.BCG12/richness :
non-numeric argument to binary operator

Thanks!

Fix non-ASCII characters in rdata files for build

devtools::check() returns the following:

  • checking data for non-ASCII characters ... WARNING
    Warning: found non-ASCII strings
    'Terpsino% musica' in object 'pmmilkup'
    'Anabaenaf' in object 'pmmilkup'
    'ChlorotetraTdron' in object 'pmmilkup'

Issue with devtools

From Marco: I was running R 3.4.1 but updated to R 3.5.1 because the ASCI calculator says I need R 3.5.0 or later. My problem, though, is I cannot install 'devtools'. I have spent hours searching the internet to try to solve my problem and Heili was unable to help. Have you had any issues updating? Some of the packages (e.g., processx, callr, cachem, memoise) have a binary version earlier than the source version, and the ASCI package requires the later source version but my system will not update accordingly. I am including some screen shots so you can see some of the errors. If I cannot figure this out, I cannot calculate ASCI for Andy. Any help would be much appreciated.

Update STE

Need to update STE to include new Final IDs from SWAMP/CEDEN, and also add a feature to accept both new and old taxa names and calculate scores with the appropriate trait attributes.

Error messages for single assemblage or wrong assemblage

The package should return an error message if:

  1. User has SBA names in an “integrated” fraction
  2. User has diatom names in “microalgae” or “macroalgae” or “qualitative” fractions
  3. The ASCI output should only return D_ASCI if only “integrated” fraction is submitted (no H_ASCI calculated). Also, only S_ASCI calculated if only “microalgae” and/or “macroalgae” fractions are submitted. H_ASCI only returned if BOTH “integrated” + micro/macroalgae fractions are submitted.
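Item 3 can be sketched as a small function that decides which indices to return from the fractions present in a sample. The function name `indices_to_return` is hypothetical. Following item 3 as written, the "qualitative" fraction is not counted toward S_ASCI here; that may need to change.

```r
# Hypothetical helper: which indices apply, given the fraction names
# submitted for one sample.
indices_to_return <- function(fractions) {
  fractions <- tolower(unique(fractions))
  has_diatom <- "integrated" %in% fractions
  has_sba <- any(c("microalgae", "macroalgae") %in% fractions)
  c(
    "D_ASCI"[has_diatom],
    "S_ASCI"[has_sba],
    "H_ASCI"[has_diatom && has_sba]
  )
}

indices_to_return("Integrated")
indices_to_return(c("Microalgae", "Macroalgae"))
indices_to_return(c("Integrated", "Microalgae"))
```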

Rename NumberTaxa

I noticed another concerning thing in mmifun.R

There is a part where it is doing some stuff with hybrid.metrics and sba.metrics and it is saying

%>%
rename(
#NumberTaxa = richness_raw
NumberTaxa = 10
)

It didn't like this at all, saying that "there are only 9 columns"

As a temporary "fix" I commented it out and it gets past that part
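Renaming by name instead of by position would avoid the "only 9 columns" failure. The sketch below assumes the intended source column is richness_raw, as the commented-out line suggests; the helper name `rename_if_present` and the example data are invented.

```r
# Hypothetical helper: rename a column by name, warning (rather than
# failing) when the column is not there.
rename_if_present <- function(df, old, new) {
  hit <- names(df) == old
  if (any(hit)) {
    names(df)[hit] <- new
  } else {
    warning("Column '", old, "' not found; nothing renamed")
  }
  df
}

metrics <- data.frame(SampleID = "S1", richness_raw = 12)
metrics <- rename_if_present(metrics, "richness_raw", "NumberTaxa")
names(metrics)
```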
