Coder Social home page Coder Social logo

sptotal's Introduction

Hello ๐Ÿ‘‹

I am an Assistant Professor of Statistics at St. Lawrence University in Canton, NY โ„๏ธ โ„๏ธ

I graduated from Oregon State University in 2019 with a PhD in stats ๐ŸŽ“

I've worked on applications of spatial statistics in the ecological sciences ๐ŸŒŽ, and I'm currently learning how to write better R packages for spatial and spatiotemporal analyses ๐Ÿ“ฆ

sptotal's People

Contributors

brycefrank avatar highamm avatar jayverhoef avatar michaeldumelle avatar pmaurogut avatar

Stargazers

 avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

sptotal's Issues

code quality & test coverage

code quality checks by goodpractice::gp(".") return the following issues:

Preparing: covr
Preparing: cyclocomp
Preparing: description
Preparing: lintr
Preparing: namespace
Preparing: rcmdcheck
โ”€โ”€ GP sptotal โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€

It is good practice to

  โœ– write unit tests for all functions, and all package code in
    general. 43% of code lines are covered by test cases.

    R/AIC.slmfit.R:17:NA
    R/AIC.slmfit.R:18:NA
    R/coef.slmfit.R:14:NA
    R/coef.slmfit.R:15:NA
    R/coef.slmfit.R:16:NA
    ... and 503 more lines

  โœ– omit "Date" in DESCRIPTION. It is not required and it gets
    invalid quite often. A build date will be added to the package when
    you perform `R CMD build` on it.
  โœ– use '<-' for assignment instead of '='. '<-' is the
    standard, and R users and developers are used it and it is easier
    to read your code for them if you use '<-'.

    inst/scripts/create_simulated_data.r:4:4
    inst/scripts/create_simulated_data.r:12:5
    inst/scripts/create_simulated_data.r:18:4
    inst/scripts/create_simulated_data.r:20:4
    inst/scripts/create_simulated_data.r:22:4
    ... and 22 more lines

  โœ– avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/AIC.slmfit.R:5:81
    R/AIC.slmfit.R:6:81
    R/data.R:27:81
    R/data.R:32:81
    R/data.R:39:81
    ... and 114 more lines

  โœ– avoid sapply(), it is not type safe. It might return a
    vector, or a list, depending on the input data. Consider using
    vapply() instead.

    R/slmfit.R:214:15
    R/slmfit.R:216:17
    R/slmfit.R:216:40

  โœ– avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...)
    and 1:NCOL(...) expressions. They are error prone and result 1:0 if
    the expression on the right hand side is zero. Use seq_len() or
    seq_along() instead.

    inst/scripts/create_simulated_data.r:38:58
    inst/scripts/create_simulated_data.r:41:58
    R/utils.R:101:15
    vignettes/sptotal-vignette.Rmd:178:17
    vignettes/sptotal-vignette.Rmd:471:21

  โœ– not import packages as a whole, as this can cause name
    clashes between the imported packages. Instead, import only the
    specific functions you need.

increasing test coverage, replacing 1: iterators, and cleaning up/reducing the imports (if possible) seem important to me, the rest is mroe or less cosmetic...

Paper comments

Paper looks great! Some feedback from my review (all from the statement of need section):

What assumptions are made on the distribution of epsilon? I would assume it's normal, but might not be clear for someone new to spatial stats.

Additionally, you denote that y has N locations but don't use N anywhere else. You could use N to clarify some of the model terms, for example, R is N x N, providing spatial correlation from each of the N locations to all others. (similar to the above point, it may help someone who is familiar to linear models but not spatial stats)

Otherwise, everything reads clear to me and I learned quite a bit about what the package does. It's clear the advantages this method has over other available approaches.

LatLon Assumption

In other statistical packages I have used (i.e. geoR) it is left to the user to define and handle the coordinate system. I see here we make the assumption that they are already Lat/Lon:

sptotal/R/slmfit.R

Lines 84 to 95 in e0f7f50

## ASSUME that coordinates are lat/lon. Convert these to UTM
if (coordtype != "LatLon" & coordtype != "UTM") {
stop("coordtype must be a character string LatLon or UTM")
} else if (coordtype == "LatLon") {
xcoordsUTM <- LLtoUTM(cm = base::mean(datanomiss[ ,xcoordcol]),
lat = datanomiss[ ,ycoordcol], lon = datanomiss[ ,xcoordcol])$xy[ ,1]
ycoordsUTM <- LLtoUTM(cm = base::mean(datanomiss[ ,xcoordcol]),
lat = datanomiss[ ,ycoordcol], lon = datanomiss[ ,xcoordcol])$xy[ ,2]
} else if (coordtype == "UTM") {
xcoordsUTM <- datanomiss[ ,xcoordcol]
ycoordsUTM <- datanomiss[ ,ycoordcol]
}

This has been a point of discussion in meetings. I would rather not assume anything about the data structure, and leave it to the user to reproject. Or, if we are wedded the the reprojecting idea, we need to at least print a warning when slmfit runs.

Error on `slmfit` without factor

I am writing some tests on my own branch, so not sure if this is just me. The following fit using the simdata throws an error:

fit <- sptotal::slmfit(Z ~ X1 + X2 + X3 + X4+ X5 + X6 + X7, data=simdata, xcoordcol = "x", ycoordcol="y")

The error:

Error in max(sapply(datapredsonly[, sapply(datapredsonly, is.factor)],  : 
  invalid 'type' (list) of argument

Fitting again, but with a factor variable resolves the issue:

fit <- sptotal::slmfit(Z ~ X1 + X2 + X3 + X4+ X5 + X6 + X7 + F1, data=simdata, xcoordcol = "x", ycoordcol="y")

slmfit fails on simulated data

Matt,

Like we discussed earlier today. Leaving this issue here for tracking.

The following simulation lead to an error using slmfit. Here is the code:

pop197 <- read.csv('sims/h10_l10/197.csv')
slmfit(z~x+y, xcoordcol="x", ycoordcol="y", data=pop197, CorModel="Exponential")

However, it does not fail using "Spherical"

pop197 <- read.csv('sims/h10_l10/197.csv')
slmfit(z~x+y, xcoordcol="x", ycoordcol="y", data=pop197, CorModel="Spherical")

Nor does it fail for maximum likelihood:

pop197 <- read.csv('sims/h10_l10/197.csv')
slmfit(z~x+y, xcoordcol="x", ycoordcol="y", data=pop197, estmethod = "ML")

The csv is attached:
197.csv.zip

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.