highamm / sptotal Goto Github PK

View Code? Open in Web Editor NEW

4.0 4.0 2.0 11.08 MB

Package for Fitting Spatial Linear Models and Predicting Total Abundances for Finite Populations

License: GNU General Public License v2.0

R 97.21% TeX 2.79%

sptotal's Introduction

Hello 👋

I am an Assistant Professor of Statistics at St. Lawrence University in Canton, NY ❄️ ❄️

I graduated from Oregon State University in 2019 with a PhD in stats 🎓

I've worked on applications of spatial statistics in the ecological sciences 🌎, and I'm currently learning how to write better R packages for spatial and spatiotemporal analyses 📦

sptotal's People

Contributors

Stargazers

Watchers

Forkers

brycefrank adf-g-wildlife-conservation-region-iii

sptotal's Issues

code quality & test coverage

code quality checks by goodpractice::gp(".") return the following issues:

Preparing: covr
Preparing: cyclocomp
Preparing: description
Preparing: lintr
Preparing: namespace
Preparing: rcmdcheck
── GP sptotal ──────────────────────────────────────────────────────────────────

It is good practice to

  ✖ write unit tests for all functions, and all package code in
    general. 43% of code lines are covered by test cases.

    R/AIC.slmfit.R:17:NA
    R/AIC.slmfit.R:18:NA
    R/coef.slmfit.R:14:NA
    R/coef.slmfit.R:15:NA
    R/coef.slmfit.R:16:NA
    ... and 503 more lines

  ✖ omit "Date" in DESCRIPTION. It is not required and it gets
    invalid quite often. A build date will be added to the package when
    you perform `R CMD build` on it.
  ✖ use '<-' for assignment instead of '='. '<-' is the
    standard, and R users and developers are used it and it is easier
    to read your code for them if you use '<-'.

    inst/scripts/create_simulated_data.r:4:4
    inst/scripts/create_simulated_data.r:12:5
    inst/scripts/create_simulated_data.r:18:4
    inst/scripts/create_simulated_data.r:20:4
    inst/scripts/create_simulated_data.r:22:4
    ... and 22 more lines

  ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/AIC.slmfit.R:5:81
    R/AIC.slmfit.R:6:81
    R/data.R:27:81
    R/data.R:32:81
    R/data.R:39:81
    ... and 114 more lines

  ✖ avoid sapply(), it is not type safe. It might return a
    vector, or a list, depending on the input data. Consider using
    vapply() instead.

    R/slmfit.R:214:15
    R/slmfit.R:216:17
    R/slmfit.R:216:40

  ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...)
    and 1:NCOL(...) expressions. They are error prone and result 1:0 if
    the expression on the right hand side is zero. Use seq_len() or
    seq_along() instead.

    inst/scripts/create_simulated_data.r:38:58
    inst/scripts/create_simulated_data.r:41:58
    R/utils.R:101:15
    vignettes/sptotal-vignette.Rmd:178:17
    vignettes/sptotal-vignette.Rmd:471:21

  ✖ not import packages as a whole, as this can cause name
    clashes between the imported packages. Instead, import only the
    specific functions you need.

increasing test coverage, replacing 1: iterators, and cleaning up/reducing the imports (if possible) seem important to me, the rest is mroe or less cosmetic...

Paper comments

Paper looks great! Some feedback from my review (all from the statement of need section):

What assumptions are made on the distribution of epsilon? I would assume it's normal, but might not be clear for someone new to spatial stats.

Additionally, you denote that y has N locations but don't use N anywhere else. You could use N to clarify some of the model terms, for example, R is N x N, providing spatial correlation from each of the N locations to all others. (similar to the above point, it may help someone who is familiar to linear models but not spatial stats)

Otherwise, everything reads clear to me and I learned quite a bit about what the package does. It's clear the advantages this method has over other available approaches.

LatLon Assumption

In other statistical packages I have used (i.e. geoR) it is left to the user to define and handle the coordinate system. I see here we make the assumption that they are already Lat/Lon:

sptotal/R/slmfit.R

Lines 84 to 95 in e0f7f50

    
           ## ASSUME that coordinates are lat/lon. Convert these to UTM 
        
           if (coordtype != "LatLon" & coordtype != "UTM") { 
        
             stop("coordtype must be a character string LatLon or UTM") 
        
           } else if (coordtype == "LatLon") { 
        
             xcoordsUTM <- LLtoUTM(cm = base::mean(datanomiss[ ,xcoordcol]), 
        
               lat = datanomiss[ ,ycoordcol], lon = datanomiss[ ,xcoordcol])$xy[ ,1] 
        
             ycoordsUTM <- LLtoUTM(cm = base::mean(datanomiss[ ,xcoordcol]), 
        
               lat = datanomiss[ ,ycoordcol], lon = datanomiss[ ,xcoordcol])$xy[ ,2] 
        
           } else if (coordtype == "UTM") { 
        
             xcoordsUTM <- datanomiss[ ,xcoordcol] 
        
             ycoordsUTM <- datanomiss[ ,ycoordcol] 
        
           }

This has been a point of discussion in meetings. I would rather not assume anything about the data structure, and leave it to the user to reproject. Or, if we are wedded the the reprojecting idea, we need to at least print a warning when slmfit runs.

Error on `slmfit` without factor

I am writing some tests on my own branch, so not sure if this is just me. The following fit using the simdata throws an error:

fit <- sptotal::slmfit(Z ~ X1 + X2 + X3 + X4+ X5 + X6 + X7, data=simdata, xcoordcol = "x", ycoordcol="y")

The error:

Error in max(sapply(datapredsonly[, sapply(datapredsonly, is.factor)],  : 
  invalid 'type' (list) of argument

Fitting again, but with a factor variable resolves the issue:

fit <- sptotal::slmfit(Z ~ X1 + X2 + X3 + X4+ X5 + X6 + X7 + F1, data=simdata, xcoordcol = "x", ycoordcol="y")

slmfit fails on simulated data

Matt,

Like we discussed earlier today. Leaving this issue here for tracking.

The following simulation lead to an error using slmfit. Here is the code:

pop197 <- read.csv('sims/h10_l10/197.csv')
slmfit(z~x+y, xcoordcol="x", ycoordcol="y", data=pop197, CorModel="Exponential")

However, it does not fail using "Spherical"

pop197 <- read.csv('sims/h10_l10/197.csv')
slmfit(z~x+y, xcoordcol="x", ycoordcol="y", data=pop197, CorModel="Spherical")

Nor does it fail for maximum likelihood:

pop197 <- read.csv('sims/h10_l10/197.csv')
slmfit(z~x+y, xcoordcol="x", ycoordcol="y", data=pop197, estmethod = "ML")

The csv is attached:
197.csv.zip

`FPBKOutput` `get_variogram` Assumes that coordinates are named "xcoords" and "ycoords"

sptotal/R/FPBKoutput.R

Line 90 in 21870c4

locations = ~ xcoords + ycoords,

highamm / sptotal Goto Github PK

sptotal's Introduction

Hello 👋

sptotal's People

Contributors

Stargazers

Watchers

Forkers

sptotal's Issues

code quality & test coverage

Paper comments

LatLon Assumption

Error on `slmfit` without factor

slmfit fails on simulated data

`FPBKOutput` `get_variogram` Assumes that coordinates are named "xcoords" and "ycoords"

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

	## ASSUME that coordinates are lat/lon. Convert these to UTM
	if (coordtype != "LatLon" & coordtype != "UTM") {
	stop("coordtype must be a character string LatLon or UTM")
	} else if (coordtype == "LatLon") {
	xcoordsUTM <- LLtoUTM(cm = base::mean(datanomiss[ ,xcoordcol]),
	lat = datanomiss[ ,ycoordcol], lon = datanomiss[ ,xcoordcol])$xy[ ,1]
	ycoordsUTM <- LLtoUTM(cm = base::mean(datanomiss[ ,xcoordcol]),
	lat = datanomiss[ ,ycoordcol], lon = datanomiss[ ,xcoordcol])$xy[ ,2]
	} else if (coordtype == "UTM") {
	xcoordsUTM <- datanomiss[ ,xcoordcol]
	ycoordsUTM <- datanomiss[ ,ycoordcol]
	}