r-glennie / occur

Occupancy modelling with generalized additive models in Template Model Builder.

License: MIT License

R 84.27% Makefile 0.56% C 3.67% C++ 11.50%

occur's Introduction

I am a Research Fellow in Statistics at the University of St Andrews. I work on ecological problems, e.g., estimating the size of wild animal populations, quantifying how populations change over time, or finding out how animals move around.

I am actively developing software to implement my research methods, e.g. r-glennie/openpopscr and r-glennie/moveds.

Full details on my current research projects can be found at www.richardglennie.co.uk.

I can be contacted by email at [email protected] or on Twitter @richard_glennie if you'd like to discuss a research project. If you have any questions or issues with code in my repositories (which is provided without warranty), please open an issue in the corresponding GitHub repository.

occur's People

Forkers

patchcervan

occur's Issues

Is scaling variables recommended for fitting purposes?

I've noticed that occuR seems to fit models without much trouble even when covariates are on very different scales. Does the package scale them "under the hood"? Would it be recommended to scale/center covariates anyway? In my case, I am more interested in prediction than in causal inference.
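
For readers who want to standardise covariates by hand, a minimal sketch in base R follows. Whether occuR rescales covariates internally is not answered in this thread, so this is only an illustration; the data frames site_cov and new_cov and the covariate elev are hypothetical. The point for prediction is that new data must be transformed with the centre and scale computed from the fitting data.

```r
# Hypothetical site covariates on very different scales
set.seed(1)
site_cov <- data.frame(elev = runif(20, 0, 3000),   # metres
                       temp = rnorm(20, 10, 5))     # degrees

# Standardise elevation, keeping the centre and scale used for fitting
elev_mean <- mean(site_cov$elev)
elev_sd <- sd(site_cov$elev)
site_cov$elev_z <- (site_cov$elev - elev_mean) / elev_sd

# Prediction data must reuse the centre and scale from the fitting data,
# not be standardised on its own
new_cov <- data.frame(elev = c(500, 1500))
new_cov$elev_z <- (new_cov$elev - elev_mean) / elev_sd
```

The standardised column (elev_z here) would then be the one named in the model formula, both when fitting and when predicting.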

installation

Hi Richard,

This looks like a super useful package!! I have been trying to fit occupancy models with splines in Stan and JAGS, but it's slow.

However, I have a problem. I am trying to install the package on our HPC server (since I have lots of species) but it isn't working. I discussed it with our IT guys and they said:

"The problem is that, for whatever reason, this R package is trying to install itself to where R is installed, not to your home directory. That's why you get the error about not having write permissions to the location where R is installed.
I even tried from a clean slate, i.e. I removed my ~/R directory. Normally, when using install.packages or install_github, R will ask you where to install the packages, and suggests ~/R if you don't have it already. That's not the case with this R package and that's why it's broken."

Any ideas of how to fix this?

Cheers,

Diana
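
A possible workaround, sketched under assumptions: put a user-writable library at the front of R's library search path before installing, so that installers which default to the first library path do not target the system location. The thread does not confirm that this resolves the problem on the HPC server, and the fallback directory below is a hypothetical choice.

```r
# Use the per-user library R advertises, or a hypothetical fallback in the home directory
user_lib <- Sys.getenv("R_LIBS_USER")
if (!nzchar(user_lib)) user_lib <- file.path("~", "R", "library")
user_lib <- path.expand(user_lib)

# Create it if needed and put it first on the library search path
dir.create(user_lib, recursive = TRUE, showWarnings = FALSE)
.libPaths(c(user_lib, .libPaths()))

# With the user library first, an install should target it by default.
# Uncomment to install (requires the remotes package and network access):
# remotes::install_github("r-glennie/occuR")
```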

Fatal error with simulated data

Hi, I've written a function to simulate data involving multiple species, all declining in occupancy over time. The data it's outputting looks correct to me, but when I try to fit the model to it I get a "fatal error session aborted" message. It outputs "Constructing atomic D_lgamma" just before it aborts.

When I have

set.seed(1234)

It runs fine, but if I change it to a different number (for example 123) I get the error.

Here's all the code (sorry, I can't figure out how to just upload the simulated data).

set.seed(1234)
library(secr)
library(occuR)
library(data.table)
library(dplyr)


# Function: multispecies_sim
# Purpose: Simulate hoverfly data with multiple species, absences only when other species are detected
# All species have the same occupancy probability, but can have different detection probability
# Occupancy probability depends on habitat and occasion
# Inputs: nsites - number of sites
#         nspecies - number of species
#         noccasion - number of occasions
#         mean_visits - mean visits per site per occasion (generated as a random poisson)
#         p - vector of detection probability of each species
#         beta - [b1, b2, b3, b4] where logit(psi) = b1 + b2*Urban + b3*Woodland + b4*occasion
#         species - species of interest (the one whose observations get returned) - later will make it so they all get returned
# Output: list(visit, site), visit = visit_data and site = site_data
#
multispecies_sim <- function(nsites, nspecies, noccasion, mean_visits, p, beta, species){
  
  # Random number of visits to each site in each occasion
  visits <- matrix(rpois(nsites*noccasion, mean_visits), nrow = noccasion, ncol = nsites)
  
  # Simulate some covariates 
  temp <- do.call("c", sapply(visits, FUN = function(v) {rnorm(v, 10, 5)}))
  
  habitat_options <- c("arable", "woodland", "urban")
  habitat <- sample(habitat_options, nsites, replace = TRUE)
  hab_list <- rep(habitat, each=noccasion)
  
  #occasion <- do.call("c", apply(visits, 2, FUN = function(v) {rep(1:noccasion, v)}))
  occasion <- rep(1:noccasion, nsites)

  # Occupancy probability (one value per site-occasion, matching rbinom below)
  psi <- rep(0.7, nsites*noccasion)
  #psi <- invlogit(beta[1] + beta[2]*(hab_list == "urban") + beta[3]*(hab_list == "woodland") + beta[4]*occasion)

  # Whether sites are occupied or not
  occupied <- array(NA, dim=c(nsites*noccasion, nspecies))
  for(i in 1:nspecies){
    occupied[,i] <- rbinom(nsites*noccasion, 1, psi)
  } #Could this be done more efficiently?
  
  # Observations for each species, in order of site, then occasion, then visit
  rows <- rep(1:nrow(occupied), as.numeric(visits))
  obs <- apply(occupied[rows,], 1, FUN = function(x){rbinom(nspecies, 1, p*x)})
  #This switches the dimensions of the matrix - is it wrong?
  #obs <- matrix(NA, nrow = nsites*)
  #row_ticker <- 1
  #for(i in 1:nsites){
  #  for(j in 1:noccasion){
  #    if(visits[j, i] != 0){
  #      for(k in 1:visits[j,i]){
  #        detec <- occupied[((i-1)*noccasion + j),]*rbinom(nspecies,1,p)
  #        if(sum(detec) != 0){
  #          obs[row_ticker,] <- detec
  #          row_ticker <- row_ticker + 1
  #        }
  #      }
  #    }
  #  }
  #}
  
  tokeep <- which(colSums(obs) != 0)
  
  visit_data <- data.table(site = rep(1:nsites, each = noccasion),
                           occasion = rep(1:noccasion), 
                           hab = hab_list,
                           occupied = occupied[,species]) 
  
  nrep <- rep(1:nrow(visit_data), as.numeric(visits))
  visit_data <- visit_data[nrep,]
  #visit_data$visit <- do.call("c", apply(visits, c(2,1), FUN = function(v) {1:v}))
  
  visit_data$temp <- temp
  visit_data$obs <- obs[species,]
  
  visit_data <- visit_data[tokeep,]
  visit_data$visit <- visit_data[,.(visit = 1:.N), .(site, occasion)]$visit
  
  
  # Construct site data
  # x and y coordinates are irrelevant right now, so can be random.
  # will need to sort this out in the future
  x_coor <- rnorm(nsites,0,1)
  y_coor <- rnorm(nsites, 0, 1)
  site_data <- data.table(site = rep(1:nsites, each = noccasion), 
                          occasion = 1:noccasion, 
                          hab = rep(habitat, each = noccasion), 
                          x = rep(x_coor, each = noccasion), 
                          y = rep(y_coor, each = noccasion))
  
  return(list(visit = visit_data, site = site_data))
}

x <- multispecies_sim(nsites = 50, nspecies = 5, noccasion = 10, 
                      mean_visits = 10, p = rep(0.7, 5), 
                      beta = c(2, -1, 0.4, -1), species = 1)
fit1 <- fit_occu(list(psi ~ 1, p ~ 1), visit_data = x$visit, site_data = x$site)
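
One thing worth checking before fitting, offered as an unconfirmed possibility rather than a diagnosis: the tokeep filter drops every visit on which no species was detected, so some site-occasion pairs listed in site_data may end up with no rows at all in visit_data, and which pairs are affected would change with the seed. The sketch below uses hypothetical toy data in base R to show how such pairs can be listed; the same indexing should work on the x$site and x$visit tables.

```r
# Hypothetical toy data: site 2, occasion 2 is in site_data but has no visits
site_data <- data.frame(site = rep(1:2, each = 2), occasion = rep(1:2, times = 2))
visit_data <- data.frame(site = c(1L, 1L, 1L, 2L),
                         occasion = c(1L, 1L, 2L, 1L),
                         visit = c(1L, 2L, 1L, 1L),
                         obs = c(1, 0, 1, 0))

# Build a site-occasion key for each table and find keys with no visits
site_key <- paste(site_data$site, site_data$occasion, sep = "_")
visit_key <- paste(visit_data$site, visit_data$occasion, sep = "_")
missing <- site_data[!(site_key %in% visit_key), ]

print(missing)
```

If missing has any rows, removing those site-occasions from site_data (or keeping their all-zero visits) before calling fit_occu would be one thing to try.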

R abortion due to TMB package

When using the fit_occu function, my R session ends fatally at this section of the function (found by running it line by line):

## CREATE MODEL OBJECT
oo <- MakeADFun(data = tmb_dat, parameters = tmb_par, map = map, random = random, DLL = "occu_tmb", silent = !print)

I've been looking up issues from the TMB side, but I'm able to get the package to run for all other models; it is just one that is broken. Is there any known documentation of this issue, or any workarounds?

Added help files

Hi @r-glennie, I've created a pull request adding the function help files to the man directory, so that one can access help from R, in case that is useful. I've left the function structure as is in occu_tmb.R. Perhaps dividing it into a few files would make things easier to find, but I'd rather leave that to your style!

ddd

None of this work.

cod < h

Do sites need to be named sequentially?

Hi @r-glennie, thanks so much for sharing the package. I have to run hundreds of multi-year occupancy models and this package makes it super efficient!

One question: I found the condition that sites need to be named sequentially a bit restrictive. Sequentially in the sense that sites need to be labelled from 1 to max(site) without missing any numbers in the sequence.

From line #223 in occu_tmb.R:

# check site data 
num <- site_data[, .(max = max(site), n = uniqueN(site))]
if (any(num$max != num$n)) stop("site_data has missing sites or sites are mis-numered: ", num[which(num$max  != num$n,)])

Sometimes one might decide to remove sites from the analysis, and is then forced to re-label all the other sites. It is not a major issue, but I don't completely understand the reason for it. I was thinking of alternative ways to write this condition, but I couldn't pinpoint the exact problem it is meant to prevent. You probably have a better idea of why non-sequential sites are a problem?

Thanks!

Here is a toy example of what I am talking about:

library(dplyr)
library(occuR)

# Parameters for simulating data
psi <- 0.8
p <- 0.9
nsites <- 5
noccasions <- 1
nvisits <- 10

## Name sites sequentially and fit model ##

sims <- data.frame(site = rep(seq_len(nsites), each = nvisits), # this labels sites sequentially
                   occasion = 1,
                   visit = rep(seq_len(nvisits), times = nsites),
                   occu = rep(rbinom(nsites, 1, psi), each = nvisits)) %>%
    mutate(obs = rbinom(nrow(.), 1, p*occu)) # occu already reflects psi, so detection is p*occu

site_data <- sims %>%
    group_by(site) %>%
    summarize(occasion = unique(occasion)) %>%
    data.table::as.data.table()

visit_data <- sims %>%
    data.table::as.data.table()

# Fit simple model (works well with all sites)
mod <- occuR::fit_occu(list(psi ~ 1, p ~ 1), visit_data, site_data)

## DON'T name sites sequentially and fit model ##

# Remove one site
site_data <- site_data %>%
    filter(site != 2)

visit_data <- visit_data %>%
    filter(site != 2)

# Fit simple model (doesn't work, because there is no site number 2)
mod <- occuR::fit_occu(list(psi ~ 1, p ~ 1), visit_data, site_data)
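
Until the check is relaxed, one way to satisfy it is to relabel the remaining sites as 1 to n while keeping the original labels for later lookup. This is a minimal base R sketch with hypothetical toy data; the column name site_orig and the lookup vector site_key are names introduced here for illustration.

```r
# Hypothetical data after removing site 2: labels now have a gap
site_data <- data.frame(site = c(1, 3, 4, 5), occasion = 1)
visit_data <- data.frame(site = rep(c(1, 3, 4, 5), each = 2),
                         occasion = 1,
                         visit = rep(1:2, times = 4))

# Lookup of the original labels; position in this vector is the new label
site_key <- sort(unique(site_data$site))

# Keep the original labels, then relabel both tables consistently as 1..n
site_data$site_orig <- site_data$site
visit_data$site_orig <- visit_data$site
site_data$site <- match(site_data$site_orig, site_key)
visit_data$site <- match(visit_data$site_orig, site_key)
```

After fitting, site_key[site] maps a relabelled site back to its original label.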
