Fit Gamma-Gompertz model to capture-recapture data using data cloning

Topics: capture-recapture, data-cloning, mcmc, bayesian-inference, survival, maximum-likelihood-estimation, individual-heterogeneity

gammagompertzcr's Introduction


In brief

GammaGompertzCR is an R package for estimating survival in free-ranging animal populations using a Gompertz capture-recapture model with a Gamma frailty to deal with individual heterogeneity. It uses data cloning in the Bayesian framework to get maximum likelihood parameter estimates (Lele et al. 2007). Data cloning fits the model to multiple copies of the data, which produces prior-invariant inferences and posterior distributions that are approximately normal and centered at the maximum likelihood estimates. In addition, this method allows detecting non-identifiable parameters (Lele et al. 2010). To use the package, users should be familiar with Bayesian MCMC techniques, and in particular with how to interpret convergence diagnostics. We refer to Robert and Casella (2010) for an introduction to Bayesian MCMC techniques with R, and to King et al. (2009) for an introduction for ecologists.
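To make the idea of cloning concrete, here is a minimal sketch using the dclone function from the dclone package (loaded later in this document via dcmle); the toy vector y is hypothetical:

library(dclone)
y <- c(1, 2, 1, 1)      # a toy encounter history (1 = detected, 2 = not detected)
dclone(y, n.clones = 3) # the same data stacked three times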

Installation

This repository hosts the development version of the package. Assuming JAGS is already on your system (otherwise it will only take a minute to get it), you can install GammaGompertzCR as follows:

if(!require(devtools)) install.packages("devtools")
library("devtools")
install_github('oliviergimenez/GammaGompertzCR')

Getting help, reporting an issue or contributing

To report bugs/issues/feature requests, please file an issue. These are very welcome!

Feel free to contribute to the repository by forking and submitting a pull request. If you are new to GitHub, you might start with a basic tutorial and check out a more detailed guide to pull requests.

If you prefer email, feel free to drop a message to Gilbert and Olivier (see the DESCRIPTION file of this repository for their contact details).

A session example

Preliminary steps

First, we need to load the GammaGompertzCR package, as well as the dcmle package, which we will also need:

library(GammaGompertzCR)
## Thanks for using package GammaGompertzCR. 
##     Check out https://github.com/oliviergimenez/GammaGompertzCR for an example.
library(dcmle)
## Loading required package: dclone
## Loading required package: coda
## Loading required package: parallel
## Loading required package: Matrix
## dclone 2.1-2 	 2016-03-11
## dcmle 0.3-1 	 2016-03-11
## 
## Attaching package: 'dcmle'
## The following objects are masked from 'package:coda':
## 
##     chanames, crosscorr.plot, gelman.diag, gelman.plot,
##     geweke.diag, heidel.diag, raftery.diag, varnames

Then, we read in some data. Below we use dipper_tiny, a fragment of the dipper dataset used by Marzolin et al. (2011) that is provided with the package. The complete dataset dipper_complete is also provided. Detections are the 1's, non-detections the 2's.

data(dipper_tiny)
head(dipper_tiny)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    1    1    1    1    1    1    1    1    2
## [2,]    1    1    1    1    1    1    1    2    2
## [3,]    1    1    1    1    1    1    2    2    2
## [4,]    1    1    1    1    1    2    2    2    2
## [5,]    1    1    1    2    2    2    2    2    2
## [6,]    1    1    1    2    2    2    2    2    2

Model fitting

Now let's fit the Gamma-Gompertz model to these capture-recapture data:

clo <- c(1, 10, 50, 100) # numbers of clones
nu <- 1000 # number of updates
ni <- 5000 # number of iterations
nt <- 5 # thinning
nc <- 3 # number of chains
initmeans <- c(0.15, 0.5, 4.9, 1.4, -0.4) # means of the normal priors for the parameters
initprec <- 1000 # common precision of the normal priors for the first number of clones

# fitting procedure (this may take a few minutes)
post_inf <- fit_ggcr(dipper_tiny, clo, nu, ni, nt, nc, initmeans, initprec)
## 
## Parallel computation in progress
## 
## 
## Parallel computation in progress
## 
## 
## Parallel computation in progress
## 
## 
## Parallel computation in progress

Let's display the parameter estimates from the posterior distributions:

summary(post_inf[[length(clo)]])
## 
## Iterations = 2005:7000
## Thinning interval = 5 
## Number of chains = 3 
## Sample size per chain = 1000 
## Number of clones = 100
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##          Mean      SD  DC SD  Naive SE Time-series SE R hat
## a      0.1182 0.01694 0.1694 0.0003093      0.0003092 1.002
## b      0.4871 0.02856 0.2856 0.0005215      0.0005147 1.000
## k      4.8999 0.03105 0.3105 0.0005670      0.0005912 1.000
## psi    1.4020 0.03131 0.3131 0.0005716      0.0005625 1.001
## sigeta 0.6713 0.02142 0.2142 0.0003912      0.0003854 1.003
## 
## 2. Quantiles for each variable:
## 
##           2.5%    25%    50%    75%  97.5%
## a      0.08749 0.1063 0.1176 0.1294 0.1539
## b      0.43243 0.4677 0.4872 0.5067 0.5427
## k      4.83694 4.8789 4.9006 4.9213 4.9593
## psi    1.34133 1.3811 1.4006 1.4229 1.4640
## sigeta 0.63044 0.6565 0.6709 0.6858 0.7140

Let's represent survival as a function of age at the population level.

# get point estimates from the run with the largest number of clones
a <- coef(post_inf[[length(clo)]])[1] # Gompertz baseline mortality
b <- coef(post_inf[[length(clo)]])[2] # Gompertz rate of ageing
k <- coef(post_inf[[length(clo)]])[3] # shape of the Gamma frailty
grid_age <- seq(0, 8, 1)
# population-level Gamma-Gompertz survival over the age grid
S <- exp(-k * log(1 + (exp(b * grid_age) - 1) * a / (b * k)))
plot(grid_age, S, xlab = 'age', ylab = 'estimated survival', lwd = 2, type = 'l')
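For reference, the quantity S computed above is the closed-form population-level survival of the Gamma-Gompertz model (cf. Missov 2013):

$$
S(x) = \left(1 + \frac{a}{b\,k}\left(e^{b\,x} - 1\right)\right)^{-k}
$$

which is algebraically identical to the exp(-k * log(...)) expression used in the code.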

How to choose the precision, number of iterations and clones?

For the Gamma-Gompertz model, all parameters are positive. Begin with a fairly moderate precision, say 10, for the normal priors. This precision only applies to the run with the first element of the vector clo; for the second element and the following ones, the mean and precision of each parameter prior are set to the posterior mean and precision of that same parameter obtained at the previous run (function update). If chains run in parallel converge to different values depending on their starting points, or if the standard deviation (sd) of the parameter estimates is quite large, increase the prior precision initprec to 100 or 1000, pick new initmeans in the neighborhood of the previous mean estimates, and start the iterations again.

To choose among the different values, look at the corresponding diagnostics lambda.max, ms.error and r.squared. In the above example, with initprec = 10, lambda.max varies around 0.12 according to the number of clones, while with initprec = 100 or 1000 it decreases to 0.002, which indicates a more relevant choice. Again, the statistical accuracy is a function of the sample size but not of the number of cloned copies (Lele et al. 2010).

Regarding the number of iterations, ni, the plots of the different chains usually help. For each parameter, a display by chain of the iterations can be obtained with densityplot, as shown below, the '[[1]]' standing for the first element of the vector clo.
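Assuming the post_inf object created in the session example above:

# per-chain density of parameter "a", for the first element of clo
densityplot(post_inf[[1]][, "a"])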

Convergence diagnostics

First, have a look at the traceplots for all parameter estimates obtained with the last (i.e. largest) number of clones:

plot(post_inf[[length(clo)]], ask = TRUE)

Now let's check auto- and cross-correlations:

autocorr.diag(post_inf[[length(clo)]])    
##                     a            b             k         psi       sigeta
## Lag 0    1.0000000000  1.000000000  1.0000000000  1.00000000  1.000000000
## Lag 5    0.0295019859  0.006720507 -0.0007752711 -0.02088471 -0.016978811
## Lag 25   0.0007827941  0.019774407  0.0308653197  0.00354215  0.020744847
## Lag 50  -0.0342719405 -0.017393540 -0.0293516686  0.01155295  0.019323252
## Lag 250  0.0106310015 -0.021151298 -0.0156915502  0.00201559  0.005046463
crosscorr.plot(post_inf[[length(clo)]]) 

gelman.diag(post_inf[[length(clo)]])
## Potential scale reduction factors:
## 
##        Point est. Upper C.I.
## a               1       1.01
## b               1       1.00
## k               1       1.00
## psi             1       1.00
## sigeta          1       1.01
## 
## Multivariate psrf
## 
## 1

Parameters a and b seem to be slightly auto- and cross-correlated.

Sensitivity to priors

It is good practice to check whether changing the priors leads to similar posterior results. In our case, the choice of initmeans is made through trial and error, because the models we are dealing with have multimodal likelihoods. For instance, we might begin with vague priors, initmeans = c(1, 1, 5, 2, -0.5), initprec = 10 and clo = c(1, 50, 100), which yields c(0.11, 0.51, 4.39, 1.44, 0.41) with quite large standard deviations. Modifying the prior means can affect the posterior estimates, hence additional checks are required, as sketched below. For each number of clones, the posterior variances of the parameters are divided by those obtained with min(clo) to get scaled variances. When the number of clones increases, the means of the parameters converge to the maximum likelihood estimates while their scaled variances decrease to reach a lower bound. Plotting the scaled variances, or their logs, against the number of clones gives a visual threshold indicating the minimal number of clones needed to get reliable results (Solymos 2010).
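Here is a sketch of such a sensitivity run, reusing fit_ggcr with the vaguer priors quoted above (the run may take several minutes; the object names post_inf_vague, initmeans_vague, initprec_vague and clo_vague are hypothetical):

# prior-sensitivity run with vaguer priors
initmeans_vague <- c(1, 1, 5, 2, -0.5) # vaguer prior means
initprec_vague <- 10                   # low precision = diffuse priors
clo_vague <- c(1, 50, 100)             # numbers of clones
post_inf_vague <- fit_ggcr(dipper_tiny, clo_vague, nu, ni, nt, nc,
                           initmeans_vague, initprec_vague)
# compare point estimates with the earlier run
coef(post_inf_vague[[length(clo_vague)]])
coef(post_inf[[length(clo)]])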

Say we have four numbers of clones:

clo <- c(1,10,50,100) 

We use the function dctable to do the job:

dct <- do.call(dctable, post_inf)
plot(dct)

# log of scaled variances against the number of clones
plot(dct, type = "log.var")

Checking parameter identifiability

Data cloning allows testing identifiability rather easily (Lele et al. 2010). When some parameters are not estimable, the largest eigenvalue of the posterior covariance matrix, lambda.max, does not tend to 0. When it tends to 0, the parameters are estimable and the results are independent of the choice of priors. Two other statistics, ms.error and r.squared, are used to test the normality of the limiting distribution. Notice that the statistical accuracy is a function of the sample size but not of the number of cloned copies (Lele et al. 2010).

Get the necessary elements and check:

mat.dcdiag <- t(sapply(post_inf, dcdiag))
dimnames(mat.dcdiag) <- list(NULL, c("clones", "lambda.max", "ms.error", "r.squared", "r.hat"))
mat.dcdiag
##      clones lambda.max   ms.error    r.squared    r.hat   
## [1,] 1      0.001001928  0.008665452 0.0007978197 1.000715
## [2,] 10     0.001008014  0.02980244  0.00295392   1.002615
## [3,] 50     0.001057018  0.006642011 0.0006647053 1.001721
## [4,] 100    0.0009865647 0.01145292  0.0009542366 1.003452
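
As a quick visual companion to this table (a sketch, reusing mat.dcdiag from above; unlist is needed because sapply stores the dcdiag columns as list elements), lambda.max can be plotted against the number of clones and should stay close to, or shrink toward, 0 when parameters are estimable:

# lambda.max versus number of clones; values near 0 indicate estimability
plot(unlist(mat.dcdiag[, "clones"]), unlist(mat.dcdiag[, "lambda.max"]),
     type = "b", xlab = "number of clones", ylab = "lambda.max")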

References

King, R., B.J.T. Morgan, O. Gimenez, and S. Brooks. 2009. Bayesian Analysis for Population Ecology. Chapman & Hall/CRC Interdisciplinary Statistics.

Lele, S.R., B. Dennis, and F. Lutscher. 2007. Data Cloning: Easy Maximum Likelihood Estimation for Complex Ecological Models Using Bayesian Markov Chain Monte Carlo Methods. Ecology Letters 10: 551–563.

Lele, S.R., K. Nadeem, and B. Schmuland. 2010. Estimability and Likelihood Inference for Generalized Linear Mixed Models Using Data Cloning. Journal of the American Statistical Association 105: 1617–1625.

Marzolin, G., A. Charmantier, and O. Gimenez. 2011. Frailty in State-Space Models: Application to Actuarial Senescence in the Dipper. Ecology 92: 562–567.

Missov, T.I. 2013. Gamma-Gompertz Life Expectancy at Birth. Demographic Research 28: 259–270.

R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/.

Robert, C.P., and G. Casella. 2010. Introducing Monte Carlo Methods with R. Springer.

Solymos, P. 2010. dclone: Data Cloning in R. The R Journal 2: 29–37.


gammagompertzcr's Issues

review for JOSS (1/5)

Overall, I think this package is fine (clear and, as far as I can see, technically correct). There are a handful of changes that need to be made to conform to JOSS's standards, but those should be relatively easy. On the other hand, there is a rather large number of chores, individually no big deal but perhaps rather overwhelming overall, that could be done to make this package much easier to use and to enhance its uptake in the research community.

review for JOSS (2/5): conforming to JOSS guidelines (license, community guidelines)

There are a few particular items that seem to need to be done to conform to JOSS's standards. Some are a little odd when looked at from the R-package point of view, but I would fix them rather than make an exception for R packages.

  • the package does not "contain a plain-text LICENSE file with the contents of an OSI approved software license". The DESCRIPTION file does indicate "MIT", which is the standard way to specify licensing in an R package (but note that R CMD check does note: "License components which are templates and need '+ file LICENSE': MIT"; see https://cran.r-project.org/web/licenses/MIT).
  • Community guidelines: "Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support". No. A reasonable guess would be that (1) forking and submitting pull requests and (2) posting to the Issues list on GitHub are the intended approaches, but this could be made explicit. (The DESCRIPTION file in an R package also allows for a URL: field and a BugReports: field.)

review for JOSS (3/5): tests

The package contains no tests. This could be done by the standard old-school R approach (.R files in tests/), or via the testthat unit-testing framework.

Some thought will have to be given to the degree of reproducibility required (tolerances, random-number seed setting for JAGS, etc.) and to how to make the tests run fast enough (e.g. run for minimal numbers of steps).

The manual example in the README file does serve as a minimal example to make sure the software runs, but it does not show output (nor set random-number seeds) so there is no way to know if the output obtained is correct.

There are also no examples given in the manual. The examples given in the README file should probably be incorporated in the package help pages.
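
To illustrate the testthat suggestion, a minimal smoke test might look like the following sketch (the file name, MCMC settings and expectation are hypothetical; the fit_ggcr call mirrors the README example):

# tests/testthat/test-fit_ggcr.R (hypothetical)
library(testthat)
library(GammaGompertzCR)

test_that("fit_ggcr runs on the bundled tiny dipper data", {
  data(dipper_tiny)
  # deliberately tiny settings so the test runs fast; results are not checked
  # against reference values here (that would require fixing JAGS RNG seeds)
  post <- fit_ggcr(dipper_tiny, 1, 10, 100, 1, 2,
                   c(0.15, 0.5, 4.9, 1.4, -0.4), 10)
  expect_length(post, 1) # one element per number of clones, as in the README
})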

review for JOSS (4/5): documentation of functionality

The documentation of the functionality in the package is relatively sparse. The minimal pieces are there (README and help pages for the three functions contained in the package), and there are pointers to the literature so that users can in principle go educate themselves on the technical details underpinning the computations. However, what's here is extremely bare-bones, and it would be kind to users to give them a bit more help:

  • describe the level of expertise in Bayesian MCMC techniques expected as a prerequisite for using the package safely (and possibly pointers to ecologist-friendly introductions by Kéry, McCarthy, ...). For example, users need to know how to interpret standard MCMC diagnostics (trace plots, Gelman-Rubin statistics, etc.) and what to do when they fail.
  • Are there any diagnostics (e.g., sensible ranges of parameters) that are specific to this model, as opposed to the general tools that apply to most MCMC approaches?
  • What priors are used? Are the results sensitive to the priors? Can users override the default priors? When might they want to?
  • How are initial values chosen for the chains? Can users override the defaults? When might they want to?
  • Are there reasonable minimum and maximum data set sizes for this approach?
  • Is there any capability for including fixed-effect covariates or random-effects groupings in the model? (I realize this may be beyond the scope of this package, but it might be worth mentioning)

review for JOSS (5/5): bits and pieces

I don't consider any of these issues publication-blockers, but many of them are fairly low-hanging fruit.

  • in its current form the package does not pass R CMD check ... (at least with a development version of R) - and so it cannot be put on CRAN
  • there are many accessor methods that could be provided to make users' lives easier.
    • print() (to print short summary of model)
    • coef(summary(...)) (to return coefficient table)
    • plot() (to plot model diagnostics?) (I have a personal preference for coda::xyplot.mcmc (trace plots only, compact/lattice format) and coda::densityplot.mcmc (density plots only, compact/lattice format) over plot.mcmc (trace & density plots))
    • in the README, why use post_inf[[1]] (first chain only) rather than output from dclone::coef() to get parameter values
    • storing output from longer runs (i.e., with larger numbers of clones) in inst would make it easier to give examples that illustrate convergence of data-cloning in a non-trivial way
    • why not include the dipper data as a data object in the package?
    • the examples print a (skipped) error "Error in chol.default(W) : the leading minor of order 6 is not positive definite". This is probably trivial, but might worry users.
    • FWIW my fork of the broom package (https://github.com/bbolker/broom) allows "tidying" of mcmc objects, so e.g. mm <- emdbook::lump.mcmc.list(post_inf); broom::tidy(mm); dotwhisker::dwplot(mm) produces useful output
