squidgroup / squid

Statistical Quantification of Individual Differences: a simulation tool for understanding multi-level phenotypic data in linear mixed models

License: Other

R 52.79% CSS 0.22% HTML 45.97% JavaScript 1.02%

linear-mixed-effects-modelling multilevel-data phenotypic-equation variance-components phenotypic-plasticity reaction-norm repeatability personality simulations

Statistical Quantification of Individual Differences: an educational and statistical tool for understanding multi-level phenotypic data in linear mixed models

Brief description

SQuID stands for Statistical Quantification of Individual Differences and is the product of the SQuID working group. The package aims to help scholars who, like us, are interested in understanding patterns of phenotypic variance. Individual differences are the raw material for natural selection to act on and hence the basis of evolutionary adaptation. Understanding the sources of phenotypic variance is thus a most essential feature of biological investigation and mixed effects models offer a great, albeit challenging tool. Disseminating the properties, potentials and interpretational challenges in the research community is thus a foremost goal of SQuID.

The squid package has two main objectives: First, it provides an educational tool useful for students, teachers and researchers who want to learn to use mixed-effects models. Users can experience how the mixed-effects model framework can be used to understand distinct biological phenomena by interactively exploring simulated multilevel data. Second, squid offers research opportunities to those who are already familiar with mixed-effects models, as squid enables the generation of datasets that users may download and use for a range of simulation-based statistical analyses such as power and sensitivity analysis of multilevel and multivariate data.

Install squid package

To install the latest released version from CRAN:

install.packages("squid")

To install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("squidgroup/squid")

Get more information about the installation of the devtools package.

Background

The phenotype of a trait in an individual results from a sum of genetic and environmental influences. Phenotypic variation is structured hierarchically, and the hierarchical modelling in mixed-effects models is a great tool to analyze and decompose such variation. Phenotypes vary across species, across populations of the same species, across individuals of the same population, and across repeated observations of the same individual. We focused on the individual level because it represents one of the most important biological levels for both ecological and evolutionary processes. Different sources of variation are at the origin of the phenotype of an individual. Individuals may differ in their phenotypes because they carry different gene variants (i.e. alleles). But individuals also experience different environments during their lifetime. Some environmental influences impose a lasting mark on the phenotype, while others are more ephemeral. The former tend to produce long-lasting, among-individual variation, while the latter cause within-individual variation. However, this depends on the time scale at which the phenotypes are measured relative to that of the environmental influences. Furthermore, individuals differ not only in their average phenotypes but also in how they respond to changes in their environment (i.e. differences in individual phenotypic plasticity). This represents an interaction between the among- and the within-individual levels of variation. The patterns of variation can thus be very complex. Selection can act differently on these different components of variance in the phenotypes of a trait, and this is why it is important to quantify their magnitude.

Mixed models are very flexible statistical tools that provide a way to estimate the variation at these different levels, and represent the general statistical framework for evolutionary biology. Because of the progress in computational capacities, mixed models have become increasingly popular among ecologists and evolutionary biologists over the last decade. However, fitting a mixed model is not a straightforward exercise, and the way data are sampled among and within individuals can have strong implications for the outcome of the model. This is why we created the squid simulation tool, which can help new users interested in decomposing phenotypic variance to become more familiar with the concept of hierarchical organization of traits and with mixed models, and to avoid pitfalls caused by inappropriate sampling.

squid package description

squid is a simulation-based tool that can be used for research and educational purposes. squid creates a world inhabited by individuals whose phenotypes are generated by a user-defined phenotypic equation, which allows easy translation of biological hypotheses into mathematically quantifiable parameters. The framework is suitable for performing simulation studies, determining optimal sampling designs for user-specific biological problems, and making simulation-based inferences to aid in the interpretation of empirical studies. squid is also a teaching tool for biologists interested in learning, or teaching others, how to implement and interpret mixed-effects models when studying the processes causing phenotypic variation. squid is based on a mathematical model that creates a group of individuals (i.e. study population) repeatedly expressing phenotypes, for one or two different traits, in uniform time. Phenotypic values of traits are generated following the general principle of the phenotypic equation (Dingemanse & Dochtermann 2013, Journal of Animal Ecology): phenotypes are assumed to be the summed effects of a series of components, and the phenotypic variance (Vp) is the sum of the respective variances in these causal components. The user thus has the flexibility to add different variance components that will form the phenotype of the individual at each time step, and to set up the relative importance of each component through the definition of environmental effects. squid then allows the user to collect a sub-sample of phenotypes for each simulated individual (i.e. operational data set), according to a specific sampling design. The major difference between squid and other R packages that also allow performance analysis through data simulation (e.g. pamm, odprism, simr) is that only squid separates the two steps: it generates the world first and then models a sampling process from it.
squid is subject to evolution and is designed to adapt to more complex scenarios in the future.
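
The two-step idea described above (generate a world, then sample from it) can be sketched in a few lines of base R. This is only an illustration under simple assumptions (one trait, a random intercept per individual plus a residual); the variable names and values are hypothetical and do not use the squid API.

```r
# Minimal base-R sketch: build a "world", then sample it.
# All names and values are illustrative and not part of the squid API.
set.seed(1)

n_ind  <- 200   # number of individuals
n_time <- 20    # time steps per individual
V_ind  <- 0.6   # among-individual variance
V_res  <- 0.4   # within-individual (residual) variance

# Step 1: generate the world (each individual expresses a phenotype at each time step).
world <- data.frame(
  individual = rep(seq_len(n_ind), each = n_time),
  time       = rep(seq_len(n_time), times = n_ind)
)
ind_effect <- rnorm(n_ind, mean = 0, sd = sqrt(V_ind))
world$phenotype <- ind_effect[world$individual] +
  rnorm(nrow(world), mean = 0, sd = sqrt(V_res))

# Step 2: sample the world (here, 5 random time steps per individual).
sampled <- do.call(rbind, lapply(split(world, world$individual), function(d) {
  d[sort(sample(nrow(d), 5)), ]
}))

# The phenotypic variance should be close to the sum of its components.
Vp_expected <- V_ind + V_res
Vp_observed <- var(world$phenotype)
```

Comparing estimates obtained from `sampled` under different sampling designs with the known values used to build `world` is the kind of exercise the package is meant to support.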

squid has two main functions, squidApp() and squidR():

squidApp(): runs the SQuID application, which is a browser-based interface created with the shiny package (we recommend updating your default web browser to its latest version). SQuID is built as a series of modules that guide the user through situations of increasing complexity to explore the phenotypic equation model and the interplay between the way phenotypes are sampled and the estimation of parameters of specific interest. The last module is the full model simulation, which allows the user to generate data sets that can then be used to run analyses in the statistical package of their choice for specific research questions. For most of the modules, the simulated data set is automatically fed into a statistical model in R and the main results of the analysis are shown in an output. For the full model, the user has the opportunity to download the operational data set for further analyses. The SQuID application also has a tab (Full model (Step by step)) describing the SQuID full model in detail.

# run SQuID application
library(squid)
squidApp()

squidR(): is a traditional R function that allows data generation and sampling without the browser-based interface. This function can be used for more advanced and efficient simulations once you understand how SQuID works. squidR() can be easily included in R scripts.

History of the project

It all started in Hannover in November 2013 on the occasion of a workshop on personality organised by Susanne Foitzik, Franjo Weissing, and Niels Dingemanse and funded by the Volkswagen Foundation. During this workshop, a group of researchers discussed the potential effects of sampling designs on the estimation of components of the phenotypic variance and covariance. It became obvious that there was an urgent need to develop a simulation package to help anyone interested in using a mixed model approach to become familiar with these methods and to avoid the pitfalls related to the interpretation of the results. A first model and a working version of the package were created in January 2014, during a meeting at Université du Québec à Montréal. The current version was produced during a workshop in November 2014, at the Max Planck Institute for Ornithology in Seewiesen.

SQuID team

Hassen Allegue (Université du Québec À Montréal, Montreal, Canada)
Yimen G. Araya-Ajoy (Norwegian University of Science and Technology, Trondheim, Norway)
Niels J. Dingemanse (Max Planck Institute for Ornithology, Seewiesen & University of Munich, Germany)
Ned A. Dochtermann (North Dakota State University, Fargo, USA)
Laszlo Z. Garamszegi (Estación Biológica de Doñana-CSIC, Seville, Spain)
Shinichi Nakagawa (University of New South Wales, Sydney, Australia)
Denis Réale (Université du Québec À Montréal, Montreal, Canada)
Holger Schielzeth (University of Bielefeld, Bielefeld, Germany)
David F. Westneat (University of Kentucky, Lexington, USA)

References

Allegue, H., Araya-Ajoy, Y.G., Dingemanse, N.J., Dochtermann, N.A., Garamszegi, L.Z., Nakagawa, S., Réale, D., Schielzeth, H. and Westneat, D.F. (2016). SQuID - Statistical Quantification of Individual Differences: an educational and statistical tool for understanding multi-level phenotypic data in linear mixed models. Methods in Ecology and Evolution, 8:257-267. DOI: 10.1111/2041-210X.12659

Dingemanse, N.J. and Dochtermann, N.A. (2013). Quantifying individual variation in behaviour: mixed-effect modelling approaches. Journal of Animal Ecology, 82:39-54. DOI: 10.1111/1365-2656.12013

squid's Issues

explanation of resid parameters

Likely very easy comment to address, but as far as I can tell, there is no explanation for what/how to specify the 'cov' parameter in the residual arguments in the simulation? I assume this is just residual variance, correct? This just became apparent to me when I noticed that cov =1 for the first few simulations (simple linear model), but cov =0.3 for the next few (interactions and non-linear effects) without any explanation as to why this changed?

Default residual variance

The residual level does not have a default variance value. This code returns an error:

squid_data <- sim_population(
  parameters = list(
    residual = list(
      n_level = 2000
    )
  )
)
Error: The number of parameters given for residual are not consistent

Error with squidApp (No such file directory)

This error occurs when running squidApp:

Warning in file(filename, "r", encoding = encoding) :
  cannot open file './source/pages/full_model_sbys/ui_fullmodel_sbys_bivar_summary.R.R': No such file or directory
Warning: Error in file: cannot open the connection

Predictors' Covariance matrix

I'm wondering whether it would be more user-friendly to specify the predictors' (co)variance matrix as a variance/correlation matrix instead.
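
For illustration, the two representations can be converted into each other in base R. The sketch below uses `stats::cov2cor()` and an example matrix chosen here for illustration; it does not reflect how squid handles this internally.

```r
# Example (co)variance matrix for three predictors (illustrative values).
cov_mat <- matrix(c(1, 0,   1,
                    0, 0.1, 0,
                    1, 0,   2), nrow = 3, ncol = 3)

# Variances plus a correlation matrix carry the same information.
vars    <- diag(cov_mat)
cor_mat <- cov2cor(cov_mat)

# Going back: scale the correlations by the outer product of the standard deviations.
sds      <- sqrt(vars)
cov_back <- cor_mat * outer(sds, sds)
```

A user could then supply `vars` and `cor_mat`, which are often easier to reason about than raw covariances.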

error message needed if 0 variances are specified

Add extra documentation to SQUID app

Add more documentation to SQUID app (page "Full model (Step by step)") that describes the full model equation.

Add Table 1 and Table S1 from MS1: "SQuID – Statistical Quantification of Individual Differences: an educational and statistical tool for understanding multi-level phenotypic data in the mixed modelling framework".

Error message for n_level of observation and residual

The observation and the residual levels should have the same size (i.e., n_level).

When setting different values, such as in this code:

squid_data <- sim_population(
  parameters = list(
    observation = list(
      names = c("temperature", "rainfall"),
      n_level = 1000
    ),
    residual = list(
      cov = 1,
      n_level = 2000
    )
  )
)

The following error is returned:
Error in (function (..., deparse.level = 1) : number of rows of matrices must match (see arg 2)

It would be nice to have a more informative error message such as: "Error: The observation and the residual levels must have the same number of levels (i.e., n_level)"

It would also be nice to check that n_level is a positive integer and to return an error message otherwise. The same kind of check could be applied to any input parameter.
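
A possible shape for such checks, as a rough sketch: the helper names `check_n_level()` and `check_same_n_level()` are hypothetical and are not existing squid functions; only the structure of the `parameters` list is taken from the example above.

```r
# Hypothetical helper: n_level must be a single positive whole number.
check_n_level <- function(n_level, level_name) {
  ok <- is.numeric(n_level) && length(n_level) == 1 &&
    is.finite(n_level) && n_level > 0 && n_level == round(n_level)
  if (!ok) {
    stop("n_level of the ", level_name,
         " level must be a single positive integer", call. = FALSE)
  }
  invisible(TRUE)
}

# Hypothetical helper: observation and residual levels must have the same size.
check_same_n_level <- function(parameters) {
  n_obs <- parameters$observation$n_level
  n_res <- parameters$residual$n_level
  check_n_level(n_obs, "observation")
  check_n_level(n_res, "residual")
  if (n_obs != n_res) {
    stop("The observation and the residual levels must have the same ",
         "number of levels (i.e., n_level)", call. = FALSE)
  }
  invisible(TRUE)
}
```

Called at the top of the simulation function, checks like these would fail early with a readable message instead of the rbind/cbind error shown above.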

create function to generate data structure

notation for beta (random regression module)

for squidApp() - Random-slope regression model

currently $\beta$ but should be $\beta_1$

Write function documentation

Write and generate documentation (using the roxygen2 package) for these SQUID package functions:

runSQUIDapp()
runSQUIDfct()

Many populations within a world

I am not sure if it is relevant at this stage. But we need to be able to simulate many populations within a world. I assume there will be a sampling function allowing to sample the generated population data using "pop_data". However we still need to generate different populations with the same parameters. Maybe an argument within "pop_data".

Name of the `pop_data()` function

I suggest changing the name of the pop_data() function to: get_data()

Also the data of each step of the squid process could be accessed through, for example, these functions:

get_data_structure()
get_population_data()
get_sample_data()

Implement the non-gaussian module

Number of observations in `make_structure()`

In the function make_structure(), when the number of observations entered by the user (N) is lower than, or not a multiple of, the minimum number of observations needed to generate the grouping levels properly (Nlevel), the function returns an error.

Example:

ds <- make_structure(structure = "sex(2)/individual(10)", N=9)
ds <- make_structure(structure = "sex(2)/individual(10)", N=38)

 Error in levels[, i] <- rep(1:Ns[i], each = N/Ns[i]) : 
  number of items to replace is not a multiple of replacement length

here:
N = 9 or 38
and Nlevel = 10*2 = 20

It would be nice to either:
(1) return a constructive error message
(2) or (which I recommend) automatically adjust N to Nlevel or to the closest multiple of Nlevel, and return a warning message to inform the user that N was changed.

This could also be an issue when multiple grouping structures are defined. Ex:

ds <- make_structure(structure = "sex(2)/individual(10) + season(4)/day(7)", N=40)

N = 40
Nlevel1 = 20
Nlevel2 = 28

The user should be warned that sample sizes within nested structures should match among nested structures and with N.
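
Option (2) for a single grouping structure could look roughly like the sketch below. The helper `adjust_n()` is hypothetical and not part of the package; it only illustrates the rounding and the warning, and it does not handle the case of several grouping structures.

```r
# Hypothetical helper: move N to the closest multiple of n_level (at least n_level)
# and warn the user when N had to be changed.
adjust_n <- function(N, n_level) {
  n_multiples <- max(1, round(N / n_level))
  N_adjusted  <- n_multiples * n_level
  if (N_adjusted != N) {
    warning("N was changed from ", N, " to ", N_adjusted,
            " so that it is a multiple of ", n_level, call. = FALSE)
  }
  N_adjusted
}

# For "sex(2)/individual(10)", the minimum number of observations is 2 * 10.
n_level <- 2 * 10
N_small <- suppressWarnings(adjust_n(9, n_level))   # below the minimum
N_close <- suppressWarnings(adjust_n(38, n_level))  # not a multiple
```

With these inputs, N = 9 is raised to 20 and N = 38 is moved to 40.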

create function to simulate population level phenotypes from data structure and parameters

Problem with the function ifelse

Hi,
I have a problem running squidApp; it may be because of the function ifelse.
This is a new problem because I didn't have any trouble 6 months ago.
I run :
library(squid)
squidApp()

And I get this warning:

Warning: Error in rep: attempt to replicate an object of type 'environment'
46: ifelse
45: [./source/server/fullmodel/SVRFullModel.R#310]
2: shiny::runApp
1: squidApp

I would be very grateful if you could help me.
Thank you

squidApp doesn't run in R

@hallegue the squidApp does not run in R due to an error on the "squidR" page. You might or might not be aware of this, but I have tracked down the issue to l85-86 in ui.R:

      tabPanel("squidR",
      				 fixedPage(wellPanel(shiny::includeMarkdown("./source/pages/squidr/squidr.md")))

includeMarkdown(.) seems case sensitive and the file & directory name include a capital R instead of a small r. When that is changed, the app runs in R.

There are various warnings thrown by lines that include icon("refresh") in var_fullmodel.R and var_modules.R which, if the warning text is correct, should be corrected to icon("sync").
A final warning seems to be thrown by navbarpage(.) in ui.R of which I am still unsure how to improve that.

@hallegue please let me know if you would like me to submit a PR with fixes.

Name of the `sim_population` function

To make the function name more general, I think the word "population" should be avoided.

For example, this function could be named: simulate(), simulate_data(), sim_data()

case sensitivity and filenames

Hi Hassen!

Some file systems are case sensitive; you may want to accommodate that. Case in point:
source("./source/pages/fullmodel/UIenvironment.R",local=TRUE)
does not work for me (Ubuntu 14.04), as the actual file path is
source("./source/pages/fullModel/UIenvironment.R",local=TRUE)
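
Renaming the files or fixing the paths is the cleaner fix. As a stopgap, a path could be resolved case-insensitively before sourcing it. The helper `resolve_path_ci()` below is a hypothetical base-R sketch, not something that exists in the app.

```r
# Hypothetical helper: walk the path one component at a time and pick the
# directory entry whose name matches when case is ignored.
resolve_path_ci <- function(path, root = ".") {
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  parts <- parts[!parts %in% c("", ".")]
  current <- root
  for (part in parts) {
    entries <- list.files(current, all.files = TRUE, no.. = TRUE)
    hit <- entries[tolower(entries) == tolower(part)]
    if (length(hit) == 0) {
      stop("No entry matching '", part, "' in ", current, call. = FALSE)
    }
    current <- file.path(current, hit[1])
  }
  current
}

# Intended use (path taken from the report above):
# source(resolve_path_ci("source/pages/fullmodel/UIenvironment.R"), local = TRUE)
```

If two entries differ only by case, the sketch silently takes the first one, which is another reason to prefer consistent file names.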

add in simulating multivariate effects

There are probably several ways we can implement simulating multivariate effects. Perhaps have one that allows the user to just put in parameters at the different levels, after which everything gets added up, and another where things can be added up more specifically.

Ve and VG don't work in the Full model

The default values of the residual and group variances don't work in the full model. But they get updated and properly accounted for when the user changes them.

Column and row names for the (co)variance matrices

It would be nice to set the column and row names of the (co)variance matrices to the predictor names.

It'll be easier and faster for the users to understand those matrices when they extract them from the squid output.

For example:

squid_data <- sim_population(
  parameters=list(
    observation=list(
      names=c("temperature","rainfall", "wind"),
      mean = c(10,1,20),
      cov  = matrix(c(1,0,1,0,0.1,0,1,0,2), nrow=3 ,ncol=3),
      beta =c(0,-3,0.5),
      n_level=2000
    ),
    residual=list(
      mean=10,
      cov=1,
      n_level=2000
    )
  )
)

cov_mat <- squid_data$parameters$observation$cov
cov_mat
     [,1] [,2] [,3]
[1,]    1  0.0    1
[2,]    0  0.1    0
[3,]    1  0.0    2

colnames(cov_mat) <- rownames(cov_mat) <- squid_data$parameters$observation$names
cov_mat
           temperature rainfall wind
temperature           1      0.0    1
rainfall              0      0.1    0
wind                  1      0.0    2

The same thing could be applied to the mean and the beta vectors.

add in interactions into parameter list

We need a way to nicely specify interactions between covariates. This could probably be done by allowing a user to add a named interaction parameter to the parameter list, which they can then add to their model formula.

add in simulating fixed factors

We need to work out a nice way to add factors with fixed effects that are not simulated from a normal distribution. If someone wants to simulate sex effects, for example, then this is likely to be a fixed difference between sexes, rather than one drawn from a distribution.
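
To make the distinction concrete, here is a small base-R sketch of a fixed sex effect: the difference between sexes is a constant chosen by the user rather than a draw from a distribution. The variable names and values are illustrative assumptions, not a proposal for the package interface.

```r
set.seed(1)

n_ind <- 100
sex   <- factor(rep(c("female", "male"), each = n_ind / 2))

# Fixed effect: a constant difference between the sexes (illustrative value).
beta_sex <- c(female = 0, male = 2)

# Phenotype = intercept + fixed sex effect + residual noise.
phenotype <- 10 + unname(beta_sex[as.character(sex)]) + rnorm(n_ind, mean = 0, sd = 1)
dat <- data.frame(sex = sex, phenotype = phenotype)

# The observed difference between group means should be close to the fixed effect.
mean_diff <- unname(diff(tapply(dat$phenotype, dat$sex, mean)))
```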

Implement the diagnostics module

Naming the levels of variation

I think naming the lists observation and individual may be restrictive. Could we go with level one, level two, or something like that? This would also be useful when there are more than two levels.
