Coder Social home page Coder Social logo

kdzimm / pseudoreplicationpaper Goto Github PK

View Code? Open in Web Editor NEW
18.0 4.0 5.0 733 KB

Code used to carry out parameter estimation, correlation estimation, type 1 error analysis, and power analysis for our "Pseudoreplication in Single-Cell" study

License: MIT License

HTML 100.00%
single-cell-rna-seq mixed-models power-analysis type1-error

pseudoreplicationpaper's People

Contributors

kdzimm avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

pseudoreplicationpaper's Issues

Intra- and inter-individual correlation: input specification missing

Hi!

Thanks for sharing the code to study intra- and inter-individual correlation in scRNA-seq data. We are interested in performing this analysis, specifically the intra-correlation one, in our dataset as a proxy of cell-to-cell variability within an individual.

Would you mind specifying what is really the initial input in the Correlation Comparison.Rmd? I assume it's the cells <- allgenes[-3] object in line 65 of this script. Maybe you could share a toy example?

Also, in the methods, you say you filtered out the lowly expressed and correlated genes. I do see the section where you filter out the correlated genes (lines 67-86 in this script), but I do not find the filter of lowly expressed genes ("Genes were removed if the average transcript-per-million (TPM) value was equal to zero.", from your methods section).

Thanks a lot!
Aida

Experimental design - fixed effects model

Thank you very much for tackling the problems associated with differential expression for scRNA-Seq experiment in your paper!

I have a question regarding your suggestion about using MAST with fixed-effects, from this paragraph:

"While we recommend computing differential expression analysis using MAST with RE, alternative methods include using MAST with fixed-effects for individual, a tweedie GLMM, or permutation testing. Accounting for the within-sample correlation with a fixed-effect term for individual, will have a slight difference in interpretation, but should be considered an alternative option to random effects โ€” particularly when the number of independent experimental units is modest."

I will use the scRNA-Seq dataset I am currently trying to analyse as an example. The design is as follows:

condition mouse_id
control Mouse1
control Mouse2
control Mouse3
treated Mouse4
treated Mouse5
treated Mouse6

Each Mouse with approx 150 single cells sampled.

If I understood correctly your approach, you are suggesting to fit a random effects model, like this:

zlm(~condition + ngenes_scaled + (1 | mouse_id), 
        sca,
        method='glmer',
        ebayes=FALSE,
        strictConvergence = FALSE, 
        exprs_value = 'logcounts')

In my experiment there are only 3 animals, so I wasn't sure whether this is enough to fit the random effects model. You mentioned in your paper to consider a fixed effect model instead when the number of experimental units is small.
Would you mind explaining how the model with fixed effect should look like?

simulate() error when running Power.Rmd script

Hello @kdzimm et al.,

Thank you for providing this excellent resource for incorporating a random effect for individual when running MAST for single-cell differential expression analyses. After loading all of the necessary packages, I started building the simulation as in 'Power.Rmd' but got an error at line 27. Here is my full script in R version 4.0.3:

library(ggplot2) 
library(fitdistrplus) 
library(MASS) 
library(tidyr) 
library(gdata)
library(Seurat)
library(data.table)
library(EnvStats)
library(purrr)
library(dplyr)
library(sn)
library(matrixStats)
library(fmsb)

ngenes <- 100
n.per.group <- 5
ncontrols <- n.per.group
ncases <- n.per.group
mean.number.cells.per.person <- 150
ncells.per.control <- rpois(n=ncontrols,lambda=mean.number.cells.per.person)
ncells.per.case <- rpois(n=ncases,lambda=mean.number.cells.per.person)
ncells <- sum(ncells.per.case) + sum(ncells.per.control)
allgenes <- as.data.frame(replicate(ngenes,simulate(foldchange=2)))

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': no applicable method for 'simulate' applied to an object of class "c('double', 'numeric')"

It seems that the format of the simulate() argument is not correct, and I am having trouble determining what was intended here.

Thanks for your help!

sError in .dsy2dpo(.M2sym(from)) :

It is very helpful to have a detailed study like yours that addresses a number of critical questions about differential expression for scRNA-Seq experiments.
I am running MAST with RE for a categorical response variable where I have 12 donors from condition_1 and 11 donors from condition_2.

When I run MAST for 20 different cell types separately, some of them fail giving the below error,
I wonder if you have encountered such situation and would be glad if you could share any suggestions. Thank you

sError in .dsy2dpo(.M2sym(from)) :
  not a positive definite matrix (and positive semidefiniteness is not checked)
In addition: Warning messages:
1: In vcov.merMod(object@fitD) :
  variance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RX
2: In vcov.merMod(object@fitD) :
  variance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RX
3: In vcov.merMod(object@fitD) :
  variance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RX
4: In vcov.merMod(object@fitD) :
  variance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RX
5: In vcov.merMod(object@fitD) :
  variance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RX
6: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.00204015 (tol = 0.002, component 1)
7: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  unable to evaluate scaled gradient
8: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Problem with Hessian check (infinite or missing values?)
9: In vcov.merMod(object@fitC) :
  Computed variance-covariance matrix problem: not a positive definite matrix (and positive semidefiniteness is not checked);
returning NA matrix

Missing file 'High_Quality_Cells_TPM_Information.csv' in Parameter_Estimation

Hi,

I'm trying to run the code in Parameter_Estimation folder. But it seems I can not find this file required in the code: 'C:\Users\kdzimmer\Documents\Langefeld group\Studies\Dissertation\scRNAseq\Pseudoreplication\Simulations\Empirical Distributions\High_Quality_Cells_TPM_Information.csv'

Could you please share this file? Or could you please explain how you generated this file?

Thanks!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.