kdzimm / pseudoreplicationpaper Goto Github PK

Code used to carry out parameter estimation, correlation estimation, type 1 error analysis, and power analysis for our "Pseudoreplication in Single-Cell" study

License: MIT License

HTML 100.00%

single-cell-rna-seq mixed-models power-analysis type1-error

pseudoreplicationpaper's People

Contributors

Stargazers

Watchers

Forkers

chatzinakos chengwei94 harithaa-anandakumar sarwarchy20 nhihin

pseudoreplicationpaper's Issues

Intra- and inter-individual correlation: input specification missing

Hi!

Thanks for sharing the code to study intra- and inter-individual correlation in scRNA-seq data. We are interested in performing this analysis, specifically the intra-correlation one, in our dataset as a proxy of cell-to-cell variability within an individual.

Would you mind specifying what is really the initial input in the Correlation Comparison.Rmd? I assume it's the cells <- allgenes[-3] object in line 65 of this script. Maybe you could share a toy example?

Also, in the methods, you say you filtered out the lowly expressed and correlated genes. I do see the section where you filter out the correlated genes (lines 67-86 in this script), but I do not find the filter of lowly expressed genes ("Genes were removed if the average transcript-per-million (TPM) value was equal to zero.", from your methods section).

Thanks a lot!
Aida

Experimental design - fixed effects model

Thank you very much for tackling the problems associated with differential expression for scRNA-Seq experiment in your paper!

I have a question regarding your suggestion about using MAST with fixed-effects, from this paragraph:

"While we recommend computing differential expression analysis using MAST with RE, alternative methods include using MAST with fixed-effects for individual, a tweedie GLMM, or permutation testing. Accounting for the within-sample correlation with a fixed-effect term for individual, will have a slight difference in interpretation, but should be considered an alternative option to random effects — particularly when the number of independent experimental units is modest."

I will use the scRNA-Seq dataset I am currently trying to analyse as an example. The design is as follows:

condition	mouse_id
control	Mouse1
control	Mouse2
control	Mouse3
treated	Mouse4
treated	Mouse5
treated	Mouse6

Each Mouse with approx 150 single cells sampled.

If I understood correctly your approach, you are suggesting to fit a random effects model, like this:

zlm(~condition + ngenes_scaled + (1 | mouse_id), 
        sca,
        method='glmer',
        ebayes=FALSE,
        strictConvergence = FALSE, 
        exprs_value = 'logcounts')

In my experiment there are only 3 animals, so I wasn't sure whether this is enough to fit the random effects model. You mentioned in your paper to consider a fixed effect model instead when the number of experimental units is small.
Would you mind explaining how the model with fixed effect should look like?

simulate() error when running Power.Rmd script

Hello @kdzimm et al.,

Thank you for providing this excellent resource for incorporating a random effect for individual when running MAST for single-cell differential expression analyses. After loading all of the necessary packages, I started building the simulation as in 'Power.Rmd' but got an error at line 27. Here is my full script in R version 4.0.3:

library(ggplot2) 
library(fitdistrplus) 
library(MASS) 
library(tidyr) 
library(gdata)
library(Seurat)
library(data.table)
library(EnvStats)
library(purrr)
library(dplyr)
library(sn)
library(matrixStats)
library(fmsb)

ngenes <- 100
n.per.group <- 5
ncontrols <- n.per.group
ncases <- n.per.group
mean.number.cells.per.person <- 150
ncells.per.control <- rpois(n=ncontrols,lambda=mean.number.cells.per.person)
ncells.per.case <- rpois(n=ncases,lambda=mean.number.cells.per.person)
ncells <- sum(ncells.per.case) + sum(ncells.per.control)
allgenes <- as.data.frame(replicate(ngenes,simulate(foldchange=2)))

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': no applicable method for 'simulate' applied to an object of class "c('double', 'numeric')"

It seems that the format of the simulate() argument is not correct, and I am having trouble determining what was intended here.

Thanks for your help!

sError in .dsy2dpo(.M2sym(from)) :

It is very helpful to have a detailed study like yours that addresses a number of critical questions about differential expression for scRNA-Seq experiments.
I am running MAST with RE for a categorical response variable where I have 12 donors from condition_1 and 11 donors from condition_2.

When I run MAST for 20 different cell types separately, some of them fail giving the below error,
I wonder if you have encountered such situation and would be glad if you could share any suggestions. Thank you

sError in .dsy2dpo(.M2sym(from)) :
  not a positive definite matrix (and positive semidefiniteness is not checked)
In addition: Warning messages:
1: In vcov.merMod(object@fitD) :
  variance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RX
2: In vcov.merMod(object@fitD) :
  variance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RX
3: In vcov.merMod(object@fitD) :
  variance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RX
4: In vcov.merMod(object@fitD) :
  variance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RX
5: In vcov.merMod(object@fitD) :
  variance-covariance matrix computed from finite-difference Hessian is
not positive definite or contains NA values: falling back to var-cov estimated from RX
6: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.00204015 (tol = 0.002, component 1)
7: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  unable to evaluate scaled gradient
8: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Problem with Hessian check (infinite or missing values?)
9: In vcov.merMod(object@fitC) :
  Computed variance-covariance matrix problem: not a positive definite matrix (and positive semidefiniteness is not checked);
returning NA matrix

Missing file 'High_Quality_Cells_TPM_Information.csv' in Parameter_Estimation

Hi,

I'm trying to run the code in Parameter_Estimation folder. But it seems I can not find this file required in the code: 'C:\Users\kdzimmer\Documents\Langefeld group\Studies\Dissertation\scRNAseq\Pseudoreplication\Simulations\Empirical Distributions\High_Quality_Cells_TPM_Information.csv'

Could you please share this file? Or could you please explain how you generated this file?

Thanks!

kdzimm / pseudoreplicationpaper Goto Github PK

pseudoreplicationpaper's People

Contributors

Stargazers

Watchers

Forkers

pseudoreplicationpaper's Issues

Intra- and inter-individual correlation: input specification missing

Experimental design - fixed effects model

simulate() error when running Power.Rmd script

sError in .dsy2dpo(.M2sym(from)) :

Missing file 'High_Quality_Cells_TPM_Information.csv' in Parameter_Estimation

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent