nmikolajewicz / scmiko

R package developed for single-cell RNA-seq analysis. It was designed using the Seurat framework and offers existing and novel single-cell analytic workflows.

License: Other

R 100.00%

scrnaseq seurat single-cell

scmiko's Introduction

scMiko

scMiko is an R package developed for single-cell RNA-seq analysis. It was designed using the Seurat framework and offers existing and novel single-cell analytic workflows.

Documentation and tutorials can be found at https://nmikolajewicz.github.io/scMiko/. To get started, refer to our "Getting started with scMiko" article.

scMiko is hosted on Github, and the source code can be cloned at https://github.com/NMikolajewicz/scMiko.

scMiko is developed and tested on Windows; however, it is expected to work on macOS and Linux.

scPipeline

The scPipeline analysis pipeline is a modular collection of R Markdown scripts developed in tandem with the scMiko package. scPipeline generates dashboard reports for scRNA-seq analyses.

Documentation and tutorials for scPipeline workflows can also be found at https://nmikolajewicz.github.io/scMiko/. To get started, see our "Getting started with scPipeline" article.

scPipeline source code can be accessed at https://github.com/NMikolajewicz/scPipeline.

Citation

If using scPipeline or scMiko, please consider citing our work: Mikolajewicz, N., Gacesa, R., Aguilera-Uribe, M., Brown, K. R., Moffat, J., & Han, H. (2022). Multi-level cellular and functional annotation of single-cell transcriptomes using scPipeline. Communications Biology, 5(1), 1-14.

scmiko's People

Contributors

Stargazers

Watchers

Forkers

jhuanglabsctools biliopo tobylanser millersan nbahti justype reginaldoallves

scmiko's Issues

download from 'https://api.github.com/repos/NMikolajewicz/scMiko/tarball/HEAD' failed

Hi,

I ran devtools::install_github(repo = "NMikolajewicz/scMiko") but got the following error message:

Downloading GitHub repo NMikolajewicz/scMiko@HEAD
Error in utils::download.file(url, path, method = method, quiet = quiet, :
download from 'https://api.github.com/repos/NMikolajewicz/scMiko/tarball/HEAD' failed

Installation error

I tried installing from GitHub:

> remotes::install_github("NMikolajewicz/scMiko")
Downloading GitHub repo NMikolajewicz/scMiko@HEAD
Error in utils::download.file(url, path, method = method, quiet = quiet,  : 
  download from 'https://api.github.com/repos/NMikolajewicz/scMiko/tarball/HEAD' failed

Then I tried to download and install it locally.


> install.packages("~/Downloads/scMiko-master.zip", repos = NULL, type = "win.binary")
Error in install.packages : cannot install Windows binary packages on this platform

R version 4.1.0 (2021-05-18)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

An error message from the mikoScore function

Dear Professor,
I tried to follow the "Cell-type annotation using Miko Scoring" pipeline with the pbmc3k example dataset, but the mikoScore function returned an error. The error message was:
2022-09-21 19:11:13: Optimal bin size: 10
2022-09-21 19:11:13: Running cell scoring...
Error in grep(pattern = paste0("^", s, "$"), x = match, ignore.case = TRUE, : formal performance’^[CD3D$' wrong
The code I ran was:

# load package
library(scMiko)

# load Seurat
library(Seurat)
library(SeuratData)
data(pbmc3k)

# cell-type catalog
marker.df <- geneSets[["CellMarker_Hs_Zhang2019"]]
marker.list <- wideDF2namedList(marker.df)

# ensure that all genes are UPPER CASE (human)
marker.list <- lapply(marker.list, toupper)

# only include gene sets with more than 3 markers
marker.list <- marker.list[unlist(lapply(marker.list, length)) > 3]

# step 1
ns.res <- nullScore(object = pbmc3k, assay = DefaultAssay(pbmc3k), n.replicate = 25,
  nbin = 24, min.gs.size = 2, max.gs.size = 200, step.size = 10, nworkers = 16,
  verbose = TRUE, subsample.n = 2000)

# step 2
pbmc_scored <- mikoScore(object = pbmc3k, geneset = marker.list, nbin = 10,
  nullscore = ns.res, assay = DefaultAssay(pbmc3k), nworkers = 18)
I also obtained a variance vs. gene set size relationship plot like this.
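The grep() call in the error builds an anchored regular expression from each marker name, so a marker entry that contains a regex metacharacter breaks it: "^[CD3D$" opens a bracket expression that is never closed. The garbled tail of the message ("formal performance ... wrong") appears to be a mistranslated locale version of R's "invalid regular expression" error. The minimal sketch below assumes the offending entry is a marker symbol carrying a stray "[" (the objects genes and s and the helpers clean_symbol() and exact_match() are hypothetical, not scMiko functions). It reproduces the failure and shows two possible workarounds:

```r
# Toy gene universe and a hypothetical malformed marker entry
genes <- c("CD3D", "CD3E", "MS4A1")
s <- "[CD3D"

# Same pattern construction as in the error message: "^[CD3D$" is not a
# valid regular expression (unclosed bracket), so grep() raises an error
regex_result <- tryCatch(
  suppressWarnings(grep(pattern = paste0("^", s, "$"), x = genes, ignore.case = TRUE)),
  error = function(e) "invalid regular expression"
)

# Workaround 1: strip characters that should not appear in gene symbols
# (assumes symbols only use letters, digits, '.', '_' and '-')
clean_symbol <- function(x) gsub("[^A-Za-z0-9._-]", "", x)
s_clean <- clean_symbol(s)

# Workaround 2: exact, case-insensitive matching with no regex involved
exact_match <- function(query, x) which(toupper(x) == toupper(query))
hits <- exact_match(s_clean, genes)
```

If a stray character is indeed the cause, running marker.list <- lapply(marker.list, clean_symbol) before nullScore() and mikoScore() would remove it; whether that resolves this particular mikoScore error is an assumption.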

Does the ens2sym.so function work?

Dear developer,

I am wondering about the ens2sym function.
How can I use it to convert Ensembl IDs to gene symbols in a Seurat object?
I tried it, but nothing changed.
Thanks for your help.

Different CDI marker genes for cluster optimisation and CDI DEG analysis

Thank you for this interesting and useful package.
I am running it on my 10x single cell data to identify optimal clustering parameters and marker genes.
I am running the scripts as they appear in the vignettes. First, I used the cluster optimisation vignette. Once I had identified the desired resolution, I ran the FindClusters command and then the CDI DEG analysis. This is when I saw that the top marker genes displayed for resolution optimisation were different from the ones in the DEG analysis.
I am attaching the relevant images:

Thank you for your help

Knowing a good mitochondrial threshold a priori is not always possible

Hello Dr. Mikolajewicz!

A very nice and useful tool for single-cell! Thanks!

I have a question: have you considered using tools such as miQC, developed by the Greene Laboratory at the University of Colorado Anschutz, for mitochondrial quality control?

What I mean is that using an arbitrary threshold could eliminate cells that are actually good. Moreover, recent work has shown that these thresholds can be highly dependent on the organism or tissue, the scRNA-seq technology used, or the protocol-specific decisions made during the dissociation, library preparation, and sequencing steps (Osorio D, Cai JJ. Systematic determination of the mitochondrial proportion in human and mice tissues for single-cell RNA-sequencing data quality control. Bioinformatics. 2020).
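miQC itself fits a mixture model and is not reproduced here. As a simpler data-driven alternative to one fixed cutoff (the same idea behind scater's isOutlier()), the threshold can be derived from each dataset's own distribution as the median plus a few median absolute deviations (MADs). The sketch below uses simulated percentages and a hypothetical helper, mad_upper_threshold(); the choice of 3 MADs is an assumption, not a scMiko default:

```r
set.seed(42)
# Simulated per-cell mitochondrial percentages (hypothetical data): most cells
# near 5%, plus a small group of high-mito cells near 30%
percent_mt <- c(rnorm(950, mean = 5, sd = 1.5), rnorm(50, mean = 30, sd = 5))
percent_mt <- pmax(percent_mt, 0)

# Upper cutoff = median + nmads * MAD; R's mad() is scaled by 1.4826 so that
# it estimates the standard deviation for normally distributed data
mad_upper_threshold <- function(x, nmads = 3) {
  median(x) + nmads * mad(x)
}

threshold <- mad_upper_threshold(percent_mt)
keep <- percent_mt <= threshold   # TRUE = cell passes the mito filter
```

On a real Seurat object the input would be the per-cell mitochondrial percentage (for example a percent.mt metadata column, if one has been computed). Computing the cutoff separately per sample or batch keeps it tied to each dataset's own distribution, which addresses the tissue- and protocol-dependence described above.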

Something wrong in the "multiSpecificity" function

I appreciate this nice R package, but I ran into a problem when following the “Cluster Optimization” vignette. The issue is that “plt.auc.dot” is NULL. I checked the multiSpecificity code and noticed that the package does not have a multiDEG function. Is there something wrong with this part? I hope you can help me with this.

NAs produced by integer overflow

Hi,

While running the findCDImarkers function, I got this error:

2024-03-26 13:26:58.175816: Computing co-dependency indices...
  |++++++++++++++++++++++++++++++++++                | 67% ~04s          Error in binom.test(x = imat[i, j], n = ncol(emat), p = fmat[i, j], alternative = "greater") : 
  'p' must be a single number between 0 and 1
In addition: There were 50 or more warnings (use warnings() to see the first 50)

When I check the warnings, I get:

length(x) * length(y) : NAs produced by integer overflow

I think it is due to this step:

pmat <- mysapply(which.cells2, function(x) sapply(which.cells, 
    function(y) length(x) * length(y)))

I fixed the error by converting the lengths with as.integer64() from the bit64 package, which handles larger integer values:

library(bit64)
pmat <- mysapply(which.cells2, function(x) sapply(which.cells, 
    function(y) as.integer64(length(x)) * as.integer64(length(y))))

However, this inflated my FDR values. Could you help with this?
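For ordinary vectors, length() returns a 32-bit R integer, and integer multiplication returns NA (with the "NAs produced by integer overflow" warning) once the product exceeds .Machine$integer.max (2,147,483,647), which two groups of 50,000 cells already do. A minimal alternative to bit64, assuming the aim is only to avoid the overflow, is to promote one factor to double, which represents whole numbers exactly up to 2^53. In this sketch sapply() stands in for scMiko's internal mysapply(), and the which.cells/which.cells2 lists are toy stand-ins for the real cell index lists:

```r
# Two large hypothetical groups; length() returns integers here
n_x <- length(seq_len(50000))
n_y <- length(seq_len(50000))

# Integer product exceeds .Machine$integer.max, so R returns NA
prod_int <- suppressWarnings(n_x * n_y)

# Promoting one operand to double makes the whole product a double
prod_dbl <- as.numeric(n_x) * n_y

# Same idea applied to the pmat step from the issue (toy inputs)
which.cells  <- list(a = seq_len(50000), b = seq_len(10))
which.cells2 <- list(c = seq_len(60000))
pmat <- sapply(which.cells2, function(x) sapply(which.cells,
  function(y) as.numeric(length(x)) * length(y)))
```

Because pmat is then an ordinary numeric matrix rather than an integer64 object, downstream base R functions receive plain double values; whether this also avoids the inflated FDR values is not established here.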

Regarding the multiSpecificity function

Hi team,
Nice package. I followed the code from the tutorial, changing only the Seurat object names and the assay names (SCT or integrated). I found that the tool always seems to report a resolution of 0.5 as optimal from the CDI specificity graph and 1.0 as optimal from the multi-level resolution specificity score graph, even when different datasets are used. Also, the R session crashes when I try to run the multiSilhouette() function. Is there a way to refine the analysis so that it outputs the optimal resolution parameter? Thank you.

Problem with runSSN

When I run:

so.gene <- runSSN(object = obj, features = unique(c(features_hvg, features_dev)), scale_free = TRUE,
  robust_pca = FALSE, data_type = "pearson", reprocess_sct = TRUE, slot = c("scale"), batch_feature = NULL,
  pca_var_explained = 0.9, optimize_resolution = TRUE, target_purity = 0.8, step_size = 0.05,
  n_workers = parallel::detectCores(), verbose = FALSE)

I get:

Error in self$geom$rename_size && "size" %in% names(plot$mapping) :
invalid 'x' type in 'x && y'

If you could help, I would really appreciate it. Thanks,
Chris

Silhouette scores calculated using UMAP

Hi,

I recently found your package, which will help make my scripts shorter, but I noticed that the silhouette values it produces differ from the ones I get. Checking your function, I saw that it uses only the two UMAP dimensions to calculate the distances:

 df.umap <- getUMAP(object)[["df.umap"]]
 umap.dist <- dist(x = (df.umap[, c("x", "y")]), method = "euclidean", diag = FALSE, upper = FALSE, p = 2)
...
  sil <- cluster::silhouette(x = clust.mem, dist = umap.dist)

In code and functions I have seen in the past, the PCs are used instead of UMAP. Here are two examples:
https://rdrr.io/github/jr-leary7/YehLabClust/src/R/ComputeSilhouetteScores.R
https://bioinformatics-core-shared-training.github.io/UnivCambridge_ScRnaSeq_Nov2021/Markdowns/08_ClusteringPostDsi.html#1212_Separatedness

In my own code, I use the harmony dimensions, as I'm integrating datasets:

library(cluster)
dimensions <- 1:15
pc.dist <- dist(x = Embeddings(object = seu[["harmony"]])[, dimensions])
sil <- silhouette(x = as.numeric(seu@meta.data$seurat_clusters), dist = pc.dist)

For example, at a given resolution, my code gives a score of 0.4 while scMiko gives 0.63. So I was wondering which approach is more appropriate.
Thank you
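The gap between the two scores comes from the space in which distances are measured: cluster::silhouette() only sees the distance matrix it is given, so identical cluster labels can score very differently in 2 UMAP coordinates than in 15 PCA or harmony dimensions. The sketch below uses simulated coordinates (not UMAP, and neither code path from the issue) in which two groups differ only on dimensions 3 to 15. It computes the mean silhouette width with a hypothetical helper, mean_silhouette(), once on all dimensions and once on the first two:

```r
library(cluster)

set.seed(1)
n_per_group <- 100
n_dims <- 15

# Simulated embedding: the groups are shifted by 3 on dimensions 3..15 only,
# so dimensions 1..2 carry no cluster signal
group1 <- matrix(rnorm(n_per_group * n_dims), ncol = n_dims)
group2 <- matrix(rnorm(n_per_group * n_dims), ncol = n_dims)
group2[, 3:n_dims] <- group2[, 3:n_dims] + 3
emb <- rbind(group1, group2)
clusters <- rep(1:2, each = n_per_group)

# Mean silhouette width from Euclidean distances in the given coordinates
mean_silhouette <- function(coords, clusters) {
  sil <- silhouette(x = clusters, dist = dist(coords, method = "euclidean"))
  mean(sil[, "sil_width"])
}

sil_all <- mean_silhouette(emb, clusters)         # all 15 dimensions
sil_2d  <- mean_silhouette(emb[, 1:2], clusters)  # 2 dimensions only
```

Here the score is clearly positive on all dimensions and close to zero on the two uninformative ones. Which space is more appropriate is a methodological choice; scores are only comparable when both are computed in the same space.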

multiSpecificity() not compatible with Seurat v5

Here is the error:

multiSpecificity(object = seurat_object_v5)

Error in getExpressionMatrix(object, which.data = "data") :
no slot of name "data" for this object of class "Assay5"

I think an update to the package to make it compatible with Seurat v5 would be great.