xuranw / music
Multi-subject Single Cell Deconvolution
Home Page: https://github.com/xuranw/MuSiC
License: GNU General Public License v3.0
Hi,
I wonder what sampleID is in your analysis. Does it refer to the batches? In my single-cell data I only have one batch (replicate), so I set the sampleID to 1 for all cells. However, I got the following error:
Error in colMeans(S, na.rm = TRUE) : 'x' must be an array of at least two dimensions
If I set sampleID=cell_ID, then the deconvolution works but I am not sure if this is right.
Thanks for your help
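A note on the error above: one plausible cause (an assumption, not verified against the MuSiC source) is that with a single sampleID a per-subject matrix collapses to a plain vector, which colMeans() rejects. The base-R sketch below reproduces the same message with a hypothetical matrix S; it does not call MuSiC.

```r
# Hypothetical matrix: 3 cell types (rows) x 1 subject (column).
S <- matrix(c(10, 20, 30), nrow = 3,
            dimnames = list(c("alpha", "beta", "delta"), "subj1"))

# Default subsetting drops the column dimension and returns a vector.
S.dropped <- S[, 1]
# drop = FALSE keeps the result as a 3 x 1 matrix.
S.kept <- S[, 1, drop = FALSE]

# colMeans() fails on the vector ...
err <- tryCatch(colMeans(S.dropped, na.rm = TRUE),
                error = function(e) conditionMessage(e))
# ... and works on the one-column matrix.
ok <- colMeans(S.kept, na.rm = TRUE)

print(err)
print(ok)
```

If this is indeed the cause, setting sampleID to the cell ID only hides the problem by creating many one-cell "subjects"; whether the resulting estimates are meaningful is a question for the author.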
Hi xuran,
Thank you for your excellent work. I was wondering whether MuSiC can be used for microarray data, and whether its results can be compared across samples (inter-sample) and across cell types (intra-sample).
Best
Yi Han
Hi Xuran,
I tried to apply MuSiC to bulk RNA-seq data with RPKM as the input. According to your paper (Discussion), "MuSiC can utilize RPKM if estimates of cell type-specific total RNA abundance can be provided." How can I incorporate cell-type-specific total RNA abundance into your function? Or can I directly use RPKM as the input and use music_prop to do the deconvolution? Does my reference single-cell RNA-seq data also require RPKM as the input?
Thanks!
Best,
Ming
This work is not published yet ... bioRxiv.
It is now in Nature Communications!
Dear xuranw,
I am new to bioinformatics software. I want to make some modifications to the MuSiC source code for calculating cell infiltration, to get a simple function for a website. I will cite your article and show your license. Is this OK? If I want to modify your code, what do I need to do?
I would greatly appreciate it if you could spend some of your time teaching me. Thank you.
Best regards.
Hi there,
I'm very interested in using this exciting deconvolution tool for my bulk RNA-seq data, but keep running into the error message from music_prop of too few common genes from three different single cell datasets.
I tried pre-processing each dataset by including only genes found in both datasets using the merge function in R, yet I still receive this message. I don't understand how this can be the case if their gene lists are exactly the same.
Thanks so much in advance.
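A quick way to narrow this down is to check how many row names the two expression matrices actually share. The sketch below uses hypothetical toy matrices and a hypothetical ID-to-symbol table; the assumption is that gene overlap is computed from row names, so identifiers of different types (for example Ensembl IDs versus gene symbols), or row names lost during merge(), would leave nothing in common.

```r
# Toy bulk matrix with Ensembl-style IDs as row names (illustrative values).
bulk <- matrix(1:6, nrow = 3,
               dimnames = list(c("ENSG00000141510", "ENSG00000146648",
                                 "ENSG00000012048"),
                               c("bulk1", "bulk2")))
# Toy single-cell matrix with gene symbols as row names.
sc <- matrix(1:6, nrow = 3,
             dimnames = list(c("TP53", "EGFR", "BRCA1"), c("cell1", "cell2")))

# No overlap: the gene sets are the same, the identifiers are not.
common <- intersect(rownames(bulk), rownames(sc))
overlap.fraction <- length(common) / min(nrow(bulk), nrow(sc))

# Hypothetical mapping table used to put both matrices on gene symbols.
id2symbol <- c(ENSG00000141510 = "TP53",
               ENSG00000146648 = "EGFR",
               ENSG00000012048 = "BRCA1")
rownames(bulk) <- unname(id2symbol[rownames(bulk)])
common.fixed <- intersect(rownames(bulk), rownames(sc))

print(overlap.fraction)
print(common.fixed)
```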
Hey,
In your Nature article about the MuSiC package, it is stated that there is a tutorial on how to use this package. I can't seem to find it anywhere. Could you add a link? Much appreciated.
Hi,
Thanks for providing this great package,
I just found a small issue in the NAMESPACE file of the package. The music_prop function is not exported, which raises the following error message: object 'music_prop' not found.
A workaround for the user is to write MuSiC:::music_prop instead, but this might not be evident to occasional R users.
If my bulk RNA-seq data were obtained at different times, does this matter?
Hi. I have encountered this problem in my code. What is going on?
tmp <- music_prop(bulk.eset = rd, sc.eset = ad, clusters = 'cellType',
+ samples = 'sampleID', verbose = F)
Error in music_prop(bulk.eset = rd, sc.eset = ad, clusters = "cellType", :
Too few common genes!
Hi,
When I execute the command Est.prop.Xin <- music_prop(bulk.eset = XinT2D.construct.full$Bulk.counts, sc.eset = EMTAB.eset, clusters = 'cellType', samples = 'sampleID', select.ct = c('alpha', 'beta', 'delta', 'gamma'))
an error occurs: Error in pVar(x, clusters) : could not find function "pVar"
Is a dependency missing? Can you tell me how to fix the error?
I am running the MuSiC tutorial for benchmark evaluation. I followed all the steps without problems and everything appears to run properly, but the final step, which plots the figures stored in "abs.diff.fig" and "prop.comp.fig", generates an error with plot_grid.
Here is everything R says about the error:
plot_grid(prop.comp.fig, abs.diff.fig, labels = "auto", rel_widths = c(4,3))
Error: Aesthetics must be either length 1 or the same as the data (9): label
Run `rlang::last_error()` to see where the error occurred.
rlang::last_error()
<error/rlang_error>
Aesthetics must be either length 1 or the same as the data (9): label
Backtrace:
Run `rlang::last_trace()` to see the full context.
rlang::last_trace()
<error/rlang_error>
Aesthetics must be either length 1 or the same as the data (9): label
Backtrace:
█
└─base::lapply(...)
└─cowplot:::FUN(X[[i]], ...)
├─cowplot::as_gtable(x)
└─cowplot:::as_gtable.default(x)
├─cowplot::as_grob(plot)
└─cowplot:::as_grob.ggplot(plot)
└─ggplot2::ggplotGrob(plot)
├─ggplot2::ggplot_gtable(ggplot_build(x))
├─ggplot2::ggplot_build(x)
└─ggplot2:::ggplot_build.ggplot(x)
└─ggplot2:::by_layer(function(l, d) l$compute_geom_2(d))
└─ggplot2:::f(l = layers[[i]], d = data[[i]])
└─l$compute_geom_2(d)
└─ggplot2:::f(..., self = self)
└─self$geom$use_defaults(data, self$aes_params, modifiers)
└─ggplot2:::f(..., self = self)
└─ggplot2:::check_aesthetics(params[aes_params], nrow(data))
Thank you very much for your help!
David
I tried to run the vignette, but ran into a few issues.
The RDS files are linked in the live vignette (on github.io) to files on the github.io site; this works but requires a tiny workaround rather than simply the bare URL:
readRDSFromWeb <- function(ref) {
readRDS(gzcon(url(ref)))
}
You can do a similar trick with RData files.
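As a sketch of the RData variant mentioned above: load() accepts a connection, so the same gzcon() wrapping applies. The helper name loadRDataFromConnection is hypothetical, and the example uses a local temporary file in place of a URL so that it runs offline; with a web file you would pass url(ref) instead of file(path, "rb").

```r
# Hypothetical helper: load an .RData file from any binary connection.
loadRDataFromConnection <- function(con, envir = parent.frame()) {
  gz <- gzcon(con)          # wrap so compressed content is handled
  on.exit(close(gz))        # closing the wrapper also closes 'con'
  load(gz, envir = envir)   # returns the names of the loaded objects
}

# Stand-in for a remote file: save a toy object locally.
tmp <- tempfile(fileext = ".RData")
toy.markers <- c("Cd3e", "Cd19")
save(toy.markers, file = tmp)

# Load into a separate environment to avoid overwriting workspace objects.
env <- new.env()
loaded <- loadRDataFromConnection(file(tmp, "rb"), envir = env)

print(loaded)
print(get("toy.markers", envir = env))
```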
Building the vignette with devtools::build_vignettes() also did not work. If you add %\VignetteEngine{knitr::rmarkdown} to the header of the vignette, it will work fine. I also found that the R chunks were not run, merely displayed verbatim. Using ```{r} rather than ``` r fixed this for me.
An unusual issue relates to the inclusion of CIBERSORT in the bseqsc package. My suggestion would be to compute the bseqsc results and save them inside the package as data, since CIBERSORT has an (in my view) ridiculously restrictive policy. Then, you could attempt to run the bseqsc code in the vignette, and fall back to using pre-computed values if this fails. Currently I will have to wait 3-5 days to (hopefully) gain access to CIBERSORT.
There was also a minor typo in the vignette.
I have fixed all of these in my PR with the exception of the CIBERSORT issue.
Hi Xuran,
Thank you very much for the MuSiC package. I have been working with it and have another question, about finding the genes most influential in determining cell type proportions. Can we use the Weight.gene matrix and sum across rows (transcripts) to get the transcripts with the highest aggregate weight, and consider them the most influential transcripts in determining cell type proportions? We would like to compare MuSiC with another method that other members of our group have used (a version of NNLS), and compare which transcripts each method selected as differentially expressed in different immune cell types.
Any help is appreciated.
Thanks, T.J.
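For reference, the mechanics of the ranking described above can be sketched in base R. The matrix below is a hypothetical stand-in for Weight.gene (genes in rows, bulk samples in columns, with NA where a gene was not used); whether the aggregate weight is a valid measure of influence is a question for the author.

```r
# Hypothetical stand-in for the Weight.gene output: genes x bulk samples.
Weight.gene <- matrix(c(0.5, NA, 2.0,    # bulk1
                        1.0, 0.2, 3.0),  # bulk2
                      nrow = 3,
                      dimnames = list(c("geneA", "geneB", "geneC"),
                                      c("bulk1", "bulk2")))

# Aggregate weight per gene, ignoring genes missing in some samples.
total.weight <- rowSums(Weight.gene, na.rm = TRUE)

# Genes ordered from highest to lowest aggregate weight.
ranked <- sort(total.weight, decreasing = TRUE)
print(ranked)
```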
Can you tag a release? That makes it a bit easier to keep the bioconda version of this updated.
Thanks for making a great package!
I realized I had to library(xbioc) in order to run music_basis(). It would be helpful to add this to the vignette.
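Until the vignette is updated, a small pre-flight check can report which supporting packages are missing before music_basis() is called. The package list is an assumption assembled from the reports on this page (xbioc here, Biobase and nnls in the installation logs), not an authoritative dependency list.

```r
# Packages mentioned in these issues as needed at run time (assumed list).
needed <- c("xbioc", "Biobase", "nnls")

# requireNamespace() returns FALSE instead of stopping when a package is absent.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing.pkgs <- needed[!installed]

if (length(missing.pkgs) > 0) {
  message("Install before running MuSiC: ",
          paste(missing.pkgs, collapse = ", "))
} else {
  message("All listed packages are available; attach them with library().")
}
```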
The vignette loads IEmarkers.RData but this does not appear to be available from https://github.com/xuranw/MuSiC/tree/master/vignettes/data. Would it be possible to add it?
Hi,
I would like to know which pre-processing steps should be applied to the bulk and scRNA-seq data before running MuSiC. I could not find anywhere in the documentation or on GitHub how the pre-processing is done, which is a bit strange.
For scRNA-seq, how many cells are needed for good performance? My data come from pooled scRNA-seq, i.e. I pooled all the cells of the same type and quantified the gene expression, so I have one sample per cell type. Does MuSiC work with this?
Hi,
I have a question on cell type proportion estimates. I have a single cell RNA-seq reference data like the below toy example,
Genes CD4 Mono Ery
ACE 49 1 0
ALG9 401 74 234
ANKRD18A 332 69 0
AQP1 14 0 8342
CELF6 40 17 0
CFB 206 100 14
When I run music_prop(bulk.eset = bulk.est, sc.eset = scRNA.est, clusters = 'cellType', samples = 'sampleID', select.ct = c('CD4', 'Mono','Ery'), verbose = F), it throws the following error:
Error in music_prop(bulk.eset = bulk.est, sc.eset = scRNA.est, clusters = "cellType", :
Not enough valid cell type!
In order to help you understand, I attach the info of scRNA.est below,
I am wondering whether MuSiC needs at least two replicates to infer the cell type proportions.
Thanks a lot!
Elaine
Hi Xuran,
In the tutorial section "Estimation of cell type proportions with pre-grouping of cell types", I wonder how you selected genes for group.marker. It mentions 'intra-cluster differentially expressed genes', but I wonder if you have a recommended procedure to identify these DEGs.
Thanks,
Yuping
Hi Xuran
I was unable to load the package, with this error after trying to install from github
devtools::install_github('xuranw/MuSiC')
Error in read.dcf(path) :
Found continuation line starting ' plyr, ...' at begin of record.
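For context, this message comes from read.dcf(), which parses the DESCRIPTION file. One way it arises, assumed here rather than confirmed for the MuSiC repository at that time, is a blank line inside a multi-line field such as Imports: the blank line ends the record, so the indented line that follows looks like a continuation with nothing to continue. The sketch reproduces this with a hypothetical toy DESCRIPTION.

```r
# Malformed toy DESCRIPTION: a blank line splits the Imports field.
bad <- tempfile()
writeLines(c("Package: toy",
             "Imports: nnls,",
             "",
             "    plyr,",
             "    ggplot2"), bad)
msg <- tryCatch({ read.dcf(bad); "no error" },
                error = function(e) conditionMessage(e))

# Same content without the blank line parses as a single record.
good <- tempfile()
writeLines(c("Package: toy",
             "Imports: nnls,",
             "    plyr,",
             "    ggplot2"), good)
fields <- read.dcf(good)

print(msg)
print(fields[1, "Package"])
```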
Dear Xuran,
I first tried to install the package in the suggested way without success:
devtools::install_github('xuranw/MuSiC')
Error in read.dcf(path) :
Found continuation line starting ' plyr, ...' at begin of record.
Next, I managed to manually download the package and install it:
install.packages("C:/Users/nivs/Downloads/MuSiC-master.zip", repos = NULL, type = "win.binary")
Following the suggestion from other issues, I also tried to update the dependencies:
setwd("C:/Rpackages/MuSiC-master/")
devtools::check()
Updating MuSiC documentation
Writing NAMESPACE
Loading MuSiC
Loading required package: nnls
Loading required package: ggplot2
Writing NAMESPACE
-- Building ----------------------------------------------------------- MuSiC --
Setting env vars:
√ checking for file 'C:\Rpackages\MuSiC-master/DESCRIPTION' (553ms)
It seems that there are some issues with the vignette.
Finally, I was able to load the package, but ran into another problem:
library(MuSiC)
Download EMTAB single cell dataset from Github
Mousesub.eset = readRDS("C:/Rpackages/MuSiC-master/vignettes/data/Mousesubeset.rds")
Mousesub.basis = music_basis(Mousesub.eset, clusters = 'cellType', samples = 'sampleID', select.ct = c("Endo", "Podo", "PT", "LOH", "DCT", "CD-PC", "CD-IC", "Fib", "Macro", "Neutro","B lymph", "T lymph", "NK"))
Error in sampleNames(x) : could not find function "sampleNames"
Any idea?
Thanks,
Niv
Hi,
I can't install the package. Error:
devtools::install_github('xuranw/MuSiC')
Downloading GitHub repo xuranw/MuSiC@master
Skipping 1 packages not available: Biobase
Downloading GitHub repo renozao/xbioc@master
These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?
1: All
2: CRAN packages only
3: None
4: pkgmaker (0.31.1 -> ac95c24f3...) [GitHub]
Enter one or more numbers, or an empty line to skip updates:
1
pkgmaker (0.31.1 -> ac95c24f3...) [GitHub]
Downloading GitHub repo renozao/pkgmaker@develop
✓ checking for file ‘/private/var/folders/3t/lc9m5zv966934dq4d6gf6l60g_dj6h/T/Rtmp1OVDRI/remotes85b01129e8d2/renozao-pkgmaker-ac95c24/DESCRIPTION’ ...
─ preparing ‘pkgmaker’:
✓ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
Removed empty directory ‘pkgmaker/vignettes’
─ building ‘pkgmaker_0.31.tar.gz’
Warning: invalid uid value replaced by that for user 'nobody'
Warning: invalid gid value replaced by that for user 'nobody'
* installing *source* package ‘pkgmaker’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
Setting package specific options: package:pkgmaker:logger (1 default option(s))
Creating meta registry in package 'pkgmaker' ... OK
Creating registry 'extra_handler' in package 'pkgmaker' ... OK
Creating registry 'extra_action' in package 'pkgmaker' ... OK
Registering extra handler 'install.packages' [function] ... OK
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (pkgmaker)
✓ checking for file ‘/private/var/folders/3t/lc9m5zv966934dq4d6gf6l60g_dj6h/T/Rtmp1OVDRI/remotes85b05a137826/renozao-xbioc-b4f512c/DESCRIPTION’ ...
─ preparing ‘xbioc’:
✓ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ looking to see if a ‘data/datalist’ file should be added
─ building ‘xbioc_0.1.18.tar.gz’
Warning: invalid uid value replaced by that for user 'nobody'
Warning: invalid gid value replaced by that for user 'nobody'
* installing *source* package ‘xbioc’ ...
** using staged installation
** R
** data
** inst
** byte-compile and prepare package for lazy loading
Error: (converted from warning) package ‘S4Vectors’ was built under R version 3.6.3
Execution halted
ERROR: lazy loading failed for package ‘xbioc’
* removing ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/xbioc’
Error: Failed to install 'MuSiC' from GitHub:
Failed to install 'xbioc' from GitHub:
(converted from warning) installation of package ‘/var/folders/3t/lc9m5zv966934dq4d6gf6l60g_dj6h/T//Rtmp1OVDRI/file85b046b54149/xbioc_0.1.18.tar.gz’ had non-zero exit status
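The "(converted from warning)" lines in this log indicate that a warning (a package built under a newer R version) was promoted to an error during installation. The remotes package, which devtools::install_github() relies on, reads the environment variable R_REMOTES_NO_ERRORS_FROM_WARNINGS; setting it before installing is a commonly used workaround. This is a sketch of that workaround, not a fix confirmed by the MuSiC author, and the install call is left commented out so the snippet runs without network access.

```r
# Tell remotes not to turn installation warnings into errors.
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")

# Confirm the setting is visible to the current R session.
flag <- Sys.getenv("R_REMOTES_NO_ERRORS_FROM_WARNINGS")
print(flag)

# Then retry the installation (requires network access and devtools):
# devtools::install_github("xuranw/MuSiC")
```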
Hi Xuran,
Thank you for writing this package! I was trying to run music_prop.cluster on some single cell data, as per the tutorial and using the bulk.construct function as well. I've run the same data on music_prop and get a proper result. Unfortunately, when I go to run it after identifying clusters and getting a list of the variably expressed genes for each group, I end up with the error message: Error in nnls(D.weight, Y.weight): NA/NaN/Inf in foreign function call (arg1):
Would you have any suggestions as to how to get around this?
Thank you!
Orion
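One thing worth checking for this nnls error is whether any input contains non-finite values before the call. The sketch below shows how an all-zero sample produces NaN when counts are converted to relative abundance, and how to locate such samples. The matrix is hypothetical, and this is only one possible source of NA/NaN/Inf, not a diagnosis of the data above.

```r
# Hypothetical counts: genes x samples; sample s2 has no counts at all.
counts <- matrix(c(5, 0, 5,
                   0, 0, 0),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# Relative abundance: divide each column by its total (0/0 gives NaN).
rel <- sweep(counts, 2, colSums(counts), "/")

# Samples and genes carrying NA, NaN or Inf.
bad.samples <- colnames(rel)[colSums(!is.finite(rel)) > 0]
bad.genes <- rownames(rel)[rowSums(!is.finite(rel)) > 0]

print(bad.samples)
print(bad.genes)
```

The same is.finite() check can be applied to the marker-restricted subsets passed to music_prop.cluster, since restricting to a short marker list makes all-zero samples more likely.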
I read through the paper and went through the tutorial and I am having trouble figuring out how to best use the outputs. From the outputs is it possible to get cell type specific counts for each gene for the bulk rna-seq samples? Then take this information to do a differential expression analysis using edgeR and then throw it into GSEA?
Hello,
Sadly, loading the RData file results in two character lists and not one R object.
Best
Thank you for writing this great package. One small issue: I tried running the example in the tutorial, but with ct.cov = TRUE. I get the error object 'i' not found. I believe the error comes from line 163 in utils.R, where the for loop defining i is missing. See line 201 in the same file for comparison.
Hi, Xuranw
Thanks for your great package.
I just have a question about your paper. Since you compared CIBERSORT with MuSiC in the paper, what signature did you use for CIBERSORT?
Looking forward to your response
Thanks
Fei
I'm having issues scaling the single cell to the bulk experiment with my own data. Using your simulation function I can simulate a bulk set from the single-cell data, and then use that to deconvolute that using other single-cell data sets. However, when I then try to use that to deconvolute real bulk data sets, the model performs poorly. I hypothesize this is due to scaling, but I am unsure.
Thanks
Hello Xuran! Thanks for your great package!
I've been able to run MuSiC cell type estimation on my data of interest (brain). However, several cell types are transcriptionally very closely related to one another, yet have significant functional distinctions. Because of this, I want to run music_prop.cluster on my data in order to obtain more reliable results.
However, as noted in #15, how to select differentially expressed genes among these groups using the output of music_basis is explained neither in the vignettes nor in the paper itself. So how does one properly build one's own group.marker list from the music_basis output?
From your experience as the package creator, what cutoff should be used to select genes from the design matrix, for example?
Hi,
I am running the code that is available in the tutorial, but I get the following error when I run music_prop.cluster:
Est.mouse.bulk = music_prop.cluster(bulk.eset = Mouse.bulk.eset, sc.eset = Mousesub.eset, group.markers = Immune.marker, group = 'clusterType', clusters = 'cellType', samples = 'sampleID', clusters.type = clusters.type)
Error in music_prop.cluster(bulk.eset = Mouse.bulk.eset, sc.eset = Mousesub.eset, :
Cluster number is not matching!
Hi,
Installation failed with the following message:
Downloading GitHub repo xuranw/MuSiC@master
Skipping 2 packages not available: Biobase, bseqsc
Installing 2 packages: bseqsc, nnls
Installing packages into ‘/Volumes/Documents/Users/akihoji/Library/R/3.x/library’ (as ‘lib’ is unspecified)
Error: (converted from warning) package ‘bseqsc’ is not available (for R version 3.5.1)
The culprit is one of the dependencies, bseqsc. I tried to install it manually with
install_github('hutuqiu/BSeQC')
but I get the following error:
Error: HTTP error 404.
Any workaround for this ?
I'm trying to run the music_prop command and have been getting the following error:
Error in colMeans(S, na.rm = TRUE): 'x' must be an array of at least two dimensions
Traceback:
Here is the full write out of the command for reference:
Est.prop.nabec = music_prop(bulk.eset = nabecBulk, sc.eset = nabecSet, clusters = 'Celltype',
samples = 'sample', select.ct = c('Neuron', 'Oligodendrocyte', 'OPC',
'Astrocyte','Microglia'), verbose = F)
and the expression set of the single cell data:
ExpressionSet (storageMode: lockedEnvironment)
assayData: 27009 features, 3000 samples
element names: exprs
protocolData: none
phenoData
sampleNames: AAACCCAAGACAACAT-1 AAACCCAAGATGAATC-1 ...
ATCAGGTGTTTAGAGA-1 (3000 total)
varLabels: Celltype cluster sample
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:
and bulk data:
ExpressionSet (storageMode: lockedEnvironment)
assayData: 59032 features, 311 samples
element names: exprs
protocolData: none
phenoData: none
featureData: none
experimentData: use 'experimentData(object)'
Annotation:
Any idea what might be causing the problem?
Hello,
Firstly, thanks for this analysis tool. So far I find it pretty intuitive to use & helpful.
I stumbled across something I thought a bit odd, was hoping you might be able to help me.
I used a mixture of 4 to 5 sc-RNAseq datasets with cell-types 'T-cell', 'Fibroblast', 'Macrophage', 'Endothelial', 'CAF' & 'Epithelial' and the transcriptomes are pretty similar across the datasets.
I made ExpressionSet objects for my sc-RNAseq datasets and for 2 bulk tissue RNA datasets, one of which I got using the TCGAbiolinks package in R (I mention this because it's the odd one).
I use the following
Est.1 <- music_prop(bulk.eset = bt_data, sc.eset = sc_data, clusters = 'Cell-type',samples = "SampleID", verbose = T)
and everything has proportions as expected. (1st plot)
jitter_estproportions_wShih.pdf
But I try with the second bt_data set, and I lose all of my T-cells?
Est.2 <- music_prop(bulk.eset = bt_data_2nd, sc.eset = sc_data, clusters = 'Cell-type',samples = "SampleID", verbose = T)
I checked the bt_data_2nd matrix and there are definitely T-cell markers present. If I remove one of the datasets from the sc_data and rerun
Est.3 <-music_prop(bulk.est = bt_data_2nd, sc.est = sc_data.minus1, cluster = 'Cell-type', samples = "SampleID", verbose = T)
The NNLS seems to find T-cells, but not MuSiC.
jitter_NNLS_tcells.pdf
My sc-RNAseq datasets are usually processed as Seurat objects, so I pulled T-cell markers across all sc-RNAseq datasets and they're definitely in the bt_data_2nd (TCGA bulk RNAseq dataframe). So I don't understand why I am getting flat zeroes for T-cells.
bt_data (the one that had all cell-types after deconvolution)
ExpressionSet (storageMode: lockedEnvironment)
assayData: 13104 features, 548 samples
element names: exprs
protocolData: none
phenoData
sampleNames: TCGA.20.0987 TCGA.23.1031 ... TCGA.13.1819 (548 total)
varLabels: EPCAM PTPRC ... VWF (7 total)
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:
bt_data_2nd (the one that has no T-cells, apparently)
ExpressionSet (storageMode: lockedEnvironment)
assayData: 56537 features, 229 samples
element names: exprs
protocolData: none
phenoData
sampleNames: TCGA-04-1331-01A-01R-1569-13 TCGA-04-1332-01A-01R-1564-13 ...
TCGA-WR-A838-01A-12R-A406-31 (229 total)
varLabels: Sample.ID Definition ... sampleNames (5 total)
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:
my sc-RNAseq datasets
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22390 features, 38789 samples
element names: exprs
protocolData: none
phenoData
sampleNames: E27_Peri_AAACCCAAGACGCCAA E27_Peri_AAACCCAAGAGTCAGC ...
Shih_ctcaatgtcggcaccttc (38789 total)
varLabels: Cell-type SampleID
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:
Just to show that my second bulk-RNAseq dataset does indeed include T-cell markers.
I intersected the marker genes for each cell type, found using Seurat, across all my sc-RNAseq datasets, leaving me with vectors of marker genes for each cell type that overlap across all the sc-RNAseq datasets.
A quick glance shows these genes are present and have expression values (T-cell markers):
TCGA-61-1724-01A-01R-1568-13 TCGA-61-1736-01B-01R-1568-13
IL32 3266 14180
PTPRC 465 1027
NKG7 141 365
HCST 583 464
TCGA-61-1738-01A-01R-1567-13 TCGA-61-1741-01A-02R-1567-13
IL32 2443 18459
PTPRC 755 1139
NKG7 180 317
HCST 784 248
TCGA-61-1918-01A-01R-1568-13 TCGA-61-1919-01A-01R-1568-13
IL32 3651 14568
PTPRC 1164 4614
NKG7 76 2053
HCST 187 294
TCGA-61-2101-01A-01R-1568-13 TCGA-61-2102-01A-01R-1568-13
IL32 4881 6006
PTPRC 2574 1441
NKG7 870 362
HCST 482 291
TCGA-61-2109-01A-01R-1568-13 TCGA-61-2110-01A-01R-1568-13
IL32 5623 2866
PTPRC 1422 725
NKG7 580 717
HCST 354 769
TCGA-61-2113-01A-01R-1568-13 TCGA-VG-A8LO-01A-11R-A406-31
IL32 572 2747
PTPRC 147 320
NKG7 176 497
HCST 124 489
Much the same for the marker genes of the other cell types.
I'd appreciate any help or suggestions as to why I might be getting these results.
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.5 (Nitrogen)
Matrix products: default
BLAS: /gpfs/igmmfs01/software/pkg/el7/apps/R/3.6.0/lib64/R/lib/libRblas.so
LAPACK: /gpfs/igmmfs01/software/pkg/el7/apps/R/3.6.0/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] reshape2_1.4.3 xbioc_0.1.17 AnnotationDbi_1.46.1 IRanges_2.18.2
[5] S4Vectors_0.22.1 Seurat_3.1.0 MuSiC_0.1.1 ggplot2_3.2.1
[9] nnls_1.4 Biobase_2.44.0 BiocGenerics_0.30.0
loaded via a namespace (and not attached):
[1] Rtsne_0.15 colorspace_1.4-1 ggridges_0.5.1 rstudioapi_0.10
[5] leiden_0.3.1 listenv_0.7.0 npsurv_0.4-0 MatrixModels_0.4-1
[9] bit64_0.9-7 ggrepel_0.8.1 codetools_0.2-16 splines_3.6.0
[13] R.methodsS3_1.7.1 lsei_1.2-0 zeallot_0.1.0 jsonlite_1.6
[17] mcmc_0.9-6 ica_1.0-2 cluster_2.1.0 png_0.1-7
[21] R.oo_1.22.0 uwot_0.1.3 sctransform_0.2.0 BiocManager_1.30.4
[25] compiler_3.6.0 httr_1.4.1 backports_1.1.4 assertthat_0.2.1
[29] Matrix_1.2-17 lazyeval_0.2.2 htmltools_0.3.6 quantreg_5.51
[33] tools_3.6.0 rsvd_1.0.2 igraph_1.2.4.1 coda_0.19-3
[37] gtable_0.3.0 glue_1.3.1 RANN_2.6.1 dplyr_0.8.3
[41] Rcpp_1.0.2 vctrs_0.2.0 gdata_2.18.0 ape_5.3
[45] nlme_3.1-140 gbRd_0.4-11 lmtest_0.9-37 stringr_1.4.0
[49] globals_0.12.4 lifecycle_0.1.0 irlba_2.3.3 gtools_3.8.1
[53] future_1.14.0 MASS_7.3-51.4 zoo_1.8-6 scales_1.0.0
[57] SparseM_1.77 RColorBrewer_1.1-2 yaml_2.2.0 memoise_1.1.0
[61] reticulate_1.13 pbapply_1.4-1 gridExtra_2.3 pkgmaker_0.28
[65] stringi_1.4.3 RSQLite_2.1.2 caTools_1.17.1.2 bibtex_0.4.2
[69] Rdpack_0.11-0 SDMTools_1.1-221.1 rlang_0.4.0 pkgconfig_2.0.2
[73] bitops_1.0-6 lattice_0.20-38 ROCR_1.0-7 purrr_0.3.2
[77] labeling_0.3 htmlwidgets_1.3 bit_1.1-14 cowplot_1.0.0
[81] tidyselect_0.2.5 RcppAnnoy_0.0.12 plyr_1.8.4 magrittr_1.5
[85] R6_2.4.0 gplots_3.0.1.1 DBI_1.0.0 pillar_1.4.2
[89] withr_2.1.2 fitdistrplus_1.0-14 survival_2.44-1.1 tibble_2.1.3
[93] future.apply_1.3.0 tsne_0.1-3 crayon_1.3.4 KernSmooth_2.23-15
[97] plotly_4.9.0 grid_3.6.0 data.table_1.12.2 blob_1.2.0
[101] metap_1.1 digest_0.6.20 xtable_1.8-4 tidyr_1.0.0
[105] MCMCpack_1.4-4 R.utils_2.9.0 RcppParallel_4.4.3 munsell_0.5.0
[109] registry_0.5-1 viridisLite_0.3.0
Hi Xuran,
Thank you for writing this package! I was working through the MuSiC tutorial, and when I run the music_prop.cluster function I get the following error:
Est.mouse.bulk = music_prop.cluster(bulk.eset = Mouse.bulk.eset, sc.eset = Mousesub.eset,group.markers = IEmarkers, clusters =
'cellType',group = 'clusterType', samples = 'sampleID',clusters.type = clusters.type)
Error in if (sum(abs(p.weight.new - p.weight)) < eps) { :
missing value where TRUE/FALSE needed
Could you give me some suggestions?
Thank you!
Hello.
I am trying to use MuSiC and want to plot the heatmap of informative and non-informative genes. Can you guide me on how to extract that matrix?
Thank you,
Kalpit
Hi,
I find this method really cool and promising but I am having issues trying to implement it to my data.
Can you provide a vignette (or section of one) describing how to go from expression matrix to the necessary input file for MuSiC? Or perhaps you can construct the single cell reference files from the Tabula muris or MCA datasets?
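Pending an official vignette, the general shape of the input can be sketched as follows: a genes-by-cells count matrix plus a per-cell annotation table whose row names match the matrix columns, wrapped in a Biobase ExpressionSet. The column names cellType and sampleID follow the calls quoted elsewhere on this page; the data are hypothetical, and the ExpressionSet step only runs if Biobase is installed.

```r
# Hypothetical raw counts: 2 genes x 4 cells.
counts <- matrix(c(5, 0, 3, 1, 2, 8, 0, 4), nrow = 2,
                 dimnames = list(c("GeneA", "GeneB"), paste0("cell", 1:4)))

# Per-cell annotation; row names must match the matrix column names.
pheno <- data.frame(cellType = c("alpha", "alpha", "beta", "beta"),
                    sampleID = c("subj1", "subj2", "subj1", "subj2"),
                    row.names = colnames(counts),
                    stringsAsFactors = FALSE)

# Sanity check before building the object.
aligned <- identical(rownames(pheno), colnames(counts))

# Build the ExpressionSet only when Biobase is available.
if (requireNamespace("Biobase", quietly = TRUE)) {
  sc.eset <- Biobase::ExpressionSet(
    assayData = counts,
    phenoData = Biobase::AnnotatedDataFrame(pheno))
  print(sc.eset)
}
```

The bulk data can be wrapped the same way from a genes-by-samples matrix, and the cell type and subject columns are then referenced by name through the clusters and samples arguments.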
dear Dr.xuranw,
I have studied your paper recently and I think it's a great tool for estimating cell proportions. A few questions puzzled me, and I wonder if I can ask you:
1. When starting from scRNA-seq data from multiple subjects, are the immune cells (e.g. B cells, CD4+ cells) from peripheral blood mononuclear cells the same as tumour-infiltrating immune cells?
2. As you show in the overview of the MuSiC framework, MuSiC calculates both the cross-subject mean and the cross-subject variance for these genes in each cell type. Why are informative genes selected using only the cross-subject variance? Can genes with low cross-subject variance alone identify different cell types?
Best wishes,
huitingxiao
Hi Xu Ran
Many thanks for developing MuSiC! I really like the idea of giving each gene a weight instead of establishing an overall cutoff with a signature matrix.
I am quite new to deconvolution techniques and to R as well, and I hope you can help my understanding.
1. For the QC of scRNA-seq data, I was wondering how MuSiC filters out doublets/multiplets, dying cells and outliers that might give falsely high gene expression.
2. I read over your code, the part where it says:
> GSE50244.EMTAB.prop
I couldn't find the assignment of this variable. I'm guessing the GSE50244 set was concatenated with the EMTAB set, but the intermediate steps are not clear to me.
Thank you
When installing the package, I see notes about incorrect usage of break:
Note: break used in wrong context: no loop is visible
This occurs in analysis.R and in utils.R. The warning is descriptive in this instance; break is seemingly used in place of stop() in one place, and appears in unreachable code in another (since nothing after the return call will ever be run).
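To illustrate the difference the note points at: break only makes sense inside a loop, whereas stop() signals an error from anywhere in a function. A minimal sketch with a hypothetical helper check.overlap, using the 10% threshold and message quoted from analysis.R elsewhere on this page:

```r
# Hypothetical validation helper: abort with stop(), not break.
check.overlap <- function(n.common, n.total) {
  if (n.common < 0.1 * n.total) {
    stop("Not enough common genes!")   # raises an error the caller can catch
  }
  n.common
}

# The failing case surfaces as a catchable error with a clear message.
msg <- tryCatch(check.overlap(5, 1000),
                error = function(e) conditionMessage(e))
# The passing case simply returns its input.
ok <- check.overlap(500, 1000)

print(msg)
print(ok)
```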
Hello,
I am running the MuSiC tutorial to check that everything runs properly before moving to my own data, but while running the following call:
m.prop.GSE50244 = rbind(melt(GSE50244.EMTAB.prop$Est.prop.weighted),
melt(GSE50244.EMTAB.prop$Est.prop.allgene), melt(Est.prop.bseq),
melt(data.matrix(Est.prop.cibersort)))
R prints the error:
Error in melt(GSE50244.EMTAB.prop$Est.prop.weighted) :
object 'GSE50244.EMTAB.prop' not found
I have tried to figure it out on my own. For example, I found that the link in:
"load(gzcon(url('https://xuranw.github.io/MuSiC/data/GSE50244CIBERSORT.RData')))"
is not right (Error 404), and I changed it to the one I think could be the right one: 'https://github.com/xuranw/MuSiC/tree/master/vignettes/data/GSE50244CIBERSORT.RData'
I have been looking around for the object 'GSE50244.EMTAB.prop' but I haven't been able to find it.
Thanks for your help,
David
It is worth pointing out in the README and vignettes that one must install the xbioc package to be able to install MuSiC, since this package is neither on Bioconductor nor on CRAN.
Dear xuranw:
Thanks for your great package! I'm working through the MuSiC tutorial. When I run the Prop_comp_multi and Abs_diff_multi commands, I get the following errors:
Prop_comp_multi(prop.real = data.matrix(XinT2D.construct.full$prop.real),
prop.est = list(data.matrix(Est.prop.Xin$Est.prop.weighted),
data.matrix(Est.prop.Xin$Est.prop.allgene)),
method.name = c('MuSiC', 'NNLS'),
title = 'Heatmap of Real and Est. Prop' )
Error: Aesthetics must be either length 1 or the same as the data (9): label
Abs_diff_multi(prop.real = data.matrix(XinT2D.construct.full$prop.real),
prop.est = list(data.matrix(Est.prop.Xin$Est.prop.weighted),
data.matrix(Est.prop.Xin$Est.prop.allgene)),
method.name = c('MuSiC', 'NNLS'),
title = 'Abs.Diff between Real and Est. Prop' )
Error: Aesthetics must be either length 1 or the same as the data (4): label
Could you give me some advice?
Thanks!
Hello Xuran,
Thank you very much for creating this package, we have found it helpful and are still trying to figure out some issues. I have been trying to follow the recursive algorithm explanation:
[https://xuranw.github.io/MuSiC/articles/MuSiC.html#estimation-of-cell-type-proportions-with-pre-grouping-of-cell-types]
I have been having trouble with the IEmarkers object, when I downloaded the IEmarkers.RData object, I found an "Immune.marker" and "Epith.marker" object. I assumed I needed to turn them into a list, which I did with IEmarkers = list(C3 = Epith.marker, C4 = Immune.marker)
However I get an error:
Error in music_prop.cluster(bulk.eset = mouse.bulk, sc.eset = mouse.sc, : Cluster number is not matching!
when typing in the command you provided. I also see that in the tutorial, the command:
Est.mouse.bulk = music_prop.cluster(bulk.eset = Mouse.bulk.eset, sc.eset = Mousesub.eset, group.markers = IEmarkers, clusters = 'cellType', group = 'clusterType', samples = 'sampleID', clusters.type = clusters.type)
uses 'group', but the options from the Documentation have the name of the argument as 'groups', is that the one we should be using? Any help is appreciated.
Thanks, T.J.
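One thing to check for "Cluster number is not matching!": if the function compares the number of entries in group.markers with the number of groups in clusters.type (an assumption, not confirmed from the source here), then a two-element marker list cannot match a four-group clusters.type. The sketch below shows the mismatch and a marker list padded with NULL for groups that need no markers. The grouping of cell types into C1 to C4 and the marker vectors are hypothetical placeholders.

```r
# Hypothetical grouping of cell types into four clusters.
clusters.type <- list(C1 = "Neutro",
                      C2 = "Podo",
                      C3 = c("Endo", "CD-PC", "LOH", "CD-IC", "DCT", "PT"),
                      C4 = c("Macro", "Fib", "B lymph", "NK", "T lymph"))

# Placeholder marker vectors standing in for the objects in IEmarkers.RData.
Epith.marker <- c("geneE1", "geneE2")
Immune.marker <- c("geneI1", "geneI2")

# Two entries: does not line up with four groups.
IEmarkers.short <- list(C3 = Epith.marker, C4 = Immune.marker)

# One entry per group; list() keeps explicit NULL elements.
IEmarkers.full <- list(C1 = NULL, C2 = NULL,
                       C3 = Epith.marker, C4 = Immune.marker)

print(length(IEmarkers.short) == length(clusters.type))
print(length(IEmarkers.full) == length(clusters.type))
```

Passing a bare marker vector instead of a list would fail a length comparison of this kind for the same reason.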
Hi,
Thanks for the package. Is there a minimum number of cells required per cell type in the single-cell dataset used?
Best wishes
Nurun
First of all, thank you for your wonderful work on data analysis.
While skimming through your R code, I found a part of the function "music.iter.ct" in the "analysis.R" file whose flow I had difficulty understanding.
The following is the part of the music.iter.ct function that puzzled me:
common.gene = intersect(names(Y), rownames(D))
common.gene = intersect(common.gene, colnames(Sigma.ct))
if(length(common.gene)< 0.1*min(length(Y), nrow(D), ncol(Sigma.ct))){
stop('Not enough common genes!')
}
Y = Y[match(common.gene, names(Y))];
D = D[match(common.gene, rownames(D)), ]
Sigma.ct = Sigma.ct[, match(common.gene, colnames(Sigma))]
In the final line, which updates Sigma.ct to include only common genes, the variable Sigma is used, but I could not find any variable named Sigma elsewhere inside the music.iter.ct function. I was wondering whether I missed something or whether it is just a typo.
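If it is a typo, the intended subsetting would presumably index Sigma.ct by its own column names. A toy sketch of that reading, with hypothetical values (this is not the package's actual patch):

```r
# Toy cross-cell-type matrix: cell types in rows, genes in columns.
Sigma.ct <- matrix(1:8, nrow = 2,
                   dimnames = list(c("ctA", "ctB"),
                                   c("g1", "g2", "g3", "g4")))
common.gene <- c("g3", "g1")

# Presumed intent: match against colnames(Sigma.ct), not an undefined Sigma.
# drop = FALSE keeps a matrix even if only one gene is shared.
Sigma.ct <- Sigma.ct[, match(common.gene, colnames(Sigma.ct)), drop = FALSE]

print(Sigma.ct)
```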
In addition, I found a similar issue in the music_prop function.
The following is the code with the issue:
m.sc = match(cm.gene, rownames(sc.basis$Disgn.mtx)); m.bulk = match(cm.gene, bulk.gene)
D1 = sc.basis$Disgn.mtx[m.sc, ]; M.S = colMeans(sc.basis$S, na.rm = T);
Yjg = relative.ab(exprs(bulk.eset)[m.bulk, ]); N.bulk = ncol(bulk.eset);
if(ct.cov){
Sigma.ct = sc.basis$Sigma.ct[, m.sc];
if(sum(Yjg[, i] == 0) > 0){
D1.temp = D1[Yjg[, i]!=0, ];
Yjg.temp = Yjg[Yjg[, i]!=0, i];
Sigma.ct.temp = Sigma.ct[, Yjg[,i]!=0];
if(verbose) message(paste(colnames(Yjg)[i], 'has common genes', sum(Yjg[, i] != 0), '...') )
}else{
D1.temp = D1;
Yjg.temp = Yjg[, i];
Sigma.ct.temp = Sigma.ct;
if(verbose) message(paste(colnames(Yjg)[i], 'has common genes', sum(Yjg[, i] != 0), '...'))
}
Unfortunately, there is no predefined variable "i" in scope.
However, the parallel code in the else clause (the case ct.cov = FALSE) contains a for loop using the variable "i". I believe a for-loop declaration is missing. Thank you.
The code in function music_prop:
if(ct.cov){
Sigma.ct = sc.basis$Sigma.ct[, m.sc];
if(sum(Yjg[, i] == 0) > 0){
D1.temp = D1[Yjg[, i]!=0, ];
Yjg.temp = Yjg[Yjg[, i]!=0, i];
Sigma.ct.temp = Sigma.ct[, Yjg[,i]!=0];
...
...
Is a for loop missing here, like for(i in 1:N.bulk){}?
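For illustration, here is the quoted fragment wrapped in the loop the reporters suggest, run on hypothetical toy inputs. It only shows that the per-sample subsetting needs i to range over the bulk samples; it is not the package's actual fix, and the estimation step is omitted.

```r
# Toy relative abundances: genes x bulk samples (bulk1 has one zero gene).
Yjg <- matrix(c(0.5, 0.0, 0.5,
                0.2, 0.3, 0.5),
              nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), c("bulk1", "bulk2")))
# Toy design matrix: genes x cell types.
D1 <- matrix(1:6, nrow = 3,
             dimnames = list(rownames(Yjg), c("ctA", "ctB")))

N.bulk <- ncol(Yjg)
n.used <- integer(N.bulk)

# seq_len() is safer than 1:N.bulk when N.bulk could be 0.
for (i in seq_len(N.bulk)) {
  keep <- Yjg[, i] != 0                    # genes observed in sample i
  D1.temp <- D1[keep, , drop = FALSE]      # matching rows of the design matrix
  Yjg.temp <- Yjg[keep, i]
  n.used[i] <- length(Yjg.temp)            # genes available for this sample
}

print(n.used)
```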