vivianstats / scimpute

Accurate and robust imputation of scRNA-seq data

Home Page: https://www.nature.com/articles/s41467-018-03405-7

R 78.75% Jupyter Notebook 6.71% HTML 14.53%
imputation single-cell-rna-seq r-package

scimpute's Introduction

scImpute: accurate and robust imputation of scRNA-seq data

Wei Vivian Li, Jingyi Jessica Li 2019-08-20

Latest News

2019/08/20:

  • Since the development of scImpute, new imputation methods have been proposed for scRNA-seq data. These methods make different model assumptions and perform differently across datasets. Discussing and comparing existing imputation methods benefits both method development and bioinformatic applications. However, we have identified several issues in existing evaluations and comparisons of imputation methods, and we discuss these issues in our commentary, which is available at arXiv.

2018/08/15:

  • Version 0.0.9 is released!
  • More robust implementation of dimension reduction.
  • Faster calculation of cell similarity.

Introduction

scImpute is developed to accurately and robustly impute the dropout values in scRNA-seq data. scImpute can be applied to the raw read count matrix before users perform downstream analyses such as

  • dimension reduction of scRNA-seq data
  • normalization of scRNA-seq data
  • clustering of cell populations
  • differential gene expression analysis
  • time-series analysis of gene expression dynamics

Users can refer to our paper, "An accurate and robust imputation method scImpute for single-cell RNA-seq data", for a detailed description of the modeling and applications.

Any suggestions on the package are welcome! For technical problems, please report to Issues. For suggestions and comments on the method, please contact Wei ([email protected]) or Dr. Jessica Li ([email protected]).

Installation

The package is not on CRAN yet. To install it, please run the following code in R:

install.packages("devtools")
library(devtools)

install_github("Vivianstats/scImpute")

Quick start

scImpute can be easily incorporated into an existing scRNA-seq analysis pipeline. Its only input is the raw count matrix, with rows representing genes and columns representing cells. It outputs an imputed count matrix with the same dimensions. In the simplest case, the imputation task can be done with a single function, scimpute:

scimpute(# full path to raw count matrix
         count_path = system.file("extdata", "raw_count.csv", package = "scImpute"), 
         infile = "csv",           # format of input file
         outfile = "csv",          # format of output file
         out_dir = "./",           # full path to output directory
         labeled = FALSE,          # cell type labels not available
         drop_thre = 0.5,          # threshold set on dropout probability
         Kcluster = 2,             # 2 cell subpopulations
         ncores = 10)              # number of cores used in parallel computation

This function returns the column indices of outlier cells, and creates a new file scimpute_count.csv in out_dir to store the imputed count matrix. Please note that we recommend applying scImpute on the whole-genome count matrix. A filtering step on genes is acceptable but most genes should be present to ensure robust identification of dropouts.
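As a rough illustration of the input layout described above (genes in rows, cells in columns), the sketch below builds a toy count matrix, writes it as CSV, and reads it back. The toy values, the file name toy_count.csv, and the assumption that the first CSV column holds gene names are illustrative and not taken from the package documentation; scimpute itself is not called here.

```r
# Toy raw count matrix: 4 genes (rows) x 3 cells (columns); values are made up.
set.seed(1)
toy_count <- matrix(rpois(12, lambda = 3), nrow = 4, ncol = 3,
                    dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

# Write to a temporary CSV; write.csv() puts the gene names in the first column.
toy_path <- file.path(tempdir(), "toy_count.csv")
write.csv(toy_count, toy_path)

# Read it back as a genes-by-cells matrix (assumed layout: header row,
# gene names in column 1).
toy_back <- as.matrix(read.csv(toy_path, header = TRUE, row.names = 1))

# The round trip should preserve the dimensions and the gene/cell names.
dim(toy_back)
```

A file written this way could then be passed as count_path with infile = "csv", assuming the package reads CSV files with the same layout.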

For detailed usage, please refer to the package manual or vignette.

scimpute's Issues

Question about a line of code

droprate[mucheck & droprate > drop_thre] = 0 in function imputation_model8, imputation_model.R.
I wonder what the reasoning behind this line is. Why is the value checked and set to zero? Could you explain more?

Could not load the scImpute library in R after installation: Conflicts with other packages

@Vivianstats : Thanks for developing the scImpute package. It appears to be really useful for imputing the scRNA data. I was trying to load the package and run it in my own test data and came across this error where the package could not be loaded by R. It appears to be a dependency conflict with other packages.

Can you please help in troubleshooting this error, better would be if this can be taken care of in the future releases of the package:

Output from RStudio R console:

library(scImpute)

Loading required package: doParallel
Loading required package: foreach
Error in value[3L] :
Package ‘foreach’ version 1.4.3 cannot be unloaded:
Error in unloadNamespace(package) : namespace ‘foreach’ is imported by ‘doSNOW’, ‘caret’, ‘Seurat’ so cannot be unloaded

Error in seq.default(lsmin, (Re(log2(midmin)) - 0.5), stepm) : 'from' must be a finite number

Hi Vivian,

I am trying to impute dropouts from a csv of UMI values ( 7 genes and 12244 cells).
The codes are listed below.
scimpute ("E:/gdT.csv", Kcluster = 2, out_dir = "E:/Cal", ncores = 1, drop_thre = 0.1),
but when I am at "calculating cell distances ...", I got error.

[1] "reading in raw count matrix ..."
[1] "number of genes in raw count matrix 7"
[1] "number of cells in raw count matrix 12244"
[1] "reading finished!"
[1] "imputation starts ..."
[1] "searching candidate neighbors ... "
[1] "inferring cell similarities ..."
[1] "dimension reduction ..."
[1] "calculating cell distances ..."
Error in seq.default(lsmin, (Re(log2(midmin)) - 0.5), stepm) :
'from' must be a finite number
I checked my code and cannot find a 'from' parameter. Can you help me fix it?

Thanks!

Yale

Input/Output Options

Dear Vivian,

thanks for making scImpute available. It works great for me. I just have a small question:
Could you maybe change the input and output options to R objects instead of only tsv/csv files? This would make it easier to include scImpute in my power simulation tool.

Kind regards,
Beate
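Until such an option exists, one possible workaround is a thin wrapper that writes the matrix to a temporary file, calls a file-based imputation function, and reads the result back. The sketch below is only an illustration: impute_matrix and fake_impute are hypothetical names, fake_impute is a stand-in that simply copies its input so the example runs without scImpute installed, and the output file name scimpute_count.csv inside out_dir follows the Quick start description.

```r
# Hypothetical helper: run a file-based imputation function on an in-memory
# matrix. `impute_file_fun` is expected to accept scimpute-like arguments
# (count_path, infile, outfile, out_dir) and to write "scimpute_count.csv"
# with out_dir used as a prefix.
impute_matrix <- function(count_mat, impute_file_fun, ...) {
  work <- tempfile("scimpute_")
  dir.create(work)
  on.exit(unlink(work, recursive = TRUE), add = TRUE)

  in_path <- file.path(work, "raw_count.csv")
  out_dir <- paste0(work, "/")          # trailing slash: out_dir acts as a prefix
  write.csv(count_mat, in_path)

  impute_file_fun(count_path = in_path, infile = "csv", outfile = "csv",
                  out_dir = out_dir, ...)

  as.matrix(read.csv(paste0(out_dir, "scimpute_count.csv"),
                     header = TRUE, row.names = 1))
}

# Stand-in for the real function so the sketch is runnable: copies the input.
fake_impute <- function(count_path, infile, outfile, out_dir, ...) {
  file.copy(count_path, paste0(out_dir, "scimpute_count.csv"))
  invisible(NULL)
}

m <- matrix(1:6, nrow = 3, dimnames = list(paste0("g", 1:3), c("c1", "c2")))
res <- impute_matrix(m, fake_impute)
```

With scImpute installed, one might pass scImpute::scimpute together with drop_thre, Kcluster and ncores in place of fake_impute; that usage is an untested assumption.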

Running Scimpute for Allele specific analysis

Hello, this is a great tool for single-cell analysis. I want to use it for data normalization in an allele-specific analysis. I have two count matrix files, alleleA and alleleB. Could you suggest how to use scImpute here: should I run it on each count file separately, or combine both files and run scImpute once?

Thank you

resume scImpute?

hi, i am running scImpute (thank you!) on our cluster with:

[1] "number of genes in raw count matrix 11213"
[1] "number of cells in raw count matrix 14865"

and just got a timeout after 24 hours. is it possible to set scImpute to resume since i already have 11 out of 16 scImputepars*.rds files? would it be valid to subset the object and do only the last 5 clusters?

Part of the data not imputed

I randomly subsampled my counts data for imputing. The labels are known and there are four classes. When I tried 1000 or 1500 cells, it worked well. But when I tried 2000, there are two clusters of cells that are not imputed and it's three when I tried 3000. Is there anything wrong with my parameter setting?
scimpute(counts.dat,
infile = "txt",
outfile = "txt",
out_dir = outdir,
labeled = TRUE,
labels = labels,
drop_thre = 0.5,
ncores = 2)

No parallel version

Is it possible to provide a version that completely turns off parallelization, or uses mclapply instead? The parallelization always throws the following error:

Error in serialize(data, node$con) : ignoring SIGPIPE signal
Calls: scimpute ... postNode -> sendData -> sendData.SOCKnode -> serialize
Execution halted

Question about input file

Hi Vivian, thank you for your work on the scImpute package! I have a question about the input count file. The input raw count file in the quick start is /path/to/package/extdata/raw_count.csv. To my understanding, counts in this file should be integers, but I see floats. Did I misunderstand?
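A quick way to check whether a matrix contains only whole numbers is sketched below. The helper name is_whole_valued and the toy matrices are hypothetical; the check says nothing about why the example file contains non-integer values.

```r
# Returns TRUE when every entry of a numeric matrix is (numerically) a whole number.
is_whole_valued <- function(x, tol = 1e-8) {
  all(abs(x - round(x)) < tol)
}

counts_int  <- matrix(c(0, 3, 12, 1), nrow = 2)    # made-up integer-like counts
counts_frac <- matrix(c(0, 3.5, 12, 1), nrow = 2)  # made-up values with a fraction

is_whole_valued(counts_int)
is_whole_valued(counts_frac)
```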

Error in parslist[valid_genes, , drop = FALSE] : subscript out of bounds

Hi,
Since support seems to have been dropped, it seems like a long shot, but any suggestion regarding this error would be much appreciated. Thanks anyone!

[1] "reading in raw count matrix ..."
[1] "number of genes in raw count matrix 36601"
[1] "number of cells in raw count matrix 43320"
[1] "reading finished!"
[1] "imputation starts ..."
[1] "searching candidate neighbors ... "
[1] "calculating cell distances ..."
starting worker pid=14864 on localhost:11838 at 00:22:19.511
starting worker pid=14858 on localhost:11838 at 00:22:19.513
starting worker pid=14861 on localhost:11838 at 00:22:19.514
starting worker pid=14856 on localhost:11838 at 00:22:19.525
starting worker pid=14865 on localhost:11838 at 00:22:19.526
starting worker pid=14863 on localhost:11838 at 00:22:19.553
starting worker pid=14862 on localhost:11838 at 00:22:19.563
starting worker pid=14855 on localhost:11838 at 00:22:19.564
starting worker pid=14853 on localhost:11838 at 00:22:19.565
starting worker pid=14852 on localhost:11838 at 00:22:19.565
starting worker pid=14866 on localhost:11838 at 00:22:19.576
starting worker pid=14857 on localhost:11838 at 00:22:19.579
starting worker pid=14854 on localhost:11838 at 00:22:19.591
starting worker pid=14860 on localhost:11838 at 00:22:19.599
starting worker pid=14859 on localhost:11838 at 00:22:19.601
[1] "estimating dropout probability for type 1 ..."
[1] 2000
[1] 4000
[1] 6000
[1] 8000
[1] 10000
[1] 12000
[1] 14000
[1] 16000
[1] 18000
[1] 20000
[1] 24000
[1] 26000
[1] 30000
[1] 32000
[1] 36000
[1] "searching for valid genes ..."
Error in parslist[valid_genes, , drop = FALSE] : subscript out of bounds

scimpute using only single thread

Dear Vivian,

I want to use scimpute to impute a 10X dataset with 25,000 cells and 20,000 genes. Unfortunately I noticed that scimpute is only using a single thread, even though ncores=8 is set. Here is how I am invoking scimpute:

scimpute(count_path="./data.csv", out_dir="./", ncores=8, Kcluster=15)

I am using Ubuntu 16.04 and R 3.4.2.
I also tested MAGIC, but I would rather use scimpute, for the reason you mentioned in the paper. However, when using MAGIC, all 8 threads are being used.
Did I miss something? As this dataset is very large, computation on a single thread takes very long.

Best

Input and Output

Dear author,

Thanks for developing the software! I have a few questions but did not find clear answer from your paper:

  1. is the input a count matrix?
  2. is the output a count matrix? Has it been normalized by library size, or log2-transformed?
  3. I tried to use a raw count matrix as an input, but find in output matrix there are some values as large as 400+ and three quantiles are between 0 and 1?

Thanks!

error of 'missing value where TRUE/FALSE needed'

Hi Vivian,
I am using scImpute on several of my scRNA-seq data sets. Thank you very much for providing such a wonderful tool. Some of my data sets worked really well, but not all of them. Here is my script:
scimpute(file.path(outDir,"tumor.tpm"),infile="csv",outfile="csv",out_dir=file.path(outDir,"malignant_"), labeled=TRUE,labels=as.vector(labels_tumor), type="TPM",genelen=genelen,drop_thre=0.5,ncores=num_cores)
And I got an error like this when running one data set:
[1] "reading in raw count matrix ..."
[1] "number of genes in raw count matrix 21985"
[1] "number of cells in raw count matrix 317"
Error in if (min(raw_count) < 0) { :
missing value where TRUE/FALSE needed

Do you have any suggestion on this error?
Thanks in advance!
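The message above is what R reports when an if condition evaluates to NA. The sketch below reproduces this with a made-up matrix containing one missing value; whether the reporter's file actually contains NA or non-numeric entries is an assumption that cannot be confirmed from the issue.

```r
# Made-up matrix with one missing value.
raw_count <- matrix(c(1, 0, NA, 4), nrow = 2,
                    dimnames = list(c("geneA", "geneB"), c("cell1", "cell2")))

# min() propagates NA, so the comparison yields NA rather than TRUE or FALSE.
min(raw_count) < 0

# Using that NA as an `if` condition raises an error; capture its message.
err <- tryCatch({
  if (min(raw_count) < 0) stop("negative values found")
  NULL
}, error = function(e) conditionMessage(e))

# Locate the missing entries so they can be inspected before imputation.
na_pos <- which(is.na(raw_count), arr.ind = TRUE)
na_pos
```

Running a check like which(is.na(...)) on the matrix read from the input file is one way to see whether missing values are present before calling scimpute.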

Error during cell similarity inference

Dear Vivian,
I am running into an error while running scImpute:

Error in rowSums(xi^2) : 'x' must be an array of at least two dimensions

I am running scImpute using this command:

scimpute(
    'data.scimputein.txt',
    infile='txt',
    outfile='txt',
    out_dir='aa_',
    drop_thre=0.5,
    Kcluster=1,
    ncores=2)

The full trace of scImpute is:

[1] "reading in raw count matrix ..."
[1] "number of genes in raw count matrix 13054"
[1] "number of cells in raw count matrix 3408"
[1] "inferring cell similarities ..."
Error in rowSums(xi^2) : 'x' must be an array of at least two dimensions

I send you the link to an example dataset: https://www.dropbox.com/s/51y1bd1z7a65qh5/data.7z?dl=0

Thanks,
Francesco

Overestimation with new version

Hi,

I recently tried your newest version (v0.0.6) on my data but encountered a problem. My experiment consists of taking an actual dataset, masking part of its values (artificially setting them to 0), and checking whether scImpute successfully recovers the data.
However, I could not make it work and maybe I did something wrong (like some preprocessing steps). I used K=1 and default parameters.
Do you have an idea about this issue? Here is what I observe:

image

Thanks
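The masking experiment described in this issue can be sketched roughly as follows. Everything here is an assumption made for illustration: the toy Poisson data, the 10% masking rate, and the log10(x + 1) error measure are not taken from the issue, and the imputed matrix is a placeholder rather than real scImpute output.

```r
set.seed(42)
# Toy "true" expression matrix (made-up data): 50 genes x 20 cells.
truth <- matrix(rpois(50 * 20, lambda = 5), nrow = 50)

# Mask 10% of the nonzero entries by setting them to zero.
nonzero_idx <- which(truth > 0)
mask_idx <- sample(nonzero_idx, size = floor(0.1 * length(nonzero_idx)))
masked <- truth
masked[mask_idx] <- 0

# `imputed` would come from the imputation method applied to `masked`;
# the masked matrix itself is used here as a placeholder.
imputed <- masked

# Root mean squared error on the log10(x + 1) scale, at masked positions only.
rmse_masked <- sqrt(mean((log10(imputed[mask_idx] + 1) -
                          log10(truth[mask_idx] + 1))^2))
rmse_masked
```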

Error in imputation

Hi Vivian,

I'm trying to run scImpute version 0.0.2, but it's giving me this error. I've included parts of my code and the error message below.

library(scImpute)
count_path <- "data/Y4_033017.csv"
infile <- "csv"
outfile <- "csv"
out_dir <- "data/"
drop_thre <- 0.5
ncores <- 6
scimpute(count_path, infile, outfile, out_dir, drop_thre, ncores)

[1] "reading in raw count matrix ..."
[1] "estimating mixture models ..."
[1] 500
[1] 1000
[1] 1500
[1] 2000
[1] 2500
[1] 3000
[1] 3500
[1] "imputing dropout values ..."
Error in type_list[[kk]] : subscript out of bounds

Thanks!
Mo

Typo about genelen in help

Dear developers,

Thanks for the great work and sharing. I noticed a typo in the ?scImpute that says:

genelen | An integer vector giving the length of each gene. Order must match the gene orders in the expression matrix. genelen must be specified if type = "count".

Whereas the vignette says the opposite:

We strongly suggest using scImpute on count matrices. However, if only TPM values are available, users can apply scImpute with gene lengths supplied. scImpute will use the gene lengths (sum of exon lengths) to scale the data , which ensures a good fitting of the mixture models. In this case, users need to specify type = "TPM" (type = "count" by default), and supply a vector genelen of gene lengths. The order of genes in genelen should match the order in the expression matrix.

In case you might want to fix that.

Best,
Nicolas

outfile is space delimited rather than tab delimited

Hello Vivian,

I found the output file by scImpute is space delimited rather than tab delimited when I specify outfile='txt'

My full scimpute call:

scimpute(count_path, infile = "txt", outfile = "txt", type = "count",
  out_dir, labeled = TRUE, drop_thre = 0.5, Kcluster = NULL,
  labels = labels$cell_type, genelen = genelen, ncores = 8)
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin17.6.0 (64-bit)
Running under: macOS High Sierra 10.13.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /usr/local/Cellar/openblas/0.3.0/lib/libopenblasp-r0.3.0.dev.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] scImpute_0.0.9    doParallel_1.0.11 iterators_1.0.10  foreach_1.4.4     penalized_0.9-51  survival_2.42-4  

loaded via a namespace (and not attached):
 [1] compiler_3.5.0   Matrix_1.2-14    tools_3.5.0      rsvd_0.9         Rcpp_0.12.18     codetools_0.2-15
 [7] splines_3.5.0    grid_3.5.0       kernlab_0.9-27   lattice_0.20-35 

Thanks!
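If a tab-delimited file is needed downstream, one workaround is to read the space-delimited output and rewrite it with explicit tab separators. The sketch below simulates such a file with made-up values; the file names are placeholders and the exact layout of the real scImpute output is an assumption.

```r
# Simulate a space-delimited matrix file (values and file names are placeholders).
space_path <- file.path(tempdir(), "scimpute_count.txt")
mat <- matrix(c(1.5, 0, 2, 3), nrow = 2,
              dimnames = list(c("geneA", "geneB"), c("cell1", "cell2")))
write.table(mat, space_path, sep = " ", quote = FALSE)

# read.table() splits on any whitespace by default. Because the header has one
# fewer field than the data rows, the first column is used as row names.
imputed <- as.matrix(read.table(space_path, header = TRUE))

# Rewrite with tab separators; col.names = NA adds an empty header cell
# above the row-name column.
tab_path <- file.path(tempdir(), "scimpute_count_tab.txt")
write.table(imputed, tab_path, sep = "\t", quote = FALSE, col.names = NA)
first_line <- readLines(tab_path, n = 1)
```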

Parameter Selection

With scImpute, how can we select parameters for drop_thre and Kclusters?
Should we always use the default?
Thanks!

Differential gene expression analysis following imputation

Dear Vivian,

Thanks very much for sharing this tool.......it's a very important contribution!

Regarding use of DESeq2 for differential gene expression analysis on the imputed data, did you round the imputed "counts" (as DESeq2 accepts integers only)? And did you use the package with standard parameters?

Finally, do you have any experience of using packages other than DESeq2/MAST on the output of scimpute?

Thanks very much

Ehsan
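For tools that require integer input, one straightforward option is to round the imputed values and store them as integers, as sketched below with made-up numbers. Whether the authors rounded the imputed values in their own analysis is not stated in this thread, so this is an illustration rather than their procedure.

```r
# Made-up imputed values: 3 genes x 2 cells.
imputed <- matrix(c(0, 2.4, 10.6, 0.49, 7, 3.2), nrow = 3,
                  dimnames = list(paste0("gene", 1:3), c("cell1", "cell2")))

# Round to the nearest whole number, then switch the storage mode to integer.
rounded <- round(imputed)
storage.mode(rounded) <- "integer"

rounded
```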

PBMC Dataset used in paper to produce Figure 6

Dear Vivian,
I'd like to use the same dataset that you used in scImpute paper. However, I'm unable to infer which one it is from 10X webpage.
The dataset of interest is the one used to produce Figure 6:

The dataset contains 4,500 peripheral blood mononuclear cells (PBMCs) of nine immune cell types, with 500 cells of each type.

I am contacting you since the Fresh 68k PBMCs (Donor A) has 68k cells, the other ones have different size (not 4.5k cells), so I don't understand which one you used.

Edit
Is the dataset the following:
https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.3.0/pbmc4k
?

Thanks,
Francesco

doParallel performance on windows edition R

When I try to use scImpute, the ncores setting has no effect. It uses all the CPU resources automatically.

As far as I know, Windows has a different parallel mechanism from Unix-based systems such as Linux and OSX, so the doParallel package is used differently on Windows and Unix. (Please see: https://stackoverflow.com/questions/45819337/option-cores-from-package-doparallel-useless-on-windows).

The point is that the R installed by conda appears to behave like a Windows edition. So I compared the system default R (3.5.2; Ubuntu 16.04) with the conda-installed R (3.5.1; Platform x86_64-conda_cos6-linux-gnu).

image

So can you check how to avoid this issue?

Error in mclapply

Hi!

I'm new to single-cell sequencing data and R programming. When I try to use scImpute to impute my data, I get the following problem and am not sure how to proceed. I sincerely hope you can help me out. Thanks!

scImpute::scimpute(count_path = "dataNeed.csv", out_dir = "C:/360Downloads/R_workspace",Kcluster = 9)
[1] "reading in raw count matrix ..."
[1] "number of genes in raw count matrix 26178"
[1] "number of cells in raw count matrix 1529"
[1] "inferring cell similarities ..."
Error in mclapply(1:J, function(id1) { :
'mc.cores' > 1 is not supported on Windows

Error in pca$x[, 1:npc] : subscript out of bounds

Hi,
I got the following error when running the imputation on a dataset. Could you please advise what's the reason for this error? Thanks.

[1] "reading in raw count matrix ..."
[1] "number of genes in raw count matrix 180253"
[1] "number of cells in raw count matrix 60"
[1] "reading finished!"
[1] "imputation starts ..."
[1] "searching candidate neighbors ... "
[1] "calculating cell distances ..."
Error in pca$x[, 1:npc] : subscript out of bounds
Calls: scimpute ... imputation_wlabel_model8 -> find_neighbors -> lapply -> FUN -> t
Execution halted
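In general, this kind of error can arise in R when more principal components are requested than prcomp returns, since it yields at most min(number of rows, number of columns) components. The sketch below shows this with made-up data; whether this is what happens inside scImpute for this dataset (for example, when a labeled cell type contains very few cells) is an unconfirmed guess.

```r
set.seed(7)
# Made-up data: 3 observations (rows) x 10 variables (columns).
x <- matrix(rnorm(30), nrow = 3)
pca <- prcomp(x)

# Only min(nrow, ncol) = 3 component columns are available.
ncol(pca$x)

# Asking for 5 components fails; the error message is captured as a string.
res <- tryCatch(pca$x[, 1:5], error = function(e) conditionMessage(e))
res
```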

Error in if (max(var_cum) <= var_thre) { : missing value where TRUE/FALSE needed

Hi, everyone.
I was running the script below. It took a long time and finally raised an error. I would like to know why. Thanks!

scimpute(file.path(outDir,"counts.csv"),
infile="csv",outfile="csv",
out_dir=file.path(outDir,"imputation_"),
labeled=TRUE,labels = as.vector(data$CellType),
drop_thre=0.5,ncores=1)
[1] "reading in raw count matrix ..."
[1] "number of genes in raw count matrix 33694"
[1] "number of cells in raw count matrix 87253"
[1] "reading finished!"
[1] "imputation starts ..."
[1] "searching candidate neighbors ... "
[1] "calculating cell distances ..."
Error in if (max(var_cum) <= var_thre) { :
missing value where TRUE/FALSE needed

Parallel Processing cannot find library

Greetings,

I am running this package on a computing cluster in parallel, and it keeps throwing an error like this one: "worker initialization failed: there is no package called 'penalized'". Since I am running this on a cluster, I have my own path that contains all of my installed packages. How can I ensure that the path to my packages is passed on to all of the workers? Thanks!
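One common approach, sketched below, is to export the current library paths through the R_LIBS environment variable before starting the parallel run. R reads R_LIBS at startup, and newly started R processes inherit the parent's environment. This assumes the workers are launched as fresh R processes from the same session; it is not a documented scImpute option, and some cluster schedulers may not forward environment variables.

```r
# Library paths known to the current session (includes any custom path
# added earlier with .libPaths("...")).
lib_paths <- .libPaths()

# Export them so that R processes started from this session pick them up.
Sys.setenv(R_LIBS = paste(lib_paths, collapse = .Platform$path.sep))

# Inspect what a freshly started worker would inherit.
Sys.getenv("R_LIBS")
```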

Processing time

Hi Vivian,

I am trying to impute dropouts from a csv of UMI values (around 40000 genes and 6000 cells).

The codes are listed below.

scImpute::scimpute(count_path = "all_umi_raw.csv", infile = "csv", outfile = "csv", out_dir = "test_scimpute", labeled = F, drop_thre = .5, Kcluster = 5, ncores = 10)

It takes over 60 gigabytes of ram and is running slowly.
Is that normal? Can I make this faster?

Thanks!

Error in pca$x[, 1:npc] : subscript out of bounds

Hello,
I obtained the following error when running the imputation on my dataset. Could you please advise what's the reason for this error? Thanks.

[1] "reading in raw count matrix ..."
[1] "number of genes in raw count matrix 180253"
[1] "number of cells in raw count matrix 60"
[1] "reading finished!"
[1] "imputation starts ..."
[1] "searching candidate neighbors ... "
[1] "calculating cell distances ..."
Error in pca$x[, 1:npc] : subscript out of bounds
Calls: scimpute ... imputation_wlabel_model8 -> find_neighbors -> lapply -> FUN -> t
Execution halted

Workflow of imputation, doublet detection and data integration

If I want to incorporate doublet detection in my scRNA-seq workflow, should I perform imputation before or after detecting and removing doublets? A second question: if I need to integrate multiple datasets, can I still perform imputation on each individual dataset and use the imputed output as input for integration? Thanks!

About the 'genelen'

"'genelen' must be specified when type = 'TPM'!". Do different genome versions give different gene lengths? Also, the file from UCSC gives the whole gene length, not the cDNA length.
Could you tell me where I can download a suitable genelen for human and mouse? @Vivianstats
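The vignette text quoted in another issue on this page describes gene length as the sum of exon lengths. A minimal sketch of that computation from an exon table is shown below; the coordinates are made up, the helper union_length is hypothetical, and merging overlapping exons before summing is an assumption about how such lengths are usually derived, not a statement about the annotation the authors used.

```r
# Toy exon table (made-up coordinates; 1-based, inclusive).
exons <- data.frame(
  gene  = c("geneA", "geneA", "geneA", "geneB"),
  start = c(100, 150, 400, 10),
  end   = c(200, 250, 500, 60)
)

# Length of the union of intervals, so overlapping exons are not counted twice.
union_length <- function(start, end) {
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]
  total <- 0
  cur_start <- start[1]
  cur_end <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end + 1) {
      cur_end <- max(cur_end, end[i])            # overlapping or adjacent: extend
    } else {
      total <- total + (cur_end - cur_start + 1) # close the current block
      cur_start <- start[i]
      cur_end <- end[i]
    }
  }
  total + (cur_end - cur_start + 1)
}

genelen <- sapply(split(exons, exons$gene),
                  function(d) union_length(d$start, d$end))
storage.mode(genelen) <- "integer"
genelen
```

The resulting named vector would still need to be reordered to match the gene order of the expression matrix, for example by indexing it with the matrix row names.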

Parameter setting in tSNE using PBMC datasets

Dear Vivian

I'm interested in scImpute and trying to reproduce the tSNE plot using the PBMC data
presented in Fig.6 in your paper.
I generated the PBMC datasets which consists of 10 cell types each of 500 cells,
which are randomly selected and plotted the tSNE results in Python.
Here are the results of tSNE.

[image: tsne_pbmc_imputed_10k]

In this experiment, I used the following two parameters in scImpute,
Kcluster=10 and drop_thre=0.5, and also used the following three parameters
in tSNE, perplexity = 20, n_iter = 5000, random_state = 0 .
However, when using tSNE, PBMC cells cannot be clearly classified after imputation.

Would you tell me how to set the parameters in scImpute and tSNE to reproduce
the PBMC plot ?

Thanks,
Natsu

'mc.cores' > 1 is not supported on Windows

When I used scImpute on windows, it threw out this error:
Error in mclapply(1:J, function(id1) { :
'mc.cores' > 1 is not supported on Windows
How can I solve this problem?
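The error comes from parallel::mclapply, which relies on process forking and therefore only accepts mc.cores = 1 on Windows. The sketch below picks a platform-safe core count; requested_cores and safe_cores are hypothetical names, and passing the result as ncores to scimpute is an untested assumption about how the package forwards that argument.

```r
# Number of cores one would like to use (placeholder value).
requested_cores <- 2L

# Forking is unavailable on Windows, so fall back to a single core there.
safe_cores <- if (.Platform$OS.type == "windows") 1L else requested_cores

# Small demonstration that mclapply runs with the chosen core count.
squares <- parallel::mclapply(1:4, function(i) i^2, mc.cores = safe_cores)
unlist(squares)
```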

Imputation error

Hi Vivian,

Thank you so much for the scImpute package. I am relatively new to it and I am still learning.

When I ran the code below with v0.0.9

scimpute("test.csv", infile="csv", outfile="csv", out_dir="data/", labeled=TRUE, labels=as.vector(cluster_labels),
	type="TPM", genelen=genelength, drop_thre=0.5, ncores=5)

I got the following output with errors

[1] "reading in raw count matrix ..."
[1] "number of genes in raw count matrix 12371"
[1] "number of cells in raw count matrix 15230"
[1] "reading finished!"
[1] "imputation starts ..."
[1] "searching candidate neighbors ... "
[1] "calculating cell distances ..."
starting worker pid=32606 on localhost:11462 at 19:33:49.328
[1] "estimating dropout probability for type 1 ..."
[1] 2000
[1] 4000
[1] 6000
[1] 8000
[1] 10000
[1] 12000
[1] "searching for valid genes ..."
[1] "imputing dropout values for type 1 ..."
Loading required package: scImpute
Loading required package: parallel
Loading required package: penalized
Loading required package: survival
Welcome to penalized. For extended examples, see vignette("penalized").
Loading required package: doParallel
Loading required package: foreach
Loading required package: iterators
loaded scImpute and set parent environment
Error in qr.coef(qr(a, LAPACK = TRUE), b) : 
    error code 5 from Lapack routine 'dtrtrs'
Error in qr.coef(qr(a, LAPACK = TRUE), b) : 
    error code 11 from Lapack routine 'dtrtrs'
Error in qr.coef(qr(a, LAPACK = TRUE), b) : 
    error code 5 from Lapack routine 'dtrtrs'
Error in qr.coef(qr(a, LAPACK = TRUE), b) : 
    error code 6 from Lapack routine 'dtrtrs'
....
[1] 100
Error in qr.coef(qr(a, LAPACK = TRUE), b) : 
    error code 8 from Lapack routine 'dtrtrs'
Error in qr.coef(qr(a, LAPACK = TRUE), b) : 
    error code 5 from Lapack routine 'dtrtrs'
Error in qr.coef(qr(a, LAPACK = TRUE), b) : 
    error code 9 from Lapack routine 'dtrtrs'

I have not been able to figure out what happened. Any help would be appreciated! Thank you.

No imputation with Kcluster = 1

Dear Vivian,
I am running scImpute on the 293T dataset, that should be the same used in the scImpute paper.
Using k = 1 (cells are clustered with the same cell type), scImpute does not impute any value.
You may see in the attached image that the percentiles are the same for the raw and imputed datasets.
Is this the correct behavior?

Thanks,
Francesco

[image: scimpute_k1_perc]

number of cells does not match number of labels

Getting an error about 'number of cells' not matching the number of labels despite length(labels) = number of cells

scimpute(count_path, infile, outfile, out_dir, labeled = TRUE, drop_thre, Kcluster, ncores)
[1] "reading in raw count matrix ..."
[1] "number of genes in raw count matrix 33694"
[1] "number of cells in raw count matrix 8055"
Error in scimpute(count_path, infile, outfile, out_dir, labeled = TRUE, :
number of cells does not match number of labels !
length(labels)
[1] 8055
table(labels)
labels
c1 c2 c3
1174 4909 1972

########################
sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] scImpute_0.0.4 doParallel_1.0.11 iterators_1.0.9 foreach_1.4.4 penalized_0.9-50
[6] survival_2.41-3 kernlab_0.9-25

loaded via a namespace (and not attached):
[1] compiler_3.4.1 Matrix_1.2-11 tools_3.4.1 Rcpp_0.12.14 codetools_0.2-15
[6] splines_3.4.1 grid_3.4.1 lattice_0.20-35
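One thing visible in the call above is that labels is never passed to scimpute, and most arguments are given by position. The sketch below uses a stand-in function, fake_scimpute, to show how R would match such a call. The stand-in's argument order is copied from the call printed in the "outfile is space delimited" issue on this page and is assumed, not confirmed, to match the installed version; the argument values are placeholders.

```r
# Stand-in with an assumed scimpute-like argument order; it only reports how
# the supplied values were matched to formal arguments.
fake_scimpute <- function(count_path, infile = "csv", outfile = "csv",
                          type = "count", out_dir, labeled = FALSE,
                          drop_thre = 0.5, Kcluster = NULL, labels = NULL,
                          genelen = NULL, ncores = 5) {
  as.list(match.call())[-1]
}

# Same shape as the call in the issue: four positional values, labeled = TRUE
# by name, then three more positional values (drop_thre, Kcluster, ncores).
matched <- fake_scimpute("counts.csv", "csv", "csv", "out/", labeled = TRUE,
                         0.5, 3, 8)

# "out/" lands in `type`, 0.5 in `out_dir`, and `labels` is never supplied.
names(matched)
```

Under that assumed order, naming every argument (out_dir = ..., drop_thre = ..., Kcluster = ..., labels = ..., ncores = ...) would avoid the mismatch.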
