
velocyto-team / velocyto.py

158 stars · 13 watchers · 82 forks · 10.06 MB

RNA velocity estimation in Python

Home Page: http://velocyto.org/velocyto.py/

License: BSD 2-Clause "Simplified" License

Python 100.00%
rna-velocity-estimation rna-seq single-cell

velocyto.py's Introduction

Velocyto

This repo contains the source code for the velocyto.py library.

For more information consult the velocyto.py documentation.

velocyto.py's People

Contributors

gioelelm, jobleonard, marinagioelesposi, olgabot, pberkes, pl-ki


velocyto.py's Issues

Problems with numpy 1.13.3

In vlm.default_fit_preparation(), numpy 1.13 gives the following error.
I'm fairly sure a workaround is to install numpy 1.11.

    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <built-in method query of sklearn.neighbors.kd_tree.KDTree object>
        args = (array([[ -2.5302447 ,  -3.73565713,   4.00488269...0.42542586,
          1.07867484,   2.27431581]]), 1281.0, True)
        kwargs = {}
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/home/seqteam/programs/miniconda3/lib/python3.6/site-packages/sklearn/neighbors/kd_tree.cpython-36m-x86_64-linux-gnu.so in sklearn.neighbors.kd_tree.BinaryTree.query()

...........................................................................
/home/seqteam/programs/miniconda3/lib/python3.6/site-packages/sklearn/neighbors/kd_tree.cpython-36m-x86_64-linux-gnu.so in sklearn.neighbors.kd_tree.NeighborsHeap.__init__()

TypeError: 'numpy.float64' object cannot be interpreted as an integer
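For reference, a minimal sketch of the same failure mode on hypothetical data (not velocyto's own code); the numpy 1.11 pin mentioned above sidesteps the stricter float-as-integer handling in newer numpy releases, and casting the neighbour count to a plain int avoids it as well:

import numpy as np
from sklearn.neighbors import KDTree

X = np.random.RandomState(0).normal(size=(100, 3))   # hypothetical cell embedding
tree = KDTree(X)

k = X.shape[0] * 0.5              # a float, e.g. derived from a fraction of the cells
# tree.query(X, k)                # raises: 'numpy.float64' object cannot be interpreted as an integer
dist, ind = tree.query(X, int(k)) # casting k to int avoids the error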

run error - valid_bcs2idx referenced before assignment

Command: velocyto run -o vel-run-test -m mm10_rmsk_joined.gtf test.bam mm10-1.2.0_gene_ivls.txt

File "/miniconda/lib/python3.6/site-packages/velocyto/commands/run.py", line 87, in run
additional_ca)
File "/miniconda/lib/python3.6/site-packages/velocyto/commands/_run.py", line 84, in _run
exincounter = vcy.ExInCounter(valid_bcs2idx)
UnboundLocalError: local variable 'valid_bcs2idx' referenced before assignment

pip install fails

The documentation states that pip install velocyto should be enough. This is incorrect; as with conda, all the dependencies need to be installed first.

subprocess.CalledProcessError returned non-zero exit status 1.

I've installed velocyto.py from source in a cluster environment running on CentOS (6.9).

I'm trying to use the CLI run command like so:
velocyto run -b barcodes.tsv -o ~/output/ possorted_genome_bam.bam hg19.gtf

This yields the following error:

2018-04-05 16:20:21,604 - DEBUG - No SAMPLEID specified, the sample will be called possorted_genome_bam_5N1Y2
2018-04-05 16:20:21,605 - DEBUG - Using logic: Default
2018-04-05 16:20:21,612 - DEBUG - Read 4913 cell barcodes from /target/gpfs2/groups/umcg-wijmenga/tmp02/projects/scRNAseq_10X_pilot/pilot4/pilot4_lane_2/outs/filtered_gene_bc_matrices/hg19/barcodes.tsv
2018-04-05 16:20:21,612 - DEBUG - Example of barcode: AAACCTGAGATGTGTA and cell_id: possorted_genome_bam_5N1Y2:AAACCTGAGATGTGTA-1
Traceback (most recent call last):
  File "./velocyto", line 11, in <module>
    load_entry_point('velocyto==0.13.8', 'console_scripts', 'velocyto')()
  File "/home/umcg-hbrugge/.local/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/umcg-hbrugge/.local/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/umcg-hbrugge/.local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/umcg-hbrugge/.local/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/umcg-hbrugge/.local/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/umcg-hbrugge/.local/lib/python3.6/site-packages/velocyto/commands/run.py", line 102, in run
    samtools_memory=samtools_memory, additional_ca=additional_ca)
  File "/home/umcg-hbrugge/.local/lib/python3.6/site-packages/velocyto/commands/_run.py", line 110, in _run
    mb_available = int(subprocess.check_output('grep MemAvailable /proc/meminfo'.split()).split()[1]) / 1000
  File "/apps/software/Python/3.6.3-foss-2015b/lib/python3.6/subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "/apps/software/Python/3.6.3-foss-2015b/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['grep', 'MemAvailable', '/proc/meminfo']' returned non-zero exit status 1.

Do you have any idea what might cause this issue?
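A hedged sketch of a more tolerant memory probe, assuming the failure is simply that older kernels (for example the 2.6.x kernel on CentOS 6) have no MemAvailable field in /proc/meminfo, so the grep exits with status 1 and check_output raises; the MemFree fallback is an assumption, not velocyto's code:

def mb_available_fallback() -> float:
    """Return available memory in MB, falling back to MemFree on old kernels."""
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            fields[key.strip()] = int(value.split()[0])   # values are reported in kB
    kb = fields.get("MemAvailable", fields.get("MemFree", 0))
    return kb / 1000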

Installed as conda environment

Hi,

Package install is very clean. Thanks! I installed it in a conda environment because of the Python 3.6 requirement. You may consider adding this to the docs for other users.

conda env create -f velocyto.yml

where velocyto.yml is:

name: velocyto_0.11
dependencies:
  - python=3.6   
  - numpy 
  - scipy
  - cython 
  - numba
  - matplotlib
  - scikit-learn
  - h5py
  - click
  - pip:
    - velocyto

To run velocyto after install, one needs to activate the environment (Linux version):

source activate velocyto_0.11

MemoryError when generating data table

Hi,

When running velocyto run on a bam file generated with the cellranger pipeline, I get the following error:

2017-11-30 04:28:37,293 - DEBUG - Save 3' junction/exon read counts
2017-11-30 04:28:44,852 - DEBUG - Collecting genes structural info statistics
2017-11-30 04:29:41,614 - DEBUG - Mapping statistics have been saved to D26/possorted_genome_bam_5ON9O_mapstats.hdf5
2017-11-30 04:29:41,616 - DEBUG - Generating output file D26/possorted_genome_bam_5ON9O.loom
2017-11-30 04:29:41,617 - DEBUG - Collecting row attributes
2017-11-30 04:29:41,732 - DEBUG - Generating data table
Traceback (most recent call last):
  File "/home/users/jmehtone/miniconda2/envs/velocyto/bin/velocyto", line 11, in <module>
    sys.exit(cli())
  File "/home/users/jmehtone/miniconda2/envs/velocyto/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/users/jmehtone/miniconda2/envs/velocyto/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/users/jmehtone/miniconda2/envs/velocyto/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/users/jmehtone/miniconda2/envs/velocyto/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/users/jmehtone/miniconda2/envs/velocyto/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/users/jmehtone/miniconda2/envs/velocyto/lib/python3.6/site-packages/velocyto/commands/run.py", line 90, in run
    intron_validation=introns, additional_ca=additional_ca)
  File "/home/users/jmehtone/miniconda2/envs/velocyto/lib/python3.6/site-packages/velocyto/commands/_run.py", line 205, in _run
    total = spliced + unspliced + ambiguous
MemoryError

velocyto was installed to a fresh conda environment with Python 3.6, following the installation guide from the docs; velocyto v0.11.0 and loompy v1.1.0 were used. Intronic/exonic intervals and repeat annotations were generated as described in the docs.

Any idea what might be causing this? The bam file consists of around 2k cells with around 200 million mapped reads. This is run on a computing server, so resources (CPU/RAM/disk space) should not be the problem. The bam file itself is ok as I have processed it with other tools.

multi10x output path

Hi,

Not sure where the .loom output file is located after running the following:

$velo multi10x -n 4 -l $log_folder -m $repmask $parentfolder $intervals

All I see in STDOUT is this:

2017-12-02 04:00:24,912 - DEBUG - Attempting to start 4 processes

Then I ran velocyto on each sample individually as a background process:

for sample_path in $parentfolder/*;
do
  echo $sample_path 
  sample=`basename $sample_path`
  $velo run10x -m $repmask $sample_path $intervals 1> logs_velocyto/${sample}.log 2> logs_velocyto/${sample}.err &
done

It seems to be running each sample in the background well enough right now:

2017-12-04 10:03:41,978 - DEBUG - Read first 100 million mapped reads
2017-12-04 10:05:21,292 - DEBUG - Marking up chromosome 17
2017-12-04 10:06:25,369 - DEBUG - Read first 110 million mapped reads
2017-12-04 10:08:30,929 - DEBUG - Marking up chromosome 18

Thus far I see a velocyto/ folder in the $sample_path.

Where is the output going to be? Perhaps make this more explicit in the documentation.

Any comments on the "multi10x" subcommand?

normalize_by_total() fails after filter_cells() (update self.initial_cell_size)

When filtering cells using vlm.filter_cells(totals > 500), only vlm.S and vlm.U get updated (resized), but vlm.initial_cell_size and vlm.initial_Ucell_size keep their original shapes. This leads to a

ValueError: operands could not be broadcast together with shapes (ncells_before,) (ngenes, ncells_after)

when calling vlm.normalize_by_total()

Using vlm.initial_cell_size = vlm.initial_cell_size[(totals > 500)] fixes this problem.
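A minimal sketch of that workaround covering both size attributes, assuming vlm is the VelocytoLoom object and totals is the same per-cell vector used for the filter above:

keep = totals > 500                                        # boolean mask over cells, as above
vlm.filter_cells(keep)
vlm.initial_cell_size = vlm.initial_cell_size[keep]        # keep the spliced size factors in sync
vlm.initial_Ucell_size = vlm.initial_Ucell_size[keep]      # and the unspliced ones too
vlm.normalize_by_total()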

default_filter_and_norm index out of bounds

Hello! Thanks again for your help with the interval file. I was able to use it to create a .loom file for a practice version of our dataset and now I'm working through the analysis pipeline tutorial.

One of the ways to get started mentions using vlm.default_filter_and_norm(). When I run that, I appear to get an index out of bounds error. I think it might have to do with the score_detection_levels() function reducing the number of genes from ~60,000 to 441, while the score_cv_vs_mean() function is still trying to use more than 441 genes...? But perhaps it's for another reason, not sure.

Would this just mean that you have to experiment with providing inputs to the vlm.default_filter_and_norm() function or do you expect that the default function should have worked here?

Here's the traceback:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-66-9ec18bcc69cb> in <module>()
----> 1 vlm.default_filter_and_norm()

~/ebs/workspace/velocyto.py/velocyto/analysis.py in default_filter_and_norm(self, min_expr_counts, min_cells_express, N, min_avg_U, min_avg_S)
   1587         self.filter_genes(by_detection_levels=True)
   1588 
-> 1589         self.score_cv_vs_mean(N=N, max_expr_avg=40)
   1590         self.filter_genes(by_cv_vs_mean=True)
   1591 

~/ebs/workspace/velocyto.py/velocyto/analysis.py in score_cv_vs_mean(self, N, min_expr_cells, max_expr_avg, svr_gamma, winsorize, winsor_perc, plot)
    274         ff = fitted_fun(log_m[:, None])
    275         score = log_cv - ff
--> 276         nth_score = np.sort(score)[::-1][N]
    277         if plot:
    278             scatter_viz(log_m[score > nth_score], log_cv[score > nth_score], s=3, alpha=0.4, c="tab:red")

IndexError: index 1000 is out of bounds for axis 0 with size 441
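A hedged workaround sketch: the traceback shows only 441 genes survive the detection-level filter while score_cv_vs_mean still asks for N=1000, and N is a parameter of default_filter_and_norm (see its signature in the traceback), so passing a smaller value explicitly is one thing to try:

vlm.default_filter_and_norm(N=400)   # anything below the 441 genes left after filtering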

Python always crashes when running estimate_transition_prob

Hi,
I was using velocyto to analyse my own data (about 90k cells). However, when I run estimate_transition_prob, Python always crashes without any hints. Here is my code:
vlm.estimate_transition_prob(hidim="Sx_sz", embed="ts", transform="sqrt",threads=8, n_neighbors=4000, knn_random=True, sampled_fraction=0.1)
and output:
2018-05-06 20:41:29,612 - DEBUG - Calculate KNN in the embedding space
2018-05-06 20:44:23,179 - DEBUG - Correlation Calculation 'knn_random'

My server is Ubuntu 16.04 with 512 GB RAM. Is there an easy way to deal with this?
Thank you~
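If the crash turns out to be the machine running out of memory during the correlation step, a hedged sketch of a cheaper call (same arguments as above, just a smaller neighbourhood and sampled fraction) may help narrow it down:

vlm.estimate_transition_prob(hidim="Sx_sz", embed="ts", transform="sqrt",
                             threads=8, n_neighbors=500, knn_random=True,
                             sampled_fraction=0.05)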

velocyto stuck on writing loom file

Hi,

velocyto seems stuck on the writing of the loom file -- currently it's been stuck on the line below for 3 hours.

ds = loompy.create(filename=outfile, matrix=spliced, row_attrs=ra, col_attrs=ca, dtype="float32")

Velocyto on aggregated data

Hey there!

I'm applying velocyto to aggregated data collected from several time points of a differentiation time course experiment. I was wondering if I should be cautious about situations where cells from different time points (eg. before and immediately after induction) may have similar gene expression, but could have different underlying velocities. Are kNNs determined based on total expression? Would the kNN smoothing be affected by this?

The reason I ask is that I'm noticing cells in my non-induced population (blue) with a high velocity towards the more-differentiated populations, without any accumulation of these cells at some terminal state (the cells shouldn't be dying, and it seems unlikely that I managed to measure the cells just as this started to happen).

Any chance you have any insight or suggestions here? Thanks! :)

Velocities produced with the default pipeline (default_fit_preparation):
[figure attached: velocity, all cells]

bam sort output

Seems like the output of samtools sort should be directed into the output folder, not the folder containing the input bam files (which may be read-only).

Where would we expect the executable `velocyto --help`?

Following the instructions, I've been able to (seemingly successfully) install velocyto via pip and by installing from source via GitHub, i.e.

git clone https://github.com/velocyto-team/velocyto.py.git
cd velocyto.py
pip install -e .  # note the trailing dot

outputs

Installing collected packages: velocyto                                                                                                                                          
  Found existing installation: velocyto 0.13.8                                                                                                                                   
    Uninstalling velocyto-0.13.8:                                                                                                                                                
      Successfully uninstalled velocyto-0.13.8                                                                                                                                   
  Running setup.py develop for velocyto                                                                                                                                          
Successfully installed velocyto

However, I cannot find the executable

$ velocyto --help
bash: velocyto: command not found 

As I'm using multiple versions of Python, this isn't set up as the "default" Python. Where is this executable?

click CLI - positional argument missing 'test'

I'm attempting to run Velocyto within a conda environment, a la issue #32. Upon calling the run command, I get a missing argument error that appears to originate from the click module:

Invocation:
velocyto run bam.dir/Uber_sort.bam mm10_genes.gtf

Traceback (most recent call last):
  File "/home/morgan02/.local/bin/velocyto", line 11, in <module>
    sys.exit(cli())
  File "/Users/morgan02/miniconda3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/morgan02/miniconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/morgan02/miniconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/morgan02/miniconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/morgan02/miniconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
TypeError: run() missing 1 required positional argument: 'test'

ValueError: too many values to unpack (expected 9) in extract_repeats

Using the repeat masking GTF from the UCSC genome browser, velocyto extract_repeats results in the following error:

$ velocyto extract_repeats mm10_rmsk.gtf
2017-10-31 17:50:58,709 - DEBUG - Sorting by `sort -k1,1 -k7,7 -k4,4n /home/ubuntu/mm10_rmsk.gtf > /home/ubuntu/mm10_rmsk_sorted.gtf`
2017-10-31 17:51:05,265 - DEBUG - Opening /home/ubuntu/mm10_rmsk_sorted.gtf, output will be at /home/ubuntu/mm10_rmsk_joined.gtf
Traceback (most recent call last):
  File "/home/ubuntu/anaconda/bin/velocyto", line 11, in <module>
    load_entry_point('velocyto==0.9.9', 'console_scripts', 'velocyto')()
  File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/ubuntu/anaconda/lib/python3.6/site-packages/velocyto/commands/extract_repeats.py", line 54, in extract_repeats
    chrom, feature_class, feature_type, start, end, junk, strand, junk, tags = fields
ValueError: too many values to unpack (expected 9)

This looks like a malformed GTF issue or something else to do with reading each line of the gtf.
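The unpacking in extract_repeats expects exactly nine tab-separated GTF columns, so a quick (hypothetical) check like the following can point at the offending lines:

with open("mm10_rmsk.gtf") as gtf:
    for lineno, line in enumerate(gtf, start=1):
        if line.startswith("#"):
            continue
        n_fields = len(line.rstrip("\n").split("\t"))
        if n_fields != 9:
            print(lineno, n_fields, line.rstrip())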

Could not create loom file

Hi,
I was trying to follow the procedure to run velocyto on 10x Genomics Cell Ranger output data. However, I got an error message at the last step, when forming the loom file. Is there a possible reason why this happens? Thanks!

The command I used was:
~/anaconda3/bin/velocyto run10x -m ~/reference/mm10_rmsk.sorted.gtf ~/cellrangerpath/count-WT/ ~/reference/refdata-cellranger-mm10-1.2.0/genes/genes.gtf

It was all good until writing the data:
...
2018-01-25 02:58:37,430 - DEBUG - 866 reads not considered because fully enclosed in repeat masked regions
2018-01-25 02:58:37,681 - DEBUG - Counting done!
2018-01-25 02:58:37,691 - DEBUG - Generating output file /cellrangerpath/count-WT/velocyto/count-WT.loom
2018-01-25 02:58:37,691 - DEBUG - Collecting row attributes
2018-01-25 02:58:37,763 - DEBUG - Generating data table
2018-01-25 02:58:38,975 - DEBUG - Writing loom file
Traceback (most recent call last):
  File "/usr/home/anaconda3/bin/velocyto", line 11, in <module>
    sys.exit(cli())
  File "/usr/home/anaconda3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/home/anaconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/home/anaconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/home/anaconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/home/anaconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/home/anaconda3/lib/python3.6/site-packages/velocyto/commands/run10x.py", line 100, in run10x
    samtools_memory=samtools_memory, additional_ca=additional_ca)
  File "/usr/home/anaconda3/lib/python3.6/site-packages/velocyto/commands/_run.py", line 222, in _run
    ds = loompy.create(filename=outfile, matrix=total, row_attrs=ra, col_attrs=ca, dtype="float32")
TypeError: create() got an unexpected keyword argument 'matrix'

Issue with IndexError when running smartseq2 bam files

Hi,
I was using the velocyto run_smartseq2 command to run some bam files. Most of them work well, but some bam files give back an "IndexError: list index out of range" error when reading the bam (after summarizing the results of intron validation). It looks like this:
[screenshot attached]
We used GSNAP for RNA-seq alignment, so maybe there are some unusual CIGAR strings. For example, we think this kind of read (below) causes the error, but we are not sure and don't know why. Is there an easy way to find which reads do not satisfy the requirements, and how to deal with them?
[screenshot attached]
Thank you very much!

`set_clusters` should use string type for cluster labels for hdf5 dump

Hi,
when setting clusters from an external data frame (eg., read Seurat output using pandas with vlm.set_clusters(my_annotation.loc[vlm.ca['CellID'],'cell.type'])), the dtype of vlm.cluster_labels is 'O', which cannot be dumped into hdf5 (TypeError: Object dtype dtype('O') has no native HDF5 equivalent).
Using vlm.cluster_labels = vlm.cluster_labels.astype(np.string_) fixes this problem but creates byte strings, which is maybe not optimal. Some unicode compatibility problem?

great package by the way!
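A minimal sketch of that conversion and of getting unicode back afterwards (my_annotation as in the example above; the decode step is just one possible way to round-trip the labels):

import numpy as np

vlm.set_clusters(my_annotation.loc[vlm.ca['CellID'], 'cell.type'])
vlm.cluster_labels = np.asarray(vlm.cluster_labels, dtype=np.string_)   # fixed-length bytes, hdf5-friendly
# ... dump to hdf5 ...
labels = np.char.decode(vlm.cluster_labels, "utf-8")                    # back to unicode strings when needed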

Issue with ERCC spike-ins in the GTF when using run_smartseq2

I ran this command, for SmartSeq2 data:

velocyto run_smartseq2 -o ./velocyto star/22606_3#7*/22606_3#7*.2pass.Aligned.sortedByCoord.dedup.bam $GTF

I was using a bam file with some reads mapped to ERCC spike-ins, so I have a GTF file with ERCC transcripts described like this (for example):

ERCC-00157	ERCC	exon	1	1019	0.000000	+	.	gene_id "ERCC-00157"; transcript_id "DQ839618";
ERCC-00158	ERCC	exon	1	1027	0.000000	+	.	gene_id "ERCC-00158"; transcript_id "DQ516795";
ERCC-00160	ERCC	exon	1	743	0.000000	+	.	gene_id "ERCC-00160"; transcript_id "DQ883658";
ERCC-00162	ERCC	exon	1	523	0.000000	+	.	gene_id "ERCC-00162"; transcript_id "DQ516750";
ERCC-00163	ERCC	exon	1	543	0.000000	+	.	gene_id "ERCC-00163"; transcript_id "DQ668359";
ERCC-00164	ERCC	exon	1	1022	0.000000	+	.	gene_id "ERCC-00164"; transcript_id "DQ516779";
ERCC-00165	ERCC	exon	1	872	0.000000	+	.	gene_id "ERCC-00165"; transcript_id "DQ668363";
ERCC-00168	ERCC	exon	1	1024	0.000000	+	.	gene_id "ERCC-00168"; transcript_id "DQ516776";
ERCC-00170	ERCC	exon	1	1023	0.000000	+	.	gene_id "ERCC-00170"; transcript_id "DQ516773";
ERCC-00171	ERCC	exon	1	505	0.000000	+	.	gene_id "ERCC-00171"; transcript_id "DQ854994";

And this seems to cause an error:

2018-03-25 21:03:50,535 - INFO - Load the annotation from /hps/nobackup/stegle/datasets/references/human/GRCh37/gencode.v19.annotation_ERCC.gtf
2018-03-25 21:03:59,379 - DEBUG - Parsing Chromosome ERCC-00002 strand + [line 0]
Traceback (most recent call last):
  File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/bin/velocyto", line 11, in <module>
    load_entry_point('velocyto', 'console_scripts', 'velocyto')()
  File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/nfs/software/stegle/users/dseaton/python_packages/velocyto.py/velocyto/commands/run_smartseq2.py", line 72, in run_smartseq2
    samtools_memory=1, dump=dump, verbose=verbose, additional_ca=additional_ca)
  File "/nfs/software/stegle/users/dseaton/python_packages/velocyto.py/velocyto/commands/_run.py", line 180, in _run
    annotations_by_chrm_strand = exincounter.read_transcriptmodels(gtffile)
  File "/nfs/software/stegle/users/dseaton/python_packages/velocyto.py/velocyto/counter.py", line 467, in read_transcriptmodels
    trname = regex_trname.search(tags).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

I got around this no problem by just removing the ERCC chromosomes from the gtf, but thought maybe this would be an issue for others, since I think quite a few people have used ERCC spike-ins with the SmartSeq2 protocol.
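For anyone hitting the same thing, a hedged sketch of that workaround (input file name taken from the log above, output name hypothetical; the filter simply drops lines on ERCC chromosomes before running velocyto):

src_path = "gencode.v19.annotation_ERCC.gtf"       # as in the log above
dst_path = "gencode.v19.annotation_noERCC.gtf"     # hypothetical output name

with open(src_path) as src, open(dst_path, "w") as dst:
    for line in src:
        if not line.startswith("ERCC-"):
            dst.write(line)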

run10x for non-model organism

Hello again,
I wonder if you can help with this issue. I am trying to obtain a loom file from 10x data. I know the program runs well on human 10x data (tried and tested), but when I try to run it on a non-model organism I am working with, it gives me an error which I think has to do with the type of annotation. Could you have a look at the error below? Have you seen this error before? I would be very grateful if you could help with this.
Many thanks in advance, and apologies for the many questions.
PS: I have also attached the velocyto log file. Hope you can see it.

Cheers, Carmen

2018-05-16 20:42:58,966 - WARNING - The entry exon_number was not present in the gtf file. It will be infferred from the position.
2018-05-16 20:43:02,117 - DEBUG - Parsing Chromosome SM_V7_1 strand - [line 0]
2018-05-16 20:43:07,784 - DEBUG - Fixing corner cases of transcript models containg intron longer than 1000Kbp
/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/bin/velocyto", line 11, in <module>
    sys.exit(cli())
  File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/velocyto/commands/run10x.py", line 104, in run10x
    samtools_memory=samtools_memory, dump=dump, verbose=verbose, additional_ca=additional_ca)
  File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/velocyto/commands/_run.py", line 180, in _run
    annotations_by_chrm_strand = exincounter.read_transcriptmodels(gtffile)
  File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/velocyto/counter.py", line 512, in read_transcriptmodels
    tm.chop_if_long_intron() # Change it in place
  File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/velocyto/transcript_model.py", line 93, in chop_if_long_intron
    long_feats = [i for i in self.list_features if len(i) > maxlen and i.kind == ord("i")]
  File "/nfs/users/nfs_m/my4/scratch/src/miniconda3/lib/python3.6/site-packages/velocyto/transcript_model.py", line 93, in <listcomp>
    long_feats = [i for i in self.list_features if len(i) > maxlen and i.kind == ord("i")]
ValueError: len() should return >= 0

veloctyo.log

prepare_markov question

Hi,

The prepare_markov function has two required parameters -- sigma_D and sigma_W. However I'm unclear on what parameters those correspond to in the manuscript. If I could have clarification on that, that would be much appreciated!

Thanks!

velocyto .normalize - divide by zero encountered in true_divide

import velocyto as vcy
vlm = vcy.VelocytoLoom("myloomfile.loom")
vlm.normalize("S", size=True, log=True)

vlm.default_filter_and_norm()
/miniconda/lib/python3.6/site-packages/velocyto/analysis.py:670: RuntimeWarning: divide by zero encountered in true_divide
adj_factor = predicted / y
/miniconda/lib/python3.6/site-packages/velocyto/analysis.py:707: RuntimeWarning: divide by zero encountered in true_divide
self.U_sz[:, ~self.small_U_pop] = self.U_sz[:, ~self.small_U_pop] * (np.median(self.U_sz[:, ~self.small_U_pop].sum(0)) / self.U_sz[:, ~self.small_U_pop].sum(0))
/miniconda/lib/python3.6/site-packages/velocyto/analysis.py:707: RuntimeWarning: invalid value encountered in multiply
self.U_sz[:, ~self.small_U_pop] = self.U_sz[:, ~self.small_U_pop] * (np.median(self.U_sz[:, ~self.small_U_pop].sum(0)) / self.U_sz[:, ~self.small_U_pop].sum(0))

Problem with velocyto.py install

I'm seeing the following error message after running "pip install velocyto":

Collecting velocyto
Using cached velocyto-0.13.1.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-build-8iDgnS/velocyto/setup.py", line 14
    package_data: dict = {}
                ^
SyntaxError: invalid syntax

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-8iDgnS/velocyto/

I have all of the proper dependencies installed, on an Ubuntu 16.04 machine, running conda 4.4.7. Any ideas what's going wrong?

Poor results from a Drop-Seq bam file with "velocyto run"

Hi,

Thanks for Velocyto, it's very interesting.

I tried to launch "velocyto run" on our data.
Our scRNA-seq protocol is inspired by the STRT-Seq protocol. The results are in UMI counts and the reads are from the 5 prime end of the mRNA (unlike 10x, where they are from the 3 prime end).
I used the Drop-Seq pipeline (and STAR) to obtain a mapped bam file where the cell barcode and UMI barcode are contained in the XC and XM tags.
I noticed that counter.py takes care of these specific tags from Drop-Seq.

Unfortunately, I have got very poor results when I launch "Velocyto run" on my bam which contains 96 cells. Only a few genes are detected and we do not have a lot of spliced, unspliced and ambiguous values.

Also, it may be part of this issue: In R, when I look at the rownames(ldat$spliced), there are a lot of duplicated genes...

Did I miss something?
Normally (without velocyto), I have more than 1000 genes detected per cell.

Thank you

Dictionary KeyError in extract_intervals

Hey! I've installed velocyto and am looking to use it to create the ivls.txt file from a gtf. In particular, I'm trying to use the ensembl human genome gtf here: http://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/

I started the process with the following command:
velocyto extract_intervals Homo_sapiens.GRCh37.75.gtf -p GRCh37
It starts running, but after about 2 hours it failed with what I think is a dictionary lookup error for a gene name. Here is the traceback:

2017-11-16 23:45:55,547 - DEBUG - 47500 genes already processed
2017-11-16 23:51:16,930 - DEBUG - 50000 genes already processed
2017-11-16 23:56:46,434 - DEBUG - 52500 genes already processed
Traceback (most recent call last):
  File "/home/ubuntu/ebs/anaconda3/bin/velocyto", line 11, in <module>
    sys.exit(cli())
  File "/home/ubuntu/ebs/anaconda3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/ebs/anaconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/ebs/anaconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/ebs/anaconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/ebs/anaconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/ubuntu/ebs/anaconda3/lib/python3.6/site-packages/velocyto/commands/extract_intervals.py", line 152, in extract_intervals
    othertrs.update(binnedtrs_by_chromstrand[chromstrand][trbin])
KeyError: 'HG1007_PATCH+'

I looked into the Homo_sapiens.GRCh37.75.gtf file and could not find the pattern 'HG1007_PATCH+' but I did find it several times without the '+' (just 'HG1007_PATCH'). I'm wondering if there is a chance the + got accidentally attached to the gene name when it was defined as a dictionary key at some point? Or if you have any suggestions for finding where this gene name edit might have happened?

Thanks!

Running SmartSeq2 logic with "-U" gives IOError for missing UMI barcodes

Hi, I am trying to run velocyto on data generated using SmartSeq2, using the current dev branch.
I ran the command:

velocyto run -l SmartSeq2 -o ./velocyto -U $BAMPATH $GTFPATH

And I get the following error:

Traceback (most recent call last):
  File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/bin/velocyto", line 11, in <module>
    load_entry_point('velocyto', 'console_scripts', 'velocyto')()
  File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/nfs/software/stegle/users/dseaton/conda-envs/velocyto_env/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/nfs/software/stegle/users/dseaton/python_packages/velocyto.py/velocyto/commands/run.py", line 114, in run
    samtools_memory=samtools_memory, dump=dump, verbose=verbose, additional_ca=additional_ca)
  File "/nfs/software/stegle/users/dseaton/python_packages/velocyto.py/velocyto/commands/_run.py", line 156, in _run
    exincounter.peek(bamfile[0])
  File "/nfs/software/stegle/users/dseaton/python_packages/velocyto.py/velocyto/counter.py", line 148, in peek
    raise IOError("The bam file does not contain cell and umi barcodes appropriatelly formatted. If you are runnin UMI-less data you should use the -U flag.")
OSError: The bam file does not contain cell and umi barcodes appropriatelly formatted. If you are runnin UMI-less data you should use the -U flag.

I checked that my BAM and GTF files are correct and compatible. The first few lines of my BAM are like this:

HS34_22606:3:2212:13187:68945	419	chr1	156025	0	122M	=	156265	344	GTCTTGCTCTATCACCTAGGCTGGACTGCAGTGCTGCAATCACAGTTAACTGCAGCCTCAACCTCCAGGGCTTGAGCAATATTCCCATCTAATTTTTATCTTGTTTAAGAAGTGCAGTCTTG	BBBBB<BFFBFFF/<F/</FF<F7FFFFFB/<BFBFFF/F<FFBFB</<<</<F<F</BF/</BFFB///<F/</B///<</</FBBB/B/FF<BF//B///<<F/B//B/<//7/BF///B	MD:Z:33G65T11A10	PG:Z:MarkDuplicates	RG:Z:22606_3#76	NH:i:6	HI:i:2	NM:i:3	AS:i:208
HS34_22606:3:2212:13187:68945	339	chr1	156265	0	21S104M	=	156025	-344	CACGGACTCCAGGGGTGCCCAGCCACGAGTAGCTCCTTTTTTTTTTTTAATATTTAGTAGAGACAAAGTGTCACCATGTTGACCAGGTTGGTGGTGATCTCCTACACTCAGGCAGTTCTCTCACC	7//BB7/<7B77/////7////</B///B7//B<<///BBB/FBF/FF</////<B/F<B/FFFBFFFFFF</<<BBB</FFBBF//BF</<</FBFFFFFFB/FBFFFFBFF<BBBB/<BBB</	MD:Z:2A3T7A15T22T50	PG:Z:MarkDuplicates	RG:Z:22606_3#76	NH:i:6	HI:i:2	NM:i:5	AS:i:208
HS34_22606:3:1314:4949:83531	99	chr1	340951	255	9M42S	=	566538	225711	GGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT	//BBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	MD:Z:9	PG:Z:MarkDuplicates	RG:Z:22606_3#76	NH:i:1	HI:i:1	NM:i:0	AS:i:119
HS34_22606:3:2310:16017:71190	137	chr1	526895	255	8S116M	*	0	0	GGCGAAAAACGAGGACAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCCGGGCATGGTAGTGGGCGCCTGTAGTCCCAGCTACTTGGGA	/////<//<//<////<//B/F/</BB///<</F<<///<//<F<F<BBFFF/FFFFFFBF/FBFF<FFFFBBBB/FBB<7FF/<BBBFB<BFFFFF/7<<7/BBFF/7B7/7/B7B//7B/BB	MD:Z:6T68T9G30	PG:Z:MarkDuplicates	RG:Z:22606_3#76	NH:i:1	HI:i:1	NM:i:3	AS:i:108
HS34_22606:3:1316:9877:80870	355	chr1	564464	3	8S115M	=	564664	324	GAGTACATGGGAGTCCGAACTAGTCTCAGGCTTCAACATCGAATACGCCGCAGGCCCCTTCGCCCTATTCTTCATAGCCGAATACACAAACATTATTATAATAAACACCCTCACCACTACAAT	/BBBB<FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFF/BFFFFFFFFFFFFFFFBFFFFFFFF<F	MD:Z:115	PG:Z:MarkDuplicates	RG:Z:22606_3#76	NH:i:2	HI:i:2	NM:i:0	AS:i:237
HS34_22606:3:1213:5571:77912	163	chr1	564659	3	125M	=	564673	139	CCCTGTTCTTATGAATTCGAACAGCATACCCCCGATTCCGCTACGACCAACTCATACACCTCCTATGAAAAAACTTCCTACCACTCACCCTAGCATTACTTATATGATATGTCTCCATACCCATT	BBBBBFBFFFFFFFF/FFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFB	MD:Z:125	PG:Z:MarkDuplicates	RG:Z:22606_3#76	NH:i:2	HI:i:1	NM:i:0	AS:i:248

I think I'm on the most up-to-date dev branch. The output of pip freeze for velocyto version is:

-e git+https://github.com/velocyto-team/velocyto.py.git@e7b2a82dc8d23f47d190f7c4d3f326892d5ba2b8#egg=velocyto

Thanks for making the software, and for releasing as a preprint for people to read about and try while still under development.

AttributeError: module 'matplotlib.cm' has no attribute 'Vega20b'

I'm having some problems getting velocyto installed properly with python-3.6.1. The package appears to install properly, but doesn't import. I get the following matplotlib issue:

Successfully installed velocyto
You are using pip version 9.0.1, however version 9.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
[root@mskilab02 velocyto.py]# python
Python 3.6.1 (default, Mar 19 2018, 15:38:01)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import velocyto
/path/software/python-3.6.1/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/Software/src/velocyto.py/velocyto/__init__.py", line 15, in <module>
    from .analysis import VelocytoLoom, scatter_viz, ixs_thatsort_a2b, load_velocyto_hdf5
  File "/path/Software/src/velocyto.py/velocyto/analysis.py", line 1936, in <module>
    colors20 = np.vstack((plt.cm.Vega20b(np.linspace(0., 1, 20))[::2], plt.cm.Vega20c(np.linspace(0, 1, 20))[1::2]))
AttributeError: module 'matplotlib.cm' has no attribute 'Vega20b'

Loompy appears to have installed correctly though...


>>> import loompy
/path/software/python-3.6.1/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

Any troubleshooting tips? Should I be trying a different python? Thanks for any help.
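A hedged workaround sketch: newer matplotlib releases renamed the Vega* colormaps to tab20b / tab20c, so the failing palette line can be reproduced locally with the new names (or matplotlib can be pinned to an older release that still ships Vega20b):

import numpy as np
import matplotlib.pyplot as plt

colors20 = np.vstack((plt.cm.tab20b(np.linspace(0., 1, 20))[::2],
                      plt.cm.tab20c(np.linspace(0, 1, 20))[1::2]))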

error occurred at running vlm.default_filter_and_norm()

I was using the default filtering and normalization but got this error. Is there a way to get around this? Thanks!

vlm.default_filter_and_norm()
/usr/home/anaconda3/lib/python3.6/site-packages/velocyto/analysis.py:695: RuntimeWarning: divide by zero encountered in true_divide
adj_factor = predicted / y

Support for sci-RNA-seq data

Hi there,

First of all, I've already run velocyto on some 10x data and it's looking really interesting, so thanks a lot for developing the tool!

I'm now hoping to compare against some data produced with the sci-RNA-seq protocol (Cao et al. 2017). At the moment I'm processing the data with zUMIs, which seems to do a pretty good job of filtering the barcodes.

Now I'm trying to figure out whether it's best to try to use velocyto run directly or to adapt one of the existing shortcuts, such as run_dropest. As far as I can tell, for either one I would need to pass a list of the filtered barcodes as well as adding the barcodes as CB tag in the filtered bam files. This is not done by default with zUMIs but should be do-able.

One thing I'm not sure about is how to handle the UMIs. In sci-RNA-seq they are concatenated together with the cell barcodes in R1. Would I need to also add them as a tag in the filtered bam file, or as part of the reads themselves?

Thanks for your help - I would appreciate any suggestions.

Best,
Sarah

Support for Drop-seq bam files?

Hi there,

first of all, thank you very much for the great package and the awesome preprint, really enjoyed reading your analysis using RNA velocity and the theory behind it.

I was going through the python tutorial and was wondering whether you are planning to add a run mode, that would make it easy to run velocyto with BAM files from the Drop-seq pipeline.

The format for BAM files created with this pipeline is described here:
Dropseq Alignment Cookbook

Basically, the cell barcode for each read is tagged with the XC tag and the molecular barcode with the XM tag. I think it would be really cool to use your approach on some Drop-seq data. Maybe there is already a way to use the package on these kinds of BAM files and I haven't noticed; if so, please excuse me and close the issue. If not, I think there are quite a few people using Drop-seq who would be interested to apply velocyto to their data!

Best,

Florian

Run10x Error

Hi,

Thanks for developing such an exciting piece of technology! I am currently trying to run the velocyto CLI to generate a .loom file. However, when I submit the run10x command I receive the following error:

TypeError: run10x() missing 1 required positional argument: 'molrep'

However, according to the tutorial, molrep does not appear to be an option for the run10x command. If I try to use the molrep option, as defined for the run command, with the run10x command, I receive this error:

Error: no such option: --molrep

If you have any suggestions on how to correct this issue it would be greatly appreciated.

velocyto successfully installed but crashing

Dear velocyto team,

I have installed velocyto and dependencies following the installation guide. However, the program crashes simply when doing velocyto --help. It is funny because I have succeeded on different computers, but now I just don't know what the problem could be. Below is the error I am getting; it seems to be related to matplotlib.cm. Please let me know if you recognise this problem.

velocyto --help
Traceback (most recent call last):
  File "/homes/iimaz/miniconda3/bin/velocyto", line 7, in <module>
    from velocyto.commands.velocyto import cli
  File "/homes/iimaz/miniconda3/lib/python3.6/site-packages/velocyto/__init__.py", line 15, in <module>
    from .analysis import VelocytoLoom, scatter_viz, ixs_thatsort_a2b, load_velocyto_hdf5
  File "/homes/iimaz/miniconda3/lib/python3.6/site-packages/velocyto/analysis.py", line 1936, in <module>
    colors20 = np.vstack((plt.cm.Vega20b(np.linspace(0., 1, 20))[::2], plt.cm.Vega20c(np.linspace(0, 1, 20))[1::2]))
AttributeError: module 'matplotlib.cm' has no attribute 'Vega20b'

Thank you!
Ivan

adding metadata for run_smartseq2

Hi, sorry if I'm missing something obvious here, but when running run_smartseq2 is there a way of adding metadata information?

I'm currently going through the API info but can't find any reference to this.

Thanks.

How to use with many bam files?

In the documentation, there is a command to perform velocyto run on a single bam file, but how would you do this for 100s of cells at a time? Do you recommend aggregating all the bam files together and adding a unique ID to the "read_group" field?

Issue with AttributeError when running smartseq2 bam files

Sorry for disturbing you again, but after you fixed the former problem, I got a new one.
Also when using the velocyto run_smartseq2 command to run bam files, I get the error "AttributeError: 'list' object has no attribute 'add'". It looks like this:
[screenshot attached]
It seems to happen where you fixed your code yesterday. Here is a link (below) to two of my samples that get this error, if you would like to try running them using the SmartSeq2 logic.
https://1drv.ms/f/s!Ah6kTQEKGo66giSsQqtqeLOF8j8c
Thank you very much for making the software and helping us solve the problems.

More requirements than specified

Running pipreqs on the repo resulted in more dependencies than in the installation requirements, and I'm not sure which you prefer to be conda vs pip.

Installation instructions from documentation

conda install numpy scipy cython scikit-learn h5py click
pip install pysam loompy

Output from pipreqs

(loom-env) ➜  velocyto.py git:(edit_on_github) /Users/olgabot/anaconda3/envs/loom-env/bin/pipreqs . 
INFO: Successfully saved requirements file in ./requirements.txt
(loom-env) ➜  velocyto.py git:(edit_on_github) cat ./requirements.txt
click==6.7
h5py==2.7.0
scipy==0.19.1
setuptools==36.5.0.post20170921
numpy==1.13.3
numba==0.35.0+6.gaa35fb1
matplotlib==2.1.0
pysam==0.12.0.1
Cython==0.27.2
loompy==1.0.0
scikit_learn==0.19.1
typing==3.6.2

High Memory usage of extract_intervals

I am running extract_intervals on an mm10 annotation (from ENCODE but with the transcript_name field added at the end to solve that issue). It has 23337 genes and 30455 transcripts. It seems to take tons of memory - using up all the 64 gigs of ram I have and then eating into swap space, which obviously makes it run very slow - 20’ to process the first 2500 genes.

How much memory use is expected for the extract_intervals program? I assume there is a problem somewhere here.

Typo in gene_knn_imputation

Hey!
There's a typo in the analysis.py gene_knn_imputation function.
On line 1107 it's currently connectivity = (self.knn > 0).atype(float)
But it should be connectivity = (self.knn > 0).astype(float)

Bioconda to install pysam

Have you used bioconda to install pysam? It comes pre-built with many extensions, so you don't need to compile it yourself. I always worry about installing C-dependent libraries with pip because it's been so problematic for me.

conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda install pysam

Compatible with C1 (Fluidigm) read data from SRA??

Hi,

I have extensively read the documentation and still have questions:

  • Could you confirm that alignments of C1 (Fluidigm) data are valid input to Velocyto's run stage?
  • Further, I have no valid cell barcodes available to me from the SRA sample FASTQs (short read archive). I don't think that will be a problem, even though you warn against this.
  • Could you confirm that the "ValidatedIntrons10X" logic option (the default) is still appropriate for handling C1 data, in spite of the fact that it is not a Drop-Seq-based technology?

Thanks!

Symbol not found: _GOMP_parallel

Hello, I am relatively new to python. Sorry if I am missing any information.

Velocyto installation completed, but when running velocyto --help to test installation I receive the following error:

$ velocyto --help
Traceback (most recent call last):
  File "/anaconda3/bin/velocyto", line 7, in <module>
    from velocyto.commands.velocyto import cli
  File "/anaconda3/lib/python3.6/site-packages/velocyto/__init__.py", line 13, in <module>
    from .estimation import fit_slope, _fit1_slope, clusters_stats
  File "/anaconda3/lib/python3.6/site-packages/velocyto/estimation.py", line 7, in <module>
    from .speedboosted import _colDeltaCor, _colDeltaCorLog10, _colDeltaCorSqrt
ImportError: dlopen(/anaconda3/lib/python3.6/site-packages/velocyto/speedboosted.cpython-36m-darwin.so, 2): Symbol not found: _GOMP_parallel
  Referenced from: /anaconda3/lib/python3.6/site-packages/velocyto/speedboosted.cpython-36m-darwin.so
  Expected in: flat namespace
 in /anaconda3/lib/python3.6/site-packages/velocyto/speedboosted.cpython-36m-darwin.so

I think this likely has something to do with my gcc version, but I'm not sure how to fix the problem.

$ python -V
Python 3.6.3 :: Anaconda custom (64-bit)
$ which gcc
/anaconda3/bin/gcc
$ gcc --version
gcc (GCC) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

extract_intervals- AttributeError: 'NoneType' object has no attribute 'group' with GENCODE gtf

Using extract_intervals on a GENCODE gtf file like this one resulted in this error:

ubuntu@olgabot-loom:~$ velocyto extract_intervals gencode.vM15.annotation.gtf -p gencode.vM15.annotation.velocyto-intervals.txt
2017-10-24 23:14:54,320 - DEBUG - Reading /home/ubuntu/gencode.vM15.annotation.gtf
Traceback (most recent call last):
  File "/home/ubuntu/anaconda/bin/velocyto", line 11, in <module>
    load_entry_point('velocyto==0.9.6', 'console_scripts', 'velocyto')()
  File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/anaconda/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/ubuntu/anaconda/lib/python3.6/site-packages/velocyto/commands/extract_intervals.py", line 95, in extract_intervals
    exonno = regex_exonno.search(tags).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
ubuntu@olgabot-loom:~$ which python
/home/ubuntu/anaconda/bin/python
ubuntu@olgabot-loom:~$ python --version
Python 3.6.2 :: Anaconda custom (64-bit)

Some genes have much lower counts in Velocyto than 10x cellranger output

Hi,

I'm using Velocyto on data generated from the 10x Genomics cellranger pipeline, and project velocity onto embeddings produced from Seurat and scanpy. I found that some of my marker genes are barely detected in the Velocyto pipeline, but are detected in hundreds of cells in the cellranger output.

For example:

'GSX-2' 
cellranger count (total 10k cells):
nonzero_cells     898.0
total_count      1233.0

Velocyto spliced count:
nonzero_cells    1.0
total_count      1.0

Velocyto un-spliced count:
nonzero_cells    0.0
total_count      0.0

Here is a plot of counts for ~20,000 genes:
[figure attached]

It looks like many genes are detected with far less counts in the Velocyto pipeline.

Below is the CLI code I used to produce the loom file:
velocyto run10x -m ~/Data1/data/hrch38_rmsk.gtf ./14741X1 ~/Data1/data/refdata-cellranger-GRCh38-1.2.0/genes/genes.gtf

The bam files were generated using cellranger 2.0.2 with the same reference and were sorted during velocyto run10x. I tried run10x without the repeat mask and got a similar result.

Do you know what's the reason for the much lower gene detection in Velocyto?

Thanks!
Yueqi

Velocyto crashes just by starting...

omendivi$ velocyto
/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "/anaconda3/bin/velocyto", line 11, in <module>
    load_entry_point('velocyto', 'console_scripts', 'velocyto')()
  File "/anaconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 572, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/anaconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2755, in load_entry_point
    return ep.load()
  File "/anaconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2408, in load
    return self.resolve()
  File "/anaconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2414, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/local/bin/velocyto.py/velocyto/__init__.py", line 13, in <module>
    from .estimation import fit_slope, _fit1_slope, clusters_stats
  File "/usr/local/bin/velocyto.py/velocyto/estimation.py", line 7, in <module>
    from .speedboosted import _colDeltaCor, _colDeltaCorLog10, _colDeltaCorSqrt
ImportError: dlopen(/usr/local/bin/velocyto.py/velocyto/speedboosted.cpython-36m-darwin.so, 2): Symbol not found: _GOMP_parallel
  Referenced from: /usr/local/bin/velocyto.py/velocyto/speedboosted.cpython-36m-darwin.so
  Expected in: flat namespace
 in /usr/local/bin/velocyto.py/velocyto/speedboosted.cpython-36m-darwin.so

Splicing matrix is all 0

Hi,

I ran the following commands

samtools sort -t CB -O BAM -o cellsorted_possorted_genome_bam.bam possorted_genome_bam.bam
velocyto run -b filtered_gene_bc_matrices/mm10/barcodes.tsv -m ../mm10_rmsk.gtf -o preprocessed_bam possorted_genome_bam.bam ../genes.gtf

However the resulting splicing matrix S in the loom file seems to be all 0. I have another dataset (different conditions but same platform -- 10x) but for that, the splicing matrix looks fine (though the PCA visualization of the top 2 components is different compared to what it looks like when doing it for the counts from the 10x software processing). For a third dataset, the splicing matrix is also all 0s.

Any help would be appreciated!

--Sid
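For what it's worth, a hedged sketch for inspecting the layers directly (loompy 2.x attribute access assumed; layer names as produced by velocyto; replace output.loom with the actual file produced by the run):

import loompy

with loompy.connect("output.loom") as ds:          # hypothetical path to the produced loom file
    for layer in ("spliced", "unspliced", "ambiguous"):
        print(layer, ds.layers[layer][:, :].sum())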

vlm.normalize_median

Dear velocyto team,
I am currently following your tutorial online to learn how to work with velocyto, to then apply the same code to my data. Unfortunately, I cannot get past the vlm.normalize_median() step in your script. Maybe it is something obvious, so apologies in advance if this is the case. I should say that I am not working in ipython due to system and administration issues with all the dependencies. Instead I am submitting my script to a compute farm. I am also following exactly what is in your code. The only exceptions are the modules that I import, which are almost the same except for the ipython/display import.

import sys
import numpy as np
from scipy.stats import norm
from scipy.spatial.distance import pdist, squareform
import matplotlib
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors
import loompy
import velocyto as vcy
import logging

So... the error I keep getting is the following:

Traceback (most recent call last):
  File "test.py", line 81, in <module>
    vlm.normalize_median()
  File "/software/pathogen/external/apps/usr/local/Python-3.6.0/lib/python3.6/site-packages/velocyto/analysis.py", line 898, in normalize_median
    self.Sx_sz = self.Sx * (np.median(self.Sx.sum(0)) / self.Sx.sum(0))
  File "/software/pathogen/external/apps/usr/local/Python-3.6.0/lib/python3.6/site-packages/numpy/lib/function_base.py", line 4119, in median
    overwrite_input=overwrite_input)
  File "/software/pathogen/external/apps/usr/local/Python-3.6.0/lib/python3.6/site-packages/numpy/lib/function_base.py", line 4033, in _ureduce
    r = func(a, **kwargs)
  File "/software/pathogen/external/apps/usr/local/Python-3.6.0/lib/python3.6/site-packages/numpy/lib/function_base.py", line 4152, in _median
    part = partition(a, kth, axis=axis)
  File "/software/pathogen/external/apps/usr/local/Python-3.6.0/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 664, in partition
    a.partition(kth, axis=axis, kind=kind, order=order)
ValueError: kth(=9106) out of bounds (1)

I am also attaching the script that I am running, in case it helps.
I would be very grateful if you have any ideas of what the problem is.
Kind regards,
test.txt

Carmen

Installation on Mac OS

Has the velocyto team tested the installation on the Mac operating system? My colleagues and I have run into roadblocks with the clang compiler. gcc-4.8 appeared to solve the issue for one of us, allowing her to successfully install velocyto, but that compiler is not compatible with the latest Mac OS 10.13.

Do you have general advice for installing velocyto on a Mac? Would really appreciate it!

Installation on macOS

I'm trying to install velocyto on macOS High Sierra using conda 4.5.0 and Python 3.6.5.
All the dependencies install fine but pip install velocyto yields the following error:

Building wheels for collected packages: velocyto
  Running setup.py bdist_wheel for velocyto ... error
...
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/harmbrugge/anaconda3/envs/3point6/include -arch x86_64 -I/Users/harmbrugge/anaconda3/envs/3point6/include -arch x86_64 -I/Users/harmbrugge/anaconda3/envs/3point6/lib/python3.6/site-packages/numpy/core/include -I/Users/harmbrugge/anaconda3/envs/3point6/include/python3.6m -c velocyto/speedboosted.c -o build/temp.macosx-10.7-x86_64-3.6/velocyto/speedboosted.o -fopenmp -ffast-math
  clang: error: unsupported option '-fopenmp'
  error: command 'gcc' failed with exit status 1

I think the problem is that I'm using an old gcc version. I tried installing the latest version using Homebrew (brew install cmake gcc), but I'm not sure how to link the new gcc installation with pip.

setup.py error - package_data: dict = {}

Hello,

I'm trying to install velocyto into an ubuntu-based singularity image. However, I am failing at the install with the error in the title:

+ pip install velocyto
Collecting velocyto
  Downloading velocyto-0.12.1.tar.gz (180kB)
    100% |################################| 184kB 2.6MB/s
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-rCTsse/velocyto/setup.py", line 14
        package_data: dict = {}
                    ^
    SyntaxError: invalid syntax

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-rCTsse/velocyto/
ABORT: Aborting with RETVAL=255

I can see this line in the setup.py file, but I'm not familiar enough with python to understand what this line is trying to do.

Concerning the compute environment, the recipe includes these steps with regard to python:

apt-get -y install python python3 python-pip python3-pip git
pip install --upgrade pip
pip3 install --upgrade pip

pip install numpy scipy cython numba matplotlib scikit-learn h5py click pysam
pip install velocyto

Do you have any ideas what is causing this?

Thanks!
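For context, a hedged note on the failing line: it is a PEP 526 variable annotation, which only parses on Python 3.6 and newer, so a pip bound to Python 2 (or to any Python below 3.6) fails at the egg_info step exactly like this.

# The line pip chokes on, as it appears in setup.py:
package_data: dict = {}     # PEP 526 annotated assignment; SyntaxError on Python 2.x
# An un-annotated equivalent that any interpreter parses:
package_data = {}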
