scverse / scirpy
A scanpy extension to analyse single-cell TCR and BCR data.
Home Page: https://scirpy.scverse.org/en/latest/
License: BSD 3-Clause "New" or "Revised" License
In GitLab by @szabogtamas on Feb 6, 2020, 12:02
Maybe this is already available in Scanpy, and I recall something similar is implemented in Seaborn.
I think that a key step towards prettier plots would be to fix the figure size.
For most publication scenarios a 3.44 x 2.58 inch figure is a safe choice (a single text column in most journals); it resolves to roughly the 1050x768 pixel standard, which is also good enough for presentations.
This figure size also works well if individual figures will realistically become subplots of a multi-panel figure later: if the figure looks nice at this size, chances are good that something will still be visible when the area is reduced to a quarter.
Font size and spacing could be optimized for this size.
Later, this could be called a "small image" profile, and a full-page version could also be added.
This could also be something we store in uns: the plotting function could check whether anything was passed explicitly, then whether the value is set in uns, and otherwise fall back to matplotlib defaults.
A separate profile could be set up for colors.
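A minimal sketch of how such a profile lookup could work. The profile names, the `figure_profile` key, and the helper name are all hypothetical; only the precedence (explicit argument > value in uns > matplotlib default) comes from the idea above:

```python
import matplotlib

# Hypothetical size profiles; "small" targets a single journal column.
FIGURE_PROFILES = {
    "small": {"figsize": (3.44, 2.58), "fontsize": 8},
    "full_page": {"figsize": (7.2, 5.4), "fontsize": 10},
}

def resolve_figsize(figsize=None, uns=None):
    """Resolve the figure size with the proposed precedence:
    explicit argument > profile stored in `uns` > matplotlib default."""
    if figsize is not None:
        return tuple(figsize)
    if uns is not None and "figure_profile" in uns:
        return FIGURE_PROFILES[uns["figure_profile"]]["figsize"]
    return tuple(matplotlib.rcParams["figure.figsize"])
```

A plotting function would then call `resolve_figsize(figsize, adata.uns)` before creating the figure.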
In GitLab by @grst on Mar 31, 2020, 13:30
In GitLab by @szabogtamas on Feb 10, 2020, 13:00
When computing convergence, it seems plausible to add the convergence value (let us say log(1/singleton rate) for now) for each single cell, given the grouping, to obs, and not only the singleton/duplicate/triplicate rates to uns.
The case would be similar for alpha diversity scores and maybe also clonal expansion.
If we add it, what rule should we follow for column names?
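A sketch of how the per-cell value could be computed, using the log(1/singleton rate) definition proposed above. The function name and the column names (`group`, `cdr3_aa`, `cdr3_nt`, `chain_convergence`) are hypothetical:

```python
import numpy as np
import pandas as pd

def chain_convergence_per_cell(obs, groupby="group"):
    """Add log(1/singleton_rate) of each cell's group as an obs column.

    singleton_rate: fraction of CDR3 amino-acid sequences in the group
    that are encoded by exactly one nucleotide version.
    """
    def group_convergence(df):
        n_nt_versions = df.groupby("cdr3_aa")["cdr3_nt"].nunique()
        singleton_rate = (n_nt_versions == 1).mean()
        return np.log(1 / singleton_rate)

    conv = obs.groupby(groupby)[["cdr3_aa", "cdr3_nt"]].apply(group_convergence)
    obs = obs.copy()
    obs["chain_convergence"] = obs[groupby].map(conv)
    return obs
```

The group-level rates could still go to uns as before; this only adds the broadcast per-cell column.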
The original issue (Id: 4, Title: "Describe rationale of clonotype definition") could not be created.
This is a dummy issue, replacing the original one. It contains everything but the original issue description. In case the GitLab repository still exists, visit the following link to show the original issue:
TODO
In GitLab by @grst on Jan 30, 2020, 12:59
Tools are functions that work with the data parsed from 10x/tracer and add results to either obs, obsm (e.g. distance matrices), or uns. They are usually required as an additional processing step before running certain plotting functions.
Here's a list of tools we want to implement.
@szabogtamas, feel free to add to/edit the list.

- st.tl.define_clonotypes(adata): assigns clonotypes to cells based on their CDR3 sequences
- st.tl.tcr_dist(adata, chains=["TRA_1", "TRB_1"], combination=np.min): adds TCR dist to obsm (#11)
- st.tl.kidera_dist: adds Kidera distances to obsm
- st.tl.chain_convergence(adata, groupby): adds a column to obs that contains the number of nucleotide versions for each CDR3 AA sequence
- st.tl.alpha_diversity(adata, groupby, diversityforgroup): so far we were only thinking about calculating the diversity of clonotypes in different groups, but the diversity of any group could just as well be calculated
- st.tl.sequence_logos(adata, ?forgroup?): precompute MSAs and sequence logos for plotting with st.pl.sequence_logos
- st.tl.dendrogram(adata, groupby): compute a dendrogram on an arbitrary distance matrix (e.g. from tcr_dist)
- st.tl.create_group(group_membership={Group1: ['barcode1', 'barcode2']}): adds a group membership to each cell by adding a column to obsm, and the name of the grouping to a list in uns (by default, groups based on samples, V gene usage and even clonotypes could be created at the initial run); might call the chain_convergence and alpha_diversity functions to calculate these measures right when creating a group
- tcellmatch (Fischer, Theis et al.)

In GitLab by @szabogtamas on Feb 11, 2020, 16:00
I think we need to think again about how to compute the convergence and how useful it is in the context of single-cell data.
In our example dataset, it appears to occur at negligible rates and doesn't make sense to visualize as a barplot.
Finally, I believe it measures something similar to TCR-dist.
Ideas:
The CDR3 convergence is currently hidden from the public API, but the preliminary code is still in _plotting._cdr_convergence.
In GitLab by @grst on Mar 20, 2020, 18:16
In GitLab by @grst on Mar 18, 2020, 15:42
Even though we cannot get all information from the csv, we should still support it - some of the public datasets do not provide the json files.
In GitLab by @grst on Nov 18, 2019, 16:53
Hi @szabogtamas,
the new pipeline looks very nice and I got to run it on the example data without problems.
I now adjusted main.nf a bit to use yml files, so it is independent of the environment in your home directory.
When I try to run it on the data from the Vanderburg study, I get the following error. Do you have any idea? Otherwise, let's look at it together when you are back!
Best, Gregor
Error executing process > 'callClonotypes (1)'
Caused by:
Process `callClonotypes (1)` terminated with an error exit status (1)
Command executed:
callClonotype.py mergedCDRs.tsv clonotypeTable.tsv additionalCellInfo.tsv chainConvergence.tsv chainMap.tsv chainPairs.tsv chainNet.tsv inToDiv.txt inToDist.txt
Command exit status:
1
Command output:
(empty)
Command error:
Traceback (most recent call last):
File "/home/sturm/projects/2019/singlecell_tcr/bin/callClonotype.py", line 325, in <module>
__main__()
File "/home/sturm/projects/2019/singlecell_tcr/bin/callClonotype.py", line 19, in __main__
callClonotypesAndChainPairs(cdrF, clonInfo, cellInfo, convergenceTab, chainMap, chainPairs, chainNet, seqsToDiv, seqsToDis, distance_method)
File "/home/sturm/projects/2019/singlecell_tcr/bin/callClonotype.py", line 35, in callClonotypesAndChainPairs
chainMapping, cdrMapping, convergenceTable, clonoFreq, cellChainTable, cellInfoTable, chainsForDiversity, chainsForDist = renameChain(cdrF, chainMapping, cdrMapping, convergenceTable, clonoFreq, cellChainTable, cellInfoTable, chainsForDiversity, chainsForDist, distance_method)
File "/home/sturm/projects/2019/singlecell_tcr/bin/callClonotype.py", line 172, in renameChain
cellInfoTable[cellNames] = [clonoSign, cellNames[:cellNames.rindex('_')+1], chainLinks]
ValueError: substring not found
Work dir:
/home/sturm/projects/2019/singlecell_tcr/work/b7/ef5d9c141aeef83d23efc80f827746
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
In GitLab by @grst on Mar 27, 2020, 09:46
Everyone seems to use these scatterplots:
Possible interface
ir.pl.scatter(adata, data_col, x_value, y_value, color)
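A minimal sketch of what the proposed interface could look like. Only the signature comes from the issue; the implementation, and using a plain DataFrame in place of the `adata`/`data_col` lookup, are assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

def scatter(df, x_value, y_value, color=None, ax=None):
    """Scatter two columns of a data frame against each other.

    In scirpy, `df` would be looked up in `adata` via `data_col`;
    here a plain DataFrame stands in for simplicity.
    """
    if ax is None:
        _, ax = plt.subplots()
    c = df[color] if color is not None else None
    ax.scatter(df[x_value], df[y_value], c=c)
    ax.set_xlabel(x_value)
    ax.set_ylabel(y_value)
    return ax
```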
In GitLab by @grst on Mar 21, 2020, 13:32
In GitLab by @grst on Jan 24, 2020, 13:50
Getting the alignments was quite straightforward, thanks to parasail.
We said that we focus on CDR3, at least initially.
@szabogtamas, some points to discuss. Does that make sense?
Edit: prototype works. ToDo for now:
- primary_only or all chains
In GitLab by @grst on Mar 20, 2020, 18:17
Maybe also the STARTRAC indexes could be of interest: https://www.ncbi.nlm.nih.gov/pubmed/30479382
The original issue (Id: 10, Title: "Supported T cell receptor types") could not be created.
This is a dummy issue, replacing the original one. It contains everything but the original issue description. In case the GitLab repository still exists, visit the following link to show the original issue:
TODO
In GitLab by @grst on Mar 30, 2020, 14:53
In GitLab by @grst on Mar 28, 2020, 12:18
In GitLab by @grst on Jan 24, 2020, 14:06
Here's a quite recent re-implementation of sequence logos in Python that looks promising:
https://github.com/jbkinney/logomaker
We will also require an algorithm for multiple sequence alignment in addition to the pairwise one that we have already.
Things to discuss:
In GitLab by @szabogtamas on Mar 30, 2020, 07:05
Add the possibility to analyse the overlap of the samples based on the clonotype network.
What I picture here is a heatmap of pairwise Hamming distances, where the binary string encodes the presence or absence of a given sample in a clonotype cluster.
I could not find a relevant function in Scanpy, but it does not seem very complicated with scipy.spatial.distance.hamming.
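A sketch of the computation described above (the function and column names are assumptions): build a binary sample-by-cluster presence matrix, then let scipy compute the pairwise Hamming distances:

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

def sample_overlap(obs, sample_col="sample", cluster_col="ct_cluster"):
    """Pairwise Hamming distances between samples, based on the
    presence/absence of each sample in each clonotype cluster."""
    # Binary matrix: rows = samples, columns = clonotype clusters
    presence = (pd.crosstab(obs[sample_col], obs[cluster_col]) > 0).astype(int)
    dist = squareform(pdist(presence.values, metric="hamming"))
    return pd.DataFrame(dist, index=presence.index, columns=presence.index)
```

The resulting square DataFrame could be handed directly to a heatmap function (e.g. seaborn's clustermap).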
In GitLab by @grst on Mar 20, 2020, 18:16
The current one, sctcrpy, is hard to pronounce and remember.
Also, it would be nice if the name left the option to expand to BCRs later on.
imm, sc, py, receptor, cr, ... ??
In GitLab by @grst on Mar 13, 2020, 11:56
Just discovered this on GitHub:
It has quite similar concepts (scanpy extension)... on the other hand, it doesn't really seem to be promoted a lot, and it lacks some functionality that we have.
I do like their way of visualizing shared CDR3 sequences and that they allow finding 'public' and 'private' epitopes.
Maybe we can draw some more inspiration from there.
In GitLab by @grst on Mar 21, 2020, 11:45
In GitLab by @grst on Jan 17, 2020, 14:32
In GitLab by @grst on Nov 19, 2019, 14:00
Hi @szabogtamas,
after fixing the sample prefixes, the pipeline ran fine on two samples of the Vanderburg study.
However, when including all samples (50k cells), a certain object appears to become too large for multiprocessing (see the error message below).
This is just to keep track of the error; we can discuss it next week and I can probably fix it myself.
This could help, but there's probably an even better way to deal with this:
https://stackoverflow.com/questions/29704139/pickle-in-python3-doesnt-work-for-large-data-saving
Error executing process > 'kideraDistances (1)'
Caused by:
Process `kideraDistances (1)` terminated with an error exit status (1)
Command executed:
chainBasedCellDistanceCalculations.py kidera chainKideras.tsv kideraDistanceMatrix.h5
chainBasedCellDistanceCalculations.py celldist kideraDistanceMatrix.h5 chainPairs.tsv minimum 8
Command exit status:
1
Command output:
<KeysViewHDF5 ['distances', 'names']>
Command error:
Traceback (most recent call last):
File "/home/sturm/projects/2019/singlecell_tcr/bin/chainBasedCellDistanceCalculations.py", line 258, in <module>
__main__()
File "/home/sturm/projects/2019/singlecell_tcr/bin/chainBasedCellDistanceCalculations.py", line 26, in __main__
cellDistances, cells = calculateCellDistanceFromChains(condensedDistances, chains, chainsOnCells, disambiguation=disambiguation, numCores=numCores)
File "/home/sturm/projects/2019/singlecell_tcr/bin/chainBasedCellDistanceCalculations.py", line 82, in calculateCellDistanceFromChains
M = p.map(checkCellDistance, itertools.zip_longest(itertools.combinations(cellnames, 2), [], fillvalue=(condensedDistances, chainsOnCells, posDict, disambiguation, L)), chunksize=chunkSize)
File "/home/sturm/projects/2019/singlecell_tcr/work/conda/tcrpy3-50638e2f3e2662bd69be51301015f0d3/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/sturm/projects/2019/singlecell_tcr/work/conda/tcrpy3-50638e2f3e2662bd69be51301015f0d3/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
File "/home/sturm/projects/2019/singlecell_tcr/work/conda/tcrpy3-50638e2f3e2662bd69be51301015f0d3/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
put(task)
File "/home/sturm/projects/2019/singlecell_tcr/work/conda/tcrpy3-50638e2f3e2662bd69be51301015f0d3/lib/python3.7/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/home/sturm/projects/2019/singlecell_tcr/work/conda/tcrpy3-50638e2f3e2662bd69be51301015f0d3/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
OverflowError: cannot serialize a bytes object larger than 4 GiB
Work dir:
/home/sturm/projects/2019/singlecell_tcr/work/c0/c9bf36401672277dae2d9edac9d8b4
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
In GitLab by @grst on Oct 3, 2019, 10:35
Python 2 will no longer be supported after the end of the year.
We should port the code to Python 3.7. It should not be too hard.
In GitLab by @grst on Mar 20, 2020, 18:26
In GitLab by @grst on Mar 31, 2020, 10:23
ToDo CI:
In GitLab by @grst on Mar 20, 2020, 18:35
In particular, double-check all docstrings for
At that occasion:
In GitLab by @szabogtamas on Feb 11, 2020, 14:51
In the stacked curve mode of spectratype plotting, the fill areas are shifted because I cannot find a way to set bins for seaborn's kdeplot. Maybe we could move to scipy or scikit.
In GitLab by @szabogtamas on Feb 19, 2020, 14:04
In publications, people commonly refer to CDR3s shared by most individuals in a group (e.g. patients) as a public chain. We should also consider whether we want to offer a function to detect public CDRs, and whether it should just return a list or try to visualize something.
Another consideration is whether or not we should create a metric that tells us how public a CDR3 is (maybe the diversity of the grouping variable among the cells having that specific CDR3).
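One way such a metric could look, using the Shannon entropy of the grouping variable (e.g. patient) among the cells carrying a given CDR3. The function and column names are hypothetical:

```python
import numpy as np
import pandas as pd

def cdr3_publicity(obs, cdr3_col="cdr3", group_col="patient"):
    """Shannon entropy of the group labels among cells with each CDR3.

    0 = fully private (all cells from one group);
    higher values = more public."""
    def entropy(labels):
        p = labels.value_counts(normalize=True).values
        return float(-(p * np.log(p)).sum())

    return obs.groupby(cdr3_col)[group_col].apply(entropy)
```

A simple "public CDR3" detector could then just threshold this score, or count the number of distinct groups per CDR3.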
In GitLab by @grst on Jan 17, 2020, 16:52
In GitLab by @grst on Jan 30, 2020, 11:18
This issue is about implementing basic plotting functions that can be re-used to plot any column in obs by a group.
sc.pl.violin already exists in scanpy.
Example:
sc.pl.violin(adata, ["TRA_1_cdr3_len", "TRA_1_junction_ins"], groupby="leiden")
Do we also want ... ?
In GitLab by @grst on Feb 14, 2020, 16:16
Mirror the scanpy workflow:
- tcr_dist is the equivalent of neighbors

Then, the following tools:
- tl.tcr_umap
- st.tl.dendrogram(adata, groupby): compute a dendrogram on an arbitrary distance matrix (e.g. from tcr_dist)

And the following plots:
- pl.tcr_umap
- pl.dendrogram
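A sketch of how the dendrogram tool could work on a precomputed square distance matrix (e.g. from tcr_dist), using scipy's hierarchical clustering. The wiring to AnnData and the function name are assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

def compute_dendrogram(dist_matrix, labels=None, method="average"):
    """Hierarchical clustering on an arbitrary square distance matrix.

    Returns the dendrogram dict without drawing anything, so that
    a separate plotting function can render it later."""
    condensed = squareform(dist_matrix, checks=False)
    Z = linkage(condensed, method=method)
    return dendrogram(Z, labels=labels, no_plot=True)
```

Splitting the computation (`tl`) from the rendering (`pl`) mirrors the scanpy convention discussed above.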
In GitLab by @grst on Feb 13, 2020, 16:03
I have just been using sctcrpy with Sandro to analyse some TCR data.
He could use the library on his own without a lot of help from my side, except for plotting different subsets of adata (all plots looked the same because it re-used the precomputed values). And it is indeed a bit unintuitive...
Maybe we should reconsider the need for a tool for each plotting function, at least for the cases where it can be computed inexpensively.
In GitLab by @szabogtamas on Feb 6, 2020, 11:22
I previously found it a great help if tools/functions also accepted standard Python data structures, not only an object defined by a library that might not otherwise even be used.
Of course, scanpy is already robust and there is no real danger that we would need this backdoor, so in this particular case I can go along with accepting AnnData objects only.
The question is rather whether we want to stick to the scanpy ecosystem with our tool, or try to make some parts work standalone as well.
We can think about it later.
In GitLab by @grst on Jan 29, 2020, 18:07
Add unit tests for the IO functions.
Check values with two sample entries and verify manually.
The original issue (Id: 9, Title: "List of plots") could not be created.
This is a dummy issue, replacing the original one. It contains everything but the original issue description. In case the GitLab repository still exists, visit the following link to show the original issue:
TODO
In GitLab by @szabogtamas on Feb 10, 2020, 12:10
Chain pairing stats can be visualized by the group abundance plotting function. It will not check whether the tool was run, and it also uses a specific set of settings. Does it make sense to create a function for it that is actually only a wrapper?
In GitLab by @szabogtamas on Feb 6, 2020, 11:09
There are lots of cells without a TCR and thus no clonotype assigned. Should we exclude them from the base when calculating clusters, or include them as a fraction? Include them on the plot?
Pro: it is cleaner to show fractions of the categorized cells.
Con: they are there...
The original issue (Id: 24, Title: "VDJ usage as Sankey plot") could not be created.
This is a dummy issue, replacing the original one. It contains everything but the original issue description. In case the GitLab repository still exists, visit the following link to show the original issue:
TODO
In GitLab by @grst on Mar 20, 2020, 18:19
In GitLab by @grst on Mar 17, 2020, 17:29
Should be really straightforward.
https://rawgit.com/ztane/python-Levenshtein/master/docs/Levenshtein.html
On the other hand, it should be equivalent to a global pairwise sequence alignment with an identity matrix and a gap penalty of 1.
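The equivalence noted above can be seen in code: the dynamic-programming recurrence for a global alignment with an identity matrix (mismatch cost 1) and gap penalty 1 is exactly the Levenshtein recurrence. A minimal sketch:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the global-alignment DP: at each cell, take the
    cheapest of a match/substitution (diagonal move) or a gap in either
    sequence (left/up moves), each gap costing 1."""
    prev = list(range(len(b) + 1))  # first row: gaps only
    for i, ca in enumerate(a, 1):
        curr = [i]  # first column: gaps only
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j - 1] + (ca != cb),  # match (0) or mismatch (1)
                prev[j] + 1,               # gap in b
                curr[j - 1] + 1,           # gap in a
            ))
        prev = curr
    return prev[-1]
```

In practice, the python-Levenshtein C implementation linked above should be considerably faster than either this sketch or a general alignment routine.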
In GitLab by @szabogtamas on Feb 11, 2020, 10:36
In the case of spectratypes, I am using pseudocounts to draw nicer KDE curves if fraction is true.
Does a pseudocount make sense for a stripplot or boxplot?
The original issue (Id: 31, Title: "Fix KDE/curve plot") could not be created.
This is a dummy issue, replacing the original one. It contains everything but the original issue description. In case the GitLab repository still exists, visit the following link to show the original issue:
TODO
In GitLab by @grst on Jan 19, 2020, 17:40
adata.obs columns are converted to categorical on plotting. None and NaN might be turned into str(nan) in that case, which breaks pd.isna etc.
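A small demonstration of the problem in pure pandas (that the conversion goes through astype(str) is an assumption about where it happens):

```python
import numpy as np
import pandas as pd

s = pd.Series(["TRA", np.nan, "TRB"])
assert pd.isna(s).sum() == 1  # NaN is still detectable

# Converting to str before making the column categorical turns
# NaN into the literal string "nan" ...
cat = s.astype(str).astype("category")

# ... which pd.isna no longer recognizes as missing:
assert pd.isna(cat).sum() == 0
assert "nan" in cat.cat.categories

# Converting directly to categorical keeps NaN intact:
assert pd.isna(s.astype("category")).sum() == 1
```

So converting directly to "category" (without the intermediate str step) would avoid the breakage.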
In GitLab by @grst on Feb 28, 2020, 12:56
With !10, the basic functionality to compute distances was added.
Next steps:
Mirror the scanpy workflow:
- neighbors to compute adjacency matrices
- leiden to cluster = "define clonotypes"
- umap to visualize the similarity of clonotypes

In GitLab by @grst on Mar 16, 2020, 15:08
Include demo datasets as part of the package (sctcrpy.datasets), similar to how it is done in scanpy.
Nice datasets to include:
This is a prerequisite for making a nice tutorial.
In GitLab by @grst on Mar 27, 2020, 14:14
In GitLab by @grst on Nov 27, 2019, 11:32
Hi @szabogtamas,
if you still find some time, it would be great if you could add documentation to at least the most important functions, explaining the input arguments and return values. This would make it a lot easier for me to maintain the project.
I would recommend sticking to numpydoc. There's a great example here:
def foo(var1, var2, long_var_name='hi'):
"""A one-line summary that does not use variable names.
Several sentences providing an extended description. Refer to
variables using back-ticks, e.g. `var`.
Parameters
----------
var1 : array_like
Array_like means all those objects -- lists, nested lists, etc. --
that can be converted to an array. We can also refer to
variables like `var1`.
var2 : int
The type above can either refer to an actual Python type
(e.g. ``int``), or describe the type of the variable in more
detail, e.g. ``(N,) ndarray`` or ``array_like``.
long_var_name : {'hi', 'ho'}, optional
Choices in brackets, default first when optional.
Returns
-------
type
Explanation of anonymous return value of type ``type``.
describe : type
Explanation of return value named `describe`.
out : type
Explanation of `out`.
type_without_description
Other Parameters
----------------
only_seldom_used_keywords : type
Explanation
common_parameters_listed_above : type
Explanation
Raises
------
BadException
Because you shouldn't have done that.
See Also
--------
numpy.array : Relationship (optional).
numpy.ndarray : Relationship (optional), which could be fairly long, in
which case the line wraps here.
numpy.dot, numpy.linalg.norm, numpy.eye
Notes
-----
Notes about the implementation algorithm (if needed).
This can have multiple paragraphs.
You may include some math:
.. math:: X(e^{j\omega } ) = x(n)e^{ - j\omega n}
And even use a Greek symbol like :math:`\omega` inline.
References
----------
Cite the relevant literature, e.g. [1]_. You may also cite these
references in the notes section above.
.. [1] O. McNoleg, "The integration of GIS, remote sensing,
expert systems and adaptive co-kriging for environmental habitat
modelling of the Highland Haggis using object-oriented, fuzzy-logic
and neural-network techniques," Computers & Geosciences, vol. 22,
pp. 585-588, 1996.
Examples
--------
These are written in doctest format, and should illustrate how to
use the function.
>>> a = [1, 2, 3]
>>> print([x + 3 for x in a])
[4, 5, 6]
>>> print("a\nb")
a
b
"""
# After closing class docstring, there should be one blank line to
# separate following codes (according to PEP257).
# But for function, method and module, there should be no blank lines
# after closing the docstring.
pass
In GitLab by @grst on Mar 20, 2020, 18:27