
Comments (38)

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Jan 22, 2020, 16:29

changed the description

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Jan 22, 2020, 16:30

changed the description

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Jan 22, 2020, 16:33

changed the description

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Jan 22, 2020, 16:40

changed the description

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @grst on Jan 23, 2020, 12:02


Clonal Expansion visualized in the Zheng et al (2017) paper.

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @grst on Jan 24, 2020, 11:30

@szabogtamas

  • number of inserted nucleotides -> box or bar plots by cell group; or maybe color-based on the umap
  • number of nucleotides inserted in the vj junction -> box or bar plots by cell group; or maybe color-based on the umap

Where do we find that data in 10x and Tracer files?

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Jan 24, 2020, 12:03

This information has to be calculated by the preprocessing script. In the contigs.json file we have the start and end positions of the V, D and J blocks; if D does not start right after the end of V, then there are inserted nucleotides.
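A minimal sketch of that calculation, assuming the segment coordinates have already been extracted into a plain dict of `(start, end)` pairs with an exclusive end. The dict layout and the name `count_insertions` are illustrative only; they are neither the 10x schema nor the actual preprocessing script.

```python
def count_insertions(segments):
    """Count nucleotides lying between consecutive annotated segments.

    `segments` maps "V", "D", "J" to (start, end) tuples with an exclusive
    end (assumed layout). Chains without a D segment yield one VJ junction.
    """
    order = [name for name in ("V", "D", "J") if name in segments]
    insertions = {}
    for left, right in zip(order, order[1:]):
        gap = segments[right][0] - segments[left][1]
        # Overlapping annotations would give a negative gap; report 0 instead.
        insertions[left + right] = max(gap, 0)
    return insertions


beta = {"V": (0, 285), "D": (288, 300), "J": (302, 350)}
alpha = {"V": (0, 270), "J": (274, 330)}
print(count_insertions(beta))   # {'VD': 3, 'DJ': 2}
print(count_insertions(alpha))  # {'VJ': 4}
```

If the coordinates in the file turn out to be end-inclusive, the gap would need an extra `- 1`.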

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @grst on Jan 24, 2020, 12:15

OK, so there is no way of getting this from the .csv file.
Could you check where we can find this information in the Tracer data?

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Jan 24, 2020, 12:49

With 10x it is not a problem. This is included in the current preprocessing script and we can make it a separate function easily.

With Tracer, that is a good point. I cannot find a hint in the summary files. What we can do is parse the filtered_TCR_seqs/filtered_TCRs.txt files for the Trinity ID of the chosen sequences (the ones that will be considered the TCR of the cell) and then extract the IgBlast result for that Trinity ID from the file we were looking at last time. This would also solve the CDR1&2 issue. If we are lucky, we can convert the IgBlast output to JSON first, and then it is doable.

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @grst on Jan 24, 2020, 14:11

I agree that parsing the summary data from tracer is not enough (see also #10).

Some more points regarding the plots:

  • length of CDR3 regions -> box or bar plots by cell group; or maybe color-based on the umap
  • number of inserted nucleotides -> box or bar plots by cell group; or maybe color-based on the umap
  • number of nucleotides inserted in the vj junction -> box or bar plots by cell group; or maybe color-based on the umap

Here, again, we will need to define for which chains we want to plot that. Aggregate it by cell? Make (up to) four different plots?

number (or ratio among total cell number) of cells in a clonotype -> Sankey-like plots or bar plots by cell group; or maybe a treemap by cell group or a heat plot for two different groupings (samples vs cell types)

I don't get this one, do you have an example figure?

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Jan 27, 2020, 16:29

Wishlist restructured

I. Cell-based information:
Calculated once by the preprocessing script when importing 10x or Tracer results

  • length of CDR3 regions (continuous, four columns)
  • number of inserted nucleotides (continuous, four columns). VJ for alpha, VD + DJ for beta.
  • CDR3 sequence, AA and NT (string, four columns each)
  • presence of secondary chain (categorical, single column)
  • clonotype
  • VDJ genes (categorical, 12 columns)

I.2 Cell-based information, generated

  • clonotypes (categorical, single column)
  • convergence of chains [nucleotide versions of a single CDR3 aa sequence] (continuous, four columns)

II. Cell-cell relation:
These are basically cell-based features, but the table would explode if we included them in the cell table. It might be better to keep them separately in a sparse matrix or an upper triangle. Or create UMAPs and store their x and y coordinates in the cell table?

  • Shared clonotypes

  • Shared chains (four columns)

  • Shared CDR3aa but different CDR3nuc

  • Similar CDR3aa sequences (tcrdist)

  • Similar physicochemical features (Kidera factors)

  • ?Shared kmers?

  • ?GLIPH networks?

  • ?Chains recognizing the same epitopes based on McPAS-TCR?

  • ?epitope reactivity? (list, single column) -> query external database

  • number of inserted nucleotides (continuous, four columns)

III. Group-based features
Has to be calculated on the fly or when creating groups <- from cell-based information

  • number of cells in the group (absolute number and ratios)
  • size of clonotypes in the group (absolute number and ratios) <- from clonotype membership and number of cells in the group
    • clonotype multiplicity <- from clonotype membership and number of cells in the group
    • size of singleton clonotypes in the group (absolute number and ratios) <- from clonotype multiplicity and number of cells in the group
    • size of doublet clonotypes in the group (absolute number and ratios) <- from clonotype multiplicity and number of cells in the group
    • size of triplet clonotypes in the group (absolute number and ratios) <- from clonotype multiplicity and number of cells in the group
    • size of quadruplet clonotypes in the group (absolute number and ratios) <- from clonotype multiplicity and number of cells in the group
    • quartile distribution of clonotypes in the group <- from size of clonotypes
  • diversity of the group <- from clonotype membership and number of cells in the group
  • spectratype (cell number by CDR3 length) in the group <- from length of CDR3 regions
  • VDJ (or VJ) usage in the group <- from VDJ genes
  • sequence logo of group <- from chain identities
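As a rough illustration of how such group-based features could be derived on the fly from the cell table, here is a sketch using plain Python records. The record layout and the function names are hypothetical, and reading "ratios" as the fraction of clonotypes per size class is only one possible interpretation.

```python
from collections import Counter
from math import log2


def clonotype_sizes(cells, group):
    """Number of cells per clonotype within one group."""
    return Counter(c["clonotype"] for c in cells if c["group"] == group)


def multiplicity_fractions(sizes):
    """Fraction of clonotypes with 1, 2, or at least 3 cells."""
    bins = Counter(
        "1" if n == 1 else "2" if n == 2 else ">=3" for n in sizes.values()
    )
    total = sum(bins.values())
    return {key: bins[key] / total for key in ("1", "2", ">=3")}


def shannon_entropy(sizes):
    """Shannon entropy (in bits) of the clonotype size distribution."""
    total = sum(sizes.values())
    return sum(-(n / total) * log2(n / total) for n in sizes.values())


cells = [
    {"group": "A", "clonotype": "ct1"},
    {"group": "A", "clonotype": "ct1"},
    {"group": "A", "clonotype": "ct2"},
    {"group": "A", "clonotype": "ct3"},
    {"group": "B", "clonotype": "ct1"},
]
sizes = clonotype_sizes(cells, "A")
print(multiplicity_fractions(sizes))
print(shannon_entropy(sizes))  # 1.5
```

Nothing here is stored; everything is recomputed from clonotype membership whenever a grouping is chosen.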

IV. Group-group relation:
Probably this one is the most problematic, but the question will often arise in this context: which samples, patients or cell types share a feature...

  • repertoire overlap among groups
  • ?similar spectratypes?
  • ?similar usage of VDJ genes?
  • +all the features calculated as cell-cell relations

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @grst on Jan 30, 2020, 11:57

List of plotting functions to implement moved to the very top

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @grst on Jan 30, 2020, 12:59

changed title from {-Write a list of plots we want to have-} to {+List of plots+}

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Jan 30, 2020, 15:40

We have sequence_logo both as a tool and as a plotting function. I would only go for the plotting function. Maybe I would even put the alpha_diversity into the plotting part only. It will always have to be recalculated by groups and we might not even want to store it. Furthermore, the best place to store them would be the uns that I would like to keep as empty as possible.

If so, convergence calculation might also be a plotting function only.

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @grst on Jan 30, 2020, 15:44

Hmm, I'll think about it.

In scanpy it is common practice to have everything as both a tool and a plotting function.
In scanpy, a plotting function never computes anything, it just displays stuff that is already in anndata.

This makes sense especially when plotting something that takes a long time to compute (e.g. UMAP, and, in our case, sequence logos).

I agree that it feels a bit cumbersome, though, to always call tl.alpha_diversity and pl.alpha_diversity for each group.

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Jan 30, 2020, 16:15

Another, conceptual question is the naming of the functions. For most plots, the same dataset (a list or dictionary of values) would be passed on to draw a violin, box or barplot. Should we name the plotting functions according to what they draw (violin or bar) or based on the question they answer? In the latter case, the visualization type would only be an attribute.

For example:

  • st.pl.cdr3_length(adata, groupby=None, subgroupby=None, vistype='violin')

    • the groupby argument can be the name of a grouping column or None to show this for the whole population
    • subgroupby would give us the paired or stacked columns if specified
    • vistype specifies the actual look of the plot; it would be violin and umap for now, but later we could add box and bar, as well as histogram, which is effectively a spectratype
  • st.pl.group_abundance(adata, forgroup='clonotype', groupby='sample', subgroupby=None, relative=None, vistype='bar')

  • st.pl.group_overlap(adata, forgroup='clonotype', groupby='sample', subgroupby=None, relative=None, vistype='chord')

  • st.pl.diversity(adata, forgroup='clonotype', groupby='sample', subgroupby=None, relative=None, vistype='bar')

  • st.pl.convergence(adata, forgroup='clonotype', groupby='sample', subgroupby=None, relative=None, vistype='bar')

  • st.pl.vdj_usage(adata, groupby='sample', relative=None, vistype='chord')

  • st.pl.sequence_logos(adata, groupby='sample', vistype='logo')

  • st.pl.group_similarities(adata, groupby='sample', distancematrix='tcrdist', vistype='chord|umap|dendrogram')

Of course, we could have a separate set of plotting functions called chord and so on that would be called by the upper convenience functions.

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @grst on Jan 30, 2020, 16:20

I think we should differentiate between the basic and specific plotting functions.

The basic ones should be named by what they show (e.g. violin).

For the specific ones, I'm in favor of having a single function for which the vistype is an option.
Like that we can also start with a single visualization, and add others later on.

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @grst on Jan 31, 2020, 08:44

In scanpy they just have a function for each vistype, e.g.

sc.pl.rank_genes_groups_dotplot
sc.pl.rank_genes_groups_matrixplot
sc.pl.rank_genes_groups_heatmap

Pro: it generates visibility for each visualization type
Con: It's just soo many functions.

I think I'm still in favor of having the vistype argument.

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Jan 31, 2020, 09:57

Or we can make a "mother function" that has the vistype option, which makes it easier to extend for now, and create "fake" plotting functions that just call the "mother function" with one specific vistype argument, just to conform to scanpy conventions better...
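A sketch of what this could look like, with placeholder renderers that return a string instead of drawing anything. All names are hypothetical and not part of any existing API.

```python
def _bar(values):
    # Placeholder: a real renderer would draw with a plotting library.
    return f"bar plot of {len(values)} groups"


def _violin(values):
    return f"violin plot of {len(values)} groups"


# Registry of available visualization types; extending it adds a new vistype.
_RENDERERS = {"bar": _bar, "violin": _violin}


def cdr3_length(values, vistype="violin"):
    """'Mother function': picks the renderer from the vistype argument."""
    if vistype not in _RENDERERS:
        raise ValueError(
            f"unknown vistype {vistype!r}; choose from {sorted(_RENDERERS)}"
        )
    return _RENDERERS[vistype](values)


def cdr3_length_bar(values):
    """'Fake' function in the one-function-per-visualization style."""
    return cdr3_length(values, vistype="bar")


lengths = {"sample_A": [12, 14, 15], "sample_B": [13, 13]}
print(cdr3_length(lengths))      # violin plot of 2 groups
print(cdr3_length_bar(lengths))  # bar plot of 2 groups
```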

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @grst on Jan 31, 2020, 10:24

Regarding the duplication of functions in plotting and tools:

Pro:

  • makes sense for computationally expensive plots (e.g. sequence logos) to not re-compute everything, just because you want to change the axis label
  • It might sometimes be relevant to get the raw values (e.g. get the actual entropy values from alpha_diversity instead of just plotting them).
  • Seems to be a common pattern in scanpy

Con:

  • More complicated for the user to call two functions to get a plot.
  • Storing stuff in AnnData that might never be needed again

A compromise could be to offer both pl and tl functions, but automatically run the tool with default parameters from the plotting function, if it has not been precomputed.
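A rough sketch of that compromise, using a list of dicts as a stand-in for `obs` and a plain dict as a stand-in for `uns`. The function names, the cache key, and the use of Shannon entropy as the diversity score are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def tl_alpha_diversity(obs, uns, groupby="sample"):
    """Tool: compute Shannon entropy per group and cache it in `uns`."""
    counts = {}
    for cell in obs:
        counts.setdefault(cell[groupby], Counter())[cell["clonotype"]] += 1
    result = {}
    for group, sizes in counts.items():
        total = sum(sizes.values())
        result[group] = sum(-(n / total) * log2(n / total) for n in sizes.values())
    uns[f"alpha_diversity_{groupby}"] = result
    return result


def pl_alpha_diversity(obs, uns, groupby="sample"):
    """Plot: reuse cached values, or run the tool with defaults if missing."""
    key = f"alpha_diversity_{groupby}"
    if key not in uns:
        tl_alpha_diversity(obs, uns, groupby=groupby)
    # Placeholder for the actual drawing: return one text row per group.
    return [f"{group}: {value:.2f}" for group, value in sorted(uns[key].items())]


obs = [
    {"sample": "A", "clonotype": "ct1"},
    {"sample": "A", "clonotype": "ct2"},
    {"sample": "B", "clonotype": "ct1"},
    {"sample": "B", "clonotype": "ct1"},
]
uns = {}
print(pl_alpha_diversity(obs, uns))  # first call triggers the tool
print("alpha_diversity_sample" in uns)  # True
```

A second call to `pl_alpha_diversity` finds the cached entry and skips the computation, so changing only cosmetic options would not trigger a recompute.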

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Jan 31, 2020, 10:38

Yes, I think checking if the tool function was run and calling it from the plotting function if not is an excellent idea! We should do this!

Regarding the raw values: at some point there might be a need for creating tables. Especially, if it is just diversity scores for a couple of samples or the abundance of the top 10 clonotypes in the samples. In my mind this would also be a plotting function.

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @grst on Jan 31, 2020, 10:41

Regarding the raw values: at some point there might be a need for creating tables. Especially, if it is just diversity scores for a couple of samples or the abundance of the top 10 clonotypes in the samples. In my mind this would also be a plotting function.

Let's keep that in mind. Maybe vistype='table' could be an option.

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @grst on Jan 31, 2020, 10:46

changed the description

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Jan 31, 2020, 10:48

Yes, exactly: vistype='table' and leave the implementation for a later time point.

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @grst on Jan 31, 2020, 10:53

Can you please integrate this list into the overview at the top?
I think there are some duplicates...

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @grst on Jan 31, 2020, 10:57

changed the description

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Jan 31, 2020, 11:38

changed the description

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Jan 31, 2020, 11:40

I edited the overview.

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @grst on Feb 2, 2020, 19:48

marked the task st.pl.clonal_expansion(adata, forgroup='clonotype', groupby='sample', subgroupby=None, relative=None, vistype='bar') Show fraction of n=1, n=2 and n>=3 clonotypes for each group in the groupby (optionally combined with subgroupby) grouping in obs. If relative is not None, it should point to a grouping, ideally one already supplied as groupby or subgroupby. as completed

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @grst on Feb 2, 2020, 19:48

marked the task st.pl.alpha_diverities(adata, forgroup='clonotype', groupby='sample', subgroupby=None, vistype='bar') Simply plot the diversity scores calculated by the st.tl.alpha_diverities function. as completed

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Feb 12, 2020, 13:36

marked the task st.pl.group_abundance(adata, forgroup='clonotype', groupby='sample', subgroupby=None, relative=None, vistype='bar') The number or ratio of a group present in another group, for example the presence of the top 10 clonotypes in each sample. as completed

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Feb 12, 2020, 13:36

marked the task st.pl.cdr_convergence(adata, forchain='alpha', groupby='sample', subgroupby=None, vistype='bar') For each cell, we check how many nucleotide versions of the CDR3 region of forchain exist in groupby. as completed
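For illustration, counting nucleotide versions per CDR3 amino-acid sequence could look roughly like the sketch below. The record keys `cdr3_aa` and `cdr3_nt` and the function name are hypothetical and do not reflect the implemented function.

```python
from collections import defaultdict


def cdr3_convergence(cells):
    """Map each CDR3 amino-acid sequence to the number of distinct
    nucleotide sequences that encode it among the given cells."""
    variants = defaultdict(set)
    for cell in cells:
        variants[cell["cdr3_aa"]].add(cell["cdr3_nt"])
    return {aa: len(nts) for aa, nts in variants.items()}


cells = [
    {"cdr3_aa": "CAS", "cdr3_nt": "TGTGCCAGC"},
    {"cdr3_aa": "CAS", "cdr3_nt": "TGCGCCAGC"},  # same peptide, other codon
    {"cdr3_aa": "CAS", "cdr3_nt": "TGTGCCAGC"},
    {"cdr3_aa": "CAV", "cdr3_nt": "TGTGCCGTG"},
]
print(cdr3_convergence(cells))  # {'CAS': 2, 'CAV': 1}
```

The per-cell value is then a lookup of the cell's own `cdr3_aa` in the returned dict, computed separately for each group in groupby.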

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Feb 12, 2020, 13:37

marked the task st.pl.chain_pairing(adata, forchain='alpha', groupby='sample', subgroupby=None, vistype='bar') Plots the ratio of single pair, double pair and orphan alpha or beta chain cells. It only calls a basic plotting function; we include it as a separate function for the sake of completeness. as completed

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Feb 12, 2020, 13:37

marked the task st.pl.spectratype(adata, groupby='sample', subgroupby='Vgene', relative=None, vistype='chord') The distribution (pdf) of CDR3 lengths in cell groups (cell types, samples, or cells with a specific V gene); stacked barplot or just histogram curves as completed

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Feb 12, 2020, 13:38

marked the task st.pl.repertoire_overlap(adata, forgroup='clonotype', groupby='sample', subgroupby=None, relative=None, vistype='chord') The number or fraction of cells that belong to the same forgroup but different groupby. In principle, it has to be computed pairwise and results in a similarity matrix for the groups in groupby. In case of fractions, we have to know what the base is. I am not sure if subgroupby is an option here. as completed
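A small sketch of the pairwise computation, under the simplifying assumptions that overlap is counted in shared clonotypes rather than cells and that the union of both groups serves as the base for the fraction (the Jaccard index). Names and record layout are illustrative only.

```python
from itertools import combinations


def repertoire_overlap(cells, groupby="sample", target="clonotype"):
    """Return {(group_a, group_b): (n_shared, jaccard)} for each pair.

    Only the upper triangle is stored, since the measure is symmetric.
    """
    members = {}
    for cell in cells:
        members.setdefault(cell[groupby], set()).add(cell[target])
    overlap = {}
    for a, b in combinations(sorted(members), 2):
        shared = members[a] & members[b]
        union = members[a] | members[b]
        overlap[(a, b)] = (len(shared), len(shared) / len(union))
    return overlap


cells = [
    {"sample": "A", "clonotype": "ct1"},
    {"sample": "A", "clonotype": "ct2"},
    {"sample": "B", "clonotype": "ct2"},
    {"sample": "B", "clonotype": "ct3"},
    {"sample": "C", "clonotype": "ct4"},
]
for pair, (n_shared, jaccard) in repertoire_overlap(cells).items():
    print(pair, n_shared, round(jaccard, 2))
```

Choosing a different base, such as the size of the smaller group, would only change the denominator.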

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Feb 12, 2020, 13:39

marked the task st.pl.sequence_logo(adata, group=celltypes['CD8'], letter=amino_acids, vistype='logo') as completed

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Feb 12, 2020, 13:39

marked the task st.pl.sequence_logos(adata, groupby=celltypes, letter=amino_acids, vistype='logo') as completed

from scirpy.

grst avatar grst commented on July 18, 2024

In GitLab by @szabogtamas on Feb 12, 2020, 13:47

closed

from scirpy.
