Comments (12)

YiweiNiu commented on July 21, 2024

Hello, sorry to re-open the closed issue.

When using immunarch to read 10x output (specifically, filtered_contig_annotations.csv), I found clonotypes were defined by CDR3 nucleotide sequence+V gene+J gene+Chain. @vadimnazarov also confirmed this here.

I have one question: why not just use clonotype id like 'raw_clonotype_id' or other column names specified by users to define clones?

If so, users can also generate input files by themselves and not worry about the definition of clonotypes.

from immunarch.

vadimnazarov commented on July 21, 2024

Hi, thank you for noticing that! We will fix it in the upcoming release at the beginning of the next week.

vadimnazarov commented on July 21, 2024

Hi, we looked into this more thoroughly and found that it is quite complicated. The key problem is how to define a clonotype: same CDR3 aa? Same CDR3aa + V + J? Same CDR3aa alpha and beta? And what should be done in the case of two alpha chains? So it's a much deeper issue than we expected at first. We will look into it after this release to make sure immunarch is moving in the direction of single-cell support. However, is there anything else, perhaps something very simple and basic, that we can do to help you with single-cell analysis? For example, we can add an additional column called "Barcode" with barcodes from the original files so you can process them yourself.

KyleTCL commented on July 21, 2024

  1. I would suggest defaulting to CDR3aa as the clonotype and perhaps providing an alternative strict option that also looks at CDR3nt and/or V/J genes. For example:

immdata <- repLoad("/path/to/data.csv", .format = "10x", .clonotype = "CDR3aa")

  2. Besides, it may be a good idea to also look at the alpha and beta chains separately. (Having both in the same dataset doubles the clonotype repertoire, i.e. the same TCR is treated as two different clonotypes because its chains are different entries in the data frame.) This would have to be solved either by defining the clonotype as paired alpha/beta or perhaps by just using the 10x data as it is, regardless of double alpha/beta, so that users can clean the data as they see fit. However, I am not sure how this can be integrated into your package for compatibility with other non-single-cell methods.

  3. An additional column for Barcode would be excellent! It would make integrating transcriptome data with the immunarch workflow easier.

  4. As for additional features, visualizations such as a scatter plot for comparing two samples would be nice.
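
The first suggestion above, a loose CDR3aa default with an optional strict definition, can be sketched outside immunarch as follows. This is a minimal Python illustration, not immunarch code: the rows are made-up examples that borrow column names from 10x's filtered_contig_annotations.csv, and KEYS and count_clonotypes are hypothetical names.

```python
from collections import Counter

# Made-up contig rows; column names follow 10x's filtered_contig_annotations.csv.
contigs = [
    {"chain": "TRB", "v_gene": "TRBV5-1", "j_gene": "TRBJ2-7",
     "cdr3": "CASSF", "cdr3_nt": "TGTGCCAGCAGCTTC"},
    # Same amino-acid CDR3 as above, but a different nucleotide sequence.
    {"chain": "TRB", "v_gene": "TRBV5-1", "j_gene": "TRBJ2-7",
     "cdr3": "CASSF", "cdr3_nt": "TGTGCCAGCAGTTTT"},
    {"chain": "TRA", "v_gene": "TRAV12-1", "j_gene": "TRAJ33",
     "cdr3": "CAVRDF", "cdr3_nt": "TGTGCTGTGAGGGACTTC"},
]

# Hypothetical clonotype definitions: which columns make up the key.
KEYS = {
    "CDR3aa": ("cdr3",),
    "strict": ("cdr3_nt", "v_gene", "j_gene", "chain"),
}

def count_clonotypes(rows, mode="CDR3aa"):
    """Count rows per clonotype under the chosen definition."""
    fields = KEYS[mode]
    return Counter(tuple(row[f] for f in fields) for row in rows)

loose = count_clonotypes(contigs, "CDR3aa")
strict = count_clonotypes(contigs, "strict")
```

Under the loose definition the two TRB rows collapse into one clonotype, while the strict definition keeps them apart because their nucleotide sequences differ.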

vadimnazarov commented on July 21, 2024

Hi @KyleTCL, we updated the package to 0.5.4. It now correctly parses and extracts clonotypes. To filter clones by barcodes, use the filter_barcode function. Can you please try it and get back to us if any problems arise?

EugeneRumynskiy commented on July 21, 2024

The issue was solved.

vadimnazarov commented on July 21, 2024

Hi @YiweiNiu ,

A very important question, thank you for asking! The problem lies in different approaches to data analysis:

  • Single-cell is focused on the cell-level.
  • AIRR analysis is focused on repertoires.

So in order to compute statistics such as diversity or gene usage, we should know the number of clones per clonotype, i.e., merge "single-cell-clonotypes" into "airr-clonotypes". "Raw_clonotype_id" is about sequence objects only, and AIRR analysis tools work with "sequence+counts" objects.
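
The merge described above, from per-cell records to "sequence+counts" objects, might look like the following sketch. It is a plain-Python illustration under assumptions, not immunarch's implementation: the records are made up, and to_repertoire and shannon are hypothetical helper names.

```python
import math
from collections import defaultdict

# Made-up cell-level records: one row per (cell barcode, chain).
cells = [
    {"barcode": "cell1", "chain": "TRB", "cdr3_nt": "TGTGCCAGCAGCTTC",
     "v_gene": "TRBV5-1", "j_gene": "TRBJ2-7"},
    {"barcode": "cell2", "chain": "TRB", "cdr3_nt": "TGTGCCAGCAGCTTC",
     "v_gene": "TRBV5-1", "j_gene": "TRBJ2-7"},
    {"barcode": "cell3", "chain": "TRB", "cdr3_nt": "TGTGCCAGCAGCTTC",
     "v_gene": "TRBV5-1", "j_gene": "TRBJ2-7"},
    {"barcode": "cell4", "chain": "TRB", "cdr3_nt": "TGTGCCAGCAGTTTT",
     "v_gene": "TRBV7-2", "j_gene": "TRBJ1-1"},
]

def to_repertoire(records):
    """Collapse cell-level rows into one row per clonotype with a clone count."""
    barcodes_by_key = defaultdict(set)
    for r in records:
        key = (r["cdr3_nt"], r["v_gene"], r["j_gene"], r["chain"])
        barcodes_by_key[key].add(r["barcode"])  # each cell is counted once
    total = sum(len(b) for b in barcodes_by_key.values())
    table = [
        {"clonotype": key, "clones": len(b), "proportion": len(b) / total}
        for key, b in barcodes_by_key.items()
    ]
    table.sort(key=lambda row: row["clones"], reverse=True)
    return table

def shannon(table):
    """Shannon entropy of the clonotype proportions (natural log)."""
    return -sum(row["proportion"] * math.log(row["proportion"]) for row in table)

rep = to_repertoire(cells)
diversity = shannon(rep)
```

Repertoire-level statistics such as the entropy here depend only on the counts, which is why the cell-level rows have to be merged first.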

YiweiNiu commented on July 21, 2024

Hi @vadimnazarov ,

Thank you for your reply! I am new to this field, so I apologize if my questions are too naive. I am working with 10x VDJ data and am not familiar with AIRR. I would like to describe the challenges I encountered.

I have scTCR-seq data for several samples from different tissues, i.e. blood, normal tissue, and tumor. Since I wanted to define clonotypes using both TCRA and TCRB chains, I could not use immunarch directly. My plan was to define clonotypes myself and then feed them to immunarch for high-level analysis (tasks such as gene usage, repertoire overlaps, diversity). As for the 'number of clones per clonotype', I could also count it myself.

But after reading part of the source code, I found it difficult to integrate this with immunarch, as immunarch has its own way of defining clonotypes and computing such statistics. Also, since comparing different groups of cells (such as different clusters or different tissues) is needed, cell barcodes and TCR sequences should be in one data frame.
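
A rough sketch of the two requirements above, paired TCRA/TCRB clonotypes and barcodes kept alongside the sequences, is given below. It is plain Python on made-up rows, not immunarch functionality; pair_chains is a hypothetical helper, and skipping cells that lack exactly one alpha and one beta chain is a simplifying assumption rather than a recommendation.

```python
from collections import Counter, defaultdict

# Made-up contig rows; "sample" stands in for tissue of origin.
contigs = [
    {"sample": "blood", "barcode": "c1", "chain": "TRA", "cdr3": "CAVRDF"},
    {"sample": "blood", "barcode": "c1", "chain": "TRB", "cdr3": "CASSF"},
    {"sample": "tumor", "barcode": "c2", "chain": "TRA", "cdr3": "CAVRDF"},
    {"sample": "tumor", "barcode": "c2", "chain": "TRB", "cdr3": "CASSF"},
    {"sample": "tumor", "barcode": "c3", "chain": "TRA", "cdr3": "CAVRDF"},
    {"sample": "tumor", "barcode": "c3", "chain": "TRB", "cdr3": "CASSLF"},
    {"sample": "tumor", "barcode": "c4", "chain": "TRB", "cdr3": "CASSF"},  # no alpha chain
]

def pair_chains(rows):
    """Return one record per cell that has exactly one TRA and one TRB contig."""
    by_cell = defaultdict(lambda: defaultdict(list))
    for r in rows:
        by_cell[(r["sample"], r["barcode"])][r["chain"]].append(r["cdr3"])
    paired = []
    for (sample, barcode), chains in by_cell.items():
        # Assumption: cells with missing or multiple chains are skipped.
        if len(chains["TRA"]) == 1 and len(chains["TRB"]) == 1:
            paired.append({
                "sample": sample,
                "barcode": barcode,  # kept so cells can be matched to clusters
                "clonotype": (chains["TRA"][0], chains["TRB"][0]),
            })
    return paired

cells = pair_chains(contigs)
counts = Counter((c["sample"], c["clonotype"]) for c in cells)

# Clonotypes observed in both tissues.
blood = {c["clonotype"] for c in cells if c["sample"] == "blood"}
tumor = {c["clonotype"] for c in cells if c["sample"] == "tumor"}
shared = blood & tumor
```

Because each record keeps its barcode, the same table can be joined to cluster labels from the transcriptome data before counting.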

I wanted to use change-o to define clonotypes in the analysis of scBCR-seq. This would run into the same problem as with scTCR-seq.

Best,
Yiwei

vadimnazarov commented on July 21, 2024

Hi @YiweiNiu

Thank you for the feedback and for describing your challenges! I appreciate this; it will help us make the package better. I have several questions for you to make sure I understood you correctly.

  1. "I wanted to define clonotypes using both TCRA and TCRB chains": would you like to use V/J genes as well? Why?

  2. "tasks such as gene usage, repertoire overlaps, diversity": What would be the hypotheses you plan to test? What are your goals with this analysis?

  3. "comparing different groups of cells (such as different clusters or different tissues)": What types of analysis would you like to do that require clone-level metadata?

  4. "I wanted to use change-o to define clonotypes in the analysis of scBCR-seq": What BCR analysis methods would you like to apply to the BCR data?

Our next major milestones are full support for paired single-cell data and BCR analysis. Your feedback on your analysis goals and routines is greatly appreciated. It will accelerate development a lot, as we will follow specific use cases. Thank you!

vadimnazarov commented on July 21, 2024

Hi @YiweiNiu

Would you prefer to discuss these questions via a quick 20-minute Zoom call? If so, feel free to send an email to [email protected] and we will schedule a call.

vadimnazarov commented on July 21, 2024

Pinging @YiweiNiu

vadimnazarov commented on July 21, 2024

We have initial single-cell exploration routines here: https://immunarch.com/articles/web_only/v21_singlecell.html

Please create separate issues in case of additional feature requests and bugs. Thank you!
