
Annotating Seurat clusters (scnym issue, closed)

calico commented on May 24, 2024
Annotating Seurat clusters

from scnym.

Comments (4)

jacobkimmel commented on May 24, 2024

Hi @timlai4,

Thanks for your interest!

> I would like to annotate these clusters. Can scNym accomplish this?

Yes, but scNym does not use the heuristic dimensionality reduction employed by Seurat. The scNym neural network model learns an embedding (a dimensionality reduction solution) within the classifier. After you run scNym on a dataset, you can look at this embedding in the adata.obsm["X_scnym"] attribute added to your AnnData object.

> I ask because I'm going through the README and it seems that scNym only takes the count matrix as an input, without any a priori clustering or dimension reduction, let alone integration, and attempts to annotate each individual cell? In other words, it seems an scNym workflow forgoes the standard workflow past preprocessing and does the clustering itself, via the automatic annotation.

That's correct: scNym operates on the normalized counts matrix of all genes, not a dimensionality reduction.

We've run extensive benchmarks and we've found that providing all gene expression values leads to superior cell type classifications relative to latent variables from a dimensionality reduction (e.g. PCA in Seurat) or integration procedures (e.g. CCA in Seurat).

This is true not only for scNym models, but also for most simple baselines (e.g. SVM, k-NN). Providing the normalized gene expression values appears to be superior, regardless of the classification model.

The basic intuition here is that the information needed for effective cell type classification may be lost in dimensionality reduction that does not take classification accuracy into account.

Unsupervised integration in particular has been problematic in our hands, often leading to very poor cell type classification outcomes. This follows a similar logic -- unsupervised integration procedures have no constraint on cell type classification accuracy, so they may fail to capture information that is useful for cell type calls.
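
As a toy illustration of this intuition (not the benchmark mentioned above), the sketch below builds eight invented cells with two genes. `gene_a` has high variance but is unrelated to cell type, while `gene_b` is a low-variance marker. Keeping only the highest-variance gene is used here as a crude, assumed stand-in for an unsupervised reduction such as PCA, and a nearest-centroid classifier is an assumed stand-in for the simple baselines.

```python
from statistics import mean, pvariance

# Toy cells: (gene_a, gene_b, cell_type). All values are invented for illustration.
# gene_a varies widely but is unrelated to cell type; gene_b is a low-variance marker.
cells = [
    (0.0, 0.0, "X"), (10.0, 0.1, "X"), (20.0, 0.0, "X"), (30.0, 0.1, "X"),
    (0.0, 1.0, "Y"), (10.0, 0.9, "Y"), (20.0, 1.0, "Y"), (30.0, 0.9, "Y"),
]


def nearest_centroid_accuracy(features, labels):
    """Fit one centroid per class and report accuracy on the same toy data."""
    classes = sorted(set(labels))
    centroids = {
        c: [mean(col) for col in zip(*[f for f, l in zip(features, labels) if l == c])]
        for c in classes
    }

    def predict(f):
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(classes, key=lambda c: sum((x - m) ** 2 for x, m in zip(f, centroids[c])))

    hits = sum(predict(f) == l for f, l in zip(features, labels))
    return hits / len(labels)


labels = [c[2] for c in cells]
full = [c[:2] for c in cells]

# Crude stand-in for an unsupervised reduction: keep only the highest-variance gene.
variances = [pvariance(col) for col in zip(*full)]
top = variances.index(max(variances))
reduced = [(f[top],) for f in full]

acc_full = nearest_centroid_accuracy(full, labels)        # both genes available
acc_reduced = nearest_centroid_accuracy(reduced, labels)  # marker gene discarded
print(acc_full, acc_reduced)
```

With both genes the classifier separates the two types; after the variance-based reduction the marker is gone and the classifier can do no better than chance on this toy data.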

> If I've understood this correctly, is it still possible nonetheless to incorporate scNym into my current framework? I understand I need to convert my Seurat object to a scanpy format in order to even attempt it.

You understood perfectly. Yes, you'd simply provide scNym with the normalized gene expression counts (log1p(CPM)), then train a classifier as in our tutorials to transfer labels to your cells.
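
For reference, the log1p(CPM) transformation mentioned here can be sketched in plain Python. The function name `log1p_cpm` and the example counts are hypothetical; the sketch assumes one list of raw counts per cell.

```python
import math


def log1p_cpm(counts):
    """Normalize one cell's raw counts to log1p(counts per million)."""
    total = sum(counts)
    if total == 0:
        # An empty cell has no counts to scale; return zeros rather than divide by zero.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * 1e6) for c in counts]


# Two invented cells with three genes each.
raw = [[10, 0, 90], [1, 1, 2]]
normalized = [log1p_cpm(cell) for cell in raw]
```

In scanpy, the same transformation is usually applied to an AnnData object with `sc.pp.normalize_total(adata, target_sum=1e6)` followed by `sc.pp.log1p(adata)`.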

You could then use these cell type labels and display them in any embedding you found useful (e.g. Seurat CCA integration).
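
The thread does not describe how to go from per-cell labels to cluster labels. One common approach, sketched here as an assumption rather than as part of scNym, is a majority vote within each cluster. The function name `label_clusters`, the cluster IDs, and the labels are all hypothetical.

```python
from collections import Counter, defaultdict


def label_clusters(cluster_ids, cell_labels):
    """Return {cluster: (majority_label, fraction_of_cells_with_that_label)}."""
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, cell_labels):
        by_cluster[cluster][label] += 1

    summary = {}
    for cluster, counts in by_cluster.items():
        label, n = counts.most_common(1)[0]
        summary[cluster] = (label, n / sum(counts.values()))
    return summary


# Invented example: cluster assignments (e.g. from Seurat) and per-cell predicted labels.
clusters = ["0", "0", "0", "1", "1", "1", "1"]
predicted = ["T cell", "T cell", "B cell", "B cell", "B cell", "B cell", "B cell"]
cluster_labels = label_clusters(clusters, predicted)
```

The returned fraction is worth inspecting: a cluster whose majority label covers only a small share of its cells is probably mixed and may need a closer look.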

> Finally, I was also wondering what kind of performance I might be able to expect from scNym on cell lines where the cells are not expected to really differentiate? In particular, I am seeking cluster annotation tools because the manual annotation by comparing markers is proving too difficult.

Hmm, I'm a little unclear on what sort of cells you're looking at here. Most cell type annotation problems are encountered with primary cells and tissues where you have complex mixtures of cell types and states in a single experiment.

Could you provide a bit more information on your experimental setup? Happy to help further if I can.

All the best,
Jacob

timlai4 commented on May 24, 2024

Dear Jacob,

Thanks for your detailed explanation. I did have one follow-up question to your answer.

As you indicated, I should run scNym on the normalized count matrix, and scNym will annotate each individual cell appropriately. If I have two or more samples, on which I am running an integrated analysis in Seurat and have integrated clusters, do I pass in an integrated count matrix to scNym or pass in the count matrices separately for annotation?

> Could you provide a bit more information on your experimental setup? Happy to help further if I can.

I am working with cells sourced from lots of different types of cell lines. It seems that in these cell lines, biologically, we don't expect much cell differentiation and from the literature, it seems it may not be possible to classify all the clusters as specific cell types. This isn't a very pressing question, since either way I intend to run scNym and validate the results.

timlai4 commented on May 24, 2024

Update:

I ended up picking the control samples individually and tried running scNym on those. I saw all the cells had basically 0 confidence scores. Doing a bit more research, I believe it's because the particular cancer cells my samples belong to are not part of the Cell Atlas yet. I believe at this point, in order to get scNym to work in my case, I will need to find an annotated single cell dataset, train scNym on that, and then do the prediction. If you have any feedback on this approach, especially on running scNym on an experiment with multiple samples, please let me know.

Thanks.

jacobkimmel commented on May 24, 2024

Hey Tim,

Apologies for the delay here.

> do I pass in an integrated count matrix to scNym or pass in the count matrices separately for annotation?

You should use the original, unintegrated counts matrix (normalized as described above, not batch-corrected), passing the domain labels through scNym's domain_groupby argument.
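
As a rough sketch of what this means in practice, the samples end up in one matrix with a per-cell label recording which sample (domain) each cell came from. The plain-Python helper below is hypothetical (`stack_samples`, the sample names, and the counts are invented) and assumes each sample comes with its own gene list; only genes shared by every sample are kept.

```python
def stack_samples(samples):
    """Combine per-sample matrices into one matrix plus per-cell domain labels.

    samples maps a sample name to (gene_names, rows), where each row holds one
    cell's values in gene_names order. Only genes shared by every sample are kept.
    """
    shared = None
    for genes, _ in samples.values():
        shared = set(genes) if shared is None else shared & set(genes)
    gene_order = sorted(shared or [])

    matrix, domains = [], []
    for name, (genes, rows) in samples.items():
        index = {g: i for i, g in enumerate(genes)}
        for row in rows:
            # Reorder each cell's values into the shared gene order.
            matrix.append([row[index[g]] for g in gene_order])
            domains.append(name)
    return gene_order, matrix, domains


# Invented samples with partially overlapping gene lists.
samples = {
    "control": (["ACTB", "CD3E", "MS4A1"], [[5, 2, 0], [7, 0, 3]]),
    "treated": (["MS4A1", "ACTB"], [[1, 9]]),
}
genes, matrix, domains = stack_samples(samples)
```

In an AnnData workflow, the `domains` list would presumably live in a column of `adata.obs`, and the name of that column is what gets passed as `domain_groupby`.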

> I am working with cells sourced from lots of different types of cell lines. It seems that in these cell lines, biologically, we don't expect much cell differentiation and from the literature, it seems it may not be possible to classify all the clusters as specific cell types. This isn't a very pressing question, since either way I intend to run scNym and validate the results.

> Doing a bit more research, I believe it's because the particular cancer cells my samples belong to are not part of the Cell Atlas yet.

Yes, immortalized cell lines do not always have a canonical "cell identity" annotation. These lines tend to become fairly distinct from primary cells over time, often with highly irregular ploidy.

As you found, there aren't many reference datasets annotating cell line identity based on RNA profiles. Most experiments involving these lines are designed in such a way that cell line identity can be determined using some orthogonal feature -- e.g. the sample index barcode, or genetic differences across lines.

I think scNym would only be useful for immortalized cell line experiments if (1) cell lines are pooled together, with non-unique sample indices and (2) cell lines are genetically similar, so that assigning identities based on variant calls is impractical.

> I will need to find an annotated single cell dataset, train scNym on that, and then do the prediction. If you have any feedback on this approach, especially on running scNym on an experiment with multiple samples, please let me know.

Yes, to use scNym you would definitely need a reference dataset that contains your cell types of interest, but as noted above, scNym may not be the most helpful tool depending on your problem.
