Annotating Seurat clusters about scnym HOT 4 CLOSED

calico commented on May 24, 2024

Annotating Seurat clusters

from scnym.

Comments (4)

jacobkimmel commented on May 24, 2024

Hi @timlai4,

Thanks for your interest!

I would like to annotate these clusters. Can scNym accomplish this?

Yes, but scNym does not use the heuristic dimensionality reduction employed by Seurat. The scNym neural network model learns an embedding (a dimensionality reduction solution) within the classifier. After you run scNym on a dataset, you can look at this embedding in the adata.obsm["X_scnym"] attribute added to your AnnData object.

I ask because I'm going through the README and it seems that scNym only takes the count matrix as an input, without any a priori clustering or dimension reduction, let alone integration, and attempts to annotate each individual cell? In other words, it seems an scNym workflow foregoes the standard workflow past preprocessing and does the clustering itself, via the automatic annotation.

That's correct, scNym operates on the normalized counts matrix of all genes, not a dimensionality reduction.

We've run extensive benchmarks and we've found that providing all gene expression values leads to superior cell type classifications relative to latent variables from a dimensionality reduction (e.g. PCA in Seurat) or integration procedures (e.g. CCA in Seurat).

This is true not only for scNym models, but also for most simple baselines (e.g. SVM, k-NN). Providing the normalized gene expression values appears to be superior, regardless of the classification model.

The basic intuition here is that the information needed for effective cell type classification may be lost in dimensionality reduction that does not take classification accuracy into account.
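A toy sketch of that intuition, with invented numbers: gene A varies a lot but carries no cell type signal, while gene B varies little but separates the two types perfectly. Because the genes are uncorrelated in this data, the first principal component is exactly the centered gene A axis, so keeping one component (computed directly here rather than with a PCA library) discards gene B. A nearest-centroid classifier, standing in for any simple baseline, drops from perfect to chance accuracy.

```python
from math import dist
from statistics import mean

# Toy expression data (invented): (gene_A, gene_B) per cell, plus a cell type label.
# Gene A: high variance, unrelated to type. Gene B: low variance, defines type.
cells = [(-10.0, 0.0), (10.0, 0.0), (-10.0, 1.0), (10.0, 1.0)]
labels = ["type0", "type0", "type1", "type1"]


def nearest_centroid_accuracy(features, labels):
    """Fit per-class centroids and report accuracy on the same cells."""
    classes = sorted(set(labels))
    centroids = {
        c: tuple(mean(col) for col in zip(*[f for f, l in zip(features, labels) if l == c]))
        for c in classes
    }
    # min() keeps the first class on ties, so an uninformative feature predicts one class
    preds = [min(classes, key=lambda c: dist(f, centroids[c])) for f in features]
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)


# All genes: the classifier can use gene B.
full = nearest_centroid_accuracy(cells, labels)

# One-component "reduction": the genes are uncorrelated and var(A) >> var(B),
# so PC1 is the centered gene A axis and gene B is thrown away.
mean_a = mean(a for a, _ in cells)
reduced = nearest_centroid_accuracy([(a - mean_a,) for a, _ in cells], labels)

print(full, reduced)  # 1.0 0.5
```

Real data is far less tidy, but the mechanism is the same: a variance-maximizing reduction has no reason to keep low-variance genes that happen to mark cell types.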

Unsupervised integration in particular has been problematic in our hands, often leading to very poor cell type classification outcomes. This follows a similar logic -- unsupervised integration procedures have no constraint on cell type classification accuracy, so they may fail to capture information that is useful for cell type calls.

If I've understood this correctly, is it still possible to incorporate scNym into my current framework? I understand I need to convert my Seurat object to a scanpy-compatible (AnnData) format in order to even attempt it.

You understood perfectly. Yes, you'd simply provide scNym with the normalized gene expression counts (log1p(CPM)), then train a classifier as in our tutorials to transfer labels to your cells.
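For reference, log1p(CPM) means scaling each cell's counts so they sum to one million, then taking log(1 + x). A minimal pure-Python sketch of the transform for a single cell follows; in scanpy the same normalization is commonly done with `sc.pp.normalize_total(adata, target_sum=1e6)` followed by `sc.pp.log1p(adata)`. The example counts are hypothetical, and coming from Seurat, the input should be the un-integrated RNA counts rather than the integrated assay.

```python
import math


def log1p_cpm(counts):
    """Normalize one cell's raw counts to log(1 + counts per million)."""
    total = sum(counts)
    if total == 0:
        # An empty cell has no library size to scale by; return zeros.
        return [0.0] * len(counts)
    return [math.log1p(c * 1e6 / total) for c in counts]


# Hypothetical cell with counts for three genes.
cell = [3, 0, 1]
print([round(x, 3) for x in log1p_cpm(cell)])
```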

You could then use these cell type labels and display them in any embedding you found useful (e.g. Seurat CCA integration).
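One way to display scNym labels in a Seurat embedding is to export them as a CSV keyed by cell barcode. The sketch below uses only the standard library; the label column name `scNym`, the file name, and the example barcodes are hypothetical. In R, the file can be read with `read.csv("scnym_labels.csv", row.names = 1)` and attached with Seurat's `AddMetaData()`, after which `DimPlot(obj, reduction = "umap", group.by = "scNym")` colors that embedding by the transferred labels.

```python
import csv


def write_labels_csv(barcodes, labels, path, column="scNym"):
    """Write one row per cell: barcode, predicted label.

    `column` is a hypothetical name; pick whatever fits your metadata.
    Barcodes must match the cell names in the Seurat object exactly.
    """
    if len(barcodes) != len(labels):
        raise ValueError("barcodes and labels must have the same length")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["barcode", column])
        writer.writerows(zip(barcodes, labels))


# Hypothetical barcodes and predicted labels.
write_labels_csv(["AAACCTG-1", "AAACGGG-1"], ["T cell", "B cell"], "scnym_labels.csv")
```

If the Seurat object was built by merging samples, its cell names often carry a sample prefix, so it is worth checking the barcodes against `colnames(obj)` before joining.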

Finally, I was also wondering what kind of performance I might be able to expect from scNym on cell lines where the cells are not expected to really differentiate? In particular, I am seeking cluster annotation tools because manual annotation by comparing markers is proving too difficult.

Hmm, I'm a little unclear on what sort of cells you're looking at here. Most cell type annotation problems are encountered with primary cells and tissues where you have complex mixtures of cell types and states in a single experiment.

Could you provide a bit more information on your experimental setup? Happy to help further if I can.

All the best,
Jacob

timlai4 commented on May 24, 2024

Dear Jacob,

Thanks for your detailed explanation. I did have one follow-up question to your answer.

As you indicated, I should run scNym on the normalized count matrix, and scNym will annotate each individual cell appropriately. If I have two or more samples, on which I am running an integrated analysis in Seurat and have integrated clusters, do I pass in an integrated count matrix to scNym or pass in the count matrices separately for annotation?

Could you provide a bit more information on your experimental setup? Happy to help further if I can.

I am working with cells sourced from lots of different types of cell lines. It seems that in these cell lines, biologically, we don't expect much cell differentiation and from the literature, it seems it may not be possible to classify all the clusters as specific cell types. This isn't a very pressing question, since either way I intend to run scNym and validate the results.

timlai4 commented on May 24, 2024

Update:

I ended up picking the control samples individually and tried running scNym on those. I saw that all the cells had confidence scores of essentially 0. Doing a bit more research, I believe it's because the particular cancer cells my samples belong to are not part of the Cell Atlas yet. I believe at this point, in order to get scNym to work in my case, I will need to find an annotated single cell dataset, train scNym on that, and then do the prediction. If you have any feedback on this approach, especially on running scNym on an experiment with multiple samples, please let me know.
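Whatever reference is used, it can help to mask low-confidence calls explicitly rather than trust them. A minimal sketch, assuming the predicted labels and per-cell confidence scores have already been pulled out of `adata.obs` as plain lists (the exact column names depend on how scNym was run); the 0.5 threshold is an arbitrary illustration that would need tuning, not a value recommended by scNym.

```python
def mask_low_confidence(labels, confidences, threshold=0.5, unknown="Unknown"):
    """Replace predicted labels whose confidence is below `threshold`.

    The 0.5 default is an arbitrary illustration, not an scNym recommendation.
    """
    if len(labels) != len(confidences):
        raise ValueError("labels and confidences must have the same length")
    return [lab if conf >= threshold else unknown for lab, conf in zip(labels, confidences)]


# Hypothetical predictions: the two near-zero confidence calls get masked.
print(mask_low_confidence(["T cell", "B cell", "NK cell"], [0.92, 0.03, 0.01]))
# ['T cell', 'Unknown', 'Unknown']
```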

Thanks.

jacobkimmel commented on May 24, 2024

Hey Tim,

Apologies for the delay here.

do I pass in an integrated count matrix to scNym or pass in the count matrices separately for annotation?

You should use the unintegrated counts matrix (normalized as log1p(CPM), not Seurat's integrated values), providing the domain labels as a parameter to scNym's domain_groupby argument.
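A minimal sketch of assembling that input, assuming each sample arrives as its own cells-by-genes count table: the samples are stacked into one matrix over the genes they share, and a per-cell domain label records which sample each cell came from. With AnnData objects, `anndata.concat(..., join="inner", label="sample")` does the equivalent, and the resulting column name is what you would pass to `domain_groupby`. The data layout, sample names, and gene names below are hypothetical; the pooled matrix still needs log1p(CPM) normalization afterwards.

```python
def pool_samples(samples):
    """Stack per-sample count tables into one matrix with a domain label per cell.

    `samples` maps sample name -> (barcodes, genes, rows), where rows[i] holds
    the counts for barcodes[i] in the order given by `genes`. Only genes present
    in every sample are kept (an inner join).
    """
    gene_sets = [set(genes) for _, genes, _ in samples.values()]
    shared = sorted(set.intersection(*gene_sets))

    barcodes, domains, matrix = [], [], []
    for name, (cell_ids, genes, rows) in samples.items():
        col = {g: i for i, g in enumerate(genes)}
        for cell_id, row in zip(cell_ids, rows):
            # Prefix barcodes so identical barcodes from different samples stay unique.
            barcodes.append(f"{name}_{cell_id}")
            domains.append(name)
            matrix.append([row[col[g]] for g in shared])
    return barcodes, shared, matrix, domains


# Hypothetical samples with genes listed in different orders.
samples = {
    "ctrl": (["AAAC-1", "AAAG-1"], ["GAPDH", "MYC", "TP53"], [[5, 1, 0], [7, 0, 2]]),
    "treated": (["AAAC-1"], ["MYC", "GAPDH"], [[4, 9]]),
}
barcodes, genes, matrix, domains = pool_samples(samples)
print(genes)    # ['GAPDH', 'MYC']
print(domains)  # ['ctrl', 'ctrl', 'treated']
print(matrix)   # [[5, 1], [7, 0], [9, 4]]
```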

I am working with cells sourced from lots of different types of cell lines. It seems that in these cell lines, biologically, we don't expect much cell differentiation and from the literature, it seems it may not be possible to classify all the clusters as specific cell types. This isn't a very pressing question, since either way I intend to run scNym and validate the results.
Doing a bit more research, I believe it's because the particular cancer cells my samples belong to are not part of the Cell Atlas yet.

Yes, immortalized cell lines do not always have a canonical "cell identity" annotation. These lines tend to become fairly distinct from primary cells over time, often with highly irregular ploidy.

As you found, there aren't many reference datasets annotating cell line identity based on RNA profiles. Most experiments involving these lines are designed so that cell line identity can be determined from some orthogonal feature -- e.g. the sample index barcode, or genetic differences across lines.
I think scNym would only be useful for immortalized cell line experiments if (1) cell lines are pooled together with non-unique sample indices, and (2) cell lines are genetically similar, so that assigning identities based on variant calls is impractical.
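To make the "genetic differences" route concrete: given each line's known genotype at a few SNPs and a cell's reference/alternate read counts at those SNPs, the cell can be assigned to the line whose genotype best explains its reads. The sketch below scores lines with a simple binomial log-likelihood and a fixed error rate; it is a simplified illustration of the idea behind genotype-based demultiplexing, not any particular tool's model, and the SNP ids, genotypes, and error rate are hypothetical.

```python
import math


def assign_line(cell_reads, line_genotypes, error=0.01):
    """Return the cell line whose genotypes best explain a cell's allele reads.

    cell_reads: SNP id -> (ref_reads, alt_reads) observed in one cell.
    line_genotypes: line name -> {SNP id -> alt allele dosage 0, 1 or 2}.
    error: assumed per-read error rate (hypothetical value).
    """
    best_line, best_ll = None, -math.inf
    for line, genotype in line_genotypes.items():
        ll = 0.0
        for snp, (ref, alt) in cell_reads.items():
            if snp not in genotype:
                continue
            # Expected alt-read fraction: dosage 0 -> error, 1 -> 0.5, 2 -> 1 - error.
            p_alt = {0: error, 1: 0.5, 2: 1 - error}[genotype[snp]]
            # Binomial log-likelihood, dropping the constant binomial coefficient.
            ll += alt * math.log(p_alt) + ref * math.log(1 - p_alt)
        if ll > best_ll:
            best_line, best_ll = line, ll
    return best_line


lines = {
    "line_A": {"snp1": 0, "snp2": 2, "snp3": 1},
    "line_B": {"snp1": 2, "snp2": 0, "snp3": 1},
}
# Hypothetical cell: alt reads at snp1, ref reads at snp2, mixed at snp3.
print(assign_line({"snp1": (0, 4), "snp2": (3, 0), "snp3": (2, 2)}, lines))  # line_B
```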

I will need to find an annotated single cell dataset, train scNym on that, and then do the prediction. If you have any feedback on this approach, especially on running scNym on an experiment with multiple samples, please let me know.

Yes, to use scNym you would definitely need a reference dataset that contains your cell types of interest, but as noted above, scNym may not be the most helpful tool depending on your problem.
