Hi, I would be grateful for some guidance on the best approach for a

scPoli usage about scarches HOT 4 CLOSED

Nusob888 commented on June 19, 2024

scPoli usage

from scarches.

Comments (4)

moinfar commented on June 19, 2024

Hi,

Thanks for asking this interesting question.

I believe the current practice is to build a healthy atlas and map disease samples on top of it, as seen in HLCA, scArches, Supp Fig 6 of scPoli, and this paper from Marioni's lab. The reason is that when you integrate everything together, the cVAE will try to cancel out every variation between healthy and disease samples (as it does for batches). Please note that you can map disease on top of healthy in scPoli as well, which does not contradict what follows. In fact, we show in Supp Fig 6 of scPoli that detecting cancer cells (integrate healthy, then map cancer on top) performs much better when the healthy atlas is constructed using scPoli rather than scANVI.

However, while sample-level information is removed in most cVAE approaches, scPoli keeps this information in the covariate embeddings, and you are able to analyze them. Fig 6 (not supp) shows that the healthy/disease signature, as well as other sample-level variations, are present there, and you can make use of them. Please note that this claim could not be made if we trained scPoli on healthy and mapped disease on top since the train/mapping paradigms are different in scArches-based models.

Still, I am not sure which approach you should take for your data. In the first approach, you get a latent space in which healthy and disease are more separable, and you may use other tools to analyze the cell-type-level differences (e.g., Milo). In the second approach, however, the sample-level differences are captured in the sample embeddings, and you may use them, for example, to classify (as in Fig 4) and to analyze different variations across your samples.
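As a rough illustration of the second approach, the sketch below classifies a new sample from its sample embedding with a nearest-centroid rule. The classifier is an illustrative choice, not necessarily what was used for Fig 4, and the embeddings, labels, and function names (`fit_centroids`, `predict`) are hypothetical placeholders rather than scPoli output or scPoli API.

```python
import math
from collections import defaultdict


def fit_centroids(embeddings, labels):
    """Average the embedding vectors of each label into one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest (Euclidean distance) to `vec`."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Hypothetical 2-D sample embeddings, one row per sample (made up for illustration).
reference_embeddings = [[-1.0, 0.2], [-0.8, -0.1], [1.1, 0.0], [0.9, 0.3]]
reference_labels = ["healthy", "healthy", "disease", "disease"]

centroids = fit_centroids(reference_embeddings, reference_labels)
new_sample_label = predict(centroids, [0.9, 0.1])
```

With real data, the rows would be the learned sample embeddings, and a held-out split would be needed to judge whether the disease label is actually predictable from them.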

Nusob888 commented on June 19, 2024

Thanks for the response, I think this helps a lot!

So if I understand it correctly:

1. Healthy atlas + mapped disease states: this is potentially better with scPoli than with scArches, but it requires downstream analysis to characterise cells of high uncertainty in order to infer cell states/cell types. This produces a latent space that separates out dataset-specific cells that won't map with high confidence.

*Based on your response "Please note that this claim could not be made if we trained scPoli on healthy and mapped disease on top since the train/mapping paradigms are different in scArches-based models.", would using this method mean I cannot then perform the same PC analysis on the reference-mapped data as shown in Fig 6? This confused me a little, as the scArches documentation would suggest I can perform PC analysis on the reference-mapped embeddings?

2. Integrate everything (presumably with HVGs calculated with either sample-level or dataset-level batch labels?), and then use the PCs to identify dataset/disease-specific gene correlations, as you did in Fig 6. This may create an embedding that doesn't separate disease states well, but the sample embeddings should be able to deconvolute which genes drive the disease states that separate well in the PCs.

Based on your helpful insights, it seems option 2 would be the best overall strategy?

If I create an atlas of disease and healthy states, I can distinguish dataset/batch independent variance that accounts for disease states. I can also then presumably reference map more data on top and analyse it based on uncertainty that isn't accounted for in the existing disease datasets in the reference map, or re-examine the PCs.

Does that sound reasonable/sensible? I may have misinterpreted some bits, so any feedback would be much appreciated.
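The uncertainty-based follow-up on mapped data mentioned above could start from a simple summary like the one below: the fraction of mapped cells per predicted cell type whose uncertainty exceeds a cutoff. This is a minimal sketch. The labels and uncertainty values are made up, the function name `high_uncertainty_fraction` is hypothetical, and the 0.5 threshold is an arbitrary assumption that would need tuning on real data.

```python
from collections import Counter


def high_uncertainty_fraction(predicted_labels, uncertainties, threshold=0.5):
    """Per predicted label, the fraction of cells with uncertainty above `threshold`."""
    totals = Counter(predicted_labels)
    flagged = Counter(
        label for label, u in zip(predicted_labels, uncertainties) if u > threshold
    )
    return {label: flagged[label] / totals[label] for label in totals}


# Hypothetical per-cell predictions and uncertainties for mapped query cells.
predicted = ["T cell", "T cell", "B cell", "B cell", "B cell", "B cell"]
uncertainty = [0.1, 0.2, 0.9, 0.8, 0.3, 0.7]

fractions = high_uncertainty_fraction(predicted, uncertainty)
```

Cell types with a high flagged fraction are candidates for states that the reference does not cover well.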

cdedonno commented on June 19, 2024

Hi @Nusob888, this is an interesting question and I do not think there is a ready-made solution to it. I also think you might get better feedback from people who have done atlas building; we worked mostly on the development of the method.

I think it might be worth trying both approaches. By integrating all samples at once you get a joint sample embedding space, in which you might associate large-scale gene expression changes with disease. The caveat is that this is doable only if you find a strong association between the sample latent space and the disease covariate. It could be that technical effects are the main driver of variation in your data, which would make this type of analysis more difficult.
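One way to check that caveat is to compare how much of the variance along each sample-embedding dimension is explained by the disease label versus a technical covariate. The sketch below uses eta squared (between-group sum of squares over total sum of squares) as the measure. It is an illustrative choice rather than the analysis from the paper, and the embeddings, covariates, and function names are hypothetical.

```python
from collections import defaultdict


def variance_explained(values, groups):
    """Eta squared: share of the variance in `values` explained by `groups`."""
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    between_ss = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    return between_ss / total_ss


def association_table(embeddings, covariates):
    """For each covariate, eta squared along every embedding dimension."""
    n_dims = len(embeddings[0])
    return {
        name: [
            variance_explained([row[d] for row in embeddings], labels)
            for d in range(n_dims)
        ]
        for name, labels in covariates.items()
    }


# Hypothetical 2-D sample embeddings (one row per sample), made up for illustration:
# dimension 0 tracks disease status, dimension 1 tracks the dataset of origin.
embeddings = [
    [-1.0, 0.9], [-1.2, -1.0], [-0.9, 1.1], [-1.1, -0.8],
    [1.0, 1.0], [1.1, -0.9], [0.9, 0.8], [1.2, -1.1],
]
disease = ["healthy"] * 4 + ["disease"] * 4
dataset = ["A", "B", "A", "B", "A", "B", "A", "B"]

table = association_table(embeddings, {"disease": disease, "dataset": dataset})
```

If the technical covariate explains most of the variance along the leading dimensions and disease explains little, the sample embeddings are unlikely to support a disease-focused analysis without further work.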

Integrating only healthy samples and then mapping disease samples on top has been done in other atlases, like the one you mentioned, so it should work. I also think integrating all at once would probably not get rid of all the variation that is explained by disease, but it might eat up some of that variance.

Hope this helps.

cdedonno commented on June 19, 2024

Closing now, feel free to reopen.
