
scPoli usage (scarches issue, closed)

Nusob888 commented on June 19, 2024
scPoli usage

from scarches.

Comments (4)

moinfar commented on June 19, 2024

Hi,

Thanks for asking this interesting question.

I believe the current practice is to build a healthy atlas and map disease samples on top, as seen in HLCA, scArches, Supp Fig 6 of scPoli, and this paper from Marioni's lab. The reason is that when you integrate all samples together, the cVAE will try to cancel out every variation between healthy and disease samples (as it does for batches). Please note that you can also map disease on top of healthy in scPoli, which does not contradict what follows. In fact, we show in Supp Fig 6 of scPoli that detecting cancer cells (integrate healthy, then map cancer on top) performs much better when the healthy atlas is constructed with scPoli than with SCANVI.

However, while most cVAE approaches remove sample-level information, scPoli keeps this information in the covariate embeddings, and you are able to analyze them. Fig 6 (not supp) shows that the healthy/disease signature, as well as other sample-level variation, is retained there, and you can make use of it. Please note that this claim could not be made if we trained scPoli on healthy samples and mapped disease on top, since the training and mapping paradigms are different in scArches-based models.
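The sample-level analysis described above (PCA on the covariate embeddings, as in Fig 6) can be sketched with plain NumPy. This is a minimal sketch that assumes you have already exported one embedding vector per sample from a trained scPoli model into an `(n_samples, n_dims)` array; how you extract that array depends on your scArches version, and the helper names below are hypothetical, not part of scArches. The toy data is synthetic.

```python
import numpy as np


def sample_embedding_pca(embeddings, n_components=2):
    """PCA of a (n_samples, n_dims) array of per-sample embeddings via SVD.

    Returns the PC scores (n_samples, n_components) and the fraction of
    total variance explained by each of the returned components.
    """
    X = np.asarray(embeddings, dtype=float)
    Xc = X - X.mean(axis=0)  # center every embedding dimension
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # scores = U * singular values
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


def pc_label_correlation(scores, is_disease):
    """Pearson correlation of each PC score with a 0/1 disease indicator."""
    y = np.asarray(is_disease, dtype=float)
    return np.array(
        [np.corrcoef(scores[:, k], y)[0, 1] for k in range(scores.shape[1])]
    )


# Hypothetical toy data: 40 samples with 5-dim embeddings, where disease
# status shifts the first embedding dimension.
rng = np.random.default_rng(0)
is_disease = np.array([0] * 20 + [1] * 20)
emb = rng.normal(size=(40, 5))
emb[:, 0] += 6.0 * is_disease

scores, explained = sample_embedding_pca(emb, n_components=2)
print("variance explained:", explained)
print("correlation with disease:", pc_label_correlation(scores, is_disease))
```

The sign of each PC is arbitrary, so look at the magnitude of the correlation rather than its sign.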

Still, I am not sure which approach you should take for your data. With the first approach, you get a latent space in which healthy and disease cells are more separable, and you may use other tools to analyze cell-type-level differences (e.g., Milo). With the second approach, you get sample-level differences in the sample embeddings, and you may use them, for example, to classify samples (as in Fig 4) and to analyze different sources of variation across your samples.
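As a hedged illustration of using sample embeddings to classify samples, the sketch below estimates how well disease status can be predicted from per-sample embeddings with a leave-one-out nearest-centroid classifier. The classifier is a deliberately simple stand-in chosen for illustration; it is not necessarily what was used for Fig 4. The input is assumed to hold one embedding per sample, and the function name is hypothetical.

```python
import numpy as np


def loo_nearest_centroid_accuracy(embeddings, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    embeddings: (n_samples, n_dims) array, one embedding per sample
    labels:     length-n_samples sequence of class labels (e.g. disease status)
    """
    X = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    classes, counts = np.unique(y, return_counts=True)
    if counts.min() < 2:
        raise ValueError("need at least two samples per class for leave-one-out")
    correct = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i  # hold out sample i
        centroids = np.stack([X[train & (y == c)].mean(axis=0) for c in classes])
        dists = np.linalg.norm(centroids - X[i], axis=1)
        correct += int(classes[np.argmin(dists)] == y[i])
    return correct / len(y)


# Toy example: two well-separated groups of samples.
emb = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
status = ["healthy", "healthy", "disease", "disease"]
print(loo_nearest_centroid_accuracy(emb, status))  # 1.0
```

With real data, comparing this accuracy against the same score computed with shuffled labels gives a rough sense of whether the embedding carries disease signal at all.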


Nusob888 commented on June 19, 2024

Thanks for the response, I think this helps a lot!

So if I understand it correctly:

  1. Healthy atlas + mapped disease states: potentially better with scPoli than with scArches, but this requires downstream analysis of high-uncertainty cells to infer cell states/cell types. It produces a latent space that separates out dataset-specific cells, which won't map with high confidence.

     Based on your response ("Please note that this claim could not be made if we trained scPoli on healthy and mapped disease on top since the train/mapping paradigms are different in scArches-based models."), does this mean that this approach would not let me perform the same PC analysis on the reference-mapped data as shown in Fig 6? This confused me a little, as the scArches documentation seems to suggest that I can perform PC analysis on the reference-mapped embeddings.

  2. Integrate everything (presumably with HVGs calculated using either sample-level or dataset-level batch labels?), and then use the PCs to identify dataset/disease-specific gene correlations, as you did in Fig 6. This may create an embedding that doesn't separate disease states well, but the sample embeddings should make it possible to deconvolve which genes drive the disease states that separate well in the PCs.
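Option 1 relies on inspecting query cells that map with high uncertainty. scPoli transfers labels using cell-type prototypes in the latent space; the sketch below is a simplified stand-in for that idea, not the scArches implementation. Each cell takes the label of its nearest prototype, the distance to that prototype serves as the uncertainty, and cells beyond a hypothetical threshold are flagged for downstream inspection. The latent coordinates, prototype matrix, threshold, and function name are all assumptions, and scPoli's actual uncertainty score may be computed and scaled differently.

```python
import numpy as np


def label_by_prototype(latent, prototypes, prototype_labels, threshold):
    """Assign each cell the label of its nearest prototype and flag distant cells.

    latent:           (n_cells, n_dims) query-cell coordinates in the reference latent space
    prototypes:       (n_types, n_dims) one prototype per reference cell type
    prototype_labels: sequence of n_types cell-type names
    threshold:        hypothetical distance cutoff above which a cell is flagged
    """
    Z = np.asarray(latent, dtype=float)
    P = np.asarray(prototypes, dtype=float)
    # Pairwise Euclidean distances, shape (n_cells, n_types)
    d = np.linalg.norm(Z[:, None, :] - P[None, :, :], axis=2)
    nearest = np.argmin(d, axis=1)
    uncertainty = d[np.arange(len(Z)), nearest]  # distance to the closest prototype
    labels = np.asarray(prototype_labels, dtype=object)[nearest]
    flagged = uncertainty > threshold
    return labels, uncertainty, flagged


# Toy example: two prototypes; the third query cell is far from both.
prototypes = [[0.0, 0.0], [10.0, 0.0]]
names = ["T cell", "B cell"]
query = [[0.5, 0.0], [9.0, 0.0], [4.0, 8.0]]
labels, unc, flagged = label_by_prototype(query, prototypes, names, threshold=2.0)
print(list(labels))      # ['T cell', 'B cell', 'T cell']
print(flagged.tolist())  # [False, False, True]
```

Flagged cells are candidates for disease-specific states that are absent from the healthy reference, which is the downstream characterisation step described in option 1.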

Based on your helpful insights, it seems option 2 would be the best overall strategy?

If I create an atlas of disease and healthy states, I can distinguish dataset/batch-independent variance attributable to disease states. Presumably I can also then reference-map more data on top and analyse it based on the uncertainty that isn't accounted for by the existing disease datasets in the reference, or re-examine the PCs.

Does that sound reasonable/sensible? I may have misinterpreted some bits, so any feedback would be much appreciated.


cdedonno commented on June 19, 2024

Hi @Nusob888, this is an interesting question and I do not think there is a ready-made solution to it. I also think you might get better feedback from people who have done atlas building; we worked mostly on the development of the method.

I think it might be worth trying both approaches: by integrating all samples at once you can have a joint sample embedding space, in which you might associate large scale gene expression changes to disease. The caveat is that this is doable only if you find strong association between the sample latent space and the disease covariate. It could be that technical effects are the main driver of variation in your data, which would make this type of analysis more difficult.
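One way to check the caveat above, namely whether the sample latent space is associated more with disease than with technical covariates, is to compute, for each sample-embedding dimension or PC, the fraction of variance explained by each covariate. The sketch below uses the one-way ANOVA effect size (eta squared); this is one reasonable choice of statistic, not one prescribed by scPoli, and the function name, covariate names, and values are hypothetical.

```python
import numpy as np


def eta_squared(values, groups):
    """Fraction of variance in `values` explained by a categorical covariate.

    One-way ANOVA effect size: between-group sum of squares divided by the
    total sum of squares (0 = no association, 1 = fully explained).
    """
    v = np.asarray(values, dtype=float)
    g = np.asarray(groups)
    grand_mean = v.mean()
    ss_total = np.sum((v - grand_mean) ** 2)
    if ss_total == 0:
        raise ValueError("values are constant; eta squared is undefined")
    ss_between = sum(
        np.sum(g == level) * (v[g == level].mean() - grand_mean) ** 2
        for level in np.unique(g)
    )
    return float(ss_between / ss_total)


# Hypothetical per-sample scores on one PC, with two candidate covariates.
pc1 = [1.0, 1.0, 3.0, 3.0]
disease = ["healthy", "healthy", "disease", "disease"]
study = ["study_A", "study_B", "study_A", "study_B"]
print("disease:", eta_squared(pc1, disease))  # 1.0
print("study:", eta_squared(pc1, study))      # 0.0
```

If the study or batch covariate explains most of the variance on the leading PCs while disease explains little, that points to the situation described above, where technical effects dominate and disease-level analysis of the sample embeddings becomes harder.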

Integrating only healthy samples and then mapping disease on top has been done in other atlases, like the one you mentioned, so it should work. I also think that integrating all at once would probably not get rid of all the variation explained by disease, but it might eat up some of that variance.

Hope this helps.


cdedonno commented on June 19, 2024

Closing now, feel free to reopen.
