Comments (4)
Hi,
Thanks for asking this interesting question.
I believe the current practice is to build a healthy atlas and map disease samples on top, as seen in HLCA, scArches, Supp Fig 6 of scPoli, and this paper from Marioni's lab. The reason is that when you integrate everything together, the cVAE will try to cancel out every variation between healthy and disease samples (as it does for batches). Please note that you can map disease on top of healthy in scPoli as well, which does not contradict what follows. In fact, we show in Supp Fig 6 of scPoli that detecting cancer cells (integrate healthy, then map cancer on top) performs much better when the healthy atlas is constructed using scPoli rather than scANVI.
However, while most cVAE approaches remove sample-level information, scPoli keeps this information in the covariate embeddings, where you can analyze it. Fig 6 (not supp) shows that the healthy/disease signature, as well as other sample-level variations, is present there, and you can make use of it. Please note that this claim could not be made if we trained scPoli on healthy and mapped disease on top, since the train/mapping paradigms are different in scArches-based models.
Still, I am not sure which approach you should take for your data. In the first approach, healthy and disease are more separable in the latent space, and you may use other tools to analyze the cell-type-level differences (e.g., Milo). In the second approach, the sample-level differences are captured in the sample embeddings, and you may use them, for example, to classify (as in Fig 4) and to analyze different variations across your samples.
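To make the first approach a bit more concrete, here is a minimal sketch of flagging query (disease) cells that sit far from a healthy reference in latent space. It is not the scPoli or scArches uncertainty implementation: the arrays are synthetic stand-ins for latent embeddings, and `knn_distance_score`, the choice of `k`, and the 99% calibration threshold are all assumptions made for illustration.

```python
import numpy as np


def knn_distance_score(reference_latent, query_latent, k=15):
    """Mean Euclidean distance from each query cell to its k nearest reference cells."""
    # Pairwise distances, shape (n_query, n_reference)
    diffs = query_latent[:, None, :] - reference_latent[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep the k smallest distances per query cell (order within the k is irrelevant)
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)


rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): in practice these would be latent
# representations of reference and query cells from a trained model.
reference_latent = rng.normal(0.0, 1.0, size=(500, 10))  # healthy atlas
calibration_latent = rng.normal(0.0, 1.0, size=(100, 10))  # held-out healthy cells
query_like_reference = rng.normal(0.0, 1.0, size=(50, 10))  # states seen in the atlas
query_shifted = rng.normal(6.0, 1.0, size=(50, 10))  # hypothetical disease-specific state
query_latent = np.vstack([query_like_reference, query_shifted])

# Calibrate a threshold on held-out healthy cells, then flag query cells above it
calibration_scores = knn_distance_score(reference_latent, calibration_latent)
threshold = np.quantile(calibration_scores, 0.99)

scores = knn_distance_score(reference_latent, query_latent)
flagged = scores > threshold

print("flagged among reference-like query cells:", int(flagged[:50].sum()))
print("flagged among shifted query cells:", int(flagged[50:].sum()))
```

The flagged cells are then the candidates for downstream analysis. A distance-based score like this is only one possible choice, and real data will rarely separate as cleanly as the synthetic example.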
from scarches.
Thanks for the response, I think this helps a lot!
So if I understand it correctly:
- Healthy atlas + mapped disease states is potentially better using scPoli than scArches, but requires downstream analysis to characterise cells of high uncertainty in order to infer cell states/cell types. This gives a latent space that separates out dataset-specific cells that won't map with high confidence.
*Based on your response "Please note that this claim could not be made if we trained scPoli on healthy and mapped disease on top since the train/mapping paradigms are different in scArches-based models.", would using this method not allow me to perform the same PC analysis on the reference-mapped data as shown in Fig 6? This confused me a little, as the scArches documentation suggests I can perform PC analysis on the reference-mapped embeddings.
- Integrate everything (presumably with HVGs calculated with either sample-level or dataset-level batch labels?), and then use the PCs to identify dataset/disease-specific gene correlations, as you did in Fig 6. This may create an embedding that doesn't separate disease states well, but the sample embeddings should be able to deconvolute which genes drive the disease states that separate well in the PCs.
Based on your helpful insights, it seems option 2 would be the best overall strategy?
If I create an atlas of disease and healthy states, I can distinguish dataset/batch-independent variance that accounts for disease states. I can also then presumably reference-map more data on top and analyse it based on uncertainty that isn't accounted for by the existing disease datasets in the reference, or re-examine the PCs.
Does that sound reasonable/sensible? I may have misinterpreted some bits, so any feedback would be much appreciated.
Hi @Nusob888, this is an interesting question and I do not think there is a ready-made solution to it. I also think you might get better feedback from people who have done atlas building; we worked mostly on the development of the method.
I think it might be worth trying both approaches. By integrating all samples at once you get a joint sample embedding space, in which you might associate large-scale gene expression changes with disease. The caveat is that this is doable only if you find a strong association between the sample latent space and the disease covariate. It could be that technical effects are the main driver of variation in your data, which would make this type of analysis more difficult.
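One way to check that caveat is to run a PCA on the sample embeddings and see whether any component tracks the disease label. The sketch below uses synthetic embeddings rather than ones exported from a trained model, so `sample_emb`, `disease`, and the size of the simulated disease shift are all assumptions for illustration. The Pearson correlation with a binary label is used here as a simple association measure, not as the analysis from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): one embedding vector per sample and a
# binary disease label per sample (0 = healthy, 1 = disease).
n_samples, emb_dim = 40, 5
disease = np.array([0] * 20 + [1] * 20)
sample_emb = rng.normal(0.0, 1.0, size=(n_samples, emb_dim))
sample_emb[:, 0] += 6.0 * disease  # simulated disease effect on one dimension

# PCA via SVD of the centered embedding matrix
centered = sample_emb - sample_emb.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
pcs = u * s  # sample coordinates on each principal component
explained = s ** 2 / (s ** 2).sum()  # fraction of variance per component

# Association between each PC and the disease label
corr = np.array(
    [np.corrcoef(pcs[:, i], disease)[0, 1] for i in range(pcs.shape[1])]
)
best_pc = int(np.argmax(np.abs(corr)))

print("explained variance ratio:", np.round(explained, 3))
print("correlation with disease per PC:", np.round(corr, 3))
print("PC most associated with disease (0-based):", best_pc)
```

If the leading components correlate with technical covariates (dataset, protocol) instead of disease, that is a sign the sample embedding space is dominated by technical effects.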
Integrating only healthy samples and then mapping disease on top has been done in other atlases, like the one you mentioned, so it should work. I also think integrating everything at once would probably not get rid of all the variation that is explained by disease, but it might eat up some of that variance.
Hope this helps.
Closing now, feel free to reopen.