Hi Chris,
Could you please paste the architecture details of the model here? So you have a couple of datasets from standard single-cell RNA-seq and one from snRNA-seq? What do you use as the batch key? The datasets?
from scarches.
Thanks for the quick reply. This is how I built the reference:
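import scanpy as sc
import scarches as sca

# adata: merged AnnData with one 'author' label per cell in .obs (loaded beforehand)
condition_key = "author"
adata.obs['author'] = adata.obs['author'].astype('category')
# I normalized the data previously, so I set logtrans_input = False
adata = sca.data.normalize_hvg(adata, batch_key=condition_key, n_top_genes=2000, logtrans_input=False)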
# Output:
# Using 2 HVGs from full intersect set
# Using 8 HVGs from n_batch-1 set
# Using 52 HVGs from n_batch-2 set
# Using 92 HVGs from n_batch-3 set
# Using 141 HVGs from n_batch-4 set
# Using 174 HVGs from n_batch-5 set
# Using 221 HVGs from n_batch-6 set
# Using 287 HVGs from n_batch-7 set
# Using 421 HVGs from n_batch-8 set
# Using 602 HVGs from n_batch-9 set
# Using 2000 HVGs
network = sca.models.scArches(task_name='atlas',
                              x_dimension=adata.shape[1],
                              z_dimension=10,
                              architecture=[128, 128],
                              gene_names=adata.var_names.tolist(),
                              conditions=adata.obs[condition_key].unique().tolist(),
                              alpha=0.001,
                              loss_fn='nb',
                              model_path="./models/scArches/",
                              )
network.train(adata,
              condition_key=condition_key,
              n_epochs=100,
              batch_size=128,
              save=True,
              retrain=True)
latent_adata = network.get_latent(adata, condition_key)
sc.pp.neighbors(latent_adata)
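The log printed by normalize_hvg reads like a batch-aware, tiered selection: genes that are highly variable in every batch come first ("full intersect"), then genes that are highly variable in all but one batch ("n_batch-1"), and so on until 2000 genes are reached (2 + 8 + 52 + 92 + 141 + 174 + 221 + 287 + 421 + 602 = 2000). Read that way, only 2 genes are highly variable in all 'author' batches. The following is a minimal pure-Python sketch of such a greedy scheme, assuming the per-batch HVG sets are already computed; it is not scArches' actual code, and the alphabetical tie-break inside the last tier is a placeholder assumption.

```python
from collections import Counter


def select_batch_aware_hvgs(hvgs_per_batch, n_top_genes):
    """Greedy tiered HVG selection (illustrative sketch, not scArches' code).

    hvgs_per_batch: list of sets, one set of HVG names per batch.
    Returns (selected_genes, tiers), where tiers lists (label, n_taken).
    """
    n_batch = len(hvgs_per_batch)
    # Number of batches in which each gene is flagged as highly variable
    counts = Counter(g for batch in hvgs_per_batch for g in set(batch))
    selected, tiers = [], []
    for k in range(n_batch, 0, -1):
        remaining = n_top_genes - len(selected)
        if remaining <= 0:
            break
        # Genes that are HVG in exactly k batches; sorted by name as a
        # placeholder tie-break (assumption, real tools rank by dispersion)
        tier = sorted(g for g, c in counts.items() if c == k)
        take = tier[:remaining]
        label = "full intersect" if k == n_batch else f"n_batch-{n_batch - k}"
        tiers.append((label, len(take)))
        selected.extend(take)
    return selected, tiers


if __name__ == "__main__":
    batches = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "E"}]
    genes, tiers = select_batch_aware_hvgs(batches, n_top_genes=3)
    for label, n in tiers:
        print(f"Using {n} HVGs from {label} set")
    print(f"Using {len(genes)} HVGs")
```

A very small "full intersect" tier is a quick hint that the batches share few variable genes, which is relevant to the gene-set discussion further down the thread.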
With a lower alpha and more epochs I get more detailed subclustering, but the single-nucleus dataset always stands out on its own. So far only one dataset was generated from nuclei, but later I will include others produced with the same technique, and I want a reference that can be used to map either single-cell or single-nucleus query experiments.
I would suggest making the model deeper ([128, 128, 128]) and making the batch size smaller: 32, or at most 64. I guess the 'nb' loss does not fit nuclei experiments; try the 'zinb' loss and see if it works.
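One plausible reason 'zinb' could help, assuming the nuclei data contain noticeably more zero counts than the cell data: a zero-inflated negative binomial adds a probability pi of an extra zero on top of the NB distribution. The sketch below uses the common mean/inverse-dispersion parameterization (mu, theta) found in DCA-style count autoencoders; scArches' internal loss may be parameterized differently, and real models predict mu, theta and pi per gene and cell rather than using the scalar values here.

```python
import math


def nb_log_prob(x, mu, theta):
    """Log-probability of count x under a negative binomial with mean mu > 0
    and inverse dispersion theta > 0 (variance = mu + mu**2 / theta)."""
    return (
        math.lgamma(x + theta)
        - math.lgamma(theta)
        - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_prob(x, mu, theta, pi):
    """Zero-inflated NB with 0 <= pi < 1: with probability pi the count is an
    extra zero, otherwise it is drawn from NB(mu, theta)."""
    if x == 0:
        return math.log(pi + (1 - pi) * math.exp(nb_log_prob(0, mu, theta)))
    return math.log(1 - pi) + nb_log_prob(x, mu, theta)


if __name__ == "__main__":
    # Toy counts with many zeros (made-up values, not from the thread)
    counts = [0] * 6 + [3, 5]
    nb_total = sum(nb_log_prob(x, 2.0, 1.0) for x in counts)
    zinb_total = sum(zinb_log_prob(x, 2.0, 1.0, 0.5) for x in counts)
    print("NB log-likelihood:  ", round(nb_total, 3))
    print("ZINB log-likelihood:", round(zinb_total, 3))
```

On zero-heavy toy counts like these, the ZINB model assigns a higher total log-likelihood than the NB model with the same mu and theta; whether that carries over to the real nuclei data is exactly what the suggested experiment would test.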
Do all your experiments have the same genes? It is necessary to have the exact same gene set.
Finally, if that does not work, try switching to the 'sse' or 'mse' loss; this would probably solve the problem.
Let me know how it goes, please.
Mo
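The suggestions above amount to a small grid of runs. A minimal sketch that enumerates them, assuming you want every combination: the dictionary keys mirror the keyword arguments already used in the script earlier in the thread (architecture and loss_fn for sca.models.scArches, batch_size for network.train), while the per-run model_path naming scheme is a hypothetical choice so that saved models do not overwrite each other.

```python
from itertools import product

# Values taken from the suggestions; 'nb' is kept as the baseline
ARCHITECTURES = [[128, 128], [128, 128, 128]]
BATCH_SIZES = [32, 64]
LOSSES = ["zinb", "nb", "sse", "mse"]


def candidate_configs():
    """Yield one settings dict per (architecture, batch size, loss) combination."""
    for arch, batch_size, loss in product(ARCHITECTURES, BATCH_SIZES, LOSSES):
        arch_tag = "x".join(map(str, arch))
        yield {
            "architecture": list(arch),
            "batch_size": batch_size,
            "loss_fn": loss,
            # Hypothetical per-run directory so each trained model is kept
            "model_path": f"./models/scArches/{loss}_bs{batch_size}_{arch_tag}/",
        }


if __name__ == "__main__":
    for cfg in candidate_configs():
        print(cfg)
```

Each dict could then be plugged into the original script, e.g. architecture=cfg["architecture"], loss_fn=cfg["loss_fn"], model_path=cfg["model_path"] in the constructor and batch_size=cfg["batch_size"] in network.train, and the resulting latent spaces compared for how strongly the nuclei dataset separates.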
Thanks for the suggestions; I will test them. When you said the experiments must have the exact same gene set, did you mean the intersection of genes shared by all datasets (I have seen this approach in scIB: https://github.com/theislab/scib/blob/master/notebooks/data_preprocessing/pancreas/01_collect_human_pancreas_studies.ipynb), or could it also be the union of genes across all studies? I merged the matrices using Seurat, so initially not all genes were present in all samples; after merging, all samples have the same features, and the genes that were not shared at the beginning are filled with zeros.
I have been using the latter, since the reference I am building is for a tumor type (where I expect higher heterogeneity than in normal tissue), and if I intersect the features across all studies I end up with only ~5K common genes. What would you recommend in this case?
Yes, I meant the intersection of genes. It is worth giving it a try, since there may be many genes that are zero in the nuclei data but not in the other datasets, and vice versa, so there would not be a truly shared feature set. Even after HVG selection, some genes may be highly variable only in the nuclei dataset and not in the others. Alternatively, increase the number of HVGs to 5k or so to improve the chance of overlap.
Therefore, I would suggest:
- increasing the HVG set to 5000 or so and checking again with different loss functions;
- subsetting to the gene intersection when you concatenate the datasets, then rerunning the models and comparing.
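A toy illustration of the difference between intersecting genes and zero-filling their union, using pandas DataFrames as a stand-in for the per-study cells x genes count matrices (gene names and counts are made up). With AnnData objects, anndata.concat(..., join="inner") plays the role of the inner join; the fillna(0) on the outer join mimics the zero-filled Seurat merge described above.

```python
import pandas as pd

# Hypothetical toy count tables (cells x genes)
cells_a = pd.DataFrame({"GENE1": [3, 2], "GENE2": [0, 1], "GENE3": [1, 0]}, index=["a1", "a2"])
cells_b = pd.DataFrame({"GENE2": [1, 0], "GENE3": [4, 2]}, index=["b1", "b2"])
nuclei = pd.DataFrame({"GENE3": [5, 1], "GENE4": [2, 1]}, index=["n1", "n2"])
datasets = {"cells_a": cells_a, "cells_b": cells_b, "nuclei": nuclei}


def merge(datasets, how):
    """Stack datasets by cells. how="inner" keeps only genes shared by all
    datasets; how="outer" keeps every gene and fills the gaps with 0."""
    merged = pd.concat(list(datasets.values()), join=how)
    return merged.fillna(0)


def zero_filled_genes(datasets):
    """For each dataset, the genes that exist only because of the union
    and would be zero-filled for every cell of that dataset."""
    all_genes = set().union(*(df.columns for df in datasets.values()))
    return {name: sorted(all_genes - set(df.columns)) for name, df in datasets.items()}


if __name__ == "__main__":
    print(merge(datasets, "inner"))
    print(merge(datasets, "outer"))
    print(zero_filled_genes(datasets))
```

In the zero-filled union, GENE1 and GENE2 look unexpressed in every nucleus only because they were absent from that study's matrix, which is the kind of dataset-specific pattern a model can pick up as a batch signal; the intersection avoids it at the cost of keeping fewer genes.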