
Comments (5)

M0hammadL commented on May 23, 2024

Hi Chris,

Could you please paste the architecture details of the model here? So you have a couple of datasets from regular RNA-seq and one from snRNA-seq? What do you use as the batch key? The datasets?


ccruizm commented on May 23, 2024

Thanks for the quick reply. This is how I built the reference:

import scanpy as sc
import scarches as sca

condition_key = "author"
adata.obs['author'] = adata.obs['author'].astype('category')

# The data were already normalized and log-transformed upstream, so logtrans_input=False
adata = sca.data.normalize_hvg(adata, batch_key=condition_key,
                               n_top_genes=2000, logtrans_input=False)

Using 2 HVGs from full intersect set
Using 8 HVGs from n_batch-1 set
Using 52 HVGs from n_batch-2 set
Using 92 HVGs from n_batch-3 set
Using 141 HVGs from n_batch-4 set
Using 174 HVGs from n_batch-5 set
Using 221 HVGs from n_batch-6 set
Using 287 HVGs from n_batch-7 set
Using 421 HVGs from n_batch-8 set
Using 602 HVGs from n_batch-9 set
Using 2000 HVGs

network = sca.models.scArches(task_name='atlas',
                              x_dimension=adata.shape[1],
                              z_dimension=10,
                              architecture=[128, 128],
                              gene_names=adata.var_names.tolist(),
                              conditions=adata.obs[condition_key].unique().tolist(),
                              alpha=0.001,
                              loss_fn='nb',
                              model_path="./models/scArches/",
                              )

network.train(adata,
              condition_key=condition_key,
              n_epochs=100,
              batch_size=128,
              save=True,
              retrain=True)

# Project the data into the trained latent space and build a neighbor graph on it
latent_adata = network.get_latent(adata, condition_key)
sc.pp.neighbors(latent_adata)

With a lower alpha and more epochs I get more detailed sub-clustering, but the nuclei-derived dataset always stands out. So far only one dataset was processed from nuclei, but later on I will include others generated with the same technique, and I want a reference that can be queried with either cell or nuclei experiments.
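For reference, this is roughly how I judge the mixing: just the standard scanpy embedding on the latent space from above (a sketch, nothing scArches-specific):

# UMAP on the latent neighbor graph, colored by batch to see how the nuclei data mix
sc.tl.umap(latent_adata)
sc.pl.umap(latent_adata, color=condition_key)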


M0hammadL commented on May 23, 2024

I would suggest making the model deeper ([128, 128, 128]) and also making the batch size smaller: 32, or 64 at most. I guess the nb loss does not fit the nuclei experiments; try the zinb loss and see if it works.
Do all your experiments have the same genes? It is necessary to have the exact same gene set.

Finally, if that does not work, try switching to the sse or mse loss; that would probably solve the problem.
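Applied to your constructor and training call from above, the changes would look roughly like this (just a sketch, same arguments as your snippet with only the architecture, batch size, and loss swapped):

network = sca.models.scArches(task_name='atlas',
                              x_dimension=adata.shape[1],
                              z_dimension=10,
                              architecture=[128, 128, 128],  # deeper model
                              gene_names=adata.var_names.tolist(),
                              conditions=adata.obs[condition_key].unique().tolist(),
                              alpha=0.001,
                              loss_fn='zinb',                # or 'sse' / 'mse' if zinb does not help
                              model_path="./models/scArches/",
                              )

network.train(adata,
              condition_key=condition_key,
              n_epochs=100,
              batch_size=32,                 # smaller batches, 32 or at most 64
              save=True,
              retrain=True)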

let me know how it goes, please.

Mo


ccruizm commented on May 23, 2024

Thanks for the suggestions, I will test them. When you say the experiments must have the exact same gene set, do you mean the genes intersected/shared across all datasets (I have seen this approach used with scIB: https://github.com/theislab/scib/blob/master/notebooks/data_preprocessing/pancreas/01_collect_human_pancreas_studies.ipynb), or can it be the union of genes across all studies? I merged the matrices with Seurat, so not all genes are present in every sample initially, but after merging they all have the same features, and the genes that were not shared to begin with are filled with zeros.

I have been using the latter, since the reference I am building is for a tumor type (so I expect more heterogeneity than in normal tissue), and intersecting the features across all studies leaves me with only ~5K common genes. What would your recommendation be in this case?
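As a quick sanity check on the merged object, this is roughly how I count, per study, how many genes are actually detected versus just zero-filled (plain numpy on the merged AnnData; a sketch, nothing scArches-specific):

import numpy as np

# For each study, count genes that have at least one non-zero value
for batch in adata.obs['author'].cat.categories:
    sub = adata[adata.obs['author'] == batch]
    detected = np.asarray((sub.X != 0).sum(axis=0)).ravel() > 0
    print(f"{batch}: {detected.sum()} of {adata.n_vars} genes detected")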


M0hammadL commented on May 23, 2024

Yes, I meant the intersection of genes. It would be worth giving it a chance, since there may be many genes that are zero in the nuclei data but not in the other datasets (and vice versa), so there is effectively no shared feature set. And even after HVG selection, some genes may be highly variable only in the nuclei dataset and not in the others. Alternatively, increase the number of HVGs to ~5k or so to improve the chances.

Therefore I would suggest:

  • increase the HVG set to 5000 or so and check again with the different loss functions

  • try subsetting to the intersection of genes when you concatenate the datasets, then rerun the models and see (a rough sketch of both is below)
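A rough sketch of both points, assuming the per-study AnnData objects sit in a list (here called adatas, which is hypothetical; plain anndata calls plus the same normalize_hvg as above):

import anndata as ad
import scarches as sca
from functools import reduce

# 1) Keep only the genes shared by every study before concatenating
#    (`adatas` is a hypothetical list with one AnnData per study)
common_genes = sorted(reduce(set.intersection,
                             (set(a.var_names) for a in adatas)))
adata = ad.concat([a[:, common_genes].copy() for a in adatas])
# assumes each study already carries its 'author' batch label in .obs

# 2) Use a larger HVG set (e.g. 5000) and retry the different loss functions
adata = sca.data.normalize_hvg(adata, batch_key='author',
                               n_top_genes=5000, logtrans_input=False)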

