
Comments (5)

M0hammadL commented on May 23, 2024

Hi Chris,

Could you please paste the architecture details of the model here? So you have a couple of datasets from regular RNA-seq and one from snRNA-seq? What do you use as the batch key? The datasets?


ccruizm commented on May 23, 2024

Thanks for the quick reply. This is how I built the reference:

import scanpy as sc
import scarches as sca

condition_key = "author"
adata.obs['author'] = adata.obs['author'].astype('category')

# The data were already normalized and log-transformed upstream, so logtrans_input=False
adata = sca.data.normalize_hvg(adata, batch_key=condition_key,
                               n_top_genes=2000, logtrans_input=False)

Using 2 HVGs from full intersect set
Using 8 HVGs from n_batch-1 set
Using 52 HVGs from n_batch-2 set
Using 92 HVGs from n_batch-3 set
Using 141 HVGs from n_batch-4 set
Using 174 HVGs from n_batch-5 set
Using 221 HVGs from n_batch-6 set
Using 287 HVGs from n_batch-7 set
Using 421 HVGs from n_batch-8 set
Using 602 HVGs from n_batch-9 set
Using 2000 HVGs

network = sca.models.scArches(task_name='atlas',
                              x_dimension=adata.shape[1],
                              z_dimension=10,
                              architecture=[128, 128],
                              gene_names=adata.var_names.tolist(),
                              conditions=adata.obs[condition_key].unique().tolist(),
                              alpha=0.001,
                              loss_fn='nb',
                              model_path="./models/scArches/",
                              )

network.train(adata,
              condition_key=condition_key,
              n_epochs=100,
              batch_size=128,
              save=True,
              retrain=True)

# Project the data into the trained latent space and build a neighbor graph on it
latent_adata = network.get_latent(adata, condition_key)
sc.pp.neighbors(latent_adata)

With a lower alpha and more epochs I get more detailed sub-clustering, but the nuclei-derived dataset always stands out. So far only one dataset was processed from nuclei, but later on I will include others generated with the same technique, and I want a reference that can be queried with either cell or nuclei experiments.
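For reference, this is roughly how I judge the mixing: just the standard scanpy embedding on the latent space from above (a sketch, nothing scArches-specific):

# UMAP on the latent neighbor graph, colored by batch to see how the nuclei data mix
sc.tl.umap(latent_adata)
sc.pl.umap(latent_adata, color=condition_key)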


M0hammadL commented on May 23, 2024

I would suggest making the model deeper ([128, 128, 128]) and also making the batch size smaller: 32, or 64 at most. I guess the nb loss does not fit the nuclei experiments; try the zinb loss and see if it works.
Do all your experiments have the same genes? It is necessary to have the exact same gene set.

Finally, if that does not work, try switching to the sse or mse loss; that would probably solve the problem.
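Applied to your constructor and training call from above, the changes would look roughly like this (just a sketch, same arguments as your snippet with only the architecture, batch size, and loss swapped):

network = sca.models.scArches(task_name='atlas',
                              x_dimension=adata.shape[1],
                              z_dimension=10,
                              architecture=[128, 128, 128],  # deeper model
                              gene_names=adata.var_names.tolist(),
                              conditions=adata.obs[condition_key].unique().tolist(),
                              alpha=0.001,
                              loss_fn='zinb',                # or 'sse' / 'mse' if zinb does not help
                              model_path="./models/scArches/",
                              )

network.train(adata,
              condition_key=condition_key,
              n_epochs=100,
              batch_size=32,                 # smaller batches, 32 or at most 64
              save=True,
              retrain=True)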

let me know how it goes, please.

Mo


ccruizm commented on May 23, 2024

Thanks for the suggestions, I will test them. When you say the experiments must have the exact same gene set, do you mean the genes intersected/shared across all datasets (I have seen this approach used with scIB: https://github.com/theislab/scib/blob/master/notebooks/data_preprocessing/pancreas/01_collect_human_pancreas_studies.ipynb), or can it be the union of genes across all studies? I merged the matrices with Seurat, so not all genes are present in every sample initially, but after merging they all have the same features, and the genes that were not shared to begin with are filled with zeros.

I have been using the latter, since the reference I am building is for a tumor type (so I expect more heterogeneity than in normal tissue), and intersecting the features across all studies leaves me with only ~5K common genes. What would your recommendation be in this case?
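As a quick sanity check on the merged object, this is roughly how I count, per study, how many genes are actually detected versus just zero-filled (plain numpy on the merged AnnData; a sketch, nothing scArches-specific):

import numpy as np

# For each study, count genes that have at least one non-zero value
for batch in adata.obs['author'].cat.categories:
    sub = adata[adata.obs['author'] == batch]
    detected = np.asarray((sub.X != 0).sum(axis=0)).ravel() > 0
    print(f"{batch}: {detected.sum()} of {adata.n_vars} genes detected")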


M0hammadL commented on May 23, 2024

Yes, I meant the intersection of genes. It would be worth giving it a chance, since there may be many genes that are zero in the nuclei data but not in the other datasets (and vice versa), so there is effectively no shared feature set. And even after HVG selection, some genes may be highly variable only in the nuclei dataset and not in the others. Alternatively, increase the number of HVGs to ~5k or so to improve the chances.

Therefore I would suggest:

  • increase the HVG set to 5000 or so and check again with the different loss functions

  • try subsetting to the intersection of genes when you concatenate the datasets, then rerun the models and see (a rough sketch of both is below)
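A rough sketch of both points, assuming the per-study AnnData objects sit in a list (here called adatas, which is hypothetical; plain anndata calls plus the same normalize_hvg as above):

import anndata as ad
import scarches as sca
from functools import reduce

# 1) Keep only the genes shared by every study before concatenating
#    (`adatas` is a hypothetical list with one AnnData per study)
common_genes = sorted(reduce(set.intersection,
                             (set(a.var_names) for a in adatas)))
adata = ad.concat([a[:, common_genes].copy() for a in adatas])
# assumes each study already carries its 'author' batch label in .obs

# 2) Use a larger HVG set (e.g. 5000) and retry the different loss functions
adata = sca.data.normalize_hvg(adata, batch_key='author',
                               n_top_genes=5000, logtrans_input=False)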

