scialdonelab / diffvae

This project forked from ioanabica/diffvae

Code for Nature Scientific Reports 2020 paper: "Unsupervised generative and graph neural methods for modelling cell differentiation" by Ioana Bica, Helena Andrés-Terré, Ana Cvejic, Pietro Liò

License: MIT License

Jupyter Notebook 94.80% Python 4.84% R 0.36%

diffvae's Introduction

Ioana Bica, Helena Andres-Terre, Ana Cvejic, Pietro Lio

Dependencies

The project was implemented in Python 3.6. The following packages are needed for running the models and performing the analysis:

  • numpy, pandas, scipy, scikit-learn
  • keras, tensorflow
  • matplotlib, seaborn

DiffVAE

DiffVAE is a variational autoencoder that can be used to model and study the differentiation of cells using gene expression data. In particular, DiffVAE uses disentanglement methods based on information theory to improve the data representation and achieve better separation of the biological factors of variation in the gene expression data.

This improved representation lets us develop a methodology for identifying the cell types in a dataset using DiffVAE. The pipeline is illustrated in the following figure: DiffVAE-Pipeline
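As a rough illustration of the variational autoencoder part only, the sketch below shows two standard VAE building blocks in NumPy: the reparameterization trick and the closed-form KL divergence to a standard normal prior. It is a generic sketch, not the code in this repository, and it does not include the disentanglement term that DiffVAE adds. The function names and the toy batch of 4 cells are assumptions made for the example; the latent size of 50 mirrors the example value used for --latent_dimension.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (hypothetical helper)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)


# Toy encoder outputs for 4 cells and a 50-dimensional latent space (assumed values).
rng = np.random.default_rng(0)
mu = np.zeros((4, 50))
log_var = np.zeros((4, 50))

z = reparameterize(mu, log_var, rng)     # latent samples, one row per cell
kl = kl_to_standard_normal(mu, log_var)  # one KL value per cell
```

When the encoder output already matches the prior (zero mean, unit variance), the KL term is zero for every cell, which is a quick sanity check for an implementation of this term.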

To train DiffVAE using gene expression data, run the following command with the chosen command line arguments.

python train_DiffVAE.py
Options:
		--gene_expression_filename 'data/Zebrafish/GE_mvg.csv'	# Path to the file containing the log-normalized gene expression data.
		--hidden_dimensions 512 256 # List of hidden dimensions for the layers in the encoder.
		                            # The layers in the decoder will have the same dimensions in reverse order.
		--latent_dimension 50 # Size of the latent dimension.
		--batch_size 128 # Batch size to use during training.
		--learning_rate 0.001 # Learning rate used during training.
		--model_name 'DiffVAE_test' # Name used to save the model.

Example usage:

python train_DiffVAE.py --gene_expression_filename 'data/Zebrafish/GE_mvg.csv' --hidden_dimensions 512 256 \
--latent_dimension 50 --batch_size 128 --learning_rate 0.001 --model_name 'DiffVAE_test'
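For readers who want to see how such options are typically parsed, here is a minimal argparse sketch that accepts the flags listed above. It is an assumption about how a parser for these flags could look, not the parser used in train_DiffVAE.py; in particular, using the example values as defaults and the helper name build_parser are choices made for this illustration.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented flags (not the repository's code)."""
    parser = argparse.ArgumentParser(description="Train DiffVAE (illustrative parser)")
    parser.add_argument("--gene_expression_filename", default="data/Zebrafish/GE_mvg.csv")
    # nargs="+" lets the flag take a space-separated list, e.g. "512 256".
    parser.add_argument("--hidden_dimensions", type=int, nargs="+", default=[512, 256])
    parser.add_argument("--latent_dimension", type=int, default=50)
    parser.add_argument("--batch_size", type=int, default=128)
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--model_name", default="DiffVAE_test")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(
    ["--hidden_dimensions", "512", "256", "--latent_dimension", "50"]
)

# The decoder mirrors the encoder: same layer sizes in reverse order.
decoder_dimensions = list(reversed(args.hidden_dimensions))
```

With encoder layers 512 and 256, the mirrored decoder layers are 256 and 512.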

After running train_DiffVAE.py, the encoder and decoder parts of DiffVAE will be saved to the directories Saved-Models/Encoders/ and Saved-Models/Decoders/ respectively using the model name provided.

Note that the hyperparameters of the model should be tuned for each new dataset.

The notebook DiffVAE_methodology.ipynb goes through the steps needed for identifying the cell types in the dataset and for performing cell perturbations. These steps are illustrated on the Zebrafish dataset.

Graph-DiffVAE

Graph-DiffVAE is a graph variational autoencoder where the encoder and the decoder networks are graph convolutional networks. Graph-DiffVAE can be used to explore links between cells in an unsupervised way as illustrated in the following figure: Graph-DiffVAE-Pipeline
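To make the term "graph convolutional network" concrete, the sketch below implements one commonly used graph convolution step in NumPy: the features of each cell are averaged with those of its neighbours through a symmetrically normalized adjacency matrix, then passed through a linear map and a ReLU. This is a generic formulation and only an assumption about the kind of layer involved; the layers in train_GraphDiffVAE.py may differ. The toy graph, feature sizes, and function names are hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, features, weights):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return np.maximum(a_norm @ features @ weights, 0.0)


# Toy graph with 3 cells connected in a chain: 0 - 1 - 2.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
features = rng.standard_normal((3, 5))  # 3 cells, 5 genes (toy values)
weights = rng.standard_normal((5, 2))   # maps 5 input features to 2 hidden units

a_norm = normalize_adjacency(adj)
hidden = gcn_layer(a_norm, features, weights)  # shape (3, 2)
```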

To train Graph-DiffVAE using gene expression data, run the following command with the chosen command line arguments.

python train_GraphDiffVAE.py
Options:
		--gene_expression_filename 'data/Zebrafish/GE_mvg.csv'	# Path to the file containing the log-normalized gene expression data.
		--hidden_dimensions 512 # List of hidden dimensions for the layers in the encoder.
		                        # The layers in the decoder will have the same dimensions in reverse order.
		--latent_dimension 50 # Size of the latent dimension.
		--learning_rate 0.0001 # Learning rate used during training.
		--model_name 'GraphDiffVAE_test' # Name used to save the results.

Example usage:

python train_GraphDiffVAE.py --gene_expression_filename 'data/Zebrafish/GE_mvg.csv' --hidden_dimensions 512 \
--latent_dimension 50 --learning_rate 0.0001 --model_name 'GraphDiffVAE_test'

After running train_GraphDiffVAE.py, the input adjacency matrix, predicted adjacency matrix and latent node features will be saved to 'results/Graphs/' using the model name provided. The predicted adjacency matrix consists of the edges generated by Graph-DiffVAE.

Note that for this specific example, the input adjacency matrix is constructed by connecting each cell to the cell with which it has the highest positive Pearson correlation. However, if prior biological knowledge about existing links between cells is available, it can be incorporated into the input graph. Based on this input, Graph-DiffVAE will generate other links between cells that share the same biological meaning as the input ones.
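The construction of the input graph described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the repository's implementation: the expression matrix is assumed to have one row per cell and one column per gene, the resulting graph is made undirected by symmetrizing the edges, and the function name top_correlation_adjacency is hypothetical. Cells with constant expression would produce undefined correlations and are not handled here.

```python
import numpy as np


def top_correlation_adjacency(expression):
    """Connect each cell to its most positively correlated other cell.

    expression: array of shape (n_cells, n_genes).
    Returns a symmetric 0/1 adjacency matrix (symmetry is an assumption).
    """
    corr = np.corrcoef(expression)   # Pearson correlation between rows (cells)
    np.fill_diagonal(corr, -np.inf)  # exclude self-correlation
    n_cells = corr.shape[0]
    adj = np.zeros((n_cells, n_cells), dtype=int)
    best = corr.argmax(axis=1)       # best partner for each cell
    for i, j in enumerate(best):
        if corr[i, j] > 0:           # keep only positive correlations
            adj[i, j] = 1
            adj[j, i] = 1
    return adj


# Toy data: cells 0 and 1 increase across genes, cells 2 and 3 decrease.
expression = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 11.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [5.0, 4.0, 3.0, 2.0, 0.0],
])
adj = top_correlation_adjacency(expression)
```

In this toy example, cells 0 and 1 are linked to each other, as are cells 2 and 3, and no edge joins the two groups because their correlations are negative.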
