bimsbbioinfo / maui



Multi-omics Autoencoder Integration: Deep learning-based heterogeneous data analysis toolkit

License: GNU General Public License v3.0

Python 10.28% Jupyter Notebook 89.72%

bioinformatics deep-learning autoencoder latent-factor-model cancer-genomics multi-omics

maui's Introduction

maui

Multi-omics Autoencoder Integration (maui) is a Python package for multi-omics data analysis. It is based on a Bayesian latent factor model, with inference done using artificial neural networks. For details, check out our Life Science Alliance (LSA) paper: https://www.life-science-alliance.org/content/2/6/e201900517
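The description above is brief. The NumPy sketch below illustrates what "inference done using artificial neural networks" usually means in a variational autoencoder. It shows the reparameterization trick and the two terms of the negative evidence lower bound. This is generic VAE math, not maui's code. The Bernoulli (binary cross-entropy) reconstruction term and all function names are assumptions made for illustration.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), so z stays differentiable in mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) per sample, summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)


def vae_loss(x, x_recon, mu, log_var, eps=1e-7):
    """Negative ELBO averaged over samples, assuming a Bernoulli (binary cross-entropy) decoder."""
    x_recon = np.clip(x_recon, eps, 1.0 - eps)  # keep log() finite
    recon = -np.sum(x * np.log(x_recon) + (1.0 - x) * np.log(1.0 - x_recon), axis=-1)
    return float(np.mean(recon + kl_to_standard_normal(mu, log_var)))
```

In a trained model, mu and log_var would come from the encoder network and x_recon from the decoder. Using the encoder mean as the latent factor values is a common choice.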

Installation

maui works with Python 3.6 and TensorFlow 1.1; it does not yet support TensorFlow 2.0, which is still unreleased. The easiest way to install it is from PyPI:

pip install -U maui-tools

This will install all necessary dependencies, including Keras and TensorFlow. The default (CPU) version of TensorFlow will be installed. If you need the GPU version of TensorFlow, install it before installing maui.

The development version may be installed by cloning this repo and running python setup.py install, or directly from GitHub using pip:

pip install -e git+https://github.com/BIMSBbioinfo/maui.git#egg=maui

Optional dependencies

Survival analysis functionality is supplied by lifelines, which can be installed from PyPI with pip install lifelines.

Usage

See the vignette, and check out the documentation.

Citation

Evaluation of colorectal cancer subtypes and cell lines using deep learning. Jonathan Ronen, Sikander Hayat, Altuna Akalin. Life Science Alliance Dec 2019, 2 (6) e201900517; DOI: 10.26508/lsa.201900517

Contributing

Open an issue, send us a pull request, or shoot us an e-mail.

License

maui is released under the GNU General Public License v3.0 or later.

@jonathanronen, BIMSBbioinfo, 2018

maui's Issues

Fails to install with pipenv

Perhaps it's time to extend support to Python versions newer than 3.6?

installation

Hi @jonathanronen,

I installed maui with pip, and it didn't produce any errors. However, importing maui or maui.tools fails. Any help is appreciated.

(py3.6) pcddas@beagle:~/SOFTWARES$ python
Python 3.6.0 | packaged by conda-forge | (default, Feb 9 2017, 14:36:55)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import maui
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pcddas/miniconda2/envs/py3.6/lib/python3.6/site-packages/maui/__init__.py", line 1, in <module>
    from .model import Maui
  File "/home/pcddas/miniconda2/envs/py3.6/lib/python3.6/site-packages/maui/model.py", line 8, in <module>
    from .autoencoders_architectures import stacked_vae, deep_vae, train_model
  File "/home/pcddas/miniconda2/envs/py3.6/lib/python3.6/site-packages/maui/autoencoders_architectures.py", line 7, in <module>
    from keras.models import Model
  File "/home/pcddas/miniconda2/envs/py3.6/lib/python3.6/site-packages/keras/__init__.py", line 25, in <module>
    from keras import models
  File "/home/pcddas/miniconda2/envs/py3.6/lib/python3.6/site-packages/keras/models.py", line 19, in <module>
    from keras import backend
  File "/home/pcddas/miniconda2/envs/py3.6/lib/python3.6/site-packages/keras/backend.py", line 36, in <module>
    from tensorflow.python.eager.context import get_config
ImportError: cannot import name 'get_config'
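The traceback ends inside Keras's own backend module, which tries to import a name that the installed TensorFlow does not provide. This suggests the installed Keras release expects a different TensorFlow than the one in this environment. A quick check against the versions the README names (Python 3.6, TensorFlow 1.x) can surface this kind of mismatch before importing maui. The helper environment_problems is hypothetical and not part of maui. The rule it encodes (exactly Python 3.6, TensorFlow major version 1) is an assumption taken from the README.

```python
import sys

# Versions named in the maui README; treating everything else as unsupported is an assumption.
SUPPORTED_PYTHON = (3, 6)
SUPPORTED_TF_MAJOR = 1


def environment_problems(python_version, tf_version):
    """Return a list of human-readable problems (empty if the environment looks supported).

    python_version: a tuple such as sys.version_info
    tf_version: a version string such as tensorflow.__version__, or None if TF is missing
    """
    problems = []
    if tuple(python_version[:2]) != SUPPORTED_PYTHON:
        problems.append(
            "Python %d.%d found; the README lists Python %d.%d"
            % (python_version[0], python_version[1], *SUPPORTED_PYTHON)
        )
    if tf_version is None:
        problems.append("TensorFlow is not installed")
    elif int(tf_version.split(".")[0]) != SUPPORTED_TF_MAJOR:
        problems.append("TensorFlow %s found; maui targets TensorFlow 1.x" % tf_version)
    return problems


if __name__ == "__main__":
    try:
        import tensorflow
        tf_version = tensorflow.__version__
    except ImportError:
        tf_version = None
    for problem in environment_problems(sys.version_info, tf_version):
        print("warning:", problem)
```

This check does not cover Keras itself. Which Keras releases work with a given TensorFlow 1.x version would still have to be confirmed from the Keras release notes.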

return factor importance value rather than filtering them

Hey Jona.

Could you provide an option for maui.utils.filter_factors_by_r2 to return the actual R^2 value of each latent factor (LF), instead of only a filtered matrix?
The user could then sort LFs by this value for downstream applications. Once the matrix is filtered, the importance scores and rankings are lost.
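Below is a minimal sketch of the requested behavior. It assumes that a factor's R^2 is the R^2 of a linear regression predicting all features from that single factor, averaged uniformly over features. Both the regression direction and the averaging are assumptions, not taken from maui's source. The name factor_r2_scores is hypothetical. The function returns a sorted pandas Series, so factors can be ranked and filtering becomes a separate, optional step.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score


def factor_r2_scores(z, x, multioutput="uniform_average"):
    """R^2 of a linear model predicting every feature in x from each single factor in z.

    z: DataFrame (samples x latent factors); x: DataFrame (samples x features).
    Returns a Series indexed by factor name, sorted from most to least explanatory.
    """
    scores = {}
    for factor in z.columns:
        zi = z[[factor]].to_numpy()  # one factor as a (samples x 1) design matrix
        model = LinearRegression().fit(zi, x)
        scores[factor] = r2_score(x, model.predict(zi), multioutput=multioutput)
    return pd.Series(scores).sort_values(ascending=False)
```

With the scores in hand, the current filtering behavior corresponds to keeping z[scores[scores > threshold].index], where threshold is whatever cut-off the caller chooses.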

Include trained model from preprint and release all input data

Hi,

I was following the vignette and noticed that it makes some simplifications compared to the manuscript.

From what I can see, the repo is currently missing the smoothed mutation input data, which means one cannot train the model as you did for the preprint. Being able to do so would be valuable: following the vignette, for instance, the Kaplan-Meier analysis shows no difference between clusters.

Also, for reproducibility, I think it would be good to include the final model used in the preprint.

Kind regards,
Clemens
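The Kaplan-Meier estimator mentioned above works as follows. At each distinct event time, it multiplies the running survival probability by the fraction of at-risk patients who do not have the event at that time. Censored patients leave the risk set without counting as events. The NumPy sketch below computes the estimate for one group; running it once per cluster gives curves that can be compared. In practice, maui's optional dependency lifelines provides this as KaplanMeierFitter, along with log-rank tests in lifelines.statistics. The function here is only an illustration.

```python
import numpy as np


def kaplan_meier(durations, events):
    """Kaplan-Meier product-limit estimate for a single group.

    durations: follow-up time per patient; events: 1 if the event was observed, 0 if censored.
    Returns (event_times, survival), where survival[i] is S(t) just after event_times[i].
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(durations[events])  # distinct times with at least one event
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)  # still under observation just before t
        n_events = np.sum((durations == t) & events)
        s *= 1.0 - n_events / at_risk
        survival.append(s)
    return event_times, np.array(survival)
```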

Sigmoid activation with inputs out of [0,1] range

Hi Jona @jonathanronen,

I was having another look at maui and now I have another question :)

As the final activation function, the model uses a sigmoid, so all output values will fall between 0 and 1.
On the other hand, inputs from RNA-seq are scaled, and at least in the vignette that leads to values outside of this range.

Is there any theoretical justification for this choice, or did you choose it because it performs well in this setting? Did you try anything else, such as MinMaxScaler, to map each feature to the [0, 1] interval?

Best wishes and thanks!
Clemens
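The example below makes the range mismatch concrete. Standardizing features (zero mean, unit variance) produces many values outside [0, 1], a range a sigmoid output cannot reach. MinMaxScaler, by contrast, maps each feature of the data it is fitted on onto [0, 1]. The data are synthetic (log-normal values standing in for expression), not the vignette's.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)
# Synthetic, skewed "expression" matrix: 100 samples x 5 features (not real data).
gex = rng.lognormal(mean=2.0, sigma=1.0, size=(100, 5))

standardized = StandardScaler().fit_transform(gex)  # zero mean, unit variance per feature
minmax = MinMaxScaler().fit_transform(gex)  # each feature rescaled to [0, 1]

# Share of entries outside the [0, 1] range covered by a sigmoid output.
outside_standardized = np.mean((standardized < 0) | (standardized > 1))
outside_minmax = np.mean((minmax < 0) | (minmax > 1))
print(f"outside [0, 1]: standardized {outside_standardized:.0%}, min-max {outside_minmax:.0%}")
```

MinMaxScaler only guarantees the [0, 1] range on the data it was fitted on. New samples transformed with the same scaler can still fall outside that range unless they are clipped.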

fail to run the example code

When I run the code "time z = maui_model.fit_transform({'mRNA': gex, 'Mutations': mut, 'CNV': cnv})", I get the error: TypeError: compile() missing 1 required positional argument: 'loss'
Do you know what the problem might be?

Smoothing mutation data with PPI networks

I can't find the value of the parameter alpha that was used to run netsmooth to generate the smoothed mutation matrix. It would be nice to include it in the manuscript.
The choice of protein-protein interaction (PPI) network is essentially another hyperparameter. Would another PPI network perform similarly? Even more interesting, how would a random network of similar density perform? Looking at the methods section and the code, I am also a bit concerned that batch-normalizing the binary mutation input features (default batch size n=50) could be problematic. Am I missing something?
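Two points in this issue can be illustrated with a few lines of NumPy.

The first is network smoothing. A generic random-walk-with-restart propagation spreads each sample's mutations to its network neighbors, and alpha sets how much signal is propagated versus kept at the original genes. This is a common formulation and not necessarily netsmooth's exact parameterization.

The second is the batch-normalization concern. Normalizing a binary column within a batch of 50 turns a single mutated sample into a value near 7, and turns an all-zero column into zeros. The epsilon of 1e-3 is assumed to match the default of Keras's BatchNormalization layer. The layer's learned scale and shift are omitted.

```python
import numpy as np


def rwr_smooth(mutations, adjacency, alpha=0.5, n_iter=200, tol=1e-10):
    """Random walk with restart over a gene network (generic formulation).

    mutations: (genes x samples) binary matrix; adjacency: symmetric (genes x genes) matrix.
    Iterates F <- alpha * W @ F + (1 - alpha) * F0, with W the column-normalized adjacency.
    """
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=0)
    degree[degree == 0] = 1.0  # avoid division by zero; isolated genes propagate nothing
    W = A / degree  # divide each column by its degree
    F0 = np.asarray(mutations, dtype=float)
    F = F0.copy()
    for _ in range(n_iter):
        F_next = alpha * (W @ F) + (1 - alpha) * F0
        converged = np.abs(F_next - F).max() < tol
        F = F_next
        if converged:
            break
    return F


def batch_standardize(batch, epsilon=1e-3):
    """Training-mode batch normalization without the learned scale (gamma) and shift (beta)."""
    batch = np.asarray(batch, dtype=float)
    return (batch - batch.mean(axis=0)) / np.sqrt(batch.var(axis=0) + epsilon)
```

With a batch of 50 samples and one mutation carrier, batch_standardize maps the carrier to about 6.8 and every other sample to about -0.14. How large these values end up depends on which samples happen to land in the same batch.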

manuscript typos

Hi,

I was reading the preprint and just wanted to let you know about some typos to fix in the next revision of the manuscript :)

adn (should be "and")
deleteions (should be "deletions")

Clemens

I can't reproduce vignette results

maui_vignette.zip

I like maui and am exploring the use of VAEs in a cancer subtyping project, but I am having a hard time reproducing your vignette results. The problem starts at plotting the losses: I have practically no recorded loss, and from then on everything goes south (ROC curves are deflated, etc.). I wonder if it has anything to do with my using Keras 2.3? I got this user warning:

miniconda3/envs/iomics/lib/python3.7/site-packages/keras/engine/training_utils.py:819: UserWarning: Output reconstruction missing from loss dictionary. We assume this was done on purpose. The fit and evaluate APIs will not be expecting any data to be passed to reconstruction.
  'be expecting any data to be passed to {0}.'.format(name))

It almost looks as if the fit function doesn't update the weights from one epoch to the next. Have you encountered this error? I attached my vignette; I went a bit past the loss plot and then stopped testing, but at the end you can see my pip freeze.

support for scipy >= 1.3

SciPy 1.3 introduced a rewrite of stats.pearsonr, which broke the test_utils.test_correlate_factors_and_features test.

This needs to be investigated: should only the test be rewritten, or the whole "feature correlations" functionality?
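One option that avoids depending on the internals or return type of scipy.stats.pearsonr is to compute the whole factor-by-feature Pearson correlation matrix directly with NumPy. A test can then compare the result against np.corrcoef. The function name pearson_matrix is hypothetical. This sketch covers correlation coefficients only; if p-values are needed, they would still have to come from elsewhere.

```python
import numpy as np


def pearson_matrix(z, x):
    """Pearson correlation of every column of z with every column of x.

    z: (samples x factors), x: (samples x features); returns a (factors x features) array.
    Constant columns give a zero norm and therefore nan/inf entries.
    """
    z = np.asarray(z, dtype=float)
    x = np.asarray(x, dtype=float)
    zc = z - z.mean(axis=0)  # center each factor
    xc = x - x.mean(axis=0)  # center each feature
    cov = zc.T @ xc  # unnormalized covariances, (factors x features)
    norms = np.outer(np.linalg.norm(zc, axis=0), np.linalg.norm(xc, axis=0))
    return cov / norms
```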

Fix some sklearn warnings

lib/python3.6/site-packages/sklearn/base.py:420: FutureWarning: The default value of multioutput (not exposed in score method) will change from 'variance_weighted' to 'uniform_average' in 0.23 to keep consistent with 'metrics.r2_score'. To specify the default value manually and avoid the warning, please either call 'metrics.r2_score' directly or make a custom scorer with 'metrics.make_scorer' (the built-in scorer 'r2' uses multioutput='uniform_average').
  "multioutput='uniform_average').", FutureWarning)

This happens in drop_unexplanatory_factors() or in merge_similar_latent_factors().
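The warning itself suggests calling metrics.r2_score directly with an explicit multioutput argument. The toy example below shows how the two averaging modes differ, so the choice can be made deliberately rather than inherited from a changing default. "uniform_average" is the plain mean of the per-output R^2 values. "variance_weighted" weights each output by the variance of its true values. Which mode maui should use is not settled here. The data are made up for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

# Two outputs with very different variances (toy values).
y_true = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 35.0], [4.0, 38.0]])
y_pred = np.array([[1.1, 12.0], [1.9, 19.0], [3.2, 30.0], [3.8, 40.0]])

per_output = r2_score(y_true, y_pred, multioutput="raw_values")
uniform = r2_score(y_true, y_pred, multioutput="uniform_average")
weighted = r2_score(y_true, y_pred, multioutput="variance_weighted")
print(per_output, uniform, weighted)
```

Here, the high-variance second column dominates the variance-weighted score. Passing multioutput explicitly, in whatever code computes these scores, keeps results stable across scikit-learn versions.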