DeepDIA

Using deep learning to generate in silico spectral libraries for data-independent acquisition (DIA) analysis.

Updates

1.1.0

  • Dependency of R removed
  • FASTA digestion
  • Ion mobility prediction (experimental)

For the version used in the Nat Commun 2020 publication, refer to commit #674e2fb.

Dependency

The following software and packages are required:

  • Python (version 3.7 or later, Anaconda distribution is recommended)
  • TensorFlow (version 2.0 or later)
  • Keras (packaged with TensorFlow)

For spectral library generation from FASTA files and data preprocessing for training detectability models, the following package is required:

  • Biopython

DeepDIA also requires the following Python packages, which are included in Anaconda:

  • numpy (version 1.18.5)
  • pandas (version 0.25.3)
  • scipy (version 1.4.1)
  • statsmodels (version 0.13.2)

Later versions may be compatible, but have not been tested.

For model training, NVIDIA graphics cards with CUDA are recommended.

Installation

1. Install Python (Anaconda)

Download and install Anaconda.

Check that the installation succeeded by running the following command in the Anaconda Prompt:

pip list

Ensure that the following Python packages are installed: numpy, pandas, scipy, and statsmodels. If not, install the missing packages using the following command (as an example for statsmodels):

pip install statsmodels
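
As an alternative to reading the output of pip list by eye, the check can be scripted. The following is a minimal sketch using only the Python standard library; the helper name missing_packages is hypothetical and not part of DeepDIA.

```python
from importlib.util import find_spec

# Packages named in the Dependency section.
REQUIRED = ["numpy", "pandas", "scipy", "statsmodels"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located as importable modules."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```

This only confirms that each package can be found; it does not compare the installed versions against the ones listed above.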

2. Install TensorFlow

Ensure that the NVIDIA GPU driver has been installed, then install CUDA and cuDNN with conda. This step can be skipped if you run TensorFlow on CPU only.

conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0

Install TensorFlow using pip:

pip install tensorflow
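
To confirm that TensorFlow is importable and whether it can see a GPU, a short check like the one below may help. It is a sketch that assumes TensorFlow 2.1 or later, where tf.config.list_physical_devices is available; the helper name visible_gpus is hypothetical.

```python
def visible_gpus():
    """Return the GPU devices TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Assumes TensorFlow >= 2.1; an empty list means CPU-only execution.
    return tf.config.list_physical_devices("GPU")


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed.")
elif gpus:
    print("TensorFlow can see", len(gpus), "GPU(s).")
else:
    print("TensorFlow is installed but will run on CPU only.")
```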

3. Install Biopython

Install Biopython using pip:

pip install biopython

or conda:

conda install -c conda-forge biopython

Getting Started

1. Prepare a Peptide List

A peptide list is stored in a comma-separated values (CSV) file including columns named protein and sequence.

"protein","sequence"
"O43504","HDGITVAVHK"
"P56470","VGSSGDIALHINPR"
"Q9UHL4","LDHFNFER"
"P68371","IREEYPDR"
"P01024","AKDQLTCNK"

Peptides can be collected from public resources. Peptide lists collected from the Pan Human Library (Rosenberger, G. et al. Sci. Data 2014, 1, 140031, doi:10.1038/sdata.2014.31) are provided as examples in the data\peptide folder:

  • Pan_human.peptide.csv
  • Pan_human_charge2.peptide.csv
  • Pan_human_charge3.peptide.csv

DeepDIA only supports peptide sequences with standard amino acids (ACDEFGHIKLMNPQRSTVWY) and length <= 50.
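
Before running prediction, it can be useful to screen a peptide list against these two constraints. The sketch below is one way to do so with the standard library; the function names and the "X00000" row are hypothetical and only illustrate a rejected sequence.

```python
import csv
import io

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
MAX_LENGTH = 50


def is_supported(sequence):
    """True if the sequence uses only standard amino acids and has length 1..50."""
    return 0 < len(sequence) <= MAX_LENGTH and set(sequence) <= STANDARD_AA


def split_peptides(rows):
    """Split rows (dicts with a 'sequence' key) into supported and rejected lists."""
    supported, rejected = [], []
    for row in rows:
        (supported if is_supported(row["sequence"]) else rejected).append(row)
    return supported, rejected


# In-memory stand-in for a peptide CSV file; the last row is a made-up example.
example = io.StringIO(
    '"protein","sequence"\n'
    '"O43504","HDGITVAVHK"\n'
    '"P56470","VGSSGDIALHINPR"\n'
    '"X00000","PEPTIDEX"\n'
)
supported, rejected = split_peptides(csv.DictReader(example))
print(len(supported), "supported,", len(rejected), "rejected")
```

To screen a real file, pass csv.DictReader(open(path, newline="")) in place of the in-memory example.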

2. Predict MS/MS Spectra

Prepare a model for MS/MS prediction. You can use a pre-trained model or train your own. A model trained with HeLa data acquired on a Q Exactive HF (Bruderer, R. et al. Mol. Cell. Proteomics 2017, 16, 2296-2309, doi:10.1074/mcp.RA117.000314) is provided as an example in the data\models folder:

  • data\models\charge2\epoch_035.hdf5
  • data\models\charge3\epoch_034.hdf5

Run predict_ms2.py to predict MS/MS ion intensities for peptide precursors with charge 2+.

python src\predict_ms2.py `
--in data\peptide\Pan_human_charge2.peptide.csv `
--model data\models\charge2\epoch_035.hdf5 `
--charge 2 `
--out data\Pan_human_charge2.prediction.ions.json

The predicted MS/MS ion intensities are saved in a JSON file (*.prediction.ions.json).

Predict MS/MS for charge 3+ following the same steps.

python src\predict_ms2.py `
--in data\peptide\Pan_human_charge3.peptide.csv `
--model data\models\charge3\epoch_034.hdf5 `
--charge 3 `
--out data\Pan_human_charge3.prediction.ions.json
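
Because the two commands differ only in the charge state and the model file, they can also be generated in a loop. The sketch below builds the same argument lists shown above using the example file names; the helper build_predict_ms2_command is hypothetical, and it assumes the script is run from the repository root.

```python
from pathlib import Path

# Example models shipped in data\models, keyed by precursor charge.
MODELS = {2: "epoch_035.hdf5", 3: "epoch_034.hdf5"}


def build_predict_ms2_command(charge, model_file, python="python"):
    """Assemble the predict_ms2.py command line for one charge state."""
    data = Path("data")
    return [
        python, str(Path("src") / "predict_ms2.py"),
        "--in", str(data / "peptide" / f"Pan_human_charge{charge}.peptide.csv"),
        "--model", str(data / "models" / f"charge{charge}" / model_file),
        "--charge", str(charge),
        "--out", str(data / f"Pan_human_charge{charge}.prediction.ions.json"),
    ]


for charge, model_file in MODELS.items():
    command = build_predict_ms2_command(charge, model_file)
    print(" ".join(command))
    # To actually run the prediction from the repository root:
    # import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell-specific line continuation characters, so the same script works in PowerShell, cmd, and Unix shells.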

3. Predict iRT

Prepare a model for iRT prediction. You can use a pre-trained model or train your own. A pre-trained model is provided as an example in the data\models folder:

  • data\models\irt\epoch_082.hdf5

Run predict_rt.py.

python src\predict_rt.py `
--in data\peptide\Pan_human.peptide.csv `
--model data\models\irt\epoch_082.hdf5 `
--out data\Pan_human.prediction.irt.csv

The predicted iRT values are saved in a CSV file (*.prediction.irt.csv).

4. Generate Spectral Library

Ensure that the predicted MS/MS and iRT files are present in the data folder.
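
This check can be scripted before building the library. The sketch below looks for the three output files produced in the previous steps; the helper name missing_files is hypothetical.

```python
from pathlib import Path

# Output files from steps 2 and 3 of this guide.
EXPECTED = [
    "Pan_human_charge2.prediction.ions.json",
    "Pan_human_charge3.prediction.ions.json",
    "Pan_human.prediction.irt.csv",
]


def missing_files(folder, names=EXPECTED):
    """Return the expected file names that do not exist in the given folder."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]


missing = missing_files("data")
if missing:
    print("Missing prediction files:", ", ".join(missing))
else:
    print("All prediction files are present.")
```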

Run build_assays_from_prediction.py.

python src\build_assays_from_prediction.py `
--peptide data\peptide\Pan_human.peptide.csv `
--ions data\Pan_human_charge2.prediction.ions.json `
       data\Pan_human_charge3.prediction.ions.json `
--rt data\Pan_human.prediction.irt.csv `
--out data\Pan_human.prediction.assay.pickle

The generated spectral library is saved in a Python binary file (*.assay.pickle).

Run convert_assays_to_Spectronaut_library.py.

python src\convert_assays_to_Spectronaut_library.py `
--in data\Pan_human.prediction.assay.pickle `
--out data\Pan_human.prediction.library.xls

The generated spectral library is converted to a spreadsheet file (*.library.xls) that is compatible with Spectronaut and DIA-NN.

Tutorial

Tutorials are available in the docs folder.

Spectral Library Prediction

DeepDIA Tutorial: Spectral Library Generation From Peptide Lists describes the workflow to generate in silico spectral libraries from peptide lists.

Detectability Prediction

DeepDIA Tutorial: Spectral Library Generation with Detectability Prediction describes the complete workflow to generate in silico spectral libraries from proteome databases with detectability filtering.

Model Training

DeepDIA Tutorial: Training New Models for MS/MS and iRT Prediction describes the workflow for training new models for MS/MS and iRT prediction using data-dependent acquisition (DDA) data.

DeepDIA Tutorial: Training a New Model for Detectability Prediction describes the workflow for training a new model for MS detectability prediction using data-dependent acquisition (DDA) data.

Publications

Yang, Y., Liu, X., Shen, C., Lin, Y., Yang, P., Qiao, L. In silico spectral libraries by deep learning facilitate data-independent acquisition proteomics. Nat Commun 11, 146 (2020). https://doi.org/10.1038/s41467-019-13866-z.

License

DeepDIA is distributed under a BSD license. See the LICENSE file for details.

Contacts

Please report any problems to the GitHub issue tracker. You can also send feedback to [email protected].

deepdia's Issues

Data format

Hello, thank you for your contribution.
Is there a specified format for the input training data?

About predicting the library

Hi,
Thanks for your good work. I intend to predict a library with the DeepDIA method. I ran the test code and obtained a predicted library with a size of 350449 KB. The instructions describe importing the library into Spectronaut, and I understand that a Spectronaut library is also applicable to DIA-NN, so I tried to import it into DIA-NN. In practice, however, the result for this predicted library was totally different. Could you please give some tips on using this library with DIA-NN? Thank you so much!

Clover

Semi-specific digestion

Hi,

Thank you for your great effort on this tool! I am doing limited proteomics and would like to generate an in silico spectral library in semi-specific digestion mode. Does DeepDIA support this?

Best regards,
Shel

Would this also work for non-tryptic digestion?

Hi,

Thanks for this great tool! I am just wondering, do you think it would work for non-tryptic digests?

I'm not 100% sure how you trained the model, but if it is only on tryptic data I wonder how this would work for non-tryptic digests and/or no-digestion (peptidomics)?

Thanks,
Patrick
