Generate a spectral library for Spectronaut from MS Annika results.

Home Page: https://hgb-bin-proteomics.github.io/MSAnnika_Spectral_Library_exporter/

License: MIT License

MS Annika Spectral Library exporter

Generate a spectral library for Spectronaut from MS Annika results.

Usage

  • Install python 3.7+: https://www.python.org/downloads/
  • Install requirements: pip install -r requirements.txt
  • Export MS Annika CSMs from Proteome Discoverer to Microsoft Excel format. Filter out decoys beforehand and filter for high-confidence CSMs (see below).
  • Convert any RAW files to *.mgf format.
  • Set your desired parameters in config.py (see below).
  • Run python create_spectral_library.py.
  • If the script finishes successfully, the spectral library is written to a file ending in _spectralLibrary.csv.

GUI

As an alternative to the command-line Python script, a GUI is also available via Docker:

  • After installing Docker, run the following command:
    docker run -p 8501:8501 michabirklbauer/spectrallibraryexporter
    
  • Navigate to localhost:8501 in your browser. You should see the MS Annika Spectral Library exporter GUI!

If you don't have Docker or don't want to install it, you can also run the GUI natively using the following commands:

  • Open a terminal inside MSAnnika_Spectral_Library_exporter.
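  • Enter cp gui/streamlit_app.py . to copy the app into the current directory (the single dot is the copy target)
  • Enter cp gui/streamlit_util.py . to copy the helper module into the current directory
  • Enter pip install streamlit
  • Enter streamlit run streamlit_app.py --server.maxUploadSize 5000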
  • Navigate to localhost:8501 in your browser. You should see the MS Annika Spectral Library exporter GUI!

Exporting MS Annika results to Microsoft Excel

The script uses a Microsoft Excel file as input, so MS Annika results first need to be exported from Proteome Discoverer. It is recommended to filter the results according to your needs before exporting, e.g. filter for high-confidence CSMs and filter out decoy CSMs.

Results can then be exported by selecting File > Export > To Microsoft Excel… > Level 1: CSMs > Export in Proteome Discoverer.

Parameters

The following parameters need to be adjusted for your needs in the config.py file:

##### PARAMETERS #####

# name of the mgf file containing the MS2 spectra
SPECTRA_FILE = ["20220215_Eclipse_LC6_PepMap50cm-cartridge_mainlib_DSSO_3CV_stepHCD_OT_001.mgf"]
# name of the CSM file exported from Proteome Discoverer
CSMS_FILE = "20220215_Eclipse_LC6_PepMap50cm-cartridge_mainlib_DSSO_3CV_stepHCD_OT_001.xlsx"
# name of the experiment / run (any descriptive text is allowed)
RUN_NAME = "20220215_Eclipse_LC6_PepMap50cm-cartridge_mainlib_DSSO_3CV_stepHCD_OT_001-(1)"
# name of the crosslink modification
CROSSLINKER = "DSSO"
# possible modifications and their monoisotopic masses
MODIFICATIONS = \
    {"Oxidation": [15.994915],
     "Carbamidomethyl": [57.021464],
     "DSSO": [54.01056, 85.98264, 103.99320]}
# expected ion types (any of a, b, c, x, y, z)
ION_TYPES = ("b", "y")
# maximum expected charge of fragment ions
MAX_CHARGE = 4
# tolerance for matching peaks
MATCH_TOLERANCE = 0.02
# parameters for calculating iRT
iRT_PARAMS = {"iRT_m": 1.3066, "iRT_t": 29.502}

If you have more than one spectra file, list all of them in SPECTRA_FILE like this:

##### PARAMETERS #####

# name of the mgf file containing the MS2 spectra
SPECTRA_FILE = ["20220215_Eclipse_LC6_PepMap50cm-cartridge_mainlib_DSSO_3CV_stepHCD_OT_001.mgf",
                "20220215_Eclipse_LC6_PepMap50cm-cartridge_mainlib_DSSO_3CV_stepHCD_OT_002.mgf"]
# name of the CSM file exported from Proteome Discoverer
CSMS_FILE = "20220215_Eclipse_LC6_PepMap50cm-cartridge_mainlib_DSSO_3CV_stepHCD_OT_001.xlsx"
# name of the experiment / run (any descriptive text is allowed)
RUN_NAME = "20220215_Eclipse_LC6_PepMap50cm-cartridge_mainlib_DSSO_3CV_stepHCD_OT_001-(1)"
# name of the crosslink modification
CROSSLINKER = "DSSO"
# possible modifications and their monoisotopic masses
MODIFICATIONS = \
    {"Oxidation": [15.994915],
     "Carbamidomethyl": [57.021464],
     "DSSO": [54.01056, 85.98264, 103.99320]}
# expected ion types (any of a, b, c, x, y, z)
ION_TYPES = ("b", "y")
# maximum expected charge of fragment ions
MAX_CHARGE = 4
# tolerance for matching peaks
MATCH_TOLERANCE = 0.02
# parameters for calculating iRT
iRT_PARAMS = {"iRT_m": 1.3066, "iRT_t": 29.502}
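
To illustrate what ION_TYPES, MAX_CHARGE and MATCH_TOLERANCE control, here is a minimal sketch of fragment ion calculation and peak matching. It is not the exporter's actual implementation. The functions fragment_mzs and match_peaks and the RESIDUE_MASS table are hypothetical names introduced for this example. The sketch assumes an unmodified linear peptide, supports only b and y ions, and assumes that the tolerance is given in Da.

```python
# Monoisotopic residue masses in Da for a few amino acids (extend as needed).
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "L": 113.084064, "K": 128.094963,
    "E": 129.042593, "R": 156.101111,
}
PROTON = 1.007276   # mass of a proton in Da
WATER = 18.010565   # monoisotopic mass of H2O in Da


def fragment_mzs(sequence, ion_types=("b", "y"), max_charge=4):
    """Return {(ion_type, index, charge): m/z} for an unmodified peptide.

    Hypothetical helper: only b and y ions are supported in this sketch.
    """
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    result = {}
    for i in range(1, len(masses)):
        # Neutral fragment masses: b ions are N-terminal prefixes,
        # y ions are C-terminal suffixes plus one water.
        neutral = {"b": sum(masses[:i]), "y": sum(masses[-i:]) + WATER}
        for ion in ion_types:
            if ion not in neutral:
                raise ValueError(f"ion type {ion!r} is not supported here")
            for z in range(1, max_charge + 1):
                result[(ion, i, z)] = (neutral[ion] + z * PROTON) / z
    return result


def match_peaks(theoretical, observed_mzs, tolerance=0.02):
    """Map each theoretical ion to its closest observed peak within tolerance.

    Assumption: tolerance is an absolute m/z window in Da.
    """
    matches = {}
    for key, mz in theoretical.items():
        hits = [p for p in observed_mzs if abs(p - mz) <= tolerance]
        if hits:
            matches[key] = min(hits, key=lambda p: abs(p - mz))
    return matches


if __name__ == "__main__":
    ions = fragment_mzs("GASP", ion_types=("b", "y"), max_charge=2)
    for key, peak in match_peaks(ions, [58.03, 129.07, 500.0]).items():
        print(key, round(ions[key], 4), peak)
```

Raising MAX_CHARGE or adding ion types increases the number of theoretical fragments, and a wider MATCH_TOLERANCE makes it more likely that an observed peak is assigned to one of them.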

Known Issues

List of known issues

Citing

If you are using the MS Annika Spectral Library exporter script please cite:

MS Annika 2.0 Identifies Cross-Linked Peptides in MS2–MS3-Based Workflows at High Sensitivity and Specificity
Micha J. Birklbauer, Manuel Matzinger, Fränze Müller, Karl Mechtler, and Viktoria Dorfer
Journal of Proteome Research 2023 22 (9), 3009-3021
DOI: 10.1021/acs.jproteome.3c00325

If you are using MS Annika please cite:

MS Annika 2.0 Identifies Cross-Linked Peptides in MS2–MS3-Based Workflows at High Sensitivity and Specificity
Micha J. Birklbauer, Manuel Matzinger, Fränze Müller, Karl Mechtler, and Viktoria Dorfer
Journal of Proteome Research 2023 22 (9), 3009-3021
DOI: 10.1021/acs.jproteome.3c00325

or

MS Annika: A New Cross-Linking Search Engine
Georg J. Pirklbauer, Christian E. Stieger, Manuel Matzinger, Stephan Winkler, Karl Mechtler, and Viktoria Dorfer
Journal of Proteome Research 2021 20 (5), 2560-2569
DOI: 10.1021/acs.jproteome.0c01000

msannika_spectral_library_exporter's Issues

Parsing of scan number

Scan numbers should be parsed with a dedicated function, because the current approach does not work for every MGF title format:

scan_nr = int(spectrum["params"]["title"].split("scan=")[1].strip("\""))

For the moment, make sure that each spectrum title contains scan= followed by the scan number, otherwise the script won't work.

In the future, use something like the approach here: https://github.com/hgb-bin-proteomics/Proteome_Discoverer_MGF_Scan_Number_Repair_Tool/blob/f59c4a667641e75c3ae42fa53496aae7938ba0a3/scan_nr_repair_tool.py#L75
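
A dedicated parser along these lines could look like the sketch below. It is not the code from the linked repair tool, and parse_scan_number and SCAN_PATTERNS are hypothetical names. The first pattern covers titles containing scan=. The second pattern is an assumption about titles of the form name.scan.scan.charge and should be adapted to the titles in your own MGF files.

```python
import re

# Patterns are tried in order; each captures the scan number in group 1.
SCAN_PATTERNS = (
    re.compile(r"scan=(\d+)"),                 # e.g. ... scan=1234"
    re.compile(r"\.(\d+)\.\d+\.\d+(?:\s|$)"),  # assumed: name.1234.1234.2
)


def parse_scan_number(title):
    """Return the scan number found in an MGF spectrum title.

    Raises ValueError with the offending title instead of failing with an
    IndexError when no known pattern matches.
    """
    for pattern in SCAN_PATTERNS:
        match = pattern.search(title)
        if match:
            return int(match.group(1))
    raise ValueError(f"Could not parse scan number from title: {title!r}")


if __name__ == "__main__":
    title = 'File:"run.raw", NativeID:"controllerType=0 controllerNumber=1 scan=1234"'
    print(parse_scan_number(title))
```

In the script, the line shown above could then be replaced by a call such as scan_nr = parse_scan_number(spectrum["params"]["title"]), which also tolerates text following the scan number.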
