Preprocessing for xiSEARCH

This set of tools recalibrates MS1 and MS2 spectra based on the mass errors observed in a linear proteomics search, which is performed with xiSEARCH. Recalibration is usually the first step in the xiSEARCH workflow, before a crosslinking MS search: it improves identifications and shows which MS1 and MS2 error tolerances to set. The script first converts Thermo .raw files into peak files in .mgf format using ProteoWizard MSconvert, and it is designed to work with the Windows version of MSconvert.
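As a rough illustration of what recalibrating on mass error means, the sketch below computes the ppm error of matched peaks and removes the median error from the observed m/z values. It is a generic sketch under stated assumptions, not the script's actual code: the function names `ppm_error` and `recalibrate`, the use of the median, and the example peak values are all hypothetical.

```python
from statistics import median


def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def recalibrate(mz_values, systematic_ppm):
    """Remove a constant ppm offset from a list of m/z values."""
    return [mz / (1 + systematic_ppm * 1e-6) for mz in mz_values]


# Hypothetical matched peaks: (observed m/z, theoretical m/z).
matches = [(500.0025, 500.0), (800.0040, 800.0), (1200.0060, 1200.0)]

# Estimate the systematic error (assumption: the median is used).
errors = [ppm_error(obs, theo) for obs, theo in matches]
shift = median(errors)

# Shift every observed m/z by the estimated systematic error.
corrected = recalibrate([obs for obs, _ in matches], shift)
```

In this made-up example every peak is about 5 ppm too heavy, so the corrected values land back on the theoretical masses. The spread that remains after such a correction is what informs the error tolerances for the later crosslinking search.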

The recalibrated .mgf files from this script may then be used as input for a crosslinking MS search with xiSEARCH.

If you use this preprocessing script, please cite Lenz et al., Nat. Comm. 2021.

Requirements:

Before using the script, set the path to MSconvert in config.py.

Usage:

Create a directory with the following structure (this directory tree is not required; it just makes the paths in the command clearer):

Top
|
|--rawfiles
|--processed
|--myfasta.fasta

Put your raw files in the "rawfiles" directory. myfasta.fasta is the sequence database you wish to recalibrate on. "processed" will contain your results.
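If you prefer to set up this layout from Python, a minimal sketch could look like the following. The helper name `make_layout` is hypothetical and not part of the preprocessing script; the FASTA file and the raw files still have to be copied in by hand.

```python
from pathlib import Path


def make_layout(top):
    """Create rawfiles/ and processed/ under the given top directory."""
    top = Path(top)
    for name in ("rawfiles", "processed"):
        # exist_ok lets the helper be re-run on an existing tree.
        (top / name).mkdir(parents=True, exist_ok=True)
    return top


if __name__ == "__main__":
    make_layout("Top")
```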

On the command line (on Windows, this may be PowerShell, Anaconda Prompt, or a terminal within an IDE), from the top of the directory, run

python /path/to/preprocessing_ms2recal.py --db ./myfasta.fasta --input ./rawfiles --outpath ./processed --xiconf /path/to/resources/xi_linear_by_tryp.conf --config /path/to/config.py

--input folder containing .raw files, or a single file to process

--db the .fasta file containing the sequences to be searched

--outpath directory for output; the default is a separate folder in the input directory

--config path to the config.py file

--xiconf path to a .conf file in the resources directory

The .conf file is a xi config file set up for a linear search with trypsin digestion. Config files for other proteases are available in the "resources" directory. Editing config files with custom settings is covered in the xiSEARCH documentation.
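When several datasets need processing, it can help to assemble the command above programmatically. The sketch below only builds the argument list from the flags documented here; the helper name `build_command` is hypothetical, and the paths are the placeholders used in the example command.

```python
import sys


def build_command(script, db, input_path, outpath, xiconf, config):
    """Return the argument list for one preprocessing run."""
    return [
        sys.executable, str(script),
        "--db", str(db),
        "--input", str(input_path),
        "--outpath", str(outpath),
        "--xiconf", str(xiconf),
        "--config", str(config),
    ]


cmd = build_command(
    script="/path/to/preprocessing_ms2recal.py",
    db="./myfasta.fasta",
    input_path="./rawfiles",
    outpath="./processed",
    xiconf="/path/to/resources/xi_linear_by_tryp.conf",
    config="/path/to/config.py",
)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the placeholder paths point at real files.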

The output directory contains several files:

  • peak files recalibrated according to the MS1 and MS2 errors (recal_*.mgf); these are the files to use in a crosslinking MS search with xiSEARCH
  • .csv files with the average MS1 and MS2 errors per raw file
  • images of the MS1 and MS2 error distributions; these should be symmetric and Gaussian in shape. If they are not, something may be wrong with the search or the acquisition.
  • peak files without any error recalibration (these retain the original file name)
  • a .csv file with the xiSEARCH output
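To pick out the recalibrated peak files for the subsequent crosslinking search, one option is to match the recal_*.mgf pattern in the output directory. This is a small sketch; the helper name `find_recalibrated` is hypothetical, and it assumes the recalibrated files sit directly in the output directory rather than in subfolders.

```python
from pathlib import Path


def find_recalibrated(outpath):
    """Return the recal_*.mgf files in outpath, sorted by name."""
    return sorted(Path(outpath).glob("recal_*.mgf"))


if __name__ == "__main__":
    for peak_file in find_recalibrated("./processed"):
        print(peak_file)
```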
