The proteomics from matthewhirschey

My quantitative proteomics analysis workflow

Hello everyone again!

As discussed in the Slack group, I think it would be good to open this issue to keep track of the ideas we are bringing up. Here I describe my approach to quantitative analysis, which starts from a different software package for peptide-spectrum matching (PSM), one that has quantification algorithms built in. Again, my idea is not to force everyone to change their approach and move from PD to Patternlab, but as we commented on Slack, it might be worth taking this into account.

I use the free software Patternlab for Proteomics for PSM identification and quantification from Thermo raw files. You can check it out here: http://www.patternlabforproteomics.org/
It uses Comet for the identification search.
It has its own quantification method (TFold), with a variable fold-change cut-off and a stringency criterion to pinpoint low-abundance proteins. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3371870/
It uses both spectral counts (normalized as NSAF) and extracted ion chromatograms (XIC).
It generates at least two tabulated outputs: one with the full list of identified proteins, and one with the list of proteins used for quantification (not all identified proteins are included in the quantification step). The latter output includes NSAF values and fold-change information that can be used afterward to filter the data.
Once I have these outputs, I import them into R and use Tidyverse packages for filtering and visualization (dynamic abundance plots including differential expression information, volcano plots, Venn diagrams, etc.).
Depending on the context, I filter the set of proteins (up- or downregulated, whole set of proteins, separate clusters, etc.) and perform enrichment analyses using clusterProfiler and ReactomePA. https://yulab-smu.github.io/clusterProfiler-book/
Finally, I visualize the enrichment results.
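
To make the NSAF normalization mentioned above concrete, here is a minimal sketch of the standard formula, NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j), followed by a naive log2 fold change between two runs. It is written in Python rather than R purely for illustration. It is not Patternlab's TFold method, and the protein names, spectral counts, and sequence lengths are made-up toy values.

```python
import math


def nsaf(spectral_counts, lengths):
    """Return the NSAF of each protein in a single run.

    spectral_counts: dict protein -> spectral count (SpC)
    lengths:         dict protein -> sequence length (L)
    """
    # Spectral abundance factor: counts corrected for protein length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    # Normalize so that the values of one run sum to 1.
    return {p: value / total for p, value in saf.items()}


def log2_fold_change(condition_a, condition_b):
    """Naive log2(B / A) per protein; assumes both values are non-zero."""
    return {p: math.log2(condition_b[p] / condition_a[p]) for p in condition_a}


# Hypothetical toy data, not taken from any real experiment.
lengths = {"P1": 100, "P2": 200, "P3": 400}
control = nsaf({"P1": 10, "P2": 20, "P3": 40}, lengths)
treated = nsaf({"P1": 40, "P2": 20, "P3": 40}, lengths)
fold_changes = log2_fold_change(control, treated)
```

Because NSAF values within a run sum to 1, an increase in one protein lowers the NSAF of all the others, which is worth keeping in mind when filtering on fold change alone.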

As we already discussed, I just wanted to share my experience. Since we already have the PD outputs available, the normalization steps needed to extract quantitative information from these data are not too complicated, and a t-test (with FDR correction?) should be enough to pinpoint proteins that are differentially expressed between biological conditions/TMT channels.
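
If we do go for an FDR correction, one common choice is the Benjamini-Hochberg procedure. The sketch below shows only the adjustment step, again in Python for illustration; in R, `p.adjust(p, method = "BH")` does the same job. It assumes the per-protein p-values have already been computed by a t-test, and the protein names and p-values are invented for the example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from per-protein t-tests (toy numbers).
proteins = ["P1", "P2", "P3", "P4"]
pvalues = [0.01, 0.04, 0.03, 0.005]
qvalues = benjamini_hochberg(pvalues)

# Keep proteins whose adjusted p-value is at or below an assumed 5% FDR.
significant = [p for p, q in zip(proteins, qvalues) if q <= 0.05]
```

The 5% threshold is only a conventional default; it could be combined with a fold-change cut-off to select the proteins that go into the enrichment analysis.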

Cheers
