- Shiny app
- Statistical analyses
- Other tasks
- Resources and references (Markdown, Docker, Git, ...)
The goal is to create tools for handling data generated by protein correlation profiling (PCP).
At the moment, this entails two main tasks (and several things that are difficult to categorize):
- A Shiny app for data wrangling and visual exploration
- Developing functions for statistical analyses
The DepLab package, developed by the Applied Bioinformatics Core at Weill Cornell Medicine, will serve as a starting point.
The DepLab package contains a Shiny app that allows for:
- upload of PCP data into a database
- smoothing of the data
- visual exploration of individual protein profiles
More details can be found in the manual.
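
As a rough illustration of what smoothing an elution profile can look like, the sketch below applies a centered moving average to a made-up intensity vector (one value per fraction). It is not DepLab's implementation: the function name `smooth_profile`, the window size, and the data are assumptions chosen for illustration only.

```r
# Hypothetical helper, not part of DepLab: centered moving average over fractions.
smooth_profile <- function(intensity, window = 3) {
  stopifnot(is.numeric(intensity), window %% 2 == 1, window <= length(intensity))
  weights <- rep(1 / window, window)
  # stats::filter() returns NA at both ends, where the window does not fit completely.
  smoothed <- as.numeric(stats::filter(intensity, weights, sides = 2))
  # Keep the raw values where no full window is available.
  smoothed[is.na(smoothed)] <- intensity[is.na(smoothed)]
  smoothed
}

# Made-up profile: one intensity value per fraction.
profile <- c(0, 1, 5, 20, 8, 2, 0, 0)
smooth_profile(profile, window = 3)
```

A larger window flattens the peak more strongly, so the choice of window is a trade-off between noise reduction and peak resolution.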
The Hackathon Shiny App can be found here.
It includes functions and examples for the following cool tasks:
- interactive graphics
- additional plots, e.g. histograms of QC values to allow for user-defined filtering [QC should definitely be part of the development]
- log files written when a user saves a plot, so that the exact same settings can be reloaded in the future
- connection to STRING, the database of protein interactions
Statistical analyses:
- Identify proteins whose profiles change between two (or more) conditions (taking the variability based on replicates into account)
  * some sort of ranking
  * statistical significance?
- Identify proteins that co-elute or change in the same/different way(s), i.e., proteins
  * that may be in the same complex
  * that may change the complex membership depending on the condition
  * ...
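
One simple way to look for co-eluting proteins is to correlate their profiles across fractions. The sketch below is only a starting point under stated assumptions: the matrix `profiles`, the protein names, and the cutoff of 0.9 are made up, and Pearson correlation is just one of several possible similarity measures.

```r
# Made-up example data: rows = proteins, columns = fractions.
profiles <- rbind(
  protA = c(0, 2, 10, 4, 1, 0),
  protB = c(0, 3, 12, 5, 1, 0),  # elutes similarly to protA
  protC = c(5, 1, 0, 0, 2, 9)    # elutes elsewhere
)
colnames(profiles) <- paste0("fraction_", seq_len(ncol(profiles)))

# cor() correlates columns, so transpose to compare proteins with each other.
profile_cor <- cor(t(profiles), method = "pearson")

# List each protein pair once if it exceeds an arbitrary, illustrative cutoff.
cutoff <- 0.9
hits <- which(profile_cor > cutoff & upper.tri(profile_cor), arr.ind = TRUE)
data.frame(
  protein_1   = rownames(profile_cor)[hits[, "row"]],
  protein_2   = colnames(profile_cor)[hits[, "col"]],
  correlation = profile_cor[hits]
)
```

Computing the same matrix per condition and comparing the results would be one possible route to spotting proteins whose complex membership changes.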
Other tasks:
- An R package containing the example data that we are going to work with
- Quality control, both visually and perhaps even by devising some sort of score?
  - per protein
  - reproducibility between replicates
  - how well are certain "gold-standard" complexes recovered?
- Updating the manual, making a proper vignette/tutorial (there should be at least one for every package)
- Implementing proper tests for the functions, e.g. using Hadley's testthat package
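
To give an idea of what such tests could look like, here is a minimal sketch that combines two of the tasks above: a small QC helper for replicate reproducibility and two testthat tests for it. The function `replicate_cor` is hypothetical and not an existing DepLab function; the data are made up.

```r
library(testthat)

# Hypothetical QC helper (not an existing DepLab function):
# Pearson correlation between two replicate profiles of the same protein.
replicate_cor <- function(rep1, rep2) {
  stopifnot(is.numeric(rep1), is.numeric(rep2), length(rep1) == length(rep2))
  cor(rep1, rep2, method = "pearson")
}

test_that("identical replicates are perfectly correlated", {
  profile <- c(0, 2, 10, 4, 1, 0)
  expect_equal(replicate_cor(profile, profile), 1)
})

test_that("profiles of different lengths are rejected", {
  expect_error(replicate_cor(c(1, 2, 3), c(1, 2)))
})
```

In a package, tests like these would normally live in files under `tests/testthat/`.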
Resources and references:
- MaxQuant - the software we rely on to produce our primary data
  - Brief description @ MPI website
  - GUI user guide with some info about the output
  - Andromeda paper
  - How to interpret MQ output
    - Nat. Methods MQ Practical Guide --> mostly Box 2
    - Quantitative Proteome Profiling, Methods in Mol Biol --> Section 3.4 Data Analysis Using MaxQuant contains many details about the MQ output (ignore the SILAC details)
    - Presentation with lots of MQ details
- Protein correlation profiling
- 15 minute interactive Git tutorial
- Longish Data camp course for using Git via RStudio
- briefer alternative
- to set up Git/RStudio, see Hadley Wickham's guide
- Brief summary of git lingo and commands with a focus on team work
- Brief intro
- The very long and detailed R page (I used it mostly as a reference for the formatting of the documentation)
The following list is based on the packages currently used in DepLab and is, of course, subject to change!
- R package creation and maintenance
  - devtools
  - roxygen2
  - testthat
- data wrangling
  - data.table
  - dplyr
- visualization
  - ggplot2