Jian Guo, Sam Shen, Tao Huan 07/04/2021
- Part 1: Introduction and installation
- Part 2: Extracting features using MS1 peak picking
- Part 3: Extracting features using MS2 recognition
- Part 4: Extracting features using a targeted list
- Part 5: Alignment
- Part 6: CAMERA annotation
- Part 7: MS2 annotation
- Part 8: Exporting EIC
- Part 9: Additional notes
JPA
is a comprehensive and integrated metabolomics data processing software. It extract both Gaussian and non-Gaussian shaped metabolic features. It also performs sample alignment, adduct and metabolite annotations. 'JPA' is written in R and its source code is publicly available at https://github.com/HuanLab/JPA.git. User can refer to the folder "example" for various types of untargeted metabolomics data processing. The demo data files are available in "Demo data.zip" for users to try different functions.
To install JPA
package R version 4.0.0 or above is required, and we
recommend using RStudio to complete the installation and usage of
JPA
by following the steps below:
# Install "BiocManager" package from CRAN if you do not already have it installed.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
# Install "devtools" package from CRAN if you do not already have it installed.
if (!requireNamespace("devtools", quietly = TRUE)){
install.packages("devtools")
}
# Load "devtools" package.
library(devtools)
# Install "JPA" from Github using "devtools".
if (!requireNamespace("JPA", quietly = TRUE)){
install_github("HuanLab/JPA")
}
# Load "JPA" package.
library(JPA)
JPA
supports multiple ways to extract metabolic features (labeled as "PP") using peak picking algorithms. Users can choose to extract metabolic features directly from raw LC-MS data using the embedded XCMS functions (Section 2.1). Users can also upload their own feature table in csv format generated by other commonly used data processing software (Section 2.2). In those cases, JPA
will extract additional metabolic features using XCMS peak picking algorithms.
For the rest of the tutorial, to view
details and additional parameters of functions, type:
help("<function name>")
. Note: for multi-sample analysis, sample
alignment is performed after feature extraction. It will be discussed in section 5.
One or multiple mzXML files from LC-MS analyses can be read in and processed to extract MS1 PP features. ‘JPA’ accepts raw data in any vendor format that is compatible with MS-Convert. All mzXML file(s) need to be placed in the same folder containing no other irrelevant mzXML files. Additional details of XCMS algorithms can be found at: https://rdrr.io/bioc/xcms/man/.
# dir specifies the full directory of the folder containing mzXML file(s).
dir = "X:/Users/JPAtest_20210330/multiDDA"
# The XCMS.featureTable() function outputs a dataframe formatted featuretable as well as an MSnbase object.
# The "level" column shows the level of each feature.
featureTable <- XCMS.featureTable(dir = dir, mz.tol = 10, ppm=10, peakwidth=c(5,20), mzdiff = 0.01,
snthresh = 6, integrate = 1, prefilter = c(3,100), noise = 100)
head(featureTable)
## mz rt rtmin rtmax maxo sample level
## F1 44.99865 1446.592 1403.316 1461.054 84702 3 PP
## F2 44.99869 1447.319 1430.959 1457.881 85878 1 PP
## F3 44.99874 1680.279 1676.622 1691.061 68456 1 PP
## F4 44.99877 1679.435 1675.546 1691.578 66910 2 PP
## F5 44.99878 1680.822 1676.406 1691.724 75678 3 PP
## F6 44.99880 1443.377 1435.238 1455.427 83670 2 PP
The input feature table contain only columns in the following order: m/z, retention time, min retention time, max retention time, intensity. Note: column 3 and column 4 are the retention time of the feature edges, and all three columns containing retention time information should be in seconds. For multi-sample analysis, a CSV feature table for each single mzXML file needs to be provided. Aligned feature table is not acceptable. All CSV file(s) of interest need to be placed in the same folder containing no other irrelevant CSV files.
An R code (convertCSV.R) is provided to help user transform the csv from MS-DIAL output to the format recognizable by 'JPA'. For other types of input, user can easily tune the code to adapt to converting different formats of csv files.
# ft_directory specifies the directory of the custom csv file(s).
FTdir = "X:/Users/JPAtest_20210330/multiDDA"
# dir specifies the full directory of the folder containing mzXML file(s).
dir = "X:/Users/JPAtest_20210330/multiDDA"
# Sample csv feature table.
setwd(FTdir)
head(read.csv(list.files(pattern = ".csv")[1], header = T, stringsAsFactors = F))
## mz rt rtmin rtmax Height
## 1 194.90540 25.444 19.459002 35.47100 5737.125
## 2 85.02976 29.384 7.153002 70.32402 1118.750
## 3 87.00925 29.384 7.531002 70.32402 2168.750
## 4 111.00890 29.384 7.153002 72.63300 4777.000
## 5 173.00910 29.384 4.920000 75.61200 1909.250
## 6 173.00920 29.384 5.726000 64.40298 1909.250
# The custom.featureTable() function outputs a dataframe formatted featuretable as well as an MSnbase object.
# The "level" column shows the level of each feature.
featureTable <- custom.featureTable(dir = dir, FTdir = FTdir)
head(featureTable)
## mz rt rtmin rtmax maxo sample level
## F1 44.99859 1444.718 1435.434 1469.897 82181.38 3 PP
## F2 44.99863 1272.449 1211.726 1281.954 77924.63 3 PP
## F3 44.99866 1680.282 1674.256 1693.137 43457.13 1 PP
## F4 44.99866 1680.282 1674.256 1693.137 43457.13 1 PP
## F5 44.99868 1274.340 1201.896 1290.801 80583.88 1 PP
## F6 44.99868 1274.340 1201.896 1290.801 80583.88 1 PP
After PP features have been extracted, extracting features using MS2 recognition (labeled as "MR") can be performed. This step is optional. Please do not use this function when processing full-scan or DIA data set!
The threshold used for determining whether the MR features are true positive can be estimated by using the R code "thresholdEstimate.R" provided on the GitHub.
# Find MR features and add them to the original feature table.
featureTable <- find.level3features(data = MSdata, level3.threshold = 2)
After extracting all PP and MR features, users may also extract features with a targeted list. If these features already exists in the original feature table as either PP or MR features, they will not be added. Otherwise, these features will be explored from the raw data and added to the feature table with “target” as their level if exist. The in-house targeted list must be in CSV format with ONLY the columns “mz” and “rt” in order. "rt" in seconds. This step is also optional.
The threshold used for determining whether the rescued targeted features are true positive can be estimated by using the R code "thresholdEstimate.R" provided on the GitHub.
# Directory containing the in-house standard metabolite library.
tarFTdir = "X:/Users/JPAtest_20210330"
# Name of the in-house standard metabolite library in CSV format.
tarFTname = "LibraryCSVHILIC-.csv"
# Sample format of in-house standard metabolite library.
setwd(tarFTdir)
head(read.csv(tarFTname, header = T, stringsAsFactors = F))
## mz rt
## 1 74.0237 642
## 2 87.0088 708
## 3 88.0393 444
## 4 88.0393 444
## 5 89.0233 468
## 6 101.0244 720
# Extract target feature and add them to the original featuretable if they do no exist already.
featureTable <- find.targetfeatures(data = MSdata, tarFTdir = tarFTdir, tarFTname = tarFTname, target.threshold = 2)
# Target features that are newly added.
head(featureTable[featureTable$level == "target",])
## mz rt rtmin rtmax maxo sample level
## F68 88.0393 444 444 444 402 1 target
## F69 88.0393 444 444 444 402 1 target
## F70 88.0393 444 444 444 330 2 target
## F71 88.0393 444 444 444 330 2 target
## F72 88.0393 444 444 444 512 3 target
## F73 88.0393 444 444 444 512 3 target
Sample alignment is performed after all level features have been extracted. Do not perform sample alignment for single-sample analysis.
# Update feature table to aligned feature table for multi-sample analysis.
featureTable <- peak.alignment(data = MSdata, bw = 5, minfrac = 0.5, mzwid = 0.015,
minsamp = 1, max = 100, quantitative.method = "maxo")
head(featureTable)
## mz rt rtmin rtmax NISTplasmaDDARP(-)1.mzXML
## F1 44.99871 1428.82483 1428.825 1440.01100 82195.130
## F2 44.99873 1680.73462 1670.793 1682.39300 43457.130
## F3 56.99601 1688.53400 1678.229 1690.69250 4393.375
## F4 56.99611 59.37352 54.225 64.03593 5108.125
## F5 60.99350 1686.36900 1678.229 1687.27600 2136.125
## F6 61.98857 1239.31238 1238.709 1239.31238 11225.500
## NISTplasmaDDARP(-)2.mzXML NISTplasmaDDARP(-)3.mzXML
## F1 65772.250 0.000
## F2 39078.130 49169.380
## F3 4224.875 4341.875
## F4 5882.500 5351.500
## F5 2079.500 1892.375
## F6 0.000 10942.250
Optionally, before the alignment, users can use CAMERA to identify adduct
and isotope relationships in the feature table. Note: CAMERA annotation
can ONLY be used for single-sample JPA
analysis.
# Perform CAMERA annotation.
featureTable <- adduct.isotope.annotation(data = MSdata, polarity = "negative")
After you ran the CAMERA annotation, you can perform the alignment function. Several (2+number of samples) new columns will be generated in the aligned feature table with names "isotope", "adducts", and the names of each sample file. The "isotope" column contains a value (0~1) representing the percentage of samples within which the feature is annotated as an isotopic feature. Each of the new column with sample name represents the PC group of this feature in the corresponding sample file. The annotated isotopic features are not removed.
Feature annotation is performed by two functions: “ms2.tofeaturetable” for assigning MS2 to each corresponding feature, and then “feature.annotation” for comparing each experimental MS2 against a standard MS2 library for spectra similarity analysis. The standard library used to perform annotation must be in .msp format.
# The ms2.tofeaturetable() function assigns MS2 spectra to the MS1 feature table. It returns a new feature table with additional columns containing MS2 fragment information.
featureTable <- ms2.tofeaturetable(data = MSdata, featureTable = featureTable)
# Now, use the feature.annotation() function to annotate features in the feature table against a standard database in msp format. This functions returns a feature table containing additional columns with annotation information.
lib_directory <- "X:/Users/Library" # directory containing the library file
lib_name <- "convertedLibraryNeg.msp" # name of the library file
featureTable <- feature.annotation(featureTable = featureTable, lib_directory = lib_directory, lib_name = lib_name, dp = 0.1)
# Display some annotated features.
head(featureTable[featureTable$Annotation != "unknown",])
## mz rt rtmin rtmax NISTplasmaDDARP(-)1.mzXML
## F22 96.96042 33.25266 32.06100 34.44432 0.000
## F23 96.96973 1464.73007 1462.40814 1467.05200 1568.000
## F38 112.98560 1439.97046 1431.12700 1443.37700 6372.125
## F60 129.05550 75.95423 70.53881 79.41445 10862.880
## F61 130.08775 48.60897 44.05600 50.92079 25484.000
## F70 137.02067 92.47435 91.01100 93.93770 10328.000
## NISTplasmaDDARP(-)2.mzXML NISTplasmaDDARP(-)3.mzXML MS2_match
## F22 20666.000 27628.00 TRUE
## F23 1624.125 0.00 TRUE
## F38 6304.625 0.00 TRUE
## F60 11561.880 10012.13 TRUE
## F61 19459.250 15849.88 TRUE
## F70 9604.000 0.00 TRUE
## MS2mz MS2int PeaksCount fromFile
## F22 60.9939;78.9591;79.9571;96.9602 838;1960;5378;6714 4 3
## F23 76.9707;78.9595 816;836 2 2
## F38 44.9989;68.996 1608;1328 2 3
## F60 44.999 900 1 2
## F61 44.999;130.0873 436;1650 2 1
## F70 65.041;76.9703;93.0346;94.9805 900;3602;10898;816 4 3
## Annotation DPscore
## F22 Phosphoric acid 0.9701192
## F23 Phosphoric acid 0.1128625
## F38 Trifluoroacetic acid 0.4502731
## F60 3-Methyl-2-oxovaleric acid 1.0000000
## F61 Unknown (carbon number 6) 0.9668160
## F70 Salicylic acid 0.9462781
EICs can be exported for metabolic features
# Directory to export EICs to.
plotdir <- "X:/Users/JPAtest_20210330"
# Draw and export EICs of all level PP features with MS2.
plot.features(dir = plotdir, featureTable = featureTable, data = MSdata, plot.type = 1, smooth = 2)
# Draw and export EICs of all level PP features without MS2.
plot.features(dir = plotdir, featureTable = featureTable, data = MSdata, plot.type = 2, smooth = 2)
# Draw and export EICs of all level MR features.
plot.features(dir = plotdir, featureTable = featureTable, data = MSdata, plot.type = 3, smooth = 2)
# Draw and export EICs of all level target features.
plot.features(dir = plotdir, featureTable = featureTable, data = MSdata, plot.type = "target", smooth = 2)
# ONLY for multi-sample analysis AND after alignment, draw and export EICs of all aligned features.
plot.features(dir = plotdir, featureTable = featureTable, data = MSdata, plot.type = "multi", smooth = 2)