emetabohub / ms-cleanr

R 100.00%

annotation lc-ms-data metabolomics

ms-cleanr's Introduction

MS-CleanR: A package for cleaning and annotating LC-MS data

The MS-CleanR package provides functions for feature filtering and annotation of LC-MS data.

See the publication and tutorials (pdf files included in the master branch) for more information.

Requires MS-DIAL (v4.00 or higher) and MS-FINDER (v3.30 or higher): http://prime.psc.riken.jp/compms/index.html

Short description:

MS-CleanR takes as input an MS-DIAL peak list processed in data-dependent analysis (DDA) or data-independent analysis (DIA) mode, using positive ionization (PI), negative ionization (NI), or both. First, MS-CleanR applies generic filters: blank injection signal subtraction, background ion drift removal, unusual mass defect filtering, a relative standard deviation (RSD) threshold based on sample class, and relative mass defect (RMD) window filtering. All of these options are tunable by the user. The second step clusters features using the MS-DIAL peak character estimation algorithm, then extracts parental signals using a multi-level optimization of modularity algorithm. Optionally, MS-CleanR can merge PI and NI data during this step. All selected features are then exported to MS-FINDER for in silico annotation using the hydrogen rearrangement rules (HRR) scoring system. At this step, multiple databases can be queried, and MS-CleanR handles each set of annotation results. The final step merges the annotation results into the filtered peak list, prioritizing database annotations according to the user's choice. Optionally, all results can be exported as an .msp file for mass spectral similarity networking.
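
To make two of the generic filters more concrete, here is a minimal sketch in base R of an RSD threshold and an RMD window applied to a toy peak list. It is not the package's implementation: the helper names `rsd_percent` and `rmd_ppm`, the column names, the 30 % RSD cut-off, and the definition of mass defect as the fractional part of m/z are all assumptions made for illustration.

```r
# Hypothetical helpers; these names are not taken from the mscleanr package.

# Relative standard deviation (%) of one feature's intensities within a class.
rsd_percent <- function(x) {
  100 * stats::sd(x) / mean(x)
}

# Relative mass defect in ppm, assuming mass defect = m/z minus its integer part.
rmd_ppm <- function(mz) {
  1e6 * (mz - floor(mz)) / mz
}

# Toy peak list: three features measured in three QC injections.
peaks <- data.frame(
  mz  = c(180.0634, 301.1410, 499.9800),
  qc1 = c(1000, 520,  90),
  qc2 = c(1100, 480, 400),
  qc3 = c( 950, 510,  10)
)

qc_cols <- c("qc1", "qc2", "qc3")
peaks$rsd <- apply(peaks[, qc_cols], 1, rsd_percent)
peaks$rmd <- rmd_ppm(peaks$mz)

# Keep features with RSD <= 30 % (illustrative) and RMD inside a 50-3000 ppm window.
keep <- peaks$rsd <= 30 & peaks$rmd >= 50 & peaks$rmd <= 3000
filtered <- peaks[keep, ]
```

In this toy data the third feature is dropped because its QC intensities vary too much; the first two pass both filters.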

Installation

devtools::install_github("eMetaboHUB/MS-CleanR")
library(mscleanr)
runGUI()

Known Bugs

First, carefully read the MS-DIAL/CleanR/FINDER tutorial.

Thanks to all users for their feedback!!

At least 3 blanks and 3 QC samples are needed for blank ratio analysis. These samples must be identified as such in the MS-DIAL sample list.
Avoid spaces and "-" in sample or class names.
Avoid class names with only one letter.
MS-CleanR handles LC-MS data acquired in DIA or DDA mode. All features without MS/MS will be discarded during the first step. If the data contain MS1 only, the first MS-CleanR step will crash.
"Error: the condition has length > 1" is encountered during database annotation merging when using R > 4.2.

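The naming and sample-count rules listed under Known Bugs can be checked before launching MS-CleanR. The following is a rough sketch only: `check_sample_table` is a hypothetical helper, and the column names `name`, `class`, and `type` as well as the labels "Blank" and "QC" are assumptions about how a sample list might be represented in R, not the package's own data structure.

```r
# Hypothetical checker; not part of mscleanr.
check_sample_table <- function(samples) {
  problems <- character(0)

  # Spaces and "-" in sample or class names should be avoided.
  bad_chars <- grepl("[ -]", samples$name) | grepl("[ -]", samples$class)
  if (any(bad_chars)) {
    problems <- c(problems, "names contain spaces or '-'")
  }

  # Class names with only one letter should be avoided.
  if (any(nchar(samples$class) < 2)) {
    problems <- c(problems, "class names with only one letter")
  }

  # Blank ratio analysis needs at least 3 blanks and 3 QCs.
  if (sum(samples$type == "Blank") < 3) {
    problems <- c(problems, "fewer than 3 blanks")
  }
  if (sum(samples$type == "QC") < 3) {
    problems <- c(problems, "fewer than 3 QCs")
  }

  problems
}

# Example sample list with two deliberate problems in the last row.
samples <- data.frame(
  name  = c("blk_1", "blk_2", "blk_3", "qc_1", "qc_2", "qc_3", "leaf-1"),
  class = c("blank", "blank", "blank", "qc", "qc", "qc", "L"),
  type  = c("Blank", "Blank", "Blank", "QC", "QC", "QC", "Sample"),
  stringsAsFactors = FALSE
)
issues <- check_sample_table(samples)
```

An empty result means none of the listed naming or count problems were detected; it does not guarantee that the run will succeed.
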
Last News

Because MS-DIAL 5.x is under active development and integrates part of MS-CleanR, this tool will no longer be maintained. We are preparing a new version that takes all the new MS-DIAL 5 features into account. We hope it will be published soon.

Citation

Publication link: https://pubs.acs.org/doi/abs/10.1021/acs.analchem.0c01594
MS-CleanR: A Feature-Filtering Workflow for Untargeted LC–MS Based Metabolomics
Ophélie Fraisier-Vannier, Justine Chervin, Guillaume Cabanac, Virginie Puech, Sylvie Fournier, Virginie Durand, Aurélien Amiel, Olivier André, Omar Abdelaziz Benamar, Bernard Dumas, Hiroshi Tsugawa, and Guillaume Marti
Analytical Chemistry, Article ASAP
DOI: 10.1021/acs.analchem.0c01594

Credits

Licence

GPL-3

ms-cleanr's Issues

Error: replacement has 1 row, data has 0

Hello, I am having a problem using the MScleanR generic filtration step from MS-DIAL export files because I continue to get the following Warning: Error in $<-.data.frame: replacement has 1 row, data has 0 [No stack trace available].

I ensured that groups were appropriately assigned during MS-DIAL analysis and that the alignment export reflects this.

Thanks in advance for your help!
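
For readers hitting the same message: in base R, this error appears when a value is assigned to a column of a data frame that has no rows. A plausible cause, which is an assumption and not confirmed by the report, is that the filters removed every feature. The toy data below only reproduces the message.

```r
features <- data.frame(mz = c(150.1, 275.2), height = c(800, 1200))

# A filter that removes every row leaves a zero-row data frame.
kept <- features[features$height > 5000, ]

# Assigning a length-one value to a new column of a zero-row data frame fails.
msg <- tryCatch(
  {
    kept$flag <- TRUE
    "no error"
  },
  error = function(e) conditionMessage(e)
)

# A simple guard that could be checked before any further processing.
has_rows <- nrow(kept) > 0
```

If the filters are indeed the cause, disabling them one at a time should show which one empties the peak list.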

Example parameter file

Is there an example parameter file that can be used to run MSClean-R? I found the parameter file exported from MS-DIAL v4.48 couldn't be recognized by the tool. Thanks in advance!

Error: only 0's may be mixed with negative subscripts

Hello,

I am getting the error message "only 0's may be mixed with negative subscripts" during the clean MS-DIAL data step, a couple of minutes after starting the process. Does anyone have an idea what kind of problem this indicates? I tried starting a new RStudio session with a clean environment.

Thank you,
Ville
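
As background, base R raises this kind of error when a single index vector mixes negative values (which drop elements) with positive ones (which select elements). The sketch below reproduces that situation on a toy vector; where it happens inside MS-CleanR is not known from the report.

```r
x <- c(10, 20, 30, 40)

# Negative indices drop elements, positive indices select them.
# Mixing the two in one index vector is an error.
mixed <- c(-1, 2)
result <- tryCatch(x[mixed], error = function(e) conditionMessage(e))

# A logical mask avoids the problem because it never mixes index signs.
drop <- x > 25
kept <- x[!drop]
```

Here `result` holds the error message instead of a subset, and `kept` holds the elements that pass the mask.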

Does MS-CleanR only accept project directory that has both pos and neg mode data?

Hi @guikool .

I tried MS-CleanR with the "DM0019. MS-DIAL demo files" dataset from the DROP Met data repository.
And I got the following error:

The "DM0019. MS-DIAL demo files" dataset only has neg mode data.
Do I need to look for other datasets that also have pos mode data (to try MS-CleanR)?

error in rowmeans

Hi,

I think I have a similar issue to #1
Do you have any guidance on which filter(s) to try singly to see if any features remain after cleaning?

Thanks!

Error in Negative mode

Hi,

I have tried running MS-CleanR with my MS-DIAL data.
I have both positive and negative datasets and I am trying to use them separately.
Everything went well with the positive mode dataset, but an error popped up whenever I tried to use the negative dataset.
The error is shown below.

Warning: Error in $<-.data.frame: replacement has 0 rows, data has 30997
[No stack trace available]

Publish as package on CRAN or bioconda

Hi! To make this package available to more people and easy to access and install, it would be great if the package could be published on CRAN or bioconda.

Together with that, it also makes sense to introduce versioning so that people can reproducibly process their data.

Duplicated results - error

Hi
I used your package to process negative ion mode data. I have 130 samples in several groups (~30).
I ran this command:
clean_msdial_data(
  filter_blk = T, filter_blk_threshold = 0.9, filter_blk_ghost_peaks = T,
  filter_mz = T,
  filter_rsd = T, filter_rsd_threshold = 200,
  filter_rmd = T, filter_rmd_range = c(50, 3000),
  threshold_mz = 0.01, threshold_rt = 0.1,
  user_pos_adducts_refs = NA, user_neg_adducts_refs = NA,
  user_pos_neutral_refs = NA, user_neg_neutral_refs = NA,
  compute_pearson_correlation = T, pearson_correlation_threshold = 0.97,
  pearson_p_value = 0.01
)
and I got these messages:
Warning messages:
1: In type.convert.default(unlist(x, use.names = FALSE)) :
'as.is' should be specified by the caller; using TRUE
2: In merge.data.frame(peak_data, height_data[, c("id", "ratio_Blank")], :
column names ‘Z.1’, ‘Z.1’, ‘Z.1’, ‘Z.1’, ‘Z.1’, ‘Z.1’, ‘Z.1’, ‘Z.1’, ‘B.1’, ‘Q.1’, ‘Z.2’, ‘Z.2’, ‘Z.2’, ‘B.2’, ‘Z.2’, ‘Q.2’, ‘Z.2’, ‘Z.2’, ‘Z.2’, ‘Z.2’, ‘Z.3’, ‘Z.3’, ‘Z.3’, ‘Z.3’, ‘Z.3’, ‘Z.3’, ‘B.3’, ‘Z.3’, ‘Z.3’, ‘Q.3’ are duplicated in the result
3: In merge.data.frame(peak_data, ghosts, by = "round_mz") :
column names ‘Z.1’, ‘Z.1’, ‘Z.1’, ‘Z.1’, ‘Z.1’, ‘Z.1’, ‘Z.1’, ‘Z.1’, ‘B.1’, ‘Q.1’, ‘Z.2’, ‘Z.2’, ‘Z.2’, ‘B.2’, ‘Z.2’, ‘Q.2’, ‘Z.2’, ‘Z.2’, ‘Z.2’, ‘Z.2’, ‘Z.3’, ‘Z.3’, ‘Z.3’, ‘Z.3’, ‘Z.3’, ‘Z.3’, ‘B.3’, ‘Z.3’, ‘Z.3’, ‘Q.3’ are duplicated in the result

Unfortunately, in the next step I got:
Error in [<-.data.frame(*tmp*, samples$Column_name, value = c(0.000193764208962931, :
repeted index for column
when I ran launch_msfinder_annotation.

Do you know how I can fix this problem?

Best regards
Darek
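
The warnings quoted in this report list column names such as 'Z.1' many times, which suggests that several columns ended up with the same name. A quick way to inspect this in base R is sketched below on a hypothetical vector of column names; whether duplicated sample or class names are the actual root cause here is an assumption.

```r
# Hypothetical column names, shaped like those in the quoted warning.
cols <- c("id", "Z.1", "Z.1", "B.1", "Q.1", "Z.2", "Z.2")

# Which names occur more than once?
dup_names <- unique(cols[duplicated(cols)])

# make.unique() appends numeric suffixes so that every name becomes distinct.
fixed <- make.unique(cols)
```

Renaming columns in R only hides the symptom. If the duplicates come from the sample list, giving every sample a distinct name in MS-DIAL before export is likely the cleaner fix.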

"Error: must have >1 column"

Hello, I can't complete the first (generic) filtration step from MS-DIAL output because I continue to receive the following
Warning: Error in Hmisc::rcorr: must have >1 column [No stack trace available]

I ensured that groups were appropriately assigned during MS-DIAL analysis and that the alignment export reflects this. I've also tried different file formats for the MS-DIAL output as input for MS-CleanR.

Thanks for your help!

Intensity of compound in each group in annotated_MS_peaks-cleaned.csv is far from that in raw data

Hi, I am using MS-CleanR to process metabonomics data. It is very convenient for data cleaning after MS-DIAL processing. However, I found that the intensity of each compound in each group in annotated_MS_peaks-cleaned.csv is very different from that in the raw data, for example the intensity in Height_xxx.txt exported from MS-DIAL. I would like to know what operations MS-CleanR applies to the raw intensities, and whether there is a way to keep the raw intensities in the MS-CleanR output. Thank you very much.

Time-consuming workflow

Hi @guikool, @gcabanac, @SyrupType
I'd like to ask you about the workflow for pos and neg ionisation.
I have an experiment with 60 samples in neg mode and 60 in pos mode. My computer (i9-10900X CPU @ 3.70GHz / 64GB RAM) took a very long time to process this dataset. How can I speed up this process? I added the settings for the first step.
Best regards
Darek

Error in rowMeans: 'x' must be numeric [No stack trace available]

I'm having this issue over and over after many attempts.

I'm following the tutorial Tutorial-MS-CleanR-v1.00e.pdf.
I have R version 3.6.1 and MS-Dial 4.16.

I suspect this error is being raised in line 202 of file R/msdial_functions.R since it's the only place that function rowMeans is being called. It appears variable x is not being cast as numeric before the rowMeans is called.

This is what is shown in the R console:

Warning: Error in `rowMeans`: 'x' must be numeric
  [No stack trace available]

I changed all my class names to address the concerns.
I also see the same message in the Shiny GUI. See screenshots below:

Would you know a solution for this problem?

Please advise.
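
For context, `rowMeans` raises this error whenever its input contains non-numeric columns, for example intensities that were read as text. The sketch below reproduces the message on toy data and shows one way to coerce columns first. The data frame and the "n/a" cell are invented for illustration, and this is not a patch to R/msdial_functions.R.

```r
# Intensities read as text, e.g. because of a stray non-numeric cell.
heights <- data.frame(
  s1 = c("1200", "340", "n/a"),
  s2 = c("1100", "360", "15"),
  stringsAsFactors = FALSE
)

err <- tryCatch(rowMeans(heights), error = function(e) conditionMessage(e))

# Convert each column explicitly; cells that cannot be parsed become NA.
numeric_heights <- as.data.frame(
  lapply(heights, function(col) suppressWarnings(as.numeric(col)))
)
means <- rowMeans(numeric_heights, na.rm = TRUE)
```

Checking `sapply(your_table, class)` on the imported MS-DIAL table is a quick way to see whether any intensity column was read as character.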

Help with Data Input and Processing in MS-CleanR from Agilent .D Format

Hello everyone,

I'm seeking guidance and assistance with introducing and processing data in the .D format generated by Agilent instruments in MS-CleanR. I would greatly appreciate it if someone could provide information on the format and file extension of data files that are compatible with MS-CleanR and how to carry out the conversion and processing of .D files in this tool.

Additionally, I have encountered the following error when attempting to load the data into MS-CleanR:

Error: A pos or a neg directory must be present. Check your data and your parameters. If the problem persists, you can contact [email protected].

My main questions include:

What is the required format and file extension for data input in MS-CleanR?

What is the recommended process for converting .D files generated by Agilent instruments into formats compatible with MS-CleanR, such as .mzML, .mzXML, or .cdf?

Are there specific steps or recommendations for preparing and organizing data before loading it into MS-CleanR?

Any information, resources, or examples related to data input and processing in MS-CleanR would be greatly appreciated by me and other community members who may have the same question.

Thank you in advance for your cooperation and assistance!

Best regards,
Javier Rodríguez
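
The quoted error message implies that MS-CleanR looks for a directory named pos and/or neg inside the project directory. A minimal way to check a project layout before launching the tool is sketched below. `find_mode_dirs` is a hypothetical helper and the exact directory names are inferred from the error message, so treat both as assumptions.

```r
# Hypothetical helper; not part of mscleanr.
find_mode_dirs <- function(project_dir) {
  modes <- c("pos", "neg")
  present <- modes[dir.exists(file.path(project_dir, modes))]
  if (length(present) == 0) {
    stop("No 'pos' or 'neg' directory found in ", project_dir)
  }
  present
}

# Demonstration in a temporary directory holding negative-mode data only.
project <- file.path(tempdir(), "mscleanr_demo")
dir.create(file.path(project, "neg"), recursive = TRUE, showWarnings = FALSE)
modes_found <- find_mode_dirs(project)
```

Note that, per the README, the input is an MS-DIAL peak list, so vendor files such as Agilent .D would first need to be processed in MS-DIAL.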

Error 'x' must be numeric

Hi all,

I got another error, described below.
All data were acquired in DDA mode and I used exactly the same settings as before.
It worked before, but not today.

Kim

Warning in names(dial)[names(dial) == class] <- paste0("avg_", corr[corr$Class == :
number of items to replace is not a multiple of replacement length
Warning in names(dial)[names(dial) == paste0(class, ".1")] <- paste0("sd_", :
number of items to replace is not a multiple of replacement length
Warning: Error in rowMeans: 'x' must be numeric