
workflow4metabolomics / tools-metabolomics

Galaxy tools for metabolomics maintained by Workflow4Metabolomics

Home Page: https://workflow4metabolomics.org/

License: GNU General Public License v3.0

Python 4.47% HTML 1.59% R 93.59% Shell 0.35%
metabolomics galaxy galaxy-project workflow4metabolomics usegalaxy mass-spectrometry nmr metabolomics-pipeline lcms gcms

tools-metabolomics's Introduction

Galaxy tools for metabolomics

Purpose

This repository aims to gather tools and contributors from the metabolomics world.

It is maintained by the Galaxy metabolomics community and is open to any contributor.

Tools should follow the IUC (Galaxy Intergalactic Utilities Commission) standards and best practices.

Team Members

  • Bjorn Gruening (@bgruening)
  • Gildas Le Corguillé (@lecorguille)
  • Guitton Yann (@yguitton)

New team members can be suggested by a PR against this file which needs to be approved by a majority of the current team members.

Workflow4Metabolomics

This repository was initiated by the Workflow4Metabolomics project. Workflow4Metabolomics, W4M for short, is a collective offering resources for processing, analyzing and annotating metabolomic data.

Related open source projects

Galaxy

Galaxy is an open, web-based platform for data intensive biomedical research. Whether on the free public server or your own instance, you can perform, reproduce, and share complete analyses.

Dependencies using Conda

Conda is a package, dependency and environment manager for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN, and more.

Planemo

Planemo is a set of command-line utilities to assist in developing Galaxy tools.

Other information

Job Dynamic Destination Mapping

Some tools implement a Job Dynamic Destination Mapping, like xcmsSet

Tool listing

The R-package CAMERA is a Collection of Algorithms for MEtabolite pRofile Annotation. Its primary purpose is the annotation and evaluation of LC-MS data. It includes algorithms for annotation of isotope peaks, adducts and fragments in peak lists. Additional methods cluster mass signals that originate from a single metabolite, based on rules for mass differences and peak shape comparison. To use the strength of already existing programs, CAMERA is designed to interact directly with processed peak data from the R-package xcms.

What does it do? The CAMERA annotation procedure can be split into two parts: first, determining which peaks originate from the same molecule; second, computing its exact mass and annotating the ion species. The CAMERA annotation workflow therefore contains the following primary functions:

  1. peak grouping after retention time (groupFWHM)
  2. peak group verification with peak shape correlation (groupCorr)

Both methods separate peaks into different groups, which we define as "pseudospectra". Those pseudospectra can consist of one up to 100 ions, depending on the molecule's amount and ionizability. Afterwards, the ion species can be determined with:

  1. annotation of possible isotopes (findIsotopes)
  2. annotation of adducts and calculation of hypothetical masses for the group (findAdducts)

This workflow results in a data frame similar to an xcms peak table, which can easily be stored as a comma-separated table .csv (Excel-readable). If you have two or more conditions, it will return a diffreport result within the annotation results. The diffreport result shows the most significant differences between two sets of samples. Optionally, extracted ion chromatograms can be created for the most significant differences.

This function checks annotations of ion species with the help of a sample from the opposite ion mode. As a first step, it searches for pseudospectra from the positive and the negative sample within a retention time window. For every result, the m/z differences between both samples are matched against specific rules, which are combinations of positive and negative ion species, for example M+H and M-H with an m/z difference of 2.014552. If two ions match such a difference, the ion annotations are changed (when the previous annotation is wrong), confirmed or added. Returns the peak list from one ion mode with recalculated annotations.
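The matching rule described above can be sketched in a few lines of Python. This is only an illustration, not CAMERA's implementation: the function name match_opposite_modes, the tuple layout and both tolerance values are assumptions made for the example; only the M+H / M-H difference of 2.014552 comes from the description.

```python
# Illustrative sketch only; names and tolerances are assumptions, not CAMERA's API.
MH_PAIR_DIFF = 2.014552  # m/z difference between M+H and M-H (from the description)


def match_opposite_modes(pos_peaks, neg_peaks, rt_window=5.0, mz_tol=0.005):
    """Return (pos_index, neg_index) pairs consistent with the M+H / M-H rule.

    Each peak is a (mz, rt) tuple; rt_window is in seconds, mz_tol in m/z units.
    """
    matches = []
    for i, (pos_mz, pos_rt) in enumerate(pos_peaks):
        for j, (neg_mz, neg_rt) in enumerate(neg_peaks):
            same_rt = abs(pos_rt - neg_rt) <= rt_window
            rule_ok = abs((pos_mz - neg_mz) - MH_PAIR_DIFF) <= mz_tol
            if same_rt and rule_ok:
                matches.append((i, j))
    return matches


pos = [(181.0707, 120.0), (250.1000, 300.0)]
neg = [(179.0561, 121.5), (400.2000, 300.0)]
pairs = match_opposite_modes(pos, neg)
```

Here only the first positive and first negative peak co-elute and differ by roughly 2.0146 m/z, so they are the only pair reported.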

xml macros for other camera repos

xml file describing dependencies for other camera repos

This tool takes as inputs either tabular table files from the metabolomic workflow (variableMetadata, dataMatrix and sampleMetadata) or a table file of your own and can execute three different functions ("sorting", "corrdel" and "corr_matrix").

The "sorting" function: used for metabolomic workflow

  1. First of all, it sorts the data by pcgroup.
  2. It computes the mean operation of all the signal values of the metabolites by sample, and put the results in a new column "signal_moy".
  3. It finally creates a tabular output "sorted_variableMetadata.tsv".

The "corrdel" function: used for metabolomic workflow

For each pcgroup of the previous sorted tabular file "sorted_table.tsv", it does the following things:

  • it computes a correlation matrix
  • it determines the metabolites which are not correlated to others from the same pcgroup based on the threshold value filled in the "Correlation threshold for pcgroup" parameter
  • the metabolites are sorted by mean signal intensity (from the highest to the lowest), and each metabolite is tested against the previous ones in the list; if the tested metabolite is correlated to at least one previous metabolite, it is tagged as DEL (for "deleted", written in a column called "suppress")

It creates two additional tabular files:

  • "correlation_matrix_selected.tsv" (correlation matrix of selected metabolites only)
  • "sif_table.tsv" (for visualization in CytoScape, based on selected metabolites and "Cytoscape correlation threshold" filled value)
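The "corrdel" selection within one pcgroup can be sketched as below. This is a rough Python illustration, not the tool's own code. It assumes that a metabolite is compared only with higher-ranked metabolites that were kept, that "correlated" means a Pearson coefficient at or above the threshold, and that no signal is constant; the function names are hypothetical.

```python
# Illustrative sketch of the "corrdel" selection; not the tool's actual code.
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def corrdel(signals, threshold):
    """Tag each metabolite of one pcgroup as kept ("") or "DEL".

    signals maps a metabolite name to its intensities across samples.
    Assumption: a metabolite is compared with the higher-ranked ones that were kept.
    """
    def mean_signal(name):
        return sum(signals[name]) / len(signals[name])

    ranked = sorted(signals, key=mean_signal, reverse=True)
    kept, suppress = [], {}
    for name in ranked:
        correlated = any(pearson(signals[name], signals[k]) >= threshold for k in kept)
        suppress[name] = "DEL" if correlated else ""
        if not correlated:
            kept.append(name)
    return suppress


group = {
    "M1": [10.0, 20.0, 30.0, 40.0],
    "M2": [1.0, 2.1, 2.9, 4.2],  # follows M1 closely
    "M3": [5.0, 1.0, 4.0, 2.0],  # unrelated profile
}
tags = corrdel(group, threshold=0.9)
```

In this toy pcgroup, M2 tracks the more intense M1 and is tagged DEL, while M3 is kept.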

The "corr_matrix" function: used for user table file

It computes a correlation matrix named "correlation_matrix.tsv" and creates a sif file named "sif_table.tsv" (for visualization in CytoScape).

Genform generates candidate molecular formulas from high-resolution MS data. It calculates match values (MV) that show how well candidate molecular formulas fit the MS isotope peak distributions (MS MV) and the high-resolution MS/MS fragment peak masses (MSMS MV). Finally it computes a combined match value from these two scores. This software can be regarded as a further development of the ElCoCo and MolForm modules of MOLGEN-MS with a clear specialization towards MS/MS.

Optimize free fluxes and optionally metabolite concentrations of a given static metabolic network defined in an FTBL file to fit 13C data provided in the same FTBL file.

IPO.ipo4xcmsSet A Tool for automated Optimization of XCMS Parameters

We strongly encourage you to read the documentation (https://isoplot.readthedocs.io/en/latest/) before using Isoplot.

Reads a set of XML-based mass-spectrometry data files and generates an MSnExp object. This function uses the functionality provided by the ‘mzR’ package to access data and meta data in ‘mzData’, ‘mzXML’ and ‘mzML’.

ASICS, based on a strong statistical theory, handles automatically the metabolite identification and quantification

BARSA is an automatic algorithm for bi-dimensional NMR spectra annotation

Spectra preprocessing

These steps correspond to the following steps in the PEPS-NMR R library (https://github.com/ManonMartin/PEPSNMR):

  • Group Delay suppression (First order phase correction)
  • Removal of solvent residuals signal from the FID
  • Apodization to increase the Signal-to-Noise ratio of the FID
  • Fourier transformation
  • Zero order phase correction
  • Shift referencing to calibrate the spectra with internal compound referencing
  • Baseline correction
  • Setting of negatives values to 0

NMR Read

Nuclear Magnetic Resonance Bruker files reading (from the PEPS-NMR R package (https://github.com/ManonMartin/PEPSNMR))

Normalization (operation applied on each (preprocessed) individual spectrum) of preprocessed data

xcms get sampleMetadata This tool generates a skeleton of sampleMetadata, possibly with some unusual sample names that are nevertheless compatible with xcms and R. This sampleMetadata file then has to be completed with extra information such as the class, batch information and, where relevant, conditions.

xcms fillChromPeaks Integrate areas of missing peaks For each sample, identify peak groups where that sample is not represented. For each of those peak groups, integrate the signal in the region of that peak group and create a new peak.
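The gap-filling idea can be illustrated with a small Python sketch. It is not xcms code: the data layout, the function name fill_missing_peaks and the use of a plain sum of intensities in place of a proper peak integration are all simplifying assumptions.

```python
# Illustrative sketch only; the data layout is an assumption, not xcms internals.
def fill_missing_peaks(features, detected, raw_points):
    """Integrate raw signal for features in which a sample has no detected peak.

    features:   {feature_id: (mz_min, mz_max, rt_min, rt_max)}
    detected:   set of feature ids already represented in this sample
    raw_points: list of (mz, rt, intensity) tuples from the sample's raw data
    Returns {feature_id: summed intensity} for the filled-in features only.
    """
    filled = {}
    for feature_id, (mz_min, mz_max, rt_min, rt_max) in features.items():
        if feature_id in detected:
            continue  # the sample already has a peak in this group
        filled[feature_id] = sum(
            intensity
            for mz, rt, intensity in raw_points
            if mz_min <= mz <= mz_max and rt_min <= rt <= rt_max
        )
    return filled


features = {"F1": (100.0, 100.1, 50.0, 60.0), "F2": (200.0, 200.1, 80.0, 90.0)}
raw = [(200.05, 82.0, 10.0), (200.02, 85.0, 15.0), (200.05, 95.0, 99.0), (100.05, 55.0, 7.0)]
filled = fill_missing_peaks(features, detected={"F1"}, raw_points=raw)
```

The sample already has a peak for F1, so only F2 is filled, using the two raw points that fall inside its m/z and retention time region.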

xcms groupChromPeaks

After peak identification with xcmsSet, this tool groups the peaks which represent the same analyte across samples using overlapping m/z bins and calculation of smoothed peak distributions in chromatographic time. Allows rejection of features, which are only partially detected within the replicates of a sample class.
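The rejection of partially detected features can be sketched as follows. The condition mirrors the expression gcount >= classnum * minfrac & gcount >= minsamp that appears in the xcms.group error message quoted further down this page; the Python function name, the argument layout and the default values are assumptions for illustration.

```python
# Illustrative sketch only; names and defaults are assumptions.
def keep_feature(counts_per_class, class_sizes, minfrac=0.5, minsamp=1):
    """Keep a feature if at least one sample class has enough samples with a peak.

    counts_per_class: {class_name: number of samples of that class with a peak}
    class_sizes:      {class_name: total number of samples in that class}
    """
    return any(
        count >= class_sizes[cls] * minfrac and count >= minsamp
        for cls, count in counts_per_class.items()
    )


class_sizes = {"QC": 4, "sample": 6}
kept = keep_feature({"QC": 3, "sample": 1}, class_sizes)
rejected = keep_feature({"QC": 1, "sample": 2}, class_sizes)
```

The first feature is present in 3 of 4 QC samples and is kept; the second does not reach half of any class and is rejected.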

xml macros for other xcms repos

xcms findChromPeaks Merger This tool allows you to run one xcms findChromPeaks process per sample in parallel and then to merge all RData images into one. The result is then suitable for xcms groupChromPeaks. You can provide a sampleMetadata table to attribute phenotypic values to your samples.

xcms plot chromatogram This tool will plot Base Peak Intensity chromatogram (BPI) and Total Ion Current chromatogram (TIC) from xcms experiments.

xcms refineChromPeaks

After peak identification with xcms findChromPeaks (xcmsSet), this tool refines those peaks. It either removes peaks that are too wide or removes peaks with too low intensity or combines peaks that are too close together. Note well that refineChromPeaks methods will always remove feature definitions, because a call to this method can change or remove identified chromatographic peaks, which may be part of features. Therefore it must only be run immediately after findChromPeaks (xcmsSet).

xml file describing dependencies for other xcms repos

xcms adjustRtime After matching peaks into groups, xcms can use those groups to identify and correct correlated drifts in retention time from run to run. The aligned peaks can then be used for a second pass of peak grouping which will be more accurate than the first. The whole process can be repeated in an iterative fashion. Not all peak groups will be helpful for identifying retention time drifts. Some groups may be missing peaks from a large fraction of samples and thus provide an incomplete picture of the drift at that time point. Still others may contain multiple peaks from the same sample, which is a sign of improper grouping.
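Selecting the peak groups that are useful for drift estimation can be sketched as below. This is a simplified Python illustration rather than the xcms algorithm: the function name and the two thresholds (missing, extra) are hypothetical stand-ins for "too many samples without a peak" and "too many peaks from the same sample".

```python
# Illustrative sketch only; function and parameter names are assumptions.
def well_behaved_groups(groups, n_samples, missing=1, extra=1):
    """Select peak groups usable for estimating retention time drift.

    groups:  {group_id: list of sample indices, one entry per peak in the group}
    missing: maximum number of samples allowed to have no peak in the group
    extra:   maximum number of surplus peaks (several peaks from one sample)
    """
    selected = []
    for group_id, samples in groups.items():
        n_present = len(set(samples))
        n_missing = n_samples - n_present
        n_extra = len(samples) - n_present
        if n_missing <= missing and n_extra <= extra:
            selected.append(group_id)
    return selected


groups = {
    "G1": [0, 1, 2, 3],        # one peak per sample
    "G2": [0, 1],              # two of four samples missing
    "G3": [0, 0, 0, 1, 2, 3],  # three peaks from sample 0
}
usable = well_behaved_groups(groups, n_samples=4)
```

Only G1 passes both checks; G2 gives an incomplete picture of the drift and G3 looks like improper grouping.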

xcms process history This tool provides an HTML summary of your analysis using the [W4M] XCMS and CAMERA tools

test data repo for xcms tool suite

xcms findChromPeaks This tool is used for preprocessing data from multiple LC/MS files (NetCDF, mzXML and mzData formats) using the xcms R package. It extracts ions from each sample independently and, using a statistical model, filters and integrates peaks. A tutorial on how to perform xcms preprocessing is available from the GTN (Galaxy Training Network).

Historic contributors (not cited by GitHub)

tools-metabolomics's People

Contributors

bernt-matthias, bgruening, chufz, eschen42, fgiacomoni, foellmelanie, lain-inrae, lecorguille, llegregam, manonmartin, melpetera, mmonsoor, mtremblayfr, ofilangi, pkrog, rjmw, sgsokol, sneumann, yguitton

tools-metabolomics's Issues

XCMS3: review findChromPeaks/xcmsSet parameters

findChromPeaks/xcmsSet

General

  • Advanced Options
    • MSnBase::filterAcquisitionNum (previously scanrange)
    • MSnBase::filterRt
    • MSnBase::filterMz

CentWaveParam()

  • Basic Options
    • ppm: 25
    • peakwidth: 20, 50
  • Advanced Options
    • snthresh: 10
    • prefilter: 3, 100
    • mzCenterFun: wMean
    • integrate: 1
    • mzdiff: -0.001
    • fitgauss: FALSE
    • noise: 0
    • verboseColumns: FALSE
      • List of regions-of-interest (ROI)
      • roiList length: 0
      • firstBaselineCheck TRUE
      • roiScales length: 0

CentWavePredIsoParam()

  • Basic Options
    • ppm: 25
    • peakwidth: 20, 50
  • Advanced Options
    • snthresh: 10
    • prefilter: 3, 100
    • mzCenterFun: wMean
    • integrate: 1
    • mzdiff: -0.001
    • fitgauss: FALSE
    • noise: 0
    • verboseColumns: FALSE
  • List of regions-of-interest (ROI)
    • roiList length: 0
    • firstBaselineCheck TRUE
    • roiScales length: 0
    • snthreshIsoROIs: 6.25
    • maxCharge: 3
    • maxIso: 5
    • mzIntervalExtension: TRUE
    • polarity: unknown

MatchedFilterParam()

  • Basic Options
    • fwhm: 30
    • binSize: 0.1 (previously step)
    • impute: none (previously profmethod)
      • baseValue:
      • distance:
  • Advanced Options
    • impute: none (previously profmethod)
      • baseValue:
      • distance:
    • sigma: 12.73994
    • max: 5
    • snthresh: 10
    • steps: 2
    • mzdiff: 0.6
  • Not yet integrated
    • index: FALSE
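The sigma value listed above (12.73994) does not look like an independent choice: it matches the fwhm of 30 divided by 2.3548, the usual rounded conversion factor between the full width at half maximum and the standard deviation of a Gaussian (2·sqrt(2·ln 2) ≈ 2.3548). The short check below is a sketch; that the default is computed with exactly this rounded constant is an assumption inferred from the numbers, and the function name is hypothetical.

```python
# Illustrative check; the rounded constant is an assumption inferred from the listed values.
from math import log, sqrt

FWHM_TO_SIGMA = 2.3548  # rounded value of 2 * sqrt(2 * ln 2)


def sigma_from_fwhm(fwhm):
    """Standard deviation of a Gaussian peak with the given full width at half maximum."""
    return fwhm / FWHM_TO_SIGMA


sigma = sigma_from_fwhm(30)
exact_factor = 2 * sqrt(2 * log(2))
```

If this relation holds, exposing fwhm alone may be enough for most users, with sigma left as an advanced override.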

MSWParam()

  • Basic Options
    • snthresh: 3
    • verboseColumns: FALSE
    • scales: 1,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,36,40,44,48,52,56,60,64
    • peakScaleRange: 5
    • ampTh: 0.01
  • Not yet integrated
    • nearbyPeak: TRUE
    • minNoiseLevel: 0.003333333
    • ridgeLength: 24
    • peakThr:
    • tuneIn: FALSE
  • Missing?
    • winSize_noise
    • SNR_method

MassifquantParam()

  • Not yet integrated
    • ppm: 25
    • peakwidth: 20, 50
    • snthresh: 10
    • prefilter: 3, 100
    • mzCenterFun: wMean
    • integrate: 1
    • mzdiff: -0.001
    • fitgauss: FALSE
    • noise: 0
    • verboseColumns: FALSE
    • criticalValue: 1.125
    • consecMissedLimit: 2
    • unions: 1
    • checkBack: 0
    • withWave: FALSE

I NEED YOU! :)

So dear @workflow4metabolomics/ms, (and if you have 5 minutes: @sneumann, @jotsetung) :

  • I need you to tell me which parameters we should expose by default or in the advanced section or not exposed at all.
  • I also need you because I can't find some parameters "Missing?"

If you want, you can use the same checklist as me (here) and move items.

Many thanks in advance

planemo / conda issues

  1. Background

Until September, we were able to manage functional tests and Docker builds using some wonderful Conda dependencies.
Since then, there has been a huge migration to R-3.3.1: bioconda/bioconda-recipes#2404
All our tools passed this migration.
But since then, conflicts between tool versions and channels mean that some tools come with R 3.2.2 and some with R 3.3.1: galaxyproject/planemo#604
An update of Conda within Galaxy should solve this issue...

Currently, as suggested in galaxyproject/tools-iuc#1071, I'm testing the latest version of Miniconda: miniconda3-4.2.12

  1. miniconda installation
wget -q --recursive 'https://repo.continuum.io/miniconda/Miniconda3-4.2.12-Linux-x86_64.sh'
bash 'Miniconda3-4.2.12-Linux-x86_64.sh' -b -p /tmp/mc3-4.2.12/
  1. deps installation

planemo conda_install --conda_prefix /tmp/mc3-4.2.12/ .

👍 Good news: all tools are installed with their 3.3.1 version

  1. planemo test

planemo test --install_galaxy --conda_dependency_resolution --conda_prefix /tmp/mc3-4.2.12/

galaxy.tools.deps.conda_util DEBUG 2016-12-21 15:53:28,567 Executing command: /tmp/mc3-4.2.12/bin/conda list --name [email protected]_1 --export > /tmp/tmpjhqeXz/tmp/jobdepsdf01NEa817a1d312c412c739d4c357c9343718c2068f458808f083afdf1f251d82564c/[email protected]_1
requests.packages.urllib3.connectionpool INFO 2016-12-21 15:53:29,172 Starting new HTTP connection (1): localhost
requests.packages.urllib3.connectionpool DEBUG 2016-12-21 15:53:29,436 "GET /api/jobs/5729865256bc2525?key=89116108df0529eaf07c60bfbc2cd985 HTTP/1.1" 200 None
galaxy.tools.deps.conda_util DEBUG 2016-12-21 15:53:29,713 Executing command: /tmp/mc3-4.2.12/bin/conda create -y --unknown --offline --prefix /tmp/tmpjhqeXz/job_working_directory/000/2/conda-env --file /tmp/tmpjhqeXz/tmp/jobdepsdf01NEa817a1d312c412c739d4c357c9343718c2068f458808f083afdf1f251d82564c/[email protected]_1 > /dev/null
requests.packages.urllib3.connectionpool INFO 2016-12-21 15:53:31,044 Starting new HTTP connection (1): localhost
requests.packages.urllib3.connectionpool DEBUG 2016-12-21 15:53:31,297 "GET /api/jobs/5729865256bc2525?key=89116108df0529eaf07c60bfbc2cd985 HTTP/1.1" 200 None
galaxy.tools.deps.conda_util DEBUG 2016-12-21 15:53:32,447 Executing command: /tmp/mc3-4.2.12/bin/conda clean --tarballs -y
galaxy.tools.deps.conda_util DEBUG 2016-12-21 15:53:33,450 Executing command: /tmp/mc3-4.2.12/bin/conda list --name [email protected] --export > /tmp/tmpjhqeXz/tmp/jobdeps1Qjc8_89b20b2c5915d075d4a0c07dfb65cfb04a50060d7fadd95d12895e582b914f79/[email protected]
requests.packages.urllib3.connectionpool INFO 2016-12-21 15:53:34,506 Starting new HTTP connection (1): localhost
galaxy.tools.deps.conda_util DEBUG 2016-12-21 15:53:34,576 Executing command: /tmp/mc3-4.2.12/bin/conda install -y --unknown --offline --prefix /tmp/tmpjhqeXz/job_working_directory/000/2/conda-env --file /tmp/tmpjhqeXz/tmp/jobdeps1Qjc8_89b20b2c5915d075d4a0c07dfb65cfb04a50060d7fadd95d12895e582b914f79/[email protected] > /dev/null
requests.packages.urllib3.connectionpool DEBUG 2016-12-21 15:53:34,717 "GET /api/jobs/5729865256bc2525?key=89116108df0529eaf07c60bfbc2cd985 HTTP/1.1" 200 None
galaxy.tools.deps.conda_util DEBUG 2016-12-21 15:53:36,610 Executing command: /tmp/mc3-4.2.12/bin/conda clean --tarballs -y
galaxy.tools.deps.conda_util DEBUG 2016-12-21 15:53:37,592 Executing command: /tmp/mc3-4.2.12/bin/conda list --name [email protected]_4 --export > /tmp/tmpjhqeXz/tmp/jobdepsHNeXMre7325fdd48bc9d36b864284fa0ff5f09769060a0bf098ef4ab24c33996193d03/[email protected]_4
galaxy.tools.deps.conda_util DEBUG 2016-12-21 15:53:38,540 Executing command: /tmp/mc3-4.2.12/bin/conda install -y --unknown --offline --prefix /tmp/tmpjhqeXz/job_working_directory/000/2/conda-env --file /tmp/tmpjhqeXz/tmp/jobdepsHNeXMre7325fdd48bc9d36b864284fa0ff5f09769060a0bf098ef4ab24c33996193d03/[email protected]_4 > /dev/null
...

UnsatisfiableError: The following specifications were found to be in conflict:
  - bzip2 1.0.6 3
Use "conda info <package>" to see the dependencies for each package.


galaxy.tools.deps.conda_util DEBUG 2016-12-21 15:53:39,769 Executing command: /tmp/mc3-4.2.12/bin/conda clean --tarballs -y
galaxy.jobs.runners ERROR 2016-12-21 15:53:40,699 (2) Failure preparing job
Traceback (most recent call last):
  File "/tmp/tmpjhqeXz/galaxy-dev/lib/galaxy/jobs/runners/__init__.py", line 170, in prepare_job
    job_wrapper.prepare()
  File "/tmp/tmpjhqeXz/galaxy-dev/lib/galaxy/jobs/__init__.py", line 913, in prepare
    self.dependency_shell_commands = self.tool.build_dependency_shell_commands(job_directory=self.working_directory)
  File "/tmp/tmpjhqeXz/galaxy-dev/lib/galaxy/tools/__init__.py", line 1331, in build_dependency_shell_commands
    tool_instance=self
  File "/tmp/tmpjhqeXz/galaxy-dev/lib/galaxy/tools/deps/__init__.py", line 104, in dependency_shell_commands
    return [dependency.shell_commands(requirement) for requirement, dependency in requirement_to_dependency.items()]
  File "/tmp/tmpjhqeXz/galaxy-dev/lib/galaxy/tools/deps/resolvers/conda.py", line 245, in shell_commands
    self.build_environment()
  File "/tmp/tmpjhqeXz/galaxy-dev/lib/galaxy/tools/deps/resolvers/conda.py", line 240, in build_environment
    raise DependencyException("Conda dependency seemingly installed but failed to build job environment.")
DependencyException: Conda dependency seemingly installed but failed to build job environment.
  1. Investigation
/tmp/mc3-4.2.12/bin/conda list --name [email protected]_1 | grep bzip2
bzip2                     1.0.6                         3
/tmp/mc3-4.2.12/bin/conda list --name [email protected] | grep bzip2
bzip2                     1.0.6                         3
/tmp/mc3-4.2.12/bin/conda list --name [email protected]_4 | grep bzip2
bzip2                     1.0.6                         3
  1. Sad

😭

TIC/BPC - plot relative intensity

A request we had:

Hi, Is there a way to plot relative intensity on y axis in the BPC and TICs generated from xcmsSet, rather than total intensity?
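What is being asked for amounts to rescaling each chromatogram to its own maximum before plotting. A minimal Python sketch of that rescaling is shown here; the function name is hypothetical and this is not how the current tool draws its plots.

```python
# Illustrative sketch only; not part of the existing plotting code.
def to_relative_intensity(intensities):
    """Scale a chromatogram so that its highest point equals 100 (%)."""
    peak = max(intensities)
    if peak <= 0:
        raise ValueError("chromatogram has no positive intensity")
    return [100.0 * value / peak for value in intensities]


tic = [2.0e5, 8.0e5, 4.0e5]
relative = to_relative_intensity(tic)
```

With this scaling, samples of very different absolute intensity share the same 0 to 100 y axis.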

XCMS3: review adjustRtime/retcor Obiwarp parameters

adjustRtime/retcor Obiwarp

Obiwarp()

  • Basic Options
    • binSize: 1
  • Advanced Options
    • centerSample:
    • response: 1
    • distFun: cor_opt
    • gapInit: 0.3
    • gapExtend: 2.4
    • factorDiag: 2
    • factorGap: 1
    • localAlignment: FALSE
    • initPenalty: 0

I NEED YOU AGAIN! :)

The same as for findChromPeaks/xcmsSet parameters.
So far, we don't expose that many arguments for the Obiwarp method.
So, dear @workflow4metabolomics/ms, (and if you are interested: @sneumann, @jotsetung) :

  • I need you to tell me which parameters we should expose by default or in the advanced section or not exposed at all.

If you want, you can use the same checklist as me (here) and move items.

Many thanks in advance

Raise an error if no peaks were detected

#2 (comment)

Example:

	XSET OBJECT INFO
An "xcmsSet" object with 1 samples

Time range: Inf--Inf seconds (Inf--Inf minutes)
Mass range: Inf--Inf m/z
Peaks: 0 (about 0 per sample)
Peak Groups: 0 
Sample classes: .

Obtained with

	ARGUMENTS INFO
singlefile_galaxyPath	/export/galaxy-central/database/files/000/dataset_5.dat
singlefile_sampleName	MM8.mzML
xfunction	xcmsSet
xsetRdataOutput	/export/galaxy-central/database/files/000/dataset_7.dat
sampleMetadataOutput	/export/galaxy-central/database/files/000/dataset_8.dat
ticspdf	/export/galaxy-central/database/files/000/dataset_9.dat
bicspdf	/export/galaxy-central/database/files/000/dataset_10.dat
nSlaves	1
method	centWave
ppm	25
peakwidth	c(20, 50)

Need some parameter adjustments

Lack

  • xcmsSet.findPeaks.centWave mzCenterFun
  • xcmsSet.findPeaks.centWave fitgauss
  • xcmsSet.findPeaks.matchedFilter sigma
  • xcmsSet.findPeaks.matchedFilter mzdiff
  • group.density minsamp
  • retcor.obiwarp distFunc
  • retcor.obiwarp plottype
  • retcor.obiwarp gapInit
  • retcor.obiwarp gapExtend
  • retcor.obiwarp response
  • retcor.obiwarp factorDiag
  • retcor.obiwarp factorGap
  • retcor.obiwarp localAlignment

Wrong

  • xcmsSet.findPeaks.matchedFilter step is set at 0.01 but the default is 0.1
  • group.density max is set at 5 but default is 50

Other

  • xcmsSet.findPeaks.centWave integrate we have to add 1 - or 2 - in the option labels

Add minsamp argument in group

The group function parameters are not complete: the minsamp argument is missing. Can we add it to the advanced options list?

add unzip to the requirements of xcms.xcmsSet

Since unzip might not be available on a cluster environment (as in our case on CentOS) it should be in the requirements.

With zip file import the tool failed with a cryptic error:

find: `NA': No such file or directory
find: `/gpfs1/data/galaxy_server/galaxy/jobs_dir/006/6477/working/NA': No such file or directory
Warning message:
running command 'find $PWD/NA -not -name '\.*' -not -path '*conda-env*' -type f -name "*"' had status 1 
Error in xcmsSet(NA_character_, nSlaves = 1, method = "matchedFilter",  : 
  No NetCDF/mzXML/mzData/mzML files were found.
Calls: do.call -> do.call -> xcmsSet
Execution halted

The file list returned by unzip seems to be empty and therefore the root directory is undetermined. Checking for this might also be a good idea.
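The suggested check could look like the following Python sketch, which lists the archive content, fails early with a clear message when it is empty, and only then derives the root directories. It uses Python's standard zipfile module instead of the unzip command the wrapper relies on, and the function name is hypothetical.

```python
# Illustrative sketch only; the wrapper itself calls unzip, not Python's zipfile.
import io
import zipfile


def list_zip_inputs(zip_source):
    """Return (root_directories, file_names) of an archive, failing early if it is empty."""
    with zipfile.ZipFile(zip_source) as archive:
        files = [name for name in archive.namelist() if not name.endswith("/")]
    if not files:
        raise ValueError("The zip archive contains no files; cannot determine the root directory.")
    roots = sorted({name.split("/", 1)[0] for name in files if "/" in name})
    return roots, files


# Build a small archive in memory to exercise the check.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Data/QC/QC1.mzXML", "<mzXML/>")
    archive.writestr("Data/BAR/BAR1.mzXML", "<mzXML/>")
buffer.seek(0)
roots, files = list_zip_inputs(buffer)
```

An explicit error at this point would replace the cryptic find: `NA' message with one that names the real problem.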

@lecorguille maybe you can include this in your efforts while you are working on the merger anyway

scanrange option

ping @yguitton @jfrancoismartin

In the current wrapper version for xcmsSet, scanrange is only available for centWave. In ?xcmsSet, it seems that this obscure option should be available for all methods?

What do you think about that?

Allow as input 1 file zip or not

Report by Mickaël:

Galaxy unzips the dataset if there is only one file in it,
but the xcmsSet wrapper is not designed to deal with a single mzXML file.

Error when reading mzXML files (due to accent é è )

During their lifetime, MS files are sometimes stored under file paths containing accents (e.g. in French, metabolomics is "métabolomique").
When converting to mzXML (or another format), those paths are sometimes kept in the file and we get an error.

For example, the line <parentFile fileName="file:///D:/JPA/Laits-2015-06-10-Exactive (Metabolomique R�cap)/./211114031_S5_.raw" produces: invalid UTF-8 input in readChar()

Empty PDF

@yguitton - 22/02/16 to @lecorguille, @melpetera

Good evening,

I may have a lead: could you test by creating two classes, samples and pool, to see what happens?
Normally I had made the changes so that plotTIC and plotBPC handle the single-class case, but it may not be perfect.

Tell me if that fixes anything.
Yann

XCMS3: review fillChromPeaks/fillpeaks parameters

fillChromPeaks/fillpeaks

Parameters

  • Basic Options
  • Advanced Options
    • expandMz: 0
    • expandRt: 0
    • ppm: 0

  • Missing?
    • method: chrom or MSW

I STILL NEED YOU! :)

So dear @workflow4metabolomics/ms, (and if you have 1 minute: @sneumann, @jotsetung) :

  • I need you to tell me which parameters we should expose by default or in the advanced section or not exposed at all.
  • I'm also wondering if it's normal that I can't find any setting for the method chrom or MSW

If you want, you can use the same checklist as me (here) and move items.

Many thanks in advance

XCMS merger: sampleMetadata file generation if not as input

Dear Santa @lecorguille Claus,

When you use XCMS merger without providing any sampleMetadata file, this means that you have no ready "reference" file for the processing step following xcms analyses.

In addition, it is not always straightforward to construct the sampleMetadata file, given that sample identifiers are raw file names, which may be generated automatically by the instrument's software and can be unfriendly.

The possibility of using an empty sampleMetadata file that already contains the right identifiers, as provided with the zip option, is very handy: it saves time and significantly reduces case errors compared to manual listing.

For all these reasons, I would strongly recommend adding, when none is provided as input, a sampleMetadata file as output, with identifiers as the first column (of course), and maybe just a second column 'class' with a constant value (for example 'no groups' or 'single group').

Thank you for your time.
M. who behaved really well this year.
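The requested output could be produced along the lines of the Python sketch below: one row per raw file, with the identifier in the first column and a constant 'class' in the second. The header name of the first column, the default class value and the function name are assumptions; the real tool may also sanitize sample names to make them valid for R, which is not attempted here.

```python
# Illustrative sketch only; column header and default class value are assumptions.
import csv
import io
import os


def sample_metadata_skeleton(raw_files, default_class="no_group"):
    """Return a tab-separated sampleMetadata skeleton as text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sampleMetadata", "class"])
    for path in raw_files:
        # The identifier is the raw file name without directory and extension.
        sample_id = os.path.splitext(os.path.basename(path))[0]
        writer.writerow([sample_id, default_class])
    return out.getvalue()


skeleton = sample_metadata_skeleton(["QC/QC1.mzXML", "BAR/BAR1.mzXML"])
```

The user then only has to edit the class column instead of retyping machine-generated identifiers.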

scan range in xcmsSet not easy to use

Can we change the tool to use an RT range instead of a scan range?
It is sometimes confusing for users to enter a scan number instead of an RT. Users are used to cutting their chromatograms between min and max RT values, not between scan numbers. And sometimes the scan numbers read in in-silico viewers are not the ones read by xcms.
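One way to offer an RT range while still passing a scan range internally is to convert between the two using the retention time of each scan. The Python sketch below shows that conversion; the function name is hypothetical, and it assumes that scan retention times are available, sorted, and that scans are numbered from 1.

```python
# Illustrative sketch only; assumes sorted scan retention times and 1-based scan numbers.
from bisect import bisect_left, bisect_right


def rt_range_to_scan_range(scan_rts, rt_min, rt_max):
    """Convert an RT window into 1-based inclusive scan numbers."""
    first = bisect_left(scan_rts, rt_min)      # first scan with rt >= rt_min
    last = bisect_right(scan_rts, rt_max) - 1  # last scan with rt <= rt_max
    if first > last:
        raise ValueError("no scan falls inside the requested RT range")
    return first + 1, last + 1


scan_rts = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
scan_range = rt_range_to_scan_range(scan_rts, 1.0, 2.2)
```

Here scans 2 to 4 cover the retention times 1.0, 1.5 and 2.0, which are the ones inside the requested window.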

get datatypes into Galaxy

@lecorguille if we hurry up we might get the datatypes into 17.05 and ready for GCC :)
The unzip datatype can be removed, can't it? This should now be supported by Galaxy natively.

New column: "namecustom"

This issue concerns the peak table export in xcms.group and xcms.fillpeaks.

The idea is to:

  • keep default ion identifiers,
  • add a "namecustom" column corresponding to what the user requests regarding decimal places for mass and retention time, and conversion of RT into minutes.
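The two-column idea can be sketched as follows in Python. The default identifier format (mass and RT in seconds, both without decimals) and all function and parameter names are assumptions for illustration; the actual export is done in R by the xcms wrappers.

```python
# Illustrative sketch only; formats and names are assumptions.
def ion_names(mz, rt_seconds, mz_digits=2, rt_digits=1, rt_in_minutes=True):
    """Return (default_name, custom_name) for one ion."""
    # Default identifier: kept unchanged so that files stay consistent with each other.
    default_name = f"M{mz:.0f}T{rt_seconds:.0f}"
    # Custom identifier: follows the user's choices for decimals and RT unit.
    rt = rt_seconds / 60.0 if rt_in_minutes else rt_seconds
    custom_name = f"M{mz:.{mz_digits}f}T{rt:.{rt_digits}f}"
    return default_name, custom_name


names = ion_names(123.3218, 636.0)
```

Keeping the default identifier as the key and treating "namecustom" as a display column means downstream tools never see identifiers that depend on user formatting choices.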

xcmsSet - zip - space in folder name

An example:

  • Data FOO
    • QC
      • QC1.mzXML
      • QC2.mzXML
    • BAR
      • BAR1.mzXML
      • BAR2.mzXML
find: `FOO': No such file or directory
find: `2': No such file or directory
find: `/work/project/w4m/galaxy4metabolomics/galaxy-dist/database/jobs_directory/000/131/131287/working/FOO': No such file or directory
find: `2': No such file or directory
Warning message:
running command 'find $PWD/FOO 2 -not -name '\.*' -not -path '*conda-env*' -type f -name "*"' had status 1 
Error in checkForRemoteErrors(val) : 
  46 nodes produced errors; first error: invalid UTF-8 input in readChar()
Calls: do.call ... xcmsSet -> xcmsClusterApply -> checkForRemoteErrors
Execution halted
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Traceback (most recent call last):
  File "/work/project/w4m/galaxy4metabolomics/galaxy-dist/database/jobs_directory/000/131/131287/set_metadata_y5adti.py", line 1, in <module>
    from galaxy_ext.metadata.set_metadata import set_metadata; set_metadata()
  File "/work/project/w4m/galaxy4metabolomics/galaxy-dist/lib/galaxy_ext/metadata/set_metadata.py", line 14, in <module>
    import cPickle
ImportError: No module named cPickle
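The first errors in this log ("find: `FOO'" and "find: `2'") suggest that the directory name is split at the space because it reaches the shell unquoted. The Python sketch below shows the general remedy of quoting the path when building the command; it is an illustration only, since the wrapper builds its command in R, where the base function shQuote plays the role of shlex.quote. The function name and the example path are hypothetical.

```python
# Illustrative sketch only; the wrapper builds this command in R, not Python.
import shlex


def build_find_command(directory):
    """Build the find command with the directory quoted for the shell."""
    return (
        f"find {shlex.quote(directory)} -not -name '.*' "
        "-not -path '*conda-env*' -type f"
    )


command = build_find_command("/work/FOO 2")
```

Once quoted, the shell passes "/work/FOO 2" to find as a single argument instead of two.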

xcms_xcmsset gives empty result

When running xcms_xcmsset (revision 15646e937936) I get an empty result. The resulting dataset is marked as successful, but it's empty.

In the dataset preview I see the following text:

code for methods in class "Rcpp_Ramp" was not checked for suspicious field assignments (recommended package 'codetools' not available?)

I uploaded the input to https://oc.ufz.de/index.php/s/j4aPVY6iwlv6WU7 with password xcms.

Options:
Input: see OC
Scan range option: hide
Extraction method for peaks detection: centWave
Max tolerated ppm m/z deviation in consecutive scans in ppm: 25
Min,Max peak width in seconds: 20,50
Advanced options: hide

xcms.group error

I get the following error in xcms.group if I input a sample metadata file in xcms.xcmsSet Merger:

Error in if (!any(gcount >= classnum * minfrac & gcount >= minsamp)) next : 
  missing value where TRUE/FALSE needed
Calls: do.call ... do.call -> group.density -> group.density -> .local

Any idea what could cause such a problem?

Somehow I have the feeling that this might be related: #59

annotateDiffreport bug

OTRS - 2016112910000118
If you choose RT in minutes and want to use the annotateDiffreport output as variableMetadata for filtering or Quality Metrics, it crashes because the ion identifier stays in seconds, so the dataMatrix and variableMetadata files are inconsistent.
Thank you for your attention
JF

Travis test too long

I'm currently setting up Travis tests for our tools. All the tests need to run within 50 minutes, and if a test doesn't produce any log output for 10 minutes, it fails.

https://travis-ci.org/workflow4metabolomics/xcms/builds/124014197

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.

The build has been terminated

It seems that the group jobs built from my test cases take too long for Travis CI.

https://github.com/workflow4metabolomics/xcms/blob/master/galaxy/xcms_group/abims_xcms_group.xml#L100

@yguitton @sneumann or anyone else, do you know which method and/or parameters I can set to reduce execution time?

Or do I have to change my input dataset?

Thanks in advance

Pass the functional tests

My priority before carrying on with new developments is to pass the functional tests.
Until recently, the zip format was not sufficiently integrated in Galaxy. But as of 16.01, I have no more excuses 😄

  • xcms_xcmsset
  • xcms_group
  • xcms_retcor
  • xcms_fillpeaks
  • xcms_summary

xcmsSet , in file name

If a file with a ',' in the file name is used as input then the link is not created properly (the name of the link is the prefix of the filename up to the comma) and xcms does not find input data:

Error in xcmsSet(".", nSlaves = 1, method = "centWave", ppm = 25, peakwidth = c(10,  : 
  No NetCDF/mzXML/mzData/mzML files were found.

Seems to be related to: #65 (some escaping seems to be necessary).
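
A minimal sketch of one possible workaround, assuming the link name is derived directly from the input file name: replace every character outside a conservative whitelist before creating the link. The function name `safe_link_name` and the replacement rule are illustrative assumptions, not the tool's actual code.

```python
import re

def safe_link_name(filename: str) -> str:
    """Replace every character outside [A-Za-z0-9._-] with '_' (hypothetical rule)."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", filename)

# A file name containing a comma keeps its full stem and extension.
print(safe_link_name("sample,01.mzXML"))
```

With such a rule the link keeps the complete name instead of stopping at the comma, so the extension xcms looks for is preserved.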

Non-unique identifiers for ions

Sometimes there are redundant ion identifiers, especially when using RT in minutes (for example "M123.32T11" for two different ions with mass 123.32 and RT 10.6 min and 11.4 min).

Since identifiers are meant to be unique, something must be done (currently users add more decimal places to the mass, switch RT to seconds and/or modify the identifiers themselves).
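
To illustrate the collision and one possible remedy, here is a small sketch in plain Python. It assumes identifiers are built as "M<mz>T<rt>" from rounded values, as in the example above, and appends a numeric suffix to repeated names. The function `make_ids` and the suffix scheme are hypothetical, not the tool's current behaviour.

```python
from collections import Counter

def make_ids(peaks, mz_digits=2, rt_digits=0):
    """Build 'M<mz>T<rt>' identifiers; suffix repeated names with _1, _2, ..."""
    raw = [f"M{mz:.{mz_digits}f}T{rt:.{rt_digits}f}" for mz, rt in peaks]
    totals = Counter(raw)   # how often each rounded name occurs
    seen = Counter()        # running index for repeated names
    ids = []
    for name in raw:
        if totals[name] > 1:
            seen[name] += 1
            ids.append(f"{name}_{seen[name]}")
        else:
            ids.append(name)
    return ids

# Two ions with mass 123.32 at RT 10.6 min and 11.4 min both round to M123.32T11.
peaks = [(123.32, 10.6), (123.32, 11.4), (200.10, 5.2)]
print(make_ids(peaks))
# → ['M123.32T11_1', 'M123.32T11_2', 'M200.10T5']
```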

Bug with the deleteXmlBadCharacters step

With @sneumann we found that this step was originally designed to delete "é", "è" characters from <parentFile fileName="file://C:/data/métabo/foobar.RAW", the attribute recording where the raw files are stored.

2 solutions:

  • enhance this feature by targeting the fileName attribute more specifically
  • propose a PR on xcms to deal with those characters
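
The first option could look roughly like the sketch below, which drops non-ASCII characters only inside `fileName="..."` values and leaves the rest of the line untouched. This is an assumption about how the step might be narrowed, not the existing implementation; the `note` attribute in the sample line is made up purely to show that other attributes are not modified.

```python
import re

# Matches fileName="..." and captures the value separately.
FILENAME_ATTR = re.compile(r'(fileName=")([^"]*)(")')

def clean_filename_attrs(xml_text: str) -> str:
    """Drop non-ASCII characters only inside fileName attribute values."""
    def _strip(match):
        value = match.group(2).encode("ascii", "ignore").decode("ascii")
        return match.group(1) + value + match.group(3)
    return FILENAME_ATTR.sub(_strip, xml_text)

# 'note' is a hypothetical attribute, present only for the demonstration.
line = '<parentFile fileName="file://C:/data/métabo/foobar.RAW" note="é"/>'
print(clean_filename_attrs(line))
```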

New Chromatogram objects

Hi, tremendous work you're doing! Awesome!

Just wanted to point you to the new Chromatogram/Chromatograms classes in MSnbase. It is now very easy to extract ion chromatograms (or base peak or TIC). If you have an OnDiskMSnExp or MSnExp you can simply use the chromatogram method. This returns a Chromatograms object, which is simply a matrix-like object containing Chromatogram objects. Rows can be different slices (m/z, rt ranges) of the MS data, columns are for the individual samples/files.

The chromatogram method also has parameters mz and rt that allow you to restrict the extraction to a certain m/z-rt slice of the MS data. The aggregationFun parameter defines how signals for the same rt are handled: for a TIC you would use aggregationFun = "sum", for a BPC aggregationFun = "max".

You can then use the plot method to plot the chromatographic data.

Also have a look at ?MSnbase::chromatogram and ?xcms::chromatogram. For XCMSnExp objects there is an additional parameter adjustedRtime that allows you to specify whether the raw or adjusted retention time should be reported.
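
The difference between the two aggregation choices can be shown with a few lines of plain Python. This is only a conceptual sketch of "sum" versus "max" per retention time, not the MSnbase API; the `chromatogram` function and the toy data below are made up for the illustration.

```python
def chromatogram(spectra, aggregate=sum):
    """Collapse each spectrum's intensities to one value per retention time.

    spectra: list of (rt, [intensities]) pairs.
    aggregate: sum gives a TIC-like trace, max a BPC-like trace.
    """
    return [(rt, aggregate(intensities)) for rt, intensities in spectra]

# Toy data: two spectra with three intensity values each.
spectra = [(1.0, [10.0, 30.0, 5.0]), (2.0, [20.0, 20.0, 50.0])]
print(chromatogram(spectra, aggregate=sum))  # → [(1.0, 45.0), (2.0, 90.0)]
print(chromatogram(spectra, aggregate=max))  # → [(1.0, 30.0), (2.0, 50.0)]
```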

Decimal places in xcms.group graphical output

In the xcms.group graphical output (Rplots.pdf), plot names correspond to the mz slices. Mz values are written with 2 decimal places (for example "164.94 − 164.96").

The problem is that if you choose for any reason to consider narrow mz slices (changing mzwid to 0.005 for example), then you will get things like "164.95 − 164.95" and cannot tell whether the slice is close to your maximum mz width or not.

Would it be possible for the plot titles to have more decimal places? There is room for longer names, so maybe 4 decimal places would be ok?
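
The effect is easy to reproduce with a formatting sketch. The slice bounds below are invented to match a width of 0.005, and `slice_title` is a hypothetical helper, not the plotting code used by xcms.

```python
def slice_title(mz_min: float, mz_max: float, digits: int = 2) -> str:
    """Format an mz slice as 'min - max' with the given number of decimal places."""
    return f"{mz_min:.{digits}f} - {mz_max:.{digits}f}"

# With 2 decimal places the two bounds become indistinguishable.
print(slice_title(164.9475, 164.9525, digits=2))  # → 164.95 - 164.95
# With 4 decimal places the slice width is visible again.
print(slice_title(164.9475, 164.9525, digits=4))  # → 164.9475 - 164.9525
```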

TODO add verbose.column and fitgauss (for parameter completeness)

We have a user request regarding the xcmsSet fitgauss option and verbose.column=TRUE.

It should be quite easy to add those two options.

Note: be careful with verbose.column, as it adds new columns to the xset@peaks table. Those columns are only useful if people look at each file's peak table individually.

xcms.merger: raise an error when a sample isn't in the sampleMetadata

An example:

  • The sampleMetadata:
sampleMetadata	class	polarity	injOrder	sampleType	batch
20170209_P_Blanc12	blank	positive	38	blank	1
20170209_P_Blanc13	blank	positive	52	blank	1
20170209_P_QC01	pool	positive	11	pool	1
20170209_P_QC02	pool	positive	25	pool	1
20170418_P_S01n01	TV_nd	positive	58	sample	1
20170418_P_S01n02	TV_nd	positive	69	sample	1
  • The xcms.merger logs
	XSET OBJECT INFO
                    class
20170418_P_Blanc12   <NA>
20170418_P_Blanc13   <NA>
20170418_P_QC01      <NA>
20170418_P_QC02      <NA>
20170418_P_S01n01   TV_nd
20170418_P_S01n02   TV_nd

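Here the first four sample names in the xset object start with 20170418 whereas the sampleMetadata has 20170209, so their class is <NA>. The tool should check for this case and raise an error when it occurs.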
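
A minimal sketch of such a check, written in plain Python rather than in the tool's own code: compare the sample names of the xset object with the first column of the sampleMetadata and fail with a message listing the names that have no match. The function name `check_samples` and the error wording are assumptions.

```python
def check_samples(xset_samples, metadata_samples):
    """Raise ValueError listing every xset sample that is absent from sampleMetadata."""
    known = set(metadata_samples)
    missing = [name for name in xset_samples if name not in known]
    if missing:
        raise ValueError("Samples missing from sampleMetadata: " + ", ".join(missing))

# Sample names taken from the example above.
metadata = ["20170209_P_Blanc12", "20170209_P_Blanc13", "20170209_P_QC01",
            "20170209_P_QC02", "20170418_P_S01n01", "20170418_P_S01n02"]
xset = ["20170418_P_Blanc12", "20170418_P_Blanc13", "20170418_P_QC01",
        "20170418_P_QC02", "20170418_P_S01n01", "20170418_P_S01n02"]

try:
    check_samples(xset, metadata)
except ValueError as err:
    print(err)
```

Listing all unmatched names at once, rather than stopping at the first, lets the user fix the sampleMetadata in a single pass.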

Job runner configuration: Dynamic Destination Mapping

FYI: @chcaron @fgiacomoni

Since xcmsSet (2.1.0) can now accept either a zip file or an individual sample (single file), we can't use the same number of CPUs for the two types of input: \${GALAXY_SLOTS:-1}

There is a system called Dynamic Destination Mapping which allows one or another <destination> to be used according to some rules (in my case, the value of an argument).

For example, we will use:

  • thread1-men_free10 for single file
  • thread10-men_free10 for a zip file

I first tried the DTD method, but it currently needs some fixes to work with xcmsSet (PR in progress). It also requires release_16.07, which seems cool, but I don't know if my fix will be backported to 16.07.

I will have to take a look at the Python method.
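
As a rough idea of what the Python method involves: a rule boils down to a function that returns the id of a <destination>. The sketch below only shows that decision logic, using the two destination ids listed above. How the rule reads the tool parameter from the Galaxy job is deliberately left out, and the `input_kind` argument with its "single"/"zip" values is a hypothetical stand-in for it.

```python
# Destination ids taken from the list above.
SINGLE_FILE_DESTINATION = "thread1-men_free10"
ZIP_DESTINATION = "thread10-men_free10"

def xcmsset_destination(input_kind: str) -> str:
    """Return the destination id for the given kind of input ('single' or 'zip')."""
    if input_kind == "zip":
        return ZIP_DESTINATION
    # Anything else is treated as a single file and gets one thread.
    return SINGLE_FILE_DESTINATION

print(xcmsset_destination("zip"))     # → thread10-men_free10
print(xcmsset_destination("single"))  # → thread1-men_free10
```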

So W&S
