
lcmsmatching's Introduction



Our project

Workflow4Metabolomics (W4M) is a French infrastructure offering software tools for processing, analyzing and annotating metabolomics data. It is based on the Galaxy platform.

In the context of a collaboration between metabolomics (the MetaboHUB French infrastructure) and bioinformatics platforms (IFB: Institut Français de Bioinformatique), we have developed full LC/MS, GC/MS and NMR data-analysis pipelines in the Galaxy framework, covering preprocessing, normalization, quality control, statistical analysis and annotation steps. These modular and extensible workflows combine existing components (the XCMS and CAMERA packages, etc.) with a whole suite of complementary in-house tools. The implementation is accessible through a web interface, which guarantees the completeness of the parameters. The advanced features of Galaxy have made it possible to integrate components of different types and from different sources. An extensible Virtual Research Environment (VRE) is thus offered to the metabolomics community (platforms, end users, etc.), enabling the sharing of preconfigured workflows with both new users and experts in the field.

Citation

Giacomoni F., Le Corguillé G., Monsoor M., Landi M., Pericard P., Pétéra M., Duperier C., Tremblay-Franco M., Martin J.-F., Jacob D., Goulitquer S., Thévenot E.A. and Caron C. (2014). Workflow4Metabolomics: A collaborative research infrastructure for computational metabolomics. Bioinformatics, http://dx.doi.org/10.1093/bioinformatics/btu813

Galaxy

Galaxy is an open, web-based platform for data intensive biomedical research. Whether on the free public server or your own instance, you can perform, reproduce, and share complete analyses.

Homepage: https://galaxyproject.org/


How to contribute

Get our tools

All our tools are publicly available on GitHub and freely installable through the Galaxy ToolShed.

However, we will be glad to receive [good] feedback on their usage, in order to motivate us (and our funders).

It would also be great if you could cite our paper:

Franck Giacomoni, Gildas Le Corguillé, Misharl Monsoor, Marion Landi, Pierre Pericard, Mélanie Pétéra, Christophe Duperier, Marie Tremblay-Franco, Jean-François Martin, Daniel Jacob, Sophie Goulitquer, Etienne A. Thévenot and Christophe Caron (2014). Workflow4Metabolomics: A collaborative research infrastructure for computational metabolomics. Bioinformatics

doi:10.1093/bioinformatics/btu813

Push your tools / W4M as a Showcase

Your tools can be installed, integrated and hosted within the main W4M instance.

Quality standards

However, the tools must stick to the IUC standards in order to be easily integrated.

At first, your tools will be displayed in the Contribution section of the tool panel. Eventually, they may be promoted among the other tools.

Advanced mode

In order to be fully integrated into our reference workflows, your tools must follow our exchange formats between tools (for more information, contact us).

A collaboration should be established if help is needed!

Support / HelpDesk

In all cases, the tools must be maintained by their developers. A tool may be removed if this after-sales support is not provided.

Guidelines

lcmsmatching's People

Contributors

bernt-matthias, lecorguille, pkrog


Forkers

bernt-matthias

lcmsmatching's Issues

speed up HTML writing

The HTML output is currently written line by line into the output file.
Build the file content in memory instead and write it in a single shot.
This line-by-line writing could be related to the slow execution observed in the container.
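
A minimal R sketch of the proposed change, with illustrative names rather than the actual tool code: accumulate the HTML lines in a character vector and flush them with a single writeLines() call instead of writing line by line.

# rows: a list of character vectors, one per table row.

# Current pattern: one write per row.
write_html_slow <- function(rows, path) {
  con <- file(path, open = "w")
  on.exit(close(con))
  for (row in rows)
    cat("<tr><td>", paste(row, collapse = "</td><td>"), "</td></tr>\n",
        sep = "", file = con)
}

# Proposed pattern: build the whole content in memory, write it in one shot.
write_html_fast <- function(rows, path) {
  lines <- vapply(rows, function(row)
    paste0("<tr><td>", paste(row, collapse = "</td><td>"), "</td></tr>"),
    character(1))
  writeLines(lines, con = path)
}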

Chromatographic column names are ambiguous in Peakforest

The Peakforest column names displayed are ambiguous.
There is no indication of the platform/laboratory that entered the column into the database.
Even within a laboratory, there could be several installations of the same column.

List chrom cols tool

Create a separate tool (hence a separate XML file, list-chrom-cols.xml) for listing the chromatographic columns of a database.
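
A hedged R sketch of what the core of such a tool could do, assuming a single-file TSV database; the field name chromcol is an assumption, and the real implementation should go through the existing database classes.

# List the distinct chromatographic columns of a single-file (TSV) database.
list_chrom_cols <- function(db_file, col_field = "chromcol") {
  db <- read.table(db_file, header = TRUE, sep = "\t",
                   quote = "", comment.char = "", stringsAsFactors = FALSE)
  sort(unique(db[[col_field]]))
}

# Example: cat(list_chrom_cols("inhouse-db.tsv"), sep = "\n")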

Matching two variables at the same time

An idea from Stéphane Bernillon, INRA Bordeaux:

Hello Pierrick,
Thank you for the presentation of your matching tool for MS spectra.
If I understood correctly, the idea is to take the m/z values of the variables in the "Variable Metadata" table and compare them one by one with all the m/z values of a reference library.
I see two uses for this tool:

  • Annotating variables in a matrix that has already been annotated.

  • Annotating a matrix that has never been annotated.

In the first case, as it stands, the tool is completely satisfactory with an ad hoc in-house database, combining m/z and RT.

In the second case, it would seem interesting to me to use the retention-time information of the variables in the file.
Take the example of an unknown metabolite associated with the variables M100T1000 and M200T1002.
If I search successively for variable M100T1000 and then M200T1002, the proposed spectra will be less relevant than if I search for both variables M100T1000 and M200T1002 at the same time.
What remains is to find the right criterion for associating the variables, which would make it possible to search on a pseudo-spectrum rather than on a single m/z. A correlation coefficient could be that criterion.

I am available to discuss this in more detail if needed.
Best regards,
Stéphane
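
A minimal R sketch of the suggested direction, assuming the variable table provides m/z, retention time and per-sample intensities. The thresholds rt_tol and cor_min, and the naive single-linkage grouping, are illustrative assumptions rather than existing tool parameters; the CAMERA package performs a similar grouping with groupFWHM() and groupCorr().

# Illustrative sketch only: group variables into pseudo-spectra before matching.
# mz and rt are numeric vectors (one value per variable); intensities is a matrix
# with one row per variable and one column per sample.
group_pseudo_spectra <- function(mz, rt, intensities, rt_tol = 5, cor_min = 0.8) {
  n <- length(mz)
  group <- seq_len(n)
  for (i in seq_len(n - 1)) {
    for (j in seq(i + 1, n)) {
      close_rt <- abs(rt[i] - rt[j]) <= rt_tol
      correlated <- cor(intensities[i, ], intensities[j, ]) >= cor_min
      if (close_rt && correlated)
        group[j] <- group[i]  # naive single-linkage: j joins i's group
    }
  }
  split(seq_len(n), group)  # indices of the variables forming each pseudo-spectrum
}

Each pseudo-spectrum's set of m/z values could then be matched against the library together, instead of one m/z at a time.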

Conda dependency seemingly installed but failed to build job environment

Hello. Did anybody get this error:
Conda dependency seemingly installed but failed to build job environment.
?

From general Galaxy bug reports, it seems to be related to the default behaviour of Conda with job environments that take a long time to build (which might be the case here). However, it is only a guess. Here is the error in more detail:

Traceback (most recent call last):
File "/galaxy-central/lib/galaxy/jobs/runners/init.py", line 170, in prepare_job
job_wrapper.prepare()
File "/galaxy-central/lib/galaxy/jobs/init.py", line 971, in prepare
self.dependency_shell_commands = self.tool.build_dependency_shell_commands(job_directory=self.working_directory)
File "/galaxy-central/lib/galaxy/tools/init.py", line 1415, in build_dependency_shell_commands
tool_instance=self
File "/galaxy-central/lib/galaxy/tools/deps/init.py", line 112, in dependency_shell_commands
return [dependency.shell_commands(requirement) for requirement, dependency in requirement_to_dependency.items()]
File "/galaxy-central/lib/galaxy/tools/deps/resolvers/conda.py", line 392, in shell_commands
self.build_environment()
File "/galaxy-central/lib/galaxy/tools/deps/resolvers/conda.py", line 387, in build_environment
raise DependencyException("Conda dependency seemingly installed but failed to build job environment.")
DependencyException: Conda dependency seemingly installed but failed to build job environment.

Wrong reordering of column output

In the output, do not insert the new columns at the beginning of the table; append them at the end, in order not to disturb the role of the first column in W4M (variable names).
Add an option for this in the script.

Remove useless methods

Methods like getMoleculesIds(), getMoleculeNames() and others are deprecated. They are not used by the search-mz script. Remove them, or set them aside for the 4TabSql and Xls databases so they can later be ported into biodb.

Add a NA value for pos mode or neg mode

In the Galaxy tool page, in the fields "File database MS Positive mode" and "File database MS Negative mode", when only one mode name (either neg or pos) is found, add the missing mode or propose NA.

Append columns

Add an option for outputting the same input file with the new columns appended to it. Do not change the order of the input file's columns; write all columns as they are and only append the new columns.
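
A minimal R sketch of the requested behaviour, assuming the input table and the match results are data frames with the same row order (names are illustrative):

# Append the new columns at the end, leaving the input columns untouched.
append_match_columns <- function(input_table, match_columns) {
  cbind(input_table, match_columns)
}

# Instead of the current behaviour, which inserts the new columns first and
# shifts the variable-name column away from position one:
# cbind(match_columns, input_table)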

Integrate PeakForest compound information

When a match is found in PeakForest, only the spectrum information and the PeakForest compound ID are printed in the output tables.
We need to retrieve the compound information and add it to the output.
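
A hedged R sketch of the merging step only; retrieve_compound_info() and the column names are hypothetical placeholders, not the actual PeakForest API.

# Join retrieved compound records onto the match table by compound ID.
# retrieve_compound_info() is a hypothetical function returning a data frame
# with one row per compound ID and an "id" column; column names are assumptions.
add_compound_info <- function(match_table, retrieve_compound_info) {
  ids <- unique(match_table$peakforest.compound.id)
  compounds <- retrieve_compound_info(ids)
  merge(match_table, compounds,
        by.x = "peakforest.compound.id", by.y = "id",
        all.x = TRUE)  # keep rows whose compound info could not be retrieved
}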

Wrong link in HTML output when several results per row

In the main output, where several results can be shown on the same line, database IDs are listed using a character separator like "," or "|". The HTML output of this main table does not, however, take account of that, and thus displays a wrong URL link.
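
A minimal R sketch of the fix, splitting a multi-value ID cell on its separator before building one link per ID; the URL template and the default separator are placeholders, not the tool's actual values.

# Turn a cell like "ID1|ID2" into one <a> tag per ID instead of a single broken link.
ids_to_links <- function(id_cell, sep = "|",
                         url_tpl = "https://example.org/entry/%s") {
  ids <- strsplit(id_cell, sep, fixed = TRUE)[[1]]
  links <- sprintf('<a href="%s">%s</a>', sprintf(url_tpl, ids), ids)
  paste(links, collapse = ", ")
}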

Update README

  • Check the XML and particularly the help text.
  • Write all the changes made since version 3.4.3 in the README update section.

Remove ant

Try to remove the use of ant.
In particular, remove the need to run "ant test-data" in the test subdirectory.
Maybe store the generated files and stop generating them, or use a Makefile to regenerate them if needed.

Planemo test failing

See branch refact/makefile.

----------------------------------------------------------------------
XML: /private/var/folders/kd/nz_frc_x1231cz37xvyzc_v40000gn/T/tmpdrFQKD/xunit.xml
----------------------------------------------------------------------
Ran 1 test in 82.431s

FAILED (errors=1)
2017-03-26 18:22:45,435 INFO  [functional_tests.py] Shutting down
2017-03-26 18:22:45,435 INFO  [functional_tests.py] Shutting down embedded web server
2017-03-26 18:22:45,454 INFO  [functional_tests.py] Embedded web server stopped
2017-03-26 18:22:45,455 INFO  [functional_tests.py] Shutting down app
2017-03-26 18:22:45,455 INFO  [galaxy.jobs.handler] sending stop signal to worker thread
2017-03-26 18:22:45,455 INFO  [galaxy.jobs.handler] job handler queue stopped
2017-03-26 18:22:45,455 INFO  [galaxy.jobs.runners] TaskRunner: Sending stop signal to 2 worker threads
2017-03-26 18:22:45,455 INFO  [galaxy.jobs.runners] LocalRunner: Sending stop signal to 4 worker threads
2017-03-26 18:22:45,455 INFO  [galaxy.jobs.handler] sending stop signal to worker thread
2017-03-26 18:22:45,456 INFO  [galaxy.jobs.handler] job handler stop queue stopped
2017-03-26 18:22:45,459 INFO  [functional_tests.py] Embedded Universe application stopped
2017-03-26 18:22:45,460 INFO  [functional_tests.py] Cleaning up temporary files in /var/folders/kd/nz_frc_x1231cz37xvyzc_v40000gn/T/tmpSiNHxc/tmpma6L8B
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/paste/httpserver.py", line 1101, in serve_forever
    self.handle_request()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/SocketServer.py", line 276, in handle_request
    fd_sets = _eintr_retry(select.select, [self], [], [], timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/SocketServer.py", line 155, in _eintr_retry
    return func(*args)
error: (9, 'Bad file descriptor')

2017-03-26 18:22:45,462 INFO  [galaxy.jobs.handler] sending stop signal to worker thread
2017-03-26 18:22:45,463 INFO  [galaxy.jobs.handler] job handler queue stopped
2017-03-26 18:22:45,463 INFO  [galaxy.jobs.runners] TaskRunner: Sending stop signal to 2 worker threads
2017-03-26 18:22:45,463 INFO  [galaxy.jobs.runners] LocalRunner: Sending stop signal to 4 worker threads
2017-03-26 18:22:45,463 INFO  [galaxy.jobs.handler] sending stop signal to worker thread
2017-03-26 18:22:45,463 INFO  [galaxy.jobs.handler] job handler stop queue stopped
2017-03-26 18:22:45,465 ERROR [galaxy.jobs.runners.local] Job wrapper finish method failed
Traceback (most recent call last):
  File "/private/var/folders/kd/nz_frc_x1231cz37xvyzc_v40000gn/T/tmpdrFQKD/galaxy-dev/lib/galaxy/jobs/runners/local.py", line 128, in queue_job
    job_wrapper.finish( stdout, stderr, exit_code )
  File "/private/var/folders/kd/nz_frc_x1231cz37xvyzc_v40000gn/T/tmpdrFQKD/galaxy-dev/lib/galaxy/jobs/__init__.py", line 1362, in finish
    job.set_final_state( final_job_state )
  File "/private/var/folders/kd/nz_frc_x1231cz37xvyzc_v40000gn/T/tmpdrFQKD/galaxy-dev/lib/galaxy/model/__init__.py", line 686, in set_final_state
    if self.workflow_invocation_step:
  File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 237, in __get__
    return self.impl.get(instance_state(instance), dict_)
  File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 578, in get
    value = self.callable_(state, passive)
  File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/strategies.py", line 529, in _load_for_state
    return self._emit_lazyload(session, state, ident_key, passive)
  File "<string>", line 1, in <lambda>
  File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/strategies.py", line 599, in _emit_lazyload
    result = q.all()
  File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2399, in all
    return list(self)
  File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2516, in __iter__
    return self._execute_and_instances(context)
  File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2531, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
    context)
  File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
OperationalError: (sqlite3.OperationalError) disk I/O error [SQL: u'SELECT workflow_invocation_step.id AS workflow_invocation_step_id, workflow_invocation_step.create_time AS workflow_invocation_step_create_time, workflow_invocation_step.update_time AS workflow_invocation_step_update_time, workflow_invocation_step.workflow_invocation_id AS workflow_invocation_step_workflow_invocation_id, workflow_invocation_step.workflow_step_id AS workflow_invocation_step_workflow_step_id, workflow_invocation_step.job_id AS workflow_invocation_step_job_id, workflow_invocation_step.action AS workflow_invocation_step_action \nFROM workflow_invocation_step \nWHERE ? = workflow_invocation_step.job_id'] [parameters: (2,)]
There were problems with 1 test(s) - out of 1 test(s) executed. See /Users/pierrick/dev/lcmsmatching/tool_test_output.html for detailed breakdown.

Local continuous integration

Put in place a local continuous-integration system, in order to also test the Peakforest database, which requires an access token.
