lcmsmatching's Issues
Append columns
Add an option for outputting the same input file with the new columns appended to it. Do not change the order of the input file's columns; write all columns as they are and only append the new columns.
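A minimal sketch of the requested behaviour, assuming a tab-separated input file; the helper name and the `new_cols` mapping are hypothetical, not the tool's actual code:

```python
import csv
import io

def append_columns(in_text, new_cols):
    """Append new columns to a TSV, preserving original columns and their order.

    `new_cols` maps a new column name to a list of values, one per data row.
    (Hypothetical helper for illustration.)
    """
    rows = list(csv.reader(io.StringIO(in_text), delimiter="\t"))
    header, data = rows[0], rows[1:]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Original header untouched; new column names appended at the end.
    writer.writerow(header + list(new_cols))
    for i, row in enumerate(data):
        writer.writerow(row + [new_cols[name][i] for name in new_cols])
    return out.getvalue()
```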
List chrom cols tool
Create a separate tool (hence a separate XML file, list-chrom-cols.xml) for listing the chromatographic columns of a database.
Remove ant
Try to remove the use of ant.
In particular, remove the need to run ant test-data in the test subdirectory.
Maybe store the generated files and stop regenerating them, or use a Makefile to regenerate them if needed.
Local continuous integration
Set up a local continuous integration system in order to also test the Peakforest database, which requires an access token.
Correct Travis tests
Implement needed methods into MsBioDb
Conda dependency seemingly installed but failed to build job environment
Hello. Did anybody get this error?
Conda dependency seemingly installed but failed to build job environment.
From general Galaxy bug reports it seems to be related to the default behaviour of Conda with job environments that take a long time to build (which might be the case here). However, it's only a guess. Here it is in more detail:
Traceback (most recent call last):
File "/galaxy-central/lib/galaxy/jobs/runners/__init__.py", line 170, in prepare_job
job_wrapper.prepare()
File "/galaxy-central/lib/galaxy/jobs/__init__.py", line 971, in prepare
self.dependency_shell_commands = self.tool.build_dependency_shell_commands(job_directory=self.working_directory)
File "/galaxy-central/lib/galaxy/tools/__init__.py", line 1415, in build_dependency_shell_commands
tool_instance=self
File "/galaxy-central/lib/galaxy/tools/deps/__init__.py", line 112, in dependency_shell_commands
return [dependency.shell_commands(requirement) for requirement, dependency in requirement_to_dependency.items()]
File "/galaxy-central/lib/galaxy/tools/deps/resolvers/conda.py", line 392, in shell_commands
self.build_environment()
File "/galaxy-central/lib/galaxy/tools/deps/resolvers/conda.py", line 387, in build_environment
raise DependencyException("Conda dependency seemingly installed but failed to build job environment.")
DependencyException: Conda dependency seemingly installed but failed to build job environment.
UTF-8 chars issue in HTML output
In the HTML output, UTF-8 characters do not display correctly.
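A common cause is writing the HTML file without declaring its encoding. A minimal sketch of the usual fix, forcing UTF-8 both on disk and in the page's meta charset (the function and page skeleton are illustrative, not the tool's actual code):

```python
def write_html(path, body):
    """Write an HTML page so non-ASCII characters (e.g. "µ", "é") render correctly."""
    page = ('<!DOCTYPE html>\n<html><head>'
            '<meta charset="utf-8"/></head>'
            '<body>' + body + '</body></html>')
    # Force UTF-8 on disk instead of relying on the locale's default encoding.
    with open(path, "w", encoding="utf-8") as f:
        f.write(page)
```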
speed up HTML writing
The HTML output is currently written line by line into the output file.
Build the file in memory instead and write it in a single shot.
This line-by-line writing could be related to the slow execution observed in containers.
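A sketch of the in-memory approach: accumulate fragments in a list and join them into one string, so the file is written with a single call instead of one write per line (the table-building function is a hypothetical example):

```python
def build_html(rows):
    """Build an HTML table entirely in memory; many small write() calls can be
    slow, notably in containers with slow filesystem layers."""
    parts = ["<table>"]
    for row in rows:
        parts.append("<tr>" + "".join(f"<td>{cell}</td>" for cell in row) + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)  # one string, written to disk in a single shot

html = build_html([("100.1", "60"), ("200.2", "120")])
```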
Planemo test failing
See branch refact/makefile.
----------------------------------------------------------------------
XML: /private/var/folders/kd/nz_frc_x1231cz37xvyzc_v40000gn/T/tmpdrFQKD/xunit.xml
----------------------------------------------------------------------
Ran 1 test in 82.431s
FAILED (errors=1)
2017-03-26 18:22:45,435 INFO [functional_tests.py] Shutting down
2017-03-26 18:22:45,435 INFO [functional_tests.py] Shutting down embedded web server
2017-03-26 18:22:45,454 INFO [functional_tests.py] Embedded web server stopped
2017-03-26 18:22:45,455 INFO [functional_tests.py] Shutting down app
2017-03-26 18:22:45,455 INFO [galaxy.jobs.handler] sending stop signal to worker thread
2017-03-26 18:22:45,455 INFO [galaxy.jobs.handler] job handler queue stopped
2017-03-26 18:22:45,455 INFO [galaxy.jobs.runners] TaskRunner: Sending stop signal to 2 worker threads
2017-03-26 18:22:45,455 INFO [galaxy.jobs.runners] LocalRunner: Sending stop signal to 4 worker threads
2017-03-26 18:22:45,455 INFO [galaxy.jobs.handler] sending stop signal to worker thread
2017-03-26 18:22:45,456 INFO [galaxy.jobs.handler] job handler stop queue stopped
2017-03-26 18:22:45,459 INFO [functional_tests.py] Embedded Universe application stopped
2017-03-26 18:22:45,460 INFO [functional_tests.py] Cleaning up temporary files in /var/folders/kd/nz_frc_x1231cz37xvyzc_v40000gn/T/tmpSiNHxc/tmpma6L8B
Exception in thread Thread-3:
Traceback (most recent call last):
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/paste/httpserver.py", line 1101, in serve_forever
self.handle_request()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/SocketServer.py", line 276, in handle_request
fd_sets = _eintr_retry(select.select, [self], [], [], timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/SocketServer.py", line 155, in _eintr_retry
return func(*args)
error: (9, 'Bad file descriptor')
2017-03-26 18:22:45,462 INFO [galaxy.jobs.handler] sending stop signal to worker thread
2017-03-26 18:22:45,463 INFO [galaxy.jobs.handler] job handler queue stopped
2017-03-26 18:22:45,463 INFO [galaxy.jobs.runners] TaskRunner: Sending stop signal to 2 worker threads
2017-03-26 18:22:45,463 INFO [galaxy.jobs.runners] LocalRunner: Sending stop signal to 4 worker threads
2017-03-26 18:22:45,463 INFO [galaxy.jobs.handler] sending stop signal to worker thread
2017-03-26 18:22:45,463 INFO [galaxy.jobs.handler] job handler stop queue stopped
2017-03-26 18:22:45,465 ERROR [galaxy.jobs.runners.local] Job wrapper finish method failed
Traceback (most recent call last):
File "/private/var/folders/kd/nz_frc_x1231cz37xvyzc_v40000gn/T/tmpdrFQKD/galaxy-dev/lib/galaxy/jobs/runners/local.py", line 128, in queue_job
job_wrapper.finish( stdout, stderr, exit_code )
File "/private/var/folders/kd/nz_frc_x1231cz37xvyzc_v40000gn/T/tmpdrFQKD/galaxy-dev/lib/galaxy/jobs/__init__.py", line 1362, in finish
job.set_final_state( final_job_state )
File "/private/var/folders/kd/nz_frc_x1231cz37xvyzc_v40000gn/T/tmpdrFQKD/galaxy-dev/lib/galaxy/model/__init__.py", line 686, in set_final_state
if self.workflow_invocation_step:
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 237, in __get__
return self.impl.get(instance_state(instance), dict_)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 578, in get
value = self.callable_(state, passive)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/strategies.py", line 529, in _load_for_state
return self._emit_lazyload(session, state, ident_key, passive)
File "<string>", line 1, in <lambda>
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/strategies.py", line 599, in _emit_lazyload
result = q.all()
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2399, in all
return list(self)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2516, in __iter__
return self._execute_and_instances(context)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2531, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
return meth(self, multiparams, params)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
compiled_sql, distilled_params
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
context)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
exc_info
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
context)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
cursor.execute(statement, parameters)
OperationalError: (sqlite3.OperationalError) disk I/O error [SQL: u'SELECT workflow_invocation_step.id AS workflow_invocation_step_id, workflow_invocation_step.create_time AS workflow_invocation_step_create_time, workflow_invocation_step.update_time AS workflow_invocation_step_update_time, workflow_invocation_step.workflow_invocation_id AS workflow_invocation_step_workflow_invocation_id, workflow_invocation_step.workflow_step_id AS workflow_invocation_step_workflow_step_id, workflow_invocation_step.job_id AS workflow_invocation_step_job_id, workflow_invocation_step.action AS workflow_invocation_step_action \nFROM workflow_invocation_step \nWHERE ? = workflow_invocation_step.job_id'] [parameters: (2,)]
There were problems with 1 test(s) - out of 1 test(s) executed. See /Users/pierrick/dev/lcmsmatching/tool_test_output.html for detailed breakdown.
Wrong reordering of column output
In the output, do not insert columns at the beginning of the array; append them at the end, so as not to disturb the importance of the first column in W4M (variable names).
Add an option for that in the script.
Rename "Output settings" field values.
In the tools' XML config, change the field values of "Output settings" from "Off/On" to "Default/Customized".
Rename the database field tags properly
- MZTHEO -> MZREF
- COMPOUNDID -> SPECTRUMID
Chromatographic column names are ambiguous in Peakforest
The Peakforest column names displayed are ambiguous.
There is no indication of the platform/laboratory that entered the column into the database.
Even within a single laboratory, there may be several installations of the same column.
Remove customisation of output column names
Write tests for new python scripts
Matching with precursors gives empty result
Using the precursor matching option gives an empty result list.
NA values in M/Z column of input files give multiple matches with NA spectra
Each NA value in the M/Z column of the input file returns an NA match against all the peaks of the input database, leading to huge and cumbersome output files.
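A minimal sketch of the fix: drop (or report separately) input rows whose M/Z value is NA before matching, so they cannot pair with every database peak. The NA encodings and the helper name are assumptions for illustration:

```python
def filter_na_mz(rows, mz_col="mz"):
    """Split input rows into (kept, dropped) based on whether the M/Z value is
    usable. Assumes NA is encoded as "NA", "NaN" or an empty field."""
    kept, dropped = [], []
    for row in rows:
        if row.get(mz_col, "") in ("", "NA", "NaN"):
            dropped.append(row)   # report these separately instead of matching them
        else:
            kept.append(row)
    return kept, dropped

kept, dropped = filter_na_mz([{"mz": "100.1"}, {"mz": "NA"}, {"mz": ""}])
```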
Make planemo tests pass
HTML output contains "NA" links
In the HTML output, NA values for external databases are presented as clickable links.
Pubchem links not working
The Pubchem links for the HTML output do not work.
The output files must contain RT values in the same unit as the input file
Currently, all RT values in the output file are in seconds.
Output RT values must be in the same unit as the input file, unless specified otherwise.
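A sketch of the conversion step, assuming RT is kept in seconds internally and converted back to the input file's unit when writing results (the unit names and function are hypothetical):

```python
def rt_to_output_unit(rt_sec, unit):
    """Convert an internal RT value (seconds) to the unit used by the input file."""
    if unit == "min":
        return rt_sec / 60.0   # back to minutes for output
    if unit == "sec":
        return rt_sec          # already in the right unit
    raise ValueError(f"unknown RT unit: {unit}")
```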
Additional column values replaced by integers
While doing the PhenoMeNal demonstration, some additional columns (i.e. columns not used by the matching algorithm) had their values replaced by integers.
Wrong link in HTML output when several results per row
In the main output, where several results can be shown on the same line, database IDs are listed using a character separator such as "," or "|". However, the HTML rendering of this main table does not take that into account, and thus displays a wrong URL link.
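A sketch of the fix: split the cell on the separator and emit one link per ID instead of a single link built from the whole joined string. The URL template is a hypothetical example, and filtering "NA" also avoids the NA-link issue above:

```python
import re

def ids_to_links(cell, url_tpl="https://example.org/entry/{}"):
    """Turn a cell holding several database IDs ("ID1|ID2" or "ID1,ID2")
    into one <a> link per ID. Empty and NA entries produce no link."""
    ids = [i for i in re.split(r"[|,]", cell) if i and i != "NA"]
    return " ".join(f'<a href="{url_tpl.format(i)}">{i}</a>' for i in ids)
```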
Use text fields for in-house database column names
Improve Peakforest mz/rt matching speed
Output an information file
Write an information.txt file, like univariate, multivariate and biosigner do. Make it an option.
Add test for a file database with RT in minutes
Integrate PeakForest compound information
When a match is found in PeakForest, only the spectrum info and the PeakForest compound ID are printed in the output tables.
We need to retrieve the compound info and add it to the output.
Develop precursor match for biodb version
Re-enable XLS database
The XLS database test (DATABASES=xls make test) is failing.
Reorder file output in Galaxy
- dataMatrix
- sampleMetadata
- variableMetadata
Improve Peakforest mz matching speed
Replace the filedb example in the help (XML) with an extract from Massbank.
Add a NA value for pos mode or neg mode
In the Galaxy tool page, in the fields "File database MS Positive mode" and "File database MS Negative mode", when only one mode name is found (either neg or pos), add the missing mode or propose NA.
Add r-biodb to bioconda recipes
Handle # characters in values of .tsv files
By default, read.table() interprets # characters as the start of comments.
Use the comment.char option to disable interpretation of the # character, e.g. read.table(file, comment.char = "").
Remove useless methods
Methods like getMoleculesIds(), getMoleculeNames() and others are deprecated. They are not used by the search-mz script. Remove them, or set them aside for the 4TabSql and Xls databases for later porting into biodb.
Move all searchmz tests into test-searchmz
Improve launch of search-mz script
Even when search-mz is called with --help, startup takes too long.
Try to improve startup speed.
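One common cause of slow --help is loading heavy libraries before parsing arguments. A sketch of the usual remedy, in Python terms and with a placeholder "heavy" import, since the real script's dependencies differ:

```python
import argparse

def main(argv):
    # Parse arguments first: --help prints and exits here, before any
    # expensive library is loaded.
    parser = argparse.ArgumentParser(prog="search-mz")
    parser.add_argument("--input")
    args = parser.parse_args(argv)
    # Heavy imports are deferred until we know real work is needed.
    import json as heavy_dependency  # placeholder for a slow-to-load module
    return args.input
```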
Write tests for an input file with RT in minutes
Add examples to help page
Add examples for filedb, peakforest, xlsdb.
Try to improve help page
The help page is too big; try to make getopt present the options in subsections.
Create HTML output class
In tool page, present in-house database column names as separate text fields
Each column name is entered in a dedicated text field.
This avoids having to pass database field tag names on the command line (they could change).
The same must be done for the input file and the MS modes.
Update Galaxy help section
The Galaxy help section is outdated, at least regarding the column tags of the "Single file database" chapter.
For retention times, propose unit choice between minutes and seconds.
- Add RT unit field for database file, with choice between 'minutes' and 'seconds'.
- Add RT unit field for input file, with choice between 'minutes' and 'seconds'.
No more dynamic fields in XML
Remove all *.py scripts from the repository and the <code> tags in the XML files.
Matching two variables at the same time
An idea from Stéphane Bernillon, INRA Bordeaux:
Hello Pierrick,
Thank you for presenting your matching tool for MS spectra.
If I understood correctly, the idea is to take the m/z values of the variables in the "Variable Metadata" table and compare them one by one with all the m/z values of a reference library.
I see two uses for this tool:
- Annotating variables in a previously annotated matrix.
- Annotating a matrix that has never been annotated.
For the first case, as it stands, the tool is completely satisfactory with an ad hoc in-house database combining m/z and RT.
In the second case, it would seem interesting to me to use the retention time information of the variables in the file.
Take the example of an unknown metabolite associated with the variables M100T1000 and M200T1002.
If I search successively for M100T1000 and then for M200T1002, the proposed spectra will be less relevant than if I search for both variables M100T1000 and M200T1002 at the same time.
The right criterion for associating the variables remains to be found; this would make it possible to search on a pseudo-spectrum rather than on a single m/z. A correlation coefficient could be that criterion.
I am available to discuss this in more detail if necessary.
See you soon,
Stéphane
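The pseudo-spectrum idea above can be sketched as a grouping step: variables whose retention times are close and whose intensity profiles correlate are searched together. Everything here (thresholds, the variable tuple layout, the greedy grouping) is a hypothetical illustration, not the tool's algorithm:

```python
def correlation(xs, ys):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def group_variables(variables, rt_tol=5.0, cor_min=0.8):
    """Greedily group variables into pseudo-spectra.

    variables: list of (name, mz, rt, intensity_profile) tuples.
    A variable joins a group when its RT is within rt_tol of the group's
    first member and their intensity profiles correlate above cor_min.
    """
    groups = []
    for var in variables:
        name, mz, rt, prof = var
        for g in groups:
            _, _, g_rt, g_prof = g[0]
            if abs(rt - g_rt) <= rt_tol and correlation(prof, g_prof) >= cor_min:
                g.append(var)
                break
        else:
            groups.append([var])
    return groups
```

Each resulting group's m/z list could then be matched as one pseudo-spectrum instead of one m/z at a time.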
Update README
- Check XML and particularly the help text.
- Write all changes made since version 3.4.3 in the README update section.