lcmsmatching's Issues
Append columns
Add an option for outputting the same input file with the new columns appended to it. Do not change the order of the input file's columns; write all columns as they are and only append the new columns.
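A minimal sketch of the requested behaviour, assuming a tab-separated input file; the helper name and the `new_cols` mapping are hypothetical, not the tool's actual code:

```python
import csv
import io

def append_columns(in_text, new_cols):
    """Append new columns to a TSV, preserving original columns and their order.

    `new_cols` maps a new column name to a list of values, one per data row.
    (Hypothetical helper for illustration.)
    """
    rows = list(csv.reader(io.StringIO(in_text), delimiter="\t"))
    header, data = rows[0], rows[1:]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Original header untouched; new column names appended at the end.
    writer.writerow(header + list(new_cols))
    for i, row in enumerate(data):
        writer.writerow(row + [new_cols[name][i] for name in new_cols])
    return out.getvalue()
```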
List chrom cols tool
Create a separate tool (hence a separate XML file, list-chrom-cols.xml) for listing the chromatographic columns of a database.
Remove ant
Try to remove the use of ant.
In particular, remove the need to run ant test-data in the test subdirectory.
Maybe store the generated files and stop regenerating them, or use a Makefile to regenerate them if needed.
Local continuous integration
Set up a local continuous integration system in order to also test the Peakforest database, which requires an access token.
Correct Travis tests
Implement needed methods into MsBioDb
Conda dependency seemingly installed but failed to build job environment
Hello. Did anybody get this error?
Conda dependency seemingly installed but failed to build job environment.
From general Galaxy bug reports it seems to be related to the default behaviour of Conda with job environments that take a long time to build (which might be the case here). However, it's only a guess. Here it is in more detail:
Traceback (most recent call last):
File "/galaxy-central/lib/galaxy/jobs/runners/__init__.py", line 170, in prepare_job
job_wrapper.prepare()
File "/galaxy-central/lib/galaxy/jobs/__init__.py", line 971, in prepare
self.dependency_shell_commands = self.tool.build_dependency_shell_commands(job_directory=self.working_directory)
File "/galaxy-central/lib/galaxy/tools/__init__.py", line 1415, in build_dependency_shell_commands
tool_instance=self
File "/galaxy-central/lib/galaxy/tools/deps/__init__.py", line 112, in dependency_shell_commands
return [dependency.shell_commands(requirement) for requirement, dependency in requirement_to_dependency.items()]
File "/galaxy-central/lib/galaxy/tools/deps/resolvers/conda.py", line 392, in shell_commands
self.build_environment()
File "/galaxy-central/lib/galaxy/tools/deps/resolvers/conda.py", line 387, in build_environment
raise DependencyException("Conda dependency seemingly installed but failed to build job environment.")
DependencyException: Conda dependency seemingly installed but failed to build job environment.
UTF-8 chars issue in HTML output
In the HTML output, UTF-8 characters do not display correctly.
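A common cause is writing the HTML file without declaring its encoding. A minimal sketch of the usual fix, forcing UTF-8 both on disk and in the page's meta charset (the function and page skeleton are illustrative, not the tool's actual code):

```python
def write_html(path, body):
    """Write an HTML page so non-ASCII characters (e.g. "µ", "é") render correctly."""
    page = ('<!DOCTYPE html>\n<html><head>'
            '<meta charset="utf-8"/></head>'
            '<body>' + body + '</body></html>')
    # Force UTF-8 on disk instead of relying on the locale's default encoding.
    with open(path, "w", encoding="utf-8") as f:
        f.write(page)
```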
speed up HTML writing
The HTML output is currently written line by line into the output file.
Build the file in memory instead and write it in a single shot.
This line-by-line writing could be related to the slow execution observed in containers.
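A sketch of the in-memory approach: accumulate fragments in a list and join them into one string, so the file is written with a single call instead of one write per line (the table-building function is a hypothetical example):

```python
def build_html(rows):
    """Build an HTML table entirely in memory; many small write() calls can be
    slow, notably in containers with slow filesystem layers."""
    parts = ["<table>"]
    for row in rows:
        parts.append("<tr>" + "".join(f"<td>{cell}</td>" for cell in row) + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)  # one string, written to disk in a single shot

html = build_html([("100.1", "60"), ("200.2", "120")])
```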
Planemo test failing
See branch refact/makefile.
----------------------------------------------------------------------
XML: /private/var/folders/kd/nz_frc_x1231cz37xvyzc_v40000gn/T/tmpdrFQKD/xunit.xml
----------------------------------------------------------------------
Ran 1 test in 82.431s
FAILED (errors=1)
2017-03-26 18:22:45,435 INFO [functional_tests.py] Shutting down
2017-03-26 18:22:45,435 INFO [functional_tests.py] Shutting down embedded web server
2017-03-26 18:22:45,454 INFO [functional_tests.py] Embedded web server stopped
2017-03-26 18:22:45,455 INFO [functional_tests.py] Shutting down app
2017-03-26 18:22:45,455 INFO [galaxy.jobs.handler] sending stop signal to worker thread
2017-03-26 18:22:45,455 INFO [galaxy.jobs.handler] job handler queue stopped
2017-03-26 18:22:45,455 INFO [galaxy.jobs.runners] TaskRunner: Sending stop signal to 2 worker threads
2017-03-26 18:22:45,455 INFO [galaxy.jobs.runners] LocalRunner: Sending stop signal to 4 worker threads
2017-03-26 18:22:45,455 INFO [galaxy.jobs.handler] sending stop signal to worker thread
2017-03-26 18:22:45,456 INFO [galaxy.jobs.handler] job handler stop queue stopped
2017-03-26 18:22:45,459 INFO [functional_tests.py] Embedded Universe application stopped
2017-03-26 18:22:45,460 INFO [functional_tests.py] Cleaning up temporary files in /var/folders/kd/nz_frc_x1231cz37xvyzc_v40000gn/T/tmpSiNHxc/tmpma6L8B
Exception in thread Thread-3:
Traceback (most recent call last):
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/paste/httpserver.py", line 1101, in serve_forever
self.handle_request()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/SocketServer.py", line 276, in handle_request
fd_sets = _eintr_retry(select.select, [self], [], [], timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/SocketServer.py", line 155, in _eintr_retry
return func(*args)
error: (9, 'Bad file descriptor')
2017-03-26 18:22:45,462 INFO [galaxy.jobs.handler] sending stop signal to worker thread
2017-03-26 18:22:45,463 INFO [galaxy.jobs.handler] job handler queue stopped
2017-03-26 18:22:45,463 INFO [galaxy.jobs.runners] TaskRunner: Sending stop signal to 2 worker threads
2017-03-26 18:22:45,463 INFO [galaxy.jobs.runners] LocalRunner: Sending stop signal to 4 worker threads
2017-03-26 18:22:45,463 INFO [galaxy.jobs.handler] sending stop signal to worker thread
2017-03-26 18:22:45,463 INFO [galaxy.jobs.handler] job handler stop queue stopped
2017-03-26 18:22:45,465 ERROR [galaxy.jobs.runners.local] Job wrapper finish method failed
Traceback (most recent call last):
File "/private/var/folders/kd/nz_frc_x1231cz37xvyzc_v40000gn/T/tmpdrFQKD/galaxy-dev/lib/galaxy/jobs/runners/local.py", line 128, in queue_job
job_wrapper.finish( stdout, stderr, exit_code )
File "/private/var/folders/kd/nz_frc_x1231cz37xvyzc_v40000gn/T/tmpdrFQKD/galaxy-dev/lib/galaxy/jobs/__init__.py", line 1362, in finish
job.set_final_state( final_job_state )
File "/private/var/folders/kd/nz_frc_x1231cz37xvyzc_v40000gn/T/tmpdrFQKD/galaxy-dev/lib/galaxy/model/__init__.py", line 686, in set_final_state
if self.workflow_invocation_step:
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 237, in __get__
return self.impl.get(instance_state(instance), dict_)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 578, in get
value = self.callable_(state, passive)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/strategies.py", line 529, in _load_for_state
return self._emit_lazyload(session, state, ident_key, passive)
File "<string>", line 1, in <lambda>
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/strategies.py", line 599, in _emit_lazyload
result = q.all()
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2399, in all
return list(self)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2516, in __iter__
return self._execute_and_instances(context)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2531, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
return meth(self, multiparams, params)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
compiled_sql, distilled_params
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
context)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
exc_info
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
context)
File "/Users/pierrick/.planemo/gx_venv/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
cursor.execute(statement, parameters)
OperationalError: (sqlite3.OperationalError) disk I/O error [SQL: u'SELECT workflow_invocation_step.id AS workflow_invocation_step_id, workflow_invocation_step.create_time AS workflow_invocation_step_create_time, workflow_invocation_step.update_time AS workflow_invocation_step_update_time, workflow_invocation_step.workflow_invocation_id AS workflow_invocation_step_workflow_invocation_id, workflow_invocation_step.workflow_step_id AS workflow_invocation_step_workflow_step_id, workflow_invocation_step.job_id AS workflow_invocation_step_job_id, workflow_invocation_step.action AS workflow_invocation_step_action \nFROM workflow_invocation_step \nWHERE ? = workflow_invocation_step.job_id'] [parameters: (2,)]
There were problems with 1 test(s) - out of 1 test(s) executed. See /Users/pierrick/dev/lcmsmatching/tool_test_output.html for detailed breakdown.
Wrong reordering of column output
In the output, do not insert columns at the beginning of the array; append them at the end, so as not to disturb the importance of the first column in W4M (variable names).
Add an option for that in the script.
Rename "Output settings" field values.
In the tools' XML config, change the field values of "Output settings" from "Off/On" to "Default/Customized".
Rename the database field tags properly
- MZTHEO -> MZREF
- COMPOUNDID -> SPECTRUMID
Chromatographic column names are ambiguous in Peakforest
The Peakforest column names displayed are ambiguous.
There is no indication of the platform/laboratory that entered the column into the database.
Even within a single laboratory, there may be several installations of the same column.
Remove customisation of output column names
Write tests for new python scripts
Matching with precursors gives empty result
Using the precursor matching option gives an empty result list.
NA values in M/Z column of input files give multiple matches with NA spectra
Each NA value in the M/Z column of the input file returns an NA match against all the peaks of the input database, leading to huge and cumbersome output files.
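A minimal sketch of the fix: drop (or report separately) input rows whose M/Z value is NA before matching, so they cannot pair with every database peak. The NA encodings and the helper name are assumptions for illustration:

```python
def filter_na_mz(rows, mz_col="mz"):
    """Split input rows into (kept, dropped) based on whether the M/Z value is
    usable. Assumes NA is encoded as "NA", "NaN" or an empty field."""
    kept, dropped = [], []
    for row in rows:
        if row.get(mz_col, "") in ("", "NA", "NaN"):
            dropped.append(row)   # report these separately instead of matching them
        else:
            kept.append(row)
    return kept, dropped

kept, dropped = filter_na_mz([{"mz": "100.1"}, {"mz": "NA"}, {"mz": ""}])
```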
Make planemo tests pass
HTML output contains "NA" links
In the HTML output, NA values for external databases are presented as clickable links.
Pubchem links not working
The Pubchem links for the HTML output do not work.
The output files must contain RT values in the same unit as the input file
Currently, all RT values in the output file are in seconds.
Output RT values must be in the same unit as the input file, unless specified otherwise.
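A sketch of the conversion step, assuming RT is kept in seconds internally and converted back to the input file's unit when writing results (the unit names and function are hypothetical):

```python
def rt_to_output_unit(rt_sec, unit):
    """Convert an internal RT value (seconds) to the unit used by the input file."""
    if unit == "min":
        return rt_sec / 60.0   # back to minutes for output
    if unit == "sec":
        return rt_sec          # already in the right unit
    raise ValueError(f"unknown RT unit: {unit}")
```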
Additional column values replaced by integers
While doing the PhenoMeNal demonstration, some additional columns (i.e. columns not used by the matching algorithm) had their values replaced by integers.
Wrong link in HTML output when several results per row
In the main output, where several results can be shown on the same line, database IDs are listed using a character separator such as "," or "|". However, the HTML rendering of this main table does not take that into account, and thus displays a wrong URL link.
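A sketch of the fix: split the cell on the separator and emit one link per ID instead of a single link built from the whole joined string. The URL template is a hypothetical example, and filtering "NA" also avoids the NA-link issue above:

```python
import re

def ids_to_links(cell, url_tpl="https://example.org/entry/{}"):
    """Turn a cell holding several database IDs ("ID1|ID2" or "ID1,ID2")
    into one <a> link per ID. Empty and NA entries produce no link."""
    ids = [i for i in re.split(r"[|,]", cell) if i and i != "NA"]
    return " ".join(f'<a href="{url_tpl.format(i)}">{i}</a>' for i in ids)
```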
Use text fields for in-house database column names
Improve Peakforest mz/rt matching speed
Output an information file
Write an information.txt file, like univariate, multivariate and biosigner do. Make it an option.
Add test for a file database with RT in minutes
Integrate PeakForest compound information
When a match is found in PeakForest, only the spectrum info and the PeakForest compound ID are printed in the output tables.
We need to retrieve the compound info and add it to the output.
Develop precursor match for biodb version
Re-enable XLS database
The XLS database test (DATABASES=xls make test) is failing.
Reorder file output in Galaxy
- dataMatrix
- sampleMetadata
- variableMetadata
Improve Peakforest mz matching speed
Replace the filedb example in the help (XML) with an extract from Massbank.
Add a NA value for pos mode or neg mode
In the Galaxy tool page, in the fields "File database MS Positive mode" and "File database MS Negative mode", when only one mode name is found (either neg or pos), add the missing mode or propose NA.
Add r-biodb to bioconda recipes
Handle # characters in values of .tsv files
By default, read.table() interprets # characters as the start of comments.
Use the comment.char option to disable interpretation of the # character, e.g. read.table(file, comment.char = "").
Remove useless methods
Methods like getMoleculesIds(), getMoleculeNames() and others are deprecated. They are not used by the search-mz script. Remove them, or set them aside for the 4TabSql and Xls databases for later porting into biodb.
Move all searchmz tests into test-searchmz
Improve launch of search-mz script
Even when search-mz is called with --help, startup takes too long.
Try to improve startup speed.
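One common cause of slow --help is loading heavy libraries before parsing arguments. A sketch of the usual remedy, in Python terms and with a placeholder "heavy" import, since the real script's dependencies differ:

```python
import argparse

def main(argv):
    # Parse arguments first: --help prints and exits here, before any
    # expensive library is loaded.
    parser = argparse.ArgumentParser(prog="search-mz")
    parser.add_argument("--input")
    args = parser.parse_args(argv)
    # Heavy imports are deferred until we know real work is needed.
    import json as heavy_dependency  # placeholder for a slow-to-load module
    return args.input
```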
Write tests for an input file with RT in minutes
Add examples to help page
Add examples for filedb, peakforest, xlsdb.
Try to improve help page
The help page is too big; try to make getopt present the options in subsections.
Create HTML output class
In tool page, present in-house database column names as separate text fields
Each column name is entered in a dedicated text field.
This avoids having to pass database field tag names on the command line (they could change).
The same must be done for the input file and the MS modes.
Update Galaxy help section
The Galaxy help section is outdated, at least regarding the column tags of the "Single file database" chapter.
For retention times, propose unit choice between minutes and seconds.
- Add RT unit field for database file, with choice between 'minutes' and 'seconds'.
- Add RT unit field for input file, with choice between 'minutes' and 'seconds'.
No more dynamic fields in XML
Remove all *.py scripts from the repository and the <code> tags in the XML files.
Matching two variables at the same time
An idea from Stéphane Bernillon, INRA Bordeaux:
Hello Pierrick,
Thank you for presenting your matching tool for MS spectra.
If I understood correctly, the idea is to take the m/z values of the variables in the "Variable Metadata" table and compare them one by one with all the m/z values of a reference library.
I see two uses for this tool:
- Annotating variables in a previously annotated matrix.
- Annotating a matrix that has never been annotated.
For the first case, as it stands, the tool is completely satisfactory with an ad hoc in-house database combining m/z and RT.
In the second case, it would seem interesting to me to use the retention time information of the variables in the file.
Take the example of an unknown metabolite associated with the variables M100T1000 and M200T1002.
If I search successively for M100T1000 and then for M200T1002, the proposed spectra will be less relevant than if I search for both variables M100T1000 and M200T1002 at the same time.
The right criterion for associating the variables remains to be found; this would make it possible to search on a pseudo-spectrum rather than on a single m/z. A correlation coefficient could be that criterion.
I am available to discuss this in more detail if necessary.
See you soon,
Stéphane
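The pseudo-spectrum idea above can be sketched as a grouping step: variables whose retention times are close and whose intensity profiles correlate are searched together. Everything here (thresholds, the variable tuple layout, the greedy grouping) is a hypothetical illustration, not the tool's algorithm:

```python
def correlation(xs, ys):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def group_variables(variables, rt_tol=5.0, cor_min=0.8):
    """Greedily group variables into pseudo-spectra.

    variables: list of (name, mz, rt, intensity_profile) tuples.
    A variable joins a group when its RT is within rt_tol of the group's
    first member and their intensity profiles correlate above cor_min.
    """
    groups = []
    for var in variables:
        name, mz, rt, prof = var
        for g in groups:
            _, _, g_rt, g_prof = g[0]
            if abs(rt - g_rt) <= rt_tol and correlation(prof, g_prof) >= cor_min:
                g.append(var)
                break
        else:
            groups.append([var])
    return groups
```

Each resulting group's m/z list could then be matched as one pseudo-spectrum instead of one m/z at a time.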
Update README
- Check XML and particularly the help text.
- Write all changes made since version 3.4.3 in the README update section.