Ursgal - universal Python module combining common bottom-up proteomics tools for large-scale analysis

License: MIT License


Introduction

Ursgal - Universal Python Module Combining Common Bottom-Up Proteomics Tools for Large-Scale Analysis


Summary

Ursgal is a Python module that offers a generalized interface to common bottom-up proteomics tools, e.g.

  1. Peptide spectrum matching with up to eight different search engines (some available in multiple versions), including four open modification search engines
  2. Evaluation and post-processing of search results, with up to two different engines for protein database searches as well as two engines for post-processing mass-difference results from open modification searches
  3. Integration of search results from different search engines
  4. De novo sequencing with up to four different search engines
  5. Miscellaneous tools, including the creation of a target decoy database as well as filtering, sanitizing, and visualizing results
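
As an illustration of the target decoy database creation mentioned in point 5, decoy databases are commonly built by appending reversed (or shuffled) copies of the target sequences. The sketch below shows only this basic idea; the function and the `decoy_` tag are illustrative, not Ursgal's actual target decoy generator, which is more sophisticated (e.g. with respect to enzymatic cleavage sites):

```python
def make_decoy_database(target_entries, decoy_tag="decoy_"):
    """Append a reversed-sequence decoy for every target entry.

    target_entries: list of (header, sequence) tuples.
    Illustrative sketch only, not Ursgal's target decoy engine.
    """
    combined = list(target_entries)
    for header, sequence in target_entries:
        combined.append((decoy_tag + header, sequence[::-1]))
    return combined

targets = [("prot1", "PEPTIDEK"), ("prot2", "MKTAYIAK")]
database = make_decoy_database(targets)
print(database[2])  # ('decoy_prot1', 'KEDITPEP')
```

Searching such a combined database allows false discovery rates to be estimated from the number of decoy hits.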

Abstract

Proteomics data integration has become a broad field with a variety of programs offering innovative algorithms to analyze increasing amounts of data. Unfortunately, this software diversity leads to many problems as soon as the data is analyzed using more than one algorithm for the same task. Although it was shown that the combination of multiple peptide identification algorithms yields more robust results (Nahnsen et al. 2011, Vaudel et al. 2015, Kwon et al. 2011), it is only recently that unified approaches are emerging (Vaudel et al. 2011, Wen et al. 2015); however, workflows that, for example, aim to optimize search parameters or that employ cascaded style searches (Kertesz-Farkas et al. 2015) can only be made accessible if data analysis becomes not only unified but also and most importantly scriptable. Here we introduce Ursgal, a Python interface to many commonly used bottom-up proteomics tools and to additional auxiliary programs. Complex workflows can thus be composed using the Python scripting language using a few lines of code. Ursgal is easily extensible, and we have made several database search engines (X!Tandem (Craig and Beavis 2004), OMSSA (Geer et al. 2004), MS-GF+ (Kim et al. 2010), Myrimatch (Tabb et al. 2008), MS Amanda (Dorfer et al. 2014)), statistical postprocessing algorithms (qvality (Käll et al. 2009), Percolator (Käll et al. 2008)), and one algorithm that combines statistically postprocessed outputs from multiple search engines (“combined FDR” (Jones et al. 2009)) accessible as an interface in Python. Furthermore, we have implemented a new algorithm (“combined PEP”) that combines multiple search engines employing elements of “combined FDR” (Jones et al. 2009), PeptideShaker (Vaudel et al. 2015), and Bayes’ theorem.
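
The Bayes' theorem element of the "combined PEP" approach can be illustrated in isolation: if the engines' posterior error probabilities (PEPs) for a PSM are treated as independent, Bayes' theorem combines them via the products of the PEPs and of their complements. This is a simplified sketch of the principle only, not Ursgal's full combined PEP algorithm (which additionally incorporates elements of combined FDR and PeptideShaker):

```python
from math import prod

def naive_combined_pep(peps):
    """Combine per-engine PEPs under an independence assumption.

    Bayes' theorem gives:
        P(wrong | all engines) = prod(peps) /
            (prod(peps) + prod(1 - p for p in peps))
    Simplified illustration, not Ursgal's full combined PEP.
    """
    p_wrong = prod(peps)
    p_right = prod(1.0 - p for p in peps)
    return p_wrong / (p_wrong + p_right)

# Two engines that independently assign PEP 0.1 reinforce each other:
print(round(naive_combined_pep([0.1, 0.1]), 4))  # 0.0122
```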

Recently, we also included multiple open modification search engines (MSFragger (Kong et al. 2017), MODa (Na et al. 2012), PIPI (Yu et al. 2016), TagGraph (Devabhaktuni et al. 2019)) and engines for the downstream processing of open modification search results (PTM-Shepherd (Geiszler et al. 2020), PTMiner (An et al. 2019)). The combination of these engines allows for the analysis of wide PTM landscapes.

Schulze, S., Igiraneza, A. B., Koesters, M., Leufken, J., Leidel, S. A., Garcia, B. A., Fufezan, C., and Pohlschroder, M. (2021) Enhancing Open Modification Searches via a Combined Approach Facilitated by Ursgal, Journal of Proteome Research, 20, 1986–1996. DOI:10.1021/acs.jproteome.0c00799

Kremer, L. P. M., Leufken, J., Oyunchimeg, P., Schulze, S., and Fufezan, C. (2016) Ursgal, Universal Python Module Combining Common Bottom-Up Proteomics Tools for Large-Scale Analysis, Journal of Proteome Research, 15, 788–794. DOI:10.1021/acs.jproteome.5b00860

Documentation

The complete documentation can be found at Read the Docs.

Besides the Download and Installation steps, this includes a Quick Start Tutorial, detailed documentation of the Modules and Available Engines, as well as a broad set of Example Scripts and much more.

Download and Installation

Ursgal requires Python 3.7 or higher, on all systems including Windows.

There are two recommended ways to install Ursgal:

  • Installation via pip
  • Installation from the source (GitHub)

Installation via pip

Execute the following command from your command line:

user@localhost:~$ pip install ursgal

This installs Ursgal into your Python site-packages.

To download the executables that we are allowed to distribute, run:

user@localhost:~$ ursgal-install-resources

You can now use Ursgal with all engines that we have built or that we are allowed to distribute. For all other third-party engines, a manual download from the respective homepage is required (see also: How to install third party engines).

Note

Pip is included in Python 3.7 and higher. However, it might not be included in your system's PATH environment variable. If this is the case, you can either add the Python scripts directory to your PATH environment variable or use the path to pip.exe directly for the installation, e.g.: ~/Python37/Scripts/pip.exe install ursgal

Installation from source

  1. Download Ursgal using GitHub or the zip file:
  • GitHub version: Starting from your command line, the easiest way is to clone the GitHub repo:

    user@localhost:~$ git clone https://github.com/ursgal/ursgal.git
  • ZIP version: Alternatively, download and extract the ursgal zip file
  2. Next, navigate into the Ursgal folder and install the requirements:

    user@localhost:~$ cd ursgal
    user@localhost:~/ursgal$ pip install -r requirements.txt

Note

Pip is included in Python 3.7 and higher. However, it might not be included in your system's PATH environment variable. If this is the case, you can either add the Python scripts directory to your PATH environment variable or use the path to pip.exe directly for the installation, e.g.: ~/Python37/Scripts/pip.exe install -r requirements.txt

3. Finally, use setup.py to download third-party engines (those that we are allowed to distribute) and to install Ursgal into the Python site-packages:

user@localhost:~/ursgal$ python setup.py install

If you want to install the third-party engines without installing Ursgal into the Python site-packages you can use:

user@localhost:~/ursgal$ python setup.py install_resources

Note

Since we are not allowed to distribute all third party engines, you might need to download and install them on your own. See FAQ (How to install third party engines) and the respective engine documentation for more information.

Note

Under Linux, it may be required to change the permissions in the Python site-packages folder so that all files are executable.

(You might need administrator privileges to write into the Python site-packages folder. On Linux or OS X, use `sudo python setup.py install` or install into a user folder with `python setup.py install --user`. On Windows, you have to start the command line with administrator privileges.)

Tests

Run tox in the root folder. tox is listed in requirements_dev.txt, so pip install -r requirements_dev.txt should already have installed it; otherwise, install tox for Python 3 first. Then just execute:

user@localhost:~/ursgal$ tox

In case you only want to test one Python version (e.g. because you only have one installed), run, e.g. for Python 3.9:

user@localhost:~/ursgal$ tox -e py39

For other available test environments, check out the tox.ini file.

Update to v0.6.0 Warning

Please note that, due to a significant reorganization of UController functions as well as some uparams, v0.6.0 is not fully compatible with previous versions in all cases. Most likely, your previous results will not be recognized, i.e. previously executed runs will be executed again. Please consider this before updating to v0.6.0, check the Changelog, or ask us if you have any doubts. We are sorry for the inconvenience, but the changes were necessary for further development. If you want to continue using (and modifying) v0.5.0, you can use the branch v0.5.0.

Questions and Participation

If you encounter any problems you can open up issues at GitHub, join the conversation at Gitter, or write an email to [email protected]. Please also check the Frequently Asked Questions.

For any contributions, fork us at https://github.com/ursgal/ursgal and open up pull requests! Please also check the Contribution Guidelines. Thanks!

Disclaimer

Ursgal is in beta and thus still contains bugs. Verify your results manually and, as is common practice in science, never trust a black box :)

Copyrights

Copyright 2014-2020 by authors and contributors in alphabetical order

  • Christian Fufezan
  • Aime B. Igiraneza
  • Manuel Koesters
  • Lukas P. M. Kremer
  • Johannes Leufken
  • Purevdulam Oyunchimeg
  • Stefan Schulze
  • Lukas Vaut
  • David Yang
  • Fengchao Yu

Contact

Dr. Christian Fufezan
Institute of Pharmacy and Molecular Biotechnology
Heidelberg University
Germany
eMail: [email protected]

Citation

In an academic world, citations are the only credit that one can hope for ;) Therefore, please do not forget to cite us if you use Ursgal:

Schulze, S., Igiraneza, A. B., Kösters, M., Leufken, J., Leidel, S. A., Garcia, B. A., Fufezan, C., and Pohlschroder, M. (2021) Enhancing Open Modification Searches via a Combined Approach Facilitated by Ursgal Journal of Proteome Research, DOI:10.1021/acs.jproteome.0c00799

Kremer, L. P. M., Leufken, J., Oyunchimeg, P., Schulze, S., and Fufezan, C. (2016) Ursgal, Universal Python Module Combining Common Bottom-Up Proteomics Tools for Large-Scale Analysis Journal of Proteome Research 15, 788–794, DOI:10.1021/acs.jproteome.5b00860

Note

Please also cite every tool you use in Ursgal. At runtime, the references for the tools you are using are shown.

The full list of tools integrated into Ursgal, with proper citations, is:

  • Craig, R.; Beavis, R. C. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 2004, 20 (9), 1466–1467.
  • Dorfer, V.; Pichler, P.; Stranzl, T.; Stadlmann, J.; Taus, T.; Winkler, S.; Mechtler, K. MS Amanda, a Universal Identification Algorithm Optimised for High Accuracy Tandem Mass Spectra. J. Proteome Res. 2014.
  • Frank, A. M.; Savitski, M. M.; Nielsen, M. L.; Zubarev, R. A. and Pevzner, P. A. De Novo Peptide Sequencing and Identification with Precision Mass Spectrometry. J. Proteome Res. 2007, 6, 114–123.
  • Geer, L. Y.; Markey, S. P.; Kowalak, J. A.; Wagner, L.; Xu, M.; Maynard, D. M.; Yang, X.; Shi, W.; Bryant, S. H. Open Mass Spectrometry Search Algorithm. J. Proteome res. 2004, 3 (5), 958–964.
  • Hoopmann, M. R.; Zelter, A.; Johnson, R. S.; Riffle, M.; Maccoss, M. J.; Davis, T. N.; Moritz, R. L. Kojak: Efficient analysis of chemically cross-linked protein complexes. J Proteome Res 2015, 14, 2190-198
  • Jones, A. R.; Siepen, J. a.; Hubbard, S. J.; Paton, N. W. Improving sensitivity in proteome studies by analysis of false discovery rates for multiple search engines. Proteomics 2009, 9 (5), 1220–1229.
  • Kim, S.; Mischerikow, N.; Bandeira, N.; Navarro, J. D.; Wich, L.; Mohammed, S.; Heck, A. J. R.; Pevzner, P. A. The generating function of CID, ETD, and CID/ETD pairs of tandem mass spectra: applications to database search. MCP 2010, 2840–2852.
  • Käll, L.; Canterbury, J. D.; Weston, J.; Noble, W. S.; MacCoss, M. J. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nature methods 2007, 4 (11), 923–925.
  • Käll, L.; Storey, J. D.; Noble, W. S. Qvality: Non-parametric estimation of q-values and posterior error probabilities. Bioinformatics 2009, 25 (7), 964–966.
  • Kong, A. T., Leprevost, F. V, Avtonomov, D. M., Mellacheruvu, D., and Nesvizhskii, A. I. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry–based proteomics. Nature methods 2017, 14, 513–520
  • Leufken J, Niehues A, Sarin LP, Wessel F, Hippler M, Leidel SA, Fufezan C. pyQms enables universal and accurate quantification of mass spectrometry data. Mol Cell Proteomics 2017, 16, 1736-1745
  • Ma, B. Novor: real-time peptide de novo sequencing software. J Am Soc Mass Spectrom. 2015 Nov;26(11):1885-94
  • Na S, Bandeira N, Paek E. Fast multi-blind modification search through tandem mass spectrometry. Mol Cell Proteomics 2012, 11
  • Reisinger, F.; Krishna, R.; Ghali, F.; Ríos, D.; Hermjakob, H.; Antonio Vizcaíno, J.; Jones, A. R. JmzIdentML API: A Java interface to the mzIdentML standard for peptide and protein identification data. Proteomics 2012, 12 (6), 790–794.
  • Tabb, D. L.; Fernando, C. G.; Chambers, M. C. MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J Proteome Res. 2008, 6 (2), 654–661.
  • Yu, F., Li, N., Yu, W. PIPI: PTM-Invariant Peptide Identification Using Coding Method. J Prot Res 2016, 15
  • Barsnes, H., Vaudel, M., Colaert, N., Helsens, K., Sickmann, A., Berven, F. S., and Martens, L. (2011) compomics-utilities: an open-source Java library for computational proteomics. BMC Bioinformatics 12, 70
  • Leufken, J., Niehues, A., Sarin, L. P., Wessel, F., Hippler, M., Leidel, S. A., and Fufezan, C. (2017) pyQms enables universal and accurate quantification of mass spectrometry data. Mol. Cell. Proteomics 16, 1736–1745
  • Jaeger, D., Barth, J., Niehues, A., and Fufezan, C. (2014) pyGCluster, a novel hierarchical clustering approach. Bioinformatics 30, 896–898
  • Bald, T., Barth, J., Niehues, A., Specht, M., Hippler, M., and Fufezan, C. (2012) pymzML--Python module for high-throughput bioinformatics on mass spectrometry data. Bioinformatics 28, 1052–1053
  • Kösters, M., Leufken, J., Schulze, S., Sugimoto, K., Klein, J., Zahedi, R. P., Hippler, M., Leidel, S. A., and Fufezan, C. (2018) pymzML v2.0: introducing a highly compressed and seekable gzip format. Bioinformatics 34, 2513-2514
  • Liu, M.Q.; Zeng, W.F.; Fang, P.; Cao, W.Q.; Liu, C.; Yan, G.Q.; Zhang, Y.; Peng, C.; Wu, J.Q.; Zhang, X.J.; Tu, H.J.; Chi, H.; Sun, R.X.; Cao, Y.; Dong, M.Q.; Jiang, B.Y.; Huang, J.M.; Shen, H.L.; Wong, C.C.L.; He, S.M.; Yang, P.Y. (2017) pGlyco 2.0 enables precision N-glycoproteomics with comprehensive quality control and one-step mass spectrometry for intact glycopeptide identification. Nat Commun 8(1)
  • Yuan, Z.F.; Liu, C.; Wang, H.P.; Sun, R.X.; Fu, Y.; Zhang, J.F.; Wang, L.H.; Chi, H.; Li, Y.; Xiu, L.Y.; Wang, W.P.; He, S.M. (2012) pParse: a method for accurate determination of monoisotopic peaks in high-resolution mass spectra. Proteomics 12(2)
  • Hulstaert, N.; Sachsenberg, T.; Walzer, M.; Barsnes, H.; Martens, L. and Perez-Riverol, Y. (2019) ThermoRawFileParser: modular, scalable and cross-platform RAW file conversion. bioRxiv https://doi.org/10.1101/622852
  • Tran, N.H.; Zhang, X.; Xin, L.; Shan, B.; Li, M. (2017) De novo peptide sequencing by deep learning. PNAS 114 (31)
  • Devabhaktuni, A.; Lin, S.; Zhang, L.; Swaminathan, K.; Gonzalez, CG.; Olsson, N.; Pearlman, SM.; Rawson, K.; Elias, JE. (2019) TagGraph reveals vast protein modification landscapes from large tandem mass spectrometry datasets. Nat Biotechnol. 37(4)
  • Yang, H; Chi, H; Zhou, W; Zeng, WF; He, K; Liu, C; Sun, RX; He, SM. (2017) Open-pNovo: De Novo Peptide Sequencing with Thousands of Protein Modifications. J Proteome Res. 16(2)
  • Polasky, DA; Yu, F; Teo, GC; Nesvizhskii, AI (2020) Fast and comprehensive N- and O-glycoproteomics analysis with MSFragger-Glyco. Nat Methods 17 (11)
  • Geiszler, DJ; Kong, AT; Avtonomov, DM; Yu, F; Leprevost, FV; Nesvizhskii, AI (2020) PTM-Shepherd: analysis and summarization of post-translational and chemical modifications from open search results. bioRxiv doi: https://doi.org/10.1101/2020.07.08.192583
  • An, Z; Zhai, L; Ying, W; Qian, X; Gong, F; Tan, M; Fu, Y. (2019) PTMiner: Localization and Quality Control of Protein Modifications Detected in an Open Search and Its Application to Comprehensive Post-translational Modification Characterization in Human Proteome. Mol Cell Proteomics 18 (2)
  • Schulze, S; Oltmanns, A; Fufezan, C; Krägenbring, J; Mormann, M; Pohlschröder, M; Hippler, M (2020). SugarPy facilitates the universal, discovery-driven analysis of intact glycopeptides. Bioinformatics

ursgal's Issues

problem running myrimatch - Windows 8

I am getting the following error when executing the do_it_all_folder_wide.py example.

ValueError: path is on mount 'D:', start on mount 'C:'
What does that error mean? Should the data be on the C: drive?

Thanks
Witold

[ ucontrol ] Initializing profile QExactive+
[ ucontrol ] 4 parameters have been updated
[ ucontrol ] Preparing unode run for engine myrimatch_2_2_140 on file(s) D:\projects\p2069\dataSearchResults\mzML\20160704_03_C_01.mgf
[ ucontrol ] Setting self.io["input"]
[ ucontrol ] Generated engine myrimatch_2_2_140 output file name: D:\projects\p2069\dataSearchResults\mzML\myrimatch_2_2_140\20160704_03_C_01_myrimatch_2_2_140.mzid.gz
[ ucontrol ] search_mgf() scheduled on input file 20160704_03_C_01.mgf
[ ucontrol ] Reason for run: Never executed before. No out_json 20160704_03_C_01_myrimatch_2_2_140.mzid.gz.u.json found.
[ ucontrol ] Preparing json dump
[ ucontrol ] Json dumped. Path: D:\projects\p2069\dataSearchResults\mzML\myrimatch_2_2_140\20160704_03_C_01_myrimatch_2_2_140.mzid.gz.u.json

-- myrimatch_2_2_140 run initialized with 20160704_03_C_01.mgf (Fri Dec 16 14:02:54 2016) -/-

[ myrimatc ] Will compress output 20160704_03_C_01_myrimatch_2_2_140.mzid on the fly ... renamed temporarily params["output_file"]
[ myrimatc ] Preparing engine
[ myrimatc ] Starting engine
[          ]
[  Please  ]
[  cite:   ] Tabb DL, Fernando CG, Chambers MC. (2007) MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis.
[  -----   ]
[          ]
[ PREFLGHT ] Executing preflight sequence ...

Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm 2016.3.1\helpers\pydev\pydevd.py", line 1596, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files (x86)\JetBrains\PyCharm 2016.3.1\helpers\pydev\pydevd.py", line 974, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files (x86)\JetBrains\PyCharm 2016.3.1\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/wolski/prog/ursgal_examples/do_it_all_folder_wide.py", line 126, in <module>
    target_decoy_database = "D:/projects/p2069/dataSearchResults/fasta/p2069_db1_d_20160322.fasta"
  File "C:/Users/wolski/prog/ursgal_examples/do_it_all_folder_wide.py", line 98, in main
    engine = search_engine,
  File "C:\Users\wolski\prog\ursgal2\ursgal\ucontroller.py", line 1831, in search
    force = force,
  File "C:\Users\wolski\prog\ursgal2\ursgal\ucontroller.py", line 1746, in search_mgf
    force, engine_name, answer
  File "C:\Users\wolski\prog\ursgal2\ursgal\ucontroller.py", line 2071, in run_unode_if_required
    json_path = json_path,
  File "C:\Users\wolski\prog\ursgal2\ursgal\unode.py", line 1242, in run
    report['preflight'] = self._preflight()
  File "C:\Users\wolski\prog\ursgal2\ursgal\unode.py", line 988, in _preflight
    preflight_answer = self.preflight()
  File "C:\Users\wolski\prog\ursgal2\ursgal\wrappers\myrimatch_2_1_138.py", line 65, in preflight
    ).format( os.path.relpath(self.params['translations']['mzml_input_file']) ),
  File "C:\Users\wolski\AppData\Local\Programs\Python\Python35\lib\ntpath.py", line 574, in relpath
    path_drive, start_drive))
ValueError: path is on mount 'D:', start on mount 'C:'

Process finished with exit code 1
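
The ValueError itself is raised by os.path.relpath (implemented by ntpath on Windows), because a relative path cannot cross drive letters: a relpath from a start on C: to a path on D: is undefined. A minimal reproduction and the usual workaround (falling back to the unchanged path) is sketched below; the safe_relpath helper is hypothetical, not part of Ursgal:

```python
import ntpath  # the Windows os.path implementation; importable on any OS

def safe_relpath(path, start):
    """Return a relative path where possible, else the path unchanged.

    ntpath.relpath() raises ValueError when path and start are on
    different drives, which is exactly the crash reported above.
    """
    try:
        return ntpath.relpath(path, start)
    except ValueError:
        return path

print(safe_relpath("C:\\data\\run.mgf", "C:\\data"))       # run.mgf
print(safe_relpath("D:\\projects\\run.mgf", "C:\\Users"))  # D:\projects\run.mgf
```

So the data does not necessarily have to live on C:; the code just has to avoid asking for a relative path across drives (e.g. by using absolute paths).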

core digest functionality - unify overlapping params

After the merge of #150, the core digest function has both "count_missed_cleavages" and "no_missed_cleavages".
This should be changed, since "no_missed_cleavages" is redundant and can be covered by count_missed_cleavages = 0.

It needs to be checked whether this would interfere with other functions that use the digest.

Delete obsolete branches

Since we have a lot of branches that have either been merged or will not be merged but are still in the main repo, we should delete the ones we do not need anymore.

I'd suggest deleting the following branches:

  • MKoesters-percolator3
  • alt_docu_trigger
  • fix_mp
  • hotfix/m_double_mods_on1
  • pypi_badge
  • upstream/master
  • venn_1_1_0

These branches are all merged, so we are not losing anything.

What about the format/black branch?
git tells me that this branch has not been merged yet

Best,
Manuel

upeptide_mapper fails with uparam write_unfiltered_results

While attempting a cascaded search, the engine upeptide_mapper fails under the ucontroller function uc.execute_misc_engine. The failure occurs while preparing the unode run for the engine upeptide_mapper, and only for files obtained via the parameter write_unfiltered_results. It occurs in the set_ios function within the ucontroller, where it appears that the md5 entries for the filtered result files being passed to the upeptide_mapper engine are not being generated or are missing.

I have previously used uc.execute_misc_engine with files obtained via the parameter write_unfiltered_results without issue; however, I used the engine merge_csv instead of upeptide_mapper. The error is probably due to differences between the execute_unode and merge_csv paths within the execute_misc_engine function in the ucontroller.

The error message is shown here:

Traceback (most recent call last):
File "do_it_all_folder_wide_cascade_search_GlycoPA2.py", line 401, in <module>
  target_decoy_database=sys.argv[3],
File "do_it_all_folder_wide_cascade_search_GlycoPA2.py", line 323, in main
  scan_exclusion_list=list(spectra_with_PSM)
File "do_it_all_folder_wide_cascade_search_GlycoPA2.py", line 90, in workflow
  engine='upeptide_mapper',
File "/home/davyang/scripts/ursgal/ucontroller.py", line 965, in execute_misc_engine
  merge_duplicates = merge_duplicates,
File "/home/davyang/scripts/ursgal/ucontroller.py", line 2929, in execute_unode
  force  = force
File "/home/davyang/scripts/ursgal/ucontroller.py", line 1026, in prepare_unode_run
  userdefined_output_fname = output_file
File "/home/davyang/scripts/ursgal/ucontroller.py", line 1314, in set_ios
  self.io['input']['finfo']['md5'] = self.io['input']['o_finfo']['md5']
KeyError: 'md5'
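
For context, Ursgal fingerprints its input files with md5 checksums (stored in the .u.json sidecar files) to decide whether a unode needs to be re-run; the KeyError indicates that this fingerprint was never generated for the file in question. Such file fingerprinting boils down to something like the following sketch (file_md5 is a hypothetical helper, not Ursgal's set_ios code):

```python
import hashlib
import os
import tempfile

def file_md5(path, chunk_size=8192):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demo on a throwaway file:
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
print(file_md5(tmp.name))  # 5eb63bbbe01eeed093cb22bb8f5acdc3
os.unlink(tmp.name)
```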

The failing part of the script I used is shown here:

if search_engine == 'msfragger_20170103':
    if level != '0':
        uc.params['csv_filter_rules']=[
            ['Spectrum ID', 'contains_element_of_list', scan_exclusion_list],
        ]
        uc.params['write_unfiltered_results'] = True
        
        search_result_intermediate = uc.execute_misc_engine(
            input_file=converted_result,
            engine='filter_csv',
        )
        converted_result = search_result_intermediate.replace('accepted', 'rejected')
        uc.params['csv_filter_rules'] = []

mapped_results = uc.execute_misc_engine(
    input_file=converted_result,
    engine='upeptide_mapper',
)
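
The csv_filter_rules entries above follow a [column_name, operator, value] pattern; rows matching all rules end up in the "accepted" output and the rest in "rejected" (which the script then picks up via the accepted → rejected rename). How a rule such as 'contains_element_of_list' could be evaluated is sketched below with the stdlib only; this is illustrative, not Ursgal's filter_csv engine:

```python
def passes_rule(row, rule):
    """Evaluate one [column, operator, value] filter rule on a csv row dict.

    Only the operator used in the script above is sketched here;
    illustrative, not Ursgal's filter_csv implementation.
    """
    column, operator, value = rule
    if operator == "contains_element_of_list":
        return any(str(element) in str(row[column]) for element in value)
    raise ValueError("unknown operator: {0}".format(operator))

rows = [{"Spectrum ID": "1234"}, {"Spectrum ID": "9999"}]
rule = ["Spectrum ID", "contains_element_of_list", [1234, 4321]]
accepted = [row for row in rows if passes_rule(row, rule)]
rejected = [row for row in rows if not passes_rule(row, rule)]
print(len(accepted), len(rejected))  # 1 1
```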

pymzml ERROR

Hi,

I got an error when I run the command nosetests, and it seems to be related to pymzml.


..............................................................................................................................................................................................................................................................................................................................................................................................................................E............................................................................................................................................................................................
======================================================================
ERROR: Failure: Exception (Filename has .gz extension but is missing the gzip magic bytes.
The file may be corrupted or not gzipped.)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/carlos/.local/lib/python3.6/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/home/carlos/.local/lib/python3.6/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/carlos/.local/lib/python3.6/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/carlos/.local/lib/python3.6/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/usr/lib/python3.6/imp.py", line 235, in load_module
    return load_source(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/carlos/Software/ursgal/tests/complete_search_with_omssa_test.py", line 65, in <module>
    force      = True
  File "/home/carlos/Software/ursgal/ursgal/ucontroller.py", line 1847, in search
    force = force,
  File "/home/carlos/Software/ursgal/ursgal/ucontroller.py", line 899, in convert
    output_file_name = output_file_name
  File "/home/carlos/Software/ursgal/ursgal/ucontroller.py", line 351, in convert_to_mgf_and_update_rt_lookup
    force, engine_name, answer
  File "/home/carlos/Software/ursgal/ursgal/ucontroller.py", line 2186, in run_unode_if_required
    json_path = json_path,
  File "/home/carlos/Software/ursgal/ursgal/unode.py", line 1387, in run
    report['execution'] = self._execute()
  File "/home/carlos/Software/ursgal/ursgal/wrappers/mzml2mgf_2_0_0.py", line 71, in _execute
    precursor_max_charge  = self.params['translations']['precursor_max_charge'],
  File "/home/carlos/Software/ursgal/ursgal/resources/platform_independent/arc_independent/mzml2mgf_2_0_0/mzml2mgf_2_0_0.py", line 113, in main
    peaks_2_write = spec.peaks('centroided')
  File "/home/carlos/.local/lib/python3.6/site-packages/pymzml/spec.py", line 1017, in peaks
    mz_params = self._get_encoding_parameters('m/z array')
  File "/home/carlos/.local/lib/python3.6/site-packages/pymzml/spec.py", line 199, in _get_encoding_parameters
    array_type_accession = self.calling_instance.OT[array_type]['id']
  File "/home/carlos/.local/lib/python3.6/site-packages/pymzml/obo.py", line 109, in __getitem__
    self.parseOBO()
  File "/home/carlos/.local/lib/python3.6/site-packages/pymzml/obo.py", line 185, in parseOBO
    "Filename has .gz extension but is missing the gzip magic bytes.\n"
Exception: Filename has .gz extension but is missing the gzip magic bytes.
The file may be corrupted or not gzipped.

----------------------------------------------------------------------
Ran 603 tests in 11.012s

FAILED (errors=1)

Furthermore, I get the same error after running the script complete_search_with_omssa_test.py. This is the log:

[ profile  ] Initializing profile LTQ XL low res
[ profile  ] 4 parameters have been updated
[  prprun  ] Preparing unode run for engine mzml2mgf_2_0_0 on file(s) tests/data/test_Creinhardtii_QE_pH8.mzML
[ set_ios  ] Setting self.io["input"]
[   Info   ] Generated engine mzml2mgf_2_0_0 output file name: /home/carlos/Software/ursgal/tests/data/test_Creinhardtii_QE_pH8.mgf
[   Info   ] convert_to_mgf_and_update_rt_lookup() scheduled on input file test_Creinhardtii_QE_pH8.mzML
[   Info   ] Reason for run: No RT lookup pickle found. Expected /home/carlos/Software/ursgal/tests/data/_ursgal_lookup.pkl
[ dmpjson  ] Preparing json dump
[ dmpjson  ] Json dumped. Path: /home/carlos/Software/ursgal/tests/data/test_Creinhardtii_QE_pH8.mgf.u.json

        -\-     mzml2mgf_2_0_0 run initialized with test_Creinhardtii_QE_pH8.mzML (Mon Sep 24 13:28:31 2018)     -/-

[   run    ] Preparing engine
[   run    ] Starting engine
[ Citation ] 
[  Please  ] 
[  cite:   ] Kremer, L. P. M., Leufken, J., Oyunchimeg, P., Schulze, S. & Fufezan, C. (2016) Ursgal, Universal Python Module Combining Common Bottom-Up Proteomics Tools for Large-Scale Analysis. J. Proteome res. 15, 788-794.
[  -----   ] 
[ Citation ] 
[ PREFLGHT ] Executing preflight sequence ...
[ prefligh ] Execution time 0.000 seconds
[ -ENGINE- ] Executing conversion ..
Converting file:
	mzml : /home/carlos/Software/ursgal/tests/data/test_Creinhardtii_QE_pH8.mzML
	to
	mgf : /home/carlos/Software/ursgal/tests/data/test_Creinhardtii_QE_pH8.mgf
Traceback (most recent call last):
  File "tests/complete_search_with_omssa_test.py", line 65, in <module>
    force      = True
  File "/usr/local/lib/python3.6/dist-packages/ursgal/ucontroller.py", line 1847, in search
    force = force,
  File "/usr/local/lib/python3.6/dist-packages/ursgal/ucontroller.py", line 899, in convert
    output_file_name = output_file_name
  File "/usr/local/lib/python3.6/dist-packages/ursgal/ucontroller.py", line 351, in convert_to_mgf_and_update_rt_lookup
    force, engine_name, answer
  File "/usr/local/lib/python3.6/dist-packages/ursgal/ucontroller.py", line 2186, in run_unode_if_required
    json_path = json_path,
  File "/usr/local/lib/python3.6/dist-packages/ursgal/unode.py", line 1387, in run
    report['execution'] = self._execute()
  File "/usr/local/lib/python3.6/dist-packages/ursgal/wrappers/mzml2mgf_2_0_0.py", line 71, in _execute
    precursor_max_charge  = self.params['translations']['precursor_max_charge'],
  File "/usr/local/lib/python3.6/dist-packages/ursgal/resources/platform_independent/arc_independent/mzml2mgf_2_0_0/mzml2mgf_2_0_0.py", line 113, in main
    peaks_2_write = spec.peaks('centroided')
  File "/home/carlos/.local/lib/python3.6/site-packages/pymzml/spec.py", line 1017, in peaks
    mz_params = self._get_encoding_parameters('m/z array')
  File "/home/carlos/.local/lib/python3.6/site-packages/pymzml/spec.py", line 199, in _get_encoding_parameters
    array_type_accession = self.calling_instance.OT[array_type]['id']
  File "/home/carlos/.local/lib/python3.6/site-packages/pymzml/obo.py", line 109, in __getitem__
    self.parseOBO()
  File "/home/carlos/.local/lib/python3.6/site-packages/pymzml/obo.py", line 185, in parseOBO
    "Filename has .gz extension but is missing the gzip magic bytes.\n"
Exception: Filename has .gz extension but is missing the gzip magic bytes.
The file may be corrupted or not gzipped.
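
The exception means that the file pymzML tried to open carries a .gz name but does not start with the two gzip magic bytes 0x1f 0x8b, i.e. it is corrupted or was never actually gzipped (a typical cause is an interrupted download of the obo or test data files; re-downloading usually fixes it). The check itself is simple and can be reproduced by hand (illustrative sketch):

```python
import gzip
import tempfile

GZIP_MAGIC = b"\x1f\x8b"

def looks_gzipped(path):
    """True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC

# A real gzip file passes the check ...
with tempfile.NamedTemporaryFile(suffix=".gz", delete=False) as real:
    real.write(gzip.compress(b"some content"))
print(looks_gzipped(real.name))  # True

# ... while plain text saved under a .gz name does not.
with tempfile.NamedTemporaryFile(suffix=".gz", delete=False) as fake:
    fake.write(b"plain text, not gzipped")
print(looks_gzipped(fake.name))  # False
```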

Improve spectrum viewer

Any input for improvement or new functionality for the spectrum viewer is highly appreciated.

Feel free to open pull request!

do_it_all_folder crashes with example_data

I use the data from the example_data folder to run the do_it_all_folder script. It does crash with a percolator error.

Percolator terminates with:

Error in the input data: too good separation between target and decoy PSMs.
Impossible to estimate pi0. Terminating.

Ursgal reports:

AssertionError:
percolator_2_08 crashed!

Should the entire workflow crash in such a case? Or should it handle this particular error more gracefully, e.g. by still producing a result?

Selected feature number 0 as initial search direction, could separate -1 positives in that direction
Selected feature number 0 as initial search direction, could separate -1 positives in that direction
Selected feature number 0 as initial search direction, could separate -1 positives in that direction
Estimating 1 over q=0.01 in initial direction
Reading in data and feature calculation took 0.003 cpu seconds or 0 seconds wall time
---Training with Cpos selected by cross validation, Cneg selected by cross validation, fdr=0.01
Iteration 1 : After the iteration step, 1 target PSMs with q<0.01 were estimated by cross validation
Iteration 2 : After the iteration step, 1 target PSMs with q<0.01 were estimated by cross validation
Iteration 3 : After the iteration step, 1 target PSMs with q<0.01 were estimated by cross validation
Iteration 4 : After the iteration step, 1 target PSMs with q<0.01 were estimated by cross validation
Iteration 5 : After the iteration step, 1 target PSMs with q<0.01 were estimated by cross validation
Iteration 6 : After the iteration step, 1 target PSMs with q<0.01 were estimated by cross validation
Iteration 7 : After the iteration step, 1 target PSMs with q<0.01 were estimated by cross validation
Iteration 8 : After the iteration step, 1 target PSMs with q<0.01 were estimated by cross validation
Iteration 9 : After the iteration step, 1 target PSMs with q<0.01 were estimated by cross validation
Iteration 10 : After the iteration step, 1 target PSMs with q<0.01 were estimated by cross validation
Obtained weights (only showing weights of first cross validation set)

first line contains normalized weights, second line the raw weights

lnrSp deltLCn deltCn Xcorr Sp IonFrac Mass PepLen Charge1 Charge2 Charge3 Charge4 Charge5 Charge6 Charge7 Charge8 Charge9 Charge10 enzN enzC m0
0 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0909
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0909
After all training done, 1 target PSMs with q<0.0100 were found when measuring on the test set
Found 1 target PSMs scoring over 1.0000% FDR level on testset
Merging results from 3 datasets
Error in the input data: too good separation between target and decoy PSMs.
Impossible to estimate pi0. Terminating.
Traceback (most recent call last):
File "C:\Program Files (x86)\JetBrains\PyCharm 2016.3.1\helpers\pydev\pydevd.py", line 1596, in
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files (x86)\JetBrains\PyCharm 2016.3.1\helpers\pydev\pydevd.py", line 974, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files (x86)\JetBrains\PyCharm 2016.3.1\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/wolski/prog/ursgal2/example_scripts/do_it_all_folder_wide.py", line 124, in
target_decoy_database = sys.argv[3],
File "C:/Users/wolski/prog/ursgal2/example_scripts/do_it_all_folder_wide.py", line 100, in main
engine = validation_engine,
File "C:\Users\wolski\prog\ursgal2\ursgal\ucontroller.py", line 2823, in validate
output_file_name = output_file_name
File "C:\Users\wolski\prog\ursgal2\ursgal\ucontroller.py", line 2744, in execute_unode
force, engine_name, answer
File "C:\Users\wolski\prog\ursgal2\ursgal\ucontroller.py", line 2068, in run_unode_if_required
json_path = json_path,
File "C:\Users\wolski\prog\ursgal2\ursgal\unode.py", line 1243, in run
report['execution'] = self._execute()
File "C:\Users\wolski\prog\ursgal2\ursgal\unode.py", line 459, in _execute
'''.format( self.engine, os.path.relpath(self.exe), self.execute_return_code)
AssertionError:

percolator_2_08 crashed!

The executable
..\ursgal\resources\win32\64bit\percolator_2_08\percolator.exe
terminated with Error code 255 .
Inspect the printouts above for possible causes and verify that all input files are valid.
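Since ursgal surfaces engine crashes as an AssertionError (as seen above), a more graceful handling could be sketched with a small wrapper; `validate_or_skip` and its arguments are hypothetical names, not ursgal API:

```python
def validate_or_skip(validate, input_file, engine):
    """Run a validation engine; on a crash, warn and return the
    unvalidated input instead of killing the whole workflow."""
    try:
        return validate(input_file=input_file, engine=engine)
    except AssertionError as err:
        print("[ WARNING ] {0} crashed on {1}".format(engine, input_file))
        print("[ WARNING ] {0}".format(err))
        return input_file  # fall through with the unvalidated results
```

Whether silently continuing is the right default is debatable; the point is that the caller, not the engine wrapper, should be able to decide.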

install_resources fails to download files

I am trying to run ursgal on my Ubuntu EC2 machine on AWS. I am using Python 3.6.

I get this error when I run python install_resources.py:

Executable for mzidentml_lib_1_6_11 is not available on your system
[ -<HTPP>- ] Downloading files from http://plan-a.uni-muenster.de/ursgal/resources/platform_independent/arc_independent/mzidentml_lib_1_6_11/mzidentml_lib_1_6_11.md ...
[ -<HTPP>- ] 	 WARNING! Could not download http://plan-a.uni-muenster.de/ursgal/resources/platform_independent/arc_independent/mzidentml_lib_1_6_11/mzidentml_lib_1_6_11.md Check your internet connection!
Traceback (most recent call last):
  File "install_resources.py", line 40, in <module>
    main(resources)
  File "install_resources.py", line 13, in main
    downloaded_zips = uc.download_resources( resources = resources)
  File "/home/ubuntu/ursgal/ursgal/ucontroller.py", line 2727, in download_resources
    'r'
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/ursgal/ursgal/resources/platform_independent/arc_independent/mzidentml_lib_1_6_11/mzidentml_lib_1_6_11.md'

I expect the command to install the resources, but obviously it fails. How can I get the command to work?
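Whether the download URL itself is the problem can be checked separately with a small stdlib probe (the URL in the log may simply no longer be hosted):

```python
import urllib.error
import urllib.request

def can_download(url, timeout=10):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.getcode() == 200
    except (urllib.error.URLError, OSError):
        return False
```

If this returns False for the resource URL from the log, the problem is on the hosting side (or a certificate/connectivity issue), not in ursgal's installer logic.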

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Ignored or Blocked

These are blocked by an existing closed PR and will not be recreated unless you click a checkbox below.

Detected dependencies

github-actions
.github/workflows/black.yml
  • actions/checkout v2
  • actions/setup-python v4
.github/workflows/pypi_cd.yml
  • actions/checkout v2
  • actions/setup-python v2
  • casperdcl/deploy-pypi v2
.github/workflows/tox_ci.yml
  • actions/checkout v2
  • actions/setup-python v2
pip_requirements
docs/requirements.txt
  • dash-core-components ==1.16.0
  • dash ==1.20.0
  • dash-html-components ==1.1.1
  • dash-renderer ==1.9.0
  • dash-table ==4.11.0
  • plotly ==4.14.3

  • Check this box to trigger a request for Renovate to run again on this repository

xtandem_version_comparison error

Hi,

I just tried to run xtandem_version_comparison.py. However, there is a problem related to downloading the fasta file.

I think this is because of the URL on line 58:

http://www.unimuenster.de/Biologie.IBBP.AGFufezan/misc/Creinhardtii_281_v5_5_CP_MT_with_contaminants_target_decoy.fasta

best,

carlos

rerun is not triggered if parameter changed back to default

I did a run with semi_enzyme = True and then changed back to the default, but no rerun was triggered. If I explicitly set semi_enzyme = False, the rerun is triggered. This probably has something to do with the fact that default parameters are not stored in the jsons.
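The suspected cause can be sketched in a few lines (the storage scheme below is an assumption, not ursgal's actual code): if only non-default parameters are written to the run json, switching a parameter back to its default is indistinguishable from never having set it:

```python
DEFAULTS = {"semi_enzyme": False}

def stored_params(params):
    """Mimic a json that stores only non-default values."""
    return {k: v for k, v in params.items() if DEFAULTS.get(k) != v}

def rerun_needed(previous_json, current_json):
    """Naive comparison that only looks at keys present in the new json."""
    return any(previous_json.get(k) != v for k, v in current_json.items())

first  = stored_params({"semi_enzyme": True})   # {'semi_enzyme': True}
second = stored_params({"semi_enzyme": False})  # {} -- default was dropped
print(rerun_needed(first, second))              # False, although it changed
```

Storing defaults alongside, or comparing against the union of both key sets, would make the change visible.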

2 test failures on windows 8

Platform Windows 8.
C:\Users\wolski\prog\ursgal>python
Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)] on win32

Running:

C:\Users\wolski\prog\ursgal>nosetests --version
nosetests version 1.3.7

Produces the following 2 test failures.

Even with those 2 errors ursgal looks super cool (especially all those great examples) and I will dedicate some time to it before Christmas on Windows and Linux.

.............................................................................................................................................
.

FAIL: calc_md5_test.check_md5_test({'input': 'tests\data\test.json', 'output': '379450895e2c116886b2e92dfcd68b2b'},)

Traceback (most recent call last):
File "c:\users\wolski\appdata\local\programs\python\python35\lib\site-packages\nose\case.py", line 198, in runTest
self.test(*self.arg)
File "C:\Users\wolski\prog\ursgal\tests\calc_md5_test.py", line 33, in check_md5
out_put
AssertionError:
MD5 {'input': 'tests\data\test.json', 'output': '379450895e2c116886b2e92dfcd68b2b'} failed
output: 54c19ed069413037dc857a0130dd2527
-------------------- >> begin captured stdout << ---------------------
[ ucontrol ] Calculating md5 for test.json ....
54c19ed069413037dc857a0130dd2527 {'input': 'tests\data\test.json', 'output': '379450895e2c116886b2e92dfcd68b2b'}

--------------------- >> end captured stdout << ----------------------

======================================================================
FAIL: calc_md5_test.check_md5_test({'input': 'tests\data\test_without_database.json', 'output': 'deb20d01ff369188a583decf203cf769'},)

Traceback (most recent call last):
File "c:\users\wolski\appdata\local\programs\python\python35\lib\site-packages\nose\case.py", line 198, in runTest
self.test(*self.arg)
File "C:\Users\wolski\prog\ursgal\tests\calc_md5_test.py", line 33, in check_md5
out_put
AssertionError:
MD5 {'input': 'tests\data\test_without_database.json', 'output': 'deb20d01ff369188a583decf203cf769'} failed
output: 26cc3e0850d3ab95c74e4cf680475335
-------------------- >> begin captured stdout << ---------------------
[ ucontrol ] Calculating md5 for test_without_database.json ....
26cc3e0850d3ab95c74e4cf680475335 {'input': 'tests\data\test_without_database.json', 'output': 'deb20d01ff369188a583decf203cf769'}

--------------------- >> end captured stdout << ----------------------
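To see which file content actually gets hashed, the expected md5 can be reproduced with a few lines (this mirrors what a typical md5 check does, not necessarily ursgal's exact implementation):

```python
import hashlib

def calc_md5(path, chunk_size=65536):
    """Stream a file through md5 and return the hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the digest of the same test.json on Windows and Linux would show whether line endings or embedded paths cause the mismatch.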


error while downloading percolator 3_2_1

Hi guys,

I am trying to upgrade ursgal to version 0.6.3. However, something goes wrong when it tries to install the resources.

Executable for percolator_3_2_1 is not available on your system
[ -- ] Downloading files from https://www.sas.upenn.edu/~sschulze/ursgal_resources/ursgal/resources/linux/64bit/percolator_3_2_1/percolator_3_2_1.md ...
[ -- ] WARNING! Could not download https://www.sas.upenn.edu/~sschulze/ursgal_resources/ursgal/resources/linux/64bit/percolator_3_2_1/percolator_3_2_1.md Check your internet connection! [ -- ] For OSX, make sure that certificates are installed (Applications/Python 3.x/Install Certificates.command)
Traceback (most recent call last):
File "install_resources.py", line 40, in
main(resources)
File "install_resources.py", line 13, in main
downloaded_zips = uc.download_resources( resources = resources)
File "/home/carlos/projects/epigg/containers/test_ursgal/ursgal/ursgal/ucontroller.py", line 2763, in download_resources
'r'
FileNotFoundError: [Errno 2] No such file or directory: '/home/carlos/projects/epigg/containers/test_ursgal/ursgal/ursgal/resources/linux/64bit/percolator_3_2_1/percolator_3_2_1.md'

Incorrect merging of duplicate rows

Hi StSchulze,
Sorry for taking so long, I was testing in more detail. When the param "semi_enzyme" is set to true and combined with trypsin or aspn, msgfplus sometimes (not always) creates columns with two values separated by a semicolon. For example, I got several rows where the field MS-GF:SpecEValue contains something like 5.131911E-7;1.4088748E-8. The same happens with MS-GF:RawScore and MS-GF:EValue.

As you can imagine, this leads to an error message in the subsequent validation step with percolator or qvality.
semi_sample.zip

Originally posted by @cguetot in #94 (comment)
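Until the merging is fixed, such cells can be collapsed to a single value before validation; for e-value-style scores, taking the minimum (best) entry is one option. A workaround sketch, not ursgal code:

```python
def best_evalue(cell, delimiter=";"):
    """Collapse a 'v1;v2;...' cell to the smallest (best) float value."""
    return min(float(value) for value in cell.split(delimiter))

print(best_evalue("5.131911E-7;1.4088748E-8"))  # 1.4088748e-08
```

Note that the right aggregation depends on the column: for MS-GF:RawScore (higher is better) you would take the maximum instead.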

cpus does not work for moda_style_1

Hi guys,

The parameter cpus is not working when I run MoDa. I know the wrapper should activate -@ for MoDa. I already tried a fresh install on Linux, but the behavior remains the same.

I hope it has an easy fix.

thanks,
C
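For reference, the expected behaviour can be sketched like this (the `-@` flag is taken from the issue text; the command layout is an assumption, not MoDa's actual wrapper code):

```python
def add_thread_flag(command_list, cpus):
    """Append MoDa's thread flag if a cpu count was requested."""
    if cpus and "-@" not in command_list:
        command_list = command_list + ["-@", str(cpus)]
    return command_list

print(add_thread_flag(["java", "-jar", "moda.jar"], 4))
# ['java', '-jar', 'moda.jar', '-@', '4']
```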

Msgfplus and nonspecific

Hi ursgal,

I noticed that you have been including new code to deal with nonspecific searches. I am testing it, and xtandem and omssa run without any issues so far. However, I get an error message for msgfplus.

[ dmpjson  ] Json dumped. Path: /home/ubuntu/projects/lars/msgfplus_v2018_09_12/nonspecific_KristofferS_H1507_195_msgfplus_v2018_09_12.mzid.gz.u.json

       -\-     msgfplus_v2018_09_12 run initialized with nonspecific_KristofferS_H1507_195.mgf (Thu Jan 10 09:36:44 2019)     -/-

[   run    ] Will compress output nonspecific_KristofferS_H1507_195_msgfplus_v2018_09_12.mzid on the fly ... renamed temporarily params["output_file"]
[   run    ] Preparing engine
[   run    ] Starting engine
[ Citation ]
[  Please  ]
[  cite:   ] Kim S, Mischerikow N, Bandeira N, Navarro JD, Wich L, Mohammed S, Heck AJ, Pevzner PA. (2010) The Generating Function of CID, ETD, and CID/ETD Pairs of Tandem Mass Spectra: Applications to Database Search.
[  -----   ]
[ Citation ]
[ PREFLGHT ] Executing preflight sequence ...
[ prefligh ] Execution time 0.000 seconds
[ eXecutio ] Executing command list ...
MS-GF+ Release (v2018.09.12) (12 September 2018)
[Error] Cannot specify a MaxMissedCleavages when using unspecific cleavage enzyme

Traceback (most recent call last):
 File "/home/ubuntu/repos/ursgal_dda/nonspecific_run.py", line 116, in <module>
   search_engine=SEARCH_ENGINE
 File "/home/ubuntu/repos/ursgal_dda/nonspecific_run.py", line 65, in main
   engine=search_engine,
 File "/usr/local/lib/python3.6/dist-packages/ursgal/ucontroller.py", line 1856, in search
   force      = force,
 File "/usr/local/lib/python3.6/dist-packages/ursgal/ucontroller.py", line 1766, in search_mgf
   force, engine_name, answer
 File "/usr/local/lib/python3.6/dist-packages/ursgal/ucontroller.py", line 2186, in run_unode_if_required
   json_path = json_path,
 File "/usr/local/lib/python3.6/dist-packages/ursgal/unode.py", line 1400, in run
   report['execution'] = self._execute()
 File "/usr/local/lib/python3.6/dist-packages/ursgal/unode.py", line 485, in _execute
   '''.format( self.engine, os.path.relpath(self.exe), self.execute_return_code)
AssertionError:

msgfplus_v2018_09_12 crashed!

 The executable
   ../../../../usr/local/lib/python3.6/dist-packages/ursgal/resources/platform_independent/arc_independent/msgfplus_v2018_09_12/MSGFPlus.jar
 terminated with Error code 255 .
 Inspect the printouts above for possible causes and verify that all input files are valid.
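The clash can be sketched on the parameter level (the key names follow the ursgal/MS-GF+ vocabulary, but the dict below is illustrative only): when the enzyme is unspecific, the missed-cleavage limit should simply not be passed on:

```python
params = {
    "enzyme": "nonspecific",
    "max_missed_cleavages": 2,
}

# MS-GF+ rejects a missed-cleavage limit together with unspecific cleavage,
# so drop the key before building the command line:
if params["enzyme"] == "nonspecific":
    params.pop("max_missed_cleavages", None)

print(params)  # {'enzyme': 'nonspecific'}
```

Presumably the msgfplus wrapper's preflight would be the place for such a guard.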


Peptide mapping partially fails for MSAmanda results

I am getting quite strange results when mapping peptides from MSAmanda results (msamanda_2_0_0_9706 on OSX) using the UPeptideMapper_v3 (or v4; others not tested yet):

  • peptides that only occur in decoy sequences cannot be mapped at all; I get the following warnings (and I checked the database, they are in there):

[ WARNING ] The peptide ABC could not be mapped to the
[ WARNING ] given database XYZ.fasta
[ WARNING ]
[ WARNING ] This PSM will be skipped.
...
[ WARNING ] These 16071 peptides above (truncated to 100) could not be mapped to the database
[ WARNING ] Check Search and Database if neccesary
[ map_peps ] Attempting re-map of non-mappable peptides
...
[ WARNING ] These 16071 peptides (truncated to 100) are indeed not mappable
[ WARNING ] Check of Search parameters and database is strongly recommended

  • for peptides that occur in target as well as decoy sequences, mapping works (I get the merged results with the protein delimiter)
  • everything works fine for other search engine results (using the same pipeline), so it seems to be MSAmanda-specific

Any ideas?
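To narrow this down, a quick linear scan can confirm whether a reported peptide really occurs in the database, decoys included (a sketch; fine for spot-checking a handful of peptides, not for all 16071):

```python
def peptide_in_fasta(peptide, fasta_path):
    """Check whether a peptide occurs in any FASTA entry (target or decoy)."""
    chunks = []
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                chunks.append("#")  # separator so peptides cannot span entries
            else:
                chunks.append(line.strip())
    return peptide in "".join(chunks)
```

If this returns True for peptides the mapper rejects, the problem is likely in how the MSAmanda output (e.g. sequence formatting or modifications) is fed into the mapper, not in the database.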

unifycsv fails for nonspecific and semi_trypsin

Hi,

I am trying to run X!Tandem using the semi_tryptic and nonspecific parameters. However, unify_csv fails every time. This is one of my outputs with nonspecific:

[ profile ] Initializing profile QExactive+
[ profile ] 2 parameters have been updated
[ prprun ] Preparing unode run for engine xtandem_alanine on file(s) CK_H1801_324.mgf
[ set_ios ] Setting self.io["input"]
[ Info ] Generated engine xtandem_alanine output file name: /data/DDA_ursgal/ursgal_dda/xtandem_alanine/CK_H1801_324_xtandem_alanine.xml.gz
[ Info ] Skipping search_mgf() on file CK_H1801_324.mgf since it was previously executed with the same input file(s) and parameters.
[ Info ] To re-run, use search_mgf( force=True )
[ dmpjson ] Preparing json dump
[ dmpjson ] Json dumped. Path: /data/DDA_ursgal/ursgal_dda/xtandem_alanine/CK_H1801_324_xtandem_alanine.xml.gz.u.json
[ profile ] Initializing profile QExactive+
[ profile ] 2 parameters have been updated
[ prprun ] Preparing unode run for engine xtandem2csv_1_0_0 on file(s) /data/DDA_ursgal/ursgal_dda/xtandem_alanine/CK_H1801_324_xtandem_alanine.xml.gz
[ set_ios ] Setting self.io["input"]
[ Info ] Generated engine xtandem2csv_1_0_0 output file name: /data/DDA_ursgal/ursgal_dda/xtandem_alanine/CK_H1801_324_xtandem_alanine.csv
[ Info ] Skipping execute_unode() on file CK_H1801_324_xtandem_alanine.xml.gz since it was previously executed with the same input file(s) and parameters.
[ Info ] To re-run, use execute_unode( force=True )
[ dmpjson ] Preparing json dump
[ dmpjson ] Json dumped. Path: /data/DDA_ursgal/ursgal_dda/xtandem_alanine/CK_H1801_324_xtandem_alanine.csv.u.json
[ profile ] Initializing profile QExactive+
[ profile ] 2 parameters have been updated
[ prprun ] Preparing unode run for engine upeptide_mapper_1_0_0 on file(s) /data/DDA_ursgal/ursgal_dda/xtandem_alanine/CK_H1801_324_xtandem_alanine.csv
[ set_ios ] Setting self.io["input"]
[ Info ] Generated engine upeptide_mapper_1_0_0 output file name: /data/DDA_ursgal/ursgal_dda/xtandem_alanine/CK_H1801_324_xtandem_alanine_pmap.csv
[ Info ] Skipping execute_unode() on file CK_H1801_324_xtandem_alanine.csv since it was previously executed with the same input file(s) and parameters.
[ Info ] To re-run, use execute_unode( force=True )
[ dmpjson ] Preparing json dump
[ dmpjson ] Json dumped. Path: /data/DDA_ursgal/ursgal_dda/xtandem_alanine/CK_H1801_324_xtandem_alanine_pmap.csv.u.json
[ profile ] Initializing profile QExactive+
[ profile ] 2 parameters have been updated
[ prprun ] Preparing unode run for engine unify_csv_1_0_0 on file(s) /data/DDA_ursgal/ursgal_dda/xtandem_alanine/CK_H1801_324_xtandem_alanine_pmap.csv
[ set_ios ] Setting self.io["input"]
[ Info ] Generated engine unify_csv_1_0_0 output file name: /data/DDA_ursgal/ursgal_dda/xtandem_alanine/CK_H1801_324_xtandem_alanine_pmap_unified.csv
[ Info ] execute_unode() scheduled on input file CK_H1801_324_xtandem_alanine_pmap.csv
[ Info ] Reason for run: parameter "keep_asp_pro_broken_peps" was not found in previous output params....
& Previous run was not completed, status launching
[ dmpjson ] Preparing json dump
[ dmpjson ] Json dumped. Path: /data/DDA_ursgal/ursgal_dda/xtandem_alanine/CK_H1801_324_xtandem_alanine_pmap_unified.csv.u.json

    -\-     unify_csv_1_0_0 run initialized with CK_H1801_324_xtandem_alanine_pmap.csv (Wed Nov 28 08:38:57 2018)     -/-

[ run ] Preparing engine
[ run ] Starting engine
[ Citation ]
[ Please ]
[ cite: ] Kremer, L. P. M., Leufken, J., Oyunchimeg, P., Schulze, S. & Fufezan, C. (2016) Ursgal, Universal Python Module Combining Common Bottom-Up Proteomics Tools for Large-Scale Analysis. J. Proteome res. 15, 788-794.
[ ----- ]
[ Citation ]
[ PREFLGHT ] Executing preflight sequence ...
[ prefligh ] Execution time 0.000 seconds
[ -ENGINE- ] Executing conversion ..

[ unifycsv ] Converting CK_H1801_324_xtandem_alanine_pmap.csv of engine xtandem_alanine to unified CSV format...

[ unify_cs ] Buffering csv file
[ unify_cs ] Buffering csv file done

error Traceback (most recent call last)
in
8 input_file=mgf_file,
9 engine='xtandem_alanine',
---> 10 force=Force
11 )
12

/data/external/installers/ursgal/env/lib/python3.6/site-packages/ursgal/ucontroller.py in search(self, input_file, engine, force, output_file_name)
1889 engine = self.params['unify_csv_converter_version'],
1890 force = force,
-> 1891 merge_duplicates = True,
1892 )
1893 return unified_search_results

/data/external/installers/ursgal/env/lib/python3.6/site-packages/ursgal/ucontroller.py in execute_misc_engine(self, input_file, engine, force, output_file_name, merge_duplicates)
962 force = force,
963 output_file_name = output_file_name,
--> 964 merge_duplicates = merge_duplicates,
965 )
966

/data/external/installers/ursgal/env/lib/python3.6/site-packages/ursgal/ucontroller.py in execute_unode(self, input_file, engine, force, output_file_name, dry_run, merge_duplicates)
2927 report = self.run_unode_if_required(
2928 force, engine_name, answer,
-> 2929 merge_duplicates=merge_duplicates
2930 )
2931 return report['output_file']

/data/external/installers/ursgal/env/lib/python3.6/site-packages/ursgal/ucontroller.py in run_unode_if_required(self, force, engine_name, answer, merge_duplicates, history_addon)
2184 json_path = self.dump_json_and_calc_md5()
2185 report = self.unodes[ engine_name ]['class'].run(
-> 2186 json_path = json_path,
2187 )
2188

/data/external/installers/ursgal/env/lib/python3.6/site-packages/ursgal/unode.py in run(self, json_path)
1398 )
1399 report['preflight'] = self._preflight()
-> 1400 report['execution'] = self._execute()
1401 report['postflight'] = self._postflight()
1402

/data/external/installers/ursgal/env/lib/python3.6/site-packages/ursgal/wrappers/unify_csv_1_0_0.py in _execute(self)
91 params = self.params,
92 search_engine = last_engine,
---> 93 score_colname = last_search_engine_colname,
94 )
95 for tmp_file in tmp_files:

/data/external/installers/ursgal/env/lib/python3.6/site-packages/ursgal/resources/platform_independent/arc_independent/unify_csv_1_0_0/unify_csv_1_0_0.py in main(input_file, output_file, scan_rt_lookup, params, search_engine, score_colname)
1013 )
1014 missed_cleavage_counter +=
-> 1015 len(re.findall(missed_cleavage_pattern, line_dict['Sequence']))
1016 elif cleavage_site == 'N':
1017 missed_cleavage_pattern = '[^{1}]{0}'.format(

/usr/lib/python3.6/re.py in findall(pattern, string, flags)
220
221 Empty matches are included in the result."""
--> 222 return _compile(pattern, flags).findall(string)
223
224 def finditer(pattern, string, flags=0):

/usr/lib/python3.6/re.py in _compile(pattern, flags)
299 if not sre_compile.isstring(pattern):
300 raise TypeError("first argument must be string or compiled pattern")
--> 301 p = sre_compile.compile(pattern, flags)
302 if not (flags & DEBUG):
303 if len(_cache) >= _MAXCACHE:

/usr/lib/python3.6/sre_compile.py in compile(p, flags)
560 if isstring(p):
561 pattern = p
--> 562 p = sre_parse.parse(p, flags)
563 else:
564 pattern = None

/usr/lib/python3.6/sre_parse.py in parse(str, flags, pattern)
853
854 try:
--> 855 p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
856 except Verbose:
857 # the VERBOSE flag was switched on inside the pattern. to be

/usr/lib/python3.6/sre_parse.py in _parse_sub(source, state, verbose, nested)
414 while True:
415 itemsappend(_parse(source, state, verbose, nested + 1,
--> 416 not nested and not items))
417 if not sourcematch("|"):
418 break

/usr/lib/python3.6/sre_parse.py in _parse(source, state, verbose, nested, first)
521 if this is None:
522 raise source.error("unterminated character set",
--> 523 source.tell() - here)
524 if this == "]" and set != start:
525 break

error: unterminated character set at position 1

unify_csv_1_0_0

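The "unterminated character set" points at the pattern built for missed-cleavage counting: for nonspecific cleavage the cleavage-site amino acid set is empty, so a template of this shape collapses into an invalid character class. A minimal reproduction and guard (the template is modeled on the traceback, not copied from ursgal):

```python
import re

# For nonspecific cleavage the cleavage-site amino acid set is empty,
# so formatting it into a negated class produces an invalid pattern:
aa_set = ""                          # hypothetical: empty for nonspecific
pattern = "[^{0}]X".format(aa_set)   # -> "[^]X"

try:
    re.compile(pattern)
    compiled = True
except re.error as err:
    compiled = False
    print(err)                       # unterminated character set ...

# Guard: only count missed cleavages when a cleavage-site set exists
missed = len(re.findall(pattern, "PEPTIDE")) if aa_set else 0
print(missed)  # 0
```

A guard like this in unify_csv (skipping missed-cleavage counting for nonspecific cleavage) would avoid the crash.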

Add custom exceptions

Old text:

When running upeptide_mapper_1_0_0 with an empty input file, it crashes with the following error:

File "/home/manuel/Gits/Ursgal/ursgal_fork2/ursgal/resources/platform_independent/arc_independent/upeptide_mapper_1_0_0/upeptide_mapper_1_0_0.py", line 234, in main
*class_etxra_args
File "/home/manuel/Gits/Ursgal/ursgal_fork2/ursgal/resources/platform_independent/arc_independent/upeptide_mapper_1_0_0/upeptide_mapper_1_0_0.py", line 776, in map_peptides
for match in self.automatons[fasta_name].iter(self.total_sequence_string[fasta_name]):
AttributeError: Not an Aho-Corasick automaton yet: call add_word to add some keys and call make_automaton to convert the trie to an automaton.

I'd suggest just returning an empty output file and maybe printing a warning, since this could otherwise crash e.g. a machine offset sweep workflow.

New:

In order to handle the issue mentioned above, we agreed to implement ursgal-specific exceptions, e.g. for empty files. These can then be handled in try/except statements, so we don't need to fall back on catching e.g. AttributeErrors, which could also occur for entirely different reasons.

Best,
Manuel

MSFragger + unify_csv

Unify_csv (ursgal 0.5.0) seems to have problems with MSFragger results. Other search engines work fine on the same mzML file.

[ unify_cs ] Buffering csv file
[ unify_cs ] Buffering csv file done
Traceback (most recent call last):
File "do_it_all_folder_wide_15N.py", line 144, in
target_decoy_database = sys.argv[3],
File "do_it_all_folder_wide_15N.py", line 116, in main
engine = search_engine,
File "C:\Python35\lib\site-packages\ursgal\ucontroller.py", line 1844, in search
force = force,
File "C:\Python35\lib\site-packages\ursgal\ucontroller.py", line 2163, in unify_csv
output_file_name = output_file_name
File "C:\Python35\lib\site-packages\ursgal\ucontroller.py", line 2802, in execute_unode
force, engine_name, answer
File "C:\Python35\lib\site-packages\ursgal\ucontroller.py", line 2081, in run_unode_if_required
json_path = json_path,
File "C:\Python35\lib\site-packages\ursgal\unode.py", line 1313, in run
report['execution'] = self._execute()
File "C:\Python35\lib\site-packages\ursgal\wrappers\unify_csv_1_0_0.py", line 90, in _execute
score_colname = last_search_engine_colname,
File "C:\Python35\lib\site-packages\ursgal\resources\platform_independent\arc_independent\unify_csv_1_0_0\unify_csv_1_0_0.py", line 449, in main
if scan_rt_lookup[ input_file_basename ]['unit'] == 'second':
KeyError: 'FC_PLP-t2-15N-ne_03082017'
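The KeyError means the MSFragger result refers to a file basename that is missing from the RT lookup. A defensive lookup with a readable message would make the mismatch visible (names follow the traceback; this is a sketch, not ursgal's code):

```python
def rt_unit(scan_rt_lookup, basename):
    """Return the RT unit for a basename, failing with a helpful message."""
    if basename not in scan_rt_lookup:
        raise KeyError(
            "{0} not found in RT lookup; known basenames: {1}".format(
                basename, sorted(scan_rt_lookup)
            )
        )
    return scan_rt_lookup[basename]["unit"]
```

Printing the known basenames would immediately show whether MSFragger mangles the input file name (e.g. strips or adds an extension) compared to the other engines.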

patch version of pymzml dependencies in install_requires

ursgal currently pins the versions of pymzml and pyqms to the patch level in the package's install_requires. This is causing me issues when trying to install ursgal alongside other packages with different requirements. Could this be relaxed to specify compatible minor/patch versions?

It might be an idea to have a separate definition of install_requires that is referenced in setup.py, alongside the requirements.txt that is used in the build/test pipeline.

e.g. using compatible-release specifiers:
pymzml~=2.4
pyqms~=0.6

I'm happy to prepare a PR. What versions of pymzml/pyqms are currently expected to work with ursgal?
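For reference, the `~=` ("compatible release") specifier expands to a >=/< pair, e.g. `pymzml~=2.4` means `pymzml>=2.4, <3.0`. A toy helper illustrating the semantics for two-part versions (not part of ursgal or pip):

```python
def compatible(version, floor):
    """Minimal PEP 440 '~=' check for two-part versions like '2.5'."""
    v = tuple(int(x) for x in version.split(".")[:2])
    f = tuple(int(x) for x in floor.split(".")[:2])
    return v[0] == f[0] and v >= f

print(compatible("2.5", "2.4"))  # True:  still a 2.x release
print(compatible("3.0", "2.4"))  # False: major version bump excluded
```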

history addon in merged file missing

Traceback (most recent call last):
File "merge_n_count.py", line 45, in
main(input_files=file_lists)
File "merge_n_count.py", line 39, in main
engine = 'venndiagram',
File "/ursgal/ucontroller.py", line 3019, in visualize
output_file_name = output_file_name
File "/ursgal/ucontroller.py", line 2888, in execute_unode
force = force
File "/ursgal/ucontroller.py", line 1007, in prepare_unode_run
self.input_file_dicts = self.generate_multi_file_dicts(input_file)
File "/ursgal/ucontroller.py", line 1187, in generate_multi_file_dicts
multiple_engines = True,
File "/ursgal/unode.py", line 679, in get_last_search_engine
multiple_engines=multiple_engines,
File "/ursgal/unode.py", line 576, in get_last_engine
for element in self.flatten_list(history_event['history_addon']['search_engines_of_merged_files']):
KeyError: 'search_engines_of_merged_files'

The last steps of the pipeline are a merge, a filtering and another merge.
After the first merge, the history addon 'search_engines_of_merged_files' is there, after the filtering it is also still there, but after the merge of the filtered files, it is gone.
Any ideas?
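A tolerant lookup would at least keep the pipeline alive when the addon gets lost in the second merge (key names taken from the traceback; a sketch, not a fix for the underlying loss):

```python
def merged_engines(history_event):
    """Return the merged-file engine list, or [] if the addon is missing."""
    addon = history_event.get("history_addon") or {}
    return addon.get("search_engines_of_merged_files", [])

print(merged_engines(
    {"history_addon": {"search_engines_of_merged_files": ["omssa_2_1_9"]}}
))  # ['omssa_2_1_9']
print(merged_engines({}))  # [] instead of a KeyError
```

The real fix is presumably to propagate history_addon through the second merge, but the tolerant accessor keeps get_last_engine() from crashing in the meantime.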

Percolator only crashes, Qvality doesn't work

Hello,

Thanks for making ursgal.

I'm trying to use just the basic search.py example, and it seems the output of qvality will only be "1" for PEP and "-1.#IND" for the q-value, regardless of any input.

Secondly, I can't get percolator to work on any data set. I found a previous issue about this (#13), but the only inference was that the data set was not large enough. I've had it crash on both small and large data sets alike, with no success whatsoever.

The largest data set I've tried is a proteomic experiment available on PRIDE, q01511.raw of:
https://www.ebi.ac.uk/pride/archive/projects/PXD008222/files
that was searched with the canonical & isoform proteome from UniProt:
http://www.uniprot.org/proteomes/UP000005640

I believe this would be large enough to perform validation on, based on the number of peptides identified in the publication.

The version I'm using was cloned yesterday. I'm really interested in getting this workflow up and running; is there anything I could do to alleviate this problem?

Output from large data set crash and the small data set crash are the same:

percolator_2_08 crashed!

  The executable
    ursgal\resources\win32\64bit\percolator_2_08\percolator.exe
  terminated with Error code 3221226356 .
  Inspect the printouts above for possible causes and verify that all input files are valid.

Trouble running pepnovo using example script

Hi there, I just tried to run the example script for pepnovo searching and it crashed:

[ profile ] Initializing profile LTQ XL low res
[ profile ] 4 parameters have been updated
[ prprun ] Preparing unode run for engine mzml2mgf_2_0_0 on file(s) ../example_data/BSA_simple_de_novo_search/BSA1.mzML
[ set_ios ] Setting self.io["input"]
[ Info ] Generated engine mzml2mgf_2_0_0 output file name: /fs/local/projects/ursgal/example_data/BSA_simple_de_novo_search/BSA1.mgf
[ Info ] Skipping convert_to_mgf_and_update_rt_lookup() on file BSA1.mzML since it was previously executed with the same input file(s) and parameters.
[ Info ] To re-run, use convert_to_mgf_and_update_rt_lookup( force=True )
[ dmpjson ] Preparing json dump
[ dmpjson ] Json dumped. Path: /fs/local/projects/ursgal/example_data/BSA_simple_de_novo_search/BSA1.mgf.u.json
[ profile ] Initializing profile LTQ XL low res
[ profile ] 4 parameters have been updated
[ prprun ] Preparing unode run for engine pepnovo_3_1 on file(s) /fs/local/projects/ursgal/example_data/BSA_simple_de_novo_search/BSA1.mgf
[ set_ios ] Setting self.io["input"]
[ Info ] Generated engine pepnovo_3_1 output file name: /fs/local/projects/ursgal/example_data/BSA_simple_de_novo_search/pepnovo_3_1/BSA1_pepnovo_3_1.csv
[ Info ] search_mgf() scheduled on input file BSA1.mgf
[ Info ] Reason for run: parameter "precursor_mass_tolerance_plus" was not found in previous output params....
& Previous run was not completed, status launching
[ dmpjson ] Preparing json dump
[ dmpjson ] Json dumped. Path: /fs/local/projects/ursgal/example_data/BSA_simple_de_novo_search/pepnovo_3_1/BSA1_pepnovo_3_1.csv.u.json

    -\-     pepnovo_3_1 run initialized with BSA1.mgf (Wed Oct  3 14:43:34 2018)     -/-

[ run ] Preparing engine
[ run ] Starting engine
[ Citation ]
[ Please ]
[ cite: ] Ari M. Frank, Mikhail M. Savitski, Michael L. Nielsen, Roman A. Zubarev, and Pavel A. Pevzner (2007) De Novo Peptide Sequencing and Identification with Precision Mass Spectrometry, J. Proteome Res. 6:114-123.
[ ----- ]
[ Citation ]
[ PREFLGHT ] Executing preflight sequence ...

        [ WARNING ] precursor_mass_tolerance_plus and precursor_mass_tolerance_minus
        [ WARNING ] need to be combined for pyQms (use of symmetric tolerance window).
        [ WARNING ] The arithmetic mean is used.

[ prefligh ] Execution time 0.000 seconds
Traceback (most recent call last):
File "simple_de_novo_search.py", line 81, in
main()
File "simple_de_novo_search.py", line 69, in main
force=False
File "/fs/local/projects/ursgal/ursgal_env/lib/python3.6/site-packages/ursgal/ucontroller.py", line 1856, in search
force = force,
File "/fs/local/projects/ursgal/ursgal_env/lib/python3.6/site-packages/ursgal/ucontroller.py", line 1766, in search_mgf
force, engine_name, answer
File "/fs/local/projects/ursgal/ursgal_env/lib/python3.6/site-packages/ursgal/ucontroller.py", line 2186, in run_unode_if_required
json_path = json_path,
File "/fs/local/projects/ursgal/ursgal_env/lib/python3.6/site-packages/ursgal/unode.py", line 1387, in run
report['execution'] = self._execute()
File "/fs/local/projects/ursgal/ursgal_env/lib/python3.6/site-packages/ursgal/wrappers/pepnovo_3_1.py", line 249, in _execute
'''.format( self.engine, os.path.relpath(self.exe), self.execute_return_code)
AssertionError:

pepnovo_3_1 crashed!

The executable
../ursgal_env/lib/python3.6/site-packages/ursgal/resources/linux/64bit/pepnovo_3_1/PepNovo_bin
terminated with Error code 1 .
Inspect the printouts above for possible causes and verify that all input files are valid.

I'm not sure what I did wrong. All I did was adapt the example script "simple_de_novo_search.py" by commenting out novor.

If I try running PepNovo from the command line, it also fails:
Initializing models (this might take a few seconds)...
Warning: no peptide composition assigner was found (e.g., LTQ_COMP/IT_TRYP)!
This may cause problems when trying to run PepNovo!

Done.
Fragment tolerance : 0.5000
PM tolernace : 2.5000
PTMs considered : C+57

0 2442 BSA1.2442.2442.2Error: using an uninitialized peptide composition asssigner!
You might need to use an existing composition assigner.
Try adding the line: "#COMP_ASSIGNER LTQ_COMP/IT_TRYP" to the main model file (model_name.txt)

I have never tried to run a de novo search this way before, so apologies if I am doing something really stupid.
Toby
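PepNovo's own error message points at a possible fix: appending the suggested COMP_ASSIGNER line to the main model file. A minimal sketch; "model_name.txt" is only the placeholder from the error message and must be replaced with the actual model file of your install:

```shell
# Append PepNovo's suggested line to the main model file.
# "model_name.txt" is the placeholder from the error message;
# substitute the real model file used by your PepNovo resources.
MODEL_FILE=model_name.txt
echo '#COMP_ASSIGNER LTQ_COMP/IT_TRYP' >> "$MODEL_FILE"
```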

bug in terminal output print

Somehow the terminal output has changed. The brackets are now empty most of the time (but not always; see below). I don't know since when this has been the case, but I have observed it on different machines.

[          ] Initializing profile LTQ XL low res
[          ] 4 parameters have been updated
[          ] Preparing unode run for engine mzml2mgf_1_0_0 on file(s) ..\example_data\BSA_simple_example_search\BSA1.mzML
[          ] Setting self.io["input"]

...

[ POSTFLGH ] Executing postflight sequence ...
[ postflig ] Execution time 0.00 seconds
[   run    ] Execution time 0.02 seconds
[          ] Preparing json dump
[          ] Calculating md5 for BSA1___unified.u_venndiagram.svg ....
[          ] Json dumped. Path: C:\Users\Admin\Desktop\ursgal\example_data\BSA_simple_example_search\BSA1___unified.u_venndiagram.svg.u.json

Including elastase for msgfplus

Hi guys,

I know Ursgal doesn't include elastase for MS-GF+. However, starting from version v2019.02.07, MS-GF+ supports customizing enzyme definitions: we need to create the file params\enzymes.txt (or params/enzymes.txt on Linux) below the working directory to define custom enzymes or override the cleavage residues of built-in enzymes.

The format of the enzymes.txt is defined here: https://msgfplus.github.io/msgfplus/examples/enzymes.txt

I wonder how we can use this mechanism to add support for elastase.

best,

Carlos

source: https://msgfplus.github.io/msgfplus/Changelog.html
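For illustration, an enzymes.txt entry for elastase might look like this, assuming the comma-separated ShortName,CleaveAt,Terminus,Description layout of the linked example file; the cleavage residues shown are only illustrative and should be checked against your protocol:

```
# params/enzymes.txt
# ShortName,CleaveAt,Terminus,Description
elastase,AGSV,C,Elastase
```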

machine_offset_in_ppm

Ursgal has a parameter called machine_offset_in_ppm to re-calibrate MGF files. The name suggests that the input should be, e.g., 1 for a 1 ppm offset. However, the algorithms that use this parameter, e.g. mzml2mgf, expect 1e-6. Any follow-up search with an offset of, e.g., -10 will crash because all m/z values become negative.
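The unit mismatch can be sketched in two lines; the helper names are hypothetical, but they show why passing a raw ppm value where a fraction is expected makes m/z values negative:

```python
def ppm_offset_as_fraction(offset_in_ppm):
    # 1 ppm is one part per million, i.e. a fraction of 1e-6
    return offset_in_ppm * 1e-6

def calibrate_mz(mz, offset_in_ppm):
    # Apply the offset as a relative correction. Passing the raw ppm
    # value (e.g. -10) where a fraction is expected would subtract
    # 10 * mz instead of 10e-6 * mz, driving all m/z values negative.
    return mz * (1 + ppm_offset_as_fraction(offset_in_ppm))
```

For example, calibrate_mz(1000.0, -10) gives roughly 999.99, whereas mz * (1 + (-10)) would be -9000.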

setting an unknown parameter should raise an error/warning

e.g. setting uc.params['new_param_or_incorrect_spelling'] = 'working' should raise an error, because 'new_param_or_incorrect_spelling' is not in uparams. At the moment it is accepted and also written into the json, but never used, which can be confusing, especially in the case of incorrect spelling.
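One way to implement the requested behavior would be a params mapping that rejects unknown keys. A minimal sketch, not the actual ursgal API (ursgal would seed the known set from uparams):

```python
class CheckedParams(dict):
    """A parameter dict that raises on keys outside a known set.

    Hypothetical helper illustrating the requested check; in ursgal
    the known keys would come from uparams.
    """

    def __init__(self, known_keys):
        super().__init__()
        self._known = set(known_keys)

    def __setitem__(self, key, value):
        if key not in self._known:
            raise KeyError(
                'Unknown parameter {0!r}; check the spelling against '
                'uparams'.format(key)
            )
        super().__setitem__(key, value)
```

With this, a misspelled parameter fails loudly at assignment time instead of being silently written into the json.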

ERROR: coverage

Hi,
Running 'tox' tests, I get the following error from 'coverage':

py37: commands succeeded
ERROR: coverage: commands failed
docu: commands succeeded
example_scripts: commands succeeded

Any ideas what is wrong with my installation?
I'm on Windows 8.1 64-bit with an Anaconda Python installation. The same thing happens on Windows 10 too.
Is it just a "path to coverage" issue?
Regards,
Dean

Docu build fails

After the merge with the new upapa_v3 mapper branch, it should work again.

Percolator wrapper crashing when enzyme is set to nonspecific

When setting the value of the parameter enzyme to nonspecific, the following line in the percolator wrapper crashes:
cleavage_site = self.params['translations']['enzyme'].split(';')[1]

Since there is no percolator2 translation for the enzyme "nonspecific", it is not changed to the internal format, and there is no list entry 1 after splitting.

I suggest always setting enzN and enzC in the mascot wrapper to 1 (True) and the missed cleavages to zero when the enzyme is nonspecific; however, I'm not 100% sure about this.
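A defensive version of the crashing line could guard the split before indexing. A sketch with a hypothetical helper (the ';'-separated translation format is assumed from the quoted line, not taken from the percolator documentation):

```python
def cleavage_site_from_translation(translated_enzyme):
    """Return the cleavage-site field of a translated enzyme string,
    or None when the string carries no such field (e.g. enzymes like
    'nonspecific' that have no percolator translation).

    Hypothetical helper sketching a guard for
    self.params['translations']['enzyme'].split(';')[1].
    """
    parts = translated_enzyme.split(';')
    if len(parts) < 2:
        # No percolator translation available; caller decides the
        # fallback (e.g. enzN = enzC = 1 for nonspecific cleavage).
        return None
    return parts[1]
```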

issue with TMT modifications & OMSSA

Hi there,
I wanted to use Ursgal to search TMT modifications on any N-terminus and as a fixed modification on K.

I adapted the all_mods as follows:

    'K,fix,any,TMT6plex',
    '*,opt,N-term,TMT6plex',

But I get this error when Ursgal starts OMSSA (it works with most of the other search engines):

[ WARNING ]
The combination of modification name and aminoacid is not supported by
OMSSA. Continuing without modification: {'id': '737', 'org': 'K,fix,any,TMT6plex', 'name': 'TMT6plex', 'composition': {'O': 2, 'N': 1, '15N': 1, '13C': 4, 'C': 8, 'H': 20}, 'mass': 229.162932, 'unimod': True, 'aa': 'K', '_id': 3, 'pos': 'any'}

[ WARNING ]
The combination of modification name and aminoacid is not supported by
OMSSA. Continuing without modification: {'id': '737', 'org': ',opt,N-term,TMT6plex', 'name': 'TMT6plex', 'composition': {'O': 2, 'N': 1, '15N': 1, '13C': 4, 'C': 8, 'H': 20}, 'mass': 229.162932, 'unimod': True, 'aa': '', '_id': 0, 'pos': 'N-term'}

If I look in the OMSSA mods.xml file, TMT is there but has slightly different names:

<MSModSpec_name>TMT 6-plex on n-term peptide</MSModSpec_name>

<MSModSpec_name>TMT 6-plex on K</MSModSpec_name>

Any help gratefully received!
Toby
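Since the names in OMSSA's mods.xml differ from the Unimod spelling, one way to find the exact strings OMSSA expects is to list them straight from the file. A small diagnostic sketch (the path to mods.xml is whatever your install uses):

```python
import xml.etree.ElementTree as ET

def list_omssa_mod_names(mods_xml_path):
    """Return every <MSModSpec_name> text found in OMSSA's mods.xml,
    so the exact spelling OMSSA expects can be looked up."""
    names = []
    for element in ET.parse(mods_xml_path).iter():
        # tag.endswith() keeps this robust to XML namespaces
        if element.tag.endswith('MSModSpec_name'):
            names.append(element.text)
    return names
```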

install_resources.py not working properly on OS X

After executing

python install_resources.py

it all works up to

[ -<HTPP>- ] Downloading files from http://www.uni-muenster.de/Biologie.IBBP.AGFufezan/ursgal/resources/darwin/64bit/xtandem_vengeance/xtandem_vengeance.zip ...

then it just doesn't proceed any further. Aborting after 10+ minutes results in:

[  INFO   ] Please contact the Ursgal team!
                            
[   md5    ] Calculating md5 for xtandem_vengeance.zip ....

    [ WARNING ] md5 of downloaded zip file ursgal/resources/darwin/64bit/xtandem_vengeance/xtandem_vengeance.zip differs from
    [ WARNING ] md5 in knowledge base, exiting now!!!
    [  INFO   ] Please contact the Ursgal team!
                                

[ INFO ] No engines were downloaded, all should be available

The tandem_vengeance folder has been created but nothing is downloaded/extracted there.

I'm using OS X Yosemite 10.10.5

Modules not passing engine_sanity_check

Some of the example scripts seem not to be passing the engine_sanity_check.

From simple_combined_fdr_score.py:

C:\Base\ursgal\example_scripts>python simple_combined_fdr_score.py

        -\-     UController initialized (Fri Mar 29 13:17:14 2019)     -/-
         -\-    Ursgal v0.6.3  -  https://github.com/ursgal/ursgal    -/-

[ profile  ] Initializing profile LTQ XL low res
[ profile  ] 4 parameters have been updated

Traceback (most recent call last):
  File "simple_combined_fdr_score.py", line 103, in <module>
    main()
  File "simple_combined_fdr_score.py", line 71, in main
    engine='get_ftp_files_1_0_0'
  File "C:\Python37\lib\site-packages\ursgal\ucontroller.py", line 563, in fetch_file
    engine     = engine,
  File "C:\Python37\lib\site-packages\ursgal\ucontroller.py", line 2916, in execute_unode
    engine_name = self.engine_sanity_check( engine )
  File "C:\Python37\lib\site-packages\ursgal\ucontroller.py", line 512, in engine_sanity_check
    '''.format( short_engine )
AssertionError:
      The engine name "get_ftp_files_1_0_0" you have specified was not found.
      Make sure that you spelled it correctly!

I get similar output from other example scripts; usually things in the misc engines don't pass the sanity check. However, I do seem to have all of these components installed: they come up as available when I run install_resources.py (ursgal.UController() doesn't show the full list of what's available anymore?), and the files are present in the correct folders. Nosetests doesn't show any issues.

An example of something I can get to work is target_decoy_generation_example.py. It seems to have the same problem (generate_target_decoy_1_0_0 not passing engine_sanity_check), but I was able to run generate_target_decoy_1_0_0 to generate a decoy FASTA file just fine.

Could not load RT lookup dict from this location:

Windows 8. Did run into the following problem:
Could not load RT lookup dict from this location:
D:\projects\p2069\dataSearchResults\mgf\GRAY_HD_ursgal_lookup.pkl

Indeed there is no pkl file at this location.

Please find below the complete console output.

C:\Users\wolski\AppData\Local\Programs\Python\Python35\python.exe "C:\Program Files (x86)\JetBrains\PyCharm 2016.3.1\helpers\pydev\pydevd.py" --multiproc --qt-support --client 127.0.0.1 --port 64812 --file C:/Users/wolski/prog/ursgal/example_scripts/do_it_all_folder_wide.py
pydev debugger: process 5004 is connecting

Connected to pydev debugger (build 163.9735.8)

    -\-     UController initialized (Thu Dec 15 11:52:37 2016)     -/-
     -\-    Ursgal v0.4.0  -  https://github.com/ursgal/ursgal    -/-

[ ucontrol ] Initializing profile QExactive+
[ ucontrol ] 4 parameters have been updated
[ WARNING! ] Engine msamanda_1_0_0_5242 is not available in C:\Users\wolski\prog\ursgal\ursgal\resources\win32\64bit\msamanda_1_0_0_5242
[ WARNING! ] Engine msamanda_1_0_0_6299 is not available in C:\Users\wolski\prog\ursgal\ursgal\resources\win32\64bit\msamanda_1_0_0_6299
[ WARNING! ] Engine msamanda_1_0_0_7503 is not available in C:\Users\wolski\prog\ursgal\ursgal\resources\win32\64bit\msamanda_1_0_0_7503
[ WARNING! ] Engine novor_1_1beta is not available in C:\Users\wolski\prog\ursgal\ursgal\resources\win32\64bit\novor_1_1beta
[ WARNING! ] Engine pepnovo_3_1 is not available in C:\Users\wolski\prog\ursgal\ursgal\resources\win32\64bit\pepnovo_3_1

CONVERTER(s):
  0 : add_estimated_fdr_1_0_0      [         available         ] 
  1 : filter_csv_1_0_0             [         available         ] 
  2 : generate_target_decoy_1_0_0  [         available         ] 
  3 : merge_csvs_1_0_0             [         available         ] 
  4 : msgfplus2csv_v2016_09_16     [         available         ] 
  5 : mzidentml_lib_1_6_10         [         available         ] 
  6 : mzidentml_lib_1_6_11         [         available         ] 
  7 : mzml2mgf_1_0_0               [         available         ] 
  8 : sanitize_csv_1_0_0           [         available         ] 
  9 : xtandem2csv_1_0_0            [         available         ] 
DENOVO_ENGINE(s):
 10 : novor_1_1beta                [       cant find exe       ] 
 11 : pepnovo_3_1                  [       cant find exe       ] 
FETCHER(s):
 12 : get_http_files_1_0_0         [         available         ] 
META_ENGINE(s):
 13 : combine_FDR_0_1              [         available         ] 
 14 : combine_pep_1_0_0            [         available         ] 
 15 : naive_bayes_1_0_0            [         available         ] 
SEARCH_ENGINE(s):
 16 : msamanda_1_0_0_5242          [       cant find exe       ] 
 17 : msamanda_1_0_0_5243          [            n/d            ] 
 18 : msamanda_1_0_0_6299          [       cant find exe       ] 
 19 : msamanda_1_0_0_6300          [            n/d            ] 
 20 : msamanda_1_0_0_7503          [       cant find exe       ] 
 21 : msamanda_1_0_0_7504          [            n/d            ] 
 22 : msgfplus_v2016_09_16         [         available         ] 
 23 : msgfplus_v9979               [         available         ] 
 24 : myrimatch_2_1_138            [            n/d            ] 
 25 : myrimatch_2_2_140            [         available         ] 
 26 : omssa_2_1_9                  [         available         ] 
 27 : xtandem_cyclone_2010         [         available         ] 
 28 : xtandem_jackhammer           [         available         ] 
 29 : xtandem_piledriver           [         available         ] 
 30 : xtandem_sledgehammer         [         available         ] 
 31 : xtandem_vengeance            [         available         ] 
VALIDATION_ENGINE(s):
 32 : percolator_2_08              [         available         ] 
 33 : qvality_2_02                 [         available         ] 
VISUALIZER(s):
 34 : venndiagram_1_0_0            [         available         ] 

[ ucontrol ] Initializing profile QExactive+
[ ucontrol ] 4 parameters have been updated
[ ucontrol ] Preparing unode run for engine xtandem_vengeance on file(s) D:/projects/p2069/dataSearchResults/mgf/GRAY_HD\20160704_03_C_01.mgf
[ ucontrol ] Setting self.io["input"]
[ ucontrol ] Generated engine xtandem_vengeance output file name: D:\projects\p2069\dataSearchResults\mgf\GRAY_HD\xtandem_vengeance\20160704_03_C_01_xtandem_vengeance.xml.gz
[ ucontrol ] search_mgf() scheduled on input file 20160704_03_C_01.mgf
[ ucontrol ] Reason for run: Never executed before. No out_json 20160704_03_C_01_xtandem_vengeance.xml.gz.u.json found.
[ ucontrol ] Preparing json dump
[ ucontrol ] Json dumped. Path: D:\projects\p2069\dataSearchResults\mgf\GRAY_HD\xtandem_vengeance\20160704_03_C_01_xtandem_vengeance.xml.gz.u.json

    -\-     xtandem_vengeance run initialized with 20160704_03_C_01.mgf (Thu Dec 15 11:52:37 2016)     -/-

[ xtandem_ ] Will compress output 20160704_03_C_01_xtandem_vengeance.xml on the fly ... renamed temporarily params["output_file"]
[ xtandem_ ] Preparing engine
[ xtandem_ ] Starting engine
[ ]
[ Please ]
[ cite: ] Craig R, Beavis RC. (2004) TANDEM: matching proteins with tandem mass spectra.
[ ----- ]
[ ]
[ PREFLGHT ] Executing preflight sequence ...
[ xtandem_ ] wrote input file taxonomy.xml
[ xtandem_ ] wrote input file default_input.xml
[ xtandem_ ] wrote input file input.xml
[ prefligh ] Execution time 0.00 seconds
[ eXecutio ] Executing command list ...

X! TANDEM Vengeance (2015.12.15.2)

Loading spectra| (mgf)............... loaded.
Spectra matching criteria = 27889
Starting threads .|.|.|.|.|.|.|.|.|.|. started.
Computing models:
S
waiting for 234|5|6|7|8|9|10|11| done.

sequences modelled = 41 ks
Model refinement:

waiting for 2|3|4|5|6|7|8|9|10|11| done.

Merging results:
from 2..3..4..5..6..7..8..9..10..11..

Creating report:
initial calculations ..... done.
sorting ..... done.
finding repeats ..... done.
evaluating results ..... done.
calculating expectations ..... done.
writing results ..... done.

Valid models = 12517

[ executio ] Execution time 1.63 minutes
[ POSTFLGH ] Executing postflight sequence ...
[ postflig ] Execution time 0.00 seconds
[ xtandem_ ] Compressing output 20160704_03_C_01_xtandem_vengeance.xml > 20160704_03_C_01_xtandem_vengeance.xml.gz
[ run ] Execution time 2.20 minutes
[ ucontrol ] Preparing json dump
[ ucontrol ] Calculating md5 for 20160704_03_C_01_xtandem_vengeance.xml.gz ....
[ ucontrol ] Json dumped. Path: D:\projects\p2069\dataSearchResults\mgf\GRAY_HD\xtandem_vengeance\20160704_03_C_01_xtandem_vengeance.xml.gz.u.json
[ ucontrol ] Initializing profile QExactive+
[ ucontrol ] 4 parameters have been updated
[ ucontrol ] Preparing unode run for engine xtandem2csv_1_0_0 on file(s) D:\projects\p2069\dataSearchResults\mgf\GRAY_HD\xtandem_vengeance\20160704_03_C_01_xtandem_vengeance.xml.gz
[ ucontrol ] Setting self.io["input"]
[ ucontrol ] Generated engine xtandem2csv_1_0_0 output file name: D:\projects\p2069\dataSearchResults\mgf\GRAY_HD\xtandem_vengeance\20160704_03_C_01_xtandem_vengeance.csv
[ ucontrol ] execute_unode() scheduled on input file 20160704_03_C_01_xtandem_vengeance.xml.gz
[ ucontrol ] Reason for run: Never executed before. No out_json 20160704_03_C_01_xtandem_vengeance.csv.u.json found.
[ ucontrol ] Preparing json dump
[ ucontrol ] Json dumped. Path: D:\projects\p2069\dataSearchResults\mgf\GRAY_HD\xtandem_vengeance\20160704_03_C_01_xtandem_vengeance.csv.u.json

    -\-     xtandem2csv_1_0_0 run initialized with 20160704_03_C_01_xtandem_vengeance.xml.gz (Thu Dec 15 11:54:49 2016)     -/-

[ xtandem2 ] Preparing engine
[ xtandem2 ] Starting engine
[ PREFLGHT ] Executing preflight sequence ...
[ prefligh ] Execution time 0.00 seconds
[ -ENGINE- ] Executing conversion ..
Converting XTandem XML into CSV: D:\projects\p2069\dataSearchResults\mgf\GRAY_HD\xtandem_vengeance\20160704_03_C_01_xtandem_vengeance.xml.gz
[ executio ] Execution time 1.63 minutes
[ POSTFLGH ] Executing postflight sequence ...
[ postflig ] Execution time 0.00 seconds
[ run ] Execution time 2.20 minutes
[ ucontrol ] Preparing json dump
[ ucontrol ] Calculating md5 for 20160704_03_C_01_xtandem_vengeance.csv ....
[ ucontrol ] Json dumped. Path: D:\projects\p2069\dataSearchResults\mgf\GRAY_HD\xtandem_vengeance\20160704_03_C_01_xtandem_vengeance.csv.u.json
[ ucontrol ] Initializing profile QExactive+
[ ucontrol ] 4 parameters have been updated
[ ucontrol ] Preparing unode run for engine unify_csv_1_0_0 on file(s) D:\projects\p2069\dataSearchResults\mgf\GRAY_HD\xtandem_vengeance\20160704_03_C_01_xtandem_vengeance.csv
[ ucontrol ] Setting self.io["input"]
[ ucontrol ] Generated engine unify_csv_1_0_0 output file name: D:\projects\p2069\dataSearchResults\mgf\GRAY_HD\xtandem_vengeance\20160704_03_C_01_xtandem_vengeance_unified.csv
[ ucontrol ] execute_unode() scheduled on input file 20160704_03_C_01_xtandem_vengeance.csv
[ ucontrol ] Reason for run: Never executed before. No out_json 20160704_03_C_01_xtandem_vengeance_unified.csv.u.json found.
[ ucontrol ] Preparing json dump
[ ucontrol ] Json dumped. Path: D:\projects\p2069\dataSearchResults\mgf\GRAY_HD\xtandem_vengeance\20160704_03_C_01_xtandem_vengeance_unified.csv.u.json

    -\-     unify_csv_1_0_0 run initialized with 20160704_03_C_01_xtandem_vengeance.csv (Thu Dec 15 11:55:17 2016)     -/-

[ unify_cs ] Preparing engine
[ unify_cs ] Starting engine
[ ]
[ Please ]
[ cite: ] Kremer, L. P. M., Leufken, J., Oyunchimeg, P., Schulze, S. & Fufezan, C. (2016) Ursgal, Universal Python Module Combining Common Bottom-Up Proteomics Tools for Large-Scale Analysis. J. Proteome res. 15, 788-794.
[ ----- ]
[ ]
[ PREFLGHT ] Executing preflight sequence ...
[ prefligh ] Execution time 0.00 seconds
[ -ENGINE- ] Executing conversion ..
Traceback (most recent call last):
File "C:\Program Files (x86)\JetBrains\PyCharm 2016.3.1\helpers\pydev\pydevd.py", line 1596, in
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files (x86)\JetBrains\PyCharm 2016.3.1\helpers\pydev\pydevd.py", line 974, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files (x86)\JetBrains\PyCharm 2016.3.1\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/wolski/prog/ursgal/example_scripts/do_it_all_folder_wide.py", line 125, in
target_decoy_database = "D:/projects/p2069/dataSearchResults/fasta/p2069_db1_d_20160322.fasta"
File "C:/Users/wolski/prog/ursgal/example_scripts/do_it_all_folder_wide.py", line 97, in main
engine = search_engine,
File "C:\Users\wolski\prog\ursgal\ursgal\ucontroller.py", line 1843, in search
force = force,
File "C:\Users\wolski\prog\ursgal\ursgal\ucontroller.py", line 2150, in unify_csv
output_file_name = output_file_name
File "C:\Users\wolski\prog\ursgal\ursgal\ucontroller.py", line 2744, in execute_unode
force, engine_name, answer
File "C:\Users\wolski\prog\ursgal\ursgal\ucontroller.py", line 2068, in run_unode_if_required
json_path = json_path,
File "C:\Users\wolski\prog\ursgal\ursgal\unode.py", line 1243, in run
report['execution'] = self._execute()
File "C:\Users\wolski\prog\ursgal\ursgal\wrappers\unify_csv_1_0_0.py", line 68, in _execute
""".format( scan_rt_lookup_path )
AssertionError:
Could not load RT lookup dict from this location: D:\projects\p2069\dataSearchResults\mgf\GRAY_HD_ursgal_lookup.pkl

pip does not install resources

Hello,

I'm using Python 3.8.5 in Ubuntu 20.04.

The paths to the downloaded zip files when running UController.download_resources don't seem to be resolving properly. At some point it is using a relative path, but it is either being improperly defined or there is a chdir missing. When running, it fails on the first downloaded resource:

>>> ursgal.UController().download_resources()
Executable for msamanda_2_0_0_9706 is not available on your system
Engine msamanda_2_0_0_9706 cannot be downloaded automatically, please download the engine manually and move it to the appropriate folder
Executable for omssa_2_1_9 is not available on your system
[ - HTTP - ] Downloading files from https://www.sas.upenn.edu/~sschulze/ursgal_resources/ursgal/resources/linux/64bit/omssa_2_1_9/omssa_2_1_9.md ...
[ - HTTP - ] Saved file as /Data/Development/ursgal_test/venv/lib/python3.8/site-packages/ursgal/resources/linux/64bit/omssa_2_1_9/omssa_2_1_9.md
Remove tmp file: /Data/Development/ursgal_test/venv/lib/python3.8/site-packages/ursgal/resources/linux/64bit/omssa_2_1_9/omssa_2_1_9.md
MD5 check successful! The executables will be downloaded from https://www.sas.upenn.edu/~sschulze/ursgal_resources/ursgal/resources/linux/64bit/omssa_2_1_9/omssa_2_1_9.zip
[ - HTTP - ] Downloading files from https://www.sas.upenn.edu/~sschulze/ursgal_resources/ursgal/resources/linux/64bit/omssa_2_1_9/omssa_2_1_9.zip ...
[ - HTTP - ] Saved file as /Data/Development/ursgal_test/venv/lib/python3.8/site-packages/ursgal/resources/linux/64bit/omssa_2_1_9/omssa_2_1_9.zip
[   md5    ] Calculating md5 for omssa_2_1_9.zip ....
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Data/Development/ursgal_test/venv/lib/python3.8/site-packages/ursgal/ucontroller.py", line 2791, in download_resources
    calculated_zip_md5 = self.calc_md5( zip_file_name )
  File "/Data/Development/ursgal_test/venv/lib/python3.8/site-packages/ursgal/unode.py", line 205, in calc_md5
    with open(input_file, mode='rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'ursgal/resources/linux/64bit/omssa_2_1_9/omssa_2_1_9.zip'

Note that the file does indeed exist when I look for it manually. Comparing the file it downloaded with the file it is looking for, the only way it would ever find it is if the working directory is the Python environment's site-packages directory (because the path used is relative):

[ - HTTP - ] Saved file as /Data/Development/ursgal_test/venv/lib/python3.8/site-packages/ursgal/resources/linux/64bit/omssa_2_1_9/omssa_2_1_9.zip
FileNotFoundError: [Errno 2] No such file or directory: 'ursgal/resources/linux/64bit/omssa_2_1_9/omssa_2_1_9.zip'

If I chdir into the environment's site-packages directory before calling UController().download_resources() then everything runs fine.
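The workaround can be scripted. A sketch for locating the right directory to chdir into; the helper is hypothetical, and only the commented-out lines touch ursgal itself:

```python
import importlib.util
import os

def site_packages_dir_of(package_name):
    """Return the directory that contains the installed package,
    i.e. the directory against which relative paths like
    'ursgal/resources/...' would resolve."""
    spec = importlib.util.find_spec(package_name)
    # spec.origin points at the package's __init__.py
    return os.path.dirname(os.path.dirname(spec.origin))

# Workaround: chdir into site-packages before downloading, e.g.
#     os.chdir(site_packages_dir_of('ursgal'))
#     ursgal.UController().download_resources()
```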

I'd also mention that the download_resources function doesn't seem to be mentioned in the Introduction section of the documentation. I see that it is used to download things automatically when installing via setup.py, but when installing via pip it isn't invoked, so you have to call it manually. That took a bit of digging to figure out, as there is no mention that you need to do it yourself.

Best wishes,
Kevin

Error in _group_psms with bigger_scores_better param

When setting the param 'bigger_scores_better' without setting 'validation_score_field', we get this error:

bigger_scores_better = self.UNODE_UPARAMS['bigger_scores_better']['uvalue_style_translation'][search_engine]
UnboundLocalError: local variable 'search_engine' referenced before assignment

This is due to the fact that search_engine is only set when validation_score_field is None.
I guess we should require that both of these parameters are set.
