pyprophet / pyprophet

PyProphet: Semi-supervised learning and scoring of OpenSWATH results.

Home Page: http://www.openswath.org

License: BSD 3-Clause "New" or "Revised" License

Python 96.02% Shell 0.59% R 0.31% Dockerfile 0.11% Cython 2.97%
proteomics swath-ms openswath mass-spectrometry data-independent-acquisition semi-supervised-learning python

pyprophet's Introduction

PyProphet

PyProphet: Semi-supervised learning and scoring of OpenSWATH results.

PyProphet is a Python re-implementation of the mProphet algorithm [1] optimized for SWATH-MS data acquired by data-independent acquisition (DIA). The algorithm was originally published in [2] and has since been extended to support new data types and analysis modes [3,4].

Please consult the OpenSWATH website for usage instructions and help.

  1. Reiter L, Rinner O, Picotti P, Hüttenhain R, Beck M, Brusniak MY, Hengartner MO, Aebersold R. mProphet: automated data processing and statistical validation for large-scale SRM experiments. Nat Methods. 2011 May;8(5):430-5. doi: 10.1038/nmeth.1584. Epub 2011 Mar 20.

  2. Teleman J, Röst HL, Rosenberger G, Schmitt U, Malmström L, Malmström J, Levander F. DIANA--algorithmic improvements for analysis of data-independent acquisition MS data. Bioinformatics. 2015 Feb 15;31(4):555-62. doi: 10.1093/bioinformatics/btu686. Epub 2014 Oct 27.

  3. Rosenberger G, Liu Y, Röst HL, Ludwig C, Buil A, Bensimon A, Soste M, Spector TD, Dermitzakis ET, Collins BC, Malmström L, Aebersold R. Inference and quantification of peptidoforms in large sample cohorts by SWATH-MS. Nat Biotechnol 2017 Aug;35(8):781-788. doi: 10.1038/nbt.3908. Epub 2017 Jun 12.

  4. Rosenberger G, Bludau I, Schmitt U, Heusel M, Hunter CL, Liu Y, MacCoss MJ, MacLean BX, Nesvizhskii AI, Pedrioli PGA, Reiter L, Röst HL, Tate S, Ting YS, Collins BC, Aebersold R. Statistical control of peptide and protein error rates in large-scale targeted data-independent acquisition analyses. Nat Methods. 2017 Sep;14(9):921-927. doi: 10.1038/nmeth.4398. Epub 2017 Aug 21.

Installation

We strongly advise installing PyProphet in a Python virtualenv. PyProphet is compatible with Python 3.

Install the development version of pyprophet from GitHub:

    $ pip install git+https://github.com/PyProphet/pyprophet.git@master

Install the stable version of pyprophet from the Python Package Index (PyPI):

    $ pip install pyprophet

Running pyprophet

pyprophet is not only a Python package, but also a command line tool:

   $ pyprophet --help

or:

   $ pyprophet score --in=tests/test_data.txt

Docker

PyProphet is also available from Docker (automated builds):

Pull the stable version (e.g. 2.1.2) of pyprophet from DockerHub (synced with releases):

    $ docker pull pyprophet/pyprophet:2.1.2

Running tests

The pyprophet tests are best executed using py.test and the pytest-regtest plugin:

    $ pip install pytest
    $ pip install pytest-regtest
    $ py.test ./tests

pyprophet's People

Contributors

danielrhyduke, fickludd, grosenberger, guoci, hroest, jcharkow, oliveralka, singjc, uweschmitt

pyprophet's Issues

error of using pyprophet-cli

Hi,
I have installed pyprophet (version 2.1.10) and pyprophet-cli (version 0.0.19). However, an error occurred when I used pyprophet-cli.

The error output is as follows:

    Traceback (most recent call last):
      File "C:\Miniconda3\Scripts\pyprophet-cli-script.py", line 11, in <module>
        load_entry_point('pyprophet-cli==0.0.19', 'console_scripts', 'pyprophet-cli')()
      File "C:\Miniconda3\lib\site-packages\pkg_resources\__init__.py", line 484, in load_entry_point
        return get_distribution(dist).load_entry_point(group, name)
      File "C:\Miniconda3\lib\site-packages\pkg_resources\__init__.py", line 2707, in load_entry_point
        return ep.load()
      File "C:\Miniconda3\lib\site-packages\pkg_resources\__init__.py", line 2325, in load
        return self.resolve()
      File "C:\Miniconda3\lib\site-packages\pkg_resources\__init__.py", line 2331, in resolve
        module = __import__(self.module_name, fromlist=['__name__'], level=0)
      File "C:\Miniconda3\lib\site-packages\pyprophet_cli-0.0.19-py3.7.egg\pyprophet_cli\main.py", line 17, in <module>
        from . import score
      File "C:\Miniconda3\lib\site-packages\pyprophet_cli-0.0.19-py3.7.egg\pyprophet_cli\score.py", line 16, in <module>
        from pyprophet.stats import (calculate_final_statistics, summary_err_table,
    ImportError: cannot import name 'calculate_final_statistics' from 'pyprophet.stats' (C:\Miniconda3\lib\site-packages\pyprophet-2.1.10-py3.7-win-amd64.egg\pyprophet\stats.py)

How can I fix this error?

I also noticed that pyprophet-cli hasn't been updated in a long time. How should the current pyprophet be used instead?

Thanks!

pip install fails claiming that numpy is not installed

Platform: Windows Server; both installation methods fail.

Both pip install pyprophet and pip install git+https://github.com/PyProphet/pyprophet.git@master fail on numpy even though numpy is installed:

PS C:\Windows\system32> pip install git+https://github.com/PyProphet/pyprophet.git@master
Collecting git+https://github.com/PyProphet/pyprophet.git@master
  Cloning https://github.com/PyProphet/pyprophet.git (to revision master) to c:\users\ketilkl-null\appdata\local\temp\pip-req-build-rgg9yyv9
  Running command git clone --filter=blob:none --quiet https://github.com/PyProphet/pyprophet.git 'C:\Users\ketilkl-null\AppData\Local\Temp\pip-req-build-rgg9yyv9'
  Resolved https://github.com/PyProphet/pyprophet.git to commit 31eecfe066b41533b707658a4e62edcab33866e8
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [20 lines of output]
      Traceback (most recent call last):
        File "C:\Python312\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
          main()
        File "C:\Python312\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Python312\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
                 ^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\ketilkl-null\AppData\Local\Temp\pip-build-env-46tms50f\overlay\Lib\site-packages\setuptools\build_meta.py", line 325, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\ketilkl-null\AppData\Local\Temp\pip-build-env-46tms50f\overlay\Lib\site-packages\setuptools\build_meta.py", line 295, in _get_build_requires
          self.run_setup()
        File "C:\Users\ketilkl-null\AppData\Local\Temp\pip-build-env-46tms50f\overlay\Lib\site-packages\setuptools\build_meta.py", line 487, in run_setup
          super().run_setup(setup_script=setup_script)
        File "C:\Users\ketilkl-null\AppData\Local\Temp\pip-build-env-46tms50f\overlay\Lib\site-packages\setuptools\build_meta.py", line 311, in run_setup
          exec(code, locals())
        File "<string>", line 2, in <module>
      ModuleNotFoundError: No module named 'numpy'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

pyprophet export ends w/o any output

Using the version from dockerhub

$~ pyprophet --version
pyprophet, version 2.1.3

The osw file is ~160 MB, produced by merging several osw files, scoring ms1 and ms2, and running peptide and protein inference in all three contexts. Then,

$~ pyprophet export --in=/scratch/runs_merged.osw --out=/scratch/whatever
Info: Reading peak group-level results.
Info: Reading transition-level results.
Info: Reading protein identifiers.
Info: Reading peptide-level results.
Info: Reading protein-level results.

results in no new files. The same pipeline works well with SGS. I don't know if it is related to #49.
How can I check what is going on? Is there a debug or verbose flag? How could I check whether my merged osw just contains garbage (I imagine something like COUNT ... WHERE SCORE_MS2.QVALUE < 0.05 == 0)?
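As far as I know there is no verbose flag in that version, but an .osw file is plain SQLite, so a sanity check along the lines sketched in the question can be run directly. A minimal sketch, assuming the standard OSW schema with a SCORE_MS2 table that carries a QVALUE column:

```python
import sqlite3

def count_confident_peakgroups(osw_path, qvalue_cutoff=0.05):
    """Count peak groups passing the q-value cutoff in an OSW file.

    Assumes the standard OSW schema with a SCORE_MS2 table; if this
    returns zero, the export has nothing to write out.
    """
    con = sqlite3.connect(osw_path)
    try:
        (n,) = con.execute(
            "SELECT COUNT(*) FROM SCORE_MS2 WHERE QVALUE < ?",
            (qvalue_cutoff,),
        ).fetchone()
    finally:
        con.close()
    return n
```

If the count is zero, the merged file likely contains no confident identifications rather than the export step being broken.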

AssertionError: Column group_id is not in input file(s).

Hello,

I encountered an error when going from OpenSwathWorkflow into pyprophet. As some background, I have one .mzML file that I aligned with a library with the .TraML extension, not a .pqp file. The library is the one from SWATHAtlas (SAL00026). Therefore, I cannot output a .osw from OpenSwathWorkflow, so my input to pyprophet is a TSV file. Pyprophet did not throw an error at this, so I assume it is okay. After pyprophet was initiated, I got the error "AssertionError: Column group_id is not in input file(s)." I opened the TSV file that is output by OpenSwathWorkflow, and indeed the column "group_id" is not available, though there are a few others such as "id" or "transition_group_id." I am not sure if one of these is what is supposed to be read as the group id. I am using openms version 2.4.0.

I have attached the error report for the pyprophet section of my code, and the full error report from the start of OpenSwathWorkflow through pyprophet. (I ran them as one script and simply copy/pasted the pyprophet section to make it easier to find.)

Thank you for your consideration.

Regards,
Andrew Sweatt

error_report_full.txt
pyprophet_error_report.txt

SVD does not converge

Hi,

I'm getting a couple of problems when trying to run pyprophet on my data:

My target/decoy distribution is quite uneven, which may be the source of the problem:

Info: Data set contains 262 decoy and 2601 target groups.

When running pyprophet I often run into an SVD convergence problem:

  File "/anaconda2/lib/python2.7/site-packages/sklearn/discriminant_analysis.py", line 384, in _solve_svd
    U, S, V = linalg.svd(X, full_matrices=False)
  File "/anaconda2/lib/python2.7/site-packages/scipy/linalg/decomp_svd.py", line 132, in svd
    raise LinAlgError("SVD did not converge")
numpy.linalg.linalg.LinAlgError: SVD did not converge

or a segmentation fault:

Info: Start learning on 10 folds using 1 processes.
Info: Learning on cross-validation fold.
Info: Learning on cross-validation fold.
Info: Learning on cross-validation fold.
Info: Learning on cross-validation fold.
Segmentation fault: 11

However, after a couple of retries it sometimes works.

Question about calculating scores based on the LDA scalings_

Dear @grosenberger , @uweschmitt , @hroest ,

After the LDA is fitted on the training data, the test data is scored using the fitted model parameters. As far as I know, there are two ways to calculate the scores:

  1. LinearDiscriminantAnalysis().transform(). This function projects the features into the new, smaller subspace. In effect, it scores as: np.dot(X - lda.xbar_, lda.scalings_)

  2. LinearDiscriminantAnalysis().predict(). Internally, this function determines the classification based on: np.dot(X, lda.coef_.T) + lda.intercept_

reference can be found here

But in classifiers.py, the score() function is just clf_scores = np.dot(X, lda.scalings_). Confusingly, in start_semi_supervised_learning the scores are recentred with clf_scores -= np.mean(clf_scores) (the mean of clf_scores is not always zero?), but in iter_semi_supervised_learning they are not.

In conclusion, I question the correctness of the score formulation used by pyprophet. Should it be np.dot(X - lda.xbar_, lda.scalings_) instead of np.dot(X, lda.scalings_)? Perhaps the two methods make little difference to the final result.
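For illustration: the two formulations differ only by a constant offset, np.dot(xbar_, scalings_), which is the same for every row, so the ranking of peak groups (and anything derived from ranks after recentring) is unchanged. A small numpy sketch with made-up values (not taken from pyprophet's classifiers.py):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))          # feature matrix
scalings = rng.normal(size=(4, 1))     # stand-in for lda.scalings_
xbar = rng.normal(size=4)              # stand-in for lda.xbar_

centered = np.dot(X - xbar, scalings)  # what transform() computes
plain = np.dot(X, scalings)            # what classifiers.py computes

# The difference is the same constant for every row...
offset = np.dot(xbar, scalings)
assert np.allclose(plain - centered, offset)
# ...so the ranking of peak groups is identical.
assert (np.argsort(centered, axis=0) == np.argsort(plain, axis=0)).all()
```

This suggests the choice only matters for the absolute score values, which the subsequent recentring step addresses anyway.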

Thanks.

Error message: Missing SCORE_MS2 table

When performing transition level scoring (with IPF) and the SCORE_MS2 table is missing, the error message is not informative. It just throws an error saying that. It would be better to inform the user which command to run first and what the issue is.

Remove m_score filtering?

Hi,

Is it possible to remove the m_score filtering? The output seems to be filtered at m_score > 0.05. I have tried increasing the values of ss_initial_fdr and ss_iteration_fdr without success.

Thanks,
Patrick

pi0 in legacy code

Dear @uweschmitt ,

I found that the FDR calculation in the legacy and master code is the same except for the pi0 estimation. In legacy, pi0 is fixed to 0.4 by default; in master, pi0 is estimated by the smoother or bootstrap method. Am I right?

Thanks.
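For reference, the raw Storey estimate behind both the smoother and bootstrap variants is pi0(lambda) = #{p_i > lambda} / (m * (1 - lambda)), evaluated over a grid of lambda values. A minimal sketch of that raw estimate (illustrative, not pyprophet's stats.py implementation):

```python
import numpy as np

def pi0_raw(pvalues, lambdas=np.arange(0.05, 0.96, 0.05)):
    """Storey's raw pi0 estimates over a grid of lambda values.

    pi0(lambda) = #{p > lambda} / (m * (1 - lambda)); the smoother or
    bootstrap step then picks a final value from this curve.
    """
    p = np.asarray(pvalues)
    m = len(p)
    return np.array([(p > l).sum() / (m * (1.0 - l)) for l in lambdas])
```

On purely null (uniform) p-values the curve hovers around 1; the more true targets there are, the lower the plateau at large lambda.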

pyprophet export ignores --out parameter

E.g.

$ time pyprophet export --in ../olgas_K121026_013_SW_Wayne_R3_d00.new.osw   --max_rs_peakgroup_qvalue 1.0 --out test.csv
Info: Reading peak group-level results.
Info: Reading transition-level results.
Info: Reading protein identifiers.

real    0m33.396s
user    0m25.228s
sys     0m5.372s
(py27_bleeding) hr@hr-Precision-5520:~/openmsall/builds/openms_trunk2/t5$ ls -ltrh
total 188M
-rw-rw-r-- 1 hr hr 188M Feb 20 13:31 olgas_K121026_013_SW_Wayne_R3_d00.mzML.gz.tsv

no test.csv file is produced; instead the file is named after the "run" filename with the full "mzML" and "gz" extensions attached: schubert_chapter/raw/olgas_K121026_013_SW_Wayne_R3_d00.mzML.gz

Error: At least 10 decoy groups and 10 target groups are required

Hi there,

I'm pretty novice in proteomic tools and I try to use OpenSwath on our own data.

I start by launching this command:

    OpenSwathWorkflow \
        -in {input} \
        -tr iRTassays.pqp \
        -tr_irt iRTassays.TraML \
        -mz_extraction_window 10 -ppm \
        -RTNormalization:alignmentMethod linear \
        -Scoring:stop_report_after_feature 5 \
        -swath_windows_file {config[window_file]} \
        -Library:override_group_label_check \
        -sort_swath_maps \
        -out_osw {output} \
        > {log.stdout} 2> {log.stderr}

The parameter -Library:override_group_label_check was added because of this warning: "Warning: Found multiple peptide sequences for peptide label group light. This is most likely an error and to fix this, a new peptide label group will be inferred - to override this decision, please use the override_group_label_check parameter". Even when I add the option, nothing seems to change.

The command seems to work fine, and afterwards I can merge my osw files. But when I try to launch the scoring command with pyprophet score I get "Error: At least 10 decoy groups and 10 target groups are required".

I don't understand why I get this issue. Should I add other parameters in the previous steps?

Thanks for your help.

Loïc.

Transition group ids do not form unique blocks in data file

Hello,
I have a set of runs, all processed through OpenMS 2.1.0 with the same parameters, but when I try to use pyprophet 0.24.1 it only works on some of them; the others give me the error "Exception: transition group ids do not form unique blocks in data file"

What can be the problem?

Thanks
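The check behind that exception requires each transition group's rows to be contiguous in the TSV. A quick way to find offending groups — a sketch assuming the OpenSWATH column name transition_group_id:

```python
def non_contiguous_groups(group_ids):
    """Return group ids whose rows appear in more than one contiguous block."""
    seen, current, bad = set(), None, set()
    for gid in group_ids:
        if gid != current:
            if gid in seen:
                bad.add(gid)   # group reappears after a different group
            seen.add(gid)
            current = gid
    return bad
```

Run this over the transition_group_id column (e.g. via csv.DictReader); if the returned set is non-empty, sorting the file by that column should fix the input.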

singularity pull issue

Hi,

I want to use it in Singularity containers, but when I run a command like:

singularity pull docker://pyprophet/pyprophet:2.1.3

it does not work. Do you know the correct way to run it, e.g. via the container's executables?

_optimize.pyx missing from tarball on pypi

The automated update on bioconda of pyprophet (bioconda/bioconda-recipes#26899 ) is throwing an error:

11:32:42 BIOCONDA INFO (OUT) Processing $SRC_DIR
11:32:43 BIOCONDA INFO (OUT)     ERROR: Command errored out with exit status 1:
11:32:43 BIOCONDA INFO (OUT)      command: /opt/conda/conda-bld/pyprophet_1613734077926/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-4kgkzoca/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-4kgkzoca/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-d71avhe_
11:32:43 BIOCONDA INFO (OUT)          cwd: /tmp/pip-req-build-4kgkzoca/
11:32:43 BIOCONDA INFO (OUT)     Complete output (11 lines):
11:32:43 BIOCONDA INFO (OUT)     Traceback (most recent call last):
11:32:43 BIOCONDA INFO (OUT)       File "<string>", line 1, in <module>
11:32:43 BIOCONDA INFO (OUT)       File "/tmp/pip-req-build-4kgkzoca/setup.py", line 56, in <module>
11:32:43 BIOCONDA INFO (OUT)         ext_modules=cythonize(ext_modules, compiler_directives={'language_level' : "2"}),
11:32:43 BIOCONDA INFO (OUT)       File "/opt/conda/conda-bld/pyprophet_1613734077926/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.6/site-packages/Cython/Build/Dependencies.py", line 972, in cythonize
11:32:43 BIOCONDA INFO (OUT)         aliases=aliases)
11:32:43 BIOCONDA INFO (OUT)       File "/opt/conda/conda-bld/pyprophet_1613734077926/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.6/site-packages/Cython/Build/Dependencies.py", line 815, in create_extension_list
11:32:43 BIOCONDA INFO (OUT)         for file in nonempty(sorted(extended_iglob(filepattern)), "'%s' doesn't match any files" % filepattern):
11:32:43 BIOCONDA INFO (OUT)       File "/opt/conda/conda-bld/pyprophet_1613734077926/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.6/site-packages/Cython/Build/Dependencies.py", line 114, in nonempty
11:32:43 BIOCONDA INFO (OUT)         raise ValueError(error_msg)
11:32:43 BIOCONDA INFO (OUT)     ValueError: 'pyprophet/_optimized.pyx' doesn't match any files

Inspecting the package, there is indeed no .pyx file, just the .c file produced by cython. This appears to be related to #95 (specifically: https://github.com/PyProphet/pyprophet/pull/91/files#diff-60f61ab7a8d1910d86d9fda2261620314edcae5894d5aaa236b821c7256badd7R7 )

Assay RT calculation is incorrect

FEATURE.EXP_RT - FEATURE.DELTA_RT AS assay_rt,

To get assay_rt, we should be doing exp_rt - lib2expTrafo(delta_rt). Currently, we are subtracting library_space time from experimental_space time. lib2expTrafo function would convert delta_rt from library_space to experimental_space.
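A sketch of the proposed calculation, where lib2exp_trafo stands in for a hypothetical linear library-to-experimental RT alignment (the function name, slope, and intercept are illustrative, not pyprophet's API):

```python
def lib2exp_trafo(rt_lib, slope, intercept):
    """Map a retention time from library space to experimental space
    with a linear alignment model (illustrative only)."""
    return slope * rt_lib + intercept

def assay_rt(exp_rt, delta_rt, slope, intercept):
    # The current SQL subtracts a library-space quantity (DELTA_RT)
    # from an experimental-space time (EXP_RT); the proposal is to
    # transform delta_rt into experimental space first.
    return exp_rt - lib2exp_trafo(delta_rt, slope, intercept)
```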

sqlite3.OperationalError: no such table: PYPROPHET_WEIGHTS

Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/pandas-1.2.2-py3.9-linux-x86_64.egg/pandas/io/sql.py", line 1697, in execute
cur.execute(*args, **kwargs)
sqlite3.OperationalError: no such table: PYPROPHET_WEIGHTS

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/pyprophet-2.1.10-py3.9-linux-x86_64.egg/pyprophet/runner.py", line 455, in init
data = pd.read_sql_query("SELECT * FROM PYPROPHET_WEIGHTS WHERE LEVEL=='%s'" % self.level, con)
File "/usr/local/lib/python3.9/site-packages/pandas-1.2.2-py3.9-linux-x86_64.egg/pandas/io/sql.py", line 377, in read_sql_query
return pandas_sql.read_query(
File "/usr/local/lib/python3.9/site-packages/pandas-1.2.2-py3.9-linux-x86_64.egg/pandas/io/sql.py", line 1743, in read_query
cursor = self.execute(*args)
File "/usr/local/lib/python3.9/site-packages/pandas-1.2.2-py3.9-linux-x86_64.egg/pandas/io/sql.py", line 1709, in execute
raise ex from exc
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT * FROM PYPROPHET_WEIGHTS WHERE LEVEL=='ms1ms2'': no such table: PYPROPHET_WEIGHTS
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/pandas-1.2.2-py3.9-linux-x86_64.egg/pandas/io/sql.py", line 1697, in execute
cur.execute(*args, **kwargs)
sqlite3.OperationalError: no such table: PYPROPHET_WEIGHTS

for run in BGS_*.osw
do
run_subsampled=${run}s # generates .osws files
pyprophet subsample --in=$run --out=$run_subsampled --subsample_ratio=0.1
done

pyprophet merge --template=MCB_spikearabi_10percent_MSF_DDA_lib.withDecoys.pqp --out=model.osw *.osws

pyprophet score \
--in=model.osw \
--classifier=XGBoost \
--level=ms1ms2 \
--ss_iteration_fdr=0.01 \
--parametric \
--threads=48

for run in BGS_*.osw
do
pyprophet score --in=$run --apply_weights=model.osw --level=ms1ms2
done

Mapping d_scores with p-values

Not really an issue, more of a question: I was wondering if there's a way to map d_scores in the [FILE]_with_dscore.csv output to p-values (or q-values)?
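One illustrative approach: using the decoy d_scores from the same output, an empirical p-value for each target is the fraction of decoys scoring at least as high. This is a sketch of the general mProphet idea, not the exact procedure in pyprophet's stats.py:

```python
import numpy as np

def empirical_pvalues(target_scores, decoy_scores):
    """p-value of each target = fraction of decoy d_scores >= it."""
    decoys = np.sort(np.asarray(decoy_scores))
    n = len(decoys)
    # number of decoys strictly below each target score
    below = np.searchsorted(decoys, np.asarray(target_scores), side="left")
    return (n - below) / n
```

Standard FDR estimation on these p-values then yields q-values; note that the legacy CSV output's m_score column is already a q-value.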

sqlite3.OperationalError: database is locked

I met an error when I ran my code as follows:
[screenshot of the error]

Besides, I have run into repeated problems when installing pyprophet. The environment has Python 3.9.13 and numpy 1.24.3; the installation was done via pip install git+https://github.com/PyProphet/pyprophet (with installation warnings: WARNING: Ignoring invalid distribution -ip (e:\tools\gprodia\gprodia-main\data\result\syn\cleanenv\lib\site-packages)
WARNING: You are using pip version 22.0.4; however, version 23.1.2 is available.) I have also installed Visual Studio with MSVC v140, MSVC v143, the Windows 10 and 11 SDKs, and C++ CMake.
Is there anything wrong with my installation?

Thank you very much. Looking forward to your reply.

Request: Ability to keep decoys during scoring->export

Hello,
I am playing with processing some data using the OpenSwathWorkflow (OpenMS) osw outputs to pyprophet (the main branch on GitHub) to feature_alignment.py (msproteomicstools main branch) to SWATH2stats (slightly modified main branch) to MSstats (significantly modified main branch), and comparing the result to what happens when I cast the intensity matrix to an ExpressionSet and pass it to limma/DESeq2/edgeR.

In a previous iteration of the same process, I used the tsv outputs from openMS etc and was able to explicitly look at my decoy scores from the beginning to the end.

All the pieces are mostly working as expected; but I am noticing that when I get to the export stage in pyprophet I am losing all the decoys in the resulting tsv. Looking more closely at the git repository, I am seeing that there are explicit exclusions of the decoy rows in runner.py, ipf.py, and export.py.
Therefore I am able to see the decoy entries if I tail the data at line 146 of export.py but they are gone after the merge on line 189.

In my own exploration into the score/export process, I have played with removing the portions of the where clauses which explicitly remove the decoys: (lines 167 of export.py, line 101 of runner.py, and a few places in ipf.py). I quickly realized this is a bad idea, as it messes up the enumeration at lines 165-168 of data_handling.py.

My primary question: Is there a specific reason to remove the decoys when scoring/exporting? If so, what is it, and why then keep the decoy columns in the data?
My secondary question: Assuming I can work through the existing logic and parameterize the inclusion/exclusion of the decoys, would that be worth submitting as a pull request?

Thank you for your time.

Metabolomics - Global Context

Hi,
is it possible, for metabolomics data, to process OpenSwath output with PyProphet in the global context in order to filter a pan library down to a subset library?

Greetings

SWATH2stats analysis

Is pyprophet able to export datasets compatible with the R package SWATH2stats?

I exported a tsv using the following command:

pyprophet export --in=data/merged.osw --out=data/legacy.tsv

The file does not look like the one I can see in the R package. Do I need to use TRIC? Is it possible to use TRIC with one single file?

Py3 support

Just wondering if there are any plans for Python 3 support? I've just run a 2to3 conversion and many tests pass, but many fail due to str/bytes comparison issues.

Cheers

sqlite3.IntegrityError

[screenshot of the error]

pyprophet=...../pyprophet
for run in *.osw 
do
$pyprophet score --in=$run --apply_weights=model.osw --level=ms1ms2
done

for run in *.osw 
do
run_reduced=${run}r 
$pyprophet reduce --in=$run --out=$run_reduced
done

$pyprophet merge --template=X_decoys.PQP --out=model_global.osw *.oswr

$pyprophet peptide --context=global --in=model_global.osw

$pyprophet protein --context=global --in=model_global.osw

for run in *.osw 
do
run_reduced=${run}r
$pyprophet backpropagate --in=$run --apply_scores=model_global.osw
done


for run in *.osw 
do
 $pyprophet export --in=$run --max_global_peptide_qvalue=0.05 --max_global_protein_qvalue=0.05  
done

Hi, we used our routine command lines to run PyProphet, but we received the above error for this 6h data set. Could you please give us a hint about this error? Is the problem coming from the osw results or from the PQP library?

ImportError: cannot import name 'factorial'

Traceback (most recent call last):
  File "/anaconda3/bin/pyprophet", line 6, in <module>
    from pyprophet.main import cli
  File "/anaconda3/lib/python3.6/site-packages/pyprophet/main.py", line 6, in <module>
    from .runner import PyProphetLearner, PyProphetWeightApplier
  File "/anaconda3/lib/python3.6/site-packages/pyprophet/runner.py", line 12, in <module>
    from .pyprophet import PyProphet
  File "/anaconda3/lib/python3.6/site-packages/pyprophet/pyprophet.py", line 11, in <module>
    from .stats import (lookup_values_from_error_table, error_statistics,
  File "/home/daishaozheng_beihang/anaconda3/lib/python3.6/site-packages/pyprophet/stats.py", line 14, in <module>
    from statsmodels.nonparametric.kde import KDEUnivariate
  File "/anaconda3/lib/python3.6/site-packages/statsmodels/nonparametric/kde.py", line 21, in <module>
    from statsmodels.sandbox.nonparametric import kernels
  File "/anaconda3/lib/python3.6/site-packages/statsmodels/sandbox/nonparametric/kernels.py", line 24, in <module>
    from scipy.misc import factorial
ImportError: cannot import name 'factorial'

pyprophet seems incompatible with scipy 1.3.1. If scipy is downgraded to 1.2.1, pyprophet works. In scipy 1.3.1 the location of factorial is now: from scipy.special import factorial. Would you consider updating the code to be compatible with the newer version, or should we set up a conda environment with scipy 1.2?
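Until the dependencies are updated, a common version-tolerant workaround is an import shim that tries the new location first (the math fallback is only there so this sketch runs without scipy installed):

```python
try:
    from scipy.special import factorial  # scipy >= 1.3
except ImportError:
    try:
        from scipy.misc import factorial  # scipy <= 1.2
    except ImportError:
        # scipy absent entirely: stdlib fallback so the sketch still runs
        from math import factorial
```

Note that in this issue the failing import actually lives in statsmodels, so pinning or upgrading statsmodels is the more direct fix than patching pyprophet.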

merge step fails with `sqlite3.OperationalError: table FEATURE has 3 columns but 9 values were supplied`

I am running the three PASS00779 mzMLs through OpenSwathWorkflow (current OpenMS develop 2.6beta) and postprocess the resulting osw files using pyprophet (version 2.1.5).

Running the second merge step in pyprophet results in

Running: pyprophet merge --template=model.osw --out=model_global.osw C:/Users/bielow/SwathWizardOut/olgas_K121026_007_SW_Wayne_R2_d00_copy.osw C:/Users/bielow/SwathWizardOut/olgas_K121026_013_SW_Wayne_R3_d00_copy.osw C:/Users/bielow/SwathWizardOut/olgas_K121026_001_SW_Wayne_R1_d00_copy.osw

Info: Merged runs of file C:/Users/bielow/SwathWizardOut/olgas_K121026_007_SW_Wayne_R2_d00_copy.osw to model_global.osw.
Info: Merged runs of file C:/Users/bielow/SwathWizardOut/olgas_K121026_013_SW_Wayne_R3_d00_copy.osw to model_global.osw.
Info: Merged runs of file C:/Users/bielow/SwathWizardOut/olgas_K121026_001_SW_Wayne_R1_d00_copy.osw to model_global.osw.



Traceback (most recent call last):
  File "C:\WinPython\3.7\python-3.7.0.amd64\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\WinPython\3.7\python-3.7.0.amd64\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\WinPython\3.7\python-3.7.0.amd64\Scripts\pyprophet.exe\__main__.py", line 9, in <module>
  File "C:\WinPython\3.7\python-3.7.0.amd64\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "C:\WinPython\3.7\python-3.7.0.amd64\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "C:\WinPython\3.7\python-3.7.0.amd64\lib\site-packages\click\core.py", line 1092, in invoke
    rv.append(sub_ctx.command.invoke(sub_ctx))
  File "C:\WinPython\3.7\python-3.7.0.amd64\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\WinPython\3.7\python-3.7.0.amd64\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "C:\WinPython\3.7\python-3.7.0.amd64\lib\site-packages\pyprophet\main.py", line 271, in merge
    merge_osw(infiles, outfile, templatefile, same_run)
  File "C:\WinPython\3.7\python-3.7.0.amd64\lib\site-packages\pyprophet\levels_contexts.py", line 494, in merge_osw
    merge_oswr(infiles, outfile, templatefile, same_run)
  File "C:\WinPython\3.7\python-3.7.0.amd64\lib\site-packages\pyprophet\levels_contexts.py", line 735, in merge_oswr
    c.executescript('ATTACH DATABASE "%s" AS sdb; INSERT INTO FEATURE SELECT * FROM sdb.FEATURE; DETACH DATABASE sdb;' % infile)
sqlite3.OperationalError: table FEATURE has 3 columns but 9 values were supplied

Not sure why this happens.
This is how I got there (this is C++ code, but the calls should be recognizable). The last call fails (see above).
osws is a list of 3 input files in osw format.

QString pp = "pyprophet";
// list of calls to make: exe, args, [optional] list of args to append one-by-one in a loop
std::vector<std::tuple<QString, QStringList, QStringList>> calls;
// merge all osws ...
calls.emplace_back(pp, QStringList() << "merge" << "--template=" + library << "--out=model.osw" << osws, QStringList());
// to build a common model --> creates merged_ms1ms2_report.pdf
calls.emplace_back(pp, QStringList() << "score" << "--in=model.osw" << "--level=ms1ms2", QStringList());
// apply in loop
calls.emplace_back(pp, QStringList() << "score" << "--apply_weights=model.osw" << "--level=ms1ms2" << "--in", osws);
// merge again for peptide and protein error rate control
calls.emplace_back(pp, QStringList() << "merge" << "--template=model.osw" << "--out=model_global.osw" << osws, QStringList());

Do I need additional flags (maybe even when running OpenSwathWorkflow)?

Decoy peptides are counted in pyprophet statistics

Not sure if this is intended behavior, but when displaying the total number of peptides and proteins identified in the global context, it seems that decoys are included in the count.
If this is intended, it might be useful to also display the counts without decoys.
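A decoy-free count can be recomputed directly from the merged file. A sketch assuming the standard OSW schema (a DECOY flag on the PEPTIDE table, and a SCORE_PEPTIDE table with CONTEXT and QVALUE columns):

```python
import sqlite3

def count_target_peptides(osw_path, max_qvalue=0.01):
    """Count target (non-decoy) peptides passing the global q-value cutoff."""
    con = sqlite3.connect(osw_path)
    try:
        (n,) = con.execute(
            """SELECT COUNT(DISTINCT PEPTIDE.ID)
               FROM PEPTIDE
               JOIN SCORE_PEPTIDE ON SCORE_PEPTIDE.PEPTIDE_ID = PEPTIDE.ID
               WHERE PEPTIDE.DECOY = 0
                 AND SCORE_PEPTIDE.CONTEXT = 'global'
                 AND SCORE_PEPTIDE.QVALUE < ?""",
            (max_qvalue,),
        ).fetchone()
    finally:
        con.close()
    return n
```

The analogous query on PROTEIN / SCORE_PROTEIN gives the decoy-free protein count.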

Merging across different libraries

We had an issue when we tried to merge osw files across runs that were analyzed using different libraries -- the results made no sense, since the keys were mapping to the wrong peptides, but there was no error message. Maybe pyprophet merge should have a sanity check to prevent this?
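A pre-merge sanity check along those lines could compare the peptide sets of the input files. A rough sketch, assuming the standard OSW PEPTIDE table with a MODIFIED_SEQUENCE column (illustrative, not a pyprophet feature):

```python
import sqlite3

def same_peptide_library(osw_a, osw_b):
    """Heuristic pre-merge check: do two OSW files reference the same
    peptide sequences? Assumes the standard PEPTIDE table with a
    MODIFIED_SEQUENCE column."""
    def sequences(path):
        con = sqlite3.connect(path)
        try:
            return {row[0] for row in
                    con.execute("SELECT MODIFIED_SEQUENCE FROM PEPTIDE")}
        finally:
            con.close()
    return sequences(osw_a) == sequences(osw_b)
```

A stricter variant would compare (ID, MODIFIED_SEQUENCE) pairs, since the silent corruption comes from identical numeric keys mapping to different peptides.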

Back propagate fails if 'SCORE_PROTEIN' isn't in apply_scores

Suggested edit below -- but I don't have a full enough understanding of the code to know whether this is worth a PR, and there don't appear to be any tests covering this function.

def backpropagate_oswr(infile, outfile, apply_scores):
    # store data in table
    if infile != outfile:
        copyfile(infile, outfile)

    # find out what tables exist in the scores
    score_con = sqlite3.connect(apply_scores)
    transition_present = check_sqlite_table(score_con, "SCORE_TRANSITION")
    peptide_present = check_sqlite_table(score_con, "SCORE_PEPTIDE")
    protein_present = check_sqlite_table(score_con, "SCORE_PROTEIN")
    score_con.close()
    if not (transition_present or peptide_present or protein_present):
        raise RuntimeError('You must have at least one score table present')

    # build up the list
    script = list()
    script.append('PRAGMA synchronous = OFF;')
    script.append('DROP TABLE IF EXISTS SCORE_TRANSITION;')
    script.append('DROP TABLE IF EXISTS SCORE_PEPTIDE;')
    script.append('DROP TABLE IF EXISTS SCORE_PROTEIN;')

    # create the tables
    # NB: PEPTIDE_ID is used for all three tables here for simplicity; in the
    # actual OSW schema SCORE_PROTEIN keys on PROTEIN_ID and SCORE_TRANSITION
    # has its own columns, so a per-table layout may be needed.
    create_table_fmt = 'CREATE TABLE {} (CONTEXT TEXT, RUN_ID INTEGER, PEPTIDE_ID INTEGER, SCORE REAL, PVALUE REAL, QVALUE REAL, PEP REAL);'
    if transition_present:
        script.append(create_table_fmt.format('SCORE_TRANSITION'))
    if peptide_present:
        script.append(create_table_fmt.format('SCORE_PEPTIDE'))
    if protein_present:
        script.append(create_table_fmt.format('SCORE_PROTEIN'))

    # copy across the tables
    script.append(f'ATTACH DATABASE "{apply_scores}" AS sdb;')
    insert_table_fmt = 'INSERT INTO {0}\nSELECT *\nFROM sdb.{0};'
    if transition_present:
        script.append(insert_table_fmt.format('SCORE_TRANSITION'))
    if peptide_present:
        script.append(insert_table_fmt.format('SCORE_PEPTIDE'))
    if protein_present:
        script.append(insert_table_fmt.format('SCORE_PROTEIN'))

    # execute the script
    conn = sqlite3.connect(outfile)
    c = conn.cursor()
    c.executescript('\n'.join(script))
    conn.commit()
    conn.close()

    click.echo("Info: All multi-run data was backpropagated.")

Transition ids are not preserved

When pyprophet exports data to TSV, it changes the transition id as follows:

GROUP_CONCAT(TRANSITION.ID || "_" || TRANSITION.TYPE || TRANSITION.ORDINAL || "_" || TRANSITION.CHARGE,';') AS aggr_Fragment_Annotation

Why is that the case? It means the transition ids in the aggr_Fragment_Annotation column of the export file no longer match the transition ids in the mzML / sqMass file.
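If the goal is only to match ids back to the mzML / sqMass file, the raw TRANSITION.ID could be aggregated in a column of its own next to the annotation. A minimal in-memory sketch (schema reduced to the columns used in the GROUP_CONCAT above; hypothetical data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TRANSITION (ID INTEGER, TYPE TEXT, ORDINAL INTEGER, CHARGE INTEGER)")
con.executemany("INSERT INTO TRANSITION VALUES (?, ?, ?, ?)",
                [(101, "b", 4, 1), (102, "y", 7, 2)])

# Keep the raw id (first column) alongside the current annotation (second column).
row = con.execute(
    "SELECT GROUP_CONCAT(ID, ';'), "
    "GROUP_CONCAT(ID || '_' || TYPE || ORDINAL || '_' || CHARGE, ';') "
    "FROM TRANSITION").fetchone()
print(row)
con.close()
```

Note that SQLite does not guarantee GROUP_CONCAT ordering, so a real export would want an explicit ORDER BY in a subquery.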

ValueError: Buffer dtype mismatch

Dear All,

I am getting the following error when I try to run pyprophet:

  File "c:\users\a-isilber\anaconda2\lib\site-packages\pyprophet\data_handling.py", line 279, in rank_by
    flags = find_top_ranked(self.df.tg_num_id.values, self.df[score_col_name].values)
  File "pyprophet/_optimized.pyx", line 157, in pyprophet._optimized.find_top_ranked (pyprophet/_optimized.c:3729)
ValueError: Buffer dtype mismatch, expected 'DATA_TYPE' but got 'double'

I get the same issue on two systems (64-bit Windows 10 and 32-bit Windows 7), both with Python 2.7 installed through Anaconda.
pyprophet version 0.22.0
numpy version 1.11.3
Updating numpy via pip install numpy also does not solve the issue.

The command I am using:
pyprophet --ignore.invalid_score_columns --xeval.num_iter=10 --target.dir=PATH\ PATH\file.tsv

I would very much appreciate your help!
Thank you in advance,
Ivan

PATH issue with pyprophet

Greetings,
#cmd prompt was set to directory with .osw files. The following command was run

C:\Users\Shawn\Desktop\DIA2018\Tutorial4_OpenSWATH>pyprophet merge --out=training.osw --subsample_ratio=0.33 *.osw

#The error listed below was produced by the above command.

Usage: pyprophet merge [OPTIONS] [INFILES]...

Error: Invalid value for "infiles": Path "*.osw" does not exist.

Adding the path to the command ("C:\Users\Shawn\Desktop\DIA2018\Tutorial4_OpenSWATH\*.osw") did not work either.

Any suggestions?

Thanks,
Shawn
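For context on this class of error: unlike Unix shells, Windows cmd.exe does not expand wildcards, so the literal string "*.osw" reaches pyprophet and fails click's existing-path check. A sketch of one workaround, expanding the pattern before invoking pyprophet (a PowerShell or batch loop would work equally well):

```python
import glob
import subprocess

# Expand the wildcard ourselves, since cmd.exe passes "*.osw" through literally.
infiles = sorted(glob.glob("*.osw"))
cmd = ["pyprophet", "merge", "--out=training.osw", "--subsample_ratio=0.33"] + infiles
print(cmd)
# subprocess.run(cmd, check=True)  # uncomment to actually run the merge
```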

More Explanation of Installation and Running for non-root User

I have nearly no Python experience, so installing this on a Linux server is unexpectedly challenging. Could you add more explanation to the README file? For example, I used the -t option of pip, but I don't see any pyprophet executable in that location after installing, so I don't know what command I would use even to print the help text.

/savona/nobackup/biostat/software/Python/lib/python2.7/site-packages/pyprophet$ ls
__init__.py    classifiers.py   data_handling.py   main_helpers.py   pyprophet.py   semi_supervised.py   std_logger.py
__init__.pyc   classifiers.pyc  data_handling.pyc  main_helpers.pyc  pyprophet.pyc  semi_supervised.pyc  std_logger.pyc
_optimized.c   config.py        main.py            optimized.py      report.py      stats.py             version.py
_optimized.so  config.pyc       main.pyc           optimized.pyc     report.pyc     stats.pyc            version.pyc

Nearly identical pep/q-value distributions for targets and decoys, but a bimodal score distribution

Thank you for making pyprophet; I am really excited to use it for my project.

I used the Docker option described in the OpenSWATH docs, and this is the command I ran:

pyprophet peptide --in 20181201_FlMe_SA_diaPASEF_200ng_HeLa_py3.osw

qvalue pvalue svalue pep ... tn fp fn cutoff
0 0.00 0.000011 0.504993 0.000019 ... 23541.547695 0.270487 35210.452305 3.927691
1 0.01 0.027242 0.893162 0.162110 ... 22900.493991 641.324190 7599.506009 1.972629
2 0.02 0.057287 0.928787 0.286734 ... 22193.171023 1348.647159 5065.828977 1.613684
3 0.05 0.154524 0.971287 0.576165 ... 19904.041279 3637.776903 2045.958721 1.056899
4 0.10 0.334348 0.995971 0.843348 ... 15670.652454 7871.165727 292.347546 0.489151
5 0.20 0.758247 1.000000 1.000000 ... 5691.312649 17850.505533 -271.312649 -0.643834
6 0.30 0.999989 1.000000 1.000000 ... 0.270487 23541.547695 -0.270487 -3.230400
7 0.40 NaN NaN NaN ... NaN NaN NaN NaN
8 0.50 NaN NaN NaN ... NaN NaN NaN NaN

[9 rows x 12 columns]

================================================================================
qvalue pvalue svalue pep ... tn fp fn cutoff
0 0.00 0.000011 0.504993 0.000019 ... 23541.547695 0.270487 35210.452305 3.927691
1 0.01 0.027242 0.893162 0.162110 ... 22900.493991 641.324190 7599.506009 1.972629
2 0.02 0.057287 0.928787 0.286734 ... 22193.171023 1348.647159 5065.828977 1.613684
3 0.05 0.154524 0.971287 0.576165 ... 19904.041279 3637.776903 2045.958721 1.056899
4 0.10 0.334348 0.995971 0.843348 ... 15670.652454 7871.165727 292.347546 0.489151
5 0.20 0.758247 1.000000 1.000000 ... 5691.312649 17850.505533 -271.312649 -0.643834
6 0.30 0.999989 1.000000 1.000000 ... 0.270487 23541.547695 -0.270487 -3.230400
7 0.40 NaN NaN NaN ... NaN NaN NaN NaN
8 0.50 NaN NaN NaN ... NaN NaN NaN NaN

[9 rows x 12 columns]

================================================================================
qvalue pvalue svalue pep ... tn fp fn cutoff
0 0.00 0.000012 0.554719 0.000014 ... 19312.498433 0.228840 33194.501567 3.779354
1 0.01 0.034647 0.888135 0.157204 ... 18643.598963 669.128309 8339.401037 1.885508
2 0.02 0.072979 0.926413 0.278727 ... 17903.301398 1409.425875 5485.698602 1.511289
3 0.05 0.197052 0.969980 0.556820 ... 15507.117223 3805.610050 2237.882777 0.916645
4 0.10 0.428147 0.998476 0.816992 ... 11044.049706 8268.677566 118.950294 0.252429
5 0.20 0.966088 1.000000 1.000000 ... 654.940226 18657.787046 -87.940226 -1.794583
6 0.30 0.999929 1.000000 1.000000 ... 1.373040 19311.354232 -1.373040 -3.144660
7 0.40 NaN NaN NaN ... NaN NaN NaN NaN
8 0.50 NaN NaN NaN ... NaN NaN NaN NaN

[9 rows x 12 columns]

================================================================================
qvalue pvalue svalue pep ... tn fp fn cutoff
0 0.00 0.000012 0.511254 0.000019 ... 24148.811503 0.279406 34384.188497 3.915643
1 0.01 0.026391 0.896830 0.166087 ... 23511.765023 637.325886 7258.234977 1.970259
2 0.02 0.055293 0.930276 0.297998 ... 22813.807958 1335.282951 4905.192042 1.626648
3 0.05 0.148895 0.971151 0.578817 ... 20553.410579 3595.680330 2029.589421 1.078977
4 0.10 0.321613 0.993667 0.847170 ... 16382.432573 7766.658336 454.567427 0.519142
5 0.20 0.732350 1.000000 0.996909 ... 6463.507116 17685.583794 -388.507116 -0.576950
6 0.30 0.999931 1.000000 1.000000 ... 1.676438 24147.414471 -1.676438 -3.051916
7 0.40 NaN NaN NaN ... NaN NaN NaN NaN
8 0.50 NaN NaN NaN ... NaN NaN NaN NaN

[9 rows x 12 columns]

Killed

score distribution 201812

20181201_FlMe_SA_diaPASEF_200ng_HeLa_py3.osw_6514992444606274777_run-specific_peptide.pdf

qvalue distribution 201812

pep distribution 201812

I am not sure why the distributions look like this. Is it normal for the pep/q-value distributions to look like this?

Any help would be appreciated, thank you.

"pyprophet score --threads=-1 ..." raises an exception

With pyprophet version 2.0.2 on CentOS 7.4 (Anaconda2, conda 4.5.11, Python 2.7.15):

$ pyprophet score --threads=-1 ...

the following exception is raised:

..../pyprophet/data_handling.py", line 30, in transform_threads
value = multiprocessing.cpu_count()
NameError: global name 'multiprocessing' is not defined

Thanks.
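The NameError itself is self-explanatory: transform_threads() calls multiprocessing.cpu_count() but the module is never imported in data_handling.py of that release. A sketch of the fix (callback signature simplified; in pyprophet this is a click option callback):

```python
import multiprocessing  # the missing import that triggers the NameError

def transform_threads(value):
    # --threads=-1 means "use all available cores"
    if value == -1:
        value = multiprocessing.cpu_count()
    return value

print(transform_threads(-1))
```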

Pyprophet protein error

Hi, I am having some problems when attempting to run pyprophet protein. I have progressed through the OpenSWATH workflow, merging and scoring (at the MS2 level).

I have run pyprophet peptide (seemingly without an issue) with:

pyprophet peptide \
--in=merged.osw \
--context=run-specific peptide \
--in=merged.osw \
--context=experiment-wide peptide \
--in=merged.osw --context=global 

However, when I try the same with pyprophet protein:

pyprophet protein \
--in=merged.osw \
--context=run-specific peptide \
--in=merged.osw \
--context=experiment-wide peptide \
--in=merged.osw --context=global 

I get the following error:

Traceback (most recent call last):
  File "c:\programdata\anaconda2\lib\runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "c:\programdata\anaconda2\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\ProgramData\Anaconda2\Scripts\pyprophet.exe\__main__.py", line 9, in <module>
  File "c:\programdata\anaconda2\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "c:\programdata\anaconda2\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "c:\programdata\anaconda2\lib\site-packages\click\core.py", line 1092, in invoke
    rv.append(sub_ctx.command.invoke(sub_ctx))
  File "c:\programdata\anaconda2\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\programdata\anaconda2\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "c:\programdata\anaconda2\lib\site-packages\pyprophet\main.py", line 172, in protein
    infer_proteins(infile, outfile, context, parametric, pfdr, pi0_lambda, pi0_method, pi0_smooth_df, pi0_smooth_log_pi0, lfdr_truncate, lfdr_monotone, lfdr_transformation, lfdr_adj, lfdr_eps)
  File "c:\programdata\anaconda2\lib\site-packages\pyprophet\levels_contexts.py", line 102, in infer_proteins
    data = data.groupby('run_id').apply(statistics_report, outfile, context, "protein", parametric, pfdr, pi0_lambda, pi0_method, pi0_smooth_df, pi0_smooth_log_pi0, lfdr_truncate, lfdr_monotone, lfdr_transformation, lfdr_adj, lfdr_eps).reset_index()
  File "c:\programdata\anaconda2\lib\site-packages\pandas\core\groupby\groupby.py", line 930, in apply
    return self._python_apply_general(f)
  File "c:\programdata\anaconda2\lib\site-packages\pandas\core\groupby\groupby.py", line 936, in _python_apply_general
    self.axis)
  File "c:\programdata\anaconda2\lib\site-packages\pandas\core\groupby\groupby.py", line 2273, in apply
    res = f(group)
  File "c:\programdata\anaconda2\lib\site-packages\pandas\core\groupby\groupby.py", line 908, in f
    return func(g, *args, **kwargs)
  File "c:\programdata\anaconda2\lib\site-packages\pyprophet\levels_contexts.py", line 16, in statistics_report
    error_stat, pi0 = error_statistics(data[data.decoy==0]['score'], data[data.decoy==1]['score'], parametric, pfdr, pi0_lambda, pi0_method, pi0_smooth_df, pi0_smooth_log_pi0, True, lfdr_truncate, lfdr_monotone, lfdr_transformation, lfdr_adj, lfdr_eps)
  File "c:\programdata\anaconda2\lib\site-packages\pyprophet\stats.py", line 465, in error_statistics
    error_stat['pep'] = lfdr(target_pvalues, pi0['pi0'], lfdr_trunc, lfdr_monotone, lfdr_transf, lfdr_adj, lfdr_eps)
  File "c:\programdata\anaconda2\lib\site-packages\pyprophet\stats.py", line 309, in lfdr
    y = sp.interpolate.spline(myd.support, myd.density, x)
  File "c:\programdata\anaconda2\lib\site-packages\numpy\lib\utils.py", line 101, in newfunc
    return func(*args, **kwds)
  File "c:\programdata\anaconda2\lib\site-packages\scipy\interpolate\interpolate.py", line 2919, in spline
    return spleval(splmake(xk, yk, order=order, kind=kind, conds=conds), xnew)
  File "c:\programdata\anaconda2\lib\site-packages\numpy\lib\utils.py", line 101, in newfunc
    return func(*args, **kwds)
  File "c:\programdata\anaconda2\lib\site-packages\scipy\interpolate\interpolate.py", line 2828, in splmake
    coefs = func(xk, yk, order, conds, B)
  File "c:\programdata\anaconda2\lib\site-packages\scipy\interpolate\interpolate.py", line 2752, in _find_smoothest
    p = scipy.linalg.solve(Q, tmp)
  File "c:\programdata\anaconda2\lib\site-packages\scipy\linalg\basic.py", line 216, in solve
    _solve_check(n, info)
  File "c:\programdata\anaconda2\lib\site-packages\scipy\linalg\basic.py", line 31, in _solve_check
    raise LinAlgError('Matrix is singular.')
numpy.linalg.linalg.LinAlgError: Matrix is singular.

Any ideas what could be causing this?
Thanks in advance,
Sian
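For context: the crash happens inside scipy.interpolate.spline, a long-deprecated helper (since removed from SciPy) that solves a dense smoothing system which can be singular. As a sketch of the underlying idea only (not pyprophet's actual fix), splrep/splev fit a B-spline directly without that dense solve; the data here are synthetic stand-ins for myd.support / myd.density from the traceback:

```python
import numpy as np
from scipy.interpolate import splev, splrep

# Synthetic stand-ins for the kernel-density support and values.
support = np.linspace(0.0, 1.0, 50)
density = np.exp(-((support - 0.5) ** 2) / 0.02)
x = np.linspace(0.0, 1.0, 200)

tck = splrep(support, density, k=3)  # cubic B-spline representation
y = splev(x, tck)                    # evaluate at new points; no dense solve
print(y.shape)
```

Upgrading pyprophet and SciPy may already resolve the issue, since this code path has changed in later releases.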

Module import issue with py.test

I stumbled over an issue where the tests run by py.test were not able to import modules; I am actually not sure why.
After removing the __init__.py from the tests directory, all modules could be imported without any issues.

I just wanted to report it in case anyone else runs into the same issue.

py.test ./tests

python 3.8
macOS 10.15.5

before removing:

============================================================================================================================ ERRORS =============================================================================================================================
_________________________________________________________________________________________________________ ERROR collecting tests/test_data_handling.py __________________________________________________________________________________________________________
ImportError while importing test module '/Users/alka/Documents/work/software/pyprophet_test/pyprophet/tests/test_data_handling.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/test_data_handling.py:4: in <module>
    from pyprophet.data_handling import check_for_unique_blocks
pyprophet/data_handling.py:9: in <module>
    from .optimized import find_top_ranked, rank
pyprophet/optimized.py:1: in <module>
    from ._optimized import *
E   ModuleNotFoundError: No module named 'pyprophet._optimized'
______________________________________________________________________________________________________________ ERROR collecting tests/test_ipf.py _______________________________________________________________________________________________________________
ImportError while importing test module '/Users/alka/Documents/work/software/pyprophet_test/pyprophet/tests/test_ipf.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/test_ipf.py:4: in <module>
    from pyprophet.ipf import prepare_precursor_bm, prepare_transition_bm, apply_bm, compute_model_fdr
pyprophet/ipf.py:9: in <module>
    from .data_handling import check_sqlite_table
pyprophet/data_handling.py:9: in <module>
    from .optimized import find_top_ranked, rank
pyprophet/optimized.py:1: in <module>
    from ._optimized import *
E   ModuleNotFoundError: No module named 'pyprophet._optimized'
___________________________________________________________________________________________________________ ERROR collecting tests/test_optimized.py ____________________________________________________________________________________________________________
ImportError while importing test module '/Users/alka/Documents/work/software/pyprophet_test/pyprophet/tests/test_optimized.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/test_optimized.py:2: in <module>
    import pyprophet.optimized as o
pyprophet/optimized.py:1: in <module>
    from ._optimized import *
E   ModuleNotFoundError: No module named 'pyprophet._optimized'
_________________________________________________________________________________________________________ ERROR collecting tests/test_pyprophet_ipf.py __________________________________________________________________________________________________________
ImportError while importing test module '/Users/alka/Documents/work/software/pyprophet_test/pyprophet/tests/test_pyprophet_ipf.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/test_pyprophet_ipf.py:13: in <module>
    from pyprophet.ipf import read_pyp_peakgroup_precursor
pyprophet/ipf.py:9: in <module>
    from .data_handling import check_sqlite_table
pyprophet/data_handling.py:9: in <module>
    from .optimized import find_top_ranked, rank
pyprophet/optimized.py:1: in <module>
    from ._optimized import *
E   ModuleNotFoundError: No module named 'pyprophet._optimized'
________________________________________________________________________________________________________ ERROR collecting tests/test_pyprophet_score.py _________________________________________________________________________________________________________
ImportError while importing test module '/Users/alka/Documents/work/software/pyprophet_test/pyprophet/tests/test_pyprophet_score.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/test_pyprophet_score.py:13: in <module>
    from pyprophet.ipf import read_pyp_peakgroup_precursor
pyprophet/ipf.py:9: in <module>
    from .data_handling import check_sqlite_table
pyprophet/data_handling.py:9: in <module>
    from .optimized import find_top_ranked, rank
pyprophet/optimized.py:1: in <module>
    from ._optimized import *
E   ModuleNotFoundError: No module named 'pyprophet._optimized'
_____________________________________________________________________________________________________________ ERROR collecting tests/test_stats.py ______________________________________________________________________________________________________________
ImportError while importing test module '/Users/alka/Documents/work/software/pyprophet_test/pyprophet/tests/test_stats.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/test_stats.py:4: in <module>
    from pyprophet.stats import to_one_dim_array, pnorm, pemp, pi0est, qvalue, bw_nrd0, lfdr, stat_metrics
pyprophet/stats.py:12: in <module>
    from .optimized import (find_nearest_matches as _find_nearest_matches,
pyprophet/optimized.py:1: in <module>
    from ._optimized import *
E   ModuleNotFoundError: No module named 'pyprophet._optimized'
==================================================================================================================== short test summary info ====================================================================================================================
ERROR tests/test_data_handling.py
ERROR tests/test_ipf.py
ERROR tests/test_optimized.py
ERROR tests/test_pyprophet_ipf.py
ERROR tests/test_pyprophet_score.py
ERROR tests/test_stats.py

after removing:

platform darwin -- Python 3.8.0, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /Users/alka/Documents/work/software/pyprophet_test/pyprophet
plugins: regtest-1.4.4
collected 53 items                                                                                                                                                                                                                                              

tests/test_data_handling.py ..                                                                                                                                                                                                                            [  3%]
tests/test_ipf.py ....                                                                                                                                                                                                                                    [ 11%]
tests/test_optimized.py .....                                                                                                                                                                                                                             [ 20%]
tests/test_pyprophet_export.py ..........                                                                                                                                                                                                                 [ 39%]
tests/test_pyprophet_ipf.py .....                                                                                                                                                                                                                         [ 49%]
tests/test_pyprophet_levels_contexts.py ..                                                                                                                                                                                                                [ 52%]
tests/test_pyprophet_score.py .................                                                                                                                                                                                                           [ 84%]
tests/test_stats.py ........                                                                                                                                                                                                                              [100%]

How can I calculate the sub score?

Hi 👋, I'm yurim from South Korea.
I am very impressed with the pyprophet tool you developed.
Thanks for providing a great tool.

I have one question about pyprophet.
I am trying to implement mProphet's sub-scores by referring to your code, but I couldn't find the code that calculates them.
The sub-scores are the eight scores introduced by mProphet (intensity score, intensity correlation, co-elution score, etc.).
Can you tell me where the sub-scores are calculated in the pyprophet code? If it's not in the code, is there code or a formula that you referenced for the calculation method?

Thank you so much.
Have a nice day :)
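As context for the question above: to my knowledge pyprophet does not compute the sub-scores itself; they are produced upstream by OpenSWATH and stored as VAR_* columns in the .osw file (e.g. in the FEATURE_MS2 table), which pyprophet then reads as classifier features. A small in-memory sketch of where they live (table and column names follow the OSW schema; data are hypothetical):

```python
import sqlite3

# In-memory stand-in for an OpenSWATH .osw file with two example sub-scores.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE FEATURE_MS2 "
            "(FEATURE_ID INTEGER, VAR_XCORR_SHAPE REAL, VAR_LIBRARY_CORR REAL)")

# List the sub-score columns, as one would on a real file.
cols = [row[1] for row in con.execute("PRAGMA table_info(FEATURE_MS2)")]
sub_scores = [c for c in cols if c.startswith("VAR_")]
print(sub_scores)
con.close()
```

So for the formulas themselves, the OpenSWATH scoring code (and the mProphet paper cited in the README) would be the place to look.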

struct.error: 'i' format requires -2147483648 <= number <= 2147483647

I get an error while running: pyprophet score --in=model.osw --classifier=XGBoost --xgb_autotune --level=ms1ms2 --threads=24
Trying other parameters gives the same error: pyprophet score --in=model.osw --level=ms1ms2 --threads=24
root@12c1180257b2:/mnt/data/hela_pd/result2# pyprophet score --in=model.osw --classifier=XGBoost --xgb_autotune --level=ms1ms2 --threads=24
Info: Learn and apply classifier from input data.
Warning: Column var_mi_ratio_score contains only invalid/missing values. Column will be dropped.
Warning: Column var_elution_model_fit_score contains only invalid/missing values. Column will be dropped.
Warning: Column var_sonar_lag contains only invalid/missing values. Column will be dropped.
Warning: Column var_sonar_shape contains only invalid/missing values. Column will be dropped.
Warning: Column var_sonar_log_sn contains only invalid/missing values. Column will be dropped.
Warning: Column var_sonar_log_diff contains only invalid/missing values. Column will be dropped.
Warning: Column var_sonar_log_trend contains only invalid/missing values. Column will be dropped.
Warning: Column var_sonar_rsq contains only invalid/missing values. Column will be dropped.
Info: Data set contains 961149 decoy and 967338 target groups.
Info: Summary of input data:
Info: 9741919 peak groups
Info: 1928487 group ids
Info: 35 scores including main score
Info: Semi-supervised learning of weights:
Info: Start learning on 10 folds using 24 processes.
Traceback (most recent call last):
  File "/usr/local/bin/pyprophet", line 10, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1163, in invoke
    rv.append(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pyprophet/main.py", line 93, in score
    PyProphetLearner(infile, outfile, classifier, xgb_hyperparams, xgb_params, xgb_params_space, xeval_fraction, xeval_num_iter, ss_initial_fdr, ss_iteration_fdr, ss_num_iter, ss_main_score, group_id, parametric, pfdr, pi0_lambda, pi0_method, pi0_smooth_df, pi0_smooth_log_pi0, lfdr_truncate, lfdr_monotone, lfdr_transformation, lfdr_adj, lfdr_eps, level, ipf_max_peakgroup_rank, ipf_max_peakgroup_pep, ipf_max_transition_isotope_overlap, ipf_min_transition_sn, tric_chromprob, threads, test).run()
  File "/usr/local/lib/python3.6/dist-packages/pyprophet/runner.py", line 223, in run
    (result, scorer, weights) = self.run_algo()
  File "/usr/local/lib/python3.6/dist-packages/pyprophet/runner.py", line 385, in run_algo
    (result, scorer, weights) = PyProphet(self.classifier, self.xgb_hyperparams, self.xgb_params, self.xgb_params_space, self.xeval_fraction, self.xeval_num_iter, self.ss_initial_fdr, self.ss_iteration_fdr, self.ss_num_iter, self.group_id, self.parametric, self.pfdr, self.pi0_lambda, self.pi0_method, self.pi0_smooth_df, self.pi0_smooth_log_pi0, self.lfdr_truncate, self.lfdr_monotone, self.lfdr_transformation, self.lfdr_adj, self.lfdr_eps, self.tric_chromprob, self.threads, self.test).learn_and_apply(self.table)
  File "/usr/local/lib/python3.6/dist-packages/pyprophet/pyprophet.py", line 255, in learn_and_apply
    result, scorer, trained_weights = self._learn_and_apply(table)
  File "/usr/local/lib/python3.6/dist-packages/pyprophet/pyprophet.py", line 263, in _learn_and_apply
    final_classifier = self._learn(experiment)
  File "/usr/local/lib/python3.6/dist-packages/pyprophet/pyprophet.py", line 295, in _learn
    res = pool.map(unwrap_self_for_multiprocessing, args)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 288, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 670, in get
    raise self._value
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 450, in _handle_tasks
    put(task)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
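For context: the struct.error comes from Python's multiprocessing layer, which length-prefixes each pickled task with a signed 32-bit integer, so any single payload over 2 GiB (plausible here, with ~9.7 million peak groups per fold) overflows it. A minimal reproduction of just the failing pack:

```python
import struct

# multiprocessing's connection layer does struct.pack("!i", n) on the payload
# size, so n must fit in a signed 32-bit int.
struct.pack("!i", 2**31 - 1)    # largest payload size that still fits

try:
    struct.pack("!i", 2**31)    # one byte past the limit
    overflowed = False
except struct.error:
    overflowed = True
print(overflowed)
```

One possible workaround is to reduce the data volume, e.g. learning the model on a subsampled merge (pyprophet merge --subsample_ratio, as used elsewhere in these issues) and applying the weights to the full files.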

OpenSWATH IPF workflow errored.

Hi,OpenMS devs.

I found a runtime error when running the OpenSWATH IPF workflow, as follows.

OpenSwathAssayGenerator.exe -in irt.tsv -out irt.TraML -product_lower_mz_limit 100 -product_mz_threshold 0.05 -swath_windows_file *.txt

OpenSwathAssayGenerator.exe -in *.tsv -out lib.os.tsv -product_lower_mz_limit 100 -product_mz_threshold 0.05 -swath_windows_file *.txt -unimod_file Phos.unimod.xml -enable_ipf

OpenSwathDecoyGenerator.exe -in lib.os.tsv -out lib.os.pqp

Thus far, everything is OK.

OpenSwathWorkflow.exe -readOptions workingInMemory -sort_swath_maps -rt_extraction_window 1200 -use_ms1_traces -enable_uis_scoring -tr_irt irt.TraML -tr lib.os.pqp -threads 48 -swath_windows_file *.txt -out_osw *.osw -in *.mzXML

but at this step I get an Application Error (Access Violation 0xc0000005), offset 0x0000000000013fb0, in OpenMS.dll.
This error info was retrieved from eventvwr.msc -> Windows Logs -> Application.

OpenSwathWorkflow's output (see POSTSCRIPT) seems to be OK, so how can I overcome this error?
By the way, I am using OpenMS 2.5.

Best regards.

--POSTSCRIPT--

Will load iRT transitions and try to find iRT peptides
Progress of 'Load TraML file':

-- done [took 1.61 s (CPU), 0.05 s (Wall)] --
Progress of 'Extract iRT chromatograms':

-- done [took 16.16 s (CPU), 8.27 s (Wall)] --
Progress of 'Retention time normalization':
Will analyse 21 peptides with a total of 126 transitions
WARNING in SignalToNoiseEstimatorMedian: 2.39868% of all Signal-to-Noise estimates are too high, because the median was found in the rightmost histogram-bin. You should consider increasing 'max_intensity' (and maybe 'bin_count' with it, to keep bin width reasonable)
WARNING in SignalToNoiseEstimatorMedian: 2.31596% of all Signal-to-Noise estimates are too high, because the median was found in the rightmost histogram-bin. You should consider increasing 'max_intensity' (and maybe 'bin_count' with it, to keep bin width reasonable)
WARNING in SignalToNoiseEstimatorMedian: 2.48139% of all Signal-to-Noise estimates are too high, because the median was found in the rightmost histogram-bin. You should consider increasing 'max_intensity' (and maybe 'bin_count' with it, to keep bin width reasonable)
WARNING in SignalToNoiseEstimatorMedian: 2.64682% of all Signal-to-Noise estimates are too high, because the median was found in the rightmost histogram-bin. You should consider increasing 'max_intensity' (and maybe 'bin_count' with it, to keep bin width reasonable)
<WARNING in SignalToNoiseEstimatorMedian: 2.39868% of all Signal-to-Noise estimates are too high, because the median was found in the rightmost histogram-bin. You should consider increasing 'max_intensity' (and maybe 'bin_count' with it, to keep bin width reasonable)> occurred 5 times
WARNING in SignalToNoiseEstimatorMedian: 1.65426% of all Signal-to-Noise estimates are too high, because the median was found in the rightmost histogram-bin. You should consider increasing 'max_intensity' (and maybe 'bin_count' with it, to keep bin width reasonable)
rsq: 0.849468 points: 21
rsq: 0.936791 points: 20
rsq: 0.99906 points: 19

-- done [took 1.42 s (CPU), 1.31 s (Wall)] --
Will analyze 672261 transitions in total.
Progress of 'Extracting and scoring transitions':

0.99 % Thread 7_0 will analyze 7 compounds and 796 transitions from SWATH 13 (batch 0 out of 1)
Thread 47_0 will analyze 7 compounds and 803 transitions from SWATH 16 (batch 0 out of 1)
Thread 30_0 will analyze 14 compounds and 790 transitions from SWATH 19 (batch 0 out of 1)
Thread 40_0 will analyze 10 compounds and 425 transitions from SWATH 14 (batch 0 out of 1)
Thread 26_0 will analyze 20 compounds and 1599 transitions from SWATH 35 (batch 0 out of 1)
Thread 14_0 will analyze 10 compounds and 910 transitions from SWATH 23 (batch 0 out of 1)
Thread 28_0 will analyze 6 compounds and 266 transitions from SWATH 10 (batch 0 out of 1)
Thread 25_0 will analyze 21 compounds and 1710 transitions from SWATH 42 (batch 0 out of 1)
Thread 2_0 will analyze 14 compounds and 3024 transitions from SWATH 22 (batch 0 out of 1)
Thread 16_0 will analyze 5 compounds and 510 transitions from SWATH 15 (batch 0 out of 1)
Thread 31_0 will analyze 8 compounds and 353 transitions from SWATH 25 (batch 0 out of 1)
Thread 44_0 will analyze 5 compounds and 89 transitions from SWATH 12 (batch 0 out of 1)
Thread 18_0 will analyze 8 compounds and 746 transitions from SWATH 20 (batch 0 out of 1)
Thread 39_0 will analyze 11 compounds and 966 transitions from SWATH 33 (batch 0 out of 1)
Thread 20_0 will analyze 2 compounds and 161 transitions from SWATH 3 (batch 0 out of 1)
Thread 36_0 will analyze 20 compounds and 3116 transitions from SWATH 37 (batch 0 out of 1)
Thread 17_0 will analyze 3 compounds and 190 transitions from SWATH 8 (batch 0 out of 1)
Thread 24_0 will analyze 2 compounds and 342 transitions from SWATH 5 (batch 0 out of 1)
Thread 3_0 will analyze 14 compounds and 640 transitions from SWATH 26 (batch 0 out of 1)
Thread 46_0 will analyze 28 compounds and 2467 transitions from SWATH 47 (batch 0 out of 1)
Thread 12_0 will analyze 18 compounds and 1234 transitions from SWATH 32 (batch 0 out of 1)
Thread 11_0 will analyze 5 compounds and 177 transitions from SWATH 11 (batch 0 out of 1)
Thread 29_0 will analyze 16 compounds and 2039 transitions from SWATH 31 (batch 0 out of 1)
Thread 0_0 will analyze 33 compounds and 3835 transitions from SWATH 38 (batch 0 out of 1)
Thread 41_0 will analyze 14 compounds and 3012 transitions from SWATH 28 (batch 0 out of 1)
Thread 42_0 will analyze 27 compounds and 3880 transitions from SWATH 40 (batch 0 out of 1)
Thread 10_0 will analyze 18 compounds and 1993 transitions from SWATH 43 (batch 0 out of 1)
Thread 37_0 will analyze 3 compounds and 18 transitions from SWATH 2 (batch 0 out of 1)
Thread 22_0 will analyze 4 compounds and 102 transitions from SWATH 7 (batch 0 out of 1)
Thread 6_0 will analyze 10 compounds and 601 transitions from SWATH 30 (batch 0 out of 1)
Thread 35_0 will analyze 15 compounds and 591 transitions from SWATH 24 (batch 0 out of 1)
Thread 13_0 will analyze 36 compounds and 2930 transitions from SWATH 44 (batch 0 out of 1)
Thread 21_0 will analyze 18 compounds and 977 transitions from SWATH 29 (batch 0 out of 1)
Thread 38_0 will analyze 24 compounds and 3072 transitions from SWATH 34 (batch 0 out of 1)
Thread 32_0 will analyze 27 compounds and 3155 transitions from SWATH 39 (batch 0 out of 1)
Thread 43_0 will analyze 3 compounds and 192 transitions from SWATH 4 (batch 0 out of 1)
Thread 34_0 will analyze 19 compounds and 2966 transitions from SWATH 36 (batch 0 out of 1)
Thread 15_0 will analyze 24 compounds and 3389 transitions from SWATH 46 (batch 0 out of 1)
Thread 4_0 will analyze 17 compounds and 1371 transitions from SWATH 27 (batch 0 out of 1)
Thread 19_0 will analyze 33 compounds and 4938 transitions from SWATH 49 (batch 0 out of 1)
Thread 9_0 will analyze 3 compounds and 569 transitions from SWATH 9 (batch 0 out of 1)
Thread 1_0 will analyze 31 compounds and 3524 transitions from SWATH 50 (batch 0 out of 1)
Thread 23_0 will analyze 6 compounds and 260 transitions from SWATH 18 (batch 0 out of 1)
Thread 45_0 will analyze 28 compounds and 3344 transitions from SWATH 41 (batch 0 out of 1)
Thread 8_0 will analyze 16 compounds and 2479 transitions from SWATH 45 (batch 0 out of 1)
Thread 33_0 will analyze 10 compounds and 1053 transitions from SWATH 17 (batch 0 out of 1)
Thread 27_0 will analyze 27 compounds and 2138 transitions from SWATH 48 (batch 0 out of 1)
Thread 5_0 will analyze 17 compounds and 1645 transitions from SWATH 21 (batch 0 out of 1)
Thread 37_0 will analyze 3 compounds and 18 transitions from SWATH 2 (batch 1 out of 1)

3.96 %               Thread 37_0 will analyze 31 compounds and 2254 transitions from SWATH 51 (batch 0 out of 1)

<WARNING in SignalToNoiseEstimatorMedian: 2.64682% of all Signal-to-Noise estimates are too high, because the median was found in the rightmost histogram-bin. You should consider increasing 'max_intensity' (and maybe 'bin_count' with it, to keep bin width reasonable)> occurred 2 times
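The warning above can be illustrated with a small sketch (not OpenMS code): SignalToNoiseEstimatorMedian-style noise estimation takes the median of an intensity histogram capped at `max_intensity`. If the cap is too low, the median lands in the rightmost bin, the noise level is underestimated, and the S/N estimates come out too high, which is exactly the condition the warning reports. The function and data below are purely illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not OpenMS code) of median-based noise estimation
# via an intensity histogram capped at 'max_intensity'.
def median_bin_noise(intensities, max_intensity, bin_count):
    clipped = np.clip(intensities, 0, max_intensity)
    counts, edges = np.histogram(clipped, bins=bin_count, range=(0, max_intensity))
    cumulative = np.cumsum(counts)
    # index of the bin that contains the median intensity
    median_bin = int(np.searchsorted(cumulative, len(intensities) / 2))
    noise = (edges[median_bin] + edges[median_bin + 1]) / 2  # bin centre
    return noise, median_bin == bin_count - 1

rng = np.random.default_rng(0)
intensities = rng.exponential(scale=500.0, size=10_000)

# Cap far below the true median: the median lands in the rightmost bin,
# so the noise estimate is clipped and S/N is inflated.
noise_low, in_last_low = median_bin_noise(intensities, max_intensity=100, bin_count=30)
# Generous cap: the median is resolved in an interior bin.
noise_ok, in_last_ok = median_bin_noise(intensities, max_intensity=5000, bin_count=30)
print(in_last_low, in_last_ok)  # True False
```

Increasing `max_intensity` (and `bin_count` with it, to keep the bin width reasonable) moves the median out of the last bin, as the second call shows.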

Allow for color adjustment of the automatically generated target decoy plots

Dear developers,

Thank you very much for providing such a nice piece of software for FDR computation and scoring.
We noticed that the current color scheme of the automatically generated FDR scoring plots (targets in green and decoys in red) could be improved by using a more color-blind-friendly scheme, e.g. red and blue or similarly distinguishable colors.

Thanks in advance and also for developing and maintaining the pyprophet software!

Best regards
Matthias

pyprophet ipf error

Hi,
When I try to use pyprophet, this error appears:
root@61d6818b7369:/data/ftp.peptideatlas.org# pyprophet ipf --in=merged.osw
Info: Starting IPF (Inference of PeptidoForms).
Info: Reading precursor-level data.
Info: Preparing precursor-level data.
Info: Conducting precursor-level inference.
Traceback (most recent call last):
  File "/usr/local/bin/pyprophet", line 10, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 1163, in invoke
    rv.append(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/pyprophet/main.py", line 122, in ipf
    infer_peptidoforms(infile, outfile, ipf_ms1_scoring, ipf_ms2_scoring, ipf_h0, ipf_grouped_fdr, ipf_max_precursor_pep, ipf_max_peakgroup_pep, ipf_max_precursor_peakgroup_pep, ipf_max_transition_pep)
  File "/usr/local/lib/python3.5/dist-packages/pyprophet/ipf.py", line 366, in infer_peptidoforms
    precursor_data = precursor_inference(precursor_table, ipf_ms1_scoring, ipf_ms2_scoring, ipf_max_precursor_pep, ipf_max_precursor_peakgroup_pep)
  File "/usr/local/lib/python3.5/dist-packages/pyprophet/ipf.py", line 323, in precursor_inference
    prec_pp_data = apply_bm(precursor_data_bm)
  File "/usr/local/lib/python3.5/dist-packages/pyprophet/ipf.py", line 284, in apply_bm
    pp_data.columns = ['feature_id','hypothesis','likelihood_prior']
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py", line 5080, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py", line 638, in _set_axis
    self._data.set_axis(axis, labels)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/internals/managers.py", line 155, in set_axis
    'values have {new} elements'.format(old=old_len, new=new_len))
ValueError: Length mismatch: Expected axis has 2 elements, new values have 3 elements
Could you help? Thank you a lot.
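For context on what this ValueError means, here is a minimal illustration (not pyprophet code): assigning three column names to a DataFrame that only has two columns. In `ipf.py`, the data passed to `apply_bm` evidently produced one column fewer than the three-name rename expects; one plausible cause is empty or unexpected input data, though the traceback alone cannot confirm that.

```python
import pandas as pd

# Not pyprophet code: a two-column frame renamed with three names
# reproduces the same pandas error seen in the traceback.
df = pd.DataFrame({"feature_id": [101, 102], "hypothesis": [True, False]})
try:
    df.columns = ["feature_id", "hypothesis", "likelihood_prior"]
except ValueError as err:
    print(err)  # Length mismatch: Expected axis has 2 elements, new values have 3 elements
```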
