
compomics / deeplc

46 stars · 13 watchers · 18 forks · 811.36 MB

DeepLC: Retention time prediction for (modified) peptides using Deep Learning.

Home Page: https://iomics.ugent.be/deeplc

License: Apache License 2.0

Languages: Python 98.90%, Dockerfile 0.18%, Inno Setup 0.93%
Topics: proteomics, peptides, retention-time, peptide-modification, deep-learning

deeplc's People

Contributors: arthurdeclercq, caetera, dependabot[bot], markmipt, nielshulstaert, paretje, ralfg, robbinbouwmeester, rodvrees


deeplc's Issues

DeepLC (tensorflow) does not seem to obey num_jobs

DeepLC (tensorflow) does not seem to obey num_jobs, as the NUMEXPR_MAX_THREADS environment variable is not set.

2021-02-05 08:57:40 // INFO // numexpr.utils // Note: NumExpr detected 32 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2021-02-05 08:57:40 // INFO // numexpr.utils // NumExpr defaulting to 8 threads.

I would propose to first check whether NUMEXPR_MAX_THREADS has already been set, and if not, set it to the num_jobs parameter that was passed to DeepLC.
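
A minimal sketch of that check (the helper name is hypothetical; note it has to run before numexpr is first imported, since numexpr reads the variable at import time):

    import os

    def set_numexpr_threads(n_jobs):
        # Respect a value the user already exported; otherwise align the
        # numexpr thread cap with the num_jobs passed to DeepLC.
        if "NUMEXPR_MAX_THREADS" not in os.environ:
            os.environ["NUMEXPR_MAX_THREADS"] = str(n_jobs)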

error using https://iomics.ugent.be/deeplc/

I got the sequences and retention times from the msms.txt file of a chymotrypsin MaxQuant search result, used R to select the relevant columns, and split them into the following CSVs (a pandas equivalent is sketched after the listings).

test.csv
seq,modifications
AAAARDVGSSIKSDRDKF,
AAAARDVGSSIKSDRDKF,
AAAVKDLGSSIKTDGDKF,
AACDPRHGRY,
AAEREGQQQTQQTEQSQEKKEEKN,
......

train.csv
seq,modifications,tr
KHWPFEVVSDGGKPKIKVSY,,4196.160000000001
AIQKSDMDLRKVLY,,3924.3
LCIGTSSGTM,,4293.78
ELDDELKSVENQMRY,,5764.2
......
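
For reference, a rough pandas equivalent of the R step described above (assuming the standard MaxQuant msms.txt columns "Sequence" and "Retention time"; retention times are converted from minutes to seconds to match the tr values shown):

    import pandas as pd

    msms = pd.read_csv("msms.txt", sep="\t")
    df = pd.DataFrame({
        "seq": msms["Sequence"],
        "modifications": "",
        "tr": msms["Retention time"] * 60,  # msms.txt reports minutes
    })
    train = df.sample(frac=0.9, random_state=1)  # calibration set
    test = df.drop(train.index)
    train.to_csv("train.csv", index=False)
    test[["seq", "modifications"]].to_csv("test.csv", index=False)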

Then I loaded them into https://iomics.ugent.be/deeplc/ using all the default settings and got the following error:
Running DeepLC
❌ DeepLC ran into a problem

OSError: [Errno 12] Cannot allocate memory
Traceback:
File "/deeplc/deeplc_streamlit.py", line 165, in _run_deeplc
dlc.calibrate_preds(seq_df=config["input_df_calibration"])
File "/venv/lib/python3.8/site-packages/deeplc/deeplc.py", line 1010, in calibrate_preds
calibrate_output = self.calibrate_preds_func(
File "/venv/lib/python3.8/site-packages/deeplc/deeplc.py", line 855, in calibrate_preds_func
predicted_tr = self.make_preds(
File "/venv/lib/python3.8/site-packages/deeplc/deeplc.py", line 790, in make_preds
temp_preds = self.make_preds_core(
File "/venv/lib/python3.8/site-packages/deeplc/deeplc.py", line 471, in make_preds_core
X = self.do_f_extraction_pd_parallel(seq_df)
File "/venv/lib/python3.8/site-packages/deeplc/deeplc.py", line 319, in do_f_extraction_pd_parallel
pool = multiprocessing.Pool(self.n_jobs)
File "/venv/lib/python3.8/multiprocessing/context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "/venv/lib/python3.8/multiprocessing/pool.py", line 212, in init
self._repopulate_pool()
File "/venv/lib/python3.8/multiprocessing/pool.py", line 303, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
File "/venv/lib/python3.8/multiprocessing/pool.py", line 326, in _repopulate_pool_static
w.start()
File "/venv/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/venv/lib/python3.8/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/venv/lib/python3.8/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/venv/lib/python3.8/multiprocessing/popen_fork.py", line 70, in _launch
self.pid = os.fork()

I also tried with another msms.txt from a trypsin MaxQuant search and got the same error. What should I do to get DeepLC working?

Update: I installed the Windows version of DeepLC and it works OK.

Error during calibration using modified peptides

Hi,
we tried to calibrate DeepLC using a list of peptides, some of which had modifications.

This modification caused the crash: "Label:13C(5)15N(1)". It is defined in Unimod as well as in your unimod_to_formula.csv. (A filtering workaround is sketched after the calibration table below.)

Traceback (most recent call last):
  File "/home/user/deeplc/deeplc_cli.py", line 333, in <module>
    sys.exit(main())
  File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/user/deeplc/deeplc_cli.py", line 304, in main
    df_deeplc_output = run_deeplc(df_deeplc_input, calibration_df)
  File "/home/user/deeplc/deeplc_cli.py", line 189, in run_deeplc
    dlc.calibrate_preds(seq_df=calibration_df)
  File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/deeplc/deeplc.py", line 986, in calibrate_preds
    calibrate_output = self.calibrate_preds_func_pygam(
  File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/deeplc/deeplc.py", line 697, in calibrate_preds_func_pygam
    predicted_tr = self.make_preds(
  File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/deeplc/deeplc.py", line 633, in make_preds
    X = self.do_f_extraction_psm_list_parallel(psm_list)
  File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/deeplc/deeplc.py", line 442, in do_f_extraction_psm_list_parallel
    all_feats = self.do_f_extraction_psm_list(psm_list)
  File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/deeplc/deeplc.py", line 408, in do_f_extraction_psm_list
    return self.f_extractor.full_feat_extract(psm_list)
  File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/deeplc/feat_extractor.py", line 664, in full_feat_extract
    X_cnn = self.encode_atoms( # X_sum, X_cnn_pos, X_cnn_count, X_hc
  File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/deeplc/feat_extractor.py", line 549, in encode_atoms
    matrix[i, dict_index[atom_position_composition]] += atom_change
KeyError: 'C[13]'

This is the data frame we used for the calibration

seq	modifications	tr
VVKKHIKEL		592.3
KLSEVNKRL		893.4
FHHGLGHSL		999.0
HIKTHELHL		1109.4
HRTEFYRNL		1344.6
KLQEKIQEL		1519.0
LEHEHLIKL		1624.1
DRHSFLKAL		1780.7
IEQEQKLAL		1852.3
HHSLIRISL		1963.5
TETVHIFKL		2200.0
IESSDVIRL		2255.5
TGLIRPVAL		2392.4
IGDGYVIHL		2527.9
LPQELKLTL		2664.9
KLLQFYPSL		2806.5
DGTVRLWSL		2967.6
LMLGEFLKL	2|Oxidation	3212.2
SLLSSVFKL		3263.0
LPQLPLAAL		3473.7
SYLEDVRLI	6|Label:13C(5)15N(1)	3617.3
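
A possible workaround until the encoder handles isotopic labels (a sketch; assumes the tab-separated calibration table above saved as calibration.tsv): drop the peptides whose modifications contain "Label:" before calibrating.

    import pandas as pd
    from deeplc import DeepLC

    cal = pd.read_csv("calibration.tsv", sep="\t")
    has_label = cal["modifications"].fillna("").str.contains("Label:")
    dlc = DeepLC()
    dlc.calibrate_preds(seq_df=cal[~has_label])  # calibrate without the labeled peptides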

Best,
Steffen

`TypeError: object of type 'float' has no len()` raised for a table without modifications

When I use the example provided in the docs to do retention time prediction with dlc.make_preds and calibrate=False, TypeError: object of type 'float' has no len() is raised at:

File "c:\program files\python38\lib\site-packages\deeplc\feat_extractor.py", line 989, in encode_atoms
    if len(mods) == 0:
TypeError: object of type 'float' has no len()

Python 3.8 compatibility

Currently, DeepLC is marked as incompatible with python 3.8. It might be interesting to resolve any incompatibilities and remove the restriction.

Low prediction accuracy for versions 2.0.4+

Hi,

I've noticed that starting from version 2.0.4 update I get much worse RT prediction accuracy compared to version 1.1.2.
I have a set of peptides (~2000) which is split into two parts - one for DeepLC calibration and one for estimation of predicted RTs.
The old version gives me a 0.124 min standard deviation for the difference between predicted and experimental RTs, while the new one (2.0.4) gives 0.321 min.

All peptides have no modifications except fixed Carbamidomethyl of C.
I run DeepLC with basic command line options (deeplc_path, '--file_pred', estimate_file_name, '--file_cal', calibrate_file_name, '--file_pred_out', out_file_name).
The only strange thing in my data is ~5-20% peptide FDR.

I see the same behavior for another dataset, as well as for the latest DeepLC version (2.1.9).

Please find the attached DeepLC logs, files for testing and the figures with results.

Regards,
Mark
[Attached figures: histogram (deeplc_bug_hist) and scatter plot (deeplc_bug_scatterplot) of the prediction errors.]

test_calibrate.txt
test_estimate.txt

Log_DeepLC112.txt

Log_DeepLC204.txt

Pygam calibration is the only option in the latest version

Hi everyone,

I've noticed that pygam calibration is forced in the DeepLC 2+ versions. Any reason for that? In my experience with DeepLC (v. 1.1.2), your default linear calibration usually works better (~20-30% reduction in the standard deviation between predicted and experimental RTs) than pygam calibration. Of course, I could do the "old" calibration myself, but I am trying to make sure that I'm not missing something.

Regards,
Mark

Error when using unmodified peptide to train and using modified peptides to test

I am using the Windows-installed DeepLC application. Training with unmodified peptide information from MaxQuant evidence.txt and testing with unmodified peptides gives good results. However, I get the following error when training with unmodified peptides and testing with modified peptides.

Traceback (most recent call last):
File "pandas\core\indexes\base.py", line 3621, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
cpdef get_loc(self, object val):
File "pandas\_libs\index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
return self.mapping.get_item(val)
File "pandas\_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'modifications'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "deeplc\gui.py", line 38, in <module>
start_gui()
File "gooey\python_bindings\gooey_decorator.py", line 134, in <lambda>
return lambda *args, **kwargs: func(*args, **kwargs)
File "deeplc\gui.py", line 35, in start_gui
main(gui=True)
File "deeplc\__main__.py", line 65, in main
run(**vars(argu))
File "deeplc\__main__.py", line 155, in run
preds = dlc.make_preds(seq_df=df_pred)
File "deeplc\deeplc.py", line 862, in make_preds
temp_preds = self.make_preds_core(
File "deeplc\deeplc.py", line 455, in make_preds_core
seq_df["idents"] = seq_df["seq"] + "|" + seq_df["modifications"]
File "pandas\core\frame.py", line 3505, in __getitem__
indexer = self.columns.get_loc(key)
File "pandas\core\indexes\base.py", line 3623, in get_loc
raise KeyError(key) from err
KeyError: 'modifications'

my train csv:
seq,modifications,tr
ISDAGEVVAIAR,,4013.04
ATMQNLNDR,,1882.4399999999998
TTTTTTTVVTQK,,1673.8799999999999
.......
The modified peptide csv
seq,modification,tr
AARPLVTVYDEK,1|Acetyl,4367.64
ADFDTNPTSLYSIK,1|Acetyl,7029
AHIVQTHK,1|Acetyl,1314.48

Another modified peptide csv also got the same error.
seq,modification,tr
AAAESIQMR,8|Oxidation,1353
AASVGPTMR,8|Oxidation,1264.26
ADLEMQIESLK,5|Oxidation,5267.34

Is it possible to train DeepLC using unmodified peptides and test with modified peptides?
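
Note that both failing files use the header modification (singular), while the traceback shows DeepLC looking up a modifications column. If that mismatch is the cause, renaming the column should fix it (a sketch):

    import pandas as pd
    from deeplc import DeepLC

    df = pd.read_csv("modified_peptides.csv")
    df = df.rename(columns={"modification": "modifications"})  # header DeepLC expects
    dlc = DeepLC()
    preds = dlc.make_preds(seq_df=df)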

DeepLC 0.1.14 crashes my computer systematically when a peptide file for calibration is not supplied

Hi,

I have used the DeepLC v. 0.1.14 Python module many times on Ubuntu 18.04 and managed to run iRT predictions successfully every time. However, when I try to run DeepLC without supplying a peptide file for calibration, my computer crashes while running the task. I have so far predicted iRTs for 1.6 million peptide sequences.

My initial guess was that I had to set --batch_num to a lower value to lower the memory footprint. So I tried setting it to 100 000 and even 75 000, and tried using fewer threads, like 4 or even 2 (I use an Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz). I only have 32 GB RAM, but since I dropped --batch_num to 75 000 with --n_threads 2, I think it should work. The computer still crashes, though.
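
For reference, the equivalent low-memory settings through the Python API (a sketch; batch_num and n_jobs are the attribute names visible in deeplc.py, and the values are illustrative):

    from deeplc import DeepLC

    # Smaller prediction batches and fewer worker processes to cut peak memory.
    dlc = DeepLC(batch_num=75000, n_jobs=2)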

Any ideas on this?

Best,

Marc

How to configure support for selenocysteine in its carbamidomethylated form?

Selenocysteine (symbol Sec or U) is encoded in the human proteome in 72 instances. Selenocysteine is an analogue of the more common cysteine, with selenium in place of the sulfur. Sec reacts like Cys with common reduction/alkylation chemistries using iodoacetamide to create the carbamidomethyl form. In a list of HLA-I peptides from an immunopeptidomics experiment submitted to DeepLC I had one, IEHCTSURVY (SELENOH, selenoprotein H, ENSP00000373509.4), with no modification specified. It caused a Python call to DeepLC to crash with error 1 below. I think this means selenium is not in the DeepLC list of elements, although the elemental composition for amino acid U is present.

So I looked into the DeepLC code and modified the following files to support amino acid U and element Se:
C:\ProgramData\Anaconda3\Lib\site-packages\deeplc\feat_extractor.py
C:\ProgramData\Anaconda3\Lib\site-packages\deeplc\aa_comp_rel.csv

This led to error 2 below. The 6 columns are for the original elements C, H, N, O, S, P; the 7th column was for the addition of Se. I think this means there won't be support for the element Se without suitable training data. But given the rarity of selenocysteine, it seems unlikely that there will ever be enough examples to enable training.

So to avoid crashes I went back to the input sequence and changed the U to a C, IEHCTSCRV, with the C's specified as Carbamidomethyl, as chemically that was the closest I could get. Do you have any other suggestions? (A sketch of this workaround follows the error listings.)

 File "C:\ProgramData\Anaconda3\lib\site-packages\deeplc\feat_extractor.py", line 525, in encode_atoms
   matrix_pos[pn, dict_index_pos[atom]] = val
KeyError: 'Se'

Error 2
   File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\input_spec.py", line 295, in assert_input_compatibility
       raise ValueError(
   ValueError: Input 0 of layer "model_179" is incompatible with the layer: expected shape=(None, 60, 6), found shape=(None, 60, 7) 
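
The U-to-C workaround from the last paragraph as a small sketch (hypothetical helper; uses the 1-based "position|name" modification format seen elsewhere on this page):

    def sec_to_cys(seq):
        # Replace selenocysteine with cysteine, the chemically closest residue
        # DeepLC supports, and mark every Cys as carbamidomethylated.
        seq = seq.replace("U", "C")
        mods = "|".join(f"{i + 1}|Carbamidomethyl"
                        for i, aa in enumerate(seq) if aa == "C")
        return seq, mods

    print(sec_to_cys("IEHCTSURVY"))
    # -> ('IEHCTSCRVY', '4|Carbamidomethyl|7|Carbamidomethyl')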

negative retention times

I am using DeepLC through the MS2PIP website, along with the intensity predictor, for a list of peptides. Many of the peptides are assigned a negative retention time, and I am not sure how to interpret that result.
Thank you for your help!

Training new models

The response to issue #12 from May 2020 was:

yes it is possible to train new models. The python scripts to retrain are available in "figures_without_models.zip" here: >>https://doi.org/10.5281/zenodo.3706875
In that zip file you can use "run_full_mod.py" to retrain. It should be documented, but if things are unclear please do not hesitate to ask.

I downloaded that zip and cannot find "run_full_mod.py"; perhaps you mean "run.py"?

Is there a more current version of the model training capability? The current release contains a file called deeplc/trainl3.py.

Deeplc docker GPU support

Hi,

Thank you for providing such useful tools! Good job!
Something related to #16
Does the Docker version support GPUs? Specifically, Tesla A100/V100 series GPUs.
Thanks.

Best,
Wenjin

pip install on Python 3.10 not working correctly

Hi and thank you for the great package!

pip install deeplc behaves strangely on Python 3.10. It downloads multiple versions of deeplc:

pip install deeplc
Collecting deeplc
  Downloading deeplc-0.2.0-py3-none-any.whl (77.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.8/77.8 MB 2.1 MB/s eta 0:00:00
Requirement already satisfied: numpy<2,>=1.17 in /home/lev/venv/ms1/lib/python3.10/site-packages (from deeplc) (1.22.1)
Requirement already satisfied: setuptools>=42.0.1 in /home/lev/venv/ms1/lib/python3.10/site-packages (from deeplc) (59.6.0)
  Downloading deeplc-0.1.39-py3-none-any.whl (77.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.8/77.8 MB 1.3 MB/s eta 0:00:00
  Downloading deeplc-0.1.37-py3-none-any.whl (77.8 MB)
     ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 25.8/77.8 MB 1.0 MB/s eta 0:00:51
... etc.

Any idea what the reason and the solution may be?

How is it possible to predict RT when DeepLC doesn't know the LC gradient

Hi,

DeepLC enjoys a good reputation, which is why I'd like to use it to predict retention times of peptides with a specific gradient. Unfortunately, I cannot provide calibration peptides for that gradient.

When I feed the DeepLC Streamlit GUI my peptides to predict, it returns a table with values in the "predicted_tr" column between 1 and 10.

  1. How do I interpret this value?
  2. How can DeepLC predict the retention time without knowing anything about the LC gradient I am using (beyond the simple scenario that my peptides elute somewhere between 0% and 100% ACN)?

Thank you for your help!

The last of default models is selected as the best one

Hello,

I have noticed that when using one of the default models (i.e., --file_model is not provided), the last one in the list is selected as the best one, independently of performance.
Here is a piece of debug log (I have included the model name in the output):

2022-03-14 20:05:12 - DEBUG - For full_hc_hela_hf_psms_aligned model got a performance of: 0.28377379963897975
2022-03-14 20:05:12 - DEBUG - For full_hc_PXD005573_mcp model got a performance of: 0.28786216590159314
2022-03-14 20:05:12 - DEBUG - Model with the best performance got selected: {'/home/vgor/.local/lib/python3.8/site-packages/deeplc/mods/full_hc_PXD005573_mcp_cb975cfdd4105f97efa0b3afffe075cc.hdf5': '/home/vgor/.local/lib/python3.8/site-packages/deeplc/mods/full_hc_PXD005573_mcp_cb975cfdd4105f97efa0b3afffe075cc.hdf5'}

Although the performance of full_hc_hela_hf_psms_aligned is better, the last one in the list is selected.

Most likely, the error is this line

m_group_name = "_".join(m.split("_")[:-1]).split("/")[-1]

I believe it should be
m_group_name = m_name

Since I am not entirely sure that I understand everything correctly, I preferred to open an issue before a PR.

The issue was observed with versions 0.1.36 and 1.0.0 (the latest in pip)

GUI DeepLC errors

Here is the error I get while trying to run DeepLC:
2024-03-28 17:20:01 - INFO - Using DeepLC version 2.2.27
Traceback (most recent call last):
File "deeplc\gui.py", line 38, in <module>
File "gooey\python_bindings\gooey_decorator.py", line 134, in <lambda>
File "deeplc\gui.py", line 35, in start_gui
File "deeplc\__main__.py", line 70, in main
File "deeplc\__main__.py", line 134, in run
File "psm_utils\io\__init__.py", line 151, in read_file
File "psm_utils\io\__init__.py", line 112, in _infer_filetype
psm_utils.io.exceptions.PSMUtilsIOException: Could not infer filetype.

Predictions are slow when library is not full

Hi Robbin!

DeepLC predictions are slow when the library is not full (the "--use_library" option). For example, if I make predictions for 2000 sequences and 10 of them are not present in the library, the prediction takes much more time than predicting all 2000 sequences without "--use_library". We have analysed your code, and we are pretty sure the problem is a bottleneck in the multiprocessing used in the "do_f_extraction_pd_parallel" method: creating the child processes takes a lot of time because the self.library object is too big to copy quickly. I tried a quick fix: adding the line "del self.library" in deeplc.py just before the comment "# If we need to apply deep NN", and adding "self.library = {} / if self.use_library: / self.read_library()" right before the line "# If we need to calibrate" (sketched below).

This fix makes things much better, but it is still not optimal, since the library will be reloaded multiple times, with every call of the "make_preds_core" function. I don't have an idea for an optimal change right now, which is why I haven't prepared a pull request.
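
The quick fix from the paragraph above, sketched as it would sit inside the DeepLC methods (illustrative placement only, not a tested patch):

    # In deeplc.py, just before the comment "# If we need to apply deep NN":
    del self.library  # avoid copying the large dict into every worker process

    # ... feature extraction and deep NN prediction ...

    # Just before the comment "# If we need to calibrate":
    self.library = {}
    if self.use_library:
        self.read_library()  # reload the library for subsequent lookups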

Regards,
Mark

Blank spot after calibration

Hello,

I noticed that there are regions with no prediction in the predicted RT domain. You can see blank rows in this scatter plot where no value is predicted.

[attached image: scatter plot of predicted vs. experimental RT after calibration, showing blank horizontal bands]

I don't see this without calibration.

[attached image: the same scatter plot without calibration]

Any idea about the source of this issue?

Thanks

Skipping calibration step?

Hello,

I tried to use DeepLC 0.1.17 to predict retention times for some peptides in a CSV file. I also provided a calibration file to the software. Looking at the console output, I saw some strange errors like:

ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.3977001953125,0.795400390625

I am not really sure what this means. Does it mean that only some peptides in the calibration file are excluded because they have invalid RTs? Or does the program not use the calibration file at all? I did not encounter this issue before (but then I provided iRT values rather than RTs in minutes for the calibration).

I would be happy if you could help me understand this issue a little better (see attached console output).

Thank you in advance. :-)

Best,

Marc

(venv) marc@supercomputer:~/PythonProject/deeplc$ deeplc --file_pred /mnt/d/Data_MS2PIP/MSLibrarian_1/precursors_rt.csv --file_cal /mnt/d/Data_MS2PIP/MSLibrarian_1/precursors_rt_calib.csv --n_threads 8
2020-08-27 10:26:10.855515: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-08-27 10:26:10.855561: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-08-27 10:26:11 - INFO - Using DeepLC version 0.1.17
2020-08-27 10:26:12 - INFO - Selecting best model and calibrating predictions...
2020-08-27 10:26:12.706689: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-08-27 10:26:12.706736: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2020-08-27 10:26:12.706751: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (supercomputer): /proc/driver/nvidia/version does not exist
2020-08-27 10:26:12.706966: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-08-27 10:26:12.723027: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3311995000 Hz
2020-08-27 10:26:12.726908: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4cf0b40 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-27 10:26:12.727027: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
1/1 [==============================] - 0s 288us/step
2020-08-27 10:26:13 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.0,0.3977001953125
2020-08-27 10:26:13 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.3977001953125,0.795400390625
2020-08-27 10:26:13 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.795400390625,1.1931005859375001
2020-08-27 10:26:13 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.1931005859375001,1.59080078125
2020-08-27 10:26:13 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.59080078125,1.9885009765625
1/1 [==============================] - 0s 317us/step
2020-08-27 10:26:16 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.0,0.4063890075683594
2020-08-27 10:26:16 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.4063890075683594,0.8127780151367188
2020-08-27 10:26:16 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.8127780151367188,1.2191670227050782
2020-08-27 10:26:16 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.2191670227050782,1.6255560302734375
2020-08-27 10:26:16 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.6255560302734375,2.0319450378417967
1/1 [==============================] - 0s 252us/step
2020-08-27 10:26:18 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.0,0.3999577331542969
2020-08-27 10:26:18 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.3999577331542969,0.7999154663085938
2020-08-27 10:26:18 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.7999154663085938,1.1998731994628906
2020-08-27 10:26:18 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.1998731994628906,1.5998309326171876
2020-08-27 10:26:18 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.5998309326171876,1.9997886657714845
1/1 [==============================] - 0s 257us/step
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.0,0.38717674255371093
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.38717674255371093,0.7743534851074219
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.7743534851074219,1.1615302276611328
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.1615302276611328,1.5487069702148437
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.5487069702148437,1.9358837127685546
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.9358837127685546,2.3230604553222656
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 2.3230604553222656,2.7102371978759763
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 2.7102371978759763,3.0974139404296874
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 3.0974139404296874,3.4845906829833986
2020-08-27 10:26:22 - INFO - Making predictions using model: {'/home/marc/PythonProject/deeplc/venv/lib/python3.6/site-packages/deeplc/mods/full_hc_PXD005573_mcp_cb975cfdd4105f97efa0b3afffe075cc.hdf5': '/home/marc/PythonProject/deeplc/venv/lib/python3.6/site-packages/deeplc/mods/full_hc_PXD005573_mcp_cb975cfdd4105f97efa0b3afffe075cc.hdf5'}

DeepLC as a docker image

Hi,

I just wonder if there's a plan to distribute a Docker image of DeepLC later? By the way, I think this tool performs really well, given the benchmarks I have done comparing predicted vs. experimental RTs and iRTs.

Best,

Marc

DeepLC CPU vs GPU performance

Hello,

First of all, thanks for the outstanding RT prediction software!

I'm actively using DeepLC in my workflow, and it has become a bottleneck in the total analysis time. So I'm thinking of switching from CPU-based DeepLC calculations to GPU. But I want to estimate the effect of this change before buying a powerful video card for the server. Have you done any comparison of CPU vs. GPU DeepLC time consumption?

Regards,
Mark

How to convert peptide modifications of maxquant output file to MS2PIP-style

Hi,

I want to use DeepLC to predict retention times for some peptides, but since I am using MaxQuant for the analysis, I don't know how to convert the peptide modifications in its output file to the MS2PIP style that DeepLC needs. The MaxQuant output looks like the following:

Sequence | Length | Modifications | Modified sequence
ASMGTLAFDEYGRPFLIIK | 19 | Acetyl (Protein N-term),Oxidation (M) | _(Acetyl (Protein N-term))ASM(Oxidation (M))GTLAFDEYGRPFLIIK_
DDDIAALVVDNGSGMCK | 17 | Acetyl (Protein N-term),Oxidation (M) | _(Acetyl (Protein N-term))DDDIAALVVDNGSGM(Oxidation (M))CK_
AMEALATAEQACK | 13 | Oxidation (M) | _AM(Oxidation (M))EALATAEQACK_
MQQQLDEYQELLDIK | 15 | Oxidation (M) | _M(Oxidation (M))QQQLDEYQELLDIK_

Has anyone tried to convert such modification information into the input format required by DeepLC? Is there any software or function that can do it? (See the sketch below.)
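
For what it's worth, a rough parsing sketch (hypothetical helper; assumes the peprec-style "position|name" format used elsewhere on this page, with position 0 for N-terminal modifications):

    def maxquant_to_peprec(mod_seq):
        # Convert a MaxQuant "Modified sequence" such as
        # _(Acetyl (Protein N-term))ASM(Oxidation (M))GTLAFDEYGRPFLIIK_
        # into a plain sequence plus "pos|name" modifications.
        s = mod_seq.strip("_")
        seq, mods, i, pos = [], [], 0, 0
        while i < len(s):
            if s[i] == "(":
                depth, j = 1, i + 1
                while depth:  # scan to the matching closing parenthesis
                    depth += {"(": 1, ")": -1}.get(s[j], 0)
                    j += 1
                label = s[i + 1 : j - 1]
                name = label.split(" (")[0]  # "Oxidation (M)" -> "Oxidation"
                mods.append(f"{0 if 'N-term' in label else pos}|{name}")
                i = j
            else:
                seq.append(s[i])
                pos += 1
                i += 1
        return "".join(seq), "|".join(mods)

    print(maxquant_to_peprec("_AM(Oxidation (M))EALATAEQACK_"))
    # -> ('AMEALATAEQACK', '2|Oxidation')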

Thanks,
LeeLee

deeplc gui

Hello, when using version 2.2.3 of the DeepLC GUI, the following problem occurred:

Traceback (most recent call last):
File "deeplc\gui.py", line 38, in <module>
start_gui()
File "gooey\python_bindings\gooey_decorator.py", line 134, in <lambda>
return lambda *args, **kwargs: func(*args, **kwargs)
File "deeplc\gui.py", line 35, in start_gui
main(gui=True)
File "deeplc\__main__.py", line 70, in main
run(**vars(argu))
File "deeplc\__main__.py", line 181, in run
if len(psm_list_cal) > 0:
UnboundLocalError: local variable 'psm_list_cal' referenced before assignment

The input file I used is examples/datasets/seqs_exp.csv from the deeplc source code.

batch_num is not used anymore in the recent versions

Hi again,

I've noticed that the latest versions of DeepLC (2+) do not use the batch_num argument anymore (or did I read the code wrong?). That leads to GPU memory errors when DeepLC is used to predict RTs for a few million peptides on a weak GPU.

I compare everything to version 1.1.2, which is the most stable and effective for me right now, and which works fine for the same set of peptides on the same hardware.

Regards,
Mark

I cannot run it on the website when my file gets bigger

When I only read the first 100 lines of the file, it ran smoothly, but when I read all the file data, an error occurred.

This is the traceback:
KeyError: 'Se'
Traceback:
File "/deeplc/deeplc_streamlit.py", line 173, in _run_deeplc
preds = dlc.make_preds(seq_df=config["input_df"], calibrate=calibrate)
File "/usr/local/lib/python3.10/site-packages/deeplc/deeplc.py", line 634, in make_preds
X = self.do_f_extraction_psm_list_parallel(psm_list)
File "/usr/local/lib/python3.10/site-packages/deeplc/deeplc.py", line 443, in do_f_extraction_psm_list_parallel
all_feats = self.do_f_extraction_psm_list(psm_list)
File "/usr/local/lib/python3.10/site-packages/deeplc/deeplc.py", line 409, in do_f_extraction_psm_list
return self.f_extractor.full_feat_extract(psm_list)
File "/usr/local/lib/python3.10/site-packages/deeplc/feat_extractor.py", line 664, in full_feat_extract
X_cnn = self.encode_atoms( # X_sum, X_cnn_pos, X_cnn_count, X_hc
File "/usr/local/lib/python3.10/site-packages/deeplc/feat_extractor.py", line 525, in encode_atoms
matrix_pos[pn, dict_index_pos[atom]] = val

error using GUI to predict rt

Hi

I want to use DeepLC to predict retention times for some peptides, so I just use the DeepLC GUI.
However, I run into the problems below.

2023-05-28 22:43:39 - INFO - Using DeepLC version 2.0.4
Traceback (most recent call last):
File "deeplc\gui.py", line 38, in <module>
start_gui()
File "gooey\python_bindings\gooey_decorator.py", line 134, in <lambda>
return lambda *args, **kwargs: func(*args, **kwargs)
File "deeplc\gui.py", line 35, in start_gui
main(gui=True)
File "deeplc\__main__.py", line 70, in main
run(**vars(argu))
File "deeplc\__main__.py", line 130, in run
list_of_psms.append(PSM(peptidoform=peprec_to_proforma(seq,mod),spectrum_id=ident))
File "psm_utils\io\peptide_record.py", line 398, in peprec_to_proforma
peptide[int(position)] += f"[{label}]"
IndexError: list index out of range
2023-05-28 22:44:05 - INFO - Using DeepLC version 2.0.4
Traceback (most recent call last):
File "deeplc\gui.py", line 38, in <module>
start_gui()
File "gooey\python_bindings\gooey_decorator.py", line 134, in <lambda>
return lambda *args, **kwargs: func(*args, **kwargs)
File "deeplc\gui.py", line 35, in start_gui
main(gui=True)
File "deeplc\__main__.py", line 70, in main
run(**vars(argu))
File "deeplc\__main__.py", line 134, in run
psm_list_pred = read_file(file_pred)
File "psm_utils\io\__init__.py", line 125, in read_file
filetype = _infer_filetype(filename)
File "psm_utils\io\__init__.py", line 86, in _infer_filetype
raise PSMUtilsIOException("Could not infer filetype.")
psm_utils.io.exceptions.PSMUtilsIOException: Could not infer filetype.

(The identical "Could not infer filetype" traceback was raised again on retries at 22:48:35, 22:49:11, and 23:19:30.)

And my CSV file is like below.
sequence_information.csv
seq,modifications
KPVGAAK,
KPAVQK,
KPLQGK,
VPKQAK,
QVAPKK,
GQLPKK,
AAGVPKK,
PVVLDK,
VPVIDK,
QPKIGK,
KPAAAAGAK,
KPQVNAK,
KPSPEVK,
KPMLPAK,
EPVIAQK,
APLMPKK,
PLDLGAAK,
PKPPAFK,
LPLDQAK,
PKQGINK,
SIHHAR,
SLHAHR,
ISEPFK,
LSIMEK,
LSMIEK,
SLEIMK,
SLEPFK,
SLLEMK,
SLMLEK,
ISFPEK,
KPMLPAK,3|Oxidation
LVWPSAK,
QSVELPK,
GVIEPSAK,
PSLTGLGR,
LGDLGLGR,
AEVVGAVR,
WVLVQR,
APPPPPPK,
DTLINPK,
GQQIGK,
QAGGIGK,
QGQIGK,
QNALGK,
GIVWR,
GLVGER,
GVIDAR,
LGVEGR,
LVGWR,
VGLWR,
QMIQQYEMYCK,10|Carbamidomethyl
TEEMPNDSVLENK,4|Oxidation
KPATAAGTK,
AIHLNK,
....

How can I use DeepLC to predict retention times?
