compomics / deeplc
DeepLC: Retention time prediction for (modified) peptides using Deep Learning.
Home Page: https://iomics.ugent.be/deeplc
License: Apache License 2.0
DeepLC (TensorFlow) does not seem to obey num_jobs, as the NUMEXPR_MAX_THREADS env variable is not set.
2021-02-05 08:57:40 // INFO // numexpr.utils // Note: NumExpr detected 32 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2021-02-05 08:57:40 // INFO // numexpr.utils // NumExpr defaulting to 8 threads.
I would propose to first check if NUMEXPR_MAX_THREADS has already been set, and if not, set it to the num_jobs parameter that was passed to DeepLC.
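A minimal sketch of that check (the function name is illustrative, and it assumes the num_jobs value is available wherever DeepLC is initialized; the variable must be set before numexpr, or anything importing it, is first imported):

```python
import os

def ensure_numexpr_threads(num_jobs: int) -> None:
    """Respect an existing NUMEXPR_MAX_THREADS setting; otherwise cap
    NumExpr at the num_jobs value passed to DeepLC.

    Must run before numexpr (or anything that imports it) is imported.
    """
    os.environ.setdefault("NUMEXPR_MAX_THREADS", str(num_jobs))
```

`os.environ.setdefault` leaves a user-supplied value untouched, so this only changes behavior when the variable is unset.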
I got sequences and retention times from the msms.txt file of a chymotrypsin MaxQuant search result, used R to select the columns, and split them into the following CSVs.
test.csv
seq,modifications
AAAARDVGSSIKSDRDKF,
AAAARDVGSSIKSDRDKF,
AAAVKDLGSSIKTDGDKF,
AACDPRHGRY,
AAEREGQQQTQQTEQSQEKKEEKN,
......
train.csv
seq,modifications,tr
KHWPFEVVSDGGKPKIKVSY,,4196.160000000001
AIQKSDMDLRKVLY,,3924.3
LCIGTSSGTM,,4293.78
ELDDELKSVENQMRY,,5764.2
......
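For reference, the same column selection and split can be sketched in Python rather than R (the column names "Sequence" and "Retention time" are assumptions about the MaxQuant msms.txt format, and the minutes-to-seconds conversion is an assumption based on the tr values shown above):

```python
import csv
import random

def split_msms(msms_path: str, train_path: str, test_path: str, seed: int = 0) -> None:
    """Split a MaxQuant msms.txt into a DeepLC calibration set (train.csv)
    and a prediction set (test.csv).

    Assumes tab-separated input with "Sequence" and "Retention time"
    columns; retention times are converted from minutes to seconds.
    """
    with open(msms_path, newline="") as fh:
        rows = [(r["Sequence"], float(r["Retention time"]) * 60)
                for r in csv.DictReader(fh, delimiter="\t")]
    random.Random(seed).shuffle(rows)
    half = len(rows) // 2
    with open(train_path, "w", newline="") as fh:
        w = csv.writer(fh)
        w.writerow(["seq", "modifications", "tr"])
        w.writerows((seq, "", tr) for seq, tr in rows[:half])
    with open(test_path, "w", newline="") as fh:
        w = csv.writer(fh)
        w.writerow(["seq", "modifications"])
        w.writerows((seq, "") for seq, _ in rows[half:])
```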
Then I uploaded them to https://iomics.ugent.be/deeplc/ using all the default settings and got the following error.
Running DeepLC
❌ DeepLC ran into a problem
OSError: [Errno 12] Cannot allocate memory
Traceback:
File "/deeplc/deeplc_streamlit.py", line 165, in _run_deeplc
dlc.calibrate_preds(seq_df=config["input_df_calibration"])
File "/venv/lib/python3.8/site-packages/deeplc/deeplc.py", line 1010, in calibrate_preds
calibrate_output = self.calibrate_preds_func(
File "/venv/lib/python3.8/site-packages/deeplc/deeplc.py", line 855, in calibrate_preds_func
predicted_tr = self.make_preds(
File "/venv/lib/python3.8/site-packages/deeplc/deeplc.py", line 790, in make_preds
temp_preds = self.make_preds_core(
File "/venv/lib/python3.8/site-packages/deeplc/deeplc.py", line 471, in make_preds_core
X = self.do_f_extraction_pd_parallel(seq_df)
File "/venv/lib/python3.8/site-packages/deeplc/deeplc.py", line 319, in do_f_extraction_pd_parallel
pool = multiprocessing.Pool(self.n_jobs)
File "/venv/lib/python3.8/multiprocessing/context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "/venv/lib/python3.8/multiprocessing/pool.py", line 212, in __init__
self._repopulate_pool()
File "/venv/lib/python3.8/multiprocessing/pool.py", line 303, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
File "/venv/lib/python3.8/multiprocessing/pool.py", line 326, in _repopulate_pool_static
w.start()
File "/venv/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/venv/lib/python3.8/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/venv/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/venv/lib/python3.8/multiprocessing/popen_fork.py", line 70, in _launch
self.pid = os.fork()
I also tried with another msms.txt from a trypsin MaxQuant search and got the same error. What should I do to get DeepLC working?
Update: I installed the Windows version of DeepLC and it works OK.
Hi,
we tried to calibrate DeepLC using a list of peptides, some of which had modifications.
This modification caused the crash: "Label:13C(5)15N(1)"
It is defined in Unimod as well as in your unimod_to_formula.csv.
Traceback (most recent call last):
File "/home/user/deeplc/deeplc_cli.py", line 333, in <module>
sys.exit(main())
File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/user/deeplc/deeplc_cli.py", line 304, in main
df_deeplc_output = run_deeplc(df_deeplc_input, calibration_df)
File "/home/user/deeplc/deeplc_cli.py", line 189, in run_deeplc
dlc.calibrate_preds(seq_df=calibration_df)
File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/deeplc/deeplc.py", line 986, in calibrate_preds
calibrate_output = self.calibrate_preds_func_pygam(
File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/deeplc/deeplc.py", line 697, in calibrate_preds_func_pygam
predicted_tr = self.make_preds(
File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/deeplc/deeplc.py", line 633, in make_preds
X = self.do_f_extraction_psm_list_parallel(psm_list)
File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/deeplc/deeplc.py", line 442, in do_f_extraction_psm_list_parallel
all_feats = self.do_f_extraction_psm_list(psm_list)
File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/deeplc/deeplc.py", line 408, in do_f_extraction_psm_list
return self.f_extractor.full_feat_extract(psm_list)
File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/deeplc/feat_extractor.py", line 664, in full_feat_extract
X_cnn = self.encode_atoms( # X_sum, X_cnn_pos, X_cnn_count, X_hc
File "/home-link/user/mambaforge/envs/deeplc/lib/python3.10/site-packages/deeplc/feat_extractor.py", line 549, in encode_atoms
matrix[i, dict_index[atom_position_composition]] += atom_change
KeyError: 'C[13]'
This is the data frame we used for the calibration:
seq modifications tr
VVKKHIKEL 592.3
KLSEVNKRL 893.4
FHHGLGHSL 999.0
HIKTHELHL 1109.4
HRTEFYRNL 1344.6
KLQEKIQEL 1519.0
LEHEHLIKL 1624.1
DRHSFLKAL 1780.7
IEQEQKLAL 1852.3
HHSLIRISL 1963.5
TETVHIFKL 2200.0
IESSDVIRL 2255.5
TGLIRPVAL 2392.4
IGDGYVIHL 2527.9
LPQELKLTL 2664.9
KLLQFYPSL 2806.5
DGTVRLWSL 2967.6
LMLGEFLKL 2|Oxidation 3212.2
SLLSSVFKL 3263.0
LPQLPLAAL 3473.7
SYLEDVRLI 6|Label:13C(5)15N(1) 3617.3
Best,
Steffen
When I use the example provided in the docs to do retention time prediction using dlc.make_preds with the parameter calibrate=False, a TypeError: object of type 'float' has no len() is raised at:
File "c:\program files\python38\lib\site-packages\deeplc\feat_extractor.py", line 989, in encode_atoms
if len(mods) == 0:
TypeError: object of type 'float' has no len()
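A likely cause (an assumption on my part, not a confirmed diagnosis) is that empty modifications fields are parsed as NaN floats by pandas, which later fail the len() check. A minimal workaround sketch, using an in-memory CSV to stand in for the real input file:

```python
import io
import pandas as pd

# Stand-in for the real peptide CSV; an empty "modifications" field would
# normally be parsed as NaN (a float), which later breaks len(mods).
csv_data = io.StringIO("seq,modifications\nAAAARDVGSSIKSDRDKF,\n")

# keep_default_na=False keeps empty fields as empty strings instead of NaN
df = pd.read_csv(csv_data, keep_default_na=False)

# Alternatively, fix an already-loaded frame:
# df["modifications"] = df["modifications"].fillna("")
```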
Currently, DeepLC is marked as incompatible with Python 3.8. It might be interesting to resolve any incompatibilities and remove the restriction.
Hi,
I've noticed that starting from version 2.0.4 update I get much worse RT prediction accuracy compared to version 1.1.2.
I have a set of peptides (~2000) which is split into two parts - one for DeepLC calibration and one for estimation of predicted RTs.
The old version provides me with 0.124 min standard deviation for the difference between predicted and experimental RTs, while the new one (2.0.4) - 0.321 min.
All peptides have no modifications except fixed Carbamidomethyl of C.
I run DeepLC with basic command line options (deeplc_path, '--file_pred', estimate_file_name, '--file_cal', calibrate_file_name, '--file_pred_out', out_file_name).
The only strange thing in my data is ~5-20% peptide FDR.
I see the same behavior for another dataset, as well as with the latest DeepLC version (2.1.9).
Please find the attached DeepLC logs, files for testing and the figures with results.
Hi everyone,
I've noticed that pygam calibration is forced in the DeepLC 2+ versions. Is there any reason for that? In my experience with DeepLC (v. 1.1.2), your default linear calibration usually works better (a ~20-30% reduction in the standard deviation between predicted and experimental RTs) compared to pygam calibration. Of course, I could do the "old" calibration myself, but I am trying to make sure that I'm not missing something.
Regards,
Mark
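For reference, the plain linear calibration Mark describes can be sketched as an ordinary least-squares fit of observed against predicted RTs (a minimal stand-in for the idea, not DeepLC's actual calibration code):

```python
def fit_linear_calibration(pred, obs):
    """Ordinary least-squares fit of observed RTs against predicted RTs.

    Returns a function mapping an uncalibrated prediction onto the
    experimental RT scale.
    """
    n = len(pred)
    mean_x = sum(pred) / n
    mean_y = sum(obs) / n
    sxx = sum((x - mean_x) ** 2 for x in pred)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pred, obs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept
```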
I am using the Windows installation of the DeepLC application. Training with unmodified peptide information from MaxQuant evidence.txt and testing with unmodified peptides gives good results. However, I get the following error when training with unmodified peptides and testing with modified peptides.
Traceback (most recent call last):
File "pandas\core\indexes\base.py", line 3621, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
cpdef get_loc(self, object val):
File "pandas\_libs\index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
return self.mapping.get_item(val)
File "pandas\_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'modifications'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "deeplc\gui.py", line 38, in
start_gui()
File "gooey\python_bindings\gooey_decorator.py", line 134, in
return lambda *args, **kwargs: func(*args, **kwargs)
File "deeplc\gui.py", line 35, in start_gui
main(gui=True)
File "deeplc\__main__.py", line 65, in main
run(**vars(argu))
File "deeplc\__main__.py", line 155, in run
preds = dlc.make_preds(seq_df=df_pred)
File "deeplc\deeplc.py", line 862, in make_preds
temp_preds = self.make_preds_core(
File "deeplc\deeplc.py", line 455, in make_preds_core
seq_df["idents"] = seq_df["seq"] + "|" + seq_df["modifications"]
File "pandas\core\frame.py", line 3505, in __getitem__
indexer = self.columns.get_loc(key)
File "pandas\core\indexes\base.py", line 3623, in get_loc
raise KeyError(key) from err
KeyError: 'modifications'
my train csv:
seq,modifications,tr
ISDAGEVVAIAR,,4013.04
ATMQNLNDR,,1882.4399999999998
TTTTTTTVVTQK,,1673.8799999999999
.......
The modified peptide CSV:
seq,modification,tr
AARPLVTVYDEK,1|Acetyl,4367.64
ADFDTNPTSLYSIK,1|Acetyl,7029
AHIVQTHK,1|Acetyl,1314.48
Another modified peptide CSV gives the same error:
seq,modification,tr
AAAESIQMR,8|Oxidation,1353
AASVGPTMR,8|Oxidation,1264.26
ADLEMQIESLK,5|Oxidation,5267.34
Is it possible to train DeepLC using unmodified peptides and test with modified peptides?
Hi,
I have used the DeepLC v. 0.1.14 Python module many times on Ubuntu 18.04 and managed to run iRT predictions successfully every time. However, when trying to run DeepLC without supplying a peptide file for calibration, my computer crashes while running the task. I have so far predicted iRTs for 1.6 million peptide sequences.
My initial guess was that I had to set --batch_num to a lower value to lower the memory footprint. So I tried setting it to 100 000 and even 75 000, and tried using fewer threads, like 4 or even 2 (I use an Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz). I only have 32 GB RAM, but since I dropped --batch_num to 75 000 with --n_threads 2, I think it should work. The computer still crashes.
Any ideas on this?
Best,
Marc
Selenocysteine (symbol Sec or U) is encoded in the human proteome in 72 instances. It is an analogue of the more common cysteine, with selenium in place of the sulfur. Sec reacts like Cys with common reduction/alkylation chemistries using iodoacetamide to create the carbamidomethyl form. A list of HLA-I peptides from an immunopeptidomics experiment submitted to DeepLC contained one such peptide, IEHCTSURVY (SELENOH, selenoprotein H, ENSP00000373509.4), with no modification specified. It caused a Python call to DeepLC to crash with error 1 below, which I think means selenium is not in the DeepLC list of elements, even though the elemental composition for amino acid U is present.
So I looked into DeepLC code and modified the following to support amino acid U and element Se:
C:\ProgramData\Anaconda3\Lib\site-packages\deeplc\feat_extractor.py
C:\ProgramData\Anaconda3\Lib\site-packages\deeplc\aa_comp_rel.csv
This led to error 2 below. The 6 columns are for the original elements C, H, N, O, S, P; the 7th column was for the addition of Se. I think this means there won't be support for the element Se without suitable training data. But given the rarity of selenocysteine, it seems unlikely that there will ever be enough examples to enable training.
So to avoid crashes I went back to the input sequence and changed the U to a C, IEHCTSCRV, with the C's specified as Carbamidomethyl, as chemically that was the closest I could get. Do you have any other suggestions?
Error 1
File "C:\ProgramData\Anaconda3\lib\site-packages\deeplc\feat_extractor.py", line 525, in encode_atoms
matrix_pos[pn, dict_index_pos[atom]] = val
KeyError: 'Se'
Error 2
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\input_spec.py", line 295, in assert_input_compatibility
raise ValueError(
ValueError: Input 0 of layer "model_179" is incompatible with the layer: expected shape=(None, 60, 6), found shape=(None, 60, 7)
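The U-to-C workaround described above can be sketched as a small preprocessing step (a hypothetical helper of my own, not part of DeepLC; whether pre-existing Cys residues should also carry Carbamidomethyl depends on your fixed-modification setup):

```python
def substitute_selenocysteine(seq: str, mods: str = "") -> tuple[str, str]:
    """Replace Sec (U) with Cys and tag each substituted position as
    Carbamidomethyl, the form Sec adopts after iodoacetamide alkylation.

    Positions in the modifications string are 1-based, peprec-style.
    """
    extra = [f"{i}|Carbamidomethyl"
             for i, aa in enumerate(seq, start=1) if aa == "U"]
    new_mods = ",".join(([mods] if mods else []) + extra)
    return seq.replace("U", "C"), new_mods
```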
I am using DeepLC through the MS2PIP website along with the intensity predictor for a list of peptides. Many of the peptides are assigned a negative retention time and I am not sure how to interpret that result.
Thank you for your help!
Hi,
I would like to know which unit is used for the predicted retention times.
sklearn is missing as a dependency in setup.py and the conda yaml file:
https://github.com/bioconda/bioconda-recipes/blob/master/recipes/deeplc/meta.yaml
The GitHub Actions tests pass, as sklearn was already a test dependency. The bioconda tests fail, however:
bioconda/bioconda-recipes#32320
The sklearn dependency seems to have been added in 675f5ab
The response to issue #12 from May 2020 was:
yes it is possible to train new models. The python scripts to retrain are available in "figures_without_models.zip" here: https://doi.org/10.5281/zenodo.3706875
In that zip file you can use "run_full_mod.py" to retrain. It should be documented, but if things are unclear please do not hesitate to ask.
I downloaded that zip and cannot find "run_full_mod.py"; perhaps you mean "run.py"?
Is there a more current version of the model training capability?
The current release contains a file called deeplc/trainl3.py
Hi,
Thank you for providing such useful tools! Good job!
Something related to #16
Does the Docker version support GPUs? Specifically, Tesla A100/V100 series GPUs.
Thanks.
Best,
Wenjin
Hi and thank you for the great package!
pip install deeplc
behaves strangely on Python 3.10. It downloads multiple versions of deeplc:
pip install deeplc
Collecting deeplc
Downloading deeplc-0.2.0-py3-none-any.whl (77.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.8/77.8 MB 2.1 MB/s eta 0:00:00
Requirement already satisfied: numpy<2,>=1.17 in /home/lev/venv/ms1/lib/python3.10/site-packages (from deeplc) (1.22.1)
Requirement already satisfied: setuptools>=42.0.1 in /home/lev/venv/ms1/lib/python3.10/site-packages (from deeplc) (59.6.0)
Downloading deeplc-0.1.39-py3-none-any.whl (77.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.8/77.8 MB 1.3 MB/s eta 0:00:00
Downloading deeplc-0.1.37-py3-none-any.whl (77.8 MB)
━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 25.8/77.8 MB 1.0 MB/s eta 0:00:51
... etc.
Any idea what the reason and the solution may be?
Hi,
DeepLC enjoys a good reputation, which is why I'd like to use it to predict retention times of peptides with a specific gradient. Unfortunately, I cannot provide calibration peptides for that gradient.
When I feed my peptides to the DeepLC Streamlit GUI, it returns a table with values in the "predicted_tr" column between 1 and 10.
Thank you for your help!
Hello,
I have noticed that when selecting one of the default models (i.e. --file_model is not provided), the last one in the list is selected as the best one, independently of performance.
Here is a piece of debug log (I have included the model name in the output):
2022-03-14 20:05:12 - DEBUG - For full_hc_hela_hf_psms_aligned model got a performance of: 0.28377379963897975
2022-03-14 20:05:12 - DEBUG - For full_hc_PXD005573_mcp model got a performance of: 0.28786216590159314
2022-03-14 20:05:12 - DEBUG - Model with the best performance got selected: {'/home/vgor/.local/lib/python3.8/site-packages/deeplc/mods/full_hc_PXD005573_mcp_cb975cfdd4105f97efa0b3afffe075cc.hdf5': '/home/vgor/.local/lib/python3.8/site-packages/deeplc/mods/full_hc_PXD005573_mcp_cb975cfdd4105f97efa0b3afffe075cc.hdf5'}
Although the performance of full_hc_hela_hf_psms_aligned is better, the last one in the list is selected.
Most likely, the error is this line
Line 1194 in 18fb8c3
m_group_name = m_name
Since I am not entirely sure that I understand everything correctly, I prefer to open an issue before a PR.
The issue was observed with versions 0.1.36 and 1.0.0 (the latest in pip)
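The expected behavior can be sketched as picking the minimum-error model explicitly (model names and numbers taken from the debug log above; this is an illustration of the intended selection, not DeepLC's actual code):

```python
# Performance here is a calibration error, so lower is better.
performance = {
    "full_hc_hela_hf_psms_aligned": 0.28377379963897975,
    "full_hc_PXD005573_mcp": 0.28786216590159314,
}

# min over the dict keys, ordered by their error values
best_model = min(performance, key=performance.get)
print(best_model)  # full_hc_hela_hf_psms_aligned
```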
Here is the error I get while trying to run DeepLC:
2024-03-28 17:20:01 - INFO - Using DeepLC version 2.2.27
Traceback (most recent call last):
File "deeplc\gui.py", line 38, in
File "gooey\python_bindings\gooey_decorator.py", line 134, in
File "deeplc\gui.py", line 35, in start_gui
File "deeplc\__main__.py", line 70, in main
File "deeplc\__main__.py", line 134, in run
File "psm_utils\io\__init__.py", line 151, in read_file
File "psm_utils\io\__init__.py", line 112, in _infer_filetype
psm_utils.io.exceptions.PSMUtilsIOException: Could not infer filetype.
Hi Robbin!
DeepLC predictions are slow when the library is not complete (the "--use_library" option). For example, if I make predictions for 2000 sequences and 10 of them are not present in the library, the prediction takes much more time than predicting all 2000 sequences without "--use_library". We have analysed your code and are fairly sure that the multiprocessing used in the "do_f_extraction_pd_parallel" method is the bottleneck: creating the child processes takes a long time because the self.library object is too big to copy quickly. I tried a quick fix: adding the line "del self.library" in deeplc.py just before the comment "# If we need to apply deep NN", and adding
self.library = {}
if self.use_library:
    self.read_library()
right before the comment "# If we need to calibrate".
This fix makes things much better, but it is still not optimal, since the library will be reloaded with every call of the "make_preds_core" function. I don't have an idea for an optimal change right now, which is why I haven't prepared a pull request.
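One way to express that idea without reloading the library each time (a sketch of my own, assuming the attribute is only needed in the parent process while the workers run):

```python
from contextlib import contextmanager

@contextmanager
def without_attr(obj, name):
    """Temporarily detach a large attribute (e.g. self.library) so it is
    not copied into each forked worker process, then restore it afterwards
    instead of re-reading it from disk."""
    saved = getattr(obj, name)
    setattr(obj, name, None)
    try:
        yield
    finally:
        setattr(obj, name, saved)

# Usage sketch around the pool creation in do_f_extraction_pd_parallel:
# with without_attr(self, "library"):
#     pool = multiprocessing.Pool(self.n_jobs)
#     ...
```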
Regards,
Mark
Hello,
I tried to use DeepLC 0.1.17 for predicting retention times for some peptides in a CSV file. I also provided a calibration file to the software. When looking at the console output, I could see some strange errors like:
ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.3977001953125,0.795400390625
I am not really sure what this means. Does it mean that some peptides in the calibration file are not included because they have invalid RTs? Or does the program not use the calibration file at all? I did not encounter this issue before (but then I provided iRT values and not RTs in minutes for the calibration).
I would be happy if you could help me understand this issue a little better (see attached console output).
Thank you in advance. :-)
Best,
Marc
(venv) marc@supercomputer:~/PythonProject/deeplc$ deeplc --file_pred /mnt/d/Data_MS2PIP/MSLibrarian_1/precursors_rt.csv --file_cal /mnt/d/Data_MS2PIP/MSLibrarian_1/precursors_rt_calib.csv --n_threads 8
2020-08-27 10:26:10.855515: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-08-27 10:26:10.855561: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-08-27 10:26:11 - INFO - Using DeepLC version 0.1.17
2020-08-27 10:26:12 - INFO - Selecting best model and calibrating predictions...
2020-08-27 10:26:12.706689: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-08-27 10:26:12.706736: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2020-08-27 10:26:12.706751: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (supercomputer): /proc/driver/nvidia/version does not exist
2020-08-27 10:26:12.706966: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-08-27 10:26:12.723027: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3311995000 Hz
2020-08-27 10:26:12.726908: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4cf0b40 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-27 10:26:12.727027: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
1/1 [==============================] - 0s 288us/step
2020-08-27 10:26:13 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.0,0.3977001953125
2020-08-27 10:26:13 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.3977001953125,0.795400390625
2020-08-27 10:26:13 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.795400390625,1.1931005859375001
2020-08-27 10:26:13 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.1931005859375001,1.59080078125
2020-08-27 10:26:13 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.59080078125,1.9885009765625
1/1 [==============================] - 0s 317us/step
2020-08-27 10:26:16 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.0,0.4063890075683594
2020-08-27 10:26:16 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.4063890075683594,0.8127780151367188
2020-08-27 10:26:16 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.8127780151367188,1.2191670227050782
2020-08-27 10:26:16 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.2191670227050782,1.6255560302734375
2020-08-27 10:26:16 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.6255560302734375,2.0319450378417967
1/1 [==============================] - 0s 252us/step
2020-08-27 10:26:18 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.0,0.3999577331542969
2020-08-27 10:26:18 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.3999577331542969,0.7999154663085938
2020-08-27 10:26:18 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.7999154663085938,1.1998731994628906
2020-08-27 10:26:18 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.1998731994628906,1.5998309326171876
2020-08-27 10:26:18 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.5998309326171876,1.9997886657714845
1/1 [==============================] - 0s 257us/step
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.0,0.38717674255371093
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.38717674255371093,0.7743534851074219
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 0.7743534851074219,1.1615302276611328
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.1615302276611328,1.5487069702148437
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.5487069702148437,1.9358837127685546
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 1.9358837127685546,2.3230604553222656
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 2.3230604553222656,2.7102371978759763
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 2.7102371978759763,3.0974139404296874
2020-08-27 10:26:21 - ERROR - Skipping calibration step, due to no points in the predicted range (are you sure about the split size?): 3.0974139404296874,3.4845906829833986
2020-08-27 10:26:22 - INFO - Making predictions using model: {'/home/marc/PythonProject/deeplc/venv/lib/python3.6/site-packages/deeplc/mods/full_hc_PXD005573_mcp_cb975cfdd4105f97efa0b3afffe075cc.hdf5': '/home/marc/PythonProject/deeplc/venv/lib/python3.6/site-packages/deeplc/mods/full_hc_PXD005573_mcp_cb975cfdd4105f97efa0b3afffe075cc.hdf5'}
Hi,
I just wonder if there's a plan to distribute a Docker image of DeepLC later? By the way, I think this tool performs really well, given the benchmarks I have done comparing predicted vs. experimental RTs and iRTs.
Best,
Marc
Hello,
First of all, thanks for outstanding software for RT prediction!
I'm actively using DeepLC in my workflow, and it has now become a bottleneck in the total analysis time. So I'm thinking of switching from CPU-based DeepLC calculations to GPU. But I want to estimate the effect of this change before buying a powerful video card for the server. Have you done any comparison of CPU vs. GPU DeepLC time consumption?
Regards,
Mark
Hi,
I want to use DeepLC to predict retention times for some peptides, but since I am using MaxQuant for the analysis, I don't know how to convert the peptide modifications in its output file to the MS2PIP-style format that DeepLC needs. The output of MaxQuant looks like the following:
ASMGTLAFDEYGRPFLIIK | 19 | Acetyl (Protein N-term),Oxidation (M) | _(Acetyl (Protein N-term))ASM(Oxidation (M))GTLAFDEYGRPFLIIK_
DDDIAALVVDNGSGMCK | 17 | Acetyl (Protein N-term),Oxidation (M) | _(Acetyl (Protein N-term))DDDIAALVVDNGSGM(Oxidation (M))CK_
AMEALATAEQACK | 13 | Oxidation (M) | _AM(Oxidation (M))EALATAEQACK_
MQQQLDEYQELLDIK | 15 | Oxidation (M) | _M(Oxidation (M))QQQLDEYQELLDIK_
Has anyone tried to convert such modification information into the input format required by DeepLC? Is there any software or function that can do it?
Thanks,
LeeLee
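One way to convert the MaxQuant "Modified sequence" column shown above into DeepLC's seq/modifications columns (a sketch of my own, not an official converter; it assumes position 0 denotes the N-terminus, per the peprec convention, and takes the modification name as the part of the label before the first parenthesized specifier):

```python
def maxquant_to_deeplc(modified_seq: str) -> tuple[str, str]:
    """Convert a MaxQuant "Modified sequence" entry, e.g.
    "_AM(Oxidation (M))EALATAEQACK_", into (seq, modifications) for DeepLC.

    Position 0 is used for N-terminal modifications (peprec convention).
    """
    s = modified_seq.strip("_")
    residues, mods = [], []
    i = 0
    while i < len(s):
        if s[i] == "(":
            # scan to the matching parenthesis; labels nest, e.g. "(Oxidation (M))"
            depth, j = 1, i + 1
            while depth:
                depth += {"(": 1, ")": -1}.get(s[j], 0)
                j += 1
            label = s[i + 1 : j - 1]
            name = label.split(" (")[0]  # "Oxidation (M)" -> "Oxidation"
            mods.append(f"{len(residues)}|{name}")
            i = j
        else:
            residues.append(s[i])
            i += 1
    return "".join(residues), ",".join(mods)
```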
Hello, when using version 2.2.3 of the DeepLC GUI, the following problem occurred:
Traceback (most recent call last):
File "deeplc\gui.py", line 38, in
start_gui()
File "gooey\python_bindings\gooey_decorator.py", line 134, in
return lambda *args, **kwargs: func(*args, **kwargs)
File "deeplc\gui.py", line 35, in start_gui
main(gui=True)
File "deeplc\__main__.py", line 70, in main
run(**vars(argu))
File "deeplc\__main__.py", line 181, in run
if len(psm_list_cal) > 0:
UnboundLocalError: local variable 'psm_list_cal' referenced before assignment
And the input file I used is examples/datasets/seqs_exp.csv of the deeplc source code.
Hi again,
I've noticed that the latest versions of DeepLC (2+) no longer use the batch_num argument (or did I read the code wrong?). That leads to GPU memory errors when DeepLC is used to predict RTs for a few million peptides on a weak GPU.
I compare everything to version 1.1.2, which is the most stable and effective for me right now, and works fine for the same set of peptides on the same hardware.
Regards,
Mark
Is it possible to retrain a new model? or once it's calibrated could we save the calibrated models for future use?
When I only read the first 100 lines of the file, it ran smoothly, but when I read all the file data, an error occurred.
This is the traceback:
KeyError: 'Se'
Traceback:
File "/deeplc/deeplc_streamlit.py", line 173, in _run_deeplc
preds = dlc.make_preds(seq_df=config["input_df"], calibrate=calibrate)
File "/usr/local/lib/python3.10/site-packages/deeplc/deeplc.py", line 634, in make_preds
X = self.do_f_extraction_psm_list_parallel(psm_list)
File "/usr/local/lib/python3.10/site-packages/deeplc/deeplc.py", line 443, in do_f_extraction_psm_list_parallel
all_feats = self.do_f_extraction_psm_list(psm_list)
File "/usr/local/lib/python3.10/site-packages/deeplc/deeplc.py", line 409, in do_f_extraction_psm_list
return self.f_extractor.full_feat_extract(psm_list)
File "/usr/local/lib/python3.10/site-packages/deeplc/feat_extractor.py", line 664, in full_feat_extract
X_cnn = self.encode_atoms( # X_sum, X_cnn_pos, X_cnn_count, X_hc
File "/usr/local/lib/python3.10/site-packages/deeplc/feat_extractor.py", line 525, in encode_atoms
matrix_pos[pn, dict_index_pos[atom]] = val
Hi
I want to use DeepLC to predict retention times for some peptides, so I use DeepLC from the GUI.
However, I ran into the problems below.
2023-05-28 22:43:39 - INFO - Using DeepLC version 2.0.4
Traceback (most recent call last):
File "deeplc\gui.py", line 38, in
start_gui()
File "gooey\python_bindings\gooey_decorator.py", line 134, in
return lambda *args, **kwargs: func(*args, **kwargs)
File "deeplc\gui.py", line 35, in start_gui
main(gui=True)
File "deeplc\__main__.py", line 70, in main
run(**vars(argu))
File "deeplc\__main__.py", line 130, in run
list_of_psms.append(PSM(peptidoform=peprec_to_proforma(seq,mod),spectrum_id=ident))
File "psm_utils\io\peptide_record.py", line 398, in peprec_to_proforma
peptide[int(position)] += f"[{label}]"
IndexError: list index out of range
2023-05-28 22:44:05 - INFO - Using DeepLC version 2.0.4
Traceback (most recent call last):
File "deeplc\gui.py", line 38, in
start_gui()
File "gooey\python_bindings\gooey_decorator.py", line 134, in
return lambda *args, **kwargs: func(*args, **kwargs)
File "deeplc\gui.py", line 35, in start_gui
main(gui=True)
File "deeplc\__main__.py", line 70, in main
run(**vars(argu))
File "deeplc\__main__.py", line 134, in run
psm_list_pred = read_file(file_pred)
File "psm_utils\io\__init__.py", line 125, in read_file
filetype = _infer_filetype(filename)
File "psm_utils\io\__init__.py", line 86, in _infer_filetype
raise PSMUtilsIOException("Could not infer filetype.")
psm_utils.io.exceptions.PSMUtilsIOException: Could not infer filetype.
(The same "Could not infer filetype" traceback was raised again on three further attempts, at 22:48:35, 22:49:11, and 23:19:30.)
My CSV file is like below.
sequence_information.csv
seq,modifications
KPVGAAK,
KPAVQK,
KPLQGK,
VPKQAK,
QVAPKK,
GQLPKK,
AAGVPKK,
PVVLDK,
VPVIDK,
QPKIGK,
KPAAAAGAK,
KPQVNAK,
KPSPEVK,
KPMLPAK,
EPVIAQK,
APLMPKK,
PLDLGAAK,
PKPPAFK,
LPLDQAK,
PKQGINK,
SIHHAR,
SLHAHR,
ISEPFK,
LSIMEK,
LSMIEK,
SLEIMK,
SLEPFK,
SLLEMK,
SLMLEK,
ISFPEK,
KPMLPAK,3|Oxidation
LVWPSAK,
QSVELPK,
GVIEPSAK,
PSLTGLGR,
LGDLGLGR,
AEVVGAVR,
WVLVQR,
APPPPPPK,
DTLINPK,
GQQIGK,
QAGGIGK,
QGQIGK,
QNALGK,
GIVWR,
GLVGER,
GVIDAR,
LGVEGR,
LVGWR,
VGLWR,
QMIQQYEMYCK,10|Carbamidomethyl
TEEMPNDSVLENK,4|Oxidation
KPATAAGTK,
AIHLNK,
....
How can I use DeepLC to predict retention times?