dpam's Issues

step 21 fails

hi,
I get the following error:
rm: cannot remove 'sw5b_step24.log': No such file or directory
Traceback (most recent call last):
File "/opt/DPAM/scripts/step22_merge_domains.py", line 28, in
fp = open('step21_' + dataset + '.result', 'r')
FileNotFoundError: [Errno 2] No such file or directory: 'step21_sw5b.result'
Error in step22
But it seems step 21 already failed, since its output file is missing.

I call dpam using:
apptainer exec --fakeroot -e --bind ${DB_dir}:/mnt/databases:ro \
    --bind ${Input_dir}:${ContainerOutput_dir}:rw \
    /opt/kgtools/images/dpam/latest/dpam.sif \
    /bin/bash -c "cd ${ContainerOutput_dir}; run_dpam.py ${Dataset} ${Threads} --log_file ${log_file}" 2>&1 | tee ${log_file}

The "--log_file ${log_file}" option does not seem to work: it produces many log files, all of which say "done" except for step 4 (step4_sw5b_75tcm94x.log). Since the program continues anyway, I added "2>&1 | tee ${log_file}" to capture more data. I could send the log, but it is quite large (35 MB).

Do you have any suggestions, or do you need more input? I will be happy to provide it.
regards
Raymond

Running DPAM on a very large input - stuck on step 3

Hi CongLab!
I am currently trying to use your tool.
At first I tried it on a test directory with 15 proteins; I provided the JSON and PDB files as needed and it worked as it should.
Now I am trying to run it on a very large input of 20,200 proteins, and I keep getting stuck at step 3.
Is there a limit to the input size, or are there other issues with running DPAM on such a large set?
I keep running into this error (taken from the Docker log file):
Traceback (most recent call last):
File "/opt/DPAM/scripts/run_step3.py", line 100, in
time.sleep(1)
NameError: name 'time' is not defined
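The traceback points at a missing import in run_step3.py rather than anything in the input data. A minimal illustration of the error and the one-line fix (an assumption: the script simply never imports the module):

```python
# The NameError occurs because run_step3.py calls time.sleep() without
# importing the module; adding this import at the top of
# /opt/DPAM/scripts/run_step3.py should clear that particular error.
import time  # the apparently missing line

time.sleep(1)  # with the import in place, this call no longer raises
```

Note this only removes the NameError itself; whatever condition made the script reach that line may still need attention.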

and before getting this error, I get:

  • 15:57:38.177 INFO: Scoring 20000 HMMs using HMM-HMM Viterbi alignment
  • 16:01:17.523 INFO: Set premerge to 0! (premerge: 3 iteration: 2 hits.Size: 21829)
  • 16:58:11.696 WARNING: database contains sequences that exceed maximum allowed size (maxres = 20001). Max sequence length can be increased with parameter -maxres.
  • 15:57:38.176 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment) : 20000
  • 15:57:31.306 WARNING: Number of hits passing 2nd prefilter (reduced from 106056 to allowed maximum of 20000). You can increase the allowed maximum using the -maxfilt option.

Steps 1 and 2 completed as they should; I got the log files saying "done".

Thank you!

Domain annotations?

Hi,

I was excited that the preprocessed data for the model organisms had been added, but when I opened it, I realized that it has no annotations. Could you add which kind of domain was hit by each region? As it is, we need to parse and search again to see which domains each region is similar to.

Thank you!

Missing script pdb2fasta

I think this script is missing from GitHub:

FileNotFoundError: [Errno 2] No such file or directory: '/home/hunter/projects/homology/DPAM/pdb2fasta ranked_0.pdb > ranked_0.fa'
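In case it helps while the script is missing, here is a rough stand-in based only on the name and the command line in the error (read a PDB file, print FASTA); the authors' actual pdb2fasta may behave differently:

```python
#!/usr/bin/env python3
# Hypothetical replacement for the missing pdb2fasta helper: extracts the
# sequence from CA ATOM records of a PDB file and prints it as FASTA.
# Only standard residues are handled; others become 'X'.
import sys

THREE_TO_ONE = {
    'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C',
    'GLN': 'Q', 'GLU': 'E', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I',
    'LEU': 'L', 'LYS': 'K', 'MET': 'M', 'PHE': 'F', 'PRO': 'P',
    'SER': 'S', 'THR': 'T', 'TRP': 'W', 'TYR': 'Y', 'VAL': 'V',
}

def pdb_to_fasta(pdb_path):
    """Return the one-letter sequence read from CA ATOM records."""
    seq = []
    with open(pdb_path) as fh:
        for line in fh:
            # PDB fixed columns: atom name in cols 13-16, residue name in 18-20
            if line.startswith('ATOM') and line[12:16].strip() == 'CA':
                seq.append(THREE_TO_ONE.get(line[17:20].strip(), 'X'))
    return ''.join(seq)

if __name__ == '__main__' and len(sys.argv) > 1:
    print('>' + sys.argv[1])
    print(pdb_to_fasta(sys.argv[1]))
```

Saved as an executable file named pdb2fasta in the DPAM directory, it can be invoked the same way as in the error message.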

Dockerized DPAM

Dear DPAM,

Any chance you have a Docker image? If not, could you consider making one?

Thanks!

Demo cases?

Hi, I just tested DPAM on my local Linux server using the AF-A0A0K2WPR7-F1-model_v4 protein model from Caenorhabditis elegans. It seems to have executed well, but some warnings appeared.

    • 13:41:05.697 WARNING: Number of hits passing 2nd prefilter (reduced from 70733 to allowed maximum of 20000). You can increase the allowed maximum using the -maxfilt option.
    • 13:47:09.756 WARNING: Number of hits passing 2nd prefilter (reduced from 87397 to allowed maximum of 20000). You can increase the allowed maximum using the -maxfilt option.
    • 13:49:14.511 WARNING: Input alignment A0A0K2WPR7.hmm looks like aligned FASTA instead of A2M/A3M format. Consider using '-M first' or '-M 50'
    • cat: /home/liuhongbin/soft/DPAM-main/output_dir2/iterativeDali_A0A0K2WPR7/A0A0K2WPR7_*_hits: No such file or directory

The log file is attached here:
log.txt

And the final number of domains found was 4, but in the model_organisms folder it is 3.
4 domains:
D1 31-65
D2 141-165
D3 166-260
D4 296-320
3 domains:
D1 6-65
D2 156-260
D3 261-335

Has the tool run correctly? Should I fix these warnings? I am also not sure whether the difference between the two results is caused by a database update or by something else.
Could you provide some demo AF2 models and results for the test?
Many thanks!

Getting domains without mapping to ECOD

Hi,

Looks like a useful package! I am wondering if it is possible to just extract "domain-like" regions of a protein, without any mapping to ECOD, etc., simply stating that residues X to Y form a possible folded domain.

Thank you!

pdb2fasta: Permission denied

Hi CongLab! I ran into an error that says pdb2fasta: Permission denied.
May I know how to deal with this?
Thanks!

~/test/test_DPAM/DPAM$ python DPAM.py AF-Q9UQB3-F1-model_v4.pdb AF-Q9UQB3-F1-predicted_aligned_error_v4.json AF-Q9UQB3-F1-model_v4 /home/jupyter/test/test_DPAM/DPAM/ 16 /home/jupyter/script/DPAM/database/

cp: 'AF-Q9UQB3-F1-model_v4.pdb' and '/home/jupyter/test/test_DPAM/DPAM//AF-Q9UQB3-F1-model_v4.pdb' are the same file
start input processing 2023-11-08 02:44:41.288828
sh: 1: /home/jupyter/test/test_DPAM/DPAM/pdb2fasta: Permission denied
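The likely fix (an assumption: the execute bit on the bundled pdb2fasta helper was lost when the repository was downloaded or copied) is to restore it with chmod. Sketched below on a scratch file so it can be run anywhere; apply the same chmod +x to your own DPAM/pdb2fasta path:

```shell
# Reproduce and fix the 'Permission denied' state on a scratch stand-in
# for .../test_DPAM/DPAM/pdb2fasta; the actual fix is the chmod +x line.
helper=$(mktemp)          # stand-in for the real pdb2fasta path
chmod -x "$helper"        # simulate the missing execute bit
chmod +x "$helper"        # the fix: chmod +x /path/to/DPAM/pdb2fasta
test -x "$helper" && echo "pdb2fasta would be executable again"
rm -f "$helper"
```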

DPAM results are different between local runs and the values in the "Model organism" folder

Hello, I have been trying to run DPAM locally with the updated scripts, and I compared some results to what you have uploaded in this repository. In most cases it gave different domain boundaries, and in some cases entire domains were different. For example, for O00592, my local run produces

D1	376-434

But this repository has:

D1	1-15,456-495
D2	357-434

Is there a reason to expect these differences? And which result should we believe when they differ?

Linker regions between domains

Hi,

Thanks for the work you've put into this - it looks really useful!

I've been looking at extracting linker regions between domains using DPAM, and I noticed something a little odd. If you plot the lengths of the linker regions (i.e., between the end of one domain and the start of the next), it looks a lot like the linker lengths are multiples of 5 (showing only linkers up to 100 amino acids long).

[image: plot of linker-region lengths]

It looks suspiciously like an artefact. Is there any way that multiples of 5 snuck into the domain calling somehow? I'm using the pre-generated Homo sapiens DPAM data from this repository, extracting the domain starts/ends into a big table, which I then transform.
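For reference, a minimal sketch of the extraction step described, assuming single-segment "start-end" ranges per domain (discontinuous domains such as "1-15,456-495" would need extra handling):

```python
def linker_lengths(domains):
    """Lengths of gaps between consecutive domains, given 'start-end' strings."""
    # assumes each domain is one contiguous, non-overlapping segment
    bounds = sorted(tuple(map(int, d.split('-'))) for d in domains)
    return [next_start - prev_end - 1
            for (_, prev_end), (next_start, _) in zip(bounds, bounds[1:])]

# e.g. the four domains quoted in the "Demo cases?" issue above:
print(linker_lengths(["31-65", "141-165", "166-260", "296-320"]))  # [75, 0, 35]
```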

Minor issue in DPAM.py

Right now the code reads:

    if basedir[0] != '/': basedir = os.getcwd() + basedir

It should read:

    if basedir[0] != '/': basedir = os.getcwd() + '/' + basedir
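The one-line fix is correct; an equivalent and slightly more defensive form (my suggestion, not code from the repository) is to let os.path.abspath do the join and normalization:

```python
import os

def normalize_basedir(basedir):
    # abspath resolves relative paths against os.getcwd() and also
    # collapses '..' and duplicate separators, covering the '/' case above.
    return os.path.abspath(basedir)
```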

Execution against the local Alphafold2 output

Hi,

First of all, I would like to thank the author for creating a great tool.
The problem I am having is very simple. Currently, DPAM reads predicted aligned errors from JSON (for the AF2 database and ColabFold), but AlphaFold2 run locally stores this information in pkl files.

Below is an example output from Alphafold2 (ver. 2.2.4) running on local.

features.pkl        relaxed_model_1_pred_0.pdb  result_model_4_pred_0.pkl
msas                relaxed_model_2_pred_0.pdb  result_model_5_pred_0.pkl
ranked_0.pdb        relaxed_model_3_pred_0.pdb  timings.json
ranked_1.pdb        relaxed_model_4_pred_0.pdb  unrelaxed_model_1_pred_0.pdb
ranked_2.pdb        relaxed_model_5_pred_0.pdb  unrelaxed_model_2_pred_0.pdb
ranked_3.pdb        result_model_1_pred_0.pkl   unrelaxed_model_3_pred_0.pdb
ranked_4.pdb        result_model_2_pred_0.pkl   unrelaxed_model_4_pred_0.pdb
ranking_debug.json  result_model_3_pred_0.pkl   unrelaxed_model_5_pred_0.pdb

Do you have any plans to support pkl files in the future?
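Until pkl input is supported, one workaround is to convert the pickle to the AFDB-style PAE JSON that DPAM already reads. A sketch, assuming the result pickle exposes the matrix under the key 'predicted_aligned_error' (true for pTM-model AF2 runs; other configurations may differ):

```python
import json
import pickle

def pkl_pae_to_json(pkl_path, json_path):
    """Convert a local-AlphaFold result pickle to AFDB-style PAE JSON."""
    with open(pkl_path, 'rb') as fh:
        result = pickle.load(fh)
    pae = result['predicted_aligned_error']  # assumed key, see note above
    # numpy arrays expose .tolist(); plain lists pass through unchanged
    rows = pae.tolist() if hasattr(pae, 'tolist') else pae
    payload = [{
        'predicted_aligned_error': rows,
        'max_predicted_aligned_error': max(max(r) for r in rows),
    }]
    with open(json_path, 'w') as fh:
        json.dump(payload, fh)
```

For the listing above, something like pkl_pae_to_json('result_model_1_pred_0.pkl', 'query_pae.json') (output name hypothetical) should produce a file in the shape DPAM expects from the AF2 database.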

Best,

Keigo
