conglabcode / dpam
A domain parser for Alphafold models
Hi,
I get the following error:
rm: cannot remove 'sw5b_step24.log': No such file or directory
Traceback (most recent call last):
File "/opt/DPAM/scripts/step22_merge_domains.py", line 28, in
fp = open('step21_' + dataset + '.result', 'r')
FileNotFoundError: [Errno 2] No such file or directory: 'step21_sw5b.result'
Error in step22
But it seems step 21 had already failed, since its output file is missing.
I call dpam using:
apptainer exec --fakeroot -e --bind ${DB_dir}:/mnt/databases:ro \
  --bind ${Input_dir}:${ContainerOutput_dir}:rw \
  /opt/kgtools/images/dpam/latest/dpam.sif \
  /bin/bash -c "cd ${ContainerOutput_dir}; run_dpam.py ${Dataset} ${Threads} --log_file ${log_file}" 2>&1 | tee ${log_file}
The "--log_file ${log_file}" option does not work as expected: it produces many log files, but all of them only say "done", except for step 4 (step4_sw5b_75tcm94x.log). Since the program continues anyway, I added 2>&1 | tee ${log_file} to capture more output.
I could send that log, but it is quite large (35 MB).
Do you have any suggestions, or do you need more input? I will be happy to provide it.
Regards,
Raymond
Hi CongLab!,
I am currently trying to use your tool.
I first tried it on a test directory with 15 proteins. I provided the JSON and PDB files as needed, and it worked as it should.
Now I am trying to run it on a very large input of 20,200 proteins, and I keep getting stuck at step 3.
Is there a limit on the input size, or are there any other issues with running DPAM on such a large set?
I keep running into this error (taken from the Docker log file):
"Traceback (most recent call last):
File "/opt/DPAM/scripts/run_step3.py", line 100, in
time.sleep(1)
NameError: name 'time' is not defined"
Before this error appears, steps 1 and 2 complete as they should: I get their log files, and they say "done".
Thank you!
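The traceback above shows a `time.sleep(1)` call in a script where the name `time` was never imported, so the most likely fix is adding `import time` at the top of `run_step3.py`. The sketch below only illustrates that pattern. It assumes the loop near line 100 polls until worker jobs finish; the function `wait_until_done` and its parameters are hypothetical and not taken from DPAM.

```python
import time  # the missing import; without it, time.sleep raises NameError


def wait_until_done(is_done, poll_seconds=1.0, timeout_seconds=60.0):
    """Poll is_done() until it returns True or the timeout expires.

    Hypothetical helper for illustration; not part of DPAM.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if is_done():
            return True
        # Sleep between checks so the loop does not spin on the CPU.
        time.sleep(poll_seconds)
    return False
```

If the sleep branch is only reached when jobs are still running, that could explain why a 15-protein test never triggers the error while a 20,200-protein run does, though this is a guess from the traceback alone.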
Hi,
I was excited that the preprocessed data for the model organisms had been added, but when I opened the files I realized that they have no annotation. Could you add which kind of domain each region hit? As it stands, we need to parse and search again to see which other domains the regions are similar to.
Thank you!
Hi,
I got an error when testing DPAM.py with the AF-E7MCA2-F1 model from the AF2 database. I'm not sure how to fix it. Could you help with this error? Thanks!
Hi,
I noticed that you updated DPAM in your recently published PNAS paper (https://www.pnas.org/doi/10.1073/pnas.2214069120).
Could you update the scripts here to the PNAS version? Thanks for your time.
Best.
I think this script is missing from GitHub:
FileNotFoundError: [Errno 2] No such file or directory: '/home/hunter/projects/homology/DPAM/pdb2fasta ranked_0.pdb > ranked_0.fa'
Dear DPAM,
Any chance you have a docker? If not, can you consider making one?
Thanks!
Hi, I just tested DPAM on my local Linux server using the AF-A0A0K2WPR7-F1-model_v4 protein model from Caenorhabditis elegans. It seems to have run fine, but some warnings appeared.
The log file is attached here:
log.txt
The final number of domains found was 4, but the result in the model_organisms folder has 3.
4 domains:
D1 31-65
D2 141-165
D3 166-260
D4 296-320
3 domains:
D1 6-65
D2 156-260
D3 261-335
Did the tool run correctly? Should I fix these warnings? I am also not sure whether the difference between the two results was caused by a database update or by something else.
Could you provide some demo AF2 models and results for the test?
Many thanks!
Hi,
Looks like a useful package! I am wondering if it is possible to just extract "domain-like" regions of a protein, without any mapping to ECOD, etc., just stating that residues X to Y form a possible folded domain.
Thank you!
Hi CongLab! I ran into an error that says "pdb2fasta: Permission denied".
How can I deal with this?
Thanks!
~/test/test_DPAM/DPAM$ python DPAM.py AF-Q9UQB3-F1-model_v4.pdb AF-Q9UQB3-F1-predicted_aligned_error_v4.json AF-Q9UQB3-F1-model_v4 /home/jupyter/test/test_DPAM/DPAM/ 16 /home/jupyter/script/DPAM/database/
cp: 'AF-Q9UQB3-F1-model_v4.pdb' and '/home/jupyter/test/test_DPAM/DPAM//AF-Q9UQB3-F1-model_v4.pdb' are the same file
start input processing 2023-11-08 02:44:41.288828
sh: 1: /home/jupyter/test/test_DPAM/DPAM/pdb2fasta: Permission denied
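"Permission denied" from `sh` when it launches a helper binary usually means the file lacks its execute bit, which can happen after downloading or unzipping a repository. Running `chmod +x` on the file is the usual remedy. The sketch below does the same from Python, assuming the `pdb2fasta` file itself is intact and built for the current platform; the helper name `make_executable` is illustrative and not part of DPAM.

```python
import os
import stat


def make_executable(path):
    """Set the execute bit for every class (user/group/other) that can read the file."""
    mode = os.stat(path).st_mode
    # Shift each read bit (0o444 mask) two places right to get the matching execute bit.
    read_bits = mode & (stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    os.chmod(path, mode | (read_bits >> 2))
    # Report whether the current user can now execute the file.
    return os.access(path, os.X_OK)
```

A call such as `make_executable("/home/jupyter/test/test_DPAM/DPAM/pdb2fasta")` would target the path from the log above. Other bundled binaries may need the same treatment.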
Hi,
Is it possible to parse domains without PAE JSON files?
Thanks
Hello, I have been trying to run DPAM locally with the updated scripts, and I compared some results to what you had uploaded in this repository. In most cases, the local run gave different domain boundaries. In some cases, entire domains were different. For example, for O00592, my local run produces:
D1 376-434
But this repository has:
D1 1-15,456-495
D2 357-434
Is there a reason to expect these differences? And which should we believe in case of these differences?
Hi,
Thanks for the work you've put into this - it looks really useful!
I've been looking at extracting linker regions between domains using DPAM, and I noticed something a little odd. If you plot the length of each linker region (i.e., the stretch between the end of one domain and the start of the next), the linker lengths look a lot like multiples of 5 (I am only showing linkers up to 100 amino acids long).
It looks suspiciously like an artefact. Is there any way that multiples of 5 snuck into the domain calling somehow? I'm using the pre-generated Homo sapiens DPAM data from this repository, and extracting the domain starts/ends into a big table, which I then transform.
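The check described above can be sketched as follows. This is a minimal illustration, assuming domain lines in the `D1 31-65` or `D1 1-15,456-495` form quoted elsewhere on this page; the function names are hypothetical, and the sample input is the four-domain C. elegans result from an earlier issue, not the Homo sapiens data.

```python
from collections import Counter


def parse_domain_line(line):
    """Parse 'D1 1-15,456-495' into ('D1', [(1, 15), (456, 495)])."""
    name, ranges = line.split()
    segments = []
    for part in ranges.split(","):
        start, end = part.split("-")
        segments.append((int(start), int(end)))
    return name, segments


def linker_lengths(domain_lines):
    """Lengths of non-empty gaps between consecutive segments of different domains."""
    segments = []
    for line in domain_lines:
        name, segs = parse_domain_line(line)
        segments.extend((start, end, name) for start, end in segs)
    segments.sort()
    lengths = []
    for (_, prev_end, prev_name), (next_start, _, next_name) in zip(segments, segments[1:]):
        gap = next_start - prev_end - 1  # residues strictly between the two segments
        if prev_name != next_name and gap > 0:
            lengths.append(gap)
    return lengths


domains = ["D1 31-65", "D2 141-165", "D3 166-260", "D4 296-320"]
lengths = linker_lengths(domains)
remainders = Counter(length % 5 for length in lengths)
print(lengths, dict(remainders))  # [75, 35] {0: 2}
```

If most remainders across a whole proteome are 0, the boundaries themselves are probably snapped to a 5-residue grid, which would point to the domain calling rather than to the downstream table transformation.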
Right now the code reads:
if basedir[0] != '/': basedir = os.getcwd() + basedir
It should read:
if basedir[0] != '/': basedir = os.getcwd() + '/' + basedir
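An alternative to concatenating the separator by hand is to let `os.path` build the path. The sketch below shows that variant; the function name `resolve_basedir` is hypothetical, and only the variable name `basedir` comes from the snippet above.

```python
import os


def resolve_basedir(basedir):
    """Return basedir unchanged if absolute, else resolve it against the current directory."""
    if os.path.isabs(basedir):
        return basedir
    # os.path.join inserts the separator, avoiding results like '/home/userout'.
    return os.path.join(os.getcwd(), basedir)
```

This also avoids an IndexError on an empty string, which `basedir[0]` would raise.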
Hi,
First of all, I would like to thank the author for creating a great tool.
The problem I am having is very simple. Currently, DPAM reads predicted aligned errors from JSON (for the AF2 database and ColabFold), but AlphaFold2 run locally stores this information in .pkl files.
Below is an example of the output from a local AlphaFold2 (ver. 2.2.4) run.
features.pkl relaxed_model_1_pred_0.pdb result_model_4_pred_0.pkl
msas relaxed_model_2_pred_0.pdb result_model_5_pred_0.pkl
ranked_0.pdb relaxed_model_3_pred_0.pdb timings.json
ranked_1.pdb relaxed_model_4_pred_0.pdb unrelaxed_model_1_pred_0.pdb
ranked_2.pdb relaxed_model_5_pred_0.pdb unrelaxed_model_2_pred_0.pdb
ranked_3.pdb result_model_1_pred_0.pkl unrelaxed_model_3_pred_0.pdb
ranked_4.pdb result_model_2_pred_0.pkl unrelaxed_model_4_pred_0.pdb
ranking_debug.json result_model_3_pred_0.pkl unrelaxed_model_5_pred_0.pdb
Do you have any plans to support pkl files in the future?
Best,
Keigo
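Until pkl input is supported, one workaround is to convert the pickle to JSON before running DPAM. The sketch below makes several assumptions: that the `result_model_*.pkl` file contains a `predicted_aligned_error` matrix (AlphaFold only writes it for pTM and multimer model presets), that the output layout modeled on the AlphaFold database download (a one-element list holding a dict) is one DPAM accepts, and that numpy is installed so the pickle can be loaded. The function name `pae_pickle_to_json` is illustrative.

```python
import json
import pickle


def pae_pickle_to_json(pkl_path, json_path):
    """Write the PAE matrix from an AlphaFold result pickle to a JSON file."""
    # Only unpickle files you produced yourself; pickle can execute arbitrary code.
    with open(pkl_path, "rb") as fh:
        result = pickle.load(fh)
    pae = result["predicted_aligned_error"]
    # numpy arrays expose tolist(); plain nested lists are used as they are.
    rows = pae.tolist() if hasattr(pae, "tolist") else pae
    record = {"predicted_aligned_error": [[float(v) for v in row] for row in rows]}
    if "max_predicted_aligned_error" in result:
        record["max_predicted_aligned_error"] = float(result["max_predicted_aligned_error"])
    # Assumed layout: a list with a single record, as in AlphaFold database PAE files.
    with open(json_path, "w") as fh:
        json.dump([record], fh)
    return record
```

For the listing above, the pickle that matches `ranked_0.pdb` is the model named first in the `order` list of `ranking_debug.json`. If DPAM expects different key names, only the `record` dictionary needs to change.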