anton-bushuiev / ppiformer
Learning to design protein-protein interactions with enhanced generalization (ICLR24)
Home Page: https://arxiv.org/abs/2310.18515
License: MIT License
Hello.
Thanks for this awesome work (along with PPIRef).
I just tried PPIformer on one of my structures: I ran an alanine scan, and most of the predictions make sense.
There are two alanine mutations away from the interface where the affinity actually increases, but PPIformer predicts a slight destabilization. This is not unexpected, as these mutations possibly act by stabilizing the unbound structure, an effect that is hard for an interface-based model to capture.
What is unintuitive is a pair of neighboring residues, both asparagines: one forms a hydrogen bond with the other chain, while the other faces the solvent and makes no interactions. Yet PPIformer predicts the mutation of the second asparagine (the one making no contacts with the partner) to alanine to be more destabilizing than that of the first.
What could be the reason for such behavior?
Unfortunately, I can't share this structure but I will try to reproduce this on other structures that I can share.
Best,
Amin.
Hi, Congrats on your excellent work!
I am trying to pretrain my model on PPIRef50K. However, the paper does not seem to describe the data-splitting strategy. How do you split PPIRef50K for pretraining, run validation, and select the pretraining checkpoint? Also, given several protein-protein interfaces, how do you construct batches for training? Specifically, do you crop interfaces into patches and pad them as in RDE? I would really appreciate your help. Thank you!
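For context, here is the kind of patch cropping and batching I have in mind. This is only my own sketch, not your code: the functions crop_to_patch and collate, and the 128-residue patch size, are my assumptions about an RDE-style pipeline.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def crop_to_patch(coords, interface_idx, patch_size=128):
    """Keep the patch_size residues closest to the interface center.
    coords: (N, 3) CA coordinates; interface_idx: indices of interface residues."""
    center = coords[interface_idx].mean(dim=0)
    dist = (coords - center).norm(dim=-1)
    keep = dist.argsort()[:patch_size]
    return coords[keep.sort().values]  # restore original residue order

def collate(patches):
    """Pad variable-length patches into one batch with a validity mask."""
    lengths = torch.tensor([p.shape[0] for p in patches])
    batch = pad_sequence(patches, batch_first=True)  # (B, L_max, 3)
    mask = torch.arange(batch.shape[1])[None, :] < lengths[:, None]
    return batch, mask
```

Is this roughly what PPIformer does, or do you batch whole interfaces without cropping?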
Regards,
Ralph
Dear Authors,
I am evaluating a fine-tuned version of ESM-IF against the base model on the SKEMPI Dataset. I was getting quite bad results with the base model so I decided to compare with your results on the test set to see if they matched, but my results are much worse.
Here is my code, score_sequence_in_complex is from the github, ll means log likelihood:
lls = []
wt_lls = []
for mutation_info, mut_idx in mutation_info_and_idx:
    wt_res, mutated_chain, mutant_res = mutation_info
    mut_chain_seq = row[mutated_chain]
    wt_chain_seq = mut_chain_seq[:mut_idx] + wt_res + mut_chain_seq[mut_idx + 1:]
    ll, _ = score_sequence_in_complex(model, alphabet, all_coords, mutated_chain, mut_chain_seq)
    wt_ll, _ = score_sequence_in_complex(model, alphabet, all_coords, mutated_chain, wt_chain_seq)
    wt_lls.append(wt_ll)
    lls.append(ll)
avg_ll = np.average(lls)
avg_wt_ll = np.average(wt_lls)
avg_ll -= avg_wt_ll
I calculate correlations of the log likelihoods against float(row['wt_affinity'] - row['affinity']), corresponding to the respective columns in the SKEMPI v2 CSV. I have tried taking the log as well. You mention ddG for SKEMPI, but I only see affinity values in the form of K values, so I assume that is what you mean. I have also tried subtracting the wild-type from the mutant log likelihood per mutation and then averaging, with similar results.
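For concreteness, this is how I would convert the K values to ddG. This is just a sketch under my own assumptions: that the affinities are dissociation constants in molar units and that T = 298 K.

```python
import numpy as np

R = 1.987e-3  # gas constant in kcal/(mol*K)
T = 298.0     # assumed temperature in Kelvin

def ddg_from_kd(kd_wt, kd_mut, temperature=T):
    """ddG = dG_mut - dG_wt = R * T * ln(Kd_mut / Kd_wt).
    Positive values mean the mutation weakens binding (destabilizing)."""
    return R * temperature * np.log(kd_mut / kd_wt)

# A 10x weaker binder (larger Kd) comes out at about +1.36 kcal/mol.
print(ddg_from_kd(1e-9, 1e-8))
```

Is this the conversion you used, or did you correlate against the raw affinities?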
For reference, I load the structure like this, also using the methods from the esm GitHub:
structure = esm.inverse_folding.util.load_structure(os.path.join(args.pdb_dir,pdb_path), list(chains))
all_coords, _ = extract_coords_from_complex(structure)
Here are my results:
{'barnase': {'r': -0.0038375418409582057, 'rho': 0.09931680817890196, 'auroc': 0.7229199372056514},
'e6': {'r': 0.3046623623188521, 'rho': 0.23827270137182446, 'auroc': 0.6396396396396395},
'h3': {'r': -0.009171801888835096, 'rho': 0.1337611181017528, 'auroc': 0.7232142857142857},
'c3d': {'r': 0.3005179144816237, 'rho': 0.3666666666666667, 'auroc': 0.8},
'thermophilum': {'r': -0.0365999651406376, 'rho': -0.054879068998719374, 'auroc': 0.48000000000000004}}
Avg Spearman Rho: 0.150286
Avg Pearson R: 0.096758
Avg AUROC: 0.68039
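The metrics above are computed roughly like this. A sketch using scipy and scikit-learn; I treat ddG > 0 as the destabilizing class for the AUROC, and assume the predicted scores are oriented like ddG (higher = more destabilizing).

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import roc_auc_score

def evaluate(pred_ddg, exp_ddg):
    """Per-complex Pearson r, Spearman rho, and AUROC for ddG > 0."""
    pred_ddg, exp_ddg = np.asarray(pred_ddg), np.asarray(exp_ddg)
    r, _ = pearsonr(pred_ddg, exp_ddg)
    rho, _ = spearmanr(pred_ddg, exp_ddg)
    auroc = roc_auc_score(exp_ddg > 0, pred_ddg)
    return r, rho, auroc
```

Please let me know if your evaluation differs, e.g. in the sign convention or the AUROC threshold.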
Would you please share your evaluation code so I can see what I am doing wrong? That would probably be easier than debugging my setup from a description. I hope that is not a problem, given this is just an evaluation of a baseline.
Best,
Talal