anton-bushuiev / ppiformer
Learning to design protein-protein interactions with enhanced generalization (ICLR24)
Home Page: https://arxiv.org/abs/2310.18515
License: MIT License
Hello.
Thanks for this awesome work (along with PPIRef).
I just tried PPIformer on one of my structures: I ran an alanine scan, and most of the predictions make sense.
There are two alanine mutations away from the interface where the affinity actually increases, but PPIformer predicts a slight destabilization. This is not unexpected, as these mutations possibly act by stabilizing the unbound structure, an effect that is hard for an interface-based model to capture.
What is unintuitive is a pair of neighboring residues, both asparagines: one forms a hydrogen bond with the other chain, while the other faces the solvent and makes no interactions. Yet PPIformer predicts the mutation of the second asparagine (the one making no contacts with the partner) to alanine to be more destabilizing than that of the first.
What could be the reason for such behavior?
Unfortunately, I can't share this structure but I will try to reproduce this on other structures that I can share.
Best,
Amin.
Hi, Congrats on your excellent work!
I am trying to pretrain my model on PPIRef50K. However, the paper does not seem to describe the data-splitting strategy. How do you split PPIRef50K for pretraining, run validation, and select the pretraining checkpoint? Also, given several protein-protein interfaces, how do you construct batches for training? Specifically, do you crop interfaces into patches and pad them as in RDE? I would really appreciate your help. Thank you!
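For context, here is the kind of patch cropping and batching I have in mind. This is only my own sketch, not your code: the functions crop_to_patch and collate, and the 128-residue patch size, are my assumptions about an RDE-style pipeline.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def crop_to_patch(coords, interface_idx, patch_size=128):
    """Keep the patch_size residues closest to the interface center.
    coords: (N, 3) CA coordinates; interface_idx: indices of interface residues."""
    center = coords[interface_idx].mean(dim=0)
    dist = (coords - center).norm(dim=-1)
    keep = dist.argsort()[:patch_size]
    return coords[keep.sort().values]  # restore original residue order

def collate(patches):
    """Pad variable-length patches into one batch with a validity mask."""
    lengths = torch.tensor([p.shape[0] for p in patches])
    batch = pad_sequence(patches, batch_first=True)  # (B, L_max, 3)
    mask = torch.arange(batch.shape[1])[None, :] < lengths[:, None]
    return batch, mask
```

Is this roughly what PPIformer does, or do you batch whole interfaces without cropping?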
Regards,
Ralph
Dear Authors,
I am evaluating a fine-tuned version of ESM-IF against the base model on the SKEMPI Dataset. I was getting quite bad results with the base model so I decided to compare with your results on the test set to see if they matched, but my results are much worse.
Here is my code, score_sequence_in_complex is from the github, ll means log likelihood:
lls = []
wt_lls = []
for mutation_info, mut_idx in mutation_info_and_idx:
    wt_res, mutated_chain, mutant_res = mutation_info
    mut_chain_seq = row[mutated_chain]
    wt_chain_seq = mut_chain_seq[:mut_idx] + wt_res + mut_chain_seq[mut_idx + 1:]
    ll, _ = score_sequence_in_complex(model, alphabet, all_coords, mutated_chain, mut_chain_seq)
    wt_ll, _ = score_sequence_in_complex(model, alphabet, all_coords, mutated_chain, wt_chain_seq)
    wt_lls.append(wt_ll)
    lls.append(ll)
avg_ll = np.average(lls)
avg_wt_ll = np.average(wt_lls)
avg_ll -= avg_wt_ll
I calculate correlations of the log likelihoods against float(row['wt_affinity'] - row['affinity']), corresponding to the respective columns in the SKEMPI v2 CSV. I have tried taking the log as well. You mention ddG for SKEMPI, but I only see affinity values in the form of K values, so I assume that is what you mean. I have also tried subtracting the wild-type from the mutant log likelihood per mutation and then averaging, with similar results.
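For concreteness, this is how I would convert the K values to ddG. This is just a sketch under my own assumptions: that the affinities are dissociation constants in molar units and that T = 298 K.

```python
import numpy as np

R = 1.987e-3  # gas constant in kcal/(mol*K)
T = 298.0     # assumed temperature in Kelvin

def ddg_from_kd(kd_wt, kd_mut, temperature=T):
    """ddG = dG_mut - dG_wt = R * T * ln(Kd_mut / Kd_wt).
    Positive values mean the mutation weakens binding (destabilizing)."""
    return R * temperature * np.log(kd_mut / kd_wt)

# A 10x weaker binder (larger Kd) comes out at about +1.36 kcal/mol.
print(ddg_from_kd(1e-9, 1e-8))
```

Is this the conversion you used, or did you correlate against the raw affinities?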
For reference, I load the structure like this, also using the methods from the esm GitHub:
structure = esm.inverse_folding.util.load_structure(os.path.join(args.pdb_dir,pdb_path), list(chains))
all_coords, _ = extract_coords_from_complex(structure)
Here are my results:
{'barnase': {'r': -0.0038375418409582057, 'rho': 0.09931680817890196, 'auroc': 0.7229199372056514},
'e6': {'r': 0.3046623623188521, 'rho': 0.23827270137182446, 'auroc': 0.6396396396396395},
'h3': {'r': -0.009171801888835096, 'rho': 0.1337611181017528, 'auroc': 0.7232142857142857},
'c3d': {'r': 0.3005179144816237, 'rho': 0.3666666666666667, 'auroc': 0.8},
'thermophilum': {'r': -0.0365999651406376, 'rho': -0.054879068998719374, 'auroc': 0.48000000000000004}}
Avg Spearman Rho: 0.150286
Avg Pearson R: 0.096758
Avg AUROC: 0.68039
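The metrics above are computed roughly like this. A sketch using scipy and scikit-learn; I treat ddG > 0 as the destabilizing class for the AUROC, and assume the predicted scores are oriented like ddG (higher = more destabilizing).

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import roc_auc_score

def evaluate(pred_ddg, exp_ddg):
    """Per-complex Pearson r, Spearman rho, and AUROC for ddG > 0."""
    pred_ddg, exp_ddg = np.asarray(pred_ddg), np.asarray(exp_ddg)
    r, _ = pearsonr(pred_ddg, exp_ddg)
    rho, _ = spearmanr(pred_ddg, exp_ddg)
    auroc = roc_auc_score(exp_ddg > 0, pred_ddg)
    return r, rho, auroc
```

Please let me know if your evaluation differs, e.g. in the sign convention or the AUROC threshold.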
Would you please share your evaluation code so I can see what I am doing wrong? That would probably be easier than debugging my setup from a description. I hope that is not a problem, given this is just an evaluation of a baseline.
Best,
Talal