anton-bushuiev / ppiformer

Learning to design protein-protein interactions with enhanced generalization (ICLR24)

Home Page: https://arxiv.org/abs/2310.18515

License: MIT License

TeX 0.03% Jupyter Notebook 93.13% Python 6.78% Shell 0.06%
equivariant-representations machine-learning protein-design protein-protein-interactions proteins

ppiformer's People

Contributors

anton-bushuiev

ppiformer's Issues

Possible reason for a counter-intuitive result

Hello.
Thanks for this awesome work (along with PPIRef).

I just tried PPIformer on one of my structures.

I ran an alanine scan, and most of the predictions make sense.
There are two alanine mutations away from the interface where the affinity actually increases, but PPIformer predicts a slight destabilization. This is not unexpected, as these mutations possibly act by stabilizing the unbound structure, an effect that is hard to capture.

What is unintuitive is the following. There are two neighboring residues, both asparagine: one makes a hydrogen bond with the other chain, while the other faces the solvent and makes no interactions. Yet PPIformer predicts the mutation of the second asparagine (the one that makes no interactions with the partner) to alanine to be more destabilizing than the mutation of the first.

What could be the reason for such behavior?

Unfortunately, I can't share this structure but I will try to reproduce this on other structures that I can share.
Best,
Amin.

Pre-training details about PPIRef

Hi, congrats on your excellent work!

I am trying to pre-train my model on PPIRef50K. However, the paper does not seem to describe the data-splitting strategy. How do you split PPIRef50K for pre-training, run validation, and select the pre-training checkpoint? Also, given several protein-protein interfaces, how do you construct batches for training? Specifically, do you crop interfaces into patches and pad them, as RDE does? I would really appreciate your help. Thank you!

Regards,
Ralph

ESM-IF Evaluation

Dear Authors,

I am evaluating a fine-tuned version of ESM-IF against the base model on the SKEMPI dataset. I was getting quite bad results with the base model, so I decided to compare them with your results on the test set to see whether they match, but my results are much worse.

Here is my code. score_sequence_in_complex comes from the ESM GitHub repository, and ll stands for log-likelihood:

        lls = []     # log-likelihoods of the mutant chain sequences
        wt_lls = []  # log-likelihoods of the wild-type chain sequences
        for mutation_info, mut_idx in mutation_info_and_idx:
            wt_res, mutated_chain, mutant_res = mutation_info
            mut_chain_seq = row[mutated_chain]
            # Put the wild-type residue back at the mutated position.
            wt_chain_seq = mut_chain_seq[:mut_idx] + wt_res + mut_chain_seq[mut_idx + 1:]
            ll, _ = score_sequence_in_complex(model, alphabet, all_coords, mutated_chain, mut_chain_seq)
            wt_ll, _ = score_sequence_in_complex(model, alphabet, all_coords, mutated_chain, wt_chain_seq)
            wt_lls.append(wt_ll)
            lls.append(ll)
        # Final score: mean mutant log-likelihood minus mean wild-type log-likelihood.
        avg_ll = np.average(lls)
        avg_wt_ll = np.average(wt_lls)
        avg_ll -= avg_wt_ll

I calculate correlations of the log-likelihood scores against float(row['wt_affinity'] - row['affinity']), where the two columns correspond to the respective columns in the SKEMPI v2 CSV. I have tried taking the log of the affinities as well. You mention ddG for SKEMPI, but I only see affinity values in the form of K values, so I assume that is what you mean. I have also tried subtracting the wild-type log-likelihood from the mutant log-likelihood first and averaging afterwards, and got similar results.
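For reference, a dissociation constant relates to the binding free energy as dG = RT ln(Kd), so ddG = dG_mut - dG_wt = RT ln(Kd_mut / Kd_wt). The following is a minimal sketch of that conversion, assuming the affinity columns hold dissociation constants in molar units and assuming 298.15 K when no temperature is recorded. The function name ddg_from_kd is hypothetical and is not part of PPIformer or ESM.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def ddg_from_kd(kd_mut, kd_wt, temperature=298.15):
    """Return ddG = dG_mut - dG_wt in kcal/mol.

    Assumes both inputs are dissociation constants in molar units
    and that the temperature is given in kelvin.
    """
    if kd_mut <= 0 or kd_wt <= 0:
        raise ValueError("dissociation constants must be positive")
    return R_KCAL * temperature * math.log(kd_mut / kd_wt)


# A tenfold weaker binder (1 nM -> 10 nM) gives a positive ddG.
example = ddg_from_kd(1e-8, 1e-9)
```

Under this convention a positive ddG means weaker binding, whereas a higher mutant log-likelihood would typically suggest a more favorable mutation, so the sign of the score may need to be flipped before computing correlations.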

By the way, I load the structure like this, also using the methods from the ESM GitHub repository:

    structure = esm.inverse_folding.util.load_structure(os.path.join(args.pdb_dir,pdb_path), list(chains))
    all_coords, _ = extract_coords_from_complex(structure)

Here are my results:

{'barnase': {'r': -0.0038375418409582057, 'rho': 0.09931680817890196, 'auroc': 0.7229199372056514},
'e6': {'r': 0.3046623623188521, 'rho': 0.23827270137182446, 'auroc': 0.6396396396396395},
'h3': {'r': -0.009171801888835096, 'rho': 0.1337611181017528, 'auroc': 0.7232142857142857},
'c3d': {'r': 0.3005179144816237, 'rho': 0.3666666666666667, 'auroc': 0.8},
'thermophilum': {'r': -0.0365999651406376, 'rho': -0.054879068998719374, 'auroc': 0.48000000000000004}}

Avg Spearman Rho: 0.150286
Avg Pearson R: 0.096758
Avg AUROC: 0.68039
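For comparison, here is a minimal, dependency-free sketch of how per-complex metrics of this kind could be computed and averaged. It is an illustration under assumptions, not the evaluation code of the paper: all function names are hypothetical, the AUROC is assumed to be computed on binary labels derived from the measured values (for example, whether a mutation improves binding), and the average is assumed to be an unweighted mean over complexes.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def auroc(scores, labels):
    """Probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_average(per_group):
    """Unweighted mean of each metric over groups (e.g. complexes)."""
    keys = list(next(iter(per_group.values())).keys())
    return {k: mean(g[k] for g in per_group.values()) for k in keys}
```

Computing the metrics per complex and then averaging gives every complex equal weight regardless of how many mutations it has, which can differ noticeably from pooling all mutations into one correlation.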

Would you please share your evaluation code so that I can see what I am doing wrong? I assume that would be quicker than trying to track down the error in my own code. I hope that is not a problem, given that this is just an evaluation of a baseline.

Best,
Talal
