
MassiveFold

This AlphaFold version aims to massively expand the sampling of structure predictions, following Björn Wallner's AFsample version of AlphaFold (https://github.com/bjornwallner/alphafoldv2.2.0/), and to provide some optimizations in the computing. These optimizations are described below, along with the flags that were added to DeepMind's original AlphaFold.

It started as a fork of DeepMind's AlphaFold v2.3.1 (10/03/2023): https://github.com/deepmind/alphafold

Setup

The setup is the same as for AlphaFold v2.3, except that this repository must be used instead of DeepMind's. In addition, the v1 and v2 neural network (NN) model parameters must be present in the params folder, with the version number included in each file name.
The list of NN model parameters in the folder should therefore be as follows:

params_model_1_multimer_v1.npz
params_model_1_multimer_v2.npz
params_model_1_multimer_v3.npz
params_model_1.npz
params_model_1_ptm.npz
params_model_2_multimer_v1.npz
params_model_2_multimer_v2.npz
params_model_2_multimer_v3.npz
params_model_2.npz
params_model_2_ptm.npz
params_model_3_multimer_v1.npz
params_model_3_multimer_v2.npz
params_model_3_multimer_v3.npz
params_model_3.npz
params_model_3_ptm.npz
params_model_4_multimer_v1.npz
params_model_4_multimer_v2.npz
params_model_4_multimer_v3.npz
params_model_4.npz
params_model_4_ptm.npz
params_model_5_multimer_v1.npz
params_model_5_multimer_v2.npz
params_model_5_multimer_v3.npz
params_model_5.npz
params_model_5_ptm.npz
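Before launching a run, the params folder can be checked against the expected file list above. Here is a minimal Python sketch (the function names are illustrative, not part of MassiveFold):

```python
import os

# Expected parameter files: for each of the 5 models, the monomer,
# monomer pTM and multimer v1/v2/v3 weights (25 files in total).
def expected_param_files():
    names = []
    for i in range(1, 6):
        names.append(f"params_model_{i}.npz")
        names.append(f"params_model_{i}_ptm.npz")
        for v in (1, 2, 3):
            names.append(f"params_model_{i}_multimer_v{v}.npz")
    return names

def missing_param_files(params_dir):
    # Return the expected files that are absent from params_dir.
    present = set(os.listdir(params_dir)) if os.path.isdir(params_dir) else set()
    return [n for n in expected_param_files() if n not in present]
```

Running `missing_param_files` on the params folder before a job makes missing weight files fail fast instead of mid-pipeline.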

Parameters for monomer and multimer v3 are available here: https://storage.googleapis.com/alphafold/alphafold_params_2022-12-06.tar
Parameters for monomer and multimer v2 are available here: https://storage.googleapis.com/alphafold/alphafold_params_2022-03-02.tar
Parameters for monomer and multimer v1 are available here: https://storage.googleapis.com/alphafold/alphafold_params_2021-10-27.tar
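For scripted setups, the mapping between parameter version and download archive can be kept in a small table. A minimal sketch (the URLs are those listed above; the dictionary and function names are illustrative):

```python
# Archives containing the monomer and multimer parameters for each
# multimer version, as listed in the text above.
PARAM_ARCHIVES = {
    "v3": "https://storage.googleapis.com/alphafold/alphafold_params_2022-12-06.tar",
    "v2": "https://storage.googleapis.com/alphafold/alphafold_params_2022-03-02.tar",
    "v1": "https://storage.googleapis.com/alphafold/alphafold_params_2021-10-27.tar",
}

def archive_url(multimer_version):
    # e.g. archive_url("v1") -> the 2021-10-27 archive
    return PARAM_ARCHIVES[multimer_version]
```

Each archive can then be fetched and unpacked into the params folder with the downloader of your choice.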

Added flags

Here is the list of flags added to AlphaFold 2.3.1 (this intermediate version also includes the flags of 2.3.2), with their descriptions, also accessible through the --help option.

--alignments_only: whether to generate only alignments. If true, only the alignments are generated by the data pipeline and no modelling is performed (default: 'false')
--dropout: turn on dropout during inference to get more diversity (default: 'false')
--dropout_rates_filename: provide dropout rates at inference from a JSON file. If None, default rates are used when --dropout is true.
--max_recycles: maximum number of recycles to run (default: '20') (an integer)
--early_stop_tolerance: early stopping threshold for recycling (default: '0.5') (a number)
--bfd_max_hits: max hits in BFD/uniref MSA (default: '100000') (an integer)
--mgnify_max_hits: max hits in mgnify MSA (default: '501') (an integer)
--uniprot_max_hits: max hits in uniprot MSA (default: '50000') (an integer)
--uniref_max_hits: max hits in uniref MSA (default: '10000') (an integer)
--model_preset: <monomer|monomer_casp14|monomer_ptm|multimer>:
     choose preset model configuration - monomer model,
     monomer model with extra ensembling, monomer model with pTM head, or
     multimer model; "multimer" computes the 3 versions of multimer models by default
     if no models are specified with the --models_to_use flag
     (default: 'monomer')
--models_to_use: specify which models from --model_preset should be run; each model should be formatted,
     for monomer and monomer_casp14, as model_X, with X the number of the model,
     for monomer_ptm, as model_X_ptm, with X the number of the model,
     for multimer, as model_X_multimer_vY, with X the number of the model and Y
     the version of the model
     (a comma separated list)
--num_predictions_per_model: how many predictions (each with a different random seed) will be
     generated per model. E.g. if this is 2 and there are 5 models, then there will be 10 predictions per input.
     Note: this flag works for monomer and multimer
     (default: '5')
--start_prediction: prediction to start with; can be used to parallelize jobs,
     e.g. --num_predictions_per_model 20 --start_prediction 20 will only produce prediction _20,
     e.g. --num_predictions_per_model 21 --start_prediction 20 will produce predictions _20 and _21, etc.
     (default: '1')
--no_templates: do not use any template; faster than filtering templates by date (default: 'false')
--template_mmcif_dir: path to a directory with template mmCIF structures, each named <pdb_id>.cif
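Under the semantics described for --start_prediction above, a job produces the predictions from start_prediction up to num_predictions_per_model, inclusive. This interpretation can be sketched with a small helper (hypothetical, not part of the codebase):

```python
# Prediction indices produced by a single job, following the
# --start_prediction / --num_predictions_per_model semantics above:
# the job runs predictions start_prediction .. num_predictions_per_model
# (inclusive).
def prediction_indices(start_prediction, num_predictions_per_model):
    return list(range(start_prediction, num_predictions_per_model + 1))
```

For instance, `prediction_indices(20, 20)` gives `[20]` and `prediction_indices(20, 21)` gives `[20, 21]`, matching the flag examples; splitting a large sampling run across jobs then amounts to giving each job a different start index.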

Dropout

Dropout at inference can be activated by setting the --dropout flag to true. In that case, the same dropout rates as those used by DeepMind during training are applied. Here are DeepMind's architectural details (Jumper J et al, Nature, 2021 - Fig 3.a), annotated by Björn Wallner for CASP15 (https://predictioncenter.org/), showing the various dropout rates:

[figure: AlphaFold architecture diagram annotated with the dropout rates]

However, the --dropout_rates_filename flag allows these rates to be modified by providing them in a JSON file. Here is an example of the content of such a file:

{  
    "dropout_rate_msa_row_attention_with_pair_bias": 0.15,  
    "dropout_rate_msa_column_attention": 0.0,  
    "dropout_rate_msa_transition": 0.0,  
    "dropout_rate_outer_product_mean": 0.0,  
    "dropout_rate_triangle_attention_starting_node": 0.25,  
    "dropout_rate_triangle_attention_ending_node": 0.25,  
    "dropout_rate_triangle_multiplication_outgoing": 0.25,  
    "dropout_rate_triangle_multiplication_incoming": 0.25,  
    "dropout_rate_pair_transition": 0.0,  
    "dropout_rate_structure_module": 0.1  
}  
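Such a file can be sanity-checked before a run. Below is a minimal sketch, assuming the ten keys above are the full accepted set and that each rate is a probability in [0, 1] (the helper name is illustrative):

```python
import json

# Keys accepted in the dropout-rates JSON file, as in the example above.
DROPOUT_KEYS = {
    "dropout_rate_msa_row_attention_with_pair_bias",
    "dropout_rate_msa_column_attention",
    "dropout_rate_msa_transition",
    "dropout_rate_outer_product_mean",
    "dropout_rate_triangle_attention_starting_node",
    "dropout_rate_triangle_attention_ending_node",
    "dropout_rate_triangle_multiplication_outgoing",
    "dropout_rate_triangle_multiplication_incoming",
    "dropout_rate_pair_transition",
    "dropout_rate_structure_module",
}

def load_dropout_rates(path):
    # Load the JSON file and check that every key is known and
    # every rate is a probability in [0, 1].
    with open(path) as f:
        rates = json.load(f)
    for key, rate in rates.items():
        if key not in DROPOUT_KEYS:
            raise ValueError(f"unknown dropout key: {key}")
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"rate out of range for {key}: {rate}")
    return rates
```

A typo in a key or a rate above 1 then fails immediately instead of silently altering the sampling.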

Example

Here is an example of how to run a multimer prediction with all versions of the model parameters, without templates, with dropout activated at inference, a maximum of 100 recycles, and an early-stop tolerance of 0.2 Å. The flags can be set in a separate text file, called for instance flags.flg, and passed on the command line:

python3 ./run_alphafold.py --flagfile=./flags.flg

with the flags.flg flag file containing:

--fasta_paths=./seq.fasta
--output_dir=./output
--data_dir=path_to_set
--uniref90_database_path=path_to_set
--mgnify_database_path=path_to_set
--template_mmcif_dir=path_to_set
--obsolete_pdbs_path=path_to_set
--bfd_database_path=path_to_set
--pdb_seqres_database_path=path_to_set
--uniref30_database_path=path_to_set
--uniprot_database_path=path_to_set
--max_template_date=2023-05-01
--use_precomputed_msas=true
--num_predictions_per_model=5
--models_to_relax=best
--use_gpu_relax=true
--alignments_only=false
--dropout=true
--dropout_rates_filename=
--max_recycles=100
--early_stop_tolerance=0.2
--bfd_max_hits=100000
--mgnify_max_hits=501
--uniprot_max_hits=50000
--uniref_max_hits=10000
--model_preset=multimer
--models_to_use=
--start_prediction=1
--no_templates=true

To only use a selection of models, separate them with a comma in the --models_to_use flag, e.g.:
--models_to_use=model_3_multimer_v1,model_3_multimer_v3
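The expected naming scheme can be checked before submitting a job. A minimal sketch based on the formats described for the --models_to_use flag, assuming X in 1-5 and Y in 1-3 as in the parameter list above (the helper is illustrative):

```python
import re

# Model-name formats accepted by --models_to_use, as described above:
#   monomer / monomer_casp14 : model_X
#   monomer_ptm              : model_X_ptm
#   multimer                 : model_X_multimer_vY
_MODEL_NAME = re.compile(r"^model_[1-5](_ptm|_multimer_v[1-3])?$")

def parse_models_to_use(value):
    # Split the comma-separated flag value and reject malformed names.
    models = [m.strip() for m in value.split(",") if m.strip()]
    bad = [m for m in models if not _MODEL_NAME.match(m)]
    if bad:
        raise ValueError(f"malformed model names: {bad}")
    return models
```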

A script is also provided to relax a single structure. The prediction's pkl file has to be given as a parameter, and the features.pkl file must be present in the same folder, e.g.:

python3 run_relax_from_results_pkl.py result_model_4_multimer_v3_pred_0.pkl

Authors

Guillaume Brysbaert (UGSF - UMR 8576, France)
Nessim Raouraoua (UGSF - UMR 8576, France)
Christophe Blanchet (IFB, France)
Claudio Mirabello (NBIS, Sweden)
Björn Wallner (Linköping University, Sweden)

This work was carried out as part of the Work Package 4 of the MUDIS4LS project led by the French Bioinformatics Institute (IFB). It was initiated at the IDRIS Open Hackathon, part of the Open Hackathons program. The authors would like to acknowledge OpenACC-Standard.org for their support.
