
aqlaboratory / openfold


Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2

License: Apache License 2.0

Python 93.17% Shell 2.84% Jupyter Notebook 3.18% Dockerfile 0.13% C 0.01% C++ 0.17% Cuda 0.49%
alphafold2 protein-structure pytorch

openfold's People

Contributors

alquraishi, atgctg, awaelchli, bozhang-hpc, brianloyal, cclauss, christinaflo, controny, decarboxy, dependabot[bot], dingquanyu, ericmjl, gahdritz, jnwei, jonathanking, josemduarte, kiddozhu, lilleswing, ljarosch, luwei0917, marta-sd, mattwthompson, nikitos9000, nz99, sachinkadyan7, sauravmaheshkar, sdvillal, timodonnell, vaclavhanzl, zrqiao


openfold's Issues

Training duration & NaNs during training

First of all, great work!

I'm wondering what training times I can expect for a single target. I'm currently at 1 min/iteration (one sample per iteration), which seems too slow (V100s with FP16 and DeepSpeed enabled, crop size 256). The official implementation takes around 20 s for a comparable sample on a single GPU (about 16 s on an A100). I haven't tested how much overhead DeepSpeed introduces. Gradient accumulation should help reduce this; a sketch follows below.
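A minimal sketch of turning on gradient accumulation, assuming the training script builds a standard PyTorch Lightning Trainer (the flag below is stock Lightning, not something specific to this repo):

import pytorch_lightning as pl

# Sketch only: accumulate_grad_batches averages gradients over N batches
# before each optimizer step, giving an effective batch size of
# N x (per-GPU batch size) without extra memory per step.
trainer = pl.Trainer(
    gpus=1,
    precision=16,
    accumulate_grad_batches=4,
)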

Is it actually possible to train with batch size > 1 on a single GPU? I'm assuming it would work with fixed_size=True. I vaguely remember that they did some dimensionality juggling with the template/recycling dimensions, which might interfere.

Thanks!

About the evaluation in CASP14

Hello everyone. I am doing some evaluation of the inference pipeline. I am wondering how to evaluate the resulting PDB files (e.g., with TM-score) for those targets for which CASP14 does not provide a reference PDB file.
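For reference, a minimal sketch of scoring a predicted structure against a reference with TM-align, assuming the TMalign binary is installed and on PATH (the file paths are placeholders):

import re
import subprocess

def tm_scores(model_pdb: str, reference_pdb: str):
    # Run TM-align on the two structures and parse the TM-scores it prints.
    # TM-align reports two values, one normalized by each chain's length;
    # the one normalized by the reference is the usual choice.
    result = subprocess.run(
        ["TMalign", model_pdb, reference_pdb],
        capture_output=True, text=True, check=True,
    )
    return [float(s) for s in re.findall(r"TM-score=\s*([0-9.]+)", result.stdout)]

# Example with placeholder paths:
# print(tm_scores("prediction.pdb", "reference.pdb"))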

pdb files do not exist in mmcif dir

Hi,
Thanks for your help last time.
However, now I have another error.
After running the training from ProteinNet input:

python /data/openfold/train_openfold.py /data/af_databases/pdb_mmcif/mmcif_files/ /home/ubuntu/ProteinNet_parsed/ProteinNet_lc/ /data/af_databases/pdb_mmcif/mmcif_files/ /home/ubuntu/OF_train_from_Protein_Net/try_1_Dec29_2021/ 2021-10-10 --template_release_dates_cache_path /data/af_databases/pdb_mmcif/mmcif_cache.json --precision 16 --replace_sampler_ddp=True --deepspeed_config /data/deepspeed_config.json --default_root_dir /home/ubuntu/OF_train_from_Protein_Net/try_1_Dec29_2021/ --gpus 1 --seed 44

I got this error:
###############

Epoch 0: 0%| | 0/50939 [00:00<?, ?it/s]Traceback (most recent call last):
File "/data/openfold/train_openfold.py", line 336, in
main(args)
File "/data/openfold/train_openfold.py", line 196, in main
ckpt_path=ckpt_path,
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 736, in fit
self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 682, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1193, in _run
self._dispatch()
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1272, in _dispatch
self.training_type_plugin.start_training(self)
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
self._results = trainer.run_stage()
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1282, in run_stage
return self._run_train()
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1312, in _run_train
self.fit_loop.run()
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 234, in advance
self.epoch_loop.run(data_fetcher)
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 140, in run
self.on_run_start(*args, **kwargs)
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 141, in on_run_start
self._dataloader_iter = _update_dataloader_iter(data_fetcher, self.batch_idx + 1)
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/loops/utilities.py", line 121, in _update_dataloader_iter
dataloader_iter = enumerate(data_fetcher, batch_idx)
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/utilities/fetching.py", line 199, in iter
self.prefetching(self.prefetch_batches)
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/utilities/fetching.py", line 258, in prefetching
self._fetch_next_batch()
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/utilities/fetching.py", line 300, in _fetch_next_batch
batch = next(self.dataloader_iter)
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/supporters.py", line 536, in next
return self.request_next_batch(self.loader_iters)
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/supporters.py", line 548, in request_next_batch
return apply_to_collection(loader_iters, Iterator, next)
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/utilities/apply_func.py", line 92, in apply_to_collection
return function(data, *args, **kwargs)
File "/data/openfold/openfold/data/data_modules.py", line 350, in _batch_prop_gen
for batch in iterator:
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in next
data = self._next_data()
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/torch/_utils.py", line 434, in reraise
raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/data/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/data/openfold/openfold/data/data_modules.py", line 178, in getitem
chain_id=chain_id,
File "/data/openfold/openfold/data/data_pipeline.py", line 577, in process_pdb
with open(pdb_path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/af_databases/pdb_mmcif/mmcif_files/4l6v_9.pdb'

Epoch 0: 0%| | 0/50939 [00:00<?, ?it/s]

Thanks,
Oz

trained parameters

Hi, will you be releasing the parameters under a non-academic license as well? Or do we have to train the model from scratch?

Error while installing dependencies

When I run

scripts/install_third_party_dependencies.sh

it fails at the last step

gzip: tests/test_data/sample_feats.pickle.gz: No such file or directory

This is because the file sample_feats.pickle.gz is not downloaded.

More components of the model should be TorchScript-compatible

As it stands, only the attention primitives Attention and GlobalAttention are TorchScript-ed (or, for that matter, TorchScript-able) during inference. For better runtimes and memory allocation, more of the network's modules, especially in the Evoformer, should be made compatible with TorchScript. In my estimation, the biggest hurdle before this goal is the inference-time chunking functionality, which currently makes heavy use of function pointers not supported by TorchScript.
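To illustrate the kind of compilation involved, here is a minimal sketch (a toy module, not the repo's Attention class) of scripting a module with torch.jit.script:

import torch

class ToyAttention(torch.nn.Module):
    # Toy stand-in for an attention primitive, small enough to script cleanly.
    def __init__(self, d: int):
        super().__init__()
        self.scale = d ** -0.5
        self.q = torch.nn.Linear(d, d)
        self.k = torch.nn.Linear(d, d)
        self.v = torch.nn.Linear(d, d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = torch.softmax(self.q(x) @ self.k(x).transpose(-1, -2) * self.scale, dim=-1)
        return a @ self.v(x)

# torch.jit.script compiles the module ahead of time, removing Python overhead
# at inference; modules whose forward passes accept Python callables (as the
# chunking code does) cannot be scripted this way without refactoring.
scripted = torch.jit.script(ToyAttention(32))
print(scripted(torch.randn(4, 16, 32)).shape)  # torch.Size([4, 16, 32])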

Colab not working! Error when importing `data_pipeline`

Great work with reproducing the original code and creating an open-source PyTorch implementation ☕️☕️☕️☕️


When I try to run the attached Colab notebook, in the "Search against genetic databases" subsection, importing data_pipeline from openfold.data raises an ImportError, viz.

ImportError: cannot import name 'MultipleChainsError' from 'openfold.data.templates' (/opt/conda/lib/python3.7/site-packages/openfold/data/templates.py)

The full traceback is attached below:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-7-8051d602620b> in <module>()
     27 from openfold.data import feature_pipeline
     28 from openfold.data import parsers
---> 29 from openfold.data import data_pipeline
     30 from openfold.data.tools import jackhmmer
     31 from openfold.model import model

/opt/conda/lib/python3.7/site-packages/openfold/data/data_pipeline.py in <module>()
     20 import numpy as np
     21 
---> 22 from openfold.data import templates, parsers, mmcif_parsing
     23 from openfold.data.tools import jackhmmer, hhblits, hhsearch
     24 from openfold.data.tools.utils import to_date

/opt/conda/lib/python3.7/site-packages/openfold/data/templates.py in <module>()
     26 import numpy as np
     27 
---> 28 from openfold.data import parsers, mmcif_parsing
     29 from openfold.data.tools import kalign
     30 from openfold.data.tools.utils import to_date

/opt/conda/lib/python3.7/site-packages/openfold/data/mmcif_parsing.py in <module>()
     27 import numpy as np
     28 
---> 29 from openfold.data.templates import MultipleChainsError
     30 import openfold.np.residue_constants as residue_constants
     31 

ImportError: cannot import name 'MultipleChainsError' from 'openfold.data.templates' (/opt/conda/lib/python3.7/site-packages/openfold/data/templates.py)

Interestingly enough, if I add from openfold.data.templates import MultipleChainsError, I run into a circular ImportError; the error trace is attached below.

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-8-71256580fa0c> in <module>()
     27 from openfold.data import feature_pipeline
     28 from openfold.data import parsers
---> 29 from openfold.data.templates import MultipleChainsError
     30 from openfold.data import data_pipeline
     31 from openfold.data.tools import jackhmmer

/opt/conda/lib/python3.7/site-packages/openfold/data/templates.py in <module>()
     26 import numpy as np
     27 
---> 28 from openfold.data import parsers, mmcif_parsing
     29 from openfold.data.tools import kalign
     30 from openfold.data.tools.utils import to_date

/opt/conda/lib/python3.7/site-packages/openfold/data/mmcif_parsing.py in <module>()
     27 import numpy as np
     28 
---> 29 from openfold.data.templates import MultipleChainsError
     30 import openfold.np.residue_constants as residue_constants
     31 

ImportError: cannot import name 'MultipleChainsError' from 'openfold.data.templates' (/opt/conda/lib/python3.7/site-packages/openfold/data/templates.py)

Issue in prep_mmseqs_dbs.sh

Hi, I am running the script prep_mmseqs_dbs.sh. I have already corrected the script, changing tar2exprofiledb to tsv2exprofiledb.

However, the script extracts the files and then returns the following error:

uniclust30_2018_08/uniclust30_2018_08_a3m.ffdata
uniclust30_2018_08/uniclust30_2018_08_a3m.ffindex
uniclust30_2018_08/uniclust30_2018_08_hhm.ffdata
uniclust30_2018_08/uniclust30_2018_08_hhm.ffindex
uniclust30_2018_08/uniclust30_2018_08_cs219.ffdata
uniclust30_2018_08/uniclust30_2018_08_cs219.ffindex
uniclust30_2018_08/uniclust30_2018_08.cs219
uniclust30_2018_08/uniclust30_2018_08.cs219.sizes
uniclust30_2018_08/uniclust30_2018_08_a3m_db
uniclust30_2018_08/uniclust30_2018_08_a3m_db.index
uniclust30_2018_08/uniclust30_2018_08_hhm_db
uniclust30_2018_08/uniclust30_2018_08_hhm_db.index
uniclust30_2018_08/uniclust30_2018_08_md5sum
../../scripts/prep_mmseqs_dbs.sh: line 33: mmseqs: command not found

I do have mmseqs installed.

Could anyone help me?

No checkpoints saved after validation epoch ends

Checkpoints are not saved after the validation epoch ends, even though checkpoint_best_val is active.

The validation loss is also not shown during validation; maybe this is connected, since the checkpoint callback is supposed to track val_loss (a sketch of such a callback follows below).
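For comparison, a minimal sketch of a checkpoint callback wired to a validation metric, assuming the LightningModule logs a metric literally named "val_loss" (the actual key used by the repo may differ):

from pytorch_lightning.callbacks import ModelCheckpoint

# Sketch: keep the top-3 checkpoints ranked by the logged validation loss.
# If no metric with this name is logged in validation_step, the callback has
# nothing to monitor, which would match the behavior described above.
checkpoint_cb = ModelCheckpoint(
    monitor="val_loss",
    mode="min",
    save_top_k=3,
    filename="openfold-{epoch:02d}-{val_loss:.3f}",
)
# trainer = pl.Trainer(callbacks=[checkpoint_cb], ...)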

Severe memory fragmentation

In some cases, especially for larger crop sizes, intermediate tensors during training grow so large that PyTorch OOMs despite having allocated as little as 60% of available GPU memory. It would be good to carefully profile the network to identify the worst-offending modules and come up with clean ways to prevent such degenerate tensor allocation.

OOM with bfloat16, no speed-up

New issue based on: #34

Turning on bfloat16 in DeepSpeed doesn't seem to have the desired effect. The model parameter size remains unchanged, and I'm hitting OOM in validation, which works fine in FP16.

Training with bfloat16 in pytorch-lightning fails:

File "openfold/openfold/utils/loss.py", line 46, in sigmoid_cross_entropy
log_p = torch.nn.functional.logsigmoid(logits)
RuntimeError: "log_sigmoid_forward_cuda" not implemented for 'BFloat16'

Is support still missing in DeepSpeed? microsoft/DeepSpeed#974

Tested on A100 with torch 1.10.1+cu113

Sampling recycling iterations in validation

I was a bit surprised that the number of recycling iterations is sampled during validation. This makes different validation epochs less comparable and the progress curve less smooth. I think eval should mimic predict in this respect.

max_iters = self.config.common.max_recycling_iters
if(stage_cfg.supervised):
    clamp_prob = self.config.supervised.clamp_prob
    keyed_probs.append(
        ("use_clamped_fape", [1 - clamp_prob, clamp_prob])
    )

if(self.stage == "train" and self.config.supervised.uniform_recycling):
    recycling_probs = [
        1. / (max_iters + 1) for _ in range(max_iters + 1)
    ]
    keyed_probs.append(
        ("no_recycling_iters", recycling_probs)
    )
else:
    recycling_probs = [
        0. for _ in range(max_iters + 1)
    ]
    recycling_probs[-1] = 1.
    keyed_probs.append(
        ("no_recycling_iters", recycling_probs)
    )

time consumed by precompute_alignments

I ran precompute_alignments.py to precompute alignments for 184,700 proteins before training the model, because I want to use the same data as AlphaFold. However, it took ~4 h to finish the alignment for a single protein (1yxq), so I would like to know whether I am running it correctly. Also, are there any precomputed alignments available for download that would save me this alignment time?

script error

When I use script_preset_(model_module), the code errors with:

RuntimeError:
'Tensor' object has no attribute or method 'new_ones'.:
File "openfold/openfold/model/msa.py", line 118
    if mask is None:
        # [*, N_seq, N_res]
        mask = m.new_ones(
               ~~~~~~~~~~ <--- HERE
            m.shape[:-3] + (n_seq, n_res),
        )
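A TorchScript-friendly sketch of the same allocation (an untested suggestion, not a patch taken from the repo) would avoid Tensor.new_ones and spell out the dtype and device explicitly:

import torch

def make_mask(m: torch.Tensor, n_seq: int, n_res: int) -> torch.Tensor:
    # torch.ones with an explicit size list, dtype and device is scriptable,
    # whereas the Tensor.new_ones call above is rejected by TorchScript.
    size = list(m.shape[:-3]) + [n_seq, n_res]
    return torch.ones(size, dtype=m.dtype, device=m.device)

m = torch.zeros(1, 4, 8, 16)
print(make_mask(m, 4, 16).shape)  # torch.Size([1, 4, 16])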

Some questions about running inference

When I use OpenFold to run inference on proteins, some can be inferred fine, but others report errors. The likely reason: one of the template hits returned by the search is obsolete, and the CIF file for the obsolete entry is not present in the template_mmcif_dir directory.

Below is an example of a failing protein:
5IZB_A
Traceback (most recent call last):
File "run_pretrained_openfold.py", line 253, in
main(args)
File "run_pretrained_openfold.py", line 118, in main
fasta_path=fasta_path, alignment_dir=local_alignment_dir
File "/home/jsr/openfold/openfold/data/data_pipeline.py", line 420, in process_fasta
self.template_featurizer,
File "/home/jsr/openfold/openfold/data/data_pipeline.py", line 55, in make_template_features
hits=hits_cat,
File "/home/jsr/openfold/openfold/data/templates.py", line 1059, in get_templates
kalign_binary_path=self._kalign_binary_path,
File "/home/jsr/openfold/openfold/data/templates.py", line 827, in _process_single_hit
with open(cif_path, "r") as cif_file:
FileNotFoundError: [Errno 2] No such file or directory: '/public/database/alphafold2_database/mmcif/mmcif_files/4zai.cif'

What should template_mmcif_dir be?

For training with the ColabFold pipeline (and templates with HHsearch), there is a template_mmcif_dir path.
Should it be something like data/pdb_mmcif/mmcif_files/, or one of the other precomputed folders?

Invalid Command: tar2exprofiledb

In openfold/scripts/prep_mmseqs_dbs.sh, I guess it should be mmseqs tsv2exprofiledb, not mmseqs tar2exprofiledb.

Also a bug at line 26: tar --extract --verbose --file="${DOWNLOAD_DIR}/${f}" \
I think it should be tar --extract --verbose --file="${f}" \

Dockerfile

Hey epic work!
Could you post a Dockerfile for training/inference?
Thanks!

About the memory

Well done! I am wondering: with 4 TITAN 2080 cards with 12 GB each, can I train this model, or will I hit out-of-memory errors?

New entries in obsolete.dat will throw errors.

Traceback (most recent call last):
  File "/ocean/projects/bio210060p/kadyan/openfold-release/scripts/precompute_template_hits.py", line 224, in <module>
    main(args, template_pipeline_runner)
  File "/ocean/projects/bio210060p/kadyan/openfold-release/scripts/precompute_template_hits.py", line 116, in main
    feature_dict = template_pipeline_runner.run(a3m_dir, fasta_file_path)
  File "/ocean/projects/bio210060p/kadyan/openfold-release/scripts/precompute_template_hits.py", line 80, in run
    alignment_dir=a3m_dir,
  File "/ocean/projects/bio210060p/kadyan/openfold-release/openfold/data/data_pipeline.py", line 360, in process_fasta
    hits=hits_cat,
  File "/ocean/projects/bio210060p/kadyan/openfold-release/openfold/data/templates.py", line 1058, in get_templates
    kalign_binary_path=self._kalign_binary_path,
  File "/ocean/projects/bio210060p/kadyan/openfold-release/openfold/data/templates.py", line 828, in _process_single_hit
    with open(cif_path, "r") as cif_file:
FileNotFoundError: [Errno 2] No such file or directory: '/databases/pdb_mmcif/mmcif_files/6ek0.cif'

ISSUE: New entries added to obsolete.dat will cause failures because the corresponding replacement structures will not be found in the pre-downloaded pdb_mmcif files.

Acquiring MSAs

Thanks so much for an excellent repo!

I'm trying to weigh all of the options for acquiring MSAs in order to train the model. I could either 1) use trRosetta's MSAs, 2) use ProteinNet's MSAs, or 3) make MSAs myself using MMseqs2. Do you know how these options compare and how long option 3) would take?

Thanks!

"Module 'Attention' has no attribute 'linear_g' : "

Hi,
When I try to train the model by running this command:

python /data/openfold/train_openfold.py /home/ubuntu/train_mmcif_Dec29_2021/ //home/ubuntu/ProteinNet_parsed/ProteinNet_MSA/ /data/af_databases/pdb_mmcif/mmcif_files/ /home/ubuntu/OF_train_from_ProteinNet_try1_Dec29_20210/ 2021-10-10 --template_release_dates_cache_path /data/af_databases/pdb_mmcif/mmcif_cache.json --precision 16 --replace_sampler_ddp=True --deepspeed_config_path /data/deepspeed_config.json --resume_from_ckpt ckpt_dir/ --gpus 1 --precision 16 --seed 44

I get this error:

"Module 'Attention' has no attribute 'linear_g' : "

I'm running it from the conda env (openfold_venv)

Thanks
Oz

issue in prep_mmseqs_dbs.sh

I noticed a small bug in prep_mmseqs_dbs.sh: the script fails because the mmseqs_dbs directory does not exist. I made a branch to open a pull request but got a permission-denied error. I also updated the README to fix the instructions for running this script (download_mmseqs_databases.sh -> download_mmseqs_dbs.sh, prep_mmseqs_databases.sh -> prep_mmseqs_dbs.sh). Here are the changes I propose to prep_mmseqs_dbs.sh:

#!/bin/bash
#
# Copyright 2021 AlQuraishi Laboratory 
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Downloads and unzips all required data for AlphaFold.
#
# Usage: bash download_all_data.sh /path/to/download/directory
set -e

DOWNLOAD_DIR="$1"
ROOT_DIR="${DOWNLOAD_DIR}/mmseqs_dbs"

mkdir --parents "${ROOT_DIR}"

for f in $(ls ${DOWNLOAD_DIR}/*.tar.gz)
do
  tar --extract --verbose --file="${f}" \
      --directory="${ROOT_DIR}"
  rm "${f}"
  BASENAME="$(basename {f%%.*})"
  DB_NAME="${BASENAME}_db"
  OLD_PWD=$(pwd)
  cd "${ROOT_DIR}"
  mmseqs tsv2exprofiledb "${BASENAME}" "${DB_NAME}"
  mmseqs createindex "${DB_NAME}" "${DOWNLOAD_DIR}/tmp/"
  cd "${OLD_PWD}"
done

confidence per residue

[EDIT: I can see 'plddt' is part of the output, closing issue, will reopen if it's not the per-residue confidence score]

Thank you for this amazing repo!
Is there any suggested way to output the per-residue confidence score that AlphaFold produces?
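For reference, a minimal sketch of reading the per-residue score out of the inference output, assuming (as the edit above suggests) that the output dictionary exposes it under a 'plddt' key with one value per residue:

import torch

# Placeholder standing in for the dictionary returned by model inference;
# the random tensor just mimics a 128-residue chain with pLDDT in [0, 100].
outputs = {"plddt": torch.rand(128) * 100}

plddt = outputs["plddt"]
print(f"mean pLDDT: {plddt.mean().item():.1f}")
print(f"lowest-confidence residue: {int(plddt.argmin()) + 1} ({plddt.min().item():.1f})")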

OOM in validation

I get a CUDA OOM error when I add my validation set, even though I can predict the same structures just fine with run_pretrained_openfold.py.

Are you limiting your validation set to a certain size? I assume the problem comes from the additional features needed to compute the loss.

I had to make a few changes to get validation working:
val needs to be changed to eval in data_modules, e.g.:
https://github.com/aqlaboratory/openfold/blob/main/openfold/data/data_modules.py#L153

The third argument "unclamped" no longer exists:
https://github.com/aqlaboratory/openfold/blob/main/openfold/data/data_modules.py#L188

Validation also needs to be switched to _output_raw=True.

Clamped fape loss in validation

Currently, the FAPE loss is clamped in 90% of cases during validation. I'm wondering if this should be made deterministic (always clamp or never clamp) to make validation runs more comparable.

training data

In section 1.2.5 of the AlphaFold 2 supplement, there are some filters that are applied to the training data. Does the latest code not include this part?

ddp error

When I use strategy='ddp', train_openfold.py errors as follows:

RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the forward function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple checkpoint functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Parameter at index 4983 has been marked as ready twice. This means that multiple autograd engine hooks have fired for this particular parameter during this iteration. You can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print parameter names for further debugging.

about self.cached_weights in train_openfold.py

I ran train_openfold.py, and when it reached the validation set, I got an error saying the 'OpenFoldWrapper' object has no 'cached_weights' attribute. Can you help me figure out what is wrong?

Traceback (most recent call last):
File "train_openfold.py", line 370, in
main(args)
File "train_openfold.py", line 233, in main
ckpt_path=ckpt_path,
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 739, in fit
self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 683, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 773, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1195, in _run
self._dispatch()
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1275, in _dispatch
self.training_type_plugin.start_training(self)
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
self._results = trainer.run_stage()
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1285, in run_stage
return self._run_train()
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1315, in _run_train
self.fit_loop.run()
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 234, in advance
self.epoch_loop.run(data_fetcher)
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 146, in run
self.on_advance_end()
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 242, in on_advance_end
self._run_validation()
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 337, in _run_validation
self.val_loop.run()
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 122, in advance
output = self._evaluation_step(batch, batch_idx, dataloader_idx)
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 217, in _evaluation_step
output = self.trainer.accelerator.validation_step(step_kwargs)
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 236, in validation_step
return self.training_type_plugin.validation_step(*step_kwargs.values())
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 219, in validation_step
return self.model.validation_step(*args, **kwargs)
File "train_openfold.py", line 108, in validation_step
if(self.cached_weights is None):
File "/public/tools/anaconda3/envs/openfold/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1178, in getattr
type(self).name, name))
AttributeError: 'OpenFoldWrapper' object has no attribute 'cached_weights'

chunk_layer is memory-inefficient

The chunk_layer function in openfold/utils/tensor_utils.py, which implements the "chunking" procedure described in subsection 1.11.8 of the AlphaFold 2 supplement, relies on a memory-expensive expand/reshape operation at the top to standardize the batch dimensions of input tensors. This operation can be a bottleneck during inference, so some optimization here would do wonders.
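For readers unfamiliar with the procedure, a stripped-down sketch of the chunking idea (not the repo's chunk_layer itself, and with the batch-dimension standardization omitted):

import torch

def chunked_apply(fn, x: torch.Tensor, chunk_size: int, dim: int = 0) -> torch.Tensor:
    # Run `fn` over slices of `x` along `dim` so that only one slice's worth
    # of intermediate activations is alive at a time. The real chunk_layer
    # additionally broadcasts/flattens arbitrary leading batch dims, which is
    # where the expensive expand/reshape mentioned above comes from.
    outputs = [fn(chunk) for chunk in torch.split(x, chunk_size, dim=dim)]
    return torch.cat(outputs, dim=dim)

# Example: a quadratic-memory pairwise op applied 64 rows at a time.
x = torch.randn(512, 256)
y = chunked_apply(lambda c: torch.softmax(c @ x.t(), dim=-1), x, chunk_size=64)
print(y.shape)  # torch.Size([512, 512])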

mmseqs tsv2exprofiledb issue with colabfold_envdb_202108

I was having issues with the prep_mmseqs_db.sh script, so I tried running the steps individually and I'm having an issue with running mmseqs tsv2exprofiledb with the colabfold_envdb_202108 database.

First, I downloaded these databases using the download_mmseqs_dbs.sh script and then ran the tar command according to the example in prep_mmseqs_dbs.sh, such that I had a directory with the following files:

colabfold_envdb_202108.tsv           
colabfold_envdb_202108_seq.tsv  
colabfold_envdb_202108_aln.tsv 
colabfold_envdb_202108_h.tsv 
uniref30_2103.md5sums
uniref30_2103.tsv
uniref30_2103_h.tsv
uniref30_2103_aln.tsv  
uniref30_2103_seq.tsv

I then used mmseqs tsv2exprofiledb mmseqs_dbs/uniref30_2103 /mmseqs/uniref30_2103_db which seemed to complete without error (though there is no .idx file, which is supposed to be the output of this command, I believe), generating the following files:

uniref30_2103_db.dbtype    
uniref30_2103_db_seq_tmp
uniref30_2103_db.index     
uniref30_2103_db_seq_tmp.index.0
uniref30_2103_db.sh
uniref30_2103_db_h
uniref30_2103_db.0
uniref30_2103_db.1               
uniref30_2103_db_h.dbtype
uniref30_2103_db_h.index

However, when I tried to do the same with the colabfold_envdb_202108 database, it seemed to start correctly, but then was killed after a minute or two. The following files were generated:

colabfold_envdb_202108_db.sh   
colabfold_envdb_202108_db_h     
colabfold_envdb_202108_db_h.index.0  

I used nohup and this is the extent of the output from that command:

tsv2exprofiledb /mmseqs_dbs/colabfold_envdb_202108 /mmseqs_dbs/colabfold_envdb_202108_db

MMseqs Version: 4f046dd1979ec87b440656ff13b12e5c525b8374
Verbosity       3

Killed

I'm wondering if I'm using an instance with insufficient RAM. Do you have an idea of the amount of RAM needed for the idx files?

Purpose of rc.MAP_HHBLITS_AATYPE_TO_OUR_AATYPE

Hi,

Thanks for a great repo!

I'm confused why the template's amino acids and the msa's amino acids are modified again using rc.MAP_HHBLITS_AATYPE_TO_OUR_AATYPE.

It seems like we read the amino acids from the pdb structure and convert them to ids using HHBLITS_AA_TO_ID. I'm wondering why we need to modify them again?

training speed is about 2x slower than JAX trainable version (Uni-Fold)

device: 1 A100 with 40GB memory
cuda: 11.3
Compared with https://github.com/dptech-corp/Uni-Fold, using the model_2 setting and the same data (only one sample, using DummyDataLoader in OpenFold).

Following issue #19, I disabled clear_cache_between_blocks and DeepSpeed CPU offloading.
The commit I used is c4d9f57

speed per example:

           FP32      FP16
openfold   24.5 s    17 s
Uni-Fold   13.25 s   8.9 s

Is that expected? Are there any tricks to get a further speed-up?

Option to run in "de-novo" mode

Most of the scripts at https://github.com/sokrypton/ColabFold have a ton of additional flexibility that comes in handy when running AF on de novo sequences (for which you usually can't generate an MSA) or when doing protein design.

Can this codebase also be leveraged to:

  • run predictions without MSA input or template structures, so just the raw sequence input
  • increase the number of recycling iterations, since this has been shown to recover some of the accuracy lost by not having an MSA (see the config sketch after this list)
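A hedged sketch of the second point, assuming the packaged config helper and the attribute path below exist as named (they are inferred from config snippets quoted elsewhere on this page, not verified against the current code):

from openfold.config import model_config

# Assumed names: "model_1" as a preset and data.common.max_recycling_iters as
# the knob controlling how many recycling passes the data pipeline prepares.
config = model_config("model_1")
config.data.common.max_recycling_iters = 12  # AlphaFold 2's default is 3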

Low-memory attention a little slow

I've implemented low-memory attention (9670958) using an algorithm from a recent preprint (https://arxiv.org/pdf/2112.05682.pdf), enhanced a little bit with the ability to add multiple biases + batch dimensions. Lacking the JAX map & scan used in the original implementation, which I've had to replace with for loops, ours is quite a bit slower (exact figures depend heavily on the choice of chunk sizes, but it seems to be in the ballpark of 2x slower than our own standard Attention implementation). It would be nice to speed it up a little.
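For context, a simplified sketch of the chunking idea (queries processed block by block so the full attention matrix is never materialized; the preprint additionally chunks the key/value dimension with a running softmax, which is omitted here):

import torch

def chunked_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, q_chunk: int = 1024) -> torch.Tensor:
    # Process the query dimension in chunks; peak memory scales with
    # q_chunk * n_keys instead of n_queries * n_keys.
    scale = q.shape[-1] ** -0.5
    out = []
    for q_blk in torch.split(q, q_chunk, dim=-2):
        scores = torch.einsum("...qd,...kd->...qk", q_blk * scale, k)
        out.append(torch.einsum("...qk,...kd->...qd", scores.softmax(dim=-1), v))
    return torch.cat(out, dim=-2)

q = k = v = torch.randn(2, 4096, 64)
print(chunked_attention(q, k, v).shape)  # torch.Size([2, 4096, 64])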

What are the different npz files?

After downloading DeepMind's pretrained parameters, there are 5 models, and for each model there is a .npz file and a _ptm.npz file. May I know what the 5 different models are and what the corresponding _ptm.npz files mean?

Multimer

Hi,
Thanks for this great work.
Just wondering, is there a way to do complex (multimer) prediction as in alphafold multimer ?

Thanks
Oz

Question about loss weight

Hi all! Firstly, thanks for your work and effort! I noticed that in the config file, the weight for each loss is different from that in the AF2 paper. For example, the weight for the angle loss is 1 instead of 0.3. Some of them, such as the violation loss and the experimentally-resolved loss, have a weight of 0. Is there any reason the weights are set up this way? For instance, for losses that have been assigned a weight of 0 in the implementation, are they still under testing? Thanks!

about self.cached_weights

Why do I get this error when I specify a validation set during training: AttributeError: 'OpenFoldWrapper' object has no attribute 'cached_weights'

Checkpointing Issue

Thanks for such a great repo! I get the following issue when running the model (but only when I use GPUs). I'm using torch checkpointing, not deepspeed. I saw an issue similar to this, but it seemed to be deepspeed-specific so I thought I'd repost.

RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the forward function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple checkpoint functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Parameter at index 2000 with name module.model.evoformer.blocks.19.pair_transition.linear_2.bias [For reference I have 20 blocks ] has been marked as ready twice. This means that multiple autograd engine hooks have fired for this particular parameter during this iteration.

how to extract the embeddings for each Protein Sequence

First of all, great work!

As you know, for each protein sequence, Evolutionary Scale Modeling (ESM) generates an embedding of size #amino_acids x 1280. I was wondering if we could get similar information from OpenFold as well. Do you think it is possible to extract such an embedding from the inner layers of OpenFold?
Could you give some guidance on how to extract this information from OpenFold?
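A hedged sketch of what reading a per-residue embedding out of an inference run might look like, assuming the model's output dictionary exposes the Evoformer single representation under a key such as "single" (the key name and the channel size below are assumptions for illustration, not confirmed here):

import torch

# Placeholder for the dict returned by the model's forward pass: for a chain
# of length L, the Evoformer single representation would be [L, c_s], the
# closest analogue to ESM's per-residue [L, 1280] embedding matrix.
outputs = {"single": torch.randn(220, 384)}

per_residue_embedding = outputs["single"]               # [L, c_s]
sequence_embedding = per_residue_embedding.mean(dim=0)  # one vector per protein
print(per_residue_embedding.shape, sequence_embedding.shape)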

Thanks!

Data parsing Bug for dataloader.

Hi, I've processed some data for training but hit the following dataloader bug:

File "/share/home/openfold/openfold-main/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/utilities/apply_func.py", line 92, in apply_to_collection return function(data, *args, **kwargs) File "/share/home/openfold/openfold-main/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__ data = self._next_data() File "/share/home/openfold/openfold-main/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data return self._process_data(data) File "/share/home/openfold/openfold-main/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data data.reraise() File "/share/home/openfold/openfold-main/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/torch/_utils.py", line 434, in reraise raise exception TypeError: Caught TypeError in DataLoader worker process 0. Original Traceback (most recent call last): File "/share/home/openfold/openfold-main/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop data = fetcher.fetch(index) File "/share/home/openfold/openfold-main/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch return self.collate_fn(data) File "/share/home/openfold/openfold-main/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/utilities/auto_restart.py", line 474, in _capture_metadata_collate data = default_collate(samples) File "/share/home/openfold/openfold-main/openfold/data/data_modules.py", line 297, in __call__ prot, self.stage File "/share/home/openfold/openfold-main/openfold/data/feature_pipeline.py", line 116, in process_features mode=mode, File "/share/home/openfold/openfold-main/openfold/data/feature_pipeline.py", line 93, in np_example_to_features cfg[mode], File "/share/home/openfold/openfold-main/openfold/data/input_pipeline.py", line 187, in process_tensors_from_config lambda x: wrap_ensemble_fn(tensors, x), torch.arange(num_recycling + 1) File "/share/home/openfold/openfold-main/openfold/data/input_pipeline.py", line 201, in map_fn ensembles = [fun(elem) for elem in x] File "/share/home/openfold/openfold-main/openfold/data/input_pipeline.py", line 201, in <listcomp> ensembles = [fun(elem) for elem in x] File "/share/home/openfold/openfold-main/openfold/data/input_pipeline.py", line 187, in <lambda> lambda x: wrap_ensemble_fn(tensors, x), torch.arange(num_recycling + 1) File "/share/home/openfold/openfold-main/openfold/data/input_pipeline.py", line 168, in wrap_ensemble_fn return fn(d) File "/share/home/openfold/openfold-main/openfold/data/data_transforms.py", line 76, in <lambda> return lambda x: f(x, *args, **kwargs) File "/share/home/openfold/openfold-main/openfold/data/input_pipeline.py", line 196, in compose x = f(x) File "/share/home/openfold/openfold-main/openfold/data/data_transforms.py", line 76, in <lambda> return lambda x: f(x, *args, **kwargs) File "/share/home/openfold/openfold-main/openfold/data/data_transforms.py", line 180, in sample_msa num_seq = protein["msa"].shape[0] TypeError: 'function' object is not subscriptable

I have uploaded one of the data samples below. Is something wrong with my MSA-generation pipeline, or with the dataloader?
5E0Y.zip

Frequent loss is NaN & Training Hangs

Thank you for sharing your code!

I am trying to train OpenFold, but the problem of the loss becoming NaN persists, and the whole training run hangs when this happens.

I downloaded the code in early December and trained on 8 V100 cards with a training dataset of 1000 samples. When training reached the 26th sample of the 2nd epoch, there were many warnings that the loss was NaN and the training was interrupted.
I read your suggestion to "Replace training_step in train_openfold.py" in issue #19; after making that change, when training the first sample I got this:

WARNING:root:loss is NaN. Returning 0 loss...

Training still hangs.

I ran a recent commit again and retrained with the same dataset, and the same problem occurred again, on the same sample. Like this:

[screenshot: training log showing repeated NaN-loss warnings]

I changed the way the mapping is generated in data_modules.py so that the dataset is loaded in a fixed order, and I checked the samples where the loss is NaN but found no abnormalities.

This is very strange: with your first version of the code there has been no NaN loss so far, but with the versions committed after December this problem keeps occurring. Even if I change my training dataset or the learning rate in the DeepSpeed config file, the situation does not improve.

Is there a workaround for this situation?
