lhatsk / alphalink

AlphaLink: Integrating crosslinking MS data into OpenFold

License: Apache License 2.0

Dockerfile 0.20% Jupyter Notebook 4.25% Python 89.37% C 0.02% C++ 1.59% Cuda 0.81% Shell 3.75%

alphalink's People

Contributors

grandrea, lhatsk, samuelmurail

alphalink's Issues

FDR Description in Arg parse is potentially wrong

In the file predict_with_crosslinks.py, the help text for the following argument is probably wrong: a number of CPUs cannot be a floating-point value. What is fdr, and what does it mean?
parser.add_argument("--fdr", type=float, default=0.05, help="""Number of CPUs with which to run alignment tools""")
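
To illustrate the mismatch the issue points at, here is a minimal sketch of how a float-valued option and an integer CPU count would normally be declared with argparse. It is not the AlphaLink source: the reading of `--fdr` as a false discovery rate of the crosslinks is an assumption, and `--cpus` is a hypothetical flag name used only for contrast.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Illustrative parser, not the AlphaLink source"
    )
    # Assumption: --fdr is the false discovery rate of the crosslink data.
    parser.add_argument(
        "--fdr",
        type=float,
        default=0.05,
        help="Assumed meaning: false discovery rate of the crosslinks (0 to 1)",
    )
    # Hypothetical flag, shown only to contrast an integer CPU count.
    parser.add_argument(
        "--cpus",
        type=int,
        default=4,
        help="Number of CPUs with which to run alignment tools",
    )
    return parser


args = build_parser().parse_args(["--fdr", "0.1"])
print(args.fdr, args.cpus)
```

The help string in the issue reads as if it was copied from a CPU-count option, which is why the float type and the description disagree.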

No MSA output - precomputed alignments called automatically

Hello,

I am able to "run" AlphaLink successfully (i.e., generate pkl and pdb outputs from a fasta and crosslinks file), but when I check the generated 'alignments' folder, I get a subfolder with the name of the input fasta and then nothing else. So, no MSA has been generated in that folder nor anywhere else as far as I can tell. When checking my slurm outputs, I noticed that the --use_precomputed_alignments flag was automatically being called, even though this flag was not in the original script. This flag was pointing to the aforementioned 'alignments' folder that gets created for the outputs...which is empty.

Am I doing something wrong? Here is what one of my scripts looks like; I used the example on the GitHub page:

python $HOME/AlphaLink/predict_with_crosslinks.py \
    $FASTAS/BLAH.fasta \
    $CROSSLINKS/BLAH.csv \
    --checkpoint_path $HOME/AlphaLink/finetuning_model_5_ptm_CACA_10A.pt \
    --uniref90_database_path $SOURCE/uniref90/uniref90.fasta \
    --mgnify_database_path $SOURCE/mgnify/mgy_clusters_2022_05.fa \
    --pdb70_database_path $SOURCE/pdb70/pdb70_hhm.ffdata \
    --uniclust30_database_path $SOURCE/uniref30/uniref30.fasta \
    --output_dir AlphaLink_Outputs/Batch_Testing/TEST \
    --neff 10

As you can see, this run subsamples the MSA with --neff. I can double-check the slurm output from a run without the --neff flag, but the result is the same: no MSA data. Here is the slurm output line that refers to the precomputed alignments:

Using precomputed alignments for sp|BLAH|BLAH at AlphaLink_Outputs/Batch_Testing/TEST/alignments...

Andrea recommended that I try adding more flags for jackhmmer, hhblits, etc., but this did not help the issue.

Thank you,

Anthony
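
A quick way to confirm the symptom described above, before re-running, is to list which per-sequence folders under the alignments directory contain no files at all. The sketch below uses only the standard library. The helper name and the placeholder file name are hypothetical, and it makes no claim about which files AlphaLink is expected to write.

```python
import tempfile
from pathlib import Path


def empty_alignment_dirs(alignment_root):
    """Return names of subfolders under alignment_root that contain no files."""
    root = Path(alignment_root)
    empty = []
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob also looks into nested folders, in case hits are stored deeper.
        if not any(f.is_file() for f in sub.rglob("*")):
            empty.append(sub.name)
    return empty


# Small self-contained demonstration on a temporary directory tree.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "seq_a").mkdir()
    (Path(tmp) / "seq_b").mkdir()
    (Path(tmp) / "seq_b" / "placeholder.a3m").write_text(">query\nACDE\n")
    print(empty_alignment_dirs(tmp))
```

Pointing the function at the run's alignments folder (here AlphaLink_Outputs/Batch_Testing/TEST/alignments) would show whether the folder being treated as precomputed is in fact empty.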

Over-weight of crosslinking data

Hi,

How can we address the over-weighting of crosslinking data? I noticed that when there are many crosslinking restraints for one sequence, the final models look over-constrained and some well-folded domains look unstructured.

Thanks.
Yan

Run "python preprocessing_distributions.py --infile restraints.csv" but get an error

Hi,

When I tried to get distance distributions from a restraint list by running "python preprocessing_distributions.py --infile restraints.csv", I received the error below:

$ python preprocessing_distributions.py --infile restraints.csv
Traceback (most recent call last):
  File "/lscratch/14291792/preprocessing_distributions.py", line 50, in <module>
    for line in restraints:
TypeError: iteration over a 0-d array

restraints.csv is a test file with only one line:
12,135,15.0,5.0,normal

Could you let me know how to solve this problem?

I would really appreciate it!

Xiang
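
"iteration over a 0-d array" is the message NumPy raises when code loops over a zero-dimensional array. That can happen when a one-row file is loaded into a structured array, and in that case wrapping the result in `np.atleast_1d` is a common remedy. Whether preprocessing_distributions.py loads the file this way is an assumption. The sketch below avoids the issue by reading the file with the standard `csv` module, which always yields a list of rows. The field names are hypothetical labels for the five columns in the example line.

```python
import csv
import io
from typing import List, NamedTuple


class Restraint(NamedTuple):
    # Hypothetical field names for the five comma-separated columns.
    res_from: int
    res_to: int
    mean_dist: float
    stddev: float
    dist_type: str


def read_restraints(handle) -> List[Restraint]:
    """Parse restraint lines; a single-line file still gives a one-element list."""
    rows = []
    for fields in csv.reader(handle):
        if not fields:  # skip blank lines
            continue
        i, j, mean, sd, kind = (f.strip() for f in fields)
        rows.append(Restraint(int(i), int(j), float(mean), float(sd), kind))
    return rows


single = read_restraints(io.StringIO("12,135,15.0,5.0,normal\n"))
for restraint in single:
    print(restraint)
```

Because the result is an ordinary list, iterating over it works the same for one restraint as for many.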

Request for training

Hello, thank you for the great research.

I have already installed AlphaLink in my workspace, following the OpenFold page.
I would like to train the model as described in your paper.
Could you provide a training script that I can follow?

In the paper, you trained the model by fine-tuning.
I would like to reproduce that with your script.
If any such scripts exist, please share them.

Thank you for reading.

problem with model loading

Hi AlphaLink developers!

I am trying to use AlphaLink. I've downloaded the model via your dropbox link and unpacked it with gunzip. So I start the prediction like this:

 python predict_with_crosslinks.py ./test/test/input.fasta ./test/test/restraints.txt --distograms --checkpoint_path ./alphalink/resources/finetuning_model_5_ptm_CACA_10A.pt --uniref90_database_path /resources/alphafold2/uniref90/uniref90.fasta --mgnify_database_path /resources/alphafold2/mgnify/mgy_clusters.fa --pdb70_database_path /resources/alphafold2/pdb70/ --uniclust30_database_path /resources/alphafold2/uniclust30/uniclust30_2018_08/

The traceback I've got:

  File "/users/user/alphalink/AlphaLink/predict_with_crosslinks.py", line 571, in <module>
    main(args)
  File "/users/user/alphalink/AlphaLink/predict_with_crosslinks.py", line 376, in main
    model, output_directory = load_models_from_command_line(args, config)
  File "/users/user/alphalink/AlphaLink/predict_with_crosslinks.py", line 271, in load_models_from_command_line
    model.load_state_dict(sd)
  File "/software/f2021/software/pytorch/1.10.0-foss-2021a-cuda-11.3.1/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for AlphaFold:
        size mismatch for xl_embedder.linear.weight: copying a param with shape torch.Size([128, 1]) from checkpoint, the shape in current model is torch.Size([128, 128]).

Do you have any idea what went wrong? :)

Best regards,
Julia
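
In PyTorch, the weight of a Linear layer has shape (out_features, in_features). The traceback above therefore says that the checkpoint's crosslink embedder takes 1 input feature, while the model built for this run takes 128. One possible reading, not confirmed in this thread, is that the CACA_10A checkpoint belongs to the non-distogram mode and is being combined with --distograms. The sketch below shows a way to list all such mismatches before calling `load_state_dict`. The helper name is hypothetical, and plain tuples stand in for tensor shapes so that it runs without PyTorch.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} where the shapes differ."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes copied from the traceback. With PyTorch available, such dictionaries
# could be built as {k: tuple(v.shape) for k, v in state_dict.items()}.
checkpoint_shapes = {"xl_embedder.linear.weight": (128, 1)}
model_shapes = {"xl_embedder.linear.weight": (128, 128)}

print(find_shape_mismatches(checkpoint_shapes, model_shapes))
```

A mismatch confined to the crosslink embedder, as here, points at the restraint input mode rather than at a corrupted download.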

Save pkl files

Hi,

Is there a way to save the model outputs as pkl files?

I added the --save_outputs flag, but they are saved neither in distogram mode nor in the 10A distance mode.

hands-on protocol for contacts_to_distograms

Hi,

Could you please share a hands-on protocol for generating a distogram from contact information? As a beginner, I find it hard to use the script (contacts_to_distograms.py) to build the distogram.

Thank you so much!
Yan
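
As a rough illustration of the general idea, the sketch below discretizes one restraint given as a mean distance and standard deviation (the "15.0,5.0,normal" form seen in other issues on this page) into a probability histogram over distance bins. It does not reproduce contacts_to_distograms.py, whose logic is not shown here. The bin edges are hypothetical, and 128 bins were chosen only because the error messages quoted on this page mention a 128-wide crosslink embedder.

```python
import math


def normal_cdf(x, mean, stddev):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (stddev * math.sqrt(2.0))))


def restraint_to_histogram(mean, stddev, bin_edges):
    """Discretize N(mean, stddev) into len(bin_edges) + 1 bins.

    The first bin covers everything below the first edge and the last bin
    everything above the last edge, so the probabilities sum to 1.
    """
    cdf = [normal_cdf(edge, mean, stddev) for edge in bin_edges]
    probs = [cdf[0]]
    probs += [hi - lo for lo, hi in zip(cdf, cdf[1:])]
    probs.append(1.0 - cdf[-1])
    return probs


# Hypothetical binning: 127 edges starting at 2.0 with 0.3125 spacing -> 128 bins.
bin_edges = [2.0 + 0.3125 * k for k in range(127)]
hist = restraint_to_histogram(15.0, 5.0, bin_edges)
print(len(hist), round(sum(hist), 6))
```

A full distogram would hold one such histogram for each restrained residue pair in an L x L x bins array. The actual bin layout and file format expected by AlphaLink should be taken from the repository's README.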

Problem with Crosslinking data input

While reproducing the CDK results from test_set, I used the crosslink input you provide in both CSV and PT formats. I noticed that in the PT file, the xl_array contains each residueFrom/residueTo pair twice, once in each order. Can you explain why these entries are duplicated in reverse order?
Additionally, could you clarify what the grouping_array represents?
Furthermore, the structure I predicted from these inputs does not match the PDB file at test_set/CDK/predictions/CDK_neff10_1h01_xl_model_5_ptm.pdb in terms of RMSD and TM-score.

This is my call:
python predict_with_crosslinks.py test_set/CDK/fasta/CDK.fasta test_set/CDK/crosslinks/1h01_xl.pt --features test_set/CDK/features/CDK_neff10.pkl --checkpoint_path resources/AlphaLink_params/finetuning_model_5_ptm_CACA_10A.pt --uniref90_database_path /xxx/uniref90.fasta --mgnify_database_path /xxx/mgnify/mgy_clusters_2022_05.fa --pdb70_database_path /xxx/pdb70 --uniclust30_database_path /xxx/uniref30/

Request for an example folder

Hi,

Is it possible to create a folder with sample files (including "7K3N_A.fasta", "restraints.csv", and "photoL.csv") as mentioned in the provided example? This would help us better understand the work and run the tool.

As shown in the examples from https://github.com/lhatsk/AlphaLink#readme:
python predict_with_crosslinks.py 7K3N_A.fasta restraints.csv...
python predict_with_crosslinks.py 7K3N_A.fasta photoL.csv ...

Running "predict_with_crosslinks.py" with "restraints.csv --distograms" gives the error

Dear AlphaLink developers,

I am trying to run "predict_with_crosslinks.py" as follows:

# Running
predict_with_crosslinks.py $FASTA_FILE restraints.csv --distograms $UNIREF90_PATH $MGNIFY_PATH $PDB70_PATH $MMCIF_PATH $UNICLUST30_PATH --features features.pkl --checkpoint_path $ALPHALINK_WEIGHTS

where "restraints.csv" looks like:

55,236,35.0,0.5,normal
236,311,26.0,1.5,normal

$ALPHALINK_WEIGHTS corresponds to <'PATH'>finetuning_model_5_ptm_CACA_10A.pt.
'features.pkl' is an output file after AlphaFold2 run (with my protein).

I receive the following error message:

Traceback (most recent call last):
  File "<'PATH'>/predict_with_crosslinks.py", line 550, in <module>
    main(args)
  File "<'PATH'>/predict_with_crosslinks.py", line 367, in main
    model, output_directory = load_models_from_command_line(args, config)
  File "<'PATH'>/predict_with_crosslinks.py", line 270, in load_models_from_command_line
    model.load_state_dict(sd)
  File "<'PATH'>/python3.9/site-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for AlphaFold:
        size mismatch for xl_embedder.linear.weight: copying a param with shape torch.Size([128, 1]) from checkpoint, the shape in current model is torch.Size([128, 128]).

Could you please clarify where my mistake is?

Alphalink install failed

I have been trying to create the conda environment from environment.yml and get the following error:

Collecting deepspeed==0.5.10 (from -r /vast/scratch/users/iskander.j/AlphaLink/condaenv.s_3tgy1b.requirements.txt (line 2))
  Using cached deepspeed-0.5.10.tar.gz (515 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'error'

Pip subprocess error:
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA/dllogger.git /vast/scratch/users/iskander.j/tmp/pip-req-build-b5e98t7e
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [14 lines of output]
      Traceback (most recent call last):
        File "<string>", line 36, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/vast/scratch/users/iskander.j/tmp/pip-install-16f1fgl7/deepspeed_58e11f6d38c1437fb3136539611b056b/setup.py", line 27, in <module>
          import torch
        File "/home/users/allstaff/iskander.j/.local/lib/python3.7/site-packages/torch/__init__.py", line 217, in <module>
          _load_global_deps()
        File "/home/users/allstaff/iskander.j/.local/lib/python3.7/site-packages/torch/__init__.py", line 177, in _load_global_deps
          raise err
        File "/home/users/allstaff/iskander.j/.local/lib/python3.7/site-packages/torch/__init__.py", line 172, in _load_global_deps
          ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
        File "/stornext/System/data/apps/rc-tools/rc-tools-1.0/bin/tools/envs/alphalink/lib/python3.7/ctypes/__init__.py", line 364, in __init__
          self._handle = _dlopen(self._name, mode)
      OSError: /home/users/allstaff/iskander.j/.local/lib/python3.7/site-packages/torch/lib/libtorch_global_deps.so: cannot open shared object file: No such file or directory
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
                                                                                                                           failed

CondaEnvException: Pip failed

When I installed OpenFold from the OpenFold GitHub repository, I got errors due to a deprecated simtk version.
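
In the log above, torch is loaded from ~/.local/lib/python3.7/site-packages while ctypes comes from the alphalink environment. This suggests that a package in the per-user site directory is shadowing the environment's own packages. The sketch below is a small diagnostic, using only the standard library, that reports where a module would be imported from and whether the user site is active. The helper name is hypothetical. Setting the PYTHONNOUSERSITE environment variable before creating the environment is one commonly used way to keep the user site out of the import path; whether that resolves this particular failure is untested here.

```python
import importlib.util
import os
import site
import sys


def describe_import_source(module_name):
    """Report where module_name would be imported from, without importing it."""
    spec = importlib.util.find_spec(module_name)
    origin = spec.origin if spec is not None else None
    user_site = os.path.abspath(site.getusersitepackages())
    return {
        "module": module_name,
        "origin": origin,
        "from_user_site": bool(origin)
        and os.path.abspath(origin).startswith(user_site),
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        "interpreter": sys.executable,
    }


# "json" is used here so the demonstration runs anywhere; inside the
# environment one would pass "torch" instead.
print(describe_import_source("json"))
```

If "from_user_site" comes back true for torch inside the environment, the copy under ~/.local is the one being picked up.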

Issue with multimer

Hello,

I have been trying to use AlphaLink with distance constraints between different subunits.
From what I understand from the code, it does not seem possible to add such constraints.

Am I right?

Cheers,
Samuel

Installation unclear

Does this sentence:

AlphaLink requires the same packages, since it builds on top of OpenFold.

mean that OpenFold must also be installed, or just that one should follow the OpenFold installation procedure? It's not clear.

Inter-subunit crosslinking data

Hi, great work!
I am wondering whether it is possible to use intermolecular crosslinking data as distance restraints in AlphaLink.
Further, would it be possible to use ambiguous distance restraints generated from homo-oligomer crosslinking data, as in NMR structure calculation and HADDOCK? Or to translate pair representation / MSA coevolution information into explicit distance restraints?
