Alphafold configuration and tutorial IEO

Basic command cheat sheet

This section is only a reference. To start configuring the environment for AlphaFold, jump to the next section.

| command | description |
| --- | --- |
| cd | change directory |
| pwd | show the current directory |
| ls | list files |
| chmod | change permissions |
| mkdir | create directory |
| cp | copy files |
| rm | delete files |
| mv | move files or directories |
| ln -s | create symbolic link |
| watch nvidia-smi | check GPU load |
| htop | check processes and memory |

Options for each command are given as "flags", i.e. letters or words prefixed with a - sign and placed after the command. For example,

cp -r onedir another 

will copy a folder named "onedir" into a folder named "another" recursively (-r), meaning it will copy all the contents of onedir.

Places can be specified with paths or some shorthand locations

| location | meaning | example command |
| --- | --- | --- |
| ~ | home directory | cd ~ |
| ./ | current directory | ls ./ |
| ../ | one directory up | cd ../ |
| / | the top of the directory tree | ls / |
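As a rough illustration of how these shorthands resolve, the sketch below expands each one the way a shell would when started from a given working directory. The helper name `resolve_shorthand` and the starting directory are hypothetical, chosen only for this example.

```python
import posixpath


def resolve_shorthand(location, cwd="/data/username/software"):
    """Return the absolute path that `location` points to when typed from `cwd`."""
    # "~" expands to the home directory; other strings are returned unchanged.
    expanded = posixpath.expanduser(location)
    # join() keeps `expanded` as-is when it is already absolute (e.g. "/").
    # normpath() then collapses "./" and "../" segments.
    return posixpath.normpath(posixpath.join(cwd, expanded))


for shorthand in ("~", "./", "../", "/"):
    print(f"{shorthand:4} -> {resolve_shorthand(shorthand)}")
```

The line for ~ depends on the account running the script; the other three depend only on the starting directory.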

In general, just do this tutorial, especially sections 4-6.

Installation of anaconda and configuration

Log in to the workstation, open your browser and go to

https://www.anaconda.com/

and download the Anaconda installer. Then open a terminal, which is where we will work for the rest of the tutorial. The terminal is located under "applications->system tools" or on your desktop.

Create a "software" directory in your user's space within data

mkdir /data/username/software

Then, install anaconda there. From your home, go to the "Downloads" folder where the anaconda installer is located

cd ~/Downloads

and make the installer executable

chmod +x Anaconda3-2022.10-Linux-x86_64.sh

Then, run the installer

./Anaconda3-2022.10-Linux-x86_64.sh

CAREFUL!! DURING THE INSTALLATION, DO NOT ACCEPT THE DEFAULT INSTALL LOCATION!! When the installer asks where to install, type

/data/username/software/anaconda3

as the install location, so that anaconda ends up in your software directory.

At the end of the installation, answer "yes" when the installer asks whether to run "conda init".

Then close and reopen the terminal. At this point, the word (base) should appear before your bash prompt.

Finish configuring by disabling automatic activation of the base environment.

conda config --set auto_activate_base false

Those that already have anaconda can start here

If you do not see the word (base) before your prompt in the terminal, activate anaconda

conda activate
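The prompt prefix is not the only way to tell which environment is active: conda also records the name in the `CONDA_DEFAULT_ENV` environment variable. The sketch below reads that variable; the function name `active_conda_env` is hypothetical.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is active."""
    environ = os.environ if environ is None else environ
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV") or None


env = active_conda_env()
if env is None:
    print("No conda environment is active; run 'conda activate' first.")
else:
    print(f"Active conda environment: {env}")
```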

Download the file containing the package list for the alphafold environment into your software directory

cd /data/username/software/

wget https://raw.githubusercontent.com/grandrea/ieo-tutorial-alphafold/main/alphafold_env.yml 

Then, create an environment called "alphafold" from the package list in the .yml file you just downloaded

conda env create -f alphafold_env.yml

You should now have a second environment alongside base. Check this by running

conda info --envs

and you should see base and alphafold there.
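The same check can be scripted. `conda env list --json` prints a JSON object whose "envs" key lists the environment directories. The sketch below turns such a listing into names and reports which required environments are missing. It assumes that named environments live in an `envs/` folder and that any other prefix is the base installation. The function names are hypothetical and the sample JSON is illustrative, not real output from the workstation.

```python
import json
from pathlib import PurePosixPath


def env_names(conda_json):
    """Turn the JSON printed by `conda env list --json` into environment names."""
    names = []
    for prefix in json.loads(conda_json)["envs"]:
        path = PurePosixPath(prefix)
        # Assumption: named environments sit in ".../envs/<name>";
        # any other prefix is treated as the base installation.
        names.append(path.name if path.parent.name == "envs" else "base")
    return names


def missing_envs(conda_json, required=("base", "alphafold")):
    """Return the required environment names that conda did not report."""
    found = set(env_names(conda_json))
    return [name for name in required if name not in found]


# Illustrative input, shaped like the output of `conda env list --json`.
sample = json.dumps({"envs": [
    "/data/username/software/anaconda3",
    "/data/username/software/anaconda3/envs/alphafold",
]})
print(env_names(sample))
print(missing_envs(sample, required=("base", "alphafold", "imp")))
```

To use it on the real listing, pass in the text captured from `subprocess.run(["conda", "env", "list", "--json"], capture_output=True, text=True, check=True).stdout`.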

Download tools to analyse and visualize alphafold results

Install PyMOL on the workstation, in the alphafold environment

conda activate alphafold
conda install -c conda-forge pymol-open-source

Download the scripts to plot and analyse AlphaFold results. Go to your software directory

cd /data/username/software

and download the scripts

git clone https://github.com/grandrea/Alphafold-analysis

Optional: install the integrative modeling platform

We will also work with the integrative modeling platform, IMP (www.integrativemodeling.org). This can be installed in a separate anaconda environment

conda create -n imp python=3.9

then go to that environment

conda activate imp

and install IMP from the conda-forge channel (a community-maintained collection of packages, similar in spirit to Bioconductor)

conda install -c conda-forge imp

Now you should have three anaconda environments: base, alphafold and imp. Any further work in integrative modeling will make use of imp.

Running Alphafold

Remember, if your sequence is small (<1400 amino acids), you can run AlphaFold with no coding using alphafold-colab.

AlphaFold is already configured on the workstation, but it needs some packages to run, which is why we installed anaconda and put those packages in a separate environment.

Activate the alphafold environment

conda activate alphafold

and navigate to the alphafold installation directory

cd /data/software/alphafold

then you should be able to launch alphafold by typing

python docker/run_docker.py --help

You should be greeted by a message like:

Docker launch script for Alphafold docker image.
flags:

docker/run_docker.py:
  --[no]benchmark: Run multiple JAX model evaluations to obtain a timing that excludes the
    compilation time, which should be more indicative of the time required for inferencing
    many proteins.
    (default: 'false')
  --data_dir: Path to directory with supporting data: AlphaFold parameters and genetic and
    template databases. Set to the target of download_all_databases.sh.
  --db_preset: <full_dbs|reduced_dbs>: Choose preset MSA database configuration - smaller
    genetic database config (reduced_dbs) or full genetic database config (full_dbs)
    (default: 'full_dbs')
  --docker_image_name: Name of the AlphaFold Docker image.
    (default: 'alphafold')
  --docker_user: UID:GID with which to run the Docker container. The output directories will
    be owned by this user:group. By default, this is the current user. Valid options are:
    uid or uid:gid, non-numeric values are not recognised by Docker unless that user has
    been created within the container.
    (default: '1014:1014')
  --[no]enable_gpu_relax: Run relax on GPU if GPU is enabled.
    (default: 'true')
  --fasta_paths: Paths to FASTA files, each containing a prediction target that will be
    folded one after another. If a FASTA file contains multiple sequences, then it will be
    folded as a multimer. Paths should be separated by commas. All FASTA paths must have a
    unique basename as the basename is used to name the output directories for each
    prediction.
    (a comma separated list)
  --gpu_devices: Comma separated list of devices to pass to NVIDIA_VISIBLE_DEVICES.
    (default: 'all')
  --max_template_date: Maximum template release date to consider (ISO-8601 format: YYYY-MM-
    DD). Important if folding historical test sets.
  --model_preset: <monomer|monomer_casp14|monomer_ptm|multimer>: Choose preset model
    configuration - the monomer model, the monomer model with extra ensembling, monomer
    model with pTM head, or multimer model
    (default: 'monomer')
  --num_multimer_predictions_per_model: How many predictions (each with a different random
    seed) will be generated per model. E.g. if this is 2 and there are 5 models then there
    will be 10 predictions per input. Note: this FLAG only applies if model_preset=multimer
    (default: '5')
    (an integer)
  --output_dir: Path to a directory that will store the results.
    (default: '/tmp/alphafold')
  --[no]run_relax: Whether to run the final relaxation step on the predicted models. Turning
    relax off might result in predictions with distracting stereochemical violations but
    might help in case you are having issues with the relaxation stage.
    (default: 'true')
  --[no]use_gpu: Enable NVIDIA runtime to run with GPUs.
    (default: 'true')
  --[no]use_precomputed_msas: Whether to read MSAs that have been written to disk instead of
    running the MSA tools. The MSA files are looked up in the output directory, so it must
    stay the same between multiple runs that are to reuse the MSAs. WARNING: This will not
    check if the sequence, database or configuration have changed.
    (default: 'false')

Try --helpfull to get a list of all flags.
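The --fasta_paths help text above requires every FASTA file to have a unique basename, because the basename names the output directory of each prediction. A small pre-flight check could look like the sketch below. It assumes "basename" means the file name without its extension (what Python calls the stem). The function name `duplicate_basenames` is hypothetical and the paths are placeholders.

```python
from collections import Counter
from pathlib import PurePosixPath


def duplicate_basenames(fasta_paths):
    """Return basenames (file name without extension) shared by several paths."""
    # --fasta_paths takes a comma separated list, so split it the same way.
    stems = [PurePosixPath(p).stem for p in fasta_paths.split(",") if p]
    counts = Counter(stems)
    return sorted(stem for stem, n in counts.items() if n > 1)


# Placeholder paths: two files named complex.fasta in different folders clash.
paths = ("/data/username/a/complex.fasta,"
         "/data/username/b/complex.fasta,"
         "/data/username/a/monomer.fasta")
print(duplicate_basenames(paths))
```

An empty list means the paths can safely be passed together in one run.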

We can then set up our first AlphaFold run. First, create the output directory (here called "myrun") inside your alphafold_runs folder

mkdir -p /data/username/alphafold_runs/myrun

Then prepare your FASTA file in your user directory.
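Before launching, it can help to inspect the FASTA file, since the number of sequences decides whether the multimer or the monomer command applies. The sketch below parses FASTA text with plain Python and suggests the preset used in this tutorial. The function names `parse_fasta` and `suggest_model_preset` are hypothetical, and the sequences are made up.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def suggest_model_preset(records):
    """Suggest the preset used in this tutorial: multimer for several chains."""
    if not records:
        raise ValueError("no sequences found in FASTA input")
    return "multimer" if len(records) > 1 else "monomer_ptm"


# Made-up two-chain example.
example = """>chainA
MKTAYIAKQR
QISFVKSHFS
>chainB
GSHMLEDPVA
"""
records = parse_fasta(example)
total_residues = sum(len(seq) for _, seq in records)
print(suggest_model_preset(records), total_residues)
```

The total residue count is also a quick way to see whether the input is near the size at which the Colab route mentioned earlier stops being an option.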

Then execute alphafold. For multimers:

python docker/run_docker.py --fasta_paths=/path/to/my/fasta/myfasta.fasta --output_dir=/path/to/myrun --model_preset=multimer --max_template_date=2022-11-01 --data_dir=/data/software/dbs/ --gpu_devices=0 --num_multimer_predictions_per_model=2 --run_relax=False

For monomers

python docker/run_docker.py --fasta_paths=/path/to/my/fasta/myfasta.fasta  --output_dir=/path/to/myrun --model_preset=monomer_ptm --max_template_date=2022-11-01 --data_dir=/data/software/dbs/ --gpu_devices=0  --run_relax=False

As explained in the help, run_relax enables the final relaxation step, which adjusts some side-chain positions and fixes some steric clashes. It can add hours to a prediction, so I would disable it unless you plan to run something "final".

The number in gpu_devices refers to a specific GPU. For example, use --gpu_devices=0,1 to use both GPU 0 and GPU 1.

Adjust templating behaviour by changing max_template_date: this is the latest template release date that will be considered.
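The multimer and monomer command lines differ only in the preset and in the multimer-only prediction count, so they can be assembled by one helper. The sketch below builds the argument list from the flag names shown in the help output and prints it without running anything. The function name and its defaults (copied from the example commands) are illustrative.

```python
import datetime
import shlex


def build_run_docker_command(fasta_path, output_dir, multimer,
                             gpu_devices=(0,), max_template_date="2022-11-01",
                             data_dir="/data/software/dbs/",
                             predictions_per_model=2, run_relax=False):
    """Build the argument list for docker/run_docker.py without running it."""
    # Raises ValueError if the string is not a valid ISO 8601 date.
    datetime.date.fromisoformat(max_template_date)
    command = [
        "python", "docker/run_docker.py",
        f"--fasta_paths={fasta_path}",
        f"--output_dir={output_dir}",
        f"--model_preset={'multimer' if multimer else 'monomer_ptm'}",
        f"--max_template_date={max_template_date}",
        f"--data_dir={data_dir}",
        "--gpu_devices=" + ",".join(str(gpu) for gpu in gpu_devices),
        f"--run_relax={run_relax}",
    ]
    if multimer:
        # This flag only applies to the multimer preset (see the help text).
        command.append(
            f"--num_multimer_predictions_per_model={predictions_per_model}")
    return command


command = build_run_docker_command(
    "/path/to/my/fasta/myfasta.fasta", "/path/to/myrun",
    multimer=True, gpu_devices=(0, 1))
print(shlex.join(command))
```

Run from /data/software/alphafold with the alphafold environment active, the printed line can be pasted into the terminal, or the list can be handed to `subprocess.run(command, check=True)`.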

Analysing results

In /data/software there is also a set of scripts to plot and analyse the AlphaFold run, called AlphaFold-analysis. The usage instructions are in the Alphafold-analysis repository cloned earlier (https://github.com/grandrea/Alphafold-analysis). You may need to install biopython by doing

conda install -c conda-forge biopython

in your alphafold anaconda environment. Once you put your .fasta file in your results directory, you can run the analysis by doing

python plot_AF_all.py

and you'll obtain a file with statistics, and plots for PAE and pLDDT. You can also make figures of the structure colored by pLDDT as described in the alphafold analysis instructions.
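For a quick look before plotting, AlphaFold itself writes a ranking_debug.json file into each run's output folder. The sketch below picks the top-ranked model from such a file. It is separate from the plot_AF_all.py workflow and rests on an assumption about the file layout: an "order" list of model names from best to worst, plus per-model scores under "plddts" (monomer presets) or "iptm+ptm" (multimer preset). Check your own file before relying on these key names. The function name `best_model` is hypothetical and the numbers are made up.

```python
import json


def best_model(ranking_json):
    """Return (model_name, score) for the top-ranked model in ranking_debug.json."""
    ranking = json.loads(ranking_json)
    # Assumption: scores live under "plddts" (monomer) or "iptm+ptm" (multimer).
    score_key = "iptm+ptm" if "iptm+ptm" in ranking else "plddts"
    top = ranking["order"][0]
    return top, ranking[score_key][top]


# Made-up numbers, shaped like a monomer run's ranking_debug.json.
sample = json.dumps({
    "plddts": {"model_1_ptm_pred_0": 81.2, "model_2_ptm_pred_0": 84.7},
    "order": ["model_2_ptm_pred_0", "model_1_ptm_pred_0"],
})
print(best_model(sample))
```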

Contributors: grandrea
