rigdenlab / ample

Ab initio Modelling of Proteins for moLEcular replacement

Home Page: http://ample.rtfd.io

License: BSD 3-Clause "New" or "Revised" License

CMake 0.04% Python 94.47% Shell 0.09% Batchfile 0.01% TeX 0.90% Fortran 4.49%
mr xray-crystallography

ample's People

Contributors

ccp4um, filomenosanchez, hlasimpk, linucks, rmk65

ample's Issues

docs setup

Setup ReadTheDocs and Sphinx documentation generation

Features to include:

  • Generic website
  • API documentation
  • Automatic command line flag page
  • Examples
    • ab initio mode
    • ab initio mode with contacts
    • multiple distant homologs
    • single distant homolog with CONCOORD ensemble
    • NMR ensembles
    • Ensembler as standalone
  • Colour scheme (buttons, navbar, etc.) adapted to the logo colours
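
As a rough starting point for the feature list above, a Sphinx `conf.py` fragment might look like the following. This is a sketch only: the theme choice and the use of the third-party sphinx-argparse extension for the command line flag page are assumptions, not decisions recorded in this issue.

```python
# docs/conf.py (fragment) -- a minimal sketch, not the project's actual configuration
project = "AMPLE"

extensions = [
    "sphinx.ext.autodoc",   # API documentation generated from docstrings
    "sphinx.ext.napoleon",  # support for Google/NumPy style docstrings
    "sphinxarg.ext",        # sphinx-argparse (third-party, assumed installed): flag page
]

# Assumption: the Read the Docs theme; logo colours would be applied via custom CSS.
html_theme = "sphinx_rtd_theme"
```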

Refactor pdb_edit.py

The purpose of this issue is for me to keep track of the refactored functions. A list of currently implemented functions is below:

All ticked methods contain docstrings, were modified and only require final checks.

  • def backbone(inpath=None, outpath=None):
  • def calpha_only(inpdb, outpdb):
  • def check_pdb_directory(directory,single=True,allsame=True,sequence=None):
  • def check_pdbs(models,single=True,allsame=True,sequence=None):
  • def extract_chain(inpdb, outpdb, chainID=None, newChainID=None, cAlphaOnly=False, renumber=True ):
  • def extract_model(inpdb, outpdb, modelID ):
  • def keep_matching(refpdb=None, targetpdb=None, outpdb=None, resSeqMap=None ):
  • def _keep_matching(refpdb=None, targetpdb=None, outpdb=None, resSeqMap=None ):
  • def get_info(inpath):
  • def match_resseq(targetPdb=None, outPdb=None, resMap=None, sourcePdb=None ):
  • def merge(pdb1=None, pdb2=None, pdbout=None ):
  • def molecular_weight(pdbin):
  • def most_prob(hierarchy, always_keep_one_conformer=True):
  • def num_atoms_and_residues(pdbin,first=False):
  • def _parse_modres(modres_text):
  • def prepare_nmr_model(nmr_model_in,models_dir):
  • def reliable_sidechains(inpath=None, outpath=None ):
  • def rename_chains(inpdb=None, outpdb=None, fromChain=None, toChain=None ):
  • def resseq(pdbin):
  • def _resseq(hierarchy):
  • def renumber_residues(pdbin, pdbout, start=1):
  • def _renumber(hierarchy, start):
  • def renumber_residues_gaps(pdbin, pdbout, gaps, start=1):
  • def _rog_side_chain_treatment(hierarchy, scores, del_orange):
  • def rog_side_chain_treatment(pdbin, pdbout, rog_data, del_orange=False):
  • def select_residues(pdbin, pdbout, chain_id=None, delete=None, tokeep=None, delete_idx=None, tokeep_idx=None, raw=False):
  • def sequence(pdbin):
  • def _sequence(hierarchy):
  • def _sequence1(hierarchy):
  • def sequence_data(pdbin):
  • def _sequence_data(hierarchy):
  • def split_pdb(pdbin, directory=None):
  • def split_into_chains(pdbin, chain=None, directory=None):
  • def standardise(pdbin, pdbout, chain=None, del_hetatm=False):
  • def std_residues_cctbx(pdbin, pdbout, del_hetatm=False): --> def std_residues(pdbin, pdbout, del_hetatm=False):
  • def strip(pdbin, pdbout, hetatm=False, hydrogen=False, atom_types=[]):
  • def _strip(hierachy, hetatm=False, hydrogen=False, atom_types=[]):
  • def to_single_chain(inpath, outpath):
  • def translate(inpdb=None, outpdb=None, ftranslate=None):

Deleted functions

  • def extract_header_pdb_code(pdb_input):
  • def extract_header_title(pdb_input):
  • def _parse_rwcontents(logfile):
  • def _run_rwcontents(pdbin, logfile):
  • def reliable_sidechains_cctbx(pdbin=None, pdbout=None ):
  • def Xselect_residues(inpath=None, outpath=None, residues=None):
  • def Xsplit(pdbin):
  • def Xstd_residues(pdbin, pdbout ):
  • def xyz_coordinates(pdbin):
  • def _xyz_coordinates(hierarchy):
  • def xyz_cb_coordinates(pdbin):
  • def _xyz_cb_coordinates(hierarchy):
  • def _xyz_atom_coords(atom_group):

ordering of search models

We want the 'sweet spot' search models (20-40% truncation) to run first. With 190 sparsely sampled clusters, it seems that after running the cluster 1 sweet spot, AMPLE moves on to the other search models in cluster 1. We would probably prefer that it sampled the sweet spots of all clusters first, rather than covering all of cluster 1 before moving on?
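
One way to express the requested ordering is sketched below. It is not AMPLE code: the function name and the representation of a search model as a `(cluster, truncation_percent)` tuple are assumptions made for illustration. Sweet-spot models are emitted first, round-robin across clusters, followed by the remaining models in the same interleaved fashion.

```python
from collections import defaultdict


def order_search_models(models, sweet=(20, 40)):
    """Order (cluster, truncation_percent) tuples: sweet spot first, interleaved by cluster.

    Hypothetical helper; the tuple layout is an assumption for this sketch.
    """
    models = list(models)
    lo, hi = sweet
    ordered = []
    for want_sweet in (True, False):
        # Group the models of this pass by cluster.
        per_cluster = defaultdict(list)
        for cluster, trunc in models:
            if (lo <= trunc <= hi) == want_sweet:
                per_cluster[cluster].append((cluster, trunc))
        queues = [sorted(per_cluster[c], key=lambda m: m[1]) for c in sorted(per_cluster)]
        # Round-robin: take the first model of every cluster, then the second, and so on.
        depth = 0
        while any(depth < len(q) for q in queues):
            ordered.extend(q[depth] for q in queues if depth < len(q))
            depth += 1
    return ordered


if __name__ == "__main__":
    demo = [(1, 10), (1, 25), (1, 35), (2, 30), (2, 80), (3, 20)]
    print(order_search_models(demo))
```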

Fix printing of cluster_score_type when using tm scoring

Log files do not contain Cluster score type information:

Ensemble Results
----------------

Cluster method: spicker
Cluster score type: None       <-- HERE
Number of clusters: 3
Truncation method: percent
Percent truncation: 5
Side-chain treatments: ['polyAla', 'reliable', 'allatom']

Missing chain in decoys terminates ensembling

If the chain is missing in ab initio structure prediction files, AMPLE terminates at two distinct positions. This currently occurs when trialling AMPLE with FragFold decoys.

  1. An incorrect spicker.dat file is generated.

Problematic code is pattern = re.compile('^ATOM\s*(\d*)\s*(\w*)\s*(\w*)\s*(\w)\s*(\d*)\s*(\d*)\s') in ample/util/spicker.py.

  2. Gesamt cannot extract atoms

Gesamt searches for chain atom 1 in chain A.

[ ... ]

 ... reading file '<ROOT>/AMPLE_0/ensemble_workdir/cluster_1/tlevel_100/model_283_100.pdb', selection '/1/A':
          0 atoms selected

[ ... ]

 ALIGNMENT ERROR 4
 ===========================================================
 Gesamt:  Normal termination

An example ATOM line looks like this:

ATOM      1  N   ALA     1      -3.190  -3.778  10.166  1.00 13.71
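
The sketch below illustrates the first problem on the example line above: because the chain ID column is blank, the whitespace-based pattern captures the residue number in the group meant for the chain. Parsing by fixed PDB column positions avoids this. `parse_atom_fixed` is a hypothetical helper written for this illustration, not a function in ample/util/spicker.py.

```python
import re

# Pattern as quoted in the issue (written here as a raw string).
PATTERN = re.compile(r'^ATOM\s*(\d*)\s*(\w*)\s*(\w*)\s*(\w)\s*(\d*)\s*(\d*)\s')

# Example line from the issue: column 22 (the chain ID) is blank.
LINE = "ATOM      1  N   ALA     1      -3.190  -3.778  10.166  1.00 13.71"


def parse_atom_fixed(line):
    """Parse an ATOM record by PDB column position (hypothetical helper)."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "resname": line[17:20].strip(),
        "chain": line[21].strip(),  # empty string when the chain ID is missing
        "resseq": int(line[22:26]),
        "xyz": tuple(float(line[i:i + 8]) for i in (30, 38, 46)),
    }


match = PATTERN.match(LINE)
print(match.group(4))          # the residue number ends up in the "chain" group
print(parse_atom_fixed(LINE))  # chain is reported as empty, resseq as an integer
```

A caller could then substitute a default chain ID (for example 'A', which is what Gesamt is selecting here) whenever the parsed chain is empty.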

distant homolog superposition mode now polyAla only

I'm pretty sure using multiple distant homologs used to generate ensembles with the original side chains as well? I'm just running a job now and all the search models are polyAla. I guess this is a by-product of switching to Jens's sparse sampling mode? I think we want the default with -homologs True on to be to generate both poly-Ala and all side chain ensembles.

TMAnalysis is submitted to the cluster queuing system in benchmarking

When running benchmarking using the TMscore binary, the calculation of the TMscores is submitted as a job to the queuing system - ample/util/tm_util.py:comparision ~ line 200.

On a cluster, the benchmarking job is itself already running under the queuing system, and many clusters do not allow jobs to be submitted from the compute nodes, so this causes the benchmarking to crash.
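
A possible guard is sketched below: detect whether the process is already running inside a batch job and, if so, run the TMscore calculation locally instead of submitting it. The function names are hypothetical and how AMPLE would wire this in is not specified here; the environment variables are the job ID variables set by Slurm, PBS/Torque, SGE and LSF.

```python
import os

# Job ID variables set inside running jobs by common schedulers
# (Slurm, PBS/Torque, SGE, LSF).
BATCH_JOB_VARS = ("SLURM_JOB_ID", "PBS_JOBID", "JOB_ID", "LSB_JOBID")


def inside_batch_job(environ=None):
    """Return True if any known scheduler job ID variable is set."""
    environ = os.environ if environ is None else environ
    return any(environ.get(var) for var in BATCH_JOB_VARS)


def choose_run_mode(submit_requested, environ=None):
    """Return 'submit' only when submission is requested and we are not already in a job."""
    if submit_requested and not inside_batch_job(environ):
        return "submit"
    return "local"


if __name__ == "__main__":
    print(choose_run_mode(True))
```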

Fix the from_single_model test case

Currently the from_single_model test case fails on Linux-4.4.0-92-generic-x86_64-with-debian-stretch-sid with CCP4 version: 7.0.44 with the error:

Error creating ensembles: Not all column labels are in your CSV file

Missing Python exception in Traceback

The Python exception (+ message) is missing in the traceback.

2018-08-23 11:01:41,187 - root - DEBUG - AMPLE EXITING AT2...
  File "/home/felix/develop/ccp4-7.0/lib/python2.7/runpy.py", line 162, in _run_module_as_main 
    "__main__", fname, loader, pkg_name)
  File "/home/felix/develop/ccp4-7.0/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/felix/develop/ccp4-7.0/lib/py2/ample/__main__.py", line 18, in <module>
    main.Ample().main() 
  File "/home/felix/develop/ccp4-7.0/lib/py2/ample/main.py", line 102, in main
    self.ensembling(amopt.d)
  File "/home/felix/develop/ccp4-7.0/lib/py2/ample/main.py", line 214, in ensembling
    exit_util.exit_error(msg)
  File "/home/felix/develop/ccp4-7.0/lib/py2/ample/util/exit_util.py", line 83, in exit_error 
    traceback_str = "".join(traceback.format_stack())
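
The last frame above shows the traceback string being built with `traceback.format_stack()`, which only records the call stack and never includes the exception type or message. A minimal sketch of an alternative is given below, assuming the exit routine is called while an exception is being handled; the helper name is hypothetical.

```python
import sys
import traceback


def format_exit_traceback():
    """Return a traceback string that includes the active exception, if there is one."""
    exc_type, _, _ = sys.exc_info()
    if exc_type is None:
        # No exception is being handled: fall back to the plain call stack.
        return "".join(traceback.format_stack())
    # Inside an except block: includes the exception type and message.
    return traceback.format_exc()


if __name__ == "__main__":
    try:
        raise RuntimeError("Error creating ensembles")
    except RuntimeError:
        print(format_exit_traceback())
```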

Modular "modeller" sub-package

The idea is to have one subpackage for all modelling-related work. This should probably include modules for the various fragment picking and ab initio folding programs, with the necessary functions to link them.

Features to include

  • Script to execute ab initio modelling separately
  • Support of Rosetta and Saint2
  • Support of NNmake and Flib
  • Basic model assessment (ranking of models based on energy scores, etc.)

negative sequence number start crashes spicker, at least in Concoord mode

When pointing AMPLE to a directory of models to cluster (in this case from Concoord, but I assume generally), Spicker crashes if the first residue number is -1 (or, I assume, anything negative). Renumbering to start from 1 runs fine.

I infer that this does not apply with the -homologs true flag on since the structure in question was acceptable in previous jobs based on multiple distant homologs.

Also worth noting for the record, that MSE residues and alternative conformations confuse Concoord. AMPLE should ideally have the option, at least via GUI2, to sort these things out for the user and run Concoord as part of a Concoord-mode job.
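
As an illustration of the renumbering workaround, the sketch below rewrites the residue number columns of ATOM/HETATM records so that each chain starts from 1. It is a plain-Python stand-in, not the `renumber_residues` function in pdb_edit.py; restarting the numbering for each chain and blanking insertion codes are assumptions of this sketch.

```python
def renumber_from(lines, start=1):
    """Renumber residues in PDB ATOM/HETATM lines, restarting at `start` for each chain."""
    out = []
    prev_chain = None
    prev_key = None
    number = start
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 27:
            chain = line[21]
            key = line[21:27]  # chain ID + residue number + insertion code
            if chain != prev_chain:
                number = start
            elif key != prev_key:
                number += 1
            prev_chain, prev_key = chain, key
            # Columns 23-26 hold the residue number; column 27 (insertion code) is blanked.
            line = line[:22] + "%4d " % number + line[27:]
        out.append(line)
    return out


if __name__ == "__main__":
    template = "ATOM  %5d  CA  ALA A%4d    %8.3f%8.3f%8.3f"
    demo = [template % (i + 1, resseq, 0.0, 0.0, 0.0) for i, resseq in enumerate((-1, 0, 1))]
    print("\n".join(renumber_from(demo)))
```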

Refactor TMscoring for centroid models

Problem

The subcluster_centroid_model TMscoring is currently done on a per-model basis. This means we need to create a new object, manipulate the PDB and execute the comparison for each model individually.

This creates very long runtimes for benchmarking on the cluster and often causes timeouts when the queue is busy.

Proposed Fix

Collect all the data for the subcluster_centroid models prior to analysing them. Store TMscore information in a dict or similar structure and take values from that for the benchmark_results list.
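
The proposed fix could look roughly like the sketch below. Each distinct centroid model is scored once, the scores are kept in a dict, and the benchmark results are filled in from that dict. The `score_fn` callable and the `subcluster_centroid_model_tm` key are placeholders invented for this example; only `subcluster_centroid_model` and `benchmark_results` are names taken from the issue.

```python
def add_centroid_tmscores(benchmark_results, score_fn):
    """Score each distinct centroid model once and copy the score into every result.

    `score_fn` is a hypothetical callable that takes a model path and returns a TM-score.
    """
    models = {
        result["subcluster_centroid_model"]
        for result in benchmark_results
        if result.get("subcluster_centroid_model")
    }
    # One comparison per distinct model rather than one per result entry.
    scores = {model: score_fn(model) for model in sorted(models)}
    for result in benchmark_results:
        model = result.get("subcluster_centroid_model")
        result["subcluster_centroid_model_tm"] = scores.get(model)
    return scores


if __name__ == "__main__":
    results = [
        {"subcluster_centroid_model": "c1.pdb"},
        {"subcluster_centroid_model": "c1.pdb"},
        {"subcluster_centroid_model": "c2.pdb"},
    ]
    print(add_centroid_tmscores(results, score_fn=len))
```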

Integration of ConKit

Substitute all the contact-prediction-related code in AMPLE with a module that interacts with ConKit.

Features to include:

  • new module interacting with ConKit
    • ample.util.contact_util
  • command-line flags for contact file format
    • -contact_file + -contact_format
  • decoy subselection mode
    • -subselect_mode
    • help menu suppressed for now
  • removed redundant contact-related parsers and scripts
  • integration test(s) for new functionality

Check that the fasta sequence and residues in the benchmark pdb match

If the sequence given to ROSETTA to model and that in the native pdb file (supplied with the -native_pdb flag) don't match, then the benchmarking will crash with:

INFO: Using single structure provided for all model comparisons
INFO: Direct comparison of models and structures
----------------------------EICV-TA
TSSVLRSPMPGVVVAVSVKPGDAVAEGQEICVIEA
Traceback (most recent call last):
  File "/home/jmht/ccp4-7.0/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/jmht/ccp4-7.0/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/jmht/ample.git/ample/util/tm_util.py", line 658, in <module>
    main()
  File "/home/jmht/ample.git/ample/util/tm_util.py", line 642, in main
    tmapp.compare_structures(args.models, args.structures, fastas=args.fastas, all_vs_all=args.allvall)
  File "/home/jmht/ample.git/ample/util/tm_util.py", line 437, in compare_structures
    pdb_combo = self._mod_structures(model_aln, structure_aln, model, structure)
  File "/home/jmht/ample.git/ample/util/tm_util.py", line 510, in _mod_structures
    raise RuntimeError(msg % (model_name, structure_name))
RuntimeError: Differing residues in model and structure. Affected PDBs S_00000012_10_0dqz6csdh8_mod - 2JKU_std_std

We should check at the start that the sequence and the residues in the PDB file match.
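
A minimal sketch of such an up-front check is shown below, assuming the two gapped alignment strings (as printed in the log above) are already available. The function name is hypothetical. It reports every aligned column where both sequences have a residue but the residues differ, so the problem can be reported before any comparison is attempted.

```python
def mismatched_columns(model_aln, structure_aln, gap="-"):
    """Return (index, model_residue, structure_residue) for aligned columns that differ."""
    if len(model_aln) != len(structure_aln):
        raise ValueError("Aligned sequences must have the same length")
    return [
        (i, m, s)
        for i, (m, s) in enumerate(zip(model_aln, structure_aln))
        if m != gap and s != gap and m != s
    ]


if __name__ == "__main__":
    # The two alignment strings from the log output above (28 leading gap characters).
    model_aln = "-" * 28 + "EICV-TA"
    structure_aln = "TSSVLRSPMPGVVVAVSVKPGDAVAEGQEICVIEA"
    for index, model_res, structure_res in mismatched_columns(model_aln, structure_aln):
        print("Column %d: %s != %s" % (index, model_res, structure_res))
```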

no_contact_prediction option always tries to generate contacts when modelling is attempted

In the file ample/util/options_processor.py, the line:

if optd['contact_file'] or optd['bbcontacts_file'] or not optd["no_contact_prediction"]:

checks if contacts should be predicted. no_contact_prediction does not appear to be set anywhere, but when I run a job, the .ini file that is generated contains, for example:

ample_testing/nmr_remodel/debug.log:cmdline_flags : ['no_contact_prediction', 'rosetta_dir', 'name', 'nmr_process', 'frags_9mers', 'nmr_model_in', 'work_dir', 'nmr_remodel', 'no_gui', 'mtz', 'frags_3mers', 'fasta', 'nproc']

so no_contact_prediction has made it into the command-line flags and is False. The above test in options_processor.py is therefore always True if there are no contact files, so AMPLE is always trying to run contact predictions whenever it does modelling.
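
The behaviour described above can be reproduced in isolation. The sketch below evaluates the quoted condition with the default option values, and contrasts it with one possible stricter condition. The stricter version is only an assumption about the intended behaviour (use contacts only when a file is supplied and prediction has not been disabled), not a confirmed fix, and both function names are hypothetical.

```python
def uses_contacts_current(optd):
    """The condition quoted from ample/util/options_processor.py."""
    return bool(optd['contact_file'] or optd['bbcontacts_file'] or not optd["no_contact_prediction"])


def uses_contacts_strict(optd):
    """One possible alternative (an assumption): require a contact file and no opt-out."""
    has_file = bool(optd['contact_file'] or optd['bbcontacts_file'])
    return has_file and not optd["no_contact_prediction"]


if __name__ == "__main__":
    defaults = {"contact_file": None, "bbcontacts_file": None, "no_contact_prediction": False}
    print(uses_contacts_current(defaults))  # True: prediction is attempted with no contact files
    print(uses_contacts_strict(defaults))   # False under the stricter reading
```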
