The ample from rigdenlab

Add Phase error calculation for all post-MR structures during benchmarking

This is currently half-done and needs work.

Add integration tests that use the spicker_tm and spicker_omp binaries

Use cctbx for getting sequence in util/spicker.py

Change creation of the spicker seq.dat file to use the cctbx data structure rather than the ATOM regular expression

docs setup

Setup ReadTheDocs and Sphinx documentation generation

Features to include:

Refactor pdb_edit.py

The purpose of this issue is for me to keep track of the refactored functions. A list of currently implemented functions is below:

All ticked methods contain docstrings, were modified and only require final checks.

Deleted functions

~~def extract_header_pdb_code(pdb_input):~~
~~def extract_header_title(pdb_input):~~
~~def _parse_rwcontents(logfile):~~
~~def _run_rwcontents(pdbin, logfile):~~
~~def reliable_sidechains_cctbx(pdbin=None, pdbout=None ):~~
~~def Xselect_residues(inpath=None, outpath=None, residues=None):~~
~~def Xsplit(pdbin):~~
~~def Xstd_residues(pdbin, pdbout ):~~
~~def xyz_coordinates(pdbin):~~
~~def _xyz_coordinates(hierarchy):~~
~~def xyz_cb_coordinates(pdbin):~~
~~def _xyz_cb_coordinates(hierarchy):~~
~~def _xyz_atom_coords(atom_group):~~

We want the 'sweet spot' ones (20-40% truncation) to run first. With 190 clusters sparsely sampled it seems that after cluster 1 sweet spot, AMPLE does others in cluster 1. We would probably prefer that it sampled across the sweet spots of all clusters rather than first covering all of cluster 1?

Fix printing of cluster_score_type when using tm scoring

Log files do not contain Cluster score type information:

Ensemble Results
----------------

Cluster method: spicker
Cluster score type: None       <-- HERE
Number of clusters: 3
Truncation method: percent
Percent truncation: 5
Side-chain treatments: ['polyAla', 'reliable', 'allatom']

Check if we can support RFREE flags in mtz files that don't contain the word 'free' in the label

Make 1000 models default in GUI

Currently 500 models

Bug relating to missing CRYST1 header line

AMPLE crashes when the CRYST1 header line is missing from the native pdb file provided via -native_pdb.

Missing chain in decoys terminates ensembling

If the chain is missing in ab initio structure prediction files, AMPLE terminates at two distinct positions. This currently occurs when trialling AMPLE with FragFold decoys.

An incorrect spicker.dat file is generated.

Problematic code is pattern = re.compile('^ATOM\s*(\d*)\s*(\w*)\s*(\w*)\s*(\w)\s*(\d*)\s*(\d*)\s') in ample/util/spicker.py.

Gesamt cannot extract atoms

Gesamt searches for chain atom 1 in chain A.

[ ... ]

 ... reading file '<ROOT>/AMPLE_0/ensemble_workdir/cluster_1/tlevel_100/model_283_100.pdb', selection '/1/A':
          0 atoms selected

[ ... ]

 ALIGNMENT ERROR 4
 ===========================================================
 Gesamt:  Normal termination

An example ATOM line looks like this

ATOM      1  N   ALA     1      -3.190  -3.778  10.166  1.00 13.71
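A minimal sketch of why the pattern quoted above misbehaves on chainless decoys. The regular expression is the one from the issue; `atom_line` and `chain_and_resseq` are hypothetical helpers written for this illustration, and the fixed-column read is shown only for contrast (the related issue above proposes cctbx as the actual replacement).

```python
import re

# Pattern quoted in the issue (ample/util/spicker.py).
pattern = re.compile(r'^ATOM\s*(\d*)\s*(\w*)\s*(\w*)\s*(\w)\s*(\d*)\s*(\d*)\s')


def atom_line(chain_id):
    """Build a fixed-column ATOM record (hypothetical helper for this demo)."""
    return ("ATOM  %5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f"
            % (1, " N", "", "ALA", chain_id, 1, "",
               -3.190, -3.778, 10.166, 1.00, 13.71))


def chain_and_resseq(line):
    """Read chain ID and residue number from their fixed PDB columns."""
    chain = line[21].strip()      # column 22; empty string if blank
    resseq = int(line[22:26])     # columns 23-26
    return chain, resseq


no_chain = atom_line(" ")    # same layout as the FragFold line above
with_chain = atom_line("A")

# The regex still matches the chainless line, but the groups shift:
# the "chain" group swallows the residue number and the next group is empty.
m = pattern.match(no_chain)
print(repr(m.group(4)), repr(m.group(5)))   # '1' ''

m = pattern.match(with_chain)
print(repr(m.group(4)), repr(m.group(5)))   # 'A' '1'

# Fixed columns give the same residue number either way.
print(chain_and_resseq(no_chain))     # ('', 1)
print(chain_and_resseq(with_chain))   # ('A', 1)
```

Because the match succeeds silently with shifted groups, the failure only shows up later as a wrong sequence file rather than as an error at parse time.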

Add chain length to results output to benchmark/results.csv

Readthedocs missing command line options

Command line options are not showing up in the online documentation: https://ample.readthedocs.io/en/latest/api/cloptions.html

distant homolog superposition mode now polyAla only

I'm pretty sure using multiple distant homologs used to generate ensembles with the original side chains as well? I'm just running a job now and all the search models are polyAla. I guess this is a by-product of switching to Jens's sparse sampling mode? I think we want the default with -homologs True on to be to generate both poly-Ala and all side chain ensembles.

Stop printing of results of polling MRBUMP directory

This is causing us to generate unnecessarily huge log files

TMAnalysis is submitted to the cluster queuing system in benchmarking

When running benchmarking using the TMscore binary, the calculation of the TMscores is submitted as a job to the queuing system - ample/util/tm_util.py:comparision ~ line 200.

On a cluster, the benchmarking job is already running under the queuing system, and many clusters don't allow job submission from the compute nodes, so this causes the benchmarking to crash.

Check for missing header information in native_pdb

The native pdb needs to contain header information, such as CRYST1 fields. We should check these are present when the job starts, not fail in benchmarking
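One way the up-front check could look, as a rough sketch. The function name `missing_header_records` is hypothetical, and treating CRYST1 as the only required record is an assumption; the issue only says "such as CRYST1".

```python
def missing_header_records(pdb_lines, required=("CRYST1",)):
    """Return the required record names that do not occur in the PDB lines.

    `required` defaults to CRYST1 only (assumption; extend as needed).
    """
    # The record name occupies the first six columns of every PDB line.
    found = {line[:6].strip() for line in pdb_lines}
    return [name for name in required if name not in found]


with_header = [
    "CRYST1   50.000   60.000   70.000  90.00  90.00  90.00 P 21 21 21    4",
    "ATOM      1  N   ALA A   1      -3.190  -3.778  10.166  1.00 13.71",
]
without_header = with_header[1:]

print(missing_header_records(with_header))     # []
print(missing_header_records(without_header))  # ['CRYST1']
```

Running this when the options are processed would turn the late benchmarking crash into an early, readable error.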

Fix the from_single_model test case

Currently the from_single_model test case fails on Linux-4.4.0-92-generic-x86_64-with-debian-stretch-sid with CCP4 version: 7.0.44 with the error:

Error creating ensembles: Not all column labels are in your CSV file

Parallelise the truncation/subclustering abinitio ensembling stage

Currently the truncation/subclustering stages proceed one after the other - there's no reason these could not be run in parallel for each truncation threshold

Missing Python exception in Traceback

The Python exception (+ message) is missing in the traceback.

2018-08-23 11:01:41,187 - root - DEBUG - AMPLE EXITING AT2...
  File "/home/felix/develop/ccp4-7.0/lib/python2.7/runpy.py", line 162, in _run_module_as_main 
    "__main__", fname, loader, pkg_name)
  File "/home/felix/develop/ccp4-7.0/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/felix/develop/ccp4-7.0/lib/py2/ample/__main__.py", line 18, in <module>
    main.Ample().main() 
  File "/home/felix/develop/ccp4-7.0/lib/py2/ample/main.py", line 102, in main
    self.ensembling(amopt.d)
  File "/home/felix/develop/ccp4-7.0/lib/py2/ample/main.py", line 214, in ensembling
    exit_util.exit_error(msg)
  File "/home/felix/develop/ccp4-7.0/lib/py2/ample/util/exit_util.py", line 83, in exit_error 
    traceback_str = "".join(traceback.format_stack())
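The last frame above shows `exit_error` building the string with `traceback.format_stack()`, which formats the current call stack only and carries no exception information. A small sketch of the difference from `traceback.format_exc()`; the `fail` function and its message are made up for the demonstration.

```python
import traceback


def fail():
    # Stand-in for whatever raises inside ensembling (hypothetical).
    raise ValueError("ensembling failed")


try:
    fail()
except ValueError:
    # Call stack at this point: no exception type, no message.
    stack_only = "".join(traceback.format_stack())
    # Traceback of the exception being handled, ending with type and message.
    with_exception = traceback.format_exc()

print(with_exception.strip().splitlines()[-1])  # ValueError: ensembling failed
```

`format_exc()` only has something to report while an exception is being handled, so whether it can be used depends on `exit_error` being called from inside an `except` block.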

ReadTheDocs _static path not found

The _static path in RTFD cannot be found.

Modular "modeller" sub-package

The idea is to have one subpackage for all modelling-related work. This should probably include modules for the various fragment-picking and ab initio folding programs, with the functions needed to link them.

Features to include

Script to execute ab initio modelling separately
Support of Rosetta and Saint2
Support of NNmake and Flib
Basic model assessment (ranking of models based on energy scores, etc.)

Check input sequence for unusual characters

FASTA sequences with * characters at the end are accepted. Post-modelling this results in "different" sequences in the decoys (w/o *) and input (w/ *).
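A possible check, sketched under the assumption that only the 20 standard one-letter codes should be accepted (whether codes such as X or U should pass is left open). `unusual_characters` is a hypothetical name.

```python
# 20 standard amino acids (assumption: nothing else is valid input).
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def unusual_characters(sequence):
    """Return the sorted characters in `sequence` that are not standard residues."""
    return sorted(set(sequence.upper()) - VALID_RESIDUES)


print(unusual_characters("MKVLAA*"))  # ['*']
print(unusual_characters("MKVLAA"))   # []
```

A non-empty result could either abort the job with a clear message or, for a trailing `*` only, be stripped with a warning.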

Integrate PyJOB in AMPLE

This is a thread to monitor all relevant PRs and commits for the full integration of fsimkovic/pyjob in AMPLE.

Update CCP4i2 interface

Need to add an option for coiled-coil proteins

negative sequence number start crashes spicker, at least in Concoord mode

When pointing AMPLE to a directory of models to cluster (in this case from concoord, but I assume generally) Spicker crashes if the first residue number is -1 (or I assume anything negative). Renumbering to start from 1 runs fine.

I infer that this does not apply with the -homologs true flag on since the structure in question was acceptable in previous jobs based on multiple distant homologs.

Also worth noting for the record, that MSE residues and alternative conformations confuse Concoord. AMPLE should ideally have the option, at least via GUI2, to sort these things out for the user and run Concoord as part of a Concoord-mode job.
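The renumbering workaround mentioned above could be automated along these lines. This is a sketch only: `renumber_from_one` is a hypothetical helper, it assumes a single chain in fixed-column PDB format, and it does not touch TER or ANISOU records.

```python
def renumber_from_one(pdb_lines):
    """Shift residue numbers so the first ATOM/HETATM residue becomes 1."""
    offset = None
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            resseq = int(line[22:26])          # columns 23-26
            if offset is None:
                offset = 1 - resseq            # fixed by the first residue seen
            line = line[:22] + "%4d" % (resseq + offset) + line[26:]
        out.append(line)
    return out


# Three CA atoms numbered -1, 0, 1 (made-up coordinates).
template = "ATOM  %5d  CA  ALA A%4d    %8.3f%8.3f%8.3f"
models = [template % (i + 1, n, 0.0, 0.0, float(i))
          for i, n in enumerate((-1, 0, 1))]

renumbered = renumber_from_one(models)
print([int(line[22:26]) for line in renumbered])  # [1, 2, 3]
```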

Refactor TMscoring for centroid models

Problem

The subcluster_centroid_model TMscoring is currently done on a per-model basis. This means we need to create a new object, manipulate the PDB and execute the comparison for each model individually.

This creates very long runtimes for benchmarking on the cluster and often causes timeouts when the queue is busy.

Proposed Fix

Collect all the data for the subcluster_centroid models prior to analysing them. Store TMscore information in a dict or similar structure and take values from that for the benchmark_results list.
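A rough sketch of the proposed shape, not the actual AMPLE code. `score_models` stands in for a single batched TMscore run and is hypothetical, as are the dictionary keys `subcluster_centroid_model` and `subcluster_centroid_model_TM`.

```python
def collect_tm_scores(models, score_models):
    """Score all models in one call and index the results by model path.

    `score_models` is a hypothetical callable: it takes a list of model
    paths and returns (path, tm_score) pairs.
    """
    return dict(score_models(list(models)))


def add_tm_scores(benchmark_results, tm_scores):
    """Copy cached scores into each result dict; unknown models get None."""
    for result in benchmark_results:
        model = result["subcluster_centroid_model"]
        result["subcluster_centroid_model_TM"] = tm_scores.get(model)
    return benchmark_results


def fake_scorer(paths):
    # Placeholder scores so the sketch runs without the TMscore binary.
    return [(path, 0.5) for path in paths]


results = [{"subcluster_centroid_model": "c1.pdb"},
           {"subcluster_centroid_model": "c2.pdb"},
           {"subcluster_centroid_model": "c1.pdb"}]

# Score each distinct centroid once, then look the values up per result.
unique_models = sorted({r["subcluster_centroid_model"] for r in results})
scores = collect_tm_scores(unique_models, fake_scorer)
add_tm_scores(results, scores)
print(results[0]["subcluster_centroid_model_TM"])  # 0.5
```

Scoring the distinct centroids once also avoids repeating the comparison when several results share a centroid model.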

Integration of ConKit

Substitute all the contact-prediction-related code in AMPLE with a module that interacts with ConKit.

Features to include:

new module interacting with ConKit
- ample.util.contact_util
command-line flags for contact file format
- -contact_file + -contact_format
decoy subselection mode
- -subselect_mode
- help menu suppressed for now
removed redundant contact-related parsers and scripts
integration test(s) for new functionality

Check that the fasta sequence and residues in the benchmark pdb match

If the sequence given to ROSETTA to model and that in the native pdb file (supplied with the -native_pdb flag) don't match, then the benchmarking will crash with:

INFO: Using single structure provided for all model comparisons
INFO: Direct comparison of models and structures
----------------------------EICV-TA
TSSVLRSPMPGVVVAVSVKPGDAVAEGQEICVIEA
Traceback (most recent call last):
  File "/home/jmht/ccp4-7.0/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/jmht/ccp4-7.0/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/jmht/ample.git/ample/util/tm_util.py", line 658, in <module>
    main()
  File "/home/jmht/ample.git/ample/util/tm_util.py", line 642, in main
    tmapp.compare_structures(args.models, args.structures, fastas=args.fastas, all_vs_all=args.allvall)
  File "/home/jmht/ample.git/ample/util/tm_util.py", line 437, in compare_structures
    pdb_combo = self._mod_structures(model_aln, structure_aln, model, structure)
  File "/home/jmht/ample.git/ample/util/tm_util.py", line 510, in _mod_structures
    raise RuntimeError(msg % (model_name, structure_name))
RuntimeError: Differing residues in model and structure. Affected PDBs S_00000012_10_0dqz6csdh8_mod - 2JKU_std_std

We should check that the sequence and residues in the PDB file match at the start
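A simplified sketch of such a start-up check. It assumes a single-chain native with no internal gaps, so the residues in the PDB must appear as one contiguous stretch of the FASTA sequence; a native with missing loops would need a proper alignment instead. All function names here are hypothetical.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_sequence(pdb_lines):
    """One-letter sequence of the residues present in the ATOM records."""
    seq, last = [], None
    for line in pdb_lines:
        if line.startswith("ATOM"):
            key = (line[21], line[22:27])   # chain ID, resSeq + insertion code
            if key != last:
                seq.append(THREE_TO_ONE.get(line[17:20], "X"))
                last = key
    return "".join(seq)


def native_matches_fasta(fasta_seq, pdb_lines):
    """True if the native residues form a contiguous stretch of the FASTA."""
    return pdb_sequence(pdb_lines) in fasta_seq


template = "ATOM  %5d  CA  %3s A%4d    %8.3f%8.3f%8.3f"
native = [template % (i + 1, name, i + 1, 0.0, 0.0, 0.0)
          for i, name in enumerate(("GLU", "ILE", "CYS", "VAL"))]

print(pdb_sequence(native))                       # EICV
print(native_matches_fasta("GQEICVIEA", native))  # True
print(native_matches_fasta("GQEICTA", native))    # False
```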

add mrbump NMASU to ample dictionary

Currently there is no easy way to work out how many copies phaser was attempting to place

Missing phaser_exe command line argument

We currently cannot provide a different phaser executable to AMPLE. Is this intended behaviour @linucks ?

When submitting to a cluster queueing system the ensembler_timeout is too short

ensembler_timeout (set in ample/include.ini) is currently 3600s, which is too short if we are using TM scoring with SPICKER.

Clean up mixed CamelCase and snake_case in files - eg. mrbump_util.py

As per title

no_contact_prediction option always tries to generate contacts when modelling is attempted

In the file ample/util/options_processor.py, the line:

if optd['contact_file'] or optd['bbcontacts_file'] or not optd["no_contact_prediction"]:

checks if contacts should be predicted. no_contact_prediction does not appear to be set anywhere, but when I run a job, the .ini file that is generated contains, for example:

ample_testing/nmr_remodel/debug.log:cmdline_flags : ['no_contact_prediction', 'rosetta_dir', 'name', 'nmr_process', 'frags_9mers', 'nmr_model_in', 'work_dir', 'nmr_remodel', 'no_gui', 'mtz', 'frags_3mers', 'fasta', 'nproc']

so no_contact_prediction has made it into the command-line flags and is False. The above test in options_processor.py is therefore always True if there are no contact files, so AMPLE is always trying to run contact predictions whenever it does modelling.
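The behaviour described above follows directly from the quoted condition. The sketch below wraps that condition in a hypothetical function, `should_predict_contacts`, and evaluates it for the option values reported in the issue.

```python
def should_predict_contacts(optd):
    """The test quoted from ample/util/options_processor.py, wrapped for clarity."""
    return bool(optd['contact_file'] or optd['bbcontacts_file']
                or not optd["no_contact_prediction"])


# Values as reported: no contact files, flag present but False.
reported = {'contact_file': None, 'bbcontacts_file': None,
            'no_contact_prediction': False}
print(should_predict_contacts(reported))   # True

# Only an explicit True with no contact files switches prediction off.
disabled = dict(reported, no_contact_prediction=True)
print(should_predict_contacts(disabled))   # False
```

With the flag defaulting to False, `not optd["no_contact_prediction"]` is True on its own, so the two file checks never get a chance to make the result False.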
