rigdenlab / ample
Ab initio Modelling of Proteins for moLEcular replacement
Home Page: http://ample.rtfd.io
License: BSD 3-Clause "New" or "Revised" License
This is currently half-done and needs work.
Change the creation of the spicker seq.dat file to use the cctbx data structure rather than the ATOM regular expression
Setup ReadTheDocs and Sphinx documentation generation
The purpose of this issue is for me to keep track of the refactored functions. A list of currently implemented functions is below:
All ticked methods contain docstrings, were modified and only require final checks.
def backbone(inpath=None, outpath=None):
def calpha_only(inpdb, outpdb):
def check_pdb_directory(directory,single=True,allsame=True,sequence=None):
def check_pdbs(models,single=True,allsame=True,sequence=None):
def extract_chain(inpdb, outpdb, chainID=None, newChainID=None, cAlphaOnly=False, renumber=True ):
def extract_model(inpdb, outpdb, modelID ):
def keep_matching(refpdb=None, targetpdb=None, outpdb=None, resSeqMap=None ):
def _keep_matching(refpdb=None, targetpdb=None, outpdb=None, resSeqMap=None ):
def get_info(inpath):
def match_resseq(targetPdb=None, outPdb=None, resMap=None, sourcePdb=None ):
def merge(pdb1=None, pdb2=None, pdbout=None ):
def molecular_weight(pdbin):
def most_prob(hierarchy, always_keep_one_conformer=True):
def num_atoms_and_residues(pdbin,first=False):
def _parse_modres(modres_text):
def prepare_nmr_model(nmr_model_in,models_dir):
def reliable_sidechains(inpath=None, outpath=None ):
def rename_chains(inpdb=None, outpdb=None, fromChain=None, toChain=None ):
def resseq(pdbin):
def _resseq(hierarchy):
def renumber_residues(pdbin, pdbout, start=1):
def _renumber(hierarchy, start):
def renumber_residues_gaps(pdbin, pdbout, gaps, start=1):
def _rog_side_chain_treatment(hierarchy, scores, del_orange):
def rog_side_chain_treatment(pdbin, pdbout, rog_data, del_orange=False):
def select_residues(pdbin, pdbout, chain_id=None, delete=None, tokeep=None, delete_idx=None, tokeep_idx=None, raw=False):
def sequence(pdbin):
def _sequence(hierarchy):
def _sequence1(hierarchy):
def sequence_data(pdbin):
def _sequence_data(hierarchy):
def split_pdb(pdbin, directory=None):
def split_into_chains(pdbin, chain=None, directory=None):
def standardise(pdbin, pdbout, chain=None, del_hetatm=False):
def std_residues_cctbx(pdbin, pdbout, del_hetatm=False):
def std_residues(pdbin, pdbout, del_hetatm=False):
def strip(pdbin, pdbout, hetatm=False, hydrogen=False, atom_types=[]):
def _strip(hierachy, hetatm=False, hydrogen=False, atom_types=[]):
def to_single_chain(inpath, outpath):
def translate(inpdb=None, outpdb=None, ftranslate=None):
def extract_header_pdb_code(pdb_input):
def extract_header_title(pdb_input):
def _parse_rwcontents(logfile):
def _run_rwcontents(pdbin, logfile):
def reliable_sidechains_cctbx(pdbin=None, pdbout=None ):
def Xselect_residues(inpath=None, outpath=None, residues=None):
def Xsplit(pdbin):
def Xstd_residues(pdbin, pdbout ):
def xyz_coordinates(pdbin):
def _xyz_coordinates(hierarchy):
def xyz_cb_coordinates(pdbin):
def _xyz_cb_coordinates(hierarchy):
def _xyz_atom_coords(atom_group):
We want the 'sweet spot' ones (20-40% truncation) to run first. With 190 clusters sparsely sampled it seems that after cluster 1 sweet spot, AMPLE does others in cluster 1. We would probably prefer that it sampled across the sweet spots of all clusters rather than first covering all of cluster 1?
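The ordering described above could be sketched as a sort key. This is only an illustration: the dictionary fields `cluster` and `truncation_percent` and the function `order_ensembles` are hypothetical names, not AMPLE's own data structures.

```python
def order_ensembles(ensembles, low=20, high=40):
    """Return ensembles with 'sweet spot' truncation levels first.

    Sorting on truncation level before cluster number means that, inside
    each group, the same truncation level is visited for every cluster
    before moving on, rather than exhausting cluster 1 first.
    """
    def key(ens):
        in_sweet_spot = low <= ens["truncation_percent"] <= high
        # False sorts before True, so sweet-spot ensembles come first
        return (not in_sweet_spot, ens["truncation_percent"], ens["cluster"])

    return sorted(ensembles, key=key)


# Hypothetical input: two clusters, each with one sweet-spot ensemble
ensembles = [
    {"cluster": 1, "truncation_percent": 20},
    {"cluster": 1, "truncation_percent": 80},
    {"cluster": 2, "truncation_percent": 20},
    {"cluster": 2, "truncation_percent": 80},
]
ordered = order_ensembles(ensembles)
```

With this key, both 20% ensembles are queued before either 80% ensemble, regardless of cluster.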
Log files do not contain Cluster score type information:
Ensemble Results
----------------
Cluster method: spicker
Cluster score type: None <-- HERE
Number of clusters: 3
Truncation method: percent
Percent truncation: 5
Side-chain treatments: ['polyAla', 'reliable', 'allatom']
Currently 500 models
AMPLE crashes when the CRYST1 header line is missing from the native pdb file provided via -native_pdb.
If the chain is missing in ab initio structure prediction files, AMPLE terminates at two distinct positions. This currently occurs when trialling AMPLE with FragFold decoys.
The first is when the spicker.dat file is generated. The problematic code is pattern = re.compile('^ATOM\s*(\d*)\s*(\w*)\s*(\w*)\s*(\w)\s*(\d*)\s*(\d*)\s') in ample/util/spicker.py.
The second is when Gesamt searches for chain atom 1 in chain A.
[ ... ]
... reading file '<ROOT>/AMPLE_0/ensemble_workdir/cluster_1/tlevel_100/model_283_100.pdb', selection '/1/A':
0 atoms selected
[ ... ]
ALIGNMENT ERROR 4
===========================================================
Gesamt: Normal termination
An example ATOM line looks like this:
ATOM 1 N ALA 1 -3.190 -3.778 10.166 1.00 13.71
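One way to cope with a blank chain ID is to read ATOM records by their fixed column positions from the PDB format instead of splitting on whitespace. The sketch below is an illustration only, not the code in ample/util/spicker.py; `parse_atom_line` is a hypothetical helper, and the sample line is rebuilt with standard PDB spacing because the spacing in the example above may not have survived copying.

```python
def parse_atom_line(line):
    """Parse a PDB ATOM/HETATM record using fixed column positions.

    A blank chain ID (column 22) is returned as None instead of
    shifting the remaining fields.
    """
    if not line.startswith(("ATOM  ", "HETATM")):
        return None
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "resname": line[17:20].strip(),
        "chain": line[21].strip() or None,
        "resseq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


# Template that writes each field into its PDB column range
ATOM_TEMPLATE = (
    "ATOM  {:>5} {:<4} {:>3} {}{:>4}    {:>8.3f}{:>8.3f}{:>8.3f}{:>6.2f}{:>6.2f}"
)

# Same values as the example line, with no chain ID
line = ATOM_TEMPLATE.format(1, " N", "ALA", " ", 1, -3.190, -3.778, 10.166, 1.00, 13.71)
atom = parse_atom_line(line)
```

Column slicing also reads negative residue numbers, which a `\d*` group cannot match because it excludes the minus sign.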
Command line options are not showing up in the online documentation: https://ample.readthedocs.io/en/latest/api/cloptions.html
I'm pretty sure using multiple distant homologs used to generate ensembles with the original side chains as well? I'm just running a job now and all the search models are polyAla. I guess this is a by-product of switching to Jens's sparse sampling mode? I think we want the default with -homologs True on to be to generate both poly-Ala and all side chain ensembles.
This is causing us to generate unnecessarily huge log files
When running benchmarking using the TMscore binary, the calculation of the TMscores is submitted as a job to the queuing system - ample/util/tm_util.py:comparision ~ line 200.
On a cluster, the benchmarking job is already running under the queuing system, and many clusters don't allow resubmission from the nodes, so this causes the benchmarking to crash.
The native pdb needs to contain header information, such as CRYST1 fields. We should check these are present when the job starts, not fail in benchmarking
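An up-front check could be as small as the sketch below. The names `has_cryst1` and `check_native_pdb` are hypothetical, and where such a check would sit in AMPLE's option processing is not specified here.

```python
def has_cryst1(pdb_lines):
    """Return True if any line is a CRYST1 record."""
    return any(line.startswith("CRYST1") for line in pdb_lines)


def check_native_pdb(path):
    """Raise early if the native PDB file lacks a CRYST1 record."""
    with open(path) as fh:
        if not has_cryst1(fh):
            raise ValueError("Native PDB %s is missing a CRYST1 record" % path)
```

Calling `check_native_pdb` when the job starts would turn the late benchmarking crash into an immediate, readable error.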
Currently the from_single_model test case fails on Linux-4.4.0-92-generic-x86_64-with-debian-stretch-sid with CCP4 version: 7.0.44 with the error:
Error creating ensembles: Not all column labels are in your CSV file
Currently the truncation/subclustering stages proceed one after the other - there's no reason these couldn't be run in parallel for each truncation threshold
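A rough sketch of the idea, assuming the per-threshold work is independent: `process_threshold` is a hypothetical placeholder for the truncation and subclustering steps, not an AMPLE function. `concurrent.futures` is part of the Python 3 standard library, and a thread pool only helps if the heavy lifting happens in external programs or I/O.

```python
from concurrent.futures import ThreadPoolExecutor


def process_threshold(threshold):
    """Placeholder for truncation + subclustering at one threshold."""
    return {"threshold": threshold, "ensembles": []}


def process_all(thresholds, nproc=4, worker=process_threshold):
    """Run the worker for every threshold concurrently.

    pool.map returns results in the same order as the input thresholds.
    """
    with ThreadPoolExecutor(max_workers=nproc) as pool:
        return list(pool.map(worker, thresholds))


results = process_all([5, 10, 15, 20])
```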
The Python exception (+ message) is missing in the traceback.
2018-08-23 11:01:41,187 - root - DEBUG - AMPLE EXITING AT2...
File "/home/felix/develop/ccp4-7.0/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/home/felix/develop/ccp4-7.0/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/felix/develop/ccp4-7.0/lib/py2/ample/__main__.py", line 18, in <module>
main.Ample().main()
File "/home/felix/develop/ccp4-7.0/lib/py2/ample/main.py", line 102, in main
self.ensembling(amopt.d)
File "/home/felix/develop/ccp4-7.0/lib/py2/ample/main.py", line 214, in ensembling
exit_util.exit_error(msg)
File "/home/felix/develop/ccp4-7.0/lib/py2/ample/util/exit_util.py", line 83, in exit_error
traceback_str = "".join(traceback.format_stack())
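The last frame above calls `traceback.format_stack()`, which formats the current call stack only and carries no exception information. `traceback.format_exc()` formats the exception being handled, including its type and message. A small comparison with hypothetical function names:

```python
import traceback


def stack_only():
    """Format the call stack: no exception type or message is included."""
    try:
        raise ValueError("bad input")
    except ValueError:
        return "".join(traceback.format_stack())


def with_exception():
    """Format the active exception: ends with its type and message."""
    try:
        raise ValueError("bad input")
    except ValueError:
        return traceback.format_exc()
```

Whether `exit_error` is always called from inside an `except` block is not known from the log above; outside one, `format_exc()` has no exception to report.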
The _static path in RTFD cannot be found.
The idea is to have one subpackage for all modelling-related work. This should probably include modules for various fragment picking and ab initio folding software with necessary functions to link them.
FASTA sequences with * characters at the end are accepted. Post-modelling this results in "different" sequences in the decoys (w/o *) and input (w/ *).
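One possible remedy is to normalise the sequence when it is read. A minimal sketch, with `clean_sequence` as a hypothetical helper name:

```python
def clean_sequence(seq):
    """Remove whitespace and any trailing '*' terminator from a sequence."""
    return "".join(seq.split()).rstrip("*")


cleaned = clean_sequence("MKVLA*\n")
```

Only trailing `*` characters are stripped; one in the middle of a sequence is left alone so that it can still be reported as invalid.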
This is a thread to monitor all relevant PRs and commits for the full integration of fsimkovic/pyjob in AMPLE.
Need to add an option for coiled-coil proteins
When pointing AMPLE to a directory of models to cluster (in this case from concoord, but I assume generally) Spicker crashes if the first residue number is -1 (or I assume anything negative). Renumbering to start from 1 runs fine.
I infer that this does not apply with the -homologs true flag on since the structure in question was acceptable in previous jobs based on multiple distant homologs.
Also worth noting for the record, that MSE residues and alternative conformations confuse Concoord. AMPLE should ideally have the option, at least via GUI2, to sort these things out for the user and run Concoord as part of a Concoord-mode job.
The subcluster_centroid_model TMscoring is currently done on a per-model basis. This means we need to create a new object, manipulate the PDB and execute the comparison for each model individually.
This creates very long runtimes for benchmarking on the cluster and often causes timeouts when the queue is busy.
Collect all the data for the subcluster_centroid_models prior to analysing them. Store TMscore information in a dict or similar structure and take values from that for the benchmark_results list.
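The proposal could look roughly like the sketch below. Everything except the `subcluster_centroid_model` key is a hypothetical name, and `score_func` stands in for whatever runs the TMscore comparison; in practice it would score all models in one batch rather than one call per model.

```python
def collect_scores(models, score_func):
    """Score each distinct model once and return a {model: score} dict."""
    return {model: score_func(model) for model in set(models)}


def build_benchmark_results(ensembles, tm_scores):
    """Look scores up in the dict instead of re-running the comparison."""
    results = []
    for ens in ensembles:
        centroid = ens["subcluster_centroid_model"]
        results.append({"name": ens["name"], "tmscore": tm_scores.get(centroid)})
    return results


# Hypothetical data: two ensembles share the same centroid model
ensembles = [
    {"name": "e1", "subcluster_centroid_model": "m1.pdb"},
    {"name": "e2", "subcluster_centroid_model": "m1.pdb"},
    {"name": "e3", "subcluster_centroid_model": "m2.pdb"},
]
calls = []


def fake_score(model):
    calls.append(model)  # record how often the scorer runs
    return {"m1.pdb": 0.5, "m2.pdb": 0.25}[model]


centroids = [e["subcluster_centroid_model"] for e in ensembles]
tm_scores = collect_scores(centroids, fake_score)
benchmark_results = build_benchmark_results(ensembles, tm_scores)
```

Here the scorer runs twice for three ensembles, because the shared centroid is scored only once.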
Substitute all the contact-prediction-related code in AMPLE with a module that interacts with ConKit.
ample.util.contact_util
-contact_file
+ -contact_format
-subselect_mode
If the sequence given to ROSETTA to model and that in the native pdb file (supplied with the -native_pdb flag) don't match, then the benchmarking will crash with:
INFO: Using single structure provided for all model comparisons
INFO: Direct comparison of models and structures
----------------------------EICV-TA
TSSVLRSPMPGVVVAVSVKPGDAVAEGQEICVIEA
Traceback (most recent call last):
File "/home/jmht/ccp4-7.0/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/home/jmht/ccp4-7.0/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/jmht/ample.git/ample/util/tm_util.py", line 658, in <module>
main()
File "/home/jmht/ample.git/ample/util/tm_util.py", line 642, in main
tmapp.compare_structures(args.models, args.structures, fastas=args.fastas, all_vs_all=args.allvall)
File "/home/jmht/ample.git/ample/util/tm_util.py", line 437, in compare_structures
pdb_combo = self._mod_structures(model_aln, structure_aln, model, structure)
File "/home/jmht/ample.git/ample/util/tm_util.py", line 510, in _mod_structures
raise RuntimeError(msg % (model_name, structure_name))
RuntimeError: Differing residues in model and structure. Affected PDBs S_00000012_10_0dqz6csdh8_mod - 2JKU_std_std
We should check that the sequence and residues in the PDB file match at the start
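Such a check might compare the aligned sequences and report positions where both have a residue but the residues differ. In this sketch, `aligned_mismatches` is a hypothetical helper, and the first string assumes the two alignment lines in the log above have equal length (28 gap characters before EICV).

```python
def aligned_mismatches(seq_a, seq_b, gap="-"):
    """Return (index, a, b) for aligned positions with differing residues.

    Positions where either sequence has a gap are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("Aligned sequences must have equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != gap and b != gap and a != b
    ]


# The two aligned strings from the log above
seq_a = "-" * 28 + "EICV-TA"
seq_b = "TSSVLRSPMPGVVVAVSVKPGDAVAEGQEICVIEA"
mismatches = aligned_mismatches(seq_a, seq_b)
```

A non-empty result at start-up could be turned into a clear error message before any modelling or benchmarking is run.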
Currently there is no easy way to work out how many copies phaser was attempting to place
We currently cannot provide a different phaser executable to AMPLE. Is this intended behaviour @linucks ?
ensembler_timeout (set in ample/include.ini) is currently 3600s, which is too short if we are using TM scoring with SPICKER.
As per title
In the file ample/util/options_processor.py, the line:
if optd['contact_file'] or optd['bbcontacts_file'] or not optd["no_contact_prediction"]:
checks if contacts should be predicted. no_contact_prediction does not appear to be set anywhere, but when I run a job, the .ini file that is generated contains, for example:
ample_testing/nmr_remodel/debug.log:cmdline_flags : ['no_contact_prediction', 'rosetta_dir', 'name', 'nmr_process', 'frags_9mers', 'nmr_model_in', 'work_dir', 'nmr_remodel', 'no_gui', 'mtz', 'frags_3mers', 'fasta', 'nproc']
so no_contact_prediction has made it into the command-line flags and is False. The above test in options_processor.py is therefore always True if there are no contact files, so AMPLE is always trying to run contact predictions whenever it does modelling.
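The behaviour can be reproduced in isolation by evaluating the same condition on a plain dictionary. `contacts_condition` is a hypothetical wrapper around the quoted line, and the dictionaries are stand-ins for AMPLE's options, not values taken from a real run.

```python
def contacts_condition(optd):
    """Evaluate the condition quoted from options_processor.py."""
    return bool(
        optd['contact_file'] or optd['bbcontacts_file'] or not optd["no_contact_prediction"]
    )


# No contact files supplied and the flag left at False, as in the report
default_case = contacts_condition(
    {'contact_file': None, 'bbcontacts_file': None, 'no_contact_prediction': False}
)

# The condition is only False when the flag is explicitly True
disabled_case = contacts_condition(
    {'contact_file': None, 'bbcontacts_file': None, 'no_contact_prediction': True}
)
```

`default_case` is True and `disabled_case` is False, which matches the report: with no contact files, the branch is skipped only when no_contact_prediction is explicitly set to True.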