wrf / pdbcolor
Python code to color a PDB structure based on parameters from a multiple sequence alignment
Recently, I wanted to color a PDB structure in PyMOL using a self-defined parameter that I extracted for each amino acid. I felt lucky to find this code, which does the job perfectly.
However, I ran into some confusion while using the code.
After reading the README.md file, I found that I could use pdb_color_generic.py for my purpose.
To begin, I tried the exact same example, with 3rze.pdb and 3rze.map.rates_features.csv, to get familiar with the code.
I followed every step exactly as the instructions describe. However, when I ran the command "pdb_color_generic.py -c 4 -d , -p 3rze.pdb -i 3rze.map.rates_features.csv -l blue -g dnds --exclude-first-group > 3rze.color_by_dnds.pml", an error appeared: "TypeError: 'NoneType' object is not iterable".
The failing line was "for seqid in seqidlist".
The reason for this error seemed to be that seqidlist has to be supplied by the user.
In the get_chains_only function, the seqidlist parameter is taken from args.sequence, but args.sequence was not given in the command, so it kept its default value of None. Iterating over None in "for seqid in seqidlist" then raises the error.
In the parser definition, parser.add_argument("-s","--sequence", nargs="*", help="sequence ID for PDB, give multiple names if data is available in the input file") suggests that this argument is needed, yet the command above did not contain any "-s" option.
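The behavior described above can be reproduced in isolation. The sketch below rebuilds only the "-s" argument quoted from the parser and shows that argparse leaves args.sequence as None when the flag is absent. The safe_seqids helper is a hypothetical guard added for illustration and is not part of pdbcolor.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-s", "--sequence", nargs="*",
                    help="sequence ID for PDB, give multiple names if data is available in the input file")

# Simulate a command line that contains no -s option at all
args = parser.parse_args([])

# Iterating over the missing value reproduces the reported TypeError
try:
    for seqid in args.sequence:
        pass
    raised = False
except TypeError:
    raised = True

def safe_seqids(sequence):
    """Hypothetical guard: turn a missing -s value into an empty list."""
    return sequence if sequence is not None else []

for seqid in safe_seqids(args.sequence):
    pass  # the loop body is simply skipped when nothing was given
```

With such a guard the loop no longer crashes, but it also never matches any ID, so the sequence IDs would probably still need to be supplied for the script to produce useful output.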
I was not sure whether I ran the code the right way, or whether I missed something.
It seems that seqidlist should be a list of protein IDs, so that "for seqid in seqidlist" and "if seqid.find('proteinid') > -1" in the code can work.
After this, I simply defined a function to find those protein IDs in the 3rze.pdb file and put them into seqidlist. Then the code ran through.
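A helper of the kind mentioned above might look like the following sketch. It assumes that the protein IDs are the database ID codes stored in DBREF records (columns 43-54 of the fixed-width PDB format). The function name collect_dbref_ids and the example record are made up for illustration; they are not taken from pdbcolor or from 3rze.pdb.

```python
def collect_dbref_ids(pdb_lines):
    """Return the unique database ID codes found in DBREF records, in file order."""
    ids = []
    for line in pdb_lines:
        # Record name occupies columns 1-6; DBREF1/DBREF2 records are skipped
        if line[0:6].strip() != "DBREF":
            continue
        # Assumption: the ID code sits in columns 43-54 (0-based slice 42:54)
        proteinid = line[42:54].strip()
        if proteinid and proteinid not in ids:
            ids.append(proteinid)
    return ids

# Made-up DBREF record laid out on the fixed PDB columns (not real data)
demo_record = "DBREF  {:<4} {} {:>4}  {:>4}  {:<6} {:<8} {:<12} {:>5}  {:>5} ".format(
    "1ABC", "A", 1, 220, "UNP", "P00000", "DEMO_HUMAN", 1, 220)

seqidlist = collect_dbref_ids([demo_record, "ATOM      1  N   MET A   1"])
```

For a real file, the same function could be fed an open file handle, for example collect_dbref_ids(open("3rze.pdb")), since it only iterates over lines.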
###-----------------------------------------------------------------------------------------------------------------------------###
Analyzing the code further, I was confused by the function read_generic_data:
(partial code)
if type(chaincolumn) is int:
    chain = lsplits[int(chaincolumn)]
else:
    chain = "A"
The check expects chaincolumn to be an int; otherwise the 'else' branch sets the chain to the default 'A'.
However, args.chain_column is always a str, so this function always sets the variable "chain" to 'A', no matter what chain column you enter in the first place.
In my case, the input file has a chain column with the chain letter of each amino acid, such as 'A', 'B', 'C', etc. As a result, this function set all my chains to 'A'.
This problem can be fixed by adding 'type=int' to the parser.add_argument() call for '--chain-column'.
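The effect of that change can be checked with a small stand-alone sketch. Only the '--chain-column' option name and the if/else check come from the code discussed above; the help text, the pick_chain wrapper and the example row are assumptions made for illustration.

```python
import argparse

parser = argparse.ArgumentParser()
# type=int makes argparse convert the command-line string to an int
parser.add_argument("--chain-column", type=int,
                    help="index of the column holding the chain letter")

def pick_chain(lsplits, chaincolumn):
    """Mirror the check from read_generic_data quoted above."""
    if type(chaincolumn) is int:
        chain = lsplits[int(chaincolumn)]
    else:
        chain = "A"
    return chain

lsplits = "10,0.53,B".split(",")   # hypothetical row: residue, score, chain

args = parser.parse_args(["--chain-column", "2"])
chain_with_int = pick_chain(lsplits, args.chain_column)   # chain read from column 2
chain_with_str = pick_chain(lsplits, "2")                 # a str falls through to "A"
chain_unset = pick_chain(lsplits, None)                   # no column given, default "A"
```

Note that the index is 0-based here, because the quoted code indexes lsplits directly with the column number.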
###-----------------------------------------------------------------------------------------------------------------------------###
In the make_output_script function:
#assign whole chain to lowest color, then build up
wayout.write("color {}{:02d}, chain {}\n".format( basecolor, int(value*binname_correction), chain ) )
This part does not seem to achieve the purpose of setting the whole chain to the lowest color. The color index is int(value*binname_correction), and 'value' comes from the following loop:
for chain in keepchains.keys():
    chainoffset = refoffsets.get(chain, 0)
    scoregroups = defaultdict(list) # key is percent group, value is list of residues
    # for each residue, assign to a bin
    for residue in scoredict[chain].keys():
        residuescore = scoredict[chain].get(residue,0.00)
        for i,value in enumerate(binvalues[:-1]):
            upper = binvalues[i+1]
            if residuescore < upper:
                scoregroups[value].append(residue - chainoffset)
                break
Assume that keepchains.keys() contains four chain letters: 'A', 'B', 'C', and 'D'. In each iteration, say for 'A', every residue in scoredict['A'].keys() runs the innermost 'for' loop and generates its own 'i' and 'value'.
Thus, after every residue in chain 'A' has been traversed, 'i' and 'value' hold the values set by the last residue of chain 'A'. And after all the chains have been traversed, 'i' and 'value' hold the values set by the last residue of chain 'D'.
I do not know how to fix this yet. Since targetcolors could be reversed, finding the lowest color for the whole chain is another problem the function should deal with.
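The leftover loop variable can be demonstrated with a small sketch. The bin edges and scores below are hypothetical, and taking binvalues[0] as the lowest bin is only one possible approach; it does not address the case of reversed targetcolors.

```python
from collections import defaultdict

def bin_residues(scores, binvalues):
    """Group residue numbers by the lower edge of the bin that holds their score."""
    groups = defaultdict(list)
    for residue, score in scores.items():
        for lower, upper in zip(binvalues[:-1], binvalues[1:]):
            if score < upper:
                groups[lower].append(residue)
                break
    return groups

binvalues = [0.0, 0.25, 0.5, 0.75, 1.01]   # hypothetical bin edges
scores = {10: 0.1, 11: 0.6, 12: 0.9}       # hypothetical residue -> score

# Same pattern as the loop quoted above: 'value' survives the loop and
# keeps whatever the last residue left behind.
for residue, score in scores.items():
    for i, value in enumerate(binvalues[:-1]):
        if score < binvalues[i + 1]:
            break
leaked_value = value            # set by residue 12, not by the lowest bin

# Assumption: the lowest bin is the first bin edge, independent of any residue
lowest_value = binvalues[0]
groups = bin_residues(scores, binvalues)
```

Using lowest_value instead of the leaked 'value' in the "color ... chain" line would at least make the base color independent of the residue order.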
###-----------------------------------------------------------------------------------------------------------------------------###
Finally, I was confused about the purpose of the 'refoffsets' variable. What does this variable do? Does it record the correct positions of the amino acid residues in the PDB file? I could not tell.
In my case, I simply removed 'chainoffset' from the following lines:
[from make_output_script function]
for chain in keepchains.keys():
    chainoffset = refoffsets.get(chain, 0)
    scoregroups = defaultdict(list) # key is percent group, value is list of residues
    # for each residue, assign to a bin
    for residue in scoredict[chain].keys():
        residuescore = scoredict[chain].get(residue,0.00)
        for i,value in enumerate(binvalues[:-1]):
            upper = binvalues[i+1]
            if residuescore < upper:
                #scoregroups[value].append(residue - chainoffset)
                scoregroups[value].append(residue)
                break
After that, PyMOL could select the right residues I wanted to color in my PDB file.
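What the subtraction does can at least be shown mechanically. The sketch below applies the same "residue - chainoffset" arithmetic to hypothetical numbers; the shift_residues name and the offset of 28 are invented for illustration. Whether refoffsets is meant to convert between two numbering schemes is a guess that the code shown here does not confirm.

```python
def shift_residues(residues, chain, refoffsets):
    """Apply the same arithmetic as the loop above: residue - chainoffset."""
    chainoffset = refoffsets.get(chain, 0)   # 0 when the chain has no entry
    return [residue - chainoffset for residue in residues]

residues = [30, 31, 32]   # hypothetical residue numbers from the score table

unchanged = shift_residues(residues, "A", {})         # no offset recorded for chain A
shifted = shift_residues(residues, "A", {"A": 28})    # hypothetical offset of 28
```

If the residue numbers in the input file already match the numbering in the PDB file, any non-zero offset would move the selection to the wrong residues, which would be consistent with the fix described above.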
Firstly, thank you for this awesome tool! I am not really used to using Python, but I was hoping you could help me. I am trying to annotate the exons and introns onto the protein structure for the gene IFT140. I used
grep NM_014714.4 IFT140.gff3 > ift140.gff
to get the relevant gff file, but I get the following error:
Parsing GFF file IFT140.gff3 for None
Finished parsing IFT140.gff3, read 37 lines
WARNING COULD NOT FIND ANY EXONS FOR None
Found 0 exons, 0 bases for 0 residues
0 codons are out of phase
I have tried: python gff_cds_to_pymol_script.py -g IFT140.gff3 -i AF-Q96RY7-F1-model_v1.pdb > color_ppyr_ift140.pml
(where AF-Q96RY7-F1-model_v1.pdb is the PDB file for IFT140 taken from the AlphaFold website), but this gives me the same error as above.
I have also tried:
python gff_cds_to_pymol_script.py -g ift140.gff -i NM_014714.4 > color_ppyr_ift140.pml, but this gives me the same error as above.
Your help would be much appreciated.
All the best
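For the GFF question above, the log line "for None" suggests that no sequence ID reached the parser, and 37 lines were read without any exon being found. A minimal diagnostic sketch, independent of gff_cds_to_pymol_script.py, is to list which feature types and attributes the file actually contains. It assumes the standard nine tab-separated GFF3 columns; the function name summarize_gff and the demo rows are made up and are not real IFT140 annotations.

```python
from collections import Counter

def summarize_gff(lines):
    """Count feature types and collect the attribute strings of exon/CDS rows."""
    type_counts = Counter()
    exon_attributes = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue                      # not a complete GFF3 feature row
        feature_type, attributes = fields[2], fields[8]
        type_counts[feature_type] += 1
        if feature_type in ("exon", "CDS"):
            exon_attributes.append(attributes)
    return type_counts, exon_attributes

# Made-up rows for illustration only
demo_lines = [
    "##gff-version 3\n",
    "chrDemo\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=rna-DEMO_1\n",
    "chrDemo\tsrc\texon\t100\t300\t.\t+\t.\tID=exon-1;Parent=rna-DEMO_1\n",
    "chrDemo\tsrc\tCDS\t150\t300\t.\t+\t0\tID=cds-1;Parent=rna-DEMO_1\n",
]

type_counts, exon_attributes = summarize_gff(demo_lines)
```

Running summarize_gff(open("ift140.gff")) on the real file would show whether exon or CDS rows exist at all and which ID or Parent values they carry, which is the kind of ID the script would need to match.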