pdbcolor,wrf

Some confusions

Recently, I wanted to color a PDB structure in PyMOL by a self-defined parameter that I had extracted for each amino acid. I was lucky to find this code, which does exactly this job.
However, I ran into some confusion while using it.

After reading the README.md file, I found that I could use pdb_color_generic.py for my purpose.
To begin with, I tried the exact example given there, 3rze.pdb and 3rze.map.rates_features.csv, to get familiar with the code.

I followed every step exactly as the instructions describe. However, when I ran the command "pdb_color_generic.py -c 4 -d , -p 3rze.pdb -i 3rze.map.rates_features.csv -l blue -g dnds --exclude-first-group > 3rze.color_by_dnds.pml", an error was raised: "TypeError: 'NoneType' object is not iterable".
The failing line was "for seqid in seqidlist".
The error seems to occur because seqidlist has to be supplied by the user.

In the get_chains_only function, the seqidlist parameter is filled from args.sequence, but args.sequence was not given in the command, so it kept its default value of None. This causes the error when "for seqid in seqidlist" runs.

In the parser definition, parser.add_argument("-s","--sequence", nargs="*", help="sequence ID for PDB, give multiple names if data is available in the input file") suggests that this argument is expected, yet the command above does not contain any "-s" option.
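This matches how argparse behaves in general: an optional argument declared with nargs="*" and no default is None when the flag is absent, and a list when it is present. The sketch below reproduces that behavior with the same add_argument call; normalized_sequence_ids is a hypothetical helper for illustration, not part of pdbcolor.

```python
import argparse

# Same declaration as quoted above; no default is given, so argparse uses None.
parser = argparse.ArgumentParser()
parser.add_argument("-s", "--sequence", nargs="*",
                    help="sequence ID for PDB, give multiple names if data is available in the input file")


def normalized_sequence_ids(argv):
    """Hypothetical helper: return the -s values as a list, mapping a missing option to []."""
    args = parser.parse_args(argv)
    return args.sequence if args.sequence is not None else []


if __name__ == "__main__":
    print(parser.parse_args([]).sequence)            # flag absent
    print(normalized_sequence_ids([]))               # safe to iterate over
    print(normalized_sequence_ids(["-s", "HRH1_HUMAN"]))
```

With a guard of this kind, "for seqid in seqidlist" would iterate zero times instead of raising a TypeError, although whether an empty list is the intended behavior is for the maintainer to say.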

I am not sure whether I ran the code the right way or whether I missed something.
It seems that seqidlist should be a list of protein IDs, so that "for seqid in seqidlist" and "if seqid.find('proteinid') > -1" in the code can work.

After this, I wrote a small function to find those protein IDs in the 3rze.pdb file and put them into seqidlist. With that, the code ran through.
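A helper of this kind could look roughly like the sketch below. It assumes the protein IDs are the database ID codes in the DBREF records of the PDB file (columns 43-54, with the chain letter in column 13, following the PDB file format). The function names are hypothetical, and the columns that pdbcolor itself reads may differ.

```python
def collect_dbref_ids(pdb_lines):
    """Collect database ID codes from DBREF records, keyed by chain letter.

    Assumes standard PDB fixed-width columns: chain ID in column 13,
    database ID code in columns 43-54.
    """
    ids_by_chain = {}
    for line in pdb_lines:
        if line[0:6].strip() != "DBREF":
            continue
        chain = line[12:13]
        proteinid = line[42:54].strip()
        if not proteinid:
            continue
        chain_ids = ids_by_chain.setdefault(chain, [])
        if proteinid not in chain_ids:
            chain_ids.append(proteinid)
    return ids_by_chain


def build_seqidlist(pdb_lines):
    """Flatten the per-chain IDs into one list without duplicates."""
    seqidlist = []
    for chain_ids in collect_dbref_ids(pdb_lines).values():
        for proteinid in chain_ids:
            if proteinid not in seqidlist:
                seqidlist.append(proteinid)
    return seqidlist


# Usage with a real file (path is an example):
# with open("3rze.pdb") as handle:
#     seqidlist = build_seqidlist(handle)
```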
###-----------------------------------------------------------------------------------------------------------------------------###
Analyzing the code further, I was confused by the function read_generic_data:
(excerpt)
    if type(chaincolumn) is int:
        chain = lsplits[int(chaincolumn)]
    else:
        chain = "A"

The check only uses the chain column when chaincolumn is an int; otherwise the 'else' branch falls back to the default chain 'A'.
However, args.chain_column is always a str, so this function always sets the variable "chain" to 'A', no matter which chain column is entered in the first place.
In my case, the input file has a chain column holding the chain letter of each amino acid, such as 'A', 'B', 'C', etc. As a result, this function assigned all my residues to chain 'A'.
This problem can be fixed by adding 'type=int' to the parser.add_argument() call for '--chain-column'.
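The effect can be reproduced in isolation. The sketch below is a minimal stand-in, not the pdbcolor code: build_parser and pick_chain are hypothetical names, and only the '--chain-column' option and the type check are taken from the excerpt above.

```python
import argparse


def build_parser(use_int_type):
    """Build a parser with --chain-column, with or without type=int."""
    parser = argparse.ArgumentParser()
    kwargs = {"type": int} if use_int_type else {}
    parser.add_argument("--chain-column", default=None, **kwargs)
    return parser


def pick_chain(lsplits, chaincolumn):
    """Same branch as in the excerpt: use the column only if it is an int."""
    if type(chaincolumn) is int:
        return lsplits[chaincolumn]
    return "A"


if __name__ == "__main__":
    row = ["10", "0.5", "B"]  # made-up data row: residue, score, chain
    as_str = build_parser(False).parse_args(["--chain-column", "2"]).chain_column
    as_int = build_parser(True).parse_args(["--chain-column", "2"]).chain_column
    print(repr(as_str), pick_chain(row, as_str))  # str value falls through to "A"
    print(repr(as_int), pick_chain(row, as_int))  # int value selects the column
```

Without type=int, argparse hands back the string "2", so the int check never succeeds; with type=int, omitting the option still leaves the default None and therefore chain "A".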
###-----------------------------------------------------------------------------------------------------------------------------###
In the make_output_script function:

    # assign whole chain to lowest color, then build up
    wayout.write("color {}{:02d}, chain {}\n".format( basecolor, int(value*binname_correction), chain ) )

This part does not seem to achieve the purpose of setting the whole chain to the lowest color. The color index is int(value*binname_correction), and 'value' comes from the following loops:

    for chain in keepchains.keys():
        chainoffset = refoffsets.get(chain, 0)
        scoregroups = defaultdict(list) # key is percent group, value is list of residues
        # for each residue, assign to a bin
        for residue in scoredict[chain].keys():
            residuescore = scoredict[chain].get(residue,0.00)
            for i,value in enumerate(binvalues[:-1]):
                upper = binvalues[i+1]
                if residuescore < upper:
                    scoregroups[value].append(residue - chainoffset)
                    break

Assume that keepchains.keys() contains four chain letters: 'A', 'B', 'C', and 'D'. In each iteration, say for 'A', every residue in scoredict['A'].keys() runs the innermost 'for' loop and ends with its own 'i' and 'value'.
So after every residue in chain 'A' has been traversed, 'i' and 'value' hold whatever was set for the last residue of chain 'A', and after all chains have been traversed, they hold the values from the last residue of chain 'D'. I do not know how to fix this. Also, since targetcolors can be reversed, finding the lowest color for the whole chain is another problem the function has to deal with.
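One possible direction, sketched under assumptions: take the base color from the bin boundaries themselves rather than from the loop variable that is left over after binning. In the sketch below, bin_residues copies the binning loop from the excerpt, while chain_base_color_line and its 'reverse' flag are hypothetical; how pdbcolor actually handles reversed targetcolors is not known here.

```python
from collections import defaultdict


def bin_residues(scores, binvalues):
    """Assign each residue to the first bin whose upper bound exceeds its score."""
    scoregroups = defaultdict(list)
    for residue, residuescore in scores.items():
        for i, value in enumerate(binvalues[:-1]):
            upper = binvalues[i + 1]
            if residuescore < upper:
                scoregroups[value].append(residue)
                break
    return scoregroups


def chain_base_color_line(basecolor, binvalues, binname_correction, chain, reverse=False):
    """Build the whole-chain color command from an explicit bin, not a leaked loop variable.

    Assumption: the "lowest" color corresponds to the first bin, or to the
    last bin's lower bound when the color order is reversed.
    """
    lowest = binvalues[-2] if reverse else binvalues[0]
    return "color {}{:02d}, chain {}\n".format(
        basecolor, int(lowest * binname_correction), chain)


if __name__ == "__main__":
    bins = [0.0, 0.25, 0.5, 0.75, 1.0]      # made-up bin boundaries
    scores = {1: 0.1, 2: 0.6, 3: 0.3}       # made-up residue scores
    print(dict(bin_residues(scores, bins)))
    print(chain_base_color_line("blue", bins, 100, "A"), end="")
```

Because the base color no longer depends on which residue happened to be processed last, the output is the same for every chain and every residue order.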

###-----------------------------------------------------------------------------------------------------------------------------###
Lastly, I was confused about the purpose of the 'refoffsets' variable. What does this variable do? Does it map the residues to their correct positions in the PDB file? I cannot tell.
In my case, I simply removed 'chainoffset' from the following lines:

[from make_output_script function]

    for chain in keepchains.keys():
        chainoffset = refoffsets.get(chain, 0)
        scoregroups = defaultdict(list) # key is percent group, value is list of residues
        # for each residue, assign to a bin
        for residue in scoredict[chain].keys():
            residuescore = scoredict[chain].get(residue,0.00)
            for i,value in enumerate(binvalues[:-1]):
                upper = binvalues[i+1]
                if residuescore < upper:
                    #scoregroups[value].append(residue - chainoffset)
                    scoregroups[value].append(residue)
                    break

and with that, PyMOL selected the right residues I wanted to color in my PDB file.
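What the excerpt itself shows is only the arithmetic: refoffsets.get(chain, 0) looks up a per-chain number and falls back to 0, and that number is subtracted from each residue index. The sketch below isolates this step; shift_residues is a hypothetical name, and the offsets are made up, since how pdbcolor fills refoffsets is not shown here.

```python
def shift_residues(residues, chain, refoffsets):
    """Subtract the chain's offset from each residue number (0 if the chain has none)."""
    chainoffset = refoffsets.get(chain, 0)
    return [residue - chainoffset for residue in residues]


if __name__ == "__main__":
    offsets = {"A": 28}                            # made-up offset for chain A only
    print(shift_residues([30, 31], "A", offsets))  # shifted by 28
    print(shift_residues([30, 31], "B", offsets))  # no entry for B, so unchanged
```

If refoffsets is empty for a given file, the subtraction is a no-op, so removing chainoffset only changes the result when a nonzero offset was recorded for that chain.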

Unable to extract exons from gff file

Firstly, thank you for this awesome tool! I am not very experienced with Python and was hoping you could help me. I am trying to annotate the exons and introns onto the protein structure for the gene IFT140.

I downloaded the GFF3 file for the gene of interest, IFT140, from https://www.ncbi.nlm.nih.gov/nuccore/NM_014714.4?report=genbank&to=5232 by clicking "send to" -> "file" -> "complete Record" -> GFF3.
I then ran: grep NM_014714.4 IFT140.gff3 > ift140.gff to get the relevant GFF file.
I then ran: python gff_cds_to_pymol_script.py -g IFT140.gff3 > color_ppyr_ift140.pml
This gave me:

Parsing GFF file IFT140.gff3 for None
Finished parsing IFT140.gff3, read 37 lines
WARNING COULD NOT FIND ANY EXONS FOR None
Found 0 exons, 0 bases for 0 residues
0 codons are out of phase

I have also tried: python gff_cds_to_pymol_script.py -g IFT140.gff3 -i AF-Q96RY7-F1-model_v1.pdb > color_ppyr_ift140.pml
(where AF-Q96RY7-F1-model_v1.pdb is the PDB file for IFT140 taken from the AlphaFold website), but this gives me the same error as above.
I have also tried:
python gff_cds_to_pymol_script.py -g ift140.gff -i NM_014714.4 > color_ppyr_ift140.pml but this gives me the same error as above.
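To narrow this down, it may help to check which feature types and IDs the GFF3 file actually contains, since the value passed with -i presumably has to match something in it. The sketch below is a generic GFF3 summary based on the standard nine tab-separated columns; summarize_gff is a hypothetical helper and makes no claim about how gff_cds_to_pymol_script.py matches IDs.

```python
from collections import Counter, defaultdict


def summarize_gff(lines):
    """Count feature types (column 3) and collect ID/Parent/Name values (column 9)."""
    type_counts = Counter()
    attributes = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        columns = line.split("\t")
        if len(columns) < 9:
            continue  # not a complete feature line
        type_counts[columns[2]] += 1
        for field in columns[8].split(";"):
            key, _, value = field.partition("=")
            if key in ("ID", "Parent", "Name"):
                attributes[key].add(value)
    return type_counts, attributes


# Usage with a real file (path is an example):
# with open("IFT140.gff3") as handle:
#     counts, attrs = summarize_gff(handle)
#     print(counts)
#     print(sorted(attrs["Parent"]))
```

If the counts show CDS or exon features, the Parent or ID values listed next to them are the candidates worth trying with -i.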

Your help would be much appreciated.

All the best
