wrf / pdbcolor

Python code to color a PDB structure based on parameters from a multiple sequence alignment

Python 88.61% R 11.39%
protein protein-structure evolution visualization pymol pdb-structure raxml phylobayes

pdbcolor's Issues

Some confusions

Recently, I wanted to color a PDB structure in PyMOL using a self-defined parameter that I extracted for each amino acid. I was lucky to find this code, which does the job well.
However, I ran into some confusion while using it.

After reading the README.md file, I found that I could use pdb_color_generic.py for my purpose.
To start, I tried the exact same example, with 3rze.pdb and 3rze.map.rates_features.csv, to get familiar with the code.

I followed every step exactly as the instructions describe. However, when I ran the command "pdb_color_generic.py -c 4 -d , -p 3rze.pdb -i 3rze.map.rates_features.csv -l blue -g dnds --exclude-first-group > 3rze.color_by_dnds.pml", an error appeared: "TypeError: 'NoneType' object is not iterable".
The failing line was "for seqid in seqidlist".
The cause seems to be that seqidlist has to be supplied by the user.

In the get_chains_only function, the seqidlist parameter is taken from args.sequence, but args.sequence was not given in the command, so it kept its default value of None. Iterating over None is what raises the error at "for seqid in seqidlist".

In the parser definition, parser.add_argument("-s","--sequence", nargs="*", help="sequence ID for PDB, give multiple names if data is available in the input file") suggests that this argument is needed, yet the command above contains no "-s" option.
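The behavior described above can be reproduced outside the script. The following is a minimal sketch, not the script's actual code: it reuses only the "-s/--sequence" option quoted above, and normalize_seqids is a hypothetical helper added for illustration. It shows that argparse stores None for an omitted nargs="*" option, and one way a loop could be guarded against that.

```python
import argparse


def build_parser():
    """Parser with only the -s/--sequence option quoted in the issue."""
    parser = argparse.ArgumentParser()
    # With nargs="*" and no explicit default, argparse stores None
    # when -s is absent from the command line.
    parser.add_argument("-s", "--sequence", nargs="*",
                        help="sequence ID for PDB")
    return parser


def normalize_seqids(seqidlist):
    """Hypothetical helper: turn a missing -s option into an empty list."""
    return list(seqidlist) if seqidlist is not None else []


# No -s given: args.sequence is None, so "for seqid in args.sequence"
# would raise TypeError: 'NoneType' object is not iterable.
args = build_parser().parse_args([])
print(args.sequence)

# The guarded loop simply runs zero times instead of failing.
for seqid in normalize_seqids(args.sequence):
    print(seqid)

# With -s given, the IDs arrive as a list of strings.
# "seqA" and "seqB" are placeholder IDs, not names from the example data.
args = build_parser().parse_args(["-s", "seqA", "seqB"])
print(args.sequence)
```

Whether an empty list is the right fallback for get_chains_only is an open question; the sketch only shows where the None comes from.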

I am not sure whether I ran the code the right way or whether I missed something.
It seems that seqidlist should be a list of protein IDs, so that "for seqid in seqidlist" and "if seqid.find('proteinid') > -1" in the code can work.

After this, I wrote a small function to find those protein IDs in the 3rze.pdb file and put them into seqidlist. With that, the code ran through.
###-----------------------------------------------------------------------------------------------------------------------------###
Looking further into the code, I was confused by the function read_generic_data:
(partial code)

    if type(chaincolumn) is int:
        chain = lsplits[int(chaincolumn)]
    else:
        chain = "A"

The 'else' branch makes 'A' the default chain whenever chaincolumn is not an int.
However, args.chain_column is always a str, so this function always sets the variable "chain" to 'A', no matter which chain column is given on the command line.
My input file has a chain column containing chain letters such as 'A', 'B', 'C', etc., which indicate the chain each amino acid comes from. As a result, this function assigned all of my residues to chain 'A'.
This can be fixed by adding 'type = int' to the parser.add_argument() call for '--chain-column'.
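To illustrate the type issue, here is a minimal sketch rather than the script's real parser. The option '--chain-column' and the type check come from the text above; '--chain-column-str', pick_chain, and the sample row are hypothetical and exist only to show the two cases side by side.

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical option without type=int: argparse keeps the value as a str.
parser.add_argument("--chain-column-str", default=None)
# Same idea with type=int: argparse converts "2" to the integer 2.
parser.add_argument("--chain-column", type=int, default=None)


def pick_chain(lsplits, chaincolumn):
    """Mirror of the quoted check: use the column only if it is an int."""
    if type(chaincolumn) is int:
        return lsplits[chaincolumn]
    return "A"


# Made-up input row: residue number, score, chain letter.
row = "12,0.5,B".split(",")
args = parser.parse_args(["--chain-column-str", "2", "--chain-column", "2"])

# The str value fails the int check, so the chain falls back to "A".
as_str = pick_chain(row, args.chain_column_str)
# The converted int passes the check, so the chain is read from the row.
as_int = pick_chain(row, args.chain_column)
print(as_str, as_int)
```

With type=int in place, an omitted option still yields None and therefore the default chain 'A', which appears to match the intent of the 'else' branch.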
###-----------------------------------------------------------------------------------------------------------------------------###
In the make_output_script function:

    #assign whole chain to lowest color, then build up
    wayout.write("color {}{:02d}, chain {}\n".format( basecolor, int(value*binname_correction), chain ) )

This part does not seem to achieve the purpose of setting the whole chain to the lowest color. The color index is int(value*binname_correction), and 'value' is produced by the following loop:

    for chain in keepchains.keys():
        chainoffset = refoffsets.get(chain, 0)
        scoregroups = defaultdict(list) # key is percent group, value is list of residues
        # for each residue, assign to a bin
        for residue in scoredict[chain].keys():
            residuescore = scoredict[chain].get(residue,0.00)
            for i,value in enumerate(binvalues[:-1]):
                upper = binvalues[i+1]
                if residuescore < upper:
                    scoregroups[value].append(residue - chainoffset)
                    break

Suppose keepchains.keys() contains four chain letters: 'A', 'B', 'C', and 'D'. On each iteration, say for chain 'A', every residue in scoredict['A'].keys() runs the innermost 'for' loop and produces its own 'i' and 'value'.
So after every residue in chain 'A' has been traversed, 'i' and 'value' hold whatever the last residue of chain 'A' set them to. After all chains have been traversed, they reflect the last residue of chain 'D'.
I do not know how to fix this. Also, since targetcolors could be reversed, finding the lowest color for the whole chain is another problem the function would need to handle.
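The leak of the loop variable can be shown in isolation. The sketch below is a simplified, self-contained version of the binning loop with made-up bin edges and scores; bin_residues is a hypothetical wrapper, not a function from the script. It compares the leftover 'value' with the first bin edge, which is one candidate for the "lowest" bin under the assumption that colors are not reversed.

```python
from collections import defaultdict

# Illustrative bin edges and scores, not the script's real values.
binvalues = [0.0, 0.25, 0.5, 0.75, 1.0]
scoredict = {"A": {1: 0.1, 2: 0.9, 3: 0.6}}


def bin_residues(scores, binvalues):
    """Assign each residue to a bin; also return the leftover loop variable."""
    scoregroups = defaultdict(list)  # key is bin lower edge, value is residues
    for residue, residuescore in scores.items():
        for i, value in enumerate(binvalues[:-1]):
            upper = binvalues[i + 1]
            if residuescore < upper:
                scoregroups[value].append(residue)
                break
    # After the loops, value is whatever the last residue left behind.
    return scoregroups, value


groups, leaked = bin_residues(scoredict["A"], binvalues)
# Taking the first bin edge explicitly does not depend on residue order.
lowest = binvalues[0]
print(dict(groups))
print("leftover value:", leaked, "first bin edge:", lowest)
```

In this example the last residue scores 0.6, so the leftover value is the 0.5 bin rather than the first bin. Reading the base color from binvalues[0] before the loop would avoid that, though how reversed targetcolors should be handled is not settled by this sketch.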

###-----------------------------------------------------------------------------------------------------------------------------###
Finally, I was confused about the purpose of the 'refoffsets' variable. What does this variable do? Does it mark the correct positions of amino acid residues in PDB files? I could not tell.
In my case, I simply removed 'chainoffset' from the following code:

[from make_output_script function]

    for chain in keepchains.keys():
        chainoffset = refoffsets.get(chain, 0)
        scoregroups = defaultdict(list) # key is percent group, value is list of residues
        # for each residue, assign to a bin
        for residue in scoredict[chain].keys():
            residuescore = scoredict[chain].get(residue,0.00)
            for i,value in enumerate(binvalues[:-1]):
                upper = binvalues[i+1]
                if residuescore < upper:
                    #scoregroups[value].append(residue - chainoffset)
                    scoregroups[value].append(residue)
                    break

With this change, PyMOL selected the right residues in my PDB file for coloring.

Unable to extract exons from gff file

Firstly, thank you for this awesome tool! I am not very experienced with Python but was hoping you could help me. I am trying to annotate the exons and introns onto the protein structure for the gene IFT140.

  • I have downloaded the gff3 file for the gene of interest IFT140: https://www.ncbi.nlm.nih.gov/nuccore/NM_014714.4?report=genbank&to=5232 by clicking "send to" -> "file" -> "complete Record" -> GFF3
  • I then did: grep NM_014714.4 IFT140.gff3 > ift140.gff to get the relevant gff file
  • I then ran: python gff_cds_to_pymol_script.py -g IFT140.gff3 > color_ppyr_ift140.pml
  • This gave me:

Parsing GFF file IFT140.gff3 for None
Finished parsing IFT140.gff3, read 37 lines
WARNING COULD NOT FIND ANY EXONS FOR None
Found 0 exons, 0 bases for 0 residues
0 codons are out of phase

  • I have tried: python gff_cds_to_pymol_script.py -g IFT140.gff3 -i AF-Q96RY7-F1-model_v1.pdb > color_ppyr_ift140.pml
    (where AF-Q96RY7-F1-model_v1.pdb is the PDB file for IFT140 taken from the AlphaFold website), but this gives me the same error as above

  • I have also tried: python gff_cds_to_pymol_script.py -g ift140.gff -i NM_014714.4 > color_ppyr_ift140.pml but this gives me the same error as above
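One check that may help narrow this down is to list which feature types the GFF file actually contains. The sketch below assumes only the standard GFF3 layout of nine tab-separated columns with the feature type in the third column; count_feature_types is a hypothetical helper, and the example rows are made up for illustration rather than copied from the NCBI file.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Count the values in column 3 (feature type) of GFF lines."""
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment or directive lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A GFF3 feature line has nine tab-separated columns.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


# Made-up rows for illustration; coordinates and IDs are placeholders.
example = [
    "##gff-version 3",
    "NM_014714.4\tRefSeq\tgene\t1\t5232\t.\t+\t.\tID=gene-example",
    "NM_014714.4\tRefSeq\tCDS\t200\t4588\t.\t+\t0\tID=cds-example",
]
print(count_feature_types(example))

# To inspect a real file, pass the open file handle:
# with open("IFT140.gff3") as handle:
#     print(count_feature_types(handle))
```

If the count shows no exon rows, that would be consistent with the "Found 0 exons" message, although it would not by itself explain why the ID is reported as None.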

Your help would be much appreciated.

All the best
