mpdunne / omgene
Mutual optimisation of gene models through gene orthology
Hi there,
I'm trying to run omgene, but it seems that the embedded bedtools commands are out of date (my install has bedtools v2.27.1, the most recent version). For example, when I run omgene on my .tsv file, it works for a time and then produces this error:
python2 /lab/solexa_weng/testtube/omgene/omgene.py -i test.tsv
loading and checking input data locations...
Grabbing cds and aa sequences for inputted transcripts...
Traceback (most recent call last):
  File "/lab/solexa_weng/testtube/omgene/omgene.py", line 3421, in <module>
    go(path_inf, path_ref, path_resultsDir, path_wDir, minintron, minexon, int_numCores, int_slopAmount)
  File "/lab/solexa_weng/testtube/omgene/omgene.py", line 3318, in go
    dict_generegions = prepareGeneregions(dict_seqInfo, dict_genomeInfo, path_wDir, int_numCores,int_slopAmount)#qe
  File "/lab/solexa_weng/testtube/omgene/omgene.py", line 1111, in prepareGeneregions
    path_generegion = writeGeneRegionFile(line, generegion, path_mDir_l + "/" + str(generegion) + ".gtf")
  File "/lab/solexa_weng/testtube/omgene/omgene.py", line 1082, in writeGeneRegionFile
    ".", line[3], ".", "transcript_id \"" + generegion + "\"; gene_id \"" + generegion + "\"", generegion]
IndexError: list index out of range
The "line" variable in that last frame seems to be a record from the bedtools output stream of an earlier command, and indeed if I print(line), it has only 3 columns, so line[3] is out of range. This patch (photocyte@9f48d5f), which uses the -c and -o parameters of bedtools, seems to fix it, but I'm not sure I implemented it properly, as the script then crashes later on with this error:
python2 /lab/solexa_weng/testtube/omgene/omgene.py -i test.tsv
Checking installed programs...
loading and checking input data locations...
Grabbing cds and aa sequences for inputted transcripts...
Traceback (most recent call last):
  File "/lab/solexa_weng/testtube/omgene/omgene.py", line 3421, in <module>
    go(path_inf, path_ref, path_resultsDir, path_wDir, minintron, minexon, int_numCores, int_slopAmount)
  File "/lab/solexa_weng/testtube/omgene/omgene.py", line 3318, in go
    dict_generegions = prepareGeneregions(dict_seqInfo, dict_genomeInfo, path_wDir, int_numCores,int_slopAmount)#qe
  File "/lab/solexa_weng/testtube/omgene/omgene.py", line 1151, in prepareGeneregions
    dg["cdsBase_data"] = str(readSeqs(dg["cdsBase"])[0].seq)
IndexError: list index out of range
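Both crashes look like the same kind of failure: indexing into something that holds fewer elements than the code expects (line[3] on a 3-column record, and [0] on what is presumably an empty list of sequences). For anyone debugging the same thing, a guard along these lines makes the failure say what was actually missing. This is only a sketch and not part of omgene: require_fields, first_or_fail, and the sample record are hypothetical names and data made up for illustration.

```python
def require_fields(fields, minimum, context):
    """Return fields unchanged, or raise a descriptive error if too short."""
    if len(fields) < minimum:
        raise ValueError(
            "%s: expected at least %d columns, got %d: %r"
            % (context, minimum, len(fields), fields)
        )
    return fields


def first_or_fail(items, context):
    """Return the first item, or raise a descriptive error if there is none."""
    if not items:
        raise ValueError("%s: no records found" % context)
    return items[0]


# Hypothetical three-column interval record, like the one seen in the first crash.
merged = "scaffold_1\t100\t200".split("\t")
try:
    require_fields(merged, 4, "merged interval line")
except ValueError as err:
    print(err)

# Hypothetical empty result, like a sequence file that yielded no records.
try:
    first_or_fail([], "cdsBase sequences")
except ValueError as err:
    print(err)
```

Calling the first check just before the line[3] access, and the second around the readSeqs(...)[0] access, would not fix either problem, but it would show whether the -c/-o patch really restores the fourth column and whether the cdsBase file is empty.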
Any ideas?
All the best,
-Tim
Hi there Michael,
I'm trying to repair a gene model that has an error at the C-terminus. If I exclude this gene (ILUMI_23468) from the omgene optimization, the whole script runs, but if I include it, Biopython emits a warning and omgene crashes later.
See the .zip file linked below for the .tsv, .gtf, and .gff3 files. One thing that seems strange between the ILUMI_23468 .gff3 and .gtf files is that the .gtf contains a stop codon, but the peptide translated from the .gff3 doesn't have a stop codon. The .gtf was produced with the script you recommend on the omgene main page.
files.zip
(The genome reference FASTAs can be found here http://www.fireflybase.org/firefly_data.html)
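The Biopython warning in the output below ("Partial codon, len(sequence) not a multiple of three") suggests that at least one extracted CDS has a length that is not divisible by three. A quick way to check a CDS string for that, and for a terminal stop codon, is sketched here in plain Python. It assumes the standard genetic code, describe_cds is a hypothetical helper, and the sequences are made up rather than taken from ILUMI_23468.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code only (assumption)


def describe_cds(cds):
    """Report length, leftover bases, and whether the last full codon is a stop."""
    cds = cds.upper()
    remainder = len(cds) % 3                 # bases left over after whole codons
    full = cds[: len(cds) - remainder]       # sequence trimmed to whole codons
    last_codon = full[-3:] if len(full) >= 3 else ""
    return {
        "length": len(cds),
        "remainder": remainder,
        "ends_with_stop": last_codon in STOP_CODONS,
    }


# Made-up sequences, for illustration only.
print(describe_cds("ATGGCCTAA"))  # whole codons, final codon is a stop
print(describe_cds("ATGGCCTA"))   # two leftover bases, stop codon truncated
```

Running this on the CDS extracted from the .gtf and from the .gff3 should show whether the two annotations differ only by the final codon or by a frame-breaking number of bases.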
omgene output here:
Checking installed programs...
loading and checking input data locations...
Grabbing cds and aa sequences for inputted transcripts...
/usr/local/lib/python2.7/dist-packages/Bio/Seq.py:2309: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
BiopythonWarning)
Relativising sequences...
Performing first round exonerate...
Exonerating and piling sequences against sequence regions...
Performing second round exonerate...
Performing third round exonerate...
done getting options
Fixing 0
Fixing 1
Fixing 2
Fixing 3
Fixing 4
Fixing 5
Fixing 6
Fixing 7
Fixing 8
Fixing 9
Fixing 10
Fixing 11
Fixing 12
Traceback (most recent call last):
  File "/lab/solexa_weng/testtube/omgene/omgene.py", line 3438, in <module>
    go(path_inf, path_ref, path_resultsDir, path_wDir, minintron, minexon, int_numCores, int_slopAmount)
  File "/lab/solexa_weng/testtube/omgene/omgene.py", line 3369, in go
    res = fixIt(adj, parts, dict_generegions, path_wDir, minintron, minexon, path_winnersAln)#qe
  File "/lab/solexa_weng/testtube/omgene/omgene.py", line 1816, in fixIt
    res = incrementalFixRecursive(res, path_fix, minintron, minexon, path_winnersAln, d_gr)
  File "/lab/solexa_weng/testtube/omgene/omgene.py", line 1838, in incrementalFixRecursive
    res, prevAln = incrementalFix(inparts, path_iDir, mi, mx, path_winnersAln, d_gr, tTerminal=tTerminal)
  File "/lab/solexa_weng/testtube/omgene/omgene.py", line 1829, in incrementalFix
    return processLabels(labels, options, p_lDir, path_winnersAln, False, refine)
  File "/lab/solexa_weng/testtube/omgene/omgene.py", line 3088, in processLabels
    winners = chooseWinnersRef(options, path_fDir, path_refAln = path_refAln, doubleCheck = True, orAln = True, parallelCheck = True)
  File "/lab/solexa_weng/testtube/omgene/omgene.py", line 2607, in chooseWinnersRef
    return chooseThem(path_allAln, tryBlank, seqlookup, seqlookup_rev, path_wDir)
  File "/lab/solexa_weng/testtube/omgene/omgene.py", line 2740, in chooseThem
    w.id = seqlookup[k][winner[k].id]
KeyError: 'generegion_5.option_0'
Thoughts?
All the best,
-Tim