mfiers / hagfish Goto Github PK
View Code? Open in Web Editor NEWHagfish - assess an assembly through creative use of coverage plots
License: GNU General Public License v3.0
Hagfish - assess an assembly through creative use of coverage plots
License: GNU General Public License v3.0
Generate documentation for the plots here: https://github.com/mfiers/hagfish/wiki/Plots
(as requested by (@alpapan)
I'm trying to run hagfish_extract
and am getting the following error:
[smoss@biolserva pacbio_assembly]$ hagfish_extract pbreads_to_pbasm_blasr.sorted.bam
/home/smoss/.local/lib/python2.7/site-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
warnings.warn("Mean of empty slice.", RuntimeWarning)
/home/smoss/.local/lib/python2.7/site-packages/numpy/core/_methods.py:71: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
File "/home/smoss/tools/hagfish/hagfish_extract", line 643, in <module>
stats = doStats(bamBase, seqInfo, readPairs)
File "/home/smoss/tools/hagfish/hagfish_extract", line 293, in doStats
label='Peak top (%d)' % int(topInsert))
ValueError: cannot convert float NaN to integer
Allow the visualization of (simple) markers in a plot.
(as requested by @alpapan)
Hello,
thanks for the tip on this package on Biostar!
I've spent the last couple of days playing around with test-data and I can't get the library to plot, I ran hagfish-extract/hagfish_gapfinder/hagfish_coverage_combine and then hagfish_cplot2:
python hagfish_cplot2 -n 5e3 --ymax 200 chrom3_v3
where chrom3 is my chromosome I aligned reads to (for comparison I used bowtie with the parameters from the wiki).
I get the following error (same when I don't give any parameters):
Traceback (most recent call last):
File "hagfish_cplot2", line 159, in <module>
plot.save(tag='cplot2')
File "/HPC/home/s4279812/work/TestingHagfish/hagfish/hagfishUtils.py", line 426, in save
bbox_inches='tight')
File "/bio1/opt/python/lib/python2.7/site-packages/matplotlib-1.0.0-py2.7-linux-x86_64.egg/matplotlib/pyplot.py", line 363, in savefig
return fig.savefig(*args, **kwargs)
File "/bio1/opt/python/lib/python2.7/site-packages/matplotlib-1.0.0-py2.7-linux-x86_64.egg/matplotlib/figure.py", line 1084, in savefig
self.canvas.print_figure(*args, **kwargs)
File "/bio1/opt/python/lib/python2.7/site-packages/matplotlib-1.0.0-py2.7-linux-x86_64.egg/matplotlib/backend_bases.py", line 1852, in print_figure
**kwargs)
File "/bio1/opt/python/lib/python2.7/site-packages/matplotlib-1.0.0-py2.7-linux-x86_64.egg/matplotlib/backends/backend_agg.py", line 438, in print_png
FigureCanvasAgg.draw(self)
File "/bio1/opt/python/lib/python2.7/site-packages/matplotlib-1.0.0-py2.7-linux-x86_64.egg/matplotlib/backends/backend_agg.py", line 393, in draw
self.renderer = self.get_renderer()
File "/bio1/opt/python/lib/python2.7/site-packages/matplotlib-1.0.0-py2.7-linux-x86_64.egg/matplotlib/backends/backend_agg.py", line 404, in get_renderer
self.renderer = RendererAgg(w, h, self.figure.dpi)
File "/bio1/opt/python/lib/python2.7/site-packages/matplotlib-1.0.0-py2.7-linux-x86_64.egg/matplotlib/backends/backend_agg.py", line 59, in __init__
self._renderer = _RendererAgg(int(width), int(height), dpi, debug=False)
ValueError: width and height must each be below 32768
I've tried playing around with the -W and -H parameters but no dice, do you have any idea?
Dear Mark
Is it possible to have a small demo/test in the repo? I'd like to show your program to my engineer and as he is not a bioinformatician he would need some basic demo. I'm thinking of asking him to create a JS application maybe using jbrowse or a new framework using a standard library but first we need to scope it.
I'm happy to provide that if you want (but might be easier if you did) and one can even use it for unit testing?
it wouldn't need to be more than say 100 reads and say 500 bp...
thanks
a
Thank you for developing this tool !! I wonder if it works (or you have tested it) for mammalian size genomes? We ran it for a couple of days (with 200GB bam file) and it gave an error:
Traceback (most recent call last):
File "./hagfish_extract", line 636, in
readPairs = parseBam(seqInfo, inputFile)
File "./hagfish_extract", line 440, in parseBam
stop2 = rp['stop2'])
File "/u/projects4/pashadag/pham/others/anna/hagfish/hagfish_file_util.py", line 23, in np_savez
cPickle.dump(kwargs[k], F,0)
SystemError: error return without exception set
Thank you!
Hello Mark
After Ross showed some excellent graphs, I've been trying to trial your software. I'm having problems parsing the BAM files from the Kanga aligner; it has the following header:
@sq AS:genome SN:scaffold_0 LN:5118119
but sadly hagfish_extract uses an array to determine sequence length rather than doing a regex match for LN:\d+. could we please fix it (sorry I'm not a python person qualified submit a patch)
many thanks
alexie
Export hagfish plots in WIG format - for visualization in other tools, such as IGV.
wiggle: http://genome.ucsc.edu/goldenPath/help/wiggle.html
(as requested by @alpapan)
Hi again,
my workgroup prefers to work with SOAPaligner and there's a tiny bug in Hagfish related to that:
When I convert SOAP-output to SAM via soap2sam.pl and get the header using
samtools view -bt bla.fai output.sam > output.bam
hagfish_extract doesn't work - because
samtools view -f 67
in hagfish_extract filters for the "paired"-tag that, for some reason, isn't inserted by samtools, so hagfish_extract has 0 lines to work on. Removing the "-f 67" part works perfectly (I get nice plots at least) because SOAP has three types of output files (paired reads, unpaired reads and unmapped) so it's already ensured that all reads in the bam-file are paired.
Maybe some extra-flag like "output is converted SOAP" is needed, so hagfish-extract runs samtools view without "-f 67"?
Hello
Reading the BAM file is great but some times we don't want to have the overhead of creating a bam if the file is not stored. Can we allow for SAM files (by adding the -S option in the two samtools view commands?)?
Easiest is for user to specify it is a SAM with --sam
thank you
a
Hello
Another small feature request:
the option to use a specified library (bam file) so that e.g.
-library 1_2_vs_contigs_sopra.fasta.sam.fix
this is used
coverage/1_2_vs_contigs_sopra.fasta.sam.fix/scaffold_20.coverage.r_ok_ends.gz
instead of
combined/scaffold_20_20.coverage.r_ok_ends.gz
(should be straightforward to implement, no?).
once we have that, i can make hagfish or wiggle graphs and see how each library behaves (e.g compare 20kb with 3kb)...
For extra points (but extra time) then multiple libraries can be given, reading each file it turn (or also the wildcard 'all'), negating thus the need to run hagfish_combine....?
thanks
alexie
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.