In the final printing step of sga-astat.py, there's a line that includes a division:
print '%s\t%d\t%d\t%d\t%f\t%f' % (cd.name, cd.len, cd.nlen, cd.n, cd.n / (cd.nlen * arrivalRate), cd.astat)
This line resulted in an error when I ran it on my bowtie2-mapped reads (with no minimum contig length set) against assembled contigs:
$ ~/bin/sga-astat.py enterococci_all_vs_supergenome.bam > contigs.astat
Reading alignments from enterococci_all_vs_supergenome.bam
Initial arrival rate: 2.67734681196
Initial genome size estimate: 8412774
Iteration 0 arrival rate: 1.63762399084
Iteration 0 genome size estimate: 13754021
Iteration 1 arrival rate: 1.27047154511
Iteration 1 genome size estimate: 17728783
Iteration 2 arrival rate: 1.17208944432
Iteration 2 genome size estimate: 19216890
Using genome size: 19216890
Using arrival rate: 1.17208950043
Traceback (most recent call last):
File "/home/gringer/bin/sga-astat.py", line 175, in
print '%s\t%d\t%d\t%d\t%f\t%f' % (cd.name, cd.len, cd.nlen, cd.n, cd.n / (cd.nlen * arrivalRate), cd.astat)
ZeroDivisionError: float division by zero
Obviously this error is happening because cd.nlen is zero, which can only happen if the contig size is one less than the average read length:
[line 121] cd.nlen = cd.len - avgReadLen + 1
Other bits in the code seem to assume that this value, cd.nlen, is positive, e.g. line 131: [bootstrapLen += cd.nlen].
One fix for this particular division by zero error is to ensure that cd.nlen is always positive by making the following change:
[line 121] cd.nlen = cd.len - avgReadLen + 1 if (cd.len > avgReadLen) else 1
I'm not sure how this would affect other code regions. Based on the description, "number of positions
that can generate a read of length avgReadLen", this should probably be 'else 0' rather than 'else 1', but then the division by zero error would occur for all situations where the contig length is less than the read length, rather than the current state of only when it's exactly one bp less. Given that it gets through the code fine until this point, an alternative change consistent with this description would be the following:
[line 121] cd.nlen = cd.len - avgReadLen + 1 if (cd.len > avgReadLen) else 0
[line 175-176]
nstat = cd.n / (cd.nlen * arrivalRate) if (cd.nlen > 0) else 0
print '%s\t%d\t%d\t%d\t%f\t%f' % (cd.name, cd.len, cd.nlen, nstat, cd.astat)
Bearing in mind that this would push the zero problem further downstream. I haven't looked at what would happen to sga scaffold if it found a zero value (or negative value) in this column.