arq5x / bedtools2
bedtools - the swiss army knife for genome arithmetic
License: MIT License
Hi Guys,
We have users running bedtools on our compute clusters and have noticed that the IO requests issued are 4KB. This is a breakdown of the IO sizes as seen by the Lustre llite VFS layer on Lustre clients for a run of bedtools over a couple of minutes:
extents            calls    %  cum%  |  calls  %  cum%
0K - 4K     :   14099274   99    99  |      0  0     0
4K - 8K     :          0    0    99  |      0  0     0
8K - 16K    :          0    0    99  |      0  0     0
16K - 32K   :          0    0    99  |      0  0     0
32K - 64K   :          0    0    99  |      0  0     0
64K - 128K  :          0    0    99  |      0  0     0
128K - 256K :          0    0    99  |      0  0     0
256K - 512K :          0    0    99  |      0  0     0
512K - 1024K:          0    0    99  |      0  0     0
1M - 2M     :          0    0    99  |      0  0     0
2M - 4M     :         88    0   100  |      0  0     0
Could you employ input and output buffering or make the IO size a tunable? Running thousands of bedtools jobs at the same time is causing a lot of small IO traffic to our Lustre servers.
Anyone seen something similar?
After the 2.17 release I get an error when intersecting a BED file with a VCF that contains genotype information (i.e. a sample VCF file).
With my query VCF, containing genotypes for 100 samples, I get the following error in the 2.19 release:
intersectBed -a query.vcf.gz -b target.bed.gz -wa -wb
Error: line number 1 of file query.vcf.gz has 109 fields, but 10 were expected.
This worked fine in previous releases.
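For context, a VCF data line has 8 fixed columns, a FORMAT column when genotypes are present, and then one column per sample, so 109 fields is what a 100-sample file should contain. The "10 were expected" in the error matches the count for a single-sample file. A minimal sketch of that arithmetic (the function name is hypothetical and not part of bedtools):

```python
def expected_vcf_fields(n_samples: int) -> int:
    """Columns in a VCF data line: 8 fixed, plus FORMAT and one per sample."""
    fixed = 8  # CHROM POS ID REF ALT QUAL FILTER INFO
    if n_samples == 0:
        return fixed
    return fixed + 1 + n_samples


print(expected_vcf_fields(100))  # 109, the count seen in query.vcf.gz
print(expected_vcf_fields(1))    # 10, the count the error says was expected
```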
Using >=2.19.0 with a BED query and a BAM database, binary output is added at the end of the (supposedly BED) output when using the -c option (and possibly other options). The BED output appears to be correct, but binary output is incorrectly tacked onto the end. I suspect this is a BAM EOF footer that is incorrectly being added based on the fact that the database is a BAM file.
bedtools intersect -a hg19.100K.windows.gatk.bed -b ../bam/s05-R-2379-AJ-0009.bwamem.sort.dedup.bam -c -sorted > s05-R.100k.bedg
tail -20 s05-R.100k.bedg
Y 58800000 58900000 1115
Y 58900000 59000000 11942
Y 59000000 59100000 3851
Y 59100000 59200000 0
Y 59200000 59300000 0
Y 59300000 59373566 0
�BC
mUKh^U���S)
The BED3 intervals reported when merging BAM input files appear to be correct. However, the remaining 9 columns of the full BED12 records emitted don't make sense.
Specifically, the read names and strands assigned to the merged record are odd because it appears that they are just chosen from the first record in the merged block. Perhaps until -c and -o functionality are possible for BAM files we should just emit BED3.
$ samtools view ../BEDTools/testingData/NA18152.bam | cut -f 1-6 | head
NA18152-SRR007381.35051 16 chr1 554305 16 9M2D2M2D2M2D314M
NA18152-SRR007381.637219 16 chr1 554305 16 9M2D2M1D3M2D295M
NA18152-SRR007381.730912 16 chr1 554305 16 9M2D2M2D2M2D333M
NA18152-SRR007381.1166916 16 chr1 554305 15 9M2D2M2D2M2D15M1I16M1I415M
NA18152-SRR007381.1281127 16 chr1 554305 4 9M2D2M2D2M2D52M1D236M
NA18152-SRR007381.287200 16 chr1 554310 15 5M1D1M1D6M1I41M1I58M1D347M
NA18152-SRR007381.1466069 16 chr1 554324 24 8M1I6M1I217M1I109M
NA18152-SRR007381.1339811 1040 chr1 554335 39 106M1I178M
NA18152-SRR007381.591703 16 chr1 554338 29 132M1I33M
NA18152-SRR007381.437387 16 chr1 554346 31 42M1I56M1D347M
$ bin/bedtools merge -i ../BEDTools/testingData/NA18152.bam | head
chr1 554304 560167 NA18152-SRR007381.35051 - -1 560167 0,0,0 1 0, 0,
chr1 714220 714373 NA18152-SRR007381.251923 + -1 714373 0,0,0 1 0, 0,
chr1 780202 780556 NA18152-SRR007381.1452392 36 - 780202 780556 0,0,0 1 354, 0,
chr1 810530 810706 NA18152-SRR007381.1012740 55 - 810530 810706 0,0,0 1 176, 0,
chr1 825299 825688 NA18152-SRR007381.317662 37 - 825299 825688 0,0,0 1 389, 0,
chr1 847668 848139 NA18152-SRR007381.158556 + -1 848139 0,0,0 1 0, 0,
chr1 860064 860310 NA18152-SRR007381.329161 10 - 860064 860310 0,0,0 1 246, 0,
chr1 863981 864228 NA18152-SRR007381.1450839 48 - 863981 864228 0,0,0 1 247, 0,
chr1 875876 876203 NA18152-SRR007381.732122 9 + 875876 876203 0,0,0 1 327, 0,
chr1 888310 888749 NA18152-SRR007381.814042 - -1 888749 0,0,0 1 0, 0,
The validation stage is supposed to make sure all options given are allowed with the detected file types. However, it doesn't appear to be recognizing file types at that point. For example:
bedtools2-release/bin/bedtools intersect -a a.bam -b a.bed -header
Should give a warning that the header option doesn't work with a BAM query unless BED output is specified with the -bed option. However, this warning does not occur.
Currently -f applies (confusingly) to the -bed file.
https://groups.google.com/forum/#!topic/bedtools-discuss/hzRnjQfhyZM
Currently, it seems that, for sequence names in FASTA files, only the first word after ">" is taken into account. The code even contains a comment "just write the first component of the name, for compliance with other tools".
Would you agree to change this behavior? Or add an option for it? Would it break many things in the rest of the code? (And out of curiosity, which tools are referred to in the comment?)
Although it is easy to let the user fix this, it's a hassle to do that for genome reference sequences containing thousands (or more) of contigs.
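To illustrate the behavior described and the kind of user-side fix meant here, a minimal sketch follows. Both helper names are hypothetical and this is not the bedtools code; joining the words with underscores is just one possible convention.

```python
def first_word_name(header: str) -> str:
    """Name as described in the issue: only the first word after '>'."""
    parts = header[1:].split()
    return parts[0] if parts else ""


def joined_name(header: str) -> str:
    """Workaround: keep the whole header by joining its words with '_'."""
    return "_".join(header[1:].split())


header = ">contig_12 length=5000 cov=31.5"
print(first_word_name(header))  # contig_12
print(joined_name(header))      # contig_12_length=5000_cov=31.5
```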
Hi,
I am running a Duplex-Seq pipeline that requires using bedtools.
All the sequence comes from a relatively small amplicon (90 bp), which is a p53 sequence. I have a sorted, indexed .bam file as the -abam input and a one-line .bed file for the p53 region as the -b input, but the resulting .bam file (which should have the overlapping reads) is empty. I have checked the alignment, and the coordinates are correct. In fact, I also tried using a .bed file specifying the entire chr17, and I still don't get any reads in the resulting .bam file.
Can you please help me troubleshoot what might be going on?
Thanks,
Charles
The old version of bedtools was able to recognize and ignore the DOS end-of-line characters, represented by "\r\n", or "^M" on the command line. PFM needs to be able to do this as well. Currently it throws errors on files containing these, saying it can not recognize the format.
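Until this is handled in the tool, one workaround is to strip the carriage returns before passing a file to bedtools. A minimal sketch, assuming plain-text input (the helper name is hypothetical and this is not bedtools code):

```python
import io


def strip_carriage_returns(lines):
    """Yield each line with its DOS or Unix line ending replaced by one LF."""
    for line in lines:
        yield line.rstrip("\r\n") + "\n"


dos_text = "chr1\t10\t20\r\nchr1\t30\t40\r\n"
# newline="" keeps the original CR LF endings instead of translating them
cleaned = list(strip_carriage_returns(io.StringIO(dos_text, newline="")))
print(cleaned)
```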
Discovered in 2.17, and replicated in the latest version from git (v2.19.1-10-g377f1b1):
If an indel in the first file overlaps multiple variants from the second file, the indel is repeated multiple times in subtractBed output. Here is an example:
rvijaya@ubuntu:~/misc/temp$ cat testbed1.vcf
chr1 5935162 rs1287637 AAAAAAAAAAAAAAA T 14677.8 PASS AC=2;AF=1.00;AN=2;DP=625;Dels=0.00 GT:AD:DP:GQ:PL 1/1:0,625:625:99:14706,1035,0
rvijaya@ubuntu:~/misc/temp$ cat testbed2.vcf
chr1 5935162 rs1287637 A T 14677.8 PASS AC=2;AF=1.00;AN=2;DP=625;Dels=0.00 GT:AD:DP:GQ:PL 1/1:0,625:625:99:14706,1035,0
chr1 5935168 rs3747992;COSM426517 G A 20400.8 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=24.992;DP=2096;Dels=0.00 GT:AD:DP:GQ:PL 0/1:1048,1046:2096:99:20429,0,19945
rvijaya@ubuntu:~/misc/temp$ ~/software/bedtools2/bin/subtractBed -a testbed1.vcf -b testbed2.vcf
chr1 5935162 rs1287637 AAAAAAAAAAAAAAA T 14677.8 PASS AC=2;AF=1.00;AN=2;DP=625;Dels=0.00 GT:AD:DP:GQ:PL 1/1:0,625:625:99:14706,1035,0
chr1 5935162 rs1287637 AAAAAAAAAAAAAAA T 14677.8 PASS AC=2;AF=1.00;AN=2;DP=625;Dels=0.00 GT:AD:DP:GQ:PL 1/1:0,625:625:99:14706,1035,0
I have 2 files
a.bed
chrUn_gl000220 130081 130187 2364_IonXpress_010 36 +
b.bed
chrUn_gl000220 5406 5414 XYZ 8 +
chrUn_gl000220 12203 12211 XYZ 8 -
chrUn_gl000220 25451 25459 XYZ 8 -
chrUn_gl000220 28956 28964 XYZ 8 -
chrUn_gl000220 37582 37590 XYZ 8 -
chrUn_gl000220 90417 90425 XYZ 8 +
chrUn_gl000220 120950 120958 XYZ 8 -
When running closest with the following options:
bedtools closest -S -D a -a a.bed -b b.bed
I got the expected:
chrUn_gl000220 130081 130187 2364_IonXpress_010 36 + chrUn_gl000220 120950 120958 XYZ 8 - -9124
However with this additional option:
bedtools closest -S -D a -iu -a a.bed -b b.bed
I got nothing. The problem seems to come from the fact that the interval from a.bed is close to the end of chrUn_gl000220 and there is not a single interval in b.bed that fits the -iu constraint. But instead of keeping the original a.bed entry and adding "none" or another negative marker, it removes it.
Is it a bug or the expected behavior of this function?
PS: I use Bedtools 2.17.0
The memory usage has jumped significantly from 2.17 to 2.18 for unsorted data. This is primarily owing to the fact that both numeric and string versions of many fields are stored for each record. We should look into opportunities for reducing this footprint if at all possible to facilitate better scaling for unsorted datasets.
The counter-argument, of course, is that for large datasets, one should really be using pre-sorted data.
See this thread:
https://groups.google.com/forum/#!topic/bedtools-discuss/D04h7-o91_o
Hello,
I'd like to know what the most up-to-date python bindings are and whether they work with bedtools2.
Thanks.
-- Arjan
Union is misleading because it is actually Union-Intersection
http://bedtools.readthedocs.org/en/latest/content/tools/jaccard.html#comment-1219116947
Hi Aron,
As we have all our genomic fasta gziped for using with bwa, I was wondering if bedtools getfasta can work with gzipped data.
Pablo.
I was looking at the print method's code for GFF records, and believe it may be incorrect. We should double check, and add unit tests for it if we don't already have it.
Newest github repo.
test_hg19.bed
chr1 36337 36537
chr1 780737 781137
chr1 948337 949337
slopBed -i test_hg19.bed -g hg19.genome -b 3000000000 > test_slop.bed
and the content of test_slop.bed:
chr1 0 -2147447111
chr1 0 -2146702511
chr1 0 -2146534311
I just want to use it in a loop over different cutoffs. It works with a small -b, but fails with a big value.
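The reported end coordinates are consistent with the slop value no longer fitting in a signed 32-bit integer: each one equals the original end minus 2^31. The sketch below only checks that arithmetic. That bedtools holds the value in a 32-bit int is an assumption inferred from the output, not something confirmed from the source.

```python
INT32_MIN = -2**31  # smallest value of a signed 32-bit integer
INT32_MAX = 2**31 - 1

requested_slop = 3000000000
ends = [36537, 781137, 949337]                      # ends in test_hg19.bed
reported = [-2147447111, -2146702511, -2146534311]  # ends in test_slop.bed

# The requested slop does not fit in 32 bits.
print(requested_slop > INT32_MAX)  # True

# Assumption: the out-of-range slop collapsed to INT32_MIN before being added.
predicted = [end + INT32_MIN for end in ends]
print(predicted == reported)  # True
```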
Neil and Aaron;
We're running into build errors with 2.18.0 on RedHat/CentOS systems. It builds cleanly on Ubuntu, but we got two separate reports of failures on RedHat systems (bcbio/bcbio-nextgen#219 (comment) and bcbio/bcbio-nextgen#220 (comment)).
I can replicate this on a CentOS 5 system:
$ uname -a
Linux hsph01.rc.fas.harvard.edu 2.6.18-274.12.1.el5 #1 SMP Tue Nov 29 13:37:46 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
hsph01:~/bio/cloudbiolinux $ cat /etc/redhat-release
CentOS release 5.7 (Final)
The final error message is:
obj/InputStreamMgr.o: In function `InputStreamMgr::detectBamOrBgzip(int&, int)':
InputStreamMgr.cpp:(.text+0x534): undefined reference to `BamTools::BamReader::OpenStream(std::istream*)'
and here's a full build log:
https://gist.github.com/chapmanb/7981795
Happy to provide any more information that could help. Thanks much,
Brad
Currently, even with sorted data, the algorithm is two-pass. It can really be done in one pass, which would reduce both memory and runtime.
When running bedtools intersect -a fileA.bed -b fileB.bed -wa, the output forces proper BED format for column 6 (strand) instead of reporting the contents of the original file.
For example if the fileA input region were:
chr1 16110 16390 CTCF 227 K562,HeLa-S3,MCF-7,HCPEpiC
The output if it overlapped a feature is this:
chr1 16110 16390 CTCF 227 .
whereas in 2.17 it was this:
chr1 16110 16390 CTCF 227 K562,HeLa-S3,MCF-7,HCPEpiC
Basically, -split detects overlaps with blocked records just fine, but when -wo is used, the number of base pairs reported reflects what would be reported were -split not used.
For example, if you had a two block SAM record with blocks from 0:10 and 20:30 and you also had a BED from 5:15, -wo with -split would return 10 instead of 5.
Also, when -f is also requested, the result is incorrect owing to the fact that the number of overlapping bases is miscomputed prior to testing on -f.
We need to fix this and put in appropriate unit tests.
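The expected numbers in the example above can be checked with a small sketch that sums the overlap block by block. The function names are hypothetical and intervals are treated as 0-based, half-open; this is not the bedtools implementation, and could serve as a reference for a unit test.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Bases shared by two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def split_overlap(blocks, b_start, b_end):
    """Sum the overlap of each block with [b_start, b_end)."""
    return sum(overlap(s, e, b_start, b_end) for s, e in blocks)


blocks = [(0, 10), (20, 30)]         # the two-block record from the example
print(split_overlap(blocks, 5, 15))  # 5: only bases 5..9 fall inside a block
print(overlap(0, 30, 5, 15))         # 10: what the unsplit span would report
```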
Details here: https://code.google.com/p/bedtools/issues/detail?id=165
Hi,
I am doing an intersectBed with two files, bedA.bed and bedB.bed. Both are presorted, and one is being piped through standard input. I notice the following two commands give different results:
Namely, option #2 seems to give the 'wrong' answer. Running without the -sorted option gives the correct answer no matter which is in -a and -b.
I realize this may be because of how the memory sweep algorithm works, but could you put this in the intersectBed documentation directly as a warning?
Hello,
I am developing a C++ package doing eQTL detection and I licensed it under GPLv3+ (the "+" meaning "or any later version"). Judging by the README.md or the source code (say "bedFile.h"), bedtools is licensed under GPLv2. Thus, I can't use code from bedtools2 in my package, see here.
Maybe you could switch to GPLv3+. But I can understand that you want to keep GPLv2, as switching can disrupt things for other developers. However, would it be possible to unambiguously specify in bedtools2 that you allow usage of the code under GPLv2 as well as later versions? That is, to switch to GPLv2+?
In fact, this is suggested in the LICENSE file itself, in section 9. Practically, this would require adding "or any later version" in the README.md file (and also in the header of the source files, to avoid confusion).
Thanks,
Tim
There was a user request for the sample tool to keep mate pairs intact when the input is BAM. I believe this is doable.
Hi There,
I am just looking at groupBy usage and was wondering how we can use it with an operation that is not on the list, namely "difference" or subtraction.
Is that possible ?
Thanks
"mergeBed -s -nms -i z0 will put two consecutive tabs in the output that causes problems in the later processes"
z0 contents:
chr1 1102483 1102578 MIR200B 0 + 1102578 1102578 0 1 95, 0,
chr1 1103242 1103332 MIR200A 0 + 1103332 1103332 0 1 90, 0,
chr1 1104384 1104467 MIR429 0 + 1104467 1104467 0 1 83, 0,
chr1 3044538 3044599 MIR4251 0 + 3044599 3044599 0 1 61, 0,
chr1 3477259 3477354 MIR551A 0 - 3477354 3477354 0 1 95, 0,
chr1 5624130 5624203 MIR4417 0 + 5624203 5624203 0 1 73, 0,
See discussion with Daniel Klevebring here:
http://bedtools.readthedocs.org/en/latest/content/tools/map.html#comment-1193434939
bedtools sample -i NA18152.bam -n 100 \
| head -n 2
chrX 70381952 70382329 NA18152-SRR007381.808759 46 - 70381952 70382329 0,0,0 1 377, 0,
chr7 100594607 100595034 NA18152-SRR007381.795852 15 + 100594607 100595034 0,0,0 1 427, 0,
This is confusing because it should just output BAM unless the -bed option is used. Also, the -ubam option is displayed in the help, indicating that the default output for BAM is compressed BAM.
Running intersectBed on a VCF that has a header but no variant calls hangs forever.
$cat test.vcf
this command never exits:
$intersectBed -a test.vcf -b any.bed
Bedtools does not correctly find intersections for imprecise structural variants. With recent VCF format standards, the ALT allele is encoded as a symbolic allele rather than a literal sequence. Instead of using the ALT text to determine the length, bedtools should use the SVLEN INFO field that is available to determine the length of the SV.
Right now it is not possible to build bedtools cleanly without zlib installed as root. The Makefile should pass custom include and library paths through to $CXXFLAGS.
This is useful for tools like conda.
export CXXFLAGS = -Wall -O2 -D_FILE_OFFSET_BITS=64 -fPIC $(INCLUDE)
to compile like this:
export INCLUDE="-I$PREFIX/include -L$PREFIX/lib"
make
Bedtools intersect segfaults with the gdb.bam test file in pybedtools. However, this file works fine with bedtools 2.17 and samtools. Interestingly, if one converts it to SAM then back to BAM with samtools, the file works fine. It appears that something is odd with the way 2.18 is interpreting the header in this file or perhaps that the stream that is being created becomes corrupt.
Uma noticed that I had references to the A and B file in the merge help in the sections explaining the -c and -o options. This was a cut and paste from the map help. Just need to remove those bits.
$ make
Building BEDTools:
=========================================================
DETECTED_VERSION = v2.18.0
CURRENT_VERSION =
Updating version file.
* Creating BamTools API
- Building in src/utils/bedFile
- Building in src/utils/BinTree
- Building in src/utils/version
- Building in src/utils/bedGraphFile
* compiling BinTree.cpp
* compiling bedFile.cpp
* compiling version.cpp
* compiling bedGraphFile.cpp
- Building in src/utils/chromsweep
* compiling chromsweep.cpp
In file included from BinTree.cpp:2:
In file included from ../../utils//FileRecordTools/FileRecordMgr.h:16:
../../utils//general/DualQueue.h:48:35: error: declaration of 'T' shadows
template parameter
template <class T, template<class T> class CompareFunc> class DualQueue {
^
../../utils//general/DualQueue.h:48:17: note: template parameter is declared
here
template <class T, template<class T> class CompareFunc> class DualQueue {
^
- Building in src/utils/Contexts
* compiling Context.cpp
1 error generated.
make[1]: *** [../../../obj//BinTree.o] Error 1
make: *** [src/utils/BinTree] Error 2
make: *** Waiting for unfinished jobs....
We need to ensure that empty files give no output, rather than errors, as mentioned in another bug.
If there is a header, and -h is used, the header should still be output.
Lastly, the -v option needs to work correctly if the database file is empty.
The previous arq5x/bedtools#30 issue has resurfaced in the new repository:
$ bedtools sort -i /dev/null
Error: The requested file (/dev/null) could not be opened. Error message: (Success). Exiting!
It appears that the isGzipFile() part of ca50f59 has been reverted by 69d82a7.
Would you like to wrap any pointer data members with the template class "std::unique_ptr"?
Update candidates:
From Aaron:
As you can see in this example: http://bedtools.readthedocs.org/en/latest/content/tools/map.html#mean-compute-the-mean-of-a-column-from-overlapping-intervals, map allows users to detect intersections between A and B and specify a column from the B file that should be used to summarize the hits in B for each record in A.
So, if the user said -c 8 -o mean, it would indicate that the mean of the 8th column (whatever it may be) in B should be calculated across all the records in B that overlap the current record in A.
Thus, it requires that we be able to access columns in Records by position, rather than by name. In 2.17, I had a "fields" vector of strings that stored each column.
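A minimal sketch of why positional access matters, assuming each B record is held as a list of string fields (as with the "fields" vector mentioned above). The function name and the toy records are hypothetical; "." is used for A records with no hits, which mirrors the usual null value in map output.

```python
from statistics import mean


def map_mean(a_records, b_records, column):
    """For each A interval, average a 1-based column over overlapping B rows."""
    results = []
    for a_chrom, a_start, a_end in a_records:
        values = [
            float(b[column - 1])
            for b in b_records
            if b[0] == a_chrom and int(b[1]) < a_end and int(b[2]) > a_start
        ]
        # "." marks an A record with no overlapping B records
        results.append(mean(values) if values else ".")
    return results


a = [("chr1", 0, 100), ("chr1", 500, 600)]
b = [
    ["chr1", "10", "20", "x", "4"],
    ["chr1", "50", "60", "y", "8"],
    ["chr1", "200", "300", "z", "100"],
]
print(map_mean(a, b, column=5))  # [6.0, '.']
```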
Hi Guys
This might be a stupid question... but in UCSC (a 0-based coordinate system browser), the coordinates:
chr14:79498951-79499010
correspond to the sequence:
CTAAGCCACACCATAACTGACTTCTAGGCATTCATCTTTCTTCCACTTAAATTCATTCTC
however, using fastaFromBed to retrieve this sequence from the hg19 assembly fasta file with this command:
bedtools getfasta -name -tab -bed coordinates.bed -fi hg19.fasta -fo output_coords.txt
Trims the 1st base from the sequence
TAAGCCACACCATAACTGACTTCTAGGCATTCATCTTTCTTCCACTTAAATTCATTCTC
so in order to get the correct sequence I need to subtract 1 from the start coordinate.
Since both bedtools and UCSC use 0-based coordinates, why is this?
Thank you for your help.
Duarte
PS: Using bedtools v2.17.0
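A likely explanation is that positions shown in the UCSC Genome Browser position box are 1-based and fully closed, whereas BED files are 0-based and half-open, so the start has to be shifted by one when going from one to the other. A minimal sketch of the conversion (the function name is hypothetical):

```python
def browser_to_bed(position: str):
    """Convert a 1-based, fully-closed "chrom:start-end" position to BED."""
    chrom, span = position.split(":")
    start, end = (int(x.replace(",", "")) for x in span.split("-"))
    # BED is 0-based and half-open: shift the start, keep the end.
    return chrom, start - 1, end


chrom, start, end = browser_to_bed("chr14:79498951-79499010")
print(chrom, start, end)  # chr14 79498950 79499010
print(end - start)        # 60, the length of the expected sequence
```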
This will allow the map and merge tools to apply operations to numbered columns. Relatively straightforward for the core BAM fields. Optional BAM tags could be tricky, especially with the BamTools tag interface. One solution is to start with support just for the core fields.
On OS X 10.7, building the documentation for 2.18.1, I get the following failure for the man file (the html target succeeds):
make -j1 -C docs man
sphinx-build -b man -d _build/doctrees . _build/man
Making output directory...
Running Sphinx v1.1.3
loading pickled environment... done
loading intersphinx inventory from http://docs.python.org/objects.inv...
building [man]: all manpages
updating environment: 0 added, 48 changed, 0 removed
reading sources... [100%] index
/sw/build.build/bedtools-2.18.1-1/bedtools2/docs/content/bedtools-suite.rst:16: WARNING: toctree contains reference to nonexisting document u'content/tools/igv'
/sw/build.build/bedtools-2.18.1-1/bedtools2/docs/content/general-usage.rst:25: WARNING: Enumerated list ends without a blank line; unexpected unindent.
[ a bunch more warnings ]
/sw/build.build/bedtools-2.18.1-1/bedtools2/docs/content/general-usage.rst:166: ERROR: Unexpected indentation.
/sw/build.build/bedtools-2.18.1-1/bedtools2/docs/content/general-usage.rst:184: ERROR: Unexpected indentation.
/sw/build.build/bedtools-2.18.1-1/bedtools2/docs/content/general-usage.rst:161: ERROR: Unknown target name: "aza-z0-9".
/sw/build.build/bedtools-2.18.1-1/bedtools2/docs/content/related-tools.rst:3: WARNING: Duplicate explicit target name: "here".
=======================================================================================
``-both`` Report both the count of hits and the fraction covered from the annotation files
=======================================================================================
/sw/build.build/bedtools-2.18.1-1/bedtools2/docs/content/tools/bamtobed.rst:149: ERROR: Unexpected indentation.
/sw/build.build/bedtools-2.18.1-1/bedtools2/docs/content/tools/bed12tobed6.rst:14: ERROR: Unexpected indentation.
/sw/build.build/bedtools-2.18.1-1/bedtools2/docs/content/tools/bed12tobed6.rst:30: ERROR: Unexpected indentation.
/sw/build.build/bedtools-2.18.1-1/bedtools2/docs/content/tools/bedtobam.rst:12: ERROR: Unexpected indentation.
/sw/build.build/bedtools-2.18.1-1/bedtools2/docs/content/tools/bedtobam.rst:30: ERROR: Unexpected indentation.
/sw/build.build/bedtools-2.18.1-1/bedtools2/docs/content/tools/bedtobam.rst:57: ERROR: Unexpected indentation.
[ tons more similar errors ]
/sw/build.build/bedtools-2.18.1-1/bedtools2/docs/content/tools/unionbedg.rst:148: ERROR: Unexpected indentation.
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
writing... bedtools.1 { content/overview content/installation content/quick-start content/general-usage content/bedtools-suite content/tools/annotate content/tools/bamtobed content/tools/bamtofastq content/tools/bed12tobed6 content/tools/bedpetobam content/tools/bedtobam content/tools/closest content/tools/cluster content/tools/complement content/tools/coverage content/tools/expand content/tools/flank content/tools/genomecov content/tools/getfasta content/tools/groupby content/tools/intersect content/tools/jaccard content/tools/links content/tools/makewindows content/tools/map content/tools/maskfasta content/tools/merge content/tools/multicov content/tools/multiinter content/tools/nuc content/tools/overlap content/tools/pairtobed content/tools/pairtopair content/tools/random content/tools/reldist content/tools/shuffle content/tools/slop content/tools/sort content/tools/subtract content/tools/tag content/tools/unionbedg content/tools/window content/example-usage content/advanced-usage content/tips-and-tricks content/faq content/related-tools }
Exception occurred:
File "/sw/lib/python2.7/site-packages/docutils/writers/manpage.py", line 865, in dedent
self._indent.pop()
IndexError: pop from empty list
This is the sphinx error log file:
# Sphinx version: 1.1.3
# Python version: 2.7.6
# Docutils version: 0.10 release
# Jinja2 version: 2.7.1
Traceback (most recent call last):
File "/sw/lib/python2.7/site-packages/sphinx/cmdline.py", line 189, in main
app.build(force_all, filenames)
File "/sw/lib/python2.7/site-packages/sphinx/application.py", line 204, in build
self.builder.build_update()
File "/sw/lib/python2.7/site-packages/sphinx/builders/__init__.py", line 191, in build_update
self.build(['__all__'], to_build)
File "/sw/lib/python2.7/site-packages/sphinx/builders/__init__.py", line 252, in build
self.write(docnames, list(updated_docnames), method)
File "/sw/lib/python2.7/site-packages/sphinx/builders/manpage.py", line 88, in write
docwriter.write(largetree, destination)
File "/sw/lib/python2.7/site-packages/docutils/writers/__init__.py", line 80, in write
self.translate()
File "/sw/lib/python2.7/site-packages/sphinx/writers/manpage.py", line 35, in translate
self.document.walkabout(visitor)
File "/sw/lib/python2.7/site-packages/docutils/nodes.py", line 174, in walkabout
if child.walkabout(visitor):
File "/sw/lib/python2.7/site-packages/docutils/nodes.py", line 174, in walkabout
if child.walkabout(visitor):
File "/sw/lib/python2.7/site-packages/docutils/nodes.py", line 174, in walkabout
if child.walkabout(visitor):
File "/sw/lib/python2.7/site-packages/docutils/nodes.py", line 174, in walkabout
if child.walkabout(visitor):
File "/sw/lib/python2.7/site-packages/docutils/nodes.py", line 174, in walkabout
if child.walkabout(visitor):
File "/sw/lib/python2.7/site-packages/docutils/nodes.py", line 174, in walkabout
if child.walkabout(visitor):
File "/sw/lib/python2.7/site-packages/docutils/nodes.py", line 174, in walkabout
if child.walkabout(visitor):
File "/sw/lib/python2.7/site-packages/docutils/nodes.py", line 187, in walkabout
visitor.dispatch_departure(self)
File "/sw/lib/python2.7/site-packages/docutils/nodes.py", line 1640, in dispatch_departure
return method(node)
File "/sw/lib/python2.7/site-packages/docutils/writers/manpage.py", line 411, in depart_admonition
self.depart_block_quote(node)
File "/sw/lib/python2.7/site-packages/docutils/writers/manpage.py", line 449, in depart_block_quote
self.dedent()
File "/sw/lib/python2.7/site-packages/docutils/writers/manpage.py", line 865, in dedent
self._indent.pop()
IndexError: pop from empty list
After creating a BAM file by aligning RNA sequence to a small file of coding sequences, bedtools genomecov -d did not report sequences with zero coverage over their entire length.
Is this a feature or a bug ?
$samtools idxstats ARC10_.unique.bam > ARC10_.unique.idxstats # get output of idxstats
$cat ARC10_.unique.idxstats | grep -v "^*" | wc -l # count the number of entries in idxstats
233
$cat ARC10_.unique.idxstats | cut -f3 | grep -v '^0' | wc -l # count the number of entries in idx stats without zero coverage
158
$ ./bedtools --version # show we are using the current version of bedtools
bedtools v2.19.1-41-g656fd84
$ ./bedtools genomecov -dz -ibam ARC10_.unique.bam > ARC10_.unique.depth # output all coverage of bedtools
$ cat ARC10_.unique.depth | cut -f1 | uniq | wc -l # count the number of different sequences in bedtools
158
I hope to get a small example FASTA and SAM file together to better demonstrate this. I hope the above is understandable.
While compiling the package on Gentoo Linux I get a warning about your coding style: ;-)
may exhibit random runtime failures.
For certain BED files, multicov will output "Could not open input BAM files." when given 2 files. However, when run with either of these files independently, it will run without problem. Below is a script to recreate the problem, tested on ed71c8e.
The BAMs show no errors with samtools flagstat $BAM.
wget ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeSydhHistone/wgEncodeSydhHistoneMcf7H3k27acUcdAlnRep1.bam
wget ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeSydhHistone/wgEncodeSydhHistoneMcf7H3k27me3bUcdAlnRep1.bam
echo $'chr1\t100\t200' | ./bin/bedtools multicov -bams \
wgEncodeSydhHistoneMcf7H3k27acUcdAlnRep1.bam \
wgEncodeSydhHistoneMcf7H3k27me3bUcdAlnRep1.bam \
-bed -
Many of the VectorOps methods (e.g., mean) can be computed in one pass of the data instead of the two passes currently used (i.e., one pass to loop through the hits in a HitSet and extract the relevant columns, and another to compute the result). This should be optimized and we should be loading pointers to strings, not strings.
Secondly, optionally constructing from a HitSet will allow us to quickly compute results in one pass for many cases.
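As an illustration of the one-pass idea for mean, the sketch below accumulates the sum and count while iterating over the hits, rather than first copying the column into a separate vector. The names are hypothetical and the hits are modeled as lists of string fields; this is not the VectorOps code.

```python
def one_pass_mean(hits, column):
    """Mean of a 1-based column, accumulated while iterating over the hits."""
    total = 0.0
    count = 0
    for fields in hits:
        total += float(fields[column - 1])
        count += 1
    return total / count if count else None


hits = [["chr1", "10", "20", "3"], ["chr1", "15", "30", "5"]]
print(one_pass_mean(iter(hits), column=4))  # 4.0
```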
W/ v2.19, ran:
bedtools sample -n 10
This gives a seg fault.