
luntergroup / octopus


Bayesian haplotype-based mutation calling

License: MIT License

C++ 90.88% C 6.70% CMake 0.88% Python 1.31% Dockerfile 0.01% Shell 0.02% JavaScript 0.14% CSS 0.05%
genomics variant-calling bioinformatics variants haplotypes somatic-variants de-novo-mutation phasing single-cell

octopus's Introduction



Octopus is a mapping-based variant caller that implements several calling models within a unified haplotype-aware framework. Octopus takes inspiration from particle filtering: it constructs a tree of haplotypes and sequentially prunes and extends the tree based on haplotype posterior probabilities. This allows octopus to implicitly consider all possible haplotypes at a given locus in reasonable time.

There are currently six calling models available:

  • individual: call germline variants in a single healthy individual.
  • population: jointly call germline variants in small cohorts.
  • trio: call germline and de novo mutations in a parent-offspring trio.
  • cancer: call germline and somatic mutations in tumour samples.
  • polyclone: call variants in samples with an unknown mixture of haploid clones, such as bacterial or viral samples.
  • cell: call variants in a set of single cell samples from the same individual.

Octopus calls SNVs, small- to medium-sized indels, and small complex rearrangements, and reports them in VCF 4.3.

Quick start

Install Octopus:

$ git clone https://github.com/luntergroup/octopus.git
$ octopus/scripts/install.py --dependencies --forests
$ echo 'export PATH='$(pwd)'/octopus/bin:$PATH' >> ~/.bash_profile
$ source ~/.bash_profile

Call some variants:

$ FOREST="$(pwd)/octopus/resources/forests/germline.v0.7.4.forest"
$ octopus -R hs37d5.fa -I NA12878.bam -T 1 to MT -o NA12878.octopus.vcf.gz --forest $FOREST --threads 8
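
As a rough sketch of how one of the six calling models listed earlier might be chosen on the command line, the helper below prints (but does not run) an octopus invocation. It assumes that the `-C` option, which appears in user-reported command lines such as `-C cancer`, selects the calling model by name. The function name `octopus_cmd` and the file names are hypothetical, and some models (for example trio or cancer) may need additional sample options that are not shown here.

```shell
# Calling models named in the README list.
MODELS="individual population trio cancer polyclone cell"

# Print (do not run) an octopus command line for a given model.
# Arguments: model name, reference FASTA, input BAM, output VCF.
octopus_cmd() {
    model="$1"; ref="$2"; bam="$3"; out="$4"
    case " $MODELS " in
        *" $model "*) ;;  # known model, carry on
        *) echo "unknown calling model: $model" >&2; return 1 ;;
    esac
    echo "octopus -R $ref -I $bam -C $model -o $out"
}

# Hypothetical file names, for illustration only.
octopus_cmd polyclone ref.fa sample.bam calls.vcf.gz
```

Printing the command first makes it easy to review the options before committing to a long run.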

Documentation

Documentation is hosted on GitHub pages.

Support

Please report any bugs or feature requests to the octopus issue tracker. General chat is hosted on Gitter.

Contributing

Contributions are very welcome! Please consult the contribution guidelines.

Authors

Daniel Cooke and Gerton Lunter

Citing

See publications associated with Octopus.

License

Octopus is distributed under the MIT LICENSE.

octopus's People

Contributors

alimanfoo, blmoore, bredelings, chapmanb, ctsa, gerton, gmagoon, jaredo, jbedo, justloggingin, nh13, roryk

octopus's Issues

Progress can be slow in repetitive regions

Progress can be very slow in repetitive regions with mismapped reads, because many candidates are proposed. Often these regions are eventually skipped because there are too many possible haplotypes. The problem is especially bad for somatic and de novo calling, where hours can be spent in a small region (< 1 kb). Some logic is already in place to avoid spending too much time in these regions, namely candidate generation masking and disabling haplotype lagging in regions with too many candidates, but these measures are clearly not sufficient. We need a new approach that detects these regions with high specificity.

Calling with cancer module and -t increases runtime multifold

Hi

I am trying to call a single IonTorrent-derived BAM file (BWA aligned) with this command line

octopus -R /opt/reference/hg19.fasta -I IonCode_0101.g.bam -C cancer --threads 8

And this is the estimate

[2019-02-22 11:45:38] <INFO> ---------------------------------------------------------------------------------
[2019-02-22 11:45:38] <INFO>          current           |                   |     time      |     estimated
[2019-02-22 11:45:38] <INFO>          position          |     completed     |     taken     |     ttc
[2019-02-22 11:45:38] <INFO> ---------------------------------------------------------------------------------
[2019-02-22 11:46:04] <INFO>            chr1:40799130             1.3%              26s           33m 20s
[2019-02-22 11:46:04] <INFO>           chr1:109819616             3.5%              26s           12m 10s
[2019-02-22 11:46:04] <INFO>           chr1:249250621             7.9%              26s             5m 8s
[2019-02-22 11:46:08] <INFO>            chr6:40663730             9.2%              29s            4m 58s
[2019-02-22 11:46:08] <INFO>            chr6:79994408            10.5%              30s            4m 22s
[2019-02-22 11:46:08] <INFO>           chr6:171115067            13.4%              30s            3m 17s
...

This is for 3,137,161,264 bp. If I add the -t option and decrease the length to 22,404 bp, the estimated time (and performance) changes quite radically:

[2019-02-22 10:55:51] <INFO>      position     |     completed     |     taken     |     ttc
[2019-02-22 10:55:51] <INFO> ------------------------------------------------------------------------
[2019-02-22 11:01:23] <INFO>  chr13:32893491             1.5%           5m 32s            6h 29m
[2019-02-22 11:05:17] <INFO>  chr13:32918884             3.3%           9m 26s            4h 45m
[2019-02-22 11:06:56] <INFO>  chr13:32900497             4.5%           11m 5s             4h 6m
[2019-02-22 11:07:45] <INFO>  chr13:32936832             5.4%          11m 54s            3h 36m
[2019-02-22 11:07:45] <INFO>  chr13:32936858             5.5%          11m 54s            3h 32m
[2019-02-22 11:12:10] <INFO>  chr13:32905239             6.5%          16m 19s             4h 5m
...

Any idea of what might be going on?

Thanks
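
One way to compare the two progress tables above is to convert each row into a throughput figure. The sketch below does that with `awk`, and also shows a simple linear extrapolation of remaining time. The linear extrapolation is an assumption made for this sanity check and is not necessarily how octopus computes its own "ttc" column. The function names are hypothetical.

```shell
# Bases processed per second, given total region size (bp),
# percent completed, and elapsed seconds.
bases_per_second() {
    awk -v total="$1" -v pct="$2" -v secs="$3" \
        'BEGIN { printf "%.1f\n", total * pct / 100 / secs }'
}

# Remaining seconds under a simple linear extrapolation (an assumption,
# not necessarily the estimator octopus uses).
remaining_seconds() {
    awk -v pct="$1" -v secs="$2" \
        'BEGIN { printf "%.0f\n", secs * (100 - pct) / pct }'
}

# Whole-genome run: 1.3% of 3,137,161,264 bp after 26 s.
bases_per_second 3137161264 1.3 26
# Targeted run: 1.5% of 22,404 bp after 5m 32s (332 s).
bases_per_second 22404 1.5 332
remaining_seconds 1.5 332
```

On these figures the whole-genome run advances at well over a million bases per second while the targeted run advances at roughly one base per second, so the slowdown is per base and not just a change in the estimate.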

N+1 calling (or gVCF calling)

Is the tool optimized to perform variant calling (exome or whole genome) on a large cohort of samples (>500) using the N+1 method? Is it possible to have outputs in gVCF format?

LD_LIBRARY_PATH needs to be set

After installation completed successfully, I tried running the binary

./bin/octopus

And got this error:

./bin/octopus: error while loading shared libraries: libhts.so.2: cannot open shared object file: No such file or directory

even though during installation it had said:

-- Found HTSlib 
--    HTSlib include dirs: /usr/local/include/htslib;/usr/include
--    HTSlib libraries: /usr/local/lib/libhts.so;/usr/lib/x86_64-linux-gnu/libz.so

So I solved this by:

export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH
./bin/octopus

and then it ran fine. I am happy to keep using it this way, but thought I'd point this out: your installer seems to look for libhts in several locations, but perhaps has no way of storing the path where it was found?
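
A quick way to confirm this kind of loader failure before running the binary is to inspect the output of `ldd`, which marks unresolved libraries with "not found". The sketch below is a minimal check under that assumption. The function names are hypothetical and the binary path should be adjusted to the actual install location.

```shell
# Read ldd output on stdin and print the names of unresolved libraries.
unresolved_from_ldd() {
    awk '/not found/ { print $1 }'
}

# List shared libraries the dynamic loader cannot resolve for a binary.
missing_libs() {
    ldd "$1" 2>/dev/null | unresolved_from_ldd
}

# Hypothetical install location; adjust to where octopus was built.
bin="./bin/octopus"
if [ -x "$bin" ]; then
    unresolved="$(missing_libs "$bin")"
    if [ -n "$unresolved" ]; then
        echo "Unresolved libraries: $unresolved"
        echo "Consider adding the HTSlib directory (e.g. /usr/local/lib) to LD_LIBRARY_PATH"
    else
        echo "All shared libraries resolved"
    fi
fi
```

If `libhts.so.2` is reported, exporting `LD_LIBRARY_PATH` as in the workaround above should make it resolvable.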

octopus appears to hang

Hello! I have been trying to run octopus (Ubuntu 16.04) on a small test example. I sampled ~100 kb either side of an interesting gene on each autosome from NA12878 to construct a small test BAM. The tool appears to start calling variants but then stops; CPU usage remains at 100% (and there is plenty of memory available). I also ran into the same issue with 30X WGS BAMs.

The mini-bam (hg19) is available here: https://dl.dropboxusercontent.com/u/15887058/NA12878.bam

$ software/octopus/bin/octopus  -I NA12878.bam -R genome.fa -o test.vcf.gz
[2016-10-14 09:38:37] <INFO> ------------------------------------------------------------------------
[2016-10-14 09:38:37] <INFO> octopus v0.1 alpha
[2016-10-14 09:38:37] <INFO> Copyright (c) 2016 University of Oxford
[2016-10-14 09:38:37] <INFO> ------------------------------------------------------------------------
Not a BGZF file: test.vcf.gz
[2016-10-14 09:38:37] <INFO> Done initialising calling components in 184ms
[2016-10-14 09:38:37] <INFO> Invoked calling model: individual
[2016-10-14 09:38:37] <INFO> Detected 1 sample: "NA12878"
[2016-10-14 09:38:37] <INFO> Writing calls to "test.vcf.gz"
[2016-10-14 09:38:37] <INFO> ------------------------------------------------------------------------
[2016-10-14 09:38:37] <INFO>      current      |                   |     time      |     estimated
[2016-10-14 09:38:37] <INFO>      position     |     completed     |     taken     |     ttc
[2016-10-14 09:38:37] <INFO> ------------------------------------------------------------------------
[2016-10-14 09:38:42] <INFO>  chr1:197001111             6.4%              5s           1.33m

it has been stuck in this state for ~6 hours.

Missing homozygous variant call

Describe the bug
Octopus appears to be missing a simple, fairly high coverage homozygous SNV call

Command
Command line to run octopus:

$ octopus  -R human_g1k_v37_decoy_phiXAdaptr.fasta -I region.bam -o region_germline_simple.vcf

Desktop (please complete the following information):

  • OS: CentOS
  • Version 7.4.1708
  • GRCh37 (with decoys and phiX adapter sequence)

Additional context
In testing against GIAB samples I came across a few false negative variants that appear readily in IGV, for instance there's a fairly clear homozygous C>G SNV at position 1:3318769 (obvious in attached screenshot). However, the variant does not appear in the VCF output. Other variants in the region are detected correctly. I'm probably missing something obvious here...
Happy to provide a small .bam file of the region as well - not sure how to include it here (it's about 100kb)

[Screenshot: missed_snv]

Any chance of updating the bioconda recipe to 0.6.3-beta?

Describe the bug
The conda recipe is four versions behind the current release, but it is the easiest way to get octopus installed and running alongside other bioinformatics tools. Any chance someone could update the recipe, and perhaps add that step to the checklist for future releases?

I did start updating the recipe myself, but a) was unsure about specifying some of the new dependencies and b) wasn't entirely sure how to deal with the patch file, since it has hardcoded paths that involve the version number. Sorry.

Adding AD Field in VCF _and_ info about VAF_CR field

@dancooke
possible label for that issue:Question

Hi Daniel,

In the VCF produced by Octopus 0.4.1-alpha, several fields are listed in the header for the FILTER, INFO and FORMAT columns.
I am specifically interested in somatic calls, and I was looking for the AD (Allele Depth) field but could not find it.
I am asking because I would like to calculate the VAF (Variant Allele Frequency, aka Allele Ratio) in both my normal and tumor samples.

Questions

  • Would it be possible to add that AD field?
  • Is the Field VAF_CR an equivalent to the regular AF field but instead of providing one value, it gives a range of expected AF for the ALT allele?
  • Could you give us more information about what exactly the VAF_CR field represents?

Thanks,
Best,
Chris
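
For context, the calculation the question is aiming at can be sketched as follows, assuming a VCF that does carry an AD value in the conventional layout (reference depth first, then one depth per ALT allele, comma-separated). The function name `vaf_from_ad` is hypothetical, and this is not a statement about what octopus itself reports.

```shell
# Compute the variant allele frequency of the first ALT allele from an
# AD string such as "30,10" (ref depth, alt depth, ...).
# Prints "NA" when the total depth is zero.
vaf_from_ad() {
    echo "$1" | awk -F',' '{
        total = 0
        for (i = 1; i <= NF; i++) total += $i
        if (total == 0) { print "NA"; next }
        printf "%.4f\n", $2 / total
    }'
}

vaf_from_ad "30,10"   # 10 alt reads out of 40 in total
```

A point estimate like this carries no uncertainty, which is one reason a range-style field can be more informative at low depth.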

Compilation dies at 100%

Took me a while to get it working, but eventually got this to compile on CentOS 7.5:

export HTSLIB_ROOT=/config/binaries/htslib/1.9; ./scripts/install.py --prefix=/config/binaries/octopus/0.5.2 -c /config/binaries/gcc/8.2/bin/gcc -cxx /config/binaries/gcc/8.2/bin/g++ --clean --boost /config/binaries/boost/1.68.0/

[100%] Linking CXX executable octopus
/tmp/ccJ9HFwD.ltrans2.ltrans.o: In function `boost::program_options::validation_error::validation_error(boost::program_options::validation_error::kind_t, std:
:__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<ch
ar> > const&, int) [clone .constprop.1896]':
<artificial>:(.text+0x15d3e): undefined reference to `boost::program_options::validation_error::get_template[abi:cxx11](boost::program_options::validation_err
or::kind_t)'
<artificial>:(.text+0x15d52): undefined reference to `boost::program_options::error_with_option_name::error_with_option_name(std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::b
asic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)'
/tmp/ccJ9HFwD.ltrans2.ltrans.o:(.rodata+0x540): undefined reference to `boost::program_options::error_with_option_name::substitute_placeholders(std::__cxx11::
basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'
/tmp/ccJ9HFwD.ltrans2.ltrans.o:(.rodata+0xbc0): undefined reference to `boost::program_options::error_with_option_name::substitute_placeholders(std::__cxx11::
basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'
/tmp/ccJ9HFwD.ltrans2.ltrans.o:(.rodata+0x1618): undefined reference to `boost::program_options::error_with_option_name::substitute_placeholders(std::__cxx11:
:basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'
/tmp/ccJ9HFwD.ltrans2.ltrans.o:(.rodata+0x2918): undefined reference to `boost::program_options::error_with_option_name::substitute_placeholders(std::__cxx11:
:basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'
/tmp/ccJ9HFwD.ltrans3.ltrans.o: In function `void boost::program_options::validate<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<cha
r> >, char>(boost::any&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_strin
g<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >
, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, int) [clone .constprop.1659]':
<artificial>:(.text+0xd8e6): undefined reference to `boost::program_options::validate(boost::any&, std::vector<std::__cxx11::basic_string<char, std::char_trai
ts<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::__cxx11::bas
ic_string<char, std::char_traits<char>, std::allocator<char> >*, int)'
/tmp/ccJ9HFwD.ltrans3.ltrans.o: In function `void boost::program_options::validate<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<cha
r> >, char>(boost::any&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_strin
g<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >
, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, int) [clone .constprop.1659] [clone .cold.12]':
<artificial>:(.text.unlikely+0x306): undefined reference to `boost::program_options::invalid_option_value::invalid_option_value(std::__cxx11::basic_string<cha
r, std::char_traits<char>, std::allocator<char> > const&)'
/tmp/ccJ9HFwD.ltrans3.ltrans.o: In function `void boost::program_options::validate<int, char>(boost::any&, std::vector<std::__cxx11::basic_string<char, std::c
har_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, int*, lon
g) [clone .constprop.1656] [clone .cold.21]':
<artificial>:(.text.unlikely+0x77f): undefined reference to `boost::program_options::invalid_option_value::invalid_option_value(std::__cxx11::basic_string<cha
r, std::char_traits<char>, std::allocator<char> > const&)'
/tmp/ccJ9HFwD.ltrans3.ltrans.o: In function `void boost::program_options::validate<octopus::MemoryFootprint, char>(boost::any&, std::vector<std::__cxx11::basi
c_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > 
> > const&, octopus::MemoryFootprint*, long) [clone .constprop.1631] [clone .cold.22]':
<artificial>:(.text.unlikely+0x878): undefined reference to `boost::program_options::invalid_option_value::invalid_option_value(std::__cxx11::basic_string<cha
r, std::char_traits<char>, std::allocator<char> > const&)'
/tmp/ccJ9HFwD.ltrans3.ltrans.o: In function `void boost::program_options::validate<float, char>(boost::any&, std::vector<std::__cxx11::basic_string<char, std:
:char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, float*,
 long) [clone .constprop.1625] [clone .cold.23]':
<artificial>:(.text.unlikely+0x999): undefined reference to `boost::program_options::invalid_option_value::invalid_option_value(std::__cxx11::basic_string<cha
r, std::char_traits<char>, std::allocator<char> > const&)'
/tmp/ccJ9HFwD.ltrans3.ltrans.o: In function `void boost::program_options::validate<boost::filesystem::path, char>(boost::any&, std::vector<std::__cxx11::basic
_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >
 > const&, boost::filesystem::path*, long) [clone .constprop.1621] [clone .cold.24]':
<artificial>:(.text.unlikely+0xa92): undefined reference to `boost::program_options::invalid_option_value::invalid_option_value(std::__cxx11::basic_string<cha
r, std::char_traits<char>, std::allocator<char> > const&)'
/tmp/ccJ9HFwD.ltrans3.ltrans.o:<artificial>:(.text.unlikely+0xc38): more undefined references to `boost::program_options::invalid_option_value::invalid_option
_value(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)' follow
/tmp/ccJ9HFwD.ltrans3.ltrans.o:(.rodata+0x70c10): undefined reference to `boost::program_options::value_semantic_codecvt_helper<char>::parse(boost::any&, std:
:vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char
>, std::allocator<char> > > > const&, bool) const'
/tmp/ccJ9HFwD.ltrans3.ltrans.o:(.rodata+0x70ca8): undefined reference to `boost::program_options::value_semantic_codecvt_helper<char>::parse(boost::any&, std:
:vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char
>, std::allocator<char> > > > const&, bool) const'
/tmp/ccJ9HFwD.ltrans3.ltrans.o:(.rodata+0x70d40): undefined reference to `boost::program_options::value_semantic_codecvt_helper<char>::parse(boost::any&, std:
:vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char
>, std::allocator<char> > > > const&, bool) const'
/tmp/ccJ9HFwD.ltrans3.ltrans.o:(.rodata+0x70dd8): undefined reference to `boost::program_options::value_semantic_codecvt_helper<char>::parse(boost::any&, std:
:vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char
>, std::allocator<char> > > > const&, bool) const'
/tmp/ccJ9HFwD.ltrans3.ltrans.o:(.rodata+0x70e70): undefined reference to `boost::program_options::value_semantic_codecvt_helper<char>::parse(boost::any&, std:
:vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char
>, std::allocator<char> > > > const&, bool) const'
/tmp/ccJ9HFwD.ltrans3.ltrans.o:(.rodata+0x70f08): more undefined references to `boost::program_options::value_semantic_codecvt_helper<char>::parse(boost::any&
, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_trait
s<char>, std::allocator<char> > > > const&, bool) const' follow
/tmp/ccJ9HFwD.ltrans29.ltrans.o: In function `void boost::log::v2_mt_posix::type_dispatcher::callback_base::trampoline<boost::log::v2_mt_posix::binder1st<boos
t::log::v2_mt_posix::output_fun, boost::log::v2_mt_posix::expressions::aux::stream_ref<boost::log::v2_mt_posix::basic_formatting_ostream<char, std::char_trait
s<char>, std::allocator<char> > >&>, std::__cxx11::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > >(void*, std::__cxx11::basic_str
ing<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&)':
<artificial>:(.text+0x3046): undefined reference to `boost::log::v2_mt_posix::aux::code_convert_impl(wchar_t const*, unsigned long, std::__cxx11::basic_string
<char, std::char_traits<char>, std::allocator<char> >&, unsigned long, std::locale const&)'
<artificial>:(.text+0x3085): undefined reference to `boost::log::v2_mt_posix::aux::code_convert_impl(wchar_t const*, unsigned long, std::__cxx11::basic_string
<char, std::char_traits<char>, std::allocator<char> >&, unsigned long, std::locale const&)'
<artificial>:(.text+0x3154): undefined reference to `boost::log::v2_mt_posix::aux::code_convert_impl(wchar_t const*, unsigned long, std::__cxx11::basic_string
<char, std::char_traits<char>, std::allocator<char> >&, unsigned long, std::locale const&)'
/tmp/ccJ9HFwD.ltrans29.ltrans.o: In function `boost::recursive_mutex::recursive_mutex()':
<artificial>:(.text+0x1462a): undefined reference to `boost::system::detail::generic_category_instance'
<artificial>:(.text+0x1467a): undefined reference to `boost::system::detail::generic_category_instance'
<artificial>:(.text+0x146cb): undefined reference to `boost::system::detail::generic_category_instance'
/tmp/ccJ9HFwD.ltrans29.ltrans.o: In function `boost::log::v2_mt_posix::sinks::synchronous_sink<boost::log::v2_mt_posix::sinks::text_file_backend>::try_consume
(boost::log::v2_mt_posix::record_view const&)':
<artificial>:(.text+0x21009): undefined reference to `boost::log::v2_mt_posix::sinks::text_file_backend::consume(boost::log::v2_mt_posix::record_view const&, 
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/tmp/ccJ9HFwD.ltrans29.ltrans.o: In function `boost::log::v2_mt_posix::sinks::synchronous_sink<boost::log::v2_mt_posix::sinks::text_file_backend>::consume(boo
st::log::v2_mt_posix::record_view const&)':
<artificial>:(.text+0x21221): undefined reference to `boost::log::v2_mt_posix::sinks::text_file_backend::consume(boost::log::v2_mt_posix::record_view const&, 
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/tmp/ccJ9HFwD.ltrans30.ltrans.o: In function `boost::program_options::typed_value<boost::filesystem::path, char>::name[abi:cxx11]() const':
<artificial>:(.text+0x172ea): undefined reference to `boost::program_options::arg[abi:cxx11]'
/tmp/ccJ9HFwD.ltrans30.ltrans.o: In function `boost::program_options::typed_value<bool, char>::name[abi:cxx11]() const':
<artificial>:(.text+0x1756a): undefined reference to `boost::program_options::arg[abi:cxx11]'
/tmp/ccJ9HFwD.ltrans30.ltrans.o: In function `boost::program_options::typed_value<int, char>::name[abi:cxx11]() const':
<artificial>:(.text+0x177ea): undefined reference to `boost::program_options::arg[abi:cxx11]'
/tmp/ccJ9HFwD.ltrans30.ltrans.o: In function `boost::program_options::typed_value<octopus::MemoryFootprint, char>::name[abi:cxx11]() const':
<artificial>:(.text+0x17a6a): undefined reference to `boost::program_options::arg[abi:cxx11]'
/tmp/ccJ9HFwD.ltrans30.ltrans.o: In function `boost::program_options::typed_value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, char>::name() const':
<artificial>:(.text+0x17cea): undefined reference to `boost::program_options::arg[abi:cxx11]'
/tmp/ccJ9HFwD.ltrans30.ltrans.o:<artificial>:(.text+0x17f6a): more undefined references to `boost::program_options::arg[abi:cxx11]' follow
/tmp/ccJ9HFwD.ltrans30.ltrans.o: In function `boost::program_options::typed_value<bool, char>::xparse(boost::any&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) const':
<artificial>:(.text+0x57c9): undefined reference to `boost::program_options::validate(boost::any&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool*, int)'
/tmp/ccJ9HFwD.ltrans30.ltrans.o: In function `boost::program_options::typed_value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, char>::xparse(boost::any&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) const':
<artificial>:(.text+0x58e9): undefined reference to `boost::program_options::validate(boost::any&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, int)'
/tmp/ccJ9HFwD.ltrans30.ltrans.o: In function `boost::program_options::typed_value<double, char>::xparse(boost::any&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) const [clone .cold.66]':
<artificial>:(.text.unlikely+0xf8d): undefined reference to `boost::program_options::invalid_option_value::invalid_option_value(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/tmp/ccJ9HFwD.ltrans30.ltrans.o: In function `boost::program_options::typed_value<std::vector<int, std::allocator<int> >, char>::xparse(boost::any&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) const [clone .cold.79]':
<artificial>:(.text.unlikely+0x18ff): undefined reference to `boost::program_options::invalid_option_value::invalid_option_value(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/tmp/ccJ9HFwD.ltrans30.ltrans.o:(.rodata+0xea8): undefined reference to `boost::program_options::error_with_option_name::substitute_placeholders(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'
/tmp/ccJ9HFwD.ltrans30.ltrans.o:(.rodata+0xf40): undefined reference to `boost::program_options::error_with_option_name::substitute_placeholders(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'
/tmp/ccJ9HFwD.ltrans31.ltrans.o: In function `boost::system::error_category::std_category::equivalent(std::error_code const&, int) const':
<artificial>:(.text+0xc47e): undefined reference to `boost::system::detail::generic_category_instance'
<artificial>:(.text+0xc4c5): undefined reference to `boost::system::detail::generic_category_instance'
<artificial>:(.text+0xc505): undefined reference to `boost::system::detail::generic_category_instance'
/tmp/ccJ9HFwD.ltrans31.ltrans.o: In function `boost::system::error_category::std_category::equivalent(int, std::error_condition const&) const':
<artificial>:(.text+0xc54d): undefined reference to `boost::system::detail::generic_category_instance'
<artificial>:(.text+0xc595): undefined reference to `boost::system::detail::generic_category_instance'
/tmp/ccJ9HFwD.ltrans31.ltrans.o: In function `boost::program_options::basic_command_line_parser<char>::basic_command_line_parser(int, char const* const*)':
<artificial>:(.text+0x12c8d): undefined reference to `boost::program_options::to_internal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
<artificial>:(.text+0x12cd0): undefined reference to `boost::program_options::detail::cmdline::cmdline(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)'
/tmp/ccJ9HFwD.ltrans31.ltrans.o: In function `octopus::options::conflicting_options(boost::program_options::variables_map const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)':
<artificial>:(.text+0x13614): undefined reference to `boost::program_options::abstract_variables_map::operator[](std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'
<artificial>:(.text+0x13645): undefined reference to `boost::program_options::abstract_variables_map::operator[](std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'
/tmp/ccJ9HFwD.ltrans31.ltrans.o: In function `boost::program_options::typed_value<std::vector<int, std::allocator<int> >, char>::name[abi:cxx11]() const':
<artificial>:(.text+0x14a1a): undefined reference to `boost::program_options::arg[abi:cxx11]'
/tmp/ccJ9HFwD.ltrans31.ltrans.o: In function `boost::program_options::typed_value<octopus::options::ExtensionLevel, char>::name[abi:cxx11]() const':
<artificial>:(.text+0x14c9a): undefined reference to `boost::program_options::arg[abi:cxx11]'
/tmp/ccJ9HFwD.ltrans31.ltrans.o: In function `boost::program_options::typed_value<std::vector<octopus::options::ContigPloidy, std::allocator<octopus::options::ContigPloidy> >, char>::name[abi:cxx11]() const':
<artificial>:(.text+0x14f1a): undefined reference to `boost::program_options::arg[abi:cxx11]'
/tmp/ccJ9HFwD.ltrans31.ltrans.o: In function `boost::program_options::typed_value<octopus::options::RefCallType, char>::name[abi:cxx11]() const':
<artificial>:(.text+0x1519a): undefined reference to `boost::program_options::arg[abi:cxx11]'
/tmp/ccJ9HFwD.ltrans31.ltrans.o: In function `boost::program_options::typed_value<float, char>::name[abi:cxx11]() const':
<artificial>:(.text+0x1541a): undefined reference to `boost::program_options::arg[abi:cxx11]'
/tmp/ccJ9HFwD.ltrans31.ltrans.o:<artificial>:(.text+0x1569a): more undefined references to `boost::program_options::arg[abi:cxx11]' follow
/tmp/ccJ9HFwD.ltrans31.ltrans.o: In function `boost::log::v2_mt_posix::sinks::synchronous_sink<boost::log::v2_mt_posix::sinks::basic_text_ostream_backend<char> >::try_consume(boost::log::v2_mt_posix::record_view const&)':
<artificial>:(.text+0x19139): undefined reference to `boost::log::v2_mt_posix::sinks::basic_text_ostream_backend<char>::consume(boost::log::v2_mt_posix::record_view const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/tmp/ccJ9HFwD.ltrans31.ltrans.o: In function `boost::log::v2_mt_posix::sinks::synchronous_sink<boost::log::v2_mt_posix::sinks::basic_text_ostream_backend<char> >::consume(boost::log::v2_mt_posix::record_view const&)':
<artificial>:(.text+0x19351): undefined reference to `boost::log::v2_mt_posix::sinks::basic_text_ostream_backend<char>::consume(boost::log::v2_mt_posix::record_view const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/tmp/ccJ9HFwD.ltrans31.ltrans.o: In function `octopus::options::run(boost::program_options::basic_command_line_parser<char>&) [clone .cold.74]':
<artificial>:(.text.unlikely+0x19c5): undefined reference to `boost::program_options::error_with_option_name::get_canonical_option_name[abi:cxx11]() const'
<artificial>:(.text.unlikely+0x1a52): undefined reference to `boost::program_options::error_with_option_name::get_canonical_option_name[abi:cxx11]() const'
/tmp/ccJ9HFwD.ltrans31.ltrans.o: In function `octopus::options::parse_config_file(boost::filesystem::path const&, boost::program_options::variables_map&, boost::program_options::options_description const&) [clone .cold.76]':
<artificial>:(.text.unlikely+0x2462): undefined reference to `boost::program_options::error_with_option_name::get_canonical_option_name[abi:cxx11]() const'
/tmp/ccJ9HFwD.ltrans31.ltrans.o:(.rodata+0x2240): undefined reference to `boost::program_options::value_semantic_codecvt_helper<char>::parse(boost::any&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool) const'
/tmp/ccJ9HFwD.ltrans31.ltrans.o:(.rodata+0x22d8): undefined reference to `boost::program_options::value_semantic_codecvt_helper<char>::parse(boost::any&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool) const'
/tmp/ccJ9HFwD.ltrans31.ltrans.o:(.rodata+0x2370): undefined reference to `boost::program_options::value_semantic_codecvt_helper<char>::parse(boost::any&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool) const'
/tmp/ccJ9HFwD.ltrans31.ltrans.o:(.rodata+0x2408): undefined reference to `boost::program_options::value_semantic_codecvt_helper<char>::parse(boost::any&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool) const'
/tmp/ccJ9HFwD.ltrans31.ltrans.o:(.rodata+0x24a0): undefined reference to `boost::program_options::value_semantic_codecvt_helper<char>::parse(boost::any&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool) const'
/tmp/ccJ9HFwD.ltrans31.ltrans.o:(.rodata+0x2538): more undefined references to `boost::program_options::value_semantic_codecvt_helper<char>::parse(boost::any&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool) const' follow
/tmp/ccJ9HFwD.ltrans31.ltrans.o: In function `main':
<artificial>:(.text.startup+0x1db): undefined reference to `boost::program_options::options_description::options_description(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned int)'
<artificial>:(.text.startup+0x383): undefined reference to `boost::program_options::options_description::options_description(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned int)'
<artificial>:(.text.startup+0x5c1): undefined reference to `boost::program_options::options_description::options_description(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned int)'
<artificial>:(.text.startup+0xb17): undefined reference to `boost::program_options::options_description::options_description(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned int)'
<artificial>:(.text.startup+0xd16): undefined reference to `boost::program_options::options_description::options_description(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned int)'
/tmp/ccJ9HFwD.ltrans31.ltrans.o:<artificial>:(.text.startup+0x119c): more undefined references to `boost::program_options::options_description::options_description(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned int)' follow
collect2: error: ld returned 1 exit status
make[2]: *** [src/octopus] Error 1
make[1]: *** [src/CMakeFiles/octopus.dir/all] Error 2
make: *** [all] Error 2

Realigned BAM not sorted

@dancooke

Hi Dan,

After successfully installing version 0.5.0-Beta (see my reply in #30, posted after the issue was closed), I ran the tool in the "tumor-normal somatic" mode with the --bamout option added to the command line.
Two BAM files were generated, but only one, the normal BAM, was indexed.

I tried to index the tumor BAM file manually, but this failed because the BAM was not sorted properly. I sorted it manually and then indexed it successfully.

Question: Would it be possible for Octopus to automatically sort and index the tumor realigned BAM file, and any other realigned BAM files, as well?

Thanks,
Chris
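
For anyone hitting the same problem before this is handled automatically, the manual workaround described above can be scripted. The following is a minimal sketch, assuming samtools is installed and on the PATH; the function names and the ".sorted.bam" naming scheme are hypothetical and not part of Octopus.

```python
import subprocess
from pathlib import Path


def sort_and_index_commands(bam_path):
    """Build the samtools commands that coordinate-sort and then index a BAM."""
    bam = Path(bam_path)
    # Hypothetical naming scheme: tumour.bam -> tumour.sorted.bam
    sorted_bam = bam.with_name(bam.stem + ".sorted.bam")
    return [
        ["samtools", "sort", "-o", str(sorted_bam), str(bam)],
        ["samtools", "index", str(sorted_bam)],
    ]


def sort_and_index(bam_path):
    """Run the commands in order, stopping at the first failure."""
    for command in sort_and_index_commands(bam_path):
        subprocess.run(command, check=True)
```

Calling sort_and_index on each realigned BAM leaves the original file untouched and writes a sorted, indexed copy next to it.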

installation test requires boost 1.61 not boost 1.58

Hi,

I would like to try Octopus. Installation seems to be OK so far, but the tests did not work:
test/install.py

--> required Boost version is 1.61 (1.58 is installed, and it is the latest version available via apt)
-- Boost version: 1.58.0
-- Found the following Boost libraries:
-- system
-- filesystem
-- program_options
-- date_time
-- log_setup
-- log
-- iostreams
-- timer
-- thread
-- regex
-- chrono
-- atomic
-- Found HTSlib
-- HTSlib include dirs: /usr/local/include/htslib;/usr/include
-- HTSlib libraries: /usr/local/lib/libhts.so;/usr/lib/x86_64-linux-gnu/libz.so
CMake Error at /usr/share/cmake-3.5/Modules/FindBoost.cmake:1677 (message):
Unable to find the requested Boost libraries.

Boost version: 1.58.0

Boost include path: /usr/include

Detected version of Boost is too old. Requested version was 1.61 (or
newer).
Call Stack (most recent call first):
test/unit/CMakeLists.txt:61 (find_package)

-- Configuring incomplete, errors occurred!
See also "/usr/local/octopus/build/CMakeFiles/CMakeOutput.log".
See also "/usr/local/octopus/build/CMakeFiles/CMakeError.log".

Greets,
Anke

Variant filtering is not multithreaded

Currently, variant call filtering can only use a single thread, which can result in long runtimes. It should be fairly easy to support multithreaded threshold-based filtering. However, once more sophisticated filtering approaches have been developed (e.g. machine learning methods), filtering will no longer be map-reducible, because the calls can no longer be filtered independently.
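
To illustrate why threshold-based filtering parallelises easily, here is a minimal sketch in Python (not Octopus's C++ code). The call representation (a dict with a "qual" key), the threshold, and all function names are assumptions made for the example. Because each call is judged independently, the calls can be split into chunks, filtered separately, and concatenated in order.

```python
from concurrent.futures import ThreadPoolExecutor


def passes_threshold(call, min_qual):
    """Hypothetical hard filter: keep a call if its quality is high enough."""
    return call["qual"] >= min_qual


def filter_chunk(calls, min_qual):
    # Each call is judged on its own, so chunks need no shared state.
    return [call for call in calls if passes_threshold(call, min_qual)]


def parallel_filter(calls, min_qual=20.0, workers=4, chunk_size=1000):
    """Map: filter each chunk independently. Reduce: concatenate in order."""
    chunks = [calls[i:i + chunk_size] for i in range(0, len(calls), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so output order matches input order.
        filtered = list(pool.map(filter_chunk, chunks, [min_qual] * len(chunks)))
    return [call for chunk in filtered for call in chunk]
```

Python threads will not speed up this CPU-bound loop; the sketch only shows the structure. A filter that needs to see neighbouring calls or global statistics would break the independence that makes this split valid.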

Tumour-only calling is slow

Tumour calling is unreasonably slow with default settings (without using --fast or --very-fast). This is mostly due to the time taken by Variational Bayes model inference. This part of the code needs optimising.

"illegal hardware instruction" when running Octopus installed via Conda

Describe the bug
Following the official instructions, I successfully installed Octopus with conda.

$ wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash ./Miniconda3-latest-Linux-x86_64.sh -b -p venv
$ ./venv/bin/conda install -c conda-forge -c bioconda octopus

I get an error when trying to run Octopus.

Command
Command line to run octopus:

$ venv/bin/octopus -h
[1]    10679 illegal hardware instruction (core dumped)  venv/bin/octopus -h

Desktop (please complete the following information):

  • OS: [Ubuntu 14.04]
  • Version [latest, as of 6 September 2018]
  • Reference [N/A]

Additional context
Not sure if it helps:

  • Exactly the same behavior is observed when installing Octopus with another installation of Conda.
  • In Nautilus the octopus file is recognized as "shared library (application/x-sharedlib)"

Cannot parse regions that have colons in contig name

Some of the new extended hg38 HLA contigs cannot be parsed by octopus as they include colons (e.g. HLA-C*06:02:01:01). This causes errors when referring to these contigs with the --regions or --skip-region options which expect regions in the format contig[:begin][[-]end].
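
One possible way to resolve the ambiguity is to consult the reference's contig names while parsing: try the whole string as a contig name first, and otherwise split only on the last colon. The sketch below is in Python rather than Octopus's C++, and it is an assumption about how this could be done, not the fix that was adopted. The function name is hypothetical, and only the forms contig, contig:begin and contig:begin-end are handled.

```python
import re

# Matches "begin" or "begin-end" (digits only).
_SPAN = re.compile(r"^(\d+)(?:-(\d+))?$")


def parse_region(text, contig_names):
    """Return (contig, begin, end); begin and end are None when not given."""
    # A string that is exactly a known contig name is never split.
    if text in contig_names:
        return text, None, None
    # Otherwise only the last colon can separate contig from coordinates.
    contig, sep, span = text.rpartition(":")
    match = _SPAN.match(span)
    if not sep or contig not in contig_names or match is None:
        raise ValueError(f"cannot parse region: {text!r}")
    begin = int(match.group(1))
    end = int(match.group(2)) if match.group(2) is not None else None
    return contig, begin, end
```

With the contig set {"1", "HLA-C*06:02:01:01"}, the string "HLA-C*06:02:01:01:100-200" resolves to the HLA contig with begin 100 and end 200. A residual ambiguity remains if one contig name equals another name plus ":<digits>"; this sketch resolves that case in favour of the full contig name.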

REF contains IUPAC ambiguity symbols

If the reference genome contains IUPAC ambiguity symbols then these will appear in the REF column of the VCF output, but this is not permitted by the VCF specification. From VCF 4.3:

If the reference sequence contains IUPAC ambiguity codes not allowed by this specification (such as R = A/G),
the ambiguous reference base must be reduced to a concrete base by using the one that is first alphabetically
(thus R as a reference base is converted to A in VCF.)
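
The rule quoted above can be written as a small lookup table. This is a minimal Python sketch of the conversion, not Octopus's implementation; the names are hypothetical. It upper-cases the sequence, which is an assumption (VCF treats bases as case insensitive), and leaves N untouched because VCF allows N in REF.

```python
# IUPAC ambiguity code -> alphabetically first base it can represent.
IUPAC_FIRST_BASE = {
    "R": "A",  # A/G
    "Y": "C",  # C/T
    "S": "C",  # C/G
    "W": "A",  # A/T
    "K": "G",  # G/T
    "M": "A",  # A/C
    "B": "C",  # C/G/T
    "D": "A",  # A/G/T
    "H": "A",  # A/C/T
    "V": "A",  # A/C/G
}


def normalise_ref(ref):
    """Reduce ambiguity codes in a REF allele to concrete bases."""
    # A, C, G, T and N are not in the table and pass through unchanged.
    return "".join(IUPAC_FIRST_BASE.get(base, base) for base in ref.upper())
```

For example, normalise_ref("ACRTN") gives "ACATN".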

libhts.so.2 not found

I compiled the latest octopus and got this error:
octopus: error while loading shared libraries: libhts.so.2: cannot open shared object file: No such file or directory
I followed the instructions in the README.md but something seems to be broken.
To be more precise, I'm trying to install octopus in a docker image using Ubuntu 16.04 as a base image.

Here are the exact lines of code that I used inside of an Ubuntu 16.04 container:
apt-get update && apt-get -y install clang-3.8 libstdc++6 libboost-all-dev cmake wget git-all zlib1g-dev python3 autoconf
git clone https://github.com/samtools/htslib.git && cd htslib && autoheader && autoconf && ./configure && make -j4 && make install && cd /
git clone https://github.com/luntergroup/octopus.git
cd /octopus/build && cmake -D CMAKE_C_COMPILER=clang-3.8 -D CMAKE_CXX_COMPILER=clang++-3.8 -DINSTALL_ROOT=ON .. && make install
I have also tried using python to install octopus, but I got the same error message.

What could be the problem?
/Oskar

Random Forest files not accessible in Google Cloud

Describe the bug
The Wiki describes the availability of trained random forest models for variant filtering and a link is provided to the Google Cloud where these reside. However, these objects are not public and cannot be obtained.

Command
For example:

gsutil cp gs://luntergroup/octopus/forests/germline.forest octopus
ServiceException: 401 Anonymous caller does not have storage.objects.get access to luntergroup/octopus/forests/germline.forest.

Desktop (please complete the following information):

  • OS: Ubuntu 14.04.5 LTS
  • version: v0.5.2-beta
  • Reference [e.g. hg19]: hs37d5

Additional context
I installed octopus on Linux using conda, and can't use the Python install script (it doesn't work).

Installation issue

Hello, I am having trouble installing octopus and would really appreciate any help.

I use the following versions:

  • htslib 1.9-118-g2da4c7d
  • boost 1.65
  • gcc version 6.5.0
  • python 3.4.6

I ran the following command (with "/path/to/" used for simplicity):

python3 install.py -cxx /path/to/gcc-6.5.0/bin/g++ -c /path/to/gcc-6.5.0/bin/gcc --boost /path/to/boost_1_65_1 --htslib /path/to/htslib

and received the following error:

anonymous}::overlap_range(std::vector<octopus::ReadPileup>&, const octopus::AlignedRead&)':
octopus/src/basics/read_pileup.cpp:116:12: error: use of 'auto octopus::{anonymous}::overlap_range(std::vector<octopus::ReadPileup>&, const octopus::AlignedRead&)' before deduction of 'auto'
     return overlap_range(std::begin(pileups), std::end(pileups), contig_region(read), BidirectionallySortedTag {});
            ^~~~~~~~~~~~~
octopus/src/basics/read_pileup.cpp: In function 'octopus::ReadPileups octopus::make_pileups(const ReadContainer&, const octopus::GenomicRegion&)':
octopus/src/basics/read_pileup.cpp:129:61: error: forming reference to void
         for (ReadPileup& pileup : overlap_range(result, read)) {
                                                             ^

octopus-0.5.2-beta does not compile, Centos 6.9

Greetings. We can't get this build process to work. Version octopus-0.5.2-beta. At milestone "[ 26%]", the output includes many of these errors:

/usr/local/boost_1_68_0/include/boost/container/detail/flat_tree.hpp:924:25: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
value_type &val = *static_cast<value_type *>(static_cast<void *>(v.data));

The environment includes source builds gcc-6.4.0, binutils 2.29.1, htslib 1.8, python 3.6.1.
The build call resembled: ./scripts/install.py --cxx_compiler /usr/local/gcc-6.4.0/bin/gcc --prefix /usr/local/octopus-0.5.2-beta/bin

Do you have any leads?

Early crash in multithreaded mode

Describe the bug
The program fails early when run in multithreaded mode (both for debug and release builds). Note that the input BAM file only has reads mapped to chromosome 1. Stack trace of the crashed thread below for the debug build.

When changing the --thread argument to --thread 1, the problem does not occur.

* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x4)
  * frame #0: 0x0000000102c9ee1c libhts.2.dylib`bgzf_read_block + 1108
    frame #1: 0x0000000102c9f973 libhts.2.dylib`bgzf_read + 69
    frame #2: 0x0000000102cb5f9b libhts.2.dylib`bam_read1 + 51
    frame #3: 0x0000000102cb6e20 libhts.2.dylib`bam_readrec + 34
    frame #4: 0x0000000102cafdf1 libhts.2.dylib`hts_itr_next + 243
    frame #5: 0x0000000100306c21 octopus_DEBUG`octopus::io::HtslibSamFacade::HtslibIterator::operator++() + 161
    frame #6: 0x000000010030a5a0 octopus_DEBUG`octopus::io::HtslibSamFacade::extract_read_positions(octopus::GenomicRegion const&, unsigned long) const + 272
    frame #7: 0x000000010030b122 octopus_DEBUG`octopus::io::HtslibSamFacade::extract_read_positions(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, octopus::GenomicRegion const&, unsigned long) const + 370
    frame #8: 0x000000010030bcc6 octopus_DEBUG`octopus::io::HtslibSamFacade::extract_read_positions(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, octopus::GenomicRegion const&, unsigned long) const + 390
    frame #9: 0x000000010036f232 octopus_DEBUG`octopus::io::ReadReader::extract_read_positions(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, octopus::GenomicRegion const&, unsigned long) const + 146
    frame #10: 0x0000000100337852 octopus_DEBUG`octopus::io::ReadManager::find_covered_subregion(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, octopus::GenomicRegion const&, unsigned long) const + 898
    frame #11: 0x0000000101043f14 octopus_DEBUG`octopus::(anonymous namespace)::find_max_window(octopus::ContigCallingComponents const&, octopus::GenomicRegion const&) + 148
    frame #12: 0x0000000101043940 octopus_DEBUG`octopus::(anonymous namespace)::propose_call_subregion(octopus::ContigCallingComponents const&, octopus::GenomicRegion const&, boost::optional<unsigned int>) + 80
    frame #13: 0x0000000101042ca9 octopus_DEBUG`octopus::(anonymous namespace)::make_region_tasks(octopus::GenomicRegion const&, octopus::ContigCallingComponents const&, octopus::ExecutionPolicy, std::__1::queue<octopus::(anonymous namespace)::Task, std::__1::deque<octopus::(anonymous namespace)::Task, std::__1::allocator<octopus::(anonymous namespace)::Task> > >&, octopus::(anonymous namespace)::TaskMakerSyncPacket&, bool, bool) + 201
    frame #14: 0x00000001010429fe octopus_DEBUG`octopus::(anonymous namespace)::make_contig_tasks(octopus::ContigCallingComponents const&, octopus::ExecutionPolicy, std::__1::queue<octopus::(anonymous namespace)::Task, std::__1::deque<octopus::(anonymous namespace)::Task, std::__1::allocator<octopus::(anonymous namespace)::Task> > >&, octopus::(anonymous namespace)::TaskMakerSyncPacket&, bool) + 1022
    frame #15: 0x0000000101040914 octopus_DEBUG`octopus::(anonymous namespace)::make_tasks_helper(std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::queue<octopus::(anonymous namespace)::Task, std::__1::deque<octopus::(anonymous namespace)::Task, std::__1::allocator<octopus::(anonymous namespace)::Task> > >, octopus::(anonymous namespace)::ContigOrder, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, std::__1::queue<octopus::(anonymous namespace)::Task, std::__1::deque<octopus::(anonymous namespace)::Task, std::__1::allocator<octopus::(anonymous namespace)::Task> > > > > >&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, octopus::GenomeCallingComponents&, unsigned int, octopus::ExecutionPolicy, octopus::(anonymous namespace)::TaskMakerSyncPacket&) + 756
    frame #16: 0x000000010104c1bc octopus_DEBUG`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::queue<octopus::(anonymous namespace)::Task, std::__1::deque<octopus::(anonymous namespace)::Task, std::__1::allocator<octopus::(anonymous namespace)::Task> > >, octopus::(anonymous namespace)::ContigOrder, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, std::__1::queue<octopus::(anonymous namespace)::Task, std::__1::deque<octopus::(anonymous namespace)::Task, std::__1::allocator<octopus::(anonymous namespace)::Task> > > > > >&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, octopus::GenomeCallingComponents&, unsigned int, octopus::ExecutionPolicy, octopus::(anonymous namespace)::TaskMakerSyncPacket&), std::__1::reference_wrapper<std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::queue<octopus::(anonymous namespace)::Task, std::__1::deque<octopus::(anonymous namespace)::Task, std::__1::allocator<octopus::(anonymous namespace)::Task> > >, octopus::(anonymous namespace)::ContigOrder, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, std::__1::queue<octopus::(anonymous namespace)::Task, std::__1::deque<octopus::(anonymous namespace)::Task, std::__1::allocator<octopus::(anonymous namespace)::Task> > > > > > >, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, 
std::__1::char_traits<char>, std::__1::allocator<char> > > >, std::__1::reference_wrapper<octopus::GenomeCallingComponents>, unsigned int, octopus::ExecutionPolicy, std::__1::reference_wrapper<octopus::(anonymous namespace)::TaskMakerSyncPacket> > >(void*) + 1580
    frame #17: 0x00007fff5a79a305 libsystem_pthread.dylib`_pthread_body + 126
    frame #18: 0x00007fff5a79d26f libsystem_pthread.dylib`_pthread_start + 70
    frame #19: 0x00007fff5a799415 libsystem_pthread.dylib`thread_start + 13

Command
Command line to run octopus:

$ octopus --reads input.bam --reference hg38.fa --assemble-all --full-bamout --split-bamout realigned -o calls.vcf --thread

Desktop (please complete the following information):

  • OS:
    macOS 10.14.3

  • Version
    commit 785bfa2
    Author: Daniel Cooke
    Date: Sat Jan 26 22:35:05 2019 +0000

  • Reference
    ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa

core dumped (bioconda)

Hi,

I have installed octopus using bioconda, but when I try to run it, I just get a core dump. Does anybody have an idea what could be wrong?

Thanks!

[michaelk@fe1 ~]$ conda create -n octopus octopus -y
Fetching package metadata ...................
Solving package specifications: .

Package plan for installation in environment /home/michaelk/anaconda3/envs/octopus:

The following NEW packages will be INSTALLED:

    bzip2:           1.0.6-1            conda-forge
    ca-certificates: 2018.4.16-0        conda-forge
    curl:            7.59.0-1           conda-forge
    htslib:          1.7-0              bioconda
    icu:             58.2-0             conda-forge
    krb5:            1.14.6-0           conda-forge
    libgcc:          7.2.0-h69d50b8_2
    libgcc-ng:       7.2.0-hdf63c60_3
    libssh2:         1.8.0-2            conda-forge
    libstdcxx-ng:    7.2.0-hdf63c60_3
    octopus:         0.3.3a-htslib1.7_1 bioconda
    openssl:         1.0.2o-0           conda-forge
    xz:              5.2.3-0            conda-forge
    zlib:            1.2.11-0           conda-forge

[michaelk@fe1 ~]$ source activate octopus
(octopus) [michaelk@fe1 ~]$ octopus
Illegal instruction (core dumped)

Random Forest file not accessible

Describe the bug
Despite the related issue created by delavefm, random forest files are not accessible from the cloud (from the link in the manual or from my command below)

Command
gsutil cp gs://luntergroup/octopus/forests/somatic.v0.6.2-beta.forest .
ServiceException: 401 Anonymous caller does not have storage.objects.get access to luntergroup/octopus/forests/somatic.v0.6.2-beta.forest.

Desktop :
OS: Ubuntu 18.10
version: v0.6.2-beta
Reference hg19

HTSLIB not found

I'm getting HTSlib not found errors.

We have a number of versions of HTSlib installed in a non-standard path. We use Environment Modules to load HTSlib into the environment, and I loaded that module:

LIBRARY_PATH=/config/binaries/htslib/1.9/include:/config/binaries/boost/1.68.0/lib
LD_LIBRARY_PATH=/opt/rh/devtoolset-7/root/usr/lib64:/opt/rh/devtoolset-7/root/usr/lib:/opt/rh/devtoolset-7/root/usr/lib64/dyninst:/opt/rh/devtoolset-7/root/usr/lib/dyninst:/opt/rh/devtoolset-7/root/usr/lib64:/opt/rh/devtoolset-7/root/usr/lib:/config/binaries/htslib/1.9/lib
PATH=/opt/rh/devtoolset-7/root/usr/bin:/config/binaries/htslib/1.9/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/puppetlabs/bin:/root/bin:/root/bin

despite this, I still get the error:

_LMFILES_=/config/modulefiles/htslib/1.9
LOADEDMODULES=htslib/1.9

HTSlib_INCLUDE_DIR=HTSlib_INCLUDE_DIR-NOTFOUND
HTSlib_LIBRARY=HTSlib_LIBRARY-NOTFOUND
CMake Error at build/cmake/modules/FindHTSlib.cmake:67 (message):
  Required library HTSlib NOT FOUND.

  Install the library (dev version) and try again.  If the library is already
  installed, use ccmake to set the missing variables manually.
Call Stack (most recent call first):
  build/cmake/modules/FindHTSlib.cmake:137 (libfind_process)
  src/CMakeLists.txt:659 (find_package)


-- Configuring incomplete, errors occurred!
See also "/config/source/octopus/build/CMakeFiles/CMakeOutput.log".
See also "/config/source/octopus/build/CMakeFiles/CMakeError.log".

So I created the requested environment variables:

root@vmpr-res-utils:/config/source/octopus#
==> env | grep HTS
HTSlib_INCLUDE_DIR=/config/binaries/htslib/1.9/include
HTSlib_LIBRARY=/config/binaries/htslib/1.9/lib

But I got the same error again.

(also worth noting in that first code snippet that I've also got the boost module loaded, but I still need to pass --boost to the install script)

Am I putting the wrong values into HTSlib_* or do I need to use a different environment var?

Python install script is non standard, documentation needs clarification

Describe the bug

I'm about to install. I thought I'd use the python3 method.

Reading the instructions, looking for --prefix, find:

By default this installs to /bin relative to where you installed octopus. To install to a root directory (e.g. /usr/local/bin) use:

$ ./scripts/install.py --root

Not 100% sure what this means? Should I run

$ ./scripts/install.py --root /path/to/desired/prefix

Or do I not actually get a choice and

$ ./scripts/install.py --root

is my only option, and it requires that I install it into /usr/local/bin ?

If it's the former, would be great if it could be changed to the standard --prefix nomenclature.

Regardless of which, the documentation could be clearer.

Server (please complete the following information):

  • OS: CentOS
  • Version 7.5.5

Reducing runtime by a scatter/gather approach?

Hi,

Just a very quick and generic question.

Although it is not mentioned explicitly in the docs, I suppose that splitting the analysis into several genomic intervals and then merging the VCFs is a good way to reduce runtime. Is there any reason you would not recommend that with your tool?

Thanks,
Anthony
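
To make the question concrete, the scatter step could look like the following Python sketch, which splits each contig into fixed-size intervals formatted as contig:begin-end strings. The function name, the chunk size, and the coordinate convention (zero-based, half-open) are assumptions for illustration and should be checked against the Octopus documentation before use.

```python
def scatter_regions(contig_lengths, chunk_size):
    """Yield 'contig:begin-end' strings covering each contig in fixed-size chunks."""
    for contig, length in contig_lengths.items():
        for begin in range(0, length, chunk_size):
            # The last chunk of a contig is clipped to the contig length.
            end = min(begin + chunk_size, length)
            yield f"{contig}:{begin}-{end}"
```

Each interval would then be passed to a separate octopus run through the --regions option, and the per-interval VCFs merged afterwards with a tool such as bcftools concat. Whether calls near the interval boundaries are affected by the split is exactly the open question here.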

Octopus: Conda pre-built package aborts with assertion error

Describe the bug
Octopus - conda package throws an assertion error message

Command
Command line to run octopus:

$ /path/to/bin/octopus --reference /path/to/human_g1k_v37_decoy.fasta --reads /path/to/real.bam --output io-files/output.vcf --threads=4

Desktop (please complete the following information):

  • OS: Red Hat Enterprise Linux Server
  • Version: 7.2
  • Reference: hg19

Additional context
FYI, took a look at the Conda build (https://github.com/bioconda/bioconda-recipes/blob/e7805136f0152d62f9c9e932826a43194995fd25/recipes/octopus/build.sh) and seems like you have set that flag already.

Errors with tumor-only variant calling (-C option)

Hi,
I'm testing Octopus with tumor-only somatic variant calling, but I got the following error. Could you help me fix it? Thanks.

$ octopus -R GRCh38.d1.vd1.fa -I input.bam -C cancer -o ouput.vcf
[2018-02-23 20:54:45] ------------------------------------------------------------------------
[2018-02-23 20:54:45] octopus v0.3.3-alpha
[2018-02-23 20:54:45] Copyright (c) 2017 University of Oxford
[2018-02-23 20:54:45] ------------------------------------------------------------------------
[2018-02-23 20:54:47] Tumour only calling requested. Please note this feature is still under development and results and runtimes may be poor
[2018-02-23 20:54:49] Done initialising calling components in 4s
[2018-02-23 20:54:49] Detected 1 sample: "TCGA-FA-8693-01A-11D-2397-10"
[2018-02-23 20:54:49] Invoked calling model: cancer
[2018-02-23 20:54:49] Processing 3,107,576,521bp with a single thread
[2018-02-23 20:54:49] Writing filtered calls to "/mnt/data/DLBC37/Octopus/TCGA-FA-8693-01A-11D-2397-10_Illumina_gdc_realn.vcf"
[2018-02-23 20:54:49] -------------------------------------------------------------------------------------
[2018-02-23 20:54:49] current | | time | estimated
[2018-02-23 20:54:49] position | completed | taken | ttc
[2018-02-23 20:54:49] -------------------------------------------------------------------------------------
octopus: /opt/conda/conda-bld/octopus_1518006328171/work/octopus-0.3.3-alpha/src/core/tools/vargen/utils/assembler.cpp:1040: void octopus::coretools::Assembler::remove_all_nonreference_cycles(bool): Assertion `last_vertex_itr != std::cend(reference_vertices_)' failed.
Aborted (core dumped)

Assertion failed: (v.alt.size() > repeat.period), function complete_partial_alt_repeat

Describe the bug
The program aborts with the following backtrace when run on a particular input BAM (I haven't had a chance to track down the region that's causing the problem; it's around the start of chromosome 1).

Assertion failed: (v.alt.size() > repeat.period), function complete_partial_alt_repeat, file /Users/olivier/octopus/src/core/tools/vargen/local_reassembler.cpp, line 740.
Process 50566 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x00007fff5a6e623e libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff5a6e623e <+10>: jae    0x7fff5a6e6248            ; <+20>
    0x7fff5a6e6240 <+12>: movq   %rax, %rdi
    0x7fff5a6e6243 <+15>: jmp    0x7fff5a6e03b7            ; cerror_nocancel
    0x7fff5a6e6248 <+20>: retq   
Target 0: (octopus) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff5a6e623e libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff5a79cc1c libsystem_pthread.dylib`pthread_kill + 285
    frame #2: 0x00007fff5a64f1c9 libsystem_c.dylib`abort + 127
    frame #3: 0x00007fff5a617868 libsystem_c.dylib`__assert_rtn + 320
    frame #4: 0x0000000100e93eb6 octopus`octopus::coretools::complete_partial_alt_repeat(octopus::coretools::Assembler::Variant&, octopus::coretools::Repeat const&) + 246
    frame #5: 0x0000000100e9592d octopus`octopus::coretools::try_to_split_repeats(octopus::coretools::Assembler::Variant&, std::__1::vector<octopus::coretools::Repeat, std::__1::allocator<octopus::coretools::Repeat> > const&) + 4397
    frame #6: 0x0000000100e987bd octopus`octopus::coretools::decompose_complex(octopus::coretools::Assembler::Variant, std::__1::vector<octopus::coretools::Repeat, std::__1::allocator<octopus::coretools::Repeat> > const&) + 45
    frame #7: 0x0000000100e99008 octopus`octopus::coretools::decompose(octopus::coretools::Assembler::Variant, std::__1::vector<octopus::coretools::Repeat, std::__1::allocator<octopus::coretools::Repeat> > const&) + 88
    frame #8: 0x0000000100ed4f24 octopus`auto octopus::coretools::decompose(std::__1::__deque_iterator<octopus::coretools::Assembler::Variant, octopus::coretools::Assembler::Variant*, octopus::coretools::Assembler::Variant&, octopus::coretools::Assembler::Variant**, long, 73l>, std::__1::__deque_iterator<octopus::coretools::Assembler::Variant, octopus::coretools::Assembler::Variant*, octopus::coretools::Assembler::Variant&, octopus::coretools::Assembler::Variant**, long, 73l>, std::__1::vector<octopus::coretools::Repeat, std::__1::allocator<octopus::coretools::Repeat> > const&)::$_9::operator()<octopus::coretools::Assembler::Variant>(octopus::coretools::Assembler::Variant&&) const + 68
    frame #9: 0x0000000100e99512 octopus`octopus::coretools::decompose(std::__1::__deque_iterator<octopus::coretools::Assembler::Variant, octopus::coretools::Assembler::Variant*, octopus::coretools::Assembler::Variant&, octopus::coretools::Assembler::Variant**, long, 73l>, std::__1::__deque_iterator<octopus::coretools::Assembler::Variant, octopus::coretools::Assembler::Variant*, octopus::coretools::Assembler::Variant&, octopus::coretools::Assembler::Variant**, long, 73l>, std::__1::vector<octopus::coretools::Repeat, std::__1::allocator<octopus::coretools::Repeat> > const&) + 1234
    frame #10: 0x0000000100e9e7f0 octopus`octopus::coretools::decompose(std::__1::deque<octopus::coretools::Assembler::Variant, std::__1::allocator<octopus::coretools::Assembler::Variant> >&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 384
    frame #11: 0x0000000100e896f5 octopus`octopus::coretools::LocalReassembler::try_assemble_region(octopus::coretools::Assembler&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, octopus::GenomicRegion const&, std::__1::deque<octopus::Variant, std::__1::allocator<octopus::Variant> >&) const + 1477
    frame #12: 0x0000000100e88595 octopus`octopus::coretools::LocalReassembler::assemble_bin(unsigned int, octopus::coretools::LocalReassembler::Bin const&, std::__1::deque<octopus::Variant, std::__1::allocator<octopus::Variant> >&) const + 341
    frame #13: 0x0000000100e862e6 octopus`octopus::coretools::LocalReassembler::try_assemble_with_fallbacks(octopus::coretools::LocalReassembler::Bin const&, std::__1::deque<octopus::Variant, std::__1::allocator<octopus::Variant> >&) const + 470
    frame #14: 0x0000000100e82f6c octopus`octopus::coretools::LocalReassembler::do_generate(std::__1::vector<octopus::GenomicRegion, std::__1::allocator<octopus::GenomicRegion> > const&) const + 5292
    frame #15: 0x0000000100eee99a octopus`octopus::coretools::VariantGenerator::generate(octopus::GenomicRegion const&) const + 938
    frame #16: 0x000000010059204d octopus`octopus::Caller::generate_candidate_variants(octopus::GenomicRegion const&) const + 189
    frame #17: 0x00000001005913e5 octopus`octopus::Caller::call(octopus::GenomicRegion const&, octopus::ProgressMeter&) const + 2901
    frame #18: 0x000000010108cd56 octopus`octopus::(anonymous namespace)::run_octopus_on_contig(octopus::ContigCallingComponents&&) + 1478
    frame #19: 0x0000000101031498 octopus`octopus::(anonymous namespace)::run_octopus_single_threaded(octopus::GenomeCallingComponents&) + 424
    frame #20: 0x000000010102e602 octopus`octopus::run_calling(octopus::GenomeCallingComponents&) + 146
    frame #21: 0x0000000101033c13 octopus`octopus::run_variant_calling(octopus::GenomeCallingComponents&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) + 195
    frame #22: 0x0000000101039153 octopus`octopus::run_octopus(octopus::GenomeCallingComponents&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) + 179
    frame #23: 0x0000000100080a68 octopus`main + 824
    frame #24: 0x00007fff5a5a6ed9 libdyld.dylib`start + 1

Command
Command line to run octopus:

$ octopus run --reads input.bam --reference hg38.fa --assemble-all --full-bamout realigned -o calls.vcf 

Desktop (please complete the following information):

  • OS:
    macOS 10.14.3

  • Version
    commit 785bfa2
    Author: Daniel Cooke
    Date: Sat Jan 26 22:35:05 2019 +0000

  • Reference
    ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa

Illegal instruction (core dumped)

Dear Sir,
When I try to start the program, I get an Illegal instruction (core dumped) message.

Command line to run octopus:

cri@cri-To-be-filled-by-O-E-M: venv/bin/conda install -c conda-forge -c bioconda octopus
Solving environment: done

# All requested packages already installed.

cri@cri-To-be-filled-by-O-E-M: venv/bin/octopus -h
Illegal instruction (core dumped)
  • OS: [ Ubuntu 16.04 LTS]

Thanks for the help.

Octopus hangs after calling variants

I ran octopus for calling somatic variants with a normal/tumour pair and the variant calling appears to have finished, outputting the following line to stdout:

[2016-11-08 17:11:24] <INFO>                           -             100%           7.85h               -

Unfortunately the process has not exited and the final output file (as specified with -o flag) is still empty. It has been 12 hours since that last output line was printed so I doubt it will ever terminate. I attached to the process with GDB and obtained the following trace:

#0  0x00007fddcbd64f2c in hts_itr_query () from /nix/store/88syq7cidvhmgd54drajabja2qhjvcdv-htslib-1.3.2/lib/libhts.so.1
#1  0x00007fddcbd7340a in _reader_seek () from /nix/store/88syq7cidvhmgd54drajabja2qhjvcdv-htslib-1.3.2/lib/libhts.so.1
#2  0x00007fddcbd77853 in _reader_next_line () from /nix/store/88syq7cidvhmgd54drajabja2qhjvcdv-htslib-1.3.2/lib/libhts.so.1
#3  0x0000000000526ef8 in octopus::HtslibBcfFacade::count_records(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const ()
#4  0x00000000005487f5 in octopus::VcfReader::count_records(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const ()
#5  0x000000000055f39b in octopus::get_contig_count_map(std::vector<octopus::VcfReader, std::allocator<octopus::VcfReader> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) ()
#6  0x0000000000560bc9 in octopus::merge(std::vector<octopus::VcfReader, std::allocator<octopus::VcfReader> > const&, octopus::VcfWriter&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) ()
#7  0x000000000074226f in octopus::(anonymous namespace)::run_octopus_multi_threaded(octopus::GenomeCallingComponents&) ()
#8  0x000000000073badb in octopus::run_octopus(octopus::GenomeCallingComponents&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
#9  0x00000000004736b4 in main ()

Peeking in the temp directory seems to indicate that the BCF files haven't been touched since that last message was printed to stdout. I will leave the process alive for the moment in case you want more debugging output from GDB. Meanwhile, can I merge the temporary BCF files to obtain my variants?

program error on calling with hs38DH.fa

Just wanted to make note of an issue I encountered when calling with the entire hs38DH.fa (probably not a good idea, anyway). This is with trio calling and the --threads option. I suspect this may be related to the asterisk or colons in the HLA contig name.

[2018-12-11 23:56:22] <INFO>  chr19_GL949752v1_alt:987100            98.8%            1w 6d            3h 33m
[2018-12-12 00:07:43] <INFO>  chrUn_KN707874v1_decoy:2944            98.9%            1w 6d            3h 13m
[2018-12-12 00:13:29] <WARN> Skipping region chrUn_GL000214v1:117994-118601 as there are too many haplotypes
[2018-12-12 03:05:07] <INFO>      chrUn_KI270751v1:142912            99.0%            1w 6d            2h 55m
[2018-12-12 08:35:18] <INFO>               chr20:30136134            99.1%            1w 6d            2h 38m
[2018-12-13 04:14:52] <INFO>                            -             100%               2w                 -
[2018-12-13 04:23:25] <INFO> Removed 6734 temporary files
[2018-12-13 04:23:26] <EROR> A program error has occurred:
[2018-12-13 04:23:26] <EROR> 
[2018-12-13 04:23:26] <EROR>     Encountered an exception during calling 'failed to load contig
[2018-12-13 04:23:26] <EROR>     HLA-A*01:01:01:01'. This means there is a bug and your results are
[2018-12-13 04:23:26] <EROR>     untrustworthy.
[2018-12-13 04:23:26] <EROR> 
[2018-12-13 04:23:26] <EROR> To help resolve this error run in debug mode and send the log file to
[2018-12-13 04:23:26] <EROR> https://github.com/luntergroup/octopus/issues.
[2018-12-13 04:23:26] <INFO> ------------------------------------------------------------------------
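
The suspicion about colons can be illustrated without reference to octopus internals. The sketch below is a hypothetical, naive region parser (naive_parse_region is not an octopus function) that splits a contig:begin-end string at the last colon. It shows why a contig name such as HLA-A*01:01:01:01 is ambiguous under that convention. Whether octopus parses names this way is an assumption, not something established in this report.

```python
import re

def naive_parse_region(region):
    """Split 'contig:begin-end' at the last colon (hypothetical helper).

    Returns (contig, interval), with interval None when no numeric
    interval suffix is recognised.
    """
    contig, sep, interval = region.rpartition(":")
    if sep and re.fullmatch(r"\d+(-\d+)?", interval):
        return contig, interval
    return region, None

# An ordinary region string parses as intended.
print(naive_parse_region("chr20:100-200"))
# A bare HLA contig name is misread: its last ':01' looks like a position.
print(naive_parse_region("HLA-A*01:01:01:01"))
```

With this convention the trailing "01" of the HLA name is taken for a coordinate, so the parser would look up a contig called HLA-A*01:01:01, which does not exist in the reference.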

Going over OS open file limit is not handled nicely

When using multithreaded mode, octopus keeps temporary files open for every contig. For some reference genomes (e.g. hg38) this can easily push the number of open files over the default OS-defined open file limit (ulimit -n). When this occurs, octopus tends to fail inelegantly and doesn't report why the error occurred.
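
As a rough illustration of the kind of check that could be made up front, the sketch below compares a required number of open files against the soft limit reported by the OS. It uses Python's standard resource module (Unix only); fits_open_file_limit, the reserve of 16 descriptors, and the contig count in the usage line are all hypothetical and not taken from octopus.

```python
import resource

def fits_open_file_limit(files_needed, soft_limit=None, reserved=16):
    """Return True if files_needed plus a small reserve fits under the soft limit.

    soft_limit defaults to the current RLIMIT_NOFILE soft limit, i.e. the
    value shown by `ulimit -n`. `reserved` is an assumed allowance for
    stdin/stdout/stderr, input files, and similar.
    """
    if soft_limit is None:
        soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return files_needed + reserved <= soft_limit

# Hypothetical usage: one temporary file per contig for a 3000-contig reference.
if not fits_open_file_limit(3000):
    print("Open file limit too low; consider raising it with `ulimit -n`.")
```

A check like this would let the failure be reported with a clear message before any calling starts, rather than surfacing later as an unexplained error.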

Missing requirements

I needed the following in addition to what's listed in requirements to compile:

  • bzip2
  • lzma
  • curl

Joint calling is not fully supported

Currently, the population calling model uses an independence-based model, so samples are not jointly genotyped. This is still better than individually calling all samples and merging calls, as octopus can borrow haplotypes generated from other samples and will at least output consistent calls. However, it is not optimal, as this approach doesn't leverage genotype prior information across samples. Specifically, PopulationModel, found in src/core/models/genotype/population_model.hpp, needs implementing.

In addition, when more than a few samples are used for joint calling, the number of candidate variants can become intractably large. A better strategy would be to call each sample individually using a low variant posterior threshold, and then re-call all the samples together using only the called variants. This can easily be done manually, but it would be nice to automate the process.

Variant filtering does not distinguish somatic calls

CSR variant filtering is currently aimed at germline variant calls, which may result in over-filtering of somatic mutation calls. In particular, the default threshold filters variant alleles with extreme allele frequencies (AFB), which clearly is not ideal for somatic mutations. The mapping quality divergence filter (MQD) will also over-filter somatic variants with extreme allele frequencies.

Until a more appropriate solution is implemented, it is recommended to ignore the AFB and MQD threshold filters on any somatic variant calls.
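
One way to apply this recommendation after the fact is to post-process the VCF. The sketch below is a minimal, plain-text approach rather than anything octopus provides: relax_somatic_filters is a hypothetical helper, and it assumes the filters appear in the FILTER column under the IDs AFB and MQD and that somatic records carry a SOMATIC flag in INFO. It only touches the record-level FILTER column, not any per-sample FT values.

```python
def relax_somatic_filters(vcf_line, ignored=("AFB", "MQD")):
    """Drop the ignored filter IDs from FILTER on records flagged SOMATIC."""
    if vcf_line.startswith("#"):
        return vcf_line  # header lines pass through unchanged
    newline = "\n" if vcf_line.endswith("\n") else ""
    fields = vcf_line.rstrip("\n").split("\t")
    info_keys = {entry.split("=", 1)[0] for entry in fields[7].split(";")}
    if "SOMATIC" not in info_keys:
        return vcf_line  # germline records are left as they are
    kept = [name for name in fields[6].split(";") if name not in ignored]
    # If nothing but ignored filters failed, the record now passes.
    fields[6] = ";".join(kept) if kept else "PASS"
    return "\t".join(fields) + newline

record = "22\t18847501\t.\tA\tC\t14.52\tAFB;MQD\tAC=1;SOMATIC\n"
print(relax_somatic_filters(record), end="")
```

Records whose only failures were AFB or MQD become PASS; records that also failed other filters keep those other filters.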

Genotype GT and Phasing

@dancooke
label:question

Hi Daniel,

I have installed Octopus 0.4.1-alpha using miniconda and run it in tumor-normal somatic mode.
I have looked at the VCF and at the GT field in particular. I am a little bit befuddled when looking at the GT field for the tumor. I often see four (4) values in the GT, for instance:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR

22 18847497 . CA C 3 DP;FRF;GQ;MQ AC=1;AN=6;DP=2;MP=24.22;MQ=28;MQ0=0;NS=2;SOMATIC GT:GQ:DP:MQ:PS:PQ:VAF_CR:FT 0|0:1:1:31:18847497:31:.,.:GQ,DP **0|0|1|0**:1:1:25:18847497:31:0.01,0.42:GQ,MQ,DP,FRF
22 18847501 . A C 14.52 DP;GQ;MQ AC=1;AN=6;DP=3;MP=24.22;MQ=27;MQ0=0;NS=2;SOMATIC GT:GQ:DP:MQ:PS:PQ:VAF_CR:FT 0|0:5:1:31:18847497:31:.,.:GQ,DP **0|0|0|1**:5:2:25:18847497:31:0.09,0.64:GQ,MQ,DP
22 18847511 . A C 16.98 DP;FRF;GQ;MQ AC=1;AN=6;DP=4;MP=24.22;MQ=28;MQ0=0;NS=2;SOMATIC GT:GQ:DP:MQ:PS:PQ:VAF_CR:FT 0|0:4:2:31:18847497:31:.,.:GQ,DP **0|0|0|1**:4:2:25:18847497:31:0.09,0.64:GQ,MQ,DP,FRF

I read the paragraph << Why do octopus VCF files contain * and .? >> in the user manual, but I am still confused about GT representation in Octopus' vcf file. [ using --legacy did not fix that confusion ].
I understand the phasing, but not the values.
Here we can see that positions 501 and 511 have been phased together and are in the same phase, whereas 497 is not in the same phase block.
What I do not understand in that example is why we have three (3) zeros.
I added a screenshot of IGV for the region given as an example above; Based on that example, I would have expected GT=0|1 in the Tumor and not 0|0|0|1 for positions 501 and 511.

Note: These calls are for illustration purposes only. In real conditions, they would be filtered out of our final list of variants.
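
One thing that can be checked directly from the records above is the allele count. The sketch below (parse_gt is a hypothetical helper, not part of octopus) splits a GT string into its alleles and confirms that the two alleles of the NORMAL genotype plus the four alleles of the TUMOR genotype add up to the AN=6 reported in INFO. It says nothing about what the extra tumour alleles mean; that is the question being asked.

```python
import re

def parse_gt(gt):
    """Return (alleles, phased) for a VCF GT string such as '0|0|0|1'."""
    alleles = re.split(r"[|/]", gt)
    phased = "|" in gt and "/" not in gt
    return alleles, phased

normal_alleles, normal_phased = parse_gt("0|0")
tumor_alleles, tumor_phased = parse_gt("0|0|0|1")

# 2 alleles in NORMAL + 4 alleles in TUMOR = 6, matching AN=6 in the records.
total_alleles = len(normal_alleles) + len(tumor_alleles)
print(total_alleles, tumor_phased)
```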

Here are my questions:

  1. Could you explain in further detail the Octopus specification for the GT field, and the 0|0|0|1 and 0|0|1|0 values in the example above in particular?
  2. If all calls contain a PS value, does a PS value that is unique across the VCF file mean that the call was not actually phased with any other call, and should therefore have a slash rather than a pipe character in the GT?
  3. If a call is not phased, could Octopus omit the PS field for it (giving a smaller VCF), while of course keeping it for the phased calls?
  4. Does the --legacy option only affect the GT field by converting some "0" values to a dot ("."), or does it also affect other fields in the VCF?

Thanks for helping the community understand Octopus better.
Best,
Chris

[IGV screenshot: octopus_call_4gt anno]

--bamout feature doesn't split output

Thanks a lot for developing this great software.
Really liked the interface and speed.

Describe the bug

I tried running octopus on both single-end and paired-end BAM files with the --bamout option to get split phased BAM files. If I use a prefix not ending in .bam, the output is saved to that prefix in SAM format.
If I use the name of the directory without the name of a file, no output BAM is saved at all.

Command
Command line to run octopus:

$ octopus -R WholeGenomeFasta/genome.fa -I SCC-090/SCC-090_sortedRG.bam -o SCC-090/SCC-090_Octopus.vcf --bamout SCC-090
## Doesn't save any BAM or SAM, giving the error "The bam file you specified is not writable"
$ octopus -R WholeGenomeFasta/genome.fa -I SCC-090/SCC-090_sortedRG.bam -o SCC-090/SCC-090_Octopus.vcf --bamout SCC-090/SCC-090_SplittedBamOctopus
## Saves one SAM to SCC-090/SCC-090_SplittedBamOctopus

Desktop (please complete the following information):

  • OS: CentOS 7
  • Version octopus/0.5.2b
  • Reference GRCh38

Population calling sample limit

Hello,

I just came across your biorxiv preprint, it was very easy to install via conda, and I love the extensive documentation!

I am wondering if there is a limit to the population size when calling variants in population mode. I currently have 330 samples (100Mb genome, coverage per sample ranges from 20X-1000X, though I usually downsample high coverage samples to 100X), but this number will increase over time.

Thanks

htslib not found by cmake

Hi Daniel and the Octopus' Contributors

Describe the bug
htslib directory is not found by cmake.

Command
Command line to install octopus:

$ cmake -DBOOST_ROOT=/tmp/boost_1.68 -DHTSLIB_ROOT:PATH=/tmp/htslib-1.4 .. 

Desktop (please complete the following information):

  • OS: Linux CentOS 7.2
  • Version Octopus [obtained by running git clone as specified in the Octopus README]
  • Reference None

Additional context
Before running the command above and getting the error below, we installed ALL the requirements in the /tmp/ folder, i.e.:
Git 2.18.0
Boost 1.68
htslib 1.4
CMake 3.12.1
gcc 8.2

Actions we performed

part1
Then, we added the directories of gcc 8.2, cmake 3.12.1, boost 1.68 and htslib 1.4 to the PATH, LD_LIBRARY_PATH and LIBRARY_PATH variables.
part2
We also created the environment variables as follows:
export HTSLIB_ROOT= /tmp/htslib-1.4
and also
export HTSLIB_DIR= /tmp/htslib-1.4
Then we ran the cmake command as described above. We still got the error message below.
part3
We also added the file "HTSlibConfig.cmake" to the /tmp/htslib-1.4 directory.
cmake found that file, but not the information it expected the file to contain. Since we did not know what information cmake needs in that file, this did not fix the issue.

Error message from cmake


CMake Error at src/CMakeLists.txt:564 (find_package):
By not providing "FindHTSlib.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "HTSlib", but
CMake did not find one.

Could not find a package configuration file provided by "HTSlib" (requested
version 1.4) with any of the following names:
HTSlibConfig.cmake
htslib-config.cmake
Add the installation prefix of "HTSlib" to CMAKE_PREFIX_PATH or set
"HTSlib_DIR" to a directory containing one of the above files. If "HTSlib"
provides a separate development package or SDK, be sure it has been
installed.
-- Configuring incomplete, errors occurred!


Questions

Could you help us fix this issue?
Could you provide the further information needed for cmake to find the HTSLIB folder?

Thanks.
Chris

Segfault occurring during somatic variant calling

Hello there,

I've been experiencing segfaults when trying to run octopus for certain chromosomes on my data.

For instance the command line:
/octopus/0.5.2/venv/bin/octopus -C individual -R /hs37d5/raw/hs37d5.fa -I /pipeline_output/Alignment/HCC38.bam /pipeline_output/Alignment/HCC38BL.bam --normal-sample HCC38BL --threads 4 --forest-file germline.forest --somatic-forest-file somatic.forest --regions-file 5.bed --working-directory /Octopus --output /Octopus/HCC38.octopus.chromosome_5.vcf --legacy

will cause a segfault, as pasted below:
[2019-01-28 10:48:09] 5:74985048 32.6% 1h 27m 3h 13m
[2019-01-28 10:48:16] 5:76607329 33.6% 1h 27m 3h 5m
[2019-01-28 10:48:20] 5:78351905 34.6% 1h 27m 2h 56m
[2019-01-28 10:48:39] 5:79287160 35.6% 1h 28m 2h 49m
[2019-01-28 10:49:01] 5:80150148 36.6% 1h 28m 2h 43m
[2019-01-28 10:49:48] 5:82649104 37.6% 1h 29m 2h 37m
[2019-01-28 10:50:13] 5:87499037 38.6% 1h 29m 2h 31m
[2019-01-28 10:50:27] 5:90070178 39.6% 1h 30m 2h 25m
[2019-01-28 10:50:53] 5:94040933 40.6% 1h 30m 2h 20m
[2019-01-28 10:51:12] 5:95145447 41.6% 1h 30m 2h 14m
[2019-01-28 10:51:47] 5:96322489 42.6% 1h 31m 2h 9m
[2019-01-28 10:53:18] 5:101707910 43.6% 1h 32m 2h 6m
Segmentation fault (core dumped)

Some chromosome BED files cause segfaults and some don't. 5.bed causes a segfault, 6.bed doesn't (see attached; renamed to .txt to allow attachment).

Any help would be appreciated

5.bed.txt
6.bed.txt

Memory problem

Hi! I am having an issue running Octopus somatic calling. It seems to be a memory problem: I cannot find suitable memory parameters, and the program crashes with code 139. I run it on an instance with 30 GB of RAM and 16 CPUs. The BAM files total 30 GB. Here is my command line:
octopus -R human_g1k_v37_decoy.fasta -I normal.bam tumor.bam -t example.bed -N normal --threads 16 -X 8000 -B 8

It always seems to reach about 5% completion and then crashes.
Here is the output from the log:

[2018-02-02 12:07:24] <INFO> ------------------------------------------------------------------------
[2018-02-02 12:07:24] <INFO> octopus v0.3.3-alpha
[2018-02-02 12:07:24] <INFO> Copyright (c) 2017 University of Oxford
[2018-02-02 12:07:24] <INFO> ------------------------------------------------------------------------
[2018-02-02 12:13:38] <INFO> Done initialising calling components in 6.23m
[2018-02-02 12:13:38] <INFO> Detected 2 samples: "Tumor" "Normal"
[2018-02-02 12:13:38] <INFO> Invoked calling model: cancer
[2018-02-02 12:13:38] <INFO> Processing 59,892,486bp with 16 threads
[2018-02-02 12:13:38] <INFO> Writing filtered calls to stdout
[2018-02-02 12:13:38] <INFO> ------------------------------------------------------------------------
[2018-02-02 12:13:38] <INFO>      current      |                   |     time      |     estimated   
[2018-02-02 12:13:38] <INFO>      position     |     completed     |     taken     |     ttc         
[2018-02-02 12:13:38] <INFO> ------------------------------------------------------------------------
[2018-02-02 12:14:22] <INFO>      1:16475621             1.0%             43s           1.18h
[2018-02-02 12:15:31] <INFO>      1:32145441             2.0%           1.86m           1.86h
[2018-02-02 12:16:58] <INFO>      1:45222003             3.0%           3.31m           2.35h
[2018-02-02 12:18:24] <INFO>      1:68954864             4.0%           4.75m           2.30h
[2018-02-02 12:18:28] <INFO>      1:75065676             4.1%           4.81m           2.23h
[2018-02-02 12:18:39] <INFO>      1:78384058             4.2%           5.00m           2.26h
[2018-02-02 12:21:06] <INFO>     1:111860800             5.2%           7.45m           2.73h

I also tried it on my computer with smaller BAM files, on chromosome 21 only, and it also crashes, with the message: killed.
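
For reference, shells conventionally report a process terminated by signal N as exit status 128 + N, so code 139 corresponds to signal 11 (SIGSEGV), while a "killed" message usually indicates SIGKILL, which is what the Linux out-of-memory killer sends. A small sketch for decoding such codes follows; describe_exit_code is a hypothetical helper and not part of octopus.

```python
import signal

def describe_exit_code(code):
    """Return the signal name implied by a shell exit status above 128, else None."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None  # not a signal number known on this platform

for status in (139, 137):
    print(status, describe_exit_code(status))
```

Seen this way, the two failures may have different causes: the first looks like a segmentation fault, the second like the process being killed externally, for instance for exceeding available memory.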

Can you help me choose the best parameters so the program works, or is there a bug?

Thanks!

Compile error with GCC 7 on CentOS 7

I'm trying to build Octopus on CentOS 7 but hitting an error with memory_footprint.cpp. Tried again with 0.3.2 and it's the same error. Pre-reqs: cmake/3.9.6, gcc/7.2.0 built with compatible ABI, boost/1.65.1 built with gcc/7.2.0, htslib/1.5 built with default gcc.

cmake -D CMAKE_C_COMPILER=/apps/gcc/7.2.0/bin/gcc -D CMAKE_CXX_COMPILER=/apps/gcc/7.2.0/bin/g++ ..

-- The C compiler identification is GNU 7.2.0
-- The CXX compiler identification is GNU 7.2.0
-- Check for working C compiler: /apps/gcc/7.2.0/bin/gcc
-- Check for working C compiler: /apps/gcc/7.2.0/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /apps/gcc/7.2.0/bin/g++
-- Check for working CXX compiler: /apps/gcc/7.2.0/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Build type is 
-- Linking against boost dynamic libraries
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE  
-- Boost version: 1.65.1
-- Found the following Boost libraries:
--   system
--   filesystem
--   program_options
--   date_time
--   log_setup
--   log
--   iostreams
--   timer
--   thread
--   regex
--   chrono
--   atomic
-- Found ZLIB: /usr/lib64/libz.so (found version "1.2.7") 
-- Found BZip2: /usr/lib64/libbz2.so (found version "1.0.6") 
-- Looking for BZ2_bzCompressInit
-- Looking for BZ2_bzCompressInit - found
-- Looking for lzma_auto_decoder in /usr/lib64/liblzma.so
-- Looking for lzma_auto_decoder in /usr/lib64/liblzma.so - found
-- Looking for lzma_easy_encoder in /usr/lib64/liblzma.so
-- Looking for lzma_easy_encoder in /usr/lib64/liblzma.so - found
-- Looking for lzma_lzma_preset in /usr/lib64/liblzma.so
-- Looking for lzma_lzma_preset in /usr/lib64/liblzma.so - found
-- Found LibLZMA: /usr/include (found version "5.1.2") 
-- Found CURL: /usr/lib64/libcurl.so (found version "7.29.0") 
-- Found OpenSSL: /usr/lib64/libcrypto.so (found version "1.0.1e") 
-- Found HTSlib 
--    HTSlib include dirs: /apps/htslib/1.5/include;/usr/include;/usr/include;/usr/include;/usr/include;/usr/include
--    HTSlib libraries: /apps/htslib/1.5/lib/libhts.a;/usr/lib64/libz.so;/usr/lib64/libbz2.so;/usr/lib64/liblzma.so;/usr/lib64/libcurl.so;/usr/lib64/libssl.so;/usr/lib64/libcrypto.so
-- IPO is supported!
-- Configuring done
-- Generating done
-- Build files have been written to: /apps/octopus/0.3.2-alpha/build


make 
...

[ 37%] Building CXX object src/CMakeFiles/octopus.dir/utils/memory_footprint.cpp.o
/apps/octopus/0.3.2-alpha/src/utils/memory_footprint.cpp: In function 'boost::optional<octopus::MemoryFootprint> octopus::parse_footprint(std::string)':
/apps/octopus/0.3.2-alpha/src/utils/memory_footprint.cpp:125:65: error: no matching function for call to 'std::basic_string<char>::erase(const __gnu_cxx::__normal_iterator<const char*, std::basic_string<char> >&, std::basic_string<char>::const_iterator)'
         footprint_str.erase(first_digit_itr, cend(footprint_str));
                                                                 ^
In file included from /opt/gridware/apps/gcc/7.2.0/include/c++/7.2.0/string:52:0,
                 from /apps/octopus/0.3.2-alpha/src/utils/memory_footprint.hpp:8,
                 from /apps/octopus/0.3.2-alpha/src/utils/memory_footprint.cpp:4:
/opt/gridware/apps/gcc/7.2.0/include/c++/7.2.0/bits/basic_string.h:4476:7: note: candidate: std::basic_string<_CharT, _Traits, _Alloc>& std::basic_string<_CharT, _Traits, _Alloc>::erase(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]
       erase(size_type __pos = 0, size_type __n = npos)
       ^~~~~
/opt/gridware/apps/gcc/7.2.0/include/c++/7.2.0/bits/basic_string.h:4476:7: note:   no known conversion for argument 1 from 'const __gnu_cxx::__normal_iterator<const char*, std::basic_string<char> >' to 'std::basic_string<char>::size_type {aka long unsigned int}'
/opt/gridware/apps/gcc/7.2.0/include/c++/7.2.0/bits/basic_string.h:4492:7: note: candidate: std::basic_string<_CharT, _Traits, _Alloc>::iterator std::basic_string<_CharT, _Traits, _Alloc>::erase(std::basic_string<_CharT, _Traits, _Alloc>::iterator) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>; std::basic_string<_CharT, _Traits, _Alloc>::iterator = __gnu_cxx::__normal_iterator<char*, std::basic_string<char> >; typename _Alloc::rebind<_CharT>::other::pointer = char*]
       erase(iterator __position)
       ^~~~~
/opt/gridware/apps/gcc/7.2.0/include/c++/7.2.0/bits/basic_string.h:4492:7: note:   candidate expects 1 argument, 2 provided
/opt/gridware/apps/gcc/7.2.0/include/c++/7.2.0/bits/basic_string.h:4512:7: note: candidate: std::basic_string<_CharT, _Traits, _Alloc>::iterator std::basic_string<_CharT, _Traits, _Alloc>::erase(std::basic_string<_CharT, _Traits, _Alloc>::iterator, std::basic_string<_CharT, _Traits, _Alloc>::iterator) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>; std::basic_string<_CharT, _Traits, _Alloc>::iterator = __gnu_cxx::__normal_iterator<char*, std::basic_string<char> >; typename _Alloc::rebind<_CharT>::other::pointer = char*]
       erase(iterator __first, iterator __last);
       ^~~~~
/opt/gridware/apps/gcc/7.2.0/include/c++/7.2.0/bits/basic_string.h:4512:7: note:   no known conversion for argument 1 from 'const __gnu_cxx::__normal_iterator<const char*, std::basic_string<char> >' to 'std::basic_string<char>::iterator {aka __gnu_cxx::__normal_iterator<char*, std::basic_string<char> >}'
make[2]: *** [src/CMakeFiles/octopus.dir/utils/memory_footprint.cpp.o] Error 1
make[1]: *** [src/CMakeFiles/octopus.dir/all] Error 2
make: *** [all] Error 2

Any thoughts on why it would not compile? Thanks!

cmake fails

Hi. Thanks for releasing this. Looks terrific!

While installing on Ubuntu linux (16.04), I got all the dependencies installed first (including the latest cmake 3.5.2, htslib etc), but it fails after initially chugging along fine.

Any suggestions on what I might be doing wrong?

/exports/software/octopus$ ./install.py 
-- Build type is Release
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Boost version: 1.58.0
-- Found the following Boost libraries:
--   system
--   filesystem
--   program_options
--   date_time
--   log_setup
--   log
--   iostreams
--   timer
--   thread
--   regex
--   chrono
--   atomic
-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.8") 
-- Found HTSlib 
--    HTSlib include dirs: /usr/local/include/htslib;/usr/include
--    HTSlib libraries: /usr/local/lib/libhts.so;/usr/lib/x86_64-linux-gnu/libz.so
-- Configuring done
-- Generating done
-- Build files have been written to: /exports/software/octopus/build
Scanning dependencies of target libdivsufsort
[  0%] Building C object lib/tandem/libdivsufsort/CMakeFiles/libdivsufsort.dir/divsufsort.c.o
[  1%] Building C object lib/tandem/libdivsufsort/CMakeFiles/libdivsufsort.dir/sssort.c.o
[  2%] Building C object lib/tandem/libdivsufsort/CMakeFiles/libdivsufsort.dir/trsort.c.o
[  3%] Building C object lib/tandem/libdivsufsort/CMakeFiles/libdivsufsort.dir/utils.c.o
[  4%] Linking C static library liblibdivsufsort.a
[  4%] Built target libdivsufsort
Scanning dependencies of target tandem
[  5%] Building CXX object lib/tandem/CMakeFiles/tandem.dir/tandem.cpp.o
[  6%] Linking CXX static library libtandem.a
[  6%] Built target tandem
Scanning dependencies of target octopus
[  7%] Building CXX object src/CMakeFiles/octopus.dir/main.cpp.o
[  7%] Building CXX object src/CMakeFiles/octopus.dir/config/config.cpp.o
[  8%] Building CXX object src/CMakeFiles/octopus.dir/config/common.cpp.o
[  9%] Building CXX object src/CMakeFiles/octopus.dir/config/option_parser.cpp.o
[ 10%] Building CXX object src/CMakeFiles/octopus.dir/config/option_collation.cpp.o
[ 11%] Building CXX object src/CMakeFiles/octopus.dir/config/octopus_vcf.cpp.o
[ 12%] Building CXX object src/CMakeFiles/octopus.dir/exceptions/error.cpp.o
[ 12%] Building CXX object src/CMakeFiles/octopus.dir/exceptions/missing_file_error.cpp.o
[ 13%] Building CXX object src/CMakeFiles/octopus.dir/exceptions/malformed_file_error.cpp.o
[ 14%] Building CXX object src/CMakeFiles/octopus.dir/exceptions/missing_index_error.cpp.o
[ 15%] Building CXX object src/CMakeFiles/octopus.dir/exceptions/unwritable_file_error.cpp.o
[ 16%] Building CXX object src/CMakeFiles/octopus.dir/basics/cigar_string.cpp.o
[ 17%] Building CXX object src/CMakeFiles/octopus.dir/basics/aligned_read.cpp.o
[ 17%] Building CXX object src/CMakeFiles/octopus.dir/basics/ploidy_map.cpp.o
[ 18%] Building CXX object src/CMakeFiles/octopus.dir/logging/logging.cpp.o
[ 19%] Building CXX object src/CMakeFiles/octopus.dir/logging/progress_meter.cpp.o
[ 20%] Building CXX object src/CMakeFiles/octopus.dir/logging/error_handler.cpp.o
[ 21%] Building CXX object src/CMakeFiles/octopus.dir/logging/main_logging.cpp.o
[ 22%] Building CXX object src/CMakeFiles/octopus.dir/io/reference/caching_fasta.cpp.o
[ 22%] Building CXX object src/CMakeFiles/octopus.dir/io/reference/fasta.cpp.o
[ 23%] Building CXX object src/CMakeFiles/octopus.dir/io/reference/reference_genome.cpp.o
[ 24%] Building CXX object src/CMakeFiles/octopus.dir/io/reference/threadsafe_fasta.cpp.o
[ 25%] Building CXX object src/CMakeFiles/octopus.dir/io/region/region_parser.cpp.o
[ 26%] Building CXX object src/CMakeFiles/octopus.dir/io/read/htslib_sam_facade.cpp.o
/exports/software/octopus/src/io/read/htslib_sam_facade.cpp: In member function ‘octopus::AlignedRead octopus::io::HtslibSamFacade::HtslibIterator::operator*() const’:
/exports/software/octopus/src/io/read/htslib_sam_facade.cpp:956:28: warning: narrowing conversion of ‘octopus::io::mapping_quality((* & info))’ from ‘unsigned int’ to ‘octopus::AlignedRead::MappingQuality {aka unsigned char}’ inside { } [-Wnarrowing]
             mapping_quality(info),
                            ^
/exports/software/octopus/src/io/read/htslib_sam_facade.cpp:973:28: warning: narrowing conversion of ‘octopus::io::mapping_quality((* & info))’ from ‘unsigned int’ to ‘octopus::AlignedRead::MappingQuality {aka unsigned char}’ inside { } [-Wnarrowing]
             mapping_quality(info),
                            ^
[ 27%] Building CXX object src/CMakeFiles/octopus.dir/io/read/read_manager.cpp.o
[ 27%] Building CXX object src/CMakeFiles/octopus.dir/io/read/read_reader.cpp.o
[ 28%] Building CXX object src/CMakeFiles/octopus.dir/io/variant/htslib_bcf_facade.cpp.o
/exports/software/octopus/src/io/variant/htslib_bcf_facade.cpp: In instantiation of ‘octopus::set_samples(const bcf_hdr_t*, bcf1_t*, const octopus::VcfRecord&, const std::vector<std::__cxx11::basic_string<char> >&)::<lambda(const auto:12&)> [with auto:12 = std::__cxx11::basic_string<char>]’:
/usr/include/c++/5/bits/stl_algo.h:3767:5:   required from ‘_Funct std::for_each(_IIter, _IIter, _Funct) [with _IIter = __gnu_cxx::__normal_iterator<const std::__cxx11::basic_string<char>*, std::vector<std::__cxx11::basic_string<char> > >; _Funct = octopus::set_samples(const bcf_hdr_t*, bcf1_t*, const octopus::VcfRecord&, const std::vector<std::__cxx11::basic_string<char> >&)::<lambda(const auto:12&)>]’
/exports/software/octopus/src/io/variant/htslib_bcf_facade.cpp:891:6:   required from here
/exports/software/octopus/src/io/variant/htslib_bcf_facade.cpp:845:45: internal compiler error: Segmentation fault
         const auto num_values = num_samples * static_cast<int>(source.format_cardinality(key));
                                             ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
src/CMakeFiles/octopus.dir/build.make:686: recipe for target 'src/CMakeFiles/octopus.dir/io/variant/htslib_bcf_facade.cpp.o' failed
make[2]: *** [src/CMakeFiles/octopus.dir/io/variant/htslib_bcf_facade.cpp.o] Error 1
CMakeFiles/Makefile2:216: recipe for target 'src/CMakeFiles/octopus.dir/all' failed
make[1]: *** [src/CMakeFiles/octopus.dir/all] Error 2
Makefile:116: recipe for target 'all' failed
make: *** [all] Error 2

cmake3 not found on CentOS

Describe the bug
Install fails if it can't find a suitable version of cmake.

The install script only searches for cmake.

On CentOS (and, I presume, Red Hat), cmake is version 2 and cmake3 is version 3.

Need to edit line 98 of scripts/install.py from

ret = call(["cmake"] + cmake_options + [".."])

to

ret = call(["cmake3"] + cmake_options + [".."])

and that check passes.
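
Rather than hard-coding one executable name, the script could probe for both. The following is only a sketch of that idea, not the project's actual install logic: the helper names `parse_cmake_version` and `find_cmake` are hypothetical, and the default minimum version of (3, 0) is a placeholder assumption that should be checked against the project's real requirement.

```python
import re
import shutil
import subprocess


def parse_cmake_version(version_output):
    """Return (major, minor) parsed from `cmake --version` text, or None."""
    match = re.search(r"version (\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def find_cmake(candidates=("cmake3", "cmake"), minimum=(3, 0)):
    """Return the path of the first candidate that is new enough, else None.

    `minimum` is a placeholder assumption, not the project's stated requirement.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is None:
            continue  # executable not on PATH under this name
        try:
            result = subprocess.run(
                [path, "--version"],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
                check=True,
            )
        except (OSError, subprocess.CalledProcessError):
            continue  # could not run it; try the next name
        version = parse_cmake_version(result.stdout)
        if version is not None and version >= minimum:
            return path
    return None
```

With something like this, the call on line 98 could use the returned path in place of the literal "cmake", and the script could report a clear error when `find_cmake()` returns None.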

Octopus writes out unicode characters in the VCF header

Describe the bug
Octopus writes out unicode characters in the VCF header. The leaning quotes in the source code around the word "called" actually make it into the VCF headers downstream. This causes the resulting output VCF to contain unicode text, and breaks reading with Pysam.

Here:

result.add_format("FT", "1", "String", "Sample genotype filter indicating if this genotype was “called”");

and here:

result.add_format("FT", "1", "String", "Sample genotype filter indicating if this genotype was “called”");

Here's the pysam error:

>>> vcf = pysam.VariantFile('test_data/raw_tool_vcfs/octopus.raw.vcf')
>>> print(vcf.header)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pysam/libcbcf.pyx", line 2042, in pysam.libcbcf.VariantHeader.__str__ (pysam/libcbcf.c:30689)
  File "pysam/libcutils.pyx", line 143, in pysam.libcutils.charptr_to_str_w_len (pysam/libcutils.c:3393)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1516: ordinal not in range(128)

And here's the offending header line:

##FORMAT=<ID=FT,Number=1,Type=String,Description="Sample genotype filter indicating if this genotype was “called”">
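
As a stopgap for anyone hitting this before a fix lands, the header can be checked and cleaned outside of Octopus. This is a minimal sketch using only the Python standard library; the function names are hypothetical and it assumes an uncompressed, UTF-8 encoded VCF. The curly quotes are mapped to ASCII single quotes rather than double quotes so that the quoted Description value stays well-formed. The left curly quote (U+201C) encodes in UTF-8 as bytes starting with 0xe2, which matches the byte in the pysam error above.

```python
# Map curly quotes to ASCII single quotes (double quotes would break Description="...").
ASCII_QUOTES = str.maketrans({
    "\u201c": "'",  # left double quotation mark
    "\u201d": "'",  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def non_ascii_header_lines(vcf_lines):
    """Yield (line_number, line) for header lines containing non-ASCII characters."""
    for number, line in enumerate(vcf_lines, start=1):
        if not line.startswith("#"):
            break  # the header ends at the first record line
        if any(ord(ch) > 127 for ch in line):
            yield number, line.rstrip("\n")


def asciify_quotes(line):
    """Return the line with curly quotes replaced by ASCII single quotes."""
    return line.translate(ASCII_QUOTES)
```

For a file on disk, the lines can be supplied with `open(path, encoding="utf-8")`; any header line that is still non-ASCII after `asciify_quotes` contains some other character and needs a separate look.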


Not a BGZF file

Hello again!

I am still receiving the above warning message. This is on an Ubuntu 16.04 VM. The output looks fine as far as I can tell.

$ octopus-0.1.4-alpha/bin/octopus -I ~/Dropbox/Public/NA12878.bam  -R /mnt/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa -o test.vcf.gz
[2016-11-01 17:19:17] <INFO> ------------------------------------------------------------------------
[2016-11-01 17:19:17] <INFO> octopus v0.1.4-alpha
[2016-11-01 17:19:17] <INFO> Copyright (c) 2016 University of Oxford
[2016-11-01 17:19:17] <INFO> ------------------------------------------------------------------------
Not a BGZF file: /home/joconnell/workspace/test.vcf.gz
[2016-11-01 17:19:17] <INFO> Done initialising calling components in 388ms
[2016-11-01 17:19:17] <INFO> Invoked calling model: individual
[2016-11-01 17:19:17] <INFO> Detected 1 sample: "NA12878"
[2016-11-01 17:19:17] <INFO> Writing calls to "/home/joconnell/workspace/test.vcf.gz"
[2016-11-01 17:19:17] <INFO> ------------------------------------------------------------------------
[2016-11-01 17:19:17] <INFO>      current      |                   |     time      |     estimated   
[2016-11-01 17:19:17] <INFO>      position     |     completed     |     taken     |     ttc         
[2016-11-01 17:19:17] <INFO> ------------------------------------------------------------------------
[2016-11-01 17:19:50] <INFO>  chr1:197003592             6.4%             32s           8.03m
[2016-11-01 17:20:08] <INFO>  chr2:136820959            10.8%             50s           7.06m
[2016-11-01 17:20:20] <INFO>  chr3:129195987            15.0%           1.05m           6.01m
[2016-11-01 17:20:48] <INFO>    chr4:3026996            15.1%           1.51m           8.63m
[2016-11-01 17:21:07] <INFO>  chr5:111993204            18.7%           1.81m           8.03m
[2016-11-01 17:21:30] <INFO>   chr6:26036096            19.5%           2.21m           9.21m
[2016-11-01 17:21:43] <INFO>  chr7:117071382            23.3%           2.41m           8.08m
[2016-11-01 17:21:58] <INFO>   chr8:67039906            25.5%           2.68m           7.95m
[2016-11-01 17:22:06] <INFO>  chr9:135715793            29.9%           2.80m           6.68m
[2016-11-01 17:22:16] <INFO> chr10:118949728            33.7%           2.96m           5.93m
[2016-11-01 17:22:36] <INFO>  chr11:27625597            34.6%           3.30m           6.33m
[2016-11-01 17:22:53] <INFO>  chr12:52856682            36.3%           3.60m           6.40m
[2016-11-01 17:23:11] <INFO>  chr13:32840198            37.4%           3.88m           6.60m
[2016-11-01 17:23:23] <INFO>  chr14:88349470            40.3%           4.08m           6.18m
[2016-11-01 17:23:39] <INFO>  chr15:48649338            41.8%           4.35m           6.16m
[2016-11-01 17:23:57] <INFO>   chr16:3723247            42.0%           4.66m           6.56m
[2016-11-01 17:23:58] <WARN> Skipping region chr16:3826935-3827292 as there are too many haplotypes
[2016-11-01 17:24:24] <INFO>  chr17:41144546            43.3%           5.10m           6.80m
[2016-11-01 17:24:47] <INFO>  chr18:21060436            44.0%           5.48m           7.11m
[2016-11-01 17:25:00] <INFO>  chr19:41852114            45.4%           5.70m           7.01m
[2016-11-01 17:25:14] <INFO>  chr20:10568782            45.7%           5.93m           7.20m
[2016-11-01 17:25:27] <INFO>  chr21:27201382            46.6%           6.15m           7.23m
[2016-11-01 17:25:57] <INFO>  chr22:42471627            48.0%           6.65m           7.40m
[2016-11-01 17:26:10] <INFO>  chrX:154014372            53.0%           6.88m           6.25m
[2016-11-01 17:26:12] <INFO>               -             100%           6.91m               -
[2016-11-01 17:26:12] <INFO> Finished calling 3,095,693,983bp, total runtime 6.91m
[2016-11-01 17:26:12] <INFO> ------------------------------------------------------------------------
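
One way to check whether the finished test.vcf.gz really is BGZF, independent of the warning, is to inspect its first bytes. The sketch below follows the BGZF block layout described in the SAM specification: a gzip header with the FEXTRA flag set and an extra subfield tagged 'B', 'C' of length 2. The function names are hypothetical, and this only examines the first block; it says nothing about why the warning is printed.

```python
import struct


def looks_like_bgzf(data):
    """Return True if `data` begins with a BGZF block header."""
    if len(data) < 12:
        return False
    # Fixed gzip header: ID1, ID2, CM, FLG, MTIME, XFL, OS, XLEN (little-endian).
    id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack("<BBBBIBBH", data[:12])
    if (id1, id2, cm) != (0x1F, 0x8B, 8) or not flg & 4:
        return False  # not gzip/deflate, or no extra field present
    extra = data[12:12 + xlen]
    pos = 0
    # Walk the extra subfields looking for the 'BC' tag with a 2-byte payload.
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if (si1, si2, slen) == (ord("B"), ord("C"), 2):
            return True
        pos += 4 + slen
    return False


def is_bgzf_file(path):
    """Read enough of the file to cover the header plus the largest extra field."""
    with open(path, "rb") as handle:
        return looks_like_bgzf(handle.read(12 + 0xFFFF))
```

A plain gzip file (for example, one written by `gzip` rather than `bgzip`) fails this check because it lacks the 'BC' subfield, and so does an empty or partially written file.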
