Population-wide Deletion Calling

License: GNU General Public License v3.0

popdel's Introduction

PopDel - Population-wide Deletion Calling

Input: BAM/CRAM files (tested for up to 50,000) from short-read paired-end whole-genome sequencing data

Output: Called deletions in a VCF file

Note: The default reference genome is GRCh38 (Genome Reference Consortium Human Build 38). Other human reference builds can be specified in the options; see "Specifying the reference genome". For other diploid organisms or custom reference builds, user-defined sampling intervals must be specified; see "Sampling intervals for parameter estimation".

Quickstart

For more detailed information see the Wiki.

Installation

git clone https://github.com/kehrlab/PopDel.git
cd PopDel
sudo make install

or with conda:

conda install -c bioconda popdel

Note: PopDel takes significantly more time for calling variants when installed via conda.

Step 1: Create profile

Create an insert size profile for each sample.

# Create a profile for each BAM-file
popdel profile myBam1.bam
popdel profile myBam2.bam
popdel profile myBamN.bam

For more options see Wiki: PopDel Profile
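
For cohorts with many samples, the per-file commands above can be wrapped in a loop. The following is a minimal sketch: the function name profile_all and the POPDEL variable (used only to substitute the binary, for example with echo for a dry run) are assumptions of this example and not part of PopDel.

```shell
# Hypothetical helper: run `popdel profile` once per BAM file given as argument.
# POPDEL is an assumption of this sketch; it only allows swapping the binary.
profile_all() {
    local popdel_bin="${POPDEL:-popdel}"
    local bam
    for bam in "$@"; do
        # Stop at the first sample that fails so the problem is not overlooked.
        "$popdel_bin" profile "$bam" || {
            echo "profiling failed for $bam" >&2
            return 1
        }
    done
}

# Example call (not executed here):
#   profile_all myBam*.bam
```

Stopping at the first failure is a design choice of the sketch; for large batches it may be preferable to log failures and continue.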

Step 2: Call deletions

Run joint calling on a list of all profiles.

# Create a list of all profiles
realpath myBam*.profile > myProfiles.txt
# Run calling on all profiles
popdel call myProfiles.txt

For more options see Wiki: PopDel Call
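
Because joint calling reads every profile named in the list, it can help to validate the entries while the list is built. The following is a sketch under that assumption; write_profile_list is a hypothetical helper that relies only on realpath and standard shell tests.

```shell
# Hypothetical helper: write absolute paths of the given profiles to a list file.
# Usage: write_profile_list OUT PROFILE...
write_profile_list() {
    local out="$1"
    shift
    local p
    : > "$out"                       # start with an empty list file
    for p in "$@"; do
        if [ ! -s "$p" ]; then       # reject missing or empty profiles
            echo "missing or empty profile: $p" >&2
            return 1
        fi
        realpath "$p" >> "$out"
    done
}

# Example call (not executed here):
#   write_profile_list myProfiles.txt myBam*.profile && popdel call myProfiles.txt
```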

See the Wiki for more information on how to view a profile with PopDel View and how to interpret the VCF output.

Citation

Sebastian Niehus, Hákon Jónsson, Janina Schönberger, Eythór Björnsson, Doruk Beyter, Hannes P. Eggertsson, Patrick Sulem, Kári Stefánsson, Bjarni V. Halldórsson, Birte Kehr. PopDel identifies medium-size deletions simultaneously in tens of thousands of genomes. Nat Commun 12, 730 (2021). https://doi.org/10.1038/s41467-020-20850-5

Version and License

    Last update: 2021-03-25
    PopDel version: 1.5.0
    SeqAn version: 2.1 (with HTSlib support added by Hannes P. Eggertsson)
    Author: Sebastian Niehus (Sebastian.Niehus[at]ukr.de)

PopDel is distributed under the GPL-3.0. Consult the accompanying LICENSE file for more details.

popdel's People

Contributors

bkehr, schonbej, serosko

popdel's Issues

Control threading

Right now a lot of threads are spawned by popdel, probably for parallel I/O. It would be nice if this degree of freedom could be controlled by a command line parameter.

Popdel not recognising cram file

Hi. I installed popdel via conda and tried to run it on a single cram file (aligned to hg38) using the popdel profile command. However, I am getting the following error message:

popdel profile - Profile creation from BAM-file

[PopDel 2021-01-19 13:58:51] Command:

/cbio/datasets/human/other-nmd/b38/FR07888912/cram/FR07888912.md.recal.cram

./seqan/bam_io/bam_file.h:156 FAILED! (BamFileIn: File format not specified.)

stack trace:
0 [0x5607964d97cf] popdel(+0x137cf)
1 [0x56079653ee73] popdel(+0x78e73)
2 [0x56079654ec34] popdel(+0x88c34)
3 [0x5607964d22b5] popdel(+0xc2b5)
4 [0x7fb56b6ceb97] __libc_start_main + 0xe7
5 [0x5607964d2731] popdel(+0xc731)

Aborted (core dumped)

Do you have any suggestions on how I can get the program to run?

Thanks,
Melissa

Segfault when running popdel call

I received a segfault when running popdel call on a set of profile files generated from 880 BAM files aligned against the GRCh37 genome. I compiled the debug version of popdel, and ran it under gdb, and received the following (after loads of text describing the loading of each file):

[PopDel 2020-07-27 13:30:06] Loaded insert size histograms for 547 read groups.
[PopDel 2020-07-27 13:30:06] Minimum initial deletion lengths have been set to 4 * standard deviations of insert size histograms [115..467].
[PopDel 2020-07-27 13:30:06] Minimum final deletion length has been set to 418.
[PopDel 2020-07-27 13:30:06] Calculated minimum log-likelihood ratio as 12.5277 from the prior probability 0.0001 using the 99%-quantile of a chi-squared distribution with df=1.
[PopDel 2020-07-27 13:30:06] Finished parameter calculation.

[PopDel 2020-07-27 13:30:06] Initialized insert size profiles from 880 input files.
[PopDel 2020-07-27 13:30:51] The first window of all profiles starts at '1:10050'.
./seqan/sequence/string_base.h:460 Assertion failed : static_cast<TStringPos>(pos) < static_cast<TStringPos>(length(me)) was: 0 >= 0 (Trying to access an element behind the last one!)

stack trace:
  0          [0x41f308]  /gpfs/mrc0/projects/Research_Project-MRC147594/software/popdel/popdel_gcc8.2.0/PopDel/popdel()
  1          [0x478afe]  /gpfs/mrc0/projects/Research_Project-MRC147594/software/popdel/popdel_gcc8.2.0/PopDel/popdel()
  2          [0x45c158]  /gpfs/mrc0/projects/Research_Project-MRC147594/software/popdel/popdel_gcc8.2.0/PopDel/popdel()
  3          [0x4417bb]  /gpfs/mrc0/projects/Research_Project-MRC147594/software/popdel/popdel_gcc8.2.0/PopDel/popdel()
  4          [0x41d32e]  /gpfs/mrc0/projects/Research_Project-MRC147594/software/popdel/popdel_gcc8.2.0/PopDel/popdel()
  5          [0x41dfc6]  /gpfs/mrc0/projects/Research_Project-MRC147594/software/popdel/popdel_gcc8.2.0/PopDel/popdel()
  6    [0x2aaaab20eb15]  __libc_start_main + 0xf5
  7          [0x40c029]  /gpfs/mrc0/projects/Research_Project-MRC147594/software/popdel/popdel_gcc8.2.0/PopDel/popdel()


Program received signal SIGABRT, Aborted.
0x00002aaaab2225f7 in raise () from /lib64/libc.so.6
(gdb) backtrace
#0  0x00002aaaab2225f7 in raise () from /lib64/libc.so.6
#1  0x00002aaaab223ce8 in abort () from /lib64/libc.so.6
#2  0x000000000041f30d in seqan::ClassTest::fail () at ./seqan/basic/debug_test_system.h:1580
#3  0x0000000000478afe in seqan::value<unsigned int, seqan::Alloc<void>, int> (me=..., pos=@0x7fffffffa134: 0) at ./seqan/sequence/string_base.h:460
#4  0x000000000045c158 in seqan::String<unsigned int, seqan::Alloc<void> >::operator[]<int> (this=0x5b5bb8, pos=0) at ./seqan/sequence/string_alloc.h:204
#5  0x00000000004417bb in checkAndSwitch (profile=..., rg=..., beginPos=10050) at popdel_call/load_profile_popdel_call.h:420
#6  0x000000000041d32e in popdel_call (argc=2, argv=0x7fffffffb520) at workflow_popdel.h:303
#7  0x000000000041dfc6 in main (argc=2, argv=0x7fffffffb520) at popdel.cpp:37
(gdb) 

The command-line arguments were "call profile_files.txt".

If you require any further information, please let me know.

SVLEN field does not conform to VCF standard

According to the VCF 4.2 and 4.3 specifications, the SVLEN INFO field is declared with Number=. [screenshots of the specification omitted]

However, PopDel v1.1.3 writes out

##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT alleles">

In other words, it declares "Number=1" rather than "Number=.".
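
To see which declaration a given output file carries, the SVLEN header line can be inspected directly. This is a minimal sketch: svlen_header_fields is a hypothetical helper, and it assumes the header fields appear in the order ID, Number, Type as in the line quoted above.

```shell
# Hypothetical helper: read a VCF on stdin and print the Number and Type
# declared for the SVLEN INFO field, e.g. "Number=1 Type=Integer".
svlen_header_fields() {
    sed -n 's/^##INFO=<ID=SVLEN,Number=\([^,]*\),Type=\([^,]*\),.*/Number=\1 Type=\2/p'
}

# Example call (not executed here):
#   svlen_header_fields < calls.vcf
```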

request: statically linked binary ?

Hi, I would be interested in testing popdel, but my compiler is too old, my server has no internet access (so I cannot use conda), and asking my sysadmin to install software takes a long time for security reasons.

Would it be possible to provide a statically linked binary in the release instead of the dynamic one?

Thank you in advance

too big offset error in popdel profile

Hi,
I'm trying to test popdel v1.2.1 on a small batch of 5 WGS samples mapped to GRCh38 (30-60X mean coverage).
The command I used is simply:

popdel profile -o profiles/sample1.profile sample1.bam

For all samples, the program performs parameter estimation and then crashes with the following error:

terminate called after throwing an instance of 'seqan::ParseError'
what(): Too big offset in Window!

Can you advise on what can cause this problem?

SVLEN INFO field should be Integer

Currently, the type is String but it should be Integer.

##INFO=<ID=SVLEN,Number=1,Type=String,Description="Difference in length between REF and ALT alleles">

Add INFO/END

This would make life much easier for downstream tools, as it would allow bcftools query -f "%CHROM\t%POS0\t%END\n" file.vcf.gz for conversion to BED.
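
Without an END tag, the same BED columns can be derived from POS and SVLEN. The following is a sketch with loudly stated assumptions: vcf_to_bed is a hypothetical helper, each record is assumed to carry an SVLEN=<integer> entry in INFO, and END is taken as POS + |SVLEN|, the usual convention for deletion records, which should be checked against PopDel's actual coordinates.

```shell
# Hypothetical helper: read an uncompressed VCF on stdin and print
# CHROM, POS-1 and POS+|SVLEN| as tab-separated BED columns.
# Assumption: END = POS + |SVLEN| (not confirmed for PopDel output).
vcf_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { next }                          # skip header lines
    {
        n = split($8, info, ";")
        svlen = ""
        for (i = 1; i <= n; i++)
            if (info[i] ~ /^SVLEN=/) svlen = substr(info[i], 7)
        if (svlen == "") next              # no SVLEN: nothing to derive
        span = svlen + 0
        if (span < 0) span = -span         # deletions carry a negative SVLEN
        print $1, $2 - 1, $2 + span
    }'
}

# Example call (not executed here):
#   vcf_to_bed < calls.vcf > calls.bed
```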

Adding INFO/SVMETHOD

Hi, what about adding an INFO/SVMETHOD tag similar to the way Delly adds it?

header

##INFO=<ID=SVMETHOD,Number=1,Type=String,Description="Type of approach used to detect SV">

record

SVMETHOD=POPDELv1.1.0
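
Until such a tag is written by the caller itself, it could be added in a post-processing step. This is only a sketch of that idea: add_svmethod is a hypothetical helper, it works on uncompressed VCF text, and it assumes the INFO column is the eighth tab-separated field.

```shell
# Hypothetical helper: add an SVMETHOD header line and tag to a VCF on stdin.
# Usage: add_svmethod VALUE < in.vcf > out.vcf
add_svmethod() {
    awk -v method="$1" 'BEGIN { FS = OFS = "\t" }
    /^#CHROM/ {
        # Declare the tag just before the column header line.
        print "##INFO=<ID=SVMETHOD,Number=1,Type=String,Description=\"Type of approach used to detect SV\">"
    }
    /^#/ { print; next }
    {
        tag = "SVMETHOD=" method
        $8 = ($8 == "." || $8 == "") ? tag : ($8 ";" tag)
        print
    }'
}

# Example call (not executed here):
#   add_svmethod POPDELv1.1.0 < calls.vcf > calls.tagged.vcf
```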

Issue with multiple read groups per sample

I have a BAM file with the following read group structure: one read group per lane, as per GATK best practices. This is only relevant for correcting lane biases in yield, though, and is probably irrelevant for any HiSeq X era analysis...

@RG     ID:sample-N1-DNA1-WGS1.0       SM:sample-N1-DNA1-WGS1 PL:ILLUMINA
@RG     ID:sample-N1-DNA1-WGS1.1       SM:sample-N1-DNA1-WGS1 PL:ILLUMINA
@RG     ID:sample-N1-DNA1-WGS1.2       SM:sample-N1-DNA1-WGS1 PL:ILLUMINA

PopDel dislikes my BAM file:

[PopDel] Not enough reads in sampling regions for read group(s) 'sample-N1-DNA1-WGS1.2'. Please use different sampling intervals (option '-i') or use the option '-n'

A suitable workaround would be either to assume one sample per BAM file or to gather reads by sample rather than by read group.
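
To spot BAM files affected by this, the read groups can be counted per sample from the header. A minimal sketch: rg_per_sample is a hypothetical helper that expects SAM header text on stdin, for example from samtools view -H, and groups @RG lines by their SM field.

```shell
# Hypothetical helper: count @RG header lines per sample (SM field).
# Prints "<sample><TAB><number of read groups>" for each sample.
rg_per_sample() {
    awk -F '\t' '/^@RG/ {
        sm = ""
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SM:/) sm = substr($i, 4)
        count[sm]++
    }
    END { for (s in count) print s "\t" count[s] }'
}

# Example call (not executed here):
#   samtools view -H sample.bam | rg_per_sample
```

A sample reported with more than one read group is a candidate for the error shown above, since each read group then needs enough reads of its own in the sampling regions.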

error message in popdel

I am trying to run popdel profile with the following command:
popdel profile sample_sorted.bam
but I get this error:

popdel profile - Profile creation from BAM-file
=================================

[PopDel 2019-04-16 11:08:57] Command:

sample_sorted.bam

[PopDel 2019-04-16 11:08:57] Found 1 read groups in input bam file 'sample_sorted.bam'.
[PopDel 2019-04-16 11:08:57] Reference consists of 195 sequences with a total length of 3099922541 base pairs.
terminate called after throwing an instance of 'std::ios_base::failure[abi:cxx11]'
what(): [ERROR] [PopDel] Not enough reads in sampling regions for read group(s) 'id'. Please use different sampling intervals (option '-i') or use the option '-n' to reduce the number of required read pairs for each histogram of each read group.: iostream error
Aborted

Can someone help me with this?

Thanks

CRAM support

Are there plans for cram support?

As a workaround, could a cram be piped into PopDel using samtools?

samtools view -T ref38.fa -b myBam1.cram | popdel profile -
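
Whether popdel profile accepts input on stdin is not stated here, so a more conservative variant of this workaround is to convert the CRAM to an indexed BAM file first. The following is a sketch: profile_cram and the SAMTOOLS and POPDEL variables (used only to substitute the binaries, for example for a dry run) are assumptions of this example.

```shell
# Hypothetical helper: convert a CRAM file to BAM, index it, then profile it.
# Usage: profile_cram REFERENCE.fa SAMPLE.cram
profile_cram() {
    local ref="$1"
    local cram="$2"
    local bam="${cram%.cram}.bam"            # sample.cram -> sample.bam
    local samtools_bin="${SAMTOOLS:-samtools}"
    local popdel_bin="${POPDEL:-popdel}"
    "$samtools_bin" view -b -T "$ref" -o "$bam" "$cram" &&
    "$samtools_bin" index "$bam" &&
    "$popdel_bin" profile "$bam"
}

# Example call (not executed here):
#   profile_cram ref38.fa myBam1.cram
```

The intermediate BAM costs disk space; it can be deleted once the profile has been written.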
