aquaskyline / 16gt

Simultaneous detection of SNPs and Indels using a 16-genotype probabilistic model

C 6.74% C++ 34.63% Makefile 0.11% Objective-C 0.06% Perl 58.41% Dockerfile 0.06%
call-variants genome vcf indels snps bioinformatics computational-biology

16gt's Introduction

Setup

docker

git clone https://github.com/aquaskyline/16GT.git
cd 16GT
docker build --no-cache .
docker images

Use the "IMAGE ID" listed by docker images in place of <docker-id> below:

docker run -it --privileged <docker-id> /bin/bash

Once inside the container, index the reference genome:

cd /16GT/SOAP3-dp
./soap3-dp-builder <path-to-ref-gen-fasta>
./BGS-Build <path-to-ref-gen-fasta>.index

Call variants from an aligned, indexed BAM file:

cd /16GT
./bam2snapshot -i <path-to-ref-gen-fasta>.index -b <aligned-bam-file> -o <output-prefix>
./snapshotSnpcaller  -i <path-to-ref-gen-fasta>.index  -o <output-prefix>
perl txt2vcf.pl <output-prefix>.txt <pro-id> <path-to-ref-gen-fasta> > <output>.vcf
perl filterVCF.pl <output>.vcf > <output>.filtered.vcf

16GT

16GT is a variant caller that uses a 16-genotype probabilistic model to unify SNP and indel calling in a single algorithm. 16GT is easy to use: the default parameters should suit most use cases with the human genome. For the detailed parameters of each module, run the module to print its usage information.

Quick start

Inputs: genome.fa, alignments.bam; Output: .vcf

0. Install

git clone https://github.com/aquaskyline/16GT
cd 16GT
make
# Tested in Ubuntu 14.04 and CentOS 6.7 with GCC 4.7.2

1. Build reference index

git clone https://github.com/aquaskyline/SOAP3-dp.git
cd SOAP3-dp
make SOAP3-Builder
make BGS-Build
soap3-dp-builder genome.fa
BGS-Build genome.fa.index

2. Convert BAM to SNAPSHOT

bam2snapshot -i genome.fa.index -b alignments.bam -o output/prefix

3. Call

snapshotSnpcaller -i genome.fa.index -o output/prefix
perl txt2vcf.pl output/prefix.txt sampleName genome.fa > <output>.vcf
perl filterVCF.pl <output>.vcf dbSNP.vcf.gz > <output>.filtered.vcf

Exome variant calling

Inputs: genome.fa, alignments.bam, region.bed; Outputs: region.bin, .vcf

RegionIndexBuilder genome.fa.index region.bed region.bin -bed/-gff
bam2snapshot -i genome.fa.index -b alignments.bam -o output/prefix -e region.bin
snapshotSnpcaller -i genome.fa.index -o output/prefix -e region.bin

License

GPLv3

16gt's People

Contributors

animesh, aquaskyline

16gt's Issues

How to call somatic mutations using 16GT?

Hi. I've downloaded and installed 16GT and plan to call somatic mutations. However, I cannot find the proper command to do so. Could somebody tell me how? Thank you.

MMUnitAllocate() : cannot allocate memory!

Hello,

I've got a problem using the snapshotSnpcaller:

Loading reference index...
Done in 6.3894 seconds
Reference sequence length: 1255676862

Reading snapshot...
Read snapshot in 10.8167 seconds

Handling SNP Counter Result
MMUnitAllocate() : cannot allocate memory!

It can't allocate memory but I'm not sure why.

Thank you for your help,

David

Need help with 16GT install: Makefile:11: recipe for target 'snapshotSnpcaller' failed

~/src_naibin/16GT$ make
g++ -O3 -funroll-loops -fomit-frame-pointer -maccumulate-outgoing-args -funroll-loops -static-libgcc -mpopcnt -fopenmp -fpermissive -w snapshotSnpcaller.o SNP.o SAMhandler.o FisherExactTest.o VariantCaller.o SnapshotHandler.o fisher.o likelihood_cache.o ycsq.o SNP_Caller.o SNPFunctions.o interpreter.o lib/lib.a -lpthread -lm -lz -o snapshotSnpcaller
/usr/bin/ld: lib/lib.a(BWT.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(CPUfunctions.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(DNACount.o): relocation R_X86_64_32S against `.rodata' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(HSP.o): relocation R_X86_64_32S against `.rodata' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(HSPstatistic.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(MemManager.o): relocation R_X86_64_32 against `.rodata.str1.8' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(MemoryCounter.o): relocation R_X86_64_32 against `.rodata.str1.8' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(MiscUtilities.o): relocation R_X86_64_32S against `.bss' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(PEAlgnmt.o): relocation R_X86_64_32 against `.rodata.str1.8' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(PE.o): relocation R_X86_64_32S against `.rodata' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(r250.o): relocation R_X86_64_32 against `.bss' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(TextConverter.o): relocation R_X86_64_32S against `.rodata' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(Timing.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(SimpleMemoryPool.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(IniParam.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(iniparser.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(inistrlib.o): relocation R_X86_64_32 against `.bss' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(bam.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(bam_import.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(sam_header.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(AlgnResult.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(BGS-HostAlgnmtAlgo.o): relocation R_X86_64_32 against `.rodata.str1.8' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(BGS-HostAlgnmtAlgo2.o): relocation R_X86_64_32 against `.rodata.str1.8' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(BGS-HostAlgnmtAlgoSingle.o): relocation R_X86_64_32 against `.rodata.str1.8' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(BGS-HostAlgnmtOps.o): relocation R_X86_64_32S against `.rodata' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(BGS-IO.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(Indel_RA.o): relocation R_X86_64_32 against symbol `_Z23identifyRAWindowWrapperPv' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(ScoreRecalibration.o): relocation R_X86_64_32S against `.rodata' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(SNPDuplicateRemoval.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(dictionary.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(PrimerHashAPI.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(bgzf.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(bam_aux.o): relocation R_X86_64_32S against `.rodata' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(sam.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(bam_pileup.o): relocation R_X86_64_32 against `.rodata.str1.8' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(faidx.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(knetfile.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: lib/lib.a(razf.o): relocation R_X86_64_32S against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
Makefile:11: recipe for target 'snapshotSnpcaller' failed
make: *** [snapshotSnpcaller] Error 1
Thanks for your kind help in diagnosing the error shown here.

Joint calling and hg38

Hi @aquaskyline

I have two questions about 16GT. First, does it support trio joint calling? Can I use all the samples as the input of bam2snapshot? Second, is it compatible with the hs38dh reference genome? Is there something like the BadMateFilter in GATK that will cause problems when calling variants on the ALT contigs?

Thanks

Segmentation fault

Hello, I have tried to follow the tutorial in the README file, but bam2snapshot crashed with a segmentation fault. Here is the gdb output:

...
Processing ref_is350_to_b3v06_masked_div30.bam...

Program received signal SIGSEGV, Segmentation fault.
0x0000000000418670 in getAmbPos (chr_id=0, offset=1, ambiguityMap=ambiguityMap@entry=0x6582c0, 
    translate=translate@entry=0x7ffff7f4c010, dnaLength=982460096) at indexFunction.cpp:22
22          while (translate[approxValue].startPos > ambPos) {

The BAM file is rather big, but I can share it privately if needed.

--- edit ---
Some technical details. It's a sorted bamfile, reads were mapped using bwa-mem and PCR duplicates were marked using samblaster. I am running it on CentOS 7, compiled using gcc version 6.1.1.

make error 16GT on mac

Hello,
I'm on mac os 10.15.4 (Catalina) trying to install 16GT.
My clang version: Apple clang version 11.0.3 (clang-1103.0.32.59)
I have latest gcc (g++-9)
My try:
$ git clone https://github.com/aquaskyline/16GT
$ cd 16GT
$ make
g++ -O3 -funroll-loops -fomit-frame-pointer -maccumulate-outgoing-args -funroll-loops -static-libgcc -mpopcnt -fopenmp -fpermissive -w -c snapshotSnpcaller.cpp -o snapshotSnpcaller.o
clang: error: unknown argument: '-maccumulate-outgoing-args'
clang: error: unsupported option '-fopenmp'
make: *** [snapshotSnpcaller.o] Error 1

Thank you in advance :-)

How to set CPU number? Number of threads? Segmentation fault (core dumped)

~/16GT> /home/tao/16GT/bam2snapshot -i /home/tao/seq/jelly_out_chr_pilon.fasta.rename.fa.index -b /home/newdisk/tao/z_final_new/432.sort.bam -b /home/newdisk/tao/z_final_new/ZGWS.sort.bam -b /home/newdisk/tao/z_final_new/543.sort.bam -b /home/newdisk/tao/z_final_new/ZO1.sort.bam -b /home/newdisk/tao/z_final_new/AL1.sort.bam -b /home/newdisk/tao/z_final_new/PZ.sort.bam -b /home/newdisk/tao/z_final_new/BG.sort.bam -b /home/newdisk/tao/z_final_new/PZHBL.sort.bam -b /home/newdisk/tao/z_final_new/BYDHL.sort.bam -b /home/newdisk/tao/z_final_new/QBL.sort.bam -b /home/newdisk/tao/z_final_new/DH2.sort.bam -b /home/newdisk/tao/z_final_new/SRR2131192.sort.bam -b /home/newdisk/tao/z_final_new/FH.sort.bam -b /home/newdisk/tao/z_final_new/T1-6.sort.bam -b /home/newdisk/tao/z_final_new/HLJHL.sort.bam -b /home/newdisk/tao/z_final_new/WSHL.sort.bam -b /home/newdisk/tao/z_final_new/JX.sort.bam -b /home/newdisk/tao/z_final_new/XHHL.sort.bam -b /home/newdisk/tao/z_final_new/XL.sort.bam -o /home/newdisk/tao/z_final_new/16GTotherlotus
Loading reference...
Done.
464 chromosomes, 821160161bp in length.
Parameters:
#CPUThreads=16
TrimSize=5
SoftClipThreshold=5
MQThreshold=0
indelWeightThreshold=1
Allocating memory...
Done.
Time elapsed by now is 0 seconds.
Updating snapshot...
Processing /home/newdisk/tao/z_final_new/432.sort.bam...
Segmentation fault (core dumped)

Dear friend, your tool does not have any parameter for changing the number of threads.

Please let us know how to set it!

Compilation issues with SOAP and BGS

Hello,

I can't compile SOAP

g++: warning: ‘-mcpu=’ is deprecated; use ‘-mtune=’ or ‘-march=’ instead
g++: error: unrecognized command line option ‘-maltivec’; did you mean ‘-mglibc’?
make: *** [<builtin>: 2bwt-lib/BWTConstruct.o] Error 1

Nor can I compile BGS:

g++ -O3 -funroll-loops -w -fopenmp -D__STDC_LIMIT_MACROS -mcpu=power8 -mtune=power8 -maltivec -fsigned-char   -c -o 2bwt-lib/BWT.o 2bwt-lib/BWT.c
g++: warning: ‘-mcpu=’ is deprecated; use ‘-mtune=’ or ‘-march=’ instead
g++: error: unrecognized command line option ‘-maltivec’; did you mean ‘-mglibc’?

With gcc 6.3.1

Could you elaborate more on the steps?

Could you explain the steps in more detail? Some information seems to be missing:

- What type of BAM does it need to work: sorted, duplicate-marked, BQSR-recalibrated?
- Can the BAM file be produced with any mapper, or only with SOAP3-dp?
- What is the .bin file?
- What does each step do?
- If I already have a BAM file and an indexed reference.fasta, where should I start?

Thank you very much
