10xgenomics / lariat

Linked-Read Alignment Tool

Home Page: https://support.10xgenomics.com/genome-exome/software/pipelines/latest/algorithms/overview

License: MIT License

Makefile 0.24% Python 1.39% Go 97.41% C 0.96%

lariat's Introduction

Lariat: Linked-Read Alignment Tool

Lariat is an aligner for barcoded linked reads, produced by the 10X Genomics GemCode™ platform. All the linked reads for a single barcode are aligned simultaneously, with the prior knowledge that the reads arise from a small number of long (10kb - 200kb) molecules. This approach allows reads to be mapped in repetitive regions of the genome.

Lariat is based on the original RFA method developed by Alex Bishara, Yuling Liu et al. in Serafim Batzoglou’s lab at Stanford: Genome Res. 2015. 25:1570-1580. In addition to developing the original model for RFA, Alex Bishara and Yuling Liu both contributed substantially to the Lariat implementation maintained in this repository.

Lariat generates candidate alignments by calling the BWA C API, then performs the RFA inference to select the final mapping position and MAPQ.

Usage Notes:

NOTE: If you just want to get Lariat-aligned BAM files from Chromium Linked-Read data, you can run the ALIGN pipeline in Long Ranger 2.2. It runs the FASTQ processing and alignment steps only.

  • Lariat is currently tested with Go version 1.9.2.
  • Lariat currently requires a non-standard format for input reads. We recommend using the Lariat build bundled with the 10X Genomics Long Ranger software (http://software.10xgenomics.com/).

Please contact us if you're interested in using Lariat independently of the Long Ranger pipeline.

Build notes:

In the lariat directory, run git submodule update --init --recursive to ensure you've checked out the BWA submodule.

Make sure you have a working Go installation (version >= 1.9.2). go version should return something like "go version go1.9.2 linux/amd64"

From the root of the repo:

cd go
make           # Build lariat
bin/lariat -h  # Show cmd-line flags

For experimental purposes you can replace the lariat binary in a Long Ranger build with bin/lariat.

Input File Format

Lariat requires input data in a non-standard FASTQ-like format. The SORT_FASTQS stage in Long Ranger creates this specially formatted, barcode-sorted input for lariat, and we recommend using those input files to experiment with changes to lariat. Each read pair is formatted as a record of 9 consecutive lines containing:

  • read header
  • read1 sequence
  • read1 quals
  • read2 sequence
  • read2 quals
  • 10X barcode string
  • 10X barcode quals
  • sample index sequence
  • sample index quals

Read pairs must be sorted by the 10X barcode string. The 10X barcode string is of the form 'ACGTACGTACGTAC-1'.
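As an illustration of the record layout above, here is a minimal Python sketch that reads 9-line records and groups consecutive read pairs by barcode. It is not part of Lariat. The names ReadPair, read_pairs, and group_by_barcode are hypothetical, and the example headers, sequences, and quality strings are made up. The grouping step relies on the input already being sorted by barcode.

```python
import io
from itertools import groupby, islice
from typing import Iterator, List, NamedTuple, TextIO, Tuple

LINES_PER_RECORD = 9  # one read pair per 9 consecutive lines, as listed above


class ReadPair(NamedTuple):
    header: str
    read1_seq: str
    read1_quals: str
    read2_seq: str
    read2_quals: str
    barcode: str
    barcode_quals: str
    sample_index: str
    sample_index_quals: str


def read_pairs(handle: TextIO) -> Iterator[ReadPair]:
    """Yield one ReadPair per 9-line record; fail on a truncated final record."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, LINES_PER_RECORD)]
        if not record:
            return
        if len(record) != LINES_PER_RECORD:
            raise ValueError(f"truncated record: expected 9 lines, got {len(record)}")
        yield ReadPair(*record)


def group_by_barcode(pairs: Iterator[ReadPair]) -> Iterator[Tuple[str, List[ReadPair]]]:
    """Group consecutive read pairs that share a barcode (input must be barcode-sorted)."""
    for barcode, group in groupby(pairs, key=lambda pair: pair.barcode):
        yield barcode, list(group)


def _example_record(name: str, barcode: str) -> str:
    # Made-up values for illustration only; real headers and quals will differ.
    fields = [name, "ACGT", "IIII", "TTGA", "IIII",
              barcode, "I" * 14, "GATTACAG", "IIIIIIII"]
    return "\n".join(fields) + "\n"


EXAMPLE = (
    _example_record("@pair1", "AAACACGTACGTAC-1")
    + _example_record("@pair2", "AAACACGTACGTAC-1")
    + _example_record("@pair3", "ACGTACGTACGTAC-1")
)

for barcode, group in group_by_barcode(read_pairs(io.StringIO(EXAMPLE))):
    print(barcode, len(group))
```

Grouping by consecutive runs mirrors the idea that all reads for one barcode are handled together, and it avoids holding the whole file in memory.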

License

Lariat is distributed under the MIT license. Lariat links to BWA at the object level. Lariat includes the BWA source code via git submodule. Lariat links to the Apache2 branch of the BWA repo, which is licensed under the Apache2 license.

lariat's People

Contributors

ablewhiskey, adam-azarchs, pmarks, sjackman, wheaton5

lariat's Issues

SA tags with a position of -1

I've now seen two cases where Lariat-aligned BAMs contain alignments whose SA tag reports a position of -1. (For an example, see hall-lab/extract_sv_reads#8). My expectation was that the position in the SA tag is 1-based, although the specification is not explicit on this point. For BWA-MEM-aligned BAMs, I've never encountered a negative position in this tag.

Since it's unclear to me, I thought I'd open an issue to ask: what does a position of -1 mean in the SA tags of Lariat-aligned BAMs?

Thanks in advance.
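For anyone who wants to check their own data for this, here is a minimal Python sketch that parses an SA tag value and flags entries whose position is below 1. It assumes the usual SA layout of semicolon-separated entries, each of the form rname,pos,strand,CIGAR,mapQ,NM. The names SAEntry, parse_sa_tag, and non_positive_positions are hypothetical, and the example tag value is made up rather than taken from a real BAM.

```python
from typing import List, NamedTuple


class SAEntry(NamedTuple):
    rname: str
    pos: int
    strand: str
    cigar: str
    mapq: int
    nm: int


def parse_sa_tag(value: str) -> List[SAEntry]:
    """Parse the string value of an SA tag (without the 'SA:Z:' prefix)."""
    entries = []
    for item in value.split(";"):
        if not item:  # the value normally ends with a trailing ';'
            continue
        rname, pos, strand, cigar, mapq, nm = item.split(",")
        entries.append(SAEntry(rname, int(pos), strand, cigar, int(mapq), int(nm)))
    return entries


def non_positive_positions(entries: List[SAEntry]) -> List[SAEntry]:
    """Return entries whose position is not a valid 1-based coordinate."""
    return [entry for entry in entries if entry.pos < 1]


# Made-up example value with one suspicious entry.
example = "chr1,-1,+,50S100M,0,0;chr2,12345,-,100M50S,60,1;"
for entry in non_positive_positions(parse_sa_tag(example)):
    print(entry)
```

This only reports the suspicious entries. It does not say what a position of -1 means, which is the open question in this issue.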

Please make the Lariat alignment score parameters configurable

For future reference, the default Lariat alignment scoring parameters are

AS:f = -2 * mismatches - 3 * indels - 5 * clipped - 0.5 * clipped_bases - 4 * improper_pair

The default BWA scoring parameters are

AS:i = A * matches - B * mismatches - O * opens - E * extends - L * clipped - U * improper_pair
A = 1
B = 4
O = 6
E = 1
L = 5
U = 17
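To compare the two schemes on the same alignment, the formulas above can be written out as a small Python sketch. The function names lariat_score and bwa_score are hypothetical, and the coefficients are copied directly from this issue. The sketch also assumes that every term is a simple count, for example that clipped is the number of clipped ends, because the issue does not define the terms precisely.

```python
def lariat_score(mismatches: int, indels: int, clipped: int,
                 clipped_bases: int, improper_pair: int) -> float:
    """AS:f as quoted in this issue; all arguments are assumed to be counts."""
    return (-2 * mismatches
            - 3 * indels
            - 5 * clipped
            - 0.5 * clipped_bases
            - 4 * improper_pair)


def bwa_score(matches: int, mismatches: int, opens: int, extends: int,
              clipped: int, improper_pair: int,
              A: int = 1, B: int = 4, O: int = 6, E: int = 1,
              L: int = 5, U: int = 17) -> int:
    """AS:i as quoted in this issue, with the default parameters listed above."""
    return (A * matches
            - B * mismatches
            - O * opens
            - E * extends
            - L * clipped
            - U * improper_pair)


# A made-up alignment: 2 mismatches, 1 indel, 1 clipped end of 10 bases, proper pair.
print(lariat_score(mismatches=2, indels=1, clipped=1, clipped_bases=10, improper_pair=0))
# -> -17.0

# A made-up alignment scored with the BWA defaults.
print(bwa_score(matches=100, mismatches=2, opens=1, extends=3, clipped=1, improper_pair=0))
# -> 78
```

The Lariat formula as quoted has no reward for matches, so its scores are never positive, whereas the BWA score grows with the number of matching bases.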

src/inference/bamwriter.go:8:2: no buildable Go source files in ~/src/lariat/go/src/gobwa

Steps to reproduce

git clone --recursive https://github.com/10XGenomics/lariat.git
make -C lariat/go

Log

make -C src/gobwa/bwa libbwa.a
make[1]: Entering directory '/home/sjackman/src/lariat/go/src/gobwa/bwa'
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS  utils.c -o utils.o
…
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS  malloc_wrap.c -o malloc_wrap.o
ar -csru libbwa.a utils.o kthread.o kstring.o ksw.o bwt.o bntseq.o bwa.o bwamem.o bwamem_pair.o bwamem_extra.o malloc_wrap.o
make[1]: Leaving directory '/home/sjackman/src/lariat/go/src/gobwa/bwa'
go install -ldflags "-X inference.__VERSION__ '29a7f74'" lariat
src/inference/bamwriter.go:8:2: no buildable Go source files in /home/sjackman/src/lariat/go/src/gobwa
make: *** [Makefile:9: lariat] Error 1

No Split/Chimeric reads in Longranger/Lariat output

I have noticed that in both the example WGS dataset NA12878 and our own WGS data, Longranger does not detect any split/chimeric reads. In my experience, this causes problems in SV detection when trying to get the exact breakpoints right. It seems that split-read information is supposed to be incorporated into SV calls, since there are fields in both Loupe and the large_sv_calls.bedpe file to report it.
Since Longranger/Lariat uses BWA for alignment, and BWA-MEM and BWA-SW are both capable of aligning split reads, my question is: do the parameters currently in use take advantage of this capability?

Lariat takes a long time

Hello,

I am wondering if it is normal that longranger wgs (in the lariat step) is taking a long time to complete: it has now been running for two weeks.
This is the command:
longranger wgs --id=lngrgrwgs_to_NRG --fastqs=../input_fastqs --reference=refdata-genome_for_longranger --vcmode=freebayes --somatic --localcores=8 --localmem=70 --sex=f
The input files are 150 GB (two lanes of 33-37 GB each with 2.5 GB I1.fq.gz file), the reference is 4.5 Gb and has been formatted (number of sequences, max sequence length) to fit longranger mkref requirements. The output is 1.4 TB now, the stdout shows
2018-03-06 00:03:45 [runtime] (run:local) ID.lngrgrwgs_to_NRG.PHASER_SVCALLER_CS.PHASER_SVCALLER._LINKED_READS_ALIGNER.BARCODE_AWARE_ALIGNER.fork0.chnk127.main
Only the _log file and the journal folder (which is empty) are being updated. htop shows that there are 8 jobs running. The _log file shows
2018-03-19 05:17:48 [jobmngr] Attempted to reserve 4 threads, but only 0 were available.

Is this a normal computation time for this process?
Thanks,
Dario

Please clarify the specification for input to Lariat

In the .fasth files being input to lariat, I noticed that there are 3 variations on the 10X barcode field.

The readme says:

The 10X barcode string is of the form ACGTACGTACGTAC-1

I found that barcodes not in the whitelist appear as AGCTAGCTAGCTAGCT

Barcodes that are in the whitelist and found in the fastq seem to be marked as

AGCTAGCTAGCTAGCT-1,AGCTAGCTAGCTAGCT

Barcodes that mismatch by one base at the beginning of the barcode seem to be considered still as a match and seem to be specified by:

AGCTAGCTAGCTAGCT-1,GGCTAGCTAGCTAGCT

where the barcode at the beginning is the whitelist barcode it maps to, and the one after the "-1," is the barcode read from the fastq, which mismatches by one base only at the beginning.

Can someone please confirm this specification?
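The three variants described in this issue, together with the form given in the readme, could be told apart with a small Python sketch like the one below. This reflects only the observations above, which are unconfirmed, and is not Lariat's own parsing logic. The names BarcodeField, parse_barcode_field, and was_corrected are hypothetical.

```python
from typing import NamedTuple, Optional


class BarcodeField(NamedTuple):
    whitelist_barcode: Optional[str]  # barcode before the '-' suffix, if present
    suffix: Optional[str]             # text after the '-', e.g. '1'
    raw_barcode: Optional[str]        # barcode as read from the fastq, if present


def parse_barcode_field(field: str) -> BarcodeField:
    """Split a barcode field into its parts, following the observed variants."""
    if "," in field:
        # Observed form: WHITELIST-1,RAW
        processed, raw = field.split(",", 1)
    elif "-" in field:
        # Readme form: WHITELIST-1
        processed, raw = field, None
    else:
        # Observed form for barcodes not in the whitelist: RAW only
        processed, raw = None, field
    if processed is None:
        return BarcodeField(None, None, raw)
    barcode, _, suffix = processed.partition("-")
    return BarcodeField(barcode, suffix or None, raw)


def was_corrected(parsed: BarcodeField) -> bool:
    """True when both barcodes are present and the raw one differs from the whitelist one."""
    return (parsed.whitelist_barcode is not None
            and parsed.raw_barcode is not None
            and parsed.whitelist_barcode != parsed.raw_barcode)


for field in ("AGCTAGCTAGCTAGCT",
              "AGCTAGCTAGCTAGCT-1,AGCTAGCTAGCTAGCT",
              "AGCTAGCTAGCTAGCT-1,GGCTAGCTAGCTAGCT"):
    parsed = parse_barcode_field(field)
    print(parsed, was_corrected(parsed))
```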
