Developed by BioTuring, hera is a bioinformatics tool for analyzing RNA-seq data. With a single command, hera provides:

Base-to-base alignment BAM file
Transcript abundance estimation
Fusion gene detection with fused sequence assemblies

Each step in hera is organized and optimized to maximize performance in terms of both running time and accuracy.

Example data

We ran a test on 20 datasets from the Synapse DREAM Challenge SMC-RNA, each of which contains 60 million read pairs. The test was done on a 32-core machine running Ubuntu 14.04. The results are shown in the table below:

	Transcriptome	Transcriptome + Genome
Alignment
Mapped reads	93.3860%	93.3871%
Memory	8GB	30GB
Abundance estimation compared with truth
Spearman	0.9033	0.9057
Pearson	0.9951	0.9951
Gene fusion detection compared with truth
True positive	0.6960
False negative	0.3040
False positive	0.0595
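
For readers unfamiliar with the two correlation measures in the table, the following is a minimal sketch of how Pearson and Spearman coefficients can be computed between estimated and true abundances. The sample values are made up for illustration, and whether the benchmark applied any transformation (for example a log scale) before computing the coefficients is not stated here, so this is not a reproduction of the evaluation.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up abundances for five transcripts (illustration only).
truth = [10.0, 0.0, 250.0, 3.5, 40.0]
estimated = [12.0, 0.5, 240.0, 2.0, 45.0]
print(pearson(truth, estimated), spearman(truth, estimated))
```

Spearman only looks at the ordering of transcripts, while Pearson is dominated by highly expressed transcripts, which is one reason the two numbers in the table can differ noticeably.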

Core algorithm

Alignment

In hera, alignment starts with a hash-based approach that is applied to all reads to anchor gene fragments. The Needleman–Wunsch algorithm is then used to fill in the gaps between these anchored seeds. This approach reduces mapping time without hurting precision. The transcriptome coordinates of each alignment are then converted to genome coordinates. This step detects splicing more accurately than mapping the reads directly onto a reference genome.
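
The gap-filling step can be illustrated with a textbook Needleman–Wunsch global alignment. This is a minimal sketch, not hera's implementation: the scoring values (match, mismatch, gap) and the function name are assumptions chosen for illustration, and a real aligner would only run this on the short stretch between two anchored seeds.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per base.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two bases
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

Because the dynamic-programming table grows with the product of the two lengths, restricting it to the regions between seeds is what keeps this step cheap.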

On the other hand, hera can still perform conventional genome mapping, because the available transcriptome annotations are incomplete. Any reads that cannot be mapped properly to the transcriptome are remapped to the genome afterwards. The procedure is the same as for transcriptome mapping, except that the hash-based method is replaced with the Burrows–Wheeler Transform algorithm.
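
To show why the Burrows–Wheeler Transform suits genome-scale seed lookup, here is a minimal sketch of the transform and of backward search, which counts exact occurrences of a pattern. It is a teaching version under simplifying assumptions: it sorts all rotations and rescans the string for each rank query, whereas practical indexes use suffix arrays and sampled occurrence tables. The function names are hypothetical and do not come from hera.

```python
def bwt(text):
    """Burrows-Wheeler Transform of text, using '$' as the end marker."""
    text += "$"  # '$' sorts before letters in ASCII
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str, pattern):
    """Count exact matches of pattern in the original text by backward search."""
    # first[c] = number of characters in the text that sort before c.
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    def occ(ch, i):
        # Occurrences of ch in bwt_str[:i]; linear scan, fine for a sketch.
        return bwt_str[:i].count(ch)

    lo, hi = 0, len(bwt_str)  # current range of matching sorted rotations
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ(ch, lo)
        hi = first[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


index = bwt("banana")
print(index)                            # annb$aa
print(count_occurrences(index, "ana"))  # 2
```

The search extends the pattern one character at a time from its last base, so the cost depends on the read length rather than on the genome length.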

Abundance estimation

Abundances are estimated with an expectation–maximization (EM) algorithm, accelerated with the SQUAREM procedure (Varadhan, R. & Roland, C. Scand. J. Stat. 35, 335–353 (2008)).
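
The idea can be sketched on a toy problem. The code below is an assumption-laden illustration, not hera's code: reads are summarized as hypothetical "compatibility classes" (a count plus the transcripts the reads could come from), transcript lengths are ignored, and the SQUAREM step uses the common squared-extrapolation form with a step length of at least 1, followed by one stabilizing EM update.

```python
import math


def em_step(theta, classes):
    """One plain EM update of the transcript proportions theta."""
    total = sum(count for count, _ in classes)
    new = [0.0] * len(theta)
    for count, members in classes:
        denom = sum(theta[t] for t in members)
        if denom > 0.0:
            # Split the reads among compatible transcripts by current theta.
            for t in members:
                new[t] += count * theta[t] / denom
    return [x / total for x in new]


def project(theta, floor=1e-12):
    """Clip to positive values and renormalize so the proportions sum to 1."""
    clipped = [max(x, floor) for x in theta]
    s = sum(clipped)
    return [x / s for x in clipped]


def squarem_step(theta, classes):
    """Two EM updates, a squared extrapolation, then one stabilizing update."""
    t1 = em_step(theta, classes)
    t2 = em_step(t1, classes)
    r = [a - b for a, b in zip(t1, theta)]             # first difference
    v = [(c - b) - d for c, b, d in zip(t2, t1, r)]    # second difference
    norm_r = math.sqrt(sum(x * x for x in r))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_v < 1e-15:
        return t2  # already at a fixed point; nothing to extrapolate
    alpha = max(1.0, norm_r / norm_v)  # alpha == 1 reproduces t2 exactly
    extrap = [p + 2 * alpha * d + alpha * alpha * w
              for p, d, w in zip(theta, r, v)]
    return em_step(project(extrap), classes)


def estimate(classes, n_transcripts, tol=1e-10, max_iter=1000):
    """Iterate SQUAREM steps from a uniform start until theta stops moving."""
    theta = [1.0 / n_transcripts] * n_transcripts
    for _ in range(max_iter):
        new = squarem_step(theta, classes)
        if max(abs(a - b) for a, b in zip(new, theta)) < tol:
            return new
        theta = new
    return theta


# Toy data: 6 reads unique to transcript 0, 2 unique to transcript 1,
# and 2 reads compatible with both.
classes = [(6, (0,)), (2, (1,)), (2, (0, 1))]
print(estimate(classes, 2))
```

On this toy input the fixed point is 0.75 and 0.25, because the two ambiguous reads end up split in the same 3:1 ratio as the unique reads. SQUAREM reaches it in far fewer passes than plain EM because each step extrapolates along the direction EM is already moving.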

Fusion detection

To detect fusions, hera keeps track of abnormally mapped reads. Based on their potential fusion site, these reads are divided into several groups, each of which is assembled into a super contig. These contigs are then mapped back onto the reference genome to reveal the fused gene pairs.
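
The grouping step might look roughly like the sketch below. Everything here is hypothetical: the record layout (read id, gene and position of each end), the fixed-width position buckets, and the gene names are assumptions for illustration, and the source does not describe how hera actually defines a potential fusion site. Assembly of each group into a contig is not shown.

```python
from collections import defaultdict


def group_by_fusion_site(reads, window=100):
    """Group abnormally mapped reads by gene pair and approximate breakpoint.

    reads: iterable of (read_id, gene_a, pos_a, gene_b, pos_b) tuples, where
    the two ends of a read (or read pair) map to different genes.
    """
    groups = defaultdict(list)
    for read_id, gene_a, pos_a, gene_b, pos_b in reads:
        # Put the gene pair in a canonical order so A-B and B-A evidence
        # lands in the same group.
        if gene_a > gene_b:
            gene_a, pos_a, gene_b, pos_b = gene_b, pos_b, gene_a, pos_a
        # Bucket positions so nearby breakpoints share a key. Fixed buckets
        # can split reads that straddle a bucket boundary; a real tool would
        # cluster positions instead.
        key = (gene_a, pos_a // window, gene_b, pos_b // window)
        groups[key].append(read_id)
    return dict(groups)


# Toy records with made-up gene names and coordinates.
reads = [
    ("r1", "geneA", 1050, "geneB", 20010),
    ("r2", "geneB", 20040, "geneA", 1090),
    ("r3", "geneA", 1020, "geneC", 500),
]
for site, members in group_by_fusion_site(reads).items():
    print(site, members)
```

Each resulting group collects the reads that support one candidate junction, which is the unit that would then be assembled and mapped back to the genome.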

Build requirements:

GCC (GNU C compiler)
CMake (http://www.cmake.org/) version 3.1.0 or newer

Install:

1. git clone https://github.com/kspham/hera.git
2. cd hera/
3. chmod +x build.sh
4. ./build.sh

Usage:

INDEX:

./hera/build/hera_build
        --fasta genome_sequence.fa (text file only)
        --gtf annotation_file.gtf
        --outdir path/to/output_directory
[OPTIONAL]
        --full_index

By default, hera needs ~8GB of memory for transcriptome-only indexing. Full genome indexing needs ~30GB. You can also download prebuilt indexes for the human genome: GRCh37.75, GRCh38.82

RUN:

./hera/build/hera quant -i path/to/index_directory [OPTIONAL] read1.fastq read2.fastq

[OPTIONAL]:
  -o [output directory] (default: ./)
  -t [number of running threads] (default: 1)
  -z [level of bam file compression (1 - 9)] (default: -1)
  -b [number of bootstraps] (default: 1)
  -w [output bam file 0: true, 1: false] (default: 0)
  -f [Genome fasta file]

Index directory: Directory containing the index files produced by the indexing step.
Genome fasta file: If not specified, genome mapping is skipped. Mapping to the transcriptome needs ~8GB of memory, while mapping with the genome needs ~30GB.
Output files include:

abundance.tsv : Transcript abundance estimation (tsv file)
abundance.h5 : Transcript abundance estimation and bootstrapping results (hdf5 file)
fusion.bedpe : Fusion detection result
transcript.bam : Alignment result
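
When hera is driven from a pipeline script, the RUN invocation above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of hera: it uses only the flags documented in this README (-i, -o, -t, -f), and the function name, default binary path, and file names in the example are placeholders.

```python
import subprocess


def hera_quant_command(index_dir, read1, read2, hera_bin="./hera/build/hera",
                       outdir=None, threads=None, genome_fasta=None):
    """Build the argument list for `hera quant` from documented options."""
    cmd = [hera_bin, "quant", "-i", index_dir]
    if outdir is not None:
        cmd += ["-o", outdir]            # output directory (default: ./)
    if threads is not None:
        cmd += ["-t", str(threads)]      # number of running threads
    if genome_fasta is not None:
        cmd += ["-f", genome_fasta]      # enables the genome mapping step
    # Options come before the two FASTQ files, as in the RUN synopsis.
    cmd += [read1, read2]
    return cmd


# Placeholder paths; replace with real files before running.
cmd = hera_quant_command("index_dir", "read1.fastq", "read2.fastq",
                         outdir="results", threads=8)
print(" ".join(cmd))
# To launch the run, pass the list to subprocess:
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.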

Third-party

hera includes the following third-party software:

hdf5 [https://support.hdfgroup.org/HDF5/]
htslib [http://www.htslib.org/]
jemalloc [http://jemalloc.net/]
libdivsufsort [https://github.com/y-256/libdivsufsort]
zlib [https://zlib.net/]

Contacts

Please report any issues directly to the GitHub issue tracker. You can also send feedback to [email protected]

License

MIT license

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
