alenzhao,Alen Zhao,github

Sequencing of RNA makes it possible to study an individual's transcriptome landscape and determine allelic expression ratios. Single-molecule protocols generate multi-kilobase reads that are longer than most transcripts, which allows complete isoforms to be sequenced and the reads to be partitioned by haplotype of origin. Although single-molecule reads are long, their relatively high error rate limits the ability to estimate allele-specific expression (ASE) at the gene and isoform levels. In this paper, we present Hercules, \textbf{H}aplotype-aware \textbf{er}ror \textbf{c}orrection of \textbf{L}ong single molecule reads in RNA-Seq, a comprehensive method to correct errors in long single-molecule RNA-Seq data of a diploid cell. The proposed method first partitions the reads into the parental alleles and then corrects the errors in each haplotype cluster. Experimental validation suggests that Hercules tolerates high error rates and achieves more accurate ASE estimates than previous methods. Phasing the reads according to the allele of origin allows our method to efficiently distinguish read errors from true biological mutations, and it achieves error-correction accuracy similar to that of hybrid methods that use short, high-fidelity sequences to correct long single-molecule reads. Error-corrected reads also allow our method to detect SNVs in genes with sufficient coverage. We also apply Hercules to novel clinical single-molecule RNA-Seq data to estimate the ASE of genes of interest. Our method was able to correct the reads and determine a point mutation of clinical significance, which was validated by the GeneDx HCM Panel.
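The two-step idea described above (partition reads by haplotype, then correct errors within each cluster) can be illustrated with a toy sketch. This is not the Hercules implementation: it assumes reads are already aligned to the same region as equal-length strings, that heterozygous sites and their alleles are known in advance, and it swaps in a simple per-column majority vote as the correction step. All function names and the example data are hypothetical.

```python
from collections import Counter

def partition_reads(reads, het_sites):
    """Assign each read to haplotype "A" or "B".

    het_sites maps a 0-based position to (allele_on_A, allele_on_B).
    A read goes to the haplotype whose alleles it matches at more sites
    (ties go to "A"; a real method would need a better tie rule).
    """
    clusters = {"A": [], "B": []}
    for read in reads:
        score_a = sum(read[pos] == a for pos, (a, _) in het_sites.items())
        score_b = sum(read[pos] == b for pos, (_, b) in het_sites.items())
        clusters["A" if score_a >= score_b else "B"].append(read)
    return clusters

def consensus(reads):
    """Per-column majority vote over equal-length reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

def correct_by_haplotype(reads, het_sites):
    """Replace every read with the consensus of its haplotype cluster."""
    corrected = {}
    for hap, members in partition_reads(reads, het_sites).items():
        corrected[hap] = [consensus(members)] * len(members) if members else []
    return corrected

def allelic_ratio(corrected):
    """Fraction of reads assigned to haplotype A (None if there are no reads)."""
    n_a, n_b = len(corrected["A"]), len(corrected["B"])
    total = n_a + n_b
    return n_a / total if total else None

# Hypothetical data: two heterozygous sites, three noisy reads per haplotype.
het_sites = {2: ("A", "G"), 6: ("C", "T")}
reads = [
    "ACAGTTCA", "ACAGTACA", "TCAGTTCA",  # haplotype A, two reads with one error each
    "ACGGTTTA", "ACGGATTA", "ACGCTTTA",  # haplotype B, two reads with one error each
]
corrected = correct_by_haplotype(reads, het_sites)
ratio = allelic_ratio(corrected)
```

Because the vote is taken within each haplotype cluster, the true allelic differences at positions 2 and 6 are preserved while isolated read errors are voted out. Pooling all six reads into a single consensus would instead treat one of the two alleles at each heterozygous site as an error.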

hisat2

Graph-based alignment (Hierarchical Graph FM index)

hive

Mirror of Apache Hive

hotmaps

Detects hotspot regions for somatic mutations in 3D protein structures

hts-specs

Specifications of SAM/BAM and related high-throughput sequencing file formats

htseq

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.

htseq-1

High throughput / next generation sequencing pipeline

htseq-toolbox

Tools for short read high-throughput sequencing (a.k.a. 'next-gen sequencing')

htsjdk

A Java API for high-throughput sequencing data (HTS) formats.

htslib

C library for high-throughput sequencing data formats

hub

hub helps you win at git.

hub-1

fault tolerant, highly available service for data storage and distribution

iannotatesv

iAnnotateSV is a Python library and command-line software toolkit to annotate and visualize structural variants detected from Next Generation DNA sequencing data.


Alen Zhao's Projects
