This project forked from hail-is/hail

Scalable genomic data analysis.

Home Page: https://hail.is

License: MIT License

Hail

Hail is an open-source, scalable framework for exploring and analyzing genomic data. Starting from genetic data in VCF, BGEN or PLINK format, Hail can, for example:

  • load variant and sample annotations from text tables, JSON, VCF, VEP, and locus interval files
  • generate variant annotations like call rate, Hardy-Weinberg equilibrium p-value, and population-specific allele count
  • generate sample annotations like mean depth, imputed sex, and TiTv ratio
  • generate new annotations from existing ones as well as genotypes, and use these to filter samples, variants, and genotypes
  • find Mendelian violations in trios, prune variants in linkage disequilibrium, analyze genetic similarity between samples via the GRM and IBD matrix, and compute sample scores and variant loadings using PCA
  • perform variant, gene-burden and eQTL association analyses using linear, logistic, and linear mixed regression, and estimate heritability

This functionality and more is exposed through Python and backed by distributed algorithms built on top of Apache Spark. These algorithms efficiently analyze gigabyte-scale data on a laptop or terabyte-scale data on a cluster, without the need to manually chop up data or manage job failures. Users can script pipelines or explore data interactively in Jupyter notebooks that move between Hail for genomics methods, PySpark for scalable SQL and machine learning algorithms, and pandas with scikit-learn and Matplotlib for results that fit on one machine. Hail also provides a flexible domain language to express complex quality control and analysis pipelines with concise, readable code.
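To make two of the annotations listed above concrete, here is a minimal sketch in plain Python of what a call rate and a TiTv (transition/transversion) ratio measure. This is not Hail's API and does not use Spark. The names `call_rate` and `titv_ratio`, and the encoding of a genotype as an alternate-allele count with `None` for a missing call, are assumptions made only for illustration.

```python
from typing import Optional, Sequence, Tuple

# Illustrative encoding (an assumption, not Hail's data model):
# a genotype call is the number of alternate alleles (0, 1, or 2),
# or None when the call is missing.
Call = Optional[int]

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T)
# substitutions; every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def call_rate(calls: Sequence[Call]) -> float:
    """Return the fraction of genotype calls that are not missing."""
    if not calls:
        return float("nan")
    return sum(c is not None for c in calls) / len(calls)


def titv_ratio(snps: Sequence[Tuple[str, str]]) -> float:
    """Return transitions divided by transversions.

    Assumes every (ref, alt) pair is a single-base substitution
    with ref != alt.
    """
    ti = sum((ref, alt) in TRANSITIONS for ref, alt in snps)
    tv = len(snps) - ti
    return ti / tv if tv else float("inf")


# Three of four calls are present.
print(call_rate([0, 1, None, 2]))  # 0.75

# Two transitions (A>G, C>T) and two transversions (A>C, G>T).
print(titv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]))  # 1.0
```

In Hail these quantities are computed by distributed algorithms over the whole dataset. The sketch only shows the per-variant and per-sample arithmetic on data small enough to hold in a list.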

The Hail project began in Fall 2015 to empower the worldwide genetics community to harness the flood of genomes to discover the biology of human disease. Hail has been used for dozens of major studies and is the core analysis platform of large-scale genomics efforts such as gnomAD.

Want to get involved in open-source development of methods or infrastructure? Check out the GitHub repo, chat with us in the Gitter dev room, and view our talks at Spark Summit East and Spark Summit West (below). Or come join us full-time!

Hail talk at Spark Summit West 2017

Getting Started

To get started using Hail on your data or public data:

We encourage use of the Discussion Forum for user and dev support, feature requests, and sharing your Hail-powered science. Follow Hail on Twitter @hailgenetics. Please report any suspected bugs to GitHub issues.

Hail Team

The Hail team is embedded in the Neale lab at the Stanley Center for Psychiatric Research of the Broad Institute of MIT and Harvard and the Analytic and Translational Genetics Unit of Massachusetts General Hospital.

Contact the Hail team at [email protected].

Citing Hail

If you use Hail for published work, please cite the software:

and either the forthcoming manuscript describing Hail (if possible):

  • Cotton Seed, Alex Bloemendal, Jonathan M Bloom, Jacqueline I Goldstein, Daniel King, Timothy Poterba, Benjamin M. Neale. Hail: An Open-Source Framework for Scalable Genetic Data Analysis. In preparation.

or the following paper which includes a brief introduction to Hail in the online methods:

  • Andrea Ganna, Giulio Genovese, et al. Ultra-rare disruptive and damaging mutations influence educational attainment in the general population. Nature Neuroscience
