scrattch.io: scrattch File Input/Output Handling

Installation

scrattch.io requires the rhdf5 package from BioConductor, which can be installed with:

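# biocLite() was retired as of Bioconductor 3.8; BiocManager is the current installer
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("rhdf5")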

Once rhdf5 is in place, scrattch.io can be installed from GitHub:

devtools::install_github("AllenInstitute/scrattch.io")

If you'd like to use the developer branch where we're testing out new code, it can be installed using:

devtools::install_github("AllenInstitute/scrattch.io", ref = "dev")

.tome files

A major component of scrattch.io is a set of helper functions for writing and reading .tome files, which use an open, modular, extensible HDF5-based format for transcriptomics data.

Why another HDF5 format for transcriptomics?

Existing formats for transcriptomics are designed either for fast computation, like .loom, or for a small storage footprint, like the .h5 files generated by 10X Genomics' Cell Ranger. The goal of .tome is to combine compact storage with reasonably fast random access to both genes and samples.

This is accomplished by storing the main data matrix in a sparse format based on dgCMatrix from the R Matrix package, and by keeping a copy in each orientation so that both genes and samples can be read directly. This structure is also chunked and compressed to speed access and reduce file size. The compression level can be changed depending on how quickly you need to read your data (see ?write_tome_data for details).
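
To illustrate why storing both orientations helps, here is a minimal sketch using only the Matrix package. It is not scrattch.io's implementation and does not touch a .tome file: the toy matrix and the get_column() helper are hypothetical and made up for this example. It shows that a column of a dgCMatrix can be pulled out from the compressed slots alone, so a second, transposed copy makes the other dimension equally cheap to read.

```r
library(Matrix)

# Toy expression matrix (hypothetical data): 4 genes (rows) x 3 samples (columns)
counts <- sparseMatrix(
  i = c(1, 3, 2, 4, 1),   # 1-based row (gene) indices of nonzero entries
  j = c(1, 1, 2, 3, 3),   # 1-based column (sample) indices of nonzero entries
  x = c(5, 2, 7, 1, 3),   # nonzero values
  dims = c(4, 3),
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Hypothetical helper: read column j of a dgCMatrix using only its slots.
# @p holds 0-based offsets marking where each column starts in @i and @x,
# @i holds 0-based row indices, and @x holds the nonzero values.
get_column <- function(m, j) {
  out <- numeric(nrow(m))
  n_nonzero <- m@p[j + 1] - m@p[j]
  if (n_nonzero > 0) {
    idx <- seq.int(m@p[j] + 1, m@p[j + 1])
    out[m@i[idx] + 1] <- m@x[idx]
  }
  out
}

sample_major <- counts      # columns are samples: cheap per-sample reads
gene_major <- t(counts)     # columns are genes: cheap per-gene reads

sample2 <- get_column(sample_major, 2)  # all genes for sample2
gene1 <- get_column(gene_major, 1)      # all samples for gene1
```

Only the slice of @i and @x belonging to the requested column is touched in either case, which is the property that makes random access by gene or by sample practical when each orientation is available.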

The practical upshot of this strategy is that .tome files are roughly one-tenth the size of .loom files when storing data from 10X Genomics experiments, while still providing a way to quickly read gene or sample data for display.

Additional metadata can also be stored in .tome files, from sample annotations to precomputed statistics.

The .tome cheatsheets on Google Docs are a helpful reference for where scrattch.io stores these within the HDF5 file structure, and for which functions can be used to read and write these objects.

.tome is intended to be extensible. Want to store something that isn't already provided? Check out the Generic functions section of the .tome cheatsheet to add your own data in whatever way makes sense to you.

.loom files

scrattch.io also includes simple functions for reading matrices, annotations, and projections from .loom files with read_loom_dgCMatrix(), read_loom_anno(), and read_loom_projections(), respectively.

You can find out more about the .loom format, developed by the Linnarsson lab, here: loompy.org

A more complete implementation of the .loom format in R is available from the Satija lab's loomR package on GitHub here: mojaveazure/loomR

10X Genomics files

scrattch.io can read the data matrix from the .h5 files that Cell Ranger outputs in HDF5 Gene-Barcode Matrix Format, using read_10x_dgCMatrix().

.h5ad files

scrattch.io also supports reading the main data matrix from .h5ad files generated by tools like Scanpy, using read_h5ad_dgCMatrix().

The scrattch suite

scrattch.io is one component of the scrattch suite of packages for Single Cell RNA-seq Analysis for Transcriptomic Type CHaracterization from the Allen Institute.

License

The license for this package is available on GitHub at: https://github.com/AllenInstitute/scrattch.io/blob/master/LICENSE

Level of Support

We plan to update this tool occasionally, with no fixed schedule. Community involvement is encouraged through both issues and pull requests.

Contribution Agreement

If you contribute code to this repository through pull requests or other mechanisms, you are subject to the Allen Institute Contribution Agreement, which is available in full at: https://github.com/AllenInstitute/scrattch.io/blob/master/CONTRIBUTION
