cscg / quilt

This project forked from quiltdata/quilt

Quilt versions and deploys data

Home Page: https://quiltdata.com

License: Apache License 2.0

Languages: Python 62.82%, JavaScript 33.16%, HTML 2.55%, PowerShell 0.81%, Batchfile 0.38%, Shell 0.23%, Mako 0.06%

quilt's Introduction

Docs

Visit docs.quiltdata.com. Or browse the docs on GitHub.

Manage data like code

Quilt provides versioned, reusable building blocks for analysis in the form of data packages. A data package may contain data of any type or size. In spirit, Quilt does for data what package managers and Docker registries do for code: provide a centralized, collaborative store of record.

Getting started tutorial

Benefits

  • Reproducibility - Imagine source code without versions. Ouch. Why live with un-versioned data? Versioned data makes analysis reproducible by creating unambiguous references to potentially complex data dependencies.
  • Collaboration and transparency - Data likes to be shared. Quilt offers a centralized data warehouse for finding and sharing data.
  • Auditing - The registry tracks all reads and writes so that admins know when data are accessed or changed.
  • Less data prep - The registry abstracts away network, storage, and file format so that users can focus on what they wish to do with the data.
  • Deduplication - Data fragments are hashed with SHA-256. Duplicate data fragments are written to disk only once per user. As a result, large, repeated data fragments consume less disk space and network bandwidth.
  • Faster analysis - Serialized data loads 5 to 20 times faster than files. Moreover, specialized storage formats like Apache Parquet minimize I/O bottlenecks so that tools like Presto DB and Hive run faster.
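
The deduplication benefit listed above can be illustrated with a small content-addressed store. This is a minimal sketch, not Quilt's actual implementation: the `FragmentStore` class, its methods, and the in-memory dict standing in for disk are all assumptions made for illustration. Only the use of SHA-256 comes from the text above.

```python
import hashlib


class FragmentStore:
    """Toy content-addressed store (hypothetical; not Quilt's real code)."""

    def __init__(self):
        self._blobs = {}   # SHA-256 hex digest -> fragment bytes
        self.writes = 0    # number of physical writes actually performed

    def put(self, data: bytes) -> str:
        """Store a fragment and return its hash; skip the write if it exists."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = data
            self.writes += 1
        return digest

    def get(self, digest: str) -> bytes:
        """Look a fragment up by its hash."""
        return self._blobs[digest]


store = FragmentStore()
first = store.put(b"col_a,col_b\n1,2\n")
second = store.put(b"col_a,col_b\n1,2\n")  # identical fragment, no new write
third = store.put(b"col_a,col_b\n3,4\n")   # different content, new write
```

Because the key is derived from the content, storing the same fragment twice yields the same hash and costs one write, which is where the disk and bandwidth savings would come from.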

Commands

Here are the basic Quilt commands:

Service

Quilt is offered as a managed service at quiltdata.com.

Architecture

Quilt consists of three source-level components:

  1. A data catalog

    • Displays package metadata in HTML
    • Implemented in JavaScript with Redux and redux-saga
  2. A data registry

    • Controls permissions
    • Stores package fragments in blob storage
    • Stores package metadata
    • De-duplicates repeated data fragments
    • Implemented in Python with Flask and PostgreSQL
  3. A data compiler

    • Serializes tabular data to Apache Parquet
    • Transforms and parses files
    • Builds packages locally
    • Pushes packages to the registry
    • Pulls packages from the registry
    • Implemented in Python with pandas and PyArrow
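
The division of labor between the compiler and the registry can be sketched as a build, push, and pull round trip. This is a simplified illustration under stated assumptions, not Quilt's API: `build`, `package_hash`, and `ToyRegistry` are hypothetical names, dicts stand in for blob storage and the metadata database, and raw bytes stand in for serialized Parquet fragments.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build(files):
    """Compiler step (hypothetical): map logical names to fragment hashes."""
    manifest = {}    # logical name -> fragment hash (package metadata)
    fragments = {}   # fragment hash -> bytes (package contents)
    for logical, data in files.items():
        digest = sha256_hex(data)
        manifest[logical] = digest
        fragments[digest] = data
    return manifest, fragments


def package_hash(manifest):
    """Stable identifier for one package version, derived from its manifest."""
    return sha256_hex(json.dumps(manifest, sort_keys=True).encode("utf-8"))


class ToyRegistry:
    """In-memory stand-in for the registry (hypothetical)."""

    def __init__(self):
        self.blobs = {}      # stands in for blob storage
        self.packages = {}   # stands in for stored package metadata

    def push(self, name, manifest, fragments):
        """Store the manifest; upload only fragments not already present."""
        uploaded = 0
        for digest, data in fragments.items():
            if digest not in self.blobs:
                self.blobs[digest] = data
                uploaded += 1
        self.packages[name] = manifest
        return uploaded

    def pull(self, name):
        """Resolve a package's manifest back into its contents."""
        manifest = self.packages[name]
        return {logical: self.blobs[d] for logical, d in manifest.items()}


registry = ToyRegistry()
manifest, fragments = build({
    "train": b"a,b\n1,2\n",
    "test": b"a,b\n1,2\n",   # same bytes as "train"
    "readme": b"demo",
})
uploaded = registry.push("user/demo", manifest, fragments)
restored = registry.pull("user/demo")
```

In this sketch the manifest plays the role of package metadata and the hash of the manifest gives an unambiguous reference to one version of the package, while the registry stores each distinct fragment once.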

quilt's People

Contributors

dimaryaz, akarve, kevinemoore, meffij, eode, nl0, asah, kurlov, mhassan102, diwu1989, rinman24
