brucemen711 / verdict

This project forked from verdict-project/verdict

Interactive-Speed Analytics: 200x Faster, 200x Fewer Cluster Resources, Approximate Query Processing

Home Page: http://verdictdb.org

License: Apache License 2.0

verdict's Introduction

We are making lots of changes right now.

  1. Project website: https://verdictdb.org
  2. Documentation: https://verdict.readthedocs.org

Instant Analytics with Exponential Speedups

Verdict brings interactive-speed, resource-efficient data analytics, with the following key features:

  1. 200x faster by sacrificing only 1% accuracy: Verdict can give you 99% accurate answers for your big data queries in a fraction of the time needed for calculating exact answers. If your data is too big to analyze in a couple of seconds, you will like Verdict.
  2. No change to your database: Verdict is a middleware standing between your application and your backend engine. You can just issue the same queries as before and get precise estimates computed instantly.
  3. Runs on (almost) any database: Verdict is designed to run on any database that supports standard SQL. Right now, we support Presto and will soon add other open source engines.
  4. Ease of use: Verdict is a light-weight client-side library: no servers, no port configurations, no extra user authentication, etc., beyond what you already have.

Installation

Launch Verdict in a single line (with Presto as its backend engine).

curl -s https://raw.githubusercontent.com/verdictproject/verdict/master/docker-compose-64gb.yaml \
    | docker-compose -f - up

Simple Example

Once the Docker containers are running, start the Python shell as follows:

# bash
docker exec -it docker-verdict python

# Python shell
import verdict
v = verdict.presto(presto_host='presto')     # connects to Presto via Verdict
v.sql("bypass show catalogs")

Originally, a query is quite slow

v.sql('bypass select count(*) from tpch.sf1.orders')
# Returning an answer in 8.863600015640259 sec(s). 
#      _col0
# 0  1500000

The bypass keyword sends the query directly to the backend engine, without any processing by Verdict.

Let Verdict do some one-time operations for the table

v.create_sample('tpch.sf1.orders')

Now the query runs faster

The same count query:

v.sql('select count(*) from tpch.sf1.orders')
# Returning an answer in 0.17403197288513184 sec(s). 
#         c1
# 0  1503884
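
To put the two runs side by side, the short calculation below takes the counts and timings printed above and derives the relative error and the speedup. It is plain arithmetic on the demo output, not part of Verdict's API, and the variable names are chosen only for illustration.

```python
# Figures copied from the demo output above.
exact_count = 1_500_000                 # result of the bypass query
approx_count = 1_503_884                # result of the Verdict query
exact_secs = 8.863600015640259          # latency of the bypass query
approx_secs = 0.17403197288513184       # latency of the Verdict query

# Relative error of the approximate count with respect to the exact count.
relative_error = abs(approx_count - exact_count) / exact_count

# How many times faster the approximate query returned.
speedup = exact_secs / approx_secs

print(f"relative error: {relative_error:.2%}")
print(f"speedup: {speedup:.1f}x")
```

For this single run, the relative error is well under 1%.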

Another query:

v.sql('select orderpriority, count(*) from tpch.sf1.orders group by orderpriority')
# Returning an answer in 0.14169764518737793 sec(s). 
#      orderpriority      c1
# 0         1-URGENT  300784
# 1           2-HIGH  301540
# 2         3-MEDIUM  298872
# 3  4-NOT SPECIFIED  302060
# 4            5-LOW  300628
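
As a quick sanity check, the per-group estimates above can be summed and compared with the estimate returned by the earlier count query. The snippet only re-uses the numbers printed in this README; the dictionary name is illustrative.

```python
# Per-group estimates copied from the output above.
group_counts = {
    "1-URGENT": 300784,
    "2-HIGH": 301540,
    "3-MEDIUM": 298872,
    "4-NOT SPECIFIED": 302060,
    "5-LOW": 300628,
}

# The group estimates add up to the overall estimate of the count query.
total = sum(group_counts.values())
print(total == 1_503_884)

# Share of each priority in the estimated total.
for priority, count in group_counts.items():
    print(f"{priority:<16} {count / total:.2%}")
```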

You can issue more complex queries including joins or subqueries. See our documentation for more examples.

Note: The above latency comparisons are a quick demo of Verdict and are not meant to be scientific.

How Verdict is Platform-Independent

  1. Verdict rewrites your queries to use special types of samples (instead of original tables).
  2. The rewritten queries are processed by the backend engine in the regular way.
  3. Given the answers from the engine, Verdict composes statistically unbiased estimates of your final answers and returns them.

Even the samples are stored in your engines/stores (database, S3, and so on).
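
The three steps above can be illustrated with a toy, in-memory sketch. This is not Verdict's implementation: it assumes a simple Bernoulli (uniform) sample and scales counts by the inverse sampling ratio, whereas Verdict uses its own "special types of samples". The helpers `make_sample`, `estimate_count`, and `estimate_group_counts` and the synthetic `rows` table are hypothetical names introduced only for this example.

```python
import random

PRIORITIES = ["1-URGENT", "2-HIGH", "3-MEDIUM", "4-NOT SPECIFIED", "5-LOW"]


def make_sample(rows, ratio, seed=0):
    """One-time step: keep each row independently with probability `ratio`."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < ratio]


def estimate_count(sample, ratio):
    """Count on the sample, then scale by 1/ratio (unbiased under Bernoulli sampling)."""
    return len(sample) / ratio


def estimate_group_counts(sample, ratio, key):
    """Group-by count on the sample, with each group scaled by 1/ratio."""
    counts = {}
    for row in sample:
        counts[row[key]] = counts.get(row[key], 0) + 1
    return {group: n / ratio for group, n in counts.items()}


# Synthetic stand-in for a table (hypothetical data, not TPC-H).
rows = [{"orderkey": i, "orderpriority": PRIORITIES[i % 5]} for i in range(100_000)]

RATIO = 0.01                                   # assumed sampling ratio
sample = make_sample(rows, RATIO)              # analogous to a one-time sample creation
total_estimate = estimate_count(sample, RATIO)
group_estimates = estimate_group_counts(sample, RATIO, "orderpriority")

print(f"exact count: {len(rows)}, estimated count: {total_estimate:.0f}")
for priority in sorted(group_estimates):
    print(f"{priority:<16} {group_estimates[priority]:.0f}")
```

Because the query only touches the sample, its cost depends on the sample size rather than on the size of the original table, which is where the speedup comes from in this simplified picture.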

More information

verdict's People

Contributors

pyongjoo
