Coder Social home page Coder Social logo

spear's Introduction

Overview

Build Status codecov.io

Codecov.io

This project is a sandbox and playground of mine for experimenting ideas and potential improvements to Spark SQL. It consists of:

  • A parser that parses a small SQL dialect into unresolved logical plans
  • A semantic analyzer that resolves unresolved logical plans into resolved ones
  • A query optimizer that optimizes resolved query plans into equivalent but more performant ones
  • A query planner that turns (optimized) logical plans into executable physical plans

Currently Spear only works with local Scala collections.

Build

Building Spear is as easy as:

$ ./build/sbt package

Run the REPL

Spear has an Ammonite-based REPL for interactive experiments. To start it:

$ ./build/sbt spear-repl/run

Let's create a simple DataFrame of numbers:

@ context range 10 show ()
╒══╕
│id│
├──┤
│ 0│
│ 1│
│ 2│
│ 3│
│ 4│
│ 5│
│ 6│
│ 7│
│ 8│
│ 9│
╘══╛

A sample query using the DataFrame API:

@ context.
    range(10).
    select('id as 'key, (rand(42) * 100) cast IntType as 'value).
    where('value % 2 === 0).
    orderBy('value.desc).
    show()
╒═══╤═════╕
│key│value│
├───┼─────┤
│  5│   90│
│  9│   78│
│  0│   72│
│  1│   68│
│  4│   66│
│  8│   46│
│  6│   36│
│  2│   30│
╘═══╧═════╛

Equivalent sample query using SQL:

@ context range 10 asTable 't // Registers a temporary table first

@ context.sql(
    """SELECT * FROM (
      |  SELECT id AS key, CAST(RAND(42) * 100 AS INT) AS value FROM t
      |) s
      |WHERE value % 2 = 0
      |ORDER BY value DESC
      |""".stripMargin
  ).show()
╒═══╤═════╕
│key│value│
├───┼─────┤
│  5│   90│
│  9│   78│
│  0│   72│
│  1│   68│
│  4│   66│
│  8│   46│
│  6│   36│
│  2│   30│
╘═══╧═════╛

We can also check the query plan using explain():

@ context.
    range(10).
    select('id as 'key, (rand(42) * 100) cast IntType as 'value).
    where('value % 2 === 0).
    orderBy('value.desc).
    explain(true)
# Logical plan
Sort: order=[$0] ⇒ [?output?]
│ ╰╴$0: `value` DESC NULLS FIRST
╰╴Filter: condition=$0 ⇒ [?output?]
  │ ╰╴$0: ((`value` % 2:INT) = 0:INT)
  ╰╴Project: projectList=[$0, $1] ⇒ [?output?]
    │ ├╴$0: (`id` AS `key`#11)
    │ ╰╴$1: (CAST((RAND(42:INT) * 100:INT) AS INT) AS `value`#12)
    ╰╴LocalRelation: data=<local-data> ⇒ [`id`#10:BIGINT!]

# Analyzed plan
Sort: order=[$0] ⇒ [`key`#11:BIGINT!, `value`#12:INT!]
│ ╰╴$0: `value`#12:INT! DESC NULLS FIRST
╰╴Filter: condition=$0 ⇒ [`key`#11:BIGINT!, `value`#12:INT!]
  │ ╰╴$0: ((`value`#12:INT! % 2:INT) = 0:INT)
  ╰╴Project: projectList=[$0, $1] ⇒ [`key`#11:BIGINT!, `value`#12:INT!]
    │ ├╴$0: (`id`#10:BIGINT! AS `key`#11)
    │ ╰╴$1: (CAST((RAND(CAST(42:INT AS BIGINT)) * CAST(100:INT AS DOUBLE)) AS INT) AS `value`#12)
    ╰╴LocalRelation: data=<local-data> ⇒ [`id`#10:BIGINT!]

# Optimized plan
Sort: order=[$0] ⇒ [`key`#11:BIGINT!, `value`#12:INT!]
│ ╰╴$0: `value`#12:INT! DESC NULLS FIRST
╰╴Filter: condition=$0 ⇒ [`key`#11:BIGINT!, `value`#12:INT!]
  │ ╰╴$0: ((`value`#12:INT! % 2:INT) = 0:INT)
  ╰╴Project: projectList=[$0, $1] ⇒ [`key`#11:BIGINT!, `value`#12:INT!]
    │ ├╴$0: (`id`#10:BIGINT! AS `key`#11)
    │ ╰╴$1: (CAST((RAND(42:BIGINT) * 100.0:DOUBLE) AS INT) AS `value`#12)
    ╰╴LocalRelation: data=<local-data> ⇒ [`id`#10:BIGINT!]

# Physical plan
Sort: order=[$0] ⇒ [`key`#11:BIGINT!, `value`#12:INT!]
│ ╰╴$0: `value`#12:INT! DESC NULLS FIRST
╰╴Filter: condition=$0 ⇒ [`key`#11:BIGINT!, `value`#12:INT!]
  │ ╰╴$0: ((`value`#12:INT! % 2:INT) = 0:INT)
  ╰╴Project: projectList=[$0, $1] ⇒ [`key`#11:BIGINT!, `value`#12:INT!]
    │ ├╴$0: (`id`#10:BIGINT! AS `key`#11)
    │ ╰╴$1: (CAST((RAND(42:BIGINT) * 100.0:DOUBLE) AS INT) AS `value`#12)
    ╰╴LocalRelation: data=<local-data> ⇒ [`id`#10:BIGINT!]

spear's People

Contributors

liancheng avatar cloud-fan avatar zzl0 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.