Coder Social home page Coder Social logo

dbstage's Introduction

DBStage โ€“ Flexible, Staged Query Compilation Playground

stability-experimental

Introduction

This repository contains a proof of concept for a configurable language-integrated runtime query compiler based on staging. The implementation relies on the Squid type-safe metaprogramming framework for Scala, which makes its code manipulation and generation capabilities fairly robust.

The main features are:

  • An expressive SQL embedded DSL (still currently missing many features), with type-safe integration to normal Scala programs (LinQ-style);

  • A backend implemented using powerful abstractions and Scala modular programming, which allows great configurability at no runtime cost: experiment and combine different ways to store relation tables (column store, row store, hash map), different ways to index them, different ways to query them (push, pull), etc.

Step by step

1. define the database relations

case object Person extends Relation {
  val Id = Column[Int]("Id", primary = true)
  val Name = Column[String]("Name")
  val Age = Column[Int]("Age")
  val Sex = Column[Sex]("Sex")
}

2.a. register queries to be executed later, using a SQL-like DSL

  import Person._
  val q0 = from(Person) where ir"$Age > 18" where ir"$Sex == Male" select (Name,Age)

(Of course, one can write where ir"$Age > 18 && $Sex == Male" equivalently.)

Note that column types are checked at compile-time, but column reference consistency and ambiguities are checked at query construction time (runtime). For example if I had written select (Name,Age,Salary) it would have complained at runtime that there are no such Salary column available. (It would be easy to have a compile-time linter written in Squid to catch these errors earlier.)

2.b. load the data from the file system

  Person.loadDataFromFile("data/persons.csv", compileCode = true)

This compiles a program on-the-fly to efficiently load the data given the relation schema.

2.c. on-the-fly compile and execute queries

  q0.plan.foreach { case (name, age) => assert(age > 18); println(s"$name $age") }

Notice that the types for name and age are correctly inferred as String and Int, respectively.

Importantly, steps 2.a, 2.b and 2.c can be done in any order and can be interleaved.

Another example: all pairs of people of the same age but opposite sex:

  val m = from(Person)
  val f = from(Person)
  val q = ((m where ir"$Sex == Male") join (f where ir"$Sex == Female"))(ir"${m.Age} == ${f.Age}")
    .select (m.Age, m.Name, f.Name, m.Id, f.Id)
  q.printLines

Which prints the following:

Age(0) Name(0) Name(1) Id(0) Id(1)
41 bob parker julia kenn 1 6
... ... ... ... ...

The currently supported functionalities are:

  • Selection, projection, filtering, (hash) joins
  • Option to load data in a hashmap where the keys are the primary keys of the relation; this structure is then used to perform faster joins
  • Option to store data in column store, on a per-relation basis (if the above is not applied on the given relation)
  • User-defined functions and data types
  • Pushing and pulling are both supported
  • The type-safe DSL means one can integrate queries inside general purpose program, using DBStage as a simple Scala library
  • Engine is agnostic in the underlying data structures and row representation; tables currently use tuples and Scala ArrayBuffer/HashMap's, but we could easily experiment with off-heap memory to avoid boxing, for example.

What I'd like to have in the future:

  • Aggregations, grouping, sorting
  • Customize the storage of data optimizing for registered queries (possibly adapt it dynamically as more queries are registered)
  • Option to instrument the data loading code to add more error recovery and/or add data analytics guiding subsequent query compilation
  • Extend the SQL subset with updates, perhaps transactions
  • Handling of data on disk, and associated cache management?

dbstage's People

Contributors

lptk avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

Forkers

chan150

dbstage's Issues

Unable to connect to Orient DB

We tried to connect to Orient DB 3.0.21 and its respective driver via DBStage application, but it always throw an error that it is unable to connect. We tried the same JAR file using Java class and were able to connect.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.