Coder Social home page Coder Social logo

scuel's Introduction

 _____                 _      
/ ____|               | |     
|(___   ___ _   _  ___| |     
\___ \ / __| | | |/ _ \ |     
 ____) | (__| |_| |  __/ |____
|_____/ \___|\__,_|\___|______|

Generating (TPCH) SQL using CUE

This project is an experimentation in using CUE as a way to generate customized queries for TPCH. For anyone familiar with TPCH and the rest of the TPC work, will know that the specification defines a number of queries that can be parameterized under certain constraints.

The combination of a very powerful templating engine borrowed from Golang, together with how good CUE is in expressing constraints, makes it a very pleasant experience to approach the generation of these queries as a configuration problem managed by Cuelang.

To use scuel, you have to create configuration files that define the parameters of your queries. All the defaults defined by TPCH have been used, so it's super easy to create all the queries with default parameters. To do that, you only have to execute: cue cmd tpch_gen on your terminal.

When done, a number of 22 .sql files will be created where each one contains one of the TPCH queries.

The beauty of CUE is that there's no way you can end up with queries that have parameters that are not accepted by the specification. If the parameters you pass violate the constraints of the specification, you will get an error explaining why.

Going through the cue files is also a good way to get familiar with the constraints and the schema of the TPCH specification, that's another benefit of working with the clear syntax an semantics of CUE.

If you plan to create your custom configurations, you can edit the scuel.cue file and update the database definition to add database related information, e.g. schema and database name.

Then, for each one of the queries you can create a parameters object where you can set the values you want. The tpch_model.cue file is your friend here, it contains all the schema definitions for each query together with references to the specification document to find more information if you want.

Future work

  • Being able to generalize to other models and queries in an easy way would be a great next step for ScueL.
  • Being able to test the queries against a DB would also be extremely useful but this will be hard to implement with just CUE and its toolset. It is possible to run commands with CUE but interpreting outputs from a sql CLI tool is not trivial.
  • Can we use CUE to generate the complete workload for TPCH? Including the data for the database? This is more complex than templating queries, but it would be useful. The current tooling for TPCH is hard to use.

scuel's People

Contributors

cpard avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

scuel's Issues

a few indentation outliers

I noticed that q1 and q22 have different indentation patterns that the others

thoughts on tabs versus spaces? (mine is typically tabs so the user can decide width / indentation scale)

You can use unexpand -t4 on linux for the other files which use 4 spaces

Need to `mkdir output` on fresh clone

$ cue cmd tpch_gen
task failed: open output/tpch_q22.sql: no such file or directory

$ mkdir output
$ cue cmd tpch_gen

You could add a mkdir as a task before the file writes to fix this

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.