
queries's Introduction

Contents

This repo contains pre-generated query stream files for a few different scale factors. The convention for TPC-DS is "sfXXXX" where XXXX is the size of the raw data set in gigabytes (GB).

Scale Factor   Raw Dataset Size (approximate)   Directory Name
sf100          100 GB                           sf100_queries
sf1000         1 TB (1,000 GB)                  sf1000_queries
sf3000         3 TB (3,000 GB)                  sf3000_queries
sf10000        10 TB (10,000 GB)                sf10000_queries
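
The naming convention in the table can be expressed in a few lines of Python. This is only a minimal sketch: the helper names raw_size_gb and queries_dir are hypothetical and are not part of this repo.

```python
import re


def raw_size_gb(scale_name: str) -> int:
    """Return the approximate raw data set size in GB for a name like 'sf1000'."""
    match = re.fullmatch(r"sf(\d+)", scale_name)
    if match is None:
        raise ValueError(f"not a scale factor name: {scale_name!r}")
    return int(match.group(1))


def queries_dir(scale_name: str) -> str:
    """Return the directory name this repo uses for the given scale factor."""
    raw_size_gb(scale_name)  # validate the name before building the directory
    return f"{scale_name}_queries"
```

For example, queries_dir("sf3000") gives the directory name listed in the table for the 3 TB data set.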

For each scale factor, there are four query stream files named query_N.sql, where N distinguishes the streams.
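
One way to pick up the stream files for a scale factor is to match the query_N.sql pattern inside its directory. The sketch below assumes N is a non-negative integer; the function name list_query_streams is hypothetical.

```python
import re
from pathlib import Path


def list_query_streams(directory):
    """Return the query_N.sql files found in `directory`, sorted by N."""
    # Assumption: N is a non-negative integer, e.g. query_3.sql.
    pattern = re.compile(r"query_(\d+)\.sql")
    found = []
    for path in Path(directory).iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            found.append((int(match.group(1)), path))
    # Sort numerically so that query_10.sql comes after query_2.sql.
    return [path for _, path in sorted(found, key=lambda item: item[0])]
```

Calling list_query_streams("sf100_queries") from the root of a checkout would return the stream files for the 100 GB scale factor.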

Details about the official TPC-DS benchmark can be found at https://www.tpc.org/tpcds/default5.asp. The queries are in a predetermined (pseudo-random) order per the TPC-DS spec.

In general, any results obtained using the query stream files in this repo should follow the TPC.org guidelines for the TPC-DS benchmark, which are spelled out explicitly in the benchmark's license agreement.

NVIDIA Spark-RAPIDS - NVIDIA Decision Support (NDS) Benchmark

The query stream files in this repo were generated using the nds_gen_query_stream.py utility, which can be found in the NVIDIA/spark_rapids_benchmarks GitHub repo.

nds_gen_query_stream.py is essentially a wrapper for the official TPC-DS query generator (dsqgen).

The command-line syntax is shown below.

usage: nds_gen_query_stream.py [-h] (--template TEMPLATE | --streams STREAMS)
                               template_dir scale output_dir

positional arguments:
  template_dir         directory to find query templates and dialect file.
  scale                assume a database of this scale factor.
  output_dir           generate query in directory.

optional arguments:
  -h, --help           show this help message and exit
  --template TEMPLATE  build queries from this template. Only used to generate one query from one template. This argument is mutually exclusive with --streams. It
                       is often used for test purpose.
  --streams STREAMS    generate how many query streams. This argument is mutually exclusive with --template.
  --rngseed RNGSEED    seed the random generation seed.
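
As an illustration of the usage text above, the following sketch assembles an argument list for the script from Python. It uses only the options shown in the help output. The function name build_command is hypothetical, and the sketch assumes the script is in the current directory.

```python
import sys


def build_command(template_dir, scale, output_dir, *,
                  streams=None, template=None, rngseed=None):
    """Build an argv list for nds_gen_query_stream.py (illustrative only)."""
    # --template and --streams are mutually exclusive, and one is required.
    if (streams is None) == (template is None):
        raise ValueError("pass exactly one of streams or template")
    # Assumption: the script sits in the current working directory.
    cmd = [sys.executable, "nds_gen_query_stream.py"]
    if streams is not None:
        cmd += ["--streams", str(streams)]
    else:
        cmd += ["--template", str(template)]
    if rngseed is not None:
        cmd += ["--rngseed", str(rngseed)]
    # Positional arguments, in the order given by the usage line.
    cmd += [str(template_dir), str(scale), str(output_dir)]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True). Building it separately makes the mutual exclusion of --template and --streams easy to check before anything is executed.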

queries's People

Contributors

badscooter23
