Coder Social home page Coder Social logo

codesigner / sosd Goto Github PK

View Code? Open in Web Editor NEW

This project forked from learnedsystems/sosd

0.0 1.0 0.0 3.11 MB

A Benchmark for Learned Indexes

License: GNU General Public License v3.0

CMake 0.47% C++ 47.29% Makefile 1.48% M4 0.28% HTML 44.61% CSS 0.24% Perl 0.03% JavaScript 0.43% C 0.34% Gnuplot 0.65% Shell 1.02% Awk 0.01% Python 3.16%

sosd's Introduction

   _____ ____  _____ ____ 
  / ___// __ \/ ___// __ \
  \__ \/ / / /\__ \/ / / /
 ___/ / /_/ /___/ / /_/ / 
/____/\____//____/_____/  
                          

Search on Sorted Data Benchmark

Build Status

SOSD is a benchmark to compare (learned) index structures on equality lookup performance over densely packed, sorted data. It comes with state-of-the-art baseline implementations to compare against and many datasets to compare on. Each dataset consists of 200 million to 800 million 32-bit or 64-bit unsigned integers.

Dependencies

On vanilla Ubuntu 20.04 LTS:

sudo apt -y update
sudo apt -y install zstd python3-pip m4 cmake clang libboost-all-dev
pip3 install --user numpy scipy
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

Usage instructions

We provide a number of scripts to automate things. Each is located in the scripts directory, but should be executed from the repository root.

Running the benchmark

  • ./scripts/download.sh downloads and stores required data from the Internet
  • ./scripts/build_rmis.sh compiles and builds the RMIs for each dataset. If you run into the error message error: no override and no default toolchain set, try running rustup install stable.
    • ./scripts/download_rmis.sh will download pre-built RMIs instead, which may be faster. You'll need to run build_rmis.sh if you want to measure build times on your platform.
  • ./scripts/prepare.sh constructs query workloads and compiles the benchmark
  • ./scripts/execute.sh executes the benchmark on each workload, storing the results in results. You can use the -c flag to output a .csv file of results rather than a .txt.

Build times can be long, as we make aggressive use of templates to ensure we do not accidentally measure vtable lookup time. For development, this can be annoying: you can set USE_FAST_MODE in config.h to disable some features and get a faster build time.

Cite

If you use this benchmark in your own work, please cite us:

@article{sosd-vldb,
  author    = {Ryan Marcus and
               Andreas Kipf and
               Alexander van Renen and
               Mihail Stoian and
               Sanchit Misra and
               Alfons Kemper and
               Thomas Neumann and
               Tim Kraska},
  title     = {Benchmarking Learned Indexes},
  journal   = {Proc. {VLDB} Endow.},
  volume    = {14},
  number    = {1},
  pages     = {1--13},
  year      = {2020}
}

@article{sosd-neurips,
  title={SOSD: A Benchmark for Learned Indexes},
  author={Kipf, Andreas and Marcus, Ryan and van Renen, Alexander and Stoian, Mihail and Kemper, Alfons and Kraska, Tim and Neumann, Thomas},
  journal={NeurIPS Workshop on Machine Learning for Systems},
  year={2019}
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.