
This project is forked from cds-ruc/gre_sali

GRE is a benchmark suite to compare learned indexes and traditional indexes.



GRE

GRE is a benchmark suite for learned and traditional indexes that measures throughput and latency under custom workloads (read/write ratio) on any dataset. GRE quantifies datasets by their local and global hardness, and includes a synthetic data generator that produces data of varying hardness.

See details in our VLDB 2022 paper below. If you use our work, please cite:

Chaichon Wongkham, Baotong Lu, Chris Liu, Zhicong Zhong, Eric Lo, and Tianzheng Wang. Are Updatable Learned Indexes Ready? PVLDB, 15(11): 3004-3017, 2022.

Requirements

gcc 8.3.0+
cmake 3.14.0+

Dependencies

intel-mkl 2018.4.274
intel-tbb 2020.3
jemalloc

Build

git submodule update --init # only for the first time
mkdir -p build
cd build
cmake -DCMAKE_BUILD_TYPE=Release .. && make

Basic usage

To calculate throughput:

./build/microbench \
--keys_file=./data/dataset \
--keys_file_type=binary \
--read=0.5 --insert=0.5 \
--operations_num=800000000 \
--table_size=-1 \
--init_table_ratio=0.5 \
--thread_num=24 \
--index=index_name

--keys_file_type takes either binary or text. --table_size=-1 infers the table size from the first line of the file. --init_table_ratio specifies the proportion of the dataset to bulk load.
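
As a rough illustration of what --init_table_ratio means, the sketch below splits a key set into a portion to bulk load and a remainder left for inserts. The helper name split_for_bulk_load, the shuffle step, and the sorting of the bulk-loaded keys are assumptions made for this example, not GRE's actual code.

```python
import random


def split_for_bulk_load(keys, init_table_ratio, seed=0, shuffle=True):
    """Hypothetical helper: choose keys to bulk load and keys left to insert."""
    keys = list(keys)
    if shuffle:
        # Assumption: keys are shuffled before the split.
        random.Random(seed).shuffle(keys)
    n_init = int(len(keys) * init_table_ratio)
    # Assumption: bulk loading expects sorted input.
    init_keys = sorted(keys[:n_init])
    insert_keys = keys[n_init:]
    return init_keys, insert_keys


init_keys, insert_keys = split_for_bulk_load(range(10), 0.5)
print(len(init_keys), len(insert_keys))  # 5 5
```

With --init_table_ratio=0.5, half of the keys would be present in the index before the timed operations start.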

For additional features, add the following flags:

Latency

--latency_sample --latency_sample_ratio=0.01
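
The idea behind sampling latency can be sketched as follows: time only a fraction of the operations, then report percentiles over the samples. This is a minimal sketch under that assumption; the function names and the nearest-rank percentile are illustrative choices, not taken from GRE.

```python
import math
import random
import time


def run_with_latency_sampling(operations, sample_ratio, seed=0):
    """Run every operation, but time only a random fraction of them."""
    rng = random.Random(seed)
    samples_ns = []
    for op in operations:
        if rng.random() < sample_ratio:
            start = time.perf_counter_ns()
            op()
            samples_ns.append(time.perf_counter_ns() - start)
        else:
            op()
    return samples_ns


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


table = {}
ops = [lambda i=i: table.__setitem__(i, i) for i in range(10_000)]
samples = sorted(run_with_latency_sampling(ops, sample_ratio=0.01))
if samples:
    print("p50 (ns):", percentile(samples, 50), "p99 (ns):", percentile(samples, 99))
```

A small sample ratio keeps the timing overhead low while still leaving enough samples for tail percentiles on long runs.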

Range query (e.g., range = 100)

--scan_ratio=1 --scan_num=100
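
For intuition, a range query of length 100 can be pictured as a scan over sorted keys. The sketch below assumes the scan starts at the first key greater than or equal to a start key and returns up to scan_num keys; these semantics and the name range_scan are assumptions, since each index implements its own scan.

```python
from bisect import bisect_left


def range_scan(sorted_keys, start_key, scan_num):
    """Return up to scan_num keys >= start_key, in ascending order."""
    lo = bisect_left(sorted_keys, start_key)
    return sorted_keys[lo:lo + scan_num]


keys = list(range(0, 1000, 2))  # even keys 0, 2, ..., 998
result = range_scan(keys, 101, 100)
print(result[0], result[-1], len(result))  # 102 300 100
```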

To use a Zipfian distribution for lookups

--sample_distribution=zipf
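
To show what a Zipfian lookup pattern looks like, the sketch below draws lookup keys so that the key at rank r is chosen with probability proportional to 1 / r^theta. The function name zipf_sample and the skew theta=0.99 are assumptions made for this example; GRE's own sampler and parameters may differ.

```python
import random
from collections import Counter
from itertools import accumulate


def zipf_sample(keys, num_samples, theta=0.99, seed=0):
    """Sample keys with probability proportional to 1 / rank**theta."""
    weights = [1.0 / (rank ** theta) for rank in range(1, len(keys) + 1)]
    cum_weights = list(accumulate(weights))
    rng = random.Random(seed)
    return rng.choices(keys, cum_weights=cum_weights, k=num_samples)


lookups = zipf_sample(list(range(100)), num_samples=10_000)
print(Counter(lookups).most_common(3))  # a few keys dominate the lookups
```

Compared with uniform sampling, a few hot keys receive most of the lookups.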

To perform a data-shift experiment. Note that the key file must be generated accordingly (with the keys changing from one dataset to another). This flag simply prevents the keys from being shuffled, preserving their order in the key file.

--data_shift
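
One way to picture the key order in a data-shift experiment: the keys of one dataset are followed by the keys of another, and the order is kept rather than shuffled. The helper build_key_order below is hypothetical and only illustrates that difference.

```python
import random


def build_key_order(first_dataset, second_dataset, data_shift, seed=0):
    """Hypothetical helper: order in which keys would be consumed."""
    keys = list(first_dataset) + list(second_dataset)
    if not data_shift:
        # Assumption: without data shift, the keys are shuffled.
        random.Random(seed).shuffle(keys)
    return keys


shifted = build_key_order([1, 2, 3], [100, 200, 300], data_shift=True)
print(shifted)  # [1, 2, 3, 100, 200, 300]
```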

Calculate the data hardness (PLA-metric) of the input dataset with a specified model error bound

--dataset_statistic --error_bound=32
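
As a minimal sketch of a PLA-style hardness measure, the code below counts how many linear segments a greedy piecewise linear approximation needs to predict each key's position within the error bound. It assumes that hardness is measured as a segment count and uses a simple shrinking-cone greedy pass. The function name is hypothetical and this is not necessarily the algorithm GRE implements.

```python
def count_pla_segments(keys, error_bound):
    """Greedy shrinking-cone count of linear segments (illustrative only)."""
    keys = sorted(set(keys))  # distinct keys; position = rank in sorted order
    if not keys:
        return 0
    segments = 1
    x0, y0 = keys[0], 0  # origin of the current segment
    lo, hi = float("-inf"), float("inf")  # feasible slope range
    for y, x in enumerate(keys[1:], start=1):
        dx, dy = x - x0, y - y0
        # Slopes that keep this point within +/- error_bound of the line.
        lo = max(lo, (dy - error_bound) / dx)
        hi = min(hi, (dy + error_bound) / dx)
        if lo > hi:
            # No single line fits: start a new segment at this point.
            segments += 1
            x0, y0 = x, y
            lo, hi = float("-inf"), float("inf")
    return segments


print(count_pla_segments(range(0, 1000, 10), error_bound=32))  # 1
```

Under this assumption, a larger segment count for the same error bound indicates a dataset that is harder to fit with linear models.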

To report memory consumption, if the index implements the memory consumption interface

--memory

All results are written to the CSV file specified by the --output_path flag.
