Coder Social home page Coder Social logo

starling's Introduction

Starling

In this repository, we share the implementations and experiments of our work Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment (arXiv).

It contains the following features:

For build,

  1. Build disk graph.
  2. Build in-memory navigation graph, based on
    1. Nodes that are uniformly-sampled.
    2. Nodes that are generated by search frequency.
  3. Perform Graph Partition on given base data

For search,

With Cache Nodes With Nav Graph With Graph Partition With use_ratio Use SQ
Beam Search
Page Search

Datasets

The datasets we used in the experiments can be downloaded and the data formats are explained in NeurIPS'21 Big-ANN Benchmark.

Dataset Data type Dimensions Distance # Query Query type
BIGANN uint8 128 L2 10000 ANNS/RS
DEEP float 96 L2 10000 ANNS/RS
SSNPP uint8 256 L2 100000 RS
Text2image float 200 IP 100000 ANNS

Quick Start

To install dependencies, run

apt install build-essential libboost-all-dev make cmake g++ libaio-dev libgoogle-perftools-dev clang-format libboost-all-dev libmkl-full-dev

To run benchmarks, go to scripts directory, copy config_sample.sh to config_local.sh, modifies the datasets paths in config_dataset.sh and run

./run_benchmark.sh [debug/release] [build/build_mem/freq/gp/search] [knn/range]
Arguement Description
debug/release Debug/Release mode to run, passed to CMake
build Build index
build_mem Build memory index
freq Generate visit-frequency file
gp Graph partition given index file
search Search index
knn Find k-nearest neighbors
range Range search

Configure datasets and parameters in config_local.sh

To run experiments with multiple segments and large-scale, please visist https://github.com/PwzXxm/segment-framework.

starling's People

Contributors

harsha-simhadri avatar pwzxxm avatar songlinlife avatar microsoftopensource avatar daxpryce avatar theantony avatar dengcai78 avatar jigaoluo avatar philipbadams avatar shikharj avatar microsoft-github-operations[bot] avatar rakri avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.