Coder Social home page Coder Social logo

ssg's Introduction

SSG : Satellite System Graph For Approximate Nearest Neighbor Search

Cooperation

If you are interested in the research line of nearest neighbor search or application and looking for a cooperation please feel free to contect me via [email protected]

Introduction

======

SSG is a graph-based approximate nearest neighbor search (ANNS) algorithm. It provides a flexible and efficient solution for the metric-free large-scale ANNS on dense real vectors. It implements the algorithm of our paper: Satellite System Graph: Towards the Efficiency Up-Boundary of Graph-Based Approximate Nearest Neighbor Search

Benchmark datasets

Data set Download dimension nb base vectors nb query vectors original website
SIFT1M original website 128 1,000,000 10,000 original website
GIST1M original website 128 1,000,000 1,000 original website
Crawl crawl.tar.gz (1.7GB) 300 1,989,995 10,000 original website
GloVe-100 glove-100.tar.gz (424MB) 100 1,183,514 10,000 original website
Deep100M deep100m.tar.gz* (34GB) 96 100,000,000 10,000 original website
  • For Deep100M we will provide the download link upon request

ANNS performance

The performance was tested without parallelism. Among all the graph-based algorithms, NSG and SSG has the smallest index size.

Performance

Compared Algorithms:

Tree-based

Quantization-based

Graph-based

Please see our NSG paper for the performance of other graph-based algorithms - FANNG.

How to use

Compile

  • Prerequisite : openmp, cmake, boost
  • Compile:
    1. Go to the root directory of faiss, it's under the directory of extern_libraries aside of ours.
    2. Execute the following commands:
$ cd /path/to/project
$ mkdir -p build && cd build
$ cmake .. && make -j

Building SSG Index

The main interfaces and classes have its respective test codes under directory tests/.

Please follow the instructions below to build the SSG index.

a) Build a kNN graph

Firstly, we need to prepare a kNN graph.

We suggest you use our efanna_graph to build this kNN graph. But you can also use any alternatives you like, such as KGraph or faiss.

b) Convert the kNN graph to an SSG

For example:

$ cd /path/to/project/build/tests/
$ ./test_ssg_index data_path knn_graph_path L R Angle ssg_path
  • data_path is the path of the origin data.
  • knn_graph_path is the path of the pre-built kNN graph.
  • L controls the quality of the NSG, the larger the better, L > R.
  • R controls the index size of the graph, the best R is related to the intrinsic dimension of the dataset.
  • Angle controls the angle between two edges.
  • ssg_path is the path where the result SSG stored.

Approximate Nearest Neighbor Search using SSG

For example:

$ cd /path/to/project/build/tests/
$ ./test_ssg_optimized_search data_path query_path ssg_path search_L search_K result_path [random_seed]
  • data_path is the path of the origin data.
  • query_path is the path of the query data.
  • ssg_path is the path of the pre-built SSG.
  • search_L controls the quality of the search results, the larger the better but slower (must larger than search_K).
  • search_K controls the number of neighbors we want to find.
  • result_path is the path of the result neighbors.
  • random_seed (optional) is the random seed.

NOTE: For now, we only provide interface for search for only one query at a time, and test the performance with single thread.

NOTE: Data alignment is essential for the correctness of our procedure, because we use SIMD instructions for acceleration of numerical computing such as AVX and SSE2. You should use it to ensure your data elements (feature) is aligned with 8 or 16 int or float. For example, if your features are of dimension 70, then it should be extend to dimension 72. And the last 2 dimension should be filled with 0 to ensure the correctness of the distance computing. And this is what data_align() does.

NOTE: Only data-type int32 and float32 are supported for now.

Python API

Install

$ cd /path/to/project/
$ python setup.py install

Usage

NOTE: currently Python API only supports search method.

import numpy as np
import pyssg

data = np.fromfile("/path/to/sift_base.fvecs", dtype=np.float32)
dim = data[0].view(np.int32)
data = data.reshape(-1, dim + 1)
data = np.ascontiguousarray(data[:, 1:])
ndata, dim = data.shape

pyssg.set_seed(1234)
index = pyssg.IndexSSG(dim, ndata)
index.load("/path/to/ssg", data)

k, l = 100, 300
query = np.random.randn(dim).astype(np.float32)
knn = index.search(query, k, l)
print(knn)

Please refer to tests/test_python_query.py for a real-world example.

Parameters used in Our Paper

SSG Building

We use the following parameters to get the SSG index in Fig. 6 of our paper.

We use efanna_graph to build the kNN graph

Step 1. Build kNN Graph

Dataset K L iter S R
SIFT1M 200 200 12 10 100
GIST1M 400 400 12 15 100
Crawl 400 420 12 15 100
GloVe-100 400 420 12 20 200
  • Commands:
$ efanna_graph/tests/test_nndescent sift.fvecs sift_200nn.knng 200 200 12 10 100
$ efanna_graph/tests/test_nndescent gist.fvecs gist_400nn.knng 400 400 12 15 100
$ efanna_graph/tests/test_nndescent crawl.fvecs crawl_400nn.knng 400 420 12 15 100
$ efanna_graph/tests/test_nndescent glove-100.fvecs glove-100_400nn.knng 400 420 12 15 200

Step 2. Convert kNN Graph to SSG

  • Parameters:
Dataset L R Angle
SIFT1M 100 50 60
GIST1M 500 70 60
Crawl 500 40 60
GloVe-100 500 50 60
  • Commands:
$ ./test_ssg_index sift.fvecs sift_200nn.knng 100 50 60 sift.ssg
$ ./test_ssg_index gist.fvecs gist_400nn.knng 500 70 60 gist.ssg
$ ./test_ssg_index crawl.fvecs crawl_400nn.knng 500 40 60 crawl.ssg
$ ./test_ssg_index glove-100.fvecs glove-100_400nn.knng 500 50 60 glove-100.ssg

SSG Search

  • search_L: range from search_K to 2000
  • random_seed: 161803398

Pre-built kNN Graph and NSG Index

Here we provide our pre-built kNN graphs and SSG index files used in our papar's experiments.

Dataset kNN Graph SSG Index
SIFT1M sift_200nn.knng sift.ssg
GIST1M gist_400nn.knng gist.ssg
Crawl crawl_400nn.knng crawl.ssg
GloVe-100 glove_400nn.knng glove.ssg

ssg's People

Contributors

comzyh avatar delightrun avatar dengcai78 avatar fc731097343 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

ssg's Issues

Building SSG index for single core CPU

Hello, I'm interested in your work and I'm trying to reproduce your result.

I want to reproduce the result for DEEP100M dataset on single core.
However, test_nn_descent is not working because of segmentation fault.
Can you let me know how did you build kNN graphs for this dataset using Faiss as mentioned in your paper?
Also, can you share both kNN graph and ssg index?

Thanks

请问在sift1M上跑出来的结果,SSG效果没有NSG好,为什么跟论文不一样呢,自己很疑惑,

我是在sift1M上测试的,看论文,两个方法应该效果差不多的,但是我自己没测出来:(
1)NSSG的构建时间是32.9728+35.99(s);最关注的搜索时间是search_L=100, k=10,时间是17.3324s,而NSG可以3.29968s跑出来,显然是NSG的QPS高了;
2)在我设置多个参数,绘制 QPS vs Recall@10 曲线时,SSG也确实没有outperformNSG;
请问您能给我一些可能的原因嘛~

SSG on Deep100m dataset

Hi, I want to run SSG on Deep100m dataset.
Would you kindly send me the link to the dataset, please?
Also, can you share Faiss, efanna refine, SSG/NSG config for Deep100m?
Thanks,

Can we convert the ssg graph index into general graph format?

Hi,

I am interested in your work and want to use ssg graph to my own implemented search algorithm.

I want to know this graph is stored? What is it data layout or looks like?

Can I convert it into a general graph format and dense matrix/ adjacent list/ CSR something like that?

请教为什么ssg这么计算夹角?

请教为什么ssg这么计算夹角?我理解计算cone圆锥的夹角不是这么计算吧。

float djk = distance_->compare(data_ + dimension_ * (size_t)result[t].id,
data_ + dimension_ * (size_t)p.id,
(unsigned)dimension_);
float cos_ij = (p.distance + result[t].distance - djk) / 2 /
sqrt(p.distance * result[t].distance);
希望能得到指教。

当我尝试运行图索引构建程序时,出现以下错误

root@f04dcf1c3705:/opt/SSG/build/tests# ./test_ssg_index /opt/SSG/data/datasets/sift/sift_base.fvecs /opt/SSG/data/datasets/knn/sift_200nn.knng 200 200 12 10 100
Using Seed 100
Data Path: /opt/SSG/data/datasets/sift/sift_base.fvecs
L = 200, R = 200, Angle = 12
KNNG = /opt/SSG/data/datasets/knn/sift_200nn.knng
Output SSG Path: 10
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted

请问这个问题您是如何解决的
感谢!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.