
gamma's Introduction


Overview

Vearch is a cloud-native distributed vector database for efficient similarity search of embedding vectors in your AI applications.

Key features

  • Hybrid search: Both vector search and scalar filtering.

  • Performance: Fast vector retrieval - search from millions of objects in milliseconds.

  • Scalability & Reliability: Replication and elastic scaling out.

Documentation

RESTful APIs

OpenAPIs

SDK

Use cases

Use Vearch as a memory backend

Real-world demos

  • VisualSearch: Vearch can be leveraged to build a complete visual search system to index billions of images. The image retrieval plugin for object detection and feature extraction is also required.

Quick start

Deploy vearch cluster on k8s

Add charts through the repo

$ helm repo add vearch https://vearch.github.io/vearch-helm
$ helm repo update && helm install my-release vearch/vearch

Add charts from local

$ git clone https://github.com/vearch/vearch-helm.git && cd vearch-helm
$ helm install my-release ./charts -f ./charts/values.yaml

Start by docker-compose

$ cd cloud
$ cp ../config/config.toml .
$ docker-compose up

Deploy with Docker: to get started quickly with the Vearch Docker image, see DeployByDocker.

Compile from source: to build Vearch from the source code, see SourceCompileDeployment.

Components

Vearch Architecture


Master: Responsible for schema management, cluster-level metadata, and resource coordination.

Router: Provides the RESTful API (upsert, delete, search, and query), request routing, and result merging.

PartitionServer (PS): Hosts document partitions with Raft-based replication. Gamma, the core vector search engine, is built on Faiss and stores, indexes, and retrieves vectors and scalars.
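
As a rough illustration of the Router's result-merging role, the sketch below merges per-partition hit lists into a single top-k list. It is not Vearch's code: the Hit struct, the MergeTopK function, and the convention that a larger score means a closer match are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result record for this sketch; not a Vearch type.
struct Hit {
  std::string id;
  float score;  // assumption: larger means more similar
};

// Concatenates the hit lists returned by each partition and keeps the k
// highest-scoring hits, sorted from best to worst.
inline std::vector<Hit> MergeTopK(const std::vector<std::vector<Hit>> &partitions,
                                  std::size_t k) {
  std::vector<Hit> all;
  for (const auto &part : partitions) {
    all.insert(all.end(), part.begin(), part.end());
  }
  const std::size_t keep = std::min(k, all.size());
  // Only the first `keep` positions need to be in sorted order.
  std::partial_sort(all.begin(), all.begin() + keep, all.end(),
                    [](const Hit &a, const Hit &b) { return a.score > b.score; });
  all.erase(all.begin() + keep, all.end());
  return all;
}
```

A caller would pass one hit list per partition together with the number of results the client asked for.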

Reference

Reference to cite when you use Vearch in a research paper:

@misc{li2019design,
      title={The Design and Implementation of a Real Time Visual Search System on JD E-commerce Platform},
      author={Jie Li and Haifeng Liu and Chuanghua Gui and Jianyu Chen and Zhenyun Ni and Ning Wang},
      year={2019},
      eprint={1908.07389},
      archivePrefix={arXiv},
      primaryClass={cs.IR}
}

Community

You can report bugs or ask questions in the issues page of the repository.

For public discussion of Vearch or for questions, you can also send email to [email protected].

Our Slack: https://vearchwrokspace.slack.com

Known Users

You are welcome to register your company name in this issue: #230 (listed in order of registration).


License

Licensed under the Apache License, Version 2.0. For details, see LICENSE and NOTICE.

gamma's People

Contributors

cuisonghui, gyd-a, haosky, kuailelijuan, ljeagle, maslino, matadorhong, qiutianme, rrjia, shiquan1988, wxingda, xbugliu, zcdb


gamma's Issues

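Compiling on the ARM64 platform

Compiling directly on an ARM64 platform failed for me. What needs to be changed for the build to succeed? Thanks.

Questions about test_files.cc

1. The C++ interface only has createTable and no operation for deleting a table. How should a table be deleted?
2. In the GetDocByID and GetDocByDocID interfaces the ID is a string, but internally it is actually an int. What is the difference between docID and id? Also, when calling with id=1
string docid = "1";
GetDocByID(opt.engine, docid.c_str(), docid.size(), &doc_str, &doc_str_len);
no doc is found.
3. When the index type is set to inverted-file product quantization (IVFPQ) and index_size is set to 1000, after 10000 records have been fully written to the table, every query returns error code -2, indicating that the index has not been trained. When I force a search, it often returns no results either.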

Nobody is looking after the issues


gamma build fails with fatal error: batch_result_generated.h: No such file or directory

When running build.sh to compile vearch, the build fails at the gamma stage with fatal error: batch_result_generated.h: No such file or directory.
More of the error output follows:
[ 2%] Building C object CMakeFiles/gamma.dir/third_party/btree/threadskv10h.c.o
[ 4%] Building C object CMakeFiles/gamma.dir/third_party/btree/threadskv8.c.o
[ 6%] Building C object CMakeFiles/gamma.dir/third_party/cjson/cJSON.c.o
[ 8%] Building CXX object CMakeFiles/gamma.dir/third_party/easyloggingpp/easylogging++.cc.o
[ 10%] Building CXX object CMakeFiles/gamma.dir/util/bitmap.cc.o
[ 12%] Building CXX object CMakeFiles/gamma.dir/util/utils.cc.o
[ 14%] Building CXX object CMakeFiles/gamma.dir/search/gamma_engine.cc.o
In file included from /home/softsz/vearch/vearch/engine/search/gamma_engine.h:13:0,
from /home/softsz/vearch/vearch/engine/search/gamma_engine.cc:8:
/home/softsz/vearch/vearch/engine/c_api/api_data/gamma_batch_result.h:12:36: fatal error: batch_result_generated.h: No such file or directory
#include "batch_result_generated.h"
^
compilation terminated.
make[2]: *** [CMakeFiles/gamma.dir/search/gamma_engine.cc.o] Error 1
make[1]: *** [CMakeFiles/gamma.dir/all] Error 2
make: *** [all] Error 2

I tried searching GitHub for this file, batch_result_generated.h, but could not find it. Am I missing some dependency? How can this problem be resolved?

Each raw_vector object has a VIDMgr object. What is this object for?

I have two questions.
First, as I read the code, if a table has several vector fields with the same name, the raw_vector object for those vectors needs a VIDMgr object:
std::vector<int> vid2docid_; // vector id to doc id: the array index is the vector id, the value is the doc id
std::vector<int *> docid2vid_; // doc id to vector id list: can one doc id map to several vectors?
Why would one docid correspond to several vector ids?

Second, in the Add method:
int Add(int vid, int docid)
{
  // add to vid2docid_ and docid2vid_
  if (multi_vids_)
  {
    vid2docid_[vid] = docid;
    if (docid2vid_[docid] == nullptr)
    {
      docid2vid_[docid] = utils::NewArray<int>(MAX_VECTOR_NUM_PER_DOC + 1, "init_vid_list");
      // total_mem_bytes += (MAX_VECTOR_NUM_PER_DOC + 1) * sizeof(int);
      docid2vid_[docid][0] = 1;   // slot 0 set to 1 means this doc has one vector
      docid2vid_[docid][1] = vid; // vector ids are stored starting from slot 1
    }
    else
    {
      int *vid_list = docid2vid_[docid];
      // a table can have at most 10 vector fields that share the same name
      if (vid_list[0] + 1 > MAX_VECTOR_NUM_PER_DOC)
      {
        return -1;
      }
      vid_list[vid_list[0]] = vid; // is this right? should it be vid_list[vid_list[0] + 1] = vid, recording the vector id in the next slot?
      vid_list[0]++;               // this doc's vector count + 1
    }
  }
  return 0;
}

When docid2vid_[docid] != nullptr, is the vector id written to the wrong position? Thanks.
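
To make the indexing question concrete, here is a minimal standalone sketch of a count-prefixed id list like the one the question describes (slot 0 holds the count, ids start at slot 1). It is not the gamma implementation: VidList, MakeVidList, AppendVid, and the limit kMaxVectorsPerDoc are hypothetical names chosen for this example. Under this layout, writing at index count would overwrite the most recently stored id, so the sketch appends at count + 1.

```cpp
#include <array>
#include <cassert>

// Hypothetical limit for this sketch; the real MAX_VECTOR_NUM_PER_DOC may differ.
constexpr int kMaxVectorsPerDoc = 10;

// Slot 0 holds the number of stored ids; slots 1..count hold the ids themselves.
using VidList = std::array<int, kMaxVectorsPerDoc + 1>;

// Creates a list that already contains one id, mirroring the nullptr branch
// of the Add method quoted in the question.
inline VidList MakeVidList(int first_vid) {
  VidList list{};  // zero-initialised
  list[0] = 1;
  list[1] = first_vid;
  return list;
}

// Appends vid after the last stored id. Returns false when the list is full.
inline bool AppendVid(VidList &list, int vid) {
  const int count = list[0];
  if (count + 1 > kMaxVectorsPerDoc) {
    return false;
  }
  list[count + 1] = vid;  // next free slot is count + 1, because slot 0 is the counter
  list[0] = count + 1;
  return true;
}
```

With a list created by MakeVidList(7), one call to AppendVid with id 9 leaves the count at 2, with 7 in slot 1 and 9 in slot 2.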

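Question about upgrading the flatbuffers version

Can flatbuffers be upgraded to 2.0 directly on v3.2.7? After changing idl/build.sh to upgrade flatbuffers to 2.0, compiling the generated gamma_api reports an error:
engine/idl/fbs-gen/go/gamma_api/FieldInfo.go:8:3: invalid import path: ""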

The interface lacks examples and documentation


Waited a month and still no reply


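Does a query with ids filter first?

Hello, I want to use vearch for coarse ranking, with high requirements for both recall and response time.
A sample request looks like this: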

curl -H "content-type: application/json" -XPOST -d'
{
  "query": {
    "sum": [
      {
        "field": "field_name",
        "feature": [
          0.1,
          0.2,
          0.3,
          0.4,
          0.5
        ],
        "min_score": 0.9,
        "boost": 0.5
      }
    ],
    "ids": [
      "id1",
      "id2","xxxx"
    ]
  }
}

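There are roughly 10,000 ids. I would like to ask:

  1. Is it fine to query this way?
  2. Should the index type be flat?
  3. Does the query first fetch documents by id and then compute similarity, or does it first recall by vector and then filter by id?

I saw an earlier Q&A that described the process as vector recall first, with term filtering applied during recall. If that is the case, wouldn't using flat turn into a full traversal?
Q&A link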
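
The "fetch by id first, then compute similarity" order asked about in question 3 can be sketched as a brute-force scan restricted to the requested ids. This is only an illustration of that order of operations and says nothing about what Vearch actually does internally: the in-memory store, the ScoreByIds function, and the use of inner product as the score are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Inner product over the common prefix of two vectors.
inline float InnerProduct(const std::vector<float> &a, const std::vector<float> &b) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Looks up only the documents named in ids, scores each against the query,
// and keeps those whose score reaches min_score. The cost grows with the
// number of ids, not with the size of the whole store.
inline std::vector<std::pair<std::string, float>> ScoreByIds(
    const std::unordered_map<std::string, std::vector<float>> &store,
    const std::vector<std::string> &ids,
    const std::vector<float> &query,
    float min_score) {
  std::vector<std::pair<std::string, float>> hits;
  for (const auto &id : ids) {
    const auto it = store.find(id);
    if (it == store.end()) {
      continue;  // unknown id: nothing to score
    }
    const float score = InnerProduct(it->second, query);
    if (score >= min_score) {
      hits.emplace_back(id, score);
    }
  }
  return hits;
}
```

With about 10,000 ids this loop performs about 10,000 lookups and dot products per query, whereas a recall-then-filter order would scan or probe the index first and discard non-matching ids afterwards.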

idl/build.sh fails when downloading flatbuffers through a proxy

ERROR: cannot verify github.com's certificate, issued by ‘xxx Web Secure Internet Gateway CA’:
Unable to locally verify the issuer's authority.
To connect to github.com insecurely, use `--no-check-certificate'.
tar: v1.11.0.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now

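Code comments and documentation are sparse; a question

When a doc is added, gamma_engine calls the Add() and AddTostore() methods of profile and vec_manager respectively to get the data into the index. However, vec_manager seems to involve only raw_vector's add method, and I do not see any gammaIndex operations. How does the engine's workflow propagate the data to the gammaIndex of the corresponding type?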

gamma cmake err sample_bfbs.cpp.o

The make stage fails; tracing back, the errors come from the cmake-driven build.

Output from the final stage:
[ 93%] Built target flatbuffers
CMakeFiles/flatsamplebfbs.dir/build.make:163: recipe for target 'CMakeFiles/flatsamplebfbs.dir/samples/sample_bfbs.cpp.o' failed
make[2]: *** [CMakeFiles/flatsamplebfbs.dir/samples/sample_bfbs.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:214: recipe for target 'CMakeFiles/flatsamplebfbs.dir/all' failed
make[1]: *** [CMakeFiles/flatsamplebfbs.dir/all] Error 2
[ 95%] Linking CXX executable flatsampletext
[ 96%] Linking CXX executable flattests
[ 96%] Built target flatsampletext
[ 96%] Built target flattests
Makefile:181: recipe for target 'all' failed
make: *** [all] Error 2
-- Configuring done
-- Generating done
-- Build files have been written to: /data/server/go/src/github.com/vearch/vearch/engine/build

Detailed errors from the middle of the build:
[ 84%] Building CXX object CMakeFiles/flattests.dir/src/idl_gen_fbs.cpp.o
/data/server/go/src/github.com/vearch/vearch/engine/third_party/flatbuffers-1.11.0/samples/sample_bfbs.cpp:23:25: error: ‘Sample’ is not a namespace-name
using namespace MyGame::Sample;
^
/data/server/go/src/github.com/vearch/vearch/engine/third_party/flatbuffers-1.11.0/samples/sample_bfbs.cpp:23:31: error: expected namespace-name before ‘;’ token
using namespace MyGame::Sample;
^
CMakeFiles/flatsamplebinary.dir/build.make:85: recipe for target 'CMakeFiles/flatsamplebinary.dir/samples/sample_binary.cpp.o' failed
make[2]: *** [CMakeFiles/flatsamplebinary.dir/samples/sample_binary.cpp.o] Error 1
CMakeFiles/Makefile2:187: recipe for target 'CMakeFiles/flatsamplebinary.dir/all' failed
make[1]: *** [CMakeFiles/flatsamplebinary.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
Scanning dependencies of target flatsampletext
[ 85%] Building CXX object CMakeFiles/flatsampletext.dir/src/reflection.cpp.o

/data/server/go/src/github.com/vearch/vearch/engine/third_party/flatbuffers-1.11.0/samples/sample_binary.cpp:19:17: error: ‘MyGame’ has not been declared
using namespace MyGame::Sample;
^
/data/server/go/src/github.com/vearch/vearch/engine/third_party/flatbuffers-1.11.0/samples/sample_binary.cpp:19:25: error: ‘Sample’ is not a namespace-name
using namespace MyGame::Sample;
^
/data/server/go/src/github.com/vearch/vearch/engine/third_party/flatbuffers-1.11.0/samples/sample_binary.cpp:19:31: error: expected namespace-name before ‘;’ token
using namespace MyGame::Sample;

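Error when compiling scann-1.2.1, Gamma's third-party dependency

Following the vearch and gamma build instructions and build scripts, when the build reaches the step that compiles scann-1.2.1 with bazel, the following errors appear, all pointing at com_google_protobuf: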

ERROR: /root/.cache/bazel/_bazel_root/7125299c24cb2d7bde318ecca9ed5091/external/com_google_protobuf/BUILD:979:21: in proto_lang_toolchain rule @com_google_protobuf//:cc_toolchain: '@com_google_protobuf//:cc_toolchain' does not have mandatory provider 'ProtoInfo'.
ERROR: Analysis of target '//:build_pip_pkg' failed; build aborted: Analysis of target '@com_google_protobuf//:cc_toolchain' failed
INFO: Elapsed time: 0.270s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 0 targets configured)

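I tried the method from this article (https://zhuanlan.zhihu.com/p/488199658):
# If the bazel version is too new, the following error occurs: in proto_lang_toolchain rule @com_google_protobuf//:cc_toolchain: '@com_google_protobuf//:cc_toolchain' does not have mandatory provider 'ProtoInfo'. It can be resolved by adding this flag: bazel build ... --incompatible_blacklisted_protos_requires_proto_info=false
It still does not fix the build. Could the problem be the version of https://github.com/bazelbuild/rules_cc in WORKSPACE, or the GitHub tag used for com_google_protobuf? Some articles also suggest changing settings like these. I tried several versions of both, and the bazel build still fails with an ERROR.

Environment: CentOS 7 (MacOS Big Sur + 2.7 GHz dual-core Intel Core i5 + Parallels Desktop) + bazel 4 + gcc 9.3.1 + clang 8

  1. gcc 9 was installed via devtoolset-9; the environment is activated with source /opt/rh/devtoolset-9/enable
  2. clang/llvm 8 was compiled and installed from source, PATH: /usr/local/clang/bin
  3. Bazel 4 was installed with: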
    wget https://copr.fedorainfracloud.org/coprs/vbatts/bazel/repo/epel-7/vbatts-bazel-epel-7.rep --no-check-certificate
    mv vbatts-bazel-epel-7.rep /etc/yum.repos.d/
    cd /etc/yum.repos.d
    mv vbatts-bazel-epel-7.rep vbatts-bazel-epel-7.repo
    yum install -y bazel4

What causes this mmap error?

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4bd006]

goroutine 1 [running]:
golang.org/x/exp/mmap.(*ReaderAt).ReadAt(0x0, 0xc000a8a000, 0x800, 0x800, 0x0, 0xc000a88000, 0x4, 0x8)
/root/GOPATH/src/golang.org/x/exp/mmap/mmap_unix.go:66 +0x26
main.BatchAddDocToEngine(0x186a0)
/root/GOPATH/gamma/go/examples/test.go:222 +0x675
main.Add()
/root/GOPATH/gamma/go/examples/test.go:279 +0x5dc
main.main()
/root/GOPATH/gamma/go/examples/test.go:347 +0xf2

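Upgrading the third-party library scann

Will gamma officially upgrade to newer scann versions as they are released? If I need to upgrade the scann version myself, what should I watch out for when replacing the code?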
