vearch / gamma
Real time vector search engine
License: Other
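I have two questions.
First, according to the logic of the code, if there are multiple vector fields with the same field name, the raw_vector object for those vectors needs a VIDMgr object:
std::vector<int> vid2docid_; // vector id to doc id: the array index is the vector ID, the stored value is the doc ID
std::vector<int *> docid2vid_; // doc id to vector id list: one doc_id can map to multiple vectors?
Why would a single docid correspond to multiple vector IDs?
Second, in the Add method: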
int Add(int vid, int docid)
{
    // add to vid2docid_ and docid2vid_
    if (multi_vids_)
    {
        vid2docid_[vid] = docid;
        if (docid2vid_[docid] == nullptr)
        {
            docid2vid_[docid] = utils::NewArray<int>(MAX_VECTOR_NUM_PER_DOC + 1, "init_vid_list");
            // total_mem_bytes += (MAX_VECTOR_NUM_PER_DOC + 1) * sizeof(int);
            docid2vid_[docid][0] = 1;   // slot 0 is set to 1: this doc has one vector
            docid2vid_[docid][1] = vid; // vector IDs are stored starting from slot 1
        }
        else
        {
            int *vid_list = docid2vid_[docid];
            // a table can have at most 10 vector fields that also share the same name
            if (vid_list[0] + 1 > MAX_VECTOR_NUM_PER_DOC)
            {
                return -1;
            }
            vid_list[vid_list[0]] = vid; // this looks wrong? should it be vid_list[vid_list[0] + 1] = vid, recording the vector ID in the next slot?
            vid_list[0]++;               // number of vectors for this doc + 1
        }
    }
    return 0;
}
When docid2vid_[docid] != nullptr, is the position the vector ID is written to wrong? Thanks.
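The indexing concern above can be checked in isolation. The following is a minimal sketch, not gamma's code: it models the per-document list as a `std::vector<int>` whose slot 0 holds the count, and compares the index used in the quoted `Add` with the index proposed in the question. The names `NewVidList`, `AppendAsQuoted`, `AppendAsProposed` and the capacity of 10 (taken from the "at most 10" comment) are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Assumed capacity, based on the "at most 10 vector fields" comment.
constexpr int kMaxVectorNumPerDoc = 10;

// Builds a list the way the nullptr branch does:
// slot 0 = count (1), slot 1 = first vector id.
inline std::vector<int> NewVidList(int first_vid) {
  std::vector<int> list(kMaxVectorNumPerDoc + 1, 0);
  list[0] = 1;
  list[1] = first_vid;
  return list;
}

// Appends with the index from the quoted code: list[list[0]] = vid.
inline bool AppendAsQuoted(std::vector<int> &list, int vid) {
  assert(static_cast<int>(list.size()) == kMaxVectorNumPerDoc + 1);
  if (list[0] + 1 > kMaxVectorNumPerDoc) return false;
  list[list[0]] = vid;  // with count == 1 this writes slot 1 again
  list[0]++;
  return true;
}

// Appends with the index proposed in the question: list[list[0] + 1] = vid.
inline bool AppendAsProposed(std::vector<int> &list, int vid) {
  assert(static_cast<int>(list.size()) == kMaxVectorNumPerDoc + 1);
  if (list[0] + 1 > kMaxVectorNumPerDoc) return false;
  list[list[0] + 1] = vid;  // first free slot after the existing ids
  list[0]++;
  return true;
}
```

Starting from `NewVidList(7)` and appending 8, the quoted index leaves slot 1 holding 8 and slot 2 untouched, while the count says 2; the proposed index leaves 7 in slot 1 and 8 in slot 2. Under this model the quoted index overwrites the most recently stored ID on every append after the first.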
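1. The C++ interface only has createTable and no operation for deleting a table. How should a table be deleted?
2. In the GetDocByID and GetDocByDocID interfaces the ID is a string, but internally it is actually an int. What is the difference between docID and id? Also, when calling with id=1:
string docid = "1";
GetDocByID(opt.engine, docid.c_str(), docid.size(), &doc_str, &doc_str_len);
no doc is found.
3. When the index type is set to inverted-file product quantization (IVFPQ) and index_size is set to 1000, then after 10000 records have been fully written to the table, every query returns error code -2, indicating that the index has not been trained. When a search is forced, it also frequently returns no results.
When adding a doc, gamma_engine calls the Add() and AddTostore() methods of profile and vec_manager respectively to get the data into the index, but vec_manager seems to involve only raw_vector's add method, and I do not see any gammaIndex operations. How does the engine's workflow synchronize data into the gammaIndex of the corresponding type?
Can flatbuffers be upgraded directly to 2.0 in v3.2.7? After modifying idl/build.sh to upgrade the flatbuffers version to 2.0, compiling the generated gamma_api reports an error: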
engine/idl/fbs-gen/go/gamma_api/FieldInfo.go:8:3: invalid import path: ""
I hope an overall design document for gamma can be provided.
https://github.com/vearch/gamma/blob/master/tests/test_files.cc#L310
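The test file always takes this branch. Is this caused by wrong request parameters, or by something else?
Following the build instructions and build scripts of vearch and gamma, when the build reaches the step that compiles scann-1.2.1 with bazel, the following errors appear, all pointing at com_google_protobuf: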
ERROR: /root/.cache/bazel/_bazel_root/7125299c24cb2d7bde318ecca9ed5091/external/com_google_protobuf/BUILD:979:21: in proto_lang_toolchain rule @com_google_protobuf//:cc_toolchain: '@com_google_protobuf//:cc_toolchain' does not have mandatory provider 'ProtoInfo'.
ERROR: Analysis of target '//:build_pip_pkg' failed; build aborted: Analysis of target '@com_google_protobuf//:cc_toolchain' failed
INFO: Elapsed time: 0.270s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 0 targets configured)
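I tried the approach from this article (https://zhuanlan.zhihu.com/p/488199658):
# If the bazel version is too new, the following error occurs: in proto_lang_toolchain rule @com_google_protobuf//:cc_toolchain: '@com_google_protobuf//:cc_toolchain' does not have mandatory provider 'ProtoInfo'. It can be resolved by adding the following flag: bazel build ... --incompatible_blacklisted_protos_requires_proto_info=false
This still does not fix the build. Could the problem be the version of https://github.com/bazelbuild/rules_cc in WORKSPACE, or the GitHub tag used for com_google_protobuf? Some articles also suggest changing these. I tried several versions of both, and the bazel build still fails with ERROR.
Environment: CentOS 7 (MacOS Big Sur + 2.7 GHz dual-core Intel Core i5 + Parallels Desktop) + bazel 4 + gcc 9.3.1 + clang 8
Will gamma be officially upgraded to follow newer scann releases? If I need to upgrade the scann version myself, what should I watch out for when replacing the code?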
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4bd006]
goroutine 1 [running]:
golang.org/x/exp/mmap.(*ReaderAt).ReadAt(0x0, 0xc000a8a000, 0x800, 0x800, 0x0, 0xc000a88000, 0x4, 0x8)
/root/GOPATH/src/golang.org/x/exp/mmap/mmap_unix.go:66 +0x26
main.BatchAddDocToEngine(0x186a0)
/root/GOPATH/gamma/go/examples/test.go:222 +0x675
main.Add()
/root/GOPATH/gamma/go/examples/test.go:279 +0x5dc
main.main()
/root/GOPATH/gamma/go/examples/test.go:347 +0xf2
An error occurs at the make stage; tracing it back shows that cmake reports errors.
Output from the final stage:
[ 93%] Built target flatbuffers
CMakeFiles/flatsamplebfbs.dir/build.make:163: recipe for target 'CMakeFiles/flatsamplebfbs.dir/samples/sample_bfbs.cpp.o' failed
make[2]: *** [CMakeFiles/flatsamplebfbs.dir/samples/sample_bfbs.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:214: recipe for target 'CMakeFiles/flatsamplebfbs.dir/all' failed
make[1]: *** [CMakeFiles/flatsamplebfbs.dir/all] Error 2
[ 95%] Linking CXX executable flatsampletext
[ 96%] Linking CXX executable flattests
[ 96%] Built target flatsampletext
[ 96%] Built target flattests
Makefile:181: recipe for target 'all' failed
make: *** [all] Error 2
-- Configuring done
-- Generating done
-- Build files have been written to: /data/server/go/src/github.com/vearch/vearch/engine/build
Detailed error output from the middle of the build:
[ 84%] Building CXX object CMakeFiles/flattests.dir/src/idl_gen_fbs.cpp.o
/data/server/go/src/github.com/vearch/vearch/engine/third_party/flatbuffers-1.11.0/samples/sample_bfbs.cpp:23:25: error: ‘Sample’ is not a namespace-name
 using namespace MyGame::Sample;
 ^
/data/server/go/src/github.com/vearch/vearch/engine/third_party/flatbuffers-1.11.0/samples/sample_bfbs.cpp:23:31: error: expected namespace-name before ‘;’ token
 using namespace MyGame::Sample;
 ^
CMakeFiles/flatsamplebinary.dir/build.make:85: recipe for target 'CMakeFiles/flatsamplebinary.dir/samples/sample_binary.cpp.o' failed
make[2]: *** [CMakeFiles/flatsamplebinary.dir/samples/sample_binary.cpp.o] Error 1
CMakeFiles/Makefile2:187: recipe for target 'CMakeFiles/flatsamplebinary.dir/all' failed
make[1]: *** [CMakeFiles/flatsamplebinary.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
Scanning dependencies of target flatsampletext
[ 85%] Building CXX object CMakeFiles/flatsampletext.dir/src/reflection.cpp.o
/data/server/go/src/github.com/vearch/vearch/engine/third_party/flatbuffers-1.11.0/samples/sample_binary.cpp:19:17: error: ‘MyGame’ has not been declared
 using namespace MyGame::Sample;
 ^
/data/server/go/src/github.com/vearch/vearch/engine/third_party/flatbuffers-1.11.0/samples/sample_binary.cpp:19:25: error: ‘Sample’ is not a namespace-name
 using namespace MyGame::Sample;
 ^
/data/server/go/src/github.com/vearch/vearch/engine/third_party/flatbuffers-1.11.0/samples/sample_binary.cpp:19:31: error: expected namespace-name before ‘;’ token
 using namespace MyGame::Sample;
ERROR: cannot verify github.com's certificate, issued by ‘xxx Web Secure Internet Gateway CA’:
Unable to locally verify the issuer's authority.
To connect to github.com insecurely, use `--no-check-certificate'.
tar: v1.11.0.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
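These two classes contain a large amount of code and many internal classes, such as QueryTables, KnnSearchResults, IVFPQScannerT, GammaInvertedListScanner, GammaIVFPQScanner, and GammaIVFFlatScanner. I cannot quite work out how these classes cooperate to implement index training, insert/delete/update/query operations, and persistence. Could you give a rough introduction to the design approach and technical roadmap?
Hello, I would like to use vearch for coarse ranking, with fairly high requirements on both recall and response time.
A sample request looks like this: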
curl -H "content-type: application/json" -XPOST -d'
{
"query": {
"sum": [
{
"field": "field_name",
"feature": [
0.1,
0.2,
0.3,
0.4,
0.5
],
"min_score": 0.9,
"boost": 0.5
}
],
"ids": [
"id1",
"id2","xxxx"
]
}
}
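There are roughly 10,000 ids. May I ask:
I saw an earlier QA that seemed to describe the process as doing vector recall first and applying term filtering during recall. If that is the case, does using flat turn into a full traversal?
QA link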
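To make the concern concrete, here is a minimal sketch of what "filter during a flat scan" would mean. It is an illustration of the question only, not vearch's or gamma's implementation: the function name `FlatScanWithIdFilter`, the `Hit` type, the use of row index as the id, and the inner-product score are all assumptions. The point it shows is that when the allow-list is consulted inside the scan loop, every stored row is still visited, and the filter only saves the scoring work.

```cpp
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical result type: (row index, inner-product score).
using Hit = std::pair<int, float>;

// Brute-force scan over all stored vectors. The allow-list is checked
// while scanning, so rows_visited always equals vectors.size().
inline std::vector<Hit> FlatScanWithIdFilter(
    const std::vector<std::vector<float>> &vectors,
    const std::vector<float> &query,
    const std::unordered_set<int> &allowed_ids,
    float min_score,
    std::size_t *rows_visited) {
  std::vector<Hit> hits;
  *rows_visited = 0;
  for (std::size_t i = 0; i < vectors.size(); ++i) {
    ++*rows_visited;
    // Skip scoring for rows that are not in the requested id set.
    if (allowed_ids.count(static_cast<int>(i)) == 0) continue;
    float score = 0.0f;
    for (std::size_t d = 0; d < query.size() && d < vectors[i].size(); ++d) {
      score += vectors[i][d] * query[d];
    }
    if (score >= min_score) hits.emplace_back(static_cast<int>(i), score);
  }
  return hits;
}
```

If the engine could instead look up the requested ids directly and score only those rows, the cost would scale with the number of ids rather than with the table size; whether vearch does this is exactly what the question asks.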
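Compiling directly on the ARM64 platform failed. What needs to be modified for the build to succeed? Thanks.
Running build.sh to build vearch fails when it gets to gamma, with the error: fatal error: batch_result_generated.h: No such file or directory.
More error output follows: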
[ 2%] Building C object CMakeFiles/gamma.dir/third_party/btree/threadskv10h.c.o
[ 4%] Building C object CMakeFiles/gamma.dir/third_party/btree/threadskv8.c.o
[ 6%] Building C object CMakeFiles/gamma.dir/third_party/cjson/cJSON.c.o
[ 8%] Building CXX object CMakeFiles/gamma.dir/third_party/easyloggingpp/easylogging++.cc.o
[ 10%] Building CXX object CMakeFiles/gamma.dir/util/bitmap.cc.o
[ 12%] Building CXX object CMakeFiles/gamma.dir/util/utils.cc.o
[ 14%] Building CXX object CMakeFiles/gamma.dir/search/gamma_engine.cc.o
In file included from /home/softsz/vearch/vearch/engine/search/gamma_engine.h:13:0,
from /home/softsz/vearch/vearch/engine/search/gamma_engine.cc:8:
/home/softsz/vearch/vearch/engine/c_api/api_data/gamma_batch_result.h:12:36: fatal error: batch_result_generated.h: No such file or directory
#include "batch_result_generated.h"
^
compilation terminated.
make[2]: *** [CMakeFiles/gamma.dir/search/gamma_engine.cc.o] Error 1
make[1]: *** [CMakeFiles/gamma.dir/all] Error 2
make: *** [all] Error 2
I tried searching GitHub for this file, batch_result_generated.h, but could not find it. Am I missing some dependency library? How should this problem be solved?