gamma's Issues

Every raw_vector object has a VIDMgr object. What is this object for?

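I have two questions.
First, the logic of the code seems to be: if there are multiple vector fields that share the same field name, the raw_vector object for those vectors needs a VIDMgr object:
std::vector<int> vid2docid_; // vector id to doc id; the array index is the vectorId and the stored value is the docId
std::vector<int *> docid2vid_; // doc id to vector id list; one doc_id can map to multiple vectors????
Why would a single docid map to multiple vectorIds?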

Second, in the Add method:
int Add(int vid, int docid)
{
    // add to vid2docid_ and docid2vid_
    if (multi_vids_)
    {
        vid2docid_[vid] = docid;
        if (docid2vid_[docid] == nullptr)
        {
            docid2vid_[docid] = utils::NewArray(MAX_VECTOR_NUM_PER_DOC + 1, "init_vid_list");
            // total_mem_bytes += (MAX_VECTOR_NUM_PER_DOC + 1) * sizeof(int);
            docid2vid_[docid][0] = 1;   // slot 0 is set to 1: this doc has one vector
            docid2vid_[docid][1] = vid; // vectorIds are stored from slot 1 onward
        }
        else
        {
            int *vid_list = docid2vid_[docid];
            // a table has at most 10 vector fields that also share the same name
            if (vid_list[0] + 1 > MAX_VECTOR_NUM_PER_DOC)
            {
                return -1;
            }
            vid_list[vid_list[0]] = vid; // Isn't this wrong? Should it be vid_list[vid_list[0]+1]=vid, so the vectorId goes into the next slot of the array?
            vid_list[0]++;               // this doc's vector count + 1
        }
    }
    return 0;
}

When docid2vid_[docid] != nullptr, is the vectorId written to the wrong position? Thanks.
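To make the indexing question concrete, here is a minimal sketch of a count-prefixed list with the layout the quoted snippet appears to use: slot 0 holds the number of stored ids and slots 1..count hold the ids. It is not gamma's implementation. VidList and kMaxVectorNumPerDoc are hypothetical names, and the limit of 10 is taken only from the comment in the quoted code. With this layout the next free slot is count + 1, whereas writing to slot count would overwrite the last stored id.

```cpp
#include <vector>

// Hypothetical stand-in for MAX_VECTOR_NUM_PER_DOC in the quoted snippet.
constexpr int kMaxVectorNumPerDoc = 10;

// Count-prefixed list: slots_[0] is the number of stored vids,
// slots_[1..count] are the vids themselves.
class VidList {
 public:
  VidList() : slots_(kMaxVectorNumPerDoc + 1, 0) {}

  // Appends vid. Returns 0 on success, -1 when the list is already full.
  int Append(int vid) {
    const int count = slots_[0];
    if (count + 1 > kMaxVectorNumPerDoc) return -1;
    slots_[count + 1] = vid;  // next free slot is one past the last used slot
    slots_[0] = count + 1;    // update the stored count
    return 0;
  }

  int Count() const { return slots_[0]; }

  // Returns the i-th stored vid (0-based); i must be in [0, Count()).
  int At(int i) const { return slots_[i + 1]; }

 private:
  std::vector<int> slots_;
};
```

Appending 7 and then 9 to an empty VidList leaves Count() at 2 with both ids readable through At(0) and At(1); an eleventh Append returns -1.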

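Waited a month and still no reply

1. The C++ interface only has createTable and no operation for deleting a table. How should a table be deleted?
2. In the GetDocByID and GetDocByDocID interfaces the ID is a string, but internally it is an int. What is the difference between docID and id? Also, when I call it with id=1:
string docid = "1";
GetDocByID(opt.engine, docid.c_str(), docid.size(), &doc_str, &doc_str_len);
no doc is found.
3. With the index type set to inverted-file product quantization (IVFPQ) and index_size set to 1000, after 10000 records have been fully written to the table, every query returns error code -2, indicating that the index has not been trained. When the search is forced, it also often returns no results.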

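Code comments and documentation are sparse; a question

When a doc is added, gamma_engine calls the Add() and AddTostore() methods of profile and vec_manager respectively to put the data into the index. However, vec_manager seems to involve only the add method of raw_vector, and I do not see any gammaIndex operations. How does the engine's workflow sync the data into the gammaIndex of the corresponding type?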

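Question about upgrading the flatbuffer version

Can flatbuffer be upgraded directly to 2.0 on v3.2.7? After changing idl/build.sh to upgrade flatbuffer to 2.0, compiling the generated gamma_api reports this error: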
engine/idl/fbs-gen/go/gamma_api/FieldInfo.go:8:3: invalid import path: ""

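The interfaces lack examples and documentation

1. The C++ interface only has createTable and no operation for deleting a table. How should a table be deleted?
2. In the GetDocByID and GetDocByDocID interfaces the ID is a string, but internally it is an int. What is the difference between docID and id? Also, when I call it with id=1:
string docid = "1";
GetDocByID(opt.engine, docid.c_str(), docid.size(), &doc_str, &doc_str_len);
no doc is found.
3. With the index type set to inverted-file product quantization (IVFPQ) and index_size set to 1000, after 10000 records have been fully written to the table, every query returns error code -2, indicating that the index has not been trained. When the search is forced, it also often returns no results.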

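Error when compiling scann-1.2.1, a third-party dependency of Gamma

Following the vearch and gamma build instructions and build scripts, the bazel build of scann-1.2.1 fails with the errors below, all of which point to com_google_protobuf: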

ERROR: /root/.cache/bazel/_bazel_root/7125299c24cb2d7bde318ecca9ed5091/external/com_google_protobuf/BUILD:979:21: in proto_lang_toolchain rule @com_google_protobuf//:cc_toolchain: '@com_google_protobuf//:cc_toolchain' does not have mandatory provider 'ProtoInfo'.
ERROR: Analysis of target '//:build_pip_pkg' failed; build aborted: Analysis of target '@com_google_protobuf//:cc_toolchain' failed
INFO: Elapsed time: 0.270s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 0 targets configured)

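I tried the approach from this article (https://zhuanlan.zhihu.com/p/488199658):
# If the bazel version is too new, the following error occurs: in proto_lang_toolchain rule @com_google_protobuf//:cc_toolchain: '@com_google_protobuf//:cc_toolchain' does not have mandatory provider 'ProtoInfo'. It can be resolved by adding this flag: bazel build ... --incompatible_blacklisted_protos_requires_proto_info=false
That did not fix the build either. Could the problem be the version of https://github.com/bazelbuild/rules_cc in WORKSPACE, or the GitHub tag used for com_google_protobuf? Some articles also suggest changing similar entries. I tried several versions of both, and the bazel build still fails with ERROR.

Environment: CentOS 7 (macOS Big Sur + 2.7 GHz dual-core Intel Core i5 + Parallels Desktop) + bazel 4 + gcc 9.3.1 + clang 8

  1. gcc 9 was installed through devtoolset-9; the environment is activated with source /opt/rh/devtoolset-9/enable
  2. clang: llvm 8 was compiled and installed from source, PATH: /usr/local/clang/bin
  3. Bazel 4 was installed with: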
    wget https://copr.fedorainfracloud.org/coprs/vbatts/bazel/repo/epel-7/vbatts-bazel-epel-7.rep --no-check-certificate
    mv vbatts-bazel-epel-7.rep /etc/yum.repos.d/
    cd /etc/yum.repos.d
    mv vbatts-bazel-epel-7.rep vbatts-bazel-epel-7.repo
    yum install -y bazel4

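Question about upgrading the third-party library scann

Will gamma be officially upgraded to follow newer scann releases? If I need to upgrade the scann version myself, what should I watch out for when replacing the code?

What causes this mmap error?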

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4bd006]

goroutine 1 [running]:
golang.org/x/exp/mmap.(*ReaderAt).ReadAt(0x0, 0xc000a8a000, 0x800, 0x800, 0x0, 0xc000a88000, 0x4, 0x8)
/root/GOPATH/src/golang.org/x/exp/mmap/mmap_unix.go:66 +0x26
main.BatchAddDocToEngine(0x186a0)
/root/GOPATH/gamma/go/examples/test.go:222 +0x675
main.Add()
/root/GOPATH/gamma/go/examples/test.go:279 +0x5dc
main.main()
/root/GOPATH/gamma/go/examples/test.go:347 +0xf2

gamma cmake err sample_bfbs.cpp.o

The make stage fails; tracing it back shows errors in the cmake build.

Output from the final stage:
[ 93%] Built target flatbuffers
CMakeFiles/flatsamplebfbs.dir/build.make:163: recipe for target 'CMakeFiles/flatsamplebfbs.dir/samples/sample_bfbs.cpp.o' failed
make[2]: *** [CMakeFiles/flatsamplebfbs.dir/samples/sample_bfbs.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:214: recipe for target 'CMakeFiles/flatsamplebfbs.dir/all' failed
make[1]: *** [CMakeFiles/flatsamplebfbs.dir/all] Error 2
[ 95%] Linking CXX executable flatsampletext
[ 96%] Linking CXX executable flattests
[ 96%] Built target flatsampletext
[ 96%] Built target flattests
Makefile:181: recipe for target 'all' failed
make: *** [all] Error 2
-- Configuring done
-- Generating done
-- Build files have been written to: /data/server/go/src/github.com/vearch/vearch/engine/build

Detailed error output from the middle of the build:
[ 84%] Building CXX object CMakeFiles/flattests.dir/src/idl_gen_fbs.cpp.o
/data/server/go/src/github.com/vearch/vearch/engine/third_party/flatbuffers-1.11.0/samples/sample_bfbs.cpp:23:25: error: ‘Sample’ is not a namespace-name
using namespace MyGame::Sample;
^
/data/server/go/src/github.com/vearch/vearch/engine/third_party/flatbuffers-1.11.0/samples/sample_bfbs.cpp:23:31: error: expected namespace-name before ‘;’ token
using namespace MyGame::Sample;
^
CMakeFiles/flatsamplebinary.dir/build.make:85: recipe for target 'CMakeFiles/flatsamplebinary.dir/samples/sample_binary.cpp.o' failed
make[2]: *** [CMakeFiles/flatsamplebinary.dir/samples/sample_binary.cpp.o] Error 1
CMakeFiles/Makefile2:187: recipe for target 'CMakeFiles/flatsamplebinary.dir/all' failed
make[1]: *** [CMakeFiles/flatsamplebinary.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
Scanning dependencies of target flatsampletext
[ 85%] Building CXX object CMakeFiles/flatsampletext.dir/src/reflection.cpp.o

/data/server/go/src/github.com/vearch/vearch/engine/third_party/flatbuffers-1.11.0/samples/sample_binary.cpp:19:17: error: ‘MyGame’ has not been declared
using namespace MyGame::Sample;
^
/data/server/go/src/github.com/vearch/vearch/engine/third_party/flatbuffers-1.11.0/samples/sample_binary.cpp:19:25: error: ‘Sample’ is not a namespace-name
using namespace MyGame::Sample;
^
/data/server/go/src/github.com/vearch/vearch/engine/third_party/flatbuffers-1.11.0/samples/sample_binary.cpp:19:31: error: expected namespace-name before ‘;’ token
using namespace MyGame::Sample;

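Questions about test_files.cc

1. The C++ interface only has createTable and no operation for deleting a table. How should a table be deleted?
2. In the GetDocByID and GetDocByDocID interfaces the ID is a string, but internally it is an int. What is the difference between docID and id? Also, when I call it with id=1:
string docid = "1";
GetDocByID(opt.engine, docid.c_str(), docid.size(), &doc_str, &doc_str_len);
no doc is found.
3. With the index type set to inverted-file product quantization (IVFPQ) and index_size set to 1000, after 10000 records have been fully written to the table, every query returns error code -2, indicating that the index has not been trained. When the search is forced, it also often returns no results.

idl/build.sh fails when downloading flatbuffers through a proxy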

ERROR: cannot verify github.com's certificate, issued by ‘xxx Web Secure Internet Gateway CA’:
Unable to locally verify the issuer's authority.
To connect to github.com insecurely, use `--no-check-certificate'.
tar: v1.11.0.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now

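Does a query with ids filter first?

Hello, I would like to use vearch for coarse ranking, with fairly high requirements on both recall and response time.
A sample request looks like this: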

curl -H "content-type: application/json" -XPOST -d'
{
  "query": {
    "sum": [
      {
        "field": "field_name",
        "feature": [
          0.1,
          0.2,
          0.3,
          0.4,
          0.5
        ],
        "min_score": 0.9,
        "boost": 0.5
      }
    ],
    "ids": [
      "id1",
      "id2","xxxx"
    ]
  }
}

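There are about 10,000 ids. My questions:

  1. Is it OK to query this way?
  2. Should the index type be flat?
  3. Does the query first fetch the docs by id and then compute similarity, or does it first recall by vector and then filter by id?

An earlier QA seems to describe the process as doing vector recall first and applying the term filter during recall. If that is the case, wouldn't using flat turn this into a full scan?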
QA link
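For reference, the "fetch by id first, then compute similarity" order from question 3 can be sketched as below. This is only an illustration under stated assumptions, not a description of how vearch or gamma executes the query: the in-memory store, the function names, and the use of inner product as the score are all hypothetical. The point it shows is that with this order the cost grows with the number of ids supplied rather than with the size of the whole collection.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Vec = std::vector<float>;
using Hit = std::pair<std::string, float>;  // (doc id, score)

// Inner product over the common prefix of the two vectors.
float InnerProduct(const Vec& a, const Vec& b) {
  float sum = 0.0f;
  const std::size_t n = std::min(a.size(), b.size());
  for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
  return sum;
}

// Filter-first scoring: only the docs named in `ids` are scored.
// Ids missing from the store are skipped; hits below min_score are dropped.
// The result is sorted by descending score.
std::vector<Hit> ScoreCandidates(
    const std::unordered_map<std::string, Vec>& store,  // hypothetical id -> vector store
    const std::vector<std::string>& ids, const Vec& query, float min_score) {
  std::vector<Hit> hits;
  for (const std::string& id : ids) {
    auto it = store.find(id);
    if (it == store.end()) continue;
    const float score = InnerProduct(query, it->second);
    if (score >= min_score) hits.emplace_back(id, score);
  }
  std::sort(hits.begin(), hits.end(),
            [](const Hit& a, const Hit& b) { return a.second > b.second; });
  return hits;
}
```

The opposite order (recall over the whole index, then discard results whose id is not in the list) scans or probes the full collection, and can return fewer than the requested number of results when most top hits fall outside the id list.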

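Nobody is handling the issues either

1. The C++ interface only has createTable and no operation for deleting a table. How should a table be deleted?
2. In the GetDocByID and GetDocByDocID interfaces the ID is a string, but internally it is an int. What is the difference between docID and id? Also, when I call it with id=1:
string docid = "1";
GetDocByID(opt.engine, docid.c_str(), docid.size(), &doc_str, &doc_str_len);
no doc is found.
3. With the index type set to inverted-file product quantization (IVFPQ) and index_size set to 1000, after 10000 records have been fully written to the table, every query returns error code -2, indicating that the index has not been trained. When the search is forced, it also often returns no results.

Compiling on the ARM64 platform

Compiling directly on an ARM64 platform fails for me. What needs to be changed for the build to succeed? Thanks.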

gamma build error: fatal error: batch_result_generated.h: No such file or directory

When running build.sh to build vearch, the build fails at the gamma stage with fatal error: batch_result_generated.h: No such file or directory.
More of the error output:
[ 2%] Building C object CMakeFiles/gamma.dir/third_party/btree/threadskv10h.c.o
[ 4%] Building C object CMakeFiles/gamma.dir/third_party/btree/threadskv8.c.o
[ 6%] Building C object CMakeFiles/gamma.dir/third_party/cjson/cJSON.c.o
[ 8%] Building CXX object CMakeFiles/gamma.dir/third_party/easyloggingpp/easylogging++.cc.o
[ 10%] Building CXX object CMakeFiles/gamma.dir/util/bitmap.cc.o
[ 12%] Building CXX object CMakeFiles/gamma.dir/util/utils.cc.o
[ 14%] Building CXX object CMakeFiles/gamma.dir/search/gamma_engine.cc.o
In file included from /home/softsz/vearch/vearch/engine/search/gamma_engine.h:13:0,
from /home/softsz/vearch/vearch/engine/search/gamma_engine.cc:8:
/home/softsz/vearch/vearch/engine/c_api/api_data/gamma_batch_result.h:12:36: fatal error: batch_result_generated.h: No such file or directory
#include "batch_result_generated.h"
^
compilation terminated.
make[2]: *** [CMakeFiles/gamma.dir/search/gamma_engine.cc.o] Error 1
make[1]: *** [CMakeFiles/gamma.dir/all] Error 2
make: *** [all] Error 2

I tried searching GitHub for the file batch_result_generated.h and could not find it. Am I missing some dependency library? How should this problem be solved?
