vearch / vearch-python



Python SDK for Vearch

License: Other

Shell 2.80% Python 79.76% SWIG 17.43%

real-time-index vector-search python-sdk local-development deep-learning

vearch-python's Introduction


Overview

Vearch is a cloud-native distributed vector database for efficient similarity search of embedding vectors in your AI applications.

Key features

Hybrid search: Both vector search and scalar filtering.
Performance: Fast vector retrieval: search millions of objects in milliseconds.
Scalability & Reliability: Replication and elastic scale-out.
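To make "hybrid search" concrete, here is a toy brute-force sketch in plain Python: it first applies a scalar filter, then ranks the surviving documents by L2 distance to the query vector. The document layout (`id`, `embedding`, `color` fields) and the function name `hybrid_search` are illustrative assumptions; this shows the concept only, not how Vearch's engine implements it.

```python
import math
from typing import Any, Callable, Dict, List, Sequence, Tuple

Doc = Dict[str, Any]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance between two equal-length vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hybrid_search(
    docs: List[Doc],
    query: Sequence[float],
    scalar_filter: Callable[[Doc], bool],
    topn: int,
) -> List[Tuple[Any, float]]:
    """Return (id, distance) pairs for the topn closest docs that pass the filter."""
    candidates = [d for d in docs if scalar_filter(d)]  # scalar filtering step
    scored = [(d["id"], l2_distance(d["embedding"], query)) for d in candidates]
    scored.sort(key=lambda pair: pair[1])  # smaller L2 distance = more similar
    return scored[:topn]


docs = [
    {"id": 1, "embedding": [0.0, 0.0], "color": "red"},
    {"id": 2, "embedding": [1.0, 1.0], "color": "blue"},
    {"id": 3, "embedding": [0.1, 0.0], "color": "blue"},
]
print(hybrid_search(docs, [0.0, 0.0], lambda d: d["color"] == "blue", topn=1))
```

Document 1 is the closest vector overall, but the filter excludes it, so the best blue match (document 3) is returned instead.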

Documentation

RESTful APIs

Tutorial | Reference documentation

OpenAPIs

Tutorial

SDK

Use cases

Use Vearch as a memory backend

Real-world demos

VisualSearch: Vearch can be used to build a complete visual search system that indexes billions of images. This also requires the image retrieval plugin for object detection and feature extraction.

Quick start

Deploy vearch cluster on k8s

Add charts from the Helm repository

$ helm repo add vearch https://vearch.github.io/vearch-helm
$ helm repo update && helm install my-release vearch/vearch

Add charts from a local clone

$ git clone https://github.com/vearch/vearch-helm.git && cd vearch-helm
$ helm install my-release ./charts -f ./charts/values.yaml

Start with docker-compose

standalone mode

$ cd cloud
$ cp ../config/config.toml .
$ docker-compose --profile standalone up -d

cluster mode

$ cd cloud
$ cp ../config/config_cluster.toml .
$ docker-compose --profile cluster up -d

Deploy with Docker: to get started quickly with the Vearch Docker image, see DeployByDocker.

Compile from source: to build Vearch from source code, see SourceCompileDeployment.

Components

Vearch Architecture

Master: Responsible for schema management, cluster-level metadata, and resource coordination.

Router: Provides the RESTful API (upsert, delete, search, and query), routes requests, and merges results.

PartitionServer (PS): Hosts document partitions with Raft-based replication. Its core vector search engine, Gamma, is built on Faiss and handles storing, indexing, and retrieving both vectors and scalars.
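The Router's result merging can be pictured as a scatter-gather step: each partition returns its own top hits sorted by score, and the router keeps the global top-n. The sketch below assumes that higher scores are better and that every partition's list is already sorted in descending order; the function name and data shapes are hypothetical and do not reflect Vearch's actual Router code.

```python
import heapq
import itertools
from typing import Iterable, List, Tuple

Hit = Tuple[str, float]  # (document id, score); higher score = better match


def merge_partition_results(partitions: Iterable[List[Hit]], topn: int) -> List[Hit]:
    """Merge per-partition hit lists (each sorted by descending score) into a global top-n."""
    # heapq.merge lazily interleaves already-sorted inputs; reverse=True
    # tells it the inputs are in descending order.
    merged = heapq.merge(*partitions, key=lambda hit: hit[1], reverse=True)
    return list(itertools.islice(merged, topn))


ps1 = [("a", 0.95), ("b", 0.70), ("c", 0.10)]
ps2 = [("d", 0.90), ("e", 0.80)]
print(merge_partition_results([ps1, ps2], topn=3))
```

Because each partition's list is already sorted, the merge only needs to look at list heads rather than re-sorting every hit.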

Reference

If you use Vearch in a research paper, please cite:

@misc{li2019design,
      title={The Design and Implementation of a Real Time Visual Search System on JD E-commerce Platform},
      author={Jie Li and Haifeng Liu and Chuanghua Gui and Jianyu Chen and Zhenyun Ni and Ning Wang},
      year={2019},
      eprint={1908.07389},
      archivePrefix={arXiv},
      primaryClass={cs.IR}
}

Community

You can report bugs or ask questions on the repository's issues page.

For public discussion of Vearch or for questions, you can also send an email to [email protected].

Our Slack: https://vearchwrokspace.slack.com

Known Users

Companies are welcome to register their names in issue #230 (listed in order of registration).

License

Licensed under the Apache License, Version 2.0. For details, see LICENSE and NOTICE.

vearch-python's People

Contributors

blake86 borissimkin ownbylichaobao gyd-a frederic89

vearch-python's Issues

Dumping and loading: no documents

I load some random vectors into the engine and then run some dummy searches, which work fine.

However, when I dump and then load the engine, it reports zero documents, and queries return no results.

Generating the dump:

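import numpy as np
import vearch

# dimension and max_doc_size are defined elsewhere in the reporter's script
def generate_vector(n=520):
    # One document per row: an integer id plus a random float32 embedding
    features = np.random.rand(n, dimension).astype('float32')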
    doc_items = []
    print(n)
    for i in range(n):
        profiles = {}
        profiles["id"] = i
        profiles["embedding"] = features[i,:].tolist()
        doc_items.append(profiles)
    return doc_items

engine = vearch.Engine("dummy_data", max_doc_size)
engine.init_log_dir("dummy_logs")
table = {
    "name": "test_table",
    "index_size":10000,
    "model": {
        "name": "IVFPQ",
        "nprobe": -1,
        "metric_type": "L2",
        "ncentroids": -1,
        "nsubvector": -1
    },
    "properties": {
        "id": {
            "type": "integer",
            "index": "true"
        },
        "embedding": {
            "index": "true",
            "type": "vector",
            "dimension": dimension,
            "store_type": "Mmap",
            "store_param": {"cache_size": 2000}
        },
    },
}
engine.create_table(table)
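doc_items = generate_vector(n=9984)
# As posted, doc_items is never inserted into the engine before build_index()/dump();
# if the full script also skips the insert step, the dump holds no documents.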
engine.build_index()
engine.dump()

Loading:

engine2 = vearch.Engine("dummy_data", max_doc_size)
engine2.init_log_dir("dummy_logs")
engine2.load()
total_num = engine2.get_doc_num()
print("total docs:", total_num)  # prints 0
# Searching for any vector also returns empty results.

The .whl package downloaded from PyPI cannot be installed

pip install vearch-0.3.1.6-cp36-cp36m-manylinux2010_x86_64.whl
OS: Ubuntu 16.04, Python version: 3.6.5
Error message:
vearch-0.3.1.6-cp36-cp36m-manylinux2010_x86_64.whl is not a supported wheel on this platform.
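The error comes from pip comparing the tags encoded in the wheel filename against what it believes the current platform supports. The sketch below splits a wheel filename into its PEP 427 components so the interpreter tag can be checked against the running Python; the helper names are illustrative, not part of pip or vearch.

```python
import sys
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 components.

    Format: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 parts when an optional build tag is present
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelTags(parts[0], parts[1], parts[-3], parts[-2], parts[-1])


def current_cpython_tag() -> str:
    # e.g. "cp36" when running under CPython 3.6
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


tags = parse_wheel_filename("vearch-0.3.1.6-cp36-cp36m-manylinux2010_x86_64.whl")
print(tags)
print("interpreter tag matches:", tags.python == current_cpython_tag())
```

On Python 3.6.5 the cp36 tag matches, which points at the platform tag instead: pip only recognizes `manylinux2010` from version 19.0 onward (PEP 571), so an older pip, such as one installed from Ubuntu 16.04's packages, may reject this wheel. Upgrading pip is a likely first thing to try.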

pip install vearch fails on Mac

Cannot install on Mac:

No matching distribution found for vearch

How to set nprobe in query?

I can only set nprobe when creating the table. How can I set it at search time?
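One possible approach, untested against this SDK: the score-range issue at the end of this page passes a per-query `retrieval_param` dict (containing `metric_type`). Assuming the IVFPQ model also reads `nprobe` from that dict, a search-time value could be supplied as below. `build_ivfpq_query` is a hypothetical helper, and whether the `nprobe` key is honored depends on your Vearch version.

```python
from typing import Any, Dict, List


def build_ivfpq_query(
    field: str,
    features: List[float],
    topn: int,
    nprobe: int,
    metric_type: str = "L2",
) -> Dict[str, Any]:
    """Build a query dict in the shape used by the issues on this page.

    Assumption: the engine reads `nprobe` from `retrieval_param` at search
    time, overriding the table-level value set at creation.
    """
    query: Dict[str, Any] = {}
    query["vector"] = [{"field": field, "feature": features}]
    query["topn"] = topn
    query["retrieval_param"] = {"metric_type": metric_type, "nprobe": nprobe}
    return query


q = build_ivfpq_query("embedding", [0.1, 0.2, 0.3], topn=10, nprobe=20)
# Hypothetical call, assuming an engine created as in the first issue above:
# results = engine.search(q)
```

Keeping nprobe per query lets you trade recall for latency on individual searches without rebuilding the table.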

Do flatbuffers versions differ?

I downloaded the python3.2.5 demo and it failed when run. The error ultimately came from the arguments passed to builder.EndVector in flatbuffers; I am using flatbuffers==2.0.
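A plausible cause, offered as an assumption rather than a confirmed diagnosis: in flatbuffers 2.0 the Python `Builder.EndVector` no longer takes the element count that 1.x-era generated code passes to it. Pinning an older release (for example `pip install "flatbuffers<2"`) is the simplest workaround. Alternatively, a small hypothetical shim can pick the right call by inspecting the method's signature; the stand-in builder classes below only mimic the two signatures and are not real flatbuffers types.

```python
import inspect
from typing import Any


def end_vector(builder: Any, num_elems: int) -> Any:
    """Call builder.EndVector with or without the element count.

    Assumption: older flatbuffers releases define EndVector(vectorNumElems),
    while 2.0 defines EndVector() with no parameters.
    """
    # For a bound method, signature() already excludes `self`.
    params = inspect.signature(builder.EndVector).parameters
    if not params:
        return builder.EndVector()
    return builder.EndVector(num_elems)


# Stand-in classes mimicking the two signatures (not real flatbuffers types)
class OldStyleBuilder:
    def EndVector(self, vectorNumElems):
        return ("old", vectorNumElems)


class NewStyleBuilder:
    def EndVector(self):
        return ("new",)


print(end_vector(OldStyleBuilder(), 3))
print(end_vector(NewStyleBuilder(), 3))
```

With this shim, each `builder.EndVector(n)` call in the demo would become `end_vector(builder, n)`.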

Error on import vearch

After installing vearch, importing it raises an error.
OS: CentOS 7.8
Python versions: 3.7.0 and 3.8.6
What might be the cause?

Error when running in single query mode

Calling script:

Error details:

What is causing this?

Scores in query results are not within the 0 to 1 range

Testing with the Python SDK, the table space index type is:
"engine" : {
"index_size": 100000,
"retrieval_type": "FLAT"
}
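After inserting data, I ran a query (specifying the query vector, a filter condition, brute-force search, and the number of results to return):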
query = {}
query["vector"] = [{"field": "feature", "feature": features}]
query["filter"] = [{"term": {"row_info": ['20212', '20213'], "operator": "or"}}]
query["topn"] = 4
query["is_brute_search"] = 1
query["retrieval_param"] = {"metric_type": "InnerProduct"}
The scores in the results are not within 0 to 1, and some are even negative.
Doesn't the score represent cosine similarity?
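With `metric_type` set to `InnerProduct`, the score is presumably the raw dot product of the query and stored vectors. A dot product equals cosine similarity only when both vectors have unit L2 norm, and even cosine similarity ranges over [-1, 1], so negative scores are expected for vectors pointing in opposing directions. The plain-Python sketch below illustrates the difference; normalizing vectors before both insertion and querying is one common way to make inner-product scores behave like cosine similarity.

```python
import math
from typing import List, Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: Sequence[float]) -> List[float]:
    # Scale v to unit length so that dot products become cosines
    norm = math.sqrt(inner_product(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    return inner_product(l2_normalize(a), l2_normalize(b))


a, b = [1.0, 2.0], [3.0, 4.0]
print(inner_product(a, b))                      # raw score, unbounded: 11.0
print(cosine_similarity(a, b))                  # always within [-1, 1]
print(inner_product([1.0, 0.0], [-1.0, 0.0]))   # opposite directions: -1.0
```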