
milvus-tools's Issues

Migrating data from Milvus 1.0 to Milvus 2.0

Hi,
I am trying to migrate data from Milvus 1.0 to Milvus 2.0, but it fails with: Error with: name 'delids' is not defined
I have no idea what delids refers to.

Please help.

Milvus DM - save the auto id generation

Hello, here is my question:

If I migrate a collection from Milvus to HDF5 and then from HDF5 to another Milvus instance, automatic ID generation breaks. When I then try to insert a new vector into the migrated collection on the other Milvus, it produces the following error message:

Status(code=12, message='Entities IDs are user-defined. Please provide IDs for all entities of the collection.')
[]

I want to keep automatic ID generation on the migrated collection. Maybe I am doing something wrong?

milvusdm error

When I execute milvusdm --yaml M2M.yaml, I encounter an error:
2021-04-15 19:50:23,301 | ERROR | milvus_to_milvus.py | transform_milvus_data | 44 | Error with: cannot reshape array of size 350208 into shape (171,64)
My milvus version is 1.0.0.

Error with: local variable 'total_vectors' referenced before assignment

Migrating a collection with milvusdm.
Both the source and destination nodes run version 0.10.3 (the source is standalone, the destination is a cluster).

Error message: Error with: local variable 'total_vectors' referenced before assignment

The M2M configuration is as follows:

M2M:
  # The dest-milvus version.
  milvus_version: 0.10.3
  # Working directory of the source Milvus.
  source_milvus_path: '/data0/milvus'
  mysql_parameter:
    host: '172.18.248.189'
    user: 'root'
    port: 3306
    password: '123456'
    database: 'milvus'
  source_collection: # specify the 'partition_1' and 'partition_2' partitions of the 'test' collection.
    tidea_is_sample:
      - ''
  dest_host: '172.18.151.165'
  dest_port: 19531
  mode: 'skip' # 'skip/append/overwrite'

Error log:

2021-11-05 21:18:58,140 | DEBUG | read_milvus_meta.py | connect_mysql | 20 | Successfully connect mysql
2021-11-05 21:18:58,142 | INFO | milvus_to_milvus.py | transform_milvus_data | 38 | Ready to transform all data of collection: tidea_is_sample/partitions: ['']
2021-11-05 21:18:58,143 | DEBUG | read_milvus_meta.py | get_collection_info | 72 | Get collection info(dimension, index_file_size, metric_type, version):((512, 1073741824, 1, '0.10.3'),)
2021-11-05 21:18:58,147 | DEBUG | read_milvus_data.py | read_milvus_file | 89 | Reading milvus/db data from collection: tidea_is_sample/partition:
2021-11-05 21:18:58,148 | DEBUG | read_milvus_meta.py | get_collection_dim_type | 96 | Get meta data about dimension and types: ((512, 1),)
2021-11-05 21:18:58,148 | DEBUG | read_milvus_meta.py | get_collection_segments_rows | 109 | Get meta data about segment and rows: ()
2021-11-05 21:18:58,149 | ERROR | milvus_to_milvus.py | transform_milvus_data | 44 | Error with: local variable 'total_vectors' referenced before assignment

Can't find the installed app

my codes:

export MILVUSDM_PATH='/home/${MY_USER_NAME}/milvusdm'
export LOGS_NUM=0
pip3 install pymilvusdm

and then:

pymilvusdm
pymilvusdm: command not found

Anything wrong?

milvusdm can't work

OS: CentOS7.4,
Milvus old version: 0.10.3
Milvus new version: 1.1.1
We migrated from version 0.10.3 to version 1.1.1 with milvusdm:
yaml file:
M2M:
  milvus_version: 1.1.1
  source_milvus_path: '/data0/milvus'
  mysql_parameter:
  source_collection: # specify the 'partition_1' and 'partition_2' partitions of the 'test' collection
    intelligence_picture_v1:
  dest_host: '10.11.205.18'
  dest_port: 19530
  mode: 'skip' # 'skip/append/overwrite'

The error follows:
2022-05-06 16:53:26,929 | ERROR | grpc_handler.py | handler | 72 |
Addr [10.11.205.18:19530] fake_register_link
RPC error: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNIMPLEMENTED
details = ""
debug_error_string = "{"created":"@1651827206.928677178","description":"Error received from peer ipv4:10.11.205.18:19530","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"","grpc_status":12}"

    {'API start': '2022-05-06 16:53:26.927801', 'RPC start': '2022-05-06 16:53:26.928089', 'RPC error': '2022-05-06 16:53:26.928964'}

Support for milvus>=2.2

It seems that HDF5-to-Milvus does not support a custom schema. I have 3 columns: "embedding", "id", and "other".
It seems that the DM tool only imports the hardcoded groups "embeddings" and "ids".
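For reference, HDF5 itself has no such restriction: arbitrary named datasets can be written and read back with h5py, so a fork or wrapper of the tool could map extra columns explicitly instead of hardcoding two names. A minimal sketch, assuming h5py is installed (the dataset names "embeddings", "ids", and "other" here mirror this report, not the tool's actual schema):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.gettempdir(), "example.h5")

# Write three named datasets; "other" stands in for any extra column
# that the stock tool currently ignores.
with h5py.File(path, "w") as f:
    f.create_dataset("embeddings", data=np.random.rand(4, 8).astype(np.float32))
    f.create_dataset("ids", data=np.arange(4, dtype=np.int64))
    f.create_dataset("other", data=np.array([b"a", b"b", b"c", b"d"]))

# Read every dataset back by name instead of hardcoding two of them.
with h5py.File(path, "r") as f:
    columns = {name: f[name][()] for name in f.keys()}
```

Discovering datasets via `f.keys()` rather than fixed names is what a schema-aware import would need.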

Error when migrating data

Migrating a collection with milvusdm.
Destination: version 1.x
Source: version 0.10.x

I hit exceptions with all three versions of milvusdm; details follow:

Version 0.1:
2021-08-23 16:30:23,042 | ERROR | milvus_to_milvus.py | transform_milvus_data | 44 | Error with: cannot reshape array of size 357564416 into shape (43648,256)

Version 1.0:
2021-08-23 16:20:14,277 | ERROR | milvus_client.py | insert | 98 | The amount of data inserted each time cannot exceed 256 MB
0%| | 0/1 [00:09<?, ?it/s]

Version 2.0:
2021-08-23 16:16:19,198 | ERROR | grpc_handler.py | handler | 71 |
Addr [xx.xx.xx.xx:19530] (IP address redacted) fake_register_link
RPC error: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNIMPLEMENTED
details = ""
debug_error_string = "{"created":"@1629706579.197927267","description":"Error received from peer ipv4:xx.xx.xx.xx:19530","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"","grpc_status":12}"

F2M does not follow yaml to execute

The tool does not follow the yaml file's collection_parameter: dimension: 256 when the Faiss file has dimension 128. After creating the collection, the dimension is 128 instead of the 256 set in the yaml file.

Error with: name 'delids' is not defined

  • Encountered an issue when trying to migrate another dataset using DM tool 2.0 (New issues need to be resolved)
  • 2022-09-20 05:23:11,401 | ERROR | milvus_to_milvus.py | transform_milvus_data | 47 | Error with: name 'delids' is not defined

Data export from Milvus 2.x standalone

Hi,

I've tried to export Milvus data to HDF5 when running Milvus in a standalone setup with dockerized Milvus, etcd, and MinIO.

The yaml configuration looks like:

M2H:
  milvus_version: 2.0.0
  source_milvus_path: '<directory-where-milvus-volume-is-mapped>'
  mysql_parameter:
  source_collection:
    <my-collection-name>:
      - '_default'

  data_dir: '<data-directory-where-to-export>'

However, this fails with:
ERROR | read_milvus_meta.py | connect_sqlite | 31 | SQLite ERROR: connect failed with unable to open database file

So I wonder whether the tool can be used with a standalone Milvus deployment?

Thanks!

The type of go_benchmark response is not json: panic: nprobe not valid

How do I pass the search param for the IVF_FLAT index type with your benchmark code? Assume that I have successfully created an IVF_FLAT index on the dataset. Passing

search_parameters = {
        "anns_field": anns_field,
        "metric_type": metric_type,
        "param": {
            "nprobe": 32,
        },
        "limit": topk,
        "expression": expression,
    }

gives error:

Traceback (most recent call last):
  File "go_benchmark.py", line 167, in <module>
    go_search(go_benchmark=go_benchmark, uri=uri, user=user, password=password, collection_name=collection_name,
  File "go_benchmark.py", line 113, in go_search
    raise ValueError(msg)
ValueError: The type of go_benchmark response is not json: panic: nprobe not valid

Appreciate your help. Thanks.

Error using milvusdm

When I execute milvusdm --yaml M2H.yaml,
I got:

2021-04-23 14:28:21,740 | INFO | milvus_to_hdf5.py | read_milvus_data | 50 | Ready to read all data of collection: ann_1m_sq8/partitions: [None] 0%| | 0/1 [00:00<?, ?it/s] 2021-04-23 14:28:21,908 | ERROR | milvus_to_hdf5.py | read_milvus_data | 56 | Error with: cannot reshape array of size 307200000 into shape (600000,16)

Is there anything wrong with the data volume?

Milvus data not migrated correctly (SQLite)

1. Error:

  • Collection name, partitions, and indexes: none of them are migrated to the new host
  • Old host: (screenshot)
  • New host, after migration: (screenshot)

2. How to reproduce:

  • Run pymilvusdm==1.0 and pymilvus==1.0.1, migrating data from Milvus server 1.0.0
  • Run with each of M2M.yaml, H2M.yaml, and M2H.yaml
  • Data migration succeeds but important data is missing
  • I have tested with both append and overwrite modes
  • Data for the test:
from milvus import Milvus, IndexType, MetricType, Status
milvus = Milvus(host='milvusv2.local', port='19530')
param = {'collection_name':'test01', 'dimension':256, 'index_file_size':1024, 'metric_type':MetricType.L2}
milvus.create_collection(param)
milvus.create_partition('test01', 'tag01')
import random
vectors = [[random.random() for _ in range(256)] for _ in range(20)]
vector_ids = [id for id in range(20)]
milvus.insert(collection_name='test01', records=vectors, ids=vector_ids)
milvus.insert('test01', vectors, partition_tag="tag01")
ivf_param = {'nlist': 16384}
milvus.create_index('test01', IndexType.IVF_FLAT, ivf_param)
  • Docker-compose

services:
  milvus:
    image: 'milvusdb/milvus:1.0.0-cpu-d030521-1ea92e'
    hostname: milvus.local
    networks:
      binhbtn:
        ipv4_address: 172.23.0.3
    volumes:
      - /tmp/db:/var/lib/milvus/db
      - /tmp/logs:/var/lib/milvus/logs
      - /tmp/wal:/var/lib/milvus/wal
  milvusv2:
    image: 'milvusdb/milvus:1.0.0-cpu-d030521-1ea92e'
    hostname: milvusv2.local
    networks:
      binhbtn:
        ipv4_address: 172.23.0.4
    volumes:
      - /tmp/2/db:/var/lib/milvus/db
      - /tmp/2/logs:/var/lib/milvus/logs
      - /tmp/2/wal:/var/lib/milvus/wal
  python37:
    image: 'python:3.7.13'
    tty: true
    networks:
      binhbtn:
        ipv4_address: 172.23.0.5
    volumes:
      - /tmp/2/db:/var/lib/milvus/db
      - /tmp/2/logs:/var/lib/milvus/logs
      - /tmp/2/wal:/var/lib/milvus/wal
      - /tmp/db:/var/lib/milvus/dest/db
      - /tmp/logs:/var/lib/milvus/dest/logs
      - /tmp/wal:/var/lib/milvus/dest/wal
    depends_on:
      - milvus
      - milvusv2

networks:
  binhbtn:
    driver: bridge
    ipam:
     config:
       - subnet: 172.23.0.0/16

3. Config

  • M2M.yaml
M2M:
  milvus_version: 1.0.0
  source_milvus_path: '/var/lib/milvus'
  mysql_parameter:
  source_collection:
    test01:
  dest_host: 'milvus.local'
  dest_port: 19530
  mode: 'overwrite' 
  • H2M.yaml
H2M:
  milvus_version: 1.x
  data_path:
  data_dir: '/var/lib/milvus/backup'
  dest_host: '172.23.0.3'
  dest_port: 19530
  mode: 'overwrite'
  dest_collection_name: 'test01'
  dest_partition_name: 'tag01'
  collection_parameter:
    dimension:
    index_file_size:
    metric_type:
  • M2H.yaml
M2H:
  milvus_version: 1.0.0
  source_milvus_path: '/var/lib/milvus'
  mysql_parameter:
  source_collection:
    test01:
  data_dir: '/var/lib/milvus/backup'

Milvus to Milvus error (1.0)

ERROR | milvus_to_milvus.py | transform_milvus_data | 44 | Error with: local variable 'total_vectors' referenced before assignment

backup from milvus-cluster

I'm wondering how we can backup index and vector data from a Milvus cluster to an HDF5 file for HA purposes.

Data integrity

When I transform Milvus data, some vectors cannot be found in the new Milvus instance.
origin milvus:
version=0.10.4,
index_type=IndexType.IVF_FLAT
index_param={'nlist': 16384}

new milvus:
version=1.0.0,
index_type=IndexType.IVF_FLAT
index_param={'nlist': 16384}

H2M No vector dimension in the log file

2021-02-08 11:52:48,802 | DEBUG | data_to_milvus.py | insert_data | 69 | Successfuly insert collection: test_bina/partition: , total num: 5000
Only the total number of vectors is logged; ideally all the information about what was inserted into the Milvus collection (such as the vector dimension) could be printed.

H2M error grpc: received message larger than max

milvusdm --yaml H2M.yaml
0%| | 0/1 [00:00<?, ?it/s]2023-01-09 21:10:13,679 | ERROR | grpc_handler.py | handler | 72 |

Addr [192.168..:19530] bulk_insert

RPC error: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "grpc: received message larger than max (77985775 vs. 67108864)"
debug_error_string = "{"created":"@1673316613.678811865","description":"Error received from peer

ipv4:192.168..:19530","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"grpc: received message larger than max (77985775 vs. 67108864)","grpc_status":8}"

{'API start': '2023-01-09 21:10:11.611670', 'RPC start': '2023-01-09 21:10:11.612275', 'RPC error': '2023-01-09 21:10:13.679653'}

2023-01-09 21:10:13,680 | ERROR | milvus_client.py | insert | 86 | <_MultiThreadedRendezvous of RPC that terminated with:

status = StatusCode.RESOURCE_EXHAUSTED

details = "grpc: received message larger than max (77985775 vs. 67108864)"

debug_error_string = "{"created":"@1673316613.678811865","description":"Error received from peer

ipv4:192.168..:19530","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"grpc: received message larger than max (77985775 vs. 67108864)","grpc_status":8}"

0%| | 0/1 [00:04<?, ?it/s]

How can I solve this problem? Thanks.
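The log shows a server-side gRPC cap of 67108864 bytes (64 MB) while the tool sends the whole partition in one insert (77985775 bytes). Until the tool batches internally, one workaround is to split the insert into row ranges that stay under the cap. A sketch of the batching arithmetic only, assuming float32 vectors with int64 IDs (the 50% headroom factor is an illustrative safety margin, not a documented constant):

```python
GRPC_MAX_BYTES = 64 * 1024 * 1024  # server default seen in the error message


def batch_rows(num_rows, dim, itemsize=4, cap=GRPC_MAX_BYTES):
    """Split num_rows vectors of `dim` float32 values into row ranges
    whose payload stays well under the gRPC message cap."""
    bytes_per_row = dim * itemsize + 8  # vector bytes + one int64 id
    # Keep ~50% headroom for protobuf/gRPC framing overhead.
    rows_per_batch = max(1, (cap // 2) // bytes_per_row)
    return [range(start, min(start + rows_per_batch, num_rows))
            for start in range(0, num_rows, rows_per_batch)]


# 600k vectors of dim 512 become several sub-64MB inserts instead of one.
batches = batch_rows(600_000, 512)
```

Each `range` can then drive one `insert` call; alternatively, the server's gRPC max-message-size limit can be raised in its configuration, but batching works without touching the server.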

the sour collection: xxx does not exists

(Status(code=0, message='Show collections successfully!'), ['milvus_datas'])

2021-03-25 16:50:16,962 | ERROR | milvus_to_milvus.py | transform_milvus_data | 47 | Error with: The sour collection: milvus_datas does not exists.

Error occurred when migrating from Milvus to HDF5

An error was encountered when using DM to migrate data from Milvus to HDF5.

2022-06-07 16:21:30,979 | INFO | milvus_to_hdf5.py | read_milvus_data | 49 | Ready to read all data of collection: video_fingerprint/partitions: [None]
0%| | 0/1 [00:03<?, ?it/s]
2022-06-07 16:21:34,230 | ERROR | milvus_to_hdf5.py | read_milvus_data | 56 | Error with: name 'delids' is not defined

I have tried different versions of milvusdm (1.0, 2.0) and the results are the same. It also happened with another Milvus server.

Version of milvusdm 2.0
Version of milvus 1.1.1

Configuration shown below ( M2H.yaml )

M2H:
  milvus_version: 1.1.1
  source_milvus_path: '/data1/milvus_1.x_uni_video'
  mysql_parameter:
    host: '127.0.0.1'
    user: 'root'
    port: 3376
    password: 'xxxxxxxxxxxx'
    database: 'milvus'
  source_collection:
    video_fingerprint:
  data_dir: '/data1/milvus_migration/uni_video'
  mode: 'overwrite'

Error when reading milvus data for empty collection

Issue description

When trying to read an empty collection, milvusdm fails saying that the total_vectors variable is referenced before assignment. This error originates from the get_files_data function in read_milvus_data.py:

def get_files_data(self, table_id, collection_path, milvus_meta):
    dim, types = milvus_meta.get_collection_dim_type(table_id)
    segment_list, row_list = milvus_meta.get_collection_segments_rows(table_id)
    # total_vectors = []
    # total_ids = []
    # total_rows = 0
    # for segment_id, rows in zip(segment_list, row_list):
    #     total_rows += rows
    #     vectors, ids = self.get_segment_data(collection_path, segment_id, dim, rows, types)
    #     total_vectors += vectors
    #     # total_ids += ids.tolist()
    #     total_ids += ids
    total_rows = 0
    for segment_id, rows in zip(segment_list, row_list):
        vectors, ids = self.get_segment_data(collection_path, segment_id, dim, rows, types)
        if total_rows == 0:
            total_vectors = vectors
            total_ids = ids
        else:
            total_ids = np.append(total_ids, ids)
            total_vectors = np.append(total_vectors, vectors, axis=0)
        total_rows += rows
        del vectors
        del ids
    return total_vectors, total_ids, total_rows

When either segment_list or row_list is empty, the for loop doesn't run, so the function attempts to return total_vectors and total_ids, which were never initialized.

I've created a PR with a simple fix/workaround for this issue: #33
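One minimal fix (not necessarily the one in the PR) is to initialize the accumulators before the loop, so an empty collection returns empty arrays instead of raising. A standalone sketch of that shape, with `segments` standing in for the per-segment get_segment_data() calls:

```python
import numpy as np


def get_files_data_fixed(segments, dim):
    """Accumulate (vectors, ids) pairs per segment, starting from empty
    arrays so zero segments is a valid case rather than an error."""
    total_vectors = np.empty((0, dim), dtype=np.float32)
    total_ids = np.empty((0,), dtype=np.int64)
    total_rows = 0
    for vectors, ids in segments:
        total_vectors = np.append(total_vectors, vectors, axis=0)
        total_ids = np.append(total_ids, ids)
        total_rows += len(ids)
    return total_vectors, total_ids, total_rows


# Empty collection: no exception, just empty results.
empty_vectors, empty_ids, empty_rows = get_files_data_fixed([], dim=8)
```

The caller can then treat "zero rows" as a normal outcome (e.g. skip writing an HDF5 file) instead of crashing.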

Reproduction steps

Using Milvus 1.1.1, pymilvus==1.1.1 and pymilvusdm==2.0.

  1. Create collection using pymilvus:
_DIM = 8
from milvus import Milvus, IndexType, MetricType, Status
milvus = Milvus('127.0.0.1', '19530')
collection_name = 'example_collection'
param = { 'collection_name': collection_name, 'dimension': _DIM }
milvus.create_collection(param)
milvus.flush([collection_name])
  2. Prepare configuration YAML M2H.yml:
M2H:
  milvus_version: 1.1.1
  source_milvus_path: '<SOURCE_MILVUS_PATH>'
  mysql_parameter:
    host: '127.0.0.1'
    user: 'root'
    port: 3306
    password: 'password'
    database: 'milvus'
  source_collection:
    example_collection:
  data_dir: 'backup'
  3. Call milvusdm --yaml M2H.yml:
<TIMESTAMP> | INFO | milvus_to_hdf5.py | read_milvus_data | 49 | Ready to read all data of collection: example_collection/partitions: [None]
  0%|                                                                                                                      | 0/1 [00:00<?, ?it/s]
<TIMESTAMP> | ERROR | milvus_to_hdf5.py | read_milvus_data | 56 | Error with: local variable 'total_vectors' referenced before assignment

Same error happens when non-default partition is used and contains some vectors, while the default partition stays empty.

Benchmark binary

Why is the benchmark distributed as a binary rather than as source code? This is not a transparent way to share and reproduce benchmark results.

Support milvus annoy index

I want to test the search performance of the ANNOY index.
After I ingested data into Milvus and built the index, I ran go_benchmark and received the following exception: (screenshot)
It seems that the 'benchmark' binary does not currently support the ANNOY index.
Does Milvus plan to open-source the benchmark's source code?

migrate milvus to milvus data when schema is different

We are going to add a new field, e.g. "hash", but I've read that Milvus doesn't support altering a collection's schema yet,
so we're testing the MilvusDM tool.

Is it possible to migrate data from the original collection to a new collection on the same host, as below?

original collection schema is id / image_url / embeddings.
new collection schema is id / image_url / embeddings / hash.

thank you
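Since the schema can't be altered in place, the usual pattern is export, transform, reimport: read the rows from the original collection, compute the new field, and insert them into a new collection whose schema includes it. The Milvus calls themselves need a live server, but the transform step can be sketched standalone; the hash-from-URL rule below is purely a hypothetical example of deriving the new column:

```python
import hashlib


def add_hash_field(rows):
    """rows: list of dicts with the original schema
    (id / image_url / embeddings). Returns copies extended with a
    'hash' field derived from image_url (illustrative rule only)."""
    out = []
    for row in rows:
        enriched = dict(row)  # copy so the exported data stays untouched
        enriched["hash"] = hashlib.sha256(
            row["image_url"].encode("utf-8")).hexdigest()
        out.append(enriched)
    return out


rows = [{"id": 1, "image_url": "http://example.com/a.jpg",
         "embeddings": [0.1, 0.2]}]
new_rows = add_hash_field(rows)
# new_rows[0] now carries id / image_url / embeddings / hash
```

The enriched rows would then be inserted into the new four-field collection; the old collection can be dropped once the counts match.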

Milvusdm issue with large-scale data

I tested Milvusdm with 100 million and with 10 million records. When the amount of data is large, it simply does not run on small machines. In summary: Milvusdm's memory usage is very high; please fix this.
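The memory blow-up is consistent with the accumulation pattern quoted in another issue here: every segment is concatenated into one in-memory array before anything is written. A generator-based sketch keeps only one segment resident at a time (the zero-filled reader below is a stand-in for the tool's real get_segment_data):

```python
import numpy as np


def iter_segments(segment_specs, dim):
    """Yield one segment's (vectors, ids) at a time instead of
    concatenating everything; segment_specs is a stand-in for the
    tool's per-segment metadata (segment id, row count)."""
    for _seg_id, rows in segment_specs:
        vectors = np.zeros((rows, dim), dtype=np.float32)  # stand-in read
        ids = np.arange(rows, dtype=np.int64)
        yield vectors, ids


# The consumer inserts (or writes to HDF5) each batch, then drops it,
# so peak memory is one segment rather than the whole collection.
peak_rows = 0
for vectors, ids in iter_segments([("s1", 1000), ("s2", 2000)], dim=16):
    peak_rows = max(peak_rows, len(ids))
```

With this shape, peak memory is bounded by the largest single segment instead of the total row count.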

Is there a plan to support faiss ivf_pq index files?

For now "pymilvusdm only supports faiss flat and ivf_flat index files", so is there a plan to support faiss ivf_pq index files?
Or any clue or document to start this work by loading the index?
The header is 'IxPT'.

Error: cannot reshape array error

Migrating a collection with milvusdm.
Both source and destination nodes run version 0.10.5.
Since version 0.10.5 has no partitions, the configuration is as follows:

source_collection:
  collection_name_xxx:
    - ''

The migration fails with the exception below. Is this a limitation of the destination cluster? How can it be resolved?

ERROR | milvus_to_milvus.py | transform_milvus_data | 44 | Error with: cannot reshape array of size 100335616 into shape (48992,64)

HDF5 to Milvus fail

When migrating Milvus to HDF5 with no partition set, the generated file is named None.h5. I don't know whether that works.

When migrating HDF5 to Milvus, I don't know what to specify for the 'dest_partition_name' attribute when I don't have a partition, so I finally used an empty string.

Finally, the following error occurred:

2022-06-21 16:44:23,939 | ERROR | grpc_handler.py | handler | 72 |
Addr [192.168.23.131:19530] fake_register_link
RPC error: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNIMPLEMENTED
details = ""
debug_error_string = "{"created":"@1655801063.939163866","description":"Error received from peer ipv4:192.168.23.131:19530","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"","grpc_status":12}"

{'API start': '2022-06-21 16:44:23.937994', 'RPC start': '2022-06-21 16:44:23.938438', 'RPC error': '2022-06-21 16:44:23.939350'}

2022-06-21 16:47:28,638 | ERROR | main.py | execute | 139 | server is not healthy, please try again later

I'm really going crazy. Can you tell me how to specify the configuration items when I don't have a specific partition?
