Comments (31)
Thanks for reporting, we will look into it soon.
from annlite.
I just realized that you are trying to update a deleted key. Is there any reason to do this? Why not use insert instead of update? @tommykoctur
from annlite.
However, we also support updating a key even after you have deleted it. This should not cause an error; I will look into this.
from annlite.
I just realize that you are trying to update a deleted key, is there any reason to do this? Why not use insert instead of update? @tommykoctur
Actually it is intended. We have a root document (which is a long text), we split it into chunks, and those chunks are indexed with annlite. To be sure we update all chunks without leaving any old information behind, we delete all chunks first and then do the update/insert... but it would be nice to have a feature to delete (update) on parent_id - I would be in heaven :)
from annlite.
The issue comes from SQLite: we use soft delete in SQLite (a row is marked as deleted but not actually removed), so updating or inserting the same key again causes a duplicate key error. I need to check with the team to determine whether we should deprecate this feature.
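To illustrate the soft-delete collision described above, here is a minimal sketch using only Python's standard sqlite3 module. The table name table_0 and the column _doc_id are taken from the error message in this thread; the _id and _deleted columns and the overall schema are assumptions for illustration and are not annlite's actual schema.

```python
import sqlite3


def reproduce_soft_delete_collision() -> str:
    """Insert a key, soft-delete it, insert it again; return the error text."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: only table_0 and _doc_id come from the reported error.
    conn.execute(
        "CREATE TABLE table_0 ("
        "_id INTEGER PRIMARY KEY, "
        "_doc_id TEXT NOT NULL UNIQUE, "
        "_deleted INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO table_0 (_doc_id) VALUES (?)", ("d1",))
    # Soft delete: the row is only flagged, so it still occupies the unique key.
    conn.execute("UPDATE table_0 SET _deleted = 1 WHERE _doc_id = ?", ("d1",))
    try:
        # Re-inserting the "deleted" key collides with the flagged row.
        conn.execute("INSERT INTO table_0 (_doc_id) VALUES (?)", ("d1",))
    except sqlite3.IntegrityError as exc:
        return str(exc)
    finally:
        conn.close()
    return ""


error_text = reproduce_soft_delete_collision()
print(error_text)
```

Because the flagged row is never physically removed, the unique constraint on _doc_id still applies to it, which is consistent with the IntegrityError reported later in this thread.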
from annlite.
Hi @jemmyshin, is there an expected time frame for resolving this? Has any decision been made yet? Sorry for pushing on this, but we need to make plans in our team.
Thank you
from annlite.
We have released annlite v0.5.7; you can update the package and try again. @tommykoctur
from annlite.
Hi, @jemmyshin , you probably mean 0.5.8... but anyway i will test it asap. Thanks
from annlite.
@jemmyshin I tested the latest commit (annlite @ git+https://github.com/jina-ai/annlite.git@4c145ddd19abb4caec479941d1c0ffb03c4cfcf3 ) and my minimal example does not work.
IntegrityError: UNIQUE constraint failed: table_0._doc_id
from annlite.
Can you try this? https://github.com/jina-ai/annlite/blob/main/tests/executor/test_executor.py#L408 this is from our unittest
from annlite.
Hi,
I am somehow not able to run that test successfully:
___________________________________________________________________ ERROR collecting tests/executor/test_executor.py ____________________________________________________________________
ImportError while importing test module '/home/tokoctur/annlite/tests/executor/test_executor.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../miniconda3/envs/jina-test/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/executor/test_executor.py:11: in <module>
from annlite.executor import AnnLiteIndexer
annlite/__init__.py:3: in <module>
from .index import AnnLite
annlite/index.py:16: in <module>
from .container import CellContainer
annlite/container.py:13: in <module>
from .core.index.hnsw import HnswIndex
annlite/core/__init__.py:1: in <module>
from .codec import PQCodec, ProjectorCodec, VQCodec
annlite/core/codec/__init__.py:1: in <module>
from .pq import PQCodec
annlite/core/codec/pq.py:6: in <module>
from annlite import pq_bind
E ImportError: cannot import name 'pq_bind' from partially initialized module 'annlite' (most likely due to a circular import) (/home/tokoctur/annlite/annlite/__init__.py)
from annlite.
You should first uninstall annlite and then run pip install -e . in the folder where setup.py is; after that you can run this test. @tommykoctur
from annlite.
Hi @jemmyshin,
thank you for the suggestion.
=================================================================================== warnings summary ====================================================================================
../miniconda3/envs/jina-test/lib/python3.10/site-packages/jina/serve/executors/__init__.py:126
UserWarning: `docs` annotation must be a class if you want to use it as schema input, got typing.Optional[docarray.array.document.DocumentArray]. try to remove the Optional.fallback to default behavior (raised from /home/tokoctur/miniconda3/envs/jina-test/lib/python3.10/site-packages/jina/serve/executors/__init__.py:126)
tests/executor/test_executor.py::test_local_storage_delete_update
DeprecationWarning: There is no current event loop (raised from /home/tokoctur/miniconda3/envs/jina-test/lib/python3.10/site-packages/jina/orchestrate/flow/base.py:1905)
tests/executor/test_executor.py::test_local_storage_delete_update
DeprecationWarning: There is no current event loop (raised from /home/tokoctur/miniconda3/envs/jina-test/lib/python3.10/site-packages/jina/orchestrate/flow/base.py:1915)
tests/executor/test_executor.py::test_local_storage_delete_update
DeprecationWarning: There is no current event loop (raised from /home/tokoctur/miniconda3/envs/jina-test/lib/python3.10/site-packages/jina/orchestrate/flow/base.py:1921)
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================ 1 passed, 4 warnings in 17.27s =============================================================================
from annlite.
OK, could you check whether you initialize the AnnLiteIndexer in the same folder each time? If so, first delete the index folder and then rerun your code.
from annlite.
@jemmyshin I am not sure what you mean. Can you please be more specific?
from annlite.
When you run f = Flow().add(uses=AnnLiteIndexer), this actually creates a folder for storing data, so you need to remove this folder each time before you start another experiment; otherwise there will be a key collision, since you insert the same data again.
from annlite.
Or you can specify a different data_path when you start the flow.
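As a rough sketch of the clean-up suggested above, the helper below removes a leftover index folder before each experiment. The function name fresh_data_path is hypothetical; the uses_with={"data_path": ...} form in the comment is the one used elsewhere in this thread, and the sketch assumes the indexer recreates the folder on startup.

```python
import shutil
from pathlib import Path


def fresh_data_path(data_path: str) -> str:
    """Delete a leftover index folder, if any, and return the path unchanged."""
    path = Path(data_path)
    if path.exists():
        # Drop data from a previous run so the same ids are not inserted twice.
        shutil.rmtree(path)
    return str(path)


# Intended use (requires jina and annlite, so it is left as a comment):
# f = Flow().add(uses=AnnLiteIndexer, uses_with={"data_path": fresh_data_path("xxx")})
```

This only removes on-disk state from earlier runs; it does not change how deleted keys are handled within a single run.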
from annlite.
hi @jemmyshin , I changed my flow definition to
f = Flow().add(uses=AnnLiteIndexer, uses_with={"data_path": "xxx"})
But I got the same error:
execute(sql, values[-1])\n"
stacks: "sqlite3.IntegrityError: UNIQUE constraint failed: table_0._doc_id\n"
executor: "AnnLiteIndexer"
}
}
exec_endpoint: "/update"
target_executor: ""
Does my minimal example above work for you?
Thanks
from annlite.
Yes, this works for me. How about removing this data_path every time before you run the script?
from annlite.
Hi, I tried it on my dev Ubuntu server and also on my Mac, with the same result, even when deleting the "xxx" folder.
This is my full output, I hope it helps... but I think there must be something you are doing differently than I am.
➜ jina-multi-sentence-sse git:(feature/crud-fixing) ✗ rm -rf xxx
➜ jina-multi-sentence-sse git:(feature/crud-fixing) ✗ python delete_bug_minimal_example.py
────────────────────────────────────────────────────────────────────────────────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────────────────────────────────────────────────────────────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol GRPC │
│ 🏠 Local 0.0.0.0:64962 │
│ 🔒 Private 192.168.117.174:64962 │
│ 🌍 Public XXX.XX.XXX.XX:64962 │
╰──────────────────────────────────────────╯
⠸ Waiting executor0 summary... ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/3 0:00:002023-02-27 12:17:34.679 | INFO | annlite.index:restore:670 - restore Annlite from local
2023-02-27 12:17:34.680 | INFO | annlite.index:_rebuild_index_from_local:777 - Rebuild the indexer from scratch
2023-02-27 12:17:34.689 | INFO | annlite.index:_rebuild_index_from_local:794 - Load the model from xxx/parameters-2b445f0495bd404037d10b26cf101add
────────────────────────────────────────────────────────────────────────────────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────────────────────────────────────────────────────────────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol GRPC │
│ 🏠 Local 0.0.0.0:57934 │
│ 🔒 Private 192.168.117.174:57934 │
│ 🌍 Public XXX.XX.XXX.XX:57934 │
╰──────────────────────────────────────────╯
2023-02-27 12:17:35.297 | INFO | annlite.index:backup:657 - dump to local ...
2023-02-27 12:17:35.298 | INFO | annlite.index:dump_model:680 - Save the parameters to xxx/parameters-2b445f0495bd404037d10b26cf101add
2023-02-27 12:17:35.329 | INFO | annlite.index:dump_index:692 - Save the indexer to xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT
Backup
⠸ Waiting executor0 summary... ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/3 0:00:002023-02-27 12:17:35.956 | INFO | annlite.index:restore:670 - restore Annlite from local
2023-02-27 12:17:35.958 | INFO | annlite.index:_rebuild_index_from_local:770 - Load the indexer from snapshot xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT
Warning: Calling load_index for an already inited index. Old index is being deallocated.2023-02-27 12:17:35.966 | INFO | annlite.index:_rebuild_index_from_local:794 - Load the model from xxx/parameters-2b445f0495bd404037d10b26cf101add
────────────────────────────────────────────────────────────────────────────────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────────────────────────────────────────────────────────────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol GRPC │
│ 🏠 Local 0.0.0.0:62494 │
│ 🔒 Private 192.168.117.174:62494 │
│ 🌍 Public XXX.XX.XXX.XX:62494 │
╰──────────────────────────────────────────╯
2023-02-27 12:17:36.579 | INFO | annlite.index:restore:670 - restore Annlite from local
2023-02-27 12:17:36.580 | INFO | annlite.index:_rebuild_index_from_local:770 - Load the indexer from snapshot xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT
Warning: Calling load_index for an already inited index. Old index is being deallocated.2023-02-27 12:17:36.583 | INFO | annlite.index:_rebuild_index_from_local:794 - Load the model from xxx/parameters-2b445f0495bd404037d10b26cf101add
Restore
⠸ Waiting executor0 summary... ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/3 0:00:002023-02-27 12:17:37.188 | INFO | annlite.index:restore:670 - restore Annlite from local
2023-02-27 12:17:37.190 | INFO | annlite.index:_rebuild_index_from_local:770 - Load the indexer from snapshot xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT
Warning: Calling load_index for an already inited index. Old index is being deallocated.2023-02-27 12:17:37.198 | INFO | annlite.index:_rebuild_index_from_local:794 - Load the model from xxx/parameters-2b445f0495bd404037d10b26cf101add
────────────────────────────────────────────────────────────────────────────────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────────────────────────────────────────────────────────────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol GRPC │
│ 🏠 Local 0.0.0.0:59541 │
│ 🔒 Private 192.168.117.174:59541 │
│ 🌍 Public XXX.XX.XXX.XX:59541 │
╰──────────────────────────────────────────╯
d1 0 0
Deleted d1 at: 2023-02-27 11:17:37.771039
⠸ Waiting executor0 summary... ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/3 0:00:002023-02-27 12:17:38.429 | INFO | annlite.index:restore:670 - restore Annlite from local
2023-02-27 12:17:38.430 | INFO | annlite.index:_rebuild_index_from_local:770 - Load the indexer from snapshot xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT
Warning: Calling load_index for an already inited index. Old index is being deallocated.2023-02-27 12:17:38.437 | INFO | annlite.index:_rebuild_index_from_local:794 - Load the model from xxx/parameters-2b445f0495bd404037d10b26cf101add
────────────────────────────────────────────────────────────────────────────────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────────────────────────────────────────────────────────────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol GRPC │
│ 🏠 Local 0.0.0.0:63125 │
│ 🔒 Private 192.168.117.174:63125 │
│ 🌍 Public XXX.XX.XXX.XX:63125 │
╰──────────────────────────────────────────╯
2023-02-27 12:17:39.328 | INFO | annlite.index:backup:657 - dump to local ...
2023-02-27 12:17:39.329 | INFO | annlite.index:dump_model:680 - Save the parameters to xxx/parameters-2b445f0495bd404037d10b26cf101add
2023-02-27 12:17:39.333 | INFO | annlite.index:dump_index:692 - Save the indexer to xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT
2023-02-27 12:17:39.335 | INFO | annlite.index:dump_index:695 - Index path xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT already exists, will be overwritten
Backup
⠸ Waiting executor0 summary... ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/3 0:00:002023-02-27 12:17:39.919 | INFO | annlite.index:restore:670 - restore Annlite from local
2023-02-27 12:17:39.920 | INFO | annlite.index:_rebuild_index_from_local:770 - Load the indexer from snapshot xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT
Warning: Calling load_index for an already inited index. Old index is being deallocated.2023-02-27 12:17:39.929 | INFO | annlite.index:_rebuild_index_from_local:794 - Load the model from xxx/parameters-2b445f0495bd404037d10b26cf101add
────────────────────────────────────────────────────────────────────────────────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────────────────────────────────────────────────────────────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol GRPC │
│ 🏠 Local 0.0.0.0:51194 │
│ 🔒 Private 192.168.117.174:51194 │
│ 🌍 Public XXX.XX.XXX.XX:51194 │
╰──────────────────────────────────────────╯
2023-02-27 12:17:40.607 | INFO | annlite.index:restore:670 - restore Annlite from local
2023-02-27 12:17:40.609 | INFO | annlite.index:_rebuild_index_from_local:770 - Load the indexer from snapshot xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT
Warning: Calling load_index for an already inited index. Old index is being deallocated.2023-02-27 12:17:40.613 | INFO | annlite.index:_rebuild_index_from_local:794 - Load the model from xxx/parameters-2b445f0495bd404037d10b26cf101add
Restore
⠸ Waiting executor0 summary... ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/3 0:00:002023-02-27 12:17:41.214 | INFO | annlite.index:restore:670 - restore Annlite from local
2023-02-27 12:17:41.215 | INFO | annlite.index:_rebuild_index_from_local:770 - Load the indexer from snapshot xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT
Warning: Calling load_index for an already inited index. Old index is being deallocated.2023-02-27 12:17:41.223 | INFO | annlite.index:_rebuild_index_from_local:794 - Load the model from xxx/parameters-2b445f0495bd404037d10b26cf101add
────────────────────────────────────────────────────────────────────────────────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────────────────────────────────────────────────────────────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol GRPC │
│ 🏠 Local 0.0.0.0:59358 │
│ 🔒 Private 192.168.117.174:59358 │
│ 🌍 Public XXX.XX.XXX.XX:59358 │
╰──────────────────────────────────────────╯
ERROR executor0/rep-0@23976 IntegrityError('UNIQUE constraint failed: table_0._doc_id') [02/27/23 12:17:41]
add "--quiet-error" to suppress the exception details
Traceback (most recent call last):
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/runtimes/worker/__init__.py", line 222, in process_data
result = await self._request_handler.handle(
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/runtimes/worker/request_handling.py", line 291, in handle
return_data = await self._executor.__acall__(
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/executors/__init__.py", line 352, in __acall__
return await self.__acall_endpoint__(req_endpoint, **kwargs)
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/executors/__init__.py", line 408, in __acall_endpoint__
return await exec_func(
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/executors/__init__.py", line 369, in exec_func
return await get_or_reuse_loop().run_in_executor(None, functools.partial(func, self,
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/executors/decorators.py", line 182, in arg_wrapper
return fn(executor_instance, *args, **kwargs)
File "/Users/USERNAME/PycharmProjects/jina-multi-sentence-sse/delete_bug_minimal_example.py", line 204, in update
self._index[doc.id] = doc
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/docarray/array/mixins/setitem.py", line 85, in __setitem__
self._set_doc(index, value)
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/docarray/array/storage/base/getsetdel.py", line 177, in _set_doc
self._set_doc_by_id(_id, value)
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/docarray/array/storage/annlite/getsetdel.py", line 28, in _set_doc_by_id
self._annlite.update(docs)
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/annlite/index.py", line 326, in update
return super(AnnLite, self).update(
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/annlite/container.py", line 380, in update
self.insert(new_data, new_cells, new_docs)
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/annlite/container.py", line 279, in insert
offsets = self.cell_table(cell_id).insert(docs)
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/annlite/storage/table.py", line 250, in insert
cursor.execute(sql, values[-1])
sqlite3.IntegrityError: UNIQUE constraint failed: table_0._doc_id
Traceback (most recent call last):
File "/Users/USERNAME/PycharmProjects/jina-multi-sentence-sse/delete_bug_minimal_example.py", line 450, in <module>
update()
File "/Users/USERNAME/PycharmProjects/jina-multi-sentence-sse/delete_bug_minimal_example.py", line 423, in update
f.post(on='/update', inputs=du)
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/clients/mixin.py", line 273, in post
return run_async(
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/helper.py", line 1342, in run_async
return asyncio.run(func(*args, **kwargs))
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/clients/mixin.py", line 264, in _get_results
async for resp in c._get_results(*args, **kwargs):
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/clients/base/grpc.py", line 140, in _get_results
callback_exec(
File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/clients/helper.py", line 81, in callback_exec
raise BadServer(response.header)
jina.excepts.BadServer: request_id: "968cd22f5a6045caaed2ab690a14c092"
status {
code: ERROR
description: "IntegrityError(\'UNIQUE constraint failed: table_0._doc_id\')"
exception {
name: "IntegrityError"
args: "UNIQUE constraint failed: table_0._doc_id"
stacks: "Traceback (most recent call last):\n"
stacks: " File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/runtimes/worker/__init__.py\", line 222, in process_data\n result = await self._request_handler.handle(\n"
stacks: " File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/runtimes/worker/request_handling.py\", line 291, in handle\n return_data = await self._executor.__acall__(\n"
stacks: " File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/executors/__init__.py\", line 352, in __acall__\n return await self.__acall_endpoint__(req_endpoint, **kwargs)\n"
stacks: " File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/executors/__init__.py\", line 408, in __acall_endpoint__\n return await exec_func(\n"
stacks: " File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/executors/__init__.py\", line 369, in exec_func\n return await get_or_reuse_loop().run_in_executor(None, functools.partial(func, self,\n"
stacks: " File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/concurrent/futures/thread.py\", line 58, in run\n result = self.fn(*self.args, **self.kwargs)\n"
stacks: " File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/executors/decorators.py\", line 182, in arg_wrapper\n return fn(executor_instance, *args, **kwargs)\n"
stacks: " File \"/Users/USERNAME/PycharmProjects/jina-multi-sentence-sse/delete_bug_minimal_example.py\", line 204, in update\n self._index[doc.id] = doc\n"
stacks: " File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/docarray/array/mixins/setitem.py\", line 85, in __setitem__\n self._set_doc(index, value)\n"
stacks: " File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/docarray/array/storage/base/getsetdel.py\", line 177, in _set_doc\n self._set_doc_by_id(_id, value)\n"
stacks: " File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/docarray/array/storage/annlite/getsetdel.py\", line 28, in _set_doc_by_id\n self._annlite.update(docs)\n"
stacks: " File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/annlite/index.py\", line 326, in update\n return super(AnnLite, self).update(\n"
stacks: " File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/annlite/container.py\", line 380, in update\n self.insert(new_data, new_cells, new_docs)\n"
stacks: " File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/annlite/container.py\", line 279, in insert\n offsets = self.cell_table(cell_id).insert(docs)\n"
stacks: " File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/annlite/storage/table.py\", line 250, in insert\n cursor.execute(sql, values[-1])\n"
stacks: "sqlite3.IntegrityError: UNIQUE constraint failed: table_0._doc_id\n"
executor: "AnnLiteIndexer"
}
}
exec_endpoint: "/update"
target_executor: ""
from annlite.
OK, I reproduced this error. I am curious about your use case for this, to see if we can have some optimization here. I believe this error comes from restoring and reloading.
from annlite.
Hi,
well, I wouldn't say that our use case can't be optimized. :)
We are doing text search, and our basic text unit is a sentence. So our root document is a long text and it has chunks, which are sentences with embeddings. Those chunks are stored in annlite.
If a root document changes, e.g. it gets shorter (from 10 sentences to 5), then the update method will update just 5 sentences in annlite and the other 5 will still be sitting there, although they should be deleted.
The best feature for us would be update/delete operations based on parent_id, or any operations on a nested structure, but this is not how annlite works.
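The stale-chunk problem described above can be sketched in plain Python. The id scheme (parent id plus sentence position) and the helper names chunk_ids and plan_update are assumptions for illustration; nothing here is an annlite API.

```python
from typing import List, Tuple


def chunk_ids(parent_id: str, n_sentences: int) -> List[str]:
    """Hypothetical id scheme: one chunk id per sentence position."""
    return [f"{parent_id}-{i}" for i in range(n_sentences)]


def plan_update(parent_id: str, old_count: int, new_count: int) -> Tuple[List[str], List[str]]:
    """Return (ids to update or insert, stale ids that must be deleted)."""
    new_ids = chunk_ids(parent_id, new_count)
    new_id_set = set(new_ids)
    # Chunks that existed before but have no counterpart in the new version.
    stale_ids = [cid for cid in chunk_ids(parent_id, old_count) if cid not in new_id_set]
    return new_ids, stale_ids


to_write, to_delete = plan_update("doc", old_count=10, new_count=5)
print(to_write)   # ids "doc-0" .. "doc-4"
print(to_delete)  # ids "doc-5" .. "doc-9"
```

A plain update touches only the ids in to_write, which is why the ids in to_delete linger unless they are removed explicitly; deleting every chunk of the parent first, as described earlier in the thread, is the blunt way to get the same effect.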
from annlite.
@jemmyshin is there any ETA for when it will be fixed? (I understand that it depends on your availability, but we would like to make plans for our project.)
from annlite.
Sorry for the late reply, we will start working on this issue next week. @tommykoctur
from annlite.
Thank you @jemmyshin , if I can help with anything, just let me know.
Thanks
from annlite.
Hi any update on this?
Thanks
from annlite.
Hi, sorry for the late reply, we are fully occupied by other tickets this week and sorry about the delay. Our engineers will work on it this Friday and hopefully we can fix it by next week. Thanks!
from annlite.
Hi @jemmyshin , any update on this? Thanks
from annlite.
@tommykoctur We have already identified where this error comes from and are working on fixes. We just need one or two days. Thank you for your patience.
from annlite.
@tommykoctur this issue will be addressed by PR #222.
What's more, due to an upstream issue in grpcio, your test script cannot work because it runs a sequence of gRPC services (i.e., Jina Flows). Hence, I made some adaptations as follows:
import time

import numpy as np
from docarray import Document, DocumentArray
from jina import Flow

# `da` (the initial DocumentArray) and `AnnLiteIndexer` are assumed to be
# defined as in the original test script.


def all():
    f = Flow().add(uses=AnnLiteIndexer, uses_with={'data_path': './data'})

    # Run every step inside one Flow instead of starting a new Flow per step.
    with f:
        print('==> index')
        f.post(on='/clear')
        f.post(on='/index', inputs=da)

        # wait for the index thread to finish
        time.sleep(5)
        f.post(on='/dump')

        print('==> backup')
        f.post(on='/backup')

        print('==> restore')
        f.post(on='/restore')

        print('==> delete')
        delete_list = ["d1"]
        f.post(on='/delete', parameters={'ids': delete_list})

        print('==> backup')
        f.post(on='/backup')

        print('==> restore')
        f.post(on='/restore')

        # Update the key that was deleted above.
        print('==> update')
        du = DocumentArray([
            Document(id="d1", text="updated data 1", embedding=np.array([1, 2, 3, 4, 7]),
                     tags={"tag_id": "updated_d1"})])
        f.post(on='/update', inputs=du)

        print('==> backup')
        f.post(on='/backup')

        print('==> restore')
        f.post(on='/restore')

        print('==> search')
        f.post(on='/search', inputs=DocumentArray([Document(embedding=np.array([1, 2, 3, 4, 7]))]))


if __name__ == "__main__":
    all()
from annlite.
Hi, I can confirm that the issue is solved. Thank you.
from annlite.