Comments (31)

jemmyshin avatar jemmyshin commented on June 6, 2024

Thanks for reporting, we will look into it soon.

from annlite.

jemmyshin avatar jemmyshin commented on June 6, 2024

I just realized that you are trying to update a deleted key. Is there any reason to do this? Why not use insert instead of update? @tommykoctur

jemmyshin avatar jemmyshin commented on June 6, 2024

However, we also support updating a key even after it has been deleted. This should not cause an error; I will look into it.

tommykoctur avatar tommykoctur commented on June 6, 2024

> I just realized that you are trying to update a deleted key. Is there any reason to do this? Why not use insert instead of update? @tommykoctur

Actually, it is intended. We have a root document (a long text) that we split into chunks, and those chunks are indexed with annlite. To be sure that all chunks are updated and no old information is left behind, we delete all chunks first and then do the update/insert. It would be nice to have a feature to delete (or update) by parent_id; I would be in heaven :)

jemmyshin avatar jemmyshin commented on June 6, 2024

The issue comes from SQLite: we use soft delete there (rows are marked as deleted but not actually removed), so updating or inserting the same key again causes a duplicate key error. I need to check with the team to determine whether we should deprecate this feature.
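
The failure mode described here can be reproduced with plain sqlite3. This is only a minimal sketch, not annlite's actual schema or fix: the table name `table_0` and the column `_doc_id` are taken from the error message in this thread, while the `_deleted` flag column and all helper names are assumptions made for illustration.

```python
import sqlite3


def make_table(conn):
    # `_doc_id` is UNIQUE, matching the constraint named in the error message.
    # The `_deleted` flag is an assumed way of modelling a soft delete.
    conn.execute(
        "CREATE TABLE table_0 ("
        "_id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "_doc_id TEXT NOT NULL UNIQUE, "
        "_deleted INTEGER NOT NULL DEFAULT 0)"
    )


def soft_delete(conn, doc_id):
    # Mark the row as deleted instead of removing it.
    conn.execute("UPDATE table_0 SET _deleted = 1 WHERE _doc_id = ?", (doc_id,))


def naive_insert(conn, doc_id):
    # Raises sqlite3.IntegrityError if a row with this _doc_id still exists,
    # even when that row is only soft-deleted.
    conn.execute("INSERT INTO table_0 (_doc_id) VALUES (?)", (doc_id,))


def reviving_insert(conn, doc_id):
    # One possible workaround: physically remove any leftover row first.
    conn.execute("DELETE FROM table_0 WHERE _doc_id = ?", (doc_id,))
    conn.execute("INSERT INTO table_0 (_doc_id) VALUES (?)", (doc_id,))


def demo():
    conn = sqlite3.connect(":memory:")
    make_table(conn)
    naive_insert(conn, "d1")
    soft_delete(conn, "d1")
    try:
        naive_insert(conn, "d1")
        failed = False
    except sqlite3.IntegrityError as exc:
        failed = True
        print("naive insert failed:", exc)
    reviving_insert(conn, "d1")
    live = conn.execute(
        "SELECT COUNT(*) FROM table_0 WHERE _deleted = 0"
    ).fetchone()[0]
    conn.close()
    return failed, live
```

Calling `demo()` shows that re-inserting a soft-deleted key trips the UNIQUE constraint, whereas clearing the leftover row first leaves exactly one live row for that key.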

tommykoctur avatar tommykoctur commented on June 6, 2024

Hi @jemmyshin, is there an expected timeline for resolving this? Has a decision been made yet? Sorry to push, but we need to make plans in our team.

Thank you

jemmyshin avatar jemmyshin commented on June 6, 2024

We have released annlite v0.5.7. Please update the package and try again. @tommykoctur

tommykoctur avatar tommykoctur commented on June 6, 2024

Hi @jemmyshin, you probably mean 0.5.8, but anyway I will test it ASAP. Thanks.

tommykoctur avatar tommykoctur commented on June 6, 2024

@jemmyshin I tested the latest commit (annlite @ git+https://github.com/jina-ai/annlite.git@4c145ddd19abb4caec479941d1c0ffb03c4cfcf3) and my minimal example still fails:
IntegrityError: UNIQUE constraint failed: table_0._doc_id

jemmyshin avatar jemmyshin commented on June 6, 2024

Can you try this? https://github.com/jina-ai/annlite/blob/main/tests/executor/test_executor.py#L408 It is from our unit tests.

tommykoctur avatar tommykoctur commented on June 6, 2024

Hi,
I am somehow not able to run that test successfully:

___________________________________________________________________ ERROR collecting tests/executor/test_executor.py ____________________________________________________________________
ImportError while importing test module '/home/tokoctur/annlite/tests/executor/test_executor.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../miniconda3/envs/jina-test/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/executor/test_executor.py:11: in <module>
    from annlite.executor import AnnLiteIndexer
annlite/__init__.py:3: in <module>
    from .index import AnnLite
annlite/index.py:16: in <module>
    from .container import CellContainer
annlite/container.py:13: in <module>
    from .core.index.hnsw import HnswIndex
annlite/core/__init__.py:1: in <module>
    from .codec import PQCodec, ProjectorCodec, VQCodec
annlite/core/codec/__init__.py:1: in <module>
    from .pq import PQCodec
annlite/core/codec/pq.py:6: in <module>
    from annlite import pq_bind
E   ImportError: cannot import name 'pq_bind' from partially initialized module 'annlite' (most likely due to a circular import) (/home/tokoctur/annlite/annlite/__init__.py)

jemmyshin avatar jemmyshin commented on June 6, 2024

You should first uninstall annlite and then run pip install -e . in the folder that contains setup.py. After that you can run this test. @tommykoctur

tommykoctur avatar tommykoctur commented on June 6, 2024

Hi @jemmyshin,
thank you for the suggestion. The test passes now:

=================================================================================== warnings summary ====================================================================================
../miniconda3/envs/jina-test/lib/python3.10/site-packages/jina/serve/executors/__init__.py:126
  UserWarning: `docs` annotation must be a class if you want to use it as schema input, got typing.Optional[docarray.array.document.DocumentArray]. try to remove the Optional.fallback to default behavior (raised from /home/tokoctur/miniconda3/envs/jina-test/lib/python3.10/site-packages/jina/serve/executors/__init__.py:126)

tests/executor/test_executor.py::test_local_storage_delete_update
  DeprecationWarning: There is no current event loop (raised from /home/tokoctur/miniconda3/envs/jina-test/lib/python3.10/site-packages/jina/orchestrate/flow/base.py:1905)

tests/executor/test_executor.py::test_local_storage_delete_update
  DeprecationWarning: There is no current event loop (raised from /home/tokoctur/miniconda3/envs/jina-test/lib/python3.10/site-packages/jina/orchestrate/flow/base.py:1915)

tests/executor/test_executor.py::test_local_storage_delete_update
  DeprecationWarning: There is no current event loop (raised from /home/tokoctur/miniconda3/envs/jina-test/lib/python3.10/site-packages/jina/orchestrate/flow/base.py:1921)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================ 1 passed, 4 warnings in 17.27s =============================================================================

jemmyshin avatar jemmyshin commented on June 6, 2024

OK, could you check whether you are initializing the AnnLiteIndexer in the same folder each time? If so, delete the index folder first and then rerun your code.

tommykoctur avatar tommykoctur commented on June 6, 2024

@jemmyshin I am not sure what you mean. Can you please be more specific?

jemmyshin avatar jemmyshin commented on June 6, 2024

When you run f = Flow().add(uses=AnnLiteIndexer), this actually creates a folder for storing data. Each time, you need to remove this folder before you start another experiment; otherwise there will be a key collision, since you insert the same data again.

jemmyshin avatar jemmyshin commented on June 6, 2024

Or you can specify a different data_path when you start the flow.
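
Both options (clearing the folder, or using a new data_path per run) can be sketched with the standard library alone. The helper names below are hypothetical and not part of annlite or Jina; the only thing taken from this thread is that the executor accepts a data_path setting.

```python
import shutil
import tempfile
from pathlib import Path


def reset_data_path(path):
    """Delete `path` if it exists and recreate it as an empty directory."""
    p = Path(path)
    if p.exists():
        shutil.rmtree(p)
    p.mkdir(parents=True)
    return p


def fresh_data_path(prefix="annlite-"):
    """Create a new, uniquely named directory for a single experiment."""
    return Path(tempfile.mkdtemp(prefix=prefix))


# Either result could then be handed to the executor as its data_path,
# for example via uses_with={"data_path": str(path)} when building the Flow.
```

Resetting a fixed folder keeps paths predictable between runs, while a fresh temporary folder avoids any chance of reading stale data at the cost of leaving old directories behind.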

tommykoctur avatar tommykoctur commented on June 6, 2024

Hi @jemmyshin, I changed my flow definition to
f = Flow().add(uses=AnnLiteIndexer, uses_with={"data_path": "xxx"})

But I got the same error:

execute(sql, values[-1])\n"
    stacks: "sqlite3.IntegrityError: UNIQUE constraint failed: table_0._doc_id\n"
    executor: "AnnLiteIndexer"
  }
}
exec_endpoint: "/update"
target_executor: ""

Does my minimal example above work for you?

Thanks

jemmyshin avatar jemmyshin commented on June 6, 2024

Yes, this works for me. How about removing this data_path every time before you run the script?

tommykoctur avatar tommykoctur commented on June 6, 2024

Hi, I tried it on my dev Ubuntu server and also on my Mac, with the same result, even after deleting the "xxx" folder.
This is my full output; I hope it helps. I think there must be something you are doing differently from me.

➜  jina-multi-sentence-sse git:(feature/crud-fixing) ✗ rm -rf xxx
➜  jina-multi-sentence-sse git:(feature/crud-fixing) ✗ python delete_bug_minimal_example.py
────────────────────────────────────────────────────────────────────────────────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────────────────────────────────────────────────────────────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│  ⛓     Protocol                    GRPC  │
│  🏠       Local           0.0.0.0:64962  │
│  🔒     Private   192.168.117.174:64962  │
│  🌍      Public     XXX.XX.XXX.XX:64962  │
╰──────────────────────────────────────────╯
⠸ Waiting executor0 summary... ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/3 0:00:002023-02-27 12:17:34.679 | INFO     | annlite.index:restore:670 - restore Annlite from local
2023-02-27 12:17:34.680 | INFO     | annlite.index:_rebuild_index_from_local:777 - Rebuild the indexer from scratch
2023-02-27 12:17:34.689 | INFO     | annlite.index:_rebuild_index_from_local:794 - Load the model from xxx/parameters-2b445f0495bd404037d10b26cf101add
────────────────────────────────────────────────────────────────────────────────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────────────────────────────────────────────────────────────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│  ⛓     Protocol                    GRPC  │
│  🏠       Local           0.0.0.0:57934  │
│  🔒     Private   192.168.117.174:57934  │
│  🌍      Public     XXX.XX.XXX.XX:57934  │
╰──────────────────────────────────────────╯
2023-02-27 12:17:35.297 | INFO     | annlite.index:backup:657 - dump to local ...
2023-02-27 12:17:35.298 | INFO     | annlite.index:dump_model:680 - Save the parameters to xxx/parameters-2b445f0495bd404037d10b26cf101add
2023-02-27 12:17:35.329 | INFO     | annlite.index:dump_index:692 - Save the indexer to xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT
Backup
⠸ Waiting executor0 summary... ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/3 0:00:002023-02-27 12:17:35.956 | INFO     | annlite.index:restore:670 - restore Annlite from local
2023-02-27 12:17:35.958 | INFO     | annlite.index:_rebuild_index_from_local:770 - Load the indexer from snapshot xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT
Warning: Calling load_index for an already inited index. Old index is being deallocated.2023-02-27 12:17:35.966 | INFO     | annlite.index:_rebuild_index_from_local:794 - Load the model from xxx/parameters-2b445f0495bd404037d10b26cf101add
────────────────────────────────────────────────────────────────────────────────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────────────────────────────────────────────────────────────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│  ⛓     Protocol                    GRPC  │
│  🏠       Local           0.0.0.0:62494  │
│  🔒     Private   192.168.117.174:62494  │
│  🌍      Public     XXX.XX.XXX.XX:62494  │
╰──────────────────────────────────────────╯
2023-02-27 12:17:36.579 | INFO     | annlite.index:restore:670 - restore Annlite from local
2023-02-27 12:17:36.580 | INFO     | annlite.index:_rebuild_index_from_local:770 - Load the indexer from snapshot xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT
Warning: Calling load_index for an already inited index. Old index is being deallocated.2023-02-27 12:17:36.583 | INFO     | annlite.index:_rebuild_index_from_local:794 - Load the model from xxx/parameters-2b445f0495bd404037d10b26cf101add
Restore
⠸ Waiting executor0 summary... ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/3 0:00:002023-02-27 12:17:37.188 | INFO     | annlite.index:restore:670 - restore Annlite from local
2023-02-27 12:17:37.190 | INFO     | annlite.index:_rebuild_index_from_local:770 - Load the indexer from snapshot xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT
Warning: Calling load_index for an already inited index. Old index is being deallocated.2023-02-27 12:17:37.198 | INFO     | annlite.index:_rebuild_index_from_local:794 - Load the model from xxx/parameters-2b445f0495bd404037d10b26cf101add
────────────────────────────────────────────────────────────────────────────────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────────────────────────────────────────────────────────────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│  ⛓     Protocol                    GRPC  │
│  🏠       Local           0.0.0.0:59541  │
│  🔒     Private   192.168.117.174:59541  │
│  🌍      Public     XXX.XX.XXX.XX:59541  │
╰──────────────────────────────────────────╯
d1 0 0
Deleted d1 at: 2023-02-27 11:17:37.771039
⠸ Waiting executor0 summary... ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/3 0:00:002023-02-27 12:17:38.429 | INFO     | annlite.index:restore:670 - restore Annlite from local
2023-02-27 12:17:38.430 | INFO     | annlite.index:_rebuild_index_from_local:770 - Load the indexer from snapshot xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT
Warning: Calling load_index for an already inited index. Old index is being deallocated.2023-02-27 12:17:38.437 | INFO     | annlite.index:_rebuild_index_from_local:794 - Load the model from xxx/parameters-2b445f0495bd404037d10b26cf101add
────────────────────────────────────────────────────────────────────────────────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────────────────────────────────────────────────────────────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│  ⛓     Protocol                    GRPC  │
│  🏠       Local           0.0.0.0:63125  │
│  🔒     Private   192.168.117.174:63125  │
│  🌍      Public     XXX.XX.XXX.XX:63125  │
╰──────────────────────────────────────────╯
2023-02-27 12:17:39.328 | INFO     | annlite.index:backup:657 - dump to local ...
2023-02-27 12:17:39.329 | INFO     | annlite.index:dump_model:680 - Save the parameters to xxx/parameters-2b445f0495bd404037d10b26cf101add
2023-02-27 12:17:39.333 | INFO     | annlite.index:dump_index:692 - Save the indexer to xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT
2023-02-27 12:17:39.335 | INFO     | annlite.index:dump_index:695 - Index path xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT already exists, will be overwritten
Backup
⠸ Waiting executor0 summary... ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/3 0:00:002023-02-27 12:17:39.919 | INFO     | annlite.index:restore:670 - restore Annlite from local
2023-02-27 12:17:39.920 | INFO     | annlite.index:_rebuild_index_from_local:770 - Load the indexer from snapshot xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT
Warning: Calling load_index for an already inited index. Old index is being deallocated.2023-02-27 12:17:39.929 | INFO     | annlite.index:_rebuild_index_from_local:794 - Load the model from xxx/parameters-2b445f0495bd404037d10b26cf101add
────────────────────────────────────────────────────────────────────────────────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────────────────────────────────────────────────────────────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│  ⛓     Protocol                    GRPC  │
│  🏠       Local           0.0.0.0:51194  │
│  🔒     Private   192.168.117.174:51194  │
│  🌍      Public     XXX.XX.XXX.XX:51194  │
╰──────────────────────────────────────────╯
2023-02-27 12:17:40.607 | INFO     | annlite.index:restore:670 - restore Annlite from local
2023-02-27 12:17:40.609 | INFO     | annlite.index:_rebuild_index_from_local:770 - Load the indexer from snapshot xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT
Warning: Calling load_index for an already inited index. Old index is being deallocated.2023-02-27 12:17:40.613 | INFO     | annlite.index:_rebuild_index_from_local:794 - Load the model from xxx/parameters-2b445f0495bd404037d10b26cf101add
Restore
⠸ Waiting executor0 summary... ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/3 0:00:002023-02-27 12:17:41.214 | INFO     | annlite.index:restore:670 - restore Annlite from local
2023-02-27 12:17:41.215 | INFO     | annlite.index:_rebuild_index_from_local:770 - Load the indexer from snapshot xxx/snapshot-2b445f0495bd404037d10b26cf101add/2023-02-27#11:17:34-SNAPSHOT
Warning: Calling load_index for an already inited index. Old index is being deallocated.2023-02-27 12:17:41.223 | INFO     | annlite.index:_rebuild_index_from_local:794 - Load the model from xxx/parameters-2b445f0495bd404037d10b26cf101add
────────────────────────────────────────────────────────────────────────────────────────────────────── 🎉 Flow is ready to serve! ──────────────────────────────────────────────────────────────────────────────────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│  ⛓     Protocol                    GRPC  │
│  🏠       Local           0.0.0.0:59358  │
│  🔒     Private   192.168.117.174:59358  │
│  🌍      Public     XXX.XX.XXX.XX:59358  │
╰──────────────────────────────────────────╯
ERROR  executor0/rep-0@23976 IntegrityError('UNIQUE constraint failed: table_0._doc_id')                                                                                                                             [02/27/23 12:17:41]
        add "--quiet-error" to suppress the exception details                                                                                                                                                                           
       Traceback (most recent call last):                                                                                                                                                                                               
         File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/runtimes/worker/__init__.py", line 222, in process_data                                                       
           result = await self._request_handler.handle(                                                                                                                                                                                 
         File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/runtimes/worker/request_handling.py", line 291, in handle                                                     
           return_data = await self._executor.__acall__(                                                                                                                                                                                
         File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/executors/__init__.py", line 352, in __acall__                                                                
           return await self.__acall_endpoint__(req_endpoint, **kwargs)                                                                                                                                                                 
         File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/executors/__init__.py", line 408, in __acall_endpoint__                                                       
           return await exec_func(                                                                                                                                                                                                      
         File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/executors/__init__.py", line 369, in exec_func                                                                
           return await get_or_reuse_loop().run_in_executor(None, functools.partial(func, self,                                                                                                                                         
         File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/concurrent/futures/thread.py", line 58, in run                                                                                         
           result = self.fn(*self.args, **self.kwargs)                                                                                                                                                                                  
         File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/executors/decorators.py", line 182, in arg_wrapper                                                            
           return fn(executor_instance, *args, **kwargs)                                                                                                                                                                                
         File "/Users/USERNAME/PycharmProjects/jina-multi-sentence-sse/delete_bug_minimal_example.py", line 204, in update                                                                                                             
           self._index[doc.id] = doc                                                                                                                                                                                                    
         File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/docarray/array/mixins/setitem.py", line 85, in __setitem__                                                               
           self._set_doc(index, value)                                                                                                                                                                                                  
         File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/docarray/array/storage/base/getsetdel.py", line 177, in _set_doc                                                         
           self._set_doc_by_id(_id, value)                                                                                                                                                                                              
         File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/docarray/array/storage/annlite/getsetdel.py", line 28, in _set_doc_by_id                                                 
           self._annlite.update(docs)                                                                                                                                                                                                   
         File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/annlite/index.py", line 326, in update                                                                                   
           return super(AnnLite, self).update(                                                                                                                                                                                          
         File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/annlite/container.py", line 380, in update                                                                               
           self.insert(new_data, new_cells, new_docs)                                                                                                                                                                                   
         File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/annlite/container.py", line 279, in insert                                                                               
           offsets = self.cell_table(cell_id).insert(docs)                                                                                                                                                                              
         File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/annlite/storage/table.py", line 250, in insert                                                                           
           cursor.execute(sql, values[-1])                                                                                                                                                                                              
       sqlite3.IntegrityError: UNIQUE constraint failed: table_0._doc_id                                                                                                                                                                
Traceback (most recent call last):
  File "/Users/USERNAME/PycharmProjects/jina-multi-sentence-sse/delete_bug_minimal_example.py", line 450, in <module>
    update()
  File "/Users/USERNAME/PycharmProjects/jina-multi-sentence-sse/delete_bug_minimal_example.py", line 423, in update
    f.post(on='/update', inputs=du)
  File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/clients/mixin.py", line 273, in post
    return run_async(
  File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/helper.py", line 1342, in run_async
    return asyncio.run(func(*args, **kwargs))
  File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/clients/mixin.py", line 264, in _get_results
    async for resp in c._get_results(*args, **kwargs):
  File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/clients/base/grpc.py", line 140, in _get_results
    callback_exec(
  File "/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/clients/helper.py", line 81, in callback_exec
    raise BadServer(response.header)
jina.excepts.BadServer: request_id: "968cd22f5a6045caaed2ab690a14c092"
status {
  code: ERROR
  description: "IntegrityError(\'UNIQUE constraint failed: table_0._doc_id\')"
  exception {
    name: "IntegrityError"
    args: "UNIQUE constraint failed: table_0._doc_id"
    stacks: "Traceback (most recent call last):\n"
    stacks: "  File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/runtimes/worker/__init__.py\", line 222, in process_data\n    result = await self._request_handler.handle(\n"
    stacks: "  File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/runtimes/worker/request_handling.py\", line 291, in handle\n    return_data = await self._executor.__acall__(\n"
    stacks: "  File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/executors/__init__.py\", line 352, in __acall__\n    return await self.__acall_endpoint__(req_endpoint, **kwargs)\n"
    stacks: "  File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/executors/__init__.py\", line 408, in __acall_endpoint__\n    return await exec_func(\n"
    stacks: "  File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/executors/__init__.py\", line 369, in exec_func\n    return await get_or_reuse_loop().run_in_executor(None, functools.partial(func, self,\n"
    stacks: "  File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/concurrent/futures/thread.py\", line 58, in run\n    result = self.fn(*self.args, **self.kwargs)\n"
    stacks: "  File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/jina/serve/executors/decorators.py\", line 182, in arg_wrapper\n    return fn(executor_instance, *args, **kwargs)\n"
    stacks: "  File \"/Users/USERNAME/PycharmProjects/jina-multi-sentence-sse/delete_bug_minimal_example.py\", line 204, in update\n    self._index[doc.id] = doc\n"
    stacks: "  File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/docarray/array/mixins/setitem.py\", line 85, in __setitem__\n    self._set_doc(index, value)\n"
    stacks: "  File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/docarray/array/storage/base/getsetdel.py\", line 177, in _set_doc\n    self._set_doc_by_id(_id, value)\n"
    stacks: "  File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/docarray/array/storage/annlite/getsetdel.py\", line 28, in _set_doc_by_id\n    self._annlite.update(docs)\n"
    stacks: "  File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/annlite/index.py\", line 326, in update\n    return super(AnnLite, self).update(\n"
    stacks: "  File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/annlite/container.py\", line 380, in update\n    self.insert(new_data, new_cells, new_docs)\n"
    stacks: "  File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/annlite/container.py\", line 279, in insert\n    offsets = self.cell_table(cell_id).insert(docs)\n"
    stacks: "  File \"/Users/USERNAME/miniconda3-intel/envs/JINAPROJECT/lib/python3.10/site-packages/annlite/storage/table.py\", line 250, in insert\n    cursor.execute(sql, values[-1])\n"
    stacks: "sqlite3.IntegrityError: UNIQUE constraint failed: table_0._doc_id\n"
    executor: "AnnLiteIndexer"
  }
}
exec_endpoint: "/update"
target_executor: ""

jemmyshin avatar jemmyshin commented on June 6, 2024

OK, I reproduced this error. I am curious about your use case, to see whether we can optimize something here. I believe the error comes from restoring and reloading.

tommykoctur avatar tommykoctur commented on June 6, 2024

Hi,

well, I wouldn't say that our use case can't be optimized. :)
We are doing text search, and our basic text unit is a sentence. Our root document is a long text, and its chunks are sentences with embeddings. Those chunks are stored in annlite.
If a root document changes, e.g. it gets shorter (from 10 sentences to 5), then the update method will update just 5 sentences in annlite and the other 5 will stay there, although they should be deleted.

The best feature for us would be update/delete operations based on parent_id, or any operations on nested structures, but this is not how annlite works.
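
To make the requested semantics concrete, here is a rough sketch of a "replace all chunks of a parent" operation. It uses a plain SQLite table as a stand-in store, not annlite, and the `chunks` table, its columns, and the function names are all hypothetical.

```python
import sqlite3


def create_store():
    # In-memory stand-in for a chunk store keyed by chunk_id.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chunks ("
        "chunk_id TEXT PRIMARY KEY, "
        "parent_id TEXT NOT NULL, "
        "text TEXT NOT NULL)"
    )
    return conn


def replace_chunks(conn, parent_id, sentences):
    """Replace every chunk of `parent_id` with `sentences` in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM chunks WHERE parent_id = ?", (parent_id,))
        conn.executemany(
            "INSERT INTO chunks (chunk_id, parent_id, text) VALUES (?, ?, ?)",
            [(f"{parent_id}-{i}", parent_id, s) for i, s in enumerate(sentences)],
        )


def count_chunks(conn, parent_id):
    row = conn.execute(
        "SELECT COUNT(*) FROM chunks WHERE parent_id = ?", (parent_id,)
    ).fetchone()
    return row[0]
```

With this shape, shrinking a document from 10 sentences to 5 leaves exactly 5 chunks behind, because the stale ones are removed by parent_id before the new ones are written.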

tommykoctur avatar tommykoctur commented on June 6, 2024

@jemmyshin is there any ETA for the fix? (I understand that it depends on your availability, but we would like to make plans for our project.)

jemmyshin avatar jemmyshin commented on June 6, 2024

Sorry for the late reply, we will start working on this issue next week. @tommykoctur

tommykoctur avatar tommykoctur commented on June 6, 2024

Thank you @jemmyshin. If I can help with anything, just let me know.
Thanks

tommykoctur avatar tommykoctur commented on June 6, 2024

Hi, any update on this?

Thanks

jemmyshin avatar jemmyshin commented on June 6, 2024

Hi, sorry for the late reply and the delay; we are fully occupied with other tickets this week. Our engineers will work on it this Friday, and hopefully we can fix it by next week. Thanks!

tommykoctur avatar tommykoctur commented on June 6, 2024

Hi @jemmyshin, any update on this? Thanks

numb3r3 avatar numb3r3 commented on June 6, 2024

@tommykoctur We have already identified where this error comes from and are working on a fix. We just need one or two days. Thank you for your patience.

numb3r3 avatar numb3r3 commented on June 6, 2024

@tommykoctur This issue will be addressed by PR #222.

In addition, due to an upstream issue in grpcio, your test script cannot work because it runs a sequence of gRPC services (i.e., Jina Flows). I therefore adapted it as follows:

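import time

import numpy as np
# Assumed import path (the one used by annlite's own tests); the original
# script may define its own AnnLiteIndexer executor instead.
from annlite.executor import AnnLiteIndexer
from docarray import Document, DocumentArray
from jina import Flow

# NOTE: `da` is the DocumentArray of initial documents from the original
# minimal example; it is not shown in this thread.


def run_all():
    f = Flow().add(uses=AnnLiteIndexer, uses_with={'data_path': './data'})

    # Run every step inside one Flow instead of starting a new Flow per step.
    with f:
        print('==> index')
        f.post(on='/clear')
        f.post(on='/index', inputs=da)
        # wait for the index thread to finish
        time.sleep(5)
        f.post(on='/dump')

        print('==> backup')
        f.post(on='/backup')

        print('==> restore')
        f.post(on='/restore')

        print('==> delete')
        delete_list = ["d1"]
        f.post(on='/delete', parameters={'ids': delete_list})
        print('==> backup')
        f.post(on='/backup')

        print('==> restore')
        f.post(on='/restore')

        # Update the key that was deleted above.
        print('==> update')
        du = DocumentArray([
            Document(id="d1", text="updated data 1", embedding=np.array([1, 2, 3, 4, 7]),
                     tags={"tag_id": "updated_d1"})])
        f.post(on='/update', inputs=du)
        print('==> backup')
        f.post(on='/backup')

        print('==> restore')
        f.post(on='/restore')
        print('==> search')
        f.post(on='/search', inputs=DocumentArray([Document(embedding=np.array([1, 2, 3, 4, 7]))]))


if __name__ == "__main__":
    run_all()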

tommykoctur avatar tommykoctur commented on June 6, 2024

Hi, I can confirm that the issue is solved. Thank you.
