@nikhilmakan02, the PR is ready to go, but until it is released you can work around the problem by downgrading your numpy version. Thanks for flagging this!
Can you give an example of a Doc type that would not work?
Here is an example taken from the docs, which I have modified to include an embeddings column. I hope that makes it clearer. This example works fine without the embeddings field.
You can even create a doc list with the embeddings field using a Python loop, as described above, and then use the to_dataframe() method to export it. But you then can't go back and recreate the doc list from that dataframe using the from_dataframe() method.
Note: I am using DocList and DocVec interchangeably here; the problem is the same for both.
```python
import pandas as pd
import numpy as np
from docarray.typing import NdArray
from docarray import BaseDoc, DocVec


class Person(BaseDoc):
    name: str
    follower: int
    embeddings: NdArray[32]  # the field that breaks from_dataframe()


df = pd.DataFrame(
    data=[["Maria", 12345, np.zeros(32)], ["Jake", 54321, np.zeros(32)]],
    columns=["name", "follower", "embeddings"],
)
docs = DocVec[Person].from_dataframe(df)
```
What is the error you see? I cannot reproduce the issue. My pandas version is 2.0.3. What is yours?
python==3.11
docarray==0.39.0
pandas==2.1.1
Here is the traceback. I also downgraded to pandas==2.0.3 and got the same issue.
```
Traceback (most recent call last):
  File "/workspaces/address-vector-search/archive/doc_array_github_issue.py", line 19, in <module>
    docs = DocVec[Person].from_dataframe(df)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/docarray/array/doc_vec/io.py", line 451, in from_dataframe
    return cls(super().from_dataframe(df), tensor_type=tensor_type) # type: ignore
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/docarray/array/doc_list/io.py", line 519, in from_dataframe
    doc_dict = _access_path_dict_to_nested_dict(access_path2val)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/docarray/helper.py", line 79, in _access_path_dict_to_nested_dict
    value=value if value not in ['', 'None'] else None,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
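The traceback shows the steps inside `from_dataframe`:

- It turns each row into a dict keyed by access path.
- It tests every value with `value not in ['', 'None']`.

The sketch below reproduces that step. It is a hypothetical re-implementation, not docarray's code. The helper names and the `__` separator for nested access paths are assumptions. It checks for arrays before any string comparison, so an embedding column never reaches NumPy's element-wise `==`:

```python
from typing import Any, Dict

import numpy as np


def is_none_like(value: Any) -> bool:
    # Arrays are never treated as "none-like". Checking them first avoids
    # NumPy's element-wise `==`, which makes `value in [...]` ambiguous.
    if isinstance(value, np.ndarray):
        return False
    return value is None or value in ('', 'None')


def access_paths_to_nested(row: Dict[str, Any], sep: str = '__') -> Dict[str, Any]:
    # Hypothetical helper: {'meta__note': x} -> {'meta': {'note': x}}
    nested: Dict[str, Any] = {}
    for path, value in row.items():
        *parents, leaf = path.split(sep)
        target = nested
        for key in parents:
            target = target.setdefault(key, {})
        target[leaf] = None if is_none_like(value) else value
    return nested


row = {'name': 'Maria', 'follower': 12345, 'embeddings': np.zeros(32), 'meta__note': ''}
doc_dict = access_paths_to_nested(row)
print(doc_dict['embeddings'].shape, doc_dict['meta'])  # (32,) {'note': None}
```

The array check comes first for a reason. For an `ndarray`, `x in [...]` calls `bool()` on an element-wise comparison result, which is exactly the ambiguous truth value in the error above.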
Can you share all the installed packages? I cannot reproduce the issue.
I can hopefully do a bit better at recreating the issue for you. I am working in a VS Code dev container, which is just a Docker container.
So you can recreate the issue using Docker, VS Code and the VS Code Dev Containers extension, or by any other means you prefer. The attachment below contains the Dockerfile, the requirements.txt file and the code already shown above. There is also a devcontainer.json file, which is used by the VS Code Dev Containers extension.
Hopefully this creates an isolated instance of the problem that you can reproduce on your end.
Thanks, I will try it.
Reproduced in the Docker container, thanks.
This is the list of dependencies:
annotated-types==0.6.0
docarray==0.39.0
markdown-it-py==3.0.0
mdurl==0.1.2
mypy-extensions==1.0.0
numpy==1.26.1
orjson==3.9.9
pandas==2.1.1
pydantic==2.4.2
pydantic_core==2.10.1
Pygments==2.16.1
python-dateutil==2.8.2
pytz==2023.3.post1
rich==13.6.0
six==1.16.0
types-requests==2.31.0.9
typing-inspect==0.9.0
typing_extensions==4.8.0
tzdata==2023.3
urllib3==2.0.6
Okay, it seems the cause is that the numpy version is different from the one I have. I will try to fix this, thank you very much.
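A likely reason the version matters: NumPy 1.25 finalized the deprecated `==`/`!=` behavior. Comparing an array with an incomparable value, such as a string, now returns an element-wise all-`False` array. Before 1.25 it returned a scalar `False` plus a warning. Python's `in` then calls `bool()` on that 32-element array, which raises. The snippet below needs only NumPy and is a sketch for checking this behavior on a given installation:

```python
import warnings

import numpy as np

embedding = np.zeros(32)

with warnings.catch_warnings():
    # Older NumPy emits a FutureWarning/DeprecationWarning here instead of raising.
    warnings.simplefilter("ignore")
    try:
        # Same membership test as docarray's helper in the ValueError traceback.
        result = embedding in ['', 'None']
        outcome = f"no error, result={result}"
    except ValueError as err:
        outcome = f"ValueError: {err}"

print(np.__version__, outcome)
```

On NumPy older than 1.25 it should print `no error, result=False`, with the warning silenced. That is consistent with downgrading numpy as a workaround.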
Amazing, thanks. Out of interest, what does the following sentence in the docs mean?
"List-like fields (including field of type DocList) are not supported."
I initially raised this as a feature request as I thought that meant you couldn't have lists in a field.
Oh, this may be part of the issue. However, there should be a better error message.
Hey @nikhilmakan02, this is indeed a bug. The sentence "List-like fields (including field of type DocList) are not supported." just means that the Document schema that defines your DocList/DocVec cannot contain another DocList/DocVec as a field, which is not the case in your example.
I am working on a fix right now. If things go to plan, it should be out today.
@JohannesMessner amazing work! Thank you very much for addressing this.
@JohannesMessner @nikhilmakan02 @JoanFM FYI, I found an edge case for you in data from NLM UMLS:
"C3842509","Finding","0 - none; 0 = None; 0= None; 0=None","None"
val: <class 'str'> = C3842509
val: <class 'str'> = Finding
val: <class 'str'> = 0 - none; 0 = None; 0= None; 0=None
val: <class 'str'> = None
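```python
from typing import Any

import numpy as np

# Assumed import location of the availability helpers, mirroring docarray/helper.py
from docarray.utils._internal.misc import (
    is_jax_available,
    is_tf_available,
    is_torch_available,
)

if is_torch_available():
    import torch
if is_tf_available():
    import tensorflow as tf
if is_jax_available():
    import jax


def _is_none_like(val: Any) -> bool:
    """
    :param val: any value
    :return: true iff `val` equals to `None`, `'None'` or `''`
    """
    # Convoluted implementation, but fixes https://github.com/docarray/docarray/issues/1821
    # tensor-like types can have unexpected (= broadcast) `==`/`in` semantics,
    # so treat separately
    is_np_arr = isinstance(val, np.ndarray)
    if is_np_arr:
        return False
    is_torch_tens = is_torch_available() and isinstance(val, torch.Tensor)
    if is_torch_tens:
        return False
    is_tf_tens = is_tf_available() and isinstance(val, tf.Tensor)
    if is_tf_tens:
        return False
    is_jax_arr = is_jax_available() and isinstance(val, jax.numpy.ndarray)
    if is_jax_arr:
        return False
    # "normal" case
    print(f"val: {type(val)} = {val}")  # local debug line that produced the output above
    return val in ['', 'None', None]
```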
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[31], line 4
      1 conditions_doc_list = docarray.DocList[Condition].from_dataframe(
      2     df=conditions_df,
      3 )
----> 4 findings_doc_list = docarray.DocList[Finding].from_dataframe(
      5     df=findings_df,
      6 )
      7 associations_doc_list = docarray.DocList[Association].from_dataframe(
      8     df=associations_df,
      9 )

File ~/miniconda3/envs/py310/lib/python3.10/site-packages/docarray/array/doc_list/io.py:518, in IOMixinDocList.from_dataframe(cls, df)
    516     access_path2val = row._asdict()
    517     access_path2val.pop('index', None)
--> 518     doc_dict = _access_path_dict_to_nested_dict(access_path2val)
    519     docs.append(doc_type.parse_obj(doc_dict))
    521 return docs

File ~/miniconda3/envs/py310/lib/python3.10/site-packages/docarray/helper.py:124, in _access_path_dict_to_nested_dict(access_path2val)
    120 nested_dict: Dict[Any, Any] = {}
    121 for access_path, value in access_path2val.items():
    122     field2val = _access_path_to_dict(
    123         access_path=access_path,
--> 124         value=None if _is_none_like(value) else value,
    125     )
    126     _update_nested_dicts(to_update=nested_dict, update_with=field2val)
    127 return nested_dict

File ~/miniconda3/envs/py310/lib/python3.10/site-packages/docarray/helper.py:99, in _is_none_like(val)
     96     return False
     98 # "normal" case
---> 99 return val in ['', 'None', None]

File missing.pyx:419, in pandas._libs.missing.NAType.__bool__()

TypeError: boolean value of NA is ambiguous
```
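The final frame points at pandas' missing-value scalar. `pd.NA == ''` evaluates to `pd.NA`, and `bool(pd.NA)` raises. So any column holding `pd.NA` trips the same membership test. A nullable pandas dtype is one way such a column can arise, but that this is where the NA came from in the original data is an assumption.

The sketch below reproduces the error and shows a hypothetical stricter check. It is not docarray's code. It also does not change the behavior of treating the string 'None' as None:

```python
import pandas as pd


def naive_is_none_like(val):
    # Same membership test as the `_is_none_like` "normal" case shown above.
    return val in ['', 'None', None]


def strict_is_none_like(val):
    # Hypothetical alternative: only a real None or an exact string match
    # counts, so pandas' NA is never coerced to bool.
    return val is None or (isinstance(val, str) and val in ('', 'None'))


try:
    naive_is_none_like(pd.NA)
    naive_outcome = "no error"
except TypeError as err:
    naive_outcome = f"TypeError: {err}"

print(naive_outcome)  # TypeError: boolean value of NA is ambiguous
print(strict_is_none_like(pd.NA), strict_is_none_like('None'))  # False True
```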
Does anybody know how to get docarray to load documents containing "None" as a string?