
Comments (15)

JohannesMessner commented on June 15, 2024

@nikhilmakan02 the PR is ready to go, but until it is released you can circumvent the problem by downgrading your numpy version. Thanks for flagging this!

from docarray.

JoanFM commented on June 15, 2024

Can you give an example of a Doc type that would not work?

nikhilmakan02 commented on June 15, 2024

Here is an example taken from the docs, which I have modified to include an embeddings column. Hope that makes it clearer. This example works fine without the embeddings field.

You can even create a doc list with the embeddings field using a Python loop as described above, then use the to_dataframe() method to export the doc list. But you then can't go back and recreate the doc list from that dataframe using the from_dataframe() method.

Note that I am using doc list and doc vec interchangeably here; the problem is the same for both.

import pandas as pd
import numpy as np
from docarray.typing import NdArray

from docarray import BaseDoc, DocVec


class Person(BaseDoc):
    name: str
    follower: int
    embeddings: NdArray[32]


df = pd.DataFrame(
    data=[["Maria", 12345, np.zeros(32)], ["Jake", 54321, np.zeros(32)]],
    columns=["name", "follower", "embeddings"],
)

docs = DocVec[Person].from_dataframe(df)

JoanFM commented on June 15, 2024

What is the error you see? I cannot reproduce the issue. My pandas version is 2.0.3. What is yours?

nikhilmakan02 commented on June 15, 2024

python==3.11
docarray==0.39.0
pandas==2.1.1

Here is the traceback. I downgraded to pandas==2.0.3 and got the same issue.

Traceback (most recent call last):
  File "/workspaces/address-vector-search/archive/doc_array_github_issue.py", line 19, in <module>
    docs = DocVec[Person].from_dataframe(df)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/docarray/array/doc_vec/io.py", line 451, in from_dataframe
    return cls(super().from_dataframe(df), tensor_type=tensor_type)  # type: ignore
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/docarray/array/doc_list/io.py", line 519, in from_dataframe
    doc_dict = _access_path_dict_to_nested_dict(access_path2val)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/docarray/helper.py", line 79, in _access_path_dict_to_nested_dict
    value=value if value not in ['', 'None'] else None,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
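
For readers following the traceback: the failing expression is `value not in ['', 'None']`. A membership test on a list compares the value with each element using `==` and then takes the truth value of the result. If `==` returns an element-wise result instead of a single bool, taking its truth value fails. The sketch below illustrates that mechanism only. `FakeArray` and `membership_error` are hypothetical names invented for this illustration (a stand-in so the snippet runs without NumPy installed); they are not part of NumPy or docarray.

```python
class FakeArray:
    """Hypothetical stand-in for a multi-element array (not a NumPy class)."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Element-wise comparison: returns another FakeArray, not a bool.
        return FakeArray(v == other for v in self.values)

    def __bool__(self):
        # Mimic the ambiguity error for arrays with more than one element.
        if len(self.values) > 1:
            raise ValueError("truth value of a multi-element array is ambiguous")
        return bool(self.values[0]) if self.values else False


def membership_error(value):
    """Return the message raised by `value in ['', 'None']`, or None if it succeeds."""
    try:
        value in ['', 'None']  # same shape as the check in the traceback
    except ValueError as exc:
        return str(exc)
    return None


print(membership_error(FakeArray([0.0] * 32)))  # an error message
print(membership_error("Maria"))                # None: plain strings are fine
```

This would also be consistent with the report that the example works without the embeddings field, since only the array-valued column reaches the ambiguous comparison.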

JoanFM commented on June 15, 2024

Can you share all the installed packages? I have not managed to reproduce it.

nikhilmakan02 commented on June 15, 2024

I can hopefully do a bit better to recreate the issue for you. I am working in a VS Code dev container, which is just a Docker container.

So you can recreate the issue using Docker, VS Code and the VS Code dev container extension, or by whatever other means you prefer. The attachment below contains the Dockerfile, the requirements.txt file and the code already shown above. There is also a devcontainer.json file, which is used by the VS Code dev container extension.

github-issue-docarry-df.zip

Hopefully this should create an isolated instance of the problem that you can reproduce on your end.

JoanFM commented on June 15, 2024

Thanks, I will try it.

JoanFM commented on June 15, 2024

Reproduced in the Docker container, thanks.

This is the list of dependencies:

annotated-types==0.6.0
docarray==0.39.0
markdown-it-py==3.0.0
mdurl==0.1.2
mypy-extensions==1.0.0
numpy==1.26.1
orjson==3.9.9
pandas==2.1.1
pydantic==2.4.2
pydantic_core==2.10.1
Pygments==2.16.1
python-dateutil==2.8.2
pytz==2023.3.post1
rich==13.6.0
six==1.16.0
types-requests==2.31.0.9
typing-inspect==0.9.0
typing_extensions==4.8.0
tzdata==2023.3
urllib3==2.0.6

JoanFM commented on June 15, 2024

Okay, it seems the reason is that the numpy version is different from the one I have. I will try to fix this, thank you very much.

nikhilmakan02 commented on June 15, 2024

Amazing, thanks. Out of interest, what does the following sentence in the docs mean?

"List-like fields (including field of type DocList) are not supported."

I initially raised this as a feature request because I thought it meant you couldn't have lists in a field.

JoanFM commented on June 15, 2024

Oh, this may be part of the issue. However, there should be a better error message.

JohannesMessner commented on June 15, 2024

Hey @nikhilmakan02, this is indeed a bug. The sentence "List-like fields (including field of type DocList) are not supported." just means that the Document schema that defines your DocList/DocVec cannot contain another DocList/DocVec, which is not the case in your example.

I am working on a fix right now, if things go to plan it should be out today.

nikhilmakan02 commented on June 15, 2024

@JohannesMessner amazing work! Thank you very much for addressing this.

bionicles commented on June 15, 2024

@JohannesMessner @nikhilmakan02 @JoanFM FYI, I found an edge case for you.

from NLM UMLS:
"C3842509","Finding","0 - none; 0 = None; 0= None; 0=None","None"
val: <class 'str'> = C3842509
val: <class 'str'> = Finding
val: <class 'str'> = 0 - none; 0 = None; 0= None; 0=None
val: <class 'str'> = None

def _is_none_like(val: Any) -> bool:
    """
    :param val: any value
    :return: true iff `val` equals to `None`, `'None'` or `''`
    """
    # Convoluted implementation, but fixes https://github.com/docarray/docarray/issues/1821

    # tensor-like types can have unexpected (= broadcast) `==`/`in` semantics,
    # so treat separately
    is_np_arr = isinstance(val, np.ndarray)
    if is_np_arr:
        return False

    is_torch_tens = is_torch_available() and isinstance(val, torch.Tensor)
    if is_torch_tens:
        return False

    is_tf_tens = is_tf_available() and isinstance(val, tf.Tensor)
    if is_tf_tens:
        return False

    is_jax_arr = is_jax_available() and isinstance(val, jax.numpy.ndarray)
    if is_jax_arr:
        return False

    # "normal" case
+   print(f"val: {type(val)} = {val}")
    return val in ['', 'None', None]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[31], line 4
      1 conditions_doc_list = docarray.DocList[Condition].from_dataframe(
      2     df=conditions_df,
      3 )
----> 4 findings_doc_list = docarray.DocList[Finding].from_dataframe(
      5     df=findings_df,
      6 )
      7 associations_doc_list = docarray.DocList[Association].from_dataframe(
      8     df=associations_df,
      9 )

File ~/miniconda3/envs/py310/lib/python3.10/site-packages/docarray/array/doc_list/io.py:518, in IOMixinDocList.from_dataframe(cls, df)
    516     access_path2val = row._asdict()
    517     access_path2val.pop('index', None)
--> 518     doc_dict = _access_path_dict_to_nested_dict(access_path2val)
    519     docs.append(doc_type.parse_obj(doc_dict))
    521 return docs

File ~/miniconda3/envs/py310/lib/python3.10/site-packages/docarray/helper.py:124, in _access_path_dict_to_nested_dict(access_path2val)
    120 nested_dict: Dict[Any, Any] = {}
    121 for access_path, value in access_path2val.items():
    122     field2val = _access_path_to_dict(
    123         access_path=access_path,
--> 124         value=None if _is_none_like(value) else value,
    125     )
    126     _update_nested_dicts(to_update=nested_dict, update_with=field2val)
    127 return nested_dict

File ~/miniconda3/envs/py310/lib/python3.10/site-packages/docarray/helper.py:99, in _is_none_like(val)
     96     return False
     98 # "normal" case
---> 99 return val in ['', 'None', None]

File missing.pyx:419, in pandas._libs.missing.NAType.__bool__()

TypeError: boolean value of NA is ambiguous

Does anybody know how to get docarray to load documents containing "None" as a string?
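
A note on the traceback above: the `TypeError` is raised inside `val in ['', 'None', None]`, which suggests that `val` was a pandas NA value at that point. Comparing NA with `==` yields NA again, and NA has no truth value. One possible way to avoid this, sketched below, is to check the type before comparing, so that `==` is never called on an unknown object. This is only an illustration, not the fix adopted by docarray. `FakeNA` and `is_none_like_safe` are hypothetical names made up here, and `FakeNA` is a stand-in so the snippet runs without pandas.

```python
from typing import Any


class FakeNA:
    """Hypothetical stand-in for a missing-value marker such as pandas' NA."""

    def __eq__(self, other):
        return self  # comparisons propagate the missing value

    def __bool__(self):
        raise TypeError("boolean value of NA is ambiguous")


def is_none_like_safe(val: Any) -> bool:
    """True iff `val` is None, '' or 'None'; never calls `==` on unknown types."""
    if val is None:
        return True
    if isinstance(val, str):
        return val in ('', 'None')
    # Anything else (numbers, arrays, NA markers) is not treated as none-like.
    return False


for candidate in [None, '', 'None', 'Finding', 0, FakeNA()]:
    print(type(candidate).__name__, is_none_like_safe(candidate))
```

Note that this sketch still maps the string `'None'` to a none-like value, matching the docstring of `_is_none_like` quoted above, so it does not by itself answer the question of keeping `"None"` as a string. Whether a missing-value marker should count as none-like is a separate design decision; here it is assumed not to.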
