Describe the bug I noticed that when using <code class="notranslat

The <a href="https://aws-sdk-pandas.readthedocs.io/en/stable/stubs/awswrangler.athena.

dtype_backend overrides categories (aws-sdk-pandas issue, closed)

antbz commented on May 28, 2024

dtype_backend overrides categories

from aws-sdk-pandas.

Comments (3)

jaidisido commented on May 28, 2024

The wr.athena.read_sql_query API has a pyarrow_additional_kwargs argument which is forwarded to the to_pandas method. If nothing is supplied, some sane defaults are applied.

If you wish to override these defaults, to remove types_mapper for example, you can do something along the lines of:

data = wr.athena.read_sql_query(
    sql="SELECT id, options FROM my_table",
    database="my-database",
    categories=["options"],
    pyarrow_additional_kwargs={'types_mapper': None},
)

antbz commented on May 28, 2024

@jaidisido While that is a nice suggestion, it does not work either, because of how _fetch_parquet_result works internally. When you specify pyarrow_additional_kwargs, the categories are never added to the kwargs actually passed on to pyarrow:

aws-sdk-pandas/awswrangler/athena/_read.py

Lines 150 to 153 in 4816e5e

    
if not pyarrow_additional_kwargs:
    pyarrow_additional_kwargs = {}
    if categories:
        pyarrow_additional_kwargs["categories"] = categories

For it to work correctly you need to pass categories as additional kwargs as well:

data = wr.athena.read_sql_query(
    sql="SELECT id, options FROM my_table",
    database="my-database",
    pyarrow_additional_kwargs={'types_mapper': None, 'categories': ['options']},
)

I'm not sure if the behaviour in _fetch_parquet_result is intentional or not, but as it stands, the categories parameter effectively does not do what it is supposed to. We should either document this better or find a way to make it compatible by default.
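To make the effect of that nesting concrete, here is a minimal standalone sketch. It does not call awswrangler; `build_kwargs_buggy` is a hypothetical helper that only mirrors the control flow of the lines quoted from `_read.py`, so treat it as an illustration under that assumption rather than the library's actual code path.

```python
def build_kwargs_buggy(categories=None, pyarrow_additional_kwargs=None):
    """Hypothetical stand-in that mirrors the nesting quoted from _read.py."""
    if not pyarrow_additional_kwargs:
        pyarrow_additional_kwargs = {}
        # This check sits inside the "no kwargs supplied" branch, so it
        # only runs when the caller passed no kwargs of their own.
        if categories:
            pyarrow_additional_kwargs["categories"] = categories
    return pyarrow_additional_kwargs


# Only `categories` supplied: it is forwarded.
only_categories = build_kwargs_buggy(categories=["options"])

# Both supplied: `categories` is silently dropped.
both = build_kwargs_buggy(
    categories=["options"],
    pyarrow_additional_kwargs={"types_mapper": None},
)

print(only_categories)  # {'categories': ['options']}
print(both)             # {'types_mapper': None}
```

This is why the workaround above has to repeat `categories` inside `pyarrow_additional_kwargs`.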

jaidisido commented on May 28, 2024

I can't think of a reason why it's set up that way, so I believe it's just badly indented. #2701 should fix that.
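For illustration, de-indenting the `categories` check in the snippet quoted from `_read.py` could look roughly like the sketch below. `build_kwargs_fixed` is a hypothetical helper name, and copying the caller's dict is a choice made for this sketch; the actual change in #2701 may differ.

```python
def build_kwargs_fixed(categories=None, pyarrow_additional_kwargs=None):
    """Hypothetical sketch: the categories check runs unconditionally."""
    # Copy so the caller's dict is not mutated (a choice of this sketch).
    kwargs = dict(pyarrow_additional_kwargs or {})
    # No longer nested under the emptiness check, so categories are
    # forwarded whether or not other kwargs were supplied.
    if categories:
        kwargs["categories"] = categories
    return kwargs


user_kwargs = {"types_mapper": None}
merged = build_kwargs_fixed(
    categories=["options"],
    pyarrow_additional_kwargs=user_kwargs,
)
print(merged)       # {'types_mapper': None, 'categories': ['options']}
print(user_kwargs)  # {'types_mapper': None}
```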
