Comments (3)
The wr.athena.read_sql_query API has a pyarrow_additional_kwargs argument, which is forwarded to the to_pandas method. If nothing is supplied, some sane defaults are applied. If you wish to override these defaults, for example to remove types_mapper, you can do something along the lines of:
data = wr.athena.read_sql_query(
    sql="SELECT id, options FROM my_table",
    database="my-database",
    categories=["options"],
    pyarrow_additional_kwargs={'types_mapper': None},
)
from aws-sdk-pandas.
@jaidisido While that is a nice suggestion, it does not work either, because of how _fetch_parquet_result works internally. When you specify pyarrow_additional_kwargs, the categories are never added to the kwargs actually passed on to pyarrow:
aws-sdk-pandas/awswrangler/athena/_read.py
Lines 150 to 153 in 4816e5e
For it to work correctly you need to pass categories as additional kwargs as well:
data = wr.athena.read_sql_query(
    sql="SELECT id, options FROM my_table",
    database="my-database",
    pyarrow_additional_kwargs={'types_mapper': None, 'categories': ['options']},
)
I'm not sure whether the behaviour in _fetch_parquet_result is intentional, but as it stands, the categories parameter is effectively ignored whenever pyarrow_additional_kwargs is also passed. We should either document this better or find a way to make the two compatible by default.
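To illustrate the behaviour described above, here is a minimal sketch. It is not the actual code from _read.py: the function names build_kwargs_current and build_kwargs_merged and the "default-mapper" placeholder are hypothetical, and the logic is only my reading of the behaviour, assuming categories are attached solely in the branch that builds the defaults.

```python
from typing import Any, Dict, List, Optional


def build_kwargs_current(
    categories: Optional[List[str]],
    pyarrow_additional_kwargs: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Hypothetical model of the reported behaviour, not library code."""
    if pyarrow_additional_kwargs is None:
        # Defaults branch: the only place where categories get attached.
        kwargs: Dict[str, Any] = {"types_mapper": "default-mapper"}  # placeholder value
        if categories:
            kwargs["categories"] = categories
        return kwargs
    # User-supplied kwargs are used as-is, so categories are silently dropped.
    return dict(pyarrow_additional_kwargs)


def build_kwargs_merged(
    categories: Optional[List[str]],
    pyarrow_additional_kwargs: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """One possible way to make both arguments compatible (an assumption)."""
    if pyarrow_additional_kwargs is None:
        kwargs: Dict[str, Any] = {"types_mapper": "default-mapper"}  # placeholder value
    else:
        kwargs = dict(pyarrow_additional_kwargs)
    # Attach categories in both branches, without overriding an explicit entry.
    if categories:
        kwargs.setdefault("categories", categories)
    return kwargs
```

With categories=["options"] and pyarrow_additional_kwargs={"types_mapper": None}, the first function returns a dict without a "categories" key, while the second keeps both entries. Using setdefault means an explicit "categories" entry in pyarrow_additional_kwargs would still win, which is a design choice rather than something the library is known to do.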
from aws-sdk-pandas.
I can't think of a reason why it's set up that way, so I believe it's just badly indented. #2701 should fix that.
from aws-sdk-pandas.