Hey,
One of the issues with formats like CSV is that, unlike Parquet or ORC, they don't store metadata such as column types. To infer the types and map them to the corresponding Redshift types, we need to load the whole table.
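To make the type-inference point concrete, here is a minimal sketch that assumes a deliberately simplified mapping to only three Redshift types (`BIGINT`, `DOUBLE PRECISION`, `VARCHAR`). The helpers `infer_redshift_type` and `infer_schema` are hypothetical and are not how aws-sdk-pandas infers types. Because a CSV file carries no schema, every row has to be scanned: a single value near the end of the file can change a column's type.

```python
# Hypothetical, simplified type inference; NOT aws-sdk-pandas' actual logic.
import csv
import io


def infer_redshift_type(values: list) -> str:
    """Return the narrowest of BIGINT, DOUBLE PRECISION, VARCHAR that fits every value."""

    def parses(cast, value) -> bool:
        try:
            cast(value)
            return True
        except (TypeError, ValueError):
            return False

    # Empty strings (and None for short rows) are treated as NULLs and ignored.
    non_empty = [v for v in values if v not in ("", None)]
    if not non_empty:
        return "VARCHAR"
    if all(parses(int, v) for v in non_empty):
        return "BIGINT"
    if all(parses(float, v) for v in non_empty):
        return "DOUBLE PRECISION"
    return "VARCHAR"


def infer_schema(csv_text: str) -> dict:
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)  # the entire file has to be read before any type is final
    return {col: infer_redshift_type([row[col] for row in rows]) for col in reader.fieldnames or []}


print(infer_schema("id,amount\n1,10\n2,20\n"))
# {'id': 'BIGINT', 'amount': 'BIGINT'}
print(infer_schema("id,amount\n1,10\n2,20\n3,20.5\n"))
# {'id': 'BIGINT', 'amount': 'DOUBLE PRECISION'}
```

With only the first two data rows, `amount` looks like `BIGINT`; the third row (`20.5`) widens it to `DOUBLE PRECISION`. A Parquet or ORC file avoids this scan because the column types are stored in the file's own metadata.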
As such, if `redshift.copy_from_files` supported CSV files, it would be equivalent to just loading the CSV data using `s3.read_csv` and then invoking `redshift.copy` with the DataFrame. This also presents a simple workaround for your issue:

```python
import awswrangler as wr
df = wr.s3.read_csv("s3://...", ...)
con = wr.redshift.connect("my-glue-connection")  # placeholder Glue connection name
wr.redshift.copy(df=df, path=temp_path, con=con, table=table_name, schema=schema_name)
```
Let me know if this helps,
Leon
from aws-sdk-pandas.
Yeah, I was thinking about that, and it makes sense. On the other hand, a Postgres unload (or any other operation that produces CSV data, for that matter) often yields very large files (> 20 GB), which makes loading them locally infeasible in some cases and inefficient in others (after all, that is the point of large, parallel bulk operations, right?).
Would it make sense to allow CSV files as long as you pass in the schema manually? The benefit would be reusing how the package does merges/upserts behind the scenes (otherwise I end up implementing that on my own).
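The proposal above (supply the schema yourself and let Redshift load the CSV files directly from S3) could look roughly like the following minimal sketch. It assumes a staging-table delete/insert upsert, which is a common Redshift pattern but not necessarily what aws-sdk-pandas runs internally. `build_csv_upsert_sql` and every value in the usage lines (bucket prefix, IAM role ARN, table and column names) are hypothetical, and `IGNOREHEADER 1` assumes each exported file has exactly one header row. The helper only builds SQL strings; executing them over a Redshift connection is left out.

```python
# Hypothetical helper: builds Redshift SQL for a CSV upsert with a caller-supplied schema.
# The staging-table delete/insert pattern is an assumption, not the library's internals.


def build_csv_upsert_sql(
    table: str,
    schema: dict,
    primary_keys: list,
    s3_path: str,
    iam_role: str,
) -> list:
    """Return SQL statements that upsert the CSV files under s3_path into table.

    Identifiers are interpolated as-is (no quoting or validation), so only
    pass trusted, unqualified table and column names.
    """
    staging = f"{table}_staging"
    column_defs = ", ".join(f"{name} {rs_type}" for name, rs_type in schema.items())
    column_list = ", ".join(schema)
    key_match = " AND ".join(f"{table}.{k} = {staging}.{k}" for k in primary_keys)
    return [
        f"CREATE TABLE IF NOT EXISTS {table} ({column_defs});",
        f"CREATE TEMP TABLE {staging} ({column_defs});",
        # Redshift reads the objects under the S3 prefix itself; nothing is loaded locally.
        f"COPY {staging} FROM '{s3_path}' IAM_ROLE '{iam_role}' FORMAT AS CSV IGNOREHEADER 1;",
        "BEGIN;",
        # Remove target rows that are about to be replaced, then insert the new versions.
        f"DELETE FROM {table} USING {staging} WHERE {key_match};",
        f"INSERT INTO {table} ({column_list}) SELECT {column_list} FROM {staging};",
        "COMMIT;",
        f"DROP TABLE {staging};",
    ]


for stmt in build_csv_upsert_sql(
    "orders",  # hypothetical table
    {"id": "BIGINT", "amount": "DOUBLE PRECISION"},  # schema passed in manually
    ["id"],
    "s3://my-bucket/unload/",  # hypothetical S3 prefix
    "arn:aws:iam::123456789012:role/MyRedshiftRole",  # hypothetical IAM role
):
    print(stmt)
```

Keeping the schema as an explicit argument is what sidesteps the inference problem: the CSV data never passes through pandas, so local memory no longer limits the file size.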
Related Issues
- Incorrect error message or implementation for datetime formatting?
- Unsupported Athena type: json
- Requests for Additional Argument `name_function` on `awswrangler.s3.to_parquet()`
- Support IAM Trusted Identity Propagation and Lake Formation with Athena
- Support IAM Trusted Identity Propagation with Redshift
- Upsert column parameters
- Get column parameters
- s3.read_parquet_table and exception "Unknown parameter in input: "ExcludeColumnSchema", must be one of: CatalogId, DatabaseName, TableName, Expression, NextToken, Segment, MaxResults"
- The `to_property_graph` docstring in the Neptune module is not well written and does not describe the behavior of the function when the node already exists
- Postgres upsert table creation
- Insight into error `awswrangler.exceptions.QueryFailed: Iceberg cannot access the requested resource`
- athena.to_iceberg function is not deleting temp_table_xxxxx properly in Athena
- Add s3_output parameter to athena.delete_from_iceberg_table method
- wr.s3.download fits the whole file into memory, with 2x memory allocation
- Upsert mode for SQL Server
- Lack of `verify` input to customize SSL verify option limits smooth usage of the package modules
- Calling wr.s3.read_parquet_metadata with a path that doesn't exist throws IndexError
- Athena query throws error with message "AttributeError: 'pyarrow._parquet.FileMetaData' object has no attribute 'total_byte_size'"
- `athena.to_parquet` fails when `mode=overwrite_partitions` and `partition_cols` contains something like `hour(timestamp_col)`.