datamill-co / target-redshift
A Singer.io Target for Redshift
License: MIT License
Hello all. I am working with target-redshift and was looking for a way to size columns more accurately.
I noticed target-redshift uses default_column_length, but that seems limited: it sets every varchar column to the same size.
Do I need to add something to the tap's catalog.json?
It would be a great addition to implement some functionality for accurate column sizing.
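For illustration, the kind of per-column hint this request is after would most naturally live on the individual string properties in the tap's schema, e.g. a maxLength keyword per field (a sketch of the desired input; whether the current target honors maxLength is exactly what this issue is asking about):
{
  "type": "object",
  "properties": {
    "email":   {"type": ["null", "string"], "maxLength": 256},
    "country": {"type": ["null", "string"], "maxLength": 2},
    "notes":   {"type": ["null", "string"]}
  }
}
Columns without an explicit length would then still fall back to default_column_length.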
Hello everyone, I've been using target-postgres for almost a year and I'm now beginning to use target-redshift. Thank you for all the awesome work.
I'm having an issue when the target code tries to read the table comment metadata. I'm not sure why this is happening, but I was able to make it work by changing two lines of code. Maybe by showing which lines I changed and my JSON schema, you can point me to what I'm getting wrong.
I changed this block in postgres.py, around line 710:
comment = cur.fetchone()[0]
if comment:
    try:
        comment_meta = json.loads(comment)
    except:
        self.LOGGER.exception('Could not load table comment metadata')
        raise
else:
    comment_meta = None
to
comment = cur.fetchone()
if comment and len(comment) > 0:
    comment = comment[0]
    try:
        comment_meta = json.loads(comment)
    except:
        self.LOGGER.exception('Could not load table comment metadata')
        raise
else:
    comment_meta = None
and my JSON schema is:
{ "streams": [{ "stream_alias": "batidas", "stream": "batidas", "tap_stream_id": "batidas", "schema": { "selected": true, "type": ["null", "object"], "properties": { "NSR": { "type": ["null", "string"] }, "NREP": { "type": ["null", "string"] }, "PIS": { "type": ["null", "string"] }, "Data": { "type": ["null", "string"] }, "Hora": { "type": ["null", "string"] }, "Latitude": { "type": ["null", "string"] }, "Longitude": { "type": ["null", "string"] } } }, "metadata": [{ "metadata": { "selected": true, "table-key-properties": ["NSR", "NREP", "PIS"] }, "breadcrumb": [] }] }] }
the error message is the following:
ERROR Exception writing records
Traceback (most recent call last):
File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/postgres.py", line 302, in write_batch
{'version': target_table_version})
File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/sql_base.py", line 824, in write_batch_helper
metadata)
File "/home/vitor/projects/target-redshift/target_redshift/redshift.py", line 84, in upsert_table_helper
log_schema_changes=log_schema_changes)
File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/sql_base.py", line 400, in upsert_table_helper
self.add_table(connection, table_path, table_name, _metadata)
File "/home/vitor/projects/target-redshift/target_redshift/redshift.py", line 107, in add_table
json_schema.make_nullable({'type': json_schema.BOOLEAN}))
File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/postgres.py", line 727, in add_column_mapping
metadata = self._get_table_metadata(cur, table_name)
File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/postgres.py", line 713, in _get_table_metadata
comment = cur.fetchone()[0]
TypeError: 'NoneType' object is not subscriptable
CRITICAL ('Exception writing records', TypeError("'NoneType' object is not subscriptable",))
Traceback (most recent call last):
File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/postgres.py", line 302, in write_batch
{'version': target_table_version})
File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/sql_base.py", line 824, in write_batch_helper
metadata)
File "/home/vitor/projects/target-redshift/target_redshift/redshift.py", line 84, in upsert_table_helper
log_schema_changes=log_schema_changes)
File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/sql_base.py", line 400, in upsert_table_helper
self.add_table(connection, table_path, table_name, _metadata)
File "/home/vitor/projects/target-redshift/target_redshift/redshift.py", line 107, in add_table
json_schema.make_nullable({'type': json_schema.BOOLEAN}))
File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/postgres.py", line 727, in add_column_mapping
metadata = self._get_table_metadata(cur, table_name)
File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/postgres.py", line 713, in _get_table_metadata
comment = cur.fetchone()[0]
TypeError: 'NoneType' object is not subscriptable
Thank you in advance!
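For anyone hitting the same traceback: the immediate cause is that cur.fetchone() returns None when the table-comment query matches no row, so indexing it with [0] raises the TypeError. A compact guard equivalent to the change above (just a sketch, not the upstream code) would be:
row = cur.fetchone()
comment = row[0] if row else None
if comment:
    try:
        comment_meta = json.loads(comment)
    except:
        self.LOGGER.exception('Could not load table comment metadata')
        raise
else:
    comment_meta = None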
When using tap-postgres on a table with the string array type varchar[], the Singer schema message produced uses an sdc_recursive_string_array. target-redshift fails in this scenario with the error:
CRITICAL ('`schema` is an invalid JSON Schema instance: {...} is recursive')
In the future it would be great to add support for Postgres array types.
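For context, the schema tap-postgres emits for such a column is self-referential, roughly of this shape (reconstructed for illustration; the exact output may differ):
{
  "type": ["null", "object"],
  "definitions": {
    "sdc_recursive_string_array": {
      "type": ["null", "string", "array"],
      "items": {"$ref": "#/definitions/sdc_recursive_string_array"}
    }
  },
  "properties": {
    "tags": {"$ref": "#/definitions/sdc_recursive_string_array"}
  }
}
The self-referencing $ref is what the target's schema validation rejects as recursive.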
When running target-redshift for a big, long stream extract, the expectation is that the target can load all the data with stable resource consumption.
However, when running target-redshift==0.2.1 for one big, long stream extract on the Python Docker image python:3.7.5-buster, the target is ultimately killed by the OS because of an OOM error.
The dying target then sends a SIGPIPE to the tap, causing a BrokenPipeError when calling singer.write_message():
sys.stdout.flush()
BrokenPipeError: [Errno 32] Broken pipe
We either have something like a weird data shape which is causing us to hang onto old pointers/refs, OR there's an honest-to-god memory leak.
To reproduce, build the image with docker build -t memory_test . using the following Dockerfile:
FROM python:3.7.5-buster
RUN apt-get update --yes \
&& apt-get upgrade --yes \
&& apt-get install --yes \
vim \
htop
RUN pip install target-redshift==0.2.1
RUN echo '#!/bin/bash\n'\
'# Print the schema\n'\
'echo "{\"type\": \"SCHEMA\", \"stream\": \"engagements\", \"schema\": {\"properties\": {\"engagement_id\": {\"type\": [\"null\", \"integer\"]}, \"engagement\": {\"properties\": {\"id\": {\"type\": [\"null\", \"integer\"]}, \"portal_id\": {\"type\": [\"null\", \"integer\"]}, \"active\": {\"type\": [\"null\", \"boolean\"]}, \"created_at\": {\"type\": [\"null\", \"integer\"]}, \"last_updated\": {\"type\": [\"null\", \"integer\"]}, \"created_by\": {\"type\": [\"null\", \"integer\"]}, \"modified_by\": {\"type\": [\"null\", \"integer\"]}, \"owner_id\": {\"type\": [\"null\", \"integer\"]}, \"type\": {\"type\": [\"null\", \"string\"]}, \"timestamp\": {\"type\": [\"null\", \"integer\"]}, \"activity_type\": {\"type\": [\"null\", \"string\"]}}, \"type\": \"object\"}, \"associations\": {\"properties\": {\"contact_ids\": {\"items\": {\"type\": [\"null\", \"integer\"]}, \"type\": [\"null\", \"array\"]}, \"company_ids\": {\"items\": {\"type\": [\"null\", \"integer\"]}, \"type\": [\"null\", \"array\"]}, \"deal_ids\": {\"items\": {\"type\": [\"null\", \"integer\"]}, \"type\": [\"null\", \"array\"]}, \"owner_ids\": {\"items\": {\"type\": [\"null\", \"integer\"]}, \"type\": [\"null\", \"array\"]}, \"workflow_ids\": {\"items\": {\"type\": [\"null\", \"integer\"]}, \"type\": [\"null\", \"array\"]}, \"ticket_ids\": {\"items\": {\"type\": [\"null\", \"integer\"]}, \"type\": [\"null\", \"array\"]}, \"content_ids\": {\"items\": {\"type\": [\"null\", \"integer\"]}, \"type\": [\"null\", \"array\"]}, \"quote_ids\": {\"items\": {\"type\": [\"null\", \"integer\"]}, \"type\": [\"null\", \"array\"]}}, \"type\": [\"null\", \"object\"]}, \"attachments\": {\"items\": {\"properties\": {\"id\": {\"type\": [\"null\", \"integer\"]}}, \"type\": \"object\"}, \"type\": [\"null\", \"array\"]}, \"metadata\": {\"properties\": {\"body\": {\"type\": [\"null\", \"string\"]}, \"from\": {\"properties\": {\"email\": {\"type\": [\"null\", \"string\"]}, \"first_name\": {\"type\": [\"null\", \"string\"]}, \"last_name\": {\"type\": [\"null\", \"string\"]}}, \"type\": [\"null\", \"object\"]}, \"to\": {\"items\": {\"properties\": {\"email\": {\"type\": [\"null\", \"string\"]}}, \"type\": \"object\"}, \"type\": [\"null\", \"array\"]}, \"cc\": {\"items\": {\"properties\": {\"email\": {\"type\": [\"null\", \"string\"]}}, \"type\": \"object\"}, \"type\": [\"null\", \"array\"]}, \"bcc\": {\"items\": {\"properties\": {\"email\": {\"type\": [\"null\", \"string\"]}}, \"type\": \"object\"}, \"type\": [\"null\", \"array\"]}, \"subject\": {\"type\": [\"null\", \"string\"]}, \"html\": {\"type\": [\"null\", \"string\"]}, \"text\": {\"type\": [\"null\", \"string\"]}, \"status\": {\"type\": [\"null\", \"string\"]}, \"for_object_type\": {\"type\": [\"null\", \"string\"]}, \"start_time\": {\"type\": [\"null\", \"integer\"]}, \"end_time\": {\"type\": [\"null\", \"integer\"]}, \"title\": {\"type\": [\"null\", \"string\"]}, \"to_number\": {\"type\": [\"null\", \"string\"]}, \"from_number\": {\"type\": [\"null\", \"string\"]}, \"external_id\": {\"type\": [\"null\", \"string\"]}, \"duration_milliseconds\": {\"type\": [\"null\", \"integer\"]}, \"external_account_id\": {\"type\": [\"null\", \"string\"]}, \"recording_url\": {\"format\": \"uri\", \"type\": [\"null\", \"string\"]}, \"disposition\": {\"type\": [\"null\", \"string\"]}}, \"type\": [\"null\", \"object\"]}}, \"type\": \"object\"}, \"key_properties\": [\"engagement_id\"]}"\n'\
'# Print an infinity of record\n'\
'let i=0 \n'\
'while true; do\n'\
'let i=i+1\n'\
'echo "{\"type\": \"RECORD\", \"stream\": \"engagements\", \"record\": {\"engagement\": {\"id\": ${i}, \"portal_id\": 123456, \"active\": true, \"created_at\": 1502097241136, \"last_updated\": 1566630888209, \"created_by\": 123456, \"modified_by\": 123456, \"owner_id\": 123456, \"type\": \"NOTE\", \"timestamp\": 1502097241136}, \"associations\": {\"contact_ids\": [], \"company_ids\": [], \"deal_ids\": [], \"owner_ids\": [123456], \"workflow_ids\": [], \"ticket_ids\": [], \"content_ids\": [], \"quote_ids\": []}, \"attachments\": [], \"metadata\": {\"body\": \"Hello im a body\"}, \"engagement_id\": ${i}, \"time_extracted\": \"2019-11-05T09:46:28.095053Z\"}}"\n'\
'done' >> /tmp/record_generator.sh
RUN chmod +x /tmp/record_generator.sh
RUN echo '{\n'\
' "default_column_length": 65535,\n'\
' "logging_level": "INFO",\n'\
' "invalid_records_detect": false,\n'\
' "invalid_records_threshold": 0,\n'\
' "persist_empty_tables": true,\n'\
' "max_batch_rows": 100000,\n'\
' "redshift_host": "your_cluster",\n'\
' "redshift_port": 5439,\n'\
' "redshift_database": "your_db",\n'\
' "redshift_username": "your_user",\n'\
' "redshift_password": "your_password",\n'\
' "redshift_schema": "backfill_hubspot",\n'\
' "target_s3": {\n'\
' "aws_access_key_id": "your_key",\n'\
' "aws_secret_access_key": "your_access_key",\n'\
' "bucket": "your_bucket",\n'\
' "key_prefix": "your_prefix"\n'\
' }\n'\
'}\n' >> /tmp/target_config.json
Then run the container with docker run -it memory_test /bin/bash, update the /tmp/target_config.json file with your own credentials, start the load in the background with nohup bash -c "/tmp/record_generator.sh | target-redshift --config /tmp/target_config.json" &, and watch the memory consumption grow in htop.
I was trying to backfill a big stream (engagements from tap-hubspot), but the job always fails.
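One knob worth trying while this is investigated (an assumption on my part, not a confirmed fix) is lowering max_batch_rows in the target config above, so that each flush buffers fewer records in memory:
{
  "max_batch_rows": 20000
}
keeping the rest of the config unchanged. This caps how much is held per flush, but it does not rule out the leak hypothesis above.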
Per this article, performance may be improved by adding COMPUPDATE OFF STATUPDATE OFF to COPY queries. Note that PipelineWise takes this approach in their target-redshift implementation.
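Concretely, the generated COPY would end up looking something like this (table, bucket, and credentials are placeholders; the exact SQL target-redshift builds may differ):
COPY my_schema.tmp_engagements
FROM 's3://your_bucket/your_prefix/batch.csv'
CREDENTIALS 'aws_access_key_id=your_key;aws_secret_access_key=your_access_key'
FORMAT AS CSV
COMPUPDATE OFF STATUPDATE OFF;
COMPUPDATE OFF skips the automatic compression analysis and STATUPDATE OFF skips the statistics refresh, both of which add overhead on every load.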
According to the README, adding invalid_records_detect set to false in the config means the target won't crash when there are invalid records. I've added it to my config file, but each time Redshift encounters a malformed row I get the following error and my pipe crashes:
ERROR Exception writing records
Traceback (most recent call last):
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 186, in write_batch
{'version': target_table_version})
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/sql_base.py", line 831, in write_batch_helper
metadata)
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 515, in write_table_batch
csv_rows)
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_redshift/redshift.py", line 136, in persist_csv_rows
cur.execute(copy_sql)
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 53, in execute
return super(_MillisLoggingCursor, self).execute(query, vars)
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/psycopg2/extras.py", line 440, in execute
return super(LoggingCursor, self).execute(query, vars)
psycopg2.InternalError: Load into table 'tmp_5c96ed60_77f8_4b70_8246_6d8dc5e83f0d' failed. Check 'stl_load_errors' system table for details.
CRITICAL ('Exception writing records', InternalError("Load into table 'tmp_5c96ed60_77f8_4b70_8246_6d8dc5e83f0d' failed. Check 'stl_load_errors' system table for details.\n"))
Traceback (most recent call last):
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 186, in write_batch
{'version': target_table_version})
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/sql_base.py", line 831, in write_batch_helper
metadata)
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 515, in write_table_batch
csv_rows)
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_redshift/redshift.py", line 136, in persist_csv_rows
cur.execute(copy_sql)
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 53, in execute
return super(_MillisLoggingCursor, self).execute(query, vars)
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/psycopg2/extras.py", line 440, in execute
return super(LoggingCursor, self).execute(query, vars)
psycopg2.InternalError: Load into table 'tmp_5c96ed60_77f8_4b70_8246_6d8dc5e83f0d' failed. Check 'stl_load_errors' system table for details.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/bin/target-redshift", line 11, in <module>
sys.exit(cli())
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_redshift/__init__.py", line 54, in cli
main(args.config)
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_redshift/__init__.py", line 48, in main
target_tools.main(redshift_target)
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 25, in main
stream_to_target(input_stream, target, config=config)
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 69, in stream_to_target
raise e
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 60, in stream_to_target
_flush_streams(streams, target)
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 88, in _flush_streams
_flush_stream(target, stream_buffer)
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 81, in _flush_stream
target.write_batch(stream_buffer)
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_redshift/redshift.py", line 58, in write_batch
return PostgresTarget.write_batch(self, nullable_stream_buffer)
File "/Users/dorcielovinsky/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 195, in write_batch
raise PostgresError(message, ex)
target_postgres.postgres.PostgresError: ('Exception writing records', InternalError("Load into table 'tmp_5c96ed60_77f8_4b70_8246_6d8dc5e83f0d' failed. Check 'stl_load_errors' system table for details.\n"))
From stl_load_errors: Missing newline: Unexpected character 0xffffffbd found at location 34
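The offending rows can be pulled straight out of the system table, e.g.:
SELECT starttime, filename, line_number, colname, err_reason, raw_line
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;
The 0xffffffbd character suggests non-UTF-8 bytes in the source data; since the row fails inside Redshift's COPY rather than during the target's own record validation, invalid_records_detect may simply never see it.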
Attached is a summary of an issue we are facing with one of our Meltano pipelines using the datamill target-redshift loader: 62851352222d36006fb739dc-1747877889-250822-2142.pdf
Support IAM Role authentication for S3 COPY
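i.e. let the generated COPY authenticate with an attached role instead of static keys, along the lines of (schema, table, bucket, and role ARN are placeholders):
COPY my_schema.my_table
FROM 's3://your_bucket/your_prefix/batch.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
FORMAT AS CSV;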
Issue:
If a tap has fields that are not explicitly declared, but are declared in a
"patternProperties": {
".+": {}
}
block, they won't be loaded into Redshift.
Whereas, when using another target (like target-csv), the fields are available.
How to reproduce:
data.jsonl.txt (a file with a schema and an example record)
issues.csv.txt (a CSV generated by the following command: cat data.jsonl | target-csv)
Version:
Python 3.7.3
target-csv==0.3.0
target-redshift==0.0.7
Documentation:
The link to the target-csv flatten function
While working on datamill-co/target-snowflake#12 we figured out that leveraging the SQLInterface table upsert logic is far slower for the tmp staging tables than issuing a single command on the remote side. Snowflake has the CREATE TABLE ... LIKE ... syntax that Postgres has as well.
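For reference, the single remote command in question would be along these lines (a sketch; table names are placeholders):
CREATE TEMP TABLE tmp_engagements (LIKE my_schema.engagements);
which clones the column definitions of the existing target table without re-deriving them from the JSON schema.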
Integrating with Redshift Spectrum will make this target very efficient.
Use Case
If you have large amounts of data that is not accessed frequently, it is more efficient to store it in S3 instead of keeping it loaded in Redshift. When the data needs to be queried, it can be queried with Redshift Spectrum, which provides really fast querying by leveraging MPP. In order for this to work, the data needs to be stored in S3 in a structured columnar format - I suggest Parquet. Moreover, these files can be gzipped, leading to more efficient storage without slowing down query times.
Enhancements
Redshift Spectrum is also able to query nested fields for files that are stored in S3 as Parquet (https://docs.aws.amazon.com/redshift/latest/dg/tutorial-nested-data-create-table.html). Looking at the code, it seems like that's not something currently provided, so this would be an additional enhancement.
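As a rough sketch of what the Spectrum side could look like (schema, columns, and S3 path are placeholders, and this is not something the target does today):
CREATE EXTERNAL TABLE spectrum_schema.engagements (
    engagement_id BIGINT,
    body VARCHAR(65535),
    created_at TIMESTAMP
)
STORED AS PARQUET
LOCATION 's3://your_bucket/your_prefix/engagements/';
This assumes an external schema has already been created against an AWS Glue / Athena data catalog.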
Integration Steps:
Changes I foresee
I think this can make target-redshift very scalable. In fact, we may be able to separate out the intermediate code and create a target-s3-parquet that simply stores the stream as an S3 Parquet file (similar to the s3-csv tap).
Talked with @AlexanderMann about this in slack.
Thanks for the work on this project! We're just trying out Singer for moving data from MySQL to Redshift. In MySQL we have a column of type bigint(18) unsigned. Some values in this column don't fit in Redshift's bigint column type, and we get errors like Overflow (Long valid range -9223372036854775808 to 9223372036854775807).
Typically we declare a Redshift column as NUMERIC(20, 0) to hold these values. Is there a way to tell target-redshift to use that type for a particular Redshift column?
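For reference, the column type we'd want the target to emit is simply (a sketch; table and column names are placeholders):
CREATE TABLE my_schema.my_table (
    big_id NUMERIC(20, 0)  -- wide enough for MySQL's unsigned bigint max of 18446744073709551615
);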
Can this column name be a bit more terse? It being the first column and being super long makes exploratory SELECT *s out of the table annoying. Given the reliance on target-postgres and the performance characteristics of Redshift, I assume removing it or moving it to the end aren't simple changes to make.
Connectivity to Redshift from DBeaver with a set of creds was good, and I was able to interact with the cluster. But with the same creds and details, singer target-redshift is not able to connect.
The current target-redshift does not support passing "SSL true". Could you help with the same?
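For what it's worth, the underlying psycopg2 connection can be told to require SSL via the sslmode parameter; a sketch of the kind of change being requested (not current target-redshift behavior, and the config key mentioned below is hypothetical):
import psycopg2

conn = psycopg2.connect(
    host='your_cluster',
    port=5439,
    dbname='your_db',
    user='your_user',
    password='your_password',
    sslmode='require',  # what a hypothetical "redshift_ssl": true config option could map to
)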
I have a Shopify store and I am trying to read its data using singer.io and load it into a Redshift cluster. I was able to run this successfully against PostgreSQL, but Redshift is throwing an error that I am unable to understand and fix. Any help or suggestions as to how to get around this?
ImportError: cannot import name 'Mapping' from 'collections' (/usr/lib/python3.10/collections/__init__.py)
Below is my redshift_config.json:
{
    "redshift_host": "redshift-cluster-1.cepdr3dnmmcm.us-east-1.redshift.amazonaws.com",
    "redshift_port": 5439,
    "redshift_database": "dev",
    "redshift_username": "awsuser",
    "redshift_password": "Password",
    "redshift_schema": "singer",
    "default_column_length": 1000,
    "target_s3": {
        "aws_access_key_id": "*Aws key id*",
        "aws_secret_access_key": "*aws access key*",
        "bucket": "singer-redshift",
        "key_prefix": "__tmp"
    }
}
and when trying to run, I am getting this error:
root@e0e876755adf:/# ~/.virtualenvs/tap-shopify/bin/tap-shopify -c config.json --catalog catalog.json | target-redshift --config redshift_config.json
INFO GET https://testnoman.myshopify.com/admin/api/2024-01/shop.json
Traceback (most recent call last):
File "/usr/local/bin/target-redshift", line 5, in <module>
from target_redshift import cli
File "/usr/local/lib/python3.10/dist-packages/target_redshift/__init__.py", line 8, in <module>
from target_redshift.s3 import S3
File "/usr/local/lib/python3.10/dist-packages/target_redshift/s3.py", line 3, in <module>
import boto3
File "/usr/local/lib/python3.10/dist-packages/boto3/__init__.py", line 16, in <module>
from boto3.session import Session
File "/usr/local/lib/python3.10/dist-packages/boto3/session.py", line 17, in <module>
import botocore.session
File "/usr/local/lib/python3.10/dist-packages/botocore/session.py", line 29, in <module>
import botocore.configloader
File "/usr/local/lib/python3.10/dist-packages/botocore/configloader.py", line 19, in <module>
from botocore.compat import six
File "/usr/local/lib/python3.10/dist-packages/botocore/compat.py", line 25, in <module>
from botocore.exceptions import MD5UnavailableError
File "/usr/local/lib/python3.10/dist-packages/botocore/exceptions.py", line 15, in <module>
from botocore.vendored import requests
File "/usr/local/lib/python3.10/dist-packages/botocore/vendored/requests/__init__.py", line 58, in <module>
from . import utils
File "/usr/local/lib/python3.10/dist-packages/botocore/vendored/requests/utils.py", line 26, in <module>
from .compat import parse_http_list as _parse_list_header
File "/usr/local/lib/python3.10/dist-packages/botocore/vendored/requests/compat.py", line 7, in <module>
from .packages import chardet
File "/usr/local/lib/python3.10/dist-packages/botocore/vendored/requests/packages/__init__.py", line 3, in <module>
from . import urllib3
File "/usr/local/lib/python3.10/dist-packages/botocore/vendored/requests/packages/urllib3/__init__.py", line 10, in <module>
from .connectionpool import (
File "/usr/local/lib/python3.10/dist-packages/botocore/vendored/requests/packages/urllib3/connectionpool.py", line 38, in <module>
from .response import HTTPResponse
File "/usr/local/lib/python3.10/dist-packages/botocore/vendored/requests/packages/urllib3/response.py", line 9, in <module>
from ._collections import HTTPHeaderDict
File "/usr/local/lib/python3.10/dist-packages/botocore/vendored/requests/packages/urllib3/_collections.py", line 1, in <module>
from collections import Mapping, MutableMapping
ImportError: cannot import name 'Mapping' from 'collections' (/usr/lib/python3.10/collections/__init__.py)
INFO --> 200 OK 1517b
CRITICAL BrokenPipeError
Traceback (most recent call last):
File "/root/.virtualenvs/tap-shopify/lib/python3.10/site-packages/tap_shopify/__init__.py", line 213, in main
sync()
File "/root/.virtualenvs/tap-shopify/lib/python3.10/site-packages/tap_shopify/__init__.py", line 149, in sync
singer.write_schema(stream["tap_stream_id"],
File "/root/.virtualenvs/tap-shopify/lib/python3.10/site-packages/singer/messages.py", line 264, in write_schema
write_message(
File "/root/.virtualenvs/tap-shopify/lib/python3.10/site-packages/singer/messages.py", line 227, in write_message
sys.stdout.flush()
BrokenPipeError: [Errno 32] Broken pipe
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/.virtualenvs/tap-shopify/bin/tap-shopify", line 33, in <module>
sys.exit(load_entry_point('tap-shopify==1.10.0', 'console_scripts', 'tap-shopify')())
File "/root/.virtualenvs/tap-shopify/lib/python3.10/site-packages/singer/utils.py", line 235, in wrapped
return fnc(*args, **kwargs)
File "/root/.virtualenvs/tap-shopify/lib/python3.10/site-packages/tap_shopify/__init__.py", line 228, in main
raise ShopifyError(exc) from exc
tap_shopify.exceptions.ShopifyError: BrokenPipeError
If I switch the target from Redshift to Postgres, it works fine.
I am running Python 3.10.
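The traceback goes through botocore.vendored.requests, i.e. a very old botocore that still vendors requests/urllib3; those vendored copies import Mapping from collections, which was removed in Python 3.10. One thing worth trying (an assumption, not a verified fix, since target-redshift may pin older dependency versions) is upgrading boto3/botocore in the same environment:
pip install --upgrade boto3 botocore
Running the target on an older Python (3.7-3.9), where collections.Mapping still exists as a deprecated alias, also avoids the import error.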
Hello all! I have been receiving an error while using target-redshift: the connection is established at the start of a run, but once the data has been gathered and is being prepared for upload, the run breaks because the connection has already been closed.
CRITICAL cursor already closed
psycopg2.OperationalError: SSL SYSCALL error: EOF detected
During handling of the above exception, another exception occurred:
psycopg2.InterfaceError: cursor already closed
During handling of the above exception, another exception occurred:
psycopg2.InterfaceError: connection already closed
If anyone could shed some light on how to resolve this issue, that would be great!
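Shooting in the dark a bit, but this pattern (connection opened early, idle while a large batch is prepared, then dead at write time) often points at an idle-connection timeout between client and cluster. One mitigation that can be tried is enabling TCP keepalives on the psycopg2 connection (a sketch under that assumption; it may not apply to your setup):
import psycopg2

conn = psycopg2.connect(
    host='your_cluster',
    port=5439,
    dbname='your_db',
    user='your_user',
    password='your_password',
    keepalives=1,           # libpq TCP keepalive settings, passed through by psycopg2
    keepalives_idle=60,
    keepalives_interval=30,
    keepalives_count=5,
)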
Currently using tap-facebook from singer.io to read from the Facebook Marketing API and writing to Redshift using the target-redshift package. The connection to Redshift is established and the data is read, but before I get to the write step I'm getting the error:
Traceback (most recent call last):
File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 129, in write_batch
cur.execute('BEGIN;')
File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 53, in execute
return super(_MillisLoggingCursor, self).execute(query, vars)
File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/psycopg2/extras.py", line 440, in execute
return super(LoggingCursor, self).execute(query, vars)
psycopg2.OperationalError: SSL SYSCALL error: EOF detected
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_redshift/__init__.py", line 48, in main
target_tools.main(redshift_target)
File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 25, in main
stream_to_target(input_stream, target, config=config)
File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 69, in stream_to_target
raise e
File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 63, in stream_to_target
_flush_streams(streams, target, force=True)
File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 88, in _flush_streams
_flush_stream(target, stream_buffer)
File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 81, in _flush_stream
target.write_batch(stream_buffer)
File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_redshift/redshift.py", line 58, in write_batch
return PostgresTarget.write_batch(self, nullable_stream_buffer)
File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 192, in write_batch
cur.execute('ROLLBACK;')
File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 53, in execute
return super(_MillisLoggingCursor, self).execute(query, vars)
File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/psycopg2/extras.py", line 440, in execute
return super(LoggingCursor, self).execute(query, vars)
psycopg2.InterfaceError: cursor already closed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "venv--target-redshift/bin/target-redshift", line 10, in <module>
sys.exit(cli())
File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_redshift/__init__.py", line 54, in cli
main(args.config)
File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_redshift/__init__.py", line 48, in main
target_tools.main(redshift_target)
psycopg2.InterfaceError: connection already closed
For a better user experience and more efficient handling of text data, target-redshift should implement automatic varchar widening: start each column at a default length (say 255) and automatically widen the varchar max length as it observes field text lengths in incoming batches.
Steps:
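One building block such an implementation could lean on (an aside on approach, not current project behavior) is that Redshift can widen a VARCHAR column in place:
ALTER TABLE my_schema.my_table
    ALTER COLUMN notes TYPE VARCHAR(1024);
so the target could start columns at VARCHAR(255) and issue a statement like this whenever an incoming batch contains longer values.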
Amazon Redshift has a data type "timestamp without time zone" which isn't supported by the target yet (only "timestamp with time zone" is supported via target-postgres).
Data Type | Aliases | Description |
---|---|---|
TIMESTAMP | TIMESTAMP WITHOUT TIME ZONE | Date and time (without time zone) |
TIMESTAMPTZ | TIMESTAMP WITH TIME ZONE | Date and time (with time zone) |
Source: https://docs.aws.amazon.com/redshift/latest/dg/c_Supported_data_types.html
So when target-redshift scans the destination schema, the following error occurs:
File "/app/target/.venv/lib/python3.7/site-packages/target_postgres/postgres.py", line 782, in sql_type_to_json_schema
raise PostgresError('Unsupported type `{}` in existing target table'.format(sql_type))
target_postgres.exceptions.PostgresError: Unsupported type `timestamp without time zone` in existing target table
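For reference, a pre-existing column declared with the plain TIMESTAMP type is enough to hit this, since in Redshift TIMESTAMP means "without time zone" (a sketch; table and column names are placeholders):
CREATE TABLE my_schema.events (
    id BIGINT,
    created_at TIMESTAMP  -- i.e. TIMESTAMP WITHOUT TIME ZONE: the type the target rejects
);
-- Declaring the column as TIMESTAMPTZ instead keeps it on the supported code path:
-- created_at TIMESTAMPTZ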