datamill-co / target-snowflake
A Singer.io Target for Snowflake
License: MIT License
Currently, target-snowflake flattens all nested objects if any are present in a tap's catalog. This differs from how the Stitch platform handles sending data to Snowflake: I've seen the latter send VARIANT-type data to Snowflake without flattening it.
It might be worth synchronising this behaviour with Stitch, or at least making it configurable.
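For illustration, a minimal sketch (not current target behaviour) of what loading a nested object unflattened into a VARIANT column could look like, assuming a snowflake-connector-python cursor and a hypothetical events table with a user_data VARIANT column:

import json

# Hypothetical: instead of flattening {"user": {"id": 1, "name": "a"}}
# into user__id / user__name columns, serialise the nested object and
# let Snowflake parse it into a single VARIANT column.
record = {"user": {"id": 1, "name": "a"}}
row_value = json.dumps(record["user"])
insert_sql = "INSERT INTO events (user_data) SELECT PARSE_JSON(%s)"
# cur.execute(insert_sql, (row_value,))  # assuming a snowflake.connector cursor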
The SQL hooks before_run_sql and after_run_sql fail to execute with the following error message:
Traceback (most recent call last):
File "/project/.meltano/loaders/target-snowflake/venv/bin/target-snowflake", line 8, in <module>
sys.exit(cli())
File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/__init__.py", line 57, in cli
main(args.config)
File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/__init__.py", line 51, in main
target_tools.main(target)
File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 28, in main
stream_to_target(input_stream, target, config=config)
File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 44, in stream_to_target
_run_sql_hook('before_run_sql', config, target)
File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 188, in _run_sql_hook
with target.conn.cursor() as cur:
AttributeError: 'SnowflakeTarget' object has no attribute 'conn'
This seems to be caused by this target using a different attribute name for its database connection (self.connection instead of self.conn).
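A minimal workaround sketch, not the maintainers' fix: have the hook runner in target_postgres/target_tools.py fall back to whichever attribute the target actually exposes:

# Hypothetical patch for _run_sql_hook. SnowflakeTarget stores its
# connection as `self.connection`, while the postgres base class
# expects `self.conn`, so try both attribute names.
def _run_sql_hook(hook_name, config, target):
    sql = config.get(hook_name)
    if not sql:
        return
    conn = getattr(target, 'conn', None) or getattr(target, 'connection', None)
    with conn.cursor() as cur:
        cur.execute(sql)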
Batching performance seems to have degraded: loads of a million+ rows are being written in small batches (hundreds to low thousands of rows). The issue may have been introduced by the target-postgres dependency's new state flushing/management (datamill-co/target-postgres#130).
Tests should be added to cover the batch-cutting logic.
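A sketch of the kind of test that could pin this down; cut_batches is a hypothetical stand-in for whatever the real batch-cutting entry point is:

# Hypothetical test: feed a large stream through the buffering layer and
# assert batches fill to the configured size instead of being cut every
# few hundred rows.
def test_batches_fill_to_configured_size():
    max_rows = 200_000
    records = ({"id": i} for i in range(1_000_000))
    batches = list(cut_batches(records, max_batch_rows=max_rows))  # hypothetical helper
    assert all(len(batch) == max_rows for batch in batches[:-1])
    assert 0 < len(batches[-1]) <= max_rows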
Hello!
I think I've found a bug when trying to set a column as nullable. I changed the tap schema to accept NULL, and now the target is raising an AttributeError. The problem may be in this function:
https://github.com/datamill-co/target-snowflake/blob/master/target_snowflake/snowflake.py#L579-L587
target-snowflake | ERROR Exception writing records
target-snowflake | Traceback (most recent call last):
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 150, in write_batch
target-snowflake | written_batches_details = self.write_batch_helper(cur,
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/sql_base.py", line 822, in write_batch_helper
target-snowflake | remote_schema = self.upsert_table_helper(connection,
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/sql_base.py", line 499, in upsert_table_helper
target-snowflake | self.make_column_nullable(connection,
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 586, in make_column_nullable
target-snowflake | table_name=sql.Identifier(table_name),
target-snowflake | AttributeError: module 'target_snowflake.sql' has no attribute 'Identifier'
target-snowflake | CRITICAL ('Exception writing records', AttributeError("module 'target_snowflake.sql' has no attribute 'Identifier'"))
target-snowflake | INFO MillisLoggingCursor: 98 millis spent executing: ROLLBACK
target-snowflake | Traceback (most recent call last):
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 150, in write_batch
target-snowflake | written_batches_details = self.write_batch_helper(cur,
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/sql_base.py", line 822, in write_batch_helper
target-snowflake | remote_schema = self.upsert_table_helper(connection,
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/sql_base.py", line 499, in upsert_table_helper
target-snowflake | self.make_column_nullable(connection,
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 586, in make_column_nullable
target-snowflake | table_name=sql.Identifier(table_name),
target-snowflake | AttributeError: module 'target_snowflake.sql' has no attribute 'Identifier'
target-snowflake |
target-snowflake | During handling of the above exception, another exception occurred:
target-snowflake |
target-snowflake | Traceback (most recent call last):
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/bin/target-snowflake", line 8, in <module>
target-snowflake | sys.exit(cli())
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/__init__.py", line 57, in cli
target-snowflake | main(args.config)
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/__init__.py", line 51, in main
target-snowflake | target_tools.main(target)
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 28, in main
target-snowflake | stream_to_target(input_stream, target, config=config)
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 77, in stream_to_target
target-snowflake | raise e
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 70, in stream_to_target
target-snowflake | state_tracker.flush_streams(force=True)
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/stream_tracker.py", line 47, in flush_streams
target-snowflake | self._write_batch_and_update_watermarks(stream)
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/stream_tracker.py", line 67, in _write_batch_and_update_watermarks
target-snowflake | self.target.write_batch(stream_buffer)
target-snowflake | File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 164, in write_batch
target-snowflake | raise SnowflakeError(message, ex)
target-snowflake | target_snowflake.exceptions.SnowflakeError: ('Exception writing records', AttributeError("module 'target_snowflake.sql' has no attribute 'Identifier'"))
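For reference, target_snowflake.sql (unlike the psycopg2 sql module that target-postgres uses) has no Identifier helper, which explains the AttributeError. A minimal illustrative fix, assuming make_column_nullable has schema_name, table_name, and column_name in scope and that plain double-quote escaping is acceptable:

# Hypothetical replacement for the sql.Identifier(...) calls in
# make_column_nullable (target_snowflake/snowflake.py).
def quote_identifier(name):
    # Snowflake quotes identifiers with double quotes; embedded quotes are doubled
    return '"{}"'.format(name.replace('"', '""'))

cur.execute('ALTER TABLE {}.{} ALTER COLUMN {} DROP NOT NULL'.format(
    quote_identifier(schema_name),   # assumed local variable names
    quote_identifier(table_name),
    quote_identifier(column_name)))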
I see that this target uses a conventional database connection to COPY CSV files into a Snowflake table.
Do you think this target covers Snowpipe usage, or do you think that fits better in another target?
https://docs.snowflake.com/en/user-guide/data-load-snowpipe-intro.html
https://github.com/snowflakedb/snowflake-ingest-python
I get this error when running a tap against this target. I switched to a different variant with success, but wanted to bring this up in case anyone has any ideas:
target_postgres.exceptions.SingerStreamError: ('Invalid records detected above threshold: 0. See `.args` for details.', [(<ValidationError: '135 is greater than the maximum of 127'>, {'type': 'RECORD', 'stream': 'xxxxxxxxxxxx', 'record': {'xxxxxxxxxx': xxxxxxxxxxx}, 'version': xxxxxxxxxxxxx, 'time_extracted': '2021-10-21T18:05:53.704164Z', '__raw_line_size': 594})])
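My reading, not a documented diagnosis: the tap's JSON schema caps the field at 127, a record arrived with 135, and the target's invalid_records_threshold of 0 aborted the run. A tap schema fragment shaped like this would trigger it:

# Hypothetical tap schema fragment that would produce this error for a
# record value of 135.
schema = {
    "type": "object",
    "properties": {
        "some_field": {"type": "integer", "maximum": 127},
    },
}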
Providing any AWS config causes an S3 load to be attempted, even when no bucket has been specified.
I was testing target-snowflake with tap-gitlab and came across some interesting behavior. Previously, when I used the Meltano variant of target-snowflake, I didn't have to specify a schema and the target would sort out where to put the data. In that case I was using tap-zoom and the schema was tap_zoom.
Inserting data failed when using the datamill variant because the schema wasn't created beforehand, even though the target was already looking for tap_gitlab. When I manually created the schema, the pipeline ran with no problems, even without specifying the schema in my config.
What's the expected behavior? Should the target be creating the schema if it doesn't exist?
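A hedged sketch of the behaviour being asked about, assuming the target's connection setup is the right place to hook it in:

# Hypothetical: ensure the destination schema exists before the first
# write, mirroring what the Meltano variant appears to do.
def ensure_schema(cur, schema_name):
    # Quote the identifier; embedded double quotes are doubled
    cur.execute('CREATE SCHEMA IF NOT EXISTS "{}"'.format(
        schema_name.replace('"', '""')))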
Apologies if this question has already been asked and answered, but would it be possible to remove the requirement for a pre-installed Postgres? Either by refactoring around the dependency on singer-target-postgres, or by using a fork that depends on psycopg2-binary in place of psycopg2?
It can sometimes be confusing or time-consuming for new users to have to install the Postgres dependency before landing data in Snowflake.
Happy to contribute cycles if we can help in any way. Thanks!
Regarding this error message during install:
Error: pg_config executable not found.
pg_config is required to build psycopg2 from source. Please add the directory
containing pg_config to the $PATH or specify the full executable path with the
option:
python setup.py build_ext --pg-config /path/to/pg_config build ...
or with the pg_config option in 'setup.cfg'.
If you prefer to avoid building psycopg2 from source, please install the PyPI
'psycopg2-binary' package instead.
For further information please check the 'doc/src/install.rst' file (also at
<https://www.psycopg.org/docs/install.html>).
I am using target-snowflake along with Meltano and a local tap.
I hit a case where a string field value ends in a backslash "\" (U+005C), and the loader process fails. The error is reported against another field in the row as having an invalid format. Simply removing the character from the source data allowed the loader to succeed.
Example Meltano output:
target-snowflake | ERROR Exception writing records
target-snowflake | Traceback (most recent call last):
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 150, in write_batch
target-snowflake | written_batches_details = self.write_batch_helper(cur,
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/sql_base.py", line 834, in write_batch_helper
target-snowflake | batch_rows_persisted = self.write_table_batch(
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 538, in write_table_batch
target-snowflake | self.persist_csv_rows(cur,
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 479, in persist_csv_rows
target-snowflake | cur.execute('''
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/connection.py", line 20, in execute
target-snowflake | super(MillisLoggingCursor, self).execute(command, **kwargs)
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/snowflake/connector/cursor.py", line 601, in execute
target-snowflake | Error.errorhandler_wrapper(self.connection, self,
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/snowflake/connector/errors.py", line 97, in errorhandler_wrapper
target-snowflake | cursor.errorhandler(connection, cursor, errorclass, errorvalue)
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/snowflake/connector/errors.py", line 68, in default_errorhandler
target-snowflake | raise errorclass(
target-snowflake | snowflake.connector.errors.ProgrammingError: 100038 (22018): Numeric value '2021-01-16 09:03:43.1983+00:00' is not recognized
target-snowflake | File '@TMP_DD0B756A_F0E0_400A_9FAE_3DA84BE69210/159158ca_754b_43d0_b3cd_64787eb14ed7.gz', line 25930, character 96
target-snowflake | Row 25930, column "TMP_DD0B756A_F0E0_400A_9FAE_3DA84BE69210"["ID":2]
target-snowflake | If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
target-snowflake | CRITICAL ('Exception writing records', 100038 (22018): Numeric value '2021-01-16 09:03:43.1983+00:00' is not recognized
target-snowflake | File '@TMP_DD0B756A_F0E0_400A_9FAE_3DA84BE69210/159158ca_754b_43d0_b3cd_64787eb14ed7.gz', line 25930, character 96
target-snowflake | Row 25930, column "TMP_DD0B756A_F0E0_400A_9FAE_3DA84BE69210"["ID":2]
target-snowflake | If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.)
target-snowflake | INFO MillisLoggingCursor: 43 millis spent executing: ROLLBACK
target-snowflake | Traceback (most recent call last):
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 150, in write_batch
target-snowflake | written_batches_details = self.write_batch_helper(cur,
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/sql_base.py", line 834, in write_batch_helper
target-snowflake | batch_rows_persisted = self.write_table_batch(
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 538, in write_table_batch
target-snowflake | self.persist_csv_rows(cur,
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 479, in persist_csv_rows
target-snowflake | cur.execute('''
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/connection.py", line 20, in execute
target-snowflake | super(MillisLoggingCursor, self).execute(command, **kwargs)
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/snowflake/connector/cursor.py", line 601, in execute
target-snowflake | Error.errorhandler_wrapper(self.connection, self,
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/snowflake/connector/errors.py", line 97, in errorhandler_wrapper
target-snowflake | cursor.errorhandler(connection, cursor, errorclass, errorvalue)
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/snowflake/connector/errors.py", line 68, in default_errorhandler
target-snowflake | raise errorclass(
target-snowflake | snowflake.connector.errors.ProgrammingError: 100038 (22018): Numeric value '2021-01-16 09:03:43.1983+00:00' is not recognized
target-snowflake | File '@TMP_DD0B756A_F0E0_400A_9FAE_3DA84BE69210/159158ca_754b_43d0_b3cd_64787eb14ed7.gz', line 25930, character 96
target-snowflake | Row 25930, column "TMP_DD0B756A_F0E0_400A_9FAE_3DA84BE69210"["ID":2]
target-snowflake | If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
target-snowflake |
target-snowflake | During handling of the above exception, another exception occurred:
target-snowflake |
target-snowflake | Traceback (most recent call last):
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/bin/target-snowflake", line 8, in <module>
target-snowflake | sys.exit(cli())
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/__init__.py", line 57, in cli
target-snowflake | main(args.config)
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/__init__.py", line 51, in main
target-snowflake | target_tools.main(target)
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 28, in main
target-snowflake | stream_to_target(input_stream, target, config=config)
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 77, in stream_to_target
target-snowflake | raise e
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 70, in stream_to_target
target-snowflake | state_tracker.flush_streams(force=True)
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/stream_tracker.py", line 47, in flush_streams
target-snowflake | self._write_batch_and_update_watermarks(stream)
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/stream_tracker.py", line 67, in _write_batch_and_update_watermarks
target-snowflake | self.target.write_batch(stream_buffer)
target-snowflake | File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 164, in write_batch
target-snowflake | raise SnowflakeError(message, ex)
target-snowflake | target_snowflake.exceptions.SnowflakeError: ('Exception writing records', 100038 (22018): Numeric value '2021-01-16 09:03:43.1983+00:00' is not recognized
target-snowflake | File '@TMP_DD0B756A_F0E0_400A_9FAE_3DA84BE69210/159158ca_754b_43d0_b3cd_64787eb14ed7.gz', line 25930, character 96
target-snowflake | Row 25930, column "TMP_DD0B756A_F0E0_400A_9FAE_3DA84BE69210"["ID":2]
target-snowflake | If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.)
meltano | Loading failed (1): If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.)
meltano | ELT could not be completed: Loader failed
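My reading of the mechanism, not confirmed by the maintainers: the COPY's CSV file format treats backslash as an escape character, so a trailing '\' escapes the closing quote or delimiter and shifts every later column by one, which is why the error surfaces on a different field (a timestamp landing in a numeric column). A sketch of COPY file-format options that would avoid this, assuming the staged CSVs quote their fields:

# Hypothetical file-format options for the COPY built in persist_csv_rows
# (target_snowflake/snowflake.py). Disabling backslash escaping means a
# trailing '\' inside a quoted field is read literally instead of
# escaping the closing quote and shifting the remaining columns.
COPY_TEMPLATE = """
COPY INTO {table} ({columns})
FROM @{stage}
FILE_FORMAT = (
    TYPE = CSV
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'
    ESCAPE = NONE
    ESCAPE_UNENCLOSED_FIELD = NONE
)
"""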
Snowflake offers many connection options; too many to support in a third-party tool right out of the gate.
To provide the best all-around support possible, we should scope our options to the connection options supported by dbt.
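For reference, a config surface scoped that way might look like the sketch below; the field names are assumed from dbt's Snowflake profile, not taken from this target's current config:

# Hypothetical config surface modeled on dbt's Snowflake profile fields.
config = {
    "account": "xy12345.us-east-1",
    "user": "LOADER",
    "password": "<password>",   # dbt also supports key-pair auth
    "role": "LOADER_ROLE",
    "database": "ANALYTICS",
    "warehouse": "LOADING",
    "schema": "PUBLIC",
}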