target-snowflake's People

Contributors

alexandermann, awm33

target-snowflake's Issues

Do not flatten nested objects when sending data to Snowflake

Currently, target-snowflake flattens all nested objects present in a tap's catalog. This differs from how the Stitch platform handles loading into Snowflake: I've seen the latter send nested objects to Snowflake as VARIANT-type columns, without flattening them.

It might be worth aligning this behaviour with Stitch's, or at least making it configurable.
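For reference, the flattening behaviour described above amounts to roughly the following (a minimal sketch; the actual separator and edge-case handling in target-snowflake may differ):

```python
def flatten(record, parent_key="", sep="__"):
    """Recursively flatten nested dicts, joining key paths with `sep`."""
    items = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = value
    return items

# A nested record becomes a single-level dict of compound column names,
# whereas Stitch would instead load `address` as a single VARIANT column.
flatten({"id": 1, "address": {"city": "Oslo", "zip": "0150"}})
```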

SQL hooks can't execute

SQL hooks before_run_sql and after_run_sql fail to execute with the following error message:

Traceback (most recent call last):
  File "/project/.meltano/loaders/target-snowflake/venv/bin/target-snowflake", line 8, in <module>
    sys.exit(cli())
  File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/__init__.py", line 57, in cli
    main(args.config)
  File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/__init__.py", line 51, in main
    target_tools.main(target)
  File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 28, in main
    stream_to_target(input_stream, target, config=config)
  File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 44, in stream_to_target
    _run_sql_hook('before_run_sql', config, target)
  File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 188, in _run_sql_hook
    with target.conn.cursor() as cur:
AttributeError: 'SnowflakeTarget' object has no attribute 'conn'

This seems to be caused by the target using a different attribute name for its database connection (self.connection instead of the self.conn that the target_postgres hook code expects).
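Until that's fixed upstream, one possible shape for the fix is an alias property on the target class (a hypothetical sketch, not the actual target-snowflake code):

```python
class SnowflakeTarget:
    """Sketch of the target class, which stores the DB connection as `self.connection`."""

    def __init__(self, connection):
        self.connection = connection

    @property
    def conn(self):
        # Alias for the attribute name that target_postgres's
        # _run_sql_hook helper expects (`target.conn`).
        return self.connection
```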

Batching Performance

Batching performance seems to have degraded: loads of a million or more rows are being written in small batches (hundreds to low thousands of rows each).

The issue may have been introduced by the new state flushing/management in the target-postgres dependency (datamill-co/target-postgres#130).

Tests should be added to cover batch cutting logic.
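For context, batch cutting boils down to logic like this (a simplified sketch; the real thresholds and size accounting live in the target-postgres dependency):

```python
def cut_batches(records, max_rows=100_000, max_bytes=100 * 2**20):
    """Yield batches, cutting when either the row-count or byte-size cap is hit."""
    batch, size = [], 0
    for record in records:
        record_size = len(str(record))  # crude size estimate for the sketch
        if batch and (len(batch) >= max_rows or size + record_size > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(record)
        size += record_size
    if batch:
        yield batch
```

A regression test along the lines of the suggestion above could then assert that a million-row stream produces a handful of large batches rather than thousands of tiny ones.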

AttributeError: module 'target_snowflake.sql' has no attribute 'Identifier'

Hello!

I think I've found a bug when trying to make a column nullable: I changed the tap schema to accept NULL, and now the target raises an AttributeError. The problem may be in this function:

https://github.com/datamill-co/target-snowflake/blob/master/target_snowflake/snowflake.py#L579-L587

Traceback

target-snowflake | ERROR Exception writing records
target-snowflake | Traceback (most recent call last):
target-snowflake |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 150, in write_batch
target-snowflake |     written_batches_details = self.write_batch_helper(cur,
target-snowflake |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/sql_base.py", line 822, in write_batch_helper
target-snowflake |     remote_schema = self.upsert_table_helper(connection,
target-snowflake |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/sql_base.py", line 499, in upsert_table_helper
target-snowflake |     self.make_column_nullable(connection,
target-snowflake |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 586, in make_column_nullable
target-snowflake |     table_name=sql.Identifier(table_name),
target-snowflake | AttributeError: module 'target_snowflake.sql' has no attribute 'Identifier'
target-snowflake | CRITICAL ('Exception writing records', AttributeError("module 'target_snowflake.sql' has no attribute 'Identifier'"))
target-snowflake | INFO MillisLoggingCursor: 98 millis spent executing: ROLLBACK
target-snowflake | Traceback (most recent call last):
target-snowflake |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 150, in write_batch
target-snowflake |     written_batches_details = self.write_batch_helper(cur,
target-snowflake |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/sql_base.py", line 822, in write_batch_helper
target-snowflake |     remote_schema = self.upsert_table_helper(connection,
target-snowflake |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/sql_base.py", line 499, in upsert_table_helper
target-snowflake |     self.make_column_nullable(connection,
target-snowflake |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 586, in make_column_nullable
target-snowflake |     table_name=sql.Identifier(table_name),
target-snowflake | AttributeError: module 'target_snowflake.sql' has no attribute 'Identifier'
target-snowflake | 
target-snowflake | During handling of the above exception, another exception occurred:
target-snowflake | 
target-snowflake | Traceback (most recent call last):
target-snowflake |   File "/project/.meltano/loaders/target-snowflake/venv/bin/target-snowflake", line 8, in <module>
target-snowflake |     sys.exit(cli())
target-snowflake |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/__init__.py", line 57, in cli
target-snowflake |     main(args.config)
target-snowflake |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/__init__.py", line 51, in main
target-snowflake |     target_tools.main(target)
target-snowflake |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 28, in main
target-snowflake |     stream_to_target(input_stream, target, config=config)
target-snowflake |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 77, in stream_to_target
target-snowflake |     raise e
target-snowflake |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 70, in stream_to_target
target-snowflake |     state_tracker.flush_streams(force=True)
target-snowflake |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/stream_tracker.py", line 47, in flush_streams
target-snowflake |     self._write_batch_and_update_watermarks(stream)
target-snowflake |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/stream_tracker.py", line 67, in _write_batch_and_update_watermarks
target-snowflake |     self.target.write_batch(stream_buffer)
target-snowflake |   File "/project/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 164, in write_batch
target-snowflake |     raise SnowflakeError(message, ex)
target-snowflake | target_snowflake.exceptions.SnowflakeError: ('Exception writing records', AttributeError("module 'target_snowflake.sql' has no attribute 'Identifier'"))
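The missing sql.Identifier presumably corresponds to an identifier-quoting helper that exists in the target_postgres sql module but not in target_snowflake's. Its expected behaviour can be sketched as follows (a hypothetical stand-in for illustration, not the actual fix):

```python
def quote_identifier(name):
    """Quote a Snowflake identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'
```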

Validation Error - Not found in other variant of target-snowflake

I get this error when running a tap against this target. I switched to a different variant successfully, but wanted to raise it here in case anyone has ideas.

target_postgres.exceptions.SingerStreamError: ('Invalid records detected above threshold: 0. See `.args` for details.', [(<ValidationError: '135 is greater than the maximum of 127'>, {'type': 'RECORD', 'stream': 'xxxxxxxxxxxx', 'record': {'xxxxxxxxxx': xxxxxxxxxxx}, 'version': xxxxxxxxxxxxx, 'time_extracted': '2021-10-21T18:05:53.704164Z', '__raw_line_size': 594})])
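The "maximum of 127" suggests the tap's JSON schema declared a signed single-byte bound for that column, which the target's record validation then enforces against the value 135. A minimal sketch of the check that raised above:

```python
def check_maximum(value, schema):
    """Minimal sketch of the JSON Schema `maximum` check behind the error above."""
    maximum = schema.get("maximum")
    if maximum is not None and value > maximum:
        raise ValueError(f"{value} is greater than the maximum of {maximum}")

# A schema like {"type": "integer", "maximum": 127} (e.g. an inferred
# signed-byte column bound) rejects the record value 135 from the log.
```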

Auto-creation of schema

I was testing target-snowflake with tap-gitlab and came across some interesting behavior. Previously, with the Meltano variant of target-snowflake, I didn't have to specify a schema; the target would work out where to put the data. In that case I was using tap-zoom and the schema was tap_zoom.

Inserting data with the datamill variant failed because the schema hadn't been created beforehand, even though the target was already looking for tap_gitlab. Once I created the schema manually, the pipeline ran with no problems, even without specifying the schema in my config.

What's the expected behavior? Should the target be creating the schema if it doesn't exist?
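If auto-creation is the desired behaviour, it would amount to running something like the following before the first insert (a sketch with a hypothetical helper name, not the target's actual code):

```python
def ensure_schema(cur, schema_name):
    # Idempotent: creates the target schema only if it doesn't already exist.
    cur.execute(f'CREATE SCHEMA IF NOT EXISTS "{schema_name.upper()}"')

class RecordingCursor:
    """Stand-in cursor that records executed SQL, for illustration only."""
    def __init__(self):
        self.executed = []
    def execute(self, sql):
        self.executed.append(sql)

cur = RecordingCursor()
ensure_schema(cur, "tap_gitlab")
```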

Remove dependency on pre-installed Postgres libraries

Apologies if this has already been asked and answered, but would it be possible to remove the requirement for pre-installed Postgres libraries? Either by refactoring out the dependency on singer-target-postgres, or by using a fork that depends on psycopg2-binary in place of psycopg2?

It can be confusing and time-consuming for new users to have to install Postgres dependencies before they can land data in Snowflake.

Happy to contribute cycles if we can help in any way. Thanks!

Regarding this error message during install:

    Error: pg_config executable not found.
    
    pg_config is required to build psycopg2 from source.  Please add the directory
    containing pg_config to the $PATH or specify the full executable path with the
    option:
    
        python setup.py build_ext --pg-config /path/to/pg_config build ...
    
    or with the pg_config option in 'setup.cfg'.
    
    If you prefer to avoid building psycopg2 from source, please install the PyPI
    'psycopg2-binary' package instead.
    
    For further information please check the 'doc/src/install.rst' file (also at
    <https://www.psycopg.org/docs/install.html>).
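In the meantime, the build error above goes away once the Postgres client headers that provide pg_config are installed (environment setup only; package names vary by platform, Debian/Ubuntu and Homebrew shown):

```shell
# Debian/Ubuntu: libpq-dev ships pg_config, letting psycopg2 build from source
sudo apt-get install -y libpq-dev

# macOS (Homebrew)
brew install postgresql
```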

Field quoting issue

I am using target-snowflake along with meltano and a local tap.

I hit a case where a string field value ending in a backslash "\" (U+005C) makes the loader process fail. The error is reported against a different field in the row, flagged as having an invalid format. Simply removing the character from the source data allowed the load to succeed.
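The failure mode can be reproduced outside Snowflake with Python's csv module: the writer emits the trailing backslash as-is, but a reader configured with backslash as the escape character (as Snowflake's COPY apparently is here) swallows the closing quote, so the column boundaries shift — which is why the error lands on a neighbouring field:

```python
import csv
import io

# Write a row whose first field ends in a backslash, quoting all fields,
# much like a CSV staged for COPY INTO.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(
    ["abc\\", "2021-01-16 09:03:43.1983+00:00"]
)
line = buf.getvalue()  # the first field is written as "abc\" — backslash unescaped

# Parse it back with backslash as the escape character: the \" sequence is
# read as a literal quote, so the field runs on past the delimiter and the
# timestamp ends up glued onto the first column.
rows = list(csv.reader(io.StringIO(line), escapechar="\\"))
```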

example meltano output:

target-snowflake    | ERROR Exception writing records
target-snowflake    | Traceback (most recent call last):
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 150, in write_batch
target-snowflake    |     written_batches_details = self.write_batch_helper(cur,
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/sql_base.py", line 834, in write_batch_helper
target-snowflake    |     batch_rows_persisted = self.write_table_batch(
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 538, in write_table_batch
target-snowflake    |     self.persist_csv_rows(cur,
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 479, in persist_csv_rows
target-snowflake    |     cur.execute('''
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/connection.py", line 20, in execute
target-snowflake    |     super(MillisLoggingCursor, self).execute(command, **kwargs)
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/snowflake/connector/cursor.py", line 601, in execute
target-snowflake    |     Error.errorhandler_wrapper(self.connection, self,
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/snowflake/connector/errors.py", line 97, in errorhandler_wrapper
target-snowflake    |     cursor.errorhandler(connection, cursor, errorclass, errorvalue)
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/snowflake/connector/errors.py", line 68, in default_errorhandler
target-snowflake    |     raise errorclass(
target-snowflake    | snowflake.connector.errors.ProgrammingError: 100038 (22018): Numeric value '2021-01-16 09:03:43.1983+00:00' is not recognized
target-snowflake    |   File '@TMP_DD0B756A_F0E0_400A_9FAE_3DA84BE69210/159158ca_754b_43d0_b3cd_64787eb14ed7.gz', line 25930, character 96
target-snowflake    |   Row 25930, column "TMP_DD0B756A_F0E0_400A_9FAE_3DA84BE69210"["ID":2]
target-snowflake    |   If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
target-snowflake    | CRITICAL ('Exception writing records', 100038 (22018): Numeric value '2021-01-16 09:03:43.1983+00:00' is not recognized
target-snowflake    |   File '@TMP_DD0B756A_F0E0_400A_9FAE_3DA84BE69210/159158ca_754b_43d0_b3cd_64787eb14ed7.gz', line 25930, character 96
target-snowflake    |   Row 25930, column "TMP_DD0B756A_F0E0_400A_9FAE_3DA84BE69210"["ID":2]
target-snowflake    |   If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.)
target-snowflake    | INFO MillisLoggingCursor: 43 millis spent executing: ROLLBACK
target-snowflake    | Traceback (most recent call last):
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 150, in write_batch
target-snowflake    |     written_batches_details = self.write_batch_helper(cur,
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/sql_base.py", line 834, in write_batch_helper
target-snowflake    |     batch_rows_persisted = self.write_table_batch(
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 538, in write_table_batch
target-snowflake    |     self.persist_csv_rows(cur,
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 479, in persist_csv_rows
target-snowflake    |     cur.execute('''
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/connection.py", line 20, in execute
target-snowflake    |     super(MillisLoggingCursor, self).execute(command, **kwargs)
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/snowflake/connector/cursor.py", line 601, in execute
target-snowflake    |     Error.errorhandler_wrapper(self.connection, self,
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/snowflake/connector/errors.py", line 97, in errorhandler_wrapper
target-snowflake    |     cursor.errorhandler(connection, cursor, errorclass, errorvalue)
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/snowflake/connector/errors.py", line 68, in default_errorhandler
target-snowflake    |     raise errorclass(
target-snowflake    | snowflake.connector.errors.ProgrammingError: 100038 (22018): Numeric value '2021-01-16 09:03:43.1983+00:00' is not recognized
target-snowflake    |   File '@TMP_DD0B756A_F0E0_400A_9FAE_3DA84BE69210/159158ca_754b_43d0_b3cd_64787eb14ed7.gz', line 25930, character 96
target-snowflake    |   Row 25930, column "TMP_DD0B756A_F0E0_400A_9FAE_3DA84BE69210"["ID":2]
target-snowflake    |   If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
target-snowflake    | 
target-snowflake    | During handling of the above exception, another exception occurred:
target-snowflake    | 
target-snowflake    | Traceback (most recent call last):
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/bin/target-snowflake", line 8, in <module>
target-snowflake    |     sys.exit(cli())
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/__init__.py", line 57, in cli
target-snowflake    |     main(args.config)
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/__init__.py", line 51, in main
target-snowflake    |     target_tools.main(target)
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 28, in main
target-snowflake    |     stream_to_target(input_stream, target, config=config)
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 77, in stream_to_target
target-snowflake    |     raise e
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/target_tools.py", line 70, in stream_to_target
target-snowflake    |     state_tracker.flush_streams(force=True)
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/stream_tracker.py", line 47, in flush_streams
target-snowflake    |     self._write_batch_and_update_watermarks(stream)
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_postgres/stream_tracker.py", line 67, in _write_batch_and_update_watermarks
target-snowflake    |     self.target.write_batch(stream_buffer)
target-snowflake    |   File "/project/drafthouse/.meltano/loaders/target-snowflake/venv/lib/python3.8/site-packages/target_snowflake/snowflake.py", line 164, in write_batch
target-snowflake    |     raise SnowflakeError(message, ex)
target-snowflake    | target_snowflake.exceptions.SnowflakeError: ('Exception writing records', 100038 (22018): Numeric value '2021-01-16 09:03:43.1983+00:00' is not recognized
target-snowflake    |   File '@TMP_DD0B756A_F0E0_400A_9FAE_3DA84BE69210/159158ca_754b_43d0_b3cd_64787eb14ed7.gz', line 25930, character 96
target-snowflake    |   Row 25930, column "TMP_DD0B756A_F0E0_400A_9FAE_3DA84BE69210"["ID":2]
target-snowflake    |   If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.)
meltano             | Loading failed (1):   If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.)
meltano             | ELT could not be completed: Loader failed
