
Target Postgres


A Singer postgres target, for use with Singer streams generated by Singer taps.

Features

  • Creates SQL tables for Singer streams
  • Denests objects, flattening them into the parent object's table
  • Denests rows into separate tables
  • Adds columns and sub-tables as new fields are added to the stream JSON Schema
  • Full stream replication via record version and ACTIVATE_VERSION messages.

Install

  1. Add libpq dependency
# macos
brew install postgresql
# ubuntu
sudo apt install libpq-dev
  2. Install singer-target-postgres
pip install singer-target-postgres

Usage

  1. Follow the Singer.io Best Practices for setting up separate tap and target virtualenvs to avoid version conflicts.

  2. Create a config file at ~/singer.io/target_postgres_config.json with postgres connection information and target postgres schema.

    {
      "postgres_host": "localhost",
      "postgres_port": 5432,
      "postgres_database": "my_analytics",
      "postgres_username": "myuser",
      "postgres_password": "1234",
      "postgres_schema": "mytapname"
    }
  3. Run target-postgres against a Singer tap.

    ~/.virtualenvs/tap-something/bin/tap-something \
      | ~/.virtualenvs/target-postgres/bin/target-postgres \
        --config ~/singer.io/target_postgres_config.json >> state.json

    If you are running Windows, the following is equivalent:

    venvs\tap-exchangeratesapi\Scripts\tap-exchangeratesapi.exe | ^
    venvs\target-postgresql\Scripts\target-postgres.exe ^
    --config target_postgres_config.json
    

Config.json

The fields available in the config file are described below.

Field Type Default Details
postgres_host ["string", "null"] "localhost"
postgres_port ["integer", "null"] 5432
postgres_database ["string"] N/A
postgres_username ["string", "null"] N/A
postgres_password ["string", "null"] null
postgres_schema ["string", "null"] "public"
postgres_sslmode ["string", "null"] "prefer" Refer to the libpq docs for more information about SSL
postgres_sslcert ["string", "null"] "~/.postgresql/postgresql.crt" Only used if a SSL request w/ a client certificate is being made
postgres_sslkey ["string", "null"] "~/.postgresql/postgresql.key" Only used if a SSL request w/ a client certificate is being made
postgres_sslrootcert ["string", "null"] "~/.postgresql/root.crt" Used for authentication of a server SSL certificate
postgres_sslcrl ["string", "null"] "~/.postgresql/root.crl" Used for authentication of a server SSL certificate
invalid_records_detect ["boolean", "null"] true Include false in your config to stop target-postgres from crashing on invalid records
invalid_records_threshold ["integer", "null"] 0 Include a positive value n in your config to allow for target-postgres to encounter at most n invalid records per stream before giving up.
disable_collection ["string", "null"] false Include true in your config to disable Singer Usage Logging.
logging_level ["string", "null"] "INFO" The level for logging. Set to DEBUG to get things like queries executed, timing of those queries, etc. See Python's Logger Levels for information about valid values.
persist_empty_tables ["boolean", "null"] False Whether the Target should create tables which have no records present in Remote.
max_batch_rows ["integer", "null"] 200000 The maximum number of rows to buffer in memory before writing to the destination table in Postgres
max_buffer_size ["integer", "null"] 104857600 (100MB in bytes) The maximum number of bytes to buffer in memory before writing to the destination table in Postgres
batch_detection_threshold ["integer", "null"] 5000, or 1/40th max_batch_rows How often, in rows received, to count the buffered rows and bytes to check if a flush is necessary. There's a slight performance penalty to checking the buffered records count or bytesize, so this controls how often this is polled in order to mitigate the penalty. This value is usually not necessary to set as the default is dynamically adjusted to check reasonably often.
state_support ["boolean", "null"] True Whether the Target should emit STATE messages to stdout for further consumption. In this mode, which is on by default, STATE messages are buffered in memory until all the records that occurred before them are flushed according to the batch flushing schedule the target is configured with.
add_upsert_indexes ["boolean", "null"] True Whether the Target should create column indexes on the important columns used during data loading. These indexes will make data loading slightly slower but the deduplication phase much faster. Defaults to on for better baseline performance.
before_run_sql ["string", "null"] None Raw SQL statement(s) to execute as soon as the connection to Postgres is opened by the target. Useful for setup like SET ROLE or other connection state that is important.
after_run_sql ["string", "null"] None Raw SQL statement(s) to execute once the target has finished its run, just before the connection to Postgres is closed. Useful for teardown that mirrors before_run_sql.
before_run_sql_file ["string", "null"] None Similar to before_run_sql but reads an external file instead of SQL in the JSON config file.
after_run_sql_file ["string", "null"] None Similar to after_run_sql but reads an external file instead of SQL in the JSON config file.
application_name ["string", "null"] None Set the postgresql application_name connection option to help with debugging, etc...
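
For example, a config that tunes batching and opts out of usage logging might be generated like this (a minimal sketch; the values are illustrative, not recommendations):

import json

# Illustrative values only -- adjust for your environment.
config = {
    "postgres_host": "localhost",
    "postgres_port": 5432,
    "postgres_database": "my_analytics",
    "postgres_username": "myuser",
    "postgres_password": "1234",
    "postgres_schema": "mytapname",
    "invalid_records_threshold": 10,  # tolerate at most 10 invalid records per stream
    "max_batch_rows": 50000,          # flush to Postgres after 50k buffered rows
    "disable_collection": True,       # opt out of Singer Usage Logging
    "logging_level": "DEBUG"
}

with open("target_postgres_config.json", "w") as fh:
    json.dump(config, fh, indent=2)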

Supported Versions

target-postgres only supports JSON Schema Draft4. While declaring a schema version is optional, any input schema which declares a version other than Draft4 will be rejected.

target-postgres supports all versions of PostgreSQL which are presently supported by the PostgreSQL Global Development Group. Our CI config defines all versions we are currently supporting.

Version Current minor Supported First Release Final Release
15 15.0 Yes October 13, 2022 November 11, 2027
14 14.5 Yes September 30, 2021 November 12, 2026
13 13.8 Yes September 24, 2020 November 13, 2025
12 12.12 Yes October 3, 2019 November 14, 2024
11 11.17 Yes October 18, 2018 November 9, 2023
10 10.22 Yes October 5, 2017 November 10, 2022

The above is copied from the current list of supported versions on PostgreSQL.org

Known Limitations

  • Requires a JSON Schema for every stream.
  • Only string, string with date-time format, integer, number, boolean, object, and array types with or without null are supported. Arrays can have any of the other types listed, including objects as types within items.
    • Example of JSON Schema types that work
      • ['number']
      • ['string']
      • ['string', 'null']
    • Example of JSON Schema types that DO NOT work
      • ['string', 'integer']
      • ['integer', 'number']
      • ['any']
      • ['null']
  • JSON Schema combinations such as anyOf and oneOf are not supported.
  • JSON Schema $ref is partially supported:
    • NOTE: The following limitations are known to NOT fail gracefully
    • Presently you cannot have any circular or recursive $refs
    • $refs must be present within the schema:
      • URIs do not work
      • if the $ref is broken, the behaviour is considered unexpected
  • Any values which are the string NULL will be streamed to PostgreSQL as the literal null
  • Table names are restricted to:
    • 63 characters in length
    • can only be composed of _, lowercase letters, numbers, $
    • cannot start with $
    • ASCII characters
  • Field/Column names are restricted to:
    • 63 characters in length
    • ASCII characters

Indexes

If the add_upsert_indexes config option is enabled, which it is by default, target-postgres adds indexes on the tables it creates for its own queries to be more performant. Specifically, target-postgres automatically adds indexes to the _sdc_sequence column and the _sdc_level_<n>_id columns which are used heavily when inserting and upserting.

target-postgres doesn't have any facilities for adding other indexes to the managed tables, so if more indexes are required, they should be added by another downstream tool or by an administrator when necessary. Note that these indexes incur performance overhead to maintain as data is inserted. They can also prevent target-postgres from dropping columns in the future if the schema of the table changes, in which case an administrator should drop the index so target-postgres is able to drop the columns it needs to.

Note: Index adding is new as of version 0.2.1, and target-postgres does not retroactively create indexes for tables it created before that time. If you want to add indexes to older tables target-postgres is loading data into, they should be added manually.
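
For example, an administrator could add an extra index to a managed table with something like the following (a sketch using psycopg2; the table, column, and connection details are hypothetical):

import psycopg2

conn = psycopg2.connect(host="localhost", dbname="my_analytics", user="myuser")
with conn, conn.cursor() as cur:
    # Index a frequently filtered business column, in addition to the
    # _sdc_sequence / _sdc_level_<n>_id indexes target-postgres manages itself.
    cur.execute('CREATE INDEX IF NOT EXISTS orders_customer_id_idx '
                'ON mytapname.orders (customer_id)')
conn.close()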

Usage Logging

Singer.io requires official taps and targets to collect anonymous usage data. This data is only used in aggregate to report on individual taps/targets, as well as the Singer community at large. IP addresses are recorded to detect unique tap/target users but are not shared with third parties.

To disable anonymous data collection set disable_collection to true in the configuration JSON file.

Developing

target-postgres utilizes poetry for package management, and PyTest for testing.

Documentation

See also:

  • DECISIONS: A document containing high level explanations of various decisions and decision making paradigms. A good place to request more explanation/clarification on confusing things found herein.
  • TableMetadata: A document detailing some of the metadata necessary for TargetPostgres to function correctly on the Remote

Docker

If you have Docker and Docker Compose installed, you can easily run the following to get a local dev environment set up quickly.

$ docker-compose up -d --build
$ docker logs -tf target-postgres_target-postgres_1 # Your container names might differ

As soon as you see INFO: Dev environment ready. you can shell into the container and start running test commands:

$ docker-compose exec target-postgres bash
(target-postgres) root@...:/code# pytest

The environment inside the Docker container has a virtualenv set up and activated, with an --editable install of target-postgres inside it and your local code mounted as a Docker volume. If you make changes on your host and re-run pytest, any changes should be reflected inside the container.

See the PyTest commands below!

DB

To run the tests, you will need a PostgreSQL server running.

NOTE: Testing assumes that you've exposed the traditional port 5432.

Make sure to set the following env vars for PyTest:

$ export POSTGRES_HOST='<your-host-name>' # Most likely 'localhost'
$ export POSTGRES_DB='<your-db-name>'     # We use 'target_postgres_test'
$ export POSTGRES_USER='<your-user-name>' # Probably just 'postgres', make sure this user has no auth

PyTest

To run tests, try:

$ poetry run pytest

If you've bash shelled into the Docker Compose container (see above), you should be able to simply use:

$ pytest

Collaboration and Contributions

Join the conversation over at the Singer.io Slack and on the #target-postgres channel.

Try to adhere to the following for contributing:

  • File New Issue -> Fork -> New Branch (if needed) -> Pull Request -> Approval -> Merge

Users can file an issue without submitting a pull request, but be aware that not all issues can or will be addressed.

Sponsorship

Target Postgres is sponsored by Data Mill (Data Mill Services, LLC) datamill.co.

Data Mill helps organizations utilize modern data infrastructure and data science to power analytics, products, and services.


Copyright Data Mill Services, LLC 2018


target-postgres's Issues

Column Name Normalization: Collision

Motivation

Split out from: #18

Presently, when we attempt to canonicalize a column, we DO NOT handle collision issues. I.e., column Aa and column aA will both canonicalize to aa. When this happens, we hard fail. Instead, we can make a best-effort attempt to resolve this issue.

It has been decided that any columns which encounter collision issues will be:

  • if colliding, truncated further and suffixed with __0, __1, ... (a sketch of this approach follows below)
  • if no successful mapping can be reached, hard fail
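
A minimal sketch of that suffixing approach, assuming a simple lowercase canonicalization (illustrative only, not the project's actual implementation):

def canonicalize(name, taken, max_len=63):
    """Lowercase a column name and resolve collisions with __0, __1, ... suffixes."""
    base = name.lower()[:max_len]
    if base not in taken:
        return base
    for i in range(1000):
        suffix = '__{}'.format(i)
        candidate = base[:max_len - len(suffix)] + suffix
        if candidate not in taken:
            return candidate
    raise Exception('Could not canonicalize column name: {}'.format(name))

taken = set()
for column in ['Aa', 'aA', 'aa']:
    mapped = canonicalize(column, taken)
    taken.add(mapped)
    print(column, '->', mapped)  # Aa -> aa, aA -> aa__0, aa -> aa__1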

Suggested Musical Pairing

https://soundcloud.com/phoenix/too-young

Add Field Path to the Denested Column's Comment in PostgreSQL

Motivation

When we denest a column and then handle things like naming collisions, you can end up with very confusing structures when trying to map it all back to the input data.

To aid in this, we could set some helpful string comments/descriptions onto each column's COMMENT in PostgreSQL. Something like:

# Singer Target Postgres

This table was denested from (t0, t1, t2, ...)

This column was denested from (c0, c1, ...)

You can join this back to its original parent via:

SOME-SQL-HERE

etc.
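
A minimal sketch of how such a comment could be written with psycopg2 (the table, column, and comment text are hypothetical):

import psycopg2
from psycopg2 import sql

comment = "Singer Target Postgres: denested from ('adoption', 'immunizations', 'vaccination_type', 'shot')"

conn = psycopg2.connect(host="localhost", dbname="my_analytics", user="myuser")
with conn, conn.cursor() as cur:
    # Attach the denesting provenance to the column itself so it shows up in \d+ output.
    cur.execute(sql.SQL("COMMENT ON COLUMN {}.{} IS {}").format(
        sql.Identifier("cats__adoption__immunizations"),
        sql.Identifier("vaccination_type__shot"),
        sql.Literal(comment)))
conn.close()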

Suggested Musical Pairing

https://soundcloud.com/hiatus-kaiyote/nakamarra-1

Add Stitch/Singer Anonymous Tracking

We would like to get target-postgres released on singer.io. In order to do that, we need to add the Stitch/Singer anonymous tracking code.

Tracking code
https://github.com/singer-io/target-csv/blob/master/target_csv.py#L103

Running it in a separate thread to not affect other code
https://github.com/singer-io/target-csv/blob/master/target_csv.py#L137

We should also add a section to the README describing the tracking code and how to disable it. It does record the IP, but I believe that is used to determine unique(ish) users.

Only key property being loaded into table

Hi there!

I'm a new Singer user and I'm having trouble when using target-postgres with tap-exchangeratesapi. When I run:

~/.virtualenvs/tap-exchangeratesapi/bin/tap-exchangeratesapi -c tap_config.json \
    | ~/.virtualenvs/target-postgres/bin/target-postgres -c postgres_config.json

only the key property is loaded into the destination table (column date), besides the _sdc columns.

This is my config:

{
    "postgres_host": "localhost",
    "postgres_port": 5432,
    "postgres_database": "postgres",
    "postgres_username": "",
    "postgres_password": "",
    "postgres_schema": "public"
  }

Also, this is the content sent by the tap:

{"type": "SCHEMA", "stream": "exchange_rate", "schema": {"type": "object", "properties": {"date": {"type": "string", "format": "date-time"}}, "additionalProperties": true}, "key_properties": ["date"]}
{"type": "RECORD", "stream": "exchange_rate", "record": {"THB": 8.0137751728, "PHP": 13.2305516911, "CZK": 5.8913950783, "BRL": 1.0, "CHF": 0.2545670171, "INR": 17.6687116564, "ISK": 31.5871097224, "HRK": 1.6917919128, "PLN": 0.9768741305, "NOK": 2.2299815267, "USD": 0.2550915684, "CNY": 1.7618765252, "RUB": 16.6787921637, "SEK": 2.4216023901, "MYR": 1.0660250416, "SGD": 0.3495792186, "ILS": 0.9258786234, "TRY": 1.5010148927, "BGN": 0.4460510411, "NZD": 0.3886240792, "HKD": 1.9988824777, "RON": 1.0805072183, "EUR": 0.2280657742, "MXN": 5.0432184642, "CAD": 0.3443337058, "AUD": 0.3666613451, "GBP": 0.2021073278, "KRW": 301.8336488243, "IDR": 3631.9930668005, "JPY": 27.6347298561, "DKK": 1.7031952015, "ZAR": 3.715419527, "HUF": 73.8317330718, "date": "2019-06-03T00:00:00Z"}}
{"type": "RECORD", "stream": "exchange_rate", "record": {"THB": 8.0636142811, "PHP": 13.338603969, "CZK": 5.8994454375, "BRL": 1.0, "CHF": 0.2561528943, "INR": 17.8454557954, "ISK": 31.9217196022, "HRK": 1.7005820615, "PLN": 0.9808423851, "NOK": 2.2425409047, "USD": 0.2576653375, "CNY": 1.7802374078, "RUB": 16.8062239333, "SEK": 2.4350336862, "MYR": 1.0744305422, "SGD": 0.3528117696, "ILS": 0.9313213255, "TRY": 1.4962876392, "BGN": 0.4481873596, "NZD": 0.3909895046, "HKD": 2.0198680049, "RON": 1.0851322242, "EUR": 0.2291580732, "MXN": 5.0782116504, "CAD": 0.3462807645, "AUD": 0.3693111508, "GBP": 0.203350291, "KRW": 304.9864796737, "IDR": 3664.7738209817, "JPY": 27.8702048673, "DKK": 1.7113295751, "ZAR": 3.7763188047, "HUF": 73.7888995829, "date": "2019-06-04T00:00:00Z"}}
{"type": "RECORD", "stream": "exchange_rate", "record": {"THB": 8.129044143, "PHP": 13.4202224423, "CZK": 5.9080756212, "BRL": 1.0, "CHF": 0.2570520644, "INR": 17.9721141226, "ISK": 32.1228728671, "HRK": 1.7075782347, "PLN": 0.9850553803, "NOK": 2.2506274898, "USD": 0.2592166164, "CNY": 1.790452944, "RUB": 16.8856932323, "SEK": 2.4467497179, "MYR": 1.0756211573, "SGD": 0.3537891174, "ILS": 0.9349943583, "TRY": 1.4816588758, "BGN": 0.4503649803, "NZD": 0.3901489857, "HKD": 2.0323070901, "RON": 1.0878716006, "EUR": 0.2302714901, "MXN": 5.0863748359, "CAD": 0.3470191356, "AUD": 0.3710594791, "GBP": 0.2040919244, "KRW": 304.9554424667, "IDR": 3682.043429203, "JPY": 28.083910931, "DKK": 1.7197135423, "ZAR": 3.8199737491, "HUF": 74.0345867778, "date": "2019-06-05T00:00:00Z"}}
{"type": "RECORD", "stream": "exchange_rate", "record": {"THB": 8.08195332, "PHP": 13.3418081037, "CZK": 5.8780549257, "BRL": 1.0, "CHF": 0.2559380655, "INR": 17.8698550127, "ISK": 31.9063652397, "HRK": 1.6998786046, "PLN": 0.9800499324, "NOK": 2.2465700085, "USD": 0.2580453057, "CNY": 1.783824641, "RUB": 16.8282370187, "SEK": 2.4319155272, "MYR": 1.0735930736, "SGD": 0.351886209, "ILS": 0.9286286905, "TRY": 1.4891316796, "BGN": 0.4479717813, "NZD": 0.3889919604, "HKD": 2.0232483566, "RON": 1.0815868435, "EUR": 0.2290478481, "MXN": 5.1024301977, "CAD": 0.3457706315, "AUD": 0.3694999885, "GBP": 0.2028401933, "KRW": 304.3816853341, "IDR": 3662.6949769807, "JPY": 27.902608855, "DKK": 1.7106896631, "ZAR": 3.8370782656, "HUF": 73.5930735931, "date": "2019-06-06T00:00:00Z"}}
{"type": "RECORD", "stream": "exchange_rate", "record": {"THB": 8.0836707152, "PHP": 13.4049269196, "CZK": 5.8651844736, "BRL": 1.0, "CHF": 0.2559756628, "INR": 17.9171527254, "ISK": 31.8625769116, "HRK": 1.6966536289, "PLN": 0.977012283, "NOK": 2.2407420115, "USD": 0.2578512775, "CNY": 1.7816738718, "RUB": 16.7641986322, "SEK": 2.4374299504, "MYR": 1.0723712802, "SGD": 0.3524097074, "ILS": 0.9283377936, "TRY": 1.5064845948, "BGN": 0.4473569844, "NZD": 0.389212928, "HKD": 2.0219126695, "RON": 1.0799881059, "EUR": 0.2287335026, "MXN": 5.0894347995, "CAD": 0.3444955283, "AUD": 0.3699306937, "GBP": 0.2028294334, "KRW": 305.638280839, "IDR": 3673.8649099934, "JPY": 27.9672453624, "DKK": 1.7081817974, "ZAR": 3.8811729454, "HUF": 73.5446830897, "date": "2019-06-07T00:00:00Z"}}
{"type": "RECORD", "stream": "exchange_rate", "record": {"THB": 8.0732502222, "PHP": 13.4168243043, "CZK": 5.8404175308, "BRL": 1.0, "CHF": 0.2552590195, "INR": 17.9238553228, "ISK": 31.9301684254, "HRK": 1.690247738, "PLN": 0.9717847619, "NOK": 2.229140552, "USD": 0.2575609089, "CNY": 1.7860154523, "RUB": 16.6691660779, "SEK": 2.4262141897, "MYR": 1.0723613738, "SGD": 0.352189074, "ILS": 0.9233083392, "TRY": 1.4930601454, "BGN": 0.4457460629, "NZD": 0.3894067507, "HKD": 2.0200560658, "RON": 1.0756432755, "EUR": 0.2279098389, "MXN": 4.957540397, "CAD": 0.3419559222, "AUD": 0.3700116234, "GBP": 0.203404973, "KRW": 305.3353693279, "IDR": 3668.9541217494, "JPY": 27.9827700162, "DKK": 1.7020762586, "ZAR": 3.8287257561, "HUF": 72.9790095038, "date": "2019-06-10T00:00:00Z"}}
{"type": "RECORD", "stream": "exchange_rate", "record": {"THB": 8.0657508031, "PHP": 13.3984917868, "CZK": 5.84147814, "BRL": 1.0, "CHF": 0.2559178001, "INR": 17.9107146925, "ISK": 32.0096598546, "HRK": 1.6891076026, "PLN": 0.9723190486, "NOK": 2.2258674504, "USD": 0.2578998929, "CNY": 1.7825849224, "RUB": 16.6365251862, "SEK": 2.4338277174, "MYR": 1.0735880437, "SGD": 0.3519012143, "ILS": 0.9237008179, "TRY": 1.4980065158, "BGN": 0.4455835782, "NZD": 0.3919987242, "HKD": 2.0208233659, "RON": 1.0765270089, "EUR": 0.2278267605, "MXN": 4.9388968628, "CAD": 0.3417857061, "AUD": 0.3707424874, "GBP": 0.2029594696, "KRW": 304.7137356754, "IDR": 3671.8497254688, "JPY": 28.0431959538, "DKK": 1.7015697264, "ZAR": 3.8078964755, "HUF": 73.0184767503, "date": "2019-06-11T00:00:00Z"}}
{"type": "RECORD", "stream": "exchange_rate", "record": {"THB": 8.1031126177, "PHP": 13.4708320927, "CZK": 5.8665628364, "BRL": 1.0, "CHF": 0.257712833, "INR": 17.9773711092, "ISK": 32.408785873, "HRK": 1.69785392, "PLN": 0.9758823664, "NOK": 2.2404434163, "USD": 0.2593389982, "CNY": 1.7944160693, "RUB": 16.7887359428, "SEK": 2.446118962, "MYR": 1.0781933533, "SGD": 0.3537481963, "ILS": 0.9289984196, "TRY": 1.5013398685, "BGN": 0.4479512609, "NZD": 0.3940816747, "HKD": 2.028583862, "RON": 1.0822015071, "EUR": 0.229037356, "MXN": 4.9612926868, "CAD": 0.3449073544, "AUD": 0.3731476604, "GBP": 0.203396624, "KRW": 306.4084652207, "IDR": 3693.3556263026, "JPY": 28.1074643274, "DKK": 1.7104051671, "ZAR": 3.827878427, "HUF": 73.6194773368, "date": "2019-06-12T00:00:00Z"}}
{"type": "RECORD", "stream": "exchange_rate", "record": {"THB": 8.1166962168, "PHP": 13.4831564162, "CZK": 5.8903037141, "BRL": 1.0, "CHF": 0.2580533745, "INR": 18.0696078657, "ISK": 32.58191531, "HRK": 1.7068778927, "PLN": 0.9803126943, "NOK": 2.2501093739, "USD": 0.2599415137, "CNY": 1.7993506643, "RUB": 16.7923507334, "SEK": 2.4630546409, "MYR": 1.0837919363, "SGD": 0.3551313638, "ILS": 0.9332473693, "TRY": 1.5276197932, "BGN": 0.45034424, "NZD": 0.3960717493, "HKD": 2.0349305763, "RON": 1.0873149278, "EUR": 0.2302608856, "MXN": 4.9824771466, "CAD": 0.3458748762, "AUD": 0.3761541827, "GBP": 0.2048124525, "KRW": 307.5686753091, "IDR": 3715.3445854153, "JPY": 28.1931428308, "DKK": 1.7195422414, "ZAR": 3.8655276428, "HUF": 74.1440051578, "date": "2019-06-13T00:00:00Z"}}
{"type": "RECORD", "stream": "exchange_rate", "record": {"THB": 8.0835041338, "PHP": 13.4811044838, "CZK": 5.8816756097, "BRL": 1.0, "CHF": 0.2581811482, "INR": 18.1028487207, "ISK": 32.5864173364, "HRK": 1.7065840684, "PLN": 0.9795269788, "NOK": 2.2506045183, "USD": 0.2594247288, "CNY": 1.796213988, "RUB": 16.6704281141, "SEK": 2.4500840568, "MYR": 1.0819381434, "SGD": 0.3547198489, "ILS": 0.9330999701, "TRY": 1.5297653317, "BGN": 0.4504064666, "NZD": 0.3970476476, "HKD": 2.0304907537, "RON": 1.0877415195, "EUR": 0.230292702, "MXN": 4.9760035004, "CAD": 0.3458535799, "AUD": 0.3759298068, "GBP": 0.205174677, "KRW": 307.1183474196, "IDR": 3714.1837275177, "JPY": 28.0726803768, "DKK": 1.7197337816, "ZAR": 3.8350413375, "HUF": 74.0460124819, "date": "2019-06-14T00:00:00Z"}}
{"type": "RECORD", "stream": "exchange_rate", "record": {"THB": 8.0363503516, "PHP": 13.403735501, "CZK": 5.83614942, "BRL": 1.0, "CHF": 0.2560507809, "INR": 17.9415928395, "ISK": 32.3088866563, "HRK": 1.6909078455, "PLN": 0.9728057357, "NOK": 2.2362316193, "USD": 0.2565074436, "CNY": 1.7762809389, "RUB": 16.4725545712, "SEK": 2.4316147593, "MYR": 1.0711708832, "SGD": 0.3516302859, "ILS": 0.9258608092, "TRY": 1.5058452827, "BGN": 0.4465704631, "NZD": 0.3940770847, "HKD": 2.0084710933, "RON": 1.0795506439, "EUR": 0.2283313545, "MXN": 4.9157913965, "CAD": 0.344072518, "AUD": 0.3734359302, "GBP": 0.2036898347, "KRW": 304.4227783359, "IDR": 3677.0344323683, "JPY": 27.8701251256, "DKK": 1.7050415563, "ZAR": 3.8008722258, "HUF": 73.5889122294, "date": "2019-06-17T00:00:00Z"}}
{"type": "RECORD", "stream": "exchange_rate", "record": {"THB": 8.0906425323, "PHP": 13.451157851, "CZK": 5.9125435781, "BRL": 1.0, "CHF": 0.2580056796, "INR": 18.0085655577, "ISK": 32.6691755362, "HRK": 1.7097638123, "PLN": 0.9833537275, "NOK": 2.257867153, "USD": 0.2582827327, "CNY": 1.788977905, "RUB": 16.5951792764, "SEK": 2.4568374391, "MYR": 1.0799990765, "SGD": 0.3541200102, "ILS": 0.9328146284, "TRY": 1.5080922587, "BGN": 0.4515503429, "NZD": 0.3969247108, "HKD": 2.0234802484, "RON": 1.090480918, "EUR": 0.2308775656, "MXN": 4.9392792002, "CAD": 0.3466857525, "AUD": 0.377461732, "GBP": 0.20641147, "KRW": 306.0420658925, "IDR": 3699.254265463, "JPY": 27.9546556461, "DKK": 1.724008958, "ZAR": 3.7813589454, "HUF": 74.4141481772, "date": "2019-06-18T00:00:00Z"}}
{"type": "RECORD", "stream": "exchange_rate", "record": {"THB": 8.0902040439, "PHP": 13.4281229803, "CZK": 5.9170436709, "BRL": 1.0, "CHF": 0.2579632536, "INR": 18.0213045887, "ISK": 32.6608808051, "HRK": 1.7085910812, "PLN": 0.984904441, "NOK": 2.2577555166, "USD": 0.2586787924, "CNY": 1.7860308374, "RUB": 16.5580278829, "SEK": 2.466715908, "MYR": 1.0793093897, "SGD": 0.3536838704, "ILS": 0.933016342, "TRY": 1.5145877574, "BGN": 0.4514356938, "NZD": 0.3963853753, "HKD": 2.0238897609, "RON": 1.0911965654, "EUR": 0.2308189456, "MXN": 4.9520358231, "CAD": 0.3459975995, "AUD": 0.376304127, "GBP": 0.2054750254, "KRW": 304.3624780722, "IDR": 3691.2173391192, "JPY": 28.0421937033, "DKK": 1.7235481488, "ZAR": 3.755262672, "HUF": 74.7784138122, "date": "2019-06-19T00:00:00Z"}}
{"type": "RECORD", "stream": "exchange_rate", "record": {"THB": 8.0581473034, "PHP": 13.398530191, "CZK": 5.9020434492, "BRL": 1.0, "CHF": 0.2566867095, "INR": 18.0861381805, "ISK": 32.5984288248, "HRK": 1.7056004792, "PLN": 0.9803948672, "NOK": 2.2272444537, "USD": 0.2604879397, "CNY": 1.7854723892, "RUB": 16.4580136844, "SEK": 2.4500195821, "MYR": 1.0804939296, "SGD": 0.3533070703, "ILS": 0.932061649, "TRY": 1.5041583155, "BGN": 0.4505724883, "NZD": 0.395281867, "HKD": 2.0360540927, "RON": 1.0888105605, "EUR": 0.2303775889, "MXN": 4.9291588914, "CAD": 0.3429400788, "AUD": 0.3760453383, "GBP": 0.2053931394, "KRW": 302.3129909922, "IDR": 3694.3695717281, "JPY": 28.0392563411, "DKK": 1.7200221162, "ZAR": 3.7161978483, "HUF": 74.545580206, "date": "2019-06-20T00:00:00Z"}}
{"type": "STATE", "value": {"start_date": "2019-06-20"}}

Important note: the exact same tap configuration works when using target-gsheet and target-csv, for which I get the complete set of columns (one for each currency) and the date.

Is there any additional configuration I need to do to make this work with Postgres' target?

Thanks in advance.

Enabling case sensitive table and column names

I would like to understand the reason for forcing table and column names to be lower case.

In postgres.py, on lines 385 and 392, the regexes only accept lowercase identifiers (and the error messages are in line with that).

What would be the potential consequences of allowing case-sensitive identifier names? For example, changing the regexes to:

'^[a-z_A-Z].*' and '^[a-z0-9_$A-Z]+$'

Thank you.
Regards

Column name mapping, invalid row detection

Motivation

In #43 we continue to expound on what the SQLInterface will and will not do. It is quite likely that we will end up bringing column mappings under its purview, as column splitting etc. is likely a shared concern in most/all RDBMS we'll want to support. For the time being though, we fill up a default_row with NULL_DEFAULT values. If one of those pre-filled fields is not allowed to be null in the remote, we will get a persistence error upon trying to load the data.

While the above is somewhat specific to the mentioned PR, this same logic exists in our current implementation.

Question

Do we want to add record validation on values after they've been serialized?

ie, do we want to perform:

https://github.com/datamill-co/target-postgres/pull/43/files#diff-5b906fece7e68cf803bb91e89b80047aR492

Draft4Validator(remote_schema['schema'], ...).validate(row)
serialized_rows.append(row)
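
A self-contained sketch of what that post-serialization validation could look like (the schema and rows below are hypothetical):

from jsonschema import Draft4Validator, FormatChecker

# Hypothetical remote schema and serialized rows.
remote_schema = {'schema': {'type': 'object',
                            'properties': {'id': {'type': 'integer'},
                                           'name': {'type': ['string', 'null']}},
                            'required': ['id']}}

validator = Draft4Validator(remote_schema['schema'], format_checker=FormatChecker())

serialized_rows = []
for row in [{'id': 1, 'name': 'ok'}, {'name': 'missing id'}]:
    errors = list(validator.iter_errors(row))
    if errors:
        print('invalid row:', row, '-', errors[0].message)
    else:
        serialized_rows.append(row)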

Suggested Musical Pairing

https://soundcloud.com/cleacleamusic/bright-blue

Existing non-nullable column, not present in newly streamed schema

Motivation

When we remove a column from a schema definition, it implies that the column is nullable. At present, target-postgres does not treat the absence of a property as meaning the corresponding column should be made nullable.

Ex

2019-10-14 01:45:16,259 src.load.bulk.target_postgres {} INFO: Root table name jobs
2019-10-14 01:45:16,275 src.load.bulk.target_postgres {} INFO: Writing batch with 107 records for `jobs` with `key_properties`: `['id', 'source']`
2019-10-14 01:45:16,291 src.load.bulk.target_postgres {} INFO: Writing table batch schema for `('jobs',)`...
2019-10-14 01:45:18,341 src.load.bulk.target_postgres {} WARNING: NOT EMPTY: Forcing new column `('location__raw_id',)` in table `jobs` to be nullable due to table not empty.
2019-10-14 01:45:18,395 src.load.bulk.target_postgres {} INFO: Table Schema Change [`jobs`.`('location__raw_id',)`:`location__raw_id`] New column, non empty table (took 53 millis)
2019-10-14 01:45:18,418 src.load.bulk.target_postgres {} INFO: METRIC: {"type": "timer", "metric": "job_duration", "value": 2.114116907119751, "tags": {"job_type": "upsert_table_schema", "path": ["jobs"], "database": "bog", "schema": "raw__singer", "table": "jobs", "status": "succeeded"}}
2019-10-14 01:45:18,420 src.load.bulk.target_postgres {} INFO: Writing table batch with 107 rows for `('jobs',)`...
2019-10-14 01:45:18,622 src.load.bulk.target_postgres {} INFO: METRIC: {"type": "counter", "metric": "record_count", "value": 0, "tags": {"count_type": "table_rows_persisted", "path": ["jobs"], "database": "bog", "schema": "raw__singer", "table": "jobs"}}
2019-10-14 01:45:18,623 src.load.bulk.target_postgres {} INFO: METRIC: {"type": "timer", "metric": "job_duration", "value": 2.334609031677246, "tags": {"job_type": "table", "path": ["jobs"], "database": "bog", "schema": "raw__singer", "table": "jobs", "status": "failed"}}
2019-10-14 01:45:18,625 src.load.bulk.target_postgres {} INFO: METRIC: {"type": "counter", "metric": "record_count", "value": 0, "tags": {"count_type": "batch_rows_persisted", "path": ["jobs"], "database": "bog", "schema": "raw__singer"}}
2019-10-14 01:45:18,626 src.load.bulk.target_postgres {} INFO: METRIC: {"type": "timer", "metric": "job_duration", "value": 2.3506481647491455, "tags": {"job_type": "batch", "path": ["jobs"], "database": "bog", "schema": "raw__singer", "status": "failed"}}
2019-10-14 01:45:18,641 src.load.bulk.target_postgres {} ERROR: Exception writing records
2019-10-14 01:45:18,657 src.load.bulk.target_postgres {} ERROR: Traceback (most recent call last):
2019-10-14 01:45:18,660 src.load.bulk.target_postgres {} ERROR:   File "/code/venv/target-postgres/lib/python3.7/site-packages/target_postgres/postgres.py", line 303, in write_batch
2019-10-14 01:45:18,662 src.load.bulk.target_postgres {} ERROR:     {'version': target_table_version})
2019-10-14 01:45:18,665 src.load.bulk.target_postgres {} ERROR:   File "/code/venv/target-postgres/lib/python3.7/site-packages/target_postgres/sql_base.py", line 847, in write_batch_helper
2019-10-14 01:45:18,667 src.load.bulk.target_postgres {} ERROR:     metadata)
2019-10-14 01:45:18,669 src.load.bulk.target_postgres {} ERROR:   File "/code/venv/target-postgres/lib/python3.7/site-packages/target_postgres/postgres.py", line 616, in write_table_batch
2019-10-14 01:45:18,672 src.load.bulk.target_postgres {} ERROR:     csv_rows)
2019-10-14 01:45:18,674 src.load.bulk.target_postgres {} ERROR:   File "/code/venv/target-postgres/lib/python3.7/site-packages/target_postgres/postgres.py", line 566, in persist_csv_rows
2019-10-14 01:45:18,676 src.load.bulk.target_postgres {} ERROR:     cur.copy_expert(copy, csv_rows)
2019-10-14 01:45:18,679 src.load.bulk.target_postgres {} ERROR: psycopg2.errors.NotNullViolation: null value in column "location" violates not-null constraint
2019-10-14 01:45:18,682 src.load.bulk.target_postgres {} ERROR: DETAIL:  Failing row contains (3f61fc1debcf5fddbf4966730369104cda9ded40,  ...

From the above, we can see that our new column location__raw_id is being created and made nullable, but our existing column, location, is retaining its NOT NULL constraint, ultimately causing the load to fail.

Proposal

Make the upsert logic handle absence of a key in the Singer Schema as making the corresponding column nullable.
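
On the Remote side that would boil down to relaxing the constraint, roughly like this (a sketch; the schema, table, and column names are taken from the log above):

import psycopg2
from psycopg2 import sql

conn = psycopg2.connect(host="localhost", dbname="bog", user="myuser")
with conn, conn.cursor() as cur:
    # `location` disappeared from the incoming Singer schema, so drop its NOT NULL
    # constraint before loading batches that no longer contain the field.
    cur.execute(sql.SQL('ALTER TABLE {}.{} ALTER COLUMN {} DROP NOT NULL').format(
        sql.Identifier('raw__singer'),
        sql.Identifier('jobs'),
        sql.Identifier('location')))
conn.close()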

JSON Schema $ref is not supported

  • Add support for self-referencing $ref (reference is within the same document); see the example below.
  • Identify level of effort for resolving HTTP(S) $refs. Then comment and collaborate to determine if level of effort meets benefit.
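
For reference, the first bullet is about same-document references like the one below (a hedged example; circular references would still be out of scope):

from jsonschema import Draft4Validator

# A same-document, non-circular $ref: "price" resolves to "#/definitions/money".
schema = {
    "type": "object",
    "definitions": {
        "money": {"type": "object",
                  "properties": {"amount": {"type": "number"},
                                 "currency": {"type": "string"}}}
    },
    "properties": {"price": {"$ref": "#/definitions/money"}}
}

Draft4Validator.check_schema(schema)
Draft4Validator(schema).validate({"price": {"amount": 9.99, "currency": "USD"}})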

Date-Time validation when syncing from tap

I am attempting to leverage this target by running it against my tap, and I get a ValidationError for my date-time 2018-12-03T16:59:03. In my schema I do specify this as a string with a format of date-time... what do you believe I am missing?

Here is a sample of the output from the console:

INFO GET https://noho.api.rentmanager.com/Tenants?fields=FirstName,IsCompany,IsProspect,LastName,Name,PropertyID,RentDueDay,RentPeriod,Status,TenantID,UpdateDate&pagesize=100&pagenumber=1
INFO METRIC: {"type": "timer", "metric": "http_request_duration", "value": 8.24599003791809, "tags": {"http_status_code": 200, "status": "succeeded"}}
CRITICAL ('Invalid records detected above threshold: 0. See `.args` for details.', [(<ValidationError: "'2018-12-03T16:59:03' is not a 'date-time'">, {'type': 'RECORD', 'stream': 'tenants', 'record': {'TenantID': 554, 'Name': 'Michael Andrews', 'FirstName': 'Michael', 'LastName': 'Andrews', 'IsCompany': False, 'RentDueDay': 1, 'RentPeriod': 'Monthly', 'IsProspect': False, 'PropertyID': 90, 'UpdateDate': '2018-12-03T16:59:03', 'Status': 'Past'}})])
WARNING Invalid records detected for stream tenants: [(<ValidationError: "'2018-12-03T16:59:03' is not a 'date-time'">, {'type': 'RECORD', 'stream': 'tenants', 'record': {'TenantID': 554, 'Name': 'Michael Andrews', 'FirstName': 'Michael', 'LastName': 'Andrews', 'IsCompany': False, 'RentDueDay': 1, 'RentPeriod': 'Monthly', 'IsProspect': False, 'PropertyID': 90, 'UpdateDate': '2018-12-03T16:59:03', 'Status': 'Past'}})]
Traceback (most recent call last):
  File "c:\users\jim\appdata\local\programs\python\python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\jim\appdata\local\programs\python\python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Jim\AppData\Local\Programs\Python\Python37\Scripts\target-postgres.exe\__main__.py", line 9, in <module>
  File "c:\users\jim\appdata\local\programs\python\python37\lib\site-packages\target_postgres\__init__.py", line 36, in cli
    main(args.config)
  File "c:\users\jim\appdata\local\programs\python\python37\lib\site-packages\target_postgres\__init__.py", line 30, in main
    target_tools.main(postgres_target)
  File "c:\users\jim\appdata\local\programs\python\python37\lib\site-packages\target_postgres\target_tools.py", line 25, in main
    stream_to_target(input_stream, target, config=config)
  File "c:\users\jim\appdata\local\programs\python\python37\lib\site-packages\target_postgres\target_tools.py", line 69, in stream_to_target
    raise e
  File "c:\users\jim\appdata\local\programs\python\python37\lib\site-packages\target_postgres\target_tools.py", line 58, in stream_to_target
    line)
  File "c:\users\jim\appdata\local\programs\python\python37\lib\site-packages\target_postgres\target_tools.py", line 151, in _line_handler
    streams[line_data['stream']].add_record_message(line_data)
  File "c:\users\jim\appdata\local\programs\python\python37\lib\site-packages\target_postgres\singer_stream.py", line 156, in add_record_message
    self.invalid_records)

If you need an example of the schema file or anything else please let me know.

Supported Versions of :all-the-things:

Problem

Presently we offer no guidance as to what versions of any of our dependencies we support/rely on.

The two which come to mind most readily:

  • PostgreSQL
  • JSON Schema

There are significant differences between each version of PSQL and each draft of JSON Schema.

Suggestion

For JSON Schema we pick a single version we support. For PSQL, I think that supporting multiple versions is reasonable and should be as simple as updating our CI to fan out over various versions of PSQL.

Suggested Musical Pairing

https://soundcloud.com/sylvanesso/hskt

Types cannot change / mixed types

If a column type changes, the target should be able to handle that. There is no Singer standard for handling this, so we could look to Stitch's standard. Though I think we can use JSON Schema to our advantage since, at least for now, we are assuming it is always available and not handling arbitrary JSON.

I think this comes down to:

  • Identify a set of rules for handling data type changes. Try to maximize user experience, and minimize negative consequences of changing the schema.
  • Implement those rules and create tests around each combination of type change cases.

Empty `key_properties` not tested

Motivation

I think there is a bug for any stream which is nested (i.e., will result in multiple SQL tables) and which does not have key_properties set. The bug specifically is that I don't think any of the resulting child tables will have any keys, which would mean the data is disconnected in PostgreSQL.

The goal of this issue is to add an explicit test for this case, and then if it's broken...fix it...

Suggested Musical Pairing

https://soundcloud.com/maverick-sabre/come-fly-away-1

Pass STATE messages to stdout

In order for taps that allow incremental processing to work properly, they must have state data that informs where to pick up during the next sync. Currently, this target does not emit state messages.

This behavior seems to run counter to the specs. Notably, the Running and Developing Singer Taps and Targets document states:

Write State messages to stdout once all data that appeared in the stream before the State message has been processed by the Target. Note that although the State message is sent into the target, in most cases the Target's process won't actually store it anywhere or do anything with it other than repeat it back to stdout.
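
A minimal sketch of that behaviour, assuming the target buffers the most recent STATE message and emits it only after the preceding records have been flushed:

import json
import sys

def emit_state(state):
    """Repeat the most recently seen STATE message back to stdout once all
    records that preceded it have been persisted."""
    if state is not None:
        sys.stdout.write(json.dumps(state) + '\n')
        sys.stdout.flush()

# Hypothetical flush cycle:
#   flush_buffered_records()
#   emit_state(last_state_message['value'])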

Failure on Adding a column that already exists.

When running tap-hubspot | target-redshift, during the writing of the initial batch, an error occurs caused by trying to add a column that already exists:

INFO METRIC: {"type": "timer", "metric": "http_request_duration", "value": 0.6819870471954346, "tags": {"endpoint": "email", "http_status_code": 200, "status": "succeeded"}}
INFO email_events - Writing batch (35631)
ERROR Exception writing records
Traceback (most recent call last):
  File "/Users/kevinsanz/.virtualenvs/target-sql/lib/python3.6/site-packages/target_sql/target_sql.py", line 151, in write_batch
    target_table_version)
  File "/Users/kevinsanz/.virtualenvs/target-sql/lib/python3.6/site-packages/target_sql/target_sql.py", line 459, in upsert_table_schema
    schema)
  File "/Users/kevinsanz/.virtualenvs/target-sql/lib/python3.6/site-packages/target_sql/target_sql.py", line 765, in merge_put_schemas
    default_value)
  File "/Users/kevinsanz/.virtualenvs/target-sql/lib/python3.6/site-packages/target_sql/target_sql.py", line 750, in add_column
    default_value=default_value))
psycopg2.ProgrammingError: column "appid" of relation "email_events" already exists

CRITICAL column "appid" of relation "email_events" already exists

CRITICAL column "appid" of relation "email_events" already exists

Not sure why this is happening given that merge_put_schemas seems to account for this.

Performance Testing

Ideally, we would have a script/CLI that could test / simulate different loads

  • Testing for load size in rows (10k, 100k, 1m, 10m, etc.)
  • Testing for load concurrency in number of tables (as well as number of rows); see the generator sketch below
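
A rough sketch of the kind of generator such a script could be built on (the stream name and schema are hypothetical):

import json
import random

def generate_stream(stream, num_rows):
    """Yield Singer SCHEMA and RECORD lines for a synthetic load test."""
    yield json.dumps({"type": "SCHEMA", "stream": stream,
                      "schema": {"type": "object",
                                 "properties": {"id": {"type": "integer"},
                                                "value": {"type": "number"}}},
                      "key_properties": ["id"]})
    for i in range(num_rows):
        yield json.dumps({"type": "RECORD", "stream": stream,
                          "record": {"id": i, "value": random.random()}})

# e.g. pipe 100k synthetic rows into target-postgres:
#   python generate_load.py | target-postgres --config target_postgres_config.json
if __name__ == "__main__":
    for line in generate_stream("load_test", 100000):
        print(line)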

No data in database if nested objects within array

Hi,
I think I have encountered a problem with nested objects in arrays. An object in an array is fine, but when I have an object within another object in an array (array->object->object) a problem arises. No data is seen in that last-level object even though I can see in the record data that values exist. If I instead have an array at the last level (array->object->array) it behaves as expected.

I have done tests with this in the module test_sandbox using a modified CATS_SCHEMA and FakeStream.generate_record_message (also modified to match my schema) to generate test data. If, in the test, I make the last object's properties optional by using "type": ["null", "string"] for example, everything passes because the object's properties are all null. However, if I use "type": "string" for a property, the test fails with:

CRITICAL ('Exception writing records', IntegrityError('null value in column "adoption__immunizations__vaccination_type__shot" violates not-null constraint\nDETAIL: Failing row contains (Rabies, 2537-09-12 15:34:00+02, null, 1, 1554384634, 0).\nCONTEXT: COPY tmp_36a9e5fb_ac07_49f0_ac26_8fa9f1f69885, line 1: "Rabies,2537-09-12 13:34:00.0000+00:00,NULL,1,1554384634,0"\n'))

To me it seems like this case isn't handled and I can only find tests of array_of_array and object_of_object in the tests directory. Does anyone have a clue and can point me in the right direction?

I will attach a snippet of what I added to the test_sandbox.py file so that you can test my examples and more easily understand my problem.

class CatStreamObject(SandboxStream):
    stream = [
        {"type": "SCHEMA",
         "stream": "cats",
         "schema": {
             "additionalProperties": False,
             "properties": {
                 "id": {"type": "integer"},
                 "name": {"type": ["string"]},
                 "paw_size": {"type": ["integer"],
                              "default": 314159},
                 "paw_colour": {"type": "string", "default": ""},
                 "flea_check_complete": {"type": ["boolean"],
                                         "default": False},
                 "pattern": {"type": ["null", "string"]},
                 "age": {"type": ["null", "integer"]},
                 "adoption": {"type": ["object", "null"],
                              "properties": {"adopted_on": {
                                  "type": ["null", "string"],
                                  "format": "date-time"},
                                  "was_foster": {
                                      "type": "boolean"},
                                  "immunizations": {
                                      "type": ["null",
                                               "array"],
                                      "items": {
                                          "type": [
                                              "object"],
                                          "properties": {
                                              "type": {
                                                  "type": [
                                                      "null",
                                                      "string"]},
                                              "date_administered": {
                                                  "type": [
                                                      "null",
                                                      "string"],
                                                  "format": "date-time"},
                                              "vaccination_type": {
                                                  "type": "object",
                                                  "properties": {
                                                      "shot": {
                                                          "type": [
                                                              "null",
                                                              "string"]}}}}}}}}}},
         "key_properties": ["id"]},
        {"type": "RECORD",
         "stream": "cats",
         "record": {
             "id": 1,
             "name": "Morgan",
             "pattern": "Tortoiseshell",
             "age": 14,
             "adoption": {
                 "adopted_on": "2633-01-02T00:11:00",
                 "was_foster": False,
                 "immunizations": [
                     {"type": "Rabies", "date_administered": "2537-09-12T13:34:00",
                      "vaccination_type": {"shot": "Yes"}},
                     {"type": "Panleukopenia", "date_administered": "2889-03-01T17:18:00",
                      "vaccination_type": {"shot": "No"}},
                     {"type": "Feline Leukemia", "date_administered": "2599-08-08T07:47:00",
                      "vaccination_type": {"shot": "No"}},
                     {"type": "Feline Leukemia", "date_administered": "2902-04-14T01:34:00",
                      "vaccination_type": {"shot": "No"}}]}
         },
         "sequence": 1554384634}
    ]


def test_cat_stream_object__sandbox(db_cleanup):
    config = CONFIG.copy()
    main(config, input_stream=CatStreamObject())

    with psycopg2.connect(**TEST_DB) as conn:
        with conn.cursor() as cur:
            assert_tables_equal(cur,
                                {'cats', 'cats__adoption__immunizations'})

            assert_columns_equal(cur,
                                 'cats__adoption__immunizations',
                                 {
                                     ('_sdc_level_0_id', 'bigint', 'NO'),
                                     ('_sdc_sequence', 'bigint', 'YES'),
                                     ('_sdc_source_key_id', 'bigint', 'NO'),
                                     ('date_administered', 'timestamp with time zone', 'YES'),
                                     ('type', 'text', 'YES'),
                                     ('adoption__immunizations__vaccination_type__shot', 'text', 'YES')
                                 })

class CatStreamArray(SandboxStream):
    stream = [
        {"type": "SCHEMA",
         "stream": "cats",
         "schema": {
             "additionalProperties": False,
             "properties": {
                 "id": {"type": "integer"},
                 "name": {"type": ["string"]},
                 "paw_size": {"type": ["integer"],
                              "default": 314159},
                 "paw_colour": {"type": "string", "default": ""},
                 "flea_check_complete": {"type": ["boolean"],
                                         "default": False},
                 "pattern": {"type": ["null", "string"]},
                 "age": {"type": ["null", "integer"]},
                 "adoption": {"type": ["object", "null"],
                              "properties": {"adopted_on": {
                                  "type": ["null", "string"],
                                  "format": "date-time"},
                                  "was_foster": {
                                      "type": "boolean"},
                                  "immunizations": {
                                      "type": ["null",
                                               "array"],
                                      "items": {
                                          "type": [
                                              "object"],
                                          "properties": {
                                              "type": {
                                                  "type": [
                                                      "null",
                                                      "string"]},
                                              "date_administered": {
                                                  "type": [
                                                      "null",
                                                      "string"],
                                                  "format": "date-time"},
                                              "vaccination_type": {
                                                  "type": "array",
                                                  "items": {"type": "string"}}}}}}}}},
         "key_properties": ["id"]},
        {"type": "RECORD",
         "stream": "cats",
         "record": {
             "id": 1,
             "name": "Morgan",
             "pattern": "Tortoiseshell",
             "age": 14,
             "adoption": {
                 "adopted_on": "2633-01-02T00:11:00",
                 "was_foster": False,
                 "immunizations": [
                     {"type": "Rabies", "date_administered": "2537-09-12T13:34:00",
                      "vaccination_type": ["shot", "Yes"]},
                     {"type": "Panleukopenia", "date_administered": "2889-03-01T17:18:00",
                      "vaccination_type": ["shot", "No"]}]}
         },
         "sequence": 1554384634}
    ]


def test_cat_stream_array__sandbox(db_cleanup):
    config = CONFIG.copy()
    main(config, input_stream=CatStreamArray())

    with psycopg2.connect(**TEST_DB) as conn:
        with conn.cursor() as cur:
            assert_tables_equal(cur,
                                {'cats', 'cats__adoption__immunizations', 'cats__adoption__immunizations__vaccination_type'})

            assert_columns_equal(cur,
                                 'cats__adoption__immunizations',
                                 {
                                     ('_sdc_level_0_id', 'bigint', 'NO'),
                                     ('_sdc_sequence', 'bigint', 'YES'),
                                     ('_sdc_source_key_id', 'bigint', 'NO'),
                                     ('date_administered', 'timestamp with time zone', 'YES'),
                                     ('type', 'text', 'YES')
                                 })

Column name normalization and error handling

Currently, target-postgres blindly uses the JSON Schema property name as the column name https://github.com/datamill-co/target-postgres/blob/master/target_postgres/postgres.py#L597

However, postgres (and redshift) have limitations on what characters can be used in column names, and the length of a column name.

The target should attempt to normalize and shorten a column name, and detect conflicts with other columns after doing so, throwing an exception if a conflict is detected.

From https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS

SQL identifiers and key words must begin with a letter (a-z, but also letters with diacritical marks and non-Latin letters) or an underscore (_). Subsequent characters in an identifier or key word can be letters, underscores, digits (0-9), or dollar signs ($). Note that dollar signs are not allowed in identifiers according to the letter of the SQL standard, so their use might render applications less portable. The SQL standard will not define a key word that contains digits or starts or ends with an underscore, so identifiers of this form are safe against possible conflict with future extensions of the standard.

The system uses no more than NAMEDATALEN-1 bytes of an identifier; longer names can be written in commands, but they will be truncated. By default, NAMEDATALEN is 64 so the maximum identifier length is 63 bytes. If this limit is problematic, it can be raised by changing the NAMEDATALEN constant in src/include/pg_config_manual.h.
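
A sketch of that normalize-and-truncate step (illustrative only; the real implementation would also need to detect and resolve post-truncation conflicts):

import re

NAMEDATALEN = 64  # Postgres default; identifiers are truncated to NAMEDATALEN - 1 bytes

def normalize_identifier(name):
    """Lowercase, replace unsupported characters, and truncate to 63 characters."""
    name = name.lower()
    name = re.sub(r'[^a-z0-9_$]', '_', name)
    if not re.match(r'[a-z_]', name):
        name = '_' + name
    return name[:NAMEDATALEN - 1]

print(normalize_identifier('Order-Total ($CAD)'))  # -> order_total__$cad_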

Optionally validate data using JSON Schema

As records stream in, validate them against their JSON Schema using a python JSON Schema library.

This validation should be optional and turned on/off via CLI flag or config JSON.

We may want to optionally allow for a certain number of records that would error and be rejected, similar to Redshift loading, e.g. allow for 5 erroneous documents.

Table Name Normalization

Motivation

Split out from: #18

Presently we DO NOT handle any canonicalization in regard to table names.

Notes

We could add metadata to the root table's metadata which maps paths to a canonicalized table name.

ie, something in the metadata like:

{'table_mappings':
  {('root_table_NAME'): 'root_table_name',
   ('root_table_NAME', 'sub_field_abc'): 'root_table_name__sub_field_abc',
   ('root_table_NAME', 'sub_field_abc', 'sub_sub'): 'root_table_name__sub_field_abc__sub_sub',
   ('root_table_NAME', 'suppper....lonngfield...thiswillbetruncated'): 'root_table_name__suppper....lonngfield',

Suggested Musical Pairing

https://soundcloud.com/peterbjornjohn/up-against-the-wall

Metrics

The Singer standard includes metrics that are published via INFO log messages in a standard JSON format. target-postgres should publish some metrics for monitoring and diagnostic purposes.

Metrics should include:

  • SQL Query timings
  • Overall write_batch timings
  • Batch counts
  • Table persisted row counts (which may be higher than batch counts due to denesting); see the log-format sketch below
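
A sketch of emitting one such metric from Python's standard logging (the format mirrors the METRIC lines seen in logs elsewhere in this document; the tags are hypothetical):

import json
import logging
import time

logging.basicConfig(level=logging.INFO, format='%(levelname)s %(message)s')
LOGGER = logging.getLogger('target_postgres')

def log_timer_metric(metric, start, tags):
    """Emit a Singer-style metric line, e.g.
    INFO METRIC: {"type": "timer", "metric": "job_duration", "value": 1.2, "tags": {...}}"""
    LOGGER.info('METRIC: %s', json.dumps({'type': 'timer',
                                          'metric': metric,
                                          'value': time.time() - start,
                                          'tags': tags}))

start = time.time()
# ... write a batch ...
log_timer_metric('job_duration', start, {'job_type': 'batch', 'status': 'succeeded'})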

Collaboration and contributions

How should members of the community help contribute and collaborate with the team at Data Mill?

From my experience I recommend joining singer-io.slack.com and asking questions in the general chat and on each tap/target's specific channel.

Start a conversation with the owners of the repo. Here specifically @awm33 @AlexanderMann then:

File new issue -> Fork -> PR -> Approval -> Merge.

Unknown Column Path

Motivation

While using tap-hubspot, Singer Slack User Patryk Kallinowski ran into the following error:

target_postgres.postgres.PostgresError: ('Exception writing records', Exception("Unknown column path: ('properties', 'num_contacted_notes', 'value') for table: ('deals',)"))

After uploading his catalog.json file generated by tap-hubspot, the issue appears to be around this stream/sub-schema:

{
  "streams": [
    {
      "stream": "deals",
      "tap_stream_id": "deals",
      "schema": {
        "type": "object",
        "properties": {
          ...
          "properties": {
            "type": "object",
            "properties": {
              ...
              "num_contacted_notes": {
                "type": "object",
                "properties": {
                  "value": {
                    "type": ["null", "number", "string"]
                  },
                  ...
                }
              },
              ...
            }
          }
        }
      },
      "metadata": [
        ...
      ]
    }
  ]
}

At the very least, this error message can be improved to:

  • contain more context and information for those debugging
  • more descriptive/better language to identify what precisely has gone wrong
  • be associated with a function which has a complete docstring

And:

  • more testing to assure that whatever underlying bug (if any) is resolved

Notes

https://singer-io.slack.com/archives/C2TGFCZEV/p1552316668073400

catalog.json can't be uploaded as a part of this issue, available via download in the link above.

Suggested Musical Pairing

https://www.youtube.com/watch?v=PfrHCNo2I3M

Save properties not found in the jsonschema in a JSONB column

It makes a lot of sense that target-postgres tries to load data in a structured way into Postgres. The more that the database knows about the structure of the data, the faster and more efficient the database can be when querying it.

That said, lots of the singer taps don't or can't know the exact, specific structure of the data they are extracting beforehand. Stitch itself deals with this in a variety of ways, but it does indeed deal with it somehow. In my experience feeding web analytics event data from something like Segment into Redshift using Stitch, the event properties can be named whatever the heck developers want, and Stitch somewhat magically adds columns each time a new property is discovered. This is nice because no schema has to be updated anywhere and the pipeline doesn't break while someone figures out where to update it.

target-postgres doesn't need to be that smart, after all, you could just use Stitch, but, I think it could do a bit better than just dropping unknown columns data (see #127).

I think a nice middle ground would be to keep using the structured smarts that can create nice tables with good relational structure for things specified by the JSONSchema, but then loading unrecognized columns into a JSONB column called additional_properties or something like that. This would allow us to at least get a handle on the data from Postgres land, and if you want to break it out into a real column or do something useful with it you could do so with a transform or SQL view or whatever. JSONB support is really quite nice in recent Postgres with all the partial indexing support and whatnot, and while it wouldn't be as fast as "real" columns, it can get close.
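
If such an additional_properties JSONB column existed, pulling an unexpected property back out could look roughly like this (a sketch; the column name is the proposal's, and the table and property names are hypothetical):

import psycopg2

conn = psycopg2.connect(host="localhost", dbname="my_analytics", user="myuser")
with conn, conn.cursor() as cur:
    # ->> extracts a JSONB field as text; a partial index over the same
    # expression could be added later if the property becomes query-critical.
    cur.execute("""
        SELECT id, additional_properties ->> 'utm_campaign'
        FROM mytapname.events
        WHERE additional_properties ? 'utm_campaign'
    """)
    print(cur.fetchall())
conn.close()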

I think this may be undesirable behaviour for some users as well though -- they may be purposefully ignoring bad quality data in some columns, or discarding it because it's simply way too big. That's ok, and Singer's catalog functionality has mechanisms to ignore stuff that those people could use, and we should likely add a configuration option to disable this functionality in the event that folks don't want it. But, if I've learned anything as a pipeline developer, it's that that data is probably there because someone thought it was important, so being able to capture it and save it somehow would be great.

Sadly, JSONB support only landed in Postgres 9.4, which is much more recent than the 8.4.22 version mentioned in DECISIONS.md. 8.4.22 is 5 years old and unsupported by the Postgres community at this point, whereas 9.4 is supported. So, we could consider upgrading the minimum required Postgres version, but that seems like a bit much for a net-new feature that people have been getting by without so far. So, I'd say that it'd be best to just detect JSONB support, and warn if the feature is enabled but JSONB is unavailable, or something like that. Is there precedent for that kind of behaviour in the project?
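For the detection piece, a minimal sketch (assuming psycopg2, whose connections expose server_version as an integer such as 90624 for 9.6.24) might look like this; the warning wording and where the check would live are open questions:

# Sketch only: JSONB requires Postgres >= 9.4, i.e. server_version >= 90400.
import logging
import psycopg2

def supports_jsonb(connection):
    return connection.server_version >= 90400

connection = psycopg2.connect('dbname=example')  # illustrative DSN
if not supports_jsonb(connection):
    logging.warning('additional_properties capture needs Postgres >= 9.4; disabling it')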

I will prepare a PR if y'all are into it!

psycopg2: Cannot bump to latest version

Motivation

While releasing 0.1.8, I tried to bump psycopg2 to its latest version and ran into significant issues. I am currently unsure as to what happened.

Failing CI build here

Investigation into why this is happening (even just noting what has changed in the psycopg2 CHANGELOG that might relate) is welcome!

Add commas to tests

In a dependent target, a comma inside an enclosed field value caused an issue.

Example - Here, the CSV parser would think the third column ends at "Sanford", causing a loading error:
1233,2018-01-04 08:23:34,"Sanford, Son, and Associates"

Most CSV parsers default to using double quotes (") to enclose field values, but relying on that assumption caused issues.

We should add string values with commas to tests.
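A minimal sketch of the kind of fixture being proposed (the actual test harness in target-postgres may differ; this just shows that a quoted value containing commas should survive a round trip as a single field):

# Round-trip a row whose third field contains commas inside double quotes.
import csv
import io

row = ['1233', '2018-01-04 08:23:34', 'Sanford, Son, and Associates']

buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_MINIMAL).writerow(row)
buffer.seek(0)

parsed = next(csv.reader(buffer))
assert parsed == row  # still three columns; commas preserved inside the quoted field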

Validate incoming JSON Schema

Validate incoming JSON Schema using a JSON Schema python library. Log a warning or throw an exception if the JSON Schema is not valid.
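A minimal sketch of what this could look like using the jsonschema library (which draft to validate against, and whether to warn or raise, are open questions; Draft4Validator is used here only as an example):

# Sketch: check an incoming SCHEMA message's schema before accepting the stream.
import logging
from jsonschema import Draft4Validator
from jsonschema.exceptions import SchemaError

def validate_incoming_schema(schema):
    try:
        Draft4Validator.check_schema(schema)
    except SchemaError as error:
        logging.warning('Invalid JSON Schema received: %s', error)
        raise  # or just warn, depending on the behaviour we settle on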

ValidationError: " '..' is not of type .. "

I ran into the following error using tap-closeio with target-postgres.

target_postgres.singer_stream.SingerStreamError: ('Invalid records detected above threshold: 0. See `.args` for details.', [(<ValidationError: "'2019-04-15T14:38:04.000000Z' is not of type 'object'">

It says that the date "2019-04-15T14:38:04.000000Z" is being treated as an object type, while it should be a string with date-time format, right?

I have added a file that contains the complete error, as well as a file that shows my schema.

postgres_error.txt

catalog_activities.zip
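For context, the error above is what jsonschema produces when a schema declares a field as an object but the record supplies a date-time string; whether that is what the attached catalog actually declares would need to be confirmed against the files above:

# Minimal reproduction of the ValidationError wording.
from jsonschema import validate, ValidationError

try:
    validate('2019-04-15T14:38:04.000000Z', {'type': 'object'})
except ValidationError as error:
    print(error.message)  # "'2019-04-15T14:38:04.000000Z' is not of type 'object'"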

Logging

Currently, target-postgres is a bit of a black box. We can't tell if it's progressing through the input Singer stream.

We want to:

  • Establish logging infrastructure within target-postgres so that code at any point can access the logger
  • Log each batch with stream name, table name, and record count (see the sketch below)
  • Log the host and database connection information, though of course nothing sensitive (like the password)
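As a rough sketch of the per-batch logging item above (logger name and message format are illustrative, not what target-postgres currently emits):

# Sketch: one INFO line per batch written.
import logging

LOGGER = logging.getLogger('target_postgres')

def log_batch(stream_name, table_name, records):
    LOGGER.info('Writing batch: stream=%s table=%s records=%d',
                stream_name, table_name, len(records))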

Connection is dropped

Hi, I'm trying to use target-postgres, but I'm having an issue with it. It seems that for a large amount of data, the query is too long and the connection is dropped.

The only way to fix it manually would be to split the input file, so as to make the connections shorter. Is there another way around it?

CRITICAL cursor already closed
Traceback (most recent call last):
  File "/home/ben/project/data/singer/.singer-postgres/lib/python3.6/site-packages/target_postgres/postgres.py", line 111, in write_batch
    {'version': target_table_version})
  File "/home/ben/project/data/singer/.singer-postgres/lib/python3.6/site-packages/target_postgres/sql_base.py", line 704, in write_batch_helper
    metadata)
  File "/home/ben/project/data/singer/.singer-postgres/lib/python3.6/site-packages/target_postgres/postgres.py", line 407, in write_table_batch
    {'version': remote_schema['version']})
  File "/home/ben/project/data/singer/.singer-postgres/lib/python3.6/site-packages/target_postgres/sql_base.py", line 309, in upsert_table_helper
    table_name = self.add_table_mapping(connection, table_path, metadata)
  File "/home/ben/project/data/singer/.singer-postgres/lib/python3.6/site-packages/target_postgres/postgres.py", line 236, in add_table_mapping
    sql.Literal(root_table)))
psycopg2.OperationalError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ben/project/data/singer/.singer-postgres/lib/python3.6/site-packages/target_postgres/__init__.py", line 26, in main
    target_tools.main(postgres_target)
  File "/home/ben/project/data/singer/.singer-postgres/lib/python3.6/site-packages/target_postgres/target_tools.py", line 25, in main
    stream_to_target(input_stream, target, config=config)
  File "/home/ben/project/data/singer/.singer-postgres/lib/python3.6/site-packages/target_postgres/target_tools.py", line 69, in stream_to_target
    raise e
  File "/home/ben/project/data/singer/.singer-postgres/lib/python3.6/site-packages/target_postgres/target_tools.py", line 63, in stream_to_target
    _flush_streams(streams, target, force=True)
  File "/home/ben/project/data/singer/.singer-postgres/lib/python3.6/site-packages/target_postgres/target_tools.py", line 88, in _flush_streams
    _flush_stream(target, stream_buffer)
  File "/home/ben/project/data/singer/.singer-postgres/lib/python3.6/site-packages/target_postgres/target_tools.py", line 81, in _flush_stream
    target.write_batch(stream_buffer)
  File "/home/ben/project/data/singer/.singer-postgres/lib/python3.6/site-packages/target_postgres/postgres.py", line 117, in write_batch
    cur.execute('ROLLBACK;')
psycopg2.InterfaceError: cursor already closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".singer-postgres/bin/target-postgres", line 11, in <module>
    sys.exit(cli())
  File "/home/ben/project/data/singer/.singer-postgres/lib/python3.6/site-packages/target_postgres/__init__.py", line 31, in cli
    main(args.config)
  File "/home/ben/project/data/singer/.singer-postgres/lib/python3.6/site-packages/target_postgres/__init__.py", line 26, in main
    target_tools.main(postgres_target)
psycopg2.InterfaceError: connection already closed
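Not a fix for the root cause, but one workaround worth trying (assuming the dropped connection correlates with how much data is written per batch) is lowering the batching limits in the target config so each write to Postgres is smaller, e.g. (connection settings omitted, values illustrative):

{
  "max_batch_rows": 50000,
  "max_buffer_size": 26214400
}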

Using the tap-zendesk throws an exception

When I pipe the output from tap-zendesk with the schema below, it gives me this error:
CRITICAL ('Invalid records detected above threshold: 0. See `.args` for details.', [(<ValidationError: "None is not of type 'object'">, {'type': 'RECORD', 'stream': 'tickets', 'record': {'assignee_id': 389975897113, 'brand_id': 360003165813, 'collaborator_ids': [], 'created_at': '2019-09-03T19:10:47.000000Z', 'custom_fields': [{'id': 360025763994, 'value': None}
The schema is below:

{"type": "SCHEMA", "stream": "tickets", "schema": {"properties": {"organization_id": {"type": ["null", "integer"]}, "requester_id": {"type": ["null", "integer"]}, "problem_id": {"type": ["null", "integer"]}, "is_public": {"type": ["null", "boolean"]}, "description": {"type": ["null", "string"]}, "follower_ids": {"items": {"type": ["null", "integer"]}, "type": ["null", "array"]}, "submitter_id": {"type": ["null", "integer"]}, "generated_timestamp": {"type": ["null", "integer"]}, "brand_id": {"type": ["null", "integer"]}, "id": {"type": ["null", "integer"]}, "group_id": {"type": ["null", "integer"]}, "type": {"type": ["null", "string"]}, "recipient": {"type": ["null", "string"]}, "collaborator_ids": {"items": {"type": ["null", "integer"]}, "type": ["null", "array"]}, "tags": {"items": {"type": ["null", "string"]}, "type": ["null", "array"]}, "has_incidents": {"type": ["null", "boolean"]}, "created_at": {"format": "date-time", "type": ["null", "string"]}, "raw_subject": {"type": ["null", "string"]}, "status": {"type": ["null", "string"]}, "updated_at": {"format": "date-time", "type": ["null", "string"]}, "custom_fields": {"items": {"properties": {"id": {"type": ["null", "integer"]}, "value": {}}, "type": ["null", "object"]}, "type": ["null", "array"]}, "url": {"type": ["null", "string"]}, "allow_channelback": {"type": ["null", "boolean"]}, "allow_attachments": {"type": ["null", "boolean"]}, "due_at": {"format": "date-time", "type": ["null", "string"]}, "followup_ids": {"items": {"type": ["null", "integer"]}, "type": ["null", "array"]}, "priority": {"type": ["null", "string"]}, "assignee_id": {"type": ["null", "integer"]}, "subject": {"type": ["null", "string"]}, "external_id": {"type": ["null", "string"]}, "via": {"properties": {"source": {"properties": {"from": {"properties": {"name": {"type": ["null", "string"]}, "ticket_id": {"type": ["null", "integer"]}, "address": {"type": ["null", "string"]}, "subject": {"type": ["null", "string"]}}, "type": ["null", "object"]}, "to": {"properties": {"address": {"type": ["null", "string"]}, "name": {"type": ["null", "string"]}}, "type": ["null", "object"]}, "rel": {"type": ["null", "string"]}}, "type": ["null", "object"]}, "channel": {"type": ["null", "string"]}}, "type": ["null", "object"]}, "ticket_form_id": {"type": ["null", "integer"]}, "satisfaction_rating": {"properties": {"id": {"type": ["null", "integer"]}, "assignee_id": {"type": ["null", "integer"]}, "group_id": {"type": ["null", "integer"]}, "reason_id": {"type": ["null", "integer"]}, "requester_id": {"type": ["null", "integer"]}, "ticket_id": {"type": ["null", "integer"]}, "updated_at": {"format": "date-time", "type": ["null", "string"]}, "created_at": {"format": "date-time", "type": ["null", "string"]}, "url": {"type": ["null", "string"]}, "score": {"type": ["null", "string"]}, "reason": {"type": ["null", "string"]}, "comment": {"type": ["null", "string"]}}, "type": ["null", "object", "string"]}, "sharing_agreement_ids": {"items": {"type": ["null", "integer"]}, "type": ["null", "array"]}, "email_cc_ids": {"items": {"type": ["null", "integer"]}, "type": ["null", "array"]}, "forum_topic_id": {"type": ["null", "integer"]}}, "type": ["null", "object"]}, "key_properties": ["id"]}

SQL Base/Interface: Use CRUD naming

Motivation

Presently SQLInterface mainly uses the following words for its non-major API actions:

  • upsert
  • add
  • drop
  • get
  • serialize
  • canonicalize
  • make
  • migrate

I'd like to propose we tidy this up a bit and try to use something like CRUD naming to make each operation sound more atomic in nature.

I think this would make on-boarding for someone new simpler.

Suggested Musical Pairing

https://soundcloud.com/groove-armada-1/at-the-river-3

Canonicalization of Stream Name (ie, allow uppercase, numeric, invalid string names)

Motivation

Stream names are not required per the Singer Spec to conform to anything other than being "string"s...

Unfortunately for targets, this means that supporting various taps requires handling many varying formats etc.

Hacks and workarounds exist for this, but if a tap has a stream name which is upper case, often the user cannot edit/update this without forking the repo etc.

For those interested, please vote 👍 on this description to register interest in this feature.

What we currently do

Presently, any string is allowed for any subtable. This means that if we detect a "nested table" in the data, we will create a "sub-table" where we canonicalize the "path" to that particular field.

ie, if we have a stream foobar which has an array field called B@z, that means the path to the array field is ("foobar", "B@z"). This path will be canonicalized as foobar__b_z.
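As a toy sketch of that canonicalization (not the exact implementation, which has more rules to enforce, such as names starting with a letter):

# Lowercase, replace invalid characters with '_', and join path segments with '__'.
import re

def canonicalize_identifier(name):
    return re.sub(r'[^a-z0-9_]', '_', name.lower())

def canonicalize_path(path):
    return '__'.join(canonicalize_identifier(part) for part in path)

assert canonicalize_path(('foobar', 'B@z')) == 'foobar__b_z'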

The notable exception to this is that stream names themselves, ie the "root" table name, must be strings which are:

  • lowercased
  • start with a letter
  • contain only alphanumeric characters and _ underscores

Why we do what we currently do

This logic was introduced in #60. That PR sought to allow sub-tables to have reasonable names derived from the data/field names provided.

At the time, this was because it was viewed that the data which comes out of a tap is well out of the user's control, but that often the stream names are already of this format (ie, many API-based taps' streams are lowercase etc.).

An attempt was made to make it such that stream names could be of any string format. However, it was more work to do that at the time.

Proposals

Table name mappings are currently stored on the "root" table in the db, ie the stream-name table. These mappings map a path to a canonicalized name.

ie, (a, b, c, d) -> a__b__c__d etc.

Dedicated Metadata Table

We could move this mapping out of the JSON blob stored in the table's comment and into a dedicated table. Something akin to _datamill_table_metadata.

Load Table Mappings

Each table's metadata could store the path. When target_postgres starts, it can list all tables present in the remote, iterate through them, and fetch the raw_name, thus building up an in-memory cache of path to canonicalized name.

This could even be done via SQL with something like:

SELECT CAST(obj_description(oid) AS json)->'path'
FROM pg_class
WHERE relkind = 'r'
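And a rough Python sketch of how the startup cache could be built on top of that query (the metadata layout assumed here, with a 'path' key in the table-comment JSON, is illustrative):

# Sketch: build an in-memory {path: table_name} map from table comments at startup.
import json

def load_table_mappings(connection):
    mappings = {}
    with connection.cursor() as cur:
        cur.execute("""
            SELECT relname, obj_description(oid)
            FROM pg_class
            WHERE relkind = 'r' AND obj_description(oid) IS NOT NULL
        """)
        for table_name, comment in cur.fetchall():
            try:
                metadata = json.loads(comment)
            except ValueError:
                continue  # not one of our metadata comments
            if 'path' in metadata:
                mappings[tuple(metadata['path'])] = table_name
    return mappings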

Suggested Musical Pairing

https://www.youtube.com/watch?v=crjugtkXZN4

Bump dependencies

Motivation

Saw the following while working on getting some CLI tests in place:

target-postgres 0.0.1 has requirement singer-python==5.0.12, but you'll have singer-python 5.3.3 which is incompatible.
Installing collected packages: pytz, singer-python, idna, certifi, chardet, urllib3, requests, tap-github
...

I'd recommend a few things:

  1. Get a dependency checker set up for our CI. Something that could be hooked into GitHub Checks would be best.
  2. Bump all deps to latest, at the very least the singer-python dependency.

Suggested Musical Pairing

https://soundcloud.com/labi-siffre/i-got-the-2006-remastered

Decision - When / how to split off Redshift

When is the best time to split off the codebase into:

  • datamill-co/singer-sql-target-tools (better names welcome)
  • datamill-co/target-redshift

A lot of the issues / features to be worked on overlap, but may require branching logic based on postgres vs redshift.

Is it best to do this after more work is done on postgres?

How best to do this?

Confusion around STATE messages

We have been attempting to use the new STATE message support in target-postgres to enable LOG_BASED syncing. We use tap-postgres as the corresponding tap and meltano to orchestrate the ELT process.

What we are experiencing is captured in a meltano ticket. However, based on input from a developer on the project, it appears that the root issue is due to a difference in how this target is handling STATE versus how it is handled elsewhere. Quoting from that ticket:

Looking at #130 it seems like they are indeed sending the full STATE message downstream, whereas Singer themselves only write the value out in their targets.
See https://github.com/singer-io/target-csv/blob/a98d4633a97e3b7183a99a4f790722535b97c238/target_csv.py#L91 for target-csv example.

I'd also note that the singer-target-template and target-gsheet duplicate the approach taken by target-csv.
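For reference, the behaviour being described in those targets boils down to something like the following simplified sketch (not target-postgres's current code): only the value payload of a STATE message is kept and eventually written out, not the whole STATE envelope.

# Simplified illustration of the target-csv-style handling of STATE messages.
import json
import sys

def handle_state_message(message, state_holder):
    if message.get('type') == 'STATE':
        state_holder['latest'] = message['value']

def emit_state(state_holder):
    if state_holder.get('latest') is not None:
        sys.stdout.write(json.dumps(state_holder['latest']) + '\n')
        sys.stdout.flush()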

Large datasets cause extremely slow imports

I'm currently using this target with the zendesk tap. We're using this to keep a local copy of the Zendesk data, which we use to run daily reports, along with some additional use cases. I've had the import running for several days and noticed that CPU was spiking on my DB instance, along with some queries taking several minutes to run. After doing a bit of digging, I found that some of my tables were already into the millions of records and needed some indexes.

Specifically, the indexes needed were around the _sdc_sequence and _sdc_level_{}_id columns.

I'm wondering what the thoughts are around potentially making a PR to automatically create these indexes, or whether it would be better to simply note in the documentation which fields might need indexes as the tables start growing.

For specifics, at the moment, these are what my largest tables (over 100k records) look like:

table count
tickets__custom_fields 5,418,717
ticket_audits__events 2,466,633
ticket_audits__events__value 461,196
tickets__tags 342,643
ticket_audits 234,680
ticket_audits__events__previous_value 118,678
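For anyone hitting the same thing in the meantime, the manual indexes described above amount to something like this (index and column names are illustrative; the exact _sdc_level_{}_id columns present depend on the nesting depth of each sub-table):

# Sketch: add indexes to one of the large sub-tables via psycopg2.
import psycopg2

connection = psycopg2.connect('dbname=example')  # illustrative DSN
with connection, connection.cursor() as cur:
    cur.execute('CREATE INDEX IF NOT EXISTS tickets__custom_fields__sdc_sequence_idx '
                'ON tickets__custom_fields (_sdc_sequence)')
    cur.execute('CREATE INDEX IF NOT EXISTS tickets__custom_fields__sdc_level_0_id_idx '
                'ON tickets__custom_fields (_sdc_level_0_id)')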

Denesting - is there a way to not denest primitive arrays?

Denesting makes sense for complex objects, but if my postgres tap pulls out an integer[] (integer array), the target always denests it into a separate table. Is there a way for the target to insert it as an integer array?

Below is my partial config relating to the integer[] column.

{
  "breadcrumb": [
    "properties",
    "label_ids"
  ],
  "metadata": {
    "sql-datatype": "integer[]",
    "inclusion": "available",
    "selected-by-default": true,
    "selected": true
  }
}

"label_ids": {
  "type": "array",
  "items": {
    "type": "integer"
  }
}

Nested values throw/are treated differently

Motivation

When denesting values/records, errors are thrown, and the internal mechanisms for denesting are handling data in inherently different ways.

Presently I'm unsure whether this is intentional or whether this is a 🐛.

Examples

`denest_record` vs `denest_subrecord`

Nested dictionaries

[{'x0':
    {'x1': {...}}}]

Throws an invalid-number-of-args exception originating from denest_subrecord.

vs

[{'x0':
    {'x1': 123}}]

Results in:

{'root_table__x0': [
    {'x1': 123}
  ]}

Nested nulls

[{'x0':
    {'x1': None}}]

Results in:

{'root_table__x0': [
    {'x1': None}
  ]}

vs

[{'x0': None}]

Results in:

{'root_table': [
    {}
  ]}

For this, there appears to be a discrepancy between how denest_record and denest_subrecord handle None values. This is less of a problem and more of a 😕 when looking at the code, especially because of the comment present only in denest_record:

"## nulls mess up nested objects"

Suggested Musical Pairing

https://soundcloud.com/tiga/tiga-the-martinez-brothers-aphex-n-girls?in=tiga/sets/tiga-the-martinez-brothers-2

Stabilize V0 of `target-postgres`

Motivation

Target-Postgres has been changing dramatically for some time. While many of these changes are bug fixes, or straight features, many modify the underlying metadata/schema and so are "breaking" from more than just a "can I rebase my branch onto this" standpoint.

As such, I'm proposing a new Milestone/Issue to gather up what bugs and housekeeping we want to get in place before adopting a semver versioning guarantee for persisted metadata/schema/data ONLY. ie, we still will not have to play nice with those who wish to write code on top of our repo, but we will play nice with future versions of our code and we will support the data which is persisted by v0.

Notes

Please understand that while I am in full support of making life simpler for other devs to work on top of our work, I'm much more concerned about Datamill or someone trying to use this and then getting a tonne of data moved over...which then has to be blown away because we don't know how to migrate etc.

Outstanding Work

Top of mind, the work I think needs to be done:

  • in flight feature work:
    • refactor table mappings and column mappings to use similar/same structure
    • nested support as noted in:
  • Better testing
    • Full integration test against Github/BigCommerce/whatever in CI (could be low hanging fruit for bugs etc.)

Suggested Musical Pairing

https://soundcloud.com/hotchip/huarache-lights
