eulerto / wal2json
JSON output plugin for changeset extraction
License: BSD 3-Clause "New" or "Revised" License
Hello,
This is more of a question than an issue.
Assuming I have 3 concurrent transactions A, B and C, all started at the same time x.
B happens to commit before A, and A commits before C.
So the commit order is: B, A, C.
Given that they commit at different times, do they all get decoded in commit order? Or, in an environment with high concurrency, is it possible that decoding does not strictly follow the order/time in which each transaction was committed?
If transactions can be decoded regardless of their commit time, then it suffices to say that the transaction commit time is not a reliable way to track consumed messages and resume replaying from where one left off.
Is it then possible to have the LSN for both a begin and a commit statement?
I know that in the current wal2json setup, the begin header, the transaction messages and the commit header are all in a single output message, making it impossible to track the commit LSN.
Could you please shed some light on this topic?
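For position tracking, the lsn returned alongside each change is a better cursor than the commit timestamp, since LSNs are totally ordered. A minimal consumer-side sketch (purely illustrative) of turning the textual LSN into a comparable integer:

```python
def parse_lsn(lsn):
    """Convert a PostgreSQL LSN like '0/629A5358' into a comparable integer.

    The textual form is two hex halves separated by '/': the high 32 bits
    and the low 32 bits of a 64-bit WAL position.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)
```

Storing the last consumed LSN and comparing integers avoids any reliance on commit timestamps.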
regards,
NUMERICOID
is encoded as a number in the JSON output, but decimal/numeric values are typically encoded as strings to preserve precision and length without floating-point ambiguity. My gut feeling is that wal2json should do the same thing.
Thoughts?
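To illustrate the concern: once a numeric is emitted as a bare JSON number, a consumer that parses it as a floating-point value silently loses digits, while string-like Decimal parsing preserves them. A quick Python demonstration:

```python
import json
from decimal import Decimal

doc = '{"v": 10000000000000.000001}'

# Parsed as a float, the low-order digits are silently rounded away:
as_float = json.loads(doc)["v"]

# Parsed as Decimal (the analogue of "encode numerics as strings"),
# every digit survives:
as_decimal = json.loads(doc, parse_float=Decimal)["v"]
```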
Hello,
I recently found out that bytea columns are not properly decoded. The bytea data arrives in some strange encoding that is not in any way similar to the source data.
Could this be some kind of bug ?
cheers .
rugging24
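If the "strange" text is PostgreSQL's default hex output format for bytea (a \x prefix followed by hex digits) rather than raw bytes, the original data is recoverable on the consumer side. A sketch assuming that format:

```python
def decode_bytea(text):
    """Decode PostgreSQL's hex bytea text format, e.g. '\\x1b' -> b'\\x1b'."""
    if not text.startswith("\\x"):
        raise ValueError("expected hex bytea format")
    return bytes.fromhex(text[2:])
```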
More a question than a bug report, as I don't know many details of Postgres allocator:
pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                 Relation relation, ReorderBufferChange *change)
{
    /* ... */
    old = MemoryContextSwitchTo(data->context);
    /* ... */
    switch (change->action)
    {
        case REORDER_BUFFER_CHANGE_INSERT:
            if (change->data.tp.newtuple == NULL)
            {
                elog(WARNING, "no tuple data for INSERT in table \"%s\"", NameStr(class_form->relname));
                return;
            }
            break;
        /* ... */
    }
    MemoryContextSwitchTo(old);
    MemoryContextReset(data->context);
}
If the function returns early, isn't the memory context left in an inconsistent state (data->context still current, and never reset)?
Thank you.
Currently, the plugin outputs JSON in pieces (it gets data and sends it). The receiver has to maintain state to know whether a complete JSON object has arrived before it can process it.
At least from the SQL interface POV, it would be a good idea to support buffering the JSON object until its end. It would avoid maintaining state in the receiver. The drawback is that long transactions would consume a lot of server memory.
PS: unfortunately, pg_recvlogical can't take advantage of this option because you can only provide parameters at startup. Of course, you could choose a convenient default for all your streaming.
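Until such an option exists, the receiver-side state machine can be as simple as buffering chunks and retrying a parse. A minimal Python sketch (the class name is mine):

```python
import json

class ChunkAssembler:
    """Buffer write-in-chunks fragments until they form a complete JSON object."""

    def __init__(self):
        self._buf = ""

    def feed(self, chunk):
        """Return the decoded object once complete, else None."""
        self._buf += chunk
        try:
            obj = json.loads(self._buf)
        except json.JSONDecodeError:
            return None          # still incomplete, keep buffering
        self._buf = ""
        return obj
```

Retrying a full parse on every chunk is O(n²) for big transactions; a production receiver would track bracket depth instead, but the state to maintain is the same idea.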
Hello again,
While testing some code paths, I found that content_size isn't used in every code path of pg_decode_message(), which can lead to erroneous output. For instance:
rjuju=# SELECT pg_logical_emit_message(false, 'wal2json', 'a very long message');
pg_logical_emit_message
-------------------------
0/629A5358
(1 row)
rjuju=# SELECT pg_logical_emit_message(false, 'wal2json', 'meh');
pg_logical_emit_message
-------------------------
0/629A5398
(1 row)
rjuju=# select * from pg_logical_slot_peek_changes('wal2json', null, null, 'pretty-print', '1');
lsn | xid | data
------------+-----+----------------------------------------------------------
0/629A5358 | 0 | { +
| | "change": [ +
| | { +
| | "kind": "message", +
| | "transactional": false, +
| | "prefix": "wal2json", +
| | "content": "a very long message"+
| | } +
| | ] +
| | }
0/629A5398 | 0 | { +
| | "change": [ +
| | { +
| | "kind": "message", +
| | "transactional": false, +
| | "prefix": "wal2json", +
| | "content": "meh" +
| | } +
| | ] +
| | }
(2 rows)
rjuju=# select * from pg_logical_slot_peek_changes('wal2json', null, null, 'pretty-print', '0');
lsn | xid | data
------------+-----+-----------------------------------------------------------------------------------------------------------
0/629A5358 | 0 | {"change":[{"kind":"message","transactional":false,"prefix":"wal2json","content":"a very long message"}]}
0/629A5398 | 0 | {"change":[{"kind":"message","transactional":false,"prefix":"wal2json","content":"mehery long message"}]}
(2 rows)
(see last two messages content).
I didn't read the upstream infrastructure, but I assume that for performance reasons the content isn't zeroed on every call, so you can't rely on it being NUL-terminated.
I wanted to provide a fix for this, but I have some questions.
I'm not sure why, when pretty-print is asked for, you do an appendBinaryStringInfo() and otherwise call quote_escape_json(). I assume that quote_escape_json() should be called in both cases, or is there something I missed?
If quote_escape_json() has to be called in both cases, I think we can add a Size parameter and iterate as an array instead of the current pointer loop. We could also manually call enlargeStringInfo() with the given size, instead of relying on it doubling its size automatically. That probably won't be a huge win, but it might still save some cycles.
What do you think?
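To illustrate why relying on NUL termination breaks here: when the second, shorter payload is written over the reused buffer without a trailing NUL, a reader that scans to the old terminator sees the stale tail. A Python sketch of the effect:

```python
# Reused buffer holding the first message; the second, shorter payload
# is written over its start without a terminating NUL byte.
buf = bytearray(b"a very long message")
buf[:3] = b"meh"

stale_read = bytes(buf)        # what scanning for the old terminator yields
sized_read = bytes(buf[:3])    # what honoring content_size yields
```

The stale read reproduces exactly the "mehery long message" content shown above.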
Hi, I've been trying to use wal2json with Kafka Connect. However, I am not able to compile because the directory/file containing pgxs.mk cannot be found.
I'm using Ubuntu 10.04 and PostgreSQL is installed under /usr/lib, but it doesn't have the .mk file.
I also tried installing postgresql from the source still didn't work. Any help would be greatly appreciated :)
Hi, the Debezium project is interested in using this plug-in; would it be possible to cut an official release and/or tag so we have a stable version we can refer to? I'm a bit reluctant to pull in just the HEAD of master as it's a moving target inherently. So we currently refer a specific commit id, but having a named tag would be better of course. Thanks a lot for considering this request!
I use wal2json and psycopg2 to get PostgreSQL data changes. I have a question on the format of datetime. In psql I have a column of type timestamp with time zone.
In psql console:
postgres=# select create_time from test limit 1;
create_time
-------------------------------
2018-05-18 03:06:22.446039+00
In some graph ide like DataGrip (after I click on the column)
create_time
2018-05-18 03:06:22.446039 +00:00
wal2json returns '2018-05-18 03:06:22.446039+00'. But looking at https://docs.python.org/3/library/datetime.html, the %z directive is '+0000', with 4 digits.
Do I have some misunderstanding of the standard?
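For what it's worth, the bare '+00' offset can be normalized on the consumer side before handing the value to strptime's %z. A sketch, assuming the 'YYYY-MM-DD HH:MM:SS.ffffff' shape shown above (values without fractional seconds would need a different format string):

```python
from datetime import datetime

def parse_pg_timestamptz(text):
    """Parse PostgreSQL's timestamptz text output, padding bare '+HH' offsets.

    PostgreSQL prints '+00' when the offset minutes are zero; strptime's %z
    wants at least +-HHMM (or +-HH:MM), so append the missing '00' first.
    """
    if len(text) >= 3 and text[-3] in "+-":
        text += "00"
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f%z")
```

Offsets that already carry minutes, such as '+05:30', pass through untouched.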
I don't think the current defaults are particularly sensible:
include-xid = true: xids are only useful if you know what to do with them, as is the LSN. Otherwise they are just a source of noise to be filtered out. See the test suite.
write-in-chunks = true: with this value the records emitted are not valid JSON objects and a streaming parser is required. I don't even know what it is useful for:
piro=# SELECT data::json FROM pg_logical_slot_peek_changes('jslot', NULL, NULL, 'write-in-chunks', 'f');
data
---------------
{"change":[]}
(1 row)
piro=# SELECT data::json FROM pg_logical_slot_peek_changes('jslot', NULL, NULL, 'write-in-chunks', 't');
ERROR: invalid input syntax for type json
DETAIL: The input string ended unexpectedly.
CONTEXT: JSON data, line 1: {"change":[
I don't know what the current policy on backwards compatibility is, but I think these options should default to false.
Hi,
I wondered if it would be possible to (optionally) include the SESSION_USER and application_name values for the backend that committed a transaction in the json output?
Thanks.
When quoted identifiers are used - like when a table name contains a space - the JSON generator does not escape the quotes, and as a result the JSON is invalid.
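For comparison, a conforming encoder escapes embedded quotes, so the identifier round-trips and the output stays valid JSON. A quick Python illustration:

```python
import json

# A table name containing quote characters stays valid JSON when escaped:
table = 'my "quoted" table'
payload = json.dumps({"kind": "insert", "table": table})
```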
Hi,
I noticed that sometimes the JSON is not terminated with the final line "]}". After the final change array element I directly have a line similar to this:
'{"xid":000000,"change":['
and next the other changes and eventually the json terminating line.
This makes json validation more difficult.
We are using wal2json with Debezium to replicate some db tables containing large rows. While streaming changes, we have encountered the following error:
2018-07-27T19:54:09.616547179Z ERROR: no known snapshots
--
2018-07-27T19:54:09.616587252Z CONTEXT: slot "debezium", output plugin "wal2json", in the change callback, associated LSN 0/16D3850
We think this is a bug with wal2json's handling of TOAST columns, since this behavior only started happening after commit ce82d73
I have a more detailed analysis, including steps to reproduce the issue, up on Debezium's bugtracker:
https://issues.jboss.org/browse/DBZ-842
Hi, for columns such as NUMERIC(10,3), we'll only see NUMERIC as the column type in corresponding messages, i.e. precision and scale are gone. It'd be great to have them, though, in order to derive corresponding schemas on the consumer side.
I'm not sure whether this can (easily) be done in wal2json, so I'm opening this to get a discussion started. For sure it'd be something we'd benefit very much from in Debezium.
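For reference, PostgreSQL stores precision and scale in the column's atttypmod: for numeric it is ((precision << 16) | scale) + VARHDRSZ (4), so the information would be recoverable if wal2json exposed the typmod. A sketch of decoding it under that encoding:

```python
VARHDRSZ = 4  # PostgreSQL's varlena header size, folded into the typmod

def numeric_precision_scale(typmod):
    """Decode (precision, scale) from a numeric column's atttypmod.

    Returns None for an unconstrained NUMERIC (typmod of -1).
    """
    if typmod < VARHDRSZ:
        return None
    t = typmod - VARHDRSZ
    return (t >> 16) & 0xFFFF, t & 0xFFFF

# NUMERIC(10,3) has atttypmod ((10 << 16) | 3) + 4 = 655367
```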
This is more of a query than an issue.
The LSN for each change comes with key as 'nextlsn' even though the value is the actual LSN. This gives the impression that the value given by wal2json for the 'nextlsn' key is for the next change rather than the current one. This threw me off.
The code says that it is the value pointing to the next commit record.
Output using JDBC and setting the slot option include-lsn to true:
{
"xid": 740,
"nextlsn": "0/18154A8",
"timestamp": "2018-02-16 12:46:40.877369+05:30",
"change": [
{
"kind": "insert",
"schema": "public",
"table": "test_table",
"columnnames": ["pk", "name"],
"columntypes": ["integer", "text"],
"columnvalues": [160, "val1"]
}
]
}
-----
{
"xid": 741,
"nextlsn": "0/1815558",
"timestamp": "2018-02-16 12:46:43.485518+05:30",
"change": [
{
"kind": "insert",
"schema": "public",
"table": "test_table",
"columnnames": ["pk", "name"],
"columntypes": ["integer", "text"],
"columnvalues": [161, "val2"]
}
]
}
-----
{
"xid": 742,
"nextlsn": "0/1815608",
"timestamp": "2018-02-16 12:46:46.437739+05:30",
"change": [
{
"kind": "insert",
"schema": "public",
"table": "test_table",
"columnnames": ["pk", "name"],
"columntypes": ["integer", "text"],
"columnvalues": [162, "val3"]
}
]
}
Output from psql:
SELECT * FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL);
lsn | xid | data
-----------+-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------
0/18154A8 | 740 | {"change":[{"kind":"insert","schema":"public","table":"test_table","columnnames":["pk","name"],"columntypes":["integer","text"],"columnvalues":[160,"val1"]}]}
0/1815558 | 741 | {"change":[{"kind":"insert","schema":"public","table":"test_table","columnnames":["pk","name"],"columntypes":["integer","text"],"columnvalues":[161,"val2"]}]}
0/1815608 | 742 | {"change":[{"kind":"insert","schema":"public","table":"test_table","columnnames":["pk","name"],"columntypes":["integer","text"],"columnvalues":[162,"val3"]}]}
As you can see the nextlsn
from wal2json maps to the lsn
column given by Postgres.
Can someone clear the confusion?
The extension doesn't compile; there are lots of various errors.
Hello,
I have postgres 9.5 installed on Ubuntu and trying to install wal2json.
During execution of USE_PGXS=1 make or USE_PGXS=1 make install, I'm getting the below error.
"/opt/wal2json# USE_PGXS=1 make
You need to install postgresql-server-dev-X.Y for building a server-side extension or libpq-dev for building a client-side application.
make: *** No targets. Stop.
:/opt/wal2json# USE_PGXS=1 make install
You need to install postgresql-server-dev-X.Y for building a server-side extension or libpq-dev for building a client-side application.
make: *** No rule to make target 'install'. Stop."
Please suggest.
Some connectors, namely Python psycopg2 in my case, use type OIDs directly for internal type conversion methods. In these cases it is far simpler to provide these OIDs directly and not rely on the type names.
Since a JSON output is a better API than the text-based example provided by Postgres, we should accommodate software using already-existing connectors and frameworks. I hope this feature can be useful for a lot of people out there.
I'm providing a pull request for this feature, and am at your disposal for corrections or improvements.
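The consumer-side payoff would look something like this; the OIDs below are PostgreSQL's built-in ones (bool = 16, int4 = 23, text = 25, numeric = 1700), while the converter table itself is a hypothetical illustration:

```python
from decimal import Decimal

# Map built-in PostgreSQL type OIDs to Python converters (illustrative subset)
CONVERTERS = {
    16: lambda v: v == "t",     # bool: text form is 't' / 'f'
    23: int,                    # int4
    25: str,                    # text
    1700: Decimal,              # numeric
}

def convert(oid, text_value):
    """Dispatch on the column's type OID instead of parsing its type name."""
    return CONVERTERS.get(oid, str)(text_value)
```

With type names, the receiver must instead parse strings such as "character varying(30)", which is exactly what OID dispatch avoids.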
Is there any way to figure out which user caused the changes? thanks so much.
btw. thanks for this awesome plugin :)
The JSON specification requires that control characters are properly escaped. If a string value contains control characters, it is sent as-is without conversion to an escape sequence. This causes JSON parsers to fail.
I have data in a table that holds valid hex \x sequences (e.g. "\x1b" for an encoded [ESC] character). The following code in quote_escape_json seems to be the culprit:
/* XXX suppress \x in bytea field? */
if (ch == '\\' && *(valptr + 1) == 'x')
{
valptr++;
continue;
}
With this, my "\x1b" value is emitted from this plugin as "\1b".
Even if this is the right thing to do for bytea values (I'm not convinced it is), it is not correct for other textual data.
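For comparison, a conforming JSON encoder turns the ESC control character into a \u escape rather than passing it through or mangling the backslash. A quick Python check:

```python
import json

# ESC (0x1b) is a control character, so it must be escaped; a conforming
# encoder emits the six-character sequence \u001b:
encoded = json.dumps("\x1b")
```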
Logical replication will emit events for materialized views, which do not have primary keys. Depending on the complexity of the materialized view, this can result in many "empty" warnings which saturate the backend log. Is there a way around this?
Hi
There is an error when building wal2json on SmartOS (based on Solaris). I have researched it on Google for a few hours but have not found any valuable information.
any help would be greatly appreciated :)
my env info:
[root@8cd4bd6e-ee63-667f-f105-85303b90d559 ~/wal2json]# gcc --version
gcc (GCC) 4.9.4
[root@8cd4bd6e-ee63-667f-f105-85303b90d559 ~/wal2json]# make --version
GNU Make 4.1
[root@8cd4bd6e-ee63-667f-f105-85303b90d559 ~/wal2json]# ld --version
GNU ld (GNU Binutils) 2.26.1
postgres=# select version();
version
---------------------------------------------------------------------------------
PostgreSQL 9.5.9 on x86_64-sun-solaris2.11, compiled by gcc (GCC) 4.9.4, 64-bit
and the error:
[root@8cd4bd6e-ee63-667f-f105-85303b90d559 ~/wal2json]# USE_PGXS=1 make
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -pipe -O2 -pipe -O2 -I/opt/local/include -I/opt/local/include/ncurses -DLDAP_DEPRECATED -I/usr/include -fPIC -L/opt/local/lib -L/opt/local/gcc49/lib/gcc/x86_64-sun-solaris2.11/4.9.4 -Wl,-R/opt/local/gcc49/lib/gcc/x86_64-sun-solaris2.11/4.9.4 -L/opt/local/lib -Wl,-R/opt/local/lib -L/usr/lib/amd64 -Wl,-R/usr/lib/amd64 -L/opt/local/lib -L/opt/local/lib -L/opt/local/lib -L/opt/local/lib -Wl,--as-needed -Wl,-R'/opt/local/lib' -shared -o wal2json.so wal2json.o
/opt/local/x86_64-sun-solaris2.11/bin/ld:/opt/local/gcc49/lib/gcc/x86_64-sun-solaris2.11/4.9.4/../../../libgcc-unwind.map: file format not recognized; treating as linker script
/opt/local/x86_64-sun-solaris2.11/bin/ld:/opt/local/gcc49/lib/gcc/x86_64-sun-solaris2.11/4.9.4/../../../libgcc-unwind.map:1: syntax error
collect2: error: ld returned 1 exit status
/opt/local/lib/postgresql/pgxs/src/makefiles/../../src/Makefile.port:22: recipe for target 'wal2json.so' failed
make: *** [wal2json.so] Error 1
Is it possible to get the previous value of a row in the diff produced for an UPDATE statement? For example, the following queries
create table test_table(id NUMERIC PRIMARY KEY, value VARCHAR(200));
insert into test_table(id, value) values(1, 'original');
update test_table set value = 'modified' where id = 1;
generate the following diff:
{
"xid": 8273,
"change": [
{
"kind": "insert",
"schema": "public",
"table": "test_table",
"columnnames": ["id", "value"],
"columntypes": ["numeric", "varchar"],
"columnvalues": [1, "original"]
}
]
}
{
"xid": 8274,
"change": [
{
"kind": "update",
"schema": "public",
"table": "test_table",
"columnnames": ["id", "value"],
"columntypes": ["numeric", "varchar"],
"columnvalues": [1, "modified"],
"oldkeys": {
"keynames": ["id"],
"keytypes": ["numeric"],
"keyvalues": [1]
}
}
]
}
This does not allow propagating changes to an external system. For example, several tables can be joined into one document and sent to Elasticsearch. After an update of some referenced field we need to invalidate the previous value (it can be a back-reference).
Hi,
I'm using version 9.6.3 and cannot load the wal2json extension. I cannot see it when I list the available extensions either.
I can however work with 'auto_complete', and apparently they were both added together.
Currently, the output data structure is:
{"change":[{"kind":"update","table":"mytable2",
"columnnames":["id","name"],
"columnvalues":[400,"Update 1"],
"oldkeys":{"keynames":["id"],"keyvalues":[400]}}]
}
For postprocessing purposes, it would be more appropriate to have this format:
{"change":[{"kind":"update","table":"mytable2",
"changes":{"id":400, "name":"Update 1"},
"oldkeys":{"id":400}}]
}
I suggest this could be an option
changes-data-structure = two-arrays | key-value-hash
Thanks for considering.
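Until such an option exists, the transformation is easy to do on the consumer side, since the two arrays are parallel. A Python sketch of the proposed key-value layout (the function name is mine):

```python
def to_key_value(change):
    """Convert wal2json's parallel-array layout to a key-value hash layout."""
    out = {
        "kind": change["kind"],
        "table": change["table"],
        "changes": dict(zip(change["columnnames"], change["columnvalues"])),
    }
    if "oldkeys" in change:
        ok = change["oldkeys"]
        out["oldkeys"] = dict(zip(ok["keynames"], ok["keyvalues"]))
    return out
```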
When I try to select/get changes with the following parameter :
'include_xids': '1',
'include_timestamp': '1',
'include_schemas': '1',
'include_types': '1',
'pretty_print' : '1'
there's always a warning :
2017-04-04 22:32:44 CEST [27933-25] postgres@test_db WARNING: option include_xids = 1 is unknown
2017-04-04 22:32:44 CEST [27933-26] postgres@test_db CONTEXT: slot "test", output plugin "wal2json", in the startup callback
2017-04-04 22:32:44 CEST [27933-27] postgres@test_db WARNING: option include_timestamp = 1 is unknown
2017-04-04 22:32:44 CEST [27933-28] postgres@test_db CONTEXT: slot "test", output plugin "wal2json", in the startup callback
2017-04-04 22:32:44 CEST [27933-29] postgres@test_db WARNING: option include_schemas = 1 is unknown
Only the option : write-in-chunks
works without any warning.
Are these options supported ?
regards
Hi,
I noticed that various tests sometimes fail. I didn't explore the issue deeply, but it looks like just a flaw in the tests, not a bug in the extension itself.
Example 1:
http://afiskon.ru/s/fc/53ae61ede5_regression.diffs.txt
http://afiskon.ru/s/d4/bd9a065311_regression.out.txt
Example 2:
http://afiskon.ru/s/79/11a4ea2430_regression.diffs.txt
http://afiskon.ru/s/35/1523b25c8b_regression.out.txt
The environment is Arch Linux x64, GCC 7.2.0, Core i7 3632QM 2200 Mhz (4 cores, 8 with HT).
I am having problems with "pg_recvlogical: could not clear search_path: ERROR: syntax error" on postgresql 9.6.8. Can anyone tell me how to solve it? Thank you
Hi,
I noticed a scenario in which events are resent. This can be found here: https://www.postgresql.org/message-id/CANcxKGKg5eof4Zm1fj-8EhirX9ZT4gnZyNoVEXqp74wruJvg8A%40mail.gmail.com
I suspect that this is related to the wal2json plugin.
Please let me know when you need more information or testing from my side.
Thank you.
Hi
I have an issue with a table which has a primary key defined as DEFERRABLE INITIALLY DEFERRED.
Here is postgres setup:
wal_level = logical
max_wal_senders = 3
max_replication_slots = 3
The PK is ignored and any data processed as an UPDATE or DELETE operation is not written by wal2json plugin.
Looking at the code, I found:
/* Make sure rd_replidindex is set */
RelationGetIndexList(relation);
...
if (!OidIsValid(relation->rd_replidindex) && relation->rd_rel->relreplident != REPLICA_IDENTITY_FULL)
RelationGetIndexList does not seem to set the right PK.
I created a small test case to show this behaviour (I updated the given sample).
$ cat /tmp/deferredPK.sql
CREATE TABLE table_with_deferred_pk (a SERIAL, b VARCHAR(30), c TIMESTAMP NOT NULL, PRIMARY KEY(a, c) DEFERRABLE INITIALLY DEFERRED);
BEGIN;
INSERT INTO table_with_deferred_pk (b, c) VALUES('Backup and Restore', now());
INSERT INTO table_with_deferred_pk (b, c) VALUES('Tuning', now());
INSERT INTO table_with_deferred_pk (b, c) VALUES('Replication', now());
UPDATE table_with_deferred_pk SET b = 'Tuning - update' where b = 'Tuning';
DELETE FROM table_with_deferred_pk WHERE a < 3;
COMMIT;
$ psql -At -f /tmp/deferredPK.sql postgres
CREATE TABLE
BEGIN
INSERT 0 1
INSERT 0 1
INSERT 0 1
UPDATE 1
DELETE 2
COMMIT
We can see that only INSERT operations are written by wal2json
$ pg_recvlogical -d postgres --slot test_slot --start -o pretty-print=1 -f -
{
"change": [
]
}
WARNING: table "table_with_deferred_pk" without primary key or replica identity is nothing
WARNING: table "table_with_deferred_pk" without primary key or replica identity is nothing
WARNING: table "table_with_deferred_pk" without primary key or replica identity is nothing
{
"change": [
{
"kind": "insert",
"schema": "public",
"table": "table_with_deferred_pk",
"columnnames": ["a", "b", "c"],
"columntypes": ["integer", "character varying(30)", "timestamp without time zone"],
"columnvalues": [1, "Backup and Restore", "2018-04-19 07:19:02.867699"]
}
,{
"kind": "insert",
"schema": "public",
"table": "table_with_deferred_pk",
"columnnames": ["a", "b", "c"],
"columntypes": ["integer", "character varying(30)", "timestamp without time zone"],
"columnvalues": [2, "Tuning", "2018-04-19 07:19:02.867699"]
}
,{
"kind": "insert",
"schema": "public",
"table": "table_with_deferred_pk",
"columnnames": ["a", "b", "c"],
"columntypes": ["integer", "character varying(30)", "timestamp without time zone"],
"columnvalues": [3, "Replication", "2018-04-19 07:19:02.867699"]
}
]
}
Note: DEFERRABLE INITIALLY IMMEDIATE produces the same behaviour.
When one is building an Avro schema for the events, it might be important to distinguish between optional and non-optional fields for downstream clients. Would it be possible to add a flag to columns indicating that they are optional?
Thanks!
Currently, pretty print JSON is the only option. It would be desirable to add an option to print a compact format.
Hi, we're seeking some clarity on the encoding used for sending the messages by wal2json. More specifically, is it safe to assume that the bytes can be interpreted using UTF-8?
It's nice to have an official docker image based on official postgres with at least 2 distributions: debian and alpine.
I find tracking how far behind the tip of the master branch we are is much easier with version numbers than commit hashes, and it would be good to know which commits you think warrant minor and major version number increases.
Cheers.
I see one of the use cases is to replicate a single table with known data schema to Kafka topic.
What do you think about adding a table parameter to be used as a filter on relation name?
Maybe there is some other way?
UPD: I forgot to add that the schema should also be filtered.
I have compiled wal2json against PostgreSQL 9.5, in a Docker container of postgres:9.5.
Then I am trying to load the plugin in PostgreSQL 9.5, again in a Docker container, postgres:9.5-alpine.
But here is what I get after SELECT * FROM pg_create_logical_replication_slot('export5lot', 'wal2json');:
ERROR: could not load library "/usr/local/lib/postgresql/wal2json.so": Error relocating /usr/local/lib/postgresql/wal2json.so: __snprintf_chk: symbol not found
This is due to Alpine Linux using musl instead of glibc.
I'm not quite sure what to do about it. It's been some time since my last C project.
Could you please see if there's a way to get around that? Perhaps some flag to use something other than __snprintf_chk when building for Alpine?
Hi, we use wal2json to transfer incremental changes to an OLAP server (SnowFlake).
It's done in a batch after some period (nightly).
During the day, one table row may change several times.
In the SnowFlake import, the changes are treated as a set, not an ordered list, so duplicates may cause an earlier change to override a later one. For instance:
START TRANSACTION;
UPDATE myTable2 SET name = 'Update AA' WHERE id = 401;
UPDATE myTable2 SET name = 'Update BB' WHERE id = 401;
COMMIT;
Leads to:
{"xid":1074,"timestamp":"2018-07-18 17:49:54.719475+02","change": [
{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}},
{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}
]}
It would be great if wal2json could add a sequence number to each change:
{"xid":1074,"timestamp":"2018-07-18 17:49:54.719475+02","change": [
{"sn":1, "kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}},
{"sn":2, "kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}
]}
Thanks for considering.
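As a workaround, the sequence number can also be attached by the consumer, since changes within one decoded message already arrive in order. A Python sketch (the function name is mine):

```python
def number_changes(message):
    """Attach a per-transaction sequence number to each change, in place."""
    for sn, change in enumerate(message["change"], start=1):
        change["sn"] = sn
    return message
```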
When the event is processed downstream it is important to know what columns forms the primary key. Could you please provide a flag for columns that says that it is a part of primary key?
The readme and license file contain an {organization} placeholder that should be replaced with something else.
Hi!
Executing the tests from README.md I hit this case:
[email protected]:moses_dev_db> SELECT data FROM pg_logical_slot_get_changes('test_slot', NULL, NULL, 'pretty-print', '1', 'write-in-chunks', '0');
WARNING: table "table_without_pk" without primary key or replica identity is nothing
+--------+
| data |
|--------|
| {
"change": [
]
} |
+--------+
SELECT 1
Time: 0.095s
But using test_decoding within another slot:
[email protected]:moses_dev_db> SELECT data FROM pg_logical_slot_get_changes('test_slot2', NULL, NULL);
+-------------------------------------------------------------------------------------+
| data |
|-------------------------------------------------------------------------------------|
| BEGIN 1890 |
| table public.table_without_pk: UPDATE: a[integer]:1 b[numeric]:2.34 c[text]:'anta2' |
| COMMIT 1890 |
+-------------------------------------------------------------------------------------+
SELECT 3
Time: 0.084s
Looking at the source code I see other possible points where WARNINGs will be printed and data discarded. How can I bypass this and not lose data? For my case, is using test_decoding and implementing a data parser the better option?
Thanks a lot!
We use wal2json to do daily batches of incremental updates of an OLAP db (SnowFlake).
The JSON is consumed by loading the structure with a SQL function into a table with a single column and one table row per text line, and then accessing it with JSON-parsing SQL functions.
Having multiple changes in a single text line makes processing quite difficult - it is hard (if not impossible) to iterate over the "change": [...] array.
{"xid":1074,"timestamp":"2018-07-18 17:49:54.719475+02","change": [
{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}},
{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}
]}
It would relieve a lot of work if the wal2json plugin could optionally print a single change per JSON entry. For example, one-change-per-entry = true
would lead to:
{"xid":1074,"timestamp":"2018-07-18 17:49:54.719475+02","change":
{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}
}
{"xid":1074,"timestamp":"2018-07-18 17:49:54.719475+02","change":
{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}
}
Along with #70, this would still keep each entry uniquely identifiable:
{"xid":1074, "sn":1, ... }
{"xid":1074, "sn":2, ... }
This way, feeding the data to the OLAP like SnowFlake would need no pre-processing, which would be a big improvement.
Thanks for considering.
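The same splitting can be approximated on the consumer side before loading, by duplicating the envelope onto each change (the sn field follows the suggestion in #70; the function name is mine):

```python
def split_changes(message):
    """Emit one self-contained entry per change, duplicating the envelope."""
    envelope = {k: v for k, v in message.items() if k != "change"}
    return [dict(envelope, sn=sn, change=c)
            for sn, c in enumerate(message["change"], start=1)]
```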
Hi,
We are testing this plugin and got tons of log warnings (> 1M lines), all referring to the same table, due to that table having no PK. The table in question isn't one whose changes we are interested in, and it is not part of the list we pass in the "add-tables" parameter when we connect.
Is it possible to ignore changes we're not interested in?
Thanks for any thoughts on this.
Connection request:
PGReplicationStream stream =
replConnection.getReplicationAPI()
.replicationStream()
.logical()
.withSlotName("app_decoder_slot")
.withSlotOption("add-tables", subscribedTables.join(','))
.withSlotOption("include-unchanged-toast", false)
.withStatusInterval(20, TimeUnit.SECONDS)
.start();
Sample from log output
... ] 82 WARNING: table "domains_290906" without primary key or replica identity is nothing
... ] 83 CONTEXT: slot "cache_decoder_slot", output plugin "wal2json", in the change callback, associated LSN 3D13/A20D2B60
... ] 84 WARNING: table "domains_290906" without primary key or replica identity is nothing
... ] 85 CONTEXT: slot "cache_decoder_slot", output plugin "wal2json", in the change callback, associated LSN 3D13/A20D2C30
... ] 86 WARNING: table "domains_290906" without primary key or replica identity is nothing
... ] 87 CONTEXT: slot "cache_decoder_slot", output plugin "wal2json", in the change callback, associated LSN 3D13/A20D2D00
... ] 88 WARNING: table "domains_290906" without primary key or replica identity is nothing
... ] 89 CONTEXT: slot "cache_decoder_slot", output plugin "wal2json", in the change callback, associated LSN 3D13/A20D2DC8
... ] 90 WARNING: table "domains_290906" without primary key or replica identity is nothing
Hi, thanks for the filter-tables feature. I tried it today and it works fine, except that it works like a blacklist. I have a ton of tables in my database, but I only want a couple of them to be streamed. Is there a way to somehow "inverse" that filter and make it work like a whitelist?
Why can I not use -o include_xids=1 or -o include_timestamp=1 as options?
When I try to use these options when running --start, I get the following error:
pg_recvlogical: unexpected termination of replication stream: ERROR: option "include_xids" = "1" is unknown
CONTEXT: slot "test_slot", output plugin "wal2json", in the startup callback
What am I doing wrong? And how do I show transaction ids and timestamps?
Hello,
I recently ran into an interesting situation with the plugin.
- Scenario:
I made a DDL change in a relatively large table that involved adding a new column with a default value and a type, all in a single transaction. The table has a trigger on it, but it was disabled during the operation.
--
What I found was that the wal2json plugin, while decoding that part of the WAL, output a lot of warning messages:
WARNING: column "idl_old_id_column_nanme" has an unchanged TOAST
And eventually error out with the following error message:
ERROR: invalid memory alloc request size 1073741824
--
Beyond this point, no decoding was possible, even after I manually provided LSNs and purged all the data just before the complaining LSN value.
The above error message shows that it is trying to allocate more memory than can be provided; a bit of digging around suggested gradually reducing work_mem to prevent this, which didn't help either.
-- My question
cheers .
Hi,
is there a contraindication to forcing the read of TOAST columns?
Maybe there is a performance problem?
/* XXX Unchanged TOAST Datum does not need to be output */
if (!isnull && typisvarlena && VARATT_IS_EXTERNAL_ONDISK(origval))
{
elog(WARNING, "column \"%s\" has an unchanged TOAST", NameStr(attr->attname));
continue;
}
Thanks!
ciao
p