Coder Social home page Coder Social logo

Comments (11)

eulerto avatar eulerto commented on August 15, 2024

@qianl15 there are a few cases related to this message. Read about REPLICA IDENTITY. Postgres writes the information (old tuple) in the WAL that identifies the row that it is updating or deleting.

This message generally means:

a. REPLICA IDENTITY DEFAULT: your table does not contain a primary key. That's the common case.
b. REPLICA IDENTITY NOTHING: it does not store old row. Your setup is wrong and you should change it to DEFAULT (if the table has a primary key).

If it is case (a) and you don't want to add a primary key, you have 2 options:

  • REPLICA IDENTITY FULL: it stores all columns as old tuple, which can increase the amount of WAL to write.
  • REPLICA IDENTITY INDEX: it stores columns from an unique index as old tuple.

from wal2json.

qianl15 avatar qianl15 commented on August 15, 2024

Thanks @eulerto for your quick response! Just to clarify that my table contains a primary key and uses REPLICA IDENTITY DEFAULT. In my test, I insert 10,000 rows into that table, peek changes, export changes to another database, and finally advance the replication slot. I repeat this test process many times a day, but this empty identity issue only happens once every few days. More precisely, occasionally only one out of thousands of update records would have an empty identity field.

The table looks like this:

create table hello (
 name  text,
 greet_count  integer,
 primary key (name)
)

My workload simply updates the greet_count of the same row 10,000 times.

The peek slot command I used:

select lsn::text, xid::text, data from pg_logical_slot_peek_changes('myslot', NULL, 5000, 'format-version', '2', 'include-types', 'false', 'filter-tables', '*.knex_migrations, *.knex_migrations_lock', 'include-transaction', 'true', 'include-lsn', 'true')

where I limit to peek up to 5000 changes each time.

Environment: RDS Postgres 16.1

from wal2json.

eulerto avatar eulerto commented on August 15, 2024

What's your Postgres version? Are you using the latest wal2json version? What's the SQL command to advance the replication slot?

Did you observe any pattern from the missing old tuples? Could you share a test case that we can reproduce the issue? (Even if the test case takes several hour to reproduce.)

from wal2json.

qianl15 avatar qianl15 commented on August 15, 2024

Postgres version: PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
wal2json version: 2.5 (pre-installed by RDS)

Advance slot command: SELECT * FROM pg_replication_slot_advance(myslot, <latest_nextlsn>)
I iterate through the records I got from pg_logical_slot_peek_changes and obtained latest_nextlsn from the "nextlsn" field in the last "C" record of the batch.

I didn't see any particular pattern as it appears to happen rarely at random intervals. I will try to reproduce on my end as well. The actual test depends on a larger internal system on our side, but here's the simplified description of it.

-- Create table
create table hello (name  text, greet_count  integer, primary key (name));

-- Create replication slot
select * from pg_create_logical_replication_slot('myslot', 'wal2json');

-- Repeat this command many times
INSERT INTO hello (name, greet_count) VALUES ('test', 1) ON CONFLICT (name) DO UPDATE SET greet_count = hello.greet_count + 1 RETURNING greet_count;

-- The following steps advance the replication slot
-- 1. Peak changes in batches

select lsn::text, xid::text, data from pg_logical_slot_peek_changes('myslot', NULL, 5000, 'format-version', '2', 'include-types', 'false', 'filter-tables', '*.knex_migrations, *.knex_migrations_lock', 'include-transaction', 'true', 'include-lsn', 'true')

-- 2. Iterate through records from the above command and get the nextlsn from the last commit record

-- 3. Advance slot with the latest nextlsn
SELECT * FROM pg_replication_slot_advance(myslot, <latest_nextlsn>)

-- 4. Repeat steps 1~3 until no more records are read

from wal2json.

qianl15 avatar qianl15 commented on August 15, 2024

This only occurred once since I first reported, even though the test continuously runs every hour. Another observation: if I run pg_logical_slot_peek_changes again the data would become normal. It looks like a very rare and transient issue.

Is there any other traces I can provide to help figure out the issue?

from wal2json.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.