
mkabilov / logical_backup


[WIP] Continuous backup of the tables using PostgreSQL logical replication

License: MIT License

Go 98.47% SQLPL 0.37% Shell 1.16%
postgresql logical-replication golang backup

logical_backup's People

Contributors

alexeyklyukin, mkabilov, zilder


logical_backup's Issues

Preserve table backup state between restarts

Right now the following parameters are not written to disk, resulting in potentially wrong decisions about backup and deltas made by the backup tool:

deltas since the last backup, segments since the last backup, lastWrittenMessage: since we don't preserve these, the backup might not be triggered in time after a restart.

flush LSN: without it, after a restart we will write some of the changes coming from the slot twice, unless the slot's flush LSN equals its restart LSN.

It makes sense to write this information into the table state file during the segment switch and read it back at startup.
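
A minimal sketch of what persisting this could look like, assuming a per-table JSON state file written during the segment switch (struct, field and file names are illustrative, not the actual code):

package tablestate

import (
	"encoding/json"
	"os"
	"time"
)

// TableState holds the per-table counters that are currently lost on restart.
type TableState struct {
	DeltasSinceBackup   uint32    `json:"deltasSinceBackup"`
	SegmentsSinceBackup uint32    `json:"segmentsSinceBackup"`
	LastWrittenMessage  time.Time `json:"lastWrittenMessage"`
	FlushLSN            uint64    `json:"flushLSN"`
}

// Save writes the state atomically; called during the segment switch.
func (s *TableState) Save(path string) error {
	data, err := json.Marshal(s)
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0644); err != nil {
		return err
	}
	return os.Rename(tmp, path) // atomic replace on the same filesystem
}

// Load reads the state back at startup; a missing file means a fresh start.
func Load(path string) (TableState, error) {
	var s TableState
	data, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return s, nil
	}
	if err != nil {
		return s, err
	}
	return s, json.Unmarshal(data, &s)
}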

Panic during the test run

I still have the files from that run and am looking into this.

------------- Backup  --------------
2019/01/09 18:02:41 No staging directory. Writing directly to the archive dir
2019/01/09 18:02:41 Archive directory: "/tmp/final"
2019/01/09 18:02:41 BackupThreshold: 100
2019/01/09 18:02:41 DeltasPerFile: 10000
2019/01/09 18:02:41 DB connection string: @/tmp:0/lb_test1 slot:"testslot" publication:"mypub"
2019/01/09 18:02:41 Backing up new tables: true
2019/01/09 18:02:41 Fsync: true
2019/01/09 18:02:41 Force new basebackup of a modified table after inactivity for: 24h0m0s
2019/01/09 18:02:41 Pg backend session PID: 65228
2019/01/09 18:02:41 created missing publication: "create publication \"mypub\" for all tables"
2019/01/09 18:02:41 Creating logical replication slot testslot
2019/01/09 18:02:41 Created missing replication slot "testslot", consistent point 0/102CDBA8
2019/01/09 18:02:41 Queueing tables for the initial backup
2019/01/09 18:02:41 Starting 5 background backupers
2019/01/09 18:02:41 background base backuper 4: backing up table "public"."test"
2019/01/09 18:02:41 Starting from 0/102CDBA7 lsn
2019/01/09 18:02:41 Starting base backup of "public"."test"
2019/01/09 18:02:41 table test seems to have no rows; skipping
2019/01/09 18:02:41 "public"."test" backed up in 0s; start lsn: 0/102CFB40, first delta lsn to keep: 0/102CDC48
2019/01/09 18:02:41 Purging segments in /tmp/final/e3/47/00/18403/deltas before the LSN 0/102CDC48
2019/01/09 18:02:42 flushed current segment to disk file /tmp/final/e3/47/00/18403/deltas/00000000102cdc48 and set flush LSN to 0/10355838
2019/01/09 18:02:42 advanced flush LSN to 0/10355838
2019/01/09 18:02:42 sending new status with 0/10355838 flush lsn (i:865 u:1644 d:825 b:0.39Mb)
2019/01/09 18:02:45 flushed current segment to disk file /tmp/final/e3/47/00/18403/deltas/0000000010355918 and set flush LSN to 0/103DCDE0
2019/01/09 18:02:45 advanced flush LSN to 0/103DCDE0
2019/01/09 18:02:45 sending new status with 0/103DCDE0 flush lsn (i:864 u:1646 d:823 b:0.39Mb)
2019/01/09 18:02:45 received WAL message with LSN 0/103DCE50 that is lower or equal to the flush LSN 0/103DCDE0, skipping
2019/01/09 18:02:45 received WAL message with LSN 0/103DCE50 that is lower or equal to the flush LSN 0/103DCDE0, skipping
transaction type: pgbench.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 2
duration: 5 s
number of transactions actually processed: 8217
latency average = 1.218 ms
tps = 1641.615640 (including connections establishing)
tps = 1642.764676 (excluding connections establishing)
-------------- Restore  --------------
2019/01/09 18:02:46 received shutdown request: replication terminated
2019/01/09 18:02:46 quiting background base backuper 4
2019/01/09 18:02:46 quiting background base backuper 3
2019/01/09 18:02:46 prometheus connection closed
2019/01/09 18:02:46 quiting background base backuper 2
2019/01/09 18:02:46 quiting background base backuper 0
2019/01/09 18:02:46 quiting background base backuper 1
2019/01/09 18:02:46 debug http server closed
2019/01/09 18:02:46 writing and archiving current segment to 00000000103dce50 due to the shutdown
2019/01/09 18:02:46 flushed current segment to disk file /tmp/final/e3/47/00/18403/deltas/00000000103dce50 and set flush LSN to 0/10415600
2019/01/09 18:02:46 dump file doesn't exist, skipping
2019/01/09 18:02:46 reading "/tmp/final/e3/47/00/18403/deltas/00000000102cdc48" segment file
2019/01/09 18:02:47 reading "/tmp/final/e3/47/00/18403/deltas/0000000010355918" segment file
2019/01/09 18:02:48 reading "/tmp/final/e3/47/00/18403/deltas/00000000103dce50" segment file
panic: final LSN(0/103DCE80) does not match begin LSN(0/103DCE50)

goroutine 1 [running]:
github.com/ikitiki/logical_backup/pkg/logicalrestore.(*LogicalRestore).applySegmentFile(0xc000517e28, 0xc00034fac0, 0x31, 0x0, 0x0)
	/Users/alexk/src/go/src/github.com/ikitiki/logical_backup/pkg/logicalrestore/logicalrestore.go:232 +0xfab
github.com/ikitiki/logical_backup/pkg/logicalrestore.(*LogicalRestore).applyDeltas(0xc000517e28, 0x0, 0x0)
	/Users/alexk/src/go/src/github.com/ikitiki/logical_backup/pkg/logicalrestore/logicalrestore.go:314 +0x347
github.com/ikitiki/logical_backup/pkg/logicalrestore.(*LogicalRestore).Restore(0xc0001bde28, 0x0, 0x0)
	/Users/alexk/src/go/src/github.com/ikitiki/logical_backup/pkg/logicalrestore/logicalrestore.go:422 +0x3ea
main.main()
	/Users/alexk/src/go/src/github.com/ikitiki/logical_backup/cmd/restore/main.go:61 +0x216
tables are different

Handling of errors propagated from the lower-levels

If an error happens in a function down the execution path, we show it from the top level, possibly appending error messages produced by intermediate callers. That is, if we have functions a(), b() and c(), the call sequence is a->b->c, and c errors out, then it is common to have in c:

if err != nil {
   return fmt.Errorf("c went wrong at OID %d: %v", oid, err)
}

and sometimes in b as well (other times it just passes the error to the upper level):

if err != nil {
  return fmt.Errorf("operation from b went wrong: %v", err)
}

and finally in a:

if err != nil {
   logger.WithError(err).Error("a failed")
}

resulting in:

"a failed", {"error": "operation from b went wrong: c went wrong at OID 1233: foo() system call failed"}

followed by a stack trace originating from a.

This approach doesn't preserve any structure in the error message (we have no separate OID field), produces ad-hoc stack traces inside the error messages, and obscures the actual stack trace, which will start with a and not with c.

I'd propose to change this so that the error is reported right away where it happens (i.e. c() calls logger.WithError(err).WithOID(oid).Error("c went wrong")) and is also passed upstream, where the first function that does not abort after seeing the error reports the consequence and a possible workaround by itself (i.e. a reports logger.Error("operation X failed, will retry after 5 seconds")) without including the original error. That way we avoid showing the same error twice, inform the user about the error as early as possible (together with a precise location in the code), and make it clear how the error has affected the flow.
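
A rough sketch of the proposed flow, using logrus purely as an example of a structured logger (the logger choice and the stand-in function names are assumptions; WithOID would be a project-level helper):

package main

import (
	"errors"
	"time"

	"github.com/sirupsen/logrus"
)

// doSyscall stands in for the low-level call that fails.
func doSyscall() error { return errors.New("foo() system call failed") }

// c reports the error with structured fields right where it happens
// and returns it unchanged instead of building a new message string.
func c(oid uint32) error {
	if err := doSyscall(); err != nil {
		logrus.WithError(err).WithField("oid", oid).Error("c went wrong")
		return err
	}
	return nil
}

// b has nothing to add, so it just propagates the error.
func b(oid uint32) error { return c(oid) }

// a is the first caller that does not abort: it reports only how the
// failure affects the flow, without repeating the original error text.
func a(oid uint32) {
	if err := b(oid); err != nil {
		logrus.Error("operation X failed, will retry after 5 seconds")
		time.Sleep(5 * time.Second)
	}
}

func main() { a(1233) }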

Objections?

temp tables

Looks like sometimes we're trying to back up temp tables:

2018/11/07 09:42:10 could not basebackup "public"."pg_temp_180104": could not lock table: could not lock the table: ERROR: relation "public.pg_temp_180104" does not exist (SQLSTATE 42P01)
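
A possible guard before queueing a table for a basebackup, assuming a pgx connection is at hand (the query placement and the function name are illustrative):

package tablebackup

import "github.com/jackc/pgx"

// isBackupable reports whether the relation is a plain permanent table;
// temporary tables (pg_temp_* schemas) and already-dropped relations are
// skipped. Illustrative sketch only.
func isBackupable(conn *pgx.Conn, oid uint32) (bool, error) {
	var isPermanentTable bool
	err := conn.QueryRow(
		`SELECT relpersistence = 'p' AND relkind = 'r'
		   FROM pg_catalog.pg_class WHERE oid = $1`, oid,
	).Scan(&isPermanentTable)
	if err == pgx.ErrNoRows {
		return false, nil // relation is already gone, nothing to back up
	}
	if err != nil {
		return false, err
	}
	return isPermanentTable, nil
}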

Use memory buffer to write current deltas before flushing them

Right now we write each delta to the delta file right away. To improve performance, we could write those deltas to a memory buffer first and flush them to the file when the buffer fills up.

As a related note, we could also send the flush response back to the server only once we have written all deltas to the file, fsynced and closed it. Without the fsync we risk losing some records on a crash, since the server usually disposes of them after receiving the flush confirmation.
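
A minimal sketch of the buffering idea using bufio (the buffer size and type names are illustrative):

package tablebackup

import (
	"bufio"
	"os"
)

// deltaWriter accumulates deltas in memory and only hits the file
// when the buffer fills up or the segment is closed.
type deltaWriter struct {
	fp  *os.File
	buf *bufio.Writer
}

func newDeltaWriter(path string) (*deltaWriter, error) {
	fp, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		return nil, err
	}
	return &deltaWriter{fp: fp, buf: bufio.NewWriterSize(fp, 1<<20)}, nil // 1 MB buffer
}

func (w *deltaWriter) WriteDelta(delta []byte) error {
	_, err := w.buf.Write(delta)
	return err
}

// Close flushes the buffer, fsyncs and closes the file; only after this
// point is it safe to report the segment's flush LSN to the server.
func (w *deltaWriter) Close() error {
	if err := w.buf.Flush(); err != nil {
		return err
	}
	if err := w.fp.Sync(); err != nil {
		return err
	}
	return w.fp.Close()
}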

SIGSTOP signal handler

Currently the SIGSTOP signal forces the backup tool to finish, but SIGSTOP has different semantics: it is supposed to pause the process until a subsequent SIGCONT signal. That could actually be useful, for example when we run into a low-disk-space issue and want to pause the subscriber, free up some space, and then continue consumption.

could not map filenode

2018/11/05 15:59:56 replication failed: ERROR: could not map filenode "pg_tblspc/177058/PG_10_201707211/16384/150705377" to relation OID (SQLSTATE XX000)

Fsync deltas on disk before writing

When a standby status update message contains a flush LSN, we need to make sure the deltas up to that LSN are actually fsynced on disk; otherwise we risk misinforming the primary and letting it recycle data we still need (in case we have to retry writing the deltas, e.g. after an unexpected shutdown of the backup server).

This doesn't necessarily mean we need to fsync every delta or even every transaction. We could send the status update messages less often and only fsync the file once it is closed.
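
A sketch of the ordering this implies; the struct and its fields are assumptions that loosely mirror tablebackup.go:

package tablebackup

import "os"

// segmentWriter is an illustrative stand-in for the per-table writer.
type segmentWriter struct {
	fp       *os.File
	flushLSN uint64 // LSN reported in the next standby status update
}

// closeSegment makes the deltas durable before the flush LSN may advance,
// so the primary never recycles WAL for deltas we could still need to rewrite.
func (w *segmentWriter) closeSegment(segmentEndLSN uint64) error {
	if err := w.fp.Sync(); err != nil { // one fsync per segment, not per delta
		return err
	}
	if err := w.fp.Close(); err != nil {
		return err
	}
	w.flushLSN = segmentEndLSN // only now may a status update report this LSN
	return nil
}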

add travis ci

As a first approach it would be nice to run the test/test.sh script in order to check that backup and restore actually work.

Avoid writing (and archiving) deltas we don't need

Right now the handler process happily writes all deltas received for the tables being backed up, and the archiver then moves that data to the final storage. That creates unnecessary writes when the replication slot is far behind the current WAL position on the primary. Consider the following case:

  • the latest transaction that touched table A has the final LSN of 10/42.

  • the up-to-date backup has been triggered after a certain number of delta files. The table to back up is at LSN 11/B0 on the primary (the logical slot we create in the backup transaction gives us this number), so all changes before that LSN are already reflected in the SQL dump we are going to take.

  • the next transaction fetched from the 'deltas' slot that touches table A has the final LSN 10/A00.

Right now we will happily write this transaction and all subsequent ones; however, from 10/A00 until 11/B0 there are around 4GB of changes for table A that we don't need to write or process, since the up-to-date backup of table A already includes the result of all those changes.

I'd propose to skip writing those changes in the first place, and to have the archiver remove the changes that were already written instead of archiving them. Together with the code that purges old deltas once the backup is ready, that would give us a reasonable guarantee that we don't leave old data hanging around for longer than necessary.

As a side note, when removing old deltas after the base backup is complete we need to purge both the temp and final directories of all deltas before basebackupLSN (with the exception of the file currently being written to, which should be handled by the logical backup worker rather than the base backup worker).
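
A sketch of the skip condition in the handler (names are assumptions):

package logicalbackup

// shouldWriteDelta decides whether a decoded transaction for a table needs
// to be written at all; illustrative sketch only.
func shouldWriteDelta(txFinalLSN, basebackupLSN uint64) bool {
	// Everything up to basebackupLSN is already contained in the SQL dump
	// taken under the backup transaction's snapshot; writing it again only
	// produces files the archiver will move and the purge will later delete.
	return txFinalLSN > basebackupLSN
}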

Introduce structured logging

A long-standing task to replace the basic logger with something more sophisticated, in order to avoid manually constructing strings for errors, LSNs, OIDs, table names and many other fields common to the log messages. Among the candidates are logrus, zap, zerolog and others.
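
For illustration, a minimal zap example (one of the candidates); the field values are taken from the log above and the message is arbitrary:

package main

import "go.uber.org/zap"

func main() {
	logger, err := zap.NewProduction()
	if err != nil {
		panic(err)
	}
	defer logger.Sync()

	// Fields are typed and attached separately instead of being
	// formatted into the message string by hand.
	logger.Info("flushed current segment",
		zap.String("table", `"public"."test"`),
		zap.String("file", "/tmp/final/e3/47/00/18403/deltas/00000000102cdc48"),
		zap.String("flushLSN", "0/10355838"),
	)
}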

[idea] make restore command act as a postgres server

While performing a restore, instead of generating DML operations, act as a PostgreSQL logical replication publisher, so that the restore's destination database would connect to the restore tool as if it were a PostgreSQL server.

Some explanation about project

Hi!

Continuous logical backup is interesting and, IMHO, a promising direction. Could you give a detailed explanation of what your utility does, so that it can be evaluated?

"File is already closed" error on a delta

Saw this at least once and am trying to reproduce it again: the backup stops, complaining "could not save delta: file has already closed" on a temporary delta file belonging to a rather lightweight table (20MB total) that receives just 10-20 updates a day, with intervals of 1 to 5 hours between updates.

I will provide more details if I manage to reproduce it. I suspect that CloseOldFiles closed a file that was still receiving writes (since the time between updates can be 5h and the timeout is set to 3). I am running with some extra debug messages to reproduce.

diff --git a/pkg/tablebackup/tablebackup.go b/pkg/tablebackup/tablebackup.go
index a7fbbc1..07080ec 100644
--- a/pkg/tablebackup/tablebackup.go
+++ b/pkg/tablebackup/tablebackup.go
@@ -300,6 +300,8 @@ func (t *TableBackup) CloseOldFiles() error {
                return nil
        }

+       log.Printf("closing old file %s for table %s", t.currentDeltaFilename, t)
+
        return t.currentDeltaFp.Close()
 }
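
If the suspicion above is right, one possible guard (a sketch; lastWrittenMessage and inactivityTimeout are assumed field names, the rest mirrors the diff) would be to skip the close while the file is still receiving writes:

// Sketch only: do not close a delta file that has been written to recently.
func (t *TableBackup) CloseOldFiles() error {
	if t.currentDeltaFp == nil {
		return nil
	}
	if time.Since(t.lastWrittenMessage) < t.inactivityTimeout {
		return nil // the file is still active; leave it open
	}
	log.Printf("closing old file %s for table %s", t.currentDeltaFilename, t)
	return t.currentDeltaFp.Close()
}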
