Coder Social home page Coder Social logo

pg_warp's Introduction

pg_warp Build Status

pg_warp makes it easy to logically replicate data from a source Postgres instance with the test_decoding logical decoding plugin to a target Postgres instance, including distributed Postgres running on Citus Cloud.

pg_warp first takes an initial backup using standard Postgres tooling, and then consistently replays any changes that were made after the consistent snapshot used for the initial backup.

Contrary to other logical replication solutions, this test_decoding-based approach has little overhead on the source database, and resembles the native logical replication feature in PostgreSQL 10.

The necessary test_decoding plugin has been part of PostgreSQL since version 9.4 and is considered stable and safe to use on production systems.

Through the use of replication slots and replication origins pg_warp is able to recover from crashes of the replication tool itself, by simply starting the command up again with the same source and destination instance.

Unlike some other solutions, pg_warp has no restrictions or issues with the use of complex data types, or large JSON, JSONB or hstore columns.

Usage

First, consider access restrictions and decide where you want to run pg_warp:

  • A) Run pg_warp on the destination system directly (best replication throughput, requires giving source credentials and firewall access to the destination system)
  • B) Run pg_warp on a temporary server inside the source VPC (easier to setup, lower replication throughput)

For both of these choices you will need to download the latest pg_warp binary and then run pg_warp as following:

pg_warp --source postgres://username:password@hostname:port/database --destination postgres://username:password@hostname:port/database

By default pg_warp will copy all tables that exist on the source, and assume that their schema has already been created on the destination.

It will also issue TRUNCATE for every of those tables on the destination system - use with care!

After the initial backup has completed pg_warp will automatically switch into ongoing replication mode. In case the process crashes or is stopped, just start it again with the same parameters.

Parallelism for the initial backup

Its recommended to add the following options for most real-world use cases:

  --parallel-dump=value
    	Number of parallel operations whilst dumping the initial sync data (default 1 = not parallel) (default 1)
  --parallel-restore=value
    	Number of parallel operations whilst restoring the initial sync data (default 1 = not parallel) (default 1)
  --tmp-dir=string
    	Directory where we store temporary files as part of parallel operations (this needs to be fast and have space for your full data set)

For parallel operations you need to specify a temporary directory that has enough space for your full data set (size means records only without indexes), and ideally resides on a fast disk. This directory is used to first download the data in parallel, and then restore in parallel to the destination system.

Limiting the tables that are replicated

It is often useful to limit the tables that should be copied and replicated. To do this pg_warp supports the same options as pg_dump:

  -t, --table=value
    	dump, restore and replicate the named table(s) only
  -T, --exclude-table=value
    	do NOT dump, restore and replicate the named table(s)

Note that you currently need to fully qualify the names (= include the schema), e.g. -t public.my_data will only copy and replicate the data in the my_data table in the public schema.

Required Postgres configuration on the source

The following configuration is required to enable logical decoding and replication slots:

wal_level = logical
max_replication_slots = 5 # has to be > 0
max_wal_senders = 5 # has to be > 0

Required Postgres configuration on the target

The following configuration is required to enable replication origins on the target:

max_replication_slots = 5 # has to be > 0

We also recommend considering the use of synchronous_commit = off to speed up writes and increase the replication throughput. This does not lead to the potential of data loss on behalf of pg_warp itself, although other connections might be affected.

Cleaning the replication slot (IMPORTANT!)

After you are finished using and/or testing pg_warp, run it once with the --clean option, with the source and destination URLs specified.

This will remove the replication origin on the destination, and more importantly, the replication slot on the source.

If you do not remove the replication slot on the source your WAL will grow forever until you run out of disk space! (running pg_warp consumes the WAL, so this only applies once you stop using pg_warp)

Things to be aware of

  • The initial backup has some I/O impact on the source instance due to the data that needs to be read - we recommend kicking off the process in a time period with lower activity, and to always test on a copy or staging system first
  • All tables that receive UPDATE/DELETE commands need to have a primary key set (otherwise these operations will be ignored for ongoing replication - INSERTs work fine without primary key)
  • Sequence values will not be updated on the destination (except for the initial sync), you'll need to update these manually before you insert something from your application into the destination system
  • DDL is currently not replicated (for example from schema migrations)

Initial Authors

License

Copyright 2017 Citus Data Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

pg_warp's People

Contributors

lfittl avatar nickelser avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.