PostgreSQL® backup and restore service

Home Page: http://aiven-open.github.io/pghoard/

License: Apache License 2.0

PGHoard

pghoard is a PostgreSQL® backup daemon and restore tooling that stores backup data in cloud object stores.

Features:

  • Automatic periodic basebackups
  • Automatic transaction log (WAL/xlog) backups (using either pg_receivexlog, archive_command or experimental PG native replication protocol support with walreceiver)
  • Optional Standalone Hot Backup support
  • Cloud object storage support (AWS S3, Google Cloud, OpenStack Swift, Azure, Ceph)
  • Backup restoration directly from object storage, compressed and encrypted
  • Point-in-time-recovery (PITR)
  • Initialize a new standby from object storage backups, automatically configured as a replicating hot-standby

Fault-resilience and monitoring:

  • Survives temporary object storage connectivity issues by retrying transfers
  • Verifies WAL file headers before upload (backup) and after download (restore), so that e.g. files recycled by PostgreSQL are ignored
  • Automatic history cleanup (backups and related WAL files older than N days)
  • "Archive sync" tool for detecting holes in WAL backup streams and fixing them
  • "Archive cleanup" tool for deleting obsolete WAL files from the archive
  • Keeps statistics updated in a file on disk (for monitoring tools)
  • Creates alert files on disk on problems (for monitoring tools)

Performance:

  • Parallel compression and encryption
  • WAL pre-fetching on restore

Overview

PostgreSQL Point In Time Recovery (PITR) is based on a database basebackup plus write-ahead log (WAL) files that record all changes made after that point; replaying the WAL files brings the database to the desired recovery point.

PGHoard supports multiple operating models. In the basic mode, where you have a separate backup machine, pghoard simply connects with pg_receivexlog to receive WAL files from the database as they're written. Another model is to use pghoard_postgres_command as a PostgreSQL archive_command. There is also experimental support for using PostgreSQL's native replication protocol through the walreceiver mode.

In all operating models PGHoard creates periodic basebackups by running pg_basebackup against the database in question.

The PostgreSQL write-ahead log (WAL) and basebackups are compressed with Snappy (default), Zstandard (configurable, level 3 by default) or LZMA (configurable, level 0 by default) to ensure good compression speed and a relatively small backup size. For performance-critical applications it is recommended to benchmark the compression algorithms to find the most suitable trade-off for the particular use case: Snappy is fast but yields larger compressed files, while Zstandard (zstd) offers a very wide range of compression/speed trade-offs.
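For instance, a site could be switched from the default Snappy to zstd with a compression object like the following sketch (key names per the Configuration keys section below; the values shown are only illustrative):

```json
{
    "compression": {
        "algorithm": "zstd",
        "level": 3,
        "thread_count": 5
    }
}
```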

Optionally, PGHoard can encrypt backed-up data at rest. Each individual file is encrypted and authenticated with a file-specific key. The file-specific keys are included in the backup, in turn encrypted with a master RSA public/private key pair.

PGHoard supports backing up and restoring from either a local filesystem or from various object stores (AWS S3, Azure, Ceph, Google Cloud and OpenStack Swift.)

If you only have a single database machine, it is strongly recommended to use one of the object storage services so that backups can be recovered even if the host running PGHoard is incapacitated.

Requirements

PGHoard can backup and restore PostgreSQL versions 9.6 and above, but is only tested and actively developed with version 12 and above.

The daemon is implemented in Python and is tested and developed with version 3.10 and above. The following Python modules are required:

  • psycopg2 to look up transaction log metadata
  • requests for the internal client-server architecture

Optional requirements include:

  • azure for Microsoft Azure object storage (patched version required, see link)
  • botocore for AWS S3 (or Ceph-S3) object storage
  • google-api-client for Google Cloud object storage
  • cryptography for backup encryption and decryption (version 0.8 or newer required)
  • snappy for Snappy compression and decompression
  • zstandard for Zstandard (zstd) compression and decompression
  • systemd for systemd integration
  • swiftclient for OpenStack Swift object storage
  • paramiko for sftp object storage

Developing and testing PGHoard also requires the following utilities: flake8, pylint and pytest.

PGHoard has been developed and tested on modern Linux x86-64 systems, but should work on other platforms that provide the required modules.

Vagrant

The Vagrantfile can be used to set up a Vagrant development environment. The environment provides Python 3.10, 3.11 and 3.12 virtual environments and installations of PostgreSQL 12, 13, 14, 15 and 16.

By default vagrant up will start a VirtualBox environment. The Vagrantfile also works with libvirt; just prefix the vagrant up command with VAGRANT_DEFAULT_PROVIDER=libvirt.

Any combination of Python (3.10, 3.11 and 3.12) and PostgreSQL (12, 13, 14, 15 and 16) can be tested.

Bring up vagrant instance and connect via ssh:

vagrant up
vagrant ssh
vagrant@ubuntu2004:~$ cd /vagrant

Test with Python 3.11 and Postgresql 12:

vagrant@ubuntu2004:~$ source ~/venv3.11/bin/activate
vagrant@ubuntu2004:~$ PG_VERSION=12 make unittest
vagrant@ubuntu2004:~$ deactivate

Test with Python 3.12 and Postgresql 13:

vagrant@ubuntu2004:~$ source ~/venv3.12/bin/activate
vagrant@ubuntu2004:~$ PG_VERSION=13 make unittest
vagrant@ubuntu2004:~$ deactivate

And so on for the other Python/PostgreSQL combinations.

Building

To build an installation package for your distribution, go to the root directory of a PGHoard Git checkout and run:

Debian:

make deb

This will produce a .deb package in the parent directory of the Git checkout.

Fedora:

make rpm

This will produce an .rpm package, usually under rpm/RPMS/noarch/.

Python/Other:

python setup.py bdist_egg

This will produce an egg file in the dist directory of the checkout.

Installation

To install the package, run the following as root:

Debian:

dpkg -i ../pghoard*.deb

Fedora:

dnf install rpm/RPMS/noarch/*

On Linux systems it is recommended to simply run pghoard under systemd:

systemctl enable pghoard.service

and, after completing the Setup section below, start it with:

systemctl start pghoard.service

Python/Other:

easy_install dist/pghoard-1.7.0-py3.6.egg

On systems without systemd it is recommended that you run pghoard under Supervisor or other similar process control system.

Setup

After this you need to create a suitable JSON configuration file for your installation.

  1. Make sure PostgreSQL is configured to allow WAL archival and retrieval. postgresql.conf should have wal_level set to archive or higher and max_wal_senders set to at least 1 (archive_command mode) or at least 2 (pg_receivexlog and walreceiver modes), for example:

    wal_level = archive
    max_wal_senders = 4
    

    Note that changing wal_level or max_wal_senders settings requires restarting PostgreSQL.

  2. Create a suitable PostgreSQL user account for pghoard:

    CREATE USER pghoard PASSWORD 'putyourpasswordhere' REPLICATION;
    
  3. Edit the local pg_hba.conf to allow access for the newly created account to the replication database from the primary and standby nodes. For example:

    # TYPE  DATABASE     USER     ADDRESS       METHOD
    host    replication  pghoard  127.0.0.1/32  md5
    

    After editing, please reload the configuration with either:

    SELECT pg_reload_conf();
    

    or by sending a SIGHUP directly to the PostgreSQL postmaster process.

  4. Fill in the created user account and primary/standby addresses in the backup_sites section of the configuration file pghoard.json.

  5. Fill in the object storage user credentials in the object_storage section of pghoard.json in case you wish pghoard to back up to the cloud.

  6. Now copy the same pghoard.json configuration to the standby nodes, if there are any.

Other possible configuration settings are covered in more detail under the Configuration keys section of this README.

  7. If everything has been set up correctly up to this point, pghoard is now ready to be started.
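To tie the setup steps together, a minimal pghoard.json for a single site backed up to local storage might look like the following sketch; all host names, passwords and paths are placeholders, and the key names follow the Configuration keys section below:

```json
{
    "backup_location": "/var/lib/pghoard",
    "backup_sites": {
        "default": {
            "nodes": [
                {
                    "host": "127.0.0.1",
                    "port": 5432,
                    "user": "pghoard",
                    "password": "putyourpasswordhere"
                }
            ],
            "object_storage": {
                "storage_type": "local",
                "directory": "/var/lib/pghoard/backups"
            }
        }
    }
}
```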

Backing up your database

PostgreSQL backups consist of full database backups (basebackups) plus the write-ahead log (WAL) and related metadata. Both basebackups and WAL are required to create and restore a consistent database (this does not apply to standalone hot backups).

To enable backups with PGHoard the pghoard daemon must be running locally. The daemon periodically takes full basebackups of the database and stores them in the object store. Additionally, PGHoard and PostgreSQL must be set up correctly to archive the WAL. There are two ways to do this:

The default option is to use PostgreSQL's own WAL-archiving mechanism by entering the following configuration keys in postgresql.conf:

archive_mode = on
archive_command = pghoard_postgres_command --mode archive --site default --xlog %f

This instructs PostgreSQL to call pghoard_postgres_command whenever a new WAL segment is ready; the command in turn asks PGHoard to store the segment in its object store.

The other option is to set up PGHoard to read the WAL stream directly from PostgreSQL. To do this archive_mode must be disabled in postgresql.conf and pghoard.json must set active_backup_mode to pg_receivexlog in the relevant site, for example:

{
    "backup_sites": {
        "default": {
            "active_backup_mode": "pg_receivexlog",
            ...
         },
     },
     ...
 }

Note that, as explained in the Setup section, the postgresql.conf setting wal_level must always be set to archive, hot_standby or logical, and max_wal_senders must allow 2 connections from PGHoard, i.e. it should be set to 2 plus the number of streaming replicas, if any.

While pghoard is running it may be useful to read the JSON state file pghoard_state.json located where json_state_file_path points. The state file is human readable and describes the current state of pghoard's backup activities.
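As a small illustration of consuming the state file, the sketch below just loads it and lists its top-level keys; the actual keys present depend on the pghoard version, so none are assumed here, and the function name is hypothetical:

```python
import json

def summarize_state(path):
    """Load pghoard's JSON state file and return its top-level keys.

    The exact contents are version dependent; this only assumes the
    file is a JSON object, as described above.
    """
    with open(path) as state_file:
        state = json.load(state_file)
    return sorted(state)
```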

Standalone Hot Backup Support

PGHoard can optionally take standalone hot backups.

To do this archive_mode must be disabled in postgresql.conf and pghoard.json must set active_backup_mode to standalone_hot_backup in the relevant site, for example:

{
    "backup_sites": {
        "default": {
            "active_backup_mode": "standalone_hot_backup",
            ...
         },
     },
     ...
 }

For more information refer to the PostgreSQL documentation: https://www.postgresql.org/docs/9.5/continuous-archiving.html#BACKUP-STANDALONE

Restoring databases

You can list your database basebackups by running:

pghoard_restore list-basebackups --config /var/lib/pghoard/pghoard.json

Basebackup                       Size  Start time            Metadata
-------------------------------  ----  --------------------  ------------
default/basebackup/2016-04-12_0  8 MB  2016-04-12T07:31:27Z  {'original-file-size': '48060928',
                                                              'start-wal-segment': '000000010000000000000012',
                                                              'compression-algorithm': 'snappy'}

To restore to the latest point in time, fetch the required basebackup by running:

pghoard_restore get-basebackup --config /var/lib/pghoard/pghoard.json \
    --target-dir /var/lib/pgsql/9.5/data --restore-to-primary

Basebackup complete.
You can start PostgreSQL by running pg_ctl -D foo start
On systemd based systems you can run systemctl start postgresql
On SYSV Init based systems you can run /etc/init.d/postgresql start

Note that target-dir must be either an empty directory or a non-existent one; in the latter case PGHoard creates it automatically.

After this, start both the PGHoard daemon and the PostgreSQL server normally (on systemd-based systems, assuming PostgreSQL 9.5 is used):

systemctl start pghoard
systemctl start postgresql-9.5

This makes PostgreSQL run its recovery process up to the latest point in time. PGHoard must be running before you start the PostgreSQL server. To see other restoration options, run:

pghoard_restore --help

Commands

Once correctly installed, there are six commands available:

pghoard is the main daemon process that should be run under a service manager, such as systemd or supervisord. It handles the backup of the configured sites.

pghoard_restore is a command line tool that can be used to restore a previous database backup either through pghoard itself or directly from one of the supported object stores. It can also configure recovery.conf to use pghoard_postgres_command as the restore_command.

pghoard_archive_cleanup can be used to clean up any orphan WAL files from the object store. After the configured number of basebackups has been exceeded (configuration key basebackup_count), pghoard deletes the oldest basebackup and all WAL associated with it. Transient object storage failures and other interruptions can cause the WAL deletion process to leave orphan WAL files behind; these can be deleted with this tool.

pghoard_archive_sync can be used to check whether any local files should have been archived but haven't been, or whether any archived files have unexpected content and need to be archived again. Its other use case is detecting gaps in the required WAL files in the archive, from the current WAL file back to the latest basebackup's first WAL file.

pghoard_create_keys can be used to generate and output encryption keys in the pghoard configuration format.

pghoard_postgres_command is a command line tool that can be used as PostgreSQL's archive_command or restore_command. It communicates with pghoard's locally running webserver to let it know there's a new file that needs to be compressed, encrypted and stored in an object store (in archive mode), or the inverse (in restore mode).

Configuration keys

active (default true)

Can be set to false on a per-backup_site level to disable taking new backups and to stop deleting old ones.

active_backup_mode (default pg_receivexlog)

Can be either pg_receivexlog or archive_command. If set to pg_receivexlog, pghoard will start a pg_receivexlog process against the database server. If set to archive_command, pghoard relies on the user setting the correct archive_command in postgresql.conf. You can also set this to the experimental walreceiver mode, in which pghoard communicates directly with PostgreSQL through the replication protocol. (Note: this requires an unreleased version of the psycopg2 library.)

alert_file_dir (default backup_location if set else os.getcwd())

Directory in which alert files for replication warnings and failover are created.

backup_location (no default)

Location where pghoard creates its internal data structures for local state data and, if no object storage is used, the actual backups.

backup_sites (default {})

This object contains names and configurations for the different PostgreSQL clusters (here called sites) from which to take backups. The configuration keys for sites are listed below.

  • compression WAL/basebackup compression parameters
  • algorithm default "snappy" if available, otherwise "lzma" or "zstd"
  • level default "0" compression level for "lzma" or "zstd" compression
  • thread_count (default max(cpu_count, 5)) number of parallel compression threads

hash_algorithm (default "sha1")

The hash algorithm used for calculating checksums for WAL or other files. Must be one of the algorithms supported by Python's hashlib.
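Because the value is handed to Python's hashlib, any algorithm name hashlib accepts can be used. A sketch of the kind of checksumming this enables (the function name is hypothetical, not pghoard's API):

```python
import hashlib

def file_checksum(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Checksum a file with any algorithm name hashlib.new() accepts."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so large WAL/basebackup files are not
        # loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```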

http_address (default "127.0.0.1")

Address to bind the PGHoard HTTP server to. Set to an empty string to listen to all available IPv4 addresses. Set it to the IPv6 :: wildcard address to bind to all available IPv4 and IPv6 addresses.

http_port (default 16000)

HTTP webserver port. Used for the archive_command and for fetching basebackups/WAL when restoring, if not using an object store.

json_state_file_path (default "/var/lib/pghoard/pghoard_state.json")

Location of a JSON state file which describes the state of the pghoard process.

log_level (default "INFO")

Determines log level of pghoard.

maintenance_mode_file (default "/var/lib/pghoard/maintenance_mode_file")

If a file exists in this location, no new backup actions will be started.

pg_receivexlog

When the active backup mode is set to "pg_receivexlog", this object may optionally specify additional configuration options. The currently available options all relate to monitoring disk space and optionally pausing xlog/WAL receiving when free space drops below a configured threshold. This is useful when PGHoard is configured to create its temporary files on a different volume than the main PostgreSQL data directory. By default this logic is disabled; min_disk_free_bytes must be configured to enable it.

pg_receivexlog.disk_space_check_interval (default 10)

How often to check available disk space.

pg_receivexlog.min_disk_free_bytes (default undefined)

Minimum number of bytes (as an integer) that must be available in order to keep receiving xlogs/WAL from PostgreSQL. If available disk space drops below this limit, a STOP signal is sent to the pg_receivexlog / pg_receivewal process.

pg_receivexlog.resume_multiplier (default 1.5)

Multiplier applied to min_disk_free_bytes to determine how much disk space must be free before xlog/WAL receiving is resumed (i.e. the CONT signal is sent to the pg_receivexlog / pg_receivewal process). A multiplier above 1 should be used to avoid constantly stopping and continuing the process.
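The interplay of the two settings can be sketched as follows (illustrative helper functions, not pghoard's internals): receiving stops below the minimum and resumes only once free space clears the minimum times the multiplier.

```python
def should_pause(free_bytes, min_disk_free_bytes):
    """Pause WAL receiving when free space drops below the minimum."""
    return free_bytes < min_disk_free_bytes

def may_resume(free_bytes, min_disk_free_bytes, resume_multiplier=1.5):
    """Resume only once free space reaches minimum * multiplier,
    so the process does not constantly stop and continue."""
    return free_bytes >= min_disk_free_bytes * resume_multiplier
```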

restore_prefetch (default transfer.thread_count)

Number of files to prefetch when performing archive recovery. The default is the number of transfer agent threads, so that all of them can be utilized.

statsd (default: disabled)

Enables metrics sending to a statsd daemon that supports Telegraf or DataDog syntax with tags.

The value is a JSON object:

{
    "host": "<statsd address>",
    "port": <statsd port>,
    "format": "<statsd message format>",
    "tags": {
        "<tag>": "<value>"
    }
}

format (default: "telegraf")

Determines the statsd message format. The supported formats are telegraf (the default) and datadog.

The tags setting can be used to enter optional tag values for the metrics.

pushgateway (default: disabled)

Enables metrics sending to a Prometheus Pushgateway with tags.

The value is a JSON object:

{
    "endpoint": "<pushgateway address>",
    "tags": {
        "<tag>": "<value>"
    }
}

The tags setting can be used to enter optional tag values for the metrics.

prometheus (default: disabled)

Expose metrics through a Prometheus endpoint.

The value is a JSON object:

{
    "tags": {
        "<tag>": "<value>"
    }
}

The tags setting can be used to enter optional tag values for the metrics.

syslog (default false)

Determines whether syslog logging should be turned on or not.

syslog_address (default "/dev/log")

Determines the syslog address to use for logging (requires syslog to be true as well).

syslog_facility (default "local2")

Determines syslog log facility. (requires syslog to be true as well)

  • transfer WAL/basebackup transfer parameters
  • thread_count (default max(cpu_count, 5)) number of parallel uploads/downloads

upload_retries_warning_limit (default 3)

After this many failed upload attempts for a single file, create an alert file.

tar_executable (default "pghoard_gnutaremu")

The tar command to use for restoring basebackups. This must be GNU tar because some advanced switches such as --transform are needed. If this value is not defined (or is explicitly set to "pghoard_gnutaremu"), Python's internal tarfile implementation is used. The Python implementation is somewhat slower than the actual tar command; in environments with fast disk IO (compared to available CPU capacity) it is recommended to set this to "tar".

Backup site configuration

The following options control the behavior of each backup site. A backup site means an individual PostgreSQL installation ("cluster" in PostgreSQL terminology) from which to take backups.

basebackup_age_days_max (default undefined)

Maximum age for basebackups. Basebackups older than this will be removed. By default this value is not defined and basebackups are deleted based on total count instead.

basebackup_chunks_in_progress (default 5)

How many basebackup chunks may exist on disk simultaneously while the backup is being taken. For chunk size configuration see basebackup_chunk_size.

basebackup_chunk_size (default 2147483648)

Chunk size, in bytes, used when taking a local-tar basebackup. The disk space needed for a successful backup is this value multiplied by basebackup_chunks_in_progress.

basebackup_compression_threads (default 0)

Number of threads the compression library uses during a basebackup. Only applicable with a compression library that supports internal multithreading, currently only zstd. The default value 0 disables multithreading.

basebackup_count (default 2)

How many basebackups to keep around for restoration purposes. The more there are, the more disk space is used. If basebackup_age_days_max is defined, this instead controls the maximum number of basebackups to keep; if the backup interval is less than 24 hours or extra backups are created, there can be more than one basebackup per day, and it is often desirable to set basebackup_count to something slightly higher than the maximum age in days.

basebackup_count_min (default 2)

Minimum number of basebackups to keep. This is only effective when basebackup_age_days_max has been defined. If for example the server is powered off and then back on a month later, all existing backups would be very old. However, in that case it is usually not desirable to immediately delete all old backups. This setting allows specifying a minimum number of backups that should always be preserved regardless of their age.

basebackup_hour (default undefined)

The hour of day at which to start a new basebackup. If the backup interval is less than 24 hours, this is the base hour used to calculate the hours at which backups are taken. E.g. if the backup interval is 6 hours and this value is set to 1, backups are taken at hours 1, 7, 13 and 19. This value is only effective if basebackup_interval_hours and basebackup_minute are also set.
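The hour calculation described above can be sketched as follows (a hypothetical helper, not pghoard code):

```python
def backup_hours(basebackup_hour, interval_hours):
    """Hours of day at which basebackups start, per the rule above."""
    if interval_hours >= 24:
        return [basebackup_hour]
    return sorted((basebackup_hour + i * interval_hours) % 24
                  for i in range(24 // interval_hours))
```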

basebackup_interval_hours (default 24)

How often to take a new basebackup of a cluster. The shorter the interval, the faster your recovery will be, but the more CPU/IO is required from the server the basebackup is taken from. If set to null, basebackups are not taken automatically at all.

basebackup_minute (default undefined)

The minute of the hour at which to start a new basebackup. This value is only effective if basebackup_interval_hours and basebackup_hour are also set.

basebackup_mode (default "basic")

The way basebackups are created. The default mode, basic, runs pg_basebackup and waits for it to write an uncompressed tar file to disk before compressing and optionally encrypting it. The alternative mode, pipe, pipes the data directly from pg_basebackup to PGHoard's compression and encryption processing, reducing the amount of temporary disk space required.

Neither basic nor pipe modes support multiple tablespaces.

Setting basebackup_mode to local-tar avoids using pg_basebackup entirely when pghoard is running on the same host as the database. PGHoard reads the files directly from $PGDATA in this mode and compresses and optionally encrypts them. This mode allows backing up user tablespaces.

When using delta mode, only changed files are uploaded to the storage. On every backup a snapshot of the data files is taken; this results in a manifest file describing the hashes of all the files that need to be backed up. New hashes are uploaded to the storage, and for restoration they are used together with the complementary manifest from the control file. To properly assess the efficiency of delta mode compared with local-tar, one can use the local-tar-delta-stats mode, which behaves the same as local-tar but also collects metrics as if it were delta mode. This can help when deciding whether to switch to delta mode.

basebackup_threads (default 1)

How many threads to use for the tar, compress and encrypt tasks. Only applies to the local-tar basebackup mode. Only the values 1 and 2 are likely to be sensible: with higher thread counts the speed improvement is negligible and CPU time is lost switching between threads.

encryption_key_id (no default)

Specifies the encryption key used when storing encrypted backups. If this configuration directive is specified, you must also define the public key for storing backups and the private key for retrieving them. These keys are specified in the encryption_keys dictionary.

encryption_keys (no default)

This key is a mapping from key IDs to keys. Each key is in turn a mapping from public and private to PEM-encoded RSA public and private keys respectively. The public key must be specified for storing backups; the private key must be in place for restoring encrypted backups.

You can use pghoard_create_keys to generate and output encryption keys in the pghoard configuration format.
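Shaped per the description above, a site's encryption configuration might look like the following sketch; the key id and the PEM contents are placeholders:

```json
{
    "encryption_key_id": "example_key",
    "encryption_keys": {
        "example_key": {
            "public": "-----BEGIN PUBLIC KEY-----\n...\n-----END PUBLIC KEY-----\n",
            "private": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
        }
    }
}
```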

object_storage (no default)

Configured in backup_sites under a specific site. If set, it must be an object describing a remote object storage. The object must contain a key storage_type describing the type of the store, other keys and values are specific to the storage type.

proxy_info (no default)

Dictionary specifying proxy information. The dictionary must contain the keys type, host and port. type can be either socks5 or http. Optionally, user and pass can be specified for proxy authentication. Supported by the Azure, Google and S3 drivers.

The following object storage types are supported:

  • local makes backups to a local directory, see pghoard-local-minimal.json for an example. Required keys:
      • directory for the path to the backup target (local) storage directory
  • sftp makes backups to an sftp server, required keys:
      • server
      • port
      • username
      • password or private_key
  • google for Google Cloud Storage, required configuration keys:
      • project_id containing the Google Storage project identifier
      • bucket_name bucket where you want to store the files
      • credential_file for the path to the Google JSON credential file
  • s3 for Amazon Web Services S3, required configuration keys:
      • aws_access_key_id for the AWS access key id
      • aws_secret_access_key for the AWS secret access key
      • region S3 region of the bucket
      • bucket_name name of the S3 bucket

    Optional keys for Amazon Web Services S3:

      • encrypted if True, use server-side encryption. Default is False.

  • s3 for other S3 compatible services such as Ceph, required configuration keys:
      • aws_access_key_id for the AWS access key id
      • aws_secret_access_key for the AWS secret access key
      • bucket_name name of the S3 bucket
      • host for overriding the host for non AWS-S3 implementations
      • port for overriding the port for non AWS-S3 implementations
      • is_secure for overriding the https requirement for non AWS-S3 implementations
      • is_verify_tls for configuring TLS verification for non AWS-S3 implementations
  • azure for Microsoft Azure Storage, required configuration keys:
      • account_name for the name of the Azure Storage account
      • account_key for the secret key of the Azure Storage account
      • bucket_name for the name of the Azure Storage container used to store objects
      • azure_cloud Azure cloud selector, "public" (default) or "germany"
  • swift for OpenStack Swift, required configuration keys:
      • user for the Swift user ('subuser' in Ceph RadosGW)
      • key for the Swift secret_key
      • auth_url for the Swift authentication URL
      • container_name name of the data container

    Optional configuration keys for Swift:

      • auth_version - 2.0 (default) or 3.0 for keystone, use 1.0 with Ceph Rados GW.
      • segment_size - defaults to 1024**3 (1 gigabyte). Objects larger than this will be split into multiple segments on upload. Many Swift installations require large files (usually 5 gigabytes) to be segmented.
      • tenant_name
      • region_name
      • user_id - for auth_version 3.0
      • user_domain_id - for auth_version 3.0
      • user_domain_name - for auth_version 3.0
      • tenant_id - for auth_version 3.0
      • project_id - for auth_version 3.0
      • project_name - for auth_version 3.0
      • project_domain_id - for auth_version 3.0
      • project_domain_name - for auth_version 3.0
      • service_type - for auth_version 3.0
      • endpoint_type - for auth_version 3.0

nodes (no default)

Array of one or more nodes from which the backups are taken. A node can be described as an object of libpq key: value connection info pairs, a libpq connection string, or a postgres:// connection URI. If for example you'd like to use a streaming replication slot, use the syntax {... "slot": "slotname"}.
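For example, a node entry using a replication slot could look like this sketch (all values are placeholders):

```json
{
    "nodes": [
        {
            "host": "10.0.0.1",
            "port": 5432,
            "user": "pghoard",
            "password": "putyourpasswordhere",
            "slot": "pghoard_slot"
        }
    ]
}
```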

pg_bin_directory (default: find binaries from well-known directories)

Site-specific option for finding the pg_basebackup and pg_receivexlog commands matching the backup site's PostgreSQL version. If no value is supplied, PGHoard attempts to find matching binaries in various well-known locations. If pg_data_directory is set and points to a valid data directory, the lookup is restricted to the version contained in that data directory.

pg_data_directory (no default)

This is used when the local-tar basebackup_mode is used. The data directory must point to PostgreSQL's $PGDATA and must be readable by the pghoard daemon.

prefix (default: site name)

Path prefix to use for all backups related to this site. Defaults to the name of the site.

Alert files

Alert files are created whenever an error condition occurs that requires human intervention to resolve. It is recommended to add checks for the existence of these files to your alerting system.
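A monitoring check along these lines could simply look for the alert file names documented below in the configured alert_file_dir; this is a hypothetical sketch, not part of pghoard:

```python
import os

# Alert file names as documented in the sections below.
KNOWN_ALERTS = [
    "authentication_error",
    "configuration_error",
    "upload_retries_warning",
    "version_mismatch_error",
    "version_unsupported_error",
]

def active_alerts(alert_file_dir):
    """Return the known alert files that currently exist."""
    return [name for name in KNOWN_ALERTS
            if os.path.exists(os.path.join(alert_file_dir, name))]
```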

authentication_error

There has been a problem in the authentication of at least one of the PostgreSQL connections. This usually denotes a wrong username and/or password.

configuration_error

There has been a problem with the configuration of at least one of the PostgreSQL connections. This usually denotes a missing pg_hba.conf entry or incompatible settings in postgresql.conf.

upload_retries_warning

Uploading a file has failed more times than upload_retries_warning_limit. Human intervention is needed to figure out why, and the alert file must be deleted once the situation has been fixed.

version_mismatch_error

Your local PostgreSQL client versions of pg_basebackup or pg_receivexlog do not match the server's PostgreSQL version. You need to update them to the same version level.

version_unsupported_error

Server PostgreSQL version is not supported.

License

PGHoard is licensed under the Apache License, Version 2.0. Full license text is available in the LICENSE file and at http://www.apache.org/licenses/LICENSE-2.0.txt

Credits

PGHoard was created by Hannu Valtonen <[email protected]> for Aiven and is now maintained by Aiven developers <[email protected]>.

Recent contributors are listed on the GitHub project page, https://github.com/aiven/pghoard/graphs/contributors

Contact

Bug reports and patches are very welcome; please post them as GitHub issues and pull requests at https://github.com/aiven/pghoard. Any possible vulnerabilities or other serious issues should be reported directly to the maintainers <[email protected]>.

Trademarks

Postgres, PostgreSQL and the Slonik Logo are trademarks or registered trademarks of the PostgreSQL Community Association of Canada, and used with their permission.

Telegraf, Vagrant and Datadog are trademarks and property of their respective owners. All product and service names used in this website are for identification purposes only and do not imply endorsement.

Copyright

Copyright (C) 2015 Aiven Ltd

pghoard's People

Contributors

aiqin-aiven, alanfranz, alexole, alxric, carobme, cryptobioz, egor-voynov-aiven, facetoe, fingon, hnousiainen, jankatins, jason-adnuntius, jlprat, kathia-barahona, kmichel-aiven, lionbee, melor, mwfrojdman, ngoue, nicois, oikarinen, ojarva, ormod, packi, pellcorp, rdunklau, rikonen, saaros, songnon, willrouesnel-aivenio


pghoard's Issues

Restore from remote server possible?

I have three machines: PROD, STAGE, BACKUP. BACKUP is a Docker container that runs pghoard. BACKUP uses a replication slot to back up PROD. Before a deployment I use pghoard_restore to set up STAGE with the most recent data from PROD (without any additional load on PROD).

I upgraded BACKUP last night and now STAGE can't perform a restore. It seems like without a shared filesystem there isn't a way to get the xlogs that STAGE needs to make the base backup consistent from the object store. The pghoard_postgres_command that is set in recovery.conf restores the necessary files to BACKUP (not STAGE, where they are needed).

Could you provide some information about the vision of how recovery is supposed to work? Is my use case wildly different from what you imagined? Am I supposed to modify the config so that I can safely run pghoard locally on STAGE and then interact with that instance?

pg_xlog/RECOVERYHISTORY": Permission denied error

I think this isn't a pghoard issue, but if someone has an idea of what the reason is or how to solve it, I would appreciate it.

When I try to restore the backup I get the error pg_xlog/RECOVERYHISTORY": Permission denied. I created a StackOverflow question with all the details about it.

Thanks

Using REST-api directly for Docker deployment?

I am looking into running PGHoard as a Docker container on the host that runs "native" Postgres. I would like to avoid installing all of PGHoard's dependencies on the host and instead isolate them to the container.

  • Do you have any recommendations regarding Docker-deployments of PGHoard?
  • It seems to me that the PGHoard client used in the archive and recovery commands is just an HTTP client calling the PGHoard backend. Would it be possible to use a "simpler" HTTP client to access the PGHoard backend directly (i.e. without the dependencies)? Is the REST API documented somewhere in that case?

Azure upload auth error

pghoard fails to upload backups to azure blob storage with an authentication error:

WARNING Problem in moving file: '/var/lib/pghoard/test/xlog/000000010000008A000000EC', need to retry (AzureHttpError: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
<?xml version="1.0" encoding="utf-8"?><Error><Code>AuthenticationFailed</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:52014598-0001-00ed-7ffd-4bce47000000
Time:2016-12-01T18:07:30.7600412Z</Message><AuthenticationErrorDetail>The MAC signature found in the HTTP request 'JgnEloUqwfswtRL5ebtxSI29VJx4+8Ztfd2Ma2co7s0=' is not the same as any computed signature. Server used following string to sign: 'PUT


6132250

application/x-www-form-urlencoded






x-ms-blob-type:BlockBlob
x-ms-client-request-id:069097ae-b7f1-11e6-9bee-52540000bf1a
x-ms-date:Thu, 01 Dec 2016 18:07:30 GMT
x-ms-meta-compression_algorithm:snappy
x-ms-meta-compression_level:0
x-ms-meta-original_file_size:16777216
x-ms-meta-pg_version:90501
x-ms-version:2015-07-08
/azureaccount/ptest-backup/test/xlog/000000010000008A000000EC'.</AuthenticationErrorDetail></Error>)

The system is Ubuntu 14, pghoard is installed from the repo's master branch (azure-storage 0.33) and is started with the following configuration:

{
        "backup_location": "/var/lib/pghoard",
        "backup_sites": {
                "test": {
                        "nodes": [
                        {
                                "host": "127.0.0.1",
                                "password": "secret_password",
                                "port": 5432,
                                "user": "pghoard"
                        }
                        ],
                        "object_storage": {
                                "storage_type": "azure",
                                "account_name": "azureaccount",
                                "account_key": "mybigkey11==",
                                "bucket_name": "ptest-backup"
                        }
                }
        }
}

statsd description

According to the README, the statsd parameter has the following format:

{
    "host": "<statsd address>",
    "port": "<statsd port>",
    "tags": {
        "<tag>": "<value>"
    }
}

but when I try to use it, I am getting:

TypeError: an integer is required (got type str)

Got it working by passing the port as a bare integer instead of a string: "port": <statsd port>,
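For reference, a complete statsd section with the port given as a JSON number rather than a string (quoting it is what triggers the TypeError). Host, port, and tag values here are placeholders; 8125 is merely the conventional statsd port:

```json
{
    "statsd": {
        "host": "127.0.0.1",
        "port": 8125,
        "tags": {
            "environment": "production"
        }
    }
}
```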

Multiple folder

Hi,

I am taking a basebackup using pghoard --config pghoard.json. I am getting multiple folders, as shown below. What is causing backups to be created in multiple folders on the same day?

drwxr-xr-x. 2 root root 21 Feb 9 09:17 2017-02-09_0
drwxr-xr-x. 2 root root 21 Feb 9 09:18 2017-02-09_1
drwxr-xr-x. 2 root root 21 Feb 9 09:28 2017-02-09_10
drwxr-xr-x. 2 root root 21 Feb 9 09:29 2017-02-09_11
drwxr-xr-x. 2 root root 21 Feb 9 09:30 2017-02-09_12
drwxr-xr-x. 2 root root 21 Feb 9 09:31 2017-02-09_13
drwxr-xr-x. 2 root root 21 Feb 9 09:32 2017-02-09_14
drwxr-xr-x. 2 root root 21 Feb 9 09:33 2017-02-09_15
drwxr-xr-x. 2 root root 21 Feb 9 09:34 2017-02-09_16
drwxr-xr-x. 2 root root 21 Feb 9 09:35 2017-02-09_17
drwxr-xr-x. 2 root root 21 Feb 9 09:36 2017-02-09_18
drwxr-xr-x. 2 root root 21 Feb 9 09:37 2017-02-09_19
drwxr-xr-x. 2 root root 21 Feb 9 09:19 2017-02-09_2
drwxr-xr-x. 2 root root 21 Feb 9 09:20 2017-02-09_3
drwxr-xr-x. 2 root root 21 Feb 9 09:21 2017-02-09_4
drwxr-xr-x. 2 root root 21 Feb 9 09:21 2017-02-09_5
drwxr-xr-x. 2 root root 21 Feb 9 09:23 2017-02-09_6
drwxr-xr-x. 2 root root 21 Feb 9 09:23 2017-02-09_7
drwxr-xr-x. 2 root root 21 Feb 9 09:25 2017-02-09_8
drwxr-xr-x. 2 root root 21 Feb 9 09:26 2017-02-09_9
[root@localhost basebackup_incoming]# cd 2017-02-09_19

Also, when I execute pghoard_restore list-basebackups --config pghoard.json I do not see the backup.

Appreciate your reply.

Thanks,
JoSo.

pg_receivexlog thread does not recover gracefully from a missing log segment

Log from pghoard

[root@a020aa8b0909 ~]# pghoard /tmp/pghoard.json
2016-01-18 19:16:10,026 pghoard MainThread DEBUG Loading JSON config from: '/tmp/pghoard.json', signal: None, frame: None
2016-01-18 19:16:10,027 pghoard MainThread INFO pghoard initialized, own_hostname: 'a020aa8b0909', cwd: '/root'
2016-01-18 19:16:10,540 PGReceiveXLog Thread-13 INFO Started: ['/usr/pgsql-9.4/bin/pg_receivexlog', '--dbname', "dbname='replication' host='REDACTED-HOSTNAME port='5432' replication='true' user='pghoard'", '--status-interval', '1', '--verbose', '--directory', '/tmp/REDACTED-HOSTNAME/xlog_incoming'], running as PID: 231

Log from strace:
write(2, "pg_receivexlog: unexpected termination of replication stream: ERROR: requested WAL segment 000000010000002D000000B7 has already been removed\n", 142) = 142
rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
write(3, "\27\3\3\0\35u\225#\217\256\337\330o[\307\350\247n\221\240@\351\320\245E\266\263\240P\324 \331\6C", 34) = 34
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
write(3, "\25\3\3\0\32u\225#\217\256\337\330p\7[\241\322hqhc\36\203/*\320xQT=G", 31) = 31
rt_sigpending([]) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
close(3) = 0
write(2, "pg_receivexlog: disconnected; waiting 5 seconds to try again\n", 61) = 61

Crash on .partial log files after recovery

How to reproduce
Running a primary and replica with pghoard configured as walreceiver, terminate the primary, then promote the replica. In some cases where the state of logs was unclear when the primary goes down this will create a .partial file in the pg_xlog directory.

I tested the terminate-and-promote-to-master a few times on test databases and didn't see this happen, however we encountered it in production on a large, high traffic database which may have had something to do with the state of the xlog at the time.

Expected Outcome
The newly-promoted primary continues to back up xlog files.

Actual Outcome
Backups begin failing because pghoard doesn't know what to do with the .partial file.

Here is log output from when this happened:

127.0.0.1 - - [23/Mar/2017 22:40:18] "PUT /default/archive/00000001000000AC00000084.partial HTTP/1.1" 400 -
2017-03-23 22:40:19,616	WebServer	Thread-23	ERROR	HttpResponse 400: Unrecognized file '00000001000000AC00000084.partial' for archiving
/usr/local/bin/pghoard_postgres_command: ERROR: Archival failed with HTTP status 400
LOG:  archive command failed with exit code 3
DETAIL:  The failed archive command was: pghoard_postgres_command --mode archive --site default --xlog 00000001000000AC00000084.partial

This results in a pileup of xlog files and none being archived until the issue is resolved.

I was able to manually intervene and remove the pg_xlog/archive_status/00000001000000AC00000084.partial.ready file, causing postgres to stop trying to archive that file, but ideally pghoard would handle this better.

Thanks!

Setting pg_data_directory should not be mandatory

If I understand correctly, pg_data_directory is only used when the local-tar basebackup_mode is used.
Currently, setting pg_data_directory is mandatory and a PG_VERSION file must exist in it.
I think this should not be a requirement, as it makes no sense when using the basic or pipe basebackup_mode.

Google cloud storage does not work

Distro: Ubuntu 16

chris@pghoard-test-ubuntu# sudo pghoard --config pghoard.json
2016-06-14 22:17:10,162 pghoard MainThread DEBUG Loading JSON config from: 'pghoard.json', signal: None
2016-06-14 22:17:10,326 pghoard MainThread INFO pghoard initialized, own_hostname: 'pghoard-test-ubuntu', cwd: '/home/chris'
2016-06-14 22:17:10,339 pghoard MainThread WARNING OperationalError (fe_sendauth: no password supplied
) connecting to DB at: "dbname='replication' host='x.x.x.x' port='5432' replication='true' user='pghoard'"
2016-06-14 22:17:10,339 pghoard.common MainThread WARNING Creating alert file: '/var/lib/pghoard/configuration_error'
2016-06-14 22:17:10,343 root MainThread WARNING No module named 'oauth2client.locked_file'
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/googleapiclient/discovery_cache/__init__.py", line 33, in autodetect
from google.appengine.api import memcache
ImportError: No module named 'google.appengine'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/googleapiclient/discovery_cache/__init__.py", line 38, in autodetect
from . import file_cache
File "/usr/lib/python3/dist-packages/googleapiclient/discovery_cache/file_cache.py", line 32, in <module>
from oauth2client.locked_file import LockedFile
ImportError: No module named 'oauth2client.locked_file'
2016-06-14 22:17:10,545 pghoard MainThread ERROR Unexpected exception in PGHoard main loop
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/pghoard/rohmu/object_storage/google.py", line 227, in get_or_create_bucket
gs_buckets.get(bucket=bucket_name).execute()
File "/usr/lib/python3/dist-packages/oauth2client/util.py", line 137, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/lib/python3/dist-packages/googleapiclient/http.py", line 729, in execute
raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/storage/v1/b/xxxxx-pghoard-test?alt=json returned "Forbidden">

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/pghoard/pghoard.py", line 405, in run
self.handle_site(site, site_config)
File "/usr/lib/python3/dist-packages/pghoard/pghoard.py", line 359, in handle_site
self.time_of_last_backup[site] = self.check_backup_count_and_state(site)
File "/usr/lib/python3/dist-packages/pghoard/pghoard.py", line 257, in check_backup_count_and_state
basebackups = self.get_remote_basebackups_info(site)
File "/usr/lib/python3/dist-packages/pghoard/pghoard.py", line 242, in get_remote_basebackups_info
storage = get_transfer(storage_config)
File "/usr/lib/python3/dist-packages/pghoard/rohmu/__init__.py", line 40, in get_transfer
return storage_class(**storage_config)
File "/usr/lib/python3/dist-packages/pghoard/rohmu/object_storage/google.py", line 83, in __init__
self.bucket_name = self.get_or_create_bucket(bucket_name)
File "/usr/lib/python3/dist-packages/pghoard/rohmu/object_storage/google.py", line 233, in get_or_create_bucket
raise InvalidConfigurationError("Bucket {0!r} exists but isn't accessible".format(bucket_name))
pghoard.rohmu.errors.InvalidConfigurationError: Bucket 'xxxxx-pghoard-test' exists but isn't accessible
^C2016-06-14 22:17:11,303 pghoard MainThread WARNING Quitting, signal: 2
^C2016-06-14 22:17:11,833 pghoard MainThread WARNING Quitting, signal: 2
^C2016-06-14 22:17:12,330 pghoard MainThread WARNING Quitting, signal: 2
^C2016-06-14 22:17:12,608 pghoard MainThread WARNING Quitting, signal: 2
^C2016-06-14 22:17:13,257 pghoard MainThread WARNING Quitting, signal: 2
^C2016-06-14 22:17:13,438 pghoard MainThread WARNING Quitting, signal: 2

Restoring xlogs from S3

I have a periodic job that restores the db from S3 and applies xlogs to the restored database, but sometimes pghoard is unable to restore the xlogs from S3 and gets stuck.

127.0.0.1 - - [14/Jun/2016 14:59:46] "GET /foo/archive/000000010000004F0000005C HTTP/1.1" 201 -
127.0.0.1 - - [14/Jun/2016 14:59:47] "GET /foo/archive/000000010000004F0000005D HTTP/1.1" 201 -
127.0.0.1 - - [14/Jun/2016 14:59:47] "GET /foo/archive/000000010000004F0000005E HTTP/1.1" 201 -
127.0.0.1 - - [14/Jun/2016 14:59:47] "GET /foo/archive/000000010000004F0000005F HTTP/1.1" 201 -
127.0.0.1 - - [14/Jun/2016 14:59:47] "GET /foo/archive/000000010000004F00000060 HTTP/1.1" 201 -
2016-06-14 14:59:47,844 TransferAgent   Thread-12       INFO    'DOWNLOAD' transfer of key: 'foo/xlog/000000010000004F00000065', size: 2026516, took 0.131s
2016-06-14 14:59:48,019 TransferAgent   Thread-9        INFO    'DOWNLOAD' transfer of key: 'foo/xlog/000000010000004F00000062', size: 2391068, took 0.308s
2016-06-14 14:59:48,026 TransferAgent   Thread-10       INFO    'DOWNLOAD' transfer of key: 'foo/xlog/000000010000004F00000063', size: 2682295, took 0.314s
2016-06-14 15:16:54,313 pghoard MainThread      DEBUG   Loading JSON config from: '/data/pghoard/conf/pghoard.json', signal: None
2016-06-14 15:16:54,367 pghoard MainThread      INFO    pghoard initialized, own_hostname: 'ip-10-0-4-74', cwd: '/'
2016-06-14 15:16:59,346 TransferAgent   Thread-11       INFO    'DOWNLOAD' transfer of key: 'foo/xlog/000000010000004F00000065', size: 2026516, took 0.261s
2016-06-14 15:16:59,546 TransferAgent   Thread-10       INFO    'DOWNLOAD' transfer of key: 'foo/xlog/000000010000004F00000064', size: 2261309, took 0.461s
2016-06-14 15:16:59,652 TransferAgent   Thread-9        INFO    'DOWNLOAD' transfer of key: 'foo/xlog/000000010000004F00000063', size: 2682295, took 0.567s
2016-06-14 15:16:59,679 TransferAgent   Thread-8        INFO    'DOWNLOAD' transfer of key: 'foo/xlog/000000010000004F00000062', size: 2391068, took 0.595s
2016-06-14 15:17:00,135 TransferAgent   Thread-12       INFO    'DOWNLOAD' transfer of key: 'foo/xlog/000000010000004F00000061', size: 2370847, took 1.050s

I have to restart pghoard manually to continue.
I am using git revision ad4bb89

Failing to check age of old backup in local storage

We are seeing an ERROR in PGHoard when it tries to check the age of an old backup existing in local storage:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/pghoard/pghoard.py", line 448, in run
    self.handle_site(site, site_config)
  File "/usr/local/lib/python3.4/dist-packages/pghoard/pghoard.py", line 432, in handle_site
    delta_since_last_backup = datetime.datetime.now(datetime.timezone.utc) - self.time_of_last_backup[site]
TypeError: can't subtract offset-naive and offset-aware datetimes

The backup metadata contains:

cat pghoard/backup/backupsite1/basebackup/2017-01-17_0.metadata
{"original-file-size": "27914240", "compression-level": "0", "pg-version": "90409", "start-wal-segment": "000000010000000000000004", "compression-algorithm": "snappy", "start-time": "2017-01-17T15:05:48"}

Looking at the code, it seems to be comparing a timestamp with a timezone to one without, and thus failing. The metadata does not contain timezone information, so I have trouble understanding how this would ever work.

Could it be a problem with our setup of PGHoard?

We are using Python version 3.4.2
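The mismatch can be reproduced in a few lines. This is a sketch only, assuming the metadata timestamp is meant to be interpreted as UTC (that interpretation is an assumption, not something the metadata states):

```python
# Minimal reproduction of the TypeError above. The "start-time" value is
# taken from the metadata shown in the issue; it carries no timezone, so
# parsing it yields a naive datetime.
import datetime

naive = datetime.datetime.strptime("2017-01-17T15:05:48", "%Y-%m-%dT%H:%M:%S")
now = datetime.datetime.now(datetime.timezone.utc)

try:
    now - naive            # naive vs. aware: raises TypeError
except TypeError as error:
    print(error)

# Attaching an explicit timezone (here assuming the metadata timestamp
# is UTC) makes the subtraction valid:
aware = naive.replace(tzinfo=datetime.timezone.utc)
age = now - aware          # a datetime.timedelta
```

The fix on pghoard's side would be to normalize both operands to aware datetimes before subtracting.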

Tests fail on make deb

After finally getting the requirements installed (Debian doesn't have the required package versions), the build fails because it is unable to import the unit tests.

chris@pghoard-test-ubuntu:~/pghoard$ make deb
cp debian/changelog.in debian/changelog
dch -v 1.3.0-37-g2a16c96 --distribution unstable "Automatically built .deb"
dch warning: Previous package version was Debian native whilst new version is not
dpkg-buildpackage -A -uc -us
dpkg-buildpackage: source package pghoard
dpkg-buildpackage: source version 1.3.0-37-g2a16c96
dpkg-buildpackage: source distribution unstable
dpkg-buildpackage: source changed by Oskari Saarenmaa [email protected]
dpkg-source --before-build pghoard
fakeroot debian/rules clean
make[1]: Entering directory '/home/chris/pghoard'
dh clean --with python3 --buildsystem=pybuild
dh_testdir -O--buildsystem=pybuild
dh_auto_clean -O--buildsystem=pybuild
I: pybuild base:184: python3.5 setup.py clean
running clean
removing '/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build' (and everything under it)
'build/bdist.linux-x86_64' does not exist -- can't clean it
'build/scripts-3.5' does not exist -- can't clean it
dh_clean -O--buildsystem=pybuild
make[1]: Leaving directory '/home/chris/pghoard'
debian/rules build-indep
make[1]: Entering directory '/home/chris/pghoard'
dh build-indep --with python3 --buildsystem=pybuild
dh_testdir -i -O--buildsystem=pybuild
dh_update_autotools_config -i -O--buildsystem=pybuild
dh_auto_configure -i -O--buildsystem=pybuild
I: pybuild base:184: python3.5 setup.py config
running config
dh_auto_build -i -O--buildsystem=pybuild
I: pybuild base:184: /usr/bin/python3 setup.py build
running build
running build_py
creating /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/wal.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/compressor.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/basebackup.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/archive_sync.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/create_keys.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/statsd.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/receivexlog.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/webserver.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/logutil.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/transfer.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/postgres_command.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/patchedtarfile.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/common.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/config.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/restore.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/pgutil.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/pghoard.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/__init__.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/main.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
copying pghoard/version.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard
creating /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard/rohmu
copying pghoard/rohmu/errors.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard/rohmu
copying pghoard/rohmu/compressor.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard/rohmu
copying pghoard/rohmu/filewrap.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard/rohmu
copying pghoard/rohmu/encryptor.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard/rohmu
copying pghoard/rohmu/compat.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard/rohmu
copying pghoard/rohmu/snappyfile.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard/rohmu
copying pghoard/rohmu/rohmufile.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard/rohmu
copying pghoard/rohmu/__init__.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard/rohmu
copying pghoard/rohmu/inotify.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard/rohmu
creating /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard/rohmu/object_storage
copying pghoard/rohmu/object_storage/base.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard/rohmu/object_storage
copying pghoard/rohmu/object_storage/local.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard/rohmu/object_storage
copying pghoard/rohmu/object_storage/s3.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard/rohmu/object_storage
copying pghoard/rohmu/object_storage/swift.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard/rohmu/object_storage
copying pghoard/rohmu/object_storage/google.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard/rohmu/object_storage
copying pghoard/rohmu/object_storage/azure.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard/rohmu/object_storage
copying pghoard/rohmu/object_storage/__init__.py -> /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/pghoard/rohmu/object_storage
dh_auto_test -i -O--buildsystem=pybuild
I: pybuild base:184: cd /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build; python3.5 -m unittest discover -v
test.test_archivesync (unittest.loader._FailedTest) ... ERROR
test.test_basebackup (unittest.loader._FailedTest) ... ERROR
test.test_common (unittest.loader._FailedTest) ... ERROR
test.test_compressor (unittest.loader._FailedTest) ... ERROR
test.test_create_keys (unittest.loader._FailedTest) ... ERROR
test.test_encryptor (unittest.loader._FailedTest) ... ERROR
test.test_inotify (unittest.loader._FailedTest) ... ERROR
test.test_pghoard (unittest.loader._FailedTest) ... ERROR
test.test_pgutil (unittest.loader._FailedTest) ... ERROR
test.test_restore (unittest.loader._FailedTest) ... ERROR
test.test_storage (unittest.loader._FailedTest) ... ERROR
test.test_transferagent (unittest.loader._FailedTest) ... ERROR
test.test_wal (unittest.loader._FailedTest) ... ERROR
test.test_webserver (unittest.loader._FailedTest) ... ERROR

ERROR: test.test_archivesync (unittest.loader._FailedTest)

ImportError: Failed to import test module: test.test_archivesync
Traceback (most recent call last):
File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
module = self._get_module_from_name(name)
File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
__import__(name)
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/test_archivesync.py", line 4, in <module>
import pytest
ImportError: No module named 'pytest'

ERROR: test.test_basebackup (unittest.loader._FailedTest)

ImportError: Failed to import test module: test.test_basebackup
Traceback (most recent call last):
File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
module = self._get_module_from_name(name)
File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
__import__(name)
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/test_basebackup.py", line 7, in <module>
from .conftest import TestPG
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/conftest.py", line 10, in <module>
from py import path as py_path # pylint: disable=no-name-in-module
ImportError: No module named 'py'

ERROR: test.test_common (unittest.loader._FailedTest)

ImportError: Failed to import test module: test.test_common
Traceback (most recent call last):
File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
module = self._get_module_from_name(name)
File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
__import__(name)
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/test_common.py", line 7, in <module>
from .base import PGHoardTestCase
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/base.py", line 8, in <module>
from .conftest import TestPG
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/conftest.py", line 10, in <module>
from py import path as py_path # pylint: disable=no-name-in-module
ImportError: No module named 'py'

ERROR: test.test_compressor (unittest.loader._FailedTest)

ImportError: Failed to import test module: test.test_compressor
Traceback (most recent call last):
File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
module = self._get_module_from_name(name)
File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
__import__(name)
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/test_compressor.py", line 8, in <module>
from .base import PGHoardTestCase, CONSTANT_TEST_RSA_PUBLIC_KEY, CONSTANT_TEST_RSA_PRIVATE_KEY
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/base.py", line 8, in <module>
from .conftest import TestPG
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/conftest.py", line 10, in <module>
from py import path as py_path # pylint: disable=no-name-in-module
ImportError: No module named 'py'

ERROR: test.test_create_keys (unittest.loader._FailedTest)

ImportError: Failed to import test module: test.test_create_keys
Traceback (most recent call last):
File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
module = self._get_module_from_name(name)
File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
__import__(name)
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/test_create_keys.py", line 11, in <module>
import pytest
ImportError: No module named 'pytest'

ERROR: test.test_encryptor (unittest.loader._FailedTest)

ImportError: Failed to import test module: test.test_encryptor
Traceback (most recent call last):
File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
module = self._get_module_from_name(name)
File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
__import__(name)
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/test_encryptor.py", line 7, in <module>
from .base import CONSTANT_TEST_RSA_PUBLIC_KEY, CONSTANT_TEST_RSA_PRIVATE_KEY
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/base.py", line 8, in <module>
from .conftest import TestPG
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/conftest.py", line 10, in <module>
from py import path as py_path # pylint: disable=no-name-in-module
ImportError: No module named 'py'

ERROR: test.test_inotify (unittest.loader._FailedTest)

ImportError: Failed to import test module: test.test_inotify
Traceback (most recent call last):
File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
module = self._get_module_from_name(name)
File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
__import__(name)
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/test_inotify.py", line 8, in <module>
from .base import PGHoardTestCase
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/base.py", line 8, in <module>
from .conftest import TestPG
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/conftest.py", line 10, in <module>
from py import path as py_path # pylint: disable=no-name-in-module
ImportError: No module named 'py'

ERROR: test.test_pghoard (unittest.loader._FailedTest)

ImportError: Failed to import test module: test.test_pghoard
Traceback (most recent call last):
File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
module = self._get_module_from_name(name)
File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
__import__(name)
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/test_pghoard.py", line 8, in <module>
from .base import PGHoardTestCase
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/base.py", line 8, in <module>
from .conftest import TestPG
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/conftest.py", line 10, in <module>
from py import path as py_path # pylint: disable=no-name-in-module
ImportError: No module named 'py'

ERROR: test.test_pgutil (unittest.loader._FailedTest)

ImportError: Failed to import test module: test.test_pgutil
Traceback (most recent call last):
File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
module = self._get_module_from_name(name)
File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
__import__(name)
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/test_pgutil.py", line 12, in <module>
from pytest import raises
ImportError: No module named 'pytest'

ERROR: test.test_restore (unittest.loader._FailedTest)

ImportError: Failed to import test module: test.test_restore
Traceback (most recent call last):
File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
module = self._get_module_from_name(name)
File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
import(name)
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/test_restore.py", line 7, in
from .base import PGHoardTestCase
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/base.py", line 8, in
from .conftest import TestPG
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/conftest.py", line 10, in
from py import path as py_path # pylint: disable=no-name-in-module
ImportError: No module named 'py'

ERROR: test.test_storage (unittest.loader._FailedTest)

ImportError: Failed to import test module: test.test_storage
Traceback (most recent call last):
File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
module = self._get_module_from_name(name)
File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
import(name)
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/test_storage.py", line 13, in
import pytest
ImportError: No module named 'pytest'

ERROR: test.test_transferagent (unittest.loader._FailedTest)

ImportError: Failed to import test module: test.test_transferagent
Traceback (most recent call last):
File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
module = self._get_module_from_name(name)
File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
import(name)
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/test_transferagent.py", line 8, in
from .base import PGHoardTestCase
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/base.py", line 8, in
from .conftest import TestPG
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/conftest.py", line 10, in
from py import path as py_path # pylint: disable=no-name-in-module
ImportError: No module named 'py'

ERROR: test.test_wal (unittest.loader._FailedTest)

ImportError: Failed to import test module: test.test_wal
Traceback (most recent call last):
File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
module = self._get_module_from_name(name)
File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
import(name)
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/test_wal.py", line 10, in
import pytest
ImportError: No module named 'pytest'

ERROR: test.test_webserver (unittest.loader._FailedTest)

ImportError: Failed to import test module: test.test_webserver
Traceback (most recent call last):
File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
module = self._get_module_from_name(name)
File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
import(name)
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/test_webserver.py", line 8, in
from .base import CONSTANT_TEST_RSA_PUBLIC_KEY, CONSTANT_TEST_RSA_PRIVATE_KEY
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/base.py", line 8, in
from .conftest import TestPG
File "/home/chris/pghoard/.pybuild/pythonX.Y_3.5/build/test/conftest.py", line 10, in
from py import path as py_path # pylint: disable=no-name-in-module
ImportError: No module named 'py'


Ran 14 tests in 0.004s

FAILED (errors=14)
E: pybuild pybuild:274: test: plugin distutils failed with: exit code=1: cd /home/chris/pghoard/.pybuild/pythonX.Y_3.5/build; python3.5 -m unittest discover -v
dh_auto_test: pybuild --test -i python{version} -p 3.5 --dir . returned exit code 13
debian/rules:6: recipe for target 'build-indep' failed
make[1]: *** [build-indep] Error 25
make[1]: Leaving directory '/home/chris/pghoard'
dpkg-buildpackage: error: debian/rules build-indep gave error exit status 2
Makefile:19: recipe for target 'deb' failed
make: *** [deb] Error 2

Getting `pghoard.rohmu.errors.InvalidConfigurationError`

Following is my pghoard.json file:


{
    "backup_location": "./metadata",
    "backup_sites": {
        "main": {
            "nodes": [
                {
                    "host": "X.X.X.X",
                    "password": "blahblah",
                    "port": 5432,
                    "user": "backup"
                }
            ],
            "object_storage": {
                "storage_type": "google",
                "bucket_name": "abcdef",
                "project_id": "project-id",
                "credential_file": "/home/ubuntu/credentials.json"
            }
        }
    }
}

Following the traceback for the command:

ubuntu@localhost:~$ pghoard --config=pghoard.json
2017-01-11 03:24:36,531	pghoard	MainThread	DEBUG	Loading JSON config from: 'pghoard.json', signal: None
2017-01-11 03:24:36,585	pghoard	MainThread	ERROR	Invalid config file 'pghoard.json': InvalidConfigurationError: Site 'main' command 'pg_basebackup' not found
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pghoard/pghoard.py", line 487, in load_config
    new_config = config.read_json_config_file(self.config_path)
  File "/usr/local/lib/python3.5/dist-packages/pghoard/config.py", line 152, in read_json_config_file
    return set_config_defaults(config, check_commands=check_commands)
  File "/usr/local/lib/python3.5/dist-packages/pghoard/config.py", line 126, in set_config_defaults
    raise InvalidConfigurationError("Site {!r} command {!r} not found".format(site_name, command))
pghoard.rohmu.errors.InvalidConfigurationError: Site 'main' command 'pg_basebackup' not found
pghoard: failed to load config pghoard.json: Site 'main' command 'pg_basebackup' not found

Not sure what exactly I am doing wrong here!
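
The error means pghoard could not find the pg_basebackup binary on the PATH. One likely fix is to point the site's pg_bin_directory setting (the same key used in other configs in this thread) at the directory containing the PostgreSQL binaries; the path below is an example for a Debian/Ubuntu layout, adjust it to your installation:

```json
"backup_sites": {
    "main": {
        "pg_bin_directory": "/usr/lib/postgresql/9.6/bin"
    }
}
```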

Recovery progress indicator

Restoring a large basebackup currently shows no progress at all.

It'd be nice to see the progress of download, tar extraction, etc.

No pg_hba.conf

my test server is set to trust all users for all databases on localhost:
host all all 127.0.0.1/32 trust

But I still get an error along the lines of: no pg_hba.conf entry for user replicator, database replication, on host 127.0.0.1 (not the exact error message since I'm not at my PC right now, but that's the gist of it)

The problem goes away if I specifically also trust the replication db.
host replicator replication 127.0.0.1/32 trust

The setup check should also accept 'all' for either or both of the user and database fields

Too many connections

I'm trying to set up pghoard for the first time but no backups are generated. Looking at the logs, this message is logged every 5 seconds:

connecting to DB at: "application_name='pghoard' dbname='replication' host='psql' port='5432' replication='true' user='pghoard'"
2016-12-31 16:32:29,064	pghoard.common	MainThread	WARNING	Creating alert file: '/data/configuration_error'
2016-12-31 16:32:34,065	pghoard	MainThread	INFO	Creating a new basebackup for 'default' because there are currently none
2016-12-31 16:32:34,067	pghoard	MainThread	WARNING	OperationalError (FATAL:  too many connections for role "pghoard"
)

Is this why the first backup is not generated, or is there a bug in the number of connections created?

Using swift as backend gives same result.

Restore without pghoard

Hi,

I created a backup of my db and saved it in Google Cloud Storage, but I'm unable to restore the backup using pg_restore. I'm not sure if I'm missing a config option or if restore only works through pghoard_restore.

pg_restore -C -d test_db 2017-09-27_0   
pg_restore: [archiver] input file does not appear to be a valid archive

Update:
I see that backups are generated using pg_basebackup, so I tried to unpack the backup with tar, but I also got an error:

tar -xvC ~/Downloads/pgdata -f ~/Downloads/2017-09-27_0 
tar: Unrecognized archive format
tar: Error exit delayed from previous errors.

is there any option to restore the file without pghoard_restore?
Thanks
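
For the record: the stored basebackup object is a compressed (and optionally encrypted) tar, which is why plain pg_restore and tar both reject it — pghoard_restore remains the supported path. A rough sketch of manual extraction, assuming lzma compression and no encryption (pghoard records the actual algorithm in the object's metadata, so check that first):

```python
import lzma
import tarfile


def extract_pghoard_basebackup(backup_path, target_dir):
    """Decompress a downloaded basebackup object and unpack the tar inside.

    Assumes lzma ("xz") compression and no encryption; an encrypted backup
    must be decrypted with the matching private key before this works.
    """
    with lzma.open(backup_path, "rb") as compressed:
        # "r|" reads the tar as a non-seekable stream from the decompressor
        with tarfile.open(fileobj=compressed, mode="r|") as tar:
            tar.extractall(path=target_dir)
```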

FATAL: RestoreError: TypeError: copyfileobj() takes from 2 to 4 positional arguments but 5 were given

Hi,

I'm running pghoard in a Docker container on Kubernetes. I'm trying to test the restore process but keep getting this error after it finishes downloading the basebackup:

FATAL: RestoreError: TypeError: copyfileobj() takes from 2 to 4 positional arguments but 5 were given

I'm running pghoard 1.4.0 on python 3.6.1, restoring from a google cloud bucket to postgres 9.6.3

I can't figure out if I am doing something wrong or if this is a bug, any help would be appreciated.

Can't build rpm file

I am compiling it under CentOS 7. flake8, pytest, pylint and devel were already installed.

$ make rpm
error: Failed build dependencies:
python3-flake8 is needed by pghoard-1.4.0-35.g0a452f1.el7.centos.noarch
python3-pytest is needed by pghoard-1.4.0-35.g0a452f1.el7.centos.noarch
python3-pylint is needed by pghoard-1.4.0-35.g0a452f1.el7.centos.noarch
python3-devel is needed by pghoard-1.4.0-35.g0a452f1.el7.centos.noarch

I was able to compile it from source, but I can't run this command using the root account:

systemctl enable pghoard

error: Failed to execute operation: Access denied

Problem with postgres restart: FileNotFoundError

There is a FileNotFoundError in the pghoard log. When I restart postgres with pg_ctlcluster 9.5 main restart, the immediate event has no problem, but the next one (I think it happens when archive_timeout ends) fails with FileNotFoundError.

DEBUG event: /var/lib/pghoard/metadata/pc_ef_zvolsky2/xlog_incoming/0000000100000000000000C7.partial IN_CLOSE_WRITE, None
DEBUG event: /var/lib/pghoard/metadata/pc_ef_zvolsky2/xlog_incoming/0000000100000000000000C7.partial IN_MOVED_FROM, None
DEBUG event: /var/lib/pghoard/metadata/pc_ef_zvolsky2/xlog_incoming/0000000100000000000000C7 IN_MOVED_TO, os.stat_result(st_mode=33152, st_ino=1048790, st_dev=2049, st_nlink=1, st_uid=121, st_gid=126, st_size=16777216, st_atime=1498809135, st_mtime=1498809168, st_ctime=1498809168)
DEBUG compressed_file_path for '/var/lib/pghoard/metadata/pc_ef_zvolsky2/xlog_incoming/0000000100000000000000C7' is '/var/lib/pghoard/metadata/pc_ef_zvolsky2/xlog/0000000100000000000000C7'
DEBUG b'pg_receivexlog: finished segment at 0/C8000000 (timeline 1)\n'
DEBUG event: /var/lib/pghoard/metadata/pc_ef_zvolsky2/xlog_incoming/0000000100000000000000C7.partial IN_CLOSE_WRITE, None
DEBUG event: /var/lib/pghoard/metadata/pc_ef_zvolsky2/xlog_incoming/0000000100000000000000C7 IN_DELETE, None
INFO Compressed 16777216 byte open file '/var/lib/pghoard/metadata/pc_ef_zvolsky2/xlog_incoming/0000000100000000000000C7' to 797294 bytes (5%), took: 0.078s
ERROR Problem handling: {'type': 'MOVE', 'src_path': '/var/lib/pghoard/metadata/pc_ef_zvolsky2/xlog_incoming/0000000100000000000000C7.partial', 'full_path': '/var/lib/pghoard/metadata/pc_ef_zvolsky2/xlog_incoming/0000000100000000000000C7'}: FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/pghoard/metadata/pc_ef_zvolsky2/xlog_incoming/0000000100000000000000C7'
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/pghoard/compressor.py", line 75, in run
self.handle_event(event, filetype)
File "/usr/lib/python3/dist-packages/pghoard/compressor.py", line 155, in handle_event
os.unlink(event["full_path"])
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/pghoard/metadata/pc_ef_zvolsky2/xlog_incoming/0000000100000000000000C7'

{
    "log_level": "DEBUG",
    "backup_location": "/var/lib/pghoard/metadata",
    "backup_sites": {
        "pc_ef_zvolsky2": {
            "nodes": [
                {
                    "host": "127.0.0.1",
                    "password": "xxxxxxxxxxxxxx",
                    "port": 5432,
                    "user": "pghoard"
                }
            ],
            "object_storage": {
                "storage_type": "local",
                "directory": "/var/lib/pghoard/backup"
            },
            "basebackup_mode": "pipe",
            "pg_xlog_directory": "/var/lib/postgresql/9.5/main/pg_xlog",
            "pg_bin_directory": "/usr/bin"
        }
    }
}

python3-snappy is not a debian package

It seems that when the Debian package was moved from Python 2 to Python 3 it was not tested, as python3-snappy is not a package provided by Debian distributions.

Improved basebackup retention policy

Currently PGHoard can only be configured to keep N basebackups. This should be improved to support various time-based scenarios, namely:

  • keep basebackups to allow restoring to any point in X previous days
  • long-term storing of older basebackups and enough WAL to keep them consistent, but not all the WAL for any PITR, supporting scenarios like allowing PITR for 14 days but also storing database snapshots for 8 last weeks and 6 last months
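
One way the requested policy could be expressed as selection logic; this is an illustrative sketch (PITR window plus newest-per-week and newest-per-month snapshots), not pghoard's implementation:

```python
import datetime


def backups_to_keep(backup_times, now, pitr_days=14, weekly=8, monthly=6):
    """Pick which basebackups to retain: everything inside the PITR window,
    plus the newest backup of each of the last `weekly` ISO weeks and
    `monthly` months. Returns the retained timestamps, sorted."""
    keep = set()
    ordered = sorted(backup_times, reverse=True)
    cutoff = now - datetime.timedelta(days=pitr_days)
    keep.update(t for t in ordered if t >= cutoff)
    # Keep the newest backup at or before the cutoff so restores covering
    # the full PITR window always have a starting basebackup.
    older = [t for t in ordered if t < cutoff]
    if older:
        keep.add(older[0])
    seen_weeks, seen_months = set(), set()
    for t in ordered:
        week = tuple(t.isocalendar())[:2]  # (ISO year, ISO week)
        if week not in seen_weeks and len(seen_weeks) < weekly:
            seen_weeks.add(week)
            keep.add(t)
        month = (t.year, t.month)
        if month not in seen_months and len(seen_months) < monthly:
            seen_months.add(month)
            keep.add(t)
    return sorted(keep)
```

WAL retention would follow from this: all WAL inside the PITR window, but only the WAL spanning each retained snapshot outside it.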

partial WAL files with recovery

I'm currently playing/testing with pghoard in Docker (thank you guys for open sourcing this work) and it seems that for now pghoard is not yet able to transparently manage .partial files (this is also a limitation of Barman). Of course users can manually copy the latest partial file (making sure that the .partial suffix is removed) to the destination for recovery. In my case (testing only on S3) I made a simple:

 cp /data/site/xlog_incoming/xy.partial /data/site/xlog_incoming/xy

and once the restoration is complete, I remove this file from the destination (my S3 bucket) to allow pg_receivexlog to continue streaming the transaction log, if the goal was just to grab a consistent copy/snapshot of the PG cluster at a specific timestamp/txid.

I wondered if it would eventually be a good idea to automate this process and add a third remote directory, called e.g. xlog-partial, to upload this .partial file every X seconds, or to manually force a synchronization via pghoard_archive_sync. This way the download process could also check whether a .partial file exists in this directory and recover this partial WAL through pghoard_postgres_command.

In case of a crash of the server where pghoard is running, we'll lose only the X seconds between two .partial file uploads.

What do you think? Perhaps there is already a way/plan to do this?
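
The polling-upload idea could be sketched roughly like this; `upload` stands in for a hypothetical push to an xlog-partial/ prefix in object storage, and all names here are illustrative:

```python
import os
import time


def watch_partial(path, upload, interval=30.0, stop=lambda: False):
    """Poll a .partial WAL file and call upload(path) whenever its mtime
    changes. When the segment completes, pg_receivexlog renames it away
    and the stat fails, so there is simply nothing left to upload."""
    last_mtime = None
    while not stop():
        try:
            mtime = os.stat(path).st_mtime
        except FileNotFoundError:
            mtime = None  # segment was completed and renamed
        if mtime is not None and mtime != last_mtime:
            upload(path)
            last_mtime = mtime
        time.sleep(interval)
```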

Feature request: move more to cloud: partial xlog, pgdump

Hi, I am very happy with forpsi virtual server+pghoard+google cloud.

Maybe I am missing something and it is already possible. I want to back up a small db with low traffic.
If I have the last partial xlog, I can rename it (remove the .partial extension) and restore to the crash time. After such a restore, however, I must re-run the basebackup (remove metadata, plus either the safer option of using a different Google bucket or the less safe one of removing the content of the used bucket).

Because the database is small, it is not a problem for me to do this by hand (of course it would be super if it could be automatic too). However, at least the backup part would be great for me.

  1. Can we have an option where the datetime of the .partial file is checked (maybe checkPartialInterval=N seconds; checkPartialInterval=0 skips this function) and, if it has changed, the file is moved into a partial/ folder in the cloud?

  2. I make pg_dumpall -g + pg_dump single_database backups too. Could pghoard check the listed folders every 15(?) minutes and, if something changes, move the changes to the cloud?

Maybe both could be joined into one function with a setting [(interval, srcfolder_w_path, cloudfolder=srcfolder)], where all cloud folders sit under some root dedicated to this backup goal?

pg_rewind support

It's pretty common to promote a replica server of a database cluster when performing maintenance, i.e. maintenance is first done on the replica, and once it is finished and the replica is fully synced up, the old master is shut down and the replica promoted. Typically we want to make the old master a replica after maintenance has been performed on it, but PG doesn't let a master switch roles to a replica too easily.

pg_rewind can be used for this, but running it is a bit cumbersome, and access to the WAL archive is required after it has been run. So it would be useful if PGHoard could take care of running pg_rewind and then setting up the old master as a replica, with a proper recovery.conf pointing to PGHoard's WAL archive and an optional primary_conninfo line connecting it to the new master.

Discussion - restore backup using remote daemon

I'm a little bit confused on how to best implement pghoard when it comes to managing multiple db servers. After playing around with local backups and restoring them to the same machine they were taken on, I then moved on to working on an environment that more closely matches our current setup.

Right now I have three AWS EC2 servers:

  • Master Database
  • Slave Database
  • Dedicated PGHoard Server

I thought it would make sense to have a server dedicated to running the pghoard daemon taking basebackups and archiving WAL files with pg_receivexlog, but I then tried to spin up a new database server and restore from a basebackup and discovered that the pghoard webserver doesn't transfer WAL files remotely.

I'm curious to know how other people are implementing pghoard? When I restore, should I just start a local pghoard daemon and add a maintenance file so no backups are taken while I'm restoring? Is there a way to have a single remote pghoard daemon?

Thanks in advance.

Incomplete base backups in incoming get treated as complete on restart

It seems that base backups that are interrupted (by a pghoard restart) get treated as if they were complete. I believe we should test the integrity of the tar and do something sensible if it fails the integrity check. (Perhaps move it out of the way and remove it after we get a good basebackup.) I'm sorry I don't have a better problem report for this; I'll try to improve it soon.

pghoard can't find python-systemd though it is installed

Running the latest pghoard from an RPM on CentOS 7 that I built myself. It was hard but I made it.

The question is whether python-systemd is required at build time or at runtime.

Now I get a timeout when trying to start:

[root@localhost vagrant]# systemctl start pghoard.service
Job for pghoard.service failed because a timeout was exceeded. See "systemctl status pghoard.service" and "journalctl -xe" for details.

/var/log/messages says:

Feb 8 19:54:28 localhost pghoard: WARNING: Running under systemd but python-systemd not available, systemd won't see our notifications

It wasn't installed when I built it but now I've installed it by (I'm not a Python pro):

yum install systemd-python
pip3 install python-systemd

Thanks for any help!

Add support for region when using swift

I can't find a way to set the region when using swift as object storage.
This is usually set using the OS_REGION_NAME environment variable.
It would be nice to support this.

last_upload_age is constantly growing

Hello,

We use StatsD collection for pghoard, and it looks like last_upload_age grows towards infinity over time; because of that it's impossible to set up backup monitoring that tracks the last upload time.

Is that our issue (backups are not uploaded fast enough) or expected behaviour?

Feature request: check mode for configuration file

Add a flag to command line utilities to check the syntax and semantics of the config file.

This would make it very useful with tools such as Ansible's validate feature (e.g. as used in the template module), or in general to dry-check the config before applying / reloading.
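
A minimal sketch of what such a check mode might do, covering only JSON syntax and a couple of structural checks (pghoard's real validation in config.py is much stricter, so this is illustrative only):

```python
import json


def check_config(path):
    """Validate a pghoard-style JSON config file; return a list of problems.

    Empty list means the file parsed and passed the basic structural checks,
    suitable for use as an exit-code-driven --check / Ansible validate step.
    """
    try:
        with open(path) as f:
            config = json.load(f)
    except (OSError, ValueError) as ex:
        return ["{}: {}".format(path, ex)]
    problems = []
    if "backup_sites" not in config:
        problems.append("missing required key: backup_sites")
    for site, site_config in config.get("backup_sites", {}).items():
        if "nodes" not in site_config and "object_storage" not in site_config:
            problems.append("site {!r}: no nodes or object_storage".format(site))
    return problems
```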

Can't get status from webserver

By reading the documentation and source code it looks like you can't get the status (as saved in the state JSON file) from the web server. Is this by design? If not, do you have a preferred URL path you would like to see this implemented under?

pghoard_restore – RestoreError [Errno 28] No space left on device

I've got a problem restoring backups from S3 and I wanted to get some input before I created any pull requests.

I'm trying to restore a basebackup to an AWS EC2 instance and we're running out of disk space. Specifically, we only spin up the EC2 instance with a small(ish) root volume (50-100GB) but we then mount a much larger volume (1TB) to /var/lib/postgresql/9.{x}/pg_mount where we then store our cluster directory at /var/lib/postgresql/9.{x}/pg_mount/main.

pghoard_restore copies the compressed basebackup from S3 (~128GB) to the config["backup_location"] first before uncompressing and moving it to my desired location on the mounted drive (~440GB uncompressed). This initial temporary file fills up the root volume before the download completes and stops the basebackup from being restored.

Here are my two suggestions:

  1. Download the compressed basebackup to the parent of --target-dir (i.e. with --target-dir /var/lib/postgresql/9.5/pg_mount/main/, download the basebackup to /var/lib/postgresql/9.5/pg_mount/basebackup.tmp.pghoard)
  2. Add a new configuration key tmp_backup_dir for all temporary basebackups before they're uncompressed and restored.

Option 1 seems the simplest, and it makes sense to me to stage the compressed basebackup right where it's going to be restored.

Option 2 requires that we add another configuration key, but will allow users to specify where to store all temporary basebackups before they're stored in S3 (or wherever else) and when restoring a basebackup.

Anyways, let me know what you guys think and I'll work on a solution. For now I'm all good since I'm just testing and I can spin up a new EC2 instance with more disk space, but that won't be the case when we encounter the same issue in production.
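
The staging-directory choice from the two suggestions could be sketched like this; names such as `fallback_dir` are illustrative, not existing pghoard config keys:

```python
import os
import shutil


def choose_download_dir(target_dir, required_bytes, fallback_dir=None):
    """Pick where to stage the compressed basebackup before extraction.

    Prefer the parent of --target-dir (option 1: the big mounted volume),
    fall back to a configured tmp dir (option 2), and refuse outright if
    neither has enough free space rather than failing mid-download."""
    candidates = (os.path.dirname(os.path.abspath(target_dir)), fallback_dir)
    for candidate in candidates:
        if candidate and shutil.disk_usage(candidate).free >= required_bytes:
            return candidate
    raise RuntimeError(
        "no staging directory with {} bytes free".format(required_bytes))
```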

Error making deb

I'm trying to build from code in a Docker image based on postgres:9.5 (debian:jessie) and some tests don't work. It looks like a permission issue, but I'm not sure which folder it is trying to use; I created the folder /usr/bin/initdb with complete access.

This is the complete
error.log

Thanks

Alert commands

Currently pghoard's alerting just writes an alert file on disk. Enhance this with a configurable alert command to allow running external commands which create alerts in external monitoring systems or send notifications to Slack, etc.
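
A sketch of what the enhancement might look like; the `alert_command` parameter is hypothetical, not an existing pghoard config key:

```python
import subprocess


def create_alert(alert_file, message, alert_command=None):
    """Write the alert file as pghoard does today, then optionally run a
    configurable external command with the alert file path and message
    appended as arguments (the enhancement proposed in this issue)."""
    with open(alert_file, "w") as f:
        f.write(message + "\n")
    if alert_command:
        # check=False and a timeout so a broken or slow notifier
        # cannot block or crash the backup daemon itself.
        subprocess.run(alert_command + [alert_file, message],
                       timeout=30, check=False)
```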

Unable to restore from backup

I've tried multiple times to accomplish a basic scenario.

  1. Start pghoard, which initially takes the first base backup
  2. Restore that backup to new instance

I'm using S3 storage to save the backups.

Every time I do a restore I get errors about missing files in the backup storage:

2017-07-04 09:17:07,531	TransferAgent	Thread-9	WARNING	'default/timeline/0000001C.history' not found from storage
2017-07-04 09:17:07,531	TransferAgent	Thread-9	INFO	'DOWNLOAD' FAILED transfer of key: 'default/timeline/0000001C.history', size: 0, took 0.028s
2017-07-04 09:17:07,531	TransferAgent	Thread-8	WARNING	'default/timeline/0000001D.history' not found from storage
2017-07-04 09:17:07,531	TransferAgent	Thread-8	INFO	'DOWNLOAD' FAILED transfer of key: 'default/timeline/0000001D.history', size: 0, took 0.028s
2017-07-04 09:17:07,535	TransferAgent	Thread-12	WARNING	'default/timeline/0000001E.history' not found from storage
2017-07-04 09:17:07,535	TransferAgent	Thread-12	INFO	'DOWNLOAD' FAILED transfer of key: 'default/timeline/0000001E.history', size: 0, took 0.030s
2017-07-04 09:17:07,535	TransferAgent	Thread-10	WARNING	'default/timeline/0000001A.history' not found from storage
2017-07-04 09:17:07,535	TransferAgent	Thread-10	INFO	'DOWNLOAD' FAILED transfer of key: 'default/timeline/0000001A.history', size: 0, took 0.030s
2017-07-04 09:17:07,599	TransferAgent	Thread-11	WARNING	'default/timeline/0000001B.history' not found from storage
2017-07-04 09:17:07,600	TransferAgent	Thread-11	INFO	'DOWNLOAD' FAILED transfer of key: 'default/timeline/0000001B.history', size: 0, took 0.098s
127.0.0.1 - - [04/Jul/2017 09:17:07] "GET /default/archive/0000001A.history HTTP/1.1" 404 -
2017-07-04 09:17:07,677	TransferAgent	Thread-8	WARNING	'default/timeline/00000019.history' not found from storage
2017-07-04 09:17:07,677	TransferAgent	Thread-8	INFO	'DOWNLOAD' FAILED transfer of key: 'default/timeline/00000019.history', size: 0, took 0.008s
2017-07-04 09:17:07,678	TransferAgent	Thread-9	WARNING	'default/timeline/0000001A.history' not found from storage
2017-07-04 09:17:07,678	TransferAgent	Thread-9	INFO	'DOWNLOAD' FAILED transfer of key: 'default/timeline/0000001A.history', size: 0, took 0.009s
127.0.0.1 - - [04/Jul/2017 09:17:07] "GET /default/archive/00000019.history HTTP/1.1" 404 -
2017-07-04 09:17:07,758	TransferAgent	Thread-12	WARNING	'default/xlog/0000001900000000000000F0' not found from storage
2017-07-04 09:17:07,758	TransferAgent	Thread-12	INFO	'DOWNLOAD' FAILED transfer of key: 'default/xlog/0000001900000000000000F0', size: 0, took 0.012s
2017-07-04 09:17:07,760	TransferAgent	Thread-9	WARNING	'default/xlog/0000001900000000000000EF' not found from storage
2017-07-04 09:17:07,760	TransferAgent	Thread-9	INFO	'DOWNLOAD' FAILED transfer of key: 'default/xlog/0000001900000000000000EF', size: 0, took 0.012s
2017-07-04 09:17:07,801	TransferAgent	Thread-11	WARNING	'default/xlog/0000001900000000000000F2' not found from storage
2017-07-04 09:17:07,802	TransferAgent	Thread-11	INFO	'DOWNLOAD' FAILED transfer of key: 'default/xlog/0000001900000000000000F2', size: 0, took 0.055s
2017-07-04 09:17:07,810	TransferAgent	Thread-8	INFO	'DOWNLOAD' transfer of key: 'default/xlog/0000001900000000000000F3', size: 797270, took 0.062s
2017-07-04 09:17:07,817	TransferAgent	Thread-10	WARNING	'default/xlog/0000001900000000000000F1' not found from storage
2017-07-04 09:17:07,818	TransferAgent	Thread-10	INFO	'DOWNLOAD' FAILED transfer of key: 'default/xlog/0000001900000000000000F1', size: 0, took 0.071s
127.0.0.1 - - [04/Jul/2017 09:17:07] "GET /default/archive/0000001900000000000000EF HTTP/1.1" 404 -

The thing is that those history/xlog files don't exist in the backup, and pghoard never uploaded them. I just see the base backup archive in S3. What could be the reason?

I tried multiple combinations of restore options, including --restore-to-master.

Backup not continuous

Hi,

PGHoard created the folder structure and uploaded a base backup to S3. However it doesn't update anything there, despite me inserting new data into the table. As I'm writing this issue PostgreSQL is creating new WALs, but PGHoard is ignoring them and doesn't upload them to S3.

Also there is only a "default/basebackup" directory in the S3 bucket; there is no directory with WALs.

My PostgreSQL and PGHoard live on the same machine and talk to each other using a unix socket.
Here is my pghoard.conf:

{
  "backup_location": "/var/lib/pghoard",
  "log_level": "INFO",
  "backup_sites": {
    "default": {
      "nodes": [ "postgresql://" ],
      "object_storage": {
        "aws_access_key_id": "SECRET",
        "aws_secret_access_key": "SECRET",
        "bucket_name": "some-bucket",
        "region": "eu-west-1",
        "storage_type": "s3"
      },
      "active_backup_mode": "pg_receivexlog",
      "basebackup_count": 2,
      "basebackup_interval_hours": 24,
      "basebackup_mode": "local-tar",
      "pg_data_directory": "/var/lib/postgresql/9.6/main",
      "encryption_key_id": "default",
      "encryption_keys": {
        "default": {
          "public": "SECRET"
        }
      }
    }
  }
}

And my comments-stripped pg_hba.conf:

local   all             postgres                                peer
local   all             all                                     peer
local   replication     postgres

Am I doing something wrong, or is there a bug in PGHoard?

Mac Support

When I tried to set up a development environment on Mac OS 10.11 I found that there is no inotify support in the OS. How do you feel about me refactoring the requirement for inotify to use http://pythonhosted.org/watchdog/ instead? Would the pull request be welcomed, or would you rather not introduce a new dependency? I haven't evaluated whether all cases where inotify is currently used are covered by the watchdog API, but I would start with that analysis.

Tablespace support

Currently PGHoard only supports a single tablespace due to limitations in pg_basebackup and PGHoard: pg_basebackup can only stream tar mode backups for a single tablespace, and PGHoard can only deal with a single file.

It'd be pretty easy to lift the restriction of a single file per basebackup, but tablespaces are typically used in environments with large databases, and requiring temporary storage for the full set of data (which is needed unless we can stream the backup) is an issue.

Options, roughly in the order of usefulness and amount of work required:

  • Just add support for multiple files per basebackup and require the extra disk space
  • Assuming PGHoard is running locally, call pg_start_backup / pg_stop_backup and process the files locally instead of requiring pg_basebackup to generate a tar
  • Use Psycopg2's upcoming replication protocol support to stream data directly from the PG backend

Improved basebackup scheduling

Currently PGHoard automatically creates a new basebackup after N hours have passed from the previous one. We should implement at least two alternative ways to schedule basebackups:

  • Create basebackups based on WAL volume (absolute or relative to last basebackup size) instead of time
  • Start basebackups at a given hour and/or day of week instead of X hours from the previous one
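
The decision logic for these schemes could be sketched as a pure function; all parameter names here are illustrative, not pghoard's actual configuration keys:

```python
import datetime


def basebackup_due(now, last_backup_at, interval_hours=None,
                   wal_bytes_since_last=None, wal_trigger_bytes=None,
                   scheduled_hour=None):
    """Decide whether a new basebackup should start, combining the current
    interval-based behaviour with the two proposed alternatives."""
    if last_backup_at is None:
        return True  # no basebackup exists yet
    # Current behaviour: N hours since the previous basebackup
    if interval_hours is not None:
        if now - last_backup_at >= datetime.timedelta(hours=interval_hours):
            return True
    # Proposed: trigger on WAL volume since the last basebackup
    if wal_trigger_bytes is not None and wal_bytes_since_last is not None:
        if wal_bytes_since_last >= wal_trigger_bytes:
            return True
    # Proposed: trigger at a fixed hour, at most once per day
    if scheduled_hour is not None:
        if now.hour == scheduled_hour and last_backup_at.date() < now.date():
            return True
    return False
```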

Restore only completes with `--recovery-target-time`

I'm playing around with pghoard and I'm having an issue getting a full restore to work without specifying a --recovery-target-time. I have a single site taking backups on the same machine that Postgres is running on and storing them in S3. When I start up pghoard everything works great and both WAL files and base backups are being taken as expected. However I can only successfully restore a backup when I specify --recovery-target-time.

Here's my config:

{
    "backup_location": "/var/lib/pghoard",
    "backup_sites": {
        "default": {
            "nodes": [{
                "host": "127.0.0.1",
                "password": "password",
                "port": 5432,
                "user": "replication"
            }],
            "object_storage": {
                "storage_type": "s3",
                "bucket_name": "[REDACTED]",
                "region": "[REDACTED]",
                "aws_access_key_id": "[REDACTED]",
                "aws_secret_access_key": "[REDACTED]"
            },
            "active_backup_mode": "archive_command",
            "pg_data_directory": "/var/lib/postgresql/9.5/main",
            "basebackup_mode": "pipe"
        }
    }
}

For testing, I have a database with a single table. I turn on pghoard and my first backup is taken and stored in S3 along with the initial WAL files. After taking a backup, I open up my database and drop the lone table. I then begin the recovery process (all commands issued as the postgres user):

  1. Stop Postgres:
/etc/init.d/postgresql stop
  2. Copy the data directory to a new location:
cp -a /var/lib/postgresql/9.5/main /var/lib/postgresql/9.5/old-main
  3. Restore my latest backup:
pghoard_restore get-basebackup --config /var/lib/pghoard/pghoard.json --target-dir /var/lib/postgresql/9.5/main --overwrite
  4. Copy unarchived WAL files to pg_xlog:
cp -a /var/lib/postgresql/9.5/old-main/pg_xlog/* /var/lib/postgresql/9.5/main/pg_xlog/
  5. Restart Postgres to begin recovery:
/etc/init.d/postgresql start

The recovery begins and eventually gets hung up when it's unable to find some files in the archive:

# pghoard log

2017-05-11 19:41:07,310	TransferAgent	Thread-10	WARNING	'default/timeline/00000002.history' not found from storage
2017-05-11 19:41:07,310	TransferAgent	Thread-10	INFO	'DOWNLOAD' FAILED transfer of key: 'default/timeline/00000002.history', size: 0, took 0.071s
127.0.0.1 - - [11/May/2017 19:41:07] "GET /default/archive/00000002.history HTTP/1.1" 404 -
2017-05-11 19:41:12,180	TransferAgent	Thread-9	WARNING	'default/xlog/000000010000000000000004' not found from storage
2017-05-11 19:41:12,181	TransferAgent	Thread-9	INFO	'DOWNLOAD' FAILED transfer of key: 'default/xlog/000000010000000000000004', size: 0, took 0.077s
127.0.0.1 - - [11/May/2017 19:41:07] "GET /default/archive/000000010000000000000004 HTTP/1.1" 404 -

This is correct, as neither the WAL segment nor the .history file is archived. The missing WAL segment, however, was copied over from the old data directory's pg_xlog during step 4 of my recovery process (00000002.history doesn't exist anywhere). The PG docs say that the system will look for WAL in pg_xlog if it can't be found in the archive, but that doesn't seem to be the case when using pghoard_postgres_command as the restore_command.

I thought I'd try manually uploading the unarchived WAL segments to my S3 archive, but that yields another error:

# pghoard log

2017-05-11 19:49:38,485	TransferAgent	Thread-9	INFO	'DOWNLOAD' transfer of key: 'default/xlog/000000010000000000000004', size: 16777216, took 7.937s
127.0.0.1 - - [11/May/2017 19:49:38] "GET /default/archive/000000010000000000000004 HTTP/1.1" 201 -
2017-05-11 19:49:38,917	TransferAgent	Thread-8	WARNING	'default/timeline/00000002.history' not found from storage
2017-05-11 19:49:38,917	TransferAgent	Thread-8	INFO	'DOWNLOAD' FAILED transfer of key: 'default/timeline/00000002.history', size: 0, took 0.285s
127.0.0.1 - - [11/May/2017 19:49:38] "GET /default/archive/00000002.history HTTP/1.1" 404 -

This time the WAL segment is found in the archive and downloaded, but Postgres complains about it:

# postgresql log

2017-05-11 19:57:47 UTC [14488-11] LOG:  restored log file "000000010000000000000004" from archive
2017-05-11 19:57:47 UTC [14488-12] LOG:  invalid record length at 0/4000098
/usr/local/bin/pghoard_postgres_command: ERROR: '00000002.history' not found from archive

Everything works just fine, however, if I change the restore command in step 3 to include:

--recovery-target-time 2017-05-11T19:40:00Z

I'm pretty new to managing Postgres, so maybe there's something basic that I'm missing... Perhaps you have to specify a --recovery-target for it to work, but I'd like to be able to restore to the latest working point in time.

Thanks in advance.

Pluggable logging

Allow configuring custom logger modules to be imported and used in pghoard, allows easily pushing all or parts of the logs to various external services.
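A minimal sketch of what such a pluggable logger could look like, using the standard library's logging machinery (the `ForwardingHandler` name and the external-sink wiring are hypothetical, not part of pghoard today):

```python
import logging

class ForwardingHandler(logging.Handler):
    """Hypothetical custom handler that pushes formatted records to an
    external sink, e.g. a client for a hosted logging service."""

    def __init__(self, sink):
        super().__init__()
        self.sink = sink

    def emit(self, record):
        # Format the record and hand it to the external service.
        self.sink.append(self.format(record))

# pghoard could import a configured module and attach its handler like this:
sink = []
logger = logging.getLogger("pghoard")
logger.addHandler(ForwardingHandler(sink))
logger.warning("upload retry scheduled")
```

A configuration key naming the module to import would let deployments route all or a subset of pghoard's loggers without patching the daemon.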

pghoard.upload_size metric type

Why is pghoard.upload_size defined as a counter?
IMHO it should be defined as a gauge, since it tracks a value at a particular point in time.

can't run pghoard_archive_cleanup

Running pghoard_archive_cleanup fails with:

Traceback (most recent call last):
  File "/usr/bin/pghoard_archive_cleanup", line 11, in <module>
    load_entry_point('pghoard==1.4.0.dev93', 'console_scripts', 'pghoard_archive_cleanup')()
  File "/usr/lib/python3.6/site-packages/pghoard/archive_cleanup.py", line 60, in main
    return tool.run()
  File "/usr/lib/python3.6/site-packages/pghoard/archive_cleanup.py", line 52, in run
    self.set_config(args.config, args.site)
  File "/usr/lib/python3.6/site-packages/pghoard/archive_cleanup.py", line 25, in set_config
    self.config = config.read_json_config_file(config_file, check_commands=False)
  File "/usr/lib/python3.6/site-packages/pghoard/config.py", line 144, in read_json_config_file
    with open(filename, "r") as fp:
TypeError: expected str, bytes or os.PathLike object, not NoneType
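The traceback suggests set_config received None as the config path, i.e. the tool was invoked without --config and no default path was found. The failing call at the bottom is just open() being handed None, which can be reproduced in isolation:

```python
# config.read_json_config_file ends in open(filename, "r"); when the
# filename is None, open() raises the TypeError seen in the traceback.
try:
    open(None, "r")
    msg = None
except TypeError as exc:
    msg = str(exc)

print(msg)
```

A friendlier fix would be for the CLI to validate the config argument up front and exit with a usage message instead of letting open() blow up.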

Howto: Recover from backups

There is no documentation on how to restore from previously backed up data.

Are there any plans to add documentation for this?
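For reference, the basic restore flow that appears earlier in this thread looks like the following (the config path, target directory, and PostgreSQL version are from that example and will differ per installation):

```shell
# Fetch the latest basebackup into the data directory, overwriting it.
pghoard_restore get-basebackup \
    --config /var/lib/pghoard/pghoard.json \
    --target-dir /var/lib/postgresql/9.5/main \
    --overwrite

# Then start PostgreSQL; the configured restore_command replays WAL
# from the object storage archive until the recovery target is reached.
/etc/init.d/postgresql start
```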
