backup-restore-sidecar's Introduction

metal-stack

We believe Kubernetes runs best on bare metal; this is all about providing metal as a service.

backup-restore-sidecar's People

Contributors

azneo, gerrit91, majst01, mschuller, mwennrich, mwindower, vknabel

Forkers

jayge-ekenstam

backup-restore-sidecar's Issues

Postgres databases do not start anymore

Defaulted container "postgres" out of: postgres, backup-restore-sidecar, backup-restore-sidecar-provider (init)
{"level":"info","timestamp":"2023-08-23T05:39:37Z","caller":"cmd/main.go:290","msg":"read config file","config-file":"/etc/backup-restore-sidecar/config.yaml"}
{"level":"info","timestamp":"2023-08-23T05:39:37Z","caller":"cmd/main.go:359","msg":"initialized database adapter","type":"postgres"}
{"level":"info","timestamp":"2023-08-23T05:39:37Z","logger":"wait","caller":"wait/wait.go:25","msg":"waiting until initializer completes","interval":"3s"}
{"level":"info","timestamp":"2023-08-23T05:39:40Z","logger":"wait","caller":"wait/wait.go:40","msg":"initializer succeeded, database can be started","message":"done"}
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

initdb: error: directory "/data/postgres" exists but is not empty
If you want to create a new database system, either remove or empty
the directory "/data/postgres" or run initdb
with an argument other than "/data/postgres".

This is because the backup-restore-sidecar initializer will always copy over the postgres binaries first:

github-runner@integration-0:~/actions-runner/_work/releases/releases/mini-lab$ kubectl -n metal-control-plane logs ipam-db-0 -c backup-restore-sidecar
{"level":"info","timestamp":"2023-08-23T05:23:20Z","caller":"cmd/main.go:290","msg":"read config file","config-file":"/etc/backup-restore-sidecar/config.yaml"}
{"level":"info","timestamp":"2023-08-23T05:23:20Z","caller":"cmd/main.go:359","msg":"initialized database adapter","type":"postgres"}
{"level":"info","timestamp":"2023-08-23T05:23:20Z","caller":"cmd/main.go:405","msg":"initialized backup provider","type":"local"}
{"level":"info","timestamp":"2023-08-23T05:23:20Z","caller":"cmd/main.go:118","msg":"starting backup-restore-sidecar","version":"v0.7.0 (0cdb888d), tags/v0.7.0-0-g0cdb888, go1.21.0","bind-addr":"127.0.0.1:8000"}
{"level":"info","timestamp":"2023-08-23T05:23:20Z","logger":"initializer","caller":"initializer/initializer.go:70","msg":"start initializer server","address":"127.0.0.1:8000"}
{"level":"info","timestamp":"2023-08-23T05:23:20Z","logger":"initializer","caller":"initializer/initializer.go:107","msg":"start running initializer"}
{"level":"info","timestamp":"2023-08-23T05:23:20Z","logger":"initializer","caller":"initializer/initializer.go:109","msg":"ensuring backup bucket"}
{"level":"info","timestamp":"2023-08-23T05:23:20Z","logger":"backup","caller":"local/local.go:65","msg":"ensuring backup bucket called for provider local"}
{"level":"info","timestamp":"2023-08-23T05:23:20Z","logger":"initializer","caller":"initializer/initializer.go:116","msg":"checking database"}
{"level":"info","timestamp":"2023-08-23T05:23:20Z","logger":"postgres","caller":"postgres/postgres.go:55","msg":"data directory is empty"}
{"level":"info","timestamp":"2023-08-23T05:23:20Z","logger":"initializer","caller":"initializer/initializer.go:130","msg":"database potentially needs to be restored, looking for backup"}
{"level":"info","timestamp":"2023-08-23T05:23:20Z","logger":"backup","caller":"local/local.go:115","msg":"listing backups called for provider local"}
{"level":"info","timestamp":"2023-08-23T05:23:20Z","logger":"initializer","caller":"initializer/initializer.go:139","msg":"there are no backups available, it's a fresh database. allow database to start"}
{"level":"info","timestamp":"2023-08-23T05:23:20Z","logger":"postgres","caller":"postgres/upgrade.go:284","msg":"copying postgres binaries for later upgrades","from":"/usr/local/bin","to":"/data/postgres/pg-bin-v12"}
'/usr/local/bin/tini' -> '/data/postgres/pg-bin-v12/tini'
'/usr/local/bin/backup-restore-sidecar' -> '/data/postgres/pg-bin-v12/backup-restore-sidecar'
'/usr/local/bin/docker-entrypoint.sh' -> '/data/postgres/pg-bin-v12/docker-entrypoint.sh'
'/usr/local/bin/pg_restore' -> '/data/postgres/pg-bin-v12/pg_restore'
'/usr/local/bin/ecpg' -> '/data/postgres/pg-bin-v12/ecpg'
'/usr/local/bin/pgbench' -> '/data/postgres/pg-bin-v12/pgbench'
'/usr/local/bin/pg_receivewal' -> '/data/postgres/pg-bin-v12/pg_receivewal'
'/usr/local/bin/pg_upgrade' -> '/data/postgres/pg-bin-v12/pg_upgrade'
'/usr/local/bin/clusterdb' -> '/data/postgres/pg-bin-v12/clusterdb'
'/usr/local/bin/initdb' -> '/data/postgres/pg-bin-v12/initdb'
'/usr/local/bin/pg_dumpall' -> '/data/postgres/pg-bin-v12/pg_dumpall'
'/usr/local/bin/pg_standby' -> '/data/postgres/pg-bin-v12/pg_standby'
'/usr/local/bin/createdb' -> '/data/postgres/pg-bin-v12/createdb'
'/usr/local/bin/pg_basebackup' -> '/data/postgres/pg-bin-v12/pg_basebackup'
'/usr/local/bin/vacuumdb' -> '/data/postgres/pg-bin-v12/vacuumdb'
'/usr/local/bin/pg_resetwal' -> '/data/postgres/pg-bin-v12/pg_resetwal'
'/usr/local/bin/pg_test_timing' -> '/data/postgres/pg-bin-v12/pg_test_timing'
'/usr/local/bin/pg_recvlogical' -> '/data/postgres/pg-bin-v12/pg_recvlogical'
'/usr/local/bin/pg_ctl' -> '/data/postgres/pg-bin-v12/pg_ctl'
'/usr/local/bin/pg_dump' -> '/data/postgres/pg-bin-v12/pg_dump'
'/usr/local/bin/pg_controldata' -> '/data/postgres/pg-bin-v12/pg_controldata'
'/usr/local/bin/dropuser' -> '/data/postgres/pg-bin-v12/dropuser'
'/usr/local/bin/psql' -> '/data/postgres/pg-bin-v12/psql'
'/usr/local/bin/pg_rewind' -> '/data/postgres/pg-bin-v12/pg_rewind'
'/usr/local/bin/pg_archivecleanup' -> '/data/postgres/pg-bin-v12/pg_archivecleanup'
'/usr/local/bin/postgres' -> '/data/postgres/pg-bin-v12/postgres'
'/usr/local/bin/dropdb' -> '/data/postgres/pg-bin-v12/dropdb'
'/usr/local/bin/createuser' -> '/data/postgres/pg-bin-v12/createuser'
'/usr/local/bin/vacuumlo' -> '/data/postgres/pg-bin-v12/vacuumlo'
'/usr/local/bin/pg_waldump' -> '/data/postgres/pg-bin-v12/pg_waldump'
'/usr/local/bin/oid2name' -> '/data/postgres/pg-bin-v12/oid2name'
'/usr/local/bin/pg_isready' -> '/data/postgres/pg-bin-v12/pg_isready'
'/usr/local/bin/pg_config' -> '/data/postgres/pg-bin-v12/pg_config'
'/usr/local/bin/pg_test_fsync' -> '/data/postgres/pg-bin-v12/pg_test_fsync'
'/usr/local/bin/postmaster' -> '/data/postgres/pg-bin-v12/postmaster'
'/usr/local/bin/reindexdb' -> '/data/postgres/pg-bin-v12/reindexdb'
'/usr/local/bin/pg_checksums' -> '/data/postgres/pg-bin-v12/pg_checksums'
'/usr/local/bin' -> '/data/postgres/pg-bin-v12'
{"level":"info","timestamp":"2023-08-23T05:23:20Z","logger":"postgres","caller":"postgres/upgrade.go:46","msg":"\"/data/postgres/PG_VERSION\" is not present, no upgrade required"}

We should probably only copy over the binaries if PG_VERSION exists.

Use cron job runner for taking backups

At the moment, we just wait for backup_interval before taking the next backup. It would be better to run backups as a cron job so that backups can be timed more precisely.

This would also mitigate the problem of metal-db and ipam-db going out of sync when restoring backups.

Provide unit tests

This project can become crucial and we should try to test it as well as we can.

Testing the project is not so easy, though, because:

  • the sidecar is moving files around in the filesystem in a couple of places,
  • uploading stuff to cloud providers,
  • and requiring real databases.

For the filesystem, one possible solution would be to introduce a mockable file system abstraction, like https://github.com/spf13/afero.

For cloud provider testing, one possible way would be to create mocks from the cloud provider's storage interface, which would at least give a bit of confidence but is still not ideal.

LocalFS Provider

Instead of a real database it might be interesting for developer use-cases to just back up a directory of the local file system.

CLI panics on unknown commands or error in commands

/meili_data # backup-restore-sidecar backup
Error: unknown command "backup" for "backup-restore-sidecar"
Run 'backup-restore-sidecar --help' for usage.
panic: unknown command "backup" for "backup-restore-sidecar"

goroutine 1 [running]:
main.main()
        /home/runner/work/backup-restore-sidecar/backup-restore-sidecar/cmd/main.go:263 +0xbf
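One way to avoid the panic is to return errors from command handling and let main print them. The sketch below uses only the standard library and placeholder command names; it does not reproduce the sidecar's actual command setup:

```go
package main

import (
	"fmt"
	"os"
)

// run dispatches to a subcommand and returns an error instead of panicking.
// The command set here is a placeholder for illustration only.
func run(args []string) error {
	if len(args) == 0 {
		fmt.Println("usage: <binary> <command>")
		return nil
	}
	switch args[0] {
	case "wait":
		return nil
	default:
		return fmt.Errorf("unknown command %q", args[0])
	}
}

func main() {
	if err := run(os.Args[1:]); err != nil {
		// Report the error and signal failure without a stack trace.
		fmt.Fprintln(os.Stderr, "Error:", err)
		os.Exit(1)
	}
}
```

The process still exits with a non-zero status, so scripts can detect the failure, but users no longer see a goroutine dump for a simple typo.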

Take a last backup before termination

Currently, a backup is only taken and stored once per backup interval. If the pod is terminated, there is a gap between the last backup and the last committed transaction of the database. If the disk then dies, these last commits are lost.

At least on normal pod termination, we already receive the termination signal, which gives us a chance to take one additional backup.

Downside: termination may take longer.

Provide Helm chart for deployment

Probably, it would make life easier to use this project if there was a deployment chart inside this project. This way, users would not need to copy manifests and adapt them to their use cases by hand.

Lint the project

We should add a make target for linting the project and also add this to CI to improve the code quality.

Does not work with quay.io/coreos/etcd:v3.5.7 and newer

etcd switched to the distroless/static-debian11 base image, which means there is no more sh or mkdir contained in the official image.

As a result, both the etcd and backup-restore-sidecar containers fail to start: While tini is a static binary, we specifically call sh and mkdir in those containers, which are no longer available.

As a proof of concept, I ran apk add busybox-static in the initContainer and made sure the /bin/busybox.static binary was also copied to /bin-provision, and then used the static busybox sh and busybox mkdir in those containers.

While this worked, I am not convinced this is a proper solution.

@Gerrit91 suggested adding a -c flag to backup-restore-sidecar wait, which would then handle starting the given command itself, even removing the need for tini.

Postgres integration test sometimes produces `archive/tar: write too long` error

When creating a backup in the postgres or postgres with TimescaleDB integration tests, we sometimes see:

...
    main_test.go:274: deploy sts with next database version "timescale/timescaledb:2.11.2-pg15", container "timescale/timescaledb:2.11.2-pg15"
    main_test.go:287: verify that data is still the same
    main_test.go:241: taking a backup
    main_test.go:250: rpc error: code = Internal desc = error creating backup: unable to compress backup: walking /backup/upload/files: /backup/upload/files/base.tar.gz: writing: files/base.tar.gz: copying contents: archive/tar: write too long
    main_test.go:274: deploy sts with next database version "timescale/timescaledb:2.13.1-pg15", container "timescale/timescaledb:2.13.1-pg15"
...

Rethinkdb not starting after update

The backup-restore-sidecar command just thinks that the initialize step in the sidecar has not completed.

When entering the rethinkdb container via kubectl exec, running backup-restore-sidecar wait succeeds without any issues. As a workaround, the database can be started in another way and replaced with a newer version as soon as this is fixed.

Add s3 provider

Add a provider for S3-compatible backends (like minio, ceph-rgw).

Support exponential cleanup for backups

At the moment, we keep a fixed number of backup revisions and remove the oldest ones.

When backups are taken very frequently and a high number of revisions is kept, the total size of the backups grows considerably.

It would be beneficial to keep fewer backups the further they lie in the past.

Removal of tini

With pre- and post-exec commands in place, we do not need to ship this anymore. This will result in a breaking minor release requiring users to migrate to pre- and post-exec commands. In our metal-stack landscape, this has already been done.

aws-go-sdk v1 deprecated, EOL Jul 2025

Existing applications that use AWS SDK for Go (v1) will continue to function as intended, unless there is a fundamental change to how an AWS service works. This is uncommon and would be broadly communicated if it happens. Between July 31, 2024 and end-of-support on July 31, 2025, the AWS SDK for Go (v1) will only receive critical bug fixes and security updates. The SDK will not be updated to support new AWS services, new service features, or changes to existing services.

https://aws.amazon.com/blogs/developer/announcing-end-of-support-for-aws-sdk-for-go-v1-on-july-31-2025/

Consistency checks

At various places it would be very nice to ensure data consistency before applying actions:

  • Check for database consistency to decide if restore is required or not
    • Could pg_checksums be an approach for postgres databases?
  • Check if backups are valid before uploading
  • Check if backups are valid before restoring
    • adding checksum?
