paunin / postdock
PostDock - Postgres & Docker - Postgres streaming replication cluster for any docker environment
License: MIT License
Hi, is it possible to use StatefulSet to create the cluster? I am not sure how this can be done, so could you please advise?
Thank you,
Yashu
Currently, after recovery from backup, 2nd tier nodes are unable to automatically resume replication; you have to run pg_rewind manually. This happens because one can't rewind to another standby node, only to a master node.
So we need to find a way to determine the cluster master on a 2nd tier node and change the source system for pg_rewind.
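For reference, a minimal sketch of the manual step described above; the connection details are placeholders, not values from the repo:

# Rewind a 2nd tier standby to the current cluster master before it
# rejoins replication. <current-master> is whatever node holds the
# master role right now.
pg_rewind \
    --target-pgdata="$PGDATA" \
    --source-server="host=<current-master> port=5432 user=replication_user dbname=replication_db"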
I made some basic changes to your solution to suit our deployment "provisioner" needs. Instead of having 3 StatefulSets (for a 3 node setup), I created just one StatefulSet (with Adaptive Mode) with 3 replicas and changed the docker entry point to populate NODE_ID, NODE_NAME and other instance-specific things for each replica. Pgpool sits in front of this setup.
Now I don't know if it's because of some misconfiguration at my end, but this is what I see:
At times when the failed master comes up, it is not able to auto-detect the new master and assumes the role of master, thus causing 2 masters in the setup.
When one instance goes down and does not come up within x seconds, Pgpool marks it as "down" but does not mark it as "up" when it finally comes up and is responsive.
At times Pgpool does not show the correct master.
Any help will be greatly appreciated.
Thanks,
Sudeep
I tried to connect an external folder.
For pgmaster:

volumes:
build:
  context: .
  dockerfile: Pgsql.Dockerfile
environment:
  INITIAL_NODE_TYPE: master # Used by repmgr to register the initial node role
  NODE_ID: 1 # Integer number of node
  NODE_NAME: node1 # Node name
  CLUSTER_NODE_NETWORK_NAME: pgmaster # (default: hostname of the node)
  # database we want to use for application
  POSTGRES_PASSWORD: main_db
  POSTGRES_USER: main_user
  POSTGRES_DB: MsizaR4z5A3kp1t
  CONFIGS: "listen_addresses:'*'"
  # defaults:
  CLUSTER_NAME: pg_cluster # default is pg_cluster
  REPLICATION_DB: replication_db # default is replication_db
  REPLICATION_USER: replication_user # default is replication_user
  REPLICATION_PASSWORD: zd74YHywT747Uwe # default is replication_pass
networks:
  cluster:
    aliases:

#<<< Branch 1
pgslave1:
  build:
    context: .
    dockerfile: Pgsql.Dockerfile
  depends_on:
  networks:
    cluster:
      aliases:
Kubernetes 1.6.3, minikube 0.19.0, postgres-docker-cluster/master commit a81e20b
I am trying to run the k8s/database-service example and get the following error on node2 and node4:
>>> STARTING POSTGRES...
>>> Auto-detected master name: ''
>>> Setting up repmgr...
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
cat: /var/lib/postgresql/data/standby.lock: No such file or directory
>>> Previously Locked standby upstream node LOCKED_STANDBY=''
>>> Wait db replication_db on database-node1-service:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
>>>>>> Db replication_db exists on database-node1-service:5432!
ERROR: relation "repmgr_pg_cluster.repl_nodes" does not exist
LINE 1: SELECT id FROM repmgr_pg_cluster.repl_nodes WHERE conninfo L...
There is an ssh private key committed to the repo. There is a note in the readme about changing it, but it's not very visible. Is it possible to generate the keys on the fly and add them to a volume by default? I.e. the first time you run docker-compose it creates a volume, generates the keys on the volume, and mounts it in .ssh of all the containers?
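A minimal sketch of that idea, assuming a shared volume mounted at /ssh-keys (a hypothetical path):

if [ ! -f /ssh-keys/id_rsa ]; then
    # generate a fresh key pair on first run instead of shipping one in the repo
    ssh-keygen -t rsa -b 4096 -N '' -f /ssh-keys/id_rsa
    # let every container that mounts the volume accept this key
    cp /ssh-keys/id_rsa.pub /ssh-keys/authorized_keys
fi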
Set up the Barman solution in the ecosystem of the docker-postgres cluster.
To Be Done:
I think there is a good chance to create a script in the Barman container to automate the recovery process now. Manual work may be limited to enabling SSH in each container, but that might be it.
Postgresql 10
Hi there,
I try to pass the following env CONFIG variable:
--env CONFIGS="log_line_prefix:'%m [%p-%l] %q%u@%d '"
But this breaks the postgresql.conf file due to error:
...
>>>>>> Adding config 'log_line_prefix' with value ''%m'
sed: -e expression #1, char 67: Invalid range end
>>>>>> Adding config '[%p-%l]' with value ''
>>>>>> Adding config '%q%u@%d' with value ''
>>>>>> Adding config ''' with value ''
...
The problem is at this line (in primary.entrypoint.sh):
for CONFIG_PAIR in ${CONFIG_PAIRS[@]}
If ${CONFIG_PAIRS[@]} contains whitespace, bash word-splitting leads to the error.
Actually, I can't find any suitable solution for this.
Maybe you will have an idea.
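One possible workaround, sketched under the assumption that splitting should only ever happen on commas (this is not the repo's actual code):

# Split CONFIGS on commas only, so values containing spaces survive
CONFIGS="log_line_prefix:'%m [%p-%l] %q%u@%d ',autovacuum:on"
IFS=',' read -ra CONFIG_PAIRS <<< "$CONFIGS"
for CONFIG_PAIR in "${CONFIG_PAIRS[@]}"; do
    KEY="${CONFIG_PAIR%%:*}"   # text before the first ':'
    VALUE="${CONFIG_PAIR#*:}"  # text after the first ':'
    echo ">>>>>> Adding config '$KEY' with value '$VALUE'"
done

Values that themselves contain commas would still break, but quoting the array expansion at least keeps whitespace intact.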
Hello,
First, I would like to thank you for this very useful Docker clusterization of PostgreSQL, which is otherwise a mess to set up.
I've got a strange behaviour I would like to report when trying to start the containers one by one (and not with Docker compose):
docker run --rm --name Pg1 \
    --env INITIAL_NODE_TYPE=master \
    --env NODE_ID=1 --env NODE_NAME=node1 \
    --env CLUSTER_NODE_NETWORK_NAME=pgmaster \
    --env POSTGRES_PASSWORD=monkey_pass --env POSTGRES_USER=monkey_user --env POSTGRES_DB=monkey_db \
    --env CONFIGS="listen_addresses:'*',logging_collector:on" \
    --env CLUSTER_NAME=pg_cluster \
    --env REPLICATION_DB=replication_db --env REPLICATION_USER=replication_user --env REPLICATION_PASSWORD=replication_pass \
    --env FORCE_CLEAN=1 \
    -p 5432:5432 -v /data/pg1:/var/lib/postgresql/data \
    paunin/postgresql-cluster-pgsql
The PG is active and all seems to be OK.
docker run --rm --name Pg2 \
    --env INITIAL_NODE_TYPE=standby \
    --env NODE_ID=2 --env NODE_NAME=node2 \
    --env CLUSTER_NODE_NETWORK_NAME=pgstandby \
    --env CONFIGS="listen_addresses:'*',logging_collector:'on'" \
    --env CLUSTER_NAME=pg_cluster \
    --env REPLICATION_UPSTREAM_NODE_ID=1 \
    --env FORCE_CLEAN=1 \
    --env REPLICATION_PRIMARY_HOST=localhost --env REPLICATION_PRIMARY_PORT=5432 \
    -p 5433:5432 -v /data/pg2:/var/lib/postgresql/data \
    paunin/postgresql-cluster-pgsql
But the PG embedded in this container never starts and never connects to the master. Here are the logs:
For node with initial type standby you have to setup REPLICATION_UPSTREAM_NODE_ID
>>> Will wait db replication_db on pgstandby:5432, will try 50 times with delay 5 seconds
>>> Will wait db replication_db on localhost:5432, will try 50 times with delay 5 seconds
psql: could not connect to server: Connection refused
Is the server running on host "pgstandby" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
psql: could not connect to server: Connection refused
Is the server running on host "localhost" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
...
Any help would be greatly appreciated.
EDIT:
The docker run for pg2 was wrong. I fixed it.
EDIT2:
The first server is running and listening:
$ netstat -an | grep 5432
tcp6 0 0 :::5432 :::* LISTEN
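A hedged guess at the cause (not from the thread): inside the Pg2 container, "localhost" resolves to Pg2 itself, so REPLICATION_PRIMARY_HOST=localhost can never reach the master. A user-defined Docker network plus the master's container name avoids this:

docker network create pg_net
# master: same --env flags as the Pg1 command above, plus the network
docker run --rm --name Pg1 --net pg_net -p 5432:5432 \
    -v /data/pg1:/var/lib/postgresql/data paunin/postgresql-cluster-pgsql
# standby: reach the master via its container name instead of localhost
docker run --rm --name Pg2 --net pg_net -p 5433:5432 \
    --env REPLICATION_PRIMARY_HOST=Pg1 --env REPLICATION_PRIMARY_PORT=5432 \
    -v /data/pg2:/var/lib/postgresql/data paunin/postgresql-cluster-pgsql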
Make the container able to decide on start/restart how it should act - master or slave - and where to clone the instance from.
Basically, on start of a container, if $PGDATA has the structure of a postgres cluster:
standby (clone from master with repmgr standby clone --force, register as a standby in repmgr)
NB: $INITIAL_NODE_TYPE is used only here and here to decide which mode (master|slave) to use.
Instead of REPLICATION_UPSTREAM_NODE_ID and REPLICATION_PRIMARY_HOST, use something like NODES_IN_REPLICA with the names of all servers used for the cluster, so on a non-first start each node will be able to ask all partners whether a master already exists (and use it as upstream).

Running under k8s using a pod definition basically the same as the example.
During the startup of postgres via /docker-entrypoint.sh postgres, the database is started up once and the primary/entrypoint.sh script is run to create the REPLICATION_DB; postgres is then shut down and restarted.
While that is going on, wait_db sits waiting for REPLICATION_DB to appear so that it can start repmgr. If wait_db happens to check just after the database is created in the primary/entrypoint.sh script, but before postgres is shut down, it can think the database is ready to use. repmgr then tries to connect, but errors out because the database system is shutting down by that point.
I haven't checked how long the window actually is, but we were managing to hit this pretty much 100% of the time on a test cluster. With the current 5s poll interval for wait_db it seems like it should be very unlikely to hit...
Logs from pod:
>>> STARTING SSH SERVER...
>>> Tuning up sshd...
>>> Starting...
>>> STARTING POSTGRES...
>>> Setting up repmgr...
>>> Adding loopback '127.0.0.1 database-node1-service'
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
>>> Sending in background postgres start...
>>> Waiting postgres on this node to start repmgr...
>>> Wait db replication_db on database-node1-service:5432(user: replication_user,password: *******), will try 50 times with delay 5 seconds
psql: could not connect to server: Connection refused
Is the server running on host "database-node1-service" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
>>>>>> Db replication_db is still not accessable on database-node1-service:5432 (will try 50 times)
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
creating template1 database in /var/lib/postgresql/data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
syncing data to disk ... ok
Success. You can now start the database server using:
pg_ctl -D /var/lib/postgresql/data -l logfile start
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
waiting for server to start....LOG: database system was shut down at 2017-02-17 13:11:49 UTC
LOG: MultiXact member wraparound protections are now enabled
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
done
server started
CREATE DATABASE
CREATE ROLE
/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/entrypoint.sh
>>> Configuring /var/lib/postgresql/data/postgresql.conf
>>>>>> Adding config 'autovacuum'='on'
>>> Creating replication user 'replication_user'
CREATE ROLE
>>> Creating replication db 'replication_db'
>>>>>> Db replication_db exists on database-node1-service:5432!
>>> Registering node with role master
LOG: received fast shutdown request
LOG: aborting any active transactions
LOG: autovacuum launcher shutting down
waiting for server to shut down....LOG: shutting down
FATAL: the database system is shutting down
ERROR: connection to database failed: FATAL: the database system is shutting down
>>>>>> Can't re-register node. Means it has been already done before!
>>> Starting repmgr daemon...
[2017-02-17 13:11:51] [NOTICE] looking for configuration file in current directory
[2017-02-17 13:11:51] [NOTICE] looking for configuration file in /etc
[2017-02-17 13:11:51] [NOTICE] configuration file found at: /etc/repmgr.conf
[2017-02-17 13:11:51] [INFO] connecting to database 'user=replication_user password=replication_pass host=database-node1-service dbname=replication_db port=5432 connect_timeout=2'
FATAL: the database system is shutting down
[2017-02-17 13:11:51] [ERROR] connection to database failed: FATAL: the database system is shutting down
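One conceivable mitigation for the race described above (a sketch, not the repo's wait_db): require several consecutive successful probes before declaring the database ready, so a postgres that is up only briefly before its restart doesn't count.

# HOST/USER/DB stand in for the real connection parameters
OK=0
while [ "$OK" -lt 3 ]; do
    if psql -h "$HOST" -U "$USER" -d "$DB" -c 'SELECT 1' >/dev/null 2>&1; then
        OK=$((OK + 1))   # one more consecutive success
    else
        OK=0             # reset on any failure
    fi
    sleep 5
done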
Right now Pgpool and Repmgr know nothing about each other, so there is a chance that after a new master is elected, pgpool still points to the old master.
To Be Done:
Don't allow a node to start with the master role after a crash, as the cluster might already have been recovered with a new leader/master.
We can use liveness probes from k8s to check the status of pgpool http://kubernetes.io/docs/user-guide/liveness/
or docker HEALTHCHECK https://docs.docker.com/engine/reference/builder/#healthcheck
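A sketch of what such a check could look like (SHOW pool_nodes exists in pgpool, but its output format and the env names here are assumptions that depend on the pgpool version):

#!/usr/bin/env bash
# Exit non-zero when pgpool reports no primary backend, so a k8s
# livenessProbe or docker HEALTHCHECK can restart the container.
set -euo pipefail
psql -h 127.0.0.1 -p 5432 -U "$CHECK_USER" -d template1 \
    -c 'SHOW pool_nodes' | grep -q primary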
I am trying to create a cloud-config example based on the docker-compose file, but I ran into some issues and got stuck on slave1 creation (note that I am not implementing pgpool yet).
Once the master is deployed it seems to be working properly, but when I deploy slave1 I get this:
>>> STARTING POSTGRES...
psql: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
>>> Auto-detected master name: ''
>>> Setting up repmgr...
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
>>> Previously Locked standby upstream node LOCKED_STANDBY=''
cat: /var/lib/postgresql/data/standby.lock: No such file or directory
2017/05/09 10:45:36 Waiting for host: tcp://10.10.2.4:5432
2017/05/09 10:45:36 Connected to tcp://10.10.2.4:5432
>>> Can not get REPLICATION_UPSTREAM_NODE_ID from LOCK file or by CURRENT_REPLICATION_PRIMARY_HOST=10.10.2.4
In master logs I get:
LOG: incomplete startup packet
LOG: incomplete startup packet
LOG: incomplete startup packet
cloud-config master:
#cloud-config
write_files:
  - path: /etc/postgres-master.env
    permissions: 0600
    owner: root
    content: |
      VOLUME_NAME=/media/pgdatam
      MASTER_CONTAINER_NAME=pgmaster
      PG_VERSION=9.6
      PARTNER_NODES="10.10.2.4,10.10.2.5"
      NODE_ID=1
      NODE_NAME=node1
      CLUSTER_NODE_NETWORK_NAME=pg01
      POSTGRES_PASSWORD=foo
      POSTGRES_USER=foo
      POSTGRES_DB=foo
      PG_CONFIGS="listen_addresses:'*'"
      CLUSTER_NAME=pg_cluster
      REPLICATION_DB=replication_db
      REPLICATION_USER=replication_user
      REPLICATION_PASSWORD=replication_pass
coreos:
  units:
    - name: media.mount
      command: start
      content: |
        [Unit]
        Description=Mount ephemeral to /media
        [Mount]
        What=/dev/sdc1
        Where=/media
        Type=ext4
    - name: postgres-master.service
      command: start
      enable: true
      content: |
        [Unit]
        Description=postgres-master
        After=network-online.target
        After=docker.service
        Description=Postgres Master
        Requires=network-online.target
        Requires=docker.service
        [Service]
        Restart=always
        RestartSec=5
        EnvironmentFile=/etc/postgres-master.env
        ExecStartPre=-/usr/bin/docker stop ${MASTER_CONTAINER_NAME}
        ExecStartPre=-/usr/bin/docker rm ${MASTER_CONTAINER_NAME}
        ExecStartPre=/usr/bin/docker pull paunin/postgresql-cluster-pgsql:${PG_VERSION}
        ExecStart=/usr/bin/docker run --name ${MASTER_CONTAINER_NAME} \
          -e NODE_ID=${NODE_ID} \
          -e NODE_NAME=${NODE_NAME} \
          -e CLUSTER_NODE_NETWORK_NAME=${CLUSTER_NODE_NETWORK_NAME} \
          -e POSTGRES_PASSWORD=${POSTGRES_PASSWORD} \
          -e POSTGRES_USER=${POSTGRES_USER} \
          -e POSTGRES_DB=${POSTGRES_DB} \
          -e CONFIGS=${PG_CONFIGS} \
          -e CLUSTER_NAME=${CLUSTER_NAME} \
          -e REPLICATION_DB=${REPLICATION_DB} \
          -e REPLICATION_USER=${REPLICATION_USER} \
          -e REPLICATION_PASSWORD=${REPLICATION_PASSWORD} \
          -p "5432:5432" \
          -v "${VOLUME_NAME}:/var/lib/postgresql/data" \
          paunin/postgresql-cluster-pgsql:${PG_VERSION}
cloud-config slave:
#cloud-config
write_files:
  - path: /etc/postgres-slave.env
    permissions: 0600
    owner: root
    content: |
      PARTNER_NODES="10.10.2.4,10.10.2.5"
      REPLICATION_PRIMARY_HOST=10.10.2.4
      NODE_ID=2
      NODE_NAME=node2
      CLUSTER_NODE_NETWORK_NAME=pg02
      VOLUME_NAME=/media/pgdatas
      PG_CONTAINER_NAME=pgslave1
      PG_VERSION=9.6
coreos:
  units:
    - name: media.mount
      command: start
      content: |
        [Unit]
        Description=Mount ephemeral to /media
        [Mount]
        What=/dev/sdc1
        Where=/media
        Type=ext4
    - name: postgres-slave.service
      command: start
      enable: true
      content: |
        [Unit]
        Description=postgres-slave
        After=network-online.target
        After=docker.service
        Description=Postgres Slave
        Requires=network-online.target
        Requires=docker.service
        [Service]
        Restart=always
        RestartSec=5
        EnvironmentFile=/etc/postgres-slave.env
        ExecStartPre=-/usr/bin/docker stop ${PG_CONTAINER_NAME}
        ExecStartPre=-/usr/bin/docker rm ${PG_CONTAINER_NAME}
        ExecStartPre=/usr/bin/docker pull paunin/postgresql-cluster-pgsql:${PG_VERSION}
        ExecStart=/usr/bin/docker run --name ${PG_CONTAINER_NAME} \
          -e NODE_ID=${NODE_ID} \
          -e NODE_NAME=${NODE_NAME} \
          -e CLUSTER_NODE_NETWORK_NAME=${CLUSTER_NODE_NETWORK_NAME} \
          -e PARTNER_NODES=${PARTNER_NODES} \
          -e REPLICATION_PRIMARY_HOST=${REPLICATION_PRIMARY_HOST} \
          -p "5432:5432" \
          -v "${VOLUME_NAME}:/var/lib/postgresql/data" \
          paunin/postgresql-cluster-pgsql:${PG_VERSION}
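One thing worth checking, based on the entrypoint message quoted in an earlier report ("For node with initial type standby you have to setup REPLICATION_UPSTREAM_NODE_ID"): the slave env file above never sets that variable. A hedged guess at a fix:

# append the missing variable to the slave's env file...
cat >> /etc/postgres-slave.env <<'EOF'
REPLICATION_UPSTREAM_NODE_ID=1
EOF
# ...and pass it through in postgres-slave.service:
#   -e REPLICATION_UPSTREAM_NODE_ID=${REPLICATION_UPSTREAM_NODE_ID} \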
Standby does not have enough time to start (cloning takes too much time) if the DB is big.
Need to make the wait_db utility configurable, e.g. via the format variable1:value1[,variable2:value2[, ...]].
Can use the PG_VERSION file as an indicator, or something else.
ls -al $PGDATA
total 128
drwx------ 19 postgres root 4096 Dec 28 03:00 .
drwxr-xr-x 3 postgres postgres 4096 Dec 28 02:57 ..
drwx------ 7 postgres postgres 4096 Dec 28 03:00 base
drwx------ 2 postgres postgres 4096 Dec 28 03:01 global
drwx------ 2 postgres postgres 4096 Dec 28 03:00 pg_clog
drwx------ 2 postgres postgres 4096 Dec 28 03:00 pg_commit_ts
drwx------ 2 postgres postgres 4096 Dec 28 03:00 pg_dynshmem
-rw------- 1 postgres postgres 4538 Dec 28 03:00 pg_hba.conf
-rw------- 1 postgres postgres 1636 Dec 28 03:00 pg_ident.conf
drwx------ 4 postgres postgres 4096 Dec 28 03:00 pg_logical
drwx------ 4 postgres postgres 4096 Dec 28 03:00 pg_multixact
drwx------ 2 postgres postgres 4096 Dec 28 03:00 pg_notify
drwx------ 2 postgres postgres 4096 Dec 28 03:00 pg_replslot
drwx------ 2 postgres postgres 4096 Dec 28 03:00 pg_serial
drwx------ 2 postgres postgres 4096 Dec 28 03:00 pg_snapshots
drwx------ 2 postgres postgres 4096 Dec 28 03:00 pg_stat
drwx------ 2 postgres postgres 4096 Dec 28 04:01 pg_stat_tmp
drwx------ 2 postgres postgres 4096 Dec 28 03:00 pg_subtrans
drwx------ 2 postgres postgres 4096 Dec 28 03:00 pg_tblspc
drwx------ 2 postgres postgres 4096 Dec 28 03:00 pg_twophase
-rw------- 1 postgres postgres 4 Dec 28 03:00 PG_VERSION
drwx------ 3 postgres postgres 4096 Dec 28 03:00 pg_xlog
-rw------- 1 postgres postgres 88 Dec 28 03:00 postgresql.auto.conf
-rw-r--r-- 1 root root 21753 Dec 28 03:00 postgresql.conf
-rw------- 1 postgres postgres 37 Dec 28 03:00 postmaster.opts
-rw------- 1 postgres postgres 86 Dec 28 03:00 postmaster.pid
should be else for this if
Have a health check for a node, to check if it's a false-master.
If all conditions return true...
Some tips: NODES_IN_CLUSTER, or get the nodes from repmgr cluster show.
This health-check should be used in k8s to shut down pods with the false-master role issue. #23 solves the problem of correct second node start.
Right now repmgr waits a certain amount of time for the DB on localhost instead of the real moment replication finishes.
We can make a lock mechanism which will wait until the initial replication is done, and only then launch repmgr.
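A rough sketch of that idea (the lock file name and the exact commands are assumptions, not the repo's code):

# After cloning, wait until postgres is actually up and replaying WAL
# before releasing the lock and starting the repmgr daemon.
until psql -h localhost -U postgres -tAc 'SELECT pg_is_in_recovery()' 2>/dev/null | grep -q t; do
    sleep 2
done
rm -f "$PGDATA/initial_sync.lock"   # hypothetical lock dropped by the clone step
repmgrd -f /etc/repmgr.conf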
Steps to reproduce: https://github.com/gorozcoh/docker
Connecting Odoo 10 front end to service database-pgpool-service backend do not working. Database creation using web browser fails with error:
pgpool FATAL: database does not exist.
Database creation direct on database-node1 using psql works fine and is replicated without issues.
Connecting Odoo 10 front end to service database-node1-service works without issues.
How do I change the NAMEDATALEN attribute?
Executing:
kubectl create -f ./k8s/database-service/node1-master.yaml
Error:
yaml: line 14: could not find expected ':'
Solution:
Replace ---- with ---
Environment Details:
Google Cloud Shell, Kubectl version 1.4.1
When using paunin/postgresql-cluster-pgsql in docker-compose, containers are stopped after a timeout with SIGKILL. I think this happens because PID 1 (bash /usr/local/bin/cluster/postgres/entrypoint.sh) is not forwarding signals.
I don't know how to properly shut down the postgres+repmgr combination. With postgres alone it's enough to send SIGTERM to the postgres root process (as in the postgres image, where the postgres process is PID 1).
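For what it's worth, the usual bash idiom for a PID 1 that relays signals looks roughly like this (a sketch, not the image's actual entrypoint):

#!/usr/bin/env bash
postgres "$@" &                        # start postgres in the background
PG_PID=$!
trap 'kill -TERM "$PG_PID"' TERM INT   # relay SIGTERM/SIGINT (fast shutdown)
wait "$PG_PID"                         # returns early when a signal arrives
trap - TERM INT
wait "$PG_PID"                         # wait for postgres to actually exit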
The current problem is that the master drops WAL segments regardless of the slaves, based only on the wal_keep_segments variable.
To Be Done:
pg_replication_slots for all partners in the cluster.
NB: https://blog.2ndquadrant.com/postgresql-9-4-slots/ - "Any wal_keep_segments parameter will be respected as a minimum for WAL retention."
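For reference, a minimal sketch of the slot approach (the slot name is made up):

# On the master: one physical slot per standby, so WAL is retained
# until that standby has consumed it.
psql -U postgres -c "SELECT pg_create_physical_replication_slot('node2_slot');"
# On the standby, in recovery.conf:
#   primary_slot_name = 'node2_slot'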
Using the docker run command with the variables -e REPLICATION_USER='mytestuser' -e REPLICATION_PASSWORD='mytestpassword' for the image paunin/postgresql-cluster-pgpool, the logs display:
Adding check user 'replication_user' for md5 auth
Adding user 'replication_user' as check user
Adding user 'replication_user' as health-check user
but /etc/pgpool2/pgpool.conf is still set to the default values:
sr_check_password = 'replication_pass'
sr_check_user = 'replication_user'
health_check_password = 'replication_pass'
health_check_user = 'replication_user'
The default values are always used.
docker run -e PCP_USER=postgres -e PCP_PASSWORD=xxx -e PGPOOL_START_DELAY=120 \
    -e REPLICATION_USER=xxx -e REPLICATION_PASSWORD=xxx \
    -e SEARCH_PRIMARY_NODE_TIMEOUT=5 -e DB_USERS=postgres:xxx \
    -e BACKENDS=0:xx.x.xx.x:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:xx.x.xx.x:5433:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER \
    -p 5434:5432 -p 9898:9898 --name pgpooltest paunin/postgresql-cluster-pgpool
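A quick way to see what actually landed in the generated config inside the running container (assuming the container name from the command above):

docker exec pgpooltest grep -E "sr_check_(user|password)|health_check_(user|password)" /etc/pgpool2/pgpool.conf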
Hi,
I tried to launch your docker-compose.yml.
Did I miss something?
Thanks!
(docker-compose version 1.8.1, build 878cff1)
(Docker version 1.12.1, build 23cf638)
[...]
pgmaster_1 | #------------------------------------------------------------------------------
pgmaster_1 | # AUTOGENERATED
pgmaster_1 | #------------------------------------------------------------------------------
pgmaster_1 |
pgmaster_1 | listen_addresses = '*'
pgmaster_1 | CREATE ROLE
pgmaster_1 |
pgmaster_1 | LOG: received fast shutdown request
pgmaster_1 | LOG: aborting any active transactions
pgmaster_1 | waiting for server to shut down....LOG: autovacuum launcher shutting down
pgmaster_1 | LOG: shutting down
pgmaster_1 | LOG: database system is shut down
pgmaster_1 | done
pgmaster_1 | server stopped
pgmaster_1 |
pgmaster_1 | PostgreSQL init process complete; ready for start up.
pgmaster_1 |
pgmaster_1 | FATAL: could not access file "repmgr_funcs": No such file or directory
pgmaster_1 | psql: could not connect to server: Connection refused
pgmaster_1 | Is the server running on host "pgmaster" (127.0.0.1) and accepting
pgmaster_1 | TCP/IP connections on port 5432?
pgmaster_1 | could not connect to server: Connection refused
pgmaster_1 | Is the server running on host "pgmaster" (172.24.0.2) and accepting
pgmaster_1 | TCP/IP connections on port 5432?
[...]
In my test case, when the master goes down, the master and standby change roles.
When connecting to the cluster through pgpool, the status of all those nodes is 3, and then clients cannot connect to the db cluster.
After I restart pgpool the node status is right, and clients can connect to the db cluster.
Can you help me? Have you seen this case?
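A hedged note, not from the thread: status 3 is pgpool's detached/"down" state. Rather than restarting pgpool, a recovered backend can usually be re-attached via PCP (exact flags depend on the pgpool version):

pcp_attach_node -h localhost -p 9898 -U pcp_user -n 1   # re-attach backend node 1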
Hi @paunin,
I'm trying to use your pg cluster in our k8s; it's very useful!
But I found a problem when I tried to rebuild the pg cluster from existing volumes (I'm using ceph rbd volumes to store pg data). Below are the steps to reproduce:
(1) Create new rbd volumes, then deploy the pg cluster with 1 primary and 2 standby nodes. This step works perfectly.
Here is my node1-master.yml
apiVersion: v1
kind: Pod
metadata:
name: application-db-node1
labels:
name: database
node: node1
system: application
spec:
containers:
-
name: application-db-node1
image: paunin/postgresql-cluster-pgsql
env: # those variables are not secret as it's just initial state of DB
-
name: "INITIAL_NODE_TYPE"
value: "master"
-
name: "NODE_ID"
value: "1"
-
name: "NODE_NAME"
value: "node1"
-
name: "CLUSTER_NODE_NETWORK_NAME"
value: "application-db-node1-service"
-
name: "CONFIGS"
value: "wal_keep_segments:250,shared_buffers:500MB"
# Work DB
- name: "POSTGRES_DB"
value: "application_db"
- name: "POSTGRES_USER"
value: "application_user"
- name: "POSTGRES_PASSWORD"
value: "application_pass"
# Cluster configs
- name: "CLUSTER_NAME"
value: "application_cluster"
- name: "REPLICATION_DB"
value: "replica_db"
- name: "REPLICATION_USER"
value: "replica_user"
- name: "REPLICATION_PASSWORD"
value: "replica_pass"
ports:
-
containerPort: 5432
volumeMounts:
- mountPath: /var/lib/postgresql/data
name: data
volumes:
- name: "data"
rbd:
monitors:
- "172.22.111.172:6789"
- "172.22.111.173:6789"
- "172.22.111.174:6789"
pool: "rbd"
image: "pgdb"
user: "admin"
secretRef:
name: "ceph-secret"
keyring: "/etc/ceph/ceph.client.admin.keyring"
fsType: "xfs"
readOnly: false
(2) Do some db operations, e.g. create some db tables...
(3) Delete all the pods using the "kubectl delete -f" command; now all the pods are gone, and the db data is stored in the k8s rbd volumes.
(4) Rebuild the pg cluster with these rbd volumes, but I got the following errors, and node1-master cannot start:
root@k8s-master1:~/workspaces/postgres-repmgr-cluster/k8s# kubectl logs -f application-db-node1
>>> Data folder is not empty /var/lib/postgresql/data:
total 68
drwx------ 19 postgres root 4096 Jan 9 08:06 .
drwxr-xr-x 3 postgres postgres 4096 Dec 29 14:32 ..
drwx------ 7 postgres postgres 62 Jan 9 08:04 base
drwx------ 2 postgres postgres 4096 Jan 9 08:06 global
drwx------ 2 postgres postgres 17 Jan 9 08:04 pg_clog
drwx------ 2 postgres postgres 6 Jan 9 08:03 pg_commit_ts
drwx------ 2 postgres postgres 6 Jan 9 08:03 pg_dynshmem
-rwx------ 1 postgres postgres 4534 Jan 9 08:04 pg_hba.conf
-rwx------ 1 postgres postgres 1636 Jan 9 08:04 pg_ident.conf
drwx------ 4 postgres postgres 37 Jan 9 08:03 pg_logical
drwx------ 4 postgres postgres 34 Jan 9 08:03 pg_multixact
drwx------ 2 postgres postgres 17 Jan 9 08:06 pg_notify
drwx------ 2 postgres postgres 6 Jan 9 08:03 pg_replslot
drwx------ 2 postgres postgres 6 Jan 9 08:03 pg_serial
drwx------ 2 postgres postgres 6 Jan 9 08:03 pg_snapshots
drwx------ 2 postgres postgres 6 Jan 9 08:03 pg_stat
drwx------ 2 postgres postgres 76 Jan 9 08:17 pg_stat_tmp
drwx------ 2 postgres postgres 17 Jan 9 08:04 pg_subtrans
drwx------ 2 postgres postgres 6 Jan 9 08:03 pg_tblspc
drwx------ 2 postgres postgres 6 Jan 9 08:03 pg_twophase
-rwx------ 1 postgres postgres 4 Jan 9 08:03 PG_VERSION
drwx------ 3 postgres postgres 4096 Jan 9 08:15 pg_xlog
-rwx------ 1 postgres postgres 88 Jan 9 08:04 postgresql.auto.conf
-rwx------ 1 postgres root 21816 Jan 9 08:04 postgresql.conf
-rwx------ 1 postgres postgres 37 Jan 9 08:06 postmaster.opts
-rw------- 1 postgres postgres 86 Jan 9 08:06 postmaster.pid
>>> Postgresql data folder structure detected!
>>> Setting up repmgr...
>>> Adding loopback '127.0.0.1 application-db-node1-service'
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
>>> Sending in background postgres start...
>>> Waiting postgres on this node to start repmgr...
>>> Wait db replica_db on application-db-node1-service:5432(user: replica_user,password: *******), will try 50 times with delay 5 seconds
psql: could not connect to server: Connection refused
Is the server running on host "application-db-node1-service" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
>>>>>> Db replica_db is still not accessable on application-db-node1-service:5432 (will try 50 times)
LOG: database system was interrupted; last known up at 2017-01-09 08:14:42 UTC
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/4000050
LOG: invalid record length at 0/6001098
LOG: redo done at 0/6001060
LOG: last completed transaction was at log time 2017-01-09 08:16:14.522034+00
LOG: MultiXact member wraparound protections are now enabled
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
>>>>>> Db replica_db exists on application-db-node1-service:5432!
>>> Registering node with initial role master
[2017-01-09 08:18:42] [INFO] connecting to master database
[2017-01-09 08:18:43] [INFO] retrieving node list for cluster 'application_cluster'
WARNING: there is already a transaction in progress
WARNING: there is already a transaction in progress
[2017-01-09 08:18:43] [NOTICE] deleting existing master record with id 1
ERROR: update or delete on table "repl_nodes" violates foreign key constraint "repl_nodes_upstream_node_id_fkey" on table "repl_nodes"
DETAIL: Key (id)=(1) is still referenced from table "repl_nodes".
STATEMENT: DELETE FROM "repmgr_application_cluster".repl_nodes WHERE id = 1
[2017-01-09 08:18:43] [ERROR] Unable to delete node record: ERROR: update or delete on table "repl_nodes" violates foreign key constraint "repl_nodes_upstream_node_id_fkey" on table "repl_nodes"
DETAIL: Key (id)=(1) is still referenced from table "repl_nodes".
Do you have any idea about that? Thanks a lot!
This flow should be taken care of:
https://github.com/paunin/postgres-docker-cluster/blob/master/FLOWS.md#split-brain-isolated-master
Don't allow pgpool to stay alive with a number of backends less than... REQUIRE_MIN_BACKENDS? Or some custom value.
I am trying to change the exposed port of pgpool on the host machine. You are using port 5440 as the default for pgpool on the host machine. When I change it to use 5432 for pgpool, 5440 for pgmaster and 5441 for pgslave, the cluster won't run as expected.
I am testing in a swarm cluster, running the following command:
$ docker stack deploy --compose-file=docker-compose.yml postgres
I have made some changes in the docker-compose.yml. In short: I am using only three nodes, I have added the deploy section and removed the build section to support swarm, plus some changes related to the ports, as I said earlier.
Here is my docker-compose.yml:
version: '3'
networks:
  cluster:
services:
  postgres-master:
    image: caiofct/postgresdockercluster_pgmaster
    # build:
    #   context: .
    #   dockerfile: Postgres-latest.Dockerfile
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.hostname == postgres-master
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
    environment:
      PARTNER_NODES: "postgres-master,postgres-slave"
      NODE_ID: 1 # Integer number of node
      NODE_NAME: pgm # Node name
      CLUSTER_NODE_NETWORK_NAME: postgres-master # (default: hostname of the node)
      NODE_PRIORITY: 100 # (default: 100)
      # database we want to use for application
      POSTGRES_PASSWORD: monkey_pass
      POSTGRES_USER: monkey_user
      POSTGRES_DB: monkey_db
      CONFIGS: "listen_addresses:'*'"
      # in format variable1:value1[,variable2:value2[,...]]
      # defaults:
      CLUSTER_NAME: pg_cluster # default is pg_cluster
      REPLICATION_DB: replication_db # default is replication_db
      REPLICATION_USER: replication_user # default is replication_user
      REPLICATION_PASSWORD: replication_pass # default is replication_pass
    ports:
      - 5440:5432
    networks:
      - cluster
  postgres-slave:
    image: caiofct/postgresdockercluster_pgslave1
    # build:
    #   context: .
    #   dockerfile: Postgres-latest.Dockerfile
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.hostname == postgres-slave
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
    environment:
      PARTNER_NODES: "postgres-master,postgres-slave"
      REPLICATION_PRIMARY_HOST: postgres-master
      NODE_ID: 2
      NODE_NAME: pgs1
      CLUSTER_NODE_NETWORK_NAME: postgres-slave # (default: hostname of the node)
      REPLICATION_PRIMARY_PORT: 5432
    ports:
      - 5441:5432
    networks:
      - cluster
  pgpool:
    image: caiofct/postgresdockercluster_pgpool
    # build:
    #   context: .
    #   dockerfile: Pgpool.Dockerfile
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.hostname == pgpool
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
    environment:
      PCP_USER: pcp_user
      PCP_PASSWORD: pcp_pass
      WAIT_BACKEND_TIMEOUT: 60
      CHECK_USER: monkey_user
      CHECK_PASSWORD: monkey_pass
      CHECK_PGCONNECT_TIMEOUT: 3 # timeout for checking if primary node is healthy
      DB_USERS: monkey_user:monkey_pass # in format user:password[,user:password[...]]
      BACKENDS: "0:postgres-master:5440:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:postgres-slave:5441:::" #,4:pgslaveDOES_NOT_EXIST::::
      # in format num:host:port:weight:data_directory:flag[,...]
      # defaults:
      #   port: 5432
      #   weight: 1
      #   data_directory: /var/lib/postgresql/data
      #   flag: ALLOW_TO_FAILOVER
      REQUIRE_MIN_BACKENDS: 1 # minimal number of backends to start pgpool (some might be unreachable)
      CONFIGS: "num_init_children:250,max_pool:4"
      # in format variable1:value1[,variable2:value2[,...]]
      # used for pgpool.conf file
    ports:
      - 5432:5432
      - 9898:9898 # PCP
    networks:
      - cluster
When I deploy the cluster it won't work as expected, especially because pgpool can't find the two nodes that I have. The master and slave are starting and working as expected, but pgpool can't find them.
I would like to know why you used port 5440 for pgpool and port 5432 for pgmaster.
Sorry if I wasn't very clear.
Thanks in advance.
Cheers.
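A hedged observation, in case it helps: inside the overlay network the containers still listen on 5432; the published ports 5440/5441 exist only on the hosts. So the BACKENDS entries should probably target the service names on port 5432, for example:

BACKENDS="0:postgres-master:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:postgres-slave:5432:::"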
I'm having the following error when building your docker-compose.yml locally:
E: Version '2.2-1.pgdg80+1' for 'barman' was not found
I'm bypassing the issue using your generated dockerfile for barman, but maybe this version is no longer available...
Hi,
I noticed that only the master has a volume attached.
So after the master dies, how do you make sure you do not lose everything on the other nodes?
Hi!
I think you don't have to open the port on the host. If the different nodes are in the same docker network, you can just expose it like this:

#<<< Branch 1
pgslave1:
  # image: paunin/postgresql-cluster-pgsql
  build:
    context: .
    dockerfile: Pgsql.Dockerfile
  depends_on:
    - pgmaster
  environment:
    FORCE_CLEAN: 1
    REPLICATION_PRIMARY_HOST: pgmaster
    NODE_ID: 2
    NODE_NAME: node2
    CLUSTER_NODE_NETWORK_NAME: pgslave1 # (default: hostname of the node)
    REPLICATION_UPSTREAM_NODE_ID: 1
  expose:
    - 5432
  networks:
    cluster:
      aliases:
        - pgslave1

Or maybe only publish the pgpool port on the host, if you want to connect to it over the WAN.
In is_major_master.sh:
...
echo ">>> Will ask all nodes in the cluster"
NODES=$(PGPASSWORD=$REPLICATION_PASSWORD psql -h $CLUSTER_NODE_NETWORK_NAME -U $REPLICATION_USER $REPLICATION_DB -c "SELECT conninfo FROM repmgr_$CLUSTER_NAME.repl_show_nodes WHERE cluster='$CLUSTER_NAME'" | grep host | awk '{print $3}' | cut -d "=" -f2)
Only the first value in NODES gets checked by the for NODE in "${NODES[@]}"; do loop. It seems like this needs to be declared similarly to MASTER_MAP? Something like:
unset NODES
declare -A NODES
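An alternative sketch (an assumption about the script's intent, not a tested patch): capture the query output as a bash array so the loop sees every node:

NODES=($(PGPASSWORD=$REPLICATION_PASSWORD psql -h "$CLUSTER_NODE_NETWORK_NAME" \
    -U "$REPLICATION_USER" "$REPLICATION_DB" -tA \
    -c "SELECT conninfo FROM repmgr_$CLUSTER_NAME.repl_show_nodes WHERE cluster='$CLUSTER_NAME'" \
    | grep -o 'host=[^ ]*' | cut -d= -f2))
for NODE in "${NODES[@]}"; do
    echo ">>> will check $NODE"   # now iterates over every host
done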
Depends on kubernetes/kubernetes#40651, as each node should always use the same volume.
How would I get persisted data working in my docker-compose file?
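A minimal sketch with plain docker (the volume name is made up); in docker-compose the equivalent is a named volume mounted at the same path under the service's volumes key:

docker volume create pgmaster_data
docker run --env NODE_ID=1 --env NODE_NAME=node1 \
    -v pgmaster_data:/var/lib/postgresql/data \
    paunin/postgresql-cluster-pgsql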
Hi!
As a newbie in postgresql replication, I'm trying to run the k8s example in minikube. Everything is working, but once I pass a query to the database using the following credentials in my node-express app, it returns "md5 authentication failed":
config: {
  user: 'd2lkZQ==',
  password: 'cGFzcw==',
  database: 'mysystem.mysystem',
  host: 'mysystem-pgpool-service',
  port: 5432
}
Thank you very much in advance.
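A hedged guess at the cause: those credentials look like base64-encoded Kubernetes Secret values, which clients must use in decoded form:

echo 'd2lkZQ==' | base64 -d   # -> wide
echo 'cGFzcw==' | base64 -d   # -> pass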
It seems that it gets stuck trying to connect to itself (am I wrong?):
Here are the logs from the pod:
>>> Data folder is not empty /var/lib/postgresql/data:
total 24
drwxr-xr-x 3 root root 4096 Dec 27 23:00 .
drwxr-xr-x 3 postgres postgres 4096 Dec 27 21:03 ..
drwx------ 2 root root 16384 Dec 27 23:00 lost+found
>>> Setting up repmgr...
>>> Adding loopback '127.0.0.1 application-db-node1-service'
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
>>> Sending in background postgres start...
>>> Waiting postgres on this node to start repmgr...
>>> Wait db replica_db on application-db-node1-service:5432(user: replica_user,password: *******), will try 50 times with delay 5 seconds
psql: could not connect to server: Connection refused
Is the server running on host "application-db-node1-service" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
>>>>>> Db replica_db is still not accessable on application-db-node1-service:5432 (will try 50 times)
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
initdb: directory "/var/lib/postgresql/data" exists but is not empty
It contains a lost+found directory, perhaps due to it being a mount point.
Using a mount point directly as the data directory is not recommended.
Create a subdirectory under the mount point.
psql: could not connect to server: Connection refused
Is the server running on host "application-db-node1-service" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
>>>>>> Db replica_db is still not accessable on application-db-node1-service:5432 (will try 49 times)
psql: could not connect to server: Connection refused
Is the server running on host "application-db-node1-service" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
>>>>>> Db replica_db is still not accessable on application-db-node1-service:5432 (will try 48 times)
psql: could not connect to server: Connection refused
Is the server running on host "application-db-node1-service" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
>>>>>> Db replica_db is still not accessable on application-db-node1-service:5432 (will try 47 times)
psql: could not connect to server: Connection refused
Is the server running on host "application-db-node1-service" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
>>>>>> Db replica_db is still not accessable on application-db-node1-service:5432 (will try 46 times)
psql: could not connect to server: Connection refused
Is the server running on host "application-db-node1-service" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
>>>>>> Db replica_db is still not accessable on application-db-node1-service:5432 (will try 45 times)
psql: could not connect to server: Connection refused
Is the server running on host "application-db-node1-service" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
>>>>>> Db replica_db is still not accessable on application-db-node1-service:5432 (will try 44 times)
Any idea?
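Following initdb's own hint in the log above, one workaround sketch is to point PGDATA at a subdirectory of the mount instead of the mount point itself (assuming the image honours PGDATA like the official postgres image it appears to be based on; in k8s a volumeMount subPath achieves the same):

docker run --env PGDATA=/var/lib/postgresql/data/pgdata \
    -v pgdata_volume:/var/lib/postgresql/data \
    paunin/postgresql-cluster-pgsql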
If I build the docker image I receive the following error:
Get:1 http://apt.postgresql.org jessie-pgdg InRelease [56.4 kB]
Get:2 http://security.debian.org jessie/updates InRelease [63.1 kB]
Ign http://deb.debian.org jessie InRelease
Get:3 http://deb.debian.org jessie-updates InRelease [145 kB]
Get:4 http://apt.postgresql.org jessie-pgdg/main amd64 Packages [133 kB]
Get:5 http://deb.debian.org jessie Release.gpg [2373 B]
Get:6 http://security.debian.org jessie/updates/main amd64 Packages [547 kB]
Get:7 http://deb.debian.org jessie Release [148 kB]
Get:8 http://deb.debian.org jessie-updates/main amd64 Packages [23.1 kB]
Get:9 http://deb.debian.org jessie/main amd64 Packages [9063 kB]
Fetched 10.2 MB in 4s (2285 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
E: Version '2.3-1.pgdg80+1' for 'barman' was not found
ERROR: Service 'pgbackup' failed to build: The command '/bin/sh -c wget -q https://www.postgresql.org/media/keys/ACCC4CF8.asc -O - | apt-key add - && sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ jessie-pgdg main" >> /etc/apt/sources.list.d/pgdg.list' && apt-get update && apt-get install -y libffi-dev libssl-dev barman=$BARMAN_VERSION openssh-server' returned a non-zero code: 100
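A possible workaround sketch, assuming BARMAN_VERSION is a build argument (the RUN command in the error suggests it is a variable; <available-version> is a placeholder):

# list the barman versions actually available in the pgdg repo
apt-cache madison barman
# then rebuild pinned to one of those versions
docker-compose build --build-arg BARMAN_VERSION=<available-version> pgbackup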