
kubernetes-mysql-cluster's Introduction

MySQL with automatic failover on Kubernetes

This is a galera cluster setup, with plain manifests. We actually use it in production, though with modest loads.

Get started

First create a storage class mysql-data. See examples in ./configure/. You might also want to edit the volume size request, at the bottom of ./50mariadb.yml.
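The repo doesn't prescribe a particular provisioner. As a hypothetical sketch (the provisioner and type below are assumptions for GCE; substitute whatever your cluster provides), a minimal mysql-data storage class could look like:

```yaml
# Hypothetical StorageClass named mysql-data, matching the name the
# statefulset's volume claim template requests. Provisioner and parameters
# are examples only — adapt to your cluster.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: mysql-data
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
```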

Then: kubectl apply -f .

Cluster Health

Readiness and liveness probes will only assert client-level health of individual pods. Watch logs for "sst" or "Quorum results", or run this quick check:

for i in 0 1 2; do kubectl -n mysql exec mariadb-$i -- mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';" -N; done

Port 9104 exposes plaintext metrics in Prometheus scrape format.

# with kubectl -n mysql port-forward mariadb-0 9104:9104
$ curl -s http://localhost:9104/metrics | grep ^mysql_global_status_wsrep_cluster_size
mysql_global_status_wsrep_cluster_size 3

A reasonable alert is on mysql_global_status_wsrep_cluster_size staying below the desired number of replicas.
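As a sketch of such an alert, assuming a standard Prometheus rule-file setup (the group name, duration, and labels below are assumptions to tune for your environment):

```yaml
# Hypothetical Prometheus alerting rule. Fires when the reported galera
# cluster size stays below 3 (the desired replica count in this setup)
# for five minutes.
groups:
  - name: mariadb-galera
    rules:
      - alert: GaleraClusterDegraded
        expr: mysql_global_status_wsrep_cluster_size < 3
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Galera cluster size is below the desired 3 replicas"
```

Alerting on the absence of the metric entirely (e.g. with Prometheus's absent()) covers the case where the exporter itself is down.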

Cluster un-health

We need to assume a couple of things here. First and foremost: Production clusters are configured so that the statefulset pods do not go down together.

  • Pods are properly spread across nodes.
  • Nodes are spread across multiple availability zones.

Let's also assume that there is monitoring. Any wsrep_cluster_size issue (see above), or absence of wsrep_cluster_size should lead to a human being paged.

Rarity combined with manual attention means that this statefulset can, and should, avoid attempts at automatic recovery. The reason: we can't test failure modes properly, as they depend on the Kubernetes setup. Automation may appoint the wrong leader (losing writes) or cause split-brain situations.

We can however support detection in the init script.

It is normal operation to scale down to two instances (or even one, though nodes should be considered ephemeral, so don't rely on that) and up to any number of replicas.
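To see why two out of three is safe while one is not: galera requires a strict majority of nodes for quorum. A tiny hypothetical helper (not part of this repo's scripts) illustrates the arithmetic:

```shell
#!/bin/sh
# quorum_ok SIZE DESIRED — succeeds when SIZE nodes form a strict majority
# of DESIRED, i.e. the cluster can still reach quorum.
# Hypothetical helper for illustration only.
quorum_ok() {
  size=$1
  desired=$2
  [ "$size" -gt $(( desired / 2 )) ]
}

# 2 of 3 keeps quorum; 1 of 3 does not.
quorum_ok 2 3 && echo "2/3: quorum"
quorum_ok 1 3 || echo "1/3: no quorum"
```

So scaling a three-node cluster down to two keeps write availability, while losing two of three nodes leaves a non-primary component that needs manual action.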

phpMyAdmin

Carefully consider the security implications before you create this. Note that it uses a non-official image.

kubectl apply -f myadmin/

phpMyAdmin has a login page that requires a MySQL user. To allow login (with full access), create a user with a password of your choice:

kubectl -n mysql exec mariadb-0 -- mysql -e "CREATE USER 'phpmyadmin'@'%' IDENTIFIED BY 'my-admin-pw'; GRANT ALL ON *.* TO 'phpmyadmin'@'%' WITH GRANT OPTION;"

kubernetes-mysql-cluster's People

Contributors

atamon, jacobh2, solsson


kubernetes-mysql-cluster's Issues

Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock'

Events:
Type Reason Age From Message


Normal Scheduled 8m default-scheduler Successfully assigned mariadb-1 to node3
Normal SuccessfulMountVolume 8m kubelet, node3 MountVolume.SetUp succeeded for volume "conf"
Normal SuccessfulMountVolume 8m kubelet, node3 MountVolume.SetUp succeeded for volume "initdb"
Normal SuccessfulMountVolume 8m kubelet, node3 MountVolume.SetUp succeeded for volume "default-token-m5z8x"
Normal SuccessfulMountVolume 8m (x2 over 8m) kubelet, node3 MountVolume.SetUp succeeded for volume "pvc-6a2124d5-3ae5-11e8-8dec-001e9098365d"
Normal Pulling 8m kubelet, node3 pulling image "mariadb:10.2.12@sha256:862de06a9b35f001e87bbefbb49008e84a59c4afd089c9a320947a9ae0e7cf1a"
Normal Pulled 3m kubelet, node3 Successfully pulled image "mariadb:10.2.12@sha256:862de06a9b35f001e87bbefbb49008e84a59c4afd089c9a320947a9ae0e7cf1a"
Normal Created 3m kubelet, node3 Created container
Normal Started 3m kubelet, node3 Started container
Normal Pulling 3m kubelet, node3 pulling image "prom/mysqld-exporter@sha256:a1eda24a95f09a817f2cf39a7fa3d506df88e76ebdc08c0293744ebaa546e3ab"
Normal Pulled 3m kubelet, node3 Successfully pulled image "prom/mysqld-exporter@sha256:a1eda24a95f09a817f2cf39a7fa3d506df88e76ebdc08c0293744ebaa546e3ab"
Normal Started 3m kubelet, node3 Started container
Normal Created 3m kubelet, node3 Created container
Normal Pulled 2m (x2 over 3m) kubelet, node3 Container image "mariadb:10.2.12@sha256:862de06a9b35f001e87bbefbb49008e84a59c4afd089c9a320947a9ae0e7cf1a" already present on machine
Normal Started 2m (x2 over 3m) kubelet, node3 Started container
Normal Created 2m (x2 over 3m) kubelet, node3 Created container
Warning Unhealthy 1m (x6 over 3m) kubelet, node3 Readiness probe failed: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

system error: 104 /bin/sh: 1: [: =: argument expected

Events:
Type Reason Age From Message


Normal Scheduled 2m default-scheduler Successfully assigned mariadb-1 to node5
Normal SuccessfulMountVolume 2m kubelet, node5 MountVolume.SetUp succeeded for volume "initdb"
Normal SuccessfulMountVolume 2m kubelet, node5 MountVolume.SetUp succeeded for volume "conf"
Normal SuccessfulMountVolume 2m kubelet, node5 MountVolume.SetUp succeeded for volume "default-token-m5z8x"
Normal SuccessfulMountVolume 2m (x2 over 2m) kubelet, node5 MountVolume.SetUp succeeded for volume "pvc-47bb437f-3aec-11e8-8dec-001e9098365d"
Normal Pulled 2m kubelet, node5 Container image "mariadb:10.2.12@sha256:862de06a9b35f001e87bbefbb49008e84a59c4afd089c9a320947a9ae0e7cf1a" already present on machine
Normal Created 2m kubelet, node5 Created container
Normal Started 2m kubelet, node5 Started container
Normal Pulled 2m kubelet, node5 Container image "mariadb:10.2.12@sha256:862de06a9b35f001e87bbefbb49008e84a59c4afd089c9a320947a9ae0e7cf1a" already present on machine
Normal Created 2m kubelet, node5 Created container
Normal Started 2m kubelet, node5 Started container
Normal Pulled 2m kubelet, node5 Container image "prom/mysqld-exporter@sha256:a1eda24a95f09a817f2cf39a7fa3d506df88e76ebdc08c0293744ebaa546e3ab" already present on machine
Normal Created 2m kubelet, node5 Created container
Normal Started 2m kubelet, node5 Started container
Warning Unhealthy 1m kubelet, node5 Readiness probe failed: ERROR 2013 (HY000): Lost connection to MySQL server at 'handshake: reading inital communication packet', system error: 104
/bin/sh: 1: [: =: argument expected

Bitnami stack in unrecoverable state after a node termination

We run ephemeral nodes with termination handlers and have survived thousands of node terminations with v2.1.0 of this repo. On two or three occasions recovery has required manual intervention because we were unlucky enough to lose two out of three pods concurrently. A simple method of recovery has been to scale down to zero and back up to X>=3 again.

With the bitnami stack #35 we ended up in an unrecoverable state after a single node termination.

One newly started pod would join the galera cluster but fail to complete SST:

[Warning] WSREP: Member 1.0 (ystack-mariadb-galera-0) requested state transfer from '*any*', but it is impossible to select State Transfer donor: Resource temporarily unavailable

With the other pods appearing Ready, the failing pod restarted into this state:

2021-06-24 16:02:11 2 [Note] WSREP: Server status change joiner -> initializing
2021-06-24 16:02:11 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-06-24 16:02:11 0 [Note] mysqld: Aria engine: starting recovery
recovered pages: 0% 10% 20% 41% 53% 65% 80% 92% 100% (0.0 seconds); tables to flush: 2 1 0
 (0.0 seconds); 
2021-06-24 16:02:11 0 [Note] mysqld: Aria engine: recovery done
2021-06-24 16:02:11 0 [Warning] The parameter innodb_file_format is deprecated and has no effect. It may be removed in future releases. See https://mariadb.com/kb/en/library/xtradbinnodb-file-format/
2021-06-24 16:02:11 0 [Warning] The parameter innodb_log_files_in_group is deprecated and has no effect.
2021-06-24 16:02:11 0 [Note] InnoDB: Uses event mutexes
2021-06-24 16:02:11 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2021-06-24 16:02:11 0 [Note] InnoDB: Number of pools: 1
2021-06-24 16:02:11 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2021-06-24 16:02:11 0 [Note] mysqld: O_TMPFILE is not supported on /opt/bitnami/mariadb/tmp (disabling future attempts)
2021-06-24 16:02:11 0 [Note] InnoDB: Using Linux native AIO
2021-06-24 16:02:11 0 [Note] InnoDB: Initializing buffer pool, total size = 2147483648, chunk size = 134217728
2021-06-24 16:02:11 0 [Note] InnoDB: Completed initialization of buffer pool
2021-06-24 16:02:11 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 134217728 bytes
2021-06-24 16:02:12 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
2021-06-24 16:02:12 0 [Note] InnoDB: New log file created, LSN=151017
2021-06-24 16:02:12 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 1 row operations to undo
2021-06-24 16:02:12 0 [Note] InnoDB: Trx id counter is 235104
2021-06-24 16:02:12 0 [Note] InnoDB: 128 rollback segments are active.
2021-06-24 16:02:12 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
2021-06-24 16:02:12 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2021-06-24 16:02:12 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2021-06-24 16:02:12 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2021-06-24 16:02:12 0 [ERROR] InnoDB: preallocating 12582912 bytes for file ./ibtmp1 failed with error 28
2021-06-24 16:02:12 0 [ERROR] InnoDB: Could not set the file size of './ibtmp1'. Probably out of disk space
2021-06-24 16:02:12 0 [ERROR] InnoDB: Unable to create the shared innodb_temporary
2021-06-24 16:02:12 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
210624 16:02:12 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.

Server version: 10.5.10-MariaDB-log
key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=0
max_threads=502
thread_count=2
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1137879 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
2021-06-24 16:02:12 0 [Note] InnoDB: Rolled back recovered transaction 235103
2021-06-24 16:02:12 0 [Note] InnoDB: Rollback of non-prepared transactions completed
stack_bottom = 0x0 thread_stack 0x49000
/opt/bitnami/mariadb/sbin/mysqld(my_print_stacktrace+0x2e)[0x5646a8de15fe]
/opt/bitnami/mariadb/sbin/mysqld(handle_fatal_signal+0x485)[0x5646a889e735]

I suppose there are recovery tools for this state, but we're reverting to maintaining our own stack.

No peers found, but data exists

My mariadb cluster exceeded max connections and crashed. I deleted the pods so they would be recreated, but they couldn't start. Checking the init container init-config, I found these lines:

$ kubectl -n mysql logs -f mariadb-0 -c init-config
This is pod 0 (mariadb-0.mariadb.mysql.svc.cluster.local ) for statefulset mariadb.mysql.svc.cluster.local
This is the 1st statefulset pod. Checking if the statefulset is down ...
+ HOST_ID=0
++ dnsdomainname -d
+ STATEFULSET_SERVICE=mariadb.mysql.svc.cluster.local
++ dnsdomainname -A
+ POD_FQDN='mariadb-0.mariadb.mysql.svc.cluster.local '
+ echo 'This is pod 0 (mariadb-0.mariadb.mysql.svc.cluster.local ) for statefulset mariadb.mysql.svc.cluster.local'
+ '[' -z /data/db ']'
+ SUGGEST_EXEC_COMMAND='kubectl --namespace=mysql exec -c init-config mariadb-0 --'
+ [[ mariadb.mysql.svc.cluster.local = mariadb.* ]]
+ '[' 0 -eq 0 ']'
+ echo 'This is the 1st statefulset pod. Checking if the statefulset is down ...'
+ getent hosts mariadb
+ '[' 2 -eq 2 ']'
+ '[' '!' -d /data/db/mysql ']'
+ set +x
----- ACTION REQUIRED -----
No peers found, but data exists. To start in wsrep_new_cluster mode, run:
  kubectl --namespace=mysql exec -c init-config mariadb-0 -- touch /tmp/confirm-new-cluster
Or to start in recovery mode, to see replication state, run:
  kubectl --namespace=mysql exec -c init-config mariadb-0 -- touch /tmp/confirm-recover
Or to try a regular start (for example after recovery + manual intervention), run:
  kubectl --namespace=mysql exec -c init-config mariadb-0 -- touch /tmp/confirm-resume
Waiting for response ...

So, I tried all three of the above options, but no luck. The new pods always end up in CrashLoopBackOff.

Any suggestion would be much appreciated.

Clients unable to authenticate after pods' initial state transfer

I did a rolling replace of pods, now with empty (larger) volumes. After that, clients received ER_HOST_NOT_PRIVILEGED and MariaDB logged lines like 2018-07-31 5:18:06 140261701723904 [Warning] IP address '10.0.7.26' could not be resolved: Name or service not known.

The rolling replace of persistent volumes meant all nodes started afresh from state transfer, logged as:

2018-07-31  4:42:43 139965199542016 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 68306)
2018-07-31  4:42:43 139965605660416 [Note] WSREP: State transfer required: 
	Group state: cb1f8096-65a4-11e8-95a2-1a202ed178a3:68306
	Local state: 00000000-0000-0000-0000-000000000000:-1
2018-07-31  4:42:43 139965605660416 [Note] WSREP: New cluster view: global state: cb1f8096-65a4-11e8-95a2-1a202ed178a3:68306, view# 24: Primary, number of nodes: 3, my index: 0, protocol version 3
2018-07-31  4:42:43 139965605660416 [Warning] WSREP: Gap in state sequence. Need state transfer.
2018-07-31  4:42:43 139965191149312 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.0.7.43' --datadir '/data/db/'   --parent '103'  '' '
MySQL init process in progress...
2018-07-31  4:42:43 139965605660416 [Note] WSREP: Prepared SST request: rsync|10.0.7.43:4444/rsync_sst
2018-07-31  4:42:43 139965605660416 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-07-31  4:42:43 139965605660416 [Note] WSREP: REPL Protocols: 8 (3, 2)
2018-07-31  4:42:43 139965605660416 [Note] WSREP: Assign initial position for certification: 68306, protocol version: 3
2018-07-31  4:42:43 139965456377600 [Note] WSREP: Service thread queue flushed.
2018-07-31  4:42:43 139965605660416 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (cb1f8096-65a4-11e8-95a2-1a202ed178a3): 1 (Operation not permitted)
	 at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2018-07-31  4:42:43 139965199542016 [Note] WSREP: Member 0.0 (mariadb-2) requested state transfer from '*any*'. Selected 1.0 (mariadb-1)(SYNCED) as donor.
2018-07-31  4:42:43 139965199542016 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 68306)
2018-07-31  4:42:43 139965605660416 [Note] WSREP: Requesting state transfer: success, donor: 1
2018-07-31  4:42:43 139965605660416 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> cb1f8096-65a4-11e8-95a2-1a202ed178a3:68306
MySQL init process in progress...
2018-07-31  4:42:45 139965422806784 [Note] WSREP: (28f31447, 'tcp://0.0.0.0:4567') turning message relay requesting off
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
MySQL init process in progress...
2018-07-31  4:43:08 139965199542016 [Note] WSREP: 1.0 (mariadb-1): State transfer to 0.0 (mariadb-2) complete.
2018-07-31  4:43:08 139965199542016 [Note] WSREP: Member 1.0 (mariadb-1) synced with group.
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 152 (20180731 04:43:09.247)
MySQL init process in progress...
WSREP_SST: [INFO] Joiner cleanup done. (20180731 04:43:09.755)
2018-07-31  4:43:09 139965607159744 [Note] WSREP: SST complete, seqno: 68306

It turns out that after the state transfer, SELECT host, user FROM mysql.user; returns an empty user table.

The solution was to re-create the users, but oddly, if I didn't do a DROP first I got ERROR 1396 (HY000): Operation CREATE USER failed for 'myuser'@'%':

MariaDB [(none)]> DROP USER 'myuser';
Query OK, 0 rows affected (0.03 sec)

MariaDB [(none)]> 
MariaDB [(none)]> CREATE USER 'myuser'@'%'  ...
Query OK, 0 rows affected (0.16 sec)

MariaDB [(none)]> 
MariaDB [(none)]> GRANT ALL PRIVILEGES ON mydb.* TO 'myuser'@'%';
Query OK, 0 rows affected (0.01 sec)

All the actual data from other databases appear to be intact.

The state transfer above was from the first rotated pod. The next one logged:

2018-07-31  4:57:04 140341984134912 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 68310)
2018-07-31  4:57:04 140342383351552 [Note] WSREP: State transfer required: 
	Group state: cb1f8096-65a4-11e8-95a2-1a202ed178a3:68310
	Local state: 00000000-0000-0000-0000-000000000000:-1
2018-07-31  4:57:04 140342383351552 [Note] WSREP: New cluster view: global state: cb1f8096-65a4-11e8-95a2-1a202ed178a3:68310, view# 30: Primary, number of nodes: 3, my index: 0, protocol version 3
2018-07-31  4:57:04 140342383351552 [Warning] WSREP: Gap in state sequence. Need state transfer.
2018-07-31  4:57:04 140341975742208 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.0.13.44' --datadir '/data/db/'   --parent '98'  '' '
MySQL init process in progress...
MySQL init process in progress...
2018-07-31  4:57:06 140342383351552 [Note] WSREP: Prepared SST request: rsync|10.0.13.44:4444/rsync_sst
2018-07-31  4:57:06 140342383351552 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-07-31  4:57:06 140342383351552 [Note] WSREP: REPL Protocols: 8 (3, 2)
2018-07-31  4:57:06 140342383351552 [Note] WSREP: Assign initial position for certification: 68310, protocol version: 3
2018-07-31  4:57:06 140342099846912 [Note] WSREP: Service thread queue flushed.
2018-07-31  4:57:06 140342383351552 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (cb1f8096-65a4-11e8-95a2-1a202ed178a3): 1 (Operation not permitted)
	 at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2018-07-31  4:57:06 140341984134912 [Note] WSREP: Member 0.0 (mariadb-1) requested state transfer from '*any*'. Selected 1.0 (mariadb-2)(SYNCED) as donor.
2018-07-31  4:57:06 140341984134912 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 68310)
2018-07-31  4:57:06 140342383351552 [Note] WSREP: Requesting state transfer: success, donor: 1
2018-07-31  4:57:06 140342383351552 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> cb1f8096-65a4-11e8-95a2-1a202ed178a3:68310
MySQL init process in progress...
2018-07-31  4:57:07 140341999032064 [Note] WSREP: (2ae99905, 'tcp://0.0.0.0:4567') turning message relay requesting off
MySQL init process in progress...
2018-07-31  4:57:08 140341984134912 [Note] WSREP: 1.0 (mariadb-2): State transfer to 0.0 (mariadb-1) complete.
2018-07-31  4:57:08 140341984134912 [Note] WSREP: Member 1.0 (mariadb-2) synced with group.
MySQL init process in progress...
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 147 (20180731 04:57:09.567)
WSREP_SST: [INFO] Joiner cleanup done. (20180731 04:57:10.111)
MySQL init process in progress...
2018-07-31  4:57:10 140342384850880 [Note] WSREP: SST complete, seqno: 68310
2018-07-31  4:57:10 140342384850880 [Note] InnoDB: For Galera, using innodb_lock_schedule_algorithm=fcfs
2018-07-31  4:57:10 140342384850880 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-07-31  4:57:10 140342384850880 [Note] InnoDB: Uses event mutexes
2018-07-31  4:57:10 140342384850880 [Note] InnoDB: Compressed tables use zlib 1.2.11
2018-07-31  4:57:10 140342384850880 [Note] InnoDB: Using Linux native AIO
2018-07-31  4:57:10 140342384850880 [Note] InnoDB: Number of pools: 1
2018-07-31  4:57:10 140342384850880 [Note] InnoDB: Using SSE2 crc32 instructions
2018-07-31  4:57:10 140342384850880 [Note] InnoDB: Initializing buffer pool, total size = 256M, instances = 1, chunk size = 128M
2018-07-31  4:57:10 140342384850880 [Note] InnoDB: Completed initialization of buffer pool
2018-07-31  4:57:10 140341035341568 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2018-07-31  4:57:10 140342384850880 [Note] InnoDB: Highest supported file format is Barracuda.
2018-07-31  4:57:10 140342384850880 [Note] InnoDB: Starting crash recovery from checkpoint LSN=116143358
MySQL init process in progress...
2018-07-31  4:57:12 140342384850880 [Note] InnoDB: 128 out of 128 rollback segments are active.
2018-07-31  4:57:12 140342384850880 [Note] InnoDB: Creating shared tablespace for temporary tables
2018-07-31  4:57:12 140342384850880 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2018-07-31  4:57:12 140342384850880 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2018-07-31  4:57:13 140342384850880 [Note] InnoDB: Waiting for purge to start
2018-07-31  4:57:13 140342384850880 [Note] InnoDB: 5.7.22 started; log sequence number 116143367
2018-07-31  4:57:13 140340799465216 [Note] InnoDB: Loading buffer pool(s) from /data/db/ib_buffer_pool
MySQL init process in progress...
2018-07-31  4:57:13 140340799465216 [Note] InnoDB: Buffer pool(s) load completed at 180731  4:57:13
2018-07-31  4:57:14 140342384850880 [Note] Plugin 'FEEDBACK' is disabled.
MySQL init process in progress...
2018-07-31  4:57:14 140342384850880 [Note] WSREP: Signalling provider to continue.
2018-07-31  4:57:14 140342384850880 [Note] WSREP: SST received: cb1f8096-65a4-11e8-95a2-1a202ed178a3:68310
2018-07-31  4:57:14 140341984134912 [Note] WSREP: 0.0 (mariadb-1): State transfer from 1.0 (mariadb-2) complete.
2018-07-31  4:57:14 140341984134912 [Note] WSREP: Shifting JOINER -> JOINED (TO: 68310)
2018-07-31  4:57:14 140342384850880 [Note] Reading of all Master_info entries succeded
2018-07-31  4:57:14 140342384850880 [Note] Added new Master_info '' to hash table
2018-07-31  4:57:14 140342384850880 [Note] mysqld: ready for connections.
Version: '10.2.16-MariaDB-1:10.2.16+maria~bionic'  socket: '/var/run/mysqld/mysqld.sock'  port: 0  mariadb.org binary distribution
2018-07-31  4:57:14 140341984134912 [Note] WSREP: Member 0.0 (mariadb-1) synced with group.
2018-07-31  4:57:14 140341984134912 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 68310)
2018-07-31  4:57:14 140342383351552 [Note] WSREP: Synchronized with group, ready for connections
2018-07-31  4:57:14 140342383351552 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

2018-07-31  4:57:15 140341552326400 [Note] mysqld (initiated by: unknown): Normal shutdown
2018-07-31  4:57:15 140341552326400 [Note] WSREP: Stop replication
2018-07-31  4:57:15 140341552326400 [Note] WSREP: Closing send monitor...
2018-07-31  4:57:15 140341552326400 [Note] WSREP: Closed send monitor.
2018-07-31  4:57:15 140341552326400 [Note] WSREP: gcomm: terminating thread
2018-07-31  4:57:15 140341552326400 [Note] WSREP: gcomm: joining thread
2018-07-31  4:57:15 140341552326400 [Note] WSREP: gcomm: closing backend

Second pod start hangs on Running wsrep_sst_rsync

With a fresh microk8s 1.14.4 cluster, kubectl apply -k . gets mariadb-0 to the Ready state, but mariadb-1 blocks during startup.

2019-07-27 14:25:40 139726685889856 [Note] mysqld (mysqld 10.2.25-MariaDB-1:10.2.25+maria~bionic) starting as process 1 ...
2019-07-27 14:25:40 139726685889856 [Note] WSREP: Read nil XID from storage engines, skipping position init
2019-07-27 14:25:40 139726685889856 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2019-07-27 14:25:40 139726685889856 [Note] WSREP: wsrep_load(): Galera 25.3.26(r3857) by Codership Oy <[email protected]> loaded successfully.
2019-07-27 14:25:40 139726685889856 [Note] WSREP: CRC-32C: using hardware acceleration.
2019-07-27 14:25:40 139726685889856 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
2019-07-27 14:25:40 139726685889856 [Note] WSREP: Passing config to GCS: base_dir = /data/db/; base_host = 10.1.1.49; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /data/db/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/db//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false;
2019-07-27 14:25:40 139726685889856 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 00000000-0000-0000-0000-000000000000:-1
2019-07-27 14:25:40 139726685889856 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2019-07-27 14:25:40 139726685889856 [Note] WSREP: wsrep_sst_grab()
2019-07-27 14:25:40 139726685889856 [Note] WSREP: Start replication
2019-07-27 14:25:40 139726685889856 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2019-07-27 14:25:40 139726685889856 [Note] WSREP: protonet asio version 0
2019-07-27 14:25:40 139726685889856 [Note] WSREP: Using CRC-32C for message checksums.
2019-07-27 14:25:40 139726685889856 [Note] WSREP: backend: asio
2019-07-27 14:25:40 139726685889856 [Note] WSREP: gcomm thread scheduling priority set to other:0 
2019-07-27 14:25:40 139726685889856 [Note] WSREP: restore pc from disk successfully
2019-07-27 14:25:40 139726685889856 [Note] WSREP: GMCast version 0
2019-07-27 14:25:40 139726685889856 [Warning] WSREP: Failed to resolve tcp://mariadb-1.mariadb:4567
2019-07-27 14:25:40 139726685889856 [Warning] WSREP: Failed to resolve tcp://mariadb-2.mariadb:4567
2019-07-27 14:25:40 139726685889856 [Note] WSREP: (5559eb88, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2019-07-27 14:25:40 139726685889856 [Note] WSREP: (5559eb88, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2019-07-27 14:25:40 139726685889856 [Note] WSREP: EVS version 0
2019-07-27 14:25:40 139726685889856 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer 'mariadb-0.mariadb:,mariadb-1.mariadb:,mariadb-2.mariadb:'
2019-07-27 14:25:40 139726685889856 [Note] WSREP: (5559eb88, 'tcp://0.0.0.0:4567') connection established to 4188e6e8 tcp://10.1.1.48:4567
2019-07-27 14:25:40 139726685889856 [Note] WSREP: (5559eb88, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2019-07-27 14:25:41 139726685889856 [Note] WSREP: gcomm: connected
2019-07-27 14:25:41 139726685889856 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2019-07-27 14:25:41 139726685889856 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2019-07-27 14:25:41 139726685889856 [Note] WSREP: Opened channel 'my_wsrep_cluster'
2019-07-27 14:25:41 139726685889856 [Note] WSREP: Waiting for SST to complete.
2019-07-27 14:25:43 139726519949056 [Warning] WSREP: no nodes coming from prim view, prim not possible
2019-07-27 14:25:43 139726519949056 [Note] WSREP: view(view_id(NON_PRIM,5559eb88,4) memb {
	5559eb88,0
} joined {
} left {
} partitioned {
})
2019-07-27 14:25:43 139726505051904 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2019-07-27 14:25:43 139726505051904 [Note] WSREP: Flow-control interval: [16, 16]
2019-07-27 14:25:43 139726505051904 [Note] WSREP: Trying to continue unpaused monitor
2019-07-27 14:25:43 139726505051904 [Note] WSREP: Received NON-PRIMARY.
2019-07-27 14:25:43 139726684378880 [Note] WSREP: New cluster view: global state: :-1, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version -1
2019-07-27 14:25:43 139726684378880 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2019-07-27 14:25:43 139726519949056 [Note] WSREP: (5559eb88, 'tcp://0.0.0.0:4567') turning message relay requesting off
2019-07-27 14:25:44 139726519949056 [Note] WSREP: declaring 4188e6e8 at tcp://10.1.1.48:4567 stable
2019-07-27 14:25:44 139726519949056 [Note] WSREP: re-bootstrapping prim from partitioned components
2019-07-27 14:25:44 139726519949056 [Note] WSREP: view(view_id(PRIM,4188e6e8,5) memb {
	4188e6e8,0
	5559eb88,0
} joined {
} left {
} partitioned {
})
2019-07-27 14:25:44 139726519949056 [Note] WSREP: save pc into disk
2019-07-27 14:25:44 139726505051904 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2019-07-27 14:25:44 139726505051904 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2019-07-27 14:25:44 139726519949056 [Note] WSREP: clear restored view
2019-07-27 14:25:44 139726505051904 [Note] WSREP: STATE EXCHANGE: sent state msg: 6b41ad2f-b07a-11e9-b313-aefa479322f6
2019-07-27 14:25:44 139726505051904 [Note] WSREP: STATE EXCHANGE: got state msg: 6b41ad2f-b07a-11e9-b313-aefa479322f6 from 0 (mariadb-0)
2019-07-27 14:25:44 139726505051904 [Note] WSREP: STATE EXCHANGE: got state msg: 6b41ad2f-b07a-11e9-b313-aefa479322f6 from 1 (mariadb-1)
2019-07-27 14:25:44 139726505051904 [Warning] WSREP: Quorum: No node with complete state:

	Version      : 4
	Flags        : 0x3
	Protocols    : 0 / 9 / 3
	State        : NON-PRIMARY
	Desync count : 0
	Prim state   : SYNCED
	Prim UUID    : 55b40e71-b07a-11e9-8369-fb3b92b43c93
	Prim  seqno  : 2
	First seqno  : -1
	Last  seqno  : 4
	Prim JOINED  : 1
	State UUID   : 6b41ad2f-b07a-11e9-b313-aefa479322f6
	Group UUID   : 3f1e1add-b07a-11e9-b142-86a198db1bc6
	Name         : 'mariadb-0'
	Incoming addr: '10.1.1.48:3306'

	Version      : 4
	Flags        : 00
	Protocols    : 0 / 9 / 3
	State        : NON-PRIMARY
	Desync count : 0
	Prim state   : NON-PRIMARY
	Prim UUID    : 00000000-0000-0000-0000-000000000000
	Prim  seqno  : -1
	First seqno  : -1
	Last  seqno  : -1
	Prim JOINED  : 0
	State UUID   : 6b41ad2f-b07a-11e9-b313-aefa479322f6
	Group UUID   : 00000000-0000-0000-0000-000000000000
	Name         : 'mariadb-1'
	Incoming addr: '10.1.1.49:3306'

2019-07-27 14:25:44 139726505051904 [Note] WSREP: Full re-merge of primary 55b40e71-b07a-11e9-8369-fb3b92b43c93 found: 1 of 1.
2019-07-27 14:25:44 139726505051904 [Note] WSREP: Quorum results:
	version    = 4,
	component  = PRIMARY,
	conf_id    = 2,
	members    = 1/2 (joined/total),
	act_id     = 4,
	last_appl. = -1,
	protocols  = 0/9/3 (gcs/repl/appl),
	group UUID = 3f1e1add-b07a-11e9-b142-86a198db1bc6
2019-07-27 14:25:44 139726505051904 [Note] WSREP: Flow-control interval: [23, 23]
2019-07-27 14:25:44 139726505051904 [Note] WSREP: Trying to continue unpaused monitor
2019-07-27 14:25:44 139726505051904 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 4)
2019-07-27 14:25:44 139726684378880 [Note] WSREP: State transfer required: 
	Group state: 3f1e1add-b07a-11e9-b142-86a198db1bc6:4
	Local state: 00000000-0000-0000-0000-000000000000:-1
2019-07-27 14:25:44 139726684378880 [Note] WSREP: New cluster view: global state: 3f1e1add-b07a-11e9-b142-86a198db1bc6:4, view# 3: Primary, number of nodes: 2, my index: 1, protocol version 3
2019-07-27 14:25:44 139726684378880 [Warning] WSREP: Gap in state sequence. Need state transfer.
2019-07-27 14:25:44 139726291986176 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.1.1.49' --datadir '/data/db/'   --parent '1'  ''  '''

Nothing happens after that last line. The state is minimal because all volumes were empty. The --address is the pod IP of mariadb-1.
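When a joiner hangs like this, a quick way to tell whether it is still mid-SST is to check the wsrep state on each pod (a sketch; pod and namespace names assume this repo's defaults):

```shell
# "Joiner" means state transfer is still in progress on that node;
# "Synced" means it has caught up with the cluster.
for i in 0 1 2; do
  kubectl -n mysql exec mariadb-$i -- \
    mysql -N -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"
done
```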

Backup strategy

I didn't find anything about backup and restore in the deployment files.
How do you provide backup and restore for this mariadb cluster?
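The manifests indeed ship nothing for backup. One minimal approach, assuming a root client is configured inside the pods, is a logical dump from a single synced replica (for large datasets mariabackup would be a better fit):

```shell
# Consistent logical backup from one replica; --single-transaction avoids
# locking InnoDB tables while the dump runs.
kubectl -n mysql exec mariadb-0 -- \
  mysqldump --all-databases --single-transaction --routines --triggers \
  > "backup-$(date +%F).sql"

# Restore by streaming a dump back into any synced node, e.g.:
# kubectl -n mysql exec -i mariadb-0 -- mysql < backup-2019-07-27.sql
```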

UTF8 defaults are broken

I suspect that c19f59c broke things so that the default charset for new databases is latin1 again. The phpMyAdmin databases screen says latin1 for new databases; previously it said utf8mb4_unicode_ci.
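To confirm whether the regression is at the server level rather than in phpMyAdmin's display, check the defaults that new databases inherit (a sketch against this repo's pod names):

```shell
# latin1 here confirms the server-level regression; utf8mb4 means the
# server defaults are fine and only existing databases are affected.
kubectl -n mysql exec mariadb-0 -- mysql -e \
  "SHOW VARIABLES WHERE Variable_name IN ('character_set_server', 'collation_server');"
```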

Cluster (i.e. first pod) start must be managed manually

See https://github.com/Yolean/kubernetes-mysql-cluster#initialize-volumes-and-cluster for background.

The manual toggle of --wsrep-new-cluster is particularly bad during unplanned downtime: without manual intervention the cluster will not initialize, or worse, the replicas come up but stay disconnected from each other.
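For reference, the manual intervention amounts to something like this (a sketch; the grastate.dat location follows the --datadir /data/db seen in the logs above):

```shell
# Find the node with the highest seqno; only that node is safe to bootstrap.
for i in 0 1 2; do
  echo "--- mariadb-$i"
  kubectl -n mysql exec mariadb-$i -- cat /data/db/grastate.dat
done
# Then start only the most advanced node with --wsrep-new-cluster and let
# the others rejoin via SST in ordinal order.
```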

Someone told me there is a mechanism for running a special type of init container for the first pod in a replica set. Can't find it now.

@Jacobh2 we should look into this if you plan on running multiple clusters/instances. You might start a single instance per site, with the possibility to scale up individual sites as load increases.

Manual recovery after "gcs connection failed" with one statefulset pod gone

On preemptible nodes we had one instance of manual recovery, after #30. There was no mariadb-1 pod, and -0 and -2 stayed crashlooping. They were past init, but the mariadb containers exited after:

2020-05-27  6:10:26 140050350966464 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50176S), skipping check
2020-05-27  6:10:55 140050350966464 [Note] WSREP: view((empty))
2020-05-27  6:10:55 140050350966464 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
	 at gcomm/src/pc.cpp:connect():158
2020-05-27  6:10:55 140050350966464 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend connection: -110 (Connection timed out)
2020-05-27  6:10:55 140050350966464 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1458: Failed to open channel 'my_wsrep_cluster' at 'gcomm://mariadb-0.mariadb,mariadb-1.mariadb,mariadb-2.mariadb': -110 (Connection timed out)
2020-05-27  6:10:55 140050350966464 [ERROR] WSREP: gcs connect failed: Connection timed out
2020-05-27  6:10:55 140050350966464 [ERROR] WSREP: wsrep::connect(gcomm://mariadb-0.mariadb,mariadb-1.mariadb,mariadb-2.mariadb) failed: 7
2020-05-27  6:10:55 140050350966464 [ERROR] Aborting

This could be a case for switching podManagementPolicy from OrderedReady to Parallel.

The solution was to scale down to zero and then back up to three. Oddly, the pods wouldn't go away on scale-down to zero, so I had to delete mariadb-2 manually. Is that the expected behavior for OrderedReady?
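Spelled out, the recovery was roughly (assuming the statefulset is named mariadb in the mysql namespace):

```shell
# Scale the cluster away entirely, then rebuild it in ordinal order.
kubectl -n mysql scale statefulset mariadb --replicas=0
# If a pod lingers after scale-down (as mariadb-2 did here), remove it:
kubectl -n mysql delete pod mariadb-2
kubectl -n mysql scale statefulset mariadb --replicas=3
```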

Writes lost during replica churn

Running #37, the following scenario may result in lost writes:

  • All 3 replicas down
  • Replica 0 comes up and starts state transfer to 1 and 2; state transfer succeeds
  • Replicas 0 and 1 go down again
  • Clients write to replica 2
  • Split brain; manual scale-down required
  • Replica 0 comes up as "new cluster" and begins state transfer to 1 and 2 again
  • Writes made to replica 1 or 2 are lost
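Divergence of this kind shows up as different committed seqnos across nodes; comparing them is one way to detect writes at risk before a re-merge discards them (a sketch):

```shell
# Nodes that agree report the same wsrep_last_committed value; a node that
# is ahead of the would-be bootstrap node holds writes that would be lost.
for i in 0 1 2; do
  kubectl -n mysql exec mariadb-$i -- \
    mysql -N -e "SHOW STATUS LIKE 'wsrep_last_committed';"
done
```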

Connect to cluster from outside

Hello,

Thank you for providing this repo.
I set up a running cluster, but I can't connect to it from outside. So my question is the following.

What is the best way to connect to the running cluster?
I've tried the following config, but had no success.

apiVersion: v1
kind: Service
metadata:
  name: mariadbport
  namespace: mysql
spec:
  ports:
  - nodePort: 30306
    port: 3306
  selector:
    app: mariadb
  type: NodePort

I've also installed the phpmyadmin config, but there is no IP provided that I can use in my browser to open phpMyAdmin.
How do you look inside a running database cluster, to maybe check something inside the database?

Thanks for your help.
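For ad-hoc inspection there is no need for a NodePort at all; port-forwarding to a pod works with the manifests as-is (a sketch, assuming a local mysql client):

```shell
# Forward the MariaDB port from one pod to localhost, then connect:
kubectl -n mysql port-forward mariadb-0 3306:3306 &
mysql -h 127.0.0.1 -P 3306 -u root -p
```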

Failed to open backend connection: -131 (State not recoverable)

2018-04-08 9:21:53 140614723073920 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer 'alpaca-mariadb-0.alpaca-mariadb:,alpaca-mariadb-1.alpaca-mariadb:'
2018-04-08 9:21:53 140614723073920 [ERROR] WSREP: failed to open gcomm backend connection: 131: No address to connect (FATAL)
	 at gcomm/src/gmcast.cpp:connect_precheck():282
2018-04-08 9:21:53 140614723073920 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -131 (State not recoverable)
2018-04-08 9:21:53 140614723073920 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1458: Failed to open channel 'my_wsrep_cluster' at 'gcomm://alpaca-mariadb-0.alpaca-mariadb,alpaca-mariadb-1.alpaca-mariadb': -131 (State not recoverable)
2018-04-08 9:21:53 140614723073920 [ERROR] WSREP: gcs connect failed: State not recoverable
2018-04-08 9:21:53 140614723073920 [ERROR] WSREP: wsrep::connect(gcomm://alpaca-mariadb-0.alpaca-mariadb,alpaca-mariadb-1.alpaca-mariadb) failed: 7
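Note that each peer in the first log line ends with a bare colon, i.e. an empty port, which suggests a misrendered gcomm URL. A quick check that the headless service names at least resolve (a sketch using this issue's pod names; add -n with your namespace if needed):

```shell
# Verify the peer DNS name resolves from inside a pod; a name that fails
# here would leave gcomm with no address to connect to.
kubectl exec alpaca-mariadb-0 -- getent hosts alpaca-mariadb-1.alpaca-mariadb
```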

I ran into trouble with a cluster split-brain

The cluster is running successfully, and wsrep_cluster_size=3.
If I delete pod mariadb-1 or mariadb-2, the cluster is healthy again after the pods restart.
But if I delete pod mariadb-0, the cluster ends up split-brained.
What happened?

Is it related to "safe_to_bootstrap"?
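Quite possibly: Galera refuses to bootstrap from a node whose grastate.dat has safe_to_bootstrap: 0, and mariadb-0 is the pod this setup bootstraps from. Checking what it recorded is straightforward (a sketch; datadir path per the logs earlier in this page):

```shell
# safe_to_bootstrap: 0 means this node does not consider itself the most
# advanced member and will refuse to start a new cluster.
kubectl -n mysql exec mariadb-0 -- grep safe_to_bootstrap /data/db/grastate.dat
```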
