yoshinorim / mha4mysql-manager

Development tree of Master High Availability Manager and tools for MySQL (MHA), Manager part

Home Page: http://code.google.com/p/mysql-master-ha/

License: GNU General Public License v2.0

Perl 81.07% Shell 18.93%

mha4mysql-manager's People

Contributors

altmannmarcelo, bmildren, grypyrg, hirose31, ijin, jburnham, jonahberquist, kaiwangchen, kane3, kobehaha, renatobo, shenlongxing, sjmudd, takus, yoshinorim

mha4mysql-manager's Issues

behavior with secondary_check_script

According to the wiki, "If A was unsuccessful, masterha_secondary_check exits with return code 2 and MHA Manager guesses that network problem has happened and it does not start failover."

I thought masterha would give up failing over in this case.

When I tested the secondary_check_script with an unreachable address on purpose with
masterha_secondary_check -s 10.119.45.30 -s 10.120.45.30 (10.119.45.30 is unreachable),
then masterha seems to retry the failover on and on and on...

this repeats:

ssh: connect to host 10.119.45.30 port 22: No route to host
Monitoring server 10.119.45.30 is NOT reachable!

Sat Jul 14 06:04:52 2012 - [warning] At least one of monitoring servers is not reachable from this script. This is likely network problem. Failover should not happen.

Is this behavior expected?
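
The wiki passage quoted above ties the decision to the script's exit status. A minimal shell sketch of that interpretation follows. The function name `interpret_secondary_check` is hypothetical, and only return code 2 comes from the quoted text; other codes are reported as unspecified rather than guessed.

```shell
# Hypothetical helper, not part of MHA: describe what an exit status of
# masterha_secondary_check means according to the wiki text quoted above.
interpret_secondary_check() {
  case "$1" in
    2) echo "network problem suspected: do not start failover" ;;
    *) echo "not covered by the quoted text: consult the MHA documentation" ;;
  esac
}
```

A caller would run masterha_secondary_check with its usual arguments and then pass `"$?"` to the helper.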

MHA should support RESET SLAVE ALL in MySQL 5.5.16+

Originally reported by Joffrey Michaie -- Thanks!

Starting with MySQL 5.5, RESET SLAVE does not lose the replication user and replication password; it only removes the replication position.
As you can imagine, this is really dangerous: if the end user runs START SLAVE on the new master, it can be a disaster!

The behavior changed from MySQL 5.1:
http://dev.mysql.com/doc/refman/5.5/en/reset-slave.html

Starting from MySQL 5.5.16, RESET SLAVE ALL was introduced. MHA should be aware of this command.
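
One way to picture the version awareness requested here is a small shell helper that picks the statement from a server version string. This is only a sketch: the function name `reset_slave_statement` is hypothetical and it is not how MHA implements the check. The only fact it relies on is the 5.5.16 threshold stated above.

```shell
# Hypothetical helper, not part of MHA: choose the statement for a given
# server version string such as "5.5.16" or "5.7.17-11-log".
reset_slave_statement() (
  version=${1%%-*}    # drop suffixes such as "-log"
  IFS=. read -r major minor patch <<EOF
$version
EOF
  # Encode the version as one integer so it can be compared with 5.5.16.
  number=$(( ${major:-0} * 10000 + ${minor:-0} * 100 + ${patch:-0} ))
  if [ "$number" -ge 50516 ]; then
    echo "RESET SLAVE ALL"
  else
    echo "RESET SLAVE"
  fi
)
```

For example, `reset_slave_statement 5.5.15-log` prints `RESET SLAVE`, while `reset_slave_statement 5.7.17-11-log` prints `RESET SLAVE ALL`.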

Incorrect "User xxx does not exist or does not have REPLICATION SLAVE privilege!" error

I have created a user with the replication privilege on both the master and the slave MySQL instances (which run on different hosts), as shown below.

mysql> SELECT User, Host, Repl_slave_priv FROM mysql.user WHERE User = 'repl_user';
+-----------+---------------+-----------------+
| User      | Host          | Repl_slave_priv |
+-----------+---------------+-----------------+
| repl_user | localhost     | N               |
| repl_user | %             | Y               |
+-----------+---------------+-----------------+

However, MHA fails to start with the following error.

User repl_user does not exist or does not have REPLICATION SLAVE privilege! Other slaves can not start replication from this host.

If I grant Repl_slave_priv to "repl_user"@"localhost", MHA starts successfully.
Therefore, it is likely that MHA only checks the privileges of "repl_user"@"localhost" but not "repl_user"@"%".
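
A check that looks at every host entry for the user, rather than a single one, could be sketched as follows. The helper name `has_repl_priv_on_any_host` and the tab-separated input format are assumptions for illustration; this is not MHA's actual check, and it is deliberately coarse because it does not consider which host the slaves really connect from.

```shell
# Hypothetical helper, not part of MHA: succeed if any account row for the
# user grants REPLICATION SLAVE. Reads "Host<TAB>Repl_slave_priv" on stdin.
has_repl_priv_on_any_host() {
  awk -F '\t' '$2 == "Y" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Possible usage against a live server (tab-separated output, no header):
#   mysql -N -B -e "SELECT Host, Repl_slave_priv FROM mysql.user WHERE User = 'repl_user'" \
#     | has_repl_priv_on_any_host && echo "privilege found"
```

With the two rows shown in the report, the helper succeeds because the '%' entry has Repl_slave_priv set to Y.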

More strict SSH checking

Originally, MHA checked the master's reachability by just connecting via SSH and exiting with return code 0. This does not work in some cases, especially when SSH works but the data files are not accessible. With this fix, MHA checks the master's SSH reachability by executing the save_binary_logs command (dry run). MHA Client also needs to be updated to 0.53.

Fix: 4607f29

Got "MySQL server has gone away" error on checking slave status

Hi, my mha failover failed due to "MySQL server has gone away".

my config:

[server default]
# mysql user and password
user=root
password=root
ssh_user=root
# working directory on the manager
manager_workdir=/home/worker/dbtest/mha4mysql-manager/test
# working directory on MySQL servers
remote_workdir=/home/worker/dbtest/mha4mysql-node/test

master_binlog_dir=/home/worker/dbtest/percona/log
manager_log=/home/worker/dbtest/mha4mysql-manager/logs/manager.log

[server1]
hostname=10.32.64.13
port=3302

[server2]
hostname=10.32.64.20
port=3302

the output of manager log:

Fri Feb 10 20:27:11 2017 - [info] MHA::MasterMonitor version 0.56.
Fri Feb 10 20:27:12 2017 - [info] GTID failover mode = 0
Fri Feb 10 20:27:12 2017 - [info] Dead Servers:
Fri Feb 10 20:27:12 2017 - [info]   10.32.64.13(10.32.64.13:3302)
Fri Feb 10 20:27:12 2017 - [info] Alive Servers:
Fri Feb 10 20:27:12 2017 - [info]   10.32.64.20(10.32.64.20:3302)
Fri Feb 10 20:27:12 2017 - [info] Alive Slaves:
Fri Feb 10 20:27:12 2017 - [info]   10.32.64.20(10.32.64.20:3302)  Version=5.7.17-11-log (oldest major version between slaves) log-bin:enabled
Fri Feb 10 20:27:12 2017 - [info]     Replicating from 10.32.64.13(10.32.64.13:3302)
Fri Feb 10 20:27:12 2017 - [warning] MySQL master is not currently alive!
Fri Feb 10 20:27:12 2017 - [info] Checking slave configurations..
Fri Feb 10 20:27:12 2017 - [info]  read_only=1 is not set on slave 10.32.64.20(10.32.64.20:3302).
Fri Feb 10 20:27:12 2017 - [info] Checking replication filtering settings..
Fri Feb 10 20:27:12 2017 - [info]  Replication filtering check ok.
Fri Feb 10 20:27:12 2017 - [info] GTID (with auto-pos) is not supported
Fri Feb 10 20:27:12 2017 - [info] Starting SSH connection tests..
Fri Feb 10 20:27:12 2017 - [info] All SSH connection tests passed successfully.
Fri Feb 10 20:27:12 2017 - [info] Checking MHA Node version..
Fri Feb 10 20:27:13 2017 - [info]  Version check ok.
Fri Feb 10 20:27:13 2017 - [info] Getting current master (maybe dead) info ..
Fri Feb 10 20:27:13 2017 - [info] Identified master is 10.32.64.13(10.32.64.13:3302).
Fri Feb 10 20:27:13 2017 - [info] Checking SSH publickey authentication settings on the current master..
Fri Feb 10 20:27:13 2017 - [info] HealthCheck: SSH to 10.32.64.13 is reachable.
Fri Feb 10 20:27:13 2017 - [info] Master MHA Node version is 0.56.
Fri Feb 10 20:27:13 2017 - [info] Checking recovery script configurations on 10.32.64.13(10.32.64.13:3302)..
Fri Feb 10 20:27:13 2017 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/home/worker/dbtest/percona/log --output_file=/home/worker/dbtest/mha4mysql-node/test/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000012
Fri Feb 10 20:27:13 2017 - [info]   Connecting to root@10.32.64.13(10.32.64.13:22)..
###############################################################################
##                                !!! ALERT !!!                                #
##                    You are entering into a secured area!                    #
##                                                                             #
##             Your IP, login time and username has been noted and             #
##                  has been sent to the server administrator!                 #
##                                                                             #
##            This service is restricted to authorized users only.             #
##                  All activities on this system are logged.                  #
##                                                                             #
##             Unauthorized access will be fully investigated and              #
##            reported to the appropriate law enforcement agencies.            #
################################################################################

  Creating /home/worker/dbtest/mha4mysql-node/test if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /home/worker/dbtest/percona/log, up to mysql-bin.000014
Fri Feb 10 20:27:13 2017 - [info] Binlog setting check done.
Fri Feb 10 20:27:13 2017 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Feb 10 20:27:13 2017 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=10.32.64.20 --slave_ip=10.32.64.20 --slave_port=3302 --workdir=/home/worker/dbtest/mha4mysql-node/test --target_version=5.7.17-11-log --manager_version=0.56 --relay_log_info=/home/worker/dbtest/percona/data/relay-log.info  --relay_dir=/home/worker/dbtest/percona/data/  --slave_pass=xxx
Fri Feb 10 20:27:13 2017 - [info]   Connecting to root@10.32.64.20(10.32.64.20:22)..
###############################################################################
##                                !!! ALERT !!!                                #
##                    You are entering into a secured area!                    #
##                                                                             #
##             Your IP, login time and username has been noted and             #
##                  has been sent to the server administrator!                 #
##                                                                             #
##            This service is restricted to authorized users only.             #
##                  All activities on this system are logged.                  #
##                                                                             #
##             Unauthorized access will be fully investigated and              #
##            reported to the appropriate law enforcement agencies.            #
################################################################################

  Checking slave recovery environment settings..
    Opening /home/worker/dbtest/percona/data/relay-log.info ... ok.
    Relay log found at /home/worker/dbtest/percona/log, up to mysql-relay-bin.000016
    Temporary relay log file is /home/worker/dbtest/percona/log/mysql-relay-bin.000016
    Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri Feb 10 20:27:14 2017 - [info] Slaves settings check done.
Fri Feb 10 20:27:14 2017 - [info]
10.32.64.13(10.32.64.13:3302) (current master)
 +--10.32.64.20(10.32.64.20:3302)

Fri Feb 10 20:27:14 2017 - [warning] master_ip_failover_script is not defined.
Fri Feb 10 20:27:14 2017 - [warning] shutdown_script is not defined.
Fri Feb 10 20:27:14 2017 - [error][/root/perl5/lib/perl5/MHA/Server.pm, ln457] Checking slave status failed on 10.32.64.20(10.32.64.20:3302). err=Got error when executing SHOW SLAVE STATUS. MySQL server has gone away
Fri Feb 10 20:27:14 2017 - [info] Set master ping interval 3 seconds.
Fri Feb 10 20:27:14 2017 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri Feb 10 20:27:14 2017 - [info] Starting ping health check on 10.32.64.13(10.32.64.13:3302)..
Fri Feb 10 20:27:14 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.32.64.13' (111))
Fri Feb 10 20:27:14 2017 - [warning] Connection failed 1 time(s)..
Fri Feb 10 20:27:14 2017 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/home/worker/dbtest/percona/log --output_file=/home/worker/dbtest/mha4mysql-node/test/save_binary_logs_test --manager_version=0.56 --binlog_prefix=mysql-bin
Fri Feb 10 20:27:14 2017 - [info] HealthCheck: SSH to 10.32.64.13 is reachable.
Fri Feb 10 20:27:17 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.32.64.13' (111))
Fri Feb 10 20:27:17 2017 - [warning] Connection failed 2 time(s)..
Fri Feb 10 20:27:20 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.32.64.13' (111))
Fri Feb 10 20:27:20 2017 - [warning] Connection failed 3 time(s)..
Fri Feb 10 20:27:23 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.32.64.13' (111))
Fri Feb 10 20:27:23 2017 - [warning] Connection failed 4 time(s)..
Fri Feb 10 20:27:23 2017 - [warning] Master is not reachable from health checker!
Fri Feb 10 20:27:23 2017 - [warning] Master 10.32.64.13(10.32.64.13:3302) is not reachable!
Fri Feb 10 20:27:23 2017 - [warning] SSH is reachable.
Fri Feb 10 20:27:23 2017 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /home/worker/dbtest/mha4mysql-manager/manager.cnf again, and trying to connect to all servers to check server status..
Fri Feb 10 20:27:23 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Feb 10 20:27:23 2017 - [info] Reading application default configuration from /home/worker/dbtest/mha4mysql-manager/manager.cnf..
Fri Feb 10 20:27:23 2017 - [info] Reading server configuration from /home/worker/dbtest/mha4mysql-manager/manager.cnf..
Fri Feb 10 20:27:24 2017 - [info] GTID failover mode = 0
Fri Feb 10 20:27:24 2017 - [info] Dead Servers:
Fri Feb 10 20:27:24 2017 - [info]   10.32.64.13(10.32.64.13:3302)
Fri Feb 10 20:27:24 2017 - [info] Alive Servers:
Fri Feb 10 20:27:24 2017 - [info]   10.32.64.20(10.32.64.20:3302)
Fri Feb 10 20:27:24 2017 - [info] Alive Slaves:
Fri Feb 10 20:27:24 2017 - [info]   10.32.64.20(10.32.64.20:3302)  Version=5.7.17-11-log (oldest major version between slaves) log-bin:enabled
Fri Feb 10 20:27:24 2017 - [info]     Replicating from 10.32.64.13(10.32.64.13:3302)
Fri Feb 10 20:27:24 2017 - [info] Checking slave configurations..
Fri Feb 10 20:27:24 2017 - [info]  read_only=1 is not set on slave 10.32.64.20(10.32.64.20:3302).
Fri Feb 10 20:27:24 2017 - [info] Checking replication filtering settings..
Fri Feb 10 20:27:24 2017 - [info]  Replication filtering check ok.
Fri Feb 10 20:27:24 2017 - [info] Master is down!
Fri Feb 10 20:27:24 2017 - [info] Terminating monitoring script.
Fri Feb 10 20:27:24 2017 - [info] Got exit code 20 (Master dead).
Fri Feb 10 20:27:24 2017 - [info] MHA::MasterFailover version 0.56.
Fri Feb 10 20:27:24 2017 - [info] Starting master failover.
Fri Feb 10 20:27:24 2017 - [info]
Fri Feb 10 20:27:24 2017 - [info] * Phase 1: Configuration Check Phase..
Fri Feb 10 20:27:24 2017 - [info]
Fri Feb 10 20:27:25 2017 - [info] GTID failover mode = 0
Fri Feb 10 20:27:25 2017 - [info] Dead Servers:
Fri Feb 10 20:27:25 2017 - [info]   10.32.64.13(10.32.64.13:3302)
Fri Feb 10 20:27:25 2017 - [info] Checking master reachability via MySQL(double check)...
Fri Feb 10 20:27:25 2017 - [info]  ok.
Fri Feb 10 20:27:25 2017 - [info] Alive Servers:
Fri Feb 10 20:27:25 2017 - [info]   10.32.64.20(10.32.64.20:3302)
Fri Feb 10 20:27:25 2017 - [info] Alive Slaves:
Fri Feb 10 20:27:25 2017 - [info]   10.32.64.20(10.32.64.20:3302)  Version=5.7.17-11-log (oldest major version between slaves) log-bin:enabled
Fri Feb 10 20:27:25 2017 - [info]     Replicating from 10.32.64.13(10.32.64.13:3302)
Fri Feb 10 20:27:25 2017 - [info] Starting Non-GTID based failover.
Fri Feb 10 20:27:25 2017 - [info]
Fri Feb 10 20:27:25 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Feb 10 20:27:25 2017 - [info]
Fri Feb 10 20:27:25 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Feb 10 20:27:25 2017 - [info]
Fri Feb 10 20:27:25 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Feb 10 20:27:25 2017 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Fri Feb 10 20:27:25 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Feb 10 20:27:26 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Feb 10 20:27:26 2017 - [info]
Fri Feb 10 20:27:26 2017 - [info] * Phase 3: Master Recovery Phase..
Fri Feb 10 20:27:26 2017 - [info]
Fri Feb 10 20:27:26 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Feb 10 20:27:26 2017 - [info]
Fri Feb 10 20:27:26 2017 - [error][/root/perl5/lib/perl5/MHA/ServerManager.pm, ln937] Checking slave status failed. err=Got error when executing SHOW SLAVE STATUS. Lost connection to MySQL server during query
Fri Feb 10 20:27:26 2017 - [error][/root/perl5/lib/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /root/perl5/lib/perl5/MHA/MasterFailover.pm line 515.
Fri Feb 10 20:27:26 2017 - [info]

----- Failover Report -----

manager: MySQL Master failover 10.32.64.13(10.32.64.13:3302)

Master 10.32.64.13(10.32.64.13:3302) is down!

Check MHA Manager logs at app03.hp.sp.tst.bmsre.com:/home/worker/dbtest/mha4mysql-manager/logs/manager.log for details.

Started automated(non-interactive) failover.
Got Error so couldn't continue failover from here.

log in slave:

2017-02-10T12:27:11.310017Z 156 [Note] Aborted connection 156 to db: 'unconnected' user: 'root' host: 'app03.hp.sp.tst.bmsre.com' (Got an error reading communication packets)
2017-02-10T12:27:23.164972Z 159 [Note] Aborted connection 159 to db: 'unconnected' user: 'root' host: 'app03.hp.sp.tst.bmsre.com' (Got an error reading communication packets)
2017-02-10T12:27:24.252659Z 161 [Note] Aborted connection 161 to db: 'unconnected' user: 'root' host: 'app03.hp.sp.tst.bmsre.com' (Got an error reading communication packets)
2017-02-10T12:27:26.263051Z 162 [Note] Aborted connection 162 to db: 'unconnected' user: 'root' host: 'app03.hp.sp.tst.bmsre.com' (Got an error reading communication packets)
2017-02-10T12:27:26.275693Z 160 [Note] Aborted connection 160 to db: 'unconnected' user: 'root' host: 'app03.hp.sp.tst.bmsre.com' (Got an error reading communication packets)

Uninitialized value in MasterFailover.pm

Sun Feb 23 14:28:45 2014 - [info] Starting recovery on db02(172.16.101.6:3306)..
Sun Feb 23 14:28:45 2014 - [info] Generating diffs succeeded.
Sun Feb 23 14:28:45 2014 - [info] Waiting until all relay logs are applied.
Sun Feb 23 14:28:45 2014 - [error][/usr/share/perl5/MHA/Server.pm, ln529] Checking slave status failed on db02(172.16.101.6:3306). err=SQL thread is not running! Check slave status.
Sun Feb 23 14:28:45 2014 - [error][/usr/share/perl5/MHA/MasterFailover.pm, ln1095] Applying existing relay logs failed!
Sun Feb 23 14:28:45 2014 - [error][/usr/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR: Use of uninitialized value $low in concatenation (.) or string at /usr/share/perl5/MHA/MasterFailover.pm line 1182.

feature request: clone with lvm/zfs

The best feature that MMM has is cloning storage with LVM when a crash happens,
and being able to take a quick snapshot of storage for backup.

It would be good to have these features.

thanks.

Feature request: Slave Options for master switch = alive

When doing online master failover, there's no way to set a list of slave options, such as ssl_ca_file, master_ssl, etc. Would be nice if there was a way to specify slave options to apply to the newly demoted master/slave.

Checking slave state string is incorrect for MySQL 5.5+

When the SQL thread has read all relay logs but has not reached the tail of the relay log (i.e. the relay logs end in the middle of a transaction), the slave state should be checked from SHOW PROCESSLIST (or I_S) output. But the state string that MHA currently checks is not correct for MySQL 5.5+.

Can't locate MHA/SSHCheck.pm

[root@rav-xxts-db masterha]# masterha_check_ssh --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf
Can't locate MHA/SSHCheck.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /usr/bin/masterha_check_ssh line 25.
BEGIN failed--compilation aborted at /usr/bin/masterha_check_ssh line 25.

How to setup HA for MHA

Hi, I want to have 2 MHA servers so that if one goes down, the other can monitor my cluster,

and at the same time I want to make sure that when a failover happens, only one MHA node participates.

Can someone help me with this?

Thanks

mha_manager doesn't select new master when one of slaves is dead

I tried to find information about this issue but failed. The problem looks like this: I have a configuration with 3 servers:

[server1]
hostname=192.168.33.10
candidate_master=1

[server2]
hostname=192.168.33.11
candidate_master=1

[server3]
hostname=192.168.33.12
candidate_master=1

MHA works fine when only the master fails: it picks the first available slave on the list and promotes it to the new master. The problem appears when one of the slaves fails while the manager is running: if the master then fails, a new master cannot be selected even though one working slave remains.

Here is the end of the log where the error appears:

Thu Oct 19 11:19:37 2017 - [info] MHA::MasterFailover version 0.57.
Thu Oct 19 11:19:37 2017 - [info] Starting master failover.
Thu Oct 19 11:19:37 2017 - [info]
Thu Oct 19 11:19:37 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Oct 19 11:19:37 2017 - [info]
Thu Oct 19 11:19:38 2017 - [info] GTID failover mode = 0
Thu Oct 19 11:19:38 2017 - [info] Dead Servers:
Thu Oct 19 11:19:38 2017 - [info] 192.168.33.11(192.168.33.11:3306)
Thu Oct 19 11:19:38 2017 - [info] 192.168.33.12(192.168.33.12:3306)
Thu Oct 19 11:19:38 2017 - [info] Checking master reachability via MySQL(double check)...
Thu Oct 19 11:19:38 2017 - [info] ok.
Thu Oct 19 11:19:38 2017 - [info] Alive Servers:
Thu Oct 19 11:19:38 2017 - [info] 192.168.33.10(192.168.33.10:3306)
Thu Oct 19 11:19:38 2017 - [info] Alive Slaves:
Thu Oct 19 11:19:38 2017 - [info] 192.168.33.10(192.168.33.10:3306) Version=10.2.9-MariaDB-10.2.9+maria~xenial-log (oldest major version between slaves) log-bin:enabled
Thu Oct 19 11:19:38 2017 - [info] Replicating from 192.168.33.11(192.168.33.11:3306)
Thu Oct 19 11:19:38 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Oct 19 11:19:38 2017 - [error][/usr/local/share/perl/5.22.1/MHA/ServerManager.pm, ln492] Server 192.168.33.12(192.168.33.12:3306) is dead, but must be alive! Check server settings.
Thu Oct 19 11:19:38 2017 - [error][/usr/local/share/perl/5.22.1/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/local/share/perl/5.22.1/MHA/MasterFailover.pm line 268.

Any ideas?

Document MHA DB User Requirements

Per this old bug: https://code.google.com/p/mysql-master-ha/issues/detail?id=50

It would be good to know what grants are required for the MHA user. When we tested against MariaDB 5.5, we found the following grants were required:

CREATE USER mha IDENTIFIED BY '<password>';
GRANT RELOAD, SUPER ON *.* TO 'mha'@'%' ; -- Limited admin commands to manage replication, config
GRANT SELECT ON `mysql`.* TO 'mha'@'%';
GRANT ALL PRIVILEGES ON `mysql`.`apply_diff_relay_logs` TO 'mha'@'%';
GRANT ALL PRIVILEGES ON `mysql`.`apply_diff_relay_logs_test` TO 'mha'@'%';
FLUSH PRIVILEGES;

It's possible the last 2 grants could be tightened, but I didn't experiment widely with this.

report_script does not work because of options..

Hi, I'm trying to get report_script to work, with no luck.

  1. There seems to be a conf argument passed - which is easy to work around by adding..
  2. "Option new_slave_hosts requires an argument"
    How can this be solved?
  3. sh: 1: Syntax error: "(" unexpected
    Seems to be caused by ( in the body text...

Here is my code:

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

# new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $conf, $new_slave_hosts, $subject, $body, $email );
GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'conf=s'             => \$conf,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);

exit &main();

sub main {
  $email = '[email protected]';

  # $body and $subject reach the shell unquoted here
  system("echo ".$body." | mail -s ".$subject." ".$email );
}

Thanks for help! Appreciate it!
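
For comparison, here is a rough shell sketch of a report script that avoids the quoting problem in point 3 by never building a command string from the subject or body. The option names come from the code above; the function names and the `MAILER` variable are assumptions for illustration. Accepting a bare `--new_slave_hosts` is a guess at what triggers point 2, not confirmed behavior.

```shell
# Hypothetical argument parser for a report script, not part of MHA.
parse_report_args() {
  orig_master_host=
  new_master_host=
  new_slave_hosts=
  conf=
  subject=
  body=
  for arg in "$@"; do
    case "$arg" in
      --orig_master_host=*) orig_master_host=${arg#*=} ;;
      --new_master_host=*)  new_master_host=${arg#*=} ;;
      --new_slave_hosts=*)  new_slave_hosts=${arg#*=} ;;
      --new_slave_hosts)    new_slave_hosts= ;;   # tolerate a missing value
      --conf=*)             conf=${arg#*=} ;;
      --subject=*)          subject=${arg#*=} ;;
      --body=*)             body=${arg#*=} ;;
    esac
  done
}

# Send the parsed report to the address given as $1.
# MAILER is an assumption so the mail command can be swapped out.
send_report() {
  printf '%s\n' "$body" | "${MAILER:-mail}" -s "$subject" "$1"
}
```

Because the body is piped on standard input and the subject is passed as a single quoted argument, parentheses in either one are not interpreted by the shell.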

Feature Request: Option to avoid running CHANGE MASTER TO on the promoted slave

Hi there, me again.

I'd like to know if it would be possible to add an option to avoid running the CHANGE MASTER TO sequence on the promoted slave. We have some master-master use cases where it is desirable to be able to resume replication once the failed master comes back online.

We have this setup ( A <-> B, A -> C & B -> D):

+----+            +----+
|  A |  < --- >   |  B |
+----+            +----+
   V                 V
+----+            +----+
|  C |            |  D |
+----+            +----+

Which upon failure would become (A -> C, A -> D and B is offline):

+----+          +----+
|  A |          |  X |
+----+          +----+
   V        \  
+----+          +----+
|  C |          |  D |
+----+          +----+

What we'd like to do, when possible, as we bring B back online is: (A <-> B, A -> C & A -> D)

+----+            +----+
|  A |  < --- >   |  B |
+----+            +----+
   V        \  
+----+          +----+
|  C |          |  D |
+----+          +----+

Right now we can replicate from A -> B, but we need to do a full CHANGE MASTER TO ... on A to start replication back from B. I'd like to be able to add an option to the MHA config file, so that step is not needed. The option should be OFF by default.

Thanks for your efforts on the project.

Gerry
I hope the ASCII art helps to understand what I'm trying to do.

Cheers,
G

failover hang up when the mysql user has not stop slave privilege.

Hi, I discovered that during failover, if the MySQL user (the user parameter under [server default] in app.conf) does not have the privilege to stop the slave, the failover process hangs at Phase 2: Dead Master Shutdown Phase...

It would be good to output a warning or error message about that.

The mha version is 0.56

Wrong exit code on masterha_manager

When MHA monitoring (invoked from masterha_manager) failed for a reason other than master down (e.g. a configuration error), the exit code (mostly 255) did not match the code printed in the log output (mostly 1).

readdir() attempted on invalid dirhandle $dir Error occured!

Hi! I admire your work, and thank you very much for it.

But I ran into this issue and have been stuck for 2 days.

I am using CentOS 6.8, MHA 0.56, MySQL 5.7.17.

When I run this command,

masterha_check_repl --conf=/Test/service/manager.conf

this error is returned:
Checking slave recovery environment settings..
Opening /data001/mysql/relay-log.info ...readdir() attempted on invalid dirhandle $dir at /usr/local/share/perl5/MHA/BinlogManager.pm line 271.

Fri Feb 17 02:23:42 2017 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln201] Slaves settings check failed!
Fri Feb 17 02:23:42 2017 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln390] Slave configuration failed.
Fri Feb 17 02:23:42 2017 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln401] Error happend on checking configurations. at /usr/local/bin/masterha_check_repl line 48
Fri Feb 17 02:23:42 2017 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln500] Error happened on monitoring servers.
Fri Feb 17 02:23:42 2017 - [info] Got exit code 1 (Not master dead).

Unfortunately, I don't know how to work with Perl scripts. Please help me!

mha4mysql-service won't start after updating to mysql 5.7

Hi,
we have been using the mha4mysql-service for a long time,
but now we are trying to upgrade to MySQL 5.7 (from 5.5 through 5.6),
and the mha4mysql-service won't start after updating to MySQL 5.7.
We are running on CentOS 6.8
and getting the error:

Wed Apr 25 14:16:44 2018 - [info] Dead Servers:
Wed Apr 25 14:16:44 2018 - [info] Alive Servers:
Wed Apr 25 14:16:44 2018 - [info] 10.00.187.3(10.00.187.3:3306)
Wed Apr 25 14:16:44 2018 - [info] 10.00.187.4(10.00.187.4:3306)
Wed Apr 25 14:16:44 2018 - [info] Alive Slaves:
Wed Apr 25 14:16:44 2018 - [info] 10.00.187.4(10.00.187.4:3306) Version=5.7.20-log (oldest major version between slaves) log-bin:disabled
Wed Apr 25 14:16:44 2018 - [info] Replicating from 10.00.187.3(10.00.187.3:3306)
Wed Apr 25 14:16:44 2018 - [info] Current Alive Master: 10.00.187.3(10.00.187.3:3306)
Wed Apr 25 14:16:44 2018 - [info] Checking slave configurations..
Wed Apr 25 14:16:44 2018 - [warning] relay_log_purge=0 is not set on slave 10.00.187.4(10.00.187.4:3306).
Wed Apr 25 14:16:44 2018 - [warning] log-bin is not set on slave 10.00.187.4(10.00.187.4:3306). This host cannot be a master.
Wed Apr 25 14:16:44 2018 - [info] Checking replication filtering settings..
Wed Apr 25 14:16:44 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. Use of uninitialized value in concatenation (.) or string at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 396.
Wed Apr 25 14:16:44 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Wed Apr 25 14:16:44 2018 - [info] Got exit code 1 (Not master dead).

I'd appreciate some help.

MHA manager user

MHA Manager does not fail over when all the DBs it monitors are in read-only mode.
This is because the MHA Manager user is a superuser and can insert even when the DB is read-only. Can we use separate users in MHA Manager for admin activities and monitoring?

[Bug] masterha_master_switch --skip_disable_read_only causes error

According to the documentation, masterha_master_switch supports the --skip_disable_read_only flag, but adding the flag causes an error.

Actual result:

$ masterha_master_switch --master_state=alive --conf=/etc/mha4mysql/some_config.conf --interactive=0 --skip_disable_read_only
Unknown options: --skip_disable_read_only

Expected result:
The program should switch the master according to the specified flags and skip disabling read-only mode on the new master.

Additional info:

$ masterha_master_switch --version
masterha_master_switch version 0.56.

Feature Request: Supporting multi-master configuration

Some users want MHA to work with a multi-master configuration, which the current MHA (v0.50) does not support.

Here are some design considerations.

  • Only one master should be writable
  • How should MHA verify that?
      1. Checking the "read_only" global variable
        When MHA detects two or more masters and two or more of them have read_only == 0, MHA aborts
      2. Adding an additional configuration parameter to MHA
        i.e. read_only_master=1
        (I like no.1)
  • How does MHA detect the current (writable) master?
    • Get all unique "Master_Hosts" from all servers
    • Skip if read_only==1 on that Master_Host
    • If the number of remaining servers is 1, that is the current writable master. Otherwise MHA aborts.
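
The detection steps above can be sketched roughly as follows. This is only an illustration in Python, not MHA code: the function name `find_writable_master` and the dictionary keys `master_host` and `read_only` are assumptions made for the example, and it assumes every master host also appears in the monitored server list.

```python
def find_writable_master(servers):
    """Return the single writable master host, or raise if there is not exactly one.

    `servers` maps each host name to a dict with two hypothetical keys:
    "master_host" (the host it replicates from, or None) and
    "read_only" (the read_only global variable as a bool).
    """
    # Step 1: collect all unique Master_Hosts reported by the servers.
    master_hosts = {s["master_host"] for s in servers.values() if s["master_host"]}

    # Step 2: skip every master host that is itself read_only.
    writable = []
    for host in sorted(master_hosts):
        if host not in servers:
            raise ValueError(f"no read_only information for master host {host}")
        if not servers[host]["read_only"]:
            writable.append(host)

    # Step 3: exactly one writable master must remain; otherwise abort.
    if len(writable) != 1:
        raise RuntimeError(f"expected exactly one writable master, found {writable}")
    return writable[0]


if __name__ == "__main__":
    # Two masters replicating from each other, plus one plain slave.
    topology = {
        "db1": {"master_host": "db2", "read_only": False},
        "db2": {"master_host": "db1", "read_only": True},
        "db3": {"master_host": "db1", "read_only": True},
    }
    print(find_writable_master(topology))
```

With the sample topology, db2 is skipped because it is read_only, so db1 is the only candidate left. If both masters had read_only == 0, the function would raise, which corresponds to MHA aborting.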

Documentation broken because of dead google code wiki

Since Google Code hosting was shut down, the hyperlinks to the wiki pages in the source code no longer work.

There also seems to be no backup available via archive.org.

Example: bin/masterha_master_switch - there may be more. Using the URL leads to:

{"data":{"text":"Anonymous users does not have storage.objects.get access to object google-code-archive/v2/code.google.com/mysql-master-ha/wiki/Requirements.wiki."},"status":401,"config":{"method":"GET","transformRequest":[null],"url":"https://www.googleapis.com/storage/v1/b/google-code-archive/o/v2%2Fcode.google.com%2Fmysql-master-ha%2Fwiki%2FRequirements.wiki?alt=media","headers":{"Accept":"application/json, text/plain, */*"}},"statusText":""}

Trying to access the wiki directly shows that the pages no longer exist.

Patch for MHA when using {master,relay_log}_info_repository = TABLE

I was trying MHA and noticed on the servers I use that it does not work properly on MySQL 5.6 with the following settings:

master_info_repository = TABLE
relay_log_info_repository = TABLE

I notice that there is code to check for the value being TABLE and not FILE but it does not work,
at least not in the 0.56 rpms that I have been using.

There are 2 patches which can be found at: http://ftp.wl0.org/mha/

Please consider incorporating these patches into MHA.
Note: 1 patch is for the manager and another one is for the node. I'm only making 1 issue but can make a separate one for https://github.com/yoshinorim/mha4mysql-node if that's better.

Master ping via MySQL CONNECT or SELECT

Right now MHA establishes a persistent connection to the master and checks the master's availability by executing "SELECT 1".
But in some cases it is better to check by connecting and disconnecting every time, because this is stricter and can detect TCP connection-level failures more quickly.

In this issue, I'm going to add a new configuration parameter, "ping_type". You can choose either "CONNECT" or "SELECT". If CONNECT is set, MHA connects to and disconnects from the master on every ping operation. Default is CONNECT.
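
The difference between the two ping types can be sketched as below. This is an illustration in Python rather than MHA's Perl implementation: the class name `MasterPinger` and the `connect` factory are hypothetical, and an in-memory SQLite database stands in for the MySQL master so the example runs on its own. It also assumes that a CONNECT ping runs "SELECT 1" on the fresh connection before closing it.

```python
import sqlite3


class MasterPinger:
    """Health check with two strategies, selected by ping_type."""

    def __init__(self, connect, ping_type):
        if ping_type not in ("SELECT", "CONNECT"):
            raise ValueError(f"unsupported ping_type: {ping_type}")
        self._connect = connect      # factory returning a DB-API connection
        self._ping_type = ping_type
        self._conn = None            # persistent connection (SELECT mode only)

    @staticmethod
    def _select_one(conn):
        cur = conn.cursor()
        cur.execute("SELECT 1")
        return cur.fetchone()[0] == 1

    def ping(self):
        try:
            if self._ping_type == "CONNECT":
                # Open a new connection for every ping, so a failure at the
                # TCP/connection level is noticed immediately.
                conn = self._connect()
                try:
                    return self._select_one(conn)
                finally:
                    conn.close()
            # SELECT: reuse one persistent connection across pings.
            if self._conn is None:
                self._conn = self._connect()
            return self._select_one(self._conn)
        except Exception:
            self._conn = None
            return False


if __name__ == "__main__":
    pinger = MasterPinger(lambda: sqlite3.connect(":memory:"), "CONNECT")
    print(pinger.ping())
```

In SELECT mode the connection is opened once and reused, which is cheaper but only exercises an already established session. CONNECT mode pays the connection cost on every ping in exchange for the stricter check described above.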

@@global.relay_log_purge should be preserved on change master

http://yoshinorimatsunobu.blogspot.com/2011/07/announcing-mysql-mha-mysql-master-high.html#comments

Anonymous said...

I have tried this tool, it's really a great tool in doing the master failover.

I have one question:

In the wiki, the relay_log_purge is suggested to be set to OFF. I did that and start testing the failover. After the failover is successfully done and I checked this parameter again and it's set back to ON.

It means I have to set them to OFF manually before I start the manager next time. Is it designed to be this?
August 24, 2011 6:42 PM 

Yoshinori Matsunobu said...

@Anonymous

relay_log_purge is implicitly turned to ON by MySQL itself sometimes, including when executing CHANGE MASTER. MHA internally executes CHANGE MASTER so relay_log_purge is set to ON. This applies when you execute CHANGE MASTER manually, too.

I can modify MHA Manager to check relay_log_purge parameter before executing CHANGE MASTER and set that value (executing SET GLOBAL relay_log_purge=0 or 1) after executing CHANGE MASTER, but I recommend another approach: Executing relay_log_purge script included in MHA Node package regularly, and setting --disable-relay-log-purge argument. See the online manual for details.

http://code.google.com/p/mysql-master-ha/wiki/Requirements#purge_relay_logs_script

By using this, you can safely remove unneeded relay logs and set relay_log_purge=0 automatically.
August 24, 2011 7:12 PM 

With GTID enabled, failover does not fetch the differential binlog from the dead master

Hi yoshinorim,
Hello.
My MySQL instances have GTID enabled and I use MHA 0.57. When the master goes down and all the slaves' binlog positions are behind the master's, I find that the latest slave does not get the master's binlog.
Why?
If I point the binlog server at the master's binlog directory so that failover fetches the master's binlog,
what are the risks?

Thank you.

MHA hangs/unable to generate a failover report on a bad failover

If one of the replicas which MHA is monitoring has errant transactions (writes on the replica), on a failover, MHA just hangs in Phase 4.1: Starting Slaves in parallel.. indefinitely.

Due to this, a failover report (of a failover error/bad failover) never gets generated and the report script is never triggered to report the bad failover.

masterha_check_repl fails with "There is no alive server. We can't do failover" error

When I use masterha_check_repl --conf=/usr/local/mha/app.cnf to check the master-slave replication status, it reports an error.
However, the SSH check and masterha_secondary_check both succeed, and the user and repl_user configured in app.cnf can both access MySQL successfully.

[error][/usr/local/share/perl5/MHA/ServerManager.pm, ln188] There is no alive server. We can't do failover
[error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/local/share/perl5/MHA/MasterMonitor.pm line 329
[error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.

MHA Manager looks for relay-log.info in wrong location

Hi there,

I just found a bug as described in the title.

We store relay-log.info in a non-standard location (mostly for legacy reasons), which seems to confuse your scripts:

mysql -e "show global variables like '%relay_log%'"
+-----------------------+------------------------------+
| Variable_name         | Value                        |
+-----------------------+------------------------------+
| max_relay_log_size    | 0                            |
| relay_log             | /db_log/mysql/relay-bin      |
| relay_log_index       | /db_log/mysql/relay-index    |
| relay_log_info_file   | /db_log/mysql/relay-log.info |
| relay_log_purge       | ON                           |
| relay_log_space_limit | 0                            |
+-----------------------+------------------------------+

I'm using the following configuration for MHA:

Default:
cat /etc/masterha_default.cnf
[server default]
user=adm_mha
password=******
ssh_user=mysql
master_binlog_dir=/var/lib/mysql
remote_workdir=/var/log/masterha
ping_interval=3

Application:
cat /etc/masterha.d/test.cnf
[server default]
manager_workdir=/var/log/masterha/test
manager_log=/var/log/masterha/test.log
multi_tier_slave=1
master_binlog_dir=/db_log/mysql/

The results from masterha_check_repl are (only relevant section):
...
Mon Sep 26 17:44:05 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on the current master..
Mon Sep 26 17:44:05 2011 - [info] Executing command: save_binary_logs --command=test --start_file=mysql-bin.000114 --start_pos=4 --binlog_dir=/db_log/mysql/ --output_file=/var/log/masterha/save_binary_logs_test --manager_version=0.52
Mon Sep 26 17:44:05 2011 - [info] Connecting to mysql@xdc-tst-mysql-003(xdc-tst-mysql-003)..
Creating /var/log/masterha if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /db_log/mysql/, up to mysql-bin.000114
Mon Sep 26 17:44:05 2011 - [info] Master setting check done.
Mon Sep 26 17:44:05 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Mon Sep 26 17:44:05 2011 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=adm_mha --slave_host=xdc-tst-mysql-004 --slave_ip=10.55.210.155 --slave_port=3306 --workdir=/var/log/masterha --target_version=5.1.56-community-log --manager_version=0.52 --relay_log_info=/db_data/mysql//db_log/mysql/relay-log.info --slave_pass=xxx
Mon Sep 26 17:44:05 2011 - [info] Connecting to [email protected](xdc-tst-mysql-004)..
Checking slave recovery environment settings..
Opening /db_data/mysql//db_log/mysql/relay-log.info ...Could not open relay-log-info file /db_data/mysql//db_log/mysql/relay-log.info.
at /usr/bin/apply_diff_relay_logs line 274
Mon Sep 26 17:44:05 2011 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln129] Slaves settings check failed!
...

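The parameters and output that illustrate the settings and the bug are the relay_log_info_file value above and the --relay_log_info=/db_data/mysql//db_log/mysql/relay-log.info argument in the log. I know that this can be fixed by moving the master.info and relay-log.info files, but that would require a server bounce, which is out of the question for now.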

Please let me know if you need additional info or if there is an easy work around for now.
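
The path in the log above looks like the data directory was prepended to a relay_log_info_file value that is already absolute. A minimal sketch of resolving the path so that absolute values are left alone is shown below. It is written in Python for illustration only and is not the MHA fix: the function name `resolve_relay_log_info` is hypothetical, and the data directory /db_data/mysql/ is inferred from the log output rather than stated in the report.

```python
import posixpath


def resolve_relay_log_info(datadir, relay_log_info_file):
    """Return the full path of relay-log.info.

    posixpath.join discards `datadir` when the second argument is
    absolute, so only relative file names are placed under the datadir.
    """
    return posixpath.normpath(posixpath.join(datadir, relay_log_info_file))


if __name__ == "__main__":
    # Absolute setting, as in this report: the datadir is ignored.
    print(resolve_relay_log_info("/db_data/mysql/", "/db_log/mysql/relay-log.info"))
    # prints /db_log/mysql/relay-log.info

    # Relative setting (the MySQL default): resolved under the datadir.
    print(resolve_relay_log_info("/db_data/mysql/", "relay-log.info"))
    # prints /db_data/mysql/relay-log.info
```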

last_failover_minute has no effect when 'ignore_last_failover' is set

I start masterha_manager as follows:
perl /usr/local/bin/masterha_manager --global_conf=/etc/mha/masterha_default.cnf --conf=/etc/mha/mha_test_vm/conf/app.conf --last_failover_minute=240 --ignore_last_failover --wait_on_monitor_error=60 --wait_on_failover_error=60

Then the app 'test_vm' failed over multiple times within the past 40 minutes.
`
  # Checking last failover error file
  if ($g_ignore_last_failover) {
    MHA::NodeUtil::drop_file_if($_failover_error_file);
    MHA::NodeUtil::drop_file_if($_failover_complete_file);
  }

  # If the last failover was done within 8 hours, we don't do failover
  # to avoid ping-pong
  if ( -f $_failover_complete_file ) {
    my $lastts       = ( stat($_failover_complete_file) )[9];
    my $current_time = time();
    if ( $current_time - $lastts < $g_last_failover_minute * 60 ) {
      my ( $sec, $min, $hh, $dd, $mm, $yy, $week, $yday, $opt ) =
        localtime($lastts);
      my $t = sprintf( "%04d/%02d/%02d %02d:%02d:%02d",
        $yy + 1900, $mm + 1, $dd, $hh, $mm, $sec );
      my $msg =
          "Last failover was done at $t."
        . " Current time is too early to do failover again. If you want to "
        . "do failover, manually remove $_failover_complete_file "
        . "and run this script again.";
      $log->error($msg);
      croak;
    }
    else {
      MHA::NodeUtil::drop_file_if($_failover_complete_file);
    }
  }
  $_server_manager->get_failover_advisory_locks();
  $_server_manager->start_sql_threads_if();
  return $dead_master;
}
`

So I think that issue can be fixed by commenting out the following code:
# MHA::NodeUtil::drop_file_if($_failover_complete_file);
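
To make the interaction between the two options easier to see, here is a rough Python model of the check quoted above. It is a sketch for illustration, not MHA code: the function name `failover_allowed` and its parameters are hypothetical stand-ins for the Perl variables, and the error-file handling is left out.

```python
import os
import tempfile
import time


def failover_allowed(complete_file, last_failover_minute, ignore_last_failover, now=None):
    """Return True if a new failover may proceed, False if it is too early."""
    now = time.time() if now is None else now

    # With ignore_last_failover set, the marker file is removed first ...
    if ignore_last_failover and os.path.exists(complete_file):
        os.remove(complete_file)

    # ... so this time-window check can never find it afterwards.
    if os.path.exists(complete_file):
        age_seconds = now - os.stat(complete_file).st_mtime
        if age_seconds < last_failover_minute * 60:
            return False
        os.remove(complete_file)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        marker = os.path.join(workdir, "app.failover.complete")
        open(marker, "w").close()
        # Pretend the last failover finished 40 minutes ago.
        forty_minutes_ago = time.time() - 40 * 60
        os.utime(marker, (forty_minutes_ago, forty_minutes_ago))

        print(failover_allowed(marker, 240, ignore_last_failover=False))
        print(failover_allowed(marker, 240, ignore_last_failover=True))
```

In this model the first call refuses the failover because the marker is only 40 minutes old, while the second call deletes the marker up front and allows it, which matches the behaviour reported in this issue.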

Specify replication server

I have two MySQL servers: a master and a slave.
The first one has the address 172.16.5.90 and the second has 172.16.5.70.
But replication runs over other network interfaces with addresses from another subnet: 192.168.5.90 and 192.168.5.70.

My MHA config:

$ cat /etc/mha/app1.cnf 
[server default]
# mysql user and password
user=root
password=qq
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# manager log file
manager_log=/var/log/masterha/app1/app1.log
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1

[server1]
hostname=192.168.5.70
ssh_host=flexo
repl_password=qq
repl_user=replication

[server2]
hostname=192.168.5.90
ssh_host=bender
repl_password=qq
repl_user=replication

When I try to switch the servers' roles using the command:

masterha_master_switch --master_state=alive  --conf=/etc/mha/app1.cnf  --orig_master_is_new_slave --interactive=1

I get an error:

[error][/usr/share/perl5/MHA/ServerManager.pm, ln188] There is no alive server. We can't do failover

I have done a lot of tests, and I think it is because the interfaces 192.168.5.90 and 192.168.5.70 are not reachable from 172.16.0.0/24, where the manager is located.
These interfaces are used both for the connection checks (alive or dead) and in the SQL 'CHANGE MASTER TO' statement. So if MHA cannot reach 192.168.5.90 and 192.168.5.70, the check fails.
What is the best way to set this up? What is the 'ssh_host' parameter for? And if I specify hostnames that are reachable from the manager (172.16.5.90 and 172.16.5.70), how can I specify the host to use for replication?
