fantasyni / mysql-master-ha

Automatically exported from code.google.com/p/mysql-master-ha

mysql-master-ha's Issues

Error when executing masterha_check_repl

Configuration file: /etc/section1.cnf
[server default]
# mysql user and password
user=root
password=rootpass
# working directory on the manager
manager_workdir=/apps/mha4mysql-manager-0.53/workdir/section1
# manager log file
manager_log=/apps/mha4mysql-manager-0.53/workdir/section1/section1.log
# working directory on MySQL servers
remote_workdir=/apps/mha4mysql-node-0.53/section1
# master_binlog_dir
master_binlog_dir=/apps/mysql-5.5.16/data
# master_ip_failover_script
master_ip_failover_script=/usr/local/samples/bin/master_ip_failover
# shutdown_script
shutdown_script=/usr/local/samples/bin/power_manager
# master_ip_online_change_script
master_ip_online_change_script=/usr/local/samples/bin/master_ip_online_change

[server1]
hostname=192.168.167.71
[server2]
hostname=192.168.167.47
candidate_master=1
[server3]
hostname=192.168.167.46

When I execute masterha_check_repl, I get the following error:

Fri Feb  3 15:00:45 2012 - [info]   /usr/local/samples/bin/master_ip_failover 
--command=status --ssh_user=root --orig_master_host=192.168.167.47 
--orig_master_ip=192.168.167.47 --orig_master_port=3306 
Bareword "FIXME_xxx" not allowed while "strict subs" in use at 
/usr/local/samples/bin/master_ip_failover line 88.
Execution of /usr/local/samples/bin/master_ip_failover aborted due to 
compilation errors.
Fri Feb  3 15:00:45 2012 - 
[error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln214]  Failed to 
get master_ip_failover_script status with return code 255:0.
Fri Feb  3 15:00:45 2012 - 
[error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln383] Error 
happend on checking configurations.  at /usr/bin/masterha_check_repl line 48
Fri Feb  3 15:00:45 2012 - 
[error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln478] Error 
happened on monitoring servers.
Fri Feb  3 15:00:45 2012 - [info] Got exit code 1 (Not master dead).

If I comment out the parameters "master_ip_failover_script", "shutdown_script", 
and "master_ip_online_change_script", masterha_check_repl succeeds.

Can you tell me why?

I am using version 0.53 on Enterprise Linux 5.
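
The Perl error above points at a literal FIXME_xxx token on line 88 of master_ip_failover, which suggests the sample script was installed without being edited. A minimal sketch for spotting such leftovers before running masterha_check_repl, assuming any remaining placeholders start with FIXME_ (find_placeholders is a hypothetical helper name, not part of MHA):

```shell
#!/bin/sh
# Print every line (with its line number) of a script that still contains
# a FIXME_ placeholder. grep exits 0 when at least one line matches and
# 1 when nothing matches.
find_placeholders() {
    grep -n 'FIXME_' "$1"
}

# Example with the script paths from the configuration above:
# for s in /usr/local/samples/bin/master_ip_failover \
#          /usr/local/samples/bin/power_manager \
#          /usr/local/samples/bin/master_ip_online_change; do
#     find_placeholders "$s" && echo "$s still needs editing"
# done
```

Any line this prints would need to be replaced with site-specific logic before the script can compile under "use strict".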

Original issue reported on code.google.com by [email protected] on 3 Feb 2012 at 7:23

check_ssh is wrongly claiming ssh tests are failing (user == root)

> What steps will reproduce the problem?
1. Configure passwordless ssh access on all servers with user 'root'
2. Use virtual IP on current master as server1
3. Run masterha_check_ssh on manager host

> What is the expected output? What do you see instead?

All SSH checks work manually, so the script is expected to confirm this, 
but instead I get the output below:

Wed Jul 18 08:50:38 2012 - [warning] Global configuration file 
/etc/masterha_default.cnf not found. Skipping.
Wed Jul 18 08:50:38 2012 - [info] Reading application default configurations 
from /etc/app1.cnf..
Wed Jul 18 08:50:38 2012 - [info] Reading server configurations from 
/etc/app1.cnf..
Wed Jul 18 08:50:38 2012 - [info] Starting SSH connection tests..
Wed Jul 18 08:50:38 2012 - [debug] 
Wed Jul 18 08:50:38 2012 - [debug]  Connecting via SSH from 
[email protected](10.0.0.50:22) to [email protected](10.0.0.14:22)..
Wed Jul 18 08:50:38 2012 - [debug]   ok.
Wed Jul 18 08:50:38 2012 - [debug]  Connecting via SSH from 
[email protected](10.0.0.50:22) to [email protected](10.0.0.12:22)..
Wed Jul 18 08:50:38 2012 - [debug]   ok.
Wed Jul 18 08:50:39 2012 - [debug] 
Wed Jul 18 08:50:38 2012 - [debug]  Connecting via SSH from 
[email protected](10.0.0.14:22) to [email protected](10.0.0.50:22)..
Wed Jul 18 08:50:38 2012 - [debug]   ok.
Wed Jul 18 08:50:38 2012 - [debug]  Connecting via SSH from 
[email protected](10.0.0.14:22) to [email protected](10.0.0.12:22)..
Wed Jul 18 08:50:39 2012 - [debug]   ok.
Wed Jul 18 08:50:39 2012 - [error][/usr/share/perl5/MHA/SSHCheck.pm, ln63] 
Wed Jul 18 08:50:39 2012 - [debug]  Connecting via SSH from 
[email protected](10.0.0.12:22) to [email protected](10.0.0.50:22)..
Permission denied (publickey,password).
Wed Jul 18 08:50:39 2012 - [error][/usr/share/perl5/MHA/SSHCheck.pm, ln107] SSH 
connection from [email protected](10.0.0.12:22) to [email protected](10.0.0.50:22) 
failed!
SSH Configuration Check Failed!
 at /usr/bin/masterha_check_ssh line 44

(The following is a manual test run immediately after the failed check.)

root@staging:~# ssh -b 10.0.0.12 -l root 10.0.0.50
Linux live1 2.6.32-5-amd64 #1 SMP Thu Mar 22 17:26:33 UTC 2012 x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
You have new mail.
Last login: Wed Jul 18 14:03:09 2012 from staging

> What version of the product are you using? On what operating system?
I am using masterha 0.53 on Debian 6.

> Please provide any additional information below.
I strace'd the run and peeked in the logs, and it seems this has something to do 
with the temp file created in $workdir for each check. But the script also 
removes these temp files, so we cannot ascertain what in the log for this check 
makes the script report a failed connection attempt.

Original issue reported on code.google.com by [email protected] on 18 Jul 2012 at 1:06

How to change installation directory when installing mha from sources

What steps will reproduce the problem?

1. git clone the sources of MHA Node (or Manager).
2. Follow the steps from the wiki:
  $ perl Makefile.PL
  $ make
  $ sudo make install
3. This installs the "bin" files in /usr/local/bin, and my system (Ubuntu) 
doesn't see them.
The files are: apply_diff_relay_logs  filter_mysqlbinlog  purge_relay_logs  
save_binary_logs

If I use the deb package, dpkg installs these files in /usr/bin and everything 
works.

So how do I change the installation directory for the "bin" files?
Thanks in advance.

PS: As a temporary workaround I made symbolic links to these files in /usr/bin/, 
but that is neither clean nor easy to deploy on many machines.
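
Whether the system "sees" the tools comes down to whether their directory is on the PATH of the shell that runs them. A minimal sketch for checking this on each machine, using the four tool names listed above (missing_commands is a hypothetical helper name, not part of MHA):

```shell
#!/bin/sh
# Print each given command name that cannot be found on the current PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Example:
# missing_commands apply_diff_relay_logs filter_mysqlbinlog \
#                  purge_relay_logs save_binary_logs
```

An empty result means all four tools resolve. Running the same check through a non-interactive ssh session may give a different answer than an interactive login, because the two can start with different PATH values.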

Original issue reported on code.google.com by [email protected] on 15 Nov 2012 at 2:19

wrong exit code logic in failover_start script example?

The example code on 

  http://code.google.com/p/mysql-master-ha/wiki/Using_With_Clustering_Software

currently reads:

  rc=`masterha_master_switch --master_state=dead --interactive=0 --wait_on_failover_error=0 --dead_master_host=host1 --new_master_host=host2`
  exit $rc

The `` operator actually doesn't return the exit code of the enclosed command 
but its stdout output. As long as the output is actually empty the code above 
works as expected, returning the exit status of the executed code as "exit $rc" 
effectively becomes just "exit" and so returns the exit status of the previous 
command. 

As soon as the command in backticks actually returns text the result will 
become this instead though:

  bash: exit: some_text: numeric argument required

and the actual exit status will always be "2" for "incorrect use of builtin 
shell argument" even if the actual code in backticks executed successfully

So the right code should actually be just

  start)
  `...`
  exit

to return the exit status, or maybe using "exit $?" instead of just "exit" 
to make it more explicit. The rc=... assignment on the other hand can be 
removed completely
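
The difference between a command's output and its exit status can be sketched as follows. This is an illustration only, not the wiki's code; run_and_report is a hypothetical helper name:

```shell
#!/bin/sh
# Run a command, pass its stdout through unchanged, and return the
# command's real exit status instead of its output.
run_and_report() {
    output=$("$@")   # $(...) captures stdout, like backticks
    rc=$?            # exit status of the command inside $(...)
    [ -n "$output" ] && printf '%s\n' "$output"
    return "$rc"
}

# Example, using the command line from the wiki snippet:
# case "$1" in
#   start)
#     run_and_report masterha_master_switch --master_state=dead --interactive=0 \
#       --wait_on_failover_error=0 --dead_master_host=host1 --new_master_host=host2
#     exit $?
#     ;;
# esac
```

Reading $? immediately after the assignment is what keeps the status; any command placed in between would overwrite it.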

Or am I missing something in the original code?

Original issue reported on code.google.com by [email protected] on 19 Apr 2012 at 12:45

would be great to specify the port for ssh and scp

What steps will reproduce the problem?
1. Run masterha_check_ssh.

What is the expected output? What do you see instead?

ssh connect errors as my servers listen for ssh on a non-standard port.

What version of the product are you using? On what operating system?

manager is 0.55

Please provide any additional information below.

Adding a configure option for ssh_port would be very useful.  The nuisance of 
it is that ssh uses "-p" for port and scp uses "-P".
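
The -p / -P difference could be hidden behind one small helper. A minimal sketch, assuming a single configured port value; port_flag is a hypothetical name, and nothing like it or an ssh_port option is shown to exist in MHA 0.55:

```shell
#!/bin/sh
# Print the port option in the spelling each tool expects:
# ssh takes a lowercase -p, scp takes an uppercase -P.
port_flag() {
    case "$1" in
        ssh) printf '%s\n' "-p $2" ;;
        scp) printf '%s\n' "-P $2" ;;
        *)   echo "unknown tool: $1" >&2; return 1 ;;
    esac
}

# Example (the unquoted substitution is split into two words on purpose):
# ssh $(port_flag ssh 2222) root@host1 true
# scp $(port_flag scp 2222) local.file root@host1:/tmp/
```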

I'm just getting started but this project looks to have very nicely addressed a 
complicated process in an elegant way.  Thank you.

Original issue reported on code.google.com by [email protected] on 20 Dec 2012 at 10:48

Failed: Starting master failover

What steps will reproduce the problem?
1. masterha_check_ssh:   OK (no error)
2. masterha_check_repl:  OK (no error)
3. masterha_manager:     OK (no error; the output ends with "Ping succeeded, ...")
4. Take the master node down (shutdown -r now).
5. Failover fails at "Starting master failover".

Output is below:

---------------------
Wed Jun 20 12:08:44 2012 - [info] Starting ping health check on 
10.1.10.80(10.1.10.80:3306)..
Wed Jun 20 12:08:44 2012 - [info] Ping succeeded, sleeping until it doesn't 
respond..
         : 
         :  Master node is down(shutdown -r now)
         :
Wed Jun 20 12:10:02 2012 - [warning] Got error on MySQL ping: 2006 (MySQL 
server has gone away)
ssh: connect to host 10.1.10.80 port 22: Connection refused
Wed Jun 20 12:10:02 2012 - [warning] HealthCheck: SSH to 10.1.10.80 is NOT 
reachable.
Wed Jun 20 12:10:08 2012 - [warning] Got error on MySQL connect: 2003 (Can't 
connect to MySQL server on '10.1.10.80' (4))
Wed Jun 20 12:10:08 2012 - [warning] Connection failed 1 time(s)..
Wed Jun 20 12:10:11 2012 - [warning] Got error on MySQL connect: 2003 (Can't 
connect to MySQL server on '10.1.10.80' (4))
Wed Jun 20 12:10:11 2012 - [warning] Connection failed 2 time(s)..
Wed Jun 20 12:10:14 2012 - [warning] Got error on MySQL connect: 2003 (Can't 
connect to MySQL server on '10.1.10.80' (4))
Wed Jun 20 12:10:14 2012 - [warning] Connection failed 3 time(s)..
Wed Jun 20 12:10:14 2012 - [warning] Master is not reachable from health 
checker!
Wed Jun 20 12:10:14 2012 - [warning] Master 10.1.10.80(10.1.10.80:3306) is not 
reachable!
Wed Jun 20 12:10:14 2012 - [warning] SSH is NOT reachable.
Wed Jun 20 12:10:14 2012 - [info] Connecting to a master server failed. Reading 
configuration file /etc/masterha_default.cnf and /etc/mha_manager/app1.cnf 
again, and trying to connect to all servers to check server status..
Wed Jun 20 12:10:14 2012 - [warning] Global configuration file 
/etc/masterha_default.cnf not found. Skipping.
Wed Jun 20 12:10:14 2012 - [info] Reading application default configurations 
from /etc/mha_manager/app1.cnf..
Wed Jun 20 12:10:14 2012 - [info] Reading server configurations from 
/etc/mha_manager/app1.cnf..
Wed Jun 20 12:10:14 2012 - [info] Dead Servers:
Wed Jun 20 12:10:14 2012 - [info]   10.1.10.80(10.1.10.80:3306)
Wed Jun 20 12:10:14 2012 - [info] Alive Servers:
Wed Jun 20 12:10:14 2012 - [info]   10.1.10.81(10.1.10.81:3306)
Wed Jun 20 12:10:14 2012 - [info]   10.1.20.80(10.1.20.80:3306)
Wed Jun 20 12:10:14 2012 - [info] Alive Slaves:
Wed Jun 20 12:10:14 2012 - [info]   10.1.10.81(10.1.10.81:3306)  
Version=5.5.25-log (oldest major version between slaves) log-bin:enabled
Wed Jun 20 12:10:14 2012 - [info]     Replicating from 
10.1.10.80(10.1.10.80:3306)
Wed Jun 20 12:10:14 2012 - [info]   10.1.20.80(10.1.20.80:3306)  
Version=5.5.25-log (oldest major version between slaves) log-bin:enabled
Wed Jun 20 12:10:14 2012 - [info]     Replicating from 
10.1.10.80(10.1.10.80:3306)
Wed Jun 20 12:10:14 2012 - [info] Checking slave configurations..
Wed Jun 20 12:10:14 2012 - [warning]  read_only=1 is not set on slave 
10.1.10.81(10.1.10.81:3306).
Wed Jun 20 12:10:14 2012 - [warning]  relay_log_purge=0 is not set on slave 
10.1.10.81(10.1.10.81:3306).
Wed Jun 20 12:10:14 2012 - [warning]  read_only=1 is not set on slave 
10.1.20.80(10.1.20.80:3306).
Wed Jun 20 12:10:14 2012 - [warning]  relay_log_purge=0 is not set on slave 
10.1.20.80(10.1.20.80:3306).
Wed Jun 20 12:10:14 2012 - [info] Checking replication filtering settings..
Wed Jun 20 12:10:14 2012 - [info]  Replication filtering check ok.
Wed Jun 20 12:10:14 2012 - [info] Master is down!
Wed Jun 20 12:10:14 2012 - [info] Terminating monitoring script.
Wed Jun 20 12:10:14 2012 - [info] Got exit code 20 (Master dead).
Wed Jun 20 12:10:14 2012 - [warning] Global configuration file 
/etc/masterha_default.cnf not found. Skipping.
Wed Jun 20 12:10:14 2012 - [info] Reading application default configurations 
from /etc/mha_manager/app1.cnf..
Wed Jun 20 12:10:14 2012 - [info] Reading server configurations from 
/etc/mha_manager/app1.cnf..
Wed Jun 20 12:10:14 2012 - [info] MHA::MasterFailover version 0.52.
Wed Jun 20 12:10:14 2012 - [info] Starting master failover.
Wed Jun 20 12:10:14 2012 - 
[error][/usr/lib/perl5/site_perl/5.8.8/MHA/ManagerUtil.pm, ln158] Got ERROR: 
Use of uninitialized value in scalar chomp at 
/usr/lib/perl5/site_perl/5.8.8/MHA/ManagerConst.pm line 84.
---------------------


What is the expected output? What do you see instead?
1. What does this problem mean?
2. Please tell me a workaround.

What version of the product are you using? On what operating system?
Manager: 
  - OS:  RHEL5.7 (2.6.18-274.el5)
  - MHA Manager: 0.52 (this issue also happened with 0.53)
  - MHA Node: 0.52
Node:(10.1.10.8[01], 10.1.20.80)  
  - OS:  RHEL5.7 (2.6.18-274.el5)
  - MHA Node: 0.52
  - MySQL 5.5.25

Original issue reported on code.google.com by [email protected] on 20 Jun 2012 at 10:30

masterha_check_ssh issue

What steps will reproduce the problem?
1. Configure SSH keys on all servers.
2. Test the SSH connection from the command line => OK
3. Test the SSH connection with masterha_check_ssh => fails

What is the expected output? What do you see instead?

Wed Jan  4 16:47:30 2012 - [warning] Global configuration file 
/etc/masterha_default.cnf not found. Skipping.
Wed Jan  4 16:47:30 2012 - [info] Reading application default configurations 
from /etc/myha.cnf..
Wed Jan  4 16:47:30 2012 - [info] Reading server configurations from 
/etc/myha.cnf..
Wed Jan  4 16:47:30 2012 - [info] Starting SSH connection tests..
Wed Jan  4 16:47:30 2012 - [error][/usr/local/share/perl5/MHA/SSHCheck.pm, ln63]
Wed Jan  4 16:47:30 2012 - [debug]  Connecting via SSH from root@node1 to 
root@node2..
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Wed Jan  4 16:47:30 2012 - [error][/usr/local/share/perl5/MHA/SSHCheck.pm, 
ln106] SSH connection from root@node1 to root@node2 failed!
Wed Jan  4 16:47:31 2012 - [debug]
Wed Jan  4 16:47:30 2012 - [debug]  Connecting via SSH from root@node2 to 
root@node1..
Wed Jan  4 16:47:30 2012 - [debug]   ok.
SSH Configuration Check Failed!
 at ./masterha_check_ssh line 44

What version of the product are you using? On what operating system?

0.52 on redhat 6

Original issue reported on code.google.com by [email protected] on 4 Jan 2012 at 3:58

Can't exec "apply_diff_relay_logs"

1. Run masterha_check_repl --conf=/etc/masterha/app1.cnf
2. I get this error:
[root@EGSNS-49-2 bin]# ./masterha_check_repl --conf=/etc/masterha/app1.cnf 
Fri Jun  1 14:13:46 2012 - [warning] Global configuration file 
/etc/masterha_default.cnf not found. Skipping.
Fri Jun  1 14:13:46 2012 - [info] Reading application default configurations 
from /etc/masterha/app1.cnf..
Fri Jun  1 14:13:46 2012 - [info] Reading server configurations from 
/etc/masterha/app1.cnf..
Fri Jun  1 14:13:46 2012 - [info] MHA::MasterMonitor version 0.53.
Fri Jun  1 14:13:46 2012 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, 
ln151] Can't exec "apply_diff_relay_logs": No such file or directory at 
/usr/local/share/perl5/MHA/ManagerUtil.pm line 116.
Fri Jun  1 14:13:46 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, 
ln383] Error happend on checking configurations. Died at 
/usr/local/share/perl5/MHA/ManagerUtil.pm line 152.
Fri Jun  1 14:13:46 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, 
ln478] Error happened on monitoring servers.
Fri Jun  1 14:13:46 2012 - [info] Got exit code 1 (Not master dead).
Fri Jun  1 14:13:46 2012 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, 
ln122] Got error when getting node version. Error:
Fri Jun  1 14:13:46 2012 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, 
ln123] 

MySQL Replication Health is NOT OK!
Fri Jun  1 14:13:46 2012 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, 
ln151] Use of uninitialized value $host in concatenation (.) or string at 
/usr/local/share/perl5/MHA/ManagerUtil.pm line 139.
Fri Jun  1 14:13:46 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, 
ln383] Error happend on checking configurations. Died at 
/usr/local/share/perl5/MHA/ManagerUtil.pm line 152.
Fri Jun  1 14:13:46 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, 
ln478] Error happened on monitoring servers.
Fri Jun  1 14:13:46 2012 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NO

3. This is my /etc/masterha/app1.cnf
[server default]
manager_workdir=/masterha/app1
manager_log=/masterha/app1/manager.log
remote_workdir=/
user=root
password=emag@234
ssh_user=root
repl_user=rep
repl_password=rep
shutdown_script=""
master_ip_failover_script="/apps/mha4mysql-manager-0.53/samples/scripts/master_ip_failover"
report_script=""
remote_workdir=/apps/mha4mysql-node-0.53/section1
[server1]
hostname=192.168.49.9
[server2]
hostname=192.168.49.2
candidate_master=1
[server3]
hostname=192.168.49.1
[server4]
hostname=192.168.49.3
[server5]
hostname=192.168.49.4



Please help me understand why this error happens.


OS:redhat 6.2 64bit   
Mysql: 5.5.22( build from source )
basedir: /apps/mysql
datadir: /apps/mysql/data

current master:192.168.49.9 
standby master:192.168.49.1
masterha_check_ssh:OK

Please help me check this and advise as soon as possible.

Thanks,
[email protected]

Original issue reported on code.google.com by [email protected] on 1 Jun 2012 at 6:28

master_ip_online_change DBI connect

1. I have one master and one slave.
2. When I run the command masterha_master_switch --master_state=alive 
--conf=/etc/app1.cnf --new_master_host=db1, I get the following output. It 
seems that the manager connects to the MySQL database as root with no password.

Mon Nov 26 00:46:38 2012 - [info] MHA::MasterRotate version 0.53.
Mon Nov 26 00:46:38 2012 - [info] Starting online master switch..
Mon Nov 26 00:46:38 2012 - [info] 
Mon Nov 26 00:46:38 2012 - [info] * Phase 1: Configuration Check Phase..
Mon Nov 26 00:46:38 2012 - [info] 
Mon Nov 26 00:46:38 2012 - [info] Reading default configuratoins from 
/etc/masterha_default.cnf..
Mon Nov 26 00:46:38 2012 - [info] Reading application default configurations 
from /etc/app1.cnf..
Mon Nov 26 00:46:38 2012 - [info] Reading server configurations from 
/etc/app1.cnf..
Mon Nov 26 00:46:38 2012 - [info] Current Alive Master: db1(10.0.1.248:3306)
Mon Nov 26 00:46:38 2012 - [info] Alive Slaves:
Mon Nov 26 00:46:38 2012 - [info]   db3(10.0.1.49:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Mon Nov 26 00:46:38 2012 - [info]     Replicating from 
10.0.1.248(10.0.1.248:3306)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before 
switching. Is it ok to execute on db1(10.0.1.248:3306)? (YES/no): 
Mon Nov 26 00:46:40 2012 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. 
This may take long time..
Mon Nov 26 00:46:40 2012 - [info]  ok.
Mon Nov 26 00:46:40 2012 - [info] Checking MHA is not monitoring or doing 
failover..
Mon Nov 26 00:46:40 2012 - [info] Checking replication health on db3..
Mon Nov 26 00:46:40 2012 - [info]  ok.
Mon Nov 26 00:46:40 2012 - [info] db3 can be new master.
Mon Nov 26 00:46:40 2012 - [info] 
From:
db1 (current master)
 +--db3

To:
db3 (new master)

Starting master switch from db1(10.0.1.248:3306) to db3(10.0.1.49:3306)? 
(yes/NO): no
Continue? (yes/NO): yes
Enter new master host name: db3
Master switch to db3(10.0.1.49:3306). OK? (yes/NO): yes
Mon Nov 26 00:47:03 2012 - [info] Checking whether db3(10.0.1.49:3306) is ok 
for the new master..
Mon Nov 26 00:47:03 2012 - [info]  ok.
Mon Nov 26 00:47:03 2012 - [info] ** Phase 1: Configuration Check Phase 
completed.
Mon Nov 26 00:47:03 2012 - [info] 
Mon Nov 26 00:47:03 2012 - [info] * Phase 2: Rejecting updates Phase..
Mon Nov 26 00:47:03 2012 - [info] 
Mon Nov 26 00:47:03 2012 - [info] Executing master ip online change script to 
disable write on the current master:
Mon Nov 26 00:47:03 2012 - [info]   /opt/scripts/master_ip_online_change 
--command=stop --orig_master_host=db1 --orig_master_ip=10.0.1.248 
--orig_master_port=3306 --new_master_host=db3 --new_master_ip=10.0.1.49 
--new_master_port=3306  
Got Error: DBI 
connect(';host=10.0.1.49;port=3306;mysql_connect_timeout=4','',...) failed: 
Access denied for user 'root'@'10.0.1.45' (using password: NO) at 
/usr/share/perl5/MHA/DBHelper.pm line 181
 at /opt/scripts/master_ip_online_change line 128



What is the expected output? What do you see instead?
The script should connect to the database with the username and password from 
the configuration file.

What version of the product are you using? On what operating system?
0.53

Please provide any additional information below.
My configuration files are below.


# mysql user and password
  user=replication
  repl_user=replication 
  repl_password=xxx
  password=xxx
  ssh_user=root
  # working directory on the manager
  manager_workdir=/var/log/masterha/app1
  # working directory on MySQL servers
  remote_workdir=/var/log/masterha/app1
  manager_log=/var/log/masterha/app1.log 
  [server1]
  hostname=db1
  ignore_fail=1 

  [server3]
  hostname=db3
  ignore_fail=1

--------------------
[server default]
  user=replication
  password=xxx
 repl_password=xxx
  ssh_user=root
  master_binlog_dir= /var/log/mysql
  remote_workdir=/data/log/masterha
   ping_interval=3
  master_ip_failover_script= /opt/scripts/master_ip_failover
  master_ip_online_change_script= /opt/scripts/master_ip_online_change

Regards

Original issue reported on code.google.com by [email protected] on 26 Nov 2012 at 1:06

Feature Request - Possibility to specify "identity_file" or "options" for SSH connection

I'd like to specify the "identity_file" or "options" for ssh inside conf files, 
because I'm trying to use "ssh_user=mysql", but "id_rsa" file is inside mysql's 
home.

So, when I try to use "masterha_check_ssh" using root privileges, ssh uses 
"/root/.ssh/id_rsa"

Original issue reported on code.google.com by [email protected] on 23 Nov 2011 at 4:09

SSH remote user should not be root

What steps will reproduce the problem?
MHA Manager uses SSH to manage nodes as the ssh_user defined in the 
configuration file. Some of the commands need admin rights on the nodes, so by 
default it does not work unless ssh_user is set to root.
In a production environment, SSH login as root is not allowed.
ssh_user should be a sudoer with no password, and the manager should run the 
node commands (apply_diff_relay_logs, save_binary_logs) through sudo.

Thanks for your work. This is a very nice project.

Original issue reported on code.google.com by [email protected] on 19 Apr 2012 at 9:56

Wrong error message when a new version of Log::Dispatch installed

What steps will reproduce the problem?
  Install a later version of Log::Dispatch

What is the expected output? What do you see instead?
  The error we get when we run a master switch is:
  Sun Nov 11 05:15:09 2012 - [info] MHA::MasterRotate version 0.53.
  Sun Nov 11 05:15:09 2012 - [info] Starting online master switch..
  Sun Nov 11 05:15:09 2012 - [error][/usr/lib/perl5/vendor_perl/MHA/ManagerUtil.pm, ln178] Got ERROR: Use of uninitialized value in scalar chomp at /usr/lib/perl5/vendor_perl/MHA/ManagerConst.pm line 90.

What version of the product are you using? On what operating system?
0.53 Redhat linux 5.6

Please provide any additional information below.
  When we try to fail over and a newer version of the module (Log::Dispatch) is installed, the error message is not helpful.

Original issue reported on code.google.com by [email protected] on 11 Nov 2012 at 12:16

relay_log_file inserts datadir path even when relay_log on MySQL is using an absolute path

What steps will reproduce the problem?
1. define relay_log with an absolute path in my.cnf (eg: 
relay_log=/var/lib/mysql/logs/relay-log)
2. define datadir in my.cnf (eg: datadir=/var/lib/mysql/data)

What is the expected output? What do you see instead?
The relay_log_file should be "/var/lib/mysql/logs/relay-log", instead of 
"/var/lib/mysql/data//var/lib/mysql/logs/relay-log.info"

What version of the product are you using? On what operating system?
0.52 on CentOS 5.7 (using RPM dist)
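
The expected behaviour can be stated as a simple rule: an absolute relay_log value is used as-is, and only a relative value is joined to datadir. A minimal sketch of that rule, not MHA's actual Perl code and not the attached patch (resolve_relay_log is a hypothetical name):

```shell
#!/bin/sh
# Resolve a relay log setting against datadir.
# $1 = datadir, $2 = relay_log value from my.cnf
resolve_relay_log() {
    datadir=$1
    relay_log=$2
    case "$relay_log" in
        /*) printf '%s\n' "$relay_log" ;;               # absolute: keep unchanged
        *)  printf '%s\n' "${datadir%/}/$relay_log" ;;  # relative: prefix datadir
    esac
}

# Example with the values from this report:
# resolve_relay_log /var/lib/mysql/data /var/lib/mysql/logs/relay-log
```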

Original issue reported on code.google.com by [email protected] on 23 Nov 2011 at 5:18

Attachments:

masterha-issue11.diff

upgrade to 0.54 problem

Hi,
MHA 0.53 worked fine on my two servers (Debian). After upgrading the package 
to 0.54, I have a problem when I start the manager:

masterha_manager --conf=/etc/masterha/app1.cnf

-------
Wed Dec 12 14:56:57 2012 - [warning] Global configuration file 
/etc/masterha_default.cnf not found. Skipping.
Wed Dec 12 14:56:57 2012 - [info] Reading application default configurations 
from /etc/masterha/app1.cnf..
Wed Dec 12 14:56:57 2012 - [error][/usr/share/perl5/MHA/MasterMonitor.pm, 
ln386] Error happend on checking configurations. Undefined subroutine 
&MHA::NodeUtil::escape_for_shell called at /usr/share/perl5/MHA/Config.pm line 
285.
Wed Dec 12 14:56:57 2012 - [error][/usr/share/perl5/MHA/MasterMonitor.pm, 
ln482] Error happened on monitoring servers.
Wed Dec 12 14:56:57 2012 - [info] Got exit code 1 (Not master dead).
----------

This is my app1.cnf:

----------
 [server default]
  # mysql user and password
  user=root
  password=rootpass
  ssh_user=mysql
  # working directory on the manager
  manager_workdir=/var/log/masterha/app1
  manager_log=/var/log/masterha/app1/app1.log
  # working directory on MySQL servers
  remote_workdir=/var/log/masterha/app1
  master_binlog_dir=/data/mysql/log_binaire
  ping_interval=5

  master_ip_failover_script=/home/mysql/master_ip_failover

  [server1]
  hostname=10.0.0.1

  [server2]
  hostname=10.0.0.2

----------

If I test SSH with: masterha_check_ssh --conf=/etc/masterha/app1.cnf

I get this error:

Wed Dec 12 14:59:42 2012 - [warning] Global configuration 
file /etc/masterha_default.cnf not found. Skipping.
Wed Dec 12 14:59:42 2012 - [info] Reading application default configurations 
from /etc/masterha/app1.cnf..
Undefined subroutine &MHA::NodeUtil::escape_for_shell called at 
/usr/share/perl5/MHA/Config.pm line 285.

thanks for your help

Sebastien

Original issue reported on code.google.com by [email protected] on 12 Dec 2012 at 2:01

Running masterha_master_switch errors with DBI connect

What steps will reproduce the problem?
1. masterha_master_switch --master_state=alive --conf=/vol/mha/mapi_qa.cnf 
--new_master_host=xx.xxx.xxx.xxx --orig_master_is_new_slave


What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?
0.53

Please provide any additional information below.

Trying to do a manual failover and am getting the following error:
Starting master switch from xx.xxx.xxx.xxx(xx.xxx.xxx.xxx:3306) to 
xx.xxx.xxx.xxx(xx.xxx.xxx.xxx:3306)? (yes/NO): yes
Tue Feb 21 19:29:19 2012 - [info] Checking whether 
xx.xxx.xxx.xxx(xx.xxx.xxx.xxx:3306) is ok for the new master..
Tue Feb 21 19:29:19 2012 - [info]  ok.
Tue Feb 21 19:29:19 2012 - [info] xx.xxx.xxx.xxx(xx.xxx.xxx.xxx:3306): SHOW 
SLAVE STATUS returned empty result. To check replication filtering rules, 
temporarily executing CHANGE MASTER to a dummy host.
Tue Feb 21 19:29:19 2012 - [info] xx.xxx.xxx.xxx(xx.xxx.xxx.xxx:3306): 
Resetting slave pointing to the dummy host.
Tue Feb 21 19:29:19 2012 - [info] ** Phase 1: Configuration Check Phase 
completed.
Tue Feb 21 19:29:19 2012 - [info] 
Tue Feb 21 19:29:19 2012 - [debug]  Disconnected from 
xx.xxx.xxx.xxx(xx.xxx.xxx.xxx:3306)
Tue Feb 21 19:29:19 2012 - [info] * Phase 2: Rejecting updates Phase..
Tue Feb 21 19:29:19 2012 - [info] 
Tue Feb 21 19:29:19 2012 - [info] Executing master ip online change script to 
disable write on the current master:
Tue Feb 21 19:29:19 2012 - [info]   /vol/mha/master_ip_online_change 
--command=stop --orig_master_host=xx.xxx.xxx.xxx 
--orig_master_ip=xx.xxx.xxx.xxx --orig_master_port=3306 
--new_master_host=xx.xxx.xxx.xxx --new_master_ip=xx.xxx.xxx.xxx 
--new_master_port=3306  
Got Error: DBI 
connect(';host=xx.xxx.xxx.xxx;port=3306;mysql_connect_timeout=4','root',...) 
failed: Access denied for user 'root'@'xx.xxx.xxx.xxx' (using password: YES) at 
/usr/local/share/perl5/MHA/DBHelper.pm line 181
 at /vol/mha/master_ip_online_change line 122

Tue Feb 21 19:29:20 2012 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, 
ln178] Got ERROR:  at /usr/local/bin/masterha_master_switch line 53
Tue Feb 21 19:29:20 2012 - [debug]  Already disconnected from 
xx.xxx.xxx.xxx(xx.xxx.xxx.xxx:3306)
Tue Feb 21 19:29:20 2012 - [debug]  Disconnected from 
xx.xxx.xxx.xxx(xx.xxx.xxx.xxx:3306)
Tue Feb 21 19:29:20 2012 - [debug]  Disconnected from 
xx.xxx.xxx.xxx(xx.xxx.xxx.xxx:3306)

Am I doing something wrong? Automatic failover with masterha_manager running 
seems to work fine. I can also connect to each MySQL host from all the 
databases involved in the cluster.

Original issue reported on code.google.com by [email protected] on 21 Feb 2012 at 7:33

masterha_master_switch does not configure slave replication

What steps will reproduce the problem?

1. I have testdb01 and testdb02 as master and slaves. I am switching to new 
master testdb02
2. masterha_master_switch --conf=cluster.conf --master_state=alive 
--new_master_host=testdb02

What is the expected output? What do you see instead?

 The script should configure testdb02 as master and testdb01 as a slave of testdb02. Instead it spews the following output.
===================================
Wed Sep  7 00:30:49 2011 - [info] MHA::MasterRotate version 0.51.
Wed Sep  7 00:30:49 2011 - [info] Starting online master switch..
Wed Sep  7 00:30:49 2011 - [info] 
Wed Sep  7 00:30:49 2011 - [info] * Phase 1: Configuration Check Phase..
Wed Sep  7 00:30:49 2011 - [info] 
Wed Sep  7 00:30:49 2011 - [warn] Global configuration file 
/etc/masterha_default.cnf not found. Skipping.
Wed Sep  7 00:30:49 2011 - [info] Reading application default configurations 
from cluster.conf..
Wed Sep  7 00:30:49 2011 - [info] Reading server configurations from 
cluster.conf..
Wed Sep  7 00:30:49 2011 - [info] Current Master: testdb01(192.168.12.10:3306)
Wed Sep  7 00:30:49 2011 - [info] Alive Slaves:
Wed Sep  7 00:30:49 2011 - [info]   testdb02(192.168.12.11:3306)  
Version=5.1.52-community-log (oldest major version between slaves) 
log-bin:enabled
Wed Sep  7 00:30:49 2011 - [info]     Replicating from 
testdb01(192.168.12.10:3306)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before 
switching. Is it ok to execute on testdb01(192.168.12.10:3306)? (YES/no): yes
Wed Sep  7 00:31:01 2011 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. 
This may take long time..
Wed Sep  7 00:31:01 2011 - [info]  ok.
Wed Sep  7 00:31:01 2011 - [info] Checking MHA is not monitoring or doing 
failover..
Wed Sep  7 00:31:01 2011 - [info] Checking replication health on testdb02..
Wed Sep  7 00:31:01 2011 - [info]  ok.
Wed Sep  7 00:31:01 2011 - [info] testdb02 can be new master.
Wed Sep  7 00:31:01 2011 - [info] 
From:
testdb01 (current master)
 +--testdb02

To:
testdb02 (new master)

Starting master switch from testdb01(192.168.12.10:3306) to 
testdb02(192.168.12.11:3306)? (yes/NO): yes
Wed Sep  7 00:31:19 2011 - [info] ** Phase 1: Configuration Check Phase 
completed.
Wed Sep  7 00:31:19 2011 - [info] 
Wed Sep  7 00:31:19 2011 - [info] * Phase 2: Rejecting updates Phase..
Wed Sep  7 00:31:19 2011 - [info] 
master_ip_online_change_script is not defined. If you do not disable writes on 
the current master manually, applications keep writing on the current master. 
Is it ok to proceed? (yes/NO): yes
Wed Sep  7 00:31:40 2011 - [info] Locking all tables on the orig master to 
reject updates from everybody (including root):
Wed Sep  7 00:31:40 2011 - [info] Executing FLUSH TABLES WITH READ LOCK..
Wed Sep  7 00:31:40 2011 - [info]  ok.
Wed Sep  7 00:31:40 2011 - [info] Orig master binlog:pos is 
mysql-bin.000005:519.
Wed Sep  7 00:31:40 2011 - [info]  Waiting to execute all relay logs on 
testdb02(192.168.12.11:3306)..
Wed Sep  7 00:31:40 2011 - [info]  master_pos_wait(mysql-bin.000005:519) 
completed on testdb02(192.168.12.11:3306). Executed 0 events.
Wed Sep  7 00:31:40 2011 - [info]   done.
Wed Sep  7 00:31:40 2011 - [info] Getting new master's binlog name and 
position..
Wed Sep  7 00:31:40 2011 - [info]  mysql-bin.000004:106
Wed Sep  7 00:31:40 2011 - [info]  All other slaves should start replication 
from here. Statement should be: CHANGE MASTER TO MASTER_HOST='testdb02 or 
192.168.12.11', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000004', 
MASTER_LOG_POS=106, MASTER_USER='slave', MASTER_PASSWORD='xxx';
Wed Sep  7 00:31:40 2011 - [info] 
Wed Sep  7 00:31:40 2011 - [info] * Switching slaves in parallel..
Wed Sep  7 00:31:40 2011 - [info] 
Wed Sep  7 00:31:40 2011 - [info] Unlocking all tables on the orig master:
Wed Sep  7 00:31:40 2011 - [info] Executing UNLOCK TABLES..
Wed Sep  7 00:31:40 2011 - [info]  ok.
Wed Sep  7 00:31:40 2011 - [info] All new slave servers switched successfully.
Wed Sep  7 00:31:40 2011 - [info] 
Wed Sep  7 00:31:40 2011 - [info] * Phase 5: New master cleanup phease..
Wed Sep  7 00:31:40 2011 - [info] 
Wed Sep  7 00:31:40 2011 - [info] Switching master to 
testdb02(192.168.12.11:3306) completed successfully.
===============================

I want the script to configure the new slave so that it connects to the new master and receives updates from it.

What version of the product are you using? On what operating system?

CentOS 5.6.

Please provide any additional information below.

I have only two hosts defined in cluster.conf: testdb01 and testdb02.
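
For reference, the log above already prints the CHANGE MASTER TO statement that other slaves should use, so the old master could be pointed at the new master by hand. The following is only a minimal sketch: `build_change_master` is a hypothetical helper name, the values are copied from the log, and `xxx` is the redacted password placeholder, not a real credential. Another report in this tracker passes `--orig_master_is_new_slave` to masterha_master_switch; whether version 0.51 supports that option is not confirmed here.

```shell
#!/bin/bash
# Hypothetical helper (not part of MHA): prints a CHANGE MASTER TO statement
# for the given host, port, binlog file, binlog position, user and password.
build_change_master() {
    local host="$1" port="$2" log_file="$3" log_pos="$4" user="$5" password="$6"
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%s, MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s, MASTER_USER='%s', MASTER_PASSWORD='%s';\n" \
        "$host" "$port" "$log_file" "$log_pos" "$user" "$password"
}

# Values taken from the switch log; the password is the redacted placeholder.
build_change_master testdb02 3306 mysql-bin.000004 106 slave xxx

# To apply it on the old master (assumption: the mysql client can log in there):
#   build_change_master testdb02 3306 mysql-bin.000004 106 slave xxx | mysql -h testdb01 -u root -p
#   mysql -h testdb01 -u root -p -e 'START SLAVE'
```

Keeping the statement in a function makes it easy to review the exact SQL before piping it into the mysql client.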

Original issue reported on code.google.com by [email protected] on 7 Sep 2011 at 6:04

Dead slave during the switch

What steps will reproduce the problem?
1. Stop the original master.
2. While the MHA monitor is promoting slave A to master, power down slave A.
3. Check the log.
-------------------
What is the expected output? What do you see instead?
I am not sure whether this behaviour is by design, but I would expect the
manager, when it detects that the slave is not reachable via SSH, to try
another slave (my test environment is 1 master and 3 slaves).
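
To illustrate the behaviour I expected, here is a rough sketch. It is not MHA code: `ssh_reachable` and `first_reachable` are hypothetical helper names, and the SSH options are just one way to probe a host without hanging on a password prompt.

```shell
#!/bin/bash
# Hypothetical SSH probe: succeeds only if a non-interactive login works.
ssh_reachable() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true >/dev/null 2>&1
}

# Print the first host for which the check command succeeds.
# Usage: first_reachable CHECK_COMMAND HOST...
first_reachable() {
    local check="$1" host
    shift
    for host in "$@"; do
        if "$check" "$host"; then
            printf '%s\n' "$host"
            return 0
        fi
    done
    # No candidate passed the check.
    return 1
}

# Example (not run here), with the three slaves from my test environment:
#   first_reachable ssh_reachable ip-10-0-1-248 ip-10-0-1-49 ip-10-0-1-171
```

The check command is passed in as a parameter so the selection logic can be exercised without real SSH access.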

What version of the product are you using? On what operating system?
Ubuntu Linux 12.04, MHA 0.53

Please provide any additional information below.
Please see the log below. At the line "Fri Dec  7 17:05:33 2012 - [warning] 
HealthCheck: SSH to ip-10-0-1-248 is NOT reachable." the manager knows that the 
elected master is not reachable, and it fails the switch. (Would it make sense 
to run a second check?)

Thanks 

root@ip-10-0-1-45:/var/log/masterha# tail -f app1.log
Fri Dec  7 17:03:44 2012 - [debug]  Disconnected from 
ip-10-0-1-248(10.0.1.248:3306)
Fri Dec  7 17:03:44 2012 - [debug]  Disconnected from 
ip-10-0-1-49(10.0.1.49:3306)
Fri Dec  7 17:03:44 2012 - [debug]  Disconnected from 
ip-10-0-1-171(10.0.1.171:3306)
Fri Dec  7 17:03:44 2012 - [debug] SSH check command: save_binary_logs 
--command=test --start_pos=4 --binlog_dir=/var/log/mysql 
--output_file=/var/log/masterha/app1/save_binary_logs_test 
--manager_version=0.53 --binlog_prefix=mysql-bin --debug 
Fri Dec  7 17:03:44 2012 - [info] Set master ping interval 3 seconds.
Fri Dec  7 17:03:44 2012 - [warning] secondary_check_script is not defined. It 
is highly recommended setting it to check master reachability from two or more 
routes.
Fri Dec  7 17:03:44 2012 - [info] Starting ping health check on 
ip-10-0-1-149(10.0.1.149:3306)..
Fri Dec  7 17:03:44 2012 - [debug] Connected on master.
Fri Dec  7 17:03:44 2012 - [debug] Set short wait_timeout on master: 6 seconds
Fri Dec  7 17:03:44 2012 - [info] Ping(SELECT) succeeded, waiting until MySQL 
doesn't respond..
Fri Dec  7 17:05:17 2012 - [warning] Got error on MySQL select ping: 2006 
(MySQL server has gone away)
Fri Dec  7 17:05:17 2012 - [info] Executing SSH check script: save_binary_logs 
--command=test --start_pos=4 --binlog_dir=/var/log/mysql 
--output_file=/var/log/masterha/app1/save_binary_logs_test 
--manager_version=0.53 --binlog_prefix=mysql-bin --debug 
Fri Dec  7 17:05:18 2012 - [info] HealthCheck: SSH to ip-10-0-1-149 is 
reachable.
Fri Dec  7 17:05:20 2012 - [warning] Got error on MySQL connect: 2003 (Can't 
connect to MySQL server on '10.0.1.149' (111))
Fri Dec  7 17:05:20 2012 - [warning] Connection failed 1 time(s)..
Fri Dec  7 17:05:23 2012 - [warning] Got error on MySQL connect: 2003 (Can't 
connect to MySQL server on '10.0.1.149' (111))
Fri Dec  7 17:05:23 2012 - [warning] Connection failed 2 time(s)..
Fri Dec  7 17:05:26 2012 - [warning] Got error on MySQL connect: 2003 (Can't 
connect to MySQL server on '10.0.1.149' (111))
Fri Dec  7 17:05:26 2012 - [warning] Connection failed 3 time(s)..
Fri Dec  7 17:05:26 2012 - [warning] Master is not reachable from health 
checker!
Fri Dec  7 17:05:26 2012 - [warning] Master ip-10-0-1-149(10.0.1.149:3306) is 
not reachable!
Fri Dec  7 17:05:26 2012 - [warning] SSH is reachable.
Fri Dec  7 17:05:26 2012 - [info] Connecting to a master server failed. Reading 
configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and 
trying to connect to all servers to check server status..
Fri Dec  7 17:05:26 2012 - [info] Reading default configuratoins from 
/etc/masterha_default.cnf..
Fri Dec  7 17:05:26 2012 - [info] Reading application default configurations 
from /etc/app1.cnf..
Fri Dec  7 17:05:26 2012 - [info] Reading server configurations from 
/etc/app1.cnf..
Fri Dec  7 17:05:26 2012 - [debug] Skipping connecting to dead master 
ip-10-0-1-149(10.0.1.149:3306).
Fri Dec  7 17:05:26 2012 - [debug] Connecting to servers..
Fri Dec  7 17:05:26 2012 - [debug]  Connected to: 
ip-10-0-1-248(10.0.1.248:3306), user=root
Fri Dec  7 17:05:26 2012 - [debug]  Connected to: ip-10-0-1-49(10.0.1.49:3306), 
user=root
Fri Dec  7 17:05:26 2012 - [debug]  Connected to: 
ip-10-0-1-171(10.0.1.171:3306), user=root
Fri Dec  7 17:05:26 2012 - [debug]  Comparing MySQL versions..
Fri Dec  7 17:05:26 2012 - [debug]   Comparing MySQL versions done.
Fri Dec  7 17:05:26 2012 - [debug] Connecting to servers done.
Fri Dec  7 17:05:26 2012 - [info] Dead Servers:
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-149(10.0.1.149:3306)
Fri Dec  7 17:05:26 2012 - [info] Alive Servers:
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-248(10.0.1.248:3306)
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-49(10.0.1.49:3306)
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-171(10.0.1.171:3306)
Fri Dec  7 17:05:26 2012 - [info] Alive Slaves:
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-248(10.0.1.248:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:26 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:26 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-49(10.0.1.49:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:26 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:26 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-171(10.0.1.171:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:26 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:26 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:26 2012 - [info] Checking slave configurations..
Fri Dec  7 17:05:26 2012 - [info]  read_only=1 is not set on slave 
ip-10-0-1-248(10.0.1.248:3306).
Fri Dec  7 17:05:26 2012 - [warning]  relay_log_purge=0 is not set on slave 
ip-10-0-1-248(10.0.1.248:3306).
Fri Dec  7 17:05:26 2012 - [info]  read_only=1 is not set on slave 
ip-10-0-1-49(10.0.1.49:3306).
Fri Dec  7 17:05:26 2012 - [warning]  relay_log_purge=0 is not set on slave 
ip-10-0-1-49(10.0.1.49:3306).
Fri Dec  7 17:05:26 2012 - [info]  read_only=1 is not set on slave 
ip-10-0-1-171(10.0.1.171:3306).
Fri Dec  7 17:05:26 2012 - [warning]  relay_log_purge=0 is not set on slave 
ip-10-0-1-171(10.0.1.171:3306).
Fri Dec  7 17:05:26 2012 - [info] Checking replication filtering settings..
Fri Dec  7 17:05:26 2012 - [info]  Replication filtering check ok.
Fri Dec  7 17:05:26 2012 - [info] Master is down!
Fri Dec  7 17:05:26 2012 - [info] Terminating monitoring script.
Fri Dec  7 17:05:26 2012 - [info] Got exit code 20 (Master dead).
Fri Dec  7 17:05:26 2012 - [info] MHA::MasterFailover version 0.53.
Fri Dec  7 17:05:26 2012 - [info] Starting master failover.
Fri Dec  7 17:05:26 2012 - [info] 
Fri Dec  7 17:05:26 2012 - [info] * Phase 1: Configuration Check Phase..
Fri Dec  7 17:05:26 2012 - [info] 
Fri Dec  7 17:05:26 2012 - [debug] Skipping connecting to dead master 
ip-10-0-1-149.
Fri Dec  7 17:05:26 2012 - [debug] Connecting to servers..
Fri Dec  7 17:05:26 2012 - [debug]  Connected to: 
ip-10-0-1-248(10.0.1.248:3306), user=root
Fri Dec  7 17:05:26 2012 - [debug]  Connected to: ip-10-0-1-49(10.0.1.49:3306), 
user=root
Fri Dec  7 17:05:26 2012 - [debug]  Connected to: 
ip-10-0-1-171(10.0.1.171:3306), user=root
Fri Dec  7 17:05:26 2012 - [debug]  Comparing MySQL versions..
Fri Dec  7 17:05:26 2012 - [debug]   Comparing MySQL versions done.
Fri Dec  7 17:05:26 2012 - [debug] Connecting to servers done.
Fri Dec  7 17:05:26 2012 - [info] Dead Servers:
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-149(10.0.1.149:3306)
Fri Dec  7 17:05:26 2012 - [info] Checking master reachability via mysql(double 
check)..
Fri Dec  7 17:05:26 2012 - [info]  ok.
Fri Dec  7 17:05:26 2012 - [info] Alive Servers:
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-248(10.0.1.248:3306)
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-49(10.0.1.49:3306)
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-171(10.0.1.171:3306)
Fri Dec  7 17:05:26 2012 - [info] Alive Slaves:
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-248(10.0.1.248:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:26 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:26 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-49(10.0.1.49:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:26 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:26 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:26 2012 - [info]   ip-10-0-1-171(10.0.1.171:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:26 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:26 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:26 2012 - [info] ** Phase 1: Configuration Check Phase 
completed.
Fri Dec  7 17:05:26 2012 - [info] 
Fri Dec  7 17:05:26 2012 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Dec  7 17:05:26 2012 - [info] 
Fri Dec  7 17:05:26 2012 - [debug]  Stopping IO thread on 
ip-10-0-1-248(10.0.1.248:3306)..
Fri Dec  7 17:05:26 2012 - [debug]  Stopping IO thread on 
ip-10-0-1-49(10.0.1.49:3306)..
Fri Dec  7 17:05:26 2012 - [debug]  Stop IO thread on 
ip-10-0-1-248(10.0.1.248:3306) done.
Fri Dec  7 17:05:26 2012 - [info] Forcing shutdown so that applications never 
connect to the current master..
Fri Dec  7 17:05:26 2012 - [debug]  Stopping IO thread on 
ip-10-0-1-171(10.0.1.171:3306)..
Fri Dec  7 17:05:26 2012 - [debug]  Stop IO thread on 
ip-10-0-1-49(10.0.1.49:3306) done.
Fri Dec  7 17:05:26 2012 - [info] Executing master IP deactivatation script:
Fri Dec  7 17:05:26 2012 - [info]   /opt/scripts/master_ip_failover 
--orig_master_host=ip-10-0-1-149 --orig_master_ip=10.0.1.149 
--orig_master_port=3306 --command=stopssh --ssh_user=root  
Fri Dec  7 17:05:26 2012 - [debug]  Stop IO thread on 
ip-10-0-1-171(10.0.1.171:3306) done.
Fri Dec  7 17:05:27 2012 - [info]  done.
Fri Dec  7 17:05:27 2012 - [warning] shutdown_script is not set. Skipping 
explicit shutting down of the dead master.
Fri Dec  7 17:05:27 2012 - [info] * Phase 2: Dead Master Shutdown Phase 
completed.
Fri Dec  7 17:05:27 2012 - [info] 
Fri Dec  7 17:05:27 2012 - [info] * Phase 3: Master Recovery Phase..
Fri Dec  7 17:05:27 2012 - [info] 
Fri Dec  7 17:05:27 2012 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Dec  7 17:05:27 2012 - [info] 
Fri Dec  7 17:05:27 2012 - [debug] Fetching current slave status..
Fri Dec  7 17:05:27 2012 - [debug]  Fetching current slave status done.
Fri Dec  7 17:05:27 2012 - [info] The latest binary log file/position on all 
slaves is mysql-bin.000009:82776781
Fri Dec  7 17:05:27 2012 - [info] Latest slaves (Slaves that received relay log 
files to the latest):
Fri Dec  7 17:05:27 2012 - [info]   ip-10-0-1-248(10.0.1.248:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:27 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:27 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:27 2012 - [info]   ip-10-0-1-49(10.0.1.49:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:27 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:27 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:27 2012 - [info]   ip-10-0-1-171(10.0.1.171:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:27 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:27 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:27 2012 - [info] The oldest binary log file/position on all 
slaves is mysql-bin.000009:82776781
Fri Dec  7 17:05:27 2012 - [info] Oldest slaves:
Fri Dec  7 17:05:27 2012 - [info]   ip-10-0-1-248(10.0.1.248:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:27 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:27 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:27 2012 - [info]   ip-10-0-1-49(10.0.1.49:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:27 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:27 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:27 2012 - [info]   ip-10-0-1-171(10.0.1.171:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:27 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:27 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:27 2012 - [info] 
Fri Dec  7 17:05:27 2012 - [info] * Phase 3.2: Saving Dead Master's Binlog 
Phase..
Fri Dec  7 17:05:27 2012 - [info] 
Fri Dec  7 17:05:27 2012 - [info] Fetching dead master's binary logs..
Fri Dec  7 17:05:27 2012 - [info] Executing command on the dead master 
ip-10-0-1-149(10.0.1.149:3306): save_binary_logs --command=save 
--start_file=mysql-bin.000009  --start_pos=82776781 --binlog_dir=/var/log/mysql 
--output_file=/var/log/masterha/app1/saved_master_binlog_from_ip-10-0-1-149_3306
_20121207170526.binlog --handle_raw_binlog=1 --disable_log_bin=0 
--manager_version=0.53 --debug 
  Creating /var/log/masterha/app1 if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000009 pos 82776781 to mysql-bin.000009 EOF into /var/log/masterha/app1/saved_master_binlog_from_ip-10-0-1-149_3306_20121207170526.binlog ..
parse_init_headers: file=mysql-bin.000009 event_type=15 server_id=10 length=103 
nextmpos=107 prevrelay=4 cur(post)relay=107
parse_init_headers: file=mysql-bin.000009 event_type=2 server_id=10 length=78 
nextmpos=185 prevrelay=107 cur(post)relay=185
  Dumping binlog format description event, from position 0 to 107.. ok.
  Dumping effective binlog data from /var/log/mysql/mysql-bin.000009 position 82776781 to tail(82777069).. ok.
parse_init_headers: 
file=saved_master_binlog_from_ip-10-0-1-149_3306_20121207170526.binlog 
event_type=15 server_id=10 length=103 nextmpos=107 prevrelay=4 
cur(post)relay=107
parse_init_headers: 
file=saved_master_binlog_from_ip-10-0-1-149_3306_20121207170526.binlog 
event_type=2 server_id=10 length=78 nextmpos=82776859 prevrelay=107 
cur(post)relay=185
 Concat succeeded.
Fri Dec  7 17:05:29 2012 - [info] scp from 
[email protected]:/var/log/masterha/app1/saved_master_binlog_from_ip-10-0-1-149_33
06_20121207170526.binlog to 
local:/var/log/masterha/app1/saved_master_binlog_from_ip-10-0-1-149_3306_2012120
7170526.binlog succeeded.
Fri Dec  7 17:05:33 2012 - [warning] HealthCheck: SSH to ip-10-0-1-248 is NOT 
reachable.
Fri Dec  7 17:05:34 2012 - [info] HealthCheck: SSH to ip-10-0-1-49 is reachable.
Fri Dec  7 17:05:35 2012 - [info] HealthCheck: SSH to ip-10-0-1-171 is 
reachable.
Fri Dec  7 17:05:35 2012 - [info] 
Fri Dec  7 17:05:35 2012 - [info] * Phase 3.3: Determining New Master Phase..
Fri Dec  7 17:05:35 2012 - [info] 
Fri Dec  7 17:05:35 2012 - [info] Finding the latest slave that has all relay 
logs for recovering other slaves..
Fri Dec  7 17:05:35 2012 - [info] All slaves received relay logs to the same 
position. No need to resync each other.
Fri Dec  7 17:05:35 2012 - [info] Dead Servers:
Fri Dec  7 17:05:35 2012 - [info]   ip-10-0-1-149(10.0.1.149:3306)
Fri Dec  7 17:05:35 2012 - [info]   ip-10-0-1-248(10.0.1.248:3306) Not 
reachable via SSH  Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version 
between slaves) log-bin:enabled
Fri Dec  7 17:05:35 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:35 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:35 2012 - [info] Alive Slaves:
Fri Dec  7 17:05:35 2012 - [info]   ip-10-0-1-49(10.0.1.49:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:35 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:35 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:35 2012 - [info]   ip-10-0-1-171(10.0.1.171:3306)  
Version=5.5.28-0ubuntu0.12.04.2-log (oldest major version between slaves) 
log-bin:enabled
Fri Dec  7 17:05:35 2012 - [debug]    Relay log info repository: FILE
Fri Dec  7 17:05:35 2012 - [info]     Replicating from 
10.0.1.149(10.0.1.149:3306)
Fri Dec  7 17:05:35 2012 - 
[error][/usr/local/share/perl/5.14.2/MHA/ServerManager.pm, ln443]  Server 
ip-10-0-1-248(10.0.1.248:3306) is dead, but must be alive! Check server 
settings.
Fri Dec  7 17:05:35 2012 - 
[error][/usr/local/share/perl/5.14.2/MHA/ManagerUtil.pm, ln178] Got ERROR:  at 
/usr/local/share/perl/5.14.2/MHA/MasterFailover.pm line 1456
Fri Dec  7 17:05:35 2012 - [debug]  Disconnected from 
ip-10-0-1-49(10.0.1.49:3306)
Fri Dec  7 17:05:35 2012 - [debug]  Disconnected from 
ip-10-0-1-171(10.0.1.171:3306)
Fri Dec  7 17:05:35 2012 - [info] 

----- Failover Report -----

app1: MySQL Master failover ip-10-0-1-149

Master ip-10-0-1-149 is down!

Check MHA Manager logs at ip-10-0-1-45:/var/log/masterha/app1.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on ip-10-0-1-149.
The latest slave ip-10-0-1-248(10.0.1.248:3306) has all relay logs for recovery.
Got Error so couldn't continue failover from here.
Andrea Ceresoni

Original issue reported on code.google.com by [email protected] on 7 Dec 2012 at 5:26

Error while Testing master failover

What steps will reproduce the problem?
I set up two different versions of MySQL on the same machine and made one the 
master of the other, then followed all the steps given in

http://code.google.com/p/mysql-master-ha/wiki/Tutorial#Installing_MHA_Manager_on_host4%28manager_host%29


But when I kill the master to test master failover, I get the following output:

Tue Nov  8 21:24:28 2011 - [info] Ping succeeded, sleeping until it doesn't 
respond..
Tue Nov  8 21:24:49 2011 - [warning] Got error on MySQL ping: 2006 (MySQL 
server has gone away)
Tue Nov  8 21:24:49 2011 - [info] Executing seconary network check script: 
masterha_secondary_check -s remote_host1 -s remote_host2  --user=root  
--master_host=127.0.0.1  --master_ip=127.0.0.1  --master_port=3308
ssh: Could not resolve hostname remote_host1: Name or service not known
Monitoring server remote_host1 is NOT reachable!
Tue Nov  8 21:24:49 2011 - [warning] At least one of monitoring servers is not 
reachable from this script. This is likely network problem. Failover should not 
happen.
Tue Nov  8 21:24:49 2011 - [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Nov  8 21:24:52 2011 - [warning] Got error on MySQL connect: 2013 (Lost 
connection to MySQL server at 'reading initial communication packet', system 
error: 111)
Tue Nov  8 21:24:52 2011 - [warning] Connection failed 1 time(s)..
Tue Nov  8 21:24:55 2011 - [warning] Got error on MySQL connect: 2013 (Lost 
connection to MySQL server at 'reading initial communication packet', system 
error: 111)
Tue Nov  8 21:24:55 2011 - [warning] Connection failed 2 time(s)..
Tue Nov  8 21:24:58 2011 - [warning] Got error on MySQL connect: 2013 (Lost 
connection to MySQL server at 'reading initial communication packet', system 
error: 111)
Tue Nov  8 21:24:58 2011 - [warning] Connection failed 3 time(s)..
Tue Nov  8 21:24:58 2011 - [warning] Secondary network check script returned 
errors. Failover should not start so checking server status again. Check 
network settings for details.
Tue Nov  8 21:25:01 2011 - [warning] Got error on MySQL connect: 2013 (Lost 
connection to MySQL server at 'reading initial communication packet', system 
error: 111)
Tue Nov  8 21:25:01 2011 - [warning] Connection failed 1 time(s)..
Tue Nov  8 21:25:01 2011 - [info] Executing seconary network check script: 
masterha_secondary_check -s remote_host1 -s remote_host2  --user=root  
--master_host=127.0.0.1  --master_ip=127.0.0.1  --master_port=3308
ssh: Could not resolve hostname remote_host1: Name or service not known
Monitoring server remote_host1 is NOT reachable!
Tue Nov  8 21:25:01 2011 - [warning] At least one of monitoring servers is not 
reachable from this script. This is likely network problem. Failover should not 
happen.
Tue Nov  8 21:25:01 2011 - [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Nov  8 21:25:04 2011 - [warning] Got error on MySQL connect: 2013 (Lost 
connection to MySQL server at 'reading initial communication packet', system 
error: 111)
Tue Nov  8 21:25:04 2011 - [warning] Connection failed 2 time(s)..
Tue Nov  8 21:25:07 2011 - [warning] Got error on MySQL connect: 2013 (Lost 
connection to MySQL server at 'reading initial communication packet', system 
error: 111)
Tue Nov  8 21:25:07 2011 - [warning] Connection failed 3 time(s)..
Tue Nov  8 21:25:07 2011 - [warning] Secondary network check script returned 
errors. Failover should not start so checking server status again. Check 
network settings for details.
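
The ssh error in the log shows that `remote_host1`, one of the hosts passed to masterha_secondary_check with `-s`, does not resolve. A minimal sketch for checking the `-s` hosts before starting the manager follows; `secondary_check_hosts` and `unresolved_hosts` are hypothetical helper names, and `getent hosts` is assumed to be available on the manager host.

```shell
#!/bin/bash
# Print every argument that directly follows a -s flag, one per line.
secondary_check_hosts() {
    local want_host=0 arg
    for arg in "$@"; do
        if [ "$want_host" -eq 1 ]; then
            printf '%s\n' "$arg"
            want_host=0
        elif [ "$arg" = "-s" ]; then
            want_host=1
        fi
    done
}

# Print the hosts that cannot be resolved via the system resolver.
unresolved_hosts() {
    local host
    for host in "$@"; do
        getent hosts "$host" >/dev/null 2>&1 || printf '%s\n' "$host"
    done
}

# Example (not run here), using the arguments from the log:
#   unresolved_hosts $(secondary_check_hosts -s remote_host1 -s remote_host2 --user=root)
```

Any name printed by `unresolved_hosts` would need to be replaced with a real, reachable monitoring host in the secondary_check_script setting.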




What version of the product are you using? On what operating system?
Linux

Original issue reported on code.google.com by [email protected] on 8 Nov 2011 at 4:00

masterha_master_switch fails when not using port 3306 on all instances

What steps will reproduce the problem?
1. Configure two instances, both on port 3331
2. Set up instance B as a slave of instance A
3. masterha_master_switch --conf=/etc/mha.cnf --master_state=alive 
--new_master_host="hostb" --orig_master_is_new_slave

What is the expected output? What do you see instead?
masterha_master_switch --conf=/etc/mha.cnf --master_state=alive 
--new_master_host=10.30.70.54 --orig_master_is_new_slave
Wed Apr 11 11:27:26 2012 - [info] MHA::MasterRotate version 0.53.
Wed Apr 11 11:27:26 2012 - [info] Starting online master switch..
Wed Apr 11 11:27:26 2012 - [info] 
Wed Apr 11 11:27:26 2012 - [info] * Phase 1: Configuration Check Phase..
Wed Apr 11 11:27:26 2012 - [info] 
Wed Apr 11 11:27:26 2012 - [warning] Global configuration file 
/etc/masterha_default.cnf not found. Skipping.
Wed Apr 11 11:27:26 2012 - [info] Reading application default configurations 
from /etc/mha.cnf..
Wed Apr 11 11:27:26 2012 - [info] Reading server configurations from 
/etc/mha.cnf..
Wed Apr 11 11:27:26 2012 - [info] Current Alive Master: 
10.30.36.132(10.30.36.132:3331)
Wed Apr 11 11:27:26 2012 - [info] Alive Slaves:
Wed Apr 11 11:27:26 2012 - [info]   10.30.70.54(10.30.70.54:3331)  
Version=5.3.6-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Apr 11 11:27:26 2012 - [info]     Replicating from 
10.30.36.132(10.30.36.132:3331)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before 
switching. Is it ok to execute on 10.30.36.132(10.30.36.132:3331)? (YES/no): 
YES    
Wed Apr 11 11:27:35 2012 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. 
This may take long time..
Wed Apr 11 11:27:35 2012 - [info]  ok.
Wed Apr 11 11:27:35 2012 - [info] Checking MHA is not monitoring or doing 
failover..
Wed Apr 11 11:27:35 2012 - [info] Checking replication health on 10.30.70.54..
Wed Apr 11 11:27:35 2012 - [info]  ok.
Wed Apr 11 11:27:35 2012 - [error][/usr/share/perl5/MHA/ServerManager.pm, 
ln1145] 10.30.70.54 is not alive!
Wed Apr 11 11:27:35 2012 - [error][/usr/share/perl5/MHA/MasterRotate.pm, ln232] 
Failed to get new master!
Wed Apr 11 11:27:35 2012 - [error][/usr/share/perl5/MHA/ManagerUtil.pm, ln178] 
Got ERROR:  at /usr/bin/masterha_master_switch line 53


What version of the product are you using? On what operating system?
ii  mha4mysql-manager                   0.53                         Master 
High Availability Manager and Tools for MySQL, Manager Package
ii  mha4mysql-node                      0.53                         Master 
High Availability Manager and Tools for MySQL, Node Package

Debian 6.0.3



Please provide any additional information below.

When I reconfigure the MySQL instances to use port 3306 and update the 
mha.cnf file accordingly, it all works fine.


Broken config
[server default]
# mysql user and password
user=xxxxx
password=xxxxx
repl_user=xxxx
repl_password=xxxxx

# working directory on the manager
manager_workdir=/var/log/masterha/app1

# manager log file
manager_log=/var/log/masterha/app1/app1.log

# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1

[server1]
hostname=10.30.36.132
port=3331

[server2]
hostname=10.30.70.54
port=3331

Original issue reported on code.google.com by [email protected] on 11 Apr 2012 at 7:59

[patch] wrong perl vendor dir for rhel6

On the default perl install on RHEL6, the vendor dir is 
/usr/share/perl5/vendor_lib. The spec files hardcode /usr/lib/perl5/vendor_lib 
and make the packages unusable on RHEL6.

Original issue reported on code.google.com by petefbsd on 18 Nov 2011 at 2:01

Attachments:

masterha_manager needs to be good daemon

I wrote a script for starting and stopping masterha_manager as a system 
service (Ubuntu).

You advise using nohup or daemontools:
http://code.google.com/p/mysql-master-ha/wiki/Runnning_Background
I decided to use nohup.

My script is still very simple:

#############
#############
#############
#!/bin/bash

RETVAL=0

do_start() {
        echo "Starting"
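        nohup masterha_manager --conf=/etc/mha_manager/app1.cnf < /dev/null > /home/mha4mysql/app1.log 2>&1 &
        # $? after "&" only reports that the job was launched, not whether masterha_manager passed its checks
        RETVAL=$?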
        echo
        return $RETVAL
}

do_stop() {
        echo "Stopping"
        masterha_stop --conf=/etc/mha_manager/app1.cnf
        RETVAL=$?
        echo
        return $RETVAL
}

do_stop_force() {
        echo "Stopping"
        masterha_stop --abort --conf=/etc/mha_manager/app1.cnf
        RETVAL=$?
        echo
        return $RETVAL
}

case "$1" in

start)
        do_start
        ;;

stop)
        do_stop
        ;;

stop_force)
        do_stop_force
        ;;

*)
        echo "usage: $0 {start|stop|stop_force}" >&2

        exit 1
        ;;
esac

exit $RETVAL
#############
#############
#############


When I start masterha_manager, I cannot tell whether it started successfully. 
Usually a daemon runs all of its checks, then forks and returns 0 to signal 
that everything is fine, or a non-zero code when there is an error. 
That exit status is important information about whether the start succeeded. 
When I start masterha_manager I cannot get any status code, because it does not 
fork and keeps running as a single foreground process. If I start it with nohup 
I receive 0, but that only tells me the command was launched and nothing more. 
masterha_manager can still stop with an error a little later, once it runs its 
checks and finds a problem.

I would like to get the return code from masterha_manager, but with nohup I 
currently cannot.

I think masterha_manager should be able to start as a normal daemon with fork. 
It should run all of its checks and fork only when everything is fine, 
returning 0 once it has forked. Otherwise, when any check fails, it should not 
fork and must return an error code.

Or how else can I write a service script?

PS: The same applies to masterha_stop: it always returns 0. Even when it cannot 
find the manager process, it reports that but still returns 0. 
For example:

root@:/home/mha4mysql# ./mha4mysql.servise stop
Stopping
MHA Manager is not running on app1(2:NOT_RUNNING).

root@:/home/mha4mysql# echo $?
0
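
One possible workaround, sketched below, is to launch the manager in the background and then poll for a short while to see whether it is actually up. This is only a sketch under assumptions: `wait_until_ok` and this version of `do_start` are hypothetical, and it assumes that `masterha_check_status --conf=...` exits with 0 only while the manager is running, which I have not verified for every MHA version.

```shell
#!/bin/bash
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until_ok ATTEMPTS DELAY_SECONDS COMMAND [ARG...]
wait_until_ok() {
    local attempts="$1" delay="$2" i=0
    shift 2
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Hypothetical replacement for do_start in the script above.
do_start() {
    local conf=/etc/mha_manager/app1.cnf
    echo "Starting"
    nohup masterha_manager --conf="$conf" < /dev/null > /home/mha4mysql/app1.log 2>&1 &
    # Assumption: masterha_check_status exits 0 only while the manager is running.
    wait_until_ok 10 1 masterha_check_status --conf="$conf"
}
```

With this, `do_start` returns non-zero if the manager has not come up within about ten seconds, which gives the service script a usable status. It does not help if the manager dies after that window.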

Original issue reported on code.google.com by [email protected] on 4 Dec 2012 at 2:04

masterha_check_repl error

1. Try start masterha_check_repl --conf=/etc/masterha/app1.cnf
2. Get this error:
[root@EGSNS-49-2 bin]# ./masterha_check_repl --conf=/etc/masterha/app1.cnf 
Fri Jun  1 17:37:44 2012 - [warning] Global configuration file 
/etc/masterha_default.cnf not found. Skipping.
Fri Jun  1 17:37:44 2012 - [info] Reading application default configurations 
from /etc/masterha/app1.cnf..
Fri Jun  1 17:37:44 2012 - [info] Reading server configurations from 
/etc/masterha/app1.cnf..
Fri Jun  1 17:37:44 2012 - [info] MHA::MasterMonitor version 0.53.
Fri Jun  1 17:37:44 2012 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, 
ln255] install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted.
Compilation failed in require at (eval 43) line 3.

 at /usr/local/share/perl5/MHA/DBHelper.pm line 181
 at /usr/local/share/perl5/MHA/Server.pm line 166
Fri Jun  1 17:37:44 2012 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, 
ln255] install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted.
Compilation failed in require at (eval 43) line 3.

 at /usr/local/share/perl5/MHA/DBHelper.pm line 181
 at /usr/local/share/perl5/MHA/Server.pm line 166
Fri Jun  1 17:37:44 2012 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, 
ln255] install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted.
Compilation failed in require at (eval 43) line 3.

 at /usr/local/share/perl5/MHA/DBHelper.pm line 181
 at /usr/local/share/perl5/MHA/Server.pm line 166
Fri Jun  1 17:37:44 2012 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, 
ln255] install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted.
Compilation failed in require at (eval 43) line 3.

 at /usr/local/share/perl5/MHA/DBHelper.pm line 181
 at /usr/local/share/perl5/MHA/Server.pm line 166
Fri Jun  1 17:37:44 2012 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, 
ln255] install_driver(mysql) failed: Attempt to reload DBD/mysql.pm aborted.
Compilation failed in require at (eval 43) line 3.

 at /usr/local/share/perl5/MHA/DBHelper.pm line 181
 at /usr/local/share/perl5/MHA/Server.pm line 166
Fri Jun  1 17:37:44 2012 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, 
ln263] Got fatal error, stopping operations
Fri Jun  1 17:37:44 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, 
ln383] Error happend on checking configurations.  at 
/usr/local/share/perl5/MHA/MasterMonitor.pm line 298
Fri Jun  1 17:37:44 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, 
ln478] Error happened on monitoring servers.
Fri Jun  1 17:37:44 2012 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

Please help me understand why this error happens.

I have installed the perl-DBD-MySQL package on every node.

3. This is my /etc/masterha/app1.cnf
[server default]
manager_workdir=/masterha/app1
manager_log=/masterha/app1/manager.log
remote_workdir=/
user=root
password=emag@234
ssh_user=root
repl_user=rep
repl_password=rep
shutdown_script=""
master_ip_failover_script="/apps/mha4mysql-manager-0.53/samples/scripts/master_ip_failover"
report_script=""
remote_workdir=/apps/mha4mysql-node-0.53/section1
[server1]
hostname=192.168.49.9
[server2]
hostname=192.168.49.2
candidate_master=1
[server3]
hostname=192.168.49.1
[server4]
hostname=192.168.49.3
[server5]
hostname=192.168.49.4





OS:redhat 6.2 64bit   
Mysql: 5.5.22( build from source )
basedir: /apps/mysql
datadir: /apps/mysql/data

current master:192.168.49.9 
standby master:192.168.49.1
masterha_check_ssh:OK

Please help me check this and give me advice as soon as possible.

Thanks,
[email protected]

Original issue reported on code.google.com by [email protected] on 1 Jun 2012 at 10:08

el5 rpm not packaged properly for el5

What steps will reproduce the problem?
1. Log in to an el5 system (CentOS 5.5 in this case)
2. rpm -ivh mha4mysql-node-0.54-0.el5.noarch.rpm 

What is the expected output? What do you see instead?
EXPECTED:
Preparing...                ########################################### [100%]
   1:mha4mysql-node         ########################################### [100%]

ACTUAL:
error: Failed dependencies:
        rpmlib(FileDigests) <= 4.6.0-1 is needed by mha4mysql-node-0.54-0.el5.noarch
        rpmlib(PayloadIsXz) <= 5.2-1 is needed by mha4mysql-node-0.54-0.el5.noarch


What version of the product are you using? On what operating system?
MHA 0.54 on CentOS 5.5

Please provide any additional information below.
The error output is the same when I try to install the el6 package, so I'm 
assuming the el5 package was built for el6 rather than for el5.

Original issue reported on code.google.com by [email protected] on 18 Dec 2012 at 6:56

modify ssh_port

Could mysql-master-ha add an option to set ssh_port? Many SSH servers do not 
run on the standard port 22.

Original issue reported on code.google.com by unix114 on 21 Oct 2011 at 8:08

possible bug in masterha_secondary_check

What steps will reproduce the problem?

I have a simple master-slave replication setup.
For the test:
172.16.50.11 - master
172.16.50.14 - slave

I am testing the functionality of secondary_check_script.
In the MHA conf I added:
secondary_check_script = masterha_secondary_check -s 172.16.50.14

On 172.16.50.11 I run masterha_master_monitor to see how failover behaves 
and how masterha_secondary_check works:

masterha_master_monitor --conf=/etc/mha_manager/app1.cnf

After starting the manager, I shut down the master 172.16.50.11 from another 
terminal.

Unfortunately, I got the following error messages:

#############
#############
Fri Nov 16 16:30:10 2012 - [info]
172.16.50.11 (current master)
 +--172.16.50.14

Fri Nov 16 16:30:10 2012 - [warning] master_ip_failover_script is not defined.
Fri Nov 16 16:30:10 2012 - [warning] shutdown_script is not defined.
Fri Nov 16 16:30:10 2012 - [info] Set master ping interval 3 seconds.
Fri Nov 16 16:30:10 2012 - [info] Set secondary check script: 
masterha_secondary_check -s 172.16.50.14
Fri Nov 16 16:30:10 2012 - [info] Starting ping health check on 
172.16.50.11(172.16.50.11:3306)..
Fri Nov 16 16:30:10 2012 - [info] Ping(SELECT) succeeded, waiting until MySQL 
doesn't respond..
Fri Nov 16 16:30:22 2012 - [warning] Got error on MySQL select ping: 2006 
(MySQL server has gone away)
Fri Nov 16 16:30:22 2012 - [info] Executing SSH check script: save_binary_logs 
--command=test --start_pos=4 --binlog_dir=/home/mysqldata/ 
--output_file=/home/mha_manager_data/app1/save_binary_logs_test 
--manager_version=0.54 --binlog_prefix=mysql-bin
Fri Nov 16 16:30:22 2012 - [info] Executing seconary network check script: 
masterha_secondary_check -s 172.16.50.14  --user=mha4mysql  
--master_host=172.16.50.11  --master_ip=172.16.50.11  --master_port=3306

command-line line 0: invalid time value.
Monitoring server 172.16.50.14 is NOT reachable!
Fri Nov 16 16:30:22 2012 - [warning] At least one of monitoring servers is not 
reachable from this script. This is likely network problem. Failover should not 
happen.
  Creating /home/mha_manager_data/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /home/mysqldata/, up to mysql-bin.000016
Fri Nov 16 16:30:23 2012 - [info] HealthCheck: SSH to 172.16.50.11 is reachable.
#############
#############

So I started debugging.
In masterha_secondary_check, line 78 is where $command is constructed.

I added "print $command" and got the constructed command:

ssh -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes 
-o ConnectTimeout=VAR_CONNECT_TIMEOUT -p 22 [email protected] "perl -e 
\"use IO::Socket::INET; my \\\$sock = IO::Socket::INET->new(PeerAddr => 
\\\"172.16.50.11\\\", PeerPort=> 3306, Proto =>'tcp', Timeout => 4); 
if(\\\$sock) { close(\\\$sock); exit 3; } exit 0;\" "

For some reason the VAR_CONNECT_TIMEOUT placeholder is still present here.
If I comment out (or erase) the part with VAR_CONNECT_TIMEOUT, it works: the 
script can connect to 172.16.50.14 and mha_manager can use this check correctly.

Is this a bug, or did I forget to configure something in the cfg?
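The "invalid time value" message is consistent with ssh receiving the literal text VAR_CONNECT_TIMEOUT as the ConnectTimeout value. As an illustration only, the sketch below shows the kind of substitution that would have to happen before the command is run. The function name fill_connect_timeout and the value 5 are assumptions, not taken from the MHA source.

```sh
#!/bin/sh
# Hypothetical helper: replace the VAR_CONNECT_TIMEOUT placeholder in an
# ssh option string with a concrete number of seconds.
fill_connect_timeout() {
    printf '%s\n' "$1" | sed "s/VAR_CONNECT_TIMEOUT/$2/g"
}

opts='-o BatchMode=yes -o ConnectTimeout=VAR_CONNECT_TIMEOUT -p 22'
fill_connect_timeout "$opts" 5
```

With the placeholder filled in, ssh receives a numeric ConnectTimeout value as it expects.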

Original issue reported on code.google.com by [email protected] on 16 Nov 2012 at 1:09

When doing SSH checks, SSH authentication availability should not be checked from those hosts which have no_master set to 1

What steps will reproduce the problem?
1. Create a MHA config (/etc/mha_cluster_config) with some nodes specified with 
no_master=1
2. Run SSH check as follows: masterha_check_ssh --conf=/etc/mha_cluster_config
3. When you check the output of masterha_check_ssh you will see that it also 
tries to check SSH connections from the nodes that have no_master=1

What is the expected output? What do you see instead?
According to how MHA works, SSH connections only need to work when they 
originate from hosts that could become a master, because those hosts need to 
transfer differential relay logs to other slaves. However, when a node is 
defined with no_master=1, we are specifically asking MHA to make sure that this 
node is never considered for the master role, and hence we do not need to check 
whether nodes with no_master=1 can connect to other hosts. I suggest that when 
MHA checks SSH connections, it should only test that the candidate master nodes 
can connect to all the other nodes.
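The proposed behaviour can be sketched as follows. This is only an illustration of the suggestion, not MHA code: the function name ssh_check_pairs, the "host:no_master" input format and the host names are all hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch: list the source -> destination SSH checks that remain
# when hosts with no_master=1 are skipped as sources.
# Each argument has the form "host:no_master".
ssh_check_pairs() {
    for src in "$@"; do
        case "$src" in
            *:1) continue ;;        # this host can never become master
        esac
        for dst in "$@"; do
            if [ "$src" != "$dst" ]; then
                echo "${src%%:*} -> ${dst%%:*}"
            fi
        done
    done
}

ssh_check_pairs hostA:0 hostB:0 hostC:1
```

Here hostC is still checked as a destination, but no connections originating from it are tested.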

What version of the product are you using? On what operating system?
# rpm -qa | grep -i mha
mha4mysql-node-0.53-0.el6.noarch
mha4mysql-manager-0.53-0.el6.noarch

# uname -r
2.6.32-279.9.1.el6.x86_64

# cat /etc/redhat-release 
CentOS release 6.3 (Final)

Original issue reported on code.google.com by [email protected] on 19 Oct 2012 at 6:50

Question about data synchronisation between slaves when master server is fully dead

I am thinking about the following situation.
There are 4 machines:
m1 - current master
m2 - slave, candidate master
m3 - slave, candidate master
m4 - slave, cannot be master

They use asynchronous replication.
When m1 is completely dead (powered off, for example), mha_manager starts a 
failover. One of the failover steps is synchronising binlogs between slaves: MHA 
finds the slave with the newest replication position, downloads the binlog from 
it, and applies the diffs to the other slaves.

There is a chance that m4 will have the newest data.
Will MHA copy the binlog from m4 and apply the diffs to m2 and m3 if m4 is 
defined as "cannot be master"?
Or will MHA compare logs only between m2 and m3?
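To make the "newest replication position" step concrete, here is a small sketch that picks the most advanced slave from a list of positions. It assumes a position is compared by binlog file name first and byte offset second; the function name latest_slave and the sample positions are made up. Whether a "cannot be master" host takes part in this comparison is exactly the open question and is not settled by the sketch.

```sh
#!/bin/sh
# Hypothetical sketch: read "name binlog_file offset" lines on stdin and print
# the name of the slave with the newest position.
latest_slave() {
    # Sort by file name, then numerically by offset; the last line is newest.
    LC_ALL=C sort -k2,2 -k3,3n | tail -n 1 | cut -d' ' -f1
}

printf '%s\n' \
    'm2 mysql-bin.000016 120' \
    'm3 mysql-bin.000016 98' \
    'm4 mysql-bin.000016 455' | latest_slave
```

With this sample data m4 is selected, which is the case the question is about.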

Original issue reported on code.google.com by [email protected] on 20 Dec 2012 at 2:19

masterha_manager quits after the master server fails

1. I have set up masterha_manager and masterha_node, but when I kill the 
master's mysqld process, masterha_manager quits and the failover does not 
happen.

2. Below is the architecture I am testing:

      master                 candidate_master
   10.1.200.216 --------> 10.1.200.215           10.1.200.27
   masterha_node          masterha_node          masterha_manager & masterha_node
         \
          \
           \
          slave
       10.1.200.217


--------------------------------------------
Expected:
 after killall -9 mysqld on 10.1.200.216, the topology should become:

            master                           
         10.1.200.215             10.1.200.27   
          masterha_node         masterha_manager
              \                 
               \
                \   
              slave 
           10.1.200.217
               masterha_node


Actual:
  after killall -9 mysqld on 10.1.200.216, masterha_manager quits and nothing changes.


Some more info:

1. Install the MySQL packages on 10.1.200.215, 10.1.200.216, 10.1.200.217 and 
10.1.200.27:
     rpm -ivh    MySQL-server-5.5.16-1.linux2.6.x86_64.rpm 
     rpm -ivh    MySQL-devel-5.5.16-1.linux2.6.x86_64.rpm
     rpm -ivh    MySQL-client-5.5.16-1.linux2.6.x86_64.rpm

2. Install mha4mysql-node-0.52 on all MySQL nodes and on 10.1.200.27:
   cd mha4mysql-node-0.52;
   perl Makefile.PL&&make install 
   (unrelated steps omitted)

3. Install masterha_manager on 10.1.200.27:
    cd mha4mysql-manager-0.52
    perl Makefile.PL
    (unrelated steps omitted)
    make install 


4. The configuration on 10.1.200.27:
cat /etc/app1.cnf 
[server default]
  user=root
  password=
  manager_workdir=/var/log/masterha/app1
  manager_log=/var/log/masterha/app1/app1.log
  remote_workdir=/var/log/masterha/app1

  [server1]
  hostname=10.1.200.215
  candidate_master=1
  master_binlog_dir=/var/lib/mysql

  [server2]
  hostname=10.1.200.216
  master_binlog_dir=/var/lib/mysql

 [server3]
  hostname=10.1.200.217
  master_binlog_dir=/var/lib/mysql


  cat /etc/masterha_default.cnf 
  [server default]
  user=root
  password=
  ssh_user=root
  repl_user=slave
  repl_password= mysqlsalve
  master_binlog_dir= /var/lib/mysql
  remote_workdir=/data/log/masterha
  manager_log=/data/log/masterha/manager.log
  secondary_check_script= masterha_secondary_check -s 10.1.200.217 -s 10.1.200.215  --user=root --master_host=10.1.200.216
  ping_interval=3
  master_ip_failover_script= /usr/local/bin/master_ip_failover
  master_ip_online_change_script=/usr/local/bin/master_ip_online_change
  report_script=/usr/local/bin/send_report


The output of masterha_check_ssh:

masterha_check_ssh --conf=/etc/app1.cnf
Tue Dec 27 22:14:06 2011 - [info] Reading default configuratoins from 
/etc/masterha_default.cnf..
Tue Dec 27 22:14:06 2011 - [info] Reading application default configurations 
from /etc/app1.cnf..
Tue Dec 27 22:14:06 2011 - [info] Reading server configurations from 
/etc/app1.cnf..
Tue Dec 27 22:14:06 2011 - [info] Starting SSH connection tests..
Tue Dec 27 22:14:07 2011 - [debug] 
Tue Dec 27 22:14:06 2011 - [debug]  Connecting via SSH from 
[email protected](10.1.200.215) to [email protected](10.1.200.216)..
Tue Dec 27 22:14:06 2011 - [debug]   ok.
Tue Dec 27 22:14:06 2011 - [debug]  Connecting via SSH from 
[email protected](10.1.200.215) to [email protected](10.1.200.217)..
Tue Dec 27 22:14:07 2011 - [debug]   ok.
Tue Dec 27 22:14:07 2011 - [debug] 
Tue Dec 27 22:14:06 2011 - [debug]  Connecting via SSH from 
[email protected](10.1.200.216) to [email protected](10.1.200.215)..
Tue Dec 27 22:14:07 2011 - [debug]   ok.
Tue Dec 27 22:14:07 2011 - [debug]  Connecting via SSH from 
[email protected](10.1.200.216) to [email protected](10.1.200.217)..
Tue Dec 27 22:14:07 2011 - [debug]   ok.
Tue Dec 27 22:14:08 2011 - [debug] 
Tue Dec 27 22:14:07 2011 - [debug]  Connecting via SSH from 
[email protected](10.1.200.217) to [email protected](10.1.200.215)..
Tue Dec 27 22:14:07 2011 - [debug]   ok.
Tue Dec 27 22:14:07 2011 - [debug]  Connecting via SSH from 
[email protected](10.1.200.217) to [email protected](10.1.200.216)..
Tue Dec 27 22:14:08 2011 - [debug]   ok.
Tue Dec 27 22:14:08 2011 - [info] All SSH connection tests passed successfully. 





The output of masterha_check_repl --conf=/etc/app1.cnf:

Tue Dec 27 22:16:10 2011 - [info] Reading default configuratoins from 
/etc/masterha_default.cnf..
Tue Dec 27 22:16:10 2011 - [info] Reading application default configurations 
from /etc/app1.cnf..
Tue Dec 27 22:16:10 2011 - [info] Reading server configurations from 
/etc/app1.cnf..
Tue Dec 27 22:16:10 2011 - [info] MHA::MasterMonitor version 0.52.
Tue Dec 27 22:16:10 2011 - [info] Dead Servers:
Tue Dec 27 22:16:10 2011 - [info] Alive Servers:
Tue Dec 27 22:16:10 2011 - [info]   10.1.200.215(10.1.200.215:3306)
Tue Dec 27 22:16:10 2011 - [info]   10.1.200.216(10.1.200.216:3306)
Tue Dec 27 22:16:10 2011 - [info]   10.1.200.217(10.1.200.217:3306)
Tue Dec 27 22:16:10 2011 - [info] Alive Slaves:
Tue Dec 27 22:16:10 2011 - [info]   10.1.200.215(10.1.200.215:3306)  
Version=5.5.16-log (oldest major version between slaves) log-bin:enabled
Tue Dec 27 22:16:10 2011 - [info]     Replicating from 
10.1.200.216(10.1.200.216:3306)
Tue Dec 27 22:16:10 2011 - [info]     Primary candidate for the new Master 
(candidate_master is set)
Tue Dec 27 22:16:10 2011 - [info]   10.1.200.217(10.1.200.217:3306)  
Version=5.5.16-log (oldest major version between slaves) log-bin:enabled
Tue Dec 27 22:16:10 2011 - [info]     Replicating from 
10.1.200.216(10.1.200.216:3306)
Tue Dec 27 22:16:10 2011 - [info] Current Alive Master: 
10.1.200.216(10.1.200.216:3306)
Tue Dec 27 22:16:10 2011 - [info] Checking slave configurations..
Tue Dec 27 22:16:10 2011 - [warning]  read_only=1 is not set on slave 
10.1.200.215(10.1.200.215:3306).
Tue Dec 27 22:16:10 2011 - [warning]  relay_log_purge=0 is not set on slave 
10.1.200.215(10.1.200.215:3306).
Tue Dec 27 22:16:10 2011 - [warning]  read_only=1 is not set on slave 
10.1.200.217(10.1.200.217:3306).
Tue Dec 27 22:16:10 2011 - [warning]  relay_log_purge=0 is not set on slave 
10.1.200.217(10.1.200.217:3306).
Tue Dec 27 22:16:10 2011 - [info] Checking replication filtering settings..
Tue Dec 27 22:16:10 2011 - [info]  binlog_do_db= , binlog_ignore_db= 
Tue Dec 27 22:16:10 2011 - [info]  Replication filtering check ok.
Tue Dec 27 22:16:10 2011 - [info] Starting SSH connection tests..
Tue Dec 27 22:16:12 2011 - [info] All SSH connection tests passed successfully.
Tue Dec 27 22:16:12 2011 - [info] Checking MHA Node version..
Tue Dec 27 22:16:13 2011 - [info]  Version check ok.
Tue Dec 27 22:16:13 2011 - [info] Checking SSH publickey authentication and 
checking recovery script configurations on the current master..
Tue Dec 27 22:16:13 2011 - [info]   Executing command: save_binary_logs 
--command=test --start_file=mysql-bin.000007 --start_pos=4 
--binlog_dir=/var/lib/mysql 
--output_file=/var/log/masterha/app1/save_binary_logs_test 
--manager_version=0.52 
Tue Dec 27 22:16:13 2011 - [info]   Connecting to 
[email protected](10.1.200.216).. 
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000007
Tue Dec 27 22:16:14 2011 - [info] Master setting check done.
Tue Dec 27 22:16:14 2011 - [info] Checking SSH publickey authentication and 
checking recovery script configurations on all alive slave servers..
Tue Dec 27 22:16:14 2011 - [info]   Executing command : apply_diff_relay_logs 
--command=test --slave_user=root --slave_host=10.1.200.215 
--slave_ip=10.1.200.215 --slave_port=3306 --workdir=/var/log/masterha/app1 
--target_version=5.5.16-log --manager_version=0.52 
--relay_log_info=/var/lib/mysql/relay-log.info  --slave_pass=xxx
Tue Dec 27 22:16:14 2011 - [info]   Connecting to 
[email protected](10.1.200.215).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to yl-hyper-15-relay-bin.000014
    Temporary relay log file is /var/lib/mysql/yl-hyper-15-relay-bin.000014
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Dec 27 22:16:14 2011 - [info]   Executing command : apply_diff_relay_logs 
--command=test --slave_user=root --slave_host=10.1.200.217 
--slave_ip=10.1.200.217 --slave_port=3306 --workdir=/var/log/masterha/app1 
--target_version=5.5.16-log --manager_version=0.52 
--relay_log_info=/var/lib/mysql/relay-log.info  --slave_pass=xxx
Tue Dec 27 22:16:14 2011 - [info]   Connecting to 
[email protected](10.1.200.217).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to yl-hyper-17-relay-bin.000013
    Temporary relay log file is /var/lib/mysql/yl-hyper-17-relay-bin.000013
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Dec 27 22:16:14 2011 - [info] Slaves settings check done.
Tue Dec 27 22:16:14 2011 - [info] 
10.1.200.216 (current master)
 +--10.1.200.215
 +--10.1.200.217

Tue Dec 27 22:16:14 2011 - [info] Checking replication health on 10.1.200.215..
Tue Dec 27 22:16:14 2011 - [info]  ok.
Tue Dec 27 22:16:14 2011 - [info] Checking replication health on 10.1.200.217..
Tue Dec 27 22:16:14 2011 - [info]  ok.
Tue Dec 27 22:16:14 2011 - [info] Checking master_ip_failvoer_script status:
Tue Dec 27 22:16:14 2011 - [info]   /usr/local/bin/master_ip_failover 
--command=status --ssh_user=root --orig_master_host=10.1.200.216 
--orig_master_ip=10.1.200.216 --orig_master_port=3306
Tue Dec 27 22:16:14 2011 - [info]  OK.
Tue Dec 27 22:16:14 2011 - [warning] shutdown_script is not defined.
Tue Dec 27 22:16:14 2011 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.


I run the following commands to start masterha_manager on 10.1.200.27:
  mkdir -p /data/log/masterha;
  nohup masterha_manager --conf=/etc/app1.cnf < /dev/null > /data/log/masterha/manager.log 2>&1 &;

tail -f /data/log/masterha/manager.log
Tue Dec 27 21:39:59 2011 - [info] Reading default configuratoins from 
/etc/masterha_default.cnf..
Tue Dec 27 21:39:59 2011 - [info] Reading application default configurations 
from /etc/app1.cnf..
Tue Dec 27 21:39:59 2011 - [info] Reading server configurations from 
/etc/app1.cnf..
Tue Dec 27 21:40:33 2011 - [info] Reading default configuratoins from 
/etc/masterha_default.cnf..
Tue Dec 27 21:40:33 2011 - [info] Reading application default configurations 
from /etc/app1.cnf..
Tue Dec 27 21:40:33 2011 - [info] Reading server configurations from 
/etc/app1.cnf..

Original issue reported on code.google.com by [email protected] on 27 Dec 2011 at 2:43

MySQL Replication Health is NOT OK!

What steps will reproduce the problem?
1. Execute sudo ./masterha_check_repl --conf=/etc/app1.cnf

2. Expected output:
MySQL Replication Health

3. Output got:
Mon Oct 31 16:58:54 2011 - [info] Reading default configuratoins 
from /etc/masterha_default.cnf..
Mon Oct 31 16:58:54 2011 - [info] Reading application default configurations 
from /etc/app1.cnf..
Mon Oct 31 16:58:54 2011 - [info] Reading server configurations from 
/etc/app1.cnf..
Mon Oct 31 16:58:54 2011 - [info] MHA::MasterMonitor version 0.52.
Mon Oct 31 16:58:54 2011 - 
[error][/usr/local/share/perl/5.10.1/MHA/MasterMonitor.pm, ln315] Error happend 
on checking configurations. Use of uninitialized value $datadir in 
concatenation (.) or string at /usr/local/share/perl/5.10.1/MHA/SlaveUtil.pm 
line 123.
Mon Oct 31 16:58:54 2011 - 
[error][/usr/local/share/perl/5.10.1/MHA/MasterMonitor.pm, ln396] Error 
happened on monitoring servers.
Mon Oct 31 16:58:54 2011 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 31 Oct 2011 at 11:33

Password is passed incorrectly as a parameter to other scripts (needs escaping and quoting)

Hello!
I found a bug in how MHA handles its config file.

I have a strong MySQL password that contains symbols such as 
'$', '\', '%' and others, not just letters and digits.

For example, I write this in the MHA cnf: 
password=%DE&T^GF1

and then I execute 
masterha_check_repl --conf=/etc/mha_manager/app1.cnf

I get messages that the program could not connect to MySQL.

############################
############################
############################
Wed Nov 14 17:31:17 2012 - [warning] Global configuration file 
/etc/masterha_default.cnf not found. Skipping.
Wed Nov 14 17:31:17 2012 - [info] Reading application default configurations 
from /etc/mha_manager/app1.cnf..
Wed Nov 14 17:31:17 2012 - [info] Reading server configurations from 
/etc/mha_manager/app1.cnf..
Wed Nov 14 17:31:17 2012 - [info] MHA::MasterMonitor version 0.53.
Wed Nov 14 17:31:17 2012 - [info] Dead Servers:
Wed Nov 14 17:31:17 2012 - [info] Alive Servers:
Wed Nov 14 17:31:17 2012 - [info]   172.16.50.11(172.16.50.11:3306)
Wed Nov 14 17:31:17 2012 - [info]   172.16.50.14(172.16.50.14:3306)
Wed Nov 14 17:31:17 2012 - [info] Alive Slaves:
Wed Nov 14 17:31:17 2012 - [info]   172.16.50.11(172.16.50.11:3306)  
Version=5.5.28-MariaDB-log (oldest major version between slaves) log-bin:enabled
Wed Nov 14 17:31:17 2012 - [info]     Replicating from 
172.16.50.14(172.16.50.14:3306)
Wed Nov 14 17:31:17 2012 - [info] Current Alive Master: 
172.16.50.14(172.16.50.14:3306)
Wed Nov 14 17:31:17 2012 - [info] Checking slave configurations..
Wed Nov 14 17:31:17 2012 - [info] Checking replication filtering settings..
Wed Nov 14 17:31:17 2012 - [info]  binlog_do_db= testdb, binlog_ignore_db=
Wed Nov 14 17:31:17 2012 - [info]  Replication filtering check ok.
Wed Nov 14 17:31:17 2012 - [info] Starting SSH connection tests..
Wed Nov 14 17:31:18 2012 - [info] All SSH connection tests passed successfully.
Wed Nov 14 17:31:18 2012 - [info] Checking MHA Node version..
Wed Nov 14 17:31:18 2012 - [info]  Version check ok.
Wed Nov 14 17:31:18 2012 - [info] Checking SSH publickey authentication 
settings on the current master..
Wed Nov 14 17:31:18 2012 - [info] HealthCheck: SSH to 172.16.50.14 is reachable.
Wed Nov 14 17:31:18 2012 - [info] Master MHA Node version is 0.53.
Wed Nov 14 17:31:18 2012 - [info] Checking recovery script configurations on 
the current master..
Wed Nov 14 17:31:18 2012 - [info]   Executing command: save_binary_logs 
--command=test --start_pos=4 --binlog_dir=/home/mysqldata/ 
--output_file=/home/mha_manager_data/app1/save_binary_logs_test 
--manager_version=0.53 --start_file=mysql-bin.000009
Wed Nov 14 17:31:18 2012 - [info]   Connecting to 
[email protected](172.16.50.14)..
  Creating /home/mha_manager_data/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /home/mysqldata/, up to mysql-bin.000009
Wed Nov 14 17:31:19 2012 - [info] Master setting check done.
Wed Nov 14 17:31:19 2012 - [info] Checking SSH publickey authentication and 
checking recovery script configurations on all alive slave servers..
Wed Nov 14 17:31:19 2012 - [info]   Connecting to 
[email protected](172.16.50.11:22)..
  Checking slave recovery environment settings..
    Opening /home/mysqldata/relay-log.info ... ok.
    Relay log found at /home/mysqldata, up to mysql-relay-bin.000002
    Temporary relay log file is /home/mysqldata/mysql-relay-bin.000002
    Testing mysql connection and privileges..ERROR 1045 (28000): Access denied for user 'root'@'172.16.50.11' (using password: YES)

mysql command failed with rc 1:0!
 at /usr/bin/apply_diff_relay_logs line 351
        main::check() called at /usr/bin/apply_diff_relay_logs line 470
        eval {...} called at /usr/bin/apply_diff_relay_logs line 450
        main::main() called at /usr/bin/apply_diff_relay_logs line 110
Wed Nov 14 17:31:19 2012 - [error][/usr/share/perl5/MHA/MasterMonitor.pm, 
ln194] Slaves settings check failed!
Wed Nov 14 17:31:19 2012 - [error][/usr/share/perl5/MHA/MasterMonitor.pm, 
ln373] Slave configuration failed.
Wed Nov 14 17:31:19 2012 - [error][/usr/share/perl5/MHA/MasterMonitor.pm, 
ln384] Error happend on checking configurations.  at 
/usr/bin/masterha_check_repl line 48
Wed Nov 14 17:31:19 2012 - [error][/usr/share/perl5/MHA/MasterMonitor.pm, 
ln479] Error happened on monitoring servers.
Wed Nov 14 17:31:19 2012 - [info] Got exit code 1 (Not master dead).
############################
############################
############################

There is one very interesting detail: at the beginning it could connect to 
MySQL and read the values of global variables, but later it could not.

I have never written Perl scripts (I usually use C++ and bash), but I tried to 
find where the problem is.

I found that the problem is that parameters are passed without escaping.
In MasterMonitor.pm at line 185, where $command is constructed and 
--slave_pass is concatenated, the password should be escaped and placed in 
quotes, because the script treats the '$' symbol as a control character 
and drops it.

I tried changing MasterMonitor.pm this way:

$command .= " --slave_pass='$s->{password}' ";

But this did not help.

I tried running apply_diff_relay_logs manually with parameters and found that 
special symbols in the password (here '&') must be escaped. 
For example, this does not work:
--slave_pass='%DE&T^GF1' 

and this works:
--slave_pass='%DE\&T^GF1' 

and if I remove the single quotes but keep the backslash, it does not work either:
--slave_pass=%DE\&T^GF1 

So please fix this bug, or tell me how to handle such symbols in the password 
(maybe there is a correct way to write it in the cnf file).
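For reference, the usual way to pass an arbitrary string through one level of shell parsing is to wrap it in single quotes and rewrite every embedded single quote as '\''. The sketch below shows that technique in plain sh. The function name shell_quote is hypothetical, and this is not the fix used in MHA. If a command is parsed by more than one shell (for example locally and then again on the remote side of ssh, which is an assumption about this case), the quoting has to be applied once per level.

```sh
#!/bin/sh
# Hypothetical helper: quote a string so that one level of shell parsing
# yields it back unchanged. Embedded ' becomes '\''.
shell_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

pw='%DE&T^GF1'
# One level of quoting, e.g. for a command run by the local shell.
echo "--slave_pass=$(shell_quote "$pw")"
# Two levels, e.g. when the command line is parsed again by a second shell.
echo "--slave_pass=$(shell_quote "$(shell_quote "$pw")")"
```

The second form is longer and harder to read, but each shell that parses the line strips exactly one layer of quoting.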

I would have written a patch for it if I knew Perl.

Original issue reported on code.google.com by [email protected] on 14 Nov 2012 at 2:12

Running with Pacemaker

Hi.

I have a setup where I have one master database and multiple slaves.
Our application can talk to the master database for writes only and to the 
slaves for reads.
The slaves are behind a mysql proxy so the read only load is divided among 
them. 

What I need to do is to create a setup where, whenever the master MySQL server 
goes down, its virtual IP is assigned to the new master (a former slave).
I was thinking about using a script and just letting mysql-master-ha assign a 
new IP to the new master.
But this may fail miserably if the reason for the old master going down was, 
e.g., someone pulling out the network cable. In that case I may suddenly 
have two servers with the same IP.
Therefore I need to use clustering software like Pacemaker together with 
mysql-master-ha.

Is it possible to do that? 
The document on 
http://code.google.com/p/mysql-master-ha/wiki/Using_With_Clustering_Software 
describes how to do that with a simple two node scenario but that wouldn't work 
for me...

Can mysql-master-ha maybe run a command to tell Pacemaker to move over the 
virtual IP whenever it decides to switch the master database?
That way Pacemaker would make sure there are never two servers with 
the same virtual IP, and at the same time mysql-master-ha would keep control 
over the promotion of slaves to master.

Original issue reported on code.google.com by [email protected] on 20 Sep 2012 at 12:32

errors about uninitialized 'escaped_password' host attribute

What steps will reproduce the problem?
1. Blank mysql password (not sure this is required)
2. Run masterha_check_repl
3. See error:

Tue Dec 11 10:07:59 2012 - [info] Checking SSH publickey authentication and 
checking recovery script configurations on all alive slave servers..
Tue Dec 11 10:07:59 2012 - [info]   Executing command : apply_diff_relay_logs 
--command=test --slave_user='root' --slave_host=33.33.33.12 
--slave_ip=33.33.33.12 --slave_port=3306 --workdir=/var/log/masterha/app1 
--target_version=5.1.66-0ubuntu0.11.10.3-log --manager_version=0.54 
--relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  
--slave_pass=xxx  
Tue Dec 11 10:07:59 2012 - [error][/usr/share/perl5/MHA/MasterMonitor.pm, 
ln386] Error happend on checking configurations. Use of uninitialized value in 
string ne at /usr/share/perl5/MHA/MasterMonitor.pm line 186.  
Tue Dec 11 10:07:59 2012 - [error][/usr/share/perl5/MHA/MasterMonitor.pm, 
ln482] Error happened on monitoring servers.  
Tue Dec 11 10:07:59 2012 - [info] Got exit code 1 (Not master dead).
  MySQL Replication Health is NOT OK!


What is the expected output? What do you see instead?

replication health ok, no errors.  

What version of the product are you using? On what operating system?

MHA 0.54_0, Ubuntu Oneiric, this vagrant install: 
https://github.com/jayjanssen/vagrant-mysql-mha


Please provide any additional information below.

If I replace every instance of 'escaped_password' in /usr/share/perl5/MHA with 
'password', it works fine.  I could not see anywhere in the code where 
'escaped_password' was actually set.

Original issue reported on code.google.com by [email protected] on 11 Dec 2012 at 4:22

my.cnf

Could you please recommend or share the best my.cnf settings for use with your 
scripts, such as log-bin and so on?

Original issue reported on code.google.com by [email protected] on 29 Feb 2012 at 11:54

Does not work if MySQL runs in a chroot

What steps will reproduce the problem?
1. Add to config 
[mysqld]
chroot=/var/lib/mysql
2. Try to start masterha_check_repl --conf=/etc/app1.cnf
3. Get this error:

Tue Oct 25 21:16:45 2011 - [info] Checking SSH publickey authentication and 
checking recovery script configurations on the current master..
Tue Oct 25 21:16:45 2011 - [info]   Executing command: save_binary_logs 
--command=test --start_file=mysql-bin.000003 --start_pos=4 
--binlog_dir=/var/lib/mysql/binlog 
--output_file=/var/lib/mysql/tmp/save_binary_logs_test --manager_version=0.52 
Tue Oct 25 21:16:45 2011 - [info]   Connecting to root@db1(db1).. 
  Creating /var/lib/mysql/tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql/binlog, up to mysql-bin.000003
Tue Oct 25 21:16:46 2011 - [info] Master setting check done.
Tue Oct 25 21:16:46 2011 - [info] Checking SSH publickey authentication and 
checking recovery script configurations on all alive slave servers..
Tue Oct 25 21:16:46 2011 - [info]   Executing command : apply_diff_relay_logs 
--command=test --slave_user=root --slave_host=db2 --slave_ip=192.168.10.4 
--slave_port=3306 --workdir=/var/lib/mysql/tmp --target_version=5.1.59-log 
--manager_version=0.52 --relay_log_info=/db/relay-log.info  --slave_pass=xxx
Tue Oct 25 21:16:46 2011 - [info]   Connecting to [email protected](db2).. 
  Checking slave recovery environment settings..
    Opening /db/relay-log.info ...Could not open relay-log-info file /db/relay-log.info.
 at /usr/bin/apply_diff_relay_logs line 274
Tue Oct 25 21:16:46 2011 - [error][/usr/share/perl5/MHA/MasterMonitor.pm, 
ln129] Slaves settings check failed!
Tue Oct 25 21:16:46 2011 - [error][/usr/share/perl5/MHA/MasterMonitor.pm, 
ln304] Slave configuration failed.
Tue Oct 25 21:16:46 2011 - [error][/usr/share/perl5/MHA/MasterMonitor.pm, 
ln315] Error happend on checking configurations.  at 
/usr/bin/masterha_check_repl line 48
Tue Oct 25 21:16:46 2011 - [error][/usr/share/perl5/MHA/MasterMonitor.pm, 
ln396] Error happened on monitoring servers.
Tue Oct 25 21:16:46 2011 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!


What version of the product are you using? On what operating system?
$ rpm -qa | grep -i mysql
libmysqlclient16-5.1.59-alt1
perl-DBD-mysql-4.020-alt2
mha4mysql-node-0.52-alt1
MySQL-client-5.1.59-alt1
MySQL-server-5.1.59-alt1

Original issue reported on code.google.com by [email protected] on 25 Oct 2011 at 9:51

make ssh binary location configurable

Just a feature request that would help me out.  I'd like the location of "ssh" 
to be specifiable, rather than always derived from the PATH.

Original issue reported on code.google.com by [email protected] on 5 Jan 2012 at 12:26

Can't exec "mysqlbinlog": No such file or directory

What steps will reproduce the problem?
1. on the mha manager box, running the following command: masterha_manager 
--conf=/vol/mapi_qa.cnf
2.
3.

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?

CentOS 5 (Amazon ami image)

Please provide any additional information below.

Fri Feb 17 05:59:15 2012 - [info]   Connecting to 
[email protected](xx.xx.xxx.xxx:22).. 
Can't exec "mysqlbinlog": No such file or directory at 
/usr/local/share/perl5/MHA/BinlogManager.pm line 99.
mysqlbinlog version not found!
 at /usr/local/bin/apply_diff_relay_logs line 463

Original issue reported on code.google.com by [email protected] on 17 Feb 2012 at 6:02
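
One possible cause, offered as an assumption rather than a confirmed diagnosis, is that mysqlbinlog is installed in a directory that is not in the PATH of the non-interactive SSH session MHA opens on the node. The Perl sketch below looks a command up in a given PATH string, which makes it easy to compare what an interactive shell sees with what a remote command sees. The helper name find_in_path is hypothetical and not part of MHA.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of $cmd in the colon-separated $path, or undef.
# find_in_path is a hypothetical helper, not an MHA function.
sub find_in_path {
    my ( $cmd, $path ) = @_;
    for my $dir ( grep { length } split /:/, $path ) {
        my $candidate = File::Spec->catfile( $dir, $cmd );
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Report the two client binaries the MHA node scripts call in these reports.
my $path = defined $ENV{PATH} ? $ENV{PATH} : '';
for my $cmd (qw(mysqlbinlog mysql)) {
    my $found = find_in_path( $cmd, $path );
    printf "%-12s %s\n", $cmd, defined $found ? $found : 'NOT FOUND in PATH';
}
```

Running the script once from a login shell on the node and once through a non-interactive ssh command shows whether the two environments differ. If they do, extending the non-interactive PATH or symlinking the binaries into a directory such as /usr/bin are the usual remedies.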

send_report documentation needs a fix

Current documentation does not include "$conf" parameter which generates error 
during sending email notification.

Proposed code for (MHA Manager package)/samples/scripts/send_report script.

my ( $dead_master_host, $new_master_host, $new_slave_hosts, $conf, $subject, 
$body );

GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'conf=s'             => \$conf,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);

Original issue reported on code.google.com by [email protected] on 13 Oct 2011 at 6:47

Can't locate MHA/SSHCheck.pm in @INC

What steps will reproduce the problem?
1.  Trying to run masterha_check_ssh --conf=/etc/app1.cnf
2.
3.

What is the expected output? What do you see instead?
I was expecting to get back some data like what is on this page:  
http://code.google.com/p/mysql-master-ha/wiki/Requirements#SSH_public_key_authentication

What version of the product are you using? On what operating system?
mha4mysql-node-0.53-0.el6
mha4mysql-manager-0.53-0.el6
on CentOS6

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 25 Jun 2012 at 1:48
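
The message means that no directory in Perl's @INC contains MHA/SSHCheck.pm. One possibility (an assumption, since the report includes no further output) is that the package placed its modules in a directory this Perl does not search. The sketch below walks @INC the same way `use` does and reports where a module would be loaded from. locate_module is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Spec;

# Return the first file in @INC that would satisfy "use $module", or undef.
# locate_module is a hypothetical helper, not part of MHA.
sub locate_module {
    my ($module) = @_;
    ( my $rel = $module ) =~ s{::}{/}g;
    $rel .= '.pm';
    for my $dir ( grep { !ref } @INC ) {    # skip code-ref hooks in @INC
        my $file = File::Spec->catfile( $dir, $rel );
        return $file if -f $file;
    }
    return;
}

# Getopt::Long is a core module and serves as a known-good comparison.
for my $module (qw(MHA::SSHCheck MHA::NodeConst Getopt::Long)) {
    my $file = locate_module($module);
    printf "%-16s %s\n", $module, defined $file ? $file : 'not found in @INC';
}
```

If the file exists on disk but outside @INC (for example as listed by `rpm -ql mha4mysql-manager`), adding its parent directory to PERL5LIB is one way to make it loadable.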

MySQL permission check needed.

What steps will reproduce the problem?
1. add user without SELECT privileges.
2. run masterha_check_repl
3. error will be "User repLAN does not exist or does not have REPLICATION SLAVE 
privilege"

User had REPLICATION SLAVE privileges.  Added code to DBHelper.pm to die on 
execute() and print DBI error string.


Output with debug:

Thu Nov 10 16:52:46 2011 - [warning] Global configuration file 
/etc/masterha_default.cnf not found. Skipping.
Thu Nov 10 16:52:46 2011 - [info] Reading application default configurations 
from /etc/wcdb_mha.cnf..
Thu Nov 10 16:52:46 2011 - [info] Reading server configurations from 
/etc/wcdb_mha.cnf..
Thu Nov 10 16:52:46 2011 - [info] MHA::MasterMonitor version 0.52.
Thu Nov 10 16:52:46 2011 - [info] Dead Servers:
Thu Nov 10 16:52:46 2011 - [info] Alive Servers:

... REMOVED ...

Thu Nov 10 16:52:46 2011 - [info] Checking replication filtering settings..
Thu Nov 10 16:52:46 2011 - [info]  binlog_do_db= , binlog_ignore_db= 
Thu Nov 10 16:52:46 2011 - [info]  Replication filtering check ok.
repl_user: repLAN
user: repLAN
Repl_User_SQL: SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = ?
Thu Nov 10 16:52:46 2011 - 
[error][/usr/local/lib/perl5/site_perl/5.10.0/MHA/MasterMonitor.pm, ln315] 
Error happend on checking configurations. SELECT command denied to user 
'mha'@'XXX.XX.XXX.XX' for table 'user' at 
/usr/local/lib/perl5/site_perl/5.10.0/MHA/DBHelper.pm line 212.
Thu Nov 10 16:52:46 2011 - 
[error][/usr/local/lib/perl5/site_perl/5.10.0/MHA/MasterMonitor.pm, ln396] 
Error happened on monitoring servers.
Thu Nov 10 16:52:46 2011 - [info] Got exit code 1 (Not master dead).

The problem is that the mha user does not have SELECT access to the mysql.user 
table. Not a bug, but it would be useful to display the proper error.

Original issue reported on code.google.com by [email protected] on 10 Nov 2011 at 9:56

masterha_manager doesn't start if one of the slaves is dead and "ignore_fail=1" is set

I tested the case where one of the slaves is dead; masterha_manager should still start.

I added "ignore_fail=1" to the cnf file, as described in the wiki.

Part of my conf:
[server1]
hostname=172.16.50.11
candidate_master=1

[server2]
hostname=172.16.50.14
candidate_master=1

[server3]
ignore_fail=1
hostname=172.16.50.13

server1 is the master and server2 is a slave of server1.
MySQL on server3 was switched off.

Next I tried to start masterha_manager:
masterha_manager --conf=/etc/mha_manager/app1.cnf

It could not start and wrote these messages:
###########
###########
Tue Nov 20 17:39:12 2012 - [warning] Global configuration file 
/etc/masterha_default.cnf not found. Skipping.
Tue Nov 20 17:39:12 2012 - [info] Reading application default configurations 
from /etc/mha_manager/app1.cnf..
Tue Nov 20 17:39:12 2012 - [info] Reading server configurations from 
/etc/mha_manager/app1.cnf..
Tue Nov 20 17:39:12 2012 - [info] MHA::MasterMonitor version 0.54.
Tue Nov 20 17:39:12 2012 - [info] Dead Servers:
Tue Nov 20 17:39:12 2012 - [info]   172.16.50.13(172.16.50.13:3306)
Tue Nov 20 17:39:12 2012 - [info] Alive Servers:
Tue Nov 20 17:39:12 2012 - [info]   172.16.50.11(172.16.50.11:3306)
Tue Nov 20 17:39:12 2012 - [info]   172.16.50.14(172.16.50.14:3306)
Tue Nov 20 17:39:12 2012 - [info] Alive Slaves:
Tue Nov 20 17:39:12 2012 - [info]   172.16.50.11(172.16.50.11:3306)  
Version=5.5.28-MariaDB-log (oldest major version between slaves) log-bin:enabled
Tue Nov 20 17:39:12 2012 - [info]     Replicating from 
172.16.50.14(172.16.50.14:3306)
Tue Nov 20 17:39:12 2012 - [info]     Primary candidate for the new Master 
(candidate_master is set)
Tue Nov 20 17:39:12 2012 - [info] Current Alive Master: 
172.16.50.14(172.16.50.14:3306)
Tue Nov 20 17:39:12 2012 - [info] Checking slave configurations..
Tue Nov 20 17:39:12 2012 - [info]  read_only=1 is not set on slave 
172.16.50.11(172.16.50.11:3306).
Tue Nov 20 17:39:12 2012 - [warning]  relay_log_purge=0 is not set on slave 
172.16.50.11(172.16.50.11:3306).
Tue Nov 20 17:39:12 2012 - [info] Checking replication filtering settings..
Tue Nov 20 17:39:12 2012 - [info]  binlog_do_db= testdb, binlog_ignore_db=
Tue Nov 20 17:39:12 2012 - [info]  Replication filtering check ok.
Tue Nov 20 17:39:12 2012 - [info] Starting SSH connection tests..
Tue Nov 20 17:39:13 2012 - [info] All SSH connection tests passed successfully.
Tue Nov 20 17:39:13 2012 - [info] Checking MHA Node version..
Tue Nov 20 17:39:13 2012 - [info]  Version check ok.

Tue Nov 20 17:39:13 2012 - 
[error][/usr/local/share/perl/5.14.2/MHA/ServerManager.pm, ln444]  Server 
172.16.50.13(172.16.50.13:3306) is dead, but must be alive! Check server 
settings.
Tue Nov 20 17:39:13 2012 - 
[error][/usr/local/share/perl/5.14.2/MHA/MasterMonitor.pm, ln384] Error happend 
on checking configurations.  at 
/usr/local/share/perl/5.14.2/MHA/MasterMonitor.pm line 362
Tue Nov 20 17:39:13 2012 - 
[error][/usr/local/share/perl/5.14.2/MHA/MasterMonitor.pm, ln480] Error 
happened on monitoring servers.
Tue Nov 20 17:39:13 2012 - [info] Got exit code 1 (Not master dead).
###########
###########

masterha_check_repl prints the same errors:
###########
###########
Tue Nov 20 18:19:29 2012 - [info] All SSH connection tests passed successfully.
Tue Nov 20 18:19:29 2012 - [info] Checking MHA Node version..
Tue Nov 20 18:19:30 2012 - [info]  Version check ok.
Tue Nov 20 18:19:30 2012 - 
[error][/usr/local/share/perl/5.14.2/MHA/ServerManager.pm, ln444]  Server 
172.16.50.13(172.16.50.13:3306) is dead, but must be alive! Check server 
settings.
Tue Nov 20 18:19:30 2012 - 
[error][/usr/local/share/perl/5.14.2/MHA/MasterMonitor.pm, ln384] Error happend 
on checking configurations.  at 
/usr/local/share/perl/5.14.2/MHA/MasterMonitor.pm line 362
Tue Nov 20 18:19:30 2012 - 
[error][/usr/local/share/perl/5.14.2/MHA/MasterMonitor.pm, ln480] Error 
happened on monitoring servers.
Tue Nov 20 18:19:30 2012 - [info] Got exit code 1 (Not master dead).
###########
###########

It seems that masterha_* does not see the "ignore_fail=1" option in the cnf file.

To debug, I added a print statement:

#more +440 ServerManager.pm | head

foreach (@dead_servers) {
    next if ( $_->{id} eq $current_master->{id} );
    next if ( $ignore_fail_check && $_->{ignore_fail} );
    print "\n" . $_->{ignore_fail} . $ignore_fail_check . "\n";
    $log->error(
      sprintf( " Server %s is dead, but must be alive! Check server settings.",
        $_->get_hostinfo() )
    );
    croak;
  }



and the messages now show:
####
####
Tue Nov 20 18:32:59 2012 - [info]  Version check ok.

10
Tue Nov 20 18:32:59 2012 - 
[error][/usr/local/share/perl/5.14.2/MHA/ServerManager.pm, ln444]  Server 
172.16.50.13(172.16.50.13:3306) is dead, but must be alive! Check server 
settings
####
####

This is strange; I expect masterha_* to start with one dead slave when 
"ignore_fail=1" is set.

Original issue reported on code.google.com by [email protected] on 20 Nov 2012 at 2:37
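
The debug line prints `$_->{ignore_fail}` followed by `$ignore_fail_check`, so the output "10" reads as ignore_fail=1 and ignore_fail_check=0: the setting was read from the cnf file, but the check that honors it was switched off for this run. The sketch below reproduces only that condition with made-up server records. fatal_dead_servers and the server ids are hypothetical and are not taken from MHA.

```perl
use strict;
use warnings;

# Return the dead servers that would abort startup. The two skip rules mirror
# the loop quoted above; fatal_dead_servers itself is a hypothetical helper.
sub fatal_dead_servers {
    my ( $dead_servers, $current_master_id, $ignore_fail_check ) = @_;
    my @fatal;
    for my $server (@$dead_servers) {
        next if $server->{id} eq $current_master_id;
        next if $ignore_fail_check && $server->{ignore_fail};
        push @fatal, $server;
    }
    return @fatal;
}

# Invented record for the dead slave; only ignore_fail matters here.
my @dead = ( { id => 'server3', hostname => '172.16.50.13', ignore_fail => 1 } );

# Check disabled (the "10" case): the dead slave is fatal despite ignore_fail=1.
my @without = fatal_dead_servers( \@dead, 'server2', 0 );

# Check enabled: the same slave is skipped.
my @with = fatal_dead_servers( \@dead, 'server2', 1 );

printf "ignore_fail_check=0: %d fatal, ignore_fail_check=1: %d fatal\n",
  scalar @without, scalar @with;
```

This suggests the question to settle is what sets `$ignore_fail_check` at startup, rather than whether the cnf option is parsed.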

Got ERROR: Use of uninitialized value in scalar chomp at /usr/lib/perl5/site_perl/5.8.8/MHA/ManagerConst.pm line 90

Hi,
    An error occurs on both manual and automatic switching.

    masterha_master_switch --master_state=dead --conf=/etc/app1.conf  --dead_master_host=10.58.99.69  --new_master_host=10.58.99.71                     

    The error is as follows:

[error][/usr/lib/perl5/site_perl/5.8.8/MHA/ManagerUtil.pm, ln178] Got ERROR: 
Use of uninitialized value in scalar chomp at 
/usr/lib/perl5/site_perl/5.8.8/MHA/ManagerConst.pm line 90.

The switch cannot be performed.
What causes this situation? Any advice is appreciated.
Thanks.

Original issue reported on code.google.com by [email protected] on 27 Jul 2012 at 9:41

Telnet.pm is not found when shutdown script is included in /etc/masterha_default.cnf file

What steps will reproduce the problem?
1.Enable shutdown script
2. masterha_check_repl --conf=/etc/app1.cnf
3.

What is the expected output? What do you see instead?
Checking shutdown script status:

What version of the product are you using? On what operating system?
0.53

Please provide any additional information below.

Can't locate Net/Telnet.pm in @INC (@INC contains: /usr/local/lib64/perl5 
/usr/local/share/perl5 /usr/lib64/perl5/vendor_perl 
/usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at 
/usr/local/sample/bin/power_manager line 27

I've tried to search for this Telnet.pm file in older versions (0.52), but I 
could not find it.

Original issue reported on code.google.com by [email protected] on 16 Feb 2012 at 11:09

SSH access on other VLAN (different IP than the MySQL access)

In our production environment, the masterha scripts cannot reach the MySQL hosts 
over SSH at the same IP used for MySQL access, because we run a multi-VLAN 
configuration.
For security reasons, SSH is not permitted on the data VLAN.

Would it be possible to add a per-host SSH IP parameter to the configuration 
file?

This is for a client with SkySQL Support.

Original issue reported on code.google.com by [email protected] on 25 Oct 2011 at 4:47

mysql-master-ha fails to disable slave on a new master

Hi.

Testing mysql-master-ha (with 3 slaves and one master), I discovered that the 
new master will still be seen as a slave and masterha_manager then refuses to 
start.
It also won't remove the failed master from the config when I run:
# masterha_manager --remove_dead_master_conf --conf=/etc/mha/app1.cnf

This part of the log shows that mysql-master-ha failed to reset the slave 
settings on the new master, which still runs as a slave:

Tue Sep 25 14:25:45 2012 - [info] * Phase 5: New master cleanup phease..
Tue Sep 25 14:25:45 2012 - [info]
Tue Sep 25 14:25:45 2012 - [info] Resetting slave info on the new master..
Tue Sep 25 14:25:45 2012 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, 
ln674]  SHOW SLAVE STATUS shows new master replicates from somewhere. Check for 
details!
Tue Sep 25 14:25:45 2012 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, 
ln688]  db02.db.cert.fronter.net: Resetting slave info failed.
Tue Sep 25 14:25:45 2012 - 
[error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln1537] Master 
failover to db02.mynetwork.net(11.22.33.2:3306) done, but recovery on slave 
partially failed.
Tue Sep 25 14:25:45 2012 - [info]

This is output of show slave status:

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: db01.mynetwork.net
                  Master_User: replica
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mysql-bin.000049
          Read_Master_Log_Pos: 107
               Relay_Log_File: mysqld-relay-bin.000004
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000049
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 839
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:   
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 2003
                Last_IO_Error: error reconnecting to master '[email protected]:3306' - retry-time: 10  retries: 86400
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
1 row in set (0.00 sec)


And finally this is the error I get running
# masterha_manager --remove_dead_master_conf --conf=/etc/mha/app1.cnf

Tue Sep 25 15:28:10 2012 - [warning] SQL Thread is stopped(no error) on 
db02.mynetwork.net(11.22.33.2:3306)
Tue Sep 25 15:28:10 2012 - 
[error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln732] Multi-master 
configuration is detected, but two or more masters are either writable 
(read-only is not set) or dead! Check configurations for details. Master 
configurations are as below: 
Master db01.mynetwork.net(11.22.33.1:3306), dead
Master db02.db.cert.fronter.net(11.22.33.2:3306), replicating from 
db01.mynetwork.net(11.22.33.1:3306)

Tue Sep 25 15:28:10 2012 - 
[error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln383] Error happend 
on checking configurations.  at 
/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 298
Tue Sep 25 15:28:10 2012 - 
[error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln478] Error 
happened on monitoring servers.
Tue Sep 25 15:28:10 2012 - [info] Got exit code 1 (Not master dead).


Is it a known issue? Any idea why this fails?

Original issue reported on code.google.com by [email protected] on 25 Sep 2012 at 1:31

What if the Manager fails ?

First of all: great project!

I wonder what happens when the manager fails.

For example, the MMM project needs one writer alongside the readers; when your 
writer is down you have a problem, so it is a SPOF.

In this case you have a manager which is also a SPOF, because when it goes down 
the servers are no longer checked, and if the master fails no slave can 
become the master.

Would it not be an idea to have every slave node act as a kind of manager that 
checks which slave is the latest one and promotes it when the master goes down? 
In this case the master never needs to be a manager too.

I ask because I don't want to have a SPOF. Master<>Master replication should 
have no SPOF, but a split brain can occur instead, which is not nice at all.

Original issue reported on code.google.com by [email protected] on 22 Feb 2012 at 10:35

masterha_check_repl ignores relay-log defined in /etc/my.cnf

What steps will reproduce the problem?
1. Define relay-log in /etc/my.cnf as: relay-log = /data/relaylogs/relay-bin
2. Define datadir in /etc/my.cnf as: datadir = /data
3. From MHA manager, run masterha_check_repl  --conf=/etc/MHA.cnf

What is the expected output? What do you see instead?
The relay_log_info should be /data/relaylogs/relay-log.info but it is 
"/data/relay-log.info" and fails.


What version of the product are you using? On what operating system?
0.53 of Node and Manager on CentOS 5.8

Please provide any additional information below.

Mon May 14 10:23:54 2012 - [info]   Executing command : apply_diff_relay_logs 
--command=test --slave_user=mhadmin --slave_host=db2 --slave_ip=99.99.99.239 
--slave_port=3306 --workdir=/var/log/masterha --target_version=5.0.92-50-log 
--manager_version=0.53 --relay_log_info=/data/relay-log.info  
--relay_dir=/data/  --slave_pass=xxx
Mon May 14 10:23:54 2012 - [info]   Connecting to [email protected](db2:22).. 
  Checking slave recovery environment settings..
    Opening /data/relay-log.info ...Could not open relay-log-info file /data/relay-log.info.
 at /usr/bin/apply_diff_relay_logs line 306

Original issue reported on code.google.com by [email protected] on 14 May 2012 at 5:34

Testing mysql connection and privileges..sh: mysql: command not found


1. Try to start masterha_check_repl --conf=/etc/app1.cnf
2. Get this error:

[root@Manager ~]# masterha_check_repl --conf=/etc/masterha_default.cnf 
Thu May 24 14:32:05 2012 - [info] Reading default configuratoins from 
/etc/masterha_default.cnf..
Thu May 24 14:32:05 2012 - [info] Reading application default configurations 
from /etc/masterha_default.cnf..
Thu May 24 14:32:05 2012 - [info] Reading server configurations from 
/etc/masterha_default.cnf..
Thu May 24 14:32:05 2012 - [info] MHA::MasterMonitor version 0.52.
Thu May 24 14:32:05 2012 - [info] Dead Servers:
Thu May 24 14:32:05 2012 - [info] Alive Servers:
Thu May 24 14:32:05 2012 - [info]   Master(192.168.114.132:3306)
Thu May 24 14:32:05 2012 - [info]   Slave(192.168.114.131:3306)
Thu May 24 14:32:05 2012 - [info]   Slave2(192.168.114.134:3306)
Thu May 24 14:32:05 2012 - [info] Alive Slaves:
Thu May 24 14:32:05 2012 - [info]   Slave(192.168.114.131:3306)  
Version=5.5.14-log (oldest major version between slaves) log-bin:enabled
Thu May 24 14:32:05 2012 - [info]     Replicating from 
192.168.114.132(192.168.114.132:3306)
Thu May 24 14:32:05 2012 - [info]   Slave2(192.168.114.134:3306)  
Version=5.5.14-log (oldest major version between slaves) log-bin:enabled
Thu May 24 14:32:05 2012 - [info]     Replicating from 
192.168.114.132(192.168.114.132:3306)
Thu May 24 14:32:05 2012 - [info] Current Alive Master: 
Master(192.168.114.132:3306)
Thu May 24 14:32:05 2012 - [info] Checking slave configurations..
Thu May 24 14:32:05 2012 - [warning]  read_only=1 is not set on slave 
Slave(192.168.114.131:3306).
Thu May 24 14:32:05 2012 - [warning]  relay_log_purge=0 is not set on slave 
Slave(192.168.114.131:3306).
Thu May 24 14:32:05 2012 - [warning]  read_only=1 is not set on slave 
Slave2(192.168.114.134:3306).
Thu May 24 14:32:05 2012 - [warning]  relay_log_purge=0 is not set on slave 
Slave2(192.168.114.134:3306).
Thu May 24 14:32:05 2012 - [info] Checking replication filtering settings..
Thu May 24 14:32:05 2012 - [info]  binlog_do_db= EcommerceDB, binlog_ignore_db= 
information_schema,mysql,performance_schema,test
Thu May 24 14:32:05 2012 - [info]  Replication filtering check ok.
Thu May 24 14:32:05 2012 - [info] Starting SSH connection tests..
Thu May 24 14:32:07 2012 - [info] All SSH connection tests passed successfully.
Thu May 24 14:32:07 2012 - [info] Checking MHA Node version..
Thu May 24 14:32:08 2012 - [info]  Version check ok.
Thu May 24 14:32:08 2012 - [info] Checking SSH publickey authentication and 
checking recovery script configurations on the current master..
Thu May 24 14:32:08 2012 - [info]   Executing command: save_binary_logs 
--command=test --start_file=ecommerce-bin.000001 --start_pos=4 
--binlog_dir=/data/ecommerce_bin_log 
--output_file=/var/log/masterha/app1/save_binary_logs_test 
--manager_version=0.52 
Thu May 24 14:32:08 2012 - [info]   Connecting to root@Master(Master).. 
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/ecommerce_bin_log, up to ecommerce-bin.000001
Thu May 24 14:32:09 2012 - [info] Master setting check done.
Thu May 24 14:32:09 2012 - [info] Checking SSH publickey authentication and 
checking recovery script configurations on all alive slave servers..
Thu May 24 14:32:09 2012 - [info]   Executing command : apply_diff_relay_logs 
--command=test --slave_user=root --slave_host=Slave --slave_ip=192.168.114.131 
--slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.14-log 
--manager_version=0.52 --relay_log_info=/usr/local/mysql/data/relay-log.info  
--slave_pass=xxx
Thu May 24 14:32:09 2012 - [info]   Connecting to [email protected](Slave).. 
  Checking slave recovery environment settings..
    Opening /usr/local/mysql/data/relay-log.info ... ok.
    Relay log found at /data/ecommerce_relay_log, up to ecommerce-relay-bin.000003
    Temporary relay log file is /data/ecommerce_relay_log/ecommerce-relay-bin.000003
    Testing mysql connection and privileges..sh: mysql: command not found
mysql command failed with rc 127:0!
 at /usr/bin/apply_diff_relay_logs line 315
        main::check() called at /usr/bin/apply_diff_relay_logs line 429
        eval {...} called at /usr/bin/apply_diff_relay_logs line 409
        main::main() called at /usr/bin/apply_diff_relay_logs line 97
Thu May 24 14:32:09 2012 - 
[error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln129] Slaves 
settings check failed!
Thu May 24 14:32:09 2012 - 
[error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln304] Slave 
configuration failed.
Thu May 24 14:32:09 2012 - 
[error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln315] Error 
happend on checking configurations.  at /usr/bin/masterha_check_repl line 48
Thu May 24 14:32:09 2012 - 
[error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln396] Error 
happened on monitoring servers.
Thu May 24 14:32:09 2012 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!


3. This is my /etc/app1.cnf

manager_log=/var/log/masterha/app1/app1.log
manager_workdir=/var/log/masterha/app1
user=root
password=123456
remote_workdir=/data/ecommerce_bin_log

[server1]
hostname=Master

[server2]
hostname=Slave
candidate_master=1

[server3]
hostname=Slave2

Please help explain why this error happens.


OS:    CentOS Release 5.2
Mysql: 5.5.14( build from source )
basedir: /usr/local/mysql
datadir: /usr/local/mysql/data


Please check and advise as soon as possible.

Thanks,
[email protected]

Original issue reported on code.google.com by [email protected] on 24 May 2012 at 7:50
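
In the log above, "mysql command failed with rc 127:0" directly follows "sh: mysql: command not found". Assuming the two numbers are the high and low bytes of Perl's `$?` (an assumption about the message format, not something this report confirms), 127 is the shell's conventional exit status for a command it could not find. That fits a MySQL built from source under /usr/local/mysql whose bin directory is not in the PATH of the SSH session. The sketch below reproduces the status with a deliberately nonexistent command name; split_status is a hypothetical helper.

```perl
use strict;
use warnings;

# Split a wait status (Perl's $?) into the exit code and the low byte,
# which carries the signal number and core-dump flag.
# split_status is a hypothetical helper name.
sub split_status {
    my ($status) = @_;
    return ( $status >> 8, $status & 255 );
}

# Run a command name that should not exist. Passing one string that contains
# a shell metacharacter makes Perl run it through the system shell.
system('no_such_command_for_demo 2>/dev/null');
my ( $high, $low ) = split_status($?);
print "rc $high:$low\n";
```

Under that reading, making the mysql client reachable from the non-interactive session (for example by symlinking it from the bin directory under the reporter's basedir into /usr/bin) would be the thing to try.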

Principle of least privilege

This is not a problem report, rather a question or request for more 
documentation.
Do you know the minimum required privileges for the mysql user which will be 
used by mha to do the failover? I'd rather not use a full dba account if 
possible.
Of course I'm also chmoding 600 to the cnf file containing the password.

Original issue reported on code.google.com by [email protected] on 2 Jan 2013 at 9:30

Error with --ignore_fail_on_start=1

What steps will reproduce the problem?
1. Get the latest code (as of 17.12.2012) via git from  
https://github.com/yoshinorim/mha4mysql-manager.git

2. make && install

perl Makefile.PL  PREFIX=/usr
make
make install

3. My MHA conf file
##
# init users and dirs
##
# list of servers
[server1]
hostname=us1
ip=172.16.50.11
candidate_master=1
ignore_fail=1

[server2]
hostname=us4
ip=172.16.50.14
candidate_master=1
ignore_fail=1

[server3]
ignore_fail=1
hostname=us3
ip=172.16.50.13
candidate_master=1



4. 

Check that MySQL is stopped on server 172.16.50.13 and running on the others.

Run mha-manager:
masterha_manager --ignore_fail_on_start=1  --conf=/home/mha4mysql/etc/app1.cnf

What is the expected output?
MHA should start, notice that us3 is dead, and then continue working.

What do you see instead?
MHA exited with an error:

############
############
Mon Dec 17 10:58:08 2012 - [warning] Global configuration file 
/etc/masterha_default.cnf not found. Skipping.
Mon Dec 17 10:58:08 2012 - [info] Reading application default configurations 
from /home/mha4mysql/etc/app1.cnf..
Mon Dec 17 10:58:08 2012 - [info] Reading server configurations from 
/home/mha4mysql/etc/app1.cnf..
Mon Dec 17 10:58:08 2012 - [info] MHA::MasterMonitor version 0.55.
Mon Dec 17 10:58:08 2012 - [info] Dead Servers:
Mon Dec 17 10:58:08 2012 - [info]   us3(172.16.50.13:3306)
Mon Dec 17 10:58:08 2012 - [info] Alive Servers:
Mon Dec 17 10:58:08 2012 - [info]   funky(172.16.50.11:3306)
Mon Dec 17 10:58:08 2012 - [info]   us4(172.16.50.14:3306)
Mon Dec 17 10:58:08 2012 - [info] Alive Slaves:
Mon Dec 17 10:58:08 2012 - [info]   funky(172.16.50.11:3306)  
Version=5.5.28-MariaDB-log (oldest major version between slaves) log-bin:enabled
Mon Dec 17 10:58:08 2012 - [info]     Replicating from 
172.16.50.14(172.16.50.14:3306)
Mon Dec 17 10:58:08 2012 - [info]     Primary candidate for the new Master 
(candidate_master is set)
Mon Dec 17 10:58:08 2012 - [info] Current Alive Master: us4(172.16.50.14:3306)
Mon Dec 17 10:58:08 2012 - [info] Checking slave configurations..
Mon Dec 17 10:58:08 2012 - [info]  read_only=1 is not set on slave 
funky(172.16.50.11:3306).
Mon Dec 17 10:58:08 2012 - [warning]  relay_log_purge=0 is not set on slave 
funky(172.16.50.11:3306).
Mon Dec 17 10:58:08 2012 - [info] Checking replication filtering settings..
Mon Dec 17 10:58:08 2012 - [info]  binlog_do_db= testdb, binlog_ignore_db=
Mon Dec 17 10:58:08 2012 - [info]  Replication filtering check ok.
Mon Dec 17 10:58:08 2012 - [info] Starting SSH connection tests..
Mon Dec 17 10:58:09 2012 - [info] All SSH connection tests passed successfully.
Mon Dec 17 10:58:09 2012 - [info] Checking MHA Node version..
Mon Dec 17 10:58:09 2012 - [info]  Version check ok.
Mon Dec 17 10:58:09 2012 - [error][/usr/share/perl/5.14/MHA/ServerManager.pm, 
ln443]  Server us3(172.16.50.13:3306) is dead, but must be alive! Check server 
settings.
Mon Dec 17 10:58:09 2012 - [error][/usr/share/perl/5.14/MHA/MasterMonitor.pm, 
ln386] Error happend on checking configurations.  at 
/usr/share/perl/5.14/MHA/MasterMonitor.pm line 363
Mon Dec 17 10:58:09 2012 - [error][/usr/share/perl/5.14/MHA/MasterMonitor.pm, 
ln482] Error happened on monitoring servers.
Mon Dec 17 10:58:09 2012 - [info] Got exit code 1 (Not master dead).
###########
###########

Please provide any additional information below.

I am no Perl expert, but I tried to debug the error.
I added a printf to MasterMonitor.pm after the "GetOptions(" section in "sub main".
It prints 0 every time, regardless of what I pass as the argument when executing
masterha_manager --ignore_fail_on_start=1  --conf=/home/mha4mysql/etc/app1.cnf
or 
masterha_manager --ignore_fail_on_start=0  --conf=/home/mha4mysql/etc/app1.cnf

I found what needs to change:

diff --git a/lib/MHA/MasterMonitor.pm b/lib/MHA/MasterMonitor.pm
index 71945de..ff80c89 100644
--- a/lib/MHA/MasterMonitor.pm
+++ b/lib/MHA/MasterMonitor.pm
@@ -636,7 +636,7 @@ sub main {
     'manager_log=s'           => \$g_logfile,
     'skip_ssh_check'          => \$g_skip_ssh_check,          # for testing
     'skip_check_ssh'          => \$g_skip_ssh_check,
-    'ignore_fail_on_start'    => \$g_ignore_fail_on_start,
+    'ignore_fail_on_start=i'    => \$g_ignore_fail_on_start,
   );
   setpgrp( 0, $$ ) unless ($g_interactive);

After that, mha-manager handles the argument correctly and works as expected.


Please check it.

Original issue reported on code.google.com by [email protected] on 17 Dec 2012 at 11:05
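
The patch is consistent with how Getopt::Long treats option specs. An option declared without a type (`'ignore_fail_on_start'`) is a plain flag that takes no value, so `--ignore_fail_on_start=1` is rejected and the variable keeps its initial value, while `=i` declares a mandatory integer argument. The sketch below is independent of MHA and compares the two specs. parse_flag is a hypothetical helper.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse @args against a single option spec and return (success, value).
# parse_flag is a hypothetical helper, not part of MHA.
sub parse_flag {
    my ( $spec, @args ) = @_;
    my $value = 0;                     # initial value, as the reporter observed
    local $SIG{__WARN__} = sub { };    # silence Getopt::Long's parse warnings
    my $ok = GetOptionsFromArray( \@args, $spec => \$value );
    return ( $ok ? 1 : 0, $value );
}

for my $case (
    [ 'ignore_fail_on_start',   '--ignore_fail_on_start=1' ],    # original spec
    [ 'ignore_fail_on_start',   '--ignore_fail_on_start' ],      # bare flag
    [ 'ignore_fail_on_start=i', '--ignore_fail_on_start=1' ],    # patched spec
  )
{
    my ( $spec, $arg ) = @$case;
    my ( $ok, $value ) = parse_flag( $spec, $arg );
    printf "%-24s %-26s ok=%d value=%s\n", $spec, $arg, $ok, $value;
}
```

As far as Getopt::Long is concerned, the original spec would accept only the bare form `--ignore_fail_on_start`, whereas the patched `=i` spec requires an explicit value.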

Event scheduler can block the failover

What steps will reproduce the problem?

When the event scheduler is activated, its thread can block the failover because 
the manager treats it as a long-running update.

What is the expected output?

MHA should ignore the event scheduler thread in the failover process.
The issue can be worked around by running "set global event_scheduler=OFF" 
on the MySQL servers (master and slaves).

What do you see instead?

Tue Jun 12 11:27:56 2012 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. 
This may take long time..
Tue Jun 12 11:27:56 2012 - [info]  ok.
Tue Jun 12 11:27:56 2012 - [info] Checking MHA is not monitoring or doing 
failover..
Tue Jun 12 11:27:56 2012 - [info] Checking replication health on XXX.XX.XX.XX..
Tue Jun 12 11:27:56 2012 - [info]  ok.
Tue Jun 12 11:27:56 2012 - [error][/.../MasterRotate.pm, ln161] We should not 
start online master switch when one of connections are running long updates on 
the current master. Currently 1 update thread(s) are running.
Details:
{'Time' => '48476','Command' => 'Daemon','db' => undef,'Id' => '1','Info' => 
undef,'User' => 'event_scheduler','State' => 'Waiting for next 
activation','Host' => 'localhost'}
Tue Jun 12 11:27:56 2012 - [error][/.../ManagerUtil.pm, ln178] Got ERROR:  at 
/.../masterha_master_switch line 53

What version of the product are you using? On what operating system?

0.53
Debian 6

Original issue reported on code.google.com by [email protected] on 12 Jun 2012 at 4:36
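
The requested behaviour can be sketched as a filter over process-list rows before the "long updates" check runs. This is only an illustration of the idea, not MHA's own code: skip_event_scheduler is a hypothetical helper, the rows are plain hashrefs shaped like the "Details" dump above, and the second row is invented for contrast.

```perl
use strict;
use warnings;

# Drop rows that belong to the event scheduler daemon from a process list.
# skip_event_scheduler is a hypothetical helper, not MHA's own code.
sub skip_event_scheduler {
    my (@threads) = @_;
    return grep {
        !(     defined $_->{User}
            && $_->{User} eq 'event_scheduler'
            && defined $_->{Command}
            && $_->{Command} eq 'Daemon' )
    } @threads;
}

my @threads = (
    {    # the row from the error output above
        Id    => '1',     User    => 'event_scheduler', Host => 'localhost',
        db    => undef,   Command => 'Daemon',          Time => '48476',
        State => 'Waiting for next activation',         Info => undef,
    },
    {    # an invented long-running update, for contrast
        Id    => '42',       User    => 'app',   Host => 'localhost',
        db    => 'test',     Command => 'Query', Time => '120',
        State => 'updating', Info    => 'UPDATE t SET c = 1',
    },
);

my @remaining = skip_event_scheduler(@threads);
print scalar(@remaining), " thread(s) left after filtering\n";
```

Matching on both User and Command keeps the filter narrow, so a real client connection is never hidden from the check.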
