codership / glb

Galera Load Balancer - a simple TCP connection proxy and load-balancing library

License: GNU General Public License v2.0

Shell 5.61% C 91.53% Makefile 0.41% M4 2.00% Dockerfile 0.45%

glb's People

Contributors

abychko, ayurchen, joffrey92, temeo

glb's Issues

Destination failover does not work on CentOS 6

If the first destination tried is unavailable, the connecting client hangs. The following patch seems to get it working:

--- a/src/glb_pool.c
+++ b/src/glb_pool.c
@@ -123,10 +123,10 @@ typedef enum pool_fd_ops
 {
 #ifdef USE_EPOLL
     POOL_FD_READ  = EPOLLIN,
-    POOL_FD_WRITE = EPOLLOUT,
+    POOL_FD_WRITE = EPOLLOUT | EPOLLERR,
 #else /* POLL */
     POOL_FD_READ  = POLLIN,
-    POOL_FD_WRITE = POLLOUT,
+    POOL_FD_WRITE = POLLOUT | POLLERR,
 #endif /* POLL */
     POOL_FD_RW    = POOL_FD_READ | POOL_FD_WRITE
 } pool_fd_ops_t;

But a debug build then asserts:

   INFO: glb_pool.c:400: Pool 0: added connection, (total pool connections: 1)
  DEBUG: glb_pool.c:727: pool_handle_write() to server: 0
   INFO: glb_pool.c:685: Async connection to 10.21.32.1:3305 failed: 111 (Connection refused)
   INFO: glb_pool.c:697: Reconnecting to 10.21.32.1:3304
   INFO: glb_listener.c:100: Accepted connection from 10.21.32.1:52057 to 10.21.32.1:3305

   INFO: glb_pool.c:400: Pool 0: added connection, (total pool connections: 42949672961)
glbd: glb_pool.c:733: pool_handle_write: Assertion `dst->end != POOL_END_INCOMPLETE' failed.
   INFO: glb_signal.c:42: Received signal 6. Terminating.
Aborted

Interestingly, this (client hanging, not assert) happens on CentOS, but does not seem to happen on Ubuntu...

It looks like the process goes into a tight loop here:

Thread 4 (Thread 0x7f59c0114700 (LWP 19458)):
#0  0x00007f59c01fdf43 in epoll_wait () from /lib64/libc.so.6
#1  0x000000000040c901 in pool_fds_wait (pool=0x7f59c0a5c058) at glb_pool.c:260
#2  0x000000000040df81 in pool_thread (arg=0x7f59c0a5c058) at glb_pool.c:829
#3  0x00007f59c04af851 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f59c01fd94d in clone () from /lib64/libc.so.6
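
For context: poll(2) and epoll_ctl(2) document that POLLERR and EPOLLERR are always reported in the returned events, whether or not they were requested. A common way to learn the outcome of a non-blocking connect() is therefore to read SO_ERROR once the socket becomes writable or reports an error. The sketch below illustrates that general pattern only; `async_connect_result` is a hypothetical helper and not a function from glb_pool.c.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of glb): after poll()/epoll reports activity
 * on a socket with a non-blocking connect() in flight, return 0 if the
 * connection succeeded, EINPROGRESS if nothing has happened yet, or the
 * pending error code (for example ECONNREFUSED). */
static int async_connect_result(int fd, short revents)
{
    int       err = 0;
    socklen_t len = sizeof(err);

    /* No writability and no error/hangup: the connect is still in progress. */
    if (!(revents & (POLLOUT | POLLERR | POLLHUP)))
        return EINPROGRESS;

    /* SO_ERROR fetches (and clears) the pending error on the socket. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return errno;

    return err;
}
```

A caller would treat any non-zero result other than EINPROGRESS as "this destination failed, try the next one".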

glb_listener.c:79: Failed to connect to destination: 112 (Host is down)

My web server sometimes returns 503 responses; retrying the request always succeeds.

However, at the core, this is what glb logs for me:

glb_listener.c:79: Failed to connect to destination: 112 (Host is down)
This repeats a couple of times over about one second, and then there are no more errors.

This happens every now and then, is this a known glb issue or do I need to investigate somewhere else?

Galera Load Balancer segment fault

Hi there!

I was trying out GLB with a Galera cluster. In normal mode it works correctly (mostly).
In verbose mode it always segfaults.

I also tested it with CFLAGS -O2; the result is the same.

The test environment is VirtualBox, Debian Wheezy 32-bit, 512 MB RAM.

When I start it, I get a crash:

root@galera1:~# glbd -w exec:"/root/glb-1.0.1/files/mysql.sh -uroot -proot" -t 3 0.0.0.0:3306 galera1:3308 galera2:3308 galera3:3308
glb v1.0.1 (epoll)
Incoming address: 0.0.0.0:3306, control FIFO: /tmp/glbd.fifo
Control address: none
Number of threads: 3, max conn: 493, nodelay: ON, keepalive: ON, defer accept: OFF, linger: OFF, daemon: NO, lat.count: 0, policy: 'least connected', top: NO, verbose: NO
Destinations: 3
0: 127.0.1.1:3308 , w: 1.000
1: 192.168.55.114:3308 , w: 1.000
2: 192.168.55.115:3308 , w: 1.000
Watchdog:
Address : exp setw state lat curw
127.0.1.1:3308 : + 1.000 READY 0.21315 1.000
192.168.55.114:3308 : + 1.000 READY 0.21868 1.000
192.168.55.115:3308 : + 1.000 NOTFOUND 0.00000 -1.000
Destinations: 3

Router:
Address : weight usage map conns
192.168.55.114:3308 : 1.000 0.000 N/A 0
127.0.1.1:3308 : 1.000 0.000 N/A 0
Destinations: 2, total connections: 0 of 493 max

Pool: connections per thread: 0 0 0

*** glibc detected *** glbd: realloc(): invalid pointer: 0x09b69230 ***
======= Backtrace: =========
/lib/i386-linux-gnu/i686/cmov/libc.so.6(+0x70f01)[0xb760ff01]
/lib/i386-linux-gnu/i686/cmov/libc.so.6(realloc+0x2bb)[0xb7615acb]
glbd[0x804d269]
======= Memory map: ========
08048000-0805d000 r-xp 00000000 08:01 13804 /usr/sbin/glbd
0805d000-0805e000 rw-p 00014000 08:01 13804 /usr/sbin/glbd
09b5b000-09b7c000 rw-p 00000000 00:00 0 [heap]
b2ca4000-b2cc0000 r-xp 00000000 08:01 129301 /lib/i386-linux-gnu/libgcc_s.so.1
b2cc0000-b2cc1000 rw-p 0001b000 08:01 129301 /lib/i386-linux-gnu/libgcc_s.so.1
b2cc6000-b2cc7000 ---p 00000000 00:00 0
b2cc7000-b34c7000 rw-p 00000000 00:00 0
b34c7000-b34c8000 ---p 00000000 00:00 0
b34c8000-b3cc8000 rw-p 00000000 00:00 0
b3cc8000-b3cc9000 ---p 00000000 00:00 0
b3cc9000-b44cb000 rw-p 00000000 00:00 0
b44cb000-b44cc000 ---p 00000000 00:00 0
b44cc000-b4ccc000 rw-p 00000000 00:00 0
b4ccc000-b4ccd000 ---p 00000000 00:00 0
b4ccd000-b54cd000 rw-p 00000000 00:00 0
b54cd000-b54ce000 ---p 00000000 00:00 0
b54ce000-b5cce000 rw-p 00000000 00:00 0
b5cce000-b5ccf000 ---p 00000000 00:00 0
b5ccf000-b64cf000 rw-p 00000000 00:00 0
b64cf000-b64d0000 ---p 00000000 00:00 0
b64d0000-b6cd0000 rw-p 00000000 00:00 0
b6cd0000-b6cd1000 ---p 00000000 00:00 0
b6cd1000-b7592000 rw-p 00000000 00:00 0
b7592000-b759c000 r-xp 00000000 08:01 134570 /lib/i386-linux-gnu/i686/cmov/libnss_files-2.13.so
b759c000-b759d000 r--p 00009000 08:01 134570 /lib/i386-linux-gnu/i686/cmov/libnss_files-2.13.so
b759d000-b759e000 rw-p 0000a000 08:01 134570 /lib/i386-linux-gnu/i686/cmov/libnss_files-2.13.so
b759e000-b759f000 rw-p 00000000 00:00 0
b759f000-b76fb000 r-xp 00000000 08:01 134576 /lib/i386-linux-gnu/i686/cmov/libc-2.13.so
b76fb000-b76fc000 ---p 0015c000 08:01 134576 /lib/i386-linux-gnu/i686/cmov/libc-2.13.so
b76fc000-b76fe000 r--p 0015c000 08:01 134576 /lib/i386-linux-gnu/i686/cmov/libc-2.13.so
b76fe000-b76ff000 rw-p 0015e000 08:01 134576 /lib/i386-linux-gnu/i686/cmov/libc-2.13.so
b76ff000-b7703000 rw-p 00000000 00:00 0
b7703000-b7718000 r-xp 00000000 08:01 134565 /lib/i386-linux-gnu/i686/cmov/libpthread-2.13.so
b7718000-b7719000 r--p 00014000 08:01 134565 /lib/i386-linux-gnu/i686/cmov/libpthread-2.13.so
b7719000-b771a000 rw-p 00015000 08:01 134565 /lib/i386-linux-gnu/i686/cmov/libpthread-2.13.so
b771a000-b771c000 rw-p 00000000 00:00 0
b771c000-b771e000 r-xp 00000000 08:01 134569 /lib/i386-linux-gnu/i686/cmov/libdl-2.13.so
b771e000-b771f000 r--p 00001000 08:01 134569 /lib/i386-linux-gnu/i686/cmov/libdl-2.13.so
b771f000-b7720000 rw-p 00002000 08:01 134569 /lib/i386-linux-gnu/i686/cmov/libdl-2.13.so
b7720000-b7727000 rw-p 00000000 00:00 0
b7727000-b7728000 r-xp 00000000 00:00 0 [vdso]
b7728000-b7744000 r-xp 00000000 08:01 129352 /lib/i386-linux-gnu/ld-2.13.so
b7744000-b7745000 r--p 0001b000 08:01 129352 /lib/i386-linux-gnu/ld-2.13.so
b7745000-b7746000 rw-p 0001c000 08:01 129352 /lib/i386-linux-gnu/ld-2.13.so
bfd7d000-bfd9e000 rw-p 00000000 00:00 0 [stack]
INFO: glb_signal.c:42: Received signal 6. Terminating.
Aborted

I looked in /var/log/messages:

root@galera1:~# cat /var/log/messages
Dec 20 15:40:06 galera1 kernel: [ 5426.947387] glbd[4956]: segfault at 135d ip b76452f4 sp b5d2cbdc error 4 in libc-2.13.so[b7601000+15c000

root@galera1:~# glbd -v -w exec:"/root/glb-1.0.1/files/mysql.sh -uroot -proot" -t 3 3306 galera1:3308 galera2:3308 galera3:3308
glb v1.0.1 (epoll)
Incoming address: 0.0.0.0:3306, control FIFO: /tmp/glbd.fifo
Control address: none
Number of threads: 3, max conn: 493, nodelay: ON, keepalive: ON, defer accept: OFF, linger: OFF, daemon: NO, lat.count: 0, policy: 'least connected', top: NO, verbose: YES
Destinations: 3
0: 127.0.1.1:3308 , w: 1.000
1: 192.168.55.114:3308 , w: 1.000
2: 192.168.55.115:3308 , w: 1.000
DEBUG: glb_wdog.c:156: Adding ' 127.0.1.1:3308 , w: 1.000' at pos. 0
DEBUG: glb_wdog.c:97: Created context for 127.0.1.1:3308
Segmentation fault

Searching the code and adding debug output, I narrowed the segfault down to this place:
File glb_wdog.c
line 186: pthread_create (&ctx->id, NULL, wdog->backend.thread, ctx);

With a return-value check added, the printf is never executed:

int status;
status = pthread_create (&ctx->id, NULL, wdog->backend.thread, ctx);
if (status != 0)
    printf("%i\n", status); // not executed

Please, fix it.
Take my best wishes.
From Russia, with love.

Watchdog

When running
glbd -c 127.0.0.1:4444 -a --max_conn 20000 -l -w exec:"tcp.sh" 0.0.0.0:25565 ip:1336:5 ip:1337:5

It runs and then gives me an error:
http://pastiebin.com/53d83f3e424b4

Then the next time around it gives me this:
http://pastiebin.com/53d83f5bbb14e

It removes the IPs from the Watchdog list completely.

This is what I'm using for the watchdog script:

It's a Python script, plus a shell script that runs the Python script:
http://pastiebin.com/53d83f79b1429

shell script:

python tcp.py $1

I've tested this manually and it works.
root@1:# ./tcp.sh 8.8.8.8:53
3
root@1:#

root@1:# ./tcp.sh 8.8.8.8:2754
0
root@1:#

Any idea how to fix this?

Galera Load Balancer connections crash

Hi there!

I tried to connect more than 2 clients to the Galera cluster through glb.
This is what I got:

root@galera1:~/glb-1.0.1# glbd -w exec:"/root/glb-1.0.1/files/mysql.sh -uroot -proot" -m 512 -t 3 3306 galera3:3308 galera2:3308 galera1:3308

Watchdog:

Address : exp setw state lat curw
192.168.55.115:3308 : + 1.000 READY 0.18607 1.000
192.168.55.114:3308 : + 1.000 READY 0.17800 1.000
127.0.1.1:3308 : + 1.000 READY 0.17787 1.000
Destinations: 3

Router:
Address : weight usage map conns
127.0.1.1:3308 : 1.000 0.500 N/A 1
192.168.55.114:3308 : 1.000 0.500 N/A 1
192.168.55.115:3308 : 1.000 0.500 N/A 1
Destinations: 3, total connections: 3 of 512 max

Pool: connections per thread: 1 1 1

glbd: malloc.c:3096: sYSMALLOc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed.
INFO: glb_signal.c:42: Received signal 6. Terminating.
Aborted

I can send you a VirtualBox VM with the Galera cluster and glb.
Please, fix it.
Sincerely,
from Russia, with love.

What is high latency according to watchdog?

I assume the watchdog decides when a server is slow and takes it out of the pool if latency is high, but what is considered high latency? A second, more?

Any thoughts on this?

Do I need to set the --latency flag to enable the latency check, or is there a default value so that the check happens even when the flag is not set?

Crash at startup with discovery feature

Hello,

When glbd is started with the -D flag, it crashes.
What is strange is that I managed to make it work with discovery yesterday, with no problem.
And today it's crashing.

I compiled version 1.0.1 on Ubuntu server 13.10. Simple cluster with 2 Galera nodes.

When adding the 2 nodes directly at startup, it runs OK, but if the cluster membership changes (a node is removed), it crashes.

I took a very quick look at the code. My guess is there is some parsing problem in glb_dst_parse, but I may be wrong on that.

Thanks in advance,
Marc

# /usr/local/sbin/glbd -v --fifo=/var/run/glbd.fifo -D -w exec:"mysql.sh -uroot -pXXXX" --control=127.0.0.1:8011 3306 192.168.56.104:3306
glb v1.0.1 (epoll)
Incoming address: 0.0.0.0:3306, control FIFO: /var/run/glbd.fifo
Control  address:  127.0.0.1:8011
Number of threads: 1, max conn: 493, nodelay: ON, keepalive: ON, defer accept: OFF, linger: OFF, daemon: NO, lat.count: 0, policy: 'least connected', top: NO, verbose: YES
Destinations: 1
   0:  192.168.56.104:3306 , w: 1.000
  DEBUG: glb_wdog.c:156: Adding ' 192.168.56.104:3306 , w: 1.000' at pos. 0
  DEBUG: glb_wdog.c:97: Created context for 192.168.56.104:3306
  DEBUG: glb_wdog_exec.c:194: exec thread: 139983978346240, errno: 0 (Success), pid: 6576, cmd: 'mysql.sh 192.168.56.104:3306 -uroot -ptoto'
  DEBUG: glb_wdog.c:188: Backend thread for '192.168.56.104:3306' started.
  DEBUG: glb_wdog.c:495: main loop collecting...
  DEBUG: glb_wdog.c:495: main loop collecting...
  DEBUG: glb_wdog.c:355: Setting memb_changed because changed_length: 1 or strcmp(
 old: ''
 new: '192.168.56.102:3306,192.168.56.104:3306
'): -49
  DEBUG: glb_wdog.c:568: Changing weight for '192.168.56.104:3306': -1.000 ->  1.000:  0 (Success)
  DEBUG: glb_wdog.c:156: Adding ' 192.168.56.102:3306 , w: 1.000' at pos. 1
  DEBUG: glb_wdog.c:97: Created context for 192.168.56.102:3306
  DEBUG: glb_wdog_exec.c:194: exec thread: 139983961560832, errno: 0 (Success), pid: 6594, cmd: 'mysql.sh 192.168.56.102:3306 -uroot -ptoto'
  DEBUG: glb_wdog.c:188: Backend thread for '192.168.56.102:3306' started.
  ERROR: glb_socket.c:90: Unknown host U.

  ERROR: glb_dst.c:86: Invalid argument
  ERROR: glb_wdog.c:450: Failed to parse destination 'U': 22 (Invalid argument). Skipping.
Watchdog:
------------------------------------------------------------
        Address       : exp  setw     state    lat     curw
 192.168.56.104:3306  :  +   1.000    READY  0.02680   1.000
 192.168.56.102:3306  :      1.000 NOTFOUND  0.00000  -1.000
------------------------------------------------------------
Destinations: 2

Router:
------------------------------------------------------
        Address       :   weight   usage    map  conns
 192.168.56.104:3306  :    1.000   0.000    N/A      0
------------------------------------------------------
Destinations: 1, total connections: 0 of 493 max

Pool: connections per thread:     0

  DEBUG: glb_wdog.c:495: main loop collecting...
  DEBUG: glb_wdog.c:355: Setting memb_changed because changed_length: 0 or strcmp(
 old: '192.168.56.102:3306,192.168.56.104:3306
U'
 new: '192.168.56.102:3306,192.168.56.104:3306
'): 85
  DEBUG: glb_wdog.c:355: Setting memb_changed because changed_length: 1 or strcmp(
 old: '
       $��'
 new: '192.168.56.102:3306,192.168.56.104:3306
'): -37
*** Error in `/usr/local/sbin/glbd': realloc(): invalid pointer: 0x00007f50800008f0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7f4c6)[0x7f508fd5c4c6]
/lib/x86_64-linux-gnu/libc.so.6(realloc+0x300)[0x7f508fd60cf0]
/usr/local/sbin/glbd[0x405273]
/usr/local/sbin/glbd[0x4096ac]
/usr/local/sbin/glbd[0x409f1d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7f6e)[0x7f50900acf6e]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f508fdd79cd]
======= Memory map: ========
00400000-00415000 r-xp 00000000 fc:00 27285                              /usr/local/sbin/glbd
00614000-00615000 r--p 00014000 fc:00 27285                              /usr/local/sbin/glbd
00615000-00616000 rw-p 00015000 fc:00 27285                              /usr/local/sbin/glbd
01b83000-01ba4000 rw-p 00000000 00:00 0                                  [heap]
7f5080000000-7f5080021000 rw-p 00000000 00:00 0 
7f5080021000-7f5084000000 ---p 00000000 00:00 0 
7f5084000000-7f5084022000 rw-p 00000000 00:00 0 
7f5084022000-7f5088000000 ---p 00000000 00:00 0 
7f5088000000-7f5088022000 rw-p 00000000 00:00 0 
7f5088022000-7f508c000000 ---p 00000000 00:00 0 
7f508c08c000-7f508c0a1000 r-xp 00000000 fc:00 153595                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7f508c0a1000-7f508c2a0000 ---p 00015000 fc:00 153595                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7f508c2a0000-7f508c2a1000 r--p 00014000 fc:00 153595                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7f508c2a1000-7f508c2a2000 rw-p 00015000 fc:00 153595                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7f508c2a2000-7f508c2a3000 ---p 00000000 00:00 0 
7f508c2a3000-7f508caa3000 rw-p 00000000 00:00 0                          [stack:6611]
7f508caa3000-7f508caa4000 ---p 00000000 00:00 0 
7f508caa4000-7f508d2a4000 rw-p 00000000 00:00 0                          [stack:6610]
7f508d2a4000-7f508d2a6000 r-xp 00000000 fc:00 153290                     /lib/libnss_mdns4.so.2
7f508d2a6000-7f508d4a6000 ---p 00002000 fc:00 153290                     /lib/libnss_mdns4.so.2
7f508d4a6000-7f508d4a7000 r--p 00002000 fc:00 153290                     /lib/libnss_mdns4.so.2
7f508d4a7000-7f508d4a8000 rw-p 00003000 fc:00 153290                     /lib/libnss_mdns4.so.2
7f508d4a8000-7f508d4be000 r-xp 00000000 fc:00 131187                     /lib/x86_64-linux-gnu/libresolv-2.17.so
7f508d4be000-7f508d6be000 ---p 00016000 fc:00 131187                     /lib/x86_64-linux-gnu/libresolv-2.17.so
7f508d6be000-7f508d6bf000 r--p 00016000 fc:00 131187                     /lib/x86_64-linux-gnu/libresolv-2.17.so
7f508d6bf000-7f508d6c0000 rw-p 00017000 fc:00 131187                     /lib/x86_64-linux-gnu/libresolv-2.17.so
7f508d6c0000-7f508d6c2000 rw-p 00000000 00:00 0 
7f508d6c2000-7f508d6c8000 r-xp 00000000 fc:00 131150                     /lib/x86_64-linux-gnu/libnss_dns-2.17.so
7f508d6c8000-7f508d8c7000 ---p 00006000 fc:00 131150                     /lib/x86_64-linux-gnu/libnss_dns-2.17.so
7f508d8c7000-7f508d8c8000 r--p 00005000 fc:00 131150                     /lib/x86_64-linux-gnu/libnss_dns-2.17.so
7f508d8c8000-7f508d8c9000 rw-p 00006000 fc:00 131150                     /lib/x86_64-linux-gnu/libnss_dns-2.17.so
7f508d8c9000-7f508d8cb000 r-xp 00000000 fc:00 153294                     /lib/libnss_mdns4_minimal.so.2
7f508d8cb000-7f508daca000 ---p 00002000 fc:00 153294                     /lib/libnss_mdns4_minimal.so.2
7f508daca000-7f508dacb000 r--p 00001000 fc:00 153294                     /lib/libnss_mdns4_minimal.so.2
7f508dacb000-7f508dacc000 rw-p 00002000 fc:00 153294                     /lib/libnss_mdns4_minimal.so.2
7f508dacc000-7f508dad8000 r-xp 00000000 fc:00 131152                     /lib/x86_64-linux-gnu/libnss_files-2.17.so
7f508dad8000-7f508dcd7000 ---p 0000c000 fc:00 131152                     /lib/x86_64-linux-gnu/libnss_files-2.17.so
7f508dcd7000-7f508dcd8000 r--p 0000b000 fc:00 131152                     /lib/x86_64-linux-gnu/libnss_files-2.17.so
7f508dcd8000-7f508dcd9000 rw-p 0000c000 fc:00 131152                     /lib/x86_64-linux-gnu/libnss_files-2.17.so
7f508dcd9000-7f508dcda000 ---p 00000000 00:00 0 
7f508dcda000-7f508e4da000 rw-p 00000000 00:00 0                          [stack:6593]
7f508e4da000-7f508e4db000 ---p 00000000 00:00 0 
7f508e4db000-7f508ecdb000 rw-p 00000000 00:00 0                          [stack:6577]
7f508ecdb000-7f508ecdc000 ---p 00000000 00:00 0 
7f508ecdc000-7f508f4dc000 rw-p 00000000 00:00 0                          [stack:6575]
7f508f4dc000-7f508f4dd000 ---p 00000000 00:00 0 
7f508f4dd000-7f508fcdd000 rw-p 00000000 00:00 0                          [stack:6574]
7f508fcdd000-7f508fe9a000 r-xp 00000000 fc:00 131095                     /lib/x86_64-linux-gnu/libc-2.17.so
7f508fe9a000-7f509009a000 ---p 001bd000 fc:00 131095                     /lib/x86_64-linux-gnu/libc-2.17.so
7f509009a000-7f509009e000 r--p 001bd000 fc:00 131095                     /lib/x86_64-linux-gnu/libc-2.17.so
7f509009e000-7f50900a0000 rw-p 001c1000 fc:00 131095                     /lib/x86_64-linux-gnu/libc-2.17.so
7f50900a0000-7f50900a5000 rw-p 00000000 00:00 0 
7f50900a5000-7f50900bc000 r-xp 00000000 fc:00 131183                     /lib/x86_64-linux-gnu/libpthread-2.17.so
7f50900bc000-7f50902bc000 ---p 00017000 fc:00 131183                     /lib/x86_64-linux-gnu/libpthread-2.17.so
7f50902bc000-7f50902bd000 r--p 00017000 fc:00 131183                     /lib/x86_64-linux-gnu/libpthread-2.17.so
7f50902bd000-7f50902be000 rw-p 00018000 fc:00 131183                     /lib/x86_64-linux-gnu/libpthread-2.17.so
7f50902be000-7f50902c2000 rw-p 00000000 00:00 0 
7f50902c2000-7f50902e5000 r-xp 00000000 fc:00 131075                     /lib/x86_64-linux-gnu/ld-2.17.so
7f5090458000-7f50904dd000 rw-p 00000000 00:00 0 
7f50904dd000-7f50904e4000 rw-p 00000000 00:00 0 
7f50904e4000-7f50904e5000 r--p 00022000 fc:00 131075                     /lib/x86_64-linux-gnu/ld-2.17.so
7f50904e5000-7f50904e7000 rw-p 00023000 fc:00 131075                     /lib/x86_64-linux-gnu/ld-2.17.so
7fff2cf25000-7fff2cf46000 rw-p 00000000 00:00 0                          [stack]
7fff2cffe000-7fff2d000000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
   INFO: glb_signal.c:42: Received signal 6. Terminating.
Aborted (core dumped)

glb/mysql.sh parsing error when starting arbitrator daemon

Hi,

I start Galera Load Balancer 1.0.1 with the following start options:

[root@lb01 bin]# glbd -w exec:"mysql.sh -d 2 -umon -pmon" -b 10.0.0.1:3333 10.0.0.2:3306:1 -D

It successfully detects the cluster topology and adds the available nodes to the load-balancing policy.

When I start an arbitrator daemon (garbd), wsrep_incoming_addresses changes from:
| wsrep_incoming_addresses | 10.0.0.2:3306,10.0.0.3:3306,10.0.04 |
to:
| wsrep_incoming_addresses | ,10.0.0.2:3306,10.0.0.3:3306,10.0.04 |

And the following error is displayed in glbd:
ERROR: glb_socket.c:90: Unknown host ,.

ERROR: glb_dst.c:86: Invalid argument
ERROR: glb_wdog.c:450: Failed to parse destination ',': 22 (Invalid argument). Skipping.

Parsing of the mysql.sh output is failing.
Note: the load balancer still works correctly and routes queries to healthy nodes.

Idea fix1: Do not try to contact empty addresses
Idea fix2: Put the address of the garbd daemon in the listening addresses and tag it with "garbd" so that it is not used for load balancing or other purposes.

Thanks in advance,
Joffrey
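
As an illustration of "Idea fix1", here is a minimal sketch of splitting a comma-separated address list while skipping empty fields such as the one produced by the leading comma. `split_addresses` is a hypothetical helper written for this example; it is not glb's actual parser, and it assumes the list arrives as a writable, NUL-terminated string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not glb's parser): split a comma-separated address
 * list in place and store pointers to the non-empty entries in out[].
 * Empty fields, such as the one a leading comma produces, are skipped.
 * Returns the number of entries stored (at most max). */
static size_t split_addresses(char *list, char *out[], size_t max)
{
    size_t n = 0;
    char  *p = list;

    while (p != NULL && n < max) {
        char *comma = strchr(p, ',');
        if (comma != NULL)
            *comma = '\0';              /* terminate the current field */

        /* Trim trailing whitespace or newline the script output may carry. */
        size_t len = strlen(p);
        while (len > 0 && (p[len - 1] == '\n' || p[len - 1] == '\r' ||
                           p[len - 1] == ' '))
            p[--len] = '\0';

        if (len > 0)
            out[n++] = p;               /* keep only non-empty fields */

        p = (comma != NULL) ? comma + 1 : NULL;
    }
    return n;
}
```

With input such as ",10.0.0.2:3306,10.0.0.3:3306" the empty first field is dropped instead of being passed on as a destination named ",".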

Negative connection count in glbd status

After some days of running glbd without problems I'm seeing weird negative connection count values in glbd status info. The backend gives no errors in log output.

[root@glb1 ~]# service glbd status
Router:
------------------------------------------------------
        Address       :   weight   usage    map  conns
      10.1.4.88:3306  :   20.000   0.750    N/A      3
      10.1.4.87:3306  :   10.000   1.250    N/A     -5
      10.1.4.86:3306  :    1.000   0.000    N/A      0
------------------------------------------------------
Destinations: 3, total connections: -2 of 10000 max
[root@glb1 ~]# echo "10.1.4.87:3306:0" | nc 127.0.0.1 4444
Ok
[root@glb1 ~]# service glbd status
Router:
------------------------------------------------------
        Address       :   weight   usage    map  conns
      10.1.4.88:3306  :   20.000   0.667    N/A      2
      10.1.4.87:3306  :    0.000    -nan    N/A     -5
      10.1.4.86:3306  :    1.000   0.000    N/A      0
------------------------------------------------------
Destinations: 3, total connections: -3 of 10000 max

about watchdog

Hello,
I've developed a watchdog using the glb_wdog_backend model.
It works well, except that the router is only updated after the second watchdog call.
The first call (after adding a destination with echo ip:port;weight) sets ctx->state to 3, but the router does not seem to be updated that first time; I have to wait for the -i interval.
Can you explain this behaviour?
Regards,
Nicolas .

"start()" command in the start/stop script

In files/glbd.sh on lines 110-115 the start() command checks the exit code of the "pidof" statement. Shouldn't it check the exit code of the preceding "eval" statement instead? (edit -- maybe not, but it didn't work in my Alpine container)

Additionally, the statement pidof /usr/local/sbin/glbd (pidof $exec) returns a blank string instead of a pid, but pidof $prog returns a value. Actually it returns 2 PIDs, one of which is the executable and the second is the init script, but only the first should go into the .pid file.

I replaced lines 110-115 with the following and it seems to run well:

if ! eval $exec $GLBD_OPTIONS $LISTEN_ADDR $DEFAULT_TARGETS; then
   echo "[`date`] $prog: failed to start."
   exit 1
fi
PID=`pidof -o %PPID $prog`

glb does not compile on Ubuntu 24.04

The same code and the same build script worked in Ubuntu 20.04 and 22.04 as well as on Debian 10 - 12. Compiling on Ubuntu 24.04 gave an error:

gcc --version
gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0

make
make all-recursive
make[1]: Entering directory '/home/build/buildpack/glb-1.0.1'
Making all in src
make[2]: Entering directory '/home/build/buildpack/glb-1.0.1/src'
gcc -DHAVE_CONFIG_H -I. -I.. -DNDEBUG -D_GNU_SOURCE -DUSE_EPOLL -Wdate-time -D_FORTIFY_SOURCE=3 -g -O3 -Wall -Werror -DGLBD -g -O2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -ffile-prefix-map=/home/build/buildpack/glb-1.0.1=. -flto=auto -ffat-lto-objects -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -fdebug-prefix-map=/home/build/buildpack/glb-1.0.1=/usr/src/glb-1.0.1-3-noble -c -o glbd-glb_wdog.o `test -f 'glb_wdog.c' || echo './'`glb_wdog.c
glb_wdog.c: In function 'wdog_copy_result':
glb_wdog.c:335:21: error: pointer 'others_72' may be used after 'realloc' [-Werror=use-after-free]
335 | free (others);
| ^~~~~~~~~~~~~
glb_wdog.c:332:36: note: call to 'realloc' here
332 | d->result.others = realloc (others, res->others_len);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[2]: *** [Makefile:720: glbd-glb_wdog.o] Error 1
make[2]: Leaving directory '/home/build/buildpack/glb-1.0.1/src'
make[1]: *** [Makefile:413: all-recursive] Error 1
make[1]: Leaving directory '/home/build/buildpack/glb-1.0.1'
make: *** [Makefile:345: all] Error 2

329     if (others_len < res->others_len ||
330         others_len > (res->others_len * 2)) {
331         // buffer size is too different, reallocate
332         d->result.others = realloc (others, res->others_len);
333         if (!d->result.others && res->others_len > 0) {
334             // this is pretty much fatal, but we'll try
335             free (others);
336             d->result.others_len = 0;
337         }
338         else {
339             changed_length = true;
340             d->result.others_len = res->others_len;
341         }
342     }

Would be happy for a fix/hint. Thanks!
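
One possible direction, offered as a sketch and not as the project's fix: resize through a temporary pointer and never touch the old pointer after calling realloc(), so the compiler has nothing to flag. `resize_buffer` is a hypothetical helper. Note the behavioural difference from the snippet above: on allocation failure it keeps the old buffer instead of freeing it, and a requested size of 0 is handled explicitly because realloc(ptr, 0) is implementation-defined.

```c
#include <stdlib.h>

/* Hypothetical helper (not glb code): resize *buf to new_len bytes.
 * On success updates *buf and *len and returns 0.
 * On allocation failure returns -1 and leaves *buf and *len untouched,
 * so the old pointer is never used after a successful realloc(). */
static int resize_buffer(char **buf, size_t *len, size_t new_len)
{
    if (new_len == 0) {
        /* Avoid realloc(ptr, 0): release the buffer explicitly. */
        free(*buf);
        *buf = NULL;
        *len = 0;
        return 0;
    }

    char *tmp = realloc(*buf, new_len);
    if (tmp == NULL)
        return -1;           /* old block is still valid and still owned */

    *buf = tmp;
    *len = new_len;
    return 0;
}
```

Whether keeping the old buffer on failure suits wdog_copy_result is a design question for the maintainers; the sketch only shows a shape that avoids referring to the pre-realloc pointer.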

Connection_timeout for mysql watchdog

mysql.sh (line 53 in release 1.0.1) does not set 'connect_timeout', which defaults to 0 in the mysql client. With the default value the watchdog does not properly detect a backend server crash. Setting 'connect_timeout' to some reasonably short value gives the desired effect of dropping the backend from the pool.

One way to set the 'connect_timeout' parameter is to use the OTHER_OPTIONS variable (in glbd.cfg):

OTHER_OPTIONS="-w exec:'mysql.sh --connect_timeout=1 -uglbpinger -pingerpwd'"

This should be noted in the comments of the files/glbd.cfg file. The alternative approach would be to include the connect_timeout parameter in files/mysql.sh at line 53, using a variable to set the timeout value.

Reopen log file on -HUP signal

This is required to implement logrotate functionality via logrotate.d script, e.g.

/var/log/galera/*.log {
    daily
    rotate 5
    compress
    create 644 root root
    dateext
    postrotate
        test -f /var/lock/subsys/garbd && /etc/init.d/garb reload || :
    endscript
}

where the "reload" command means sending -HUP
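
A minimal sketch of the usual pattern behind this request, under the assumption that the daemon has a main loop that can poll a flag: the SIGHUP handler only sets the flag, and the loop reopens the log file by path so that writes land in the file created by logrotate. All names here (`reopen_requested`, `install_sighup_handler`, `reopen_log_if_requested`) are hypothetical and not taken from glb or garbd.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set by the signal handler, polled by the main loop. */
static volatile sig_atomic_t reopen_requested = 0;

static void on_sighup(int signo)
{
    (void)signo;
    reopen_requested = 1;    /* only async-signal-safe work in the handler */
}

/* Install the SIGHUP handler; returns 0 on success, -1 on error. */
static int install_sighup_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGHUP, &sa, NULL);
}

/* Called from the main loop. If a SIGHUP arrived, reopen the log by path.
 * Returns 1 if reopened, 0 if there was nothing to do, -1 on failure
 * (in which case the old stream stays in use). */
static int reopen_log_if_requested(FILE **log, const char *path)
{
    if (!reopen_requested)
        return 0;
    reopen_requested = 0;

    FILE *fresh = fopen(path, "a");
    if (fresh == NULL)
        return -1;
    if (*log != NULL)
        fclose(*log);
    *log = fresh;
    return 1;
}
```

Opening the new stream before closing the old one means a failed reopen does not leave the process without a log.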

watchdog processes don't get cleaned up

I have been doing some testing on my servers with glbd and I have noticed that the watchdog processes apparently don't get cleaned up when glbd stops. I have been frequently restarting glbd, intentionally and unintentionally, and I currently have 378 watchdog processes running. The old processes aren't taking a significant number of CPU cycles, however, since from what I can tell they are just sitting in a loop waiting for the main glbd process to tell them to "poll".

If I run glbd without the --daemon switch, the children do get stopped. With verbose mode on, there are messages that confirm that the children are stopped; however, it appears that verbose mode is always disabled when running with --daemon, so I can't see what happens in that case.

The processes should be cleaned up when glbd stops.

I am running glbd 1.0.1 on RHEL 6.5. You should be able to see the processes building up by starting and stopping glbd several times with the watchdog configured and checking the count of watchdog processes with "ps -A | grep mysql.sh | wc -l"
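
For reference, the generic pattern for not leaving child processes behind is to signal each child on shutdown and then reap it with waitpid(). The sketch below shows only that pattern; `spawn_idle_child` is a hypothetical stand-in for a watchdog child, and nothing here is claimed to match how glbd manages its exec backend or why cleanup is skipped in daemon mode.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a watchdog child: fork a process that just
 * waits for signals. Returns the child's pid in the parent, -1 on error. */
static pid_t spawn_idle_child(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        for (;;)
            pause();         /* child: sleep until a signal arrives */
    }
    return pid;
}

/* Ask a child to terminate and reap it so no orphan or zombie remains.
 * Returns the terminating signal number, 0 for a normal exit, -1 on error. */
static int stop_child(pid_t pid)
{
    int status = 0;

    if (kill(pid, SIGTERM) < 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

A shutdown path would call something like `stop_child` for every child it started, whether or not the parent was daemonized.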

Native MySQL watchdog backend

Implement a native MySQL watchdog backend that would use libmysqlclient to maintain constant connections to destinations and just periodically ping them for their state. That should be much more lightweight than the current forking of a mysql client process with the 'exec' backend.

GLB_DST_AVOID should not be zero

If you bootstrap your cluster, you have the problem that you start with a single node and join another. The first and only node will go into the Donor/Desynced state. If you use the watchdog feature, this will cause glbd to weight it with 0.000, so nobody is able to use this node. If the xtrabackup sync method is used, the first node will be usable most of the time, so there is no reason not to redirect traffic to it while it is donating (there is only a short period where "WSREP is not ready" errors can occur).

I solved this for myself by simply setting GLB_DST_AVOID to 0.001 in glb_wdog.c:394, so a desynced/donor server is only used if there is no alternative.

This could be a problem in single mode, because I think it won't change the active server if the weight is not zero or below, right?

Thank you.

Random IPs keep getting added to glbd

Every couple of days I have random IPs appearing under getinfo. I have deleted and recreated the VM multiple times. They all resolve to Amazon's systems. I feel someone has messed with the code of this project.

Stopping a MariaDB-Node: Watchdog crashes GLB

Environment

Debian Jessie
mariadb 10.1.10
galera-3 25.3.9
galera-load-balancer 1.0.1

Error description

I set up glb and it basically works. When I stop one of the cluster nodes, glb crashes:

Sequence of Commands

Starting glb

glbd -c 127.0.0.1:4444 -w exec:"/usr/local/bin/mysql.sh -unode_checker -p'secret'"  192.168.199.10:13306 192.168.199.11:3306:100 192.168.199.12:3306:1

glb shows that it is running correctly

Destinations: 2
   0:  192.168.199.11:3306 , w: 100.000
   1:  192.168.199.12:3306 , w: 1.000
Watchdog:
------------------------------------------------------------
        Address       : exp  setw     state    lat     curw
 192.168.199.11:3306  :  + 100.000    READY  0.01183 100.000
 192.168.199.12:3306  :  +   1.000    READY  0.01517   1.000
------------------------------------------------------------
Destinations: 2

Router:
------------------------------------------------------
        Address       :   weight   usage    map  conns
 192.168.199.12:3306  :    1.000   0.000    N/A      0
 192.168.199.11:3306  :  100.000   0.000    N/A      0
------------------------------------------------------
Destinations: 2, total connections: 0 of 32749 max

Stopping a DB-Node

systemctl stop mysql.service

glb crash


Pool: connections per thread:     0

Watchdog:
------------------------------------------------------------
        Address       : exp  setw     state    lat     curw
 192.168.199.11:3306  :  + 100.000    READY  0.00995 100.000
 192.168.199.12:3306  :  +   1.000    READY  0.00996   1.000
------------------------------------------------------------
Destinations: 2

Router:
------------------------------------------------------
        Address       :   weight   usage    map  conns
 192.168.199.12:3306  :    1.000   0.000    N/A      0
 192.168.199.11:3306  :  100.000   0.000    N/A      0
------------------------------------------------------
Destinations: 2, total connections: 0 of 32749 max

Pool: connections per thread:     0

Watchdog:
------------------------------------------------------------
        Address       : exp  setw     state    lat     curw
 192.168.199.11:3306  :  + 100.000    READY  0.00928 100.000
 192.168.199.12:3306  :  +   1.000    AVOID  0.01066   0.000
------------------------------------------------------------
Destinations: 2

Router:
------------------------------------------------------
        Address       :   weight   usage    map  conns
 192.168.199.12:3306  :    0.000    -nan    N/A      0
 192.168.199.11:3306  :  100.000   0.000    N/A      0
------------------------------------------------------
Destinations: 2, total connections: 0 of 32749 max

Pool: connections per thread:     0

ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 104 "Connection reset by peer"
*** Error in `glbd': realloc(): invalid pointer: 0x00007f6aec000920 ***
   INFO: glb_signal.c:42: Received signal 6. Terminating.
Aborted

Init Script RHEL update

Hello Everyone,
I updated the init script for RHEL-based systems. As far as I understand, the garbd daemon does not support -HUP; this option would be nice to have for purposes like log rotation. Please review the change.

Configuration file /etc/sysconfig/garb

# Copyright (C) 2012 Codership Oy
# This config file is to be sourced by garb service script.

# A space-separated list of node addresses (address[:port]) in the cluster
GALERA_NODES=""

# Galera cluster name, should be the same as on the rest of the nodes. Example: "-g groupname"
GALERA_GROUP="-g mygroupname"

# Galera node name. Example: "-n mynode"
GALERA_NODE_NAME="-n my-node"

# Optional Galera internal options string (e.g. SSL settings)
# see http://www.codership.com/wiki/doku.php?id=galera_parameters. Example: -o "myoption; myoption2=1"
GALERA_OPTIONS="-o evs.suspect_timeout=PT30S; socket.ssl=yes; socket.ssl_compression=yes; socket.ssl_ca=; socket.ssl_cert=; socket.ssl_key="
# Log file for garbd. Optional, by default logs to syslog. Example: -l "path to my log"
LOG_FILE="-l /var/log/galera/galera.log"

Init script

#!/bin/bash
#
# Copyright (C) 2012-2013 Codership Oy <[email protected]>
#
# init.d script for garbd
#
# chkconfig: - 99 01
# config: /etc/sysconfig/garb | /etc/default/garb
#
### BEGIN INIT INFO
# Provides:          garbd
# Required-Start:    $network
# Should-Start:
# Required-Stop:     $network
# Should-Stop:
# Default-Start:     3 4 5
# Default-Stop:      0 1 2 6
# Short-Description: Galera Arbitrator Daemon
# Description:       Galera Arbitrator Daemon
### END INIT INFO

# Source function library.
. /etc/rc.d/init.d/functions

# Source networking configuration.
. /etc/sysconfig/network

# Config file
. /etc/sysconfig/garb

# Check that networking is enabled.
[ "${NETWORKING}" = "no" ] && exit 1

RETVAL=0
pidfile="/var/run/garbd.pid"
garbd=${GARBD-/usr/bin/garbd}
prog="garbd"

start_prog() {
    # Check that node addresses are configured
    if [ -z "${GALERA_NODES}" ]; then
        echo "List of GALERA_NODES is not configured"
        exit 1
    fi
    if [ -z "${GALERA_GROUP}" ]; then
        echo "GALERA_GROUP name is not configured"
        exit 1
    fi

    GALERA_PORT=${GALERA_PORT:-4567}

    # Find a working node
    for ADDRESS in ${GALERA_NODES} 0; do
        HOST=$(echo $ADDRESS | cut -d \: -f 1 )
        # -s: print nothing when ADDRESS has no ":port", so the default below applies
        PORT=$(echo $ADDRESS | cut -s -d \: -f 2 )
        PORT=${PORT:-$GALERA_PORT}
        nc -z $HOST $PORT >/dev/null && break
    done
    if [ ${ADDRESS} == "0" ]; then
        echo "None of the nodes in $GALERA_NODES is accessible"
        exit 1
    fi

    OPTIONS="-d -a gcomm://${ADDRESS}"
    GALERA_OPTION="\"$GALERA_OPTIONS\""
    [ -n "${GALERA_GROUP#'-g'*}" ]
    [ -n "${GALERA_NODE_NAME#'-n'*}" ]
    [ -n "${LOG_FILE#'-l'*}" ]

    echo -n $"Starting $prog: "
    daemon --pidfile=${pidfile} ${garbd} $OPTIONS ${GALERA_GROUP} ${GALERA_NODE_NAME} ${GALERA_OPTION} ${LOG_FILE}
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$prog && echo "`pidof $prog`" > ${pidfile}
    echo "Galera Arbitrator running PID: "`cat ${pidfile}`
}

stop_prog() {
        echo -n $"Shutting down $prog: "
        killproc $prog
        RETVAL=$?
        echo
        [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/$prog
}

# This is in question.
reload_prog() {
        echo -n $"Reloading $prog: "
        killproc -p ${pidfile} $prog -HUP
        RETVAL=$?
        echo
}


# See how we were called.
case "$1" in
  start)
    start_prog
    ;;
  stop)
    stop_prog
    ;;
  status)
    status $prog > /dev/null
    RETVAL=$?
    [ $RETVAL -eq 0 ] && echo "Galera Arbitrator running PID: "`cat ${pidfile}`
    ;;
  restart)
    stop_prog
    start_prog
    ;;
  reload)
    # reload_prog (SIGHUP) is still in question, so reload falls back to a full restart.
    stop_prog
    start_prog
    ;;
  condrestart)
    if status $prog > /dev/null; then
        stop_prog
        start_prog
    fi
    ;;
  *)
    echo $"Usage: $0 {start|stop|status|restart|reload|condrestart}"
    exit 2
esac

exit $RETVAL
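
A note on the "Find a working node" loop: splitting `address[:port]` with `cut` is easy to get wrong, because `cut` prints the whole input when the delimiter is absent unless `-s` is given, so a bare host name can end up being used as the port. Below is a minimal sketch of the same split done with shell parameter expansion. The helper name `parse_addr` is hypothetical and not part of the script above; the 4567 default mirrors the script's `GALERA_PORT` fallback, and IPv6 literals are not handled.

```shell
#!/bin/bash
# Hypothetical helper (not in the original init script): split "address[:port]"
# into "host port", falling back to a default port when none is given.
parse_addr() {
    local address=$1
    local default_port=${2:-4567}   # same default as GALERA_PORT in the script
    local host=${address%%:*}       # everything before the first colon
    local port=$default_port
    # Take the port from the address only when a colon is actually present.
    case $address in
        *:*) port=${address#*:} ;;
    esac
    # An empty port (e.g. "host:") also falls back to the default.
    echo "$host ${port:-$default_port}"
}

parse_addr 192.168.199.11:4568
parse_addr 192.168.199.12
```

Inside the loop this could be consumed with something like `read HOST PORT <<< "$(parse_addr "$ADDRESS" "$GALERA_PORT")"`, leaving the `nc -z $HOST $PORT` probe unchanged.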
