clusterlabs / booth

This project forked from jjzhang/booth

The Booth Cluster Ticket Manager

License: GNU General Public License v2.0

Makefile 3.26% Shell 17.63% Python 6.91% Lua 0.64% C 66.83% M4 4.73%

booth's Issues

Introduce manual tickets for 2-site clusters

Goal

Many customers would like to set up 2-site Booth clusters, so we would like to allow them to create a cluster without an arbitrator.
The problem with such a setup is that if one site breaks down or communication between the sites fails, quorum is lost. An administrator would then be unable to grant a ticket to any site.
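
The quorum arithmetic behind this can be sketched in a few lines of shell. This assumes that quorum means a strict majority of the configured members (sites plus arbitrators); has_majority is a hypothetical helper, not part of booth.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when `reachable` members form a strict
# majority of `total` configured members (assumed quorum rule).
has_majority() {
    reachable=$1
    total=$2
    [ $((reachable * 2)) -gt "$total" ]
}

# Two sites plus an arbitrator: losing one member still leaves 2 of 3.
if has_majority 2 3; then echo "3 members, 1 lost: quorum kept"; fi

# Two sites, no arbitrator: losing one member leaves 1 of 2, not a majority.
if ! has_majority 1 2; then echo "2 members, 1 lost: quorum lost"; fi
```

With only two members, any single failure leaves exactly half of the votes, which is why no site can be granted the ticket automatically.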

Proposed Solution

The idea would be to create a new ticket attribute, 'manual mode'.
In manual mode, if both sites are online, Booth could still ensure that a ticket is granted to only one site at a time.
If one site is down/lost or a split-brain occurs, an administrator can decide to bypass the quorum requirement and manually grant the ticket to the site considered healthy, which effectively forces a manual failover. It also means that users are on their own here: they have to make sure themselves that the ticket has not already been granted to another site.

Manual Tickets' Behavior

The idea would be to disable elections for 'manual tickets' and, similarly, to disable their expiration time. If a manual ticket were granted to a site, other sites would not try to acquire it. Also, after a 'booth grant' command, the site would only notify the other nodes that it has acquired the ticket.

Artificial Arbitrator

Another way to solve this issue would be to introduce some kind of 'fake'/'artificial' arbitrator, which would automatically add its vote when granting a ticket in a manual mode.

Discussion

Could you let me know what you think about this? Any other comments on the idea and the implementation are warmly welcomed!

bashisms in init file

While packaging for Gentoo I found this:

 * QA Notice: shell script appears to use non-POSIX feature(s):
 *    possible bashism in /etc/init.d/booth-arbitrator line 60 (echo -n):
 *      echo -n "BOOTH daemon is "
 *    possible bashism in /etc/init.d/booth-arbitrator line 82 ($"foo" should be eval_gettext "foo"):
 *              echo -n $"Starting BOOTH arbitrator daemon: "
 *    possible bashism in /etc/init.d/booth-arbitrator line 82 (echo -n):
 *              echo -n $"Starting BOOTH arbitrator daemon: "
 *    possible bashism in /etc/init.d/booth-arbitrator line 105 ($"foo" should be eval_gettext "foo"):
 *      echo -n $"Stopping BOOTH arbitrator daemon: "
 *    possible bashism in /etc/init.d/booth-arbitrator line 105 (echo -n):
 *      echo -n $"Stopping BOOTH arbitrator daemon: "
 *    possible bashism in /etc/init.d/booth-arbitrator line 166 ($"foo" should be eval_gettext "foo"):
 *      echo $"Usage: $0 {start|stop|restart|try-restart|condrestart|reload|force-reload|status}"
 *
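
A possible POSIX-compatible rewrite of the flagged lines, as a minimal sketch: printf replaces echo -n, and the bash-only $"..." translation quoting is dropped (this assumes translated messages are not needed in the init script). The function names are made up for illustration and do not come from the booth init script.

```shell
#!/bin/sh
# Hypothetical helpers showing a POSIX form of the flagged lines.

# printf '%s' prints its argument without a trailing newline,
# which is what `echo -n` does in bash.
msg_starting() {
    printf '%s' "Starting BOOTH arbitrator daemon: "
}

msg_stopping() {
    printf '%s' "Stopping BOOTH arbitrator daemon: "
}

# Plain double quotes replace the bash-only $"..." form.
msg_usage() {
    printf '%s\n' "Usage: $0 {start|stop|restart|try-restart|condrestart|reload|force-reload|status}"
}
```

The alternative would be to keep the bash constructs and change the script's interpreter line to bash, at the cost of requiring bash on the target system.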

Add booth-cfg-name attribute to ticket

This is mostly to be able to mark which tickets were created by booth and by which specific booth instance. The main idea is to allow pcs to remove booth tickets that have been removed from the config. It is questionable whether booth itself should clean up old tickets (ones no longer present in the config file), but if so, that can be a separate issue/PR/commit.

booth-cfg-name should be the same as the name used in the pid file, i.e. booth_conf->name.

More context can be found in https://issues.redhat.com/browse/RHEL-7602

Ticket not revoked when starting without majority

Scenario:

Node1 has the ticket granted and has majority.
Node1's network is disconnected.
booth is restarted before the ticket expires.
booth runs endless elections and the granted ticket never expires.
Tested today with Git HEAD.

The booth restart in this case can happen when booth is running as a cluster resource and is migrated to another node inside the cluster.

In the described case, the granted ticket will never expire (until booth sees the other booth instances), even though the instance does not have majority.
This is not the intention of booth, which should guarantee that a ticket is granted only once (within some expiry time).

The reason for this behavior comes from the struct member in_election.
During an election the CIB ticket is not updated, but elections_end immediately starts a new election when nobody has won. Therefore, in ticket_cron, in_election is always 1.

It might be better to let the ticket expire normally (don't set the expire time to zero) and revoke it locally.
If the booth instance holding the granted ticket is not able to win an election before the ticket's expiry time, then the ticket shall expire. Some other instance that does have majority will take over, and the ticket exists only once.
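
The proposed rule can be sketched as a small decision function. This is only an illustration of the suggestion above, not booth's ticket_cron code; the function name and the plain integer timestamps are assumptions.

```shell
#!/bin/sh
# Hypothetical decision helper for a site that currently holds the ticket.
#   $1 now          current time (seconds)
#   $2 expires      ticket expiry time (seconds)
#   $3 in_election  1 while an election is running, else 0
# Prints the action to take.
ticket_action() {
    now=$1
    expires=$2
    in_election=$3
    if [ "$now" -ge "$expires" ]; then
        # Proposed behavior: expiry wins even while an election is running.
        echo "revoke-locally"
    elif [ "$in_election" -eq 1 ]; then
        echo "keep-until-expiry"
    else
        echo "keep"
    fi
}

ticket_action 100 300 1   # election running, ticket still valid
ticket_action 301 300 1   # election still running, but the ticket has expired
```

The point of the ordering is that the expiry check comes first, so a never-ending election cannot keep the ticket alive.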

Rainer

Authfile seems to be ignored on Ubuntu 20.04.4 LTS

I created the authkey with booth-keygen and simply have the following line in my booth.conf:
authfile=/etc/booth/authkey

I tested this on a 5-node cluster (a small VM test setup). On each node I created a unique authfile, so I would assume they would no longer be able to connect to each other.

But after restarting all the booth services they were all still happily communicating, and tickets could be granted and revoked on remote nodes.

Tested with Ubuntu 20.04.4 LTS (GNU/Linux 5.4.0-121-generic aarch64)

booth: Got a site-ID collision

While restarting the booth arbitrator, I got the following error:

Mar 28 08:28:56 Vlab20Node1 booth: [29795]: ERROR:
Got a site-ID collision. Please file a bug on
https://github.com/ClusterLabs/booth/issues/new, attaching the configuration file.

$ boothd --version
boothd 0.2.0 (build v0.2.0-6-g9eae45f)

$ uname -a
Linux Vlab20Node1 3.0.101-94-default #1 SMP Thu Jan 26 12:20:59 UTC 2017 (c499ea8) x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/SuSE-release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 4

The booth configuration file:

transport="UDP"

port="9929"

arbitrator="10.150.20.132"

site="10.150.20.144"
site="10.150.20.144"

ticket="geo-ticket"
expire=300
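
One thing that stands out in the configuration above is that the same site address appears twice. Whether that is what triggers the collision is not confirmed here, but a quick check for repeated addresses can be sketched as follows. dup_addrs is a hypothetical helper; it assumes one key=value pair per line, as in the file above.

```shell
#!/bin/sh
# Hypothetical helper: print every address that is used by more than one
# site=/arbitrator= line in the given booth configuration file.
dup_addrs() {
    grep -E '^[[:space:]]*(site|arbitrator)[[:space:]]*=' "$1" |
        sed -e 's/^[^=]*=[[:space:]]*//' \
            -e 's/[[:space:]]*$//' \
            -e 's/^"//' -e 's/"$//' |
        sort | uniq -d
}
```

Called with the path of the configuration file as its only argument, it prints nothing when all addresses are distinct.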

Check "-c" argument before use

$ booth list -c .
Feb 18 15:54:12 cacao booth: [3208]: WARN: Odd number of nodes is strongly recommended!
heartbeat: inline-fn.h:83: init_header_bare: Assertion `local && local->site_id' failed.
Aborted

[TODO] review digest-related code, offer stronger hashing algos with a prospect of eventually deprecating SHA1

This is something that should be resolved in the future, even more so when
the following is considered:

SHA1 being discouraged at this point
fragility arising from two different hash implementation providers
without strict signalling of which one is in use (and hence which
hash algo id to actual algo mapping is applicable)
cross-node agreement on the algo

Debug output in foreground

Hello All,
Is there any reason why we want to keep Booth in the foreground when it runs in 'debug output' mode (-D)?
Currently the "-D" parameter does two things: it enables debug output and moves Booth to the foreground. To run it in debug output mode under resource agents, we need it to run in the background.
Would you mind me adding another parameter for debug output plus background operation? Or changing the behavior of the -D flag? Do you have any preference for how it should be handled?
Regards,
Chris

Double granting of ticket after reconnection

We ran into a problem:

In chaka.txt:
Apr 19 13:12:24 Network failure
Apr 19 13:13:04 New election started on site2-db1 while disconnected
Apr 19 13:13:25 site2-db1 kernel: drbd dforce: role( Primary -> Secondary )

In journalctl-with-split-brain.txt:
Apr 19 13:13:38 Ticket granted to site1-db1
Apr 19 13:13:38 site1-db1 kernel: drbd dforce: role( Secondary -> Primary )

In chaka.txt:
Apr 19 13:19:55 site2-db1 boothd-site[1487]: [info] drbdticket (Lead/20/59999): granted successfully here
Apr 19 13:19:55 site2-db1 kernel: drbd dforce: role( Secondary -> Primary )

Both logs then show split-brain problems, because both sites believe they are primary.

These were run in CentOS 7 VMs. The network was disconnected by unplugging the network cable on site2-db1, then plugging it back in later.

We're wondering why site2-db1 was able to re-acquire the ticket after the connection was restored.

We've had difficulty reproducing the problem. These are the logs and conf files from the reproduction. Please let us know if you need anything else.

chaka.txt
journalctl-with-split-brain.txt

booth.conf.txt
five-server-poc-setup.txt

Consider using getifaddrs for _find_myself

The current _find_myself function uses netlink, which is not very portable and, as #139 shows, can "change". The idea is to migrate _find_myself to getifaddrs, which seems to be more stable and also more portable (not strictly needed because booth is Linux-only, but still nice to have).

Booth uses 32-bit time_t

A 32-bit time_t means booth stops working in 2038. This must be solved somehow; sadly, it is pretty hard because of the binary nature of the protocol and backwards compatibility.
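
For reference, the 2038 limit follows directly from the largest value a signed 32-bit time_t can hold. A minimal sketch of the arithmetic (the function name is made up for illustration):

```shell
#!/bin/sh
# Largest value of a signed 32-bit time_t: 2^31 - 1 seconds after the epoch.
MAX_TIME32=2147483647

# Split the limit into whole days since 1970-01-01 plus a time of day (UTC).
time32_limit() {
    days=$((MAX_TIME32 / 86400))
    rest=$((MAX_TIME32 % 86400))
    printf '%d days + %02d:%02d:%02d\n' \
        "$days" $((rest / 3600)) $((rest % 3600 / 60)) $((rest % 60))
}

time32_limit
```

This prints "24855 days + 03:14:07". Day 24855 after 1970-01-01 is 2038-01-19, so the last representable second is 03:14:07 UTC on that day.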

Detailed info - #115 (comment).

One of the solutions might be to add better cryptography (not just signing of messages but also encryption).

Enhance pacemaker.c

pacemaker.c is full of popen/system calls. These are not very safe (argument escaping, the need to run a shell, ...) and it would be better to use exec with proper redirection (so that stderr is kept separate and can be logged). Another possibility might be to use a library instead of calling crm_ticket (if such a library exists).
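
The argument-escaping concern can be demonstrated without touching crm_ticket at all. The sketch below contrasts building a command string that a shell then parses (what popen and system do) with handing the value to the child as a separate argument (what an exec-style call does). The ticket name and both function names are hypothetical.

```shell
#!/bin/sh
# A hostile "ticket name" containing shell syntax (hypothetical input).
ticket='t1; echo INJECTED'

# Like popen()/system(): the value is pasted into a string that a shell
# parses, so the part after ';' runs as a second command.
via_shell_string() {
    sh -c "echo ticket=$1"
}

# Like exec with an argument vector: the value reaches the child as one
# argument and is never parsed as shell syntax.
via_argv() {
    sh -c 'printf "ticket=%s\n" "$1"' sh "$1"
}

via_shell_string "$ticket"
via_argv "$ticket"
```

The first call prints two lines, the second of which is the injected command's output; the second call prints the name verbatim on a single line.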

Booth package-wise modularizing (discussion)

Hello @dmuhamedagic et al.,

it occurs to me that it's an overhead to have the booth package serve
all possible roles incl. arbitrator.

I envision that the arbitrator specifically could be split into a package
of its own (booth-arbitrator containing the initscript/systemd units
and its config file).

However, as it uses the same binary as a proper booth package, more
packaging surgery would be needed. So one of the possible lines of
splitting could be that the OCF agent stuff is in a dedicated
booth-agent package.

Plain booth would remain the core carrying the binary,
booth-keygen, doc and license files etc., and would be required
by both booth-arbitrator and booth-agent.

It's a half-baked idea, only to see if it's viable.
If positive, also a safe mechanism for upgrade/downgrade path would
have to be ensured (via Obsoletes, etc.).

Suggestions/objections?

Better logging of stderr/popen problems

Some problems are visible only when booth is running in console/standalone mode, because they don't make it into syslog. Two examples of problems I've hit:

Pacemaker was reporting a deprecated way of calling crm_ticket (using a dash argument without a space, the reason for PR #130). This could be solved by enhancing/refactoring the popen usage (issue #136).
glib was reporting problems when using a glib hash table (PR #125).

I'm not entirely sure what the best solution for this problem is, but redirecting stderr into a pipe which is then logged by booth might be one.
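
The pipe-based idea can be prototyped in shell before touching the C code. In this sketch the child's stderr is sent down a pipe and each line is handed to a logging step, while stdout passes through untouched. run_logged and the BOOTH_LOG file are stand-ins made up for illustration; in booth the logging step would be its own log call.

```shell
#!/bin/sh
# Stand-in log destination (assumption); booth itself would log via syslog.
BOOTH_LOG=${BOOTH_LOG:-/tmp/booth-stderr-demo.log}

# Run a command, forwarding its stdout unchanged and logging each stderr line.
run_logged() {
    {
        # 2>&1 points stderr at the pipe, then 1>&3 restores stdout from fd 3.
        "$@" 2>&1 1>&3 | while IFS= read -r line; do
            printf 'ERROR: %s\n' "$line" >>"$BOOTH_LOG"
        done
    } 3>&1
}
```

Keeping stdout and stderr on separate paths matters here because the caller still needs to parse the command's regular output.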

Booth capable of operating more than one set of clusters?

Hi,
I've got the following report: https://bugzilla.redhat.com/show_bug.cgi?id=1986308

The main idea (if I understand it correctly) is for a booth arbitrator to arbitrate multiple sets of clusters. IMHO it doesn't make much sense (or actually it makes no sense, because booth itself creates cluster membership), but I would like to hear more opinions (or maybe it's already implemented?).

I think the current ability to run multiple booth daemons is a more than viable alternative ("workaround").

Booth for RHEL/CentOS7

Hello,

I've installed Pacemaker 1.1.12 and Corosync 2.3.4 from the RHEL 7 repos and I'm now looking to use Booth on RHEL 7 or CentOS 7, but I can't find any existing RPM. Can you confirm that none exists?

I then tried to compile it but ran into difficulties.

yum install autoconf OK
yum install automake OK
yum install gcc OK
yum install glib2-devel OK
yum install zlib-devel OK
yum install pacemaker-libs-devel OK

./autogen.sh OK
./configure OK

make KO

[root@localhost booth-0.2.0]# make
GNUmakefile:41: warning: overriding recipe for target `srpm'
Makefile:1486: warning: ignoring old recipe for target `srpm'
GNUmakefile:46: warning: overriding recipe for target `rpm'
Makefile:1491: warning: ignoring old recipe for target `rpm'
Making all in src
make[1]: Entering directory `/tmp/booth-0.2.0/src'
make all-am
make[2]: Entering directory `/tmp/booth-0.2.0/src'
gcc -DHAVE_CONFIG_H -I. -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -D_GNU_SOURCE -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -fPIC -Werror -funsigned-char -Wno-pointer-sign -g -O2 -O3 -ggdb3 -Wall -Wshadow -Wmissing-prototypes -Wmissing-declarations -Wdeclaration-after-statement -Wpointer-arith -Wwrite-strings -Wbad-function-cast -Wmissing-format-attribute -Wformat=2 -Wno-long-long -Wno-strict-aliasing -MT boothd-config.o -MD -MP -MF .deps/boothd-config.Tpo -c -o boothd-config.o `test -f 'config.c' || echo './'`config.c
In file included from ticket.h:29:0,
from config.c:33:
log.h:24:35: fatal error: heartbeat/glue_config.h: No such file or directory
#include <heartbeat/glue_config.h>
^
compilation terminated.
make[2]: *** [boothd-config.o] Error 1
make[2]: Leaving directory `/tmp/booth-0.2.0/src'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/tmp/booth-0.2.0/src'
make: *** [all-recursive] Error 1

Now I am trying to compile "Reusable Cluster Components ("glue")" from http://hg.linux-ha.org/ to get "heartbeat/glue_config.h", but I am running into other difficulties.

Can you confirm that "Reusable Cluster Components ("glue")" is mandatory?

Is there a list of required dependencies for booth? Does it even work on RHEL/CentOS 7?

Thanks for your help.

Support of alerts in booth site and boothd arbitrator

I am requesting a feature similar to Pacemaker alerts for booth nodes. I understand that some of the items below may not be possible to implement due to technical constraints. It would be interesting to see how most of the cases can be handled.
Please check the attached image.

The node5 booth-arbitrator should be able to emit an event when any of the booth site nodes joins or leaves.
A geo site's booth should be able to emit an event when one of its booth peers joins or leaves. For example, geo site1 emits an event when the node5 booth-arbitrator joins/leaves or when the site2 booth joins/leaves. The booth IP can be passed in the event.
On ticket movements (revoke/grant), every booth node (site1, site2 and node5) should emit events.

At a high level, these are node/resource events with respect to booth.

As of today, booth has no provision for a node to emit an event when a peer leaves or joins. This makes it difficult to know whether the geo site nodes can see the booth-arbitrator or not. The same is true the other way around, when the booth-arbitrator cannot see the geo booth sites.
I am not sure how others handle this in today's deployments, but I see a need to monitor every other booth node, so that appropriate alarms can be raised on the basis of events and action can be taken accordingly.

Remove getclock

More (detailed) info in #118 (comment)

Resource agent handling of a lost lockfile is wrong

When booth is running and the lock file is accidentally removed, Pacemaker cannot repair the state any more.

In this case "booth status" returns empty output and return code 7.
The RA monitor operation then reports OCF_NOT_RUNNING to Pacemaker, which tries to restart the resource and calls stop.
The stop operation does nothing, because it thinks booth is already stopped.
The start operation then fails, since the booth daemon cannot allocate its port.
So on this node the state can never be cleared except by manually killing the booth process.

One suggestion is to handle this error in two parts.

First, "booth status" returns the output "booth_lockpid=..." with the PID of the running process, with the rest of the variables absent, and return code 1, which means the same as OCF_ERR_GENERIC. With this implementation, the output is a very clear indication of "I found a process running, but I do not know anything else".
The RA monitor operation then returns OCF_ERR_GENERIC, which is the correct interpretation: booth is no longer running cleanly, since it is running but its lock file is lost.
Pacemaker then restarts the resource.

Second, change the stop operation of the RA a bit.
Just returning OCF_ERR_GENERIC is not OK, because then nobody stops a possibly running booth.
With the implementation described above, "booth status" returns booth_lockpid with the PID of the already running process. In this case it is possible to just add
$BOOTH_ERROR_GENERIC) ;;
in the first case statement.
If booth_lockpid is empty for any other reason, stop returns OCF_ERR_GENERIC.
If booth_lockpid is not empty, stop correctly cleans up the running booth process.
The RA then reports OCF_SUCCESS and the booth restart can be handled cleanly.
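
The proposed stop handling can be sketched as a decision table. This is not the actual booth RA code: stop_action is a hypothetical helper, and it assumes "booth status" behaves as proposed above (return code 1 plus booth_lockpid when the lock file is lost). The OCF return code values are the standard ones.

```shell
#!/bin/sh
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Decide what stop should do.
#   $1  return code of "booth status"
#   $2  value of booth_lockpid (may be empty)
# Prints "kill <pid>" when a process must be stopped; the return value is
# the OCF code that stop would report afterwards.
stop_action() {
    status_rc=$1
    lockpid=$2
    case $status_rc in
        "$OCF_NOT_RUNNING")
            return "$OCF_SUCCESS" ;;          # nothing is running
        "$OCF_SUCCESS"|"$OCF_ERR_GENERIC")
            if [ -n "$lockpid" ]; then
                echo "kill $lockpid"          # a booth process is known: stop it
                return "$OCF_SUCCESS"
            fi
            return "$OCF_ERR_GENERIC" ;;      # state unknown, no PID to act on
        *)
            return "$OCF_ERR_GENERIC" ;;
    esac
}
```

For example, a status return code of 1 together with a non-empty booth_lockpid yields a kill action followed by OCF_SUCCESS, which lets Pacemaker proceed with a clean start.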

Why was booth_resource_monitord deleted?

Hi, booth developers.

Some time ago, booth_resource_monitord, which I had requested and which was merged, was deleted.
Why was it deleted?

I found the following code; is this its replacement?
https://github.com/ClusterLabs/booth/blob/master/script/service-still-runnable

arbitrator-user and arbitrator-group are ignored

The config file offers the possibility to change the user/group of boothd when running as arbitrator. This is never used; boothd always uses the site user/group default (hacluster/haclient) or the site-user/site-group config options.

This bug is long-standing behavior, and changing to nobody/nobody would probably be unsafe, so I would recommend to:

change the default for arbitrator to the same as for site = hacluster/haclient
change the documentation (again, default is hacluster/haclient)
fix the arbitrator-user and arbitrator-group behavior

Running booth arbitrator in kubernetes environment

Right now it is not possible to run a booth arbitrator in a docker/podman environment because of how network configuration works in these environments (NAT). The idea is to allow such functionality.

The main problem lies in the following facts:

The config file contains the external IP, while the internal IP inside docker/podman differs.
Changing the external IP to the internal IP on both the sites and the arbitrator doesn't work, because the sites cannot reach the internal IP of the arbitrator.
Changing the external IP to the internal one only on the arbitrator mostly works, because the sites can reach the arbitrator and the arbitrator can find itself; sadly, it generates a different site_id, and messages sent from the arbitrator are ignored by the sites.

As a possible solution we (probably) need to enhance the config file so that it contains two addresses for the arbitrator (one internal and one external), with the arbitrator using the external one as its site_id. I think it might be handy to allow specifying the internal IP as ANY, so the user doesn't need to find out the internal IP (not super easy in a docker environment).

So the proposed solution is to have the config on the sites like:

authfile = /etc/booth/booth.key
site = site_ip
site = site_ip
arbitrator = arbitrator_external_ip
ticket = "apacheticket"

and arbitrator as:

authfile = /etc/booth/booth.key
site = site_ip
site = site_ip
arbitrator = arbitrator_external_ip|ANY
ticket = "apacheticket"

or some flag like force_arbitrator_mode_bind_in_any, or maybe a different (better) solution.

Example of how to test in docker (copy & paste from the original report):

Arbitrator running inside docker container tries to send UDP packet to
booth site - but this UDP packet gets dropped after getting out of docker
bridge (on host machine), whereas non-arbitrator UDP packets reach
destination booth sites perfectly. Issue is observed only with arbitrator
UDP packets.

Steps to reproduce:

   - Extract the zip file and cd into dockerfile directory
   - docker build -t arbitrator .
   - docker run -d --privileged arbitrator
   - docker ps (check CONTAINER ID for arbitrator container)
   - docker exec -it <container-id> bash
   - Once in docker container fire below commands
   - /bin/supervisord
   - pcs cluster auth <booth-ip>
   - pcs booth pull <booth-ip>
   - replace arbitrator ip (from /etc/booth/booth.conf) with eth0 ip (to
   check eth0 ip fire "ip address show" command)
   - supervisorctl start booth

Note: We are running centos7 in docker container, so supervisord is used
instead of systemd/systemctl (as systemd doesn't work inside container/k8s
pod)

dockerfile.zip