clusterlabs / booth
This project forked from jjzhang/booth
The Booth Cluster Ticket Manager
License: GNU General Public License v2.0
Many customers would like to be able to set up 2-site Booth clusters, so we would like to allow them to create a cluster without an arbitrator.
In such a setup, a problem arises if one site breaks down or the communication between the sites is disrupted, because quorum is lost. It would therefore not be possible for an administrator to grant a ticket to any site.
The idea would be to create a new ticket attribute, 'manual mode'.
In manual mode, if both sites are online, Booth could still ensure a ticket is granted to only one site at a time.
In case one site is down/lost or a split-brain occurs, an administrator can decide to bypass the quorum requirement and manually grant a ticket to the site considered healthy, which basically enforces a manual failover. It also means that the users are on their own here: they have to make sure by themselves that this ticket has not already been granted to another site at the same time.
The idea would be to disable election for 'manual tickets' and, similarly, to disable the expiration time for them. If a manual ticket was granted to a site, other sites would not try to deal with it. Also, after a 'booth grant' command, the site would only notify other nodes about acquiring the ticket.
Another way to solve this issue would be to introduce some kind of 'fake'/'artificial' arbitrator, which would automatically add its vote when granting a ticket in manual mode.
Could you let me know what you think about this? Any comments on the idea and the implementation are warmly welcomed!
While packaging for Gentoo I found this:
* QA Notice: shell script appears to use non-POSIX feature(s):
* possible bashism in /etc/init.d/booth-arbitrator line 60 (echo -n):
* echo -n "BOOTH daemon is "
* possible bashism in /etc/init.d/booth-arbitrator line 82 ($"foo" should be eval_gettext "foo"):
* echo -n $"Starting BOOTH arbitrator daemon: "
* possible bashism in /etc/init.d/booth-arbitrator line 82 (echo -n):
* echo -n $"Starting BOOTH arbitrator daemon: "
* possible bashism in /etc/init.d/booth-arbitrator line 105 ($"foo" should be eval_gettext "foo"):
* echo -n $"Stopping BOOTH arbitrator daemon: "
* possible bashism in /etc/init.d/booth-arbitrator line 105 (echo -n):
* echo -n $"Stopping BOOTH arbitrator daemon: "
* possible bashism in /etc/init.d/booth-arbitrator line 166 ($"foo" should be eval_gettext "foo"):
* echo $"Usage: $0 {start|stop|restart|try-restart|condrestart|reload|force-reload|status}"
*
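A minimal sketch of how the flagged lines could be made POSIX-clean, assuming the translation lookup done by $"..." can simply be dropped (the QA notice suggests eval_gettext if it has to be kept). printf replaces echo -n. The helper names say_n and usage are hypothetical and not taken from the booth init script.

```shell
#!/bin/sh
# Hypothetical helper: print a message without a trailing newline.
# printf is specified by POSIX, unlike "echo -n".
say_n() {
    printf '%s' "$1"
}

# Hypothetical replacement for the usage line flagged at line 166:
# a plain double-quoted string instead of the bash-only $"..." form.
usage() {
    printf '%s\n' "Usage: $0 {start|stop|restart|try-restart|condrestart|reload|force-reload|status}"
}

# Example: progress message first, result on the same line afterwards.
say_n "Starting BOOTH arbitrator daemon: "
printf '%s\n' "OK"
```

The same substitution applies to the "Stopping" and "BOOTH daemon is" messages.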
This is mostly to be able to mark which tickets were created by booth and which specific booth instance created them. The idea is mostly to allow pcs to remove booth tickets that were removed from the config. It is questionable whether booth itself should clean up old tickets (ones that no longer exist in the config file), but if so, this can be a different issue/PR/commit.
booth-cfg-name should be the same as the name used in the pid file, i.e. booth_conf->name
More context can be found in https://issues.redhat.com/browse/RHEL-7602
Scenario:
The booth restart in this case can happen when booth is running as a cluster resource and there is a node migration inside the cluster.
In the described case, the granted ticket will never expire (until booth sees the other booth instances), even if it does not have a majority.
This is not the intention of booth, which should guarantee that a ticket is granted only once (with some expiry times).
The reason for this behavior is the struct member in_election.
During an election the CIB ticket is not updated, but elections_end immediately starts a new election after nobody has won it. Therefore, in ticket_cron, in_election is always 1.
It might be better to let the ticket expire normally (don't set the expire time to zero) and revoke it locally.
If the booth instance with the granted ticket is not able to win the election before the ticket expiry date, then the ticket shall expire. Some other instance with a majority will take over, and the ticket exists only once.
Rainer
I have created the authkey with booth-keygen and simply have the following line in my booth.conf
authfile=/etc/booth/authkey
I tested this on a 5-node cluster (a small VM test setup). On each node I created a unique authfile, so I would assume they would no longer be able to connect to each other.
But after restarting all the booth services they were all still happily communicating; tickets could be granted and revoked on remote nodes.
Tested with Ubuntu 20.04.4 LTS (GNU/Linux 5.4.0-121-generic aarch64)
While restarting the booth arbitrator, I got the following error:
Mar 28 08:28:56 Vlab20Node1 booth: [29795]: ERROR:
Got a site-ID collision. Please file a bug on
https://github.com/ClusterLabs/booth/issues/new, attaching the configuration file.
$ boothd --version
boothd 0.2.0 (build v0.2.0-6-g9eae45f)
$ uname -a
Linux Vlab20Node1 3.0.101-94-default #1 SMP Thu Jan 26 12:20:59 UTC 2017 (c499ea8) x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/SuSE-release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 4
transport="UDP"
port="9929"
arbitrator="10.150.20.132"
site="10.150.20.144"
site="10.150.20.144"
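The configuration above lists the same address on two site lines. Whether that duplicate is what triggers the site-ID collision is not stated in the report, so the following is only a sketch for spotting such duplicates, not a diagnosis. The function name find_duplicate_members is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every address that appears on more than one
# site=/arbitrator= line of the booth configuration file given as $1.
find_duplicate_members() {
    awk -F= '
        /^[ \t]*(site|arbitrator)[ \t]*=/ {
            addr = $2
            gsub(/[ \t"]/, "", addr)          # strip blanks and quotes
            if (seen[addr]++ == 1) print addr # report on 2nd occurrence only
        }
    ' "$1"
}

# Usage (path is an assumption, adjust to the local setup):
#   find_duplicate_members /etc/booth/booth.conf
```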
$ booth list -c .
Feb 18 15:54:12 cacao booth: [3208]: WARN: Odd number of nodes is strongly recommended!
heartbeat: inline-fn.h:83: init_header_bare: Assertion `local && local->site_id' failed.
Aborted
This is something that should be resolved in the future, even more so when
the following is considered:
Hello All,
Are there any reasons why we want to keep Booth running in the foreground when it is in 'debug output' mode (-D)?
Currently the "-D" parameter does two things: it enables 'debug output' and moves Booth to the foreground. To make it run with resource agents in debug output mode, we need it to run in the background.
Would you mind me adding another parameter for debug output + background operation? Or changing the behavior of the -D flag? Do you have any preference for how it should be handled?
Regards,
Chris
Ran into challenge:
In chaka.txt:
Apr 19 13:12:24 Network failure
Apr 19 13:13:04 New election started on site2-db1 while disconnected
Apr 19 13:13:25 site2-db1 kernel: drbd dforce: role( Primary -> Secondary )
In journalctl-with-split-brain.txt:
Apr 19 13:13:38 Ticket granted to site1-db1
Apr 19 13:13:38 site1-db1 kernel: drbd dforce: role( Secondary -> Primary )
In chaka.txt:
Apr 19 13:19:55 site2-db1 boothd-site[1487]: [info] drbdticket (Lead/20/59999): granted successfully here
Apr 19 13:19:55 site2-db1 kernel: drbd dforce: role( Secondary -> Primary )
Both logs then show split brain problems, because both believe they are primary.
These are run in CentOS 7 VM boxes. The network was disconnected by unplugging the network cable on site2-db1, then plugging it back in later.
We're wondering why site2-db1 was able to re-acquire the ticket after the connection was restored.
We've had difficulty reproducing the problem. These are logs and conf files from the reproduction. Please let us know if you need anything else.
The current _find_myself function uses netlink, which is not very portable and, as #139 shows, can "change". The idea is to migrate the _find_myself function to getifaddrs, which seems to be more stable and also more portable (portability is not really needed because booth is Linux only, but it still might be nice to have).
A 32-bit time_t means booth stops working in 2038. This must be solved somehow; sadly it is pretty hard because of the binary nature of the protocol and backwards compatibility.
Detailed info - #115 (comment).
One of the solutions might be to add better cryptography (not just signing of messages but also encryption).
pacemaker.c is full of popen/system calls. These are not very safe (escaping of arguments, the need to run a shell, ...) and it would be better to use exec with proper redirection (so that stderr is kept separate and can be logged). Another possibility might be to use a library instead of calling crm_ticket (if such a library exists).
Hello @dmuhamedagic et al.,
it occurs to me that it's an overhead to have the booth package serve
all possible roles incl. arbitrator.
I envision that the arbitrator specifically could be split into a package
of its own (booth-arbitrator, containing the initscript/systemd units
and its config file).
However, as it uses the same binary as the booth package proper, more
packaging surgery would be needed. So one of the possible lines of
splitting could be that the OCF agent stuff is in a dedicated
booth-agent package.
Plain booth would remain the core carrying the binary,
booth-keygen, doc and license files etc., and would be required
by both booth-arbitrator and booth-agent.
It's a half-baked idea, only to see if it's viable.
If positive, also a safe mechanism for upgrade/downgrade path would
have to be ensured (via Obsoletes, etc.).
Suggestions/objections?
Some problems are visible only when booth is running in console/standalone mode, because they don't make it into syslog. Two examples of problems I've hit:
I'm not entirely sure what the best solution for this problem is, but redirecting stderr into a pipe which is then logged by booth might be one.
Hi,
I've got the following report: https://bugzilla.redhat.com/show_bug.cgi?id=1986308
The main idea (if I understand it correctly) is to let one booth arbitrator arbitrate multiple sets of clusters. IMHO it doesn't make too much sense (or actually it makes no sense, because booth itself creates cluster membership), but I would like to see more opinions (or maybe it's already implemented?).
I think the current ability to run multiple booth daemons is a more than viable alternative ("workaround").
Hello,
I've installed Pacemaker 1.1.12 and Corosync 2.3.4 from RHEL7 repos and I'm now looking to use Booth on RHEL 7 or CentOS 7 but can't find any existing RPM, can you confirm ?
I have then tried to compile it but encountered difficulties.
yum install autoconf OK
yum install automake OK
yum install gcc OK
yum install glib2-devel OK
yum install zlib-devel OK
yum install pacemaker-libs-devel OK
./autogen.sh OK
./configure OK
make KO
[root@localhost booth-0.2.0]# make
GNUmakefile:41: warning: overriding recipe for target `srpm'
Makefile:1486: warning: ignoring old recipe for target `srpm'
GNUmakefile:46: warning: overriding recipe for target `rpm'
Makefile:1491: warning: ignoring old recipe for target `rpm'
Making all in src
make[1]: Entering directory `/tmp/booth-0.2.0/src'
make all-am
make[2]: Entering directory `/tmp/booth-0.2.0/src'
gcc -DHAVE_CONFIG_H -I. -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -D_GNU_SOURCE -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -fPIC -Werror -funsigned-char -Wno-pointer-sign -g -O2 -O3 -ggdb3 -Wall -Wshadow -Wmissing-prototypes -Wmissing-declarations -Wdeclaration-after-statement -Wpointer-arith -Wwrite-strings -Wbad-function-cast -Wmissing-format-attribute -Wformat=2 -Wno-long-long -Wno-strict-aliasing -MT boothd-config.o -MD -MP -MF .deps/boothd-config.Tpo -c -o boothd-config.o `test -f 'config.c' || echo './'`config.c
In file included from ticket.h:29:0,
                 from config.c:33:
log.h:24:35: fatal error: heartbeat/glue_config.h: No such file or directory
#include <heartbeat/glue_config.h>
^
compilation terminated.
make[2]: *** [boothd-config.o] Error 1
make[2]: Leaving directory `/tmp/booth-0.2.0/src'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/tmp/booth-0.2.0/src'
make: *** [all-recursive] Error 1
Now I am trying to compile "Reusable Cluster Components ("glue")" from http://hg.linux-ha.org/ to get "heartbeat/glue_config.h" but encounter other difficulties.
Can you confirm that "Reusable Cluster Components ("glue")" is mandatory ?
Is there a list of required dependencies for booth ? Is it even working on RHEL/CentOS7 ?
Thanx for your help.
I am requesting a feature for booth nodes similar to Pacemaker alerts. I understand that a few of the items below may not be possible to implement due to technical constraints. It would be interesting to see how most of the cases can be handled.
Please check the attached image.
At a high level, these are node/resource events with respect to booth.
As of today, booth has no provision for a node to emit an event when its peer leaves or joins. This makes it difficult to know whether the geo site nodes can see the booth arbitrator or not. The same is true the other way around, where the booth arbitrator cannot see the geo booth sites.
I am not sure how others handle this in today's deployments, but I see a need to monitor every other booth node, so that appropriate alarms can be raised and action taken on the basis of these events.
More (detailed) info in #118 (comment)
When booth is running and the lock file is accidentally removed, Pacemaker can no longer repair the state.
In this case "booth status" returns empty output and return code 7.
The RA monitor operation then reports OCF_NOT_RUNNING to Pacemaker, which tries to restart the resource and calls stop.
The stop operation does nothing, because it thinks booth is already stopped.
The start operation fails, since the booth daemon cannot allocate the port.
So on this node the status can never be cleared except by manually killing the booth process.
One suggestion is to handle this error in two parts.
First, "booth status" returns the output "booth_lockpid=..." with the PID of the running process, without the rest of the variables, and the return code 1, which means the same as OCF_ERR_GENERIC. With this implementation, the output is a very clear indication of "I found a running process, but I do not know anything else".
The RA monitor operation then returns OCF_ERR_GENERIC, which is the correct interpretation: booth is no longer running cleanly; it is running, but the lock file is lost.
Pacemaker then restarts the resource.
Second, change the stop operation of the RA a bit.
Just returning OCF_ERR_GENERIC is not OK, because nobody stops a possibly running booth.
With the implementation described above, "booth status" returns booth_lockpid with the PID of the already running process. In this case it is possible to just add
$BOOTH_ERROR_GENERIC) ;;
in the first case statement.
If booth_lockpid is empty for any other reason, stop returns OCF_ERR_GENERIC.
If booth_lockpid is not empty, stop correctly cleans up the running booth process.
The RA then reports OCF_SUCCESS and the booth restart can be handled cleanly.
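The proposed stop handling can be sketched as a small decision function. This is an illustration under stated assumptions, not the resource agent's actual code: stop_action and its output strings are hypothetical, and only the three numeric return codes are taken from the OCF resource agent API.

```shell
#!/bin/sh
# Return codes as defined by the OCF resource agent API.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical helper: decide what stop should do.
#   $1 = exit code of "booth status"
#   $2 = booth_lockpid value printed by "booth status" (may be empty)
# Prints "nothing-to-do", "kill <pid>" or "fail".
stop_action() {
    status_rc=$1
    lockpid=$2
    case "$status_rc" in
        "$OCF_NOT_RUNNING")
            echo "nothing-to-do"   # booth is really stopped
            return 0 ;;
        "$OCF_SUCCESS"|"$OCF_ERR_GENERIC")
            ;;                     # running, possibly without lock file
        *)
            echo "fail"
            return 0 ;;
    esac
    if [ -n "$lockpid" ]; then
        echo "kill $lockpid"       # a PID is known, stop can clean up
    else
        echo "fail"                # no PID known, report OCF_ERR_GENERIC
    fi
}
```

With the lock file lost, "booth status" would return 1 together with booth_lockpid, so stop_action 1 "$booth_lockpid" selects the kill branch instead of treating booth as already stopped.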
Hi, booth developers.
booth_resource_monitord, which I had requested and which was merged earlier, has been deleted.
Why was it deleted?
I found the following code; is this its replacement?
https://github.com/ClusterLabs/booth/blob/master/script/service-still-runnable
The config file contains the possibility to change the user/group of boothd when running as arbitrator. This is never used; boothd always uses the default site user/group (hacluster/haclient) or the site-user/site-group config option.
This bug is long-standing behavior and changing to nobody/nobody would probably be unsafe, so I would recommend to:
arbitrator-user and arbitrator-group behavior
Right now it is not possible to run the booth arbitrator in a docker/podman environment because of how network configuration (NAT) works in these environments. The idea is to allow such functionality.
The main problem is hidden in the fact that:
site_id and message sent from arbitrator will be ignored by sites.
As a possible solution we (probably) need to enhance the config file so that it contains two addresses of the arbitrator (one internal and one external), and the arbitrator will use the external one as its site_id. I think it might be handy to allow specifying the internal IP as ANY, so the user doesn't need to find out the internal IP (not super easy in a docker environment).
So proposed solution is to have sites like:
authfile = /etc/booth/booth.key
site = site_ip
site = site_ip
arbitrator = arbitrator_external_ip
ticket = "apacheticket"
and arbitrator as:
authfile = /etc/booth/booth.key
site = site_ip
site = site_ip
arbitrator = arbitrator_external_ip|ANY
ticket = "apacheticket"
or some flag like force_arbitrator_mode_bind_in_any, or maybe a different (better) solution.
Example how to test in docker (copy&paste from original report):
Arbitrator running inside docker container tries to send UDP packet to
booth site - but this UDP packet gets dropped after getting out of docker
bridge (on host machine), whereas non-arbitrator UDP packets reach
destination booth sites perfectly. Issue is observed only with arbitrator
UDP packets.
Steps to reproduce:
- Extract the zip file and cd into dockerfile directory
- docker build -t arbitrator .
- docker run -d --privileged arbitrator
- docker ps (check CONTAINER ID for arbitrator container)
- docker exec -it <container-id> bash
- Once in docker container fire below commands
- /bin/supervisord
- pcs cluster auth <booth-ip>
- pcs booth pull <booth-ip>
- replace arbitrator ip (from /etc/booth/booth.conf) with eth0 ip (to
check eth0 ip fire "ip address show" command)
- supervisorctl start booth
Note: We are running CentOS 7 in the docker container, so supervisord is used
instead of systemd/systemctl (as systemd doesn't work inside a container/k8s
pod)