chenhuajun / pha4pgsql
Pacemaker High Availability for PostgreSQL
License: GNU General Public License v3.0
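I originally deployed two nodes. How can I add a node online to get one master and two standbys? Or do I have to stop the cluster and redeploy pha4pgsql?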
Hello,
I ran into a problem while configuring LVS. The error is as follows:
Waiting for 3 replies from the CRMd... OK
Error: resource 'msPostgresql' is not running on any node
failed to execute "pcs resource enable msPostgresql --wait" rc=1
Following the documentation, I set pcs_template=muti_with_lvs.pcs.template. Are any additional steps required?
In the config_dual.ini.sample configuration file I set vip_cidr_netmask=24 and writer_vip=192.168.0.236. What about the gateway? How should it be set?
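I built a one-master, two-standby HA cluster with corosync + pacemaker + crm. Currently vip master points to the write node and vip slaver points to one of the read nodes. Because read demand is high, I want to make full use of both read nodes. Is there a way to load-balance across the read nodes?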
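I would like to ask you a few questions.
Those are my questions. Thank you.
Resource agent script, expgsql, line 2149
Hello, and first of all thank you for providing the scripts. I have a few questions: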
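For an active-active setup across two geographically separate data centers, each with one master and one standby, my plan is for center A to use pacemaker and corosync for master/standby replication. How should center B synchronize its data? Since it is at a different site, it presumably cannot join the same HA cluster as center A?
Of course, center B is read only while center A is alive. When center A goes down, what do I need to do to switch over to center B properly?
Thank you!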
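Hello. I work with mssql. Starting with mssql2017, mssql can run on Linux,
and the mssql high-availability solution on Linux is AlwaysOn availability groups built on pacemaker and corosync.
Are pacemaker and corosync the standard HA solution on Linux?
Anyone familiar with Microsoft knows that mssql ships its own HA solution, built on low-level infrastructure that Microsoft implemented itself, so users do not need to consider any other HA solution.
Microsoft's built-in HA solutions are also very complete and easy to use, yet here Microsoft chose the third-party pacemaker and corosync rather than its own HA infrastructure.
Does that mean pacemaker and corosync are already mature and powerful enough that Microsoft chose them instead of reinventing the wheel on Linux?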
Because this is not a standard PostgreSQL build, there is no postgres user. After cls_start, is the program started as the postgres user? Where is this user configured?
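This morning I found that the PG standby was down. cls_status shows the following for the standby: Can't obtain distributed lock on promote
Is there a fix for this? The arbitration node is healthy.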
2018-09-04 14:15:54 CST [15171-4] LOG: invalid record length at 0/30000098: wanted 24, got 0
2018-09-04 14:15:54 CST [15169-6] LOG: database system is ready to accept read only connections
2018-09-04 14:15:55 CST [15470-1] postgres@template1 ERROR: function pg_last_xlog_replay_location() does not exist at character 8
2018-09-04 14:15:55 CST [15470-2] postgres@template1 HINT: No function matches the given name and argument types. You might need to add explicit type casts.
2018-09-04 14:15:55 CST [15470-3] postgres@template1 STATEMENT: select pg_last_xlog_replay_location(),pg_last_xlog_receive_location()
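The "function pg_last_xlog_replay_location() does not exist" error above is what a PostgreSQL 10 or later server reports, because in version 10 these functions were renamed to pg_last_wal_replay_lsn() and pg_last_wal_receive_lsn(). A minimal sketch of picking the query by major version follows. The helper name `replay_query` is hypothetical and is not taken from expgsql, and the major version is assumed to be passed as a plain integer.

```shell
#!/bin/sh
# Hypothetical helper (not part of expgsql): choose the replay/receive
# location query for a given PostgreSQL major version (integer, e.g. 9 or 12).
# PostgreSQL 10 renamed the *_xlog_*_location functions to *_wal_*_lsn.
replay_query() {
    major=$1
    if [ "$major" -ge 10 ]; then
        echo "select pg_last_wal_replay_lsn(),pg_last_wal_receive_lsn()"
    else
        echo "select pg_last_xlog_replay_location(),pg_last_xlog_receive_location()"
    fi
}

replay_query 9
replay_query 12
```

Whether the installed resource agent already makes this distinction depends on its version, which is not stated in the report.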
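Running a DELETE that removed 130,000 rows from a table of 5 million rows brought the master down. The logs show that the RA monitor timed out, so the resource was considered failed.
After a Timed Out, Pacemaker restarts postgres; the migration threshold is 3, so it can be restarted 3 times.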
/var/log/messages
Nov 18 13:22:05 node1 expgsql(pgsql)[7774]: INFO: Stopping PostgreSQL on demote.
Nov 18 13:22:05 node1 expgsql(pgsql)[7774]: INFO: server shutting down
Nov 18 13:22:11 node1 expgsql(pgsql)[7774]: INFO: PostgreSQL is down
Nov 18 13:22:11 node1 expgsql(pgsql)[7774]: INFO: Changing pgsql-status on node1 : PRI->STOP.
Nov 18 13:22:11 node1 expgsql(pgsql)[8609]: INFO: PostgreSQL is already stopped.
Nov 18 13:22:12 node1 expgsql(pgsql)[8714]: INFO: Set all nodes into async mode.
Nov 18 13:22:12 node1 expgsql(pgsql)[8714]: INFO: server starting
Nov 18 13:22:12 node1 expgsql(pgsql)[8714]: INFO: PostgreSQL start command sent.
Nov 18 13:22:12 node1 expgsql(pgsql)[8714]: INFO: PostgreSQL is down
Nov 18 13:22:13 node1 expgsql(pgsql)[8714]: INFO: PostgreSQL is started.
corosync.log shows that system load was high before the timeout occurred.
/var/log/cluster/corosync.log
Nov 18 13:20:55 [2353] node1 crmd: info: throttle_handle_load: Moderate CPU load detected: 12.060000
Nov 18 13:20:55 [2353] node1 crmd: info: throttle_send_command: New throttle mode: 0010 (was 0001)
Nov 18 13:21:25 [2353] node1 crmd: notice: throttle_handle_load: High CPU load detected: 16.379999
Nov 18 13:21:25 [2353] node1 crmd: info: throttle_send_command: New throttle mode: 0100 (was 0010)
Nov 18 13:21:44 [2350] node1 lrmd: warning: child_timeout_callback: pgsql_monitor_3000 process (PID 4822) timed out
Nov 18 13:21:44 [2350] node1 lrmd: warning: operation_finished: pgsql_monitor_3000:4822 - timed out after 60000ms
Nov 18 13:21:44 [2353] node1 crmd: error: process_lrm_event: Operation pgsql_monitor_3000: Timed Out (node=node1, call=837, timeout=60000ms)
Nov 18 13:21:44 [2348] node1 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/462)
Pacemaker logged high system load warnings, possibly caused by high IO wait.
http://clusterlabs.org/pipermail/users/2015-May/000518.html
Others have hit a similar problem; the fix was to increase the monitor timeout.
https://bugs.launchpad.net/fuel/+bug/1464131
https://review.openstack.org/#/c/191715/1/deployment/puppet/pacemaker_wrappers/manifests/rabbitmq.pp
The setting can be changed online as follows:
pcs resource update pgsql op monitor interval=4s timeout=180s on-fail=restart
pcs resource update pgsql op monitor role=Master timeout=180s on-fail=restart interval=3s
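Before raising the timeout, it can help to confirm which operations actually timed out and what limit was in effect. A minimal sketch follows, assuming the log format matches the lrmd lines quoted above. The function name `list_timeouts` is hypothetical, and the sample lines are copied from this report.

```shell
#!/bin/sh
# Sketch: print "<operation> <timeout in ms>" for every operation that lrmd
# reported as timed out. Reads log lines on stdin.
list_timeouts() {
    # Matches lines like:
    #   ... operation_finished: pgsql_monitor_3000:4822 - timed out after 60000ms
    sed -n 's/.*operation_finished: \([^:]*\):[0-9]* - timed out after \([0-9]*\)ms.*/\1 \2/p'
}

sample='Nov 18 13:21:44 [2350] node1 lrmd: warning: child_timeout_callback: pgsql_monitor_3000 process (PID 4822) timed out
Nov 18 13:21:44 [2350] node1 lrmd: warning: operation_finished: pgsql_monitor_3000:4822 - timed out after 60000ms'

printf '%s\n' "$sample" | list_timeouts
```

On a real node the same function could read the log directly, for example `list_timeouts < /var/log/cluster/corosync.log`.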
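Hello. While studying your code, I noticed that in the check_resource_started and stoped functions in common.sh, the loop `for resource in $1` only ever reads the first argument and cannot see the whole argument list. As a result, cls_start and cls_stop verify only the first resource and never check the rest.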
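The behaviour described in this report can be reproduced with a small sketch. The two functions below are hypothetical stand-ins that only echo resource names; they are not the code from common.sh. Whether the problem shows up in practice depends on how the callers pass the list, which is not shown in this report.

```shell
#!/bin/sh
# Hypothetical stand-ins for the loop in common.sh; they only echo names.
check_first_only() {
    for resource in $1; do      # walks the words of the FIRST argument only
        echo "checking $resource"
    done
}

check_all() {
    for resource in "$@"; do    # walks every argument
        echo "checking $resource"
    done
}

RESOURCE_LIST="msPostgresql vip-master vip-slave"

# Unquoted: the list arrives as three arguments, so only msPostgresql is checked.
check_first_only $RESOURCE_LIST
# Quoted: the list arrives as one argument and is word-split inside the loop.
check_first_only "$RESOURCE_LIST"
# "$@" checks all three when the list is passed unquoted.
check_all $RESOURCE_LIST
```

Iterating over `"$@"` makes the function independent of the caller's quoting only when each resource is passed as its own argument; a caller that passes one quoted string would still need the word-splitting form.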
Thanks for the scripts. I am setting up a three-node PG HA cluster and hit an error at this step of the tutorial.
The error is as follows:
[root@mdw pha4pgsql]# cls_start
resource msPostgresql is NOT running
resource msPostgresql is NOT running
resource msPostgresql is NOT running
Cleaned up pgsql:0 on sdw2
Cleaned up pgsql:0 on sdw1
Cleaned up pgsql:0 on mdw
Cleaned up pgsql:1 on sdw2
Cleaned up pgsql:1 on sdw1
Cleaned up pgsql:1 on mdw
Cleaned up pgsql:2 on sdw2
Cleaned up pgsql:2 on sdw1
Cleaned up pgsql:2 on mdw
Error: resource 'msPostgresql' is not running on any node
failed to execute "pcs resource enable msPostgresql --wait" rc=1
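I tried to find the cause myself, but I think I have misunderstood some of the parameters, so I am opening an issue... My main questions are:
How are writer-vip and reader-vip mapped to a concrete IP when a client connects, and is there anything to watch out for when setting the virtual IP parameters?
My setup is three virtual machines (company VMs, probably carved out of one physical host) with three IPs between them (192.168.x.94/95/96); ifconfig shows the interface as ens160 on all of them. Can this be used for a three-node PG HA cluster?
I think the main mistake is that I blindly set writer_vip=192.168.x.100 and reader_vip=192.168.x.101, but I do not really understand how these should be set. From other issues here, it looks like some gateway configuration may be needed? On the other hand, the cluster failed to start because the resources never started at all, so I need the pcs logs after all... I will look for more material and come back with an update.
Below is my own understanding plus some additional details.
In the ClusterLabs PG replicated cluster guide, the diagram puts virtual IP 1 on eth0 and virtual IP 2 on eth2. In your tutorial, node1/2/3 and writer/reader_vip are all in 192.168.0.231-237, so I wonder whether accessing writer_vip/reader_vip in your tutorial simply points to one of node1/2/3.
I then ran the following command to check the status.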
# cls_status
Cluster name: pgcluster
Stack: corosync
Current DC: sdw1 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Wed Nov 4 11:55:19 2020
Last change: Wed Nov 4 11:44:41 2020 by root via cibadmin on mdw
3 nodes configured
5 resources configured
Online: [ mdw sdw1 sdw2 ]
Full list of resources:
vip-master (ocf::heartbeat:IPaddr2): Stopped
vip-slave (ocf::heartbeat:IPaddr2): Stopped
Master/Slave Set: msPostgresql [pgsql]
Stopped: [ mdw sdw1 sdw2 ]
Failed Resource Actions:
* pgsql_start_0 on mdw 'unknown error' (1): call=45, status=Timed Out, exitreason='',
last-rc-change='Wed Nov 4 11:43:49 2020', queued=0ms, exec=60001ms
* pgsql_start_0 on sdw2 'unknown error' (1): call=45, status=Timed Out, exitreason='',
last-rc-change='Wed Nov 4 11:43:49 2020', queued=0ms, exec=60001ms
* pgsql_start_0 on sdw1 'unknown error' (1): call=45, status=Timed Out, exitreason='',
last-rc-change='Wed Nov 4 11:43:49 2020', queued=0ms, exec=60002ms
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
The config.ini I used is as follows:
pcs_template=muti.pcs.template
OCF_ROOT=/usr/lib/ocf
RESOURCE_LIST="msPostgresql vip-master vip-slave"
pha4pgsql_dir=/opt/pha4pgsql
writer_vip=192.168.x.100
reader_vip=192.168.x.101
node1=mdw
node2=sdw1
node3=sdw2
othernodes=""
vip_nic=ens160
vip_cidr_netmask=24
pgsql_pgctl=/usr/pgsql-12/bin/pg_ctl
pgsql_psql=/usr/pgsql-12/bin/psql
pgsql_pgdata=/pgsql/data
pgsql_pgport=5432
pgsql_restore_command=""
pgsql_rep_mode=sync
pgsql_repuser=replication
pgsql_reppassord=replication