pha4pgsql's Issues

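How to add a node online

I started with a two-node setup. How can I add a node online to end up with one primary and two standbys? Or do I have to stop the cluster and redeploy pha4pgsql?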

Question about LVS configuration

Hello,
I ran into the following error while configuring LVS:
Waiting for 3 replies from the CRMd... OK
Error: resource 'msPostgresql' is not running on any node
failed to execute "pcs resource enable msPostgresql --wait" rc=1
Following the documentation, I set pcs_template=muti_with_lvs.pcs.template. Are any additional steps required?

How to set the gateway

The config_dual.ini.sample configuration file sets vip_cidr_netmask=24 and writer_vip=192.168.0.236. What about the gateway? How should it be configured?
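A virtual IP managed by an IPaddr2 resource is typically added as an extra address on an existing interface, in which case the host's existing routes would normally apply. One sanity check is whether the VIP falls in the same subnet as the node's own address. The following is a minimal sketch of that check, assuming the /24 from vip_cidr_netmask=24. The helpers ip_to_int and same_subnet are hypothetical and not part of pha4pgsql, and 192.168.0.231 is only an illustrative node address.

```shell
# Hypothetical helper: convert a dotted IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Hypothetical helper: succeed if two addresses share the same network.
# $1, $2: IPv4 addresses; $3: prefix length (for example 24)
same_subnet() {
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    [ $(( a & mask )) -eq $(( b & mask )) ]
}

# Illustrative call: writer_vip from the sample config against an assumed node address.
if same_subnet 192.168.0.236 192.168.0.231 24; then
    echo "VIP is in the node's subnet"
else
    echo "VIP is outside the node's subnet"
fi
```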

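Questions about setting up node3

I would like to ask a few questions:

  1. Is node3 an additional host?
  2. Does node3 also need to be added to the hosts file?
  3. node3 is only used to prevent split-brain, so if that machine goes down, will it affect the operation of the original hosts?

Those are my questions. Thank you.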

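Bug in the resource agent script with PostgreSQL 11

In the resource agent script expgsql, line 2149 tests $? -eq 2, but by that point $? has already been overwritten by the first test. The result of ocf_version_cmp should be saved in a variable instead.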
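The pitfall described here can be reproduced in isolation. The sketch below does not reproduce the actual code at line 2149. It assumes a pattern of two chained tests on $?, and uses fake_version_cmp as a hypothetical stand-in for ocf_version_cmp that always returns 2.

```shell
# Hypothetical stand-in for ocf_version_cmp: always exits with status 2.
fake_version_cmp() { return 2; }

# Buggy pattern: the second test reads $? after the first [ ... ] replaced it.
buggy_check() {
    fake_version_cmp "11.0" "10.0"
    if [ $? -eq 1 ] || [ $? -eq 2 ]; then
        echo "match"
    else
        echo "no match"
    fi
}

# Fixed pattern: capture the exit status once, then test the saved value.
fixed_check() {
    fake_version_cmp "11.0" "10.0"
    rc=$?
    if [ "$rc" -eq 1 ] || [ "$rc" -eq 2 ]; then
        echo "match"
    else
        echo "no match"
    fi
}

buggy_check
fixed_check
```

In the buggy variant the first test fails and sets $? to 1, so the second test compares 1 with 2 and also fails, even though the function returned 2.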

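Active-active across two remote data centers: how to do HA?

Hello, and thank you for providing these scripts. I have a question:
For an active-active setup across two remote data centers, each with one primary and one standby, my plan is to use pacemaker and corosync for primary/standby replication in data center A. How should data center B replicate the data? Since it is at a remote site, it presumably cannot join the same HA cluster as data center A?
While data center A is alive, data center B would of course be read only. When data center A goes down, what do I need to do to switch over to data center B properly?

Thanks!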

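Are pacemaker and corosync the standard HA solution on Linux?

Hello, I work with MSSQL. Starting with MSSQL 2017, MSSQL can run on Linux,
and its high-availability solution on Linux is an Always On availability group built on pacemaker and corosync.
Are pacemaker and corosync the standard HA solution on Linux?
Anyone familiar with Microsoft knows that MSSQL normally ships with its own HA solution, built on infrastructure Microsoft implemented itself, so users do not need to consider any other HA solution.
Microsoft's built-in HA solutions are also very mature and easy to use, yet here Microsoft chose the third-party pacemaker and corosync rather than its own HA infrastructure.

Does that mean pacemaker and corosync are already mature and powerful enough that Microsoft chose them instead of reinventing the wheel on Linux?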

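How to avoid using the postgres user?

My installation is not a standard PostgreSQL, so there is no postgres user. After cls_start, is the program started as the postgres user? Where is this user configured?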

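LVS load balancing does not start properly

First of all, thank you for your code, which makes it quick to build a PG cluster. I ran into a problem during setup: LVS fails to start and reports an unknown error, and I cannot find any related information in the logs. Any help would be appreciated.
[Screenshot: TIM图片20200421192739]
[Screenshot: TIM图片20200421192747]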

Add support for PG10

2018-09-04 14:15:54 CST [15171-4] LOG: invalid record length at 0/30000098: wanted 24, got 0
2018-09-04 14:15:54 CST [15169-6] LOG: database system is ready to accept read only connections
2018-09-04 14:15:55 CST [15470-1] postgres@template1 ERROR: function pg_last_xlog_replay_location() does not exist at character 8
2018-09-04 14:15:55 CST [15470-2] postgres@template1 HINT: No function matches the given name and argument types. You might need to add explicit type casts.
2018-09-04 14:15:55 CST [15470-3] postgres@template1 STATEMENT: select pg_last_xlog_replay_location(),pg_last_xlog_receive_location()
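PostgreSQL 10 renamed pg_last_xlog_replay_location() and pg_last_xlog_receive_location() to pg_last_wal_replay_lsn() and pg_last_wal_receive_lsn(), which is consistent with the error in the log above. The following is a minimal sketch of choosing the query by major version. The helper replay_query_for_version is hypothetical and is not a function from expgsql.

```shell
# Hypothetical helper: print the replication-position query for a server version.
# $1: PostgreSQL version string such as "9.6", "10.5" or "12"
replay_query_for_version() {
    major="${1%%.*}"     # keep only the part before the first dot
    if [ "$major" -ge 10 ]; then
        # Function names used from PostgreSQL 10 onward
        echo "select pg_last_wal_replay_lsn(),pg_last_wal_receive_lsn()"
    else
        # Function names used before PostgreSQL 10
        echo "select pg_last_xlog_replay_location(),pg_last_xlog_receive_location()"
    fi
}

replay_query_for_version "10.5"
```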

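throttle_handle_load: High CPU load detected causes pgsql_monitor to time out and PostgreSQL to restart

Running a DELETE that removed 130,000 rows from a table of 5 million rows brought the master down. The logs show that the RA monitor timed out, so the resource was considered failed.

After the timeout, Pacemaker restarts postgres. The migration threshold is 3, so it can restart up to 3 times.
/var/log/messages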

Nov 18 13:22:05 node1 expgsql(pgsql)[7774]: INFO: Stopping PostgreSQL on demote.
Nov 18 13:22:05 node1 expgsql(pgsql)[7774]: INFO: server shutting down
Nov 18 13:22:11 node1 expgsql(pgsql)[7774]: INFO: PostgreSQL is down
Nov 18 13:22:11 node1 expgsql(pgsql)[7774]: INFO: Changing pgsql-status on node1 : PRI->STOP.
Nov 18 13:22:11 node1 expgsql(pgsql)[8609]: INFO: PostgreSQL is already stopped.
Nov 18 13:22:12 node1 expgsql(pgsql)[8714]: INFO: Set all nodes into async mode.
Nov 18 13:22:12 node1 expgsql(pgsql)[8714]: INFO: server starting
Nov 18 13:22:12 node1 expgsql(pgsql)[8714]: INFO: PostgreSQL start command sent.
Nov 18 13:22:12 node1 expgsql(pgsql)[8714]: INFO: PostgreSQL is down
Nov 18 13:22:13 node1 expgsql(pgsql)[8714]: INFO: PostgreSQL is started.

corosync.log shows that the system load was high before the timeout occurred.
/var/log/cluster/corosync.log

Nov 18 13:20:55 [2353] node1       crmd:     info: throttle_handle_load:	Moderate CPU load detected: 12.060000
Nov 18 13:20:55 [2353] node1       crmd:     info: throttle_send_command:	New throttle mode: 0010 (was 0001)
Nov 18 13:21:25 [2353] node1       crmd:   notice: throttle_handle_load:	High CPU load detected: 16.379999
Nov 18 13:21:25 [2353] node1       crmd:     info: throttle_send_command:	New throttle mode: 0100 (was 0010)
Nov 18 13:21:44 [2350] node1       lrmd:  warning: child_timeout_callback:	pgsql_monitor_3000 process (PID 4822) timed out
Nov 18 13:21:44 [2350] node1       lrmd:  warning: operation_finished:	pgsql_monitor_3000:4822 - timed out after 60000ms
Nov 18 13:21:44 [2353] node1       crmd:    error: process_lrm_event:	Operation pgsql_monitor_3000: Timed Out (node=node1, call=837, timeout=60000ms)
Nov 18 13:21:44 [2348] node1        cib:     info: cib_process_request:	Forwarding cib_modify operation for section status to master (origin=local/crmd/462)
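To locate such timeouts in a longer log, the lrmd lines can be filtered. This is a minimal sketch that assumes the log format matches the excerpt above. The helper find_timeouts is hypothetical, and the sample line is copied from that excerpt.

```shell
# Hypothetical helper: read log lines on stdin and print "<operation> <timeout>"
# for every lrmd operation_finished line that reports a timeout.
find_timeouts() {
    sed -n 's/.*operation_finished:[[:space:]]*\([^:]*\):[0-9]* - timed out after \([0-9]*ms\).*/\1 \2/p'
}

# Sample input taken from the excerpt above (the \t stands for the tab in the log).
printf 'Nov 18 13:21:44 [2350] node1       lrmd:  warning: operation_finished:\tpgsql_monitor_3000:4822 - timed out after 60000ms\n' | find_timeouts
```

Against a real log the same function could be fed with the file, for example find_timeouts < /var/log/cluster/corosync.log.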

Pacemaker logs a high system load warning, possibly caused by high I/O wait:
http://clusterlabs.org/pipermail/users/2015-May/000518.html

Others have run into similar problems; the fix was to increase the monitor timeout.
https://bugs.launchpad.net/fuel/+bug/1464131
https://review.openstack.org/#/c/191715/1/deployment/puppet/pacemaker_wrappers/manifests/rabbitmq.pp

The setting can be changed online as follows:
pcs resource update pgsql op monitor interval=4s timeout=180s on-fail=restart
pcs resource update pgsql op monitor role=Master timeout=180s on-fail=restart interval=3s

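Functions in lib/common.sh may be wrong

Hello. While studying your code, I noticed that the check_resource_started and stoped functions in common.sh loop with for resource in $1. $1 only ever holds the first argument and cannot see the whole argument list, so cls_start and cls_stop verify only the first resource and never check the rest.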
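The difference between iterating over $1 and over "$@" can be shown with two small functions. This is a sketch only: count_first_only and count_all are hypothetical and are not the functions from common.sh. Whether the reported behaviour occurs depends on how the caller passes the list, because an unquoted $1 is still word-split when the whole list arrives as a single quoted string.

```shell
# Hypothetical: loops over $1 only, as described in the report.
count_first_only() {
    n=0
    for resource in $1; do
        n=$((n + 1))
    done
    echo "$n"
}

# Hypothetical: loops over every argument.
count_all() {
    n=0
    for resource in "$@"; do
        n=$((n + 1))
    done
    echo "$n"
}

# Resource names taken from RESOURCE_LIST in the config shown on this page.
count_first_only msPostgresql vip-master vip-slave     # separate arguments
count_all msPostgresql vip-master vip-slave            # separate arguments
count_first_only "msPostgresql vip-master vip-slave"   # one quoted string
```

With separate arguments the first function sees one resource and the second sees three. With a single quoted string, the unquoted $1 is split on whitespace and the first function also sees three.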

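How to configure writer/reader-vip

Thanks for the scripts. I am trying to set up a three-node PG HA cluster, and following the tutorial I hit an error at this step.

The error is as follows: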

[root@mdw pha4pgsql]# cls_start
resource msPostgresql is NOT running
resource msPostgresql is NOT running
resource msPostgresql is NOT running
Cleaned up pgsql:0 on sdw2
Cleaned up pgsql:0 on sdw1
Cleaned up pgsql:0 on mdw
Cleaned up pgsql:1 on sdw2
Cleaned up pgsql:1 on sdw1
Cleaned up pgsql:1 on mdw
Cleaned up pgsql:2 on sdw2
Cleaned up pgsql:2 on sdw1
Cleaned up pgsql:2 on mdw
Error: resource 'msPostgresql' is not running on any node
failed to execute "pcs resource enable msPostgresql --wait" rc=1

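I tried to find the cause myself, but I suspect I have misunderstood what some of the parameters mean, so I am opening an issue. My main questions are:

  1. When a client connects to writer-vip or reader-vip, how is that mapped to a concrete IP, and is there anything to watch out for when setting the virtual IP parameters?

  2. My setup is three virtual machines (provided by my company, probably carved out of a single physical host) with the three IPs 192.168.x.94/95/96, and ifconfig shows ens160 on all of them. Can a three-node PG HA cluster be built on this?

I think the main mistake is that I blindly set writer_vip=192.168.x.100 and reader_vip=192.168.x.101 without really understanding how they should be chosen. From the other issues here, it looks like a gateway or something similar may need to be configured. But judging by why the cluster failed to start, the resources never started at all, so what I really need is the pcs log. I will look for more information and come back to add details.

What follows is my own understanding plus additional details.

In the diagram in the ClusterLabs guide for a PG replication cluster, virtual IP 1 is placed on eth0 and virtual IP 2 on eth2. In your tutorial, node1/2/3 and writer/reader_vip all fall within 192.168.0.231-237, so I wonder whether accessing writer_vip/reader_vip in your tutorial actually points to a specific one of node1/2/3.

I then ran the following command to check the status.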

# cls_status
Cluster name: pgcluster
Stack: corosync
Current DC: sdw1 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Wed Nov  4 11:55:19 2020
Last change: Wed Nov  4 11:44:41 2020 by root via cibadmin on mdw

3 nodes configured
5 resources configured

Online: [ mdw sdw1 sdw2 ]

Full list of resources:

 vip-master	(ocf::heartbeat:IPaddr2):	Stopped
 vip-slave	(ocf::heartbeat:IPaddr2):	Stopped
 Master/Slave Set: msPostgresql [pgsql]
     Stopped: [ mdw sdw1 sdw2 ]

Failed Resource Actions:
* pgsql_start_0 on mdw 'unknown error' (1): call=45, status=Timed Out, exitreason='',
    last-rc-change='Wed Nov  4 11:43:49 2020', queued=0ms, exec=60001ms
* pgsql_start_0 on sdw2 'unknown error' (1): call=45, status=Timed Out, exitreason='',
    last-rc-change='Wed Nov  4 11:43:49 2020', queued=0ms, exec=60001ms
* pgsql_start_0 on sdw1 'unknown error' (1): call=45, status=Timed Out, exitreason='',
    last-rc-change='Wed Nov  4 11:43:49 2020', queued=0ms, exec=60002ms

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

The config.ini I used is as follows:

pcs_template=muti.pcs.template
OCF_ROOT=/usr/lib/ocf
RESOURCE_LIST="msPostgresql vip-master vip-slave"
pha4pgsql_dir=/opt/pha4pgsql
writer_vip=192.168.x.100 
reader_vip=192.168.x.101 
node1=mdw
node2=sdw1
node3=sdw2
othernodes=""
vip_nic=ens160
vip_cidr_netmask=24
pgsql_pgctl=/usr/pgsql-12/bin/pg_ctl
pgsql_psql=/usr/pgsql-12/bin/psql
pgsql_pgdata=/pgsql/data
pgsql_pgport=5432
pgsql_restore_command=""
pgsql_rep_mode=sync
pgsql_repuser=replication
pgsql_reppassord=replication
