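Zabbix HA: the officially recommended high-availability cluster

I have been meaning to write a hands-on article about the Zabbix 333 HA setup that Zabbix officially recommends, but never found the time. Today it is finally here. Zabbix published an article on configuring Zabbix 333 HA, but many details in it are incomplete or not specific enough, so a lot of people fail when they try to follow it; I hit plenty of pitfalls myself. The main cause is unfamiliarity with the basic components and architecture of High Availability, so let's walk through how to deploy Zabbix 333 HA.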

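1、What is CentOS High Availability

CentOS High Availability is the Red Hat® Enterprise Linux® High Availability Add-On. It lets a service fail over from one node to another without noticeable interruption to cluster clients, and it evicts (fences) the failed node during the transfer to prevent data corruption. The add-on can be configured for most applications (off-the-shelf and custom) and for virtual guests, and supports up to 16 nodes. It includes a cluster manager, lock management, fencing, command-line cluster configuration, and the Conga administration tool.

A High Availability cluster, also called a failover cluster (active-passive cluster), is one of the most widely used cluster types in production. It keeps a service continuously available even when a node in the group fails: if the application on a server fails for some reason (for example a hardware fault), the cluster management software (Pacemaker) restarts the application on another node. In production this type of cluster is mainly used for databases, custom applications, and file sharing. Failover is more than starting an application; it involves a series of related operations such as mounting file systems, configuring networks, and starting dependent applications. CentOS 7 / RHEL 7 uses Pacemaker to support failover clusters. High Availability also includes related service components such as pcs, pacemaker, corosync, fence-agents-all, and other software that has to be configured alongside them. We have now mentioned Pacemaker twice, so what exactly is it?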

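1.1、What is Pacemaker

Pacemaker is the most widely used open-source cluster resource manager on Linux. It relies on the messaging and cluster membership functions of the cluster infrastructure (Corosync or Heartbeat) to detect failures and recover at both node and resource level, which maximizes the availability of cluster services. Logically, Pacemaker is driven by the resource rules the cluster administrator defines and manages the full life cycle of the software services in the cluster, including whole software systems and the interactions between them. In practice Pacemaker can manage clusters of any size. Its powerful resource dependency model lets the administrator describe the relationships between cluster resources precisely (including ordering and placement). Almost any software resource can be managed by Pacemaker as a resource object, as long as a custom start-and-manage script (a resource agent) is written for it.

Pacemaker is only a resource manager, however, and does not provide cluster heartbeat information. Because every high-availability cluster needs heartbeat monitoring, many beginners assume that Pacemaker does heartbeat detection itself; in fact Pacemaker's heartbeat mechanism is provided by Corosync or Heartbeat. Pacemaker is just the HA resource manager, so do not assume it can control a resource directly: a resource with no resource-agent script is unmanageable as far as Pacemaker is concerned.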

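Pacemaker features:
1、Detects and recovers from node-level and service-level failures;
2、Storage agnostic; shared storage is not required;
3、Resource agnostic; anything that can be controlled by a script can be a cluster service;
4、Supports node STONITH to protect cluster data integrity and prevent split-brain;
5、Supports both large and small clusters;
6、Supports quorum-based and resource-driven clusters;
7、Supports almost any kind of redundancy configuration;
8、Automatically synchronizes the configuration across all nodes;
9、Supports cluster-wide ordering, colocation, and anti-colocation constraints;
10、Supports advanced service types. Clone: a service that must run on several nodes can be cloned, and the clone starts the same service on multiple nodes. Multi-state: a service that must run in several states can be configured as multi-state. Many highly available services run in different modes, such as Active/Active or Active/Passive, and may switch between active and standby (passive);
11、Provides unified, scriptable cluster management tools.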
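
The quorum mechanism in item 6 comes down to majority arithmetic. The sketch below assumes every node carries exactly one vote and that plain majority voting is used; `quorum_votes` and `tolerated_failures` are illustrative helper names, not pcs or corosync commands.

```shell
#!/usr/bin/env bash
# Majority quorum, assuming every node carries exactly one vote.
# quorum_votes and tolerated_failures are illustrative names only.

quorum_votes() {
    local nodes=$1
    echo $(( nodes / 2 + 1 ))
}

tolerated_failures() {
    local nodes=$1
    echo $(( nodes - (nodes / 2 + 1) ))
}

for n in 2 3 5; do
    echo "nodes=$n quorum=$(quorum_votes "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Under these assumptions a three-node cluster keeps quorum with one node down, while a two-node cluster cannot lose any node.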

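1.2、Pacemaker architecture

The architecture has three parts, color-coded in the figure below. The core cluster infrastructure, which provides messaging and cluster membership, is marked in red; the cluster-agnostic components are marked in blue.

In the Pacemaker architecture, the cluster-agnostic part contains not only the scripts that know how to start, stop, and monitor resources, but also a local daemon that hides the differences between the standards those scripts implement.

The brain (green) processes and responds to events from the cluster and from resources (nodes leaving or joining, resources failing) as well as configuration changes made by the administrator. In response to all of these events, Pacemaker computes the ideal state of the cluster and plans a path to reach it. That path may include moving resources, stopping nodes, or even forcing nodes offline through remote power management.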

image-20210902112441226

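Pacemaker: the cluster resource manager (CRM). It starts and stops services, keeps them running, and makes sure a given service runs on only one node at a time (avoiding the chaos of several instances operating on the same data).
Corosync: the messaging layer. It manages membership, messaging, and quorum.
Resource Agents: the tools that manage a particular resource on a node when the CRM schedules an operation. These tools are usually scripts, which is why they are called resource agents. Every resource agent follows the same calling convention and accepts an action argument such as {start|stop|restart|status}; this applies even to an agent that only configures an IP address, and every agent must report a result for each action. Pacemaker RAs fall into three groups: (1) those shipped with Pacemaker, (2) those implemented by third parties, such as the RabbitMQ RA, and (3) those you write yourself, such as the RAs OpenStack implements for its services or an RA for MySQL.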
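
To make the calling convention concrete, here is a toy agent-style script. It is only a sketch: the "service" is a state file, and `demo_agent` and `STATE_FILE` are hypothetical names. Real OCF agents used by Pacemaker implement further actions (for example monitor and meta-data) and their own exit-code rules, which this example does not attempt to cover.

```shell
#!/usr/bin/env bash
# Toy resource-agent-style script: the "service" is just a state file.
# demo_agent and STATE_FILE are hypothetical names used only for illustration.

STATE_FILE="${STATE_FILE:-${TMPDIR:-/tmp}/demo_agent.$$.state}"

demo_agent() {
    case "$1" in
        start)
            touch "$STATE_FILE"
            ;;
        stop)
            rm -f "$STATE_FILE"
            ;;
        status)
            # Exit 0 when running, 3 when stopped (the LSB code for "not running").
            [ -e "$STATE_FILE" ] && return 0
            return 3
            ;;
        restart)
            demo_agent stop && demo_agent start
            ;;
        *)
            echo "usage: demo_agent {start|stop|restart|status}" >&2
            return 2
            ;;
    esac
}

demo_agent start
demo_agent status && echo "running"
demo_agent stop
demo_agent status || echo "stopped (exit code $?)"
```

The manager never needs to know what the service is; it only calls an action and reads the exit code, which is what lets Pacemaker treat very different resources uniformly.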

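1.3、Pacemaker internal components

As a standalone cluster resource manager project, Pacemaker consists of several internal components that communicate and cooperate with each other to manage the cluster's resources.

CIB (Cluster Information Base): an XML-format cluster configuration held in memory. It contains all cluster options, nodes, and resources, together with the definitions of their relationships and current state, and is synchronized to all cluster nodes.

CRMd (Cluster Resource Manager daemon): the cluster resource management daemon. Every node running crmd holds a copy of the CIB for defining and maintaining resources. CRMd mainly acts as a message broker between the PEngine and the LRM, and it also elects a leader (DC) that coordinates cluster activity, including starting and stopping resources.

LRMd (Local Resource Manager daemon): the local resource management daemon. It provides a common interface to the supported resource types and calls the resource agents (scripts) directly.

PEngine (PE, Policy Engine): computes the next state of the cluster from the current state and configuration, producing a transition graph that lists the actions and their dependencies.

STONITHd (Shoot The Other Node In The Head daemon): the cluster fencing daemon.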

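The CIB is responsible for the cluster's most basic configuration and management. In Pacemaker the CIB uses XML to represent the cluster configuration and the current state of every resource. The configuration held in the CIB is synchronized automatically between cluster nodes. The PE uses the information in the CIB to work out the cluster's optimal state and how resources must be controlled and operated to reach it. Once the PE has decided, it issues resource operation instructions, and this instruction list is handed to the controller node the cluster selected at startup (the Designated Controller, DC); normally the DC is the node running the master CRMd.
When the cluster starts, Pacemaker picks the CRM process instance on one node as the cluster's master CRMd, and that CRMd then centrally processes every instruction set the PE derives from the CIB. If the master CRM process fails, or the node hosting it fails, the cluster immediately elects a new master CRM process on another node.

While processing the PE's decisions, the DC handles the PEngine's instruction list in request order. Put simply, the DC either sends an instruction to the LRMd on its own node (the local CRMd is already acting as master and centrally controlling the whole cluster, so it does not process cluster instructions in parallel) or sends it through the cluster messaging layer to the CRMd on another node, which forwards it to that node's LRMd. After a node has run its instructions, the CRMd processes on the other nodes return all execution results and logs to the DC, so the DC ends up with the result and state of every resource after the instructions ran. The DC then compares the actual results with the expected ones and decides whether to wait for the previously started operations to finish before taking the next step, or to cancel the current operations and ask the PEngine to re-plan the ideal cluster state from the actual results and issue new instructions.

In some situations the cluster may require a node to be powered off to protect shared data and to allow resources to recover cleanly. For this purpose Pacemaker introduces node fencing, which is implemented mainly by the STONITH process. STONITH is a forcible isolation measure that usually works by driving a remote power switch to turn a node off or on. In Pacemaker, STONITH devices are treated as resources and configured in the CIB, so their failures can be monitored easily. The STONITH daemon (STONITHd) also understands the topology of the STONITH devices, so when the cluster manager needs to fence a node, a STONITHd client simply requests fencing of that node and STONITHd does the rest: the STONITH device configured as a cluster resource answers the request and fences the node. In practice you need to choose a STONITH device that matches your server vendor and whether the nodes are physical or virtual machines.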

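1.4、Cluster modes supported by Pacemaker

Pacemaker supports many cluster types, including Active/Active, Active/Passive, N+1, N+M, N-to-1, and N-to-N.

Active/Active: requests for a failed node are either redirected automatically to another healthy node or load-balanced across the remaining healthy nodes. The nodes in such a cluster usually run the same software with the same configuration, and the services run on them in parallel.

Active/Passive: every node has the same service instance deployed, but normally only one instance is active. Only when the active node fails is the service on a standby node activated. This mode usually means deploying extra hardware that carries no load under normal conditions.

N+1: one extra standby node is provided. When a node in the cluster fails, the standby node is activated and takes over its services. In a cluster where different nodes run different software (that is, the cluster runs several services), the standby node must be able to take over any failed service. If the whole cluster runs a single service, N+1 degenerates into Active/Passive.

N+M: when a single cluster runs several services, the single takeover node of N+1 may not provide enough redundancy. The cluster therefore provides M (M>1) standby nodes so that it stays highly available even when several services fail at once. The value of M is a trade-off between availability requirements and budget.

N-to-1: the standby node that takes over a service temporarily becomes the active node (at which point the cluster has no standby node). When the failed primary node recovers and rejoins the cluster, the service moves back to it and the standby node returns to standby to keep the cluster highly available.

N-to-N: a combination of Active/Active and N+M. An N-to-N cluster spreads the services and requests of a failed node across the remaining healthy nodes. No standby node is needed, but every active node must have spare capacity.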

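2、What is Zabbix 333 HA

After all that, why do we need a Zabbix HA cluster? When Zabbix monitors devices that produce massive amounts of data, a single Zabbix instance can no longer meet the redundancy requirements, and an HA cluster architecture becomes necessary. An HA architecture makes switchover and recovery after an outage much simpler, and this particular HA design has stood up to testing. According to Zabbix, the common, time-tested solution in the European market today is the Zabbix 333 HA we are about to deploy. Its architecture looks like this: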

image-20210901212902684

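As the figure shows, there are 3 database nodes, 3 server nodes, and 3 frontend nodes. Each cluster has a virtual IP (VIP) that points at whichever node is currently active. If a basic resource dies or a connection fails, the cluster switches nodes automatically. Manual control can also override this when something goes wrong or when updates are being applied; the idea is that the user can switch nodes at any time through Zabbix. So if an error occurs, or you simply want to take the first server node down, you can select the second node and move the resources to it from the Zabbix frontend (we will demonstrate this in detail later).

First let's look at how MySQL replication works in this high-availability cluster. To understand it, consider this circular master-slave setup with 3 nodes: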

image-20210901215410606

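Replication works through the binary log and can be asynchronous. Suppose an update or test is needed and the middle node is disabled so that the first node can be worked on. Replication stops at that point. If the binary-log retention is set to 3 days, for example, and the node is brought back within that window, the pending changes are still pushed to all nodes. Whichever node is disabled, replication resumes from where it stopped once the node is reactivated.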
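
A circular layout can be sketched by printing, for each node, the upstream it would replicate from. This is an assumption-laden illustration, not the configuration used later in this article: it assumes node i replicates from node i-1 (and the first node from the last), the `repl` user and `<REPL_PASSWORD>` are placeholders, and the binary-log file and position that a real `CHANGE MASTER TO` normally needs are left out.

```shell
#!/usr/bin/env bash
# Print, for each node in a ring, the upstream node it would replicate from.
# Assumption: node i replicates from node i-1, and the first node from the last.
# 'repl' and '<REPL_PASSWORD>' are placeholders, not values from this article.

ring_upstream() {
    # usage: ring_upstream INDEX NODE...
    local idx=$1; shift
    local nodes=("$@")
    local count=${#nodes[@]}
    echo "${nodes[$(( (idx + count - 1) % count ))]}"
}

nodes=(zabbix-ha-db01 zabbix-ha-db02 zabbix-ha-db03)
for i in "${!nodes[@]}"; do
    upstream=$(ring_upstream "$i" "${nodes[@]}")
    echo "-- run on ${nodes[$i]}"
    echo "CHANGE MASTER TO MASTER_HOST='${upstream}', MASTER_USER='repl', MASTER_PASSWORD='<REPL_PASSWORD>';"
done
```

The script only prints statements; nothing is sent to a database, so it is safe to run while planning the topology.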

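So how do we deploy Zabbix 333 HA end to end? First we need 9 virtual machines, each with its own IP address and a meaningful hostname in the hosts file. In addition we assign 3 VIPs, each with a hostname, which makes the setup easier to follow.

Note: DNS and DHCP are not needed here; we recommend static IP addressing to keep the network predictable. Next we initialize the base environment on every node. This covers: time synchronization; localization; the firewall (preferably turned off); SELinux (always a troublemaker, so we simply disable it); the hosts file; storage (preferably separate block devices for the DB, logs, applications, and configuration); and the Zabbix agent on all nodes (enable remote commands and set the appropriate IP addresses).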

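3、Cluster initialization

3.1、Configure hostnames and hosts

# Run the matching command on each node; replace the names below with your own hostnames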
hostnamectl set-hostname zabbix-ha-fe01
hostnamectl set-hostname zabbix-ha-fe02
hostnamectl set-hostname zabbix-ha-fe03

hostnamectl set-hostname zabbix-ha-srv01
hostnamectl set-hostname zabbix-ha-srv02
hostnamectl set-hostname zabbix-ha-srv03

hostnamectl set-hostname zabbix-ha-db01
hostnamectl set-hostname zabbix-ha-db02
hostnamectl set-hostname zabbix-ha-db03

# Configure the hosts file (on every node)
cat >> /etc/hosts <<EOF
# IP for Frontend nodes
172.16.200.90   zabbix-ha-fe01
172.16.200.91   zabbix-ha-fe02
172.16.200.92   zabbix-ha-fe03

# IP for Server nodes
172.16.200.93   zabbix-ha-srv01
172.16.200.94   zabbix-ha-srv02
172.16.200.95   zabbix-ha-srv03

# IP for DB nodes
172.16.200.96   zabbix-ha-db01
172.16.200.97   zabbix-ha-db02
172.16.200.98   zabbix-ha-db03

# VIP for Cluster
172.16.200.87   zabbix-ha-app
172.16.200.88   zabbix-ha-fe-app
172.16.200.89   zabbix-ha-db-app
EOF
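
Since every later step depends on these names resolving, a quick check that nothing was missed can save time. The following is a minimal sketch; `missing_hosts` is a hypothetical helper that only reads a hosts-format file and prints the names it cannot find.

```shell
#!/usr/bin/env bash
# Report which expected hostnames are missing from a hosts-format file.
# missing_hosts is a hypothetical helper; it reads the file and changes nothing.

missing_hosts() {
    # usage: missing_hosts HOSTS_FILE NAME...
    local file=$1; shift
    local name
    for name in "$@"; do
        # Skip comment lines, then look for the name in any column after the IP.
        awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }
        ' "$file" || echo "$name"
    done
}

# Demo against a throwaway file so the real /etc/hosts is not touched.
demo=$(mktemp)
printf '%s\n' '# IP for DB nodes' \
    '172.16.200.96   zabbix-ha-db01' \
    '172.16.200.97   zabbix-ha-db02' > "$demo"
missing_hosts "$demo" zabbix-ha-db01 zabbix-ha-db02 zabbix-ha-db03
rm -f "$demo"
```

On a real node the call would be along the lines of `missing_hosts /etc/hosts zabbix-ha-db01 zabbix-ha-db02 zabbix-ha-db03`; empty output means every name was found.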

3.2、Disable the firewall

[root@zabbix-ha-db01 ~]# systemctl stop firewalld && systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@zabbix-ha-db01 ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

Aug 29 17:06:59 localhost.localdomain systemd[1]: Starting firewalld - dynamic firewall daemon...
Aug 29 17:07:00 localhost.localdomain systemd[1]: Started firewalld - dynamic firewall daemon.
Aug 29 17:12:55 cdh001 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Aug 29 17:12:56 cdh001 systemd[1]: Stopped firewalld - dynamic firewall daemon.
[root@zabbix-ha-db01 ~]# 
[root@zabbix-ha-db01 ~]# iptables -F

3.3、Disable SELinux

cat << EOF > /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected. 
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted 
EOF

[root@zabbix-ha-db01 ~]# setenforce 0
[root@zabbix-ha-db01 ~]# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected. 
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted 
[root@zabbix-ha-db01 ~]# 
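
Overwriting the whole file with a heredoc works, but a narrower edit that touches only the `SELINUX=` line is less likely to clobber other settings. The sketch below assumes the file already contains a `SELINUX=` line; `set_selinux_disabled` is a hypothetical helper name. Note that the accepted value is `disabled`, as the comments in the file itself list.

```shell
#!/usr/bin/env bash
# Set SELINUX=disabled in a config file, leaving every other line alone.
# set_selinux_disabled is a hypothetical helper; pass the file path explicitly.

set_selinux_disabled() {
    local file=$1 tmp
    tmp=$(mktemp)
    # Only lines that start with "SELINUX=" match; SELINUXTYPE= and comments do not.
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a temporary file rather than /etc/selinux/config.
demo=$(mktemp)
printf '%s\n' '# SELINUX= can take one of these three values:' \
    'SELINUX=enforcing' 'SELINUXTYPE=targeted' > "$demo"
set_selinux_disabled "$demo"
cat "$demo"
rm -f "$demo"
```

On a node the call would be `set_selinux_disabled /etc/selinux/config`, run as root.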

3.4、Time synchronization

On database node 1:

[root@zabbix-ha-db01 ~]# yum install ntp -y
[root@zabbix-ha-db01 ~]# cat /etc/ntp.conf 
......
server 3.centos.pool.ntp.org iburst

server 127.127.1.0                      # local clock
restrict 172.16.200.0 mask 255.255.255.0 nomodify notrap

#broadcast 192.168.1.255 autokey        # broadcast server
......
[root@zabbix-ha-db01 ~]#

First verify the NTP service on database node 1, then run the remaining commands on database nodes 2 and 3:

[root@zabbix-ha-db01 ~]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+electrode.felix 85.10.240.253    3 u   13  256  377  202.393  -11.534   4.485
*tick.ntp.infoma .GPS.            1 u   15  256  377  236.613    3.073  20.278
 time.cloudflare .STEP.          16 u    - 1024    0    0.000    0.000   0.000
+139.199.215.251 100.122.36.4     2 u  169  256  377   33.075   11.587   3.598
 LOCAL(0)        .LOCL.           5 l  42m   64    0    0.000    0.000   0.000
[root@zabbix-ha-db01 ~]# 

[root@zabbix-ha-db02 ~]# systemctl stop ntpd
[root@zabbix-ha-db02 ~]# systemctl disable ntpd
[root@zabbix-ha-db02 ~]# ntpdate zabbix-ha-db01
 2 Sep 11:39:23 ntpdate[6999]: adjust time server 172.16.200.96 offset -0.000040 sec
[root@zabbix-ha-db02 ~]# 

# Configure periodic synchronization
[root@zabbix-ha-db02 ~]#  crontab -e
no crontab for root - using an empty one
crontab: installing new crontab
[root@zabbix-ha-db02 ~]# 

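# Insert the line below. The five leading fields are minute, hour, day of month, month, and day of week; five asterisks run the job every minute (for one sync per day use, for example, 0 0 * * *)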
* * * * * /usr/sbin/ntpdate zabbix-ha-db01
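
`crontab -e` is interactive, which is awkward to repeat on several nodes. As a sketch of a non-interactive alternative, the helper below appends an entry to a crontab-format file only when it is missing. `ensure_cron_line` is a hypothetical name, and the once-a-day schedule at 00:00 is an assumption rather than the schedule used above.

```shell
#!/usr/bin/env bash
# Append a cron entry to a crontab-format file only if it is not already there.
# ensure_cron_line is a hypothetical helper; the 00:00 daily schedule is an assumption.

ensure_cron_line() {
    local file=$1 line=$2
    touch "$file"
    # -x matches whole lines, -F treats the asterisks literally.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Fields: minute hour day-of-month month day-of-week command
entry='0 0 * * * /usr/sbin/ntpdate zabbix-ha-db01'

demo=$(mktemp)
ensure_cron_line "$demo" "$entry"
ensure_cron_line "$demo" "$entry"   # second call is a no-op
cat "$demo"
rm -f "$demo"
```

The resulting file can be loaded with `crontab FILE`; starting from the output of `crontab -l` keeps any entries that already exist.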

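4、Database cluster

There are several approaches to MySQL clustering; the common ones fall into two categories: Replication and Percona XtraDB Cluster (PXC).

Replication: data synchronization is one-way. The master handles writes and replicates asynchronously to the slaves; data written on a slave is not replicated back to the master. Its strength is speed. It offers only weak consistency, which suits lower-value data such as logs, news, and posts. Its weakness is that replication is asynchronous, so a slave is not guaranteed to be consistent with the master.

Percona XtraDB Cluster (PXC): data synchronization is bidirectional; data written on any MySQL node is synchronized to every other node in the cluster. Its strength is strong consistency, which suits high-value data such as orders, account information, and financial data. Its weakness is speed: replication is synchronous, so a transaction is committed either on all cluster nodes or on none, which makes writes slower.

Most of the data in Zabbix is log-like, so we use Replication here. MySQL replication can be arranged in several topologies: master-slave, master-master, chain replication, and ring (circular) replication. These are the four main ones. In production, master-slave is generally recommended and is the most robust choice. To make switchover easier and improve availability somewhat, master-master is also an option, but you must make sure that only one database is active at any moment; in other words, writes may go to only one master at a time, otherwise data anomalies can occur. Chain and ring replication are rarely used in production because the robustness of the whole replication system drops as nodes are added. Unfortunately, the officially recommended scheme here belongs to exactly this family (the three-node circular setup shown earlier). With that said, let's go through the detailed steps.

For more on MySQL replication, see this article: https://blog.51cto.com/binghe001/2925939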

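4.1、Deploy the High Availability components

With the base environment in place, we can deploy High Availability:

yum groupinstall 'High Availability' -y
# or
yum groupinstall ha -y

# Set a password for the hacluster user (the account itself is created when the packages are installed):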
[root@zabbix-ha-db01 ~]# echo <CLUSTER_PASSWORD> | passwd --stdin hacluster
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
[root@zabbix-ha-db01 ~]# 

# Start pcsd
[root@zabbix-ha-db01 ~]# systemctl start pcsd
[root@zabbix-ha-db01 ~]# systemctl enable pcsd
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
[root@zabbix-ha-db01 ~]# systemctl status pcsd
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2021-09-01 19:27:29 CST; 6s ago
     Docs: man:pcsd(8)
           man:pcs(8)
 Main PID: 14720 (pcsd)
   CGroup: /system.slice/pcsd.service
           └─14720 /usr/bin/ruby /usr/lib/pcsd/pcsd

Sep 01 19:27:28 zabbix-ha-01 systemd[1]: Starting PCS GUI and remote configuration interface...
Sep 01 19:27:29 zabbix-ha-01 systemd[1]: Started PCS GUI and remote configuration interface.
[root@zabbix-ha-db01 ~]# ^C

# Once the steps above have been run on all nodes, authenticate the nodes using the same password
[root@zabbix-ha-db01 corosync]# pcs cluster auth zabbix-ha-db01 zabbix-ha-db02 zabbix-ha-db03 -u hacluster
Password: 
zabbix-ha-db01: Authorized
zabbix-ha-db03: Authorized
zabbix-ha-db02: Authorized
[root@zabbix-ha-db01 corosync]# 

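# If you get an error like the one below, first check that pcsd is running on every node and that the node names match /etc/hosts (pcs cluster auth talks to pcsd; the names in this failed attempt are missing the "db" part). I also checked whether corosync was running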
[root@zabbix-ha-db02 ~]# pcs cluster auth zabbix-ha-01 zabbix-ha-02 zabbix-ha-03 -u hacluster
Password: 
Error: Unable to communicate with zabbix-ha-01
Error: Unable to communicate with zabbix-ha-03
Error: Unable to communicate with zabbix-ha-02
[root@zabbix-ha-db02 ~]# 

# Starting corosync fails; the log shows that the configuration file is missing
[root@zabbix-ha-db02 ~]# systemctl start corosync
Job for corosync.service failed because the control process exited with error code. See "systemctl status corosync.service" and "journalctl -xe" for details.
[root@zabbix-ha-db02 ~]# journalctl -xe
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit session-14.scope has finished starting up.
-- 
-- The start-up result is done.
Sep 02 10:22:01 zabbix-ha-db02 CROND[17445]: (root) CMD (/usr/sbin/ntpdate zabbix-ha-db01)
Sep 02 10:22:07 zabbix-ha-db02 CROND[17451]: (CRON) EXEC FAILED (/usr/sbin/sendmail): No such file or directory
Sep 02 10:22:07 zabbix-ha-db02 CROND[17443]: (root) MAIL (mailed 85 bytes of output but got status 0x0001
                                             )
Sep 02 10:22:10 zabbix-ha-db02 polkitd[5679]: Registered Authentication Agent for unix-process:17456:495622 (system bus name :1.55 [/
Sep 02 10:22:10 zabbix-ha-db02 systemd[1]: Starting Corosync Cluster Engine...
-- Subject: Unit corosync.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit corosync.service has begun starting up.
Sep 02 10:22:10 zabbix-ha-db02 corosync[17469]: Can't read file /etc/corosync/corosync.conf reason = (No such file or directory)
Sep 02 10:22:10 zabbix-ha-db02 corosync[17462]: Starting Corosync Cluster Engine (corosync): [FAILED]
Sep 02 10:22:10 zabbix-ha-db02 systemd[1]: corosync.service: control process exited, code=exited status=1
Sep 02 10:22:10 zabbix-ha-db02 systemd[1]: Failed to start Corosync Cluster Engine.
-- Subject: Unit corosync.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit corosync.service has failed.
-- 
-- The result is failed.
Sep 02 10:22:10 zabbix-ha-db02 systemd[1]: Unit corosync.service entered failed state.
Sep 02 10:22:10 zabbix-ha-db02 systemd[1]: corosync.service failed.
Sep 02 10:22:10 zabbix-ha-db02 polkitd[5679]: Unregistered Authentication Agent for unix-process:17456:495622 (system bus name :1.55,

[root@zabbix-ha-db02 ~]# 

# corosync.conf is missing, so copy the template shipped in /etc/corosync/ (pcs cluster setup later replaces it with the real cluster configuration)
[root@zabbix-ha-db02 ~]# cd /etc/corosync/
[root@zabbix-ha-db02 corosync]# cp corosync.conf.example corosync.conf
[root@zabbix-ha-db02 corosync]# 
[root@zabbix-ha-db02 corosync]# systemctl start corosync
[root@zabbix-ha-db02 corosync]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2021-09-02 10:23:50 CST; 4s ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
  Process: 17566 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
 Main PID: 17574 (corosync)
   CGroup: /system.slice/corosync.service
           └─17574 corosync

Sep 02 10:23:49 zabbix-ha-db02 corosync[17574]:   [SERV  ] Service engine loaded: corosync cluster closed process group serv...01 [2]
Sep 02 10:23:49 zabbix-ha-db02 corosync[17574]:   [QB    ] server name: cpg
Sep 02 10:23:49 zabbix-ha-db02 corosync[17574]:   [SERV  ] Service engine loaded: corosync profile loading service [4]
Sep 02 10:23:49 zabbix-ha-db02 corosync[17574]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Sep 02 10:23:49 zabbix-ha-db02 corosync[17574]:   [QB    ] server name: quorum
Sep 02 10:23:49 zabbix-ha-db02 corosync[17574]:   [TOTEM ] A new membership (127.0.0.1:5) was formed. Members joined: 2130706433
Sep 02 10:23:49 zabbix-ha-db02 corosync[17574]:   [CPG   ] downlist left_list: 0 received
Sep 02 10:23:49 zabbix-ha-db02 corosync[17574]:   [MAIN  ] Completed service synchronization, ready to provide service.
Sep 02 10:23:50 zabbix-ha-db02 corosync[17566]: Starting Corosync Cluster Engine (corosync): [  OK  ]
Sep 02 10:23:50 zabbix-ha-db02 systemd[1]: Started Corosync Cluster Engine.
Hint: Some lines were ellipsized, use -l to show in full.
[root@zabbix-ha-db02 corosync]# 


# Then authenticate the nodes again using the same password:
[root@zabbix-ha-db01 corosync]# pcs cluster auth zabbix-ha-db01 zabbix-ha-db02 zabbix-ha-db03 -u hacluster
Password: 
zabbix-ha-db01: Authorized
zabbix-ha-db03: Authorized
zabbix-ha-db02: Authorized
[root@zabbix-ha-db01 corosync]# 

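# The following steps are run on one node only; it does not matter which, because the nodes synchronize
# Create the database cluster and add resources. In this minimal setup our only resource is a VIP address for the DB cluster.
# Create the DB cluster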
[root@zabbix-ha-db01 ~]# pcs cluster setup --name zabbix_db_cluster \
zabbix-ha-db01 zabbix-ha-db02 zabbix-ha-db03 --force 
Destroying cluster on nodes: zabbix-ha-db01, zabbix-ha-db02, zabbix-ha-db03...
zabbix-ha-db01: Stopping Cluster (pacemaker)...
zabbix-ha-db02: Stopping Cluster (pacemaker)...
zabbix-ha-db03: Stopping Cluster (pacemaker)...
zabbix-ha-db01: Successfully destroyed cluster
zabbix-ha-db02: Successfully destroyed cluster
zabbix-ha-db03: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'zabbix-ha-db01', 'zabbix-ha-db02', 'zabbix-ha-db03'
zabbix-ha-db01: successful distribution of the file 'pacemaker_remote authkey'
zabbix-ha-db02: successful distribution of the file 'pacemaker_remote authkey'
zabbix-ha-db03: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
zabbix-ha-db01: Succeeded
zabbix-ha-db02: Succeeded
zabbix-ha-db03: Succeeded

Synchronizing pcsd certificates on nodes zabbix-ha-db01, zabbix-ha-db02, zabbix-ha-db03...
zabbix-ha-db01: Success
zabbix-ha-db03: Success
zabbix-ha-db02: Success
Restarting pcsd on the nodes in order to reload the certificates...
zabbix-ha-db01: Success
zabbix-ha-db03: Success
zabbix-ha-db02: Success
[root@zabbix-ha-db01 ~]#  

# Enable the cluster at boot
[root@zabbix-ha-db01 ~]# pcs cluster enable --all
zabbix-ha-db01: Cluster Enabled
zabbix-ha-db02: Cluster Enabled
zabbix-ha-db03: Cluster Enabled
[root@zabbix-ha-db01 ~]#

# Start the cluster
[root@zabbix-ha-db01 ~]# pcs cluster start --all
zabbix-ha-db01: Starting Cluster (corosync)...
zabbix-ha-db02: Starting Cluster (corosync)...
zabbix-ha-db03: Starting Cluster (corosync)...
zabbix-ha-db03: Starting Cluster (pacemaker)...
zabbix-ha-db02: Starting Cluster (pacemaker)...
zabbix-ha-db01: Starting Cluster (pacemaker)...
[root@zabbix-ha-db01 ~]# 

# Check the cluster status
[root@zabbix-ha-db01 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: zabbix-ha-db02 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
 Last updated: Thu Sep  2 10:27:38 2021
 Last change: Thu Sep  2 10:27:37 2021 by hacluster via crmd on zabbix-ha-db02
 3 nodes configured
 0 resource instances configured

PCSD Status:
  zabbix-ha-db03: Online
  zabbix-ha-db02: Online
  zabbix-ha-db01: Online
[root@zabbix-ha-db01 ~]# 

# Check the full status (nodes, resources, and daemons)
[root@zabbix-ha-db01 ~]# pcs status 
Cluster name: zabbix_db_cluster

WARNINGS:
No stonith devices and stonith-enabled is not false

Stack: corosync
Current DC: zabbix-ha-db02 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Thu Sep  2 10:27:58 2021
Last change: Thu Sep  2 10:27:37 2021 by hacluster via crmd on zabbix-ha-db02

3 nodes configured
0 resource instances configured

Online: [ zabbix-ha-db01 zabbix-ha-db02 zabbix-ha-db03 ]

No resources


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@zabbix-ha-db01 ~]#  

# Check the corosync ring status on this node
[root@zabbix-ha-db01 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 172.16.200.96
        status  = ring 0 active with no faults

# Check cluster membership and the quorum API
[root@zabbix-ha-db01 ~]# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(172.16.200.96) 
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(172.16.200.97) 
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
runtime.totem.pg.mrp.srp.members.3.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(172.16.200.98) 
runtime.totem.pg.mrp.srp.members.3.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.3.status (str) = joined

# Check the corosync status
[root@zabbix-ha-db01 ~]# pcs status corosync 

Membership information
----------------------
    Nodeid      Votes Name
         1          1 zabbix-ha-db01 (local)
         2          1 zabbix-ha-db02
         3          1 zabbix-ha-db03
[root@zabbix-ha-db01 ~]# 

# Create a virtual IP (VIP) resource
[root@zabbix-ha-db01 ~]# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
ip=172.16.200.89 op monitor interval=5s --group zabbix_db_cluster

# List the cluster resources
[root@zabbix-ha-db01 ~]# pcs resource show
 Resource Group: zabbix_db_cluster
     ClusterIP  (ocf::heartbeat:IPaddr2):       Stopped
[root@zabbix-ha-db01 ~]# pcs status
Cluster name: zabbix_db_cluster

WARNINGS:
No stonith devices and stonith-enabled is not false

Stack: corosync
Current DC: zabbix-ha-db02 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Thu Sep  2 10:30:47 2021
Last change: Thu Sep  2 10:30:22 2021 by root via cibadmin on zabbix-ha-db01

3 nodes configured
1 resource instance configured

Online: [ zabbix-ha-db01 zabbix-ha-db02 zabbix-ha-db03 ]

Full list of resources:

 Resource Group: zabbix_db_cluster
     ClusterIP  (ocf::heartbeat:IPaddr2):       Stopped

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

# Restart the cluster
[root@zabbix-ha-db01 ~]# pcs cluster stop --all && pcs cluster start --all
zabbix-ha-db03: Stopping Cluster (pacemaker)...
zabbix-ha-db02: Stopping Cluster (pacemaker)...
zabbix-ha-db01: Stopping Cluster (pacemaker)...
zabbix-ha-db01: Stopping Cluster (corosync)...
zabbix-ha-db03: Stopping Cluster (corosync)...
zabbix-ha-db02: Stopping Cluster (corosync)...
zabbix-ha-db01: Starting Cluster (corosync)...
zabbix-ha-db02: Starting Cluster (corosync)...
zabbix-ha-db03: Starting Cluster (corosync)...
zabbix-ha-db02: Starting Cluster (pacemaker)...
zabbix-ha-db03: Starting Cluster (pacemaker)...
zabbix-ha-db01: Starting Cluster (pacemaker)...
[root@zabbix-ha-db01 ~]# 

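# Add the firewall rule (only needed when firewalld is running; it was stopped in section 3.2, hence the message below)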
[root@zabbix-ha-db01 ~]# firewall-cmd --permanent --add-service=high-availability && firewall-cmd --reload
FirewallD is not running

# Prevent the VIP resource from moving back after a node recovers
[root@zabbix-ha-db01 ~]# pcs resource defaults resource-stickiness=100
Warning: Defaults do not apply to resources which override them with their own defined values

# If you are not using fencing, also disable STONITH
[root@zabbix-ha-db01 ~]# pcs property set stonith-enabled=false

# The cluster VIP has been created successfully
[root@zabbix-ha-db01 ~]# ping 172.16.200.89
PING 172.16.200.89 (172.16.200.89) 56(84) bytes of data.
64 bytes from 172.16.200.89: icmp_seq=1 ttl=64 time=0.026 ms
64 bytes from 172.16.200.89: icmp_seq=2 ttl=64 time=0.024 ms
64 bytes from 172.16.200.89: icmp_seq=3 ttl=64 time=0.022 ms
^C
--- 172.16.200.89 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.022/0.024/0.026/0.001 ms
[root@zabbix-ha-db01 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0c:29:01:c6:e5 brd ff:ff:ff:ff:ff:ff
    inet 172.16.200.96/24 brd 172.16.200.255 scope global noprefixroute ens192
       valid_lft forever preferred_lft forever
    inet 172.16.200.89/24 brd 172.16.200.255 scope global secondary ens192
       valid_lft forever preferred_lft forever
    inet6 fe80::2efa:23b1:e0af:4282/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
[root@zabbix-ha-db01 ~]#
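
Reading `ip a` by eye works for one node; to see quickly which node currently holds the VIP, the check can be scripted. The sketch below parses `ip a`-style text from standard input; `has_vip` is a hypothetical helper, and the sample text is taken from the output above.

```shell
#!/usr/bin/env bash
# Check whether an address appears as an "inet" entry in `ip a`-style text.
# has_vip is a hypothetical helper; it reads stdin so it can run on saved output.

has_vip() {
    local vip=$1
    awk -v vip="$vip" '
        $1 == "inet" { split($2, a, "/"); if (a[1] == vip) found = 1 }
        END { exit found ? 0 : 1 }
    '
}

sample='    inet 172.16.200.96/24 brd 172.16.200.255 scope global noprefixroute ens192
    inet 172.16.200.89/24 brd 172.16.200.255 scope global secondary ens192'

if printf '%s\n' "$sample" | has_vip 172.16.200.89; then
    echo "VIP is on this node"
else
    echo "VIP is elsewhere"
fi
```

On a live node the equivalent call is `ip a | has_vip 172.16.200.89`. The address is compared exactly, so 172.16.200.8 does not match 172.16.200.89.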

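Note: many of the common commands above are ones I added myself. They are used frequently when setting up the other two clusters, so I will not repeat them later; please refer back to this section.

Pacemaker also comes with a web management interface, the pcsd Web UI, served by default over HTTPS on port 2224 of a cluster node. Log in with the cluster account and password set earlier.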

image-20210902103341524

After logging in, no cluster is shown yet, simply because the cluster we created has not been added. Click Add Existing to add it.

image-20210902103523229

Enter the cluster's IP address and port number.

image-20210902103552520

Enter the cluster password to add it.

image-20210902103607760

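Once it has been added, click the cluster to open it. The current cluster is zabbix_db_cluster and it has three nodes. You can also see what the cluster is made of, whether each component is connected, and the uptime. Everything can be controlled from here, so you no longer need to type commands by hand.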

image-20210902103754531

Note: for more on configuring a high-availability cluster with the pcsd Web UI, see the official documentation: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/ch-pcsd-haar Set up the other two clusters in the same way.

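4.2、Deploy the database

Now we can deploy the database. Zabbix recommends MariaDB, but here we install MySQL 5.7. We install online with Yum: download the Yum repository package from the official MySQL site to the server. The latest repository package can install every MySQL version:

# Download the MySQL repository package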
[root@zabbix-ha-db01 ~]# wget https://repo.mysql.com//mysql80-community-release-el7-3.noarch.rpm
[root@zabbix-ha-db01 ~]# ls
anaconda-ks.cfg  ip=172.16.200.89  mysql80-community-release-el7-3.noarch.rpm  orce

# Install the MySQL repository package
[root@zabbix-ha-db01 ~]# rpm -Uvh mysql80-community-release-el7-3.noarch.rpm 
warning: mysql80-community-release-el7-3.noarch.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql80-community-release-el7-3  ################################# [100%]
[root@zabbix-ha-db01 ~]# 

# Refresh the repository metadata
[root@zabbix-ha-db01 ~]# yum makecache
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.ustc.edu.cn
 * extras: mirrors.ustc.edu.cn
 * updates: mirrors.ustc.edu.cn
......

After refreshing, run yum repolist again; the list now includes the MySQL repositories.

[root@zabbix-ha-db01 ~]# yum repolist
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.ustc.edu.cn
 * extras: mirrors.ustc.edu.cn
 * updates: mirrors.ustc.edu.cn
repo id                                    repo name                                   status
base/7/x86_64                              CentOS-7 - Base                             10,072
extras/7/x86_64                            CentOS-7 - Extras                           498
mariadb                                    MariaDB                                     93
mysql-connectors-community/x86_64          MySQL Connectors Community                  212
mysql-tools-community/x86_64               MySQL Tools Community                       132
mysql80-community/x86_64                   MySQL 8.0 Community Server                  283
updates/7/x86_64                           CentOS-7 - Updates                          2,741
repolist: 14,031
[root@zabbix-ha-db01 ~]# 

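By default, the sub-repository for the latest GA series (currently MySQL 8.0) is enabled and the sub-repositories for all other series (for example MySQL 5.7) are disabled. The following command lists every sub-repository in the MySQL Yum repository and shows which are enabled or disabled.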

[root@zabbix-ha-db01 ~]# yum repolist all | grep mysql
mysql-cluster-7.5-community/x86_64 MySQL Cluster 7.5 Community   disabled
mysql-cluster-7.5-community-source MySQL Cluster 7.5 Community - disabled
mysql-cluster-7.6-community/x86_64 MySQL Cluster 7.6 Community   disabled
mysql-cluster-7.6-community-source MySQL Cluster 7.6 Community - disabled
mysql-cluster-8.0-community/x86_64 MySQL Cluster 8.0 Community   disabled
mysql-cluster-8.0-community-source MySQL Cluster 8.0 Community - disabled
mysql-connectors-community/x86_64  MySQL Connectors Community    enabled:    212
mysql-connectors-community-source  MySQL Connectors Community -  disabled
mysql-tools-community/x86_64       MySQL Tools Community         enabled:    132
mysql-tools-community-source       MySQL Tools Community - Sourc disabled
mysql-tools-preview/x86_64         MySQL Tools Preview           disabled
mysql-tools-preview-source         MySQL Tools Preview - Source  disabled
mysql55-community/x86_64           MySQL 5.5 Community Server    disabled
mysql55-community-source           MySQL 5.5 Community Server -  disabled
mysql56-community/x86_64           MySQL 5.6 Community Server    disabled
mysql56-community-source           MySQL 5.6 Community Server -  disabled
mysql57-community/x86_64           MySQL 5.7 Community Server    disabled
mysql57-community-source           MySQL 5.7 Community Server -  disabled
mysql80-community/x86_64           MySQL 8.0 Community Server    enabled:    283
mysql80-community-source           MySQL 8.0 Community Server -  disabled
[root@zabbix-ha-db01 ~]# 

The output shows that every MySQL server series except mysql80-community/x86_64 is disabled. We want version 5.7, so we disable MySQL 8.0 and then enable MySQL 5.7.

# Disable MySQL 8.0
yum-config-manager --disable mysql80-community

# Enable MySQL 5.7
yum-config-manager --enable mysql57-community

# Note: if yum-config-manager fails with "-bash: yum-config-manager: command not found", install yum-utils first; the command belongs to that package, which is not installed by default.
yum -y install yum-utils

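Then run the following command to install MySQL 5.7:

# Install the MySQL server
yum install mysql-community-server -y

After the Yum installation, MySQL's default data directory is /var/lib/mysql. Starting the service with the commands below also initializes MySQL (during initialization MySQL generates its data files in /var/lib/mysql):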

[root@zabbix-ha-db01 ~]# cd /var/lib/mysql
[root@zabbix-ha-db01 mysql]# ls
[root@zabbix-ha-db01 mysql]# systemctl start mysqld
[root@zabbix-ha-db01 mysql]# ls
auto.cnf    client-cert.pem  ibdata1      ibtmp1      mysql.sock.lock     public_key.pem   sys
ca-key.pem  client-key.pem   ib_logfile0  mysql       performance_schema  server-cert.pem
ca.pem      ib_buffer_pool   ib_logfile1  mysql.sock  private_key.pem     server-key.pem
[root@zabbix-ha-db01 mysql]# 

# Enable MySQL at boot
[root@zabbix-ha-db01 mysql]# systemctl enable mysqld
[root@zabbix-ha-db01 mysql]# mysql --version
mysql  Ver 14.14 Distrib 5.7.35, for Linux (x86_64) using  EditLine wrapper

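# During initialization the MySQL server creates the 'root'@'localhost' superuser account, generates a temporary password for it, and writes that password to the error log. Look it up: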
[root@zabbix-ha-db01 mysql]# grep 'password' /var/log/mysqld.log 
2021-09-02T09:00:28.058857Z 1 [Note] A temporary password is generated for root@localhost: M>!yamC.8hjr
[root@zabbix-ha-db01 mysql]# 
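
When the same installation is repeated on three database nodes, pulling the temporary password out of the log by script avoids copy-and-paste mistakes. This is a minimal sketch: `temp_password` is a hypothetical helper, it assumes the password is the last whitespace-separated field of the log line, and the sample line uses a made-up password.

```shell
#!/usr/bin/env bash
# Extract the temporary root password from mysqld log text read on stdin.
# temp_password is a hypothetical helper; it assumes the password is the last field.

temp_password() {
    # Keep the most recent match in case the log holds several.
    awk '/A temporary password is generated/ { pw = $NF } END { if (pw != "") print pw }'
}

# Sample line in the same format as the log above, with a made-up password.
sample='2021-09-02T09:00:28.058857Z 1 [Note] A temporary password is generated for root@localhost: Example.Pass1'
printf '%s\n' "$sample" | temp_password
```

On a node the call would be `temp_password < /var/log/mysqld.log`.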

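After installation, check the MySQL version with mysql --version and log in with the temporary password found above. We also need to change that default password and then allow the user to connect remotely, as follows:

# Check the MySQL version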
[root@zabbix-ha-db01 mysql]# mysql --version
mysql  Ver 14.14 Distrib 5.7.35, for Linux (x86_64) using  EditLine wrapper
[root@zabbix-ha-db01 mysql]# mysql -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.35

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

# Change the root password
mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY '<CLUSTER_PASSWORD>';
Query OK, 0 rows affected (0.00 sec)

# Change the host of the root account to % so that root can connect remotely
mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> update user set host = "%" where user = "root";
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

# Reload the privilege tables
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> exit;
Bye
[root@zabbix-ha-db01 mysql]# 

Configuration file:

[root@zabbix-ha-db01 ~]# cat /etc/my.cnf 
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.7/en/server-configuration-defaults.html

[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock

# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

# ZABBIX specific settings and tuning
default-storage-engine          = InnoDB
innodb                          = FORCE
innodb_file_per_table           = 1
innodb_buffer_pool_size         = 3072M         # 50-75% of total RAM
innodb_buffer_pool_instances    = 8             # For MySQL 5.5 - 4, for 5.6+ - 8
innodb_flush_log_at_trx_commit  = 2
innodb_flush_method             = O_DIRECT
innodb_io_capacity              = 800           # HDD disks 500-800,    SSD disks - 2000
sync-binlog                     = 0
query-cache-size                = 0
# Adjust server_id and report_host for each node
server_id                       = 96             # for id settings IPs last number used
report_host                     = zabbix-ha-db01

log-slave-updates               = on
log_bin                         = /var/lib/mysql/log-bin
log_bin_index                   = /var/lib/mysql/log-bin.index
relay_log                       = /var/lib/mysql/relay-bin
relay_log_index                 = /var/lib/mysql/relay-bin.index
binlog_format                   = mixed
binlog_cache_size               = 64M
max_binlog_size                 = 1G
expire_logs_days                = 5
binlog_checksum                 = crc32
max_allowed_packet              = 500M
gtid_mode                       = on
enforce_gtid_consistency        = on
[root@zabbix-ha-db01 ~]# 

# Restart MySQL after changing the configuration
[root@zabbix-ha-db01 ~]# systemctl restart mysqld
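Only server_id and report_host differ between the three database nodes. As a small sketch of the convention stated in the configuration comment (server_id is the last number of the node's IP address), the hypothetical helper below prints the two node-specific lines; it does not edit /etc/my.cnf itself.

```shell
# Print the per-node replication settings for /etc/my.cnf.
# Arguments: <node IP> <node hostname>
node_replication_settings() {
    ip="$1"
    host="$2"
    # server_id must be unique on every node; following the comment in the
    # configuration above, use the last octet of the node's IP address.
    server_id="${ip##*.}"
    printf 'server_id   = %s\n' "$server_id"
    printf 'report_host = %s\n' "$host"
}
```

For example, node_replication_settings 172.16.200.97 zabbix-ha-db02 prints the two lines to use on the second node.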

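4.3、Configure the databases

Now let's bring back the database cluster architecture from earlier. To understand how MySQL replication works, look at this circular master-slave setup with three nodes: zabbix-ha-db01 (172.16.200.96) is the master of zabbix-ha-db02 (172.16.200.97), zabbix-ha-db02 is the master of zabbix-ha-db03 (172.16.200.98), and zabbix-ha-db03 is in turn the master of zabbix-ha-db01. We start with node zabbix-ha-db01: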

[Figure: circular replication among the three database nodes]

Configure node zabbix-ha-db01:

[root@zabbix-ha-db01 mysql]# mysql -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.35-log MySQL Community Server (GPL)

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> STOP SLAVE;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> SHOW MASTER STATUS\G
*************************** 1. row ***************************
             File: log-bin.000003
         Position: 154
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 
1 row in set (0.00 sec)

mysql> GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'172.16.200.97' identified by '<CLUSTER_PASSWORD>';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> SHOW MASTER STATUS\G
*************************** 1. row ***************************
             File: log-bin.000003      # the file name is needed later
         Position: 447                 # the position is needed later
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: fbb75b9e-0ca8-11ec-a951-000c2901c6e5:1
1 row in set (0.00 sec)

mysql> exit
Bye
[root@zabbix-ha-db01 mysql]# 
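# Note: record the log-bin file name and its position here. They are needed later as MASTER_LOG_FILE and MASTER_LOG_POS when configuring zabbix-ha-db02.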
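Copying the file name and position by hand is error-prone, so here is a minimal sketch that extracts both values from the SHOW MASTER STATUS\G output. The function name master_coordinates is hypothetical; it only parses text and assumes the vertical output format shown above.

```shell
# Read "SHOW MASTER STATUS\G" output on stdin and print "<file> <position>".
# Fails with a non-zero exit status if either field is missing.
master_coordinates() {
    awk '
        $1 == "File:"     { file = $2 }
        $1 == "Position:" { pos  = $2 }
        END { if (file != "" && pos != "") print file, pos; else exit 1 }
    '
}
```

One possible use is mysql -u root -p -e 'SHOW MASTER STATUS\G' | master_coordinates, which prints the two values on a single line.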

Repeat on node zabbix-ha-db02:

Note: here we log in to zabbix-ha-db02, stop the slave, point it at zabbix-ha-db01, and then set node 2 up as a master for the next node.

[root@zabbix-ha-db02 ~]# mysql -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.35-log MySQL Community Server (GPL)

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> STOP SLAVE;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> SHOW MASTER STATUS\G
*************************** 1. row ***************************
             File: log-bin.000002
         Position: 154
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 
1 row in set (0.00 sec)

mysql> CHANGE MASTER TO MASTER_HOST ='172.16.200.96', MASTER_USER = 'replicator', MASTER_PASSWORD = '<CLUSTER_PASSWORD>', MASTER_LOG_FILE = 'log-bin.000003', MASTER_LOG_POS = 447;
Query OK, 0 rows affected, 2 warnings (0.02 sec)

mysql> GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'172.16.200.98' identified by '<CLUSTER_PASSWORD>';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> RESET MASTER;
Query OK, 0 rows affected (0.01 sec)

mysql> START SLAVE;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW MASTER STATUS\G
*************************** 1. row ***************************
             File: log-bin.000001
         Position: 154
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 
1 row in set (0.00 sec)

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.200.96
                  Master_User: replicator
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: log-bin.000003
          Read_Master_Log_Pos: 447
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 318
        Relay_Master_Log_File: log-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 447
              Relay_Log_Space: 519
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 96
                  Master_UUID: fbb75b9e-0ca8-11ec-a951-000c2901c6e5
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

mysql> SHOW MASTER STATUS\G
*************************** 1. row ***************************
             File: log-bin.000001
         Position: 154
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 
1 row in set (0.00 sec)

mysql> exit
Bye
[root@zabbix-ha-db02 ~]# 
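The two lines that matter in the long SHOW SLAVE STATUS\G output are Slave_IO_Running and Slave_SQL_Running; both must be Yes. As a hedged sketch, the hypothetical function below reads that output and reports success only in that case.

```shell
# Read "SHOW SLAVE STATUS\G" output on stdin.
# Succeeds (exit status 0) only if both replication threads report "Yes".
slave_is_healthy() {
    awk '
        $1 == "Slave_IO_Running:"  { io  = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

It could be used as mysql -u root -p -e 'SHOW SLAVE STATUS\G' | slave_is_healthy && echo "replication OK" on each node.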

Repeat on node zabbix-ha-db03:

[root@zabbix-ha-db03 ~]# mysql -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.35-log MySQL Community Server (GPL)

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> STOP SLAVE;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> CHANGE MASTER TO MASTER_HOST = '172.16.200.97', MASTER_USER = 'replicator', MASTER_PASSWORD = '<CLUSTER_PASSWORD>', MASTER_LOG_FILE='log−bin.000001', MASTER_LOG_POS = 154;
Query OK, 0 rows affected, 2 warnings (0.02 sec)

mysql> SHOW MASTER STATUS\G
*************************** 1. row ***************************
             File: log-bin.000002
         Position: 154
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 
1 row in set (0.00 sec)

mysql> GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'172.16.200.96' identified by '<CLUSTER_PASSWORD>';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> RESET MASTER;
Query OK, 0 rows affected (0.01 sec)

mysql> START SLAVE;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 172.16.200.97
                  Master_User: replicator
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: log−bin.000001
          Read_Master_Log_Pos: 154
               Relay_Log_File: relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: log−bin.000001
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 154
              Relay_Log_Space: 154
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 97
                  Master_UUID: fafc6364-0ca8-11ec-870e-000c29ed0563
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 210903 19:51:54
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

mysql> SHOW MASTER STATUS\G
*************************** 1. row ***************************
             File: log-bin.000001
         Position: 154
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 
1 row in set (0.00 sec)

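# The output above shows that the slave I/O thread failed to read from the master (error 1236).
# Note that the MASTER_LOG_FILE value entered above is written with a Unicode minus sign (log−bin.000001)
# instead of an ASCII hyphen, so the master most likely could not find a file of that name.
# We stop the slave, run RESET SLAVE to discard the stored file name and position, and start it again.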
mysql> STOP SLAVE;
Query OK, 0 rows affected (0.00 sec)

mysql> RESET SLAVE;
Query OK, 0 rows affected (0.01 sec)

mysql> START SLAVE;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW SLAVE STATUS \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.200.97
                  Master_User: replicator
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: log-bin.000002
          Read_Master_Log_Pos: 154
               Relay_Log_File: relay-bin.000005
                Relay_Log_Pos: 363
        Relay_Master_Log_File: log-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 154
              Relay_Log_Space: 609
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 97
                  Master_UUID: fafc6364-0ca8-11ec-870e-000c29ed0563
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

mysql> show master status \G
*************************** 1. row ***************************
             File: log-bin.000001
         Position: 154
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 
1 row in set (0.00 sec)

mysql> 

mysql> exit
Bye
[root@zabbix-ha-db03 ~]# 
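The first CHANGE MASTER statement on zabbix-ha-db03 above was entered with 'log−bin.000001', where the dash is a Unicode minus sign (U+2212) rather than an ASCII hyphen. Such characters are easy to pick up when copying commands from a PDF or web page. The following sketch, with the hypothetical name is_ascii_binlog_name, checks a file name before it is used in CHANGE MASTER; the six-digit suffix is an assumption based on the file names shown in this article.

```shell
# Succeed only if the argument looks like a binary log file name made of
# plain ASCII characters, for example "log-bin.000001".
is_ascii_binlog_name() {
    # LC_ALL=C makes grep compare bytes, so multi-byte look-alikes such as
    # the Unicode minus sign (U+2212) cannot match the bracket expression.
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Za-z0-9_-]+\.[0-9]{6}$'
}
```

For example, is_ascii_binlog_name 'log-bin.000001' succeeds, while the same name typed with a Unicode minus sign is rejected.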

Repeat on node zabbix-ha-db01 to close the ring:

mysql> STOP SLAVE;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> CHANGE MASTER TO MASTER_HOST ='172.16.200.98', MASTER_USER = 'replicator', MASTER_PASSWORD = '<CLUSTER_PASSWORD>', MASTER_LOG_FILE='log-bin.000001', MASTER_LOG_POS =154;
Query OK, 0 rows affected, 2 warnings (0.01 sec)

mysql> START SLAVE;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.200.98
                  Master_User: replicator
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: log-bin.000001
          Read_Master_Log_Pos: 154
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 318
        Relay_Master_Log_File: log-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 154
              Relay_Log_Space: 519
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 98
                  Master_UUID: fb0672de-0ca8-11ec-8ba1-000c2995f4d6
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: fbb75b9e-0ca8-11ec-a951-000c2901c6e5:1
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

mysql> SHOW MASTER STATUS\G
*************************** 1. row ***************************
             File: log-bin.000004
         Position: 194
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: fbb75b9e-0ca8-11ec-a951-000c2901c6e5:1
1 row in set (0.00 sec)

mysql> exit
Bye
[root@zabbix-ha-db01 mysql]#
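To keep track of which node must point at which master, the ring can be written down mechanically. This is a small sketch under the assumption that the nodes are listed in ring order; the function name ring_masters is hypothetical and it only prints the pairing, it does not run any SQL.

```shell
# Print the replication source for every node of a replication ring.
# Arguments: node addresses in ring order. Each node replicates from its
# predecessor, and the first node replicates from the last one.
ring_masters() {
    # Walk the argument list once so that prev ends up as the last node.
    for prev in "$@"; do :; done
    for node in "$@"; do
        printf '%s replicates from %s\n' "$node" "$prev"
        prev=$node
    done
}
```

Running ring_masters 172.16.200.96 172.16.200.97 172.16.200.98 lists the same MASTER_HOST pairing used in the three CHANGE MASTER statements above.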

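The circular database cluster is now configured. We can connect to any of the three nodes and execute any SQL statement, and the statement is replicated to all other nodes. Next we create a database and verify that the cluster works.

4.4、Create the database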

[root@zabbix-ha-db01 mysql]# mysql -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.7.35-log MySQL Community Server (GPL)

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database zabbix character set utf8 collate utf8_bin;
Query OK, 1 row affected (0.00 sec)

mysql> grant all privileges on zabbix.* to zabbix@'%' identified by '<CLUSTER_PASSWORD>'; 
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| zabbix             |
+--------------------+
5 rows in set (0.00 sec)

mysql> exit
Bye
[root@zabbix-ha-db01 mysql]# 

# On the other two nodes, verify that the zabbix database exists
# Node 2
[root@zabbix-ha-db02 ~]# mysql -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 5.7.35-log MySQL Community Server (GPL)

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| zabbix             |
+--------------------+
5 rows in set (0.00 sec)

mysql> exit
Bye
[root@zabbix-ha-db02 ~]# 

# Node 3
[root@zabbix-ha-db03 ~]# mysql -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 5.7.35-log MySQL Community Server (GPL)

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| zabbix             |
+--------------------+
5 rows in set (0.01 sec)

mysql> exit
Bye
[root@zabbix-ha-db03 ~]# 

Commonly used commands:

SHOW BINARY LOGS;
mysql> SHOW BINARY LOGS;
+----------------+-----------+
| Log_name       | File_size |
+----------------+-----------+
| log-bin.000001 |       177 |
| log-bin.000002 |       177 |
| log-bin.000003 |       470 |
| log-bin.000004 |       683 |
+----------------+-----------+
4 rows in set (0.00 sec)
SHOW SLAVE STATUS;
SHOW MASTER STATUS\G
RESET MASTER; ## removes all binary log files that are listed in the index file, leaving
## only a single, empty binary log file with a numeric suffix of .000001
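RESET MASTER TO 1234; ## starts the new binary log at the given file index number instead of .000001;
## this form is MySQL 8.0 syntax and is not accepted by MySQL 5.7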
PURGE BINARY LOGS BEFORE '2019-10-11 00:20:00';
## Numbering is not reset, may be safely used while replication
## slaves are running.
FLUSH BINARY LOGS; ## Closes the current binary log file and opens the next one in sequence;
## numbering is not reset

5、Zabbix server cluster

5.1、Deploy the High Availability components

yum groupinstall 'High Availability' -y
# or
yum groupinstall ha -y

[root@zabbix-ha-srv01 ~]# systemctl enable pcsd corosync pacemaker --now
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/corosync.service to /usr/lib/systemd/system/corosync.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/pacemaker.service to /usr/lib/systemd/system/pacemaker.service.
[root@zabbix-ha-srv01 ~]# 

[root@zabbix-ha-srv01 corosync]# echo <CLUSTER_PASSWORD> | passwd --stdin hacluster
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
[root@zabbix-ha-srv01 corosync]# 

5.2、Deploy Zabbix server

[root@zabbix-ha-srv01 ~]# rpm -Uvh https://repo.zabbix.com/zabbix/5.0/rhel/7/x86_64/zabbix-release-5.0-1.el7.noarch.rpm
Retrieving https://repo.zabbix.com/zabbix/5.0/rhel/7/x86_64/zabbix-release-5.0-1.el7.noarch.rpm
warning: /var/tmp/rpm-tmp.cBNQ0c: Header V4 RSA/SHA512 Signature, key ID a14fe591: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:zabbix-release-5.0-1.el7         ################################# [100%]
[root@zabbix-ha-srv01 corosync]# yum clean all
Loaded plugins: fastestmirror
Cleaning repos: base extras updates zabbix zabbix-non-supported
Cleaning up list of fastest mirrors
[root@zabbix-ha-srv01 ~]# 

[root@zabbix-ha-srv01 ~]# yum install zabbix-server-mysql zabbix-agent -y
Loaded plugins: fastestmirror
Determining fastest mirrors
 * base: mirrors.163.com
 * extras: mirrors.aliyun.com
 * updates: mirrors.163.com
base                                                                                                          | 3.6 kB  00:00:00     
extras                        
......
Installed:
  zabbix-agent.x86_64 0:5.0.15-1.el7                zabbix-server-mysql.x86_64 0:5.0.15-1.el7                           
Dependency Installed:
  fping.x86_64 0:3.16-1.el7                         unixODBC.x86_64 0:2.3.1-14.el7                                 
Complete!
[root@zabbix-ha-srv01 ~]# 

[root@zabbix-ha-srv01 ~]# find / -name zabbix_server.conf
/etc/zabbix/zabbix_server.conf
[root@zabbix-ha-srv01 ~]# 
[root@zabbix-ha-srv01 ~]# grep -v '#' /etc/zabbix/zabbix_server.conf
SourceIP=172.16.200.87
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
PidFile=/var/run/zabbix/zabbix_server.pid
SocketDir=/var/run/zabbix
DBHost=172.16.200.89
DBName=zabbix
DBUser=zabbix
DBPassword=<CLUSTER_PASSWORD>
SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
Timeout=4
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
LogSlowQueries=3000
StatsAllowedIP=127.0.0.1
[root@zabbix-ha-srv01 ~]# 
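The grep -v '#' filter used above drops every line that contains a # anywhere (for example a password with a # in it) and keeps blank lines. A slightly stricter sketch is shown below; the function name effective_config is hypothetical.

```shell
# Print only the active settings of a configuration file: skip blank lines
# and lines whose first non-blank character is "#".
effective_config() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}
```

For example, effective_config /etc/zabbix/zabbix_server.conf prints the active settings, which makes it easy to compare the three server nodes.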


[root@zabbix-ha-srv01 ~]# pcs cluster setup --name zabbix_server_cluster \
> zabbix-ha-srv01 zabbix-ha-srv02 zabbix-ha-srv03 --force 
Destroying cluster on nodes: zabbix-ha-srv01, zabbix-ha-srv02, zabbix-ha-srv03...
zabbix-ha-srv02: Stopping Cluster (pacemaker)...
zabbix-ha-srv03: Stopping Cluster (pacemaker)...
zabbix-ha-srv01: Stopping Cluster (pacemaker)...
zabbix-ha-srv02: Successfully destroyed cluster
zabbix-ha-srv03: Successfully destroyed cluster
zabbix-ha-srv01: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'zabbix-ha-srv01', 'zabbix-ha-srv02', 'zabbix-ha-srv03'
zabbix-ha-srv03: successful distribution of the file 'pacemaker_remote authkey'
zabbix-ha-srv01: successful distribution of the file 'pacemaker_remote authkey'
zabbix-ha-srv02: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
Warning: Unable to set pcsd configs on zabbix-ha-srv01
Warning: Unable to set pcsd configs on zabbix-ha-srv02
Warning: Unable to set pcsd configs on zabbix-ha-srv03
zabbix-ha-srv01: Succeeded
zabbix-ha-srv02: Succeeded
zabbix-ha-srv03: Succeeded

Synchronizing pcsd certificates on nodes zabbix-ha-srv01, zabbix-ha-srv02, zabbix-ha-srv03...
zabbix-ha-srv03: Success
zabbix-ha-srv02: Success
zabbix-ha-srv01: Success
Restarting pcsd on the nodes in order to reload the certificates...
zabbix-ha-srv03: Success
zabbix-ha-srv01: Success
zabbix-ha-srv02: Success
[root@zabbix-ha-srv01 ~]#

[root@zabbix-ha-srv01 ~]# pcs cluster stop --all && pcs cluster start --all
zabbix-ha-srv03: Stopping Cluster (pacemaker)...
zabbix-ha-srv01: Stopping Cluster (pacemaker)...
zabbix-ha-srv02: Stopping Cluster (pacemaker)...
zabbix-ha-srv03: Stopping Cluster (corosync)...
zabbix-ha-srv01: Stopping Cluster (corosync)...
zabbix-ha-srv02: Stopping Cluster (corosync)...
zabbix-ha-srv01: Starting Cluster (corosync)...
zabbix-ha-srv02: Starting Cluster (corosync)...
zabbix-ha-srv03: Starting Cluster (corosync)...
zabbix-ha-srv03: Starting Cluster (pacemaker)...
zabbix-ha-srv02: Starting Cluster (pacemaker)...
zabbix-ha-srv01: Starting Cluster (pacemaker)...
[root@zabbix-ha-srv01 ~]# pcs cluster enable --all
zabbix-ha-srv01: Cluster Enabled
zabbix-ha-srv02: Cluster Enabled
zabbix-ha-srv03: Cluster Enabled
[root@zabbix-ha-srv01 ~]# pcs property set stonith-enabled=false
[root@zabbix-ha-srv01 ~]# pcs resource defaults resource-stickiness=100
Warning: Defaults do not apply to resources which override them with their own defined values
[root@zabbix-ha-srv01 ~]# pcs resource create virtual_ip_server ocf:heartbeat:IPaddr2 \
> ip=172.16.200.87 op monitor interval=5s --group zabbix_server_cluster
[root@zabbix-ha-srv01 ~]# pcs resource create ZabbixServer systemd:zabbix-server op monitor \
> interval=10s --group zabbix_server_cluster
[root@zabbix-ha-srv01 ~]# 
[root@zabbix-ha-srv01 ~]# pcs constraint colocation add virtual_ip_server ZabbixServer INFINITY --force
[root@zabbix-ha-srv01 ~]# pcs constraint order virtual_ip_server then ZabbixServer
Adding virtual_ip_server ZabbixServer (kind: Mandatory) (Options: first-action=start then-action=start)
[root@zabbix-ha-srv01 ~]# pcs resource op add ZabbixServer start interval=0s timeout=60s
start interval=0s timeout=100 (ZabbixServer-start-interval-0s)
[root@zabbix-ha-srv01 ~]# pcs resource op add ZabbixServer stop interval=0s timeout=120s
stop interval=0s timeout=100 (ZabbixServer-stop-interval-0s)
[root@zabbix-ha-srv01 ~]# 

5.3、Verify the cluster

[root@zabbix-ha-srv01 ~]# pcs status
Cluster name: zabbix_server_cluster
Stack: corosync
Current DC: NONE
Last updated: Sat Sep  4 09:43:14 2021
Last change: Sat Sep  4 09:25:48 2021 by root via cibadmin on zabbix-ha-srv02

3 nodes configured
2 resource instances configured

OFFLINE: [ zabbix-ha-srv01 zabbix-ha-srv02 zabbix-ha-srv03 ]

Full list of resources:

 Resource Group: zabbix_server_cluster
     virtual_ip_server  (ocf::heartbeat:IPaddr2):       Stopped
     ZabbixServer       (systemd:zabbix-server):        Stopped

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@zabbix-ha-srv01 ~]# 
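In the status above both resources are still reported as Stopped and the nodes as OFFLINE, so it is worth checking again until the resources have started. As a hedged sketch, the hypothetical function below reads pcs status output and prints every resource that is not Started. It assumes the layout shown above, where a resource line looks like "<name> (<agent>): <state>", and that a running resource is reported as "Started <node>".

```shell
# Read "pcs status" output on stdin and print the names of resources whose
# state is not "Started". A resource line is recognized by its second field,
# which is the agent in parentheses followed by a colon.
stopped_resources() {
    awk '$2 ~ /^\(.*\):$/ && $3 != "Started" { print $1 }'
}
```

One possible use is pcs status | stopped_resources, which prints nothing once every resource in the group is running.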

[root@zabbix-ha-srv01 ~]# netstat -nltp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      6507/rpcbind        
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      6803/sshd           
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      6972/master         
tcp        0      0 0.0.0.0:10051           0.0.0.0:*               LISTEN      19870/zabbix_server 
tcp6       0      0 :::111                  :::*                    LISTEN      6507/rpcbind        
tcp6       0      0 :::2224                 :::*                    LISTEN      6810/ruby           
tcp6       0      0 :::22                   :::*                    LISTEN      6803/sshd           
tcp6       0      0 ::1:25                  :::*                    LISTEN      6972/master         
tcp6       0      0 :::10051                :::*                    LISTEN      19870/zabbix_server 
[root@zabbix-ha-srv01 ~]# systemctl status zabbix-server
● zabbix-server.service - Cluster Controlled zabbix-server
   Loaded: loaded (/usr/lib/systemd/system/zabbix-server.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/zabbix-server.service.d
           └─50-pacemaker.conf
   Active: active (running) since Sat 2021-09-04 09:43:30 CST; 1min 6s ago
  Process: 19868 ExecStart=/usr/sbin/zabbix_server -c $CONFFILE (code=exited, status=0/SUCCESS)
 Main PID: 19870 (zabbix_server)
   CGroup: /system.slice/zabbix-server.service
           ├─19870 /usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf
           ├─19872 /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.077410 sec, idle 60 sec]
           ├─19873 /usr/sbin/zabbix_server: alert manager #1 [sent 0, failed 0 alerts, idle 5.009373 sec during 5.009490 sec]
           ├─19874 /usr/sbin/zabbix_server: alerter #1 started
           ├─19875 /usr/sbin/zabbix_server: alerter #2 started
           ├─19876 /usr/sbin/zabbix_server: alerter #3 started
           ├─19877 /usr/sbin/zabbix_server: preprocessing manager #1 [queued 0, processed 0 values, idle 5.005414 sec during 5.005...
           ├─19878 /usr/sbin/zabbix_server: preprocessing worker #1 started
           ├─19879 /usr/sbin/zabbix_server: preprocessing worker #2 started
           ├─19880 /usr/sbin/zabbix_server: preprocessing worker #3 started
           ├─19881 /usr/sbin/zabbix_server: lld manager #1 [processed 0 LLD rules, idle 5.955644sec during 5.955738 sec]
           ├─19882 /usr/sbin/zabbix_server: lld worker #1 started
           ├─19883 /usr/sbin/zabbix_server: lld worker #2 started
           ├─19884 /usr/sbin/zabbix_server: housekeeper [startup idle for 30 minutes]
           ├─19885 /usr/sbin/zabbix_server: timer #1 [updated 0 hosts, suppressed 0 events in 0.001238 sec, idle 59 sec]
           ├─19886 /usr/sbin/zabbix_server: http poller #1 [got 0 values in 0.000808 sec, idle 5 sec]
           ├─19887 /usr/sbin/zabbix_server: discoverer #1 [processed 0 rules in 0.001302 sec, idle 60 sec]
           ├─19888 /usr/sbin/zabbix_server: history syncer #1 [processed 0 values, 0 triggers in 0.000034 sec, idle 1 sec]
           ├─19889 /usr/sbin/zabbix_server: history syncer #2 [processed 2 values, 2 triggers in 0.002515 sec, idle 1 sec]
           ├─19890 /usr/sbin/zabbix_server: history syncer #3 [processed 0 values, 0 triggers in 0.000026 sec, idle 1 sec]
           ├─19891 /usr/sbin/zabbix_server: history syncer #4 [processed 0 values, 0 triggers in 0.000041 sec, idle 1 sec]
           ├─19892 /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.001882 sec, idle 3 sec]
           ├─19893 /usr/sbin/zabbix_server: proxy poller #1 [exchanged data with 0 proxies in 0.000045 sec, idle 5 sec]
           ├─19894 /usr/sbin/zabbix_server: self-monitoring [processed data in 0.000039 sec, idle 1 sec]
           ├─19895 /usr/sbin/zabbix_server: task manager [processed 0 task(s) in 0.000750 sec, idle 5 sec]
           ├─19896 /usr/sbin/zabbix_server: poller #1 [got 1 values in 0.000175 sec, idle 1 sec]
           ├─19897 /usr/sbin/zabbix_server: poller #2 [got 0 values in 0.000034 sec, idle 1 sec]
           ├─19898 /usr/sbin/zabbix_server: poller #3 [got 1 values in 0.000150 sec, idle 1 sec]
           ├─19899 /usr/sbin/zabbix_server: poller #4 [got 0 values in 0.000032 sec, idle 1 sec]
           ├─19900 /usr/sbin/zabbix_server: poller #5 [got 0 values in 0.000024 sec, idle 1 sec]
           ├─19901 /usr/sbin/zabbix_server: unreachable poller #1 [got 0 values in 0.000025 sec, idle 5 sec]
           ├─19902 /usr/sbin/zabbix_server: trapper #1 [processed data in 0.000000 sec, waiting for connection]
           ├─19903 /usr/sbin/zabbix_server: trapper #2 [processed data in 0.000000 sec, waiting for connection]
           ├─19904 /usr/sbin/zabbix_server: trapper #3 [processed data in 0.000000 sec, waiting for connection]
           ├─19905 /usr/sbin/zabbix_server: trapper #4 [processed data in 0.000000 sec, waiting for connection]
           ├─19906 /usr/sbin/zabbix_server: trapper #5 [processed data in 0.000000 sec, waiting for connection]
           ├─19907 /usr/sbin/zabbix_server: icmp pinger #1 [got 0 values in 0.000024 sec, idle 5 sec]
           └─19908 /usr/sbin/zabbix_server: alert syncer [queued 0 alerts(s), flushed 0 result(s) in 0.001344 sec, idle 1 sec]

Sep 04 09:43:30 zabbix-ha-srv01 systemd[1]: Starting Cluster Controlled zabbix-server...
Sep 04 09:43:30 zabbix-ha-srv01 systemd[1]: Started Cluster Controlled zabbix-server.
[root@zabbix-ha-srv01 ~]# 

6、Deploying the Frontend cluster

6.1、Installing the High Availability components

yum groupinstall 'High Availability' -y
# or
yum groupinstall ha -y

[root@zabbix-ha-fe01 ~]# systemctl enable pcsd corosync pacemaker --now
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/corosync.service to /usr/lib/systemd/system/corosync.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/pacemaker.service to /usr/lib/systemd/system/pacemaker.service.
[root@zabbix-ha-fe01 ~]# 

[root@zabbix-ha-fe01 ~]# echo <CLUSTER_PASSWORD> | passwd --stdin hacluster
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
[root@zabbix-ha-fe01 ~]#

6.2、Deploying the Frontend

[root@zabbix-ha-fe01 yum.repos.d]# rpm -Uvh https://repo.zabbix.com/zabbix/5.0/rhel/7/x86_64/zabbix-release-5.0-1.el7.noarch.rpm
Retrieving https://repo.zabbix.com/zabbix/5.0/rhel/7/x86_64/zabbix-release-5.0-1.el7.noarch.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:zabbix-release-5.0-1.el7         ################################# [100%]
[root@zabbix-ha-fe01 yum.repos.d]# yum clean all
Loaded plugins: fastestmirror
Cleaning repos: base extras updates zabbix zabbix-non-supported
Cleaning up list of fastest mirrors
Other repos take up 3.5 M of disk space (use --verbose for details)

[root@zabbix-ha-fe01 ~]# yum install centos-release-scl
Loaded plugins: fastestmirror
Determining fastest mirrors
 * base: mirrors.ustc.edu.cn
 * extras: mirrors.163.com
 * updates: mirrors.ustc.edu.cn
......
Installed:
  centos-release-scl.noarch 0:2-3.el7.centos                                                                                         
Dependency Installed:
  centos-release-scl-rh.noarch 0:2-3.el7.centos                                                                                      
Complete!
[root@zabbix-ha-fe01 ~]#

# Enable the frontend yum repository
[root@zabbix-ha-fe01 yum.repos.d]# vi /etc/yum.repos.d/zabbix.repo
[root@zabbix-ha-fe01 yum.repos.d]# cat /etc/yum.repos.d/zabbix.repo 
[zabbix]
......
[zabbix-frontend]
name=Zabbix Official Repository frontend - $basearch
baseurl=http://repo.zabbix.com/zabbix/5.0/rhel/7/$basearch/frontend
# change this to 1
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-ZABBIX-A14FE591
......
[root@zabbix-ha-fe01 yum.repos.d]# 
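
The same change has to be made on every frontend node, so it can be convenient to script it rather than edit the file in vi. The following is only a minimal sketch: it assumes the stock zabbix.repo ships the [zabbix-frontend] section with enabled=0, and the helper name enable_repo_section is hypothetical (it is not part of yum or Zabbix).

```shell
#!/usr/bin/env bash
# Sketch: enable one section of a yum .repo file without opening an editor.
# enable_repo_section is a hypothetical helper name used for illustration only.
enable_repo_section() {
    local repo_file="$1" section="$2"
    # Restrict the substitution to the lines between "[section]" and the next
    # "[...]" header, so other repositories in the file keep their setting.
    sed -i "/^\[${section}\]/,/^\[/ s/^enabled=0/enabled=1/" "$repo_file"
}

# Usage on a frontend node (as root):
#   enable_repo_section /etc/yum.repos.d/zabbix.repo zabbix-frontend
```

Limiting the edit to a single section matters here because zabbix.repo contains several repositories, and a global replacement of enabled=0 would switch on all of them.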

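# If the installation fails with errors like these, the PHP (rh-php72) dependencies are missing; they come from the SCL repository provided by centos-release-scl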
Error: Package: zabbix-web-deps-scl-5.0.15-1.el7.noarch (zabbix-frontend)
           Requires: rh-php72-php-ldap
Error: Package: zabbix-web-deps-scl-5.0.15-1.el7.noarch (zabbix-frontend)
           Requires: rh-php72-php-bcmath
Error: Package: zabbix-web-mysql-scl-5.0.15-1.el7.noarch (zabbix-frontend)
           Requires: rh-php72-php-mysqlnd
Error: Package: zabbix-web-deps-scl-5.0.15-1.el7.noarch (zabbix-frontend)
           Requires: rh-php72-php-fpm
Error: Package: zabbix-web-deps-scl-5.0.15-1.el7.noarch (zabbix-frontend)
           Requires: rh-php72
Error: Package: zabbix-web-deps-scl-5.0.15-1.el7.noarch (zabbix-frontend)
           Requires: rh-php72-php-xml
Error: Package: zabbix-web-deps-scl-5.0.15-1.el7.noarch (zabbix-frontend)
           Requires: rh-php72-php-mbstring
Error: Package: zabbix-web-deps-scl-5.0.15-1.el7.noarch (zabbix-frontend)
           Requires: rh-php72-php-gd
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

# PHP is not installed yet; the following command installs the frontend together with its rh-php72 dependencies
[root@zabbix-ha-fe01 ~]# yum install zabbix-web-mysql-scl zabbix-apache-conf-scl
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.ustc.edu.cn
 * centos-sclo-rh: mirrors.ustc.edu.cn
 * centos-sclo-sclo: mirrors.ustc.edu.cn
 * extras: mirrors.163.com
 * updates: mirrors.ustc.edu.cn
centos-sclo-rh                                                                                                | 3.0 kB  00:00:00     
centos-sclo-sclo                                                                                              | 3.0 kB  00:00:00     
zabbix                                                                                                        | 2.9 kB  00:00:00     
zabbix-frontend                                                                                               | 2.9 kB  00:00:00     
zabbix-non-supported                                                                                          | 2.9 kB  00:00:00     
(1/3): centos-sclo-sclo/x86_64/primary_db                                                                     | 300 kB  00:00:00     
(2/3): centos-sclo-rh/x86_64/primary_db                                                                       | 3.2 MB  00:00:00     
(3/3): zabbix-frontend/x86_64/primary_db                                                                      |  36 kB  00:00:00     
Resolving Dependencies
--> Running transaction check
Installed:
  zabbix-apache-conf-scl.noarch 0:5.0.15-1.el7                       zabbix-web-mysql-scl.noarch 0:5.0.15-1.el7                      

Dependency Installed:
  apr.x86_64 0:1.4.8-7.el7                    apr-util.x86_64 0:1.5.2-6.el7              dejavu-fonts-common.noarch 0:2.33-6.el7     
  dejavu-sans-fonts.noarch 0:2.33-6.el7       httpd.x86_64 0:2.4.6-97.el7.centos         httpd-tools.x86_64 0:2.4.6-97.el7.centos    
  libX11.x86_64 0:1.6.7-4.el7_9               libX11-common.noarch 0:1.6.7-4.el7_9       libXau.x86_64 0:1.0.8-2.1.el7               
  libXpm.x86_64 0:3.5.12-1.el7                libxcb.x86_64 0:1.13-1.el7                 mailcap.noarch 0:2.1.41-2.el7               
  rh-php72.x86_64 0:1-2.el7                   rh-php72-php-bcmath.x86_64 0:7.2.24-1.el7  rh-php72-php-cli.x86_64 0:7.2.24-1.el7      
  rh-php72-php-common.x86_64 0:7.2.24-1.el7   rh-php72-php-fpm.x86_64 0:7.2.24-1.el7     rh-php72-php-gd.x86_64 0:7.2.24-1.el7       
  rh-php72-php-json.x86_64 0:7.2.24-1.el7     rh-php72-php-ldap.x86_64 0:7.2.24-1.el7    rh-php72-php-mbstring.x86_64 0:7.2.24-1.el7 
  rh-php72-php-mysqlnd.x86_64 0:7.2.24-1.el7  rh-php72-php-pdo.x86_64 0:7.2.24-1.el7     rh-php72-php-pear.noarch 1:1.10.5-1.el7     
  rh-php72-php-process.x86_64 0:7.2.24-1.el7  rh-php72-php-xml.x86_64 0:7.2.24-1.el7     rh-php72-php-zip.x86_64 0:7.2.24-1.el7      
  rh-php72-runtime.x86_64 0:1-2.el7           scl-utils.x86_64 0:20130529-19.el7         zabbix-web.noarch 0:5.0.15-1.el7            
  zabbix-web-deps-scl.noarch 0:5.0.15-1.el7  

Complete!
[root@zabbix-ha-fe01 ~]#


# Template file: /usr/share/zabbix/conf/zabbix.conf.php.example
[root@zabbix-ha-fe01 ~]# cat /etc/zabbix/web/zabbix.conf.php
<?php
// Zabbix GUI configuration file.

# Settings that need to be changed
$DB['TYPE']                             = 'MYSQL';
$DB['SERVER']                   = '172.16.200.89';
$DB['PORT']                             = '0';
$DB['DATABASE']                 = 'zabbix';
$DB['USER']                             = 'zabbix';
$DB['PASSWORD']                 = '<CLUSTER_PASSWORD>';

// Schema name. Used for PostgreSQL.
$DB['SCHEMA']                   = '';

// Used for TLS connection.
$DB['ENCRYPTION']               = false;
$DB['KEY_FILE']                 = '';
$DB['CERT_FILE']                = '';
$DB['CA_FILE']                  = '';
$DB['VERIFY_HOST']              = false;
$DB['CIPHER_LIST']              = '';

// Use IEEE754 compatible value range for 64-bit Numeric (float) history values.
// This option is enabled by default for new Zabbix installations.
// For upgraded installations, please read database upgrade notes before enabling this option.
$DB['DOUBLE_IEEE754']   = true;

# Settings that need to be changed
$ZBX_SERVER                             = '172.16.200.87';
$ZBX_SERVER_PORT                = '10051';
$ZBX_SERVER_NAME                = 'ZABBIX-HA-APP';

$IMAGE_FORMAT_DEFAULT   = IMAGE_FORMAT_PNG;

// Uncomment this block only if you are using Elasticsearch.
// Elasticsearch url (can be string if same url is used for all types).
//$HISTORY['url'] = [
//      'uint' => 'http://localhost:9200',
//      'text' => 'http://localhost:9200'
//];
// Value types stored in Elasticsearch.
//$HISTORY['types'] = ['uint', 'text'];

// Used for SAML authentication.
// Uncomment to override the default paths to SP private key, SP and IdP X.509 certificates, and to set extra settings.
//$SSO['SP_KEY']                        = 'conf/certs/sp.key';
//$SSO['SP_CERT']                       = 'conf/certs/sp.crt';
//$SSO['IDP_CERT']              = 'conf/certs/idp.crt';
//$SSO['SETTINGS']              = [];


[root@zabbix-ha-fe01 ~]# cat /etc/httpd/conf.d/serverstatus.conf
Listen 127.0.0.1:8080
<VirtualHost localhost:8080>
<Location /server-status>
RewriteEngine Off
SetHandler server-status
Allow from 127.0.0.1
Order deny,allow
Deny from all
</Location>
</VirtualHost>
[root@zabbix-ha-fe01 ~]# 


## set apache to listen only on VIP
vi /etc/httpd/conf/httpd.conf +/^Listen 80
## change to:
...
Listen 172.16.200.88:80
...
## Or...
sed -i 's/^Listen 80/Listen 172.16.200.88:80/' /etc/httpd/conf/httpd.conf
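
Because the Listen change must be applied identically on all three frontend nodes, a repeatable version of this step may be useful. The sketch below assumes httpd.conf still contains the distribution's bare "Listen 80" line; the helper name set_listen_vip is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: bind Apache to the cluster VIP only, safe to run more than once.
# set_listen_vip is a hypothetical helper name used for illustration only.
set_listen_vip() {
    local conf="$1" vip="$2"
    # Rewrite only the bare "Listen 80" line. Once rewritten, the line no
    # longer matches, so a second run leaves the file unchanged.
    sed -i "s/^Listen 80\$/Listen ${vip}:80/" "$conf"
    # Return success only if the expected directive is now present.
    grep -qxF "Listen ${vip}:80" "$conf"
}

# Usage on each frontend node (as root):
#   set_listen_vip /etc/httpd/conf/httpd.conf 172.16.200.88
```

The return code of the final grep makes it easy to stop a provisioning script early when the file did not contain the expected line.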


# Continue with the cluster setup

[root@zabbix-ha-fe01 ~]# pcs cluster auth zabbix-ha-fe01 zabbix-ha-fe02 zabbix-ha-fe03
Username: hacluster
Password: 
zabbix-ha-fe01: Authorized
zabbix-ha-fe02: Authorized
zabbix-ha-fe03: Authorized
[root@zabbix-ha-fe01 ~]# 


## Create zabbix_frontend_cluster:
pcs cluster setup --name zabbix_fe_cluster \
zabbix-ha-fe01 zabbix-ha-fe02 zabbix-ha-fe03 --force --start
[root@zabbix-ha-fe01 ~]# pcs cluster setup --name zabbix_fe_cluster \
> zabbix-ha-fe01 zabbix-ha-fe02 zabbix-ha-fe03 --force --start
Destroying cluster on nodes: zabbix-ha-fe01, zabbix-ha-fe02, zabbix-ha-fe03...
zabbix-ha-fe02: Stopping Cluster (pacemaker)...
zabbix-ha-fe01: Stopping Cluster (pacemaker)...
zabbix-ha-fe03: Stopping Cluster (pacemaker)...
zabbix-ha-fe03: Successfully destroyed cluster
zabbix-ha-fe02: Successfully destroyed cluster
zabbix-ha-fe01: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'zabbix-ha-fe01', 'zabbix-ha-fe02', 'zabbix-ha-fe03'
zabbix-ha-fe01: successful distribution of the file 'pacemaker_remote authkey'
zabbix-ha-fe02: successful distribution of the file 'pacemaker_remote authkey'
zabbix-ha-fe03: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
zabbix-ha-fe01: Succeeded
zabbix-ha-fe02: Succeeded
zabbix-ha-fe03: Succeeded

Starting cluster on nodes: zabbix-ha-fe01, zabbix-ha-fe02, zabbix-ha-fe03...
zabbix-ha-fe01: Starting Cluster (corosync)...
zabbix-ha-fe02: Starting Cluster (corosync)...
zabbix-ha-fe03: Starting Cluster (corosync)...
zabbix-ha-fe02: Starting Cluster (pacemaker)...
zabbix-ha-fe03: Starting Cluster (pacemaker)...
zabbix-ha-fe01: Starting Cluster (pacemaker)...

Synchronizing pcsd certificates on nodes zabbix-ha-fe01, zabbix-ha-fe02, zabbix-ha-fe03...
zabbix-ha-fe01: Success
zabbix-ha-fe02: Success
zabbix-ha-fe03: Success
Restarting pcsd on the nodes in order to reload the certificates...
zabbix-ha-fe01: Success
zabbix-ha-fe02: Success
zabbix-ha-fe03: Success
[root@zabbix-ha-fe01 ~]# 


[root@zabbix-ha-fe01 ~]# pcs cluster stop --all && pcs cluster start --all
zabbix-ha-fe02: Stopping Cluster (pacemaker)...
zabbix-ha-fe01: Stopping Cluster (pacemaker)...
zabbix-ha-fe03: Stopping Cluster (pacemaker)...
zabbix-ha-fe02: Stopping Cluster (corosync)...
zabbix-ha-fe01: Stopping Cluster (corosync)...
zabbix-ha-fe03: Stopping Cluster (corosync)...
zabbix-ha-fe01: Starting Cluster (corosync)...
zabbix-ha-fe02: Starting Cluster (corosync)...
zabbix-ha-fe03: Starting Cluster (corosync)...
zabbix-ha-fe02: Starting Cluster (pacemaker)...
zabbix-ha-fe01: Starting Cluster (pacemaker)...
zabbix-ha-fe03: Starting Cluster (pacemaker)...
[root@zabbix-ha-fe01 ~]# 

[root@zabbix-ha-fe01 ~]# pcs cluster enable --all
zabbix-ha-fe01: Cluster Enabled
zabbix-ha-fe02: Cluster Enabled
zabbix-ha-fe03: Cluster Enabled
[root@zabbix-ha-fe01 ~]#

[root@zabbix-ha-fe01 ~]# pcs property set stonith-enabled=false
[root@zabbix-ha-fe01 ~]# pcs resource create virtual_ip_fe ocf:heartbeat:IPaddr2 ip=172.16.200.88 \
> op monitor interval=5s --group zabbix_fe_cluster
[root@zabbix-ha-fe01 ~]# pcs resource create zabbix_fe ocf:heartbeat:apache \
> configfile=/etc/httpd/conf/httpd.conf \
> statusurl="http://localhost:8080/server-status" op \
> monitor interval=30s --group zabbix_fe_cluster
[root@zabbix-ha-fe01 ~]# 

[root@zabbix-ha-fe01 ~]# pcs resource show
 Resource Group: zabbix_fe_cluster
     virtual_ip_fe      (ocf::heartbeat:IPaddr2):       Started zabbix-ha-fe01
     zabbix_fe  (ocf::heartbeat:apache):        Started zabbix-ha-fe01
[root@zabbix-ha-fe01 ~]# 

[root@zabbix-ha-fe01 ~]# pcs constraint colocation add virtual_ip_fe zabbix_fe INFINITY
[root@zabbix-ha-fe01 ~]# pcs constraint order virtual_ip_fe then zabbix_fe
Adding virtual_ip_fe zabbix_fe (kind: Mandatory) (Options: first-action=start then-action=start)
[root@zabbix-ha-fe01 ~]# pcs resource defaults resource-stickiness=100
Warning: Defaults do not apply to resources which override them with their own defined values
[root@zabbix-ha-fe01 ~]# pcs resource op add zabbix_fe start interval=0s timeout=60s
Error: operation start with interval 0s already specified for zabbix_fe:
start interval=0s timeout=40s (zabbix_fe-start-interval-0s)
[root@zabbix-ha-fe01 ~]# pcs resource op add zabbix_fe stop interval=0s timeout=120s
Error: operation stop with interval 0s already specified for zabbix_fe:
stop interval=0s timeout=60s (zabbix_fe-stop-interval-0s)
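
The two errors above occur because the start and stop operations already exist on zabbix_fe (with timeouts of 40s and 60s), so `pcs resource op add` refuses to add them a second time. One way to change the existing timeouts is `pcs resource update ... op ...`. The following is a sketch rather than part of the original procedure: the helper name set_op_timeouts is hypothetical, and the PCS variable exists only so that the commands can be previewed with PCS=echo before they touch a live cluster.

```shell
#!/usr/bin/env bash
# Sketch: change the timeouts of operations that already exist on a resource.
# PCS defaults to the real pcs binary; set PCS=echo to preview the commands.
PCS="${PCS:-pcs}"

# set_op_timeouts is a hypothetical helper name used for illustration only.
set_op_timeouts() {
    local resource="$1" start_timeout="$2" stop_timeout="$3"
    # "resource update ... op" modifies the existing operation definition
    # instead of adding a duplicate, which is what "op add" rejected.
    "$PCS" resource update "$resource" op start interval=0s timeout="$start_timeout"
    "$PCS" resource update "$resource" op stop interval=0s timeout="$stop_timeout"
}

# Usage on any cluster node (as root):
#   set_op_timeouts zabbix_fe 60s 120s
```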


[root@zabbix-ha-fe01 web]# cat /etc/opt/rh/rh-php72/php-fpm.d/zabbix.conf
[zabbix]
user = apache
group = apache

listen = /var/opt/rh/rh-php72/run/php-fpm/zabbix.sock
listen.acl_users = apache
listen.allowed_clients = 127.0.0.1

pm = dynamic
pm.max_children = 50
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 35

php_value[session.save_handler] = files
php_value[session.save_path]    = /var/opt/rh/rh-php72/lib/php/session/

php_value[max_execution_time] = 300
php_value[memory_limit] = 128M
php_value[post_max_size] = 16M
php_value[upload_max_filesize] = 2M
php_value[max_input_time] = 300
php_value[max_input_vars] = 10000
php_value[date.timezone] = Asia/Shanghai
[root@zabbix-ha-fe01 web]# systemctl restart rh-php72-php-fpm
[root@zabbix-ha-fe01 web]# 

6.3、Verifying the cluster

[root@zabbix-ha-fe01 zabbix]# pcs cluster status --all
Cluster Status:
 Stack: corosync
 Current DC: zabbix-ha-fe02 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
 Last updated: Mon Sep  6 21:55:51 2021
 Last change: Mon Sep  6 19:15:59 2021 by root via cibadmin on zabbix-ha-fe01
 3 nodes configured
 2 resource instances configured

PCSD Status:
  zabbix-ha-fe03: Online
  zabbix-ha-fe02: Online
  zabbix-ha-fe01: Online
[root@zabbix-ha-fe01 zabbix]# pcs resource show
 Resource Group: zabbix_fe_cluster
     virtual_ip_fe      (ocf::heartbeat:IPaddr2):       Started zabbix-ha-fe02
     zabbix_fe  (ocf::heartbeat:apache):        Started zabbix-ha-fe02
[root@zabbix-ha-fe01 zabbix]# netstat -nltp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:9000          0.0.0.0:*               LISTEN      6801/php-fpm: maste
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      6509/rpcbind
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      6803/sshd
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      6972/master
tcp6       0      0 :::111                  :::*                    LISTEN      6509/rpcbind
tcp6       0      0 :::2224                 :::*                    LISTEN      6839/ruby
tcp6       0      0 :::22                   :::*                    LISTEN      6803/sshd
tcp6       0      0 ::1:25                  :::*                    LISTEN      6972/master
[root@zabbix-ha-fe01 zabbix]# 
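
The status output only shows that the cluster is healthy at this moment; it does not prove that the resource group really moves when a node drops out. A simple drill is to put the node that currently holds the group into standby and watch where the resources go. The sketch below assumes the pcs 0.9 syntax used throughout this article (`pcs cluster standby` / `pcs cluster unstandby`); the helper name failover_drill and the settle time are hypothetical, and PCS is a variable only so that the sequence can be previewed with PCS=echo.

```shell
#!/usr/bin/env bash
# Sketch: failover drill for the frontend resource group.
# PCS defaults to the real pcs binary; set PCS=echo to preview the commands.
PCS="${PCS:-pcs}"

# failover_drill is a hypothetical helper name used for illustration only.
failover_drill() {
    local node="$1" settle_seconds="${2:-10}"
    # Standby forces Pacemaker to move this node's resources elsewhere.
    "$PCS" cluster standby "$node"
    # Give the cluster a moment to stop and restart the group.
    sleep "$settle_seconds"
    # The group should now be reported as Started on a different node.
    "$PCS" resource show
    # Return the node to service afterwards.
    "$PCS" cluster unstandby "$node"
}

# Usage (as root), with the node that currently runs zabbix_fe_cluster:
#   failover_drill zabbix-ha-fe02
```

With resource-stickiness set to 100 as above, the group is expected to stay on its new node after unstandby rather than move back.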

7、Deploying the agent (client)

[root@zabbix-ha-srv01 ~]# grep -v '#' /etc/zabbix/zabbix_agentd.conf
PidFile=/var/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
Server=172.16.200.87
ServerActive=172.16.200.87
Hostname=zabbix-ha-srv01
Include=/etc/zabbix/zabbix_agentd.d/*.conf
UnsafeUserParameters=1
[root@zabbix-ha-srv01 ~]# 
[root@zabbix-ha-srv02 zabbix]# systemctl enable zabbix-agent
Created symlink from /etc/systemd/system/multi-user.target.wants/zabbix-agent.service to /usr/lib/systemd/system/zabbix-agent.service.
[root@zabbix-ha-srv02 zabbix]# systemctl start zabbix-agent 
[root@zabbix-ha-srv02 zabbix]# netstat -nltp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      7130/sshd
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      7957/master
tcp        0      0 0.0.0.0:10050           0.0.0.0:*               LISTEN      9944/zabbix_agentd
tcp6       0      0 :::2224                 :::*                    LISTEN      26411/ruby
tcp6       0      0 :::22                   :::*                    LISTEN      7130/sshd
tcp6       0      0 ::1:25                  :::*                    LISTEN      7957/master
tcp6       0      0 :::10050                :::*                    LISTEN      9944/zabbix_agentd
[root@zabbix-ha-srv02 zabbix]# 
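
The three settings that differ from the packaged agent configuration are Server, ServerActive and Hostname: the first two point at the Zabbix server VIP (172.16.200.87), and Hostname is the name of the node itself. A hedged sketch of setting them without an editor follows. It assumes the packaged zabbix_agentd.conf already contains uncommented Server=, ServerActive= and Hostname= lines; the helper name configure_agent is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: point a Zabbix agent at the server VIP and set its host name.
# configure_agent is a hypothetical helper name used for illustration only.
configure_agent() {
    local conf="$1" server_vip="$2" host_name="$3"
    # Each expression rewrites one uncommented setting; commented-out example
    # lines (starting with "#") are left untouched because of the ^ anchor.
    sed -i \
        -e "s/^Server=.*/Server=${server_vip}/" \
        -e "s/^ServerActive=.*/ServerActive=${server_vip}/" \
        -e "s/^Hostname=.*/Hostname=${host_name}/" \
        "$conf"
}

# Usage on a monitored node (as root), followed by a restart of the agent:
#   configure_agent /etc/zabbix/zabbix_agentd.conf 172.16.200.87 zabbix-ha-srv02
#   systemctl restart zabbix-agent
```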

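Open a browser and go to http://172.16.200.88/zabbix/ to reach the Zabbix frontend. Everything from here on is day-to-day Zabbix operation, so it is not covered in this article; readers who get stuck can search online.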


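That completes the deployment of the zabbix 333 ha cluster recommended by the Zabbix team. Some features are still not implemented here, such as one-click resource migration and a zabbix management console. These have to be built with your own scripts; interested readers can search for the details on their own, as they are not described further here.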
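
As a starting point for the resource migration script mentioned above, pcs already provides `pcs resource move` and `pcs resource clear`. The sketch below only wraps these two commands and is not the management console from the official material: the helper names move_group and release_group are hypothetical, and PCS is a variable only so that the commands can be previewed with PCS=echo.

```shell
#!/usr/bin/env bash
# Sketch: manual migration of a resource group between cluster nodes.
# PCS defaults to the real pcs binary; set PCS=echo to preview the commands.
PCS="${PCS:-pcs}"

# move_group and release_group are hypothetical helper names.
move_group() {
    local group="$1" target_node="$2"
    # "resource move" adds a location constraint that pins the group to the
    # target node, which makes Pacemaker relocate it there.
    "$PCS" resource move "$group" "$target_node"
}

release_group() {
    local group="$1"
    # "resource clear" removes the constraint created by "resource move",
    # so the cluster is free to place the group on any node again.
    "$PCS" resource clear "$group"
}

# Usage on any cluster node (as root):
#   move_group zabbix_fe_cluster zabbix-ha-fe03
#   release_group zabbix_fe_cluster
```

Clearing the constraint after the move is worth keeping as a separate, explicit step: if it is forgotten, the group stays pinned to the target node and cannot fail over away from it by preference.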
