
Describing Pacemaker Architecture and Components

Objectives

After completing this section, you should be able to describe the components of a Pacemaker cluster, including fencing, resource constraints, and cluster status.

Selecting a High Availability Cluster Configuration

In this section, you select the required cluster type for your environment.

Introducing High Availability Cluster Configurations

A cluster is a set of computers that work together on a single task. Which task is performed, and how that task is performed, differ from cluster to cluster.

The goal of a high availability cluster, also known as an HA cluster, is to keep the services that it runs as available as possible by eliminating bottlenecks and single points of failure. This goal is primarily achieved by the nodes of the high availability cluster monitoring each other for failures, and migrating services to a node that is still considered "healthy" when a service or node fails. This strategy is different from trying to keep the uptime of a single machine as high as possible: to consumers, the uptime of the server that runs the service does not matter, but the availability of the service does.

High availability cluster configurations can be grouped into two subsets:

  • In Active-Active high availability clusters, a service runs on multiple nodes at the same time, which leads to shorter failover times. The main goal of this cluster type is load balancing, so it can scale to many instances that handle higher loads; however, it requires load balancer devices. Active-active HA clusters can be as small as two nodes. If one node fails, then the cluster redirects all the workload to the remaining nodes. When the failing node recovers, the cluster again distributes the workload across the available nodes. In the context of SAP HANA, this setup is known as an Active-Active (Read-enabled) configuration, where the secondary node can accept "Read Only" queries. This setup is covered in more detail later in this course.

Figure 2.2: Active-active high availability cluster configuration
  • In Active-passive high availability clusters, a service runs on only one node at a time. If that node fails, the cluster starts the service on another node to replace it. This configuration requires fencing to restrict the access of an unresponsive node to the cluster resources, and so avoid data corruption in the cluster. Fencing is covered in more detail later in this course.

Figure 2.3: Active-passive high availability cluster configuration

High availability clusters often support mission-critical services in the enterprise. Examples of software that implements high availability clustering are Pacemaker and the Red Hat High Availability Add-On.

System administrators must decide which cluster configuration best fits their requirements. For example, many common applications already have service redundancy built in, because their specific requirements are best handled that way. Examples include LDAP with primary/secondary replication, or databases with multiprimary partitioning and scaling. These applications therefore do not need the Red Hat High Availability Add-On to provide service redundancy. However, you can still use such services as resources when you set up resource groups for other services.

When to Use the High Availability Add-On for Clustering

When planning a high availability cluster, one important question must be answered: will placing the service on an HA cluster increase its availability?

To answer the question, it is important to know the capabilities of the service, and how the clients of the service can be configured:

Depending on the solution, services such as DNS and LDAP with built-in failover or load balancing might not benefit from being placed on an HA cluster. For example, the DNS or LDAP services can use multiple servers with a primary/secondary or multiprimary relation, and can be configured to replicate data between the primary and secondary servers. Clients of DNS and LDAP can use multiple servers. A primary/secondary or multiprimary configuration involves less failover delay, so the availability of the service does not increase when it is placed on an HA cluster. However, within a Red Hat OpenStack Platform solution, it might be advantageous to put resources such as RabbitMQ and Galera in an HA cluster. Further information about high availability in Red Hat OpenStack Platform is outside the scope of this course.

Services without built-in failover or load balancing can benefit from a high availability cluster configuration. Examples include services such as NFS and Samba.

Not every availability problem can be solved with high availability clustering. Typically, problems that involve application crashes or network failures are not solved by a high availability cluster:

If a bug causes an application to crash when certain input is read, then it will still crash if it is part of a high availability cluster. In this case, the cluster fails over the service to a different node, but if the same input is read again, then the application fails again.

High availability clusters do not provide end-to-end redundancy. Although the cluster itself might be fully operational, if a network error in the infrastructure causes the cluster to be unreachable, then the clients cannot reach the service, even though the service runs on a high availability cluster. Therefore, it is important to consider the cluster's architecture, and design it to avoid single points of failure throughout the deployment. Trade-offs apply here, and cluster architects must consider what level of risk they are prepared to tolerate with each component of the cluster.

The cluster might fail over the service if the primary system becomes unresponsive or too slow for too long because of resource starvation (any combination of CPU, RAM, or I/O) caused by an application or database query. When the system returns to normal, the cluster might fail over the service again if the same query runs and causes a similar resource shortage. In this case too, the cluster itself maintains high availability with the available nodes, but it cannot control the application or database behavior, or restrict their resource usage, to avoid such a failover.

Components of a High Availability Cluster

A high availability cluster uses various concepts and techniques for service integrity and availability.

Resources and Resource Groups In clustering terminology, the basic unit of work is called a resource. An IP address, a file system, and a database are all examples of resources. Typically, relationships between these resources are defined to create user-facing services. Commonly, these relationships are defined by combining a set of resources into a group. A group specifies that all resources in the group must run together on the same node, and establishes a fixed (linear) start and stop order.

For example, for a cluster to provide a web server service, you must set up a web server daemon, the data that the server must share, and the IP address that is assigned to the service. All these resources must be available on the same cluster node, and thus you should combine them into a resource group.
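
As an illustration only, the following pcs commands sketch how such a web server group might be created. The resource and group names, the IP address, and the storage device and mount point are hypothetical placeholders; the exact parameters depend on your environment.

  # Floating IP address for the service (address and netmask are placeholders)
  pcs resource create webserver_vip IPaddr2 ip=192.168.0.80 cidr_netmask=24 --group webgroup

  # File system that holds the web content (device and mount point are placeholders)
  pcs resource create webserver_fs Filesystem device=/dev/sdb1 directory=/var/www/html fstype=xfs --group webgroup

  # Apache web server daemon
  pcs resource create webserver_httpd apache --group webgroup

Because the resources are added to the same group in this order, the cluster starts the IP address first, then the file system, and the web server last, and stops them in the reverse order.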

Describing Failover High availability clusters try to keep services available by migrating them to another node when the cluster notices that the original node that ran the service is not responding; this situation is called a failover.
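
One common way to exercise this behavior without unplugging anything is to put a node into standby mode. Standby triggers a controlled migration rather than a real failure, but it uses the same relocation logic. The node name below is a hypothetical placeholder.

  pcs node standby node1.example.com      # the node stops hosting resources; they fail over elsewhere
  pcs status                              # verify where the resources are now running
  pcs node unstandby node1.example.com    # allow the node to host resources again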

Describing Fencing Fencing is a mechanism to ensure that a malfunctioning cluster node cannot cause corruption on the cluster, because the node could still have access to the cluster resources. Fencing also enables the cluster to recover its resources safely on another node. This approach is necessary, because you cannot assume that an unreachable node is off. Fencing is often accomplished by powering off the node, because a dead node cannot do anything. In other cases, a combination of operations is used to cut off the node from the network (to stop the cluster from allocating resources on that node) or from storage (to stop the node from writing to shared storage).
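
As a hedged sketch, a power fence device for a node with an IPMI-capable management interface might be configured as follows. The device name, address, credentials, and node name are placeholders, and the agent and its parameters must match your actual hardware.

  # Create a power fence device for one node (all values are placeholders)
  pcs stonith create fence_node1 fence_ipmilan ip=192.168.100.10 username=admin password=secret pcmk_host_list=node1.example.com

  # Manually fence a node to verify that the device works
  pcs stonith fence node1.example.com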

Shared Storage Most high availability clusters also need a form of shared storage that multiple nodes can access. Shared storage provides the same application data to multiple nodes in the cluster. Depending on the application that runs on the cluster, the nodes can access the data either one at a time or simultaneously. A high availability cluster must ensure data integrity on the shared storage, and fencing helps to provide that guarantee.

Describing Quorum Quorum describes the voting system that is required to maintain cluster integrity. Every cluster member has an assigned number of votes; by default, one vote. Depending on the cluster configuration, the cluster has quorum when more than half of the votes are present. Cluster members that fail to communicate with the other cluster members, and therefore cannot send their votes, are fenced by the majority of cluster members that still operate as expected. A cluster normally requires quorum to operate. If a cluster loses or cannot establish quorum, then by default no resources or resource groups are started, and running resource groups are stopped to ensure data integrity.
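
To see the vote counts and the current quorum state on a running cluster, commands such as the following are available. For example, in a five-node cluster with one vote per node, at least three votes (more than half) must be present for the cluster to be quorate.

  pcs quorum status      # expected votes, total votes, and whether the cluster is quorate
  corosync-quorumtool    # lower-level view of the same membership and vote information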

Describing the Architecture of High Availability Clustering

In this section, you identify the components of a Red Hat Enterprise Linux high availability cluster.

Describing the Hardware Configuration of an HA Cluster

Figure 2.4: A typical cluster infrastructure

The following components are in this infrastructure:

Cluster Nodes These are the machines that run the cluster software and the clustered services.

Public Network The clients use this network for communicating with the services that run on the cluster. Services normally have a floating IP address that the cluster assigns to whichever node is currently running the corresponding service.

Private Network The cluster uses this network for its inter-node communication.

Networked Power Switch Remote power control of the cluster nodes, for example through a networked power switch, is one possible way to implement power fencing, as described later in this course. Remote management cards, such as Integrated Lights-Out (iLO) or Dell Remote Access Card (DRAC), can also be used for this purpose.

Fibre Channel Switch In the previous figure, which shows a typical cluster infrastructure, the same shared storage connects to all nodes at the same time. Use of Fibre Channel is typical for this purpose. An alternative method would be a separate Ethernet network with iSCSI or FCoE.

In the figure, only the components on the left side of the cluster nodes are publicly accessible. Everything on the right side of the cluster nodes is strictly private, and cannot be reached from the public network.

Describing the Software Configuration of an HA Cluster

Cluster nodes require multiple software components to provide cluster services with the Red Hat High Availability Add-On. An overview of these components and their functions follows:

corosync This is the framework used by Pacemaker for handling communication between the cluster nodes. corosync is also Pacemaker's source of membership and quorum data.

Pacemaker This component is responsible for all cluster-related activities, such as monitoring cluster membership, managing the services and resources, and fencing cluster members. The pacemaker RPM package contains the following important facilities:

  • Cluster Information Base (CIB): The CIB contains configuration and status information about the cluster and the cluster resources, in XML format. Pacemaker elects one cluster node to act as the designated coordinator (DC). The DC stores the authoritative cluster configuration and resource status, and synchronizes them to all other active cluster nodes. The scheduler (pacemaker-schedulerd) uses the CIB contents to compute the ideal state of the cluster and how to reach it. The example after this list shows how to inspect the CIB and identify the current DC.

  • Cluster Resource Management Daemon (CRMd): The Cluster Resource Management Daemon coordinates and sends the resource start, stop, and status query actions to the Local Resource Management Daemon (LRMd) that runs on every cluster node. The LRMd passes the received actions from the CRMd to the resource agents.

  • Shoot the Other Node in the Head (STONITH): STONITH is responsible for processing fence requests, and forwards the requested action to the fence devices that are configured in the CIB.
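
As a quick illustration, the CIB and the current DC can be inspected on a running cluster with standard commands:

  pcs cluster cib > cib.xml    # dump the current CIB in XML format for inspection
  pcs status                   # the output includes which node is the current DC
  crm_mon -1                   # one-shot cluster status reported by Pacemaker itself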

pcs The pcs RPM package contains two cluster configuration tools:

  • The pcs command provides a command-line interface to create, configure, and control every aspect of a Pacemaker or a Corosync cluster.

  • The pcsd service provides cluster configuration synchronization across the nodes, and a web front end to create and configure a Pacemaker or Corosync cluster. A minimal command-line example follows this list.
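
The following is a minimal sketch of creating a two-node cluster with these tools on RHEL 8. The cluster and node names are placeholders, and it assumes that the High Availability Add-On packages are already installed on every node.

  # On every node: start pcsd and set a password for the hacluster user
  systemctl enable --now pcsd
  passwd hacluster

  # On one node: authenticate the nodes, then create, start, and enable the cluster
  pcs host auth node1.example.com node2.example.com
  pcs cluster setup mycluster node1.example.com node2.example.com
  pcs cluster start --all
  pcs cluster enable --all
  pcs status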

HA Cluster Requirements

Before deploying a high availability cluster with the Red Hat High Availability Add-On, it is important to understand the requirements of the cluster configuration and whether that configuration can be supported. Red Hat can assist you in evaluating the design of an existing cluster or in building a new one. The process requires sending relevant data about the cluster, such as the cluster configuration, network architecture, and fencing configuration, to Red Hat Support. The support representative might request additional data if required. Red Hat Support then determines whether the cluster configuration is supported.

System administrators must consider some important requirements and recommendations before deploying a high availability cluster based on the Red Hat High Availability Add-On.

Number of Nodes Red Hat supports clusters with up to 32 nodes for Red Hat Enterprise Linux 8.1 and later. If using RHEL 8.0 or earlier versions, or the Resilient Storage Add-On, then up to 16 nodes are supported.

Clusters that consist of only one or two nodes are special cases.

RHEL 8.2 and later versions support single-node clusters. However, fencing is not available, and thus single-node clusters do not support file systems that require it, such as DLM and GFS2.

RHEL 8.1 and earlier versions support only clusters with two or more nodes. Red Hat supports two-node clusters in most cases, but recommends submitting the cluster design to Red Hat Support for review before deploying a two-node cluster in production.

Single Site, Multisite, and Stretch Clusters Red Hat fully supports single-site clusters. In this cluster setup, all cluster members are in the same physical location and are connected by a local area network. Multisite clusters consist of two clusters, one active and one for disaster recovery, and require special consideration in their design. The Red Hat Enterprise Linux 8 High Availability Add-On supports multisite clusters. Stretch clusters, which are also known as geo clusters, are stretched over multiple physical locations. Red Hat does not treat these clusters as a special class of deployment with separate rules, so the standard policies, requirements, and limitations that apply to RHEL high availability clusters also apply to them. However, stretch clusters have inherent complexities, and Red Hat Support highly recommends submitting the cluster design for review before deploying them.

Fencing Hardware Fencing is a mechanism that ensures that a malfunctioning cluster node cannot cause corruption, so that its resources can be recovered safely elsewhere in the cluster. Fencing can power-cycle the node or cut off its access to shared storage. Fencing is required for all nodes in the cluster, and can use power fencing, storage fencing, or a combination of both. Before deploying the high availability infrastructure, ensure that you use supported fencing hardware. If the cluster uses integrated fencing devices such as iLO or DRAC, then the systems that act as cluster nodes must power off immediately when they receive a shutdown signal, instead of initiating a clean shutdown.

Virtualized and Cloud Environments Red Hat supports using virtual machines as cluster members on the most popular virtual environments and cloud providers.

When operating as cluster nodes, the virtual machines that run on a host are members of the cluster and run resources. Special fencing agents are available so that these cluster nodes can fence each other, whether they run on a RHEL 8 libvirt-based system, Red Hat Virtualization, or another hypervisor host. In these cases, the physical host is a single point of failure for all the virtual machine-based cluster nodes that run on it: if the physical host fails, then all the cluster node VMs that run on it fail with it.
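
To check which fence agents are available on a node, for example agents for libvirt-based virtual machines or for public cloud providers, you can list and inspect them. The fence_xvm agent shown below is only an example, and a given agent is listed only if its package is installed.

  pcs stonith list                 # list all fence agents installed on this node
  pcs stonith describe fence_xvm   # show the parameters of a specific agent, here the libvirt guest agent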

Networking In RHEL 8, the corosync component uses the kronosnet (knet) unicast transport protocol by default for cluster communication on the private network. On the public network, the floating IP addresses rely on gratuitous ARP, which the network switches must support. On the public network, you must open the network ports that the services running on the cluster require. For correct operation, you must also open the following ports on the private network (a firewall example follows the port list):

  • pcsd port 2224/TCP

  • corosync ports 5404-5412/UDP

Optional cluster components require additional ports. For example, the GFS2 file system uses port 21064/TCP; the quorum device uses port 5403/TCP; and the Booth ticket manager uses port 9929/TCP and UDP. Some of these components are described in other chapters.
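
On cluster nodes that run firewalld, the predefined high-availability service opens the pcsd and corosync ports in one step; ports for optional components can be added individually. This is a sketch, and the quorum device port below is only an example.

  firewall-cmd --permanent --add-service=high-availability
  firewall-cmd --reload

  # Example: also open the quorum device port on the host that runs the quorum device
  firewall-cmd --permanent --add-port=5403/tcp
  firewall-cmd --reload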

Note

Booth ticket manager is a distributed service that facilitates support of multisite clusters. For more information about configuring multisite clusters with Pacemaker by using Booth ticket manager, see the documentation at https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/configuring_and_managing_high_availability_clusters/index#assembly_configuring-multisite-cluster-configuring-and-managing-high-availability-clusters.

SELinux Support The Red Hat Enterprise Linux High Availability Add-On supports using SELinux in enforcing mode when using the targeted policy on the cluster nodes.

Planning for Failures

All hardware eventually fails. A hardware lifecycle can range from weeks to years. Furthermore, almost every (complex) piece of software has bugs. Some might be unnoticeable, and others might corrupt an entire database. A major task for a system administrator is to acknowledge that these failures occur, and to plan accordingly.

When the failing piece of hardware is a simple desktop machine, the correct approach is most likely to replace the failed machine, although for a mission-critical server a more proactive approach is needed. When a machine fails, the service that runs on that machine should not fail.

A single point of failure (SPOF) is any part of a complex setup that, when it fails, can take down an entire environment. A typical high availability cluster can have many possible single points of failure. The following lists, though not exhaustive, contain common offenders:

Hardware Single Points of Failure

  • Power supply

  • Local storage

  • Network interfaces

  • Network switches

  • Fencing hardware

Software Single Points of Failure

  • Cluster communications

  • Shared storage connection

  • Software fencing configuration

References

Knowledgebase: How can Red Hat Assist Me in Assessing the Design of My RHEL High Availability or Resilient Storage Cluster?
https://access.redhat.com/articles/2359891

Knowledgebase: Support Policies for RHEL High Availability Clusters
https://access.redhat.com/articles/2912891
