Chapter 6. Operation, Update, and Monitoring

Abstract

Goal: Describe the administration and update process.
Objectives
  • Configure Pacemaker for SAP HANA and SAP NetWeaver.

  • Describe manual failover and acceptance tests, and check the status of the HA cluster.

  • Describe the role of fencing and its involvement in the takeover process to maintain integrity and high availability.

  • Describe the prerequisites for SAP HANA, operating system, and cluster updates.

  • Harden an SAP environment for production use.

Sections
  • Configuring Pacemaker Cluster HA for SAP HANA and SAP NetWeaver (and Guided Exercise)

  • Verifying the Environment Configuration

  • Verifying the Status of the Red Hat HA Cluster (Guided Exercise)

  • Testing System Integrity with Fencing

  • Updating SAP HANA, Operating System, and the Cluster

  • Hardening an SAP Environment

Configuring Pacemaker Cluster HA for SAP HANA and SAP NetWeaver

Objectives

After completing this section, you should be able to configure Pacemaker for SAP HANA and SAP NetWeaver.

Install the Node Software

Setting up the Red Hat High Availability add-on requires installing a set of software packages, configuring the firewall, and authenticating the nodes.

Note

Red Hat Enterprise Linux 8 and Red Hat Enterprise Linux 7 cluster nodes are not compatible in a single cluster. All nodes in a Pacemaker cluster must use the same major version of Red Hat Enterprise Linux. Red Hat Enterprise Linux 8 clusters use Corosync 3.x for communication; Red Hat Enterprise Linux 7 Pacemaker clusters use Corosync 2.x.

Install the Required Software on Each Cluster Node

The pcs package provides the cluster configuration software. The pcs package requires the corosync and pacemaker packages, which are automatically installed as dependencies for an installation with Yum. The fence-agents-all package pulls in all available fencing agent packages. Administrators can also choose to install only the fence-agents-XYZ package, where XYZ is the intended fencing agent to use. The pcs and fence-agents-all packages must be installed on all the cluster nodes.

[root@node ~]# yum install pcs fence-agents-all
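
If you already know which fencing agent you are going to use, a lighter-weight alternative is to install only that agent's package instead of fence-agents-all. A sketch, assuming the fence_ipmilan agent (provided by the fence-agents-ipmilan package):

[root@node ~]# yum install pcs fence-agents-ipmilan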

Configure the Firewall for Cluster Communication

You can skip this step if you are not using the built-in Linux firewall. In any case, you must allow cluster communication through any external firewall in your environment for all the cluster nodes. The standard firewall service on a Red Hat Enterprise Linux 8 system is the firewalld service. The firewalld daemon ships with a standard high-availability service for cluster communication. To activate the high-availability firewall service on each of the cluster nodes and allow cluster communication through the firewall, execute the following commands:

[root@node ~]# firewall-cmd --permanent --add-service=high-availability
[root@node ~]# firewall-cmd --reload
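
To confirm that the rule is active, you can list the services that firewalld currently allows; the high-availability service should appear in the output (the other services shown are illustrative defaults):

[root@node ~]# firewall-cmd --list-services
cockpit dhcpv6-client high-availability ssh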

Enable Pacemaker and Corosync on the Nodes

The pcsd service provides the cluster configuration synchronization and the web front end for cluster configuration. The service is required on all cluster nodes. Use the systemctl command to start and enable the pcsd service on all cluster nodes.

[root@node ~]# systemctl enable --now pcsd

The pcsd service uses the hacluster system user for cluster communication and configuration. You must set the password of the hacluster system user on all cluster nodes. Red Hat recommends using the same password for the hacluster user on all nodes in the cluster. The following example sets the hacluster user password to redhat:

[root@node ~]# echo redhat | passwd --stdin hacluster

You must authenticate the cluster nodes in the pcsd service with the hacluster user and the password that you set up for this user. You need to run the pcs host auth command on only one node to authenticate all nodes in the cluster.

The node1.example.com and node2.example.com cluster nodes are authenticated on the node1.example.com system with the hacluster user and the corresponding password.

[root@node ~]# pcs host auth node1.example.com \
> node2.example.com
Username: hacluster
Password: redhat
node1.example.com: Authorized
node2.example.com: Authorized

For automation purposes, the -u <USERNAME> and -p <PASSWORD> options can also be used.
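
For example, the same authentication step could be scripted non-interactively as follows, using the example credentials from above:

[root@node ~]# pcs host auth -u hacluster -p redhat \
> node1.example.com node2.example.com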

Configure Basic Cluster Communication

After you prepare the two nodes for the cluster setup, the pcs cluster setup command creates the cluster. This command takes as arguments the cluster name and fully qualified domain names or IP addresses of the cluster nodes. The optional --start parameter starts the cluster on all supplied cluster nodes.

[root@node ~]# pcs cluster setup mycluster --start \
> node1.example.com \
> node2.example.com

By default, a cluster node that gets rebooted does not automatically rejoin the cluster. You can use the pcs cluster enable command to enable automatic starting of the cluster service. The --all option enables automatic starting of cluster services on every cluster member.

The following command enables all cluster nodes to start the cluster service and to automatically join the cluster when executed on one of the cluster nodes.

[root@node ~]# pcs cluster enable --all

Red Hat recommends that you verify that the cluster is working as expected. The pcs cluster status command provides an overview of the current cluster status.

[root@node ~]# pcs cluster status
Cluster Status:
 Cluster Summary:
   * Stack: corosync
   * Current DC: node2.example.com (version 2.0.4-6.el8-2deceaa3ae) - partition with quorum
   * Last updated: Fri Mar  5 12:23:08 2021
   * Last change:  Fri Mar  5 12:22:57 2021 by root via cibadmin on node1.example.com
   * 2 nodes configured
   * 0 resource instances configured
 Node List:
   * Online: [ node1.example.com node2.example.com ]

PCSD Status:
  node1.example.com: Online
  node2.example.com: Online

The pcs cluster status command shows whether all nodes are online and communicating with each other. The status indicator is the Online: [ node1.example.com node2.example.com ] statement within the Node List section. Any communication issue between nodes is also indicated in this section.

Configure Cluster Node Fencing

Fencing is a requirement for any high availability cluster. It prevents data corruption from an errant node. Fencing also isolates and restarts a cluster member if the node fails to join the cluster and the remaining cluster members still form a quorum. Depending on the hardware used, the cluster can fence a node by turning off the connection to the shared storage or by power-cycling the node.

The first step to set up fencing is to configure the physical fencing device. Different hardware devices are capable of fencing cluster nodes, for example:

  • Uninterruptible power supplies (UPS)

  • Power distribution units (PDU)

  • Blade power control devices

  • Lights-out devices
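
Before creating a fence device in the cluster, you might want to check which fence agents are installed on the nodes and which parameters a particular agent accepts. A sketch, using fence_ipmilan as the example agent:

[root@node ~]# pcs stonith list
[root@node ~]# pcs stonith describe fence_ipmilan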

The fence devices must be added to the cluster. For physical machine fencing, each cluster node might require its own fence device. Use the pcs stonith create command. The command expects a set of parameter and value pairs that the fence agent requires to fence the cluster node. To use the fence_ipmilan fencing agent, the pcmk_host_list, username, password, and ip parameters are required. The pcmk_host_list parameter lists the corresponding host as the cluster knows it. The ip parameter expects the IP address or hostname of the fencing device.

For example:

[root@node ~]# pcs stonith create <fence_device_name> fence_ipmilan \
> pcmk_host_list=node_private_fqdn \
> ip=node_IP_BMC \
> username=username \
> password=password

The pcs stonith status command shows the status of the fence devices that are attached to the cluster. All fence_ipmilan fence devices should show Started status.

[root@node ~]# pcs stonith status
  * fence_nodea	(stonith:fence_ipmilan):	 Started node1.example.com
  * fence_nodeb	(stonith:fence_ipmilan):	 Started node2.example.com

If the status of any fence device is Stopped, then a communication problem likely exists between the fencing agent and the fencing server. Verify the settings of the fence device with the pcs stonith config fence_device command. You can update the settings with the pcs stonith update command.
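
For example, assuming that the fence_nodea device from the previous output points at the wrong BMC address, you might inspect and correct it as follows (the IP address is illustrative):

[root@node ~]# pcs stonith config fence_nodea
[root@node ~]# pcs stonith update fence_nodea ip=192.0.2.10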

Note

Red Hat highly recommends testing fencing even if the devices show the Started state: https://access.redhat.com/solutions/18803
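
A simple manual test is, for example, to fence one node from the other node and to verify that the fenced node is power-cycled and rejoins the cluster (node name is illustrative):

[root@node ~]# pcs stonith fence node2.example.com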

When the testing is complete, you can configure the SAP resources.

Note

Red Hat Enterprise Linux is shipped with many fence devices. You must verify that your intended fence method is supported for your environment: https://access.redhat.com/articles/2881341

Setting up HA for SAP HANA

When the installation and testing are complete, as described in earlier chapters, the SAP HANA system can be integrated into the Pacemaker cluster. SAP HANA system replication must already be configured.

Assuming that your underlying storage and network environment is configured according to SAP guidelines, the following command adds the SAPHanaTopology clone resource to the Pacemaker cluster:

[root@node ~]# pcs resource create SAPHanaTopology_<SID>_<InstanceNumber> \
> SAPHanaTopology SID=<SID> InstanceNumber=<InstanceNumber> op start \
> timeout=600 op stop timeout=300 op monitor interval=10 timeout=600 \
> clone clone-max=2 clone-node-max=1 interleave=true

You can clone a cluster resource so that it is active on multiple nodes. For example, cloning the SAPHanaTopology resource distributes it throughout the cluster and ensures that both nodes have updated information about the SAP HANA instances. You can clone any resource, provided that the resource agent supports it. A clone consists of one resource or resource group; in this case, one resource. Next, create the promotable SAPHana resource:

[root@node ~]# pcs resource create SAPHana_<SID>_<InstanceNumber> SAPHana \
> SID=<SID> InstanceNumber=<InstanceNumber> PREFER_SITE_TAKEOVER=true \
> DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=true op start \
> timeout=3600 op stop timeout=3600 op monitor interval=61 role="Slave" \
> timeout=700 op monitor interval=59 role="Master" timeout=700 op promote \
> timeout=3600 op demote timeout=3600 promotable meta notify=true clone-max=2 \
> clone-node-max=1 interleave=true

In promotable clone resources, the promotable meta attribute is set to true. The instances can then be in one of two operating modes, called master and slave. The names of the modes do not have specific meanings, except that when an instance is started, it must come up in the slave state.

clone-max: How many copies of the resource to start. The default is the number of nodes in the cluster.

clone-node-max: How many copies of the resource can start on a single node. The default value is 1.

interleave: Changes the behavior of ordering constraints (between clones) so that copies of the first clone can start or stop as soon as the copy on the same node of the second clone starts or stops (rather than waiting until every instance of the second clone starts or stops). Allowed values are false or true. The default value is false.

After successful execution of the previous two commands, the cluster should look as follows:

[root@node ~]# pcs status
.............
  * Clone Set: SAPHanaTopology_<SID>_<InstanceNumber>-clone [SAPHanaTopology_<SID>_<InstanceNumber>]:
    * Started: [ node1.example.com node2.example.com ]
  * Clone Set: SAPHana_<SID>_<InstanceNumber>-clone [SAPHana_<SID>_<InstanceNumber>] (promotable):
    * Masters: [ node1.example.com ]
    * Slaves: [ node2.example.com ]
.............

The resulting resource should look as follows:

[root@node ~]# pcs resource config SAPHana_<SID>_<InstanceNumber>-clone
Clone: SAPHana_<SID>_<InstanceNumber>-clone
 Meta Attrs: clone-max=2 clone-node-max=1 interleave=true notify=true promotable=true
 Resource: SAPHana_<SID>_<InstanceNumber> (class=ocf provider=heartbeat type=SAPHana)
  Attributes: AUTOMATED_REGISTER=true DUPLICATE_PRIMARY_TIMEOUT=7200 InstanceNumber=<InstanceNumber> PREFER_SITE_TAKEOVER=true SID=RH2
  Operations: demote interval=0s timeout=3600 (SAPHana_RH2_02-demote-interval-0s)
              methods interval=0s timeout=5 (SAPHana_RH2_02-methods-interval-0s)
              monitor interval=61 role=Slave timeout=700 (SAPHana_RH2_02-monitor-interval-61)
              monitor interval=59 role=Master timeout=700 (SAPHana_RH2_02-monitor-interval-59)
              promote interval=0s timeout=3600 (SAPHana_RH2_02-promote-interval-0s)
              start interval=0s timeout=3600 (SAPHana_RH2_02-start-interval-0s)
              stop interval=0s timeout=3600 (SAPHana_RH2_02-stop-interval-0s)

Refer to the following list for more information about some important parameters in these commands:

  • SID (required; default: null): The SAP System Identifier (SID) of the SAP HANA installation (must be identical for all nodes). Example: RH2

  • InstanceNumber (required; default: null): The instance number of the SAP HANA installation (must be identical for all nodes). Example: 02

  • PREFER_SITE_TAKEOVER (optional; default: null): Whether the resource agent prefers to take over to the secondary instance instead of restarting the primary locally. true: prefer takeover to the secondary site; false: prefer a local restart; never: under no circumstances perform a takeover to the other node.

  • AUTOMATED_REGISTER (optional; default: false): Whether to register the former primary instance as secondary after a takeover event, once the DUPLICATE_PRIMARY_TIMEOUT has expired. false: no, manual intervention is needed; true: yes, the resource agent registers the former primary as secondary.

  • DUPLICATE_PRIMARY_TIMEOUT (optional; default: 7200): The required time difference (in seconds) between two primary time stamps if a dual-primary situation occurs. If the time difference is less than this value, then the cluster holds one or both instances in a WAITING status, to give the system administrator a chance to react to the takeover. After the time difference has elapsed, if AUTOMATED_REGISTER is set to true, then the failed former primary is registered as secondary. After registration to the new primary, the system replication overwrites all data on the former primary.
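
If one of these parameters needs to be changed after the resource has been created, the resource can be updated in place. A sketch, assuming you want to disable automatic registration on the example resource:

[root@node ~]# pcs resource update SAPHana_<SID>_<InstanceNumber> \
> AUTOMATED_REGISTER=false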

Create Virtual IP Address Resource

The cluster uses a virtual IP address to reach the master (primary) instance of SAP HANA. Assuming that your internal and external network environment is set up so that the selected IP address is reachable from the client side, you can use the following example command to create an IPaddr2 resource with a selected IP address of 192.168.0.15:

[root@node ~]# pcs resource create vip_<SID>_<InstanceNumber> \
> IPaddr2 ip="192.168.0.15"

The resulting resource should look as follows:

[root@node ~]# pcs resource config vip_<SID>_<InstanceNumber>
 Resource: vip_<SID>_<InstanceNumber> (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.0.15
  Operations: start interval=0s timeout=20s (vip_RH2_02-start-interval-0s)
              stop interval=0s timeout=20s (vip_RH2_02-stop-interval-0s)
              monitor interval=10s timeout=20s (vip_RH2_02-monitor-interval-10s)
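
Depending on your network configuration, the IPaddr2 resource agent might also need the netmask or the network interface specified explicitly. A variant of the previous command using the cidr_netmask and nic parameters could look as follows (values are illustrative):

[root@node ~]# pcs resource create vip_<SID>_<InstanceNumber> \
> IPaddr2 ip="192.168.0.15" cidr_netmask=24 nic=eth0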

Create Constraints

Constraint: Start SAPHanaTopology before SAPHana

For correct operation, SAPHanaTopology resources must be started before the SAPHana resources are started. Also, the virtual IP address must be present on the node where the master SAPHana resource is running. To achieve this behavior, the following two constraints must be created:

  • The symmetrical=false attribute defines that only the start order of the resources is of interest; they do not need to be stopped in reverse order.

  • Both resources (SAPHana and SAPHanaTopology) have the interleave=true attribute, which allows these resources to start in parallel on the nodes. It is not necessary to wait for all nodes to start SAPHanaTopology; the SAPHana resource can start on any node as soon as SAPHanaTopology is running there.

Command for creating the constraint:

[root@node ~]# pcs constraint order SAPHanaTopology_<SID>_<InstanceNumber>-clone \
> then SAPHana_<SID>_<InstanceNumber>-clone symmetrical=false

The resulting constraint should look as follows:

[root@node ~]# pcs constraint
...
Ordering Constraints:
  start SAPHanaTopology_<SID>_<InstanceNumber>-clone then start SAPHana_<SID>_<InstanceNumber>-clone (kind:Mandatory) (non-symmetrical)
...

Constraint: Colocate the IPaddr2 resource with the master SAPHana resource

The following example command colocates the IPaddr2 resource with the SAPHana resource that was promoted to master.

[root@node ~]# pcs constraint colocation add vip_<SID>_<InstanceNumber> \
> with master SAPHana_<SID>_<InstanceNumber>-clone 2000

The constraint uses a score of 2000 instead of the default INFINITY. This score prevents the cluster from taking down the IPaddr2 resource if no master is promoted in the SAPHana resource. Tools such as the SAP Management Console or SAP Landscape Virtualization Management can then still use this address to query status information about the SAP instance.

For more information, see Colocating Cluster Resources, https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/configuring_and_managing_high_availability_clusters/assembly_colocating-cluster-resources.adoc_configuring-and-managing-high-availability-clusters

The resulting constraint should look as follows:

[root@node ~]# pcs constraint
...
Colocation Constraints:
  vip_<SID>_<InstanceNumber> with SAPHana_<SID>_<InstanceNumber>-clone (score:2000) (rsc-role:Started) (with-rsc-role:Master)
...

Add a Secondary Virtual IP Address for an Active/Active (Read-Enabled) HANA System Replication (HSR) Setup

Starting with SAP HANA 2.0 SPS1, SAP enables Active/Active (Read-Enabled) setups for SAP HANA system replication. The secondary systems of SAP HANA system replication can be used actively for read-intensive workloads. A second virtual IP address is required to support such setups, so that clients can access the secondary SAP HANA database. To ensure that the secondary replication site can still be accessed after a takeover, the cluster must move the second virtual IP address with the slave of the promotable SAPHana resource.

Note

When establishing HSR for the read-enabled secondary configuration, set the operationMode to logreplay_readaccess.
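
For example, when registering the secondary site for system replication as the <sid>adm user, the operation mode could be set as shown below; the host name, instance number, replication mode, and site name are illustrative:

[rh2adm@node2 ~]$ hdbnsutil -sr_register --remoteHost=node1 \
> --remoteInstance=02 --replicationMode=sync \
> --operationMode=logreplay_readaccess --name=DC2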

Create the resource for managing the second virtual IP address:

[root@node ~]# pcs resource create vip2_<SID>_<InstanceNumber> \
> IPaddr2 ip="192.168.1.11"

Use the appropriate resource agent for managing the IP address based on the platform where the cluster is running.

  • Create location constraints so that the secondary virtual IP address is placed on the right cluster node at the right time:

    [root@node ~]# pcs constraint location vip2_<SID>_<InstanceNumber> \
    > rule score=INFINITY hana_<sid>_sync_state eq SOK and hana_<sid>_roles \
    > eq 4:S:master1:master:worker:master
    [root@node ~]# pcs constraint location vip2_<SID>_<InstanceNumber> \
    > rule score=2000 hana_<sid>_sync_state eq PRIM and hana_<sid>_roles eq \
    > 4:P:master1:master:worker:master
  • These location constraints ensure that the second virtual IP resource has the following behavior:

    • If both a PRIMARY node and a SECONDARY node are available, with HANA System Replication as SOK, then the second virtual IP runs on the SECONDARY node.

    • If the SECONDARY node is not available or the HANA System Replication is not SOK, then the second virtual IP runs on the PRIMARY node. If the SECONDARY node is available and the HANA System Replication is SOK again, then the second virtual IP moves back to the SECONDARY node.

    • If the PRIMARY node is not available or the HANA instance that runs there has a problem, then after the failover, the SECONDARY gets promoted to the PRIMARY role, and the second virtual IP continues to run on the same node until the takeover of the SECONDARY node is complete and the HANA System Replication is SOK.

Together, these constraints maximize the time that the second virtual IP resource is assigned to a node where a healthy SAP HANA instance is running.
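
The location rules reference node attributes (hana_<sid>_sync_state and hana_<sid>_roles) that the SAPHana resource agents maintain. One way to inspect their current values, and therefore to predict where the second virtual IP address will run, is the crm_mon utility from the pacemaker package:

[root@node ~]# crm_mon -A1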

Setting up HA for SAP NetWeaver

Create a resource for the ASCS instance

  • For ENSA1: When the installation and testing are complete according to earlier chapters, you can integrate the SAP NetWeaver system into the Pacemaker cluster. Assuming that your underlying storage and network environment is configured according to SAP guidelines and is part of the cluster, the following command adds the SAP NetWeaver ASCS resource to the Pacemaker cluster.

[root@node ~]# pcs resource create <sid>_ascs<InstanceNumber> SAPInstance \
> InstanceName="<SID>_ASCS<InstanceNumber>_rhascs" \
> START_PROFILE=/sapmnt/<SID>/profile/<SID>_ASCS<InstanceNumber>_rhascs \
> AUTOMATIC_RECOVER=false meta resource-stickiness=5000 migration-threshold=1 \
> failure-timeout=60 --group <sid>_ASCS<InstanceNumber>_group \
> op monitor interval=20 on-fail=restart timeout=60 \
> op start interval=0 timeout=600 \
> op stop interval=0 timeout=600

Note

The meta resource-stickiness=5000 value balances out the failover constraint with ERS, so the resource stays on the node where it started, and does not migrate around the cluster uncontrollably. The migration-threshold=1 value ensures ASCS failover to another node when an issue is detected instead of restarting on the same node.

  • For ENSA2:

    [root@node ~]# pcs resource create <sid>_ascs<InstanceNumber> SAPInstance \
    > InstanceName="<SID>_ASCS<InstanceNumber>_s4ascs" \
    > START_PROFILE=/sapmnt/<SID>/profile/<SID>_ASCS<InstanceNumber>_s4ascs \
    > AUTOMATIC_RECOVER=false \
    > meta resource-stickiness=5000 \
    > --group <sid>_ASCS<InstanceNumber>_group \
    > op monitor interval=20 on-fail=restart timeout=60 \
    > op start interval=0 timeout=600 \
    > op stop interval=0 timeout=600

    Add a resource stickiness value to the group to ensure that the ASCS stays on a node if possible:

    [root@node ~]# pcs resource meta <sid>_ASCS<InstanceNumber>_group \
    > resource-stickiness=3000
Create a resource for an ERS instance

Create the ERS instance cluster resource.

Note

The IS_ERS=true attribute is mandatory for ENSA1 deployments. For more information about IS_ERS, see How Does the IS_ERS Attribute Work on an SAP NetWeaver Cluster with Stand-alone Enqueue Server (ENSA1 and ENSA2)?: https://access.redhat.com/solutions/5474031

  • For ENSA1:

    [root@node ~]# pcs resource create <sid>_ers<InstanceNumber> SAPInstance \
    > InstanceName="<SID>_ERS<InstanceNumber>_rhers" \
    > START_PROFILE=/sapmnt/<SID>/profile/<SID>_ERS<InstanceNumber>_rhers \
    > AUTOMATIC_RECOVER=false IS_ERS=true --group <sid>_ERS<InstanceNumber>_group \
    > op monitor interval=20 on-fail=restart timeout=60 \
    > op start interval=0 timeout=600 \
    > op stop interval=0 timeout=600
  • For ENSA2:

    [root@node ~]# pcs resource create s4h_ers29 SAPInstance \
    > InstanceName="S4H_ERS29_s4ers" \
    > START_PROFILE=/sapmnt/S4H/profile/S4H_ERS29_s4ers \
    > AUTOMATIC_RECOVER=false \
    > --group s4h_ERS29_group \
    > op monitor interval=20 on-fail=restart timeout=60 \
    > op start interval=0 timeout=600 \
    > op stop interval=0 timeout=600

Create the required constraints

Create a colocation constraint for ASCS and ERS resource groups.

Resource groups <sid>_ASCS<InstanceNumber>_group and <sid>_ERS<InstanceNumber>_group should avoid running on the same node whenever both nodes are available.

[root@node ~]# pcs constraint colocation add <sid>_ERS<InstanceNumber>_group with \
> <sid>_ASCS<InstanceNumber>_group -5000
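
As with the SAP HANA constraints, you can verify the result with the pcs constraint command; the output should resemble the following:

[root@node ~]# pcs constraint
...
Colocation Constraints:
  <sid>_ERS<InstanceNumber>_group with <sid>_ASCS<InstanceNumber>_group (score:-5000)
...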

This concludes the section on configuring a basic high availability cluster for SAP.

Configuring and Managing High Availability Clusters, https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/configuring_and_managing_high_availability_clusters/index

Automating SAP HANA Scale-Up System Replication using the RHEL HA Add-On, https://access.redhat.com/articles/3004101
