Abstract
| Goal | Explain SAP NetWeaver clustering and HA agents. |
| Objectives | Describe the components and concepts for making SAP S/4HANA and SAP NetWeaver highly available. |
After completing this section, you should be able to describe the components and concepts for making SAP S/4HANA and SAP NetWeaver highly available.
This explanation will start from a basic standalone (non-clustered) SAP system. A standard system has three main components, each of them critical for the entire system to work:
ASCS: ABAP (Advanced Business Application Programming) SAP Central Services. It is composed of two parts: the Message Server and the Enqueue Server. The Message Server acts as a communication channel between the application servers and handles load distribution. The Enqueue Server controls the lock mechanism.
AS: Application Server. Previously, a central instance included the ASCS component. Now, the ASCS component is separated out and stands on its own. Therefore, the first application server is called the PAS (Primary Application Server), and the subsequent ones are called AAS (Additional Application Servers), with little practical difference between them.
PAS: Primary Application Server; see also AS.
AAS: Additional Application Server; see also AS.
Database: The database is where your primary persistence resides and where you store your data, for example, an SAP HANA database.
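If you want to see these components on a running system, you can query them with the sapcontrol command-line tool. The following is a minimal sketch; the SID S4H and the instance numbers (00 for the ASCS, 01 for the PAS) are assumptions for illustration only.

```bash
# Run as the <sid>adm user of the assumed SID "S4H" (user s4hadm).

# List the processes of the ASCS instance (assumed instance number 00); the output
# typically shows the Message Server (msg_server) and the Enqueue Server
# (enserver for the classic ENSA1, enq_server for ENSA2).
su - s4hadm -c "sapcontrol -nr 00 -function GetProcessList"

# List the processes of the primary application server (assumed instance number 01);
# the work processes are managed by the dispatcher (disp+work).
su - s4hadm -c "sapcontrol -nr 01 -function GetProcessList"
```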
Standalone systems look as follows:
Note from this diagram that every component must function correctly for the entire stack to work: the failure of any component renders the entire system unusable. The solution is to make each of these components highly available by adding redundancy, thus covering all single points of failure.
For the basic HA system, you need at least two nodes to fit the required components. They might or might not be located in separate data centers.
For the central services, the recommended procedure is to use the ERS (Enqueue Replication Server). The ASCS and ERS instances are installed on shared storage that is accessible from both nodes. The Enqueue Server keeps the lock table, and the ERS keeps a replicated copy of it. The Red Hat Enterprise Linux (RHEL) High Availability (HA) Pacemaker cluster software provides an automatic failover mechanism for the ASCS instance. During normal operation, the ASCS instance runs on one node and the ERS instance on the other. Therefore, if at any time an issue affects one of the nodes, the cluster automatically fails over the component that was running on the failed node to the surviving node, keeping the system alive.
ERS stands for Enqueue Replication Server. Its job is to keep an updated replica of the lock table, so that if anything goes wrong with the ASCS instance, the current state of the table locks is safeguarded. Somewhat similar to SAP HANA system replication, the ERS on its own does not guarantee the High Availability of the entire system, because it does only what is stated above: it maintains the replica of the lock table. To deliver the intended High Availability, the ERS capabilities must be combined with the automatic failover mechanism of the RHEL HA Pacemaker cluster. Thus, if the ASCS instance crashes, it is brought back on a different host or node, where it uses the safeguarded replica to re-create the lock table, so that the system can resume operation.
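To confirm that the replica is actually being maintained, you can again use sapcontrol. This is a sketch under the same assumptions as before (SID S4H, ASCS instance 00), with an assumed ERS instance number of 10.

```bash
# The process list of the ERS instance (assumed instance number 10) should show the
# replication server process (enrepserver for ENSA1, enq_replicator for ENSA2).
su - s4hadm -c "sapcontrol -nr 10 -function GetProcessList"

# The enqueue statistics of the ASCS instance include counters that indicate whether
# replication to the ERS is enabled and in sync.
su - s4hadm -c "sapcontrol -nr 00 -function EnqGetStatistic"
```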
From the application server point of view, the key is the number of servers. You need at least two application servers (a PAS and at least one AAS) that use load balancing. If an issue hits one of the nodes, all users that are connected to that node are disconnected, but they can log in again, because the other application server is still up and running. The RHEL HA Pacemaker cluster is also capable of managing the High Availability of the application servers.
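If you place the application servers under cluster control, they are typically managed with the same SAPInstance resource agent that is used for the central services. A minimal sketch, assuming SID S4H, a PAS instance D01 on the virtual host sappas, and the default profile location; adjust the names to your installation.

```bash
# Hypothetical cluster resource for the PAS; InstanceName and START_PROFILE must
# match the values chosen during installation.
pcs resource create s4h_pas_d01 SAPInstance \
    InstanceName="S4H_D01_sappas" \
    START_PROFILE="/sapmnt/S4H/profile/S4H_D01_sappas" \
    AUTOMATIC_RECOVER=false \
    --group s4h_pas_group
```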
Database-wise, the norm is to have at least two databases. One is set as the primary database, which serves the system. The second one is a standby database, which is supplied with a constant feed of logs from the primary database (called log shipping). The RHEL HA Pacemaker cluster ensures that the main IP address points to the primary database. If that node becomes unavailable, the failover mechanism runs, and the standby database becomes the primary database to keep the system running. The cluster also ensures that the main IP address always remains where the primary database is running.
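With an SAP HANA database, the log shipping described above corresponds to HANA system replication, and the automated failover is implemented with the SAPHanaTopology and SAPHana resource agents. The following sketch assumes SID RH1, instance number 02, the site names DC1 and DC2, and the hostnames hana1 and hana2; the operation timeouts and constraints required for production are omitted here and are covered in the RHEL HA solutions for SAP HANA documentation.

```bash
# On the primary database node (hana1), as the rh1adm user: enable system replication.
su - rh1adm -c "hdbnsutil -sr_enable --name=DC1"

# On the secondary node (hana2), with its HANA instance stopped: register it against
# the primary so that it receives the continuous feed of logs.
su - rh1adm -c "hdbnsutil -sr_register --remoteHost=hana1 --remoteInstance=02 --replicationMode=sync --operationMode=logreplay --name=DC2"

# In the Pacemaker cluster: a cloned topology resource plus a promotable SAPHana
# resource that decides which copy is the primary.
pcs resource create SAPHanaTopology_RH1_02 SAPHanaTopology SID=RH1 InstanceNumber=02 \
    clone clone-max=2 clone-node-max=1 interleave=true
pcs resource create SAPHana_RH1_02 SAPHana SID=RH1 InstanceNumber=02 \
    PREFER_SITE_TAKEOVER=true AUTOMATED_REGISTER=false DUPLICATE_PRIMARY_TIMEOUT=7200 \
    promotable meta notify=true clone-max=2 clone-node-max=1 interleave=true
```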
When finished, your new HA system might look similar to the following diagram:
At a basic level, a standard RHEL HA cluster in an Active-Passive configuration has two nodes. One is the primary node, and the other one is the standby node. The primary node is actively serving the system. The standby node is waiting to jump in, in the event of a failure.
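A two-node cluster of this kind might be created with the pcs tooling roughly as follows. This is a sketch only, assuming RHEL 8 or later, the node names sapnode1 and sapnode2, and an IPMI-based fence device; package names, credentials, and the fencing agent must be adapted to your environment.

```bash
# On both nodes: install the RHEL HA add-on packages and start the pcs daemon.
dnf install -y pcs pacemaker fence-agents-all
systemctl enable --now pcsd

# On one node: authenticate the nodes, then create and start the cluster.
pcs host auth sapnode1 sapnode2 -u hacluster
pcs cluster setup sap_cluster sapnode1 sapnode2
pcs cluster start --all
pcs cluster enable --all

# Fencing (STONITH) is mandatory for a supported cluster; fence_ipmilan is only an
# example agent, and the address and credentials are placeholders.
pcs stonith create fence_sapnode1 fence_ipmilan pcmk_host_list=sapnode1 \
    ip=192.0.2.101 username=admin password=secret
```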
The diagrams show that all database instances and application servers, including the ASCS and ERS, are configured within only two nodes.
This setup is technically possible, provided that the system is thoroughly tested to confirm that a single node can take over and withstand the entire combined workload.
A typical scenario, however, uses separate clusters for the ASCS/ERS instances, the HANA databases, and the application servers.
The RHEL HA Pacemaker cluster is also capable of managing Virtual IP addresses for each of the mentioned components, and it ensures that each Virtual IP address always stays with its respective component, giving the additional benefit of network-side High Availability. These Virtual IP addresses can move from one node to another, and remain active on only one node at a time.
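Each of these Virtual IP addresses is typically an IPaddr2 cluster resource that is placed in the same resource group as the component it belongs to, so that it always moves together with it. A brief sketch with assumed addresses and group names:

```bash
# Virtual IP for the ASCS instance; group membership keeps it on the node where the
# ASCS resources are running.
pcs resource create s4h_ascs_vip IPaddr2 ip=192.0.2.110 cidr_netmask=24 --group s4h_ascs_group

# Virtual IP for the ERS instance.
pcs resource create s4h_ers_vip IPaddr2 ip=192.0.2.111 cidr_netmask=24 --group s4h_ers_group
```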
Following is a brief overview of how the failover occurs.
For the ASCS and ERS instances to be able to move from one node to the other, they must be installed on a shared file system. Together with the virtual IP address, each instance is added to a cluster resource group, so that all related resources combine into a single logical unit.
Assume in this section that the ASCS instance is installed on sapnode1 with the sapascs virtual hostname, and that the ERS instance is installed on sapnode2 with the sapers virtual hostname.
After installation, you must create mount points for both ASCS and ERS on their counterpart nodes, for example /usr/sap/<SID>/ASCSXX and /usr/sap/<SID>/ERSXX, where SID is the SAP system identifier, and XX is the instance number.
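Continuing the same sketch, the two resource groups can then be completed with a Filesystem resource for each shared instance directory and a SAPInstance resource for the instance itself. The SID S4H, the instance numbers 00 (ASCS) and 10 (ERS), and the NFS export paths are assumptions; take the exact values from your own installation.

```bash
# Shared file system for the ASCS, placed before the virtual IP in the group so that
# the instance directory is mounted first.
pcs resource create s4h_ascs_fs Filesystem device="nfsserver:/export/S4H/ASCS00" \
    directory="/usr/sap/S4H/ASCS00" fstype=nfs \
    --group s4h_ascs_group --before s4h_ascs_vip

# The ASCS instance itself; the stickiness keeps it on its current node unless a
# failover is really necessary.
pcs resource create s4h_ascs00 SAPInstance InstanceName="S4H_ASCS00_sapascs" \
    START_PROFILE="/sapmnt/S4H/profile/S4H_ASCS00_sapascs" \
    AUTOMATIC_RECOVER=false meta resource-stickiness=5000 --group s4h_ascs_group

# Shared file system and instance resource for the ERS (ENSA1 setups typically also
# set IS_ERS=true on the ERS resource).
pcs resource create s4h_ers_fs Filesystem device="nfsserver:/export/S4H/ERS10" \
    directory="/usr/sap/S4H/ERS10" fstype=nfs \
    --group s4h_ers_group --before s4h_ers_vip
pcs resource create s4h_ers10 SAPInstance InstanceName="S4H_ERS10_sapers" \
    START_PROFILE="/sapmnt/S4H/profile/S4H_ERS10_sapers" \
    AUTOMATIC_RECOVER=false --group s4h_ers_group
```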
More installation-specific steps are required. They are covered in the next chapter.
The following diagram gives an overview of the requirements for a basic cluster configuration, including the inactive (gray) instance.
After the cluster configuration is completed, and the ASCS/ERS instances and the rest of the system components including lock table replication are operational, the system looks as follows.
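At this point, you can verify the state from both the cluster side and the SAP side; a short sketch, reusing the assumed names from above:

```bash
# Cluster view: both resource groups should be started, on different nodes.
pcs status

# SAP view: list all instances of the system and their status (run against the
# assumed ASCS instance number 00).
su - s4hadm -c "sapcontrol -nr 00 -function GetSystemInstanceList"
```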
At this point, if sapnode2 crashes, the system continues to operate normally with sapnode1.
ERS comes back online when sapnode2 becomes available again.
However, if sapnode1 crashes at this point, the cluster heartbeat mechanism detects the failure and triggers a resource failover, and the ASCS instance is started on sapnode2, alongside the ERS.
The ASCS uses the replicated table to re-create the lock table and resumes operations.
The ERS is then shut down, and it is moved back to sapnode1 when that node is back online.
For a short period, both the ASCS and the ERS run in parallel on the same node. This is necessary because the replication table is kept in memory on the node where the ERS is running. Only after the ASCS finishes reading the replica and re-creating the lock table is the ERS stopped, waiting to be moved to the other node when it comes back online.
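In the cluster, this behavior is expressed with constraints between the two resource groups: a negative colocation score keeps the ERS away from the node where the ASCS is running during normal operation, and an order constraint ensures that, after a failover, the ERS group is stopped only once the ASCS group has started and taken over the lock table. A sketch with the assumed group names; the scores follow common examples from the RHEL HA documentation for SAP.

```bash
# Prefer to run the ERS group on a different node than the ASCS group.
pcs constraint colocation add s4h_ers_group with s4h_ascs_group -5000

# After a failover of the ASCS, first start the ASCS group, then stop the ERS group.
pcs constraint order start s4h_ascs_group then stop s4h_ers_group \
    symmetrical=false kind=Optional
```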
Eventually, after sapnode1 is back online, the ERS instance is started there and builds a new replica of the lock table.
The ASCS is once more highly available.
This concludes the section on High Availability for SAP S/4HANA (NetWeaver).