Configuring Highly Available Virtual Machines

Objectives

After completing this section, you should be able to configure hosts and virtual machines to enable high availability features and failover in the event of host failure.

High Availability for Virtual Machines

A highly available virtual machine is restarted automatically if it crashes, or if its host becomes nonresponsive. When either event occurs, RHV-M restarts the virtual machine, either on its original host or on another host in the cluster.

Red Hat Virtualization Manager constantly monitors hosts and storage to detect hardware failures. With high availability, interruption to service is kept to a minimum because RHV-M restarts virtual machines that are configured to be highly available within seconds, with no user intervention required.

Configuring high availability is a recommended practice for virtual machines running critical workloads.

Virtual machines may be configured to automatically restart if the host becomes nonresponsive, or if the virtual machine unexpectedly crashes. To use this feature, all hosts in the cluster must support an out-of-band power management system, such as iLO, DRAC, RSA, or a network-attached remote power switch that is configured to act as a fencing device.

RHV-M can also automatically restart high priority virtual machines first. Multiple levels of priority give the highest restart priority to the most important virtual machines.
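The effect of restart priority can be pictured with a short Python sketch. The VM class and the numeric priority values are hypothetical and exist only for illustration; they do not represent RHV-M internals.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    priority: int  # hypothetical scale: a larger number means more important


def restart_order(vms):
    """Return the VMs sorted so that the highest priority restarts first."""
    # sorted() is stable, so VMs with equal priority keep their input order.
    return sorted(vms, key=lambda vm: vm.priority, reverse=True)


vms = [VM("build-agent", 1), VM("database", 100), VM("web", 50)]
print([vm.name for vm in restart_order(vms)])
```

In this sketch the database virtual machine is restarted before the web server, and the build agent comes last.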

Note

An alternative method of handling high availability is a cluster configured using Pacemaker. Do not enable RHV-M high availability together with Pacemaker, because the fencing methods will conflict.

Fencing Hosts for VM Integrity

A virtual machine must never run on two hosts at the same time, or its disk image is likely to become corrupt, leading to data loss. To avoid this issue, Red Hat Virtualization uses an out-of-band management agent to fence a nonresponsive host. The agent forces a power off, ensuring that the host and its virtual machines are truly down. Only then does RHV-M restart the virtual machine on a new host.

A host is nonresponsive when RHV-M cannot communicate with it. RHV-M uses fencing to ensure that highly available virtual machines, running on a nonresponsive host, are stopped. Then, RHV-M restarts them on a different host in the cluster.

Red Hat Virtualization 4 and later also support using a special storage volume as a lease, to control whether virtual machines boot on another host when the original host goes down unexpectedly. This feature also prevents two instances of the same virtual machine from running concurrently on different hosts.

Important

There is an important distinction between a Non-Operational host and a Non-Responsive host.

A nonoperational host has encountered a problem, but RHV-M can still communicate with it. RHV-M works with the host to migrate any virtual machines running on that host to operational hosts in the cluster. Likewise, a host that is moved to Maintenance mode automatically migrates all its virtual machines to other operational hosts in the cluster.

A nonresponsive host is one that is not communicating with RHV-M. After about 30 seconds, RHV-M fences that host and restarts any highly available virtual machines on operational hosts in the cluster.
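The distinction between the two states can be summarized as a small decision function. This is a minimal sketch, not RHV-M code: the enum, the function name, and the returned action strings are hypothetical, and the 30-second value is only the approximate figure given in the text.

```python
from enum import Enum


class HostStatus(Enum):
    UP = "up"
    NON_OPERATIONAL = "non_operational"
    NON_RESPONSIVE = "non_responsive"


# Approximate grace period from the text; treat it as an illustrative value.
FENCE_GRACE_SECONDS = 30


def plan_recovery(status, seconds_unreachable=0):
    """Return the (hypothetical) action taken for a host in this state."""
    if status is HostStatus.NON_OPERATIONAL:
        # The manager can still reach the host, so its VMs are migrated.
        return "migrate"
    if status is HostStatus.NON_RESPONSIVE:
        # The host is unreachable: wait out the grace period, then fence it
        # before restarting highly available VMs elsewhere.
        if seconds_unreachable < FENCE_GRACE_SECONDS:
            return "wait"
        return "fence_then_restart_ha_vms"
    return "none"
```

The key point the sketch captures is that migration is only possible while RHV-M can still communicate with the host. Once communication is lost, fencing must come before any restart.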

Configuring a Fence Agent in a Host

RHV-M uses a fence agent to fence nonresponsive hosts. RHV-M does not communicate with the fence agent directly. Instead, it uses VDSM to send power management requests to a fencing proxy, which is one of the other hosts in the same cluster or data center as the nonresponsive host. The proxy host communicates with the fence agent to execute the power management request.

The Power Management tab in the Edit Host and New Host windows includes the power management configuration options for a host.

Figure 13.1: Configuration for host high availability

The Power Management tab includes the following configuration options:

  • The Enable Power Management check box enables power management for the host.

  • The Kdump integration check box prevents the host from being fenced while it performs a kernel crash dump.

  • The Disable policy control of power management check box exempts the host from the power management actions of the cluster scheduling policy.

  • The plus (+) button opens the Edit fence agent window, to configure a new fence agent for a host. This configuration includes parameters like the IP address of the Remote Access Card (RAC), and the username and password used to log in.

  • The Advanced Parameters section specifies the search order for a proxy in the cluster and data center for the host.
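The proxy search order can be illustrated with a short Python sketch. It assumes a search order expressed as the scopes "cluster" and "dc" (data center); the Host class, the function name, and the host names are hypothetical, and the real selection logic in RHV-M may consider additional factors.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cluster: str
    data_center: str
    up: bool


def pick_fencing_proxy(target, hosts, search_order=("cluster", "dc")):
    """Return the first running host that can act as a proxy, or None."""
    for scope in search_order:
        for host in hosts:
            # The nonresponsive host cannot proxy its own fencing request,
            # and a proxy must be running.
            if host.name == target.name or not host.up:
                continue
            if scope == "cluster" and host.cluster == target.cluster:
                return host
            if scope == "dc" and host.data_center == target.data_center:
                return host
    return None


target = Host("hosta", "cluster1", "dc1", up=False)
others = [
    Host("hostb", "cluster2", "dc1", up=True),
    Host("hostc", "cluster1", "dc1", up=True),
]
print(pick_fencing_proxy(target, others).name)
```

With the cluster scope searched first, the sketch prefers hostc, which shares a cluster with the failed host. Reversing the order to ("dc", "cluster") would select hostb instead.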

Configuring a Highly Available Virtual Machine

Virtual machines are configured to be highly available on an individual basis. This configuration can be done when creating the virtual machine, or you can edit an existing VM to enable high availability.

The High Availability tab in the Edit Virtual Machine window includes the high availability configuration options for a virtual machine. To open the Edit Virtual Machine window, right-click on the virtual machine list item, and then click Edit.

Figure 13.2: Configuration for virtual machine high availability

The High Availability tab includes the following configuration options:

  • The Highly Available check box enables high availability for the virtual machine.

  • The Target Storage Domain for VM Lease drop-down menu specifies whether to use a storage lease and, if so, which storage domain holds it. The lease controls whether a virtual machine boots on another host when the original host goes down unexpectedly. To use a storage lease, you must configure a storage domain for the lease.

  • The Priority drop-down menu sets the priority of the virtual machine in the run and migration queue, which determines which virtual machines are restarted or migrated first.
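The same three settings can also be expressed as a REST API request body instead of through the Administration Portal. The following Python sketch builds such a body. The element names reflect our understanding of the version 4 REST API and should be verified against the REST API Guide for your release. The function name, the priority value, and the storage domain ID placeholder are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_ha_update(priority, lease_storage_domain_id=None):
    """Build an XML body that enables high availability for a VM."""
    vm = ET.Element("vm")
    ha = ET.SubElement(vm, "high_availability")
    ET.SubElement(ha, "enabled").text = "true"
    ET.SubElement(ha, "priority").text = str(priority)
    if lease_storage_domain_id is not None:
        # Optional: place the VM lease on the given storage domain.
        lease = ET.SubElement(vm, "lease")
        ET.SubElement(lease, "storage_domain", id=lease_storage_domain_id)
    return ET.tostring(vm, encoding="unicode")


# Placeholder ID, not a real storage domain.
print(build_ha_update(priority=100, lease_storage_domain_id="SD_ID_PLACEHOLDER"))
```

Under these assumptions, the body would be sent in an authenticated update request for the virtual machine resource. Omitting the storage domain ID leaves the lease element out of the body.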

If the following conditions are met, then highly available virtual machines successfully restart when their host becomes nonresponsive:

  • Power management is available for the hosts running the highly available virtual machines.

  • The host running the highly available virtual machine is part of a cluster that has other available hosts.

  • The destination host is running.

  • The source and destination hosts have access to the data domain on which the virtual machine resides.

  • The source and destination hosts have access to the same virtual networks and VLANs.

  • There are enough CPUs on the destination host that are not in use to support the virtual machine requirements.

  • There is enough RAM on the destination host that is not in use to support the virtual machine requirements.
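The destination-side conditions in this list can be combined into a single eligibility check. The following Python sketch is illustrative only: the Host and VM classes and their fields are hypothetical, and the check does not cover the power management or cluster membership conditions, which apply to the source host.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    running: bool
    free_cpus: int
    free_ram_mb: int
    storage_domains: set = field(default_factory=set)
    networks: set = field(default_factory=set)


@dataclass
class VM:
    name: str
    cpus: int
    ram_mb: int
    storage_domain: str
    networks: set = field(default_factory=set)


def can_restart_on(vm, host):
    """Return True if the host meets every destination condition for the VM."""
    return (
        host.running
        and vm.storage_domain in host.storage_domains
        and vm.networks <= host.networks  # every VM network exists on the host
        and host.free_cpus >= vm.cpus
        and host.free_ram_mb >= vm.ram_mb
    )


vm = VM("database", cpus=4, ram_mb=8192, storage_domain="data1", networks={"vlan10"})
host = Host("hostb", True, 8, 16384, {"data1"}, {"vlan10", "vlan20"})
print(can_restart_on(vm, host))
```

A host that fails any one condition, for example a host with too little free RAM, is not a valid restart target in this sketch.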

References

Further information on highly available virtual machine configuration is available in the "Improving Uptime with Virtual Machine High Availability" section of the Virtual Machine Management Guide for Red Hat Virtualization at https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/html-single/virtual_machine_management_guide/index#sect-Improving_Uptime_with_Virtual_Machine_High_Availability

Revision: rh318-4.3-c05018e