Configure a virtual machine to automatically fail over to another cluster node if the node that it runs on becomes unresponsive.
Outcomes
Monitor the connectivity to a web application from two VMs on a failed node.
Identify and drain the failed node that hosts the web application VMs.
Manually recover a VM from the failed node.
Adjust the eviction strategy of a VM.
Delete the node from the cluster.
Restart the node to rejoin the cluster.
As the student user on the workstation machine, use the lab command to prepare your environment for this exercise, and to ensure that all required resources are available.
[student@workstation ~]$ lab start ha-node
Instructions
The lab command creates the ha-node namespace and starts two virtual machines, web1 and web2, which host a web application.
This command also creates a service that load balances client requests between the two VMs, and a route resource for clients to access the web application.
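For reference, the Service and the Route that the lab creates resemble the following manifests. This is only a sketch: the www-web service name, port 80, and the web-ha-node.apps.ocp4.example.com host appear later in this exercise, but the selector label, the route name, and the remaining values are assumptions and might differ from the lab files.
apiVersion: v1
kind: Service
metadata:
  name: www-web
  namespace: ha-node
spec:
  selector:
    app: www            # assumed label; the lab might use a different selector
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: web             # assumed name
  namespace: ha-node
spec:
  host: web-ha-node.apps.ocp4.example.com
  to:
    kind: Service
    name: www-web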
As the admin user, confirm that the two VMs are running in the ha-node project.
Open a web browser and navigate to https://console-openshift-console.apps.ocp4.example.com.
Select and log in as the admin user with redhatocp as the password.
Navigate to → and then select the ha-node project.
Confirm that the web1 and web2 VMs are running.
Verify the eviction and run strategies of the web1 and web2 VMs.
Select the web1 VM, and then navigate to the → menu.
Confirm that the eviction strategy is set to LiveMigrate.
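The web console reflects this setting in the .spec.template.spec.evictionStrategy field of the VM manifest. The following fragment is a sketch of where the field sits; the rest of the manifest is omitted. You can also query the field from the command line with a command such as oc get vm web1 -o jsonpath='{.spec.template.spec.evictionStrategy}'.
spec:
  runStrategy: RerunOnFailure
  template:
    spec:
      evictionStrategy: LiveMigrate
      ...output omitted...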
Navigate to the tab to open the VM's manifest in the YAML editor.
Within the YAML manifest, confirm that the .spec.runStrategy object is set to the RerunOnFailure run strategy.
...output omitted...
spec:
...output omitted...
runStrategy: RerunOnFailure
template:
metadata:
...output omitted...
Navigate to → and select the web2 VM.
Navigate to the → menu.
Confirm that the eviction strategy is set to None.
Navigate to the tab to open the VM's manifest in the YAML editor.
Within the YAML manifest, confirm that the .spec.runStrategy object is set to the RerunOnFailure run strategy.
...output omitted...
spec:
...output omitted...
runStrategy: RerunOnFailure
template:
metadata:
...output omitted...
Confirm that the www-web service endpoints resolve to the IP addresses of the web1 and web2 VMIs.
Identify and monitor the node that runs the web1 and web2 VMIs.
From a command-line window, log in to your Red Hat OpenShift cluster as the admin user with redhatocp as the password.
[student@workstation ~]$ oc login -u admin -p redhatocp \
https://api.ocp4.example.com:6443
Login Successful
...output omitted...
Change to the ha-node project.
[student@workstation ~]$ oc project ha-node
Now using project "ha-node" on server "https://api.ocp4.example.com:6443".
Use the oc command to list the VMI resources in the ha-node project.
Note the IP addresses of the web1 and web2 VM instances, and the node that hosts the VMIs.
The IP addresses might differ in your environment.
[student@workstation ~]$ oc get vmi
NAME   AGE   PHASE     IP           NODENAME   READY
web1   18m   Running   10.11.0.24   worker01   True
web2   17m   Running   10.11.0.29   worker01   True
Confirm that the www-web service has active endpoints that resolve to the IP addresses of the web1 and web2 VMIs.
[student@workstation ~]$ oc get endpoints
NAME      ENDPOINTS                     AGE
www-web   10.11.0.24:80,10.11.0.29:80   18m
Open a command-line window and execute the loop.sh file in the ~/DO316/labs/ha-node/ directory.
The loop.sh file executes the curl command against the web-ha-node.apps.ocp4.example.com route.
Leave the command running.
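The script is roughly equivalent to the following shell loop. This is a sketch based on the description above; the actual contents of loop.sh in ~/DO316/labs/ha-node/ might differ, and the one-second sleep is an assumption.
#!/bin/bash
# Repeatedly request the application route to see which VM answers each request.
while true; do
  curl -s http://web-ha-node.apps.ocp4.example.com
  sleep 1
done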
[student@workstation ~]$ sh /home/student/DO316/labs/ha-node/loop.sh
Welcome to web1
Welcome to web2
Welcome to web1
...output omitted...
Open a command-line window and use the watch command to monitor the availability of the VMIs during this exercise.
Leave the command running.
[student@workstation ~]$ watch oc get vmi
Every 2.0s: oc get vmi        workstation.lab.example.com:...

NAME   AGE   PHASE     IP           NODENAME   READY
web1   24m   Running   10.11.0.24   worker01   True
web2   22m   Running   10.11.0.29   worker01   True
Developers report that resources on the node that runs the VMIs are experiencing performance and connectivity issues. These issues do not affect resources on other cluster nodes.
As the cluster administrator, you suspect that the node is failing due to an incorrect configuration.
Prevent new workloads from running on the node that runs the VMIs, and then drain the node of its current workloads. Manually recover and adjust the eviction strategy of a VM on the failed node.
Then, power off the node and delete it from the cluster.
On the workstation machine, open a command-line window and mark the worker01 node as not schedulable, with the oc adm cordon command.
[student@workstation ~]$ oc adm cordon worker01
node/worker01 cordoned
Confirm that the node has the Ready,SchedulingDisabled status.
[student@workstation ~]$ oc get node worker01
NAME       STATUS                     ROLES    AGE     VERSION
worker01   Ready,SchedulingDisabled   worker   4d16h   v1.27.10+28ed2d7
Drain the node of its workloads.
[student@workstation ~]$ oc adm drain worker01 --ignore-daemonsets=true \
--delete-emptydir-data --force
node/worker01 already cordoned
...output omitted...
node/worker01 drained
Monitor the command-line window that executes the loop.sh command.
Observe the high availability of the web application.
...output omitted...
Welcome to web1
Welcome to web1
Welcome to web1
Welcome to web1
Welcome to web1
Welcome to web1
...output omitted...
Monitor the command-line window that executes the watch command.
Notice that OpenShift sets the phase of the web2 VMI to Succeeded and its ready status to False, because OpenShift shuts down the web2 VMI.
Kubernetes cannot automatically relocate the web2 VM to a healthy node in the cluster, because the web2 VMI uses the None eviction strategy and you did not configure machine health checks.
Every 2.0s: oc get vmi        workstation.lab.example.com:...

NAME   AGE   PHASE       IP           NODENAME   READY
web1   36m   Running     10.8.2.55    worker02   True
web2   34m   Succeeded   10.11.0.29   worker01   False
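A machine health check would let the Machine API detect the unhealthy node and remediate it automatically, instead of requiring the manual recovery that follows. The next manifest is only an illustrative sketch and is not part of this exercise; the name, selector label, timeouts, and maxUnhealthy value are assumptions.
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: worker-health-check        # assumed name
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: worker
  unhealthyConditions:
  - type: Ready
    status: "False"
    timeout: 300s
  - type: Ready
    status: "Unknown"
    timeout: 300s
  maxUnhealthy: 40%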
Configure the web2 VM with the LiveMigrate eviction strategy, and then manually recover the VM from the failed node.
From the web console, navigate to → .
Select the web2 VMI and click the menu.
Navigate to the section, and click in the Eviction strategy subsection.
Click the flag and click .
Click → to power on and reschedule the VM on another node in the cluster.
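If you prefer the command line over the web console, a roughly equivalent approach is to patch the VM and then power it on with virtctl. This is a sketch; the exercise itself uses the console steps above.
[student@workstation ~]$ oc patch vm web2 --type merge \
  -p '{"spec":{"template":{"spec":{"evictionStrategy":"LiveMigrate"}}}}'
[student@workstation ~]$ virtctl start web2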
Monitor the command-line window where the loop.sh command is running, and observe that the web1 VMI serves all the requests until the web2 VMI reaches the Running status.
...output omitted...
Welcome to web1
Welcome to web1
Welcome to web1
Welcome to web1
Welcome to web2
Welcome to web2
Welcome to web1
...output omitted...
Navigate to the command-line window where the watch command is running.
The worker02 node hosts the web1 VMI, and the master02 node hosts the web2 VMI.
Nodes might differ in your environment.
Every 2.0s: oc get vmi        workstation.lab.example.com:...

NAME   AGE     PHASE     IP          NODENAME   READY
web1   43m     Running   10.8.2.55   worker02   True
web2   3m31s   Running   10.9.0.41   master02   True
Delete the node from the cluster.
To prevent potential data corruption, power off the drained node.
From the Lab Environment page for this course, locate the drained host machine, click , and then click .
Wait for the machine to display the Stopped status before proceeding.
Return to the command-line window on the workstation machine where you initiated the node drain.
Delete the drained node from the cluster.
[student@workstation ~]$ oc delete node worker01
node "worker01" deletedList the nodes to confirm that the deleted node is no longer available.
[student@workstation ~]$ oc get nodes
NAME       STATUS   ROLES                         AGE     VERSION
master01   Ready    control-plane,master,worker   14d     v1.27.10+28ed2d7
master02   Ready    control-plane,master,worker   14d     v1.27.10+28ed2d7
master03   Ready    control-plane,master,worker   14d     v1.27.10+28ed2d7
worker02   Ready    worker                        4d16h   v1.27.10+28ed2d7
Instruct the deleted node to rejoin the cluster.
From the Lab Environment page for this course, locate the deleted host machine, click , and then click .
Wait for the machine to display the Active status before proceeding.
From the command-line window on the workstation machine, list the nodes to confirm that the deleted node rejoined the cluster.
It might take a few minutes for the node to display the Ready status.
[student@workstation ~]$ oc get nodes
NAME       STATUS   ROLES                         AGE     VERSION
master01   Ready    control-plane,master,worker   14d     v1.27.10+28ed2d7
master02   Ready    control-plane,master,worker   14d     v1.27.10+28ed2d7
master03   Ready    control-plane,master,worker   14d     v1.27.10+28ed2d7
worker01   Ready    worker                        3m7s    v1.27.10+28ed2d7
worker02   Ready    worker                        4d16h   v1.27.10+28ed2d7
In the command-line window that is executing the loop.sh command, press Ctrl+C to stop the command.
Close the command-line window.
In the command-line window that is executing the watch command, press Ctrl+C to stop the command.
Close the command-line window.