Node Maintenance and OpenShift Virtualization Updates

Objectives

  • Set a node into maintenance mode and describe how the OpenShift Virtualization update process affects VMs.

Node Maintenance

To prepare a node for maintenance, cluster administrators must perform two operations:

  • Cordon off the node to prevent the cluster from deploying new workloads on the node.

  • Drain the node to move the current workload, including pods and VMs, to the remaining nodes.

Moving a Node into Maintenance from the Command Line

To prevent the cluster from scheduling new workloads on the node, use the oc adm cordon command:

[user@host ~]$ oc adm cordon node2
node/node2 cordoned

The node is cordoned and marked as unschedulable with the SchedulingDisabled status. You can verify the node status with the oc get node command:

[user@host ~]$ oc get node node2
NAME       STATUS                     ROLES    AGE     VERSION
node2      Ready,SchedulingDisabled   worker   4d20h   v1.27.10+28ed2d7
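In a script, it can be more reliable to read the node's spec.unschedulable field than to parse the STATUS column. The following is a minimal sketch; the is_cordoned function name is hypothetical and is not part of the oc client.

```shell
# Hypothetical helper: succeeds (exit status 0) when the node is cordoned.
# A cordoned node has spec.unschedulable set to true. The field is absent
# on a schedulable node, so the jsonpath output is then empty.
is_cordoned() {
  [ "$(oc get node "$1" -o jsonpath='{.spec.unschedulable}')" = "true" ]
}

# Example usage (requires a logged-in oc session):
#   is_cordoned node2 && echo "node2 is cordoned"
```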

To move the current workload to the remaining nodes, use the oc adm drain command:

[user@host ~]$ oc adm drain node2 \
  --delete-emptydir-data \
  --ignore-daemonsets \
  --force
...output omitted...
node/node2 drained

The preceding command uses the following options:

  • The --delete-emptydir-data option prevents the drain operation from failing when some pods or VMs use the local node storage as an ephemeral volume. In that case, RHOCP restarts the pod or VM on the other nodes with a new empty volume.

  • With the --ignore-daemonsets option, RHOCP skips the daemon set pods. Evicting those pods would serve no purpose, because the daemon set controller would immediately re-create them on the node.

  • The --force option prevents the drain operation from failing when the node runs pods that no controller, such as a replica set or a job, manages. The drain operation deletes those pods, and nothing re-creates them on another node.

Note

The oc adm drain command automatically cordons the node.

After the maintenance is done, you can remove the node from maintenance mode by using the oc adm uncordon command:

[user@host ~]$ oc adm uncordon node2
node/node2 uncordoned
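The drain, maintenance, and uncordon steps can be combined in a small wrapper. The following is a sketch rather than a recommended procedure. The with_maintenance function name is hypothetical, and the wrapper assumes that the maintenance task can run as a single command from the workstation.

```shell
# Hypothetical wrapper: drain a node, run a maintenance command, and
# uncordon the node only when that command succeeds.
# Usage: with_maintenance NODE COMMAND [ARGS...]
with_maintenance() {
  node=$1
  shift
  # Draining also cordons the node, so no separate cordon step is needed.
  oc adm drain "$node" \
    --delete-emptydir-data \
    --ignore-daemonsets \
    --force || return
  # If the maintenance command fails, return its exit status and leave
  # the node cordoned so that you can investigate.
  "$@" || return
  oc adm uncordon "$node"
}

# Example usage (my-maintenance-task is a placeholder for your own command):
#   with_maintenance node2 my-maintenance-task
```

The wrapper deliberately keeps the node unschedulable after a failed drain or a failed maintenance command. Run oc adm uncordon manually after you resolve the problem.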

The Node Maintenance Operator

The Node Maintenance operator enables cluster administrators to declaratively place nodes into maintenance by creating a NodeMaintenance custom resource. For tracking long maintenance operations, such as hardware replacement, you can attach a message to the NodeMaintenance resource as a reason for the maintenance.

As with other operators, you can install the Node Maintenance operator by using either the OpenShift web console or the command line.

Consult the references section for installing an operator by using the OperatorHub or the oc command.

Moving a Node into Maintenance with the Node Maintenance Operator

As a cluster administrator, you set a node in maintenance mode by creating a NodeMaintenance resource.

From the OpenShift web console, navigate to Operators → Installed Operators, and open the Node Maintenance Operator page. In the Node Maintenance card, click Create instance.

Complete the form with the node name and an optional maintenance message, and then click Create.

When a node is set to maintenance mode, the Node Maintenance operator cordons and drains the node.

On the Node Maintenance Operator page, click the Node Maintenance tab to get the status of the node maintenance. The node maintenance is in the Succeeded state when the draining process is complete.

You can instead use the command line to create a NodeMaintenance resource. The following example shows the resource file that sets the node2 node in maintenance mode:

apiVersion: nodemaintenance.medik8s.io/v1beta1
kind: NodeMaintenance
metadata:
  name: maintenance-node2
spec:
  nodeName: node2
  reason: "Node maintenance"

Use the oc apply -f resourcefile.yaml command to create the resource from the preceding file.
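When you maintain several nodes, you can generate the manifest instead of editing a file for each node. The following sketch assumes the maintenance-NODE naming pattern from the preceding example. The render_node_maintenance function name is hypothetical.

```shell
# Hypothetical helper: print a NodeMaintenance manifest for a node.
# Usage: render_node_maintenance NODE REASON
# The reason must not contain double quotation marks, because the
# function inserts it into a double-quoted YAML string without escaping.
render_node_maintenance() {
  cat <<EOF
apiVersion: nodemaintenance.medik8s.io/v1beta1
kind: NodeMaintenance
metadata:
  name: maintenance-$1
spec:
  nodeName: $1
  reason: "$2"
EOF
}

# Example usage: oc apply reads the manifest from standard input when
# the file name is "-":
#   render_node_maintenance node2 "Replace memory" | oc apply -f -
```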

You can follow the draining process by using the oc describe NodeMaintenance command:

[user@host ~]$ oc describe NodeMaintenance maintenance-node2
...output omitted...
Status:
  Drain Progress:  100
  Phase:          Succeeded
  Totalpods:      47
...output omitted...
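To wait for the drain to finish in a script, you can poll the phase instead of reading the oc describe output. The following sketch assumes that the Phase line in the preceding output corresponds to the status.phase field of the resource. The wait_for_maintenance function name and its default values are hypothetical.

```shell
# Hypothetical helper: poll a NodeMaintenance resource until its phase
# is Succeeded. Returns 1 if the phase is not reached in time.
# Usage: wait_for_maintenance NAME [ATTEMPTS] [DELAY_SECONDS]
wait_for_maintenance() {
  name=$1
  attempts=${2:-60}
  delay=${3:-10}
  while [ "$attempts" -gt 0 ]; do
    # Assumption: the phase is stored in the status.phase field.
    phase=$(oc get NodeMaintenance "$name" -o jsonpath='{.status.phase}')
    [ "$phase" = "Succeeded" ] && return 0
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 1
}

# Example usage:
#   wait_for_maintenance maintenance-node2 && echo "node2 is drained"
```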

To verify that the node does not accept new workloads, confirm that the node has the SchedulingDisabled status:

[user@host ~]$ oc get nodes
NAME    STATUS                     ROLES           AGE   VERSION
...output omitted...
node1   Ready                      worker          13d   v1.27.10+28ed2d7
node2   Ready,SchedulingDisabled   worker          13d   v1.27.10+28ed2d7

To remove the node from maintenance mode and prepare it to accept new workloads, delete the NodeMaintenance resource.

From the OpenShift web console, on the Node Maintenance page, click the vertical ellipsis icon next to the node maintenance line, and then click Delete Node Maintenance.

From the command line, use the oc delete NodeMaintenance command:

[user@host ~]$ oc delete NodeMaintenance maintenance-node2
nodemaintenances.nodemaintenance.medik8s.io "maintenance-node2" deleted

Note

OpenShift does not move the pods and VMs that were running on the node before maintenance back to the original node.

Live Migrate Virtual Machine During Node Drain

When you create a VM, the eviction strategy is set to LiveMigrate by default.

During node drain, OpenShift Virtualization migrates these VMs live. For VMs with the eviction strategy of None, OpenShift Virtualization shuts down the VMs, and then restarts them on another node.

Important

Even though the eviction strategy is set, some VMs might not support live migration. For example, OpenShift Virtualization can live migrate only VMs with storage that supports the ReadWriteMany (RWX) access mode.

VMs that cannot be live migrated prevent the drain process from completing. A cluster administrator must manually shut down those VMs or disable live migration for them.

To verify whether a VM supports live migration, navigate to Virtualization → VirtualMachines and then select the VM. When a VM does not support live migration, OpenShift Virtualization adds the Not migratable badge to the VM.

You can navigate to the Diagnostics tab to get more information about this status.

Figure 7.8: Verifying the VM live migration status
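Before you drain a node, you can approximate this check from the command line for all the VMs on that node. The following sketch reads the LiveMigratable condition and the status.nodeName field of each running VM instance (VMI). Both function names are hypothetical, and the sketch treats a missing condition as not migratable, which is an assumption.

```shell
# Hypothetical helper: print one tab-separated line per VMI with its
# namespace, name, node, and the status of its LiveMigratable condition.
list_vmi_migratability() {
  oc get vmi --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.nodeName}{"\t"}{.status.conditions[?(@.type=="LiveMigratable")].status}{"\n"}{end}'
}

# Hypothetical filter: keep the VMIs on the given node whose
# LiveMigratable condition is not True, and print NAMESPACE/NAME.
blocking_vmis() {
  awk -F '\t' -v node="$1" '$3 == node && $4 != "True" { print $1 "/" $2 }'
}

# Example usage:
#   list_vmi_migratability | blocking_vmis node2
```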

For these VMs, set the eviction strategy to None to prevent the drain process from blocking.

Navigate to the Configuration tab, and click Scheduling to view the scheduling parameters. Click the LiveMigrate flag from the Eviction Strategy parameter, clear the LiveMigrate checkbox, and click Save.

You must restart the VM to apply the changes.
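The same change can be sketched from the command line by patching the VM resource. The following example assumes that the eviction strategy is stored in the spec.template.spec.evictionStrategy field of the VirtualMachine resource. The set_eviction_strategy function name is hypothetical.

```shell
# Hypothetical helper: set the eviction strategy of a VM with a merge patch.
# Usage: set_eviction_strategy NAMESPACE VM STRATEGY
set_eviction_strategy() {
  oc patch vm "$2" -n "$1" --type merge \
    -p "{\"spec\":{\"template\":{\"spec\":{\"evictionStrategy\":\"$3\"}}}}"
}

# Example usage (the prod namespace and the db VM are placeholders):
#   set_eviction_strategy prod db None
# Then restart the VM, for example from the web console.
```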

Update Red Hat OpenShift Container Platform

Red Hat OpenShift Container Platform uses a software distribution system that provides the best upgrade path to update your cluster and the underlying operating system.

This distribution system enables clusters to update directly from the internet, and provides all the necessary resources to update a cluster to a particular version. It also enables a cluster to use new features as they become available, including the latest bug fixes and security patches.

Note

An on-premises version of the distribution system is available for clusters without internet access.

The Cluster Version Operator (CVO), which runs in your cluster, regularly contacts the OpenShift Update Service (OSUS) on the internet. The CVO provides the current component versions, and the OSUS returns an update path.

The OSUS computes the update path from your selected channel for your cluster. A channel corresponds to a minor RHOCP version, such as 4.14.

Navigate to Administration → Cluster Settings to manage your cluster updates.

The following screen capture shows the path to update the current 4.13.5 cluster to the next minor version, 4.14. After the cluster administrator selects the stable-4.14 channel, the CVO automatically contacts the OSUS to get the update path. The path shows that to migrate from 4.13.5, which is the current cluster version, to 4.14, you must first update your cluster to version 4.13.37.

Figure 7.9: Updating from 4.13.5 to 4.14
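You can retrieve similar information from the command line with the oc adm upgrade command. The following is a minimal sketch; the show_update_path function name is hypothetical. Note that the first command changes the channel of the cluster, so run it only when you intend to switch channels.

```shell
# Hypothetical helper: select an update channel, and then print the
# update status, which lists the updates available in that channel.
# Usage: show_update_path CHANNEL
show_update_path() {
  oc adm upgrade channel "$1" && oc adm upgrade
}

# Example usage:
#   show_update_path stable-4.14
```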

The Machine Config Operator (MCO) updates the operating system on the cluster nodes during the update process. Before it updates a node, the MCO cordons and drains that node so that the pods and VMs keep running on the other nodes.

Update Red Hat OpenShift Virtualization

For Red Hat to support your installation, the version of the OpenShift Virtualization operator must match the version of RHOCP. By default, RHOCP automatically updates the operator.

Navigate to Operators → Installed Operators and then click OpenShift Virtualization to access the operator details. From the Subscription tab, review the Update approval section to confirm that RHOCP automatically applies the updates.

Figure 7.10: Confirming automatic updating of the operator
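You can also read the approval strategy from the operator subscription on the command line. The following sketch assumes that the OpenShift Virtualization operator is installed in the openshift-cnv namespace, which this section does not state. The subscription_approvals function name is hypothetical.

```shell
# Hypothetical helper: print the name and the approval strategy
# (Automatic or Manual) of each operator subscription in a namespace.
# Usage: subscription_approvals NAMESPACE
subscription_approvals() {
  oc get subscriptions.operators.coreos.com -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.installPlanApproval}{"\n"}{end}'
}

# Example usage (assumption: the operator is in the openshift-cnv namespace):
#   subscription_approvals openshift-cnv
```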

Red Hat recommends using the automatic update process.

Virtual Machine Pod Updates

The updates might include virt-launcher pod components, such as the libvirt or qemu services. To update these pods, OpenShift Virtualization migrates the VMs live. During the live migration, OpenShift Virtualization starts the new updated virt-launcher pods.

If a VM does not support live migration, then OpenShift Virtualization does not update the pod until you restart the VM.
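To find the VMs that still need a restart after an update, you can list the VM instances with an outdated virt-launcher pod. The following sketch assumes that OpenShift Virtualization marks those instances with the kubevirt.io/outdatedLauncherImage label. Verify the label against the documentation for your version. The outdated_launchers function name is hypothetical.

```shell
# Hypothetical helper: list the VMIs whose virt-launcher pod still runs
# an outdated image.
# Assumption: these VMIs carry the kubevirt.io/outdatedLauncherImage label.
outdated_launchers() {
  oc get vmi --all-namespaces -l kubevirt.io/outdatedLauncherImage
}

# Example usage:
#   outdated_launchers
```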

References

For more information about node maintenance, refer to the Understanding How to Evacuate Pods on Nodes section in the Red Hat OpenShift Container Platform Nodes guide at https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/html-single/nodes/index#nodes-nodes-working-evacuating_nodes-nodes-working

For more information about the Node Maintenance operator, refer to the Placing Nodes in Maintenance Mode with Node Maintenance Operator chapter in the Workload Availability for Red Hat OpenShift guide at https://access.redhat.com/documentation/en-us/workload_availability_for_red_hat_openshift/24.1/html-single/remediation_fencing_and_maintenance/index#node-maintenance-operator

For more information about virtual machines and node maintenance, refer to the Node Maintenance chapter in the Red Hat OpenShift Container Platform Virtualization guide at https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/html-single/virtualization/index#virt-node-maintenance

For more information about updating RHOCP, refer to the Red Hat OpenShift Container Platform Updating Clusters guide at https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/html-single/updating_clusters/index

For more information about updating the OpenShift Virtualization operator, refer to the Updating OpenShift Virtualization chapter in the Red Hat OpenShift Container Platform Virtualization guide at https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/html-single/virtualization/index#upgrading-virt

Revision: do316-4.14-d8a6b80