Guided Exercise: Node Maintenance and OpenShift Virtualization Updates

Place compute nodes in maintenance mode, drain the workloads from those nodes, and observe the maintenance mode status.

Outcomes

  • Mark a node as unschedulable.

  • Drain the workloads from nodes.

  • Monitor node maintenance status.

  • Resume a node from maintenance mode.

As the student user on the workstation machine, use the lab command to prepare your system for this exercise.

This command ensures that the cluster API is reachable and creates the required resources for this exercise.

[student@workstation ~]$ lab start advanced-maintenance

Instructions

  1. As the admin user, confirm that the vm1 VM is running in the advanced-maintenance project.

    1. Log in to the OpenShift cluster as the admin user with redhatocp as the password.

      [student@workstation ~]$ oc login -u admin -p redhatocp \
        https://api.ocp4.example.com:6443
      Login successful.
      
      ...output omitted...
    2. Change to the advanced-maintenance project.

      [student@workstation ~]$ oc project advanced-maintenance
      Now using project "advanced-maintenance" on server "https://api.ocp4.example.com:6443".
    3. Use the oc get vmi command to verify that the vm1 VM is running. Note the name of the node that runs the VM. You use that node name in a later step.

      [student@workstation ~]$ oc get vmi
      NAME     AGE   PHASE     IP           NODENAME   READY
      vm1      50m   Running   10.10.0.29   worker02   True

      The node might be different in your environment.

  2. To prepare for maintenance mode, verify that OpenShift Virtualization can live migrate the vm1 VM.

    1. Use the -o wide option with the oc get vmi command to get additional information about the VMs. Verify that the LIVE-MIGRATABLE column shows True for the vm1 VM.

      [student@workstation ~]$ oc get vmi -o wide
      NAME  AGE  PHASE    IP         NODENAME  READY  LIVE-MIGRATABLE  PAUSED
      vm1   53m  Running  10.10.0.29 worker02  True   True
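
    The same checks can be scripted. The following is a minimal sketch, assuming that the LiveMigratable condition and the status.nodeName field of the VirtualMachineInstance resource hold the values that the LIVE-MIGRATABLE and NODENAME columns display. The vmi_live_migratable and vmi_node function names are hypothetical helpers and are not part of the lab environment.

    ```shell
    # Print the status of the LiveMigratable condition ("True" or "False").
    # Argument: the VMI name in the current project.
    vmi_live_migratable() {
      oc get vmi "$1" \
        -o jsonpath='{.status.conditions[?(@.type=="LiveMigratable")].status}'
    }

    # Print the name of the node that currently runs the VMI.
    # Argument: the VMI name in the current project.
    vmi_node() {
      oc get vmi "$1" -o jsonpath='{.status.nodeName}'
    }
    ```

    After you define the functions in your shell, run vmi_live_migratable vm1 and vmi_node vm1 to print the two values without reading the table output.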
  3. Cordon off the node that runs the VM.

    1. Use the oc adm cordon command to mark the node as unschedulable. In the following command, replace the worker02 node with the node name from the previous step.

      [student@workstation ~]$ oc adm cordon worker02
      node/worker02 cordoned
    2. Confirm that the node has the SchedulingDisabled status.

      [student@workstation ~]$ oc get node worker02
      NAME       STATUS                     ROLES    AGE     VERSION
      worker02   Ready,SchedulingDisabled   worker   5d20h   v1.27.10+28ed2d7
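
    Cordoning a node sets the spec.unschedulable field on the Node resource, which is what the SchedulingDisabled status reflects. The following sketch reads that field directly. The node_scheduling_status function name is a hypothetical helper, and the sketch assumes that the field is unset or false on a node that is not cordoned.

    ```shell
    # Report whether a node is cordoned by reading .spec.unschedulable.
    # Argument: the node name.
    node_scheduling_status() {
      if [ "$(oc get node "$1" -o jsonpath='{.spec.unschedulable}')" = "true" ]; then
        echo "SchedulingDisabled"
      else
        echo "Schedulable"
      fi
    }
    ```

    You can reuse the same helper after you uncordon the node to confirm that scheduling is enabled again.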
  4. Evacuate the workload from the node.

    1. Run the oc adm drain command to evacuate all workloads from the node. In the following command, replace the worker02 node with the node name from the previous step. The command might take a few minutes to complete.

      [student@workstation ~]$ oc adm drain worker02 \
        --delete-emptydir-data --ignore-daemonsets
      node/worker02 already cordoned
      ...output omitted...
      evicting pod advanced-maintenance/virt-launcher-vm1-n75xj
      error when evicting pods/"virt-launcher-vm1-n75xj" -n "advanced-maintenance" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
      evicting pod advanced-maintenance/virt-launcher-vm1-n75xj
      pod/virt-launcher-vm1-n75xj evicted
      node/worker02 drained

      Ignore the error messages about the failed eviction of the virt-launcher pod. OpenShift Virtualization can delete the pod only after its migration is complete and a new virt-launcher pod is running on another node.

    2. Confirm that the VM is now running on another node.

      [student@workstation ~]$ oc get vmi
      NAME                  AGE   PHASE     IP           NODENAME   READY
      vm1                   63m   Running   10.10.0.40   master03   True

      In this example, the vm1 VM is now on the master03 node.
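
    Because the drain command retries the eviction until the live migration completes, a script that follows a drain might need to wait for the VM to report a new node. The following is a minimal sketch of such a wait loop, assuming that the status.nodeName field of the VMI changes when the migration finishes. The wait_vmi_off_node function name and the POLL_INTERVAL variable are hypothetical and are introduced only for illustration.

    ```shell
    # Poll until the VMI reports a node other than the drained one, or time out.
    # Arguments: VMI name, drained node name, optional timeout in seconds (default 300).
    # POLL_INTERVAL (hypothetical variable) sets the seconds between checks (default 5).
    wait_vmi_off_node() {
      vmi=$1
      old_node=$2
      timeout=${3:-300}
      waited=0
      while [ "$waited" -lt "$timeout" ]; do
        node=$(oc get vmi "$vmi" -o jsonpath='{.status.nodeName}')
        if [ -n "$node" ] && [ "$node" != "$old_node" ]; then
          # Print the new node name and report success.
          echo "$node"
          return 0
        fi
        sleep "${POLL_INTERVAL:-5}"
        waited=$((waited + ${POLL_INTERVAL:-5}))
      done
      # The VMI did not leave the node within the timeout.
      return 1
    }
    ```

    For example, wait_vmi_off_node vm1 worker02 prints the name of the new node when the VM has moved, and returns a nonzero exit status if the VM is still on the worker02 node after the timeout.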

  5. Resume the node from maintenance mode.

    1. Use the oc adm uncordon command to make the node schedulable again and end maintenance mode. Replace the worker02 node in the following command with the name of the node that you cordoned in a previous step.

      [student@workstation ~]$ oc adm uncordon worker02
      node/worker02 uncordoned
    2. Confirm that the node is no longer in maintenance mode.

      [student@workstation ~]$ oc get nodes
      NAME       STATUS   ROLES                         AGE     VERSION
      master01   Ready    control-plane,master,worker   15d     v1.27.10+28ed2d7
      master02   Ready    control-plane,master,worker   15d     v1.27.10+28ed2d7
      master03   Ready    control-plane,master,worker   15d     v1.27.10+28ed2d7
      worker01   Ready    worker                        5d20h   v1.27.10+28ed2d7
      worker02   Ready    worker                        5d20h   v1.27.10+28ed2d7
  6. As an alternative to the oc adm cordon and oc adm drain commands from the preceding steps, use the Node Maintenance operator to set a node in maintenance mode and to drain its workload.

    As the admin user, use the OpenShift web console to install the Node Maintenance operator.

    1. Open a web browser and navigate to https://console-openshift-console.apps.ocp4.example.com

    2. Click htpasswd_provider and log in as the admin user with redhatocp as the password.

    3. Click Operators → OperatorHub and select All Projects from the Projects drop-down menu. In the Filter by keyword field, type maintenance to locate the Node Maintenance operator, and then click Node Maintenance Operator.

    4. The web console displays information about the Node Maintenance operator. Click Install to proceed to the Install Operator page.

    5. Click Install to install the operator with the default options in the openshift-workload-availability namespace.

    6. Wait until the installation is complete and the web console displays the ready for use message.

  7. Use the OpenShift web console to retrieve the name of the node that runs the vm1 VM.

    1. Navigate to Virtualization → VirtualMachines. Select the advanced-maintenance project from the Projects list.

    2. Select the vm1 VM, click the Details tab, and note the name of the node that runs the VM. You use that node name in the next step.

      The node might be different in your environment.

  8. Create a NodeMaintenance resource with the Node Maintenance operator to set the node from the previous step in maintenance mode and to drain its workload.

    1. Navigate to Operators → Installed Operators and open the Node Maintenance Operator page.

    2. In the Node Maintenance card, click Create instance to create a NodeMaintenance resource.

    3. Complete the Create Node Maintenance form by using the following information:

      Field       Value
      Name        live-migration-test
      Node Name   Use the node name from the previous step
      Reason      Testing Live Migration
    4. Click Create to create the NodeMaintenance resource.

    5. Observe the live-migration-test status and wait until the node maintenance state is Succeeded.
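
    The form creates a NodeMaintenance custom resource, which you can also create from the command line. The following is a sketch under stated assumptions: it assumes that the installed operator serves the nodemaintenance.medik8s.io/v1beta1 API version and reports progress in the status.phase field. The node_maintenance_manifest and node_maintenance_phase function names are hypothetical helpers.

    ```shell
    # Print a NodeMaintenance manifest on standard output.
    # Arguments: resource name, node name, reason.
    # Assumption: the operator serves the nodemaintenance.medik8s.io/v1beta1 API.
    node_maintenance_manifest() {
      printf '%s\n' \
        'apiVersion: nodemaintenance.medik8s.io/v1beta1' \
        'kind: NodeMaintenance' \
        'metadata:' \
        "  name: $1" \
        'spec:' \
        "  nodeName: $2" \
        "  reason: \"$3\""
    }

    # Print the phase that the operator reports for a NodeMaintenance resource.
    # Argument: the NodeMaintenance resource name.
    node_maintenance_phase() {
      oc get nodemaintenance "$1" -o jsonpath='{.status.phase}'
    }
    ```

    To apply the manifest, pipe the output of the first helper to the oc apply command, and replace the worker02 node with the node name from the previous step:

      [student@workstation ~]$ node_maintenance_manifest live-migration-test \
        worker02 "Testing Live Migration" | oc apply -f -

    Then run node_maintenance_phase live-migration-test until it prints Succeeded.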

  9. Verify that the node does not accept new workloads and that the VM is now running on another node.

    1. Navigate to Compute → Nodes and confirm that the node has the Scheduling disabled status.

    2. Navigate to Virtualization → VirtualMachines, click the vm1 VM, and then click the Details tab. Observe the node name and confirm that the VM is running on a different node from the previous step.

  10. End the node maintenance.

    1. Navigate to Operators → Installed Operators and open the Node Maintenance Operator page. Click the Node Maintenance tab.

    2. On the Node Maintenance page, click the vertical ellipsis icon next to the live-migration-test node maintenance, and then click Delete Node Maintenance. Click Delete to confirm the operation and resume the node from maintenance mode.

    3. Navigate to Compute → Nodes and confirm that the node no longer has the Scheduling disabled status.
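
    Deleting the NodeMaintenance resource from the command line has the same effect as the web console steps. The following sketch deletes the resource and then checks the spec.unschedulable field of the node. The end_node_maintenance function name is a hypothetical helper, and the sketch assumes that the operator uncordons the node before the deletion completes. If that assumption does not hold in your environment, the check might report the node as still cordoned for a short time.

    ```shell
    # Delete a NodeMaintenance resource and confirm that the node is schedulable.
    # Arguments: NodeMaintenance resource name, node name.
    end_node_maintenance() {
      oc delete nodemaintenance "$1" || return 1
      if [ "$(oc get node "$2" -o jsonpath='{.spec.unschedulable}')" = "true" ]; then
        echo "$2 is still cordoned"
        return 1
      fi
      echo "$2 is schedulable"
    }
    ```

    For example, run end_node_maintenance live-migration-test worker02, and replace the worker02 node with the node that you put in maintenance mode.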

Finish

On the workstation machine, use the lab command to complete this exercise. This step is important to ensure that resources from previous exercises do not impact upcoming exercises.

[student@workstation ~]$ lab finish advanced-maintenance

Revision: do316-4.14-d8a6b80