Lab: Pod Scheduling

Configure a workload to ensure that its pods run on dedicated nodes, and configure a node to prevent pods from other workloads from running on it.

Configure applications for resilience against node failures.

Outcomes

  • Create deployments with node selectors to specify node pools for a workload.

  • Configure pods with tolerations for nodes that use taints.

  • Use pod disruption budgets to minimize downtime during cluster updates and other node maintenance processes.

  • Use pod affinity rules to ensure that pods from different workloads run on the same nodes.

As the student user on the workstation machine, use the lab command to prepare your environment for this exercise, and to ensure that all required resources are available.

[student@workstation ~]$ lab start scheduling-review

Instructions

Your company has a cluster with three compute nodes with the following characteristics:

Servers might have different hardware. The worker01 and worker02 nodes have SATA SSD disks and include the disk=ssd label. The worker03 node has NVMe disks and includes the disk=nvme label. The NVMe disks are faster than the SATA SSD disks. Because the cluster has limited resources, the worker01 node includes the application=ml:NoSchedule taint to reserve some workload capacity for a machine learning application that is critical for your company.

Your company requires you to create a machine learning deployment that can run on the node with the taint. The name for the deployment must be review-toleration. This deployment uses the registry.ocp4.example.com:8443/ubi9/ubi:9.0.0-1468 container with eight replicas. Moreover, this deployment requires high availability in the cluster, with no more than 25% of the pods being unavailable. Because 25% of eight replicas is two pods, at least six pods must always be running on the cluster. Thus, if you try to drain a node that contains more than 25% of the pods, then OpenShift cannot finish draining it until the pods that exceed the 25% threshold are placed onto other nodes. The pod disruption budget name must be review-pdb.

Your company also requires you to create a deployment that needs a fast disk. Thus, you must create the deployment with a node selector for the node with the disk=nvme label. The deployment uses the registry.ocp4.example.com:8443/redhattraining/hello-world-nginx:v1.0 container with four replicas, and its name must be review-ns.

Finally, your company requires you to create a deployment that must communicate frequently with the review-ns deployment. Thus, you must create this deployment with a required pod affinity rule for the pods in the review-ns deployment. Use the kubernetes.io/hostname label as the topologyKey parameter. The deployment uses the registry.ocp4.example.com:8443/rhel9/mysql-80:1-237 container with four replicas, and its name must be review-affinity.

Create all the deployments as the developer user with developer as the password in the scheduling-review project. For the deployments, you can use the incomplete resource YAML files in the ~/DO380/labs/scheduling-review directory. If you need administrator permissions, then use the admin user with redhatocp as the password. You can use the count-pods.sh script, which counts the pods of a deployment on each node.
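As a hypothetical sketch only, an equivalent per-node count can be obtained with a command like the following, assuming that the deployment pods carry an app=<deployment-name> label as they do in this lab:

[student@workstation ~]$ oc get pods -l app=review-toleration \
  -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort | uniq -c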

  1. As the admin user, verify the labels and taints of your cluster nodes.

    1. Connect to the OpenShift cluster as the admin user with redhatocp as the password.

      [student@workstation ~]$ oc login -u admin -p redhatocp \
        https://api.ocp4.example.com:6443
      Login successful.
      ...output omitted...
    2. List the labels for the compute nodes. Verify that the worker01 and worker02 nodes include the disk=ssd label, and that the worker03 node includes the disk=nvme label.

      [student@workstation ~]$ oc get nodes -L disk
      NAME       STATUS  ROLES                  AGE    VERSION           DISK
      master01   Ready   control-plane,master   140d   v1.25.7+eab9cc9
      master02   Ready   control-plane,master   140d   v1.25.7+eab9cc9
      master03   Ready   control-plane,master   140d   v1.25.7+eab9cc9
      worker01   Ready   worker                 37d    v1.25.7+eab9cc9   ssd
      worker02   Ready   worker                 37d    v1.25.7+eab9cc9   ssd
      worker03   Ready   worker                 37d    v1.25.7+eab9cc9   nvme
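       The disk labels are already set in the classroom environment. For reference only (do not run it in this lab), a cluster administrator could apply such a label with a command like the following:

      [student@workstation ~]$ oc label node worker03 disk=nvme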
    3. Verify the taint for the worker01 node.

      [student@workstation ~]$ oc describe node worker01 | grep Taints
      Taints:             application=ml:NoSchedule
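       The taint is already applied in the classroom environment. For reference only (do not run it in this lab), a cluster administrator could apply an equivalent taint with a command like the following:

      [student@workstation ~]$ oc adm taint nodes worker01 application=ml:NoSchedule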
  2. Create the review-toleration deployment with a toleration for the worker01 node taint.

    1. Connect to the OpenShift cluster as the developer user with developer as the password, and verify that OpenShift uses the scheduling-review project.

      [student@workstation ~]$ oc login -u developer -p developer
      Login successful.
      ...output omitted...
      Using project "scheduling-review".
    2. Change to the ~/DO380/labs/scheduling-review directory.

      [student@workstation ~]$ cd ~/DO380/labs/scheduling-review/
    3. Edit the ~/DO380/labs/scheduling-review/review-toleration.yml file for the review-toleration deployment, and add the toleration for the application=ml:NoSchedule taint.

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: review-toleration
      spec:
      ...output omitted...
          spec:
      ...output omitted...
            tolerations:
            - key: "application"
              value: "ml"
              operator: "Equal"
              effect: "NoSchedule"
    4. Create the review-toleration deployment with the toleration.

      [student@workstation scheduling-review]$ oc create -f review-toleration.yml
      deployment.apps/review-toleration created
    5. Verify that the deployment is correctly created. Wait until all the pods are marked as ready and available. You might have to repeat this command many times.

      [student@workstation scheduling-review]$ oc get deployment
      NAME                READY   UP-TO-DATE   AVAILABLE   AGE
      review-toleration   8/8     8            8           53s
    6. Verify that the OpenShift scheduler distributes the pods among the available compute nodes. OpenShift places some pods in the worker01 node, because the deployment includes a toleration for the taint. The pod names and the nodes that run your pods might differ on your system.

      [student@workstation scheduling-review]$ oc get pods -o wide
      NAME                               READY  STATUS    ...   NODE      ...
      review-toleration-76f8c74d7-87jjr  1/1    Running   ...   worker02  ...
      review-toleration-76f8c74d7-gtwhx  1/1    Running   ...   worker03  ...
      review-toleration-76f8c74d7-hfssj  1/1    Running   ...   worker01  ...
      review-toleration-76f8c74d7-lmdcn  1/1    Running   ...   worker03  ...
      review-toleration-76f8c74d7-ph724  1/1    Running   ...   worker02  ...
      review-toleration-76f8c74d7-tntwz  1/1    Running   ...   worker01  ...
      review-toleration-76f8c74d7-w4tfj  1/1    Running   ...   worker03  ...
      review-toleration-76f8c74d7-zr7d5  1/1    Running   ...   worker02  ...
    7. Use the ~/DO380/labs/scheduling-review/count-pods.sh script to count the pods for the review-toleration deployment in each node. The output might differ on your system, because the OpenShift scheduler distributes the pods across the three compute nodes depending on their workload.

      [student@workstation scheduling-review]$ ./count-pods.sh review-toleration
      NODE            PODS
      worker01        2
      worker02        3
      worker03        3
  3. Create the review-pdb pod disruption budget for the review-toleration deployment. This pod disruption budget ensures that no more than 25% of the pods can become unavailable.

    1. Edit the ~/DO380/labs/scheduling-review/review-pdb.yml file for the review-pdb pod disruption budget.

      apiVersion: policy/v1
      kind: PodDisruptionBudget
      metadata:
        name: review-pdb
      spec:
        maxUnavailable: 25%
        selector:
          matchLabels:
            app: review-toleration
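       The PodDisruptionBudget API also supports a minAvailable field. As a sketch only (this lab requires the maxUnavailable form shown above), the same requirement of keeping at least six of the eight pods running could be expressed as follows:

      apiVersion: policy/v1
      kind: PodDisruptionBudget
      metadata:
        name: review-pdb
      spec:
        minAvailable: 75%
        selector:
          matchLabels:
            app: review-toleration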
    2. Create the review-pdb pod disruption budget.

      [student@workstation scheduling-review]$ oc create -f review-pdb.yml
      poddisruptionbudget.policy/review-pdb created
    3. Verify that the pod disruption budget is correctly created, with a maximum of two unavailable pods (25% of the eight replicas).

      [student@workstation scheduling-review]$ oc get pdb
      NAME         MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
      review-pdb   N/A             25%               2                     26s
      [student@workstation scheduling-review]$ oc describe pdb review-pdb
      Name:             review-pdb
      Namespace:        scheduling-review
      Max unavailable:  25%
      Selector:         app=review-toleration
      Status:
          Allowed disruptions:  2
          Current:              8
          Desired:              6
          Total:                8
      Events:                   <none>
  4. Verify that the pod disruption budget works as expected, and that you get error messages when you try to drain a node that contains more than 25% of the review-toleration deployment pods. After verifying that the pod disruption budget works as expected, uncordon all nodes to mark them as schedulable.

    1. Connect to the OpenShift cluster as the admin user with redhatocp as the password.

      [student@workstation scheduling-review]$ oc login -u admin -p redhatocp
      ...output omitted...
    2. To test that the pod disruption budget works as expected, first mark the worker02 and the worker03 nodes as unschedulable.

      [student@workstation scheduling-review]$ oc adm cordon worker02 worker03
      node/worker02 cordoned
      node/worker03 cordoned
    3. Perform a rollout restart of the review-toleration deployment to ensure that OpenShift creates all the pods in the worker01 node.

      [student@workstation scheduling-review]$ oc rollout restart \
        deployment/review-toleration
      deployment.apps/review-toleration restarted
    4. Verify that all the pods are scheduled to execute in the worker01 node. For any pods in a terminating status, wait until OpenShift removes those pods. You might have to repeat this command many times.

      [student@workstation scheduling-review]$ oc get pods -o wide
      NAME                               READY  STATUS    ...   NODE      ...
      review-toleration-76f8c74d7-4494g  1/1    Running   ...   worker01  ...
      review-toleration-76f8c74d7-5jw2q  1/1    Running   ...   worker01  ...
      review-toleration-76f8c74d7-86mf8  1/1    Running   ...   worker01  ...
      review-toleration-76f8c74d7-kp6wp  1/1    Running   ...   worker01  ...
      review-toleration-76f8c74d7-ljxpd  1/1    Running   ...   worker01  ...
      review-toleration-76f8c74d7-nbzj6  1/1    Running   ...   worker01  ...
      review-toleration-76f8c74d7-t78bg  1/1    Running   ...   worker01  ...
      review-toleration-76f8c74d7-tl29v  1/1    Running   ...   worker01  ...
    5. Use the ~/DO380/labs/scheduling-review/count-pods.sh script to count the pods for the review-toleration deployment in each node.

      [student@workstation scheduling-review]$ ./count-pods.sh review-toleration
      NODE            PODS
      worker01        8
      worker02        0
      worker03        0
    6. Mark the worker02 and worker03 nodes as schedulable.

      [student@workstation scheduling-review]$ oc adm uncordon worker02 worker03
      node/worker02 uncordoned
      node/worker03 uncordoned
    7. Drain all the pods from the worker01 node. You receive some error messages, because OpenShift can drain the node only after the pods from the review-toleration deployment that exceed the 25% threshold are placed onto other nodes. This action might take a few minutes.

      [student@workstation scheduling-review]$ oc adm drain worker01 \
        --ignore-daemonsets --delete-emptydir-data
      ...output omitted...
      evicting pod scheduling-review/review-toleration-76f8c74d7-kp6wp
      evicting pod scheduling-review/review-toleration-76f8c74d7-ljxpd
      evicting pod scheduling-review/review-toleration-76f8c74d7-5jw2q
      error when evicting pods/"review-toleration-76f8c74d7-5jw2q" -n "scheduling-review" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
      error when evicting pods/"review-toleration-76f8c74d7-ljxpd" -n "scheduling-review" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
      evicting pod scheduling-review/review-toleration-76f8c74d7-nbzj6
      error when evicting pods/"review-toleration-76f8c74d7-nbzj6" -n "scheduling-review" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
      ...output omitted...
      node/worker01 drained
    8. Mark the worker01 node as schedulable.

      [student@workstation scheduling-review]$ oc adm uncordon worker01
      node/worker01 uncordoned
  5. Create the review-ns deployment with a node selector for the node with the disk=nvme label, and verify that the pods are scheduled in the node with the disk=nvme label.

    1. Connect to the OpenShift cluster as the developer user with developer as the password.

      [student@workstation scheduling-review]$ oc login -u developer -p developer
      ...output omitted...
    2. Edit the ~/DO380/labs/scheduling-review/review-ns.yml file for the review-ns deployment, and add a node selector for the disk=nvme label.

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: review-ns
      spec:
      ...output omitted...
          spec:
      ...output omitted...
            nodeSelector:
              disk: nvme
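       A node selector is the simplest way to pin pods to labeled nodes. As a sketch only (this lab requires the nodeSelector form shown above), a required node affinity rule expressing the same constraint would look like the following:

            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                  - matchExpressions:
                    - key: disk
                      operator: In
                      values:
                      - nvme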
    3. Create the review-ns deployment.

      [student@workstation scheduling-review]$ oc create -f review-ns.yml
      deployment.apps/review-ns created
    4. Verify that the deployment is correctly created. Wait until all the pods are marked as ready and available. You might have to repeat this command many times.

      [student@workstation scheduling-review]$ oc get deployment
      NAME                READY   UP-TO-DATE   AVAILABLE   AGE
      review-ns           4/4     4            4           61s
      review-toleration   8/8     8            8           46m
    5. Verify that the OpenShift scheduler creates all the pods in the worker03 node, which has the disk=nvme label. The pod names might differ on your system.

      [student@workstation scheduling-review]$ oc get pods -o wide -l app=review-ns
      NAME                          READY  STATUS    ...   NODE      ...
      review-ns-7475cbc8ff-d7dbq    1/1    Running   ...   worker03  ...
      review-ns-7475cbc8ff-hpl5b    1/1    Running   ...   worker03  ...
      review-ns-7475cbc8ff-p7lhb    1/1    Running   ...   worker03  ...
      review-ns-7475cbc8ff-wjxjz    1/1    Running   ...   worker03  ...
  6. Create the review-affinity deployment with a required pod affinity rule for the pods in the review-ns deployment. Use the kubernetes.io/hostname node label as the topology key.

    1. Edit the ~/DO380/labs/scheduling-review/review-affinity.yml file for the review-affinity deployment, and add a required pod affinity rule for the pods in the review-ns deployment. Use the kubernetes.io/hostname node label as the topology key.

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: review-affinity
      spec:
      ...output omitted...
          spec:
      ...output omitted...
            affinity:
              podAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                    - key: app
                      operator: In
                      values:
                      - review-ns
                  topologyKey: kubernetes.io/hostname
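       This rule assumes that the review-ns pods carry the app=review-ns label, which the label selectors later in this lab confirm. As a sketch only (this lab requires the required rule shown above), a softer, preference-based variant would look like the following:

            affinity:
              podAffinity:
                preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  podAffinityTerm:
                    labelSelector:
                      matchExpressions:
                      - key: app
                        operator: In
                        values:
                        - review-ns
                    topologyKey: kubernetes.io/hostname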
    2. Create the review-affinity deployment.

      [student@workstation scheduling-review]$ oc create -f review-affinity.yml
      deployment.apps/review-affinity created
    3. Verify that the deployment is correctly created. Wait until all the pods are marked as ready and available. You might have to repeat this command many times.

      [student@workstation scheduling-review]$ oc get deployment
      NAME                READY   UP-TO-DATE   AVAILABLE   AGE
      review-affinity     4/4     4            4           27s
      review-ns           4/4     4            4           17m
      review-toleration   8/8     8            8           64m
    4. Verify that all the pods for the review-affinity deployment are placed in the worker03 node. The pod names might differ on your system.

      [student@workstation scheduling-review]$ oc get pods -o wide \
        -l app=review-affinity
      NAME                                READY  STATUS    ...   NODE      ...
      review-affinity-79d6b5bcfd-8qmh8    1/1    Running   ...   worker03  ...
      review-affinity-79d6b5bcfd-cpkhz    1/1    Running   ...   worker03  ...
      review-affinity-79d6b5bcfd-h8sq6    1/1    Running   ...   worker03  ...
      review-affinity-79d6b5bcfd-mcv7k    1/1    Running   ...   worker03  ...
    5. Verify that the OpenShift scheduler creates all the pods for the review-affinity deployment in the same compute node as the review-ns deployment. The pod names might differ on your system.

      [student@workstation scheduling-review]$ oc get pods -o wide -l app=review-ns
      NAME                          READY  STATUS    ...   NODE      ...
      review-ns-7475cbc8ff-d7dbq    1/1    Running   ...   worker03  ...
      review-ns-7475cbc8ff-hpl5b    1/1    Running   ...   worker03  ...
      review-ns-7475cbc8ff-p7lhb    1/1    Running   ...   worker03  ...
      review-ns-7475cbc8ff-wjxjz    1/1    Running   ...   worker03  ...
    6. Change to the /home/student directory.

      [student@workstation scheduling-review]$ cd

Evaluation

As the student user on the workstation machine, use the lab command to grade your work. Correct any reported failures and rerun the command until successful.

[student@workstation ~]$ lab grade scheduling-review

Finish

As the student user on the workstation machine, use the lab command to complete this exercise. This step is important to ensure that resources from previous exercises do not impact upcoming exercises.

[student@workstation ~]$ lab finish scheduling-review

Revision: do380-4.14-397a507