Configure a workload to spread its pods across nodes in different failure domains and to set minimum availability requirements. Then, drain a node to simulate a cluster update, and verify that the application maintains its minimum capacity and availability.
Outcomes
- Add pod anti-affinity settings to a deployment resource manifest to spread the pods across the failure domains.
- Create a pod disruption budget to set a minimum availability constraint that must be satisfied during a voluntary cluster disruption.
- Drain a compute node to simulate a voluntary disruption.
As the student user on the workstation machine, use the lab command to prepare your environment for this exercise.
[student@workstation ~]$ lab start scheduling-pdb
Instructions
The company has a local OpenShift cluster that is distributed between two racks. Each rack is connected to a different power source. The nodes are distributed by rack according to the following table:
| Rack | Control plane nodes | Compute nodes |
|---|---|---|
| rack-a | master01, master02 | worker03 |
| rack-b | master03 | worker01, worker02 |
The administrator needs to drain the compute nodes for maintenance.
The administrator added the rack label to indicate the location and failure domain of each node.
This label is intended as a custom topology key, so that the scheduler can spread the pods evenly across the compute nodes in different racks; a sketch of a labeled node follows this overview.
The application runs six pods and requires at least five of them to remain available during a voluntary cluster disruption to achieve the intended response time.
The developer modifies the deployment resource to add the pod anti-affinity settings that use the custom topology, and also creates a pod disruption budget to indicate the minimum availability constraint of the application.
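The rack label on each node acts as the custom topology key. The following is a minimal sketch of the relevant part of a labeled compute node object, assuming the administrator applied the label beforehand; the real node object contains many more fields.

# Sketch only: relevant metadata of a compute node after the administrator
# applies the custom rack label. All other node fields are omitted.
apiVersion: v1
kind: Node
metadata:
  name: worker03
  labels:
    node-role.kubernetes.io/worker: ""
    rack: rack-a    # custom topology key that identifies the failure domain (rack)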
Verify that the nodes are labeled according to their rack location.
Log in to the cluster as the admin user.
[student@workstation ~]$ oc login -u admin -p redhatocp \
https://api.ocp4.example.com:6443
Login successful.
...output omitted...
Verify that all the nodes have the rack labels according to the previous table.
[student@workstation ~]$ oc get nodes -L rack
NAME       STATUS   ROLES                  AGE   VERSION     RACK
master01   Ready    control-plane,master   8d    v1.27.6+…   rack-a
master02   Ready    control-plane,master   8d    v1.27.6+…   rack-a
master03   Ready    control-plane,master   8d    v1.27.6+…   rack-b
worker01   Ready    worker                 7d    v1.27.6+…   rack-b
worker02   Ready    worker                 7d    v1.27.6+…   rack-b
worker03   Ready    worker                 7d    v1.27.6+…   rack-a
Create the deployment without pod affinity or a pod disruption budget.
Log in as the developer user and verify that you are using the scheduling-pdb project.
[student@workstation ~]$ oc login -u developer -p developer
Login successful.

You have one project on this server: "scheduling-pdb"

Using project "scheduling-pdb".
Change to the ~/DO380/labs/scheduling-pdb directory.
[student@workstation ~]$ cd ~/DO380/labs/scheduling-pdb
Create the deployment by using the YAML resource manifest.
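The deployment.yaml file is provided by the lab and is not reproduced here. As a rough sketch, a comparable deployment would resemble the following; the container image and port are assumptions, whereas the six replicas and the app: nginx label match the scenario.

# Hypothetical sketch of a deployment comparable to the provided deployment.yaml.
# The image reference and container port are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: scheduling-pdb
  labels:
    app: nginx
spec:
  replicas: 6                  # the application runs six pods
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: registry.example.com/nginx:latest   # assumed image reference
        ports:
        - containerPort: 8080                      # assumed port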
[student@workstation scheduling-pdb]$ oc apply -f deployment.yaml
deployment.apps/nginx created
Open a new terminal window, and then execute the following command to see the status of the pod disruption budget, deployment, and pods.
Wait until all pods are running, and verify that all the pods from the nginx deployment are marked as ready and available.
This process might take a few minutes.
[student@workstation scheduling-pdb]$ watch oc get pdb,deployments,pods -o wide
Every 2.0s: oc get pdb,deployments,pods ...   workstation: Wed Jan  3 15:59:55 2024

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE   ...
deployment.apps/nginx   6/6     6            6           60s   ...

NAME                         READY   STATUS    RESTARTS   AGE   IP    NODE       ...
pod/nginx-5676948d76-75l7z   1/1     Running   0          60s   ...   worker01   ...
pod/nginx-5676948d76-pkdr7   1/1     Running   0          60s   ...   worker01   ...
pod/nginx-5676948d76-gcst5   1/1     Running   0          60s   ...   worker01   ...
pod/nginx-5676948d76-njzv2   1/1     Running   0          60s   ...   worker02   ...
pod/nginx-5676948d76-94zmk   1/1     Running   0          60s   ...   worker03   ...
pod/nginx-5676948d76-mcdbj   1/1     Running   0          60s   ...   worker03   ...
Keep this terminal window open to view the status of the resources from this exercise.
Return to the first terminal window and count the pods that are running on each compute node by using the count-pods.sh shell script.
The replica pods are distributed across the cluster nodes but are not distributed evenly between the rack-a and rack-b failure domains.
[student@workstation scheduling-pdb]$ ./count-pods.sh
NODE       PODS
worker01   3
worker02   1
worker03   2
Although the exact number of pods that are running on each node might be different, the total replica count is six pods.
Simulate a voluntary disruption where the cluster administrator takes the worker01 node offline for maintenance.
The selected node for draining must have at least two pods running.
Log in as the admin user.
[student@workstation scheduling-pdb]$ oc login -u admin -p redhatocp
Login successful.
...output omitted...
Drain the worker01 node to simulate taking it offline for maintenance.
This command might take a few minutes to complete. Leave it running and continue with the next step. You review the output of this command in a later step.
[student@workstation scheduling-pdb]$ oc adm drain node/worker01 \
--ignore-daemonsets --delete-emptydir-data
...output omitted...
Switch to the second terminal window to view the eviction of the pods from the drained node. Wait until all pods are running on another node and are marked as ready. This process might take a few minutes.
All the application pods on the drained node are evicted at the same time, and the minimum availability constraint is not met. Use the values in the AGE column to determine which pods were evicted from the drained node and scheduled on a different node.
This situation happens because no pod disruption budget is associated with the deployment pods, and the deployment resource also does not have an affinity setting that uses the rack label as a custom topology key.
Every 2.0s: oc get pdb,deployments,pods ...   workstation: Wed Jan  3 16:02:33 2024

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE   ...
deployment.apps/nginx   3/6     6            3           14m   ...

NAME                         READY   STATUS     ...   AGE   ...   NODE       ...
pod/nginx-5676948d76-njzv2   1/1     Running    ...   3m    ...   worker02   ...
pod/nginx-5676948d76-94zmk   1/1     Running    ...   3m    ...   worker03   ...
pod/nginx-5676948d76-mcdbj   1/1     Running    ...   3m    ...   worker03   ...
pod/nginx-5676948d76-hfbv9   0/1     Init:0/1   ...   1s    ...   worker02   ...
pod/nginx-5676948d76-zxxlg   0/1     Init:0/1   ...   1s    ...   worker02   ...
pod/nginx-5676948d76-dh6dh   0/1     Init:0/1   ...   1s    ...   worker03   ...
Only three replica pods are available.
Three replica pods were evicted from the drained compute node.
Although the exact number of pods that are running on each node might be different, the total replica count is six pods.
Return to the first terminal window and inspect the output of the oc adm drain command.
Observe the pod eviction messages of the nginx pods.
All the application pods on the drained node are evicted at the same time, and the minimum availability constraint is not met.
[student@workstation scheduling-pdb]$ oc adm drain node/worker01 \
  --ignore-daemonsets --delete-emptydir-data
node/worker01 cordoned
Warning: ignoring DaemonSet-managed Pods: ...output omitted...
...output omitted...
I1221 21:29:52.102938  111157 request.go:696] ...output omitted...
...output omitted...
evicting pod scheduling-pdb/nginx-5676948d76-pkdr7
evicting pod scheduling-pdb/nginx-5676948d76-75l7z
evicting pod scheduling-pdb/nginx-5676948d76-gcst5
...output omitted...
pod/nginx-5676948d76-gcst5 evicted
pod/nginx-5676948d76-75l7z evicted
pod/nginx-5676948d76-pkdr7 evicted
...output omitted...
node/worker01 drained
All the pods are marked for eviction when the node is drained.
All the pods are evicted from the node at the same time, and the application availability constraint is not met.
You can safely ignore the warnings about managed pods and client-side throttling.
Get the state of the nodes to verify that the drained node is marked as not schedulable.
[student@workstation scheduling-pdb]$ oc get nodes
NAME       STATUS                     ROLES                  ...
master01   Ready                      control-plane,master   ...
master02   Ready                      control-plane,master   ...
master03   Ready                      control-plane,master   ...
worker01   Ready,SchedulingDisabled   worker                 ...
worker02   Ready                      worker                 ...
worker03   Ready                      worker                 ...
Count the pods that are running on each compute node.
The scheduler placed replacement pods for the evicted pods in the worker02 and worker03 compute nodes.
[student@workstation scheduling-pdb]$ ./count-pods.sh
NODE       PODS
worker01   0
worker02   3
worker03   3
Delete the nginx deployment.
[student@workstation scheduling-pdb]$ oc delete deployment/nginx
deployment.apps "nginx" deletedUncordon the worker01 node that you drained previously to remove the SchedulingDisabled status.
[student@workstation scheduling-pdb]$ oc adm uncordon node/worker01
node/worker01 uncordoned
List the cluster nodes and verify that all the compute nodes are marked as ready.
[student@workstation ~]$ oc get nodes -L rack
NAME       STATUS   ROLES                  AGE   VERSION     RACK
master01   Ready    control-plane,master   8d    v1.27.6+…   rack-a
master02   Ready    control-plane,master   8d    v1.27.6+…   rack-a
master03   Ready    control-plane,master   8d    v1.27.6+…   rack-b
worker01   Ready    worker                 7d    v1.27.6+…   rack-b
worker02   Ready    worker                 7d    v1.27.6+…   rack-b
worker03   Ready    worker                 7d    v1.27.6+…   rack-a
Create the nginx deployment with pod anti-affinity to spread the pods evenly across the compute nodes.
Log in as the developer user.
[student@workstation scheduling-pdb]$ oc login -u developer -p developer
Login successful.
...output omitted...
Edit the deployment-affinity.yaml file and set the affinity properties according to the following specification.
Then, save and close the file.
...output omitted...
spec:
...output omitted...
  template:
    ...output omitted...
    spec:
      ...output omitted...
      containers:
      ...output omitted...
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              topologyKey: rack
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - nginx
The weighted pod affinity term is evaluated only during pod scheduling, on a best-effort basis.
The rack topology key is the node label that indicates the failure domain of each node.
The label selector matches the pods that this affinity setting affects.
The ~/DO380/solutions/scheduling-pdb/deployment-affinity.yaml file contains the correct configuration, and you can use it for comparison.
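The preferred (soft) anti-affinity rule is a deliberate choice. A required (hard) rule with the same rack topology key would permit only one matching pod per rack, so at most two of the six replicas could be scheduled across the two racks. For comparison only, such a rule would look like the following sketch; it is not used in this exercise.

# Sketch only, not used in this lab: a hard anti-affinity rule on the same
# topology key. With two racks, at most two pods with the app: nginx label
# could be scheduled, which is too strict for a six-replica deployment.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: rack
      labelSelector:
        matchLabels:
          app: nginx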
Create the application deployment resource by using the YAML manifest.
[student@workstation scheduling-pdb]$ oc apply -f deployment-affinity.yaml
deployment.apps/nginx created
Switch to the second terminal window.
Wait until all pods are running and verify that all the pods from the nginx deployment are marked as ready and available.
This process might take a few minutes.
Every 2.0s: oc get pdb,deployments,pods ...   workstation: Wed Jan  3 16:24:11 2024

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE    ...
deployment.apps/nginx   6/6     6            6           120s   ...

NAME                        READY   STATUS    RESTARTS   AGE   IP    NODE       ...
pod/nginx-d5b9c7498-5hbkw   1/1     Running   0          99s   ...   worker01   ...
pod/nginx-d5b9c7498-nkx9j   1/1     Running   0          99s   ...   worker01   ...
pod/nginx-d5b9c7498-g6ztb   1/1     Running   0          99s   ...   worker02   ...
pod/nginx-d5b9c7498-bk7g6   1/1     Running   0          99s   ...   worker03   ...
pod/nginx-d5b9c7498-djn8p   1/1     Running   0          99s   ...   worker03   ...
pod/nginx-d5b9c7498-pz8g5   1/1     Running   0          99s   ...   worker03   ...
Return to the first terminal window and count the pods that are running on each compute node.
The pods are evenly distributed across the racks, because of the pod anti-affinity settings.
Three pods are running on the rack-b nodes (worker01 and worker02), and three pods are running on the rack-a node (worker03).
[student@workstation scheduling-pdb]$ ./count-pods.sh
NODE       PODS
worker01   2
worker02   1
worker03   3
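Pod anti-affinity with a custom topology key is one way to spread the pods across failure domains. Kubernetes also provides topology spread constraints, which express the same intent more directly. The following pod specification fragment is an alternative sketch for comparison only and is not used in this exercise.

# Sketch only, not used in this lab: an equivalent best-effort spread expressed
# as a topology spread constraint on the same rack node label.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: rack
  whenUnsatisfiable: ScheduleAnyway   # best effort, like the preferred anti-affinity rule
  labelSelector:
    matchLabels:
      app: nginx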
Create the pod disruption budget with the intended constraints.
Edit the pod-disruption-budget.yaml file and set the minimum available percentage and the label selector according to the following specification.
Then, save and close the file.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  minAvailable: 80%
  selector:
    matchLabels:
      app: nginx
The ~/DO380/solutions/scheduling-pdb/pod-disruption-budget.yaml file contains the correct configuration, and you can use it for comparison.
Create the pod disruption budget by using the YAML manifest.
[student@workstation scheduling-pdb]$ oc apply -f pod-disruption-budget.yaml
poddisruptionbudget.policy/nginx created
Verify that the nginx pod disruption budget was created, and that it has the intended minimum available attribute.
[student@workstation scheduling-pdb]$ oc describe pdb nginx
Name:           nginx
Namespace:      scheduling-pdb
Min available:  80%
Selector:       app=nginx
Status:
    Allowed disruptions:  1
    Current:              6
    Desired:              5
    Total:                6
Events:         <none>
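The status values follow from the budget: 80% of six replicas is 4.8, which the disruption controller rounds up to five desired healthy pods, so with six healthy pods only one voluntary disruption is allowed at a time. The same effect could be expressed with maxUnavailable instead of minAvailable, as in the following sketch, which is not used in this exercise.

# Sketch only, not used in this lab: an alternative budget that allows at most
# one pod of the six-replica deployment to be down during voluntary disruptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: nginx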
Drain a compute node to simulate a voluntary disruption.
The selected node for draining must have at least two pods running.
Log in again as the admin user.
[student@workstation scheduling-pdb]$ oc login -u admin -p redhatocp
Login successful.
...output omitted...
Drain the worker03 node to simulate taking it offline for maintenance.
This command might take a few minutes to complete. Leave it running and continue with the next step. You review the output of this command in a later step.
[student@workstation scheduling-pdb]$ oc adm drain node/worker03 \
--ignore-daemonsets --delete-emptydir-data
...output omitted...
Switch to the second terminal window to view the eviction of the pods from the drained node. Wait until all pods are running on another node and are marked as ready. This process might take a few minutes.
One pod is evicted at a time from the drained node, and the availability constraint is met. Use the values in the AGE column to determine which pods were evicted from the drained node and scheduled on a different node.
Every 2.0s: oc get pdb,deployments,pods ...   workstation: Wed Jan  3 16:28:12 2024

NAME                               MIN AVAIL…   MAX UNAVAIL…   ALLOWED DISRUPTIONS
poddisruptionbudget.policy/nginx   80%          N/A            1

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE   ...
deployment.apps/nginx   5/6     6            5           30m   ...

NAME                        READY   STATUS     RESTARTS   AGE   IP    NODE       ...
pod/nginx-d5b9c7498-5hbkw   1/1     Running    0          5m    ...   worker01   ...
pod/nginx-d5b9c7498-nkx9j   1/1     Running    0          5m    ...   worker01   ...
pod/nginx-d5b9c7498-g6ztb   1/1     Running    0          5m    ...   worker02   ...
pod/nginx-d5b9c7498-pz8g5   1/1     Running    0          5m    ...   worker03   ...
pod/nginx-d5b9c7498-q6rcf   1/1     Running    0          50s   ...   worker01   ...
pod/nginx-d5b9c7498-pxx86   0/1     Init:0/1   0          10s   ...   worker02   ...
^C
The pod eviction follows the pod disruption budget.
The pods on the drained node continue to run until they are evicted.
The replacement pods are scheduled on another compute node.
Only one pod is evicted at a time from the drained node.
Press Ctrl+C and close the second terminal window when done.
Return to the first terminal window and inspect the output of the oc adm drain command.
From the pod eviction messages of the nginx pods, observe that one pod is evicted at a time from the drained node and the availability constraints are met.
The pod eviction is blocked until the PDB availability constraints are met. The pod eviction operation is retried after five seconds.
[student@workstation scheduling-pdb]$ oc adm drain node/worker03 \
  --ignore-daemonsets --delete-emptydir-data
node/worker03 cordoned
Warning: ignoring DaemonSet-managed Pods: ...output omitted...
...output omitted...
I0103 16:27:16.659505   29741 request.go:696] Waited for … due to client-side throttling, not priority and fairness, request: ...output omitted...
...output omitted...
evicting pod scheduling-pdb/nginx-d5b9c7498-bk7g6
evicting pod scheduling-pdb/nginx-d5b9c7498-djn8p
evicting pod scheduling-pdb/nginx-d5b9c7498-pz8g5
...output omitted...
pod/nginx-d5b9c7498-bk7g6 evicted
...output omitted...
error when evicting pods/"nginx-d5b9c7498-djn8p" -n "scheduling-pdb" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
error when evicting pods/"nginx-d5b9c7498-pz8g5" -n "scheduling-pdb" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
...output omitted...
pod/nginx-d5b9c7498-djn8p evicted
evicting pod scheduling-pdb/nginx-d5b9c7498-pz8g5
error when evicting pods/"nginx-d5b9c7498-pz8g5" -n "scheduling-pdb" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
...output omitted...
pod/nginx-d5b9c7498-pz8g5 evicted
node/worker03 drained
The pod is marked for eviction.
The pod eviction is blocked until the PDB availability constraints are met.
The pod is finally evicted from the drained node.
You can safely ignore the warnings about managed pods and client-side throttling.
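The oc adm drain command does not delete the pods directly; it requests evictions through the Kubernetes eviction API, and the API server refuses any eviction that would violate the pod disruption budget, which is why the drain command retries. The following sketch approximates the eviction object that is submitted for one of the pods; the exact request body is an assumption.

# Sketch only: approximately the object that "oc adm drain" submits to the
# pod's eviction subresource. The API server evaluates the pod disruption
# budget before allowing the eviction.
apiVersion: policy/v1
kind: Eviction
metadata:
  name: nginx-d5b9c7498-pz8g5
  namespace: scheduling-pdb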
List the cluster nodes and verify that the worker03 node status is SchedulingDisabled.
[student@workstation scheduling-pdb]$ oc get nodes
NAME       STATUS                     ROLES                  AGE   VERSION
master01   Ready                      control-plane,master   28d   v1.27.6+...
master02   Ready                      control-plane,master   28d   v1.27.6+...
master03   Ready                      control-plane,master   28d   v1.27.6+...
worker01   Ready                      worker                 8d    v1.27.6+...
worker02   Ready                      worker                 8d    v1.27.6+...
worker03   Ready,SchedulingDisabled   worker                 8d    v1.27.6+...
Count the pods that are running on each compute node.
[student@workstation scheduling-pdb]$ ./count-pods.sh
NODE       PODS
worker01   3
worker02   3
worker03   0
Change to the student user's home directory.
[student@workstation scheduling-pdb]$ cd
[student@workstation ~]$
Optional: Clean up the resources that were used in this exercise.
Delete the scheduling-pdb project.
[student@workstation ~]$ oc delete project scheduling-pdb
project.project.openshift.io "scheduling-pdb" deleted
Uncordon all the compute nodes.
[student@workstation ~]$ oc adm uncordon -l node-role.kubernetes.io/worker
...output omitted...
Remove the rack label from all nodes.
[student@workstation ~]$ oc label node --all rack-
...output omitted...