Guided Exercise: High Availability with Affinity Rules and Pod Disruption Budgets

Configure a workload to spread its pods between nodes from different failure domains and set minimum availability requirements. Then, drain a node to simulate a cluster update, and prove that the application keeps minimum capacity and availability.

Outcomes

  • Add pod anti-affinity settings to a deployment resource manifest to spread the pods across the failure domains.

  • Create a pod disruption budget to set a minimum availability constraint that is enforced during a voluntary cluster disruption.

  • Drain a compute node to simulate a voluntary disruption.

As the student user on the workstation machine, use the lab command to prepare your environment for this exercise.

[student@workstation ~]$ lab start scheduling-pdb

Instructions

The company has a local OpenShift cluster that is distributed between two racks. Each rack is connected to a different power source. The nodes are distributed by rack according to the following table:

Rack     Control plane nodes    Compute nodes
rack-a   master01, master02     worker03
rack-b   master03               worker01, worker02

The administrator needs to drain the compute nodes for maintenance. The administrator added the rack label to indicate the location and failure domain of each node. This label is intended as a custom topology key, so that the scheduler can spread the pods evenly across the compute nodes in different racks.

The application runs six pods and requires five of them to be available if a voluntary cluster disruption occurs, to achieve the intended response time.

The developer modifies the deployment resource to add the pod anti-affinity settings that use the custom topology key, and also creates a pod disruption budget to set the minimum availability constraint of the application.
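
The rack label that serves as the custom topology key is already applied by the lab environment. Conceptually, each node carries a label of the following form; this is a sketch of the relevant part of a Node resource, not a manifest that you apply:

apiVersion: v1
kind: Node
metadata:
  name: worker03          # a compute node from the previous table
  labels:
    rack: rack-a          # the custom topology key that identifies the failure domain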

  1. Verify that the nodes are labeled according to their rack location.

    1. Log in to the cluster as the admin user.

      [student@workstation ~]$ oc login -u admin -p redhatocp \
        https://api.ocp4.example.com:6443
      Login successful.
      
      ...output omitted...
    2. Verify that all the nodes have the rack labels according to the previous table.

      [student@workstation ~]$ oc get nodes -L rack
      NAME      STATUS  ROLES                 AGE  VERSION    RACK
      master01  Ready   control-plane,master  8d   v1.27.6+…  rack-a
      master02  Ready   control-plane,master  8d   v1.27.6+…  rack-a
      master03  Ready   control-plane,master  8d   v1.27.6+…  rack-b
      worker01  Ready   worker                7d   v1.27.6+…  rack-b  1
      worker02  Ready   worker                7d   v1.27.6+…  rack-b  2
      worker03  Ready   worker                7d   v1.27.6+…  rack-a  3

      1 2

      The worker01 and worker02 compute nodes are placed in the rack-b rack.

      3

      The worker03 compute node is placed in the rack-a rack.

  2. Create the deployment without pod anti-affinity settings or a pod disruption budget.

    1. Log in as the developer user and verify that you are using the scheduling-pdb project.

      [student@workstation ~]$ oc login -u developer -p developer
      Login successful.
      
      You have one project on this server: "scheduling-pdb"
      
      Using project "scheduling-pdb".
    2. Change to the ~/DO380/labs/scheduling-pdb directory.

      [student@workstation ~]$ cd ~/DO380/labs/scheduling-pdb
    3. Create the deployment by using the YAML resource manifest.

      [student@workstation scheduling-pdb]$ oc apply -f deployment.yaml
      deployment.apps/nginx created
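
      Note

      The deployment.yaml file is provided in the lab directory and is not reproduced in this exercise. For reference, a comparable manifest without affinity settings might look like the following sketch; the image name is an assumption, and the lab file is authoritative.

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: nginx
        labels:
          app: nginx
      spec:
        replicas: 6                  # the application runs six replica pods
        selector:
          matchLabels:
            app: nginx               # the label that the anti-affinity rule and the PDB select later
        template:
          metadata:
            labels:
              app: nginx
          spec:
            containers:
            - name: nginx
              image: nginx           # hypothetical image; the lab manifest specifies the actual image
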
    4. Open a new terminal window, and then execute the following command to see the status of the pod disruption budget, deployment, and pods.

      Wait until all pods are running, and verify that all the pods from the nginx deployment are marked as ready and available.

      This process might take a few minutes.

      [student@workstation scheduling-pdb]$ watch oc get pdb,deployments,pods -o wide
      Every 2.0s: oc get pdb,deployments,pods ... workstation: Wed Jan  3 15:59:55 2024
      
      NAME                    READY   UP-TO-DATE   AVAILABLE   AGE   ...
      deployment.apps/nginx   6/6     6            6           60s   ...
      
      NAME                        READY  STATUS   RESTARTS  AGE  IP   NODE      ...
      pod/nginx-5676948d76-75l7z  1/1    Running  0         60s  ...  worker01  ...
      pod/nginx-5676948d76-pkdr7  1/1    Running  0         60s  ...  worker01  ...
      pod/nginx-5676948d76-gcst5  1/1    Running  0         60s  ...  worker01  ...
      pod/nginx-5676948d76-njzv2  1/1    Running  0         60s  ...  worker02  ...
      pod/nginx-5676948d76-94zmk  1/1    Running  0         60s  ...  worker03  ...
      pod/nginx-5676948d76-mcdbj  1/1    Running  0         60s  ...  worker03  ...

      Note

      Keep this terminal window open to view the status of the resources from this exercise.

    5. Return to the first terminal window and count the pods that are running on each compute node by using the count-pods.sh shell script.

      The replica pods are distributed across the cluster nodes but are not distributed evenly between the rack-a and rack-b failure domains.

      [student@workstation scheduling-pdb]$ ./count-pods.sh
      NODE            PODS
      worker01        3  1
      worker02        1  2
      worker03        2  3

      1 2

      The worker01 and worker02 compute nodes are placed in the rack-b rack.

      3

      The worker03 compute node is placed in the rack-a rack.

      Note

      Although the exact number of pods that are running on each node might be different, the total replica count is six pods.

  3. Simulate a voluntary disruption where the cluster administrator takes the worker01 node offline for maintenance.

    Important

    The selected node for draining must have at least two pods running.

    1. Log in as the admin user.

      [student@workstation scheduling-pdb]$ oc login -u admin -p redhatocp
      Login successful.
      
      ...output omitted...
    2. Drain the worker01 node to simulate taking it offline for maintenance.

      This command might take a few minutes to complete. Leave it running and continue with the next step. You review the output of this command in a later step.

      [student@workstation scheduling-pdb]$ oc adm drain node/worker01 \
        --ignore-daemonsets --delete-emptydir-data
      
      ...output omitted...
    3. Switch to the second terminal window to view the eviction of the pods from the drained node. Wait until all pods are running on another node and are marked as ready. This process might take a few minutes.

      All the application pods on the drained node are evicted at the same time, and the minimum availability constraint is not met. Use the values in the AGE column to determine which pods were evicted from the drained node and scheduled on a different node.

      This situation happens because no pod disruption budget is associated with the deployment pods, and the deployment resource also does not have a pod anti-affinity setting that uses the rack label as a custom topology key.

      Every 2.0s: oc get pdb,deployments,pods ... workstation: Wed Jan  3 16:02:33 2024
      
      NAME                   READY  UP-TO-DATE  AVAILABLE  AGE  ...
      deployment.apps/nginx  3/6    6           3          14m  ...  1
      
      NAME                        READY  STATUS    ...  AGE  IP   NODE      ...
      pod/nginx-5676948d76-njzv2  1/1    Running   ...  3m   ...  worker02  ...
      pod/nginx-5676948d76-94zmk  1/1    Running   ...  3m   ...  worker03  ...
      pod/nginx-5676948d76-mcdbj  1/1    Running   ...  3m   ...  worker03  ...
      pod/nginx-5676948d76-hfbv9  0/1    Init:0/1  ...  1s   ...  worker02  ...  2
      pod/nginx-5676948d76-zxxlg  0/1    Init:0/1  ...  1s   ...  worker02  ...
      pod/nginx-5676948d76-dh6dh  0/1    Init:0/1  ...  1s   ...  worker03  ...

      1

      Only three replica pods are available.

      2

      Three replacement pods are created at the same time for the pods that were evicted from the drained compute node.

      Note

      Although the exact number of pods that are running on each node might be different, the total replica count is six pods.

    4. Return to the first terminal window and inspect the output of the oc adm drain command.

      Observe the pod eviction messages of the nginx pods. All the application pods on the drained node are evicted at the same time, and the minimum availability constraint is not met.

      [student@workstation scheduling-pdb]$ oc adm drain node/worker01 \
        --ignore-daemonsets --delete-emptydir-data
      node/worker01 cordoned
      Warning: ignoring DaemonSet-managed Pods: ...output omitted...
      ...output omitted...
      I1221 21:29:52.102938  111157 request.go:696] ...output omitted...
      ...output omitted...
      evicting pod scheduling-pdb/nginx-5676948d76-pkdr7  1
      evicting pod scheduling-pdb/nginx-5676948d76-75l7z
      evicting pod scheduling-pdb/nginx-5676948d76-gcst5
      ...output omitted...
      pod/nginx-5676948d76-gcst5 evicted  2
      pod/nginx-5676948d76-75l7z evicted
      pod/nginx-5676948d76-pkdr7 evicted
      ...output omitted...
      node/worker01 drained

      1

      All the pods are marked for eviction when the node is drained.

      2

      All the pods are evicted from the node at the same time and the application availability constraint is not met.

      Note

      You can safely ignore the warnings about managed pods and client-side throttling.

    5. Get the state of the nodes to verify that the drained node is marked as not schedulable.

      [student@workstation scheduling-pdb]$ oc get nodes
      NAME       STATUS                     ROLES                  ...
      master01   Ready                      control-plane,master   ...
      master02   Ready                      control-plane,master   ...
      master03   Ready                      control-plane,master   ...
      worker01   Ready,SchedulingDisabled   worker                 ...  1
      worker02   Ready                      worker                 ...
      worker03   Ready                      worker                 ...

      1

      The compute node is drained for maintenance.

    6. Count the pods that are running on each compute node. The scheduler placed replacement pods for the evicted pods on the worker02 and worker03 compute nodes.

      [student@workstation scheduling-pdb]$ ./count-pods.sh
      NODE            PODS
      worker01        0  1
      worker02        3
      worker03        3

      1

      No pods are on this node, because it was just drained.

    7. Delete the nginx deployment.

      [student@workstation scheduling-pdb]$ oc delete deployment/nginx
      deployment.apps "nginx" deleted
    8. Uncordon the worker01 node that you drained previously to remove the SchedulingDisabled status.

      [student@workstation scheduling-pdb]$ oc adm uncordon node/worker01
      node/worker01 uncordoned
    9. List the cluster nodes and verify that all the compute nodes are marked as ready.

      [student@workstation scheduling-pdb]$ oc get nodes -L rack
      NAME      STATUS  ROLES                 AGE  VERSION    RACK
      master01  Ready   control-plane,master  8d   v1.27.6+…  rack-a
      master02  Ready   control-plane,master  8d   v1.27.6+…  rack-a
      master03  Ready   control-plane,master  8d   v1.27.6+…  rack-b
      worker01  Ready   worker                7d   v1.27.6+…  rack-b
      worker02  Ready   worker                7d   v1.27.6+…  rack-b
      worker03  Ready   worker                7d   v1.27.6+…  rack-a
  4. Create the nginx deployment with pod anti-affinity to spread the pods evenly across the compute nodes.

    1. Log in as the developer user.

      [student@workstation scheduling-pdb]$ oc login -u developer -p developer
      Login successful.
      
      ...output omitted...
    2. Edit the deployment-affinity.yaml file and set the affinity properties according to the following specification. Then, save and close the file.

      ...output omitted...
      spec:
        ...output omitted...
        template:
          ...output omitted...
          spec:
            ...output omitted...
            containers:
              ...output omitted...
            affinity:
              podAntiAffinity:
                preferredDuringSchedulingIgnoredDuringExecution:  1
                - weight: 100
                  podAffinityTerm:
                    topologyKey: rack  2
                    labelSelector:  3
                      matchExpressions:
                      - key: app
                        operator: In
                        values:
                        - nginx

      1

      The weighted pod affinity term is evaluated only during pod scheduling, on a best-effort basis.

      2

      The node label that indicates the failure domain for the nodes.

      3

      The label to select the pods that this affinity setting affects.

      Note

      The ~/DO380/solutions/scheduling-pdb/deployment-affinity.yaml file contains the correct configuration, and you can use it for comparison.
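
      The exercise uses the preferred (soft) anti-affinity rule because the cluster provides only two rack failure domains for six replicas. For comparison only, a required (hard) rule would look like the following sketch. It is not used in this exercise: with the rack topology key, it would allow at most one nginx pod per rack, and the remaining replicas would stay in the Pending state.

      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:   # hard constraint that the scheduler must satisfy
          - topologyKey: rack
            labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx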

    3. Create the application deployment resource by using the YAML manifest.

      [student@workstation scheduling-pdb]$ oc apply -f deployment-affinity.yaml
      deployment.apps/nginx created
    4. Switch to the second terminal window. Wait until all pods are running and verify that all the pods from the nginx deployment are marked as ready and available.

      This process might take a few minutes.

      Every 2.0s: oc get pdb,deployments,pods ... workstation: Wed Jan  3 16:24:11 2024
      
      NAME                    READY   UP-TO-DATE   AVAILABLE   AGE   ...
      deployment.apps/nginx   6/6     6            6           120s  ...
      
      NAME                       READY  STATUS   RESTARTS  AGE   IP   NODE      ...
      pod/nginx-d5b9c7498-5hbkw  1/1    Running  0         99s   ...  worker01  ...
      pod/nginx-d5b9c7498-nkx9j  1/1    Running  0         99s   ...  worker01  ...
      pod/nginx-d5b9c7498-g6ztb  1/1    Running  0         99s   ...  worker02  ...
      pod/nginx-d5b9c7498-bk7g6  1/1    Running  0         99s   ...  worker03  ...
      pod/nginx-d5b9c7498-djn8p  1/1    Running  0         99s   ...  worker03  ...
      pod/nginx-d5b9c7498-pz8g5  1/1    Running  0         99s   ...  worker03  ...
    5. Return to the first terminal window and count the pods that are running on each compute node.

      The pods are evenly distributed across the racks, because of the pod anti-affinity settings.

      • Three pods are running in the rack-b rack nodes.

      • Three pods are running in the rack-a rack nodes.

      [student@workstation scheduling-pdb]$ ./count-pods.sh
      NODE            PODS
      worker01        2  1
      worker02        1  2
      worker03        3  3

      1 2

      The worker01 and worker02 compute nodes are in the rack-b rack.

      3

      The worker03 compute node is in the rack-a rack.

  5. Create the pod disruption budget with the intended constraints.

    1. Edit the pod-disruption-budget.yaml file and set the minimum available percentage and the label selector according to the following specification. Then, save and close the file.

      apiVersion: policy/v1
      kind: PodDisruptionBudget
      metadata:
        name: nginx
        labels:
          app: nginx
      spec:
        minAvailable: 80%
        selector:
          matchLabels:
            app: nginx

      Note

      The ~/DO380/solutions/scheduling-pdb/pod-disruption-budget.yaml file contains the correct configuration, and you can use it for comparison.

    2. Create the pod disruption budget by using the YAML manifest.

      [student@workstation scheduling-pdb]$ oc apply -f pod-disruption-budget.yaml
      poddisruptionbudget.policy/nginx created
    3. Verify that the nginx pod disruption budget was created, and that it has the intended minimum available attribute.

      [student@workstation scheduling-pdb]$ oc describe pdb nginx
      Name:           nginx
      Namespace:      scheduling-pdb
      Min available:  80%
      Selector:       app=nginx
      Status:
          Allowed disruptions:  1  1
          Current:              6
          Desired:              5
          Total:                6
      Events:                   <none>

      1

      Only one pod can be evicted at a time from a drained node.
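
      The Allowed disruptions value follows from the constraint: 80% of six replicas is 4.8, which rounds up to a desired minimum of five healthy pods, so only 6 - 5 = 1 pod can be voluntarily disrupted at a time. For a deployment that keeps six replicas, an equivalent budget could set maxUnavailable instead, as in the following sketch, which is not used in this exercise:

      apiVersion: policy/v1
      kind: PodDisruptionBudget
      metadata:
        name: nginx
      spec:
        maxUnavailable: 1          # with six replicas, equivalent to minAvailable: 80%
        selector:
          matchLabels:
            app: nginx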

  6. Drain a compute node to simulate a voluntary disruption.

    Important

    The selected node for draining must have at least two pods running.

    1. Log in again as the admin user.

      [student@workstation scheduling-pdb]$ oc login -u admin -p redhatocp
      Login successful.
      
      ...output omitted...
    2. Drain the worker03 node to simulate taking it offline for maintenance.

      This command might take a few minutes to complete. Leave it running and continue with the next step. You review the output of this command in a later step.

      [student@workstation scheduling-pdb]$ oc adm drain node/worker03 \
        --ignore-daemonsets --delete-emptydir-data
      
      ...output omitted...
    3. Switch to the second terminal window to view the eviction of the pods from the drained node. Wait until all pods are running on another node and are marked as ready. This process might take a few minutes.

      One pod is evicted at a time from the drained node, and the availability constraint is met. Use the values in the AGE column to determine which pods were evicted from the drained node and scheduled on a different node.

      Every 2.0s: oc get pdb,deployments,pods ... workstation: Wed Jan  3 16:28:12 2024
      
      NAME                              MIN AVAIL…  MAX UNAVAIL…  ALLOWED DISRUPTIONS
      poddisruptionbudget.policy/nginx  80%         N/A           1
      
      NAME                    READY   UP-TO-DATE   AVAILABLE   AGE   ...
      deployment.apps/nginx   5/6     6            5           30m   ...  1
      
      NAME                       READY  STATUS    RESTARTS  AGE  IP   NODE      ...
      pod/nginx-d5b9c7498-5hbkw  1/1    Running   0         5m   ...  worker01  ...
      pod/nginx-d5b9c7498-nkx9j  1/1    Running   0         5m   ...  worker01  ...
      pod/nginx-d5b9c7498-g6ztb  1/1    Running   0         5m   ...  worker02  ...
      pod/nginx-d5b9c7498-pz8g5  1/1    Running   0         5m   ...  worker03  ...  2
      pod/nginx-d5b9c7498-q6rcf  1/1    Running   0         50s  ...  worker01  ...  3
      pod/nginx-d5b9c7498-pxx86  0/1    Init:0/1  0         10s  ...  worker02  ...  4
      
      ^C

      1

      The pod eviction honors the pod disruption budget: the deployment keeps at least five available pods during the drain.

      2

      The pods on the drained node continue to run until the drain operation evicts them.

      3

      The replacement pods are scheduled on other compute nodes.

      4

      Only one pod is evicted at a time from the drained node.

      Press Ctrl+C and close the second terminal window when done.
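
      While the drain is in progress, you can also inspect the budget itself, for example with the oc get pdb nginx -o yaml command. While an evicted pod's replacement is not yet ready, the status section of the PodDisruptionBudget resource reports values similar to the following sketch, which explains why further evictions are held back:

      status:
        currentHealthy: 5        # five pods are currently ready
        desiredHealthy: 5        # minAvailable: 80% of six pods, rounded up
        disruptionsAllowed: 0    # no additional voluntary disruption is allowed right now
        expectedPods: 6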

    4. Return to the first terminal window and inspect the output of the oc adm drain command.

      From the pod eviction messages of the nginx pods, observe that one pod is evicted at a time from the drained node and the availability constraint is met.

      An eviction that would violate the pod disruption budget is blocked, and the oc adm drain command retries it after five seconds, until the budget allows another disruption.

      [student@workstation scheduling-pdb]$ oc adm drain node/worker03 \
        --ignore-daemonsets --delete-emptydir-data
      node/worker03 cordoned
      Warning: ignoring DaemonSet-managed Pods: ...output omitted...
      ...output omitted...
      I0103 16:27:16.659505   29741 request.go:696] Waited for … due to client-side throttling, not priority and fairness, request: ...output omitted...
      ...output omitted...
      evicting pod scheduling-pdb/nginx-d5b9c7498-bk7g6
      evicting pod scheduling-pdb/nginx-d5b9c7498-djn8p  1
      evicting pod scheduling-pdb/nginx-d5b9c7498-pz8g5
      ...output omitted...
      pod/nginx-d5b9c7498-bk7g6 evicted
      ...output omitted...
      error when evicting pods/"nginx-d5b9c7498-djn8p" -n "scheduling-pdb" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.  2
      error when evicting pods/"nginx-d5b9c7498-pz8g5" -n "scheduling-pdb" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
      ...output omitted...
      pod/nginx-d5b9c7498-djn8p evicted  3
      evicting pod scheduling-pdb/nginx-d5b9c7498-pz8g5
      error when evicting pods/"nginx-d5b9c7498-pz8g5" -n "scheduling-pdb" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
      ...output omitted...
      pod/nginx-d5b9c7498-pz8g5 evicted
      node/worker03 drained

      1

      The pod is marked for eviction.

      2

      The pod eviction is blocked because it would violate the pod disruption budget, and is retried until the budget allows another disruption.

      3

      The pod is finally evicted from the drained node.

      Note

      You can safely ignore the warnings about managed pods and client-side throttling.
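
      The oc adm drain command removes the pods by creating eviction requests against each pod, and the pod disruption budget is enforced on these eviction requests; a direct pod deletion is not gated by the budget. An eviction request is a small object similar to the following sketch, which is posted to the eviction subresource of the pod:

      apiVersion: policy/v1
      kind: Eviction
      metadata:
        name: nginx-d5b9c7498-pz8g5     # one of the pods on the drained node
        namespace: scheduling-pdb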

    5. List the cluster nodes and verify that the worker03 node status is SchedulingDisabled.

      [student@workstation scheduling-pdb]$ oc get nodes
      NAME       STATUS                     ROLES                  AGE   VERSION
      master01   Ready                      control-plane,master   28d   v1.27.6+...
      master02   Ready                      control-plane,master   28d   v1.27.6+...
      master03   Ready                      control-plane,master   28d   v1.27.6+...
      worker01   Ready                      worker                 8d    v1.27.6+...
      worker02   Ready                      worker                 8d    v1.27.6+...
      worker03   Ready,SchedulingDisabled   worker                 8d    v1.27.6+...  1

      1

      The compute node is marked as not schedulable.

    6. Count the pods that are running on each compute node.

      [student@workstation scheduling-pdb]$ ./count-pods.sh
      NODE            PODS
      worker01        3  1
      worker02        3  2
      worker03        0  3

      1 2

      The pods are evenly distributed on the remaining nodes.

      3

      No pods are on this node, because it was just drained.

    7. Change to the home directory of the student user.

      [student@workstation scheduling-pdb]$ cd
      [student@workstation ~]$
  7. Optional: Clean up the resources that were used in this exercise.

    1. Delete the scheduling-pdb project.

      [student@workstation ~]$ oc delete project scheduling-pdb
      project.project.openshift.io "scheduling-pdb" deleted
    2. Uncordon all the compute nodes.

      [student@workstation ~]$ oc adm uncordon -l node-role.kubernetes.io/worker
      ...output omitted...
    3. Remove the rack label from all nodes.

      [student@workstation ~]$ oc label node --all rack-
      ...output omitted...

Finish

On the workstation machine, use the lab command to complete this exercise. This step is important to ensure that resources from previous exercises do not impact upcoming exercises.

[student@workstation ~]$ lab finish scheduling-pdb

Revision: do380-4.14-397a507