Configure health probes in a deployment and verify that network clients are insulated from application failures.
Outcomes
Observe potential issues with an application that is not configured with health probes.
Configure startup, liveness, and readiness probes for the application.
As the student user on the workstation machine, use the lab command to prepare your system for this exercise.
This command ensures that the following conditions are true:
The reliability-probes project exists.
The resource files are available in the course directory.
The classroom registry has the long-load container image.
The registry.ocp4.example.com:8443/redhattraining/long-load:v1 container image contains an application with utility endpoints.
These endpoints perform such tasks as crashing the process and toggling the server's health status.
[student@workstation ~]$ lab start reliability-probes
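Throughout this exercise, you call the application's utility endpoints with oc exec and curl. All of the calls follow the same pattern, an HTTP GET against port 3000 inside a pod; the endpoint names shown here are the ones that the following steps use:

[student@workstation ~]$ oc exec deploy/long-load -- curl -s localhost:3000/ENDPOINT

Replace ENDPOINT with health, togglesick, or hiccup?time=SECONDS.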
Instructions
As the developer user, deploy the long-load application in the reliability-probes project.
Log in as the developer user with the developer password.
[student@workstation ~]$ oc login -u developer -p developer \
https://api.ocp4.example.com:6443
Login successful.
...output omitted...

Set the reliability-probes project as the active project.
[student@workstation ~]$ oc project reliability-probes
Now using project reliability-probes on server "https://api.ocp4.example.com:6443".

Apply the long-load-deploy.yaml file to create the deployment, service, and route.
Move to the next step within one minute.
[student@workstation ~]$ oc apply -f \
~/DO180/labs/reliability-probes/long-load-deploy.yaml
deployment.apps/long-load created
service/long-load created
route.route.openshift.io/long-load created

Verify that the pods take several minutes to start by sending a request to a pod in the deployment.
[student@workstation ~]$ oc exec deploy/long-load -- \
curl -s localhost:3000/health
app is still starting

Observe that the pods are listed as ready even though the application is not ready.
[student@workstation ~]$ oc get pods
NAME                         READY   STATUS    RESTARTS   AGE
long-load-8564d998cc-579nx   1/1     Running   0          30s
long-load-8564d998cc-ttqpg   1/1     Running   0          30s
long-load-8564d998cc-wjtfw   1/1     Running   0          30s
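At this point, the deployment defines no probes, so Kubernetes treats any running container as ready. Optionally, you can confirm that no readiness probe is set with a standard jsonpath query; the command prints nothing when the probe is absent:

[student@workstation ~]$ oc get deploy/long-load \
-o jsonpath='{.spec.template.spec.containers[0].readinessProbe}'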
Add a startup probe to the pods so that the cluster knows when the pods are ready.
Modify the ~/DO180/labs/reliability-probes/long-load-deploy.yaml YAML file by defining a startup probe.
The probe runs every three seconds and marks a pod as failed after 30 failed attempts, which gives the application up to 90 seconds (30 attempts × 3 seconds) to start.
The file should match the following excerpt:
...output omitted...
spec:
...output omitted...
  template:
...output omitted...
    spec:
      containers:
      - image: registry.ocp4.example.com:8443/redhattraining/long-load:v1
        imagePullPolicy: Always
        name: long-load
        startupProbe:
          failureThreshold: 30
          periodSeconds: 3
          httpGet:
            path: /health
            port: 3000
        env:
...output omitted...
Scale down the deployment to zero replicas.

[student@workstation ~]$ oc scale deploy/long-load --replicas 0
deployment.apps/long-load scaled

Red Hat recommends scaling an application down to zero replicas before deleting or changing a deployment. Scaling down to zero stops new pods from being created, while existing pods terminate gracefully after finishing their current requests. When you make multiple deployment updates, scaling down to zero also prevents a noisy event log from Kubernetes responding to every change, and conserves cluster resources.
Apply the updated long-load-deploy.yaml file.
Because the YAML file specifies the number of replicas, the deployment is scaled up.
Move to the next step within one minute.
[student@workstation ~]$ oc apply -f \
~/DO180/labs/reliability-probes/long-load-deploy.yaml
deployment.apps/long-load configured
service/long-load unchanged
route.route.openshift.io/long-load configured

Observe that the pods do not show as ready until the application is ready and the startup probe succeeds. Wait for the three pods to reach the ready state. Press Ctrl+c to stop the watch command.
[student@workstation ~]$ watch oc get pods
Every 2.0s: oc get pods

NAME                         READY   STATUS    RESTARTS   AGE
long-load-785b5b4fc8-7x5ln   1/1     Running   0          90s
long-load-785b5b4fc8-f7pdk   1/1     Running   0          90s
long-load-785b5b4fc8-r2nqj   1/1     Running   0          90s
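Optionally, confirm that the startup probe is now part of the live deployment. This check is not part of the exercise files; it is a standard jsonpath query, and its output includes the failureThreshold and periodSeconds values that you configured:

[student@workstation ~]$ oc get deploy/long-load \
-o jsonpath='{.spec.template.spec.containers[0].startupProbe}'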
Add a liveness probe so that broken instances of the application are restarted.
Start the load test script.
[student@workstation ~]$ ~/DO180/labs/reliability-probes/load-test.sh
Ok
Ok
Ok
...output omitted...

Keep the script running in a visible window.
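The contents of load-test.sh are not shown in this exercise. As a rough sketch, assuming that the script simply polls the application through its route once per second and prints each response, an equivalent might look like the following; the route lookup uses a standard jsonpath query:

#!/bin/bash
# Hypothetical stand-in for load-test.sh: resolve the application's
# route hostname, then poll it in a loop and print each response body.
HOST=$(oc get route long-load -o jsonpath='{.spec.host}')
while true; do
  curl -s "http://${HOST}/health"
  echo
  sleep 1
done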
In a new terminal window, use the /togglesick endpoint to make one of the pods unhealthy.
Move to the next step within one minute.
[student@workstation ~]$ oc exec \
deploy/long-load -- curl -s localhost:3000/togglesick

(no output expected)
The load test window begins to show app is unhealthy.
Because only one pod is unhealthy, the remaining pods still respond with Ok.
Update the ~/DO180/labs/reliability-probes/long-load-deploy.yaml file to add a liveness probe.
The probe runs every three seconds and marks the pod as failed after three failed attempts, that is, after about nine seconds of consecutive failures.
Modify the spec.template.spec.containers object in the file to match the following excerpt.
spec:
...output omitted...
  template:
...output omitted...
    spec:
      containers:
      - image: registry.ocp4.example.com:8443/redhattraining/long-load:v1
...output omitted...
        startupProbe:
          failureThreshold: 30
          periodSeconds: 3
          httpGet:
            path: /health
            port: 3000
        livenessProbe:
          failureThreshold: 3
          periodSeconds: 3
          httpGet:
            path: /health
            port: 3000
        env:
...output omitted...
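As an alternative to editing the YAML file, the same liveness probe can be added with oc set probe, mirroring the command that this exercise uses later for the readiness probe:

[student@workstation ~]$ oc set probe deploy/long-load \
--liveness --failure-threshold 3 --period-seconds 3 \
--get-url http://:3000/health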
Scale down the deployment to zero replicas.

[student@workstation ~]$ oc scale deploy/long-load --replicas 0
deployment.apps/long-load scaled

The load test script shows that the application is not available.
Apply the updated long-load-deploy.yaml file to update the deployment, which triggers the deployment to re-create its pods.
[student@workstation ~]$ oc apply -f \
~/DO180/labs/reliability-probes/long-load-deploy.yaml
deployment.apps/long-load configured
service/long-load unchanged
route.route.openshift.io/long-load configured

Wait for the three new pods to reach the ready state. Press Ctrl+c to stop the watch command.
[student@workstation ~]$ watch oc get pods
Every 2.0s: oc get pods

NAME                         READY   STATUS    RESTARTS   AGE
long-load-785b5b4fc8-8x5ln   1/1     Running   0          70s
long-load-785b5b4fc8-f8pdk   1/1     Running   0          70s
long-load-785b5b4fc8-r9nqj   1/1     Running   0          70s
Wait for the load test window to show Ok for all responses, and then toggle one of the pods to be unhealthy.
[student@workstation ~]$ oc exec \
deploy/long-load -- curl -s localhost:3000/togglesick

(no output expected)
The load test window might show app is unhealthy a number of times before the pod is restarted.
Observe that the unhealthy pod is restarted after the liveness probe fails.
After the pod is restarted, the load test window shows only Ok.
[student@workstation ~]$ oc get pods
NAME                        READY   STATUS    RESTARTS      AGE
long-load-fbb7468d9-8xm8j   1/1     Running   0             9m42s
long-load-fbb7468d9-k66dm   1/1     Running   0             8m38s
long-load-fbb7468d9-ncxkh   0/1     Running   1 (11s ago)   10m
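Optionally, inspect the events that record the failed liveness probe and the container restart. Filtering events by reason is standard Kubernetes usage, and the event messages indicate which probe failed and why:

[student@workstation ~]$ oc get events --field-selector reason=Unhealthy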
Add a readiness probe so that traffic goes only to pods that are ready and healthy.
Scale down the deployment to zero replicas.
[student@workstation ~]$ oc scale deploy/long-load --replicas 0
deployment.apps/long-load scaled

Use the oc set probe command to add the readiness probe.
[student@workstation ~]$ oc set probe deploy/long-load \
--readiness --failure-threshold 1 --period-seconds 3 \
--get-url http://:3000/health
deployment.apps/long-load probes updated
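The oc set probe command updates the live deployment directly. The result is a readinessProbe stanza on the container, equivalent in form to the probes that you added earlier by editing the YAML file; it should look similar to this excerpt:

        readinessProbe:
          failureThreshold: 1
          periodSeconds: 3
          httpGet:
            path: /health
            port: 3000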
Scale up the deployment to three replicas.

[student@workstation ~]$ oc scale deploy/long-load --replicas 3
deployment.apps/long-load scaled

Observe the status of the pods by using a watch command.
[student@workstation ~]$ watch oc get pods
NAME                        READY   STATUS    RESTARTS   AGE
long-load-d5794d744-8hqlh   0/1     Running   0          48s
long-load-d5794d744-hphgb   0/1     Running   0          48s
long-load-d5794d744-lgkns   0/1     Running   0          48s

The command does not immediately finish, but continues to show updates to the pods' status. Leave this command running in a visible window.
Wait for the pods to show as ready.
Then, in a new terminal window, make one of the pods unhealthy for five seconds by using the /hiccup endpoint.
[student@workstation ~]$ oc exec \
deploy/long-load -- curl -s localhost:3000/hiccup?time=5

(no output expected)
The pod status window shows that one of the pods is no longer ready. After five seconds, the pod is healthy again and shows as ready.
The load test window might show app is unhealthy once before the pod is marked as not ready.
After the cluster determines that the pod is no longer ready, it stops sending traffic to the pod until either the pod is fixed or the liveness probe fails.
Because the pod is sick for only five seconds, the readiness probe, which needs only one failed attempt on a three-second period, has time to fail, but the liveness probe, which needs three failed attempts over about nine seconds, does not.
Optionally, repeat this step and observe as the temporarily sick pod's status changes.
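Optionally, when repeating the step, watch the service endpoints as well. A pod that fails its readiness probe is removed from the service's endpoint list, and reappears when the probe succeeds again:

[student@workstation ~]$ oc get endpoints long-load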
Stop the load test and status commands by pressing Ctrl+c in their respective windows.