Monitoring Application Health

Objectives

  • Monitor the health of applications deployed on Red Hat OpenShift by using readiness and liveness probes.

Specifying Application Resource Requirements

OpenShift lets you limit your pod resource consumption and also request the minimum resources that your pod requires.

The following deployment definition shows a pod that has both limits and requests for memory and CPU.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      deployment: example-deployment
  template:
    metadata:
      labels:
        deployment: example-deployment
    spec:
      containers:
      - image: quay.io/example/deployment:1.0
        name: example-deployment
        resources:
          requests:
            cpu: 100m # 1
            memory: 200Mi # 2
          limits:
            memory: 200Mi # 3

1

The pod requests 100m CPU units, which is 0.1 CPU cores.

2

The pod requests 200Mi, which is 200 mebibytes.

3

The pod memory consumption is limited to 200Mi.

The sum of the limits of the pods on a node can be higher than the node's resources. When this happens, the node is overcommitted.

Limits and requests play an important role when a node runs low on resources, and the behavior differs by resource type. When a pod consumes more memory than the node has available or than its resource limits allow, Kubernetes terminates that pod with an Out of Memory (OOM) error. By contrast, if a pod tries to use more CPU than is available or than its limits allow, then the pod continues working but its CPU usage is throttled. If a node does not have enough memory, then it evicts pods.

In low memory situations, OpenShift decides which pods to evict based on their resource definitions. Kubernetes classifies the following Quality of Service (QoS) categories:

Category | Description | OpenShift Eviction Strategy
Best-Effort | Pods without requests and limits, which have an unpredictable resource consumption. | First pods to evict.
Burstable | Pods with requests, and either no limits or limits that exceed the requests. | Pods to evict if there are no Best-Effort pods left.
Guaranteed | Pods with equal requests and limits. | Last pods to evict.

By using the QoS categories, you can design your workloads for stability or for better resource usage. For example, in a production cluster where you want predictable workload allocation, you might run most workloads as Guaranteed and allow some Burstable workloads with higher CPU limits to improve CPU use.
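As an illustration of the Guaranteed category, the following fragment is a minimal sketch of a container resources section in which requests and limits are equal for both CPU and memory. The values are hypothetical. Kubernetes assigns the Guaranteed class only when every container in the pod sets equal requests and limits for both resources, so the example-deployment definition shown earlier, which sets no CPU limit, would be classified as Burstable.

```yaml
# Hypothetical values: requests equal limits for both CPU and memory.
resources:
  requests:
    cpu: 500m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 256Mi
```

You can check the class that Kubernetes assigned in the status.qosClass field of a running pod.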

In multitenant environments, OpenShift administrators can define resource quotas per project to improve isolation. Administrators can also enforce a range for limits in a project and configure default resources for workloads that do not specify any.
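The following sketch shows what such administrator-defined resources might look like, assuming the standard Kubernetes ResourceQuota and LimitRange kinds. The resource names and all values are hypothetical and are not taken from this course.

```yaml
# Hypothetical quota: caps the total resources that all pods in the project can claim.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: example-quota
spec:
  hard:
    pods: "10"
    requests.cpu: "2"
    requests.memory: 2Gi
    limits.cpu: "4"
    limits.memory: 4Gi
---
# Hypothetical limit range: bounds per-container values and supplies defaults.
apiVersion: v1
kind: LimitRange
metadata:
  name: example-limit-range
spec:
  limits:
  - type: Container
    min:
      memory: 64Mi
    max:
      memory: 1Gi
    default:            # limits applied when a container specifies none
      cpu: 500m
      memory: 256Mi
    defaultRequest:     # requests applied when a container specifies none
      cpu: 100m
      memory: 128Mi
```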

Red Hat OpenShift Application Health Checks

A health check or probe is a periodic check that monitors the health of an application. Probes let OpenShift automate workload management. With probes, OpenShift can determine the following for a pod:

  • When it starts.

  • When it is ready to receive traffic.

  • Whether to try to restart the pod to overcome some issue.

OpenShift uses this information to route traffic or to control deployment strategies.

Also, developers can use probes to monitor their applications. Applications can become unreliable for various reasons, for example:

  • Temporary connection loss

  • Configuration errors

  • Application errors

Developers can configure probes by using the oc command-line client, a YAML deployment template, or the Red Hat OpenShift web console.

There are currently three types of probes in OpenShift:

Startup Probe

A startup probe verifies whether the application within a container is started. Startup probes run before any other probe. If a startup probe is defined, then the other probes do not start until the startup probe succeeds. If a container fails its startup probe, then OpenShift kills the container and restarts it depending on the pod's restartPolicy.

Startup probes run only while the container starts, unlike readiness and liveness probes, which run periodically for the lifetime of the container.

To configure the startup probe, add the spec.containers.startupProbe attribute to the pod configuration.

Readiness Probe

Readiness probes determine whether or not a container is ready to serve requests. If the readiness probe returns a failed state, then OpenShift stops sending traffic to that pod until the probe succeeds.

This is useful when an application must establish network connections, load files and caches, or perform other initial tasks that take considerable time and only temporarily affect the application.

To configure the readiness probe, add the spec.containers.readinessProbe attribute to the pod configuration.

Liveness Probe

Liveness probes determine whether or not an application running in a container is in a healthy state. If the liveness probe detects an unhealthy state, then OpenShift restarts the container.

To configure the liveness probe, add the spec.containers.livenessProbe attribute to the pod configuration.
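To show how the three probe types fit together, the following fragment is a minimal sketch of a container that defines all of them. The health endpoints, the port, and the timing values are assumptions chosen for illustration. The image and container name are reused from the earlier example.

```yaml
# Hypothetical container definition; endpoints, port, and timings are assumptions.
containers:
- name: example-deployment
  image: quay.io/example/deployment:1.0
  startupProbe:          # the other two probes wait until this one succeeds
    httpGet:
      path: /health/started
      port: 8080
    periodSeconds: 5
    failureThreshold: 30 # allows up to 30 x 5 = 150 seconds for startup
  readinessProbe:        # controls whether the pod receives traffic
    httpGet:
      path: /health/ready
      port: 8080
    periodSeconds: 10
  livenessProbe:         # a failure causes OpenShift to restart the container
    httpGet:
      path: /health/live
      port: 8080
    periodSeconds: 10
```

Using a separate endpoint for each probe is a design choice, not a requirement. It lets the application report "started", "ready", and "alive" independently.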

OpenShift provides the following options to configure these probes:

Name | Mandatory | Description | Default Value
initialDelaySeconds | No | Determines how long, in seconds, to wait after the container starts before beginning the probe. | 0
timeoutSeconds | No | Determines how long, in seconds, to wait for the probe to finish. If this time is exceeded, then OpenShift assumes that the probe failed. | 1
periodSeconds | No | Specifies the interval, in seconds, between checks. | 10
successThreshold | No | Specifies the minimum consecutive successes for the probe to be considered successful after it has failed. | 1
failureThreshold | No | Specifies the minimum consecutive failures for the probe to be considered failed after it has succeeded. | 3
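The following fragment is a minimal sketch that sets every option explicitly, which makes the resulting timing easier to reason about. The endpoint, port, and values are hypothetical.

```yaml
# Hypothetical values; every option is set explicitly instead of relying on defaults.
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 15   # wait 15 seconds after the container starts
  periodSeconds: 10         # then probe every 10 seconds
  timeoutSeconds: 2         # a probe that takes longer than 2 seconds counts as a failure
  successThreshold: 1       # one success marks the container as healthy again
  failureThreshold: 3       # three consecutive failures trigger a restart
```

With these values, an application that stops responding is restarted after roughly failureThreshold x periodSeconds = 30 seconds of consecutive failures. Kubernetes requires successThreshold to be 1 for liveness and startup probes.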

Methods of Checking Application Health

Startup, readiness, and liveness probes can verify that an application is in a healthy state in three ways.

HTTP Checks

An HTTP check is ideal for applications that return HTTP status codes, such as REST APIs.

The HTTP probe uses GET requests to ensure the health of an application. The check is successful if the HTTP response code is in the range 200-399.

The following example demonstrates how to implement a readiness probe with the HTTP check method:

...contents omitted...
readinessProbe:
  httpGet:
    path: /health # 1
    port: 8080
  initialDelaySeconds: 15 # 2
  timeoutSeconds: 1 # 3
...contents omitted...

1

The readiness probe endpoint.

2

How long to wait after the container starts before checking its health.

3

How long to wait for the probe to finish.

Container Execution Checks

Container execution checks are ideal in scenarios where you must determine the status of the container based on the exit code of a process or shell script running in the container.

When using container execution checks, OpenShift executes a command inside the container. The check succeeds if the command exits with code 0, and fails with any other exit code. The following example demonstrates how to implement a container execution check:

...contents omitted...
livenessProbe:
  exec:
    command: # 1
    - cat
    - /tmp/health
  initialDelaySeconds: 15
  timeoutSeconds: 1
...contents omitted...

1

The command to run and its arguments, as a YAML array.

TCP Socket Checks

A TCP socket check is ideal for applications that open TCP ports, such as database servers, file servers, web servers, and application servers.

When using TCP socket checks, OpenShift attempts to open a socket to the container. The container is considered healthy if the check can establish a successful connection. The following example demonstrates how to implement a liveness probe by using the TCP socket check method:

...contents omitted...
livenessProbe:
  tcpSocket:
    port: 8080 # 1
  initialDelaySeconds: 15
  timeoutSeconds: 1
...contents omitted...

1

The TCP port to check.

Manage Probes By Using the Web Console

Developers can create probes by using the OpenShift web console. In the Topology view, click the three-dot menu for a deployment, and then click Add Health Checks.

This opens the Add health checks form where you can add the probes for your deployment.

For example, to add a readiness probe, click Add Readiness probe, which opens a form where you can input the settings for the probe. To finish editing the probe, click the check mark button at the bottom of the readiness probe section.

At this point you can add more probes before clicking Add.

When you click Add, OpenShift deploys the changes. You can follow the new deployment by going to the Details tab of the deployment.

Creating Probes by Using the CLI

You can use the oc set probe command to create probes on existing workloads.

The following examples demonstrate using the oc set probe command with several options:

[user@host ~]$ oc set probe deployment myapp --readiness \
--get-url=http://:8080/readyz --period-seconds=20
[user@host ~]$ oc set probe deployment myapp --liveness \
--open-tcp=3306 --period-seconds=20 \
--timeout-seconds=1
[user@host ~]$ oc set probe deployment myapp --liveness \
--get-url=http://:8080/livez --initial-delay-seconds=30 \
--success-threshold=1 --failure-threshold=3

Use the oc set probe --help command to view the available options.
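For orientation, the following fragment sketches the probe definition that the first command above (a readiness probe that sends an HTTP GET request to /readyz on port 8080 every 20 seconds) is expected to add to the container in the myapp deployment. This is an approximation under that assumption, not captured output. Fields that the command does not set keep their defaults.

```yaml
# Approximate result of the first command; not captured from a real cluster.
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
    scheme: HTTP
  periodSeconds: 20
```

To see the actual definition, inspect the deployment, for example with oc get deployment myapp -o yaml.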

References

Kubernetes best practices: Resource requests and limits

What everyone should know about Kubernetes memory limits, OOMKilled pods, and pizza parties

For more information, refer to the Resource Quotas Per Project section in the Building Applications Overview chapter in the Red Hat OpenShift Container Platform 4.12 Building Applications documentation at https://access.redhat.com/documentation/en-us/openshift_container_platform/4.12/html-single/building_applications/index#quotas

For more information, refer to the Resource requests and overcommitment section in the Working with clusters chapter in the Red Hat OpenShift Container Platform 4.12 Nodes documentation at https://access.redhat.com/documentation/en-us/openshift_container_platform/4.12/html-single/nodes/index#nodes-cluster-overcommit-resource-requests_nodes-cluster-overcommit

For more information, refer to the Understanding Health Checks section in the Building Applications Overview chapter in the Red Hat OpenShift Container Platform 4.12 Building Applications documentation at https://access.redhat.com/documentation/en-us/openshift_container_platform/4.12/html-single/building_applications/index#application-health

Revision: do288-4.12-0d49506