Alerts and Notifications

Objectives

  • View alerting rules.

  • Configure and silence alerts.

OpenShift Alerts

In OpenShift Container Platform, you can use the Alerts UI to manage alerts, silences, and alerting rules.

Alerting rules

Alerting rules contain a set of conditions that outline a particular state within a cluster. Alerts are triggered when those conditions are true. An alerting rule can be assigned a severity that defines how the alerts are routed.

Alerts

An alert is fired when the conditions that are defined in an alerting rule are true. Alerts notify you that a set of conditions applies in an OpenShift Container Platform cluster.

Alert receivers

You can configure the alerting system to route alerts to a receiver, which can send notifications by email, send pager notifications, or forward the alerts to another system by using a webhook.

Silences

A silence can be applied to an alert to prevent sending notifications when the conditions for an alert are true. You can mute an alert after the initial notification, while you work on resolving the underlying issue.

Viewing Alerting Rules

Click Observe → Alerting and then click Alerting rules to list the alerting rules in the OpenShift cluster. You can apply custom filters to search for a specific alerting rule. The alert state column indicates whether the alert is currently firing or is silenced.

Click the alerting rule name to view its details, such as its severity, description, and log message. You can also view the PromQL expression that the rule evaluates to detect the alert condition.

Scroll down to view the graph that displays the time when the alert was detected.
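The details that the console shows for a rule (name, PromQL expression, severity, and description) correspond to fields in a PrometheusRule resource. The following is a minimal sketch of such a resource, written to a local file for inspection only. The rule name, namespace, threshold, and description text are hypothetical and are not taken from the cluster that this section describes.

```shell
# Write an illustrative PrometheusRule manifest to a local file.
# All names and the 75% threshold are assumptions for this sketch.
cat > example-alerting-rule.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-volume-usage
  namespace: example-project
spec:
  groups:
    - name: example-storage
      rules:
        - alert: ExampleVolumeUsageNearFull
          # The condition: the alert fires when this expression returns results.
          expr: kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.75
          # The condition must hold for this long before the alert fires.
          for: 5m
          labels:
            severity: warning
          annotations:
            description: 'Volume {{ $labels.persistentvolumeclaim }} is more than 75% full.'
EOF
```

The severity label is the value that routing configuration can later match on.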

Viewing Firing Alerts

Click Observe → Alerting to view the alerts that are currently firing in the cluster. You can apply custom filters to find a specific alert.

Click the alert name to view its properties. The graph displays the time when the alert was detected.

Scroll down to view the alert details and labels. You can create a receiver to match a label and forward the alert.

Silence Alerts

You can silence an alert for a period of time to stop sending new notifications to a receiver. For example, you can silence an alert while you troubleshoot a problem, to stop sending new alert emails, or you can silence alerts that are expected to fire during a scheduled maintenance window.

Click Observe → Alerting to display a list of the alerts that are currently firing. Then click the three dots icon at the right of the alert, and click Silence alert.

The alert silence is configured to start immediately after it is created. You can clear the Start immediately field to set a specific start time.

You can select a predefined duration for the alert silence in the For field. Alternatively, select the - option in the For field to set manually when the silence begins and finishes.

A comment is required when you create an alert silence. This comment is saved in the alert silence details.
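The fields in the silence form (matching labels, start time, end time, and comment) map onto the silence object that the Alertmanager v2 API accepts at its /api/v2/silences endpoint. The following sketch only builds such a payload in a local file. It assumes GNU date, and the creator name, comment, and two-hour duration are placeholders. How you reach the Alertmanager API on your cluster is outside the scope of this sketch.

```shell
# Compute the start and end of a two-hour silence in UTC (GNU date syntax).
starts_at=$(date -u +%Y-%m-%dT%H:%M:%SZ)
ends_at=$(date -u -d '+2 hours' +%Y-%m-%dT%H:%M:%SZ)

# Build the silence payload. The matcher selects alerts by the alertname label.
cat > silence.json <<EOF
{
  "matchers": [
    {"name": "alertname", "value": "PersistentVolumeUsageNearFull", "isRegex": false}
  ],
  "startsAt": "${starts_at}",
  "endsAt": "${ends_at}",
  "createdBy": "ocp-admin",
  "comment": "Troubleshooting storage usage"
}
EOF
```

As in the web console, the comment is part of the silence and is stored with it.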

Viewing Alert Silences

Click Observe → Alerting and then click Silences to view the alert silences in the cluster.

You can see the silence name, start time, end time, matching labels, state, and associated comment.

You can also view the alerts that match the silence.

Viewing Silenced Alerts

Click Observe → Alerting to return to the alerting main section. The PersistentVolumeUsageNearFull alert is not displayed in the list because it is currently silenced. Clear the Alert State filter by clicking x to display all the alerts.

Observe that the PersistentVolumeUsageNearFull alert is listed and marked as silenced.

Sending Alert Notifications

You can view the firing alerts in the Alerting UI. Alerts are not configured by default to be sent to any notification systems. You can configure OpenShift Container Platform to send alerts to the following receiver types:

  • Email

  • PagerDuty

  • Slack

  • Webhook

By routing alerts to receivers, you can send timely notifications to the appropriate teams when failures occur. For example, critical alerts require immediate attention and are typically paged to an individual or to a critical response team. Alerts that provide non-critical warning notifications might instead be routed to a ticketing system for non-immediate review.

The alerting system uses Alertmanager, which is a component from the Prometheus monitoring stack. The Alertmanager configuration is saved in the alertmanager-main secret in the openshift-monitoring namespace.
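Routing by severity, as described above, could look like the following Alertmanager fragment. This is an illustrative sketch and not the configuration of any real cluster: the receiver names, the PagerDuty integration key, and the webhook URL are placeholders.

```shell
# Write an illustrative routing fragment to a local file.
cat > severity-routing-example.yaml <<'EOF'
receivers:
  - name: Default
  - name: oncall-pager
    pagerduty_configs:
      - routing_key: '<pagerduty-integration-key>'   # placeholder
  - name: ticketing
    webhook_configs:
      - url: https://tickets.example.com/alertmanager-hook   # placeholder
route:
  receiver: Default          # used when no child route matches
  routes:
    - receiver: oncall-pager # critical alerts page the on-call team
      match:
        severity: critical
    - receiver: ticketing    # warnings go to the ticketing system
      match:
        severity: warning
EOF
```

Alerts that match neither child route fall through to the Default receiver.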

Configuring Alertmanager with the Web Console

Click Administration → Cluster Settings, and then click Configuration and Alertmanager to open the Alertmanager configuration page. You can click the vertical ellipsis icon at the right to create an alert receiver or to edit the Alertmanager configuration file.

The Alertmanager configuration page lists the current alert routing parameters.

Scroll down to view the configured alert receivers. Click Create receiver to add a receiver. You can click the three dots icon at the right of each receiver to edit its settings or to delete it.

You can view and edit the current Alertmanager configuration by clicking YAML.

Configuring the Alerting Email Receiver

Click Create receiver to create a receiver that sends the alerts by email. Complete the values according to the following table, and then scroll down and click Create.

Field                                 Value
Receiver name                         email
Receiver type                         Email
To address                            ocp-admins@example.com

SMTP configuration
Save as default SMTP configuration    Checked
From address                          alerts@ocp4.example.com
SMTP smarthost                        192.168.50.254:25
SMTP hello                            localhost
Auth username                         smtp_training
Auth password                         Red_H4T@!
Auth identity                         (empty)
Auth secret                           (empty)
Require TLS                           Unset

Routing labels
Name                                  alertname
Value                                 PersistentVolumeUsageNearFull
Regular expression                    Unset

After you create the receiver, the Alertmanager configuration changes are applied, and the alerting system reloads within a few minutes.

Configuring Alertmanager with the Command Line

The Alertmanager configuration is saved in the alertmanager-main secret in the openshift-monitoring namespace.

[user@host ~]$ oc get secret/alertmanager-main -n openshift-monitoring
NAME                TYPE     DATA   AGE
alertmanager-main   Opaque   1      7d

You can extract the alertmanager-main secret to view the alertmanager.yaml configuration file.

[user@host ~]$ oc extract secret/alertmanager-main -n openshift-monitoring \
  --to ./ --confirm
alertmanager.yaml

The default alertmanager.yaml configuration file contains many unnecessary quotation marks. Remove the quotation marks by using the sed command to improve readability.

[user@host ~]$ sed -f script.sed alertmanager.yaml

Important

Although removing the extraneous quotation mark characters is not required, it improves readability. In this file, the quotation mark characters are needed only to represent null as a string instead of as the YAML null value.

The previous sed command uses this script file to remove all the quotation marks and to convert null to a string.

#!/usr/bin/sed -f
s/"//g  1
s/\<\(null\)\>/'\1'/g  2

1

Remove all the quotation marks from the file.

2

Enclose each occurrence of the word null in single quotation marks, so that YAML reads it as a string.
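To see the effect of the two substitutions without touching a real configuration, you can run the same script against a small sample file. The sample content below is illustrative and only mimics the quoting style of the extracted file. The sketch assumes GNU sed, because the \< and \> word-boundary markers are GNU extensions.

```shell
# Sample input that mimics the quoting style of the extracted file.
cat > sample-alertmanager.yaml <<'EOF'
"global":
  "resolve_timeout": "5m"
"receivers":
- "name": "Default"
- "name": "null"
"route":
  "receiver": "Default"
EOF

# The same two substitutions as in the script file, without the callout numbers.
cat > script.sed <<'EOF'
#!/usr/bin/sed -f
s/"//g
s/\<\(null\)\>/'\1'/g
EOF

# Without the -i option, sed prints the result to standard output and leaves
# the input file unchanged, so the output is redirected to a new file here.
sed -f script.sed sample-alertmanager.yaml > sample-cleaned.yaml
cat sample-cleaned.yaml
```

In the cleaned file, every double quotation mark is gone and the last receiver line reads - name: 'null'.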

The monitoring stack can send alerts by email through an SMTP server. The following example sends the PersistentVolumeUsageNearFull alerts to the ocp-admins@example.com email address.

global:
  resolve_timeout: 5m
  smtp_from: alerts@ocp4.example.com  1
  smtp_smarthost: '192.168.50.254:25'  2
  smtp_hello: localhost  3
  smtp_auth_username: smtp_training  4
  smtp_auth_password: Red_H4T@!  5
  smtp_require_tls: false  6
...output omitted...
inhibit_rules:
  ...output omitted...
receivers:
  ...output omitted...
  - name: email  7
    email_configs:  8
      - to: ocp-admins@example.com  9
route:
  group_by:
    - namespace
  group_interval: 2m  10
  group_wait: 30s
  receiver: Default
  repeat_interval: 1m  11
  routes:
    ...output omitted...
    - receiver: email  12
      match:
        alertname: PersistentVolumeUsageNearFull  13

1

The global email sender address. If you do not define from in the email_configs field for a receiver, then this field is the default address in use.

2

The global SMTP host. If you do not define smarthost in the email_configs field for a receiver, then this field is the default host in use.

3

The hello parameter for the SMTP connection, which is the hostname that Alertmanager uses to identify itself to the SMTP server.

4

The global SMTP username for optional authentication. If you do not define auth_username in the email_configs field for a receiver, then this field is the default username in use.

5

The global SMTP password for optional authentication. If you do not define auth_password in the email_configs field for a receiver, then this field is the default password in use.

6

A global setting to specify whether TLS is required for SMTP. You can override this setting by using require_tls in the email_configs field for a receiver.

7

An arbitrary name for the receiver. A route specifies this receiver name for a match.

8

This setting indicates that the receiver sends alerts by email.

9

The to setting must be specified in the email_configs field, and does not have an equivalent global SMTP setting.

10 11

Configure the group_interval and repeat_interval fields so the alert email notifications are sent more frequently.

12

The receiver to use if the match evaluates as true for the alert.

13

The expression to match a specific alert name.
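Several callouts above mention that a receiver can override the global SMTP settings in its email_configs field. The following fragment is a sketch of such an override, written to a local file. Every value except the to address is hypothetical, and the password is left as a placeholder.

```shell
# Write an illustrative receiver that overrides the global SMTP defaults.
cat > email-receiver-override.yaml <<'EOF'
receivers:
  - name: email
    email_configs:
      - to: ocp-admins@example.com
        from: storage-alerts@ocp4.example.com   # overrides smtp_from
        smarthost: 'smtp.example.com:587'        # overrides smtp_smarthost
        auth_username: storage_alerts            # overrides smtp_auth_username
        auth_password: '<password>'              # overrides smtp_auth_password
        require_tls: true                        # overrides smtp_require_tls
EOF
```

Any setting that the receiver omits continues to use the corresponding global value.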

You can update the Alertmanager configuration by setting the data of the alertmanager-main secret in the openshift-monitoring namespace with the content of the alertmanager.yaml file.

[user@host ~]$ oc set data secret/alertmanager-main -n openshift-monitoring \
  --from-file alertmanager.yaml
secret/alertmanager-main data updated
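After you update the secret, one way to confirm that it holds the content that you intended is to decode the stored key and compare it with the local file. The helper function names in this sketch are hypothetical. The sketch assumes that you are logged in with the oc client and that the GNU base64 and diff utilities are available.

```shell
# Print the alertmanager.yaml content that is currently stored in the secret.
stored_alertmanager_config() {
  oc get secret alertmanager-main -n openshift-monitoring \
    -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d
}

# Compare the stored configuration with a local file.
# The exit status is 0 when both are identical, and nonzero otherwise.
config_matches_secret() {
  stored_alertmanager_config | diff -q - "$1" > /dev/null
}
```

For example, config_matches_secret alertmanager.yaml && echo "secret is up to date" prints the message only when the secret and the local file are identical.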

You can follow the configuration reload in the Alertmanager stateful set logs. A successful update generates the log message: Completed loading of configuration file.

[user@host ~]$ oc logs -f statefulset.apps/alertmanager-main -c alertmanager \
  -n openshift-monitoring
Found 2 pods, using pod/alertmanager-main-0
...output omitted...
ts=2024-01-31T01:02:03.064Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
ts=2024-01-31T01:02:03.128Z caller=coordinator.go:126 level=info component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml

An incorrect configuration generates an error message in the log. If you see configuration errors in the logs, then modify the alertmanager.yaml file and reapply your changes to the alertmanager-main secret in the openshift-monitoring namespace.

...output omitted...
ts=2024-01-31T02:03:04.256Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
ts=2024-01-31T02:03:04.512Z caller=coordinator.go:118 level=error component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config_out/alertmanager.env.yaml err="no global SMTP smarthost set"
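When the log is long, a small filter can pull out only the reason for each failed configuration load. The function name in this sketch is hypothetical, and the sketch assumes the log line format shown above, where the err field is the last field on the line.

```shell
# Read Alertmanager log lines on standard input and print the err="..." value
# of every line that reports a failed configuration load.
config_load_errors() {
  grep 'msg="Loading configuration file failed"' \
    | sed -n 's/.* err="\(.*\)"$/\1/p'
}
```

You could pipe the output of the oc logs command shown earlier (without the -f option) into config_load_errors. For the failed load in the previous listing, the function prints: no global SMTP smarthost set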

References

For more information about managing alerts in OpenShift, refer to the Managing Alerts chapter in the Red Hat OpenShift Container Platform 4.14 Monitoring documentation at https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/html-single/monitoring/index#managing-alerts

Querying Prometheus

AlertManager Configuration

Revision: do380-4.14-397a507