You can access and manage the monitoring features by using the web console.
The OpenShift Container Platform web console provides the Observe section, with the following subsections:
Cluster Monitoring Alerting
You access cluster alerts from the OpenShift web console at → .
For each alert, the Alerting page displays a brief description, the state, and the severity.
You can view alert details by clicking the name of the alert.
The Alert details page also displays a time-series graphic.
For more details about forwarding alerts to other systems, see the next section.
Cluster Monitoring Metrics
OpenShift integrates Prometheus metrics at → .
From the Metrics page, enter an expression, such as a metric name, and then click Run Queries to retrieve the most recent sample for the metric.
The following example displays the instance:node_cpu_utilisation:rate1m metric over time.
The metric contains data for each node instance in the cluster.
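The same kind of instant query can also be sent to the Prometheus HTTP API instead of the Metrics page. The following is a minimal sketch, not the procedure of the web console: it builds the URL for the /api/v1/query endpoint and parses a response body in the instant-vector format. The base URL and the sample response are hypothetical, and the authentication that a real cluster requires is omitted.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL; replace it with the query route of your own cluster.
BASE_URL = "https://prometheus.example.com"


def build_instant_query_url(base_url, expression):
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expression})}"


def latest_samples(response_body):
    """Map each instance label to the value of its most recent sample."""
    payload = json.loads(response_body)
    if payload["status"] != "success":
        raise ValueError("query failed")
    samples = {}
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # the API encodes the value as a string
        samples[series["metric"].get("instance", "")] = float(value)
    return samples


# Made-up response body, shaped like an instant-vector result.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "worker01"}, "value": [1700000000, "0.25"]},
            {"metric": {"instance": "worker02"}, "value": [1700000000, "0.5"]},
        ],
    },
})

url = build_instant_query_url(BASE_URL, "instance:node_cpu_utilisation:rate1m")
print(url)
print(latest_samples(SAMPLE_RESPONSE))
```

As in the Metrics page, the result contains one sample for each node instance.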
OpenShift has three monitoring stack components that gather the metrics: the kube-state-metrics and openshift-state-metrics agents, which generate metrics from Kubernetes and OpenShift API objects, and the node-exporter agent, which collects metrics from each node.
The dashboards in OpenShift cluster monitoring combine metrics from the three agents.
See the References section to learn about the complete list of exposed metrics.
Prometheus provides a query language, PromQL, to select and aggregate time-series data.
You can filter a metric to include only certain key/value pairs.
For example, you can modify the previous query to show only metrics for the worker02 node by using the following expression:
instance:node_cpu_utilisation:rate1m{instance="worker02"}
Prometheus Query Language provides several operators to compute new time-series metrics.
PromQL contains arithmetic operators, including addition, subtraction, multiplication, and division operators.
PromQL contains comparison operators, including equality, greater-than, and less-than operators.
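To make the label filter and the operators concrete, the following sketch imitates them in plain Python on a hypothetical set of samples. It is an illustration of the semantics under simplifying assumptions, not how Prometheus evaluates expressions; the sample values and the helper names select, multiply, and greater_than are made up.

```python
# Hypothetical samples of one metric at a single point in time.
samples = [
    {"labels": {"instance": "worker01"}, "value": 0.25},
    {"labels": {"instance": "worker02"}, "value": 0.5},
    {"labels": {"instance": "worker03"}, "value": 0.875},
]


def select(vector, **matchers):
    """Keep the samples whose labels equal every given key/value pair."""
    return [
        s for s in vector
        if all(s["labels"].get(key) == value for key, value in matchers.items())
    ]


def multiply(vector, scalar):
    """Arithmetic operator: apply the scalar to the value of every sample."""
    return [{"labels": s["labels"], "value": s["value"] * scalar} for s in vector]


def greater_than(vector, threshold):
    """Comparison operator: keep only the samples that pass the comparison."""
    return [s for s in vector if s["value"] > threshold]


only_worker02 = select(samples, instance="worker02")  # like metric{instance="worker02"}
percent = multiply(samples, 100)                      # like metric * 100
busy = greater_than(percent, 80)                      # like metric * 100 > 80

print(only_worker02)
print(busy)
```

Note that a comparison acts as a filter: samples that fail the comparison are dropped from the result rather than returned as false.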
PromQL contains built-in functions, including the following ones, that you can include in PromQL expressions:
- sum()
Adds the values of all sample entries at a given time.
- rate()
Computes the per-second average rate of increase of a time series over a given time range.
- count()
Counts the number of sample entries at a given time.
- max()
Selects the maximum value out of the sample entries.
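The following sketch shows what these functions compute, using plain Python on made-up data. The simple_rate helper is a deliberate simplification and an assumption of this example: the real rate() function also handles counter resets and extrapolates to the edges of the time range.

```python
# Hypothetical instant vector: one sample per node at a single point in time.
instant_vector = {"worker01": 0.25, "worker02": 0.5, "worker03": 0.75}

total = sum(instant_vector.values())    # like sum(metric)
entries = len(instant_vector)           # like count(metric)
highest = max(instant_vector.values())  # like max(metric)


def simple_rate(points):
    """Per-second average increase between the first and last point.

    Simplified: ignores counter resets and range-boundary extrapolation.
    """
    (first_time, first_value), (last_time, last_value) = points[0], points[-1]
    return (last_value - first_value) / (last_time - first_time)


# Hypothetical counter samples as (timestamp in seconds, counter value).
counter_points = [(0, 100.0), (30, 160.0), (60, 220.0)]
per_second = simple_rate(counter_points)  # roughly like rate(counter[1m])

print(total, entries, highest, per_second)
```

The aggregation functions work across the samples that exist at one moment, whereas rate() works along the history of each individual time series.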
The following examples of Prometheus Query Language expressions use metrics from the node-exporter agent and from the kube-state-metrics agent:
- node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 50
Shows nodes where less than 50% of the memory is available.
- kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1
Shows persistent volume claims in the pending state.
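The memory expression divides one metric by another, which pairs the samples of both metrics that carry the same labels. The following sketch imitates that behavior in plain Python. The node names and byte values are hypothetical, and each sample is identified only by its instance label to keep the example short.

```python
GIB = 2**30

# Hypothetical samples of the two node-exporter metrics, keyed by instance label.
mem_available = {"worker01": 6 * GIB, "worker02": 12 * GIB, "worker03": 4 * GIB}
mem_total = {"worker01": 16 * GIB, "worker02": 16 * GIB, "worker03": 8 * GIB}


def percent_available(available, total):
    """Divide samples that share an instance label and scale to a percentage."""
    return {
        instance: available[instance] / total[instance] * 100
        for instance in available
        if instance in total  # samples without a matching partner are dropped
    }


def below(vector, threshold):
    """Comparison filter: keep only the samples under the threshold."""
    return {instance: value for instance, value in vector.items() if value < threshold}


percentages = percent_available(mem_available, mem_total)
low_memory = below(percentages, 50)

print(percentages)
print(low_memory)
```

In this made-up data, worker03 sits at exactly 50% and is therefore not reported, because the comparison is strictly less-than.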
The Red Hat add-on operators can define extra metrics and alerts.
For example, the compliance operator exposes additional metrics to Prometheus.
You can get a list of exposed metrics by using the following query:
{__name__=~"compliance.*"}
Cluster Monitoring Dashboards
The OpenShift console integrates dashboards based on the gathered metrics at → .
These dashboards refresh periodically to display current summary metrics and graphs.
The graphs are interactive, so you can explore in more detail the data features and characteristics that you observe.
The cluster monitoring dashboards serve as a good starting point for near real-time observability of cluster metrics and health.
After receiving an alert, an administrator might use the dashboards to investigate the problem.
This investigation might include determining whether a specific node or project has a problem.
Additionally, cluster monitoring dashboards can help identify whether a problem was temporary or appears to be persistent.
OpenShift cluster monitoring includes several default dashboards.
Some of the default monitoring dashboards are as follows:
- Kubernetes / Compute Resources / Cluster
This dashboard displays a high-level view of cluster resources.
The Kubernetes / Compute Resources / Cluster dashboard page shows percentage values for CPU such as CPU Utilisation, CPU Requests Commitment, and CPU Limits Commitment.
Similar values are also available for memory.
You can see metrics by clicking Inspect for each parameter.
Clicking Inspect shows the Metrics page where you can see metrics and a related graph.
For example, clicking Inspect for CPU Utilisation shows a graph and values for the following metric:
cluster:node_cpu:ratio_rate5m{cluster=""}
The Kubernetes / Compute Resources / Cluster dashboard page also shows graphs for CPU, memory, and network, such as CPU Usage, CPU Quota, Memory Usage, and Memory Quota.
These graphs are common to some dashboard pages.
The only difference is how each dashboard filters the data.
For example, the Kubernetes / Compute Resources / Namespace (Workloads) dashboard filters resource usage, first by namespace and then by workload type, such as by deployment, daemon set, and stateful set.
- USE Method / Cluster
USE stands for Utilisation, Saturation, and Errors.
This dashboard displays several graphics to identify whether the cluster is overutilised, oversaturated, or experiencing many errors.
Because the dashboard displays all nodes in the cluster, you might be able to identify a node that is not behaving in the same way as the other nodes in the cluster.
The following graphic indicates that the worker03 node is experiencing higher memory saturation than other nodes in the cluster.
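The visual comparison between nodes can also be approximated in code. The following sketch flags nodes whose saturation is well above the median of all nodes. The values, the find_outliers helper, and the factor of 2 are assumptions chosen for illustration, not thresholds that the dashboard uses.

```python
from statistics import median

# Hypothetical memory saturation values, one per node.
memory_saturation = {"worker01": 0.10, "worker02": 0.12, "worker03": 0.45}


def find_outliers(values, factor=2.0):
    """Return the nodes whose value exceeds the median by the given factor."""
    typical = median(values.values())
    return sorted(node for node, value in values.items() if value > typical * factor)


outliers = find_outliers(memory_saturation)
print(outliers)
```

A median is used here instead of a mean so that a single misbehaving node does not shift the baseline it is compared against.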