OpenShift uses Red Hat Enterprise Linux CoreOS (RHCOS) as the underlying operating system in the hosts. The entire operating system is updated as a single image, instead of on a package-by-package basis. The machine configuration operator (MCO) manages the RHCOS operating system upgrades and configuration changes.
Do not use traditional Red Hat Enterprise Linux management approaches, such as manually editing files or running system commands like systemctl, for RHCOS operating system upgrades or configuration changes.
Such changes can conflict with the MCO, and the MCO can override them.
The MCO comprises the following pods:
machine-config-operator: These pods form the main operator workload that manages the rest of the MCO components.
machine-config-controller: The machine configuration controller (MCC) manages the synchronization of machine upgrades according to specified configurations through a machine configuration object.
The MCC offers options to upgrade individual sets of machines.
machine-config-server: The machine configuration server (MCS) provides instance customizations to machines that join the cluster.
The MCS uses Ignition to provide the instance customizations to cluster nodes that join the cluster.
The RHCOS operating system downloads and processes Ignition files at boot time.
Installing an OpenShift cluster and using Ignition files to configure the cluster nodes is explained in detail in the DO322: Red Hat OpenShift Installation Lab course.
The MCO also requires the machine-config-daemon systemd service that RHCOS provides.
The machine configuration daemon (MCD) implements updates to machine configurations and validates each machine's state in accordance with the requested configuration.
The MCO can manage the following resources:
Files in the /var or /etc directories.
The MCO can also manage directories, such as the /opt and /usr/local directories, which can be writeable if symbolically linked to the /var or /etc directories.
systemd services
SSH keys
Kernel arguments
The MCO defines two custom resources (CRs), which are part of the machineconfiguration.openshift.io/v1 API group.
MachineConfig: A machine configuration (MC) CR declares instance customizations by using the Ignition configuration format.
MachineConfigPool: A machine configuration pool (MCP) CR uses labels to match one or more MCs to one or more nodes by means of the machineConfigSelector and nodeSelector parameters, respectively.
This resource creates a pool of nodes with the same configuration.
The MCO uses the MCP to track status when the MCO applies MCs to the nodes.
OpenShift administrators set custom node configurations by declaring the MachineConfig and MachineConfigPool CRs.
The following diagram shows the relationship between the node, MC, and MCP labels:
In the previous diagram, the MCP CR uses the nodeSelector parameter to select all the nodes with the worker role, and applies to them the MCs with the machineconfiguration.openshift.io/role: worker label that is selected by using the machineConfigSelector parameter.
The MCO also manages two custom resources (CRs) for modifying CRI-O container runtime settings and the Kubelet service: the ContainerRuntimeConfig and KubeletConfig CRs, respectively.
MCs declare instance configurations by using the Ignition configuration format.
Ignition files encode file contents by using the Base64 encoding scheme.
Other items, such as systemd units or SSH keys, do not use the Base64 encoding scheme.
In a terminal, use the base64 and base64 -d commands to encode and decode files or standard input.
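As a minimal sketch of that workflow, the following script encodes a small text snippet, builds the data URL that an MC expects in the source field, and decodes the payload again as a check. The journald settings and the variable names are illustrative assumptions, not values from a real cluster.

```shell
#!/bin/sh
# Hypothetical journald settings; any text content works the same way.
content='[Journal]
Storage=persistent'

# Encode the content. tr removes the line wrapping that base64 can add,
# because the data URL must be a single line.
encoded=$(printf '%s\n' "$content" | base64 | tr -d '\n')

# Build the value for the "source" field of the MC.
source_url="data:text/plain;charset=utf-8;base64,${encoded}"
printf '%s\n' "$source_url"

# Decode the payload again to verify the round trip.
decoded=$(printf '%s' "$encoded" | base64 -d)
printf '%s\n' "$decoded"
```

The same base64 -d call is useful for inspecting the file contents of an existing MC.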
You can also use Butane to create Ignition files and MCs. Butane simplifies handling Base64 encoding.
For more information about how to use Butane to create MCs, refer to https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/html-single/installing/index#installation-special-config-butane_installing-customizing
The following example shows an MC to modify the configuration file for the journald service by using the Base64 encoding scheme:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 60-journald
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,VGVzdGl...E8gKItAo=
        filesystem: root
        mode: 0644
        path: /etc/systemd/journald.conf
labels: The label for the MC. An MCP selects the MC by matching this label with its machineConfigSelector parameter.
name: Prefix the name with a two-digit number that specifies when to apply the configuration, relative to MCs that belong to the same MCP. Higher numbers have precedence.
source: Use the data URL format to embed escaped file content. Base64 encoding is common for Ignition files.
The following example shows an MC to change the default kernel to a real-time kernel. This configuration does not use the Base64 encoding scheme:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-kernel-realtime
spec:
  kernelType: realtime
You can use the oc explain mc command to read the parameters that you can change by using an MC.
You can list the MCs on your cluster by using the oc get machineconfig or oc get mc commands.
To list the MCs for a specific node label, use the --selector argument.
The following example lists the MCs for the worker role in the cluster:
[user@host ~]$ oc get machineconfig \
  --selector machineconfiguration.openshift.io/role=worker
NAME                             GENERATEDBYCONTROLLER   IGNITIONVERSION   AGE
00-worker                        52fe...bf97             3.2.0             108d
01-worker-container-runtime      52fe...bf97             3.2.0             108d
01-worker-kubelet                52fe...bf97             3.2.0             108d
99-worker-chrony-conf-override                           3.2.0             108d
99-worker-generated-registries   52fe...bf97             3.2.0             108d
99-worker-ssh                                            3.2.0             108d
The MCO reads MCs alphanumerically by name, from 00-* to 99-*.
OpenShift stores the resulting compilation of MCs in a rendered MC resource.
Thus, if two or more MCs apply changes to the same file, then the MCO applies only the changes from the MC with a higher number.
For two MCs with the same number, the MCO uses the last one alphabetically.
For example, the 99-worker-ssh-new MC has precedence over the 99-worker-ssh-last MC.
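The ordering rule can be approximated in a terminal with a plain sort. This sketch is only an illustration of the rule as described here, not the MCO's own implementation; the list of names is taken from the examples in this section.

```shell
#!/bin/sh
# MC names from this section; 99-worker-ssh-last and 99-worker-ssh-new
# share the number 99 and are assumed to change the same file.
names='99-worker-ssh-new
00-worker
99-worker-ssh-last
01-worker-kubelet'

# LC_ALL=C gives a byte-order sort that does not depend on the locale.
ordered=$(printf '%s\n' "$names" | LC_ALL=C sort)
printf '%s\n' "$ordered"

# The last name in the sorted list is the MC whose changes take effect
# when several MCs modify the same file.
winner=$(printf '%s\n' "$ordered" | tail -n 1)
printf 'Takes precedence: %s\n' "$winner"
```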
MCPs use labels to match one or more MCs to one or more nodes. By using multiple MCs, you can split the node configuration and focus every MC on one aspect of server configuration. For example, you can configure one MC for DNS resolution and another MC to synchronize time.
You can also use the same MCs in multiple MCPs. For example, although a time synchronization MC would apply to multiple MCPs, only the MCPs for a specific node pool would include the configuration that a special hardware accelerator card requires.
MCPs specify a machineConfigSelector MC label to select one or more MCs, and a nodeSelector node label to select one or more nodes.
By default, OpenShift includes MCPs for the master and worker roles.
However, you can add custom MCPs to the cluster.
Red Hat recommends creating custom MCPs as a composition of worker and custom MCs that are applied to the nodes.
Assigning the worker role to the custom MCP is critical so that OpenShift applies operating system updates that are labeled as worker to the machines in the pool.
Thus, the machineConfigSelector match expression selects both the worker role and the custom label.
The nodes that are part of the custom pool use the MCs from the worker role with the additions from the custom label.
In the previous diagram, the custom MCP targets two nodes with the custom label, and the worker MCP targets three nodes with the worker label.
The worker pool applies to the worker nodes the 00-worker, 01-worker-kubelet, and 99-worker-ssh MCs.
The custom pool uses the MCs from the worker role, and adds the 99-custom-ntp MC to the configuration of the custom nodes.
Nodes with the worker role can be part of only one custom pool, and nodes with the master role cannot be part of a custom pool.
If a worker node matches more than one custom pool, or a master node matches a custom pool, then the MCO does not apply any changes that are specific to the custom pools, and shows an error in the MCC pod logs.
The following MCP specification demonstrates creating a separate pool for custom nodes:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: custom
spec:
  machineConfigSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/role
      operator: In
      values: [worker, custom]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/custom: ""
name: The name for the custom MCP.
machineConfigSelector: The MC selector includes both the worker and custom values of the machineconfiguration.openshift.io/role label.
nodeSelector: The node selector includes the nodes in the cluster with the node-role.kubernetes.io/custom label.
You can find more information about labels and selectors in https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
You can list the MCPs in your cluster by using the oc get machineconfigpool or oc get mcp commands.
[user@host ~]$ oc get machineconfigpool
NAME     CONFIG                      UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-716...267   True      False      False      3              3                   3                     3                      108d
worker   rendered-worker-573...3db   True      False      False      3              3                   3                     3                      108d
The previous command shows the following information for the MCPs:
CONFIG: The name for the rendered MC that is applied to the nodes in the MCP.
During the first update, this field remains blank.
UPDATED: The True status indicates that the MCO applied the current MC to the nodes in that MCP.
The False status indicates that nodes in the MCP are updating.
UPDATING: The True status indicates that the MCO is applying the intended MC.
Nodes in the UPDATING state might not be available for scheduling.
The False status indicates that all nodes in the MCP are updated.
DEGRADED: A True status indicates that the MCO is blocked from applying the current or intended MC to at least one node in that MCP.
A possible reason is a detection of configuration drift.
Configuration drift is explained later in this section.
Nodes that are degraded might not be available for scheduling.
A False status indicates that all nodes in the MCP are ready.
MACHINECOUNT: Indicates the total number of machines in the MCP.
READYMACHINECOUNT: Indicates the total number of machines in the MCP that are ready for scheduling.
UPDATEDMACHINECOUNT: Indicates the total number of machines in the MCP with the current MC.
DEGRADEDMACHINECOUNT: Indicates the total number of machines in that MCP that are degraded or irreconcilable.
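In a script, one way to act on these status fields is to block until a pool reports the Updated condition. The following sketch wraps the oc wait command in a function; the function name and the 30-minute default timeout are assumptions for illustration, and the function needs a logged-in oc client.

```shell
#!/bin/sh
# Wait until an MCP reports the Updated condition.
# $1: pool name, $2: optional timeout (assumed default: 30m).
wait_for_pool() {
  oc wait "machineconfigpool/$1" --for=condition=Updated --timeout="${2:-30m}"
}

# Usage: wait_for_pool worker
#        wait_for_pool custom 10m
```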
You can add labels to your node by using the oc label command.
The following example adds the custom role to the worker03 node in the cluster.
[user@host ~]$ oc label node/worker03 node-role.kubernetes.io/custom=
node/worker03 labeled
You can verify the node roles in the cluster by using the oc get nodes command:
[user@host ~]$ oc get nodes
NAME       STATUS   ROLES                  AGE    VERSION
master01   Ready    control-plane,master   114d   v1.25.7+eab9cc9
master02   Ready    control-plane,master   114d   v1.25.7+eab9cc9
master03   Ready    control-plane,master   114d   v1.25.7+eab9cc9
worker01   Ready    worker                 12d    v1.25.7+eab9cc9
worker02   Ready    worker                 12d    v1.25.7+eab9cc9
worker03   Ready    custom,worker          12d    v1.25.7+eab9cc9
One important node label is the infra role.
Use the infra role for nodes that host only infrastructure components, such as cluster logging, cluster monitoring, or the integrated container image registry.
Adding the infra role for nodes is recommended for larger clusters to ensure the performance and stability of OpenShift cluster services, such as the router or OAuth services, or to prevent the impact of heavy infrastructure components, such as metrics and logging, on user workloads.
Infrastructure nodes do not count towards the total number of required OpenShift subscriptions to run the environment. You can create a custom MCP to apply MCs to the infrastructure nodes.
For more information about creating infrastructure nodes, refer to https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/html-single/nodes/index#nodes-nodes-creating-infrastructure-nodes
For more information about using infrastructure nodes to separate maintenance and management, and to prevent incurring billing costs against subscription counts, refer to https://access.redhat.com/solutions/5034771
The MCD reports the state of the node updates by using node annotations. You can use these annotations to assess the state of the update.
You can list the node annotations by using the oc describe command:
[user@host ~]$ oc describe node worker01
Name:               worker01
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    ...output omitted...
                    node-role.kubernetes.io/worker=
                    node.openshift.io/os_id=rhcos
Annotations:        machineconfiguration.openshift.io/currentConfig: rendered-worker-370...bfd
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-370...bfd
                    machineconfiguration.openshift.io/reason:
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
...output omitted...
currentConfig: The current rendered MC that is applied to the node.
desiredConfig: The intended rendered MC to be applied to the node.
state: The current node state regarding the MCO.
When the intended configuration does not match the current configuration, the MCD drains all the pods from the node, applies the intended rendered MC, and reboots the node.
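To compare the two annotations without reading the full oc describe output, a small helper can query them directly. The following is a sketch under the assumption of a logged-in oc client; the function names are hypothetical, and the JSONPath expression escapes the dots in the annotation key.

```shell
#!/bin/sh
# Print one MCO annotation of a node.
# $1: node name, $2: annotation suffix (currentConfig, desiredConfig, state).
mco_annotation() {
  oc get node "$1" \
    -o "jsonpath={.metadata.annotations.machineconfiguration\.openshift\.io/$2}"
}

# Report whether the node already runs its intended rendered MC.
node_update_status() {
  current=$(mco_annotation "$1" currentConfig)
  desired=$(mco_annotation "$1" desiredConfig)
  if [ "$current" = "$desired" ]; then
    echo "$1: up to date ($current)"
  else
    echo "$1: update pending ($current -> $desired)"
  fi
}

# Usage: node_update_status worker01
```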
A configuration drift is the state where the configuration on a node does not fully match what the currently applied rendered MC specifies.
The MCD checks for configuration drifts when a node boots, or when any of the specified files in the MC are modified outside the MC, or before a new MC is applied.
When the MCD detects a configuration drift, the MCD performs the following tasks:
Logs an error message to the console.
Generates a Kubernetes event.
Stops additional drift detection on the affected node.
Marks both the node and the MCP with the degraded state.
The MCO marks the node in the degraded state until an administrator corrects the node configuration. Although a degraded node is online and operational, you cannot update it.
You can correct configuration drift and return the node to the Ready state with one of the following remediations:
Generate a force file on the degraded node to bypass the configuration drift detection and reapply the current MC.
To generate the force file, create a debug pod on the node with the degraded state and create the /run/machine-config-daemon-force file.
Then, OpenShift skips the MC validation, restarts the node, and applies the current MC to the node.
The force file does not force the node upgrade; it instead skips validation of configurations on the system and attempts an update regardless of the difference.
Depending on the issue on your node, skipping the validation process might not help you move past your node error.
Rewrite the file contents or change the file permissions of the files on the node to match the MC configuration. This manual procedure requires you to review the logs and manually fix the conflicting file. This remediation does not require rebooting the node in a degraded state, and thus avoids possible downtime in your applications.
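The force file remediation can be sketched as a single oc debug invocation that creates the file from a debug pod. The function name is hypothetical, and the command assumes a logged-in oc client with sufficient privileges to start debug pods on nodes.

```shell
#!/bin/sh
# Create the force file on a degraded node through a debug pod.
# $1: name of the degraded node.
create_force_file() {
  oc debug "node/$1" -- chroot /host touch /run/machine-config-daemon-force
}

# Usage: create_force_file worker01
```

Because the node restarts after the force file is created, schedule this remediation for a time when a reboot of the node is acceptable.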
For information about configuration drift, refer to the Status field for the pool with the degraded node:
[user@host ~]$ oc describe mcp worker
...output omitted...
Status:
  Conditions:
...output omitted...
    Last Transition Time:  2023-10-02T10:11:37Z
    Message:               Node worker01 is reporting: "content mismatch for file \"/etc/containers/registries.conf\""
    Reason:                1 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded
    Last Transition Time:  2023-10-02T10:11:37Z
    Message:
    Reason:
    Status:                True
    Type:                  Degraded
...output omitted...
In the previous example, the MCO detects a configuration drift for the /etc/containers/registries.conf file.
You can also review the logs for the MCD, which give you more information about the configuration drift:
[user@host ~]$ oc get pod -n openshift-machine-config-operator \
  --field-selector spec.nodeName=worker01
NAME                          READY   STATUS    RESTARTS   AGE
machine-config-daemon-jsrzm   2/2     Running   2          19d
[user@host ~]$ oc logs machine-config-daemon-jsrzm \
  -n openshift-machine-config-operator
...output omitted...
E1002 10:11:36.163676    2667 daemon.go:589] Preflight config drift check failed: content mismatch for file "/etc/containers/registries.conf"
E1002 10:11:36.163692    2667 writer.go:200] Marking Degraded due to: content mismatch for file "/etc/containers/registries.conf"
W1002 10:11:40.193224    2667 daemon.go:1763] current+desiredConfig is rendered-worker-d4f45006b2d83725d98af944c8296774 but state is Degraded
I1002 10:11:40.510208    2667 rpm-ostree.go:394] Running captured: rpm-ostree kargs
E1002 10:11:40.568388    2667 on_disk_validation.go:207] content mismatch for file "/etc/containers/registries.conf" (-want +got):
  []uint8(
    """
-   unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]
+   unqualified-search-registries = ["docker.io"]
    short-name-mode = ""
    ... // 107 identical lines
    """
  )
E1002 10:11:40.568446    2667 daemon.go:589] Preflight config drift check failed: content mismatch for file "/etc/containers/registries.conf"
E1002 10:11:40.568467    2667 writer.go:200] Marking Degraded due to: content mismatch for file "/etc/containers/registries.conf"
E1002 10:11:46.987190    2667 daemon.go:1077] mode mismatch for file: "/etc/containers/registries.conf"; expected: -rw-r--r--/420/0644; received: -rwxrwxrwx/511/0777
E1002 10:11:46.987222    2667 writer.go:200] Marking Degraded due to: mode mismatch for file: "/etc/containers/registries.conf"; expected: -rw-r--r--/420/0644; received: -rwxrwxrwx/511/0777
...output omitted...
In the previous example, the MCD marks the worker01 node in a degraded state due to the mismatches of content and permissions for the file.
The MCO provides the ContainerRuntimeConfig CR to modify CRI-O container runtime settings, and the KubeletConfig CR to manage the Kubelet service.
You can use these CRs to configure a subset of CRI-O and Kubelet configuration parameters.
Always use valid values for the configuration parameter, because invalid values might render cluster nodes unusable.
Although you can modify the CRI-O container runtime settings and the Kubelet service by using the MachineConfig CR, using either of these two special CRs simplifies node deployment and configuration management, provides API checking, and prevents misconfigurations.
Moreover, because OpenShift does not support changing all the settings of the Kubelet service and the container runtime, these CRs provide only the configuration changes that OpenShift supports.
OpenShift adds a kubelet configuration controller to the MCC.
You can use the KubeletConfig CR to edit the kubelet parameters.
The MCO can write the kubelet.conf configuration file and the kubelet.service systemd unit file to Ignition, so Ignition writes these two files to configure the kubelet agent when it starts on a node.
If you create an MC to change the kubelet parameters, then the MCD reboots the nodes to write the new configuration.
With this approach, OpenShift can restore the default kubelet configuration if you delete the KubeletConfig instance.
For a list of all the parameters that you can modify by using the KubeletConfig CR, you can refer to the KubeletConfiguration API object in Kubernetes that uses the same parameters.
Refer to https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/#kubelet-config-k8s-io-v1beta1-KubeletConfiguration
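As an illustration, the following sketch writes a KubeletConfig manifest that raises the maxPods kubelet parameter for one pool. The resource name, the custom-kubelet label, and the value 500 are assumptions chosen for the example; the machineConfigPoolSelector parameter selects the MCPs that carry the matching label.

```shell
#!/bin/sh
# Write a KubeletConfig manifest. The name, the custom-kubelet label,
# and the maxPods value are illustrative assumptions.
cat > kubelet-max-pods.yaml <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: set-max-pods
  kubeletConfig:
    maxPods: 500
EOF

# On a cluster, label the target pool and then create the resource:
#   oc label machineconfigpool worker custom-kubelet=set-max-pods
#   oc create -f kubelet-max-pods.yaml
```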
You can change the settings for the OpenShift CRI-O runtime by using the ContainerRuntimeConfig CR.
The MCO can write the crio.conf and storage.conf configuration files on the associated nodes with the updated values.
You can modify the following parameters by using a ContainerRuntimeConfig CR:
Logging level: The logLevel parameter sets the level of verbosity for logging messages.
The default level is info.
Other options include the fatal, panic, error, warn, debug, and trace options.
Overlay size: The overlaySize parameter sets the maximum size of a container image.
Container runtime: The defaultRuntime parameter sets the container runtime to either runc (the default) or crun.
Support for the crun container runtime is a Technology Preview feature only.
Technology Preview features are not supported with Red Hat production service level agreements and might not be functionally complete.
Red Hat does not recommend using Technology Preview features in production.
You can also use the ContainerRuntimeConfig CR to change the limit of PIDs or the maximum logging size.
However, Red Hat recommends using the KubeletConfig CR to change these parameters, because the ContainerRuntimeConfig versions of them will likely be deprecated in a future version.
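A ContainerRuntimeConfig CR that sets the logging level and the overlay size might look like the following sketch. The resource name and the values are illustrative assumptions; the selector label shown here is assumed to be the one that the default worker MCP carries, so verify the labels of the pool before you apply the resource.

```shell
#!/bin/sh
# Write a ContainerRuntimeConfig manifest. The name and the logLevel and
# overlaySize values are illustrative assumptions.
cat > crio-debug-overlay.yaml <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: crio-debug-overlay
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  containerRuntimeConfig:
    logLevel: debug
    overlaySize: 8G
EOF

# On a cluster, check the pool labels and then create the resource:
#   oc get machineconfigpool worker --show-labels
#   oc create -f crio-debug-overlay.yaml
```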
For more information about the Machine Configuration Operator, refer to the Post-installation Machine Configuration Tasks section in the Red Hat OpenShift Container Platform 4.14 Post-installation Configuration documentation at https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/html-single/postinstallation_configuration/post-install-machine-configuration-tasks
For more information about how the Machine Configuration Operator works, refer to https://learn.spidernet.pl/en/blog/openshift-container-platform-4-how-does-machine-config-pool-work