DO380 - ch03s03

Node Configuration with the Machine Configuration Operator

Objectives

Apply operating system settings to cluster nodes with the machine configuration operator.

The Machine Configuration Operator

OpenShift uses Red Hat Enterprise Linux CoreOS (RHCOS) as the underlying operating system in the hosts. The entire operating system is updated as a single image, instead of on a package-by-package basis. The machine configuration operator (MCO) manages the RHCOS operating system upgrades and configuration changes.

Do not use traditional Red Hat Enterprise Linux management approaches, such as manually editing files or using system commands such as the systemctl command, for RHCOS operating system upgrades or configuration changes. Those changes can conflict with the MCO and the MCO can override them.

The MCO comprises the following pods:

machine-config-operator: These pods form the main operator workload that manages the rest of the MCO components.
machine-config-controller: The machine configuration controller (MCC) manages the synchronization of machine upgrades according to specified configurations through a machine configuration object. The MCC offers options to upgrade individual sets of machines.
machine-config-server: The machine configuration server (MCS) provides instance customizations to machines that join the cluster. The MCS uses Ignition to deliver these customizations, and the RHCOS operating system downloads and processes the Ignition files at boot time.

Note

Installing an OpenShift cluster and using Ignition files to configure the cluster nodes is explained in detail in the DO322: Red Hat OpenShift Installation Lab course.

The MCO also requires the machine-config-daemon systemd service that RHCOS provides. The machine configuration daemon (MCD) implements updates to machine configurations and validates each machine's state in accordance with the requested configuration.

The MCO can manage the following resources:

Files in the /var or /etc directories. The MCO can also manage directories, such as the /opt and /usr/local directories, which can be writeable if symbolically linked to the /var or /etc directories.
systemd services
SSH keys
Kernel arguments

The MCO defines two custom resources (CRs), which are part of the machineconfiguration.openshift.io API group, version v1.

MachineConfig: A machine configuration (MC) CR declares instance customizations by using the Ignition configuration format.
MachineConfigPool: A machine configuration pool (MCP) CR uses labels to match one or more MCs to one or more nodes by means of the machineConfigSelector and nodeSelector parameters, respectively. This resource creates a pool of nodes with the same configuration. The MCO uses the MCP to track status when the MCO applies MCs to the nodes.

OpenShift administrators set custom node configurations by declaring the MachineConfig and MachineConfigPool CRs.

The following diagram shows the relationship between the node, MC, and MCP labels:

Figure 3.4: Relationship between the node, MC, and MCP labels

In the previous diagram, the MCP CR uses the nodeSelector parameter to select all the nodes with the worker role, and applies to them the MCs with the machineconfiguration.openshift.io/role: worker label that is selected by using the machineConfigSelector parameter.

The MCO also manages two custom resources (CRs) for modifying CRI-O container runtime settings and the Kubelet service: the ContainerRuntimeConfig and KubeletConfig CRs, respectively.

Machine Configurations

MCs declare instance configurations by using the Ignition configuration format.

Ignition files encode file contents by using the Base64 encoding scheme. Other items, such as systemd units or SSH keys, do not use the Base64 encoding scheme. In a terminal, use the base64 and base64 -d commands to encode and decode files or standard input.
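As a minimal sketch of the encoding step, the following commands build a data URL from a small file and then decode it again to confirm the round trip. The journald settings and the temporary file names are assumptions for illustration, and the -w0 option, which disables line wrapping, assumes the GNU coreutils version of the base64 command.

```shell
# Hypothetical file content; these journald settings are for illustration only.
workdir=$(mktemp -d)
printf '[Journal]\nStorage=persistent\n' > "$workdir/journald.conf"

# Encode the file on a single line (-w0 is a GNU coreutils option).
encoded=$(base64 -w0 "$workdir/journald.conf")

# Build the data URL in the form that the source field of an MC uses.
source_url="data:text/plain;charset=utf-8;base64,${encoded}"
echo "$source_url"

# Decode the payload and compare it with the original file.
printf '%s' "$encoded" | base64 -d > "$workdir/decoded.conf"
cmp "$workdir/journald.conf" "$workdir/decoded.conf" && echo "round trip OK"
```

The resulting string is the kind of value that goes into the source field of a file entry in an MC.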

You can also use Butane to create Ignition files and MCs. Butane simplifies handling Base64 encoding.

Note

For more information about how to use Butane to create MCs, refer to https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/html-single/installing/index#installation-special-config-butane_installing-customizing

The following example shows an MC to modify the configuration file for the journald service by using the Base64 encoding scheme:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker 
  name: 60-journald 
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,VGVzdGl...E8gKItAo=
        mode: 0644
        path: /etc/systemd/journald.conf

	`labels`: The role label for the MC, which uses the `machineconfiguration.openshift.io/role` key. By default, OpenShift comes with preinstalled MCs for control plane and compute nodes.
	`name`: Prefix the name with a two-digit number that specifies when to apply the configuration, relative to MCs that belong to the same MCP. Higher numbers have precedence.
	`source`: Use the data URL format to embed escaped file content. Base64 encoding is common for Ignition files.

The following example shows an MC to change the default kernel to a real-time kernel. This configuration does not use the Base64 encoding scheme:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-kernel-realtime
spec:
  kernelType: realtime

You can use the oc explain mc command to read the parameters that you can change by using an MC.

You can list the MCs on your cluster by using the oc get machineconfig or oc get mc commands. To list the MCs for a specific role, use the --selector argument with the MC role label. The following example lists the MCs for the worker role in the cluster:

[user@host ~]$ oc get machineconfig \
  --selector machineconfiguration.openshift.io/role=worker
NAME                            GENERATEDBYCONTROLLER  IGNITIONVERSION  AGE
00-worker                       52fe...bf97            3.2.0            108d
01-worker-container-runtime     52fe...bf97            3.2.0            108d
01-worker-kubelet               52fe...bf97            3.2.0            108d
99-worker-chrony-conf-override                         3.2.0            108d
99-worker-generated-registries  52fe...bf97            3.2.0            108d
99-worker-ssh                                          3.2.0            108d

The MCO reads MCs in alphanumeric order by name, from 00-* to 99-*. OpenShift stores the resulting compilation of MCs in a rendered MC resource. Thus, if two or more MCs apply changes to the same file, then the MCO applies only the changes from the MC with the highest number. For two MCs with the same number, the MCO uses the one that sorts last alphabetically. For example, the 99-worker-ssh-new MC has precedence over the 99-worker-ssh-last MC.
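The following sketch approximates this name ordering with a bytewise sort. It does not call the MCO, and the list of names is a hypothetical set of MCs that are assumed to manage the same file.

```shell
# Hypothetical set of MC names that all manage the same file.
names='99-worker-ssh-new
00-worker
99-worker-ssh-last
99-worker-ssh'

# Sort bytewise (LC_ALL=C) so that the result does not depend on the locale.
ordered=$(printf '%s\n' "$names" | LC_ALL=C sort)
echo "$ordered"

# The name that sorts last takes precedence for a file that several MCs manage.
winner=$(printf '%s\n' "$ordered" | tail -n 1)
echo "winner: $winner"   # prints "winner: 99-worker-ssh-new"
```

Because the letter l sorts before the letter n, the 99-worker-ssh-last name sorts before the 99-worker-ssh-new name, which matches the precedence in the example.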

Machine Configuration Pools

MCPs use labels to match one or more MCs to one or more nodes. By using multiple MCs, you can split the node configuration and focus every MC on one aspect of server configuration. For example, you can configure one MC for DNS resolution and another MC to synchronize time.

You can also use the same MCs in multiple MCPs. For example, although a time synchronization MC would apply to multiple MCPs, only the MCPs for a specific node pool would include the configuration that a special hardware accelerator card requires.

MCPs specify a machineConfigSelector MC label to select one or more MCs, and a nodeSelector node label to select one or more nodes.

By default, OpenShift includes MCPs for the master and worker roles. However, you can add custom MCPs to the cluster. Red Hat recommends creating custom MCPs as a composition of worker and custom MCs that are applied to the nodes. Assigning the worker role to the custom MCP is critical so that OpenShift applies operating system updates that are labeled as worker to the machines in the pool. Thus, the machineConfigSelector match expression selects both the worker role and the custom label. The nodes that are part of the custom pool use the MCs from the worker role with the additions from the custom label.

Figure 3.5: Custom pool with the worker MCs and an additional MC

In the previous diagram, the custom MCP targets two nodes with the custom label, and the worker MCP targets three nodes with the worker label. The worker pool applies to the worker nodes the 00-worker, 01-worker-kubelet, and 99-worker-ssh MCs. The custom pool uses the MCs from the worker role, and adds the 99-custom-ntp MC to the configuration of the custom nodes.

Important

A node with the worker role can be part of only one custom pool, and a node with the master role cannot be part of any custom pool. If a worker node matches more than one custom pool, or if a control plane node matches a custom pool, then the MCO does not apply any changes that are specific to the custom pools, and shows an error in the MCC pod logs.

The following MCP specification demonstrates creating a separate pool for custom nodes:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
    name: custom 
spec:
    machineConfigSelector:
        matchExpressions:
            - key: machineconfiguration.openshift.io/role
              operator: In
              values: [worker, custom] 
    nodeSelector:
        matchLabels:
            node-role.kubernetes.io/custom: ""

	`name`: The name for the custom MCP.
	`machineConfigSelector`: The MC selector includes both the `worker` and `custom` MCs.
	`nodeSelector`: The node selector includes the nodes in the cluster with the `custom` role.
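As a sketch of an MC that such a pool would select in addition to the worker MCs, the following commands write a manifest that carries the custom role label. The 99-custom-kargs name, the file name, and the loglevel=7 kernel argument are hypothetical values chosen only for illustration.

```shell
# Write a hypothetical MC that only pools selecting the custom role pick up.
cat > 99-custom-kargs.yaml <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    # Matches the custom value in the machineConfigSelector of the pool.
    machineconfiguration.openshift.io/role: custom
  name: 99-custom-kargs
spec:
  kernelArguments:
    - loglevel=7   # illustrative kernel argument, an assumption
EOF
```

You could then create the resource with the oc apply -f 99-custom-kargs.yaml command. Nodes in the worker pool are not affected, because the worker MCP selects only MCs with the worker role label.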

Note

You can find more information about labels and selectors in https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/

You can list the MCPs in your cluster by using the oc get machineconfigpool or oc get mcp commands.

[user@host ~]$ oc get machineconfigpool
NAME    CONFIG                     UPDATED  UPDATING  DEGRADED  MACHINECOUNT  READYMACHINECOUNT  UPDATEDMACHINECOUNT  DEGRADEDMACHINECOUNT  AGE
master  rendered-master-716...267  True     False     False     3             3                  3                    0                     108d
worker  rendered-worker-573...3db  True     False     False     3             3                  3                    0                     108d

The previous command shows the following information for the MCPs:

CONFIG: The name for the rendered MC that is applied to the nodes in the MCP. During the first update, this field remains blank.
UPDATED: The True status indicates that the MCO applied the current MC to the nodes in that MCP. The False status indicates that nodes in the MCP are updating.
UPDATING: The True status indicates that the MCO is applying the intended MC. Nodes in the UPDATING state might not be available for scheduling. The False status indicates that all nodes in the MCP are updated.
DEGRADED: A True status indicates that the MCO is blocked from applying the current or intended MC to at least one node in that MCP. A possible reason is a detection of configuration drift. Configuration drift is explained later in this section. Nodes that are degraded might not be available for scheduling. A False status indicates that all nodes in the MCP are ready.
MACHINECOUNT: Indicates the total number of machines in the MCP.
READYMACHINECOUNT: Indicates the total number of machines in the MCP that are ready for scheduling.
UPDATEDMACHINECOUNT: Indicates the total number of machines in the MCP with the current MC.
DEGRADEDMACHINECOUNT: Indicates the total number of machines in that MCP that are degraded or irreconcilable.

Label Nodes

You can add labels to your node by using the oc label command.

The following example adds the custom role to the worker03 node in the cluster.

[user@host ~]$ oc label node/worker03 node-role.kubernetes.io/custom=
node/worker03 labeled

You can verify the node roles in the cluster by using the oc get nodes command:

[user@host ~]$ oc get nodes
NAME      STATUS  ROLES                 AGE   VERSION
master01  Ready   control-plane,master  114d  v1.25.7+eab9cc9
master02  Ready   control-plane,master  114d  v1.25.7+eab9cc9
master03  Ready   control-plane,master  114d  v1.25.7+eab9cc9
worker01  Ready   worker                12d   v1.25.7+eab9cc9
worker02  Ready   worker                12d   v1.25.7+eab9cc9
worker03  Ready   custom,worker         12d   v1.25.7+eab9cc9

Infrastructure Nodes

One important node label is the infra role. Use the infra role for nodes that host only infrastructure components, such as cluster logging, cluster monitoring, or the integrated container image registry. Adding the infra role for nodes is recommended for larger clusters to ensure the performance and stability of OpenShift cluster services, such as the router or OAuth services, or to prevent the impact of heavy infrastructure components, such as metrics and logging, on user workloads.

Infrastructure nodes do not count towards the total number of required OpenShift subscriptions to run the environment. You can create a custom MCP to apply MCs to the infrastructure nodes.
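A custom MCP for infrastructure nodes can follow the same pattern as the custom pool earlier in this section. The following sketch writes such a manifest; the infra pool name and the file name are assumptions, and the pool selects nodes that carry the node-role.kubernetes.io/infra label.

```shell
# Write a hypothetical MCP for nodes with the infra role.
cat > infra-mcp.yaml <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      # Keep the worker MCs so that the pool still receives worker updates.
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, infra]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""
EOF
```

After you label the nodes with the infra role, you could create the pool with the oc apply -f infra-mcp.yaml command.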

Note

For more information about creating infrastructure nodes, refer to https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/html-single/nodes/index#nodes-nodes-creating-infrastructure-nodes

For more information about using infrastructure nodes to separate maintenance and management, and to prevent incurring billing costs against subscription counts, refer to https://access.redhat.com/solutions/5034771

Machine Configuration Operator Updates

The MCD reports the state of the node updates by using node annotations. You can use these annotations to assess the state of the update.

You can list the node annotations by using the oc describe command:

[user@host ~]$ oc describe node worker01
Name:        worker01
Roles:       worker
Labels:      beta.kubernetes.io/arch=amd64
...output omitted...
             node-role.kubernetes.io/worker=
             node.openshift.io/os_id=rhcos
Annotations: machineconfiguration.openshift.io/currentConfig: rendered-worker-370...bfd 
             machineconfiguration.openshift.io/desiredConfig: rendered-worker-370...bfd 
             machineconfiguration.openshift.io/reason:
             machineconfiguration.openshift.io/state: Done 
             volumes.kubernetes.io/controller-managed-attach-detach: true
...output omitted...

	`currentConfig`: The current rendered MC that is applied to the node.
	`desiredConfig`: The intended rendered MC to be applied to the node.
	`state`: The current state of the node with regard to the MCO.

When the intended configuration does not match the current configuration, the MCD drains all the pods from the node, applies the intended rendered MC, and reboots the node.
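To read a single annotation instead of the full oc describe output, you can use a JSONPath expression. The following helper functions are a sketch; the node_mco_annotation and node_update_pending function names are hypothetical, and the functions assume that you are logged in to the cluster with the oc command.

```shell
# Print one MCO annotation of a node.
# $1: node name; $2: annotation suffix (state, currentConfig, or desiredConfig).
node_mco_annotation() {
  oc get node "$1" \
    -o "jsonpath={.metadata.annotations.machineconfiguration\.openshift\.io/$2}"
}

# Succeed when the node still has to move to a different rendered MC.
node_update_pending() {
  mco_current=$(node_mco_annotation "$1" currentConfig)
  mco_desired=$(node_mco_annotation "$1" desiredConfig)
  [ "$mco_current" != "$mco_desired" ]
}
```

For example, the node_mco_annotation worker01 state call prints the value of the state annotation for the worker01 node, and the node_update_pending worker01 call returns success while the current and intended rendered MCs differ.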

Configuration Drift

Configuration drift is the state in which the configuration on a node does not fully match what the currently applied rendered MC specifies.

The MCD checks for configuration drift when a node boots, when any of the files that the MC specifies are modified outside the MC, and before a new MC is applied.

When the MCD detects a configuration drift, the MCD performs the following tasks:

Logs an error message to the console.
Generates a Kubernetes event.
Stops additional drift detection on the affected node.
Marks both the node and the MCP with the degraded state.

The MCO marks the node in the degraded state until an administrator corrects the node configuration. Although a degraded node is online and operational, you cannot update it.

You can correct configuration drift and return the node to the Ready state with one of the following remediations:

Generate a force file on the degraded node to bypass the configuration drift detection and reapply the current MC. To generate the force file, create a debug pod on the node with the degraded state and create the /run/machine-config-daemon-force file. Then, OpenShift skips the MC validation, restarts the node, and applies the current MC to the node. The force file does not force the node upgrade; it instead skips validation of configurations on the system and attempts an update regardless of the difference. Depending on the issue on your node, skipping the validation process might not help you move past your node error.
Rewrite the file contents or change the file permissions of the files on the node to match the MC configuration. This manual procedure requires you to review the logs and manually fix the conflicting file. This remediation does not require rebooting the node in a degraded state, and thus avoids possible downtime in your applications.
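The first remediation can be sketched as a small helper function. The force_mc_reapply function name is hypothetical, and the function assumes that you are logged in with the oc command as a user who can create debug pods on nodes. Because the node restarts afterward, treat this function as an illustration and not as a routine procedure.

```shell
# Create the force file on a degraded node through a debug pod.
# $1: name of the degraded node (for example, worker01).
force_mc_reapply() {
  oc debug "node/$1" -- chroot /host touch /run/machine-config-daemon-force
}
```

Calling the function as force_mc_reapply worker01 creates the /run/machine-config-daemon-force file in the host file system of the worker01 node.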

For information about configuration drift, refer to the Status field for the pool with the degraded node:

[user@host ~]$ oc describe mcp worker
...output omitted...
Status:
  Conditions:
    ...output omitted...
    Last Transition Time:  2023-10-02T10:11:37Z
    Message:               Node worker01 is reporting: "content mismatch for file \"/etc/containers/registries.conf\""
    Reason:                1 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded
    Last Transition Time:  2023-10-02T10:11:37Z
    Message:
    Reason:
    Status:                True
    Type:                  Degraded
...output omitted...

In the previous example, the MCO detects a configuration drift for the /etc/containers/registries.conf file.

You can also review the logs for the MCD, which give you more information about the configuration drift:

[user@host ~]$ oc get pod -n openshift-machine-config-operator \
  --field-selector spec.nodeName=worker01
NAME                          READY   STATUS    RESTARTS   AGE
machine-config-daemon-jsrzm   2/2     Running   2          19d
[user@host ~]$ oc logs machine-config-daemon-jsrzm \
  -n openshift-machine-config-operator
...output omitted...
E1002 10:11:36.163676    2667 daemon.go:589] Preflight config drift check failed: content mismatch for file "/etc/containers/registries.conf"
E1002 10:11:36.163692    2667 writer.go:200] Marking Degraded due to: content mismatch for file "/etc/containers/registries.conf"
W1002 10:11:40.193224    2667 daemon.go:1763] current+desiredConfig is rendered-worker-d4f45006b2d83725d98af944c8296774 but state is Degraded
I1002 10:11:40.510208    2667 rpm-ostree.go:394] Running captured: rpm-ostree kargs
E1002 10:11:40.568388    2667 on_disk_validation.go:207] content mismatch for file "/etc/containers/registries.conf" (-want +got):
  []uint8(
  	"""
- 	unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]
+ 	unqualified-search-registries = ["docker.io"]
  	short-name-mode = ""

  	... // 107 identical lines
  	"""
  )
E1002 10:11:40.568446    2667 daemon.go:589] Preflight config drift check failed: content mismatch for file "/etc/containers/registries.conf"
E1002 10:11:40.568467    2667 writer.go:200] Marking Degraded due to: content mismatch for file "/etc/containers/registries.conf"
E1002 10:11:46.987190    2667 daemon.go:1077] mode mismatch for file: "/etc/containers/registries.conf"; expected: -rw-r--r--/420/0644; received: -rwxrwxrwx/511/0777
E1002 10:11:46.987222    2667 writer.go:200] Marking Degraded due to: mode mismatch for file: "/etc/containers/registries.conf"; expected: -rw-r--r--/420/0644; received: -rwxrwxrwx/511/0777
...output omitted...

In the previous example, the MCD marks the worker01 node in a degraded state due to the mismatches of content and permissions for the file.

Configure MCO-related Custom Resources

The MCO provides the ContainerRuntimeConfig CR to modify CRI-O container runtime settings, and the KubeletConfig CR to manage the Kubelet service. You can use these CRs to configure a subset of CRI-O and Kubelet configuration parameters. Always use valid values for the configuration parameter, because invalid values might render cluster nodes unusable.

Although you can modify the CRI-O container runtime settings and the Kubelet service by using the MachineConfig CR, using either of these two special CRs simplifies node deployment and configuration management, provides API checking, and prevents misconfigurations. Moreover, because OpenShift does not support changing all the settings of the Kubelet service and the container runtime, these CRs provide only the configuration changes that OpenShift supports.

Create a Custom Resource for Kubelet Configuration

OpenShift provides a kubelet configuration controller to the MCC. You can use the KubeletConfig CR to edit the kubelet parameters.

The MCO can write the kubelet.conf configuration file and the kubelet.service systemd unit file to Ignition, so Ignition writes these two files to configure the kubelet agent when it starts on a node. If you create an MC to change the kubelet parameters, then the MCD reboots the nodes to write the new configuration. With this approach, OpenShift can restore the default kubelet configuration if you delete the KubeletConfig instance.

Note

For a list of all the parameters that you can modify by using the KubeletConfig CR, you can refer to the KubeletConfiguration API object in Kubernetes that uses the same parameters. Refer to https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/#kubelet-config-k8s-io-v1beta1-KubeletConfiguration
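The following sketch writes a KubeletConfig manifest that changes one kubelet parameter for the worker pool. The set-max-pods name, the file name, and the maxPods value of 500 are illustrative assumptions. The sketch also assumes that the worker MCP carries the pools.operator.machineconfiguration.openshift.io/worker label; verify the labels on your pool with the oc get mcp worker --show-labels command before you rely on this selector.

```shell
# Write a hypothetical KubeletConfig that targets the worker MCP.
cat > set-max-pods.yaml <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      # Assumed to be present on the default worker MCP.
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    maxPods: 500   # illustrative value, an assumption
EOF
```

You could then create the resource with the oc apply -f set-max-pods.yaml command and follow the rollout with the oc get mcp command.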

Create a Custom Resource for Container Runtime Configuration

You can change the settings for the OpenShift CRI-O runtime by using the ContainerRuntimeConfig CR. The MCO can write the crio.conf and storage.conf configuration files on the associated nodes with the updated values.

You can modify the following parameters by using a ContainerRuntimeConfig CR:

Logging level: The logLevel parameter sets the level of verbosity for logging messages. The default level is info. Other options include the fatal, panic, error, warn, debug, and trace options.
Overlay size: The overlaySize parameter sets the maximum size of a container image.
Container runtime: The defaultRuntime parameter sets the container runtime to either runc (the default) or crun.

Important

Support for the crun container runtime is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements and might not be functionally complete. Red Hat does not recommend using Technology Preview features in production.

You can also use the ContainerRuntimeConfig CR to change the limit of PIDs or the maximum logging size. However, Red Hat recommends using the KubeletConfig CR to change these parameters, because the ContainerRuntimeConfig versions of these parameters will likely be deprecated in a future version.
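As a sketch of the logLevel and overlaySize parameters, the following commands write a ContainerRuntimeConfig manifest for the worker pool. The crio-debug name, the file name, and the parameter values are illustrative assumptions, as is the presence of the pools.operator.machineconfiguration.openshift.io/worker label on the worker MCP.

```shell
# Write a hypothetical ContainerRuntimeConfig that targets the worker MCP.
cat > crio-debug.yaml <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: crio-debug
spec:
  machineConfigPoolSelector:
    matchLabels:
      # Assumed to be present on the default worker MCP.
      pools.operator.machineconfiguration.openshift.io/worker: ""
  containerRuntimeConfig:
    logLevel: debug    # more verbose than the default info level
    overlaySize: 8G    # illustrative size limit, an assumption
EOF
```

You could then create the resource with the oc apply -f crio-debug.yaml command. As with any MCO-managed change, use values that are valid for your nodes.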

References

For more information about the Machine Configuration Operator, refer to the Post-installation Machine Configuration Tasks section in the Red Hat OpenShift Container Platform 4.14 Post-installation Configuration documentation at https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/html-single/postinstallation_configuration/post-install-machine-configuration-tasks

For more information about how the Machine Configuration Operator works, refer to https://learn.spidernet.pl/en/blog/openshift-container-platform-4-how-does-machine-config-pool-work

Revision: do380-4.14-397a507