Performing Cluster Maintenance Operations

Objectives

After completing this section, you should be able to perform common cluster maintenance tasks, such as adding or removing MONs and OSDs, and recovering from various component failures.

Adding or Removing OSD Nodes

The cluster can operate and serve clients in a degraded state during cluster maintenance activities. However, adding or removing OSDs can affect cluster performance. Backfilling operations can generate large data transfers between OSDs, causing cluster performance to degrade.

Evaluate the potential performance impact before performing cluster maintenance activities. The following factors typically affect cluster performance when adding or removing OSD nodes (a capacity check and backfill throttling sketch follows the list):

  • Client load

    If an OSD node has a pool that is experiencing high client loads, then performance and recovery time could be negatively affected. Because write operations require data replication for resiliency, write-intensive client loads increase cluster recovery time.

  • Node capacity

    The capacity of the node being added or removed affects the cluster recovery time. The node's storage density also affects recovery times. For example, a node with 36 OSDs takes longer to recover than a node with 12 OSDs.

  • Spare cluster capacity

    When removing nodes, verify that you have sufficient spare capacity to avoid reaching the full or near full ratios. When a cluster reaches the full ratio, Ceph suspends write operations to prevent data loss.

  • CRUSH rules

    A Ceph OSD node maps to at least one CRUSH hierarchy, and that hierarchy maps to at least one pool via a CRUSH rule. Each pool using a specific CRUSH hierarchy experiences a performance impact when adding and removing OSDs.

  • Pool types

    Replicated pools use more network bandwidth to replicate data copies, while erasure-coded pools use more CPU to calculate data and coding chunks.

    The more data copies that exist, the longer it takes for the cluster to recover. For example, an erasure-coded pool with many chunks takes longer to recover than a replicated pool with fewer copies of the same data.

  • Node hardware

    Nodes with higher throughput characteristics, such as 10 Gbps network interfaces and SSDs, recover more quickly than nodes with lower throughput characteristics, such as 1 Gbps network interfaces and SATA drives.
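
Before you begin, you can check the spare capacity and the configured ratios, and optionally throttle backfill and recovery to limit the impact on clients. This is a minimal sketch; the osd_max_backfills and osd_recovery_max_active values are examples only and should be tuned for your environment.

[ceph: root@node /]# ceph osd df
[ceph: root@node /]# ceph osd dump | grep ratio
[ceph: root@node /]# ceph config set osd osd_max_backfills 1
[ceph: root@node /]# ceph config set osd osd_recovery_max_active 1

When maintenance is complete, remove the overrides to restore the previous behavior.

[ceph: root@node /]# ceph config rm osd osd_max_backfills
[ceph: root@node /]# ceph config rm osd osd_recovery_max_active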

Replacing a Failed OSD

Red Hat Ceph Storage is designed to be self-healing. When a storage device fails, Ceph automatically uses the remaining data copies on other OSDs to backfill the affected placement groups and recover the cluster to a healthy state.

When a storage device fails, the OSD status changes to down. Other cluster issues, such as a network error, can also mark an OSD as down. When an OSD is down, first verify whether the physical device has actually failed.

Replacing a failed OSD requires replacing both the physical storage device and the software-defined OSD. When an OSD fails, you can replace the physical storage device and either reuse the same OSD ID or create a new one. Reusing the same OSD ID avoids having to reconfigure the CRUSH map.

If an OSD has failed, use the Dashboard GUI or the following CLI commands to replace the OSD.

To verify that the OSD has failed, perform the following steps.

  • View the cluster status and verify that an OSD has failed.

    [ceph: root@node /]# ceph health detail
  • Identify the failed OSD.

    [ceph: root@node /]# ceph osd tree | grep -i down
  • Locate the OSD node where the OSD is running.

    [ceph: root@node /]# ceph osd find osd.OSD_ID
  • Attempt to start the failed OSD.

    [ceph: root@node /]# ceph orch daemon start osd.OSD_ID

    If the OSD does not start, then the physical storage device might have failed. Use the journalctl command to view the OSD logs or use the utilities available in your production environment to verify that the physical device has failed.
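
    For example, in a cephadm deployment each OSD runs as a systemd service whose name includes the cluster FSID, so you can inspect its logs on the OSD host with commands similar to the following. The FSID and OSD ID values are placeholders.

    [root@node ~]# journalctl -u ceph-FSID@osd.OSD_ID.service
    [root@node ~]# cephadm logs --name osd.OSD_ID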

If you have verified that the physical device needs replacement, perform the following steps.

  • Temporarily disable scrubbing.

    [ceph: root@node /]# ceph osd set noscrub ; ceph osd set nodeep-scrub
  • Remove the OSD from the cluster.

    [ceph: root@node /]# ceph osd out OSD_ID
  • Watch cluster events and verify that a backfill operation has started.

    [ceph: root@node /]# ceph -w
  • Verify that the backfill process has moved all PGs off the OSD and it is now safe to remove.

    [ceph: root@node /]# while ! ceph osd safe-to-destroy osd.OSD_ID ; \
    do sleep 10 ; done
  • When the OSD is safe to remove, replace the physical storage device and destroy the OSD. Optionally, remove all data, file systems, and partitions from the device.

    [ceph: root@node /]# ceph orch device zap HOST_NAME DEVICE_PATH --force

    Note

    Find the device path of the failed OSD by using the Dashboard GUI, or the ceph-volume lvm list or ceph osd metadata CLI commands.

  • Remove the OSD with the --replace option so that its ID is preserved for the replacement OSD. Verify that the operation has completed before continuing.

    [ceph: root@node /]# ceph orch osd rm OSD_ID --replace
    [ceph: root@node /]# ceph orch osd rm status
  • Replace the physical device and recreate the OSD. The new OSD uses the same OSD ID as the one that failed.

    Note

    The device path of the new storage device might be different than the failed device. Use the ceph orch device ls command to find the new device path.

    [ceph: root@node /]# ceph orch daemon add osd HOST_NAME:DEVICE_PATH
  • Start the OSD and verify that the OSD is up.

    [ceph: root@node /]# ceph orch daemon start osd.OSD_ID
    [ceph: root@node /]# ceph osd tree
  • Re-enable scrubbing.

    [ceph: root@node /]# ceph osd unset noscrub ; ceph osd unset nodeep-scrub
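
After the new OSD is up and the backfill completes, you can confirm that the cluster has returned to a healthy state.

[ceph: root@node /]# ceph status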

Adding a MON

Add a MON to your cluster by performing the following steps.

  • Verify the current MON count and placement.

    [ceph: root@node /]# ceph orch ls --service_type=mon
  • Add a new host to the cluster.

    [ceph: root@node /]# ceph cephadm get-pub-key > ~/ceph.pub
    [ceph: root@node /]# ssh-copy-id -f -i ~/ceph.pub root@HOST_NAME
    [ceph: root@node /]# ceph orch host add HOST_NAME
  • Specify the hosts where the MON nodes should run.

    Note

    Specify all MON nodes when running this command. If you only specify the new MON node, then the command removes all other MONs, leaving the cluster with only one MON node.

    [ceph: root@node /]# ceph orch apply mon --placement="NODE1 NODE2 NODE3 NODE4 ..."

Removing a MON

Use the ceph orch apply mon command to remove a MON from the cluster. Specify all MONs except the one that you want to remove.

[ceph: root@node /]# ceph orch apply mon --placement="NODE1 NODE2 NODE3 ..."

Placing Hosts Into Maintenance Mode

Use the ceph orch host maintenance command to place hosts in and out of maintenance mode. Maintenance mode stops all Ceph daemons on the host. Use the optional --force option to bypass warnings.

[ceph: root@node /]# ceph orch host maintenance enter HOST_NAME [--force]

When finished with maintenance, exit maintenance mode.

[ceph: root@node /]# ceph orch host maintenance exit HOST_NAME
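
To confirm whether a host is in maintenance mode, list the cluster hosts; hosts in maintenance mode are flagged in the status column of the output.

[ceph: root@node /]# ceph orch host ls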


References

For more information, refer to the Red Hat Ceph Storage 5 Operations Guide at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/operations_guide/index
