Performing Cluster Maintenance Operations

Objectives

After completing this section, you should be able to perform common cluster maintenance tasks, such as adding or removing MONs and OSDs, and recovering from various component failures.

Adding or Removing OSD Nodes

The cluster can operate and serve clients in a degraded state during cluster maintenance activities. However, adding or removing OSDs can affect cluster performance. Backfilling operations can generate large data transfers between OSDs, causing cluster performance to degrade.

Evaluate the potential performance impact before performing cluster maintenance activities. The following factors typically affect cluster performance when adding or removing OSD nodes (a capacity check and backfill throttling sketch follows the list):

  • Client load

    If an OSD node has a pool that is experiencing high client loads, then performance and recovery time could be negatively affected. Because write operations require data replication for resiliency, write-intensive client loads increase cluster recovery time.

  • Node capacity

    The capacity of the node being added or removed affects the cluster recovery time. The node's storage density also affects recovery times. For example, a node with 36 OSDs takes longer to recover than a node with 12 OSDs.

  • Spare cluster capacity

    When removing nodes, verify that you have sufficient spare capacity to avoid reaching the full or near full ratios. When a cluster reaches the full ratio, Ceph suspends write operations to prevent data loss.

  • CRUSH rules

    A Ceph OSD node maps to at least one CRUSH hierarchy, and that hierarchy maps to at least one pool via a CRUSH rule. Each pool using a specific CRUSH hierarchy experiences a performance impact when adding and removing OSDs.

  • Pool types

    Replicated pools use more network bandwidth to replicate data copies, while erasure-coded pools use more CPU to calculate data and coding chunks.

    The more data copies that exist, the longer it takes for the cluster to recover. For example, an erasure-coded pool with many chunks takes longer to recover than a replicated pool with fewer copies of the same data.

  • Node hardware

    Nodes with higher throughput characteristics, such as 10 Gbps network interfaces and SSDs, recover more quickly than nodes with lower throughput characteristics, such as 1 Gbps network interfaces and SATA drives.
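
Before you begin, you can check the spare capacity and the configured ratios, and optionally throttle backfill and recovery to limit the impact on clients. This is a minimal sketch; the osd_max_backfills and osd_recovery_max_active values are examples only and should be tuned for your environment.

[ceph: root@node /]# ceph osd df
[ceph: root@node /]# ceph osd dump | grep ratio
[ceph: root@node /]# ceph config set osd osd_max_backfills 1
[ceph: root@node /]# ceph config set osd osd_recovery_max_active 1

When maintenance is complete, remove the overrides to restore the previous behavior.

[ceph: root@node /]# ceph config rm osd osd_max_backfills
[ceph: root@node /]# ceph config rm osd osd_recovery_max_active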

Replacing a Failed OSD

Red Hat Ceph Storage is designed to be self-healing. When a storage device fails, Ceph automatically uses the remaining data copies on other OSDs to backfill the affected placement groups and recover the cluster to a healthy state.

When a storage device fails, the OSD status changes to down. Other cluster issues, such as a network error, can also mark an OSD as down. When an OSD is down, first verify whether the physical device has actually failed.

Replacing a failed OSD requires replacing both the physical storage device and the software-defined OSD. When an OSD fails, you can replace the physical storage device and either reuse the same OSD ID or create a new one. Reusing the same OSD ID avoids having to reconfigure the CRUSH map.

If an OSD has failed, use the Dashboard GUI or the following CLI commands to replace the OSD.

To verify that the OSD has failed, perform the following steps.

  • View the cluster status and verify that an OSD has failed.

    [ceph: root@node /]# ceph health detail
  • Identify the failed OSD.

    [ceph: root@node /]# ceph osd tree | grep -i down
  • Locate the OSD node where the OSD is running.

    [ceph: root@node /]# ceph osd find osd.OSD_ID
  • Attempt to start the failed OSD.

    [ceph: root@node /]# ceph orch daemon start osd.OSD_ID

    If the OSD does not start, then the physical storage device might have failed. Use the journalctl command to view the OSD logs or use the utilities available in your production environment to verify that the physical device has failed.
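
    For example, in a cephadm deployment each OSD runs as a systemd service whose name includes the cluster FSID, so you can inspect its logs on the OSD host with commands similar to the following. The FSID and OSD ID values are placeholders.

    [root@node ~]# journalctl -u ceph-FSID@osd.OSD_ID.service
    [root@node ~]# cephadm logs --name osd.OSD_ID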

If you have verified that the physical device needs replacement, perform the following steps.

  • Temporarily disable scrubbing.

    [ceph: root@node /]# ceph osd set noscrub ; ceph osd set nodeep-scrub
  • Remove the OSD from the cluster.

    [ceph: root@node /]# ceph osd out OSD_ID
  • Watch cluster events and verify that a backfill operation has started.

    [ceph: root@node /]# ceph -w
  • Verify that the backfill process has moved all PGs off the OSD and it is now safe to remove.

    [ceph: root@node /]# while ! ceph osd safe-to-destroy osd.OSD_ID ; \
    do sleep 10 ; done
  • When the OSD is safe to remove, replace the physical storage device and destroy the OSD. Optionally, remove all data, file systems, and partitions from the device.

    [ceph: root@node /]# ceph orch device zap HOST_NAME DEVICE_PATH --force

    Note

    Find the device path of the failed OSD by using the Dashboard GUI, or the ceph-volume lvm list or ceph osd metadata CLI commands.

  • Remove the OSD with the --replace option so that its ID is preserved for the replacement OSD. Verify that the operation has completed before continuing.

    [ceph: root@node /]# ceph orch osd rm OSD_ID --replace
    [ceph: root@node /]# ceph orch osd rm status
  • Replace the physical device and recreate the OSD. The new OSD uses the same OSD ID as the one that failed.

    Note

    The device path of the new storage device might be different than the failed device. Use the ceph orch device ls command to find the new device path.

    [ceph: root@node /]# ceph orch daemon add osd HOST_NAME:DEVICE_PATH
  • Start the OSD and verify that the OSD is up.

    [ceph: root@node /]# ceph orch daemon start osd.OSD_ID
    [ceph: root@node /]# ceph osd tree
  • Re-enable scrubbing.

    [ceph: root@node /]# ceph osd unset noscrub ; ceph osd unset nodeep-scrub
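
After the new OSD is up and the backfill completes, you can confirm that the cluster has returned to a healthy state.

[ceph: root@node /]# ceph status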

Adding a MON

Add a MON to your cluster by performing the following steps.

  • Verify the current MON count and placement.

    [ceph: root@node /]# ceph orch ls --service_type=mon
  • Add a new host to the cluster.

    [ceph: root@node /]# ceph cephadm get-pub-key > ~/ceph.pub
    [ceph: root@node /]# ssh-copy-id -f -i ~/ceph.pub root@HOST_NAME
    [ceph: root@node /]# ceph orch host add HOST_NAME
  • Specify the hosts where the MON nodes should run.

    Note

    Specify all MON nodes when running this command. If you only specify the new MON node, then the command removes all other MONs, leaving the cluster with only one MON node.

    [ceph: root@node /]# ceph orch apply mon --placement="NODE1 NODE2 NODE3 NODE4 ..."

Removing a MON

Use the ceph orch apply mon command to remove a MON from the cluster. Specify all MONs except the one that you want to remove.

[ceph: root@node /]# ceph orch apply mon --placement="NODE1 NODE2 NODE3 ..."

Placing Hosts Into Maintenance Mode

Use the ceph orch host maintenance command to place hosts in and out of maintenance mode. Maintenance mode stops all Ceph daemons on the host. Use the optional --force option to bypass warnings.

[ceph: root@node /]# ceph orch host maintenance enter HOST_NAME [--force]

When finished with maintenance, exit maintenance mode.

[ceph: root@node /]# ceph orch host maintenance exit HOST_NAME
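
To confirm whether a host is in maintenance mode, list the cluster hosts; hosts in maintenance mode are flagged in the status column of the output.

[ceph: root@node /]# ceph orch host ls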


References

For more information, refer to the Red Hat Ceph Storage 5 Operations Guide at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/operations_guide/index
