After completing this section, you should be able to protect OSD and cluster hardware resources from over-utilization by controlling scrubbing, deep scrubbing, backfill, and recovery processes to balance CPU, RAM, and I/O requirements.
Good client performance requires operating your OSDs within their physical limits. To maintain OSD performance, evaluate these tuning opportunities:
Tune the BlueStore back end used by OSDs to store objects on physical devices.
Adjust the schedule for automatic data scrubbing and deep scrubbing.
Adjust the schedule of asynchronous snapshot trimming (deleting removed snapshots).
Control how quickly backfill and recovery operations occur when OSDs fail or are added or replaced.
The default back-end object store for OSD daemons is BlueStore. The following list describes some of the main features of using BlueStore:
BlueStore consumes raw block devices or partitions. This simplifies the management of storage devices because no other abstraction layers, such as local file systems, are required.
The Ceph Block Device and Ceph File System snapshots rely on a copy-on-write clone mechanism that is implemented efficiently in BlueStore. The result is efficient I/O both for regular snapshots and for erasure-coded pools, which rely on cloning to implement two-phase commits.
BlueStore first writes any new data to unallocated space on a block device, and then commits a RocksDB transaction that updates the object metadata to reference the new region of the disk.
BlueStore can use multiple block devices for storing the data, metadata, and write-ahead log.
In BlueStore, the raw partition is managed in chunks of the size specified by the bluestore_min_alloc_size variable.
The bluestore_min_alloc_size is set by default to 4,096, which is equivalent to 4 KB, for HDDs and SSDs.
If the data written to the raw partition is smaller than the chunk size, then the remainder of the chunk is filled with zeroes.
This wastes space when the chunk size is not properly sized for your workload, such as when writing many small objects.
Red Hat recommends setting the bluestore_min_alloc_size variable to match the smallest common write to avoid wasting unused space.
For example, if your clients frequently write 4 KB objects, then set bluestore_min_alloc_size = 4096 on your OSD nodes.
Setting the bluestore_min_alloc_size variable overrides specific settings for HDD or SSD if previously set with the bluestore_min_alloc_size_ssd or bluestore_min_alloc_size_hdd variables.
Red Hat does not recommend changing the bluestore_min_alloc_size value in your production environment before first contacting Red Hat Support.
Set the value for the bluestore_min_alloc_size variable by using the ceph config command:
[root@node ~]# ceph config set osd.ID bluestore_min_alloc_size_device-type value

An OSD's free space becomes fragmented over time. Fragmentation is normal, but excess fragmentation degrades OSD performance. When using BlueStore, review fragmentation levels by using the BlueStore fragmentation tool, which generates a fragmentation score for the BlueStore OSD. The fragmentation score is between 0 and 1, with 0 indicating no fragmentation and 1 indicating severe fragmentation.
For reference, a score between 0 and 0.7 indicates small, acceptable fragmentation; a score between 0.7 and 0.9 indicates considerable but still safe fragmentation; and a score higher than 0.9 indicates severe fragmentation that causes performance issues.
View the fragmentation score using the BlueStore fragmentation tool:
[root@node ~]# ceph daemon osd.ID bluestore allocator score block

OSDs are responsible for validating data coherence by using light scrubbing and deep scrubbing. Light scrubbing verifies an object's presence, checksum, and size. Deep scrubbing reads the data and recalculates and verifies the object's checksum.
By default, Red Hat Ceph Storage performs light scrubbing every day and deep scrubbing every week.
However, Ceph can begin the scrubbing operation at any time, which can impact cluster performance.
You can enable or disable cluster level light scrubbing by using the ceph osd set noscrub and ceph osd unset noscrub commands.
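For example, you can pause light scrubbing cluster-wide before a maintenance window and re-enable it afterward:

[root@node ~]# ceph osd set noscrub
[root@node ~]# ceph osd unset noscrub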
Although scrubbing has a performance impact, Red Hat recommends keeping the feature enabled because it maintains data integrity.
Red Hat recommends setting the scrubbing parameters to restrict scrubbing to known periods with the lowest workloads.
The default configuration allows light scrubbing at any time during the day.
Tune the light scrubbing process by adding parameters to the [osd] section of the ceph.conf file, as shown in the example after the following list.
For example, use the osd_scrub_begin_hour parameter to set the time of day when light scrubbing begins, thereby avoiding light scrubbing during peak workloads.
The light scrubbing feature has the following tuning parameters:
osd_scrub_begin_hour = begin_hour
The begin_hour parameter specifies the time to start scrubbing.
Valid values are from 0 to 23.
If the value is set to 0 and osd_scrub_end_hour is also 0, then scrubbing is allowed the entire day.
osd_scrub_end_hour = end_hour
The end_hour parameter specifies the time to stop scrubbing.
Valid values are from 0 to 23.
If the value is set to 0 and osd_scrub_begin_hour is also 0, then scrubbing is allowed the entire day.
osd_scrub_load_threshold
Perform a scrub only if the system load, defined as getloadavg() divided by the number of online CPUs, is below this threshold.
The default value is 0.5.
osd_scrub_min_interval
When the system load is below the threshold set in the osd_scrub_load_threshold parameter, perform a scrub no more often than the interval, in seconds, defined in this parameter.
The default value is 1 day.
osd_scrub_interval_randomize_ratio
Add a random delay to the value defined in the osd_scrub_min_interval parameter.
The default value is 0.5.
osd_scrub_max_interval
Do not wait more than this period before performing a scrub, regardless of load. The default value is 7 days.
osd_scrub_priority
Set the priority for scrub operations by using this parameter.
The default value is 5.
This value is relative to the value of the osd_client_op_priority, which has a higher default priority of 63.
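For example, to restrict light scrubbing to an overnight window and to require a lower system load, you might add settings such as the following to the [osd] section. The hours and threshold shown are illustrative values, not recommendations:

[osd]
osd_scrub_begin_hour = 22
osd_scrub_end_hour = 6
osd_scrub_load_threshold = 0.3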
You can enable and disable deep scrubbing at the cluster level by using the ceph osd set nodeep-scrub and ceph osd unset nodeep-scrub commands.
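For example, to temporarily prevent deep scrubbing cluster-wide and then re-enable it:

[root@node ~]# ceph osd set nodeep-scrub
[root@node ~]# ceph osd unset nodeep-scrub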
You can configure deep scrubbing parameters by adding them to the [osd] section of the ceph configuration file, ceph.conf.
As with the light scrubbing parameters, any changes made to the deep scrub configuration can impact cluster performance.
The following parameters are the most critical for tuning deep scrubbing; a brief configuration example follows the list:
osd_deep_scrub_interval
The interval for deep scrubbing. The default value is 7 days.
osd_scrub_sleep
Introduces a pause between deep scrub disk reads. Increase this value to slow down scrub operations and to have a lower impact on client operations. The default value is 0.
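For example, to lengthen the deep scrub interval to 14 days (1209600 seconds) and to throttle scrub reads, you might add entries such as the following to the [osd] section. These values are illustrative, not recommendations:

[osd]
osd_deep_scrub_interval = 1209600
osd_scrub_sleep = 0.1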
You can implement light and deep scrubbing with an external scheduler by using the following commands; a minimal scheduling sketch follows the list:
The ceph pg dump command displays the last light and deep scrubbing occurrences in the LAST_SCRUB and LAST_DEEP_SCRUB columns.
The ceph pg scrub pg-id command schedules a light scrub on a particular PG.
The ceph pg deep-scrub pg-id command schedules a deep scrub on a particular PG.
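As a minimal sketch of external scheduling, assuming cron is available on the node and using a hypothetical PG ID of 1.0, a cron entry could trigger a deep scrub during off-peak hours:

# /etc/cron.d/ceph-deep-scrub (illustrative only; the PG ID is hypothetical)
0 2 * * 6  root  ceph pg deep-scrub 1.0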
You can also control light scrubbing and deep scrubbing at the pool level with the following pool parameters. Use the ceph osd pool set pool-name parameter value command to set these parameters for a specific pool, as shown in the example after the list:
noscrub
If set to true, Ceph does not light scrub the pool.
The default value is false.
nodeep-scrub
If set to true, Ceph does not deep scrub the pool.
The default value is false.
scrub_min_interval
Scrub no more often than the number of seconds defined in this parameter.
If set to the default 0, then Ceph uses the osd_scrub_min_interval global configuration parameter.
scrub_max_interval
Do not wait more than the period defined in this parameter before scrubbing the pool.
If set to the default 0, Ceph uses the osd_scrub_max_interval global configuration parameter.
deep_scrub_interval
The interval for deep scrubbing.
If set to the default 0, Ceph uses the osd_deep_scrub_interval global configuration parameter.
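For example, to disable deep scrubbing on one pool and give it a pool-specific deep scrub interval, you might run commands such as these. The pool name testpool and the interval value are illustrative:

[root@node ~]# ceph osd pool set testpool nodeep-scrub true
[root@node ~]# ceph osd pool set testpool deep_scrub_interval 1209600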
Snapshots are available at the pool and RBD levels. When a snapshot is removed, Ceph schedules the removal of the snapshot data as an asynchronous operation known as snapshot trimming.
To reduce the impact of the snapshot trimming process on the cluster, you can configure a pause after the deletion of each snapshot object.
Configure this pause by using the osd_snap_trim_sleep parameter, which is the time in seconds to wait before allowing the next snapshot trimming operation.
The default value for this parameter is 0.
Contact Red Hat Support for further advice on how to set this parameter based on your environment settings.
You can also control the priority of snapshot trimming operations with the osd_snap_trim_priority parameter, which has a default value of 5.
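As an illustration only, and assuming the value has been validated with Red Hat Support for your environment, you can set the trimming pause with the ceph config command; the 0.1 second value is arbitrary:

[root@node ~]# ceph config set osd osd_snap_trim_sleep 0.1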
Controlling backfill and recovery operations is necessary to limit the impact of these operations and to preserve cluster performance.
Backfill occurs when a new OSD joins the cluster or when an OSD dies and Ceph reassigns its PGs to other OSDs. When such events occur, Ceph creates object replicas across the available OSDs.
Recovery occurs when a Ceph OSD becomes inaccessible and comes back online, for example due to a short outage. The OSD goes into recovery mode to obtain the latest copy of the data.
Use the following parameters to manage backfill and recovery operations; a brief example follows the list:
osd_max_backfills
Control the maximum number of concurrent backfill operations per OSD. The default value is 1.
osd_recovery_max_active
Control the maximum number of concurrent recovery operations per OSD. The default value is 3.
osd_recovery_op_priority
Set the recovery priority. The value can range from 1 to 63; the higher the number, the higher the priority. The default value is 3.
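For example, you can set these parameters cluster-wide with the ceph config command; the values shown match the defaults described above and are for illustration only:

[root@node ~]# ceph config set osd osd_max_backfills 1
[root@node ~]# ceph config set osd osd_recovery_max_active 3
[root@node ~]# ceph config set osd osd_recovery_op_priority 3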
For more information, refer to the OSD Configuration Reference chapter of the Configuration Guide for Red Hat Ceph Storage at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/configuration_guide/index#ceph-monitor-and-osd-configuration-options_conf
For more information on scrubbing and backfilling, refer to https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/configuration_guide/index#ceph-object-storage-daemon-configuration
For more information on tuning Red Hat Ceph Storage 5 BlueStore, refer to https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/administration_guide/index#osd-bluestore