After completing this section, you should be able to protect OSD and cluster hardware resources from over-utilization by controlling scrubbing, deep scrubbing, backfill, and recovery processes to balance CPU, RAM, and I/O requirements.
Good client performance requires operating your OSDs within their physical limits. To maintain OSD performance, evaluate these tuning opportunities:
Tune the BlueStore back end used by OSDs to store objects on physical devices.
Adjust the schedule for automatic data scrubbing and deep scrubbing.
Adjust the schedule of asynchronous snapshot trimming (deleting removed snapshots).
Control how quickly backfill and recovery operations occur when OSDs fail or are added or replaced.
The default back-end object store for OSD daemons is BlueStore. The following list describes some of the main features of using BlueStore:
BlueStore consumes raw block devices or partitions. This simplifies the management of storage devices because no other abstraction layers, such as local file systems, are required.
The Ceph Block Device and Ceph File System snapshots rely on a copy-on-write clone mechanism that is implemented efficiently in BlueStore. The result is efficient I/O both for regular snapshots and for erasure-coded pools, which rely on cloning to implement two-phase commits.
BlueStore first writes any new data to unallocated space on a block device, and then commits a RocksDB transaction that updates the object metadata to reference the new region of the disk.
BlueStore can use multiple block devices for storing the data, metadata, and write-ahead log.
In BlueStore, the raw partition is managed in chunks of the size specified by the bluestore_min_alloc_size variable.
The bluestore_min_alloc_size is set by default to 4,096, which is equivalent to 4 KB, for HDDs and SSDs.
If the data written to the raw partition is smaller than the chunk size, then the remainder of the chunk is filled with zeroes.
This wastes space when the chunk size is not properly sized for your workload, such as when writing many small objects.
Red Hat recommends setting the bluestore_min_alloc_size variable to match the smallest common write to avoid wasting unused space.
For example, if your clients frequently write 4 KB objects, then set bluestore_min_alloc_size = 4096 on your OSD nodes.
Setting the bluestore_min_alloc_size variable overrides specific settings for HDD or SSD if previously set with the bluestore_min_alloc_size_ssd or bluestore_min_alloc_size_hdd variables.
Red Hat does not recommend changing the bluestore_min_alloc_size value in your production environment before first contacting Red Hat Support.
Set the value for the bluestore_min_alloc_size variable by using the ceph config command:
[root@node ~]# ceph config set osd.ID bluestore_min_alloc_size_device-type value

An OSD's free space becomes fragmented over time. Fragmentation is normal, but excess fragmentation degrades OSD performance. When using BlueStore, review fragmentation levels by using the BlueStore fragmentation tool, which generates a fragmentation score for the BlueStore OSD. The fragmentation score is between 0 and 1, with 0 indicating no fragmentation and 1 indicating severe fragmentation.
For reference, a score between 0 and 0.7 indicates small, acceptable fragmentation; a score between 0.7 and 0.9 indicates considerable but still safe fragmentation; and a score higher than 0.9 indicates severe fragmentation that causes performance issues.
View the fragmentation score using the BlueStore fragmentation tool:
[root@node ~]# ceph daemon osd.ID bluestore allocator score block

OSDs are responsible for validating data coherence by using light scrubbing and deep scrubbing. Light scrubbing verifies an object's presence, checksum, and size. Deep scrubbing reads the data and recalculates and verifies the object's checksum.
By default, Red Hat Ceph Storage performs light scrubbing every day and deep scrubbing every week.
However, Ceph can begin the scrubbing operation at any time, which can impact cluster performance.
You can enable or disable cluster level light scrubbing by using the ceph osd set noscrub and ceph osd unset noscrub commands.
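For example, you can pause light scrubbing cluster-wide before a maintenance window and re-enable it afterward:

[root@node ~]# ceph osd set noscrub
[root@node ~]# ceph osd unset noscrub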
Although scrubbing has a performance impact, Red Hat recommends keeping the feature enabled because it maintains data integrity.
Red Hat recommends setting the scrubbing parameters to restrict scrubbing to known periods with the lowest workloads.
The default configuration allows light scrubbing at any time during the day.
Tune the light scrubbing process by adding parameters to the [osd] section of the ceph.conf file, as shown in the example after the following list.
For example, use the osd_scrub_begin_hour parameter to set the time of day when light scrubbing begins, thereby avoiding light scrubbing during peak workloads.
The light scrubbing feature has the following tuning parameters:
osd_scrub_begin_hour = begin_hour
The begin_hour parameter specifies the time to start scrubbing.
Valid values are from 0 to 23.
If the value is set to 0 and osd_scrub_end_hour is also 0, then scrubbing is allowed the entire day.
osd_scrub_end_hour = end_hour
The end_hour parameter specifies the time to stop scrubbing.
Valid values are from 0 to 23.
If the value is set to 0 and osd_scrub_begin_hour is also 0, then scrubbing is allowed the entire day.
osd_scrub_load_threshold
Perform a scrub only if the system load, defined as getloadavg() divided by the number of online CPUs, is below this threshold.
The default value is 0.5.
osd_scrub_min_interval
When the system load is below the threshold set in the osd_scrub_load_threshold parameter, perform a scrub no more often than the interval, in seconds, defined in this parameter.
The default value is 1 day.
osd_scrub_interval_randomize_ratio
Add a random delay to the value defined in the osd_scrub_min_interval parameter.
The default value is 0.5.
osd_scrub_max_interval
Do not wait more than this period before performing a scrub, regardless of load. The default value is 7 days.
osd_scrub_priority
Set the priority for scrub operations by using this parameter.
The default value is 5.
This value is relative to the value of the osd_client_op_priority, which has a higher default priority of 63.
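For example, to restrict light scrubbing to an overnight window and to require a lower system load, you might add settings such as the following to the [osd] section. The hours and threshold shown are illustrative values, not recommendations:

[osd]
osd_scrub_begin_hour = 22
osd_scrub_end_hour = 6
osd_scrub_load_threshold = 0.3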
You can enable and disable deep scrubbing at the cluster level by using the ceph osd set nodeep-scrub and ceph osd unset nodeep-scrub commands.
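For example, to temporarily prevent deep scrubbing cluster-wide and then re-enable it:

[root@node ~]# ceph osd set nodeep-scrub
[root@node ~]# ceph osd unset nodeep-scrub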
You can configure deep scrubbing parameters by adding them to the [osd] section of the ceph configuration file, ceph.conf.
As with the light scrubbing parameters, any changes made to the deep scrub configuration can impact cluster performance.
The following parameters are the most critical for tuning deep scrubbing; a brief configuration example follows the list:
osd_deep_scrub_interval
The interval for deep scrubbing. The default value is 7 days.
osd_scrub_sleep
Introduces a pause between deep scrub disk reads. Increase this value to slow down scrub operations and to have a lower impact on client operations. The default value is 0.
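For example, to lengthen the deep scrub interval to 14 days (1209600 seconds) and to throttle scrub reads, you might add entries such as the following to the [osd] section. These values are illustrative, not recommendations:

[osd]
osd_deep_scrub_interval = 1209600
osd_scrub_sleep = 0.1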
You can implement light and deep scrubbing with an external scheduler by using the following commands; a minimal scheduling sketch follows the list:
The ceph pg dump command displays the last light and deep scrubbing occurrences in the LAST_SCRUB and LAST_DEEP_SCRUB columns.
The ceph pg scrub pg-id command schedules a light scrub on a particular PG.
The ceph pg deep-scrub pg-id command schedules a deep scrub on a particular PG.
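As a minimal sketch of external scheduling, assuming cron is available on the node and using a hypothetical PG ID of 1.0, a cron entry could trigger a deep scrub during off-peak hours:

# /etc/cron.d/ceph-deep-scrub (illustrative only; the PG ID is hypothetical)
0 2 * * 6  root  ceph pg deep-scrub 1.0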
You can also control light scrubbing and deep scrubbing at the pool level with the following pool parameters. Use the ceph osd pool set pool-name parameter value command to set these parameters for a specific pool, as shown in the example after the list:
noscrub
If set to true, Ceph does not light scrub the pool.
The default value is false.
nodeep-scrub
If set to true, Ceph does not deep scrub the pool.
The default value is false.
scrub_min_interval
Scrub no more often than the number of seconds defined in this parameter.
If set to the default 0, then Ceph uses the osd_scrub_min_interval global configuration parameter.
scrub_max_interval
Do not wait more than the period defined in this parameter before scrubbing the pool.
If set to the default 0, Ceph uses the osd_scrub_max_interval global configuration parameter.
deep_scrub_interval
The interval for deep scrubbing.
If set to the default 0, Ceph uses the osd_deep_scrub_interval global configuration parameter.
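For example, to disable deep scrubbing on one pool and give it a pool-specific deep scrub interval, you might run commands such as these. The pool name testpool and the interval value are illustrative:

[root@node ~]# ceph osd pool set testpool nodeep-scrub true
[root@node ~]# ceph osd pool set testpool deep_scrub_interval 1209600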
Snapshots are available at the pool and RBD levels. When a snapshot is removed, Ceph schedules the removal of the snapshot data as an asynchronous operation known as snapshot trimming.
To reduce the impact of the snapshot trimming process on the cluster, you can configure a pause after the deletion of each snapshot object.
Configure this pause by using the osd_snap_trim_sleep parameter, which is the time in seconds to wait before allowing the next snapshot trimming operation.
The default value for this parameter is 0.
Contact Red Hat Support for further advice on how to set this parameter based on your environment settings.
You can also control the priority of snapshot trimming operations with the osd_snap_trim_priority parameter, which has a default value of 5.
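As an illustration only, and assuming the value has been validated with Red Hat Support for your environment, you can set the trimming pause with the ceph config command; the 0.1 second value is arbitrary:

[root@node ~]# ceph config set osd osd_snap_trim_sleep 0.1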
Controlling backfill and recovery operations is necessary to limit the impact of these operations and to preserve cluster performance.
Backfill occurs when a new OSD joins the cluster or when an OSD dies and Ceph reassigns its PGs to other OSDs. When such events occur, Ceph creates object replicas across the available OSDs.
Recovery occurs when a Ceph OSD becomes inaccessible and comes back online, for example due to a short outage. The OSD goes into recovery mode to obtain the latest copy of the data.
Use the following parameters to manage backfill and recovery operations; a brief example follows the list:
osd_max_backfills
Control the maximum number of concurrent backfill operations per OSD. The default value is 1.
osd_recovery_max_active
Control the maximum number of concurrent recovery operations per OSD. The default value is 3.
osd_recovery_op_priority
Set the recovery priority. The value can range from 1 to 63; the higher the number, the higher the priority. The default value is 3.
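For example, you can set these parameters cluster-wide with the ceph config command; the values shown match the defaults described above and are for illustration only:

[root@node ~]# ceph config set osd osd_max_backfills 1
[root@node ~]# ceph config set osd osd_recovery_max_active 3
[root@node ~]# ceph config set osd osd_recovery_op_priority 3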
For more information, refer to the OSD Configuration Reference chapter of the Configuration Guide for Red Hat Ceph Storage at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/configuration_guide/index#ceph-monitor-and-osd-configuration-options_conf
For more information on scrubbing and backfilling, refer to https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/configuration_guide/index#ceph-object-storage-daemon-configuration
For more information on tuning Red Hat Ceph Storage 5 BlueStore, refer to https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/administration_guide/index#osd-bluestore