Chapter 6. Providing Block Storage Using RADOS Block Devices

Abstract

Goal: Configure Red Hat Ceph Storage to provide block storage for clients using RADOS block devices (RBDs).
Objectives
  • Provide block storage to Ceph clients using RADOS block devices (RBDs), and manage RBDs from the command line.

  • Create and configure RADOS block devices snapshots and clones.

  • Export an RBD image from the cluster to an external file and import it into another cluster.

Sections
  • Managing RADOS Block Devices (and Guided Exercise)

  • Managing RADOS Block Device Snapshots (and Guided Exercise)

  • Importing and Exporting RBD Images (and Guided Exercise)

Lab

Providing Block Storage Using RADOS Block Devices

Managing RADOS Block Devices

Objectives

After completing this section, you should be able to provide block storage to Ceph clients using RADOS block devices (RBDs), and manage RBDs from the command line.

Block Storage Using a RADOS Block Device (RBD)

Block devices are the most common long-term storage devices for servers, laptops, and other computing systems. They store data in fixed-size blocks. Block devices include both hard drives, based on spinning magnetic platters, and solid-state drives, based on nonvolatile memory. To use the storage, format a block device with a file system and mount it on the Linux file system hierarchy.

The RADOS Block Device (RBD) feature provides block storage from the Red Hat Ceph Storage cluster. RADOS provides virtual block devices stored as RBD images in pools in the Red Hat Ceph Storage cluster.

Managing and Configuring RBD Images

As a storage administrator, use the rbd command to create, list, retrieve information from, resize, and remove block device images. The following example procedure creates an RBD image:

  • Ensure that the rbd pool (or custom pool) for your RBD images exists. Use the ceph osd pool create command to create a custom pool to store RBD images. After creating the custom pool, initialize it with the rbd pool init command.

  • Although Ceph administrators can access the pool, Red Hat recommends that you create a more restricted Cephx user for clients by using the ceph auth command. Grant the restricted user read/write access to only the needed RBD pool instead of access to the entire cluster.

  • Create the RBD image with the rbd create --size size pool-name/image-name command. This command uses the default pool name if you do not specify a pool name.

The rbd_default_pool parameter specifies the name of the default pool used to store RBD images. Use ceph config set osd rbd_default_pool value to set this parameter.
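
The following commands sketch this procedure. The pool name (rbd), Cephx user name (client.rbd.user), key-ring path, and image name (test) are illustrative choices, not required names; adapt them to your environment. The final command sets the default RBD pool, as described above, and is only needed if you want rbd commands to default to that pool.

[root@node ~]# ceph osd pool create rbd
[root@node ~]# rbd pool init rbd
[root@node ~]# ceph auth get-or-create client.rbd.user \
    mon 'profile rbd' osd 'profile rbd pool=rbd' \
    -o /etc/ceph/ceph.client.rbd.user.keyring
[root@node ~]# rbd create --size 1G rbd/test
[root@node ~]# ceph config set osd rbd_default_pool rbd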

Accessing RADOS Block Device Storage

The kernel RBD client (krbd) maps an RBD image to a Linux block device. The librbd library provides RBD storage to KVM virtual machines and OpenStack cloud instances. These clients enable bare-metal servers or virtual machines to use the RBD images as normal block-based storage. In an OpenStack environment, OpenStack attaches and maps these RBD images to Linux servers where they can serve as boot devices. Red Hat Ceph Storage disperses the actual storage used by the virtual block devices across the cluster, which provides high performance access using the IP network.

Accessing Ceph Storage with the RBD Kernel Client

Ceph clients can mount an RBD image using the native Linux kernel module, krbd. This module maps RBD images to Linux block devices with names such as /dev/rbd0.

Figure 6.1: Kernel environment access

The rbd device map command uses the krbd kernel module to map an image. The rbd map command is an abbreviated form of the rbd device map command. The rbd device unmap, or rbd unmap, command uses the krbd kernel module to unmap a mapped image. The following example command maps the test RBD image in the rbd pool to the /dev/rbd0 device on the host client machine:

[root@node ~]# rbd map rbd/test
/dev/rbd0

A Ceph client system can use the mapped block device, called /dev/rbd0 in the example, like any other block device. You can format it with a file system, mount it, and unmount it.
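
For example, assuming the image is mapped at /dev/rbd0 as shown above, and choosing XFS and a /mnt/rbd mount point purely for illustration:

[root@node ~]# mkfs.xfs /dev/rbd0
[root@node ~]# mkdir /mnt/rbd
[root@node ~]# mount /dev/rbd0 /mnt/rbd
[root@node ~]# umount /mnt/rbd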

Warning

Two clients can map the same RBD image as a block device at the same time. This can be useful for high availability clustering for standby servers, but Red Hat recommends attaching a block device to one client at a time when the block device contains a normal, single-mount file system. Mounting a RADOS block device that contains a normal file system, such as XFS, on two or more clients at the same time can cause file-system corruption and data loss.

The rbd device list command, abbreviated rbd showmapped, lists the RBD images mapped on the machine.

[root@node ~]# rbd showmapped
id  pool  namespace image  snap   device
0   rbd             test   -      /dev/rbd0

The rbd device unmap command, abbreviated rbd unmap, unmaps the RBD image from the client machine.

[root@node ~]# rbd unmap /dev/rbd0

The rbd map and rbd unmap commands require root privileges.

Mapping RBD Images Persistently

The rbdmap service can automatically map and unmap RBD images to devices when the system boots and shuts down. The service reads the images to map, along with the credentials to use for each one, from the /etc/ceph/rbdmap file, and mounts and unmounts the RBD images at the mount points defined for them in the /etc/fstab file.

The following steps configure rbdmap to persistently map and unmap an RBD image that already contains a file system:

  1. Create the mount point for the file system.

  2. Create a single-line entry in the /etc/ceph/rbdmap RBD map file. This entry must specify the name of the RBD pool and image. It must also reference the Cephx user who has read/write permissions to access the image and the corresponding key-ring file. Ensure that the key-ring file for the Cephx user exists on the client system.

  3. Create an entry for the RBD in the /etc/fstab file on the client system. The name of the block device has the following form:

    /dev/rbd/pool_name/image_name

    Specify the noauto mount option, because the rbdmap service, not the Linux fstab routines, handles the mounting of the file system.

  4. Confirm that the block device mapping works. Use the rbdmap map command to mount the devices. Use the rbdmap unmap command to unmount them.

  5. Enable the rbdmap systemd service.

Refer to rbdmap(8) for more information.
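
As a sketch of this procedure, assume an image named test in the rbd pool, a Cephx user named rbd.user whose key-ring file is already present on the client, and a /mnt/rbd mount point; all of these names are illustrative. The configuration entries and commands might look like the following:

[root@node ~]# cat /etc/ceph/rbdmap
# RbdDevice        Parameters
rbd/test           id=rbd.user,keyring=/etc/ceph/ceph.client.rbd.user.keyring
[root@node ~]# grep rbd /etc/fstab
/dev/rbd/rbd/test  /mnt/rbd  xfs  noauto  0 0
[root@node ~]# rbdmap map
[root@node ~]# rbdmap unmap
[root@node ~]# systemctl enable rbdmap.service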

Accessing Ceph Storage with librbd-based Clients

The librbd library provides direct access to RBD images for user space applications. It inherits the capabilities of librados to map data blocks to objects in the Ceph Object Store, and implements the ability to access RBD images and create snapshots and clones.

Cloud and virtualization solutions, such as OpenStack and libvirt, use librbd to provide RBD images as block devices to cloud instances and the virtual machines that they manage. For example, RBD images can store QEMU virtual machine images. Using the RBD clone feature, these platforms can boot virtual machines from a cloned image without copying the boot image. The copy-on-write (COW) mechanism copies data from the parent to the clone when a client writes to an unallocated object within the clone. The copy-on-read (COR) mechanism copies data from the parent to the clone when a client reads from an unallocated object within the clone.
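
Copy-on-read is controlled by the rbd_clone_copy_on_read client option and is disabled by default. As an illustration, the following command enables it for all clients:

[root@node ~]# ceph config set client rbd_clone_copy_on_read true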

Figure 6.2: Virtual environment access

Because the user space implementation of the Ceph block device (for example, librbd) cannot take advantage of the Linux page cache, it performs its own in-memory caching, known as RBD caching. RBD caching behaves in a similar manner to the Linux page cache: when the OS sends a barrier or a flush request, Ceph writes all dirty data to the OSDs. This means that using write-back caching is just as safe as using physical hard disk caching with a VM that properly sends flushes (for example, Linux kernel >= 2.6.32). The cache uses a Least Recently Used (LRU) algorithm, and in write-back mode it can coalesce contiguous requests for better throughput.

Note

The RBD cache is local to the client because it uses RAM on the machine that initiated the I/O requests. For example, if you have Nova compute nodes in your Red Hat OpenStack Platform installation that use librbd for their virtual machines, the OpenStack client initiating the I/O request will use local RAM for its RBD cache.

RBD Caching Configurations

Caching Not Enabled

Reads and writes go directly to the Ceph object store. The Ceph cluster acknowledges a write when the data is written and flushed to all relevant OSD journals.

Cache Enabled (write-back)

Consider two values: the number of unflushed (dirty) bytes in the cache, U, and the maximum allowed dirty bytes, M. Writes are acknowledged immediately while U < M; otherwise, the write is acknowledged only after enough dirty data has been written back to the cluster so that U < M.

Write-through Caching

Set the maximum dirty bytes (rbd_cache_max_dirty) to 0 to force write-through mode. In this mode, the Ceph cluster acknowledges a write only when the data is written and flushed to all relevant OSD journals.

In write-back mode, the librbd library acknowledges I/O requests as soon as it writes the data into the local cache of the server. Consider write-through mode for strategic production servers to reduce the risk of data loss or file-system corruption in case of a server failure. Red Hat Ceph Storage offers the following set of RBD caching parameters:

Table 6.1. RBD Caching Parameters

rbd_cache

Enable RBD caching. Value=true|false. Default: true.

rbd_cache_size

Cache size in bytes per RBD image. Value=n. Default: 32 MB.

rbd_cache_max_dirty

Maximum dirty bytes allowed per RBD image. Value=n. Default: 24 MB.

rbd_cache_target_dirty

Dirty bytes to start preemptive flush per RBD image. Value=n. Default: 16 MB.

rbd_cache_max_dirty_age

Maximum page age in seconds before flush. Value=n. Default: 1.

rbd_cache_writethrough_until_flush

Start in write-through mode until performing the first flush. Value=true|false. Default: true.

Use the ceph config set client parameter value command to set a caching parameter for all clients, or the ceph config set global parameter value command to set it globally.
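
For example, the following commands force write-through mode for all clients by setting the maximum dirty bytes to 0, and set another caching parameter globally; the parameters shown are only examples:

[root@node ~]# ceph config set client rbd_cache_max_dirty 0
[root@node ~]# ceph config set global rbd_cache_writethrough_until_flush true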

Note

When using librbd with Red Hat OpenStack Platform, create separate Cephx user names for OpenStack Cinder, Nova, and Glance. By following this recommended practice, you can create different caching strategies based on the type of RBD images that your Red Hat OpenStack Platform environment accesses.

Tuning the RBD Image Format

RBD images are striped over objects and stored in a RADOS object store. Red Hat Ceph Storage provides parameters that define how these images are striped.

RADOS Block Device Image Layout

All objects in an RBD image have a name that starts with the value of the image's block_name_prefix field, which the rbd info command displays. This prefix is followed by a period (.) and then by the object number, a 12-character hexadecimal value.

[root@node ~]# rbd info rbdimage
rbd image 'rbdimage':
	size 10240 MB in 2560 objects
	order 22 (4 MiB objects)
	snapshot_count: 0
	id: 867cba5c2d68
	block_name_prefix: rbd_data.867cba5c2d68
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	op_features:
	flags:
	create_timestamp: Thu Sep 23 18:54:35 2021
	access_timestamp: Thu Sep 23 18:54:35 2021
	modify_timestamp: Thu Sep 23 18:54:35 2021
[root@node ~]# rados -p rbd ls
rbd_object_map.d3d0d7d0b79e.0000000000000008
rbd_id.rbdimage
rbd_object_map.d42c1e0a1883
rbd_directory
rbd_children
rbd_info
rbd_header.d3d0d7d0b79e
rbd_header.d42c1e0a1883
rbd_object_map.d3d0d7d0b79e
rbd_trash
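
After data is written to the image, its data objects appear in the pool with names composed of the block_name_prefix value, a period, and the object number. For example, using the rbd_data.867cba5c2d68 prefix reported by rbd info above, a listing might include entries similar to these:

[root@node ~]# rados -p rbd ls | grep rbd_data.867cba5c2d68
rbd_data.867cba5c2d68.0000000000000000
rbd_data.867cba5c2d68.00000000000009ff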

Ceph block devices store data striped over multiple Object Storage Devices (OSDs) in a Red Hat Ceph Storage cluster.

Figure 6.3: RBD layout

RBD Image Order

The image order is the size of the objects used for the RBD image, expressed as a binary shift value based on the << (bitwise left shift) C operator. This operator shifts the bits of the left operand by the value of the right operand. For example, 1 << 2 = 4: decimal 1 is 0001 in binary, and shifting it left by two positions gives 0100 in binary, which is 4 in decimal. The object size is therefore 1 << order bytes. The image order must be between 12 and 25; for example, an order of 12 results in 4 KiB objects and an order of 13 results in 8 KiB objects. The default image order is 22, resulting in 4 MiB (1 << 22 = 4,194,304 byte) objects. You can override the default value by using the --order option of the rbd create command.

Alternatively, you can specify the size of the objects directly with the --object-size option. This option accepts an object size between 4096 bytes (4 KiB) and 33,554,432 bytes (32 MiB), expressed in bytes or with a K or M suffix (for example, 4096, 8K, or 4M).
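
For example, the following commands create an image with 8 MiB objects by using the --object-size option, and then confirm the resulting order of 23; the pool and image names are arbitrary:

[root@node ~]# rbd create --size 1G --object-size 8M rbd/image8m
[root@node ~]# rbd info rbd/image8m | grep order
	order 23 (8 MiB objects)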

RBD Image Format

Each RBD image has three parameters associated with it:

image_format

The RBD image format version. The default value is 2, the most recent version. Version 1 has been deprecated and does not support features such as cloning and mirroring.

stripe_unit

The number of consecutive bytes stored in one object, object_size by default.

stripe_count

The number of RBD image objects that a stripe spans, 1 by default.

For RBD format 2 images, you can change the value of each of those parameters. The settings must align with the following equation:

stripe_unit * stripe_count = object_size

For example:

stripe_unit = 1048576, stripe_count = 4 for default 4 MiB objects

Remember that object_size must be no less than 4096 bytes and no greater than 33,554,432 bytes. Use the --object-size option to specify this value when you create the RBD image. The default object_size is 4,194,304 bytes (4 MiB).
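
As a sketch of custom striping, the following command creates a format 2 image that matches the example above: 1 MiB stripe units spanning 4 objects of the default 4 MiB object size. The pool and image names are arbitrary, and rbd info for such an image also reports its stripe unit and stripe count.

[root@node ~]# rbd create --size 10G --stripe-unit 1M --stripe-count 4 rbd/striped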

 

References

rbd(8) and rbdmap(8) man pages

For more information, refer to the Red Hat Ceph Storage 5 Block Device Guide at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/block_device_guide/index

Revision: cl260-5.0-29d2128