Abstract
| Goal | Configure Red Hat Ceph Storage to provide block storage for clients using RADOS block devices (RBDs). |
Providing Block Storage Using RADOS Block Devices |
After completing this section, you should be able to provide block storage to Ceph clients using RADOS block devices (RBDs), and manage RBDs from the command line.
Block devices are the most common long-term storage devices for servers, laptops, and other computing systems. They store data in fixed-size blocks. Block devices include both hard drives, based on spinning magnetic platters, and solid-state drives, based on nonvolatile memory. To use the storage, format a block device with a file system and mount it on the Linux file system hierarchy.
As a storage administrator, use the rbd command to create, list, retrieve information from, resize, and remove block device images.
The following example procedure creates an RBD image (example commands follow the steps):

1. Ensure that the rbd pool (or a custom pool) for your RBD images exists. Use the ceph osd pool create command to create a custom pool to store RBD images, and then initialize it with the rbd pool init command.
2. Although Ceph administrators can access the pool, Red Hat recommends that you create a more restricted Cephx user for clients by using the ceph auth command. Grant the restricted user read/write access to only the needed RBD pool instead of access to the entire cluster.
3. Create the RBD image with the rbd create --size size pool-name/image-name command. This command uses the default pool name if you do not specify a pool name.
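The following commands sketch this procedure. The vms pool name, the client.vms user, the keyring path, and the vm-disk1 image name are illustrative values, not defaults:

[root@node ~]# ceph osd pool create vms
[root@node ~]# rbd pool init vms
[root@node ~]# ceph auth get-or-create client.vms mon 'profile rbd' osd 'profile rbd pool=vms' -o /etc/ceph/ceph.client.vms.keyring
[root@node ~]# rbd create --size 10G vms/vm-disk1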
The rbd_default_pool parameter specifies the name of the default pool used to store RBD images.
Use the ceph config set osd rbd_default_pool value command to set this parameter.
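For example, to make a hypothetical vms pool the default pool for RBD images:

[root@node ~]# ceph config set osd rbd_default_pool vms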
The kernel RBD client (krbd) maps an RBD image to a Linux block device.
The librbd library provides RBD storage to KVM virtual machines and OpenStack cloud instances.
These clients enable bare-metal servers or virtual machines to use the RBD images as normal block-based storage.
In an OpenStack environment, OpenStack attaches and maps these RBD images to Linux servers where they can serve as boot devices.
Red Hat Ceph Storage disperses the actual storage used by the virtual block devices across the cluster, which provides high performance access using the IP network.
Ceph clients can mount an RBD image using the native Linux kernel module, krbd.
This module maps RBD images to Linux block devices with names such as /dev/rbd0.
The rbd device map command uses the krbd kernel module to map an image.
The rbd map command is an abbreviated form of the rbd device map command.
The rbd device unmap, or rbd unmap, command uses the krbd kernel module to unmap a mapped image.
The following example command maps the test RBD image in the rbd pool to the /dev/rbd0 device on the host client machine:
[root@node ~]# rbd map rbd/test
/dev/rbd0
A Ceph client system can use the mapped block device, called /dev/rbd0 in the example, like any other block device.
You can format it with a file system, mount it, and unmount it.
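For example, the following commands format and mount the mapped device. The XFS file system and the /mnt/test mount point are illustrative choices:

[root@node ~]# mkfs.xfs /dev/rbd0
[root@node ~]# mkdir /mnt/test
[root@node ~]# mount /dev/rbd0 /mnt/test
[root@node ~]# umount /mnt/test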
Two clients can map the same RBD image as a block device at the same time. This can be useful for high availability clustering for standby servers, but Red Hat recommends attaching a block device to one client at a time when the block device contains a normal, single-mount file system. Mounting a RADOS block device that contains a normal file system, such as XFS, on two or more clients at the same time can cause file-system corruption and data loss.
The rbd device list command, abbreviated rbd showmapped, lists the RBD images mapped on the machine.
[root@node ~]# rbd showmapped
id  pool  namespace  image  snap  device
0   rbd              test   -     /dev/rbd0
The rbd device unmap command, abbreviated rbd unmap, unmaps the RBD image from the client machine.
[root@node ~]# rbd unmap /dev/rbd0
The rbdmap service can automatically map and unmap RBD images to devices when booting and shutting down the system.
This service looks for the mapped images with their credentials in the /etc/ceph/rbdmap file.
The service mounts and unmounts the RBD images using their mount points as they appear in the /etc/fstab file.
The following steps configure rbdmap to persistently map and unmap an RBD image that already contains a file system (an example configuration follows the steps):

1. Create the mount point for the file system.
2. Create a single-line entry in the /etc/ceph/rbdmap RBD map file. This entry must specify the name of the RBD pool and image, and must also reference the Cephx user who has read/write permissions to access the image and the corresponding keyring file.
3. Ensure that the keyring file for the Cephx user exists on the client system.
4. Create an entry for the RBD in the /etc/fstab file on the client system. The name of the block device has the form /dev/rbd/pool_name/image_name. Specify the noauto mount option, because the rbdmap service, not the Linux fstab routines, handles mounting the file system.
5. Confirm that the block device mapping works. Use the rbdmap map command to mount the devices, and the rbdmap unmap command to unmount them.
6. Enable the rbdmap systemd service.
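As an example, assuming the vms/vm-disk1 image, the client.vms user, and a /mnt/vm-disk1 mount point from the earlier illustrative examples, the configuration might look like this:

[root@node ~]# cat /etc/ceph/rbdmap
# RbdDevice        Parameters
vms/vm-disk1       id=vms,keyring=/etc/ceph/ceph.client.vms.keyring
[root@node ~]# grep rbd /etc/fstab
/dev/rbd/vms/vm-disk1  /mnt/vm-disk1  xfs  noauto  0 0
[root@node ~]# rbdmap map
[root@node ~]# systemctl enable --now rbdmap.service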
Refer to rbdmap(8) for more information.
The librbd library provides direct access to RBD images for user space applications.
It inherits the capabilities of librados to map data blocks to objects in the Ceph Object Store, and implements the ability to access RBD images and create snapshots and clones.
Cloud and virtualization solutions, such as OpenStack and libvirt, use librbd to provide RBD images as block devices to cloud instances and the virtual machines that they manage.
For example, RBD images can store QEMU virtual machine images.
Using the RBD clone feature, virtualized containers can boot a virtual machine without copying the boot image.
The copy-on-write (COW) mechanism copies data from the parent to the clone when it writes to an unallocated object within the clone.
The copy-on-read (COR) mechanism copies data from the parent to the clone when it reads from an unallocated object within the clone.
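As an illustration of the clone feature, one traditional way to create a clone is from a protected snapshot of a parent image. The golden-image and vm-clone names here are hypothetical:

[root@node ~]# rbd snap create rbd/golden-image@base
[root@node ~]# rbd snap protect rbd/golden-image@base
[root@node ~]# rbd clone rbd/golden-image@base rbd/vm-clone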
The RBD cache is local to the client because it uses RAM on the machine that initiated the I/O requests.
For example, if you have Nova compute nodes in your Red Hat OpenStack Platform installation that use librbd for their virtual machines, the OpenStack client initiating the I/O request will use local RAM for its RBD cache.
RBD Caching Configurations
- With caching disabled, reads and writes go directly to the Ceph object store. The cluster acknowledges a write when the data is written and flushed to all relevant OSD journals.
- With write-back caching, given the unflushed cache bytes U and the maximum dirty cache bytes M, a write is acknowledged immediately when U < M; otherwise, the cache writes data back to disk until U < M.
- To force write-through mode, set the maximum dirty bytes (rbd_cache_max_dirty) to 0. The cluster then acknowledges a write when the data is written and flushed to all relevant OSD journals.
If using write-back mode, then the librbd library caches and acknowledges the I/O requests when it writes the data into the local cache of the server.
Consider write-through for strategic production servers to reduce the risk of data loss or file system corruption in case of a server failure.
Red Hat Ceph Storage offers the following set of RBD caching parameters:
Table 6.1. RBD Caching Parameters

| Parameter | Description | Default |
|---|---|---|
| rbd_cache | Enable RBD caching. Value=true\|false. | true |
| rbd_cache_size | Cache size in bytes per RBD image. | 32 MB |
| rbd_cache_max_dirty | Maximum dirty bytes allowed per RBD image. | 24 MB |
| rbd_cache_target_dirty | Dirty bytes that start a preemptive flush per RBD image. | 16 MB |
| rbd_cache_max_dirty_age | Maximum page age in seconds before flush. | 1 |
| rbd_cache_writethrough_until_flush | Start in write-through mode until performing the first flush. Value=true\|false. | true |
Run the ceph config set client parameter value command or the ceph config set global parameter value command to set a parameter for clients or globally, respectively.
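For example, the following commands force write-through mode for all clients by zeroing the maximum dirty bytes, as described above, and enable caching globally. The choice of parameters and values is illustrative:

[root@node ~]# ceph config set client rbd_cache_max_dirty 0
[root@node ~]# ceph config set global rbd_cache true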
When using librbd with Red Hat OpenStack Platform, create separate Cephx user names for OpenStack Cinder, Nova, and Glance.
By following this recommended practice, you can create different caching strategies based on the type of RBD images that your Red Hat OpenStack Platform environment accesses.
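For example, assuming Cephx users named client.glance and client.cinder for those services, each user can be given its own caching behavior. The user names and values are illustrative, not defaults:

[root@node ~]# ceph config set client.glance rbd_cache false
[root@node ~]# ceph config set client.cinder rbd_cache_writethrough_until_flush true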
RBD images are striped over objects and stored in a RADOS object store. Red Hat Ceph Storage provides parameters that define how these images are striped.
All objects in an RBD image have names that start with the value of the image's RBD Block Name Prefix field (block_name_prefix), which the rbd info command displays.
After this prefix, there is a period (.), followed by the object number.
The value for the object number field is a 12-character hexadecimal number.
[root@node ~]# rbd info rbdimage
rbd image 'rbdimage':
    size 10240 MB in 2560 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: 867cba5c2d68
    block_name_prefix: rbd_data.867cba5c2d68
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
    op_features:
    flags:
    create_timestamp: Thu Sep 23 18:54:35 2021
    access_timestamp: Thu Sep 23 18:54:35 2021
    modify_timestamp: Thu Sep 23 18:54:35 2021
[root@node ~]# rados -p rbd ls
rbd_object_map.d3d0d7d0b79e.0000000000000008
rbd_id.rbdimage
rbd_object_map.d42c1e0a1883
rbd_directory
rbd_children
rbd_info
rbd_header.d3d0d7d0b79e
rbd_header.d42c1e0a1883
rbd_object_map.d3d0d7d0b79e
rbd_trash
Ceph block devices allow storing data striped over multiple Object Storage Devices (OSD) in a Red Hat Ceph Storage cluster.
The image order determines the size of the objects used for the RBD image.
Image order defines a binary shift value based on the << (bitwise left shift) C operator.
This operator shifts the left operand bits by the right operand value.
For example, 1 << 2 = 4.
Decimal 1 is 0001 in binary, so the result of the 1 << 2 = 4 operation is 0100 in binary, which is 4 in decimal.
The value of the image order must be between 12 and 25, where 12 = 4 KiB and 13 = 8 KiB, for example.
The default image order is 22, resulting in 4 MiB objects.
You can override the default value by using the --order option of the rbd create command.
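For example, an order of 23 produces 8 MiB objects (1 << 23 = 8388608 bytes). The pool and image names here are illustrative:

[root@node ~]# rbd create --size 1G --order 23 rbd/image23
[root@node ~]# rbd info rbd/image23 | grep order
    order 23 (8 MiB objects)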
Each RBD image has three parameters associated with it:
image_format
The RBD image format version. The default value is 2, the most recent version. Version 1 has been deprecated and does not support features such as cloning and mirroring.
stripe_unit
The number of consecutive bytes stored in one object, object_size by default.
stripe_count
The number of RBD image objects that a stripe spans, 1 by default.
For RBD format 2 images, you can change the value of each of those parameters. The settings must align with the following equation:
stripe_unit * stripe_count = object_size
For example:
stripe_unit = 1048576, stripe_count = 4 for default 4 MiB objects
Remember that object_size must be no less than 4096 bytes and no greater than 33,554,432 bytes.
Use the --object-size option to specify this value when you create the RBD image.
The default object_size is 4194304 bytes (4 MiB).
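For example, the following command creates an image whose striping satisfies the equation above: a 2 MiB stripe unit times a stripe count of 4 equals the 8 MiB object size. The image name is illustrative:

[root@node ~]# rbd create --size 10G --object-size 8M --stripe-unit 2M --stripe-count 4 rbd/striped-image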
rbd(8) and rbdmap(8) man pages
For more information, refer to the Red Hat Ceph Storage 5 Block Device Guide at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/block_device_guide/index