After completing this section, you should be able to describe and compare replicated and erasure coded pools, and create and configure each pool type.
Pools are logical partitions for storing objects. Ceph clients write objects to pools.
Ceph clients require the cluster name (ceph by default) and a monitor address to connect to the cluster.
Ceph clients usually obtain this information from a Ceph configuration file, or specify it as command-line parameters.
The Ceph client creates an input/output context to a specific pool and the Ceph cluster uses the CRUSH algorithm to map these pools to placement groups, which are then mapped to specific OSDs.
The available pool types are replicated and erasure coded.
You decide which pool type to use based on your production use case and the type of workload.
A pool's type cannot be changed after creating the pool.
You must specify certain attributes when you create a pool:
The pool name, which must be unique in the cluster.
The pool type, which determines the protection mechanism the pool uses to ensure data durability.
The replicated type distributes multiple copies of each object across the cluster.
The erasure coded type splits each object into chunks, and distributes them along with additional erasure coded chunks to protect objects using an automatic error correction mechanism.
The number of placement groups (PGs) in the pool, which store their objects in a set of OSDs determined by the CRUSH algorithm.
Optionally, a CRUSH rule set that Ceph uses to identify which placement groups to use to store objects for the pool.
Change the osd_pool_default_pg_num and osd_pool_default_pgp_num configuration settings to set the default number of PGs for a pool.
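For example, the following minimal sketch sets a cluster-wide default of 32 PGs for new pools by using the centralized configuration database; the value 32 is only an illustrative assumption:
[ceph: root@node /]# ceph config set global osd_pool_default_pg_num 32
[ceph: root@node /]# ceph config set global osd_pool_default_pgp_num 32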
Ceph protects data within replicated pools by creating multiple copies of each object, called replicas. Ceph uses the CRUSH failure domain to determine the primary OSD of the acting set to store the data. The primary OSD then finds the current replica size for the pool and calculates the secondary OSDs to write the object to. After the primary OSD finishes writing the data and receives write acknowledgments from the secondary OSDs, it acknowledges a successful write operation to the Ceph client. This protects the data in the object if one or more OSDs fail.
Use the following command to create a replicated pool.
[ceph: root@node /]# ceph osd pool create pool-name pg-num pgp-num replicated crush-rule-name
Where:
pool-name is the name of the new pool.
pg-num is the total number of placement groups (PGs) for this pool.
pgp-num is the effective number of placement groups for this pool.
Set this value equal to pg-num.
replicated specifies that this is a replicated pool, and is the default if not included in the command.
crush-rule-name is the name of the CRUSH rule set you want to use for this pool.
The osd_pool_default_crush_rule configuration parameter sets the default value.
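For example, the following sketch creates a hypothetical replicated pool named mypool with 32 placement groups; the pool name and PG count are assumptions for illustration:
[ceph: root@node /]# ceph osd pool create mypool 32 32 replicated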
The number of placement groups in a pool can be adjusted after it is initially configured.
If pg_num and pgp_num are set to the same number, then any future adjustments to pg_num automatically adjusts the value of pgp_num.
The adjustment to pgp_num triggers the movement of PGs across OSDs, if needed, to implement the change.
Define a new number of PGs in a pool by using the following command.
[ceph: root@node /]# ceph osd pool set my_pool pg_num 32
set pool 6 pg_num to 32
When you create a pool with the ceph osd pool create command, you do not specify the number of replicas (size).
The osd_pool_default_size configuration parameter defines the number of replicas, and defaults to a value of 3.
[ceph: root@node /]# ceph config get mon osd_pool_default_size
3
Change the size of a pool with the ceph osd pool set pool-name size number-of-replicas command. Alternatively, update the osd_pool_default_size configuration setting to change the default.
The osd_pool_default_min_size parameter sets the number of copies of an object that must be available to accept I/O requests.
The default value is 2.
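As a sketch, assuming a hypothetical pool named mypool that keeps the default of three replicas, the following command requires at least two available copies before the pool accepts I/O requests:
[ceph: root@node /]# ceph osd pool set mypool min_size 2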
An erasure coded pool uses erasure coding instead of replication to protect object data.
Objects stored in an erasure coded pool are divided into a number of data chunks which are stored in separate OSDs.
Coding chunks are calculated from the data chunks and are stored on different OSDs.
The coding chunks are used to reconstruct the object's data if an OSD fails.
In erasure coded pools, the primary OSD receives the write operation, encodes the payload into k+m chunks, and sends the chunks to the secondary OSDs.
Erasure coded pools use this method to protect their objects and, unlike replicated pools, do not rely on storing multiple copies of each object.
To summarize how erasure coded pools work:
Each object's data is divided into k data chunks.
m coding chunks are calculated.
The coding chunk size is the same as the data chunk size.
The object is stored on a total of k + m OSDs.
Erasure coding uses storage capacity more efficiently than replication. Replicated pools maintain n copies of an object, whereas erasure coding maintains only k + m chunks. For example, replicated pools with 3 copies use 3 times the storage space. Erasure coded pools with k=4 and m=2 use only 1.5 times the storage space.
Red Hat supports the following k+m values which result in the corresponding usable-to-raw ratio:
4+2 (1:1.5 ratio)
8+3 (1:1.375 ratio)
8+4 (1:1.5 ratio)
The formula for calculating the usable capacity of an erasure coded pool is nOSD * k / (k+m) * OSD Size.
For example, if you have 64 OSDs of 4 TB each (256 TB of raw capacity), with k=8 and m=4, then the usable capacity is 64 * 8 / (8+4) * 4 = 170.67 TB.
Divide the raw capacity by the usable capacity to get the raw-to-usable ratio.
256 TB / 170.67 TB equals a ratio of 1.5.
Erasure coded pools require less storage than replicated pools to obtain a similar level of data protection, which can reduce the cost and size of the storage cluster. However, calculating coding chunks adds CPU processing and memory overhead for erasure coded pools, reducing overall performance.
Use the following command to create an erasure coded pool.
[ceph: root@node /]# ceph osd pool create pool-name pg-num pgp-num \
erasure erasure-code-profile crush-rule-name
Where:
pool-name is the name of the new pool.
pg-num is the total number of placement groups (PGs) for this pool.
pgp-num is the effective number of placement groups for this pool.
Normally, this should be equal to the total number of placement groups.
erasure specifies that this is an erasure coded pool.
erasure-code-profile is the name of the profile to use.
You can create new profiles with the ceph osd erasure-code-profile set command.
A profile defines the k and m values and the erasure code plug-in to use.
By default, Ceph uses the default profile.
crush-rule-name is the name of the CRUSH rule set to use for this pool.
If not set, Ceph uses the one defined in the erasure code profile.
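For example, this sketch creates a hypothetical erasure coded pool named myecpool that uses the default profile; the pool name and PG count are assumptions:
[ceph: root@node /]# ceph osd pool create myecpool 32 32 erasure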
You can configure placement group autoscaling on a pool.
Autoscaling allows the cluster to calculate the number of placement groups and to choose appropriate pg_num values automatically.
Autoscaling is enabled by default in Red Hat Ceph Storage 5.
Every pool in the cluster has a pg_autoscale_mode option with a value of on, off, or warn.
on: Enables automated adjustments of the PG count for the pool.
off: Disables PG autoscaling for the pool.
warn: Raises a health alert and changes the cluster health status to HEALTH_WARN when the PG count needs adjustment.
This example enables the pg_autoscaler module on the Ceph MGR nodes and sets the autoscaling mode to on for a pool:
[ceph: root@node /]# ceph mgr module enable pg_autoscaler
module 'pg_autoscaler' is already enabled (always-on)
[ceph: root@node /]# ceph osd pool set pool-name pg_autoscale_mode on
set pool 7 pg_autoscale_mode to on
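You can review the autoscaler's current recommendations and the effective PG counts for each pool with the autoscale status command:
[ceph: root@node /]# ceph osd pool autoscale-status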
Erasure coded pools cannot use the Object Map feature. An object map is an index that tracks where the blocks of an RBD image are allocated. Having an object map improves the performance of resize, export, flatten, and other operations.
An erasure code profile configures the number of data chunks and coding chunks that your erasure-coded pool uses to store objects, and which erasure coding plug-in and algorithm to use.
Create profiles to define different sets of erasure coding parameters.
Ceph automatically creates the default profile during installation.
This profile is configured to divide objects into two data chunks and one coding chunk.
Use the following command to create a new profile.
[ceph: root@node /]# ceph osd erasure-code-profile set profile-name arguments
The following arguments are available:
k
The number of data chunks that are split across OSDs. The default value is 2.
m
The number of OSDs that can fail before the data becomes unavailable. The default value is 1.
directory
This optional parameter is the location of the plug-in library. The default value is /usr/lib64/ceph/erasure-code.
plugin
This optional parameter defines the erasure coding algorithm to use.
crush-failure-domain
This optional parameter defines the CRUSH failure domain, which controls chunk placement.
By default, it is set to host, which ensures that an object's chunks are placed on OSDs on different hosts.
If set to osd, then an object's chunks can be placed on OSDs on the same host.
Setting the failure domain to osd is less resilient because all OSDs on a host will fail if the host fails.
You can define other failure domains, such as rack, to ensure that chunks are placed on OSDs on hosts in different data center racks.
crush-device-class
This optional parameter selects only OSDs backed by devices of this class for the pool.
Typical classes might include hdd, ssd, or nvme.
crush-root
This optional parameter sets the root node of the CRUSH rule set.
key=value
Plug-ins might have key-value parameters unique to that plug-in.
technique
Each plug-in provides a different set of techniques that implement different algorithms.
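For example, the following sketch creates a hypothetical profile named myprofile with four data chunks, two coding chunks, and a rack failure domain; the profile name and values are assumptions for illustration:
[ceph: root@node /]# ceph osd erasure-code-profile set myprofile k=4 m=2 crush-failure-domain=rack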
You cannot change the erasure code profile of an existing pool.
Use the ceph osd erasure-code-profile ls command to list existing profiles.
Use the ceph osd erasure-code-profile get command to view the details of an existing profile.
Use the ceph osd erasure-code-profile rm command to remove an existing profile.
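A short sketch of these commands, assuming the hypothetical myprofile from the earlier example:
[ceph: root@node /]# ceph osd erasure-code-profile ls
[ceph: root@node /]# ceph osd erasure-code-profile get default
[ceph: root@node /]# ceph osd erasure-code-profile rm myprofile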
You can view and modify existing pools and change pool configuration settings.
Rename a pool by using the ceph osd pool rename command.
This does not affect the data stored in the pool.
If you rename a pool and you have per-pool capabilities for an authenticated user, you must update the user's capabilities with the new pool name.
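For example, a sketch renaming a hypothetical pool; both pool names are assumptions:
[ceph: root@node /]# ceph osd pool rename mypool newpool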
Delete a pool by using the ceph osd pool delete command.
Deleting a pool removes all data in the pool and is not reversible.
You must set mon_allow_pool_delete to true to enable pool deletion.
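A sketch of the full sequence, assuming a hypothetical pool named newpool; note that the delete command requires the pool name twice plus a confirmation flag:
[ceph: root@node /]# ceph config set mon mon_allow_pool_delete true
[ceph: root@node /]# ceph osd pool delete newpool newpool --yes-i-really-really-mean-it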
Prevent the deletion of a specific pool by using the ceph osd pool set pool-name nodelete true command. Set nodelete back to false to allow deletion of the pool.
View and modify pool configuration settings by using the ceph osd pool set and ceph osd pool get commands.
List pools and pool configuration settings by using the ceph osd lspools and ceph osd pool ls detail commands.
List pools usage and performance statistics by using the ceph df and ceph osd pool stats commands.
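For example, the following sketch inspects a hypothetical pool named mypool and the cluster's overall pool usage; the pool name is an assumption:
[ceph: root@node /]# ceph osd pool get mypool size
[ceph: root@node /]# ceph osd pool ls detail
[ceph: root@node /]# ceph df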
Enable Ceph applications for a pool by using the ceph osd pool application enable command.
Application types are cephfs for Ceph File System, rbd for Ceph Block Device, and rgw for RADOS Gateway.
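For example, a sketch enabling the rbd application on a hypothetical pool named mypool:
[ceph: root@node /]# ceph osd pool application enable mypool rbd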
Set pool quotas to limit the maximum number of bytes or the maximum number of objects that can be stored in the pool by using the ceph osd pool set-quota command.
When a pool reaches the configured quota, operations are blocked. You can remove a quota by setting its value to 0.
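For example, a sketch that limits a hypothetical pool named mypool to 10,000 objects and 10 GiB (10737418240 bytes), and then clears the object quota by setting it to 0; the pool name and values are assumptions:
[ceph: root@node /]# ceph osd pool set-quota mypool max_objects 10000
[ceph: root@node /]# ceph osd pool set-quota mypool max_bytes 10737418240
[ceph: root@node /]# ceph osd pool set-quota mypool max_objects 0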
Configure these settings to protect pools against accidental reconfiguration:
osd_pool_default_flag_nodelete
Sets the default value of the nodelete flag on pools.
Set the value to true to prevent pool deletion.
osd_pool_default_flag_nopgchange
Sets the default value of the nopgchange flag on pools.
Set the value to true to prevent changes to pg_num and pgp_num.
osd_pool_default_flag_nosizechange
Sets the default value of the nosizechange flag on pools.
Set the value to true to prevent pool size changes.
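For example, a sketch that sets one of these defaults in the centralized configuration database:
[ceph: root@node /]# ceph config set global osd_pool_default_flag_nodelete true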
A namespace is a logical group of objects in a pool. Access to a pool can be limited so that a user can only store or retrieve objects in a particular namespace. One advantage of namespaces is to restrict user access to part of a pool.
Namespaces are currently only supported for applications that directly use librados.
RBD and Ceph Object Gateway clients do not currently support this feature.
Use the rados command to store and retrieve objects from a pool.
Use the -p pool option to specify the pool, and the -N name or --namespace=name option to specify the namespace to use.
The following example stores the /etc/services file as the srv object in the mytestpool pool, under the system namespace.
[ceph: root@node /]# rados -p mytestpool -N system put srv /etc/services
[ceph: root@node /]# rados -p mytestpool -N system ls
srv
List all the objects in all namespaces in a pool by using the --all option.
To obtain JSON formatted output, add the --format=json-pretty option.
The following example lists the objects in the mytestpool pool.
The mytest object has an empty namespace.
The other objects belong to the system or the flowers namespaces.
[ceph: root@node /]# rados -p mytestpool --all ls
system	srv
flowers	anemone
flowers	iris
system	magic
flowers	rose
	mytest
system	networks
[ceph: root@node /]# rados -p mytestpool --all ls --format=json-pretty
[
    {
        "name": "srv",
        "namespace": "system"
    },
    {
        "name": "anemone",
        "namespace": "flowers"
    },
    {
        "name": "iris",
        "namespace": "flowers"
    },
    {
        "name": "magic",
        "namespace": "system"
    },
    {
        "name": "rose",
        "namespace": "flowers"
    },
    {
        "name": "mytest",
        "namespace": ""
    },
    {
        "name": "networks",
        "namespace": "system"
    }
]
For more information, refer to the Pools and Erasure Code Pools chapters in the Red Hat Ceph Storage 5 Storage Strategies Guide at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/storage_strategies_guide