Abstract
| Goal | Configure Red Hat Ceph Storage to provide file storage for clients using the Ceph File System (CephFS). |
Providing File Storage with CephFS |
After completing this section, you should be able to provide file storage on the Ceph cluster by deploying the Ceph File System (CephFS).
The Ceph File System (CephFS) is a POSIX-compliant file system that is built on top of RADOS, Ceph's distributed object store.
File-based storage organizes your data as a traditional file system, with a directory tree hierarchy.
Implementing the Ceph File System requires a running Ceph storage cluster and at least one Ceph Metadata Server (MDS). The MDS manages CephFS metadata separately from the file data, which reduces complexity and improves reliability.
Similar to RBD and RGW, CephFS is implemented as a native interface to librados.
File-based storage organizes your data as a traditional file system. Data is saved as files with a name and associated metadata, such as modification time stamps, an owner, and access permissions. File-based storage uses a directory tree hierarchy to organize how files are stored.
With object-based storage, you can store arbitrary data and metadata as a unit that is labeled with a unique identifier in a flat storage pool. Rather than accessing data as blocks or in a file-system hierarchy, you use an API to store and retrieve objects. Fundamentally, the Red Hat Ceph Storage RADOS cluster is an object store.
The Metadata Server (MDS) manages metadata for CephFS clients. This daemon provides information that CephFS clients need to access RADOS objects, such as providing file locations within the file-system tree. MDS manages the directory hierarchy and stores file metadata, such as the owner, time stamps, and permission modes, in a RADOS cluster. MDS is also responsible for access caching and managing client caches to maintain cache coherence.
If there are not enough MDS daemons to satisfy the number of configured standby daemons, then the Ceph cluster displays a WARN health status.
The recommended solution is to deploy more MDS daemons so that each file system has its wanted number of standby daemons.
However, a temporary solution is to set the number of wanted standby daemons to 0, which disables the Ceph MDS standby check, by using the ceph fs set fs-name standby_count_wanted 0 command.
CephFS clients first contact a MON to authenticate and retrieve the cluster map. Then, the client queries an active MDS for file metadata. The client uses the metadata to access the objects that comprise the requested file or directory by communicating directly with the OSDs.
MDS features and configuration options are described in the following list:
MDS ranks define how the metadata workload is distributed over the MDS daemons.
The number of ranks, which is defined by the max_mds configuration setting, is the maximum number of MDS daemons that can be active at a time.
MDS daemons start without a rank and the MON daemon is responsible for assigning them a rank.
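For example, assuming a hypothetical file system named mycephfs, you could allow two active MDS daemons by raising the number of ranks:

[ceph: root@server /]# ceph fs set mycephfs max_mds 2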
CephFS subvolumes are an abstraction for independent CephFS file system directory trees. When creating subvolumes, you can specify more fine-grained rights management, such as the UID, GID, file mode, size, and the subvolume group for your subvolume. Subvolume groups are abstractions at a directory level across a set of subvolumes.
You can create snapshots of subvolumes, but Red Hat Ceph Storage 5 does not support creating snapshots of subvolume groups. You can list and remove existing snapshots of subvolume groups.
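As a minimal sketch, assuming a hypothetical volume named mycephfs, a subvolume group named mygroup, and a subvolume named myvol, creating and snapshotting a subvolume could look like this:

[ceph: root@server /]# ceph fs subvolumegroup create mycephfs mygroup
[ceph: root@server /]# ceph fs subvolume create mycephfs myvol --group_name mygroup \
--size 10737418240 --mode 755
[ceph: root@server /]# ceph fs subvolume snapshot create mycephfs myvol snap1 --group_name mygroup

The --size value is expressed in bytes; 10737418240 bytes is 10 GiB.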
Configure your CephFS file system to prefer one MDS over another MDS.
For example, you can configure the file system to prefer an MDS that runs on a faster server over an MDS that runs on an older server.
This file system affinity is configured through the mds_join_fs option.
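For example, assuming an MDS daemon named mds.serverc (a hypothetical daemon name) that runs on newer hardware, you could pin it to the mycephfs file system:

[ceph: root@server /]# ceph config set mds.serverc mds_join_fs mycephfs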
Limit the size of the MDS cache by limiting the maximum memory to use with the mds_cache_memory_limit option, or by defining the maximum number of inodes with the mds_cache_size option.
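For example, you could cap the MDS cache at 4 GiB of memory, or at 150,000 inodes (hypothetical values; the memory limit is expressed in bytes):

[ceph: root@server /]# ceph config set mds mds_cache_memory_limit 4294967296
[ceph: root@server /]# ceph config set mds mds_cache_size 150000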
Configure your CephFS file system to restrict the number of bytes or files that are stored by using quotas.
Both the FUSE and kernel clients support checking quotas when mounting a CephFS file system.
These clients are also responsible for stopping writing data to the CephFS file system when the user reaches the quota limit.
Use the setfattr command's ceph.quota.max_bytes and ceph.quota.max_files options to set the limits.
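For example, assuming the file system is mounted at /mnt/cephfs on a client (a hypothetical mount point), you could limit a directory to 100 GiB and 10,000 files:

[root@node ~]# setfattr -n ceph.quota.max_bytes -v 107374182400 /mnt/cephfs/dir
[root@node ~]# setfattr -n ceph.quota.max_files -v 10000 /mnt/cephfs/dir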
Red Hat Ceph Storage 5 removes several limitations that were present in earlier versions.
Red Hat Ceph Storage 5 supports more than one active MDS daemon in a cluster, which can increase metadata performance. To remain highly available, you can configure additional standby MDS daemons to take over from any active MDS daemon that fails.
Red Hat Ceph Storage 5 supports more than one CephFS file system in a cluster. Deploying more than one CephFS file system requires running more MDS daemons.
To implement a CephFS file system, create the required pools, create the CephFS file system, deploy the MDS daemons, and then mount the file system.
You can manually create the pools, create the CephFS file system, and deploy the MDS daemons, or use the ceph fs volume create command, which does all these steps automatically.
The first option gives the system administrator more control over the process, but with more steps than the simpler ceph fs volume create command.
Use the ceph fs volume create command to directly create the CephFS volume.
This command creates the pools that are associated with the CephFS file system, creates the CephFS volume, and starts the MDS service on the hosts.

[ceph: root@server /]# ceph fs volume create fs-name \
--placement="number-of-hosts list-of-hosts"

For more control over the deployment process, manually create the pools that are associated with the CephFS file system, start the MDS service on the hosts, and then create the CephFS file system.
A CephFS file system requires at least two pools, one to store CephFS data, and another to store CephFS metadata.
The default names for these two pools are cephfs_data and cephfs_metadata.
To create a CephFS file system, first create the two pools.
[ceph: root@server /]# ceph osd pool create cephfs_data
[ceph: root@server /]# ceph osd pool create cephfs_metadata
This example creates two pools with standard parameters. Because the metadata pool stores file location information, consider a higher replication level for this pool to avoid data errors that render your data inaccessible.
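For example, you could raise the metadata pool to four replicas (a sketch; choose a value that matches your failure domain and durability requirements):

[ceph: root@server /]# ceph osd pool set cephfs_metadata size 4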
By default, Ceph uses replicated data pools.
However, erasure-coded data pools are now also supported for CephFS file systems.
Create an erasure-coded pool with the ceph osd pool command:
[ceph: root@server /]# ceph osd pool create pool-name erasure

When the data and metadata pools are available, use the ceph fs new command to create the file system, as follows:

[ceph: root@server /]# ceph fs new fs-name metadata-pool data-pool

To add an existing erasure-coded pool as a data pool in your CephFS file system, use the ceph fs add_data_pool command:
[ceph: root@server /]# ceph fs add_data_pool fs-name data-pool
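Note that CephFS can use an erasure-coded pool as a data pool only when overwrites are enabled on that pool. If overwrites are not already enabled, you can turn them on before adding the pool:

[ceph: root@server /]# ceph osd pool set pool-name allow_ec_overwrites true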
You can then deploy the MDS service:

[ceph: root@server /]# ceph orch apply mds fs-name \
--placement="number-of-hosts list-of-hosts"

Alternatively, use the Ceph Orchestrator to deploy the MDS service with a service specification. First, manually create the two required pools. Then, create a YAML file with the service details:
service_type: mds
service_id: fs-name
placement:
  hosts:
  - host-name-1
  - host-name-2
  - ...

Use the YAML service specification to deploy the MDS service with the ceph orch apply command:
[ceph: root@server /]# ceph orch apply -i file-name.yml

Finally, create the CephFS file system with the ceph fs new command.
You can mount CephFS file systems with either of the available clients:
The kernel client
The FUSE client
The kernel client requires a Linux kernel version 4 or later, which is available starting with RHEL 8. For previous kernel versions, use the FUSE client instead.
To mount a CephFS-based file system with either client, verify the following prerequisites on the client host.
Install the ceph-common package.
For the FUSE client, also install the ceph-fuse package.
Verify that the Ceph configuration file exists (/etc/ceph/ceph.conf by default).
Authorize the client to access the CephFS file system.
Extract the new authorization key with the ceph auth get command and copy it to the /etc/ceph folder on the client host.
When using the FUSE client as a non-root user, add user_allow_other in the /etc/fuse.conf configuration file.
When the prerequisites are met, use the FUSE client to mount and unmount a CephFS file system:
[root@node ~]# ceph-fuse [mount-point] [options]

To provide the key ring for a specific user, use the --id option.
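For example, to mount the file system at /mnt/cephfs (a hypothetical mount point) with the credentials of the user1 client:

[root@node ~]# ceph-fuse --id user1 /mnt/cephfs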
You must authorize the client to access the CephFS file system, by using the ceph fs authorize command:
[ceph: root@server /]# ceph fs authorize fs-name client-name path permissions

With the ceph fs authorize command, you can provide fine-grained access control for different users and folders in the CephFS file system.
You can set different options for folders in a CephFS file system:
r: Read access to the specified folder.
Read access is also granted to the subfolders, if no other restriction is specified.
w: Write access to the specified folder.
Write access is also granted to the subfolders, if no other restriction is specified.
p: Clients require the p option in addition to r and w capabilities to use layouts or quotas.
s: Clients require the s option in addition to r and w capabilities to create snapshots.
This example allows one user to read the root folder, and also provides read, write, and snapshot permissions to the /directory folder.
[ceph: root@server /]# ceph fs authorize mycephfs client.user / r /directory rws

By default, the CephFS FUSE client mounts the root directory (/) of the accessed file system.
You can mount a specific directory with the ceph-fuse -r directory command.
When you try to mount a specific directory, this operation fails if the directory does not exist in the CephFS volume.
When more than one CephFS file system is configured, the CephFS FUSE client mounts the default CephFS file system.
To use a different file system, use the --client_fs option.
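For example, assuming a second file system named otherfs (a hypothetical name), the mount command could look like this:

[root@node ~]# ceph-fuse --id user1 --client_fs otherfs /mnt/otherfs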
To persistently mount your CephFS file system by using the FUSE client, you can add the following entry to the /etc/fstab file:
host-name:port mount-point fuse.ceph ceph.id=myuser,ceph.client_mountpoint=mountpoint,_netdev 0 0
Use the umount command to unmount the file system:
[root@node ~]# umount mount-point

When using the CephFS kernel client, use the following command to mount the file system:
[root@node ~]# mount -t ceph [device]:[path] [mount-point] \
-o [key-value] [other-options]

You must authorize the client to access the CephFS file system, with the ceph fs authorize command.
Extract the client key with the ceph auth get command, and then copy the key to the /etc/ceph folder on the client host.
With the CephFS kernel client, you can mount a specific subdirectory from a CephFS file system.
This example mounts the directory /dir1/dir2 from the root of a CephFS file system:

[root@node ~]# mount -t ceph mon1:/dir1/dir2 mount-point

You can specify a list of several comma-separated MONs to mount the device. The standard port (6789) is the default, or you can add a colon and a nonstandard port number after the name of each MON. The recommended practice is to specify more than one MON, in case some are offline when the file system is mounted.
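For example, assuming a file system named mycephfs, a client named user1, and a secret key file at /etc/ceph/user1.secret (hypothetical values), a mount command that lists three MONs could look like this:

[root@node ~]# mount -t ceph mon1,mon2,mon3:/ /mnt/cephfs \
-o name=user1,secretfile=/etc/ceph/user1.secret,fs=mycephfs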
These other options are available when using the CephFS kernel client:
Table 10.1. CephFS Kernel Client Mount Options

| Option name | Description |
|---|---|
| name= | The Cephx client ID to use. The default is guest. |
| fs= | The name of the CephFS file system to mount. When no value is provided, the default file system is mounted. |
| secret= | The value of the secret key for this client. |
| secretfile= | The path to the file with the secret key for this client. |
| rsize= | The maximum read size in bytes. |
| wsize= | The maximum write size in bytes. The default is none. |
To persistently mount your CephFS file system by using the kernel client, you can add the following entry to the /etc/fstab file:
mon1,mon2:/ mount_point ceph name=user1,secretfile=/root/secret,_netdev 0 0

Use the umount command to unmount the file system:

[root@node ~]# umount mount_point

You can remove a CephFS file system if needed. However, first back up all your data, because removing your CephFS file system destroys all the stored data on that file system.
The procedure to remove a CephFS is first to mark it as down, as follows:
[ceph: root@server /]# ceph fs set fs-name down true

Then, you can remove it with the following command:

[ceph: root@server /]# ceph fs rm fs-name --yes-i-really-mean-it

Red Hat Ceph Storage 5 provides access to Ceph storage from an NFS client with NFS Ganesha. NFS Ganesha is a user space NFS file server that supports multiple protocols, such as NFSv3, NFSv4.0, NFSv4.1, and pNFS. NFS Ganesha uses a File System Abstraction Layer (FSAL) architecture to support and share files from multiple file systems or lower-level storage, such as Ceph, Samba, Gluster, and Linux file systems such as XFS.
In Red Hat Ceph Storage, NFS Ganesha shares files by using the NFSv4.0 or later protocol. This requirement is necessary for the CephFS client, the OpenStack Manila File Sharing service, and other Red Hat products that are configured to access the NFS Ganesha service to function correctly.
The following list outlines the advantages of a user space NFS server:
The server does not implement system calls.
Caching is defined and used more efficiently.
Service failover and restarting are faster and easier to implement.
User space services can be clustered easily for high availability.
You can use distributed lock management (DLM) to allow multiple client protocols.
Debugging of server issues is simpler, because you do not need to work with kernel dumps.
Resource management and performance monitoring are simpler.
You can deploy NFS Ganesha in an active-active configuration on top of an existing CephFS file system through the ingress service. The main goal of this active-active configuration is for load balancing, and scaling to many instances that handle higher loads. Thus, if one node fails, then the cluster redirects all the workload to the rest of the nodes.
System administrators can deploy the NFS Ganesha daemons via the CLI or manage them automatically if either the Cephadm or Rook orchestrators are enabled.
The following list outlines the advantages to having an ingress service on top of an existing NFS service:
A virtual IP to access the NFS server.
Migration of the NFS service to another node if one node fails, providing shorter failover times.
Load balancing across the NFS nodes.
The ingress implementation is not yet completely developed. It can deploy multiple Ganesha instances and balance the load between them, but failover between hosts is not yet fully implemented. This feature is expected to be available in future releases.
You can use multiple active-active NFS Ganesha services with Pacemaker for high availability. The Pacemaker component is responsible for all cluster-related activities, such as monitoring cluster membership, managing the services and resources, and fencing cluster members.
As prerequisites, create a CephFS file system and install the nfs-ganesha, nfs-ganesha-ceph, nfs-ganesha-rados-grace, and nfs-ganesha-rados-urls packages on the Ceph MGR nodes.
After the prerequisites are satisfied, enable the Ceph MGR NFS module:
[ceph: root@server /]# ceph mgr module enable nfs

Then, create the NFS Ganesha cluster:

[ceph: root@server /]# ceph nfs cluster create cluster-name "node-list"

The node-list is a comma-separated list of the nodes where the daemon containers are deployed.
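For example, assuming a hypothetical cluster named mynfs that runs its daemons on the serverc and serverd hosts:

[ceph: root@server /]# ceph nfs cluster create mynfs "serverc,serverd"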
Next, export the CephFS file system:
[ceph: root@server /]# ceph nfs export create cephfs fs-name \
cluster-name pseudo-path

The pseudo-path parameter is the pseudo root path.
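For example, to export the hypothetical mycephfs file system through the mynfs cluster under the /ceph pseudo root:

[ceph: root@server /]# ceph nfs export create cephfs mycephfs mynfs /ceph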
Finally, mount the exported CephFS file system on a client node.
[root@node ~]# mount -t nfs -o port=ganesha-port node-name:pseudo-path path

CephFS shared file systems require at least one active MDS service for correct operation, and at least one standby MDS to ensure high availability. The MDS autoscaler module ensures the availability of enough MDS daemons.
This module monitors the number of ranks and the number of standby daemons, and adjusts the number of MDS daemons that the orchestrator spawns.
To enable the MDS autoscaler module, use the following command:
[ceph: root@server /]# ceph mgr module enable mds_autoscaler

Red Hat Ceph Storage 5 supports CephFS multi-site configuration for geo-replication.
Thus, you can replicate the CephFS file system on another Red Hat Ceph Storage cluster.
With this feature, you can fail over to the secondary CephFS file system and restart the applications that use it.
The CephFS file system mirroring feature requires the cephfs-mirror package.
Both the source and target clusters must use Red Hat Ceph Storage version 5 or later.
The CephFS mirroring feature is snapshot-based. The first snapshot synchronization requires bulk transfer of the data from the source cluster to the remote cluster. Then, for the following synchronizations, the mirror daemon identifies the modified files between local snapshots and synchronizes those files in the remote cluster. This synchronization method is faster than other methods that require bulk transfer of the data to the remote cluster, because it does not need to query the remote cluster (file differences are calculated between local snapshots) and needs only to transfer the updated files to the remote cluster.
The CephFS mirroring module is disabled by default.
To configure a snapshot mirror for CephFS, you must enable the mirroring module on the source and remote clusters:
[ceph: root@server /]# ceph mgr module enable mirroring

Then, you can deploy the CephFS mirroring daemon on the source cluster:

[ceph: root@source /]# ceph orch apply cephfs-mirror [node-name]

The previous command deploys the CephFS mirroring daemon on node-name and creates the Ceph user cephfs-mirror.
For each CephFS peer, you must create a user on the target cluster:
[ceph: root@target /]# ceph fs authorize fs-name client-name / rwps

Then, you can enable mirroring on the source cluster. Mirroring must be enabled for a specific file system.

[ceph: root@source /]# ceph fs snapshot mirror enable fs-name

The next step is to prepare the target peer. You can create the peer bootstrap in the target node with the following command:
[ceph: root@target /]# ceph fs snapshot mirror peer_bootstrap create \
fs-name peer-name site-name

You can use the site-name string to identify the target storage cluster. After you create the target peer, import the bootstrap token that it generates into the source cluster:
[ceph: root@source /]# ceph fs snapshot mirror peer_bootstrap import \
fs-name bootstrap-token

Finally, configure a directory for snapshot mirroring on the source cluster with the following command:
[ceph: root@source /]# ceph fs snapshot mirror add fs-name path
mount.ceph(8), ceph-fuse(8), ceph(8), rados(8), and cephfs-mirror(8) man pages
For more information, refer to the Red Hat Ceph Storage 5 File System Guide at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/file_system_guide/index
For more information regarding CephFS deployment, refer to the Deployment of the Ceph File System chapter in the Red Hat Ceph Storage 5 File System Guide at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/file_system_guide/index#deployment-of-the-ceph-file-system
For more information regarding CephFS over the NFS protocol, refer to the Exporting Ceph File System Namespaces over the NFS Protocol chapter in the Red Hat Ceph Storage 5 File System Guide at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/file_system_guide/index#exporting-ceph-file-system-namespaces-over-the-nfs-protocol_fs