Abstract
| Goal | Configure Red Hat Ceph Storage to provide file storage for clients using the Ceph File System (CephFS). |
Providing File Storage with CephFS |
After completing this section, you should be able to provide file storage on the Ceph cluster by deploying the Ceph File System (CephFS).
The Ceph File System (CephFS) is a POSIX-compliant file system that is built on top of RADOS, Ceph's distributed object store.
File-based storage organizes your data as a traditional file system, with a directory tree hierarchy.
Implementing the Ceph File System requires a running Ceph storage cluster and at least one Ceph Metadata Server (MDS). The MDS manages CephFS metadata separately from the file data, which reduces complexity and improves reliability.
Similar to RBD and RGW, CephFS is implemented as a native interface to librados.
File-based storage organizes your data as a traditional file system. Data is saved as files with a name and associated metadata, such as modification time stamps, an owner, and access permissions. File-based storage uses a directory tree hierarchy to organize how files are stored.
With object-based storage, you can store arbitrary data and metadata as a unit that is labeled with a unique identifier in a flat storage pool. Rather than accessing data as blocks or in a file-system hierarchy, you use an API to store and retrieve objects. Fundamentally, the Red Hat Ceph Storage RADOS cluster is an object store.
The Metadata Server (MDS) manages metadata for CephFS clients. This daemon provides information that CephFS clients need to access RADOS objects, such as providing file locations within the file-system tree. MDS manages the directory hierarchy and stores file metadata, such as the owner, time stamps, and permission modes, in a RADOS cluster. MDS is also responsible for access caching and managing client caches to maintain cache coherence.
If there are not enough MDS daemons to satisfy the number of configured standby daemons, then the Ceph cluster displays a WARN health status.
The recommended solution is to deploy more MDS daemons so that each file system has its wanted number of standby daemons.
However, a temporary solution is to set the number of wanted standby daemons to 0, which disables the Ceph MDS standby check, by using the ceph fs set fs-name standby_count_wanted 0 command.
CephFS clients first contact a MON to authenticate and retrieve the cluster map. Then, the client queries an active MDS for file metadata. The client uses the metadata to access the objects that comprise the requested file or directory by communicating directly with the OSDs.
MDS features and configuration options are described in the following list:
MDS ranks define how the metadata workload is distributed over the MDS daemons.
The number of ranks, which is defined by the max_mds configuration setting, is the maximum number of MDS daemons that can be active at a time.
MDS daemons start without a rank and the MON daemon is responsible for assigning them a rank.
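For example, assuming a hypothetical file system named mycephfs, you could allow two active MDS daemons by raising the number of ranks:

[ceph: root@server /]# ceph fs set mycephfs max_mds 2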
CephFS subvolumes are an abstraction for independent CephFS file system directory trees. When creating subvolumes, you can specify more fine-grained rights management, such as the UID, GID, file mode, size, and the subvolume group for your subvolume. Subvolume groups are abstractions at a directory level across a set of subvolumes.
You can create snapshots of subvolumes, but Red Hat Ceph Storage 5 does not support creating snapshots of subvolume groups. You can list and remove existing snapshots of subvolume groups.
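As a minimal sketch, assuming a hypothetical volume named mycephfs, a subvolume group named mygroup, and a subvolume named myvol, creating and snapshotting a subvolume could look like this:

[ceph: root@server /]# ceph fs subvolumegroup create mycephfs mygroup
[ceph: root@server /]# ceph fs subvolume create mycephfs myvol --group_name mygroup \
--size 10737418240 --mode 755
[ceph: root@server /]# ceph fs subvolume snapshot create mycephfs myvol snap1 --group_name mygroup

The --size value is expressed in bytes; 10737418240 bytes is 10 GiB.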
Configure your CephFS file system to prefer one MDS over another MDS.
For example, you can configure the file system to prefer an MDS that runs on a faster server over an MDS that runs on an older server.
This file system affinity is configured through the mds_join_fs option.
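For example, assuming an MDS daemon named mds.serverc (a hypothetical daemon name) that runs on newer hardware, you could pin it to the mycephfs file system:

[ceph: root@server /]# ceph config set mds.serverc mds_join_fs mycephfs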
Limit the size of the MDS cache by limiting the maximum memory to use with the mds_cache_memory_limit option, or by defining the maximum number of inodes with the mds_cache_size option.
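For example, you could cap the MDS cache at 4 GiB of memory, or at 150,000 inodes (hypothetical values; the memory limit is expressed in bytes):

[ceph: root@server /]# ceph config set mds mds_cache_memory_limit 4294967296
[ceph: root@server /]# ceph config set mds mds_cache_size 150000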
Configure your CephFS file system to restrict the number of bytes or files that are stored by using quotas.
Both the FUSE and kernel clients support checking quotas when mounting a CephFS file system.
These clients are also responsible for stopping writing data to the CephFS file system when the user reaches the quota limit.
Use the setfattr command's ceph.quota.max_bytes and ceph.quota.max_files options to set the limits.
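For example, assuming the file system is mounted at /mnt/cephfs on a client (a hypothetical mount point), you could limit a directory to 100 GiB and 10,000 files:

[root@node ~]# setfattr -n ceph.quota.max_bytes -v 107374182400 /mnt/cephfs/dir
[root@node ~]# setfattr -n ceph.quota.max_files -v 10000 /mnt/cephfs/dir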
Red Hat Ceph Storage 5 removes several limitations that were present in earlier versions.
Red Hat Ceph Storage 5 supports more than one active MDS daemon in a cluster, which can increase metadata performance. To remain highly available, you can configure additional standby MDS daemons to take over from any active MDS daemon that fails.
Red Hat Ceph Storage 5 supports more than one CephFS file system in a cluster. Deploying more than one CephFS file system requires running more MDS daemons.
To implement a CephFS file system, create the required pools, create the CephFS file system, deploy the MDS daemons, and then mount the file system.
You can manually create the pools, create the CephFS file system, and deploy the MDS daemons, or use the ceph fs volume create command, which does all these steps automatically.
The first option gives the system administrator more control over the process, but with more steps than the simpler ceph fs volume create command.
Use the ceph fs volume create command to directly create the CephFS volume.
This command creates the pools that are associated with the CephFS file system, creates the CephFS volume, and starts the MDS service on the hosts.

[ceph: root@server /]# ceph fs volume create fs-name \
--placement="number-of-hosts list-of-hosts"

For more control over the deployment process, manually create the pools that are associated with the CephFS file system, start the MDS service on the hosts, and then create the CephFS file system.
A CephFS file system requires at least two pools, one to store CephFS data, and another to store CephFS metadata.
The default names for these two pools are cephfs_data and cephfs_metadata.
To create a CephFS file system, first create the two pools.
[ceph: root@server /]# ceph osd pool create cephfs_data
[ceph: root@server /]# ceph osd pool create cephfs_metadata
This example creates two pools with standard parameters. Because the metadata pool stores file location information, consider a higher replication level for this pool to avoid data errors that render your data inaccessible.
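For example, you could raise the metadata pool to four replicas (a sketch; choose a value that matches your failure domain and durability requirements):

[ceph: root@server /]# ceph osd pool set cephfs_metadata size 4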
By default, Ceph uses replicated data pools.
However, erasure-coded data pools are now also supported for CephFS file systems.
Create an erasure-coded pool with the ceph osd pool command:
[ceph: root@server /]# ceph osd pool create pool-name erasure

When the data and metadata pools are available, use the ceph fs new command to create the file system, as follows:

[ceph: root@server /]# ceph fs new fs-name metadata-pool data-pool

To add an existing erasure-coded pool as a data pool in your CephFS file system, use the ceph fs add_data_pool command:
[ceph: root@server /]# ceph fs add_data_pool fs-name data-pool
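Note that CephFS can use an erasure-coded pool as a data pool only when overwrites are enabled on that pool. If overwrites are not already enabled, you can turn them on before adding the pool:

[ceph: root@server /]# ceph osd pool set pool-name allow_ec_overwrites true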
You can then deploy the MDS service:

[ceph: root@server /]# ceph orch apply mds fs-name \
--placement="number-of-hosts list-of-hosts"

Alternatively, use the Ceph Orchestrator to deploy the MDS service with a service specification. First, manually create the two required pools. Then, create a YAML file with the service details:
service_type: mds
service_id: fs-name
placement:
  hosts:
  - host-name-1
  - host-name-2
  - ...

Use the YAML service specification to deploy the MDS service with the ceph orch apply command:
[ceph: root@server /]# ceph orch apply -i file-name.yml

Finally, create the CephFS file system with the ceph fs new command.
You can mount CephFS file systems with either of the available clients:
The kernel client
The FUSE client
The kernel client requires a Linux kernel version 4 or later, which is available starting with RHEL 8. For previous kernel versions, use the FUSE client instead.
To mount a CephFS-based file system with either client, verify the following prerequisites on the client host.
Install the ceph-common package.
For the FUSE client, also install the ceph-fuse package.
Verify that the Ceph configuration file exists (/etc/ceph/ceph.conf by default).
Authorize the client to access the CephFS file system.
Extract the new authorization key with the ceph auth get command and copy it to the /etc/ceph folder on the client host.
When using the FUSE client as a non-root user, add user_allow_other in the /etc/fuse.conf configuration file.
When the prerequisites are met, use the FUSE client to mount and unmount a CephFS file system:
[root@node ~]# ceph-fuse [mount-point] [options]

To provide the key ring for a specific user, use the --id option.
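For example, to mount the file system at /mnt/cephfs (a hypothetical mount point) with the credentials of the user1 client:

[root@node ~]# ceph-fuse --id user1 /mnt/cephfs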
You must authorize the client to access the CephFS file system, by using the ceph fs authorize command:
[ceph: root@server /]# ceph fs authorize fs-name client-name path permissions

With the ceph fs authorize command, you can provide fine-grained access control for different users and folders in the CephFS file system.
You can set different options for folders in a CephFS file system:
r: Read access to the specified folder.
Read access is also granted to the subfolders, if no other restriction is specified.
w: Write access to the specified folder.
Write access is also granted to the subfolders, if no other restriction is specified.
p: Clients require the p option in addition to r and w capabilities to use layouts or quotas.
s: Clients require the s option in addition to r and w capabilities to create snapshots.
This example allows one user to read the root folder, and also provides read, write, and snapshot permissions to the /directory folder.
[ceph: root@server /]# ceph fs authorize mycephfs client.user / r /directory rws

By default, the CephFS FUSE client mounts the root directory (/) of the accessed file system.
You can mount a specific directory with the ceph-fuse -r directory command.
When you try to mount a specific directory, this operation fails if the directory does not exist in the CephFS volume.
When more than one CephFS file system is configured, the CephFS FUSE client mounts the default CephFS file system.
To use a different file system, use the --client_fs option.
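For example, assuming a second file system named otherfs (a hypothetical name), the mount command could look like this:

[root@node ~]# ceph-fuse --id user1 --client_fs otherfs /mnt/otherfs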
To persistently mount your CephFS file system by using the FUSE client, you can add the following entry to the /etc/fstab file:
host-name:port mount-point fuse.ceph ceph.id=myuser,ceph.client_mountpoint=mountpoint,_netdev 0 0
Use the umount command to unmount the file system:
[root@node ~]# umount mount-point

When using the CephFS kernel client, use the following command to mount the file system:
[root@node ~]# mount -t ceph [device]:[path] [mount-point] \
-o [key-value] [other-options]

You must authorize the client to access the CephFS file system, with the ceph fs authorize command.
Extract the client key with the ceph auth get command, and then copy the key to the /etc/ceph folder on the client host.
With the CephFS kernel client, you can mount a specific subdirectory from a CephFS file system.
This example mounts the directory /dir1/dir2 from the root of a CephFS file system:

[root@node ~]# mount -t ceph mon1:/dir1/dir2 mount-point

You can specify a list of several comma-separated MONs to mount the device. The standard port (6789) is the default, or you can add a colon and a nonstandard port number after the name of each MON. The recommended practice is to specify more than one MON, in case some are offline when the file system is mounted.
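For example, assuming a file system named mycephfs, a client named user1, and a secret key file at /etc/ceph/user1.secret (hypothetical values), a mount command that lists three MONs could look like this:

[root@node ~]# mount -t ceph mon1,mon2,mon3:/ /mnt/cephfs \
-o name=user1,secretfile=/etc/ceph/user1.secret,fs=mycephfs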
These other options are available when using the CephFS kernel client:
Table 10.1. CephFS Kernel Client Mount Options

| Option name | Description |
|---|---|
| name= | The Cephx client ID to use. The default is guest. |
| fs= | The name of the CephFS file system to mount. When no value is provided, the default file system is mounted. |
| secret= | The value of the secret key for this client. |
| secretfile= | The path to the file with the secret key for this client. |
| rsize= | The maximum read size in bytes. |
| wsize= | The maximum write size in bytes. The default is none. |
To persistently mount your CephFS file system by using the kernel client, you can add the following entry to the /etc/fstab file:
mon1,mon2:/ mount_point ceph name=user1,secretfile=/root/secret,_netdev 0 0

Use the umount command to unmount the file system:

[root@node ~]# umount mount_point

You can remove a CephFS file system if needed. However, first back up all your data, because removing your CephFS file system destroys all the stored data on that file system.
The procedure to remove a CephFS is first to mark it as down, as follows:
[ceph: root@server /]# ceph fs set fs-name down true

Then, you can remove it with the following command:

[ceph: root@server /]# ceph fs rm fs-name --yes-i-really-mean-it

Red Hat Ceph Storage 5 provides access to Ceph storage from an NFS client with NFS Ganesha. NFS Ganesha is a user space NFS file server that supports multiple protocols, such as NFSv3, NFSv4.0, NFSv4.1, and pNFS. NFS Ganesha uses a File System Abstraction Layer (FSAL) architecture to support and share files from multiple file systems or lower-level storage, such as Ceph, Samba, Gluster, and Linux file systems such as XFS.
In Red Hat Ceph Storage, NFS Ganesha shares files by using the NFSv4.0 or later protocol. This requirement is necessary for the CephFS client, the OpenStack Manila File Sharing service, and other Red Hat products that are configured to access the NFS Ganesha service to function correctly.
The following list outlines the advantages of a user space NFS server:
The server does not implement system calls.
Caching is defined and used more efficiently.
Service failover and restarting are faster and easier to implement.
User space services can be clustered easily for high availability.
You can use distributed lock management (DLM) to allow multiple client protocols.
Debugging of server issues is simpler, because you do not need to work with kernel dumps.
Resource management and performance monitoring are simpler.
You can deploy NFS Ganesha in an active-active configuration on top of an existing CephFS file system through the ingress service. The main goal of this active-active configuration is for load balancing, and scaling to many instances that handle higher loads. Thus, if one node fails, then the cluster redirects all the workload to the rest of the nodes.
System administrators can deploy the NFS Ganesha daemons via the CLI or manage them automatically if either the Cephadm or Rook orchestrators are enabled.
The following list outlines the advantages to having an ingress service on top of an existing NFS service:
A virtual IP to access the NFS server.
Migration of the NFS service to another node if one node fails, providing shorter failover times.
Load balancing across the NFS nodes.
The ingress implementation is not yet completely developed. It can deploy multiple Ganesha instances and balance the load between them, but failover between hosts is not yet fully implemented. This feature is expected to be available in future releases.
You can use multiple active-active NFS Ganesha services with Pacemaker for high availability. The Pacemaker component is responsible for all cluster-related activities, such as monitoring cluster membership, managing the services and resources, and fencing cluster members.
As prerequisites, create a CephFS file system and install the nfs-ganesha, nfs-ganesha-ceph, nfs-ganesha-rados-grace, and nfs-ganesha-rados-urls packages on the Ceph MGR nodes.
After the prerequisites are satisfied, enable the Ceph MGR NFS module:
[ceph: root@server /]# ceph mgr module enable nfs

Then, create the NFS Ganesha cluster:

[ceph: root@server /]# ceph nfs cluster create cluster-name "node-list"

The node-list is a comma-separated list of the nodes where the daemon containers are deployed.
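For example, assuming a hypothetical cluster named mynfs that runs its daemons on the serverc and serverd hosts:

[ceph: root@server /]# ceph nfs cluster create mynfs "serverc,serverd"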
Next, export the CephFS file system:
[ceph: root@server /]# ceph nfs export create cephfs fs-name \
cluster-name pseudo-path

The pseudo-path parameter is the pseudo root path.
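For example, to export the hypothetical mycephfs file system through the mynfs cluster under the /ceph pseudo root:

[ceph: root@server /]# ceph nfs export create cephfs mycephfs mynfs /ceph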
Finally, mount the exported CephFS file system on a client node.
[root@node ~]# mount -t nfs -o port=ganesha-port node-name:pseudo-path path

CephFS shared file systems require at least one active MDS service for correct operation, and at least one standby MDS to ensure high availability. The MDS autoscaler module ensures the availability of enough MDS daemons.
This module monitors the number of ranks and the number of standby daemons, and adjusts the number of MDS daemons that the orchestrator spawns.
To enable the MDS autoscaler module, use the following command:
[ceph: root@server /]# ceph mgr module enable mds_autoscaler

Red Hat Ceph Storage 5 supports CephFS multi-site configuration for geo-replication.
Thus, you can replicate the CephFS file system on another Red Hat Ceph Storage cluster.
With this feature, you can fail over to the secondary CephFS file system and restart the applications that use it.
The CephFS file system mirroring feature requires the cephfs-mirror package.
Both the source and target clusters must use Red Hat Ceph Storage version 5 or later.
The CephFS mirroring feature is snapshot-based. The first snapshot synchronization requires bulk transfer of the data from the source cluster to the remote cluster. Then, for the following synchronizations, the mirror daemon identifies the modified files between local snapshots and synchronizes those files in the remote cluster. This synchronization method is faster than other methods that require bulk transfer of the data to the remote cluster, because it does not need to query the remote cluster (file differences are calculated between local snapshots) and needs only to transfer the updated files to the remote cluster.
The CephFS mirroring module is disabled by default.
To configure a snapshot mirror for CephFS, you must enable the mirroring module on the source and remote clusters:
[ceph: root@server /]# ceph mgr module enable mirroring

Then, you can deploy the CephFS mirroring daemon on the source cluster:

[ceph: root@source /]# ceph orch apply cephfs-mirror [node-name]

The previous command deploys the CephFS mirroring daemon on node-name and creates the Ceph user cephfs-mirror.
For each CephFS peer, you must create a user on the target cluster:
[ceph: root@target /]# ceph fs authorize fs-name client-name / rwps

Then, you can enable mirroring on the source cluster. Mirroring must be enabled for a specific file system.

[ceph: root@source /]# ceph fs snapshot mirror enable fs-name

The next step is to prepare the target peer. You can create the peer bootstrap in the target node with the following command:
[ceph: root@target /]# ceph fs snapshot mirror peer_bootstrap create \
fs-name peer-name site-name

You can use the site-name string to identify the target storage cluster. After you create the target peer, import the bootstrap token that it generates into the source cluster:
[ceph: root@source /]# ceph fs snapshot mirror peer_bootstrap import \
fs-name bootstrap-token

Finally, configure a directory for snapshot mirroring on the source cluster with the following command:
[ceph: root@source /]# ceph fs snapshot mirror add fs-name path
mount.ceph(8), ceph-fuse(8), ceph(8), rados(8), and cephfs-mirror(8) man pages
For more information, refer to the Red Hat Ceph Storage 5 File System Guide at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/file_system_guide/index
For more information regarding CephFS deployment, refer to the Deployment of the Ceph File System chapter in the Red Hat Ceph Storage 5 File System Guide at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/file_system_guide/index#deployment-of-the-ceph-file-system
For more information regarding CephFS over the NFS protocol, refer to the Exporting Ceph File System Namespaces over the NFS Protocol chapter in the Red Hat Ceph Storage 5 File System Guide at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/file_system_guide/index#exporting-ceph-file-system-namespaces-over-the-nfs-protocol_fs