Troubleshooting Clusters and Clients

Objectives

After completing this section, you should be able to identify key tuning parameters and troubleshoot performance for Ceph clients, including RADOS Gateway, RADOS Block Devices, and CephFS.

Beginning Troubleshooting

The hardware backing a Ceph cluster is subject to failure over time, and the data in your cluster can become fragmented and require maintenance. Monitor and troubleshoot your cluster consistently to keep it in a healthy state. This section presents practices that enable troubleshooting for various issues on a Ceph cluster. You can perform this initial troubleshooting of your cluster before contacting Red Hat Support.

Identifying Problems

When troubleshooting issues with Ceph, the first step is to determine which Ceph component is causing the problem. Sometimes, you can identify this component from the information provided by the ceph health detail or ceph status commands. Other times, you must investigate further to discover the issue. Verify the cluster status to help determine whether there is a single daemon failure or an entire node failure.

The following troubleshooting checklist suggests next steps:

  • Identify the Ceph component causing the problem.

  • Set debug logging for the identified component and view the logs.

  • Verify that you have a supported configuration.

  • Determine if there are slow or stuck operations.

Troubleshooting Cluster Health

Red Hat Ceph Storage continually runs various health checks to monitor the health of the cluster. When a health check fails, the cluster health state changes to either HEALTH_WARN or HEALTH_ERR, depending on the severity and impact of the failed health checks. Red Hat Ceph Storage also logs the health check warnings and errors to the cluster logs.

The ceph status and ceph health commands show the cluster health status. When the cluster health status is HEALTH_WARN or HEALTH_ERR, use the ceph health detail command to view the health check message so that you can begin troubleshooting the issue.

[ceph: root@node /]# ceph health detail

Some health status messages indicate a specific issue; others provide a more general indication. For example, if the cluster health status changes to HEALTH_WARN and you see the health message HEALTH_WARN 1 osds down; Degraded data redundancy, then that is a clear indication of the problem.

Other health status messages might require further troubleshooting because they might indicate several possible root causes. For example, the following message indicates an issue that has multiple possible solutions:

[ceph: root@node /]# ceph health detail
HEALTH_WARN 1 pools have too few placement groups
[WRN] POOL_TOO_FEW_PGS: 1 pools have too few placement groups
    Pool testpool has 8 placement groups, should have 32

You can resolve this issue by changing the pg_num setting on the specified pool, or by reconfiguring the pool's pg_autoscale_mode setting from warn to on so that Ceph automatically adjusts the number of PGs.
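
For example, the following commands show both approaches for the testpool pool reported in the health message; adjust the pool name and PG count to match your cluster:

[ceph: root@node /]# ceph osd pool set testpool pg_num 32
[ceph: root@node /]# ceph osd pool set testpool pg_autoscale_mode on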

Ceph sends health messages regarding performance when a cluster performance health check fails. For example, OSDs send heartbeat ping messages to each other to monitor OSD daemon availability. Ceph also uses the OSD ping response times to monitor network performance. A single failed OSD ping message could mean a delay from a specific OSD, indicating a potential problem with that OSD. Multiple failed OSD ping messages might indicate a failure of a network component, such as a network switch between OSD hosts.

Muting Ceph Health Alerts

You might want to temporarily mute some of the cluster warnings because you already know of them and do not need to fix them yet. For example, if you bring down an OSD for maintenance, then the cluster reports a HEALTH_WARN status. You can mute this warning message so that the health check does not affect the overall reported status.

Ceph specifies the health check alert by using health check codes. For example, the previous HEALTH_WARN message shows the POOL_TOO_FEW_PGS health code.

To mute a health alert message, use the ceph health command.

[ceph: root@node /]# ceph health mute health-code [duration]

The health-code is the code provided by the ceph health detail command. The optional parameter duration is the time that the health message is muted, specified in seconds, minutes, or hours. You can unmute a health message with the ceph health unmute health-code command.

When you mute a health message, Ceph automatically unmutes the alert if the health status degrades further. For example, if your cluster reports one OSD down and you mute that alert, then Ceph automatically removes the mute if another OSD goes down. In general, any muted health alert with a measurable value is unmuted when that value worsens.
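
For example, the following commands mute the OSD_DOWN alert for four hours during planned maintenance, and then remove the mute when the maintenance is complete; the health code and duration are only examples:

[ceph: root@node /]# ceph health mute OSD_DOWN 4h
[ceph: root@node /]# ceph health unmute OSD_DOWN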

Configuring Logging

If there is a problem in a specific area of your cluster, then you can enable logging for that area. For example, if your OSDs are running adequately but your metadata servers are not, enable debug logging for the specific metadata server instances. Enable logging for each subsystem as needed.

You typically add debugging to your Ceph configuration temporarily at runtime. If you encounter issues when starting your cluster, then add the debug logging settings to the Ceph configuration database so that they apply at startup. Ceph writes log files under the default location /var/log/ceph, and it also keeps recent log entries in a memory-based cache.

Warning

Logging is resource-intensive. Verbose logging can generate over 1 GB of data per hour. If your OS disk reaches its capacity, then the node stops working. When you fix your cluster issues, revert the logging configuration to default values. Consider setting up log file rotation.

Understanding Ceph Logs

Configure Ceph logging by using the ceph command at runtime. If you encounter errors when starting up the cluster, then you can update the Ceph configuration database so that it logs during startup.

You can set different logging levels for each subsystem in your cluster. Debug levels are on a scale of 1 to 20, where 1 is terse and 20 is verbose.

Ceph does not send memory-based logs to the output logs except in the following circumstances:

  • A fatal signal is raised.

  • An assert in code is triggered.

  • You request it.

To use different debug levels for the output log level and the memory level, use a slash (/) character. For example, debug_mon = 1/5 sets the output log level of the ceph-mon daemon to 1 and its memory log level to 5.

Configure Logging at Runtime

To activate debugging output at runtime, use the ceph tell command.

[ceph: root@node /]# ceph tell type.id config set debug_subsystem debug-level

The type and id arguments are the type of the Ceph daemon and its ID. The subsystem is the specific subsystem whose debug level you want to modify.

This example modifies the OSD 0 debug level for the messaging system between Ceph components:

[ceph: root@node /]# ceph tell osd.0 config set debug_ms 5

View the configuration settings at runtime as follows:

[ceph: root@node /]# ceph tell osd.0 config show

Configure Logging in the Configuration Database

Configure the subsystem debug levels so that they log to the default log file at boot time. Add the debugging settings to the Ceph configuration database by using the ceph config set command.

For example, add debug levels for specific Ceph daemons by setting these parameters in your Ceph configuration database:

[ceph: root@node /]# ceph config set global debug_ms 1/5
[ceph: root@node /]# ceph config set osd debug_ms 1
[ceph: root@node /]# ceph config set osd debug_osd 1/5
[ceph: root@node /]# ceph config set mon debug_mon 20

Setting Log File Rotation

Debug logging for Ceph components is resource-intensive and can generate a huge amount of data. If your disks are almost full, then accelerate log rotation by modifying the log rotation configuration at /etc/logrotate.d/ceph. The logrotate utility, run by the Cron job scheduler, uses this file to rotate the log files.

You can add a size setting after the rotation frequency, so that the log file is rotated when it reaches the specified size:

rotate 7
weekly
size size
compress
sharedscripts
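
For example, the following configuration rotates the log file weekly, or sooner if it grows beyond 500 MB; the size value is only an example:

rotate 7
weekly
size 500M
compress
sharedscripts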

Use the crontab command to add an entry that runs logrotate against the /etc/logrotate.d/ceph file.

[ceph: root@node /]# crontab -e

For example, the following entry instructs Cron to run logrotate against /etc/logrotate.d/ceph at 30 minutes past every hour.

30 * * * * /usr/sbin/logrotate /etc/logrotate.d/ceph >/dev/null 2>&1

Troubleshooting Network Issues

Ceph nodes use the network to communicate with each other. Network issues are often the cause when OSDs are reported as down, and they are also a common cause of clock skew errors on the Monitors. A clock skew, or timing skew, occurs when the clocks on different Monitor nodes report different times. If the difference between the clock readings is larger than what is configured in the cluster, then you get the clock skew error. Underlying network problems, such as packet loss, high latency, or limited bandwidth, also impact cluster performance and stability.

The following network troubleshooting checklist suggests next steps; example commands for some of these checks appear after the list:

  • Ensure that the cluster_network and public_network parameters in the cluster include correct values. You can retrieve their values by using the ceph config get mon cluster_network or ceph config get mon public_network commands, or by checking the ceph.conf file.

  • Verify that all network interfaces are functional.

  • Verify that the Ceph nodes are able to reach each other by using their host names.

  • If firewalls are used, then ensure that Ceph nodes are able to reach each other on the required ports. Open the appropriate firewall ports if necessary.

  • Validate that network connectivity between hosts has the expected latency and no packet loss, for example, by using the ping command.

  • Slower connected nodes could slow down the faster ones. Verify that the inter-switch links can handle the accumulated bandwidth of the connected nodes.

  • Verify that NTP is working correctly in your cluster nodes. For example, you can check the information provided by the chronyc tracking command.
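
The following commands illustrate some of these checks; the host name is only an example:

[ceph: root@node /]# ceph config get mon public_network
[ceph: root@node /]# ceph config get mon cluster_network
[root@node ~]# ping -c 4 serverd.lab.example.com
[root@node ~]# chronyc tracking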

Troubleshooting Ceph Clients

The following list includes the most common problems that clients experience when accessing a Red Hat Ceph Storage cluster:

  • Monitors (MONs) are not available to the client.

  • Incorrect or missing command-line arguments when using the CLI.

  • The /etc/ceph/ceph.conf file is incorrect, missing, or inaccessible.

  • Key-ring files are incorrect, missing, or inaccessible.

The ceph-common package provides bash tab completion for the rados, ceph, rbd, and radosgw-admin commands. You can access option and attribute completions by pressing the Tab key when you enter the command at the shell prompt.

Enabling and Changing Log Files

Increase the logging level when troubleshooting a client.

On the client system, you can add the debug_ms = 1 parameter to the configuration database by using the ceph config set client debug_ms 1 command. The Ceph client stores debug messages in the /var/log/ceph/ceph-client.id.log log file.

Most of the Ceph client commands, such as rados, ceph, or rbd, also accept the --debug-ms=1 option to execute only that command with an increased logging level.
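
For example, the following commands enable client debug logging in the configuration database, and then run a single rados command with an increased logging level; the testpool pool name is only an example:

[root@host ~]# ceph config set client debug_ms 1
[root@host ~]# rados --debug-ms=1 -p testpool ls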

Enabling the Client Admin Socket

By default, Ceph clients create a UNIX domain socket at startup. You can use this socket to communicate with the client to retrieve real-time performance data or to dynamically get or set a configuration parameter.

The /var/run/ceph/fsid directory contains the admin sockets for that host: one admin socket per OSD, one for each MON, and one for each MGR. Administrators can use the ceph command with the --admin-daemon socket-path option to query a daemon or client through its socket.

[ceph: root@node /]# sudo ls -al /var/run/ceph/fsid
total 0
drwxrwx---. 2  167  167 180 Oct 19 04:43 .
drwxr-xr-x. 3 root root  60 Oct 19 03:51 ..
srwxr-xr-x. 1  167  167   0 Oct 19 03:52 ceph-client.rgw.realm.zone.serverc.agsgpq.6.93951066330432.asok
srwxr-xr-x. 1  167  167   0 Oct 19 03:51 ceph-mgr.serverc.lab.example.com.aiqepd.asok
srwxr-xr-x. 1  167  167   0 Oct 19 03:51 ceph-mon.serverc.lab.example.com.asok
srwxr-xr-x. 1  167  167   0 Oct 19 03:51 ceph-osd.0.asok
srwxr-xr-x. 1  167  167   0 Oct 19 03:51 ceph-osd.1.asok
srwxr-xr-x. 1  167  167   0 Oct 19 03:51 ceph-osd.2.asok

The following example mounts a CephFS file system with the FUSE client, gets the performance counters, and sets the debug_ms configuration parameter to 5:

[root@host ~]# ceph-fuse -n client.admin /mnt/mountpoint
2021-10-19T09:23:57.914-0400 7f1e7b914200 -1 init, newargv = 0x55d703b17a00 newargc=15
ceph-fuse[54240]: starting ceph client
ceph-fuse[54240]: starting fuse
[root@host ~]# ls /var/run/ceph/
2ae6d05a-229a-11ec-925e-52540000fa0c  ceph-client.admin.54240.94381967377904.asok
[root@host ~]# ceph --admin-daemon \
/var/run/ceph/ceph-client.admin.54240.94381967377904.asok perf dump
{
    "AsyncMessenger::Worker-0": {
        "msgr_recv_messages": 4,
        "msgr_send_messages": 3,
        "msgr_recv_bytes": 10112,
        "msgr_send_bytes": 480,
        "msgr_created_connections": 2,
        "msgr_active_connections": 1,
        "msgr_running_total_time": 0.002775454,
        "msgr_running_send_time": 0.001042138,
        "msgr_running_recv_time": 0.000868150,
...output omitted...
[root@host ~]# ceph --admin-daemon \
/var/run/ceph/ceph-client.admin.54240.94381967377904.asok config show
...output omitted...
    "debug_ms": "0/0",
...output omitted...
[root@host ~]# ceph --admin-daemon \
/var/run/ceph/ceph-client.admin.54240.94381967377904.asok config set debug_ms 5
{
    "success": ""
}
[root@host ~]# ceph --admin-daemon \
/var/run/ceph/ceph-client.admin.54240.94381967377904.asok config show
...output omitted...
    "debug_ms": "5/5",
...output omitted...

Comparing Ceph Versions and Features

Earlier versions of Ceph clients might not benefit from features provided by the installed version of the Ceph cluster. For example, an earlier client might fail to retrieve data from an erasure-coded pool. Therefore, when upgrading a Ceph cluster, you should also update the clients. Examples of Ceph clients include the RADOS Gateway, the FUSE client for CephFS, librbd, and command-line tools such as rados and rbd.

From a client, you can find the version of the running Ceph cluster with the ceph versions command:

[ceph: root@node /]# ceph versions
{
    "mon": {
        "ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)": 4
    },
    "mgr": {
        "ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)": 4
    },
    "osd": {
        "ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)": 9
    },
    "mds": {
        "ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)": 1
    },
    "rgw": {
        "ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)": 2
    },
    "overall": {
        "ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)": 20
    }
}

You can also list the supported level of features with the ceph features command.

If you cannot upgrade the clients, the ceph osd set-require-min-compat-client version-name command specifies the minimum client version that the Ceph cluster must support.

With this minimum client setting, Ceph denies the use of features that are not compatible with that client version. Historically, the main exception has been changes to CRUSH. For example, if you run the ceph osd set-require-min-compat-client jewel command, then you cannot use the ceph osd pg-upmap command, because Jewel clients do not support the PG upmap feature. Verify the minimum client version required by your cluster with the ceph osd get-require-min-compat-client command:

[ceph: root@node /]# ceph osd get-require-min-compat-client
luminous
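
For example, the following command requires that clients support at least the Luminous feature set; choose the version that matches your oldest supported clients:

[ceph: root@node /]# ceph osd set-require-min-compat-client luminous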

Working with Cephx

Red Hat Ceph Storage provides the Cephx protocol for cryptographic authentication. If Cephx is enabled, then Ceph looks for the key ring in the default /etc/ceph/ path.

Either enable Cephx for all components, or disable it completely. Ceph does not support a mixed setting, such as enabling Cephx for clients but disabling it for communication between the Ceph services. By default, Cephx is enabled and a client trying to access the Ceph cluster without Cephx receives an error message.

Important

Red Hat recommends using authentication in your production environment.

All Ceph commands authenticate as the client.admin user by default, although you can specify the user name or the user ID by using the --name and --id options.
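
For example, the following equivalent commands run a health check as a user other than client.admin; the operator1 user is only an example and its key ring must already be available on the client:

[root@host ~]# ceph --name client.operator1 health
[root@host ~]# ceph --id operator1 health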

Problems with Cephx are usually related to:

  • Incorrect permissions on the key ring or ceph.conf files.

  • Missing key ring and ceph.conf files.

  • Incorrect or invalid cephx permissions for a given user. Use the ceph auth list command to identify the issue.

  • Incorrect or misspelled user names, which you can also verify by using the ceph auth list command.

Troubleshooting Ceph Monitors

You can identify Monitor error messages by running the ceph health detail command, or by reviewing the information in the Ceph logs.

The following is a list of the most common Ceph MON error messages:

mon.X is down (out of quorum)

If the Ceph MON daemon is not running, then an error is preventing the daemon from starting. For example, it is possible that the daemon has a corrupted store, or the /var partition might be full.

If the Ceph MON daemon is running but is reported as down, then the cause depends on the MON state. If the Ceph MON is in the probing state longer than expected, then it cannot find the other Ceph Monitors. This problem can be caused by networking issues, or the Ceph Monitor can have an outdated Ceph Monitor map (monmap) and be trying to reach the other Ceph Monitors at incorrect IP addresses.

If the Ceph MON is in the electing state longer than expected, then its clock might not be synchronized. If the state changes from synchronizing to electing, then the Ceph MON is generating maps faster than the synchronization process can handle. If the state is either leader or peon, then the Ceph MON has reached a quorum, but the rest of the cluster does not recognize it. This problem is mainly caused by failed clock synchronization, such as incorrect NTP synchronization, or by an improperly working network.

clock skew

This error message indicates that the clocks of the MON nodes might not be synchronized. The mon_clock_drift_allowed parameter controls the maximum difference between clocks that your cluster allows before showing the warning message. This problem is mainly caused by failed clock synchronization, such as incorrect NTP synchronization, or by an improperly working network.

mon.X store is getting too big!

Ceph MON shows this warning message when its store is too big, which delays responses to client queries.

Troubleshooting Ceph OSDs

Use the ceph status command to review your Monitor quorum. If the cluster returns a health status, then the Monitors are able to form a quorum. If you do not have a Monitor quorum, or if there are errors with the Monitor status, then address the Monitor issues first, and then proceed to verify the network.

The following is a list of the most common Ceph OSD error messages:

full osds

Ceph returns the HEALTH_ERR full osds message when the cluster reaches the capacity set by the mon_osd_full_ratio parameter. By default, this parameter is set to 0.95, which means 95% of the cluster capacity.

Use the ceph df command to determine the percentage of used raw storage, given by the %RAW USED column. If the percentage of raw storage is above 70%, then you can reduce it by deleting unnecessary data or by scaling the cluster out with new OSD nodes.
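
For example, run the command from the cephadm shell and review the %RAW USED column:

[ceph: root@node /]# ceph df
...output omitted...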

nearfull osds

Ceph returns the nearfull osds message when the cluster reaches the capacity set by the mon_osd_nearfull_ratio parameter. By default, this parameter is set to 0.85, which means 85% of the cluster capacity.

The main causes for this warning message are:

  • The OSDs are not balanced among the OSD nodes in the cluster.

  • The placement group count is not correct, based on the number of OSDs, the use case, the target PGs per OSD, and the OSD utilization.

  • The cluster uses disproportionate CRUSH tunables.

  • The back-end storage for OSDs is almost full.

To troubleshoot this issue, take the following steps; example commands for the last two steps appear after the list:

  • Verify that the PG count is sufficient.

  • Verify that you use CRUSH tunables that are optimal for the cluster version, and adjust them if they are not.

  • Change the weight of OSDs by utilization.

  • Determine how much space is left on the disks used by OSDs.
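
For example, the following commands display the space left on each OSD and perform a dry run of reweighting the OSDs by utilization; run ceph osd reweight-by-utilization to apply the changes:

[ceph: root@node /]# ceph osd df
...output omitted...
[ceph: root@node /]# ceph osd test-reweight-by-utilization
...output omitted...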

osds are down

Ceph returns the osds are down message when OSDs are down or flapping. The main cause for this message is that one of the ceph-osd processes is unavailable due to a failure or to networking problems with other OSDs.

Troubleshooting the RADOS Gateway

You can troubleshoot the Ceph RESTful interface and some common RADOS Gateway issues.

Debugging the Ceph RESTful Interface

The radosgw daemon is a Ceph client that sits between the Ceph cluster and HTTP clients. It includes its own web server, Beast, which supports HTTP and HTTPS.

In case of errors, you should consult the log file in the /var/log/ceph/ folder.

To log to a file, set the log_to_file parameter to true. You can update the location of the log file and the log level by using the log_file and debug_rgw parameters, respectively. You can also enable the rgw_enable_ops_log and rgw_enable_usage_log parameters in the Ceph configuration database to log each successful RADOS Gateway operation and the usage, respectively.

[ceph: root@node /]# ceph config set client.rgw \
log_file /var/log/ceph/ceph-rgw-node.log
[ceph: root@node /]# ceph config set client.rgw log_to_file true
[ceph: root@node /]# ceph config set client.rgw debug_rgw 20
[ceph: root@node /]# ceph config set client.rgw rgw_enable_ops_log true
[ceph: root@node /]# ceph config set global rgw_enable_usage_log true

List the available log objects by using the radosgw-admin log list command. View log object information by using the radosgw-admin log show command. To retrieve the information directly from a log object, add the --object parameter with the object ID. To retrieve the information for a bucket at a given timestamp, add the --bucket, --date, and --bucket-id parameters, which specify the bucket name, the timestamp, and the bucket ID.
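
For example, the following commands list the available log objects and then display one of them; replace object-name with one of the names returned by the first command:

[ceph: root@node /]# radosgw-admin log list
[ceph: root@node /]# radosgw-admin log show --object object-name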

Common RADOS Gateway Issues

The most common error in RADOS Gateway is time skew between the client and the RADOS Gateway because the S3 protocol uses date and time for signing each request. To avoid this problem, use NTP on both Ceph and client nodes.

You can verify issues on RADOS Gateway request completion by looking for HTTP status lines in the RADOS Gateway log file.

The RADOS Gateway is a Ceph client that stores all of its configuration in RADOS objects. The RADOS PGs holding this configuration data must be in the active+clean state. If the state is not active+clean, then Ceph I/O requests will hang if the primary OSD becomes unable to serve data, and HTTP clients will eventually time out. Identify the inactive PGs with the ceph health detail command.

Troubleshooting CephFS

A CephFS Metadata Server (MDS) maintains a cache that it shares with its clients, whether they use the FUSE client or the kernel client, so that the MDS can delegate part of its cache to those clients. For example, a client accessing an inode can locally manage and cache changes to that object. If another client also requests access to the same inode, then the MDS can request that the first client update the server with the new metadata.

To maintain cache consistency, an MDS requires a reliable network connection with its clients. Ceph can automatically disconnect, or evict, unresponsive clients. When this occurs, unflushed client data is lost.

When a client tries to gain access to CephFS, the MDS requests that the client currently holding the capabilities release them. If the client is unresponsive, then CephFS shows an error message after a timeout. You can configure the timeout by using the session_timeout attribute with the ceph fs set command. The default value is 60 seconds.

The session_autoclose attribute controls eviction. If a client fails to communicate with the MDS for longer than the session_autoclose value, 300 seconds by default, then the MDS evicts it.
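
For example, the following commands raise both timeouts for a file system; the mycephfs file system name and the values are only examples:

[ceph: root@node /]# ceph fs set mycephfs session_timeout 120
[ceph: root@node /]# ceph fs set mycephfs session_autoclose 600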

Ceph temporarily bans evicted clients so that they cannot reconnect. If this ban occurs, you must reboot the client system or unmount and remount the file system to reconnect.

 

References

For more information, refer to the Configuring Logging chapter in the Troubleshooting Guide for Red Hat Ceph Storage at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/troubleshooting_guide/index#configuring-logging

For more information, refer to the Troubleshooting Guide of the Red Hat Customer Portal Ceph Storage Guide at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/troubleshooting_guide/index

Revision: cl260-5.0-29d2128