In this exercise, you will perform common administration operations on a Red Hat Ceph Storage cluster.
Outcomes
You should be able to administer and monitor the cluster, including starting and stopping specific services, analyzing placement groups, setting OSD primary affinity, verifying daemon versions, and querying cluster health and utilization.
As the student user on the workstation machine, use the lab command to prepare your system for this exercise.
[student@workstation ~]$ lab start cluster-admin
This command confirms that the hosts required for this exercise are accessible.
Procedure 11.1. Instructions
Log in to clienta as the admin user and use sudo to run the cephadm shell.
[student@workstation ~]$ ssh admin@clienta
[admin@clienta ~]$ sudo cephadm shell
[ceph: root@clienta /]#
View the enabled MGR modules.
Verify that the dashboard module is enabled.
[ceph: root@clienta /]# ceph mgr module ls | more
{
    "always_on_modules": [
        "balancer",
        "crash",
        "devicehealth",
        "orchestrator",
        "pg_autoscaler",
        "progress",
        "rbd_support",
        "status",
        "telemetry",
        "volumes"
    ],
    "enabled_modules": [
        "cephadm",
        "dashboard",
        "insights",
        "iostat",
        "prometheus",
        "restful"
    ],
    "disabled_modules": [
        {
            "name": "alerts",
            "can_run": true,
            "error_string": "",
            "module_options": {
...output omitted...
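This step only inspects the module list, but MGR modules can also be toggled from the same shell. As an optional, illustrative aside, you could enable and then disable the alerts module, which appears in the disabled_modules list above; any module from that list works the same way. Modules listed under always_on_modules are always active and cannot be disabled.
[ceph: root@clienta /]# ceph mgr module enable alerts
[ceph: root@clienta /]# ceph mgr module disable alerts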
Obtain the dashboard URL for the active MGR node.
[ceph: root@clienta /]# ceph mgr services
{
"dashboard": "https://172.25.250.12:8443/",
"prometheus": "http://172.25.250.12:9283/"
}
View the status of the Monitors on the Ceph Dashboard page.
Using a web browser, navigate to the dashboard URL obtained in the previous step.
Log in as the admin user with redhat as the password.
On the Dashboard page, use the Monitors panel to view the status of the Monitor nodes and the quorum.
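The same information is available from the command line. If you prefer to check the Monitor quorum from the cephadm shell (optional; the dashboard is the focus of this step), the following commands print the Monitor summary and the quorum membership:
[ceph: root@clienta /]# ceph mon stat
[ceph: root@clienta /]# ceph quorum_status -f json-pretty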
View the status of all OSDs in the cluster.
[ceph: root@clienta /]# ceph osd stat
9 osds: 9 up (since 38m), 9 in (since 38m); epoch: e294
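The ceph osd stat command prints only a one-line summary. For a more detailed, per-OSD view (optional at this point), ceph osd status prints a table with each OSD's host, usage, and state, and ceph osd tree, which is used later in this exercise, shows the OSDs arranged by CRUSH hierarchy:
[ceph: root@clienta /]# ceph osd status
[ceph: root@clienta /]# ceph osd tree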
Find the location of the OSD 2 daemon, stop the OSD, and view the cluster OSD status.
Find the location of the OSD 2 daemon.
[ceph: root@clienta /]# ceph osd find 2
{
    "osd": 2,
    "addrs": {
        "addrvec": [
            {
                "type": "v2",
                "addr": "172.25.250.12:6808",
                "nonce": 2361545815
            },
            {
                "type": "v1",
                "addr": "172.25.250.12:6809",
                "nonce": 2361545815
            }
        ]
    },
    "osd_fsid": "1163a19e-e580-40e0-918f-25fd94e97b86",
    "host": "serverc.lab.example.com",
    "crush_location": {
        "host": "serverc",
        "root": "default"
    }
}
Log in to the serverc node.
Stop the OSD 2 daemon.
[ceph: root@clienta /]# ssh admin@serverc
admin@serverc's password: redhat
[admin@serverc ~]$ sudo systemctl list-units "ceph*"
  UNIT                                                       LOAD   ACTIVE SUB     DESCRIPTION
...output omitted...
  ceph-ff97a876-1fd2-11ec-8258-52540000fa0c@osd.0.service    loaded active running Ceph osd.0 for ff97a876-1fd2-11ec-8258-52540000fa0c
  ceph-ff97a876-1fd2-11ec-8258-52540000fa0c@osd.1.service    loaded active running Ceph osd.1 for ff97a876-1fd2-11ec-8258-52540000fa0c
  ceph-ff97a876-1fd2-11ec-8258-52540000fa0c@osd.2.service    loaded active running Ceph osd.2 for ff97a876-1fd2-11ec-8258-52540000fa0c
...output omitted...
[admin@serverc ~]$ sudo systemctl stop \
ceph-ff97a876-1fd2-11ec-8258-52540000fa0c@osd.2.service
Exit the serverc node.
View the cluster OSD status.
[admin@serverc ~]$ exit
[ceph: root@clienta /]# ceph osd stat
9 osds: 8 up (since 24s), 9 in (since 45m); epoch: e296
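Notice that the stopped OSD is reported as down (8 up) but still in (9 in). If an OSD stays down long enough (mon_osd_down_out_interval, 600 seconds by default), Ceph marks it out automatically and starts rebalancing data. For longer, planned maintenance outside of this exercise, a common precaution is to set the noout flag before stopping OSDs and to clear it afterward:
[ceph: root@clienta /]# ceph osd set noout
...perform the maintenance...
[ceph: root@clienta /]# ceph osd unset noout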
Start osd.2 on the serverc node, and then view the cluster OSD status.
[ceph: root@clienta /]# ssh admin@serverc sudo systemctl start \
ceph-ff97a876-1fd2-11ec-8258-52540000fa0c@osd.2.service
admin@serverc's password: redhat
[ceph: root@clienta /]# ceph osd stat
9 osds: 9 up (since 6s), 9 in (since 47m); epoch: e298
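This exercise manages the OSD through its systemd unit so that the daemon-to-unit mapping is visible. Because the cluster is deployed with cephadm, an equivalent, optional alternative is to let the orchestrator stop and start the daemon from the cephadm shell, without logging in to the OSD host:
[ceph: root@clienta /]# ceph orch daemon stop osd.2
[ceph: root@clienta /]# ceph orch daemon start osd.2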
View the log files for the osd.2 daemon.
Filter the output to view only systemd events.
[ceph: root@clienta /]# ssh admin@serverc sudo journalctl \
-u ceph-ff97a876-1fd2-11ec-8258-52540000fa0c@osd.2.service | grep systemd
admin@serverc's password: redhat
...output omitted...
Sep 30 01:57:36 serverc.lab.example.com systemd[1]: Stopping Ceph osd.2 for ff97a876-1fd2-11ec-8258-52540000fa0c...
Sep 30 01:57:37 serverc.lab.example.com systemd[1]: ceph-ff97a876-1fd2-11ec-8258-52540000fa0c@osd.2.service: Succeeded.
Sep 30 01:57:37 serverc.lab.example.com systemd[1]: Stopped Ceph osd.2 for ff97a876-1fd2-11ec-8258-52540000fa0c.
Sep 30 02:00:12 serverc.lab.example.com systemd[1]: Starting Ceph osd.2 for ff97a876-1fd2-11ec-8258-52540000fa0c...
Sep 30 02:00:13 serverc.lab.example.com systemd[1]: Started Ceph osd.2 for ff97a876-1fd2-11ec-8258-52540000fa0c.
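journalctl can also follow a unit's log in real time, which is convenient when you restart a daemon and want to watch it come back up. This optional variant adds the -f option to the same command; press Ctrl+C to stop following:
[admin@serverc ~]$ sudo journalctl -f \
-u ceph-ff97a876-1fd2-11ec-8258-52540000fa0c@osd.2.service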
Mark the osd.4 daemon as being out of the cluster and observe how it affects the cluster status.
Then, mark the osd.4 daemon as being in the cluster again.
Mark the osd.4 daemon as being out of the cluster.
Verify that the osd.4 daemon is marked out of the cluster and notice that the OSD's REWEIGHT value is now 0.
[ceph: root@clienta /]# ceph osd out 4
marked out osd.4.
[ceph: root@clienta /]# ceph osd stat
9 osds: 9 up (since 2m), 8 in (since 3s); epoch: e312
[ceph: root@clienta /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         0.08817  root default
-3         0.02939      host serverc
 0    hdd  0.00980          osd.0         up   1.00000  1.00000
 1    hdd  0.00980          osd.1         up   1.00000  1.00000
 2    hdd  0.00980          osd.2         up   1.00000  1.00000
-7         0.02939      host serverd
 3    hdd  0.00980          osd.3         up   1.00000  1.00000
 5    hdd  0.00980          osd.5         up   1.00000  1.00000
 7    hdd  0.00980          osd.7         up   1.00000  1.00000
-5         0.02939      host servere
 4    hdd  0.00980          osd.4         up         0  1.00000
 6    hdd  0.00980          osd.6         up   1.00000  1.00000
 8    hdd  0.00980          osd.8         up   1.00000  1.00000
Ceph re-creates the object replicas that were stored on the osd.4 daemon on other OSDs in the cluster.
You can trace the recovery of these objects with the ceph status or ceph -w commands.
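For example, ceph status typically shows recovery activity in its io and progress sections, and ceph -w streams cluster events until you press Ctrl+C. These commands are optional here; in this small lab cluster the recovery usually finishes within a few seconds:
[ceph: root@clienta /]# ceph status
[ceph: root@clienta /]# ceph -w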
Mark the osd.4 daemon as being in again.
[ceph: root@clienta /]# ceph osd in 4
marked in osd.4.
You can mark an OSD as out even though it is still running (up).
The in or out status does not correlate to an OSD's running state.
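The PRI-AFF column in the ceph osd tree output shows each OSD's primary affinity, which controls how likely CRUSH is to select that OSD as the primary for its placement groups. This exercise does not change the value, but if you want to experiment, a minimal example is shown below; osd.2 and the value 0.5 are arbitrary choices, and the second command restores the default of 1.0:
[ceph: root@clienta /]# ceph osd primary-affinity osd.2 0.5
[ceph: root@clienta /]# ceph osd primary-affinity osd.2 1.0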
Analyze the current utilization and number of PGs on the OSD 2 daemon.
[ceph: root@clienta /]# ceph osd df tree
ID  CLASS  WEIGHT   REWEIGHT  SIZE    RAW USE  DATA     OMAP    META     AVAIL   %USE  VAR   PGS  STATUS  TYPE NAME
-1         0.08817         -  90 GiB  256 MiB   36 MiB  56 KiB  220 MiB  90 GiB  0.28  1.00    -          root default
-3         0.02939         -  30 GiB   71 MiB   12 MiB  20 KiB   59 MiB  30 GiB  0.23  0.83    -          host serverc
 0    hdd  0.00980   1.00000  10 GiB   26 MiB  4.0 MiB  11 KiB   22 MiB  10 GiB  0.25  0.91   68      up      osd.0
 1    hdd  0.00980   1.00000  10 GiB   29 MiB  4.0 MiB   6 KiB   25 MiB  10 GiB  0.28  1.01   74      up      osd.1
 2    hdd  0.00980   1.00000  10 GiB   16 MiB  3.9 MiB   3 KiB   12 MiB  10 GiB  0.16  0.57   59      up      osd.2
...output omitted...
                       TOTAL  90 GiB  256 MiB   36 MiB  61 KiB  220 MiB  90 GiB  0.28
MIN/MAX VAR: 0.57/1.48  STDDEV: 0.06
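The ceph osd df tree command reports utilization per OSD. For a complementary, cluster-wide and per-pool view of the same storage (optional here), ceph df summarizes raw capacity and per-pool usage:
[ceph: root@clienta /]# ceph df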
View the placement group status for the cluster. Create a test pool and a test object. Find the placement group to which the test object belongs and analyze that placement group's status.
View the placement group status for the cluster. Examine the PG states. Your output may be different in your lab environment.
[ceph: root@clienta /]# ceph pg stat
201 pgs: 201 active+clean; 8.6 KiB data, 261 MiB used, 90 GiB / 90 GiB avail; 511 B/s rd, 0 op/s
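The one-line summary groups placement groups by state. If you want to inspect individual PGs (optional), ceph pg dump pgs_brief lists every PG with its state and acting set; piping to head keeps the output short:
[ceph: root@clienta /]# ceph pg dump pgs_brief | head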
Create a pool called testpool and an object called testobject containing the /etc/ceph/ceph.conf file.
[ceph: root@clienta /]# ceph osd pool create testpool 32 32
pool 'testpool' created
[ceph: root@clienta /]# rados -p testpool put testobject /etc/ceph/ceph.conf
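To confirm that the object was stored (optional), rados can list the objects in the pool and report the object's size and modification time:
[ceph: root@clienta /]# rados -p testpool ls
[ceph: root@clienta /]# rados -p testpool stat testobject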
Find the placement group of the testobject object in the testpool pool and analyze its status.
Use the placement group information from your lab environment in the query.
[ceph: root@clienta /]# ceph osd map testpool testobject
osdmap e332 pool 'testpool' (9) object 'testobject' -> pg 9.98824931 (9.11) -> up ([8,2,5], p8) acting ([8,2,5], p8)
[ceph: root@clienta /]# ceph pg 9.11 query
{
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "state": "active+clean",
    "epoch": 334,
    "up": [
        8,
        2,
        5
    ],
    "acting": [
        8,
        2,
        5
    ],
    "acting_recovery_backfill": [
        "2",
        "5",
        "8"
    ],
    "info": {
        "pgid": "9.11",
...output omitted...
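You can also approach the mapping from the other direction (optional): list the placement groups that belong to the testpool pool, or the placement groups hosted by a given OSD. The PG from the output above appears in both listings because osd.2 is part of its acting set:
[ceph: root@clienta /]# ceph pg ls-by-pool testpool
[ceph: root@clienta /]# ceph pg ls-by-osd osd.2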
List the OSD and cluster daemon versions. These commands are useful to run after cluster upgrades.
List all cluster daemon versions.
[ceph: root@clienta /]# ceph versions
{
"mon": {
"ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)": 4
},
"mgr": {
"ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)": 4
},
"osd": {
"ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)": 9
},
"mds": {
"ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)": 3
},
"rgw": {
"ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)": 2
},
"overall": {
"ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)": 22
}
}
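Because this cluster is managed by cephadm, the orchestrator provides a related per-daemon view. As an optional cross-check after an upgrade, ceph orch ps lists each managed daemon with its status, version, and container image; the --daemon-type option narrows the listing to OSDs:
[ceph: root@clienta /]# ceph orch ps --daemon-type osd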
List all OSD versions.
[ceph: root@clienta /]# ceph tell osd.* version
osd.0: {
"version": "16.2.0-117.el8cp",
"release": "pacific",
"release_type": "stable"
}
osd.1: {
"version": "16.2.0-117.el8cp",
"release": "pacific",
"release_type": "stable"
}
osd.2: {
"version": "16.2.0-117.el8cp",
"release": "pacific",
"release_type": "stable"
}
...output omitted...
View the balancer status.
[ceph: root@clienta /]# ceph balancer status
{
"active": true,
"last_optimize_duration": "0:00:00.001072",
"last_optimize_started": "Thu Sep 30 06:07:53 2021",
"mode": "upmap",
"optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
"plans": []
}
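The output shows that the balancer is active, uses the upmap mode, and has no further optimizations to make. This exercise only queries its status, but related commands are available from the same shell (optional here): ceph balancer eval scores the current data distribution, and ceph balancer off and ceph balancer on toggle the module:
[ceph: root@clienta /]# ceph balancer eval
[ceph: root@clienta /]# ceph balancer off
[ceph: root@clienta /]# ceph balancer on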
Return to workstation as the student user.
[ceph: root@clienta /]# exit
[admin@clienta ~]$ exit
[student@workstation ~]$
This concludes the guided exercise.