In this lab, you will perform common administration and maintenance operations on a Red Hat Ceph Storage cluster.
Outcomes
You should be able to locate the Ceph Dashboard URL, set an OSD out and in, watch cluster events, find and start a down OSD, find an object's PG location and state, and view the balancer status.
As the student user on the workstation machine, use the lab command to prepare your system for this lab.
[student@workstation ~]$ lab start cluster-review
This command confirms that the hosts required for this exercise are accessible.
Procedure 11.3. Instructions
Log in to clienta as the admin user.
Verify that the dashboard module is enabled.
Find the dashboard URL of the active MGR.
Log in to clienta as the admin user and use sudo to run the cephadm shell.
[student@workstation ~]$ ssh admin@clienta
[admin@clienta ~]$ sudo cephadm shell
[ceph: root@clienta /]#
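As a side note, if you only need to run a single command rather than an interactive session, cephadm can execute it directly and return to the host shell. This optional check is not required for the lab:

[admin@clienta ~]$ sudo cephadm shell -- ceph -s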
Verify that the dashboard module is enabled.
[ceph: root@clienta /]# ceph mgr module ls | more
{
    ...output omitted...
    "enabled_modules": [
        "cephadm",
        "dashboard",
        "iostat",
        "prometheus",
        "restful"
    ],
    ...output omitted...
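If dashboard were missing from the enabled_modules list, you could enable it with the following command. It is shown here only for reference because the module is already enabled in this lab environment:

[ceph: root@clienta /]# ceph mgr module enable dashboard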
Find the dashboard URL of the active MGR.
[ceph: root@clienta /]# ceph mgr services
{
"dashboard": "https://172.25.250.12:8443/",
"prometheus": "http://172.25.250.12:9283/"
}

Your output might be different depending on which MGR node is active in your lab environment.
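To confirm which MGR daemon is currently active, you can also query the MGR map. The values below are illustrative; the active_name matches whichever MGR is active in your environment:

[ceph: root@clienta /]# ceph mgr stat
{
    "epoch": 75,
    "available": true,
    "active_name": "serverc.lab.example.com.aiqepd",
    "num_standby": 3
}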
You receive an alert that an OSD is down. Identify which OSD is down and the node on which it runs, then start the OSD.
Verify cluster health.
[ceph: root@clienta /]# ceph health detail
HEALTH_WARN 1 osds down; Degraded data redundancy: 72/666 objects degraded (10.811%), 14 pgs degraded, 50 pgs undersized
[WRN] OSD_DOWN: 1 osds down
    osd.6 (root=default,host=servere) is down
[WRN] PG_DEGRADED: Degraded data redundancy: 72/666 objects degraded (10.811%), 14 pgs degraded, 50 pgs undersized
    pg 2.0 is stuck undersized for 61s, current state active+undersized, last acting [3,0]
    pg 2.1 is stuck undersized for 61s, current state active+undersized, last acting [2,3]
    pg 2.6 is stuck undersized for 61s, current state active+undersized, last acting [1,3]
    pg 2.7 is stuck undersized for 61s, current state active+undersized, last acting [3,2]
...output omitted...
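For a one-line summary of how many OSDs are up and in, you can also run ceph osd stat. The counts and epoch shown here are illustrative:

[ceph: root@clienta /]# ceph osd stat
9 osds: 8 up (since 2m), 9 in (since 10h); epoch: e215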
Identify which OSD is down.
[ceph: root@clienta /]# ceph osd tree | grep -i down
 6    hdd  0.00980          osd.6      down   1.00000  1.00000

Identify on which host the down OSD runs.
[ceph: root@clienta /]# ceph osd find osd.6 | grep host
"host": "servere.lab.example.com",
"host": "servere",Start the OSD.
Start the OSD.

[ceph: root@clienta /]# ceph orch daemon start osd.6
Scheduled to start osd.6 on host 'servere.lab.example.com'

Verify that the OSD is up.
[ceph: root@clienta /]# ceph osd tree | grep osd.6
 6    hdd  0.00980          osd.6        up   1.00000  1.00000
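You can also confirm the daemon state through the orchestrator. The status and timing values shown here are illustrative:

[ceph: root@clienta /]# ceph orch ps | grep osd.6
osd.6    servere.lab.example.com    running (30s)    ...output omitted...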
Set the OSD 5 daemon to the out state and verify that all data has been migrated off of the OSD.

Set the OSD 5 daemon to the out state.
[ceph: root@clienta /]# ceph osd out 5
marked out osd.5.

Verify that all PGs have been migrated off of the OSD 5 daemon. It will take some time for the data migration to finish. Press Ctrl+C to exit the command.
[ceph: root@clienta /]# ceph -w
  cluster:
    id:     2ae6d05a-229a-11ec-925e-52540000fa0c
    health: HEALTH_WARN
            Reduced data availability: 5 pgs peering
            Degraded data redundancy: 1/663 objects degraded (0.151%), 1 pg degraded

  services:
    mon: 4 daemons, quorum serverc.lab.example.com,clienta,serverd,servere (age 9h)
    mgr: serverc.lab.example.com.aiqepd(active, since 9h), standbys: serverd.klrkci, servere.kjwyko, clienta.nncugs
    osd: 9 osds: 9 up (since 46s), 8 in (since 7s); 4 remapped pgs
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    pools:   5 pools, 105 pgs
    objects: 221 objects, 4.9 KiB
    usage:   235 MiB used, 80 GiB / 80 GiB avail
    pgs:     12.381% pgs not active
             1/663 objects degraded (0.151%)
             92 active+clean
             10 remapped+peering
              2 activating
              1 activating+degraded

  io:
    recovery: 199 B/s, 0 objects/s

  progress:
    Global Recovery Event (2s)
      [............................]

2021-03-28 21:23:25.557849 mon.serverc [WRN] Health check failed: Reduced data availability: 1 pg inactive, 1 pg peering (PG_AVAILABILITY)
2021-03-28 21:23:25.557884 mon.serverc [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 36/2163 objects degraded (1.664%), 5 pgs degraded)
2021-03-28 21:23:31.741476 mon.serverc [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 1 pg inactive, 1 pg peering)
2021-03-28 21:23:31.741495 mon.serverc [INF] Cluster is now healthy
...output omitted...

[ceph: root@clienta /]# ceph osd df
ID  CLASS  ...output omitted...  AVAIL   %USE  VAR   PGS  STATUS
 0    hdd  ...output omitted...  10 GiB  0.38  1.29   34      up
 1    hdd  ...output omitted...  10 GiB  0.33  1.13   42      up
 2    hdd  ...output omitted...  10 GiB  0.30  1.02   29      up
 3    hdd  ...output omitted...  10 GiB  0.28  0.97   58      up
 5    hdd  ...output omitted...     0 B     0     0     0      up
 7    hdd  ...output omitted...  10 GiB  0.29  0.99   47      up
 4    hdd  ...output omitted...  10 GiB  0.33  1.13   34      up
 6    hdd  ...output omitted...  10 GiB  0.10  0.36   39      up
 8    hdd  ...output omitted...  10 GiB  0.32  1.12   32      up
            TOTAL ...output omitted...  80 GiB  0.29
MIN/MAX VAR: 0.36/1.29  STDDEV: 0.08
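As an optional extra check, you could list the PGs that still map to the OSD 5 daemon. Once the migration finishes, the command should return no PGs:

[ceph: root@clienta /]# ceph pg ls-by-osd osd.5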
Set the OSD 5 daemon to the in state and verify that PGs have been placed onto it.
Set the OSD 5 daemon to the in state.
[ceph: root@clienta /]# ceph osd in 5
marked in osd.5.

Verify that PGs have been placed onto the OSD 5 daemon.
[ceph: root@clienta /]# ceph osd df
ID  CLASS  ...output omitted...  AVAIL   %USE  VAR   PGS  STATUS
 0    hdd  ...output omitted...  10 GiB  0.23  0.76   34      up
 1    hdd  ...output omitted...  10 GiB  0.37  1.26   42      up
 2    hdd  ...output omitted...  10 GiB  0.34  1.15   29      up
 3    hdd  ...output omitted...  10 GiB  0.29  0.99   39      up
 5    hdd  ...output omitted...  10 GiB  0.37  1.24   31      up
 7    hdd  ...output omitted...  10 GiB  0.30  1.00   35      up
 4    hdd  ...output omitted...  10 GiB  0.33  1.12   34      up
 6    hdd  ...output omitted...  10 GiB  0.11  0.37   39      up
 8    hdd  ...output omitted...  10 GiB  0.33  1.11   32      up
            TOTAL                90 GiB  0.30
MIN/MAX VAR: 0.37/1.26  STDDEV: 0.08
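You can also inspect the OSD map entry for the daemon to confirm that it is both up and in. The epoch values shown here are illustrative:

[ceph: root@clienta /]# ceph osd dump | grep '^osd.5 '
osd.5 up   in  weight 1 up_from 29 up_thru 212 ...output omitted...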
Display the balancer status.
[ceph: root@clienta /]# ceph balancer status
{
"active": true,
"last_optimize_duration": "0:00:00.000647",
"last_optimize_started": "Thu Oct 14 01:38:13 2021",
"mode": "upmap",
"optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
"plans": []
}
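If the balancer were not active in your environment, you could enable it and select the upmap mode with commands such as the following. They are shown for reference only because the lab cluster already has the balancer active:

[ceph: root@clienta /]# ceph balancer mode upmap
[ceph: root@clienta /]# ceph balancer on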
Identify the PG for object data1 in the pool1 pool. Query the PG and find its state.
Identify the PG for object data1 in the pool1 pool.
[ceph: root@clienta /]# ceph osd map pool1 data1
osdmap e218 pool 'pool1' (6) object 'data1' -> pg 6.d4f4553c (6.1c) -> up ([8,2,3], p8) acting ([8,2,3], p8)
In this example, the PG is 6.1c.
Use the PG value in the output displayed in your lab environment.
Query the PG and view its state and primary OSD.
[ceph: root@clienta /]# ceph pg 6.1c query
{
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "state": "active+clean",
    "epoch": 218,
    "up": [
        8,
        2,
        3
    ],
    "acting": [
        8,
        2,
        3
    ],
    "acting_recovery_backfill": [
        "2",
        "3",
        "8"
    ],
    "info": {
        "pgid": "6.1c",
...output omitted...
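For a quicker view that shows only the up and acting sets, you could also map the PG directly, substituting the PG ID from your own output. The epoch shown here is illustrative:

[ceph: root@clienta /]# ceph pg map 6.1c
osdmap e218 pg 6.1c (6.1c) -> up [8,2,3] acting [8,2,3]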
Return to workstation as the student user.
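For example, exit the cephadm shell and then the SSH session:

[ceph: root@clienta /]# exit
[admin@clienta ~]$ exit
[student@workstation ~]$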
This concludes the lab.