Lab: Managing a Red Hat Ceph Storage Cluster

In this lab, you will perform common administration and maintenance operations on a Red Hat Ceph Storage cluster.

Outcomes

You should be able to locate the Ceph Dashboard URL, set an OSD out and in, watch cluster events, find and start a down OSD, find an object's PG location and state, and view the balancer status.

As the student user on the workstation machine, use the lab command to prepare your system for this lab.

[student@workstation ~]$ lab start cluster-review

This command confirms that the hosts required for this exercise are accessible.

Procedure 11.3. Instructions

  1. Log in to clienta as the admin user. Verify that the dashboard module is enabled. Find the dashboard URL of the active MGR.

    1. Log in to clienta as the admin user and use sudo to run the cephadm shell.

      [student@workstation ~]$ ssh admin@clienta
      [admin@clienta ~]$ sudo cephadm shell
      [ceph: root@clienta /]#
    2. Verify that the dashboard module is enabled.

      [ceph: root@clienta /]# ceph mgr module ls | more
      {
      ...output omitted...
          "enabled_modules": [
              "cephadm",
              "dashboard",
              "iostat",
              "prometheus",
              "restful"
          ],
      ...output omitted...
    3. Find the dashboard URL of the active MGR.

      [ceph: root@clienta /]# ceph mgr services
      {
          "dashboard": "https://172.25.250.12:8443/",
          "prometheus": "http://172.25.250.12:9283/"
      }

      Note

      Your output might be different depending on which MGR node is active in your lab environment.
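
      Note

      The dashboard module is already enabled in this lab. If "dashboard" were missing from the "enabled_modules" list in your environment, you could enable it with the standard manager module command before continuing. This is an optional, hedged reference and is shown without output:

      [ceph: root@clienta /]# ceph mgr module enable dashboard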

  2. You receive an alert that an OSD is down. Identify which OSD is down and on which node it runs, then start the OSD.

    1. Verify cluster health.

      [ceph: root@clienta /]# ceph health detail
      HEALTH_WARN 1 osds down; Degraded data redundancy: 72/666 objects degraded (10.811%), 14 pgs degraded, 50 pgs undersized
      [WRN] OSD_DOWN: 1 osds down
          osd.6 (root=default,host=servere) is down
      [WRN] PG_DEGRADED: Degraded data redundancy: 72/666 objects degraded (10.811%), 14 pgs degraded, 50 pgs undersized
          pg 2.0 is stuck undersized for 61s, current state active+undersized, last acting [3,0]
          pg 2.1 is stuck undersized for 61s, current state active+undersized, last acting [2,3]
          pg 2.6 is stuck undersized for 61s, current state active+undersized, last acting [1,3]
          pg 2.7 is stuck undersized for 61s, current state active+undersized, last acting [3,2]
      ...output omitted...
    2. Identify which OSD is down.

      [ceph: root@clienta /]# ceph osd tree | grep -i down
       6    hdd  0.00980          osd.6       down   1.00000  1.00000
    3. Identify on which host the down OSD runs.

      [ceph: root@clienta /]# ceph osd find osd.6 | grep host
          "host": "servere.lab.example.com",
              "host": "servere",
    4. Start the OSD.

      [ceph: root@clienta /]# ceph orch daemon start osd.6
      Scheduled to start osd.6 on host 'servere.lab.example.com'
    5. Verify that the OSD is up.

      [ceph: root@clienta /]# ceph osd tree | grep osd.6
       6    hdd  0.00980          osd.6         up   1.00000  1.00000
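
      Note

      As an optional check, you can confirm that the orchestrator also reports the daemon as running. The ceph orch ps command lists the daemons that cephadm manages; filtering for osd.6 is shown here as a hedged example without output:

      [ceph: root@clienta /]# ceph orch ps | grep osd.6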
  3. Set the OSD 5 daemon to the out state and verify that all data has been migrated off the OSD.

    1. Set the OSD 5 daemon to the out state.

      [ceph: root@clienta /]# ceph osd out 5
      marked out osd.5.
    2. Verify that all PGs have been migrated off the OSD 5 daemon. The data migration takes some time to finish. Press Ctrl+C to exit the command.

      [ceph: root@clienta /]# ceph -w
        cluster:
          id:     2ae6d05a-229a-11ec-925e-52540000fa0c
          health: HEALTH_WARN
                  Reduced data availability: 5 pgs peering
                  Degraded data redundancy: 1/663 objects degraded (0.151%), 1 pg degraded
      
        services:
          mon: 4 daemons, quorum serverc.lab.example.com,clienta,serverd,servere (age 9h)
          mgr: serverc.lab.example.com.aiqepd(active, since 9h), standbys: serverd.klrkci, servere.kjwyko, clienta.nncugs
          osd: 9 osds: 9 up (since 46s), 8 in (since 7s); 4 remapped pgs
          rgw: 2 daemons active (2 hosts, 1 zones)
      
        data:
          pools:   5 pools, 105 pgs
          objects: 221 objects, 4.9 KiB
          usage:   235 MiB used, 80 GiB / 80 GiB avail
          pgs:     12.381% pgs not active
                   1/663 objects degraded (0.151%)
                   92 active+clean
                   10 remapped+peering
                   2  activating
                   1  activating+degraded
      
        io:
          recovery: 199 B/s, 0 objects/s
      
        progress:
          Global Recovery Event (2s)
            [............................]
      
      2021-03-28 21:23:25.557849 mon.serverc [WRN] Health check failed: Reduced data availability: 1 pg inactive, 1 pg peering (PG_AVAILABILITY)
      2021-03-28 21:23:25.557884 mon.serverc [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 36/2163 objects degraded (1.664%), 5 pgs degraded)
      2021-03-28 21:23:31.741476 mon.serverc [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 1 pg inactive, 1 pg peering)
      2021-03-28 21:23:31.741495 mon.serverc [INF] Cluster is now healthy
      ...output omitted...
      
      [ceph: root@clienta /]# ceph osd df
      ID  CLASS  ...output omitted...    AVAIL   %USE  VAR   PGS  STATUS
       0    hdd  ...output omitted...    10 GiB  0.38  1.29   34      up
       1    hdd  ...output omitted...    10 GiB  0.33  1.13   42      up
       2    hdd  ...output omitted...    10 GiB  0.30  1.02   29      up
       3    hdd  ...output omitted...    10 GiB  0.28  0.97   58      up
       5    hdd  ...output omitted...     0 B     0     0    0      up
       7    hdd  ...output omitted...    10 GiB  0.29  0.99   47      up
       4    hdd  ...output omitted...    10 GiB  0.33  1.13   34      up
       6    hdd  ...output omitted...    10 GiB  0.10  0.36   39      up
       8    hdd  ...output omitted...    10 GiB  0.32  1.12   32      up
          TOTAL  ...output omitted...    80 GiB  0.29
      
      MIN/MAX VAR: 0.36/1.29  STDDEV: 0.08
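
      Note

      Marking an OSD out removes it from data placement while the daemon stays running, which is why osd.5 still shows up in the STATUS column with 0 PGs. As an optional, hedged check before taking an OSD out of service in a production cluster, Ceph can report whether stopping the daemon would leave any PGs without enough replicas, if the command is available in your release:

      [ceph: root@clienta /]# ceph osd ok-to-stop osd.5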
  4. Set the OSD 5 daemon to the in state and verify that PGs have been placed onto it.

    1. Set the OSD 5 daemon to the in state.

      [ceph: root@clienta /]# ceph osd in 5
      marked in osd.5.
    2. Verify that PGs have been placed onto the OSD 5 daemon.

      [ceph: root@clienta /]# ceph osd df
      ID  CLASS  ...output omitted...    AVAIL   %USE  VAR   PGS  STATUS
       0    hdd  ...output omitted...    10 GiB  0.23  0.76   34      up
       1    hdd  ...output omitted...    10 GiB  0.37  1.26   42      up
       2    hdd  ...output omitted...    10 GiB  0.34  1.15   29      up
       3    hdd  ...output omitted...    10 GiB  0.29  0.99   39      up
       5    hdd  ...output omitted...    10 GiB  0.37  1.24   31      up
       7    hdd  ...output omitted...    10 GiB  0.30  1.00   35      up
       4    hdd  ...output omitted...    10 GiB  0.33  1.12   34      up
       6    hdd  ...output omitted...    10 GiB  0.11  0.37   39      up
       8    hdd  ...output omitted...    10 GiB  0.33  1.11   32      up
          TOTAL  ...output omitted...    90 GiB  0.30
      
      MIN/MAX VAR: 0.37/1.26  STDDEV: 0.08
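
      Note

      Backfill onto osd.5 can take a short time, so the PGS count for OSD 5 might keep changing for a while. As an optional check, re-run ceph osd df, or view the cluster status until it reports HEALTH_OK, to confirm that placement has settled:

      [ceph: root@clienta /]# ceph -s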
  5. Display the balancer status.

    [ceph: root@clienta /]# ceph balancer status
    {
        "active": true,
        "last_optimize_duration": "0:00:00.000647",
        "last_optimize_started": "Thu Oct 14 01:38:13 2021",
        "mode": "upmap",
        "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
        "plans": []
    }
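
    Note

    The output shows that the balancer is active and running in upmap mode. If you ever need to change this, the ceph balancer CLI also provides mode and on/off subcommands. They are listed here without output as a hedged reference and are not needed in this lab:

    [ceph: root@clienta /]# ceph balancer mode upmap
    [ceph: root@clienta /]# ceph balancer off
    [ceph: root@clienta /]# ceph balancer on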
  6. Identify the PG for object data1 in the pool1 pool. Query the PG and find its state.

    1. Identify the PG for object data1 in the pool1 pool.

      [ceph: root@clienta /]# ceph osd map pool1 data1
      osdmap e218 pool 'pool1' (6) object 'data1' -> pg 6.d4f4553c (6.1c) -> up ([8,2,3], p8) acting ([8,2,3], p8)

      Note

      In this example, the PG is 6.1c. Use the PG value in the output displayed in your lab environment.
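
      The prefix of the PG ID is the pool ID, so 6.1c belongs to pool ID 6, which the ceph osd map output also shows as pool 'pool1' (6). If you want to confirm pool IDs independently, ceph osd lspools lists each pool with its ID; shown here without output as an optional check:

      [ceph: root@clienta /]# ceph osd lspools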

    2. Query the PG and view its state and primary OSD.

      [ceph: root@clienta /]# ceph pg 6.1c query
      {
          "snap_trimq": "[]",
          "snap_trimq_len": 0,
          "state": "active+clean",
          "epoch": 218,
          "up": [
              8,
              2,
              3
          ],
          "acting": [
              8,
              2,
              3
          ],
          "acting_recovery_backfill": [
              "2",
              "3",
              "8"
          ],
          "info": {
              "pgid": "6.1c",
      
      ...output omitted...
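
      Note

      The full query output is detailed. For a quicker view, summary commands are also available: ceph pg stat prints a one-line summary of cluster-wide PG states, and ceph pg ls-by-pool lists the PGs of a single pool with their state and acting set. Both are shown here without output as a hedged reference:

      [ceph: root@clienta /]# ceph pg stat
      [ceph: root@clienta /]# ceph pg ls-by-pool pool1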
  7. Return to workstation as the student user.

    [ceph: root@clienta /]# exit
    [admin@clienta ~]$ exit
    [student@workstation ~]$

Evaluation

Grade your work by running the lab grade cluster-review command from your workstation machine. Correct any reported failures and rerun the script until successful.

[student@workstation ~]$ lab grade cluster-review

Finish

On the workstation machine, use the lab command to complete this exercise. This is important to ensure that resources from previous exercises do not impact upcoming exercises.

[student@workstation ~]$ lab finish cluster-review

This concludes the lab.
