Guided Exercise: Optimizing Red Hat Ceph Storage Performance

In this exercise, you will run performance analysis tools and configure the Red Hat Ceph Storage cluster using the results.

Outcomes

You should be able to run performance analysis tools and configure the Red Hat Ceph Storage cluster using the results.

Important

Do you need to reset your environment before performing this exercise?

If you performed the practice exercises in the Managing a Red Hat Ceph Storage Cluster chapter, but have not reset your environment to the default classroom cluster since that chapter, then you must reset your environment before executing the lab start command. All remaining chapters use the default Ceph cluster provided in the initial classroom environment.

As the student user on the workstation machine, use the lab command to prepare your system for this exercise.

This command ensures that the lab environment is available for the exercise.

[student@workstation ~]$ lab start tuning-optimize

Procedure 12.1. Instructions

  • Create a new pool called testpool and change the PG autoscale mode to off. Reduce the number of PGs, and then check the recommended number of PGs. Change the PG autoscale mode to warn and check the health warning message.

  • Modify the primary affinity settings on an OSD so that it is less likely to be selected as primary for placement groups.

  • Using rados bench, the benchmarking tool built into Ceph, measure the performance of the Ceph cluster at the pool level.

  • The clienta node is set up as your admin node server.

  • The admin user has SSH key-based access from the clienta node to the admin account on all cluster nodes, and has passwordless sudo access to the root and ceph accounts on all cluster nodes.

  • The serverc, serverd, and servere nodes comprise an operational 3-node Ceph cluster. All three nodes operate as a MON, a MGR, and an OSD host with three 10 GB collocated OSDs.

Warning

The parameters used in this exercise are appropriate for this lab environment. In production, these parameters should only be modified by qualified Ceph administrators, or as directed by Red Hat Support.

  1. Log in to clienta as the admin user. Create a new pool called testpool, set the PG autoscale mode to off, reduce the number of PGs, and then set the mode to warn and view the health warning messages. Set the PG autoscale mode to on again, and then verify that the number of PGs and the cluster health return to normal.

    1. Connect to clienta as the admin user and use sudo to run the cephadm shell.

      [student@workstation ~]$ ssh admin@clienta
      [admin@clienta ~]$ sudo cephadm shell
      [ceph: root@clienta /]#
    2. Create a new pool called testpool with the default number of PGs.

      [ceph: root@clienta /]# ceph osd pool create testpool
      pool 'testpool' created
    3. Verify the cluster health status and the information from the PG autoscaler. The autoscaler mode for the new testpool pool should be on, and the number of PGs should be 32.

      [ceph: root@clienta /]# ceph health detail
      HEALTH_OK
      [ceph: root@clienta /]# ceph osd pool autoscale-status
      POOL                     SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
      device_health_metrics      0                 3.0        92124M  0.0000                                  1.0       1              on
      .rgw.root               1323                 3.0        92124M  0.0000                                  1.0      32              on
      default.rgw.log         3702                 3.0        92124M  0.0000                                  1.0      32              on
      default.rgw.control        0                 3.0        92124M  0.0000                                  1.0      32              on
      default.rgw.meta           0                 3.0        92124M  0.0000                                  4.0       8              on
      testpool                   0                 3.0        92124M  0.0000                                  1.0      32              on
    4. Set the PG autoscale option to off for the pool testpool. Reduce the number of PGs to 8. Verify the autoscale recommended number of PGs, which should be 32. Verify that the cluster health is OK.

      [ceph: root@clienta /]# ceph osd pool set testpool pg_autoscale_mode off
      set pool 6 pg_autoscale_mode to off
      [ceph: root@clienta /]# ceph osd pool set testpool pg_num 8
      set pool 6 pg_num to 8
      [ceph: root@clienta /]# ceph osd pool autoscale-status
      POOL                     SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
      device_health_metrics      0                 3.0        92124M  0.0000                                  1.0       1              on
      .rgw.root               1323                 3.0        92124M  0.0000                                  1.0      32              on
      default.rgw.log         3702                 3.0        92124M  0.0000                                  1.0      32              on
      default.rgw.control        0                 3.0        92124M  0.0000                                  1.0      32              on
      default.rgw.meta           0                 3.0        92124M  0.0000                                  4.0       8              on
      testpool                   0                 3.0        92124M  0.0000                                  1.0       8          32  off
      [ceph: root@clienta /]# ceph health detail
      HEALTH_OK
    5. Set the PG autoscale option to warn for the pool testpool. Verify that cluster health status is now WARN, because the recommended number of PGs is higher than the current number of PGs. It might take several minutes before the cluster shows the health warning message.

      [ceph: root@clienta /]# ceph osd pool set testpool pg_autoscale_mode warn
      set pool 6 pg_autoscale_mode to warn
      [ceph: root@clienta /]# ceph health detail
      HEALTH_WARN 1 pools have too few placement groups
      [WRN] POOL_TOO_FEW_PGS: 1 pools have too few placement groups
          Pool testpool has 8 placement groups, should have 32
    6. Enable the PG autoscale option and verify that the number of PGs has been increased automatically to 32, the recommended value. This increase might take a few minutes to display.

      [ceph: root@clienta /]# ceph osd pool set testpool pg_autoscale_mode on
      set pool 6 pg_autoscale_mode to on
      [ceph: root@clienta /]# ceph osd pool autoscale-status
      POOL                     SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
      device_health_metrics      0                 3.0        92124M  0.0000                                  1.0       1              on
      .rgw.root               1323                 3.0        92124M  0.0000                                  1.0      32              on
      default.rgw.log         3702                 3.0        92124M  0.0000                                  1.0      32              on
      default.rgw.control        0                 3.0        92124M  0.0000                                  1.0      32              on
      default.rgw.meta           0                 3.0        92124M  0.0000                                  4.0       8              on
      testpool                   0                 3.0        92124M  0.0000                                  1.0      32              on
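
    For reference only, and not part of this exercise: the TARGET SIZE and TARGET RATIO columns in the autoscale-status output come from optional per-pool hints. If you expect a pool to grow to a known size, or to consume a known fraction of the cluster, you can give the autoscaler that information so it chooses a PG count ahead of time. A minimal sketch, setting one hint or the other on testpool:

      [ceph: root@clienta /]# ceph osd pool set testpool target_size_bytes 10G
      [ceph: root@clienta /]# ceph osd pool set testpool target_size_ratio 0.2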
  2. Modify the primary affinity settings on an OSD so that it is less likely to be selected as primary for placement groups. Set the primary affinity for OSD 7 to 0.

    1. Modify the primary affinity settings for OSD 7.

      [ceph: root@clienta /]# ceph osd primary-affinity 7 0
      set osd.7 primary-affinity to 0 (802)
    2. Verify the primary affinity settings for each OSD.

      [ceph: root@clienta /]# ceph osd tree
      ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
      -1         0.08817  root default
      -3         0.02939      host serverc
       0    hdd  0.00980          osd.0         up   1.00000  1.00000
       1    hdd  0.00980          osd.1         up   1.00000  1.00000
       2    hdd  0.00980          osd.2         up   1.00000  1.00000
      -5         0.02939      host serverd
       3    hdd  0.00980          osd.3         up   1.00000  1.00000
       5    hdd  0.00980          osd.5         up   1.00000  1.00000
       7    hdd  0.00980          osd.7         up   1.00000        0
      -7         0.02939      host servere
       4    hdd  0.00980          osd.4         up   1.00000  1.00000
       6    hdd  0.00980          osd.6         up   1.00000  1.00000
       8    hdd  0.00980          osd.8         up   1.00000  1.00000
    3. Verify the primary affinity settings for OSDs in the cluster.

      [ceph: root@clienta /]# ceph osd dump | grep affinity
      osd.7 up   in  weight 1 primary_affinity 0 up_from 45 up_thru 92 down_at 0 last_clean_interval [0,0) [v2:172.25.250.13:6816/3402621793,v1:172.25.250.13:6817/3402621793] [v2:172.25.249.13:6818/3402621793,v1:172.25.249.13:6819/3402621793] exists,up ebc2280d-1321-458d-a161-2250d2b4f32e
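
    For reference: primary affinity accepts values from 0.0 to 1.0. A value of 0 means the OSD is not chosen as primary when any other candidate is available, 1.0 (the default) applies no penalty, and intermediate values reduce, rather than remove, the chance of selection. A minimal sketch, not part of this exercise:

      [ceph: root@clienta /]# ceph osd primary-affinity 7 0.5   # less likely, but still possible
      [ceph: root@clienta /]# ceph osd primary-affinity 7 1.0   # restore the default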
  3. Create a pool called benchpool and initialize it as an RBD pool.

    1. Create an OSD pool called benchpool.

      [ceph: root@clienta /]# ceph osd pool create benchpool 100 100
      pool 'benchpool' created
    2. Use the rbd pool init command to initialize a custom pool to store RBD images. This step could take several minutes to complete.

      [ceph: root@clienta /]# rbd pool init benchpool
  4. Open a second terminal and log in to the clienta node as the admin user. Use the first terminal to generate a workload and use the second terminal to collect metrics. Run a write test to the RBD pool benchpool. This might take several minutes to complete.

    Note

    This step requires sufficient time to complete the write OPS for the test. Be prepared to run the ceph osd perf command in the second terminal immediately after starting the rados bench command in the first terminal.

    1. Open a second terminal. Log in to clienta as the admin user and use sudo to run the cephadm shell.

      [student@workstation ~]$ ssh admin@clienta
      [admin@clienta ~]$ sudo cephadm shell
      [ceph: root@clienta /]#
    2. In the first terminal, generate the workload.

      [ceph: root@clienta /]# rados -p benchpool bench 30 write
      hints = 1
      Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 30 seconds or 0 objects
      Object prefix: benchmark_data_clienta.lab.example.com_50
        sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
          0       0         0         0         0         0           -           0
          1      16        58        42   167.988       168    0.211943    0.322053
          2      16       112        96   191.982       216    0.122236    0.288171
          3      16       162       146   194.643       200    0.279456    0.300593
          4      16       217       201   200.975       220    0.385703    0.292009
      ...output omitted...
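
      For later reference, and not part of this exercise: rados bench removes the objects that it wrote when the write test finishes. To follow a write test with sequential or random read tests, which need those objects to still exist, you could keep them with --no-cleanup and remove them afterwards with the cleanup subcommand:

      [ceph: root@clienta /]# rados -p benchpool bench 30 write --no-cleanup
      [ceph: root@clienta /]# rados -p benchpool bench 30 seq
      [ceph: root@clienta /]# rados -p benchpool bench 30 rand
      [ceph: root@clienta /]# rados -p benchpool cleanup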
    3. In the second terminal, collect performance metrics. The commit_latency data is the time for the OSD to write and commit the operation to its journal. The apply_latency data is the time to apply the write operation to the OSD file system back end. Note the OSD ID where the heavy load is occurring. Your OSD output might be different in your lab environment.

      [ceph: root@clienta /]# ceph osd perf
      osd  commit_latency(ms)  apply_latency(ms)
        7                  94                 94
        8                 117                117
        6                 195                195
        1                  73                 73
        0                  72                 72
        2                  80                 80
        3                  72                 72
        4                 135                135
        5                  59                 59

      Note

      If no data displays, then use the first terminal to generate the workload again. The metric collection must run while the bench tool is generating workload.

    4. In the second terminal, use the OSD ID that you noted in the previous step to locate the node that hosts the OSD with high latency. Determine the name of that node.

      [ceph: root@clienta /]# ceph osd tree
      ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
      -1         0.08817  root default
      -3         0.02939      host serverc
       0    hdd  0.00980          osd.0         up   1.00000  1.00000
       1    hdd  0.00980          osd.1         up   1.00000  1.00000
       2    hdd  0.00980          osd.2         up   1.00000  1.00000
      -5         0.02939      host serverd
       3    hdd  0.00980          osd.3         up   1.00000  1.00000
       5    hdd  0.00980          osd.5         up   1.00000  1.00000
       7    hdd  0.00980          osd.7         up   1.00000        0
      -7         0.02939      host servere
       4    hdd  0.00980          osd.4         up   1.00000  1.00000
       6    hdd  0.00980          osd.6         up   1.00000  1.00000
       8    hdd  0.00980          osd.8         up   1.00000  1.00000
  5. Evaluate the OSD performance counters.

    1. Dump the performance counters for the OSD that showed high latency (osd.6 in this example). Redirect the output of the command to a file called perfdump.txt.

      [ceph: root@clienta /]# ceph tell osd.6 perf dump > perfdump.txt
    2. In the perfdump.txt file, locate the section starting with osd:. Note the op_latency and subop_latency counters, which report the latency of read and write operations and of their suboperations. Also note the op_r_latency and op_w_latency counters, which report the read and write latencies separately.

      Each counter includes avgcount and sum fields that are required to calculate the exact counter value. Calculate the value of the op_latency and subop_latency counters by using the formula counter = counter.sum / counter.avgcount, as in the worked example after the output below.

      [ceph: root@clienta /]# cat perfdump.txt | grep -A88 '"osd"'
          "osd": {
              "op_wip": 0,
              "op": 3664,
              "op_in_bytes": 994050158,
              "op_out_bytes": 985,
              "op_latency": {
                  "avgcount": 3664,
                  "sum": 73.819483299,
                  "avgtime": 0.020147238
              },
      ...output omitted...
              "op_r_latency": {
                  "avgcount": 3059,
                  "sum": 1.395967825,
                  "avgtime": 0.000456347
              },
      ...output omitted...
              "op_w_latency": {
                  "avgcount": 480,
                  "sum": 71.668254827,
                  "avgtime": 0.149308864
              },
      ...output omitted...
              "op_rw_latency": {
                  "avgcount": 125,
                  "sum": 0.755260647,
                  "avgtime": 0.006042085
              },
      ...output omitted...
              "subop_latency": {
                  "avgcount": 1587,
                  "sum": 59.679174303,
                  "avgtime": 0.037605024
              },
      ...output omitted...
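
      For example, applying the formula to the op_latency counter shown above gives 73.819483299 / 3664 ≈ 0.0201 seconds, which matches the avgtime field that Ceph reports alongside each counter.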
    3. In the first terminal, generate the workload again by using the rados bench write command.

      [ceph: root@clienta /]# rados -p benchpool bench 30 write
      ...output omitted...
    4. In the second terminal, capture the counters again and compute the latency over the interval between the two captures by using the following formulas (see the sketch after the note below):

      • op_latency_sum_t2 - op_latency_sum_t1 = diff_sum

      • op_latency_avgcount_t2 - op_latency_avgcount_t1 = diff_avgcount

      • op_latency = diff_sum / diff_avgcount

      [ceph: root@clienta /]# ceph tell osd.6 perf dump > perfdump.txt
      [ceph: root@clienta /]# cat perfdump.txt | grep -A88 '"osd"'
      ...output omitted...

      Note

      The counters are cumulative totals, accumulated since the OSD daemon started, and each run of the command returns the totals at that moment.
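
      A minimal sketch of that calculation, assuming python3 is available inside the cephadm shell container and that the two captures are kept in separate files, here perfdump.txt and an illustrative second file named perfdump2.txt:

      [ceph: root@clienta /]# ceph tell osd.6 perf dump > perfdump2.txt
      [ceph: root@clienta /]# python3 -c 'import json; t1=json.load(open("perfdump.txt"))["osd"]["op_latency"]; t2=json.load(open("perfdump2.txt"))["osd"]["op_latency"]; d=t2["avgcount"]-t1["avgcount"]; print((t2["sum"]-t1["sum"])/d if d else 0.0)'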

  6. View information about the last operations processed by an OSD.

    1. In the second terminal, dump the information maintained in memory for the most recently processed operations. Redirect the dump to the historicdump.txt file. By default, each OSD records information on the last 20 operations over 600 seconds. View the historicdump.txt file contents.

      [ceph: root@clienta /]# ceph tell osd.6 dump_historic_ops > historicdump.txt
      [ceph: root@clienta /]# head historicdump.txt
      {
          "size": 20,
          "duration": 600,
          "ops": [
              {
                  "description": "osd_op(client.44472.0:479 7.14 7:2a671f00:::benchmark_data_clienta.lab.example.com_92_object478:head [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4194304] snapc 0=[] ondisk+write+known_if_redirected e642)",
      ...output omitted...
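
      Each entry in the ops array also includes timing information, such as a duration field in recent Ceph releases. As a minimal sketch, assuming python3 is available inside the cephadm shell container, you could list the slowest recorded operations first:

      [ceph: root@clienta /]# python3 -c 'import json; ops=json.load(open("historicdump.txt"))["ops"]; [print(round(o.get("duration", 0), 4), o["description"][:60]) for o in sorted(ops, key=lambda o: o.get("duration", 0), reverse=True)[:5]]'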
    2. Update the osd_op_history_size and osd_op_history_duration parameters for osd.6. Set the size to 30 and the duration to 900. Verify that the change was successful.

      [ceph: root@clienta /]# ceph tell osd.6 config set osd_op_history_size 30
      {
           "success": "osd_op_history_size = '30' "
      }
      [ceph: root@clienta /]# ceph tell osd.6 config set osd_op_history_duration 900
      {
          "success": "osd_op_history_duration = '900' "
      }
      [ceph: root@clienta /]# ceph tell osd.6 dump_historic_ops > historicops.txt
      [ceph: root@clienta /]# head -n 3 historicops.txt
      {
          "size": 30,
          "duration": 900,
    3. Restore the default runtime values of the osd_op_history_size and osd_op_history_duration parameters on all OSDs. Verify that the change was successful.

      [ceph: root@clienta /]# ceph tell osd.* config set osd_op_history_size 20
      osd.0: {
          "success": "osd_op_history_size = '20' "
      }
      ...output omitted...
      osd.8: {
          "success": "osd_op_history_size = '20' "
      }
      [ceph: root@clienta /]# ceph tell osd.* config set osd_op_history_duration 600
      osd.0: {
          "success": "osd_op_history_duration = '600' "
      }
      ...output omitted...
      osd.8: {
          "success": "osd_op_history_duration = '600' "
      }
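
    For reference: settings applied with ceph tell ... config set are runtime-only and revert when the OSD daemons restart. To make such a change persistent, which is not part of this exercise, you could store it in the cluster's centralized configuration database instead:

      [ceph: root@clienta /]# ceph config set osd osd_op_history_size 30
      [ceph: root@clienta /]# ceph config set osd osd_op_history_duration 900
      [ceph: root@clienta /]# ceph config get osd osd_op_history_size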
  7. Exit the cephadm shell and log out of clienta in both terminals, closing the second terminal. Return to workstation as the student user.

    [ceph: root@clienta /]# exit
    [admin@clienta ~]$ exit
    [student@workstation ~]$ exit
    [ceph: root@clienta /]# exit
    [admin@clienta ~]$ exit
    [student@workstation ~]$

Finish

On the workstation machine, use the lab command to complete this exercise. This is important to ensure that resources from previous exercises do not impact upcoming exercises.

[student@workstation ~]$ lab finish tuning-optimize

This concludes the guided exercise.

Revision: cl260-5.0-29d2128