In this exercise, you will run performance analysis tools and configure the Red Hat Ceph Storage cluster using the results.
Outcomes
You should be able to run performance analysis tools and configure the Red Hat Ceph Storage cluster using the results.
Do you need to reset your environment before performing this exercise?
If you performed the practice exercises in the Managing a Red Hat Ceph Storage Cluster chapter, but have not reset your environment to the default classroom cluster since that chapter, then you must reset your environment before executing the lab start command.
All remaining chapters use the default Ceph cluster provided in the initial classroom environment.
As the student user on the workstation machine, use the lab command to prepare your system for this exercise.
This command ensures that the lab environment is available for the exercise.
[student@workstation ~]$ lab start tuning-optimize
Procedure 12.1. Instructions
Create a new pool called testpool and change the PG autoscale mode to off.
Reduce the number of PGs, and then check the recommended number of PGs.
Change the PG autoscale mode to warn and check the health warning message.
Modify the primary affinity settings on an OSD so that it is less likely to be selected as primary for placement groups.
Use the built-in rados bench tool to measure the performance of a Ceph cluster at the pool level.
The clienta node is set up as your admin node server.
The admin user has SSH key-based access from the clienta node to the admin account on all cluster nodes, and has passwordless sudo access to the root and ceph accounts on all cluster nodes.
The serverc, serverd, and servere nodes comprise an operational 3-node Ceph cluster.
All three nodes operate as a MON, a MGR, and an OSD host with three 10 GB collocated OSDs.
The parameters used in this exercise are appropriate for this lab environment. In production, these parameters should only be modified by qualified Ceph administrators, or as directed by Red Hat Support.
Log in to clienta as the admin user.
Create a new pool called testpool, disable PG autoscaling, reduce the number of PGs, set the PG autoscale mode to warn, and view the health warning messages.
Set the PG autoscale mode to on again, and then verify the number of PGs and that cluster health is ok again.
Connect to clienta as the admin user and use sudo to run the cephadm shell.
[student@workstation ~]$ ssh admin@clienta
[admin@clienta ~]$ sudo cephadm shell
[ceph: root@clienta /]#
Create a new pool called testpool with the default number of PGs.
[ceph: root@clienta /]# ceph osd pool create testpool
pool 'testpool' created
Verify the cluster health status and the information from the PG autoscaler.
The autoscale mode for the new testpool pool should be on, and the number of PGs should be 32.
[ceph: root@clienta /]# ceph health detail
HEALTH_OK
[ceph: root@clienta /]# ceph osd pool autoscale-status
POOL                   SIZE  TARGET SIZE  RATE  RAW CAPACITY  RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
device_health_metrics     0               3.0         92124M  0.0000                                1.0       1              on
.rgw.root              1323               3.0         92124M  0.0000                                1.0      32              on
default.rgw.log        3702               3.0         92124M  0.0000                                1.0      32              on
default.rgw.control       0               3.0         92124M  0.0000                                1.0      32              on
default.rgw.meta          0               3.0         92124M  0.0000                                4.0       8              on
testpool                  0               3.0         92124M  0.0000                                1.0      32              on
Set the PG autoscale option to off for the pool testpool.
Reduce the number of PGs to 8.
Verify the autoscale recommended number of PGs, which should be 32.
Verify that the cluster health is OK.
[ceph: root@clienta /]# ceph osd pool set testpool pg_autoscale_mode off
set pool 6 pg_autoscale_mode to off
[ceph: root@clienta /]# ceph osd pool set testpool pg_num 8
set pool 6 pg_num to 8
[ceph: root@clienta /]# ceph osd pool autoscale-status
POOL                   SIZE  TARGET SIZE  RATE  RAW CAPACITY  RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
device_health_metrics     0               3.0         92124M  0.0000                                1.0       1              on
.rgw.root              1323               3.0         92124M  0.0000                                1.0      32              on
default.rgw.log        3702               3.0         92124M  0.0000                                1.0      32              on
default.rgw.control       0               3.0         92124M  0.0000                                1.0      32              on
default.rgw.meta          0               3.0         92124M  0.0000                                4.0       8              on
testpool                  0               3.0         92124M  0.0000                                1.0       8          32  off
[ceph: root@clienta /]# ceph health detail
HEALTH_OK
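To build intuition for recommendations like the 32 PGs above, the commonly cited Ceph sizing rule of thumb targets roughly 100 PGs per OSD cluster-wide, divided by the replica count and rounded to a power of two. The sketch below illustrates that guideline only; it is not the autoscaler's actual algorithm, and the function name suggested_pg_count is our own.

```python
# Rough PG sizing sketch based on the common Ceph rule of thumb:
# total PGs ~= (OSDs * target PGs per OSD) / replica size,
# rounded to the nearest power of two. Illustration only; the
# PG autoscaler uses its own capacity-based heuristics.

def suggested_pg_count(num_osds: int, replica_size: int,
                       target_pgs_per_osd: int = 100) -> int:
    """Return the nearest power of two to (OSDs * target) / replicas."""
    raw = num_osds * target_pgs_per_osd / replica_size
    power = max(round(raw).bit_length() - 1, 0)
    lower, upper = 2 ** power, 2 ** (power + 1)
    return upper if (raw - lower) > (upper - raw) else lower

# The lab cluster has 9 OSDs and pools with size 3:
print(suggested_pg_count(9, 3))  # 9 * 100 / 3 = 300 -> 256 PGs cluster-wide
```

The resulting figure is a cluster-wide total that all pools share, which is why an individual pool such as testpool ends up with a smaller power-of-two value like 32.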
Set the PG autoscale option to warn for the pool testpool.
Verify that cluster health status is now WARN, because the recommended number of PGs is higher than the current number of PGs.
It might take several minutes before the cluster shows the health warning message.
[ceph: root@clienta /]# ceph osd pool set testpool pg_autoscale_mode warn
set pool 6 pg_autoscale_mode to warn
[ceph: root@clienta /]# ceph health detail
HEALTH_WARN 1 pools have too few placement groups
[WRN] POOL_TOO_FEW_PGS: 1 pools have too few placement groups
    Pool testpool has 8 placement groups, should have 32
Enable the PG autoscale option and verify that the number of PGs has been increased automatically to 32, the recommended value. This increase might take a few minutes to display.
[ceph: root@clienta /]# ceph osd pool set testpool pg_autoscale_mode on
set pool 6 pg_autoscale_mode to on
[ceph: root@clienta /]# ceph osd pool autoscale-status
POOL                   SIZE  TARGET SIZE  RATE  RAW CAPACITY  RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
device_health_metrics     0               3.0         92124M  0.0000                                1.0       1              on
.rgw.root              1323               3.0         92124M  0.0000                                1.0      32              on
default.rgw.log        3702               3.0         92124M  0.0000                                1.0      32              on
default.rgw.control       0               3.0         92124M  0.0000                                1.0      32              on
default.rgw.meta          0               3.0         92124M  0.0000                                4.0       8              on
testpool                  0               3.0         92124M  0.0000                                1.0      32              on
Modify the primary affinity settings on an OSD so that it is less likely to be selected as primary for placement groups. Set the primary affinity for OSD 7 to 0.
Modify the primary affinity settings for OSD 7.
[ceph: root@clienta /]# ceph osd primary-affinity 7 0
set osd.7 primary-affinity to 0 (802)
Verify the primary affinity settings for each OSD.
[ceph: root@clienta /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         0.08817  root default
-3         0.02939      host serverc
 0    hdd  0.00980          osd.0         up   1.00000  1.00000
 1    hdd  0.00980          osd.1         up   1.00000  1.00000
 2    hdd  0.00980          osd.2         up   1.00000  1.00000
-5         0.02939      host serverd
 3    hdd  0.00980          osd.3         up   1.00000  1.00000
 5    hdd  0.00980          osd.5         up   1.00000  1.00000
 7    hdd  0.00980          osd.7         up   1.00000        0
-7         0.02939      host servere
 4    hdd  0.00980          osd.4         up   1.00000  1.00000
 6    hdd  0.00980          osd.6         up   1.00000  1.00000
 8    hdd  0.00980          osd.8         up   1.00000  1.00000
Verify the primary affinity settings for OSDs in the cluster.
[ceph: root@clienta /]# ceph osd dump | grep affinity
osd.7 up in weight 1 primary_affinity 0 up_from 45 up_thru 92 down_at 0 last_clean_interval [0,0) [v2:172.25.250.13:6816/3402621793,v1:172.25.250.13:6817/3402621793] [v2:172.25.249.13:6818/3402621793,v1:172.25.249.13:6819/3402621793] exists,up ebc2280d-1321-458d-a161-2250d2b4f32e
Create a pool called benchpool to use with the rados bench tool.
Create an OSD pool called benchpool.
[ceph: root@clienta /]# ceph osd pool create benchpool 100 100
pool 'benchpool' created
Use the rbd pool init command to initialize a custom pool to store RBD images.
This step could take several minutes to complete.
[ceph: root@clienta /]# rbd pool init benchpool
Open a second terminal and log in to the clienta node as the admin user.
Use the first terminal to generate a workload and use the second terminal to collect metrics.
Run a write test to the RBD pool benchpool.
This might take several minutes to complete, and the test needs enough time to finish its write operations.
Be prepared to run the ceph osd perf command in the second terminal immediately after starting the rados bench command in the first terminal.
Open a second terminal.
Log in to clienta as the admin user and use sudo to run the cephadm shell.
[student@workstation ~]$ ssh admin@clienta
[admin@clienta ~]$ sudo cephadm shell
[ceph: root@clienta /]#
In the first terminal, generate the workload.
[ceph: root@clienta /]# rados -p benchpool bench 30 write
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 30 seconds or 0 objects
Object prefix: benchmark_data_clienta.lab.example.com_50
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 58 42 167.988 168 0.211943 0.322053
2 16 112 96 191.982 216 0.122236 0.288171
3 16 162 146 194.643 200 0.279456 0.300593
4 16 217 201 200.975 220 0.385703 0.292009
...output omitted...
In the second terminal, collect performance metrics.
The commit_latency data is the time for the OSD to write and commit the operation to its journal.
The apply_latency data is the time to apply the write operation to the OSD file system back end.
Note the OSD ID where the heavy load is occurring.
Your OSD output might be different in your lab environment.
[ceph: root@clienta /]# ceph osd perf
osd  commit_latency(ms)  apply_latency(ms)
  7                  94                 94
  8                 117                117
  6                 195                195
  1                  73                 73
  0                  72                 72
  2                  80                 80
  3                  72                 72
  4                 135                135
  5                  59                 59
If no data displays, then use the first terminal to generate the workload again. The metric collection must run while the bench tool is generating workload.
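When the OSD list is long, it can help to pick out the busiest OSD programmatically. The sketch below parses machine-readable output such as `ceph osd perf -f json` produces; the JSON field names used here (`osd_perf_infos`, `perf_stats`, `commit_latency_ms`) are assumptions modeled on this lab's output, so verify them against `ceph osd perf -f json-pretty` on your own cluster. The sample data embedded below is illustrative.

```python
# Sketch: find the OSD with the highest commit latency from
# `ceph osd perf -f json` style output. Field names are assumed;
# confirm them against your cluster's actual JSON output.
import json

# Illustrative sample mimicking a subset of the lab's `ceph osd perf` data.
sample = json.loads("""
{"osd_perf_infos": [
  {"id": 7, "perf_stats": {"commit_latency_ms": 94,  "apply_latency_ms": 94}},
  {"id": 6, "perf_stats": {"commit_latency_ms": 195, "apply_latency_ms": 195}},
  {"id": 4, "perf_stats": {"commit_latency_ms": 135, "apply_latency_ms": 135}}
]}
""")

def slowest_osd(perf: dict) -> int:
    """Return the ID of the OSD with the highest commit latency."""
    return max(perf["osd_perf_infos"],
               key=lambda o: o["perf_stats"]["commit_latency_ms"])["id"]

print(slowest_osd(sample))  # osd.6 has the highest latency in this sample
```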
In the second terminal, use the high-latency OSD ID from the previous step to determine the name of the host where that OSD runs.
[ceph: root@clienta /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         0.08817  root default
-3         0.02939      host serverc
 0    hdd  0.00980          osd.0         up   1.00000  1.00000
 1    hdd  0.00980          osd.1         up   1.00000  1.00000
 2    hdd  0.00980          osd.2         up   1.00000  1.00000
-5         0.02939      host serverd
 3    hdd  0.00980          osd.3         up   1.00000  1.00000
 5    hdd  0.00980          osd.5         up   1.00000  1.00000
 7    hdd  0.00980          osd.7         up   1.00000        0
-7         0.02939      host servere
 4    hdd  0.00980          osd.4         up   1.00000  1.00000
 6    hdd  0.00980          osd.6         up   1.00000  1.00000
 8    hdd  0.00980          osd.8         up   1.00000  1.00000
Evaluate the OSD performance counters.
Verify the performance counters for the OSD.
Redirect the output of the command to a file called perfdump.txt.
[ceph: root@clienta /]# ceph tell osd.6 perf dump > perfdump.txt
In the perfdump.txt file, locate the section starting with osd:.
Note the op_latency and subop_latency counters, which report the latency of read and write operations and of suboperations.
Note the op_r_latency and op_w_latency parameters.
Each counter includes avgcount and sum fields that are required to calculate the exact counter value.
Calculate the value of the op_latency and subop_latency counters by using the formula counter = counter.sum / counter.avgcount.
[ceph: root@clienta /]# cat perfdump.txt | grep -A88 '"osd"'
"osd": {
    "op_wip": 0,
    "op": 3664,
    "op_in_bytes": 994050158,
    "op_out_bytes": 985,
    "op_latency": {
        "avgcount": 3664,
        "sum": 73.819483299,
        "avgtime": 0.020147238
    },
...output omitted...
    "op_r_latency": {
        "avgcount": 3059,
        "sum": 1.395967825,
        "avgtime": 0.000456347
    },
...output omitted...
    "op_w_latency": {
        "avgcount": 480,
        "sum": 71.668254827,
        "avgtime": 0.149308864
    },
...output omitted...
    "op_rw_latency": {
        "avgcount": 125,
        "sum": 0.755260647,
        "avgtime": 0.006042085
    },
...output omitted...
    "subop_latency": {
        "avgcount": 1587,
        "sum": 59.679174303,
        "avgtime": 0.037605024
    },
...output omitted...
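Applying the formula counter = counter.sum / counter.avgcount to the values captured in perfdump.txt reproduces the avgtime fields the OSD reports:

```python
# Each latency counter equals its sum divided by its avgcount.
# The sums and counts below are taken from the osd.6 perf dump above.
op_latency    = 73.819483299 / 3664   # ~0.0201 s, matches reported avgtime
subop_latency = 59.679174303 / 1587   # ~0.0376 s, matches reported avgtime

print(op_latency, subop_latency)
```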
In the first terminal, repeat the capture using the rados bench write command.
[ceph: root@clienta /]# rados -p benchpool bench 30 write
...output omitted...
In the second terminal, view the variation of the value using the following formulas:
op_latency_sum_t2 - op_latency_sum_t1 = diff_sum
op_latency_avgcount_t2 - op_latency_avgcount_t1 = diff_avgcount
op_latency = diff_sum / diff_avgcount
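The difference formulas above isolate the average latency of only the operations completed between the two snapshots. A minimal sketch, with made-up t1/t2 numbers purely for illustration (substitute the sums and avgcounts from your own two perf dumps):

```python
# Average op latency over the interval between two perf dump snapshots,
# using the diff_sum / diff_avgcount formulas above.
def interval_latency(sum_t1: float, count_t1: int,
                     sum_t2: float, count_t2: int) -> float:
    """Average latency of the ops completed between snapshot t1 and t2."""
    diff_sum = sum_t2 - sum_t1
    diff_avgcount = count_t2 - count_t1
    return diff_sum / diff_avgcount

# Hypothetical example: 73.82 s over 3664 ops at t1, 152.4 s over 7200 ops at t2.
print(interval_latency(73.82, 3664, 152.4, 7200))
```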
[ceph: root@clienta /]# ceph tell osd.6 perf dump > perfdump.txt
[ceph: root@clienta /]# cat perfdump.txt | grep -A88 '"osd"'
...output omitted...
The counter values are cumulative and reflect the totals at the moment the command runs.
View information about the last operations processed by an OSD.
In the second terminal, dump the information maintained in memory for the most recently processed operations.
Redirect the dump to the historicdump.txt file.
By default, each OSD records information on the last 20 operations over 600 seconds.
View the historicdump.txt file contents.
[ceph: root@clienta /]# ceph tell osd.6 dump_historic_ops > historicdump.txt
[ceph: root@clienta /]# head historicdump.txt
{
    "size": 20,
    "duration": 600,
    "ops": [
        {
            "description": "osd_op(client.44472.0:479 7.14 7:2a671f00:::benchmark_data_clienta.lab.example.com_92_object478:head [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4194304] snapc 0=[] ondisk+write+known_if_redirected e642)",
...output omitted...
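When hunting for slow requests, it is often useful to rank the recorded operations by how long they took. The sketch below sorts dump_historic_ops output by its per-op duration field; the embedded sample only mimics the structure of historicdump.txt with made-up durations, so verify the field names against your own dump.

```python
# Sketch: rank the most recent operations from dump_historic_ops by duration.
# The sample mimics historicdump.txt's structure with invented values.
import json

sample = json.loads("""
{"size": 20, "duration": 600, "ops": [
  {"description": "osd_op(... object478 ...)", "duration": 0.512},
  {"description": "osd_op(... object479 ...)", "duration": 0.104},
  {"description": "osd_op(... object480 ...)", "duration": 1.237}
]}
""")

def slowest_ops(dump: dict, n: int = 2) -> list:
    """Return the n longest-running recorded operations, slowest first."""
    return sorted(dump["ops"], key=lambda op: op["duration"], reverse=True)[:n]

for op in slowest_ops(sample):
    print(op["duration"], op["description"])
```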
Update the values for the osd_op_history_size and osd_op_history_duration parameters.
Set the size to 30 and the duration to 900.
Verify that the change was successful.
[ceph: root@clienta /]# ceph tell osd.6 config set osd_op_history_size 30
{
    "success": "osd_op_history_size = '30' "
}
[ceph: root@clienta /]# ceph tell osd.6 config set osd_op_history_duration 900
{
    "success": "osd_op_history_duration = '900' "
}
[ceph: root@clienta /]# ceph tell osd.6 dump_historic_ops > historicops.txt
[ceph: root@clienta /]# head -n 3 historicops.txt
{
    "size": 30,
    "duration": 900,
Update the runtime values of the osd_op_history_size and osd_op_history_duration parameters on all OSDs, resetting them to the defaults of 20 and 600.
Verify that the change was successful.
[ceph: root@clienta /]# ceph tell osd.* config set osd_op_history_size 20
osd.0: {
    "success": "osd_op_history_size = '20' "
}
...output omitted...
osd.8: {
    "success": "osd_op_history_size = '20' "
}
[ceph: root@clienta /]# ceph tell osd.* config set osd_op_history_duration 600
osd.0: {
    "success": "osd_op_history_duration = '600' "
}
...output omitted...
osd.8: {
    "success": "osd_op_history_duration = '600' "
}
Exit both terminals and return to workstation as the student user. In the second terminal:
[ceph: root@clienta /]# exit
[admin@clienta ~]$ exit
[student@workstation ~]$ exit
In the first terminal:
[ceph: root@clienta /]# exit
[admin@clienta ~]$ exit
[student@workstation ~]$
This concludes the guided exercise.