In this exercise, you will view and modify the cluster CRUSH map.
Outcomes
You should be able to create data placement rules to target a specific device class, create a pool by using a specific data placement rule, and decompile and edit the CRUSH map.
As the student user on the workstation machine, use the lab command to prepare your system for this exercise.
[student@workstation ~]$ lab start map-crush
This command confirms that the hosts required for this exercise are accessible, backs up the CRUSH map, adds the ssd device class, and sets the mon_allow_pool_delete setting to true.
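The lab script applies these settings for you. For reference only, a cluster-wide option such as mon_allow_pool_delete is normally enabled with the ceph config command, as in this optional example:
[ceph: root@clienta /]# ceph config set mon mon_allow_pool_delete true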
Procedure 5.1. Instructions
Log in to clienta as the admin user and use sudo to run the cephadm shell.
Verify that the cluster returns a HEALTH_OK state.
[student@workstation ~]$ ssh admin@clienta
[admin@clienta ~]$ sudo cephadm shell
[ceph: root@clienta /]# ceph health
HEALTH_OK
Create a new CRUSH rule called onssd that uses only the OSDs backed by SSD storage.
Create a new pool called myfast with 32 placement groups that use that rule.
Confirm that the pool is using only OSDs that are backed by SSD storage.
List the available device classes in your cluster.
[ceph: root@clienta /]# ceph osd crush class ls
[
"hdd",
"ssd"
]
Display the CRUSH map tree to locate the OSDs backed by SSD storage.
[ceph: root@clienta /]# ceph osd crush tree
ID CLASS WEIGHT TYPE NAME
-1 0.08817 root default
-3 0.02939 host serverc
0 hdd 0.00980 osd.0
2 hdd 0.00980 osd.2
1 ssd 0.00980 osd.1
-5 0.02939 host serverd
3 hdd 0.00980 osd.3
7 hdd 0.00980 osd.7
5 ssd 0.00980 osd.5
-7 0.02939 host servere
4 hdd 0.00980 osd.4
8 hdd 0.00980 osd.8
6 ssd 0.00980 osd.6
Add a new CRUSH map rule called onssd to target the OSDs with SSD devices.
[ceph: root@clienta /]# ceph osd crush rule create-replicated onssd \
default host ssd
Use the ceph osd crush rule ls command to verify the successful creation of the new rule.
[ceph: root@clienta /]# ceph osd crush rule ls
replicated_rule
onssd
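Optionally, you can inspect the steps that the new rule contains with the ceph osd crush rule dump command; this check is not required by the exercise:
[ceph: root@clienta /]# ceph osd crush rule dump onssd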
Create a new replicated pool called myfast with 32 placement groups that uses the onssd CRUSH map rule.
[ceph: root@clienta /]# ceph osd pool create myfast 32 32 onssd
pool 'myfast' created
Verify that the placement groups for the pool called myfast are only using the OSDs backed by SSD storage.
As shown in a previous step, these OSDs are osd.1, osd.5, and osd.6.
Retrieve the ID of the pool called myfast.
[ceph: root@clienta /]# ceph osd lspools
...output omitted...
6 myfast
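As an additional optional check that the pool picked up the intended rule, you can query the pool directly with the standard pool get subcommand:
[ceph: root@clienta /]# ceph osd pool get myfast crush_rule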
Use the ceph pg dump pgs_brief command to list all the PGs in the cluster.
The pool ID is the first number in a PG ID.
For example, the PG 6.1b belongs to the pool whose ID is 6.
[ceph: root@clienta /]# ceph pg dump pgs_brief
PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY
6.1b active+clean [6,5,1] 6 [6,5,1] 6
4.19 active+clean [6,2,5] 6 [6,2,5] 6
2.1f active+clean [0,3,8] 0 [0,3,8] 0
3.1e active+clean [2,6,3] 2 [2,6,3] 2
6.1a active+clean [6,1,5] 6 [6,1,5] 6
4.18 active+clean [3,2,6] 3 [3,2,6] 3
2.1e active+clean [2,6,5] 2 [2,6,5] 2
3.1f active+clean [0,3,4] 0 [0,3,4] 0
6.19 active+clean [1,5,6] 1 [1,5,6] 1
4.1b active+clean [3,2,8] 3 [3,2,8] 3
2.1d active+clean [6,7,0] 6 [6,7,0] 6
...output omitted...
The pool called myfast, whose ID is 6, only uses osd.1, osd.5, and osd.6.
These are the only OSDs with SSD drives.
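If you prefer a view that is already scoped to a single pool, the ceph pg ls-by-pool command lists the same PG-to-OSD mappings without manual filtering. This is an optional alternative, not part of the exercise steps:
[ceph: root@clienta /]# ceph pg ls-by-pool myfast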
Create a new CRUSH hierarchy under root=default-cl260 that has three rack buckets (rack1, rack2, and rack3), each of which contains one host bucket (hostc, hostd, and hoste).
Create a new CRUSH map hierarchy that matches this infrastructure:
default-cl260 (root bucket)
rack1 (rack bucket)
hostc (host bucket)
osd.1
osd.5
osd.6
rack2 (rack bucket)
hostd (host bucket)
osd.0
osd.3
osd.4
rack3 (rack bucket)
hoste (host bucket)
osd.2
osd.7
osd.8
You should place the three SSD-backed OSDs (OSDs 1, 5, and 6 in this example) on hostc.
Because the OSD numbers in your cluster might be different, adjust the CRUSH map hierarchy to match this requirement.
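If you need to confirm which OSD IDs carry the ssd device class in your own cluster before you build the hierarchy, you can optionally list them by class:
[ceph: root@clienta /]# ceph osd crush class ls-osd ssd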
First, create the buckets with the ceph osd crush add-bucket command.
[ceph: root@clienta /]# ceph osd crush add-bucket default-cl260 root
added bucket default-cl260 type root to crush map
[ceph: root@clienta /]# ceph osd crush add-bucket rack1 rack
added bucket rack1 type rack to crush map
[ceph: root@clienta /]# ceph osd crush add-bucket hostc host
added bucket hostc type host to crush map
[ceph: root@clienta /]# ceph osd crush add-bucket rack2 rack
added bucket rack2 type rack to crush map
[ceph: root@clienta /]# ceph osd crush add-bucket hostd host
added bucket hostd type host to crush map
[ceph: root@clienta /]# ceph osd crush add-bucket rack3 rack
added bucket rack3 type rack to crush map
[ceph: root@clienta /]# ceph osd crush add-bucket hoste host
added bucket hoste type host to crush map
Use the ceph osd crush move command to build the hierarchy.
[ceph: root@clienta /]# ceph osd crush move rack1 root=default-cl260
moved item id -14 name 'rack1' to location {root=default-cl260} in crush map
[ceph: root@clienta /]# ceph osd crush move hostc rack=rack1
moved item id -15 name 'hostc' to location {rack=rack1} in crush map
[ceph: root@clienta /]# ceph osd crush move rack2 root=default-cl260
moved item id -16 name 'rack2' to location {root=default-cl260} in crush map
[ceph: root@clienta /]# ceph osd crush move hostd rack=rack2
moved item id -17 name 'hostd' to location {rack=rack2} in crush map
[ceph: root@clienta /]# ceph osd crush move rack3 root=default-cl260
moved item id -18 name 'rack3' to location {root=default-cl260} in crush map
[ceph: root@clienta /]# ceph osd crush move hoste rack=rack3
moved item id -19 name 'hoste' to location {rack=rack3} in crush map
Display the CRUSH map tree to verify the new hierarchy.
[ceph: root@clienta /]# ceph osd crush tree
ID CLASS WEIGHT TYPE NAME
-13 0 root default-cl260
-14 0 rack rack1
-15 0 host hostc
-16 0 rack rack2
-17 0 host hostd
-18 0 rack rack3
-19 0 host hoste
-1 0.08817 root default
-3 0.02939 host serverc
0 hdd 0.00980 osd.0
2 hdd 0.00980 osd.2
1 ssd 0.00980 osd.1
-5 0.02939 host serverd
3 hdd 0.00980 osd.3
7 hdd 0.00980 osd.7
5 ssd 0.00980 osd.5
-7 0.02939 host servere
4 hdd 0.00980 osd.4
8 hdd 0.00980 osd.8
6 ssd 0.00980 osd.6
Place the OSDs as leaves in the new tree.
[ceph: root@clienta /]# ceph osd crush set osd.1 1.0 root=default-cl260 \
rack=rack1 host=hostc
set item id 1 name 'osd.1' weight 1 at location {host=hostc,rack=rack1,root=default-cl260} to crush map
[ceph: root@clienta /]# ceph osd crush set osd.5 1.0 root=default-cl260 \
rack=rack1 host=hostc
set item id 5 name 'osd.5' weight 1 at location {host=hostc,rack=rack1,root=default-cl260} to crush map
[ceph: root@clienta /]# ceph osd crush set osd.6 1.0 root=default-cl260 \
rack=rack1 host=hostc
set item id 6 name 'osd.6' weight 1 at location {host=hostc,rack=rack1,root=default-cl260} to crush map
[ceph: root@clienta /]# ceph osd crush set osd.0 1.0 root=default-cl260 \
rack=rack2 host=hostd
set item id 0 name 'osd.0' weight 1 at location {host=hostd,rack=rack2,root=default-cl260} to crush map
[ceph: root@clienta /]# ceph osd crush set osd.3 1.0 root=default-cl260 \
rack=rack2 host=hostd
set item id 3 name 'osd.3' weight 1 at location {host=hostd,rack=rack2,root=default-cl260} to crush map
[ceph: root@clienta /]# ceph osd crush set osd.4 1.0 root=default-cl260 \
rack=rack2 host=hostd
set item id 4 name 'osd.4' weight 1 at location {host=hostd,rack=rack2,root=default-cl260} to crush map
[ceph: root@clienta /]# ceph osd crush set osd.2 1.0 root=default-cl260 \
rack=rack3 host=hoste
set item id 2 name 'osd.2' weight 1 at location {host=hoste,rack=rack3,root=default-cl260} to crush map
[ceph: root@clienta /]# ceph osd crush set osd.7 1.0 root=default-cl260 \
rack=rack3 host=hoste
set item id 7 name 'osd.7' weight 1 at location {host=hoste,rack=rack3,root=default-cl260} to crush map
[ceph: root@clienta /]# ceph osd crush set osd.8 1.0 root=default-cl260 \
rack=rack3 host=hoste
set item id 8 name 'osd.8' weight 1 at location {host=hoste,rack=rack3,root=default-cl260} to crush map
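Note that these commands assign a CRUSH weight of 1.0 to each OSD, replacing the original 0.00980 weights shown earlier; the exercise continues with the 1.0 weights. For reference only, a weight can later be adjusted in place with the ceph osd crush reweight command, as in this optional example:
[ceph: root@clienta /]# ceph osd crush reweight osd.1 0.00980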
Display the CRUSH map tree to verify the new OSD locations.
[ceph: root@clienta /]# ceph osd crush tree
ID CLASS WEIGHT TYPE NAME
-13 9.00000 root default-cl260
-14 3.00000 rack rack1
-15 3.00000 host hostc
1 ssd 1.00000 osd.1
5 ssd 1.00000 osd.5
6 ssd 1.00000 osd.6
-16 3.00000 rack rack2
-17 3.00000 host hostd
0 hdd 1.00000 osd.0
3 hdd 1.00000 osd.3
4 hdd 1.00000 osd.4
-18 3.00000 rack rack3
-19 3.00000 host hoste
2 hdd 1.00000 osd.2
7 hdd 1.00000 osd.7
8 hdd 1.00000 osd.8
-1 0 root default
-3 0 host serverc
-5 0 host serverd
-7 0 host servere
All the OSDs with SSD devices are in the rack1 bucket, and no OSDs are in the default tree.
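As an optional spot check, the ceph osd find command reports the CRUSH location of a single OSD, which should now point at the new hierarchy; OSD ID 1 is used here only as an example:
[ceph: root@clienta /]# ceph osd find 1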
Add a custom CRUSH rule called ssd-first by decompiling the binary CRUSH map and editing the resulting text file.
For each placement group, this rule always selects an OSD backed by SSD storage as the primary OSD, and OSDs backed by HDD storage as the secondary OSDs.
After adding the rule, compile the map and load it into your cluster.
Create a new replicated pool called testcrush that uses the rule, and verify that its placement groups are mapped correctly.
Clients that access pools using this new rule read data from the fast drives, because clients always read from and write to the primary OSD of each placement group.
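If you want to confirm which OSD is the primary for a given PG, the ceph pg map command shows its up and acting sets; the first OSD listed in the acting set is the primary. The PG ID below is only an example from the myfast pool:
[ceph: root@clienta /]# ceph pg map 6.0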
Retrieve the current CRUSH map by using the ceph osd getcrushmap command.
Store the binary map in the ~/cm-org.bin file.
[ceph: root@clienta /]# ceph osd getcrushmap -o ~/cm-org.bin
...output omitted...
Use the crushtool command to decompile the binary map to the ~/cm-org.txt text file.
When successful, this command returns no output; use the echo $? command immediately afterward to confirm that it returned an exit code of 0.
[ceph: root@clienta /]# crushtool -d ~/cm-org.bin -o ~/cm-org.txt
[ceph: root@clienta /]# echo $?
0
Save a copy of the CRUSH map as ~/cm-new.txt, and add the following rule at the end of the file.
[ceph: root@clienta /]# cp ~/cm-org.txt ~/cm-new.txt
[ceph: root@clienta /]# cat ~/cm-new.txt
...output omitted...
rule onssd {
        id 3
        type replicated
        min_size 1
        max_size 10
        step take default class ssd
        step chooseleaf firstn 0 type host
        step emit
}
rule ssd-first {
        id 5
        type replicated
        min_size 1
        max_size 10
        step take rack1
        step chooseleaf firstn 1 type host
        step emit
        step take default-cl260 class hdd
        step chooseleaf firstn -1 type rack
        step emit
}
# end crush map
With this rule, the first replica uses an OSD from rack1 (backed by SSD storage), and the remaining replicas use OSDs backed by HDD storage from different racks.
Compile your new CRUSH map.
[ceph: root@clienta /]# crushtool -c ~/cm-new.txt -o ~/cm-new.bin
Before applying the new map to the running cluster, use the crushtool command with the --show-mappings option to verify that the first OSD is always from rack1.
[ceph: root@clienta /]# crushtool -i ~/cm-new.bin --test --show-mappings \
--rule=5 --num-rep 3
...output omitted...
CRUSH rule 5 x 1013 [5,4,7]
CRUSH rule 5 x 1014 [1,3,7]
CRUSH rule 5 x 1015 [6,2,3]
CRUSH rule 5 x 1016 [5,0,7]
CRUSH rule 5 x 1017 [6,0,8]
CRUSH rule 5 x 1018 [6,4,7]
CRUSH rule 5 x 1019 [1,8,3]
CRUSH rule 5 x 1020 [5,7,4]
CRUSH rule 5 x 1021 [5,7,4]
CRUSH rule 5 x 1022 [1,4,2]
CRUSH rule 5 x 1023 [1,7,4]
The first OSD is always 1, 5, or 6, which corresponds to the OSDs with SSD devices from rack1.
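You can also ask crushtool to report any inputs for which the rule cannot produce a complete mapping; no output from this optional check means the rule always finds enough OSDs:
[ceph: root@clienta /]# crushtool -i ~/cm-new.bin --test --show-bad-mappings \
--rule=5 --num-rep 3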
Apply the new CRUSH map to your cluster by using the ceph osd setcrushmap command.
[ceph: root@clienta /]# ceph osd setcrushmap -i ~/cm-new.bin
...output omitted...
Verify that the new ssd-first rule is now available.
[ceph: root@clienta /]# ceph osd crush rule ls
replicated_rule
onssd
ssd-first
Create a new replicated pool called testcrush with 32 placement groups and use the ssd-first CRUSH map rule.
[ceph: root@clienta /]# ceph osd pool create testcrush 32 32 ssd-first
pool 'testcrush' created
Verify that the first OSDs for the placement groups in the pool called testcrush are the ones from rack1.
These OSDs are osd.1, osd.5, and osd.6.
[ceph: root@clienta /]# ceph osd lspools
...output omitted...
6 myfast
7 testcrush
[ceph: root@clienta /]# ceph pg dump pgs_brief | grep ^7
dumped pgs_brief
7.b active+clean [1,8,3] 1 [1,8,3] 1
7.8 active+clean [5,3,7] 5 [5,3,7] 5
7.9 active+clean [5,0,7] 5 [5,0,7] 5
7.e active+clean [1,2,4] 1 [1,2,4] 1
7.f active+clean [1,0,8] 1 [1,0,8] 1
7.c active+clean [6,0,8] 6 [6,0,8] 6
7.d active+clean [1,4,8] 1 [1,4,8] 1
7.2 active+clean [6,8,0] 6 [6,8,0] 6
7.3 active+clean [5,3,7] 5 [5,3,7] 5
7.0 active+clean [5,0,7] 5 [5,0,7] 5
7.5 active+clean [5,4,2] 5 [5,4,2] 5
...output omitted...
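As with the myfast pool, an optional alternative is to list the placement groups scoped to the pool name instead of filtering the full dump:
[ceph: root@clienta /]# ceph pg ls-by-pool testcrush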
Use the pg-upmap feature to manually remap some secondary OSDs in one of the PGs in the testcrush pool.
Use the pg-upmap optimization feature to manually map a PG to specific OSDs. Remap the second OSD of your chosen PG from the previous step to another OSD of your choosing, except 1, 5, or 6.
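The pg-upmap exceptions that you create in this step are stored in the OSD map, not in the CRUSH map, so they can be removed later without recompiling anything. If you want to revert the change after the exercise, the ceph osd rm-pg-upmap-items command clears the exception for a PG; the PG ID shown here assumes PG 7.8, as used in the next command:
[ceph: root@clienta /]# ceph osd rm-pg-upmap-items 7.8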
[ceph: root@clienta /]# ceph osd pg-upmap-items 7.8 3 0
set 7.8 pg_upmap_items mapping to [3->0]
Use the ceph pg map command to verify the new mapping.
When done, log off from clienta.
[ceph: root@clienta /]# ceph pg map 7.8
osdmap e238 pg 7.8 (7.8) -> up [5,0,7] acting [5,0,7]
Return to workstation as the student user.
[ceph: root@clienta /]# exit
[admin@clienta ~]$ exit
[student@workstation ~]$
This concludes the guided exercise.