After completing this section, you should be able to describe the purpose of the OSD map and how it is modified.
The cluster OSD map contains the address and status of each OSD, the pool list and details, and other information such as the OSD near-capacity limits. Ceph uses these capacity limits to send warnings and to stop accepting write requests when an OSD reaches its full ratio.
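For example, you can inspect the capacity ratios recorded in the OSD map and adjust them with the `ceph osd set-*-ratio` commands. This is a minimal sketch; the values shown match the example cluster used in this section and might differ in yours.

```
[ceph: root@serverc /]# ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
[ceph: root@serverc /]# ceph osd set-nearfull-ratio 0.80
```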
When a change occurs in the cluster's infrastructure, such as OSDs joining or leaving the cluster, the MONs update the corresponding map accordingly. The MONs maintain a history of map revisions. Ceph identifies each version of a map with an integer that increments on every change, known as the epoch.
The ceph status -f json-pretty command displays the epoch of each map.
Use the appropriate ceph subcommand to display each individual map, for example ceph osd dump for the OSD map.
```
[ceph: root@serverc /]# ceph status -f json-pretty
...output omitted...
    "osdmap": {
        "epoch": 478,
        "num_osds": 15,
        "num_up_osds": 15,
        "osd_up_since": 1632743988,
        "num_in_osds": 15,
        "osd_in_since": 1631712883,
        "num_remapped_pgs": 0
    },
...output omitted...
[ceph: root@serverc /]# ceph osd dump
epoch 478
fsid 11839bde-156b-11ec-bb71-52540000fa0c
created 2021-09-14T14:50:39.401260+0000
modified 2021-09-27T12:04:26.832212+0000
flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
crush_version 69
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client luminous
min_compat_client luminous
require_osd_release pacific
stretch_mode_enabled false
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 475 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
...output omitted...
osd.0 up   in  weight 1 up_from 471 up_thru 471 down_at 470 last_clean_interval [457,466) [v2:172.25.250.12:6801/1228351148,v1:172.25.250.12:6802/1228351148] [v2:172.25.249.12:6803/1228351148,v1:172.25.249.12:6804/1228351148] exists,up cfe311b0-dea9-4c0c-a1ea-42aaac4cb160
...output omitted...
```
Ceph updates the OSD map every time an OSD joins or leaves the cluster. An OSD can leave the Ceph cluster either because of an OSD failure or a hardware failure.
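For example, marking an OSD out of the cluster causes the monitors to generate a new OSD map epoch. In this sketch, the OSD ID and the resulting epoch number are illustrative.

```
[ceph: root@serverc /]# ceph osd dump | grep ^epoch
epoch 478
[ceph: root@serverc /]# ceph osd out osd.1
marked out osd.1.
[ceph: root@serverc /]# ceph osd dump | grep ^epoch
epoch 479
[ceph: root@serverc /]# ceph osd in osd.1
marked in osd.1.
```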
Even though the cluster map as a whole is maintained by the MONs, OSDs do not use a leader to manage the OSD map; they propagate the map among themselves. OSDs tag every message they exchange with the OSD map epoch. When an OSD detects that it is lagging behind, it performs a map update with its peer OSD.
In large clusters, where OSD map updates are frequent, it is not practical to always distribute the full map. Instead, receiving OSD nodes perform incremental map updates.
Ceph also tags the messages between OSDs and clients with the epoch. Whenever a client connects to an OSD, the OSD inspects the epoch. If the epochs do not match, the OSD responds with the correct incremental updates so that the client can update its OSD map. This removes the need for aggressive propagation, because clients learn about the updated map the next time they contact an OSD.
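As an administrator, you can also retrieve the binary OSD map for a specific past epoch, which is useful for comparing revisions. The epoch number and output file in this sketch are illustrative.

```
[ceph: root@serverc /]# ceph osd getmap 470 -o /tmp/osdmap.470
got osdmap epoch 470
```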
To access a Ceph cluster, a client first retrieves a copy of the cluster map from the MONs. All MONs must have the same cluster map for the cluster to function correctly.
The MON submits a map update to Paxos and writes the new version to its local key-value store only after Paxos acknowledges the update. Read operations access the key-value store directly.
OSDs regularly report their status to the monitors. In addition, OSDs exchange heartbeats so that an OSD can detect the failure of a peer and report that event to the monitors.
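Heartbeat and failure-reporting behavior is controlled by configuration options, which you can inspect with ceph config get. The option names are standard Ceph options; the values shown are the usual defaults and might differ in your cluster.

```
[ceph: root@serverc /]# ceph config get osd osd_heartbeat_interval
6
[ceph: root@serverc /]# ceph config get mon mon_osd_down_out_interval
600
```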
When the leader monitor learns of an OSD failure, it updates the map, increments the epoch, and uses the Paxos update protocol to notify the other monitors, at the same time revoking their leases. After a majority of monitors acknowledge the update, and the cluster therefore has quorum, the leader monitor issues a new lease so that the monitors can distribute the updated OSD map. This method ensures that the map epoch never goes backward anywhere in the cluster and that no stale leases remain valid.
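You can check which monitor currently acts as the leader with the quorum_status command. The monitor name shown here is illustrative of the example cluster.

```
[ceph: root@serverc /]# ceph quorum_status -f json-pretty | grep quorum_leader_name
    "quorum_leader_name": "serverc.lab.example.com",
```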
Use the following commands to manage the OSD map as an administrator:
| Command | Action |
|---|---|
| `ceph osd dump` | Dump the OSD map to standard output. |
| `ceph osd getmap -o` | Export a binary copy of the current map. |
| `osdmaptool --print` | Display a human-readable copy of the map to standard output. |
| `osdmaptool --export-crush` | Extract the CRUSH map from the OSD map. |
| `osdmaptool --import-crush` | Embed a new CRUSH map. |
| `osdmaptool --test-map-pg` | Verify the mapping of a given PG. |
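As a sketch of combining these commands, the following sequence exports the current OSD map to a binary file, prints it, extracts the CRUSH map, and tests the mapping of a PG. The file names and PG ID are illustrative, and most output is omitted.

```
[ceph: root@serverc /]# ceph osd getmap -o /tmp/osdmap.bin
got osdmap epoch 478
[ceph: root@serverc /]# osdmaptool --print /tmp/osdmap.bin
osdmaptool: osdmap file '/tmp/osdmap.bin'
epoch 478
...output omitted...
[ceph: root@serverc /]# osdmaptool --export-crush /tmp/crush.bin /tmp/osdmap.bin
...output omitted...
[ceph: root@serverc /]# osdmaptool --test-map-pg 1.0 /tmp/osdmap.bin
...output omitted...
```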
For more information, refer to the Red Hat Ceph Storage 5 Storage Strategies Guide at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/storage_strategies_guide