Red Hat Enterprise Linux Diagnostics and Troubleshooting
- Section Describing the Linux Storage Stack
- Guided Exercise: Configuring Storage with Stratis
- Recovering from File System Corruption
- Guided Exercise: Recovering from File System Corruption
- Repairing LVM Issues
- Guided Exercise: Repairing LVM Issues
- Resolving Storage Device Encryption Issues
- Guided Exercise: Resolving Storage Device Encryption Issues
- Resolving iSCSI Issues
- Guided Exercise: Resolving iSCSI Issues
- Lab: Troubleshooting Storage Issues
- Summary
Abstract
| Goal | Identify and resolve issues related to storage. |
Describe the Linux storage layers and their function, and identify tools to examine activity at different layers.
Troubleshooting storage issues in Red Hat Enterprise Linux starts with understanding how I/O is passed from applications through the Linux storage stack to storage devices.
Applications make system calls to read data from and write data to storage. The kernel processes these system calls through the storage stack, a series of software and hardware layers that move application data to and from storage devices. Application programmers can use the storage stack to access data as standardized data structures without interacting directly with specific file system and storage device implementations. The storage stack also provides features that improve the performance and reliability of I/O operations.
The Virtual File System (VFS) provides support for standard POSIX system calls to read and write files. VFS implements a common file model so that applications use the same system calls, such as creat(), open(), read(), write(), and mmap(), to access files without knowing the underlying file system's implementation details. File system implementations such as XFS, ext4, FAT32, and others plug in to VFS as modules and so VFS provides a common abstraction for the data on those file systems.
Linux applications expect that directories are accessed like any other file, but some file systems store directory data differently. VFS ensures that applications access directories as files while using the file system's driver to correctly implement directories in that file system.
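As a quick check of which file system types are currently registered with VFS on a given host, you can read the /proc/filesystems file. The output below is only representative; the exact entries depend on which kernel modules are loaded:
[root@host ~]# cat /proc/filesystems
nodev	sysfs
nodev	tmpfs
nodev	proc
	xfs
	ext4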
VFS maintains caches to improve storage I/O performance, including an inode cache, dentry cache, buffer cache, and a page cache. The page cache is dynamically allocated by using free memory, and caches disk blocks during file system reads and writes. The buffer cache is unified with the page cache, except for memory structures that are not backed by files, such as metadata or raw block I/O. The inode cache and directory cache ease access to inodes (file metadata) and directory entries. The /proc/slabinfo file records the memory usage of both caches.
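For example, to see how much memory the dentry and inode caches are consuming, you can filter /proc/slabinfo for the relevant slab caches, or view a sorted, live summary with slabtop. The cache names shown in the filter are typical for an XFS system and vary with the file systems in use:
[root@host ~]# grep -E 'dentry|inode_cache|xfs_inode' /proc/slabinfo
[root@host ~]# slabtop --once | head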
Applications that implement their own caching, such as databases, use direct I/O to bypass the page cache. Such an application opens its files or volumes with the O_DIRECT flag and should be the only process that accesses them. Multiple processes that attempt direct I/O on the same file, or direct I/O that conflicts with page cache use of the same file, can cause data corruption.
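As an illustration of direct I/O, the dd command can bypass the page cache with the iflag=direct and oflag=direct options. The /dev/vdb device below is only an example; choose a device or file that is safe to read on your system:
[root@host ~]# dd if=/dev/vdb of=/dev/null bs=1M count=16 iflag=direct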
VFS issues usually involve memory shortages that constrain cache use or sysctl tuning mistakes. Use the free command for a simple cache overview, or view /proc/meminfo for details.
[root@host ~]# cat /proc/meminfo
MemTotal: 1860624 kB
MemFree: 1480952 kB
MemAvailable: 1515816 kB
Buffers: 2240 kB
Cached: 164108 kB
Cache is dynamically allocated from unused free memory as needed to support I/O. Cache pages are released for applications to use when the system is under memory pressure. The page, inode, and dentry caches can be cleared, but short-term performance drops until the caches are repopulated with frequently accessed data.
[root@host ~]# echo 3 > /proc/sys/vm/drop_caches
Many kernel sysctl tunables in /proc/sys/vm affect disk cache performance and memory management. These topics are discussed in detail in the Red Hat Performance Tuning: Linux in Physical, Virtual, and Cloud (RH442) training course.
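For example, a few of the /proc/sys/vm tunables that influence write-back and cache reclaim can be inspected with sysctl. The values shown are common kernel defaults and might differ on your system depending on the active tuning profile:
[root@host ~]# sysctl vm.dirty_ratio vm.dirty_background_ratio vm.vfs_cache_pressure
vm.dirty_ratio = 20
vm.dirty_background_ratio = 10
vm.vfs_cache_pressure = 100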
File systems provide the logical structures for organizing and naming data and metadata in storage, and they determine how that data is secured against compromise and corruption. The default file systems in RHEL, XFS and ext4, share many basic POSIX features to integrate with VFS and the common file model. File systems might be backed by block storage (XFS, ext4, GFS2, FAT32), by network storage (NFS, SMB), or they might be memory-based pseudo file systems (procfs, sysfs) or special-purpose file systems (tmpfs).
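To confirm which file system type backs a particular mount point, the findmnt command is convenient. The output below assumes a typical XFS root file system on an LVM volume; your source device will differ:
[root@host ~]# findmnt --output TARGET,FSTYPE,SOURCE /
TARGET FSTYPE SOURCE
/      xfs    /dev/mapper/rhel-root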
The device mapper creates mapping tables of blocks from one device layer to blocks in another logical device. Device layers build complex storage structures with LVM volumes, LUKS disk encryption, RAID, and other compatible layers. Device mapper use is optional; you can directly format physical block devices with a file system without using it.
In this example, an LVM logical volume named /dev/mapper/myvg1-mylv1 is built from two physical volumes, /dev/vdb1 and /dev/vdb2. When the volume was created with the LVM utilities, the device mapper mapped the /dev/vdb1 and /dev/vdb2 physical block device partitions to the higher-level logical device /dev/dm-0. Use the dmsetup command to view device mappings:
[root@host ~]# dmsetup ls
myvg1-mylv1	(252:0)
[root@servera ~]# ls -l /dev/mapper/myvg1-mylv1
lrwxrwxrwx. 1 root root 7 Sep 30 18:30 /dev/mapper/myvg1-mylv1 -> ../dm-0
[root@host ~]# dmsetup table /dev/mapper/myvg1-mylv1
0 1015808 linear 253:17 2048
1015808 1015808 linear 253:18 2048
The /dev/mapper/myvg1-mylv1 volume has two mappings. The first is a 1:1 linear mapping of the block device with major:minor number 253:17 to the first 1015808 sectors of /dev/mapper/myvg1-mylv1. The second is a 1:1 linear mapping of the block device with major:minor number 253:18 to the next 1015808 sectors of /dev/mapper/myvg1-mylv1, starting at sector 1015808. The major:minor numbers 253:17 and 253:18 correspond to /dev/vdb1 and /dev/vdb2:
[root@host ~]# ls -l /dev/vdb*
brw-rw----. 1 root disk 253, 16 Sep 30 18:28 /dev/vdb
brw-rw----. 1 root disk 253, 17 Sep 30 18:30 /dev/vdb1
brw-rw----. 1 root disk 253, 18 Sep 30 18:30 /dev/vdb2
The logical device that the device mapper created in this example, /dev/dm-0, is the linear concatenation of the two partitions and therefore has 2031616 sectors (1015808 sectors from /dev/vdb1 and 1015808 sectors from /dev/vdb2).
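You can build a similar linear mapping by hand with dmsetup to see the mechanism in isolation. This sketch assumes a spare test partition, /dev/vdc1, with at least 2048 sectors; the table line follows the start length linear device offset format, and the major:minor numbers in the resulting table depend on your devices:
[root@host ~]# echo "0 2048 linear /dev/vdc1 0" | dmsetup create test-linear
[root@host ~]# dmsetup table test-linear
0 2048 linear 253:33 0
[root@host ~]# dmsetup remove test-linear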
Disk schedulers are responsible for ordering the I/O requests that are submitted to a storage device. In Red Hat Enterprise Linux 8, block devices support only multi-queue scheduling. This enables block layer performance to scale with fast solid-state drives (SSDs) and multi-core systems.
Traditional, single-queue schedulers, which were available in RHEL 7 and earlier versions, are removed. These multi-queue schedulers are available in RHEL 8:
[root@host ~]# dmesg | grep -i 'io scheduler'
[ 1.136832] io scheduler mq-deadline registered
[ 1.137038] io scheduler kyber registered
[ 1.137273] io scheduler bfq registered
- none: Implements a first-in first-out (FIFO) scheduling algorithm, merging requests at the generic block layer through a simple last-hit cache.
- mq-deadline: Sorts queued I/O requests into a read or write batch and schedules them for execution in increasing logical block addressing (LBA) order. After the scheduler processes a batch, it checks how long write operations have been starved of processor time and schedules the next read or write batch appropriately. This scheduler is especially suitable where read operations are more frequent than write operations.
- kyber: Tunes itself to achieve a latency goal by calculating the latencies of every I/O request that is submitted to the block I/O layer.
- bfq: Ensures that a single application never uses all the bandwidth. Focuses on delivering the lowest latency rather than achieving the maximum throughput.
To determine which disk scheduler is currently active on a given block device, view the /sys/block/device/queue/scheduler file:
[root@host ~]# cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none
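The scheduler shown in brackets is the active one. Writing another scheduler name to the same file switches that device at run time; the sda device name is only an example, and the change does not persist across reboots unless you apply it with a udev rule or a TuneD profile:
[root@host ~]# echo bfq > /sys/block/sda/queue/scheduler
[root@host ~]# cat /sys/block/sda/queue/scheduler
mq-deadline kyber [bfq] none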
Device mapper multipath (DM Multipath) configures multiple I/O paths between servers and storage arrays so that they appear as a single device. These paths can be physical connections with separate cables, switches, and controllers. Multipathing aggregates the I/O, creating a new device that consists of the redundant, aggregated paths.
For example, a RHEL server might have two host bus adapters (HBAs) that are attached to a SAN, typically with Fibre Channel connections. The SAN connects to two separate storage controllers. This configuration supports four possible paths to the back-end storage, improving performance and fault tolerance.
In RHEL, the device mapper generates separate block devices to represent different paths to the same storage device. The multipathd daemon and the multipath command manage these devices. These paths are accessed through a higher-level multipath block device, which sends the request to a particular path by selecting the block device for that path. Multipath devices can be named by the World Wide ID (WWID) of the storage device, by a more convenient name that combines mpath with a sequence or path number, or by custom names.
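On a host with multipath-capable storage, a minimal way to enable DM Multipath with its default configuration and then inspect the resulting devices is shown below; the output of multipath -ll depends entirely on the attached storage, so it is omitted here:
[root@host ~]# yum install device-mapper-multipath
[root@host ~]# mpathconf --enable --with_multipathd y
[root@host ~]# multipath -ll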
The SCSI mid-layer is a bridge between the SCSI targets that present storage devices and the host bus adapter or hardware interface card drivers that communicate with the storage devices. The block device drivers are the SCSI disk (sd) driver and the SCSI CD-ROM (sr) driver. The SCSI mid-layer also provides one SCSI driver for character-based tape devices (st) and one for generic SCSI devices such as scanners (sg).
All devices that can use or emulate the SCSI protocol can use the SCSI mid-layer. This mid-layer includes seemingly unrelated devices such as SATA devices, USB storage, and virtual machine disks. These devices are presented as SCSI devices and use appropriate device names, such as /dev/sda. Some storage devices bypass this layer. For example, the /dev/vd* devices are paravirtualized devices that use the virtio_blk driver, which does not emulate the SCSI protocol.
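To see which block devices are presented through the SCSI mid-layer and which transport they use, lsblk can restrict its output to SCSI devices. The output below is representative of a virtual machine with one SATA-emulated disk; a /dev/vda virtio_blk disk would not appear here because it bypasses the SCSI layer:
[root@host ~]# lsblk --scsi --output NAME,HCTL,TYPE,TRAN,MODEL
NAME HCTL    TYPE TRAN MODEL
sda  0:0:0:0 disk sata QEMU HARDDISK
sr0  1:0:0:0 rom  sata QEMU DVD-ROM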
Low-level drivers communicate with physical system hardware. Examples include SCSI drivers for Qlogic (qla2xxx), Adaptec (aacraid), or Emulex (lpfc) devices, or local SATA or USB (libata and ahci) devices. An iSCSI device that uses TCP/IP transport would use a SCSI driver (such as iscsi_tcp) before passing its traffic to the network stack. Some paravirtualized devices are presented as SCSI (virtio_scsi and vmw_pvscsi) disk devices. Paravirtualized drivers such as virtio_blk interact directly with the scheduler at the block layer.
The low-level driver receives I/O from the scheduler at the block layer and dispatches it to the storage hardware. The driver forwards incoming I/O requests to the underlying hardware controller; it does not queue the I/O, but it does track the active I/O requests.
Stratis is a solution that eases local storage management with a focus on simplicity. The service manages pools of physical storage devices, which are created from one or more local disks or partitions. Volumes are created from the pools, with many useful features, such as file system snapshots, thin provisioning, and data tiering. Stratis pools have been tested on these block device types:
LUKS
LVM logical volumes
MD RAID
DM Multipath
iSCSI
HDDs and SSDs
NVMe devices
Install Stratis packages and enable the service:
[root@host ~]# yum install stratisd stratis-cli
[root@host ~]# systemctl enable --now stratisd
Prepare devices for use in pools by erasing any existing file system, partition table, or RAID signatures on each block device:
[root@host ~]# wipefs --all block-device
Create the Stratis pool by selecting the block devices to use. You can also create Stratis pools in encrypted form, with the kernel key ring as the primary encryption mechanism.
[root@host ~]# stratis pool create my-pool block-device
You can attach further block devices to a pool to increase the storage capacity for file systems.
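For example, assuming a second prepared block device, you can extend an existing pool with the stratis pool add-data subcommand:
[root@host ~]# stratis pool add-data my-pool second-block-device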
Stratis file systems are thinly provisioned without a fixed total size. If the size of the data approaches the virtual size of the file system, then Stratis grows the thin volume and its XFS file system automatically. Create a Stratis file system on a pool by using the stratis fs create command:
[root@host ~]# stratis fs create my-pool my-fs
To persistently mount a Stratis file system, add an entry in the /etc/fstab file that uses the Stratis file system UUID. To retrieve the Stratis file system UUID, use lsblk:
[root@servera ~]# lsblk --output=UUID /dev/stratis/my-pool/my-fs
UUID
b65883bf-cd83-420d-bb78-433b6545c053
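A corresponding /etc/fstab entry might look like the following line, which mounts the file system by UUID on a hypothetical /data mount point and uses the x-systemd.requires option so that systemd starts the stratisd service before attempting the mount:
UUID=b65883bf-cd83-420d-bb78-433b6545c053 /data xfs defaults,x-systemd.requires=stratisd.service 0 0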
References
A detailed storage stack diagram is available under the CC-BY-SA license at Linux Storage Stack Diagram.
For further information, refer to the Managing Storage Devices Guide at https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/managing_storage_devices/index
For further information, refer to the Managing layered local storage with Stratis section in System Design Guide at https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/system_design_guide/index
free(1), dmsetup(8), multipath(8), blktrace(8), blkparse(1), btt(1), and stratis(1) man pages