Red Hat Enterprise Linux Diagnostics and Troubleshooting
- Section Describing the Linux Storage Stack
- Guided Exercise: Configuring Storage with Stratis
- Recovering from File System Corruption
- Guided Exercise: Recovering from File System Corruption
- Repairing LVM Issues
- Guided Exercise: Repairing LVM Issues
- Resolving Storage Device Encryption Issues
- Guided Exercise: Resolving Storage Device Encryption Issues
- Resolving iSCSI Issues
- Guided Exercise: Resolving iSCSI Issues
- Lab: Troubleshooting Storage Issues
- Summary
Abstract
| Goal | Identify and resolve issues related to storage. |
Describe the Linux storage layers and their function, and identify tools to examine activity at different layers.
Troubleshooting storage issues in Red Hat Enterprise Linux starts with understanding how I/O is passed from applications through the Linux storage stack to storage devices.
Applications make system calls to read data from and write data to storage. The kernel processes these system calls through the storage stack, a series of software and hardware layers that move application data to and from storage devices. Application programmers can use the storage stack to access data as standardized data structures without interacting directly with specific file system and storage device implementations. The storage stack also provides features that improve the performance and reliability of I/O operations.
The Virtual File System (VFS) provides support for standard POSIX system calls to read and write files. VFS implements a common file model so that applications use the same system calls, such as creat(), open(), read(), write(), and mmap(), to access files without knowing the underlying file system's implementation details. File system implementations such as XFS, ext4, FAT32, and others plug in to VFS as modules and so VFS provides a common abstraction for the data on those file systems.
Linux applications expect that directories are accessed like any other file, but some file systems store directory data differently. VFS ensures that applications access directories as files while using the file system's driver to correctly implement directories in that file system.
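As a quick check of which file system types are currently registered with VFS on a given host, you can read the /proc/filesystems file. The output below is only representative; the exact entries depend on which kernel modules are loaded:
[root@host ~]# cat /proc/filesystems
nodev	sysfs
nodev	tmpfs
nodev	proc
	xfs
	ext4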
VFS maintains caches to improve storage I/O performance, including an inode cache, dentry cache, buffer cache, and a page cache. The page cache is dynamically allocated by using free memory, and caches disk blocks during file system reads and writes. The buffer cache is unified with the page cache, except for memory structures that are not backed by files, such as metadata or raw block I/O. The inode cache and directory cache ease access to inodes (file metadata) and directory entries. The /proc/slabinfo file records the memory usage of both caches.
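For example, to see how much memory the dentry and inode caches are consuming, you can filter /proc/slabinfo for the relevant slab caches, or view a sorted, live summary with slabtop. The cache names shown in the filter are typical for an XFS system and vary with the file systems in use:
[root@host ~]# grep -E 'dentry|inode_cache|xfs_inode' /proc/slabinfo
[root@host ~]# slabtop --once | head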
Applications that implement their own caching, such as databases, use direct I/O to bypass the page cache. Such an application opens its files or volumes with the O_DIRECT flag and should be the only process that accesses them. Multiple processes that attempt direct I/O on the same file, or direct I/O that conflicts with page cache use of the same file, can cause data corruption.
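As an illustration of direct I/O, the dd command can bypass the page cache with the iflag=direct and oflag=direct options. The /dev/vdb device below is only an example; choose a device or file that is safe to read on your system:
[root@host ~]# dd if=/dev/vdb of=/dev/null bs=1M count=16 iflag=direct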
VFS issues usually involve memory shortages that constrain cache use or sysctl tuning mistakes. Use the free command for a simple cache overview, or view /proc/meminfo for details.
[root@host ~]# cat /proc/meminfo
MemTotal: 1860624 kB
MemFree: 1480952 kB
MemAvailable: 1515816 kB
Buffers: 2240 kB
Cached: 164108 kB
Cache is dynamically allocated from unused free memory as needed to support I/O. Cache pages are released for applications to use when the system is under memory pressure. The page, inode, and dentry caches can be cleared, but short-term performance drops until the caches are repopulated with frequently accessed data.
[root@host ~]# echo 3 > /proc/sys/vm/drop_caches
Many kernel sysctl tunables in /proc/sys/vm affect disk cache performance and memory management. These topics are discussed in detail in the Red Hat Performance Tuning: Linux in Physical, Virtual, and Cloud (RH442) training course.
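For example, a few of the /proc/sys/vm tunables that influence write-back and cache reclaim can be inspected with sysctl. The values shown are common kernel defaults and might differ on your system depending on the active tuning profile:
[root@host ~]# sysctl vm.dirty_ratio vm.dirty_background_ratio vm.vfs_cache_pressure
vm.dirty_ratio = 20
vm.dirty_background_ratio = 10
vm.vfs_cache_pressure = 100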
File systems provide the logical structures for organizing and naming data and metadata in storage, and they determine how that data is secured against compromise and corruption. The default file systems in RHEL, XFS and ext4, share many basic POSIX features to integrate with VFS and the common file model. File systems might be backed by block storage (XFS, ext4, GFS2, FAT32), by network storage (NFS, SMB), or they might be memory-based pseudo file systems (procfs, sysfs) or special-purpose file systems (tmpfs).
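To confirm which file system type backs a particular mount point, the findmnt command is convenient. The output below assumes a typical XFS root file system on an LVM volume; your source device will differ:
[root@host ~]# findmnt --output TARGET,FSTYPE,SOURCE /
TARGET FSTYPE SOURCE
/      xfs    /dev/mapper/rhel-root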
The device mapper creates mapping tables of blocks from one device layer to blocks in another logical device. Device layers build complex storage structures with LVM volumes, LUKS disk encryption, RAID, and other compatible layers. Device mapper use is optional; you can directly format physical block devices with a file system without using it.
In this example, an LVM logical volume named /dev/mapper/myvg1-mylv1 is built from two physical volumes, /dev/vdb1 and /dev/vdb2. When the volume was created with the LVM utilities, the device mapper mapped the /dev/vdb1 and /dev/vdb2 physical block device partitions to the higher-level logical device /dev/dm-0. Use the dmsetup command to view device mappings:
[root@host ~]# dmsetup ls
myvg1-mylv1	(252:0)
[root@servera ~]# ls -l /dev/mapper/myvg1-mylv1
lrwxrwxrwx. 1 root root 7 Sep 30 18:30 /dev/mapper/myvg1-mylv1 -> ../dm-0
[root@host ~]# dmsetup table /dev/mapper/myvg1-mylv1
0 1015808 linear 253:17 2048
1015808 1015808 linear 253:18 2048
The /dev/mapper/myvg1-mylv1 volume has two mappings. The first is a 1:1 linear mapping of the block device with major:minor number 253:17 to the first 1015808 sectors of /dev/mapper/myvg1-mylv1. The second is a 1:1 linear mapping of the block device with major:minor number 253:18 to the next 1015808 sectors of /dev/mapper/myvg1-mylv1, starting at sector 1015808. The major:minor numbers 253:17 and 253:18 correspond to /dev/vdb1 and /dev/vdb2:
[root@host ~]# ls -l /dev/vdb*
brw-rw----. 1 root disk 253, 16 Sep 30 18:28 /dev/vdb
brw-rw----. 1 root disk 253, 17 Sep 30 18:30 /dev/vdb1
brw-rw----. 1 root disk 253, 18 Sep 30 18:30 /dev/vdb2
The logical device that the device mapper created in this example, /dev/dm-0, is the linear concatenation of the two partitions and therefore has 2031616 sectors (1015808 sectors from /dev/vdb1 and 1015808 sectors from /dev/vdb2).
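You can build a similar linear mapping by hand with dmsetup to see the mechanism in isolation. This sketch assumes a spare test partition, /dev/vdc1, with at least 2048 sectors; the table line follows the start length linear device offset format, and the major:minor numbers in the resulting table depend on your devices:
[root@host ~]# echo "0 2048 linear /dev/vdc1 0" | dmsetup create test-linear
[root@host ~]# dmsetup table test-linear
0 2048 linear 253:33 0
[root@host ~]# dmsetup remove test-linear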
Disk schedulers are responsible for ordering the I/O requests that are submitted to a storage device. In Red Hat Enterprise Linux 8, block devices support only multi-queue scheduling. This enables block layer performance to scale with fast solid-state drives (SSDs) and multi-core systems.
Traditional, single-queue schedulers, which were available in RHEL 7 and earlier versions, are removed. These multi-queue schedulers are available in RHEL 8:
[root@host ~]# dmesg | grep -i 'io scheduler'
[ 1.136832] io scheduler mq-deadline registered
[ 1.137038] io scheduler kyber registered
[ 1.137273] io scheduler bfq registered
- none: Implements a first-in first-out (FIFO) scheduling algorithm, merging requests at the generic block layer through a simple last-hit cache.
- mq-deadline: Sorts queued I/O requests into a read or write batch and schedules them for execution in increasing logical block addressing (LBA) order. After the scheduler processes a batch, it checks how long write operations have been starved of processor time and schedules the next read or write batch appropriately. This scheduler is especially suitable where read operations are more frequent than write operations.
- kyber: Tunes itself to achieve a latency goal by calculating the latencies of every I/O request that is submitted to the block I/O layer.
- bfq: Ensures that a single application never uses all the bandwidth. Focuses on delivering the lowest latency rather than achieving the maximum throughput.
To determine which disk scheduler is currently active on a given block device, view the /sys/block/device/queue/scheduler file:
[root@host ~]# cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none
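The scheduler shown in brackets is the active one. Writing another scheduler name to the same file switches that device at run time; the sda device name is only an example, and the change does not persist across reboots unless you apply it with a udev rule or a TuneD profile:
[root@host ~]# echo bfq > /sys/block/sda/queue/scheduler
[root@host ~]# cat /sys/block/sda/queue/scheduler
mq-deadline kyber [bfq] none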
Device mapper multipath (DM Multipath) configures multiple I/O paths between servers and storage arrays so that they appear as a single device. These paths can be physical connections with separate cables, switches, and controllers. Multipathing aggregates the I/O, creating a new device that consists of the redundant, aggregated paths.
For example, a RHEL server might have two host bus adapters (HBAs) that are attached to a SAN, typically with Fibre Channel connections. The SAN connects to two separate storage controllers. This configuration supports four possible paths to the back-end storage, improving performance and fault tolerance.
In RHEL, the device mapper generates separate block devices to represent different paths to the same storage device. The multipathd daemon and the multipath command manage these devices. These paths are accessed through a higher-level multipath block device, which sends the request to a particular path by selecting the block device for that path. Multipath devices can be named by the World Wide ID (WWID) of the storage device, by a more convenient name that combines mpath with a sequence or path number, or by custom names.
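On a host with multipath-capable storage, a minimal way to enable DM Multipath with its default configuration and then inspect the resulting devices is shown below; the output of multipath -ll depends entirely on the attached storage, so it is omitted here:
[root@host ~]# yum install device-mapper-multipath
[root@host ~]# mpathconf --enable --with_multipathd y
[root@host ~]# multipath -ll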
The SCSI mid-layer is a bridge between the SCSI targets that present storage devices and the host bus adapter or hardware interface card drivers that communicate with the storage devices. The block device drivers are the SCSI disk (sd) driver and the SCSI CD-ROM (sr) driver. The SCSI mid-layer also provides one SCSI driver for character-based tape devices (st) and one for generic SCSI devices such as scanners (sg).
All devices that can use or emulate the SCSI protocol can use the SCSI mid-layer. This mid-layer includes seemingly unrelated devices such as SATA devices, USB storage, and virtual machine disks. These devices are presented as SCSI devices and use appropriate device names, such as /dev/sda. Some storage devices bypass this layer. For example, the /dev/vd* devices are paravirtualized devices that use the virtio_blk driver, which does not emulate the SCSI protocol.
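To see which block devices are presented through the SCSI mid-layer and which transport they use, lsblk can restrict its output to SCSI devices. The output below is representative of a virtual machine with one SATA-emulated disk; a /dev/vda virtio_blk disk would not appear here because it bypasses the SCSI layer:
[root@host ~]# lsblk --scsi --output NAME,HCTL,TYPE,TRAN,MODEL
NAME HCTL    TYPE TRAN MODEL
sda  0:0:0:0 disk sata QEMU HARDDISK
sr0  1:0:0:0 rom  sata QEMU DVD-ROM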
Low-level drivers communicate with physical system hardware. Examples include SCSI drivers for Qlogic (qla2xxx), Adaptec (aacraid), or Emulex (lpfc) devices, or local SATA or USB (libata and ahci) devices. An iSCSI device that uses TCP/IP transport would use a SCSI driver (such as iscsi_tcp) before passing its traffic to the network stack. Some paravirtualized devices are presented as SCSI (virtio_scsi and vmw_pvscsi) disk devices. Paravirtualized drivers such as virtio_blk interact directly with the scheduler at the block layer.
The low-level driver receives I/O from the scheduler at the block layer and dispatches it to the storage hardware. The driver forwards incoming I/O requests to the underlying hardware controller; it does not queue the I/O, but it does track the active I/O requests.
Stratis is a solution that eases local storage management with a focus on simplicity. The service manages pools of physical storage devices, which are created from one or more local disks or partitions. Volumes are created from the pools, with many useful features, such as file system snapshots, thin provisioning, and data tiering. Stratis pools have been tested on these block device types:
LUKS
LVM logical volumes
MD RAID
DM Multipath
iSCSI
HDDs and SSDs
NVMe devices
Install Stratis packages and enable the service:
[root@host ~]# yum install stratisd stratis-cli
[root@host ~]# systemctl enable --now stratisd
Prepare devices for use in pools by erasing any existing file system, partition table, or RAID signatures on each block device:
[root@host ~]# wipefs --all block-device
Create the Stratis pool by selecting the block devices to use. You can also create Stratis pools in encrypted form, with the kernel key ring as the primary encryption mechanism.
[root@host ~]# stratis pool create my-pool block-device
You can attach further block devices to a pool to increase the storage capacity for file systems.
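For example, assuming a second prepared block device, you can extend an existing pool with the stratis pool add-data subcommand:
[root@host ~]# stratis pool add-data my-pool second-block-device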
Stratis file systems are thinly provisioned without a fixed total size. If the size of the data approaches the virtual size of the file system, then Stratis grows the thin volume and its XFS file system automatically. Create a Stratis file system on a pool by using the stratis fs create command:
[root@host ~]# stratis fs create my-pool my-fs
To persistently mount a Stratis file system, add an entry in the /etc/fstab file that uses the Stratis file system UUID. To retrieve the Stratis file system UUID, use lsblk:
[root@servera ~]# lsblk --output=UUID /dev/stratis/my-pool/my-fs
UUID
b65883bf-cd83-420d-bb78-433b6545c053
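A corresponding /etc/fstab entry might look like the following line, which mounts the file system by UUID on a hypothetical /data mount point and uses the x-systemd.requires option so that systemd starts the stratisd service before attempting the mount:
UUID=b65883bf-cd83-420d-bb78-433b6545c053 /data xfs defaults,x-systemd.requires=stratisd.service 0 0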
References
A detailed storage stack diagram is available under the CC-BY-SA license at Linux Storage Stack Diagram.
For further information, refer to the Managing Storage Devices Guide at https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/managing_storage_devices/index
For further information, refer to the Managing layered local storage with Stratis section in System Design Guide at https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/system_design_guide/index
free(1), dmsetup(8), multipath(8), blktrace(8), blkparse(1), btt(1), and stratis(1) man pages