Red Hat Enterprise Linux Diagnostics and Troubleshooting
Red Hat Enterprise Linux provides several file systems to fit various workloads and use cases. In RHEL 8, the default is XFS, and ext4 remains popular.
Note
Although the ext4 driver can read and write to ext2 and ext3 file systems, Red Hat recommends using only ext4 for support and stability.
XFS and ext4 are journaling file systems. Uncommitted file activity is written to a journal first. If the system loses power or cannot finish transactions, then the journal can recover unsaved information and avoid file system metadata corruption. Although journaling improves fault tolerance, it does not mitigate all file system corruption. File system corruption can still occur due to hardware failure, human error, or software bugs.
When file system corruption occurs or is suspected, use file system checking tools to check the file system for consistency. When corruption is identified, use file system repair tools to restore file system consistency. Follow these recommended practices when recovering from file system corruption:
If a hardware failure caused the data corruption, then resolve the hardware issue before checking or repairing a file system. For example, if a storage disk is failing, then move the file system to a functional disk before performing file system maintenance. Moving a corrupted file system requires copying the data partition block by block with utilities such as dd.
Always check a file system before repairing it. This check serves as a dry run for the file system repair, and can report discovered file system issues and suggested corrective actions.
A file system must be unmounted before it is checked or repaired. To successfully restore file system consistency, file system repair tools must have exclusive access to the file system. Unmounting a file system ensures that no storage operations, other than the file system repair, can occur. Running a file system repair on a mounted file system often leads to further data corruption.
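The unmount rule above can be guarded in a script. The following is a minimal sketch, not a Red Hat tool: `is_mounted` is a hypothetical helper that looks for a device in a mounts table, which is assumed to be `/proc/mounts` unless another file is given. It compares device paths literally, so it does not resolve symbolic links or alternative names for the same device.

```shell
# Hypothetical helper: succeed if DEVICE appears as a mount source in
# MOUNTS_FILE (defaults to /proc/mounts). Paths are compared literally.
is_mounted() {
    local device=$1
    local mounts_file=${2:-/proc/mounts}
    # The first field of each mounts line is the mount source.
    awk -v dev="$device" '$1 == dev { found = 1 } END { exit !found }' "$mounts_file"
}
```

For example, `is_mounted /dev/vdb1 || e2fsck -n /dev/vdb1` would start the read-only check only when the device is not listed as mounted.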
After a power loss or system crash, RHEL 8 systems automatically invoke the e2fsck utility to recover the journal of ext4 file systems. On replaying the uncommitted changes in the journal, e2fsck marks whether the file systems are consistent. If inconsistencies are discovered, e2fsck fully checks the file system and requests user intervention if it cannot safely resolve an issue. Both the e2fsck and the dumpe2fs utilities are part of the e2fsprogs package.
Administrators can manually execute e2fsck to check a file system. Unmount the file system, and then run e2fsck with the -n option and the device name for the file system. The -n option opens the file system read-only, and answers no to all questions during the file system check.
[root@host ~]# umount /dev/vdb1
[root@host ~]# e2fsck -n /dev/vdb1
Checking a file system requires a usable superblock. If the default superblock location is corrupted, then locate a backup copy of the superblock to use for the file system check.
[root@host ~]# e2fsck -n /dev/vdb1
e2fsck 1.42.9 (28-Dec-2013)
ext2fs_open2: Bad magic number in super-block
e2fsck: Superblock invalid, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/vdb1
The superblock could not be read or does not describe a correct ext2
file system. If the device is valid and it really contains an ext2
file system (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
The location of backup superblocks varies depending on the file system's block size when created. To determine the location of backup superblocks, use the dumpe2fs utility.
[root@host ~]# dumpe2fs /dev/vdb1 | grep 'Backup superblock'
dumpe2fs 1.42.9 (28-Dec-2013)
Backup superblock at 32768, Group descriptors at 32769-32769
Backup superblock at 98304, Group descriptors at 98305-98305
Backup superblock at 163840, Group descriptors at 163841-163841
Backup superblock at 229376, Group descriptors at 229377-229377
After locating the backup superblocks, use the -b option and select one to use as an alternative superblock during the file system check.
[root@host ~]# e2fsck -n /dev/vdb1 -b 32768
e2fsck 1.42.9 (28-Dec-2013)
/dev/vdb1 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vdb1: 11/65536 files (0.0% non-contiguous), 8859/261888 blocks
[root@host ~]# echo $?
0
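The lookup and the check can be chained in a script. This is a minimal sketch: `first_backup_superblock` is a hypothetical helper that reads dumpe2fs output on standard input and prints the first backup location. It assumes that the output lines have the same form as in the example above, and that dumpe2fs can still read the file system.

```shell
# Hypothetical helper: read dumpe2fs output on standard input and print
# the block number of the first backup superblock that it lists.
first_backup_superblock() {
    # Field 4 is the block number followed by a comma, for example "32768,".
    awk '/Backup superblock at/ { sub(/,$/, "", $4); print $4; exit }'
}
```

A possible use is `sb=$(dumpe2fs /dev/vdb1 2>/dev/null | first_backup_superblock)` followed by `e2fsck -n -b "$sb" /dev/vdb1`.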
Determine the file system check result with the e2fsck command's exit status. The reported exit status is the sum of the triggered exit codes.
| Exit code | Description |
|---|---|
| 0 | No errors. |
| 1 | File system errors corrected. |
| 2 | File system errors corrected, system should be rebooted. |
| 4 | File system errors uncorrected. |
| 8 | Operational error. |
| 16 | Usage error. |
| 32 | Cancelled by user request. |
| 128 | Shared library error. |
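Because the exit status is a sum of the codes above, a script can test each code as a bit. The following sketch uses a hypothetical helper, `describe_e2fsck_status`, to print one description per code that is set. The code 2 entry follows the e2fsck(8) man page.

```shell
# Hypothetical helper: print a description for each exit code that is
# set in an e2fsck exit status.
describe_e2fsck_status() {
    local status=$1
    local bit
    if [ "$status" -eq 0 ]; then
        echo "No errors."
        return 0
    fi
    for bit in 1 2 4 8 16 32 128; do
        # Skip codes that are not part of this status.
        [ $(( status & bit )) -ne 0 ] || continue
        case $bit in
            1)   echo "File system errors corrected." ;;
            2)   echo "File system errors corrected, system should be rebooted." ;;
            4)   echo "File system errors uncorrected." ;;
            8)   echo "Operational error." ;;
            16)  echo "Usage error." ;;
            32)  echo "Cancelled by user request." ;;
            128) echo "Shared library error." ;;
        esac
    done
}
```

For example, a status of 5 is the sum of 1 and 4, so `describe_e2fsck_status 5` reports that some errors were corrected and others remain uncorrected.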
Unlike ext4 file systems, RHEL 8 systems do not automatically initiate checks and repairs on XFS file systems. Check XFS file systems manually with the xfs_repair utility, which the xfsprogs package provides. The xfs_repair command is invoked with the -n option to check a file system. With the -n option, xfs_repair scans the file system and only reports potential repairs without taking corrective action. This example checks the XFS file system on /dev/vdb1:
[root@host ~]# xfs_repair -n /dev/vdb1
The xfs_repair command does not execute on an XFS file system that does not have a clean journal log. Unclean system shutdowns or system crashes can leave the journal log dirty. Mount and unmount an XFS file system to replay and clean its journal log.
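The mount and unmount cycle can be scripted. This is a minimal sketch under stated assumptions: `replay_xfs_log` is a hypothetical helper name, the temporary mount point comes from `mktemp -d`, and the commands run as root.

```shell
# Hypothetical helper: mount and immediately unmount DEVICE on a temporary
# directory so that mounting replays the XFS journal log.
replay_xfs_log() {
    local device=$1
    local mount_dir result
    mount_dir=$(mktemp -d) || return 1
    if mount -t xfs "$device" "$mount_dir"; then
        umount "$mount_dir"
        result=$?
    else
        # The mount failed, so the log was not replayed.
        result=1
    fi
    rmdir "$mount_dir"
    return "$result"
}
```

A nonzero return from `replay_xfs_log /dev/vdb1` means that the log could not be replayed this way.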
As with ext4 file systems, an XFS file system check can fail to execute due to a corrupted primary superblock. However, unlike e2fsck, with xfs_repair you do not need to locate and specify an alternative superblock. The xfs_repair command automatically scans the XFS file system, locates a secondary superblock, and uses it to recover the primary superblock.
[root@host ~]# xfs_repair -n /dev/vdb1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan file system freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing file system ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping file system flush and exiting.
[root@host ~]# echo $?
0
An XFS file system check returns an exit code of 1 if file system corruption was detected, and an exit code of 0 if the file system is clean.
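A script can turn these exit codes into a short verdict. In the following sketch, `check_xfs` is a hypothetical helper. Treating any status other than 0 or 1 as an incomplete check is an assumption of this sketch, not documented behavior.

```shell
# Hypothetical helper: run a no-modify XFS check on DEVICE and print a
# one-line verdict based on the exit code.
check_xfs() {
    local device=$1
    xfs_repair -n "$device" >/dev/null 2>&1
    case $? in
        0) echo "clean" ;;
        1) echo "corruption detected" ;;
        *) echo "check did not complete" ;;  # assumption: any other status
    esac
}
```

For example, `check_xfs /dev/vdb1` prints the verdict for the file system that is used in the examples above.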
After a file system check, identify the file system errors and the corrective actions to take. Repair the file system with the correct utility for your file system.
ext4 file systems are repaired with the same e2fsck command that checks a file system. Without the -n option, e2fsck applies all safe corrective actions. For operations that cannot be done safely, e2fsck prompts the user to confirm the action.
The file system must be unmounted during file system repair to ensure file system consistency and to prevent further corruption. Depending on the severity of the file system corruption, you might choose to execute e2fsck with additional options.
| Option | Description |
|---|---|
| -b location | Use an alternative superblock at the specified location. |
| -p | Automatically repair problems that can be safely fixed without user intervention. For any other problem, e2fsck reports it and exits with exit code 4 set. |
| -v | Verbose mode. |
| -y | Run in noninteractive mode and answer yes to all questions. This option cannot be used with the -p or -n options. |
This example repairs an ext4 file system with the -b option to execute an interactive file system check with an alternative superblock.
[root@host ~]# e2fsck /dev/vdb1 -b 98304
e2fsck 1.42.9 (28-Dec-2013)
/dev/vdb1 was not cleanly unmounted, check forced.
Resize inode not valid. Recreate? yes
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (28520, counted=28521).
Fix? yes
Free blocks count wrong (253028, counted=253029).
Fix? yes
/dev/vdb1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vdb1: 11/65536 files (0.0% non-contiguous), 8859/261888 blocks
[root@host ~]# echo $?
1
Determine the result of the file system repair by the e2fsck exit code.
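One way to act on the exit code is to try an automatic repair first and escalate only when errors remain. This is a sketch under assumptions: `repair_ext4` is a hypothetical helper, and it relies on `e2fsck -p` exiting with code 4 set when it leaves a problem for the administrator, as described in e2fsck(8).

```shell
# Hypothetical helper: attempt an automatic repair of DEVICE with
# e2fsck -p and summarize the exit status.
repair_ext4() {
    local device=$1
    local status
    e2fsck -p "$device"
    status=$?
    if [ $(( status & 4 )) -ne 0 ]; then
        echo "errors remain: rerun e2fsck interactively on $device"
    elif [ $(( status & 1 )) -ne 0 ]; then
        echo "errors corrected"
    elif [ "$status" -eq 0 ]; then
        echo "no errors"
    else
        echo "e2fsck exited with status $status"
    fi
    # Pass the e2fsck status through to the caller.
    return "$status"
}
```

The helper returns the original e2fsck status, so a calling script can still inspect the individual codes.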
XFS file systems are repaired with the same xfs_repair command that checks a file system. Without the -n option, xfs_repair takes all safe corrective actions. Unmount the file system before repairing it to ensure file system consistency and to prevent further corruption.
The xfs_repair command can be executed only on an XFS file system with a clean journal log. If mounting and unmounting an XFS file system does not result in a clean journal, then the journal might be corrupted. Because a clean log is required, this scenario might require adding the -L option to the xfs_repair command, which clears the journal log. When the journal is unrecoverable, this step is necessary to proceed, but this option discards all journal metadata, which might result in further issues with recently written data.
Unlike e2fsck, xfs_repair is not an interactive utility. When initiated, the XFS file system repair performs all operations automatically without any user input. The xfs_repair command always returns an exit code of 0 on completion of the file system repair.
[root@host ~]# xfs_repair /dev/vdb1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan file system freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
Invalid inode number 0x499602d2
xfs_dir_ino_validate: XFS_ERROR_REPORT
Metadata corruption detected at block 0xa7e0/0x1000
entry "subscription-manager" at block 0 offset 456 in directory inode 144973 references invalid inode 1234567890
        clearing inode number in entry at offset 456...
entry at block 0 offset 456 in directory inode 144973 has illegal name "/ubscription-manager":
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing file system ...
bad hash table for directory inode 144973 (no data entry): rebuilding
rebuilding directory inode 144973
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 145282, moving to lost+found
Phase 7 - verify and correct link counts...
done
[root@host ~]# echo $?
0
During an XFS file system repair, xfs_repair might discover files and directories that have allocated inodes but are not referenced by their parent directories. When these orphaned files and directories are discovered, they are deposited in the lost+found directory at the root of that file system. If files are missing after a file system repair, then review whether they were relocated to the lost+found directory.
Note
RHEL 8 does not provide any automated ways to recover files in the lost+found directory. To recover files from lost+found, use the mv command to move files manually to their correct directory.
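Before moving anything, it helps to review what the repair placed in lost+found. The following sketch assumes that the repaired file system is mounted again. `list_orphans` is a hypothetical helper that takes the mount point as its argument, and it relies on the `-mindepth` and `-maxdepth` options of GNU find.

```shell
# Hypothetical helper: list the entries directly under lost+found on the
# file system that is mounted at MOUNT_POINT.
list_orphans() {
    local mount_point=$1
    find "$mount_point/lost+found" -mindepth 1 -maxdepth 1 -exec ls -ld {} +
}
```

After identifying an entry, move it with a command such as `mv /data/lost+found/145282 /data/restored-file`, where both paths are illustrative placeholders.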
File system checking and repair cannot guarantee data recovery, and do not replace a tested backup and recovery strategy. If severely damaged inodes or directories are encountered during file system repair, then the tools might permanently discard the inodes that cannot be fixed. Discarded data can be recovered only from recent backups of the file system's data.
References
For more information, see the e2fsck(8), dumpe2fs(8), and xfs_repair(8) man pages.