RH342 - ch05s03

Recovering from File System Corruption

Objectives

Detect and recover from file system corruption.

File System Choices

Red Hat Enterprise Linux provides several file systems to fit various workloads and use cases. In RHEL 8, the default is XFS, and ext4 remains popular.

Note

Although the ext4 driver can read and write to ext2 and ext3 file systems, Red Hat recommends using only ext4 for support and stability.

XFS and ext4 are journaling file systems. Uncommitted file activity is written to a journal first. If the system loses power or cannot finish transactions, then the journal can recover unsaved information and avoid file system metadata corruption. Although journaling improves fault tolerance, it does not mitigate all file system corruption. File system corruption can still occur due to hardware failure, human error, or software bugs.

Identifying File System Corruption

When file system corruption occurs or is suspected, use file system checking tools to check the file system for consistency. When corruption is identified, use file system repair tools to restore file system consistency. Follow these recommended practices when recovering from file system corruption:

If a hardware failure caused the data corruption, then resolve the hardware issue before checking or repairing a file system. For example, if a storage disk is failing, then move the file system to a functional disk before performing file system maintenance. Moving a corrupted file system requires copying the data partition block by block with utilities such as dd.
Always check a file system before repairing it. This check serves as a dry run for the file system repair, and can report discovered file system issues and suggested corrective actions.
A file system must be unmounted before file system administration. To successfully restore file system consistency, file system repair tools must have exclusive access to the file system. Unmounting a file system ensures that no storage operations, other than the file system repair, can occur. Running a file system repair on a mounted file system often leads to further data corruption.
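
The block-by-block copy mentioned in the first practice can be sketched with dd. This is a minimal illustration, not a complete rescue procedure: the function name clone_block_device and the device names in the usage comment are hypothetical, and the 64K block size is an arbitrary choice.

```shell
# Hypothetical helper: copy a source device or image to a target, block by block.
# conv=noerror keeps dd running past read errors; sync pads short reads with
# zeros so that later data stays at the same offset in the copy.
clone_block_device() {
    local src=$1 dst=$2
    dd if="$src" of="$dst" bs=64K conv=noerror,sync status=none
}

# Example (assumed device names; the target must be at least as large as the source):
#   clone_block_device /dev/vdb1 /dev/vdc1
```

Run the file system check and repair against the copy on the healthy disk, so that the failing disk is read only once.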

Checking ext4 File Systems

After a power loss or system crash, RHEL 8 systems automatically invoke the e2fsck utility to recover the journal of ext4 file systems. After replaying the uncommitted changes in the journal, e2fsck marks whether the file systems are consistent. If inconsistencies are discovered, e2fsck fully checks the file system and requests user intervention if it cannot safely resolve an issue. Both the e2fsck and the dumpe2fs utilities are part of the e2fsprogs package.

Administrators can manually execute e2fsck to check a file system. Unmount the file system, and then run e2fsck with the -n option and the device name for the file system. The -n option opens the file system read-only and answers no to all questions during the file system check, so nothing on disk is changed.

[root@host ~]# umount /dev/vdb1
[root@host ~]# e2fsck -n /dev/vdb1
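
A script can guard against checking a device that is still mounted. The following sketch assumes that the device appears under the same name in /proc/mounts; a device mounted through another path (for example, a device-mapper symlink) would not be matched. The function name is_mounted and its optional second argument are hypothetical.

```shell
# Hypothetical guard: succeed only if the device appears as a mount source.
# The second argument (default /proc/mounts) exists so that the check can be
# exercised against a sample file.
is_mounted() {
    local dev=$1 table=${2:-/proc/mounts}
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$table"
}

# Example: refuse to check a device that is still mounted.
#   if is_mounted /dev/vdb1; then
#       echo "unmount /dev/vdb1 first" >&2
#   else
#       e2fsck -n /dev/vdb1
#   fi
```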

Checking a file system requires a usable superblock. If the default superblock location is corrupted, then locate a backup copy of the superblock to use for the file system check.

[root@host ~]# e2fsck -n /dev/vdb1
e2fsck 1.42.9 (28-Dec-2013)
ext2fs_open2: Bad magic number in super-block
e2fsck: Superblock invalid, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/vdb1

The superblock could not be read or does not describe a correct ext2
file system.  If the device is valid and it really contains an ext2
file system (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

The location of backup superblocks varies depending on the file system's block size when created. To determine the location of backup superblocks, use the dumpe2fs utility.

[root@host ~]# dumpe2fs /dev/vdb1 | grep 'Backup superblock'
dumpe2fs 1.42.9 (28-Dec-2013)
  Backup superblock at 32768, Group descriptors at 32769-32769
  Backup superblock at 98304, Group descriptors at 98305-98305
  Backup superblock at 163840, Group descriptors at 163841-163841
  Backup superblock at 229376, Group descriptors at 229377-229377
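
To use these locations in a script, the block numbers can be extracted from the dumpe2fs output. This is a small sketch that assumes the output format shown above; the function name backup_superblocks is hypothetical.

```shell
# Hypothetical filter: read dumpe2fs output on stdin and print one backup
# superblock location per line.
backup_superblocks() {
    awk '/Backup superblock at/ { sub(/,$/, "", $4); print $4 }'
}

# Example:
#   dumpe2fs /dev/vdb1 2>/dev/null | backup_superblocks
```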

After locating the backup superblocks, use the -b option and select one to use as an alternative superblock during the file system check.

[root@host ~]# e2fsck -n /dev/vdb1 -b 32768
e2fsck 1.42.9 (28-Dec-2013)
/dev/vdb1 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vdb1: 11/65536 files (0.0% non-contiguous), 8859/261888 blocks

[root@host ~]# echo $?
0

Determine the file system check result with the e2fsck command's exit status. The reported exit status is the sum of the triggered exit codes.

Exit code	Description
0	No errors.
1	File system errors corrected.
2	File system errors corrected; the system should be rebooted.
4	File system errors uncorrected.
8	Operational error.
16	Usage error.
32	Cancelled by user request.
128	Shared library error.
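
Because the exit status is a sum of codes, a script can test each code as a bit. The following sketch expands a status into its component codes, including code 2 (errors corrected, reboot recommended) as documented in the e2fsck(8) man page. The function name decode_e2fsck_status is hypothetical.

```shell
# Hypothetical helper: expand an e2fsck exit status into its component codes.
decode_e2fsck_status() {
    local status=$1 bit
    if (( status == 0 )); then
        echo "0: no errors"
        return 0
    fi
    for bit in 1 2 4 8 16 32 128; do
        if (( status & bit )); then
            case $bit in
                1)   echo "1: file system errors corrected" ;;
                2)   echo "2: errors corrected, reboot recommended" ;;
                4)   echo "4: file system errors left uncorrected" ;;
                8)   echo "8: operational error" ;;
                16)  echo "16: usage error" ;;
                32)  echo "32: cancelled by user request" ;;
                128) echo "128: shared library error" ;;
            esac
        fi
    done
}

# Example:
#   e2fsck -n /dev/vdb1; decode_e2fsck_status $?
```

For example, a status of 12 expands to codes 4 and 8: errors were left uncorrected and an operational error occurred.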

Checking XFS File Systems

Unlike ext4 file systems, RHEL 8 systems do not automatically initiate checks and repairs on XFS file systems. Check XFS file systems manually with the xfs_repair utility, which the xfsprogs package provides. The xfs_repair command is invoked with the -n option to check a file system. With the -n option, xfs_repair scans the file system and only reports potential repairs without taking corrective action. This example checks the XFS file system on /dev/vdb1:

[root@host ~]# xfs_repair -n /dev/vdb1

The xfs_repair command does not execute on an XFS file system that does not have a clean journal log. Unclean system shutdowns or system crashes can leave uncommitted changes in the journal log. Mount and unmount an XFS file system to replay and clean its journal log.
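
The mount and unmount cycle can be sketched as follows, using a temporary mount point. The function name replay_xfs_log is hypothetical, as is its optional second argument, which accepts a command prefix such as echo so that the steps can be previewed without changing anything.

```shell
# Hypothetical helper: mount and unmount an XFS device so that the kernel
# replays its journal log. Pass "echo" as the second argument for a dry run.
replay_xfs_log() {
    local dev=$1 run=${2:-}
    local mnt rc
    mnt=$(mktemp -d) || return 1
    # $run is intentionally unquoted: when empty, it expands to nothing.
    $run mount -t xfs "$dev" "$mnt" && $run umount "$mnt"
    rc=$?
    rmdir "$mnt"
    return "$rc"
}

# Example (assumed device name):
#   replay_xfs_log /dev/vdb1 echo    # preview the commands
#   replay_xfs_log /dev/vdb1         # run them as root
```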

As with ext4 file systems, an XFS file system check can fail to execute due to a corrupted primary superblock. However, unlike e2fsck, with xfs_repair you do not need to locate and specify an alternative superblock. The xfs_repair command automatically scans the XFS file system, locates a secondary superblock, and uses it to recover the primary superblock.

[root@host ~]# xfs_repair -n /dev/vdb1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan file system freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing file system ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping file system flush and exiting.

[root@host ~]# echo $?
0

An XFS file system check returns an exit code of 1 if file system corruption was detected, and an exit code of 0 if the file system is clean.
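
A script can branch on this exit code. The following sketch translates the status of an xfs_repair -n run into a short verdict. The function name xfs_check_verdict is hypothetical, and any status other than 0 or 1 is only reported as unexpected.

```shell
# Hypothetical helper: translate the exit status of "xfs_repair -n" into a verdict.
xfs_check_verdict() {
    case $1 in
        0) echo "clean" ;;
        1) echo "corruption detected" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Example:
#   xfs_repair -n /dev/vdb1; xfs_check_verdict $?
```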

Repairing File Systems

After a file system check, identify the file system errors and the corrective actions to take. Repair the file system with the correct utility for your file system.
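
Selecting the utility by file system type can be sketched as follows. The function name repair_tool_for is hypothetical. The usage comment reads the type with blkid, which relies on on-disk signatures and might report nothing when the primary superblock is damaged.

```shell
# Hypothetical helper: print the check-and-repair utility for a file system type.
repair_tool_for() {
    case $1 in
        ext2|ext3|ext4) echo "e2fsck" ;;
        xfs)            echo "xfs_repair" ;;
        *)              echo "unsupported file system type: $1" >&2
                        return 1 ;;
    esac
}

# Example (assumed device name):
#   fstype=$(blkid -o value -s TYPE /dev/vdb1)
#   tool=$(repair_tool_for "$fstype") && "$tool" -n /dev/vdb1
```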

Repairing ext4 File Systems

Repair ext4 file systems with the same e2fsck command that checks a file system. Without the -n option, e2fsck applies all safe corrective actions. For operations that cannot be done safely, e2fsck asks the user whether to take the action.

The file system must be unmounted during file system repair to ensure file system consistency and to prevent further corruption. Depending on the severity of the file system corruption, you might choose to execute e2fsck with additional options.

Option	Description
-b location	Use an alternative superblock at the specified location.
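-p	Automatically repair ("preen") the file system. Fix only the problems that can be safely fixed without user input. For any other problem, print a description and exit with the uncorrected-errors code.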
-v	Verbose mode.
-y	Run in noninteractive mode and answer yes to all questions. This option cannot be used with the -p or -n options.
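
These options can be combined into a two-step repair: first attempt only the safe fixes with -p, and escalate to -y only if errors remain. This is a sketch of one possible policy, not a recommendation: the -y option accepts every proposed fix, which can discard data. The function name repair_ext4 and its optional second argument, which substitutes a different command for e2fsck, are hypothetical.

```shell
# Hypothetical helper: try safe automatic fixes first, then answer yes to
# everything only if e2fsck reports errors left uncorrected (code 4).
repair_ext4() {
    local dev=$1 fsck=${2:-e2fsck}
    local rc
    "$fsck" -p "$dev"
    rc=$?
    if (( rc & 4 )); then
        echo "errors left uncorrected on $dev; rerunning with -y" >&2
        "$fsck" -y "$dev"
        rc=$?
    fi
    return "$rc"
}

# Example (assumed device name; unmount the file system first):
#   repair_ext4 /dev/vdb1
```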

This example repairs an ext4 file system with the -b option to execute an interactive file system check with an alternative superblock.

[root@host ~]# e2fsck /dev/vdb1 -b 98304
e2fsck 1.42.9 (28-Dec-2013)
/dev/vdb1 was not cleanly unmounted, check forced.
Resize inode not valid.  Recreate? yes

Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (28520, counted=28521).
Fix? yes

Free blocks count wrong (253028, counted=253029).
Fix? yes


/dev/vdb1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vdb1: 11/65536 files (0.0% non-contiguous), 8859/261888 blocks

[root@host ~]# echo $?
1

Determine the result of the file system repair by the e2fsck exit code.

Repairing XFS File Systems

XFS file systems are repaired with the same xfs_repair command that checks a file system. Without the -n option, xfs_repair takes all safe corrective actions. Unmount the file system before repairing it to ensure file system consistency and to prevent further corruption.

The xfs_repair command can run only on an XFS file system with a clean journal log. If mounting and unmounting an XFS file system does not result in a clean journal, then the journal might be corrupted. In that case, you might need to add the -L option to the xfs_repair command, which zeroes the journal log. When the journal is unrecoverable, this step is necessary to proceed, but it discards all metadata updates that are still in the journal, which might cause further problems with recently written data.

Unlike e2fsck, xfs_repair is not an interactive utility. When initiated, the XFS file system repair performs all operations automatically without any user input. The xfs_repair command always returns an exit code of 0 when it completes the file system repair.

[root@host ~]# xfs_repair /dev/vdb1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan file system freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
Invalid inode number 0x499602d2
xfs_dir_ino_validate: XFS_ERROR_REPORT
Metadata corruption detected at block 0xa7e0/0x1000
entry "subscription-manager" at block 0 offset 456 in directory inode 144973 references invalid inode 1234567890
        clearing inode number in entry at offset 456...
entry at block 0 offset 456 in directory inode 144973 has illegal name "/ubscription-manager":
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing file system ...
bad hash table for directory inode 144973 (no data entry): rebuilding
rebuilding directory inode 144973
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 145282, moving to lost+found
Phase 7 - verify and correct link counts...
done

[root@host ~]# echo $?
0

During an XFS file system repair, xfs_repair might discover files and directories that have allocated inodes but are not referenced by any parent directory. These orphaned files and directories are placed in the lost+found directory at the root of that file system. If files are missing after a file system repair, then check whether they were relocated to the lost+found directory.

Note

RHEL 8 does not provide any automated ways to recover files in the lost+found directory. To recover files from lost+found, use the mv command to move files manually to their correct directory.
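
Before moving files, it can help to list what the repair placed in lost+found. The xfs_repair command names these entries by inode number, so the size and type are often the only clues to what a file was. The following sketch assumes GNU find and a mounted file system; the function name list_lost_found and the mount point in the usage comment are hypothetical.

```shell
# Hypothetical helper: list each lost+found entry with its type
# (f = regular file, d = directory) and its size in bytes.
list_lost_found() {
    local mnt=$1
    find "$mnt/lost+found" -mindepth 1 -maxdepth 1 -printf '%f\t%y\t%s\n' | sort
}

# Example (assumed mount point):
#   list_lost_found /mnt/data
```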

File System Backup and Recovery

File system checking and repair cannot guarantee data recovery, and do not replace a tested backup and recovery strategy. If severely damaged inodes or directories are encountered during file system repair, then the tools might permanently discard the inodes that cannot be fixed. Discarded data can be recovered only from recent backups of the file system's data.

References

For more information, see the e2fsck(8), dumpe2fs(8), and xfs_repair(8) man pages.

Revision: rh342-8.4-6dd89bd

Click Create to build all of the virtual machines needed for the classroom lab environment. This may take several minutes to complete. Once created the environment can then be stopped and restarted to pause your experience.
When a lab is created, click Start to run all of the virtual machines in the classroom.
Click Stop to stop all the virtual machines from running. This will not delete your lab.
If you Delete your lab, you will remove all of the virtual machines in your classroom and lose all of your progress.

Virtual machine actions

Click Start to power on the virtual machine.
Click Shutdown to gracefully shut down the virtual machine, preserving disk contents.
Click Power off to forcefully shut down the virtual machine, while still preserving disk contents.
Click Open console to connect to the system console of the virtual machine in a new browser.

Auto-stop timer

The Red Hat Learning Subscription entitles you to set allotment of lab time.
To help conserving your allotted time, the lab environment uses automatic timers to stop or destroy your lab environment when the timer expires.

Click the Auto-stop button [+] to extend the time you would like to spend with the labs.
Click the Auto-destroy button [+] to add day(s) to the auto-destroy timer.

Auto-stop has a maximum of 11 hours, and auto-destroy has a maximum of 14 days.
Be careful to keep the timers set while you are working, so that your environment doesn't shut down unexpectedly.
We also suggest not to set the auto-timers unnecessarily high, which could waste your lab time allotment