Lab: Troubleshooting Storage Issues

Troubleshoot and resolve storage issues with encryption, file systems, LVM, and iSCSI.

Outcomes

You should be able to troubleshoot and repair issues with corrupted file systems, LUKS2 headers, LVM administration, and iSCSI targets.

As the student user on the workstation machine, use the lab command to prepare your system for this exercise.

[student@workstation ~]$ lab start storage-review

This command confirms that the required hosts for this exercise are accessible and configures them to perform the tasks.

Instructions

The CFO requests access to the company's financial data. The data resides in an encrypted volume on an iSCSI target, iqn.2016-01.com.example.lab:iscsistorage, provided by the serverb host. The target does not use authentication and is configured with an ACL to grant access to the iqn.2016-01.com.example.lab:servera initiator.

An administrator working on this request cannot access the /dev/sda1 encrypted volume. You are asked first to resolve this issue, and then to decrypt the volume with RedHatR0cks! as the last known password. Make the volume available as the finance mapped device mounted at /mnt/finance on servera.

If you have trouble decrypting the volume, then a backup of the encrypted volume's header exists on the /dev/save/old LVM logical volume on servera. Mount the logical volume at /mnt/save on servera and find the LUKS2 header backup file in /mnt/save/luks/iscsistorage_luks_header. If the file is not accessible, then troubleshoot the issue and then restore the file to that location.

  1. On servera, assess the situation by generating a list of iSCSI active sessions and known nodes.

    1. Log in to servera and switch to the root user.

      [student@workstation ~]$ ssh student@servera
      ...output omitted...
      [student@servera ~]$ sudo -i
      [sudo] password for student: student
      [root@servera ~]#
    2. List the iSCSI active sessions and known nodes.

      [root@servera ~]# iscsiadm -m session
      iscsiadm: No active sessions.
      [root@servera ~]# iscsiadm -m node
      iscsiadm: No records found
  2. If no sessions or known nodes exist, then discover targets on serverb. Resolve any issues in the discovery process.

    1. Discover the targets on serverb.

      [root@servera ~]# iscsiadm -m discovery -t st -p serverb.lab.example.com
      iscsiadm: cannot make connection to 172.25.252.11: No route to host
      iscsiadm: cannot make connection to 172.25.252.11: No route to host
      iscsiadm: cannot make connection to 172.25.252.11: No route to host
      iscsiadm: cannot make connection to 172.25.252.11: No route to host
      iscsiadm: cannot make connection to 172.25.252.11: No route to host
      iscsiadm: cannot make connection to 172.25.252.11: No route to host
      iscsiadm: connection login retries (reopen_max) 5 exceeded
      iscsiadm: Could not perform SendTargets discovery: iSCSI PDU timed out
    2. Verify the address that serverb.lab.example.com resolves to, and compare it with the address in the discovery error.

      [root@servera ~]# dig +short serverb.lab.example.com
      172.25.250.11
    3. Because the address in the error messages does not match the address that DNS returns, check the /etc/hosts file to determine whether it is the source of the problem.

      [root@servera ~]# grep serverb /etc/hosts
      172.25.252.11  serverb.lab.example.com serverb
    4. Fix the erroneous entry in /etc/hosts to correct the name resolution issue that prevents connections to serverb.lab.example.com.

      [root@servera ~]# grep serverb /etc/hosts
      172.25.250.11  serverb.lab.example.com serverb
    5. Rediscover the targets on serverb.

      [root@servera ~]# iscsiadm -m discovery -t st -p serverb.lab.example.com
      172.25.250.11:3260,1 iqn.2016-01.com.example.lab:iscsistorage
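The stale entry can also be corrected with a one-line sed substitution instead of a manual edit. The following is a minimal sketch that operates on a scratch copy of a hosts file, so it is safe to run anywhere; on the real servera you would apply the same substitution to /etc/hosts itself. The addresses are the ones from this lab:

```shell
# Scratch copy standing in for /etc/hosts, seeded with the stale entry.
hosts_copy=$(mktemp)
cat > "$hosts_copy" <<'EOF'
127.0.0.1      localhost
172.25.252.11  serverb.lab.example.com serverb
EOF

# Replace the stale address with the one DNS actually returns for serverb.
sed -i 's/^172\.25\.252\.11\([[:space:]]\)/172.25.250.11\1/' "$hosts_copy"

grep serverb "$hosts_copy"
```

Anchoring the pattern at the start of the line ensures that only the address field is rewritten, not an address that happens to appear elsewhere.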
  3. When iSCSI target discovery for serverb succeeds, log in to the target. Resolve any issues in the login process.

    1. Log in to the target on serverb.

      [root@servera ~]# iscsiadm -m node -T iqn.2016-01.com.example.lab:iscsistorage --login
      Logging in to [iface: default, target: iqn.2016-01.com.example.lab:iscsistorage, portal: 172.25.250.11,3260]
      iscsiadm: Could not login to [iface: default, target: iqn.2016-01.com.example.lab:iscsistorage, portal: 172.25.250.11,3260].
      iscsiadm: initiator reported error (8 - connection timed out)
      iscsiadm: Could not log into all portals
    2. Verify the configured authentication method on the initiator.

      [root@servera ~]# iscsiadm -m node -T iqn.2016-01.com.example.lab:iscsistorage | grep authmethod
      node.session.auth.authmethod = None
    3. Because authentication is disabled, check whether the login failure is caused by an ACL restriction. Confirm that the initiator name matches the name that the target's ACL grants access to.

      [root@servera ~]# cat /etc/iscsi/initiatorname.iscsi
      InitiatorName=iqn.com.example.lab:servera
    4. Fix the incorrect initiator name in /etc/iscsi/initiatorname.iscsi.

      [root@servera ~]# cat /etc/iscsi/initiatorname.iscsi
      InitiatorName=iqn.2016-01.com.example.lab:servera
    5. Restart iscsid to apply the new initiator name.

      [root@servera ~]# systemctl restart iscsid
    6. Log in to the target.

      [root@servera ~]# iscsiadm -m node -T iqn.2016-01.com.example.lab:iscsistorage --login
      Logging in to [iface: default, target: iqn.2016-01.com.example.lab:iscsistorage, portal: 172.25.250.11,3260]
      Login to [iface: default, target: iqn.2016-01.com.example.lab:iscsistorage, portal: 172.25.250.11,3260] successful.
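The same kind of scripted fix works for the initiator name. A sketch against a scratch copy of the file; on the real system the file is /etc/iscsi/initiatorname.iscsi, and iscsid must be restarted afterwards for the change to take effect:

```shell
# Scratch copy standing in for /etc/iscsi/initiatorname.iscsi.
iname=$(mktemp)
echo 'InitiatorName=iqn.com.example.lab:servera' > "$iname"

# Rewrite the IQN to include the date field that the target ACL expects.
sed -i 's|^InitiatorName=iqn\.com\.example\.lab:servera$|InitiatorName=iqn.2016-01.com.example.lab:servera|' "$iname"

cat "$iname"
```

On servera itself, follow the edit with `systemctl restart iscsid` as shown above.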
  4. When login to the target is successful, use the last known password to decrypt the encrypted volume.

    1. Locate the device locally on the servera machine.

      [root@servera ~]# grep "Attached SCSI" /var/log/messages
      Oct 13 01:40:12 servera kernel: sd 2:0:0:0: [sda] Attached SCSI disk
      [root@servera ~]# lsblk
      NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
      sda            8:0    0    1G  0 disk
      └─sda1         8:1    0  104M  0 part
      ...output omitted...
    2. Open the /dev/sda1 encrypted volume as the finance mapped device. Try decrypting the encrypted volume with the RedHatR0cks! password.

      [root@servera ~]# cryptsetup luksOpen /dev/sda1 finance
      Enter passphrase for /dev/sda1: RedHatR0cks!
      No key available with this passphrase.
  5. If the decryption fails, then mount the /dev/save/old LVM logical volume to /mnt/save to access the LUKS2 header dump file. Resolve any issues with the restore process.

    1. Mount the /dev/save/old logical volume and find the LUKS2 header dump file.

      [root@servera ~]# mkdir /mnt/save
      [root@servera ~]# mount /dev/save/old /mnt/save
      [root@servera ~]# ls /mnt/save
      ls: cannot access '/mnt/save/luks': Structure needs cleaning
      certs  keys  luks
    2. Unmount /mnt/save and inspect the /dev/save/old logical volume.

      [root@servera ~]# umount /mnt/save
      [root@servera ~]# lvs
        LV   VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
        new  save -wi-a----- 16.00m
    3. Because the old logical volume is missing, view the LVM metadata archive to determine the reason. Your archive file names might differ.

      [root@servera ~]# vgcfgrestore -l save
      
        File:		/etc/lvm/archive/save_00000-242032145.vg
        VG name:    	save
        Description:	Created *before* executing 'vgcreate save /dev/vdb1'
        Backup Time:	Wed Oct 13 01:44:43 2021
      
      
        File:		/etc/lvm/archive/save_00001-62014416.vg
        VG name:    	save
        Description:	Created *before* executing 'lvcreate -W y -L 15M -n old save'
        Backup Time:	Wed Oct 13 01:44:43 2021
      
      
        File:		/etc/lvm/archive/save_00002-1184718629.vg
        VG name:    	save
        Description:	Created *before* executing 'lvcreate -W y -L 15M -n new save'
        Backup Time:	Wed Oct 13 01:44:43 2021
      
      
        File:		/etc/lvm/archive/save_00003-184394976.vg
        VG name:    	save
        Description:	Created *before* executing 'lvremove -f /dev/save/old'
        Backup Time:	Wed Oct 13 01:44:45 2021
      
      
        File:		/etc/lvm/backup/save
        VG name:    	save
        Description:	Created *after* executing 'lvremove -f /dev/save/old'
        Backup Time:	Wed Oct 13 01:44:45 2021
    4. Revert the removal of the logical volume, activate it, and then mount the volume at /mnt/save.

      [root@servera ~]# vgcfgrestore -f /etc/lvm/archive/save_00003-184394976.vg save
        Volume group save has active volume: new.
        WARNING: Found 1 active volume(s) in volume group "save".
        Restoring VG with active LVs, may cause mismatch with its metadata.
      Do you really want to proceed with restore of volume group "save", while 1 volume(s) are active? [y/n]: y
        Restored volume group save.
      [root@servera ~]# lvs
        LV   VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
        new  save -wi-a----- 16.00m
        old  save -wi------- 16.00m
      [root@servera ~]# lvchange -an /dev/save/old
      [root@servera ~]# lvchange -ay /dev/save/old
      [root@servera ~]# mount /dev/save/old /mnt/save
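When a volume group has many archive entries, the right one can be picked out of the `vgcfgrestore -l` listing programmatically. A sketch that parses a saved fragment of the listing from this lab (embedded as a heredoc so the example is self-contained) and prints the archive file recorded just before the lvremove:

```shell
# Fragment of the 'vgcfgrestore -l save' listing shown above.
listing=$(cat <<'EOF'
  File:         /etc/lvm/archive/save_00002-1184718629.vg
  VG name:      save
  Description:  Created *before* executing 'lvcreate -W y -L 15M -n new save'

  File:         /etc/lvm/archive/save_00003-184394976.vg
  VG name:      save
  Description:  Created *before* executing 'lvremove -f /dev/save/old'
EOF
)

# Remember each File: path; print the one whose description
# records the lvremove that deleted the volume.
archive=$(echo "$listing" | awk '/^ *File:/ {file=$2} /lvremove/ {print file}')
echo "$archive"
```

The printed path is what you would pass to `vgcfgrestore -f`.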
  6. Locate and verify the LUKS2 header backup file at /mnt/save/luks/iscsistorage_luks_header. Resolve any issues with the restore process.

    1. Locate the /mnt/save/luks/iscsistorage_luks_header file.

      [root@servera ~]# ls -la /mnt/save/luks
      ls: cannot access '/mnt/save/luks': Structure needs cleaning
    2. Determine the type of file system on the logical volume and run a file system check.

      [root@servera ~]# blkid /dev/save/old
      /dev/save/old: UUID="c878808f-3c8e-45a3-abc1-0559694e5410" BLOCK_SIZE="512" TYPE="xfs"
      [root@servera ~]# umount /dev/save/old
      [root@servera ~]# xfs_repair -n /dev/save/old
      Phase 1 - find and verify superblock...
      Only one AG detected - cannot validate filesystem geometry.
      Use the -o force_geometry option to proceed.
      [root@servera ~]# xfs_repair -n -o force_geometry /dev/save/old
      Phase 1 - find and verify superblock...
      Phase 2 - using internal log
              - zero log...
              - scan filesystem freespace and inode maps...
              - found root inode chunk
      Phase 3 - for each AG...
              - scan (but don't clear) agi unlinked lists...
              - process known inodes and perform inode discovery...
              - agno = 0
      entry "iscsistorage_luks_header" in shortform directory 13800 references invalid inode 1234567890
      would have junked entry "iscsistorage_luks_header" in directory inode 13800
              - process newly discovered inodes...
      Phase 4 - check for duplicate blocks...
              - setting up duplicate extent list...
              - check for inodes claiming duplicate blocks...
              - agno = 0
      entry "iscsistorage_luks_header" in shortform directory 13800 references invalid inode 1234567890
      would have junked entry "iscsistorage_luks_header" in directory inode 13800
      No modify flag set, skipping phase 5
      Phase 6 - check inode connectivity...
              - traversing filesystem ...
      Invalid inode number 0x499602d2
      xfs_dir_ino_validate: XFS_ERROR_REPORT
      Metadata corruption detected at 0x55cf142addd0, inode 0x35e8 data fork
      couldn't map inode 13800, err = 117
              - traversal finished ...
              - moving disconnected inodes to lost+found ...
      disconnected inode 13801, would move to lost+found
      Phase 7 - verify link counts...
      Invalid inode number 0x499602d2
      xfs_dir_ino_validate: XFS_ERROR_REPORT
      Metadata corruption detected at 0x55cf142addd0, inode 0x35e8 data fork
      couldn't map inode 13800, err = 117, can't compare link counts
      No modify flag set, skipping filesystem flush and exiting.
    3. Repair the XFS file system.

      [root@servera ~]# xfs_repair -o force_geometry /dev/save/old
      Phase 1 - find and verify superblock...
      Phase 2 - using internal log
              - zero log...
              - scan filesystem freespace and inode maps...
              - found root inode chunk
      Phase 3 - for each AG...
              - scan and clear agi unlinked lists...
              - process known inodes and perform inode discovery...
              - agno = 0
      entry "iscsistorage_luks_header" in shortform directory 13800 references invalid inode 1234567890
      junking entry "iscsistorage_luks_header" in directory inode 13800
              - process newly discovered inodes...
      Phase 4 - check for duplicate blocks...
              - setting up duplicate extent list...
              - check for inodes claiming duplicate blocks...
              - agno = 0
      Phase 5 - rebuild AG headers and trees...
              - reset superblock...
      Phase 6 - check inode connectivity...
              - resetting contents of realtime bitmap and summary inodes
              - traversing filesystem ...
              - traversal finished ...
              - moving disconnected inodes to lost+found ...
      disconnected inode 13801, moving to lost+found
      Phase 7 - verify and correct link counts...
      done
    4. Mount the repaired file system. Based on the file system repair report, you expect to find the iscsistorage_luks_header file in the lost+found directory. Restore the file to the /mnt/save/luks directory.

      [root@servera ~]# mount /dev/save/old /mnt/save
      [root@servera ~]# ls -la /mnt/save/lost+found/
      total 1028
      drwxr-xr-x. 2 root root      18 Oct 13 02:31 .
      drwxr-xr-x. 6 root root      57 Jan 21  2016 ..
      -rw-r--r--. 1 root root 1052672 Jan 21  2016 13801
      [root@servera ~]# file /mnt/save/lost+found/13801
      /mnt/save/lost+found/13801: LUKS encrypted file, ver 1 [aes, xts-plain64, sha1] UUID: b91a11a8-1bf1-4c9a-9f31-3cc2e8947476
      [root@servera ~]# mv /mnt/save/lost+found/13801 /mnt/save/luks/iscsistorage_luks_header
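Each `disconnected inode N, moving to lost+found` line in the repair report corresponds to a file named N in the lost+found directory, so the numbers can be scraped from a saved copy of the report to know which recovered files to inspect. A small sketch over a fragment of the output above:

```shell
# Fragment of the xfs_repair output shown above.
report=$(cat <<'EOF'
        - moving disconnected inodes to lost+found ...
disconnected inode 13801, moving to lost+found
Phase 7 - verify and correct link counts...
EOF
)

# Extract the inode numbers of files that were moved to lost+found.
inodes=$(echo "$report" | sed -n 's/^disconnected inode \([0-9]\{1,\}\), moving to lost+found$/\1/p')
echo "$inodes"
```

Here the only recovered file is lost+found/13801, which the `file` command identified as the LUKS header backup.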
  7. Using the LUKS2 header backup at /mnt/save/luks/iscsistorage_luks_header, restore the LUKS2 header to the encrypted volume on the /dev/sda1 partition.

    [root@servera ~]# cryptsetup luksHeaderRestore /dev/sda1 --header-backup-file /mnt/save/luks/iscsistorage_luks_header
    
    WARNING!
    ========
    Device /dev/sda1 already contains LUKS header. Replacing header will destroy existing keyslots.
    
    Are you sure? (Type 'yes' in capital letters): YES
  8. Use the last known password to decrypt the encrypted volume and map it to /dev/mapper/finance.

    [root@servera ~]# cryptsetup luksOpen /dev/sda1 finance
    Enter passphrase for /dev/sda1: RedHatR0cks!
  9. Make the contents of the decrypted volume accessible at the /mnt/finance mount point.

    [root@servera ~]# mkdir /mnt/finance
    [root@servera ~]# mount /dev/mapper/finance /mnt/finance/
    [root@servera ~]# ls -la /mnt/finance
    total 0
    drwxr-xr-x. 8 root root 101 Jan 21  2016 .
    drwxr-xr-x. 4 root root  33 Oct 12 23:48 ..
    drwxr-xr-x. 2 root root   6 Jan 21  2016 accounts
    drwxr-xr-x. 2 root root   6 Jan 21  2016 customers
    drwxr-xr-x. 2 root root   6 Jan 21  2016 employees
    drwxr-xr-x. 2 root root   6 Jan 21  2016 loans
    drwxr-xr-x. 2 root root   6 Jan 21  2016 management
    drwxr-xr-x. 2 root root   6 Jan 21  2016 shareholders
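This lab only calls for a one-time mount, but if the finance volume had to survive a reboot, entries along the following lines in /etc/crypttab and /etc/fstab would be needed. This is a sketch only, not part of the exercise: `none` in the key-file field means the passphrase is prompted for at boot, `_netdev` defers activation until the network (and thus the iSCSI session) is up, and the file-system type is left as `auto` because the lab does not state it.

```
# /etc/crypttab: map /dev/sda1 to /dev/mapper/finance at boot.
# A UUID= reference would be more robust than /dev/sda1, since
# iSCSI device names can change between boots.
finance  /dev/sda1  none  _netdev

# /etc/fstab: mount the decrypted volume.
/dev/mapper/finance  /mnt/finance  auto  defaults,_netdev  0 0
```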
  10. Return to workstation as the student user.

    [root@servera ~]# exit
    [student@servera ~]$ exit
    [student@workstation ~]$

Evaluation

On the workstation machine, use the lab command to grade your work. Correct any reported failures and rerun the script until you receive a passing grade.

[student@workstation ~]$ lab grade storage-review

Finish

On the workstation machine, use the lab command to complete this exercise. This is important to ensure that resources from previous exercises do not impact upcoming exercises.

[student@workstation ~]$ lab finish storage-review

Revision: rh342-8.4-6dd89bd