Lab: Troubleshooting Storage Issues

Troubleshoot and resolve storage issues with encryption, file systems, LVM, and iSCSI.

Outcomes

You should be able to troubleshoot and repair issues with corrupted file systems, LUKS2 headers, LVM administration, and iSCSI targets.

As the student user on the workstation machine, use the lab command to prepare your system for this exercise.

[student@workstation ~]$ lab start storage-review

This command confirms that the required hosts for this exercise are accessible and configures them to perform the tasks.

Instructions

The CFO requests access to the company's financial data. The data resides in an encrypted volume on an iSCSI target, iqn.2016-01.com.example.lab:iscsistorage, provided by the serverb host. The target does not use authentication and is configured with an ACL to grant access to the iqn.2016-01.com.example.lab:servera initiator.

An administrator working on this request cannot access the /dev/sda1 encrypted volume. You are asked first to resolve this issue, and then to decrypt the volume with RedHatR0cks! as the last known password. Make the volume available as the finance mapped device mounted at /mnt/finance on servera.

If you have trouble decrypting the volume, then a backup of the encrypted volume's header exists on the /dev/save/old LVM logical volume on servera. Mount the logical volume at /mnt/save on servera and find the LUKS2 header backup file in /mnt/save/luks/iscsistorage_luks_header. If the file is not accessible, then troubleshoot the issue and then restore the file to that location.

  1. On servera, assess the situation by generating a list of iSCSI active sessions and known nodes.

    1. Log in to servera and switch to the root user.

      [student@workstation ~]$ ssh student@servera
      ...output omitted...
      [student@servera ~]$ sudo -i
      [sudo] password for student: student
      [root@servera ~]#
    2. List the iSCSI active sessions and known nodes.

      [root@servera ~]# iscsiadm -m session
      iscsiadm: No active sessions.
      [root@servera ~]# iscsiadm -m node
      iscsiadm: No records found
  2. If no sessions or known nodes exist, then discover targets on serverb. Resolve any issues in the discovery process.

    1. Discover the targets on serverb.

      [root@servera ~]# iscsiadm -m discovery -t st -p serverb.lab.example.com
      iscsiadm: cannot make connection to 172.25.252.11: No route to host
      iscsiadm: cannot make connection to 172.25.252.11: No route to host
      iscsiadm: cannot make connection to 172.25.252.11: No route to host
      iscsiadm: cannot make connection to 172.25.252.11: No route to host
      iscsiadm: cannot make connection to 172.25.252.11: No route to host
      iscsiadm: cannot make connection to 172.25.252.11: No route to host
      iscsiadm: connection login retries (reopen_max) 5 exceeded
      iscsiadm: Could not perform SendTargets discovery: iSCSI PDU timed out
    2. Verify the address that serverb.lab.example.com resolves to, and compare it with the address in the discovery error.

      [root@servera ~]# dig +short serverb.lab.example.com
      172.25.250.11
    3. Because the address in the error messages does not match the address that DNS returns, check the /etc/hosts file to determine whether it is the source of the problem.

      [root@servera ~]# grep serverb /etc/hosts
      172.25.252.11  serverb.lab.example.com serverb
    4. Fix the erroneous entry in /etc/hosts to correct the name resolution issue that prevents connections to serverb.lab.example.com.

      [root@servera ~]# grep serverb /etc/hosts
      172.25.250.11  serverb.lab.example.com serverb
    5. Rediscover the targets on serverb.

      [root@servera ~]# iscsiadm -m discovery -t st -p serverb.lab.example.com
      172.25.250.11:3260,1 iqn.2016-01.com.example.lab:iscsistorage
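The stale entry can also be corrected with a one-line sed substitution instead of a manual edit. The following is a minimal sketch that operates on a scratch copy of a hosts file, so it is safe to run anywhere; on the real servera you would apply the same substitution to /etc/hosts itself. The addresses are the ones from this lab:

```shell
# Scratch copy standing in for /etc/hosts, seeded with the stale entry.
hosts_copy=$(mktemp)
cat > "$hosts_copy" <<'EOF'
127.0.0.1      localhost
172.25.252.11  serverb.lab.example.com serverb
EOF

# Replace the stale address with the one DNS actually returns for serverb.
sed -i 's/^172\.25\.252\.11\([[:space:]]\)/172.25.250.11\1/' "$hosts_copy"

grep serverb "$hosts_copy"
```

Anchoring the pattern at the start of the line ensures that only the address field is rewritten, not an address that happens to appear elsewhere.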
  3. When iSCSI target discovery for serverb succeeds, log in to the target. Resolve any issues in the login process.

    1. Log in to the target on serverb.

      [root@servera ~]# iscsiadm -m node -T iqn.2016-01.com.example.lab:iscsistorage --login
      Logging in to [iface: default, target: iqn.2016-01.com.example.lab:iscsistorage, portal: 172.25.250.11,3260]
      iscsiadm: Could not login to [iface: default, target: iqn.2016-01.com.example.lab:iscsistorage, portal: 172.25.250.11,3260].
      iscsiadm: initiator reported error (8 - connection timed out)
      iscsiadm: Could not log into all portals
    2. Verify the configured authentication method on the initiator.

      [root@servera ~]# iscsiadm -m node -T iqn.2016-01.com.example.lab:iscsistorage | grep authmethod
      node.session.auth.authmethod = None
    3. Because authentication is disabled, check whether the login failure is caused by an ACL restriction. Confirm that the initiator name matches the name that the target's ACL grants access to.

      [root@servera ~]# cat /etc/iscsi/initiatorname.iscsi
      InitiatorName=iqn.com.example.lab:servera
    4. Fix the incorrect initiator name in /etc/iscsi/initiatorname.iscsi.

      [root@servera ~]# cat /etc/iscsi/initiatorname.iscsi
      InitiatorName=iqn.2016-01.com.example.lab:servera
    5. Restart iscsid to apply the new initiator name.

      [root@servera ~]# systemctl restart iscsid
    6. Log in to the target.

      [root@servera ~]# iscsiadm -m node -T iqn.2016-01.com.example.lab:iscsistorage --login
      Logging in to [iface: default, target: iqn.2016-01.com.example.lab:iscsistorage, portal: 172.25.250.11,3260]
      Login to [iface: default, target: iqn.2016-01.com.example.lab:iscsistorage, portal: 172.25.250.11,3260] successful.
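The same kind of scripted fix works for the initiator name. A sketch against a scratch copy of the file; on the real system the file is /etc/iscsi/initiatorname.iscsi, and iscsid must be restarted afterwards for the change to take effect:

```shell
# Scratch copy standing in for /etc/iscsi/initiatorname.iscsi.
iname=$(mktemp)
echo 'InitiatorName=iqn.com.example.lab:servera' > "$iname"

# Rewrite the IQN to include the date field that the target ACL expects.
sed -i 's|^InitiatorName=iqn\.com\.example\.lab:servera$|InitiatorName=iqn.2016-01.com.example.lab:servera|' "$iname"

cat "$iname"
```

On servera itself, follow the edit with `systemctl restart iscsid` as shown above.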
  4. When login to the target is successful, use the last known password to decrypt the encrypted volume.

    1. Locate the device locally on the servera machine.

      [root@servera ~]# grep "Attached SCSI" /var/log/messages
      Oct 13 01:40:12 servera kernel: sd 2:0:0:0: [sda] Attached SCSI disk
      [root@servera ~]# lsblk
      NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
      sda            8:0    0    1G  0 disk
      └─sda1         8:1    0  104M  0 part
      ...output omitted...
    2. Open the /dev/sda1 encrypted volume as the finance mapped device. Try decrypting the encrypted volume with the RedHatR0cks! password.

      [root@servera ~]# cryptsetup luksOpen /dev/sda1 finance
      Enter passphrase for /dev/sda1: RedHatR0cks!
      No key available with this passphrase.
  5. If the decryption fails, then mount the /dev/save/old LVM logical volume to /mnt/save to access the LUKS2 header dump file. Resolve any issues with the restore process.

    1. Mount the /dev/save/old logical volume and find the LUKS2 header dump file.

      [root@servera ~]# mkdir /mnt/save
      [root@servera ~]# mount /dev/save/old /mnt/save
      [root@servera ~]# ls /mnt/save
      ls: cannot access '/mnt/save/luks': Structure needs cleaning
      certs  keys  luks
    2. Unmount /mnt/save and inspect the /dev/save/old logical volume.

      [root@servera ~]# umount /mnt/save
      [root@servera ~]# lvs
        LV   VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
        new  save -wi-a----- 16.00m
    3. Because the old logical volume is missing, view the LVM metadata archive to determine the reason. Your archive file names might differ.

      [root@servera ~]# vgcfgrestore -l save
      
        File:		/etc/lvm/archive/save_00000-242032145.vg
        VG name:    	save
        Description:	Created *before* executing 'vgcreate save /dev/vdb1'
        Backup Time:	Wed Oct 13 01:44:43 2021
      
      
        File:		/etc/lvm/archive/save_00001-62014416.vg
        VG name:    	save
        Description:	Created *before* executing 'lvcreate -W y -L 15M -n old save'
        Backup Time:	Wed Oct 13 01:44:43 2021
      
      
        File:		/etc/lvm/archive/save_00002-1184718629.vg
        VG name:    	save
        Description:	Created *before* executing 'lvcreate -W y -L 15M -n new save'
        Backup Time:	Wed Oct 13 01:44:43 2021
      
      
        File:		/etc/lvm/archive/save_00003-184394976.vg
        VG name:    	save
        Description:	Created *before* executing 'lvremove -f /dev/save/old'
        Backup Time:	Wed Oct 13 01:44:45 2021
      
      
        File:		/etc/lvm/backup/save
        VG name:    	save
        Description:	Created *after* executing 'lvremove -f /dev/save/old'
        Backup Time:	Wed Oct 13 01:44:45 2021
    4. Revert the removal of the logical volume, activate it, and then mount the volume at /mnt/save.

      [root@servera ~]# vgcfgrestore -f /etc/lvm/archive/save_00003-184394976.vg save
        Volume group save has active volume: new.
        WARNING: Found 1 active volume(s) in volume group "save".
        Restoring VG with active LVs, may cause mismatch with its metadata.
      Do you really want to proceed with restore of volume group "save", while 1 volume(s) are active? [y/n]: y
        Restored volume group save.
      [root@servera ~]# lvs
        LV   VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
        new  save -wi-a----- 16.00m
        old  save -wi------- 16.00m
      [root@servera ~]# lvchange -an /dev/save/old
      [root@servera ~]# lvchange -ay /dev/save/old
      [root@servera ~]# mount /dev/save/old /mnt/save
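When a volume group has many archive entries, the right one can be picked out of the `vgcfgrestore -l` listing programmatically. A sketch that parses a saved fragment of the listing from this lab (embedded as a heredoc so the example is self-contained) and prints the archive file recorded just before the lvremove:

```shell
# Fragment of the 'vgcfgrestore -l save' listing shown above.
listing=$(cat <<'EOF'
  File:         /etc/lvm/archive/save_00002-1184718629.vg
  VG name:      save
  Description:  Created *before* executing 'lvcreate -W y -L 15M -n new save'

  File:         /etc/lvm/archive/save_00003-184394976.vg
  VG name:      save
  Description:  Created *before* executing 'lvremove -f /dev/save/old'
EOF
)

# Remember each File: path; print the one whose description
# records the lvremove that deleted the volume.
archive=$(echo "$listing" | awk '/^ *File:/ {file=$2} /lvremove/ {print file}')
echo "$archive"
```

The printed path is what you would pass to `vgcfgrestore -f`.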
  6. Locate and verify the LUKS2 header backup file at /mnt/save/luks/iscsistorage_luks_header. Resolve any issues with the restore process.

    1. Locate the /mnt/save/luks/iscsistorage_luks_header file.

      [root@servera ~]# ls -la /mnt/save/luks
      ls: cannot access '/mnt/save/luks': Structure needs cleaning
    2. Determine the type of file system on the logical volume and run a file system check.

      [root@servera ~]# blkid /dev/save/old
      /dev/save/old: UUID="c878808f-3c8e-45a3-abc1-0559694e5410" BLOCK_SIZE="512" TYPE="xfs"
      [root@servera ~]# umount /dev/save/old
      [root@servera ~]# xfs_repair -n /dev/save/old
      Phase 1 - find and verify superblock...
      Only one AG detected - cannot validate filesystem geometry.
      Use the -o force_geometry option to proceed.
      [root@servera ~]# xfs_repair -n -o force_geometry /dev/save/old
      Phase 1 - find and verify superblock...
      Phase 2 - using internal log
              - zero log...
              - scan filesystem freespace and inode maps...
              - found root inode chunk
      Phase 3 - for each AG...
              - scan (but don't clear) agi unlinked lists...
              - process known inodes and perform inode discovery...
              - agno = 0
      entry "iscsistorage_luks_header" in shortform directory 13800 references invalid inode 1234567890
      would have junked entry "iscsistorage_luks_header" in directory inode 13800
              - process newly discovered inodes...
      Phase 4 - check for duplicate blocks...
              - setting up duplicate extent list...
              - check for inodes claiming duplicate blocks...
              - agno = 0
      entry "iscsistorage_luks_header" in shortform directory 13800 references invalid inode 1234567890
      would have junked entry "iscsistorage_luks_header" in directory inode 13800
      No modify flag set, skipping phase 5
      Phase 6 - check inode connectivity...
              - traversing filesystem ...
      Invalid inode number 0x499602d2
      xfs_dir_ino_validate: XFS_ERROR_REPORT
      Metadata corruption detected at 0x55cf142addd0, inode 0x35e8 data fork
      couldn't map inode 13800, err = 117
              - traversal finished ...
              - moving disconnected inodes to lost+found ...
      disconnected inode 13801, would move to lost+found
      Phase 7 - verify link counts...
      Invalid inode number 0x499602d2
      xfs_dir_ino_validate: XFS_ERROR_REPORT
      Metadata corruption detected at 0x55cf142addd0, inode 0x35e8 data fork
      couldn't map inode 13800, err = 117, can't compare link counts
      No modify flag set, skipping filesystem flush and exiting.
    3. Repair the XFS file system.

      [root@servera ~]# xfs_repair -o force_geometry /dev/save/old
      Phase 1 - find and verify superblock...
      Phase 2 - using internal log
              - zero log...
              - scan filesystem freespace and inode maps...
              - found root inode chunk
      Phase 3 - for each AG...
              - scan and clear agi unlinked lists...
              - process known inodes and perform inode discovery...
              - agno = 0
      entry "iscsistorage_luks_header" in shortform directory 13800 references invalid inode 1234567890
      junking entry "iscsistorage_luks_header" in directory inode 13800
              - process newly discovered inodes...
      Phase 4 - check for duplicate blocks...
              - setting up duplicate extent list...
              - check for inodes claiming duplicate blocks...
              - agno = 0
      Phase 5 - rebuild AG headers and trees...
              - reset superblock...
      Phase 6 - check inode connectivity...
              - resetting contents of realtime bitmap and summary inodes
              - traversing filesystem ...
              - traversal finished ...
              - moving disconnected inodes to lost+found ...
      disconnected inode 13801, moving to lost+found
      Phase 7 - verify and correct link counts...
      done
    4. Mount the repaired file system. Based on the file system repair report, you expect to find the iscsistorage_luks_header file in the lost+found directory. Restore the file to the /mnt/save/luks directory.

      [root@servera ~]# mount /dev/save/old /mnt/save
      [root@servera ~]# ls -la /mnt/save/lost+found/
      total 1028
      drwxr-xr-x. 2 root root      18 Oct 13 02:31 .
      drwxr-xr-x. 6 root root      57 Jan 21  2016 ..
      -rw-r--r--. 1 root root 1052672 Jan 21  2016 13801
      [root@servera ~]# file /mnt/save/lost+found/13801
      /mnt/save/lost+found/13801: LUKS encrypted file, ver 1 [aes, xts-plain64, sha1] UUID: b91a11a8-1bf1-4c9a-9f31-3cc2e8947476
      [root@servera ~]# mv /mnt/save/lost+found/13801 /mnt/save/luks/iscsistorage_luks_header
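Each `disconnected inode N, moving to lost+found` line in the repair report corresponds to a file named N in the lost+found directory, so the numbers can be scraped from a saved copy of the report to know which recovered files to inspect. A small sketch over a fragment of the output above:

```shell
# Fragment of the xfs_repair output shown above.
report=$(cat <<'EOF'
        - moving disconnected inodes to lost+found ...
disconnected inode 13801, moving to lost+found
Phase 7 - verify and correct link counts...
EOF
)

# Extract the inode numbers of files that were moved to lost+found.
inodes=$(echo "$report" | sed -n 's/^disconnected inode \([0-9]\{1,\}\), moving to lost+found$/\1/p')
echo "$inodes"
```

Here the only recovered file is lost+found/13801, which the `file` command identified as the LUKS header backup.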
  7. Using the LUKS2 header backup at /mnt/save/luks/iscsistorage_luks_header, restore the LUKS2 header to the encrypted volume on the /dev/sda1 partition.

    [root@servera ~]# cryptsetup luksHeaderRestore /dev/sda1 --header-backup-file /mnt/save/luks/iscsistorage_luks_header
    
    WARNING!
    ========
    Device /dev/sda1 already contains LUKS header. Replacing header will destroy existing keyslots.
    
    Are you sure? (Type 'yes' in capital letters): YES
  8. Use the last known password to decrypt the encrypted volume and map it to /dev/mapper/finance.

    [root@servera ~]# cryptsetup luksOpen /dev/sda1 finance
    Enter passphrase for /dev/sda1: RedHatR0cks!
  9. Make the contents of the decrypted volume accessible at the /mnt/finance mount point.

    [root@servera ~]# mkdir /mnt/finance
    [root@servera ~]# mount /dev/mapper/finance /mnt/finance/
    [root@servera ~]# ls -la /mnt/finance
    total 0
    drwxr-xr-x. 8 root root 101 Jan 21  2016 .
    drwxr-xr-x. 4 root root  33 Oct 12 23:48 ..
    drwxr-xr-x. 2 root root   6 Jan 21  2016 accounts
    drwxr-xr-x. 2 root root   6 Jan 21  2016 customers
    drwxr-xr-x. 2 root root   6 Jan 21  2016 employees
    drwxr-xr-x. 2 root root   6 Jan 21  2016 loans
    drwxr-xr-x. 2 root root   6 Jan 21  2016 management
    drwxr-xr-x. 2 root root   6 Jan 21  2016 shareholders
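This lab only calls for a one-time mount, but if the finance volume had to survive a reboot, entries along the following lines in /etc/crypttab and /etc/fstab would be needed. This is a sketch only, not part of the exercise: `none` in the key-file field means the passphrase is prompted for at boot, `_netdev` defers activation until the network (and thus the iSCSI session) is up, and the file-system type is left as `auto` because the lab does not state it.

```
# /etc/crypttab: map /dev/sda1 to /dev/mapper/finance at boot.
# A UUID= reference would be more robust than /dev/sda1, since
# iSCSI device names can change between boots.
finance  /dev/sda1  none  _netdev

# /etc/fstab: mount the decrypted volume.
/dev/mapper/finance  /mnt/finance  auto  defaults,_netdev  0 0
```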
  10. Return to workstation as the student user.

    [root@servera ~]# exit
    [student@servera ~]$ exit
    [student@workstation ~]$

Evaluation

On the workstation machine, use the lab command to grade your work. Correct any reported failures and rerun the script until you receive a passing grade.

[student@workstation ~]$ lab grade storage-review

Finish

On the workstation machine, use the lab command to complete this exercise. This is important to ensure that resources from previous exercises do not impact upcoming exercises.

[student@workstation ~]$ lab finish storage-review

Revision: rh342-8.4-6dd89bd