Lab: Troubleshooting Kernel Issues

Compile a SystemTap module and configure a system for crash dump.

Outcomes

You should be able to configure a system to collect a crash dump during a kernel crash and compile a SystemTap program as a portable kernel module.

As the student user on the workstation machine, use the lab command to prepare your system for this exercise.

[student@workstation ~]$ lab start kernel-review

This command installs the kernel debug packages.

Instructions

An administrator reports problems on one of the organization's systems during periods of heavy network activity: performance degrades and the system eventually hangs. To determine the cause, the administrator suggests using the /usr/share/systemtap/examples/network/nettop.stp example SystemTap script.

Provide the administrator with the compiled kernel module for the nettop.stp script. The module must be built against the 4.18.0-305.el8.x86_64 kernel and its matching debug packages, because that is the kernel version of the affected system. Compile the SystemTap script on the servera system as a portable kernel module and prepare it as /root/nettop.ko for sending to the affected system.

The administrator also suggests enabling kernel crash dumps on the serverb system to help identify any future issues. Configure the serverb system to save a crash dump when the system hangs. Write the dumps locally to the /var/crash directory, compress them with the lzo algorithm to reduce their size, and filter out every excludable page type except user data pages. Configure the crash dump output to provide maximum diagnostic information with all message types. Verify that the kernel crash dumps are recorded correctly and stored in the /var/crash directory.

  1. Log in to the servera system and switch to the root user.

    [student@workstation ~]$ ssh student@servera
    ...output omitted...
    [student@servera ~]$ sudo -i
    [sudo] password for student: student
    [root@servera ~]#
  2. On the servera system, compile the nettop.stp SystemTap script into a portable kernel module.

    The compilation fails. The warnings and errors show that stap cannot find debuginfo for the running kernel.

    [root@servera ~]# stap -v -p 4 -m nettop /usr/share/systemtap/examples/network/nettop.stp
    Pass 1: parsed user script and 482 library scripts using 456012virt/88784res/12872shr/75468data kb, in 230usr/60sys/629real ms.
    WARNING: cannot find module kernel debuginfo: No DWARF information found [man warning::debuginfo]
    WARNING: cannot find module kernel debuginfo: No DWARF information found [man warning::debuginfo]
    semantic error: resolution failed in DWARF builder
    
    semantic error: resolution failed in DWARF builder
    
    semantic error: while resolving probe point: identifier 'kernel' at /usr/share/systemtap/tapset/linux/networking.stp:140:4
            source:     kernel.function("dev_queue_xmit")
                        ^
    
    semantic error: no match
    
    semantic error: resolution failed in alias expansion builder
    
    WARNING: cannot find module kernel debuginfo: No DWARF information found [man warning::debuginfo]
    WARNING: cannot find module kernel debuginfo: No DWARF information found [man warning::debuginfo]
    semantic error: resolution failed in DWARF builder
    
    semantic error: resolution failed in alias expansion builder
    
    Pass 2: analyzed script: 3 probes, 1 function, 0 embeds, 3 globals using 487412virt/103992res/16656shr/86864data kb, in 260usr/230sys/1192real ms.
    Pass 2: analysis failed.  [man error::pass2]
  3. Troubleshoot and fix the compilation issue.

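    1. List the installed kernel packages. The kernel-debuginfo packages are at version 4.18.0-348, which does not match the installed 4.18.0-305 kernel.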

      [root@servera ~]# rpm -qa | grep ^kernel
      kernel-tools-4.18.0-305.el8.x86_64
      kernel-core-4.18.0-305.el8.x86_64
      kernel-modules-4.18.0-305.el8.x86_64
      kernel-tools-libs-4.18.0-305.el8.x86_64
      kernel-4.18.0-305.el8.x86_64
      kernel-devel-4.18.0-305.el8.x86_64
      kernel-headers-4.18.0-305.el8.x86_64
      kernel-debuginfo-common-x86_64-4.18.0-348.el8.x86_64
      kernel-debuginfo-4.18.0-348.el8.x86_64
    2. Uninstall the incorrect kernel-debuginfo and kernel-debuginfo-common-x86_64 packages.

      [root@servera ~]# yum remove -y kernel-debuginfo kernel-debuginfo-common-x86_64
      ...output omitted...
      Removed:
        kernel-debuginfo-4.18.0-348.el8.x86_64             kernel-debuginfo-common-x86_64-4.18.0-348.el8.x86_64
      
      Complete!
    3. Verify that kernel debug packages that match the installed kernel version are available.

      [root@servera ~]# yum list kernel-debuginfo-$(uname -r) kernel-debuginfo-common-x86_64-$(uname -r)
      Last metadata expiration check: 0:22:47 ago on Sun 21 Nov 2021 05:24:43 AM EST.
      Available Packages
      kernel-debuginfo.x86_64                             4.18.0-305.el8               rhel-8.4-for-x86_64-baseos-debug-rpms
      kernel-debuginfo-common-x86_64.x86_64               4.18.0-305.el8               rhel-8.4-for-x86_64-baseos-debug-rpms
    4. Install the package version that matches the installed kernel version.

      [root@servera ~]# yum install kernel-debuginfo-$(uname -r) kernel-debuginfo-common-x86_64-$(uname -r)
      ...output omitted...
      Installed:
        kernel-debuginfo-4.18.0-305.el8.x86_64             kernel-debuginfo-common-x86_64-4.18.0-305.el8.x86_64
      
      Complete!
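
The version comparison in step 3 can also be scripted. The following is a minimal sketch, not part of the lab solution: the function name find_mismatched_debuginfo is hypothetical, and it assumes package names in the form that rpm -qa prints.

```shell
# Print every kernel-debuginfo package that does not match a kernel release.
# $1 = kernel release (for example, the output of `uname -r`)
# remaining arguments = package names as printed by `rpm -qa`
find_mismatched_debuginfo() {
    fmd_release=$1
    shift
    for fmd_pkg in "$@"; do
        case $fmd_pkg in
            kernel-debuginfo-common-*-"$fmd_release"|kernel-debuginfo-"$fmd_release")
                ;;                        # built for the requested kernel
            kernel-debuginfo-*)
                echo "$fmd_pkg" ;;        # debuginfo for a different kernel
        esac
    done
}

# Possible use on a live system (relies on word splitting of the rpm output):
#   find_mismatched_debuginfo "$(uname -r)" $(rpm -qa 'kernel*')
```

With the package list from step 3.1, the function would print the two 4.18.0-348 debuginfo packages and nothing else.
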
  4. Finish building the SystemTap probe. Store the compiled module as /root/nettop.ko on the servera system.

    1. Compile the SystemTap script into a portable kernel module. Because the command runs in the /root directory, stap writes the module as /root/nettop.ko.

      [root@servera ~]# stap -v -p 4 -m nettop /usr/share/systemtap/examples/network/nettop.stp
      Pass 1: parsed user script and 482 library scripts using 456012virt/88608res/12692shr/75468data kb, in 230usr/60sys/616real ms.
      Pass 2: analyzed script: 6 probes, 11 functions, 0 embeds, 3 globals using 685864virt/319488res/13720shr/305320data kb, in 2680usr/330sys/3714real ms.
      Pass 3: translated to C into "/tmp/stap0jpxAs/nettop_src.c" using 685864virt/319680res/13912shr/305320data kb, in 20usr/60sys/84real ms.
      nettop.ko
      Pass 4: compiled C into "nettop.ko" in 16670usr/3740sys/12838real ms.
    2. Return to workstation as the student user.

      [root@servera ~]# exit
      [student@servera ~]$ exit
      [student@workstation ~]$
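
Before sending the module, it can be worth confirming that it was built for the affected system's kernel. The sketch below assumes that the first field of the module's vermagic string (as printed by modinfo -F vermagic) is the kernel release. The function name vermagic_matches is hypothetical and is not part of the lab solution.

```shell
# Succeed when a module's vermagic string names the expected kernel release.
# $1 = vermagic string, for example from: modinfo -F vermagic /root/nettop.ko
# $2 = expected kernel release, for example 4.18.0-305.el8.x86_64
vermagic_matches() {
    # The kernel release is the first space-separated field of vermagic.
    [ "${1%% *}" = "$2" ]
}

# Possible use on servera:
#   vermagic_matches "$(modinfo -F vermagic /root/nettop.ko)" 4.18.0-305.el8.x86_64 \
#       && echo "module matches the target kernel"
```

On the affected system, the administrator would typically load the module with staprun nettop.ko, which needs only the SystemTap runtime rather than the debug packages.
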
  5. Enable the kdump service on the serverb system.

    1. Log in to the serverb system and switch to the root user.

      [student@workstation ~]$ ssh student@serverb
      ...output omitted...
      [student@serverb ~]$ sudo -i
      [sudo] password for student: student
      [root@serverb ~]#
    2. Verify whether the kdump service is enabled and active.

      [root@serverb ~]# systemctl is-enabled kdump.service
      disabled
      [root@serverb ~]# systemctl is-active kdump.service
      inactive
    3. Enable and start the kdump service if it is disabled or inactive.

      [root@serverb ~]# systemctl enable kdump.service
      Created symlink /etc/systemd/system/multi-user.target.wants/kdump.service → /usr/lib/systemd/system/kdump.service.
      [root@serverb ~]# systemctl start kdump.service
      [root@serverb ~]# systemctl status kdump.service
      ● kdump.service - Crash recovery kernel arming
         Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
         Active: active (exited) since Sun 2021-11-21 05:36:10 EST; 7s ago
        Process: 1655 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
       Main PID: 1655 (code=exited, status=0/SUCCESS)
      
      Nov 21 05:36:09 serverb.lab.example.com systemd[1]: Starting Crash recovery kernel arming...
      Nov 21 05:36:10 serverb.lab.example.com kdumpctl[1655]: kdump: kexec: loaded kdump kernel
      Nov 21 05:36:10 serverb.lab.example.com kdumpctl[1655]: kdump: Starting kdump: [OK]
      Nov 21 05:36:10 serverb.lab.example.com systemd[1]: Started Crash recovery kernel arming.
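
The kdump service can load a crash kernel only when memory was reserved for it at boot with the crashkernel= kernel argument. The "loaded kdump kernel" message above indicates that the lab system already has such a reservation. On other systems, a check along these lines might help. It is a sketch only, and the function name crashkernel_setting is hypothetical.

```shell
# Print the value of the crashkernel= argument found in a kernel command line.
# $1 = kernel command line, for example the contents of /proc/cmdline
crashkernel_setting() {
    # Unquoted on purpose: split the command line into separate words.
    for cks_word in $1; do
        case $cks_word in
            crashkernel=*) echo "${cks_word#crashkernel=}" ;;
        esac
    done
}

# Possible use:
#   crashkernel_setting "$(cat /proc/cmdline)"
# A value of 1 in /sys/kernel/kexec_crash_loaded also indicates that the
# crash kernel is loaded.
```
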
  6. Configure the kdump service to store crash dumps in the /var/crash directory, use the lzo compression algorithm, and generate all message types.

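    1. Edit the /etc/kdump.conf file and modify the core_collector entry to use the lzo compression algorithm (-l) and to generate all message types (--message-level 31). Also, ensure that the dump excludes every filterable page type except user data pages (-d 23). The default path /var/crash entry in the file already stores the dumps locally, so it needs no change.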

      core_collector makedumpfile -l --message-level 31 -d 23
    2. Restart the kdump service to implement the changes.

      [root@serverb ~]# systemctl restart kdump.service
      [root@serverb ~]# systemctl status kdump.service
      ● kdump.service - Crash recovery kernel arming
         Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
         Active: active (exited) since Sun 2021-11-21 05:38:48 EST; 16s ago
        Process: 2031 ExecStop=/usr/bin/kdumpctl stop (code=exited, status=0/SUCCESS)
        Process: 2040 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
       Main PID: 2040 (code=exited, status=0/SUCCESS)
      
      Nov 21 05:38:40 serverb.lab.example.com dracut[2335]: *** Install squash loader ***
      Nov 21 05:38:41 serverb.lab.example.com dracut[2335]: *** Stripping files ***
      Nov 21 05:38:41 serverb.lab.example.com dracut[2335]: *** Stripping files done ***
      Nov 21 05:38:41 serverb.lab.example.com dracut[2335]: *** Squashing the files inside the initramfs ***
      Nov 21 05:38:47 serverb.lab.example.com dracut[2335]: *** Squashing the files inside the initramfs done ***
      Nov 21 05:38:47 serverb.lab.example.com dracut[2335]: *** Creating image file '/boot/initramfs-4.18.0-305.el8.x86_64k>
      Nov 21 05:38:48 serverb.lab.example.com dracut[2335]: *** Creating initramfs image file '/boot/initramfs-4.18.0-305.e>
      Nov 21 05:38:48 serverb.lab.example.com kdumpctl[2040]: kdump: kexec: loaded kdump kernel
      Nov 21 05:38:48 serverb.lab.example.com kdumpctl[2040]: kdump: Starting kdump: [OK]
      Nov 21 05:38:48 serverb.lab.example.com systemd[1]: Started Crash recovery kernel arming.
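
The numeric core_collector values in step 6 are bitmasks. According to the makedumpfile manual page, -d takes the sum of the page types to exclude (1 zero pages, 2 non-private cache pages, 4 private cache pages, 8 user data pages, 16 free pages), so 23 excludes everything in that list except user data pages. The --message-level option works the same way (1 progress, 2 common, 4 error, 8 debug, 16 report), so 31 enables every message type. The following sketch decodes a dump level. The function name describe_dump_level is illustrative only.

```shell
# Show which page types a makedumpfile dump level (-d) excludes or keeps.
# $1 = dump level, 0 to 31
describe_dump_level() {
    ddl_level=$1
    for ddl_entry in "1:zero pages" "2:cache pages" "4:private cache pages" \
                     "8:user data pages" "16:free pages"; do
        ddl_bit=${ddl_entry%%:*}     # numeric bit value before the colon
        ddl_name=${ddl_entry#*:}     # page type name after the colon
        if [ $(( ddl_level & ddl_bit )) -ne 0 ]; then
            echo "excluded: $ddl_name"
        else
            echo "kept: $ddl_name"
        fi
    done
}

describe_dump_level 23
# excluded: zero pages
# excluded: cache pages
# excluded: private cache pages
# kept: user data pages
# excluded: free pages
```
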
  7. Access the serverb system through its console. Test the kdump configuration by triggering a system crash.

    1. Log in to the serverb system console and switch to the root user.

      ...output omitted...
      serverb login: student
      Password: student
      [student@serverb ~]$ sudo -i
      [sudo] password for student: student
      [root@serverb ~]#
    2. Enable all kernel SysRq functions by writing 1 to the /proc/sys/kernel/sysrq file.

      [root@serverb ~]# echo 1 > /proc/sys/kernel/sysrq
    3. Trigger a system crash by writing c to the /proc/sysrq-trigger file. Wait for the serverb system to restart and then proceed to the next step.

      [root@serverb ~]# echo c > /proc/sysrq-trigger
    4. Return to the workstation system. Log in to the serverb system and switch to the root user.

      [student@workstation ~]$ ssh student@serverb
      ...output omitted...
      [student@serverb ~]$ sudo -i
      [sudo] password for student: student
      [root@serverb ~]#
    5. Verify that the kernel crash dump was generated and stored in the /var/crash directory.

      [root@serverb ~]# ll /var/crash/
      total 0
      drwxr-xr-x. 2 root root 67 Nov 21 05:40 127.0.0.1-2021-11-21-05:40:49
      [root@serverb ~]# ll /var/crash/127.0.0.1-2021-11-21-05\:40\:49/
      total 83760
      -rw-r--r--. 1 root root    56051 Nov 21 05:40 kexec-dmesg.log
      -rw-------. 1 root root 85671410 Nov 21 05:40 vmcore
      -rw-r--r--. 1 root root    38269 Nov 21 05:40 vmcore-dmesg.txt
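
The manual check in step 7.5 can also be scripted. This is a minimal sketch. The function name latest_vmcore is hypothetical, and it assumes that all dump directories share the same address prefix, so that sorted names are in chronological order.

```shell
# Print the path of the non-empty vmcore file in the latest dump directory.
# $1 = crash directory (the lab uses /var/crash)
latest_vmcore() {
    lv_found=
    for lv_dir in "$1"/*/; do
        # Glob results are sorted by name, so the last directory that holds
        # a non-empty vmcore file is the one with the latest timestamp.
        if [ -s "${lv_dir}vmcore" ]; then
            lv_found=${lv_dir}vmcore
        fi
    done
    # Return a non-zero status when no dump was found.
    [ -n "$lv_found" ] && echo "$lv_found"
}

# Possible use on serverb:
#   latest_vmcore /var/crash || echo "no crash dump found"
```
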
  8. Return to the workstation system as the student user.

    [root@serverb ~]# exit
    [student@serverb ~]$ exit
    [student@workstation ~]$

Evaluation

On the workstation machine, use the lab command to grade your work. Correct any reported failures and rerun the script until you receive a passing grade.

[student@workstation ~]$ lab grade kernel-review

Finish

On the workstation machine, use the lab command to complete this exercise. This is important to ensure that resources from previous exercises do not impact upcoming exercises.

[student@workstation ~]$ lab finish kernel-review

Revision: rh342-8.4-6dd89bd