Red Hat Enterprise Linux Diagnostics and Troubleshooting
Compile a SystemTap module and configure a system for crash dump.
Outcomes
You should be able to configure a system to collect a crash dump during a kernel crash and compile a SystemTap program as a portable kernel module.
As the student user on the workstation machine, use the lab command to prepare your system for this exercise.
[student@workstation ~]$ lab start kernel-review
This command installs the kernel debug packages.
Instructions
An administrator is experiencing issues on one of the systems in the organization during times of heavy network activity. The system's performance worsens and eventually the system hangs. To determine the cause of the issue, the administrator suggested using the /usr/share/systemtap/examples/network/nettop.stp example SystemTap script.
Provide the administrator with the compiled kernel module for the nettop.stp script. The administrator requires that the kernel module be built for the 4.18.0-305.el8.x86_64 version of the kernel and debug packages, which matches the affected system's kernel version. Compile the SystemTap script on the servera system as a portable kernel module and prepare it as /root/nettop.ko for sending to the affected system.
The administrator also suggested enabling kernel crash dumps on the serverb system to help identify any future issues. Configure the serverb system to save a crash dump when the system hangs. Configure the dumps to write locally to the /var/crash directory, to use the lzo compression algorithm to reduce the crash dump size, and to save only user data pages. Configure the crash dump output to provide maximum diagnostic information with all message types. Verify that the kernel crash dumps are recorded correctly and stored in the /var/crash directory.
Log in to the servera system and switch to the root user.
On the servera system, compile the nettop.stp SystemTap script into a portable kernel module.
The stap command fails during pass 2. The error messages show that it cannot find kernel debuginfo that matches the running kernel.
[root@servera ~]# stap -v -p 4 -m nettop /usr/share/systemtap/examples/network/nettop.stp
Pass 1: parsed user script and 482 library scripts using 456012virt/88784res/12872shr/75468data kb, in 230usr/60sys/629real ms.
WARNING: cannot find module kernel debuginfo: No DWARF information found [man warning::debuginfo]
WARNING: cannot find module kernel debuginfo: No DWARF information found [man warning::debuginfo]
semantic error: resolution failed in DWARF builder
semantic error: resolution failed in DWARF builder
semantic error: while resolving probe point: identifier 'kernel' at /usr/share/systemtap/tapset/linux/networking.stp:140:4
        source:   kernel.function("dev_queue_xmit")
                  ^
semantic error: no match
semantic error: resolution failed in alias expansion builder
WARNING: cannot find module kernel debuginfo: No DWARF information found [man warning::debuginfo]
WARNING: cannot find module kernel debuginfo: No DWARF information found [man warning::debuginfo]
semantic error: resolution failed in DWARF builder
semantic error: resolution failed in alias expansion builder
Pass 2: analyzed script: 3 probes, 1 function, 0 embeds, 3 globals using 487412virt/103992res/16656shr/86864data kb, in 260usr/230sys/1192real ms.
Pass 2: analysis failed. [man error::pass2]
Troubleshoot and fix the compilation issue.
Verify the installed kernel debug packages. In the output, the kernel-debuginfo packages are at version 4.18.0-348.el8, which does not match the installed 4.18.0-305.el8 kernel.
[root@servera ~]# rpm -qa | grep ^kernel
kernel-tools-4.18.0-305.el8.x86_64
kernel-core-4.18.0-305.el8.x86_64
kernel-modules-4.18.0-305.el8.x86_64
kernel-tools-libs-4.18.0-305.el8.x86_64
kernel-4.18.0-305.el8.x86_64
kernel-devel-4.18.0-305.el8.x86_64
kernel-headers-4.18.0-305.el8.x86_64
kernel-debuginfo-common-x86_64-4.18.0-348.el8.x86_64
kernel-debuginfo-4.18.0-348.el8.x86_64
Uninstall the incorrect kernel-debuginfo and kernel-debuginfo-common-x86_64 packages.
[root@servera ~]# yum remove -y kernel-debuginfo kernel-debuginfo-common-x86_64
...output omitted...
Removed:
  kernel-debuginfo-4.18.0-348.el8.x86_64 kernel-debuginfo-common-x86_64-4.18.0-348.el8.x86_64
Complete!
Verify that kernel debug packages matching the installed kernel version are available.
[root@servera ~]# yum list kernel-debuginfo-$(uname -r) kernel-debuginfo-common-x86_64-$(uname -r)
Last metadata expiration check: 0:22:47 ago on Sun 21 Nov 2021 05:24:43 AM EST.
Available Packages
kernel-debuginfo.x86_64                  4.18.0-305.el8   rhel-8.4-for-x86_64-baseos-debug-rpms
kernel-debuginfo-common-x86_64.x86_64    4.18.0-305.el8   rhel-8.4-for-x86_64-baseos-debug-rpms
Install the package version that matches the installed kernel version.
[root@servera ~]# yum install kernel-debuginfo-$(uname -r) kernel-debuginfo-common-x86_64-$(uname -r)
...output omitted...
Installed:
  kernel-debuginfo-4.18.0-305.el8.x86_64 kernel-debuginfo-common-x86_64-4.18.0-305.el8.x86_64
Complete!
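The version check in this step can be sketched as a small shell helper. This is only an illustration: the debuginfo_matches function name is hypothetical, and the two strings are copied from the command output in this exercise. On a live system you might pass "$(uname -r)" and the output of rpm -q kernel-debuginfo instead.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when a kernel-debuginfo package
# string carries exactly the same release as the given kernel.
debuginfo_matches() {
    kernel_release=$1   # for example, the output of `uname -r`
    package=$2          # for example, a line from `rpm -q kernel-debuginfo`
    [ "$package" = "kernel-debuginfo-$kernel_release" ]
}

# Values taken from the exercise output: the kernel is 305, the debuginfo is 348.
if debuginfo_matches "4.18.0-305.el8.x86_64" "kernel-debuginfo-4.18.0-348.el8.x86_64"; then
    echo "debuginfo matches the kernel"
else
    echo "debuginfo does not match the kernel"
fi
```

With the values from this exercise, the helper takes the mismatch branch, which corresponds to the stap failure seen earlier.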
Finish building the SystemTap probe. Store the compiled module as /root/nettop.ko on the servera system.
Compile the SystemTap script into a portable kernel module. The stap command writes nettop.ko to the current directory, which is /root in this session.
[root@servera ~]# stap -v -p 4 -m nettop /usr/share/systemtap/examples/network/nettop.stp
Pass 1: parsed user script and 482 library scripts using 456012virt/88608res/12692shr/75468data kb, in 230usr/60sys/616real ms.
Pass 2: analyzed script: 6 probes, 11 functions, 0 embeds, 3 globals using 685864virt/319488res/13720shr/305320data kb, in 2680usr/330sys/3714real ms.
Pass 3: translated to C into "/tmp/stap0jpxAs/nettop_src.c" using 685864virt/319680res/13912shr/305320data kb, in 20usr/60sys/84real ms.
nettop.ko
Pass 4: compiled C into "nettop.ko" in 16670usr/3740sys/12838real ms.
Return to workstation as the student user.
[root@servera ~]# exit
[student@servera ~]$ exit
[student@workstation ~]$
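Before the module is sent to the affected system, it can be useful to confirm that it was built for the expected kernel. The following is a minimal sketch, not part of the lab steps: the module_matches_kernel function name is hypothetical, and the sample vermagic string is illustrative rather than captured output. On a real system the first argument could come from modinfo -F vermagic /root/nettop.ko, and the module would typically be loaded on the target with staprun nettop.ko.

```shell
#!/bin/sh
# Hypothetical helper: compare the kernel release embedded in a module's
# vermagic string with the kernel release of the target system.
module_matches_kernel() {
    vermagic=$1         # for example, "$(modinfo -F vermagic /root/nettop.ko)"
    kernel_release=$2   # for example, "$(uname -r)" on the affected system
    # The first space-separated field of vermagic is the kernel release.
    [ "${vermagic%% *}" = "$kernel_release" ]
}

# Illustrative vermagic string (assumed format) checked against the required kernel.
sample_vermagic="4.18.0-305.el8.x86_64 SMP mod_unload modversions"
if module_matches_kernel "$sample_vermagic" "4.18.0-305.el8.x86_64"; then
    echo "module was built for the target kernel"
else
    echo "module was built for a different kernel"
fi
```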
Enable the kdump service on the serverb system.
Log in to the serverb system and switch to the root user.
[student@workstation ~]$ ssh student@serverb
...output omitted...
[student@serverb ~]$ sudo -i
[sudo] password for student: student
[root@serverb ~]#
Verify whether the kdump service is enabled and active.
[root@serverb ~]# systemctl is-enabled kdump.service
disabled
[root@serverb ~]# systemctl is-active kdump.service
inactive
Start and enable the kdump service if it is inactive or disabled.
[root@serverb ~]# systemctl enable kdump.service
Created symlink /etc/systemd/system/multi-user.target.wants/kdump.service → /usr/lib/systemd/system/kdump.service.
[root@serverb ~]# systemctl start kdump.service
[root@serverb ~]# systemctl status kdump.service
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
   Active: active (exited) since Sun 2021-11-21 05:36:10 EST; 7s ago
  Process: 1655 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
 Main PID: 1655 (code=exited, status=0/SUCCESS)
Nov 21 05:36:09 serverb.lab.example.com systemd[1]: Starting Crash recovery kernel arming...
Nov 21 05:36:10 serverb.lab.example.com kdumpctl[1655]: kdump: kexec: loaded kdump kernel
Nov 21 05:36:10 serverb.lab.example.com kdumpctl[1655]: kdump: Starting kdump: [OK]
Nov 21 05:36:10 serverb.lab.example.com systemd[1]: Started Crash recovery kernel arming.
Configure the kdump service to store crash dumps in the /var/crash directory, use the lzo compression algorithm, and generate all message types.
Edit the /etc/kdump.conf file and modify the core_collector entry to use the lzo compression algorithm (the -l option) and all message types (message level 31). Also, ensure that the dump stores only user data pages by setting the dump level to 23, which filters out zero, cache, and free pages.
core_collector makedumpfile -l --message-level 31 -d 23
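The values 23 and 31 in the core_collector entry are bitmasks. The sketch below shows how they add up, using the bit meanings described in the makedumpfile(8) man page. The variable names are illustrative only and are not makedumpfile options.

```shell
#!/bin/sh
# Dump-level bits: each bit set here EXCLUDES one page type from the dump.
ZERO_PAGES=1
CACHE_PAGES=2
CACHE_PRIVATE_PAGES=4
USER_DATA_PAGES=8     # left unset below, so user data pages stay in the dump
FREE_PAGES=16

dump_level=$(( ZERO_PAGES | CACHE_PAGES | CACHE_PRIVATE_PAGES | FREE_PAGES ))
echo "dump level: $dump_level"

# Message-level bits: each bit set here ENABLES one message type.
PROGRESS=1
COMMON=2
ERROR=4
DEBUG=8
REPORT=16

message_level=$(( PROGRESS | COMMON | ERROR | DEBUG | REPORT ))
echo "message level: $message_level"

# The resulting kdump.conf entry, assembled from the computed values.
echo "core_collector makedumpfile -l --message-level $message_level -d $dump_level"
```

The last line printed is the same core_collector entry used in this step. The dump location itself is controlled by the path directive in /etc/kdump.conf, which is assumed here to be left at /var/crash.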
Restart the kdump service to implement the changes.
[root@serverb ~]# systemctl restart kdump.service
[root@serverb ~]# systemctl status kdump.service
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
   Active: active (exited) since Sun 2021-11-21 05:38:48 EST; 16s ago
  Process: 2031 ExecStop=/usr/bin/kdumpctl stop (code=exited, status=0/SUCCESS)
  Process: 2040 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
 Main PID: 2040 (code=exited, status=0/SUCCESS)
Nov 21 05:38:40 serverb.lab.example.com dracut[2335]: *** Install squash loader ***
Nov 21 05:38:41 serverb.lab.example.com dracut[2335]: *** Stripping files ***
Nov 21 05:38:41 serverb.lab.example.com dracut[2335]: *** Stripping files done ***
Nov 21 05:38:41 serverb.lab.example.com dracut[2335]: *** Squashing the files inside the initramfs ***
Nov 21 05:38:47 serverb.lab.example.com dracut[2335]: *** Squashing the files inside the initramfs done ***
Nov 21 05:38:47 serverb.lab.example.com dracut[2335]: *** Creating image file '/boot/initramfs-4.18.0-305.el8.x86_64k>
Nov 21 05:38:48 serverb.lab.example.com dracut[2335]: *** Creating initramfs image file '/boot/initramfs-4.18.0-305.e>
Nov 21 05:38:48 serverb.lab.example.com kdumpctl[2040]: kdump: kexec: loaded kdump kernel
Nov 21 05:38:48 serverb.lab.example.com kdumpctl[2040]: kdump: Starting kdump: [OK]
Nov 21 05:38:48 serverb.lab.example.com systemd[1]: Started Crash recovery kernel arming.
Access the serverb system with its console. Test the kdump configuration by triggering a system crash.
Log in to the serverb system console and switch to the root user.
...output omitted...
serverb login: student
Password: student
[student@serverb ~]$ sudo -i
[sudo] password for student: student
[root@serverb ~]#
Enable all kernel SysRq functions by setting the value in the /proc/sys/kernel/sysrq file to 1.
[root@serverb ~]# echo 1 > /proc/sys/kernel/sysrq
Trigger a system crash by setting the value in the /proc/sysrq-trigger file to c. Wait for the serverb system to restart and then proceed to the next step.
[root@serverb ~]# echo c > /proc/sysrq-trigger
Return to the workstation system. Log in to the serverb system and switch to the root user.
[student@workstation ~]$ ssh student@serverb
...output omitted...
[student@serverb ~]$ sudo -i
[sudo] password for student: student
[root@serverb ~]#
Verify that the kernel crash dump was generated and stored in the /var/crash directory.
[root@serverb ~]# ll /var/crash/
total 0
drwxr-xr-x. 2 root root 67 Nov 21 05:40 127.0.0.1-2021-11-21-05:40:49
[root@serverb ~]# ll /var/crash/127.0.0.1-2021-11-21-05\:40\:49/
total 83760
-rw-r--r--. 1 root root    56051 Nov 21 05:40 kexec-dmesg.log
-rw-------. 1 root root 85671410 Nov 21 05:40 vmcore
-rw-r--r--. 1 root root    38269 Nov 21 05:40 vmcore-dmesg.txt
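The manual check above could also be scripted. The following is a rough sketch under stated assumptions: the latest_vmcore function name is hypothetical, and it assumes that each crash is stored in its own subdirectory of the crash root with a non-empty file named vmcore, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the vmcore file in the most
# recently modified subdirectory of the crash root, or fail if none exists.
latest_vmcore() {
    crash_root=${1:-/var/crash}
    # List subdirectories newest first and keep only the first entry.
    latest=$(ls -1dt "$crash_root"/*/ 2>/dev/null | head -n 1)
    # Require a subdirectory and a non-empty vmcore file inside it.
    [ -n "$latest" ] && [ -s "${latest}vmcore" ] && echo "${latest}vmcore"
}

latest_vmcore /var/crash || echo "no vmcore found under /var/crash"
```

The helper only reports whether a dump file exists. It does not validate the dump contents.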
Return to the workstation system as the student user.