Red Hat Enterprise Linux Diagnostics and Troubleshooting
The SystemTap tool gathers information for tracing and probing to monitor and analyze activities on a running Linux kernel. SystemTap collects information on performance or functional issues. SystemTap helps developers and system administrators to gather debug, profile, and performance data without requiring the installation or use of instrumented kernels and other specialized tool packages.
SystemTap provides options to analyze and filter collected data similar to capabilities of the netstat, ps, top, and iostat tools. The systemtap package includes numerous example scripts, and supports custom, user-developed scripts.
The stap command processes a SystemTap script, which is referred to as a probe file, by translating the probe into C language and compiling it as a kernel module. SystemTap inserts this kernel module into the running kernel for analysis and performs the probe functions as defined in the script. You can view the probe output as the probe runs, or redirect the output to a file.
The systemtap package provides the SystemTap framework. To monitor and analyze kernel activity, a probe must be compiled against the same RHEL kernel version as the system where the probe is inserted. A probe that is compiled for one kernel version is not guaranteed to run properly on other kernel versions, as with any kernel module.
Red Hat recommends that you do not normally compile SystemTap probes on the production systems to be analyzed, because these probes are a security vulnerability and can affect performance. Instead, use a development system for compiling SystemTap modules, and then transfer the compiled module to the target system. To use a SystemTap probe on more than one kernel, the probe must be recompiled for each target kernel version.
The compiling machine must be running the same kernel version as the system to be analyzed. Although it is possible to install multiple kernels on a single system, and boot into the chosen kernel version before compiling, the recommended practice is to create virtual machines on your development system, each with one of the target kernel versions. Install SystemTap and the version-specific kernel packages into each of these unique virtual machines.
The SystemTap framework requires the kernel-debuginfo, kernel-debuginfo-common, and kernel-devel packages that match the kernel version to be used for compiling. The C compiler and related developer tools are required and are installed as SystemTap dependencies if they are not already available on the development system. Most of the required packages are distributed in the RHEL 8 base distribution. The kernel-debuginfo and kernel-debuginfo-common packages are provided by the debug and source repositories.
To install all of these packages, enable the debug repositories with the following commands:
[root@host ~]# subscription-manager repos --enable rhel-8-for-$(uname -i)-baseos-debug-rpms
[root@host ~]# subscription-manager repos --enable rhel-8-for-$(uname -i)-baseos-source-rpms
[root@host ~]# subscription-manager repos --enable rhel-8-for-$(uname -i)-appstream-debug-rpms
[root@host ~]# subscription-manager repos --enable rhel-8-for-$(uname -i)-appstream-source-rpms
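The repository names differ only in the channel (baseos or appstream) and the type (debug or source), so the names can be generated in a loop. The following is a minimal sketch, not part of the course material; the stap_debug_repos function name is hypothetical, and the optional architecture argument exists only so that the list can be previewed on any machine.

```shell
# Hypothetical helper: print the debug and source repository IDs
# for an architecture (defaults to the output of `uname -i`).
stap_debug_repos() {
    local arch="${1:-$(uname -i)}"
    local channel kind
    for channel in baseos appstream; do
        for kind in debug source; do
            printf 'rhel-8-for-%s-%s-%s-rpms\n' "$arch" "$channel" "$kind"
        done
    done
}

# Usage (as root, on a registered system):
#   for repo in $(stap_debug_repos); do
#       subscription-manager repos --enable "$repo"
#   done
```

Printing the names first makes it possible to review the list before any repository is enabled.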
The stap-prep command from the systemtap package simplifies the tool installation process. Because the stap-prep command installs the matching kernel-devel and kernel-debuginfo packages for the currently running kernel, verify that you are booted into the needed target kernel version before running the stap-prep command. The SystemTap installation installs the necessary C compiler, libraries, build tools, kernel source code, and Perl tools if not already installed.
The systemtap package provides various useful example scripts for gathering data on the running kernel. These scripts are stored in /usr/share/systemtap/examples and are organized by the type of probed information.
By convention, SystemTap scripts use an .stp extension.
The stap command compiles and runs SystemTap scripts, including the provided examples. The stap command performs the following actions:
Parse the SystemTap script to validate syntax.
Elaborate the script to resolve and include symbols, variables, functions, and aliases.
Translate the script into C and save it to a temporary file.
Compile the C code into a kernel module object file.
Load and run the kernel module, trace the output, and unload when exited.
In this example, the stap command is run without options on the syscalls_by_proc.stp example script. The probe gathers data until it is interrupted, and then displays the summary results.
[root@host ~]# stap /usr/share/systemtap/examples/process/syscalls_by_proc.stp
Collecting data... Type Ctrl-C to exit and display results
#SysCalls Process Name
...output omitted...
12 crond
7 rpcbind
5 stap
Running stap with the -v option displays the multiple passes that the stap command makes to run the probe.
[root@host ~]# stap -v /usr/share/systemtap/examples/process/syscalls_by_proc.stp
Pass 1: parsed user script and 482 library scripts using 456140virt/88756res/12612shr/75596data kb, in 180usr/50sys/239real ms.
Pass 2: analyzed script: 4 probes, 2 functions, 95 embeds, 5 globals using 464380virt/98008res/13668shr/83836data kb, in 220usr/200sys/432real ms.
Pass 3: using cached /root/.systemtap/cache/5a/stap_5adeba4dc66f80ef54ba94400cfc572d_133889.c
Pass 4: using cached /root/.systemtap/cache/5a/stap_5adeba4dc66f80ef54ba94400cfc572d_133889.ko
Pass 5: starting run.
Collecting data... Type Ctrl-C to exit and display results
#SysCalls Process Name
...output omitted...
2 lsmd
2 rs:main Q:Reg
1 sssd
Pass 5: run completed in 10usr/30sys/12362real ms.
Pass 1 parses the script and verifies that the code is syntactically consistent.
Pass 2 elaborates the script to resolve symbols, variables, functions, and aliases.
Pass 3 translates the program to C code, creates a build Makefile, and places these files in a temporary cache directory. If this probe was run previously and the script was not modified since, then the previously translated C object and Makefile are retrieved from the cache and reused, as in this example.
Pass 4 builds the kernel object file that uses the C object, the Makefile, and the system's C compiler tools. If the kernel object exists in the cache and the script is not modified, then the previously built kernel object is retrieved from the cache and reused, as in this example.
Pass 5 loads the kernel module into the running kernel, runs the probe, and unloads the module when the probe exits.
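When reading verbose output, it can help to see at a glance which passes were served from the cache. The following sketch filters stap -v output for the "using cached" lines shown in the example; the stap_cached_passes function name is hypothetical, and the filter assumes the message format shown above.

```shell
# Hypothetical helper: read `stap -v` output on standard input and
# print the number of each pass that was served from the cache.
stap_cached_passes() {
    sed -n 's/^Pass \([0-9][0-9]*\): using cached .*/\1/p'
}

# Usage: stop after pass 4 so that no probe is loaded, and merge
# both output streams so that the pass messages reach the filter.
#   stap -v -p 4 /usr/share/systemtap/examples/process/syscalls_by_proc.stp 2>&1 | stap_cached_passes
```

An empty result suggests that the script was translated and compiled from scratch.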
By default, pass 4 of the stap command compiles a kernel module with a unique, random name and stores it in the hidden cache directory. Use the -m option to name the module, which is preferred when the module will be used repeatedly and copied to target systems. Unless an absolute path is specified, the named module is created in the user's current working directory.
Kernel module names must contain only the following characters:
ASCII lowercase alphabetic characters (a-z)
Digits (0-9)
Underscores (_)
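A candidate name can be checked against these rules before it is passed to the -m option. This is a minimal sketch; the valid_stap_module_name function name is hypothetical, and the check covers only the character rules listed above.

```shell
# Hypothetical helper: succeed only if the name is non-empty and uses
# only lowercase ASCII letters, digits, and underscores.
valid_stap_module_name() {
    # Force the C locale so that the a-z range means ASCII only.
    local LC_ALL=C
    case "$1" in
        ''|*[!a-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage:
#   valid_stap_module_name syscalls_by_proc && echo "name is acceptable"
```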
When you compile SystemTap scripts on a development system or virtual machine for use on a different target system, use the -p 4 option to stop after pass 4, before the kernel module is loaded and run.
This example compiles the syscalls_by_proc.stp example script, and stops after building the kernel module. The manually named syscalls_by_proc.ko file is created in the /root/ directory, which is the current working directory.
[root@host ~]# stap -p 4 -v -m syscalls_by_proc /usr/share/systemtap/examples/process/syscalls_by_proc.stp
Pass 1: parsed user script and 482 library scripts using 456012virt/88844res/12928shr/75468data kb, in 190usr/40sys/232real ms.
Pass 2: analyzed script: 4 probes, 2 functions, 95 embeds, 5 globals using 464368virt/98252res/13924shr/83824data kb, in 73080usr/13570sys/44902real ms.
Pass 3: translated to C into "/tmp/stapllx3aE/syscalls_by_proc_src.c" using 464368virt/98252res/13924shr/83824data kb, in 20usr/10sys/24real ms.
syscalls_by_proc.ko
Pass 4: compiled C into "syscalls_by_proc.ko" in 18520usr/3340sys/14244real ms.
On target systems, install the systemtap-runtime package to obtain the staprun command without the additional compile and build tools. Use the staprun command to run previously compiled modules. Copy the kernel module to the target system, and place the module in the system-wide SystemTap kernel module directory so that general users with stapusr privileges can access it.
Developers with stapdev privileges, and the root user, can run the module from any directory location.
| SystemTap group | Description |
|---|---|
| stapusr | User can run only the SystemTap instrumentation modules that are stored in the default /lib/modules/$(uname -r)/systemtap kernel module directory. The module files must have root ownership, and root must have write permissions on the /lib/modules/$(uname -r)/systemtap directory. Users who are assigned to the stapusr group can use only the staprun command, and do not have permission to use the stap command for compiling SystemTap scripts. |
| stapdev | Developers with stapdev group membership can compile their own SystemTap kernel modules with the stap command. If the user also has stapusr group membership, then they can use the staprun command to load a module from any directory. |
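The group rules can be summarized as a small decision function. The following sketch maps a list of group names to the capabilities described for the stapusr and stapdev groups; the stap_privileges_for_groups function name and the wording of its output are illustrative assumptions, not SystemTap features.

```shell
# Hypothetical helper: describe SystemTap capabilities for a
# space-separated list of group names, such as the output of `id -nG`.
stap_privileges_for_groups() {
    local groups=" $1 "
    local usr=no dev=no
    case "$groups" in *" stapusr "*) usr=yes ;; esac
    case "$groups" in *" stapdev "*) dev=yes ;; esac
    case "$dev/$usr" in
        yes/yes) echo "compile with stap; run modules from any directory" ;;
        yes/no)  echo "compile with stap" ;;
        no/yes)  echo "run modules from the default module directory only" ;;
        *)       echo "no SystemTap group privileges" ;;
    esac
}

# Usage (the user name is a placeholder):
#   stap_privileges_for_groups "$(id -nG student)"
```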
In this example, a SystemTap kernel module that is available as /root/syscalls_by_proc.ko on the target system is prepared for use by copying the file to the default SystemTap kernel module directory. Modules that are stored in the default location can be run by module name, which is the file name without the .ko extension.
[root@host ~]# mkdir -p /lib/modules/$(uname -r)/systemtap
[root@host ~]# ls -ld /lib/modules/$(uname -r)/systemtap
drwxr-xr-x. 2 root root 33 Nov 20 22:33 /lib/modules/4.18.0-305.el8.x86_64/systemtap
[root@host ~]# cp /root/syscalls_by_proc.ko /lib/modules/$(uname -r)/systemtap
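When several modules must be deployed, the directory creation and copy steps can be wrapped in one function. This is a sketch under stated assumptions: the install_stap_module function name is hypothetical, and the optional second argument, which replaces the /lib/modules prefix, exists only to allow a dry run in a scratch directory.

```shell
# Hypothetical helper: copy a compiled module into the SystemTap module
# directory for the running kernel, and print the installed path.
# $1 = path to the .ko file, $2 = optional replacement for /lib/modules.
install_stap_module() {
    local module="$1"
    local root="${2:-/lib/modules}"
    local dest="$root/$(uname -r)/systemtap"

    case "$module" in
        *.ko) ;;
        *) echo "not a .ko file: $module" >&2; return 1 ;;
    esac
    mkdir -p "$dest" && cp "$module" "$dest/" && echo "$dest/$(basename "$module")"
}

# Usage (as root, so that the copied file is owned by root):
#   install_stap_module /root/syscalls_by_proc.ko
```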
Any user with stapusr, stapdev, or root privileges can run the module by using the staprun command with the module name.
[root@host ~]# staprun syscalls_by_proc
Collecting data... Type Ctrl-C to exit and display results
#SysCalls Process Name
...output omitted...
2 gmain
2 lsmd
1 rpcbind
Alternatively, a developer with stapdev or root privileges can run a module by using the absolute or relative path to the target module file in any writable and accessible directory. However, modules that are run this way are not located by module name, but by the file name, which includes the .ko extension. The user must have write permission in the directory with the module file for SystemTap to capture the trace data in temporary files.
The ftrace framework uses virtual files in the debugfs file system, and enables specific tracers. The ftrace function tracer displays each function that is called by the kernel in real time; other tracers within the ftrace framework can analyze wakeup latency, task switches, and kernel events. You can also add new tracers to ftrace, which makes it a flexible solution for analyzing kernel events.
The ftrace framework helps to debug or analyze latencies and performance issues outside user space. The interface for ftrace resides in the tracing directory of the debugfs file system.
To view the tracer options in the kernel configuration file:
[root@host ~]# grep TRACER /boot/config-4.18.0-305.el8.x86_64
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_GENERIC_TRACER=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
CONFIG_SCHED_TRACER=y
CONFIG_HWLAT_TRACER=y
CONFIG_TRACER_SNAPSHOT=y
# CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP is not set
CONFIG_STACK_TRACER=y
To verify whether the debugfs file system is mounted on the system:
[root@host ~]# findmnt -t debugfs
TARGET            SOURCE  FSTYPE  OPTIONS
/sys/kernel/debug debugfs debugfs rw,relatime,seclabel
[root@host ~]# mount | grep ^debugfs
debugfs on /sys/kernel/debug type debugfs (rw,relatime,seclabel)
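The same check can be scripted by reading the kernel's mount table. The following sketch assumes the standard /proc/mounts layout (device, mount point, file system type, options); the debugfs_mountpoint function name is hypothetical, and the optional file argument exists only to test the function against a sample file.

```shell
# Hypothetical helper: print the debugfs mount point and succeed, or
# print nothing and fail if debugfs is not mounted.
# $1 = optional mounts file (defaults to /proc/mounts).
debugfs_mountpoint() {
    awk '$3 == "debugfs" { print $2; found = 1; exit }
         END { exit !found }' "${1:-/proc/mounts}"
}

# Usage:
#   if dir=$(debugfs_mountpoint); then
#       echo "ftrace interface: $dir/tracing"
#   else
#       echo "debugfs is not mounted" >&2
#   fi
```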
To view the available_tracers file and check all the available tracers:
[root@host ~]# cat /sys/kernel/debug/tracing/available_tracers
hwlat blk function_graph wakeup_dl wakeup_rt wakeup function nop
To change the current tracer and use the function tracer:
[root@host ~]# echo function > /sys/kernel/debug/tracing/current_tracer
To view the current_tracer file and check the current tracer:
[root@host ~]# cat /sys/kernel/debug/tracing/current_tracer
function
To enable tracing:
[root@host ~]# echo 1 > /sys/kernel/debug/tracing/tracing_on
[root@host ~]# cat /sys/kernel/debug/tracing/tracing_on
1
To view the contents of the trace buffer, read the trace file:
[root@host ~]# head -n 20 /sys/kernel/debug/tracing/trace
# tracer: function
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
kworker/0:1-25346 [000] .... 3031.749852: unlock_page <-shmem_read_mapping_page_gfp
kworker/0:1-25346 [000] .... 3031.749852: shmem_read_mapping_page_gfp <-drm_gem_get_pages
kworker/0:1-25346 [000] .... 3031.749852: shmem_getpage_gfp <-shmem_read_mapping_page_gfp
kworker/0:1-25346 [000] .... 3031.749852: find_lock_entry <-shmem_getpage_gfp
kworker/0:1-25346 [000] .... 3031.749852: find_get_entry <-find_lock_entry
kworker/0:1-25346 [000] .... 3031.749852: PageHuge <-find_get_entry
kworker/0:1-25346 [000] .... 3031.749852: _cond_resched <-find_lock_entry
kworker/0:1-25346 [000] .... 3031.749852: rcu_all_qs <-_cond_resched
kworker/0:1-25346 [000] .... 3031.749852: page_mapping <-find_lock_entry
kworker/0:1-25346 [000] .... 3031.749852: unlock_page <-shmem_read_mapping_page_gfp
kworker/0:1-25346 [000] .... 3031.749852: shmem_read_mapping_page_gfp <-drm_gem_get_pages
The trace-cmd tool, from the trace-cmd package, is a front end to the ftrace framework. With the trace-cmd tool, you can interact with ftrace without writing directly to files in the /sys/kernel/debug/tracing/ directory.
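The manual ftrace steps shown earlier (select the function tracer, enable tracing, read the buffer) can be collected into one function. This is a minimal sketch rather than a recommended tool: the capture_function_trace function name is hypothetical, the optional directory argument exists only for testing, and the cleanup steps of writing 0 to tracing_on and selecting the nop tracer are additions that this section does not cover.

```shell
# Hypothetical helper: capture a short function trace.
# $1 = seconds to trace (default 1)
# $2 = tracing directory (default /sys/kernel/debug/tracing)
capture_function_trace() {
    local seconds="${1:-1}"
    local tracing="${2:-/sys/kernel/debug/tracing}"

    # Stop early if this kernel does not offer the function tracer.
    grep -qw function "$tracing/available_tracers" || return 1

    echo function > "$tracing/current_tracer" || return 1
    echo 1 > "$tracing/tracing_on" || return 1
    sleep "$seconds"
    echo 0 > "$tracing/tracing_on"

    # Read the buffer before switching tracers, because switching
    # tracers may reset it.
    head -n 20 "$tracing/trace"
    echo nop > "$tracing/current_tracer"
}

# Usage (as root):
#   capture_function_trace 2
```

With the trace-cmd tool, a comparable capture would be trace-cmd record -p function sleep 1 followed by trace-cmd report, which avoids the direct writes to the tracing directory.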
References
stap(1), staprun(8), gdb(1), strace(1), ltrace(1), and trace-cmd(1) man pages
For further information, refer to Chapter 37. Getting Started with Systemtap at https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/monitoring_and_managing_system_status_and_performance/getting-started-with-systemtap_monitoring-and-managing-system-status-and-performance
For further information, refer to What Is Ftrace and How Do I Use It? at https://access.redhat.com/solutions/221323
For further information, refer to Chapter 7. GNU Debugger (GDB) at https://access.redhat.com/documentation/en-us/red_hat_developer_toolset/8/html-single/user_guide/chap-gdb
For further information, refer to Chapter 8. strace at https://access.redhat.com/documentation/en-us/red_hat_developer_toolset/8/html-single/user_guide/chap-strace