Monitoring Process Activity

Manage system workload by utilizing load averages and process statistics.

Objectives

After completing this section, students should be able to:

  • Interpret uptime and load averages.

  • Monitor real-time processes.

Load average

The Linux kernel calculates a load average metric as an exponential moving average of the load number, which is a count of active system resource requests summed across all CPUs.

  • Active requests are counted from per-CPU queues for running threads and threads waiting for I/O, as the kernel tracks process resource activity and corresponding process state changes.

  • Load number is the total of active requests from all CPUs, combined into a single number by a calculation routine that runs every five seconds.

  • Exponential moving average is a mathematical formula that smooths out highs and lows in trending data, giving current activity more weight and aging data progressively less.

  • Load average is the result of applying the exponential moving average to the load number. Collectively, the term refers to the three displayed values of system activity, averaged over the last 1, 5, and 15 minutes.
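
The averaging step above can be sketched in floating point. This assumes the form used by the kernel's load calculation: on every five-second sample, load = load × e + n × (1 − e), where n is the current load number and e is exp(−5/60), exp(−5/300), or exp(−5/900) for the 1, 5, and 15 minute values. The kernel performs the same update with fixed-point integers; the helper name simulate_load is hypothetical.

```bash
#!/usr/bin/env bash
# simulate_load ACTIVE SAMPLES WINDOW_SECONDS  (hypothetical helper)
# Start from an idle system (load 0), apply SAMPLES five-second updates in
# which ACTIVE tasks are running or waiting, and print the resulting
# exponentially damped average for a WINDOW_SECONDS window (60, 300, or 900).
# Floating-point sketch only; the kernel does this in fixed point.
simulate_load() {
    awk -v n="$1" -v k="$2" -v win="$3" 'BEGIN {
        e = exp(-5 / win)     # weight kept by the previous average
        load = 0
        for (i = 0; i < k; i++)
            load = load * e + n * (1 - e)   # newest sample gets weight (1 - e)
        printf "%.2f\n", load
    }'
}

# One minute (12 samples) of 4 busy tasks, as seen by each of the three windows.
for win in 60 300 900; do
    simulate_load 4 12 "$win"
done
```

After one minute of a steady load of 4, the 1-minute value has climbed to about 63% of 4 (2.53), while the 5- and 15-minute values have barely moved (0.73 and 0.26). That lag between the three windows is what makes them useful for spotting a trend.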

Understanding the Linux load average calculation

The load average represents the perceived system load over a time period. Linux implements the load average calculation as a representation of expected service wait times, not only for CPU but also for disk and network I/O.

  • Linux counts threads individually as separate tasks, not only processes. CPU request queues for running threads (nr_running) and threads waiting for I/O resources (nr_iowait) correspond closely to process states R (Running) and D (Uninterruptible Sleeping). Waiting for I/O includes tasks sleeping for expected disk and network responses.

  • The load number is a global counter, summed across all CPUs. Because tasks returning from sleep may be rescheduled to different CPUs, accurate per-CPU counts are difficult, but an accurate cumulative count is assured. Displayed load averages therefore represent all CPUs together.

  • Linux counts each physical CPU core and microprocessor hyperthread as separate execution units, logically represented and referred to as individual CPUs. Each CPU has independent request queues. View /proc/cpuinfo for the kernel representation of system CPUs.

    [student@serverX ~]$ grep "model name" /proc/cpuinfo
    model name	: Intel(R) Core(TM) i5 CPU       M 520  @ 2.40GHz
    model name	: Intel(R) Core(TM) i5 CPU       M 520  @ 2.40GHz
    model name	: Intel(R) Core(TM) i5 CPU       M 520  @ 2.40GHz
    model name	: Intel(R) Core(TM) i5 CPU       M 520  @ 2.40GHz
    [student@serverX ~]$ grep "model name" /proc/cpuinfo | wc -l
    4

  • Some UNIX systems only considered CPU utilization or run queue length to indicate system load. Since a system with idle CPUs can experience extensive waiting due to busy disk or network resources, I/O consideration is included in the Linux load average. When experiencing high load averages with minimal CPU activity, examine the disk and network activity.
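
Both quantities discussed in this list can be read from /proc. The sketch below is an illustration under stated assumptions, not how the kernel maintains its counter: count_cpus counts the "processor" lines in /proc/cpuinfo (one per logical CPU), and count_rd_tasks walks /proc/PID/task/TID/stat and counts threads whose state is R or D, giving a rough instantaneous view of the load number. Both helper names are hypothetical.

```bash
#!/usr/bin/env bash
# count_cpus (hypothetical helper): one "processor" line per logical CPU.
count_cpus() {
    grep -c '^processor' /proc/cpuinfo
}

# count_rd_tasks (hypothetical helper): number of threads currently in
# state R (running or runnable) or D (uninterruptible sleep).
count_rd_tasks() {
    local f line state n=0
    for f in /proc/[0-9]*/task/[0-9]*/stat; do
        # A thread can exit between the glob and the read; skip it if so.
        read -r line 2>/dev/null < "$f" || continue
        # Field 2 (the command name) is in parentheses and may contain
        # spaces, so drop everything through the last ") " to reach field 3.
        state=${line##*") "}
        state=${state%% *}
        case "$state" in
            R|D) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

echo "logical CPUs: $(count_cpus)"
echo "threads in R or D state: $(count_rd_tasks)"
```

If many threads sit in state D while the CPUs are mostly idle, that matches the situation described above, and disk and network activity are the places to look.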

Interpreting displayed load average values

The three values represent the weighted values over the last 1, 5, and 15 minutes. A quick glance can indicate whether system load appears to be increasing or decreasing. Calculate the approximate per-CPU load value to determine whether the system is experiencing significant waiting.
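The quick glance described above can be written as a comparison of the 1-minute and 15-minute values. This is a rough heuristic, not part of the load average definition; the helper name load_trend is hypothetical, and it takes the three values in the order uptime prints them.

```bash
#!/usr/bin/env bash
# load_trend ONE FIVE FIFTEEN (hypothetical helper)
# Recent load above the 15-minute average suggests rising load; below it
# suggests falling load. FIVE is accepted to match uptime order but unused.
load_trend() {
    awk -v one="$1" -v fifteen="$3" 'BEGIN {
        recent = one + 0; longterm = fifteen + 0   # force numeric comparison
        if (recent > longterm)      print "increasing"
        else if (recent < longterm) print "decreasing"
        else                        print "steady"
    }'
}

# Sample values (the same ones used in the uptime example).
load_trend 2.92 4.48 5.20

# Live values: the first three fields of /proc/loadavg are the averages.
read -r one five fifteen rest < /proc/loadavg
load_trend "$one" "$five" "$fifteen"
```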

  • top, uptime, w, and gnome-system-monitor display load average values.

    [student@serverX ~]$ uptime
     15:29:03 up 14 min,  2 users,  load average: 2.92, 4.48, 5.20

  • Divide the displayed load average values by the number of logical CPUs in the system. A per-CPU value below 1 indicates satisfactory resource utilization and minimal wait times. A value above 1 indicates resource saturation and some amount of service waiting time.

    # From /proc/cpuinfo, system has four logical CPUs, so divide by 4:
    #                               load average: 2.92, 4.48, 5.20
    #           divide by number of logical CPUs:    4     4     4
    #                                             ----  ----  ----
    #                       per-CPU load average: 0.73  1.12  1.30
    #
    # This system's load average appears to be decreasing.
    # During the last minute, with a load average of 2.92 on four CPUs, all CPUs were in use ~73% of the time.
    # During the last 5 minutes, the system was overloaded by ~12%.
    # During the last 15 minutes, the system was overloaded by ~30%.

  • An idle CPU queue has a load number of 0. Each ready and waiting thread adds a count of 1. With a total queue count of 1, the resource (CPU, disk, or network) is in use, but no requests spend time waiting. Additional requests increment the count, but because many requests can be processed within the time period, resource utilization increases without increasing wait times.

  • Processes sleeping for I/O due to a busy disk or network resource are included in the count and increase the load average. While not an indication of CPU utilization, the queue count still indicates that users and programs are waiting for resource services.

  • Until resource saturation, the per-CPU load average remains below 1, since tasks are seldom found waiting in a queue. The load average rises only when saturation causes requests to remain queued, where the load calculation routine counts them. As resource utilization approaches 100%, each additional request starts experiencing service wait time.
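
The per-CPU division worked through in the example comments can be scripted. This sketch assumes a Linux /proc filesystem: it reads the first three fields of /proc/loadavg (the 1, 5, and 15 minute averages) and divides each by the number of "processor" lines in /proc/cpuinfo. The helper names per_cpu_load and report_per_cpu_load are hypothetical.

```bash
#!/usr/bin/env bash
# per_cpu_load LOAD CPUS (hypothetical helper): one value divided by CPU count.
per_cpu_load() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# report_per_cpu_load (hypothetical helper): live per-CPU values from /proc.
report_per_cpu_load() {
    local cpus one five fifteen rest
    cpus=$(grep -c '^processor' /proc/cpuinfo)
    # /proc/loadavg starts with the 1, 5, and 15 minute load averages.
    read -r one five fifteen rest < /proc/loadavg
    printf 'per-CPU load average: %s %s %s\n' \
        "$(per_cpu_load "$one" "$cpus")" \
        "$(per_cpu_load "$five" "$cpus")" \
        "$(per_cpu_load "$fifteen" "$cpus")"
}

# Reproduce the worked example (four logical CPUs), then check this system.
per_cpu_load 2.92 4    # 0.73
per_cpu_load 4.48 4    # 1.12
per_cpu_load 5.20 4    # 1.30
report_per_cpu_load
```

A per-CPU value above 1.00 in any column corresponds to the saturation described above; because the count includes tasks waiting for I/O, it can come from busy disks or network resources as well as from the CPUs.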

Real-time process monitoring

The top program is a dynamic view of the system's processes, displaying a summary header followed by a process or thread list similar to ps information. Unlike the static ps output, top continuously refreshes at a configurable interval, and provides capabilities for column reordering, sorting, and highlighting. User configurations can be saved and made persistent.
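
Besides the interactive display, top can run non-interactively, which is convenient for capturing a snapshot in a script or log file. This sketch uses top's batch mode (-b) with a single iteration (-n 1); the helper name top_snapshot and its default line count are illustrative assumptions.

```bash
#!/usr/bin/env bash
# top_snapshot [LINES] (hypothetical helper)
# Print one batch-mode top refresh: the summary header (uptime and load
# average, task counts, CPU and memory lines) followed by the process list,
# truncated to LINES lines.
top_snapshot() {
    local lines="${1:-15}"
    top -b -n 1 | head -n "$lines"
}

top_snapshot 12
```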

Default output columns are recognizable from other resource tools:

  • The process ID (PID).

  • User name (USER) is the process owner.

  • Virtual memory (VIRT) is all memory the process is using, including the resident set, shared libraries, and any mapped or swapped memory pages. (Labeled VSZ in the ps command.)

  • Resident memory (RES) is the physical memory used by the process, including any resident shared objects. (Labeled RSS in the ps command.)

  • Process state (S) displays as:

    • D = Uninterruptible Sleeping

    • R = Running or Runnable

    • S = Sleeping

    • T = Stopped or Traced

    • Z = Zombie

  • CPU time (TIME) is the total processing time since the process started. It can be toggled to include the cumulative time of the process's dead children.

  • The process command name (COMMAND).
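
For a one-time, scriptable listing of the same columns, ps can be asked for them explicitly. The format keywords below (pid, user, vsz, rss, state, time, comm) are procps ps output specifiers; vsz and rss correspond to top's VIRT and RES, both reported in KiB. Sorting by descending rss loosely mirrors pressing M in top. The helper name top_columns is hypothetical.

```bash
#!/usr/bin/env bash
# top_columns [COUNT] (hypothetical helper)
# Show top's default identity, memory, state, time, and command columns via
# ps, largest resident set first, limited to COUNT processes plus the header.
top_columns() {
    local count="${1:-10}"
    ps -eo pid,user,vsz,rss,state,time,comm --sort=-rss | head -n "$((count + 1))"
}

top_columns 5
```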

Table 7.3. Fundamental keystrokes in top

  Key        Purpose
  ? or h     Help for interactive keystrokes.
  l, t, m    Toggles for load, threads, and memory header lines.
  1          Toggle showing individual CPUs or a summary for all CPUs in the header.
  s (1)      Change the refresh (screen) rate, in decimal seconds (e.g., 0.5, 1, 5).
  b          Toggle reverse highlighting for Running processes; default is bold only.
  B          Enables use of bold in display, in the header, and for Running processes.
  H          Toggle threads; show process summary or individual threads.
  u, U       Filter for any user name (effective, real).
  M          Sorts process listing by memory usage, in descending order.
  P          Sorts process listing by processor utilization, in descending order.
  k (1)      Kill a process. When prompted, enter PID, then signal.
  r (1)      Renice a process. When prompted, enter PID, then nice_value.
  W          Write (save) the current display configuration for use at the next top restart.
  q          Quit.

  Note: (1) Not available if top is started in secure mode. See top(1).
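
The k and r keystrokes have command-line counterparts in kill and renice. The sketch below starts a throwaway sleep process as a hypothetical target, sets its nice value to 10 as r would, reads the value back with ps, and then sends SIGTERM, the default signal top offers for k. The helper name is an illustrative assumption, and the readback of 10 assumes the shell itself runs at nice 0.

```bash
#!/usr/bin/env bash
# renice_then_kill_demo (hypothetical helper)
renice_then_kill_demo() {
    # Throwaway target; its output is discarded so that a command
    # substitution around this function does not wait on it.
    sleep 300 >/dev/null 2>&1 &
    local pid=$!
    # Command-line counterpart of top's r keystroke: nice value 10.
    renice -n 10 -p "$pid" >/dev/null
    # Read the nice value back (ni= prints the NI column without a header).
    ps -o ni= -p "$pid" | tr -d ' '
    # Counterpart of top's k keystroke with its default signal, SIGTERM (15).
    kill -s TERM "$pid"
    wait "$pid" 2>/dev/null || true
}

renice_then_kill_demo
```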

References

GNOME System Monitor

  • yelp help:gnome-system-monitor

ps(1), top(1), uptime(1), and w(1) man pages

Revision: rh124-7-1b00421