Match Text in Command Output with Regular Expressions

Objectives

Create regular expressions to match data, apply regular expressions to text files with the grep command, and use grep to search files and data from piped commands.

Write Regular Expressions

Regular expressions provide a pattern matching mechanism to find specific content. The vim, grep, and less commands can use regular expressions. Programming languages such as Perl, Python, and C also support regular expressions, but might differ slightly in syntax.

Regular expressions are a language of their own, with their own syntax and rules. This section introduces regular expression syntax as used from the bash command line, with examples.

Describe a Simple Regular Expression

The simplest regular expression is an exact match of the string to search for. In an exact match, the characters in the regular expression match the characters in the string in both type and order.

Imagine that a user is looking through the following file for all occurrences of the pattern cat:

cat
dog
concatenate
dogma
category
educated
boondoggle
vindication
chilidog

The cat string is an exact match of the c character, followed by the a and t characters with no other characters between. Searching the file with the cat string as the regular expression returns the following matches:

cat
concatenate
category
educated
vindication
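As a minimal sketch, the search above can be reproduced with the grep command. The temporary file that mktemp creates is an assumption for illustration; any file that holds the word list works the same way.

```shell
# Write the sample word list to a temporary file (hypothetical location).
words=$(mktemp)
printf '%s\n' cat dog concatenate dogma category educated \
    boondoggle vindication chilidog > "$words"

# Print every line that contains the characters c, a, and t in that order.
matches=$(grep 'cat' "$words")
printf '%s\n' "$matches"

# Remove the temporary file.
rm -f "$words"
```

The lines dog, dogma, boondoggle, and chilidog are not printed, because none of them contains the cat string.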

Match the Start and End of a Line

The regular expression in the previous example matches the search string anywhere on the line where it occurs: the beginning, middle, or end of the word or line. Use a line anchor metacharacter to control where on a line to look for a match.

To match only at the beginning of a line, use the caret character (^). To match only at the end of a line, use the dollar sign ($).

With the same file as for the previous example, the ^cat regular expression would match two lines.

cat
category

The cat$ regular expression would find only one match, where the cat characters are at the end of a line.

cat

Locate lines in the file that end with dog, by using an end-of-line anchor to create the dog$ regular expression, which matches two lines:

dog
chilidog

To locate a line that contains only the search expression exactly, use both the beginning and end-of-line anchors. For example, to locate the word cat when it is both at the beginning and the end of a line simultaneously, use ^cat$.

cat
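The anchored searches in this section can be tried with the following sketch. It assumes the same word list in a temporary file; the variable names are illustrative only.

```shell
# Write the sample word list to a temporary file (hypothetical location).
words=$(mktemp)
printf '%s\n' cat dog concatenate dogma category educated \
    boondoggle vindication chilidog > "$words"

starts_with_cat=$(grep '^cat' "$words")   # lines that begin with cat
ends_with_cat=$(grep 'cat$' "$words")     # lines that end with cat
ends_with_dog=$(grep 'dog$' "$words")     # lines that end with dog
only_cat=$(grep '^cat$' "$words")         # lines that contain only cat

# Show each result under a short label.
printf 'begins with cat:\n%s\n' "$starts_with_cat"
printf 'ends with cat:\n%s\n' "$ends_with_cat"
printf 'ends with dog:\n%s\n' "$ends_with_dog"
printf 'exactly cat:\n%s\n' "$only_cat"

rm -f "$words"
```

The patterns are enclosed in single quotation marks so that the shell passes the ^ and $ characters to grep unchanged.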

Basic and Extended Regular Expressions

The two types of regular expressions are basic regular expressions and extended regular expressions.

One difference between basic and extended regular expressions is in the behavior of the |, +, ?, (, ), {, and } special characters. In basic regular expression syntax, these characters have a special meaning only if they are prefixed with a backslash \ character. In extended regular expression syntax, these characters are special unless they are prefixed with a backslash \ character. Other minor differences apply to how the ^, $, and * characters are handled.

The grep, sed, and vim commands use basic regular expressions. The grep command -E option, the sed command -E option, and the less command use extended regular expressions.
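The following sketch illustrates the backslash rule with the | character. It assumes GNU grep, where \| is available as the alternation operator in basic syntax; the three sample lines are made up for the illustration.

```shell
# Three sample lines; the last one contains a literal pipe character.
sample=$(printf '%s\n' 'cat' 'dog' 'cat|dog')

# Basic syntax: an unescaped | is an ordinary character.
bre_literal=$(printf '%s\n' "$sample" | grep 'cat|dog')
# Basic syntax (GNU grep): \| means "or".
bre_or=$(printf '%s\n' "$sample" | grep 'cat\|dog')
# Extended syntax: an unescaped | means "or".
ere_or=$(printf '%s\n' "$sample" | grep -E 'cat|dog')
# Extended syntax: \| is an ordinary pipe character.
ere_literal=$(printf '%s\n' "$sample" | grep -E 'cat\|dog')

printf 'basic, unescaped:\n%s\n' "$bre_literal"
printf 'basic, escaped:\n%s\n' "$bre_or"
printf 'extended, unescaped:\n%s\n' "$ere_or"
printf 'extended, escaped:\n%s\n' "$ere_literal"
```

The two "or" searches print all three lines, whereas the two literal searches print only the line that contains the pipe character itself.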

Wildcard and Multiplier Usage in Regular Expressions

Regular expressions use a dot character (.) as a wildcard to match any single character on a single line. The c.t regular expression searches for a string that contains a c, followed by any single character, followed by a t. Example matches might include cat, concatenate, vindication, cut, and c$t.

With an unrestricted wildcard, you cannot predict the character that matches the wildcard. To match specific characters, replace the unrestricted wildcard with appropriate characters.

Bracket characters restrict the match to a set of characters. For example, the c[aou]t regular expression matches patterns that start with a c, followed by an a, o, or u, followed by a t. Matching strings include cat, cot, and cut.

Multipliers are often used with wildcards. A multiplier applies to the previous character or wildcard in the regular expression. A commonly used multiplier is the asterisk (*) character. When used in a regular expression, the asterisk multiplier matches zero or more occurrences of the multiplied expression. You can use the asterisk with expressions, in addition to characters.

For example, the c[aou]*t regular expression matches coat or coot. A regular expression of c.*t matches cat, coat, culvert, and even ct (matching zero characters between the c and the t). Any line that contains a c, followed by zero or more characters, followed by a t is a match.
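The wildcard, bracket, and asterisk patterns can be compared side by side with the following sketch. The word list is an assumption that is assembled from the example words in this section, with dog added as a line that never matches.

```shell
# Sample lines; c$t is quoted so that the shell keeps the dollar sign.
sample=$(printf '%s\n' cat cot cut coat coot ct 'c$t' culvert dog)

# c, any single character, t
any_one=$(printf '%s\n' "$sample" | grep 'c.t')
# c, exactly one of a, o, or u, t
one_of_set=$(printf '%s\n' "$sample" | grep 'c[aou]t')
# c, zero or more of a, o, or u, t
set_repeated=$(printf '%s\n' "$sample" | grep 'c[aou]*t')
# c, zero or more of any character, t
any_repeated=$(printf '%s\n' "$sample" | grep 'c.*t')

printf 'c.t:\n%s\n' "$any_one"
printf 'c[aou]t:\n%s\n' "$one_of_set"
printf 'c[aou]*t:\n%s\n' "$set_repeated"
printf 'c.*t:\n%s\n' "$any_repeated"
```

Only the c.*t pattern matches culvert, because it is the only pattern that allows arbitrary characters between the c and the t.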

Another type of multiplier indicates a more precise number of characters in the pattern. An example of an explicit multiplier is the 'c.\{2\}t' regular expression, which matches any string that contains a c, followed by exactly two characters of any kind, followed by a t. The 'c.\{2\}t' expression matches two words in the following example (coat and cart):

cat
coat
convert
cart
covert
cypher
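The following sketch runs the explicit multiplier against the word list above, in both basic and extended syntax. The range multiplier in the last search is an additional illustration that is not part of the original example.

```shell
sample=$(printf '%s\n' cat coat convert cart covert cypher)

# Basic syntax: the braces are escaped with backslashes.
exactly_two_bre=$(printf '%s\n' "$sample" | grep 'c.\{2\}t')
# Extended syntax: the braces are written without backslashes.
exactly_two_ere=$(printf '%s\n' "$sample" | grep -E 'c.{2}t')
# Between two and five characters of any kind between the c and the t.
two_to_five=$(printf '%s\n' "$sample" | grep 'c.\{2,5\}t')

printf 'exactly two (basic):\n%s\n' "$exactly_two_bre"
printf 'exactly two (extended):\n%s\n' "$exactly_two_ere"
printf 'two to five:\n%s\n' "$two_to_five"
```

The first two searches print coat and cart. The range search also prints convert and covert, which have five and four characters between the c and the t.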

Note

This course introduced two metacharacter text parsing mechanisms: shell pattern matching (also known as file globbing or file name expansion), and regular expressions. Both mechanisms use similar metacharacters, such as the asterisk character (*), but have differences in metacharacter interpretation and rules.

Pattern matching is a shell technique to specify multiple file names on the command line. Regular expressions represent any form or pattern in text strings, no matter how complex. Regular expressions are internally supported by many text processing commands, such as grep, sed, awk, python, and perl, and in many applications.

Table 1.1. Basic and Extended Regular Expression Syntax

Basic syntax  Extended syntax  Description
.             .                The period (.) matches any single character.
\?            ?                The preceding item is optional and is matched at most once.
*             *                The preceding item is matched zero or more times.
\+            +                The preceding item is matched one or more times.
\{n\}         {n}              The preceding item is matched exactly n times.
\{n,\}        {n,}             The preceding item is matched n or more times.
\{,m\}        {,m}             The preceding item is matched at most m times.
\{n,m\}       {n,m}            The preceding item is matched at least n times, but not more than m times.
[:alnum:]     [:alnum:]        Alphanumeric characters: [:alpha:] and [:digit:]; in the 'C' locale and ASCII character encoding, this expression is the same as [0-9A-Za-z].
[:alpha:]     [:alpha:]        Alphabetic characters: [:lower:] and [:upper:]; in the 'C' locale and ASCII character encoding, this expression is the same as [A-Za-z].
[:blank:]     [:blank:]        Blank characters: space and tab.
[:cntrl:]     [:cntrl:]        Control characters. In ASCII, these characters have octal codes 000 through 037, and 177 (DEL).
[:digit:]     [:digit:]        Digits: 0 1 2 3 4 5 6 7 8 9.
[:graph:]     [:graph:]        Graphical characters: [:alnum:] and [:punct:].
[:lower:]     [:lower:]        Lowercase letters; in the 'C' locale and ASCII character encoding: a b c d e f g h i j k l m n o p q r s t u v w x y z.
[:print:]     [:print:]        Printable characters: [:alnum:], [:punct:], and space.
[:punct:]     [:punct:]        Punctuation characters; in the 'C' locale and ASCII character encoding: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~.
[:space:]     [:space:]        Space characters; in the 'C' locale: tab, newline, vertical tab, form feed, carriage return, and space.
[:upper:]     [:upper:]        Uppercase letters; in the 'C' locale and ASCII character encoding: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z.
[:xdigit:]    [:xdigit:]       Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f.
\b            \b               Match the empty string at the edge of a word.
\B            \B               Match the empty string provided that it is not at the edge of a word.
\<            \<               Match the empty string at the beginning of a word.
\>            \>               Match the empty string at the end of a word.
\w            \w               Match word constituent. Synonym for [_[:alnum:]].
\W            \W               Match non-word constituent. Synonym for [^_[:alnum:]].
\s            \s               Match white space. Synonym for [[:space:]].
\S            \S               Match non-white space. Synonym for [^[:space:]].
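The following sketch shows a few of the table entries in use. It assumes GNU grep for the \< \> and \B operators, and the sample lines are made up for the illustration. A character class name such as [:digit:] must itself be placed inside a bracket expression, which is why the patterns use double brackets.

```shell
sample=$(printf '%s\n' 'cat' 'concatenate' 'the cat sat' 'room 101' 'UPPER')

# Lines that contain at least one digit.
with_digit=$(printf '%s\n' "$sample" | grep '[[:digit:]]')
# Lines that contain at least one uppercase letter.
with_upper=$(printf '%s\n' "$sample" | grep '[[:upper:]]')
# Lines where cat appears as a whole word.
whole_word=$(printf '%s\n' "$sample" | grep '\<cat\>')
# Lines where cat appears strictly inside a longer word.
inside_word=$(printf '%s\n' "$sample" | grep '\Bcat\B')

printf 'digit:\n%s\n' "$with_digit"
printf 'uppercase:\n%s\n' "$with_upper"
printf 'whole word:\n%s\n' "$whole_word"
printf 'inside a word:\n%s\n' "$inside_word"
```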

Match Regular Expressions from the Command Line

The grep command uses regular expressions to isolate matching data. You can use the grep command to match data in a single file or in multiple files. When you use grep to match data in multiple files, it prints the file name followed by a colon character and then the lines that match the regular expression.
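The difference between single-file and multiple-file output can be seen with the following sketch. The directory and the pets.txt and verbs.txt file names are hypothetical and exist only for this illustration.

```shell
# Create two small files in a temporary directory (hypothetical names).
dir=$(mktemp -d)
printf '%s\n' cat dog > "$dir/pets.txt"
printf '%s\n' concatenate run > "$dir/verbs.txt"

# One file: only the matching lines are printed.
one_file=$(cd "$dir" && grep 'cat' pets.txt)
# Two files: each matching line is prefixed with its file name and a colon.
two_files=$(cd "$dir" && grep 'cat' pets.txt verbs.txt)

printf 'one file:\n%s\n' "$one_file"
printf 'two files:\n%s\n' "$two_files"

rm -rf "$dir"
```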

Isolating Data with the grep Command

The grep command takes a regular expression and a file to search for matches.

[user@host ~]$ grep '^computer' /usr/share/dict/words
computer
computerese
computerise
computerite
computerizable
computerization
computerize
computerized
computerizes
computerizing
computerlike
computernik
computers

Note

It is recommended practice to use single quotation marks to encapsulate the regular expression to protect any shell metacharacters (such as the $, *, and {} characters). Encapsulating the regular expression ensures that the command and not the shell interprets the characters.
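The following sketch shows one way that an unquoted pattern can go wrong. It assumes a working directory that happens to contain a file named cat; the directory and file names are hypothetical.

```shell
# A temporary directory with a word list and a file named "cat".
dir=$(mktemp -d)
printf '%s\n' ct cat coat dog > "$dir/words.txt"
touch "$dir/cat"    # its name matches the shell pattern c*t

# Quoted: grep receives the regular expression c*t
# (zero or more c characters, followed by a t).
quoted=$(cd "$dir" && grep 'c*t' words.txt)
# Unquoted: the shell first expands c*t to the file name cat,
# so grep searches for the string cat instead.
unquoted=$(cd "$dir" && grep c*t words.txt)

printf 'quoted:\n%s\n' "$quoted"
printf 'unquoted:\n%s\n' "$unquoted"

rm -rf "$dir"
```

The quoted search prints ct, cat, and coat, whereas the unquoted search prints only cat, because the shell changed the pattern before grep received it.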

The grep command can process output from other commands by using a pipe operator character (|). The following example shows the grep command parsing lines from the output of another command.

[root@host ~]# ps aux | grep chrony
chrony     662  0.0  0.1  29440  2468 ?        S    10:56   0:00 /usr/sbin/chronyd
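Depending on timing, the process list might also include the grep process itself, because its own command line contains the search string. A common workaround is to enclose one character of the pattern in brackets. The following sketch demonstrates the effect on two made-up lines that imitate ps aux output; the fake_ps function and its process details are hypothetical.

```shell
# Print two made-up lines that imitate ps aux output: the daemon,
# and a grep process whose command line contains the given pattern.
fake_ps() {
    printf '%s\n' \
        'chrony   662  0.0  0.1  29440  2468 ?      S  10:56  0:00 /usr/sbin/chronyd' \
        "root    1234  0.0  0.0   6400   720 pts/0  S+ 11:02  0:00 grep $1"
}

# The plain pattern also matches the line for the grep process.
plain=$(fake_ps 'chrony' | grep 'chrony')
# [c]hrony still matches chrony, but not the literal text "[c]hrony".
bracketed=$(fake_ps '[c]hrony' | grep '[c]hrony')

printf 'plain pattern:\n%s\n' "$plain"
printf 'bracketed pattern:\n%s\n' "$bracketed"
```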

The grep Command Options

The grep command has many options for controlling how it parses lines.

Table 1.2. Table of Common grep Options

Option     Function
-i         Use the provided regular expression and do not enforce case sensitivity (run case-insensitive).
-v         Display only lines that do not contain matches to the regular expression.
-r         Search for data that matches the regular expression recursively in a group of files or directories.
-A NUMBER  Display NUMBER of lines after the regular expression match.
-B NUMBER  Display NUMBER of lines before the regular expression match.
-e         If multiple -e options are used, then multiple regular expressions can be supplied and are used with a logical OR.
-E         Use extended regular expression syntax instead of basic regular expression syntax when parsing the provided regular expression.

View the man pages to find other options for the grep command.
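The context and recursion options can be tried with the following sketch. The directory layout and the numbers.txt and rhyme.txt file names are assumptions that are made up for the illustration.

```shell
# A temporary directory with one file at the top and one in a subdirectory.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf '%s\n' one two three four five > "$dir/numbers.txt"
printf '%s\n' 'Three blind mice' > "$dir/sub/rhyme.txt"

# -A 1: the matching line and one line after it.
after=$(grep -A 1 'three' "$dir/numbers.txt")
# -B 1: one line before the match, and the matching line.
before=$(grep -B 1 'three' "$dir/numbers.txt")
# -r -i: search every file under the current directory, ignoring case.
# The output is sorted because the order of files is not guaranteed.
recursive=$(cd "$dir" && grep -r -i 'three' . | sort)

printf 'after:\n%s\n' "$after"
printf 'before:\n%s\n' "$before"
printf 'recursive:\n%s\n' "$recursive"

rm -rf "$dir"
```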

Examples of the grep Command

The following examples use various configuration files and log files.

Regular expressions are case-sensitive by default. Use the grep command -i option to run a case-insensitive search. The following example shows an excerpt of the /etc/httpd/conf/httpd.conf configuration file.

[user@host ~]$ cat /etc/httpd/conf/httpd.conf
...output omitted...
ServerRoot "/etc/httpd"

#
# Listen: Allows you to bind Apache to specific IP addresses and/or
# ports, instead of the default. See also the <VirtualHost>
# directive.
#
# Change this to Listen on a specific IP address, but note that if
# httpd.service is enabled to run at boot time, the address may not be
# available when the service starts.  See the httpd.service(8) man
# page for more information.
#
#Listen 12.34.56.78:80
Listen 80
...output omitted...

The following example searches for the serverroot regular expression in the /etc/httpd/conf/httpd.conf configuration file.

[user@host ~]$ grep -i serverroot /etc/httpd/conf/httpd.conf
# with "/", the value of ServerRoot is prepended -- so 'log/access_log'
# with ServerRoot set to '/www' will be interpreted by the
# ServerRoot: The top of the directory tree under which the server's
# ServerRoot at a non-local disk, be sure to specify a local disk on the
# same ServerRoot for multiple httpd daemons, you will need to change at
ServerRoot "/etc/httpd"

Use the grep command -v option to invert the search. This option displays only the lines that do not match the regular expression.

In the following example, all lines, regardless of case, that do not contain the server regular expression are returned.

[user@host ~]$ grep -v -i server /etc/hosts
127.0.0.1 localhost.localdomain localhost
172.25.254.254 classroom.example.com classroom
172.25.254.254 content.example.com content
172.25.254.254 materials.example.com materials
### rht-vm-hosts file listing the entries to be appended to /etc/hosts

172.25.250.9	workstation.lab.example.com workstation
172.25.250.254	bastion.lab.example.com bastion
172.25.250.220  utility.lab.example.com utility
172.25.250.220  registry.lab.example.com registry

To view a file without the distraction of comment lines, use the grep command -v option. In the following example, the regular expression matches and excludes all the lines that begin with a hash character (#) or a semicolon (;) character in the /etc/systemd/system/multi-user.target.wants/rsyslog.service file. In that file, the hash character at the beginning of a line indicates a general comment, whereas the semicolon character refers to a commented variable value.

[user@host ~]$ grep -v '^[#;]' \
/etc/systemd/system/multi-user.target.wants/rsyslog.service
[Unit]
Description=System Logging Service
Documentation=man:rsyslogd(8)
Documentation=https://www.rsyslog.com/doc/

[Service]
Type=notify
EnvironmentFile=-/etc/sysconfig/rsyslog
ExecStart=/usr/sbin/rsyslogd -n $SYSLOGD_OPTIONS
ExecReload=/usr/bin/kill -HUP $MAINPID
UMask=0066
StandardOutput=null
Restart=on-failure

LimitNOFILE=16384

[Install]
WantedBy=multi-user.target
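The output above still contains the empty lines from the file. As a sketch, a second expression that matches empty lines (^$) can be excluded in the same command with the -e option. The sample configuration content is made up for the illustration.

```shell
# A small sample configuration in a temporary file (hypothetical content).
conf=$(mktemp)
printf '%s\n' '# general comment' '[Service]' ';Type=simple' \
    'Type=notify' '' 'Restart=on-failure' > "$conf"

# Exclude lines that begin with # or ; and keep everything else.
no_comments=$(grep -v '^[#;]' "$conf")
# Exclude comment lines and also empty lines.
no_comments_or_blanks=$(grep -v -e '^[#;]' -e '^$' "$conf")

printf 'without comments:\n%s\n' "$no_comments"
printf 'without comments or empty lines:\n%s\n' "$no_comments_or_blanks"

rm -f "$conf"
```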

The grep command -e option can search for more than one regular expression at a time. The following example pipes the contents of the /var/log/secure log file through the grep command and into the less command, to locate all occurrences of pam_unix, user root, and Accepted publickey.

[root@host ~]# cat /var/log/secure | grep -e 'pam_unix' \
-e 'user root' -e 'Accepted publickey' | less
Mar  4 03:31:41 localhost passwd[6639]: pam_unix(passwd:chauthtok): password changed for root
Mar  4 03:32:34 localhost sshd[15556]: Accepted publickey for devops from 10.30.0.167 port 56472 ssh2: RSA SHA256:M8ikhcEDm2tQ95Z0o7ZvufqEixCFCt+wowZLNzNlBT0
Mar  4 03:32:34 localhost systemd[15560]: pam_unix(systemd-user:session): session opened for user devops(uid=1001) by (uid=0)
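Because multiple -e options are combined with a logical OR, the same search can also be written as a single extended regular expression. The following sketch compares the two forms on shortened copies of log lines from this section; the shortened lines are an assumption for illustration and not real log output.

```shell
# Shortened sample lines, piped in place of a real log file.
log=$(printf '%s\n' \
    'passwd[6639]: pam_unix(passwd:chauthtok): password changed for root' \
    'sshd[15556]: Accepted publickey for devops from 10.30.0.167 port 56472 ssh2' \
    'kernel: SCSI subsystem initialized')

# Two expressions, each supplied with its own -e option.
with_e=$(printf '%s\n' "$log" | grep -e 'pam_unix' -e 'Accepted publickey')
# One extended regular expression that uses | as "or".
with_ere=$(printf '%s\n' "$log" | grep -E 'pam_unix|Accepted publickey')

printf 'with -e:\n%s\n' "$with_e"
printf 'with -E:\n%s\n' "$with_ere"
```

Both searches print the passwd and sshd lines and omit the kernel line.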

To search for text in a file that you opened with the vim or less commands, first enter the slash character (/) and then type the pattern to find. Press Enter to start the search. Press n to find the next match, or N (Shift+n) to find the previous match.

[root@host ~]# vim /var/log/boot.log
...output omitted...
[^[[0;32m  OK  ^[[0m] Finished ^[[0;1;39mdracut pre-pivot and cleanup hook^[[0m.^M
         Starting ^[[0;1;39mCleaning Up and Shutting Down Daemons^[[0m...^M
[^[[0;32m  OK  ^[[0m] Stopped target ^[[0;1;39mRemote Encrypted Volumes^[[0m.^M
[^[[0;32m  OK  ^[[0m] Stopped target ^[[0;1;39mTimer Units^[[0m.^M
[^[[0;32m  OK  ^[[0m] Closed ^[[0;1;39mD-Bus System Message Bus Socket^[[0m.^M
/Daemons
[root@host ~]# less /var/log/messages
...output omitted...
Mar  4 03:31:19 localhost kernel: pci 0000:00:02.0: vgaarb: setting as boot VGA device
Mar  4 03:31:19 localhost kernel: pci 0000:00:02.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
Mar  4 03:31:19 localhost kernel: pci 0000:00:02.0: vgaarb: bridge control possible
Mar  4 03:31:19 localhost kernel: vgaarb: loaded
Mar  4 03:31:19 localhost kernel: SCSI subsystem initialized
Mar  4 03:31:19 localhost kernel: ACPI: bus type USB registered
Mar  4 03:31:19 localhost kernel: usbcore: registered new interface driver usbfs
Mar  4 03:31:19 localhost kernel: usbcore: registered new interface driver hub
Mar  4 03:31:19 localhost kernel: usbcore: registered new device driver usb
/device

 

References

regex(7) and grep(1) man pages

Revision: rh134-9.0-fa57cbe