After completing this section, students should be able to:
Create regular expressions that match desired data.
Apply regular expressions to text files using the grep command.
Search files and data from piped commands using grep.
Regular expressions provide a pattern matching mechanism that facilitates finding specific content. The vim, grep, and less commands can all use regular expressions. Programming languages such as Perl, Python, and C can all use regular expressions when using pattern matching criteria.
Regular expressions are a language of their own, which means they have their own syntax and rules. This section looks at the syntax used when creating regular expressions, as well as showing some regular expression examples.
Describing a Simple Regular Expression
The simplest regular expression is an exact match. An exact match is when the characters in the regular expression match the type and order in the data that is being searched.
Suppose a user is looking through the following file looking for all occurrences of the pattern cat:
cat
dog
concatenate
dogma
category
educated
boondoggle
vindication
chilidog
cat is an exact match of a c, followed by an a, followed by a t with no other characters in between.
Using cat as the regular expression to search the previous file returns the following matches:
cat
concatenate
category
educated
vindication
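The exact-match search can be tried directly with grep. The following is a minimal sketch that assumes the sample words are stored one per line; the temporary file and the variable names are illustrative only.

```shell
# Build the sample file, one word per line (temporary path chosen by mktemp).
words=$(mktemp)
printf '%s\n' cat dog concatenate dogma category educated \
    boondoggle vindication chilidog > "$words"

# Print every line that contains a c, then an a, then a t, with nothing in between.
exact_matches=$(grep 'cat' "$words")
printf '%s\n' "$exact_matches"

# Remove the temporary file.
rm -f "$words"
```

Because grep prints whole lines, words such as educated appear in the output even though cat occurs in the middle of the word.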
Matching the Start and End of a Line
The previous section used an exact match regular expression on a file. Note that the regular expression would match the search string no matter where on the line it occurred: the beginning, end, or middle of the word or line. Use a line anchor to control the location of where the regular expression looks for a match.
To search at the beginning of a line, use the caret character (^). To search at the end of a line, use the dollar sign ($).
Using the same file as above, the ^cat regular expression would match two words: cat and category.
The $cat regular expression would not find any matching words, because the end-of-line anchor belongs at the end of the expression, as in cat$.
cat
dog
concatenate
dogma
category
educated
boondoggle
vindication
chilidog
To locate lines in the file ending with dog, use that exact expression and an end of line anchor to create the regular expression dog$.
Applying dog$ to the file would find two matches:
dog
chilidog
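The two line anchors can be checked against the same word list. This sketch again assumes one word per line in a temporary file; the variable names are hypothetical.

```shell
# Recreate the sample file, one word per line.
words=$(mktemp)
printf '%s\n' cat dog concatenate dogma category educated \
    boondoggle vindication chilidog > "$words"

# The caret anchors the match to the beginning of the line.
starts_with_cat=$(grep '^cat' "$words")

# The dollar sign anchors the match to the end of the line.
ends_with_dog=$(grep 'dog$' "$words")

printf 'Lines starting with cat:\n%s\n' "$starts_with_cat"
printf 'Lines ending with dog:\n%s\n' "$ends_with_dog"

rm -f "$words"
```

Note that concatenate and dogma are no longer reported, because the anchored patterns only match at the edges of a line.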
To locate the only word on a line, use both the beginning and end-of-line anchors.
For example, to locate the word cat when it is the only word on a line, use ^cat$.
cat dog rabbit
cat
horse cat cow
cat pig
Adding Wildcards and Multipliers to Regular Expressions
Regular expressions use a period or dot (.) to match any single character with the exception of the newline character.
A regular expression of c.t searches for a string that contains a c followed by any single character followed by a t.
Example matches include cat, concatenate, vindication, c5t, and c$t.
With an unrestricted wildcard, you cannot predict which character will match the wildcard.
To match specific characters, replace the unrestricted wildcard with acceptable characters.
Changing the regular expression to c[aou]t matches patterns that start with a c, followed by either an a, o, or u, followed by a t.
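The difference between the unrestricted wildcard and a restricted set of characters can be seen in a short experiment. The sample strings below are made up for illustration and are not part of the course files.

```shell
# Sample strings, one per line; 'c$t' is quoted so the shell leaves the $ alone.
samples=$(mktemp)
printf '%s\n' cat cot cut c5t 'c$t' cet ct > "$samples"

# The dot accepts any single character, so only "ct" fails to match.
dot_matches=$(grep 'c.t' "$samples")

# The bracket expression accepts only a, o, or u in the middle position.
set_matches=$(grep 'c[aou]t' "$samples")

printf 'c.t matches:\n%s\n' "$dot_matches"
printf 'c[aou]t matches:\n%s\n' "$set_matches"

rm -f "$samples"
```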
Multipliers are a mechanism used often with wildcards.
Multipliers apply to the previous character in the regular expression.
One of the more common multipliers used is the asterisk, or star character (*).
When used in a regular expression, this multiplier means match zero or more of the previous expression.
You can use * with expressions, not just characters.
For example, c[aou]*t.
A regular expression of c.*t matches cat, coat, culvert, and even ct (zero characters between the c and the t).
It matches any data starting with a c, followed by zero or more characters, and ending with a t.
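A quick way to see the effect of the star multiplier is to compare it on a wildcard and on a bracket expression. The word list in this sketch is an assumption chosen for the demonstration.

```shell
# Sample words, one per line.
samples=$(mktemp)
printf '%s\n' ct cat coat culvert caaat dog > "$samples"

# Zero or more of any character between the c and the t.
any_between=$(grep 'c.*t' "$samples")

# Zero or more of the characters a, o, or u between the c and the t.
vowels_between=$(grep 'c[aou]*t' "$samples")

printf 'c.*t matches:\n%s\n' "$any_between"
printf 'c[aou]*t matches:\n%s\n' "$vowels_between"

rm -f "$samples"
```

The word culvert matches c.*t but not c[aou]*t, because the letters l, v, e, and r are not in the bracket expression.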
Another type of multiplier specifies exactly how many times the previous character must occur in the pattern.
An example of using an explicit multiplier would be 'c.\{2\}t'.
This regular expression matches any word beginning with a c, followed by exactly two characters of any kind, and ending with a t. 'c.\{2\}t' would match two words in the example below, coat and cart:
cat
coat
convert
cart
covert
cypher
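The explicit multiplier can be tested as follows. The sketch assumes the six example words are stored one per line, and it also shows the same pattern written for grep -E (extended regular expressions), where the braces are not escaped; the -E form is an addition here and is not used elsewhere in this section.

```shell
# Sample words, one per line.
samples=$(mktemp)
printf '%s\n' cat coat convert cart covert cypher > "$samples"

# Basic syntax: the braces of the multiplier are escaped with backslashes.
two_between=$(grep 'c.\{2\}t' "$samples")

# Extended syntax (grep -E): the same multiplier without backslashes.
two_between_ere=$(grep -E 'c.{2}t' "$samples")

printf '%s\n' "$two_between"

rm -f "$samples"
```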
It is recommended practice to use single quotes to encapsulate the regular expression because regular expressions often contain shell metacharacters (such as $, *, and {}). This ensures that the characters are interpreted by the command and not by the shell.
This course has introduced two distinct metacharacter text parsing systems: shell pattern matching (also known as file globbing or file-name expansion), and regular expressions. Because both systems use similar metacharacters, such as the asterisk (*), but have differences in metacharacter interpretation and rules, the two systems can be confusing until each is sufficiently mastered.
Pattern matching is a command-line parsing technique designed for specifying many file names easily, and is primarily supported only for representing file-name patterns on the command line. Regular expressions are designed to represent any form or pattern in text strings, no matter how complex. Regular expressions are internally supported by numerous text processing commands, such as grep, sed, awk, python, perl, and many applications, with some minimal command-dependent variations in interpretation rules.
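The different meaning of the asterisk in the two systems can be observed side by side. This is a sketch under stated assumptions: the file names are invented, and a temporary directory stands in for a real working directory.

```shell
# Create a scratch directory with a few empty files.
workdir=$(mktemp -d)
touch "$workdir/cat" "$workdir/ct" "$workdir/coat" "$workdir/caat" "$workdir/dog"

# Shell pattern matching: * stands for any string, so ca* expands to the
# file names that begin with "ca".
glob_matches=$(cd "$workdir" && printf '%s\n' ca*)

# Regular expression: * repeats the previous character, so ca*t means a c,
# zero or more a characters, and a t.
regex_matches=$(cd "$workdir" && printf '%s\n' * | grep '^ca*t$')

printf 'Shell pattern ca* expands to:\n%s\n' "$glob_matches"
printf 'Regular expression ^ca*t$ matches:\n%s\n' "$regex_matches"

rm -rf "$workdir"
```

The file ct is matched by the regular expression but not by the shell pattern, which illustrates why the regular expression is quoted to keep the shell from expanding it.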
Table 1.1. Regular Expressions
| Syntax | Description |
|---|---|
| . | The period (.) matches any single character. |
| ? | The preceding item is optional and will be matched at most once. |
| * | The preceding item will be matched zero or more times. |
| + | The preceding item will be matched one or more times. |
| {n} | The preceding item is matched exactly n times. |
| {n,} | The preceding item is matched n or more times. |
| {,m} | The preceding item is matched at most m times. |
| {n,m} | The preceding item is matched at least n times, but not more than m times. |
| [:alnum:] | Alphanumeric characters: '[:alpha:]' and '[:digit:]'; in the 'C' locale and ASCII character encoding, this is the same as '[0-9A-Za-z]'. |
| [:alpha:] | Alphabetic characters: '[:lower:]' and '[:upper:]'; in the 'C' locale and ASCII character encoding, this is the same as '[A-Za-z]'. |
| [:blank:] | Blank characters: space and tab. |
| [:cntrl:] | Control characters. In ASCII, these characters have octal codes 000 through 037, and 177 (DEL). In other character sets, these are the equivalent characters, if any. |
| [:digit:] | Digits: 0 1 2 3 4 5 6 7 8 9. |
| [:graph:] | Graphical characters: '[:alnum:]' and '[:punct:]'. |
| [:lower:] | Lower-case letters; in the 'C' locale and ASCII character encoding, this is a b c d e f g h i j k l m n o p q r s t u v w x y z. |
| [:print:] | Printable characters: '[:alnum:]', '[:punct:]', and space. |
| [:punct:] | Punctuation characters; in the 'C' locale and ASCII character encoding, this is ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { \| } ~. In other character sets, these are the equivalent characters, if any. |
| [:space:] | Space characters: in the 'C' locale, this is tab, newline, vertical tab, form feed, carriage return, and space. |
| [:upper:] | Upper-case letters: in the 'C' locale and ASCII character encoding, this is A B C D E F G H I J K L M N O P Q R S T U V W X Y Z. |
| [:xdigit:] | Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f. |
| \b | Match the empty string at the edge of a word. |
| \B | Match the empty string provided it is not at the edge of a word. |
| \< | Match the empty string at the beginning of word. |
| \> | Match the empty string at the end of word. |
| \w | Match word constituent. Synonym for '[_[:alnum:]]'. |
| \W | Match non-word constituent. Synonym for '[^_[:alnum:]]'. |
| \s | Match white space. Synonym for '[[:space:]]'. |
| \S | Match non-whitespace. Synonym for '[^[:space:]]'. |
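A few entries from the table can be exercised with grep. The sketch below makes several assumptions: the sample lines are invented, the ?, +, and {n,m} multipliers are written unescaped and therefore use grep -E, character classes are placed inside a bracket expression (for example [[:digit:]]), and the \< and \> word anchors are used as provided by GNU grep.

```shell
# Sample lines, one per line.
lines=$(mktemp)
printf '%s\n' color colour colouur 'room 42' room cat concat > "$lines"

# ? makes the preceding u optional.
optional_u=$(grep -E '^colou?r$' "$lines")

# {1,2} requires the preceding u one or two times.
one_or_two_u=$(grep -E '^colou{1,2}r$' "$lines")

# A character class inside a bracket expression, repeated one or more times.
with_digits=$(grep -E '[[:digit:]]+' "$lines")

# \< and \> match the empty string at the beginning and end of a word.
whole_word_cat=$(grep '\<cat\>' "$lines")

printf '%s\n' "$optional_u" "$one_or_two_u" "$with_digits" "$whole_word_cat"

rm -f "$lines"
```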
The grep command, provided as part of the distribution, uses regular expressions to isolate matching data.
Isolating data using the grep command
The grep command takes a regular expression and a file on which the regular expression should be matched.
[user@host ~]$ grep '^computer' /usr/share/dict/words
computer
computerese
computerise
computerite
computerizable
computerization
computerize
computerized
computerizes
computerizing
computerlike
computernik
computers
It is recommended practice to use single quotes to encapsulate the regular expression because regular expressions often contain shell metacharacters (such as $, *, and {}). This ensures that the characters are interpreted by grep and not by the shell.
The grep command can be used in conjunction with other commands using a pipe operator (|). For example:
[root@host ~]# ps aux | grep chrony
chrony     662  0.0  0.1  29440  2468 ?        S    10:56   0:00 /usr/sbin/chronyd
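The same technique works with the output of any command. In this small sketch, printf stands in for a command that produces several lines; the service names are made-up sample data, not real process output.

```shell
# printf plays the role of a command whose output is filtered by grep.
piped_matches=$(printf '%s\n' 'sshd: started' 'chronyd: started' 'crond: started' \
    | grep 'chrony')

printf '%s\n' "$piped_matches"
```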
grep Options
The grep command has many useful options for adjusting how it uses the provided regular expression with data.
Table 1.2. Table of Common grep Options
| Option | Function |
|---|---|
| -i | Use the regular expression provided but do not enforce case sensitivity (run case-insensitive). |
| -v | Only display lines that do not contain matches to the regular expression. |
| -r | Apply the search for data matching the regular expression recursively to a group of files or directories. |
| -A | Display NUMBER of lines after the regular expression match. |
| -B | Display NUMBER of lines before the regular expression match. |
| -e | With multiple -e options used, multiple regular expressions can be supplied and will be used with a logical OR. |
There are many other options to grep. Use the man page to research them.
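The options in the table can be compared on one small file. This is a minimal sketch with invented sample data in a temporary directory; the file name sample.txt and the variable names are hypothetical.

```shell
# Create a scratch directory containing one sample file.
workdir=$(mktemp -d)
printf '%s\n' Alpha beta 'ALPHA two' gamma delta > "$workdir/sample.txt"

# -i: ignore case when matching.
ignore_case=$(grep -i 'alpha' "$workdir/sample.txt")

# -v: show only the lines that do not match (combined here with -i).
inverted=$(grep -v -i 'alpha' "$workdir/sample.txt")

# -A 1 and -B 1: show one line after, or one line before, each match.
after_context=$(grep -A 1 'beta' "$workdir/sample.txt")
before_context=$(grep -B 1 'beta' "$workdir/sample.txt")

# -e given twice: a line is shown if it matches either expression.
either=$(grep -e 'beta' -e 'delta' "$workdir/sample.txt")

# -r: search every file below a directory; matches are prefixed with the file name.
recursive=$(grep -r 'gamma' "$workdir")

printf '%s\n' "$ignore_case" "$inverted" "$after_context" "$before_context" \
    "$either" "$recursive"

rm -rf "$workdir"
```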
grep Examples
The next examples use varied configuration files and log files.
Regular expressions are case-sensitive by default.
Use the -i option with grep to run a case-insensitive search.
The following example searches for the pattern serverroot.
[user@host ~]$ cat /etc/httpd/conf/httpd.conf
...output omitted...
ServerRoot "/etc/httpd"
#
# Listen: Allows you to bind Apache to specific IP addresses and/or
# ports, instead of the default. See also the <VirtualHost>
# directive.
#
# Change this to Listen on specific IP addresses as shown below to
# prevent Apache from glomming onto all bound IP addresses.
#
#Listen 12.34.56.78:80
Listen 80
...output omitted...
[user@host ~]$ grep -i serverroot /etc/httpd/conf/httpd.conf
# with "/", the value of ServerRoot is prepended -- so 'log/access_log'
# with ServerRoot set to '/www' will be interpreted by the
# ServerRoot: The top of the directory tree under which the server's
# ServerRoot at a non-local disk, be sure to specify a local disk on the
# same ServerRoot for multiple httpd daemons, you will need to change at
ServerRoot "/etc/httpd"
In cases where you know what you are not looking for, the -v option is very useful.
The -v option only displays lines that do not match the regular expression.
In the following example, all lines, regardless of case, that do not contain the regular expression server are returned.
[user@host ~]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.25.254.254 classroom.example.com classroom
172.25.254.254 content.example.com content
172.25.254.254 materials.example.com materials
172.25.250.254 workstation.lab.example.com workstation
### rht-vm-hosts file listing the entries to be appended to /etc/hosts
172.25.250.10 servera.lab.example.com servera
172.25.250.11 serverb.lab.example.com serverb
172.25.250.254 workstation.lab.example.com workstation
[user@host ~]$ grep -v -i server /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.25.254.254 classroom.example.com classroom
172.25.254.254 content.example.com content
172.25.254.254 materials.example.com materials
172.25.250.254 workstation.lab.example.com workstation
### rht-vm-hosts file listing the entries to be appended to /etc/hosts
172.25.250.254 workstation.lab.example.com workstation
To look at a file without being distracted by comment lines, use the -v option.
In the following example, the regular expression matches all lines that begin with a # or ; (typical characters that indicate the line will be interpreted as a comment).
Those lines are then omitted from the output.
[user@host ~]$ cat /etc/ethertypes
#
# Ethernet frame types
# This file describes some of the various Ethernet
# protocol types that are used on Ethernet networks.
#
# This list could be found on:
# http://www.iana.org/assignments/ethernet-numbers
# http://www.iana.org/assignments/ieee-802-numbers
#
# <name> <hexnumber> <alias1>...<alias35> #Comment
#
IPv4 0800 ip ip4 # Internet IP (IPv4)
X25 0805
ARP 0806 ether-arp #
FR_ARP 0808 # Frame Relay ARP [RFC1701]
...output omitted...
[user@host ~]$ grep -v '^[#;]' /etc/ethertypes
IPv4 0800 ip ip4 # Internet IP (IPv4)
X25 0805
ARP 0806 ether-arp #
FR_ARP 0808 # Frame Relay ARP [RFC1701]
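The comment filter can be tried on a small file without touching system configuration. The file contents below are invented for the sketch; the second command, which also drops empty lines by adding the ^$ expression, is an extension of the example rather than something shown in the course output.

```shell
# A small configuration-style file with comments, a blank line, and settings.
conf=$(mktemp)
printf '%s\n' '# first comment' '; second comment' 'Listen 80' '' \
    'ServerRoot "/etc/httpd" # trailing note' > "$conf"

# Omit lines that begin with # or ; (the blank line is still printed).
no_comments=$(grep -v '^[#;]' "$conf")

# Omit comment lines and empty lines by supplying two expressions with -e.
no_comments_or_blanks=$(grep -v -e '^[#;]' -e '^$' "$conf")

printf '%s\n' "$no_comments_or_blanks"

rm -f "$conf"
```

Because the expression is anchored with the caret, a # that appears later in a line, as in the ServerRoot line, does not cause the line to be omitted.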
The grep command with the -e option allows you to search for more than one regular expression at a time.
The following example, using a combination of cat, grep, and less, locates all occurrences of pam_unix, user root, and Accepted publickey in the /var/log/secure log file.
[root@host ~]# cat /var/log/secure | grep -e 'pam_unix' \
-e 'user root' -e 'Accepted publickey' | less
Mar 19 08:04:46 host sshd[6141]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 19 08:04:50 host sshd[6144]: Disconnected from user root 172.25.250.254 port 41170
Mar 19 08:04:50 host sshd[6141]: pam_unix(sshd:session): session closed for user root
Mar 19 08:04:53 host sshd[6168]: Accepted publickey for student from 172.25.250.254 port 41172 ssh2: RSA SHA256:M8ikhcEDm2tQ95Z0o7ZvufqEixCFCt+wowZLNzNlBT0
To search for text in a file opened using vim or less, use the slash character (/) and type the pattern to find. Press Enter to start the search. Press n to find the next match.
[root@host ~]# vim /var/log/boot.log
...output omitted...
[^[[0;32m OK ^[[0m] Reached target Initrd Default Target.^M
Starting dracut pre-pivot and cleanup hook...^M
[^[[0;32m OK ^[[0m] Started dracut pre-pivot and cleanup hook.^M
Starting Cleaning Up and Shutting Down Daemons...^M
Starting Plymouth switch root service...^M
Starting Setup Virtual Console...^M
[^[[0;32m OK ^[[0m] Stopped target Timers.^M
[^[[0;32m OK ^[[0m] Stopped dracut pre-pivot and cleanup hook.^M
[^[[0;32m OK ^[[0m] Stopped target Initrd Default Target.^M
/Daemons
[root@host ~]# less /var/log/messages
...output omitted...
Feb 26 15:51:07 host NetworkManager[689]: <info> [1551214267.8584] Loaded device plugin: NMTeamFactory (/usr/lib64/NetworkManager/1.14.0-14.el8/libnm-device-plugin-team.so)
Feb 26 15:51:07 host NetworkManager[689]: <info> [1551214267.8599] device (lo): carrier: link connected
Feb 26 15:51:07 host NetworkManager[689]: <info> [1551214267.8600] manager: (lo): new Generic device (/org/freedesktop/NetworkManager/Devices/1)
Feb 26 15:51:07 host NetworkManager[689]: <info> [1551214267.8623] manager: (ens3): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
Feb 26 15:51:07 host NetworkManager[689]: <info> [1551214267.8653] device (ens3): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
/device
regex(7) and grep(1) man pages