Manipulating Text and Binary Files

Objectives

  • Differentiate between text and binary files.

Text and Binary Files Overview

Linux supports different types of files, such as directories, regular files, pipe files, and socket files. This course focuses on directories and regular files, which are the file types that regular users interact with most often. Regular files are text files or binary files. Although all files are stored in the system as a series of bytes, text and binary files are interpreted differently.

Text files follow standard character encoding schemes such as Unicode or ASCII to produce human-readable character sequences. You can view and edit text files by using widely available text editing tools. Binary files contain structured content that is not limited to readable character sets. Binary files include compiled executable programs and other file formats that can be interpreted only by programs that support those formats.
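
As a minimal sketch of this difference, the following commands show the same three bytes in two ways: first as hexadecimal numbers, and then as characters. The od utility and its options are not covered in this section and are used here only for illustration.

```shell
# Write the text "Hi" followed by a newline, then dump the raw bytes.
# od -An hides the offset column; -tx1 prints each byte in hexadecimal.
# The values are 48 (H), 69 (i), and 0a (newline); spacing may vary.
text_bytes=$(printf 'Hi\n' | od -An -tx1)
echo "$text_bytes"

# The same bytes, interpreted as characters (-c) instead of numbers.
printf 'Hi\n' | od -An -c
```

A text tool shows these bytes as the word Hi because every byte maps to a printable character or a line break. In a binary file, many bytes have no such mapping.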

Listing Files

To view files in Linux, you can use the Files application, which is the default file manager in GNOME. The Files application is typically available as a shortcut in the Activities Overview, which shows your favorite and running applications. To open the Files application, click Activities > Files.

The following image shows the files in the Files application. The first time you log in to GNOME Display Manager (GDM), GNOME creates some directories to organize your files, such as the Downloads and Pictures directories.

Figure 3.1: GNOME file manager

On the command line, you can use the ls command to view your files. The ls command is similar to the dir command in the Microsoft Windows operating system. Both commands show the contents of a directory.

By default, the ls command shows only file and directory names, and it accepts options to display the contents in different formats. The -l option shows the directory entries as a long listing that includes permissions, ownership, time stamps, and size.

Note

To view command information and available options, use the --help option with the command.

[user@host ~]$ ls
 Desktop   Documents   Downloads   Essay.docx   Inventory.xlsx   Music  'My shopping list'   Pictures   Public   Templates   Videos
[user@host ~]$ ls -l
total 5948
drwxr-xr-x. 2 user user       6 Oct  9 22:03  Desktop
drwxr-xr-x. 2 user user       6 Oct  9 22:03  Documents
drwxr-xr-x. 2 user user       6 Oct  9 22:03  Downloads
-rw-rw-r--. 1 user user    4662 Oct 10 14:52  Essay.docx
-rw-rw-r--. 1 user user 6077382 Oct 10 15:04  Inventory.xlsx
drwxr-xr-x. 2 user user       6 Oct  9 22:03  Music
-rw-rw-r--. 1 user user      19 Oct 10 14:51 'My shopping list'
drwxr-xr-x. 2 user user       6 Oct  9 22:03  Pictures
drwxr-xr-x. 2 user user       6 Oct  9 22:03  Public
drwxr-xr-x. 2 user user       6 Oct  9 22:03  Templates
drwxr-xr-x. 2 user user       6 Oct  9 22:03  Videos

The first character of the first column represents the file type. A hyphen (-) displays if the file is a regular file, and a lowercase d displays for a directory.
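
The following sketch isolates that first character for one regular file and one directory. It assumes a temporary working directory, and the notes.txt and projects names are hypothetical. The -d option of the ls command and the cut command are used for illustration and are not covered in this section.

```shell
# Work in a temporary directory so that no real files are touched.
workdir=$(mktemp -d)
touch "$workdir/notes.txt"      # hypothetical regular file
mkdir "$workdir/projects"       # hypothetical directory

# -d lists the entry itself rather than the contents of a directory.
# cut -c1 keeps only the first character of the long listing.
file_type=$(ls -ld "$workdir/notes.txt" | cut -c1)
dir_type=$(ls -ld "$workdir/projects" | cut -c1)

echo "notes.txt: $file_type"
echo "projects:  $dir_type"

rm -r "$workdir"
```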

You can use the long listing option in combination with other options. For example, the -t option sorts files by the time they were modified, with the most recently modified files listed first. This option is useful for finding the files that you most recently edited. The -h option shows the file size information in a human-readable format, rather than in bytes.

[user@host ~]$ ls -lth
total 5.9M
-rw-rw-r--. 1 user user 5.8M Oct 10 15:04  Inventory.xlsx
-rw-rw-r--. 1 user user 4.6K Oct 10 14:52  Essay.docx
-rw-rw-r--. 1 user user   19 Oct 10 14:51 'My shopping list'
drwxr-xr-x. 2 user user    6 Oct  9 22:03  Desktop
drwxr-xr-x. 2 user user    6 Oct  9 22:03  Documents
drwxr-xr-x. 2 user user    6 Oct  9 22:03  Downloads
drwxr-xr-x. 2 user user    6 Oct  9 22:03  Music
drwxr-xr-x. 2 user user    6 Oct  9 22:03  Pictures
drwxr-xr-x. 2 user user    6 Oct  9 22:03  Public
drwxr-xr-x. 2 user user    6 Oct  9 22:03  Templates
drwxr-xr-x. 2 user user    6 Oct  9 22:03  Videos
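
To try the time-based sort without waiting for real edits, you can set modification times explicitly. The following sketch assumes a temporary directory and two hypothetical file names. The touch -t command and the ls -r option, which reverses the sort order, are additions for illustration.

```shell
workdir=$(mktemp -d)

# touch -t sets an explicit modification time (format: YYYYMMDDhhmm).
touch -t 202301010900 "$workdir/old_report"
touch -t 202406150900 "$workdir/new_report"

# -t sorts by modification time, newest first; adding -r reverses the order.
newest_first=$(ls -t "$workdir")
oldest_first=$(ls -tr "$workdir")

echo "Newest first:"
echo "$newest_first"
echo "Oldest first:"
echo "$oldest_first"

rm -r "$workdir"
```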

File Naming Conventions

In Linux, one best practice is to avoid using spaces in file names because the shell uses whitespace characters to separate the arguments of a command. Linux users often use the terminal to manipulate files, and manipulating files that contain a space in the name can cause problems.

For example, you can list the Inventory.xlsx file, but listing the My shopping list file with whitespace characters in the name returns an error.

[user@host ~]$ ls -l Inventory.xlsx
-rw-rw-r--. 1 user user 6077382 Oct 10 15:04 Inventory.xlsx
[user@host ~]$ ls -l My shopping list
ls: cannot access 'My': No such file or directory
ls: cannot access 'shopping': No such file or directory
ls: cannot access 'list': No such file or directory

The shell interprets My, shopping, and list as different arguments for the ls command, but those files do not exist. The following examples show how to correctly reference the My shopping list file by using different methods.

[user@host ~]$ ls -l My\ shopping\ list 1
-rw-rw-r--. 1 user user 19 Oct 10 14:51 'My shopping list'
[user@host ~]$ ls -l MyTab 2
[user@host ~]$ ls -l My\ shopping\ list
-rw-rw-r--. 1 user user 19 Oct 10 14:51 'My shopping list'
[user@host ~]$ ls -l 'My shopping list' 3
-rw-rw-r--. 1 user user 19 Oct 10 14:51 'My shopping list'

1

Escape the space character. You can tell the shell to ignore the special meaning of a space as a word separator and to treat it as an ordinary character that is part of the file name. This is called escaping a special character. To escape a space, place a backslash character (\) before the space.

2

Use tab completion. You can start typing the file name and press Tab to let the shell complete the file name for you. The shell automatically inserts the required escape characters. This method is less helpful when several files have similar names, because the shell completes only the part that the names share, and you might still need to escape the space manually.

3

Enclose the file name in quotation marks. You can refer to a name that contains spaces by enclosing the full name in single or double quotation marks.
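
The escaping and quoting methods can be compared side by side in a short script. This sketch assumes a temporary directory that contains a file named My shopping list. The last case, a file name stored in a variable, is an addition that goes beyond the examples above.

```shell
workdir=$(mktemp -d)
printf 'eggs\nmilk\n' > "$workdir/My shopping list"
cd "$workdir"

# Three equivalent ways to pass the name as a single argument.
escaped=$(ls My\ shopping\ list)
single=$(ls 'My shopping list')
double=$(ls "My shopping list")

# In scripts, double quotation marks also protect the value of a variable.
name='My shopping list'
from_variable=$(ls "$name")

echo "$escaped"
echo "$single"
echo "$double"
echo "$from_variable"

cd - > /dev/null
rm -r "$workdir"
```

All four commands refer to the same file. Without the quotation marks around $name, the shell would split the value into three separate arguments again.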

Hidden Files

Files and directories with names that start with a period character (.) are known as hidden files. Programs and applications store user-specific configuration in hidden files and directories.

However, the hidden file naming does not protect a file from being located, viewed, or modified. Instead, hidden naming prevents files from appearing in the default view of commands and graphical tools, so that you can focus on the regular files that you created. Commands and programs ignore hidden files unless you specifically include them. This behavior can help you to avoid accidentally modifying or deleting your personal configuration files.

When working with files in the Files application, you can view hidden files by clicking the application's menu at the upper right, and then selecting the Show Hidden Files checkbox.

Figure 3.2: Show hidden files

On the command line, you can include hidden files in the listing by using the ls command -a or --all option.

[user@host ~]$ ls -la
total 6012
drwx------. 18 user user    4096 Oct 10 15:04  .
drwxr-xr-x.  5 root root      53 Dec 14  2022  ..
drwxr-xr-x.  4 user user      27 Dec 14  2022  .ansible
-rw-------.  1 user user     151 Oct 10 14:31  .bash_history
-rw-r--r--.  1 user user      18 Aug  8  2022  .bash_logout
-rw-r--r--.  1 user user     141 Aug  8  2022  .bash_profile
...output omitted...
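
The effect of the -a option is easier to see in a small directory. The following sketch assumes a temporary directory with one hidden file and one regular file; both file names are hypothetical.

```shell
workdir=$(mktemp -d)
touch "$workdir/.settings"     # hypothetical hidden file
touch "$workdir/report.txt"    # hypothetical regular file

# The default listing skips names that start with a period.
default_listing=$(ls "$workdir")

# -a also shows hidden entries, including . (this directory) and .. (its parent).
all_listing=$(ls -a "$workdir")

echo "Default listing:"
echo "$default_listing"
echo "Listing with -a:"
echo "$all_listing"

rm -r "$workdir"
```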

Identifying Files

In some operating systems, it is mandatory for files to have an extension, such as the .txt extension for a plain text file or the .exe extension for an executable program. However, Linux does not rely on file extensions to recognize and interact with files. Instead, programs read the file's header.

A file's header is a short sequence of bytes at the start of the file that acts as a signature. This signature is known as the magic number of the file. The file's first bytes are compared to a database of magic patterns to help determine the file type. You can use the file command to perform the necessary magic tests and to detect the types and formats of files.

For example, using the file command on a binary program file, such as the cat command file, returns ELF 64-bit LSB shared object as the description. An Executable and Linkable Format (ELF) file is a compiled binary program. The rest of the description shows technical details about how the file was compiled.

[user@host ~]$ file /bin/cat
/bin/cat: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=1e8fb43d197eddeaa361995a88dedb415f1ebead, stripped
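
To make the idea of a magic number concrete, the following sketch writes a small sample file that begins with the 4-byte ELF signature, and then prints those bytes in hexadecimal. The sample.bin file and the magic_hex helper are hypothetical names for illustration. The file command itself consults a much larger pattern database and performs more tests than this sketch does.

```shell
# magic_hex: print the first four bytes of a file as hexadecimal digits.
magic_hex() {
    head -c 4 "$1" | od -An -tx1 | tr -d ' \n'
}

workdir=$(mktemp -d)

# ELF files start with the byte 0x7f followed by the letters E, L, and F.
# \177 is the octal escape for 0x7f.
printf '\177ELF rest of file' > "$workdir/sample.bin"

signature=$(magic_hex "$workdir/sample.bin")
echo "$signature"

rm -r "$workdir"
```

On a Linux system, running magic_hex /bin/cat is expected to print the same value, because the compiled program begins with the same ELF signature.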

Text files do not have magic numbers, so they do not match the entries in the magic database. In that case, the file command performs a series of tests to see whether the file conforms to a character encoding standard. If the file conforms to a character encoding such as ASCII or UTF-8, then the file command prints the character set. Files that cannot be identified as either a known binary format or as valid text are shown as data files.

The following examples show the output of the file command when it is used with different types of files.

[user@host ~]$ file Essay.docx
Essay.docx: Microsoft Word 2007+
[user@host ~]$ file Downloads/
Downloads/: directory
[user@host ~]$ file 'My shopping list'
My shopping list: ASCII text
[user@host ~]$ file .bashrc
.bashrc: ASCII text

Text Files

Most files that you create by using command-line tools and editors are text files. The system's built-in dictionary is a text file, and so are configuration files and scripts.

Viewing Text Files

You can view the contents of text files by opening the file with a text editor. You can also view the contents of a text file on the command line by using command-line text utilities, such as the cat command. The following example shows the contents of the dictionary text file.

[user@host ~]$ cat dictionary
1080
10-point
10th
...output omitted...
zyzzyvas
ZZ
Zz
zZt
ZZZ

To view the contents of multiple files, add the file names to the cat command as arguments.

[user@host ~]$ cat file1
Hello World!
[user@host ~]$ cat file2
Introduction to file manipulation in Linux.
[user@host ~]$ cat file1 file2
Hello World!
Introduction to file manipulation in Linux.

The cat command shows the full content of the file, even for files that have thousands of lines of text. To view these large files, use the less command. Unlike the cat command, the less command paginates the output so that it adapts to the size of your terminal window.

Pagination stops at the last line that fits in the window, and you must provide input to continue to the next lines. The less command also provides keys for forward and backward navigation. For example, you can jump directly to a line by its line number, jump to the beginning or the end of the file, or search for a word.

[user@host ~]$ less dictionary
1080
10-point
10th
...output omitted...
5th
6-point
6th
7-point
dictionary

Use the UpArrow key and the DownArrow key to scroll up and down. Press h for the help page and q to exit the command.

You can use the head command to view the first few lines of a file. The head command displays the first 10 lines of the file by default, but you can use the -n option to specify a different number of lines. The tail command displays the last lines of a specified file, the same way the head command returns the first lines.

[user@host ~]$ head -n 2 'My shopping list'
eggs
milk
[user@host ~]$ tail -n 5 dictionary
zyzzyvas
ZZ
Zz
zZt
ZZZ
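
The head and tail commands can also be combined to extract a single line from the middle of a file. The following sketch assumes a hypothetical 10-line file named sample.txt in a temporary directory. The pipe character (|), which passes the output of one command to the next, is used here for illustration.

```shell
workdir=$(mktemp -d)

# Create a hypothetical 10-line file that contains line1 through line10.
i=1
while [ "$i" -le 10 ]; do
    echo "line$i"
    i=$((i + 1))
done > "$workdir/sample.txt"

first_three=$(head -n 3 "$workdir/sample.txt")
last_two=$(tail -n 2 "$workdir/sample.txt")

# Keep the first 5 lines, and then keep the last 1 of those: line 5.
fifth_line=$(head -n 5 "$workdir/sample.txt" | tail -n 1)

echo "$first_three"
echo "$last_two"
echo "$fifth_line"

rm -r "$workdir"
```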

In some scenarios, you might want to know how many lines or words are in a text file. You can use the wc command to view the number of lines, words, and bytes in a file. Use the -l, -w, or -c option to display only the number of lines, words, or bytes, respectively. In an ASCII text file, each character occupies one byte, so the byte count equals the character count.

[user@host ~]$ wc dictionary
 479826  479826 4953598 dictionary
[user@host ~]$ wc -l dictionary
 479826 dictionary
[user@host ~]$ wc -w dictionary
 479826 dictionary
[user@host ~]$ wc -c dictionary
 4953598 dictionary
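
In scripts, you often need only the number, without the file name. The following sketch assumes a hypothetical three-line file named list.txt and stores each count in a variable. Note that the -c option reports bytes, which equals the number of characters only for single-byte encodings such as ASCII.

```shell
workdir=$(mktemp -d)
printf 'eggs\nmilk\nrye bread\n' > "$workdir/list.txt"

# Reading from standard input (<) keeps the file name out of the output.
# tr -d ' ' strips the padding that some wc implementations add.
lines=$(wc -l < "$workdir/list.txt" | tr -d ' ')
words=$(wc -w < "$workdir/list.txt" | tr -d ' ')
bytes=$(wc -c < "$workdir/list.txt" | tr -d ' ')

echo "lines=$lines words=$words bytes=$bytes"

rm -r "$workdir"
```

The file has three lines and four words, because rye bread counts as two words. The byte count of 20 includes the three newline characters.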

Binary Files

Binary files are not human readable and are not edited by using text editor tools. Binary files are either compiled files or structured data files. Structured data files include graphics, sounds, databases, compressed archive data, and other file types.

For example, although a Microsoft Word document file (.docx) contains readable text, the document is a binary file. Inside the file, the document structure includes layout, formatting, and other control information that can be read and edited only by Microsoft Word-compatible programs. The binary format is the reason that you cannot open Microsoft Word files in a regular text editor.

A compiled file is an executable file that was converted from the source code written in a programming language into machine code. Machine code is a binary format that can be efficiently loaded and processed by the computer's central processing unit (CPU). Compiled files are typically smaller than their corresponding source code files.

You can use the file command to identify binary files. For example, the cat and ls commands are binary files. The following example shows the file information of the cat binary file. The ELF format is the standard binary format for compiled files on Linux.

[user@host ~]$ file /bin/cat
/bin/cat: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=c03f9125d32e5af4ca1852b7316746ee0fafc4ec, for GNU/Linux 3.2.0, stripped

Viewing Binary Files

You cannot directly view or edit binary files by using text tools, such as the cat or less commands. Attempting to use text tools on a binary file can display the file's binary content, which is rarely useful. Binary files are read and modified only by their supported programs.

If you open a binary file with a text viewer, you might cause the terminal window to stop functioning. In this situation, a terminal might stop displaying the characters that you type. The stty command can view and modify terminal settings, and can resolve this problem. You can use the stty command with the sane argument to reset the terminal's attributes to the default working values.

To restore terminal functionality, first exit the text tool that caused the terminal malfunction. If the tool uses pagination, press q to quit the tool; otherwise, press Ctrl+C to interrupt and forcibly quit the tool. Next, press Enter to request a new command prompt. Ignore any error output that displays. If the terminal prints another command prompt, then your terminal is functional.

If the terminal remains non-functional, then type the stty sane command. If the terminal still does not print another command prompt, then close the malfunctioning terminal tab or window and start another terminal instead.

[user@host ~]$ cat /bin/cat
ELF�'@�@8@����@@@h�h������������HzHz��PzPzPzp�|||������DDS�td���P�td�j�j�j��Q�td
R�tdPzPzPz��/lib64/ld-linux-x86-64.so.2GNU�GNU���GNU��=~���a�Z���A_��z�1u����#9�
1
stty sane 2
[user@host ~]$ 3

1

If the terminal stops working, then you can no longer see what you type. Try q or Ctrl+C.

2

Press Enter, and then type stty sane. The characters that you type might not echo in the terminal.

3

Your command prompt should return. If not, then close the window and open another one.

References

file(1), magic(5), cat(1), less(1), head(1), tail(1), and wc(1) man pages

For more information about magic numbers, see the IBM support page What is a magic number? at https://www.ibm.com/support/pages/what-magic-number

Revision: rh104-9.1-3d1f2bc