Chapter 4.  Archive and Transfer Files

Abstract

Goal

Archive and copy files from one system to another.

Objectives
  • Archive files and directories into a compressed file with tar, and extract the contents of an existing tar archive.

  • Transfer files to or from a remote system securely with SSH.

  • Efficiently and securely synchronize the contents of a local file or directory with a remote server copy.

Sections
  • Manage Compressed tar Archives (and Guided Exercise)

  • Transfer Files Between Systems Securely (and Guided Exercise)

  • Synchronize Files Between Systems Securely (and Guided Exercise)

Lab
  • Archive and Transfer Files

Manage Compressed tar Archives

Objectives

  • Archive files and directories into a compressed file with tar, and extract the contents of an existing tar archive.

Create Archives from the Command Line

An archive is a single regular file or device file that contains multiple files. The device file could be a tape drive, flash drive, or other removable media. When using a regular file, archiving is analogous to the zip utility and similar variations that are popular on most operating systems.

Note

The original, ubiquitous zip compression and file packaging utility uses the PKZIP (Phil Katz's ZIP for MSDOS systems) algorithm, which has evolved significantly, and is supported on RHEL with the zip and unzip commands. Many other compression algorithms have been developed since zip was introduced, and each has its advantages. For creating compressed archives for general use, any tar-supported compression algorithm is acceptable.

Archive files are used to create manageable personal backups, or to simplify transferring a set of files across a network when other methods, such as rsync, are unavailable or might be more complex. Archive files can be created with or without using compression to reduce the archive file size.

On Linux, the tar utility is the common command to create, manage, and extract archives. Use the tar command to gather multiple files into a single archive file. A tar archive is a structured sequence of file metadata and data, in which each member's header records its name, size, and attributes, so that you can list or extract individual files.

Files can be compressed during creation by using one of the supported compression algorithms. The tar command can list the contents of an archive without extracting, and can extract original files directly from both compressed and uncompressed archives.

Options of the tar Utility

One of the following tar command actions is required to perform a tar operation:

  • -c or --create : Create an archive file.

  • -t or --list : List the contents of an archive.

  • -x or --extract : Extract an archive.

The following tar command general options are often included:

  • -v or --verbose : Show the files that are being archived or extracted during the tar operation.

  • -f or --file : Follow this option with the archive file name to create or open.

  • -p or --preserve-permissions : Preserve the original file permissions when extracting.

  • --xattrs : Enable extended attribute support, and store extended file attributes.

  • --selinux : Enable SELinux context support, and store SELinux file contexts.

The following tar command compression options are used to select an algorithm:

  • -a or --auto-compress : Use the archive's suffix to determine the algorithm to use.

  • -z or --gzip : Use the gzip compression algorithm, which results in a .tar.gz suffix.

  • -j or --bzip2 : Use the bzip2 compression algorithm, which results in a .tar.bz2 suffix.

  • -J or --xz : Use the xz compression algorithm, which results in a .tar.xz suffix.
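As a minimal sketch of the -a option, the following script creates an archive whose name ends in .tar.gz and then confirms that the result is gzip data. The work directory and file names are hypothetical, and the -C option (change to a directory before archiving) is used only to keep the example inside a temporary directory.

```shell
# Sketch: let tar choose the compression algorithm from the archive suffix.
# All paths below are hypothetical and live in a temporary directory.
workdir=$(mktemp -d)
mkdir "$workdir/logs"
printf 'sample log line\n' > "$workdir/logs/myapp1.log"

# -a selects gzip here because the archive name ends in .tar.gz.
tar -caf "$workdir/logs.tar.gz" -C "$workdir" logs

# gzip -t exits with status 0 only when the file is valid gzip data.
if gzip -t "$workdir/logs.tar.gz"; then
    detected=gzip
else
    detected=other
fi
echo "compression: $detected"

# List the members to confirm that the archive is readable.
members=$(tar -tf "$workdir/logs.tar.gz")
echo "$members"

rm -rf "$workdir"
```

With -a, changing only the suffix (for example to .tar.bz2 or .tar.xz) changes the algorithm, provided that the matching compression program is installed.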

Note

The tar command still supports the legacy option style that does not use a dash (-) character. You might find this syntax in legacy scripts or documentation, and the behavior is essentially the same. For command consistency, Red Hat recommends using the short- or long-option styles instead.

Create an Archive

To create an archive with the tar command, use the create and file options with the archive file name as the first argument, followed by a list of files and directories to include in the archive.

The tar command recognizes absolute and relative file name syntax. By default, tar removes the leading forward slash (/) character from absolute file names, so that files are stored internally with relative path names. This technique is safer, because extracting absolute path names always overwrites existing files. With files that are archived with relative path names, files can be extracted to a new directory without overwriting existing files.

The following command creates the mybackup.tar archive to contain the myapp1.log, myapp2.log, and myapp3.log files from the user's home directory. If a file with the same name as the requested archive exists in the target directory, then the tar command overwrites the file.

[user@host ~]$ tar -cf mybackup.tar myapp1.log myapp2.log myapp3.log
[user@host ~]$ ls mybackup.tar
mybackup.tar

A user must have read permissions on the target files that are being archived. For example, creating a complete archive of the /etc directory requires root privileges, because only privileged users can read all /etc files. An unprivileged user can create an archive of the /etc directory, but the archive excludes files that the user cannot read, and directories for which the user lacks the read and execute permissions.

In this example, the root user creates the /root/etc.tar archive of the /etc directory.

[root@host ~]# tar -cf /root/etc.tar /etc
tar: Removing leading `/' from member names
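The removal of the leading slash can be observed without root privileges. The following sketch archives a file by its absolute path and then inspects the stored member name. The temporary directory and the demo.conf file name are hypothetical.

```shell
# Sketch: archive a file by absolute path and inspect the stored name.
workdir=$(mktemp -d)                # an absolute path under the temp area
printf 'demo\n' > "$workdir/demo.conf"

# tar prints the "Removing leading `/'" notice on stderr; it is hidden here.
tar -cf "$workdir/demo.tar" "$workdir/demo.conf" 2>/dev/null

# The member name is the same path without the leading slash.
member=$(tar -tf "$workdir/demo.tar")
echo "stored as: $member"

case "$member" in
    /*) stored=absolute ;;
    *)  stored=relative ;;
esac
echo "member path is $stored"

rm -rf "$workdir"
```

Because the member name is relative, extracting this archive in another directory re-creates the path beneath that directory instead of writing to the original location.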

Important

Extended file attributes, such as access control lists (ACL) and SELinux file contexts, are not preserved by default in an archive. Use the --acls, --selinux, and --xattrs options to include POSIX ACLs, SELinux file contexts, and other extended attributes, respectively.

List Archive Contents

Use the tar command -t option to list the file names in the archive that is specified with the -f option. The files are listed with relative path names, because the leading forward slash was removed during archive creation.

[root@host ~]# tar -tf /root/etc.tar
etc/
etc/fstab
etc/crypttab
etc/mtab
...output omitted...
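In a script, the listing can also confirm that a particular member exists before you extract anything. The following is a minimal sketch that builds a small stand-in archive in a temporary directory; the etc/fstab content is a placeholder and not a real system file.

```shell
# Sketch: check an archive listing for one member before extracting.
workdir=$(mktemp -d)
mkdir -p "$workdir/etc"
printf '# demo\n' > "$workdir/etc/fstab"
tar -cf "$workdir/etc.tar" -C "$workdir" etc

# grep -x matches the whole line and -q suppresses the output.
if tar -tf "$workdir/etc.tar" | grep -qx 'etc/fstab'; then
    found=yes
else
    found=no
fi
echo "etc/fstab present: $found"

# Count the members: the etc/ directory entry plus one file.
count=$(tar -tf "$workdir/etc.tar" | wc -l)
echo "members: $count"

rm -rf "$workdir"
```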

Extract Archive Contents

Extract a tar archive into an empty directory to avoid overwriting existing files. When the root user extracts an archive, the extracted files preserve the original user and group ownership. If a regular user extracts files, then the user becomes the owner of the extracted files.

List the contents of the /root/etc.tar archive and then extract its files to the /root/etcbackup directory:

[root@host ~]# mkdir /root/etcbackup
[root@host ~]# cd /root/etcbackup
[root@host etcbackup]# tar -tf /root/etc.tar
etc/
etc/fstab
etc/crypttab
etc/mtab
...output omitted...
[root@host etcbackup]# tar -xf /root/etc.tar
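As an alternative to changing into the target directory first, the tar -C (--directory) option extracts into a named directory. This option is not used elsewhere in this section, so treat the following as a sketch; it uses a small stand-in archive and hypothetical paths in a temporary directory.

```shell
# Sketch: extract into an empty directory with -C instead of cd.
workdir=$(mktemp -d)
mkdir -p "$workdir/etc" "$workdir/etcbackup"
printf '# demo\n' > "$workdir/etc/fstab"
tar -cf "$workdir/etc.tar" -C "$workdir" etc

# Members are re-created beneath the etcbackup directory.
tar -xf "$workdir/etc.tar" -C "$workdir/etcbackup"

restored=$(cat "$workdir/etcbackup/etc/fstab")
echo "restored content: $restored"

rm -rf "$workdir"
```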

When you extract files from an archive, the tar command applies the current umask to each extracted file's permissions. To preserve the original archived permissions instead, use the tar command -p option. The --preserve-permissions option is enabled by default for a superuser.

[user@host scripts]$ tar -xpf /home/user/myscripts.tar
...output omitted...
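The effect of the umask can be compared side by side. The following sketch assumes GNU tar and the GNU stat command. It passes --no-same-permissions explicitly in the first extraction so that the result is the same for regular users and for root. The run.sh file and the 0027 umask are hypothetical values chosen for illustration.

```shell
# Sketch: compare extraction with the umask applied and with -p.
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/masked" "$workdir/preserved"
printf 'echo hello\n' > "$workdir/src/run.sh"
chmod 0777 "$workdir/src/run.sh"
tar -cf "$workdir/scripts.tar" -C "$workdir/src" run.sh

(
    # The subshell keeps the umask change local to these two commands.
    umask 0027
    # Default behavior for regular users: the umask filters the mode.
    tar -xf "$workdir/scripts.tar" --no-same-permissions -C "$workdir/masked"
    # With -p, the archived mode is restored regardless of the umask.
    tar -xpf "$workdir/scripts.tar" -C "$workdir/preserved"
)

masked_mode=$(stat -c '%a' "$workdir/masked/run.sh")
preserved_mode=$(stat -c '%a' "$workdir/preserved/run.sh")
echo "without -p: $masked_mode  with -p: $preserved_mode"

rm -rf "$workdir"
```

The archived mode 0777 filtered by the 0027 umask gives 750, whereas the -p extraction keeps 777.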

Create a Compressed Archive

The tar command supports these compression methods, and others:

  • gzip compression is the oldest and fastest method, and is widely available across platforms.

  • bzip2 compression creates smaller archives but is less widely available than gzip.

  • xz compression is newer, and offers the best compression ratio of the available methods.

The effectiveness of any compression algorithm depends on the type of data that is compressed. Previously compressed data files, such as picture formats or RPM files, typically do not significantly compress further.
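The dependence on the data type can be illustrated with two files of the same size. In the following sketch, one file holds repetitive text and the other holds random bytes, which behave like previously compressed data. The file names and the 100 KiB size are arbitrary choices.

```shell
# Sketch: compress repetitive text and random data of equal size.
workdir=$(mktemp -d)

# 102400 bytes of a repeated line, and 102400 random bytes.
yes 'the same log line repeats' | head -c 102400 > "$workdir/text.dat"
head -c 102400 /dev/urandom > "$workdir/random.dat"

tar -czf "$workdir/text.tar.gz" -C "$workdir" text.dat
tar -czf "$workdir/random.tar.gz" -C "$workdir" random.dat

text_size=$(wc -c < "$workdir/text.tar.gz")
random_size=$(wc -c < "$workdir/random.tar.gz")
echo "text archive:   $text_size bytes"
echo "random archive: $random_size bytes"

rm -rf "$workdir"
```

The text archive shrinks to a small fraction of the input, whereas the random archive stays close to the original size.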

Create the /root/etcbackup.tar.gz archive with gzip compression from the contents of the /etc directory:

[root@host ~]# tar -czf /root/etcbackup.tar.gz /etc
tar: Removing leading `/' from member names

Create the /root/logbackup.tar.bz2 archive with bzip2 compression from the contents of the /var/log directory:

[root@host ~]# tar -cjf /root/logbackup.tar.bz2 /var/log
tar: Removing leading `/' from member names

Create the /root/sshconfig.tar.xz archive with xz compression from the contents of the /etc/ssh directory:

[root@host ~]# tar -cJf /root/sshconfig.tar.xz /etc/ssh
tar: Removing leading `/' from member names

After creating an archive, verify its table of contents with the tar command tf options. It is not necessary to specify the compression option when listing a compressed archive file, because the compression type is read from the archive's header. List the archived content in the /root/etcbackup.tar.gz file, which uses the gzip compression:

[root@host ~]# tar -tf /root/etcbackup.tar.gz
etc/
etc/fstab
etc/crypttab
etc/mtab
...output omitted...

Extract Compressed Archive Contents

The tar command can automatically determine which compression was used, so it is not necessary to specify the compression option. If you do include an incorrect compression type, tar reports that the specified compression type does not match the file's type. In the following example, the tar command uses the -z option, which indicates gzip compression, but the file name extension is .xz, which indicates xz compression:

[root@host ~]# tar -xzf /root/etcbackup.tar.xz

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
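A script can rely on the exit status of the tar command to detect this kind of mismatch. The following sketch assumes GNU tar behavior and uses a different mismatched pair from the preceding example: a gzip archive that is extracted with the -j (bzip2) option. When that attempt fails, the script retries without a compression option so that tar detects the format itself. All paths are hypothetical.

```shell
# Sketch: retry with automatic detection when the explicit option fails.
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/dest"
printf 'demo\n' > "$workdir/src/file.txt"
tar -czf "$workdir/backup.tar.gz" -C "$workdir/src" file.txt

# Deliberately wrong: -j requests bzip2 for a gzip archive.
if tar -xjf "$workdir/backup.tar.gz" -C "$workdir/dest" 2>/dev/null; then
    method='explicit -j'
else
    # Without a compression option, tar detects gzip by itself.
    tar -xf "$workdir/backup.tar.gz" -C "$workdir/dest"
    method='auto-detected'
fi
echo "extracted with: $method"

extracted=$(cat "$workdir/dest/file.txt")
rm -rf "$workdir"
```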

Listing a compressed tar archive works in the same way as listing an uncompressed tar archive. Use the tar command with the -tf options to verify the content of the compressed archive before extracting its contents:

[root@host logbackup]# tar -tf /root/logbackup.tar.bz2
var/log/
var/log/lastlog
var/log/README
var/log/private/
...output omitted...

The gzip, bzip2, and xz algorithms are also implemented as stand-alone commands for compressing individual files without creating an archive. With these commands, you cannot create a single compressed file of multiple files, such as a directory. As previously discussed, to create a compressed archive of multiple files, use the tar command with your preferred compression option. To uncompress a single compressed file or a compressed archive file without extracting its contents, use the gunzip, bunzip2, and unxz stand-alone commands.
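The following sketch shows both uses of the stand-alone commands with gzip and gunzip: uncompressing a compressed archive without extracting its members, and compressing a single file. The file names are hypothetical.

```shell
# Sketch: stand-alone gunzip and gzip on an archive and a single file.
workdir=$(mktemp -d)
printf 'demo\n' > "$workdir/file.txt"
tar -czf "$workdir/file.tar.gz" -C "$workdir" file.txt

# gunzip replaces file.tar.gz with file.tar; no member is extracted.
gunzip "$workdir/file.tar.gz"

# gzip replaces the single file.txt with file.txt.gz.
gzip "$workdir/file.txt"

listing=$(ls "$workdir")
echo "$listing"

rm -rf "$workdir"
```

After both commands, the directory contains only the file.tar and file.txt.gz files, because gzip and gunzip replace their input file by default.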

The gzip and xz commands provide an -l option to view the uncompressed size of a compressed file or compressed archive. Use this option to verify that enough space is available before uncompressing or extracting a file.

[user@host ~]$ gzip -l file.tar.gz
         compressed        uncompressed  ratio uncompressed_name
          221603125           303841280  27.1% file.tar
[user@host ~]$ xz -l file.xz
Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1       1    195.7 MiB    289.8 MiB  0.675  CRC64   file.xz
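The space check can be scripted by comparing the gzip -l figure with the free space that the df command reports. The following is a minimal sketch under stated assumptions: it parses the second column of the gzip -l output as shown above, and the POSIX output format of df -P. The archive is a small stand-in with hypothetical names.

```shell
# Sketch: compare the uncompressed size with the available space.
workdir=$(mktemp -d)
printf 'demo\n' > "$workdir/file.txt"
tar -czf "$workdir/file.tar.gz" -C "$workdir" file.txt

# Second line, second column of gzip -l: the uncompressed size in bytes.
needed_bytes=$(gzip -l "$workdir/file.tar.gz" | awk 'NR==2 {print $2}')

# df -P -k prints one line per file system in 1 KiB blocks;
# the fourth column is the available space.
avail_kb=$(df -Pk "$workdir" | awk 'NR==2 {print $4}')

# Round the required size up to whole KiB before comparing.
needed_kb=$(( (needed_bytes + 1023) / 1024 ))
if [ "$needed_kb" -le "$avail_kb" ]; then
    enough=yes
else
    enough=no
fi
echo "need ${needed_kb} KiB, have ${avail_kb} KiB, enough: $enough"

# Cross-check the reported size against the real uncompressed length.
actual_bytes=$(gunzip -c "$workdir/file.tar.gz" | wc -c)

rm -rf "$workdir"
```

This figure is the size of the uncompressed tar file. The extracted files need about the same amount of space, plus file system overhead.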

References

tar(1), gzip(1), gunzip(1), bzip2(1), bunzip2(1), xz(1), and unxz(1) man pages

Revision: rh134-9.3-5fd2368