Chapter 12. Archiving and Copying Files Between Systems

Abstract

Goal To archive and copy files from one system to another.
Objectives

  • Use tar to create new compressed archive files and extract files from existing archive files.

  • Copy files securely to or from a remote system running sshd.

  • Securely synchronize the contents of a local file or directory with a remote copy.

Sections
  • Managing Compressed tar Archives (and Practice)

  • Copying Files Between Systems Securely (and Practice)

  • Synchronizing Files Between Systems Securely (and Practice)

Lab
  • Archiving and Copying Files Between Systems

Managing Compressed tar Archives

The tar command gathers files into a single archive file, optionally compressed with one of several compression methods, and restores files from an existing archive.

Objective

After completing this section, students should be able to use tar to create new compressed archive files and extract files from existing archive files.

Archives and compression

What is tar?

Archiving and compressing files are useful when creating backups and transferring data across a network. One of the oldest and most common commands for creating and working with backup archives is the tar command.

With tar, users can gather large sets of files into a single file (archive). The archive can be compressed using gzip, bzip2, or xz compression.

The tar command can list the contents of archives or extract their files to the current system. Examples of how to use the tar command are included in this section.

Operate the tar command

To use the tar command, exactly one of the following three actions is required:

  • c (create an archive)

  • t (list the contents of an archive)

  • x (extract an archive)

Commonly used options are:

  • f file name (file name of the archive to operate on)

  • v (verbosity; useful to see which files get added to or extracted from the archive)

Note

A leading - is not required for tar options.
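
The following is a minimal sketch of the equivalent ways to spell the same options, assuming GNU tar. The temporary directory and the file names (notes.txt and the three archive names) are hypothetical and chosen only for illustration.

```shell
set -eu
workdir=$(mktemp -d)        # scratch directory so nothing real is touched
cd "$workdir"
echo "sample data" > notes.txt

# Three spellings of the same request: create (c) an archive in the named file (f).
tar cf old-style.tar notes.txt                  # no leading dash
tar -cf short-style.tar notes.txt               # leading dash
tar --create --file=long-style.tar notes.txt    # long option names

# Each archive lists the same single member.
for archive in old-style.tar short-style.tar long-style.tar; do
    tar tf "$archive"
done
```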

Archive files and directories with tar

Before creating a tar archive, verify that there is no other archive in the directory with the same name as the new archive to be created. The tar command will overwrite an existing archive without any feedback.

To create a new archive, use the c option followed by the f option, then a single space, then the file name of the archive to be created, and finally the list of files and directories to add to the archive. The archive is created in the current directory unless a different path is given.

In the following example, an archive named archive.tar is created with the contents of file1, file2, and file3 in the user's home directory.

[user@host ~]$ tar cf archive.tar file1 file2 file3
[user@host ~]$ ls archive.tar
archive.tar
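
Because tar replaces an existing archive without asking, a script can check for the file first. The following is only a sketch of one possible guard; the create_archive function and the scratch directory are hypothetical and not part of tar itself.

```shell
set -eu
workdir=$(mktemp -d)        # scratch directory for the demonstration
cd "$workdir"
touch file1 file2 file3

archive=archive.tar         # same name as in the example above

create_archive() {
    # Refuse to run tar if the target already exists, because
    # "tar cf" would replace it without any feedback.
    if [ -e "$archive" ]; then
        echo "refusing to overwrite existing $archive" >&2
        return 1
    fi
    tar cf "$archive" file1 file2 file3
}

create_archive                                   # first call creates the archive
create_archive || echo "second call was blocked" # second call finds it and stops
```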

Note

When archiving files by absolute path names, the leading / of the path is removed from the file name by default. This helps avoid mistakes which could cause important files to be overwritten. Files are normally extracted relative to the current working directory of the tar command.

For tar to archive the selected files, the user running the tar command must be able to read them. For example, creating a new archive of the /etc directory and all of its content requires root privileges, because only root is allowed to read all of the files there. An unprivileged user could create an archive of the /etc directory, but the archive would omit files that do not grant read permission to the user, and it would omit directories that do not grant both read and execute permission to the user.
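
The effect of a missing read permission can be tried safely in a scratch directory. This sketch assumes GNU tar and hypothetical file names; when it is run as root, both files are archived, because root can read files regardless of their mode.

```shell
set -eu
workdir=$(mktemp -d)        # scratch directory for the demonstration
cd "$workdir"
mkdir config
echo "public" > config/readable.conf
echo "secret" > config/unreadable.conf
chmod 000 config/unreadable.conf     # remove all permissions from one file

# tar reports unreadable files on stderr and exits non-zero, so capture
# the status instead of letting "set -e" stop the script.
status=0
tar cf config.tar config || status=$?
echo "tar exit status: $status"

# For an unprivileged user, unreadable.conf is missing from this listing.
tar tf config.tar
```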

Create the tar archive /root/etc.tar with the /etc directory as content as user root:

[root@host ~]# tar cf /root/etc.tar /etc
tar: Removing leading `/' from member names
[root@host ~]# 
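
The removal of the leading / can be observed without root privileges. This is a small sketch that assumes mktemp -d returns an absolute path, which is the usual case; the file and archive names are hypothetical.

```shell
set -eu
workdir=$(mktemp -d)                 # normally an absolute path such as /tmp/tmp.XXXX
archive="$workdir/absolute.tar"
echo "data" > "$workdir/file1"

# Archive by absolute path; the notice about the leading / goes to stderr.
tar cf "$archive" "$workdir/file1" 2>/dev/null

# The member name is stored without the leading /.
tar tf "$archive"
```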

Important

While tar stores ownership and permissions of the files, there are other attributes that are not stored in the tar archive by default, such as the SELinux context and ACLs. To store those extended attributes in the tar archive, the --xattrs option is required when creating an archive.

List contents of a tar archive

To list the content of an archive, use the t and f options, followed by the file name of the archive to operate on.

List the content of the archive /root/etc.tar:

[root@host ~]# tar tf /root/etc.tar
etc/
etc/fstab
etc/crypttab
etc/mtab
...
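
Adding the v option to a listing shows more than the member names. The following sketch uses a hypothetical docs directory in a scratch location to compare the two forms.

```shell
set -eu
workdir=$(mktemp -d)        # scratch directory for the demonstration
cd "$workdir"
mkdir docs
echo "hello" > docs/readme.txt
tar cf docs.tar docs

tar tf docs.tar     # member names only
tar tvf docs.tar    # also shows permissions, owner, size, and modification time
```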

Extract an archive created with tar

A tar archive should normally be extracted in an empty directory to ensure it does not overwrite any existing files. If files are extracted by root, tar attempts to preserve the original user and group ownership of the files. If a regular user extracts files using tar, the extracted files are owned by that user.

Extract the archive /root/etc.tar to the /root/etcbackup directory:

[root@host ~]# mkdir /root/etcbackup
[root@host ~]# cd /root/etcbackup
[root@host etcbackup]# tar xf /root/etc.tar
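
The same create, change directory, and extract sequence can be rehearsed as a regular user. In this sketch the source tree and the restore directory are hypothetical, and diff -r is used only as one possible way to confirm that the restored files match the originals.

```shell
set -eu
workdir=$(mktemp -d)        # scratch directory for the demonstration
cd "$workdir"
mkdir -p source/sub
echo "one" > source/a.txt
echo "two" > source/sub/b.txt
tar cf backup.tar source

mkdir restore               # empty target directory
cd restore
tar xvf ../backup.tar       # v lists each member as it is extracted

# diff -r exits with status 0 only if both trees have identical content.
diff -r "$workdir/source" "$workdir/restore/source" && echo "restore matches source"
```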

By default, when a regular user extracts files from an archive, the umask is subtracted from the permissions stored in the archive. This is a security measure that keeps extracted files from being more permissive than the user's umask allows. To preserve the permissions of an archived file, use the p option when extracting an archive. When tar is run by root, GNU tar preserves permissions by default, so the p option mainly makes the intent explicit.

Extract the archive /root/myscripts.tar to the /root/scripts directory while preserving the permissions of the extracted files:

[root@host ~]# mkdir /root/scripts
[root@host ~]# cd /root/scripts
[root@host scripts]# tar xpf /root/myscripts.tar
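
The difference the p option makes is easiest to see with a deliberately permissive file and a restrictive umask. This sketch assumes GNU tar and GNU coreutils (for stat -c); the file name run.sh and the umask value 027 are arbitrary choices. When the script is run as root, both results are expected to be the same, because root preserves permissions by default.

```shell
set -eu
workdir=$(mktemp -d)        # scratch directory for the demonstration
cd "$workdir"
echo 'echo hello' > run.sh
chmod 777 run.sh            # deliberately wide permissions
tar cf myscripts.tar run.sh

umask 027                   # restrictive umask for the extraction step
mkdir plain preserved
(cd plain && tar xf ../myscripts.tar)        # umask applied for regular users
(cd preserved && tar xpf ../myscripts.tar)   # archived mode restored

# stat -c %a prints the permission bits in octal.
echo "without p: $(stat -c %a plain/run.sh)"
echo "with p:    $(stat -c %a preserved/run.sh)"
```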

Create a compressed tar archive

There are three different compression methods supported by the tar command. The gzip compression is the fastest and oldest one, and is most widely available. The bzip2 compression usually leads to smaller archive files compared to gzip and is less widely available than gzip, while the xz compression method is relatively new, but usually offers the best compression ratio of the methods available.

Note

The effectiveness of any compression algorithm depends on the exact nature of the data being compressed. Data files that are already compressed, such as compressed picture formats or rpm files, usually lead to a low compression ratio.

It is good practice to use a single top-level directory, which can contain other directories and files, to simplify extraction of the files in an organized way.
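
As an illustration of the single top-level directory practice, the sketch below archives a hypothetical project directory and lists the result. Every member name starts with the same prefix, so extraction creates exactly one new directory.

```shell
set -eu
workdir=$(mktemp -d)        # scratch directory for the demonstration
cd "$workdir"
mkdir -p project/src project/docs      # "project" is a hypothetical name
echo "code" > project/src/main.c
echo "text" > project/docs/guide.txt

# Archive the directory itself rather than the individual files inside it.
tar cf project.tar project

# Every member name begins with project/.
tar tf project.tar
```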

To create a compressed tar archive, one of the following tar options can be specified:

  • z for gzip compression (filename.tar.gz or filename.tgz)

  • j for bzip2 compression (filename.tar.bz2)

  • J for xz compression (filename.tar.xz)

Create (c option) a gzip-compressed (z option) tar archive /root/etcbackup.tar.gz of the /etc directory on serverX:

[root@serverX ~]# tar czf /root/etcbackup.tar.gz /etc

Create (c option) a bzip2-compressed (j option) tar archive /root/logbackup.tar.bz2 of the /var/log directory on serverX:

[root@serverX ~]# tar cjf /root/logbackup.tar.bz2 /var/log

Create (c option) an xz-compressed (J option) tar archive /root/sshconfig.tar.xz of the /etc/ssh directory on serverX:

[root@serverX ~]# tar cJf /root/sshconfig.tar.xz /etc/ssh
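
To get a feel for how the three methods compare, the same data can be archived with each option and the resulting sizes listed. This is only a sketch: the generated sample.log file is hypothetical and highly repetitive, so the ratios it produces say little about real data, and any method whose compression program is not installed is skipped.

```shell
set -eu
workdir=$(mktemp -d)        # scratch directory for the demonstration
cd "$workdir"
mkdir data

# Generate repetitive text; real data will compress differently.
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog" >> data/sample.log
    i=$((i + 1))
done

tar cf data.tar data        # uncompressed archive for comparison

# Create one compressed archive per method that is available on this system.
command -v gzip  >/dev/null && tar czf data.tar.gz  data
command -v bzip2 >/dev/null && tar cjf data.tar.bz2 data
command -v xz    >/dev/null && tar cJf data.tar.xz  data

ls -l data.tar*             # compare the sizes
```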

Extract a compressed tar archive

The first step when extracting a compressed tar archive is to determine where the archived files should be extracted to, then create and change to the target directory. To extract the archive, it is usually not necessary to specify the compression option that was used when creating it, because the tar command determines which compression was used. It is still valid to add the decompression option explicitly, as in the following examples.

Extract (x option) the contents of a gzip-compressed (z option) tar archive named /root/etcbackup.tar.gz to the directory /tmp/etcbackup:

[root@serverX ~]# mkdir /tmp/etcbackup
[root@serverX ~]# cd /tmp/etcbackup
[root@serverX etcbackup]# tar xzf /root/etcbackup.tar.gz

Extract (x option) the contents of a bzip2-compressed (j option) tar archive named /root/logbackup.tar.bz2 to the directory /tmp/logbackup:

[root@serverX ~]# mkdir /tmp/logbackup
[root@serverX ~]# cd /tmp/logbackup
[root@serverX logbackup]# tar xjf /root/logbackup.tar.bz2

Extract (x option) the contents of an xz-compressed (J option) tar archive named /root/sshconfig.tar.xz to the directory /tmp/sshbackup:

[root@serverX ~]# mkdir /tmp/sshbackup
[root@serverX ~]# cd /tmp/sshbackup
[root@serverX sshbackup]# tar xJf /root/sshconfig.tar.xz

Note

Listing a compressed tar archive works in the same way as listing an uncompressed tar archive.
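
The automatic detection of the compression method can be checked with a small gzip-compressed archive. This sketch assumes GNU tar and an installed gzip; the etcdemo directory and its content are hypothetical.

```shell
set -eu
workdir=$(mktemp -d)        # scratch directory for the demonstration
cd "$workdir"
mkdir etcdemo
echo "sample=1" > etcdemo/app.conf
tar czf etcdemo.tar.gz etcdemo      # compressed with the z option

tar tf etcdemo.tar.gz       # list without naming the compression method

mkdir restore
cd restore
tar xf ../etcdemo.tar.gz    # extract without naming the compression method
cat etcdemo/app.conf
```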

Note

Additionally, gzip, bzip2, and xz can be used independently to compress single files. For example, gzip etc.tar results in the compressed file etc.tar.gz, while bzip2 abc.tar results in the compressed file abc.tar.bz2 and xz myarchive.tar results in the compressed file myarchive.tar.xz.

The corresponding decompression commands are gunzip, bunzip2, and unxz. For example, gunzip etc.tar.gz results in the uncompressed tar file etc.tar, while bunzip2 abc.tar.bz2 results in the uncompressed tar file abc.tar and unxz myarchive.tar.xz results in the uncompressed tar file myarchive.tar.
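
A round trip with gzip and gunzip shows that each command replaces its input file with the result. The sketch below uses hypothetical file names in a scratch directory and keeps a copy of the original archive purely for comparison; bzip2 with bunzip2 and xz with unxz follow the same pattern.

```shell
set -eu
workdir=$(mktemp -d)        # scratch directory for the demonstration
cd "$workdir"
echo "payload" > file1
tar cf etc.tar file1
cp etc.tar original.tar     # copy kept only to compare against later

gzip etc.tar                # replaces etc.tar with etc.tar.gz
ls
gunzip etc.tar.gz           # replaces etc.tar.gz with etc.tar

# cmp exits with status 0 only if the two files are byte-for-byte identical.
cmp etc.tar original.tar && echo "round trip is lossless"
```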

Overview of tar options

The tar command has many options to use. The following table lists some common options and their meanings.

Table 12.1. Overview of tar options

Option  Meaning
c       Create a new archive.
x       Extract from an existing archive.
t       List the contents of an archive.
v       Verbose; shows which files get archived or extracted.
f       File name; this option needs to be followed by the file name of the archive to use/create.
p       Preserve the permissions of files and directories when extracting an archive, without subtracting the umask.
z       Use gzip compression (.tar.gz).
j       Use bzip2 compression (.tar.bz2). bzip2 typically achieves a better compression ratio than gzip.
J       Use xz compression (.tar.xz). xz typically achieves a better compression ratio than bzip2.


References

tar(1), gzip(1), gunzip(1), bzip2(1), bunzip2(1), xz(1), unxz(1) man pages

Revision: rh124-7-1b00421