The tape archive utility tar provides a simple and highly customizable way to read a large number of files. You can tell it how to handle the various intricacies listed above, and it will then recursively read the files and write them as a single stream to its standard output or to a file.
[code]
Common tar options:
-c combine files into an archive
-x extract files from archive
-f <file> set archive filename (- means standard input/output, which is also the usual default)
-t list names of files in archive
-z, -j, -J use gzip / bzip2 / xz (de)compression
-v list names of files processed
-C <path> set current working directory to this path before proceeding
[/code]
[code language=”bash”]tar -cf output.tar file1 file2 …[/code]
[code language=”bash”]tar -xf input.tar[/code]
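As a minimal sketch of how these options fit together, the following creates, lists and extracts a small archive. The temporary directory and the file names and contents are made up for illustration and are not part of the original examples.

```shell
# Work in a throwaway directory so nothing outside it is touched.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/out"
printf 'alpha\n' > "$work/src/file1"
printf 'beta\n'  > "$work/src/file2"

# -C switches into src first, so the archive stores bare names (file1, file2).
tar -C "$work/src" -cf "$work/demo.tar" file1 file2

# -t lists the member names without extracting anything.
listing=$(tar -tf "$work/demo.tar")
echo "$listing"

# -x extracts; -C chooses the directory the files land in.
tar -C "$work/out" -xf "$work/demo.tar"
```

Adding -v to the create or extract command would print each file name as it is processed.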
By writing to the standard output, we can pass this archive through a stream compressor such as gzip or bzip2.
[code language=”bash”]
tar -c file1 file2 … | gzip -c > output.tar.gz
[/code]
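The reverse direction works the same way: the decompressor feeds the archive to tar on its standard input. Here is a sketch of the full round trip, assuming gzip is installed and using made-up file names in a temporary directory. It passes `-f -` explicitly, because the archive tar uses when -f is omitted can differ between builds.

```shell
work=$(mktemp -d)
mkdir -p "$work/src" "$work/out"
printf 'hello\n' > "$work/src/file1"

# Archive to standard output (-f -) and compress the stream with gzip.
tar -C "$work/src" -cf - file1 | gzip -c > "$work/output.tar.gz"

# Decompress to standard output and let tar read the archive from standard input.
gzip -dc "$work/output.tar.gz" | tar -C "$work/out" -xf -

cat "$work/out/file1"
```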
As this is a common use of tar, the most common compressors can also be specified as flags to tar rather than via a pipeline:
Archive and compress:
[code language=”bash”]
tar -czf output.tar.gz file1 file2 …
tar -cjf output.tar.bz2 file1 file2 …
tar -cJf output.tar.xz file1 file2 …
[/code]
Decompress and extract:
[code language=”bash”]
tar -xzf input.tar.gz
tar -xjf input.tar.bz2
tar -xJf input.tar.xz
[/code]
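The compression flags and the pipeline form should be interchangeable, since an archive written with -z is an ordinary gzip stream. The sketch below checks this with made-up file names in a temporary directory, assuming gzip is installed.

```shell
work=$(mktemp -d)
mkdir -p "$work/src" "$work/out"
printf 'payload\n' > "$work/src/file1"

# Compress with tar's own -z flag.
tar -C "$work/src" -czf "$work/output.tar.gz" file1

# Standalone gzip accepts the result as a valid gzip stream.
gzip -t "$work/output.tar.gz" && echo "valid gzip stream"

# A standalone decompressor plus a plain tar can list the contents,
names=$(gzip -dc "$work/output.tar.gz" | tar -tf -)
echo "$names"

# while -xzf decompresses and extracts in one step.
tar -C "$work/out" -xzf "$work/output.tar.gz"
```

The flag has to match the compressor that was actually used. The file extension is only a naming convention.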
Tar streams can be transferred over networks to a destination computer, where a second tar instance is run. This second one receives the archive stream from the first tar instance and extracts the files onto the destination computer. This usage of two tar instances over a pipeline has resulted in the technique being nicknamed the “tar-pipe”.
Where network speed is the bottleneck, tar can be instructed to (de)compress the streams on the fly, and offers a choice of codecs. Note that due to the pipelined nature of this operation, any other streaming (de)compressors can also be used even if not supported by tar.
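To illustrate the last point, here is a local sketch of a tar-pipe with an external compressor placed between the two tar instances. The `compress` and `decompress` variables are hypothetical names introduced for this example, and gzip merely stands in for whichever streaming codec is available on both machines.

```shell
work=$(mktemp -d)
mkdir -p "$work/source/sub" "$work/dest"
printf 'one\n' > "$work/source/a.txt"
printf 'two\n' > "$work/source/sub/b.txt"

# Any filter that reads standard input and writes standard output will do.
compress='gzip -c'
decompress='gzip -dc'

# Neither tar knows about the compression: each one only sees a plain archive stream.
# The variables are left unquoted on purpose so the shell splits them into command and options.
tar -C "$work/source" -cf - . | $compress | $decompress | tar -C "$work/dest" -xf -

# Compare the two trees; diff -r is silent and succeeds when they are identical.
diff -r "$work/source" "$work/dest" && echo "trees match"
```

Over a network, the decompressor and the second tar would run on the destination machine, with the transport sitting between the compress and decompress stages.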
Tar-pipe examples
In its simplest form, to copy one folder tree to another:
[code language=”bash”]tar -C source/ -c . | tar -C dest/ -x[/code]
One could specify the -h parameter for the left-side tar, to have it follow symbolic links and build a link-free copy of the source in the destination, e.g. for sharing the tree with Windows users.
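The effect of -h can be seen in a small local experiment. This is a sketch with made-up directory and file names in a temporary directory: the same tree is copied once without and once with -h on the archiving side.

```shell
work=$(mktemp -d)
mkdir -p "$work/source" "$work/plain" "$work/deref"
printf 'data\n' > "$work/source/real.txt"
ln -s real.txt "$work/source/link.txt"

# Plain tar-pipe: the symbolic link is copied as a symbolic link.
tar -C "$work/source" -cf - . | tar -C "$work/plain" -xf -

# With -h on the left-side tar, the link is followed while archiving,
# so the destination receives file contents instead of a symbolic link.
tar -C "$work/source" -chf - . | tar -C "$work/deref" -xf -

ls -l "$work/plain" "$work/deref"
```

In the second copy, link.txt is no longer a symbolic link and holds the same contents as real.txt.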
To copy the files over a network, simply wrap the second tar in an SSH call:
[code language=”bash”]tar -C source/ -c . | ssh user@host 'tar -C dest/ -x'[/code]
To copy from a remote machine, put the first tar in an SSH call instead:
[code language=”bash”]ssh user@host 'tar -C source/ -c .' | tar -C dest/ -x[/code]
SSH provides authentication and encryption, so this form can be used over insecure networks such as the internet. The SCP utility uses SSH internally. SSH can also compress the connection transparently (its -C option), but the choice of codecs offered by tar will generally be more useful.