LFCS – Archive and Compression

Jarret B

Part of the LFCS exam covers archiving, compressing, unarchiving and uncompressing files. The ability to perform these on files is a useful tool for any Linux user.
This article covers the commands needed to archive and compress files: tar, gzip, gunzip, bzip2 and bunzip2.

TAR

TAR stands for Tape ARchive and comes from the days when data was written sequentially to a tape. The resulting single file is called a tarball. The data is written in blocks, so a full block is always used when writing the file. The same is still done when the archive is written to a hard drive and the like, so the size of an uncompressed tarball is always a multiple of the block size. The tar format uses 512-byte blocks regardless of the device, and GNU tar by default groups 20 blocks into a 10,240-byte record, so its tarballs are normally a multiple of 10,240 bytes. For comparison, you can find the sector size of your disks with the command: lsblk -o NAME,PHY-SEC. The number shown for each device is its physical sector size in bytes, typically 512 or 4096.
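The block padding can be checked with a short sketch. It assumes a throwaway folder from mktemp and a made-up file named ‘note.txt’, neither of which comes from this article. The only thing it checks is that the size of the tarball divides evenly by 512.

```shell
#!/bin/sh
# Sketch: confirm that an uncompressed tarball is a whole number of 512-byte blocks.
workdir=$(mktemp -d)                 # hypothetical scratch folder, removed at the end
printf 'hello\n' > "$workdir/note.txt"

# -C switches to the scratch folder first, so only the name "note.txt" is stored.
tar -cf "$workdir/note.tar" -C "$workdir" note.txt

size=$(( $(wc -c < "$workdir/note.tar") ))   # size of the tarball in bytes
remainder=$(( size % 512 ))
echo "archive size: $size bytes, remainder after dividing by 512: $remainder"

rm -rf "$workdir"
```

A six-byte file still produces a tarball of several blocks, which is why archiving small files on its own does not save space.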

NOTE: Keep in mind that a tarball is a copy of one or more selected files archived into a single file, and the original files are left in place. Archiving on its own saves little or no space. The real savings come from compression.

To archive files you use the command: tar -cvf filename-to-make files-to-archive. Let’s assume I want to back up my personal Documents folder to a file in my HOME folder called ‘Doc-bak.tar’. The command I would use would be:

tar -cvf $HOME/Doc-bak.tar $HOME/Documents/

The option ‘-c’ is used to create a new tarball. The option ‘-v’ turns on verbose output, which lists the files and folders being placed into the archive. Finally, the ‘-f’ option names the archive file to create and must be followed directly by that name. The dash in front of the options can also be omitted if you wish.
Now that we have archived files it is necessary to know how to unarchive those files.
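The standard command of tar -xvf $HOME/Doc-bak.tar will extract the files from the specified archive and place them into the current folder. The options are the same, but the ‘x’ is used in place of the ‘c’ to extract the contents. The ‘f’ option is still needed to name the archive to read. If you want to place the contents into a specific folder then the folder must exist and the option to specify the folder is ‘-C’.

tar -xvf $HOME/Doc-bak.tar -C $HOME/Docs

Here the files will be extracted to the folder ‘Docs’ in my Home folder. Because the archive was created with a full path, GNU tar stored the names without the leading ‘/’, so the stored folder structure is recreated beneath ‘Docs’.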
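The create and extract steps can be tried safely with a sketch like the following. The scratch folder and the file ‘story.txt’ are assumptions made up for the example. It also uses the ‘-t’ option of tar, which lists the contents of an archive without extracting it.

```shell
#!/bin/sh
# Sketch: create, list and extract a tarball using throwaway folders.
workdir=$(mktemp -d)                                  # hypothetical scratch folder
mkdir -p "$workdir/Documents" "$workdir/Docs"
printf 'chapter one\n' > "$workdir/Documents/story.txt"

# Create the archive. -C makes the stored names relative to the scratch folder.
tar -cf "$workdir/Doc-bak.tar" -C "$workdir" Documents

# -t lists what is inside the archive without extracting anything.
listing=$(tar -tf "$workdir/Doc-bak.tar")
echo "$listing"

# Extract into the folder Docs, which must already exist.
tar -xf "$workdir/Doc-bak.tar" -C "$workdir/Docs"
restored=$(cat "$workdir/Docs/Documents/story.txt")
echo "restored contents: $restored"

rm -rf "$workdir"
```

Listing an archive before extracting it is a cheap way to see which folders will be created and where.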
Since archiving does not save much space, if any, we need to look at compression utilities such as gzip and bzip2.

GZip

GZip is a replacement for the older LZW-based ‘compress’ utility for UNIX. GZip is intended for GNU systems and is technically named ‘GNU Zip’ or GZip for short. Depending on the file being compressed the compression ratio will vary.
GZip works by compressing a single file. It is possible to create a tar file and then compress the archived file from the tar command itself which will be demonstrated later.
The command for GZip is as follows:

gzip -options name-of-tarball

There are a few options which can be useful.

  • -k Keep the original file. If not used the original file is removed.
  • -t Test the compressed file.
  • -v Verbose listing for more information.
  • -1 to -9 A ‘-1’ performs the fastest compression, but compresses the least. A ‘-9’ compresses the most, but takes longer. The numbers in between trade speed against size. Do not include a value to let the program use the default of 6.
The extension used should be ‘.gz’ to designate that GZip was used. When compressing a tarball the extension should be ‘.tar.gz’.
If I had a tarball named ‘Stories.tar’ and I wanted to compress it the command would be:

gzip Stories.tar

I must be in the same folder as the file I am compressing or I must specify its location on the command line. The resulting file would be ‘Stories.tar.gz’.
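To get a feel for the options, here is a rough sketch that compresses a generated sample file. The file name ‘Stories.txt’ and the scratch folder are made up for the example. It also uses the ‘-c’ option, which is not covered above and writes the compressed data to standard output instead of replacing the file. The sizes printed will differ between systems.

```shell
#!/bin/sh
# Sketch: compare gzip levels and try the -k and -t options on a sample file.
workdir=$(mktemp -d)                       # hypothetical scratch folder

# Build a repetitive sample file so the compressor has something to work on.
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i of a sample story that repeats itself"
    i=$((i + 1))
done > "$workdir/Stories.txt"

# -c writes to standard output, so the same input can be compressed twice.
gzip -1 -c "$workdir/Stories.txt" > "$workdir/fast.gz"
gzip -9 -c "$workdir/Stories.txt" > "$workdir/best.gz"

# -k keeps Stories.txt and creates Stories.txt.gz next to it (default level).
gzip -k "$workdir/Stories.txt"
kept=no
[ -f "$workdir/Stories.txt" ] && kept=yes

# -t tests the compressed file and returns success if it is intact.
check=failed
gzip -t "$workdir/Stories.txt.gz" && check=passed

orig_size=$(( $(wc -c < "$workdir/Stories.txt") ))
fast_size=$(( $(wc -c < "$workdir/fast.gz") ))
best_size=$(( $(wc -c < "$workdir/best.gz") ))
echo "original: $orig_size bytes, -1: $fast_size bytes, -9: $best_size bytes"
echo "original kept: $kept, integrity test: $check"

rm -rf "$workdir"
```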
To uncompress a GZip file you would use the ‘gunzip’ program. The syntax is as follows:

gunzip -options name-of-gzip-file

Some of the options available which can be useful are:

  • -f Force overwrite if file exists.
  • -k Keep original file. If not used the original file is removed.
  • -t Test the compressed file.
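The file, when uncompressed, is placed in the same folder as the compressed file, not in the current folder.
Let’s say I wanted to uncompress the ‘$HOME/Stories.tar.gz’ file to a folder named ‘$HOME/Stories’. I would first create the folder ‘$HOME/Stories’, go to the folder and then run the command with the ‘-c’ option, which writes the uncompressed data to standard output so it can be redirected to a file:

gunzip -c $HOME/Stories.tar.gz > Stories.tar

With ‘-c’ the original ‘.gz’ file is kept and the ‘Stories’ folder would contain the file ‘Stories.tar’. A plain gunzip $HOME/Stories.tar.gz would instead create ‘$HOME/Stories.tar’ and delete the ‘.gz’ file.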
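The following sketch shows one way to land the uncompressed tarball in a folder of your choice. It relies on the ‘-c’ option of gunzip, which sends the uncompressed data to standard output. The scratch folder and the file ‘tale.txt’ are assumptions for the example only.

```shell
#!/bin/sh
# Sketch: uncompress a .tar.gz into a chosen folder while keeping the .gz file.
workdir=$(mktemp -d)                          # hypothetical scratch folder
mkdir "$workdir/src" "$workdir/Stories"
printf 'once upon a time\n' > "$workdir/src/tale.txt"

# Build Stories.tar and compress it. gzip replaces it with Stories.tar.gz.
tar -cf "$workdir/Stories.tar" -C "$workdir/src" tale.txt
gzip "$workdir/Stories.tar"

# -c writes the uncompressed data to standard output, which is redirected
# into the target folder. The .gz file is left untouched.
gunzip -c "$workdir/Stories.tar.gz" > "$workdir/Stories/Stories.tar"

in_target=no
[ -f "$workdir/Stories/Stories.tar" ] && in_target=yes
gz_kept=no
[ -f "$workdir/Stories.tar.gz" ] && gz_kept=yes
members=$(tar -tf "$workdir/Stories/Stories.tar")
echo "tarball in target folder: $in_target, .gz kept: $gz_kept, contains: $members"

rm -rf "$workdir"
```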

BZip2

BZip2 is another compression program which compresses a single file. It usually compresses better than GZip, but is slower both to compress and to decompress the file.
Some of the options are the same as for GZip:

  • -k Keep the original file and not delete it.
  • -t Test the compressed file.
  • -v Verbose output when compressing.
Like GZip it is best to use BZip2 on a tarball. The resulting extension should be ‘.tar.bz2’. The command is as follows:

bzip2 -options file-to-be-compressed

If I have a file named ‘Stories.tar’ and I wish to compress it using BZip2 the command would be:

bzip2 Stories.tar

In a Terminal the command would have to be executed from the same folder as where the file ‘Stories.tar’ is located. Otherwise, I must specify the location on the command line. Once the new BZip2 file is created the original file is deleted by default. If you wish to keep the original file after the compression then use the option ‘-k’. The resulting file would be ‘Stories.tar.bz2’.
To uncompress the ‘.bz2’ file you use the program ‘bunzip2’. The syntax for bunzip2 is as follows:

bunzip2 -options file-to-be-uncompressed

Some of the options which can be used for ‘BUnzip2’ include the following:

  • -k Keep the original file after decompression.
  • -t Test the file.
  • -f Overwrite the uncompressed file if it exists.
The uncompressed file is placed in the same folder as the ‘.bz2’ file. To uncompress the file ‘Stories.tar.bz2’ I could use the following command:

bunzip2 -kf Stories.tar.bz2

The file ‘Stories.tar’ would be created in the current folder, where the file ‘Stories.tar.bz2’ exists. The original ‘.tar.bz2’ file would be kept and not deleted because of the ‘-k’ option. If the file ‘Stories.tar’ already exists in the current folder it would be overwritten because of the ‘-f’ option.
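A round trip with ‘bzip2’ and ‘bunzip2’ might look like the sketch below. The sample file and scratch folder are made up for the example, and the script simply stops if ‘bzip2’ is not installed. The sizes it prints depend on the data, so no particular result is promised.

```shell
#!/bin/sh
# Sketch: bzip2 round trip with -k, -t and -f on a generated sample file.
command -v bzip2 >/dev/null 2>&1 || { echo "bzip2 is not installed"; exit 0; }
command -v bunzip2 >/dev/null 2>&1 || { echo "bunzip2 is not installed"; exit 0; }

workdir=$(mktemp -d)                          # hypothetical scratch folder
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i of a sample story that repeats itself"
    i=$((i + 1))
done > "$workdir/Stories.txt"
cp "$workdir/Stories.txt" "$workdir/copy.txt" # reference copy for comparison

# -k keeps Stories.txt and creates Stories.txt.bz2 next to it.
bzip2 -k "$workdir/Stories.txt"

# -t tests the compressed file without writing anything.
check=failed
bzip2 -t "$workdir/Stories.txt.bz2" && check=passed

# For comparison only: the same data through gzip.
gzip -c "$workdir/Stories.txt" > "$workdir/Stories.txt.gz"
echo "bzip2: $(( $(wc -c < "$workdir/Stories.txt.bz2") )) bytes"
echo "gzip:  $(( $(wc -c < "$workdir/Stories.txt.gz") )) bytes"

# -k keeps the .bz2 file, -f overwrites the Stories.txt that already exists.
bunzip2 -kf "$workdir/Stories.txt.bz2"

roundtrip=different
cmp -s "$workdir/Stories.txt" "$workdir/copy.txt" && roundtrip=same
bz2_kept=no
[ -f "$workdir/Stories.txt.bz2" ] && bz2_kept=yes
echo "integrity test: $check, round trip: $roundtrip, .bz2 kept: $bz2_kept"

rm -rf "$workdir"
```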

Using Tar to Compress Files

Tar can perform the ‘gzip’ and ‘bzip2’ compression from the ‘tar’ command.
To compress using ‘gzip’ use the option ‘-z’ and for ‘bzip2’ use the option ‘-j’.
For example, to create the ‘Doc-bak.tar’ archive from my ‘Documents’ folder and ‘gzip’ it the command would be:

tar -cvzf Doc-bak.tar.gz $HOME/Documents/

Basically, any ‘tar’ command which works fine will only need the option ‘z’ or ‘j’ added to perform the compression.
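As a rough illustration of what the ‘z’ option buys you, this sketch archives the same folder with and without it and compares the sizes. The folder contents and the scratch location are assumptions for the example. The ‘j’ option can be swapped in the same way if ‘bzip2’ is installed.

```shell
#!/bin/sh
# Sketch: the same tar command with and without gzip compression.
workdir=$(mktemp -d)                          # hypothetical scratch folder
mkdir "$workdir/Documents"
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i of a sample document that repeats itself"
    i=$((i + 1))
done > "$workdir/Documents/notes.txt"

# Plain archive, then the identical command with z added.
tar -cf  "$workdir/Doc-bak.tar"    -C "$workdir" Documents
tar -czf "$workdir/Doc-bak.tar.gz" -C "$workdir" Documents

plain_size=$(( $(wc -c < "$workdir/Doc-bak.tar") ))
gz_size=$(( $(wc -c < "$workdir/Doc-bak.tar.gz") ))
echo "Doc-bak.tar: $plain_size bytes, Doc-bak.tar.gz: $gz_size bytes"

rm -rf "$workdir"
```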
When uncompressing the file the command would be:

tar -xvzf Doc-bak.tar.gz

The files are extracted into the current folder using the paths stored in the archive. If the archive was created by specifying the full path to the files, that path is stored with each file. GNU tar removes the leading ‘/’ by default, so the whole folder structure is recreated beneath the current folder rather than at the original location.
If you are in the folder where the files to be archived reside, only the file names are stored, with no leading folders:

tar -cvzf Doc-bak.tar.gz *
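To see which paths end up inside an archive, the two ways of creating it can be compared with the ‘-t’ option, which lists the stored names. This is only a sketch with a made-up scratch folder and file. The exact names in the first listing depend on where mktemp puts the folder.

```shell
#!/bin/sh
# Sketch: compare the names stored when archiving by full path and from inside the folder.
workdir=$(mktemp -d)                          # hypothetical scratch folder
mkdir "$workdir/Documents"
printf 'draft\n' > "$workdir/Documents/story.txt"

# 1) Archive by full path. GNU tar prints a notice that it removes the leading "/".
tar -czf "$workdir/abs.tar.gz" "$workdir/Documents"
abs_names=$(tar -tzf "$workdir/abs.tar.gz")

# 2) Archive from inside the folder. Only the file names are stored.
(cd "$workdir/Documents" && tar -czf "$workdir/rel.tar.gz" *)
rel_names=$(tar -tzf "$workdir/rel.tar.gz")

echo "stored with a full path:"
echo "$abs_names"
echo "stored from inside the folder:"
echo "$rel_names"

rm -rf "$workdir"
```

Checking the listing first tells you whether an extract will create a long chain of folders or drop the files straight into the current folder.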

Try to archive and extract files to get familiar with using the ‘tar’ command. Once you have become at ease using ‘tar’ then start using ‘gzip’ and ‘bzip2’ to familiarize yourself with these commands.
 
