How to Gzip Large (100GB+) Files Faster in Linux

Linux users and system administrators deal with file management routinely. As system files, programs, and user data grow from megabytes to gigabytes, there is always a need to compress some of the files stored on your system.


The advantages of zipping or compressing these files are as follows:

  • You get to save and create extra storage space on your Linux machine.
  • Freeing up disk space helps keep the system running smoothly, since a nearly full filesystem can cause applications and services to fail.
  • Compressed files are easier to transfer to other machine environments and systems.
  • Compressed files can be encrypted afterwards with a separate tool, but compression on its own provides no protection: anyone with gzip can decompress a .gz file.

Most Linux users are familiar with Gzip as an effective way to compress large files into .gz archives. Gzip comes pre-installed on almost all Linux distributions.

You can check for its availability on your Linux system with the following command:

$ gzip --version
Check Gzip Version in Linux

How to Compress a File Using Gzip in Linux

To compress a simple file with Gzip, you only need to run a command similar to the following:

$ gzip linuxshelltips_v2.txt

Run the command from the directory that contains the file, or pass the file's full path. By default, Gzip replaces the original file with a compressed copy named linuxshelltips_v2.txt.gz.
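As a minimal sketch (the file name sample.txt is hypothetical and generated only for illustration), the script below compresses a file while keeping the original, then checks the result using the standard gzip options -k, -t, and -l:

```shell
#!/bin/sh
set -e

# Throwaway input file (hypothetical name); remove any leftover output first.
seq 1 100000 > sample.txt
rm -f sample.txt.gz

# -k keeps sample.txt; plain "gzip sample.txt" would replace it with sample.txt.gz.
gzip -k sample.txt

# -t verifies the integrity of the compressed file (silent on success).
gzip -t sample.txt.gz

# -l prints the compressed size, uncompressed size, and compression ratio.
gzip -l sample.txt.gz
```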

How to Compress Large Files Using Gzip in Linux

Gzip is fast and efficient until you start dealing with large files. Compressing a 100GB+ file with Gzip can take hours, especially on modest hardware such as a Core(TM) i3-2350M CPU @ 2.30GHz. What if you do not have that much time?

Gzip is single-threaded: it compresses one file at a time using a single processor core. It also works only on files; if you pass it a directory, it skips it unless you add -r, which compresses each file inside the directory separately.
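The directory behavior can be seen in a short sketch (the directory and file names are hypothetical). It uses the common approach of bundling the directory with tar to get a single compressed archive, then shows gzip -r compressing each file in place:

```shell
#!/bin/sh
set -e

# Hypothetical directory with two sample files; start from a clean slate.
rm -rf demo_dir demo_dir.tar.gz
mkdir demo_dir
seq 1 1000 > demo_dir/a.txt
seq 1 2000 > demo_dir/b.txt

# One archive for the whole directory: tar bundles it, -z compresses with gzip.
tar -czf demo_dir.tar.gz demo_dir
tar -tzf demo_dir.tar.gz

# Plain "gzip demo_dir" would warn that it is a directory and skip it.
# -r instead walks the directory and compresses each file separately, in place.
gzip -r demo_dir
ls demo_dir
```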

Despite these limitations, gzip-compressed files can be read without extracting them first, using tools such as zcat, zless, and zgrep. However, Gzip's dependence on a single processor core is what makes large-file compression so slow.
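Tools like zcat and zgrep stream the decompressed data instead of writing an extracted copy to disk. A small sketch, using a hypothetical log file app.log.gz created on the spot:

```shell
#!/bin/sh
set -e

# Create a small compressed log to experiment with (hypothetical file name).
printf 'INFO start\nERROR disk full\nINFO done\n' | gzip > app.log.gz

# zcat decompresses to stdout only; no app.log file is created.
zcat app.log.gz | head -n 2

# zgrep searches inside the compressed file directly; -c counts matching lines.
zgrep -c ERROR app.log.gz
```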

Gzip Alternatives for High Compression in Linux

This article's title promises faster gzipping of large files, but getting there means looking past Gzip itself to a Gzip-compatible alternative that avoids its single-core bottleneck.

Pigz – Compress 100GB+ Files with High Compression

Unlike Gzip, which uses a single core, Pigz (a parallel implementation of gzip) spreads the work across multiple cores and still produces standard “.gz” files.

This makes it considerably faster on large files, such as 100GB+ ones. Think of Pigz as a multi-threaded version of Gzip.
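Because Pigz writes standard .gz output, it can stand in for gzip where another tool drives the compression. For example, GNU tar can hand compression off to Pigz with --use-compress-program (short form -I). The sketch below assumes GNU tar, uses a hypothetical directory project_dir, and falls back to tar's built-in gzip support (-z) when pigz is not installed:

```shell
#!/bin/sh
set -e

# Hypothetical directory to archive.
rm -rf project_dir project_dir.tar.gz
mkdir project_dir
seq 1 100000 > project_dir/data.txt

if command -v pigz >/dev/null 2>&1; then
    # GNU tar pipes the archive through pigz for multi-threaded compression.
    tar --use-compress-program=pigz -cf project_dir.tar.gz project_dir
else
    # Fallback: tar's built-in, single-threaded gzip compression.
    tar -czf project_dir.tar.gz project_dir
fi

# The result is an ordinary .tar.gz, so gzip-aware tools can read it.
tar -tzf project_dir.tar.gz
```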

Install Pigz in Linux

Use the Pigz installation command that matches your Linux distribution.

$ sudo apt-get install pigz     [On Debian, Ubuntu and Mint]
$ sudo yum install pigz         [On RHEL/CentOS/Fedora and Rocky Linux/AlmaLinux]
$ sudo emerge -a sys-apps/pigz  [On Gentoo Linux]
$ sudo pacman -S pigz           [On Arch Linux]
$ sudo zypper install pigz      [On OpenSUSE]

Compressing 100GB+ Files with Pigz

Consider the following statistics for a 100GB+ file:

$ stat LinuxShellTipsBackup.iso
Check Size of File in Linux

This file (LinuxShellTipsBackup.iso) is 164GB, well over 100GB, so it is a good candidate for compression with Pigz.

Pigz's command syntax is similar to Gzip's.

$ pigz -9 -k -p4 LinuxShellTipsBackup.iso

The command options:

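  • -9: Uses the highest compression level (smallest output, slowest compression).
  • -k: Keeps the original file instead of deleting it after compression.
  • -p4: Tells Pigz to use 4 compression threads; without -p, it uses as many threads as there are online processors.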

More threads generally mean faster compression, up to the number of cores your CPU provides. Choose the -p value based on your processor (a Core i3, for example, typically has fewer cores than a Core i7); the nproc command reports how many processing units are available.
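Putting the options together, here is a hedged sketch that sizes -p from nproc instead of hard-coding 4. The input file big_file.txt is hypothetical and generated only for illustration, and the script falls back to single-threaded gzip (which has no thread option) if pigz is missing:

```shell
#!/bin/sh
set -e

# Hypothetical input, generated only for this demo; use your own large file instead.
INPUT=big_file.txt
seq 1 1000000 > "$INPUT"

# nproc (GNU coreutils) prints the number of processing units available.
THREADS=$(nproc)

if command -v pigz >/dev/null 2>&1; then
    # -9: best compression, -k: keep the original, -f: overwrite an old .gz,
    # -p: number of compression threads.
    pigz -9 -k -f -p "$THREADS" "$INPUT"
else
    # gzip is single-threaded, so there is no thread count to pass.
    echo "pigz not found; falling back to gzip" >&2
    gzip -9 -k -f "$INPUT"
fi

ls -l "$INPUT" "$INPUT.gz"
```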

The compressed file is 156GB, down from the original 164GB. If we decide not to keep the original file, we gain 8GB (164GB - 156GB) of extra storage.
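The same savings arithmetic can be checked for any file with a few lines of shell. This sketch assumes GNU coreutils stat (where -c %s prints the size in bytes) and uses a hypothetical generated file:

```shell
#!/bin/sh
set -e

# Hypothetical file; point ORIG_FILE at your own file instead.
ORIG_FILE=demo_report.txt
seq 1 200000 > "$ORIG_FILE"
gzip -k -f "$ORIG_FILE"

# GNU stat: -c %s prints the file size in bytes.
orig=$(stat -c %s "$ORIG_FILE")
comp=$(stat -c %s "$ORIG_FILE.gz")

# Space freed by deleting the original, and compressed size as a whole-number percentage.
saved=$((orig - comp))
pct=$((comp * 100 / orig))
echo "original: $orig bytes, compressed: $comp bytes"
echo "saved: $saved bytes (compressed file is ${pct}% of the original)"
```

Applied to the figures above, 156GB is about 95% of 164GB, so compression saved roughly 5% on that file.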

Another advantage is that you can still read the contents of the compressed file without extracting it to disk first, for example by streaming it with zcat or pigz -dc.

$ stat LinuxShellTipsBackup.iso.gz
Verify Size of File in Linux

To decompress or extract your files, use either of the following commands:

$ unpigz LinuxShellTipsBackup.iso.gz
or
$ pigz -d LinuxShellTipsBackup.iso.gz
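Before deleting the original of a file compressed with -k, it is worth confirming that the .gz decompresses back to identical data. A minimal sketch, assuming sha256sum from GNU coreutils and a hypothetical file verify_demo.txt; because Pigz produces standard gzip output, plain gzip -dc can perform the check whichever tool did the compression:

```shell
#!/bin/sh
set -e

# Hypothetical file, compressed with -k so the original is kept.
FILE=verify_demo.txt
seq 1 500000 > "$FILE"
if command -v pigz >/dev/null 2>&1; then
    pigz -k -f "$FILE"
else
    gzip -k -f "$FILE"
fi

# Decompress to stdout (nothing is written to disk) and compare checksums.
orig_sum=$(sha256sum < "$FILE" | cut -d ' ' -f 1)
gz_sum=$(gzip -dc "$FILE.gz" | sha256sum | cut -d ' ' -f 1)

if [ "$orig_sum" = "$gz_sum" ]; then
    echo "OK: $FILE.gz matches $FILE"
else
    echo "Mismatch: keep $FILE" >&2
    exit 1
fi
```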

Pigz vs Gzip Compression Speed Comparison

Let us compare their compression speed using a smaller file. Both commands replace file.mp4 with file.mp4.gz, so restore the original file before the second run.

For Pigz,

$ time pigz file.mp4
Compress File with Pigz

For Gzip,

$ time gzip file.mp4

Compress File with Gzip

Pigz wins with a faster compression time even without specifying the number of threads, because by default it uses all available processor cores.
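A fairer comparison gives both tools the same untouched input. The sketch below writes each tool's output to its own file with -c, so the original is never replaced, and measures wall-clock time with GNU date's %N (nanoseconds) field rather than the shell's time keyword so that it runs under any POSIX sh. The input file is hypothetical, and timings will vary by machine:

```shell
#!/bin/sh
set -e

# Hypothetical test input; substitute a real file such as file.mp4.
FILE=bench_input.txt
seq 1 2000000 > "$FILE"

# bench LABEL OUTFILE COMMAND...: run COMMAND with stdout sent to OUTFILE
# and print the elapsed wall-clock time in milliseconds.
bench() {
    label=$1
    out=$2
    shift 2
    start=$(date +%s%N)
    "$@" > "$out"
    end=$(date +%s%N)
    echo "$label: $(( (end - start) / 1000000 )) ms"
}

# -c writes compressed data to stdout and leaves the input file in place.
bench gzip "$FILE.gzip.gz" gzip -c "$FILE"

if command -v pigz >/dev/null 2>&1; then
    bench pigz "$FILE.pigz.gz" pigz -c "$FILE"
else
    echo "pigz not installed; skipping the pigz run" >&2
fi
```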

The key to compressing large (100GB+) files quickly is to use a compression tool that supports multi-core or multi-threaded processing. Such tools (e.g. Pigz) reduce the single-core bottleneck that makes compressing large files so slow.
