Context
I often find that, when transferring files to a remote server, it is much more convenient to send them as one archive rather than individually.
There are many ways to accomplish this, but my favorite solution is zip. As long as both the sending and receiving sides run some flavor of Unix (or Cygwin / Git Bash under Windows), the zip utility is very likely to be available. And, particularly relevant to me, zip also comes pre-installed on the Linux VMs underlying the Databricks cluster nodes on MS Azure, whereas tar is not directly available.
This post is intended as a quick reference on the usage of the Unix zip command.
Usage of the zip Command
# Zip up a directory
zip -r archive.zip directory_to_zip
# Unzip archive into the current directory
unzip archive.zip
# Delete some files from an existing archive
zip -d archive.zip unwanted_file.txt
Bonus: Using the Python zipfile module
Python's standard library comes with the zipfile module, which can be used to handle operations on .zip files directly from within Python. For one-off tasks I personally prefer the command line. But when working with many zip files, it makes sense to automate the process using Python. Here is a quick pointer on how this can be done:
import zipfile
# Open the archive read-only and extract all members into target_directory
with zipfile.ZipFile("archive.zip", "r") as zip_file:
    zip_file.extractall("target_directory")
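To illustrate the "many zip files" case mentioned above, here is a minimal sketch that extracts every archive in a folder. The function name `extract_all_archives` and the choice to give each archive its own subfolder (named after the archive) are assumptions made for illustration, not something prescribed by the zipfile module.

```python
import zipfile
from pathlib import Path


def extract_all_archives(source_dir, target_dir):
    """Extract every .zip file in source_dir into its own subfolder of target_dir.

    Hypothetical helper: the one-subfolder-per-archive layout is an assumption.
    """
    extracted = []
    for archive_path in sorted(Path(source_dir).glob("*.zip")):
        # Use the archive name without ".zip" as the destination folder name
        destination = Path(target_dir) / archive_path.stem
        with zipfile.ZipFile(archive_path, "r") as zip_file:
            zip_file.extractall(destination)
        extracted.append(destination)
    return extracted
```

Calling `extract_all_archives("downloads", "unpacked")` would then unpack `downloads/a.zip` into `unpacked/a`, and so on. Only archives from a trusted source should be extracted this way.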
While zipfile is purpose-built for handling .zip files, the shutil module has options for packing and unpacking archives as well. It can be preferable in certain situations because it handles multiple archive types and can automatically detect the correct type and compression format from the file extension when unpacking. (See Stack Overflow)
# Make archive
# Note: the base name is given without extension; shutil appends ".zip" itself
import shutil
shutil.make_archive('archive', 'zip', 'directory_to_zip')
# Unpack an archive
import shutil
shutil.unpack_archive('archive.zip', 'destination_directory')
# Note: Can use with `pathlib.Path` objects instead of strings
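As a rough sketch of how the two shutil calls fit together, the following packs a directory and unpacks it again. The helper name `round_trip` and the folder names `archive` and `unpacked` are hypothetical. The sketch assumes `work_dir` lies outside `directory_to_zip`, so the archive does not end up packing itself.

```python
import shutil
from pathlib import Path


def round_trip(directory_to_zip, work_dir):
    """Pack directory_to_zip into work_dir/archive.zip, then unpack it again.

    Hypothetical helper for illustration; work_dir must not be inside
    directory_to_zip.
    """
    # The base name carries no extension; make_archive appends ".zip"
    base_name = str(Path(work_dir) / "archive")
    # make_archive returns the full path of the archive it created
    archive_path = shutil.make_archive(base_name, "zip", root_dir=directory_to_zip)
    destination = Path(work_dir) / "unpacked"
    # The format is inferred from the ".zip" extension
    shutil.unpack_archive(archive_path, destination)
    return Path(archive_path), destination
```

Using the return value of `make_archive` rather than rebuilding the file name by hand avoids mistakes with the automatically appended extension.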
Bonus: How to use tar instead
On some more exotic systems, however, such as Alpine Linux (which powers iSH, a Linux shell for iOS), zip doesn't come pre-installed but tar does.
So for completeness, here is how we can pack and unpack a collection of files with tar if zip is not available or if we are faced with a tar file from another source:
# Create a tar archive from a directory
tar -cf archive_name.tar directory_to_archive
# c: create
# f: filename of the new tar archive
# Unpack a tar archive
tar -xf archive_name.tar
# x: extract
Tar files don't use any compression by default. They are simply a way to package multiple files together. If we want compression, we can apply gzip or bzip2 to the tar file:
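# Optional: Create a tar file and compress it to save space
# Using gzip (replaces archive.tar with archive.tar.gz)
tar -cf archive.tar directory_to_archive
gzip archive.tar
# Using bzip2 (slower but usually higher compression; replaces archive.tar with archive.tar.bz2)
tar -cf archive.tar directory_to_archive
bzip2 archive.tar
# Uncompress and unpack the archive
# gunzip replaces archive.tar.gz with archive.tar
gunzip archive.tar.gz
tar -xf archive.tar
# bunzip2 replaces archive.tar.bz2 with archive.tar
bunzip2 archive.tar.bz2
tar -xf archive.tar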
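For readers who would rather do the tar-plus-compression step from Python, the shutil module mentioned earlier can produce compressed tar archives in one call. The sketch below is one possible approach, not the only one. The helper name `make_compressed_tar` is hypothetical, and it assumes that Python's zlib and bz2 modules are available, which is usually but not always the case.

```python
import shutil
from pathlib import Path


def make_compressed_tar(directory_to_archive, base_name, archive_format="gztar"):
    """Create a compressed tar archive of a directory.

    Hypothetical helper. archive_format "gztar" yields <base_name>.tar.gz,
    "bztar" yields <base_name>.tar.bz2.
    """
    directory = Path(directory_to_archive).resolve()
    # root_dir/base_dir are chosen so the directory itself is the top-level
    # entry in the archive, similar to `tar -cf archive.tar directory_to_archive`
    return shutil.make_archive(
        str(base_name),
        archive_format,
        root_dir=directory.parent,
        base_dir=directory.name,
    )
```

The returned path can be passed straight to `shutil.unpack_archive`, which picks the right decompression from the `.tar.gz` or `.tar.bz2` extension.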
When I started writing this post, my intention was only to provide a quick reference on how to use the zip/unzip Unix commands.
It turned out to cover a few more approaches, and I am happy to have them all here in one place for reference. That said, for simple transfers zip remains my go-to solution.
Reference / Further Details
- zipfile Python Docs
- Computer Hope – Unix zip command
- Computer Hope – Unix tar command
- Usage of zip command
- Usage of Unix tar command