A project of mine involves extracting files from .tar.gz and .zip archives as Python streams.
As this is Python and the modules do similar things, you might expect they have similar interfaces. Or at least consistent interfaces.
Unfortunately they are annoyingly different. Consider:
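|task|tarfile|zipfile|
|open an archive|tarfile.open(filename)|zipfile.ZipFile(filename)|
|get a list of members of the archive|opened_tarfile.getmembers()|opened_zipfile.infolist()|
|get name of a member|member.name|member.filename|
|get size of a member|member.size|member.file_size|
|extract an archive member (create a file on the hard disk)|opened_tarfile.extract(member)|opened_zipfile.extract(member)|
|get a member as a file-like Python object|opened_tarfile.extractfile(member)|opened_zipfile.open(member)|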
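To make the differences concrete, here is a minimal side-by-side sketch of listing and reading members with each module. The helper names (list_tar_members, list_zip_members, read_tar_member, read_zip_member) are made up for illustration; only the tarfile and zipfile calls inside them are standard library API. It assumes Python 2.7 or later, where both archive objects can be used in a with statement.

```python
import tarfile
import zipfile


def list_tar_members(path):
    """Return (name, size) pairs for the regular files in a tar archive."""
    # tarfile is opened through a module-level function.
    with tarfile.open(path) as opened_tarfile:
        return [(m.name, m.size)
                for m in opened_tarfile.getmembers() if m.isfile()]


def list_zip_members(path):
    """Return (name, size) pairs for the entries in a ZIP archive."""
    # zipfile is opened by instantiating the ZipFile class.
    with zipfile.ZipFile(path) as opened_zipfile:
        return [(m.filename, m.file_size)
                for m in opened_zipfile.infolist()]


def read_tar_member(path, name):
    """Return the bytes of one tar member (assumed to be a regular file)."""
    with tarfile.open(path) as opened_tarfile:
        fileobj = opened_tarfile.extractfile(name)  # file-like object
        try:
            return fileobj.read()
        finally:
            fileobj.close()


def read_zip_member(path, name):
    """Return the bytes of one ZIP member."""
    with zipfile.ZipFile(path) as opened_zipfile:
        fileobj = opened_zipfile.open(name)  # file-like object
        try:
            return fileobj.read()
        finally:
            fileobj.close()
```

Same four tasks, four different spellings: open() vs ZipFile(), getmembers() vs infolist(), name/size vs filename/file_size, and extractfile() vs open().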
There are some more catches. If you opened the ZIP archive with
ZipFile(zipfilename) and want to extract more than one member, each extraction will open and close the ZIP file separately, so use
with open(zipfilename, 'rb') as zipfp: ZipFile(zipfp) instead.
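A rough sketch of that workaround, reading several members through one already-open file handle. The function name read_zip_members and its arguments are hypothetical, chosen just for this example. Note the binary mode: it is needed on Python 3 and harmless on Python 2.

```python
import zipfile


def read_zip_members(zip_path, names):
    """Read several members while reusing a single open file handle."""
    contents = {}
    # Open the file ourselves and hand the file object to ZipFile,
    # instead of passing the filename.
    with open(zip_path, "rb") as zipfp:
        archive = zipfile.ZipFile(zipfp)
        try:
            for name in names:
                member = archive.open(name)
                try:
                    contents[name] = member.read()
                finally:
                    member.close()
        finally:
            # Closing the ZipFile leaves zipfp open; the with block closes it.
            archive.close()
    return contents
```

The explicit try/finally blocks are there so the sketch does not rely on context manager support in either the archive or the member objects.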
Also, for tar archives, in Python 3, the result of
opened_tarfile.extractfile() inherits from BufferedReader and so supports a context manager. In Python 2 it inherits from object, implements
read() itself, and doesn’t include the
__exit__() required to support a context manager. Extracting members out of ZIP archives with
opened_zipfile.open() gets a context manager-capable object since Python 2.7.
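One way to paper over the missing __exit__() is contextlib.closing(), which only needs the wrapped object to have a close() method. This is a sketch of one possible approach, not the only one. The function name read_tar_member_portable is invented for this example, and it assumes the member is a regular file (extractfile() returns None for members that have no data to read, such as directories).

```python
import contextlib
import tarfile


def read_tar_member_portable(tar_path, member_name):
    """Read one tar member with a with-block on both Python 2 and 3."""
    with contextlib.closing(tarfile.open(tar_path)) as opened_tarfile:
        # closing() calls close() on exit, so it does not matter whether
        # the extracted file object defines __enter__()/__exit__() itself.
        extracted = opened_tarfile.extractfile(member_name)
        with contextlib.closing(extracted) as member:
            return member.read()
```

The same wrapper works for opened_zipfile.open() too, so code that handles both archive types can treat the extracted objects uniformly.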
I understand what happened here: tarfile was added in 2.3 and was different from (and better than) zipfile because tarfile’s author
“didn’t like [zipfile] interface very much”. New things were added to zipfile in 2.6 and again in 2.7, and tarfile was improved in Python 3. But that doesn’t make it any less annoying to write code that works with both tarfile and zipfile on both Python 2 and Python 3. We’re stuck with two frustratingly different interfaces for very similar tasks for a while.