Pro.Zip — API for parsing Zip archives

Overview

The Pro.Zip module contains the API for parsing Zip archives.

Parsing a Zip Archive

from Pro.Core import *
from Pro.Zip import *

def parseZipArchive(fname):
    c = createContainerFromFile(fname)
    if c.isNull():
        return
    obj = ZipObject()
    if not obj.Load(c) or not obj.Initialize():
        return
    n = obj.GetEntryCount()
    for i in range(n):
        entry = obj.GetEntry(i)
        print("name:", entry.Str("FileName"))
        print("compressed size:", entry.Uns("CompressedSize"))
        print("uncompressed size:", entry.Uns("UncompressedSize"))
        # extract data
        data = obj.Extract(entry)

Module API

Pro.Zip module API.

Classes:

CentralDirectoryData()

This class contains the central directory data of a Zip archive.

FileHeader64Data()

This class represents the file header for an individual entry within a Zip archive.

ZipObject()

This class represents a Zip archive.

Attributes:

ZIP_MAX_FILE_ENTRIES

Maximum number of file entries in a Zip file.

class CentralDirectoryData

This class contains the central directory data of a Zip archive.

See also ZipObject.GetCentralDirectoryData().

Attributes:

central_directory_disk

The disk number where the central directory starts.

disk

The number of the disk on which the central directory record of this archive is located.

offset

The offset of the start of the central directory on the disk on which the central directory starts.

size

The size of the central directory.

total_disk_entries

The total number of entries in the central directory on this disk.

total_entries

The total number of entries in the central directory for the entire Zip archive, across all disks.

central_directory_disk

The disk number where the central directory starts. This is particularly relevant for Zip archives spanning multiple disks, allowing the system to locate the beginning of the central directory. In single-disk archives, this field is usually set to zero.

disk

The number of the disk on which the central directory record of this archive is located. In the context of multi-disk Zip archives, this field is essential to identify the specific disk containing the central directory entry for a file. For single-disk Zip archives, this will typically be zero.

offset

The offset of the start of the central directory on the disk on which the central directory starts. It is essential for locating the central directory in the archive, especially in the context of multi-disk or large archives where the central directory may not start at the beginning of the disk or file.

size

The size of the central directory.

total_disk_entries

The total number of entries in the central directory on this disk. For multi-disk archives, it indicates the number of entries within the central directory that are on the current disk. For a single-disk archive, it would typically match the total_entries field.

total_entries

The total number of entries in the central directory for the entire Zip archive, across all disks. It provides a count of all individual entries, giving an overview of all the files and directories included in the Zip file.
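To make the meaning of these fields concrete, the following minimal sketch reads the classic (non-Zip64) end of central directory record using only the Python standard library. It does not use Pro.Zip. The helper names read_end_of_central_directory and build_sample_archive are hypothetical, and the mapping of record fields to the attribute names above is an assumption based on their descriptions.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
# signature, this disk, disk with central directory start, entries on this
# disk, total entries, central directory size, central directory offset,
# comment length (little-endian, 22 bytes in total)
EOCD_FORMAT = "<4sHHHHIIH"

def read_end_of_central_directory(raw):
    """Parse the last end of central directory record found in raw bytes."""
    # Simplification: a real parser must also rule out matches inside the comment.
    pos = raw.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + struct.calcsize(EOCD_FORMAT) > len(raw):
        return None
    (_sig, disk, cd_disk, disk_entries, total_entries,
     size, offset, _comment_len) = struct.unpack_from(EOCD_FORMAT, raw, pos)
    return {
        "disk": disk,
        "central_directory_disk": cd_disk,
        "total_disk_entries": disk_entries,
        "total_entries": total_entries,
        "size": size,
        "offset": offset,
    }

def build_sample_archive():
    """Create a small two-file archive in memory for demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", "hello")
        zf.writestr("b.txt", "world")
    return buf.getvalue()

info = read_end_of_central_directory(build_sample_archive())
print(info)
```

For a single-disk archive such as this one, both disk numbers are zero and total_disk_entries equals total_entries.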

class FileHeader64Data

This class represents the file header for an individual entry within a Zip archive.

See also ZipObject.GetFileHeader64Data().

Attributes:

CompressedSize

The size of the compressed file within the Zip archive.

Disk

The disk number on which this file entry is located, used in multi-disk Zip archives.

LocalHeaderOffset

The offset within the archive at which the local header of the file begins.

UncompressedSize

The size of the file after decompression.

CompressedSize

The size of the compressed file within the Zip archive.

See also UncompressedSize.

Disk

The disk number on which this file entry is located, used in multi-disk Zip archives.

LocalHeaderOffset

The offset within the archive at which the local header of the file begins. The local header contains specific information about the file, including its name, compression method, etc.

UncompressedSize

The size of the file after decompression.

See also CompressedSize.

ZIP_MAX_FILE_ENTRIES: Final[int]

Maximum number of file entries in a Zip file.

See also ZipObject.RetrieveEntries().

class ZipObject

Bases: Pro.Core.CFFObject

This class represents a Zip archive.

Methods:

CentralDirectoryEnd()

Retrieves the structure of the central directory in both Zip32 and Zip64 archives.

CentralDirectoryEnd32()

Retrieves the structure of the central directory in a Zip archive.

CentralDirectoryEnd64(offset)

Retrieves the structure of the central directory in a Zip64 archive.

CentralDirectoryEndLocator64(offset)

Finds the locator of the central directory in a Zip64 archive.

CentralDirectoryEndOffset32()

Finds the central directory in a Zip32 archive.

CentralDirectoryOffset()

Finds the central directory in both Zip32 and Zip64 archives.

DumpExtraField(fh, out)

Outputs to text the extra field in a file header.

DumpFileHeader(s, out)

Outputs to text a file header.

DumpHeaders(out)

Outputs to text the central directory data.

ExtraFieldBuffer(fh)

Retrieves as a buffer the extra field in a file header.

Extract(fh)

Extracts the data of a file, applying the necessary decompression.

ExtractTo(fh, dst)

Extracts the data of a file to a specified destination container, applying the necessary decompression.

FileHeader(offset)

Retrieves a file header structure from the offset of a file header.

FileHeaders()

Retrieves the array of file headers of the central directory.

FindEntry(name)

Looks up a file entry by name.

GetCentralDirectoryData(data)

Retrieves the data of the central directory.

GetCompressedData(fh)

Retrieves the compressed data of a file entry.

GetCompressionMethodName(comptype)

Returns the name of a compression method by its value.

GetEntries()

Retrieves the internally stored entries.

GetEntry(i)

Retrieves a file entry.

GetEntryCount()

Returns the number of file entries.

GetEntryName(i)

Retrieves the name of a file entry.

GetEntryOffset(i)

Retrieves the offset of a file entry.

GetExtraFieldDescr(hid)

Retrieves a description for an extra field.

GetFileHeader64Data(fh, data)

Extracts the data from a file header structure.

HasEntry(name)

Checks if a file entry exists.

RetrieveEntries([max_entries, parse_corrupted])

Finds the file entries in a Zip archive.

SetEntries(entries)

Sets the internal file entries.

CentralDirectoryEnd() → Pro.Core.CFFStruct

Retrieves the structure of the central directory in both Zip32 and Zip64 archives.

Returns

Returns a valid structure if successful; otherwise returns an invalid structure.

Return type

CFFStruct

See also GetCentralDirectoryData().

CentralDirectoryEnd32() → Pro.Core.CFFStruct

Retrieves the structure of the central directory in a Zip archive.

Returns

Returns a valid structure if successful; otherwise returns an invalid structure.

Return type

CFFStruct

See also CentralDirectoryEnd().

CentralDirectoryEnd64(offset: int) → Pro.Core.CFFStruct

Retrieves the structure of the central directory in a Zip64 archive.

Parameters

offset (int) – The offset of the central directory.

Returns

Returns a valid structure if successful; otherwise returns an invalid structure.

Return type

CFFStruct

See also CentralDirectoryEnd() and CentralDirectoryEndLocator64().

CentralDirectoryEndLocator64(offset: int) → Pro.Core.CFFStruct

Finds the locator of the central directory in a Zip64 archive.

Parameters

offset (int) – The start offset from where to search.

Returns

Returns a valid structure if successful; otherwise returns an invalid structure.

Return type

CFFStruct

CentralDirectoryEndOffset32() → int

Finds the central directory in a Zip32 archive.

Returns

Returns the offset of the central directory if successful; otherwise returns Pro.Core.INVALID_STREAM_OFFSET.

Return type

int

See also CentralDirectoryOffset().

CentralDirectoryOffset() → int

Finds the central directory in both Zip32 and Zip64 archives.

Returns

Returns the offset of the central directory if successful; otherwise returns Pro.Core.INVALID_STREAM_OFFSET.

Return type

int

See also GetCentralDirectoryData().

DumpExtraField(fh: Pro.Core.CFFStruct, out: Pro.Core.NTTextStream) → bool

Outputs to text the extra field in a file header.

Parameters
  • fh (CFFStruct) – The file header structure.

  • out (NTTextStream) – The output text stream.

Returns

Returns True if successful; otherwise returns False.

Return type

bool

See also DumpFileHeader().

DumpFileHeader(s: Pro.Core.CFFStruct, out: Pro.Core.NTTextStream) → bool

Outputs to text a file header.

Parameters
  • s (CFFStruct) – The file header structure.

  • out (NTTextStream) – The output text stream.

Returns

Returns True if successful; otherwise returns False.

Return type

bool

DumpHeaders(out: Pro.Core.NTTextStream) → bool

Outputs to text the central directory data.

Parameters

out (NTTextStream) – The output text stream.

Returns

Returns True if successful; otherwise returns False.

Return type

bool

See also DumpFileHeader().

ExtraFieldBuffer(fh: Pro.Core.CFFStruct) → Pro.Core.NTContainerBuffer

Retrieves as a buffer the extra field in a file header.

Parameters

fh (CFFStruct) – The file header structure.

Returns

Returns a valid buffer instance if successful; otherwise returns an invalid Pro.Core.NTContainerBuffer instance.

Return type

NTContainerBuffer

Extract(fh: Pro.Core.CFFStruct) → Pro.Core.NTContainer

Extracts the data of a file, applying the necessary decompression.

Parameters

fh (CFFStruct) – The file header structure of the file to extract.

Returns

Returns a valid container instance if successful; otherwise returns an invalid Pro.Core.NTContainer instance.

Return type

NTContainer

See also ExtractTo() and GetCompressedData().

ExtractTo(fh: Pro.Core.CFFStruct, dst: Pro.Core.NTContainer) → bool

Extracts the data of a file to a specified destination container, applying the necessary decompression.

Parameters
  • fh (CFFStruct) – The file header structure of the file to extract.

  • dst (NTContainer) – The container used to extract the data.

Returns

Returns True if successful; otherwise returns False.

Return type

bool

See also Extract() and GetCompressedData().
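As an illustration of how a lookup and an extraction might be combined, the sketch below wraps FindEntry() and Extract() in a hypothetical helper, extract_by_name. It takes an already loaded and initialized ZipObject as obj. The IsNull() check on the returned structure is an assumption about Pro.Core.CFFStruct, while isNull() on the container follows the parsing example at the top of this page.

```python
def extract_by_name(obj, name):
    """Return the decompressed data container for `name`, or None on failure."""
    entry = obj.FindEntry(name)
    if entry.IsNull():  # assumption: an invalid CFFStruct reports IsNull() == True
        return None
    data = obj.Extract(entry)
    if data.isNull():  # an invalid NTContainer signals a failed extraction
        return None
    return data
```

Returning None for both failure cases keeps the caller's check to a single comparison. A caller that needs to tell a missing entry from a failed decompression would have to separate the two.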

FileHeader(offset: int) → Pro.Core.CFFStruct

Retrieves a file header structure from the offset of a file header.

Note

This method handles both local and non-local file headers.

Parameters

offset (int) – The offset of the file header.

Returns

Returns a valid structure if successful; otherwise returns an invalid structure.

Return type

CFFStruct

FileHeaders() → Pro.Core.CFFStruct

Retrieves the array of file headers of the central directory.

Returns

Returns a valid structure if successful; otherwise returns an invalid structure.

Return type

CFFStruct

FindEntry(name: str) → Pro.Core.CFFStruct

Looks up a file entry by name.

Parameters

name (str) – The name of the file entry.

Returns

Returns a valid structure if the entry is found; otherwise returns an invalid structure.

Return type

CFFStruct

GetCentralDirectoryData(data: Pro.Zip.CentralDirectoryData) → bool

Retrieves the data of the central directory.

Parameters

data (CentralDirectoryData) – The structure of the data to be retrieved.

Returns

Returns True if successful; otherwise returns False.

Return type

bool
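A possible way to consume this method is sketched below. The helper central_directory_summary is hypothetical. It expects a loaded ZipObject as obj and a fresh CentralDirectoryData() instance as data, which the method fills in. The multi_disk heuristic is an assumption derived from the attribute descriptions, not documented behavior.

```python
def central_directory_summary(obj, data):
    """Fill `data` through GetCentralDirectoryData() and summarize it."""
    if not obj.GetCentralDirectoryData(data):
        return None
    return {
        "entries": data.total_entries,
        "size": data.size,
        "offset": data.offset,
        # heuristic: the counts differ only when the archive spans several disks
        "multi_disk": data.total_disk_entries != data.total_entries,
    }
```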

GetCompressedData(fh: Pro.Core.CFFStruct) → Pro.Core.NTContainer

Retrieves the compressed data of a file entry.

Parameters

fh (CFFStruct) – The file header structure.

Returns

Returns a valid container instance if successful; otherwise returns an invalid Pro.Core.NTContainer instance.

Return type

NTContainer

See also Extract() and ExtractTo().

GetCompressionMethodName(comptype: int) → str

Returns the name of a compression method by its value.

Parameters

comptype (int) – The compression type.

Returns

Returns the compression method name if successful; otherwise returns "Unknown".

Return type

str

GetEntries() → Pro.Core.NTContainer

Retrieves the internally stored entries.

Returns

Returns the stored entries.

Return type

NTContainer

See also SetEntries() and RetrieveEntries().

GetEntry(i: int) → Pro.Core.CFFStruct

Retrieves a file entry.

Parameters

i (int) – The index of the file entry.

Returns

Returns a valid file entry if successful; otherwise returns an invalid structure.

Return type

CFFStruct

See also GetEntryCount().
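The sketch below shows one way to iterate over entries with GetEntryCount() and GetEntry() to total the sizes stored in the file headers. The helper name total_sizes is hypothetical. The field names "CompressedSize" and "UncompressedSize" are the ones used in the parsing example at the top of this page. For Zip64 entries these 32-bit header fields may hold placeholder values, in which case GetFileHeader64Data() is the more reliable source.

```python
def total_sizes(obj):
    """Return (compressed, uncompressed) byte totals over all entries."""
    compressed = 0
    uncompressed = 0
    for i in range(obj.GetEntryCount()):
        entry = obj.GetEntry(i)
        compressed += entry.Uns("CompressedSize")
        uncompressed += entry.Uns("UncompressedSize")
    return compressed, uncompressed
```

A very large ratio between the two totals can serve as a cheap hint that an archive deserves a closer look before extraction.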

GetEntryCount() → int

Returns

Returns the number of file entries.

Return type

int

See also GetEntry().

GetEntryName(i: int) → str

Retrieves the name of a file entry.

Parameters

i (int) – The index of the file entry.

Returns

Returns the name if successful; otherwise returns an empty string.

Return type

str

See also GetEntry().

GetEntryOffset(i: int) → int

Retrieves the offset of a file entry.

Parameters

i (int) – The index of the file entry.

Returns

Returns the offset of the file entry if successful; otherwise returns Pro.Core.INVALID_STREAM_OFFSET.

Return type

int

See also GetEntry().
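The index-based accessors can be combined to build a simple listing without retrieving each file header structure. The following is a minimal sketch with a hypothetical helper, list_entries, which assumes obj is a loaded and initialized ZipObject.

```python
def list_entries(obj):
    """Return a list of (name, offset) pairs for every file entry."""
    return [
        (obj.GetEntryName(i), obj.GetEntryOffset(i))
        for i in range(obj.GetEntryCount())
    ]
```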

GetExtraFieldDescr(hid: int) → str

Retrieves a description for an extra field.

Parameters

hid (int) – The id of the extra field.

Returns

Returns a description if successful; otherwise returns an empty string.

Return type

str

GetFileHeader64Data(fh: Pro.Core.CFFStruct, data: Pro.Zip.FileHeader64Data) → tuple

Extracts the data from a file header structure.

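Parameters
  • fh (CFFStruct) – The file header structure.

  • data (FileHeader64Data) – The structure of the data to be retrieved.

Returns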

Returns a tuple containing two booleans. The first boolean represents the result of the operation, while the second boolean uses True to signal a Zip64 file header structure and False for Zip32.

Return type

tuple[bool, bool]

See also GetEntry().
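Because the method returns two booleans, unpacking the tuple is the natural way to call it. The sketch below uses a hypothetical helper, entry_sizes, which expects a loaded ZipObject as obj, a file header obtained for example from GetEntry(), and a fresh FileHeader64Data() instance as data.

```python
def entry_sizes(obj, entry, data):
    """Return (compressed, uncompressed, is_zip64) for an entry, or None."""
    ok, is_zip64 = obj.GetFileHeader64Data(entry, data)
    if not ok:
        return None
    return data.CompressedSize, data.UncompressedSize, is_zip64
```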

HasEntry(name: str) → bool

Checks if a file entry exists.

Hint

Internally this method calls FindEntry().

Parameters

name (str) – The name of the file entry.

Returns

Returns True if the file entry exists; otherwise returns False.

Return type

bool

See also FindEntry().
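A typical use of HasEntry() is to check whether an archive contains a set of expected members, for instance when identifying a container format that is built on Zip. The helper below, missing_entries, is hypothetical and assumes obj is a loaded and initialized ZipObject.

```python
def missing_entries(obj, required_names):
    """Return the names from `required_names` that the archive lacks."""
    return [name for name in required_names if not obj.HasEntry(name)]
```

An empty result means every required entry is present.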

RetrieveEntries(max_entries: int = ZIP_MAX_FILE_ENTRIES, parse_corrupted: bool = False) → Pro.Core.NTContainer

Finds the file entries in a Zip archive.

Parameters
  • max_entries (int) – The maximum number of file entries to collect.

  • parse_corrupted (bool) – If True, it tries to parse corrupted archives.

Returns

Returns the collected file entries.

Return type

NTContainer

See also SetEntries() and GetEntries().

SetEntries(entries: Pro.Core.NTContainer) → None

Sets the internal file entries.

Parameters

entries (NTContainer) – The file entries.

See also RetrieveEntries() and GetEntries().
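The relationship between RetrieveEntries(), SetEntries() and GetEntryCount() might be sketched as follows. This is an assumption about how the three methods fit together, not a documented workflow: the hypothetical helper reload_entries collects entries with a caller-chosen limit, stores them, and reports how many are then available. It assumes obj is a loaded ZipObject.

```python
def reload_entries(obj, limit, parse_corrupted=False):
    """Re-collect file entries with a custom limit and store them on `obj`."""
    entries = obj.RetrieveEntries(limit, parse_corrupted)
    obj.SetEntries(entries)
    # assumption: GetEntryCount() reflects the entries just stored
    return obj.GetEntryCount()
```

Passing a small limit could be useful when only the first few entries of a very large or hostile archive are of interest.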