`Pro.PDF` — API for parsing PDF documents

Overview

The Pro.PDF module contains the API for parsing PDF documents.

PDF Parsing

The following code example demonstrates how to iterate through all the objects in a PDF document:

from Pro.Core import *
from Pro.PDF import *

def parsePDF(fname):
    # open the file
    c = createContainerFromFile(fname)
    if c.isNull():
        print("error: couldn't open file")
        return
    # load the file as PDF
    pdf = PDFObject()
    if not pdf.Load(c):
        print("error: invalid file format")
        return
    # parse all referenced objects
    objtable = pdf.BuildObjectTable()
    # detect unreferenced objects
    # (corrupted or malicious PDFs may contain them)
    pdf.DetectObjects(objtable)
    # store the object table internally
    pdf.SetObjectTable(objtable)
    # process PDF encryption
    if not pdf.ProcessEncryption():
        print("warning: couldn't decrypt file")
    # [optional] sort objects by ID
    oids = []
    it = objtable.iterator()
    while it.hasNext():
        oid, _ = it.next()
        oids.append(oid)
    oids.sort()
    # iterate through the objects
    for oid in oids:
        # print out the object id
        print("\nOBJECT ID:", oid >> 32, "\n")
        # parse the object
        ret, dictn, content, info = pdf.ParseObject(objtable, oid)
        if not ret:
            print("warning: couldn't parse object %d" % (oid >> 32,))
            continue
        # print out the object dictionary
        it = dictn.iterator()
        while it.hasNext():
            k, v = it.next()
            print("   ", k, "-", v)
        # print out the decoded object stream
        content = pdf.DecodeObjectStream(content, dictn, oid)
        if not content:
            continue
        out = NTTextBuffer()
        out.printHex(content)
        print("\n", out.buffer)
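The example above recovers the object number from an object ID with `oid >> 32`. The following pure-Python sketch models that packing. It assumes, as the shift suggests but this page does not state, that `PDFObject.OBJID()` stores the object number in the upper 32 bits and the generation in the lower 32 bits. The helper names are hypothetical; in real scripts, use `OBJID()`, `OBJIDNUM()` and `OBJIDGEN()`.

```python
# Hypothetical model of the 64-bit object ID layout implied by `oid >> 32`.
# Assumption: object number in the upper 32 bits, generation in the lower 32 bits.
MASK32 = 0xFFFFFFFF

def objid(number: int, generation: int) -> int:
    """Pack an object number and a generation into a single 64-bit ID."""
    return ((number & MASK32) << 32) | (generation & MASK32)

def objid_num(oid: int) -> int:
    """Extract the object number (equivalent to `oid >> 32` in the example)."""
    return (oid >> 32) & MASK32

def objid_gen(oid: int) -> int:
    """Extract the generation number."""
    return oid & MASK32

oid = objid(10, 1)                      # the object "10 1 obj"
print(objid_num(oid), objid_gen(oid))   # 10 1
```

With this layout, sorting raw IDs (as the example does with `oids.sort()`) orders objects by number first and by generation second.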

Hint

Since PDF parsing can be a complex operation, it is often recommended to leverage the scan engine to extract artifacts.

In the following example, a hook is used to extract JavaScript code from a PDF document:

from Pro.Core import *

def printJSEntry(sp, xml, tnode):
    # data node
    dnode = xml.findChild(tnode, "d")
    if not dnode:
        return
    # we let the scan engine extract the JavaScript for us
    params = NTStringVariantHash()
    params.insert("op", "js")
    idnode = xml.findChild(dnode, "id")
    if idnode:
        params.insert("id", int(xml.value(idnode), 16))
    ridnode = xml.findChild(dnode, "rid")
    if ridnode:
        params.insert("rid", int(xml.value(ridnode), 16))
    js = sp.customOperation(params)
    # print out the JavaScript
    print("JS CODE")
    print("-------")
    print(js)

def pdfExtractJS(sp, ud):
    xml = sp.getReportXML()
    # object node
    onode = xml.findChild(None, "o")
    if onode:
        # scan node
        snode = xml.findChild(onode, "s")
        if snode:
            # enumerate scan entries
            tchild = xml.firstChild(snode)
            while tchild:
                if xml.name(tchild) == "t":
                    # type attribute
                    tattr = xml.findAttribute(tchild, "t")
                    # check if it's a JavaScript entry
                    if tattr and int(xml.value(tattr)) == CT_JavaScript:
                        printJSEntry(sp, xml, tchild)
                tchild = xml.nextSibling(tchild)

Module API

Pro.PDF module API.

Classes:

PDFCrossRefTable()

This class represents a PDF cross-reference table.

PDFCrossRefTableList()

List of PDFCrossRefTable elements.

PDFCrossRefTableListIt(obj)

Iterator class for PDFCrossRefTableList.

PDFCrossRefTableSection()

This class represents the section of a PDF cross-reference table.

PDFCrossRefTableSectionList()

List of PDFCrossRefTableSection elements.

PDFCrossRefTableSectionListIt(obj)

Iterator class for PDFCrossRefTableSectionList.

PDFObject()

This class represents a PDF document.

PDFObjectParseInfo()

This class contains the information of a parsed PDF object.

PDFObjectRef()

This class represents a reference to a PDF object.

PDFObjectTable()

Dictionary of int -> PDFObjectRef elements.

PDFObjectTableIt(obj)

Iterator class for PDFObjectTable.

Attributes:

PDF_CROSSREF_ENTRY_SIZE

Size of an entry in a cross-reference table.

PDF_INVALID_OBJECT_REF

Invalid PDF object id.

PDF_OBJREF_FLAG_UNREFERENCED

Flag for unreferenced PDF objects.

Functions:

PDF_CreateEncryptionKey(Password, O, P, …)

Generates an encryption key for a PDF document.

PDF_GenerateOwnerKey(OwnerKey, UserKey, …)

Generates an owner key for a PDF document.

PDF_GenerateUserKey(UserKey, O, P, ID_1, …)

Generates a user key for a PDF document.

PDF_R5_GenerateDecryptionKey(user_password, …)

Generates a revision 5 decryption key for a PDF document.

PDF_R5_GenerateOwnerDecryptionKey(usrpwd, O, …)

Generates a revision 5 owner decryption key for a PDF document.

PDF_R5_GenerateUserDecryptionKey(usrpwd, U, UE)

Generates a revision 5 user decryption key for a PDF document.

PDF_R6_GenerateDecryptionKey(user_password, …)

Generates a revision 6 decryption key for a PDF document.

PDF_R6_GenerateUserDecryptionKey(usrpwd, U, UE)

Generates a revision 6 user decryption key for a PDF document.

class PDFCrossRefTable

This class represents a PDF cross-reference table.

Attributes:

Prev

Offset of the previous cross-reference table if available; otherwise Pro.Core.INVALID_STREAM_OFFSET.

sections

List of PDFCrossRefTableSection.

trailer

Trailer dictionary.

Prev

Offset of the previous cross-reference table if available; otherwise Pro.Core.INVALID_STREAM_OFFSET.

sections

List of PDFCrossRefTableSection.

See also PDFCrossRefTableSectionList.

trailer

Trailer dictionary.

See also Pro.Core.NTStringStringHash.

class PDFCrossRefTableList

List of PDFCrossRefTable elements.

Methods:

append(value)

Inserts value at the end of the list.

at(i)

Returns the item at index position i in the list.

clear()

Removes all items from the list.

contains(value)

Checks the presence of an element in the list.

count(value)

Returns the number of occurrences of value in the list.

indexOf(value[, start])

Searches for an element in the list.

insert(i, value)

Inserts value at index position i in the list.

isEmpty()

Checks whether the list is empty.

iterator()

Creates an iterator for the list.

removeAll(value)

Removes all occurrences of value in the list and returns the number of entries removed.

removeAt(i)

Removes the item at index position i.

reserve(alloc)

Reserves space for alloc elements.

size()

Returns the number of items in the list.

takeAt(i)

Removes the item at index position i and returns it.

append(value: Pro.PDF.PDFCrossRefTable) → None

Inserts value at the end of the list.

Parameters

value (PDFCrossRefTable) – The value to add to the list.

See also insert().

at(i: int) → Pro.PDF.PDFCrossRefTable

Returns the item at index position i in the list. i must be a valid index position in the list (i.e., 0 <= i < size()).

Parameters

i (int) – The index of the element to return.

Returns

Returns the requested element.

Return type

PDFCrossRefTable

clear() → None

Removes all items from the list.

contains(value: Pro.PDF.PDFCrossRefTable) → bool

Checks the presence of an element in the list.

Parameters

value (PDFCrossRefTable) – The value to check for.

Returns

Returns True if the list contains an occurrence of value; otherwise returns False.

Return type

bool

See also indexOf() and count().

count(value: Pro.PDF.PDFCrossRefTable) → int

Returns the number of occurrences of value in the list.

Parameters

value (PDFCrossRefTable) – The value to count.

Returns

Returns the number of occurrences.

Return type

int

See also indexOf() and contains().

indexOf(value: Pro.PDF.PDFCrossRefTable, start: int = 0) → int

Searches for an element in the list.

Parameters

value (PDFCrossRefTable) – The value to search for.

start (int) – The start index.

Returns

Returns the index position of the first occurrence of value in the list. Returns -1 if no item was found.

Return type

int

See also contains().

insert(i: int, value: Pro.PDF.PDFCrossRefTable) → None

Inserts value at index position i in the list. If i is 0, the value is prepended to the list. If i is size(), the value is appended to the list.

Parameters

i (int) – The position at which to add the value.

value (PDFCrossRefTable) – The value to add.

See also append() and removeAt().

isEmpty() → bool

Checks whether the list is empty.

Returns

Returns True if the list contains no items; otherwise returns False.

Return type

bool

See also size().

iterator() → Pro.PDF.PDFCrossRefTableListIt

Creates an iterator for the list.

Returns

Returns the iterator.

Return type

PDFCrossRefTableListIt

removeAll(value: Pro.PDF.PDFCrossRefTable) → int

Removes all occurrences of value in the list and returns the number of entries removed.

Parameters

value (PDFCrossRefTable) – The value to remove from the list.

Returns

Returns the number of entries removed.

Return type

int

See also removeAt().

removeAt(i: int) → None

Removes the item at index position i. i must be a valid index position in the list (i.e., 0 <= i < size()).

Parameters

i (int) – The index of the item to remove.

See also removeAll().

reserve(alloc: int) → None

Reserves space for alloc elements. Calling this method doesn’t change the size of the list.

Parameters

alloc (int) – The number of elements to reserve space for.

size() → int

Returns

Returns the number of items in the list.

Return type

int

See also isEmpty().

takeAt(i: int) → Pro.PDF.PDFCrossRefTable

Removes the item at index position i and returns it. i must be a valid index position in the list (i.e., 0 <= i < size()).

Parameters

i (int) – The index of the element to remove from the list.

Returns

Returns the removed element. If you don’t use the return value, removeAt() is more efficient.

Return type

PDFCrossRefTable

See also removeAt().

class PDFCrossRefTableListIt(obj: Pro.PDF.PDFCrossRefTableList)

Iterator class for PDFCrossRefTableList.

Parameters

obj (PDFCrossRefTableList) – The object to iterate over.

Methods:

hasNext()

Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.

hasPrevious()

Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.

next()

Returns the next item and advances the iterator by one position.

previous()

Returns the previous item and moves the iterator back by one position.

toBack()

Moves the iterator to the back of the container (after the last item).

toFront()

Moves the iterator to the front of the container (before the first item).

hasNext() → bool

Returns

Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.

Return type

bool

See also hasPrevious() and next().

hasPrevious() → bool

Returns

Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.

Return type

bool

See also hasNext() and previous().

next() → Pro.PDF.PDFCrossRefTable

Returns

Returns the next item and advances the iterator by one position.

Return type

PDFCrossRefTable

See also hasNext() and previous().

previous() → Pro.PDF.PDFCrossRefTable

Returns

Returns the previous item and moves the iterator back by one position.

Return type

PDFCrossRefTable

See also hasPrevious() and next().

toBack() → None

Moves the iterator to the back of the container (after the last item).

See also toFront() and previous().

toFront() → None

Moves the iterator to the front of the container (before the first item).

See also toBack() and next().

class PDFCrossRefTableSection

This class represents the section of a PDF cross-reference table.

See also PDFCrossRefTable.

Attributes:

array_offset

The offset of the cross-reference section entries.

count

The number of entries.

start_id

The id of the first object in the array.

array_offset

The offset of the cross-reference section entries.

count

The number of entries.

start_id

The id of the first object in the array.

See also PDFObject.OBJID().
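To make the relationship between `start_id`, `count` and `array_offset` concrete, here is a pure-Python sketch that reads the entries of a classic (non-stream) cross-reference section directly from the file bytes. It relies on the fixed 20-byte entry layout defined by the PDF specification (`nnnnnnnnnn ggggg n` plus a two-byte end-of-line), which it assumes is the size described by PDF_CROSSREF_ENTRY_SIZE. It treats the starting value as a plain object number (the Pro.PDF attribute may hold a packed ID instead; see OBJID()), and the function names are hypothetical, not part of Pro.PDF.

```python
ENTRY_SIZE = 20  # fixed size of a classic xref entry per the PDF specification

def parse_xref_entry(entry: bytes):
    """Parse one 'nnnnnnnnnn ggggg n' entry into (offset, generation, in_use)."""
    if len(entry) != ENTRY_SIZE:
        raise ValueError("cross-reference entries are exactly 20 bytes")
    offset = int(entry[0:10])        # 10-digit byte offset (next free object for 'f')
    generation = int(entry[11:16])   # 5-digit generation number
    kind = entry[17:18]              # b"n" = in use, b"f" = free
    if kind not in (b"n", b"f"):
        raise ValueError("unknown entry type: %r" % kind)
    return offset, generation, kind == b"n"

def read_xref_section(data: bytes, start_number: int, count: int, array_offset: int):
    """Yield (object_number, offset, generation, in_use) for each entry of a section."""
    for i in range(count):
        pos = array_offset + i * ENTRY_SIZE
        offset, generation, in_use = parse_xref_entry(data[pos:pos + ENTRY_SIZE])
        yield start_number + i, offset, generation, in_use

# Hypothetical input: a section "0 3" followed by three entries.
xref = (b"xref\n0 3\n"
        b"0000000000 65535 f\r\n"
        b"0000000017 00000 n\r\n"
        b"0000000081 00000 n\r\n")
for entry in read_xref_section(xref, 0, 3, xref.index(b"0000000000")):
    print(entry)  # (object number, offset, generation, in use)
```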

class PDFCrossRefTableSectionList

List of PDFCrossRefTableSection elements.

Methods:

append(value)

Inserts value at the end of the list.

at(i)

Returns the item at index position i in the list.

clear()

Removes all items from the list.

contains(value)

Checks the presence of an element in the list.

count(value)

Returns the number of occurrences of value in the list.

indexOf(value[, start])

Searches for an element in the list.

insert(i, value)

Inserts value at index position i in the list.

isEmpty()

Checks whether the list is empty.

iterator()

Creates an iterator for the list.

removeAll(value)

Removes all occurrences of value in the list and returns the number of entries removed.

removeAt(i)

Removes the item at index position i.

reserve(alloc)

Reserves space for alloc elements.

size()

Returns the number of items in the list.

takeAt(i)

Removes the item at index position i and returns it.

append(value: Pro.PDF.PDFCrossRefTableSection) → None

Inserts value at the end of the list.

Parameters

value (PDFCrossRefTableSection) – The value to add to the list.

See also insert().

at(i: int) → Pro.PDF.PDFCrossRefTableSection

Returns the item at index position i in the list. i must be a valid index position in the list (i.e., 0 <= i < size()).

Parameters

i (int) – The index of the element to return.

Returns

Returns the requested element.

Return type

PDFCrossRefTableSection

clear() → None

Removes all items from the list.

contains(value: Pro.PDF.PDFCrossRefTableSection) → bool

Checks the presence of an element in the list.

Parameters

value (PDFCrossRefTableSection) – The value to check for.

Returns

Returns True if the list contains an occurrence of value; otherwise returns False.

Return type

bool

See also indexOf() and count().

count(value: Pro.PDF.PDFCrossRefTableSection) → int

Returns the number of occurrences of value in the list.

Parameters

value (PDFCrossRefTableSection) – The value to count.

Returns

Returns the number of occurrences.

Return type

int

See also indexOf() and contains().

indexOf(value: Pro.PDF.PDFCrossRefTableSection, start: int = 0) → int

Searches for an element in the list.

Parameters

value (PDFCrossRefTableSection) – The value to search for.

start (int) – The start index.

Returns

Returns the index position of the first occurrence of value in the list. Returns -1 if no item was found.

Return type

int

See also contains().

insert(i: int, value: Pro.PDF.PDFCrossRefTableSection) → None

Inserts value at index position i in the list. If i is 0, the value is prepended to the list. If i is size(), the value is appended to the list.

Parameters

i (int) – The position at which to add the value.

value (PDFCrossRefTableSection) – The value to add.

See also append() and removeAt().

isEmpty() → bool

Checks whether the list is empty.

Returns

Returns True if the list contains no items; otherwise returns False.

Return type

bool

See also size().

iterator() → Pro.PDF.PDFCrossRefTableSectionListIt

Creates an iterator for the list.

Returns

Returns the iterator.

Return type

PDFCrossRefTableSectionListIt

removeAll(value: Pro.PDF.PDFCrossRefTableSection) → int

Removes all occurrences of value in the list and returns the number of entries removed.

Parameters

value (PDFCrossRefTableSection) – The value to remove from the list.

Returns

Returns the number of entries removed.

Return type

int

See also removeAt().

removeAt(i: int) → None

Removes the item at index position i. i must be a valid index position in the list (i.e., 0 <= i < size()).

Parameters

i (int) – The index of the item to remove.

See also removeAll().

reserve(alloc: int) → None

Reserves space for alloc elements. Calling this method doesn’t change the size of the list.

Parameters

alloc (int) – The number of elements to reserve space for.

size() → int

Returns

Returns the number of items in the list.

Return type

int

See also isEmpty().

takeAt(i: int) → Pro.PDF.PDFCrossRefTableSection

Removes the item at index position i and returns it. i must be a valid index position in the list (i.e., 0 <= i < size()).

Parameters

i (int) – The index of the element to remove from the list.

Returns

Returns the removed element. If you don’t use the return value, removeAt() is more efficient.

Return type

PDFCrossRefTableSection

See also removeAt().

class PDFCrossRefTableSectionListIt(obj: Pro.PDF.PDFCrossRefTableSectionList)

Iterator class for PDFCrossRefTableSectionList.

Parameters

obj (PDFCrossRefTableSectionList) – The object to iterate over.

Methods:

hasNext()

Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.

hasPrevious()

Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.

next()

Returns the next item and advances the iterator by one position.

previous()

Returns the previous item and moves the iterator back by one position.

toBack()

Moves the iterator to the back of the container (after the last item).

toFront()

Moves the iterator to the front of the container (before the first item).

hasNext() → bool

Returns

Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.

Return type

bool

See also hasPrevious() and next().

hasPrevious() → bool

Returns

Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.

Return type

bool

See also hasNext() and previous().

next() → Pro.PDF.PDFCrossRefTableSection

Returns

Returns the next item and advances the iterator by one position.

Return type

PDFCrossRefTableSection

See also hasNext() and previous().

previous() → Pro.PDF.PDFCrossRefTableSection

Returns

Returns the previous item and moves the iterator back by one position.

Return type

PDFCrossRefTableSection

See also hasPrevious() and next().

toBack() → None

Moves the iterator to the back of the container (after the last item).

See also toFront() and previous().

toFront() → None

Moves the iterator to the front of the container (before the first item).

See also toBack() and next().

class PDFObject

Bases: Pro.Core.CFFObject

This class represents a PDF document.

Methods:

AddManuallyObjectToTable(objtable, offset)

Adds a PDF object to a PDF object table.

BuildObjectTable([xref_offset])

Creates a PDF object table.

BuildStringObjectFromBytes(bytes)

Builds a string object from raw bytes.

CatalogTreeToInverseHash(catalog)

Converts a PDF catalog tree into an inverse hash mapping object IDs to page numbers.

ComputeCatalogTree(objtable[, eid])

Computes the catalog tree of the PDF document.

CountUncompressedObjects(objtable)

Calculates the number of uncompressed objects.

CurrentEOF()

Returns the offset of the “%%EOF” string.

DecodeNameObject(raw)

Decodes a name object.

DecodeObjectStream(raw, dictionary_or_filter)

Decodes the stream of an object.

DecodeObjectStreamEx(raw, dictionary[, eid])

Decodes the stream of an object.

DecodeObjectStreamWithFilter(raw, filter[, …])

Decodes the stream of an object.

DecodingOperationsFinished()

Closes helper processes if spawned.

DetectObjects(objtable)

Detects unreferenced objects in the PDF document.

EnableFilter(type[, b])

Sets whether the specified decoding filter is enabled.

EnumerateCrossRefTables([xref_offset])

Enumerates cross-reference tables in a PDF document.

FindObjects(objtable, pathstr[, compressed])

Finds objects matching the specified criteria.

FlattenCatalogTree(catalog)

Converts a PDF catalog tree into a list of page object IDs in page order.

GetDictValue(dict, key[, dflt])

Gets the value from a PDF dictionary.

GetEOF()

Finds the position of the “%%EOF” string.

GetElement(offset)

Retrieves a PDF element such as a dictionary, a list or other objects.

GetElementSize(offset_or_str[, _from])

Calculates the size of a PDF element such as a dictionary, a list or other objects.

GetFilterDefaultParameters(filter)

Retrieves the default parameters for a filter.

GetJBIG2DecodeOptions()

Returns the decoding options for JBIG2 streams.

GetJBIG2DecodeTimeout()

Returns the time-out value for the JBIG2 decoding process.

GetJBIG2LibraryVersion()

Returns the version of the JBIG2 library used for decoding.

GetObjectContent(objtable, eid)

Retrieves the decoded stream of a PDF object and its dictionary.

GetObjectContentEx(objtable, eid)

Retrieves the decoded stream of a PDF object and its dictionary.

GetObjectTable()

Returns the internally stored object table.

GetStartXRef([pos])

Retrieves the start cross-reference offset.

GetStartXRefEx(pos)

Retrieves the start cross-reference offset.

GetStringObjectBytes(str)

Converts a string object back to its original bytes.

GetSupportedFilterNames()

Returns the list of supported decoding filter names.

GetTrailer(i)

Retrieves a specified trailer dictionary.

GetTrailers()

Returns the list of trailer dictionaries.

HasEncryption()

Returns True if the PDF has encryption; otherwise returns False.

HexStringSize(str[, _from])

Computes the size of a hex string object.

IsContainer(estr)

Checks whether an element is either a dictionary or a list.

IsDecrypted()

Returns True if the PDF doesn’t have encryption or was decrypted; otherwise returns False.

IsFilterEnabled(type)

Checks whether the specified decoding filter is enabled.

IsValidPDF()

Returns True if the PDF document has a “%%EOF” signature; otherwise returns False.

LiteralStringSize(str[, _from])

Calculates the size of a literal string.

OBJID(id, generation)

Creates an object ID from its number and generation.

OBJIDGEN(oid)

Retrieves the object generation from an object ID.

OBJIDNUM(oid)

Retrieves the object number from an object ID.

ObjectToString(eid_or_objtable, eid_or_ref)

Converts an object ID to a string.

PDFValueLength(offset_or_str[, _from])

Calculates the size of a value.

ParseContainerElement(estr[, eid])

Parses a container element such as a dictionary or a list.

ParseCrossRefEntry(bytes)

Parses a cross-reference entry.

ParseObject(objtable, eid)

Parses a PDF object without decoding its stream.

ParseObjectContent(objtable, eid)

Retrieves the stream of a PDF object without decoding it.

ParseObjectDictionary(objtable, eid)

Retrieves the dictionary of a PDF object.

ParseObjectInfo(objtable, eid)

Retrieves the parsing information of a PDF object.

ParseObjectName(ref)

Converts an object name such as “10 1 obj” into its ID.

ParseObjectRef(ref)

Converts an object reference such as “10 1 R” into its ID.

ParseObjectRefOrName(ref, parse_ref)

Converts an object name or reference into its ID.

ProcessEncryption()

Decrypts the PDF document if encrypted.

ReadHexString(offset)

Reads a hex string.

ReadLiteralString(offset[, encrypted_string])

Reads a literal string.

RegularCharsLength(offset_or_str[, _from])

Calculates the length of a run of characters that is not interrupted by reserved characters.

SetJBIG2DecodeOptions(options)

Sets the decoding options for JBIG2 streams.

SetJBIG2DecodeTimeout(timeout)

Sets the JBIG2 decoding time-out value.

SetJBIG2LibraryVersion(version)

Sets the version of the JBIG2 library used for decoding.

SetObjectTable(objtable)

Sets the internally stored object table.

SkipEmptyChars(offset_or_str[, _from_or_down])

Skips empty characters and comments.

SkipNewLine(offset)

Skips new-line characters if present.

UnescapeLiteralString(str)

Unescapes a literal string.

Unpredict(raw, filter, parms[, eid])

Removes PNG prediction from the input data.
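Unpredict() is only summarized above. For readers unfamiliar with PNG prediction, the following pure-Python sketch shows the reversal step as defined by the PNG specification and used by the PDF `Predictor` values 10 to 15: every row starts with a filter-type byte and is reconstructed from the already decoded bytes to its left and in the row above. It assumes the DecodeParms defaults (Colors 1, BitsPerComponent 8, Columns 1), does not handle TIFF Predictor 2, and is not Cerbero's implementation.

```python
def _paeth(left: int, up: int, up_left: int) -> int:
    """PNG Paeth predictor: pick the neighbour closest to left + up - up_left."""
    p = left + up - up_left
    pa, pb, pc = abs(p - left), abs(p - up), abs(p - up_left)
    if pa <= pb and pa <= pc:
        return left
    if pb <= pc:
        return up
    return up_left

def png_unpredict(data: bytes, columns: int = 1, colors: int = 1,
                  bits_per_component: int = 8) -> bytes:
    """Reverse PNG row filters (PDF Predictor >= 10) and return the raw rows."""
    bpp = max(1, colors * bits_per_component // 8)              # bytes per pixel
    row_len = (colors * bits_per_component * columns + 7) // 8
    stride = row_len + 1                                         # + filter-type byte
    if len(data) % stride:
        raise ValueError("data is not a whole number of rows")
    out = bytearray()
    prev = bytearray(row_len)                    # the row above the first one is all zeros
    for start in range(0, len(data), stride):
        filter_type = data[start]
        row = bytearray(data[start + 1:start + stride])
        for i in range(row_len):
            left = row[i - bpp] if i >= bpp else 0               # already reconstructed
            up = prev[i]
            up_left = prev[i - bpp] if i >= bpp else 0
            if filter_type == 0:        # None
                pred = 0
            elif filter_type == 1:      # Sub
                pred = left
            elif filter_type == 2:      # Up
                pred = up
            elif filter_type == 3:      # Average
                pred = (left + up) // 2
            elif filter_type == 4:      # Paeth
                pred = _paeth(left, up, up_left)
            else:
                raise ValueError("invalid PNG filter type %d" % filter_type)
            row[i] = (row[i] + pred) & 0xFF
        out += row
        prev = row
    return bytes(out)
```

For example, two three-byte rows encoded with the Up filter, `png_unpredict(bytes([2, 1, 2, 3, 2, 1, 1, 1]), columns=3)`, yield the bytes 1, 2, 3, 2, 3, 4.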

Attributes:

FilterType_ASCII85Decode

ASCII85Decode filter type.

FilterType_ASCIIHexDecode

ASCIIHexDecode filter type.

FilterType_CCITTFaxDecode

CCITTFaxDecode filter type.

FilterType_DCTDecode

DCTDecode filter type.

FilterType_FlateDecode

FlateDecode filter type.

FilterType_JBIG2Decode

JBIG2Decode filter type.

FilterType_JPXDecode

JPXDecode filter type.

FilterType_LZWDecode

LZWDecode filter type.

FilterType_RunLengthDecode

RunLengthDecode filter type.

JBIG2DecodeOpt_HelperProcess

JBIG2 decoding option to decode JBIG2 streams in a separate process.

JBIG2DecodeOpt_InProcess

Default JBIG2 decoding option to decode JBIG2 streams in the same process.

JBIG2DecodeOpt_NoDecode

JBIG2 decoding option to disable the decoding of JBIG2 streams.

AddManuallyObjectToTable(objtable: Pro.PDF.PDFObjectTable, offset: int) → bool

Adds a PDF object to a PDF object table.

Parameters

objtable (PDFObjectTable) – The object table.

offset (int) – The offset of the object.

Returns

Returns True if successful; otherwise returns False.

Return type

bool

See also BuildObjectTable(), DetectObjects() and SetObjectTable().

BuildObjectTable(xref_offset: int = INVALID_STREAM_OFFSET) → Pro.PDF.PDFObjectTable

Creates a PDF object table.

Parameters

xref_offset (int) – The optional offset of the cross-reference table.

Returns

Returns the object table if successful; otherwise returns an empty PDFObjectTable instance.

Return type

PDFObjectTable

See also DetectObjects(), SetObjectTable() and AddManuallyObjectToTable().

BuildStringObjectFromBytes(bytes: bytes) → str

Builds a string object from raw bytes.

This method is used to construct strings after decryption.

Parameters

bytes (bytes) – The input data.

Returns

Returns the string object if successful; otherwise returns an empty string.

Return type

str

CatalogTreeToInverseHash(catalog: Pro.Core.NTMaxUIntTree) → Pro.Core.NTUInt64UIntHash

Converts a PDF catalog tree into an inverse hash mapping object IDs to page numbers.

Parameters

catalog (NTMaxUIntTree) – The catalog tree.

Returns

Returns the inverse hash if successful; otherwise returns an empty Pro.Core.NTUInt64UIntHash instance.

Return type

NTUInt64UIntHash

See also ComputeCatalogTree() and FlattenCatalogTree().

ComputeCatalogTree(objtable: Pro.PDF.PDFObjectTable, eid: int = PDF_INVALID_OBJECT_REF) → Pro.Core.NTMaxUIntTree

Computes the catalog tree of the PDF document.

Parameters

objtable (PDFObjectTable) – The object table.

eid (int) – The optional catalog tree object ID.

Returns

Returns the computed catalog tree if successful; otherwise returns an empty Pro.Core.NTMaxUIntTree instance.

Return type

NTMaxUIntTree

See also CatalogTreeToInverseHash() and FlattenCatalogTree().

CountUncompressedObjects(objtable: Pro.PDF.PDFObjectTable) → int

Calculates the number of uncompressed objects.

Parameters

objtable (PDFObjectTable) – The object table.

Returns

Returns the number of uncompressed objects.

Return type

int

See also PDFObjectRef.n.

CurrentEOF() → int

Returns

Returns the offset of the “%%EOF” string.

Return type

int

DecodeNameObject(raw: str) → str

Decodes a name object.

For instance, it converts something like “/Adobe#20Green” to “/Adobe Green”.

Parameters

raw (str) – The raw name object.

Returns

Returns the decoded name object.

Return type

str
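The following pure-Python sketch illustrates the `#xx` escape rule that DecodeNameObject() applies, as defined by the PDF specification for name objects. It is an illustration, not the library's implementation, and the helper name is hypothetical. It maps each escaped byte to the code point with the same value (Latin-1 semantics), which is adequate for the ASCII names found in most documents.

```python
import re

# '#' followed by exactly two hexadecimal digits encodes one byte of the name.
_NAME_ESCAPE = re.compile(r"#([0-9A-Fa-f]{2})")

def decode_name(raw: str) -> str:
    """Replace every #xx escape in a PDF name with the character it encodes."""
    return _NAME_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), raw)

print(decode_name("/Adobe#20Green"))  # /Adobe Green
```

Malicious documents often use such escapes to hide keywords from naive scanners: `/J#61vaScript`, for instance, decodes to `/JavaScript`.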

DecodeObjectStream(raw: bytes, dictionary_or_filter: Union[Pro.Core.NTStringStringHash, str], eid_or_parms: Union[Pro.Core.NTStringStringHash, int] = NTStringStringHashList(), eid: int = PDF_INVALID_OBJECT_REF) → bytes

Decodes the stream of an object.

Parameters

raw (bytes) – The stream data.

dictionary_or_filter (Union[NTStringStringHash, str]) – Either the dictionary of the PDF object or the name of the filter.

eid_or_parms (Union[NTStringStringHash, int]) – The object ID if dictionary_or_filter is a dictionary; otherwise, the parameters of the filter.

eid (int) – The object ID if the dictionary wasn’t provided.

Returns

Returns the decoded data if successful; otherwise returns an empty bytes object.

Return type

bytes

See also DecodeObjectStreamEx(), DecodeObjectStreamWithFilter() and ParseObject().

DecodeObjectStreamEx(raw: bytes, dictionary: Pro.Core.NTStringStringHash, eid: Optional[int] = None) → tuple

Decodes the stream of an object.

Parameters

raw (bytes) – The stream data.

dictionary (NTStringStringHash) – The dictionary of the PDF object.

eid (Optional[int]) – The object ID.

Returns

Returns a tuple containing the decoded data and an error string.

Return type

tuple[bytes, str]

See also DecodeObjectStream(), DecodeObjectStreamWithFilter() and ParseObject().

DecodeObjectStreamWithFilter(raw: bytes, filter: str, parms: Optional[Pro.Core.NTStringStringHash] = None, eid: Optional[int] = None) → tuple

Decodes the stream of an object.

Parameters

raw (bytes) – The stream data.

filter (str) – The name of the filter.

parms (Optional[NTStringStringHash]) – The parameters of the filter.

eid (Optional[int]) – The object ID.

Returns

Returns a tuple containing the decoded data and an error string.

Return type

tuple[bytes, str]

See also DecodeObjectStream(), DecodeObjectStreamEx() and ParseObject().
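When checking a decoded stream by hand, it can help to reproduce the two simplest filters independently. This is a standalone sketch based on the PDF specification and Python's standard `zlib` module, not on the Pro.PDF decoders, and the helper names are hypothetical. FlateDecode data is zlib-wrapped deflate; ASCIIHexDecode ignores whitespace, stops at `>` and pads an odd final digit with `0`. Predictors and other DecodeParms are not handled.

```python
import zlib

PDF_WHITESPACE = b"\x00\t\n\x0c\r "

def ascii_hex_decode(data: bytes) -> bytes:
    """ASCIIHexDecode: hex digit pairs, whitespace ignored, '>' marks the end."""
    digits = bytearray()
    for byte in data:
        if byte == ord(">"):
            break
        if byte in PDF_WHITESPACE:
            continue
        digits.append(byte)
    if len(digits) % 2:
        digits.append(ord("0"))  # an odd final digit is followed by an implicit 0
    # bytes.fromhex raises ValueError on non-hex characters
    return bytes.fromhex(digits.decode("ascii"))

def flate_decode(data: bytes) -> bytes:
    """FlateDecode: a decompressobj returns what it can from truncated input."""
    return zlib.decompressobj().decompress(data)
```

Using `decompressobj()` instead of `zlib.decompress()` is a deliberate choice for damaged files: truncated streams still yield their recoverable prefix, while corrupted ones raise `zlib.error`.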

DecodingOperationsFinished() → None

Closes helper processes if spawned.

Note

This method should be used only in conjunction with JBIG2DecodeOpt_HelperProcess.

See also SetJBIG2DecodeOptions().

DetectObjects(objtable: Pro.PDF.PDFObjectTable) → None

Detects unreferenced objects in the PDF document.

Hint

This method can be called after BuildObjectTable() to detect additional unreferenced objects.

Parameters

objtable (PDFObjectTable) – The PDF object table.

See also BuildObjectTable(), SetObjectTable() and AddManuallyObjectToTable().

EnableFilter(type: int, b: bool = True) → None

Sets whether the specified decoding filter is enabled.

Note

By default all decoding filters are enabled.

Parameters

type (int) – The decoding filter (e.g., FilterType_ASCIIHexDecode).

b (bool) – If True, enables the filter; otherwise disables it.

See also IsFilterEnabled().

EnumerateCrossRefTables(xref_offset: int = INVALID_STREAM_OFFSET) → Pro.PDF.PDFCrossRefTableList

Enumerates cross-reference tables in a PDF document.

Note

This method is called internally by BuildObjectTable().

Parameters

xref_offset (int) – The optional offset of the first cross-reference table.

Returns

Returns the list of cross-reference tables if successful; otherwise returns an empty PDFCrossRefTableList instance.

Return type

PDFCrossRefTableList

See also BuildObjectTable().

FilterType_ASCII85Decode: Final[int]

ASCII85Decode filter type.

See also EnableFilter() and IsFilterEnabled().

FilterType_ASCIIHexDecode: Final[int]

ASCIIHexDecode filter type.

See also EnableFilter() and IsFilterEnabled().

FilterType_CCITTFaxDecode: Final[int]

CCITTFaxDecode filter type.

See also EnableFilter() and IsFilterEnabled().

FilterType_DCTDecode: Final[int]

DCTDecode filter type.

See also EnableFilter() and IsFilterEnabled().

FilterType_FlateDecode: Final[int]

FlateDecode filter type.

See also EnableFilter() and IsFilterEnabled().

FilterType_JBIG2Decode: Final[int]

JBIG2Decode filter type.

See also EnableFilter() and IsFilterEnabled().

FilterType_JPXDecode: Final[int]

JPXDecode filter type.

See also EnableFilter() and IsFilterEnabled().

FilterType_LZWDecode: Final[int]

LZWDecode filter type.

See also EnableFilter() and IsFilterEnabled().

FilterType_RunLengthDecode: Final[int]

RunLengthDecode filter type.

See also EnableFilter() and IsFilterEnabled().

FindObjects(objtable: Pro.PDF.PDFObjectTable, pathstr: str, compressed: bool = True) → Pro.Core.NTUInt64List

Finds objects matching the specified criteria.

Parameters

objtable (PDFObjectTable) – The object table.

pathstr (str) – The search criteria. This can match multiple keys as well as specify allowed values (e.g., "Parent;Type|T;A|B").

compressed (bool) – If True, includes compressed objects in the search.

Returns

Returns a list of matching objects.

Return type

NTUInt64List

FlattenCatalogTree(catalog: Pro.Core.NTMaxUIntTree) → Pro.Core.NTUInt64List ¶

Converts a PDF catalog tree into a list of page object IDs in their correct order.

Parameters

catalog (NTMaxUIntTree) – The catalog tree to convert.

Returns

Returns the list if successful; otherwise returns an empty Pro.Core.NTUInt64List instance.

Return type

NTUInt64List

See also ComputeCatalogTree() and CatalogTreeToInverseHash().

GetDictValue(dict: Pro.Core.NTStringStringHash, key: str, dflt: str = str()) → str¶

Gets the value from a PDF dictionary.

This method automatically resolves object references.

Parameters

dict (NTStringStringHash) – The dictionary.

key (str) – The key of the value to extract. A sub-key can be specified by using the semicolon character as a separator.

dflt (str) – The default value.

Returns

Returns the value from the dictionary if successful; otherwise returns the default value.

Return type

str

GetEOF() → int¶

Finds the position of the “%%EOF” string.

Returns

Returns the offset if successful; otherwise returns Pro.Core.INVALID_STREAM_OFFSET.

Return type

int

See also CurrentEOF().

GetElement(offset: int) → tuple¶

Retrieves a PDF element such as a dictionary, a list or other objects.

Parameters

offset (int) – The offset of the element.

Returns

Returns a tuple containing the element as a string and its size if successful; otherwise returns a tuple containing an empty string and -1.

Return type

tuple[str, int]

See also GetElementSize().

GetElementSize(offset_or_str: Union[int, str], _from: int = 0) → int¶

Calculates the size of a PDF element such as a dictionary, a list or other objects.

Parameters

offset_or_str (Union[int, str]) – Either the offset of an element or the element as string.

_from (int) – An optional start position into the string.

Returns

Returns the size of the element.

Return type

int

See also GetElement().

static GetFilterDefaultParameters(filter: str) → Pro.Core.NTStringStringHash ¶

Retrieves the default parameters for a filter.

Parameters

filter (str) – The filter type.

Returns

Returns the default parameters if available; otherwise returns an empty Pro.Core.NTStringStringHash instance.

Return type

NTStringStringHash

GetJBIG2DecodeOptions() → int¶

Returns

Returns the decoding options for JBIG2 streams.

Return type

int

See also SetJBIG2DecodeOptions().

GetJBIG2DecodeTimeout() → int¶

Returns

Returns the time-out value for the JBIG2 decoding process.

Return type

int

See also SetJBIG2DecodeTimeout().

GetJBIG2LibraryVersion() → int¶

Returns

Returns the version of the JBIG2 library used for decoding.

Return type

int

See also SetJBIG2LibraryVersion().

GetObjectContent(objtable: Pro.PDF.PDFObjectTable, eid: int) → tuple¶

Retrieves the decoded stream of a PDF object and its dictionary.

Parameters

objtable (PDFObjectTable) – The object table.

eid (int) – The object ID.

Returns

Returns a tuple containing the decoded stream and the dictionary of the object if successful; otherwise returns a tuple containing an empty bytes object and an empty Pro.Core.NTStringStringHash instance.

Return type

tuple[bytes, NTStringStringHash]

See also GetObjectContentEx().

GetObjectContentEx(objtable: Pro.PDF.PDFObjectTable, eid: int) → tuple¶

Retrieves the decoded stream of a PDF object and its dictionary.

Parameters

objtable (PDFObjectTable) – The object table.

eid (int) – The object ID.

Returns

Returns a tuple containing the decoded stream, the dictionary of the object and an empty string if successful; otherwise returns a tuple containing an empty bytes object, an empty Pro.Core.NTStringStringHash instance and an error string.

Return type

tuple[bytes, NTStringStringHash, str]

See also GetObjectContent().

GetObjectTable() → Pro.PDF.PDFObjectTable ¶

Returns

Returns the internally stored object table.

Return type

PDFObjectTable

See also SetObjectTable().

GetStartXRef(pos: int = 0) → int¶

Retrieves the start cross-reference offset.

Parameters

pos (int) – An optional start position for the search.

Returns

Returns the offset if successful; otherwise returns Pro.Core.INVALID_STREAM_OFFSET.

Return type

int

See also GetStartXRefEx().

GetStartXRefEx(pos: int) → tuple¶

Retrieves the start cross-reference offset.

Parameters

pos (int) – An optional start position for the search.

Returns

Returns a tuple containing the cross-reference offset and the offset of the “startxref” keyword if successful; otherwise returns a tuple containing Pro.Core.INVALID_STREAM_OFFSET and an undefined value.

Return type

tuple[int, int]

See also GetStartXRef().
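To illustrate what GetStartXRef() and GetEOF() look for, here is a minimal pure-Python sketch that scans raw file bytes for the last “startxref” keyword and “%%EOF” marker. It does not use Pro.PDF; the helper names and the INVALID_OFFSET stand-in (-1) are hypothetical, and the real methods may restrict the search to the tail of the file or handle malformed input differently.

```python
import re
from typing import Tuple

# Hypothetical stand-in for Pro.Core.INVALID_STREAM_OFFSET.
INVALID_OFFSET = -1

def find_eof(data: bytes) -> int:
    """Return the offset of the last "%%EOF" marker, or INVALID_OFFSET."""
    pos = data.rfind(b"%%EOF")
    return pos if pos >= 0 else INVALID_OFFSET

def find_startxref(data: bytes) -> Tuple[int, int]:
    """Return (cross-reference offset, offset of the "startxref" keyword)."""
    kw = data.rfind(b"startxref")
    if kw < 0:
        return INVALID_OFFSET, INVALID_OFFSET
    # The keyword is followed by whitespace and the decimal offset of the xref section.
    m = re.match(rb"startxref\s+(\d+)", data[kw:])
    if m is None:
        return INVALID_OFFSET, kw
    return int(m.group(1)), kw

sample = (b"%PDF-1.7\n...\nxref\n0 1\n0000000000 65535 f\r\n"
          b"trailer\n<< /Size 1 >>\nstartxref\n13\n%%EOF\n")
xref_offset, keyword_offset = find_startxref(sample)
print(xref_offset, sample[xref_offset:xref_offset + 4])
```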

GetStringObjectBytes(str: str) → bytes¶

Converts a string object back to its original bytes.

Parameters

str (str) – The string object.

Returns

Returns the raw data of the string object.

Return type

bytes

static GetSupportedFilterNames() → Pro.Core.NTStringList ¶

Returns

Returns the list of supported decoding filter names.

Return type

NTStringList

GetTrailer(i: int) → Pro.Core.NTStringStringHash ¶

Retrieves a specified trailer dictionary.

Parameters

i (int) – The index of the trailer dictionary to retrieve.

Returns

Returns the specified trailer dictionary if available; otherwise returns an empty Pro.Core.NTStringStringHash instance.

Return type

NTStringStringHash

See also GetTrailers().

GetTrailers() → Pro.Core.NTStringStringHashList ¶

Returns

Returns the list of trailer dictionaries.

Return type

NTStringStringHashList

See also GetTrailer().

HasEncryption() → bool¶

Returns

Returns True if the PDF has encryption; otherwise returns False.

Return type

bool

Available since Cerbero Suite 7.2 and Cerbero Engine 4.2.

See also IsDecrypted() and ProcessEncryption().

HexStringSize(str: str, _from: int = 0) → int¶

Computes the size of a hex string object.

Parameters

str (str) – The hex string object.

_from (int) – An optional start position into the string.

Returns

Returns the computed size.

Return type

int

static IsContainer(estr: str) → bool¶

Checks whether an element is either a dictionary or a list.

Parameters

estr (str) – The element.

Returns

Returns True if the element is a dictionary or a list; otherwise returns False.

Return type

bool

IsDecrypted() → bool¶

Returns

Returns True if the PDF doesn’t have encryption or was decrypted; otherwise returns False.

Return type

bool

Available since Cerbero Suite 7.2 and Cerbero Engine 4.2.

See also HasEncryption() and ProcessEncryption().

IsFilterEnabled(type: int) → bool¶

Checks whether the specified decoding filter is enabled.

Parameters

type (int) – The decoding filter (e.g., ASCIIHexDecode).

Returns

Returns True if the filter is enabled; otherwise returns False.

Return type

bool

See also EnableFilter().

IsValidPDF() → bool¶

Returns

Returns True if the PDF document has a “%%EOF” signature; otherwise returns False.

Return type

bool

See also GetEOF().

JBIG2DecodeOpt_HelperProcess: Final[int]¶

JBIG2 decoding option to decode JBIG2 streams in a separate process.

See also SetJBIG2DecodeOptions() and SetJBIG2DecodeTimeout().

JBIG2DecodeOpt_InProcess: Final[int]¶

Default JBIG2 decoding option to decode JBIG2 streams in the same process.

See also GetJBIG2DecodeOptions() and SetJBIG2DecodeOptions().

JBIG2DecodeOpt_NoDecode: Final[int]¶

JBIG2 decoding option to disable the decoding of JBIG2 streams.

See also GetJBIG2DecodeOptions() and SetJBIG2DecodeOptions().

LiteralStringSize(str: str, _from: int = 0) → int¶

Calculates the size of a literal string.

Parameters

str (str) – The literal string.

_from (int) – An optional start position.

Returns

Returns the size of the literal string.

Return type

int

See also GetElementSize().
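Literal strings are delimited by parentheses, may contain balanced unescaped parentheses, and use the backslash as an escape character. The pure-Python sketch below measures one such string, counting both outer parentheses; it assumes that is how LiteralStringSize() counts, and the -1 failure value and helper name are hypothetical because the method's error behaviour is not documented here.

```python
def literal_string_size(text: str, start: int = 0) -> int:
    """Size of the literal string at text[start], including both parentheses; -1 if malformed."""
    if start >= len(text) or text[start] != "(":
        return -1
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # Skip the escaped character, so "\(" and "\)" do not affect nesting.
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1 - start
        i += 1
    return -1  # unterminated string

print(literal_string_size("(a(b)c) rest"))
```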

static OBJID(id: int, generation: int) → int¶

Creates an object ID from its number and generation.

Parameters

id (int) – The number of the object.

generation (int) – The generation of the object.

Returns

Returns the object ID.

Return type

int

See also OBJIDGEN() and OBJIDNUM().

static OBJIDGEN(oid: int) → int¶

Retrieves the object generation from an object ID.

Parameters

oid (int) – The object ID.

Returns

Returns the generation of the object.

Return type

int

See also OBJID() and OBJIDNUM().

static OBJIDNUM(oid: int) → int¶

Retrieves the object number from an object ID.

Parameters

oid (int) – The object ID.

Returns

Returns the number of the object.

Return type

int

See also OBJID() and OBJIDGEN().
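The overview example prints an object number with oid >> 32, which suggests that an object ID packs the object number into its upper 32 bits. The following pure-Python sketch models OBJID(), OBJIDNUM() and OBJIDGEN() on that basis; storing the generation in the lower 32 bits is an unconfirmed assumption, so real scripts should call the documented static methods rather than rely on this layout.

```python
# Hypothetical model of the object ID layout.
# ASSUMPTION: number in the upper 32 bits (as suggested by `oid >> 32` in the
# overview example), generation in the lower 32 bits (a guess for illustration).
MASK32 = 0xFFFFFFFF

def objid(num: int, generation: int) -> int:
    """Pack an object number and generation into a single 64-bit ID."""
    return ((num & MASK32) << 32) | (generation & MASK32)

def objid_num(oid: int) -> int:
    """Extract the object number from a packed ID."""
    return (oid >> 32) & MASK32

def objid_gen(oid: int) -> int:
    """Extract the generation from a packed ID."""
    return oid & MASK32

oid = objid(10, 1)
print(objid_num(oid), objid_gen(oid))
```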

ObjectToString(eid_or_objtable: Union[Pro.PDF.PDFObjectTable, int], eid_or_ref: Union[Pro.PDF.PDFObjectRef, int]) → str¶

Converts an object ID to a string.

Parameters

eid_or_objtable (Union[PDFObjectTable, int]) – Either an object table or an object ID.

eid_or_ref (Union[PDFObjectRef, int]) – Either an object ID or the object reference information.

Returns

Returns the object ID converted to string.

Return type

str

PDFValueLength(offset_or_str: Union[int, str], _from: int = 0) → int¶

Calculates the size of a value.

Parameters

offset_or_str (Union[int, str]) – The offset to the value or the string containing the value.

_from (int) – An optional start position into the string.

Returns

Returns the size of the value.

Return type

int

ParseContainerElement(estr: str, eid: int = PDF_INVALID_OBJECT_REF) → Pro.Core.NTStringStringHash ¶

Parses a container element such as a dictionary or a list.

Parameters

estr (str) – The element to parse.

eid (int) – The optional object ID of the element.

Returns

Returns the parsed element if successful; otherwise returns an empty Pro.Core.NTStringStringHash instance.

Return type

NTStringStringHash

ParseCrossRefEntry(bytes: bytes) → Pro.PDF.PDFObjectRef ¶

Parses a cross-reference entry.

Parameters

bytes (bytes) – The cross-reference entry.

Returns

Returns the object reference information if successful; otherwise returns an invalid PDFObjectRef instance.

Return type

PDFObjectRef
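A classic (non-stream) cross-reference entry has a fixed 20-byte layout defined by the PDF specification: a 10-digit byte offset, a space, a 5-digit generation number, a space, the keyword n (in use) or f (free), and a 2-byte end-of-line. The pure-Python sketch below parses that layout into a small named tuple whose fields loosely mirror PDFObjectRef.offset, generation and n. It is an assumption that PDF_CROSSREF_ENTRY_SIZE equals 20 and that ParseCrossRefEntry() applies the same checks; compressed objects (type "N") are recorded in cross-reference streams rather than in this textual format, so they are not covered.

```python
from typing import NamedTuple, Optional

# Fixed entry size from the PDF specification (assumed to match PDF_CROSSREF_ENTRY_SIZE).
XREF_ENTRY_SIZE = 20

class XRefEntry(NamedTuple):
    offset: int      # byte offset for "n" entries, next free object number for "f"
    generation: int
    kind: str        # "n" (in use) or "f" (free)

def parse_xref_entry(entry: bytes) -> Optional[XRefEntry]:
    """Parse one fixed-width entry; return None if the layout does not match."""
    if len(entry) < 18 or entry[10:11] != b" " or entry[16:17] != b" ":
        return None
    offset_field, gen_field, kind = entry[0:10], entry[11:16], entry[17:18]
    if not (offset_field.isdigit() and gen_field.isdigit()) or kind not in (b"n", b"f"):
        return None
    return XRefEntry(int(offset_field), int(gen_field), kind.decode("ascii"))

# Walk a cross-reference subsection entry by entry.
section = (b"0000000000 65535 f\r\n"
           b"0000000017 00000 n\r\n")
for i in range(0, len(section), XREF_ENTRY_SIZE):
    print(parse_xref_entry(section[i:i + XREF_ENTRY_SIZE]))
```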

ParseObject(objtable: Pro.PDF.PDFObjectTable, eid: int) → tuple¶

Parses a PDF object without decoding its stream.

Parameters

objtable (PDFObjectTable) – The object table.

eid (int) – The object ID.

Returns

Returns a tuple containing the size of the object, its dictionary, its stream data and its parsing information if successful; otherwise returns a tuple containing -1, an empty Pro.Core.NTStringStringHash instance, an empty bytes object and invalid parsing information.

Return type

tuple[int, NTStringStringHash, bytes, PDFObjectParseInfo]

See also ParseObjectInfo(), ParseObjectDictionary(), ParseObjectContent() and DecodeObjectStream().

ParseObjectContent(objtable: Pro.PDF.PDFObjectTable, eid: int) → bytes¶

Retrieves the stream of a PDF object without decoding it.

Parameters

objtable (PDFObjectTable) – The object table.

eid (int) – The object ID.

Returns

Returns the stream data if successful; otherwise returns an empty bytes object.

Return type

bytes

See also ParseObject(), ParseObjectInfo(), ParseObjectDictionary() and DecodeObjectStream().

ParseObjectDictionary(objtable: Pro.PDF.PDFObjectTable, eid: int) → Pro.Core.NTStringStringHash ¶

Retrieves the dictionary of a PDF object.

Parameters

objtable (PDFObjectTable) – The object table.

eid (int) – The object ID.

Returns

Returns the dictionary if successful; otherwise returns an empty Pro.Core.NTStringStringHash instance.

Return type

NTStringStringHash

See also ParseObject(), ParseObjectInfo(), ParseObjectContent() and DecodeObjectStream().

ParseObjectInfo(objtable: Pro.PDF.PDFObjectTable, eid: int) → Pro.PDF.PDFObjectParseInfo ¶

Retrieves the parsing information of a PDF object.

Parameters

objtable (PDFObjectTable) – The object table.

eid (int) – The object ID.

Returns

Returns the parsing information if successful; otherwise returns invalid parsing information.

Return type

PDFObjectParseInfo

See also ParseObject(), ParseObjectDictionary(), ParseObjectContent() and DecodeObjectStream().

static ParseObjectName(ref: str) → int¶

Converts an object name such as “10 1 obj” into its ID.

Parameters

ref (str) – The object name.

Returns

Returns the object ID if successful; otherwise returns PDF_INVALID_OBJECT_REF.

Return type

int

See also ParseObjectRef().

static ParseObjectRef(ref: str) → int¶

Converts an object reference such as “10 1 R” into its ID.

Parameters

ref (str) – The object reference.

Returns

Returns the object ID if successful; otherwise returns PDF_INVALID_OBJECT_REF.

Return type

int

See also ParseObjectName().

static ParseObjectRefOrName(ref: str, parse_ref: bool) → int¶

Converts an object name or reference into its ID.

Parameters

ref (str) – The object name or reference.

parse_ref (bool) – If True, parses an object reference; otherwise parses an object name.

Returns

Returns the object ID if successful; otherwise returns PDF_INVALID_OBJECT_REF.

Return type

int

See also ParseObjectName() and ParseObjectRef().
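As a rough model of how ParseObjectRef(), ParseObjectName() and ParseObjectRefOrName() map text to an ID, the sketch below matches the “N G R” and “N G obj” forms with regular expressions and packs the result using an assumed layout (number in the upper 32 bits, generation in the lower 32 bits). INVALID_REF is a hypothetical stand-in for PDF_INVALID_OBJECT_REF, whose actual value is not given here, and the patterns are simplified (for example, they ignore NUL as whitespace and do not skip comments).

```python
import re

MASK32 = 0xFFFFFFFF
# Hypothetical stand-in for PDF_INVALID_OBJECT_REF.
INVALID_REF = -1

_PATTERNS = {
    True: re.compile(r"^\s*(\d+)\s+(\d+)\s+R\s*$"),   # object reference, e.g. "10 1 R"
    False: re.compile(r"^\s*(\d+)\s+(\d+)\s+obj\b"),  # object name, e.g. "10 1 obj"
}

def parse_ref_or_name(ref: str, parse_ref: bool) -> int:
    """Return the packed object ID for a reference or name, or INVALID_REF."""
    m = _PATTERNS[parse_ref].match(ref)
    if m is None:
        return INVALID_REF
    num, gen = int(m.group(1)), int(m.group(2))
    # ASSUMPTION: number in the upper 32 bits, generation in the lower 32 bits.
    return ((num & MASK32) << 32) | (gen & MASK32)

print(parse_ref_or_name("10 1 R", True) >> 32)
```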

ProcessEncryption() → bool¶

Decrypts the PDF document if it is encrypted.

Returns

Returns True if successful; otherwise returns False.

Return type

bool

See also SetObjectTable(), HasEncryption() and IsDecrypted().

ReadHexString(offset: int) → tuple¶

Reads a hex string.

Parameters

offset (int) – The offset of the hex string.

Returns

Returns a tuple containing the string and its size if successful; otherwise returns a tuple containing an empty string and -1.

Return type

tuple[str, int]
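Per the PDF specification, a hex string is enclosed in angle brackets, whitespace between digits is ignored, and an odd final digit is treated as if followed by 0. The pure-Python sketch below decodes one such string and returns (data, size), echoing the tuple shape of ReadHexString(); unlike the real method it works on a str and a start index instead of a file offset, counts both angle brackets in the size, and returns decoded bytes, which is an assumption about what the documented string contains. The helper name is hypothetical.

```python
import string
from typing import Tuple

HEX_DIGITS = set(string.hexdigits)
PDF_WHITESPACE = set("\x00\t\n\x0c\r ")

def read_hex_string(text: str, pos: int = 0) -> Tuple[bytes, int]:
    """Decode the hex string at text[pos]; return (data, size) or (b"", -1)."""
    if pos >= len(text) or text[pos] != "<":
        return b"", -1
    digits = []
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == ">":
            if len(digits) % 2:
                digits.append("0")  # odd digit count: the last digit is followed by 0
            return bytes.fromhex("".join(digits)), i + 1 - pos
        if ch in HEX_DIGITS:
            digits.append(ch)
        elif ch not in PDF_WHITESPACE:
            return b"", -1  # invalid character (e.g. "<<" starts a dictionary)
        i += 1
    return b"", -1  # unterminated string

print(read_hex_string("<48656C6C6F>"))
```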

ReadLiteralString(offset: int, encrypted_string: bool = False) → tuple¶

Reads a literal string.

Parameters

offset (int) – The offset of the literal string.

encrypted_string (bool) – If True, enables support for reading encrypted strings.

Returns

Returns a tuple containing the string and its size if successful; otherwise returns a tuple containing an empty string and -1.

Return type

tuple[str, int]

RegularCharsLength(offset_or_str: Union[int, str], _from: int = 0) → int¶

Calculates the length of a run of characters not interrupted by reserved characters.

Parameters

offset_or_str (Union[int, str]) – Either the offset to the data or the string with the data.

_from (int) – An optional start position into the string.

Returns

Returns the number of non-reserved characters.

Return type

int
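In PDF syntax, regular characters are all characters other than the six whitespace characters (NUL, HT, LF, FF, CR, SP) and the ten delimiters ( ) < > [ ] { } / %. The following is a minimal pure-Python sketch of the length computation RegularCharsLength() describes, working on bytes and a start position; the helper name is hypothetical and the exact set of characters the method treats as reserved is assumed to follow the specification.

```python
PDF_WHITESPACE = b"\x00\t\n\x0c\r "
PDF_DELIMITERS = b"()<>[]{}/%"

def regular_chars_length(data: bytes, start: int = 0) -> int:
    """Count consecutive regular characters starting at data[start]."""
    i = start
    while (i < len(data)
           and data[i] not in PDF_WHITESPACE
           and data[i] not in PDF_DELIMITERS):
        i += 1
    return i - start

# Length of the name "Type" after its leading "/" delimiter.
print(regular_chars_length(b"/Type /Page", 1))
```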

SetJBIG2DecodeOptions(options: int) → None¶

Sets the decoding options for JBIG2 streams.

Parameters

options (int) – The decoding options for JBIG2 streams (e.g., JBIG2DecodeOpt_HelperProcess).

See also GetJBIG2DecodeOptions().

SetJBIG2DecodeTimeout(timeout: int) → None¶

Sets the JBIG2 decoding time-out value.

Parameters

timeout (int) – The time-out in milliseconds.

See also GetJBIG2DecodeTimeout().

SetJBIG2LibraryVersion(version: int) → None¶

Sets the version of the JBIG2 library used for decoding.

Available versions are:

1 - Deprecated.

2 - Default.

Parameters

version (int) – The version of the JBIG2 library.

See also GetJBIG2LibraryVersion().

SetObjectTable(objtable: Pro.PDF.PDFObjectTable) → None¶

Sets the internally stored object table.

Parameters

objtable (PDFObjectTable) – The object table.

See also GetObjectTable().

SkipEmptyChars(offset_or_str: Union[int, str], _from_or_down: Union[bool, int] = True) → int¶

Skips empty characters and comments.

Parameters

offset_or_str (Union[int, str]) – Either the offset to the data or a string containing the data.

_from_or_down (Union[bool, int]) – Either a boolean specifying the direction or an optional start position into the string.

Returns

Returns the number of skipped characters.

Return type

int
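A minimal pure-Python sketch of forward skipping over PDF whitespace and % comments, returning the number of skipped bytes as SkipEmptyChars() does. The backward direction selected by the boolean form of _from_or_down is not modeled, and the helper name is hypothetical.

```python
PDF_WHITESPACE = b"\x00\t\n\x0c\r "

def skip_empty_chars(data: bytes, start: int = 0) -> int:
    """Return how many bytes of whitespace and comments follow data[start]."""
    i, n = start, len(data)
    while i < n:
        c = data[i]
        if c in PDF_WHITESPACE:
            i += 1
        elif c == ord("%"):
            # A comment runs up to (but not including) the next end-of-line byte.
            while i < n and data[i] not in b"\r\n":
                i += 1
        else:
            break
    return i - start

print(skip_empty_chars(b"  % comment\r\n  /Type"))
```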

SkipNewLine(offset: int) → int¶

Skips new-line characters if present.

Parameters

offset (int) – The offset to the data.

Returns

Returns the number of skipped bytes.

Return type

int

static UnescapeLiteralString(str: str) → str¶

Unescapes a literal string.

Parameters

str (str) – The escaped string.

Returns

Returns the unescaped string.

Return type

str
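The escape rules involved come from the PDF specification: \n, \r, \t, \b, \f, \(, \), \\, up to three octal digits, and a backslash before an end-of-line acting as a line continuation; after any other character the backslash is ignored. The pure-Python sketch below applies those rules to the body of a literal string (without the outer parentheses). It is an illustration rather than the implementation of UnescapeLiteralString(), and it does not normalize unescaped CR or CRLF line endings to LF, which the specification also requires.

```python
_SIMPLE_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
                   "(": "(", ")": ")", "\\": "\\"}
_OCTAL = "01234567"

def unescape_literal(s: str) -> str:
    """Unescape the body of a PDF literal string."""
    out = []
    i, n = 0, len(s)
    while i < n:
        ch = s[i]
        if ch != "\\" or i + 1 >= n:
            out.append(ch)
            i += 1
            continue
        nxt = s[i + 1]
        if nxt in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[nxt])
            i += 2
        elif nxt in _OCTAL:
            # Up to three octal digits; high-order overflow is ignored.
            j = i + 1
            while j < n and j < i + 4 and s[j] in _OCTAL:
                j += 1
            out.append(chr(int(s[i + 1:j], 8) & 0xFF))
            i = j
        elif nxt in "\r\n":
            # Backslash followed by an end-of-line: line continuation, emits nothing.
            i += 2
            if nxt == "\r" and i < n and s[i] == "\n":
                i += 1
        else:
            # Unknown escape: the backslash is dropped.
            out.append(nxt)
            i += 2
    return "".join(out)

print(unescape_literal(r"\(x\) \101\102"))
```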

Unpredict(raw: bytes, filter: str, parms: Pro.Core.NTStringStringHash, eid: int = PDF_INVALID_OBJECT_REF) → bytes¶

Removes PNG prediction from the input data.

Parameters

raw (bytes) – The input data.

filter (str) – The filter name.

parms (NTStringStringHash) – The filter parameters.

eid (int) – The object ID.

Returns

Returns the decoded data.

Return type

bytes
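For PNG predictors (Predictor values of 10 and above in the decode parameters), each row of the decompressed data is prefixed by a filter-type byte: 0 None, 1 Sub, 2 Up, 3 Average, 4 Paeth. The pure-Python sketch below undoes those filters, with Columns, Colors and BitsPerComponent passed directly as arguments (defaulting to 1, 1 and 8 as in the specification). It is not the Pro.PDF implementation: it takes no filter or parms arguments, the helper name is hypothetical, and the TIFF predictor (Predictor 2) is not covered.

```python
def png_unpredict(data: bytes, columns: int = 1, colors: int = 1, bpc: int = 8) -> bytes:
    """Reverse PNG row filters; incomplete trailing rows are dropped."""
    bpp = max(1, (colors * bpc + 7) // 8)        # bytes per complete pixel
    row_len = (colors * bpc * columns + 7) // 8  # bytes per row, excluding the tag byte
    out = bytearray()
    prev = bytearray(row_len)                    # the row above the first row is all zeros
    pos = 0
    while pos + row_len + 1 <= len(data):
        tag = data[pos]
        if tag > 4:
            raise ValueError(f"unknown PNG filter type {tag}")
        row = bytearray(data[pos + 1:pos + 1 + row_len])
        for i in range(row_len):
            left = row[i - bpp] if i >= bpp else 0       # already decoded byte
            up = prev[i]
            up_left = prev[i - bpp] if i >= bpp else 0
            if tag == 1:    # Sub
                row[i] = (row[i] + left) & 0xFF
            elif tag == 2:  # Up
                row[i] = (row[i] + up) & 0xFF
            elif tag == 3:  # Average
                row[i] = (row[i] + (left + up) // 2) & 0xFF
            elif tag == 4:  # Paeth
                p = left + up - up_left
                pa, pb, pc = abs(p - left), abs(p - up), abs(p - up_left)
                pred = left if pa <= pb and pa <= pc else (up if pb <= pc else up_left)
                row[i] = (row[i] + pred) & 0xFF
        out += row
        prev = row
        pos += row_len + 1
    return bytes(out)

# Two rows of three 8-bit samples, both encoded with the Up filter.
print(png_unpredict(bytes([2, 1, 2, 3, 2, 1, 1, 1]), columns=3))
```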

class PDFObjectParseInfo¶

This class contains the information of a parsed PDF object.

See also PDFObject.ParseObject() and PDFObject.ParseObjectInfo().

Methods:

clear()

Clears the fields of the instance.

Attributes:

dict_offset

The offset of the object dictionary.

dict_size

The size of the object dictionary.

offset

The offset of the object.

parent_content

The data of the parent object if it’s a compressed object.

size

The size of the object.

stream_offset

The offset of the object stream.

stream_size

The size of the object stream.

clear() → None¶

Clears the fields of the instance.

dict_offset¶

The offset of the object dictionary.

See also dict_size.

dict_size¶

The size of the object dictionary.

See also dict_offset.

offset¶

The offset of the object.

See also size.

parent_content¶

The data of the parent object if it’s a compressed object.

size¶

The size of the object.

See also offset.

stream_offset¶

The offset of the object stream.

See also stream_size.

stream_size¶

The size of the object stream.

See also stream_offset.

class PDFObjectRef¶

This class represents a reference to a PDF object.

Attributes:

flags

The flags of the object (e.g., PDF_OBJREF_FLAG_UNREFERENCED).

generation

The generation of the object.

index

The index of the object if it’s a compressed object.

n

The object type.

offset

The offset of the object if it’s an uncompressed object.

parent

The parent of the object if it’s a compressed object.

flags¶

The flags of the object (e.g., PDF_OBJREF_FLAG_UNREFERENCED).

generation¶

The generation of the object.

index¶

The index of the object if it’s a compressed object.

See also n.

n¶

The object type.

The following values are supported:

"f" - Free object.

"n" - Uncompressed object.

"N" - Compressed object.

offset¶

The offset of the object if it’s an uncompressed object.

See also n.

parent¶

The parent of the object if it’s a compressed object.

See also n.

class PDFObjectTable¶

Dictionary of int -> PDFObjectRef elements.

Methods:

clear()

Removes all items from the hash.

contains(key)

Checks whether key is present in the hash.

count(key)

Counts the number of values associated with key in the hash.

insert(key, value)

Inserts a new item with key and a value of value.

insertMulti(key, value)

Inserts a new item with key and a value of value.

isEmpty()

Checks whether the hash is empty.

iterator()

Creates an iterator for the hash.

remove(key)

Removes all the items that have key from the hash.

reserve(alloc)

Ensures that the internal hash table consists of at least alloc buckets.

size()

Returns the number of items in the hash.

take(key)

Removes the item with key from the hash and returns the value associated with it.

value(key[, defaultValue])

Returns the value associated with key.

clear() → None¶

Removes all items from the hash.

contains(key: int) → bool¶

Checks whether key is present in the hash.

Parameters

key (int) – The key value to check for.

Returns

Returns True if the hash contains an item with the key; otherwise returns False.

Return type

bool

See also count().

count(key: int) → int¶

Counts the number of values associated with key in the hash.

Parameters

key (int) – The key value.

Returns

Returns the number of items associated with the key.

Return type

int

See also contains().

insert(key: int, value: Pro.PDF.PDFObjectRef) → None¶

Inserts a new item with key and a value of value.

Parameters

key (int) – The key.

value (PDFObjectRef) – The value.

See also insertMulti().

insertMulti(key: int, value: Pro.PDF.PDFObjectRef) → None¶

Inserts a new item with key and a value of value.

If there is already an item with the same key in the hash, this method will simply create a new one. (This behaviour is different from insert(), which overwrites the value of an existing item.)

Parameters

key (int) – The key.

value (PDFObjectRef) – The value.

See also insert().

isEmpty() → bool¶

Checks whether the hash is empty.

Returns

Returns True if the hash contains no items; otherwise returns False.

Return type

bool

See also size().

iterator() → Pro.PDF.PDFObjectTableIt ¶

Creates an iterator for the hash.

Returns

Returns the iterator.

Return type

PDFObjectTableIt

remove(key: int) → int¶

Removes all the items that have key from the hash.

Parameters

key (int) – The key to remove.

Returns

Returns the number of items removed which is usually 1 but will be 0 if the key isn’t in the hash, or greater than 1 if insertMulti() has been used with the key.

Return type

int

See also clear() and take().

reserve(alloc: int) → None¶

Ensures that the internal hash table consists of at least alloc buckets.

Parameters

alloc (int) – The allocation size.

size() → int¶

Returns

Returns the number of items in the hash.

Return type

int

See also isEmpty() and count().

take(key: int) → Pro.PDF.PDFObjectRef ¶

Removes the item with key from the hash and returns the value associated with it.

If the item does not exist in the hash, the method simply returns a default-constructed value. If there are multiple items for key in the hash, only the most recently inserted one is removed.

If you don’t use the return value, remove() is more efficient.

Parameters

key (int) – The key.

Returns

Returns the removed value.

Return type

PDFObjectRef

See also remove().

value(key: int, defaultValue: Optional[Pro.PDF.PDFObjectRef] = None) → Pro.PDF.PDFObjectRef ¶

Returns the value associated with key. If the hash contains no item with key, the method returns a default-constructed value if defaultValue is not provided. If there are multiple items for key in the hash, the value of the most recently inserted one is returned.

Parameters

key (int) – The key.

defaultValue (Optional[PDFObjectRef]) – The default value to return if key is not present in the hash.

Returns

Returns the value associated with key.

Return type

PDFObjectRef

See also contains().

class PDFObjectTableIt(obj: Pro.PDF.PDFObjectTable)¶

Iterator class for PDFObjectTable.

Parameters

obj (PDFObjectTable) – The object to iterate over.

Methods:

hasNext()

Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.

hasPrevious()

Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.

key()

Returns the key of the last item that was jumped over using one of the traversal functions (previous(), next()).

next()

Returns the next item and advances the iterator by one position.

previous()

Returns the previous item and moves the iterator back by one position.

toBack()

Moves the iterator to the back of the container (after the last item).

toFront()

Moves the iterator to the front of the container (before the first item).

value()

Returns the value of the last item that was jumped over using one of the traversal functions (previous(), next()).

hasNext() → bool¶

Returns

Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.

Return type

bool

See also hasPrevious() and next().

hasPrevious() → bool¶

Returns

Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.

Return type

bool

See also hasNext() and previous().

key() → int¶

Returns

Returns the key of the last item that was jumped over using one of the traversal functions (previous(), next()).

Return type

int

See also value().

next() → tuple¶

Returns

Returns the next item as a (key, value) tuple and advances the iterator by one position.

Return type

tuple[int, PDFObjectRef]

See also hasNext() and previous().

previous() → tuple¶

Returns

Returns the previous item as a (key, value) tuple and moves the iterator back by one position.

Return type

tuple[int, PDFObjectRef]

See also hasPrevious() and next().

toBack() → None¶

Moves the iterator to the back of the container (after the last item).

See also toFront() and previous().

toFront() → None¶

Moves the iterator to the front of the container (before the first item).

See also toBack() and next().

value() → Pro.PDF.PDFObjectRef ¶

Returns

Returns the value of the last item that was jumped over using one of the traversal functions (previous(), next()).

Return type

PDFObjectRef

See also key().

PDF_CROSSREF_ENTRY_SIZE: Final[int]¶

Size of an entry in a cross-reference table.

PDF_CreateEncryptionKey(Password: bytes, O: bytes, P: int, ID_1: bytes, EncryptMetaData: bool, Revision: int, KeyLenInBytes: int) → tuple¶

Generates an encryption key for a PDF document.

Parameters

Password (bytes) – The password.

O (bytes) – The O parameter.

P (int) – The P parameter.

ID_1 (bytes) – The ID_1 parameter.

EncryptMetaData (bool) – The EncryptMetadata parameter.

Revision (int) – The cryptographic revision number.

KeyLenInBytes (int) – The length of the key in bytes.

Returns

Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.

Return type

tuple[bool, bytes]
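The parameter names of PDF_CreateEncryptionKey() match the inputs of the standard security handler's key derivation for revisions 2 to 4 (Algorithm 2 in ISO 32000-1). For reference, here is a minimal pure-Python sketch of that algorithm built on hashlib; it assumes, without confirmation from this reference, that the Cerbero function computes the same thing, and it mirrors the documented (bool, bytes) return shape. Revisions 5 and 6 use different, SHA-256 based algorithms (see the PDF_R5_* and PDF_R6_* functions). The helper name is hypothetical.

```python
import hashlib
import struct
from typing import Tuple

# Standard 32-byte password padding string from the PDF specification.
PASSWORD_PADDING = bytes([
    0x28, 0xBF, 0x4E, 0x5E, 0x4E, 0x75, 0x8A, 0x41,
    0x64, 0x00, 0x4E, 0x56, 0xFF, 0xFA, 0x01, 0x08,
    0x2E, 0x2E, 0x00, 0xB6, 0xD0, 0x68, 0x3E, 0x80,
    0x2F, 0x0C, 0xA9, 0xFE, 0x64, 0x53, 0x69, 0x7A,
])

def create_encryption_key(password: bytes, o: bytes, p: int, id_1: bytes,
                          encrypt_metadata: bool, revision: int,
                          key_len: int) -> Tuple[bool, bytes]:
    if revision not in (2, 3, 4):
        return False, b""  # revisions 5 and 6 use a different algorithm
    md5 = hashlib.md5()
    md5.update((password + PASSWORD_PADDING)[:32])   # pad or truncate to 32 bytes
    md5.update(o)                                    # the O entry
    md5.update(struct.pack("<I", p & 0xFFFFFFFF))    # P as 4 little-endian bytes
    md5.update(id_1)                                 # first element of the trailer ID
    if revision >= 4 and not encrypt_metadata:
        md5.update(b"\xff\xff\xff\xff")
    digest = md5.digest()
    n = 5 if revision == 2 else key_len              # revision 2 always uses 40-bit keys
    if revision >= 3:
        for _ in range(50):
            digest = hashlib.md5(digest[:n]).digest()
    return True, digest[:n]

ok, key = create_encryption_key(b"", b"\x11" * 32, -3904, b"\x22" * 16, True, 3, 16)
print(ok, key.hex())
```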

PDF_GenerateOwnerKey(OwnerKey: bytes, UserKey: bytes, Revision: int, KeyLenInBytes: int) → tuple¶

Generates an owner key for a PDF document.

Parameters

OwnerKey (bytes) – The optional owner key.

UserKey (bytes) – The optional user key.

Revision (int) – The cryptographic revision number.

KeyLenInBytes (int) – The length of the key in bytes.

Returns

Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.

Return type

tuple[bool, bytes]

PDF_GenerateUserKey(UserKey: bytes, O: bytes, P: int, ID_1: bytes, EncryptMetaData: bool, Revision: int, KeyLenInBytes: int) → tuple¶

Generates a user key for a PDF document.

Parameters

UserKey (bytes) – The optional user key.

O (bytes) – The O parameter.

P (int) – The P parameter.

ID_1 (bytes) – The ID_1 parameter.

EncryptMetaData (bool) – The EncryptMetadata parameter.

Revision (int) – The cryptographic revision number.

KeyLenInBytes (int) – The length of the key in bytes.

Returns

Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.

Return type

tuple[bool, bytes]

PDF_INVALID_OBJECT_REF: Final[int]¶

Invalid PDF object ID.

PDF_OBJREF_FLAG_UNREFERENCED: Final[int]¶

Flag for unreferenced PDF objects.

See also PDFObjectRef.flags.

PDF_R5_GenerateDecryptionKey(user_password: bytes, O: bytes, OE: bytes, U: bytes, UE: bytes) → tuple¶

Generates a revision 5 decryption key for a PDF document.

Parameters

user_password (bytes) – The optional user password.

O (bytes) – The O parameter.

OE (bytes) – The OE parameter.

U (bytes) – The U parameter.

UE (bytes) – The UE parameter.

Returns

Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.

Return type

tuple[bool, bytes]

PDF_R5_GenerateOwnerDecryptionKey(usrpwd: bytes, O: bytes, OE: bytes, U: bytes) → tuple¶

Generates a revision 5 owner decryption key for a PDF document.

Parameters

usrpwd (bytes) – The user password.

O (bytes) – The O parameter.

OE (bytes) – The OE parameter.

U (bytes) – The U parameter.

Returns

Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.

Return type

tuple[bool, bytes]

PDF_R5_GenerateUserDecryptionKey(usrpwd: bytes, U: bytes, UE: bytes) → tuple¶

Generates a revision 5 user decryption key for a PDF document.

Parameters

usrpwd (bytes) – The user password.

U (bytes) – The U parameter.

UE (bytes) – The UE parameter.

Returns

Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.

Return type

tuple[bool, bytes]

PDF_R6_GenerateDecryptionKey(user_password: bytes, O: bytes, OE: bytes, U: bytes, UE: bytes) → tuple¶

Generates a revision 6 decryption key for a PDF document.

Parameters

user_password (bytes) – The optional user password.

O (bytes) – The O parameter.

OE (bytes) – The OE parameter.

U (bytes) – The U parameter.

UE (bytes) – The UE parameter.

Returns

Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.

Return type

tuple[bool, bytes]

PDF_R6_GenerateUserDecryptionKey(usrpwd: bytes, U: bytes, UE: bytes) → tuple¶

Generates a revision 6 user decryption key for a PDF document.

Parameters

usrpwd (bytes) – The user password.

U (bytes) – The U parameter.

UE (bytes) – The UE parameter.

Returns

Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.

Return type

tuple[bool, bytes]