Pro.PDF — API for parsing PDF documents

Overview

The Pro.PDF module contains the API for parsing PDF documents.

PDF Parsing

The following code example demonstrates how to iterate through all the objects in a PDF document:

from Pro.Core import *
from Pro.PDF import *

def parsePDF(fname):
    # open the file
    c = createContainerFromFile(fname)
    if c.isNull():
        print("error: couldn't open file")
        return
    # load the file as PDF
    pdf = PDFObject()
    if not pdf.Load(c):
        print("error: invalid file format")
        return
    # parse all referenced objects
    objtable = pdf.BuildObjectTable()
    # detect unreferenced objects
    # (corrupted or malicious PDFs may contain them)
    pdf.DetectObjects(objtable)
    # store the object table internally
    pdf.SetObjectTable(objtable)
    # process PDF encryption
    if not pdf.ProcessEncryption():
        print("warning: couldn't decrypt file")
    # [optional] sort objects by ID
    oids = []
    it = objtable.iterator()
    while it.hasNext():
        oid, _ = it.next()
        oids.append(oid)
    oids.sort()
    # iterate through the objects
    for oid in oids:
        # print out the object id
        print("\nOBJECT ID:", oid >> 32, "\n")
        # parse the object
        ret, dictn, content, info = pdf.ParseObject(objtable, oid)
        if not ret:
            print("warning: couldn't parse object %d" % (oid >> 32,))
            continue
        # print out the object dictionary
        it = dictn.iterator()
        while it.hasNext():
            k, v = it.next()
            print("   ", k, "-", v)
        # print out the decoded object stream
        content = pdf.DecodeObjectStream(content, dictn, oid)
        if not content:
            continue
        out = NTTextBuffer()
        out.printHex(content)
        print("\n", out.buffer)
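
The example above prints oid >> 32 as the object number, which suggests that an object ID is a 64-bit value with the object number in its upper 32 bits. The following pure-Python sketch mirrors that layout. Placing the generation in the lower 32 bits is an assumption made here for illustration, and the helper names are hypothetical; in real code, prefer PDFObject.OBJID(), PDFObject.OBJIDNUM() and PDFObject.OBJIDGEN().

```python
# Illustrative stand-ins for OBJID/OBJIDNUM/OBJIDGEN (not the library's code).
# Assumption: object number in the upper 32 bits, generation in the lower 32.
MASK32 = 0xFFFFFFFF

def make_objid(number, generation=0):
    """Pack an object number and a generation into one integer."""
    return ((number & MASK32) << 32) | (generation & MASK32)

def objid_number(oid):
    """Extract the object number (the value printed by the example above)."""
    return (oid >> 32) & MASK32

def objid_generation(oid):
    """Extract the generation (assumed to be stored in the lower 32 bits)."""
    return oid & MASK32

oid = make_objid(10, 1)
print(objid_number(oid), objid_generation(oid))
```

Sorting packed IDs, as the example does with oids.sort(), then orders objects by number first under this layout.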

Hint

Because PDF parsing can be complex, it is often easier to let the scan engine extract artifacts.

In the following example, a hook is used to extract JavaScript code from a PDF document:

from Pro.Core import *

def printJSEntry(sp, xml, tnode):
    # data node
    dnode = xml.findChild(tnode, "d")
    if not dnode:
        return
    # we let the scan engine extract the JavaScript for us
    params = NTStringVariantHash()
    params.insert("op", "js")
    idnode = xml.findChild(dnode, "id")
    if idnode:
        params.insert("id", int(xml.value(idnode), 16))
    ridnode = xml.findChild(dnode, "rid")
    if ridnode:
        params.insert("rid", int(xml.value(ridnode), 16))
    js = sp.customOperation(params)
    # print out the JavaScript
    print("JS CODE")
    print("-------")
    print(js)

def pdfExtractJS(sp, ud):
    xml = sp.getReportXML()
    # object node
    onode = xml.findChild(None, "o")
    if onode:
        # scan node
        snode = xml.findChild(onode, "s")
        if snode:
            # enumerate scan entries
            tchild = xml.firstChild(snode)
            while tchild:
                if xml.name(tchild) == "t":
                    # type attribute
                    tattr = xml.findAttribute(tchild, "t")
                    # check if it's a JavaScript entry
                    if tattr and int(xml.value(tattr)) == CT_JavaScript:
                        printJSEntry(sp, xml, tchild)
                tchild = xml.nextSibling(tchild)

Module API

Pro.PDF module API.

Classes:

PDFCrossRefTable()

This class represents a PDF cross-reference table.

PDFCrossRefTableList()

List of PDFCrossRefTable elements.

PDFCrossRefTableListIt(obj)

Iterator class for PDFCrossRefTableList.

PDFCrossRefTableSection()

This class represents a section of a PDF cross-reference table.

PDFCrossRefTableSectionList()

List of PDFCrossRefTableSection elements.

PDFCrossRefTableSectionListIt(obj)

Iterator class for PDFCrossRefTableSectionList.

PDFObject()

This class represents a PDF document.

PDFObjectParseInfo()

This class contains the information of a parsed PDF object.

PDFObjectRef()

This class represents a reference to a PDF object.

PDFObjectTable()

Dictionary of int -> PDFObjectRef elements.

PDFObjectTableIt(obj)

Iterator class for PDFObjectTable.

Attributes:

PDF_CROSSREF_ENTRY_SIZE

Size of an entry in a cross-reference table.

PDF_INVALID_OBJECT_REF

Invalid PDF object id.

PDF_OBJREF_FLAG_UNREFERENCED

Flag for unreferenced PDF objects.

Functions:

PDF_CreateEncryptionKey(Password, O, P, …)

Generates an encryption key for a PDF document.

PDF_GenerateOwnerKey(OwnerKey, UserKey, …)

Generates an owner key for a PDF document.

PDF_GenerateUserKey(UserKey, O, P, ID_1, …)

Generates a user key for a PDF document.

PDF_R5_GenerateDecryptionKey(user_password, …)

Generates a revision 5 decryption key for a PDF document.

PDF_R5_GenerateOwnerDecryptionKey(usrpwd, O, …)

Generates a revision 5 owner decryption key for a PDF document.

PDF_R5_GenerateUserDecryptionKey(usrpwd, U, UE)

Generates a revision 5 user decryption key for a PDF document.

PDF_R6_GenerateDecryptionKey(user_password, …)

Generates a revision 6 decryption key for a PDF document.

PDF_R6_GenerateUserDecryptionKey(usrpwd, U, UE)

Generates a revision 6 user decryption key for a PDF document.

class PDFCrossRefTable

This class represents a PDF cross-reference table.

Attributes:

Prev

Offset of the previous cross-reference table if available; otherwise Pro.Core.INVALID_STREAM_OFFSET.

sections

List of PDFCrossRefTableSection.

trailer

Trailer dictionary.

Prev

Offset of the previous cross-reference table if available; otherwise Pro.Core.INVALID_STREAM_OFFSET.

sections

List of PDFCrossRefTableSection.

See also PDFCrossRefTableSectionList.

trailer

Trailer dictionary.

See also Pro.Core.NTStringStringHash.

class PDFCrossRefTableList

List of PDFCrossRefTable elements.

Methods:

append(value)

Inserts value at the end of the list.

at(i)

Returns the item at index position i in the list.

clear()

Removes all items from the list.

contains(value)

Checks the presence of an element in the list.

count(value)

Returns the number of occurrences of value in the list.

indexOf(value[, start])

Searches for an element in the list.

insert(i, value)

Inserts value at index position i in the list.

isEmpty()

Checks whether the list is empty.

iterator()

Creates an iterator for the list.

removeAll(value)

Removes all occurrences of value in the list and returns the number of entries removed.

removeAt(i)

Removes the item at index position i.

reserve(alloc)

Reserves space for alloc elements.

size()

Returns the number of items in the list.

takeAt(i)

Removes the item at index position i and returns it.

append(value: Pro.PDF.PDFCrossRefTable)None

Inserts value at the end of the list.

Parameters

value (PDFCrossRefTable) – The value to add to the list.

See also insert().

at(i: int)Pro.PDF.PDFCrossRefTable

Returns the item at index position i in the list. i must be a valid index position in the list (i.e., 0 <= i < size()).

Parameters

i (int) – The index of the element to return.

Returns

Returns the requested element.

Return type

PDFCrossRefTable

clear()None

Removes all items from the list.

contains(value: Pro.PDF.PDFCrossRefTable)bool

Checks the presence of an element in the list.

Parameters

value (PDFCrossRefTable) – The value to check for.

Returns

Returns True if the list contains an occurrence of value; otherwise returns False.

Return type

bool

See also indexOf() and count().

count(value: Pro.PDF.PDFCrossRefTable)int

Returns the number of occurrences of value in the list.

Parameters

value (PDFCrossRefTable) – The value to count.

Returns

Returns the number of occurrences.

Return type

int

See also indexOf() and contains().

indexOf(value: Pro.PDF.PDFCrossRefTable, start: int = 0)int

Searches for an element in the list.

Parameters
  • value (PDFCrossRefTable) – The value to search for.

  • start (int) – The start index.

Returns

Returns the index position of the first occurrence of value in the list. Returns -1 if no item was found.

Return type

int

See also contains().

insert(i: int, value: Pro.PDF.PDFCrossRefTable)None

Inserts value at index position i in the list. If i is 0, the value is prepended to the list. If i is size(), the value is appended to the list.

Parameters
  • i (int) – The position at which to add the value.

  • value (PDFCrossRefTable) – The value to add.

See also append() and removeAt().

isEmpty()bool

Checks whether the list is empty.

Returns

Returns True if the list contains no items; otherwise returns False.

Return type

bool

See also size().

iterator()Pro.PDF.PDFCrossRefTableListIt

Creates an iterator for the list.

Returns

Returns the iterator.

Return type

PDFCrossRefTableListIt

removeAll(value: Pro.PDF.PDFCrossRefTable)int

Removes all occurrences of value in the list and returns the number of entries removed.

Parameters

value (PDFCrossRefTable) – The value to remove from the list.

Returns

Returns the number of entries removed.

Return type

int

See also removeAt().

removeAt(i: int)None

Removes the item at index position i. i must be a valid index position in the list (i.e., 0 <= i < size()).

Parameters

i (int) – The index of the item to remove.

See also removeAll().

reserve(alloc: int)None

Reserves space for alloc elements. Calling this method doesn’t change the size of the list.

Parameters

alloc (int) – The number of elements to reserve space for.

size()int
Returns

Returns the number of items in the list.

Return type

int

See also isEmpty().

takeAt(i: int)Pro.PDF.PDFCrossRefTable

Removes the item at index position i and returns it. i must be a valid index position in the list (i.e., 0 <= i < size()).

Parameters

i (int) – The index of the element to remove from the list.

Returns

Returns the removed element. If you don’t use the return value, removeAt() is more efficient.

Return type

PDFCrossRefTable

See also removeAt().

class PDFCrossRefTableListIt(obj: Pro.PDF.PDFCrossRefTableList)

Iterator class for PDFCrossRefTableList.

Parameters

obj (PDFCrossRefTableList) – The object to iterate over.

Methods:

hasNext()

Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.

hasPrevious()

Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.

next()

Returns the next item and advances the iterator by one position.

previous()

Returns the previous item and moves the iterator back by one position.

toBack()

Moves the iterator to the back of the container (after the last item).

toFront()

Moves the iterator to the front of the container (before the first item).

hasNext()bool
Returns

Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.

Return type

bool

See also hasPrevious() and next().

hasPrevious()bool
Returns

Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.

Return type

bool

See also hasNext() and previous().

next()Pro.PDF.PDFCrossRefTable
Returns

Returns the next item and advances the iterator by one position.

Return type

PDFCrossRefTable

See also hasNext() and previous().

previous()Pro.PDF.PDFCrossRefTable
Returns

Returns the previous item and moves the iterator back by one position.

Return type

PDFCrossRefTable

See also hasPrevious() and next().

toBack()None

Moves the iterator to the back of the container (after the last item).

See also toFront() and previous().

toFront()None

Moves the iterator to the front of the container (before the first item).

See also toBack() and next().
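
The iterator methods above can be wrapped in Python generators so that containers work with a regular for loop. This is a minimal sketch that relies only on the methods documented for this class (iterator(), hasNext(), next(), toBack(), hasPrevious(), previous()); the function names iterate and iterate_reversed are hypothetical helpers, not part of the module.

```python
def iterate(container):
    """Yield each item of a container that exposes iterator()."""
    it = container.iterator()
    while it.hasNext():
        yield it.next()

def iterate_reversed(it):
    """Yield items from last to first using an existing iterator object."""
    it.toBack()                 # position after the last item
    while it.hasPrevious():
        yield it.previous()
```

For example, for table in iterate(pdf.EnumerateCrossRefTables()) visits each cross-reference table in list order.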

class PDFCrossRefTableSection

This class represents a section of a PDF cross-reference table.

See also PDFCrossRefTable.

Attributes:

array_offset

The offset of the cross-reference section entries.

count

The number of entries.

start_id

The id of the first object in the array.

array_offset

The offset of the cross-reference section entries.

count

The number of entries.

start_id

The id of the first object in the array.

See also PDFObject.OBJID().
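
As an illustration of how tables and sections nest, the following sketch flattens a list of cross-reference tables into plain tuples. It uses only the documented size(), at() and sections members plus the three section attributes; the function name summarize_cross_ref_tables is hypothetical, and the values are reported as stored without interpreting start_id.

```python
def summarize_cross_ref_tables(tables):
    """Return one (table_index, start_id, count, array_offset) tuple per section.

    `tables` is expected to behave like a PDFCrossRefTableList, for example
    the value returned by PDFObject.EnumerateCrossRefTables().
    """
    rows = []
    for ti in range(tables.size()):
        sections = tables.at(ti).sections
        for si in range(sections.size()):
            sec = sections.at(si)
            rows.append((ti, sec.start_id, sec.count, sec.array_offset))
    return rows
```

An empty result means that no cross-reference table was found, which matches the empty list that EnumerateCrossRefTables() returns on failure.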

class PDFCrossRefTableSectionList

List of PDFCrossRefTableSection elements.

Methods:

append(value)

Inserts value at the end of the list.

at(i)

Returns the item at index position i in the list.

clear()

Removes all items from the list.

contains(value)

Checks the presence of an element in the list.

count(value)

Returns the number of occurrences of value in the list.

indexOf(value[, start])

Searches for an element in the list.

insert(i, value)

Inserts value at index position i in the list.

isEmpty()

Checks whether the list is empty.

iterator()

Creates an iterator for the list.

removeAll(value)

Removes all occurrences of value in the list and returns the number of entries removed.

removeAt(i)

Removes the item at index position i.

reserve(alloc)

Reserves space for alloc elements.

size()

Returns the number of items in the list.

takeAt(i)

Removes the item at index position i and returns it.

append(value: Pro.PDF.PDFCrossRefTableSection)None

Inserts value at the end of the list.

Parameters

value (PDFCrossRefTableSection) – The value to add to the list.

See also insert().

at(i: int)Pro.PDF.PDFCrossRefTableSection

Returns the item at index position i in the list. i must be a valid index position in the list (i.e., 0 <= i < size()).

Parameters

i (int) – The index of the element to return.

Returns

Returns the requested element.

Return type

PDFCrossRefTableSection

clear()None

Removes all items from the list.

contains(value: Pro.PDF.PDFCrossRefTableSection)bool

Checks the presence of an element in the list.

Parameters

value (PDFCrossRefTableSection) – The value to check for.

Returns

Returns True if the list contains an occurrence of value; otherwise returns False.

Return type

bool

See also indexOf() and count().

count(value: Pro.PDF.PDFCrossRefTableSection)int

Returns the number of occurrences of value in the list.

Parameters

value (PDFCrossRefTableSection) – The value to count.

Returns

Returns the number of occurrences.

Return type

int

See also indexOf() and contains().

indexOf(value: Pro.PDF.PDFCrossRefTableSection, start: int = 0)int

Searches for an element in the list.

Parameters
  • value (PDFCrossRefTableSection) – The value to search for.

  • start (int) – The start index.

Returns

Returns the index position of the first occurrence of value in the list. Returns -1 if no item was found.

Return type

int

See also contains().

insert(i: int, value: Pro.PDF.PDFCrossRefTableSection)None

Inserts value at index position i in the list. If i is 0, the value is prepended to the list. If i is size(), the value is appended to the list.

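Parameters
  • i (int) – The position at which to add the value.

  • value (PDFCrossRefTableSection) – The value to add.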

See also append() and removeAt().

isEmpty()bool

Checks whether the list is empty.

Returns

Returns True if the list contains no items; otherwise returns False.

Return type

bool

See also size().

iterator()Pro.PDF.PDFCrossRefTableSectionListIt

Creates an iterator for the list.

Returns

Returns the iterator.

Return type

PDFCrossRefTableSectionListIt

removeAll(value: Pro.PDF.PDFCrossRefTableSection)int

Removes all occurrences of value in the list and returns the number of entries removed.

Parameters

value (PDFCrossRefTableSection) – The value to remove from the list.

Returns

Returns the number of entries removed.

Return type

int

See also removeAt().

removeAt(i: int)None

Removes the item at index position i. i must be a valid index position in the list (i.e., 0 <= i < size()).

Parameters

i (int) – The index of the item to remove.

See also removeAll().

reserve(alloc: int)None

Reserves space for alloc elements. Calling this method doesn’t change the size of the list.

Parameters

alloc (int) – The number of elements to reserve space for.

size()int
Returns

Returns the number of items in the list.

Return type

int

See also isEmpty().

takeAt(i: int)Pro.PDF.PDFCrossRefTableSection

Removes the item at index position i and returns it. i must be a valid index position in the list (i.e., 0 <= i < size()).

Parameters

i (int) – The index of the element to remove from the list.

Returns

Returns the removed element. If you don’t use the return value, removeAt() is more efficient.

Return type

PDFCrossRefTableSection

See also removeAt().

class PDFCrossRefTableSectionListIt(obj: Pro.PDF.PDFCrossRefTableSectionList)

Iterator class for PDFCrossRefTableSectionList.

Parameters

obj (PDFCrossRefTableSectionList) – The object to iterate over.

Methods:

hasNext()

Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.

hasPrevious()

Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.

next()

Returns the next item and advances the iterator by one position.

previous()

Returns the previous item and moves the iterator back by one position.

toBack()

Moves the iterator to the back of the container (after the last item).

toFront()

Moves the iterator to the front of the container (before the first item).

hasNext()bool
Returns

Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.

Return type

bool

See also hasPrevious() and next().

hasPrevious()bool
Returns

Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.

Return type

bool

See also hasNext() and previous().

next()Pro.PDF.PDFCrossRefTableSection
Returns

Returns the next item and advances the iterator by one position.

Return type

PDFCrossRefTableSection

See also hasNext() and previous().

previous()Pro.PDF.PDFCrossRefTableSection
Returns

Returns the previous item and moves the iterator back by one position.

Return type

PDFCrossRefTableSection

See also hasPrevious() and next().

toBack()None

Moves the iterator to the back of the container (after the last item).

See also toFront() and previous().

toFront()None

Moves the iterator to the front of the container (before the first item).

See also toBack() and next().

class PDFObject

Bases: Pro.Core.CFFObject

This class represents a PDF document.

Methods:

AddManuallyObjectToTable(objtable, offset)

Adds a PDF object to a PDF object table.

BuildObjectTable([xref_offset])

Creates a PDF object table.

BuildStringObjectFromBytes(bytes)

Builds a string object from raw bytes.

CatalogTreeToInverseHash(catalog)

Converts a PDF catalog tree into an inverse hash mapping object IDs to page numbers.

ComputeCatalogTree(objtable[, eid])

Computes the catalog tree of the PDF document.

CountUncompressedObjects(objtable)

Calculates the number of uncompressed objects.

CurrentEOF()

Returns the offset of the “%%EOF” string.

DecodeNameObject(raw)

Decodes a name object.

DecodeObjectStream(raw, dictionary_or_filter)

Decodes the stream of an object.

DecodeObjectStreamEx(raw, dictionary[, eid])

Decodes the stream of an object.

DecodeObjectStreamWithFilter(raw, filter[, …])

Decodes the stream of an object.

DecodingOperationsFinished()

Closes helper processes if spawned.

DetectObjects(objtable)

Detects unreferenced objects in the PDF document.

EnableFilter(type[, b])

Sets whether the specified decoding filter is enabled.

EnumerateCrossRefTables([xref_offset])

Enumerates cross-reference tables in a PDF document.

FindObjects(objtable, pathstr[, compressed])

Finds objects matching the specified criteria.

FlattenCatalogTree(catalog)

Converts a PDF catalog tree into a list of pages in their correct order as object IDs.

GetDictValue(dict, key[, dflt])

Gets the value from a PDF dictionary.

GetEOF()

Finds the position of the “%%EOF” string.

GetElement(offset)

Retrieves a PDF element such as a dictionary, a list or other objects.

GetElementSize(offset_or_str[, _from])

Calculates the size of a PDF element such as a dictionary, a list or other objects.

GetFilterDefaultParameters(filter)

Retrieves the default parameters for a filter.

GetJBIG2DecodeOptions()

Returns the decoding options for JBIG2 streams.

GetJBIG2DecodeTimeout()

Returns the time-out value for the JBIG2 decoding process.

GetJBIG2LibraryVersion()

Returns the version of the JBIG2 library used for decoding.

GetObjectContent(objtable, eid)

Retrieves the decoded stream of a PDF object and its dictionary.

GetObjectContentEx(objtable, eid)

Retrieves the decoded stream of a PDF object and its dictionary.

GetObjectTable()

Returns the internally stored object table.

GetStartXRef([pos])

Retrieves the start cross-reference offset.

GetStartXRefEx(pos)

Retrieves the start cross-reference offset.

GetStringObjectBytes(str)

Converts a string object back to its original bytes.

GetSupportedFilterNames()

Returns the list of supported decoding filter names.

GetTrailer(i)

Retrieves a specified trailer dictionary.

GetTrailers()

Returns the list of trailer dictionaries.

HasEncryption()

Returns True if the PDF has encryption; otherwise returns False.

HexStringSize(str[, _from])

Computes the size of a hex string object.

IsContainer(estr)

Checks whether an element is either a dictionary or a list.

IsDecrypted()

Returns True if the PDF doesn’t have encryption or was decrypted; otherwise returns False.

IsFilterEnabled(type)

Checks whether the specified decoding filter is enabled.

IsValidPDF()

Returns True if the PDF document has a “%%EOF” signature; otherwise returns False.

LiteralStringSize(str[, _from])

Calculates the size of a literal string.

OBJID(id, generation)

Creates an object ID from its number and generation.

OBJIDGEN(oid)

Retrieves the object generation from an object ID.

OBJIDNUM(oid)

Retrieves the object number from an object ID.

ObjectToString(eid_or_objtable, eid_or_ref)

Converts an object ID to a string.

PDFValueLength(offset_or_str[, _from])

Calculates the size of a value.

ParseContainerElement(estr[, eid])

Parses a container element such as a dictionary or a list.

ParseCrossRefEntry(bytes)

Parses a cross-reference entry.

ParseObject(objtable, eid)

Parses a PDF object without decoding its stream.

ParseObjectContent(objtable, eid)

Retrieves the stream of a PDF object without decoding it.

ParseObjectDictionary(objtable, eid)

Retrieves the dictionary of a PDF object.

ParseObjectInfo(objtable, eid)

Retrieves the parsing information of a PDF object.

ParseObjectName(ref)

Converts an object name such as “10 1 obj” into its ID.

ParseObjectRef(ref)

Converts an object reference such as “10 1 R” into its ID.

ParseObjectRefOrName(ref, parse_ref)

Converts an object name or reference into its ID.

ProcessEncryption()

Decrypts the PDF document if encrypted.

ReadHexString(offset)

Reads a hex string.

ReadLiteralString(offset[, encrypted_string])

Reads a literal string.

RegularCharsLength(offset_or_str[, _from])

Calculates the length of characters not interrupted by reserved characters.

SetJBIG2DecodeOptions(options)

Sets the decoding options for JBIG2 streams.

SetJBIG2DecodeTimeout(timeout)

Sets the JBIG2 decoding time-out value.

SetJBIG2LibraryVersion(version)

Sets the version of the JBIG2 library used for decoding.

SetObjectTable(objtable)

Sets the internally stored object table.

SkipEmptyChars(offset_or_str[, _from_or_down])

Skips empty characters and comments.

SkipNewLine(offset)

Skips new-line characters if present.

UnescapeLiteralString(str)

Unescapes a literal string.

Unpredict(raw, filter, parms[, eid])

Removes PNG prediction on input data.

Attributes:

FilterType_ASCII85Decode

ASCII85Decode filter type.

FilterType_ASCIIHexDecode

ASCIIHexDecode filter type.

FilterType_CCITTFaxDecode

CCITTFaxDecode filter type.

FilterType_DCTDecode

DCTDecode filter type.

FilterType_FlateDecode

FlateDecode filter type.

FilterType_JBIG2Decode

JBIG2Decode filter type.

FilterType_JPXDecode

JPXDecode filter type.

FilterType_LZWDecode

LZWDecode filter type.

FilterType_RunLengthDecode

RunLengthDecode filter type.

JBIG2DecodeOpt_HelperProcess

JBIG2 decoding option to decode JBIG2 streams in a separate process.

JBIG2DecodeOpt_InProcess

Default JBIG2 decoding option to decode JBIG2 streams in the same process.

JBIG2DecodeOpt_NoDecode

JBIG2 decoding option to disable the decoding of JBIG2 streams.

AddManuallyObjectToTable(objtable: Pro.PDF.PDFObjectTable, offset: int)bool

Adds a PDF object to a PDF object table.

Parameters
  • objtable (PDFObjectTable) – The object table.

  • offset (int) – The offset of the object.

Returns

Returns True if successful; otherwise returns False.

Return type

bool

See also BuildObjectTable(), DetectObjects() and SetObjectTable().

BuildObjectTable(xref_offset: int = INVALID_STREAM_OFFSET)Pro.PDF.PDFObjectTable

Creates a PDF object table.

Parameters

xref_offset (int) – The optional offset of the cross-reference table.

Returns

Returns the object table if successful; otherwise returns an empty PDFObjectTable instance.

Return type

PDFObjectTable

See also DetectObjects(), SetObjectTable() and AddManuallyObjectToTable().

BuildStringObjectFromBytes(bytes: bytes)str

Builds a string object from raw bytes.

This method is used to construct strings after decryption.

Parameters

bytes (bytes) – The input data.

Returns

Returns the string object if successful; otherwise returns an empty string.

Return type

str

CatalogTreeToInverseHash(catalog: Pro.Core.NTMaxUIntTree)Pro.Core.NTUInt64UIntHash

Converts a PDF catalog tree into an inverse hash mapping object IDs to page numbers.

Parameters

catalog (NTMaxUIntTree) – The catalog tree.

Returns

Returns the inverse hash if successful; otherwise returns an empty Pro.Core.NTUInt64UIntHash instance.

Return type

NTUInt64UIntHash

See also ComputeCatalogTree() and FlattenCatalogTree().

ComputeCatalogTree(objtable: Pro.PDF.PDFObjectTable, eid: int = PDF_INVALID_OBJECT_REF)Pro.Core.NTMaxUIntTree

Computes the catalog tree of the PDF document.

Parameters
  • objtable (PDFObjectTable) – The object table.

  • eid (int) – The optional catalog tree object ID.

Returns

Returns the computed catalog tree if successful; otherwise returns an empty Pro.Core.NTMaxUIntTree instance.

Return type

NTMaxUIntTree

See also CatalogTreeToInverseHash() and FlattenCatalogTree().

CountUncompressedObjects(objtable: Pro.PDF.PDFObjectTable)int

Calculates the number of uncompressed objects.

Parameters

objtable (PDFObjectTable) – The object table.

Returns

Returns the number of uncompressed objects.

Return type

int

See also PDFObjectRef.n.

CurrentEOF()int
Returns

Returns the offset of the “%%EOF” string.

Return type

int

DecodeNameObject(raw: str)str

Decodes a name object.

For instance, it converts something like “/Adobe#20Green” to “/Adobe Green”.

Parameters

raw (str) – The raw name object.

Returns

Returns the decoded name object.

Return type

str
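
To make the conversion concrete, here is a pure-Python sketch of the #xx escape rule used by PDF name objects, where each # followed by two hexadecimal digits stands for one character code. It is an illustration of the rule, not the library's implementation; the function name decode_pdf_name is hypothetical, and mapping each code directly to a character is a simplification.

```python
import re

def decode_pdf_name(raw):
    """Replace each #xx escape (two hex digits) with the character it encodes."""
    return re.sub(r"#([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)),
                  raw)

print(decode_pdf_name("/Adobe#20Green"))  # prints "/Adobe Green"
```

Names without escapes pass through unchanged.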

DecodeObjectStream(raw: bytes, dictionary_or_filter: Union[Pro.Core.NTStringStringHash, str], eid_or_parms: Union[Pro.Core.NTStringStringHash, int] = NTStringStringHashList(), eid: int = PDF_INVALID_OBJECT_REF)bytes

Decodes the stream of an object.

Parameters
  • raw (bytes) – The stream data.

  • dictionary_or_filter (Union[NTStringStringHash, str]) – Either the dictionary of the PDF object or the name of the filter.

  • eid_or_parms (Union[NTStringStringHash, int]) – Either the object ID or the parameters of the filter if the dictionary wasn’t provided.

  • eid (int) – The object ID if the dictionary wasn’t provided.

Returns

Returns the decoded data if successful; otherwise returns an empty bytes object.

Return type

bytes

See also DecodeObjectStreamEx(), DecodeObjectStreamWithFilter() and ParseObject().

DecodeObjectStreamEx(raw: bytes, dictionary: Pro.Core.NTStringStringHash, eid: Optional[int] = None)tuple

Decodes the stream of an object.

Parameters
  • raw (bytes) – The stream data.

  • dictionary (NTStringStringHash) – The dictionary of the PDF object.

  • eid (Optional[int]) – The object ID.

Returns

Returns a tuple containing the decoded data and an error string.

Return type

tuple[bytes, str]

See also DecodeObjectStream(), DecodeObjectStreamWithFilter() and ParseObject().

DecodeObjectStreamWithFilter(raw: bytes, filter: str, parms: Optional[Pro.Core.NTStringStringHash] = None, eid: Optional[int] = None)tuple

Decodes the stream of an object.

Parameters
  • raw (bytes) – The stream data.

  • filter (str) – The name of the filter.

  • parms (Optional[NTStringStringHash]) – The parameters of the filter.

  • eid (Optional[int]) – The object ID.

Returns

Returns a tuple containing the decoded data and an error string.

Return type

tuple[bytes, str]

See also DecodeObjectStream(), DecodeObjectStreamEx() and ParseObject().
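
As a rough picture of what filter-based decoding involves, the following standalone sketch decodes two of the standard PDF filters with the Python standard library and returns a (data, error) tuple shaped like the one described above. It is not the library's implementation: the function name decode_with_filter is hypothetical, filter parameters are ignored, and using an empty error string to signal success is an assumption of this sketch.

```python
import binascii
import zlib

def decode_with_filter(raw, filter_name):
    """Decode `raw` with a single named filter; return (data, error)."""
    name = filter_name.lstrip("/")
    try:
        if name == "FlateDecode":
            # FlateDecode streams are zlib/deflate compressed.
            return zlib.decompress(raw), ""
        if name == "ASCIIHexDecode":
            # Keep everything before the '>' end-of-data marker, drop whitespace.
            digits = b"".join(raw.split(b">", 1)[0].split())
            if len(digits) % 2:
                digits += b"0"  # an odd final digit is padded with zero
            return binascii.unhexlify(digits), ""
    except (zlib.error, binascii.Error) as exc:
        return b"", str(exc)
    return b"", "unsupported filter: " + name
```

Returning the error alongside the data lets a caller distinguish an empty stream from a failed decode, which a bare bytes return value cannot express.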

DecodingOperationsFinished()None

Closes helper processes if spawned.

Note

This method should be used only in conjunction with JBIG2DecodeOpt_HelperProcess.

See also SetJBIG2DecodeOptions().
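
A typical pattern is to enable the helper process, run the decoding work, and always close the helper afterwards. The sketch below assumes that the value returned by GetJBIG2DecodeOptions() can be passed back to SetJBIG2DecodeOptions() to restore the previous setting; the function name decode_with_jbig2_helper and the decode callback are hypothetical.

```python
def decode_with_jbig2_helper(pdf, decode):
    """Run `decode(pdf)` with JBIG2 decoding moved to a helper process."""
    previous = pdf.GetJBIG2DecodeOptions()
    pdf.SetJBIG2DecodeOptions(pdf.JBIG2DecodeOpt_HelperProcess)
    try:
        return decode(pdf)
    finally:
        # Close any helper process that was spawned, then restore the options.
        pdf.DecodingOperationsFinished()
        pdf.SetJBIG2DecodeOptions(previous)
```

Using try/finally ensures the helper process is closed even if decoding raises an exception.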

DetectObjects(objtable: Pro.PDF.PDFObjectTable)None

Detects unreferenced objects in the PDF document.

Hint

This method can be called after BuildObjectTable() to detect additional unreferenced objects.

Parameters

objtable (PDFObjectTable) – The PDF object table.

See also BuildObjectTable(), SetObjectTable() and AddManuallyObjectToTable().

EnableFilter(type: int, b: bool = True)None

Sets whether the specified decoding filter is enabled.

Note

By default all decoding filters are enabled.

Parameters
  • type (int) – The decoding filter (e.g., ASCIIHexDecode).

  • b (bool) – If True, enables the filter; otherwise disables it.

See also IsFilterEnabled().
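
When a filter should be skipped only for part of an analysis, its previous state can be saved and restored. The following sketch wraps EnableFilter() and IsFilterEnabled() in a context manager; the name filter_disabled is a hypothetical helper, not part of the module.

```python
from contextlib import contextmanager

@contextmanager
def filter_disabled(pdf, filter_type):
    """Temporarily disable one decoding filter, then restore its prior state."""
    was_enabled = pdf.IsFilterEnabled(filter_type)
    pdf.EnableFilter(filter_type, False)
    try:
        yield pdf
    finally:
        pdf.EnableFilter(filter_type, was_enabled)
```

For example, with filter_disabled(pdf, pdf.FilterType_JBIG2Decode): keeps JBIG2 streams undecoded inside the block and re-enables the filter afterwards only if it was enabled before.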

EnumerateCrossRefTables(xref_offset: int = INVALID_STREAM_OFFSET)Pro.PDF.PDFCrossRefTableList

Enumerates cross-reference tables in a PDF document.

Note

This method is called internally by BuildObjectTable().

Parameters

xref_offset (int) – The optional offset of the first cross-reference table.

Returns

Returns the list of cross-reference tables if successful; otherwise returns an empty PDFCrossRefTableList instance.

Return type

PDFCrossRefTableList

See also BuildObjectTable().

FilterType_ASCII85Decode: Final[int]

ASCII85Decode filter type.

See also EnableFilter() and IsFilterEnabled().

FilterType_ASCIIHexDecode: Final[int]

ASCIIHexDecode filter type.

See also EnableFilter() and IsFilterEnabled().

FilterType_CCITTFaxDecode: Final[int]

CCITTFaxDecode filter type.

See also EnableFilter() and IsFilterEnabled().

FilterType_DCTDecode: Final[int]

DCTDecode filter type.

See also EnableFilter() and IsFilterEnabled().

FilterType_FlateDecode: Final[int]

FlateDecode filter type.

See also EnableFilter() and IsFilterEnabled().

FilterType_JBIG2Decode: Final[int]

JBIG2Decode filter type.

See also EnableFilter() and IsFilterEnabled().

FilterType_JPXDecode: Final[int]

JPXDecode filter type.

See also EnableFilter() and IsFilterEnabled().

FilterType_LZWDecode: Final[int]

LZWDecode filter type.

See also EnableFilter() and IsFilterEnabled().

FilterType_RunLengthDecode: Final[int]

RunLengthDecode filter type.

See also EnableFilter() and IsFilterEnabled().

FindObjects(objtable: Pro.PDF.PDFObjectTable, pathstr: str, compressed: bool = True)Pro.Core.NTUInt64List

Finds objects matching the specified criteria.

Parameters
  • objtable (PDFObjectTable) – The object table.

  • pathstr (str) – The search criteria. This can match multiple keys as well as specify allowed values (e.g., "Parent;Type|T;A|B").

  • compressed (bool) – If True, includes compressed objects in the search.

Returns

Returns a list of matching objects.

Return type

NTUInt64List

FlattenCatalogTree(catalog: Pro.Core.NTMaxUIntTree)Pro.Core.NTUInt64List

Converts a PDF catalog tree into a list of pages in their correct order as object IDs.

Parameters

catalog (NTMaxUIntTree) – The catalog tree to convert.

Returns

Returns the list if successful; otherwise returns an empty Pro.Core.NTUInt64List instance.

Return type

NTUInt64List

See also ComputeCatalogTree() and CatalogTreeToInverseHash().

GetDictValue(dict: Pro.Core.NTStringStringHash, key: str, dflt: str = str())str

Gets the value from a PDF dictionary.

This method automatically resolves object references.

Parameters
  • dict (NTStringStringHash) – The dictionary.

  • key (str) – The key of the value to extract. A sub-key can be specified using the semicolon character as a separator.

  • dflt (str) – The default value.

Returns

Returns the value from the dictionary if successful; otherwise returns the default value.

Return type

str

GetEOF()int

Finds the position of the “%%EOF” string.

Returns

Returns the offset if successful; otherwise returns Pro.Core.INVALID_STREAM_OFFSET.

Return type

int

See also CurrentEOF().

GetElement(offset: int)tuple

Retrieves a PDF element such as a dictionary, a list or other objects.

Parameters

offset (int) – The offset of the element.

Returns

Returns a tuple containing the element as a string and its size if successful; otherwise returns a tuple containing an empty string and -1.

Return type

tuple[str, int]

See also GetElementSize().

GetElementSize(offset_or_str: Union[int, str], _from: int = 0)int

Calculates the size of a PDF element such as a dictionary, a list or other objects.

Parameters
  • offset_or_str (Union[int, str]) – Either the offset of an element or the element as a string.

  • _from (int) – An optional start position into the string.

Returns

Returns the size of the element.

Return type

int

See also GetElement().

static GetFilterDefaultParameters(filter: str)Pro.Core.NTStringStringHash

Retrieves the default parameters for a filter.

Parameters

filter (str) – The filter type.

Returns

Returns the default parameters if available; otherwise returns an empty Pro.Core.NTStringStringHash instance.

Return type

NTStringStringHash

GetJBIG2DecodeOptions()int
Returns

Returns the decoding options for JBIG2 streams.

Return type

int

See also SetJBIG2DecodeOptions().

GetJBIG2DecodeTimeout()int
Returns

Returns the time-out value for the JBIG2 decoding process.

Return type

int

See also SetJBIG2DecodeTimeout().

GetJBIG2LibraryVersion()int
Returns

Returns the version of the JBIG2 library used for decoding.

Return type

int

See also SetJBIG2LibraryVersion().

GetObjectContent(objtable: Pro.PDF.PDFObjectTable, eid: int)tuple

Retrieves the decoded stream of a PDF object and its dictionary.

Parameters
  • objtable (PDFObjectTable) – The object table.

  • eid (int) – The object ID.

Returns

Returns a tuple containing the decoded stream and the dictionary of the object if successful; otherwise returns a tuple containing an empty bytes object and an empty NTStringStringHash() instance.

Return type

tuple[bytes, NTStringStringHash]

See also GetObjectContentEx().

GetObjectContentEx(objtable: Pro.PDF.PDFObjectTable, eid: int)tuple

Retrieves the decoded stream of a PDF object and its dictionary.

Parameters
  • objtable (PDFObjectTable) – The object table.

  • eid (int) – The object ID.

Returns

Returns a tuple containing the decoded stream, the dictionary of the object and an empty string if successful; otherwise returns a tuple containing an empty bytes object, an empty NTStringStringHash() instance and an error string.

Return type

tuple[bytes, NTStringStringHash, str]

See also GetObjectContent().

GetObjectTable()Pro.PDF.PDFObjectTable
Returns

Returns the internally stored object table.

Return type

PDFObjectTable

See also SetObjectTable().

GetStartXRef(pos: int = 0)int

Retrieves the start cross-reference offset.

Parameters

pos (int) – An optional start position for the search.

Returns

Returns the offset if successful; otherwise returns Pro.Core.INVALID_STREAM_OFFSET.

Return type

int

See also GetStartXRefEx().

GetStartXRefEx(pos: int)tuple

Retrieves the start cross-reference offset.

Parameters

pos (int) – The start position for the search.

Returns

Returns a tuple containing the start cross-reference offset and the offset of the “startxref” keyword if successful; otherwise returns a tuple containing Pro.Core.INVALID_STREAM_OFFSET and an undefined value.

Return type

tuple[int, int]

See also GetStartXRef().
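
The relationship between the two returned offsets can be illustrated in plain Python, without the SDK. The following is only a sketch: the helper name, the forward search from pos, and the -1 sentinel are assumptions made for illustration and do not describe how GetStartXRefEx() is implemented.

```python
import re

INVALID_OFFSET = -1  # hypothetical stand-in for Pro.Core.INVALID_STREAM_OFFSET

# "startxref", some whitespace, then the decimal offset of the xref section
_STARTXREF = re.compile(rb"startxref\s+(\d+)")


def find_startxref(data: bytes, pos: int = 0):
    """Return (xref_offset, keyword_offset), searching forward from pos."""
    m = _STARTXREF.search(data, pos)
    if m is None:
        return INVALID_OFFSET, INVALID_OFFSET
    return int(m.group(1)), m.start()


sample = b"%PDF-1.4\n...\nstartxref\n116\n%%EOF\n"
xref_offset, keyword_offset = find_startxref(sample)
print(xref_offset, keyword_offset)
```

The first value is the number written after the keyword, while the second is where the keyword itself sits in the file.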

GetStringObjectBytes(str: str)bytes

Converts a string object back to its original bytes.

Parameters

str (str) – The string object.

Returns

Returns the raw data of the string object.

Return type

bytes
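
For the hex form of a string object, the conversion back to bytes can be sketched in plain Python. The rules used here come from the PDF specification (white-space between digits is ignored and a missing final digit is treated as 0); the helper name is hypothetical and the sketch is not the method's implementation.

```python
def hex_string_to_bytes(obj: str) -> bytes:
    """Decode a PDF hex string object such as "<48656C6C6F>"."""
    if not (obj.startswith("<") and obj.endswith(">")):
        raise ValueError("not a hex string object")
    # drop the delimiters and any white-space between the digits
    digits = "".join(obj[1:-1].split())
    # an odd number of digits means the last digit is followed by 0
    if len(digits) % 2:
        digits += "0"
    return bytes.fromhex(digits)


print(hex_string_to_bytes("<48656C6C6F>"))
```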

static GetSupportedFilterNames()Pro.Core.NTStringList
Returns

Returns the list of supported decoding filter names.

Return type

NTStringList

GetTrailer(i: int)Pro.Core.NTStringStringHash

Retrieves a specified trailer dictionary.

Parameters

i (int) – The index of the trailer dictionary to retrieve.

Returns

Returns the specified trailer dictionary if available; otherwise returns an empty Pro.Core.NTStringStringHash instance.

Return type

NTStringStringHash

See also GetTrailers().

GetTrailers()Pro.Core.NTStringStringHashList
Returns

Returns the list of trailer dictionaries.

Return type

NTStringStringHashList

See also GetTrailer().

HasEncryption()bool
Returns

Returns True if the PDF has encryption; otherwise returns False.

Return type

bool

Available since Cerbero Suite 7.2 and Cerbero Engine 4.2.

See also IsDecrypted() and ProcessEncryption().

HexStringSize(str: str, _from: int = 0)int

Computes the size of a hex string object.

Parameters
  • str (str) – The hex string object.

  • _from (int) – An optional position into the string.

Returns

Returns the computed size.

Return type

int

static IsContainer(estr: str)bool

Checks whether an element is either a dictionary or a list.

Parameters

estr (str) – The element.

Returns

Returns True if the element is a dictionary or a list; otherwise returns False.

Return type

bool

IsDecrypted()bool
Returns

Returns True if the PDF doesn’t have encryption or was decrypted; otherwise returns False.

Return type

bool

Available since Cerbero Suite 7.2 and Cerbero Engine 4.2.

See also HasEncryption() and ProcessEncryption().

IsFilterEnabled(type: int)bool

Checks whether the specified decoding filter is enabled.

Parameters

type (int) – The decoding filter type (e.g., FilterType_ASCIIHexDecode).

Returns

Returns True if the filter is enabled; otherwise returns False.

Return type

bool

See also EnableFilter().

IsValidPDF()bool
Returns

Returns True if the PDF document has a “%%EOF” signature; otherwise returns False.

Return type

bool

See also GetEOF().

JBIG2DecodeOpt_HelperProcess: Final[int]

JBIG2 decoding option to decode JBIG2 streams in a separate process.

See also SetJBIG2DecodeOptions() and SetJBIG2DecodeTimeout().

JBIG2DecodeOpt_InProcess: Final[int]

Default JBIG2 decoding option to decode JBIG2 streams in the same process.

See also GetJBIG2DecodeOptions() and SetJBIG2DecodeOptions().

JBIG2DecodeOpt_NoDecode: Final[int]

JBIG2 decoding option to disable the decoding of JBIG2 streams.

See also GetJBIG2DecodeOptions() and SetJBIG2DecodeOptions().

LiteralStringSize(str: str, _from: int = 0)int

Calculates the size of a literal string.

Parameters
  • str (str) – The literal string.

  • _from (int) – An optional start position.

Returns

Returns the size of the literal string.

Return type

int

See also GetElementSize().
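
Measuring a literal string means matching balanced parentheses while honouring backslash escapes. The sketch below shows that idea in plain Python; the helper name and the -1 failure value are assumptions, since the behaviour of LiteralStringSize() on malformed input is not documented here.

```python
def literal_string_size(s: str, start: int = 0) -> int:
    """Return the length of the literal string starting at s[start], or -1."""
    if start >= len(s) or s[start] != "(":
        return -1
    depth = 0
    i = start
    while i < len(s):
        c = s[i]
        if c == "\\":
            i += 2              # skip the backslash and the escaped character
            continue
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:      # the outermost parenthesis has been closed
                return i - start + 1
        i += 1
    return -1                   # unterminated string


print(literal_string_size("(a(b)c) /Next"))
```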

static OBJID(id: int, generation: int)int

Creates an object ID from its number and generation.

Parameters
  • id (int) – The number of the object.

  • generation (int) – The generation of the object.

Returns

Returns the object ID.

Return type

int

See also OBJIDGEN() and OBJIDNUM().

static OBJIDGEN(oid: int)int

Retrieves the object generation from an object ID.

Parameters

oid (int) – The object ID.

Returns

Returns the generation of the object.

Return type

int

See also OBJID() and OBJIDNUM().

static OBJIDNUM(oid: int)int

Retrieves the object number from an object ID.

Parameters

oid (int) – The object ID.

Returns

Returns the number of the object.

Return type

int

See also OBJID() and OBJIDGEN().
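
The parsing example at the top of this page recovers the object number with oid >> 32, which suggests that an object ID keeps the number in its upper 32 bits. The sketch below assumes that the generation occupies the lower 32 bits; the function names are hypothetical stand-ins for OBJID(), OBJIDNUM() and OBJIDGEN(), not the module's code.

```python
# Assumption: object number in the upper 32 bits, generation in the lower 32.

def objid(num: int, generation: int) -> int:
    """Pack an object number and a generation into a single ID."""
    return ((num & 0xFFFFFFFF) << 32) | (generation & 0xFFFFFFFF)


def objid_num(oid: int) -> int:
    """Extract the object number from an ID."""
    return (oid >> 32) & 0xFFFFFFFF


def objid_gen(oid: int) -> int:
    """Extract the generation from an ID."""
    return oid & 0xFFFFFFFF


oid = objid(10, 1)
print(objid_num(oid), objid_gen(oid))
```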

ObjectToString(eid_or_objtable: Union[Pro.PDF.PDFObjectTable, int], eid_or_ref: Union[Pro.PDF.PDFObjectRef, int])str

Converts an object ID to a string.

Parameters
  • eid_or_objtable (Union[PDFObjectTable, int]) – Either an object table or an object ID.

  • eid_or_ref (Union[PDFObjectRef, int]) – Either an object ID or the object reference information.

Returns

Returns the object ID converted to string.

Return type

str

static PDFValueLength(offset_or_str: Union[int, str], _from: int = 0)int

Calculates the size of a value.

Parameters
  • offset_or_str (Union[int, str]) – The offset to the value or the string containing the value.

  • _from (int) – An optional start position into the string.

Returns

Returns the size of the value.

Return type

int

ParseContainerElement(estr: str, eid: int = PDF_INVALID_OBJECT_REF)Pro.Core.NTStringStringHash

Parses a container element such as a dictionary or a list.

Parameters
  • estr (str) – The element to parse.

  • eid (int) – The optional object ID of the element.

Returns

Returns the parsed element if successful; otherwise returns an empty Pro.Core.NTStringStringHash instance.

Return type

NTStringStringHash

ParseCrossRefEntry(bytes: bytes)Pro.PDF.PDFObjectRef

Parses a cross-reference entry.

Parameters

bytes (bytes) – The cross-reference entry.

Returns

Returns the object reference information if successful; otherwise returns an invalid PDFObjectRef instance.

Return type

PDFObjectRef
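
In a classic cross-reference table, each entry is a fixed 20-byte record: a 10-digit offset, a 5-digit generation, a type character and an end-of-line marker. The sketch below parses that layout in plain Python. XRefEntry and parse_xref_entry are hypothetical names, and returning None on failure is a simplification of the invalid PDFObjectRef described above.

```python
from typing import NamedTuple, Optional

ENTRY_SIZE = 20  # per the PDF specification; assumed to match PDF_CROSSREF_ENTRY_SIZE


class XRefEntry(NamedTuple):
    offset: int      # byte offset (or next free object number for "f" entries)
    generation: int
    kind: str        # "n" (in use) or "f" (free)


def parse_xref_entry(raw: bytes) -> Optional[XRefEntry]:
    """Parse an entry such as b"0000000017 00000 n \\n"."""
    if len(raw) < 18:
        return None
    try:
        offset = int(raw[0:10])
        generation = int(raw[11:16])
    except ValueError:
        return None
    kind = raw[17:18].decode("ascii", "replace")
    if kind not in ("n", "f"):
        return None
    return XRefEntry(offset, generation, kind)


print(parse_xref_entry(b"0000000017 00000 n \n"))
```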

ParseObject(objtable: Pro.PDF.PDFObjectTable, eid: int)tuple

Parses a PDF object without decoding its stream.

Parameters
  • objtable (PDFObjectTable) – The object table.

  • eid (int) – The object ID.

Returns

Returns a tuple containing the size of the object, its dictionary, its stream data and its parsing information if successful; otherwise returns a tuple containing -1, an empty Pro.Core.NTStringStringHash instance, an empty bytes object and invalid parsing information.

Return type

tuple[int, NTStringStringHash, bytes, PDFObjectParseInfo]

See also ParseObjectInfo(), ParseObjectDictionary(), ParseObjectContent() and DecodeObjectStream().

ParseObjectContent(objtable: Pro.PDF.PDFObjectTable, eid: int)bytes

Retrieves the stream of a PDF object without decoding it.

Parameters
  • objtable (PDFObjectTable) – The object table.

  • eid (int) – The object ID.

Returns

Returns the stream data if successful; otherwise returns an empty bytes object.

Return type

bytes

See also ParseObject(), ParseObjectInfo(), ParseObjectDictionary() and DecodeObjectStream().

ParseObjectDictionary(objtable: Pro.PDF.PDFObjectTable, eid: int)Pro.Core.NTStringStringHash

Retrieves the dictionary of a PDF object.

Parameters
  • objtable (PDFObjectTable) – The object table.

  • eid (int) – The object ID.

Returns

Returns the dictionary if successful; otherwise returns an empty Pro.Core.NTStringStringHash instance.

Return type

NTStringStringHash

See also ParseObject(), ParseObjectInfo(), ParseObjectContent() and DecodeObjectStream().

ParseObjectInfo(objtable: Pro.PDF.PDFObjectTable, eid: int)Pro.PDF.PDFObjectParseInfo

Retrieves the parsing information of a PDF object.

Parameters
  • objtable (PDFObjectTable) – The object table.

  • eid (int) – The object ID.

Returns

Returns the parsing information if successful; otherwise returns invalid parsing information.

Return type

PDFObjectParseInfo

See also ParseObject(), ParseObjectDictionary(), ParseObjectContent() and DecodeObjectStream().

static ParseObjectName(ref: str)int

Converts an object name such as “10 1 obj” into its ID.

Parameters

ref (str) – The object name.

Returns

Returns the object ID if successful; otherwise returns PDF_INVALID_OBJECT_REF.

Return type

int

See also ParseObjectRef().

static ParseObjectRef(ref: str)int

Converts an object reference such as “10 1 R” into its ID.

Parameters

ref (str) – The object reference.

Returns

Returns the object ID if successful; otherwise returns PDF_INVALID_OBJECT_REF.

Return type

int

See also ParseObjectName().

static ParseObjectRefOrName(ref: str, parse_ref: bool)int

Converts an object name or reference into its ID.

Parameters
  • ref (str) – The object name or reference.

  • parse_ref (bool) – If True, parses an object reference; otherwise parses an object name.

Returns

Returns the object ID if successful; otherwise returns PDF_INVALID_OBJECT_REF.

Return type

int

See also ParseObjectName() and ParseObjectRef().
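
The difference between an object name and an object reference is only the trailing keyword (obj versus R). The following plain-Python sketch makes that explicit. It assumes the ID layout suggested by the oid >> 32 expression in the example at the top of this page; INVALID_REF is a hypothetical sentinel and not the actual value of PDF_INVALID_OBJECT_REF.

```python
import re

INVALID_REF = -1  # hypothetical sentinel for illustration only

_PATTERN = re.compile(r"^\s*(\d+)\s+(\d+)\s+(obj|R)\s*$")


def parse_ref_or_name(text: str, parse_ref: bool) -> int:
    """Convert "10 1 R" (parse_ref=True) or "10 1 obj" (False) into an ID."""
    m = _PATTERN.match(text)
    if m is None:
        return INVALID_REF
    expected = "R" if parse_ref else "obj"
    if m.group(3) != expected:
        return INVALID_REF
    # assumption: number in the upper 32 bits, generation in the lower 32
    return (int(m.group(1)) << 32) | int(m.group(2))


print(parse_ref_or_name("10 1 R", True) >> 32)
```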

ProcessEncryption()bool

Decrypts the PDF document if encrypted.

Returns

Returns True if successful; otherwise returns False.

Return type

bool

See also SetObjectTable().

ReadHexString(offset: int)tuple

Reads a hex string.

Parameters

offset (int) – The offset of the hex string.

Returns

Returns a tuple containing the string and its size if successful; otherwise returns a tuple containing an empty string and -1.

Return type

tuple[str, int]

ReadLiteralString(offset: int, encrypted_string: bool = False)tuple

Reads a literal string.

Parameters
  • offset (int) – The offset of the literal string.

  • encrypted_string (bool) – If True, the reading of encrypted strings is supported.

Returns

Returns a tuple containing the string and its size if successful; otherwise returns a tuple containing an empty string and -1.

Return type

tuple[str, int]

static RegularCharsLength(offset_or_str: Union[int, str], _from: int = 0)int

Calculates the length of characters not interrupted by reserved characters.

Parameters
  • offset_or_str (Union[int, str]) – Either the offset to the data or the string with the data.

  • _from (int) – An optional start position into the string.

Returns

Returns the number of non-reserved characters.

Return type

int
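
As an illustration, the sketch below counts characters up to the first delimiter or white-space character. It assumes that “reserved characters” means the delimiter and white-space sets of the PDF specification; the exact set used by RegularCharsLength() is not stated here, and the helper name is hypothetical.

```python
_DELIMITERS = "()<>[]{}/%"
_WHITESPACE = "\x00\t\n\x0c\r "


def regular_chars_length(s: str, start: int = 0) -> int:
    """Count the characters from s[start] up to the first reserved character."""
    i = start
    while i < len(s) and s[i] not in _DELIMITERS and s[i] not in _WHITESPACE:
        i += 1
    return i - start


print(regular_chars_length("Type/Catalog"))
```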

SetJBIG2DecodeOptions(options: int)None

Sets the decoding options for JBIG2 streams.

Parameters

options (int) – The decoding options for JBIG2 streams (e.g., JBIG2DecodeOpt_HelperProcess).

See also GetJBIG2DecodeOptions().

SetJBIG2DecodeTimeout(timeout: int)None

Sets the JBIG2 decoding time-out value.

Parameters

timeout (int) – The time-out in milliseconds.

See also GetJBIG2DecodeTimeout().

SetJBIG2LibraryVersion(version: int)None

Sets the version of the JBIG2 library used for decoding.

Available versions are:

  • 1 - Deprecated.

  • 2 - Default.

Parameters

version (int) – The version of the JBIG2 library.

See also GetJBIG2LibraryVersion().

SetObjectTable(objtable: Pro.PDF.PDFObjectTable)None

Sets the internally stored object table.

Parameters

objtable (PDFObjectTable) – The object table.

See also GetObjectTable().

static SkipEmptyChars(offset_or_str: Union[int, str], _from_or_down: Union[bool, int] = True)int

Skips empty characters and comments.

Parameters
  • offset_or_str (Union[int, str]) – Either the offset to the data or a string containing the data.

  • _from_or_down (Union[bool, int]) – Either a boolean specifying the direction or an optional start position into the string.

Returns

Returns the number of skipped characters.

Return type

int
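
The forward case on a string can be sketched in plain Python as follows. The white-space set and the rule that a comment runs from % to the end of the line come from the PDF specification; the helper name is hypothetical and the backward direction is not covered.

```python
_WHITESPACE = "\x00\t\n\x0c\r "


def skip_empty(s: str, start: int = 0) -> int:
    """Return how many white-space and comment characters follow s[start]."""
    i = start
    while i < len(s):
        if s[i] in _WHITESPACE:
            i += 1
        elif s[i] == "%":
            # a comment extends up to, but not including, the end of the line
            while i < len(s) and s[i] not in "\r\n":
                i += 1
        else:
            break
    return i - start


text = "  % comment\n/Type"
print(text[skip_empty(text):])
```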

SkipNewLine(offset: int)int

Skips new-line characters if present.

Parameters

offset (int) – The offset to the data.

Returns

Returns the number of skipped bytes.

Return type

int

static UnescapeLiteralString(str: str)str

Unescapes a literal string.

Parameters

str (str) – The escaped string.

Returns

Returns the unescaped string.

Return type

str
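
The escape sequences allowed in a literal string by the PDF specification can be handled as in the following plain-Python sketch. It works on the text between the enclosing parentheses and uses a hypothetical helper name; how UnescapeLiteralString() treats malformed input is not documented here, so those branches are assumptions.

```python
_SIMPLE = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
           "(": "(", ")": ")", "\\": "\\"}
_OCTAL = "01234567"


def unescape_literal(s: str) -> str:
    """Resolve backslash escapes in the body of a PDF literal string."""
    out = []
    i = 0
    while i < len(s):
        if s[i] != "\\":
            out.append(s[i])
            i += 1
            continue
        i += 1                          # move past the backslash
        if i >= len(s):
            break                       # trailing backslash: dropped (assumption)
        c = s[i]
        if c in _SIMPLE:
            out.append(_SIMPLE[c])
            i += 1
        elif c in _OCTAL:
            j = i                       # up to three octal digits
            while j < len(s) and j - i < 3 and s[j] in _OCTAL:
                j += 1
            out.append(chr(int(s[i:j], 8) & 0xFF))
            i = j
        elif c in "\r\n":
            # backslash followed by an end-of-line: line continuation
            i += 2 if s[i:i + 2] == "\r\n" else 1
        else:
            out.append(c)               # unknown escape: the backslash is ignored
            i += 1
    return "".join(out)


print(unescape_literal(r"a\(b\) \101"))
```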

Unpredict(raw: bytes, filter: str, parms: Pro.Core.NTStringStringHash, eid: int = PDF_INVALID_OBJECT_REF)bytes

Removes PNG prediction on input data.

Parameters
  • raw (bytes) – The input data.

  • filter (str) – The filter name.

  • parms (NTStringStringHash) – The filter parameters.

  • eid (int) – The object ID.

Returns

Returns the decoded data.

Return type

bytes
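
PNG prediction prefixes every row with a filter-type byte and stores each byte as a difference from its neighbours. The sketch below reverses the five PNG filter types in plain Python to show what removing the prediction involves. The function name and its columns, colors and bpc arguments are hypothetical; they mirror the usual Columns, Colors and BitsPerComponent decode parameters and are not the signature of Unpredict().

```python
def _paeth(a: int, b: int, c: int) -> int:
    """Paeth predictor: pick the neighbour closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c


def png_unpredict(data: bytes, columns: int, colors: int = 1, bpc: int = 8) -> bytes:
    """Undo PNG row filters (types 0 to 4) on predicted data."""
    bpp = max(1, colors * bpc // 8)             # bytes per complete pixel
    row_len = (columns * colors * bpc + 7) // 8
    out = bytearray()
    prev = bytearray(row_len)                   # the row above the first is all zeros
    for start in range(0, len(data), row_len + 1):
        ftype = data[start]                     # filter-type byte of this row
        row = bytearray(data[start + 1:start + 1 + row_len])
        if ftype > 4:
            raise ValueError("unknown PNG filter type %d" % ftype)
        for i in range(len(row)):
            left = row[i - bpp] if i >= bpp else 0
            up = prev[i]
            upleft = prev[i - bpp] if i >= bpp else 0
            if ftype == 1:                      # Sub
                row[i] = (row[i] + left) & 0xFF
            elif ftype == 2:                    # Up
                row[i] = (row[i] + up) & 0xFF
            elif ftype == 3:                    # Average
                row[i] = (row[i] + (left + up) // 2) & 0xFF
            elif ftype == 4:                    # Paeth
                row[i] = (row[i] + _paeth(left, up, upleft)) & 0xFF
        out += row
        prev = row
    return bytes(out)


# two rows of three bytes, both using the Up filter
print(png_unpredict(bytes([2, 1, 2, 3, 2, 1, 1, 1]), columns=3).hex())
```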

class PDFObjectParseInfo

This class contains the information of a parsed PDF object.

See also PDFObject.ParseObject() and PDFObject.ParseObjectInfo().

Methods:

clear()

Clears the fields of the instance.

Attributes:

dict_offset

The offset of the object dictionary.

dict_size

The size of the object dictionary.

offset

The offset of the object.

parent_content

The data of the parent object if it’s a compressed object.

size

The size of the object.

stream_offset

The offset of the object stream.

stream_size

The size of the object stream.

clear()None

Clears the fields of the instance.

dict_offset

The offset of the object dictionary.

See also dict_size.

dict_size

The size of the object dictionary.

See also dict_offset.

offset

The offset of the object.

See also size.

parent_content

The data of the parent object if it’s a compressed object.

size

The size of the object.

See also offset.

stream_offset

The offset of the object stream.

See also stream_size.

stream_size

The size of the object stream.

See also stream_offset.

class PDFObjectRef

This class represents a reference to a PDF object.

Attributes:

flags

The flags of the object (e.g., PDF_OBJREF_FLAG_UNREFERENCED).

generation

The generation of the object.

index

The index of the object if it’s a compressed object.

n

The object type.

offset

The offset of the object if it’s an uncompressed object.

parent

The parent of the object if it’s a compressed object.

flags

The flags of the object (e.g., PDF_OBJREF_FLAG_UNREFERENCED).

generation

The generation of the object.

index

The index of the object if it’s a compressed object.

See also n.

n

The object type.

The following values are supported:

  • "f" - Free object.

  • "n" - Uncompressed object.

  • "N" - Compressed object.

offset

The offset of the object if it’s an uncompressed object.

See also n.

parent

The parent of the object if it’s a compressed object.

See also n.

class PDFObjectTable

Dictionary of int -> PDFObjectRef elements.

Methods:

clear()

Removes all items from the hash.

contains(key)

Checks whether key is present in the hash.

count(key)

Counts the number of values associated with key in the hash.

insert(key, value)

Inserts a new item with key and a value of value.

insertMulti(key, value)

Inserts a new item with key and a value of value.

isEmpty()

Checks whether the hash is empty.

iterator()

Creates an iterator for the hash.

remove(key)

Removes all the items that have key from the hash.

reserve(alloc)

Ensures that the internal hash table consists of at least alloc buckets.

size()

Returns the number of items in the hash.

take(key)

Removes the item with key from the hash and returns the value associated with it.

value(key[, defaultValue])

Returns the value associated with key.

clear()None

Removes all items from the hash.

contains(key: int)bool

Checks whether key is present in the hash.

Parameters

key (int) – The key value to check for.

Returns

Returns True if the hash contains an item with the key; otherwise returns False.

Return type

bool

See also count().

count(key: int)int

Counts the number of values associated with key in the hash.

Parameters

key (int) – The key value.

Returns

Returns the number of items associated with the key.

Return type

int

See also contains().

insert(key: int, value: Pro.PDF.PDFObjectRef)None

Inserts a new item with key and a value of value.

Parameters
  • key (int) – The key.

  • value (PDFObjectRef) – The value.

See also insertMulti().

insertMulti(key: int, value: Pro.PDF.PDFObjectRef)None

Inserts a new item with key and a value of value.

If there is already an item with the same key in the hash, this method will simply create a new one. (This behaviour is different from insert(), which overwrites the value of an existing item.)

Parameters
  • key (int) – The key.

  • value (PDFObjectRef) – The value.

See also insert().

isEmpty()bool

Checks whether the hash is empty.

Returns

Returns True if the hash contains no items; otherwise returns False.

Return type

bool

See also size().

iterator()Pro.PDF.PDFObjectTableIt

Creates an iterator for the hash.

Returns

Returns the iterator.

Return type

PDFObjectTableIt

remove(key: int)int

Removes all the items that have key from the hash.

Parameters

key (int) – The key to remove.

Returns

Returns the number of items removed which is usually 1 but will be 0 if the key isn’t in the hash, or greater than 1 if insertMulti() has been used with the key.

Return type

int

See also clear() and take().

reserve(alloc: int)None

Ensures that the internal hash table consists of at least alloc buckets.

Parameters

alloc (int) – The allocation size.

size()int
Returns

Returns the number of items in the hash.

Return type

int

See also isEmpty() and count().

take(key: int)Pro.PDF.PDFObjectRef

Removes the item with key from the hash and returns the value associated with it.

If the item does not exist in the hash, the method simply returns a default-constructed value. If there are multiple items for key in the hash, only the most recently inserted one is removed.

If you don’t use the return value, remove() is more efficient.

Parameters

key (int) – The key.

Returns

Returns the removed value.

Return type

PDFObjectRef

See also remove().

value(key: int, defaultValue: Optional[Pro.PDF.PDFObjectRef] = None)Pro.PDF.PDFObjectRef

Returns the value associated with key. If the hash contains no item with key, the method returns a default-constructed value if defaultValue is not provided. If there are multiple items for key in the hash, the value of the most recently inserted one is returned.

Parameters
  • key (int) – The key.

  • defaultValue (Optional[PDFObjectRef]) – The default value to return if key is not present in the hash.

Returns

Returns the value associated with key.

Return type

PDFObjectRef

See also contains().

class PDFObjectTableIt(obj: Pro.PDF.PDFObjectTable)

Iterator class for PDFObjectTable.

Parameters

obj (PDFObjectTable) – The object to iterate over.

Methods:

hasNext()

Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.

hasPrevious()

Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.

key()

Returns the key of the last item that was jumped over using one of the traversal functions (previous(), next()).

next()

Returns the next item and advances the iterator by one position.

previous()

Returns the previous item and moves the iterator back by one position.

toBack()

Moves the iterator to the back of the container (after the last item).

toFront()

Moves the iterator to the front of the container (before the first item).

value()

Returns the value of the last item that was jumped over using one of the traversal functions (previous(), next()).

hasNext()bool
Returns

Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.

Return type

bool

See also hasPrevious() and next().

hasPrevious()bool
Returns

Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.

Return type

bool

See also hasNext() and previous().

key()int
Returns

Returns the key of the last item that was jumped over using one of the traversal functions (previous(), next()).

Return type

int

See also value().

next()tuple
Returns

Returns the next item as a key and value pair and advances the iterator by one position.

Return type

tuple[int, PDFObjectRef]

See also hasNext() and previous().

previous()tuple
Returns

Returns the previous item as a key and value pair and moves the iterator back by one position.

Return type

tuple[int, PDFObjectRef]

See also hasPrevious() and next().

toBack()None

Moves the iterator to the back of the container (after the last item).

See also toFront() and previous().

toFront()None

Moves the iterator to the front of the container (before the first item).

See also toBack() and next().

value()Pro.PDF.PDFObjectRef
Returns

Returns the value of the last item that was jumped over using one of the traversal functions (previous(), next()).

Return type

PDFObjectRef

See also key().
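
The hasNext()/next() protocol can be wrapped in a generator so that a table is usable in a regular for loop. The sketch below relies only on iterator(), hasNext() and next(), with next() yielding a key and value pair as in the example at the top of this page. FakeTable and FakeTableIt are hypothetical stand-ins that allow the snippet to run without the SDK.

```python
def iter_items(table):
    """Yield (key, value) pairs from any object exposing iterator()."""
    it = table.iterator()
    while it.hasNext():
        yield it.next()


class FakeTableIt:
    """Hypothetical stand-in mimicking the hasNext()/next() protocol."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def hasNext(self):
        return self._pos < len(self._items)

    def next(self):
        item = self._items[self._pos]
        self._pos += 1
        return item


class FakeTable:
    """Hypothetical stand-in for PDFObjectTable, backed by a dict."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)

    def iterator(self):
        return FakeTableIt(self._mapping.items())


table = FakeTable({(2 << 32): "ref2", (1 << 32): "ref1"})
for oid, ref in sorted(iter_items(table)):
    print(oid >> 32, ref)
```

With a real PDFObjectTable, the same generator should work unchanged as long as the iterator behaves as documented above.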

PDF_CROSSREF_ENTRY_SIZE: Final[int]

Size of an entry in a cross-reference table.

PDF_CreateEncryptionKey(Password: bytes, O: bytes, P: int, ID_1: bytes, EncryptMetaData: bool, Revision: int, KeyLenInBytes: int)tuple

Generates an encryption key for a PDF document.

Parameters
  • Password (bytes) – The password.

  • O (bytes) – The O parameter.

  • P (int) – The P parameter.

  • ID_1 (bytes) – The ID_1 parameter.

  • EncryptMetaData (bool) – The EncryptMetadata parameter.

  • Revision (int) – The cryptographic revision number.

  • KeyLenInBytes (int) – The length of the key.

Returns

Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.

Return type

tuple[bool, bytes]

PDF_GenerateOwnerKey(OwnerKey: bytes, UserKey: bytes, Revision: int, KeyLenInBytes: int)tuple

Generates an owner key for a PDF document.

Parameters
  • OwnerKey (bytes) – The optional owner key.

  • UserKey (bytes) – The optional user key.

  • Revision (int) – The cryptographic revision number.

  • KeyLenInBytes (int) – The length of the key.

Returns

Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.

Return type

tuple[bool, bytes]

PDF_GenerateUserKey(UserKey: bytes, O: bytes, P: int, ID_1: bytes, EncryptMetaData: bool, Revision: int, KeyLenInBytes: int)tuple

Generates a user key for a PDF document.

Parameters
  • UserKey (bytes) – The optional user key.

  • O (bytes) – The O parameter.

  • P (int) – The P parameter.

  • ID_1 (bytes) – The ID_1 parameter.

  • EncryptMetaData (bool) – The EncryptMetadata parameter.

  • Revision (int) – The cryptographic revision number.

  • KeyLenInBytes (int) – The length of the key.

Returns

Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.

Return type

tuple[bool, bytes]

PDF_INVALID_OBJECT_REF: Final[int]

Invalid PDF object id.

PDF_OBJREF_FLAG_UNREFERENCED: Final[int]

Flag for unreferenced PDF objects.

See also PDFObjectRef.flags.

PDF_R5_GenerateDecryptionKey(user_password: bytes, O: bytes, OE: bytes, U: bytes, UE: bytes)tuple

Generates a revision 5 decryption key for a PDF document.

Parameters
  • user_password (bytes) – The optional user password.

  • O (bytes) – The O parameter.

  • OE (bytes) – The OE parameter.

  • U (bytes) – The U parameter.

  • UE (bytes) – The UE parameter.

Returns

Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.

Return type

tuple[bool, bytes]

PDF_R5_GenerateOwnerDecryptionKey(usrpwd: bytes, O: bytes, OE: bytes, U: bytes)tuple

Generates a revision 5 owner decryption key for a PDF document.

Parameters
  • usrpwd (bytes) – The user password.

  • O (bytes) – The O parameter.

  • OE (bytes) – The OE parameter.

  • U (bytes) – The U parameter.

Returns

Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.

Return type

tuple[bool, bytes]

PDF_R5_GenerateUserDecryptionKey(usrpwd: bytes, U: bytes, UE: bytes)tuple

Generates a revision 5 user decryption key for a PDF document.

Parameters
  • usrpwd (bytes) – The user password.

  • U (bytes) – The U parameter.

  • UE (bytes) – The UE parameter.

Returns

Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.

Return type

tuple[bool, bytes]

PDF_R6_GenerateDecryptionKey(user_password: bytes, O: bytes, OE: bytes, U: bytes, UE: bytes)tuple

Generates a revision 6 decryption key for a PDF document.

Parameters
  • user_password (bytes) – The optional user password.

  • O (bytes) – The O parameter.

  • OE (bytes) – The OE parameter.

  • U (bytes) – The U parameter.

  • UE (bytes) – The UE parameter.

Returns

Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.

Return type

tuple[bool, bytes]

PDF_R6_GenerateUserDecryptionKey(usrpwd: bytes, U: bytes, UE: bytes)tuple

Generates a revision 6 user decryption key for a PDF document.

Parameters
  • usrpwd (bytes) – The user password.

  • U (bytes) – The U parameter.

  • UE (bytes) – The UE parameter.

Returns

Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.

Return type

tuple[bool, bytes]