Pro.PDF
— API for parsing PDF documents¶
PDF Parsing¶
The following code example demonstrates how to iterate through all the objects in a PDF document:
from Pro.Core import *
from Pro.PDF import *

def parsePDF(fname):
    # open the file
    c = createContainerFromFile(fname)
    if c.isNull():
        print("error: couldn't open file")
        return
    # load the file as PDF
    pdf = PDFObject()
    if not pdf.Load(c):
        print("error: invalid file format")
        return
    # parse all referenced objects
    objtable = pdf.BuildObjectTable()
    # detect unreferenced objects
    # (corrupted or malicious PDFs may contain them)
    pdf.DetectObjects(objtable)
    # store the object table internally
    pdf.SetObjectTable(objtable)
    # process PDF encryption
    if not pdf.ProcessEncryption():
        print("warning: couldn't decrypt file")
    # [optional] sort objects by ID
    oids = []
    it = objtable.iterator()
    while it.hasNext():
        oid, _ = it.next()
        oids.append(oid)
    oids.sort()
    # iterate through the objects
    for oid in oids:
        # print out the object id
        print("\nOBJECT ID:", oid >> 32, "\n")
        # parse the object
        ret, dictn, content, info = pdf.ParseObject(objtable, oid)
        if not ret:
            print("warning: couldn't parse object %d" % (oid,))
            continue
        # print out the object dictionary
        it = dictn.iterator()
        while it.hasNext():
            k, v = it.next()
            print(" ", k, "-", v)
        # print out the decoded object stream
        content = pdf.DecodeObjectStream(content, dictn, oid)
        if not content:
            continue
        out = NTTextBuffer()
        out.printHex(content)
        print("\n", out.buffer)
Hint
Since PDF parsing can be a complex operation, it is often recommended to leverage the scan engine to extract artifacts.
In the following example, a hook is used to extract JavaScript code from a PDF document:
from Pro.Core import *

def printJSEntry(sp, xml, tnode):
    # data node
    dnode = xml.findChild(tnode, "d")
    if not dnode:
        return
    # we let the scan engine extract the JavaScript for us
    params = NTStringVariantHash()
    params.insert("op", "js")
    idnode = xml.findChild(dnode, "id")
    if idnode:
        params.insert("id", int(xml.value(idnode), 16))
    ridnode = xml.findChild(dnode, "rid")
    # check ridnode (not idnode) before reading its value
    if ridnode:
        params.insert("rid", int(xml.value(ridnode), 16))
    js = sp.customOperation(params)
    # print out the JavaScript
    print("JS CODE")
    print("-------")
    print(js)

def pdfExtractJS(sp, ud):
    xml = sp.getReportXML()
    # object node
    onode = xml.findChild(None, "o")
    if onode:
        # scan node
        snode = xml.findChild(onode, "s")
        if snode:
            # enumerate scan entries
            tchild = xml.firstChild(snode)
            while tchild:
                if xml.name(tchild) == "t":
                    # type attribute
                    tattr = xml.findAttribute(tchild, "t")
                    # check if it's a JavaScript entry
                    if tattr and int(xml.value(tattr)) == CT_JavaScript:
                        printJSEntry(sp, xml, tchild)
                tchild = xml.nextSibling(tchild)
Module API¶
Pro.PDF module API.
Classes:
PDFCrossRefTable – This class represents a PDF cross-reference table.
PDFCrossRefTableList – List of PDFCrossRefTable elements.
PDFCrossRefTableListIt(obj) – Iterator class for PDFCrossRefTableList.
PDFCrossRefTableSection – This class represents the section of a PDF cross-reference table.
PDFCrossRefTableSectionList – List of PDFCrossRefTableSection elements.
PDFCrossRefTableSectionListIt(obj) – Iterator class for PDFCrossRefTableSectionList.
PDFObject – This class represents a PDF document.
This class contains the information of a parsed PDF object.
PDFObjectRef – This class represents a reference to a PDF object.
PDFObjectTable – Dictionary of int -> PDFObjectRef elements.
PDFObjectTableIt(obj) – Iterator class for PDFObjectTable.
Attributes:
Size of an entry in a cross-reference table.
Invalid PDF object id.
Flag for unreferenced PDF objects.
Functions:
PDF_CreateEncryptionKey(Password, O, P, …) – Generates an encryption key for a PDF document.
PDF_GenerateOwnerKey(OwnerKey, UserKey, …) – Generates an owner key for a PDF document.
PDF_GenerateUserKey(UserKey, O, P, ID_1, …) – Generates a user key for a PDF document.
PDF_R5_GenerateDecryptionKey(user_password, …) – Generates a revision 5 decryption key for a PDF document.
PDF_R5_GenerateOwnerDecryptionKey(usrpwd, O, …) – Generates a revision 5 owner decryption key for a PDF document.
PDF_R5_GenerateUserDecryptionKey(usrpwd, U, UE) – Generates a revision 5 user decryption key for a PDF document.
PDF_R6_GenerateDecryptionKey(user_password, …) – Generates a revision 6 decryption key for a PDF document.
PDF_R6_GenerateUserDecryptionKey(usrpwd, U, UE) – Generates a revision 6 user decryption key for a PDF document.
- class PDFCrossRefTable¶
This class represents a PDF cross-reference table.
Attributes:
Prev – Offset of the previous cross-reference table if available; otherwise Pro.Core.INVALID_STREAM_OFFSET.
sections – List of PDFCrossRefTableSection.
trailer – Trailer dictionary.
- Prev¶
Offset of the previous cross-reference table if available; otherwise Pro.Core.INVALID_STREAM_OFFSET.
- sections¶
List of PDFCrossRefTableSection.
See also PDFCrossRefTableSectionList.
- trailer¶
Trailer dictionary.
See also Pro.Core.NTStringStringHash.
- class PDFCrossRefTableList¶
List of PDFCrossRefTable elements.
Methods:
append(value) – Inserts value at the end of the list.
at(i) – Returns the item at index position i in the list.
clear() – Removes all items from the list.
contains(value) – Checks the presence of an element in the list.
count(value) – Returns the number of occurrences of value in the list.
indexOf(value[, start]) – Searches for an element in the list.
insert(i, value) – Inserts value at index position i in the list.
isEmpty() – Checks whether the list is empty.
iterator() – Creates an iterator for the list.
removeAll(value) – Removes all occurrences of value in the list and returns the number of entries removed.
removeAt(i) – Removes the item at index position i.
reserve(alloc) – Reserves space for alloc elements.
size() – Returns the number of items in the list.
takeAt(i) – Removes the item at index position i and returns it.
- append(value: Pro.PDF.PDFCrossRefTable) → None¶
Inserts value at the end of the list.
- Parameters
value (PDFCrossRefTable) – The value to add to the list.
See also insert().
- at(i: int) → Pro.PDF.PDFCrossRefTable¶
Returns the item at index position i in the list. i must be a valid index position in the list (i.e., 0 <= i < size()).
- Parameters
i (int) – The index of the element to return.
- Returns
Returns the requested element.
- Return type
PDFCrossRefTable
- clear() → None¶
Removes all items from the list.
- contains(value: Pro.PDF.PDFCrossRefTable) → bool¶
Checks the presence of an element in the list.
- Parameters
value (PDFCrossRefTable) – The value to check for.
- Returns
Returns True if the list contains an occurrence of value; otherwise returns False.
- Return type
bool
- count(value: Pro.PDF.PDFCrossRefTable) → int¶
Returns the number of occurrences of value in the list.
- Parameters
value (PDFCrossRefTable) – The value to count.
- Returns
Returns the number of occurrences.
- Return type
int
See also indexOf() and contains().
- indexOf(value: Pro.PDF.PDFCrossRefTable, start: int = 0) → int¶
Searches for an element in the list.
- Parameters
value (PDFCrossRefTable) – The value to search for.
start (int) – The start index.
- Returns
Returns the index position of the first occurrence of value in the list. Returns -1 if no item was found.
- Return type
int
See also contains().
- insert(i: int, value: Pro.PDF.PDFCrossRefTable) → None¶
Inserts value at index position i in the list. If i is 0, the value is prepended to the list. If i is size(), the value is appended to the list.
- Parameters
i (int) – The position at which to add the value.
value (PDFCrossRefTable) – The value to add.
See also append() and removeAt().
- isEmpty() → bool¶
Checks whether the list is empty.
- Returns
Returns True if the list contains no items; otherwise returns False.
- Return type
bool
See also size().
- iterator() → Pro.PDF.PDFCrossRefTableListIt¶
Creates an iterator for the list.
- Returns
Returns the iterator.
- Return type
PDFCrossRefTableListIt
- removeAll(value: Pro.PDF.PDFCrossRefTable) → int¶
Removes all occurrences of value in the list and returns the number of entries removed.
- Parameters
value (PDFCrossRefTable) – The value to remove from the list.
- Returns
Returns the number of entries removed.
- Return type
int
See also removeAt().
- removeAt(i: int) → None¶
Removes the item at index position i. i must be a valid index position in the list (i.e., 0 <= i < size()).
- Parameters
i (int) – The index of the item to remove.
See also removeAll().
- reserve(alloc: int) → None¶
Reserves space for alloc elements. Calling this method doesn’t change the size of the list.
- Parameters
alloc (int) – The number of elements to reserve space for.
- takeAt(i: int) → Pro.PDF.PDFCrossRefTable¶
Removes the item at index position i and returns it. i must be a valid index position in the list (i.e., 0 <= i < size()).
- Parameters
i (int) – The index of the element to remove from the list.
- Returns
Returns the removed element. If you don’t use the return value, removeAt() is more efficient.
- Return type
PDFCrossRefTable
See also removeAt().
- class PDFCrossRefTableListIt(obj: Pro.PDF.PDFCrossRefTableList)¶
Iterator class for PDFCrossRefTableList.
- Parameters
obj (PDFCrossRefTableList) – The object to iterate over.
Methods:
hasNext() – Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.
hasPrevious() – Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.
next() – Returns the next item and advances the iterator by one position.
previous() – Returns the previous item and moves the iterator back by one position.
toBack() – Moves the iterator to the back of the container (after the last item).
toFront() – Moves the iterator to the front of the container (before the first item).
- hasNext() → bool¶
- Returns
Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.
- Return type
bool
See also hasPrevious() and next().
- hasPrevious() → bool¶
- Returns
Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.
- Return type
bool
See also hasNext() and previous().
- next() → Pro.PDF.PDFCrossRefTable¶
- Returns
Returns the next item and advances the iterator by one position.
- Return type
PDFCrossRefTable
See also hasNext() and previous().
- previous() → Pro.PDF.PDFCrossRefTable¶
- Returns
Returns the previous item and moves the iterator back by one position.
- Return type
PDFCrossRefTable
See also hasPrevious() and next().
- toBack() → None¶
Moves the iterator to the back of the container (after the last item).
See also toFront() and previous().
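The list and iterator classes in this module share the same hasNext()/next() protocol, so a small adapter can expose any of them to a regular Python for loop. The following is a minimal sketch; the helper name iterate is hypothetical and not part of Pro.PDF, and it relies only on the iterator(), hasNext() and next() methods documented above.

```python
def iterate(container):
    """Yield every item of a container that follows the documented
    iterator()/hasNext()/next() protocol (hypothetical helper)."""
    it = container.iterator()
    while it.hasNext():
        yield it.next()
```

With a loaded PDFObject named pdf, this would allow writing something like for table in iterate(pdf.EnumerateCrossRefTables()): instead of managing the iterator by hand.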
- class PDFCrossRefTableSection¶
This class represents the section of a PDF cross-reference table.
See also PDFCrossRefTable.
Attributes:
array_offset – The offset of the cross-reference section entries.
count – The number of entries.
start_id – The id of the first object in the array.
- array_offset¶
The offset of the cross-reference section entries.
- count¶
The number of entries.
- start_id¶
The id of the first object in the array.
See also PDFObject.OBJID().
- class PDFCrossRefTableSectionList¶
List of PDFCrossRefTableSection elements.
Methods:
append(value) – Inserts value at the end of the list.
at(i) – Returns the item at index position i in the list.
clear() – Removes all items from the list.
contains(value) – Checks the presence of an element in the list.
count(value) – Returns the number of occurrences of value in the list.
indexOf(value[, start]) – Searches for an element in the list.
insert(i, value) – Inserts value at index position i in the list.
isEmpty() – Checks whether the list is empty.
iterator() – Creates an iterator for the list.
removeAll(value) – Removes all occurrences of value in the list and returns the number of entries removed.
removeAt(i) – Removes the item at index position i.
reserve(alloc) – Reserves space for alloc elements.
size() – Returns the number of items in the list.
takeAt(i) – Removes the item at index position i and returns it.
- append(value: Pro.PDF.PDFCrossRefTableSection) → None¶
Inserts value at the end of the list.
- Parameters
value (PDFCrossRefTableSection) – The value to add to the list.
See also insert().
- at(i: int) → Pro.PDF.PDFCrossRefTableSection¶
Returns the item at index position i in the list. i must be a valid index position in the list (i.e., 0 <= i < size()).
- Parameters
i (int) – The index of the element to return.
- Returns
Returns the requested element.
- Return type
PDFCrossRefTableSection
- clear() → None¶
Removes all items from the list.
- contains(value: Pro.PDF.PDFCrossRefTableSection) → bool¶
Checks the presence of an element in the list.
- Parameters
value (PDFCrossRefTableSection) – The value to check for.
- Returns
Returns True if the list contains an occurrence of value; otherwise returns False.
- Return type
bool
- count(value: Pro.PDF.PDFCrossRefTableSection) → int¶
Returns the number of occurrences of value in the list.
- Parameters
value (PDFCrossRefTableSection) – The value to count.
- Returns
Returns the number of occurrences.
- Return type
int
See also indexOf() and contains().
- indexOf(value: Pro.PDF.PDFCrossRefTableSection, start: int = 0) → int¶
Searches for an element in the list.
- Parameters
value (PDFCrossRefTableSection) – The value to search for.
start (int) – The start index.
- Returns
Returns the index position of the first occurrence of value in the list. Returns -1 if no item was found.
- Return type
int
See also contains().
- insert(i: int, value: Pro.PDF.PDFCrossRefTableSection) → None¶
Inserts value at index position i in the list. If i is 0, the value is prepended to the list. If i is size(), the value is appended to the list.
- Parameters
i (int) – The position at which to add the value.
value (PDFCrossRefTableSection) – The value to add.
See also append() and removeAt().
- isEmpty() → bool¶
Checks whether the list is empty.
- Returns
Returns True if the list contains no items; otherwise returns False.
- Return type
bool
See also size().
- iterator() → Pro.PDF.PDFCrossRefTableSectionListIt¶
Creates an iterator for the list.
- Returns
Returns the iterator.
- Return type
PDFCrossRefTableSectionListIt
- removeAll(value: Pro.PDF.PDFCrossRefTableSection) → int¶
Removes all occurrences of value in the list and returns the number of entries removed.
- Parameters
value (PDFCrossRefTableSection) – The value to remove from the list.
- Returns
Returns the number of entries removed.
- Return type
int
See also removeAt().
- removeAt(i: int) → None¶
Removes the item at index position i. i must be a valid index position in the list (i.e., 0 <= i < size()).
- Parameters
i (int) – The index of the item to remove.
See also removeAll().
- reserve(alloc: int) → None¶
Reserves space for alloc elements. Calling this method doesn’t change the size of the list.
- Parameters
alloc (int) – The number of elements to reserve space for.
- takeAt(i: int) → Pro.PDF.PDFCrossRefTableSection¶
Removes the item at index position i and returns it. i must be a valid index position in the list (i.e., 0 <= i < size()).
- Parameters
i (int) – The index of the element to remove from the list.
- Returns
Returns the removed element. If you don’t use the return value, removeAt() is more efficient.
- Return type
PDFCrossRefTableSection
See also removeAt().
- class PDFCrossRefTableSectionListIt(obj: Pro.PDF.PDFCrossRefTableSectionList)¶
Iterator class for PDFCrossRefTableSectionList.
- Parameters
obj (PDFCrossRefTableSectionList) – The object to iterate over.
Methods:
hasNext() – Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.
hasPrevious() – Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.
next() – Returns the next item and advances the iterator by one position.
previous() – Returns the previous item and moves the iterator back by one position.
toBack() – Moves the iterator to the back of the container (after the last item).
toFront() – Moves the iterator to the front of the container (before the first item).
- hasNext() → bool¶
- Returns
Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.
- Return type
bool
See also hasPrevious() and next().
- hasPrevious() → bool¶
- Returns
Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.
- Return type
bool
See also hasNext() and previous().
- next() → Pro.PDF.PDFCrossRefTableSection¶
- Returns
Returns the next item and advances the iterator by one position.
- Return type
PDFCrossRefTableSection
See also hasNext() and previous().
- previous() → Pro.PDF.PDFCrossRefTableSection¶
- Returns
Returns the previous item and moves the iterator back by one position.
- Return type
PDFCrossRefTableSection
See also hasPrevious() and next().
- toBack() → None¶
Moves the iterator to the back of the container (after the last item).
See also toFront() and previous().
- class PDFObject¶
Bases: Pro.Core.CFFObject
This class represents a PDF document.
Methods:
AddManuallyObjectToTable(objtable, offset) – Adds a PDF object to a PDF object table.
BuildObjectTable([xref_offset]) – Creates a PDF object table.
BuildStringObjectFromBytes(bytes) – Builds a string object from raw bytes.
CatalogTreeToInverseHash(catalog) – Converts a PDF catalog tree into an inverse hash mapping object IDs to page numbers.
ComputeCatalogTree(objtable[, eid]) – Computes the catalog tree of the PDF document.
CountUncompressedObjects(objtable) – Calculates the number of uncompressed objects.
CurrentEOF() – Returns the offset of the “%%EOF” string.
DecodeNameObject(raw) – Decodes a name object.
DecodeObjectStream(raw, dictionary_or_filter) – Decodes the stream of an object.
DecodeObjectStreamEx(raw, dictionary[, eid]) – Decodes the stream of an object.
DecodeObjectStreamWithFilter(raw, filter[, …]) – Decodes the stream of an object.
DecodingOperationsFinished() – Closes helper processes if spawned.
DetectObjects(objtable) – Detects unreferenced objects in the PDF document.
EnableFilter(type[, b]) – Sets whether the specified decoding filter is enabled.
EnumerateCrossRefTables([xref_offset]) – Enumerates cross-reference tables in a PDF document.
FindObjects(objtable, pathstr[, compressed]) – Finds objects matching the specified criteria.
FlattenCatalogTree(catalog) – Converts a PDF catalog tree into a list of pages in their correct order as object IDs.
GetDictValue(dict, key[, dflt]) – Gets the value from a PDF dictionary.
GetEOF() – Finds the position of the “%%EOF” string.
GetElement(offset) – Retrieves a PDF element such as a dictionary, a list or other objects.
GetElementSize(offset_or_str[, _from]) – Calculates the size of a PDF element such as a dictionary, a list or other objects.
GetFilterDefaultParameters(filter) – Retrieves the default parameters for a filter.
GetJBIG2DecodeOptions() – Returns the decoding options for JBIG2 streams.
GetJBIG2DecodeTimeout() – Returns the time-out value for the JBIG2 decoding process.
GetJBIG2LibraryVersion() – Returns the version of the JBIG2 library used for decoding.
GetObjectContent(objtable, eid) – Retrieves the decoded stream of a PDF object and its dictionary.
GetObjectContentEx(objtable, eid) – Retrieves the decoded stream of a PDF object and its dictionary.
GetObjectTable() – Returns the internally stored object table.
GetStartXRef([pos]) – Retrieves the start cross-reference offset.
GetStartXRefEx(pos) – Retrieves the start cross-reference offset.
GetStringObjectBytes(str) – Converts a string object back to its original bytes.
Returns the list of supported decoding filter names.
GetTrailer(i) – Retrieves a specified trailer dictionary.
Returns the list of trailer dictionaries.
Returns True if the PDF has encryption; otherwise returns False.
HexStringSize(str[, _from]) – Computes the size of a hex string object.
IsContainer(estr) – Checks whether an element is either a dictionary or a list.
Returns True if the PDF doesn’t have encryption or was decrypted; otherwise returns False.
IsFilterEnabled(type) – Checks whether the specified decoding filter is enabled.
Returns True if the PDF document has a “%%EOF” signature; otherwise returns False.
LiteralStringSize(str[, _from]) – Calculates the size of a literal string.
OBJID(id, generation) – Creates an object ID from its number and generation.
OBJIDGEN(oid) – Retrieves the object generation from an object ID.
OBJIDNUM(oid) – Retrieves the object number from an object ID.
ObjectToString(eid_or_objtable, eid_or_ref) – Converts an object ID to a string.
PDFValueLength(offset_or_str[, _from]) – Calculates the size of a value.
ParseContainerElement(estr[, eid]) – Parses a container element such as a dictionary or a list.
ParseCrossRefEntry(bytes) – Parses a cross-reference entry.
ParseObject(objtable, eid) – Parses a PDF object without decoding its stream.
ParseObjectContent(objtable, eid) – Retrieves the stream of a PDF object without decoding it.
ParseObjectDictionary(objtable, eid) – Retrieves the dictionary of a PDF object.
ParseObjectInfo(objtable, eid) – Retrieves the parsing information of a PDF object.
ParseObjectName(ref) – Converts an object name such as “10 1 obj” into its ID.
ParseObjectRef(ref) – Converts an object reference such as “10 1 R” into its ID.
ParseObjectRefOrName(ref, parse_ref) – Converts an object name or reference into its ID.
ProcessEncryption() – Decrypts the PDF document if encrypted.
ReadHexString(offset) – Reads a hex string.
ReadLiteralString(offset[, encrypted_string]) – Reads a literal string.
RegularCharsLength(offset_or_str[, _from]) – Calculates the length of characters not interrupted by reserved characters.
SetJBIG2DecodeOptions(options) – Sets the decoding options for JBIG2 streams.
SetJBIG2DecodeTimeout(timeout) – Sets the JBIG2 decoding time-out value.
SetJBIG2LibraryVersion(version) – Sets the version of the JBIG2 library used for decoding.
SetObjectTable(objtable) – Sets the internally stored object table.
SkipEmptyChars(offset_or_str[, _from_or_down]) – Skips empty characters and comments.
SkipNewLine(offset) – Skips new-line characters if present.
Unescapes a literal string.
Unpredict(raw, filter, parms[, eid]) – Removes PNG prediction on input data.
Attributes:
FilterType_ASCII85Decode – ASCII85Decode filter type.
FilterType_ASCIIHexDecode – ASCIIHexDecode filter type.
FilterType_CCITTFaxDecode – CCITTFaxDecode filter type.
FilterType_DCTDecode – DCTDecode filter type.
FilterType_FlateDecode – FlateDecode filter type.
FilterType_JBIG2Decode – JBIG2Decode filter type.
FilterType_JPXDecode – JPXDecode filter type.
FilterType_LZWDecode – LZWDecode filter type.
FilterType_RunLengthDecode – RunLengthDecode filter type.
JBIG2DecodeOpt_HelperProcess – JBIG2 decoding option to decode JBIG2 streams in a separate process.
Default JBIG2 decoding option to decode JBIG2 streams in the same process.
JBIG2 decoding option to disable the decoding of JBIG2 streams.
- AddManuallyObjectToTable(objtable: Pro.PDF.PDFObjectTable, offset: int) → bool¶
Adds a PDF object to a PDF object table.
- Parameters
objtable (PDFObjectTable) – The object table.
offset (int) – The offset of the object.
- Returns
Returns True if successful; otherwise returns False.
- Return type
bool
See also BuildObjectTable(), DetectObjects() and SetObjectTable().
- BuildObjectTable(xref_offset: int = INVALID_STREAM_OFFSET) → Pro.PDF.PDFObjectTable¶
Creates a PDF object table.
- Parameters
xref_offset (int) – The optional offset of the cross-reference table.
- Returns
Returns the object table if successful; otherwise returns an empty PDFObjectTable instance.
- Return type
PDFObjectTable
See also DetectObjects(), SetObjectTable() and AddManuallyObjectToTable().
- BuildStringObjectFromBytes(bytes: bytes) → str¶
Builds a string object from raw bytes.
This method is used to construct strings after decryption.
- Parameters
bytes (bytes) – The input data.
- Returns
Returns the string object if successful; otherwise returns an empty string.
- Return type
str
- CatalogTreeToInverseHash(catalog: Pro.Core.NTMaxUIntTree) → Pro.Core.NTUInt64UIntHash¶
Converts a PDF catalog tree into an inverse hash mapping object IDs to page numbers.
- Parameters
catalog (NTMaxUIntTree) – The catalog tree.
- Returns
Returns the inverse hash if successful; otherwise returns an empty Pro.Core.NTUInt64UIntHash instance.
- Return type
NTUInt64UIntHash
See also ComputeCatalogTree() and FlattenCatalogTree().
- ComputeCatalogTree(objtable: Pro.PDF.PDFObjectTable, eid: int = PDF_INVALID_OBJECT_REF) → Pro.Core.NTMaxUIntTree¶
Computes the catalog tree of the PDF document.
- Parameters
objtable (PDFObjectTable) – The object table.
eid (int) – The optional catalog tree object ID.
- Returns
Returns the computed catalog tree if successful; otherwise returns an empty Pro.Core.NTMaxUIntTree instance.
- Return type
NTMaxUIntTree
See also CatalogTreeToInverseHash() and FlattenCatalogTree().
- CountUncompressedObjects(objtable: Pro.PDF.PDFObjectTable) → int¶
Calculates the number of uncompressed objects.
- Parameters
objtable (PDFObjectTable) – The object table.
- Returns
Returns the number of uncompressed objects.
- Return type
int
See also PDFObjectRef.n.
- CurrentEOF() → int¶
- Returns
Returns the offset of the “%%EOF” string.
- Return type
int
- DecodeNameObject(raw: str) → str¶
Decodes a name object.
For instance, it converts something like “/Adobe#20Green” to “/Adobe Green”.
- Parameters
raw (str) – The raw name object.
- Returns
Returns the decoded name object.
- Return type
str
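The escape rule that DecodeNameObject() handles can be illustrated in plain Python: in a PDF name, a # followed by two hexadecimal digits stands for the character with that code. The sketch below is a standalone illustration of that rule, not the library's implementation; the function name decode_pdf_name is hypothetical, and the real method may treat malformed escapes or non-ASCII bytes differently.

```python
import re

# "#" followed by exactly two hex digits, as used in PDF name objects
_HEX_ESCAPE = re.compile(r"#([0-9A-Fa-f]{2})")

def decode_pdf_name(raw: str) -> str:
    """Replace each #XX escape with the character it encodes
    (hypothetical stand-in for DecodeNameObject)."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), raw)

print(decode_pdf_name("/Adobe#20Green"))  # prints: /Adobe Green
```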
- DecodeObjectStream(raw: bytes, dictionary_or_filter: Union[Pro.Core.NTStringStringHash, str], eid_or_parms: Union[Pro.Core.NTStringStringHash, int] = NTStringStringHashList(), eid: int = PDF_INVALID_OBJECT_REF) → bytes¶
Decodes the stream of an object.
- Parameters
raw (bytes) – The stream data.
dictionary_or_filter (Union[NTStringStringHash, str]) – Either the dictionary of the PDF object or the name of the filter.
eid_or_parms (Union[NTStringStringHash, int]) – Either the object ID or the parameters of the filter if the dictionary wasn’t provided.
eid (int) – The object ID if the dictionary wasn’t provided.
- Returns
Returns the decoded data if successful; otherwise returns an empty bytes object.
- Return type
bytes
See also DecodeObjectStreamEx(), DecodeObjectStreamWithFilter() and ParseObject().
- DecodeObjectStreamEx(raw: bytes, dictionary: Pro.Core.NTStringStringHash, eid: Optional[int] = None) → tuple¶
Decodes the stream of an object.
- Parameters
raw (bytes) – The stream data.
dictionary (NTStringStringHash) – The dictionary of the PDF object.
eid (Optional[int]) – The object ID.
- Returns
Returns a tuple containing the decoded data and an error string.
- Return type
tuple[bytes, str]
See also DecodeObjectStream(), DecodeObjectStreamEx() and ParseObject().
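Because DecodeObjectStreamEx() returns the decoded data together with an error string, a caller can distinguish an empty stream from a failed decode. The following minimal sketch wraps that check; the helper name decode_stream_or_raise is hypothetical, and it assumes that the error string is empty on success (as documented for GetObjectContentEx(), but not stated explicitly for this method).

```python
def decode_stream_or_raise(pdf, raw, dictionary, eid=None):
    """Decode a stream through a loaded PDFObject and raise on failure.

    Assumption: a non-empty error string means the decode failed.
    """
    data, error = pdf.DecodeObjectStreamEx(raw, dictionary, eid)
    if error:
        raise ValueError("couldn't decode stream: %s" % error)
    return data
```

Raising an exception is only one possible design; logging the error string and continuing, as the parsing example at the top of this page does with warnings, may suit batch analysis better.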
- DecodeObjectStreamWithFilter(raw: bytes, filter: str, parms: Optional[Pro.Core.NTStringStringHash] = None, eid: Optional[int] = None) → tuple¶
Decodes the stream of an object.
- Parameters
raw (bytes) – The stream data.
filter (str) – The name of the filter.
parms (Optional[NTStringStringHash]) – The parameters of the filter.
eid (Optional[int]) – The object ID.
- Returns
Returns a tuple containing the decoded data and an error string.
- Return type
tuple[bytes, str]
See also DecodeObjectStream(), DecodeObjectStreamEx() and ParseObject().
- DecodingOperationsFinished() → None¶
Closes helper processes if spawned.
Note
This method should be used only in conjunction with JBIG2DecodeOpt_HelperProcess.
See also SetJBIG2DecodeOptions().
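Since helper processes should be closed once decoding is done, it can be convenient to pair SetJBIG2DecodeOptions() with DecodingOperationsFinished() in a try/finally block. This is a minimal sketch under that assumption; the helper name run_with_jbig2_option and the work callback are hypothetical, and option would typically be JBIG2DecodeOpt_HelperProcess.

```python
def run_with_jbig2_option(pdf, option, work):
    """Run work(pdf) with a temporary JBIG2 decoding option, then close
    any helper process and restore the previous option."""
    previous = pdf.GetJBIG2DecodeOptions()
    pdf.SetJBIG2DecodeOptions(option)
    try:
        return work(pdf)
    finally:
        # close helper processes even if work() raised an exception
        pdf.DecodingOperationsFinished()
        pdf.SetJBIG2DecodeOptions(previous)
```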
- DetectObjects(objtable: Pro.PDF.PDFObjectTable) → None¶
Detects unreferenced objects in the PDF document.
Hint
This method can be called after BuildObjectTable() to detect additional unreferenced objects.
- Parameters
objtable (PDFObjectTable) – The PDF object table.
See also BuildObjectTable(), SetObjectTable() and AddManuallyObjectToTable().
- EnableFilter(type: int, b: bool = True) → None¶
Sets whether the specified decoding filter is enabled.
Note
By default all decoding filters are enabled.
- Parameters
type (int) – The decoding filter (e.g., ASCIIHexDecode).
b (bool) – If True, enables the filter; otherwise disables it.
See also IsFilterEnabled().
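When analyzing untrusted files it may be useful to switch off some decoders before parsing. The sketch below combines EnableFilter() with IsFilterEnabled(); the helper name disable_filters is hypothetical, and the filter types passed in are expected to be the FilterType_* attributes of PDFObject.

```python
def disable_filters(pdf, filter_types):
    """Disable the given decoding filters on a PDFObject and return the
    ones that were enabled beforehand, so they can be re-enabled later."""
    previously_enabled = [t for t in filter_types if pdf.IsFilterEnabled(t)]
    for t in filter_types:
        pdf.EnableFilter(t, False)
    return previously_enabled
```

A call might look like disable_filters(pdf, [PDFObject.FilterType_JBIG2Decode, PDFObject.FilterType_JPXDecode]).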
- EnumerateCrossRefTables(xref_offset: int = INVALID_STREAM_OFFSET) → Pro.PDF.PDFCrossRefTableList¶
Enumerates cross-reference tables in a PDF document.
Note
This method is called internally by BuildObjectTable().
- Parameters
xref_offset (int) – The optional offset of the first cross-reference table.
- Returns
Returns the list of cross-reference tables if successful; otherwise returns an empty PDFCrossRefTableList instance.
- Return type
PDFCrossRefTableList
See also BuildObjectTable().
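As an illustration of how the cross-reference classes fit together, the following sketch walks the list returned by EnumerateCrossRefTables() and collects the fields of each table and section into plain Python values. The function name summarize_xref_tables is hypothetical; the attributes and iterator methods it uses are the ones documented for PDFCrossRefTable, PDFCrossRefTableSection and their list iterators.

```python
def summarize_xref_tables(pdf):
    """Return one dict per cross-reference table of a loaded PDFObject."""
    summary = []
    tit = pdf.EnumerateCrossRefTables().iterator()
    while tit.hasNext():
        table = tit.next()
        sections = []
        sit = table.sections.iterator()
        while sit.hasNext():
            s = sit.next()
            # (first object id, number of entries, offset of the entries)
            sections.append((s.start_id, s.count, s.array_offset))
        summary.append({"prev": table.Prev, "sections": sections})
    return summary
```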
- FilterType_ASCII85Decode: Final[int]¶
ASCII85Decode filter type.
See also EnableFilter() and IsFilterEnabled().
- FilterType_ASCIIHexDecode: Final[int]¶
ASCIIHexDecode filter type.
See also EnableFilter() and IsFilterEnabled().
- FilterType_CCITTFaxDecode: Final[int]¶
CCITTFaxDecode filter type.
See also EnableFilter() and IsFilterEnabled().
- FilterType_DCTDecode: Final[int]¶
DCTDecode filter type.
See also EnableFilter() and IsFilterEnabled().
- FilterType_FlateDecode: Final[int]¶
FlateDecode filter type.
See also EnableFilter() and IsFilterEnabled().
- FilterType_JBIG2Decode: Final[int]¶
JBIG2Decode filter type.
See also EnableFilter() and IsFilterEnabled().
- FilterType_JPXDecode: Final[int]¶
JPXDecode filter type.
See also EnableFilter() and IsFilterEnabled().
- FilterType_LZWDecode: Final[int]¶
LZWDecode filter type.
See also EnableFilter() and IsFilterEnabled().
- FilterType_RunLengthDecode: Final[int]¶
RunLengthDecode filter type.
See also EnableFilter() and IsFilterEnabled().
- FindObjects(objtable: Pro.PDF.PDFObjectTable, pathstr: str, compressed: bool = True) → Pro.Core.NTUInt64List¶
Finds objects matching the specified criteria.
- Parameters
objtable (PDFObjectTable) – The object table.
pathstr (str) – The search criteria. This can match multiple keys as well as specify allowed values (e.g., "Parent;Type|T;A|B").
compressed (bool) – If True, includes compressed objects in the search.
- Returns
Returns a list of matching objects.
- Return type
NTUInt64List
- FlattenCatalogTree(catalog: Pro.Core.NTMaxUIntTree) → Pro.Core.NTUInt64List¶
Converts a PDF catalog tree into a list of pages in their correct order as object IDs.
- Parameters
catalog (NTMaxUIntTree) – The catalog tree to convert.
- Returns
Returns the list if successful; otherwise returns an empty Pro.Core.NTUInt64List instance.
- Return type
NTUInt64List
See also ComputeCatalogTree() and CatalogTreeToInverseHash().
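The catalog methods are meant to be chained: ComputeCatalogTree() builds the tree and FlattenCatalogTree() turns it into an ordered list of page object IDs. The sketch below shows that sequence; the helper name page_object_ids is hypothetical, and it assumes that Pro.Core.NTUInt64List offers the same size() and at() methods as the list classes documented on this page.

```python
def page_object_ids(pdf, objtable):
    """Return the object IDs of the pages, in page order, as a Python list.

    Assumption: the returned NTUInt64List supports size() and at(i).
    """
    catalog = pdf.ComputeCatalogTree(objtable)
    pages = pdf.FlattenCatalogTree(catalog)
    return [pages.at(i) for i in range(pages.size())]
```

An empty result can mean either that the document has no pages or that one of the two calls failed, since both return empty instances on failure.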
- GetDictValue(dict: Pro.Core.NTStringStringHash, key: str, dflt: str = str()) → str¶
Gets the value from a PDF dictionary.
This method automatically resolves object references.
- Parameters
dict (NTStringStringHash) – The dictionary.
key (str) – The value to extract. This parameter can specify a sub-key using the semicolon character as separator.
dflt (str) – The default value.
- Returns
Returns the value from the dictionary if successful; otherwise returns the default value.
- Return type
str
- GetEOF() → int¶
Finds the position of the “%%EOF” string.
- Returns
Returns the offset if successful; otherwise returns Pro.Core.INVALID_STREAM_OFFSET.
- Return type
int
See also CurrentEOF().
- GetElement(offset: int) → tuple¶
Retrieves a PDF element such as a dictionary, a list or other objects.
- Parameters
offset (int) – The offset of the element.
- Returns
Returns a tuple containing the element as a string and its size if successful; otherwise returns an empty string and -1.
- Return type
tuple[str, int]
See also GetElementSize().
- GetElementSize(offset_or_str: Union[int, str], _from: int = 0) → int¶
Calculates the size of a PDF element such as a dictionary, a list or other objects.
- Parameters
offset_or_str (Union[int, str]) – Either the offset of an element or the element as string.
_from (int) – An optional start position into the string.
- Returns
Returns the size of the element.
- Return type
int
See also GetElement().
- static GetFilterDefaultParameters(filter: str) → Pro.Core.NTStringStringHash¶
Retrieves the default parameters for a filter.
- Parameters
filter (str) – The filter type.
- Returns
Returns the default parameters if available; otherwise returns an empty Pro.Core.NTStringStringHash instance.
- Return type
NTStringStringHash
- GetJBIG2DecodeOptions() → int¶
- Returns
Returns the decoding options for JBIG2 streams.
- Return type
int
See also SetJBIG2DecodeOptions().
- GetJBIG2DecodeTimeout() → int¶
- Returns
Returns the time-out value for the JBIG2 decoding process.
- Return type
int
See also SetJBIG2DecodeTimeout().
- GetJBIG2LibraryVersion() → int¶
- Returns
Returns the version of the JBIG2 library used for decoding.
- Return type
int
See also SetJBIG2LibraryVersion().
- GetObjectContent(objtable: Pro.PDF.PDFObjectTable, eid: int) → tuple¶
Retrieves the decoded stream of a PDF object and its dictionary.
- Parameters
objtable (PDFObjectTable) – The object table.
eid (int) – The object ID.
- Returns
Returns a tuple containing the decoded stream and the dictionary of the object if successful; otherwise returns a tuple containing an empty bytes object and an empty NTStringStringHash instance.
- Return type
tuple[bytes, NTStringStringHash]
See also
GetObjectContentEx()
.
- GetObjectContentEx(objtable: Pro.PDF.PDFObjectTable, eid: int) → tuple¶
Retrieves the decoded stream of a PDF object and its dictionary.
- Parameters
objtable (PDFObjectTable) – The object table.
eid (int) – The object ID.
- Returns
Returns a tuple containing the decoded stream, the dictionary of the object and an empty string if successful; otherwise returns a tuple containing an empty bytes object, an empty NTStringStringHash instance and an error string.
- Return type
tuple[bytes, NTStringStringHash, str]
See also
GetObjectContent()
.
- GetObjectTable() → Pro.PDF.PDFObjectTable¶
- Returns
Returns the internally stored object table.
- Return type
PDFObjectTable
See also
SetObjectTable()
.
- GetStartXRef(pos: int = 0) → int¶
Retrieves the start cross-reference offset.
- Parameters
pos (int) – An optional start position for the search.
- Returns
Returns the offset if successful; otherwise returns Pro.Core.INVALID_STREAM_OFFSET.
- Return type
int
See also
GetStartXRefEx()
.
- GetStartXRefEx(pos: int) → tuple¶
Retrieves the start cross-reference offset.
- Parameters
pos (int) – The start position for the search.
- Returns
Returns a tuple containing the cross-reference offset and the offset of the “startxref” keyword if successful; otherwise returns a tuple containing Pro.Core.INVALID_STREAM_OFFSET and an undefined value.
- Return type
tuple[int, int]
See also
GetStartXRef()
.
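The relationship between the two values returned by GetStartXRefEx() can be sketched in plain Python on a bytes buffer. The helper name find_startxref and the INVALID_OFFSET constant are hypothetical, and the forward search from pos is an assumption; the sketch only shows that the number following the “startxref” keyword is the offset of the cross-reference section.

```python
import re

INVALID_OFFSET = -1  # hypothetical stand-in for Pro.Core.INVALID_STREAM_OFFSET

# The keyword is followed by white-space and a decimal byte offset.
_STARTXREF = re.compile(rb"startxref\s+(\d+)")


def find_startxref(data: bytes, pos: int = 0):
    """Return (cross-reference offset, offset of the keyword)."""
    m = _STARTXREF.search(data, pos)
    if m is None:
        return INVALID_OFFSET, INVALID_OFFSET
    return int(m.group(1)), m.start()


data = (b"%PDF-1.4\nxref\n0 1\n0000000000 65535 f \n"
        b"trailer\n<< /Size 1 >>\nstartxref\n9\n%%EOF\n")
xref_offset, keyword_pos = find_startxref(data)
print(xref_offset, keyword_pos)
```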
- GetStringObjectBytes(str: str) → bytes¶
Converts a string object back to its original bytes.
- Parameters
str (str) – The string object.
- Returns
Returns the raw data of the string object.
- Return type
bytes
- static GetSupportedFilterNames() → Pro.Core.NTStringList¶
- Returns
Returns the list of supported decoding filter names.
- Return type
NTStringList
- GetTrailer(i: int) → Pro.Core.NTStringStringHash¶
Retrieves a specified trailer dictionary.
- Parameters
i (int) – The index of the trailer dictionary to retrieve.
- Returns
Returns the specified trailer dictionary if available; otherwise returns an empty Pro.Core.NTStringStringHash instance.
- Return type
NTStringStringHash
See also
GetTrailers()
.
- GetTrailers() → Pro.Core.NTStringStringHashList¶
- Returns
Returns the list of trailer dictionaries.
- Return type
NTStringStringHashList
See also
GetTrailer()
.
- HasEncryption() → bool¶
- Returns
Returns True if the PDF has encryption; otherwise returns False.
- Return type
bool
Available since Cerbero Suite 7.2 and Cerbero Engine 4.2.
See also
IsDecrypted() and ProcessEncryption().
- HexStringSize(str: str, _from: int = 0) → int¶
Computes the size of a hex string object.
- Parameters
str (str) – The hex string object.
_from (int) – An optional position into the string.
- Returns
Returns the computed size.
- Return type
int
- static IsContainer(estr: str) → bool¶
Checks whether an element is either a dictionary or a list.
- Parameters
estr (str) – The element.
- Returns
Returns True if the element is a dictionary or a list; otherwise returns False.
- Return type
bool
- IsDecrypted() → bool¶
- Returns
Returns True if the PDF doesn’t have encryption or was decrypted; otherwise returns False.
- Return type
bool
Available since Cerbero Suite 7.2 and Cerbero Engine 4.2.
See also
HasEncryption() and ProcessEncryption().
- IsFilterEnabled(type: int) → bool¶
Checks whether the specified decoding filter is enabled.
- Parameters
type (int) – The decoding filter (e.g., ASCIIHexDecode).
- Returns
Returns True if the filter is enabled; otherwise returns False.
- Return type
bool
See also
EnableFilter()
.
- IsValidPDF() → bool¶
- Returns
Returns True if the PDF document has a “%%EOF” signature; otherwise returns False.
- Return type
bool
See also
GetEOF()
.
- JBIG2DecodeOpt_HelperProcess: Final[int]¶
JBIG2 decoding option to decode JBIG2 streams in a separate process.
See also
SetJBIG2DecodeOptions() and SetJBIG2DecodeTimeout().
- JBIG2DecodeOpt_InProcess: Final[int]¶
Default JBIG2 decoding option to decode JBIG2 streams in the same process.
See also
GetJBIG2DecodeOptions() and SetJBIG2DecodeOptions().
- JBIG2DecodeOpt_NoDecode: Final[int]¶
JBIG2 decoding option to disable the decoding of JBIG2 streams.
See also
GetJBIG2DecodeOptions() and SetJBIG2DecodeOptions().
- LiteralStringSize(str: str, _from: int = 0) → int¶
Calculates the size of a literal string.
- Parameters
str (str) – The literal string.
_from (int) – An optional start position.
- Returns
Returns the size of the literal string.
- Return type
int
See also
GetElementSize()
.
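To make the size computation concrete, here is a minimal sketch based on the PDF syntax rules for literal strings: they are enclosed in parentheses, may contain balanced unescaped parentheses, and use the backslash as escape character. The function literal_string_size is a hypothetical pure-Python stand-in, and returning -1 for malformed input is an assumption rather than documented behaviour of LiteralStringSize().

```python
def literal_string_size(s: str, start: int = 0) -> int:
    """Size of the literal string at s[start], parentheses included; -1 if malformed."""
    if start >= len(s) or s[start] != "(":
        return -1
    depth = 0
    i = start
    while i < len(s):
        ch = s[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i - start + 1
        i += 1
    return -1  # the closing parenthesis was never found


print(literal_string_size("(a(b)c) tail"))
```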
- static OBJID(id: int, generation: int) → int¶
Creates an object ID from its number and generation.
- Parameters
id (int) – The number of the object.
generation (int) – The generation of the object.
- Returns
Returns the object ID.
- Return type
int
See also
OBJIDGEN() and OBJIDNUM().
- static OBJIDGEN(oid: int) → int¶
Retrieves the object generation from an object ID.
- Parameters
oid (int) – The object ID.
- Returns
Returns the generation of the object.
- Return type
int
See also
OBJID() and OBJIDNUM().
- static OBJIDNUM(oid: int) → int¶
Retrieves the object number from an object ID.
- Parameters
oid (int) – The object ID.
- Returns
Returns the number of the object.
- Return type
int
See also
OBJID() and OBJIDGEN().
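The parsing example at the top of this page prints the object number with oid >> 32, which suggests that the number occupies the upper 32 bits of the ID. The following sketch reproduces that layout in plain Python; placing the generation in the lower 32 bits is an assumption, and the lowercase function names are hypothetical stand-ins for OBJID(), OBJIDNUM() and OBJIDGEN().

```python
def objid(num: int, generation: int) -> int:
    """Pack an object number and a generation into a single integer."""
    return (num << 32) | (generation & 0xFFFFFFFF)


def objid_num(oid: int) -> int:
    """Extract the object number (upper 32 bits)."""
    return oid >> 32


def objid_gen(oid: int) -> int:
    """Extract the generation (lower 32 bits, assumed)."""
    return oid & 0xFFFFFFFF


oid = objid(10, 1)
print(objid_num(oid), objid_gen(oid))
```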
- ObjectToString(eid_or_objtable: Union[Pro.PDF.PDFObjectTable, int], eid_or_ref: Union[Pro.PDF.PDFObjectRef, int]) → str¶
Converts an object ID to a string.
- Parameters
eid_or_objtable (Union[PDFObjectTable, int]) – Either an object table or an object ID.
eid_or_ref (Union[PDFObjectRef, int]) – Either an object ID or the object reference information.
- Returns
Returns the object ID converted to string.
- Return type
str
- static PDFValueLength(offset_or_str: Union[int, str], _from: int = 0) → int¶
Calculates the size of a value.
- Parameters
offset_or_str (Union[int, str]) – The offset to the value or the string containing the value.
_from (int) – An optional start position into the string.
- Returns
Returns the size of the value.
- Return type
int
- ParseContainerElement(estr: str, eid: int = PDF_INVALID_OBJECT_REF) → Pro.Core.NTStringStringHash¶
Parses a container element such as a dictionary or a list.
- Parameters
estr (str) – The element to parse.
eid (int) – The optional object ID of the element.
- Returns
Returns the parsed element if successful; otherwise returns an empty Pro.Core.NTStringStringHash instance.
- Return type
NTStringStringHash
- ParseCrossRefEntry(bytes: bytes) → Pro.PDF.PDFObjectRef¶
Parses a cross-reference entry.
- Parameters
bytes (bytes) – The cross-reference entry.
- Returns
Returns the object reference information if successful; otherwise returns an invalid PDFObjectRef instance.
- Return type
PDFObjectRef
- ParseObject(objtable: Pro.PDF.PDFObjectTable, eid: int) → tuple¶
Parses a PDF object without decoding its stream.
- Parameters
objtable (PDFObjectTable) – The object table.
eid (int) – The object ID.
- Returns
Returns a tuple containing the size of the object, its dictionary, its stream data and its parsing information if successful; otherwise returns a tuple containing -1, an empty Pro.Core.NTStringStringHash instance, an empty bytes object and invalid parsing information.
- Return type
tuple[int, NTStringStringHash, bytes, PDFObjectParseInfo]
See also
ParseObjectInfo(), ParseObjectDictionary(), ParseObjectContent() and DecodeObjectStream().
- ParseObjectContent(objtable: Pro.PDF.PDFObjectTable, eid: int) → bytes¶
Retrieves the stream of a PDF object without decoding it.
- Parameters
objtable (PDFObjectTable) – The object table.
eid (int) – The object ID.
- Returns
Returns the stream data if successful; otherwise returns an empty bytes object.
- Return type
bytes
See also
ParseObject(), ParseObjectInfo(), ParseObjectDictionary() and DecodeObjectStream().
- ParseObjectDictionary(objtable: Pro.PDF.PDFObjectTable, eid: int) → Pro.Core.NTStringStringHash¶
Retrieves the dictionary of a PDF object.
- Parameters
objtable (PDFObjectTable) – The object table.
eid (int) – The object ID.
- Returns
Returns the dictionary if successful; otherwise returns an empty Pro.Core.NTStringStringHash instance.
- Return type
NTStringStringHash
See also
ParseObject(), ParseObjectInfo(), ParseObjectContent() and DecodeObjectStream().
- ParseObjectInfo(objtable: Pro.PDF.PDFObjectTable, eid: int) → Pro.PDF.PDFObjectParseInfo¶
Retrieves the parsing information of a PDF object.
- Parameters
objtable (PDFObjectTable) – The object table.
eid (int) – The object ID.
- Returns
Returns the parsing information if successful; otherwise returns invalid parsing information.
- Return type
PDFObjectParseInfo
See also
ParseObject(), ParseObjectDictionary(), ParseObjectContent() and DecodeObjectStream().
- static ParseObjectName(ref: str) → int¶
Converts an object name such as “10 1 obj” into its ID.
- Parameters
ref (str) – The object name.
- Returns
Returns the object ID if successful; otherwise returns PDF_INVALID_OBJECT_REF.
- Return type
int
See also
ParseObjectRef()
.
- static ParseObjectRef(ref: str) → int¶
Converts an object reference such as “10 1 R” into its ID.
- Parameters
ref (str) – The object reference.
- Returns
Returns the object ID if successful; otherwise returns PDF_INVALID_OBJECT_REF.
- Return type
int
See also
ParseObjectName()
.
- static ParseObjectRefOrName(ref: str, parse_ref: bool) → int¶
Converts an object name or reference into its ID.
- Parameters
ref (str) – The object name or reference.
parse_ref (bool) – If True, parses an object reference; otherwise parses an object name.
- Returns
Returns the object ID if successful; otherwise returns PDF_INVALID_OBJECT_REF.
- Return type
int
See also
ParseObjectName() and ParseObjectRef().
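The difference between an object name (“10 1 obj”) and an object reference (“10 1 R”) is only the trailing keyword, as this rough pure-Python sketch shows. The function parse_ref_or_name, the INVALID_REF constant and the way the two numbers are packed into one integer are all assumptions made for illustration; the real methods may accept or reject different inputs.

```python
import re

INVALID_REF = -1  # hypothetical stand-in for PDF_INVALID_OBJECT_REF

# "<number> <generation> R" or "<number> <generation> obj"
_REF = re.compile(r"^\s*(\d+)\s+(\d+)\s+(R|obj)\s*$")


def parse_ref_or_name(text: str, parse_ref: bool) -> int:
    """Return a packed ID for a reference (parse_ref=True) or a name."""
    m = _REF.match(text)
    expected = "R" if parse_ref else "obj"
    if m is None or m.group(3) != expected:
        return INVALID_REF
    # Assumed layout: object number in the upper 32 bits, generation below.
    return (int(m.group(1)) << 32) | int(m.group(2))


print(parse_ref_or_name("10 1 R", True) >> 32)
```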
- ProcessEncryption() → bool¶
Decrypts the PDF document if encrypted.
- Returns
Returns True if successful; otherwise returns False.
- Return type
bool
See also
SetObjectTable()
.
- ReadHexString(offset: int) → tuple¶
Reads a hex string.
- Parameters
offset (int) – The offset of the hex string.
- Returns
Returns a tuple containing the string and its size if successful; otherwise returns a tuple containing an empty string and -1.
- Return type
tuple[str, int]
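For readers unfamiliar with PDF hex strings, the sketch below decodes one from a Python string according to the PDF syntax rules: the digits are enclosed in angle brackets, white-space between digits is ignored, and a missing final digit is treated as 0. The helper read_hex_string is hypothetical; unlike ReadHexString() it takes a string rather than a file offset and returns the decoded bytes instead of a string.

```python
def read_hex_string(s: str, start: int = 0):
    """Return (decoded bytes, size including the brackets) or (b"", -1)."""
    if start >= len(s) or s[start] != "<":
        return b"", -1
    end = s.find(">", start + 1)
    if end == -1:
        return b"", -1
    digits = "".join(s[start + 1:end].split())  # drop white-space
    if len(digits) % 2:
        digits += "0"  # an odd number of digits implies a trailing 0
    try:
        data = bytes.fromhex(digits)
    except ValueError:  # non-hex characters, e.g. the "<<" of a dictionary
        return b"", -1
    return data, end - start + 1


print(read_hex_string("<48656C6C6F>"))
```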
- ReadLiteralString(offset: int, encrypted_string: bool = False) → tuple¶
Reads a literal string.
- Parameters
offset (int) – The offset of the literal string.
encrypted_string (bool) – If True, the reading of encrypted strings is supported.
- Returns
Returns a tuple containing the string and its size if successful; otherwise returns a tuple containing an empty string and -1.
- Return type
tuple[str, int]
- static RegularCharsLength(offset_or_str: Union[int, str], _from: int = 0) → int¶
Calculates the length of characters not interrupted by reserved characters.
- Parameters
offset_or_str (Union[int, str]) – Either the offset to the data or the string with the data.
_from (int) – An optional start position into the string.
- Returns
Returns the number of non-reserved characters.
- Return type
int
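A rough sketch of the idea, assuming that “reserved characters” means the delimiter and white-space characters defined by the PDF syntax; whether RegularCharsLength() uses exactly this set is not stated here. The function regular_chars_length and both constants are hypothetical.

```python
PDF_WHITESPACE = "\x00\t\n\x0c\r "  # white-space characters of the PDF syntax
PDF_DELIMITERS = "()<>[]{}/%"       # delimiter characters of the PDF syntax


def regular_chars_length(s: str, start: int = 0) -> int:
    """Count the characters from s[start] up to the first reserved one."""
    i = start
    while i < len(s) and s[i] not in PDF_WHITESPACE and s[i] not in PDF_DELIMITERS:
        i += 1
    return i - start


print(regular_chars_length("Type/Catalog"))
```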
- SetJBIG2DecodeOptions(options: int) → None¶
Sets the decoding options for JBIG2 streams.
- Parameters
options (int) – The decoding options for JBIG2 streams (e.g., JBIG2DecodeOpt_HelperProcess).
See also
GetJBIG2DecodeOptions()
.
- SetJBIG2DecodeTimeout(timeout: int) → None¶
Sets the JBIG2 decoding time-out value.
- Parameters
timeout (int) – The time-out in milliseconds.
See also
GetJBIG2DecodeTimeout()
.
- SetJBIG2LibraryVersion(version: int) → None¶
Sets the version of the JBIG2 library used for decoding.
Available versions are:
1 - Deprecated.
2 - Default.
- Parameters
version (int) – The version of the JBIG2 library.
See also
GetJBIG2LibraryVersion()
.
- SetObjectTable(objtable: Pro.PDF.PDFObjectTable) → None¶
Sets the internally stored object table.
- Parameters
objtable (PDFObjectTable) – The object table.
See also
GetObjectTable()
.
- static SkipEmptyChars(offset_or_str: Union[int, str], _from_or_down: Union[bool, int] = True) → int¶
Skips empty characters and comments.
- Parameters
offset_or_str (Union[int, str]) – Either the offset to the data or a string containing the data.
_from_or_down (Union[bool, int]) – Either a boolean specifying the direction or an optional start position into the string.
- Returns
Returns the number of skipped characters.
- Return type
int
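The forward case on a string can be sketched as follows, using the PDF syntax rules that a comment starts with % and runs to the end of the line. The helper skip_empty_chars is hypothetical and does not cover the offset-based or backward variants of SkipEmptyChars().

```python
PDF_WHITESPACE = "\x00\t\n\x0c\r "  # white-space characters of the PDF syntax


def skip_empty_chars(s: str, start: int = 0) -> int:
    """Return how many white-space and comment characters follow s[start]."""
    i = start
    while i < len(s):
        if s[i] in PDF_WHITESPACE:
            i += 1
        elif s[i] == "%":
            # a comment extends up to, but not including, the end of the line
            while i < len(s) and s[i] not in "\r\n":
                i += 1
        else:
            break
    return i - start


text = "  \n% a comment\n  /Type"
print(text[skip_empty_chars(text):])
```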
- SkipNewLine(offset: int) → int¶
Skips new-line characters if present.
- Parameters
offset (int) – The offset to the data.
- Returns
Returns the number of skipped bytes.
- Return type
int
- static UnescapeLiteralString(str: str) → str¶
Unescapes a literal string.
- Parameters
str (str) – The escaped string.
- Returns
Returns the unescaped string.
- Return type
str
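The escape sequences defined by the PDF syntax for literal strings can be illustrated with a small pure-Python sketch. The function unescape_literal is hypothetical, and it assumes the input is the string content without the enclosing parentheses, which may differ from what UnescapeLiteralString() expects.

```python
_SIMPLE = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
           "(": "(", ")": ")", "\\": "\\"}


def unescape_literal(s: str) -> str:
    """Resolve backslash escapes of a PDF literal string."""
    out = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        i += 1  # move past the backslash
        if i >= len(s):
            break  # a trailing backslash is ignored
        ch = s[i]
        if ch in _SIMPLE:
            out.append(_SIMPLE[ch])
            i += 1
        elif ch in "01234567":
            j = i  # octal escape: one to three digits
            while j < len(s) and j - i < 3 and s[j] in "01234567":
                j += 1
            out.append(chr(int(s[i:j], 8) & 0xFF))
            i = j
        elif ch in "\r\n":
            i += 1  # backslash + end-of-line: line continuation
            if ch == "\r" and i < len(s) and s[i] == "\n":
                i += 1
        else:
            out.append(ch)  # unknown escape: the backslash is dropped
            i += 1
    return "".join(out)


print(unescape_literal("a\\(b\\)"))
```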
- Unpredict(raw: bytes, filter: str, parms: Pro.Core.NTStringStringHash, eid: int = PDF_INVALID_OBJECT_REF) → bytes¶
Removes PNG prediction on input data.
- Parameters
raw (bytes) – The input data.
filter (str) – The filter name.
parms (NTStringStringHash) – The filter parameters.
eid (int) – The object ID.
- Returns
Returns the decoded data.
- Return type
bytes
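To show what removing PNG prediction means, here is a minimal sketch of the reversal step as defined by the PNG filter types: each row is prefixed by a filter-type byte, and each byte is restored by adding a value predicted from its left, upper and upper-left neighbours. The function png_unpredict and its columns and bpp (bytes per pixel) arguments are hypothetical; Unpredict() takes these values from the filter parameters instead, and its handling of malformed data is not documented here.

```python
def _paeth(a: int, b: int, c: int) -> int:
    """Paeth predictor: pick the neighbour closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c


def png_unpredict(raw: bytes, columns: int, bpp: int = 1) -> bytes:
    """Undo PNG row filters; every row is one tag byte plus columns * bpp bytes."""
    row_len = columns * bpp
    stride = row_len + 1
    if row_len <= 0 or len(raw) % stride:
        raise ValueError("data is not a whole number of rows")
    prev = bytearray(row_len)  # the row above the first row is all zeros
    out = bytearray()
    for start in range(0, len(raw), stride):
        ftype = raw[start]
        row = bytearray(raw[start + 1:start + stride])
        for i in range(row_len):
            a = row[i - bpp] if i >= bpp else 0   # left (already restored)
            b = prev[i]                           # up
            c = prev[i - bpp] if i >= bpp else 0  # upper left
            if ftype == 0:
                pred = 0
            elif ftype == 1:
                pred = a
            elif ftype == 2:
                pred = b
            elif ftype == 3:
                pred = (a + b) // 2
            elif ftype == 4:
                pred = _paeth(a, b, c)
            else:
                raise ValueError("unknown filter type %d" % ftype)
            row[i] = (row[i] + pred) & 0xFF
        out += row
        prev = row
    return bytes(out)


print(png_unpredict(bytes([2, 1, 2, 3, 2, 1, 1, 1]), columns=3))
```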
- class PDFObjectParseInfo¶
This class contains the information of a parsed PDF object.
See also
PDFObject.ParseObject() and PDFObject.ParseObjectInfo().
Methods:
clear() – Clears the fields of the instance.
Attributes:
The offset of the object dictionary.
The size of the object dictionary.
The offset of the object.
The data of the parent object if it’s a compressed object.
The size of the object.
The offset of the object stream.
The size of the object stream.
- clear() → None¶
Clears the fields of the instance.
- dict_size¶
The size of the object dictionary.
See also
dict_offset
.
- parent_content¶
The data of the parent object if it’s a compressed object.
- stream_offset¶
The offset of the object stream.
See also
stream_size
.
- stream_size¶
The size of the object stream.
See also
stream_offset
.
- class PDFObjectRef¶
This class represents a reference to a PDF object.
Attributes:
The flags of the object (e.g., PDF_OBJREF_FLAG_UNREFERENCED).
The generation of the object.
The index of the object if it’s a compressed object.
The object type.
The offset of the object if it’s an uncompressed object.
The parent of the object if it’s a compressed object.
- flags¶
The flags of the object (e.g.,
PDF_OBJREF_FLAG_UNREFERENCED
).
- generation¶
The generation of the object.
- n¶
The object type.
The following values are supported:
"f" - Free object.
"n" - Uncompressed object.
"N" - Compressed object.
- class PDFObjectTable¶
Dictionary of int -> PDFObjectRef elements.
Methods:
clear() – Removes all items from the hash.
contains(key) – Checks whether key is present in the hash.
count(key) – Counts the number of values associated with key in the hash.
insert(key, value) – Inserts a new item with key and a value of value.
insertMulti(key, value) – Inserts a new item with key and a value of value.
isEmpty() – Checks whether the hash is empty.
iterator() – Creates an iterator for the hash.
remove(key) – Removes all the items that have key from the hash.
reserve(alloc) – Ensures that the internal hash table consists of at least alloc buckets.
size() – Returns the number of items in the hash.
take(key) – Removes the item with key from the hash and returns the value associated with it.
value(key[, defaultValue]) – Returns the value associated with key.
- clear() → None¶
Removes all items from the hash.
- contains(key: int) → bool¶
Checks whether
key
is present in the hash.
- Parameters
key (int) – The key value to check for.
- Returns
Returns True if the hash contains an item with the key; otherwise returns False.
- Return type
bool
See also
count()
.
- count(key: int) → int¶
Counts the number of values associated with key in the hash.
- Parameters
key (int) – The key value.
- Returns
Returns the number of items associated with the key.
- Return type
int
See also
contains()
.
- insert(key: int, value: Pro.PDF.PDFObjectRef) → None¶
Inserts a new item with key and a value of value.
- Parameters
key (int) – The key.
value (PDFObjectRef) – The value.
See also
insertMulti()
.
- insertMulti(key: int, value: Pro.PDF.PDFObjectRef) → None¶
Inserts a new item with key and a value of value.
If there is already an item with the same key in the hash, this method will simply create a new one. (This behaviour is different from insert(), which overwrites the value of an existing item.)
- Parameters
key (int) – The key.
value (PDFObjectRef) – The value.
See also
insert()
.
- isEmpty() → bool¶
Checks whether the hash is empty.
- Returns
Returns True if the hash contains no items; otherwise returns False.
- Return type
bool
See also
size()
.
- iterator() → Pro.PDF.PDFObjectTableIt¶
Creates an iterator for the hash.
- Returns
Returns the iterator.
- Return type
PDFObjectTableIt
- remove(key: int) → int¶
Removes all the items that have
key
from the hash.
- Parameters
key (int) – The key to remove.
- Returns
Returns the number of items removed, which is usually 1, but will be 0 if the key isn’t in the hash, or greater than 1 if insertMulti() has been used with the key.
- Return type
int
- reserve(alloc: int) → None¶
Ensures that the internal hash table consists of at least alloc buckets.
- Parameters
alloc (int) – The allocation size.
- size() → int¶
- Returns
Returns the number of items in the hash.
- Return type
int
- take(key: int) → Pro.PDF.PDFObjectRef¶
Removes the item with key from the hash and returns the value associated with it. If the item does not exist in the hash, the method simply returns a default-constructed value. If there are multiple items for key in the hash, only the most recently inserted one is removed.
If you don’t use the return value, remove() is more efficient.
- Parameters
key (int) – The key.
- Returns
Returns the removed value.
- Return type
PDFObjectRef
See also
remove()
.
- value(key: int, defaultValue: Optional[Pro.PDF.PDFObjectRef] = None) → Pro.PDF.PDFObjectRef¶
Returns the value associated with key. If the hash contains no item with key, the method returns a default-constructed value if defaultValue is not provided. If there are multiple items for key in the hash, the value of the most recently inserted one is returned.
- Parameters
key (int) – The key.
defaultValue (Optional[PDFObjectRef]) – The default value to return if key is not present in the hash.
- Returns
Returns the value associated with key.
- Return type
PDFObjectRef
See also
contains()
.
- class PDFObjectTableIt(obj: Pro.PDF.PDFObjectTable)¶
Iterator class for
PDFObjectTable
.
- Parameters
obj (PDFObjectTable) – The object to iterate over.
Methods:
hasNext() – Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.
hasPrevious() – Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.
key() – Returns the key of the last item that was jumped over using one of the traversal functions (previous(), next()).
next() – Returns the next item and advances the iterator by one position.
previous() – Returns the previous item and moves the iterator back by one position.
toBack() – Moves the iterator to the back of the container (after the last item).
toFront() – Moves the iterator to the front of the container (before the first item).
value() – Returns the value of the last item that was jumped over using one of the traversal functions (previous(), next()).
- hasNext() → bool¶
- Returns
Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.
- Return type
bool
See also
hasPrevious() and next().
- hasPrevious() → bool¶
- Returns
Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.
- Return type
bool
See also
hasNext() and previous().
- key() → int¶
- Returns
Returns the key of the last item that was jumped over using one of the traversal functions (previous(), next()).
- Return type
int
See also
value()
.
- next() → None¶
- Returns
Returns the next item and advances the iterator by one position.
- Return type
None
See also
hasNext() and previous().
- previous() → None¶
- Returns
Returns the previous item and moves the iterator back by one position.
- Return type
None
See also
hasPrevious() and next().
- toBack() → None¶
Moves the iterator to the back of the container (after the last item).
See also
toFront() and previous().
- toFront() → None¶
Moves the iterator to the front of the container (before the first item).
- value() → Pro.PDF.PDFObjectRef¶
- Returns
Returns the value of the last item that was jumped over using one of the traversal functions (previous(), next()).
- Return type
PDFObjectRef
See also
key()
.
- PDF_CROSSREF_ENTRY_SIZE: Final[int]¶
Size of an entry in a cross-reference table.
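In the PDF specification, an entry of a classic cross-reference table is a fixed 20-byte record: a 10-digit offset, a space, a 5-digit generation, a space, the keyword n or f and a 2-character end-of-line. The sketch below parses such a record in plain Python. The function parse_xref_entry is hypothetical, and that PDF_CROSSREF_ENTRY_SIZE equals 20 is an assumption based on the specification, not a value stated on this page.

```python
XREF_ENTRY_SIZE = 20  # per the PDF specification; assumed to match the constant


def parse_xref_entry(entry: bytes):
    """Return (offset, generation, kind) or None if the entry is malformed."""
    if len(entry) != XREF_ENTRY_SIZE:
        return None
    try:
        offset = int(entry[0:10])       # byte offset (or next free object)
        generation = int(entry[11:16])  # generation number
    except ValueError:
        return None
    kind = entry[17:18].decode("ascii", "replace")
    if entry[10:11] != b" " or entry[16:17] != b" " or kind not in ("n", "f"):
        return None
    return offset, generation, kind


print(parse_xref_entry(b"0000000017 00000 n \n"))
```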
- PDF_CreateEncryptionKey(Password: bytes, O: bytes, P: int, ID_1: bytes, EncryptMetaData: bool, Revision: int, KeyLenInBytes: int) → tuple¶
Generates an encryption key for a PDF document.
- Parameters
Password (bytes) – The password.
O (bytes) – The O parameter.
P (int) – The P parameter.
ID_1 (bytes) – The ID_1 parameter.
EncryptMetaData (bool) – The EncryptMetadata parameter.
Revision (int) – The cryptographic revision number.
KeyLenInBytes (int) – The length of the key.
- Returns
Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.
- Return type
tuple[bool, bytes]
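The parameters of PDF_CreateEncryptionKey() appear to mirror the inputs of the key computation defined for the standard security handler in the PDF specification (revisions 2 to 4). As a hedged illustration of that algorithm, and not of this function’s actual implementation, here is a pure-Python sketch; compute_encryption_key is a hypothetical name and it returns only the key, without the boolean.

```python
import hashlib
import struct

# Standard 32-byte password padding string from the PDF specification.
PAD = bytes.fromhex("28BF4E5E4E758A4164004E56FFFA0108"
                    "2E2E00B6D0683E802F0CA9FE6453697A")


def compute_encryption_key(password: bytes, O: bytes, P: int, id_1: bytes,
                           encrypt_metadata: bool, revision: int,
                           key_len: int) -> bytes:
    """Derive the file encryption key for security handler revisions 2-4."""
    md5 = hashlib.md5((password + PAD)[:32])        # pad or truncate to 32 bytes
    md5.update(O)                                   # owner password entry
    md5.update(struct.pack("<I", P & 0xFFFFFFFF))   # permissions, low byte first
    md5.update(id_1)                                # first file identifier
    if revision >= 4 and not encrypt_metadata:
        md5.update(b"\xff\xff\xff\xff")
    digest = md5.digest()
    if revision >= 3:
        for _ in range(50):                         # key strengthening
            digest = hashlib.md5(digest[:key_len]).digest()
    return digest[:key_len]


key = compute_encryption_key(b"", bytes(32), -44, b"\x01" * 16, True, 3, 16)
print(key.hex())
```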
- PDF_GenerateOwnerKey(OwnerKey: bytes, UserKey: bytes, Revision: int, KeyLenInBytes: int) → tuple¶
Generates an owner key for a PDF document.
- Parameters
OwnerKey (bytes) – The optional owner key.
UserKey (bytes) – The optional user key.
Revision (int) – The cryptographic revision number.
KeyLenInBytes (int) – The length of the key.
- Returns
Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.
- Return type
tuple[bool, bytes]
- PDF_GenerateUserKey(UserKey: bytes, O: bytes, P: int, ID_1: bytes, EncryptMetaData: bool, Revision: int, KeyLenInBytes: int) → tuple¶
Generates a user key for a PDF document.
- Parameters
UserKey (bytes) – The optional user key.
O (bytes) – The O parameter.
P (int) – The P parameter.
ID_1 (bytes) – The ID_1 parameter.
EncryptMetaData (bool) – The EncryptMetadata parameter.
Revision (int) – The cryptographic revision number.
KeyLenInBytes (int) – The length of the key.
- Returns
Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.
- Return type
tuple[bool, bytes]
- PDF_INVALID_OBJECT_REF: Final[int]¶
Invalid PDF object id.
- PDF_OBJREF_FLAG_UNREFERENCED: Final[int]¶
Flag for unreferenced PDF objects.
See also
PDFObjectRef.flags
.
- PDF_R5_GenerateDecryptionKey(user_password: bytes, O: bytes, OE: bytes, U: bytes, UE: bytes) → tuple¶
Generates a revision 5 decryption key for a PDF document.
- Parameters
user_password (bytes) – The optional user password.
O (bytes) – The O parameter.
OE (bytes) – The OE parameter.
U (bytes) – The U parameter.
UE (bytes) – The UE parameter.
- Returns
Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.
- Return type
tuple[bool, bytes]
- PDF_R5_GenerateOwnerDecryptionKey(usrpwd: bytes, O: bytes, OE: bytes, U: bytes) → tuple¶
Generates a revision 5 owner decryption key for a PDF document.
- Parameters
usrpwd (bytes) – The user password.
O (bytes) – The O parameter.
OE (bytes) – The OE parameter.
U (bytes) – The U parameter.
- Returns
Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.
- Return type
tuple[bool, bytes]
- PDF_R5_GenerateUserDecryptionKey(usrpwd: bytes, U: bytes, UE: bytes) → tuple¶
Generates a revision 5 user decryption key for a PDF document.
- Parameters
usrpwd (bytes) – The user password.
U (bytes) – The U parameter.
UE (bytes) – The UE parameter.
- Returns
Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.
- Return type
tuple[bool, bytes]
- PDF_R6_GenerateDecryptionKey(user_password: bytes, O: bytes, OE: bytes, U: bytes, UE: bytes) → tuple¶
Generates a revision 6 decryption key for a PDF document.
- Parameters
user_password (bytes) – The optional user password.
O (bytes) – The O parameter.
OE (bytes) – The OE parameter.
U (bytes) – The U parameter.
UE (bytes) – The UE parameter.
- Returns
Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.
- Return type
tuple[bool, bytes]
- PDF_R6_GenerateUserDecryptionKey(usrpwd: bytes, U: bytes, UE: bytes) → tuple¶
Generates a revision 6 user decryption key for a PDF document.
- Parameters
usrpwd (bytes) – The user password.
U (bytes) – The U parameter.
UE (bytes) – The UE parameter.
- Returns
Returns a tuple containing a boolean and the generated key. The boolean value is True if successful; otherwise it is False.
- Return type
tuple[bool, bytes]