Pro.XML - API for parsing XML documents
Extracting JavaScript from an XDP Document
XDP (XML Data Package) objects are XML documents that can be contained in PDF documents. They can hold JavaScript code and can even embed other PDF documents.
The following code example demonstrates how to extract JavaScript code from an XDP document:
    from Pro.Core import *
    from Pro.XML import *

    def parseXDP(fname):
        # Open the file as a container and load it as an XML object
        c = createContainerFromFile(fname)
        if c.isNull():
            return
        obj = XMLObject()
        if not obj.Load(c):
            return

        # Called for each embedded object that IdentifyObjects() finds
        def callback(ud, type, opaque_identifier):
            if type == XMLObj_JavaScript:
                c = obj.GetObject(type, opaque_identifier)
                if c.isValid():
                    print(c.read(0, c.size()).decode("utf-8", errors="ignore"))
            return True

        obj.SetIdentifiedObjectsCallback(callback, None)
        obj.IdentifyObjects()
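To give a rough idea of the kind of content such a callback receives, here is a minimal sketch that uses only Python's standard library, not Pro.XML, to pull script text out of an XDP string. It assumes XFA-style `script` elements whose `contentType` attribute is `application/x-javascript`. The sample document and the `local_name` and `extract_scripts` helpers are hypothetical, and real XDP files may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed XDP sample for illustration only.
SAMPLE_XDP = """<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/">
  <template xmlns="http://www.xfa.org/schema/xfa-template/2.8/">
    <subform name="form1">
      <event activity="initialize">
        <script contentType="application/x-javascript">app.alert("hello");</script>
      </event>
      <event activity="click">
        <script contentType="application/x-formcalc">1 + 1</script>
      </event>
    </subform>
  </template>
</xdp:xdp>
"""

def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]

def extract_scripts(xml_text, content_type="application/x-javascript"):
    """Return the text of every <script> element with the given contentType."""
    root = ET.fromstring(xml_text)
    scripts = []
    for elem in root.iter():
        if local_name(elem.tag) == "script" and elem.get("contentType") == content_type:
            scripts.append((elem.text or "").strip())
    return scripts

if __name__ == "__main__":
    for js in extract_scripts(SAMPLE_XDP):
        print(js)
```

This only walks the XML tree. It does not decode embedded PDF documents or follow any of the logic that XMLObject applies internally.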
Module API
Pro.XML module API.
Attributes:
XMLObj_File – Represents a file object embedded within the XML document.
XMLObj_JavaScript – Represents a JavaScript object embedded within the XML document.
XMLObj_VBS – Represents a VBScript object embedded within the XML document.
Classes:
XMLObject – Represents an XML document object and provides methods to parse and manipulate XML content.
XMLParseHelper – Helper class for parsing XML documents.
- XMLObj_File: Final[int]
Represents a file object embedded within the XML document.
See also XMLObject.GetObject().
- XMLObj_JavaScript: Final[int]
Represents a JavaScript object embedded within the XML document.
See also XMLObject.GetObject().
- XMLObj_VBS: Final[int]
Represents a VBScript object embedded within the XML document.
See also XMLObject.GetObject().
- class XMLObject
Bases: Pro.Core.CFFObject
Represents an XML document object and provides methods to parse and manipulate XML content.
See also XMLParseHelper.
Methods:
GetObject(type, opaque_identifier) – Retrieves an embedded object from the XML document.
GetXML() – Retrieves the XML document associated with this object.
GetXMLHelper() – Retrieves the XML parse helper associated with this object.
IdentifyObjects() – Identifies embedded objects within the XML document.
IdentifyXMLFromElementName(name) – Identifies the type of XML content based on the element name.
SetIdentifiedObjectsCallback(cb, ud) – Sets a callback function to be invoked when embedded objects are identified.
SetXML(xml) – Sets the XML document for this object.
SetXMLHelper(h_or_name) – Sets the XML parse helper for this object.
- GetObject(type: int, opaque_identifier: bytes) → Pro.Core.NTContainer
Retrieves an embedded object from the XML document.
- Parameters
type (int) – The type of the object to retrieve (e.g., XMLObj_File).
opaque_identifier (bytes) – An opaque identifier used to locate the object within the XML structure.
- Returns
Returns the requested object encapsulated in an NTContainer.
- Return type
NTContainer
See also XMLObj_File, XMLObj_JavaScript and XMLObj_VBS.
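A retrieved container may be large, so it can be preferable to read it in chunks rather than in a single call. The sketch below hashes a container using only the isValid(), size() and read(offset, size) calls that appear in the example at the top of this page. The `sha256_of_container` helper and the `FakeContainer` stand-in are hypothetical names. The stand-in exists only so the snippet runs outside Cerbero Suite and is not part of Pro.Core.

```python
import hashlib

class FakeContainer:
    """Hypothetical stand-in mimicking the few NTContainer calls used here."""
    def __init__(self, data):
        self._data = data

    def isValid(self):
        return self._data is not None

    def size(self):
        return len(self._data)

    def read(self, offset, size):
        return self._data[offset:offset + size]

def sha256_of_container(c, chunk_size=0x10000):
    """Return the hex SHA-256 of a container, or None if it is not valid."""
    if not c.isValid():
        return None
    h = hashlib.sha256()
    offset, total = 0, c.size()
    while offset < total:
        chunk = c.read(offset, min(chunk_size, total - offset))
        if not chunk:
            break  # defensive: stop on an empty or failed read
        h.update(chunk)
        offset += len(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    print(sha256_of_container(FakeContainer(b"app.alert(1);")))
```

Inside a callback, the value returned by GetObject() could be passed to such a helper in place of the stand-in, assuming it exposes the same three calls.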
- GetXML() → Pro.Core.NTXml
Retrieves the XML document associated with this object.
- Returns
Returns the XML content as a Pro.Core.NTXml object.
- Return type
NTXml
See also SetXML().
- GetXMLHelper() → Pro.XML.XMLParseHelper
Retrieves the XML parse helper associated with this object.
- Returns
Returns the XML parse helper.
- Return type
XMLParseHelper
See also SetXMLHelper().
- IdentifyObjects() → None
Identifies embedded objects within the XML document.
This method processes the XML content to locate and identify any embedded objects, such as files or scripts.
See also GetObject().
- static IdentifyXMLFromElementName(name: str) → str
Identifies the type of XML content based on the element name.
- Parameters
name (str) – The name of the XML element.
- Returns
Returns a string representing the identified type of XML content.
- Return type
str
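Since this method takes an element name, a caller first needs to obtain one. As a sketch under assumptions, the helper below uses Python's standard library, rather than Pro.Core.NTXml, to read the root element's name with any namespace stripped. Whether IdentifyXMLFromElementName() expects a local name or a prefixed one is not stated here, so the stripping is an assumption, and `root_element_name` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def root_element_name(xml_bytes):
    """Return the root element's local name, or None if parsing fails."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return None
    # ElementTree spells namespaced tags as '{uri}name'; keep only 'name'.
    return root.tag.rsplit("}", 1)[-1]

if __name__ == "__main__":
    sample = b'<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/"><config/></xdp:xdp>'
    name = root_element_name(sample)
    print(name)
    # Inside Cerbero Suite the name could then be passed on, for example:
    # kind = XMLObject.IdentifyXMLFromElementName(name)
```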
- SetIdentifiedObjectsCallback(cb: object, ud: object) → None
Sets a callback function to be invoked when embedded objects are identified.
- Parameters
cb (object) – The callback function to be called. In the example at the top of this page it is invoked as cb(ud, type, opaque_identifier).
ud (object) – User-defined data to be passed to the callback.
See also IdentifyObjects().
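The callback in the example at the top of this page prints what it finds. As a variation, the sketch below uses the ud argument to collect identifiers grouped by object type, so that the objects can be retrieved later with GetObject(). It assumes that ud is handed back to the callback unchanged and that returning True lets identification continue, as the example suggests. `collect_identifiers` and `FakeXMLObject` are hypothetical, and the numeric type values are placeholders rather than the real XMLObj_* constants.

```python
from collections import defaultdict

def collect_identifiers(ud, type, opaque_identifier):
    """Callback: record each identifier under its object type in ud."""
    ud[type].append(opaque_identifier)
    return True  # mirrors the return value used in the example on this page

class FakeXMLObject:
    """Hypothetical stand-in that replays a fixed list of identified objects."""
    def __init__(self, found):
        self._found = found
        self._cb = None
        self._ud = None

    def SetIdentifiedObjectsCallback(self, cb, ud):
        self._cb, self._ud = cb, ud

    def IdentifyObjects(self):
        for type, ident in self._found:
            self._cb(self._ud, type, ident)

if __name__ == "__main__":
    JS, VBS = 1, 2  # placeholder values, not the real XMLObj_* constants
    obj = FakeXMLObject([(JS, b"a"), (VBS, b"b"), (JS, b"c")])
    found = defaultdict(list)
    obj.SetIdentifiedObjectsCallback(collect_identifiers, found)
    obj.IdentifyObjects()
    print(dict(found))
```

With a real XMLObject, the same callback and dictionary would be passed to SetIdentifiedObjectsCallback() before calling IdentifyObjects().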
- SetXML(xml: Pro.Core.NTXml) → None
Sets the XML document for this object.
- Parameters
xml (NTXml) – The XML content to associate with this object.
See also GetXML().
- SetXMLHelper(h_or_name: Union[Pro.XML.XMLParseHelper, str]) → None
Sets the XML parse helper for this object.
- Parameters
h_or_name (Union[XMLParseHelper, str]) – The parse helper to set or the name of the helper.
See also GetXMLHelper().
- class XMLParseHelper
Helper class for parsing XML documents.
Provides methods to handle embedded objects and assist in the parsing process.
See also XMLObject.
Methods:
GetObject(type, opaque_identifier) – Retrieves an embedded object during the parsing process.
IdentifyObjects() – Identifies embedded objects within the XML content during parsing.
- GetObject(type: int, opaque_identifier: bytes) → Pro.Core.NTContainer
Retrieves an embedded object during the parsing process.
- Parameters
type (int) – The type of the object to retrieve (e.g., XMLObj_File).
opaque_identifier (bytes) – An identifier used to locate the object.
- Returns
Returns the embedded object as a Pro.Core.NTContainer.
- Return type
NTContainer
See also IdentifyObjects().
- IdentifyObjects() → None
Identifies embedded objects within the XML content during parsing.
This method scans the XML structure to locate embedded files or scripts.
See also GetObject().