Pro.RTF – API for parsing Rich-Text documents
Extracting Objects
The following code example demonstrates how to extract objects from an RTF document:
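from Pro.Core import *
from Pro.RTF import *

def parseRTF(fname):
    # Open the file as a container; bail out if it cannot be read
    c = createContainerFromFile(fname)
    if c.isNull():
        return
    # Load the container as an RTF document
    obj = RTFObject()
    if not obj.Load(c):
        return

    class Visitor(RTFObjectVisitor):

        def __init__(self):
            super().__init__()

        # Called by DetectObjects() for each embedded object or picture
        def visit(self, type, start, size):
            if type == RTF_EO_Object:
                print("start:", hex(start), "size:", hex(size))
                # Extract the embedded object as a container
                data = obj.ExtractObject(start, size)

    obj.DetectObjects(Visitor())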
Module API
Pro.RTF module API.
Classes:
RTFDestination – Represents a destination in the RTF document, which is a group of text and control words enclosed within braces ({}).
RTFDestinationList – List of RTFDestination elements.
RTFDestinationListIt(obj) – Iterator class for RTFDestinationList.
RTFObject – Represents an RTF document object and provides methods to parse and manipulate RTF content.
RTFObjectVisitor – Interface for visiting embedded objects and pictures within an RTF document.
RTFParseHelper – Helper class for parsing RTF documents.
Attributes:
RTF_EO_Object – Indicates an embedded object in the RTF document.
RTF_EO_Picture – Indicates an embedded picture in the RTF document.
- class RTFDestination
Represents a destination in the RTF document, which is a group of text and control words enclosed within braces ({}). Destinations can contain text and formatting commands.
See also RTFParseHelper.
Attributes:
charset – The character set code page used in this destination.
cword – The control word associated with this destination.
data – The data content of this destination.
destination_end – The offset in the document where the destination ends.
destination_start – The offset in the document where the destination starts.
end – The end offset of this destination in the document.
flags – Flags associated with this destination.
group_level – The group nesting level of this destination.
start – The start offset of this destination in the document.
- charset
The character set code page used in this destination.
- cword
The control word associated with this destination.
- data
The data content of this destination.
- destination_end
The offset in the document where the destination ends.
- destination_start
The offset in the document where the destination starts.
- end
The end offset of this destination in the document.
- flags
Flags associated with this destination.
- group_level
The group nesting level of this destination.
- start
The start offset of this destination in the document.
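To make the notions of brace-delimited group and group nesting level concrete, here is a minimal pure-Python sketch that walks an RTF string and reports the control word opening each group together with its nesting level. It is not how Pro.RTF parses documents: the function name `group_openers` and the simplified tokenisation (no binary data, no hex escapes) are assumptions made purely for illustration.

```python
import re

# A control word (letters only) directly after an opening brace,
# optionally preceded by the "\*" marker.
_OPENER = re.compile(r"\{(\\\*)?\\([a-zA-Z]+)")


def group_openers(rtf):
    """Yield (level, control_word, offset) for each group that starts with a control word."""
    level = 0
    i = 0
    while i < len(rtf):
        ch = rtf[i]
        # An escaped brace or backslash is literal text, not a group delimiter
        if ch == "\\" and i + 1 < len(rtf) and rtf[i + 1] in "{}\\":
            i += 2
            continue
        if ch == "{":
            level += 1
            m = _OPENER.match(rtf, i)
            if m:
                yield (level, m.group(2), i)
        elif ch == "}":
            level -= 1
        i += 1


sample = r"{\rtf1{\fonttbl{\f0 Arial;}}{\*\generator X;}text \{ not a group \}}"
result = list(group_openers(sample))
```

Each tuple pairs a nesting level with a control word, which is roughly the kind of information the group_level and cword attributes expose for a parsed destination.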
- class RTFDestinationList
List of RTFDestination elements.
Methods:
append(value) – Inserts value at the end of the list.
at(i) – Returns the item at index position i in the list.
clear() – Removes all items from the list.
contains(value) – Checks the presence of an element in the list.
count(value) – Returns the number of occurrences of value in the list.
indexOf(value[, start]) – Searches for an element in the list.
insert(i, value) – Inserts value at index position i in the list.
isEmpty() – Checks whether the list is empty.
iterator() – Creates an iterator for the list.
removeAll(value) – Removes all occurrences of value in the list and returns the number of entries removed.
removeAt(i) – Removes the item at index position i.
reserve(alloc) – Reserves space for alloc elements.
size() – Returns the number of items in the list.
takeAt(i) – Removes the item at index position i and returns it.
- append(value: Pro.RTF.RTFDestination) → None
Inserts value at the end of the list.
- Parameters
value (RTFDestination) – The value to add to the list.
See also insert().
- at(i: int) → Pro.RTF.RTFDestination
Returns the item at index position i in the list. i must be a valid index position in the list (i.e., 0 <= i < size()).
- Parameters
i (int) – The index of the element to return.
- Returns
Returns the requested element.
- Return type
RTFDestination
- clear() → None
Removes all items from the list.
- contains(value: Pro.RTF.RTFDestination) → bool
Checks the presence of an element in the list.
- Parameters
value (RTFDestination) – The value to check for.
- Returns
Returns True if the list contains an occurrence of value; otherwise returns False.
- Return type
bool
- count(value: Pro.RTF.RTFDestination) → int
Returns the number of occurrences of value in the list.
- Parameters
value (RTFDestination) – The value to count.
- Returns
Returns the number of occurrences.
- Return type
int
See also indexOf() and contains().
- indexOf(value: Pro.RTF.RTFDestination, start: int = 0) → int
Searches for an element in the list.
- Parameters
value (RTFDestination) – The value to search for.
start (int) – The start index.
- Returns
Returns the index position of the first occurrence of value in the list. Returns -1 if no item was found.
- Return type
int
See also contains().
- insert(i: int, value: Pro.RTF.RTFDestination) → None
Inserts value at index position i in the list. If i is 0, the value is prepended to the list. If i is size(), the value is appended to the list.
- Parameters
i (int) – The position at which to add the value.
value (RTFDestination) – The value to add.
See also append() and removeAt().
- isEmpty() → bool
Checks whether the list is empty.
- Returns
Returns True if the list contains no items; otherwise returns False.
- Return type
bool
See also size().
- iterator() → Pro.RTF.RTFDestinationListIt
Creates an iterator for the list.
- Returns
Returns the iterator.
- Return type
RTFDestinationListIt
- removeAll(value: Pro.RTF.RTFDestination) → int
Removes all occurrences of value in the list and returns the number of entries removed.
- Parameters
value (RTFDestination) – The value to remove from the list.
- Returns
Returns the number of entries removed.
- Return type
int
See also removeAt().
- removeAt(i: int) → None
Removes the item at index position i. i must be a valid index position in the list (i.e., 0 <= i < size()).
- Parameters
i (int) – The index of the item to remove.
See also removeAll().
- reserve(alloc: int) → None
Reserves space for alloc elements. Calling this method doesn’t change the size of the list.
- Parameters
alloc (int) – The number of elements to reserve space for.
- takeAt(i: int) → Pro.RTF.RTFDestination
Removes the item at index position i and returns it. i must be a valid index position in the list (i.e., 0 <= i < size()).
- Parameters
i (int) – The index of the element to remove from the list.
- Returns
Returns the removed element. If you don’t use the return value, removeAt() is more efficient.
- Return type
RTFDestination
See also removeAt().
- class RTFDestinationListIt(obj: Pro.RTF.RTFDestinationList)
Iterator class for RTFDestinationList.
- Parameters
obj (RTFDestinationList) – The object to iterate over.
Methods:
hasNext() – Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.
hasPrevious() – Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.
next() – Returns the next item and advances the iterator by one position.
previous() – Returns the previous item and moves the iterator back by one position.
toBack() – Moves the iterator to the back of the container (after the last item).
toFront() – Moves the iterator to the front of the container (before the first item).
- hasNext() → bool
- Returns
Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.
- Return type
bool
See also hasPrevious() and next().
- hasPrevious() → bool
- Returns
Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.
- Return type
bool
See also hasNext() and previous().
- next() → Pro.RTF.RTFDestination
- Returns
Returns the next item and advances the iterator by one position.
- Return type
RTFDestination
See also hasNext() and previous().
- previous() → Pro.RTF.RTFDestination
- Returns
Returns the previous item and moves the iterator back by one position.
- Return type
RTFDestination
See also hasPrevious() and next().
- toBack() → None
Moves the iterator to the back of the container (after the last item).
See also toFront() and previous().
- class RTFObject
Bases: Pro.Core.CFFObject
Represents an RTF document object and provides methods to parse and manipulate RTF content.
Methods:
DetectObjects(visitor) – Detects embedded objects and pictures in the RTF document.
ExtractObject(start, size) – Extracts an embedded object or picture from the RTF document at the specified position.
Output(out) – Outputs the RTF content to the provided text stream.
Parse(helper[, segment]) – Parses the RTF document using the provided helper.
- DetectObjects(visitor: Pro.RTF.RTFObjectVisitor) → bool
Detects embedded objects and pictures in the RTF document.
This method traverses the RTF content and notifies the provided visitor for each embedded object or picture found.
- Parameters
visitor (RTFObjectVisitor) – The visitor that will handle embedded objects or pictures.
- Returns
Returns True if objects were successfully detected; otherwise returns False.
- Return type
bool
See also RTFObjectVisitor.visit().
- ExtractObject(start: int, size: int) → Pro.Core.NTContainer
Extracts an embedded object or picture from the RTF document at the specified position.
- Parameters
start (int) – The starting offset of the object within the document.
size (int) – The size of the object to extract.
- Returns
Returns the extracted object as a container.
- Return type
NTContainer
See also DetectObjects().
- Output(out: Pro.Core.NTTextStream) → None
Outputs the RTF content to the provided text stream.
- Parameters
out (NTTextStream) – The text stream to which the RTF content will be written.
- Parse(helper: Pro.RTF.RTFParseHelper, segment: Optional[Pro.Core.NTOffsetRange] = None) → bool
Parses the RTF document using the provided helper.
- Parameters
helper (RTFParseHelper) – The helper object that assists in parsing.
segment (Optional[NTOffsetRange]) – An optional segment of the document to parse.
- Returns
Returns True if parsing was successful; otherwise returns False.
- Return type
bool
See also RTFParseHelper.
- class RTFObjectVisitor
Interface for visiting embedded objects and pictures within an RTF document.
Implement this class to handle embedded objects found during the detection process.
See also RTFObject.DetectObjects().
Methods:
visit(type, start, size) – Called for each embedded object or picture found in the RTF document.
- visit(type: int, start: int, size: int) → None
Called for each embedded object or picture found in the RTF document.
- Parameters
type (int) – The type of the embedded element. Can be RTF_EO_Object or RTF_EO_Picture.
start (int) – The starting offset of the embedded element within the document.
size (int) – The size of the embedded element.
See also RTF_EO_Object, RTF_EO_Picture.
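A visitor does not have to act on each element immediately. As a sketch, the class below records the offset and size of every object and picture so they can be processed after detection finishes. `CollectingVisitor` is a hypothetical name. The `except ImportError` branch only provides stand-ins so the snippet can run outside Cerbero, and the constant values in it are placeholders, not the real values of RTF_EO_Object and RTF_EO_Picture.

```python
try:
    from Pro.RTF import RTFObjectVisitor, RTF_EO_Object, RTF_EO_Picture
except ImportError:
    # Stand-ins for running outside Cerbero (assumption, not the real module)
    class RTFObjectVisitor:
        def visit(self, type, start, size):
            pass

    RTF_EO_Object, RTF_EO_Picture = 1, 2  # placeholder values only


class CollectingVisitor(RTFObjectVisitor):
    """Record (start, size) pairs for embedded objects and pictures."""

    def __init__(self):
        super().__init__()
        self.objects = []
        self.pictures = []

    def visit(self, type, start, size):
        if type == RTF_EO_Object:
            self.objects.append((start, size))
        elif type == RTF_EO_Picture:
            self.pictures.append((start, size))
```

Inside Cerbero, an instance would be passed to RTFObject.DetectObjects(), after which each recorded pair can be handed to RTFObject.ExtractObject(start, size).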
- class RTFParseHelper
Helper class for parsing RTF documents.
This class provides methods for handling control words, symbols, groups, and text during the parsing process.
See also RTFObject.Parse().
Methods:
BinData(bin_data) – Handles binary data encountered during parsing.
Clear() – Clears the parsing state and resets the helper to its initial state.
CloseDestination() – Closes the current destination and updates the parsing context accordingly.
CloseGroup() – Closes the current group level and decreases the group nesting level.
ControlSymbol(symbol) – Handles a control symbol encountered in the RTF content.
ControlWord(cword, has_param, param) – Handles a control word encountered in the RTF content.
OpenDestination() – Opens a new destination, starting a new parsing context for RTF content.
OpenGroup() – Opens a new group level, increasing the group nesting level.
PreviousDestination() – Retrieves the previous destination from the parsing context.
Text(text, len) – Handles text content encountered during parsing.
Attributes:
buffer – The buffer containing the RTF content to be parsed.
current_destination – The current destination being parsed.
destinations – A list of parsed destinations.
group_level – The current group nesting level during parsing.
The RTFObject associated with this parse helper.
- BinData(bin_data: Pro.Core.NTContainer) → None
Handles binary data encountered during parsing.
- Parameters
bin_data (NTContainer) – The binary data container.
- Clear() → None
Clears the parsing state and resets the helper to its initial state.
- CloseDestination() → None
Closes the current destination and updates the parsing context accordingly.
- CloseGroup() → None
Closes the current group level and decreases the group nesting level.
- ControlSymbol(symbol: str) → None
Handles a control symbol encountered in the RTF content.
- Parameters
symbol (str) – The control symbol character.
- ControlWord(cword: str, has_param: bool, param: int) → None
Handles a control word encountered in the RTF content.
- Parameters
cword (str) – The control word string.
has_param (bool) – Indicates whether the control word has an associated parameter.
param (int) – The parameter value associated with the control word, if any.
- OpenDestination() → None
Opens a new destination, starting a new parsing context for RTF content.
- OpenGroup() → None
Opens a new group level, increasing the group nesting level.
- PreviousDestination() → Pro.RTF.RTFDestination
Retrieves the previous destination from the parsing context.
- Returns
The previous destination.
- Return type
RTFDestination
See also current_destination.
- Text(text: str, len: int) → None
Handles text content encountered during parsing.
- Parameters
text (str) – The text content.
len (int) – The length of the text.
- buffer
The buffer containing the RTF content to be parsed.
- current_destination
The current destination being parsed.
- destinations
A list of parsed destinations.
- group_level
The current group nesting level during parsing.
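Once a helper has been passed to RTFObject.Parse(), its destinations attribute can be inspected. The sketch below tallies destinations by control word using only size(), at() and the cword attribute. It assumes cword is a hashable value such as a string, which this reference does not state. `count_control_words` is a hypothetical name, and `_FakeList` plus the SimpleNamespace objects are stand-ins for running outside Cerbero.

```python
from collections import Counter
from types import SimpleNamespace


def count_control_words(helper):
    """Count parsed destinations per control word via helper.destinations."""
    counts = Counter()
    dests = helper.destinations
    for i in range(dests.size()):
        counts[dests.at(i).cword] += 1
    return counts


class _FakeList:
    # Hypothetical stand-in for RTFDestinationList (not part of Pro.RTF)
    def __init__(self, items):
        self._items = list(items)

    def size(self):
        return len(self._items)

    def at(self, i):
        return self._items[i]


# Stand-in for a helper that has already been used for parsing
fake_helper = SimpleNamespace(
    destinations=_FakeList(
        [
            SimpleNamespace(cword="fonttbl"),
            SimpleNamespace(cword="pict"),
            SimpleNamespace(cword="pict"),
        ]
    )
)
counts = count_control_words(fake_helper)
```

In Cerbero the helper would presumably be created with RTFParseHelper() and filled by obj.Parse(helper); checking the boolean result of Parse() before reading destinations is advisable.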
- RTF_EO_Object: Final[int]
Indicates an embedded object in the RTF document.
Used in RTFObjectVisitor.visit().
- RTF_EO_Picture: Final[int]
Indicates an embedded picture in the RTF document.
Used in RTFObjectVisitor.visit().