Pro.RTF — API for parsing Rich-Text documents

Overview

The Pro.RTF module contains the API for parsing Rich-Text documents.

Extracting Objects

The following code example demonstrates how to extract objects from an RTF document:

from Pro.Core import *
from Pro.RTF import *

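def parseRTF(fname):
    # Open the file as a container; bail out if it cannot be read
    c = createContainerFromFile(fname)
    if c.isNull():
        return
    # Load the container as an RTF document
    obj = RTFObject()
    if not obj.Load(c):
        return
    # The visitor is called once for each embedded object or picture
    class Visitor(RTFObjectVisitor):
        def __init__(self):
            super().__init__()
        def visit(self, type, start, size):
            if type == RTF_EO_Object:
                print("start:", hex(start), "size:", hex(size))
                # data is an NTContainer holding the extracted object
                data = obj.ExtractObject(start, size)
    obj.DetectObjects(Visitor())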

Module API

Pro.RTF module API.

Classes:

RTFDestination()

Represents a destination in the RTF document, which is a group of text and control words enclosed within braces ({}).

RTFDestinationList()

List of RTFDestination elements.

RTFDestinationListIt(obj)

Iterator class for RTFDestinationList.

RTFObject()

Represents an RTF document object and provides methods to parse and manipulate RTF content.

RTFObjectVisitor()

Interface for visiting embedded objects and pictures within an RTF document.

RTFParseHelper()

Helper class for parsing RTF documents.

Attributes:

RTF_EO_Object

Indicates an embedded object in the RTF document.

RTF_EO_Picture

Indicates an embedded picture in the RTF document.

class RTFDestination

Represents a destination in the RTF document, which is a group of text and control words enclosed within braces ({}). Destinations can contain text and formatting commands.

See also RTFParseHelper.
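
To make the notion of group nesting concrete, the following sketch scans raw RTF text and reports the nesting level at each opening brace. It is a standalone illustration of the RTF syntax and not the parser used by Pro.RTF: the function name group_levels and the sample string are invented for this example, and binary (\bin) payloads are not handled.

```python
def group_levels(rtf):
    """Yield (offset, level) for every '{' that opens a group in rtf."""
    level = 0
    i = 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\" and i + 1 < len(rtf):
            # Skip the character after a backslash so that \{, \} and \\
            # are not mistaken for group delimiters.
            i += 2
            continue
        if ch == "{":
            level += 1
            yield i, level
        elif ch == "}":
            level -= 1
        i += 1


# Hypothetical sample: three nested groups plus two escaped braces.
sample = r"{\rtf1 {\fonttbl {\f0 Arial;}} text \{ not a group \}}"
levels = [level for _, level in group_levels(sample)]
```

Each yielded level corresponds to what the group_level attribute describes conceptually; whether the module counts the outermost group as level 0 or 1 is not stated in this reference.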

Attributes:

charset

The character set code page used in this destination.

cword

The control word associated with this destination.

data

The data content of this destination.

destination_end

The offset in the document where the destination ends.

destination_start

The offset in the document where the destination starts.

end

The end offset of this destination in the document.

flags

Flags associated with this destination.

group_level

The group nesting level of this destination.

start

The start offset of this destination in the document.

class RTFDestinationList

List of RTFDestination elements.

Methods:

append(value)

Inserts value at the end of the list.

at(i)

Returns the item at index position i in the list.

clear()

Removes all items from the list.

contains(value)

Checks the presence of an element in the list.

count(value)

Returns the number of occurrences of value in the list.

indexOf(value[, start])

Searches for an element in the list.

insert(i, value)

Inserts value at index position i in the list.

isEmpty()

Checks whether the list is empty.

iterator()

Creates an iterator for the list.

removeAll(value)

Removes all occurrences of value in the list and returns the number of entries removed.

removeAt(i)

Removes the item at index position i.

reserve(alloc)

Reserves space for alloc elements.

size()

Returns the number of items in the list.

takeAt(i)

Removes the item at index position i and returns it.

append(value: Pro.RTF.RTFDestination) → None

Inserts value at the end of the list.

Parameters

value (RTFDestination) – The value to add to the list.

See also insert().

at(i: int) → Pro.RTF.RTFDestination

Returns the item at index position i in the list. i must be a valid index position in the list (i.e., 0 <= i < size()).

Parameters

i (int) – The index of the element to return.

Returns

Returns the requested element.

Return type

RTFDestination
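
Because at() requires a valid index, callers may want to check the bounds first. The sketch below shows one way to do that; safe_at is a hypothetical helper, not part of Pro.RTF, and _FakeList is a stand-in that exposes only size() and at() so the example can run outside the SDK.

```python
def safe_at(lst, i):
    """Return lst.at(i) when 0 <= i < lst.size(); otherwise return None."""
    if 0 <= i < lst.size():
        return lst.at(i)
    return None


class _FakeList:
    """Stand-in for RTFDestinationList (size() and at() only)."""

    def __init__(self, items):
        self._items = list(items)

    def size(self):
        return len(self._items)

    def at(self, i):
        return self._items[i]


demo = _FakeList(["first", "second", "third"])
in_range = safe_at(demo, 1)
out_of_range = safe_at(demo, 3)
```

With the real module, the same helper should accept an RTFDestinationList, since it relies only on the two documented methods.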

clear() → None

Removes all items from the list.

contains(value: Pro.RTF.RTFDestination) → bool

Checks the presence of an element in the list.

Parameters

value (RTFDestination) – The value to check for.

Returns

Returns True if the list contains an occurrence of value; otherwise returns False.

Return type

bool

See also indexOf() and count().

count(value: Pro.RTF.RTFDestination) → int

Returns the number of occurrences of value in the list.

Parameters

value (RTFDestination) – The value to count.

Returns

Returns the number of occurrences.

Return type

int

See also indexOf() and contains().

indexOf(value: Pro.RTF.RTFDestination, start: int = 0) → int

Searches for an element in the list.

Parameters
  • value (RTFDestination) – The value to search for.

  • start (int) – The start index.

Returns

Returns the index position of the first occurrence of value in the list. Returns -1 if no item was found.

Return type

int

See also contains().

insert(i: int, value: Pro.RTF.RTFDestination) → None

Inserts value at index position i in the list. If i is 0, the value is prepended to the list. If i is size(), the value is appended to the list.

Parameters
  • i (int) – The position at which to add the value.

  • value (RTFDestination) – The value to add.

See also append() and removeAt().

isEmpty() → bool

Checks whether the list is empty.

Returns

Returns True if the list contains no items; otherwise returns False.

Return type

bool

See also size().

iterator() → Pro.RTF.RTFDestinationListIt

Creates an iterator for the list.

Returns

Returns the iterator.

Return type

RTFDestinationListIt

removeAll(value: Pro.RTF.RTFDestination) → int

Removes all occurrences of value in the list and returns the number of entries removed.

Parameters

value (RTFDestination) – The value to remove from the list.

Returns

Returns the number of entries removed.

Return type

int

See also removeAt().

removeAt(i: int) → None

Removes the item at index position i. i must be a valid index position in the list (i.e., 0 <= i < size()).

Parameters

i (int) – The index of the item to remove.

See also removeAll().

reserve(alloc: int) → None

Reserves space for alloc elements. Calling this method doesn’t change the size of the list.

Parameters

alloc (int) – The number of elements to reserve space for.

size() → int

Returns

Returns the number of items in the list.

Return type

int

See also isEmpty().

takeAt(i: int) → Pro.RTF.RTFDestination

Removes the item at index position i and returns it. i must be a valid index position in the list (i.e., 0 <= i < size()).

Parameters

i (int) – The index of the element to remove from the list.

Returns

Returns the removed element. If you don’t use the return value, removeAt() is more efficient.

Return type

RTFDestination

See also removeAt().

class RTFDestinationListIt(obj: Pro.RTF.RTFDestinationList)

Iterator class for RTFDestinationList.

Parameters

obj (RTFDestinationList) – The object to iterate over.

Methods:

hasNext()

Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.

hasPrevious()

Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.

next()

Returns the next item and advances the iterator by one position.

previous()

Returns the previous item and moves the iterator back by one position.

toBack()

Moves the iterator to the back of the container (after the last item).

toFront()

Moves the iterator to the front of the container (before the first item).

hasNext() → bool

Returns

Returns True if there is at least one item ahead of the iterator, i.e. the iterator is not at the back of the container; otherwise returns False.

Return type

bool

See also hasPrevious() and next().

hasPrevious() → bool

Returns

Returns True if there is at least one item behind the iterator, i.e. the iterator is not at the front of the container; otherwise returns False.

Return type

bool

See also hasNext() and previous().

next() → Pro.RTF.RTFDestination

Returns

Returns the next item and advances the iterator by one position.

Return type

RTFDestination

See also hasNext() and previous().

previous() → Pro.RTF.RTFDestination

Returns

Returns the previous item and moves the iterator back by one position.

Return type

RTFDestination

See also hasPrevious() and next().

toBack() → None

Moves the iterator to the back of the container (after the last item).

See also toFront() and previous().

toFront() → None

Moves the iterator to the front of the container (before the first item).

See also toBack() and next().

class RTFObject

Bases: Pro.Core.CFFObject

Represents an RTF document object and provides methods to parse and manipulate RTF content.

Methods:

DetectObjects(visitor)

Detects embedded objects and pictures in the RTF document.

ExtractObject(start, size)

Extracts an embedded object or picture from the RTF document at the specified position.

Output(out)

Outputs the RTF content to the provided text stream.

Parse(helper[, segment])

Parses the RTF document using the provided helper.

DetectObjects(visitor: Pro.RTF.RTFObjectVisitor) → bool

Detects embedded objects and pictures in the RTF document.

This method traverses the RTF content and notifies the provided visitor for each embedded object or picture found.

Parameters

visitor (RTFObjectVisitor) – The visitor that will handle embedded objects or pictures.

Returns

Returns True if objects were successfully detected; otherwise returns False.

Return type

bool

See also RTFObjectVisitor.visit().

ExtractObject(start: int, size: int) → Pro.Core.NTContainer

Extracts an embedded object or picture from the RTF document at the specified position.

Parameters
  • start (int) – The starting offset of the object within the document.

  • size (int) – The size of the object to extract.

Returns

Returns the extracted object as a container.

Return type

NTContainer

See also DetectObjects().

Output(out: Pro.Core.NTTextStream) → None

Outputs the RTF content to the provided text stream.

Parameters

out (NTTextStream) – The text stream to which the RTF content will be written.

Parse(helper: Pro.RTF.RTFParseHelper, segment: Optional[Pro.Core.NTOffsetRange] = None) → bool

Parses the RTF document using the provided helper.

Parameters
  • helper (RTFParseHelper) – The helper object that assists in parsing.

  • segment (Optional[NTOffsetRange]) – An optional segment of the document to parse.

Returns

Returns True if parsing was successful; otherwise returns False.

Return type

bool

See also RTFParseHelper.
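
A typical use of Parse() is to inspect the destinations gathered by the helper afterwards. The sketch below shows that flow under several assumptions not confirmed by this reference: that helper.destinations is filled in by Parse(), that it exposes size() and at() like RTFDestinationList, and that a default-constructed RTFParseHelper is enough. list_destinations is a hypothetical helper, and the _Fake classes are stand-ins so the example can run outside the SDK.

```python
from types import SimpleNamespace


def list_destinations(obj, helper):
    """Parse obj and return (cword, group_level, start, end) tuples, or None."""
    if not obj.Parse(helper):
        return None
    dests = helper.destinations
    rows = []
    for i in range(dests.size()):
        d = dests.at(i)
        rows.append((d.cword, d.group_level, d.start, d.end))
    return rows


class _FakeList:
    """Stand-in for RTFDestinationList (size() and at() only)."""

    def __init__(self, items):
        self._items = list(items)

    def size(self):
        return len(self._items)

    def at(self, i):
        return self._items[i]


class _FakeObject:
    """Stand-in for RTFObject: Parse() fills the helper with made-up data."""

    def Parse(self, helper, segment=None):
        helper.destinations = _FakeList([
            SimpleNamespace(cword="fonttbl", group_level=2, start=7, end=30),
            SimpleNamespace(cword="object", group_level=2, start=31, end=90),
        ])
        return True


demo_helper = SimpleNamespace(destinations=_FakeList([]))
demo_rows = list_destinations(_FakeObject(), demo_helper)
```

With the real module, the call would look like list_destinations(obj, RTFParseHelper()) for an RTFObject that has already been loaded.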

class RTFObjectVisitor

Interface for visiting embedded objects and pictures within an RTF document.

Implement this class to handle embedded objects found during the detection process.

See also RTFObject.DetectObjects().

Methods:

visit(type, start, size)

Called for each embedded object or picture found in the RTF document.

visit(type: int, start: int, size: int) → None

Called for each embedded object or picture found in the RTF document.

Parameters
  • type (int) – The type of the embedded element. Can be RTF_EO_Object or RTF_EO_Picture.

  • start (int) – The starting offset of the embedded element within the document.

  • size (int) – The size of the embedded element.

See also RTF_EO_Object, RTF_EO_Picture.
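
Instead of acting inside visit(), a visitor can simply record what it is told and leave the processing for later. The sketch below shows such a collecting visitor. CollectingVisitor is a hypothetical name. The fallback branch exists only so the example can run outside the SDK, and its numeric constants are placeholders, not the module's real values.

```python
try:
    from Pro.RTF import RTFObjectVisitor, RTF_EO_Object, RTF_EO_Picture
except ImportError:
    # Stand-ins used only when the SDK is unavailable.
    # The values below are placeholders, NOT the real Pro.RTF constants.
    class RTFObjectVisitor:
        def visit(self, type, start, size):
            pass

    RTF_EO_Object, RTF_EO_Picture = 0, 1


class CollectingVisitor(RTFObjectVisitor):
    """Records the (start, size) range of every embedded element by kind."""

    def __init__(self):
        super().__init__()
        self.objects = []
        self.pictures = []

    def visit(self, type, start, size):
        if type == RTF_EO_Object:
            self.objects.append((start, size))
        elif type == RTF_EO_Picture:
            self.pictures.append((start, size))


# Simulate the callbacks that DetectObjects() would make.
visitor = CollectingVisitor()
visitor.visit(RTF_EO_Object, 0x10, 0x20)
visitor.visit(RTF_EO_Picture, 0x40, 0x8)
visitor.visit(RTF_EO_Object, 0x80, 0x4)
```

With the real module, the visitor would be passed to RTFObject.DetectObjects(), after which visitor.objects and visitor.pictures hold the ranges to hand to ExtractObject().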

class RTFParseHelper

Helper class for parsing RTF documents.

This class provides methods for handling control words, symbols, groups, and text during the parsing process.

See also RTFObject.Parse().

Methods:

BinData(bin_data)

Handles binary data encountered during parsing.

Clear()

Clears the parsing state and resets the helper to its initial state.

CloseDestination()

Closes the current destination and updates the parsing context accordingly.

CloseGroup()

Closes the current group level and decreases the group nesting level.

ControlSymbol(symbol)

Handles a control symbol encountered in the RTF content.

ControlWord(cword, has_param, param)

Handles a control word encountered in the RTF content.

OpenDestination()

Opens a new destination, starting a new parsing context for RTF content.

OpenGroup()

Opens a new group level, increasing the group nesting level.

PreviousDestination()

Retrieves the previous destination from the parsing context.

Text(text, len)

Handles text content encountered during parsing.

Attributes:

buffer

The buffer containing the RTF content to be parsed.

current_destination

The current destination being parsed.

destinations

A list of parsed destinations.

group_level

The current group nesting level during parsing.

obj

The RTFObject associated with this parse helper.

BinData(bin_data: Pro.Core.NTContainer) → None

Handles binary data encountered during parsing.

Parameters

bin_data (NTContainer) – The binary data container.

Clear() → None

Clears the parsing state and resets the helper to its initial state.

CloseDestination() → None

Closes the current destination and updates the parsing context accordingly.

CloseGroup() → None

Closes the current group level and decreases the group nesting level.

ControlSymbol(symbol: str) → None

Handles a control symbol encountered in the RTF content.

Parameters

symbol (str) – The control symbol character.

ControlWord(cword: str, has_param: bool, param: int) → None

Handles a control word encountered in the RTF content.

Parameters
  • cword (str) – The control word string.

  • has_param (bool) – Indicates whether the control word has an associated parameter.

  • param (int) – The parameter value associated with the control word, if any.
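
To illustrate how the three arguments relate to the RTF text, the sketch below splits a raw control word such as \fs24 into a name, a flag and a numeric parameter. It is a standalone illustration of the RTF syntax, not the module's own tokenizer. split_control_word is a hypothetical name. Whether the module passes cword with or without the leading backslash, and what param holds when has_param is False, are not stated in this reference; this sketch strips the backslash and uses 0.

```python
import re

# A control word is a backslash, a run of letters, and an optional
# (possibly negative) decimal parameter.
_CONTROL_WORD = re.compile(r"\\([A-Za-z]+)(-?\d+)?")


def split_control_word(token):
    """Return (cword, has_param, param) for a raw RTF control word."""
    match = _CONTROL_WORD.match(token)
    if match is None:
        raise ValueError("not a control word: %r" % token)
    name, number = match.group(1), match.group(2)
    has_param = number is not None
    # Assumption: 0 stands in for a missing parameter.
    return name, has_param, int(number) if has_param else 0


font_size = split_control_word(r"\fs24")
paragraph = split_control_word(r"\par")
left_indent = split_control_word(r"\li-720")
```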

OpenDestination() → None

Opens a new destination, starting a new parsing context for RTF content.

OpenGroup() → None

Opens a new group level, increasing the group nesting level.

PreviousDestination() → Pro.RTF.RTFDestination

Retrieves the previous destination from the parsing context.

Returns

The previous destination.

Return type

RTFDestination

See also current_destination.

Text(text: str, len: int) → None

Handles text content encountered during parsing.

Parameters
  • text (str) – The text content.

  • len (int) – The length of the text.

RTF_EO_Object: Final[int]

Indicates an embedded object in the RTF document.

Used in RTFObjectVisitor.visit().

RTF_EO_Picture: Final[int]

Indicates an embedded picture in the RTF document.

Used in RTFObjectVisitor.visit().