Understanding Headers and Structures

Introduction

Structures are the aggregate data type available in Cerbero Suite and Cerbero Engine. They can be used to parse files, and they can be viewed in tables and displayed in hex view layouts. Structures are represented by the Pro.Core.CFFStruct class.

Headers are the containers in which structures reside. They are databases that can be either SQLite files or XML strings. Headers are represented by the Pro.Core.CFFHeader class.

Structures are described by an XML schema, which can be written manually or, more conveniently, generated with the Header Manager tool by converting C/C++ structures. For this purpose, the Header Manager tool features a full-fledged C/C++ parser. SQLite headers can also be edited with the Header Manager tool.

This guide provides a comprehensive overview of the capabilities of structures, the C/C++ features they support, and the extent to which they support the various compiler standards.

Headers

A Pro.Core.CFFHeader represents a database in which structures are stored.

The following code example shows how to retrieve a specific structure from a header and use it.

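from Pro.Core import *

# Print a textual dump of the fields of a structure
def output(s):
    out = proTextStream()
    s.Dump(out)
    print(out.buffer)

# Object currently being inspected (e.g., a PE file)
obj = proCoreContext().currentScanProvider().getObject()
hdr = CFFHeader()
# Load the header named "WinNT"
if hdr.LoadFromFile("WinNT"):
    # Map _IMAGE_DOS_HEADER at offset 0 using a packing of 1
    s = obj.MakeStruct(hdr, "_IMAGE_DOS_HEADER", 0, CFFSO_Pack1)
    output(s)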

The output of the script is:

e_magic   : 5A4D
e_cblp    : 0090
e_cp      : 0003
e_crlc    : 0000
e_cparhdr : 0004
e_minalloc: 0000
e_maxalloc: FFFF
e_ss      : 0000
e_sp      : 00B8
e_csum    : 0000
e_ip      : 0000
e_cs      : 0000
e_lfarlc  : 0040
e_ovno    : 0000
e_res.0   : 0000
e_res.1   : 0000
e_res.2   : 0000
e_res.3   : 0000
e_oemid   : 0000
e_oeminfo : 0000
e_res2.0  : 0000
e_res2.1  : 0000
e_res2.2  : 0000
e_res2.3  : 0000
e_res2.4  : 0000
e_res2.5  : 0000
e_res2.6  : 0000
e_res2.7  : 0000
e_res2.8  : 0000
e_res2.9  : 0000
e_lfanew  : 000000F8
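For comparison outside of Cerbero, the same 64-byte layout can be decoded with Python's standard struct module. This is a minimal sketch, not Cerbero's API: the '<' prefix selects little-endian byte order with no padding, which matches CFFSO_Pack1 for this structure, and array elements get the same dotted names as in the dump above. The helpers parse_dos_header and dump are hypothetical names chosen for illustration.

```python
import struct

# Field layout of _IMAGE_DOS_HEADER as (name, element count).
# Every field is an unsigned short except e_lfanew, a 32-bit long.
DOS_FIELDS = [
    ("e_magic", 1), ("e_cblp", 1), ("e_cp", 1), ("e_crlc", 1),
    ("e_cparhdr", 1), ("e_minalloc", 1), ("e_maxalloc", 1), ("e_ss", 1),
    ("e_sp", 1), ("e_csum", 1), ("e_ip", 1), ("e_cs", 1),
    ("e_lfarlc", 1), ("e_ovno", 1), ("e_res", 4), ("e_oemid", 1),
    ("e_oeminfo", 1), ("e_res2", 10),
]

# '<' = little-endian, standard sizes, no alignment padding (like packing 1).
DOS_FORMAT = "<30Hl"

def parse_dos_header(data):
    """Hypothetical helper: decode a DOS header into dotted field names."""
    values = struct.unpack_from(DOS_FORMAT, data, 0)
    fields = {}
    i = 0
    for name, count in DOS_FIELDS:
        if count == 1:
            fields[name] = values[i]
        else:
            # Array elements are named 'name.0', 'name.1', ...
            for j in range(count):
                fields[f"{name}.{j}"] = values[i + j]
        i += count
    fields["e_lfanew"] = values[i]
    return fields

def dump(fields):
    """Print fields in a name/hex-value layout similar to the dump above."""
    width = max(len(n) for n in fields)
    for name, value in fields.items():
        digits = 8 if name == "e_lfanew" else 4
        print(f"{name:<{width}}: {value:0{digits}X}")
```

Calling dump(parse_dos_header(data)) on the first 64 bytes of the same PE file should print the values listed above.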

It is possible to specify various options when loading a structure:

CFFSO_EndiannessDefault
CFFSO_EndiannessLittle
CFFSO_EndiannessBig

CFFSO_PointerDefault
CFFSO_Pointer16
CFFSO_Pointer32
CFFSO_Pointer64

CFFSO_PackNone
CFFSO_Pack1
CFFSO_Pack2
CFFSO_Pack4
CFFSO_Pack8
CFFSO_Pack16

CFFSO_NoCompiler
CFFSO_VC
CFFSO_GCC
CFFSO_Clang

These are the same options that are available in the UI when adding a structure to a layout.

When options are not specified, the default structure options of the object are used. These can be set with the Pro.Core.CFFObject.SetDefaultStructOptions() method. The implications of the various flags are explained in the following sections.

Pro.Core.CFFHeader represents an abstract database, in the sense that it is not internally tied to a specific format. The standard format used by headers is SQLite, which is always used when creating layouts associated with structures. However, when using structures from Python, it can be handy to avoid depending on a separate header file. When only a limited number of structures is needed, they can be stored in an XML string. In fact, the internal format of structures is XML.

For example:

<r id='_IMAGE_DOS_HEADER' type='struct'>
  <f id='e_magic' type='unsigned short'/>
  <f id='e_cblp' type='unsigned short'/>
  <f id='e_cp' type='unsigned short'/>
  <f id='e_crlc' type='unsigned short'/>
  <f id='e_cparhdr' type='unsigned short'/>
  <f id='e_minalloc' type='unsigned short'/>
  <f id='e_maxalloc' type='unsigned short'/>
  <f id='e_ss' type='unsigned short'/>
  <f id='e_sp' type='unsigned short'/>
  <f id='e_csum' type='unsigned short'/>
  <f id='e_ip' type='unsigned short'/>
  <f id='e_cs' type='unsigned short'/>
  <f id='e_lfarlc' type='unsigned short'/>
  <f id='e_ovno' type='unsigned short'/>
  <f id='e_res' type='unsigned short [4]'/>
  <f id='e_oemid' type='unsigned short'/>
  <f id='e_oeminfo' type='unsigned short'/>
  <f id='e_res2' type='unsigned short [10]'/>
  <f id='e_lfanew' type='long'/>
</r>
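Because the schema is plain XML, it can also be processed with standard tools. The following sketch uses Python's xml.etree.ElementTree to list the fields of a structure and expand array types into the dotted element names used in dumps. It only models the subset of the format shown here (r elements with f children, wrapped in a header element as in the Python examples), flattens multidimensional arrays as described in the Arrays section, and the helper expand_fields is hypothetical, not part of Cerbero's API.

```python
import re
import xml.etree.ElementTree as ET
from math import prod

# Captures each '[N]' dimension in a type such as 'unsigned short [4]'.
DIM_RE = re.compile(r"\[(\d+)\]")

def expand_fields(schema, struct_id):
    """Return (name, element_type) pairs for a structure, flattening arrays."""
    root = ET.fromstring(schema)
    for rec in root.iter("r"):
        if rec.get("id") != struct_id:
            continue
        result = []
        for field in rec.findall("f"):
            name, ftype = field.get("id"), field.get("type")
            dims = [int(d) for d in DIM_RE.findall(ftype)]
            base = DIM_RE.sub("", ftype).strip()
            if not dims:
                result.append((name, base))
            else:
                # Multidimensional arrays collapse into one flat dimension.
                result.extend((f"{name}.{i}", base) for i in range(prod(dims)))
        return result
    raise KeyError(struct_id)
```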

The format of a structure stored in a header can be inspected in the Header Manager tool by double-clicking the structure in the Explore tab.

It is also possible to avoid creating a SQLite header altogether and output the schema of parsed structures directly when importing them from C++. To achieve this, ensure ‘Test mode’ is selected and ‘schemas’ is chosen as ‘Output’.

If a simple structure such as the following is imported:

struct A
{
    int a;
};

The output will be:

<r id='A' type='struct'>
  <f id='a' type='int'/>
</r>

The structure can then be used from Python with the following code:

schema = """
<header>

<r id='A' type='struct'>
  <f id='a' type='int'/>
</r>

</header>
"""

hdr = CFFHeader()
if hdr.LoadFromXml(schema):
    s = obj.MakeStruct(hdr, "A", 0)
    output(s)

Pointers

As a general guideline, if a structure includes a pointer (or a vtable pointer), it is advisable to specify the pointer size. If the pointer size is provided neither in the explicit options nor in the default structure options, it defaults to the standard pointer size of the object.
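The effect of the pointer size can be seen by decoding the same bytes twice. The sketch below uses Python's struct module, not Cerbero, and assumes a hypothetical structure { void *p; int a; } read without padding: with a 32-bit pointer 'a' starts at offset 4, with a 64-bit pointer at offset 8, so 'a' reads a different value. The helper read_ptr_struct is illustrative only.

```python
import struct

# Hypothetical structure: struct S { void *p; int a; }; read with no padding.
# 'I' is a 4-byte pointer (like CFFSO_Pointer32), 'Q' an 8-byte one (like CFFSO_Pointer64).
POINTER_CODES = {32: "I", 64: "Q"}

def read_ptr_struct(data, pointer_bits):
    fmt = "<" + POINTER_CODES[pointer_bits] + "i"
    p, a = struct.unpack_from(fmt, data, 0)
    return {"p": p, "a": a, "size": struct.calcsize(fmt)}

# First twelve bytes of the DOS header used in the examples.
data = bytes.fromhex("4d5a9000" "03000000" "04000000")

for bits in (32, 64):
    print(bits, read_ptr_struct(data, bits))
```

With 32-bit pointers 'a' reads 3, while with 64-bit pointers it reads 4, because every field after the pointer shifts by four bytes.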

Endianness

When endianness is not specified, it defaults to the endianness of the object.
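To make the effect concrete, this sketch (using Python's struct module rather than Cerbero) reads the same four bytes as a 32-bit unsigned integer in both byte orders. The little-endian result matches the 00905A4D value that appears in the dumps throughout this guide.

```python
import struct

# The first four bytes of the DOS header used in the examples.
raw = bytes.fromhex("4d5a9000")

little = struct.unpack("<I", raw)[0]  # little-endian interpretation
big = struct.unpack(">I", raw)[0]     # big-endian interpretation

print(f"little: {little:08X}")  # little: 00905A4D
print(f"big   : {big:08X}")     # big   : 4D5A9000
```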

Arrays

The first point to note is the distinction between an array of top-level structures and an array of fields. Creating a top-level array of structures is straightforward:

s = obj.MakeStructArray(hdr, "A", 0, 10)

Array support is somewhat limited. Multidimensional arrays are only partially supported, meaning that they are flattened into a single dimension. For example:

struct A
{
    int a[10][10];
};

Or as XML:

<r id='A' type='struct'>
  <f id='a' type='int [10][10]'/>
</r>

This is converted to:

a.0 : 00905A4D
a.1 : 00000003
a.2 : 00000004
a.3 : 0000FFFF
a.4 : 000000B8
a.5 : 00000000
a.6 : 00000040
a.7 : 00000000
a.8 : 00000000
a.9 : 00000000
a.10: 00000000
a.11: 00000000
a.12: 00000000

; etc.

To access an array element in a Pro.Core.CFFStruct, use the syntax ‘a.15’ rather than ‘a[15]’. For example:

print(s.Str("a.15"))
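Since a multidimensional array is flattened, an element such as a[1][5] has to be addressed by its flat index. The helper below is a hypothetical sketch, not part of Cerbero's API, and it assumes that the flattening follows C's row-major order (the last index varies fastest), which is how C lays out such arrays in memory.

```python
def flat_field_name(name, dims, indices):
    """Map C-style indices (e.g. a[1][5] in 'int a[10][10]') to a dotted name.

    Assumes row-major (C) order, in which the last index varies fastest.
    """
    if len(dims) != len(indices):
        raise ValueError("one index per dimension is required")
    flat = 0
    for dim, idx in zip(dims, indices):
        if not 0 <= idx < dim:
            raise IndexError(f"index {idx} out of range for dimension {dim}")
        flat = flat * dim + idx
    return f"{name}.{flat}"
```

For example, flat_field_name("a", [10, 10], [1, 5]) returns "a.15", the name passed to s.Str() above.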

Sub-Structures

The key point to note about sub-structures is that complex sub-types are always emitted as separate types in the schema. For example:

struct A
{
    int a;
    struct SUB
    {
        int sub;
    } b;
};

As XML:

<r id='A::SUB' type='struct'>
  <f id='sub' type='int'/>
</r>

<r id='A' type='struct'>
  <f id='a' type='int'/>
  <f id='b' type='struct A::SUB'/>
</r>

The Python code:

schema = """
<header>

<r id='A::SUB' type='struct'>
  <f id='sub' type='int'/>
</r>

<r id='A' type='struct'>
  <f id='a' type='int'/>
  <f id='b' type='struct A::SUB'/>
</r>

</header>
"""
hdr = CFFHeader()
if hdr.LoadFromXml(schema):
    s = obj.MakeStruct(hdr, "A", 0)
    output(s)

The output:

a    : 00905A4D
b.sub: 00000003

Since it is a separate type, ‘A::SUB’ can also be used without its parent.

Note

For clarity, the complete Python code was provided once again. In subsequent examples, it won’t be repeated as only the header string changes, not the Python code.
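As a rough analogy outside of Cerbero, Python's ctypes module models the same layout: the nested type is declared separately, reused as the type of member b, and its member is reached through a dotted path. This is only an illustration; Cerbero's structures are not based on ctypes. LittleEndianStructure is used so that the byte order does not depend on the host.

```python
import ctypes

# Mirrors 'struct A { int a; struct SUB { int sub; } b; };' from the example.
class SUB(ctypes.LittleEndianStructure):
    _fields_ = [("sub", ctypes.c_int32)]

class A(ctypes.LittleEndianStructure):
    _fields_ = [("a", ctypes.c_int32), ("b", SUB)]

data = bytes.fromhex("4d5a9000" "03000000")
s = A.from_buffer_copy(data)
print(f"a    : {s.a:08X}")      # a    : 00905A4D
print(f"b.sub: {s.b.sub:08X}")  # b.sub: 00000003

# Like 'A::SUB', the nested type can also be used on its own.
standalone = SUB.from_buffer_copy(data)
```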

Unions

Unions, like sub-structures, are fully supported. However, special care is needed with a top-level union, that is, a union not contained within another structure, such as the following:

union A
{
    int a;
    short b;
};

To access the members of such a union, a ‘u.’ prefix must be added. This is because Pro.Core.CFFStruct only supports unions as members, so the union shown above results in a Pro.Core.CFFStruct with a union member named ‘u’.

u.a: 00905A4D
u.b: 5A4D

Anonymous Types

Anonymous types are only partially supported in that they are assigned a name upon import. Consider a type like the following:

struct A
{
    union
    {
        int a;
        int b;
    } u;
};

This produces the following XML:

<r id='A::_Union_0' type='union'>
  <f id='a' type='int'/>
  <f id='b' type='int'/>
</r>

<r id='A' type='struct'>
  <f id='u' type='union A::_Union_0'/>
</r>

Anonymous types are renamed using a ‘Type’ + number naming convention, as in ‘_Union_0’ above. The leading underscore (‘_’) is the default prefix for anonymous types. If a typedef is associated with an anonymous type, the new name for that type is created by combining the anonymous prefix with the typedef name.

Bit-Fields

Bit-fields are fully supported.

struct A
{
    int a : 1;
    int b : 4;
};

XML:

<r id='A' type='struct'>
  <f id='a' type='int' bits='1'/>
  <f id='b' type='int' bits='4'/>
</r>

Output:

a: 01
b: 06
 : 0482D2

The unnamed field at the end accounts for the bits of the underlying field that remain unused. In this case, the type is ‘int’ (32 bits) and only 5 of its bits are used, so the remaining 27 bits are shown as a separate, unnamed field.
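The values in the output above can be reproduced by hand. This sketch, using Python's struct module and a hypothetical helper split_bitfields, reads the 'int' storage unit from the sample bytes and extracts consecutive bit-fields starting from the least significant bit, which is how GCC, Clang, and Visual C++ allocate bit-fields on little-endian x86 targets.

```python
import struct

def split_bitfields(value, widths, total_bits):
    """Split an integer into consecutive bit-fields, starting from bit 0.

    Returns the field values and the remaining unused high bits.
    """
    fields = []
    shift = 0
    for width in widths:
        fields.append((value >> shift) & ((1 << width) - 1))
        shift += width
    remainder = (value >> shift) & ((1 << (total_bits - shift)) - 1)
    return fields, remainder

# The 32-bit 'int' storage unit read from the sample data (little-endian).
unit = struct.unpack("<I", bytes.fromhex("4d5a9000"))[0]
(a, b), rest = split_bitfields(unit, [1, 4], 32)
print(f"a: {a:02X}")      # a: 01
print(f"b: {b:02X}")      # b: 06
print(f" : {rest:06X}")   #  : 0482D2
```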

Hint

There are significant differences in how compilers handle bit-fields: Visual C++ behaves differently from GCC/Clang. Some of the differences are summarized in a message by Richard W.M. Jones.

An important difference is how bit-fields are coalesced when the declared type changes. For example:

struct A
{
    int a : 1;
    short b : 1;
    int c : 1;
};

Without delving into the specifics of how they are coalesced, it is important to note that all these cases are handled. However, the compiler must be specified to obtain the correct result.
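To illustrate why the compiler matters, the sketch below implements two simplified allocation rules: a GCC/Clang-style rule, where a bit-field goes at the next free bit unless it would cross a boundary of its declared type, and a Visual C++-style rule, where a new storage unit starts whenever the declared type size changes or the current unit is full. These are approximations written for this guide, not Cerbero's implementation; they ignore zero-width bit-fields, packing, and non-bit-field members, and assume each type's alignment equals its size.

```python
def align_up(value, alignment):
    return (value + alignment - 1) // alignment * alignment

def layout_gcc(fields):
    """fields: list of (name, type_size_in_bytes, bit_width).
    Returns ({name: absolute_bit_offset}, struct_size_in_bytes)."""
    pos, max_align, offsets = 0, 1, {}
    for name, size, width in fields:
        unit_bits = size * 8
        # Skip to the next unit only if the field would straddle a unit boundary.
        if pos // unit_bits != (pos + width - 1) // unit_bits:
            pos = align_up(pos, unit_bits)
        offsets[name] = pos
        pos += width
        max_align = max(max_align, size)
    return offsets, align_up((pos + 7) // 8, max_align)

def layout_msvc(fields):
    """Same inputs and outputs as layout_gcc, using Visual C++-style rules."""
    end, max_align, offsets = 0, 1, {}
    unit_start = unit_size = used = None
    for name, size, width in fields:
        if unit_size != size or used + width > size * 8:
            # Close the current unit (if any) and open a new, aligned one.
            if unit_size is not None:
                end = unit_start + unit_size
            unit_start, unit_size, used = align_up(end, size), size, 0
        offsets[name] = unit_start * 8 + used
        used += width
        max_align = max(max_align, size)
    if unit_size is not None:
        end = unit_start + unit_size
    return offsets, align_up(end, max_align)

# struct A { int a : 1; short b : 1; int c : 1; };
MIXED = [("a", 4, 1), ("b", 2, 1), ("c", 4, 1)]
print(layout_gcc(MIXED))
print(layout_msvc(MIXED))
```

Under these models, the GCC-style rule packs a, b, and c into bits 0, 1, and 2 of a 4-byte structure, while the Visual C++-style rule gives each field its own storage unit and yields 12 bytes.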

Namespaces

Namespaces are fully supported.

namespace N
{

struct A
{
    int a;
};

}

This results in:

<r id='N::A' type='struct'>
  <f id='a' type='int'/>
</r>

Furthermore, just as in C++, it is possible to use namespaces to encapsulate ‘#include’ directives.

namespace N
{

#include <Something>

}

This will prefix all types declared in ‘Something’ with the namespace ‘N::’. This approach is useful for including types with the same name in the same header file.

Inheritance

Inheritance is fully supported.

struct A
{
    int a;
};

struct B : public A
{
    int b;
};

XML:

<r id='A' type='struct'>
  <f id='a' type='int'/>
</r>

<r id='B' type='struct'>
  <b>
    <b type='struct A' access='public'/>
  </b>
  <f id='b' type='int'/>
</r>

Output:

a: 00905A4D
b: 00000003
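Python's ctypes follows the same ordering for single inheritance, which makes it a convenient way to sanity-check the expected layout: the fields of the base come first, followed by those of the derived structure. This is an analogy only and covers just the single-inheritance case; the class names mirror the example above.

```python
import ctypes

# Base structure: struct A { int a; };
class A(ctypes.LittleEndianStructure):
    _fields_ = [("a", ctypes.c_int32)]

# Derived structure: struct B : public A { int b; };
# ctypes places a subclass's _fields_ after the fields inherited from its base.
class B(A):
    _fields_ = [("b", ctypes.c_int32)]

s = B.from_buffer_copy(bytes.fromhex("4d5a9000" "03000000"))
print(f"a: {s.a:08X}")  # a: 00905A4D
print(f"b: {s.b:08X}")  # b: 00000003
```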

Similarly, with multiple inheritance:

<r id='A' type='struct'>
  <f id='a' type='int'/>
</r>

<r id='B' type='struct'>
  <f id='b' type='int'/>
</r>

<r id='C' type='struct'>
  <b>
    <b type='struct A' access='public'/>
    <b type='struct B' access='public'/>
  </b>
  <f id='c' type='int'/>
</r>

Output:

a: 00905A4D
b: 00000003
c: 00000004

VTables

The presence of virtual table pointers in structures that require them is fully supported. For example:

struct A
{
    virtual void v() { }
    int a;
};

XML:

<r id='A' type='struct'>
  <mv id='v' type='void (void)'/>
  <f id='a' type='int'/>
  <m id='operator=' type='struct A &amp;(const struct A &amp;)'/>
  <m id='operator=' type='struct A &amp;(struct A &amp;&amp;)'/>
  <m id='~A' type='void (void)'/>
</r>

Output:

__vtable_ptr_0: 00905A4D
a             : 00000003
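The hidden pointer can be made explicit to check the offsets in the output above. This ctypes sketch assumes 32-bit pointers, which is consistent with ‘a’ following at offset 4; for a class without polymorphic bases, both VC++ and GCC/Clang place the vtable pointer at offset 0. The class name A_Layout32 is hypothetical.

```python
import ctypes

# Memory layout of 'struct A { virtual void v(); int a; }' with 32-bit pointers:
# the compiler-generated vtable pointer comes first, then the declared member.
class A_Layout32(ctypes.LittleEndianStructure):
    _fields_ = [
        ("__vtable_ptr_0", ctypes.c_uint32),  # hidden pointer to the vtable
        ("a", ctypes.c_int32),
    ]

s = A_Layout32.from_buffer_copy(bytes.fromhex("4d5a9000" "03000000"))
print(f"__vtable_ptr_0: {getattr(s, '__vtable_ptr_0'):08X}")  # 00905A4D
print(f"a             : {s.a:08X}")                            # 00000003
```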

Here is an example with multiple inheritance:

struct A
{
    virtual void va() { }
    int a;
};

struct B
{
    virtual void vb() { }
    int b;
};

struct C : public A, public B
{
    int c;
};

Output:

__vtable_ptr_0: 00905A4D
__vtable_ptr_1: 00000003
a             : 00000004
b             : 0000FFFF
c             : 000000B8

Important

When dealing with virtual tables, specifying the compiler is crucial, as there can be significant differences between VC++ and GCC/Clang.

Virtual Inheritance

Virtual inheritance is fully supported. It is a C++ feature designed for scenarios involving multiple inheritance with a common base class.

Here is a more complex case:

struct A
{
    int a;
    virtual void va() {}
};

struct B : public virtual A
{
    virtual void vb() {}
};

struct B2
{
    virtual void vb2() {}
};

struct C : public virtual A, public B
{
    int b;
    virtual void vc() {}
};

struct TOP
{
    int top;
    C c;
    virtual void vtop() {}
};

Output (Visual C++):

__vtable_ptr_0  : 00905A4D
top             : 00000003
c.__vtable_ptr_0: 00000004
c.__vtable_ptr_1: 0000FFFF
c.__vtable_ptr_2: 000000B8
c.b             : 00000000
c.a             : 00000040

Output (GCC):

__vtable_ptr_0  : 00905A4D
top             : 00000003
c.__vtable_ptr_0: 00000004
c.b             : 0000FFFF
c.a             : 000000B8

As shown, the layout differs between Visual C++ and GCC. Note also that members of virtual base classes are appended at the end.

Hint

For more information, see Igor Skochinsky’s excellent presentation on C++ decompilation.

Field Alignment

Field alignment plays a crucial role. Structures that are not subject to a packing constraint are aligned according to their largest native member. Sub-structures complicate matters, as they influence the alignment of their parent structures but not the other way around. Despite these internal complexities, Cerbero’s structures are designed to handle all such cases correctly.
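The basic rules can be sketched as a small layout calculator. It is a simplified model of C struct layout written for this guide, not Cerbero's code: each field is aligned to its natural alignment, the structure takes the alignment of its most-aligned member, and the total size is rounded up to that alignment. A sub-structure is laid out first and then treated as a single field, which is why it affects its parent's alignment but not the reverse. The example assumes an 8-byte-aligned double, as on x86-64; bit-fields and virtual bases are not modeled.

```python
def align_up(value, alignment):
    return (value + alignment - 1) // alignment * alignment

def layout(fields):
    """Compute a C-style layout without packing constraints.

    fields: list of (name, size, alignment) tuples; a nested structure is
    passed as a field whose size and alignment come from a previous call.
    Returns (offsets, total_size, struct_alignment).
    """
    offset, struct_align, offsets = 0, 1, {}
    for name, size, align in fields:
        offset = align_up(offset, align)
        offsets[name] = offset
        offset += size
        struct_align = max(struct_align, align)
    return offsets, align_up(offset, struct_align), struct_align

# struct SUB { char c; double d; };  (double assumed 8 bytes, 8-aligned)
sub_offsets, sub_size, sub_align = layout([("c", 1, 1), ("d", 8, 8)])
# struct TOP { char x; struct SUB s; };  SUB's alignment pushes 's' to offset 8.
top_offsets, top_size, top_align = layout([("x", 1, 1), ("s", sub_size, sub_align)])
print(sub_offsets, sub_size, sub_align)
print(top_offsets, top_size, top_align)
```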

Packing

When a packing constraint is applied, each field is aligned to the smaller of its natural alignment and the packing value. A packing of 1 is crucial for reading raw data without any padding between fields. For example, the PE structures in ‘WinNT.h’ are all declared with a packing of 1 via #pragma pack, so the same packing must be specified when using these structures.
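Reading data with the wrong packing silently produces wrong values. In this sketch, based on Python's struct module rather than Cerbero, a hypothetical structure { unsigned short x; unsigned int y; } is decoded from the same bytes with a packing of 1 and without a packing constraint; the helper offset_of_y applies the "smaller of natural alignment and packing value" rule to compute where 'y' starts.

```python
import struct

def offset_of_y(pack):
    # 'x' occupies bytes 0-1; 'y' has a natural alignment of 4.
    align = min(4, pack)
    return (2 + align - 1) // align * align

# Hypothetical structure: struct S { unsigned short x; unsigned int y; };
data = bytes.fromhex("4d5a9000" "03000000")

# Packing 1: no padding, so 'y' starts right after 'x', at offset 2.
x1, y1 = struct.unpack_from("<HI", data)
# No packing constraint: 'y' is aligned to 4, so two padding bytes ('2x') follow 'x'.
x4, y4 = struct.unpack_from("<H2xI", data)

print([offset_of_y(p) for p in (1, 2, 4, 8)])
print(f"pack 1 : x={x1:04X} y={y1:08X}")
print(f"natural: x={x4:04X} y={y4:08X}")
```

Only one of the two readings matches the way the data was actually written, which is why the packing used by the original declarations must be specified.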

Templates

Templates are supported as well. For example:

template <typename T>
struct A
{
    T a;
};

template <typename T>
struct B
{
    T b;
};

XML:

<r id='A' type='struct' tparams='T'>
  <f id='a' type='T'/>
</r>

<r id='B' type='struct' tparams='T'>
  <f id='b' type='T'/>
</r>

Template arguments can be specified using C++ syntax:

s = obj.MakeStruct(hdr, "B<A<int>>", 0)

Output:

b.a: 00905A4D

This shows that nested templates are also supported.
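The substitution performed for "B<A<int>>" can be modeled in a few lines. The sketch below is a hypothetical model written for this guide, not Cerbero's implementation: it parses a C++-style template name, binds each template parameter (as listed in tparams) to its argument, and recurses into arguments that are themselves templates, producing the dotted field name seen in the output. Only type parameters are handled; non-type parameters, default arguments, and specializations are ignored.

```python
# Simplified view of the schema above: name -> (tparams, [(field, type)]).
TEMPLATES = {
    "A": (["T"], [("a", "T")]),
    "B": (["T"], [("b", "T")]),
}

def parse_type(text):
    """Parse 'B<A<int>>' into ('B', [('A', [('int', [])])])."""
    text = text.strip()
    lt = text.find("<")
    if lt == -1:
        return (text, [])
    if not text.endswith(">"):
        raise ValueError(f"malformed template type: {text}")
    name, inner = text[:lt].strip(), text[lt + 1:-1]
    args, depth, start = [], 0, 0
    for i, ch in enumerate(inner):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            args.append(parse_type(inner[start:i]))
            start = i + 1
    args.append(parse_type(inner[start:]))
    return (name, args)

def instantiate(node, templates, prefix=""):
    """Flatten a template instance into (dotted_field_name, primitive_type) pairs."""
    name, args = node
    params, fields = templates[name]
    if len(params) != len(args):
        raise ValueError(f"{name} expects {len(params)} template argument(s)")
    binding = dict(zip(params, args))
    result = []
    for field_name, field_type in fields:
        path = prefix + field_name
        # Substitute a template parameter, or parse the declared type as-is.
        arg = binding[field_type] if field_type in binding else parse_type(field_type)
        if arg[0] in templates:
            result.extend(instantiate(arg, templates, path + "."))
        else:
            result.append((path, arg[0]))
    return result

print(instantiate(parse_type("B<A<int>>"), TEMPLATES))  # [('b.a', 'int')]
```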