001016.1 A R. Brender Representation Interludes (aka trampolines)

Motivation

C++ implementations sometimes use small compiler generated functions,
here called "interludes", that serve as surrogate method functions when a
class is inherited into another class. The sole purpose of the interlude
function is to adjust the value of the implicit 'this' pointer parameter and
then pass control to the real method function in the inherited class.
(Interludes are sometimes called "trampolines" or "thunks".)
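
As a rough illustration of where such an adjustment comes from, consider the following minimal sketch. The class names and helper functions are hypothetical and chosen only for this example. Whether a given compiler emits a separate interlude function for the call shown, or achieves the adjustment some other way, is an implementation detail and is assumed here rather than guaranteed.

```cpp
#include <cassert>

// Hypothetical classes, for illustration only.
struct Left {
    virtual ~Left() = default;
    int left_data = 1;
};

struct Right {
    virtual ~Right() = default;
    virtual const void* self() const { return this; }
    int right_data = 2;
};

// Derived overrides a virtual function introduced by its second base.
struct Derived : Left, Right {
    const void* self() const override { return this; }
};

// Calls self() through a Right reference and reports the 'this' value
// that the selected member function actually observed.
inline const void* this_seen_by_callee(const Right& r) { return r.self(); }

// On common ABIs the Right subobject sits at a nonzero offset inside
// Derived, so a pointer to it differs from a pointer to the whole object.
inline bool right_subobject_is_displaced(const Derived& d) {
    const void* whole = &d;
    const void* sub = static_cast<const Right*>(&d);
    return whole != sub;
}
```

When self() is called through a Right reference bound to a Derived object, the caller holds the address of the Right subobject, yet Derived::self observes the address of the whole object. Something between the call site and the body of Derived::self must therefore have adjusted 'this'; that something is what this proposal calls an interlude.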

When a debugger is asked to set a breakpoint on a method function that happens
to be implemented by an interlude, the breakpoint name may resolve to the
entry point of that derived function (interlude). Similarly, if the user is
single stepping into a derived method, control may well step into the interlude.
In either case, since the interlude is an artifact of the inheritance rather
than a distinct user visible member function in the inheriting class, it is
desirable for the debugger to set the breakpoint in, or step into, the
ultimate real method rather than the interlude itself. For breakpoints, this
ensures that the breakpoint will trigger regardless of whether the original
or the derived function is called.

The following provides a means to accomplish this.


PROPOSAL
--------

Add the following to DWARF in a new Section 3.3.9:

    "3.3.9 Interludes

    "<i> An interlude is a compiler generated member function of a class
    whose purpose is to adjust the implicit 'this' pointer and then call a
    corresponding member function from another inherited class.</i>

    "An interlude is represented by a debugging information entry with the
    tag DW_TAG_subprogram or DW_TAG_inlined_subprogram that has a
    DW_AT_interlude attribute. The value of that attribute indicates the
    corresponding member function of the inherited class. (An interlude
    entry may but need not also have a DW_AT_artificial attribute.) The value
    may be either of class reference or class address. If class reference is
    used, it refers to the debugging information entry for the declaration, if
    available, otherwise the definition, of the inherited member function.
    If class address is used, it specifies the value of the entry PC for the
    generated code of the inherited member function. In either case, the
    inherited member function may itself be an interlude. (Such a sequence
    of interlude functions necessarily ends with a non-interlude function.)

    "<i> A reference can always be used if the inherited member function is
    defined as part of the current compilation unit. An address can always
    be used if the inherited member function is defined outside of the
    current compilation unit. (An address can even be used when the inherited
    member function is defined in a compilation unit that does not have
    DWARF debugging information.) </i>"


Note: For the purposes of this proposal, any of the terms "interlude",
"trampoline" and "thunk" may be considered equivalent. In some email exchanges,
it appears that "trampoline" may be preferable to many. I have no problem
with such a change of name.


DISCUSSION
----------

There are two parts to achieving the goals mentioned above:

1) Identify that a method function is an interlude
2) Given an interlude, determine the corresponding member function in
    the inherited class from which it is derived

Identification

There seem to be several ways to make this identification:

1) The interlude-ness may be reflected in the mangled name

2) The generated code for the method might be examined to determine whether
    it "looks" like an interlude.

    Is it possible for an optimizing compiler to transform an explicit member
    function so that it cannot be distinguished from an interlude based only
    on the generated code? I think the answer is yes. Consider, for example,
    a member function that does nothing more than call the "corresponding"
    member function of a class that it inherits. As a result, depending
    only on examination of the generated code can lead to a false positive
    identification.

3) The DW_AT_artificial attribute might be used as a hint that a function
    is an interlude. Since artificial functions might be generated for
    various purposes, this hint needs some kind of confirming action
    such as checking the generated code to see if it "looks" like that
    of an interlude.

    Is it possible for a compiler generated member function that is not
    an interlude to look like one? This seems pretty unlikely but I am
    reluctant to claim it is impossible. If it is possible, then even the
    combination of the DW_AT_artificial attribute and generated code
    examination could lead to a false positive.

4) We might define a more explicit DW_AT_interlude attribute that would
    make this identification simple and unambiguous.
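
The false positive described under 2) can be pictured with a minimal sketch such as the following. The class names are hypothetical, and the claim that an optimizer would reduce the override to a 'this' adjustment followed by a jump is an assumption about typical code generation, not something the language guarantees.

```cpp
#include <cassert>

// Hypothetical classes, for illustration only.
struct Base {
    virtual ~Base() = default;
    virtual int area() const { return 42; }
};

struct Mixin {
    virtual ~Mixin() = default;
    int tag = 7;
};

// An explicit, user-written override that does nothing more than call
// the corresponding member function of an inherited class. After
// optimization its body may amount to "adjust 'this', then transfer to
// Base::area", which is just what an interlude looks like.
struct Forwarding : Mixin, Base {
    int area() const override { return Base::area(); }
};
```

A user who sets a breakpoint on Forwarding::area expects to stop there, since it is a real member function in the source. A debugger that classified it as an interlude from the generated code alone would skip over it.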

If interludes can be inlined by a compiler, so that the 'this' pointer
adjustment occurs directly as part of the calling function, then no
technique that depends even in part on examination of generated code is
likely to be both reliable and simple enough to be practical. That appears
to leave only 1) and 4) as viable approaches.

    Note: I assume a debugger that does have good support for inlining.
    That is, it is not the mere occurrence of inlining in itself that is
    significant, but rather that, even with good inlining support,
    examination of the generated code is untenable.

"Un-derivation"

Suppose that the appropriate interlude has been identified and confirmed
by some mechanism, and next consider how best to work back to find the
member function from which it derives.

1) If the interlude is identified on the basis of its mangled name, could
    the function from which it derives also be determined from the name?

    This is possible, but probably not attractive.

    - Such names will tend to be long (perhaps double the length
      they would otherwise have?)
    - The extra information is only relevant to debugging. Implementations
      are likely hesitant to modify the mangling rules for debugging
      purposes if there is a viable alternative that is available as
      part of the debugging information itself (here, DWARF).

2) If the original method function is defined in the same compilation unit
    as the derived one, then a debugger can probably start at the interlude
    and, using a combination of knowledge of the name mangling scheme and
    the DWARF representation, work backward to identify the original
    method function. (If the method is overloaded then the algorithm may be
    non-trivial but is still quite doable.)

    If the original method function is defined in some other compilation
    unit and the DWARF information for the class declaration is less than
    complete (for example, because the implementation is using a space
    optimization technique which attempts to describe a complete class only
    once) this becomes rather harder.

3) If there is a non-inlined (closed form) version of the interlude, then
    it is probably possible to interpret that code to identify the address
    of the target member function that it invokes.

    In simple cases, this is viable. If the interlude can be
    inlined into the caller, this starts to become hard if not impossible.
    And if it is possible for the original member function to be inlined
    into the interlude (does that make it not an interlude? it is surely
    still artificial...), then the mind boggles.

4) We might define a more explicit DW_AT_interlude attribute that would
    make this relationship simple and unambiguous.

Here, perhaps even more than in the earlier step, we see that inlining and
other compiler optimizations either complicate or eliminate approaches that
involve debugger interrogation of the generated code.


The DW_AT_interlude proposal

Since all of the other approaches have problems of one kind or another,
I was led to offer the proposal given above.

With this formulation, both the identification of a member function as
being an interlude and the member function that is inherited are
explicit and simple to determine. There is no need to analyze or derive
any information from the generated code.
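
To suggest how simple the debugger side could be, here is a minimal sketch of following a sequence of interludes to the real function. It uses a toy in-memory model, not any real DWARF reader API; all type and function names (Die, DebugInfo, resolve_breakpoint and so on) are hypothetical. The two alternatives of InterludeTarget stand in for the "reference" and "address" classes of the proposed attribute.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Toy model only; every name here is hypothetical.
struct DieRef  { std::size_t index; };    // attribute of class reference
struct EntryPc { std::uint64_t value; };  // attribute of class address
using InterludeTarget = std::variant<DieRef, EntryPc>;

struct Die {
    std::string name;
    std::uint64_t entry_pc;
    std::optional<InterludeTarget> interlude;  // DW_AT_interlude, if present
};

struct DebugInfo {
    std::vector<Die> dies;
    std::map<std::uint64_t, std::size_t> by_entry_pc;
    void add(Die d) {
        by_entry_pc[d.entry_pc] = dies.size();
        dies.push_back(std::move(d));
    }
};

// Returns the entry PC at which a breakpoint requested on dies[start]
// should actually be placed.
std::uint64_t resolve_breakpoint(const DebugInfo& info, std::size_t start) {
    std::size_t cur = start;
    // A well-formed chain ends at a non-interlude; bounding the loop by
    // the number of entries guards against malformed, cyclic input.
    for (std::size_t steps = 0; steps <= info.dies.size(); ++steps) {
        const Die& d = info.dies[cur];
        if (!d.interlude) return d.entry_pc;  // the real function
        if (const DieRef* ref = std::get_if<DieRef>(&*d.interlude)) {
            if (ref->index >= info.dies.size()) return d.entry_pc;
            cur = ref->index;
            continue;
        }
        const std::uint64_t target = std::get<EntryPc>(*d.interlude).value;
        auto it = info.by_entry_pc.find(target);
        // The target may have no debugging information at all; the
        // address is then the best answer available.
        if (it == info.by_entry_pc.end()) return target;
        cur = it->second;
    }
    return info.dies[start].entry_pc;  // malformed chain: fall back
}
```

Nothing in this sketch inspects generated code; the attribute value alone drives the lookup, whichever class it has.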

The only remaining question should be whether the DWARF support for
inlining, the debugger support for inlining, or both, are sufficient to
handle the full complexity that might result. While I cannot speak from
experience, I do suggest that any weakness in this regard can and should be
considered an inlining problem as such rather than a problem with interludes
or a reason to not define/use interludes.

Aside: besides the inherited function, the other key property of an interlude
is the amount of the 'this' pointer adjustment. While this could be included in
the interlude representation, I don't know of any particular purpose to which
this information could be usefully put by a debugger...


The proposal was accepted with the stipulation that the new attribute
be named DW_AT_trampoline. Additionally, it was stipulated that the
form of the attribute value can be any of at least the following:

1) A string: an actual function name. The function name lookup is
    implementation-dependent and could reference an ABI, implementation
    or object specific table, such as an ELF symbol table.

2) An address of the function that will be called.

3) A reference to a DIE: the function definition DIE of the function
    that is being called.

4) A flag (when there is no way to know the function address or name).
    When it is just a flag, the debugger must step (or equivalent) to get
    to the target function.
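
One way a debugger might dispatch on these forms is sketched below. This is only an illustration under stated assumptions: the types, the Action enumeration and plan_for are all hypothetical names, and the sketch records which strategy applies rather than performing any real symbol lookup or stepping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical model of the accepted DW_AT_trampoline value forms.
struct TargetName    { std::string symbol; };     // string: function name
struct TargetAddress { std::uint64_t pc; };       // address of the target
struct TargetDie     { std::size_t die_offset; }; // reference to a DIE
struct TargetUnknown {};                          // flag: target not known

using Trampoline =
    std::variant<TargetName, TargetAddress, TargetDie, TargetUnknown>;

enum class Action {
    LookUpSymbol,        // consult e.g. the object file symbol table
    BreakAtAddress,      // the target entry point is given directly
    FollowDieReference,  // read the target from its definition DIE
    StepThrough          // single step until the target is reached
};

// Chooses a strategy based solely on the form of the attribute value.
Action plan_for(const Trampoline& t) {
    return std::visit([](const auto& v) {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<T, TargetName>)
            return Action::LookUpSymbol;
        else if constexpr (std::is_same_v<T, TargetAddress>)
            return Action::BreakAtAddress;
        else if constexpr (std::is_same_v<T, TargetDie>)
            return Action::FollowDieReference;
        else
            return Action::StepThrough;
    }, t);
}
```

Only the flag form forces the debugger to step through the trampoline; the other three forms let it locate the target before the program runs.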