000914.1 A Brender Representation of Discontinuous Ranges

This proposal replaces 000531.1, which was discussed by the Committee on
27 June 2000. In reviewing that proposal, the group requested an alternative
approach that

    - Did not require use of LEB representation
    - Would work for subprograms whose code might be split among
      multiple program sections

I originally intended to offer two proposals, with the intent that both be
adopted. The first was to be kept as simple as possible yet still be
sufficient to cover all cases of discontiguous scopes. The second
was to be more complicated (more like the earlier 000531.1), but designed to
be as compact as possible. The reason for two was to allow a compiler
vendor to choose whichever better suited its design goals and constraints.

Because the committee is trying to wrap up its work on this revision,
and because of my own time constraints, I am presenting just the simpler
of the two at this time.


META CONTEXT/ASIDE:

    I think the following four presentational capabilities provide a core
    set that allows sophisticated compilers and debuggers to support a
    significant and highly useful level of debugging for optimized code:

      - Represent split lifetimes for variables
      - Represent subprogram inlining
      - Represent discontiguous scopes
      - Represent semantic breakpoint locations

    The first two are already included in DWARF V2. The third is the
    subject of this proposal. The fourth will (hopefully) be the subject
    of a future (real soon now) proposal.

    If we can complete this set, then I think we can claim a significant
    improvement in the ability of DWARF to allow support for debugging
    optimized code.

END ASIDE


GENERAL DISCUSSION
------------------

In the presence of optimization (and sometimes even for non-optimized code),
it is desirable to be able to describe a scope that consists of a set of
address ranges rather than just a single range.
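    To make "a set of address ranges" concrete from a consumer's point of
    view, here is a minimal sketch that tests whether a PC lies in a scope
    described by several half-open ranges [begin, end). The type
    addr_range and the function scope_contains_pc are hypothetical names,
    not part of DWARF, and no ordering of the ranges is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one half-open address range [begin, end). */
typedef struct {
    uint64_t begin;
    uint64_t end;
} addr_range;

/* Return true if pc lies in any of the n ranges. The ranges need not be
   sorted, so this is a plain linear scan. */
static bool scope_contains_pc(const addr_range *ranges, size_t n, uint64_t pc)
{
    assert(ranges != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pc >= ranges[i].begin && pc < ranges[i].end)
            return true;
    }
    return false;
}
```

    With a single range this degenerates to the familiar
    DW_AT_low_pc <= pc < DW_AT_high_pc test.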

Let us consider possible representation approaches from the inside out.

1) Address range description as such

    For consistency and simplicity, I looked for precedents that could serve
    as models. There appear to be (just) two precedents for representing a
    sequence of address ranges:

      - a location list, found in the .debug_loc section
      - the tuples for the addresses of a compilation unit, found in the
        .debug_aranges section

    A location list entry is

      - a starting address offset (relative to start of compilation unit)
      - an ending address offset (relative to start of compilation unit)
      - a block containing a location expression

    Since the two addresses are offsets, they really are just constants
    and require no associated relocation. (In effect, relocation needs to be
    performed by the debugger when it uses this information.) Both addresses
    occupy the size of an address on the target system.

    A tuple for the addresses of a compilation unit is

      - an address
      - a length

    Since the address is really an address, there must be an associated
    relocation. (No relocation is required by a debugger prior to use of the
    information.) The address and length both occupy the size of an address
    on the target system.

    Both kinds of lists are terminated by a pair of zero values.
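    The two models differ only in how a range is expressed. The sketch
    below, offered as an illustration under assumed names (aranges_tuple,
    offset_pair, cu_base are all hypothetical), converts an
    .debug_aranges-style (address, length) tuple into a location-list-style
    pair of offsets from a compilation unit base, and shows the addition a
    debugger performs to turn an offset back into an address.

```c
#include <assert.h>
#include <stdint.h>

/* .debug_aranges-style tuple: a relocated address and a length. */
typedef struct {
    uint64_t address;
    uint64_t length;
} aranges_tuple;

/* Location-list-style pair: offsets relative to a compilation unit base;
   end_off is the first offset past the range. */
typedef struct {
    uint64_t begin_off;
    uint64_t end_off;
} offset_pair;

/* Producer side: express a tuple as a pair of offsets from cu_base. */
static offset_pair tuple_to_offsets(aranges_tuple t, uint64_t cu_base)
{
    assert(t.address >= cu_base);
    offset_pair p;
    p.begin_off = t.address - cu_base;
    p.end_off = p.begin_off + t.length;
    return p;
}

/* Debugger side: the "relocation" is just adding the base in effect. */
static uint64_t offset_to_address(uint64_t off, uint64_t cu_base)
{
    return cu_base + off;
}
```

    Because the offsets do not change when the unit is moved, only the
    base needs to be known at debug time, which is the property that makes
    the location list model attractive here.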

    Since no relocation is involved, the location list model is more
    attractive. However, a location list also includes a location expression
    which is not needed for scopes.

    So, let us define a "scope list entry" as

      - a starting address offset (relative to start of compilation unit)
      - an ending address offset (relative to start of compilation unit)

    and a "scope list" is

        a sequence of scope list entries terminated by a pair of
        (address-sized) zero values.
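    A consumer reading a scope list might look like the following minimal
    sketch. It assumes 4-byte addresses stored little-endian (in practice
    the address size and byte order come from the target); scope_entry and
    decode_scope_list are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scope list entry: offsets relative to the unit base. */
typedef struct {
    uint32_t begin;
    uint32_t end;
} scope_entry;

/* Read a 4-byte little-endian value (assumed encoding). */
static uint32_t read_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the scope list that starts at buf[offset], storing at most max
   entries in out. Returns the number of entries before the (0, 0)
   terminator, or -1 if the buffer ends first or out is too small. */
static long decode_scope_list(const unsigned char *buf, size_t len,
                              size_t offset, scope_entry *out, size_t max)
{
    size_t n = 0;
    assert(buf != NULL);
    while (offset + 8 <= len) {
        uint32_t begin = read_u32le(buf + offset);
        uint32_t end = read_u32le(buf + offset + 4);
        offset += 8;
        if (begin == 0 && end == 0)
            return (long)n;        /* terminator reached */
        if (n == max)
            return -1;             /* caller's array is too small */
        out[n].begin = begin;
        out[n].end = end;
        n++;
    }
    return -1;                     /* no terminator before end of buffer */
}
```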

    Also observe that a scope list, like a location list, has no need to
    "point back" to the entity that references it. In contrast, tuples for
    the addresses of a compilation unit do point to the .debug_info section,
    and there is in fact no pointer from .debug_info to .debug_aranges.
    This reinforces the affinity between scope lists and location lists.

2) Location of scope lists

    Since scope lists are conceptually much like location lists and since
    there is no precedent for including similar kinds of lists immediately
    within the .debug_info section, it seems reasonable to store such
    lists in a separate section.

    There are two choices:

      - define a new section specifically for scope lists, named, oh say,
        .debug_scope or the like

      - use the existing .debug_loc section

3) Some way to reference a scope list from a DIE that represents a scope
    (DW_TAG_lexical_block, DW_TAG_subprogram, etc.)

    Depending on where the information is located, there are these choices:

      - for scope lists stored in a new section, define a new attribute,
        named say DW_AT_ranges, whose single operand is of a new class
        rangeptr, which can use either DW_FORM_data4 or DW_FORM_data8 as
        appropriate. (This is strictly analogous to DW_AT_location and class
        locptr.)

      - for scope lists stored in the .debug_loc section, define a new
        attribute named say DW_AT_ranges, whose single operand is the
        (existing) class locptr (which can use either DW_FORM_data4 or
        DW_FORM_data8 as appropriate).

     [- A less attractive variation would be to re-use the existing attribute
        DW_AT_location on the scope DIEs. This seems unnecessarily obtuse
        and perhaps confusing.]

    Of the possibilities, reusing the existing .debug_loc section seems
    attractive, in which case the new attribute DW_AT_ranges with a locptr
    operand is all that is needed.

    There is one and only one downside to combining both location lists and
    scope lists in a single section: since neither is self-describing, it
    becomes impossible to make a simple linear scan of the section to parse
    and interpret location/scope list data. As far as I can imagine, a
    debugger has no reason to do this, but it might be convenient for a
    debugger or compiler implementor who is trying to debug DWARF2-related
    tools. If retaining parsability really were important, either a new
    section could be used or a scope list could be made to look like a
    location list by including two bytes of zeros (a zero-length location
    expression) in every entry; I don't think either is warranted, but I
    solicit other input.
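    To make the parsability trade-off concrete, here is a sketch of the
    padded alternative, which I am not recommending. It assumes 4-byte
    little-endian addresses and the DWARF V2 location list layout (two
    address-sized offsets, then a 2-byte expression length and the
    expression bytes, with a bare (0, 0) pair ending the list). All names
    are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t begin;
    uint32_t end;
} scope_entry;

static void put_u32le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

static uint32_t get_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a scope list shaped like a location list: each entry carries a
   2-byte zero length (an empty location expression). out must hold
   10 * n + 8 bytes. Returns the number of bytes written. */
static size_t encode_padded_scope_list(const scope_entry *r, size_t n,
                                       unsigned char *out)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        put_u32le(out + off, r[i].begin);
        put_u32le(out + off + 4, r[i].end);
        out[off + 8] = 0;          /* 2-byte expression length = 0 */
        out[off + 9] = 0;
        off += 10;
    }
    put_u32le(out + off, 0);       /* end of list: two zero offsets */
    put_u32le(out + off + 4, 0);
    return off + 8;
}

/* Generic location list walker: returns the offset just past the list
   that starts at offset, or 0 if the buffer is truncated. */
static size_t skip_location_list(const unsigned char *buf, size_t len,
                                 size_t offset)
{
    assert(buf != NULL);
    while (offset + 8 <= len) {
        uint32_t begin = get_u32le(buf + offset);
        uint32_t end = get_u32le(buf + offset + 4);
        offset += 8;
        if (begin == 0 && end == 0)
            return offset;
        if (offset + 2 > len)
            return 0;
        size_t expr_len = (size_t)buf[offset] | ((size_t)buf[offset + 1] << 8);
        offset += 2 + expr_len;    /* skip the location expression */
    }
    return 0;
}
```

    The walker cannot tell which kind of list it is stepping over; the
    padding only guarantees it stays in sync. Without the padding, the
    same walker would misread the next entry's bytes as an expression
    length.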

4) More regarding scope lists

    Without loss of generality, we can restrict the set of address ranges
    to be

      - a sequence of address ranges (scope list entries), such that
      - the address ranges occur sorted in increasing beginning address order,
      - all adjacent pairs of ranges have a gap between them (that is, they
        are not only disjoint but also cannot be combined into a single range
        without also including an address that should not be included).

    This provides a canonical representation for the discontiguous range of
    addresses.
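    A producer that chose to emit this canonical form could sort the
    entries by beginning offset and merge any that overlap or abut. The
    following is a minimal sketch, assuming every entry already satisfies
    begin < end; canonicalize_scope is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t begin;
    uint32_t end;
} scope_entry;

/* qsort comparator: order entries by increasing beginning offset. */
static int cmp_begin(const void *a, const void *b)
{
    const scope_entry *x = a;
    const scope_entry *y = b;
    return (x->begin > y->begin) - (x->begin < y->begin);
}

/* Sort r in place and merge entries that overlap or touch. Returns the
   new count; afterwards adjacent entries are separated by a gap. */
static size_t canonicalize_scope(scope_entry *r, size_t n)
{
    assert(r != NULL || n == 0);
    if (n == 0)
        return 0;
    qsort(r, n, sizeof *r, cmp_begin);
    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        if (r[i].begin <= r[out].end) {
            /* Overlapping or abutting: extend the current range. */
            if (r[i].end > r[out].end)
                r[out].end = r[i].end;
        } else {
            r[++out] = r[i];       /* gap found: start a new range */
        }
    }
    return out + 1;
}
```

    This is the "additional work for producers" mentioned below: one sort
    plus a linear pass.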

    Requiring a canonical representation creates some additional work
    for producers but may have advantages for consumers. However, neither
    location lists nor the tuples in the address range table are required
    to be sorted, so no such requirement is proposed here.

    It does seem worthwhile to require a modicum of minimality/well-formedness
    in the following sense:

      - all pairs of ranges are disjoint (there are no overlaps)
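    Since sorting is not required, a checker for this rule cannot just
    compare neighbors. A simple sketch, assuming half-open ranges (so
    ranges that merely touch are still disjoint) and short lists where a
    pairwise O(n^2) check is acceptable; scope_ranges_disjoint is a
    hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t begin;
    uint32_t end;
} scope_entry;

/* Return true if no two of the n half-open ranges overlap. The ranges
   may appear in any order. */
static bool scope_ranges_disjoint(const scope_entry *r, size_t n)
{
    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            /* [a, b) and [c, d) overlap iff a < d and c < b. */
            if (r[i].begin < r[j].end && r[j].begin < r[i].end)
                return false;
        }
    }
    return true;
}
```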

    Bringing the pieces all together, we get the following proposal.


PROPOSAL (with one open choice)
-------------------------------

Add the ability to describe discontiguous scopes as follows:

1) Add new attribute DW_AT_ranges, which takes a single argument of
    class locptr (DW_FORM_data4 or DW_FORM_data8 as appropriate).

2) This attribute can be used with the following DIEs (all of which describe
    scopes of one form or another [essentially any DIE that allows DW_AT_low_pc
    and DW_AT_high_pc]):

      - DW_TAG_catch_block
      - DW_TAG_compile_unit
      - DW_TAG_inlined_subroutine
      - DW_TAG_lexical_block
      - DW_TAG_module
      - DW_TAG_subprogram
      - DW_TAG_try_block
      - DW_TAG_with_stmt

3) DW_AT_ranges and DW_AT_low_pc/DW_AT_high_pc cannot both be used on the
    same DIE.

4) If DW_AT_ranges is used and DW_AT_entry_pc is absent, then the entry
    point for the scope defaults to be:

WE NEED TO CHOOSE ONE:

    a) The lowest PC of the scope list
    b) The low PC of the first range of the scope list

  Note that if a) is chosen, then there is an advantage to requiring that
  the ranges of a scope be sorted by address (the lowest PC is then simply
  the start of the first range). b) has the advantage that an entry point
  other than the lowest PC can sometimes be specified without using
  DW_AT_entry_pc merely by putting the appropriate range first in the
  scope list. (In the likely case that the entry is at the lowest PC,
  both choices can be used to avoid needing a DW_AT_entry_pc attribute.)

END CHOICE:

5) The argument of DW_AT_ranges is an offset in the .debug_loc section (by
    virtue of being of class locptr) that begins a scope list.

6) A scope list consists of a sequence of scope list entries, where each
    entry consists of a beginning address offset and an ending address offset
    (the first address past the last address of that range). The list
    is terminated by a pair of zero address offsets. [This encoding is
    identical to that of location lists except that there is no location
    description in a scope list entry.]

7) Location lists and scope lists may be freely intermixed in the .debug_loc
    section.
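Items 4 and 6 can be sketched together from the producer's side. The
following assumes 4-byte little-endian address offsets and uses
hypothetical names throughout; the encoder refuses empty or inverted
entries, since an empty entry at offset 0 would be indistinguishable from
the (0, 0) terminator. The sample list in the usage deliberately puts the
higher range first, which is exactly where choices a) and b) disagree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t begin;  /* offset of the first address in the range */
    uint32_t end;    /* offset of the first address past the range */
} scope_entry;

static void put_u32le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

/* Item 6: write n entries followed by the (0, 0) terminator. out must
   hold 8 * n + 8 bytes. Returns bytes written, or 0 if any entry is
   empty or inverted. */
static size_t encode_scope_list(const scope_entry *r, size_t n,
                                unsigned char *out)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        if (r[i].begin >= r[i].end)
            return 0;
        put_u32le(out + off, r[i].begin);
        put_u32le(out + off + 4, r[i].end);
        off += 8;
    }
    put_u32le(out + off, 0);
    put_u32le(out + off + 4, 0);
    return off + 8;
}

/* Item 4, choice a): the default entry is the lowest PC in the list. */
static uint32_t default_entry_lowest(const scope_entry *r, size_t n)
{
    assert(n > 0);
    uint32_t lo = r[0].begin;
    for (size_t i = 1; i < n; i++)
        if (r[i].begin < lo)
            lo = r[i].begin;
    return lo;
}

/* Item 4, choice b): the default entry is the low PC of the first entry. */
static uint32_t default_entry_first(const scope_entry *r, size_t n)
{
    assert(n > 0);
    return r[0].begin;
}
```

For the list {[0x80, 0x90), [0x10, 0x20)}, choice a) yields 0x10 while
choice b) yields 0x80; if the list were sorted, the two would coincide.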


EDITORIAL CHANGES
-----------------

The following summarizes where changes will be made and indicates the kind
of change:

  - Section 1.5: list "discontiguous scopes"
  - Section 2.2, Figure 2: Add DW_AT_ranges attribute
  - Section 2.16: Add discussion of DW_AT_ranges
  - Section 3.1: Add discussion of DW_AT_ranges (new item 12) [or see below]
  - Section 3.3.3: Add discussion of DW_AT_ranges
  - Section 3.3.8.1: Add DW_AT_ranges in list of non-occurring attributes
  - Section 3.3.8.2: Add discussion of DW_AT_ranges
  - Section 3.4: Add discussion of DW_AT_ranges
  - Section 3.6: Add discussion of DW_AT_ranges
  - Section 3.7: Add discussion of DW_AT_ranges
  - Section 7.5.4, Figure 18: Add DW_AT_ranges
  - Appendix 1: Add DW_AT_ranges to appropriate DIEs
  - Appendix 7, (f) and figure: mention scope list


EDITORIAL SUGGESTION
--------------------

The general and complete description of DW_AT_low_pc and DW_AT_high_pc
attributes now occurs in Section 2.16. Most DIEs that have an address range
include the following sort of description, which is replicated many places
(with the obvious substitution for each kind of entry):

    "The <xyz> entry has a DW_AT_low_pc attribute whose value is the relocated
    address of the first machine instruction generated for the <xyz>. It also has
    a DW_AT_high_pc attribute whose value is the relocated address of the first
    location past the last instruction generated for the <xyz>."

One editorial approach would be to add a third sentence following each
occurrence of the above, something like the following:

    "Alternatively, the <xyz> may instead have a DW_AT_ranges attribute whose
    value describes the several ranges of instructions generated for the <xyz>."

What I suggest instead is to delete the existing sentences and replace them
with something like:

    "The <xyz> entry has either DW_AT_low_pc and DW_AT_high_pc attributes or
    alternatively a DW_AT_ranges attribute, whose value(s) describe the one or
    more ranges of instructions generated for the <xyz> (see Section 2.16
    for the description of these attributes)."

In at least one case (notably Section 3.1), this means combining two
bullets into one (and avoiding addition of a third).


The proposal was accepted with a few modifications: the range entries are
unsorted and are stored in a new section, .debug_ranges, and any object
with a discontiguous range must specify DW_AT_entry_pc.