
Reading DWARF prototypes in ltrace (part 2)

As mentioned earlier, I'm adding functionality to ltrace to read function prototypes from DWARF debugging information. The bulk of this work was merged upstream. I'm now hunting corner cases and various details in this whole system before moving on to implement more features. Unsurprisingly, trying to trace calls in libc is a rich source of corner cases. Some of these are discussed here in no particular order.

Missing features

Ltrace currently chokes (crashes!) when it encounters prototypes with particular features. Some of these are:

  • Complex numbers
  • void variables
  • union fields
  • bit fields

Most of the time these aren't used, but glibc has them somewhere, and ltrace can get confused when the new DWARF-reading code parses glibc.

C++ symbol names

Some DWARF symbol DIEs have a DW_AT_linkage_name attribute in addition to the normal DW_AT_name attribute. The purpose of this wasn't entirely obvious until I tried to ltrace a C++ program. Suppose I have this trivial C++ program:

tst.cc

class C
{
    void f(void);
};

void C::f(void)
{
}

I compile it, and dump the debug info:

$ g++ -g -o tst.o -c tst.cc && readelf -w tst.o
....
 <2><37>: Abbrev Number: 3 (DW_TAG_subprogram)
    <38>   DW_AT_external    : 1
    <38>   DW_AT_name        : f
    <3a>   DW_AT_decl_file   : 1
    <3b>   DW_AT_decl_line   : 3
    <3c>   DW_AT_linkage_name: (indirect string, offset: 0x4e): _ZN1C1fEv
    <40>   DW_AT_declaration : 1
    <40>   DW_AT_object_pointer: <0x44>
....

Note that for my method f the DW_AT_name is f, but the DW_AT_linkage_name is _ZN1C1fEv. The linker does not know C++, and it only sees symbol names. Here this symbol name is the mangled _ZN1C1fEv, so as far as ltrace is concerned, this is the name of this function and thus it should use DW_AT_linkage_name here. One could think that the parsing rule in ltrace should be "use DW_AT_linkage_name if it exists, otherwise use DW_AT_name". One would be wrong, since the next section shows that this logic is too simple.
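For concreteness, the naive rule quoted above could be sketched as below. This is only an illustration: naive_symbol_name is a hypothetical helper, not a function in ltrace, and it assumes the two attribute values have already been extracted as strings (NULL when the attribute is absent).

```c
#include <stddef.h>

/* Hypothetical helper, not part of ltrace.
 * Naive rule: prefer DW_AT_linkage_name when the DIE has one,
 * otherwise fall back to DW_AT_name.
 * Either argument is NULL when the DIE lacks that attribute. */
static const char *naive_symbol_name(const char *name,
                                     const char *linkage_name)
{
    return linkage_name != NULL ? linkage_name : name;
}
```

For the C++ method above this picks _ZN1C1fEv, which is what we want. For a DIE that carries only a DW_AT_name it returns that name unchanged.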

Aliased symbols (different symbol, same address)

Trying to ltrace this simple program doesn't work when reading the DWARF prototypes automatically:

tst.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
int main(void)
{
    nanosleep( &(struct timespec){.tv_sec=0,.tv_nsec=33}, NULL);
    usleep(44);
    return 0;
}

I get this:

$ gcc -o tst tst.c
$ ltrace -l 'libc.so*' -L ./tst
tst->__libc_start_main(0x40054d, 1, 0x7fffc04253f8, 0x400590 <unfinished ...>
tst->nanosleep(0x7fffc0425300, 0, 0x7fffc0425408, 0)                 = 0
tst->usleep(44)                                                      = <void>
+++ exited (status 0) +++

Note that the nanosleep() call does not have the correct prototype: ltrace prints four raw arguments for a function that takes two. This is because we call nanosleep(), but the DWARF defines __nanosleep and __GI___nanosleep:

$ nm -D tst | grep nanosleep
                 U nanosleep

$ nm -D /lib/x86_64-linux-gnu/libc-2.18.so | grep nanosleep
00000000000f26f0 T __clock_nanosleep
00000000000b7070 W __nanosleep
00000000000f26f0 W clock_nanosleep
00000000000b7070 W nanosleep

$ readelf -w /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.18.so | grep nanosleep
    <20c7cf>   DW_AT_name        : (indirect string, offset: 0x13a95): __nanosleep
    <20c7d5>   DW_AT_linkage_name: (indirect string, offset: 0x13a90): __GI___nanosleep
    <280d67>   DW_AT_name        : (indirect string, offset: 0x13a95): __nanosleep
    <280d6d>   DW_AT_linkage_name: (indirect string, offset: 0x13a90): __GI___nanosleep
    <2dc871>   DW_AT_name        : (indirect string, offset: 0x1d940): __clock_nanosleep
    <3b0b59>   DW_AT_name        : (indirect string, offset: 0x13a95): __nanosleep
    <3b0b5f>   DW_AT_linkage_name: (indirect string, offset: 0x13a90): __GI___nanosleep

We can resolve this discrepancy by noting that the nanosleep symbol in the libc symbol table has the same address as __nanosleep, and then using __nanosleep's DWARF prototype for it. I implemented this, and the patch is currently in review.
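A minimal sketch of that address-based fallback might look like the following. It is not the patch under review: struct sym, struct proto and all three functions are hypothetical, simplified stand-ins for whatever ltrace really uses, and the prototype is reduced to an opaque string.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for ltrace's real data structures. */
struct sym   { const char *name; uintptr_t addr; };        /* ELF symbol table entry */
struct proto { const char *name; const char *signature; }; /* prototype read from DWARF */

/* Linear search for a DWARF prototype by exact name. */
static const struct proto *find_proto(const struct proto *protos, size_t n,
                                      const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(protos[i].name, name) == 0)
            return &protos[i];
    return NULL;
}

/* Linear search for a symbol table entry by exact name. */
static const struct sym *find_sym(const struct sym *syms, size_t n,
                                  const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(syms[i].name, name) == 0)
            return &syms[i];
    return NULL;
}

/* Look up a prototype for `name`. If DWARF has none under that name,
 * retry with every other symbol that lives at the same address. */
static const struct proto *find_proto_via_alias(const struct sym *syms, size_t nsyms,
                                                const struct proto *protos, size_t nprotos,
                                                const char *name)
{
    const struct proto *p = find_proto(protos, nprotos, name);
    if (p != NULL)
        return p;

    const struct sym *self = find_sym(syms, nsyms, name);
    if (self == NULL)
        return NULL;

    for (size_t i = 0; i < nsyms; i++) {
        if (&syms[i] == self || syms[i].addr != self->addr)
            continue;
        p = find_proto(protos, nprotos, syms[i].name);
        if (p != NULL)
            return p;
    }
    return NULL;
}
```

Fed the four nm entries above and a prototype table containing only __nanosleep, a lookup of nanosleep misses by name, finds __nanosleep at the same address 0xb7070, and returns its prototype. The linear scans keep the sketch short; a real implementation would more likely index symbols by address.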

Aliased addresses (same symbol, different address)

Testing further, I discovered that in the libc on my machine (Debian/sid amd64) some symbols appear at multiple addresses:

$ nm -D /lib/x86_64-linux-gnu/libc-2.18.so | awk '{print $NF}' | sort | uniq -d
_sys_errlist
_sys_nerr
_sys_siglist
memcpy
nftw
nftw64
posix_spawn
posix_spawnp
pthread_cond_broadcast
pthread_cond_destroy
pthread_cond_init
pthread_cond_signal
pthread_cond_timedwait
pthread_cond_wait
realpath
regexec
sched_getaffinity
sched_setaffinity
sys_errlist
sys_nerr
sys_sigabbrev
sys_siglist

This can confuse the DWARF parser. These appear to be versioned symbols, with different implementations for different libc versions. This same-symbol-different-address idea doesn't fit into the data structures as I've currently defined them, so for now I simply take the first such symbol I encounter and ignore the rest. I probably should parse this out fully, but it hardly seems worth the effort.
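The "take the first, ignore the rest" policy can be sketched roughly as below. Again this is an illustration under assumptions rather than the actual ltrace code: struct sym and keep_first_by_name are hypothetical, and the symbols are assumed to sit in a plain array in the order they were encountered.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a symbol table entry. */
struct sym { const char *name; uintptr_t addr; };

/* Compact `syms` in place so that only the first entry seen for each
 * name survives; returns the new element count.
 * Quadratic in n, which is acceptable for a sketch. */
static size_t keep_first_by_name(struct sym *syms, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < kept; j++) {
            if (strcmp(syms[j].name, syms[i].name) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            syms[kept++] = syms[i];
    }
    return kept;
}
```

With two memcpy entries at different addresses, only the first one is kept, so which implementation ltrace ends up associating with the name depends purely on iteration order. That is the shortcut described above, not a proper treatment of symbol versions.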