Perf misreporting of user-space function arguments

In a previous post I noticed that perf can misreport arguments to user-space functions. I just hit this issue again, and dug into it a bit. Unsurprisingly, it is a bug. It (probably) only affects unoptimized code, which could result in it being undetected/unfixed for this long. I don't really know to fix it, and if others don't either, then that's another reason. In any case, I sent the report to the mailing list, so we'll see what they say.

The bug:

Let's say I have a trivial C program tst.c:

#include <stdio.h>
#include <stdlib.h>
int f(int x)
{
    return !printf("%d\n", x);
}
int main(int argc, char* argv[])
{
    return f(argc);
}

I ask perf to trap all calls to f() and to give me the value of the x argument:

$ gcc-5 -g -o tst tst.c &&
  perf probe -x tst --add 'f x' &&
  perf record -eprobe_tst:f ./tst 2 3 4 &&
  perf script

  .....

  tst 24626 [003] 98586.485680: probe_tst:f: (4004e6) x=0

Note that the value passed to f() was 4, but perf reported it as 0 instead.

If I look at what uprobes was actually asked to report, I see this:

$ cat /sys/kernel/debug/tracing/uprobe_events

p:probe_tst/f /tmp/tst:0x00000000000004e6 x=-12(%sp):s32

The corresponding disassembly is:

$ objdump -d tst | awk '/<f>:/,/^$/'

00000000004004e6 <f>:
  4004e6:       55                      push   %rbp
  4004e7:       48 89 e5                mov    %rsp,%rbp
  4004ea:       48 83 ec 10             sub    $0x10,%rsp
  4004ee:       89 7d fc                mov    %edi,-0x4(%rbp)
  4004f1:       8b 45 fc                mov    -0x4(%rbp),%eax
  4004f4:       89 c6                   mov    %eax,%esi
  4004f6:       bf b4 05 40 00          mov    $0x4005b4,%edi
  4004fb:       b8 00 00 00 00          mov    $0x0,%eax
  400500:       e8 bb fe ff ff          callq  4003c0 <printf@plt>
  400505:       85 c0                   test   %eax,%eax
  400507:       0f 94 c0                sete   %al
  40050a:       0f b6 c0                movzbl %al,%eax
  40050d:       c9                      leaveq
  40050e:       c3                      retq

So uprobes was looking at the argument as a local stack variable. However the trap was placed at the start of the function, where the stack variable wasn't yet available (the argument is still in %di).

This doesn't happen with optimized code, because (at least in this simple example) the variable is simply kept in %di, the DWARF data indicates this, and perf picks that up.

Possible ways to address this:

place the trace after the local variables are set (4004f1 in this example)
Look at %di instead of the stack variable

I didn't see anything obvious in the DWARF data to tell us how exactly to do either of these (but I can imagine I missed something). There's also a concern that both these are vulnerable to the optimizer coming in, and making them not work. I await the wisdom of the mailing list.