The Linux kernel comes with a performance counter/profiler: perf
. This was
originally a kernel-only tool, but it can now do some userspace stuff as well.
perf
is very powerful, but annoyingly under-documented. Plenty was written on
the topic, so here I'll mention only some hangups and strange behaviors I've
observed trying to use it. If I figure out enough of this, this could be used to
improve the docs.
For the below I'm using perf
version 3.16.0 from Debian package
linux-tools-3.16
version 3.16-2 on an amd64 box running a Debian vanilla
kernel 3.16-3-amd64 package version 3.16.5-1.
Basic userspace introspection
Let's say I have a tiny test program: tst.c
:
#include <stdio.h> #include <stdlib.h> int func(int xxx) { int zzz = xxx; printf("zzz: %d\n", zzz); return zzz+5; } int main(int argc, char* argv[]) { int i=0; for( i=0; i<10; i++) printf("yyy: %d\n", func(argc + i)); return 0; }
One can run it, and get the obvious results:
dima@shorty:/tmp$ gcc -g -o tst tst.c && ./tst zzz: 1 yyy: 6 zzz: 2 yyy: 7 zzz: 3 yyy: 8 zzz: 4 yyy: 9 zzz: 5 yyy: 10 zzz: 6 yyy: 11 zzz: 7 yyy: 12 zzz: 8 yyy: 13 zzz: 9 yyy: 14 zzz: 10 yyy: 15
Building with -g
is significant since it allows perf
to read the DWARF
information to know about symbols and variables.
The main command to to manipulate arbitrary probes, such as userspace ones, is
perf probe
. The main command to get information from each probe hit
individually (instead of as an aggregate) is perf script
. Let's do some basic
probing: let's see all the function returns from func
:
dima@shorty:/tmp$ sudo perf probe -x tst --add 'out=func%return $retval' Added new event: probe_tst:out (on func%return in /tmp/tst with $retval) You can now use it in all perf tools, such as: perf record -e probe_tst:out -aR sleep 1 dima@shorty:/tmp$ sudo perf record -g -e probe_tst:out -aR ./tst zzz: 1 yyy: 6 zzz: 2 yyy: 7 zzz: 3 yyy: 8 zzz: 4 yyy: 9 zzz: 5 yyy: 10 zzz: 6 yyy: 11 zzz: 7 yyy: 12 zzz: 8 yyy: 13 zzz: 9 yyy: 14 zzz: 10 yyy: 15 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.712 MB perf.data (~31090 samples) ] dima@shorty:/tmp$ sudo perf report --stdio Failed to open /tmp/perf-20159.map, continuing without symbols # Samples: 10 of event 'probe_tst:out' # Event count (approx.): 10 # # Children Self Command Shared Object Symbol # ........ ........ ....... ............. ..................... # 100.00% 0.00% tst libc-2.19.so [.] __libc_start_main | --- __libc_start_main 100.00% 100.00% tst tst [.] main | --- main __libc_start_main dima@shorty:/tmp$ sudo perf script Failed to open /tmp/perf-20159.map, continuing without symbols tst 20159 [001] 629384.650622: probe_tst:out: (400506 <- 400561) arg1=0x6 400561 main (/tmp/tst) 7fda0633eb45 __libc_start_main (/lib/x86_64-linux-gnu/libc-2.19.so) tst 20159 [001] 629384.650653: probe_tst:out: (400506 <- 400561) arg1=0x7 400561 main (/tmp/tst) 7fda0633eb45 __libc_start_main (/lib/x86_64-linux-gnu/libc-2.19.so) tst 20159 [001] 629384.650675: probe_tst:out: (400506 <- 400561) arg1=0x8 400561 main (/tmp/tst) 7fda0633eb45 __libc_start_main (/lib/x86_64-linux-gnu/libc-2.19.so) ....
So things basically work. We made a probe called out
to fire when func
returns (func%return
). We asked to record the actual return value at that time
($retval
). The we looked at the aggregate call graph with perf report
.
Finally, we used perf script
to look at detailed output for each time we hit
this probe. In particular, we get the return value, given to us as arg1
.
Pitfall: incorrect reading of function argument
Update
I hit this issue again, and looked into it. A newer post describes the findings.
Let put in a probe at the start of func
, and print out the value of the
argument:
dima@shorty:/tmp$ sudo perf probe -x tst --add 'in=func xxx' Added new event: probe_tst:in (on func in /tmp/tst with xxx) You can now use it in all perf tools, such as: perf record -e probe_tst:in -aR sleep 1 dima@shorty:/tmp$ sudo perf record -g -e probe_tst:in -aR ./tst zzz: 1 yyy: 6 zzz: 2 yyy: 7 .... dima@shorty:/tmp$ sudo perf script Failed to open /tmp/perf-20159.map, continuing without symbols tst 20159 [001] 629384.650566: probe_tst:in: (400506) xxx=0 400506 func (/tmp/tst) 7fda0633eb45 __libc_start_main (/lib/x86_64-linux-gnu/libc-2.19.so) tst 20159 [001] 629384.650639: probe_tst:in: (400506) xxx=1 400506 func (/tmp/tst) 7fda0633eb45 __libc_start_main (/lib/x86_64-linux-gnu/libc-2.19.so) tst 20159 [001] 629384.650661: probe_tst:in: (400506) xxx=2 400506 func (/tmp/tst) 7fda0633eb45 __libc_start_main (/lib/x86_64-linux-gnu/libc-2.19.so) ....
This is wrong! It says the passed-in value is 0 then 1 then 2. Actually it's 1 then 2 then 3. Let's put in two more probes to examine this:
- Another probe at
func
, but looking at thedi
register. The calling convention dictates that this is where the argument would go - A probe in the middle of
func
, afterzzz
was set. At that point, I'd like to look at bothxxx
andzzz
We ask perf
about what it thinks line numbers are in function func
(line
numbers here are relative to the start of the function not the start of the
file):
dima@shorty:/tmp$ sudo perf probe -x tst -L 'func' <func@/tmp/tst.c:0> 0 int func(int xxx) 1 { 2 int zzz = xxx; 4 printf("zzz: %d\n", zzz); 5 return zzz+5; 6 } int main(int argc, char* argv[]) { int i=0;
So I place my probe on line 4:
dima@shorty:/tmp$ sudo perf probe -x tst --add 'inreg=func %di' Added new event: probe_tst:inreg (on func in /tmp/tst with %di) You can now use it in all perf tools, such as: perf record -e probe_tst:inreg -aR sleep 1 dima@shorty:/tmp$ sudo perf probe -x tst --add 'mid=func:4 xxx zzz' Added new event: probe_tst:mid (on func:4 in /tmp/tst with xxx zzz) You can now use it in all perf tools, such as: perf record -e probe_tst:mid -aR sleep 1 dima@shorty:/tmp$ sudo perf record -g -e probe_tst:inreg,probe_tst:mid -aR ./tst zzz: 1 yyy: 6 zzz: 2 yyy: 7 .... dima@shorty:/tmp$ sudo perf script dima@shorty:/tmp$ sudo perf script Failed to open /tmp/perf-20335.map, continuing without symbols tst 20335 [000] 629951.439663: probe_tst:inreg: (400506) arg1=0x1 400506 func (/tmp/tst) 7f61a293fb45 __libc_start_main (/lib/x86_64-linux-gnu/libc-2.19.so) tst 20335 [000] 629951.439681: probe_tst:mid: (400517) xxx=1 zzz=1 400517 func (/tmp/tst) 7fffffffe000 [unknown] (/tmp/perf-20335.map) 7f61a293fb45 __libc_start_main (/lib/x86_64-linux-gnu/libc-2.19.so) tst 20335 [000] 629951.439720: probe_tst:inreg: (400506) arg1=0x2 400506 func (/tmp/tst) 7f61a293fb45 __libc_start_main (/lib/x86_64-linux-gnu/libc-2.19.so) tst 20335 [000] 629951.439725: probe_tst:mid: (400517) xxx=2 zzz=2 400517 func (/tmp/tst) 7fffffffe000 [unknown] (/tmp/perf-20335.map) 7f61a293fb45 __libc_start_main (/lib/x86_64-linux-gnu/libc-2.19.so) tst 20335 [000] 629951.439733: probe_tst:inreg: (400506) arg1=0x3 400506 func (/tmp/tst) 7f61a293fb45 __libc_start_main (/lib/x86_64-linux-gnu/libc-2.19.so) tst 20335 [000] 629951.439738: probe_tst:mid: (400517) xxx=3 zzz=3 400517 func (/tmp/tst) 7fffffffe000 [unknown] (/tmp/perf-20335.map) 7f61a293fb45 __libc_start_main (/lib/x86_64-linux-gnu/libc-2.19.so) ....
Aha. So looking at the register di
instead of the variable xxx
makes it
work. Or I can look at the variable a little later; that works too. This is an
easy pitfall to get caught in. I'll try to investigate more and file a bug.
Pitfall: rebuilding
When using perf
to introspect one's own code, you hit another issue early.
Suppose we just did all the probing above. Now we rebuild the program under test
(with or without modifications), and probe again:
dima@shorty:/tmp$ gcc -g -o tst tst.c dima@shorty:/tmp$ sudo perf record -g -e probe_tst:inreg,probe_tst:mid -aR ./tst zzz: 1 yyy: 6 zzz: 2 yyy: 7 dima@shorty:/tmp$ sudo perf report --stdio Error: The perf.data file has no samples!
Whoa! This just worked. What happened? It turns out perf
records the build-id
that a particular probe corresponds to. So when you rebuild, the build-id
changes, and the probe no longer triggers. I haven't yet seriously looked into
disabling and/or fixing this. The easiest workaround is to simply remove all the
probes with
sudo perf probe -x tst --del "*"
Then when you re-add the probes, they will use the new build-id. This really needs better handling upstream as well. I'll patch when I get the time.