I consolidated the memory-leak-finding tools from here and here into their own repository: https://github.com/dkogan/memory_leak_instrumentation. It's written primarily for emacs, but the tools can be easily adapted for anything. The documentation is copied here:

Overview

This is a set of tools used to find memory leaks in long-running applications. These tools use perf to instrument all memory allocations/deallocations. These traces can then be analyzed to find allocated memory that was not properly cleaned up.

This was written to find memory leaks in emacs daemon sessions. The resulting bugs and mailing list posts:

Some of the tools are emacs-specific, but some are not. Note that this is all fairly rough, and the user would want to understand how each tool works to be able to use it effectively. This is also not well-documented yet, but the links above, and the text below show examples.

Tools

Non-emacs-specific

reademacsvar.sh

Exports some shell variables that all the scripts use to do their thing. Mostly this is paths, etc.

plotmem.sh

Used to generate a realtime plot of memory usage of a particular process. This lets us see the leaks as they happen.

probes.sh

Creates perf probes that we care about. This is all allocation/deallocations, and anything else we like.

record_alloc.sh

Runs perf record to record all allocations/deallocations.

parse_script.pl

Reads the output of perf script, following all allocations. The output is a list of all unexpected memory operations (calling free() on a pointer that didn't come from malloc() for instance) and a list of all potentially leaky memory (anything that was allocated but not freed). If we started logging some time after the process has started, and finished logging before the process has finished then both of these will have false positives:

  • We could see a free() of memory that was allocated before we started logging
  • We can report a leak for something that was free()-ed after we stopped logging

So take all output with a grain of salt, and use your best judgement

plotleaks.sh

Takes the output of parse_script.pl, and makes a plot of potential leak sizes vs input line number. This is useful to quickly see the leaks. For instance, let's say the emacs session we're tracing leaks 3072 bytes each time a new frame is created, and that we created 10 frames in a row while running perf record. The plot this script produces would then show 10 points at 3072 evenly spaced through time. Leaks at the start of the session are most likely to be true ones (there was time to free() the memory), so I generally follow up anything that leaked lots of memory at the start.

follow_alloc.pl

Reads the output of perf script, and filters out all memory operations that do not refer to a particular allocation size. This is useful to focus on particular allocations identified by plotleaks.sh. So if plotleaks.sh shows lots of leaks of size 3072, we use this to cut down the log to show only the leaks we care about.

Emacs-specific

daemon.sh

Starts up a new emacs daemon.

client.sh

Creates a new client frame.

kill.sh

Kills the emacs daemon.

loopclient.sh

Repeatedly creates/destroys a client frame.

show_stderr.sh

Shows the STDERR output of a process (emacs in this case). This is useful because the emacs daemon redirects its STDERR to /dev/null, but gdb printing commands such as pr and pp write to STDERR, and we want to see this output.

General notes

When running perf record, the perf process can be overloaded and drop events as a result (an error message says this). A larger buffer can help (-m option) at the expense of using more RAM. It also helps to make smaller logs (fewer things to record, fp backtrace generation instead of DWARF; see below).

perf can generate backtraces in two ways:

  • Using the frame-pointer. This is the preferred method, but it only works for functions that have a frame pointer. All gcc optimization levels strip this out, so rebuild with -fno-omit-frame-pointer if possible.
  • Using DWARF debug information. This doesn't require frame pointers, but needs debug info. Another down side is that this generates much bigger perf logs, and perf is more likely to drop events. Currently perf has a bug in that it's not able to read the split debug information in Debian packages, so you need this patch: http://lkml.iu.edu/hypermail/linux/kernel/1509.0/04006.html

If the binary application being instrumented is rebuilt, probes into that application need to be deleted and re-added.

Example

I'm observing that current build of emacs I'm using leaks memory. I can see this because I repeatedly create/destroy client frames in one terminal window:

$ ./daemon.sh
Starting Emacs daemon.

$ ./loopclient.sh
Waiting for Emacs...
Waiting for Emacs...
Waiting for Emacs...
...

And in another terminal window I look at memory consumption using ./plotmem.sh. The plot looks like this:

memory.svg

The memory use is climbing, so we have a leak. I make a memory consumption log while ./loopclient.sh is running:

$ ./record_alloc.sh
^C[ perf record: Captured and wrote 36.197 MB perf.data (165038 samples) ]

$ sudo perf script > script

Now I analyze the log, and plot the leaks

$ ./parse_script.pl < script > leaks

$ ./plotleaks.sh leaks

The leaks look like this:

leaks.svg

The large leak at the end is a false positive: it just hasn't been freed yet. Zooming-in to the smaller leaks at the start, I see this:

leaks_zoomed.svg

So there's a recurring leak of about 3000 bytes. Zooming in more, I see that these are all leaks of exactly 3072 bytes. Let me trace these leaks specifically:

$ ./follow_alloc.pl 3072 < script > script.3072

Looking through this filtered log, I see that the leaks all follow a pattern:

Line: 124809 Refcount: 1. enter emacs-tst 31381 [001] 609235.683003: probe_libc:malloc_ret: (7f7d99f5e020 <- 7f7d9eac1af0) arg1=0x34d4200
                   24af0 _XQueryFont (/usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
                 2400009 [unknown] ([unknown])

Line: 124818 Refcount: 0. exit emacs-tst 31381 [001] 609235.683015: probe_libc:free: (7f7d99f5e660) mem=0x34d4200
                   7c660 free (/lib/x86_64-linux-gnu/libc-2.19.so)
        676942363846585f [unknown] ([unknown])

Line: 124916 Refcount: 1. enter emacs-tst 31381 [001] 609235.683321: probe_libc:malloc_ret: (7f7d99f5e020 <- 7f7d9eac1af0) arg1=0x34d4200
                   24af0 _XQueryFont (/usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
                 2400009 [unknown] ([unknown])

                         ...

Line: 1572738 Refcount: 2. enter emacs-tst 31381 [000] 609237.688972: probe_libc:malloc_ret: (7f7d99f5e020 <- 7f7d9eac1af0) arg1=0xedc980
                   24af0 _XQueryFont (/usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
                 2400009 [unknown] ([unknown])

Line: 1572747 Refcount: 1. exit emacs-tst 31381 [000] 609237.688984: probe_libc:free: (7f7d99f5e660) mem=0xedc980
                   7c660 free (/lib/x86_64-linux-gnu/libc-2.19.so)
        676942363846585f [unknown] ([unknown])

Line: 1572845 Refcount: 2. enter emacs-tst 31381 [000] 609237.689323: probe_libc:malloc_ret: (7f7d99f5e020 <- 7f7d9eac1af0) arg1=0xedc980
                   24af0 _XQueryFont (/usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
                 2400009 [unknown] ([unknown])

                         ...

Line: 3000071 Refcount: 3. enter emacs-tst 31381 [000] 609239.698236: probe_libc:malloc_ret: (7f7d99f5e020 <- 7f7d9eac1af0) arg1=0x34eb190
                   24af0 _XQueryFont (/usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
                 2400009 [unknown] ([unknown])

Line: 3000080 Refcount: 2. exit emacs-tst 31381 [000] 609239.698250: probe_libc:free: (7f7d99f5e660) mem=0x34eb190
                   7c660 free (/lib/x86_64-linux-gnu/libc-2.19.so)
        676942363846585f [unknown] ([unknown])

Line: 3000178 Refcount: 3. enter emacs-tst 31381 [000] 609239.698617: probe_libc:malloc_ret: (7f7d99f5e020 <- 7f7d9eac1af0) arg1=0x34eb190
                   24af0 _XQueryFont (/usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
                 2400009 [unknown] ([unknown])

                         ...
                         ...
                         ...

So we allocate 3072 bytes in _XQueryFont, then we free() this, then we allocate 3072 bytes again in _XQueryFont, and this second allocation leaks. Then time passes as we do these 3 things again. Every time (presumably with each frame) we leak 3072 bytes. Here the allocation was from libX11.so, which omitted frame pointers and thus we don't have useful backtraces. I make another trace using DWARF for backtraces. This is heavier, but I can limit logging to malloc(3072) since I now know that's what I care about. This cuts down on the data that perf needs to write:

$ source reademacsvar.sh

$ sudo perf record ${=RECORD_OPTS} -g --call-graph=dwarf -p ${EMACS_PID} -eprobe_libc:malloc --filter 'bytes==3072'

$ sudo perf script > script

Selecting the malloc() paths through _XQueryFont I get these:

emacs-tst  4165 [000] 612746.917886: probe_libc:malloc: (7fbbd7027020) bytes=0xc00
	    7fbbd7027020 malloc (/lib/x86_64-linux-gnu/libc-2.19.so)
	    7fbbdbb8aaf0 _XQueryFont (/usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
	    7fbbdbb8b7aa XLoadQueryFont (/usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
	    7fbbdbb8b54e _XF86LoadQueryLocaleFont (/usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
	    7fbbdbb965d0 XLoadFont (/usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
	    7fbbdc4f95e5 XtCvtStringToFont (/usr/lib/x86_64-linux-gnu/libXt.so.6.0.0)
	    7fbbdc4f606d [unknown] (/usr/lib/x86_64-linux-gnu/libXt.so.6.0.0)
	    7fbbdc4f6db7 XtCallConverter (/usr/lib/x86_64-linux-gnu/libXt.so.6.0.0)
	          52c818 x_term_init (/tmp/emacs-tst.patched)
	          5381e8 x_display_info_for_name (/tmp/emacs-tst.patched)
	          52d3b7 check_x_display_info (/tmp/emacs-tst.patched)
	          5345c5 Fx_create_frame (/tmp/emacs-tst.patched)
                         ...

emacs-tst  4165 [000] 612746.918039: probe_libc:malloc: (7fbbd7027020) bytes=0xc00
	    7fbbd7027020 malloc (/lib/x86_64-linux-gnu/libc-2.19.so)
	    7fbbdbb8aaf0 _XQueryFont (/usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
	    7fbbdbb8b2bd XQueryFont (/usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
	          52c851 x_term_init (/tmp/emacs-tst.patched)
	          5381e8 x_display_info_for_name (/tmp/emacs-tst.patched)
	          52d3b7 check_x_display_info (/tmp/emacs-tst.patched)
	          5345c5 Fx_create_frame (/tmp/emacs-tst.patched)
                         ...

Those 2 appear in clusters, which is consistent with the previous log. The first is freed, the second leaks. In emacs, both come from x_term_init(). Looking at the source, here are the relevant lines:

if (!XtCallConverter (dpy, XtCvtStringToFont, &d, 1, &fr, &to, NULL))
  emacs_abort ();
if (x_had_errors_p (dpy) || !XQueryFont (dpy, font))
  XrmPutLineResource (&xrdb, "Emacs.dialog.*.font: 9x15");
/* Do not free XFontStruct returned by the above call to XQueryFont.
   This leads to X protocol errors at XtCloseDisplay (Bug#18403).  */
x_uncatch_errors ();

Oh my. So we leak this on purpose, and that bug report describes why.

License

All source released under the terms of the Lesser GNU General Public License, version 3 or later: https://www.gnu.org/copyleft/lesser.html