In the last few jobs I've worked at I ended up writing a tool to store data in a
nice format, and to be able to manipulate it easily. I'd rewrite this from
scratch each time partly because I was never satisfied with the previous
version. Each iteration was an improvement on the previous one, and this
version is the good one. I wrote it at NASA/JPL, went through the release
process (this thing was called asciilog
then), added a few more features, and
I'm now releasing it. The toolkit lives here and here's the initial README:
Summary
Vnlog (pronounced "vanillog") is a trivially-simple log format:
- A whitespace-separated table of ASCII human-readable text
- Lines beginning with `#` are comments
- The first line that begins with a single `#` (not `##` or `#!`) is a legend, naming each column

Example:

```
#!/usr/bin/whatever
# a b c
1 2 3
## another comment
4 5 6
```
Such data works very nicely with normal UNIX tools (`awk`, `sort`, `join`), can be easily read by fancier tools (`numpy`, `matlab` (yuck), `excel` (yuck), etc), and can be plotted with `feedgnuplot`. This toolkit provides some tools to manipulate vnlog data and a few libraries to read/write it. The core philosophy is to keep everything as simple and light as possible, and to provide methods to enable existing (and familiar!) tools and workflows to be utilized in nicer ways.
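For instance, if the example above is saved to a file (call it `example.vnl`; the name is just for illustration), plain `awk` can already process it, since every non-data line starts with `#`:

```
$ awk '!/^#/ {print $1 + $3}' example.vnl
4
10
```

That is, sum the `a` and `c` columns of each data row, skipping the comments and the legend.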
Synopsis
In one terminal, sample the CPU temperature over time, and write the data to a file as it comes in, at 1Hz:
```
$ ( echo '# time temp1 temp2 temp3'; while true; do echo -n "`date +%s` "; < /proc/acpi/ibm/thermal awk '{print $2,$3,$4; fflush()}'; sleep 1; done ) > /tmp/temperature.vnl
```
In another terminal, I sample the consumption of CPU resources, and log that to a file:
$ (echo "# user system nice idle waiting hardware_interrupt software_interrupt stolen"; top -b -d1 | awk '/%Cpu/ {print $2,$4,$6,$8,$10,$12,$14,$16; fflush()}') > /tmp/cpu.vnl
These logs are now accumulating, and I can do stuff with them. The legend and the last few measurements:
```
$ vnl-tail /tmp/temperature.vnl
# time temp1 temp2 temp3
1517986631 44 38 34
1517986632 44 38 34
1517986633 44 38 34
1517986634 44 38 35
1517986635 44 38 35
1517986636 44 38 35
1517986637 44 38 35
1517986638 44 38 35
1517986639 44 38 35
1517986640 44 38 34
```
I grab just the first temperature sensor, and align the columns
```
$ < /tmp/temperature.vnl vnl-tail | vnl-filter -p time,temp=temp1 | vnl-align
# time     temp
1517986746 45
1517986747 45
1517986748 46
1517986749 46
1517986750 46
1517986751 46
1517986752 46
1517986753 45
1517986754 45
1517986755 45
```
I do the same, but read the log data in realtime, and feed it to a plotting tool to get a live report of the CPU temperature. This plot updates as data comes in. I then spin a CPU core (`while true; do true; done`), and see the temperature climb. Here I'm making an ASCII plot that's pasteable into the docs.
```
$ < /tmp/temperature.vnl vnl-tail -f | vnl-filter --unbuffered -p time,temp=temp1 | feedgnuplot --stream --domain --lines --timefmt '%s' --set 'format x "%M:%S"' --ymin 40 --unset grid --terminal 'dumb 80,40'
```

[gnuplot "dumb" terminal output: an ASCII plot of temperature (y axis, 40 to 70) against time (x axis, mm:ss), showing the temperature climbing as the CPU core is loaded]
Cool. I can then join the logs, pull out simultaneous CPU consumption and temperature numbers, and plot the path in the temperature-cpu space:
```
$ vnl-join -j time /tmp/temperature.vnl /tmp/cpu.vnl | vnl-filter -p temp1,user | feedgnuplot --domain --lines --unset grid --terminal 'dumb 80,40'
```

[gnuplot "dumb" terminal output: an ASCII plot of CPU user time (y axis, 0 to 45) against temperature (x axis, 40 to 70), tracing the path in the temperature-cpu space]
Description
As stated before, vnlog tools are designed to be very simple and light. There exist other tools that are similar. For instance:
- https://csvkit.readthedocs.io/
- https://github.com/johnkerl/miller
- https://github.com/eBay/tsv-utils-dlang
These all provide facilities to run various analyses, and are neither simple nor light. Vnlog by contrast doesn't analyze anything, but makes it easy to write simple bits of awk or perl to process stuff to your heart's content. The main envisioned use case is one-liners, and the tools are geared for that purpose. The above mentioned tools are much more powerful than vnlog, so they could be a better fit for some use cases.
In the spirit of doing as little as possible, the provided tools are wrappers around tools you already have and are familiar with. The provided tools are:
- `vnl-filter` is a tool to select a subset of the rows/columns in a vnlog and/or to manipulate the contents. This is effectively an `awk` wrapper where the fields can be referenced by name instead of index. 20-second tutorial:

  ```
  vnl-filter -p col1,col2,colx=col3+col4 'col5 > 10' --has col6
  ```

  will read the input, and produce a vnlog with 3 columns: `col1` and `col2` from the input and a column `colx` that's the sum of `col3` and `col4` in the input. Only those rows for which `col5 > 10` is true will be output. Finally, only those rows that have a non-null value for `col6` will be selected. A null entry is signified by a single `-` character. A concrete sample run appears right after this list.

  ```
  vnl-filter --eval '{s += x} END {print s}'
  ```

  will evaluate the given awk program on the input, but the column names work as you would hope they do: if the input has a column named `x`, this would produce the sum of all values in this column.
- `vnl-sort`, `vnl-join`, `vnl-tail` are wrappers around the corresponding GNU Coreutils tools. These work exactly as you would expect also: the columns can be referenced by name, and the legend comment is handled properly. These are wrappers, so all the commandline options those tools have "just work" (except options that don't make sense in the context of vnlog). As an example, `vnl-tail -f` will follow a log: data will be read by `vnl-tail` as it is written into the log (just like `tail -f`, but handling the legend properly). And you already know how to use these tools without even reading the manpages! Note: these were written for and have been tested with the Linux kernel and GNU Coreutils `sort`, `join` and `tail`. Other kernels and tools probably don't (yet) work. Send me patches.

- `vnl-align` aligns vnlog columns for easy interpretation by humans. The meaning is unaffected.

- `Vnlog::Parser` is a simple perl library to read a vnlog.

- `libvnlog` is a C library to simplify writing a vnlog. Clearly all you really need is `printf()`, but this is useful if we have lots of columns, many containing null values in any given row, and/or if we have parallel threads writing to a log.

- `vnl-make-matrix` converts a one-point-per-line vnlog to a matrix of data. I.e.

  ```
  $ cat dat.vnl
  # i j x
  0 0 1
  0 1 2
  0 2 3
  1 0 4
  1 1 5
  1 2 6
  2 0 7
  2 1 8
  2 2 9
  3 0 10
  3 1 11
  3 2 12

  $ < dat.vnl vnl-filter -p i,x | vnl-make-matrix --outdir /tmp
  Writing to '/tmp/x.matrix'

  $ cat /tmp/x.matrix
  1 2 3
  4 5 6
  7 8 9
  10 11 12
  ```
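To make the `vnl-filter` tutorial above concrete, here is a small sample run. The file `demo.vnl` and its contents are made up for this illustration; the options are exactly the ones described above, though the real tool may format the output slightly differently:

```
$ cat demo.vnl
# col1 col2 col3 col4 col5 col6
1 2 10 20 11 5
3 4 30 40 9 7
5 6 50 60 12 -

$ < demo.vnl vnl-filter -p col1,col2,colx=col3+col4 'col5 > 10' --has col6
# col1 col2 colx
1 2 30
```

Only the first row survives: the second fails the `col5 > 10` condition, and the third has a null `col6`, so `--has col6` drops it.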
All the tools have manpages that contain more detail, and more tools will probably be added with time.
C interface
For most uses, these logfiles are simple enough to be generated with plain prints. But then each print statement has to know which numeric column we're populating, which becomes effortful with many columns. In my usage it's common to have a large parallelized C program that's writing logs with hundreds of columns where any one record would contain only a subset of the columns. In such a case, it's helpful to have a library that can output the log files. This is available. Basic usage looks like this:
In a shell:
```
$ vnl-gen-header 'int w' 'uint8_t x' 'char* y' 'double z' 'void* binary' > vnlog_fields_generated.h
```
In a C program `test.c`:

```
#include "vnlog_fields_generated.h"

int main()
{
    vnlog_emit_legend();

    vnlog_set_field_value__w(-10);
    vnlog_set_field_value__x(40);
    vnlog_set_field_value__y("asdf");
    vnlog_emit_record();

    vnlog_set_field_value__z(0.3);
    vnlog_set_field_value__x(50);
    vnlog_set_field_value__w(-20);
    vnlog_set_field_value__binary("\x01\x02\x03", 3);
    vnlog_emit_record();

    vnlog_set_field_value__w(-30);
    vnlog_set_field_value__x(10);
    vnlog_set_field_value__y("whoa");
    vnlog_set_field_value__z(0.5);
    vnlog_emit_record();

    return 0;
}
```
Then we build and run, and we get
```
$ cc -o test test.c -lvnlog

$ ./test
# w x y z binary
-10 40 asdf - -
-20 50 - 0.2999999999999999889 AQID
-30 10 whoa 0.5 -
```
The binary field is base64-encoded. This is a rarely-used feature, but sometimes you really need to log binary data for later processing, and this makes it possible.
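As a quick illustration of that encoding, the `AQID` value in the output above decodes back to the bytes that were logged; this uses only the standard `base64` and `od` tools, nothing vnlog-specific:

```
$ echo AQID | base64 -d | od -An -tx1
 01 02 03
```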
So you:

- Generate the header to define your columns
- Call `vnlog_emit_legend()`
- Call `vnlog_set_field_value__...()` for each field you want to set in that row
- Call `vnlog_emit_record()` to write the row and to reset all fields for the next row. Any fields not set with a `vnlog_set_field_value__...()` call are written as null: `-`
This is enough for 99% of the use cases. Things get a bit more complex if we have threading or if we have multiple vnlog output streams in the same program. For both of these we use vnlog contexts.
To support reentrant writing into the same vnlog by multiple threads, each log-writer should create a context, and use it when talking to vnlog. The context functions will make sure that the fields in each context are independent and that the output records won't clobber each other:
```
void child_writer( // the parent context also writes to this vnlog. Pass NULL to
                   // use the global one
                   struct vnlog_context_t* ctx_parent )
{
    struct vnlog_context_t ctx;
    vnlog_init_child_ctx(&ctx, ctx_parent);

    while(records)
    {
        vnlog_set_field_value_ctx__xxx(&ctx, ...);
        vnlog_set_field_value_ctx__yyy(&ctx, ...);
        vnlog_set_field_value_ctx__zzz(&ctx, ...);
        vnlog_emit_record_ctx(&ctx);
    }
}
```
If we want to have multiple independent vnlog writers to different streams (with different columns and legends), we do this instead:
`file1.c`:

```
#include "vnlog_fields_generated1.h"

void f(void)
{
    // Write some data out to the default context and default output (STDOUT)
    vnlog_emit_legend();
    ...
    vnlog_set_field_value__xxx(...);
    vnlog_set_field_value__yyy(...);
    ...
    vnlog_emit_record();
}
```
`file2.c`:

```
#include "vnlog_fields_generated2.h"

void g(void)
{
    // Make a new session context, send output to a different file, write
    // out legend, and send out the data
    struct vnlog_context_t ctx;
    vnlog_init_session_ctx(&ctx);

    FILE* fp = fopen(...);
    vnlog_set_output_FILE(&ctx, fp);

    vnlog_emit_legend_ctx(&ctx);
    ...
    vnlog_set_field_value__a(...);
    vnlog_set_field_value__b(...);
    ...
    vnlog_emit_record();
}
```
Note that it's the user's responsibility to make sure the new sessions go to a different `FILE` by invoking `vnlog_set_output_FILE()`. Furthermore, note that the included `vnlog_fields_....h` file defines the fields we're writing to; and if we have multiple different vnlog field definitions in the same program (as in this example), then the different writers must live in different source files. The compiler will barf if you try to `#include` two different `vnlog_fields_....h` files in the same source.
More APIs:

- `vnlog_printf(...)` and `vnlog_printf_ctx(ctx, ...)` write to a pipe like `printf()` does. This exists for comments.

- `vnlog_clear_fields_ctx(ctx, do_free_binary)` clears out the data in a context and makes it ready to be used for the next record. It is rare for the user to have to call this manually. The most common case is handled automatically (clearing out a context after emitting a record). One area where this is useful is when making a copy of a context:

  ```
  struct vnlog_context_t ctx1;
  // .... do stuff with ctx1 ... add data to it ...

  struct vnlog_context_t ctx2 = ctx1;
  // ctx1 and ctx2 now both have the same data, and the same pointers to
  // binary data. I need to get rid of the pointer references in ctx1
  vnlog_clear_fields_ctx(&ctx1, false);
  ```

- `vnlog_free_ctx(ctx)` frees memory for a vnlog context. Do this before throwing the context away. Currently this is only needed for contexts that have binary fields, but it should be called for all contexts, just in case.
numpy interface
The built-in `numpy.loadtxt` and `numpy.savetxt` functions work well to read and write these files. For example, to write to standard output a vnlog with fields `a`, `b` and `c`:

```
numpy.savetxt(sys.stdout, array, fmt="%g", header="a b c")
```
Note that numpy automatically adds the `#` to the header. To read a vnlog from a file on disk, do something like

```
array = numpy.loadtxt('data.vnl')
```
These functions know that #
lines are comments, but don't interpret anything
as field headers. That's easy to do, so I'm not providing any helper libraries.
I might do that at some point, but in the meantime, patches are welcome.
Caveats and bugs
The tools that wrap GNU coreutils (`vnl-sort`, `vnl-join`, `vnl-tail`) are written specifically to work with the Linux kernel and the GNU coreutils. None of these have been tested with BSD tools or with non-Linux kernels, and I'm sure things don't just work. It's probably not too effortful to get that running, but somebody needs to at least bug me for that. Or better yet, send me nice patches :)
These tools are meant to be simple, so some things are hard requirements. A big one is that columns are whitespace-separated. There is no mechanism for escaping or quoting whitespace into a single field. I think supporting something like that is more trouble than it's worth.
Repository
Author
Dima Kogan (dima@secretsauce.net)
License and copyright
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
Copyright 2016-2017 California Institute of Technology
Copyright 2017-2018 Dima Kogan (dima@secretsauce.net)
`b64_cencode.c` comes from `cencode.c` in the `libb64` project. It is written by Chris Venter (chris.venter@gmail.com), who placed it in the public domain. The full text of the license is in that file.