vnlog.slurp() with non-numerical data

For a while now I'd see an annoying problem when trying to analyze data. I would be trying to import into numpy an innocuous-looking data file like this:

#  image   x y z temperature
image1.png 1 2 5 34
image2.png 3 4 1 35

As usual, I would be using vnlog.slurp() (a thin wrapper around numpy.loadtxt()) to read this in, but that doesn't work: the image filenames aren't parseable as numerical values. Up until now I would work around this by using the suprocess module to fork off a vnl-filter -p !image and then slurp that, but it's a pain and slow and has other issues. I just solved this conclusively using the numpy structured dtypes. I can now do this:

dtype = np.dtype([ ('image',       'U16'),
                   ('x y z',       int, (3,)),
                   ('temperature', float), ])

arr = vnlog.slurp("data.vnl", dtype=dtype)

This will read the image filename, the xyz points and the temperature into different sub-arrays, with different types each. Accessing the result looks like this:

print(arr['image'])
---> array(['image1.png', 'image2.png'], dtype='<U16')

print(arr['x y z'])
---> array([[1, 2, 5],
            [3, 4, 1]])

print(arr['temperature'])
---> array([34., 35.])

Notes:

The given structured dtype defines both how to organize the data, and which data to extract. So it can be used to read in only a subset of the available columns. Here I could have omitted the temperature column, for instance

Sub-arrays are allowed. In the example I could say either

dtype = np.dtype([ ('image',       'U16'),
                   ('x y z',       int, (3,)),
                   ('temperature', float), ])

dtype = np.dtype([ ('image',       'U16'),
                   ('x',           int),
                   ('y',           int),
                   ('z',           int),
                   ('temperature', float), ])

The latter would read x, y, z into separate, individual arrays. Sometime we want this, sometimes not.

Nested structured dtypes are not allowed. Fields inside other fields are not supported, since it's not clear how to map that to a flat vnlog legend
If a structured dtype is given, slurp() returns the array only, since the field names are already available in the dtype

We still do not support records with any null values (-). This could probably be handled with the converters kwarg of numpy.loadtxt(), but that sounds slow. I'll look at that later.

This is available today in vnlog 1.38.