For a while now I've been hitting an annoying problem when trying to analyze data. I would be trying to import into numpy an innocuous-looking data file like this:
    # image x y z temperature
    image1.png 1 2 5 34
    image2.png 3 4 1 35
As usual, I would be using vnlog.slurp() (a thin wrapper around numpy.loadtxt()) to read this in, but that doesn't work: the image filenames aren't parseable as numerical values. Up until now I would work around this by using the subprocess module to fork off a vnl-filter -p !image and then slurp that, but it's a pain and slow and has other issues. I just solved this conclusively using the numpy structured dtypes. I can now do this:
    import numpy as np
    import vnlog

    dtype = np.dtype([ ('image',       'U16'),
                       ('x y z',       int, (3,)),
                       ('temperature', float), ])
    arr = vnlog.slurp("data.vnl", dtype=dtype)
This will read the image filename, the xyz points and the temperature into different sub-arrays, with different types each. Accessing the result looks like this:
    print(arr['image'])
    ---> array(['image1.png', 'image2.png'], dtype='<U16')

    print(arr['x y z'])
    ---> array([[1, 2, 5],
                [3, 4, 1]])

    print(arr['temperature'])
    ---> array([34., 35.])
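For readers without vnlog at hand, the same structured dtype can be tried against numpy.loadtxt() directly. This is only a sketch: it feeds the sample data from a string, ignores the legend entirely (loadtxt just treats the # line as a comment), and therefore assumes the columns appear in exactly the order the dtype lists them. It is not how slurp() itself is implemented.

```python
import io
import numpy as np

# The sample data from above, inlined so the snippet is self-contained
text = """# image x y z temperature
image1.png 1 2 5 34
image2.png 3 4 1 35
"""

dtype = np.dtype([('image',       'U16'),
                  ('x y z',       int, (3,)),
                  ('temperature', float)])

# With a structured dtype, loadtxt returns a 1D array with one record per row.
# The '# ...' legend line is skipped as a comment, so the dtype order must
# match the column order in the file.
arr = np.loadtxt(io.StringIO(text), dtype=dtype)

print(arr.shape)           # (2,)
print(arr['x y z'].shape)  # (2, 3)
print(arr['temperature'])
```

Each field comes back as an ordinary numpy array, so arr['x y z'] can be used as a 2x3 integer matrix right away.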
Notes:
- The given structured dtype defines both how to organize the data, and which data to extract. So it can be used to read in only a subset of the available columns. Here I could have omitted the temperature column, for instance
- Sub-arrays are allowed. In the example I could say either
    dtype = np.dtype([ ('image',       'U16'),
                       ('x y z',       int, (3,)),
                       ('temperature', float), ])
or
    dtype = np.dtype([ ('image',       'U16'),
                       ('x',           int),
                       ('y',           int),
                       ('z',           int),
                       ('temperature', float), ])
  The latter would read x, y, z into separate, individual arrays. Sometimes we want this, sometimes not.
- Nested structured dtypes are not allowed. Fields inside other fields are not supported, since it's not clear how to map that to a flat vnlog legend
- If a structured dtype is given, slurp() returns the array only, since the field names are already available in the dtype
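To make the notes above concrete, here is a rough sketch of how field names could be mapped onto legend columns. This is my illustration of the idea, not vnlog's actual code: columns_for_dtype() is a hypothetical helper, and the legend parsing is simplified to "first line, minus the #". A field name like 'x y z' is split on whitespace into one legend column per sub-array element, and nested fields are rejected.

```python
import io
import math
import numpy as np

def columns_for_dtype(legend, dtype):
    """Hypothetical helper: legend column indices for each field of dtype."""
    index = {name: i for i, name in enumerate(legend)}
    cols = []
    for name in dtype.names:
        field = dtype[name]
        # .base strips any sub-array shape; if what remains has fields,
        # this is a nested structured dtype
        if field.base.names is not None:
            raise ValueError(f"nested field {name!r} doesn't map to a flat legend")
        keys = name.split()
        # A scalar field has shape (), whose product is 1
        if len(keys) != math.prod(field.shape):
            raise ValueError(f"field {name!r} names {len(keys)} columns "
                             f"but has shape {field.shape}")
        cols.extend(index[key] for key in keys)
    return cols

text = """# image x y z temperature
image1.png 1 2 5 34
image2.png 3 4 1 35
"""
legend = text.splitlines()[0].lstrip('#').split()

# Only ask for two of the five columns
subset = np.dtype([('image', 'U16'), ('temperature', float)])
cols = columns_for_dtype(legend, subset)
arr_subset = np.loadtxt(io.StringIO(text), dtype=subset, usecols=cols)

print(cols)
print(arr_subset['temperature'])
```

The dtype alone decides what gets read: the x, y and z columns are never parsed here because no field names them.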
We still do not support records with any null values (-). This could probably be handled with the converters kwarg of numpy.loadtxt(), but that sounds slow. I'll look at that later.
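For the curious, the converters idea would look something like the sketch below. It goes straight to numpy.loadtxt(), not through vnlog, and float_or_nan() is a name made up for this example. The converter runs as a Python call for every value in the column, which is where the slowness would come from.

```python
import io
import numpy as np

def float_or_nan(field):
    """Hypothetical converter: map the vnlog null marker '-' to NaN."""
    # Older numpy releases hand converters bytes rather than str
    if isinstance(field, bytes):
        field = field.decode()
    return float('nan') if field == '-' else float(field)

text = """# x temperature
1 34
2 -
"""

dtype = np.dtype([('x', int), ('temperature', float)])

# Column 1 (temperature) goes through the converter; column 0 is parsed normally
arr_nulls = np.loadtxt(io.StringIO(text), dtype=dtype,
                       converters={1: float_or_nan})

print(arr_nulls['temperature'])
```

This only helps for float fields: an integer field has no NaN to fall back on, so nulls there would need some other sentinel.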
This is available today in vnlog 1.38.