Pop quiz! Let's say I have a datafile (input.vnl, sorted by filename) describing some items (images and feature points in this example):

# filename x y
000.jpg 79.932824 35.609049
000.jpg 95.174662 70.876506
001.jpg 19.655072 52.475315
002.jpg 19.515351 33.077847
002.jpg 3.010392 80.198282
003.jpg 84.183099 57.901647
003.jpg 93.237358 75.984036
004.jpg 99.102619 7.260851
005.jpg 24.738357 80.490116
005.jpg 53.424477 27.815635
....
....
149.jpg 92.258132 99.284486

How do I get a random subset of N images, with all of their feature points, using only the shell and standard command-line tools?

Bam!

$ N=5;
  (  echo '# filename';
     seq 0 149                       |
       shuf                          |
       head -n $N                    |
       xargs -n1 printf "%03d.jpg\n" |
       sort)  |
  vnl-join -j filename input.vnl -

# filename x y
017.jpg 41.752204 96.753914
017.jpg 86.232504 3.936258
027.jpg 41.839110 89.148368
027.jpg 82.772742 27.880592
067.jpg 57.790706 46.153623
067.jpg 87.804939 15.853087
076.jpg 41.447477 42.844849
076.jpg 93.399829 64.552090
142.jpg 18.045497 35.381083
142.jpg 83.037867 17.252172
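
If vnl-join isn't available, roughly the same thing can be sketched with just grep, cut, sort, shuf and awk. This is a variation, not the pipeline above: it picks the N names from the filenames that actually appear in the data instead of generating them with seq, so it doesn't assume the images are numbered 0 through 149. The temporary files and the variable names (input, picked, subset) are made up for this example, and the sample rows are the first few from the listing at the top.

```shell
# Stand-in for input.vnl: a handful of rows copied from the listing above
input=$(mktemp)
cat > "$input" <<'EOF'
# filename x y
000.jpg 79.932824 35.609049
000.jpg 95.174662 70.876506
001.jpg 19.655072 52.475315
002.jpg 19.515351 33.077847
002.jpg 3.010392 80.198282
003.jpg 84.183099 57.901647
EOF

N=2

# Pick N distinct filenames at random from the data itself
picked=$(mktemp)
grep -v '^#' "$input" | cut -d' ' -f1 | sort -u | shuf -n "$N" > "$picked"

# First pass (NR==FNR) remembers the picked names; second pass keeps the
# header line and every row whose first field is one of those names
subset=$(awk 'NR==FNR { keep[$1]; next } /^#/ || ($1 in keep)' "$picked" "$input")

printf '%s\n' "$subset"

rm -f "$input" "$picked"
```

Since awk does the matching with a lookup table, neither input needs to be sorted here, and the rows come out in their original order. The catch is that the column is addressed by position ($1) instead of by name, which is exactly the bookkeeping vnl-join does for you.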