So even after years and years of experience, core tools still find ways to surprise me. Today I tried to do some timestamp comparisons with mawk (vnl-filter, to be more precise), and ran into a detail of the language that made it not work. Not a bug, I guess, since both mawk and gawk are affected. I'll claim "language design flaw", however.

Let's say I'm processing data with unix timestamps in it (seconds since the epoch). gawk and recent versions of mawk have strftime() for that:

$ date
Wed Jul 29 15:31:13 PDT 2020

$ date +"%s"
1596061880

$ date +"%s" | mawk '{print strftime("%H",$1)}'
15

And let's say I want to do something conditional on them. I want only data after 9:00 each day:

$ date +"%s" | mawk 'strftime("%H",$1) >= 9 {print "Yep. After 9:00"}'

That's right. No output. But it is 15:31 now, and I confirmed above that strftime() reports the right time, so it should know that it's after 9:00, but it doesn't. What gives?

As we know, awk (and perl after it) treat numbers and strings containing numbers similarly: 5+5 and ="5"+5= both work the same, which is really convenient. This can only work if it can be inferred from context whether we want a number or a string; it knows that addition takes two numbers, so it knows to convert ="5"= into a number in the example above.

But what if an operator is ambiguous? Then it picks a meaning based on some internal logic that I don't want to be familiar with. And apparently awk implements string comparisons with the same < and > operators, as numerical comparisons, creating the ambiguity I hit today. strftime returns strings, and you get silent, incorrect behavior that then demands debugging. How to fix? By telling awk to treat the output of strftime() as a number:

$ date +"%s" | mawk '0+strftime("%H",$1) >= 9 {print "Yep. After 9:00"}'

Yep. After 9:00

With the benefit of hindsight, they really should not have reused any operators for both number and string operations. Then these ambiguities wouldn't occur, and people wouldn't be grumbling into their blogs decades after these decisions were made.