Just found a highly surprising behavior in a core tool I've used for decades, so clearly I'm making a note here. None of these are surprising:

$ seq 1000 | wc -l


$ seq 1000 | tee /dev/null | wc -l


$ seq 1000 | tee >( true ) | wc -l


$ seq 1000 > >( true ) | wc -l


I.e. I can write 1000 lines into tee, do stuff in one of the children, and the other child still gets all 1000 lines. The last one uses zsh multios in place of tee. But check out what happens when I bump up the data size:

$ seq 100000 | wc -l


$ seq 100000 | tee /dev/null | wc -l


$ seq 100000 | tee >( true ) | wc -l


$ seq 100000 > >( true ) | wc -l


Whoa. What the hell? When I stumbled on this I had another, unrelated problem breaking things in this area, which made for a long debugging session. Here're some runs that give a hint of what's going on:

$ seq 100000 | tee >( true ) | wc -c


$ seq 100000 > >( true ) | wc -c


$ seq 100000 | tee >( cat > /dev/null ) | wc -l


Figure it out?

Answer time! After a tee, a single writer parent feeds two reader children. If a child exits before reading all the data, then the next time the parent tries to feed that dead child, the parent gets a SIGPIPE, and the default action for SIGPIPE is to terminate the process. So apparently the default behavior of tee in GNU coreutils (and of the zsh multios redirection) is to give up and stop feeding all the children at that point. The second child (wc -l in the examples) ends up with incomplete input. With small inputs everything fits in one write that happens before the child dies, which is why the 1000-line runs looked fine. No errors are thrown anywhere, and there's no indication at all that any data was truncated. Lots of the data is just silently missing.
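To see the SIGPIPE mechanism in isolation, here is a minimal sketch with no tee involved. The names are my own choices for illustration: yes stands in for the writer, head -n 1 for a child that exits early, and fd 3 is just a side channel to get the writer's exit status out of the pipeline in plain sh.

```shell
# Sketch: capture how a writer ends when its reader exits early.
# "yes" writes forever; "head -n 1" reads one line and exits,
# so the next write by "yes" hits a pipe with no reader.
writer_status=$(
  {
    # The left side of the pipeline records the writer's exit status on fd 3,
    # which bypasses the pipe and lands in the command substitution.
    { yes; echo "$?" >&3; } | head -n 1 > /dev/null
  } 3>&1
)

# On Linux shells this is typically 141 (128 + 13, where 13 is SIGPIPE),
# assuming SIGPIPE was not already ignored by whatever launched the shell.
echo "writer exit status: $writer_status"
```

The writer never gets a chance to report anything itself: it is simply killed, which is the same thing that happens to tee in the runs above.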

The GNU coreutils implementation of tee has an innocuous-looking option:

-p     diagnose errors writing to non pipes

I read the manpage several times, and it's still not obvious to me that -p does anything more than change something about diagnostic printing. But it does: tee -p feeds all the children as much as it can until they're all dead (i.e. what everybody was assuming it was doing the whole time):

$ seq 100000 | tee -p >( true ) | wc -l
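The same experiment can be sketched without process substitution, so it runs in plain sh too. This assumes GNU tee (for -p, which the GNU documentation describes as defaulting to the warn-nopipe mode of --output-error); the named pipe and the tmpdir variable are my own stand-ins for >( true ).

```shell
# Sketch: a named pipe plays the role of >( true ).
# Assumes GNU coreutils tee; other tee implementations may not accept -p.
tmpdir=$(mktemp -d)
mkfifo "$tmpdir/fifo"

# Reader that opens the pipe and exits without consuming anything.
true < "$tmpdir/fifo" &

# With -p, tee ignores the broken pipe and keeps feeding its remaining output.
lines=$(seq 100000 | tee -p "$tmpdir/fifo" | wc -l)

wait
rm -rf "$tmpdir"

echo "lines seen by wc: $lines"
```

Dropping the -p here should bring back the silent truncation, although exactly how much data gets through depends on timing.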


There's also pee, a dedicated tee-to-process utility in the Debian moreutils package. It can be used here, and it does the reasonable thing by default:

$ seq 100000 | pee true 'wc -l'


So yeah. I'm not the first person to discover this, but I'm certain this was quite surprising to each of us.