Added bisimulation tests for the Free re-implementation (found 2 bugs) and got to benchmarking it.
I was surprised: a round dance of 3 functors and a ping-pong of functions passing control around turns out to be not "a little slower" than the tight package, but twice as fast!
Okay, the numbers level out (with a slight advantage for the Free version) once the sampling function becomes complicated (a primitive SDF vs. a stack of 100 primitives). I'm now more confident that I'm measuring the right thing, and not some laziness fluke.
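
For anyone wondering what a bisimulation test looks like in this setting, here is a minimal sketch. Everything in it is hypothetical and not taken from the actual code: the `Direct` machine stands in for the "tight package", the locally defined `Free` and `Step` stand in for the Free re-implementation, and `bisimilar` just feeds both the same finite input and compares the outputs, which is the usual way to approximate bisimilarity of deterministic machines in a test.

```haskell
module Main where

-- Hypothetical "tight" implementation: a Mealy-style machine that
-- consumes one input and returns one output plus its next state.
newtype Direct i o = Direct { stepDirect :: i -> (o, Direct i o) }

-- Minimal Free structure, defined locally so the sketch needs no packages.
data Free f a = Pure a | Free (f (Free f a))

-- Hypothetical instruction set: wait for an input, or emit an output.
data Step i o k = Await (i -> k) | Yield o k

type Machine i o = Free (Step i o) ()

-- Observe the direct machine: one output per input.
runDirect :: Direct i o -> [i] -> [o]
runDirect _ []       = []
runDirect m (x : xs) = let (o, m') = stepDirect m x in o : runDirect m' xs

-- Observe the Free machine by interpreting its instructions.
runFree :: Machine i o -> [i] -> [o]
runFree (Pure ())          _        = []
runFree (Free (Yield o k)) xs       = o : runFree k xs
runFree (Free (Await f))   (x : xs) = runFree (f x) xs
runFree (Free (Await _))   []       = []

-- Two machines pass the test on an input if their observations agree.
bisimilar :: Eq o => Direct i o -> Machine i o -> [i] -> Bool
bisimilar d m xs = runDirect d xs == runFree m xs

-- Example pair: a running sum, written both ways.
sumDirect :: Int -> Direct Int Int
sumDirect acc = Direct $ \x ->
  let acc' = acc + x in (acc', sumDirect acc')

sumFree :: Int -> Machine Int Int
sumFree acc = Free (Await (\x ->
  let acc' = acc + x in Free (Yield acc' (sumFree acc'))))

main :: IO ()
main = print (bisimilar (sumDirect 0) (sumFree 0) [1 .. 10])  -- prints True
```

In a real test suite the input list would come from a property-based generator rather than a fixed `[1 .. 10]`, so that a disagreement on any step shows up as a shrunk counterexample.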