I just posted some of my results; going into more depth would require an essay in itself. I might play with this some more. It may actually be a good way to test prompt quality, by identifying the questions that are more likely to flip stochastically.
Thanks for bringing the post up.
🙂