@rao2z.bsky.social presents an extremely interesting evaluation of LLMs' ability to reason. His team has been doing this research for a while, but with the emergence of Large Reasoning Models there is finally some notable progress.
His post on bsky: https://bsky.app/profile/rao2z.bsky.social/post/3lmplm3ogkk2l
The preprint: https://arxiv.org/abs/2504.09762