This is SO COOL: maybe Chain of Thought is a short-lived hack rather than a fundamental building block, whereas looped transformers seem fundamental if they scale. This aligns with empirical evidence that looping inner transformer layers can improve performance even without retraining. sites.google.com/wisc.edu/loop
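For anyone unfamiliar with the idea: "looping" here means reusing the same transformer block(s) for several passes over the hidden states instead of stacking more distinct layers. A minimal sketch below (the class name `LoopedEncoder` and the `n_loops` parameter are illustrative, not from the linked site):

```python
import torch
import torch.nn as nn

class LoopedEncoder(nn.Module):
    """Illustrative looped transformer: one shared block applied repeatedly."""
    def __init__(self, d_model=256, n_heads=4, n_loops=4):
        super().__init__()
        # A single block whose parameters are reused on every loop iteration.
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.n_loops = n_loops

    def forward(self, x):
        # Extra "depth" comes from iteration, not extra parameters --
        # compute still scales with n_loops (those loops ain't free).
        for _ in range(self.n_loops):
            x = self.block(x)
        return x

tokens = torch.randn(2, 16, 256)   # (batch, seq_len, d_model)
out = LoopedEncoder()(tokens)
print(out.shape)                   # torch.Size([2, 16, 256])
```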

@ericflo Not gonna bite the bitter lesson, are you? 😄

@dpwiz Hard to argue with the bitter lesson, but this seems orthogonal or even in line with it - those loops ain't free
