This is SO COOL: maybe Chain of Thought is a short-lived hack rather than a fundamental building block, whereas looped transformers seem fundamental if they scale. This aligns with empirical evidence that looping a transformer's inner layers can improve performance even without retraining (rough sketch of the idea below). https://sites.google.com/wisc.edu/looped-transformers-for-lengen/home
@dpwiz Hard to argue with the bitter lesson, but this seems orthogonal or even in line with it - those loops ain't free
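For anyone wondering what "looping inner transformer layers" actually looks like, here's a minimal sketch (my own illustration, not the linked paper's code): a weight-tied transformer block reapplied for a configurable number of passes at inference. The layer sizes, loop count, and wrapper class are assumptions for illustration only.

```python
# Minimal sketch of a looped (weight-tied) transformer block.
# Assumptions: d_model, n_heads, n_loops, and this wrapper are illustrative,
# not the linked work's actual implementation.
import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    """Apply one shared transformer layer `n_loops` times."""
    def __init__(self, d_model=256, n_heads=4, n_loops=4):
        super().__init__()
        # A single encoder layer; looping it adds effective depth
        # without adding parameters.
        self.layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.n_loops = n_loops

    def forward(self, x):
        for _ in range(self.n_loops):
            x = self.layer(x)  # reuse the same weights on every pass
        return x

# Usage: more loops = more compute per token at inference.
x = torch.randn(2, 16, 256)            # (batch, seq, d_model)
model = LoopedBlock(n_loops=8).eval()
with torch.no_grad():
    y = model(x)
print(y.shape)  # torch.Size([2, 16, 256])
```

Each extra pass is another full forward through the shared block, which is exactly the "those loops ain't free" trade-off: more effective depth per parameter, paid for in inference compute rather than model size.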