I have a creeping intuition that residual connections, by letting information flow through the network with no degradation or impediment, are somehow holding back modern large transformer architectures. ResNet was a breakthrough, but I wonder if there's another approach that encourages better internal representations and specialization.
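For concreteness, here's a minimal sketch of what that unimpeded path looks like in a pre-norm transformer sub-block, alongside one hypothetical alternative with a learned gate on the identity path. The class names and the gated variant are illustrative assumptions, not something proposed in the post:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Standard pre-norm transformer sub-block: the input x passes
    through on an untouched identity path, so features and gradients
    flow with no degradation."""
    def __init__(self, dim, sublayer):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.sublayer = sublayer

    def forward(self, x):
        # identity path + additive update
        return x + self.sublayer(self.norm(x))

class GatedResidualBlock(nn.Module):
    """Hypothetical alternative (illustrative only): a learned
    per-channel gate on the identity path lets the network attenuate
    the residual stream instead of passing it through unchanged."""
    def __init__(self, dim, sublayer):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.sublayer = sublayer
        # initialized to 1.0, i.e. starts out as a plain residual block
        self.gate = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return self.gate * x + self.sublayer(self.norm(x))
```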


@ericflo I think there are learned circuits to suppress irrelevant information to keep the transformer trunk tidy.
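A toy illustration of how that could work in principle (an assumption about the mechanism, not a specific published circuit): because sub-layers write *additively* into the residual stream, a layer can erase a feature by adding back its negative projection.

```python
import torch

d = 8
feature = torch.randn(d)
feature /= feature.norm()           # unit direction encoding some "irrelevant" info

x = torch.randn(d) + 3.0 * feature  # residual stream carrying that feature

# A "suppression" write: project onto the feature direction and subtract it.
update = -(x @ feature) * feature
x_clean = x + update                # additive update, as in a transformer sub-layer

print((x @ feature).item())         # large component before
print((x_clean @ feature).item())   # ~0 after suppression
```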
