This truly is the final boss. 16-point FFT with a hard instruction target.

Except I'm also doing a 32-point and the world's weirdest 64-point afterwards, but the former doesn't seem hard, and the latter sounds like a heckin' fun time!

That's a lot of code. But I think it's possible to do in fewer than 17 instructions, and with minimal shuffles. I sure hope everything aligns well for vaddsubps.
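
For context on why lane alignment matters for vaddsubps: the instruction subtracts in even lanes and adds in odd lanes, which is exactly the sign pattern of a complex multiply on interleaved re/im data. Below is a rough scalar model in Python, not the actual assembly from this thread. The function names `addsub` and `complex_mul_interleaved` are made up for illustration, and the interleaved [re, im, ...] layout is an assumption.

```python
def addsub(a, b):
    """Scalar model of vaddsubps: even lanes compute a - b, odd lanes a + b."""
    return [x - y if i % 2 == 0 else x + y
            for i, (x, y) in enumerate(zip(a, b))]


def complex_mul_interleaved(x, y):
    """Lane-wise complex multiply of two interleaved [re, im, re, im, ...] lists.

    Hypothetical helper; assumes both inputs have the same even length.
    """
    # Duplicate the real parts of y into both lanes of each pair (like vmovsldup).
    y_re = [y[i - (i % 2)] for i in range(len(y))]
    # Duplicate the imaginary parts of y into both lanes (like vmovshdup).
    y_im = [y[i - (i % 2) + 1] for i in range(len(y))]
    # Swap re and im within each pair of x (a shuffle in real SIMD code).
    x_sw = [x[i ^ 1] for i in range(len(x))]

    t1 = [p * q for p, q in zip(x, y_re)]     # [a*c, b*c, ...]
    t2 = [p * q for p, q in zip(x_sw, y_im)]  # [b*d, a*d, ...]
    # Even lanes: a*c - b*d (real part). Odd lanes: b*c + a*d (imaginary part).
    return addsub(t1, t2)
```

If the data isn't already laid out so that the minus lands on the real lanes, extra shuffles are needed before the add/sub, which is what eats into the instruction budget.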

vextractf128 is 1.8% faster than vperm2f128 + movaps.
Oh well, on paper the former was faster; glad I tested it.

Right now I dislike how subtraction is non-associative, but my liking for addition being associative makes up for it.

@emilis @lynne

If 1*n is an idempotent operation, -1*n is "contrapotent".
