If you take a modern C++ or Rust compiler and think about optimizing it…
There’s lots of “low-hanging fruit” in the form of incrementality and parallelism.

Not truly “low-hanging” as in easy to implement; it’s actually extremely hard. But it’s easy to theorize about. It’s known to be *possible*, and capable of massive speedups in various cases.

But what about the rest? How much room is there for large speedups just by optimizing algorithms? To me that feels like much more of an unknown.

@comex I have this idea that compilers represent data incorrectly for modern platforms. Currently we have a vast ocean of tiny nodes full of pointers, and we just follow pointers all day. what we need is regular, tabular internal representations that we can throw onto tensor cores

@regehr @comex What do you think of Aaron Hsu's dissertation on a compiler with regular, tabular internal representations for GPUs?

@regehr @comex You might find it interesting; he has explored the idea you express and produced an apparently viable, working system called co-dfns.
