is this bullshit? or does ISA not really matter in some fictitious world where we can normalize for process and other factors?
https://www.techpowerup.com/340779/amd-claims-arm-isa-doesnt-offer-efficiency-advantage-over-x86
@regehr Every serious study (both from independent researchers and from vendors themselves) that I've ever seen (and I'm up to 5 or so at this point), broadly, supports this, with some caveats.
It's not "no difference", but for server/application cores, what differences there are typically somewhere in the single-digit %. You can always find pathological examples, but typically it's not that much.
There is a real cost to x86's many warts, but it's mostly in design/validation cost and toolchains.
@regehr Some more details:
- The D/V and toolchain costs are amortized. Broadly speaking, the bigger your ecosystem/market share, the easier it is to absorb that cost.
- This holds for what ARM would call "application" cores; oversimplifying a bit, it's essentially a constant overhead on the design that adds some extra area and pipe stages. It's more onerous for smaller cores, but you have to go really small before it starts to dominate.
@regehr Eventually, there's nowhere left to hide. For applications where you'd use, say, an ARM Cortex-M0 or a bare-bones minimal RV32I CPU, I'm not aware of anything x86, past or present, that would really make sense.
Intel did "Quark" a while back which I believe was either a 486 or P5 derivative, so still something like a 5-stage pipelined integer core. If you want to go even lower than that, I don't think anyone has (or wants to do) anything.
@steve @regehr Anyway, take that with whatever amount of salt you want, but Intel and AMD both are strongly incentivized to seriously look at this.
They for sure would prefer to sell you x86s because they have decades of experience with that, but they're looking at what it costs them to do it both in capex and in how much it hurts the resulting designs.
And for the latter, the consistent answer has been "a bit, but not much".
@regehr @steve Anecdotally, there are at least 3 companies (Intel, AMD, Centaur) that do this on the regular, and one of them (Centaur) is quite small as such things go.
I wouldn't want to do it either, but the other thing you gotta keep in mind is that the CPU core, while important, is only part of an SoC, and the ISA has very little impact on the "everything else".
@regehr @steve For example, it's a goddamn NIGHTMARE doing a high-performance memory subsystem for absolutely anything.
This whole "shared memory" fiction we're committed to maintaining is a significant drag on all HW, but HW impls of it are just in another league perf-wise than "just" building message-passing and trying to work around it in SW (lots have tried, but there's little code for it and it's a PITA), so we're kind of stuck with it.
@regehr @steve To wit: virtual memory is a lie, by design. Uniform memory is a lie. Shared instruction/data memory is a lie. Coherent caches are a lie, caches would rather be _anything_ else. Buses are a lie. Memory-mapped IO is IO lying about being memory. Oh and the data bits and wires are small and shitty enough now that they started lying too and everything is slowly creeping towards ECCing all the things
@regehr @steve Also, re: ISA efficiency, I like re-posting this, by now, rather old image that shows you what the score really is.
This was on the Xeon Phis but the general trend holds to this day. (Source: https://people.eecs.berkeley.edu/~ysshao/assets/papers/shao2013-islped.pdf p. 3) NB this is an in-order core with 512b vector units.
@regehr @steve This is one of the bigger reasons why ISA doesn't matter more.
Broadly, your uArch is only as good as its data movement, because that shit is what's really expensive, not the logic gates.
It's things like:
- how good is your entire memory subsystem
- how good is your bypass network
- how good are your register files
etc.
It's not like you can't make mistakes in the ISA that will really kill your design; you can. That's what happened to the VAX.
@regehr @steve The VAX ISA turns out to be, inadvertently, _extremely_ hostile to an implementation that tries to decouple frontend and backend, which ultimately broke its neck.
x86 has many flaws, but none that create a massive discontinuity where there's basically nothing you can do about a particular problem until you have like 10x the transistor/power/whatever budget; that kind of discontinuity is what kills archs.
@argv_minus_one @rygorous @regehr @steve 68k went way way too CISC right at the point RISC got all trendy. Like... RISC was wrong in the long term, but it was 20% right for a decade or so. And then it was wrong. Sadly, that was long enough to kill 68k as a mainstream part (though it lived on for a looooong time in the embedded space)
@TomF @argv_minus_one @rygorous @regehr @steve wasn't 68k basically a VAX?
@wolf480pl it was absolutely not, no.
One of the VAX's more notable problems was that absolutely every operand could be a memory reference or even an indirect memory reference (meaning a memory location containing a pointer to the memory location the instruction actually accessed). Some VAX instructions had 6 operands, each of which could be a double-indirect memory reference, and IIRC also unaligned and spanning a page boundary, so the worst-case number of page faults per instruction was bonkers.
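To put a rough number on "bonkers", here's a back-of-envelope sketch. It assumes a 6-operand instruction where every operand is a double-indirect reference, every pointer and every datum is unaligned and straddles a page boundary, and the instruction bytes themselves cross a page; the breakdown is my own illustration, not anything out of a VAX manual.

```python
# Back-of-envelope worst case for a single VAX instruction, under the
# assumptions above (illustrative only; the real worst case depends on the
# exact opcode and addressing modes).
operands          = 6
pages_per_pointer = 2   # unaligned pointer longword straddling a page boundary
pages_per_datum   = 2   # unaligned operand data straddling a page boundary
pages_for_insn    = 2   # the variable-length instruction bytes crossing a page

worst_case_pages = operands * (pages_per_pointer + pages_per_datum) + pages_for_insn
print(worst_case_pages)   # 26 distinct pages, each of which might fault
```

And the hardware has to be able to take a fault at any of those points and later restart or resume the instruction, which is a lot of machinery before you've computed anything.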
@wolf480pl Everything could also be an immediate operand.
There were two ways to encode immediates: "literal" was for short integers and was more compact; anything out of range used the actual immediate encoding.
On the VAX, you had 16 GPRs R0-R15, and R15 was just your PC. (32-bit ARM later copied that mistake, and it is a mistake.)
The immediate encoding boiled down to (r15)+, i.e., fetch data (of whatever the right size is) at the PC and auto-increment. That's literally how it was encoded, too: autoincrement mode with R15 as the register.
@wolf480pl So, in the VAX encoding, if you have say an add instruction where the first operand is an immediate, you get the encoding for the first operand, then the immediate bytes, then the encoding for the second operand, and so forth.
Crucially, you don't really know where the byte describing the second operand starts until you've finished the first operand; and this goes for all (up to 6) operands.
Nobody does this anymore because, as it turns out, it's a _terrible_ idea.
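Here's a minimal sketch of that serial dependency. The mode table below is a simplified subset of the real VAX operand specifiers (indexed mode and a few others are left out, and the example opcode/operand bytes are made up), but it shows the structural problem: each specifier's length depends on its mode byte, immediates are just (PC)+ data inlined into the instruction stream, and you can't even locate operand N+1 without fully parsing operand N.

```python
# Simplified VAX-style operand walk (subset of the real modes; indexed mode
# and others are omitted). Each operand specifier is a mode/register byte,
# possibly followed by displacement or immediate bytes, so specifier lengths
# are only known after parsing the previous specifier.

def operand_length(mode, reg, opsize):
    if mode <= 0x3:                   # short "literal": value packed into the byte
        return 1
    if mode == 0x8 and reg == 15:     # (PC)+  == immediate data follows inline
        return 1 + opsize
    if mode == 0x9 and reg == 15:     # @(PC)+ == 4-byte absolute address follows
        return 1 + 4
    if mode in (0x5, 0x6, 0x7, 0x8, 0x9):   # register / deferred / auto-inc/dec
        return 1
    if mode in (0xA, 0xB):            # byte displacement (deferred)
        return 1 + 1
    if mode in (0xC, 0xD):            # word displacement (deferred)
        return 1 + 2
    if mode in (0xE, 0xF):            # longword displacement (deferred)
        return 1 + 4
    raise ValueError("mode not modeled in this sketch")

def operand_offsets(insn, num_operands, opsize):
    """Byte offset of each operand specifier; note the serial dependency."""
    offsets, pos = [], 1              # skip the 1-byte opcode
    for _ in range(num_operands):
        offsets.append(pos)
        mode, reg = insn[pos] >> 4, insn[pos] & 0xF
        pos += operand_length(mode, reg, opsize)   # must finish this operand...
    return offsets                                 # ...before the next one exists

# A made-up "ADDL3 #1, disp16(r2), r3" in this toy encoding:
insn = bytes([0xC1, 0x8F, 1, 0, 0, 0, 0xC2, 0x34, 0x12, 0x53])
print(operand_offsets(insn, num_operands=3, opsize=4))   # [1, 6, 9]
```

A wide decoder wants to find several instruction boundaries per cycle; with an encoding like this, even finding the end of one instruction is a serial walk.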
@rygorous sounds like it'd save you a lot of gates in the uninteresting scenario of a cacheless byte-addressable memory and a core that takes 3+ cycles to process each operand
@wolf480pl VAX was multi-cycle everything: basically something like at least 1 cycle for the base operation (even with no operands), at least 1 extra cycle for every operand, and more if memory accesses were involved.
They did try to pipeline it past that (with the NVAX) but the ISA proved to be remarkably resistant to doing something much better, at least with the transistor budget they had at the time (late 80s/early 90s).
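Just to make that cost model concrete (with made-up, illustrative numbers, not real timings from any particular VAX model):

```python
# Rough cycle estimate using the model above: 1 cycle base, 1 per operand,
# plus some penalty per memory operand. The mem_penalty value is made up.
def rough_cycles(num_operands, mem_operands, mem_penalty=2):
    return 1 + num_operands + mem_operands * mem_penalty

print(rough_cycles(num_operands=3, mem_operands=0))   # 3-reg op        -> 4 cycles
print(rough_cycles(num_operands=3, mem_operands=1))   # one mem operand -> 6 cycles
```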
@wolf480pl Which is all just a random historical footnote at this point, but it is important context because all the original first-wave early-to-mid-80s RISC papers were subtweeting VAXen, specifically and especially.
VAX is '77. The 8086 is from '78 and descended from the 8080 and ultimately the 8008 ('72). IBM z mainframes are still based on the original System/360 architecture from 1965. x86 and S/360: both decidedly not RISC, and both made the jumps to first pipelined, then superscalar, then OoO just fine. VAX, nope.
@wolf480pl The original RISC papers put all "CISCs" in the same boat, but historically, that is demonstrably false.
VAX made some very specific decisions that felt clean and elegant in the short term and screwed them over big-time in the long term.
Same for the first-gen RISCs - load delay slots made it into MIPS I but were gone again by MIPS II, branch delay slots stuck around longer but were regretted not long after, etc.
I don't think there's a big lesson here other than predicting the future is hard.
@wolf480pl Except, of course, for ISA designers, where there's plenty of immediately actionable information from how the VAX shook out, but that's less along RISC/CISC ideological lines and more like:
- make instructions fixed-size or, when that's not practical, at least make it easy to tell the insn size from the first word (see the sketch after this list)
- don't bake in decisions that really lock you into one particular implementation; you might not want that implementation in the future
- don't make the PC a GPR (SP is somewhat special too)
etc.
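As a concrete example of the first point (the sketch promised above): this is roughly the RISC-V approach, where the length of every instruction is determined by its first 16-bit parcel alone, so a wide fetch/decode unit can size instructions without parsing the rest of them. The 16- and 32-bit rules match the ratified spec; the 48/64-bit rules are the reserved longer-encoding scheme, included for flavor.

```python
# RISC-V-style instruction sizing from the first 16-bit parcel.
def insn_length_bytes(first_parcel: int) -> int:
    if first_parcel & 0b11 != 0b11:
        return 2                       # compressed (RVC) instruction
    if first_parcel & 0b11100 != 0b11100:
        return 4                       # standard 32-bit instruction
    if first_parcel & 0b111111 == 0b011111:
        return 6                       # reserved 48-bit encoding
    if first_parcel & 0b1111111 == 0b0111111:
        return 8                       # reserved 64-bit encoding
    raise ValueError("longer/reserved encoding")

print(insn_length_bytes(0x4501))   # c.li a0, 0                  -> 2
print(insn_length_bytes(0x0513))   # low half of addi a0, a0, 0  -> 4
```

Contrast with the VAX walk earlier in the thread, where you can't even size a single instruction without chasing every operand specifier in order.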
@rygorous those things are fairly obvious if you know pipelining is a thing. I'm guessing in the 70s they didn't know that yet?
@wolf480pl They did. Pipelining is 50s tech (IBM Stretch had a 3-stage pipeline, designed starting in 1956). Superscalar and OoO were developed in the 60s (e.g. IBM ACS, the Tomasulo algorithm) and shipped that same decade (IBM System/360 Model 91 FPU, 1967).
But this was all mostly the purview of supercomputers.
The whole idea of an Instruction Set Architecture with multiple impls goes back to the S/360. That was barely 10 years old when they designed the VAX.
@rygorous @wolf480pl That's why we used "emulators" of a virtual insn set (a modern example being Java) - so that moving to a new ISA was trivial and barely a blip. We moved from 68k to 88k to PPC to x86 with all the code binaries unchanged - just port the emulator.
IBM did something similar with the AS/400. The apps were mostly RPG, which was unchanged as they kept changing the CPU.