> Thanks. So because of in-order retirement, the ROB is indeed limited to holding only consecutive µops. And since nothing can be executed until it is put in the ROB, this means that there is a hard limit on the distance in µops between the last unretired µop and what can possibly be executed.

Yes. The ROB holds even things that will never be executed, such as nops and zeroing idioms.
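To make that hard limit concrete, here is a minimal toy model (my own sketch, not how any real core works in detail). It assumes every µop is independent, starts executing the moment it is allocated, and that `width` µops can be allocated per cycle; `simulate`, `rob_size` and `width` are names I made up for illustration.

```python
from collections import deque

def simulate(latencies, rob_size, width=4):
    """Toy in-order-retirement model.

    latencies[i] is the execution latency of µop i (hypothetical numbers).
    Returns (total_cycles, max_distance), where max_distance is the largest
    gap in µops between the oldest unretired µop and the youngest µop that
    was allowed into the ROB.
    """
    rob = deque()        # (µop index, finish cycle), kept in program order
    next_uop = 0
    cycle = 0
    max_distance = 0
    while next_uop < len(latencies) or rob:
        # Retire in order: only the head may leave, and only once finished.
        while rob and rob[0][1] <= cycle:
            rob.popleft()
        # Allocate up to `width` µops, but never more than the ROB can hold.
        allocated = 0
        while (next_uop < len(latencies) and len(rob) < rob_size
               and allocated < width):
            rob.append((next_uop, cycle + latencies[next_uop]))
            max_distance = max(max_distance, next_uop - rob[0][0])
            next_uop += 1
            allocated += 1
        cycle += 1
    return cycles_and_distance(cycle, max_distance)

def cycles_and_distance(cycles, distance):
    # Tiny helper so the return value reads clearly at the call site.
    return cycles, distance

# One slow µop (think: a cache-missing load) followed by cheap independent work.
work = [300] + [1] * 999
small = simulate(work, rob_size=224)
large = simulate(work, rob_size=354)
```

In this model the window behind the stalled µop never gets past `rob_size - 1` µops, no matter how much independent work follows, and the bigger ROB finishes the same work in fewer cycles because more of the cheap µops fit behind the stall.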

Note that the ROB is not really the bad guy here: if you invented some way to make the ROB retire out of order to break that restriction, you'd just immediately run into the PRF limit, since almost all pending instructions need a destination register. Since these values are all live (from the CPU's point of view) until all older instructions have retired, you'd get the same kind of limit from the PRF.

The PRF is a big structure using lots of area and power, so the ROB size is probably more or less derived from the PRF size: how big are we going to make the PRF? X? OK, let's make the ROB size X * 1.5, since then the ROB will rarely limit performance. Note: we know the ROB increased dramatically in size from 224 to 354 in SNC, but we don't yet know how big the reg files are!
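As a rough back-of-the-envelope sketch of that sizing argument (the numbers are made up, and `effective_window` is a hypothetical helper, not a formula from any vendor): the in-flight window is capped by whichever runs out first, ROB entries or free physical registers, and only the µops that write a register consume one.

```python
from fractions import Fraction

def effective_window(rob_size, prf_free, dest_fraction):
    """Approximate how many µops fit in flight before either the ROB or the
    PRF is exhausted.

    prf_free      -- physical registers available for renaming (assumed)
    dest_fraction -- fraction of µops that need a destination register
    """
    if dest_fraction <= 0:
        return rob_size                       # nothing consumes registers
    prf_limit = int(Fraction(prf_free) / Fraction(dest_fraction))
    return min(rob_size, prf_limit)

# Hypothetical PRF with X = 100 free registers and a ROB of 1.5 * X entries.
balanced = effective_window(rob_size=150, prf_free=100,
                            dest_fraction=Fraction(2, 3))
prf_bound = effective_window(rob_size=150, prf_free=100,
                             dest_fraction=Fraction(1, 1))
```

With two thirds of µops writing a register, both structures fill up at about the same point; if every µop writes a register, the PRF runs out first and the extra ROB entries go unused.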

There are good papers out there about super-high ILP designs if you are interested in how this kind of stuff can be solved, but there are many problems.

> Does this also mean that a load is only ever retired after it has been successfully fulfilled? I haven't understood the role of retirement for loads and stores.

Yes. Loads and stores both go in the ROB, and also in the load/store buffers, which are in-order just like the ROB.

A load can't retire until it completes: only then are the physical resources, like the register it writes to and the load buffer entry, able to be freed.

Note that this is a huge difference between loads and prefetches: prefetches can retire immediately (well, as soon as the address has been calculated). They just kick off the load in the memory subsystem and then their work is done.

Stores are different: they can retire as soon as the store address and store data are known (i.e., as soon as their inputs are available). At this point they become so-called senior stores: stores which have retired, and hence are non-speculative, but haven't become visible to the rest of the system yet. They must eventually become visible, which means that on an interrupt this part of the store buffer is preserved or drained, never thrown away like the rest of the OoO buffers. When the time is right (write?) they commit to L1.
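The store life cycle can be sketched as a tiny state machine. This is only a toy model under my own assumptions: `ToyStoreBuffer` and its method names are hypothetical, the L1 is just a dict, and real hardware has many more details (forwarding, ordering rules, and so on).

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class StoreEntry:
    addr: int
    data: int
    senior: bool = False      # True once the store has retired

class ToyStoreBuffer:
    def __init__(self):
        self.entries = deque()    # program order, oldest first
        self.l1 = {}              # stand-in for the L1 cache

    def allocate(self, addr, data):
        # Address and data are known; the store is still speculative.
        self.entries.append(StoreEntry(addr, data))

    def retire_oldest(self):
        # Retirement is in order: the oldest non-senior store becomes senior.
        for entry in self.entries:
            if not entry.senior:
                entry.senior = True
                return

    def flush_speculative(self):
        # Interrupt or misprediction: drop only the stores that never retired.
        self.entries = deque(e for e in self.entries if e.senior)

    def commit_one(self):
        # Senior stores become visible in order, oldest first.
        if self.entries and self.entries[0].senior:
            entry = self.entries.popleft()
            self.l1[entry.addr] = entry.data

sb = ToyStoreBuffer()
sb.allocate(0x10, 1)
sb.allocate(0x20, 2)
sb.allocate(0x30, 3)
sb.retire_oldest()
sb.retire_oldest()
sb.flush_speculative()        # the third store was speculative and is dropped
pending_after_flush = len(sb.entries)
sb.commit_one()
sb.commit_one()
sb.commit_one()               # nothing left to commit; this is a no-op
```

The point of the sketch is the asymmetry: the flush throws away the unretired store, but the two senior stores survive it and still end up in the (toy) L1.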