“Such” a good deal is 2-30% in benchmarks (and they don’t say whether that’s with the cache side-channel mitigations turned on or off). In previous generations it was more like -20% to +20%. If one thread is already struggling for L1 and L2 cache, splitting that cache evenly with a completely different workload isn’t going to help.
If cache contention weren’t a problem, and it were just a matter of jumping into previously unseen instructions and data (cold cache), you’d expect to see 50-300% gains from hyperthreading, precisely because of how long those stalls are.
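To make that intuition concrete, here’s a toy back-of-envelope model (entirely my own assumption, not taken from the benchmarks above): if a hardware thread is stalled on memory some fraction of the time, a sibling SMT thread can in principle run during those idle issue slots, so the ideal gain grows with the stall fraction.

```python
# Toy SMT model (assumed for illustration, not measured data):
# a thread stalled a fraction `s` of the time can use at most
# (1 - s) of the core's issue slots; two SMT threads together
# are capped at the core's full capacity (1.0).
def smt_speedup(stall_fraction, threads=2):
    busy = 1.0 - stall_fraction          # useful work per thread
    total = min(threads * busy, 1.0)     # core can't exceed 100% busy
    return total / busy                  # throughput vs. one thread

for s in (0.2, 0.5, 0.75):
    print(f"stall fraction {s:.0%}: ideal SMT gain {smt_speedup(s) - 1:+.0%}")
```

In this model, a thread stalled half the time gets the full +100% from a second hardware thread; a thread that rarely stalls gets little. The real numbers land far below the ideal precisely because the two threads also fight over the same L1/L2, which this toy model ignores.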