Hacker News

Do you have anything to say regarding NUMA for the 90s core counts though? As I said, it's not enough that there were a lot of cores - they have to be monolithically scheduled to matter. The largest UMA design I can recall was the CS6400 in 1993, to go past that they started to introduce NUMA designs.


Windows didn't handle NUMA either until they created Processor Groups, and there are all sorts of reasons why you'd want to run a process that spans NUMA nodes (particularly on Windows, which encourages single-process, high-thread-count software architectures). It's really not that big of a deal for a lot of workloads where your working set fits just fine in cache, or where you take the high-hardware-thread-count approach of having enough contexts in flight that you can absorb the extra memory latency in exchange for higher throughput.


NT 3.1 (1993) - KAFFINITY bitmask

NT 5.0 (1999) - NUMA scheduling

NT 6.1 (2009) - Processor Groups, making the KAFFINITY limit per NUMA node rather than per system

Xeon E7-8800 (2011) - An x86 system exceeding 64 total cores is possible (10 cores x 8 sockets -> requires Processor Groups)

Epyc 9004 (2022) - KAFFINITY has become an artificial limit for x86, where groups must be split more granularly than NUMA nodes

If x86 had actually hit a KAFFINITY wall, then the E7-8800 event would have occurred years before Processor Groups, because >8-core CPUs are desirable regardless of whether you can stick 8 of them in a single box.

The story is really the reverse of the claim: NT in the 90s supported architectures which could scale past the KAFFINITY limit. NT in the late 2000s supported scaling x86, but it wouldn't have mattered until the 2010s. Ultimately KAFFINITY wasn't an annoyance until the 2020s.



