The NT kernel dates back to 1993. Computers didn’t exceed 64 logical processors ...

immibis · 2025-07-09T13:06:50 1752066410

Linux had many similar restrictions in its lifetime; it just has a different compatibility philosophy that allowed it to break all the relevant ABIs. Most recently, dual-socket 192-core Ampere systems were running into a hardcoded 256-processor limit. https://www.tomshardware.com/pc-components/cpus/yes-you-can-...

monocasa · 2025-07-09T16:51:53 1752079913

Tom's hardware is mistaken in their reporting. That's raisng the limit without using CPUMASK_OFFSTACK. The kernel already supported thousands of cores with CPUMASK_OFFSTACK and has at least since the 2.6.x days.

arp242 · 2025-07-09T12:51:33 1752065493

> Computers didn’t exceed 64 logical processors per system until around 2014.

Server systems were available with that since at least the late 90s. Server systems with >10 CPUs were already available in the mid-90s. By the early-to-mid 90s it was pretty obvious that was only going to increase and that the 64-CPU limit was going to be a problem down the line.

That said, development of NT started in 1988, and it may have been less obvious then.

p_ing · 2025-07-09T17:11:50 1752081110

"Server systems" but not server systems that Microsoft targeted. NT4 Enterprise Server (1996) only supported up to 8 sockets (some companies wrote their own HAL to exceed that limit). And 8 sockets was 8 threads with no NUMA back then, not something that would have been an issue for the purposes of this discussion.

monocasa · 2025-07-09T21:11:53 1752095513

Microsoft was absolutely wanting to target large servers at the time. They were actively trying to kill off the vendor unices in the 90s.

p_ing · 2025-07-09T22:11:32 1752099092

They successfully killed off vendor unicies in the 90s, but that was thanks to cheap x86.

monocasa · 2025-07-10T04:08:46 1752120526

That was what stuck, but supporting the big servers was also part of their multifaceted strategy. That's why the alpha, itanium, powerpc, and mips ports existed.

rsynnott · 2025-07-09T13:54:35 1752069275

The Sun E10K (up to 64 physical processors) came out in 1997.

(Now, NT for Sparc never actually became a thing, but it was certainly on Microsoft's radar at one point)

mixmastamyk · 2025-07-09T19:19:11 1752088751

SGI Origin did by 1996.

Though MS ported NT to a number of systems (mips, alpha, ppc) it wasn’t able to play in the very big leagues until later.

I agree it was a reasonable choice at the time. Few were getting mileage out of that many CPUs back then.

sidewndr46 · 2025-07-09T13:11:56 1752066716

That was actually the DEC team from what I understand, Microsoft just hired all of their OS engineers when they collapsed

meepmorp · 2025-07-09T13:16:50 1752067010

Dave Cutler left DEC in 1988 and started working on WINNT at MS, well before the collapse.

monocasa · 2025-07-09T11:15:33 1752059733

I mean, x86 didn't, but other systems had been exceeding 64 cores since the late 90s.

And x86 arguably didn't ship >64 hardware thread systems until then because NT didn't support it.

zamadatix · 2025-07-09T11:37:00 1752061020

> And x86 arguably didn't ship >64 hardware thread systems until then because NT didn't support it.

If that were the case the above system wouldn't have needed 8 sockets. With NUMA systems the app needs to be scheduling group aware anyways. The difference here really appears when you have a single socket with more than 64 hardware threads, which took until ~2019 for x86.

sidewndr46 · 2025-07-09T13:13:12 1752066792

Why would an application need to be NUMA aware on Linux? Most software I've ever written or looked at has no concept of NUMA. It works just fine.

zamadatix · 2025-07-09T16:19:21 1752077961

The same reasons it would on macOS or Windows, most people just aren't writing software which needs to worry about having a single process running many hundreds of threads across 8 sockets efficiently so it's fine to not be NUMA aware. It's not that it won't run at all, a multi-socket system is still a superset of a single socket system, just it will run much more poorly than it could in such scenarios.

The only difference with Windows is a single processor group cannot contain more than 64 cores. This is why 7-Zip needed to add processor group support - even though a 96 core Threadripper represents as a single NUMA node the software has to request assignment to 2x48 processor groups, the same as if it were 2 NUMA nodes with 48 cores each, because of the KAFFINITY limitation.

Examples of common NUMA aware Linux applications are SAP Hana and Oracle RDBMS. On multi-socket systems it can often be helpful to run postgres and such via https://linux.die.net/man/8/numactl too, even if you're not quite the scale you need full NUMA awareness in the DB. You generally also want hypervisors to pass the correct NUMA topologies to guests as well. E.g. if you have a KVM guest with 80 cores assigned on a 2x64 Epyc host setup then you want to set the guest topology to something like 2x40 cores or it'll run like crap because the guest is sees it can schedule one way but reality is another.

monocasa · 2025-07-09T11:46:33 1752061593

There were single image systems with hundreds of cores in the late 90s and thousands of cores in the early 2000s.

I absolutely stand by the fact that Intel and AMD didn't pursue high core count systems until that point because they were so focused on single core perf, in part because Windows didn't support high core counts. The end of Denmark scing forced their hand and Microsoft's processor group hack.

elzbardico · 2025-07-09T12:08:48 1752062928

AMD and Intel were focused on single core performance, because personal desktop computing was the bigger business until around mid to late 2000s.

Single core performance is really important for client computing.

monocasa · 2025-07-09T12:14:03 1752063243

They were absolutely interested in the server market as well.

zamadatix · 2025-07-09T11:59:58 1752062398

Do you have anything to say regarding NUMA for the 90s core counts though? As I said, it's not enough that there were a lot of cores - they have to be monolithically scheduled to matter. The largest UMA design I can recall was the CS6400 in 1993, to go past that they started to introduce NUMA designs.

monocasa · 2025-07-09T12:18:27 1752063507

Windows didn't handle numa either until they created processor groups, and there's all sorts reasons why you'd want to run a process (particularly on Windows which encourages single process high thread count software archs) that spans numa nodes. It's really not that big if a deal for a lot of workloads where your working set fits just fine in cache, or you take the high hatdware thread count approach of just having enough contexts in flight that you can absorb the extra memory latency in exchange for higher throughput.

zamadatix · 2025-07-09T13:23:24 1752067404

3.1 (1993) - KAFFINITY bitmask

5.0 (1999) - NUMA scheduling

6.1 (2009) - Processor Groups to have the KAFFINITY limit be per NUMA node

Xeon E7-8800 (2011) - An x86 system exceeding 64 total cores is possible (10x8 -> requires Processor Groups)

Epyc 9004 (2022) - KAFFINITY has created an artificial limit for x86 where you need to split groups more granular than NUMA

If x86 had actually hit a KAFFINITY wall then the E7-8800 even would have occured years before processor groups because >8 core CPUs are desirable regardless if you can stick 8 in a single box.

The story is really a bit reverse from the claim: NT in the 90s supported architectures which could scale past the KAFFINITY limit. NT in the late 2000s supported scaling x86 but it wouldn't have mattered until the 2010s. Ultimately KAFFINITY wasn't an annoyance until the 2020s.

Const-me · 2025-07-09T11:28:33 1752060513

> other systems had been exceeding 64 cores since the late 90s.

Windows didn’t run on these other systems, why would Microsoft care about them?

> x86 arguably didn't ship >64 hardware thread systems until then because NT didn't support it

For publicly accessible web servers, Linux overtook Windows around 2005. Then in 2006 Amazon launched EC2, and the industry started that massive transition to the clouds. Linux is better suited for clouds, due to OS licensing and other reasons.

monocasa · 2025-07-09T11:42:46 1752061366

> Windows didn’t run on these other systems, why would Microsoft care about them?

Because it was clear that high core count, single system image platforms were a viable server architecture, and NT was vying for the entire server space, intending to kill off the vendor Unices.

. For publicly accessible web servers, Linux overtook Windows around 2005. Then in 2006 Amazon launched EC2, and the industry started that massive transition to the clouds. Linux is better suited for clouds, due to OS licensing and other reasons.

Linux wasn't the only OS. Solaris and AIX were NT's competitors too back then, and supported higher core counts.

rsynnott · 2025-07-09T13:56:21 1752069381

Windows NT was originally intended to be multi-platform.

p_ing · 2025-07-09T14:24:30 1752071070

NT was and continues to be multi-platform.

That doesn't mean every platform was or would have been profitable. x86 became 'good enough' to run your mail or web server, it doomed other architectures (and commonly OSes) as the cost of x86 was vastly lower than the Alphas, PowerPCs, and so on.