> I was thinking a (very small) auxiliary chip, dedicated to the task.
Extra cost (both in design and manufacture), new potential point of failure (both for manufacture and in the field), when the CPU failing was already a single point of failure for the machine. It's not that a full-blown CPU is required, it's that you already have a full-blown CPU that can do the job, simplifying the design. Well, you might see dedicated fan control hardware on server motherboards and in other more industrially focused applications, but those often need to coordinate pumps/fans for an entire building. Reliability in that context is more about redundancy, alerting maintenance to the need for repairs, or perhaps switching to a new primary datacenter if the failure is big enough. It is not about attempting the impossible of ensuring 100% reliability for any individual component.
> CPU death spirals
Have you actually seen one of these that noticeably bogs down the fan-driving firmware/drivers and causes issues? I haven't. I've had fan failures. I've had plenty of hardware controlled fans go full apeshit 100% power to the point of being not merely a nuisance, but a problem (audio, vibration, wear+tear, ...). I've heard of building cooling failures. But I don't think I've seen so much as a blog post about the CPU getting so starved that it can't spin up the CPU fans.
And I've had fans not working hard enough - but I'd rather flip a setting in software than open up a case and go hunting for the right jumper, typically. Less disruptive - and the machine is typically usable enough I can still download/install missing software, and google appropriate documentation, which is frequently a lot more difficult to do with the case open.
---
I guess my main point here is that reliable hardware must already assume the potential for cooling failures, and that extra hardware or engineering for a minor improvement to a "purely theoretical" failure mode doesn't sound like it'd pay for itself.
100% have hardware temperature throttles and cutoffs, though. Those cut in for a lot of very real failure modes that I've not only heard of actually happening, but personally experienced. Those will pay for themselves.
> Extra cost (both in design and manufacture), new potential point of failure (both for manufacture and in the field)
"But the BOM" is a tired trope in any discussion about "why can't it do X?". I'd put my money where my mouth is and pay for it. But HW manufacturers routinely make design decisions that are simple negatives.
Motherboard manufacturers, in particular. Mobo marketing is focused on colors and flashy effects instead of on quality of production and actual function, beyond a bare minimum of tech specification. AFAICT the products are indistinguishable as no attempt is made to stand out.
> when the CPU failing was already a single point of failure for the machine
… and so is the RAM, the disk, the GPU, the northbridge, the southbridge, the eastbridge…
That's not a valid rationale for "let's adopt a fundamentally flawed design".
> Have you actually seen one of these that noticeably bogs down the fan-driving firmware/drivers and causes issues?
"Fan-driving firmware" is essentially the suggestion. It's what I haven't got.
> I've had fan failures. I've had plenty of hardware controlled fans go full apeshit 100% power to the point of being not merely a nuisance, but a problem (audio, vibration, wear+tear, ...).
No; in my current rig's 10-year lifespan, the fans have run continuously at 100%, with no fan failures.
The noise, of course, is a nuisance. That's why I'm looking for something that can control the fans in response to temperature, such that the fans can be driven at a temperature-appropriate RPM.
But without doing that in userspace, where the controller might just "die" or effectively die, for any number of reasons, outlined earlier.
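To make "temperature-appropriate RPM" concrete, here is a minimal sketch of the kind of fan curve I mean, in Python only for readability. Everything in it is an assumption for illustration: the curve points, the `STALE_AFTER_S` timeout, and the function names are made up, and nothing here talks to real hardware. The one design choice that matters is that a missing or stale reading falls back to 100% duty.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical fan curve: (temperature in deg C, PWM duty in percent),
# sorted by temperature. These numbers are illustrative, not recommendations.
FAN_CURVE: Sequence[Tuple[float, float]] = [(40.0, 30.0), (60.0, 50.0), (80.0, 100.0)]

# Assumed timeout: a reading older than this many seconds is not trusted.
STALE_AFTER_S = 5.0


def duty_for_temp(temp_c: float, curve: Sequence[Tuple[float, float]] = FAN_CURVE) -> float:
    """Linearly interpolate a PWM duty (percent) from the curve, clamped at both ends."""
    if temp_c <= curve[0][0]:
        return curve[0][1]
    if temp_c >= curve[-1][0]:
        return curve[-1][1]
    for (t0, d0), (t1, d1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)
    return 100.0  # unreachable for a sorted curve; fail towards full speed


def fail_safe_duty(temp_c: Optional[float], reading_age_s: float) -> float:
    """Return full speed when the sensor reading is missing or stale."""
    if temp_c is None or reading_age_s > STALE_AFTER_S:
        return 100.0
    return duty_for_temp(temp_c)


if __name__ == "__main__":
    for temp in (30.0, 50.0, 70.0, 90.0):
        print(temp, duty_for_temp(temp))
```

Note that the fall-back only covers a bad sensor reading. If the process running this loop is itself dead or starved, none of it executes, which is exactly why I'd want the same logic in firmware or a small dedicated controller that defaults the fans to full speed when it stops hearing from the host.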
> But I don't think I've seen so much as a blog post about the CPU getting so starved that it can't spin up the CPU fans.
I couldn't find any posts reasoning about anything. Either it's not a problem, or just nobody is thinking.
> and that extra hardware or engineering for a minor improvement to a "purely theoretical" failure mode doesn't sound like it'd pay for itself.
Yeah, I was just trying to learn before attempting, literally, "IDK, try it and see if you trip the critical temp cutoff."
Yeah, no doubt the various hardware safeties will save you (although it might put some undue stress on stuff to be at >100℃…) but it might also leave you with a rather hard-to-debug situation if you start experiencing stalls or shutdowns.
… and it's only in desktops that this problem seems to exist. On every laptop I've owned, fan control is in response to temp. (And not handled in userspace, although I presume I could download any of the various fan control programs and have it be that way, but the point is that the HW is, out of the box, doing the sane thing.)