The primary goal of big companies is (or has become) maintaining market dominance, but this doesn't always translate to a well-run business with great profits; that depends on internal and external factors. Maybe profits should actually have gone down due to tariffs and uncertainty, but the big companies have kept profits stable.
> It does appear that this is part of a site-wide refactor where some material got lost because the same truncation is visible in the annotation page
You mean that, if you already assume it wasn't malicious, there must be a problem that affected multiple pages because source material got lost somewhere in the build.
Why not? Congress is meant to be a check on the other branches of government (though the filibuster rule / partisan omnibus bill system has degraded its collective will, which is why they won't do anything), so pre-emptively removing the powers of Congress from the list sounds like a power grab. It's not like it's something completely irrelevant. Same for the limitations of government listed there.
Besides, the thing about organizations is that they encourage behaviors. A big corporation encourages different behaviors than a startup, and different company cultures can also encourage / optimize for different things. There's also a reason why we do expect ultimate responsibility to rest with leaders.
Power is power when it is exercised, and power can be exercised when people think you have the power. Overly formalizing things and ignoring how they play out is why Congress is a dysfunctional body today, for example. See also here for an example in practice: https://prototypingpolitics.substack.com/p/dead-letter-live-...
Yes, self-reporting has biases and estimating tasks is still a fool's errand, which is why I noted that the estimates from these surveys matched the findings from other RCT studies.
However, what doesn't get discussed enough about the METR study is that there was a spike in overall idle time as they waited for the AI to finish. I haven't run the numbers so I don't know how much of the increased completion time it accounts for, but if your cognitive load drops almost to 0, it will of course feel like your work is sped up, even though calendar time has increased. I wonder if that is the more important finding of that paper.
It's difficult to fix because the incentive is to make sure the model has the answer, not to give it lots of questions that have known answers and train it to respond "I don't know" (if you did that, you'd bias the model toward being unable to answer those specific questions). Ergo, at inference time, on questions not in the dataset, it's more inclined to make up an answer because it has seen very few "I don't know" samples in general.
> By default, Microsoft Edge provides spelling and grammar checking using Microsoft Editor. When using Microsoft Editor, Microsoft Edge sends your typed text and a service token to a Microsoft cloud service over a secure HTTPS connection. The service token doesn't contain any user-identifiable information. A Microsoft cloud service then processes the text to detect spelling and grammar errors in your text. All your typed text that's sent to Microsoft is deleted immediately after processing occurs. No data is stored for any period of time.
That's because it's obvious due to effects other than the one you're trying to observe. Which is of course the case when you're dealing with psychedelics (and of course many other drugs).