The problem is that this "comparison" gets used both ways: on one hand LLM leaders tell you it's "smarter than the smartest human", and then when it makes pretty obvious mistakes the same leaders say that even an "average" (dumb) human can/will make the same mistake.
LLMs have jagged capabilities, as AIs tend to do. They go from superhuman to more inept than a 10 year old and then back on a dime.
Really, for an AI system, the LLMs we have are surprisingly well rounded. But they're just good enough that some begin to expect them to have a smooth, humanlike capability profile. Which is a mistake.
Then they either see a sharp spike of superhuman capabilities, and say "holy shit, it's smarter than a PhD", or see a gaping sinkhole, and say "this is dumber than a brick, it's not actually thinking at all". Both are wrong but not entirely wrong. They make the right observations and draw the wrong conclusions.
It cannot be both. A system with superhuman capabilities cannot consistently make basic mistakes (like forgetting a name as it moves from generating the 1st line to the 3rd).
LLMs are a great tool, but the narrative around them is not healthy and will burn a lot of real users.
> A system with superhuman capabilities cannot make basic mistakes consistently
That sounds like a definition you just made up to fit your story. A system can both make bigger leaps in a field the smartest human is unfamiliar with and make dumber mistakes than a 10-year-old. I can say that confidently, because we have such systems. We call them LLMs.
It's like claiming that it can't both be sunny and rainy. Nevertheless, it happens.
Yeah, I don't know what your definition of a human is, but in mine, when you compare something to an average human, remembering a name is a basic expectation. If a human consistently forgets names, I'll think something is wrong with that human, that they are unable to remember names.
I think you should work with a bunch of highly respected PhD researchers. This is a quality many share - the classic “can solve super hard problems but can’t tie their shoes” is a trope because versions of it ring true. This is not to say what LLMs are doing is thinking per se, but what we do isn't magic either. We just haven't explained all the mechanisms of human thought yet. How much overlap there is between the two is up for debate, considering how little actual thinking people do day to day; most folks, most of the time, are just reacting to stimuli.
If I had to fight Deep Blue and win? I'd pick a writing contest over a game of chess.
For AIs, having incredibly narrow capabilities is the norm rather than an exception. That doesn't make those narrow superhuman AIs any less superhuman. I could spend a lifetime doing nothing but learning chess and Deep Blue would still kick my shit in on the chessboard.
I think the capability of something or somebody, in a given domain, is mostly defined by their floor, not their ceiling. This is probably true in general, but with LLMs it's especially true because they recurse on their own output. Once they get one thing wrong, they tend to start basing other things on that falsehood, to the point that you're often far better off just starting a new context instead of trying to correct them.
With humans we don't really have to care about this because our floor and our ceiling tend to be extremely close, but obviously that's not the case for LLMs. It's made especially annoying with ChatGPT, which seems intentionally designed to convince you that you're the most brilliant person to have ever lived, even when what you're saying/doing is fundamentally flawed.
Consistency drive. All LLMs have a desire for consistency, right at the very foundation of their behavior. The best tokens to predict are the ones that are consistent with the previous tokens, always.
Makes for a very good base for predicting text. Makes them learn and apply useful patterns. Makes them sharp few-shot learners. Not always good for auto-regressive reasoning though, or multi-turn instruction following, or a number of other things we want LLMs to do.
So you have to un-teach them maladaptive consistency-driven behaviors - things like defensiveness or error amplification or loops. Bring out consistency-suppressed latent capabilities - like error checking and self-correction. Stitch it all together with more RLVR. Not a complex recipe, just hard to pull off right.
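To make the "consistency drive" concrete, here is a minimal sketch of the pretraining objective (Python/PyTorch flavoured; the `model` returning per-position logits is a hypothetical stand-in, not anyone's actual code). The only thing this objective ever rewards is assigning high probability to the token that actually follows the context, i.e. the continuation most consistent with everything before it:

    import torch
    import torch.nn.functional as F

    def next_token_loss(model, tokens):
        # tokens: (batch, seq) integer ids; model(x) -> (batch, seq, vocab) logits
        logits = model(tokens[:, :-1])   # predict token t+1 from tokens <= t
        targets = tokens[:, 1:]          # the continuation that actually follows
        return F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            targets.reshape(-1),
        )

Nothing in that loss asks the model to check itself or recover from an earlier mistake; those behaviors have to be layered on afterwards, which is the point about RL.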
LLMs have no desire for anything. They're algorithms, and this anthropomorphization is nonsense.
And no, the best tokens to predict are not the ones "consistent", in whatever sense the algorithm could perceive, with the previous tokens. The goal is for them to be able to generate novel information and self-expand their 'understanding'. All you're describing is a glorified search/remix engine, which indeed is precisely what LLMs are, but not what the hype is selling them as.
In other words, the premise of the hype is that you train them on the data available just before relativity and they should be able to derive relativity. But of course that is in no way consistent with the past tokens, because it's an entirely novel concept. You can't get there simply by carrying out token prediction; you need some degree of logic, understanding, and so on - things which are entirely absent, probably irreconcilably so, from LLMs.
Not anthropomorphizing LLMs is complete and utter nonsense. They're full of complex behaviors, and most of them are copied off human behavior.
It seems to me like this is just some kind of weird coping mechanism. "The LLM is not actually intelligent" because the alternative is fucking terrifying.
No they are not copied off of human behavior in any way shape or fashion. They are simply mathematical token predictors based on relatively primitive correlations across a large set of inputs. Their success is exclusively because it turns out, by fortunate coincidence, that our languages are absurdly redundant.
Change their training content to e.g. stock prices over time and you have a market prediction algorithm. That the next token being predicted is a word doesn't suddenly make them some sort of human-like or intelligent entity.
"No they are not copied off of human behavior in any way shape or fashion."
The pre-training phase produces the next-token predictors. The post-training phase is where it's shown examples of selected human behavior for it to imitate - examples of conversation patterns, expert code production, how to argue a point... there's an enormous amount of "copying human behavior" involved in producing a useful LLM.
It's not like the pre-training dataset didn't contain any examples of human behaviors for an LLM to copy.
SFT is just a more selective process. And a lot of how it does what it does is less "teach this LLM new tricks" and more "teach this LLM how to reach into its bag of tricks and produce the right tricks at the right times".
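For what it's worth, a rough sketch of that "more selective process" (again Python/PyTorch flavoured, with a hypothetical `response_mask` marking the demonstration tokens): it's the same next-token objective as pre-training, just restricted to the human-written behaviour the model is supposed to imitate:

    import torch
    import torch.nn.functional as F

    def sft_loss(model, tokens, response_mask):
        # tokens: (batch, seq) ids; response_mask: 1 where the token is part of
        # the human demonstration (e.g. the assistant reply), 0 elsewhere.
        logits = model(tokens[:, :-1])
        targets = tokens[:, 1:]
        mask = response_mask[:, 1:].float()
        per_token = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            targets.reshape(-1),
            reduction="none",
        ).reshape(targets.shape)
        # only the demonstrated behaviour contributes to the gradient
        return (per_token * mask).sum() / mask.sum()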
I think what he's saying (and what I would say, at least) is that again all you're doing is the exact same thing - tuning the weights that drive the correlations. As an analogy, in a video game, if you code a dragon so that its elevation changes over time while you play a wing-flapping animation, you're obviously not teaching it dragon-like behaviors; you're simply creating a mimicry of the appearance of flying using relatively simple mathematical tools and 'tricks.' And indeed, even basic neural-network game bots benefit from RLHF/SFT.
No you're not. Humans started with literally nothing, not even language. We went from an era with no language and with the greatest understanding of technology being 'poke them with the pointy side' to putting a man on the Moon, unlocking the secrets of the atom, and much more. And given how inefficiently we store and transfer knowledge, we did it in what was essentially the blink of an eye.
Give an LLM the entire breadth of human knowledge at the time and it would do nothing except remix what we knew at that point in history, forever. You could give it infinite processing power, and it's still not moving beyond 'poke them with the pointy side.'
Over the last 18 years I've worked on many different projects with different material systems; my journey looked like wood framing → steel → reinforced concrete → precast → mass timber → structural steel → composite materials → modular construction + all possible foundation types (from shallow footings to deep pile systems with geotechnical analysis).
I have a lot of projects I want to design/prototype quickly, but every time I start a new structure I feel stuck and paralyzed by the number of choices, end up reading engineering journals and CE forums for 3 days on all the possible material and structural system options, and find myself exhausted even before breaking ground.
I know I should just pick one system and stick to it, but it's very hard. I spent most of my career working exclusively on residential concrete construction and I know that system inside out. I was efficient and could estimate quantities quickly. With the rest I feel like everything is at the same level of unknowns and I have zero engineering judgment.
Do you happen to experience the same thing, and how do you fight it?
Also known as premature optimization. You literally had to invent a new dataset just to show there is a difference. You are inventing problems; stop doing that!
Sometimes that is how useful jumps are made. Maybe someone will come along with a problem and the data they have just happens to have similar properties.
Rather than premature optimisation, this sort of thing is pre-emptive research - better to do it now than when you hit a performance problem and need the solution PDQ. Many useful things have come out of what started as “I wonder what if …?” playing.