Hacker News | HarHarVeryFunny's comments

The entire history of RL-trained "reasoning models" from o1 to DeepSeek-R1 is basically just a year old!

Sure, but that's basically the same as saying that we'll have human-equivalent AI one day (let's not call it AGI, since that means something different to everyone who uses it), and then everything that humans can do could be done by AI (whether or not it will be is another question).

So, yes, ONE DAY, AI will be doing all sorts of things (from POTUS and CEO on down), once it is capable of on-the-job learning, picking up new skills, and everything else that isn't just language model + agent + RAG. In the meantime, the core competence of an LLM is blinkers-on (context-only) execution - coding according to tasks (part of some plan) assigned to it by a human who, just like a lead assigning tasks to human team members, is aware of what it can and cannot do, and is capable of overseeing the project.


I used Qt back in the day, pre-Nokia, when it was just QtWidgets for cross-platform (Linux/Windows/Mac) desktop apps. I just wanted a decent C++ library/API to create the GUI for a Linux app (a real-time spectrogram). It was great for this, although I was never a fan of MOC - I wish they had committed to a pure/native C++ design.

For me, Qt lost its way when Trolltech was acquired by Nokia and the focus became mobile rather than desktop, with different UI requirements resulting in QML/QtQuick being added.

Maybe the earlier addition of QtScript (or even MOC!) was a foreshadowing of what was to come, but in any case, what had been a great cross-platform desktop UI toolkit, and the primary C++ one for Linux (with GTK being more C focused), ended up orphaning its desktop roots to focus on mobile instead, having become a sprawling mish-mash of languages, GUI component technologies and scripting.


I wonder what lab rats would experience - lots of tiny rats?

What's the point?

None of this is individually difficult, but an actual human being had the motivation and talent to bring it all together in 7 days, which is impressive.

So what if an LLM can create the same components if you tell it to. It's a bit like someone sharing a handknit sweater they just made, and you counter with "Well, here's a machine made one I bought in Walmart, made in 5 min in China".


Is it impressive in the way Max Verstappen winning the F1 World Championship in 2023 was impressive, operating at the absolute limit under pressure and getting paid beaucoup bux for it? Or is it impressive in the way your kid is impressive the first time they manage to draw stick figures and a house with crayons? Those are both real achievements, but they are impressive in completely different ways, and the value of the work produced is wildly different. I might fly to Vegas along with 300,000 other people and pay for hotel rooms and pay to watch some shows while I'm there as well as to watch him race, but (and don't take this the wrong way) I ain't gonna do that to watch your kid draw with crayons.

The difference is the baseline. Once the default outcome is cheap, fast, and good enough, the human effort stops standing out in a way that matters. At that point, pointing at the Walmart sweater is not missing the point, it is the point.


So you don't put any value on being human, learning skills, showing creativity, doing inspiring things?

Should people stop playing chess just because a free chess engine can trounce everyone on the planet?

Humans can be awesome. Machines are just machines.


Makes no sense - why would Salesforce's customers care whether the company is using AI or not, other than when it impacts them (the customer), such as through worse customer service.

This just seems like a poor decision made by C-suite folk who were neither AI-savvy enough to understand the limits of the tech, nor smart enough to run a meaningful trial to evaluate it. A case of wishful thinking winning out over rational evaluation.


If you consider the extent to which our economy has become financialized, then you see these decisions have little to do with providing a product for customers but rather a stock for investors.

The product is the press release.

I figured the messaging is targeted more at investors than customers.

I need to talk to Jim, where is Jim?

Plus:

Waymos are safer - they have Lidar and Radar in addition to vision

Waymos have human fallback to remote operators

Google takes responsibility for Waymos, vs Teslas being privately owned

Google took a more regulator-savvy, incremental approach


Teslas currently have a driver in the front who could take over in these situations.

Waymo said they normally handle traffic light outages as 4-way stops, but sometimes call home for help - perhaps if they detect someone in the intersection directing traffic?

Makes you wonder in general how these cars are designed to handle police directing traffic.


It kind of makes sense. Why program or train on such a rare occurrence? Just send it off to a human to interpret and be done with it. If that's the case then Tesla is closer to Waymo than previously thought. Maybe even ahead.

I don't think traffic light outages (e.g. flashing yellow) or police directing traffic at intersections is that rare, but regardless these cars do need to handle it in a safe and legal manner, which either means recognizing police gestures in a reliable way, or phoning home.

We know that Waymos phone home when needed, but it's not clear how Tesla handles these situations. I'm not sure how you conclude anything about Tesla based on the current "safety monitor" humans in the cars - that's just a temporary measure until they get approval to go autonomous.


I seem to remember as a kid that cops would be directing traffic often if a signal was out or malfunctioning. I haven't seen that in years. The only time I see anyone directing traffic is around accidents, construction zones, or special events.

I can conclude based on using FSD every single day. I've hit issues just like this, as well as police directing traffic. And it's completely fine.

Googling for this, apparently Tesla do try to recognize police gestures, and are getting better at it.

I wonder who gets the ticket when a driverless car does break the law and get stopped by police? If it's a taxi service (maybe without a passenger in the car) then maybe it'd be the service, but that's a bit different than issuing a traffic ticket to a driver (where there are points as well as a fine).

What if it's a privately owned car - would the ticket go to the car owner, or to the company that built the car?!


A base LLM that has only been pre-trained (no RL = reinforcement learning) is not "planning" very far ahead. It has only been trained to minimize prediction errors on the next word it is generating. You might consider this a bit like a person who speaks before thinking/planning, or a freestyle rapper spitting out words so fast they only have time to maintain continuity with what they've just said, not plan ahead.
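
To make that concrete, pre-training boils down to minimizing next-token cross-entropy over the training corpus. A minimal PyTorch-style sketch (the model interface and shapes here are illustrative assumptions, not any particular lab's code):

    import torch
    import torch.nn.functional as F

    def pretraining_loss(model, tokens):
        # tokens: (batch, seq_len) integer token ids from the corpus
        inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift by one: predict each next token
        logits = model(inputs)  # assumed to return (batch, seq_len - 1, vocab_size)
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))

Note there is no term here that looks more than one token ahead - the only training signal is the very next word.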

The purpose of RL (applied to LLMs as a second "post-training" stage after pre-training) is to train the LLM to act as if it had planned ahead before "speaking", so that rather than just focusing on the next word it will instead try to choose a sequence of words that will steer the output towards a particular type of response that had been rewarded during RL training.
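
Mechanically, the simplest version of this is a REINFORCE-style policy gradient: sample a whole response, score it, and scale the sequence's log-probability by that score. A schematic sketch (sample_with_logprob is an assumed helper on the model, and reward_fn stands in for whatever the RL scheme provides):

    def rl_step(model, optimizer, prompt, reward_fn):
        # Sample a complete response and its total log-probability under the model
        response, logprob = model.sample_with_logprob(prompt)  # assumed helper
        # High-reward sequences get their probability pushed up as a whole,
        # which is what nudges next-word choices "as if" the model had planned ahead
        loss = -reward_fn(prompt, response) * logprob
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

Real systems use PPO/GRPO-style variants with baselines and KL penalties, but the shape of the signal is the same: the reward attaches to the whole sequence, not to the next token.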

There are two types of RL generally applied to LLMs.

1) RLHF - RL from Human Feedback, where the goal is to generate responses that during A/B testing humans had indicated a preference for (for whatever reason).

2) RLVR - RL with Verifiable Rewards, used to promote the appearance of reasoning in domains like math and programming where the LLM's output can be verified in some way (e.g. math result or program output checked); a toy example of such a reward is sketched below.

Without RLHF (as was the case pre-ChatGPT) the output of an LLM can be quite unhinged. Without RLVR, aka RL for reasoning, the ability of the model to reason (or give the appearance of reasoning) is a function of pre-training, and won't have the focus (like putting blinkers on a horse) to narrow generative output to achieve the desired goal.
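
For a feel of how simple an RLVR reward can be, here's a toy exact-match check for a math problem (the "Answer:" convention and the naive parsing are just assumptions for illustration - real pipelines are much fussier about answer formats):

    def extract_final_answer(text: str) -> str:
        # Naive parser: take whatever follows the last "Answer:" marker
        return text.rsplit("Answer:", 1)[-1].strip()

    def verifiable_reward(model_output: str, correct_answer: str) -> float:
        # 1.0 if the final answer matches the known-correct one, else 0.0;
        # this scalar is the reward the RL stage optimizes
        return 1.0 if extract_final_answer(model_output) == correct_answer.strip() else 0.0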


What are some of the use cases for Claude Code + LSP? What does LSP support let you do, or do better, that Claude Code couldn't do by itself?
