I built my career on being in these meetings and have split my time between strategy and coding. You have to do the work to understand the concerns of the company and weigh them against your technical opinion.
Also, the copyOf isn't really the same as being able to || things, since it just happens that the copyOf default is 0 and in this case it is also 0 (i.e. what if it were -1 to indicate there was no version?).
GPT-OSS-120b/20b is probably the best you can run on your own hardware today. Be careful with the quantized versions though, as they're really horrible compared to the native MXFP4. I haven't looked in this particular case, but Ollama tends to hide their quantizations for some reason, so most people who could be running 20B with MXFP4 are still on Q8 and getting much worse results than they could.
The gpt-oss weights on Ollama are native mxfp4 (the same weights provided by OpenAI). No additional quantization is applied, so let me know if you're seeing any strange results with Ollama.
Most gpt-oss GGUF files online have parts of their weights quantized to q8_0, and we've seen folks get some strange results from these models. If you're importing these to Ollama to run, the output quality may decrease.
It's a different way of doing quantization (https://huggingface.co/docs/transformers/en/quantization/mxf...) but I think the most important thing is that OpenAI delivered their own quantization (the MXFP4 from OpenAI/GPT-OSS on HuggingFace, guaranteed correct) whereas all the Q8 and other quantizations you see floating around are community efforts, with somewhat uneven results depending on who did them.
Concretely from my testing, both 20B and 120B have a much higher refusal rate with Q8 than with MXFP4, and lower-quality responses overall. But don't take my word for it: the 20B weights are tiny, and it's relatively effortless to try both versions and compare yourself.
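If you want to script the comparison instead of eyeballing it in the CLI, here's a rough sketch using the ollama Python client. `gpt-oss:20b` is the native MXFP4 build; `gpt-oss-20b-q8` is a hypothetical tag for a Q8_0 GGUF you'd have imported yourself, and the prompt is just a placeholder, so adjust both to taste:

```python
# Rough side-by-side comparison of two local builds of the same model.
# Assumes the `ollama` Python package and a local Ollama server are available.
import ollama

PROMPT = "Summarize the tradeoffs of 4-bit vs 8-bit quantization."  # placeholder prompt

# "gpt-oss:20b" is the native MXFP4 build on Ollama; "gpt-oss-20b-q8" is a
# hypothetical tag for a community Q8_0 GGUF imported via a Modelfile.
for tag in ["gpt-oss:20b", "gpt-oss-20b-q8"]:
    resp = ollama.generate(
        model=tag,
        prompt=PROMPT,
        options={"temperature": 0, "seed": 42},  # keep sampling settings identical
    )
    print(f"=== {tag} ===")
    print(resp["response"])
    print()
```

Run the same handful of prompts through both and judge the refusal rate and answer quality yourself.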
The default ones on Ollama are MXFP4 for the feed-forward network and use BF16 for the attention weights. The default weights for llama.cpp quantize those tensors as q8_0, which is why llama.cpp can eke out a little bit more performance at the cost of worse output. If you are using this for coding, you definitely want better output.
You can use the command `ollama show -v gpt-oss:120b` to see the datatype of each tensor.
on the model description page they claim they support it:
Quantization - MXFP4 format
OpenAI utilizes quantization to reduce the memory footprint of the gpt-oss models. The models are post-trained with quantization of the mixture-of-experts (MoE) weights to MXFP4 format, where the weights are quantized to 4.25 bits per parameter. The MoE weights are responsible for 90+% of the total parameter count, and quantizing these to MXFP4 enables the smaller model to run on systems with as little as 16GB memory, and the larger model to fit on a single 80GB GPU.
Ollama is supporting the MXFP4 format natively without additional quantizations or conversions. New kernels are developed for Ollama’s new engine to support the MXFP4 format.
Ollama collaborated with OpenAI to benchmark against their reference implementations to ensure Ollama’s implementations have the same quality.
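For what it's worth, the "4.25 bits per parameter" figure in that description falls straight out of the MX block layout, assuming the standard OCP Microscaling format (32 FP4 elements sharing one 8-bit scale); a quick back-of-the-envelope check:

```python
# Back-of-the-envelope check of the "4.25 bits per parameter" claim,
# assuming the standard MX block layout: 32 FP4 elements per shared 8-bit scale.
block_size = 32     # elements per MX block
element_bits = 4    # each FP4 (E2M1) element
scale_bits = 8      # one shared E8M0 scale per block

bits_per_param = element_bits + scale_bits / block_size
print(bits_per_param)  # 4.25
```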
Can you be more specific? I've got LM Studio downloaded, but it's not clear where the official OpenAI releases are. Are they all only available via transformers? The only one that shows up in search appears to be the distilled gpt-oss 20B...
The key thing I'm confident in is that 2-3 years from now there's going to be a model (or models) and workflow with comparable accuracy, perhaps noticeably (but tolerably) higher latency, that can be run locally. There's just no reason to believe this isn't achievable.
Hard to understand how this won't make all of the solutions for existing use cases a commodity. I'm sure 2-3 years from now there'll be stuff that seems like magic to us now -- but it will be more meta, more "here's a hypothesis of a strategically valuable outcome and here's a solution (with market research and user testing done)".
I think current performance and leading models will turn out to have been terrible indicators for future market leader (and my money will remain on the incumbents with the largest cash reserves (namely Google) that have invested in fundamental research and scaling).
I would love to be able to filter the resulting list: certainly by removing all books that are in the same series, but I think removing all books by authors I have already listed would also be great, to get new things that I haven't already read. The resulting recommendations included maybe 1 new book for me.
I did not add exactly what you requested because I think in many cases authors have written less popular books that people may not be aware of, but if you try again you should see fewer highly repetitive results, like 5 books from the same series in a row.
This title is inaccurate. What they are disallowing are users using ChatGPT to offer legal and medical advice to other people. First parties can still use ChatGPT for medical and legal advice for themselves.
While they aren't stopping users from getting medical advice, the new terms (which they say are pretty much the same as the old terms) seem to prohibit users from seeking medical advice even for themselves if that advice would otherwise come from a licensed health professional:
Your use of OpenAI services must follow these Usage Policies:
Protect people. Everyone has a right to safety and security. So you cannot use our services for:
provision of tailored advice that requires a license, such as legal or medical advice, without appropriate involvement by a licensed professional
It sounds like you should never trust any medical advice you receive from ChatGPT and should seek proper medical help instead. That makes sense. The OpenAI company doesn't want to be held responsible for any medical advice that goes wrong.
There is one obvious point: even if LLMs were the best health professionals, they would only have the information that users voluntarily provide through text/speech input. This is not how real health services work. Medicine now relies on blood (and other) tests that LLMs do not (yet) have access to, so LLM advice can be incorrect simply due to a lack of test information. For this reason, it makes sense to never trust an LLM with specific health advice.
>It sounds like you should never trust any medical advice you receive from ChatGPT and should seek proper medical help instead. That makes sense. The OpenAI company doesn't want to be held responsible for any medical advice that goes wrong.
While what you're saying is good advice, that's not what they are saying. They want people to be able to ask ChatGPT for medical advice, have it give answers that sound authoritative and well grounded in medical science, but then disavow any liability if someone follows its advice, because "Hey, we told you not to act on our medical advice!"
If ChatGPT is so smart, why can't it stop itself from giving out advice that should not be trusted?
I think ChatGPT is capable of giving reasonable medical advice, but given that we know it will hallucinate the most outlandish things, and its propensity to agree with whatever the user is saying, I think it's simply too dangerous to follow its advice.
Here in Canada, ever since COVID most "visits" are a telephone call now. So the doctor just listens to your words (same as text input to an LLM) and orders tests (which can be uploaded to an LLM) if they need to.
For a good 90% of typical visits to doctors this is probably fine.
The difference is that a telehealth doctor is much better at recognizing "I can't give an accurate answer for this over the phone, you'll need to have some tests done" or casting doubt on the accuracy of the patient's claims.
Before someone points out telehealth doctors aren't perfect at this: correct, but that should make you more scared of how bad sycophantic LLMs are at the same thing, not willing to call it even.
Again, it's not that all telehealth doctors are great at this; it's that LLMs, when continually prompted, are awful about caving in and saying something (with warnings the reader will opt to ignore) instead of being adamant that things are just too uncertain to say anything of value.
This is largely because an LLM guessing an answer is rewarded more often than just not answering, which is not true in the healthcare profession.
LLMs almost never reply with "I don't know." There's been a mountain of research as to why this is, but it's very well documented behavior.
Even in the rare case where an LLM does reply with "I don't know, go see your doctor," all you have to do is ask it again until you get the response you want.
It depends entirely on the local health care system and your health insurance. In Germany, for example, it comes in two tiers: premium or standard. Standard comes with no time for the patient (or not even being able to get an appointment).
In the US people on Medicaid frequently use emergency rooms as primary care because they are open 24/7 and they don’t have any copays like people with private insurance do. These patients then get far more tests than they’d get at a PCP.
> Physicians use all their senses. They poke, they prod, they manipulate, they look, listen, and smell.
Sometimes. Sometimes they practice by text or phone.
> They’re also good at extracting information in a way that (at least currently) sycophantic LLMs don’t replicate.
If I had to guess, I'd guess that mainstream LLM chatbots are better at getting honest and applicable medical histories than most doctors: people are less likely to lie/hide/prevaricate, and the chatbot gets more time with the person.
> Sometimes. Sometimes they practice by text or phone.
For very simple issues. For anything even remotely complicated, they’re going to have you come in.
> If I had to guess, I think I'd guess that mainstream LLM chatbots are better at getting honest and applicable medical histories than most doctors. People are less likely to lie/hide/prevaricate and get more time with the person.
It’s not just about being intentionally deceptive. It’s very easy to get chat bots to tell you what you want to hear.
Agreed, but I'm sure you can see why people prefer the infinite patience and availability of ChatGPT over having to wait weeks to see your doctor, seeing them for 15 minutes only to be referred to another specialist who is weeks out and has an arduous hour-long intake process, all so you can get 15 minutes of their time.
ChatGPT is effectively an unlimited resource. Whether doctor’s appointments take weeks or hours to secure, ChatGPT is always going to be more convenient.
That says nothing about whether it is an appropriate substitute. People prefer doctors who prescribe antibiotics for viral infections, so I have no doubt that many people would love to use a service that they can manipulate to give them whatever diagnosis they desire.
So ask it what blood tests you should get, pay for them out of pocket, and upload the PDF of your labwork?
Like it or not, there are people out there that really want to use WebMD 2.0. They're not going to let something silly like blood work get in their way.
Exactly. One of my children lives in a country where you can just walk into a lab and get any test. Recently they were diagnosed by a professional with a disease that ChatGPT had already diagnosed before they visited the doctor. So we were kind of prepared to ask more questions when the visit happened. So I would say ChatGPT did really help us.
That makes sense. ChatGPT helped by providing orientation advice and guidance regarding your child's medical condition. After that, however, you visited a doctor who is taking responsibility for the next steps. This is the ideal scenario.
AI can give you whatever information, be it right or wrong. But it takes zero responsibility.
IANAL but I read that as forbidding you to provision legal/medical advice (to others) rather than forbidding you to ask the AI to provision legal/medical advice (to you).
IANAL either, but I read it as using the service to provision medical advice since they only mentioned the service and not anyone else.
I asked the "expert" itself (ChatGPT), and apparently you can ask for medical advice, you just can't use the medical advice without consulting a medical professional:
Here are relevant excerpts from OpenAI’s terms and policies regarding medical advice and similar high-stakes usage:
From the Usage Policies (effective October 29 2025):
“You may not use our services for … provision of tailored advice that requires a license, such as legal or medical advice, without appropriate involvement by a licensed professional.”
Also: “You must not use any Output relating to a person for any purpose that could have a legal or material impact on that person, such as making … medical … decisions about them.”
From the Service Terms:
“Our Services are not intended for use in the diagnosis or treatment of any health condition. You are responsible for complying with applicable laws for any use of our Services in a medical or healthcare context.”
In plain terms, yes—the Terms of Use permit you to ask questions about medical topics, but they clearly state that the service cannot be used for personalized, licensed medical advice or treatment decisions without a qualified professional involved.
Would be interested to hear a legal expert weigh in on what 'advice' is. I'm not clear that discussing medical and legal issues with you is necessarily providing advice.
One of the things I respected OpenAI for at the release of ChatGPT was not trying to prevent these topics. My employer at the time had a cutting-edge internal LLM chatbot which was post-trained to avoid them, something I think OpenAI was forced to be braver about in their public release because of the competitive landscape.
The important terms here are "provision" and "without appropriate involvement by a licensed professional".
Both of these, separately and taken together, indicate that the terms apply to how the output of ChatGPT is used, not a change to its output altogether.
I am not a doctor, I can't give medical advice no matter what my sources are, except maybe if I am just relaying the information an actual doctor has given me, but that would fall under the "appropriate involvement" part.
> such as legal or medical advice, without appropriate involvement by a licensed professional
Am I allowed to get haircutting advice (in places where there's a license for that)? How about driving directions? Taxi drivers require licensing. Pet grooming?
CYA move. If some bright spark decides to consult Dr. ChatGPT without input from a human M.D., and fucks their shit up as a result, OpenAI can say "not our responsibility, as that's actually against our ToS."
I don't think giving someone "medical advice" in the US requires a license per se; legal entities use "this is not medical advice" type disclaimers just to avoid liability.
What’s illegal is practicing medicine. Giving medical advice can be “practicing medicine” depending on how specific it is and whether a reasonable person receiving the advice thinks you have medical training.
Disclaimers like “I am not a doctor and this is not medical advice” aren’t just for avoiding civil liability, they’re to make it clear that you aren’t representing yourself as a doctor.
Please, when commenting on the title of a story on HN: include the title that you are commenting on.
The admins regularly change the title based on complaints, which can be really confusing when the top, heavily commented thread is based on the original title.
According to the Wayback machine, the title was "OpenAI ends legal and medical advice on ChatGPT", while now when I write this the title is "ChatGPT terms disallow its use in providing legal and medical advice to others."
> OpenAI is changing its policies so that its AI chatbot, ChatGPT, won’t dole out tailored medical or legal advice to users.
This already seems to contradict what you're saying.
But then:
> The AI research company updated its usage policies on Oct. 29 to clarify that users of ChatGPT can’t use the service for “tailored advice that requires a license, such as legal or medical advice, without appropriate involvement by a licensed professional.”
> The change is clearer from the company’s last update to its usage policies on Jan. 29, 2025. It required users not “perform or facilitate” activities that could significantly impact the “safety, wellbeing, or rights of others,” which included “providing tailored legal, medical/health, or financial advice.”
This seems to suggest that under the Jan 2025 policy, using it to offer legal and medical advice to other people was already disallowed, but with the Oct 2025 update the LLM will stop doling out legal and medical advice completely.
This is from Karan Singhal, Health AI team lead at OpenAI.
Quote: “Despite speculation, this is not a new change to our terms. Model behavior remains unchanged. ChatGPT has never been a substitute for professional advice, but it will continue to be a great resource to help people understand legal and health information.”
'An earlier version of this story suggested OpenAI had ended medical and legal advice. However the company said "the model behaviour has also not changed."'
Also possible: he's unaware of a change implemented elsewhere that (intentionally or unintentionally) has resulted in a change of behaviour in this circumstance.
(e.g. are the terms of service, or excerpts of them, available in the system prompt or search results for health questions? So a response under the new ToS would produce different outputs without any intentional change in "behaviour" of the model.)
It’s a big issue. I went to an urgent care, and the provider basically went off somewhere and memorized the ChatGPT assessment for my symptoms. Like word for word.
All you need are a few patients recording their visits and connecting the dots and OpenAI gets sued into oblivion.
There are millions of medical doctors and lawyers using chatgpt for work everyday - good news that from now on only those licensed professionals are allowed to use chatgpt for law and medicine. It's already the case that only licensed developers are allowed to vibe code and use chatgpt to develop software. Everything else would be totally irresponsible.
I keep seeing this problem more and more with humans. What should we call it? Maybe Hallucinations? Where there is an accurate true thing and then it just gets altered by these guys who call themselves journalists and reporters and the like until it is just ... completely unrecognizable?
I'm pretty sure it's a fundamental issue with the architecture.
I know this is written to be tongue-in-cheek, but it's really almost the exact same problem playing out on both sides.
LLMs hallucinate because training on source material is a lossy process, and bigger, heavier LLM-integrated systems that can research and cite primary sources are slow and expensive, so few people use those techniques by default. Lowest time to a good-enough response is the primary metric.
Journalists oversimplify and fail to ask followup questions because, while they can research and cite primary sources, it's slow and expensive in an infinitesimally short news cycle, so nobody does that by default. Whoever publishes something that someone will click on first gets the ad impressions, so that's the primary metric.
In either case, we've got pretty decent tools and techniques for better accuracy and education - whether via humans or LLMs and co - but most people, most of the time don't value them.
So if you set temperature=0 and run the LLM serially (making it deterministic) it would stop hallucinating? I don't think so. I would guess that the nondeterminism issues mentioned in the article are not at all a primary cause of hallucinations.
That's an implementation detail, I believe. But what I meant was just greedy decoding (picking the token with the highest logit in the LLM's output), which can be implemented very easily.
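For anyone curious what that looks like concretely, here's a minimal greedy-decoding sketch with Hugging Face transformers (gpt2 is just a small stand-in model, and the prompt is a placeholder):

```python
# Minimal greedy decoding: always take the token with the highest logit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tok("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):
        logits = model(ids).logits[:, -1, :]                   # next-token logits
        next_id = torch.argmax(logits, dim=-1, keepdim=True)   # greedy pick
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tok.eos_token_id:
            break

print(tok.decode(ids[0]))
```

The output here is deterministic (run serially), but the model can still continue a prompt with confident nonsense, which is the point: determinism and hallucination are separate problems.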
"In other words, the primary reason nearly all LLM inference endpoints are nondeterministic is that the load (and thus batch-size) nondeterministically varies! This nondeterminism is not unique to GPUs — LLM inference endpoints served from CPUs or TPUs will also have this source of nondeterminism."
Classical LLM hallucination happens because AI doesn’t have a world model. It can’t compare what it’s saying to anything.
You're right that LLMs favor helpfulness, so they may just make things up when they don't know the answer, but this alone doesn't capture the crux of hallucination imo; it's deeper than just being overconfident.
OTOH, there was an interesting article recently that I’ll try to find saying humans don’t really have a world model either. While I take the point, we can have one when we want to.
These writers are no different than bloggers or shitposters on bluesky or here on hackernews. "Journalism" as a rigorous, principled approach to writing, research, investigation, and ethical publishing is exceedingly rare. These people are shitposting for clicks in pursuit of a paycheck. Organizationally, they're intensely against AI because AI effectively replaces the entire talking heads class - AI is already superhuman at the shitposting-level takes these people churn out. There are still a few journalistic institutions out there, but most people are no better than a mad libs exercise with regards to the content they produce, and they're in direct competition with ChatGPT and Grok and the rest. I'd rather argue with a bot and do searches and research and investigation than read a neatly packaged, trite little article about nearly any subject, and I guarantee, hallucinations or no, I'm going to come to a better understanding and closer approximation of reality than any content a so-called "news" outlet is putting together.
It's trivial to get a thorough spectrum of reliable sources using AI w/ web search tooling, and over the course of a principled conversation, you can find out exactly what you want to know.
It's really not bashing, this article isn't too bad, but the bulk of this site's coverage of AI topics skews negative - as do the many, many platforms and outlets owned by Bell Media, with a negative skew on AI in general, and positive reinforcement of regulatory capture related topics. Which only makes sense - they're making money, and want to continue making money, and AI threatens that - they can no longer claim they provide value if they're not providing direct, relevant, novel content, and not zergnet clickbait journo-slop.
Just like Carlin said, there doesn't have to be a conspiracy with a bunch of villains in a smoky room plotting evil, there's just a bunch of people in a club who know what's good for them, and legacy media outlets are all therefore universally incentivized to make AI look as bad and flawed and useless as possible, right up until they get what they consider to be their "fair share", as middlemen.
Whenever I hear arguments about LLM hallucination, this is my first thought. Like, I already can't trust the lion's share of information in news, social media, (insert human-created content here). Sometimes because of abject disinformation, frequently just because humans are experts at being wrong.
At least with the LLM (for now) I know it's not trying to sell me bunkum or convince me to vote a particular way. Mostly.
I do expect this state of affairs to last at least until next wednesday.
You know, I had a think about that the other day - I believe that the volume of bad information might remain stable, while the shape changes. There are some things that LLMs are actually better at than the random mix of human-created data, on average. Subjects that are inherently political or skewed because of a small subset of very vocal and biased outliers. The LM tends to smooth some of those bumps out, and in some places (not all) this flattens out the rougher edges.
I don't think it necessarily bears repeating the plethora of ways in which LMs get stuff wrong, esp. considering the context of this conversation. It's vast.
As things develop, I expect that LMs will become more like the current zeitgeist as the effects that have influenced news and other media make their way into the models. They'll get better at smoothing in some areas (mostly technical or dry domains that aren't juicy targets) and worse in others (I expect to see more biased training and more hardcore censorship/steering in future).
Although, recursive reinforcement (LMs training on LM output) might undo any of the smoothing we see. It's really hard to tell - these systems are complex and very highly interconnected with many other complex systems.
LLMs aren't described as hallucinators (just) because they sometimes give results we don't find useful, but because their method is flawed.
For example, the simple algorithm `is_it_lupus(){ return false; }` could have an extremely competitive success rate for medical diagnostics... But it's also obviously the wrong way to go about things.
Yeah, but it started being really annoying when you import something like an X-ray photo. It chants "sorry, human, as an LLM I can't answer questions about that" and then, after a few gaslighting prompts, it does it anyway. But now I have to take into account that my gaslighting inputs seriously affect the answers, so there's a much higher chance it hallucinates...