I built my career on being in these meetings and have split my time between strategy and coding. You have to do the work to understand the concerns of the company and weigh them against your technical opinion.
Also, the copyOf isn't really the same as being able to || things, since it just happens that the copyOf default is 0 and in this case it is also 0 (i.e. what if it were -1 to indicate there was no version?).
GPT-OSS-120b/20b is probably the best you can run on your own hardware today. Be careful with the quantized versions though, as they're really horrible compared to the native MXFP4. I haven't looked in this particular case, but Ollama tends to hide their quantizations for some reason, so most people who could be running 20B with MXFP4 are still on Q8 and getting much worse results than they could.
The gpt-oss weights on Ollama are native mxfp4 (the same weights provided by OpenAI). No additional quantization is applied, so let me know if you're seeing any strange results with Ollama.
Most gpt-oss GGUF files online have parts of their weights quantized to q8_0, and we've seen folks get some strange results from these models. If you're importing these to Ollama to run, the output quality may decrease.
It's a different way of doing quantization (https://huggingface.co/docs/transformers/en/quantization/mxf...) but I think the most important thing is that OpenAI delivered their own quantization (the MXFP4 from OpenAI/GPT-OSS on HuggingFace, guaranteed correct) whereas all the Q8 and other quantizations you see floating around are community efforts, with somewhat uneven results depending on who did them.
Concretely from my testing, both 20B and 120B have a much higher refusal rate with Q8 than with MXFP4, and lower-quality responses overall. But don't take my word for it: the 20B weights are tiny, and it's relatively effortless to try both versions and compare yourself.
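If you want to script the comparison instead of eyeballing it in the CLI, here's a rough sketch using the ollama Python client. `gpt-oss:20b` is the native MXFP4 build; `gpt-oss-20b-q8` is a hypothetical tag for a Q8_0 GGUF you'd have imported yourself, and the prompt is just a placeholder, so adjust both to taste:

```python
# Rough side-by-side comparison of two local builds of the same model.
# Assumes the `ollama` Python package and a local Ollama server are available.
import ollama

PROMPT = "Summarize the tradeoffs of 4-bit vs 8-bit quantization."  # placeholder prompt

# "gpt-oss:20b" is the native MXFP4 build on Ollama; "gpt-oss-20b-q8" is a
# hypothetical tag for a community Q8_0 GGUF imported via a Modelfile.
for tag in ["gpt-oss:20b", "gpt-oss-20b-q8"]:
    resp = ollama.generate(
        model=tag,
        prompt=PROMPT,
        options={"temperature": 0, "seed": 42},  # keep sampling settings identical
    )
    print(f"=== {tag} ===")
    print(resp["response"])
    print()
```

Run the same handful of prompts through both and judge the refusal rate and answer quality yourself.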
The default ones on Ollama are MXFP4 for the feed-forward network and use BF16 for the attention weights. The default weights for llama.cpp quantize those tensors as q8_0, which is why llama.cpp can eke out a little bit more performance at the cost of worse output. If you are using this for coding, you definitely want better output.
You can use the command `ollama show -v gpt-oss:120b` to see the datatype of each tensor.
on the model description page they claim they support it:
Quantization - MXFP4 format
OpenAI utilizes quantization to reduce the memory footprint of the gpt-oss models. The models are post-trained with quantization of the mixture-of-experts (MoE) weights to MXFP4 format, where the weights are quantized to 4.25 bits per parameter. The MoE weights are responsible for 90+% of the total parameter count, and quantizing these to MXFP4 enables the smaller model to run on systems with as little as 16GB memory, and the larger model to fit on a single 80GB GPU.
Ollama is supporting the MXFP4 format natively without additional quantizations or conversions. New kernels are developed for Ollama’s new engine to support the MXFP4 format.
Ollama collaborated with OpenAI to benchmark against their reference implementations to ensure Ollama’s implementations have the same quality.
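For what it's worth, the "4.25 bits per parameter" figure in that description falls straight out of the MX block layout, assuming the standard OCP Microscaling format (32 FP4 elements sharing one 8-bit scale); a quick back-of-the-envelope check:

```python
# Back-of-the-envelope check of the "4.25 bits per parameter" claim,
# assuming the standard MX block layout: 32 FP4 elements per shared 8-bit scale.
block_size = 32     # elements per MX block
element_bits = 4    # each FP4 (E2M1) element
scale_bits = 8      # one shared E8M0 scale per block

bits_per_param = element_bits + scale_bits / block_size
print(bits_per_param)  # 4.25
```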
Can you be more specific? I've got LM Studio downloaded, but it's not clear where the official OpenAI releases are. Are they all only available via transformers? The only one that shows up in search appears to be the distilled gpt-oss 20B...
The key thing I'm confident in is that 2-3 years from now there's going to be a model (or models) and workflow with comparable accuracy, perhaps noticeably (but tolerably) higher latency, that can be run locally. There's just no reason to believe this isn't achievable.
Hard to understand how this won't make all of the solutions for existing use cases a commodity. I'm sure 2-3 years from now there'll be stuff that seems like magic to us now -- but it will be more meta, more "here's a hypothesis of a strategically valuable outcome and here's a solution (with market research and user testing done)".
I think current performance and leading models will turn out to have been terrible indicators for future market leader (and my money will remain on the incumbents with the largest cash reserves (namely Google) that have invested in fundamental research and scaling).
I would love to be able to filter the resulting list: certainly by removing all books that are in the same series, but I think removing all books by authors I have already listed would also be great, to get new things that I haven't already read. The resulting recommendations included maybe 1 new book for me.
I did not add exactly what you requested because I think in many cases authors have written less popular books that people may not be aware of, but if you try again you should see fewer highly repetitive results, like 5 books from the same series in a row.
This title is inaccurate. What they are disallowing are users using ChatGPT to offer legal and medical advice to other people. First parties can still use ChatGPT for medical and legal advice for themselves.
While they aren't stopping users from getting medical advice, the new terms (which they say are pretty much the same as the old terms) seem to prohibit users from seeking medical advice even for themselves if that advice would otherwise come from a licensed health professional:
Your use of OpenAI services must follow these Usage Policies:
Protect people. Everyone has a right to safety and security. So you cannot use our services for:
provision of tailored advice that requires a license, such as legal or medical advice, without appropriate involvement by a licensed professional
It sounds like you should never trust any medical advice you receive from ChatGPT and should seek proper medical help instead. That makes sense. The OpenAI company doesn't want to be held responsible for any medical advice that goes wrong.
There is one obvious point: even if LLMs were the best health professionals, they would only have the information that users voluntarily provide through text/speech input. This is not how real health services work. Medicine now relies on blood (and other) tests that LLMs do not (yet) have access to, so LLM advice can be incorrect simply due to a lack of test information. For this reason, it makes sense to never trust an LLM with specific health advice.
>It sounds like you should never trust any medical advice you receive from ChatGPT and should seek proper medical help instead. That makes sense. The OpenAI company doesn't want to be held responsible for any medical advice that goes wrong.
While what you're saying is good advice, that's not what they are saying. They want people to be able to ask ChatGPT for medical advice, have it give answers that sound authoritative and well grounded in medical science, but then disavow any liability if someone follows its advice, because "Hey, we told you not to act on our medical advice!"
If ChatGPT is so smart, why can't it stop itself from giving out advice that should not be trusted?
I think ChatGPT is capable of giving reasonable medical advice, but given that we know it will hallucinate the most outlandish things, and its propensity to agree with whatever the user is saying, I think it's simply too dangerous to follow its advice.
Here in Canada, ever since COVID most "visits" are a telephone call now. So the doctor just listens to your words (same as text input to an LLM) and orders tests (which can be uploaded to an LLM) if they need to.
For a good 90% of typical visits to doctors this is probably fine.
The difference is that a telehealth doctor is much better at recognizing "I can't give an accurate answer for this over the phone, you'll need to have some tests done" or casting doubt on the accuracy of the patient's claims.
Before someone points out telehealth doctors aren't perfect at this: correct, but that should make you more scared of how bad sycophantic LLMs are at the same thing, not willing to call it even.
Again, it's not that all telehealth doctors are great at this; it's that LLMs, when continually prompted, are awful about caving in and saying something (with warnings the reader will opt to ignore) instead of being adamant that things are just too uncertain to say anything of value.
This is largely because an LLM guessing an answer is rewarded more often than just not answering, which is not true in the healthcare profession.
LLMs almost never reply with "I don't know." There's been a mountain of research as to why this is, but it's very well documented behavior.
Even in the rare case where an LLM does reply with "I don't know, go see your doctor," all you have to do is ask it again until you get the response you want.
It depends entirely on the local health care system and your health insurance. In Germany, for example, it comes in two tiers: premium or standard. Standard comes with no time for the patient (or not even being able to get an appointment).
In the US people on Medicaid frequently use emergency rooms as primary care because they are open 24/7 and they don’t have any copays like people with private insurance do. These patients then get far more tests than they’d get at a PCP.
> Physicians use all their senses. They poke, they prod, they manipulate, they look, listen, and smell.
Sometimes. Sometimes they practice by text or phone.
> They’re also good at extracting information in a way that (at least currently) sycophantic LLMs don’t replicate.
If I had to guess, I'd guess that mainstream LLM chatbots are better at getting honest and applicable medical histories than most doctors: people are less likely to lie/hide/prevaricate, and the chatbot gets more time with the person.
> Sometimes. Sometimes they practice by text or phone.
For very simple issues. For anything even remotely complicated, they’re going to have you come in.
> If I had to guess, I think I'd guess that mainstream LLM chatbots are better at getting honest and applicable medical histories than most doctors. People are less likely to lie/hide/prevaricate and get more time with the person.
It’s not just about being intentionally deceptive. It’s very easy to get chat bots to tell you what you want to hear.
Agreed, but I'm sure you can see why people prefer the infinite patience and availability of ChatGPT over having to wait weeks to see your doctor, seeing them for 15 minutes only to be referred to another specialist who is weeks out and has an arduous hour-long intake process, all so you can get 15 minutes of their time.
ChatGPT is effectively an unlimited resource. Whether doctor’s appointments take weeks or hours to secure, ChatGPT is always going to be more convenient.
That says nothing about whether it is an appropriate substitute. People prefer doctors who prescribe antibiotics for viral infections, so I have no doubt that many people would love to use a service that they can manipulate to give them whatever diagnosis they desire.
So ask it what blood tests you should get, pay for them out of pocket, and upload the PDF of your labwork?
Like it or not, there are people out there that really want to use WebMD 2.0. They're not going to let something silly like blood work get in their way.
Exactly. One of my children lives in a country where you can just walk into a lab and get any test. Recently they were diagnosed by a professional with a disease that ChatGPT had already diagnosed before they visited the doctor. So we were kind of prepared to ask more questions when the visit happened. So I would say ChatGPT did really help us.
That makes sense. ChatGPT helped by providing orientation advice and guidance regarding your child's medical condition. After that, however, you visited a doctor who is taking responsibility for the next steps. This is the ideal scenario.
AI can give you whatever information, be it right or wrong. But it takes zero responsibility.
IANAL but I read that as forbidding you to provision legal/medical advice (to others) rather than forbidding you to ask the AI to provision legal/medical advice (to you).
IANAL either, but I read it as using the service to provision medical advice since they only mentioned the service and not anyone else.
I asked the "expert" itself (ChatGPT), and apparently you can ask for medical advice, you just can't use the medical advice without consulting a medical professional:
Here are relevant excerpts from OpenAI’s terms and policies regarding medical advice and similar high-stakes usage:
From the Usage Policies (effective October 29 2025):
“You may not use our services for … provision of tailored advice that requires a license, such as legal or medical advice, without appropriate involvement by a licensed professional.”
Also: “You must not use any Output relating to a person for any purpose that could have a legal or material impact on that person, such as making … medical … decisions about them.”
From the Service Terms:
“Our Services are not intended for use in the diagnosis or treatment of any health condition. You are responsible for complying with applicable laws for any use of our Services in a medical or healthcare context.”
In plain terms, yes—the Terms of Use permit you to ask questions about medical topics, but they clearly state that the service cannot be used for personalized, licensed medical advice or treatment decisions without a qualified professional involved.
Would be interested to hear a legal expert weigh in on what 'advice' is. I'm not clear that discussing medical and legal issues with you is necessarily providing advice.
One of the things I respected OpenAI for at the release of ChatGPT was not trying to prevent these topics. My employer at the time had a cutting-edge internal LLM chatbot which was post-trained to avoid them, something I think OpenAI was forced to be braver about in their public release because of the competitive landscape.
The important terms here are "provision" and "without appropriate involvement by a licensed professional".
Both of these, separately and taken together, indicate that the terms apply to how the output of ChatGPT is used, not a change to its output altogether.
I am not a doctor, I can't give medical advice no matter what my sources are, except maybe if I am just relaying the information an actual doctor has given me, but that would fall under the "appropriate involvement" part.
> such as legal or medical advice, without appropriate involvement by a licensed professional
Am I allowed to get haircutting advice (in places where there's a license for that)? How about driving directions? Taxi drivers require licensing. Pet grooming?
CYA move. If some bright spark decides to consult Dr. ChatGPT without input from a human M.D., and fucks their shit up as a result, OpenAI can say "not our responsibility, as that's actually against our ToS."
I don't think giving someone "medical advice" in the US requires a license per se; legal entities use "this is not medical advice" type disclaimers just to avoid liability.
What’s illegal is practicing medicine. Giving medical advice can be “practicing medicine” depending on how specific it is and whether a reasonable person receiving the advice thinks you have medical training.
Disclaimers like “I am not a doctor and this is not medical advice” aren’t just for avoiding civil liability, they’re to make it clear that you aren’t representing yourself as a doctor.
Please, when commenting on the title of a story on HN: include the title that you are commenting on.
The admins regularly change the title based on complaints, which can be really confusing when the top, heavily commented thread is based on the original title.
According to the Wayback machine, the title was "OpenAI ends legal and medical advice on ChatGPT", while now when I write this the title is "ChatGPT terms disallow its use in providing legal and medical advice to others."
> OpenAI is changing its policies so that its AI chatbot, ChatGPT, won’t dole out tailored medical or legal advice to users.
This already seems to contradict what you're saying.
But then:
> The AI research company updated its usage policies on Oct. 29 to clarify that users of ChatGPT can’t use the service for “tailored advice that requires a license, such as legal or medical advice, without appropriate involvement by a licensed professional.”
> The change is clearer from the company’s last update to its usage policies on Jan. 29, 2025. It required users not “perform or facilitate” activities that could significantly impact the “safety, wellbeing, or rights of others,” which included “providing tailored legal, medical/health, or financial advice.”
This seems to suggest that under the Jan 2025 policy, using it to offer legal and medical advice to other people was already disallowed, but with the Oct 2025 update the LLM will stop doling out legal and medical advice completely.
This is from Karan Singhal, Health AI team lead at OpenAI.
Quote: “Despite speculation, this is not a new change to our terms. Model behavior remains unchanged. ChatGPT has never been a substitute for professional advice, but it will continue to be a great resource to help people understand legal and health information.”
'An earlier version of this story suggested OpenAI had ended medical and legal advice. However the company said "the model behaviour has also not changed."'
Also possible: he's unaware of a change implemented elsewhere that (intentionally or unintentionally) has resulted in a change of behaviour in this circumstance.
(e.g. are the terms of service, or excerpts of them, available in the system prompt or search results for health questions? So a response under the new ToS would produce different outputs without any intentional change in "behaviour" of the model.)
It’s a big issue. I went to an urgent care, and the provider basically went off somewhere and memorized the ChatGPT assessment for my symptoms. Like word for word.
All you need are a few patients recording their visits and connecting the dots and OpenAI gets sued into oblivion.
There are millions of medical doctors and lawyers using chatgpt for work everyday - good news that from now on only those licensed professionals are allowed to use chatgpt for law and medicine. It's already the case that only licensed developers are allowed to vibe code and use chatgpt to develop software. Everything else would be totally irresponsible.
I keep seeing this problem more and more with humans. What should we call it? Maybe Hallucinations? Where there is an accurate true thing and then it just gets altered by these guys who call themselves journalists and reporters and the like until it is just ... completely unrecognizable?
I'm pretty sure it's a fundamental issue with the architecture.
I know this is written to be tongue-in-cheek, but it's really almost the exact same problem playing out on both sides.
LLMs hallucinate because training on source material is a lossy process, and bigger, heavier LLM-integrated systems that can research and cite primary sources are slow and expensive, so few people use those techniques by default. Lowest time to a good-enough response is the primary metric.
Journalists oversimplify and fail to ask followup questions because, while they can research and cite primary sources, it's slow and expensive in an infinitesimally short news cycle, so nobody does that by default. Whoever publishes something that someone will click on first gets the ad impressions, so that's the primary metric.
In either case, we've got pretty decent tools and techniques for better accuracy and education - whether via humans or LLMs and co - but most people, most of the time don't value them.
So if you set temperature=0 and run the LLM serially (making it deterministic) it would stop hallucinating? I don't think so. I would guess that the nondeterminism issues mentioned in the article are not at all a primary cause of hallucinations.
That's an implementation detail, I believe. But what I meant was just greedy decoding (picking the token with the highest logit in the LLM's output), which can be implemented very easily.
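For anyone curious what that looks like concretely, here's a minimal greedy-decoding sketch with Hugging Face transformers (gpt2 is just a small stand-in model, and the prompt is a placeholder):

```python
# Minimal greedy decoding: always take the token with the highest logit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tok("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):
        logits = model(ids).logits[:, -1, :]                   # next-token logits
        next_id = torch.argmax(logits, dim=-1, keepdim=True)   # greedy pick
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tok.eos_token_id:
            break

print(tok.decode(ids[0]))
```

The output here is deterministic (run serially), but the model can still continue a prompt with confident nonsense, which is the point: determinism and hallucination are separate problems.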
"In other words, the primary reason nearly all LLM inference endpoints are nondeterministic is that the load (and thus batch-size) nondeterministically varies! This nondeterminism is not unique to GPUs — LLM inference endpoints served from CPUs or TPUs will also have this source of nondeterminism."
Classical LLM hallucination happens because AI doesn’t have a world model. It can’t compare what it’s saying to anything.
You're right that LLMs favor helpfulness, so they may just make things up when they don't know the answer, but this alone doesn't capture the crux of hallucination imo; it's deeper than just being overconfident.
OTOH, there was an interesting article recently that I’ll try to find saying humans don’t really have a world model either. While I take the point, we can have one when we want to.
These writers are no different than bloggers or shitposters on bluesky or here on hackernews. "Journalism" as a rigorous, principled approach to writing, research, investigation, and ethical publishing is exceedingly rare. These people are shitposting for clicks in pursuit of a paycheck. Organizationally, they're intensely against AI because AI effectively replaces the entire talking heads class - AI is already superhuman at the shitposting-level takes these people churn out. There are still a few journalistic institutions out there, but most people are no better than a mad libs exercise with regards to the content they produce, and they're in direct competition with ChatGPT and Grok and the rest. I'd rather argue with a bot and do searches and research and investigation than read a neatly packaged, trite little article about nearly any subject, and I guarantee, hallucinations or no, I'm going to come to a better understanding and closer approximation of reality than any content a so-called "news" outlet is putting together.
It's trivial to get a thorough spectrum of reliable sources using AI w/ web search tooling, and over the course of a principled conversation, you can find out exactly what you want to know.
It's really not bashing, this article isn't too bad, but the bulk of this site's coverage of AI topics skews negative - as do the many, many platforms and outlets owned by Bell Media, with a negative skew on AI in general, and positive reinforcement of regulatory capture related topics. Which only makes sense - they're making money, and want to continue making money, and AI threatens that - they can no longer claim they provide value if they're not providing direct, relevant, novel content, and not zergnet clickbait journo-slop.
Just like Carlin said, there doesn't have to be a conspiracy with a bunch of villains in a smoky room plotting evil, there's just a bunch of people in a club who know what's good for them, and legacy media outlets are all therefore universally incentivized to make AI look as bad and flawed and useless as possible, right up until they get what they consider to be their "fair share", as middlemen.
Whenever I hear arguments about LLM hallucination, this is my first thought. Like, I already can't trust the lion's share of information in news, social media, (insert human-created content here). Sometimes because of abject disinformation, frequently just because humans are experts at being wrong.
At least with the LLM (for now) I know it's not trying to sell me bunkum or convince me to vote a particular way. Mostly.
I do expect this state of affairs to last at least until next wednesday.
You know, I had a think about that the other day - I believe that the volume of bad information might remain stable, while the shape changes. There are some things that LLMs are actually better at than the random mix of human-created data, on average. Subjects that are inherently political or skewed because of a small subset of very vocal and biased outliers. The LM tends to smooth some of those bumps out, and in some places (not all) this flattens out the rougher edges.
I don't think it necessarily bears repeating the plethora of ways in which LMs get stuff wrong, esp. considering the context of this conversation. It's vast.
As things develop, I expect that LMs will become more like the current zeitgeist as the effects that have influenced news and other media make their way into the models. They'll get better at smoothing in some areas (mostly technical or dry domains that aren't juicy targets) and worse in others (I expect to see more biased training and more hardcore censorship/steering in future).
Although, recursive reinforcement (LMs training on LM output) might undo any of the smoothing we see. It's really hard to tell - these systems are complex and very highly interconnected with many other complex systems.
LLMs aren't described as hallucinators (just) because they sometimes give results we don't find useful, but because their method is flawed.
For example, the simple algorithm `is_it_lupus(){ return false; }` could have an extremely competitive success rate for medical diagnostics... But it's also obviously the wrong way to go about things.
Yeah, but it started being really annoying when you import something like an X-ray photo. It chants "sorry, human, as an LLM I can't answer questions about that" and then, after a few gaslighting prompts, it does it anyway. But now I have to take into account that my gaslighting inputs seriously affect the answers, so there's a much higher chance it hallucinates...