Hacker News | mkagenius's comments

I (non-autistic) would love to be friends with someone like you.

It was a little weird that they didn't suspect phishing from the get-go.

I always wondered how they achieved this - is it just retries while generating tokens, where as soon as they find a mismatch they retry? Or is the model itself trained extremely well in this version of 4.5?

They're using the same trick OpenAI have been using for a while: they compile a grammar and then have that running as part of token inference, such that only tokens that fit the grammar are selected as the next token.

This trick has also been in llama.cpp for a couple of years: https://til.simonwillison.net/llms/llama-cpp-python-grammars
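
For anyone curious what that looks like in practice, here's a rough sketch using llama-cpp-python with a tiny GBNF grammar (the model path is a placeholder and the grammar is deliberately trivial):

    from llama_cpp import Llama, LlamaGrammar

    # A GBNF grammar that only allows a yes/no answer followed by a short reason.
    grammar = LlamaGrammar.from_string(r'''
    root   ::= answer ": " reason
    answer ::= "yes" | "no"
    reason ::= [a-zA-Z ,.]+
    ''')

    llm = Llama(model_path="./models/your-model.gguf")  # placeholder path

    out = llm(
        "Is the sky blue? Answer yes or no with a short reason.",
        grammar=grammar,  # only tokens that keep the output inside the grammar get sampled
        max_tokens=64,
    )
    print(out["choices"][0]["text"])

At every decoding step the sampler masks out tokens that would take the output outside the grammar, which is the mechanism described above.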



Yeah, and now there are mature OSS solutions like outlines and xgrammar, so it's even weirder that only now is this supported by Anthropic.

I reaaaaally wish we could provide an EBNF grammar like llama.cpp. JSON Schema covers far fewer use cases for me.

What are some examples that you can’t express in json schema?

Anything not JSON

This makes me wonder if there are cases where one would want the LLM to generate a syntactically invalid response (which could be identified as such) rather than guarantee syntactic validity at the potential cost of semantic accuracy.

How sure are you that OpenAI is using that?

I would have suspected it too, but I’ve been struggling with OpenAI returning syntactically invalid JSON when provided with a simple pydantic class (a list of strings), which shouldn’t be possible unless they have a glaring error in their grammar.


You might be using JSON mode, which doesn't guarantee a schema will be followed, or structured outputs not in strict mode. It is possible to get the property that the response is either a valid instance of the schema or an error (e.g. for a refusal).

How do you activate strict mode when using pydantic schemas? It doesn't look like that is a valid parameter to me.

No, I don't get refusals, I see literally invalid json, like: `{"field": ["value...}`


https://github.com/guidance-ai/llguidance

> 2025-05-20 LLGuidance shipped in OpenAI for JSON Schema


OpenAI is using [0] LLGuidance [1]. You need to set strict:true in your request for schema validation to kick in though.

[0] https://platform.openai.com/docs/guides/function-calling#lar...

[1] https://github.com/guidance-ai/llguidance
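
For function calling specifically, strict goes inside the function definition itself, roughly like this (get_weather is just a placeholder tool for illustration, and the model name is a placeholder too):

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",              # hypothetical example tool
            "description": "Get the weather for a city",
            "strict": True,                     # opt into grammar-constrained decoding
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
                "additionalProperties": False,  # strict mode requires this
            },
        },
    }]

    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=tools,
    )

Note that strict mode also requires every property to appear in "required" and additionalProperties to be false.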


I don't think that parameter is an option when using pydantic schemas.

    class FooBar(BaseModel):
        foo: list[str]
        bar: list[int]

    prompt = """# Task
    Your job is to reply with Foo Bar, a json object with foo, a list of strings, and bar, a list of ints
    """

    response = openai_client.chat.completions.parse(
        model="gpt-5-nano-2025-08-07",
        messages=[{"role": "system", "content": prompt}],
        max_completion_tokens=4096,
        seed=123,
        response_format=FooBar,
        strict=True,
    )

TypeError: Completions.parse() got an unexpected keyword argument 'strict'


You have to explicitly opt into it by passing strict=True https://platform.openai.com/docs/guides/structured-outputs/s...

Are you able to use `strict=True` when using pydantic models? It doesn't seem to be valid for me. I think that only works for json schemas.

    class FooBar(BaseModel):
        foo: list[str]
        bar: list[int]

    prompt = """# Task
    Your job is to reply with Foo Bar, a json object with foo, a list of strings, and bar, a list of ints
    """

    response = openai_client.chat.completions.parse(
        model="gpt-5-nano-2025-08-07",
        messages=[{"role": "system", "content": prompt}],
        max_completion_tokens=4096,
        seed=123,
        response_format=FooBar,
        strict=True,
    )

> TypeError: Completions.parse() got an unexpected keyword argument 'strict'
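
For what it's worth, I don't think strict is ever a kwarg to parse(). With a pydantic model the SDK builds the strict schema for you; with the lower-level create() call, strict lives inside the json_schema block. A sketch of both (prompts trimmed down; depending on SDK version parse may live under client.beta.chat.completions, and the raw-schema variant may need additionalProperties tweaks):

    from pydantic import BaseModel
    from openai import OpenAI

    client = OpenAI()

    class FooBar(BaseModel):
        foo: list[str]
        bar: list[int]

    # parse() + pydantic: no strict kwarg; the SDK emits a strict JSON schema for you
    resp = client.chat.completions.parse(
        model="gpt-5-nano-2025-08-07",
        messages=[{"role": "user", "content": "Reply with foo (a list of strings) and bar (a list of ints)."}],
        response_format=FooBar,
    )
    print(resp.choices[0].message.parsed)

    # create() + raw schema: strict goes inside the json_schema object
    resp = client.chat.completions.create(
        model="gpt-5-nano-2025-08-07",
        messages=[{"role": "user", "content": "Reply with foo (a list of strings) and bar (a list of ints)."}],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "foo_bar",
                "strict": True,
                # model_json_schema() may need additionalProperties: false added for strict mode
                "schema": FooBar.model_json_schema(),
            },
        },
    )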


The inference doesn't return a single token, but a probability for every token in the vocabulary. You just select the highest-scoring token that is allowed according to the compiled grammar.
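
A toy illustration of that selection step (real implementations compute the allowed-token mask incrementally from the grammar state and far more efficiently; this just shows the masking idea):

    import numpy as np

    def pick_next_token(logits: np.ndarray, allowed_ids: list[int]) -> int:
        # Mask every token the grammar disallows, then pick among the rest.
        masked = np.full_like(logits, -np.inf)
        masked[allowed_ids] = logits[allowed_ids]
        # Softmax over the remaining tokens (or just argmax / sample from it).
        probs = np.exp(masked - masked[allowed_ids].max())
        probs /= probs.sum()
        return int(np.argmax(probs))

    # allowed_ids would come from the compiled grammar's current state,
    # e.g. "only tokens that keep the partial output valid JSON".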

Hmm, wouldn't it sacrifice a better answer in some cases (not sure how many though)?

I'd be surprised if they hadn't specifically trained for structured "correct" output here, in addition to picking the next token following the structure.


In my experience (I've put hundreds of billions of tokens through structured outputs over the last 18 months), I think the answer is yes, but only in edge cases.

It generally happens when the grammar is highly constrained, for example if a boolean is expected next.

If the model assigns a low probability to both true and false coming next, then the sampling strategy will pick whichever one happens to score highest. Most tokens have very similar probabilities close to 0 most of the time, and if you're picking between two of these then the result will often feel random.

It's always the result of a bad prompt though: if you improve the prompt so that the model understands the task better, there will be a clear difference in the scores the tokens get, and the result seems less random.


It's not just the prompt that matters, it's also field order (and a bunch of other things).

Imagine you're asking your model to give you a list of tasks mentioned in a meeting, along with a boolean indicating whether the task is done. If you put the boolean first, the model must decide both what the task is and whether it is done at the same time. If you put the task description first, the model can separate that work into two distinct steps.

There are more tricks like this. It's really worth thinking about which calculations you delegate to the model and which you do in code, and how you integrate the two.
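
A concrete version of the ordering trick, sketched with pydantic schemas (the names are made up, and whether emitted order exactly follows declaration order depends on the provider, though with constrained decoding it usually does):

    from pydantic import BaseModel

    class TaskBoolFirst(BaseModel):
        done: bool            # model must commit to done before it has even written the task out
        description: str

    class TaskDescriptionFirst(BaseModel):
        description: str      # writing the description first acts like a small reasoning step
        done: bool

    class MeetingTasks(BaseModel):
        tasks: list[TaskDescriptionFirst]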


Grammars work best when aligned with the prompt. That is, if your prompt gives you the right format of answer 80% of the time, the grammar will take you to 100%. If it gives you the right answer 1% of the time, the grammar will give you syntactically correct garbage.

Sampling is already constrained with temperature, top_k, top_p, top_a, typical_p, min_p, entropy_penalty, smoothing, etc. – filtering tokens to valid ones according to a grammar is just yet another alternative. It does make sense and can be used for producing programming language output as well – what's the point in generating, or bothering with, output you know up front is invalid? Better to filter it out and allow valid completions only.

The "better answer" wouldnt had respected the schema in this case.

> Nano Banana is still bad at rendering text perfectly/without typos as most image generation models.

I figured that if you write the text in Google Docs and share the screenshot with Banana, it will not make any spelling mistakes.

So, something like "can you write my name on this Wimbledon trophy, both images are attached. Use them" will work.


Google's example documentation for Nano Banana does demo that pipeline: https://ai.google.dev/gemini-api/docs/image-generation#pytho...

That's on my list of blog-post-worthy things to test, namely text rendering to image in Python directly and passing both input images to the model for compositing.
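
A rough sketch of that two-step pipeline (the PIL part is straightforward; the Gemini call and model id are my assumptions based on the linked docs, so double-check them):

    from PIL import Image, ImageDraw, ImageFont
    from google import genai

    def render_text_image(text: str, size=(800, 200)) -> Image.Image:
        # Render the exact string ourselves so there are no model typos.
        img = Image.new("RGB", size, "white")
        draw = ImageDraw.Draw(img)
        draw.text((20, 80), text, fill="black", font=ImageFont.load_default())
        return img

    client = genai.Client()  # expects GEMINI_API_KEY in the environment
    text_img = render_text_image("mkagenius")
    trophy_img = Image.open("wimbledon_trophy.png")  # placeholder input image

    resp = client.models.generate_content(
        model="gemini-2.5-flash-image",  # "Nano Banana"; check the docs for the current id
        contents=[
            "Engrave the name shown in the first image onto the trophy in the second image.",
            text_img,
            trophy_img,
        ],
    )
    # The composited image comes back as inline image data in resp.candidates[0].content.parts.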


Yeah, close.

But it is still generating it with a prompt

> Logo: "A simple, modern logo with the letters 'G' and 'A' in a white circle.

My idea was to do it manually so that there are no probabilities involved.

Though your idea of using Python is the same.


Are we reading too much into one sentence? HN comments these days


No, we aren't.

It was this exact part of the conversation that rubbed me the wrong way too. marsf expresses some very valid criticism that, instead of being publicly addressed, is being handled with "let's discuss it privately". That always means they don't want to discuss it; they just want to shut you down.


I don’t think so. Working in tech with many busy people, I say “hop on a call”, but only in “let’s sync live, it’ll be faster” situations.

This stuck out to me as rude. I would never say that to someone on my team who expressed serious concerns, much less to someone like this person, quitting after years of dedication.

I would offer an apology, explanation, and follow up questions to understand more in public, then say I’m happy to set up time to talk privately if they would like to or feel more comfortable.


> This stuck out to me as rude.

Very much so, and I'm German ;)

In my experience, and in my feeling as someone reading such things, you need to tone-match. The resignation message was somewhat formal, structured and serious in tone. Replying in such an informal tone means that you are not taking things seriously, which is insulting. Even more so because that informal answer is public.

I'm tone-deaf by culture and by personality. I often make those kinds of mistakes. But a public resignation like this is a brightly flashing warning light saying: "this needs a serious formal answer".


What about the reply in the link indicates to you that the person has empathy for marsf’s complaints and is willing to change anything at Mozilla in response to them?

For the reasons I stated above, the response comes off as faking understanding to manage a PR issue rather than genuine empathy and possible negotiation, but I am often wrong about many things.


I mean, it's right, and it's also not the only sentence.


> The Takeaway: Skills are the right abstraction. They formalize the “scripting”-based agent model, which is more robust and flexible than the rigid, API-like model that MCP represents.

Just to avoid confusion: MCP is like an API, but the underlying API can execute a Skill. So it's not MCP vs Skills as a contest. It's just the broad concept of a "flexible" skill vs a "parameter"-based API. And again, parameter-based APIs can also be flexible depending on how we write them, except that they lack the SKILL.md which, in the case of Skills, guides the LLM to be more generic than a pure API.

By the way, if you are a Mac user, you can execute Skills locally via OpenSkills [1], which I created using Apple containers.

1. OpenSkills - https://github.com/BandarLabs/open-skills


Since you are on a Mac, if you need some kind of code execution sandbox, check out Coderunner [1], which is based on Apple containers and provides a way to execute any LLM-generated code without risking arbitrary code execution on your machine.

I have recently added Claude Skills to it, so all the Claude Skills can be executed locally on your Mac too.

1. https://github.com/instavm/coderunner


Interesting that a whole economy is based on a fake supply constraint. Or is making a butterfly knife really hard?

It seems like NFTs before NFTs.


Yeah, CS skins were one of the biggest markets for digital-only aesthetic items before NFTs came around (and are probably still bigger than NFTs now). The main thing with NFTs was that there's no "central database"; CS skins live solely in Valve's database.

Making a butterfly knife isn't hard for Valve (in the past, Steam Customer Service duplicated items lost in scams). It's hard for the players because they have to "gamble" for it by paying for keys to open cases.


It's hard as in "it's hard to trick or manipulate the centralized database".

Similarly making USD in a bank account isn't technically hard, but it's fucking hard to get a bank to tweak some numbers in your favour.


It's not a fake supply.

CSGO knives are actually a currency run by shadow banks providing RMB <-> USD conversion.

Google "挂刀" ("selling knives").


This should be a top level comment, it is the "ah hah" that suddenly makes everything clear.


Can you explain the shadow banking / conversion angle? All I found was that selling knives was used to get a discount on steam balance thanks to price arbitrage.

> "Selling Knives" (挂刀) refers to the technique of buying in-game items from 3rd-party (Chinese) trading sites like NetEase BUFF, C5, IGXE, and UUYP, and then selling them on the Steam Market to obtain a discounted Steam Wallet balance by capitalizing on price differences.

I'm surprised the price difference did not disappear if people make that trade.

Source: https://github.com/EricZhu-42/SteamTradingSiteTracker/wiki


China notoriously has intense capital controls. It's difficult for ordinary Chinese citizens to take capital out of the country. CS2 items can be bought and sold in both USD and RMB, and can be transferred between Chinese and international accounts. It's not about Steam wallet balances.


Interesting. I'm curious though: assuming I am Chinese and I trade knives for USD - where would I be able to receive the USD to evade capital controls? Surely not my bank account or Steam wallet. Or is it for people with bank accounts in both countries? But in that case crypto could be more convenient? I'm puzzled.


Yes you would need to receive in a foreign USD bank account outside of China, the whole goal is to get the capital out of China and into a foreign account. Cryptocurrency transactions/exchanges are illegal in China so that's definitely not convenient! Meanwhile you can buy CS2 items with any ordinary payment method.


Both the US and CN have massive player bases, and they all need to buy games in their own currency

You can buy games with Steam Wallet

You can also buy/sell in-game items with Steam Wallet

Now only if someone invents a commodity with a stable price. Hmm what could that be?


Remember the 15% transaction fee on the Steam market? That's why the price difference hasn't disappeared. Players can avoid this fee through gifting and off-platform transactions. And all of this is just Chinese players trying to buy games cheaper—after all, what else can Steam wallet funds be used for? Some comments claim this is a way for Chinese people to evade financial regulations, but that's complete nonsense. The Steam market's capacity is entirely insufficient to meet the demand. They could easily choose to legally exchange currency using the foreign exchange quotas of relatives and friends, engage in cross-border wash trading through underground banks, or use fake trade and fake investment schemes.


Artificial scarcity has existed for ages. Watches, playing cards, cars, etc.

Selling 10 of something for $1000 instead of 1000 of something for $10 is not new.

Also builds brand value.


I feel watches and cars are different. You can't magically "print" 10,000,000 Bentleys, so supply will be constrained, and they are expensive to make. I feel the luxury is more tangible than just being rare.


See the discussion around the supposedly lost Van Gogh painting, e.g. at https://news.artnet.com/art-world/van-gogh-lmi-group-2602847

Nothing about the painting itself would have changed, but its market value depends very much on whether Van Gogh painted it.


A lot of real economies are based on fake constraints. Or the constraint is a closely held secret that's pretty arbitrary and not based on any grand amount of skill or effort.


All this froth on the ocean surface is only possible in an economy where household net worth has been inflated to 150 Trillion.


Yeah, the measly peasants should never have gotten their hands on such luxuries as game knife skins.


It is NFTs. But because it's Valve, it's actually good. Because of reasons.


> something that is truly local-first

Hey, we built Coderunner [1] exactly for this purpose. It's completely local. We use Apple containers for this (each of which is mapped 1:1 to a lightweight VM).

1. Coderunner - https://github.com/instavm/coderunner


Very cool! Apple containers run on Apple ARM, so it's complementary to my stack, which doesn't support ARM yet (but soon will when it's extended to QEMU, which supports ARM). Thanks for sharing!


Are you the one who created the techno-anthem "pump up the jam"? Sweet!


Pump Up the Jam always reminds me of Philomena Cunk.

If you have Netflix, look up "Cunk on Earth". Trust me, you won't regret it.

