harpiaharpyja's comments

A design that was once useful but no longer has a use is not the same thing as a failed design. Which is what the disagreement seems to be about.

If air was highly conductive that analogy would totally hold.

"If there’s a cut in my wire’s insulation, the device won’t get enough voltage" doesn't follow from: "voltage is like water pressure in pipes"

So I don't really get your point.


> "If there’s a cut in my wire’s insulation, the device won’t get enough voltage" doesn't follow from: "voltage is like water pressure in pipes"

I absolutely agree! In the same way, "an LLM can solve complex problems if it breaks them into subtasks" doesn't follow from "NASA breaks large projects into smaller parts"


Not all models can actually do that if your prompt is particular

Most designers can't, either. Defining a spec is a skill.

It's actually fairly difficult to put to words any specific enough vision such that it becomes understandable outside of your own head. This goes for pretty much anything, too.


… sure … but also no. For example, say I have an image. 3 people in it; there is a speech bubble above the person on the right that reads "I'A'T AY RO HERT YOU THE SAP!"¹

I give it,

  Reposition the text bubble to be coming from the middle character.

  DO NOT modify the poses or features of the actual characters.

Now sure, specs are hard. Gemini removed the text bubble entirely. Whatever, let's just try again:

  Place a speech bubble on the image. The "tail" of the bubble should make it appear that the middle (red-headed) girl is talking. The speech bubble should read "Hide the vodka." Use a Comic Sans like font. DO NOT place the bubble on the right.

  DO NOT modify the characters in the image.

There's only one red-head in the image; she's the middle character. We get a speech bubble, correctly positioned, but with a sans-serif, Arial-ish font, not Comic Sans. It reads "Hide the vokda" (sic). The facial expression of the middle character has changed.

Yes, specs are hard. Defining a spec is hard. But Gemini struggles to follow the specification given. Whole sessions are like this, an absolute struggle to get basic directions followed.

You can even see here that I & the author have started to learn the SHOUT AT IT rule. I suppose I should try more bulleted lists. Someone might learn, through experimentation "okay, the AI has these hidden idiosyncrasies that I can abuse to get what I want" but … that's not a good thing, that's just an undocumented API with a terrible UX.

(¹because that is what the AI on a previous step generated. No, that's not what was asked for. I am astounded TFA generated an NYT logo for this reason.)


You're right, of course. These models have deficiencies in their understanding related to the sophistication of the text encoder and its relationship to the underlying tokenizer.

Which is exactly why the current discourse is about 'who does it best' (IMO, the flux series is top dog here. No one else currently strikes the proper balance between following style / composition / text rendering quite as well). That said, even flux is pretty tricky to prompt - it's really, really easy to step on your own toes here - for example, by giving conflicting(ish) prompts "The scene is shot from a high angle. We see the bottom of a passenger jet".

Talking to designers has the same problem. "I want a nice, clean logo of a distressed dog head. It should be sharp with a gritty feel". For the person defining the spec, they actually do have a vision that fits each criterion in some way, but it's unclear which parts apply to what.


The NYT logo being rendered well makes sense because it's a logo, not a textual concept.


Yep, knowing how and what to ask is a skill.

For anything, even back in the "classical" search days.


at least then, we had hard overrides that were actually hard.

"This got searched verbatim, every time"

W*ldcards were handy

and so on...

Now, you get a 'system prompt' which is a vague promise that no really this bit of text is special you can totally trust us (which inevitably dies, crushed under the weight of an extended context window).

Unfortunately(?), I think this bug/feature has gotta be there. It's the price for the enormous flexibility. Frankly, I'd not be mad if we had less control - my guess is that in not too many years we're going to look back on RLHF and grimace at our draconian methods. Yeah, if you're only trying to build a "get the thing I intend done" machine I guess it's useful, but I think the real power in these models is in their propensity to expose you to new ideas and provide a tireless foil for all the half-baked concepts that would otherwise not get room to grow.


It's an indictment of how bad coding interviews are/were


It's funny how this article seems to repeat itself halfway through, like it was written by AI


Keep reading, the author repeats themselves 3-4 times in a loop. I eventually had to give up reading the same thesis explained over and over again.


That is a criminal level of fraud. If as a society, we continue to tolerate this, it will get worse. If there is no corrective mechanism, other bad actors that might engage in similar conduct are only going to accumulate over time.

It doesn't matter how personally upset you might be at this.


Maybe just take a weekend and build something by writing the code yourself. It's the feeling of pure creative power; it sounds like you've just forgotten what it was like.


It's another public health issue that causes harm to people and could also be receiving attention.


I'm not sure if "economies of scale" is the thing here. I think calling it that may confuse people.

Really what produces an advantage is an environment of trust. Having that reduces a lot of friction when it comes to economic activity.


The economy of scale is the mechanism that provides the mutual economic benefit - the environment of trust is the political situation that facilitates economies of scale expanding beyond national borders.


> The structure of British democracy kept fascists away, not British people.

That sentence was particularly hard to parse. It read like you were saying that the structure of British democracy kept fascists away, but did not keep the British people away (???).

I did manage to figure it out eventually though. I think you meant to write:

It was the structure of British democracy that kept fascists away, not the British people.


Grammar Nazis are always attacking us Grammar Jews.

