Hacker News | AJRF's comments

That chart at the start is egregious.

Feels like a tongue-in-cheek jab at the GPT-5 announcement chart.

The attempts at controlling the narrative feel a lot less subtle since Musk took over.

I'd bet dollars to donuts that they are tipping the scales to stoke tensions among UK users over things like migration and class division.

I only follow tech people on Twitter, but if you looked at my FYP you'd think I was deeply interested in UK politics - which I am not!


https://twitter.com/settings/your_twitter_data/twitter_inter...

I think you'll be surprised by what has been signed up for on your behalf.


Yup. Not just X, though. On Insta, even a slight misstep and you're up to your eyeballs in anti-migrant content from the algo.

Very uncool of Eric! Thank you for the work you've put in over the years.

Do you think sampling is deterministic?

Top-k sampling with temp = 0 should be pretty much deterministic (ignoring floating-point errors)
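A minimal sketch of why that holds, assuming a generic top-k sampler (illustrative code, not any specific library's implementation): at temperature 0 the sampler collapses to an argmax over the top-k set, so the RNG is never consulted.

```python
import numpy as np

def sample_top_k(logits, k, temperature, rng):
    """Sample from the k highest logits at the given temperature."""
    logits = np.asarray(logits, dtype=np.float64)
    top = np.argpartition(logits, -k)[-k:]        # indices of the k largest logits
    if temperature == 0.0:
        # Greedy limit: no randomness left, always the same token
        return int(top[np.argmax(logits[top])])
    scaled = logits[top] / temperature
    probs = np.exp(scaled - scaled.max())         # stable softmax over the top-k
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))

rng = np.random.default_rng(0)
logits = [1.0, 3.5, 2.0, 3.4]
# temp = 0 picks token 1 every time, regardless of RNG state
assert all(sample_top_k(logits, k=2, temperature=0.0, rng=rng) == 1
           for _ in range(100))
```

With temperature > 0 the same function becomes stochastic, which is the "obvious" source of non-determinism discussed downthread.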

> Ignoring floating point errors.

I think you mean non-associativity.

And you can’t ignore that.


Ignoring floating point errors, assuming a perfectly spherical cow, and taking air resistance as zero.

Imagine you are predicting the next token and two tokens are very close in probability in the distribution. Kernel execution is not deterministic because of floating-point non-associativity, and the token that gets predicted affects the tokens later in the prediction stream - so it's very consequential which one gets picked.

This isn't some hypothetical - it happens all the time with LLMs - it isn't some improbable freak accident.
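Both claims above (floating-point non-associativity, and a tiny difference flipping a near-tied argmax) can be demonstrated directly. A toy sketch with constructed values, not a real GPU kernel:

```python
import numpy as np

# 1. Addition order changes the float32 result
a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(1.0)
left  = (a + b) + c   # -> 1.0
right = a + (b + c)   # -> 0.0: the 1.0 is swallowed by 1e8 before it cancels
assert left != right

# 2. A difference far below 1e-6 is enough to flip a near-tied argmax
logits_run1 = np.array([2.0, 1.9999999])
logits_run2 = logits_run1.copy()
logits_run2[0] -= 1e-6            # same math, different reduction order
assert np.argmax(logits_run1) != np.argmax(logits_run2)
```

Once two runs pick different tokens, everything after that point diverges - that's the "consequential" part.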


Okay, yes, but would you really say that the main part of non-determinism in LLM usage stems from this? No, it's obviously the top-k sampling.

I don't think my tech lead was trying to suggest the floating-point error/non-associativity was the real source.


> Would you really say that the main part of non-determinism in LLM usage stems from this?

Yes, I would, because it causes exponential divergence (P(correct) = (1-ε)^n) and doesn't have a widely adopted solution. The major labs have very expensive researchers focused on this specific problem.

There is a paper from Thinking Machines from September on batch-invariant kernels you should read; it's a good primer on this issue of non-determinism in LLMs - you might learn something from it!

Unfortunately the method has quite a lot of overhead, but it's promising research all the same.
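The exponential divergence is easy to put numbers on. Assuming each token independently flips with probability ε (the value below is illustrative, not a measured rate), the chance two runs produce an identical length-n generation is (1 − ε)^n:

```python
# Assumed per-token flip rate (illustrative, not measured)
eps = 1e-3

for n in (100, 1_000, 10_000):
    p_identical = (1 - eps) ** n
    print(f"n={n:>6}: P(identical) = {p_identical:.5f}")
```

Even a tiny per-token flip rate makes long generations almost certain to diverge, which is why the batch-invariant kernels mentioned above are interesting despite their overhead.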


Alright fair enough.

I don't think this is relevant to the main point, but it's definitely something I wasn't aware of. I would've thought it might have an impact on, like, the O(100)th token in some negligible way, but glad to learn.


This happened to me last night! I was going to bed and I clicked Update and Shut Down, then I went into the other room.

After a few minutes I could see the blue glow of my Windows background shining on the wall.

Glad it is fixed!


Hey, by all means go ahead - my personal reasons for doing it this way:

- Self contained dependencies managed for me by the maintainer of the image.

- I already run Docker and keep all my configuration in a git folder that is structured the way my brain works

- I already run Watchtower, which updates containers for me automatically

- Other containers can use this container to send mail

- sendmail.cf scares me.

- filesystem isolation if it gets pwned


Does this kind of thing happen to China + Russia?

I don't see news about that much - but to be fair, I am not looking for it.


They may also be less likely to admit it or allow any reporting on it.


Yes, but it doesn't get covered by Western media - much like how NATO airplanes violating Russian airspace aren't reported on either.


> much like how NATO airplanes violating Russian airspace aren't reported on either

How do you know it's happening?


Yes - recently a Russian airline was hacked, and they also used Microsoft mail servers.


Added a point about that, thanks.


I think iterating on hypotheses to try to uncover the truth is a better use of time than saying everything is too complex and giving up.


I agree, but that’s not what people do. People usually fixate on one preferred explanation and then give up. Usually it’s the explanation that confirms their prejudices and biases.

I don’t think doom scrolling is healthy. I just doubt that it’s a single explanation.


Thanks - I don't really take the comments as entirely negative; people want rigour, and I agree the point could be made more convincingly.

I would love for a proper study of this hypothesis to be done.

