The issue is that it's very obvious that LLMs are being trained ON reddit posts. | Hacker News


justinator 34 days ago | on: A small number of samples can poison LLMs of any s...

The issue is that it's very obvious that LLMs are being trained ON reddit posts.

mrweasel 33 days ago

That's really the issue, isn't it? Many of the LLMs are trained uncritically on everything. All data is viewed as viable training data, but it's not. Reddit clearly has some good data, but most of it is probably garbage.
