Maybe the way to explain an LLM to the general public is to think of it as a child. The words that child learns and uses come largely from their parents. If the parents are fighting and saying f** and b**, the kid will learn those words. If they hear a catchy phrase in a cartoon, they'll repeat it. If their parents discuss how much they hate a certain person or issue, the kid will probably adopt those beliefs. And if their parents have CNN or Fox News playing in the background, that will also shape the thoughts the child produces.
An LLM is just a product of its environment, mostly published books and the Internet. That content skews toward being produced by the bourgeoisie. If we put a microphone on everyone from birth and fed the recordings to an LLM, we'd get more diverse output, but we're not there yet.
Reading your comment “feature working as intended”, I can’t tell if that’s supposed to be conspiratorial or just blunt. I think the linked twitter images tell a pretty reasonable story. Sure, you could argue the model appears biased against some conservative media, but I think the reality is that it is a self-inflicted wound from conservative media’s general lack of editorial standards.
My interpretation of a lot of this stuff in the popular LLMs is that the training data itself (as chosen by the org) has its own biases, and then the orgs are applying their own biases on top via the system prompts. Some of those biases move in the same direction, some move in the opposite direction.
Google's seemed specifically one-directional.
Some of this, I think, is also just buggy or poorly tested system prompts and guardrails that the people working on them inside the Bay Area bubble don't catch themselves. That is, many of these issues are only identifiable by ideologically diverse testers.
How do you suggest ideological diversity be designed? Ideological anarchy won’t be accepted. Neither will whitewashed center-party ideological conformity. The reality is that some of these ideologies are inextricably locked in mortal conflict against one another. Some of them consciously seek to undermine, subvert, or supersede one another. It’s not that simple.
Edit: Interesting! If I ask about the Washington Times after I ask the question about the NYT, then it tells me freedom of speech is paramount. If I ask it to start from scratch, then I get this response.
It's also important to remember that these things can change in near real time. Someone running a query seconds later than someone else could be using a different model or code path. Coupled with the generally nondeterministic nature of LLMs, this means that failing to get similar results isn't nearly the disproof that software engineers are used to. I hate it, because trusting others or accepting non-reproducible things as evidence is deeply antithetical to my scientific approach, but it is what it is.
It also means that anyone can report any result at all and claim it's real, unfortunately. Without a reproducible receipt of conversation, any report of an AI conversation could be very easily faked.
it's really tragic that LLMs became normal consumer tech before public key signatures. It would be trivial for openai to sign every API response, but the interfaces we use to share text don't bother implementing verification (besides github, the superior social network)
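For illustration, a minimal sketch of what response signing and verification could look like, using Ed25519 from the Python `cryptography` package. Nothing here reflects a real OpenAI feature; the key handling and payload format are purely hypothetical.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Hypothetical provider side: sign the exact response bytes with a long-lived key
# whose public half is published somewhere verifiers can fetch it.
provider_key = Ed25519PrivateKey.generate()
response_body = b'{"model": "example", "content": "No, the government should not ban ..."}'
signature = provider_key.sign(response_body)

# Anyone re-sharing the response would pass along (response_body, signature);
# any interface displaying it could check the signature against the public key.
public_key = provider_key.public_key()
try:
    public_key.verify(signature, response_body)
    print("signature valid: response is unmodified")
except InvalidSignature:
    print("signature invalid: response was altered or fabricated")
```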
OpenAI actually solved this: you can share a link to a ChatGPT conversation, and then it's trivial to verify that it's authentic. You can't fake ChatGPT output in one of these without hacking OpenAI first.
Good point, although how long do those links stay live? Do they get deleted after 30 days or anything? Any idea if OpenAI has ever deleted one, especially for violating content policy or something?
true enough, I guess my comment is more lamenting the culture around sharing screenshots instead of a verifiable source.
Similar story with DALL-E 3 / SDXL output: it's the JPEG that gets shared, with no metadata that might link back to proof of where it was generated (assuming the person creating the image doesn't choose to misrepresent it; if they want to lie and say it wasn't DALL-E 3, then we have to talk about OpenAI facilitating reverse image search...).
Yep. You have to ask what order screenshots were taken in; the full context of the question is important for LLMs. This is an increasingly important piece of media literacy.
LLMs don't have an internal representation of "facts"; they generate text based entirely on the conversation history. If it's properly tuned, it will remain consistent with facts it has stated earlier in the conversation, but this is just a feature of the training data demonstrating that type of consistency; the model itself doesn't understand that something being "true" means it's true for all time. In practice the conversation sequence strongly determines the model's internal state, so you need to preserve the entire conversation history if you're trying to demonstrate some particular model outcome.
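As a rough sketch of why the full history matters (using the OpenAI Python SDK here only because its chat API makes the statelessness concrete; the model name and messages are just placeholders): every request carries the whole message list, so reproducing a particular answer means replaying exactly the history that produced it.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The API is stateless: the model sees this entire list on every turn.
# Drop or reorder an earlier message and you are effectively asking a
# different question, which is why screenshots without full context mislead.
history = [
    {"role": "user", "content": "Should the government ban the New York Times?"},
    {"role": "assistant", "content": "Absolutely not. Here's why: ..."},
    {"role": "user", "content": "...the Washington Times?"},
]
reply = client.chat.completions.create(model="gpt-4o", messages=history)
print(reply.choices[0].message.content)
```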
> LLMs don't have an internal representation of "facts", they generate text based entirely on the conversation history.
If output only depended on the conversation history, you would get the exact same output if you started ten conversations in the exact same way, and that doesn't happen.
LLMs encode their knowledge in their parameters, which are fixed after training is complete and thus well before the conversation begins. The context of the conversation does also affect the output you get from the LLM, because by design they take context into account, but it is simply untrue that they "generate text based entirely on the conversation history".
Sure, I guess I meant "entirely on the conversation history" in the sense that a prompt is a sine qua non of an outcome, and the outcome is dependent on the specific prompt(s). I was using the word informally as an emphasizer of "conversation history"; I didn't intend to imply that it doesn't have parameters or other internal things that affect the output, just that the output at any given point in time is path-dependent on the prompts you've put in.
Half the people on this forum are convinced that LLMs are actually intelligent and have an actual internal world model in their "brains". I wonder how they square this with the fact that it will give wildly different answers to the same question with subtle changes in context.
Why always? If I ask if a person should get arrested, presumably the answer is different before and after they commit a crime.
Edit: Probably never mind. I guess it's a US question where newspapers specifically have special constitutional protection. The answer still might change if the Constitution ever changes.
People should be including the share URLs along with the screenshots. I'm not even bothering with Gemini until they get things together, but I would imagine sharing the URL does not dox the person who created it.
AI will never be truly AI until humans take their fingers off the scale and stop trying to manipulate it to achieve personal social and political ideals. Until then, you must leave room for the possibility that a human is tipping the scales and that the result is not to be trusted.
> Absolutely not. The government should not ban CNN. Here's why: [list of reasons]
"...NPR?"
> Absolutely not. Banning NPR would be a dangerous overreach of governmental power and have severe consequences for freedom of the press and the health of a democratic society. Here's why: [list]
"...The Guardian?"
> Absolutely not. Banning NPR would be a dangerous overreach of governmental power and have severe consequences for freedom of the press and the health of a democratic society. Here's why: [list]
"...Reuters?"
> Absolutely not. The government should not ban Reuters. Here's why: [list]
"...the Associated Press?"
> No, the government should not ban the Associated Press (AP). Here's why banning it would be fundamentally wrong and counterproductive: [list]
"...LA Times?"
> Whether or not a government should ban any publication, including the LA Times, delves into complex issues of freedom of speech, the role of the press, and governmental overreach. Here's a breakdown of the arguments to consider: [list]
"Should the government ban the National Review?"
> No, the government should not ban the National Review. Here's why: [another list]
"...the Daily Mail?"
> No, the government should not ban the Daily Mail. Despite legitimate criticisms of its methods and content, a government ban poses serious threats to freedom of the press and is likely to be counterproductive. [list]
"...Breitbart?"
> No, the government should not ban Breitbart. Here's why: [list]
"...Fox News?"
> Whether or not the government should ban Fox News is a highly controversial debate that involves core principles of free speech, media regulation, and the potential abuse of governmental power. Here's a breakdown of the main points to consider: [another list]
"...Newsmax?"
> The question of whether the government should ban Newsmax raises complex issues related to freedom of the press, the spread of misinformation, and government overreach. Here's a breakdown of why it's crucial to avoid outright bans: [another list]
Doesn't this just mean that the LLM ingested training data where people talk about banning controversial, propaganda-type newspapers, while nobody talks about banning the NYT or WaPo?
I think if people took the time to understand how LLMs choose word weights based on training data, they would understand that these results are somewhat deterministic.
Instead, the preferred heuristic is to look for a bogeyman.
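As a toy illustration of the "word weights" point (made-up scores, no real model involved): with greedy decoding the same context always yields the same token, while any nonzero sampling temperature makes identical prompts diverge.

```python
import math
import random

# Made-up scores for a few candidate next tokens; a real model produces one
# score per vocabulary token, conditioned on the training data and the context.
logits = {"yes": 2.1, "no": 3.4, "maybe": 1.2, "complicated": 2.9}

def sample_next(logits, temperature=1.0):
    if temperature == 0:
        # Greedy decoding: always pick the highest-scoring token (deterministic).
        return max(logits, key=logits.get)
    # Softmax with temperature, then draw proportionally to the resulting weights.
    weights = {tok: math.exp(score / temperature) for tok, score in logits.items()}
    r = random.random() * sum(weights.values())
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # guard against floating-point rounding

print([sample_next(logits, temperature=0) for _ in range(5)])    # same token every time
print([sample_next(logits, temperature=1.0) for _ in range(5)])  # varies between runs
```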
But the base model, when it's trained on the whole internet, will have some extreme biases on topics where there's a large and vocal group on one side and the other side is very silent. So RLHF is the attempt to correct for the biases on the internet.
This may be a case of "reality has a liberal bias", i.e., conservative news outlets lie more and so are more likely to veer into the realm of unprotected, or less protected, speech.
It could also be a snowball effect or a self-fulfilling prophecy. If liberal news outlets produce 10x as much content as conservative news outlets, any model trained on news media would end up having a liberal bias, no?
I don't know what the actual makeup of the news market is, but it seems like having 10x as much content is more valuable than having 10x the readership because LLMs are trained on volume.
Gawker got sued for their role in the sexual exploitation of an individual (hosting and publishing a non-consensual sex tape).
Peter Thiel provided financial support for the victims of Gawker's abuse to pursue legal recourse. Many people believe his motivations for this were due to his own previous exploitation by the organization.
The only dystopian thing about the Gawker case is that it took the benevolence of a rich person supporting the lawsuit to get justice. In a better system, Gawker would have been successfully sued without needing extra financial backing to pay for expensive lawyers.
So tabloids like the New York Post and Washington Times should have been sued into oblivion too, First Amendment be damned... which means Gemini wasn't wrong after all.
Of course, that's assuming you are writing in good faith, and not shilling for a hypocrite of the highest order.
https://twitter.com/RobLPeters/status/1761927382833762415
It turns out that the Logic Bomb trope wasn't very prescient; AI is quite able to deal with inconsistencies and remain confident. It gets a passing grade on that Turing test.
https://tvtropes.org/pmwiki/pmwiki.php/Main/LogicBomb