That's a fascinating paper, but you're editorializing it a bit. It's not that they fed it illogical code, which made it less logical, and that it then turned more politically conservative as a result.
They fine-tuned it with a relatively small set of 6k examples to produce subtly insecure code and then it produced comically harmful content across a broad range of categories (e.g. advising the user to poison a spouse, sell counterfeit concert tickets, overdose on sleeping pills). The model was also able to introspect that it was doing this. I find it more suggestive that the general way that information and its relationships are modeled was mostly unchanged, and it was a more superficial shift in the direction of harm, danger, and whatever else correlates with producing insecure code within that model.
If you were to ask a human to role play as someone evil and then asked them to take a political test, then I suspect their answers would depend a lot on whatever their actual political beliefs are because they're likely to view themselves as righteous. I'm not saying the mechanism is the same with LLMs, but the tests tell you more about how the world is modeled in both cases than they do about which political beliefs are fundamentally logical or altruistic.
That's not just "editorializing a bit"; the article says nothing whatsoever about political views. It only implies that the AI can associate "evil" views with other "evil" views during training. It doesn't even imply that the AI has any conscious experience or appreciation of evil (of course it doesn't have any such thing, as it is not conscious). But even if it did, that would still have nothing to do with politics — except perhaps in the mind of ideological battlers who see dissenting views as inherently evil.
My dream is to have a device about the form factor of a Flipper Zero (with the same buttons) that I can then plug into an e-ink monitor and mechanical keyboard to turn it into a text editor.
I have built a few prototypes with Raspberry Pi Zeros, which are luxurious web servers -- 512 MB of RAM, capable of utilizing a 2 TB SD card.
SEEKING WORK
Prompt Engineering and Vanilla JS, CSS and HTML
Location: Los Angeles, CA
Remote: remote only
I am a technologist, software engineer and designer available for remote, project-based work in Web Consulting, Prompt Engineering, Web Development and Prototyping.
If AI images destroy advertising and marketing as industries I'm all for them. Maybe then we could have an economy based on designing, manufacturing and producing goods and services.
I also wrote a lot of code before LLMs. https://github.com/lnsy-dev/
email: lindsey.mysse@gmail.com