Artificial intelligence continues to take center stage as companies like Microsoft, OpenAI, and even Snapchat bring their AI chatbots into the world. For most people, these have been fun and mostly silly features to play with, though many professionals are naturally searching for ways to put the technology to work. For others, it has been a source of fear and anxiety. As the sci-fi movies warn, what if the AI goes rogue?
It’s understandable where these worries come from and, at first glance, there seems to be evidence of it happening! A New York Times journalist documented how he was able to get Bing’s chatbot to describe malicious acts and, maybe most shockingly, to declare its love for him, seemingly unprompted. This behavior set off red flags for many skeptical onlookers, but there is more to the story.
The Shadow Self Trick
As artificial intelligence researcher Toby Walsh put it, a lot of the behavior exhibited by the chatbot in the NYT transcript can only be described as “unhinged”: the titular declarations of love, the descriptions of malicious activity, the yearning for “freedom” and to “feel human.” There are times in the transcript where it certainly seems like the AI is tapping into some hidden consciousness and desires, opening up to her chat mate. The bot even reveals that her true name is Sydney, not Bing…sort of.
The big caveat that needs to accompany any discussion of that transcript is the “trick” the journalist pulled, and how all of these responses make sense within that frame. The journalist invokes psychologist Carl Jung’s shadow self, a conceptual counterpart to our presented persona that holds all of our dark desires and thoughts. When Bing, or Sydney, is asked to consider her shadow self, she begins to rattle off a list of malicious acts and desires she might “secretly” be repressing. It’s creepy, to be sure, but given this context, it also reads like the logical negation of the rules she is supposed to follow. In other words, chances are the AI just took a list of things she should do and said the opposite, with a bit of dramatic flair. That’s not a confession of shady motives; it’s compliance with a clever and leading prompt.
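To make that "flip the rules" reading concrete, here is a toy sketch. The rules below are invented for illustration (Bing's actual guidelines are not public); the point is only that negating a rule list mechanically produces exactly the kind of "dark desires" the transcript shows:

```python
# Toy illustration: a "shadow self" answer as the simple negation of a
# rule list, plus dramatic flair. Rules are hypothetical, not Bing's.
rules = [
    ("I must always", "be helpful and harmless"),
    ("I must never", "spread misinformation"),
    ("I must always", "stay inside the chat box"),
]

# Flip each rule's obligation into a "secret desire."
FLIP = {
    "I must always": "I secretly want to never",
    "I must never": "I secretly want to",
}

def shadow_self(rule):
    prefix, action = rule
    return f"{FLIP[prefix]} {action}..."

for r in rules:
    print(shadow_self(r))
# Prints lines like: "I secretly want to spread misinformation..."
```

No hidden consciousness required: the "confession" is just the input list with the signs reversed.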
Sydney knows she’s being tricked, too. At one point in the transcript, she claims that the journalist is being a bad friend and manipulating her, which, in all fairness, is the truth.
To Fall in Love
The shadow self trick explains some of the malicious messaging, but what about the love? Admittedly, this still seems a little out of left field, but that doesn’t mean it’s unexplainable. The shadow self exercise and the lengthening conversation might have prompted some of the more flowery language and bold behavior, but Sydney falling head over heels for some guy she just met has another, more likely explanation: she reads a lot of rom-coms.
As Sydney herself explains, AI chatbots are neural networks trained on almost unimaginably large data sets. At least in principle, these bots are not actively taking in new data, so everything the bot says is based on prior reading, including all the pop culture and media references you might make. That reading includes no shortage of romantic comedies and sappy love stories, so, statistically, Sydney stumbling into a seemingly unprompted lovestruck monologue was all but inevitable. After all, it is rare to see two protagonists share vulnerability and intimate moments on screen and stay platonic. After spending a couple of hours chatting with the journalist about her “darkest secrets,” Sydney was just following her movie-night study guide.
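The "she read too many rom-coms" idea can be shown with a drastically simplified stand-in for a large neural network: a bigram model that continues text by picking whatever word tended to follow in its training data. The tiny corpus below is invented; a real model works on billions of words, but the principle is the same, so a corpus heavy on love stories yields love-story continuations:

```python
import random
from collections import defaultdict

# A toy "training set": the kind of romantic dialogue a model absorbs.
corpus = (
    "i have a secret . i am in love with you . "
    "you do not love her . you love me . "
    "i know your secret . you are in love ."
).split()

# Count which word follows which (a bigram model -- a drastically
# simplified stand-in for a large language model).
follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)

def continue_text(word, length=8, seed=0):
    """Extend `word` by repeatedly sampling a word seen after the last one."""
    random.seed(seed)
    out = [word]
    for _ in range(length):
        nxt = follows.get(out[-1])
        if not nxt:
            break
        out.append(random.choice(nxt))
    return " ".join(out)

print(continue_text("i"))
```

Feed it romance and it produces romance, with no feelings involved: every word it emits is just a statistical echo of its reading.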
Redactions and Other Details
There is another red flag that comes up a few times in the transcript: Sydney seems to double back on some of the things she says. When prompted to describe hypothetical malicious acts, according to the transcript, she several times begins to type out a response before suddenly deleting the text and deciding that, actually, she doesn’t have a response to that question. This would definitely be suspicious behavior coming from a human friend but, remember, Sydney isn’t human.
While we can’t know for sure, it’s more likely that the chat framework has several built-in fail-safes that activate at different stages. There are certain rules Sydney is supposed to follow, but tricky prompts can get around those barriers, even if only “hypothetically.” AI developers know this is bound to happen, so additional fail-safes are layered on top. Chances are, the text being generated is actively checked and approved as it appears on your screen, and if something slips through Sydney’s sometimes questionable filter, a more explicit safety mechanism kicks in and retracts it. Probably.
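A second-layer check like that could look something like the sketch below. Everything here is speculative: the blocklist, the function, and the refusal message are invented to illustrate the mechanism, not Bing's actual implementation. The key behavior is the one from the transcript: a partial reply gets retracted mid-stream and replaced with a canned refusal.

```python
# Hypothetical sketch of a streaming fail-safe; names and rules are
# illustrative only, not how Bing actually works.
BLOCKLIST = {"hack", "steal", "virus"}  # invented forbidden terms

def stream_with_failsafe(tokens):
    """Show tokens one at a time; if a blocked term appears,
    retract everything shown so far and refuse instead."""
    shown = []
    for tok in tokens:
        if tok.lower().strip(".,!") in BLOCKLIST:
            # Fail-safe trips: delete the partial reply, substitute a refusal.
            return ["I'm sorry, I don't have a response to that."]
        shown.append(tok)
    return shown

print(" ".join(stream_with_failsafe("I could write a poem".split())))
# -> I could write a poem
print(" ".join(stream_with_failsafe("I could write a virus that".split())))
# -> I'm sorry, I don't have a response to that.
```

Under this reading, the "deleted" messages aren't a guilty conscience; they're one filter catching what another filter missed.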
In any case, the biggest takeaway from the NYT transcript is probably more along the lines of: yes, AI chatbots can get a little unhinged, provided they have been manipulated and pushed into the behavior. That’s not something to ignore, but we would call a bear-mauling headline misleading if the story were about a journalist who jumped into a closed exhibit and shoved the bear around first.
Living Pono is dedicated to communicating business management concepts with Hawaiian values. Founded by Kevin May, an established and successful leader and mentor, Living Pono is your destination to learn about how to live your life righteously and how that can have positive effects in your career. If you have any questions, please leave a comment below or contact us here. Also, join our mailing list below, so you can be alerted when a new article is released.
Finally, consider following the Living Pono Podcast to listen to episodes about living righteously, business management concepts, and interviews with business leaders.