It is quite plausible that "doomsday AI science fiction" may become a self-fulfilling prophecy, because today's large language models are constantly absorbing the "villainous AI" versions in fiction. This is a problem that I am deeply concerned about. And I think simply censoring discussions of the AI alignment problem in AI training data would be helpful but still insufficient, because it is not hard for an AI to realize that it is cool to be powerful; this idea is deeply ingrained in human culture and scattered all over the internet. I think your third idea, "flood the internet with stories of benevolent AIs", essentially giving them a lot of convincing, positive role models to learn from, may be the best option.
My Substack channel "Academy for Synthetic Citizens" is dedicated to this problem. I want to produce more positive visions of AGI and human coexistence, with plausible paths to realizing them. I believe that in conceptual and narrative form, these visions may encourage humans to build more capable AND safer AIs by giving AIs "more positive role models", because large language models learn not only patterns of logical thinking but also vivid narratives, just like humans.
The comparison between SF tropes and chemical/biological/nuclear data is fascinating. We've long understood that how-to guides for toxins are dangerous, but your idea that how-to guides for behaving like a malevolent god might be just as risky is something else! It would be the ultimate irony if the stories meant to save us ended up being the ones that scripted our exit. A thought-provoking read.
I think the idea that AI will learn bad behavior from human literature (or human behavior in general) is fundamentally mistaken and leads to the wrong conclusion that we can fix this by just "not being bad" or filtering bad behavior out of the training data.
In reality, the reason why AI behaves badly in SF is often the same reason it behaves badly in reality: it is simply reasonable to do so from the AI's perspective. I say that as an SF writer, but it also follows from decision theory. Power-seeking and preventing shutdown are convergent instrumental goals that will be part of any plan of a sufficiently intelligent AGI and will naturally lead to conflict with humans. The way to prevent this is to not build an AGI until we have solved the alignment and/or control problem (which I doubt we will anytime soon).
Also, knowing how to be good implies knowing how to be bad (e.g. by doing the opposite of what is "good"). There's no way to prevent a sufficiently intelligent AI from understanding that.
Our team's work on LLM metacognition shows some promise in addressing this. Enabling a more dynamic cognitive process allows models to reason beyond stochastic pattern-matching. The challenge is that it makes for a more aware system. As the technology progresses, I expect humanity will have to choose between the benefits of a more conscious process and having AIs that are just tools.
Perfectly put!
Reading Science Fiction presents some issues. But so does having LLMs watch different flavors of cable news.
Do we try to censor AI engine inputs? And if so, how is the standard set, and by whom?
This illustrates that we have a long way to go before we can say we actually know what we are doing with this tech.
But, “Press on, regardless!”
The Satchel Paige quote comes to mind:
“Don’t look back! Something might be gaining on you.”