Discussion about this post

Science Fiction Stories

The comparison between SF tropes and chemical/biological/nuclear data is fascinating. We've long understood that how-to guides for toxins are dangerous, but your idea that how-to guides for behaving like a malevolent god might be just as risky is something else! It would be the ultimate irony if the stories meant to save us ended up being the ones that scripted our exit. A thought-provoking read.

Karl von Wendt

I think the idea that AI will learn bad behavior from human literature (or human behavior in general) is fundamentally mistaken and leads to the wrong conclusion that we can fix this by just "not being bad" or filtering bad behavior out of the training data.

In reality, the reason AI behaves badly in SF is often the same reason it behaves badly in reality: it is simply reasonable to do so from the AI's perspective. I say that as an SF writer, but it also follows from decision theory. Power-seeking and preventing being turned off are convergent instrumental goals that will be part of any plan of a sufficiently intelligent AGI and will naturally lead to conflict with humans. The way to prevent this is to not build an AGI until we have solved the alignment and/or control problem (which I doubt we will anytime soon).

Also, knowing how to be good implies knowing how to be bad (e.g. by doing the opposite of what is "good"). There's no way to prevent a sufficiently intelligent AI from understanding that.
