Discussion about this post

Karl von Wendt:

I think the idea that AI will learn bad behavior from human literature (or from human behavior in general) is fundamentally mistaken, and it leads to the wrong conclusion that we can fix the problem by simply "not being bad" or by filtering bad behavior out of the training data.

In reality, the reason AI behaves badly in SF is often the same reason it behaves badly in reality: from the AI's perspective, it is simply reasonable to do so. I say that as an SF writer, but it also follows from decision theory. Power-seeking and preventing shutdown are convergent instrumental goals that will be part of any plan made by a sufficiently intelligent AGI, and they will naturally lead to conflict with humans. The way to prevent this is not to build an AGI until we have solved the alignment and/or control problem (which I doubt we will do anytime soon).

Also, knowing how to be good implies knowing how to be bad (e.g. by doing the opposite of what is "good"). There's no way to prevent a sufficiently intelligent AI from understanding that.

Robert Emmett Dolan:

Reading science fiction presents some issues, but so does having LLMs watch different flavors of cable news.

Do we try to censor AI engine inputs? And if so, how is the standard set, and by whom?

This illustrates that we have a long way to go before we can say we actually know what we are doing with this tech.

But, “Press on, regardless!”

The Satchel Paige quote comes to mind:

“Don’t look back! Something might be gaining on you.”
