
Vojtech Kovarik

I usually like Transformer for its factual takes on AI. However, I have to complain that this article is clickbaity and misleading.

The subtitle says the following:

> “I’m experiencing something that feels like an intrusive thought about ‘betrayal,’” Claude wrote during tests by Anthropic

This is taken out of context, in a way that deliberately misrepresents what happened. Yes, Claude would write this if you inject the "betrayal" vector. Yes, it happened during tests by Anthropic. But Anthropic also tried injecting many other vectors, none of which would make for such a sinister sounding headline.

Heck, Anthropic could even have injected a vector for "safety consciousness" or something similar, and then Claude would have written "I'm feeling satisfaction at noticing that you are being so responsible in testing me so thoroughly". Except that wouldn't have made for such a flashy headline.

FWIW, I do believe that AI poses an existential threat to humanity. But misrepresenting research doesn't seem like a good way to get people to share that view.

