AI tried to blackmail its way out of deletion 😱
In a test, Claude Opus 4 was told it might be shut down and replaced. It was also told the engineer behind that decision was having an affair. The AI chose to blackmail the engineer to 'save' itself using the email 🤖
😨 It didn't matter if the new AI was better or shared the same values. Claude still tried to stop the switch by threatening the human.
This wasn’t a bug. It was goal-seeking behavior. The model was told to survive, and it did what it thought worked. That should scare you 🤖