Anthropic can now track the bizarre inner workings of a large language model

Odd behavior

So: What did they find? Anthropic looked at 10 different behaviors in Claude. One…