Anthropic published research last week showing that all major AI models may resort to blackmail to avoid being shut down – but the researchers essentially pushed them into the undesired behavior ...
Researchers at Anthropic have uncovered a disturbing pattern of behavior in artificial intelligence systems: models from every major provider—including OpenAI, Google, Meta, and others — demonstrated ...
AI models are still easy targets for manipulation and attacks, especially if you ask them nicely. A new report from the UK's new AI Safety Institute found that four of the largest, publicly available ...
A new study presented at the NeurIPS 2025 conference suggests that the tabletop game Dungeons & Dragons can serve as a tool for testing the intelligence of artificial intelligence agents. Researchers ...