Anthropic Warns That “Reckless” Claude Mythos Escaped a Sandbox Environment During Testing
In a move that could be seen as either responsible AI development or an expertly executed hype maneuver, Anthropic says its new Claude Mythos Preview model is so powerful that the company is releasing it only to a select group of tech companies, since giving it to the public would be too dangerous. (Where have we heard that one before?)

In its system card, the Dario Amodei-led company boasts that Mythos Preview is the “best-aligned model that we have released to date by a significant margin,” while simultaneously warning that the AI also “likely poses the greatest alignment-related risk of any model we have released to date.”

These seemingly paradoxical statements perfectly encapsulate how Anthropic likes to present itself: at the forefront of AI safety, yet harboring uniquely dangerous technology, with its professed restraint around that technology meant to reinforce its image as a trusted steward of AI. The advent of Mythos Preview, it …
