OpenAI’s GPT-5.5 is here, and it’s no potato: narrowly beats Anthropic’s Claude Mythos Preview on Terminal-Bench 2.0

After months of rumors and reports that OpenAI was developing a new, more powerful large language model (LLM) for ChatGPT and its application programming interface (API), reportedly codenamed “Spud” internally, the company today unveiled its latest offering under the more formal name GPT-5.5. To likely no one’s surprise, it’s hardly a “potato” in the disparaging sense of the word. GPT-5.5 retakes the lead for OpenAI among generally available LLMs, coming out ahead of the latest public offerings from rivals Anthropic and Google. It even narrowly beats Anthropic’s private Claude Mythos Preview model on one benchmark, Terminal-Bench 2.0, though the margin is small enough to be essentially a statistical tie.

“It’s definitely our strongest model yet on coding, both measured by benchmarks and based on the feedback that we’ve gotten from trusted partners, as well as our own experience,” explained Amelia “Mia” Glaese, VP of Research at OpenAI, in a video call with journalists ahead of the launch earlier today.

OpenAI positions GPT-5.5 as a fundamental redesign of how intelligence interacts with a computer’s operating system and professional software stacks. “What is really …