All posts tagged: Autonomously

Alibaba’s proprietary Qwen3.7-Max can run for 35 hours autonomously and supports external harnesses like Anthropic’s Claude Code

Published by skeptic

The AI industry has fully entered the “agent era,” a paradigm in which AI models do far more than generate text: they actively plan, execute, and course-correct complex tasks over days rather than seconds. It is perhaps unsurprising, then, that Chinese e-commerce giant Alibaba’s famed Qwen Team of AI researchers has released a model capable of performing autonomous agentic work over multiple days. That model is Qwen3.7-Max, which the company reports in a blog post achieved “~35 hours of continuous autonomous execution.” Unlike prior Qwen Team releases, however, it is proprietary rather than open source. This is also to be expected: it is what many analysts and industry experts feared in the wake of the departure of several key Qwen Team leaders earlier this year. But it makes sense for Alibaba financially, at least in the short term: training AI models, especially ones as powerful as Qwen3.7-Max, is expensive, and giving them away essentially for free, as open source models are, does not immediately help recoup any …

New AI framework autonomously optimizes training data, architectures and algorithms — outperforming human baselines

Published by skeptic

AI R&D runs on a cycle of hypothesis, experiment, and analysis, with each step demanding substantial manual engineering effort. A new framework called ASI-EVOLVE, developed by researchers at the Generative Artificial Intelligence Research Lab (SII-GAIR), aims to close that bottleneck by automating the full optimization loop for training data, model architectures, and learning algorithms. Designed as an agentic system for AI-for-AI research, it uses a continuous “learn-design-experiment-analyze” cycle to automate the optimization of the foundational AI stack. In experiments, this self-improvement loop autonomously discovered novel designs that significantly outperformed state-of-the-art human baselines. The system generated novel language model architectures, improved pretraining data pipelines to boost benchmark scores by over 18 points, and designed highly efficient reinforcement learning algorithms. For enterprise teams running repeated optimization cycles on their AI systems, the framework offers a path to reducing manual engineering overhead while matching or exceeding the performance of human-designed baselines.

The data and design bottleneck

Engineering teams can only explore a tiny fraction of the …

Mythos autonomously exploited vulnerabilities that survived 27 years of human review. Security teams need a new detection playbook

Published by skeptic

A 27-year-old bug sat inside OpenBSD’s TCP stack while auditors reviewed the code, fuzzers ran against it, and the operating system earned its reputation as one of the most security-hardened platforms on earth. Two packets could crash any server running it. Finding that bug cost a single Anthropic discovery campaign approximately $20,000. The specific model run that surfaced the flaw cost under $50. Anthropic’s Claude Mythos Preview found it. Autonomously. No human guided the discovery after the initial prompt.

The capability jump is not incremental

On Firefox 147 exploit writing, Mythos succeeded 181 times versus 2 for Claude Opus 4.6. A 90x improvement in a single generation. SWE-bench Pro: 77.8% versus 53.4%. CyberGym vulnerability reproduction: 83.1% versus 66.6%. Mythos saturated Anthropic’s Cybench CTF at 100%, forcing the red team to shift to real-world zero-day discovery as the only meaningful evaluation left. Then it surfaced thousands of zero-day vulnerabilities across every major operating system and every major browser, many one to two decades old. Anthropic engineers with no formal security training asked Mythos to find remote …

ServiceNow resolves 90% of its own IT requests autonomously. Now it wants to do the same for any enterprise

Published by skeptic

ServiceNow is handling 90% of its own employee IT requests autonomously, resolving cases 99% faster than human agents. On Thursday it announced the product technology it wants to use to do the same for everyone else. Organizations have spent three years running pilots that stall when AI gets to the execution layer. The agent can identify the problem and recommend a fix, then hand it back to a human because it lacks the permissions to finish the job or because no one trusts it to act autonomously inside a governed environment. The gap most teams are hitting isn’t capability. It’s governance and workflow continuity. ServiceNow’s answer is a new framework called Autonomous Workforce; a new employee-facing product called EmployeeWorks built on its December acquisition of Moveworks; and an underlying architectural approach it calls “role automation.”

From ticketing system to AI workforce

ServiceNow has been building toward this for two decades. The platform started as a ticketing system, evolved into a workflow automation engine, and spent the last two years layering AI onto that foundation through …

Xcode 26.3 Lets AI Agents From Anthropic and OpenAI Build Apps Autonomously

Published by skeptic

With Xcode 26.3, Apple is adding support for agentic coding, allowing developers to use tools like Anthropic’s Claude Agent and OpenAI’s Codex right in Xcode for app creation. Agentic coding will allow Xcode to complete more complex app development tasks autonomously. Claude, ChatGPT, and other AI models have been available for use in Xcode since Apple added intelligence features in Xcode 26, but until now, AI was limited and was not able to take action on its own. That will change with the option to use an AI coding assistant. AI models can access more of Xcode’s features to work toward a project goal, and Apple worked directly with Anthropic and OpenAI to configure their agents for use in Xcode. Agents can create new files, examine the structure of a project in Xcode, build a project directly and run tests, take image snapshots to double-check work, and access full Apple developer documentation that has been designed for AI agents. Adding an agent to Xcode can be done with a single click in the Xcode settings, …

Autopilot Autonomously Lands Plane After Cabin Loses Pressure, for the First Time Ever

Published by José Betancourt

An autopilot system took over a plane and pulled off an emergency landing completely autonomously. The nail-biting intervention took place after the twin-engine turboprop, a Beechcraft Super King Air, suddenly lost cabin pressure while flying across Colorado on December 20. Garmin’s Emergency Autoland system then took over, flew the plane, communicated with air traffic controllers, and made a fuss-free landing at Rocky Mountain Metropolitan Airport near Denver. “This was the first use of Autoland from start-to-finish in an actual emergency,” Garmin said in a statement, via CNN. Various forms of autoland systems are routinely used to land aircraft in tough weather conditions where visibility is poor, but not during emergencies. Garmin’s system, however, is part of an emerging line of autoland systems intended for emergency use only, and is designed to take “complete control of the flight to land the airplane” in situations “where the pilot is unable to fly,” according to the manufacturer. That it managed to effortlessly handle a real-world emergency here is a milestone in aviation safety. In this case, Garmin’s …

Skeptic Society Magazine

for honest conversations

