All posts tagged: token

Opus 4.7 vs. Opus 4.6: Is the 35% Token Increase Worth It?


Opus 4.7 brings a host of advancements, from refined coding accuracy to improved visual processing and a more intuitive user interface. Better Stack highlights how these upgrades enhance both functionality and precision, making the model a strong contender for diverse applications. One notable drawback, however, is increased token usage: up to 35% higher in certain configurations, particularly at the default “extra high” effort level. This change could affect users managing large-scale projects or operating within tight budgets, requiring careful adjustments to settings to balance cost and performance.

This article explores how Opus 4.7’s enhanced instruction-following can improve alignment with user intent, why its upgraded multimodal processing is well suited to combining text and visuals, and how its memory improvements streamline workflows for long-term projects. You’ll also gain practical strategies for managing token consumption and configuring the model to suit your specific needs. By understanding these nuances, you can make the most of Opus 4.7’s strengths while navigating its trade-offs effectively. …
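To make the budgeting impact concrete, here is a back-of-the-envelope sketch of what a 35% increase in output tokens does to a monthly bill. The per-token price and the baseline token count are made-up placeholders for illustration, not Anthropic’s actual pricing or any measured workload:

```python
# Hypothetical cost arithmetic for a 35% output-token increase.
# Both the price and the baseline volume are illustrative placeholders.

def estimated_cost(output_tokens: int, price_per_million: float) -> float:
    """Dollar cost for a given number of output tokens."""
    return output_tokens / 1_000_000 * price_per_million

baseline_tokens = 2_000_000   # tokens per month on the older model (placeholder)
increase = 1.35               # up to 35% more at the "extra high" effort level
price = 15.0                  # $ per 1M output tokens (placeholder)

old_cost = estimated_cost(baseline_tokens, price)
new_cost = estimated_cost(int(baseline_tokens * increase), price)
print(f"monthly cost: ${old_cost:.2f} -> ${new_cost:.2f}")
```

Under these assumed numbers, the same workload goes from $30.00 to $40.50 a month, which is why dialing down the effort setting matters for high-volume use.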

Grassroots venues call Labour rates relief ‘token gesture’ amid fight for survival


It’s no secret that the situation for Britain’s music venues has grown increasingly fraught, made worse by the Covid pandemic, the cost-of-living crisis, and shifting trends in alcohol consumption. That’s why, when Labour pledged to reduce business rates for pubs and music venues by 15 per cent back in January, many business owners across the country breathed a collective sigh of relief. But after the changes finally came into effect earlier this week, just how positive are the UK’s grassroots venues feeling about the future?

[Image: Punk project Total Con perform at The Lughole as part of its Noise Annoys Fest in 2025. Credit: Instagram / Alex Brown / @aroutinesearch]

Adam Regan, owner of the historic Hare & Hounds in south Birmingham, told The Independent that the new rates relief left much to be desired, even while he feels confident that his business is …

How xMemory cuts token costs and context bloat in AI agents


Standard RAG pipelines break when enterprises try to use them for long-term, multi-session LLM agent deployments. This is a critical limitation as demand for persistent AI assistants grows. xMemory, a new technique developed by researchers at King’s College London and The Alan Turing Institute, solves this by organizing conversations into a searchable hierarchy of semantic themes.

Experiments show that xMemory improves answer quality and long-range reasoning across various LLMs while cutting inference costs. According to the researchers, it drops token usage from over 9,000 to roughly 4,700 tokens per query compared to existing systems on some tasks. For real-world enterprise applications like personalized AI assistants and multi-session decision support tools, this means organizations can deploy more reliable, context-aware agents capable of maintaining coherent long-term memory without blowing up computational expenses.

RAG wasn’t built for this

In many enterprise LLM applications, a critical expectation is that these systems will maintain coherence and personalization across long, multi-session interactions. To support this long-term reasoning, one common approach is to use standard RAG: store past dialogues and events, retrieve …
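As a rough intuition for why theme-level retrieval cuts tokens, here is a toy sketch of the general idea: past dialogue is grouped under theme labels, and only the best-matching theme is placed in the prompt instead of the full history. This is an illustrative simplification with made-up data and naive keyword scoring, not the xMemory implementation:

```python
# Toy hierarchical-memory retrieval: group history by theme, fetch one theme.
# Illustrative only -- xMemory itself builds the hierarchy semantically.
from collections import defaultdict

def words(text: str) -> set[str]:
    """Crude tokenizer: lowercase words with trailing punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}

def build_theme_index(messages):
    """Group past messages under their (pre-assigned) theme labels."""
    index = defaultdict(list)
    for theme, text in messages:
        index[theme].append(text)
    return dict(index)

def retrieve(index, query):
    """Return only the theme whose messages best overlap the query,
    rather than feeding the entire history into the prompt."""
    q = words(query)
    best = max(index, key=lambda t: sum(len(q & words(m)) for m in index[t]))
    return index[best]

history = [
    ("travel", "Booked a flight to Lisbon for the conference."),
    ("travel", "Hotel is near the venue."),
    ("budget", "Q3 budget review moved to Friday."),
]
print(retrieve(build_theme_index(history), "budget review"))
# prints only the "budget" theme's messages, not all three
```

The token saving comes from the same place in the real system: the prompt carries one relevant branch of the memory hierarchy instead of the whole conversation log.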

ChatGPT 5.4 Thinking vs Earlier Models: Token Savings and Stronger Self-Checks


The integration of GPT-5.4 Thinking into frontend development introduces a new level of efficiency and precision, particularly through its enhanced Computer Use Ability (CUA). This feature allows the model to interact with digital systems in a human-like manner, eliminating the need for external environments and streamlining complex workflows. OpenAI highlights how ChatGPT 5.4 Thinking can handle intricate tasks, such as designing and testing a 3D chess game with advanced textures and rule adherence, all while significantly reducing computational overhead. These capabilities not only simplify technical processes but also prioritize high-quality output and usability.

In this overview, you’ll explore how ChatGPT 5.4 Thinking enables developers to convert design inputs, like images, into fully functional websites with accurate styling and responsive layouts. You’ll also learn how its self-checking mechanisms ensure alignment between design and output, reducing manual adjustments. Additionally, the model’s ability to manage concurrent processes, such as generating visual assets and validating functionality, offers practical insights into optimizing workflows for both small-scale and complex projects. This breakdown provides a clear look at how these features can …

DeepSeek V4 Adds Native Multimodal Input and 1M Token Context Window


The release of DeepSeek V4 introduces notable advancements in AI capabilities, emphasizing scalability and efficiency. One key feature is the 1 million token context window, which allows the system to process large datasets, such as full research papers or extensive codebases, without the need for segmentation. According to Universe of AI, this enhancement supports more comprehensive and faster analysis, making it particularly useful for professionals managing complex data workflows. Additionally, the integration of Nvidia’s Blackwell SM100 architecture improves computational performance while addressing energy efficiency concerns.

You’ll learn how DeepSeek V4’s native multimodal integration supports the simultaneous processing of text, images and other data types, streamlining diverse tasks within a single system. The guide also examines how these updates impact sectors like healthcare, education and finance, offering practical examples of their application. Finally, it explores the ethical considerations surrounding these developments, providing a balanced view of the challenges and opportunities in AI deployment.

DeepSeek V4 Highlights

TL;DR Key Takeaways: DeepSeek V4 introduces new features, including a 1 million token context window, native multimodal integration and …

Anthropic’s Claude Opus 4.6 brings 1M token context and ‘agent teams’ to take on OpenAI’s Codex


Anthropic on Thursday released Claude Opus 4.6, a major upgrade to its flagship artificial intelligence model that the company says plans more carefully, sustains longer autonomous workflows, and outperforms competitors including OpenAI’s GPT-5.2 on key enterprise benchmarks — a release that arrives at a tumultuous moment for the AI industry and global software markets.

The launch comes just three days after OpenAI released its own Codex desktop application in a direct challenge to Anthropic’s Claude Code momentum, and amid a $285 billion rout in software and services stocks that investors attribute partly to fears that Anthropic’s AI tools could disrupt established enterprise software businesses.

For the first time, Anthropic’s Opus-class models will feature a 1 million token context window, allowing the AI to process and reason across vastly more information than previous versions. The company also introduced “agent teams” in Claude Code — a research preview feature that enables multiple AI agents to work simultaneously on different aspects of a coding project, coordinating autonomously.

“We’re focused on building the most capable, reliable, and safe AI …

Breaking through AI’s memory wall with token warehousing


As agentic AI moves from experiments to real production workloads, a quiet but serious infrastructure problem is coming into focus: memory. Not compute. Not models. Memory. Under the hood, today’s GPUs simply don’t have enough space to hold the Key-Value (KV) caches that modern, long-running AI agents depend on to maintain context. The result is a lot of invisible waste — GPUs redoing work they’ve already done, cloud costs climbing, and performance taking a hit. It’s a problem that’s already showing up in production environments, even if most people haven’t named it yet.

At a recent stop on the VentureBeat AI Impact Series, WEKA CTO Shimon Ben-David joined VentureBeat CEO Matt Marshall to unpack the industry’s emerging “memory wall,” and why it’s becoming one of the biggest blockers to scaling truly stateful agentic AI — systems that can remember and build on context over time. The conversation didn’t just diagnose the issue; it laid out a new way to think about memory entirely, through an approach WEKA calls token warehousing.

The GPU memory problem

“When …
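A quick back-of-the-envelope calculation shows why KV caches hit the memory wall. The cache holds a key and a value vector per token, per layer, per KV head, so its size is 2 × tokens × layers × KV heads × head dimension × bytes per value. The model dimensions below are illustrative placeholders, not any specific production model:

```python
# Back-of-the-envelope KV-cache sizing for one sequence.
# Model dimensions are illustrative, not a specific model's.

def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes to cache keys AND values (hence the factor of 2)."""
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value

# A 1M-token context on a hypothetical 80-layer model with
# 8 KV heads of dimension 128, stored in fp16 (2 bytes/value):
gib = kv_cache_bytes(1_000_000, layers=80, kv_heads=8, head_dim=128) / 2**30
print(f"{gib:.1f} GiB of KV cache for a single sequence")
```

Even with these modest assumed dimensions, a single million-token sequence needs roughly 305 GiB of cache — more than any single GPU’s HBM — which is exactly the pressure that motivates moving cached tokens into a warehousing tier instead of recomputing them.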