All posts tagged: DeepSeeks

How DeepSeek’s radical architecture is shattering Silicon Valley’s token moat

DeepSeek’s announcement over the weekend that it has made the 75% price cut on its flagship V4 Pro model permanent is a disruptive assault on the capital-heavy business models of Silicon Valley’s frontier labs. The reduced price of DeepSeek V4 Pro directly undercuts comparable Western models used as workhorses for enterprise production. It is 7x cheaper on inputs and 17x cheaper on outputs than Anthropic’s Claude Sonnet or OpenAI’s GPT 5.5-Med, while the lightweight DeepSeek V4 Flash undercuts entry-tier alternatives like Claude Haiku by 10x to 25x. The price cuts are enabled by a series of hardware-software innovations, especially around caching, that make DeepSeek’s models radically more efficient to run. When hosted natively in China, DeepSeek’s cache-read pricing is a whopping 87x cheaper than on Western clouds, a deflationary floor so aggressive that handset giant Xiaomi just moved to match the exact pricing tier for its newly deployed MiMo architecture. DeepSeek V4 Pro’s performance ranks nearly on par with Western frontier models, hitting 80.6% on coding-agent tasks via the SWE-bench Verified leaderboard and an elite …
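Per-direction ratios like "7x on inputs, 17x on outputs" only turn into a bill once you fix a workload mix. The sketch below is a minimal illustration, assuming hypothetical placeholder reference prices (not figures from the article) and the reported 7x/17x ratios; the helper names are illustrative.

```python
# Hypothetical placeholder prices in USD per million tokens for a Western
# reference model. These are NOT quoted rates from the article.
REF_INPUT_PRICE = 1.00
REF_OUTPUT_PRICE = 5.00

# Ratios reported in the article: ~7x cheaper on inputs, ~17x on outputs.
INPUT_RATIO = 7.0
OUTPUT_RATIO = 17.0


def cost_usd(input_tokens: int, output_tokens: int, in_price: float, out_price: float) -> float:
    """Cost of a workload given per-million-token input and output prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def blended_savings(input_tokens: int, output_tokens: int) -> float:
    """How many times cheaper the whole workload is under the two per-direction ratios."""
    ref = cost_usd(input_tokens, output_tokens, REF_INPUT_PRICE, REF_OUTPUT_PRICE)
    cheap = cost_usd(
        input_tokens,
        output_tokens,
        REF_INPUT_PRICE / INPUT_RATIO,
        REF_OUTPUT_PRICE / OUTPUT_RATIO,
    )
    return ref / cheap


def multiple_after_cut(cut: float) -> float:
    """A price cut of `cut` (0.75 means 75%) makes the new price this many times cheaper."""
    return 1.0 / (1.0 - cut)


if __name__ == "__main__":
    print(f"input-heavy:  {blended_savings(10_000_000, 1_000_000):.1f}x")
    print(f"output-heavy: {blended_savings(1_000_000, 10_000_000):.1f}x")
    print(f"75% cut:      {multiple_after_cut(0.75):.0f}x")
```

Because the blended multiple is a spend-weighted harmonic mean of the two ratios, it always lands between 7x and 17x; with these placeholder prices, input-heavy workloads such as long-context retrieval sit closer to 7x and output-heavy generation closer to 17x. The same arithmetic shows that a 75% cut on its own makes the new price 4x cheaper than the old one.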

The Download: DeepSeek’s latest AI breakthrough, and the race to build world models

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 China has blocked Meta’s $2 billion acquisition of AI startup Manus
Regulators cited national security grounds. (WSJ $)
+ Beijing called the deal a “conspiratorial” attempt to hollow out its tech base. (FT $)
+ The country is tightening its grip on AI firms that try to leave. (TechCrunch)
+ The decision escalates China’s AI rivalry with the US. (Bloomberg $)
+ But there will be no winners in their competition. (MIT Technology Review)

2 Google is investing up to $40 billion in Anthropic
In a deal valuing the AI firm at $350 billion. (CNBC)
+ The funding will support the firm’s growing computing needs. (TechCrunch)
+ Anthropic and OpenAI are fighting for compute capacity. (Axios)

3 President Trump just fired the entire National Science Board
The NSF has played a crucial role in developing technology. (The Verge)
+ The move heightens fears over political interference in US science. (Nature)

4 Conspiracy theories about the Washington shooting are proliferating online
Over 300,000 posts appeared on X using the keyword “staged.” …

Three reasons why DeepSeek’s new model matters

In terms of performance, V4 is, perhaps unsurprisingly, a huge jump from R1, and it seems to be a strong alternative to just about all the latest big AI models. On the major benchmarks, according to results shared by the company, DeepSeek V4-Pro competes with leading closed-source models, matching the performance of Anthropic’s Claude-Opus-4.6, OpenAI’s GPT-5.4, and Google’s Gemini-3.1. And compared with other open-source models, such as Alibaba’s Qwen-3.5 or Z.ai’s GLM-5.1, DeepSeek V4 beats them all on coding, math, and STEM problems, making it one of the strongest open-source models ever released. DeepSeek also says that V4-Pro now ranks among the strongest open-source models on benchmarks for agentic coding tasks and performs well on other tests that measure the ability to work through multistep problems. Its writing ability and world knowledge also lead the field, according to benchmarking results shared by the company. In a technical report released alongside the model, DeepSeek shared results from an internal survey of 85 experienced developers: more than 90% included V4-Pro among their top model choices for coding tasks. DeepSeek …

DeepSeek’s conditional memory fixes silent LLM waste: GPU cycles lost to static lookups

When an enterprise LLM retrieves a product name, technical specification, or standard contract clause, it uses expensive GPU computation designed for complex reasoning just to access static information. This happens millions of times per day, and each lookup wastes cycles and inflates infrastructure costs. DeepSeek’s newly released research on “conditional memory” addresses this architectural limitation directly. The work introduces Engram, a module that separates static pattern retrieval from dynamic reasoning, and its results challenge assumptions about what memory is actually for in neural networks. The paper was co-authored by DeepSeek founder Liang Wenfeng. Through systematic experiments, DeepSeek found that the optimal balance between computation and memory allocates 75% of sparse model capacity to dynamic reasoning and 25% to static lookups. This memory system improved reasoning more than knowledge retrieval: complex reasoning benchmarks jumped from 70% to 74% accuracy, while knowledge-focused tests improved from 57% to 61%. These results come from benchmarks including Big-Bench Hard, ARC-Challenge, and MMLU. The research arrives as enterprises face mounting pressure to deploy more capable AI systems while navigating GPU …
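The split between static lookup and dynamic computation can be illustrated in a few lines. This is a toy under stated assumptions, not DeepSeek’s Engram: the hypothetical `NGramMemory` class hashes the last `n` token ids into a fixed table (a constant-time read with no matrix multiply), and the hypothetical `gated_merge` adds the retrieved row to the hidden state through a sigmoid gate. The table contents are random stand-ins for learned weights, and `split_sparse_budget` simply applies the 75/25 ratio reported above.

```python
import hashlib
import math
import random


class NGramMemory:
    """Toy static memory: hashed token n-grams index a fixed table of vectors.

    Hypothetical illustration of a conditional-memory path, not DeepSeek's Engram code.
    """

    def __init__(self, num_slots: int, dim: int, n: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.n = n
        self.num_slots = num_slots
        # Random stand-ins for what would be learned embedding rows in a real model.
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_slots)]

    def slot(self, tokens: list[int]) -> int:
        """Hash the last n token ids to a row index: one hash, no matrix multiply."""
        key = ",".join(str(t) for t in tokens[-self.n:]).encode()
        digest = hashlib.blake2b(key, digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.num_slots

    def lookup(self, tokens: list[int]) -> list[float]:
        return self.table[self.slot(tokens)]


def gated_merge(hidden: list[float], memory: list[float]) -> list[float]:
    """Add the retrieved vector to the hidden state, scaled by a context-dependent gate."""
    score = sum(h * m for h, m in zip(hidden, memory)) / math.sqrt(len(hidden))
    gate = 1.0 / (1.0 + math.exp(-score))  # sigmoid, so the gate stays in (0, 1)
    return [h + gate * m for h, m in zip(hidden, memory)]


def split_sparse_budget(total_params: int, memory_share: float = 0.25) -> tuple[int, int]:
    """Return (reasoning_params, memory_params) for the reported 75/25 split."""
    memory = int(total_params * memory_share)
    return total_params - memory, memory


if __name__ == "__main__":
    mem = NGramMemory(num_slots=1024, dim=8, n=2)
    hidden = [0.1] * 8
    # Same trailing bigram (2, 3) maps to the same slot, whatever came earlier.
    print(mem.slot([1, 2, 3]) == mem.slot([7, 2, 3]))
    print(gated_merge(hidden, mem.lookup([1, 2, 3])))
    print(split_sparse_budget(1_000_000_000))
```

The design point the sketch tries to capture is that `lookup` costs one hash and one row read regardless of model size, so a recurring string such as a product name can be fetched rather than recomputed, while the gate lets context decide how much of the retrieved vector to use.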

How DeepSeek’s new way to train advanced AI models could disrupt everything – again

ZDNET’s key takeaways
DeepSeek debuted Manifold-Constrained Hyper-Connections, or mHC.
The method offers a way to scale LLMs without incurring huge costs.
The company postponed the release of its R2 model in mid-2025.

Just before the start of the new year, the AI world was introduced to a potentially game-changing method for training advanced models. A team of researchers from Chinese AI firm DeepSeek released a paper on Wednesday outlining what it called Manifold-Constrained Hyper-Connections, or mHC for short, which may give engineers a way to build and scale large language models without the huge computational costs typically required.

DeepSeek leapt into the cultural spotlight one year ago with its release of R1, a model that rivaled the capabilities of OpenAI’s o1 and was reportedly trained at a fraction of the cost. The release came as a shock to US-based tech developers, because it showed that access to …
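The excerpt does not spell out how mHC works. Assuming the "manifold constraint" is that the matrix mixing the widened residual streams of hyper-connections must be doubly stochastic (every row and column sums to 1), the constraint itself can be sketched with Sinkhorn-Knopp normalization. `sinkhorn` and `mix_streams` are hypothetical helper names, and a real implementation would operate on batched tensors inside each layer rather than on Python lists.

```python
import math


def sinkhorn(logits: list[list[float]], iters: int = 50) -> list[list[float]]:
    """Map a square matrix to an (approximately) doubly stochastic one via Sinkhorn-Knopp."""
    # exp() makes every entry strictly positive, which Sinkhorn-Knopp requires.
    m = [[math.exp(x) for x in row] for row in logits]
    n = len(m)
    for _ in range(iters):
        # Normalize rows so each sums to 1.
        row_sums = [sum(row) for row in m]
        m = [[x / s for x in row] for row, s in zip(m, row_sums)]
        # Normalize columns so each sums to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def mix_streams(h: list[list[float]], streams: list[list[float]]) -> list[list[float]]:
    """One residual-mixing step: output stream i is sum_j h[i][j] * streams[j]."""
    dim = len(streams[0])
    return [
        [sum(h[i][j] * streams[j][d] for j in range(len(streams))) for d in range(dim)]
        for i in range(len(h))
    ]


if __name__ == "__main__":
    h = sinkhorn([[0.5, -1.0, 2.0], [1.0, 0.0, -0.5], [3.0, 1.0, 0.2]], iters=200)
    print("row sums:", [round(sum(row), 6) for row in h])
    print("col sums:", [round(sum(h[i][j] for i in range(3)), 6) for j in range(3)])
    print(mix_streams(h, [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]))
```

Because each column sums to 1, mixing conserves the total signal across streams, and because each row sums to 1, every output stream is a weighted average of the inputs. That is one way to keep many stacked layers from amplifying the residual signal, the kind of training stability a constraint like this is meant to provide when scaling up.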