All posts tagged: parameter

A 0.12% parameter add-on gives AI agents the working memory RAG can’t

AI agents forget. Every time a coding assistant loses track of a debugging thread, or a data analysis agent re-ingests the same context it already processed, the team pays in latency, token costs, and brittle workflows. The fix most teams reach for (expanding the context window or adding more RAG) is increasingly expensive and still doesn’t reliably work.

To address this, researchers from Mind Lab and several universities proposed delta-mem, an efficient technique that compresses the model’s historical information into a dynamically updated matrix without changing the model itself. The resulting module adds just 0.12% of the backbone model’s parameters, compared to 76.40% for one leading alternative, while outperforming it on memory-heavy benchmarks. Delta-mem allows models to continuously accumulate and reuse historical data, reducing the reliance on massive context windows or complex external retrieval modules for behavioral continuity.

The long memory challenge

The conventional solution is to simply dump all the information into the model’s context window. But as Jingdi Lei, co-author of the paper, told VentureBeat, current systems treat memory …
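To put the two overhead percentages quoted above on a common scale, here is a minimal arithmetic sketch. The 8-billion-parameter backbone is an assumption chosen only for illustration (the excerpt does not state a backbone size), and `added_parameters` is a hypothetical helper, not part of delta-mem.

```python
def added_parameters(backbone_params: int, overhead_percent: float) -> int:
    """Extra parameters implied by an overhead given as a percentage of the backbone."""
    return round(backbone_params * overhead_percent / 100)


# ASSUMPTION: a hypothetical 8-billion-parameter backbone, used only as an example.
ASSUMED_BACKBONE_PARAMS = 8_000_000_000

# Percentages as quoted in the excerpt: 0.12% for delta-mem, 76.40% for the alternative.
delta_mem_extra = added_parameters(ASSUMED_BACKBONE_PARAMS, 0.12)
alternative_extra = added_parameters(ASSUMED_BACKBONE_PARAMS, 76.40)

print(f"delta-mem add-on:   {delta_mem_extra:,} parameters")
print(f"alternative add-on: {alternative_extra:,} parameters")
```

Under that assumed backbone size, the add-on would be about 9.6 million parameters versus roughly 6.1 billion for the alternative. The ratio between the two stays the same whatever backbone size is plugged in.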

DeepSeek 4 Release: 1.6T Parameter Open-Source AI Model Details

DeepSeek 4 introduces two open-source language models designed to meet varying computational requirements, as detailed by Prompt Engineering. The Pro model, with 1.6 trillion parameters, is optimized for tasks demanding high precision and processing power, while the Flash model, featuring 284 billion parameters, is suited for environments with limited resources. Both models include a 1 million token context window, allowing them to process extensive text sequences. A notable feature, compressed sparse attention, reduces memory usage during token generation, allowing efficient operation even on less capable hardware.

Discover how these models perform in areas such as technical problem-solving and large-scale content generation. Learn about specific efficiency gains, including a 27% reduction in resource consumption for the Pro model, and explore their open-source framework, which supports customization and collaborative development. Additionally, understand their hardware compatibility and how their pricing structure aligns with cost-conscious organizational needs.

Key Features and Model Variants

TL;DR Key Takeaways: DeepSeek 4 introduces two models: the Pro Model with 1.6 trillion parameters for high-demand applications and the Flash Model with 284 …

LLMs contain a LOT of parameters. But what’s a parameter?

When a model is trained, each word in its vocabulary is assigned a numerical value that captures the meaning of that word in relation to all the other words, based on how the word appears in countless examples across the model’s training data.

Each word gets replaced by a kind of code?

Yeah. But there’s a bit more to it. The numerical value (the embedding) that represents each word is in fact a list of numbers, with each number in the list representing a different facet of meaning that the model has extracted from its training data. The length of this list of numbers is another thing that LLM designers can specify before an LLM is trained. A common size is 4,096.

Every word inside an LLM is represented by a list of 4,096 numbers?

Yup, that’s an embedding. And each of those numbers is tweaked during training. An LLM with embeddings that are 4,096 numbers long is said to have 4,096 dimensions.

Why 4,096?

It might look like a strange number. But LLMs (like anything that …
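The idea of an embedding as a list of 4,096 numbers per word can be sketched in a few lines of Python. This is a toy illustration, not how any particular LLM is built: the three-word vocabulary, the random starting values, the 50,000-entry vocabulary size, and the helper names are all assumptions made for the example.

```python
import random


def embedding_parameter_count(vocab_size: int, embedding_dim: int) -> int:
    """Count the numbers in an embedding table: one list of
    `embedding_dim` values for every entry in the vocabulary."""
    return vocab_size * embedding_dim


def build_embedding_table(vocab, embedding_dim, seed=0):
    """Give each word a list of `embedding_dim` random starting values.
    Training would later adjust every one of these numbers."""
    rng = random.Random(seed)
    return {
        word: [rng.uniform(-1.0, 1.0) for _ in range(embedding_dim)]
        for word in vocab
    }


# Toy vocabulary (hypothetical); a real model's vocabulary is far larger.
toy_vocab = ["cat", "dog", "parameter"]
table = build_embedding_table(toy_vocab, embedding_dim=4096)
print(len(table["cat"]))  # length of the list that represents one word

# ASSUMPTION: a 50,000-entry vocabulary, chosen only for illustration.
ASSUMED_VOCAB_SIZE = 50_000
total = embedding_parameter_count(ASSUMED_VOCAB_SIZE, 4096)
print(f"{total:,} numbers in the embedding table")
```

With the assumed 50,000-entry vocabulary, the embedding table alone holds 204,800,000 numbers, each of which counts as a parameter that training adjusts.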