All posts tagged: RAG

Karpathy shares ‘LLM Knowledge Base’ architecture that bypasses RAG with an evolving markdown library maintained by AI

AI vibe coders have yet another reason to thank Andrej Karpathy, who coined the term. The former Director of AI at Tesla and co-founder of OpenAI, now running his own independent AI project, recently posted on X describing an “LLM Knowledge Bases” approach he’s using to manage various topics of research interest. By building a persistent, LLM-maintained record of his projects, Karpathy is solving the core frustration of “stateless” AI development: the dreaded context-limit reset. As anyone who has vibe coded can attest, hitting a usage limit or ending a session often feels like a lobotomy for your project. You’re forced to spend valuable tokens (and time) reconstructing context for the AI, hoping it “remembers” the architectural nuances you just established. Karpathy proposes something simpler, and more loosely and messily elegant, than the typical enterprise solution of a vector database and RAG pipeline. Instead, he outlines a system where the LLM itself acts as a full-time “research librarian,” actively compiling, linting, and interlinking Markdown (.md) files, the most LLM-friendly and compact data format. By diverting a …

Agents need vector search more than RAG ever did

What’s the role of vector databases in the agentic AI world? That’s a question that organizations have been coming to terms with in recent months. The narrative had real momentum. As large language models scaled to million-token context windows, a credible argument circulated among enterprise architects: purpose-built vector search was a stopgap, not infrastructure. Agentic memory would absorb the retrieval problem. Vector databases were a RAG-era artifact. The production evidence is running the other way. Qdrant, the Berlin-based open source vector search company, announced a $50 million Series B on Thursday, two years after a $28 million Series A. The timing is not incidental. The company is also shipping version 1.17 of its platform. Together, they reflect a specific argument: The retrieval problem did not shrink when agents arrived. It scaled up and got harder. “Humans make a few queries every few minutes,” Andre Zayarni, Qdrant’s CEO and co-founder, told VentureBeat. “Agents make hundreds or even thousands of queries per second, just gathering information to be able to make decisions.” That shift changes the infrastructure …

RAG vs Long Context: Best Fit for Enterprise Search

Large Language Models (LLMs) have transformed natural language processing, but their limitations, such as fixed training data and lack of real-time updates, pose challenges for certain applications. IBM Technology explores two prominent strategies for addressing these gaps: Retrieval-Augmented Generation (RAG) and long context. RAG integrates external data through embedding models and vector databases, making it ideal for dynamic datasets like enterprise knowledge bases. In contrast, long context uses expanded token capacities to process entire datasets directly, offering a streamlined approach for bounded tasks such as contract analysis or document summarization. This explainer by IBM provides a clear breakdown of when to choose RAG or long context based on your specific needs. You’ll learn how RAG’s retrieval mechanisms can handle evolving datasets efficiently while minimizing computational costs and why long context might be better suited for tasks requiring global reasoning across static datasets. By the end, you’ll have a practical understanding of how to align these approaches with your operational priorities. RAG vs Long Context TL;DR Key Takeaways: Large Language Models (LLMs) have advanced natural …
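The core of the tradeoff described above can be shown in a toy sketch: the RAG route retrieves only the most relevant chunks before prompting, instead of shipping the whole corpus into a long context window. The bag-of-words cosine "embedding" here is a deliberately crude stand-in for a real embedding model and vector database.

```python
# Toy RAG retrieval step: rank corpus chunks by similarity to the query and
# keep only the top k, rather than placing the entire corpus in context.
# The bag-of-words embedding is a stand-in for a real embedding model.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

The long-context alternative skips `retrieve` entirely and concatenates every chunk into the prompt, which is simpler but pays for every token on every call, which is why the explainer reserves it for bounded, static datasets.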

Databricks built a RAG agent it says can handle every kind of enterprise search

Most enterprise RAG pipelines are optimized for one search behavior. They fail silently on the others. A model trained to synthesize cross-document reports handles constraint-driven entity search poorly. A model tuned for simple lookup tasks falls apart on multi-step reasoning over internal notes. Most teams find out when something breaks. Databricks set out to fix that with KARL, short for Knowledge Agents via Reinforcement Learning. The company trained an agent across six distinct enterprise search behaviors simultaneously using a new reinforcement learning algorithm. The result, the company claims, is a model that matches Claude Opus 4.6 on a purpose-built benchmark at 33% lower cost per query and 47% lower latency, trained entirely on synthetic data the agent generated itself with no human labeling required. That comparison is based on KARLBench, which Databricks built to evaluate enterprise search behaviors. “A lot of the big reinforcement learning wins that we’ve seen in the community in the past year have been on verifiable tasks where there is a right and a wrong answer,” Jonathan Frankle, Chief AI Scientist …

SurrealDB 3.0 wants to replace your five-database RAG stack with one

Building retrieval-augmented generation (RAG) systems for AI agents often involves using multiple layers and technologies for structured data, vectors and graph information. In recent months it has also become increasingly clear that agentic AI systems need memory, sometimes referred to as contextual memory, to operate effectively. The complexity and synchronization of having different data layers to enable context can lead to performance and accuracy issues. It’s a challenge that SurrealDB is looking to solve. SurrealDB on Tuesday launched version 3.0 of its namesake database alongside a $23 million Series A extension, bringing total funding to $44 million. The company has taken a different architectural approach from relational databases like PostgreSQL, native vector databases like Pinecone or graph databases like Neo4j. The OpenAI engineering team recently detailed how it scaled Postgres to 800 million users using read replicas — an approach that works for read-heavy workloads. SurrealDB takes a different approach: store agent memory, business logic, and multi-modal data directly inside the database. Instead of synchronizing across multiple systems, vector search, graph traversal, and relational queries …

‘Observational memory’ cuts AI agent costs 10x and outscores RAG on long-context benchmarks

RAG isn’t always fast enough or intelligent enough for modern agentic AI workflows. As teams move from short-lived chatbots to long-running, tool-heavy agents embedded in production systems, those limitations are becoming harder to work around. In response, teams are experimenting with alternative memory architectures — sometimes called contextual memory or agentic memory — that prioritize persistence and stability over dynamic retrieval. One of the more recent implementations of this approach is “observational memory,” an open-source technology developed by Mastra, which was founded by the engineers who previously built and sold the Gatsby framework to Netlify. Unlike RAG systems that retrieve context dynamically, observational memory uses two background agents (Observer and Reflector) to compress conversation history into a dated observation log. The compressed observations stay in context, eliminating retrieval entirely. For text content, the system achieves 3-6x compression. For tool-heavy agent workloads generating large outputs, compression ratios hit 5-40x. The tradeoff is that observational memory prioritizes what the agent has already seen and decided over searching a broader external corpus, making it less suitable for open-ended …
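The Observer/Reflector loop described above can be sketched in miniature: each turn is compressed into a dated one-line observation that stays in context, and when the log exceeds a budget the Reflector trims it. The truncation heuristic and the drop-oldest-half policy are stand-ins for the LLM-driven agents Mastra actually uses; only the overall shape is taken from the description.

```python
# Toy observational-memory log: an "Observer" compresses each turn into a
# dated observation; a "Reflector" trims the log when it exceeds a budget.
# The summarization heuristic (truncation) stands in for a real LLM agent.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ObservationLog:
    budget: int = 5                                 # max observations kept in context
    entries: list[str] = field(default_factory=list)

    def observe(self, turn: str, day: date) -> None:
        summary = turn.strip().split(".")[0][:60]   # crude "Observer"
        self.entries.append(f"{day.isoformat()}: {summary}")
        if len(self.entries) > self.budget:
            self.reflect()

    def reflect(self) -> None:
        # crude "Reflector": drop the oldest half, keeping recent context
        self.entries = self.entries[len(self.entries) // 2:]

    def context(self) -> str:
        return "\n".join(self.entries)
```

The key property the article describes survives even in this sketch: the compressed log is what sits in the prompt, so there is no retrieval call at inference time, and the tradeoff is that anything the Reflector discards is gone.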

Enterprises are measuring the wrong part of RAG

Enterprises have moved quickly to adopt RAG to ground LLMs in proprietary data. In practice, however, many organizations are discovering that retrieval is no longer a feature bolted onto model inference — it has become a foundational system dependency. Once AI systems are deployed to support decision-making, automate workflows or operate semi-autonomously, failures in retrieval propagate directly into business risk. Stale context, ungoverned access paths and poorly evaluated retrieval pipelines do not merely degrade answer quality; they undermine trust, compliance and operational reliability. This article reframes retrieval as infrastructure rather than application logic. It introduces a system-level model for designing retrieval platforms that support freshness, governance and evaluation as first-class architectural concerns. The goal is to help enterprise architects, AI platform leaders, and data infrastructure teams reason about retrieval systems with the same rigor historically applied to compute, networking and storage. Retrieval as infrastructure — A reference architecture illustrating how freshness, governance, and evaluation function as first-class system planes rather than embedded application logic. Conceptual diagram created by the author. Why RAG breaks down at …

Most RAG systems don’t understand sophisticated documents — they shred them

By now, many enterprises have deployed some form of RAG. The promise is seductive: index your PDFs, connect an LLM and instantly democratize your corporate knowledge. But for industries dependent on heavy engineering, the reality has been underwhelming. Engineers ask specific questions about infrastructure, and the bot hallucinates. The failure isn’t in the LLM. The failure is in the preprocessing. Standard RAG pipelines treat documents as flat strings of text. They use “fixed-size chunking” (cutting a document every 500 characters). This works for prose, but it destroys the logic of technical manuals. It slices tables in half, severs captions from images, and ignores the visual hierarchy of the page. Improving RAG reliability isn’t about buying a bigger model; it’s about fixing the “dark data” problem through semantic chunking and multimodal textualization. Here is the architectural framework for building a RAG system that can actually read a manual. The fallacy of fixed-size chunking In a standard Python RAG tutorial, you split text by character count. In an enterprise PDF, this is disastrous. If a safety specification table spans 1,000 tokens, …
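The table-shredding failure mode is easy to demonstrate. In this sketch, a fixed-size splitter cuts a small specification table mid-cell, while a simple structure-aware splitter (splitting on blank lines as a stand-in for full semantic chunking) keeps each paragraph and the whole table intact.

```python
# Fixed-size chunking vs a minimal structure-aware alternative. Splitting on
# blank lines is a stand-in for real semantic chunking, but it already keeps
# tables, captions, and paragraphs whole where the fixed splitter shears them.
def fixed_size_chunks(text: str, size: int) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def structural_chunks(text: str) -> list[str]:
    # split on blank lines so each block (prose, table, caption) stays whole
    return [block.strip() for block in text.split("\n\n") if block.strip()]
```

A retriever can only return what a chunk contains; once the fixed splitter separates a value from its unit or a row from its header, no amount of model quality downstream recovers the association.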

Claude Code RAG Application Guide: FastAPI, React & Supabase Setup

What if you could build an AI system that not only retrieves information with pinpoint accuracy but also adapts dynamically to complex tasks? Below, The AI Automators breaks down how to create a full-stack Retrieval-Augmented Generation (RAG) application in a detailed YouTube video, offering a step-by-step approach to mastering this innovative technology. With eight carefully designed modules, this analysis dives deep into the essential components of RAG development, from context management to hybrid search techniques. The result? A system that doesn’t just process data but transforms it into actionable, contextually relevant insights. Whether you’re a seasoned developer or new to AI, this guide promises to reshape the way you approach intelligent system design. In this guide, you’ll uncover the secrets behind building a scalable and highly adaptable RAG system that integrates seamlessly with private, domain-specific data. Learn how to harness advanced features like text-to-SQL queries, metadata extraction, and sub-agents to handle complex workflows with ease. The video doesn’t stop at the technical details; it also addresses common challenges like token optimization and database management, …

Contextual AI launches Agent Composer to turn enterprise RAG into production-ready AI agents

In the race to bring artificial intelligence into the enterprise, a small but well-funded startup is making a bold claim: The problem holding back AI adoption in complex industries has never been the models themselves. Contextual AI, a two-and-a-half-year-old company backed by investors including Bezos Expeditions and Bain Capital Ventures, on Monday unveiled Agent Composer, a platform designed to help engineers in aerospace, semiconductor manufacturing, and other technically demanding fields build AI agents that can automate the kind of knowledge-intensive work that has long resisted automation. The announcement arrives at a pivotal moment for enterprise AI. Four years after ChatGPT ignited a frenzy of corporate AI initiatives, many organizations remain stuck in pilot programs, struggling to move experimental projects into full-scale production. Chief financial officers and business unit leaders are growing impatient with internal efforts that have consumed millions of dollars but delivered limited returns. Douwe Kiela, Contextual AI’s chief executive, believes the industry has been focused on the wrong bottleneck. “The model is almost commoditized at this point,” Kiela said in an interview with …