Deep Dive

AI Agent Memory Systems Explained: RAG, Vector, SQLite & File-Based

Every AI agent faces the same fundamental problem: without memory, every conversation starts from scratch. Memory systems give agents the ability to learn from past interactions, remember user preferences, and maintain context across sessions. This guide compares five major approaches to agent memory — file-based, SQLite with full-text search, vector databases, RAG pipelines, and hybrid architectures — with real-world examples from Claude Code, Hermes, OpenClaw, and other production agents.

Table of Contents

  1. Why Memory Matters for AI Agents
  2. 5 Memory System Types
  3. Comparison Table
  4. How Popular Agents Handle Memory
  5. Worked Example: Hybrid Memory
  6. FAQ
  7. Related Resources

Why Memory Matters for AI Agents

Consider a coding agent that helps you build a web application over several weeks. Without memory, every session requires re-explaining your tech stack, coding conventions, database schema, and deployment setup. You waste the first five minutes of every conversation re-establishing context that should already be known.

Agent memory solves this by persisting critical information between sessions. A well-designed memory system enables three capabilities that transform an AI agent from a stateless chatbot into a genuine collaborator:

  • Context continuity — the agent remembers what happened in previous sessions and picks up where it left off
  • Personalization — the agent learns your preferences, coding style, and project conventions over time
  • Knowledge accumulation — the agent builds an expanding knowledge base from past interactions, decisions, and outcomes

The choice of memory system depends on your agent's use case, scale requirements, and infrastructure constraints. A personal coding assistant needs a very different memory architecture than an enterprise customer support bot handling thousands of conversations per day.

5 Memory System Types

Each memory system makes different tradeoffs between simplicity, search capability, scalability, and privacy. Below we break down the five major approaches used in production AI agents today.

File-Based Memory

How it works: Markdown files stored in the workspace, read by the agent at conversation start to restore context.

Used by: OpenClaw (.md workspace files), Claude Code (MEMORY.md + .claude/memory/)

Pros

  • + Simple to implement
  • + Human-readable and editable
  • + Version-controllable with git

Cons

  • - No semantic search capability
  • - Manual organization required
  • - Scales poorly past 1,000 entries

Best for: Personal agents, small teams, project-specific context
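As a minimal sketch of this pattern in TypeScript: the file name MEMORY.md follows the convention mentioned above, but loadMemory and rememberFact are illustrative helpers, not any particular agent's actual API.

```typescript
import { readFileSync, appendFileSync, existsSync, writeFileSync } from 'node:fs';

const MEMORY_PATH = 'MEMORY.md';

// Load persistent context at session start; empty string if no memory yet.
function loadMemory(path: string): string {
  return existsSync(path) ? readFileSync(path, 'utf8') : '';
}

// Append a newly learned fact as a dated bullet the user can edit by hand.
function rememberFact(path: string, fact: string): void {
  if (!existsSync(path)) writeFileSync(path, '# Agent Memory\n');
  appendFileSync(path, `- ${new Date().toISOString().slice(0, 10)}: ${fact}\n`);
}

rememberFact(MEMORY_PATH, 'User prefers TypeScript strict mode');
console.log(loadMemory(MEMORY_PATH));
```

Because the store is plain Markdown, the same file can be reviewed in a pull request or diffed with git, which is the main reason this approach suits small teams.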

SQLite + Full-Text Search

How it works: All conversations stored in a local SQLite database, retrieved via FTS5 keyword search with automatic summaries.

Used by: Hermes Agent (two-layer: MEMORY.md persistent + SQLite full history)

Pros

  • + Fast keyword search via FTS5
  • + Unlimited local capacity
  • + Fully local and private

Cons

  • - No semantic understanding
  • - Keyword mismatch: paraphrased queries return nothing
  • - Requires explicit indexing

Best for: High-volume conversation agents, research tasks, local-first workflows

Vector Database (Embeddings)

How it works: Text converted to numerical embeddings, stored in a vector database, and retrieved by cosine similarity search.

Used by: LangChain agents, Custom RAG pipelines, MemGPT

Pros

  • + Semantic understanding of content
  • + Finds related content even with different wording
  • + Scales to millions of entries

Cons

  • - Embedding quality varies by model
  • - Requires vector DB infrastructure
  • - Hallucination risk from fuzzy matches

Best for: Knowledge-heavy agents, customer support bots, document Q&A systems
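At its core, vector retrieval is cosine similarity over embeddings. The sketch below uses tiny hand-made 3-dimensional vectors standing in for real model embeddings, and a brute-force linear scan standing in for the approximate indexes (such as HNSW) that production vector databases use.

```typescript
type Entry = { id: string; vector: number[] };

// Cosine similarity: dot product of the vectors over the product of their norms.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force top-k nearest neighbors by cosine similarity.
function topK(query: number[], entries: Entry[], k: number): Entry[] {
  return [...entries]
    .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
    .slice(0, k);
}

// Toy "embeddings"; a real system would call an embedding model.
const store: Entry[] = [
  { id: 'db-schema-notes', vector: [0.9, 0.1, 0.0] },
  { id: 'deploy-runbook', vector: [0.1, 0.9, 0.1] },
  { id: 'style-guide', vector: [0.0, 0.2, 0.9] },
];

const results = topK([0.8, 0.2, 0.1], store, 2);
console.log(results.map((e) => e.id)); // closest entry first
```

Because similarity is geometric rather than lexical, a query embedding near db-schema-notes retrieves it even if the query never uses the word "schema". That same fuzziness is the source of the hallucination risk listed above.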

RAG (Retrieval-Augmented Generation)

How it works: Combines vector search with LLM generation — retrieve relevant document chunks, inject them into the prompt as context.

Used by: Enterprise chatbots, Perplexity-style agents, Custom knowledge bases

Pros

  • + Grounded in source documents
  • + Reduces hallucination significantly
  • + Supports up-to-date knowledge

Cons

  • - Chunk boundary issues lose context
  • - Retrieval quality is a ceiling
  • - Complex multi-stage pipeline

Best for: Enterprise knowledge bases, documentation agents, compliance-sensitive contexts
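The retrieve-then-inject step can be sketched as below. The keyword retriever here is a simplified stand-in for the vector search a real pipeline would use, and the chunk texts are invented for illustration.

```typescript
// Word-overlap scoring as a stand-in retriever (real RAG uses embeddings).
function tokens(s: string): Set<string> {
  return new Set(s.toLowerCase().split(/\W+/).filter((w) => w.length > 2));
}

function retrieve(query: string, chunks: string[], k: number): string[] {
  const q = tokens(query);
  return chunks
    .map((c) => ({ c, score: Array.from(tokens(c)).filter((w) => q.has(w)).length }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((x) => x.c);
}

// Inject retrieved chunks into the prompt with a grounding instruction.
function buildRagPrompt(question: string, chunks: string[]): string {
  const context = retrieve(question, chunks, 2)
    .map((c, i) => `[${i + 1}] ${c}`)
    .join('\n');
  return `Answer using only the sources below. Cite [n].\n\n${context}\n\nQuestion: ${question}`;
}

// Hypothetical knowledge-base chunks; a real pipeline would chunk documents.
const docs = [
  'Password resets are handled by the auth service via /auth/reset.',
  'Deployments to staging run through the CI pipeline on every merge.',
  'The design system uses 8px spacing increments.',
];

const prompt = buildRagPrompt('How do I reset a password?', docs);
console.log(prompt);
```

The "answer using only the sources" instruction is what grounds generation and reduces hallucination; the quality of the retrieve step remains the ceiling on answer quality, as noted above.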

Hybrid Memory

How it works: Combines multiple systems — persistent key-facts file + vector search for history + SQLite for structured data — each optimized for its data type.

Used by: Advanced custom agents, Hermes (file + SQLite), Production AI assistants

Pros

  • + Best of multiple approaches
  • + Optimized per data type
  • + Most flexible architecture

Cons

  • - Complex to build and maintain
  • - Potential consistency issues
  • - Higher operational overhead

Best for: Production-grade agents handling diverse tasks at scale

Comparison Table

This table summarizes the key tradeoffs across all five memory system types. Use it to quickly identify which approach fits your constraints.

System        Complexity  Semantic Search  Scalability  Privacy       Setup Time  Cost
File-Based    Low         No               Low          High          5 min       Free
SQLite + FTS  Medium      No               High         High          30 min      Free
Vector DB     High        Yes              Very High    Varies        1-2 hrs     $0-50/mo
RAG           High        Yes              Very High    Varies        2-4 hrs     $10-100/mo
Hybrid        Very High   Yes              Very High    Configurable  4-8 hrs     $10-100/mo

How Popular Agents Handle Memory

Different AI agents have adopted different memory strategies based on their design philosophy and target users. Here is how the most widely used agents approach the memory problem in production.

Claude Code

File-based MEMORY.md auto-updated after each session, plus .claude/memory/ directory for structured project context.

Hermes Agent

SQLite + file hybrid — MEMORY.md for persistent key facts, SQLite database for full conversation history with FTS5 search.

OpenClaw

Workspace .md files with optional semantic search layer for retrieving relevant context across large codebases.

Cursor

Project-level .cursorrules file for persistent instructions, plus codebase indexing for context-aware completions.

Codex (OpenAI)

AGENTS.md file for project-level instructions and conventions, read at session start to guide behavior.

Worked Example: Building a Hybrid Memory System

The most robust agent memory combines multiple layers, each optimized for a different type of data. Here is a conceptual architecture for a hybrid memory system that handles key facts, conversation history, and knowledge retrieval.

Architecture Overview

The hybrid system uses three layers working together:

  1. Layer 1 — Persistent Key Facts (File-Based): A MEMORY.md file stores critical user preferences, project conventions, and long-lived decisions. Read at every session start. Updated when the agent identifies new persistent facts.
  2. Layer 2 — Conversation History (SQLite + FTS5): Every conversation turn is stored in a SQLite database with full-text search indexing. The agent can search past conversations by keyword to recall specific discussions, decisions, or code snippets.
  3. Layer 3 — Knowledge Base (Vector Search): Documentation, code comments, and reference materials are embedded and stored in a vector database. When the agent needs domain knowledge, it performs semantic search to find relevant chunks and injects them into the prompt.

How It Works in Practice

// Conceptual hybrid memory flow. sqlite, vectorDB, embed, and
// extractKeywords stand in for whatever storage clients and helper
// functions your agent actually uses.
async function buildContext(userMessage: string) {
  // Layer 1: Always load persistent facts
  const keyFacts = await readFile('MEMORY.md');

  // Layer 2: Search conversation history for relevant past context
  const pastConversations = await sqlite.search(
    extractKeywords(userMessage), { limit: 5 }
  );

  // Layer 3: Semantic search knowledge base
  const relevantDocs = await vectorDB.similaritySearch(
    await embed(userMessage), { topK: 3 }
  );

  // Compose final context for LLM prompt
  return {
    systemContext: keyFacts,
    conversationMemory: pastConversations,
    knowledgeContext: relevantDocs,
  };
}

This architecture gives you the reliability of file-based memory (key facts are always available), the precision of keyword search (find exact past conversations), and the flexibility of semantic search (discover relevant knowledge even with different terminology). The tradeoff is increased complexity — you need to maintain three separate systems and handle potential inconsistencies between them.
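Writing to the right layer matters as much as reading from it. One possible routing rule, with invented categories and a deliberately crude heuristic, might look like this; real agents often let the LLM itself classify what is worth persisting where.

```typescript
type MemoryKind = 'key-fact' | 'conversation' | 'knowledge';

// Crude routing heuristic: durable preferences go to MEMORY.md,
// everything said goes to SQLite, reference material goes to the
// vector store.
function routeMemory(text: string, source: 'chat' | 'document'): MemoryKind {
  if (source === 'document') return 'knowledge';
  if (/\b(always|never|prefer|convention)\b/i.test(text)) return 'key-fact';
  return 'conversation';
}

console.log(routeMemory('I always prefer tabs over spaces', 'chat'));
console.log(routeMemory('Can you refactor this function?', 'chat'));
console.log(routeMemory('API reference for the billing service', 'document'));
```

Whatever rule you choose, applying it consistently at write time is what keeps the three layers from drifting apart, which is the consistency risk mentioned above.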

Frequently Asked Questions

What is AI agent memory?

AI agent memory refers to any system that allows an AI agent to retain and recall information across conversations. Without memory, every interaction starts from zero — the agent has no knowledge of prior decisions, user preferences, or project context. Memory systems solve this by persisting relevant information between sessions.

What is the difference between RAG and vector memory?

Vector memory stores text as embeddings in a vector database and retrieves by semantic similarity. RAG (Retrieval-Augmented Generation) builds on top of vector memory by adding a generation step — it retrieves relevant chunks and then injects them into the LLM prompt as context. RAG is a pattern that uses vector memory as one component.

Which memory system is best for a personal coding agent?

For personal coding agents, file-based memory (like MEMORY.md) is usually the best starting point. It is simple, human-readable, version-controllable with git, and requires zero infrastructure. Upgrade to SQLite or hybrid only when you consistently hit the limits of file-based search.

How does Claude Code handle memory?

Claude Code uses a file-based memory system. It maintains a MEMORY.md file that gets automatically updated with key decisions, preferences, and project context. Additionally, .claude/memory/ can hold structured memory files. These files are read at conversation start to restore context.

Can I combine multiple memory systems?

Yes, and this is called hybrid memory. A common pattern is: file-based memory for persistent key facts (user preferences, project conventions), SQLite for structured conversation history, and vector search for semantic retrieval across a knowledge base. The tradeoff is increased complexity.

What are the privacy implications of different memory systems?

File-based and SQLite memory are fully local and private — data never leaves your machine. Vector databases can be local (ChromaDB, Qdrant) or cloud-hosted (Pinecone, Weaviate). Cloud-hosted vector DBs and RAG pipelines that use external embedding APIs send your data to third-party servers. Choose local options for sensitive data.

How much memory data can an agent effectively use?

It depends on the system. File-based memory works well up to roughly a thousand entries. SQLite with FTS5 handles millions of rows efficiently. Vector databases scale to tens of millions of embeddings. However, the real bottleneck is the LLM context window. Even with perfect retrieval, you can only inject roughly 10K to 200K tokens per prompt, so retrieval quality matters more than raw storage capacity.