Agent Memory Systems

How agents maintain context, learn from past interactions, and build persistent knowledge across sessions.

Why Memory Matters

Without memory systems, every agent interaction starts from scratch. Memory enables agents to:

  • Remember user preferences and past decisions
  • Learn from successful (and failed) task completions
  • Maintain context across long conversations
  • Build knowledge bases from interactions
  • Personalize responses based on history

Key Insight

Memory frameworks such as Mem0 report token reductions of roughly 80% while preserving fidelity, achieved through intelligent fact extraction, summarization, and retrieval. Instead of replaying the entire conversation history on every turn, store the relevant facts and retrieve only those the current request needs.
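
As a minimal sketch of what that looks like in a prompt, the agent sends a handful of retrieved facts rather than the whole transcript (the search_memories helper below is hypothetical, standing in for whatever memory backend you use):

def build_prompt(user_message: str, user_id: str) -> str:
    # Retrieve a few stored facts instead of replaying the full history.
    # search_memories() is a hypothetical helper over your memory store.
    facts = search_memories(query=user_message, user_id=user_id, top_k=5)
    fact_block = "\n".join(f"- {fact}" for fact in facts)
    return (
        f"Relevant facts about this user:\n{fact_block}\n\n"
        f"Current request: {user_message}"
    )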

Types of Agent Memory

Memory Hierarchy
┌─────────────────────────────────────────────────────────────┐
│                    Memory Architecture                       │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│  WORKING MEMORY                                              │
│  ─────────────────                                           │
│  Current conversation in context window                      │
│  Scope: Current turn │ Storage: Context window               │
│  Capacity: Model's context limit (4K-1M tokens)              │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  SHORT-TERM MEMORY                                           │
│  ─────────────────────                                       │
│  Facts extracted from current session                        │
│  Scope: Current session │ Storage: In-memory                 │
│  Example: "User asked about Python decorators"               │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  LONG-TERM MEMORY                                            │
│  ────────────────────                                        │
│  Persistent facts and knowledge                              │
│  Scope: Cross-session │ Storage: Vector DB                   │
│  Example: "User prefers concise responses"                   │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  EPISODIC MEMORY                                             │
│  ────────────────────                                        │
│  Specific past experiences and outcomes                      │
│  Scope: Cross-session │ Storage: Indexed experiences         │
│  Example: "Task X succeeded with approach Y"                 │
└─────────────────────────────────────────────────────────────┘
Type        Scope                   Implementation         Use Case
──────────  ──────────────────────  ─────────────────────  ────────────────────────────
Working     Current conversation    Context window         Immediate task context
Short-term  Current session         In-memory store        Session-specific facts
Long-term   Cross-session           Vector DB + metadata   User preferences, knowledge
Episodic    Specific interactions   Indexed experiences    Learning from past tasks

Comparison of memory types

Memory System Implementation
Pseudocode:

class AgentMemory:
    # Working Memory: Current conversation context
    workingMemory = []  # Lives in context window

    # Short-term Memory: Current session facts
    shortTermMemory = InMemoryStore()

    # Long-term Memory: Persistent across sessions
    longTermMemory = VectorDatabase()

    # Episodic Memory: Specific past experiences
    episodicMemory = IndexedExperienceStore()

    function remember(information, memoryType):
        if memoryType == "working":
            workingMemory.append(information)
        elif memoryType == "short_term":
            shortTermMemory.store(information)
        elif memoryType == "long_term":
            embedding = embed(information)
            longTermMemory.store(embedding, information)
        elif memoryType == "episodic":
            episode = createEpisode(information)
            episodicMemory.index(episode)

    function recall(query, memoryTypes = ["all"]):
        results = []

        if "working" in memoryTypes or "all" in memoryTypes:
            results += searchWorkingMemory(query)

        if "short_term" in memoryTypes or "all" in memoryTypes:
            results += shortTermMemory.search(query)

        if "long_term" in memoryTypes or "all" in memoryTypes:
            embedding = embed(query)
            results += longTermMemory.similaritySearch(embedding)

        if "episodic" in memoryTypes or "all" in memoryTypes:
            results += episodicMemory.searchRelevant(query)

        return rankAndMerge(results)

Python (LangChain):

from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain.memory import ConversationSummaryBufferMemory
from langchain_core.messages import HumanMessage, AIMessage
from datetime import datetime

class AgentMemorySystem:
    def __init__(self, persist_directory: str = "./memory_store"):
        # Working memory: LangChain's conversation buffer with summarization
        self.llm = ChatOpenAI(model="gpt-4")
        self.working_memory = ConversationSummaryBufferMemory(
            llm=self.llm,
            max_token_limit=2000,
            return_messages=True
        )

        # Short-term: In-memory for current session
        self.short_term: list[dict] = []

        # Long-term: Chroma with LangChain integration
        self.embeddings = OpenAIEmbeddings()
        self.long_term = Chroma(
            collection_name="agent_memory",
            embedding_function=self.embeddings,
            persist_directory=persist_directory
        )

    def add_to_working_memory(self, human_msg: str, ai_msg: str):
        """Add exchange to working memory with auto-summarization."""
        self.working_memory.save_context(
            {"input": human_msg},
            {"output": ai_msg}
        )

    def store_short_term(self, content: str, metadata: dict = None):
        """Store fact for current session."""
        self.short_term.append({
            "content": content,
            "metadata": metadata or {},
            "timestamp": datetime.now()
        })

    def store_long_term(self, content: str, metadata: dict = None):
        """Store in persistent vector database."""
        self.long_term.add_texts(
            texts=[content],
            metadatas=[metadata or {}],
            ids=[f"mem_{datetime.now().timestamp()}"]
        )

    def recall(self, query: str, n_results: int = 5) -> list[str]:
        """Retrieve relevant memories from all sources."""
        results = []

        # Search long-term memory with similarity search
        long_term_docs = self.long_term.similarity_search(
            query, k=n_results
        )
        results.extend([doc.page_content for doc in long_term_docs])

        # Include relevant short-term memories
        for mem in self.short_term:
            if self._is_relevant(query, mem["content"]):
                results.append(mem["content"])

        # Get summarized working memory
        working = self.working_memory.load_memory_variables({})
        if working.get("history"):
            results.append(f"Recent context: {working['history']}")

        return results[:n_results]

    def _is_relevant(self, query: str, content: str) -> bool:
        """Naive keyword-overlap check; swap in an embedding comparison for production."""
        query_terms = set(query.lower().split())
        return bool(query_terms & set(content.lower().split()))

C# (Microsoft.Extensions.AI):

using Microsoft.Extensions.AI;
using Microsoft.Extensions.VectorData;
using Azure.AI.OpenAI;

public class AgentMemorySystem
{
    private readonly List<ChatMessage> _workingMemory = new();
    private readonly List<MemoryRecord> _shortTermMemory = new();
    private readonly IVectorStore _vectorStore;
    private readonly IEmbeddingGenerator<string, Embedding<float>> _embedder;
    private readonly string _collectionName = "agent_memories";

    public AgentMemorySystem(
        IVectorStore vectorStore,
        IEmbeddingGenerator<string, Embedding<float>> embedder)
    {
        _vectorStore = vectorStore;
        _embedder = embedder;
    }

    // Working Memory: Current conversation
    public void AddToWorkingMemory(ChatMessage message)
    {
        _workingMemory.Add(message);

        if (CountTokens(_workingMemory) > 6000)
        {
            CompressWorkingMemory();
        }
    }

    // Short-term: Current session only
    public void StoreShortTerm(string content, Dictionary<string, string>? metadata = null)
    {
        _shortTermMemory.Add(new MemoryRecord
        {
            Content = content,
            Type = MemoryType.ShortTerm,
            Timestamp = DateTime.UtcNow,
            Metadata = metadata ?? new()
        });
    }

    // Long-term: Persistent with vector search
    public async Task StoreLongTermAsync(string content, string? id = null)
    {
        var memoryId = id ?? $"mem_{DateTime.UtcNow.Ticks}";
        var embedding = await _embedder.GenerateEmbeddingAsync(content);

        var collection = _vectorStore.GetCollection<string, MemoryRecord>(_collectionName);
        await collection.UpsertAsync(new MemoryRecord
        {
            Id = memoryId,
            Content = content,
            Embedding = embedding.Vector,
            Timestamp = DateTime.UtcNow
        });
    }

    // Recall relevant memories using vector similarity
    public async Task<List<string>> RecallAsync(string query, int limit = 5)
    {
        var results = new List<string>();
        var queryEmbedding = await _embedder.GenerateEmbeddingAsync(query);

        var collection = _vectorStore.GetCollection<string, MemoryRecord>(_collectionName);
        var searchResults = await collection.VectorizedSearchAsync(
            queryEmbedding.Vector,
            new VectorSearchOptions { Top = limit }
        );

        await foreach (var result in searchResults.Results)
        {
            results.Add(result.Record.Content);
        }

        // Include relevant short-term memories
        results.AddRange(_shortTermMemory
            .Where(m => IsRelevant(query, m.Content))
            .Select(m => m.Content)
            .Take(limit));

        return results.Take(limit).ToList();
    }

    // Token counting, relevance checking, and compression are stubbed here;
    // the MemoryRecord / MemoryType definitions are omitted for brevity.
    private void CompressWorkingMemory() { /* ... */ }
    private int CountTokens(IEnumerable<ChatMessage> messages) { /* ... */ return 0; }
    private bool IsRelevant(string query, string content) { /* ... */ return true; }
}

Working Memory: Summarization

The context window has limits. When conversations get long, you need strategies to compress history while preserving important information:

Conversation Summarization
Pseudocode:

class ConversationSummarizer:
    threshold = 4000  # tokens
    summaryRatio = 0.3  # Compress to 30%

    function maybeCompress(messages):
        tokens = countTokens(messages)

        if tokens < threshold:
            return messages

        # Split into chunks to summarize
        toSummarize = messages[:-5]  # Keep recent 5
        recent = messages[-5:]

        # Generate summary
        summary = llm.generate(
            prompt: "Summarize this conversation concisely:",
            content: toSummarize
        )

        # Return compressed version
        return [
            systemMessage(f"Summary of earlier conversation: {summary}")
        ] + recent

    function progressiveSummarize(messages, levels = 3):
        # Multi-level summarization for very long conversations
        current = messages

        for level in range(levels):
            if countTokens(current) < threshold:
                break

            # Summarize in chunks
            chunks = splitIntoChunks(current, chunkSize=10)
            summaries = [summarize(chunk) for chunk in chunks]
            current = summaries

        return current

Python (LangChain):

from langchain_openai import ChatOpenAI
from langchain.memory import ConversationSummaryBufferMemory

class ConversationSummarizer:
    def __init__(self, max_token_limit: int = 4000):
        self.llm = ChatOpenAI(model="gpt-4")
        # Built-in summarization when buffer exceeds limit
        self.memory = ConversationSummaryBufferMemory(
            llm=self.llm,
            max_token_limit=max_token_limit,
            return_messages=True
        )

    def add_exchange(self, human_input: str, ai_output: str):
        """Add conversation exchange with auto-summarization."""
        # LangChain automatically summarizes when limit exceeded
        self.memory.save_context(
            {"input": human_input},
            {"output": ai_output}
        )

    def get_context(self) -> dict:
        """Get current memory context (summary plus recent messages)."""
        return self.memory.load_memory_variables({})

    def clear(self):
        """Clear all memory."""
        self.memory.clear()

# For more control, use ConversationSummaryMemory directly
from langchain.memory import ConversationSummaryMemory
from langchain_core.prompts import PromptTemplate

class CustomSummarizer:
    def __init__(self):
        self.llm = ChatOpenAI(model="gpt-4")

        # Custom summarization prompt
        self.summary_prompt = PromptTemplate(
            input_variables=["summary", "new_lines"],
            template="""Progressively summarize the conversation, adding to the summary.
Current summary: {summary}
New lines: {new_lines}
New summary:"""
        )

        self.memory = ConversationSummaryMemory(
            llm=self.llm,
            prompt=self.summary_prompt
        )

# Usage with an agent: LangGraph checkpointers provide built-in persistence
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import MemorySaver

agent = create_react_agent(
    llm,    # any LangChain chat model, e.g. ChatOpenAI(model="gpt-4")
    tools,  # the agent's tool list
    # Checkpointer provides built-in memory persistence
    checkpointer=MemorySaver()
)

C# (Microsoft.Extensions.AI):

using Microsoft.Extensions.AI;
using Microsoft.ML.Tokenizers;

public class ConversationSummarizer
{
    private readonly IChatClient _client;
    private readonly Tokenizer _tokenizer;
    private readonly int _threshold;

    public ConversationSummarizer(
        IChatClient client,
        int threshold = 4000)
    {
        _client = client;
        _threshold = threshold;
        _tokenizer = TiktokenTokenizer.CreateForModel("gpt-4o");
    }

    public int CountTokens(IEnumerable<ChatMessage> messages)
    {
        var text = string.Join("\n",
            messages.Select(m => m.Text ?? ""));
        return _tokenizer.CountTokens(text);
    }

    public async Task<string> SummarizeAsync(
        IEnumerable<ChatMessage> messages)
    {
        var content = string.Join("\n",
            messages.Select(m => $"{m.Role}: {m.Text}"));

        var response = await _client.GetResponseAsync(new[]
        {
            new ChatMessage(ChatRole.System,
                "Summarize this conversation concisely. " +
                "Preserve key facts, decisions, and context."),
            new ChatMessage(ChatRole.User, content)
        });

        return response.Text;
    }

    public async Task<List<ChatMessage>> CompressIfNeededAsync(
        List<ChatMessage> messages,
        int keepRecent = 5)
    {
        if (CountTokens(messages) < _threshold)
            return messages;

        var systemMsgs = messages
            .Where(m => m.Role == ChatRole.System)
            .ToList();
        var otherMsgs = messages
            .Where(m => m.Role != ChatRole.System)
            .ToList();

        var toSummarize = otherMsgs
            .Take(otherMsgs.Count - keepRecent)
            .ToList();
        var recent = otherMsgs
            .Skip(otherMsgs.Count - keepRecent)
            .ToList();

        if (toSummarize.Count == 0)
            return messages;

        var summary = await SummarizeAsync(toSummarize);

        var result = new List<ChatMessage>(systemMsgs);
        result.Add(new ChatMessage(ChatRole.System,
            $"Summary of earlier conversation: {summary}"));
        result.AddRange(recent);

        return result;
    }
}

Summarization Strategy

Keep recent messages verbatim (last 5-10) and summarize older ones. This preserves immediate context while retaining key facts from earlier in the conversation.

Episodic Memory: Learning from Experience

Episodic memory stores complete interaction trajectories, enabling agents to learn from past successes and failures:

Episodic Memory for Learning
Pseudocode:

class EpisodicMemory:
    # Store complete interaction episodes for learning
    episodes = []

    function recordEpisode(task, trajectory, outcome):
        episode = {
            task: task,
            trajectory: trajectory,  # Full action sequence
            outcome: outcome,         # Success/failure + details
            timestamp: now(),
            embedding: embed(task + outcome)
        }
        episodes.append(episode)
        persistToDatabase(episode)

    function retrieveSimilarEpisodes(currentTask, k = 3):
        # Find past experiences relevant to current task
        taskEmbedding = embed(currentTask)

        similar = vectorSearch(
            episodes,
            taskEmbedding,
            topK = k
        )

        # Prioritize successful episodes
        return sortBySuccess(similar)

    function learnFromEpisode(episode):
        if episode.outcome.success:
            # Extract successful strategy
            return {
                type: "positive",
                lesson: "When {task}, this approach worked: {summary}"
            }
        else:
            # Learn from failure
            return {
                type: "negative",
                lesson: "When {task}, avoid: {failureReason}"
            }

Python (LangChain):

from dataclasses import dataclass, field
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_core.documents import Document
from datetime import datetime
import json

@dataclass
class Episode:
    task: str
    trajectory: list[dict]  # List of {thought, action, observation}
    outcome: dict           # {success: bool, result: str, error: str?}
    timestamp: datetime = field(default_factory=datetime.now)

class EpisodicMemory:
    def __init__(self, persist_path: str = "./episodic_memory"):
        self.embeddings = OpenAIEmbeddings()
        self.vectorstore = Chroma(
            collection_name="episodes",
            embedding_function=self.embeddings,
            persist_directory=persist_path
        )

    def record_episode(
        self,
        task: str,
        trajectory: list[dict],
        outcome: dict
    ) -> str:
        """Record a complete interaction episode."""
        episode_id = f"ep_{datetime.now().timestamp()}"

        # Create searchable content
        search_content = f"{task}\n{outcome.get('result', '')}"

        # Store as LangChain Document with metadata
        doc = Document(
            page_content=search_content,
            metadata={
                "task": task,
                "trajectory": json.dumps(trajectory),
                "outcome": json.dumps(outcome),
                "success": outcome.get("success", False),
                "timestamp": datetime.now().isoformat()
            }
        )

        self.vectorstore.add_documents([doc], ids=[episode_id])
        return episode_id

    def retrieve_similar(
        self,
        current_task: str,
        k: int = 3,
        success_only: bool = False
    ) -> list[Episode]:
        """Find similar past episodes using semantic search."""
        filter_dict = {"success": True} if success_only else None

        # LangChain similarity search with metadata filtering
        results = self.vectorstore.similarity_search(
            current_task,
            k=k * 2,
            filter=filter_dict
        )

        episodes = [
            Episode(
                task=doc.metadata["task"],
                trajectory=json.loads(doc.metadata["trajectory"]),
                outcome=json.loads(doc.metadata["outcome"]),
                timestamp=datetime.fromisoformat(doc.metadata["timestamp"])
            )
            for doc in results
        ]

        # Sort by success first, then recency
        episodes.sort(
            key=lambda e: (not e.outcome.get("success"), -e.timestamp.timestamp())
        )
        return episodes[:k]

    def generate_few_shot_examples(self, current_task: str, n: int = 2) -> str:
        """Generate few-shot examples from past successes."""
        episodes = self.retrieve_similar(current_task, k=n, success_only=True)

        examples = []
        for ep in episodes:
            lines = [f"Task: {ep.task}"]
            for step in ep.trajectory[-3:]:
                lines.append(f"Thought: {step.get('thought', '')}")
                lines.append(f"Action: {step.get('action', '')}")
            lines.append(f"Result: {ep.outcome.get('result', '')}")
            examples.append("\n".join(lines))

        return "\n---\n".join(examples)

C# (Microsoft.Extensions.AI):

using Microsoft.Extensions.AI;
using Microsoft.Extensions.VectorData;
using System.Text.Json;

public class EpisodicMemory
{
    private readonly IVectorStore _vectorStore;
    private readonly IEmbeddingGenerator<string, Embedding<float>> _embedder;
    private const string CollectionName = "episodes";

    public record Episode(
        string Task,
        List<TrajectoryStep> Trajectory,
        Outcome Outcome,
        DateTime Timestamp
    );

    public record TrajectoryStep(
        string Thought,
        string Action,
        string Observation
    );

    public record Outcome(
        bool Success,
        string Result,
        string? Error = null
    );

    public EpisodicMemory(
        IVectorStore vectorStore,
        IEmbeddingGenerator<string, Embedding<float>> embedder)
    {
        _vectorStore = vectorStore;
        _embedder = embedder;
    }

    public async Task<string> RecordEpisodeAsync(
        string task,
        List<TrajectoryStep> trajectory,
        Outcome outcome)
    {
        var episodeId = $"ep_{DateTime.UtcNow.Ticks}";
        var searchContent = $"{task}\n{outcome.Result}";
        var embedding = await _embedder.GenerateEmbeddingAsync(searchContent);

        var collection = _vectorStore.GetCollection<string, EpisodeRecord>(CollectionName);
        await collection.UpsertAsync(new EpisodeRecord
        {
            Id = episodeId,
            Content = searchContent,
            Embedding = embedding.Vector,
            Task = task,
            TrajectoryJson = JsonSerializer.Serialize(trajectory),
            OutcomeJson = JsonSerializer.Serialize(outcome),
            Success = outcome.Success,
            Timestamp = DateTime.UtcNow
        });

        return episodeId;
    }

    public async Task<List<Episode>> RetrieveSimilarAsync(
        string currentTask,
        int k = 3,
        bool successOnly = false)
    {
        var episodes = new List<Episode>();
        var queryEmbedding = await _embedder.GenerateEmbeddingAsync(currentTask);

        var collection = _vectorStore.GetCollection<string, EpisodeRecord>(CollectionName);
        var results = await collection.VectorizedSearchAsync(
            queryEmbedding.Vector,
            new VectorSearchOptions { Top = k * 2 }
        );

        await foreach (var result in results.Results)
        {
            if (successOnly && !result.Record.Success) continue;

            episodes.Add(new Episode(
                result.Record.Task,
                JsonSerializer.Deserialize<List<TrajectoryStep>>(result.Record.TrajectoryJson)!,
                JsonSerializer.Deserialize<Outcome>(result.Record.OutcomeJson)!,
                result.Record.Timestamp
            ));

            if (episodes.Count >= k) break;
        }

        return episodes
            .OrderByDescending(e => e.Outcome.Success)
            .ThenByDescending(e => e.Timestamp)
            .Take(k)
            .ToList();
    }

    public async Task<string> GenerateFewShotExamplesAsync(
        string currentTask,
        int nExamples = 2)
    {
        var episodes = await RetrieveSimilarAsync(currentTask, nExamples, successOnly: true);

        var examples = episodes.Select(ep =>
        {
            var steps = string.Join("\n",
                ep.Trajectory.TakeLast(3).Select(s =>
                    $"Thought: {s.Thought}\nAction: {s.Action}"));
            return $"Task: {ep.Task}\n{steps}\nResult: {ep.Outcome.Result}";
        });

        return string.Join("\n---\n", examples);
    }

    // The EpisodeRecord vector-store model (Id, Content, Embedding, Task,
    // TrajectoryJson, OutcomeJson, Success, Timestamp) is omitted for brevity.
}

Few-Shot from Experience

Episodic memory enables dynamic few-shot learning. Instead of hardcoded examples, the agent retrieves relevant past experiences to guide current tasks.
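
As a usage sketch, the examples produced by EpisodicMemory.generate_few_shot_examples above can simply be prepended to the task prompt (the prompt wording here is illustrative):

episodic = EpisodicMemory()

def build_task_prompt(current_task: str) -> str:
    # Pull up to two similar successful episodes and use them as in-context examples.
    examples = episodic.generate_few_shot_examples(current_task, n=2)
    if not examples:
        return current_task
    return (
        "Here are similar tasks solved successfully in the past:\n"
        f"{examples}\n---\n"
        f"Now complete this task: {current_task}"
    )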

Production Memory: Mem0

Mem0 is a popular framework for production memory systems, handling the complexity of memory extraction, storage, and retrieval:

Mem0 Integration
from mem0 import Memory

# Initialize Mem0 with configuration
config = {
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4",
            "temperature": 0.1
        }
    },
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small"
        }
    },
    "vector_store": {
        "provider": "chroma",
        "config": {
            "collection_name": "agent_memories",
            "path": "./mem0_data"
        }
    }
}

memory = Memory.from_config(config)

# Add memories with user context
memory.add(
    "User prefers dark mode and uses VS Code",
    user_id="user_123",
    metadata={"category": "preferences"}
)

memory.add(
    "User is working on a Python FastAPI project",
    user_id="user_123",
    metadata={"category": "context"}
)

# Search memories
results = memory.search(
    "What IDE does the user prefer?",
    user_id="user_123"
)

# Results include relevance scores
for result in results:
    print(f"Memory: {result['memory']}")
    print(f"Score: {result['score']}")

# Get all memories for a user
all_memories = memory.get_all(user_id="user_123")

# Update a memory
memory.update(
    memory_id=results[0]["id"],
    data="User prefers dark mode, uses VS Code with Vim keybindings"
)

# Change history for a specific memory
history = memory.history(memory_id=results[0]["id"])
print(f"Revisions to this memory: {len(history)}")

Pseudocode: integrating Mem0 with an agent

class MemoryAugmentedAgent:
    memory = Mem0()
    llm = ChatModel()

    function respond(userMessage, userId):
        # 1. Retrieve relevant memories
        memories = memory.search(userMessage, userId)
        memoryContext = formatMemories(memories)

        # 2. Build prompt with memory context
        prompt = f"""
        User memories:
        {memoryContext}

        Current request: {userMessage}
        """

        # 3. Generate response
        response = llm.generate(prompt)

        # 4. Extract and store new memories
        newFacts = extractFacts(userMessage, response)
        for fact in newFacts:
            memory.add(fact, userId)

        return response

# Key benefit: 80% token reduction while preserving fidelity
# Instead of keeping entire history, store and retrieve facts

Mem0 Key Features

  • Automatic memory extraction from conversations
  • User-scoped and agent-scoped memories
  • Conflict resolution for contradicting facts
  • Memory decay and importance ranking
  • Multiple vector store backends

Memory Design Patterns

Pattern             Description                            When to Use
──────────────────  ─────────────────────────────────────  ─────────────────────────────────────
Rolling Window      Keep last N messages only              Simple chatbots, low-stakes tasks
Summarize + Recent  Summarize old, keep recent verbatim    Most agent applications
Entity Memory       Track entities and their states        Complex workflows, state machines
Knowledge Graph     Store facts as relationships           Domain-specific agents, reasoning
Hierarchical        Multiple summary levels                Very long conversations (100+ turns)
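
Two of these patterns, Rolling Window and Entity Memory, are simple enough to sketch directly (a minimal illustration, not tied to any framework; the class names are our own):

from collections import deque

class RollingWindowMemory:
    """Rolling Window: keep only the last N messages; older ones drop off."""
    def __init__(self, max_messages: int = 20):
        self.messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})

    def context(self) -> list[dict]:
        return list(self.messages)

class EntityMemory:
    """Entity Memory: track the latest known state of each entity."""
    def __init__(self):
        self.entities: dict[str, dict[str, str]] = {}

    def update(self, entity: str, attribute: str, value: str):
        # A new value overwrites the old one, so the tracked state stays current.
        self.entities.setdefault(entity, {})[attribute] = value

    def describe(self, entity: str) -> str:
        attrs = self.entities.get(entity, {})
        return "; ".join(f"{k}: {v}" for k, v in attrs.items())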

Evaluation Approach

Memory systems should be evaluated on both retrieval quality and downstream task performance:

Metric            What It Measures                         How to Measure
────────────────  ───────────────────────────────────────  ───────────────────────────────────────────
Recall Accuracy   Can the agent retrieve relevant facts?   Insert facts, query later, measure hit rate
Recall@Turn N     Accuracy degradation over turns          Track accuracy vs. conversation length
Token Efficiency  Tokens used vs. full history             Compare memory system vs. raw context
Latency Impact    Time added by memory operations          Benchmark retrieval + storage time
Task Performance  Does memory improve outcomes?            A/B test with vs. without memory

Key metrics for memory system evaluation

Memory Retention Test
Test: Information Retention Over Conversation Length
───────────────────────────────────────────────────────────

Turn 1:  Insert fact: "Project deadline is March 15"
Turn 5:  Query: "When is the deadline?" → Should recall
Turn 10: Insert distractors about other dates
Turn 15: Query: "What's the project deadline?" → Still recall?
Turn 25: Heavy topic changes
Turn 30: Query: "Remind me of the deadline" → Can still recall?

Expected Results:
┌─────────────────────────────────────────────────────────┐
│ Recall Accuracy                                          │
│ 100% ┤■■■■■■■■■■■■■■■■■■                                │
│  90% ┤              ■■■■■■■■■■                          │
│  80% ┤                      ■■■■■■■■                    │
│  70% ┤                              ■■■■                │
│      └──────────────────────────────────────────────────│
│       Turn 1    Turn 10    Turn 20    Turn 30           │
└─────────────────────────────────────────────────────────┘

Analysis Questions:
- At what turn count does recall degrade?
- Does summarization help or hurt?
- What's the optimal compression threshold?
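
A minimal harness for the retention test above might look like the following (assuming a memory object that exposes add(text) and search(query) returning strings; the substring check stands in for a proper grader):

def retention_test(memory, distractor_turns: list[str]) -> dict[int, bool]:
    """Probe whether a planted fact can still be recalled as distractors pile up."""
    fact = "Project deadline is March 15"
    query = "When is the project deadline?"
    recall_by_turn: dict[int, bool] = {}

    memory.add(fact)  # Turn 1: insert the fact
    recall_by_turn[1] = any("March 15" in m for m in memory.search(query))

    for turn, distractor in enumerate(distractor_turns, start=2):
        memory.add(distractor)           # later turns: unrelated content
        if turn % 5 == 0:                # probe recall every 5 turns
            hits = memory.search(query)
            recall_by_turn[turn] = any("March 15" in m for m in hits)

    return recall_by_turn  # e.g. {1: True, 5: True, 15: True, 30: False}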

Common Pitfalls

Memory Pollution

Storing everything leads to irrelevant retrievals. Be selective about what enters long-term memory. Use importance scoring.
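
One way to be selective is to gate writes behind an importance score, for example by asking the model to rate each candidate fact (the prompt wording and 0.6 threshold below are illustrative; store_long_term refers to the AgentMemorySystem method defined earlier):

from langchain_openai import ChatOpenAI

scorer = ChatOpenAI(model="gpt-4")

def maybe_store_long_term(memory_system, fact: str, threshold: float = 0.6):
    # Ask the model for a 0-1 importance score before committing the fact.
    prompt = (
        "Rate how important this fact is to remember for future conversations, "
        "on a scale from 0 to 1. Reply with only the number.\n\n"
        f"Fact: {fact}"
    )
    response = scorer.invoke(prompt)
    try:
        score = float(response.content.strip())
    except ValueError:
        score = 0.0  # unparseable rating: treat as unimportant
    if score >= threshold:
        memory_system.store_long_term(fact, metadata={"importance": score})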

Conflicting Memories

When facts change (e.g., user updates preference), old memories can contradict new ones. Implement update/invalidation mechanisms.
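
A lightweight mitigation for facts that change over time is to key them by a canonical slot, so a new value replaces the old one instead of coexisting with it (a minimal sketch with illustrative slot names; frameworks like Mem0 perform this kind of conflict resolution automatically):

from datetime import datetime

class SlottedFactStore:
    """Mutable facts keyed by slot: updates overwrite rather than contradict."""
    def __init__(self):
        self.facts: dict[str, dict] = {}

    def set_fact(self, slot: str, value: str):
        self.facts[slot] = {"value": value, "updated": datetime.now()}

    def get_fact(self, slot: str) -> str | None:
        entry = self.facts.get(slot)
        return entry["value"] if entry else None

store = SlottedFactStore()
store.set_fact("preferred_ide", "VS Code")
store.set_fact("preferred_ide", "VS Code with Vim keybindings")  # replaces the old value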

Over-Summarization

Aggressive summarization loses nuance. Important details can be compressed away. Test recall on specific facts after summarization.

Retrieval Latency

Vector search adds latency. For real-time applications, consider caching hot memories or async prefetching.

Implementation Checklist

  1. Define memory types needed (working, short-term, long-term, episodic)
  2. Choose a vector database (Chroma, Pinecone, Weaviate, Qdrant)
  3. Implement summarization with a token threshold
  4. Design memory extraction logic (what to store, when)
  5. Build retrieval with relevance filtering
  6. Add memory update/invalidation for changing facts
  7. Test recall accuracy at various conversation lengths
  8. Monitor token usage and latency in production

Related Topics