Memory
Understand This First
- Harness (Agentic) — the harness stores and loads memory entries.
- Context Window — memory competes for space in the finite window.
Context
At the agentic level, memory is persisted information that allows an agent to maintain consistency across sessions. Unlike an instruction file, which is authored by a human and describes project conventions, memory is typically accumulated from experience: learnings, corrections, and preferences discovered during previous work sessions.
Memory addresses the statelessness of models. Each conversation starts fresh, and without memory, the agent will repeat the same mistakes, ask the same questions, and ignore the same corrections session after session. Memory gives the agent a persistent substrate for learning.
Problem
How do you prevent an agent from repeating mistakes or forgetting lessons learned in previous sessions?
A developer corrects an agent’s behavior (“don’t use library X, use library Y instead”) and the agent complies for the rest of the session. Next session, the agent uses library X again. The correction is lost because the model has no memory between sessions. Multiplied across dozens of corrections and preferences, this creates a frustrating cycle of re-education.
Forces
- Model statelessness: each session starts from zero.
- Correction fatigue: repeating the same feedback erodes trust in the workflow.
- Knowledge accumulation: real expertise grows through experience, and agents should benefit from past sessions.
- Noise risk: too much accumulated memory dilutes the context window with low-value information.
Solution
Use memory mechanisms provided by your harness to persist important learnings, corrections, and preferences across sessions. Memory entries are typically short, specific statements that capture a lesson:
- “When modifying database queries in this project, always include the tenant_id filter.”
- “The team prefers early returns over nested conditionals.”
- “The staging environment requires VPN access; don’t suggest direct connections.”
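At the storage level, a memory mechanism can be little more than an append-only file of short entries that the harness loads into context at session start. A minimal Python sketch, assuming a JSON-lines file (the file name and entry schema are illustrative, not any particular harness's format):

```python
import json
from pathlib import Path

MEMORY_FILE = Path("memory.jsonl")  # illustrative location; harnesses vary


def save_memory(text: str, path: Path = MEMORY_FILE) -> None:
    """Append one short, specific lesson as a memory entry."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"text": text}) + "\n")


def load_memories(path: Path = MEMORY_FILE) -> list[str]:
    """Load all entries for injection into context at session start."""
    if not path.exists():
        return []
    return [json.loads(line)["text"]
            for line in path.read_text(encoding="utf-8").splitlines()
            if line.strip()]


def memory_preamble(path: Path = MEMORY_FILE) -> str:
    """Render entries as a bulleted block the harness prepends to the prompt."""
    return "\n".join(f"- {entry}" for entry in load_memories(path))
```

Because every entry is prepended to every session, each one pays context-window rent, which is why the qualities that follow matter.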
Good memory entries share several qualities:
Specificity. “Be careful with the database” is useless. “Always use parameterized queries to prevent SQL injection” is actionable.
Relevance. Memory entries should capture lessons that are likely to recur. A one-time debugging note about a transient issue is noise.
Currency. Memory entries can become stale. Periodically review and prune entries that no longer apply.
Memory works alongside instruction files but serves a different purpose. Instruction files are deliberately authored project documentation. Memory is the accumulation of corrections and discoveries: the notes a developer scribbles in the margins while learning a codebase.
Working examples as memory. Memory doesn’t have to be prose rules. Saving working code snippets, successful configurations, and proven recipes creates a personal knowledge library the agent can draw on in future sessions. A developer who solves a tricky OAuth flow can save the working implementation as a memory entry. Next time a similar integration arises, the agent has a tested reference point instead of generating from scratch. This turns personal expertise into reusable agent infrastructure.
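One way to hold working examples alongside prose rules is to tag entries by kind and topic, so the harness can surface a tested snippet when a matching task appears. A sketch, assuming a JSON-lines store with an illustrative entry schema:

```python
import json
from pathlib import Path


def save_example(title: str, code: str, tags: set[str], path: Path) -> None:
    """Store a proven snippet as a retrievable memory entry."""
    entry = {"kind": "example", "title": title,
             "code": code, "tags": sorted(tags)}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def find_examples(query_tags: set[str], path: Path) -> list[dict]:
    """Return saved examples sharing at least one tag with the current task."""
    if not path.exists():
        return []
    entries = [json.loads(line)
               for line in path.read_text(encoding="utf-8").splitlines()
               if line.strip()]
    return [e for e in entries
            if e["kind"] == "example" and query_tags & set(e["tags"])]
```

When the developer later asks for CLI scaffolding, the harness can look up `find_examples({"cli"}, ...)` and hand the agent her saved boilerplate as a reference instead of letting it generate from defaults.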
Memory decay. Not all memories stay equally relevant. A correction from yesterday matters more than one from three months ago, unless that older correction keeps coming up. Mature memory systems apply a decay heuristic: recently accessed facts stay prominent, while facts that haven’t been referenced in weeks sink to lower priority. Nothing gets deleted — old memories remain in storage and can resurface when a conversation touches their topic. The practical effect is that memory becomes self-maintaining. Instead of periodic manual pruning sessions, the system naturally foregrounds what’s active and backgrounds what’s stale. If you’re building or configuring a memory layer, look for access-frequency weighting: memories that get retrieved often should resist decay, while memories that sit untouched should fade gracefully.
Automated extraction. The tip below describes the manual approach: you notice something worth remembering and ask the agent to save it. The next maturity level removes you from that loop. A scheduled process (a nightly hook or cron job) reviews the day’s conversations, identifies durable facts — decisions made, people mentioned, status changes, recurring corrections — and stores them as memory entries. This shifts memory from something you consciously create to something the system harvests from your working history. Teams that adopt automated extraction find their agents improving faster, because they capture lessons the human would have forgotten to save.
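An extraction pass can be sketched as a scheduled job over the day's transcripts. Real systems typically use a model pass to judge what counts as durable; the keyword patterns below are a deliberately crude stand-in, and the log layout and file names are assumptions:

```python
import re
from pathlib import Path

# Phrases that often mark a durable correction or decision in transcripts.
DURABLE_PATTERNS = [
    re.compile(r"(?i)\bremember:\s*.+"),
    re.compile(r"(?i)\balways\s+.+"),
    re.compile(r"(?i)\bnever\s+.+"),
    re.compile(r"(?i)\bwe decided\s+.+"),
]


def extract_durable_facts(transcript: str) -> list[str]:
    """Pull candidate memory entries out of one day's conversation log."""
    facts = []
    for line in transcript.splitlines():
        for pattern in DURABLE_PATTERNS:
            match = pattern.search(line)
            if match:
                facts.append(match.group(0).strip())
                break  # at most one fact per line
    return facts


def nightly_harvest(log_dir: Path, memory_file: Path) -> int:
    """Run from a scheduled hook or cron job: append new facts to memory."""
    seen = set(memory_file.read_text(encoding="utf-8").splitlines()) \
        if memory_file.exists() else set()
    added = 0
    for log in sorted(log_dir.glob("*.txt")):
        for fact in extract_durable_facts(log.read_text(encoding="utf-8")):
            if fact not in seen:
                with memory_file.open("a", encoding="utf-8") as f:
                    f.write(fact + "\n")
                seen.add(fact)
                added += 1
    return added
```

Wire `nightly_harvest` into whatever scheduling surface the harness exposes (a hook, or plain cron); the deduplication against existing entries keeps repeated corrections from piling up.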
When you correct an agent and the correction will apply to future sessions, ask the agent to save it as a memory entry. Frame it as a rule: “Remember: in this project, we always X because Y.” This turns a one-time correction into a durable improvement.
How It Plays Out
A developer spends a session working with an agent on a payment processing module. During the session, she corrects the agent three times: use decimal types for currency (not floats), always log transaction IDs, and wrap payment calls in idempotency guards. She saves each correction as a memory entry. In the next session, when she asks the agent to add a new payment method, the agent applies all three conventions without being reminded.
A team notices that their agent’s memory has grown to fifty entries over several months, some referencing deprecated patterns. They spend fifteen minutes pruning the list, removing outdated entries and consolidating related ones. Output quality improves because the context window is no longer carrying stale information.
A developer who frequently builds CLI tools saves her working argument-parser boilerplate as a memory entry. Two weeks later, she starts a new project and asks the agent to set up the CLI scaffolding. The agent pulls from the saved example rather than generating from defaults, producing code that matches her preferred structure on the first try.
“Save this as a memory: in this project, always use Decimal for currency fields, never use floating point. Also remember that all API responses must include a request_id header for tracing.”
Consequences
Memory makes agents feel like they learn over time. Corrections stick. Preferences accumulate. Working examples compound. The agent becomes more useful with continued use, and teams that invest in memory curation develop agents that behave like experienced colleagues who know the project’s quirks.
The cost is curation. Memory without pruning (or without decay heuristics) becomes noise. Contradictory entries confuse the model. Memory entries consume context window space in every session, so bloated memory directly reduces the space available for the current task. Treat memory as a curated collection, not an append-only log.
Expect a cold-start period. A freshly configured agent with empty memory is generic and frustrating. It takes roughly a week of daily use before accumulated corrections, preferences, and working examples make the agent genuinely useful for your project. This ramp-up is predictable, not a sign that memory isn’t working. Push through the first few days of mediocre results, correct generously, and the agent will catch up.
Related Patterns
- Depends on: Harness (Agentic) — the harness stores and loads memory entries.
- Contrasts with: Instruction File — instruction files are human-authored; memory is experience-accumulated.
- Uses: Context Engineering — memory is context that persists across sessions.
- Enables: Progress Log — memory captures learnings; progress logs capture actions.
- Depends on: Context Window — memory competes for space in the finite window.
- Uses: Hook — automated memory extraction runs as a scheduled hook on conversation history.
- Relates to: Compaction — decay is to long-term memory what compaction is to in-session context.
- Relates to: Steering Loop — automated extraction is a feedback loop on memory quality.
- Enabled by: Architecture Decision Record — ADRs are a form of project memory that persists across sessions and team changes.
Sources
- OpenAI introduced persistent memory for ChatGPT in February 2024, making it the first major AI assistant to retain user preferences and corrections across sessions. The feature established the pattern of accumulated, user-visible memory entries that this article describes.
- Anthropic’s Claude Code introduced file-based memory through CLAUDE.md files, where project conventions and accumulated learnings are stored as plain text that loads automatically at session start. This approach treats memory as editable, version-controlled documents rather than opaque database entries.
- Mem0, founded by Taranjeet Singh and Deshraj Yadav in January 2024, built the first dedicated open-source memory layer for AI agents, providing infrastructure for storing, retrieving, and managing persistent agent memories at scale.
- The semantic, episodic, and procedural memory taxonomy that underpins modern agent memory design traces to Endel Tulving, who distinguished episodic from semantic memory in Elements of Episodic Memory (1983). Agent memory systems map directly onto his categories.
- Felix Craft and Nat Eliason documented months of production agent use at The Masinov Company in “How to Hire an AI” (2026), providing first-person evidence for memory decay heuristics, the cold-start ramp-up period, and automated nightly extraction cycles that harvest durable facts from conversation history.
- The access-frequency decay model draws on Hermann Ebbinghaus’s forgetting curve (1885), which established that biological memories decay exponentially unless reinforced through retrieval. Modern agent memory systems apply the same principle: memories accessed often resist decay, while unretrieved memories fade.