Agent Memory Architectures

The context window is RAM, not disk. Everything an agent "knows" mid-task lives in a buffer that is wiped when the session ends and is too small to hold a long history anyway (see context-windows-long-context). An agent with no memory beyond its window restarts every conversation as a stranger and forgets what it learned three steps ago once the transcript grows. Memory architecture is the set of choices that fixes this: what to persist, where to store it, and how to pull the right piece back into the window at the right moment. It is retrieval (see retrieval-augmented-generation) applied to the agent's own experience rather than to a document corpus.

A taxonomy of memory

The clearest framing comes from the CoALA (Cognitive Architectures for Language Agents) framework, which gives a language agent modular memory components by analogy with classical cognitive architectures such as Soar:

Working memory: the active context window. The scratchpad of the current task, holding the immediate plan, recent observations, and intermediate results. Fast, small, volatile.
Episodic memory: a log of past experiences and interactions. "What happened in previous sessions with this user", "how I solved a similar task before". Usually stored as embedded records in a vector store and retrieved by similarity.
Semantic memory: distilled facts and knowledge, decoupled from the episode that produced them. "This user prefers metric units", "the production database is read-only". Often a structured store or a curated set of facts.
Procedural memory: how to do things, the agent's skills. Partly baked into the model weights, partly encoded in the system prompt and the tool definitions.
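As a rough illustration, the four components can be mapped onto plain data structures. This is a minimal sketch, not CoALA's own formalism. The class names `Episode`, `Fact` and `AgentMemory` and the `build_prompt` helper are hypothetical. Procedural memory here covers only the prompt-and-tools part, because the part held in the model weights cannot be represented this way.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Episode:
    """Episodic memory: a record of something that happened (hypothetical schema)."""
    text: str
    session_id: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class Fact:
    """Semantic memory: a distilled conclusion, decoupled from the episode that produced it."""
    statement: str
    source_episodes: list[int] = field(default_factory=list)  # indices into the episodic log


@dataclass
class AgentMemory:
    """Hypothetical container mapping the four memory types onto plain data."""
    # Procedural memory (the part outside the weights): instructions and tool definitions.
    system_prompt: str
    tools: dict[str, str] = field(default_factory=dict)  # tool name -> description
    # Working memory: the turns currently inside the context window.
    working: list[str] = field(default_factory=list)
    # Long-term memory held outside the window.
    episodic: list[Episode] = field(default_factory=list)
    semantic: list[Fact] = field(default_factory=list)

    def build_prompt(self, retrieved: list[Fact]) -> str:
        """Assemble the window: procedure first, then retrieved facts, then recent turns."""
        parts = [self.system_prompt]
        parts += [f"Tool {name}: {desc}" for name, desc in self.tools.items()]
        parts += [f"Known fact: {f.statement}" for f in retrieved]
        parts += self.working
        return "\n".join(parts)
```

Only working memory is rendered verbatim. The episodic and semantic stores live outside the window and reach it only through whatever `retrieved` facts the read path selects.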

The split that trips teams up is episodic versus semantic. Episodic memory remembers events; semantic memory remembers conclusions. A robust agent writes raw episodes as they happen and periodically distils them into semantic facts, because retrieving one clean fact beats retrieving ten raw transcripts and asking the model to re-derive it every time.
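The write-raw, distil-periodically loop can be sketched as follows, assuming episodes and facts are plain strings. In a real agent, `distil` would be an LLM call that summarises a batch of episodes into durable facts. Here, `keyword_distiller` is a hypothetical stand-in that lets the example run. `MemoryConsolidator` and its `batch_size` trigger are likewise illustrative names, not any library's API.

```python
from typing import Callable

# A distiller turns a batch of raw episodes into durable facts.
# In practice this would be an LLM call; here it is any plain function.
Distiller = Callable[[list[str]], list[str]]


def keyword_distiller(episodes: list[str]) -> list[str]:
    """Hypothetical stand-in for an LLM: keep whatever follows 'note:' in each episode."""
    return [e.split("note:", 1)[1].strip() for e in episodes if "note:" in e]


class MemoryConsolidator:
    """Writes raw episodes as they happen; periodically distils them into semantic facts."""

    def __init__(self, distil: Distiller, batch_size: int = 10):
        self.distil = distil
        self.batch_size = batch_size      # consolidate after this many new episodes
        self.episodes: list[str] = []     # episodic memory: raw, append-only
        self.facts: list[str] = []        # semantic memory: distilled conclusions
        self._consolidated_upto = 0       # episodes before this index are already distilled

    def record(self, episode: str) -> None:
        self.episodes.append(episode)
        if len(self.episodes) - self._consolidated_upto >= self.batch_size:
            self.consolidate()

    def consolidate(self) -> None:
        pending = self.episodes[self._consolidated_upto:]
        for fact in self.distil(pending):
            if fact not in self.facts:    # naive exact-match deduplication
                self.facts.append(fact)
        self._consolidated_upto = len(self.episodes)
```

Several episodes that record the same preference collapse into a single fact, so that one fact is what retrieval returns. A production version would also need to handle new facts that contradict or supersede old ones, which exact-match deduplication does not do.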

Implementing it

Working memory is just prompt construction with compaction: when the window fills, summarise older turns into a running synopsis and drop the verbatim history. Episodic and semantic memory are a write path and a read path over an external store. On write, the agent commits salient events or distilled facts as embedded records. On read, it embeds the current situation and retrieves the most relevant memories to splice into the prompt.

The hard parts are deciding what is salient enough to write (write everything and retrieval drowns in noise) and ranking and placing what is read. Because of the lost-in-the-middle effect, models use information at the start and end of a long prompt more reliably than information buried in the middle. Even correctly retrieved memories therefore need to be positioned with care.
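The write and read paths can be sketched over an in-memory store. Two assumptions are worth stating plainly. First, `embed` is a toy hashed bag-of-words vector standing in for a real embedding model. Second, the caller supplies a salience score, which in practice might come from an LLM judgement or a heuristic. `MemoryStore` and `place_for_prompt` are hypothetical names. The placement helper puts the strongest memories at the edges of the list, where the lost-in-the-middle effect suggests they are most likely to be used.

```python
import math
import re
import zlib
from collections import Counter


def embed(text: str, dim: int = 256) -> list[float]:
    """Toy embedding (assumption): hashed bag of words, L2-normalised.
    A real agent would call an embedding model here."""
    vec = [0.0] * dim
    for word, count in Counter(re.findall(r"[a-z]+", text.lower())).items():
        vec[zlib.crc32(word.encode()) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are already unit length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


class MemoryStore:
    """Hypothetical external store with a write path and a read path."""

    def __init__(self, salience_threshold: float = 0.5):
        self.threshold = salience_threshold
        self.records: list[tuple[str, list[float]]] = []

    def write(self, text: str, salience: float) -> bool:
        """Write path: commit only memories the caller scores as salient enough."""
        if salience < self.threshold:
            return False
        self.records.append((text, embed(text)))
        return True

    def read(self, situation: str, k: int = 3) -> list[str]:
        """Read path: embed the current situation and return the k most similar memories."""
        query = embed(situation)
        ranked = sorted(self.records, key=lambda r: cosine(query, r[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def place_for_prompt(ranked: list[str]) -> list[str]:
    """Reorder best-first results so the strongest sit at the start and end of the block
    and the weakest land in the middle, where lost-in-the-middle hurts most."""
    front, back = [], []
    for i, item in enumerate(ranked):
        (front if i % 2 == 0 else back).append(item)
    return front + back[::-1]
```

A typical read then chains the two steps, as in `place_for_prompt(store.read(situation, k=5))`. The best match opens the block, the second-best closes it, and the weakest retrieved items sit in between.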

The MemGPT pattern

MemGPT framed memory management as an operating-system problem, which it calls virtual context management, borrowing the idea of OS paging. The model's context window is "main memory", a small fast tier; an external store is "disk". The agent itself issues function calls to move information between the tiers, deciding when to evict stale context and when to fetch a relevant record back in. This inverts the usual design: instead of an external system deciding what to retrieve, the model manages its own memory hierarchy through tool calls. The pattern suits long multi-session chat and analysis of documents that exceed the window, at the cost of spending model calls on memory bookkeeping.
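The paging idea can be sketched with a token budget and two tools the model may call. This is a simplified illustration under stated assumptions, not MemGPT's implementation or its function names. `VirtualContext`, `page_out`, `page_in` and `simulate` are hypothetical. Tokens are approximated by whitespace-separated words. The eviction choice that a model would make in response to memory pressure is hard-coded in the driver loop.

```python
from dataclasses import dataclass, field


def n_tokens(text: str) -> int:
    """Crude token estimate (assumption): one token per whitespace-separated word."""
    return len(text.split())


@dataclass
class VirtualContext:
    budget: int                                        # "main memory" size, in tokens
    main: list[str] = field(default_factory=list)      # what the model actually sees
    archive: list[str] = field(default_factory=list)   # "disk": unbounded external store

    def used(self) -> int:
        return sum(n_tokens(m) for m in self.main)

    def under_pressure(self) -> bool:
        return self.used() > self.budget

    # The two methods below are exposed to the model as tools; the model,
    # not an external retriever, decides when to call them.
    def page_out(self, n: int) -> str:
        """Evict the n oldest messages from main context to the archive."""
        evicted, self.main = self.main[:n], self.main[n:]
        self.archive.extend(evicted)
        return f"evicted {len(evicted)} message(s)"

    def page_in(self, query: str) -> str:
        """Copy the most recent archived message containing `query` back into main context."""
        for record in reversed(self.archive):
            if query.lower() in record.lower():
                self.main.append(record)
                return f"loaded: {record}"
        return "no match"

    def call_tool(self, name: str, **args) -> str:
        """Dispatch a tool call emitted by the model."""
        tools = {"page_out": self.page_out, "page_in": self.page_in}
        return tools[name](**args)


def simulate(turns: list[str], budget: int) -> VirtualContext:
    """Feed turns in; whenever the budget is exceeded, 'the model' pages out the oldest turn."""
    ctx = VirtualContext(budget=budget)
    for turn in turns:
        ctx.main.append(turn)
        while ctx.under_pressure() and ctx.main:
            # Stand-in for the model reacting to a memory-pressure signal with a tool call.
            ctx.call_tool("page_out", n=1)
    return ctx
```

Paging a record back in with `page_in` can push the context over budget again, which then calls for further eviction. That churn is the memory-bookkeeping cost the pattern pays for letting the model manage its own hierarchy.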
