How-To | 2026-05-27 | 9 min read

How we give Claude Code a memory that survives every session

Claude Code, like every tool built on a large language model, starts each session as a blank slate. Close the terminal, open it tomorrow, and it has forgotten your preferences, your project's quirks, and every correction you made the day before. The context window is short-term memory only, and it resets. To make an AI assistant genuinely useful over weeks and months, you have to give it long-term memory that you operate on purpose. Here is the layered setup I run across all my projects, and the handful of rules that decide whether it compounds or quietly rots.

Why "just keep chatting" does not work

The obvious idea is to keep one long conversation and never start fresh. It fails for three reasons. The context window is finite, so old turns fall off the back. Every token you carry forward costs money and slows the model down on every single request. And recall actually gets worse as the conversation fills with noise, because the signal you need is buried under thousands of lines of resolved back-and-forth. Memory is not about keeping everything. It is about keeping the few things that matter and being able to find them.

The three layers

Layer one: an always-loaded index

Claude Code reads a file called CLAUDE.md at the start of every session. That is the anchor. Mine holds two things: the always-apply rules (how I want it to work, the safety lines it must never cross) and a one-line-per-memory index that points to everything else. The discipline here is ruthless brevity. This file is paid for in context tokens on every session, so it stays under a hard size cap, one line per entry, with links out to detail rather than the detail itself. A bloated always-on file is the most common way these setups go wrong: it feels thorough, and it silently taxes every request you ever make.
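A size cap is only real if something checks it. The following is a minimal sketch of such a check, not part of Claude Code itself: the function name `lint_index` and both limits are assumptions chosen for illustration, since the cap above is not stated as a number.

```python
from pathlib import Path

# Hypothetical limits; pick values that match your own budget.
MAX_BYTES = 4_000
MAX_LINE_CHARS = 200


def lint_index(text: str, max_bytes: int = MAX_BYTES,
               max_line_chars: int = MAX_LINE_CHARS) -> list[str]:
    """Return a list of problems; an empty list means the index is lean."""
    problems = []
    size = len(text.encode("utf-8"))
    if size > max_bytes:
        problems.append(f"index is {size} bytes, over the {max_bytes}-byte cap")
    for number, line in enumerate(text.splitlines(), start=1):
        # A long line usually means detail crept in that belongs in its own file.
        if len(line) > max_line_chars:
            problems.append(f"line {number} is {len(line)} chars; link out instead")
    return problems


if __name__ == "__main__":
    index = Path("CLAUDE.md")
    if index.exists():
        for problem in lint_index(index.read_text(encoding="utf-8")):
            print(problem)
```

Run before committing, a check like this turns "ruthless brevity" from an intention into a failing lint.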

Layer two: one fact per file

The actual memories live as small individual files, one fact each, loaded only when relevant. Each carries a short frontmatter block so the system can decide whether to surface it: a name, a one-line description used for recall, and a type. I sort everything into four types:

user: who the person is. Role, expertise, fixed preferences.
feedback: working-style guidance and corrections, with the reason why, so the lesson survives rather than just the rule.
project: the state of ongoing work that you cannot read off the code or git history.
reference: pointers to external resources, dashboards, and reusable patterns.

Each file's body is the fact, followed by why it matters and how to apply it, and it links to related memories by name. That cross-linking turns a pile of notes into something closer to a graph, where pulling one memory naturally suggests the others next to it.
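As an illustration of that file shape, here is a minimal sketch of a parser. It assumes the frontmatter is a block of `key: value` lines between two `---` delimiters, which is a common convention but not something the setup above specifies; the `Memory` class, `parse_memory`, and the example memory are all invented for the example.

```python
from dataclasses import dataclass

ALLOWED_TYPES = {"user", "feedback", "project", "reference"}


@dataclass
class Memory:
    name: str
    description: str
    type: str
    body: str


def parse_memory(text: str) -> Memory:
    """Split a memory file into validated frontmatter fields and a body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening frontmatter delimiter")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("missing closing frontmatter delimiter") from None
    fields = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"bad frontmatter line: {line!r}")
        fields[key.strip()] = value.strip()
    missing = {"name", "description", "type"} - fields.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if fields["type"] not in ALLOWED_TYPES:
        raise ValueError(f"unknown type: {fields['type']!r}")
    body = "\n".join(lines[end + 1:]).strip()
    return Memory(fields["name"], fields["description"], fields["type"], body)


# Invented example: the fact, then why it matters, how to apply it, and a link.
EXAMPLE = """---
name: deploy-freeze
description: No production deploys on Fridays
type: feedback
---
Do not deploy to production on Fridays.
Why: weekend incidents have nobody on call to fix them.
How to apply: queue the release for Monday. Related: release-checklist
"""

if __name__ == "__main__":
    print(parse_memory(EXAMPLE))
```

Rejecting unknown types at parse time keeps the four-type scheme from drifting into a fifth and sixth category by accident.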

Layer three: a searchable deep store

The long tail of history, every decision and small learning from months of work, does not belong in either of the layers above. It lives in a separate searchable store that gets queried on demand at the start of a task, not loaded by default. In my case this is a small custom server that Claude can search, but the principle holds with any vector or keyword store: keep the bulk of memory out of context, and retrieve only the slice the current task needs.
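The retrieval principle can be sketched with nothing more than keyword counting. This is a stand-in under stated assumptions, not the custom server described above: the store is modeled as a plain dictionary of name to text, and `search` and its scoring are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(store: dict[str, str], query: str, limit: int = 3) -> list[str]:
    """Return up to `limit` entry names, ranked by query-term occurrences."""
    terms = set(tokenize(query))
    scores = {}
    for name, text in store.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score > 0:
            scores[name] = score
    # Highest score first; ties broken by name so results are stable.
    return sorted(scores, key=lambda name: (-scores[name], name))[:limit]


if __name__ == "__main__":
    store = {
        "migrations": "postgres migration uses flyway",
        "dashboards": "billing dashboard lives in grafana",
        "backups": "postgres backup runs nightly, postgres restore is manual",
    }
    print(search(store, "postgres backup"))
```

The important part is the `limit`: whatever the backend, only a small slice comes back into context, and the rest of the history stays out of it.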

The rules that keep it useful

The architecture is the easy part. These operating rules are what decide whether the memory helps a year from now or becomes a swamp of stale half-truths.

Do not store what the code already records. Structure, past fixes, and git history are all readable on demand. Memory is for the non-obvious: the why behind a decision, a preference, a constraint that lives in someone's head.
Convert relative dates to absolute. "Last week" rots the moment it is written. "2026-05-20" stays true forever.
Update, do not duplicate. Before saving, check for an existing file on the same topic and edit that. Two memories that disagree are worse than none.
Treat recalled memory as context, not gospel. A note that names a file or a feature flag may describe something that has since changed. Verify it still exists before acting on it.
Separate always-apply rules from situational ones. Posture and safety rules stay in the always-loaded file. Tool-specific gotchas live in files that load only when you are doing that kind of work, so they do not tax unrelated sessions.
Prune. Delete memories that turn out to be wrong. A memory system you never garden becomes one you stop trusting.
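Two of these rules are mechanical enough to sketch in code: converting relative dates to absolute ones, and updating instead of duplicating. The sketch below assumes one file per topic named `<slug>.md`; that layout and the function names `absolute_date` and `save_memory` are illustrative, not a prescribed format.

```python
import tempfile
from datetime import date, timedelta
from pathlib import Path


def absolute_date(days_ago: int, today: date) -> str:
    """Turn a relative offset such as 'a week ago' into an ISO date string."""
    return (today - timedelta(days=days_ago)).isoformat()


def save_memory(directory: Path, slug: str, fact: str, today: date) -> Path:
    """Write the fact to <slug>.md, replacing any existing file on that topic."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{slug}.md"
    # Same slug means same file, so a revised fact overwrites the old one
    # instead of sitting beside it and contradicting it.
    path.write_text(f"{fact}\nUpdated: {today.isoformat()}\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    today = date(2026, 5, 27)
    print(absolute_date(7, today))  # 2026-05-20
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        save_memory(store, "deploy-freeze", "No Friday deploys.", today)
        save_memory(store, "deploy-freeze", "No Friday or Monday deploys.", today)
        print(len(list(store.iterdir())))  # 1
```

Keying the file on a topic slug is one simple way to enforce "update, do not duplicate"; a real setup would still need to check for near-duplicate topics filed under different names.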

Automating the save

The weakest link in any note-taking system is remembering to take the note. Claude Code supports hooks that fire on events, and the one that matters here fires when the assistant finishes responding and is about to stop. I use a Stop hook that prompts the assistant to checkpoint the session: the decisions made, the corrections received, the reusable patterns discovered, filed into the right layer automatically. The result is that memory accrues as a side effect of working, instead of depending on me to stop and journal. Capture has to be automatic, or it does not happen.
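A hook script for this can be very small. The sketch below assumes, based on my reading of the Claude Code hooks documentation, that a Stop hook receives a JSON payload on stdin containing a `stop_hook_active` flag, and that printing a JSON object with `decision` set to `block` and a `reason` asks the assistant to keep going with that reason as its instruction. Verify those field names against the current documentation before relying on them; the prompt wording and function names are hypothetical.

```python
import json
import sys
from typing import Optional

# Hypothetical wording; adapt it to your own layers.
CHECKPOINT_PROMPT = (
    "Before stopping, checkpoint this session: record the decisions made, "
    "corrections received, and reusable patterns, each in the right memory layer."
)


def decide(payload: dict) -> Optional[dict]:
    """Return a blocking decision once per stop, or None to let the stop proceed."""
    # Assumed field: true when the assistant is already continuing because of
    # a Stop hook. Without this guard the hook would block forever.
    if payload.get("stop_hook_active"):
        return None
    return {"decision": "block", "reason": CHECKPOINT_PROMPT}


def main(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read the hook payload, and print a decision only when one is needed."""
    raw = stdin.read()
    payload = json.loads(raw) if raw.strip() else {}
    decision = decide(payload)
    if decision is not None:
        stdout.write(json.dumps(decision) + "\n")
```

To use it as a hook, the script would end by calling `main()` and be registered under the Stop event in the Claude Code settings. The guard on `stop_hook_active` is the part that matters most: it lets the checkpoint run once and then allows the assistant to stop.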

Why not just point retrieval at everything

A fair question: why curate at all, when you could index every transcript and let retrieval sort it out? Because curation is the whole point. A search over raw conversation logs returns the place a topic was discussed, including all the wrong turns and abandoned ideas along the way. A search over atomic, deliberately written facts returns the conclusion you actually settled on. The first floods the context with noise and makes recall less reliable; the second is dense signal. The few seconds it takes to write a clean one-line memory pays for itself every time that memory is recalled instead of a thousand lines of chat.

The honest limits

This is not magic. Memories go stale, and a setup with no pruning slowly fills with facts that were true once. Two memories can quietly contradict each other until something forces the conflict into the open. And a recalled memory reflects what was true when it was written, not necessarily now, which is exactly why the verify-before-acting rule matters. Treated as a living system that needs occasional gardening, it is one of the highest-leverage things you can build around an AI assistant. Treated as a write-only archive, it becomes a liability you stop reading.

The bottom line

Long-term memory for an AI assistant is not one clever trick. It is a small system run with discipline: a lean always-loaded index, atomic one-fact files sorted by type, a searchable store for the deep history, and an automatic save on every session. The rules matter more than the tools. Keep the always-on layer tiny, store only the non-obvious, date everything absolutely, and prune without mercy. Do that and the assistant stops re-introducing itself every morning and starts picking up exactly where you left off.

If you want an AI assistant or agent wired into your own business with memory that compounds instead of resetting, send a brief and I'll scope it with you. It is the difference between a clever demo and a tool that actually gets better the longer you use it.
