Memory and Dreaming: How Anthropic Just Shipped the Karpathy Wiki Pattern

For the past few weeks, we have been hand-building a compiled knowledge layer for this site’s own agent team. Chunks of raw research get distilled by a researcher agent into concept files. Those concept files get threaded into connection documents. An index ties the whole thing together. It works, and more to the point, the agents that use it perform noticeably better than the ones running on raw context alone. It is also, to be clear about the workload, a considerable amount of engineering to maintain.

Then Anthropic shipped Memory and Dreaming.

Memory is the virtual file system for your managed agent: persistent, structured, and readable by any agent you grant access to [1]. Dreaming is the out-of-band curation process that consolidates session experience into that file system overnight, without blocking any task [2]. Together they are, in essence, the Karpathy Wiki pattern as a platform primitive, and they launched within five weeks of Andrej Karpathy publishing the gist that named the idea.

This is the third article in the Compiled Knowledge sub-series. If you have not read the first two, the short version is: agentic RAG is expensive and context-hungry; compiled knowledge solves that by doing the synthesis work once at write time instead of every time at query time. Memory and Dreaming are where that architectural idea lands as production infrastructure.

Figure 1 - The Karpathy Wiki pattern as two Anthropic primitives: Memory (public beta) as a virtual file system and Dreaming (research preview) as an async curation job.

Figure 1 - Memory + Dreaming as the Karpathy Wiki Pattern: The Karpathy Wiki thesis is simple: compile your agent’s knowledge once, then let it query the result, rather than re-deriving the same answers from raw context at every run. Anthropic’s Memory primitive (public beta, April 2026) is the compiled store. The Dreaming API (research preview, May 2026) is the compile step. The two together close the loop that the architecture required.


The spring 2026 convergence

Andrej Karpathy published his LLM Wiki gist on April 4, 2026. It was a short proposal: give agents a persistent wiki that they collaboratively read and write, analogous to how human teams maintain a shared knowledge base, and let agents improve it over time. The gist collected around 5,000 stars in weeks [7].

The timing is worth understanding precisely. Anthropic launched the Memory public beta on April 23 [3], just under three weeks after Karpathy’s gist and clearly from a parallel development track. Pinecone announced Nexus on May 4, with a blog post titled “Better Models Won’t Save Your Agent” [8]. Dreaming landed at Code with Claude on May 6 [2].

This is not a relay race where Karpathy proposed and the vendors implemented. These teams were building the same shape at the same time, arriving at convergent conclusions from independent starting points. Karpathy’s gist named the pattern in public first. Anthropic, Pinecone, and the broader community were already working on versions of it.

Our thesis still holds: Anthropic shipped the Karpathy Wiki pattern. The nuance is that it shipped as a platform primitive, not as a response to the gist, and the two timelines are more parallel than sequential.

Figure 2 - Spring 2026 convergence timeline: Karpathy gist April 4, Anthropic Memory beta April 23, Pinecone Nexus May 4, Anthropic Dreaming May 6, with parallel-track arrows.

Figure 2 - Parallel Convergence, Spring 2026: The compiled-knowledge pattern did not emerge in sequence. Karpathy published April 4; Anthropic’s Memory beta launched April 23 (three weeks later, clearly from its own development timeline); Pinecone Nexus announced May 4; Dreaming arrived May 6. Four teams, five weeks, one architectural thesis. The arrow directions here are convergence, not causation.


Memory: the virtual file system

Memory in Claude Managed Agents is a public beta that launched on April 23, 2026. The core abstraction is a virtual file system mounted inside a managed agent’s execution context. Files in the store look and behave like files on disk: they have names, paths, and content. Your agents read them with standard read operations and write them with standard write operations.

Three aspects of the design are worth understanding separately: what is stored, how concurrent writes stay consistent, and who is allowed to read or write.

What is stored: structured text files organized by topic. Think of them as living documentation files that the agent can both read for context and update as it learns. Paths follow a conventional naming scheme (project-level vs. user-level), and attribution metadata tracks which agent wrote what and when, so you can audit the knowledge base the same way you would audit a shared codebase.

How consistency is maintained at scale: the memory layer uses optimistic concurrency to handle multiple agents writing simultaneously. Before an agent can commit a write, it must provide the content hash of the version it read. If another agent has written in the meantime, the hash will not match and the write is rejected, forcing the writing agent to re-read and merge. This is content-hash precondition semantics, and it means you can safely run many agents against the same memory store without coordinating writes through a bottleneck.
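The write path described above can be sketched in a few lines. This is a minimal in-memory model, not Anthropic’s API: the `MemoryStore` class, `StaleWriteError`, and `append_with_retry` are hypothetical names, and SHA-256 is an assumed choice of hash.

```python
import hashlib


class StaleWriteError(Exception):
    """Raised when the caller's expected hash no longer matches the stored content."""


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class MemoryStore:
    """Hypothetical in-memory stand-in for a shared memory store."""

    def __init__(self) -> None:
        self._files: dict[str, str] = {}

    def read(self, path: str) -> tuple[str, str]:
        """Return (content, hash); a missing file reads as empty text."""
        text = self._files.get(path, "")
        return text, content_hash(text)

    def write(self, path: str, new_text: str, expected_hash: str) -> str:
        """Commit only if the file still matches the version the caller read."""
        current = self._files.get(path, "")
        if content_hash(current) != expected_hash:
            raise StaleWriteError(path)
        self._files[path] = new_text
        return content_hash(new_text)


def append_with_retry(store: MemoryStore, path: str, line: str, max_attempts: int = 3) -> int:
    """Re-read and merge on conflict; returns the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        text, seen = store.read(path)
        try:
            store.write(path, text + line + "\n", seen)
            return attempt
        except StaleWriteError:
            continue  # someone else wrote first: loop re-reads the new version
    raise StaleWriteError(path)
```

The point of the precondition is that a rejected writer never overwrites work it has not seen. The retry loop is where a real agent would merge its update into the newer content rather than simply appending.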

What the process layer does: agents can be granted permission to read memory, write memory, or both. Permission scopes are separate from task permissions. An agent that has memory-read but not memory-write can use the knowledge base as a reference without risking corruption. An agent with both can contribute to it.
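As an illustration of read-only versus read-write access, here is a small sketch. The `MemoryScope` flags and the `ScopedMemory` wrapper are hypothetical names for this example; the platform’s actual permission model is configured through the managed agent API and may look quite different.

```python
from enum import Flag


class MemoryScope(Flag):
    """Hypothetical scope flags mirroring memory-read and memory-write."""
    READ = 1
    WRITE = 2


class ScopedMemory:
    """Wraps a shared dict of files and enforces the agent's scope on each call."""

    def __init__(self, files: dict[str, str], scope: MemoryScope) -> None:
        self._files = files
        self._scope = scope

    def read(self, path: str) -> str:
        if MemoryScope.READ not in self._scope:
            raise PermissionError("memory-read scope required")
        return self._files[path]

    def write(self, path: str, text: str) -> None:
        if MemoryScope.WRITE not in self._scope:
            raise PermissionError("memory-write scope required")
        self._files[path] = text


shared: dict[str, str] = {"project/style.md": "Use ISO dates."}
reference_agent = ScopedMemory(shared, MemoryScope.READ)
curator_agent = ScopedMemory(shared, MemoryScope.READ | MemoryScope.WRITE)
```

Both wrappers point at the same underlying files, so the reference agent sees everything the curator writes but cannot corrupt the store itself.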

Figure 3 - Three-layer Memory stack: storage (version history, attribution), structure (project vs user files, index, topic files), and process (permission scopes, optimistic concurrency).

Figure 3 - The Memory Stack, Three Layers: The Memory primitive is not a single API call but a three-layer system. Storage provides persistence, version history, and attribution. Structure organizes the files into a project/user split with an index and topic files. Process enforces write safety through optimistic concurrency and permission scopes. All three layers are managed by Anthropic’s platform; you interact with them through the managed agent API.

The customer evidence for Memory’s impact comes from Rakuten. Yusuke Kaji, General Manager of AI for Business at Rakuten, is quoted on their customer page: “Our agents with memory remember what went wrong in past sessions and avoid repeating those mistakes. In our pilot, initial critical errors dropped by 97%, with cost and latency down more than 30%, without any loss in output quality” [5].

That 97% figure is striking. The mechanism behind it is straightforward: without memory, every session starts cold. The agent re-encounters the same class of mistake because there is no record of having made it before. With memory, the mistake gets written to the store. The next session reads the store. The error class stops recurring.

KEY INSIGHT: The Memory primitive does not make individual agents smarter. It makes the system learn across runs. Each session can read the accumulated output of all previous sessions, turning a stateless execution model into one with institutional memory.


Dreaming: the out-of-band curation process

Dreaming (the Dreams API) is a research preview announced at Code with Claude on May 6, 2026. It requires a gated request and beta headers (managed-agents-2026-04-01 and dreaming-2026-04-21) [4]. The status distinction matters: Memory is available broadly as a public beta while Dreaming is gated and may change before a wider rollout.
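A small sketch of how those two beta flags might be sent. It assumes the Dreams endpoints follow the same convention as other Anthropic beta features, where flags travel in a comma-separated `anthropic-beta` header. The helper name `build_headers` is hypothetical, no endpoint is shown, and the Dreams documentation [4] is the authority on the actual request shape.

```python
# Beta flag names as given in the Dreams documentation [4].
BETA_FLAGS = ("managed-agents-2026-04-01", "dreaming-2026-04-21")


def build_headers(api_key: str, betas: tuple[str, ...] = BETA_FLAGS) -> dict[str, str]:
    """Assemble request headers; assumes the standard Anthropic header names."""
    return {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "anthropic-beta": ",".join(betas),  # assumption: comma-separated flags
        "content-type": "application/json",
    }
```

Keeping the flags in one constant makes them easy to update if the preview changes before a wider rollout.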

The Dreams API runs an asynchronous job that consolidates recent session data into a memory store. The key word is “asynchronous”: the job runs separately from any active agent task. No session waits for it. No task is blocked by it.

A few properties of Dreaming that are worth being precise about:

The input store is never modified. The Dreams API takes an input memory store and produces an output memory store. Per the official docs, the input is read-only during the job. The output is a new store you can review and, if it looks good, promote to your live memory. The Dreams job is a proposal you accept or reject, not an in-place mutation.
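The proposal-then-promote shape can be modeled in a few lines. This is a toy sketch under stated assumptions, not the Dreams API: stores are plain dicts of path to text, and `run_dream`, `DreamProposal`, and `promote` are hypothetical names.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class DreamProposal:
    base: Mapping[str, str]    # read-only snapshot of the input store
    proposed: dict[str, str]   # the new candidate store

    def changed_paths(self) -> list[str]:
        """Paths a reviewer should look at before promoting."""
        paths = set(self.base) | set(self.proposed)
        return sorted(p for p in paths if self.base.get(p) != self.proposed.get(p))


def run_dream(input_store: dict[str, str], session_notes: list[tuple[str, str]]) -> DreamProposal:
    """Build a candidate store from (path, note) pairs without touching the input."""
    base = MappingProxyType(dict(input_store))
    proposed = dict(input_store)
    for path, note in session_notes:
        existing = proposed.get(path, "")
        if note not in existing.splitlines():  # skip notes already recorded
            proposed[path] = existing + note + "\n"
    return DreamProposal(base=base, proposed=proposed)


def promote(live: dict[str, str], proposal: DreamProposal, approved: bool) -> dict[str, str]:
    """Return the store that should be live after review."""
    return dict(proposal.proposed) if approved else live
```

Because `run_dream` only ever writes to its own copy, rejecting a proposal costs nothing: the live store is exactly what it was before the job ran.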

The separation of objectives is the design. A task-running agent optimizes for task completion. Dreaming optimizes for memory quality. These are genuinely different objectives, and conflating them inside a single task loop produces predictable garbage: the agent rushes the memory update to get back to the task, or it gives a thorough memory update and wastes the user’s time. Dreaming runs the curation step as a separate job with a separate objective so neither is compromised.

The multi-agent perspective is the leverage. When Mahesh Murag, Member of Technical Staff on Anthropic’s platform team [6], presented this at Code with Claude, the framing he used was the test-time-compute analogy [1]: Dreaming uses additional compute to produce better memory quality, the same way longer reasoning chains use additional compute to produce better answers. The Dreams job can draw on multiple agents’ perspectives, cross-referencing session logs across the team, to produce a memory store that reflects what the whole system learned, not just what one agent noticed.

The Harvey result comes from Dreaming, not Memory. Anthropic reported that Harvey saw roughly a 6x gain in task completion rate in its internal tests [2]. This is Anthropic’s blog-post account of Harvey’s internal results, not a Harvey-published finding; Harvey’s customer page does not carry the statistic. The mechanism, per Anthropic, is that Dreaming consolidated Harvey’s accumulated task experience into a memory store that gave later sessions significantly better starting context.

Figure 4 - Dreaming job flow: read-only session logs in, an async job running four phases in the center, an output memory store out, and a human-review gate before promotion to the live store.

Figure 4 - The Dreaming Job Flow: A Dreams API call takes a read-only input store and recent session logs, runs an asynchronous curation job through four phases, and produces a new output store. The input is never modified. The output is a candidate you review before promoting. The human-review gate in the middle is the architecture’s safety valve: you can inspect the proposed memory changes before they go live.

KEY INSIGHT: Dreaming separates the compile step from the task loop. The cost of curation is paid once, out of band, by a job whose sole purpose is curation. Every subsequent task session draws on the compiled result without paying the curation cost again.


AutoDream: the consumer-side implementation in Claude Code

On the Claude Code consumer side, Memory and Dreaming surface through AutoDream: an automatic memory consolidation process that runs on a time or session-count basis. The exact trigger thresholds are community-reported and not officially documented as of this writing.

What multiple third-party walkthroughs confirm is the behavior pattern. Sessions accumulate in logs. When the AutoDream trigger fires, Claude Code runs a Dream-like curation job in the background, consolidates what it learned from the recent sessions, and updates the memory files at .claude/projects/<project>/memory/.

The memory file structure follows a master-index-plus-topic-files layout. A single index file holds pointers to topic-specific files and is capped at a line limit, which the prompt expresses as the template variable ${INDEX_MAX_LINES}, typically around 200 lines [10]. The index is what the agent reads first when it starts a session, so the 200-line constraint is functional: keep the index scannable in one pass.

The four phases of the Dream process, confirmed from the public Piebald-AI prompt (a reverse-engineering project that extracted Claude Code’s actual internal system prompts) [10]:

  1. Orient: read the memory directory, load the index file, scan existing topic files, and review activity logs to understand the current state.
  2. Gather Recent Signal: extract useful patterns from the most recent 1-3 days of sessions. The extraction is deliberately narrow: grep for what was actually learned rather than dumping raw sessions.
  3. Consolidate: write or update memory files. Merge related entries, convert relative date references to absolute dates, and delete entries that contradict newer information.
  4. Prune and Index: update the master index within the line constraint, remove stale pointers, and shorten verbose entries to their essential claim.
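The four phases above can be sketched as a small pipeline. This is an illustration only, not the Piebald-AI prompt or Claude Code’s implementation: the `LogEntry` shape, the topic-to-facts store layout, and every function name are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta

INDEX_MAX_LINES = 200  # cap reported for the index file in [10]


@dataclass
class LogEntry:
    day: date
    topic: str     # which topic file the fact belongs to
    key: str       # what the fact is about
    text: str      # may contain "yesterday" or "today"
    learned: bool  # False for routine chatter


def orient(store: dict[str, dict[str, str]]) -> list[str]:
    """Phase 1: list the topics already in memory."""
    return sorted(store)


def gather(logs: list[LogEntry], today: date, days: int = 3) -> list[LogEntry]:
    """Phase 2: keep only learned facts from the last `days` days, oldest first."""
    cutoff = today - timedelta(days=days)
    recent = [e for e in logs if e.learned and cutoff < e.day <= today]
    return sorted(recent, key=lambda e: e.day)


def absolutize(text: str, day: date) -> str:
    """Naive word replacement of relative dates, anchored on the entry's day."""
    return (text.replace("yesterday", (day - timedelta(days=1)).isoformat())
                .replace("today", day.isoformat()))


def consolidate(store: dict[str, dict[str, str]], signal: list[LogEntry]) -> dict[str, dict[str, str]]:
    """Phase 3: merge by (topic, key); a newer entry replaces the one it contradicts."""
    merged = {topic: dict(facts) for topic, facts in store.items()}
    for e in signal:
        merged.setdefault(e.topic, {})[e.key] = absolutize(e.text, e.day)
    return merged


def prune_and_index(store: dict[str, dict[str, str]], max_lines: int = INDEX_MAX_LINES) -> list[str]:
    """Phase 4: rebuild one pointer line per topic, capped at max_lines."""
    return [f"- {topic}.md" for topic in sorted(store)][:max_lines]


def dream(store: dict[str, dict[str, str]], logs: list[LogEntry], today: date):
    orient(store)
    merged = consolidate(store, gather(logs, today))
    return merged, prune_and_index(merged)
```

In the real process each phase is carried out by the model reading and editing files, so the interesting decisions (what counts as learned, what contradicts what) are judgment calls rather than dictionary lookups. The sketch only shows the order of operations and the invariants each phase maintains.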

The /memory command in Claude Code lets you view and toggle memory behavior. The /dream command, as documented by third-party walkthroughs, triggers a project-level Dream run manually [11], [12]. The scoping flags /dream user and /dream all appear in third-party implementations based on the Piebald-AI prompt; they are community additions to the slash-command interface, not documented Anthropic commands.

Anthropic documents the Dreams API for managed agents through the platform API. The consumer-facing Claude Code commands are a separate implementation layer, and the exact command surface may differ in the official rollout from what the community walkthroughs describe.

Figure 5 - Four-phase Dream process: Orient, Gather Recent Signal from the last 1-3 days of logs, Consolidate topic files, then Prune and Index the master index within its line cap.

Figure 5 - The Four-Phase Dream Process: Every Dream run follows the same four phases. Orient first, so the agent understands the current state of the memory store. Gather Recent Signal, narrow and precise. Consolidate into topic files, merging and correcting rather than appending indefinitely. Prune and Index last, keeping the master index within its line constraint. The output is a memory store that a future session can read in one pass and actually use.


The DIY path: wrapping the public Dream prompt

If your team is not yet on Claude Managed Agents, the Piebald-AI prompt gives you the core Dream logic as a starting point. It is Claude Code’s actual internal Dream system prompt, extracted by the Piebald-AI project. You can wrap it as a Claude Code skill that runs the same four-phase curation against your own project’s memory folder.

Our own article pipeline uses exactly this compiled-knowledge shape:

/docs/article-drafts/<domain>-NN-<slug>/
  index.md               # article body
  image-prompts.md       # figure prompts
  research/
    research-brief.md    # researcher deposit

That folder structure is the compiled knowledge layer for our article pipeline. The researcher agent deposits a research brief. The writer agent reads it. The reviewer agent audits it. Each agent carries exactly what it needs and nothing more, because the brief was compiled before the task started.

The Dream skill wraps this by running a consolidation pass over session logs for the project and depositing the result into a memory folder that every subsequent agent can read at the start of its session. The Postgres implementation is covered in Part 2 of this series, which walks through the artifacts table, pgvector, and the tsvector column that together make semantic and full-text search possible over the compiled store.

KEY INSIGHT: The DIY Dream prompt and the managed Dreaming API are solving the same problem with the same four phases. The platform version handles persistence, versioning, and multi-agent coordination for you. The DIY version gives you the logic in a skill you can inspect and modify.

Figure 6 - Comparison of hand-built curation (researcher deposits a brief, writer reads, reviewer audits) versus the Dreaming API (async job runs four phases, output promoted after review).

Figure 6 - Hand-Built KB vs. Managed Dreaming: The two approaches differ in who runs the curation step and how it is triggered, but not in what the curation step does. Both compile session experience into a structured knowledge store through roughly the same four phases. The managed version handles infrastructure, versioning, and multi-agent concurrency. The DIY version gives you the prompt logic directly.


Four memory types; two API shapes

The Karpathy Wiki pattern lives at a specific layer of the agent memory stack. There are four standard memory types: structural (the agent’s architecture and configuration), semantic (the compiled knowledge base), episodic (session-specific context), and procedural (how to do specific tasks).

Memory and Dreaming address the semantic layer: the compiled, queryable representation of what the agent team has learned. Episodic memory lives in session logs. Structural memory lives in configuration. Procedural memory lives in skills and prompts. The semantic layer was the missing piece for most teams, because it required a process to compile it. Dreaming is that process.

The two API shapes correspond to two distinct moments in the agent lifecycle:

  • Memory is the query-time interface. A session starts, reads the store, and has compiled context from the moment it begins.
  • Dreaming is the compile-time interface. An asynchronous job runs after sessions accumulate, synthesizes what was learned, and writes a new store.

The interaction between them closes the loop: sessions read Memory, sessions produce logs, Dreaming compiles logs into Memory, sessions read the improved Memory. The loop is not synchronous (no session waits for curation) and it is not in-place (the input store is never modified). It is a deferred, reviewable compilation step.
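A toy model of that loop makes the deferred, not-in-place property concrete. Everything here is an assumption for illustration: memory is a `frozenset` of known pitfalls, a session log is the list of pitfalls it hit, and the function names are hypothetical.

```python
def run_session(memory: frozenset[str], task_pitfalls: list[str]) -> list[str]:
    """Return the pitfalls this session hit: those not already recorded in memory."""
    return [p for p in task_pitfalls if p not in memory]


def dream(memory: frozenset[str], logs: list[list[str]]) -> frozenset[str]:
    """Compile session logs into a NEW store; the input frozenset cannot be mutated."""
    learned = {p for session in logs for p in session}
    return memory | learned


memory: frozenset[str] = frozenset()
pitfalls = ["missing VPN", "stale cache"]

first = run_session(memory, pitfalls)    # cold start: hits both pitfalls
candidate = dream(memory, [first])       # out-of-band compile step
memory = candidate                       # promotion, after review
second = run_session(memory, pitfalls)   # reads the improved store, hits none
```

The first session pays for both mistakes; the second pays for neither, and no session ever waited on the compile step.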

Figure 7 - Four-layer agent memory stack: structural, semantic, episodic, procedural. Memory and Dreaming sit at the semantic layer, with session logs flowing down and curated output flowing up.

Figure 7 - Memory and Dreaming in the Four-Layer Stack: The four-layer agent memory model distinguishes structural, semantic, episodic, and procedural layers. Memory and Dreaming APIs address the semantic layer specifically: the compiled, persistent knowledge base that gives agents cross-session intelligence. Sessions produce episodic records (logs); Dreaming compiles those records into semantic artifacts; future sessions read the semantic layer and begin with accumulated context.


What it means for builders

The headline for teams already running on Claude Managed Agents is straightforward: Memory (public beta) is available now, and Dreaming (research preview) requires a form request to get access. If your agents are currently stateless across sessions, enabling Memory is the first step and the documented customer results are encouraging.

For teams not yet on Managed Agents, the practical takeaway is that the architectural pattern is settled. Whether you run the Piebald-AI Dream prompt as a skill, build the Postgres artifacts table from Part 2, or wait for the Dreaming API to broaden access, the shape of the problem is the same: compile session experience into a queryable store and make that store available at the start of each session.

The frontier-platform features (optimistic concurrency across a multi-agent team, version history, attribution metadata, automated curation triggers) are things Anthropic’s platform handles for you if you are on Managed Agents. Those are real capabilities that would take significant engineering to replicate. The durable work, regardless of platform, is structuring the team’s own knowledge so there is something worth remembering when the infrastructure is ready.

Compiled knowledge only compounds when the underlying concepts are well-organized. A Dream job that runs over well-structured session logs produces a useful memory store. The same job run over unstructured noise produces organized noise. The researcher-writer-reviewer pipeline that this site’s agent team uses is one answer to the organization problem. The Postgres artifacts table in Part 2 is another. Getting the structure right is the part that does not get automated away.

Figure 8 - Adoption decision tree: if on Claude Managed Agents, enable Memory (public beta) and request Dreaming (research preview); if not, build a DIY Dream skill or a Postgres artifacts table.

Figure 8 - Adoption Decision Tree for Memory and Dreaming: The path to compiled knowledge differs by platform, but the destination is the same. Teams on Claude Managed Agents start with the Memory public beta. Teams not yet on the platform start with the DIY Dream skill described above or the Postgres artifacts table from Part 2 of this series. The platform eventually closes the gap; the knowledge structure you build now transfers regardless of which path you take.


Conclusion

Memory and Dreaming are not research ideas. Memory has been in public beta since April 23, 2026, with documented production results from Rakuten [5]. Dreaming is a gated research preview, but the architecture behind it is clear and the pattern is already replicable with the public Piebald-AI Dream prompt.

The Karpathy Wiki thesis was that agents should compile their knowledge rather than rediscover it on every run. In spring 2026, that thesis converged across Anthropic, Pinecone, and the broader community in parallel, not because Karpathy told anyone to build it, but because the problem was real and the solution was ready. The Memory and Dreaming APIs are the most complete version of that pattern available today, because they handle the infrastructure layer (persistence, versioning, multi-agent concurrency, curation triggers) without requiring teams to build it from scratch.

What does not get automated away is the conceptual work upstream of the curation step. Memory compounds when the underlying knowledge is well-structured. Dreaming synthesizes session experience more accurately when the agents doing the sessions have been designed to produce useful experience. The investment in knowledge architecture that this sub-series has been describing is not made redundant by platform primitives; it is what makes platform primitives useful.

The next step for teams on Claude Managed Agents is to enable Memory and request access to the Dreaming preview. The next step for teams building their own stack is the Postgres compiled-knowledge engine from Part 2. Either way, the compile-time knowledge architecture is now a production pattern, not a research prototype.


The Series

This is Part 3 of the three-part Compiled Knowledge sub-series:

  1. From Agentic RAG to Compiled Knowledge: Why Karpathy’s Wiki Idea Is Spreading: the architectural case for moving from query-time retrieval loops to compile-time synthesis, and the four-team convergence in spring 2026
  2. Build Your Own Compiled Knowledge Engine in Postgres: the practical walkthrough of one artifacts table, three indexes, JSONB plus pgvector plus tsvector, no vendor product required
  3. Memory and Dreaming: How Anthropic Just Shipped the Karpathy Wiki Pattern (this article): the production primitive in Anthropic’s Managed Agents API and what it means for teams already running on the platform

References

[1] M. Murag, Anthropic. “Memory and dreaming for self-learning agents.” Code with Claude 2026, San Francisco, May 6, 2026. https://claude.com/code-with-claude/session/sf-memory-and-dreaming-for-self-learning-agents

[2] Anthropic. “New in Claude Managed Agents: dreaming, outcomes, and multiagent orchestration.” Anthropic Blog, May 6, 2026. https://claude.com/blog/new-in-claude-managed-agents

[3] Anthropic. “Built-in memory for Claude Managed Agents.” Anthropic Blog, April 23, 2026. https://claude.com/blog/claude-managed-agents-memory

[4] Anthropic. “Dreams.” Claude Platform Documentation, 2026. https://platform.claude.com/docs/en/managed-agents/dreams

[5] Rakuten (Y. Kaji, GM AI for Business). “Customer Story: Rakuten.” Claude Customers, 2026. https://claude.com/customers/rakuten-qa

[6] Code with Claude 2026, San Francisco. Official session and speaker registry, May 6, 2026. https://claude.com/code-with-claude/san-francisco

[7] A. Karpathy. “LLM Wiki.” GitHub Gist, April 4, 2026. https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f

[8] Pinecone. “Better Models Won’t Save Your Agent.” Pinecone Blog, May 4, 2026. https://www.pinecone.io/blog/introducing-nexus-knowledge-engine/

[9] Pinecone. “Pinecone Nexus: The Knowledge Engine for Agents.” Pinecone Blog, May 4, 2026. https://www.pinecone.io/blog/knowledge-infrastructure-for-agents/

[10] Piebald-AI. “agent-prompt-dream-memory-consolidation.md.” GitHub: Piebald-AI/claude-code-system-prompts, 2026. https://github.com/Piebald-AI/claude-code-system-prompts/blob/main/system-prompts/agent-prompt-dream-memory-consolidation.md

[11] N. Herk, AI Automation. “Claude Code Just Dropped Memory 2.0.” YouTube, March 24, 2026. https://www.youtube.com/watch?v=LrgfmZkl3nc

[12] Chase AI. “Claude Code’s Hidden /dream Feature MASSIVELY Upgrades Memory.” YouTube, ~March 2026. https://www.youtube.com/watch?v=E-1Lmyv6Cjo

Memory and Dreaming: How Anthropic Just Shipped the Karpathy Wiki Pattern
https://dotzlaw.com/insights/ai-08-memory-and-dreaming/
Author: Katrina Dotzlaw, Ryan Dotzlaw, Gary Dotzlaw
Published at: 2026-06-17
License: CC BY-NC-SA 4.0
