Running a single Claude Code agent is a solved problem. You write a prompt, watch it work, nudge it when it asks. The ceiling arrives around the third or fourth complex task when the context window fills up, the model loses track of earlier decisions, and you find yourself spending as much time steering as you would have spent coding.
The natural response is to carve the work up across multiple agents: one to research, one to design, one to build, one to review. Straightforward idea. Then the reliability math kicks in.
This is a Claude Code article on multi-agent pipelines: why you should run them, what the compounding failure numbers mean for how you have to architect them, how we structured the six-agent team running this site, what Claude Code’s native Agent View actually solves, and what the best practitioners in the field have learned about the reviewer-agent pattern.

Figure 1 - The four-role pipeline: A specialized multi-agent pipeline routes each task through agents matched to the work: a researcher that finds, an architect that designs, a developer that builds, and a reviewer that validates. Each role gets scoped to what it does best; the harness handles the handoffs.
Why run multiple agents at all
The single-agent ceiling is real, but it is not about context length alone. A single agent accumulates assumptions. The research phase leaks into the design phase. The design biases the implementation. By the time the review phase runs, it has inherited the context of every earlier phase, including its mistakes.
Splitting roles forces clean handoffs. The researcher cannot bias the reviewer because the reviewer starts fresh. The architect cannot rationalize the developer’s shortcuts because the architect is not in the room when they happen. Each agent gets a tightly scoped task, its own context window, and explicit inputs and outputs. That is the architectural advantage of the multi-agent shape.
There is a cost-of-reasoning benefit too. Not every phase needs the same model. A researcher doing codebase search can run on a fast model. An architect reasoning through tradeoffs may earn the expense of a deeper one. The harness selects the right tool per phase instead of running everything through the most expensive option.
KEY INSIGHT: Splitting pipeline phases across specialized agents is not about concurrency. It is about context isolation. A reviewer that cannot see the implementer’s reasoning is a better reviewer.
The copilot-agent-workflow pattern captures this cleanly. The canonical development pipeline routes through researcher, then architect, then developer, then code reviewer. Each agent produces a structured artifact: a research report, an implementation plan, a code diff, a review verdict. The artifact is the handoff; the next agent reads the artifact, not the conversation that produced it.
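The artifact-as-handoff idea can be sketched in a few lines of Python. This is an illustration only: `Artifact`, `expect`, `run_pipeline`, and the four stub role functions are hypothetical names, and each stub stands in for a real agent dispatch. What it shows is the shape of the handoff: a role receives the previous artifact and nothing else.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Artifact:
    """Structured output of one role; the only thing the next role sees."""
    kind: str  # "research-report", "plan", "diff", or "verdict"
    body: str


# A role maps the previous artifact (None for the first role) to a new one.
Role = Callable[[Optional[Artifact]], Artifact]


def expect(prev: Optional[Artifact], kind: str) -> Artifact:
    """Reject a handoff whose artifact is missing or of the wrong kind."""
    if prev is None or prev.kind != kind:
        raise ValueError(f"expected a {kind!r} artifact, got {prev!r}")
    return prev


def researcher(prev: Optional[Artifact]) -> Artifact:
    return Artifact("research-report", "auth module uses session cookies")


def architect(prev: Optional[Artifact]) -> Artifact:
    report = expect(prev, "research-report")
    return Artifact("plan", f"plan derived from: {report.body}")


def developer(prev: Optional[Artifact]) -> Artifact:
    plan = expect(prev, "plan")
    return Artifact("diff", f"diff implementing: {plan.body}")


def reviewer(prev: Optional[Artifact]) -> Artifact:
    expect(prev, "diff")
    return Artifact("verdict", "APPROVED")


def run_pipeline(roles: List[Role]) -> List[Artifact]:
    """Run roles in order, passing only the previous artifact forward."""
    artifacts: List[Artifact] = []
    prev: Optional[Artifact] = None
    for role in roles:
        prev = role(prev)
        artifacts.append(prev)
    return artifacts


artifacts = run_pipeline([researcher, architect, developer, reviewer])
print([a.kind for a in artifacts])
# ['research-report', 'plan', 'diff', 'verdict']
```

Because each role validates the kind of artifact it receives, running the roles out of order fails at the handoff instead of producing a plausible-looking result.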

Figure 2 - Four roles, four artifacts: The development pipeline is a sequence of roles, each producing a structured artifact that the next role consumes. The artifact boundary is what keeps contexts clean and makes each role independently testable.
The Dotzlaw production team
The six-agent team running this site is a concrete example of the pattern at production scale. The full roster:
| Agent | Role | Model | Scope |
|---|---|---|---|
| team-lead | Orchestrator | opus | Shared config, pipeline coordination, progress tracking |
| content-migrator | Specialist | sonnet | WordPress extraction, content files, images |
| site-developer | Specialist | sonnet | Components, layouts, pages, styles, schemas |
| article-researcher | KB curator | sonnet | Web research, raw source ingestion, KB index upkeep |
| article-writer | Drafter | sonnet | Reads pipeline, queries KB, drafts articles in Gary’s voice |
| article-reviewer | Validator | sonnet | Read-only on drafts; files structured review notes |
Six agents, six scopes, zero overlapping write permissions. That last point is the part that keeps the team from stepping on itself.
File ownership as the coordination mechanism
Each agent owns a directory slice. Ownership means write access; everything else is read-only.
| Directory | Owner |
|---|---|
| src/components/, src/layouts/, src/pages/ | site-developer |
| src/content/articles/ | content-migrator |
| docs/kb/concepts/, docs/kb/raw/ | article-researcher |
| docs/article-drafts/<slug>/ | article-writer |
| docs/kb/qa/reviews/ | article-reviewer |
| CLAUDE.md, docs/kb/pipeline-articles.md | team-lead |
No agent modifies a directory it does not own. This is not a soft guideline; it is enforced by the role definitions in .claude/agents/team/*.md. When the article-writer drafts a new article, it writes to docs/article-drafts/. The content-migrator installs the approved draft to src/content/articles/. The writer never touches src/; the migrator never touches docs/article-drafts/.
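The ownership map is simple enough to express as data. The sketch below is a hypothetical harness-side check in Python, not the mechanism the team uses (that lives in the role definitions). The `OWNERS` dictionary copies the table above, with the `<slug>` wildcard simplified to a plain directory prefix; `owner_of` and `may_write` are illustrative names.

```python
import posixpath
from typing import Optional

# Ownership map copied from the table above. Keys ending in "/" are
# directory prefixes; other keys are exact file paths.
# Assumption: docs/article-drafts/<slug>/ is simplified to a plain prefix.
OWNERS = {
    "src/components/": "site-developer",
    "src/layouts/": "site-developer",
    "src/pages/": "site-developer",
    "src/content/articles/": "content-migrator",
    "docs/kb/concepts/": "article-researcher",
    "docs/kb/raw/": "article-researcher",
    "docs/article-drafts/": "article-writer",
    "docs/kb/qa/reviews/": "article-reviewer",
    "CLAUDE.md": "team-lead",
    "docs/kb/pipeline-articles.md": "team-lead",
}


def owner_of(path: str) -> Optional[str]:
    """Return the owning agent for a repo-relative path, or None if unowned."""
    # normpath collapses "a/../b" so a path cannot escape its prefix.
    normalized = posixpath.normpath(path)
    for prefix, agent in OWNERS.items():
        if prefix.endswith("/"):
            if normalized.startswith(prefix):
                return agent
        elif normalized == prefix:
            return agent
    return None


def may_write(agent: str, path: str) -> bool:
    """An agent may write only to paths it owns; everything else is read-only."""
    return owner_of(path) == agent


print(may_write("article-writer", "docs/article-drafts/my-slug/index.md"))   # True
print(may_write("article-writer", "src/content/articles/my-slug/index.md"))  # False
```

A check like this could run as a deterministic gate before any write is accepted, so that a violation is rejected rather than merely discouraged.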

Figure 3 - Directory ownership table: Each agent owns a non-overlapping slice of the file tree. The ownership map is what makes parallel work safe: agents can read anywhere but write only to their assigned directories, preventing the file-system conflicts that plague uncoordinated multi-agent systems.
Agent definition skeletons
Each agent lives as a .claude/agents/team/<name>.md file. The skeleton for the article-writer looks like this:

    ---
    name: article-writer
    description: >
      Drafts long-form tutorial articles for the dotzlaw.com pipeline.
      Reads the next NOT STARTED row from the article pipeline doc,
      queries the KB for source material, and produces a draft.
    tools: Read, Write, Edit, Bash, Glob, Grep
    model: sonnet
    color: orange
    ---

    <role>
    You are the article writer for the dotzlaw.com pipeline.
    Your source of truth for the next article to write is
    docs/article-drafts/ARTICLE_TOPICS_&_SEQUENCE.md.
    </role>

    <ownership>
    Files you own and can modify:
    - docs/article-drafts/<domain>-NN-<slug>/index.md
    - docs/article-drafts/<domain>-NN-<slug>/image-prompts.md
    - docs/article-drafts/<domain>-NN-<slug>/figure-NN-*.png

    Files you must NEVER modify:
    - src/content/articles/ (content-migrator installs these)
    - docs/kb/ (article-researcher owns the KB)
    </ownership>

The tools key limits what the agent can invoke. The <ownership> section tells it exactly where it can write. The role definition is what the agent reads before every task; it does not infer permissions from context.
The article-reviewer definition is similar but with one key difference: its <constraints> block says “Never modify the draft. File review notes at docs/kb/qa/reviews/. Read-only on all article drafts.” The reviewer cannot accidentally fix the article even if it tries.
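Whether a definition really is read-only can be checked mechanically. The Python sketch below is a hypothetical lint, assuming the frontmatter keeps `tools:` on a single comma-separated line as in the skeleton above. `parse_tools`, `is_read_only`, and the two sample definitions are illustrative; in particular, the reviewer's tool list here is an assumption, not the team's actual file.

```python
from typing import Set

# Tools that can change files directly. Note that Bash can also write files,
# so a strict lint might treat it as a write tool too (an assumption here).
WRITE_TOOLS = {"Write", "Edit"}


def parse_tools(definition: str) -> Set[str]:
    """Extract the comma-separated `tools:` list from the frontmatter block."""
    lines = definition.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("definition must start with a '---' frontmatter line")
    for line in lines[1:]:
        if line.strip() == "---":  # end of frontmatter
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() == "tools":
            return {tool.strip() for tool in value.split(",") if tool.strip()}
    return set()


def is_read_only(definition: str) -> bool:
    """True when the definition grants no direct file-writing tool."""
    return not (parse_tools(definition) & WRITE_TOOLS)


WRITER_DEF = """---
name: article-writer
tools: Read, Write, Edit, Bash, Glob, Grep
model: sonnet
---
<role>You are the article writer.</role>
"""

# Hypothetical reviewer definition with a read-only tool list.
REVIEWER_DEF = """---
name: article-reviewer
tools: Read, Glob, Grep
model: sonnet
---
<role>You are the article reviewer.</role>
"""

print(is_read_only(WRITER_DEF))    # False
print(is_read_only(REVIEWER_DEF))  # True
```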
KEY INSIGHT: The file-ownership map is not documentation. It is the coordination mechanism. Agents that cannot write to each other’s directories cannot produce the class of conflict that corrupts shared work.
The research-write-review handoff
The pipeline moves through three agents in sequence:
- The article-researcher fills KB gaps for the next topic. It scrapes sources to docs/kb/raw/, distills concepts to docs/kb/concepts/<domain>/, and drops a research brief at docs/article-drafts/<slug>/research/research-brief.md.
- The article-writer reads the brief, queries the KB via uv run --directory scripts/kb python query.py "<topic>", and drafts the article at docs/article-drafts/<slug>/index.md with draft: true. It also writes the companion image-prompts.md.
- The article-reviewer reads the draft and the brief. It files a structured verdict at docs/kb/qa/reviews/<date>-<slug>.md. Verdict is APPROVED or CHANGES REQUESTED. It never edits the draft.
On APPROVED, the content-migrator copies the article folder to src/content/articles/<slug>/, flips draft: false, and runs pnpm build. The team-lead updates the pipeline dashboard.
That sequence is the one that produced this article.
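The publish gate at the end of that sequence can be made deterministic. The Python sketch below assumes the review note contains a line of the form `Verdict: APPROVED` or `Verdict: CHANGES REQUESTED`; that line format is an assumption for illustration, and `parse_verdict` and `may_publish` are hypothetical helpers.

```python
import re

# Assumed line format inside the review note: "Verdict: <value>".
VERDICT_RE = re.compile(
    r"^Verdict:\s*(APPROVED|CHANGES REQUESTED)\s*$", re.MULTILINE
)


def parse_verdict(review_text: str) -> str:
    """Return the verdict string, or raise if the note has no verdict line."""
    match = VERDICT_RE.search(review_text)
    if match is None:
        raise ValueError("review note has no recognizable verdict line")
    return match.group(1)


def may_publish(review_text: str) -> bool:
    """Only an explicit APPROVED verdict opens the gate."""
    return parse_verdict(review_text) == "APPROVED"


print(may_publish("# Review\nVerdict: APPROVED\n"))                  # True
print(may_publish("Verdict: CHANGES REQUESTED\n1. Tighten intro\n"))  # False
```

A note with no verdict line raises instead of defaulting to either outcome, so an ambiguous review stops the pipeline rather than passing silently.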

Figure 4 - The research-write-review-publish handoff: No agent in the pipeline can skip a step or take a shortcut around the file-ownership boundary. The researcher produces the brief before the writer drafts; the reviewer validates before the migrator publishes. Each handoff is a file artifact, not a message.
The compounding-failure math
Multi-agent pipelines look great in demos. Each agent works correctly in isolation. The researcher finds what you need. The architect produces a sensible plan. The developer ships clean code. The reviewer catches the edge cases. Then you run the pipeline ten times and start counting the failures.
The math is straightforward. If each step in a five-step pipeline succeeds at 95%, the end-to-end success rate is 0.95^5 = 77.4%. That is arithmetic, not pessimism. Cursor’s engineering blog makes it explicit: “Making that work well is fundamentally a harness challenge” [1]. The orchestration logic (which agent to dispatch, how to frame the task for each agent’s strengths, how to stitch the results into a coherent workflow) lives in the harness rather than in any single agent.
Extend the same arithmetic and the implications scale badly. Ten agents at 95% reliability each: 0.95^10 = 59.9%. Twenty agents: 0.95^20 = 35.8%. A pipeline that feels fine with three agents and looks acceptable with five fails nearly two-thirds of the time with twenty.
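The arithmetic is easy to reproduce. A minimal Python sketch, assuming every step succeeds independently at the same rate (`end_to_end_success` is an illustrative name):

```python
def end_to_end_success(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


for n in (3, 5, 10, 20):
    print(f"{n:>2} steps at 95%: {end_to_end_success(0.95, n):.1%}")
#  3 steps at 95%: 85.7%
#  5 steps at 95%: 77.4%
# 10 steps at 95%: 59.9%
# 20 steps at 95%: 35.8%
```

Real pipelines rarely have perfectly independent steps or identical per-step rates, so treat these numbers as a budgeting baseline rather than a prediction.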

Figure 5 - How reliability compounds across pipeline steps: At 95% per-step reliability, a five-step pipeline succeeds 77.4% of the time, and a ten-step pipeline drops to 59.9%. The gap between “works in a demo” and “works at scale” is the compound product of every step.
Why better agents do not fix this
This is the part that takes a moment to absorb. You cannot route around compounding failure by improving individual agent step quality. You can reduce the slope, but you cannot change the shape.
Going from 95% to 99% per step helps. By the same arithmetic, a five-step 99% pipeline succeeds 95.1% of the time instead of 77.4%. But the structure of the problem is the same: errors accumulate, and no single agent’s improvement eliminates the compounding.
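Turning the question around shows why raising per-step quality alone gets expensive. The sketch below, under the same independence assumption, computes the per-step rate a pipeline needs to reach a target end-to-end rate; `required_per_step` is a hypothetical helper.

```python
def required_per_step(target: float, steps: int) -> float:
    """Per-step success rate needed for `steps` independent steps to hit `target`."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return target ** (1.0 / steps)


for n in (5, 10, 20):
    print(f"{n:>2} steps need {required_per_step(0.95, n):.2%} per step for 95% end-to-end")
```

For a 95% end-to-end target, five steps need roughly 99.0% per step and twenty steps need roughly 99.7%. The required per-step rate keeps climbing toward 100% as the chain gets longer.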
The architectural answer is deterministic rails: checkpoints at phase boundaries that validate outputs before passing them forward, human gates at decisions with high recovery cost, and error budgets that let failed sub-agents retry in isolation rather than failing the entire chain. These are harness responsibilities, not model responsibilities.
Cursor frames the orchestration challenge the same way: the system has to know which agent to dispatch, how to frame each task to an agent’s strengths, and how to stitch the results into a coherent workflow [1]. That coordination logic lives in the harness rather than in any single agent.
A specialized review harness is the reference implementation of this idea: a sequential state machine that runs one scoped sub-agent per unit of work, with deterministic validation between phases. Because every sub-agent gets a tightly scoped task and a clean context window, the orchestrator stays small and the sub-agent retry logic stays feasible. A failed sub-agent can be rerun in isolation without rerunning the whole harness.
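A sequential state machine with checkpoints and isolated retries fits in a few dozen lines. The Python sketch below is a toy under stated assumptions, not the reference harness described above: `run_phase`, `run_harness`, and `PhaseFailed` are hypothetical names, the phases are stub functions, and the validators are simple predicates standing in for real deterministic checks.

```python
from typing import Callable, List, Tuple

# (name, work function, validator) for one phase.
Phase = Tuple[str, Callable[[str], str], Callable[[str], bool]]


class PhaseFailed(Exception):
    """Raised when a phase exhausts its retry budget."""


def run_phase(name: str, work: Callable[[str], str],
              validate: Callable[[str], bool], payload: str,
              max_attempts: int = 3) -> str:
    """Run one phase; retry only this phase until its output validates."""
    for _ in range(max_attempts):
        output = work(payload)
        if validate(output):  # deterministic checkpoint at the phase boundary
            return output
    raise PhaseFailed(f"{name} failed validation after {max_attempts} attempts")


def run_harness(phases: List[Phase], initial: str, max_attempts: int = 3) -> str:
    """Run phases in order; only validated output moves forward."""
    payload = initial
    for name, work, validate in phases:
        payload = run_phase(name, work, validate, payload, max_attempts)
    return payload


calls = {"plan": 0, "build": 0}


def plan(task: str) -> str:
    calls["plan"] += 1
    return f"plan for {task}"


def build(plan_text: str) -> str:
    calls["build"] += 1
    # Simulate a transient failure: the first attempt returns empty output.
    return "" if calls["build"] == 1 else f"diff implementing {plan_text}"


result = run_harness(
    [("plan", plan, lambda out: out.startswith("plan")),
     ("build", build, lambda out: bool(out))],
    "add login",
)
print(result)  # diff implementing plan for add login
print(calls)   # {'plan': 1, 'build': 2}
```

The build phase was retried once; the plan phase was not rerun. Under the strong assumptions that attempts are independent and the validator never passes bad output, three attempts at 95% lift a single phase to 1 - 0.05^3 = 99.9875%, which is where the retry budget earns its keep.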
KEY INSIGHT: Better individual agents reduce the slope of the reliability curve. Deterministic checkpoints between phases are what prevent errors from compounding to the end of the chain.
Agent View: what it actually solves
Before Agent View shipped as a research preview for Claude Code, running parallel agents meant opening multiple terminal windows, manually tracking which session was doing what, and switching back and forth to answer permission prompts or unblock a stalled session. There was no unified view, no status summary, and no way to reply to a session without fully entering it.
Agent View [2] solves that friction. It does not solve the compounding-failure problem.
The command surface
Launch with claude agents. The view lists every background session grouped by state, with sessions needing input pinned at the top. Narrow to a single project directory (requires Claude Code v2.1.141+):

    claude agents --cwd ~/projects/my-app

Three ways to start a background session:

    # From any terminal shell
    claude --bg "investigate the flaky SettingsChangeDetector test"

    # With a specific subagent
    claude --agent code-reviewer --bg "address review comments on PR 1234"

    # With a custom display name
    claude --bg --name "flaky-test-fix" "investigate the flaky SettingsChangeDetector test"

From inside an active session:

    /background
    /bg run the test suite and fix any failures

The session survives terminal close. It is hosted by a supervisor process that keeps running across terminal restarts.
Navigation and session status
| Shortcut | Action |
|---|---|
| Enter or → | Attach to selected session |
| ← on empty prompt | Detach, return to Agent View |
| Space | Open/close peek panel |
| Shift+Enter | Dispatch and attach immediately |
| ↑ / ↓ | Move between rows |
| Ctrl+X (twice) | Stop then delete session |
| Ctrl+T | Pin/unpin session |
| Ctrl+R | Rename session |
| Esc | Close panel or exit |
| ? | Show all shortcuts |
Each row shows the session’s name, current activity, and elapsed time. Status indicators:
| State | Color | Meaning |
|---|---|---|
| Working | Animated icon | Claude is actively running tools |
| Needs input | Yellow | Waiting on a question or permission |
| Idle | Dimmed | Ready for next prompt |
| Completed | Green | Task finished |
| Failed | Red | Task ended with error |
| Stopped | Grey | Stopped with Ctrl+X or claude stop |
The peek panel (press Space) lets you read the session’s latest output and type a reply without fully entering it. For sessions waiting on a multiple-choice prompt, press a number key to select an option directly from Agent View.

Figure 6 - Agent View session grid: Three sessions running in parallel. The yellow dot marks a session waiting on a permission decision; the animated icon shows active tool use; the green row has completed. A single Space opens the peek panel so you can reply without attaching.
Automatic worktree isolation
Every background session started via Agent View, /bg, or claude --bg automatically moves into an isolated git worktree (under .claude/worktrees/) before editing files. Parallel sessions can work on the same codebase without file-edit conflicts because each writes to its own branch.
This changes the parallel-variation workflow considerably. The previous approach required explicitly instructing each session to create its own worktree. Now the isolation happens automatically, and the sessions can run against separate dev servers on separate ports.
The /goal command
/goal sets a completion condition evaluated after every turn [3]:

    /goal all tests in test/auth pass and the lint step is clean

After each turn, a fast model (Haiku by default) checks whether the condition holds. If not, Claude starts another turn automatically. The key framing from the vendor docs: this is a stop-hook-backed evaluator, not a free-running loop. Claude pursues the stated condition; the evaluator checks it after each turn; the session stops when the condition is met.

    # Check status: condition, turns elapsed, tokens spent, last evaluator reason
    /goal

    # Clear an active goal early
    /goal clear

Goals persist on --resume or --continue if still active when the session ended. For non-interactive pipelines:

    claude -p "/goal CHANGELOG.md has an entry for every PR merged this week"

What Agent View does not solve
Agent View removes the terminal-management friction from running parallel sessions. It does not change the compounding-failure math. Each session still fails at its own per-step rate, and the harness still needs deterministic checkpoints to prevent errors from propagating downstream. What Agent View gives you is visibility: you can see all your sessions’ states without switching terminal windows, and you can unblock a stalled session without losing your place in the one you are working in.
That is a real improvement. It is the difference between managing five sessions from five windows and managing them from one. But the coordination logic, the handoffs, the error recovery, and the phase boundaries are still yours to design.
The Lopopolo persona-reviewer pattern
Ryan Lopopolo’s harness at OpenAI [4][5][6][7] adds a dimension to the reviewer role that is worth pulling into any team architecture. Rather than one generic code reviewer, his team runs one review agent per discipline.
The named personas from his AI Engineer Europe keynote [4]:
- front-end architect: component structure, accessibility, rendering
- reliability engineer: error handling, retries, timeouts
- scalability engineer: data access patterns, bottlenecks, load profile
Each persona triggers on every push to CI. The trigger contract is precise: surface any P2-or-above blocker based on the documented standard for that discipline. Lower-priority feedback is intentionally suppressed to keep the agent output actionable rather than overwhelming.
The persona documentation is what makes the agents effective. Each one is backed by a written standard: what does a good QA plan look like for this team? What does production-grade error handling require? Those documents are the durable artifacts; the agents execute against them. Lopopolo describes the leverage:
“Every engineer driving agents gets the best of every single person on my team. I don’t need to block on low-signal code review in order to learn what it means to write a good QA plan. To have one engineer on my team document that in a durable way means every agent trajectory is going to get a good QA plan” [4].
One human writes the standard once. Every future agent trajectory benefits from it. The persona-reviewer approach is a knowledge-distribution mechanism dressed up as a CI step.
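The P2-or-above contract amounts to a severity filter on reviewer output. The Python sketch below assumes the common convention that P0 is the most severe and larger numbers are less severe; that numbering, the `Finding` dataclass, and `blockers` are illustrative assumptions, not details from the keynote.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    persona: str
    priority: int  # assumed scale: 0 = P0 (most severe), larger = less severe
    message: str


def blockers(findings: List[Finding], threshold: int = 2) -> List[Finding]:
    """Keep findings at or above the threshold severity (P0 through P2 by default)."""
    return [f for f in findings if f.priority <= threshold]


findings = [
    Finding("reliability engineer", 1, "external call has no timeout"),
    Finding("front-end architect", 3, "prefer a shorter component name"),
    Finding("scalability engineer", 2, "query runs once per row in a loop"),
]

for f in blockers(findings):
    print(f"P{f.priority} [{f.persona}] {f.message}")
# P1 [reliability engineer] external call has no timeout
# P2 [scalability engineer] query runs once per row in a loop
```

The P3 naming suggestion never reaches the implementation agent, which is the suppression the trigger contract asks for.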

Figure 7 - Three persona reviewers per PR: Each discipline gets its own reviewer agent triggered on every push. The P2-or-above threshold keeps output actionable. The implementation agent on the receiving end can acknowledge, defer, or reject any feedback it gets.
The PR as broadcast domain
Lopopolo treats the pull request as a broadcast hub. All agents and humans collaborate on the same hub; the implementation agent is not required to accept every suggestion. Lopopolo warns that requiring every piece of feedback to be addressed creates a catastrophic failure mode in which the coding agent ends up “bullied by all of the reviewers” [4].
Persona reviewers are advisors with judgment, not gatekeepers with veto power. The implementation agent retains the final call. That asymmetry matters: the reviewers add signal; they do not block the pipeline.
Mapping this pattern to Claude Code
The Lopopolo pattern transfers directly to a .claude/agents/team/ setup:

    ---
    name: reliability-reviewer
    description: >
      Reviews code changes for reliability: error handling, timeouts,
      retry logic, graceful degradation. Surfaces P2-or-above blockers only.
    tools: Read, Glob, Grep
    model: sonnet
    color: red
    ---

    <role>
    You are the reliability reviewer. Review the diff passed to you against
    the team reliability standard. Flag any P2-or-above issue: missing
    timeouts, swallowed exceptions, no retry on transient network errors,
    absent graceful degradation on external calls. Do not flag style or
    convention issues. Output a structured verdict: APPROVED or CHANGES
    REQUESTED with a numbered list of blockers.
    </role>

    <constraints>
    Read-only. Never modify code. Never modify draft files.
    File your verdict at docs/kb/qa/reviews/<date>-<slug>.md.
    </constraints>

The scope is narrow, the persona is explicit, and the output format is structured. Three of these, one per discipline, running in parallel on every PR or draft submit, is the per-push reviewer architecture Lopopolo describes.
KEY INSIGHT: Persona reviewers multiply team expertise by turning each expert’s written standard into an agent trigger. One document, written once, runs on every future trajectory.
Putting it together: the architecture checklist
A multi-agent pipeline that survives production looks different from one that passes a demo. The checklist is short:
On agent definition:
- One agent per role, one role per scope. Researchers do not architect; architects do not review.
- File-ownership map in the agent definition, not in documentation. The agent reads its constraints before it acts.
- Tool list limited to what the role actually needs. Reviewers do not need Write.
On the harness:
- Deterministic checkpoints between phases. Phase output is validated before it passes forward.
- Sub-agents can retry in isolation. A failed phase does not fail the chain.
- Human gates at decisions with high recovery cost. The harness codes the gate; the model does not decide when to ask.
On reliability math:
- Budget for your actual pipeline depth. Five agents at 95% is 77.4% end-to-end, not 95% [1].
- Design checkpoints where failing early is cheaper than propagating the error.
- Measure keep rate or its equivalent: the fraction of agent output that survives downstream review unchanged [1]. That number tells you where compounding is happening.
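Keep rate can be approximated at the line level with the standard library. The Python sketch below is one possible reading of "survives downstream review unchanged", not the definition used in [1]: it counts the agent's lines that still appear, in order, in the final version. `keep_rate` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def keep_rate(agent_output: str, final_version: str) -> float:
    """Fraction of the agent's lines that appear unchanged in the final version."""
    agent_lines = agent_output.splitlines()
    final_lines = final_version.splitlines()
    if not agent_lines:
        return 1.0  # nothing was produced, so nothing was discarded
    matcher = SequenceMatcher(a=agent_lines, b=final_lines, autojunk=False)
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(agent_lines)


draft = "import os\nx = load()\nprint(x)\nreturn x"
final = "import os\nx = load(path)\nprint(x)\nreturn x"
print(keep_rate(draft, final))  # 0.75
```

Tracked per phase, a number like this points at the handoff where downstream agents or humans rewrite the most.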
On Agent View:
- Use it for session management, not for coordination logic. It removes friction; it does not replace architecture.
- Use /goal with objective conditions and measurable outputs. “All tests in test/auth pass” is a verifiable goal. “Do a good job” is not.
- Treat automatic worktree isolation as the default for parallel development branches. Each session writes to its own branch; conflicts do not accumulate.
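Why the goal has to be verifiable becomes clearer once the evaluator is written as a function. The Python sketch below is a toy loop, not Claude Code's implementation of /goal: `pursue_goal`, the `max_turns` cap, and the stub turn and condition functions are all assumptions made for illustration.

```python
from typing import Callable, List


def pursue_goal(take_turn: Callable[[], None],
                condition_met: Callable[[], bool],
                max_turns: int = 10) -> int:
    """Run turns until the condition holds; return the number of turns used."""
    for turn in range(1, max_turns + 1):
        take_turn()
        if condition_met():  # evaluated after every turn
            return turn
    raise RuntimeError(f"goal not met after {max_turns} turns")


failing_tests: List[str] = ["test_login", "test_logout"]


def fix_one_test() -> None:
    """Stub for one agent turn: each turn fixes one failing test."""
    if failing_tests:
        failing_tests.pop()


def all_tests_pass() -> bool:
    """An objective condition: it can be checked without judgment."""
    return not failing_tests


turns_used = pursue_goal(fix_one_test, all_tests_pass)
print(turns_used)  # 2
```

“All tests pass” maps onto a function that returns True or False. “Do a good job” has no such function, so an evaluator has nothing definite to check.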

Figure 8 - Production pipeline checklist: The three structural requirements for a multi-agent pipeline that holds up under real load. Agent definitions scope the role; the harness enforces the boundaries; reliability budgeting accounts for compounding before it surprises you.
Conclusion
Multi-agent pipelines are the right architecture for tasks too large for one context window, too complex for one role, or too consequential to pass through a single model. The multi-agent promise is real.
The compounding-failure math is equally real. Five agents at 95% reliability each, chained together, produce 77.4% end-to-end reliability. The harness architecture is what closes that gap: deterministic checkpoints, scoped sub-agents, human gates at phase transitions, error budgets that let failed steps retry without cascading. We cover what goes into that harness layer in a companion article on its nine core components [8].
Agent View addresses the terminal-management part of the problem. Viewing, unblocking, and dispatching parallel sessions from a single UI removes meaningful friction from day-to-day parallel-agent work. Worktree isolation eliminates a class of file-conflict bugs. The /goal evaluator turns a session into a self-directed worker for well-defined conditions. None of that changes the compounding math; all of it makes managing the pipelines you build considerably less painful.
The Lopopolo persona-reviewer pattern extends the reviewer role from a generic validator into a discipline-specific expert trigger. One written standard, backed by one agent, surfacing P2-or-above blockers on every push. That is the architecture that scales: not more review time from humans, but the expertise of your best reviewers running automatically on every trajectory.
The six-agent Dotzlaw team is a working implementation of these ideas at the scale of a two-person publication operation. Researcher, writer, reviewer, migrator, developer, orchestrator. File-ownership enforces coordination. The pipeline handoff produces artifacts that can be audited at every phase. That structure is what makes running the pipeline a routine rather than an adventure.
References
[1] S. Heule and J. Katz, “Continually improving our agent harness,” Cursor Engineering Blog, April 30, 2026. https://cursor.com/blog/continually-improving-agent-harness
[2] Anthropic, “Manage multiple agents with agent view,” Claude Code Documentation, verified June 2026. https://code.claude.com/docs/en/agent-view
[3] Anthropic, “Keep Claude working toward a goal,” Claude Code Documentation, verified June 2026. https://code.claude.com/docs/en/goal
[4] R. Lopopolo, “Harness Engineering: How to Build Software When Humans Steer, Agents Execute” (keynote), AI Engineer Europe 2026, April 8-10, 2026, London. YouTube: https://www.youtube.com/watch?v=am_oeAoUhew
[5] R. Lopopolo, “Extreme Harness Engineering: 1M LOC, 1B toks/day, 0% human code or review” (talk). YouTube: https://www.youtube.com/watch?v=CeOXx-XTYek
[6] R. Lopopolo, “Harness engineering: leveraging Codex in an agent-first world,” OpenAI, February 11, 2026. https://openai.com/index/harness-engineering/
[7] Swyx and Alessio, “Extreme Harness Engineering for Token Billionaires: 1M LOC, 1B toks/day, 0% human code, 0% human review, featuring Ryan Lopopolo, OpenAI Frontier and Symphony,” Latent Space: The AI Engineer Podcast, April 7, 2026. https://www.latent.space/p/harness-eng
[8] K. Dotzlaw, R. Dotzlaw, and G. Dotzlaw, “What Is an Agent Harness, Really? Nine Components Most Builders Miss,” 2026. /insights/claude-code-01-agent-harness/