Six months ago, a developer on our team discovered that deleting a parent record silently cascades through three child tables if you do not check for active references first. Recovering the data burned two hours. A month later, a different developer hit the same cascade. Same codebase, same gotcha, same hours lost. The knowledge existed after the first incident. It just had nowhere to live.

Figure 1 - The Knowledge Flywheel: Four stages, each one feeding the next. Code reviews surface knowledge. Knowledge gets captured in skills. Skills make every agent smarter. Better agents write better code. The loop is self-reinforcing, and each rotation compounds the last.
There is a universal failure pattern in software documentation: it starts out accurate, and then the code moves on without it. Wikis age. READMEs drift. Architecture decision records accumulate dust. The people who wrote the documentation move on, and the people who inherit the system learn through rediscovery. Expensive, repetitive, invisible rediscovery.
AI coding tools inherit this problem. You invest time writing a context document, or a skill file, or an agent instruction set. It is accurate the day you write it. Then your team ships twenty features, fixes forty bugs, and discovers a dozen gotchas that are not in the skill. The skill stays the same. The gap between what the skill knows and what the codebase needs grows in silence.
We built a system that fights this by wiring the output of code reviews directly into the input of skill updates. Every review is also a knowledge harvest. Every harvest feeds a skill. Every skill update makes every agent that uses it marginally smarter. The flywheel does not require a separate knowledge management effort. It is built into the work that already happens.
The Knowledge Decay Problem
Static documentation decays because updating it is friction. A developer finds a gotcha, fixes the bug, and moves on. Writing up the discovery in a wiki page requires switching contexts, finding the right page, editing, saving, cross-referencing. Most of the time, the mental note gets made and the wiki does not.
AI skill files are worse. They are code, which means editing them feels like a maintenance task that competes with feature work. The instinct is to write a good skill once, deploy it, and treat it as complete. But a skill written once is already starting to decay the day after you ship it.
The cost of this decay is invisible until it compounds. A skill that does not warn about the silent cascade delete will let an agent generate unguarded delete operations indefinitely. A skill that does not document the ORM save method’s silent failure pattern will let agents skip return value checks in every function they touch. The mistakes do not cluster. They scatter across the codebase in ways that are hard to trace back to a missing three lines in a skill file.
What we needed was a mechanism that turns the act of doing the work into the act of updating the knowledge, with no extra step.

Figure 2 - Decay vs Growth: Static documentation decays because updating it is friction. Self-improving skills grow because the update mechanism is built into the code review process. The difference compounds over time.
KEY INSIGHT: The failure mode of a static skill file is not that it becomes wrong. It is that it stays correct for its initial scope while the codebase evolves around it. Correctness and completeness diverge silently, and agents make decisions with confidently outdated context.
Knowledge Harvest During Code Reviews
The reviewer agent has a standard output format. Near the end of every review report, after the critical findings and security analysis, there is a section called Knowledge Harvest.
The harvest section has three possible states. If the review surfaces something new (a gotcha that caught the developer, a business rule that was clarified, an anti-pattern that appeared in the submitted code) the reviewer lists it under “Existing Skills to Update” or “New Skill Candidates” with specific details about what to add and where. If the review was routine and everything matched what existing skills already documented, the reviewer writes “No New Knowledge” instead.
This is not an optional section. Every review has it. The reviewer is prompted to think explicitly about whether anything in this review represents knowledge that should survive the session.
When there is something to capture, the reviewer also saves a harvest note to copilot/knowledge-harvest/. The note records the target skill, the type of knowledge (gotcha, business rule, anti-pattern, function reference), the specific detail to capture, and the source including JIRA ticket number, review date, and file reviewed.
The harvest note is the handoff artifact. It sits in the directory until the skill-builder agent processes it.
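As a sketch of what that handoff artifact might contain, here is one way to model it. The field names, file naming, and Markdown layout are illustrative assumptions, not the reviewer agent's exact output format:

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path

# Hypothetical shape of a harvest note; the real reviewer output may differ.
@dataclass
class HarvestNote:
    target_skill: str     # which skill file the knowledge belongs in
    knowledge_type: str   # gotcha | business rule | anti-pattern | function reference
    detail: str           # the specific item to add, ready to apply
    jira_ticket: str      # source ticket for traceability
    file_reviewed: str
    review_date: date

    def to_markdown(self) -> str:
        return (
            f"# Harvest: {self.target_skill}\n\n"
            f"- Type: {self.knowledge_type}\n"
            f"- Detail: {self.detail}\n"
            f"- Source: {self.jira_ticket}, {self.review_date}, {self.file_reviewed}\n"
        )

def save_note(note: HarvestNote, harvest_dir: Path) -> Path:
    """Write the note where the skill-builder will find it."""
    harvest_dir.mkdir(parents=True, exist_ok=True)
    path = harvest_dir / f"{note.review_date}-{note.target_skill}.md"
    path.write_text(note.to_markdown(), encoding="utf-8")
    return path
```

The point of the structure is that every field the skill-builder needs to act is present; nothing requires going back to the review conversation.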

Figure 3 - Harvest in Action: The Knowledge Harvest section appears at the bottom of every code review. When new knowledge is found, the reviewer writes a harvest note to copilot/knowledge-harvest/. That note is the trigger for the skill update process.
The prompting matters here. The reviewer is not asked to “update documentation” because that triggers the instinct to defer. It is asked to answer three specific questions: Did you find a gotcha that a developer fell into? Did you discover a business rule that no existing skill documents? Did the review surface a utility function that agents should know about? Specific questions produce specific outputs. The harvest notes that come out are concrete enough to apply without interpretation.
The Skill Update Process
Once a harvest note exists in copilot/knowledge-harvest/, the skill-builder agent handles the integration. The process has a deliberate constraint: surgical additions only. The skill-builder does not rewrite skills. It does not reorganize them. It does not improve prose or consolidate sections. It finds the correct tag in the skill’s structure, adds the new item following the existing format and numbering, and leaves everything else untouched.
This constraint is intentional. Rewriting a skill is a high-stakes operation that can silently remove working knowledge. Adding a numbered item to a gotchas section is low-stakes and auditable. The skill-builder is optimized for the low-stakes operation and refuses to do the high-stakes one.
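A surgical addition can be sketched as a pure append operation. This assumes a skill layout where gotchas live under a "## Gotchas" heading as a numbered list; the real tag structure the skill-builder targets may differ:

```python
import re

def append_gotcha(skill_text: str, new_gotcha: str) -> str:
    """Append one numbered item to the Gotchas section, touching nothing else.

    Assumes a '## Gotchas' heading followed by a numbered list; this is an
    illustrative format, not the actual skill tag structure.
    """
    lines = skill_text.splitlines()
    # Locate the Gotchas heading.
    start = next(i for i, line in enumerate(lines)
                 if line.strip().lower() == "## gotchas")
    insert_at, last_num = start + 1, 0
    # Walk to the last numbered item before the next section begins.
    for i in range(start + 1, len(lines)):
        if lines[i].startswith("## "):
            break
        m = re.match(r"(\d+)\.\s", lines[i])
        if m:
            last_num = int(m.group(1))
            insert_at = i + 1
    lines.insert(insert_at, f"{last_num + 1}. {new_gotcha}")
    return "\n".join(lines)
```

Everything outside the inserted line is byte-for-byte identical to the input, which is exactly what makes the operation auditable in a diff.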
After applying the addition, the skill-builder hands off to the skill-auditor for verification. The auditor checks structural correctness (valid tags, skill under 500 lines, proper formatting conventions) and verifies that the skill is registered in the instructions file that controls automatic loading for Copilot.
After audit, the harvest note is renamed with a _done- prefix. This is the process’s completion marker. Any file in copilot/knowledge-harvest/ without that prefix is unprocessed knowledge waiting to be integrated.
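The done-marker convention is simple enough to sketch directly. The directory and prefix come from the article; the scanning logic is one reasonable implementation, not the actual one:

```python
from pathlib import Path

DONE_PREFIX = "_done-"

def pending_notes(harvest_dir: Path) -> list[Path]:
    """Any note without the done prefix is unprocessed knowledge."""
    return sorted(
        p for p in harvest_dir.glob("*.md")
        if not p.name.startswith(DONE_PREFIX)
    )

def mark_done(note: Path) -> Path:
    """Rename the note so the next scan skips it."""
    return note.rename(note.with_name(DONE_PREFIX + note.name))
```

Because the marker is part of the filename, a plain directory listing doubles as the backlog view: anything without the prefix is work waiting for the skill-builder.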

Figure 4 - The Skill Update Process: Three stages, each with a clear responsibility. Skill-builder reads and applies. Skill-auditor verifies. Done marker closes the loop. The process is auditable at every step.
KEY INSIGHT: The separation between skill-builder and skill-auditor is not overhead. It is the mechanism that prevents skill corruption. An agent that both writes and verifies its own work has no error correction. The handoff creates a genuine second opinion.
New Skill Candidates
Not every knowledge harvest item belongs in an existing skill. Some code reviews uncover areas that have no coverage at all.
The reviewer agent flags these explicitly under “New Skill Candidates” in the harvest section. The flag format specifies the proposed skill name, why the area needs coverage, which entry points were touched, and which patterns were discovered.
A new skill candidate does not automatically become a skill. The flag is a recommendation. Acting on it requires the skill-builder to query the Neo4j graph for deep domain context, trace call trees and module boundaries for the uncovered area, and package the knowledge into a structured skill file. The skill-auditor then validates the result against the same strict standards applied to every other skill.
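Tracing call trees against the graph might look like the following query builder. The Function label, CALLS relationship, and property names are assumptions about the schema for illustration, not its actual shape:

```python
def call_tree_query(entry_point: str, depth: int = 3) -> tuple[str, dict]:
    """Build a Cypher query tracing callees from an entry-point function.

    Labels, relationship types, and properties here are hypothetical;
    the real Neo4j code graph may use different names.
    """
    query = (
        f"MATCH (root:Function {{name: $name}})"
        f"-[:CALLS*1..{depth}]->(callee:Function) "
        f"RETURN DISTINCT callee.name AS function, callee.module AS module"
    )
    return query, {"name": entry_point}
```

The skill-builder would run a query like this for each entry point the flag names, then fold the resulting functions and module boundaries into the new skill's context section.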
The failure story we started this article with, the cascade delete discovered twice, would have been caught by this mechanism. The first time the reviewer found it, the flag would have been raised. The skill would have been updated. The second developer would have had an agent that warned about it before writing the code.

Figure 5 - New Skill Candidate Flag: When a code review uncovers a domain with no skill coverage, the reviewer raises a flag with enough context for the skill-builder to create the skill, using the Neo4j graph for deep domain context when needed.
Slash Commands: The Operator Interface
The self-improving loop runs inside a broader system of slash commands that make the methodology repeatable. These are prompt files in .github/prompts/ that invoke specific agents or workflows with a single command.
The full catalog of nine commands:
| Command | Purpose |
|---|---|
| /workflow | Pipeline selector that guides the developer through research, plan, implement, review, and document for any task |
| /session-end | End-of-session handoff that saves a structured summary of what changed, current state, and next steps to copilot/docs/ |
| /capture-knowledge | Manual harvest that reviews the conversation history, identifies new knowledge, and writes harvest notes without a code review triggering it |
| /check-todos | Reads the current TO-DOS.md tracking file and reports open items |
| /add-to-todos | Appends new items to TO-DOS.md in the standard format |
| /code-review | Invokes the reviewer agent with the full checklist covering security, correctness, data isolation, platform-specific risks, and knowledge harvest |
| /jira-comment | Generates a structured JIRA comment from session context covering root cause, resolution, testing, and release details |
| /security-review | Focused security pass covering SQL injection, data isolation, hardcoded secrets, and OWASP LLM risks |
| /whats-next | Analyzes project state and writes a prioritized handoff document for the next session |
The slash commands are the interface between the methodology and the daily workflow. A developer who never reads the agent definitions can still run the full workflow by knowing nine commands. The methodology becomes a set of habits rather than a set of documentation pages to remember.
The /capture-knowledge command is particularly important for the self-improving loop. It runs the harvest logic on demand, at the end of any session, not just code review sessions. An implementation session that discovered three new business rules should trigger a harvest. So should a debugging session that traced a data isolation issue. The manual harvest command means the loop does not depend exclusively on formal code reviews to run.

Figure 6 - The Slash Command Catalog: Nine commands cover the full development workflow. Each invokes a specific agent or process. The methodology becomes accessible without requiring developers to read agent definition files.
Hooks as Guardrails
Two hooks run automatically to keep the system safe and oriented.
The sessionStart hook fires at the beginning of every Copilot session. It delivers a safety reminder about never modifying source files while the IDE has the solution loaded, because the platform holds write locks and any external edit will be silently lost. The hook also presents a workflow prompt that asks what the developer is working on and recommends the appropriate starting point. A developer who answers “implementing a new report” gets steered toward /workflow and the full development sequence.
The preToolUse hook fires before any file is edited. When the target file is a platform source file in the forms, globals, or scopes directories, the hook intercepts and warns: is the IDE running? If the solution is loaded, editing the file will create a conflict. The warning requires an explicit confirmation before proceeding.
These hooks are not workflow enforcement. They never reject an action outright; at most they pause for confirmation. They are awareness mechanisms. The target state is that developers never lose work because they forgot the IDE constraint, and never start a feature without knowing which workflow step fits the task.
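The preToolUse check reduces to a path test. The protected directory names come from the article; the matching logic here is a guess at one reasonable implementation:

```python
from pathlib import Path

# Directories the platform may hold write locks on while the solution
# is loaded in the IDE; names taken from the article, layout is
# project-specific.
PROTECTED_DIRS = {"forms", "globals", "scopes"}

def needs_ide_warning(target: Path) -> bool:
    """True when an edit should pause for 'is the IDE running?' confirmation."""
    return any(part in PROTECTED_DIRS for part in target.parts)
```

A hook built on a predicate like this stays cheap enough to run before every file edit without slowing the session down.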

Figure 7 - Hooks as Guardrails: SessionStart orients developers to the workflow. PreToolUse catches the most expensive mistake, which is editing a file while the platform holds a write lock. Both hooks inform rather than block, preserving developer autonomy.
The 18 Skills and the Coverage Map
The system currently has eighteen domain skills. Each one covers a specific area of the codebase and loads automatically when an agent’s context includes relevant files or function names.
| Skill Category | What It Covers |
|---|---|
| Database access patterns | The application’s data access layer: safe query construction, parameterization, return value handling |
| UI component patterns | Grid configuration, form behavior, component lifecycle management |
| Visualization and charting | Chart rendering pipeline, data binding, display configuration |
| Internationalization | Multi-language support, translation resolution, display text handling |
| Automation scripts | Build and deployment scripting conventions, encoding requirements, execution context |
| Domain-specific logic | Complex business rule domains covering project scheduling, cost tracking, and workflow orchestration |
| Query construction | Programmatic query building patterns, joins, parameterization |
| Calculation pipelines | Multi-step calculation flows covering cost rollups, pricing, and rate handling |
| Workflow orchestration | Multi-stage business processes covering state transitions, validation gates, and locked data handling |
Each skill uses a progressive disclosure structure: the objective explains what the domain does and where it starts. The quick start gives immediately actionable steps for the most common operations. The context section provides the deeper knowledge including key functions, data model, gotchas, and business rules. Reference guides link to supporting files for detailed function tables and table schemas.
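Skeletonized, a skill following that structure might look like this. The heading names paraphrase the article's description; the actual tag vocabulary the skills use may differ:

```markdown
# Skill: database-access

## Objective
What the data access layer does and where it starts.

## Quick Start
Immediately actionable steps for the most common operations
(parameterized query construction, guarded save).

## Context
Key functions, data model, gotchas (numbered), business rules (numbered).

## Reference Guides
Links to detailed function tables and table schemas.
```

The ordering is the point: an agent that only reads the top gets oriented, and an agent that reads to the bottom gets the accumulated depth.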
An agent that loads the database access skill before writing a data access function has the full parameterization pattern, the return value check requirement for save operations, the data isolation scoping convention, and the list of legacy function names to refactor away from. It does not need to discover any of that. It starts with it.
The skills are not static descriptions of the codebase as it was on the day the skill was written. They are living documents that grow through the knowledge flywheel. Every review that catches a missed pattern in a domain is a potential addition. Every bug trace that reveals an undocumented gotcha is a potential addition. The eighteen skills represent eighteen domains where the flywheel has been running long enough to accumulate real knowledge density.

Figure 8 - Skill Coverage: Eighteen skills across the application’s critical domains. Bar heights represent knowledge density: gotchas, business rules, and anti-patterns accumulated through the flywheel. The most-reviewed domains have the richest skills. Skills with recent additions are marked.
The Compounding Effect
The value of this system is not any single code review or skill update. It is the accumulation over time.
After the first review in a domain catches a gotcha and adds it to the skill, every subsequent agent working in that domain starts with that warning. The gotcha does not get rediscovered. The second review in the domain might catch a different gotcha, or confirm that the first one is now handled correctly, or surface a business rule that was invisible until the second time someone needed it. Each rotation of the flywheel adds density to the skills and reliability to the agents.
The compounding is asymmetric. A skill with no gotchas documented will let agents make the same mistakes repeatedly. A skill with twelve gotchas documented, accumulated over six months of reviews, essentially inoculates agents against the twelve most common failure modes in that domain. The twelfth review is more valuable than the first, because the agent it produces has twelve more warnings than the agent that triggered the first review.
This is the structural advantage over static documentation. Static documentation has a one-time improvement curve that decays toward irrelevance. The flywheel has a compounding improvement curve that accelerates as reviews accumulate. The system does not fight documentation decay. It replaces the decay mechanism with a growth mechanism.
KEY INSIGHT: Every domain skill is a bet that the codebase’s most important failure modes are worth documenting. The flywheel converts that bet into an investment: each rotation pays dividends on every future agent interaction in that domain. The earlier you start the flywheel in a domain, the larger the compounding base.
The practical evidence in this codebase: agents working in the project scheduling domain, one of the earliest and most mature skills, generate code that handles the schedule lifecycle correctly on the first pass because the skill documents the initialization sequence, the persistence pattern, and the three most common mistakes in that domain. Agents working in domains without skill coverage still require multiple correction passes for the same classes of mistakes.
KEY INSIGHT: The knowledge flywheel does not require a dedicated knowledge management effort. It runs inside the work that already happens. Code reviews happen. Harvest notes are a section of the review output. Skill updates take minutes when the change is surgical. The total overhead per review is under ten minutes. The return is a permanent improvement to every future agent interaction in that domain.

Figure 9 - Compounding Returns: Without a skill, agents make the same mistakes at a consistent rate. With the flywheel running, each review adds density to the skill. Agent accuracy in the domain compounds with the skill’s knowledge base.
The Failure Story
Before we built this system, we had domain knowledge capture in a form we thought was good enough: a set of Markdown documents that described each major area of the codebase. We wrote them carefully. We reviewed them. We gave them to agents as context.
They were accurate for about three weeks.
The first edge case that invalidated them came from a bug fix session where we discovered that the billing module’s rate calculation pipeline has a locked pricing path that activates only during estimate-to-invoice transitions, not during active project edits. The Markdown document described the rate calculations without that distinction. It was not wrong; it was just incomplete. An agent reading it would not know to check the transition flag before selecting which rate to use.
We updated the document. Then, two months later, a code review surfaced another constraint in the same area: when a rate calculation crosses a fiscal period boundary, the locked pricing logic has an exception for specific billing types. The document was incomplete again. We updated it again.
The pattern repeated. Each update was isolated, done when the pain was fresh, and forgotten when the next task demanded attention. The document was always catching up to the codebase, never ahead of it.
What we actually needed was a mechanism that made updates a side effect of reviews, not a separate task. The harvest section in the code review output is that mechanism. We do not remember to update the skill. The review process generates the harvest note as a natural output. The skill-builder applies it. The skill-auditor verifies it. The flywheel turns.
What Comes Next
This article has covered the knowledge layer, the system that makes agents smarter over time. The flywheel is the compounding mechanism that separates a methodology you invest in once from a methodology that grows with every use.
The final article in this series zooms out. After building seven agents, a Neo4j code graph with 10,000+ indexed functions, and eighteen domain skills that self-improve through code reviews, what did we actually learn? What would we do differently from the start? And what are the transferable principles for teams working in large codebases?
Article 5 covers those questions. Come back for it.
The Series
This is Part 4 of a 5-part series on building an AI development methodology with GitHub Copilot:
- Beyond Code Completion. The enterprise AI gap and why agent mode changes everything
- The Development Workflow. How seven agents turn a ticket into reviewed code
- Neo4j Code Graph. How a code graph database makes AI agents understand your codebase
- The Knowledge Flywheel (this article). How code reviews feed a self-improving knowledge loop
- Enterprise AI Lessons. What building an AI methodology taught us about enterprise software