GitHub Copilot Agent Pipelines
Seven specialized Copilot agents that form a structured development workflow for a legacy Servoy enterprise application with 10,000+ functions, 1,000+ files, and 22 modules. Neo4j graph-powered code intelligence, cross-model orchestration, and a self-improving knowledge loop where every code review makes every agent smarter.
The Problem
Enterprise codebases accumulate knowledge that lives only in the heads of long-tenured developers. Our production application was a Servoy enterprise system with 10,000+ functions across 1,000+ files in 22 modules, backed by PostgreSQL. Over a decade of accumulated business logic runs deep: module interdependencies, data isolation patterns, validation rules, and API behaviors that are not in any public documentation.
Generic AI coding assistants fail in this environment. They generate syntactically valid code that breaks the platform’s foundational concepts, violates data isolation boundaries, or misses module-specific business rules that are not discoverable from file contents alone. The structural relationships between functions are invisible to tools that treat code as text. Who calls what, which modules are central to business logic, where complexity concentrates: none of this is available through keyword search.
Two problems compound each other: the codebase is too large for any model’s context window, and the business knowledge required to generate correct code is undocumented. A developer asking “add a new report to a business module” needs the AI to know which functions the module exposes, which data isolation patterns apply, which validation functions to call, and what the historical gotchas are for this specific domain. None of that is in the file being edited.
We built a system where AI agents are not generic assistants. They are specialists, each with a defined role, equipped with exactly the context they need, operating in a structured workflow where the output of one agent becomes the input of the next.
What We Built
Seven specialized GitHub Copilot agents organized around a core development workflow, backed by a Neo4j graph that indexes all 10,000+ functions and their relationships. Five agents form the main pipeline: researcher, architect, developer, reviewer, and documenter. Two supporting agents handle knowledge management: a skill-builder that captures domain knowledge into structured skill files, and a skill-auditor that enforces quality standards and proper registration for Copilot.

Figure 1 - The Seven-Agent System: Five agents form the core development workflow: researcher → architect → developer → reviewer → documenter. Two supporting agents manage knowledge: skill-builder creates and updates domain skills, skill-auditor validates them against strict guidelines. Each agent has an assigned model calibrated to its cognitive load.
| Before | After |
|---|---|
| Generic Copilot completions with no codebase context | 7 specialized agents with role-specific context |
| No structural understanding of 10,000+ functions | Neo4j graph indexes all functions and relationships |
| Business logic undocumented, lives in developer heads | 18 domain skills with self-improving knowledge base |
| Single model for all tasks regardless of complexity | 4 models: GPT-4.1, GPT-5.4, Claude Opus 4.6, Claude Sonnet 4.6 |
| New developers have no path to codebase knowledge | Skill-builder extracts and structures institutional knowledge |
Key Results
| Dimension | What Changed |
|---|---|
| Code quality | Reviewer agent catches platform violations, data isolation risks, and legacy compatibility issues that generic linters miss |
| Knowledge capture | Reviewer automatically harvests new business rules during reviews and creates or updates skill files |
| Structural analysis | Neo4j graph queries callers, callees, call trees, and complexity scores for any of 10,000+ functions |
| Self-improvement | Every code review feeds the knowledge loop: reviewer captures, skill-builder structures, skill-auditor validates |
| Session safety | Hooks fire on session start and pre-tool-use, warning about IDE file overwrite behaviors before they cause data loss |
| Cross-model efficiency | Fast tasks use GPT-4.1. Complex architecture uses GPT-5.4 or Claude Opus 4.6. Code generation uses Claude Sonnet 4.6 |
The Development Workflow
The development workflow is the core of the system. A researcher agent maps the relevant codebase territory first, querying the Neo4j graph to identify which functions are in scope, which modules have dependencies, and where complexity concentrates. That structural map passes as a handoff to the architect, who designs the implementation approach with full codebase context. The developer generates code following platform idioms and the architect’s plan. The reviewer validates against data isolation rules, legacy compatibility constraints, and accumulated business gotchas. The documenter produces structured output for the knowledge base.

Figure 2 - The Development Workflow: Researcher queries the Neo4j graph to map structural territory. Architect designs with full context. Developer generates code. Reviewer validates against platform rules and captures new knowledge. Documenter records decisions and outcomes. Each stage produces a structured handoff that the next agent consumes.
Handoff buttons make the workflow mechanical. When a researcher agent completes structural analysis, a button presents the user with a formatted handoff that opens the architect agent with the researcher’s findings already in context. No copy-paste, no context loss, no cognitive overhead of deciding what to carry forward.
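The handoff mechanism can be illustrated with a minimal Python sketch. The `Handoff` dataclass and its field names are hypothetical, invented for illustration; the actual payload shape in the system is not documented here. The point is only that the handoff is a structured object rendered into the next agent's opening context, not a copy-pasted blob.

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """Hypothetical shape of the structured payload one agent passes to the next."""
    from_agent: str
    to_agent: str
    summary: str
    findings: dict = field(default_factory=dict)

    def render(self) -> str:
        # Formatted block the receiving agent gets as opening context.
        lines = [f"## Handoff: {self.from_agent} -> {self.to_agent}", self.summary]
        lines += [f"- {k}: {v}" for k, v in self.findings.items()]
        return "\n".join(lines)

# Example: researcher hands its structural map to the architect.
handoff = Handoff(
    from_agent="researcher",
    to_agent="architect",
    summary="Scope mapped for the reporting change.",
    findings={"functions_in_scope": 14, "hot_module": "billing"},
)
```

Because the payload is structured rather than free text, the "no context loss" property falls out for free: the next agent always receives the same fields in the same order.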
Skill Management
The skill-builder and skill-auditor agents manage the knowledge layer that makes the whole system compound over time.
The skill-builder creates and updates domain skill files. It can query the Neo4j graph directly for deep domain context exploration, tracing call trees, identifying module boundaries, and understanding function relationships before packaging knowledge into structured skill files. Eighteen domain skills currently cover the application's critical modules with a progressive disclosure architecture, loading contextually based on the active development area.
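Progressive disclosure reduces to a selection problem: given the active development area, load only the skills whose declared scope matches. A minimal sketch, assuming a hypothetical registry where each skill declares the modules that trigger it (the skill names, module names, and `"*"` always-on convention below are all invented for illustration):

```python
# Hypothetical skill registry: each skill declares which modules trigger it.
SKILLS = {
    "billing-rules": {"modules": ["billing", "invoicing"]},
    "time-tracking": {"modules": ["timesheets"]},
    "data-isolation": {"modules": ["*"]},  # "*" = always loaded
}

def skills_for(active_module: str) -> list[str]:
    """Progressive disclosure: load only skills relevant to the active area."""
    return sorted(
        name for name, meta in SKILLS.items()
        if "*" in meta["modules"] or active_module in meta["modules"]
    )
```

Working on the billing module loads `billing-rules` plus the always-on `data-isolation` skill; the other fifteen-plus skills stay out of context.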

Figure 3 - Skill Management: Skill-builder creates and updates domain skills, querying the Neo4j graph for deep context when needed. Skill-auditor validates that every skill follows strict guidelines: YAML frontmatter, XML format, proper registration for Copilot, and content accuracy.
The skill-auditor enforces strict quality standards. Every skill file must have proper YAML frontmatter, use XML format for structured content, and be registered correctly for Copilot to load it. The auditor checks for internal consistency, completeness against the domain it covers, and alignment with what the codebase actually does.
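The frontmatter portion of the audit is mechanical enough to sketch. This is a simplified Python illustration, not the auditor's actual implementation; the required key set is an assumption (the real guidelines may require more fields), and the consistency and accuracy checks described above are far beyond what a parser can do.

```python
REQUIRED_KEYS = {"name", "description"}  # assumed minimum; the real list may differ

def audit_skill(text: str) -> list[str]:
    """Return a list of problems; an empty list means the file passes this check."""
    problems = []
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing YAML frontmatter opening '---'"]
    try:
        end = lines[1:].index("---") + 1  # line index of the closing '---'
    except ValueError:
        return ["frontmatter never closed with '---'"]
    # Collect top-level keys from the frontmatter block.
    keys = {l.split(":", 1)[0].strip() for l in lines[1:end] if ":" in l}
    for key in sorted(REQUIRED_KEYS - keys):
        problems.append(f"frontmatter missing required key: {key}")
    return problems
```

A skill file that opens with `---`, declares `name` and `description`, and closes the frontmatter passes; anything else produces an actionable problem list the skill-builder can act on.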
Self-Improving Knowledge Loop
The self-improving knowledge loop closes the gap between discovering new business rules and having them available for future work. During every code review, the reviewer agent does two things: it validates the code under review, and it identifies any business rule, gotcha, or pattern that is not yet captured in the existing skills. When it finds something new, it either creates a new skill or updates an existing one.

Figure 4 - The Self-Improving Knowledge Loop: Reviewer harvests new gotchas during code reviews. New findings feed into skill creation or updates. Skill-auditor verifies that updated skills meet quality standards and do not contain contradictions. Each review cycle makes every agent in the system slightly smarter.
This loop runs without additional developer effort. The act of reviewing code automatically improves the knowledge available for the next development task.
Neo4j Code Graph
The Neo4j graph is the structural foundation of the system. It indexes all 10,000+ functions across 1,000+ files into a graph database, with edges representing caller-callee relationships. Any agent can query which functions call a given function, which functions that function calls, what the full call tree looks like, and which functions have complexity scores above a threshold.

Figure 5 - Neo4j Graph Architecture: 10,000+ functions indexed with caller-callee relationships. Query types include direct callers, direct callees, full call trees at configurable depth, complexity analysis, and module-level structural summaries. The researcher agent uses this to map scope before any code is written. The skill-builder uses it for deep domain context exploration.
This transforms how agents approach unfamiliar territory. Instead of reading every file to understand scope, a researcher can query the graph to find the 20 most-called functions in a module, identify functions with cyclomatic complexity above 15, or trace the call tree from an entry point four levels deep. Structural questions that would require reading hundreds of files become single graph queries.
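The graph queries above can be sketched against a toy in-memory call graph. The function names and edges below are invented; in the real system these queries run as Cypher against Neo4j (roughly of the form `MATCH (f:Function {name: $fn})-[:CALLS*1..4]->(g) RETURN DISTINCT g` for a depth-limited call tree), but the shape of the answers is the same.

```python
from collections import defaultdict

# Toy stand-in for the Neo4j index: each edge is caller -> callee.
EDGES = [
    ("createInvoice", "validateCustomer"),
    ("createInvoice", "applyTaxRules"),
    ("applyTaxRules", "lookupTaxRate"),
    ("monthlyBilling", "createInvoice"),
]
callees = defaultdict(set)  # who does fn call?
callers = defaultdict(set)  # who calls fn?
for src, dst in EDGES:
    callees[src].add(dst)
    callers[dst].add(src)

def call_tree(fn: str, depth: int) -> set[str]:
    """Everything reachable from fn within `depth` call hops."""
    seen, frontier = set(), {fn}
    for _ in range(depth):
        frontier = {c for f in frontier for c in callees[f]} - seen
        seen |= frontier
    return seen
```

One dictionary lookup answers "who calls `createInvoice`?"; one breadth-first walk answers "what does `monthlyBilling` transitively depend on, four levels deep?" — exactly the questions that would otherwise require reading hundreds of files.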
Cross-Model Orchestration
Different cognitive tasks have different cost-performance profiles. The system assigns models to agents based on what each agent actually does.

Figure 6 - Cross-Model Orchestration: GPT-4.1 handles fast research and documentation where speed matters more than depth. GPT-5.4 handles architectural design and complex multi-constraint reasoning. Claude Opus 4.6 handles architecture, where deep reasoning across dozens of constraints is critical. Claude Sonnet 4.6 handles code generation, code review, skill building, and skill auditing, covering the highest-volume tasks where consistent quality and instruction-following matter most.
GPT-4.1 handles fast research tasks: initial exploration, rapid context gathering, and first-pass analysis. The researcher and documenter agents run on GPT-4.1 because speed matters more than depth at those stages. GPT-5.4 handles complex architectural reasoning where constraint satisfaction across dozens of requirements demands the strongest available model. Claude Opus 4.6 handles architecture, providing deep reasoning for design decisions that span multiple modules. Claude Sonnet 4.6 generates and reviews code, builds skills, and audits them, covering the highest-volume tasks in the workflow.
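The assignment reduces to a static routing table. A minimal sketch of the mapping as described above (the model identifiers are taken verbatim from this document; the dictionary itself is an illustration, not the system's configuration format):

```python
# Agent -> model routing, as described in the text above.
MODEL_FOR_AGENT = {
    "researcher": "GPT-4.1",          # speed over depth
    "documenter": "GPT-4.1",
    "architect": "Claude Opus 4.6",   # deep multi-module design reasoning
    "developer": "Claude Sonnet 4.6", # highest-volume generation work
    "reviewer": "Claude Sonnet 4.6",
    "skill-builder": "Claude Sonnet 4.6",
    "skill-auditor": "Claude Sonnet 4.6",
}

def model_for(agent: str) -> str:
    """Look up the model calibrated to an agent's cognitive load."""
    return MODEL_FOR_AGENT[agent]
```

Keeping the routing declarative means a model upgrade is a one-line change per agent rather than an edit scattered across prompts.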
The Target
The target was a Servoy enterprise application with 22 modules covering project management, billing, time tracking, resource scheduling, and reporting, backed by PostgreSQL. The application had 10,000+ functions across 1,000+ files and over a decade of accumulated business logic. It has since been replaced by a commercial product, but the patterns and architecture we built to work with it are fully transferable.
The architectural constraints that make Servoy challenging for AI include server-side JavaScript with no modern syntax, Servoy’s proprietary API for database access and UI construction, data isolation enforced by application-layer patterns rather than database constraints, and module interdependencies that are structural (in the call graph) rather than declared (in import statements). A code generator that does not know these constraints produces code that looks correct and fails at runtime.
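Application-layer data isolation is worth a concrete illustration, because it is the constraint generated code most often violates. The sketch below is entirely hypothetical (table, column, and helper names are invented, and it builds SQL by string formatting purely for readability); the point is the pattern: the isolation boundary lives in a code-level helper, so a generated query that bypasses the helper looks correct and silently leaks rows across organizations.

```python
# Hypothetical application-layer isolation: every query helper appends the
# organization filter itself, because no database constraint enforces it.
def scoped_query(table: str, where: str, org_id: int) -> str:
    base = f"SELECT * FROM {table} WHERE {where}"
    # The isolation boundary is this line of application code. Generated
    # code that issues its own SELECT never hits it -- and never fails a test
    # until the wrong customer sees the wrong data in production.
    return f"{base} AND organization_id = {org_id}"
```

This is exactly the class of rule the reviewer agent checks for and the skills encode: invisible in any single file, fatal at runtime.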
Technologies
| Layer | Technology |
|---|---|
| Agent Runtime | GitHub Copilot (VS Code) with custom agent definitions |
| Code Intelligence | Neo4j graph database indexing 10,000+ functions |
| Models | GPT-4.1, GPT-5.4, Claude Opus 4.6, Claude Sonnet 4.6 |
| Target Codebase | Servoy (JavaScript) + PostgreSQL |
| Skills | Progressive disclosure skill files, 18 domain areas |
| Hooks | Session-start and pre-tool-use VS Code hooks |
| Slash Commands | 9 prompts: /workflow, /session-end, /capture-knowledge, /check-todos, /add-to-todos, /code-review, /jira-comment, /security-review, /whats-next |
| CI/CD | GitHub Actions, 7 workflow files |
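The session-start and pre-tool-use hooks in the table above follow an ordinary event-dispatch pattern. The real wiring is VS Code/Copilot configuration; this Python sketch only shows the pattern, and the event names, context keys, and warning text are illustrative assumptions.

```python
# Minimal hook dispatcher: handlers registered per event run before the
# tool call proceeds and can return a warning for the user.
HOOKS = {"session-start": [], "pre-tool-use": []}

def on(event):
    def register(fn):
        HOOKS[event].append(fn)
        return fn
    return register

@on("pre-tool-use")
def warn_on_overwrite(ctx):
    # Hypothetical check mirroring the IDE file-overwrite warning.
    if ctx.get("tool") == "write_file" and ctx.get("file_exists"):
        return "warning: IDE may overwrite unsaved editor changes in this file"

def fire(event, ctx):
    """Run every hook for the event; collect any non-empty warnings."""
    return [msg for hook in HOOKS[event] if (msg := hook(ctx))]
```

Firing `pre-tool-use` before a write to an existing file surfaces the warning before any data is lost, which is the whole job of the session-safety layer.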
The Article Series
This project is documented in a 5-part series covering the agent architecture, development workflow, Neo4j graph, knowledge system, and lessons for enterprise AI adoption:
Part 1: Beyond Code Completion. Why generic AI assistants fail in enterprise codebases, and what it takes to make them useful. The architectural decisions behind 7 specialized agents versus one general assistant.
Part 2: The Development Workflow. How the workflow operates in practice. Handoff mechanics, agent role definitions, model assignment rationale, and what structured AI development looks like compared to ad-hoc prompting.
Part 3: Neo4j Code Graph. How the Neo4j graph works, what kinds of structural questions it answers, and why graph-based code intelligence changes what AI agents can do with a large codebase.
Part 4: The Knowledge Flywheel. How domain skills are built, structured, and loaded contextually. The self-improving loop that turns every code review into a knowledge update, the skill-builder's role in capturing domain context, and the skill-auditor's enforcement of quality standards.
Part 5: Enterprise AI Lessons. What worked, what failed, and what transferable patterns emerged. Cross-model orchestration tradeoffs, the workflow handoff pattern, knowledge capture as a discipline, and the organizational conditions that make this investment pay off.
The throughline: 10,000+ functions is a knowledge organization problem, not a context window problem. The agents are specialists because the domain demands specialists. The Neo4j graph exists because structure cannot be recovered from text search. The self-improving skills exist because institutional knowledge, once extracted, should never have to be extracted twice.