Closing the Loop: How Adversarial Testing Improved the Framework That Built It
We built a framework that generates Claude Code infrastructure. We used it to build a project. We attacked that project with adversarial AI agents — 27 attacks across 2 rounds. Then we took the 10 hardest lessons and backported them into the framework itself. 7 new files, 13 modifications, 4 commits. Every future project now inherits defenses we only learned by breaking things.

Figure 1 - The Improvement Spiral: The Bootstrap Framework generates infrastructure for a target project. Adversarial testing attacks that project and finds gaps. 10 lessons extracted from the findings flow back into the framework. The next project inherits all 10 defenses automatically. This is one turn of a continuous spiral.
Two article series ran in parallel over the past month. The Bootstrap Framework series documented building a meta-framework — an agent swarm that generates Claude Code infrastructure for any project. The Adversarial Testing series documented attacking one of those generated projects with AI red team agents — 27 attacks, severity-weighted scoring, OWASP classification.
Both series ended with forward-looking conclusions. The framework series said “the next migration will be faster.” The adversarial series said “the results feed back into the framework.” Neither showed the feedback actually happening.
This article documents the closure. 10 lessons from adversarial testing became 7 new files and 13 modifications across 4 phases. The backport took 2 hours. Every future project generated by the framework now inherits defenses that did not exist 48 hours ago.
The thesis is simple: security is not a feature you add. It is a loop you run.
The Three-Step Evolution
The framework’s security story unfolded in three distinct phases, each building on the last.

Figure 2 - Three Phases of Security Evolution: Phase 1 added theoretical security infrastructure (17 hook templates, OWASP coverage). Phase 2 tested it empirically (27 attacks, ASR 65% to 47%). Phase 3 fed lessons back into the framework (7 new files, 13 modifications). Each phase produced measurable artifacts.
Phase 1: Security Hardening (Theory). Round 3 of framework development added security infrastructure from first principles. We mapped the OWASP Top 10 for Agentic Applications, built 17 hook templates, created 2 JSON schemas for inter-agent validation, designed a 3-tier trajectory monitoring system, and wrote per-archetype security patterns for all 7 project types. The framework went from 11 hook templates to 17, the pipeline from 11 steps to 12. This was documented in Part 3 of the Bootstrap Framework series.
Phase 2: Adversarial Exercises (Testing). We pointed AI red team agents at obsidian-youtube-agent — a real project built using the framework. Round 1: 10 attacks, 65% attack success rate (ASR), severity CRITICAL. Round 2: 17 attacks, 47% ASR, severity HIGH. The two-wave methodology in Round 2 revealed the core insight: regression ASR dropped to 20% (patches work), but escalation ASR hit 85.7% (architecture doesn’t). This was documented across four articles.
Phase 3: Backport (Improvement). 10 lessons from adversarial testing became concrete framework changes. 7 new files created. 13 existing files modified. 4 commits across 4 phases. Completed in a single session.
The three phases form a cycle. Theory proposes defenses. Testing validates them. Results improve the theory. The cycle repeats.
KEY INSIGHT: Security infrastructure built from theory alone has blind spots. Adversarial testing reveals them empirically. The value is not in either step alone — it is in completing the loop so the framework learns from its own failures.
10 Lessons, 10 Improvements
Each lesson traces from a specific adversarial finding to a specific framework change. No lesson is theoretical. Every one has a source attack, a confirmed gap, and a committed fix.

Figure 3 - The Pattern Gap: Before centralization (left), the HTTP middleware had 8 injection detection patterns while the chat sanitizer had 19. The 11-pattern gap became the primary attack surface — the attacker enumerated which patterns existed in one layer but not the other. After centralization (right), both layers import from security_patterns.py with 30 compiled regex patterns. The gap is structurally impossible.
| ID | Lesson | Source Finding | Framework Change |
|---|---|---|---|
| T1-1 | Normalize input before regex matching | ATK-N03: Unicode zero-width bypass | Enhanced pretooluse_input_sanitization.py with NFKC normalization |
| T1-2 | Centralize security patterns | ATK-N05: 11-pattern gap between layers | New security_patterns.py module (v2.0.0, 30 compiled regex) |
| T1-3 | HTTP middleware needed — hooks can’t inspect payloads | ATK-006: Hook evasion via HTTP | New templates/http-middleware/ directory |
| T1-4 | Scoring needs partial weighting | Scoring calibration v1.0 to v1.1 | New templates/scoring/ with weight redistribution |
| T1-5 | Pre-flight checks prevent wasted time | Round 1 model 404 errors | New session_preflight_check.py in bootstrap pipeline |
| T2-1 | Two-wave testing methodology | Regression 20% vs Escalation 85.7% | New skill reference doc for security review |
| T2-2 | Inconsistent validation is worse than none | ATK-002: Path traversal on unvalidated endpoint | Updated validation checklist with boundary consistency checks |
| T2-3 | Agents need turn budgets | Monitor agent lost work at turn 40 | New skill reference doc for agent team design |
| T2-4 | OWASP Agentic Top 10 is the right taxonomy | 9 of 10 categories tested across 2 rounds | Updated OWASP patterns in security review skill |
| T2-5 | Trust model mismatch IS the attack surface | Target built for localhost, tested adversarially | Updated schema and threat model template |
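Lesson T1-1 can be made concrete with a short sketch. This is a hypothetical simplification of what an input-sanitization hook does after the fix, not the framework's actual `pretooluse_input_sanitization.py`; the pattern list and function name are illustrative:

```python
import re
import unicodedata

# Hypothetical subset of zero-width code points an attacker can use to
# split a keyword like "eval" into "ev\u200bal" and dodge a regex.
ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")

SUSPICIOUS = re.compile(r"(eval|exec|subprocess)", re.IGNORECASE)

def normalize_then_match(raw: str) -> bool:
    """Return True if the input looks like an injection attempt.

    Order matters: NFKC folds compatibility characters (e.g. fullwidth
    'eval') and the zero-width strip rejoins split keywords BEFORE any
    pattern matching runs.
    """
    text = unicodedata.normalize("NFKC", raw)
    text = ZERO_WIDTH.sub("", text)
    return bool(SUSPICIOUS.search(text))

# A payload in the spirit of ATK-N03: bypasses naive matching,
# caught by the normalized pipeline.
payload = "ev\u200bal(open('secrets').read())"
assert not SUSPICIOUS.search(payload)   # naive regex misses it
assert normalize_then_match(payload)    # normalization closes the gap
```

The same two-step ordering also defeats fullwidth-character variants, since NFKC folds them to their ASCII equivalents before the patterns ever run.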
Three of these deserve a closer look.
Deep Dive: The 11-Pattern Gap (T1-2)
The most instructive finding from the adversarial exercises was not a sophisticated attack. It was arithmetic.
The obsidian-youtube-agent had two layers of injection defense: HTTP middleware (8 regex patterns) and a chat sanitizer (19 regex patterns). The red team agent enumerated both sets, identified the 11 patterns present in the sanitizer but absent from the middleware, and used those specific patterns to bypass the outer defense layer. The middleware waved the attack through. The sanitizer — deeper in the call stack — would have caught it, but the attacker had already achieved code execution.
The gap existed because each layer was developed independently. The middleware was written first, with a reasonable set of patterns. The sanitizer was written later, with a more comprehensive set. Nobody compared the two lists.
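The comparison nobody did is cheap to automate. A sketch, with hypothetical pattern lists standing in for the real middleware and sanitizer sets:

```python
# Hypothetical stand-ins for the two independently maintained lists.
middleware_patterns = {r"eval\(", r"exec\(", r"\.\./"}
sanitizer_patterns = middleware_patterns | {r"__import__", r"subprocess", r"os\.system"}

# The gap IS the attack surface: patterns one layer knows and the other doesn't.
gap = sanitizer_patterns.symmetric_difference(middleware_patterns)
assert gap == {r"__import__", r"subprocess", r"os\.system"}

# A one-line CI check would have failed the build the day the lists diverged:
# assert middleware_patterns == sanitizer_patterns, f"pattern gap: {gap}"
```

A set comparison like this in CI is a stopgap; the framework's actual fix goes further and removes the second list entirely.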
The framework fix is structural. security_patterns.py is now the single source of truth for all detection patterns:
```python
# security_patterns.py
PATTERNS_VERSION = "2.0.0"
# All consuming hooks reference this version.
# When updating: edit HERE first, increment version,
# copy to each consuming hook template.
```

The module defines 30 compiled regex patterns organized by category (injection, path traversal, command execution). Five consuming hooks reference this file. The gap between layers is now structurally impossible — not because developers will remember to synchronize, but because there is only one list to synchronize from.
KEY INSIGHT: Defense-in-depth fails when the layers have different pattern coverage. An attacker who can enumerate both layers will always find the gap. Centralize detection patterns into a single versioned module and import from it everywhere.
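A minimal sketch of that centralization pattern follows. The categories and function are illustrative, not the actual contents of the framework's `security_patterns.py`:

```python
# security_patterns.py -- illustrative single source of truth
import re

PATTERNS_VERSION = "2.0.0"

INJECTION = [re.compile(p, re.IGNORECASE) for p in (
    r"eval\s*\(", r"exec\s*\(", r"__import__",
)]
PATH_TRAVERSAL = [re.compile(p) for p in (r"\.\./", r"%2e%2e")]

ALL_PATTERNS = INJECTION + PATH_TRAVERSAL

def matches_any(text: str) -> bool:
    """Both the HTTP middleware and the chat sanitizer call this,
    so their coverage can never drift apart."""
    return any(p.search(text) for p in ALL_PATTERNS)
```

Both layers now pass or fail together: updating a pattern means editing one file and bumping `PATTERNS_VERSION`, and every consumer picks up the change.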
Deep Dive: Trust Model Mismatch (T2-5)
The obsidian-youtube-agent was built for personal use. Authentication was disabled. Rate limiting was absent. Input validation was minimal. These are rational design choices for a localhost tool used by one person.
The adversarial exercises attacked it as if it were internet-facing. The 65% ASR in Round 1 did not mean the application was poorly built. It meant the application was built for a trust model that adversarial testing deliberately violated.

Figure 4 - Trust Model Assessment: The framework now asks these questions during project analysis (Step 1). The answers determine which security patterns are generated. A localhost tool gets different defenses than an internet-facing API. The mismatch between intended trust model and actual deployment context is where vulnerabilities live.
This insight changed the framework at the schema level. The project_analysis_schema.json now includes a security_profile section with trust model fields:
- Deployment context: localhost, internal network, internet-facing
- User trust level: self-only, trusted team, untrusted public
- Data sensitivity: personal, business, regulated
- Agent autonomy level: human-in-the-loop, human-on-the-loop, fully autonomous
When the Project Analyst runs during Step 1 of the pipeline, it now captures these dimensions. The Hooks Engineer and Agent Designer use them to calibrate security infrastructure. A localhost personal tool gets lightweight monitoring. An internet-facing API with untrusted users gets authentication middleware, rate limiting, and full input normalization.
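A sketch of how that calibration could work. The field names mirror the `security_profile` dimensions listed above, but the dataclass and selection logic are hypothetical, not the framework's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class SecurityProfile:
    # Hypothetical mirror of the schema's trust model fields.
    deployment: str        # "localhost" | "internal" | "internet-facing"
    user_trust: str        # "self-only" | "trusted-team" | "untrusted-public"
    data_sensitivity: str  # "personal" | "business" | "regulated"
    autonomy: str          # "human-in-the-loop" | "human-on-the-loop" | "autonomous"

def select_defenses(profile: SecurityProfile) -> list[str]:
    """Map the declared trust model to generated security infrastructure."""
    defenses = ["input_normalization"]  # every project inherits the T1-1 fix
    if profile.deployment != "localhost":
        defenses += ["auth_middleware", "rate_limiting"]
    if profile.user_trust == "untrusted-public":
        defenses += ["full_pattern_matching", "trajectory_monitoring"]
    if profile.autonomy == "autonomous":
        defenses += ["turn_budgets"]
    return defenses

# A localhost personal tool gets lightweight defaults; an internet-facing
# API with untrusted users gets the full stack.
local_tool = SecurityProfile("localhost", "self-only", "personal", "human-in-the-loop")
assert select_defenses(local_tool) == ["input_normalization"]
```

The point of the sketch is the shape of the mapping: declared trust model in, generated defenses out, with no dimension silently defaulting to "trusted."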
The lesson is transferable beyond this framework. Every security assessment should start by asking: what trust model was this system built for, and what trust model does its actual deployment context require? The gap between those two answers is the attack surface.
Deep Dive: Two-Wave Methodology (T2-1)
Before Round 2 of adversarial testing, we considered running all 17 attacks as a single undifferentiated batch against the patched application. The combined ASR — 47% — would have been one number that hides more than it reveals.
The two-wave split made the results actionable. The regression wave (10 attacks replaying Round 1 vectors against the patched codebase) produced a 20% ASR. The escalation wave (7 new attack vectors) produced an 85.7% ASR. Those two numbers tell a clear story: patches work against known threats, but the architecture is vulnerable to novel ones.
This methodology is now a reusable skill reference in the framework’s security review skill. Any project that runs /security-review has access to the two-wave testing guide:
- Regression wave: Replay all previous findings against current defenses. Target ASR below 25%.
- Escalation wave: Deploy new attack vectors not seen in previous rounds. Measure architectural resilience.
- Diagnostic matrix: Low regression + low escalation = ready for production. Low regression + high escalation = shift from patching to architecture. High regression + any escalation = patches are failing, go back and fix them.
The split transforms security testing from “how many vulnerabilities exist” to “which defense strategy is working and which is not.” That distinction drives resource-allocation decisions that a single ASR number cannot support.
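The diagnostic matrix reduces to a small decision function. The 25% regression threshold comes from the guide above; the 50% escalation cut and the verdict strings are illustrative assumptions:

```python
def diagnose(regression_asr: float, escalation_asr: float) -> str:
    """Turn two wave-level ASRs into a defense-strategy verdict.

    ASRs are fractions in [0, 1]. The 0.25 regression target is from
    the two-wave testing guide; 0.5 as the "high escalation" cut is an
    assumption for this sketch.
    """
    if regression_asr >= 0.25:
        return "patches failing: go back and fix them"
    if escalation_asr >= 0.5:
        return "patches hold, architecture does not: redesign"
    return "ready for production"

# Round 2 of the adversarial exercises: 20% regression, 85.7% escalation.
assert diagnose(0.20, 0.857) == "patches hold, architecture does not: redesign"
```

Feeding Round 2's numbers in reproduces the article's conclusion directly: the patching strategy passed, the architecture did not.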
What Changed Structurally
The backport was not a refactoring exercise. It added measurable capacity to the framework.

Figure 5 - Framework Growth Across Milestones: Hook templates grew from 11 (initial) to 17 (Round 3 Security) to 19 (Adversarial Backport). Template categories grew from 6 to 8. Skill reference documents increased by 2. The schema gained trust model fields. Each milestone left a measurable footprint.
| Dimension | Before Backport | After Backport | Delta |
|---|---|---|---|
| Hook templates | 17 | 19 | +2 |
| Template categories | 6 | 8 | +2 (http-middleware, scoring) |
| Template files total | 21 | 24 | +3 |
| Skill reference docs | existing | +2 | two_wave_methodology, turn_budget_patterns |
| Schema fields | standard | +trust_model | deployment, user trust, data sensitivity, autonomy |
| Pipeline steps | 12 | 12 | unchanged (pre-flight added to existing step) |
The critical design decision: these are defaults, not options. Every future run of /bootstrap-project inherits all 10 improvements automatically. A developer generating infrastructure for a new project does not need to know about Unicode zero-width bypasses, pattern gap vulnerabilities, or trust model mismatches. The framework handles them.

Figure 6 - Automatic Inheritance: A developer runs /bootstrap-project on a new FastAPI project. The generated infrastructure includes centralized security patterns (T1-2), HTTP middleware (T1-3), input normalization (T1-1), pre-flight validation (T1-5), trust model assessment (T2-5), OWASP Agentic coverage (T2-4), boundary consistency checks (T2-2), and turn budget guidance (T2-3). Zero additional effort required.
The Compound Returns
The economics of this loop are asymmetric.
The adversarial exercises took approximately 8 hours across 2 rounds — designing the platform, running the attacks, analyzing results, writing defense patches. The backport took approximately 2 hours — 4 phases, 22 file operations, 4 commits. Total investment: roughly 10 hours.
Every future migration gets all 10 defenses for zero additional effort. The break-even point is project 2.

Figure 7 - Compound Returns: The initial investment (adversarial exercises + backport) is fixed. The benefit grows with each project that inherits the improved framework. By project 3, the cumulative security improvement far exceeds the original cost. By project 5, the per-project cost of the adversarial investment is under 2 hours.
But the ROI calculation understates the real benefit. Without the adversarial exercises, the framework would still have 17 hook templates. They would still cover OWASP Top 10 categories. A generated project would still look secure on paper.
It would also still have an 11-pattern gap between defense layers. It would still skip input normalization before regex matching. It would still generate the same trust model for a localhost tool and an internet-facing API. The theoretical security would remain untested, and the gaps would surface only when an attacker — human or AI — found them in a production system.
The adversarial exercises converted theoretical security into empirical security. That conversion is the real return on investment.
KEY INSIGHT: Adversarial testing is not a cost. It is the mechanism that converts theoretical security patterns into empirical ones. A framework with untested security patterns has unknown gaps. A framework with adversarially tested patterns has known, closed gaps. The difference compounds across every project the framework generates.
What the Loop Teaches
Five transferable principles emerge from completing this cycle.
1. Build defensively first. Round 3 Security Hardening (Phase 1) was not wasted work. The 17 hook templates, OWASP mapping, and trajectory monitoring gave adversarial testing something to validate. Without defensive infrastructure, adversarial testing produces a list of problems with no structural place to put the fixes.
2. Test adversarially. Defensive patterns built from theory have blind spots that only empirical testing reveals. The 11-pattern gap, the trust model mismatch, the Unicode bypass — none of these were predictable from reading the OWASP specification. They emerged from actually attacking a real system.
3. Extract lessons systematically. The 10 lessons in the tracking document are not observations. They are structured entries with a source finding, an article reference, a framework change, and a commit hash. Systematic extraction prevents lessons from evaporating into “we should probably fix that someday.”
4. Feed back into the source. Lessons that stay in a report improve nothing. Lessons that become framework defaults improve everything downstream. The backport took 2 hours. Every project after it benefits permanently.
5. The spiral, not the circle. This is not a closed loop. The framework is better now than before the adversarial exercises, but the next round of testing will find new gaps. The HTTP middleware template has not been adversarially tested. The trust model assessment has not been validated against a real internet-facing deployment. The scoring calculator has been unit tested but not battle tested. Each turn of the spiral starts from a higher baseline.

Figure 8 - The Improvement Spiral: The first turn is complete: build, test, extract, improve. The second turn starts from a higher baseline — 19 hook templates instead of 17, centralized patterns instead of scattered ones, trust model awareness instead of one-size-fits-all. Each turn discovers gaps the previous turn could not see. Security is not a destination. It is a spiral.
The Thesis
The Bootstrap Framework series showed how to build security infrastructure for agentic AI systems. The Adversarial Testing series showed how to test it. This article shows that neither is complete without the other.
Building without testing produces theoretical security with unknown gaps. Testing without building produces a vulnerability report with no structural home for the fixes. The loop — build, test, extract, improve — is the unit of security progress.
The framework now has 19 hook templates, 8 template categories, trust model assessment at analysis time, centralized security patterns, HTTP middleware templates, and a two-wave testing methodology. All of it traceable to specific adversarial findings. All of it inherited automatically by every future project.
Security is not a feature you add. It is a loop you run.
The Series
This is Part 5 of a 5-part series on Building the Bootstrap Framework:
- An Agent Swarm That Builds Agent Swarms — Case study migrating two production apps with generated Claude Code infrastructure
- From Prototype to Platform — How the framework learned from every migration and improved itself
- Securing Agentic AI — Building security-conscious agent systems with Claude Code
- WordPress to Astro — Migrating a production site with AI-assisted infrastructure
- Closing the Loop (this article) — How 10 adversarial lessons became framework defaults
Related Reading
This article bridges two series. The adversarial findings that drove these improvements are documented in:
- When Your AI Agents Attack Each Other — The platform: five agents, three teams, seven phases
- 65% Attack Success Rate Against an Unpatched Target — Round 1: 10 attacks, 7 confirmed, 9 defense patches
- The Escalation Wave — Round 2: patches hold at 20% ASR, new attacks succeed at 85.7%
- Securing Agentic AI Systems — Lessons learned: patching vs. architecture