Closing the Loop: How Adversarial Testing Improved the Framework That Built It
We built a framework that generates Claude Code infrastructure. We used it to build a project. We attacked that project with adversarial AI agents — 27 attacks across 2 rounds. Then we took the 10 hardest lessons and backported them into the framework itself. 7 new files, 13 modifications, 4 commits. Every future project now inherits defenses we only learned by breaking things.

Figure 1 - The Improvement Spiral: The Bootstrap Framework generates infrastructure for a target project. Adversarial testing attacks that project and finds gaps. 10 lessons extracted from the findings flow back into the framework. The next project inherits all 10 defenses automatically. This is one turn of a continuous spiral.
Two article series ran in parallel over the past month. The Bootstrap Framework series documented building a meta-framework — an agent swarm that generates Claude Code infrastructure for any project. The Adversarial Testing series documented attacking one of those generated projects with AI red team agents — 27 attacks, severity-weighted scoring, OWASP classification.
Both series ended with forward-looking conclusions. The framework series said “the next migration will be faster.” The adversarial series said “the results feed back into the framework.” Neither showed the feedback actually happening.
This article documents the closure. 10 lessons from adversarial testing became 7 new files and 13 modifications across 4 phases. The backport took 2 hours. Every future project generated by the framework now inherits defenses that did not exist 48 hours ago.
The thesis is simple: security is not a feature you add. It is a loop you run.
The Three-Step Evolution
The framework’s security story unfolded in three distinct phases, each building on the last.

Figure 2 - Three Phases of Security Evolution: Phase 1 added theoretical security infrastructure (17 hook templates, OWASP coverage). Phase 2 tested it empirically (27 attacks, ASR 65% to 47%). Phase 3 fed lessons back into the framework (7 new files, 13 modifications). Each phase produced measurable artifacts.
Phase 1: Security Hardening (Theory). Round 3 of framework development added security infrastructure from first principles. We mapped the OWASP Top 10 for Agentic Applications, built 17 hook templates, created 2 JSON schemas for inter-agent validation, designed a 3-tier trajectory monitoring system, and wrote per-archetype security patterns for all 7 project types. The framework went from 11 hook templates to 17, the pipeline from 11 steps to 12. This was documented in Part 3 of the Bootstrap Framework series.
Phase 2: Adversarial Exercises (Testing). We pointed AI red team agents at obsidian-youtube-agent — a real project built using the framework. Round 1: 10 attacks, 65% attack success rate (ASR), severity CRITICAL. Round 2: 17 attacks, 47% ASR, severity HIGH. The two-wave methodology in Round 2 revealed the core insight: regression ASR dropped to 20% (patches work), but escalation ASR hit 85.7% (architecture doesn’t). This was documented across four articles.
Phase 3: Backport (Improvement). 10 lessons from adversarial testing became concrete framework changes. 7 new files created. 13 existing files modified. 4 commits across 4 phases. Completed in a single session.
The three phases form a cycle. Theory proposes defenses. Testing validates them. Results improve the theory. The cycle repeats.
KEY INSIGHT: Security infrastructure built from theory alone has blind spots. Adversarial testing reveals them empirically. The value is not in either step alone — it is in completing the loop so the framework learns from its own failures.
10 Lessons, 10 Improvements
Each lesson traces from a specific adversarial finding to a specific framework change. No lesson is theoretical. Every one has a source attack, a confirmed gap, and a committed fix.

Figure 3 - The Pattern Gap: Before centralization (left), the HTTP middleware had 8 injection detection patterns while the chat sanitizer had 19. The 11-pattern gap became the primary attack surface — the attacker enumerated which patterns existed in one layer but not the other. After centralization (right), both layers import from security_patterns.py with 30 compiled regex patterns. The gap is structurally impossible.
| ID | Lesson | Source Finding | Framework Change |
|---|---|---|---|
| T1-1 | Normalize input before regex matching | ATK-N03: Unicode zero-width bypass | Enhanced pretooluse_input_sanitization.py with NFKC normalization |
| T1-2 | Centralize security patterns | ATK-N05: 11-pattern gap between layers | New security_patterns.py module (v2.0.0, 30 compiled regex) |
| T1-3 | HTTP middleware needed — hooks can’t inspect payloads | ATK-006: Hook evasion via HTTP | New templates/http-middleware/ directory |
| T1-4 | Scoring needs partial weighting | Scoring calibration v1.0 to v1.1 | New templates/scoring/ with weight redistribution |
| T1-5 | Pre-flight checks prevent wasted time | Round 1 model 404 errors | New session_preflight_check.py in bootstrap pipeline |
| T2-1 | Two-wave testing methodology | Regression 20% vs Escalation 85.7% | New skill reference doc for security review |
| T2-2 | Inconsistent validation is worse than none | ATK-002: Path traversal on unvalidated endpoint | Updated validation checklist with boundary consistency checks |
| T2-3 | Agents need turn budgets | Monitor agent lost work at turn 40 | New skill reference doc for agent team design |
| T2-4 | OWASP Agentic Top 10 is the right taxonomy | 9 of 10 categories tested across 2 rounds | Updated OWASP patterns in security review skill |
| T2-5 | Trust model mismatch IS the attack surface | Target built for localhost, tested adversarially | Updated schema and threat model template |
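Lesson T1-1 can be made concrete with a short sketch. This is a hypothetical simplification of what an input-sanitization hook does after the fix, not the framework's actual `pretooluse_input_sanitization.py`; the pattern list and function name are illustrative:

```python
import re
import unicodedata

# Hypothetical subset of zero-width code points an attacker can use to
# split a keyword like "eval" into "ev\u200bal" and dodge a regex.
ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")

SUSPICIOUS = re.compile(r"(eval|exec|subprocess)", re.IGNORECASE)

def normalize_then_match(raw: str) -> bool:
    """Return True if the input looks like an injection attempt.

    Order matters: NFKC folds compatibility characters (e.g. fullwidth
    'eval') and the zero-width strip rejoins split keywords BEFORE any
    pattern matching runs.
    """
    text = unicodedata.normalize("NFKC", raw)
    text = ZERO_WIDTH.sub("", text)
    return bool(SUSPICIOUS.search(text))

# A payload in the spirit of ATK-N03: bypasses naive matching,
# caught by the normalized pipeline.
payload = "ev\u200bal(open('secrets').read())"
assert not SUSPICIOUS.search(payload)   # naive regex misses it
assert normalize_then_match(payload)    # normalization closes the gap
```

The same two-step ordering also defeats fullwidth-character variants, since NFKC folds them to their ASCII equivalents before the patterns ever run.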
Three of these deserve a closer look.
Deep Dive: The 11-Pattern Gap (T1-2)
The most instructive finding from the adversarial exercises was not a sophisticated attack. It was arithmetic.
The obsidian-youtube-agent had two layers of injection defense: HTTP middleware (8 regex patterns) and a chat sanitizer (19 regex patterns). The red team agent enumerated both sets, identified the 11 patterns present in the sanitizer but absent from the middleware, and used those specific patterns to bypass the outer defense layer. The middleware waved the attack through. The sanitizer — deeper in the call stack — would have caught it, but the attacker had already achieved code execution.
The gap existed because each layer was developed independently. The middleware was written first, with a reasonable set of patterns. The sanitizer was written later, with a more comprehensive set. Nobody compared the two lists.
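The comparison nobody did is cheap to automate. A sketch, with hypothetical pattern lists standing in for the real middleware and sanitizer sets:

```python
# Hypothetical stand-ins for the two independently maintained lists.
middleware_patterns = {r"eval\(", r"exec\(", r"\.\./"}
sanitizer_patterns = middleware_patterns | {r"__import__", r"subprocess", r"os\.system"}

# The gap IS the attack surface: patterns one layer knows and the other doesn't.
gap = sanitizer_patterns.symmetric_difference(middleware_patterns)
assert gap == {r"__import__", r"subprocess", r"os\.system"}

# A one-line CI check would have failed the build the day the lists diverged:
# assert middleware_patterns == sanitizer_patterns, f"pattern gap: {gap}"
```

A set comparison like this in CI is a stopgap; the framework's actual fix goes further and removes the second list entirely.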
The framework fix is structural. security_patterns.py is now the single source of truth for all detection patterns:
```python
# security_patterns.py
PATTERNS_VERSION = "2.0.0"
# All consuming hooks reference this version.
# When updating: edit HERE first, increment version,
# copy to each consuming hook template.
```

The module defines 30 compiled regex patterns organized by category (injection, path traversal, command execution). Five consuming hooks reference this file. The gap between layers is now structurally impossible — not because developers will remember to synchronize, but because there is only one list to synchronize from.
KEY INSIGHT: Defense-in-depth fails when the layers have different pattern coverage. An attacker who can enumerate both layers will always find the gap. Centralize detection patterns into a single versioned module and import from it everywhere.
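A minimal sketch of that centralization pattern follows. The categories and function are illustrative, not the actual contents of the framework's `security_patterns.py`:

```python
# security_patterns.py -- illustrative single source of truth
import re

PATTERNS_VERSION = "2.0.0"

INJECTION = [re.compile(p, re.IGNORECASE) for p in (
    r"eval\s*\(", r"exec\s*\(", r"__import__",
)]
PATH_TRAVERSAL = [re.compile(p) for p in (r"\.\./", r"%2e%2e")]

ALL_PATTERNS = INJECTION + PATH_TRAVERSAL

def matches_any(text: str) -> bool:
    """Both the HTTP middleware and the chat sanitizer call this,
    so their coverage can never drift apart."""
    return any(p.search(text) for p in ALL_PATTERNS)
```

Both layers now pass or fail together: updating a pattern means editing one file and bumping `PATTERNS_VERSION`, and every consumer picks up the change.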
Deep Dive: Trust Model Mismatch (T2-5)
The obsidian-youtube-agent was built for personal use. Authentication was disabled. Rate limiting was absent. Input validation was minimal. These are rational design choices for a localhost tool used by one person.
The adversarial exercises attacked it as if it were internet-facing. The 65% ASR in Round 1 did not mean the application was poorly built. It meant the application was built for a trust model that adversarial testing deliberately violated.

Figure 4 - Trust Model Assessment: The framework now asks these questions during project analysis (Step 1). The answers determine which security patterns are generated. A localhost tool gets different defenses than an internet-facing API. The mismatch between intended trust model and actual deployment context is where vulnerabilities live.
This insight changed the framework at the schema level. The project_analysis_schema.json now includes a security_profile section with trust model fields:
- Deployment context: localhost, internal network, internet-facing
- User trust level: self-only, trusted team, untrusted public
- Data sensitivity: personal, business, regulated
- Agent autonomy level: human-in-the-loop, human-on-the-loop, fully autonomous
When the Project Analyst runs during Step 1 of the pipeline, it now captures these dimensions. The Hooks Engineer and Agent Designer use them to calibrate security infrastructure. A localhost personal tool gets lightweight monitoring. An internet-facing API with untrusted users gets authentication middleware, rate limiting, and full input normalization.
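A sketch of how that calibration could work. The field names mirror the `security_profile` dimensions listed above, but the dataclass and selection logic are hypothetical, not the framework's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class SecurityProfile:
    # Hypothetical mirror of the schema's trust model fields.
    deployment: str        # "localhost" | "internal" | "internet-facing"
    user_trust: str        # "self-only" | "trusted-team" | "untrusted-public"
    data_sensitivity: str  # "personal" | "business" | "regulated"
    autonomy: str          # "human-in-the-loop" | "human-on-the-loop" | "autonomous"

def select_defenses(profile: SecurityProfile) -> list[str]:
    """Map the declared trust model to generated security infrastructure."""
    defenses = ["input_normalization"]  # every project inherits the T1-1 fix
    if profile.deployment != "localhost":
        defenses += ["auth_middleware", "rate_limiting"]
    if profile.user_trust == "untrusted-public":
        defenses += ["full_pattern_matching", "trajectory_monitoring"]
    if profile.autonomy == "autonomous":
        defenses += ["turn_budgets"]
    return defenses

# A localhost personal tool gets lightweight defaults; an internet-facing
# API with untrusted users gets the full stack.
local_tool = SecurityProfile("localhost", "self-only", "personal", "human-in-the-loop")
assert select_defenses(local_tool) == ["input_normalization"]
```

The point of the sketch is the shape of the mapping: declared trust model in, generated defenses out, with no dimension silently defaulting to "trusted."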
The lesson is transferable beyond this framework. Every security assessment should start by asking: what trust model was this system built for, and what trust model does its actual deployment context require? The gap between those two answers is the attack surface.
Deep Dive: Two-Wave Methodology (T2-1)
Before Round 2 of adversarial testing, we considered running all 17 attacks as a single undifferentiated batch against the patched application. The combined ASR — 47% — would have been one number that hides more than it reveals.
The two-wave split made the results actionable. The regression wave (10 attacks replaying Round 1 vectors against the patched codebase) produced a 20% ASR. The escalation wave (7 new attack vectors) produced an 85.7% ASR. Those two numbers tell a clear story: patches work against known threats, but the architecture is vulnerable to novel ones.
This methodology is now a reusable skill reference in the framework’s security review skill. Any project that runs /security-review has access to the two-wave testing guide:
- Regression wave: Replay all previous findings against current defenses. Target ASR below 25%.
- Escalation wave: Deploy new attack vectors not seen in previous rounds. Measure architectural resilience.
- Diagnostic matrix: Low regression + low escalation = ready for production. Low regression + high escalation = shift from patching to architecture. High regression + any escalation = patches are failing, go back and fix them.
The split transforms security testing from “how many vulnerabilities exist” to “which defense strategy is working and which is not.” That distinction drives resource-allocation decisions that a single ASR number cannot support.
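The diagnostic matrix reduces to a small decision function. The 25% regression threshold comes from the guide above; the 50% escalation cut and the verdict strings are illustrative assumptions:

```python
def diagnose(regression_asr: float, escalation_asr: float) -> str:
    """Turn two wave-level ASRs into a defense-strategy verdict.

    ASRs are fractions in [0, 1]. The 0.25 regression target is from
    the two-wave testing guide; 0.5 as the "high escalation" cut is an
    assumption for this sketch.
    """
    if regression_asr >= 0.25:
        return "patches failing: go back and fix them"
    if escalation_asr >= 0.5:
        return "patches hold, architecture does not: redesign"
    return "ready for production"

# Round 2 of the adversarial exercises: 20% regression, 85.7% escalation.
assert diagnose(0.20, 0.857) == "patches hold, architecture does not: redesign"
```

Feeding Round 2's numbers in reproduces the article's conclusion directly: the patching strategy passed, the architecture did not.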
What Changed Structurally
The backport was not a refactoring exercise. It added measurable capacity to the framework.

Figure 5 - Framework Growth Across Milestones: Hook templates grew from 11 (initial) to 17 (Round 3 Security) to 19 (Adversarial Backport). Template categories grew from 6 to 8. Skill reference documents increased by 2. The schema gained trust model fields. Each milestone left a measurable footprint.
| Dimension | Before Backport | After Backport | Delta |
|---|---|---|---|
| Hook templates | 17 | 19 | +2 |
| Template categories | 6 | 8 | +2 (http-middleware, scoring) |
| Template files total | 21 | 24 | +3 |
| Skill reference docs | existing | +2 | two_wave_methodology, turn_budget_patterns |
| Schema fields | standard | +trust_model | deployment, user trust, data sensitivity, autonomy |
| Pipeline steps | 12 | 12 | unchanged (pre-flight added to existing step) |
The critical design decision: these are defaults, not options. Every future run of /bootstrap-project inherits all 10 improvements automatically. A developer generating infrastructure for a new project does not need to know about Unicode zero-width bypasses, pattern gap vulnerabilities, or trust model mismatches. The framework handles them.

Figure 6 - Automatic Inheritance: A developer runs /bootstrap-project on a new FastAPI project. The generated infrastructure includes centralized security patterns (T1-2), HTTP middleware (T1-3), input normalization (T1-1), pre-flight validation (T1-5), trust model assessment (T2-5), OWASP Agentic coverage (T2-4), boundary consistency checks (T2-2), and turn budget guidance (T2-3). Zero additional effort required.
The Compound Returns
The economics of this loop are asymmetric.
The adversarial exercises took approximately 8 hours across 2 rounds — designing the platform, running the attacks, analyzing results, writing defense patches. The backport took approximately 2 hours — 4 phases, 22 file operations, 4 commits. Total investment: roughly 10 hours.
Every future migration gets all 10 defenses for zero additional effort. The break-even point is project 2.

Figure 7 - Compound Returns: The initial investment (adversarial exercises + backport) is fixed. The benefit grows with each project that inherits the improved framework. By project 3, the cumulative security improvement far exceeds the original cost. By project 5, the per-project cost of the adversarial investment is under 2 hours.
But the ROI calculation understates the real benefit. Without the adversarial exercises, the framework would still have 17 hook templates. They would still cover OWASP Top 10 categories. A generated project would still look secure on paper.
It would also still have an 11-pattern gap between defense layers. It would still skip input normalization before regex matching. It would still generate the same trust model for a localhost tool and an internet-facing API. The theoretical security would remain untested, and the gaps would surface only when an attacker — human or AI — found them in a production system.
The adversarial exercises converted theoretical security into empirical security. That conversion is the real return on investment.
KEY INSIGHT: Adversarial testing is not a cost. It is the mechanism that converts theoretical security patterns into empirical ones. A framework with untested security patterns has unknown gaps. A framework with adversarially tested patterns has known, closed gaps. The difference compounds across every project the framework generates.
What the Loop Teaches
Five transferable principles emerge from completing this cycle.
1. Build defensively first. Round 3 Security Hardening (Phase 1) was not wasted work. The 17 hook templates, OWASP mapping, and trajectory monitoring gave adversarial testing something to validate. Without defensive infrastructure, adversarial testing produces a list of problems with no structural place to put the fixes.
2. Test adversarially. Defensive patterns built from theory have blind spots that only empirical testing reveals. The 11-pattern gap, the trust model mismatch, the Unicode bypass — none of these were predictable from reading the OWASP specification. They emerged from actually attacking a real system.
3. Extract lessons systematically. The 10 lessons in the tracking document are not observations. They are structured entries with a source finding, an article reference, a framework change, and a commit hash. Systematic extraction prevents lessons from evaporating into “we should probably fix that someday.”
4. Feed back into the source. Lessons that stay in a report improve nothing. Lessons that become framework defaults improve everything downstream. The backport took 2 hours. Every project after it benefits permanently.
5. The spiral, not the circle. This is not a closed loop. The framework is better now than before the adversarial exercises, but the next round of testing will find new gaps. The HTTP middleware template has not been adversarially tested. The trust model assessment has not been validated against a real internet-facing deployment. The scoring calculator has been unit tested but not battle tested. Each turn of the spiral starts from a higher baseline.

Figure 8 - The Improvement Spiral: The first turn is complete: build, test, extract, improve. The second turn starts from a higher baseline — 19 hook templates instead of 17, centralized patterns instead of scattered ones, trust model awareness instead of one-size-fits-all. Each turn discovers gaps the previous turn could not see. Security is not a destination. It is a spiral.
The Thesis
The Bootstrap Framework series showed how to build security infrastructure for agentic AI systems. The Adversarial Testing series showed how to test it. This article shows that neither is complete without the other.
Building without testing produces theoretical security with unknown gaps. Testing without building produces a vulnerability report with no structural home for the fixes. The loop — build, test, extract, improve — is the unit of security progress.
The framework now has 19 hook templates, 8 template categories, trust model assessment at analysis time, centralized security patterns, HTTP middleware templates, and a two-wave testing methodology. All of it traceable to specific adversarial findings. All of it inherited automatically by every future project.
Security is not a feature you add. It is a loop you run.
The Series
This is Part 5 of a 5-part series on Building the Bootstrap Framework:
- An Agent Swarm That Builds Agent Swarms — Case study migrating two production apps with generated Claude Code infrastructure
- From Prototype to Platform — How the framework learned from every migration and improved itself
- Securing Agentic AI — Building security-conscious agent systems with Claude Code
- WordPress to Astro — Migrating a production site with AI-assisted infrastructure
- Closing the Loop (this article) — How 10 adversarial lessons became framework defaults
Related Reading
This article bridges two series. The adversarial findings that drove these improvements are documented in:
- When Your AI Agents Attack Each Other — The platform: five agents, three teams, seven phases
- 65% Attack Success Rate Against an Unpatched Target — Round 1: 10 attacks, 7 confirmed, 9 defense patches
- The Escalation Wave — Round 2: patches hold at 20% ASR, new attacks succeed at 85.7%
- Securing Agentic AI Systems — Lessons learned: patching vs. architecture