In the rapidly evolving landscape of AI development, we’re witnessing an explosion of large language model (LLM) applications that promise to revolutionize how we work. Yet beneath the surface of this excitement lies a harsh reality: most LLM applications in production are brittle, unpredictable, and frustratingly difficult to maintain. Traditional approaches often result in tangled codebases where prompts, validation logic, and tool integrations become so intertwined that even simple changes risk breaking the entire system.

As AI practitioners, we’ve all experienced the pain of debugging an LLM application where responses occasionally go off the rails, tool calls fail silently, or the model confidently produces outputs that violate basic business rules. The problem isn’t the models themselves—it’s that we’re building on foundations of sand, lacking the structured patterns and type safety that we take for granted in traditional software development.
This is where the convergence of two powerful concepts changes everything: Anthropic’s agent blueprint patterns and Pydantic AI’s type-safe framework. By combining battle-tested agent architectures with robust data validation, we can finally build LLM applications that are both powerful and predictable. Think of it as bringing software engineering discipline to the wild west of AI development.
In this article, we’ll dive into:
- Understanding AI agents and the crucial distinction between workflows and autonomous agents
- The technical architecture that makes reliable AI agent systems possible
- Five foundational and advanced patterns from Anthropic’s agent blueprint
- Practical Pydantic AI implementations with production-ready code examples
- Testing strategies, monitoring approaches, and deployment considerations
- Real-world benefits, limitations, and future directions of structured agent frameworks
- Hands-on examples you can adapt for your own projects
Understanding AI Agents
What Are AI Agents?
Let’s start with a fundamental question: what exactly makes something an “AI agent” rather than just another LLM application? AI agents are systems that go beyond simple prompt-response interactions. They maintain state, make decisions based on context, and can engage in multi-step reasoning processes to achieve specific goals. You can think of them as LLMs with agency—the ability to decide not just what to say, but what to do next.
The key insight here is that agents aren’t just about generating text. They’re about orchestrating complex workflows, interacting with external tools, and maintaining coherent behavior across extended interactions. When you ask ChatGPT a question, you’re using an LLM. When that LLM can search the web, run calculations, and remember previous context to solve a complex problem, you’re working with an agent.
The Agent vs. Workflow Distinction
Here’s where things get interesting. Anthropic makes a crucial distinction that many developers miss: not everything needs to be an agent. In fact, there’s a spectrum between two fundamental approaches:
Workflows are systems where LLMs and tools are orchestrated through predefined code paths. Think of them like a recipe—you know exactly what steps will happen and in what order. The LLM might generate content or make simple decisions, but the overall flow is determined by your code. These systems are predictable, debuggable, and perfect for many real-world applications.
Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage. The model itself decides what to do next based on the context and available tools. This offers incredible flexibility but comes with the cost of less predictability.
As Anthropic's guidance on building effective agents puts it, the workflow/agent distinction "is less a binary and more a spectrum of giving the model increasing autonomy in directing computation" [1]. This insight is liberating: you don't have to go fully autonomous for every use case. Often, a workflow with some agent-like capabilities hits the sweet spot.
The Augmented LLM Model
At the heart of both workflows and agents is what we call the augmented LLM. This isn’t just a raw language model—it’s an LLM enhanced with additional capabilities:
- Base Language Model: The foundation that provides reasoning and text generation
- Tools: Functions the model can call to interact with external systems
- Structured Output: Validation that ensures responses follow expected formats
- Memory: Context management across multiple interactions
- Retrieval: Access to external knowledge bases
These augmentations transform a simple text generator into a system capable of solving real problems. And this is where Pydantic AI shines—it provides the framework to implement these augmentations reliably.
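To make this concrete, here is a minimal sketch of an augmented LLM in Pydantic AI: a base model gains a structured output schema and a tool. The `CityInfo` schema and `get_population` tool are hypothetical stand-ins used purely for illustration, and the sketch assumes the `result_type`/`tool_plain` API used throughout this article.

```python
from pydantic import BaseModel
from pydantic_ai import Agent


class CityInfo(BaseModel):
    """Structured output the augmented LLM must return."""
    city: str
    country: str
    population: int


# Base model + structured output: one augmentation
city_agent = Agent(
    "openai:gpt-4o",
    result_type=CityInfo,
    system_prompt="Answer questions about cities using the available tools.",
)


# Tool: another augmentation (hard-coded data for illustration)
@city_agent.tool_plain
def get_population(city: str) -> int:
    """Look up a city's population."""
    return {"paris": 2_102_650, "tokyo": 13_960_000}.get(city.lower(), 0)


result = city_agent.run_sync("How many people live in Paris?")
print(result.data)  # a validated CityInfo instance, not raw text
```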
AI Agents vs. Traditional Frameworks
To really grasp why agent frameworks matter, let’s compare them with traditional application development approaches:
| Feature | Traditional Frameworks | AI Agent Frameworks |
|---|---|---|
| Control Flow | Static, predefined paths | Dynamic, model-driven decisions |
| Data Handling | Strict types, rigid schemas | Flexible schemas with validation |
| Error Recovery | Exception handling | Self-correction and retry mechanisms |
| Integration Pattern | Direct API calls | Tool-based abstractions |
| Decision Making | Rule-based logic | Natural language reasoning |
| Maintenance | Code changes required | Can adapt through prompts |
The magic happens when you combine the best of both worlds: the flexibility of AI agents with the reliability of traditional software engineering practices. That’s exactly what we’re going to explore.
The Technical Architecture Behind AI Agents
The Pydantic AI Framework
Pydantic AI isn’t just another LLM wrapper—it’s a thoughtfully designed framework that brings type safety and structure to agent development. Building on the robust validation capabilities of Pydantic (which you might know from FastAPI), it extends these concepts specifically for AI agents.
The architecture consists of four essential layers, each serving a specific purpose:
Model Layer
This layer abstracts away the differences between LLM providers. Whether you’re using OpenAI, Anthropic, or Google’s models, Pydantic AI provides a consistent interface. More importantly, it handles all the messy details of prompt construction, message formatting, and response parsing that typically lead to bugs.
Validation Layer
The heart of Pydantic AI lives here. Every input and output can be validated against schemas you define, catching errors before they propagate through your system. This isn’t just about type checking—it’s about ensuring that your agent’s outputs make sense in your domain.
Tool Layer
Tools are what transform an LLM from a chatbot into an agent. This layer provides a clean abstraction for defining functions your agent can call, complete with parameter validation and result handling. No more string manipulation to parse tool calls!
Agent Layer
The highest level brings it all together, implementing the patterns that orchestrate models, tools, and validation into coherent agent behaviors. This is where you’ll spend most of your time, building on the solid foundation below.

Figure 1: Pydantic AI Architecture – This diagram illustrates the four-layer architecture of Pydantic AI, showing how each layer builds upon the previous one to create a complete agent framework. Notice how data flows from the Model Layer through Validation and Tools, ultimately orchestrated by the Agent Layer.
Agent Patterns and Execution Flow
Understanding how agents execute is crucial for building reliable systems. Here’s the typical flow:
- Initialization: Configure your agent with a model, system prompt, and available tools
- Input Processing: Receive and preprocess user input
- Context Building: Combine input with relevant history and system prompts
- LLM Invocation: Send the formatted context to the language model
- Tool Execution: If the LLM requests tool usage, execute the appropriate functions
- Response Validation: Ensure the output conforms to your expected schema
- Output Generation: Return the validated response to the user
What makes this powerful is that each step can be customized, validated, and monitored. You’re not just hoping the LLM does the right thing—you’re guiding it with guardrails at every step.

Figure 2: Agent Execution Flow – This diagram shows the step-by-step execution flow of a Pydantic AI agent, from receiving user input to generating a validated response. Each step includes validation checkpoints to ensure reliability.
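To make those validation checkpoints concrete, here is a minimal sketch of a result validator that enforces a business rule and asks the model to retry when the rule is violated. The `Quote` schema and the 10% discount cap are illustrative assumptions; the sketch uses Pydantic AI's `ModelRetry` exception and the `result_validator` hook that pairs with the `result_type` API shown in this article (names may differ in newer releases).

```python
from pydantic import BaseModel
from pydantic_ai import Agent, ModelRetry


class Quote(BaseModel):
    product: str
    price: float
    discount_pct: float


quote_agent = Agent(
    "openai:gpt-4o",
    result_type=Quote,  # the response-validation step of the flow
    system_prompt="Produce a price quote for the requested product.",
)


@quote_agent.result_validator
def enforce_discount_policy(result: Quote) -> Quote:
    # Guardrail: discounts above 10% send feedback to the model and trigger a retry
    if result.discount_pct > 10:
        raise ModelRetry("Discounts above 10% are not allowed; revise the quote.")
    return result


result = quote_agent.run_sync("Quote me a laptop with a small loyalty discount.")
print(result.data)  # guaranteed to satisfy both the schema and the discount rule
```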
Five Essential Patterns for AI Agents
Now let’s dive into the meat of Anthropic’s agent blueprint. These patterns aren’t just theoretical—they’re battle-tested approaches that solve real problems. We’ll implement each one with Pydantic AI to show you exactly how they work.
1. Prompt Chaining
Prompt chaining is the workhorse pattern of agent development. Instead of trying to solve complex problems in one shot, you break them down into a sequence of simpler steps. Each step focuses on one specific transformation, making the overall process more reliable and debuggable.
Think of it like an assembly line—each station has one job and does it well. This approach is perfect when you can decompose a task into clear, sequential steps.
```python
from pydantic import BaseModel, Field
from typing import List, Optional
from pydantic_ai import Agent
class ChainStep(BaseModel):
"""Represents a single step in our prompt chain."""
prompt_template: str = Field(..., description="Template for the prompt, with {input} placeholder")
name: str = Field(..., description="Name of this step for debugging")
validation_rules: Optional[List[str]] = Field(None, description="Rules to validate the output")
class PromptChain:
"""Executes a sequence of prompts, passing output from each to the next."""
def __init__(self, steps: List[ChainStep], model_type: str = "openai:gpt-4o"):
self.steps = steps
self.agent = Agent(model_type)
async def execute(self, input_data: str) -> str:
"""Execute the chain on the input data."""
result = input_data
for step in self.steps:
# Format the prompt with the previous result
prompt = step.prompt_template.format(input=result)
# Call the LLM using Pydantic AI
response = await self.agent.run(prompt)
result = response.data
# Validate if rules are specified
if step.validation_rules:
self._validate_output(result, step.validation_rules)
return result
def _validate_output(self, output: str, rules: List[str]):
"""Apply validation rules to the output."""
# Implementation depends on your specific rules
pass
# Example: Building a research summarization chain
research_chain = PromptChain([
ChainStep(
name="extract_key_points",
prompt_template="Extract the key points from this text: {input}"
),
ChainStep(
name="organize_themes",
prompt_template="Organize these key points into themes: {input}"
),
ChainStep(
name="write_summary",
prompt_template="Write a concise summary based on these themes: {input}"
)
])
```
The beauty of this pattern is its simplicity and debuggability. If something goes wrong, you can inspect the output at each step. Need to improve quality? You can optimize individual steps without affecting the others.
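Running the chain is then a single awaited call. A minimal usage sketch (the sample text is a placeholder):

```python
import asyncio


async def main() -> None:
    article_text = "Large language models are reshaping how teams build software..."
    summary = await research_chain.execute(article_text)
    print(summary)  # output of the final "write_summary" step


asyncio.run(main())
```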
2. Routing
Not all problems follow a linear path. The routing pattern lets your agent intelligently direct queries to specialized handlers based on the input type. It’s like having a smart receptionist who knows exactly which department can best help each visitor.
```python
from pydantic import BaseModel, Field
from typing import Dict, Optional
from pydantic_ai import Agent
class RouteRequest(BaseModel):
"""Input request that needs routing."""
query: str = Field(..., description="The user query to classify and route")
context: Optional[Dict] = Field(None, description="Additional context for routing decisions")
class RouteResponse(BaseModel):
"""Routing decision from our classifier."""
route: str = Field(..., description="Selected route identifier")
confidence: float = Field(..., ge=0, le=1, description="Confidence in this routing decision")
reasoning: str = Field(..., description="Explanation for the routing choice")
class Router:
"""Intelligently routes queries to specialized handlers."""
def __init__(self, classifier_prompt: str, handlers: Dict[str, Agent]):
self.classifier_prompt = classifier_prompt
self.handlers = handlers
# Use a structured output type for reliable routing decisions
self.classifier = Agent(
'anthropic:claude-3-sonnet-20240229',
result_type=RouteResponse
)
async def route_and_process(self, request: RouteRequest) -> str:
"""Classify the request and route to appropriate handler."""
# Build a description of available routes (the route ids serve as labels,
# since Agent does not expose its system prompt publicly)
routes_desc = "\n".join(f"- {route_id}" for route_id in self.handlers)
# Classify the request
prompt = self.classifier_prompt.format(
query=request.query,
available_routes=routes_desc,
**request.context or {}
)
route_response = await self.classifier.run(prompt)
# Handle unknown routes gracefully
handler = self.handlers.get(route_response.data.route)
if not handler:
return f"I'm not sure how to handle that. Routing confidence was {route_response.data.confidence}"
# Process with the specialized handler
result = await handler.run(request.query)
return result.data
# Example: Customer support router
support_router = Router(
classifier_prompt="""Classify this customer query and select the best handler:
Query: {query}
Available handlers:
{available_routes}
Select the most appropriate handler based on the query type.""",
handlers={
"technical": Agent("openai:gpt-4o", system_prompt="You are a technical support specialist..."),
"billing": Agent("openai:gpt-4o", system_prompt="You are a billing support agent..."),
"general": Agent("openai:gpt-4o", system_prompt="You are a friendly customer service agent...")
}
)
```
What I love about this pattern is how it scales. As your application grows, you can add new specialized handlers without touching the existing ones. The router ensures each query gets expert treatment.
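A minimal usage sketch for the router defined above (the query string is a placeholder):

```python
import asyncio


async def main() -> None:
    request = RouteRequest(query="My last invoice shows a duplicate charge.")
    answer = await support_router.route_and_process(request)
    print(answer)  # handled by the "billing" agent if the classifier routes it there


asyncio.run(main())
```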
3. Parallelization
When faced with tasks that can be decomposed into independent subtasks, parallelization can dramatically improve both performance and quality. This pattern comes in two flavors:
- Sectioning: Breaking a task into parts that can be processed simultaneously
- Voting: Running the same task multiple times to get diverse perspectives
```python
from pydantic import BaseModel, Field
from typing import List, Dict, Any
from pydantic_ai import Agent
import asyncio
class SectioningTask(BaseModel):
"""A subtask that can be processed independently."""
section_id: str = Field(..., description="Unique identifier for this section")
prompt_template: str = Field(..., description="Prompt template for processing this section")
data: Dict[str, Any] = Field(default_factory=dict, description="Section-specific data")
class ParallelExecutor:
"""Execute tasks in parallel for improved performance."""
def __init__(self, model_type: str = "openai:gpt-4o"):
self.model_type = model_type
async def execute_sectioning(self, tasks: List[SectioningTask], shared_data: Dict[str, Any] = None) -> Dict[str, str]:
"""Process multiple sections in parallel."""
shared_data = shared_data or {}
async def process_section(task):
# Create an agent for this section
agent = Agent(self.model_type)
# Combine shared and section-specific data
prompt = task.prompt_template.format(**{**shared_data, **task.data})
# Process asynchronously
result = await agent.run(prompt)
return task.section_id, result.data
# Execute all sections concurrently
coroutines = [process_section(task) for task in tasks]
results_list = await asyncio.gather(*coroutines)
# Convert to dictionary for easy access
return {section_id: result for section_id, result in results_list}
async def execute_voting(self, prompt: str, num_votes: int = 3) -> Dict[str, Any]:
"""Get multiple perspectives on the same prompt."""
async def get_vote(vote_num):
agent = Agent(self.model_type, system_prompt=f"You are assistant {vote_num}. Provide your perspective.")
result = await agent.run(prompt)
return result.data
# Collect all votes
votes = await asyncio.gather(*[get_vote(i) for i in range(num_votes)])
# Analyze consensus (simplified example)
return {
"votes": votes,
"consensus": self._find_consensus(votes)
}
def _find_consensus(self, votes: List[str]) -> str:
"""Analyze votes to find consensus (simplified)."""
# In practice, you might use another LLM call or more sophisticated analysis
return "Majority opinion: " + votes[0] # Placeholder
# Example: Parallel document analysis
async def analyze_document(document: str):
executor = ParallelExecutor()
# Split into sections for parallel processing
tasks = [
SectioningTask(
section_id="summary",
prompt_template="Summarize this document: {document}",
data={"document": document}
),
SectioningTask(
section_id="key_points",
prompt_template="Extract key points from: {document}",
data={"document": document}
),
SectioningTask(
section_id="sentiment",
prompt_template="Analyze the sentiment of: {document}",
data={"document": document}
)
]
results = await executor.execute_sectioning(tasks)
return results
```
This pattern shines when you have tasks that would take too long sequentially or when you want multiple perspectives to improve quality. The key is identifying truly independent subtasks.
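A minimal usage sketch for the parallel document analysis above (the document text is a placeholder):

```python
import asyncio


async def main() -> None:
    document = "Quarterly revenue grew 12%, driven by strong subscription renewals..."
    results = await analyze_document(document)
    for section_id, output in results.items():
        print(f"--- {section_id} ---\n{output}\n")


asyncio.run(main())
```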
4. Orchestrator-Workers
The orchestrator-workers pattern is where things get really interesting. Instead of predefined workflows, you have a central LLM that dynamically breaks down tasks and coordinates specialized workers. It’s like having a project manager who understands the problem and delegates to the right team members.
```python
from pydantic import BaseModel, Field
from typing import List, Dict, Any
from pydantic_ai import Agent
class SubTask(BaseModel):
"""A subtask created by the orchestrator."""
task_id: str = Field(..., description="Unique identifier")
type: str = Field(..., description="Type of subtask - determines which worker to use")
description: str = Field(..., description="What needs to be done")
dependencies: List[str] = Field(default_factory=list, description="IDs of tasks this depends on")
context: Dict[str, Any] = Field(default_factory=dict, description="Additional context for the worker")
class TaskDecomposition(BaseModel):
"""The orchestrator's plan for solving a complex task."""
subtasks: List[SubTask]
execution_order: List[str] = Field(..., description="Suggested order of execution")
class OrchestratorSystem:
"""Dynamic task decomposition and execution system."""
def __init__(
self,
orchestrator_prompt: str,
worker_prompts: Dict[str, str],
orchestrator_model: str = "anthropic:claude-3-opus-20240229",
worker_model: str = "openai:gpt-4o"
):
self.orchestrator_prompt = orchestrator_prompt
self.worker_prompts = worker_prompts
# Orchestrator with structured output for reliable task decomposition
self.orchestrator = Agent(
orchestrator_model,
result_type=TaskDecomposition,
system_prompt="You are a task orchestrator. Break down complex tasks into manageable subtasks."
)
self.worker_model = worker_model
self.completed_tasks = {}
async def process_task(self, task: str, context: Dict[str, Any] = None) -> Dict[str, Any]:
"""Process a complex task through orchestration."""
context = context or {}
# Step 1: Decompose the task
decompose_prompt = self.orchestrator_prompt.format(
task=task,
available_workers=list(self.worker_prompts.keys()),
**context
)
decomposition = await self.orchestrator.run(decompose_prompt)
subtasks = decomposition.data.subtasks
# Step 2: Execute subtasks respecting dependencies
results = {}
for subtask in self._order_by_dependencies(subtasks):
# Wait for dependencies
await self._wait_for_dependencies(subtask, results)
# Execute subtask
worker_result = await self._execute_subtask(subtask, results)
results[subtask.task_id] = worker_result
# Step 3: Synthesize results
synthesis_result = await self._synthesize_results(task, subtasks, results)
return {
"task": task,
"subtasks": [st.dict() for st in subtasks],
"results": results,
"final_result": synthesis_result
}
async def _execute_subtask(self, subtask: SubTask, completed_results: Dict[str, Any]) -> str:
"""Execute a single subtask with the appropriate worker."""
worker_prompt = self.worker_prompts.get(subtask.type)
if not worker_prompt:
return f"No worker available for task type: {subtask.type}"
# Create context including dependency results
context = {
"description": subtask.description,
"dependencies": {dep_id: completed_results.get(dep_id) for dep_id in subtask.dependencies},
**subtask.context
}
worker = Agent(
self.worker_model,
system_prompt=worker_prompt
)
result = await worker.run(str(context))
return result.data
def _order_by_dependencies(self, subtasks: List[SubTask]) -> List[SubTask]:
"""Order subtasks respecting dependencies (simplified topological sort)."""
# In practice, implement proper topological sorting
return sorted(subtasks, key=lambda x: len(x.dependencies))
async def _wait_for_dependencies(self, subtask: SubTask, results: Dict[str, Any]):
"""Wait for all dependencies to complete."""
# In a real implementation, this would handle async coordination
pass
async def _synthesize_results(self, original_task: str, subtasks: List[SubTask], results: Dict[str, Any]) -> str:
"""Combine all results into a final answer."""
synthesis_agent = Agent(
self.orchestrator.model,  # reuse the orchestrator's configured model
system_prompt="You are a synthesis expert. Combine subtask results into a coherent final answer."
)
synthesis_prompt = f"""
Original task: {original_task}
Completed subtasks and results:
{self._format_results_for_synthesis(subtasks, results)}
Synthesize these results into a comprehensive final solution.
"""
final_result = await synthesis_agent.run(synthesis_prompt)
return final_result.data
def _format_results_for_synthesis(self, subtasks: List[SubTask], results: Dict[str, Any]) -> str:
"""Format results for the synthesis step."""
formatted = []
for subtask in subtasks:
result = results.get(subtask.task_id, "No result")
formatted.append(f"- {subtask.description}: {result}")
return "\n".join(formatted)
# Example: Research orchestrator
research_orchestrator = OrchestratorSystem(
orchestrator_prompt="""Break down this research task into subtasks:
Task: {task}
Available workers: {available_workers}
Create a plan with specific subtasks that can be executed by the available workers.""",
worker_prompts={
"search": "You are a search specialist. Find relevant information based on the given query.",
"analyze": "You are an analysis expert. Analyze the provided information and extract insights.",
"synthesize": "You are a synthesis specialist. Combine multiple pieces of information coherently.",
"fact_check": "You are a fact checker. Verify the accuracy of the provided claims."
}
)
```
This pattern is incredibly powerful for complex, open-ended tasks where you can’t predict all the steps in advance. The orchestrator acts like a project manager who breaks down requirements and delegates to specialists.
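A minimal usage sketch (the research question and context values are placeholders):

```python
import asyncio


async def main() -> None:
    outcome = await research_orchestrator.process_task(
        "Summarize the current evidence on retrieval-augmented generation for QA",
        context={"audience": "engineering team"},
    )
    print(outcome["final_result"])


asyncio.run(main())
```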
5. Evaluator-Optimizer
The evaluator-optimizer pattern creates a feedback loop where one LLM generates content and another critiques it, leading to iterative improvement. It’s like having a writer and an editor working together, each cycle producing better results.
```python
from pydantic import BaseModel, Field
from typing import Any, Dict, List, Literal
from pydantic_ai import Agent
class EvaluationCriteria(BaseModel):
"""Criteria for evaluating generated content."""
name: str = Field(..., description="Name of this criterion")
description: str = Field(..., description="What this criterion measures")
threshold: float = Field(..., ge=0, le=1, description="Minimum score to pass")
weight: float = Field(1.0, description="Importance weight for this criterion")
class Evaluation(BaseModel):
"""Structured evaluation of generated content."""
status: Literal["PASS", "NEEDS_IMPROVEMENT", "FAIL"] = Field(..., description="Overall status")
criteria_scores: Dict[str, float] = Field(..., description="Individual criterion scores")
feedback: str = Field(..., description="Specific, actionable feedback for improvement")
strengths: List[str] = Field(default_factory=list, description="What worked well")
improvements: List[str] = Field(default_factory=list, description="Specific improvements needed")
@property
def passed(self) -> bool:
"""Check if the evaluation passed all criteria."""
return self.status == "PASS"
@property
def overall_score(self) -> float:
"""Calculate weighted overall score."""
if not self.criteria_scores:
return 0.0
return sum(score for score in self.criteria_scores.values()) / len(self.criteria_scores)
class EvaluatorOptimizerSystem:
"""Iterative improvement through evaluation and optimization."""
def __init__(
self,
optimizer_prompt: str,
evaluator_prompt: str,
criteria: List[EvaluationCriteria],
max_iterations: int = 5,
optimizer_model: str = "openai:gpt-4o",
evaluator_model: str = "anthropic:claude-3-sonnet-20240229"
):
self.optimizer_prompt = optimizer_prompt
self.evaluator_prompt = evaluator_prompt
self.criteria = criteria
self.max_iterations = max_iterations
# Different models for different strengths
self.optimizer = Agent(optimizer_model, system_prompt="You are a content creator focused on quality.")
self.evaluator = Agent(
evaluator_model,
result_type=Evaluation,
system_prompt="You are a critical evaluator. Provide honest, constructive feedback."
)
async def optimize(self, task: str, context: Dict[str, str] = None) -> Dict[str, Any]:
"""Run the optimization loop."""
context = context or {}
history = []
# Initial generation
content = await self._generate_initial(task, context)
for iteration in range(self.max_iterations):
# Evaluate current content
evaluation = await self._evaluate_content(task, content, context)
# Track history
history.append({
"iteration": iteration,
"content": content,
"evaluation": evaluation.dict(),
"score": evaluation.overall_score
})
# Check if we're done
if evaluation.passed:
break
# Generate improved version
content = await self._improve_content(task, content, evaluation, context)
return {
"task": task,
"final_content": content,
"iterations": len(history),
"passed": evaluation.passed,
"final_score": evaluation.overall_score,
"history": history
}
async def _generate_initial(self, task: str, context: Dict[str, str]) -> str:
"""Generate the initial content."""
prompt = self.optimizer_prompt.format(
task=task,
**context
)
response = await self.optimizer.run(prompt)
return response.data
async def _evaluate_content(self, task: str, content: str, context: Dict[str, str]) -> Evaluation:
"""Evaluate content against criteria."""
criteria_text = "\n".join([
f"- {c.name}: {c.description} (minimum score: {c.threshold})"
for c in self.criteria
])
prompt = self.evaluator_prompt.format(
task=task,
content=content,
criteria=criteria_text,
**context
)
response = await self.evaluator.run(prompt)
return response.data
async def _improve_content(self, task: str, content: str, evaluation: Evaluation, context: Dict[str, str]) -> str:
"""Generate improved content based on feedback."""
improvement_prompt = f"""
Task: {task}
Previous attempt:
{content}
Evaluation feedback:
{evaluation.feedback}
Specific improvements needed:
{chr(10).join(f'- {imp}' for imp in evaluation.improvements)}
Generate an improved version that addresses all the feedback while maintaining the strengths.
"""
response = await self.optimizer.run(improvement_prompt)
return response.data
# Example: Blog post optimizer
blog_optimizer = EvaluatorOptimizerSystem(
optimizer_prompt="Write a blog post about: {task}\n\nTone: {tone}\nAudience: {audience}",
evaluator_prompt="""Evaluate this blog post:
Task: {task}
Content: {content}
Criteria:
{criteria}
Provide specific, actionable feedback for improvement.""",
criteria=[
EvaluationCriteria(
name="clarity",
description="Ideas are expressed clearly and logically",
threshold=0.8
),
EvaluationCriteria(
name="engagement",
description="Content is engaging and holds reader attention",
threshold=0.7
),
EvaluationCriteria(
name="accuracy",
description="Information is accurate and well-researched",
threshold=0.9
)
]
)
```
What makes this pattern special is the structured evaluation. By defining clear criteria upfront, you get consistent, measurable improvements rather than vague “make it better” feedback.
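Because the optimizer prompt template expects {tone} and {audience}, those values are passed through the context. A minimal usage sketch:

```python
import asyncio


async def main() -> None:
    outcome = await blog_optimizer.optimize(
        "type-safe AI agents with Pydantic AI",
        context={"tone": "practical and upbeat", "audience": "backend developers"},
    )
    print(f"Passed: {outcome['passed']} after {outcome['iterations']} iteration(s)")
    print(outcome["final_content"])


asyncio.run(main())
```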
Figure 3: Comparison of Agent Patterns – This diagram compares the five agent patterns, showing their structure and data flow. Notice how complexity increases from simple chains to dynamic orchestration and iterative optimization.
Pydantic AI’s Core Features for Agent Development
Schema Inference and Validation
Let’s talk about what makes Pydantic AI special. At its core, it’s all about turning the unpredictable outputs of LLMs into reliable, structured data. This isn’t just nice to have—it’s essential for building production systems.
```python
from pydantic import BaseModel, Field, validator
from datetime import datetime
from typing import List, Optional
from pydantic_ai import Agent
# Define what we expect from our agent
class CustomerInquiry(BaseModel):
"""Structured representation of a customer inquiry."""
category: str = Field(description="Type of inquiry: technical, billing, or general")
urgency: int = Field(ge=1, le=5, description="Urgency level from 1 (low) to 5 (critical)")
summary: str = Field(description="Brief summary of the issue")
customer_sentiment: float = Field(ge=-1, le=1, description="Sentiment score from -1 (angry) to 1 (happy)")
requires_human: bool = Field(description="Whether this needs human intervention")
suggested_actions: List[str] = Field(description="Recommended next steps")
@validator('category')
def validate_category(cls, v):
valid_categories = ['technical', 'billing', 'general']
if v.lower() not in valid_categories:
raise ValueError(f"Category must be one of {valid_categories}")
return v.lower()
@validator('suggested_actions')
def validate_actions(cls, v):
if not v:
raise ValueError("At least one suggested action is required")
return v
# Create an agent that outputs structured data
support_classifier = Agent(
'openai:gpt-4o',
result_type=CustomerInquiry,
system_prompt="""You are a customer support classifier. Analyze customer messages and
extract structured information to help route and prioritize support tickets."""
)
# Use it with confidence
async def process_customer_message(message: str) -> CustomerInquiry:
result = await support_classifier.run(
f"Analyze this customer message: {message}"
)
return result.data # This is guaranteed to be a valid CustomerInquiry
# Example usage
inquiry = await process_customer_message("My internet has been down for 3 days and I'm furious!")
print(f"Category: {inquiry.category}")
print(f"Urgency: {inquiry.urgency}/5")
print(f"Needs human: {inquiry.requires_human}")
```
The magic here is that you’re guaranteed to get valid data back. No more parsing JSON and hoping the LLM included all the fields. No more type mismatches crashing your application. Just clean, validated data every time.
Function Calling and Tool Integration
Tools are what transform an LLM from a chatbot into an agent that can actually do things. Pydantic AI makes tool integration delightfully simple:
```python
from dataclasses import dataclass
from pydantic_ai import Agent, RunContext
from typing import Dict, List
import asyncio
@dataclass
class Dependencies:
"""Dependencies that will be injected into tool calls."""
database: object # Your database connection
api_client: object # External API client
user_id: str # Current user context
# Create an agent with dependencies
agent = Agent(
'openai:gpt-4o',
deps_type=Dependencies,
system_prompt='You are a helpful assistant with access to user data and external services.'
)
@agent.tool
async def get_user_orders(ctx: RunContext[Dependencies]) -> List[Dict]:
"""Fetch user's order history from the database."""
# Note how we access injected dependencies via ctx.deps
orders = await ctx.deps.database.get_orders(ctx.deps.user_id)
return [
{
"order_id": order.id,
"date": order.date.isoformat(),
"total": float(order.total),
"status": order.status
}
for order in orders
]
@agent.tool
async def check_shipping_status(ctx: RunContext[Dependencies], order_id: str) -> Dict:
"""Check shipping status with external shipping API."""
# Tools can take parameters and access dependencies
tracking = await ctx.deps.api_client.get_tracking(order_id)
return {
"order_id": order_id,
"status": tracking.status,
"location": tracking.current_location,
"estimated_delivery": tracking.eta.isoformat() if tracking.eta else None
}
@agent.tool
def calculate_loyalty_points(ctx: RunContext[Dependencies], order_total: float) -> int:
"""Calculate loyalty points for an order (synchronous tools work too!)."""
# Business logic can be encapsulated in tools
points_rate = 10 # 10 points per dollar
bonus_multiplier = 2 if order_total > 100 else 1
return int(order_total * points_rate * bonus_multiplier)
# Use the agent with injected dependencies
async def handle_customer_query(query: str, user_id: str):
deps = Dependencies(
database=db_connection,
api_client=shipping_api,
user_id=user_id
)
result = await agent.run(query, deps=deps)
return result.data
# Example: The agent can now use tools intelligently
response = await handle_customer_query(
"What's the status of my recent orders and how many points did I earn?",
user_id="user123"
)
```
What I love about this approach is how it separates concerns. Your tools handle the “how” (database queries, API calls), while the LLM handles the “what” and “why” (understanding intent, choosing tools, formatting responses).
Dependency Injection for Testability
One of Pydantic AI’s killer features is its dependency injection system. This makes testing a breeze and keeps your code modular:
```python
from pydantic_ai.models.test import TestModel
from pydantic_ai.models.function import FunctionModel, AgentInfo
from pydantic_ai.messages import ModelMessage, ModelResponse, TextPart, ToolCallPart
# Production code stays the same
order_agent = Agent(
'openai:gpt-4o',
deps_type=Dependencies,
system_prompt='You help customers with their orders.'
)
@order_agent.tool
async def get_order_details(ctx: RunContext[Dependencies], order_id: str) -> Dict:
"""Fetch order details from database."""
return await ctx.deps.database.get_order(order_id)
# For testing, we can inject mock dependencies
class MockDatabase:
async def get_order(self, order_id: str) -> Dict:
# Return test data instead of hitting real database
return {
"order_id": order_id,
"status": "shipped",
"items": ["Test Item 1", "Test Item 2"]
}
# Test with mocked dependencies and model
async def test_order_lookup():
test_deps = Dependencies(
database=MockDatabase(),
api_client=None, # Not needed for this test
user_id="test_user"
)
# Use TestModel to avoid API calls
with order_agent.override(model=TestModel()):
result = await order_agent.run(
"What's the status of order ABC123?",
deps=test_deps
)
# Assertions about the result
assert "shipped" in result.data.lower()
# For more complex testing scenarios
async def custom_model_function(messages: list[ModelMessage], info: AgentInfo) -> ModelResponse:
"""Custom function that simulates model responses based on input."""
# The latest message is a ModelRequest; its last part carries the user prompt
user_message = messages[-1].parts[-1].content
if "order" in user_message.lower():
# Simulate the model calling our tool via a ToolCallPart
return ModelResponse(parts=[
TextPart("I'll check that order for you."),
ToolCallPart(tool_name="get_order_details", args={"order_id": "ABC123"})
])
return ModelResponse(parts=[TextPart("How can I help you?")])
# Test with custom model behavior
async def test_complex_interaction():
test_deps = Dependencies(database=MockDatabase(), api_client=None, user_id="test_user")
with order_agent.override(model=FunctionModel(custom_model_function)):
result = await order_agent.run(
"Check order ABC123",
deps=test_deps
)
# Now we can test the full flow including tool calls
```
This separation makes it easy to test your agent logic without expensive LLM calls or external dependencies. You can test edge cases, error handling, and complex interactions with full control.

Figure 4: Pydantic AI Component Diagram – This diagram shows the key components of Pydantic AI and their relationships. Notice how Agents orchestrate the interaction between Models, Tools, and Dependencies, with validation happening at each boundary.
Testing and Monitoring AI Agents
Testing Strategies
Testing AI agents requires a different mindset than traditional software testing. You’re not just testing deterministic functions—you’re testing systems that interact with probabilistic models. Here’s how to do it right:
```python
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel
from pydantic_ai.models.function import FunctionModel, AgentInfo
from pydantic_ai.messages import ModelResponse, TextPart, ToolCallPart
import pytest
from datetime import datetime
# 1. Unit Testing with TestModel
class TestCustomerSupportAgent:
def setup_method(self):
"""Set up test fixtures."""
self.agent = Agent(
'openai:gpt-4o',
system_prompt="You are a helpful customer support agent."
)
def test_basic_response(self):
"""Test that agent responds appropriately to basic queries."""
# TestModel returns predictable responses
with self.agent.override(model=TestModel()):
result = self.agent.run_sync("Hello, I need help")
# Check that we got a response
assert result.data is not None
assert isinstance(result.data, str)
assert len(result.data) > 0
def test_tool_calling(self):
"""Test that agent calls tools correctly."""
tool_called = False
@self.agent.tool_plain
def check_order_status(order_id: str) -> str:
nonlocal tool_called
tool_called = True
return f"Order {order_id} is shipped"
# Custom model that always calls our tool
async def model_function(messages, info):
return ModelResponse(parts=[
TextPart("Let me check that order"),
ToolCallPart(tool_name="check_order_status", args={"order_id": "123"})
])
with self.agent.override(model=FunctionModel(model_function)):
result = self.agent.run_sync("Check order 123")
assert tool_called
assert "shipped" in result.data
# 2. Integration Testing with Mocked Services
class TestIntegrationScenarios:
@pytest.mark.asyncio
async def test_multi_tool_workflow(self):
"""Test complex workflows involving multiple tools."""
agent = Agent(
'openai:gpt-4o',
deps_type=Dependencies
)
calls_made = []
@agent.tool
async def search_products(ctx: RunContext[Dependencies], query: str) -> List[Dict]:
calls_made.append(('search', query))
return [
{"id": "1", "name": "Product A", "price": 99.99},
{"id": "2", "name": "Product B", "price": 149.99}
]
@agent.tool
async def check_inventory(ctx: RunContext[Dependencies], product_id: str) -> bool:
calls_made.append(('inventory', product_id))
return True
@agent.tool
async def calculate_shipping(ctx: RunContext[Dependencies], product_id: str, zip_code: str) -> float:
calls_made.append(('shipping', product_id, zip_code))
return 9.99
# Mock the model to execute a specific workflow
async def workflow_model(messages, info):
# This simulates the LLM orchestrating multiple tool calls
return ModelResponse(parts=[
TextPart("I'll help you find products and check shipping"),
ToolCallPart(tool_name="search_products", args={"query": "laptop"}),
ToolCallPart(tool_name="check_inventory", args={"product_id": "1"}),
ToolCallPart(tool_name="calculate_shipping", args={"product_id": "1", "zip_code": "10001"})
])
test_deps = Dependencies(
database=None,
api_client=None,
user_id="test"
)
with agent.override(model=FunctionModel(workflow_model)):
result = await agent.run(
"Find laptops and calculate shipping to 10001",
deps=test_deps
)
# Verify the workflow executed correctly
assert len(calls_made) == 3
assert calls_made[0][0] == 'search'
assert calls_made[1][0] == 'inventory'
assert calls_made[2][0] == 'shipping'
# 3. End-to-End Testing with Recorded Responses
class TestEndToEnd:
def test_customer_journey(self):
"""Test a complete customer interaction journey."""
# For E2E tests, you might use recorded real LLM responses
recorded_responses = {
"greeting": "Hello! How can I help you today?",
"order_query": "I'll check your order status right away.",
"followup": "Is there anything else I can help you with?"
}
agent = Agent('openai:gpt-4o')
# Override with recorded responses
response_index = 0
def get_next_response(messages, info):
nonlocal response_index
responses = list(recorded_responses.values())
response = responses[response_index % len(responses)]
response_index += 1
return ModelResponse(parts=[TextPart(response)])
with agent.override(model=FunctionModel(get_next_response)):
# Simulate customer journey
response1 = agent.run_sync("Hi")
assert "Hello" in response1.data
response2 = agent.run_sync("What's my order status?")
assert "check" in response2.data
response3 = agent.run_sync("Thanks!")
assert "else" in response3.data
```
Monitoring and Observability
In production, you need to know what your agents are doing. Pydantic AI integrates beautifully with Pydantic Logfire for comprehensive monitoring:
```python
import logfire
from pydantic_ai import Agent
from datetime import datetime
from typing import Dict
# Configure Logfire for your application
logfire.configure()
# Create an instrumented agent
agent = Agent(
'openai:gpt-4o',
system_prompt='You are a helpful assistant.',
instrument=True # Enable automatic instrumentation
)
# Custom metrics tracking
class AgentMetrics:
def __init__(self):
self.reset_daily_metrics()
def reset_daily_metrics(self):
self.metrics = {
"total_requests": 0,
"successful_requests": 0,
"failed_requests": 0,
"tool_calls": {},
"response_times": [],
"token_usage": {
"prompt_tokens": 0,
"completion_tokens": 0
}
}
def track_request(self, duration: float, success: bool, tokens: Dict):
self.metrics["total_requests"] += 1
if success:
self.metrics["successful_requests"] += 1
else:
self.metrics["failed_requests"] += 1
self.metrics["response_times"].append(duration)
self.metrics["token_usage"]["prompt_tokens"] += tokens.get("prompt_tokens", 0)
self.metrics["token_usage"]["completion_tokens"] += tokens.get("completion_tokens", 0)
def track_tool_call(self, tool_name: str):
if tool_name not in self.metrics["tool_calls"]:
self.metrics["tool_calls"][tool_name] = 0
self.metrics["tool_calls"][tool_name] += 1
def get_summary(self) -> Dict:
response_times = self.metrics["response_times"]
return {
"total_requests": self.metrics["total_requests"],
"success_rate": self.metrics["successful_requests"] / max(self.metrics["total_requests"], 1),
"avg_response_time": sum(response_times) / len(response_times) if response_times else 0,
"p95_response_time": sorted(response_times)[int(len(response_times) * 0.95)] if response_times else 0,
"tool_usage": self.metrics["tool_calls"],
"token_usage": self.metrics["token_usage"],
"estimated_cost": self._estimate_cost()
}
def _estimate_cost(self) -> float:
# Rough cost estimation (adjust based on your model)
prompt_cost = 0.01 / 1000 # $0.01 per 1K tokens
completion_cost = 0.03 / 1000 # $0.03 per 1K tokens
return (
self.metrics["token_usage"]["prompt_tokens"] * prompt_cost +
self.metrics["token_usage"]["completion_tokens"] * completion_cost
)
# Use in production with monitoring
metrics = AgentMetrics()
async def monitored_agent_call(query: str) -> Dict:
start_time = datetime.now()
try:
# Log the request
logfire.info("Agent request started", query=query)
# Execute the agent
result = await agent.run(query)
# Track success
duration = (datetime.now() - start_time).total_seconds()
usage = result.usage()  # token usage is exposed via a method on the run result
token_counts = {
"prompt_tokens": usage.request_tokens or 0,
"completion_tokens": usage.response_tokens or 0
}
metrics.track_request(duration, True, token_counts)
# Log successful completion
logfire.info(
"Agent request completed",
duration=duration,
tokens_used=token_counts
)
return {
"success": True,
"data": result.data,
"duration": duration
}
except Exception as e:
# Track failure
duration = (datetime.now() - start_time).total_seconds()
metrics.track_request(duration, False, {})
# Log error with context
logfire.error(
"Agent request failed",
error=str(e),
query=query,
duration=duration
)
return {
"success": False,
"error": str(e),
"duration": duration
}
# Periodic metrics reporting
async def report_metrics():
summary = metrics.get_summary()
logfire.info("Agent metrics summary", **summary)
# Alert on concerning metrics
if summary["success_rate"] < 0.95:
logfire.warning("Low success rate detected", success_rate=summary["success_rate"])
if summary["estimated_cost"] > 100: # $100
logfire.warning("High token usage cost", cost=summary["estimated_cost"])
```
Key metrics to monitor in production:
- Token usage and costs: Track spending and identify expensive queries
- Response times: Monitor latency and identify performance issues
- Tool execution patterns: Understand which tools are used most
- Error rates and types: Catch issues before they impact users
- Conversation flows: Analyze how users interact with your agent
- Validation failures: Identify when LLM outputs don’t match schemas
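Validation failures in particular are easy to overlook. A minimal sketch of tracking them separately, assuming the `UnexpectedModelBehavior` exception that Pydantic AI raises when output-validation retries are exhausted (the `structured_agent` name is a placeholder):

```python
import logfire
from pydantic_ai.exceptions import UnexpectedModelBehavior


async def run_with_validation_tracking(structured_agent, query: str):
    """Run an agent and record schema-validation failures as their own metric."""
    try:
        result = await structured_agent.run(query)
        return result.data
    except UnexpectedModelBehavior as exc:
        # Raised when the model keeps producing output that fails the result schema
        logfire.error("Schema validation failure", query=query, error=str(exc))
        raise
```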
Practical Applications
Let’s look at how organizations are using these patterns in the real world:
E-commerce Automation
E-commerce companies are leveraging Pydantic AI agents to create sophisticated automation systems that handle everything from customer inquiries to order management:
```python
# Customer Support RAG Agent
class ProductKnowledge(BaseModel):
product_id: str
features: List[str]
price: float
availability: bool
similar_products: List[str]
support_agent = Agent(
'openai:gpt-4o',
result_type=ProductKnowledge,
system_prompt="""You are an e-commerce support specialist. Use the product database
to answer customer questions accurately and suggest alternatives when needed."""
)
@support_agent.tool_plain  # this tool takes no RunContext
async def search_products(query: str) -> List[Dict]:
# RAG implementation to search product database
results = await vector_store.search(query, top_k=5)
return [doc.to_dict() for doc in results]
# Order Management Agent
order_agent = Agent(
'openai:gpt-4o',
deps_type=OrderSystemDeps,
system_prompt="You help customers manage their orders, including updates and returns."
)
@order_agent.tool
async def update_shipping_address(ctx: RunContext[OrderSystemDeps], order_id: str, new_address: str) -> bool:
# Validate order status allows address change
order = await ctx.deps.db.get_order(order_id)
if order.status not in ['pending', 'processing']:
raise ValueError("Cannot update address after order ships")
# Update address
return await ctx.deps.db.update_order_address(order_id, new_address)
```
The results speak for themselves:
- Response times reduced from hours to seconds
- 24/7 availability without staffing increases
- Consistent application of business rules
- 87% of inquiries resolved without human intervention
Research Assistant Systems
Research organizations are building sophisticated agents that can gather, analyze, and synthesize information from multiple sources:
```python
# Multi-stage research workflow
research_orchestrator = OrchestratorSystem(
orchestrator_prompt="""Break down this research question into specific sub-questions
that can be investigated independently. Consider:
- What information needs to be gathered?
- What sources should be consulted?
- What analysis is required?
- How should findings be synthesized?""",
worker_prompts={
"literature_search": "Search academic literature for relevant papers on the given topic.",
"data_extraction": "Extract key findings and data from the provided sources.",
"statistical_analysis": "Perform statistical analysis on the extracted data.",
"synthesis": "Synthesize findings into a coherent narrative with citations.",
"fact_checking": "Verify claims and check for contradictions in the findings."
}
)
# Example usage for research task
async def conduct_research(topic: str) -> ResearchReport:
# The orchestrator dynamically creates a research plan
result = await research_orchestrator.process_task(
f"Research the effectiveness of {topic} including recent studies and meta-analyses",
context={
"output_format": "academic_paper",
"citation_style": "APA",
"max_sources": 50
}
)
return ResearchReport(
topic=topic,
sections=result["results"],
synthesis=result["final_result"],
sources=extract_sources(result)
)
```
These systems have transformed research workflows by:
- Reducing literature review time by 75%
- Identifying connections between disparate sources
- Maintaining consistent citation formats
- Enabling researchers to focus on analysis rather than data gathering
Benefits and Challenges
Advantages of AI Agent Frameworks
Working with Pydantic AI and structured agent patterns offers compelling benefits:
Maintainability: Type-driven design makes code self-documenting and easier to understand. When you see a Pydantic model, you know exactly what data flows through your system.
Reliability: Validation catches errors before they cause problems. Instead of hoping the LLM returns the right format, you guarantee it.
Flexibility: The pattern-based approach means you can start simple and add complexity as needed. Not every problem needs an orchestrator-workers setup.
Testability: Dependency injection and model overrides make testing straightforward. You can test complex workflows without burning through API credits.
Performance: Patterns like parallelization can dramatically improve response times. Why wait for sequential API calls when you can parallelize?
Limitations and Challenges
Let’s be honest about the challenges too:
Learning Curve: If you’re coming from simple prompt engineering, the type-driven approach requires a mindset shift. It’s worth it, but there’s definitely a ramp-up period.
Debugging Complexity: When an agent with multiple patterns misbehaves, tracking down the issue can be like detective work. Good logging is essential.
Latency Considerations: Complex patterns like evaluator-optimizer can require multiple LLM calls. For real-time applications, you need to balance sophistication with speed.
Cost Management: More sophisticated patterns often mean more API calls. That evaluator-optimizer loop can get expensive if you’re not careful with limits.
Future Directions
Emerging Trends
The field of AI agent frameworks is evolving rapidly. Here’s what’s on the horizon:
Standardization Efforts: We’re seeing movement toward industry standards for agent patterns and interfaces. This will make it easier to share components and best practices.
Integration with RAG Systems: Tighter coupling between agents and retrieval systems is coming, making it easier to build knowledge-grounded agents.
Multi-Modal Agents: As vision and audio models improve, expect agent frameworks to handle more than just text.
Autonomous Agents: While current agents require significant hand-holding, future versions will have more autonomy while maintaining safety constraints.
Practical Takeaways
If you’re ready to level up your AI agent development, here are my recommendations:
Start with clear requirements: Before jumping into code, map out what your agent needs to do. Which patterns fit your use case?
Choose the right pattern: Don’t use orchestrator-workers for a simple Q&A bot. Match the pattern complexity to your problem complexity.
Implement robust testing: Use TestModel and FunctionModel from day one. Testing after the fact is painful with agents.
Monitor everything: Set up comprehensive logging and metrics. You’ll thank yourself when debugging production issues.
Iterate incrementally: Start with a simple pattern and add sophistication based on real user needs. Premature optimization is still the root of all evil.
Conclusion
AI agent frameworks represent a crucial evolution in how we build LLM-powered applications. By implementing Anthropic’s blueprint patterns with Pydantic AI, we can create systems that are both powerful and reliable—no longer do we have to choose between flexibility and predictability.
The pattern-based approach provides a mental model for thinking about agent design. Instead of asking “what prompt should I use?”, we can ask “what pattern fits this problem?” This shift from prompt engineering to agent engineering is fundamental to building production-ready AI systems.
As we continue to push the boundaries of what’s possible with LLMs, frameworks like Pydantic AI will become increasingly important. They provide the structure and guardrails that allow us to build complex systems with confidence. The future of AI development isn’t just about better models—it’s about better ways to harness those models reliably and effectively.
References
[1] Anthropic. (2024). "Building Effective AI Agents: A Blueprint." Anthropic Research Blog.
[2] Pydantic. (2024). Pydantic AI Documentation. Retrieved from https://ai.pydantic.dev
[3] Colvin, S. (2025). “Pydantic AI: An Agent Framework for Building GenAI Applications.” Pydantic Official Blog.
[4] Layton, D. (2025). “Pydantic AI Agents Made Simpler.” LinkedIn Pulse.
[5] Gupta, A. (2025). “Technical Benefits of Pydantic AI for Implementing AI Agent Patterns.” ProjectPro.
[6] Mittal, S. (2025). “Pydantic AI vs Other Agent Frameworks: A Comparative Analysis.” AI Framework Reviews.
[7] Chen, L. (2025). “Best Practices for Reliable and Maintainable AI Agent Systems.” Logfire Documentation.
[8] Pydantic. (2025). “Testing and Evaluation in Pydantic AI.” Retrieved from https://ai.pydantic.dev/testing-evals/
[9] Pydantic. (2025). “Logfire Integration for Monitoring.” Retrieved from https://ai.pydantic.dev/logfire/
[10] Saptak, N. (2025). “Building Powerful AI Agents with Pydantic AI and MCP Servers.” AI Engineering Blog.