Production AI Systems, Not Prototypes

Dotzlaw Consulting audits business workflows, identifies where Claude and agentic AI deliver the highest ROI, and builds the systems in-house. Every engagement ships against real data.

Our methodology is documented across 74 technical articles in 7 series on this site, so prospective clients can evaluate the depth of the work before the first conversation.

Contact us →

How We Engage

Three tiers, each a concrete deliverable with a fixed scope and a written statement of work. Most clients start with a Sprint, advance to an Audit when the opportunity set justifies it, and commission a Pilot once the target workflow is clear. Investment is discussed during scoping.

Start here

AI Opportunity Sprint

Approximately 2 weeks

A short diagnostic engagement for leaders who know AI should be part of operations but don't yet know where it delivers the highest ROI. Two weeks inside your business; a written recommendation you can act on with or without us.

Deliverables
  • Readiness scorecard across data, infrastructure, team skills, and workflow maturity
  • Top five ranked AI opportunities with effort, cost, and payback period estimates
  • ROI estimates grounded in your actual workflow economics, not industry benchmarks
  • 90-day roadmap with go / no-go decision points

You can take this document and build the systems yourself. Some clients do. Most conclude the scorecard alone is worth the engagement and move to an Audit for the highest-ranked opportunity.

AI Workflow Audit

Approximately 4 to 6 weeks

A deeper engagement for clients who have identified the target workflow and need a production-ready architecture before committing to a full build. Includes everything in the Sprint plus direct engagement with the operational reality of the target process.

Deliverables
  • Everything in the Sprint
  • Six to ten stakeholder interviews across the workflow
  • Three to five detailed workflow maps documenting current-state and target-state operations
  • Architecture recommendations with specific technology choices, cost models, and integration requirements
  • One working prototype against your actual data, built to demonstrate the approach before you commit to a full Pilot

The prototype is not a demo. It runs against your real data, produces real outputs, and exposes the specific integration challenges a full build would need to address. Most clients who complete an Audit commission the Pilot.

Modernization Pilot

Approximately 8 to 12 weeks

A production build engagement that ships a deployed system your team owns after we leave. Fixed scope against the written specification from the Audit phase.

Deliverables
  • Everything in the Audit
  • Production deployment against your real data at your real scale
  • Custom agent and skill library specific to your domain vocabulary and business rules
  • Team training on the architecture, the prompts, the guardrails, and the operational runbook
  • For agentic builds: self-improving workflow infrastructure that captures patterns from its own failures and feeds them back as new rules

Every Pilot ships with the full codebase, every post-processing fix and prompt rule that went into making it reliable, and the runbook a junior engineer on your team would need to maintain the system after we leave.

Core Capabilities

The technical depth we draw on inside every engagement. We don't sell capabilities a la carte. We sell outcomes, and these are the disciplines we bring to deliver them.

Agentic AI Systems

We design and ship multi-agent systems built for your domain. Specialized agents with file ownership boundaries, deterministic guardrails enforced by hooks rather than prompts, and self-improving workflows that capture patterns from their own failures and turn them back into rules.

Every system we ship is built around the work you actually do, not generic agent templates. We deliver the agent library, the skill library, the hooks that keep them safe, and the runbook your team needs to maintain them.

Validated across 3 production migrations with 10/10 OWASP Agentic Top 10 coverage and an A- grade in independent review.
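As an illustration of the file ownership boundaries mentioned above, here is a minimal sketch of the kind of check a hook could run before an agent writes a file. The agent names, glob patterns, and the may_write helper are hypothetical and are not taken from our shipped agent library.

```python
import posixpath
from fnmatch import fnmatchcase

# Hypothetical ownership map: agent name -> glob patterns it is allowed to write.
OWNERSHIP = {
    "backend-agent": ["src/api/*", "tests/api/*"],
    "frontend-agent": ["src/ui/*"],
}


def may_write(agent: str, path: str) -> bool:
    """Return True only if the normalized path matches a pattern owned by the agent."""
    # Normalize first so "src/api/../ui/App.tsx" cannot slip past the boundary.
    normalized = posixpath.normpath(path)
    patterns = OWNERSHIP.get(agent, [])
    return any(fnmatchcase(normalized, pattern) for pattern in patterns)


if __name__ == "__main__":
    print(may_write("backend-agent", "src/api/routes/users.py"))
    print(may_write("frontend-agent", "src/api/routes/users.py"))
```

Because the check is a plain function over a path, it behaves the same way on every call, which is the property that separates a hook from a prompt instruction.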

Harness Engineering

We build the harness layer that makes agentic systems reliable in production. PreToolUse safety gates with destructive-command pattern detection, PostToolUse quality enforcement (ruff, tsc, Biome, artifact validation), Stop hooks for full test-suite gating, and additionalContext patterns that feed agent self-correction.

The harness is what turns sometimes-works agents into systems with documented, predictable behavior. We apply a clear evolution principle: every rule that needs near-100% compliance gets promoted from a CLAUDE.md instruction to a hook.

Hook-enforced patterns achieve 100% compliance compared to roughly 90% from CLAUDE.md instructions alone, validated across 3 production migrations.
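A minimal sketch of a PreToolUse safety gate with destructive-command pattern detection follows. It assumes the Claude Code hook contract as we understand it: the hook receives a JSON payload on stdin with tool_name and tool_input fields, and exit code 2 blocks the tool call and returns stderr to the agent. Check the current hooks documentation before relying on either detail. The pattern list and the evaluate helper are illustrative, not our production rule set.

```python
import json
import re
import sys

# Illustrative patterns only; a real gate carries a reviewed, project-specific list.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-[a-zA-Z]*[rR][a-zA-Z]*"),                # recursive delete
    re.compile(r"\bgit\s+push\b.*(--force|-f\b)"),                 # forced push
    re.compile(r"\bgit\s+reset\s+--hard\b"),                       # discard local work
    re.compile(r"\bdrop\s+(table|database)\b", re.IGNORECASE),     # destructive SQL
]


def evaluate(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message); exit code 2 is assumed to block the tool call."""
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    for pattern in DESTRUCTIVE_PATTERNS:
        if pattern.search(command):
            return 2, f"Blocked: command matches destructive pattern {pattern.pattern!r}"
    return 0, ""


def main() -> int:
    """Entry point for use as a hook script: read the payload, report, return the code."""
    code, message = evaluate(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code


if __name__ == "__main__":
    # Demo with a hand-written payload instead of real hook input.
    sample = {"tool_name": "Bash", "tool_input": {"command": "rm -rf build/"}}
    print(evaluate(sample)[0])
```

To register this as a real hook, the script would end with sys.exit(main()) instead of the demo. The gate is deliberately conservative: a false positive costs one blocked command, while a false negative can cost data.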

Natural Language Business Intelligence

We build natural language interfaces to your existing data. Plain English questions in, multi-card BI dashboards out, against your real database at your real scale. Two delivery models: drop into your existing Metabase stack as an accelerator, or embed as native React components inside a product you already sell.

The reliability layer is what makes it production: refined prompts, T-SQL generation rules, deterministic post-processing fixes, and a self-correction loop that catches the rare failure before it reaches the user.

100% SQL success rate across 10+ query categories on a 90.5M-row production database, with 6 of 6 dashboard cards rendered on every query.
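The reliability layer described above combines two ideas: deterministic post-processing fixes and a self-correction loop. The sketch below shows one plausible shape for each. The specific fix (rewriting a trailing LIMIT n into T-SQL TOP (n)), the function names, and the fake model and database stand-ins are all assumptions for illustration, not our production rules.

```python
import re
from typing import Callable, Optional


def apply_fixes(sql: str) -> str:
    """Deterministic fix: rewrite a trailing `LIMIT n` into T-SQL `TOP (n)`."""
    match = re.search(r"\s+LIMIT\s+(\d+)\s*;?\s*$", sql, flags=re.IGNORECASE)
    if not match:
        return sql
    body = sql[: match.start()]
    # TOP must follow DISTINCT when DISTINCT is present.
    return re.sub(
        r"^(\s*SELECT(?:\s+DISTINCT)?)\b",
        rf"\1 TOP ({match.group(1)})",
        body,
        count=1,
        flags=re.IGNORECASE,
    )


def answer(
    question: str,
    generate_sql: Callable[[str, Optional[str]], str],
    run_query: Callable[[str], list],
    max_attempts: int = 3,
) -> list:
    """Generate SQL, apply fixes, execute; on failure, retry with the error as feedback."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        sql = apply_fixes(generate_sql(question, feedback))
        try:
            return run_query(sql)
        except Exception as exc:  # the database error becomes the next prompt's context
            feedback = f"Previous SQL failed: {sql!r} with error: {exc}"
    raise RuntimeError(f"No valid SQL after {max_attempts} attempts")


# Hypothetical stand-ins for the model call and the database, for demonstration only.
def fake_generate(question: str, feedback: Optional[str]) -> str:
    column = "name" if feedback else "nme"  # first attempt contains a typo
    return f"SELECT {column} FROM users LIMIT 5"


def fake_run(sql: str) -> list:
    if "nme" in sql:
        raise ValueError("Invalid column name 'nme'")
    return [("Ada",)]


if __name__ == "__main__":
    print(apply_fixes("SELECT name FROM users LIMIT 5"))
    print(answer("Who are our users?", fake_generate, fake_run))
```

Keeping the fixes outside the model call means a known failure class is handled by code on every request, and the retry loop is reserved for errors the rules do not yet cover.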

Agent Security

We harden AI agent systems before they reach production. OWASP Agentic Top 10 coverage, adversarial red-team and blue-team exercises, defense-in-depth architecture, and per-archetype security configurations across project types. Information asymmetry enforced by hooks, not prompts.

Every agentic system we ship runs against real data with real consequences for failure, so security is a constraint on every architectural decision from day one, not a phase near the end.

Our Adversarial Agent Testing Platform reduced the attack success rate from 65% to 47% across two rounds. Key finding: per-vulnerability patching hits a ceiling; architectural remediation is necessary.

Document Retrieval & RAG

We build retrieval pipelines that ground AI in your actual documents. Markdown, HTML, PDF, DOCX, all ingested into a single retrieval layer with hybrid vector and BM25 keyword search, cross-encoder re-ranking, and source attribution so every answer links back to its source.

Built to ingest what you already have rather than demanding a new content taxonomy. Embeds inside any host application as a pure backend integration, with a single iframe abstraction that renders every format without frontend changes.

Multi-format document platform: 51,000+ chunks across 669 documents, hybrid retrieval finishes in 650ms, end-to-end query time under 3 seconds.
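One common way to merge a vector ranking with a BM25 ranking is reciprocal rank fusion (RRF). The sketch below uses RRF purely as an illustration of hybrid retrieval; this page does not state which fusion method the platform uses, and the chunk identifiers and the constant k = 60 are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); documents found by both lists add up.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]  # hypothetical ids from the vector index
    bm25_hits = ["b", "d", "a"]    # hypothetical ids from the keyword index
    print(reciprocal_rank_fusion([vector_hits, bm25_hits]))  # ['b', 'a', 'd', 'c']
```

In a pipeline like the one described, the fused list would then be passed to the cross-encoder for re-ranking, so fusion only needs to produce a good candidate set, not a final order.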

Knowledge Base Setup for Agent Teams

We set up a structured wiki knowledge base that your engineering team and your agents both read. Lifecycle hooks capture session transcripts into daily logs; Python tooling compiles those logs into atomic concept files, lints for broken wikilinks and stale entries, and ingests external sources on demand.

Built to compound: every engineering conversation becomes a concept the next session can read. Complementary to Document Retrieval & RAG. RAG handles end-user retrieval over your existing documents; the KB captures internal knowledge for your engineers and agents, so it survives turnover.

1,000-note vault with 2,757 auto-generated bidirectional links, 5,000 searchable chunks, $1.50 total pipeline cost on the reference implementation. Same system that runs the engineering KB on this site.
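The lint step for broken wikilinks can be pictured with a short sketch. It assumes Obsidian-style [[Target]], [[Target#Heading]], and [[Target|alias]] links and case-insensitive title matching; the in-memory notes dictionary and the broken_wikilinks helper are hypothetical, not the tooling that runs on this site.

```python
import re

# Captures the link target, ignoring an optional "#heading" and an optional "|alias".
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def broken_wikilinks(notes: dict[str, str]) -> dict[str, list[str]]:
    """Map each note title to the sorted link targets that match no existing note."""
    titles = {title.lower() for title in notes}
    report: dict[str, list[str]] = {}
    for title, body in notes.items():
        targets = {target.strip() for target in WIKILINK.findall(body)}
        missing = sorted(t for t in targets if t.lower() not in titles)
        if missing:
            report[title] = missing
    return report


if __name__ == "__main__":
    vault = {
        "Hooks": "See [[Harness]] and the [[Skills|skill library]].",
        "Harness": "Back to [[Hooks#PreToolUse]].",
    }
    print(broken_wikilinks(vault))  # {'Hooks': ['Skills']}
```

A real linter would read the notes from disk and might also flag stale entries by modification date; both are left out here to keep the link check itself visible.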

Web Automation at Scale

We build production-grade Playwright automation for workloads that need to run reliably, cheaply, and without supervision. Self-healing pipelines that detect their own failures and report them back, with autonomous agent-built connectors for new sources.

The same infrastructure patterns apply to other multi-source data collection, enrichment, and monitoring workloads where reliability and per-execution cost matter more than raw speed.

127 production retrievers across 11 platforms processing 58,807 jobs weekly at $5.04 per run. Autonomous agent-built connectors at $0.72 each with 100% build success rate across 69 AI-generated retrievers.

Legacy Modernization and Servoy AI

We modernize legacy enterprise codebases with AI-assisted refactoring. Platform migrations, developer-workflow transformation, AI-powered code conversion, and the regression-testing infrastructure that keeps the migration on track.

Special depth in Servoy: one of the most experienced Servoy practitioners globally, with 20 years of platform-specific expertise spanning Classic Smart Client through modern NG Titanium, 30+ published tutorials establishing dotzlaw.com as a global resource for the platform, and end-to-end AI integration via the Servoy AI Runtime Plugin.

Titanium platform migration demonstrated on a 3,000+ form, 60,000+ component codebase. 70% reduction in technical debt through AI-powered CSS-to-Bootstrap conversion, component migration, form layout modernization, type conversion, and event handler recovery.

Claude Code Infrastructure for Teams

We set up your engineering team's Claude Code toolchain so your own engineers ship faster. Specialized agents with file-ownership boundaries, deterministic hooks (PreToolUse safety gates, PostToolUse quality enforcement, Stop hooks), progressive-disclosure skills, slash commands for routine ops, and CLAUDE.md conventions calibrated to your codebase.

The defining piece is the self-improving loop: reviewer agents harvest patterns from completed work, a skill-builder structures them into reusable skills, and one engineer's hard-earned pattern becomes the whole team's default. Distinct from Agentic AI Systems (AI that ships work for your end-users) and Harness Engineering (the production reliability layer). This is the toolchain your engineering team uses day-to-day.

Validated across 3 production migrations with 0 file conflicts across 18 sessions. 17 skills and 17 hook templates auto-generated per project. Hook-enforced patterns achieve 100% compliance compared to ~90% from prompt instructions alone.

Training

Hands-on workshops grounded in production patterns. We've shipped each of these systems ourselves, so the workshops draw from real reliability work and real failures rather than vendor demos.

Claude Code

Agents with file ownership boundaries, progressive-disclosure skills, deterministic hooks, slash commands, and the Bootstrap Framework methodology for spinning up new project infrastructure in 30 to 55 minutes.

Claude Co-Work

Multi-agent orchestration patterns, harness engineering for reliable runs, and team-wide adoption strategies that prevent the usual fragmentation when ten developers each invent their own conventions.

Skills, Agents, and Workflow Development

Authoring skills, building specialized agents, and chaining them into reliable workflows. Validating outputs, managing a growing skill library, and capturing domain knowledge as reusable infrastructure rather than undocumented knowledge.

GitHub Copilot Workflows

Multi-agent Copilot pipelines: research → architect → developer → reviewer, knowledge harvest workflows where reviewers feed business rules back as new skills, and JIRA-integrated handoff documentation.

Harness Engineering

Deterministic guardrails for production agentic systems: PreToolUse safety gates, PostToolUse quality enforcement, Stop hooks, additionalContext self-correction patterns. The harness evolution principle in practice.

Wiki and Knowledge-Base Patterns

Capturing institutional knowledge in a form agents can use: Obsidian-based wikis with auto-linked notes, progressive-disclosure skill architecture, three-tier loading patterns, and knowledge-harvest workflows that capture patterns from reviewer feedback.

Formats

  • Half-day intensive (single team, focused topic)
  • Two-day workshop (small group, hands-on labs)
  • Async cohort (multi-week, self-paced material plus live office hours)
  • Custom engagement (we co-design with your team)

Case Studies

Four representative production systems we've shipped. Every metric below comes from the live system documented in the linked project write-up.

Text-to-SQL: Native React Dashboards

Same AI pipeline as the Metabase variant, rendered with three npm packages (ECharts, AG Grid, Leaflet) instead of a 493 MB Java BI server. Embeds inside a host React app as components, with no SDK lock-in and no Java runtime.

  • 100% SQL success rate across 10+ query categories
  • 90,544,836 rows, 48 tables, 561 columns
  • 12-25 second end-to-end dashboard generation
Read the full project →

Text-to-SQL: Printable Reports

The same pipeline aimed at a printable Apache Velocity PDF. Plain English description in, multi-section PDF report out in under 60 seconds, with a vision-aware feedback chat that accepts screenshots for layout fixes.

  • Up to 8 sections per report, all bound to live SQL
  • Sub-2-second repeat preview from a saved report
  • 7 page sizes and orientations supported
Read the full project →

Autonomous Job Market Intelligence

127 production Playwright retrievers across 11 ATS platforms processing 58,807 jobs weekly. Self-healing pipeline where autonomous agents build new retrievers as ATS platforms change.

  • $5.04 per weekly run, $0.000086 per processed job
  • 69 AI-generated retrievers at $0.72 each with 100% build success
  • Approximately 80% self-healing repair rate
Read the full project →
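The per-job figure above follows directly from the weekly numbers, as this short check shows. The connector total and the annual figure are derived here for illustration; the annual figure assumes 52 weekly runs at an unchanged cost and does not appear in the project write-up.

```python
weekly_run_cost = 5.04      # USD per weekly run (from the case study)
jobs_per_week = 58_807      # jobs processed per run (from the case study)
connectors_built = 69       # AI-generated retrievers (from the case study)
cost_per_connector = 0.72   # USD per generated retriever (from the case study)

cost_per_job = weekly_run_cost / jobs_per_week
connector_build_total = connectors_built * cost_per_connector
annual_run_cost = weekly_run_cost * 52  # assumption: 52 runs per year, flat cost

print(f"cost per job: ${cost_per_job:.6f}")                       # $0.000086
print(f"connector build total: ${connector_build_total:.2f}")     # $49.68
print(f"annual run cost: ${annual_run_cost:.2f}")                 # $262.08
```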

RAG Document Assistant

Multi-repository document platform ingesting 51,000+ chunks across four file formats. Hybrid vector and BM25 search with cross-encoder re-ranking returns sourced answers in under 3 seconds.

  • 51,000+ chunks across 669 documents in 4 repositories
  • 650ms retrieval pipeline, sub-3-second end-to-end
  • Zero frontend changes between single-repo and multi-repo deployment
Read the full project →

Frequently Asked Questions

Common questions from prospective clients before scoping a Sprint.

Who is Dotzlaw best for?

Small to mid-market and growth-stage companies (typically 10-500 employees) that know AI should be part of their operations but don't yet have an internal framework for prioritization. Especially strong fit for teams with substantial legacy infrastructure where bolt-on AI delivers more value than a greenfield rebuild.

What does a Sprint actually deliver?

A 2-week diagnostic engagement that produces a readiness scorecard, the top five ranked AI opportunities, ROI estimates grounded in your actual workflow economics, and a 90-day implementation roadmap. You can take the document and build the systems yourself, or move into an Audit for the highest-ranked opportunity.

Why three tiers?

Sprint, Audit, and Pilot map onto Discovery → Implementation → Partnership. The Sprint frames the opportunity space, the Audit produces a production-ready architecture with a working prototype against your real data, and the Pilot ships the production system with team training. Most clients move through the sequence rather than jumping straight to a full build.

Do you work with clients outside Canada?

Yes. Engagements run remote by default and we've delivered for clients across North America. In-person work is possible when geography and the engagement format align.

What is the typical tech stack?

Python and FastAPI on the backend, React and TypeScript on the frontend, Anthropic Claude for AI orchestration, Qdrant for vector search, and PostgreSQL or MS SQL Server for relational data. We adapt to whatever you already have, including legacy stacks like Servoy and .NET.

What happens after the Audit? Am I locked into a Pilot?

No. The Audit produces a written specification and a working prototype against your real data; what you do with them is your choice. Some clients commission us to build the Pilot, some take the specification and build it internally, and some pause and revisit later.

How is this different from a Big Four AI consulting engagement?

Big Four engagements typically sell strategy and partner with a separate firm for implementation. We do both, and our methodology is documented across a public technical article series so you can evaluate the depth of the work before the first conversation. The team is smaller and the engineering is hands-on rather than abstracted through delivery partners.

Can you work within our existing infrastructure?

Yes. Every system we ship is designed bolt-on first: read-only connections to your databases, embeddable as a React component or iframe, no vendor lock-in. The Text-to-SQL system, for example, runs as a sidecar service against an unmodified production SQL Server.

What about IP from the published articles?

You approve what is public. We write a technical article documenting the architecture and methodology of every Pilot engagement, with you reviewing the draft before publication. We never publish proprietary business logic, customer data, or anything you flag as confidential.

What about ongoing maintenance after the Pilot?

Every Pilot ships with a runbook a junior engineer on your team can use to maintain the system. Beyond the Pilot, an ongoing Partnership tier is available for continuous optimization as the AI landscape evolves. It is not currently a standard offering, but it is available on request.

Start a Conversation

The fastest way to find out if we're a fit is to send a short description of what you're trying to build and what you've tried so far. We read every message personally and reply within two business days with either a scoped proposal or an honest recommendation to look elsewhere.

Contact us →