Tool-use agent · production
Production LLM agent with prompt caching, tool routing, structured output, per-tenant billing meters, and a reviewer loop that catches bad calls before they ship. Vendor-agnostic abstraction over multiple model providers.
The problem
The team needed an LLM agent that handled real production traffic across multiple tenants. Earlier prototypes lacked structured output validation, had no per-tenant billing meters, and would happily call the wrong tool with the wrong arguments when prompts drifted. There was no reviewer loop to catch bad calls before they shipped customer-facing actions. Cost was unpredictable, and a single misbehaving tenant could starve the rest of capacity.
The approach
We built a tool-use agent with prompt caching, typed tool catalogs, structured-output schemas validated on every call, and a vendor-agnostic abstraction over the major model providers. A reviewer loop runs a smaller deterministic check on each tool call before it ships; failures route to a dead-letter queue with full trace context. Per-tenant billing meters track token usage, tool calls, and resolution outcomes. Provider swaps are a config change, not a rewrite.
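The vendor-agnostic abstraction can be sketched as a small provider interface behind a registry keyed by config. The names and shapes below are illustrative assumptions, not the production API: the point is that the agent only ever sees the interface, so swapping vendors is a registry lookup driven by configuration.

```typescript
// Illustrative provider interface; names are assumptions, not quadevs's real API.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

interface ModelProvider {
  id: string;
  complete(prompt: string): Promise<{ text: string; toolCalls: ToolCall[] }>;
}

// The registry maps a config key to a concrete adapter. Swapping providers
// means changing the key in config, not rewriting agent code.
const registry = new Map<string, ModelProvider>();

function registerProvider(p: ModelProvider): void {
  registry.set(p.id, p);
}

function getProvider(configKey: string): ModelProvider {
  const p = registry.get(configKey);
  if (!p) throw new Error(`unknown provider: ${configKey}`);
  return p;
}

// A stub adapter standing in for a real vendor SDK.
registerProvider({
  id: "stub",
  async complete(prompt: string) {
    return { text: `echo: ${prompt}`, toolCalls: [] };
  },
});
```

In a setup like this, each real adapter wraps one vendor SDK and normalizes its response into the shared shape, so retries, tracing, and billing hooks live above the adapter layer.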
Stack and engineering choices
- TypeScript agent runtime
- Postgres trace store
- Vendor-agnostic model abstraction
- Typed tool catalogs
- Structured-output JSON schemas
- Reviewer loop with rollback
- Per-tenant billing meters
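The typed tool catalog and reviewer check from the list above can be sketched together: every proposed call is validated against a catalog entry before execution, and failures land in a dead-letter structure with a reason. The tool names, parameter shapes, and queue representation here are assumptions for illustration, not the production code.

```typescript
// Sketch of the reviewer stage; all names and shapes are illustrative.
type JsonType = "string" | "number" | "boolean";

interface ToolSpec {
  name: string;
  params: Record<string, JsonType>; // required parameter -> expected type
}

interface ProposedCall {
  tool: string;
  args: Record<string, unknown>;
}

// Hypothetical catalog entry: a refund tool with two required parameters.
const catalog = new Map<string, ToolSpec>([
  ["refund", { name: "refund", params: { orderId: "string", amountCents: "number" } }],
]);

// Stand-in for the dead-letter queue; production would persist trace context.
const deadLetter: { call: ProposedCall; reason: string }[] = [];

// Returns true if the call passes review; otherwise records it with a
// reason so an operator can inspect and replay it.
function review(call: ProposedCall): boolean {
  const spec = catalog.get(call.tool);
  if (!spec) {
    deadLetter.push({ call, reason: `unknown tool: ${call.tool}` });
    return false;
  }
  for (const [param, expected] of Object.entries(spec.params)) {
    if (typeof call.args[param] !== expected) {
      deadLetter.push({ call, reason: `bad argument: ${param}` });
      return false;
    }
  }
  return true;
}
```

A deterministic check like this is cheap enough to run on every call, which is what lets bad calls fail at the reviewer stage instead of in a customer-facing system.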
Outcome
The agent runs in production across multiple tenants with predictable latency and cost. Bad tool calls are caught at the reviewer stage instead of in customer-facing systems. Switching model providers is a config change. Per-tenant billing is accurate enough to invoice from.
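A per-tenant meter accurate enough to invoice from reduces to an accumulator keyed by tenant, summed per request. The fields and example rates below are illustrative assumptions, not quadevs's actual pricing or schema.

```typescript
// Minimal per-tenant usage meter; field names and rates are illustrative.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  toolCalls: number;
}

const meters = new Map<string, Usage>();

// Accumulate one request's usage onto the tenant's running totals.
function record(tenantId: string, u: Usage): void {
  const cur = meters.get(tenantId) ?? { inputTokens: 0, outputTokens: 0, toolCalls: 0 };
  meters.set(tenantId, {
    inputTokens: cur.inputTokens + u.inputTokens,
    outputTokens: cur.outputTokens + u.outputTokens,
    toolCalls: cur.toolCalls + u.toolCalls,
  });
}

// Invoice total in cents under assumed example rates per 1K tokens.
function invoiceCents(tenantId: string, centsPerKIn = 1, centsPerKOut = 3): number {
  const u = meters.get(tenantId);
  if (!u) return 0;
  return Math.ceil((u.inputTokens * centsPerKIn + u.outputTokens * centsPerKOut) / 1000);
}
```

Keeping the meter keyed by tenant at the point of each model and tool call is also what makes per-tenant isolation observable: a runaway tenant shows up in its own meter before it affects anyone else.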
See more of quadevs's production AI agent work across other engagements with a similar shape.
Have a project that overlaps this work?
Send a one-paragraph brief. We reply within one business day.
hello@quadevs.com