Why contract drift matters

Production agents depend on contracts you do not own — OpenAPI specs, REST payloads, and MCP tools/list catalogs. When those contracts change, agents fail in ways uptime monitoring never sees.

This is contract drift: structural changes to an external API or tool surface. It is not infrastructure drift (Terraform plans) or generic HTTP availability. DriftGuard watches the contract itself.

Silent tool failures

Vendor endpoints often return 200 OK while the response shape changes:

  • An MCP server removes search_issues but keeps serving tools/list
  • A REST API drops a field your agent prompt assumes is always present
  • inputSchema tightens: new required properties appear without a version bump

Your agent retries with the old tool name or parses empty fields. Logs show “model confusion,” not “vendor changed the contract.”
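The sketch below makes these failure modes concrete by diffing two MCP `tools/list` results. It assumes each snapshot is the `result` object of a `tools/list` response: a dict with a `tools` array whose entries carry `name` and `inputSchema`. The helper names and the `(severity, tool, detail)` change tuples are hypothetical illustrations. They are not DriftGuard's implementation or its change-record format.

```python
from typing import Any

# Hypothetical change record: (severity, tool_name, detail).
Change = tuple[str, str, str]


def tools_by_name(snapshot: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Index the `tools` array of a tools/list result by tool name."""
    return {tool["name"]: tool for tool in snapshot.get("tools", [])}


def required_props(tool: dict[str, Any]) -> set[str]:
    """Required property names declared in a tool's inputSchema."""
    return set(tool.get("inputSchema", {}).get("required", []))


def diff_tool_catalogs(baseline: dict[str, Any], current: dict[str, Any]) -> list[Change]:
    """Report structural changes between two tools/list snapshots."""
    old, new = tools_by_name(baseline), tools_by_name(current)
    changes: list[Change] = []
    # Removed tool: any agent still calling it will fail.
    for name in sorted(old.keys() - new.keys()):
        changes.append(("breaking", name, "tool removed from catalog"))
    # Newly required property: existing call sites do not send it.
    for name in sorted(old.keys() & new.keys()):
        for prop in sorted(required_props(new[name]) - required_props(old[name])):
            changes.append(("breaking", name, f"new required property: {prop}"))
    # Added tool: additive, so informational only.
    for name in sorted(new.keys() - old.keys()):
        changes.append(("info", name, "tool added"))
    return changes


# Example: both snapshots could have been served with 200 OK.
baseline = {"tools": [
    {"name": "search_issues", "inputSchema": {"type": "object", "required": ["query"]}},
    {"name": "create_issue", "inputSchema": {"type": "object", "required": ["title"]}},
]}
current = {"tools": [
    {"name": "create_issue", "inputSchema": {"type": "object", "required": ["title", "project"]}},
    {"name": "list_projects", "inputSchema": {"type": "object"}},
]}
for change in diff_tool_catalogs(baseline, current):
    print(change)
```

An uptime probe against the same server would report success for both snapshots. A fuller contract diff would also compare property types, enums, and removed properties. This sketch only covers removals, new required properties, and additions.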

Retry spirals

When a tool call fails, orchestrators and agents often retry — sometimes with backoff, sometimes in a tight loop:

  1. Agent calls removed MCP tool → transport or validation error
  2. Framework retries the same tool three to ten times
  3. Token spend spikes; user sees a hung or nonsensical reply
  4. FuseGuard-style loop detection trips only after damage accumulates

Blocking the run before it starts, based on preflight status and policy, stops the spiral at step one. See Pre-run check (preflight).
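The spiral is easy to reproduce. The sketch below makes three comparisons. First, it runs a retry loop that retries every exception. Second, it runs one that treats a removed tool as non-retryable. Third, it runs a local preflight check that never makes the call at all. The exception classes and the functions `naive_retry`, `classified_retry`, and `preflight_missing` are hypothetical, and `max_attempts=5` is an arbitrary value within the three-to-ten range above. In DriftGuard the preflight decision comes from drift status and policy, not from a local set difference.

```python
from collections.abc import Callable


class TransientError(Exception):
    """A failure that may succeed on retry (timeout, 5xx, rate limit)."""


class ContractError(Exception):
    """A deterministic failure: the tool or field the agent expects is gone."""


def naive_retry(call: Callable[[], object], max_attempts: int = 5) -> object:
    """Retry on any exception: the pattern behind retry spirals."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise
    return None


def classified_retry(call: Callable[[], object], max_attempts: int = 5) -> object:
    """Retry transient errors only; a ContractError propagates on the first attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise
    return None


def preflight_missing(live_tools: set[str], required_tools: set[str]) -> list[str]:
    """Required tools absent from the live catalog; non-empty means do not start the run."""
    return sorted(required_tools - live_tools)


# Simulate an agent calling a tool the MCP server has removed.
attempts = 0


def call_removed_tool() -> object:
    global attempts
    attempts += 1
    raise ContractError("unknown tool: search_issues")


calls_made = {}
for strategy in (naive_retry, classified_retry):
    attempts = 0
    try:
        strategy(call_removed_tool)
    except ContractError:
        pass
    calls_made[strategy.__name__] = attempts

print(calls_made)  # {'naive_retry': 5, 'classified_retry': 1}
print(preflight_missing({"create_issue"}, {"search_issues", "create_issue"}))  # ['search_issues']
```

Classifying errors caps the damage at one failed call. A preflight check avoids even that one, which is why blocking before the run is the stronger control.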

Uptime is not enough

| Signal | What it catches | What it misses |
|---|---|---|
| HTTP synthetic checks | Outages, TLS errors, 5xx | Tool removed from MCP catalog; field dropped from JSON |
| Deploy-time fixture diff | Changes you ship in CI | Runtime catalog drift between deploys |
| Manual changelog review | Announced breaking releases | Silent MCP server updates; undocumented schema tweaks |
| Contract watches | Scheduled snapshot diff vs baseline | — |
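Contract watches work because they compare structure, not status codes. The sketch below illustrates what a JSON snapshot diff looks at. It reduces a payload to a map of JSON paths and type names, then compares two such maps. The `$.field` / `[*]` path notation and the functions `shape` and `diff_shapes` are assumptions for this example. They are not the output format of `compare_json` or any DriftGuard API.

```python
import json
from typing import Any


def shape(value: Any, path: str = "$") -> dict[str, str]:
    """Flatten a parsed JSON value into {path: type name}."""
    if isinstance(value, dict):
        paths = {path: "object"}
        for key, child in value.items():
            paths.update(shape(child, f"{path}.{key}"))
        return paths
    if isinstance(value, list):
        paths = {path: "array"}
        # All elements share one item path; on conflicting types the last element wins.
        for item in value:
            paths.update(shape(item, f"{path}[*]"))
        return paths
    if isinstance(value, bool):  # check bool before number: bool is a subclass of int
        return {path: "boolean"}
    if isinstance(value, (int, float)):
        return {path: "number"}
    if isinstance(value, str):
        return {path: "string"}
    return {path: "null"}  # the only remaining json.loads result is None


def diff_shapes(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """List removed paths, type changes, and added paths, in that order."""
    changes = [f"removed {p}" for p in sorted(baseline.keys() - current.keys())]
    for p in sorted(baseline.keys() & current.keys()):
        if baseline[p] != current[p]:
            changes.append(f"type changed {p}: {baseline[p]} -> {current[p]}")
    changes += [f"added {p}" for p in sorted(current.keys() - baseline.keys())]
    return changes


# Both payloads could arrive with 200 OK; only the shape comparison notices.
before = shape(json.loads('{"id": 7, "assignee": {"login": "ana"}, "labels": ["bug"]}'))
after = shape(json.loads('{"id": "7", "labels": ["bug"]}'))
for change in diff_shapes(before, after):
    print(change)
# removed $.assignee
# removed $.assignee.login
# type changed $.id: number -> string
```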

At agent speed, drift is inevitable

Teams add MCP servers weekly. Vendors ship API changes without semver discipline. CI that only diffs checked-in OpenAPI files cannot see live tools/list responses.

DriftGuard closes the loop:

  • Detection — watches on MCP, OpenAPI, and raw JSON schemas
  • Status plane — drift_status per watch and portfolio preflight
  • Structured actions — agentAction hints on every breaking change
  • Policy gates — block new runs, route alerts, draft remediation PRs
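The status plane and the policy gates compose simply. Each watch reports a status, the portfolio takes the worst of them, and policy maps that result to an action. The sketch assumes a three-level ordering (ok, warning, breaking) and the action strings `block_new_runs`, `alert`, and `allow`. All of these names are hypothetical and stand in for whatever `drift_status` values and policy configuration your deployment actually exposes.

```python
from enum import IntEnum


class DriftStatus(IntEnum):
    """Hypothetical per-watch status; a higher value is worse."""
    OK = 0
    WARNING = 1
    BREAKING = 2


def portfolio_status(watch_statuses: dict[str, DriftStatus]) -> DriftStatus:
    """The portfolio is only as healthy as its worst watch (OK when there are none)."""
    return max(watch_statuses.values(), default=DriftStatus.OK)


def policy_action(status: DriftStatus) -> str:
    """Illustrative policy: block on breaking drift, alert on warnings, else allow."""
    if status is DriftStatus.BREAKING:
        return "block_new_runs"
    if status is DriftStatus.WARNING:
        return "alert"
    return "allow"


# Hypothetical watch IDs for one MCP server and one vendor OpenAPI spec.
statuses = {"github-mcp": DriftStatus.BREAKING, "billing-openapi": DriftStatus.OK}
print(policy_action(portfolio_status(statuses)))  # block_new_runs
```

Taking the maximum keeps the gate conservative: one breaking watch is enough to stop new runs, even if every other dependency is healthy.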

Architecture map: Five-layer map.

Who feels this first

  • Platform / SRE — incident volume from “agent flakiness” without a vendor root cause
  • AI / agent engineers — debugging tool schemas from production traces
  • Security / governance — unmonitored third-party surfaces in the agent graph

Use-case pages: MCP agents, Vendor APIs, On-call.

Start without a full rollout

  1. Local proof — compare_json or MCP offline diff (Local-first setup)
  2. One watch — trial on the highest-risk vendor or MCP server (First watch)
  3. CI coverage — fail PRs that add unwatched dependencies (GitHub Actions)
  4. Orchestrator gate — POST /api/preflight before agent runs
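Step 4 needs only a small amount of orchestrator code. The sketch below calls `POST /api/preflight` before starting a run. The request body (`agent_id`), the bearer-token header, the `allow` response field, and the `driftguard.example` base URL are all assumptions for illustration. Check the Pre-run check (preflight) page for the real request and response shapes.

```python
import json
import urllib.request
from typing import Any


def build_preflight_request(base_url: str, agent_id: str, token: str) -> urllib.request.Request:
    """Build the POST /api/preflight call. Body and auth header shapes are assumed."""
    body = json.dumps({"agent_id": agent_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/preflight",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
    )


def should_block(preflight: dict[str, Any]) -> bool:
    """Fail closed: block unless the response explicitly allows the run (field name assumed)."""
    return preflight.get("allow") is not True


def gate_agent_run(base_url: str, agent_id: str, token: str, timeout_s: float = 5.0) -> bool:
    """Return True if the run may start; network or JSON errors also block the run."""
    try:
        request = build_preflight_request(base_url, agent_id, token)
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return not should_block(json.load(response))
    except (OSError, ValueError):  # URLError/HTTPError subclass OSError; JSONDecodeError subclasses ValueError
        return False


# Usage in an orchestrator (hypothetical host and agent ID):
# if not gate_agent_run("https://driftguard.example", "triage-agent", token):
#     raise RuntimeError("preflight blocked this run")
```

Failing closed trades availability for safety. An orchestrator that prefers availability could return `True` on network errors and rely on alerts instead. That is a policy choice, not something this sketch decides for you.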

Related

  • Comparison table — manual vs JSON diff vs DriftGuard
  • Breaking vs warning change records
  • Adopt paths (OSS)
  • Start free trial