Why contract drift matters
Production agents depend on contracts you do not own — OpenAPI specs, REST payloads, and MCP tools/list catalogs. When those contracts change, agents fail in ways uptime monitoring never sees.
This is contract drift: structural changes to an external API or tool surface. It is not infrastructure drift (Terraform plans) or generic HTTP availability. DriftGuard watches the contract itself.
Silent tool failures
Vendor endpoints often return 200 OK while the response shape changes:
- An MCP server removes `search_issues` but keeps serving `tools/list`
- A REST API drops a field your agent prompt assumes is always present
- `inputSchema` tightens: required properties appear without a version bump
Your agent retries with the old tool name or parses empty fields. Logs show “model confusion,” not “vendor changed the contract.”
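The first and third cases can be caught by comparing two `tools/list` results structurally. The following is a minimal sketch, not DriftGuard's implementation: the function name `diff_tool_catalogs`, the sample tools, and the finding strings are illustrative assumptions. It assumes only the `tools` array with `name` and `inputSchema` entries that a `tools/list` result carries.

```python
from typing import Any


def diff_tool_catalogs(baseline: dict[str, Any], current: dict[str, Any]) -> list[str]:
    """Return breaking changes between two tools/list results (illustrative only)."""
    old = {tool["name"]: tool for tool in baseline.get("tools", [])}
    new = {tool["name"]: tool for tool in current.get("tools", [])}
    findings: list[str] = []

    for name in sorted(old):
        if name not in new:
            findings.append(f"tool removed: {name}")
            continue
        # A property that becomes required breaks callers that omit it.
        old_required = set(old[name].get("inputSchema", {}).get("required", []))
        new_required = set(new[name].get("inputSchema", {}).get("required", []))
        for prop in sorted(new_required - old_required):
            findings.append(f"new required property: {name}.{prop}")
    return findings


# Hypothetical snapshots: the vendor removed one tool and tightened another.
baseline = {"tools": [
    {"name": "search_issues", "inputSchema": {"type": "object", "required": ["query"]}},
    {"name": "create_issue", "inputSchema": {"type": "object", "required": ["title"]}},
]}
current = {"tools": [
    {"name": "create_issue",
     "inputSchema": {"type": "object", "required": ["title", "project_id"]}},
]}

for finding in diff_tool_catalogs(baseline, current):
    print(finding)
```

Both snapshots would come from a server that answers every request successfully, which is why the comparison has to look at the catalog contents rather than the transport result.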
Retry spirals
When a tool call fails, orchestrators and agents often retry — sometimes with backoff, sometimes in a tight loop:
- Agent calls removed MCP tool → transport or validation error
- Framework retries the same tool three to ten times
- Token spend spikes; user sees a hung or nonsensical reply
- FuseGuard-style loop detection trips only after damage accumulates
Blocking before the run — with preflight status and policy — stops the spiral at step one. See Pre-run check (preflight).
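To make the cost difference concrete, here is a small simulation of one agent step with and without a pre-run block. It is a sketch under stated assumptions: `ToolCallError`, `run_step`, and the boolean `blocked` flag are hypothetical stand-ins for a framework's retry loop and a preflight verdict, not names from DriftGuard or any orchestrator.

```python
from typing import Callable


class ToolCallError(Exception):
    """Raised by the hypothetical tool client when a call fails."""


def run_step(call_tool: Callable[[], str], blocked: bool,
             max_retries: int = 5) -> tuple[str, int]:
    """Return (outcome, attempts). `blocked` stands in for a preflight verdict."""
    if blocked:
        # Drift was reported before the run, so no tool call is made at all.
        return "skipped: contract drift reported before run", 0
    attempts = 0
    for _ in range(max_retries):
        attempts += 1
        try:
            return call_tool(), attempts
        except ToolCallError:
            continue  # retrying cannot help: the tool no longer exists
    return "failed after retries", attempts


def removed_tool() -> str:
    # Simulates calling a tool the vendor removed from its catalog.
    raise ToolCallError("unknown tool: search_issues")


print(run_step(removed_tool, blocked=False))
print(run_step(removed_tool, blocked=True))
```

Without the gate, every retry is spent on a call that cannot succeed. With it, the step ends after zero attempts and the reason is explicit.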
Uptime is not enough
| Signal | What it catches | What it misses |
|---|---|---|
| HTTP synthetic checks | Outages, TLS errors, 5xx | Tool removed from MCP catalog; field dropped from JSON |
| Deploy-time fixture diff | Changes you ship in CI | Runtime catalog drift between deploys |
| Manual changelog review | Announced breaking releases | Silent MCP server updates; undocumented schema tweaks |
| Contract watches | Scheduled snapshot diff vs baseline | — |
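The gap in the first row can be shown in a few lines: a status-code check passes while a shape check on the same response fails. This is an illustrative sketch. The helper `missing_paths`, the dotted-path notation, and the sample payload are assumptions made for the example.

```python
from typing import Any


def missing_paths(payload: Any, required: list[str]) -> list[str]:
    """Return the dotted paths from `required` that are absent in `payload`."""
    missing = []
    for path in required:
        node = payload
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing


# Hypothetical vendor response: HTTP 200, but `assignee` was dropped.
status_code = 200
body = {"issue": {"id": 42, "title": "Login fails"}}
required = ["issue.id", "issue.title", "issue.assignee.email"]

uptime_ok = 200 <= status_code < 300       # what a synthetic check sees
shape_gaps = missing_paths(body, required)  # what the agent actually depends on

print(uptime_ok, shape_gaps)
```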
At agent speed, drift is inevitable
Teams add MCP servers weekly. Vendors ship API changes without semver discipline. CI that only diffs checked-in OpenAPI files cannot see live tools/list responses.
DriftGuard closes the loop:
- Detection: watches on MCP, OpenAPI, and raw JSON schemas
- Status plane: `drift_status` per watch and portfolio preflight
- Structured actions: `agentAction` hints on every breaking change
- Policy gates: block new runs, route alerts, draft remediation PRs
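One way to picture how per-watch status feeds a portfolio-level preflight is a worst-of rollup. The sketch below is an assumption about how such a rollup could work, not a description of DriftGuard's status plane. The status values `ok`, `warning`, and `breaking`, the watch names, and the function `portfolio_status` are all hypothetical.

```python
# Hypothetical status values, ordered from least to most severe.
SEVERITY = {"ok": 0, "warning": 1, "breaking": 2}


def portfolio_status(watch_statuses: dict[str, str]) -> str:
    """Worst-of rollup across watches.

    Unknown values are treated as breaking so that a gate built on this
    result fails closed rather than open.
    """
    worst = "ok"
    for status in watch_statuses.values():
        level = status if status in SEVERITY else "breaking"
        if SEVERITY[level] > SEVERITY[worst]:
            worst = level
    return worst


# Hypothetical watches keyed by name.
watches = {"github-mcp": "ok", "billing-api": "warning", "search-mcp": "breaking"}
print(portfolio_status(watches))
```

Treating unrecognized statuses as breaking is a deliberate choice in this sketch: an orchestrator that cannot interpret the status should not start a run on the assumption that everything is fine.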
Architecture map: Five-layer map.
Who feels this first
- Platform / SRE — incident volume from “agent flakiness” without a vendor root cause
- AI / agent engineers — debugging tool schemas from production traces
- Security / governance — unmonitored third-party surfaces in the agent graph
Use-case pages: MCP agents, Vendor APIs, On-call.
Start without a full rollout
- Local proof: `compare_json` or MCP offline diff (Local-first setup)
- One watch: trial on the highest-risk vendor or MCP server (First watch)
- CI coverage: fail PRs that add unwatched dependencies (GitHub Actions)
- Orchestrator gate: `POST /api/preflight` before agent runs
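For the orchestrator gate, the call might be wired in roughly as follows. Only the `POST /api/preflight` route comes from this page. The base URL, the bearer-token header, the `watchIds` request field, and the `allow` response field are assumptions made for illustration, so check the API reference for the real request and response shapes.

```python
import json
import urllib.request


def build_preflight_request(base_url: str, api_key: str,
                            watch_ids: list[str]) -> urllib.request.Request:
    """Build (but do not send) a preflight request. Body shape is assumed."""
    body = json.dumps({"watchIds": watch_ids}).encode("utf-8")  # hypothetical field
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/preflight",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # auth scheme is an assumption
        },
    )


def may_start_run(response_body: dict) -> bool:
    """Fail closed: start only when the (assumed) `allow` field is exactly True."""
    return response_body.get("allow") is True


req = build_preflight_request("https://driftguard.example", "test-key", ["github-mcp"])
print(req.get_method(), req.full_url)
```

An orchestrator would send the request with `urllib.request.urlopen(req, timeout=...)`, parse the JSON body, and start the agent only if `may_start_run` returns `True`. It should treat network errors the same way as a refusal.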
Related
- Comparison table — manual vs JSON diff vs DriftGuard
- Breaking vs warning change records
- Adopt paths (OSS)
- Start free trial