Why contract drift matters

Production agents depend on contracts you do not own: OpenAPI specs, REST payloads, and MCP tools/list catalogs. When those contracts change, agents fail in ways that uptime monitoring does not detect.

This is contract drift: structural changes to an external API or tool surface. It is distinct from infrastructure drift (the kind a Terraform plan surfaces) and from generic HTTP availability. DriftGuard watches the contract itself.

Silent tool failures

Vendor endpoints often return 200 OK while the response shape changes:

An MCP server removes search_issues but keeps serving tools/list
A REST API drops a field your agent prompt assumes is always present
A tool's inputSchema tightens: new required properties appear without a version bump

Your agent retries with the old tool name or parses empty fields. Logs show “model confusion,” not “vendor changed the contract.”
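To make the first and third failure modes concrete, here is a minimal sketch of a structural diff between two MCP tools/list results. It assumes both snapshots are already fetched and parsed, and relies only on the MCP result shape (a `tools` array whose entries carry `name` and a JSON Schema `inputSchema`). The helper names are hypothetical, and this is not DriftGuard's implementation. It also catches only removals and new `required` entries, not type changes.

```python
from typing import Any


def index_tools(tools_list_result: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map tool name -> tool definition from an MCP tools/list result."""
    return {tool["name"]: tool for tool in tools_list_result.get("tools", [])}


def required_props(tool: dict[str, Any]) -> set[str]:
    """Required input properties declared in a tool's inputSchema (JSON Schema)."""
    return set(tool.get("inputSchema", {}).get("required", []))


def diff_tool_catalogs(baseline: dict[str, Any], current: dict[str, Any]) -> dict[str, Any]:
    """Compare two tools/list snapshots: removed tools, added tools, newly required inputs."""
    old, new = index_tools(baseline), index_tools(current)
    newly_required = {
        name: sorted(required_props(new[name]) - required_props(old[name]))
        for name in sorted(old.keys() & new.keys())
        if required_props(new[name]) - required_props(old[name])
    }
    return {
        "removed": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "newly_required": newly_required,
    }


# Made-up snapshots: search_issues disappears, create_issue gains a required "project".
baseline = {"tools": [
    {"name": "search_issues", "inputSchema": {"type": "object", "required": ["query"]}},
    {"name": "create_issue", "inputSchema": {"type": "object", "required": ["title"]}},
]}
current = {"tools": [
    {"name": "create_issue", "inputSchema": {"type": "object", "required": ["title", "project"]}},
]}
report = diff_tool_catalogs(baseline, current)
print(report)
# → {'removed': ['search_issues'], 'added': [], 'newly_required': {'create_issue': ['project']}}
```

Both snapshots would come back with a successful status, which is why a status-code check alone sees nothing wrong.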

Retry spirals

When a tool call fails, orchestrators and agents often retry, sometimes with backoff and sometimes in a tight loop:

Agent calls a removed MCP tool → transport or validation error
Framework retries the same tool three to ten times
Token spend spikes; user sees a hung or nonsensical reply
FuseGuard-style loop detection trips only after damage accumulates

Blocking the run before it starts, based on preflight status and policy, stops the spiral at step one. See Pre-run check (preflight).
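The difference between retrying and blocking up front can be sketched in a few lines. This is a minimal illustration under stated assumptions. `ContractError` and `TransientError` are hypothetical exception types standing in for however your framework distinguishes "tool missing or schema rejected" from "timeout or 5xx". `preflight_missing_tools` is a hypothetical local check, not DriftGuard's preflight API.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class ContractError(Exception):
    """Hypothetical: the tool is gone or its schema rejects the call."""


class TransientError(Exception):
    """Hypothetical: timeouts, 5xx responses, dropped connections."""


def preflight_missing_tools(required: Iterable[str], catalog: Iterable[str]) -> list[str]:
    """Return tools the agent plans to use that the live catalog no longer lists."""
    return sorted(set(required) - set(catalog))


def call_with_retries(call: Callable[[], T], max_attempts: int = 3, base_delay: float = 0.05) -> T:
    """Retry transient failures with exponential backoff; never retry contract errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ContractError:
            raise  # a changed contract fails the same way on every retry
        except TransientError:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("max_attempts must be at least 1")


missing = preflight_missing_tools(["search_issues", "create_issue"], ["create_issue"])
print(missing)  # → ['search_issues']: refuse to start the run instead of entering the loop
```

Retry budgets still matter for genuinely transient faults. The point of the split is that a contract error costs one failed call rather than the three to ten attempts described above.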

Uptime is not enough

Signal	What it catches	What it misses
HTTP synthetic checks	Outages, TLS errors, 5xx	Tool removed from MCP catalog; field dropped from JSON
Deploy-time fixture diff	Changes you ship in CI	Runtime catalog drift between deploys
Manual changelog review	Announced breaking releases	Silent MCP server updates; undocumented schema tweaks
Contract watches	Structural drift, found by diffing scheduled snapshots against a baseline	—
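The "field dropped from JSON" row is the case a synthetic check cannot see, because both responses return 200 OK. A minimal sketch of a path-level payload diff is shown here, assuming you have a baseline sample and a current sample of the same endpoint's response. The function names are hypothetical, and this is not DriftGuard's compare_json. Comparing single samples can flag optional fields that happen to be absent, so a real watch would compare against a recorded baseline or schema.

```python
from typing import Any


def field_paths(value: Any, prefix: str = "$") -> set[str]:
    """Collect a path for every object key in a JSON value; list items share a '[]' segment."""
    paths: set[str] = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}"
            paths.add(path)
            paths |= field_paths(child, path)
    elif isinstance(value, list):
        for item in value:
            paths |= field_paths(item, f"{prefix}[]")
    return paths


def dropped_fields(baseline: Any, current: Any) -> list[str]:
    """Paths present in the baseline payload but absent from the current one."""
    return sorted(field_paths(baseline) - field_paths(current))


# Made-up payloads: the vendor silently stops returning "assignee".
baseline = {"id": 7, "assignee": {"login": "ana"}, "labels": [{"name": "bug"}]}
current = {"id": 7, "labels": [{"name": "bug"}]}
print(dropped_fields(baseline, current))  # → ['$.assignee', '$.assignee.login']
```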

At agent speed, drift is inevitable

Teams add MCP servers every week, and vendors often ship API changes without semver discipline. CI that only diffs checked-in OpenAPI files cannot see live tools/list responses.

DriftGuard closes the loop:

Detection: watches on MCP, OpenAPI, and raw JSON schemas
Status plane: drift_status per watch and portfolio preflight
Structured actions: agentAction hints on every breaking change
Policy gates: block new runs, route alerts, draft remediation PRs
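One way to picture how change records, action hints, and policy gates fit together is shown here. This is a minimal sketch only. `ChangeRecord`, its fields, and `evaluate_policy` are hypothetical shapes chosen for illustration and are not DriftGuard's schema. The only grounded parts are the breaking/warning distinction and the idea of an action hint attached to each change.

```python
from dataclasses import dataclass
from typing import Literal

Severity = Literal["breaking", "warning"]


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical change record; not DriftGuard's actual schema."""
    watch: str
    severity: Severity
    agent_action: str  # hypothetical stand-in for an agentAction-style hint


@dataclass
class GateDecision:
    allow_run: bool
    alerts: list[str]


def evaluate_policy(records: list[ChangeRecord], run_depends_on: set[str]) -> GateDecision:
    """Alert on drift in watches this run depends on; block the run if any of it is breaking."""
    decision = GateDecision(allow_run=True, alerts=[])
    for record in records:
        if record.watch not in run_depends_on:
            continue  # drift on unrelated watches does not gate this run
        decision.alerts.append(f"{record.watch} ({record.severity}): {record.agent_action}")
        if record.severity == "breaking":
            decision.allow_run = False
    return decision


records = [
    ChangeRecord("github-mcp", "breaking", "search_issues removed; stop calling it"),
    ChangeRecord("billing-api", "warning", "optional field added"),
]
print(evaluate_policy(records, {"github-mcp"}).allow_run)  # → False
```

Scoping the decision to the watches a run depends on keeps a warning on one vendor from blocking unrelated agents.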

Architecture map: Five-layer map.

Who feels this first

Platform / SRE: incident volume from “agent flakiness” without a vendor root cause
AI / agent engineers: debugging tool schemas from production traces
Security / governance: unmonitored third-party surfaces in the agent graph

Use-case pages: MCP agents, Vendor APIs, On-call.

Start without a full rollout

Local proof: compare_json or MCP offline diff (Local-first setup)
One watch: trial on the highest-risk vendor or MCP server (First watch)
CI coverage: fail PRs that add unwatched dependencies (GitHub Actions)
Orchestrator gate: call POST /api/preflight before each agent run
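An orchestrator-side gate around POST /api/preflight might look like the sketch below. Only the endpoint path comes from this page. The request body (`agent_id`), the `allow` response field, the base URL, and the absence of authentication are all assumptions for illustration, so consult Pre-run check (preflight) for the real contract. The transport is injectable so the gate can be exercised without a network.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable

Transport = Callable[[urllib.request.Request], Any]


def default_transport(request: urllib.request.Request) -> Any:
    """Send the request and parse the JSON body (non-2xx raises urllib.error.HTTPError)."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def preflight_allows_run(base_url: str, agent_id: str,
                         transport: Transport = default_transport,
                         fail_closed: bool = True) -> bool:
    """Call POST /api/preflight before an agent run; False means block the run."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/preflight",
        data=json.dumps({"agent_id": agent_id}).encode(),  # ASSUMED request body
        method="POST",
        headers={"Content-Type": "application/json"},  # add auth headers your deployment needs
    )
    try:
        result = transport(request)
    except (OSError, ValueError):  # URLError, HTTPError, timeouts, invalid JSON
        return not fail_closed  # preflight unavailable: block unless configured to fail open
    if not isinstance(result, dict):
        return not fail_closed
    return bool(result.get("allow", False))  # ASSUMED response field


# Usage with a stub transport; "https://driftguard.example" is a placeholder URL.
print(preflight_allows_run("https://driftguard.example", "agent-1",
                           transport=lambda req: {"allow": False}))  # → False
```

Failing closed trades availability for safety. If the preflight service is down, no new runs start, which is usually cheaper than a retry spiral against a changed contract.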

Comparison table: manual vs JSON diff vs DriftGuard
Breaking vs warning change records
Adopt paths (OSS)