Agents & MCP

Agents burn budgets because they can't see their own limits: they discover a rate limit by hitting it, then retry into the same wall. RateGuard inverts that by exposing limits as MCP tools an agent can query before acting. The agent asks "can I make this call?" and gets an answer without consuming any of its budget.

Peek semantics — the core guarantee

Every tool below is a peek: querying never consumes budget, never takes a token from the bucket, never records loop state. An agent can check as often as it wants for free.
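To make the peek guarantee concrete, here is a minimal sketch of a token bucket that separates a read-only `peek` from a mutating `take`. It is a toy model under stated assumptions, not RateGuard's implementation: the class name, the method names, and the refill math are hypothetical, and only the reported fields (remaining, limit, retry-after) follow the tool descriptions in this guide.

```python
import time


class TokenBucket:
    """Toy token bucket (hypothetical; NOT RateGuard's implementation)."""

    def __init__(self, capacity, refill_per_sec, now=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._now = now
        self._tokens = float(capacity)
        self._last = now()

    def _available(self):
        # Compute the refilled level without storing it, so reads have no side effects.
        elapsed = self._now() - self._last
        return min(self.capacity, self._tokens + elapsed * self.refill_per_sec)

    def peek(self):
        """Report the current state without consuming anything."""
        available = self._available()
        allowed = available >= 1
        retry_after = 0.0 if allowed else (1 - available) / self.refill_per_sec
        return {"allowed": allowed, "remaining": int(available),
                "limit": self.capacity, "retry_after": retry_after}

    def take(self):
        """Consume one token if available; the only call that mutates state."""
        available = self._available()
        self._last = self._now()
        if available < 1:
            self._tokens = available
            return False
        self._tokens = available - 1
        return True


# Frozen clock so the demo is deterministic.
bucket = TokenBucket(capacity=3, refill_per_sec=1.0, now=lambda: 0.0)
for _ in range(100):
    bucket.peek()                       # a hundred peeks cost nothing
remaining_after_peeks = bucket.peek()["remaining"]
```

Because `peek` never writes to the bucket, an agent polling it in a tight loop sees the same answer a single call would have returned; only `take` changes what later callers observe.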

The five tools

The tools are identical across the Go, Node.js, and Python SDKs:

Tool	What it answers
`get_rate_limit_state`	Would a call for this key be allowed right now? Remaining, limit, retry-after.
`get_token_budget`	How many LLM tokens remain? Optionally: would `estimated_tokens` fit?
`get_circuit_breaker_state`	Is the upstream healthy? closed / open / half-open.
`check_loop`	Has this exact payload been seen at a lower sequence depth (runaway loop)?
`list_limits`	Everything above in one call — designed for agent initialization.
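The `check_loop` question ("seen at a lower sequence depth?") can be illustrated with a small sketch. This is one plausible way to detect a repeated payload, assuming the payload is fingerprinted by hashing the system prompt and user input; the `LoopDetector` class, its methods, and the `first_seen_depth` field are hypothetical and are not taken from the RateGuard SDK.

```python
import hashlib


class LoopDetector:
    """Hypothetical loop detector: remembers the lowest depth per payload."""

    def __init__(self):
        self._first_depth = {}  # fingerprint -> lowest sequence depth recorded

    @staticmethod
    def _fingerprint(system_prompt, user_input):
        h = hashlib.sha256()
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
        for part in (system_prompt, user_input):
            data = part.encode("utf-8")
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
        return h.hexdigest()

    def record(self, system_prompt, user_input, sequence_depth):
        """Mutating path: called when a request is actually sent."""
        key = self._fingerprint(system_prompt, user_input)
        prev = self._first_depth.get(key)
        if prev is None or sequence_depth < prev:
            self._first_depth[key] = sequence_depth

    def check(self, system_prompt, user_input, sequence_depth):
        """Peek path: reads state only, never records the payload."""
        key = self._fingerprint(system_prompt, user_input)
        prev = self._first_depth.get(key)
        looping = prev is not None and prev < sequence_depth
        return {"allowed": not looping, "first_seen_depth": prev}


detector = LoopDetector()
before = detector.check("sys", "retry the build", 0)   # nothing recorded yet
detector.record("sys", "retry the build", 0)           # the request is sent
after = detector.check("sys", "retry the build", 3)    # same payload, deeper
```

Keeping `check` and `record` separate is what lets the check stay a peek: asking whether a payload would loop does not itself count as an occurrence of that payload.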

Serve the tools

The Go SDK ships a zero-dependency MCP stdio server: newline-delimited JSON-RPC 2.0 implementing initialize, tools/list, tools/call, and ping. The Node.js and Python SDKs instead return tool definitions that you register with your own MCP server framework. In Go:

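// ctx is assumed to be a context.Context created by the caller.
rg := rateguard.New(rateguard.Config{Preset: "agent-orchestrator"})

// Option 1: serve over stdio; plugs into any MCP client config.
// The return value is discarded here for brevity; check it in real code.
_ = rg.ServeMCP(ctx, os.Stdin, os.Stdout)

// Option 2: call tools directly, in-process.
res := rg.MCPCall("get_token_budget",
    map[string]any{"key": "tenant-1", "estimated_tokens": 8000})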
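For a sense of what travels over stdio, the sketch below builds the newline-delimited JSON-RPC 2.0 requests a client might write for the four methods named above. The `tools/call` parameter shape (`name` plus `arguments`) follows the MCP specification; the protocol version string and the client name are placeholders, and the exact set of fields the RateGuard server expects is an assumption.

```python
import json


def frame(method, msg_id, params=None):
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    msg = {"jsonrpc": "2.0", "id": msg_id, "method": method}
    if params is not None:
        msg["params"] = params
    # json.dumps escapes newlines inside strings, so each message stays on one line.
    return json.dumps(msg, separators=(",", ":")) + "\n"


session = [
    frame("initialize", 1, {
        "protocolVersion": "2024-11-05",  # placeholder; use the version your server negotiates
        "capabilities": {},
        "clientInfo": {"name": "example-agent", "version": "0.0.1"},  # hypothetical
    }),
    frame("tools/list", 2),
    frame("tools/call", 3, {
        "name": "get_token_budget",
        "arguments": {"key": "tenant-1", "estimated_tokens": 8000},
    }),
    frame("ping", 4),
]

# Each element is exactly one line; a client would write these to the server's stdin.
print("".join(session), end="")
```

Newline delimiting means the reader can split the stream on `\n` and parse each line independently, with no length headers required.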

Connect Claude Code, Claude Desktop, or Cursor

Any MCP client can query RateGuard. Add your app (running the stdio server) to the client's MCP config:

mcp config (Claude Code / Claude Desktop / Cursor)

{
  "mcpServers": {
    "rateguard": {
      "command": "your-app",
      "args": ["mcp"]
    }
  }
}

From that moment the agent can call list_limits on startup to learn its operating envelope, get_token_budget with an estimate before an expensive call, and check_loop when its own behavior starts repeating.

The pre-flight pattern

A well-behaved agent loop with RateGuard looks like this:

agent pseudocode

state = mcp.call("list_limits")            # 1. learn the envelope at startup

for task in tasks:
    # 2. ask before spending
    budget = mcp.call("get_token_budget", {
        "key": tenant, "estimated_tokens": est(task)})
    if not budget["fits"]:
        wait_or_replan(budget["retry_after"])

    # 3. stop your own runaway loop
    loop = mcp.call("check_loop", {
        "system_prompt": sp, "user_input": ui,
        "sequence_depth": depth})
    if not loop["allowed"]:
        break

    # 4. spend; metering happens in the outbound transport
    result = call_llm(task)
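The pattern can be exercised end to end without a real server. The sketch below runs the same four steps against an in-memory stub; `StubMCP`, its `meter` method, and the response shapes are hypothetical stand-ins, with only the `fits`, `retry_after`, and `allowed` fields taken from the pseudocode above. Two simplifications are assumptions of the demo: the task index is used as the sequence depth, and a task that does not fit is deferred rather than waited on so the run terminates immediately.

```python
class StubMCP:
    """In-memory stand-in for an MCP client (hypothetical response shapes)."""

    def __init__(self, token_budget):
        self.remaining = token_budget
        self.seen = {}  # (system_prompt, user_input) -> lowest depth recorded

    def call(self, tool, args=None):
        # Every branch is a pure read, mirroring peek semantics.
        args = args or {}
        if tool == "list_limits":
            return {"token_budget_remaining": self.remaining}
        if tool == "get_token_budget":
            fits = args["estimated_tokens"] <= self.remaining
            return {"fits": fits, "retry_after": 0 if fits else 60}
        if tool == "check_loop":
            prev = self.seen.get((args["system_prompt"], args["user_input"]))
            return {"allowed": prev is None or prev >= args["sequence_depth"]}
        raise ValueError(f"unknown tool: {tool}")

    def meter(self, system_prompt, user_input, depth, tokens):
        # Stands in for the outbound transport, which is where spend is recorded.
        self.remaining -= tokens
        self.seen.setdefault((system_prompt, user_input), depth)


def run(mcp, tasks, system_prompt, call_llm, tenant="tenant-1"):
    mcp.call("list_limits")                      # 1. learn the envelope at startup
    done, deferred = [], []
    for depth, (user_input, estimate) in enumerate(tasks):
        budget = mcp.call("get_token_budget",
                          {"key": tenant, "estimated_tokens": estimate})
        if not budget["fits"]:                   # 2. ask before spending
            deferred.append((user_input, budget["retry_after"]))
            continue
        loop = mcp.call("check_loop", {"system_prompt": system_prompt,
                                       "user_input": user_input,
                                       "sequence_depth": depth})
        if not loop["allowed"]:                  # 3. stop a runaway loop
            break
        done.append(call_llm(user_input))        # 4. spend
        mcp.meter(system_prompt, user_input, depth, estimate)
    return done, deferred


tasks = [("summarize", 6000), ("translate", 6000), ("summarize", 1000)]
done, deferred = run(StubMCP(token_budget=10_000), tasks, "You are helpful.", str.upper)
```

In this run the first task is sent, the second is deferred because only 4,000 tokens remain, and the third stops the loop because the same payload was already sent at a lower depth.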

Tip

Pair this with the outbound transport: MCP tools are how the agent asks, and the wrapped HTTP client is how spend is enforced. Together, they ensure the agent can't outspend its budget even when it forgets to ask.
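The split between asking and enforcing can be sketched as a wrapper around the function that actually sends requests. This is an illustrative toy, not the RateGuard outbound transport: `MeteredClient`, `BudgetExceeded`, and the fixed per-call token estimate are all hypothetical.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a request would exceed the remaining token budget."""


class MeteredClient:
    """Hypothetical wrapper: enforcement lives on the send path."""

    def __init__(self, send, token_budget):
        self._send = send
        self.remaining = token_budget

    def fits(self, estimated_tokens):
        # Read-only check, the analogue of asking through an MCP tool.
        return estimated_tokens <= self.remaining

    def request(self, payload, estimated_tokens):
        # Enforcement happens here whether or not the caller asked first.
        if not self.fits(estimated_tokens):
            raise BudgetExceeded(
                f"need {estimated_tokens}, only {self.remaining} left")
        self.remaining -= estimated_tokens
        return self._send(payload)


client = MeteredClient(send=lambda payload: f"ok:{payload}", token_budget=10_000)
first = client.request("a", 8_000)      # allowed; 2,000 tokens remain
try:
    client.request("b", 8_000)          # never asked first, still blocked
    blocked = False
except BudgetExceeded:
    blocked = True
```

The second request is refused even though the caller skipped the pre-flight check, which is the property the tip describes.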

Which preset?

Use agent-orchestrator (500 RPS, 1M tokens/hr, soft-stop) for multi-agent systems, and mcp-server (30 RPS, 50K tokens/hr, hard-stop) when RateGuard guards an MCP tool server itself. See Presets for the full table.
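A quick calculation shows which limit binds in practice. The sketch below uses only the two presets' numbers quoted above, reading "1M" as 1,000,000 and "50K" as 50,000; the `PRESETS` dictionary and the `calls_per_hour` helper are illustrative and not part of the SDK, and the 8,000-token call size is an assumed example.

```python
# Preset figures as quoted in this guide (stop modes omitted).
PRESETS = {
    "agent-orchestrator": {"rps": 500, "tokens_per_hour": 1_000_000},
    "mcp-server": {"rps": 30, "tokens_per_hour": 50_000},
}


def calls_per_hour(preset, tokens_per_call):
    """Upper bound on calls per hour: the tighter of the token and rate limits."""
    p = PRESETS[preset]
    by_tokens = p["tokens_per_hour"] // tokens_per_call
    by_rate = p["rps"] * 3600
    return min(by_tokens, by_rate)


orchestrator_calls = calls_per_hour("agent-orchestrator", 8_000)  # 125
server_calls = calls_per_hour("mcp-server", 8_000)                # 6
```

At 8,000 tokens per call the hourly token budget, not the request rate, is the constraint in both presets, so an estimate passed to get_token_budget matters more than the RPS figure for LLM-heavy workloads.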