Agents & MCP
Agents burn budgets because they can't see their own limits — they discover a rate limit by hitting it, then retry into the same wall. RateGuard inverts that: it exposes limits as MCP tools an agent can query before acting. The agent asks "can I make this call?" and gets an answer without a single token spent.
Peek semantics — the core guarantee
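One way to picture peek semantics is a limiter with two separate paths: a read-only query and a spending call. The sketch below assumes a simple token bucket; `TokenBucket`, `peek`, and `take` are hypothetical names chosen for illustration, not RateGuard's API or its actual algorithm.

```python
import time


class TokenBucket:
    """Hypothetical token bucket used only to illustrate peek vs. consume."""

    def __init__(self, capacity, refill_per_sec, now=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._now = now
        self._tokens = float(capacity)
        self._stamp = now()

    def _available(self):
        # Compute the refilled level without storing it, so reads stay pure.
        elapsed = self._now() - self._stamp
        return min(self.capacity, self._tokens + elapsed * self.refill_per_sec)

    def peek(self, cost=1):
        """Answer 'would this be allowed?' without changing any state."""
        available = self._available()
        return {"allowed": available >= cost, "remaining": int(available)}

    def take(self, cost=1):
        """Actually spend quota; only the real call path should do this."""
        available = self._available()
        if available < cost:
            return False
        self._tokens = available - cost
        self._stamp = self._now()
        return True
```

However many times an agent calls `peek`, the bucket level does not move; only `take` changes it. That separation is what lets an agent poll its limits freely before deciding to act.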
The five tools
Identical across Go, Node.js, and Python:
| Tool | What it answers |
|---|---|
| get_rate_limit_state | Would a call for this key be allowed right now? Remaining, limit, retry-after. |
| get_token_budget | How many LLM tokens remain? Optionally: would estimated_tokens fit? |
| get_circuit_breaker_state | Is the upstream healthy? closed / open / half-open. |
| check_loop | Has this exact payload been seen at a lower sequence depth (runaway loop)? |
| list_limits | Everything above in one call; designed for agent initialization. |
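
To make the check_loop question concrete, here is a minimal sketch of one way such a check could work: fingerprint the payload and remember the lowest depth at which it was recorded. `LoopDetector`, `record`, and the `first_seen_depth` field are hypothetical; the sketch also assumes that recording happens on the real call path and that the check itself is read-only. RateGuard's own implementation may differ.

```python
import hashlib


class LoopDetector:
    """Hypothetical detector: flags a payload already seen at a lower depth."""

    def __init__(self):
        self._first_depth = {}  # payload fingerprint -> lowest depth recorded

    @staticmethod
    def _fingerprint(system_prompt, user_input):
        h = hashlib.sha256()
        # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        for part in (system_prompt, user_input):
            data = part.encode("utf-8")
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
        return h.hexdigest()

    def record(self, system_prompt, user_input, sequence_depth):
        """Called on the real call path to remember a payload."""
        key = self._fingerprint(system_prompt, user_input)
        seen = self._first_depth.get(key)
        if seen is None or sequence_depth < seen:
            self._first_depth[key] = sequence_depth

    def check(self, system_prompt, user_input, sequence_depth):
        """Read-only query: was this exact payload seen at a lower depth?"""
        key = self._fingerprint(system_prompt, user_input)
        seen = self._first_depth.get(key)
        if seen is not None and seen < sequence_depth:
            return {"allowed": False, "first_seen_depth": seen}
        return {"allowed": True, "first_seen_depth": seen}
```

Hashing the exact bytes means only identical payloads match; a paraphrased prompt would pass this sketch.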
Serve the tools
The Go SDK ships a zero-dependency MCP stdio server — newline-delimited JSON-RPC 2.0 implementing initialize, tools/list, tools/call, and ping. Node and Python return tool definitions ready to register in your MCP server framework:
rg := rateguard.New(rateguard.Config{Preset: "agent-orchestrator"})

// Serve over stdio; this plugs into any MCP client config.
_ = rg.ServeMCP(ctx, os.Stdin, os.Stdout)

// Or call tools directly, in-process:
res := rg.MCPCall("get_token_budget",
    map[string]any{"key": "tenant-1", "estimated_tokens": 8000})

Connect Claude Code, Claude Desktop, or Cursor
Any MCP client can query RateGuard. Add your app (running the stdio server) to the client's MCP config:
{
  "mcpServers": {
    "rateguard": {
      "command": "your-app",
      "args": ["mcp"]
    }
  }
}

From that moment the agent can call list_limits on startup to learn its operating envelope, get_token_budget with an estimate before an expensive call, and check_loop when its own behavior starts repeating.
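
For a sense of what travels over stdio when a client makes one of these calls, the sketch below frames requests as newline-delimited JSON-RPC 2.0. The `tools/call` shape (`name` plus `arguments` under `params`) follows the MCP specification. The helper names `frame`, `tool_call`, and `parse_line` are hypothetical, and the exact shape of RateGuard's results is not assumed here.

```python
import json
from itertools import count

_ids = count(1)  # simple monotonically increasing request ids


def frame(method, params=None):
    """Encode one JSON-RPC 2.0 request as a single newline-terminated line."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    # json.dumps escapes newlines inside strings, so each message is one line.
    return json.dumps(msg, separators=(",", ":")) + "\n"


def tool_call(name, arguments):
    """Build a tools/call request in the shape MCP defines."""
    return frame("tools/call", {"name": name, "arguments": arguments})


def parse_line(line):
    """Decode one response line, returning its result or raising on error."""
    msg = json.loads(line)
    if "error" in msg:
        raise RuntimeError(msg["error"].get("message", "unknown error"))
    return msg["result"]
```

A real client completes the `initialize` handshake before sending `tools/call`; MCP clients such as the ones above handle that automatically.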
The pre-flight pattern
A well-behaved agent loop with RateGuard looks like this:
state = mcp.call("list_limits")  # 1. learn the envelope at startup
for task in tasks:
    budget = mcp.call("get_token_budget", {
        "key": tenant, "estimated_tokens": est(task)})
    if not budget["fits"]:
        wait_or_replan(budget["retry_after"])  # 2. ask before spending
    loop = mcp.call("check_loop", {
        "system_prompt": sp, "user_input": ui,
        "sequence_depth": depth})
    if not loop["allowed"]:
        break  # 3. stop your own runaway loop
    result = call_llm(task)  # 4. spend, metered by the
                             #    outbound transport

Tip
Which preset?
agent-orchestrator (500 RPS, 1M tokens/hr, soft-stop) for multi-agent systems; mcp-server (30 RPS, 50K tokens/hr, hard-stop) when RateGuard guards an MCP tool server itself. Full table in Presets.
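
To experiment with the pre-flight pattern without a running server, a stub client is enough. The sketch below is a simplified, self-contained variant of the loop shown earlier: `StubMCP`, `spend`, and `run` are hypothetical stand-ins, the budget is seeded with the mcp-server preset's 50K tokens, and hourly refill is ignored for brevity.

```python
class StubMCP:
    """Hypothetical in-memory stand-in for an MCP client; not a RateGuard API."""

    def __init__(self, tokens_per_hour):
        self.remaining = tokens_per_hour

    def call(self, tool, args=None):
        args = args or {}
        if tool == "get_token_budget":
            est = args.get("estimated_tokens", 0)
            # Read-only: answering the question does not change self.remaining.
            return {"remaining": self.remaining, "fits": est <= self.remaining}
        raise ValueError(f"unsupported tool in this stub: {tool}")

    def spend(self, tokens):
        """Stands in for the metered outbound transport."""
        self.remaining -= tokens


def run(tasks, mcp, tenant="tenant-1"):
    """Run (name, estimate) tasks in order, skipping any that do not fit."""
    done, skipped = [], []
    for name, estimate in tasks:
        budget = mcp.call("get_token_budget",
                          {"key": tenant, "estimated_tokens": estimate})
        if not budget["fits"]:
            skipped.append(name)  # a real agent might wait or replan here
            continue
        mcp.spend(estimate)       # the actual LLM call would happen here
        done.append(name)
    return done, skipped
```

With a 50,000-token budget and tasks estimated at 20,000, 40,000, and 30,000 tokens, the second task is skipped because it no longer fits, while the first and third run and use the budget exactly.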