Docs menu

Get started

What is RateGuard?

RateGuard is middleware that makes every LLM call transparent. Drop it into your app and every token consumed, every rate limit hit, every circuit breaker trip becomes a traceable event — with zero infrastructure. Three SDKs (Go, Node.js, Python), identical behavior, one API.

Every other rate limiting tool was built for REST APIs. RateGuard was built for the LLM era — where a single request can consume 100,000 tokens, streaming responses span minutes, and your provider bill depends on how well you control it.

Not a proxy

RateGuard runs inside your application process. No gateway, no extra service, no added latency, no new attack surface. Your API keys never leave your app.

The two jobs it does

1. Guard the door (inbound). Classic rate limiting middleware for your own API — token bucket algorithm, per-tenant and per-route, with presets tuned for LLM workloads.

2. Guard the money (outbound). Real LLM spend happens on outbound calls. RateGuard wraps the HTTP client your LLM SDK already uses, so every call to OpenAI, Anthropic, Gemini, or any OpenAI-compatible provider is budgeted, breaker-protected, and metered with the provider's real token counts — including streaming.

client := rg.WrapClient(&http.Client{})
openai := openai.NewClient(option.WithHTTPClient(client))

Built for agents

Every AI gateway makes agents discover limits by hitting 429s. RateGuard answers before the request leaves the process: five MCP tools with peek semantics let any agent — Claude Code, Cursor, or your own — ask "can I make this call?" without consuming budget. See Agents & MCP.

Capabilities

CapabilityWhat it means
Outbound spend trackingWrap http.Client/fetch/httpx — real token usage from JSON and SSE streaming responses, metered into budgets.
Agent pre-flight (MCP)5 MCP tools + a zero-dependency stdio server. Querying never consumes budget.
Token budgetsHourly / daily / monthly caps on LLM tokens. Hard-stop or soft-stop.
Loop detectionSHA-256 payload fingerprinting halts runaway agent loops.
Provider fallbackAutomatic failover across OpenAI-compatible providers with credential isolation.
Circuit breakersPer-provider outbound, per-upstream inbound. Closed → open → half-open.
GenAI observabilityOpenTelemetry gen_ai.* spans per the official semantic conventions, plus Prometheus /metrics.
GuardrailsPII and prompt-injection detection wired into the middleware — violations return 422.

How it compares

RateGuardexpress-rate-limitLiteLLMKong
Multi-language✅ Go + Node + Python❌ JS only❌ Python only
Zero infrastructure✅ Middleware❌ Proxy required❌ Gateway
In-process outbound tracking✅ Client wrapper❌ Proxy only
Agent pre-flight (MCP)✅ 5 tools + stdio
Agent loop detection
LLM token budgets
GenAI OTel conventions
Open source✅ MITPartial

Ready? Install it in one line →