Get started
What is RateGuard?
RateGuard is middleware that makes every LLM call transparent. Drop it into your app and every token consumed, every rate limit hit, every circuit breaker trip becomes a traceable event — with zero infrastructure. Three SDKs (Go, Node.js, Python), identical behavior, one API.
Every other rate limiting tool was built for REST APIs. RateGuard was built for the LLM era — where a single request can consume 100,000 tokens, streaming responses span minutes, and your provider bill depends on how well you control it.
Not a proxy
The two jobs it does
1. Guard the door (inbound). Classic rate limiting middleware for your own API — token bucket algorithm, per-tenant and per-route, with presets tuned for LLM workloads.
2. Guard the money (outbound). Real LLM spend happens on outbound calls. RateGuard wraps the HTTP client your LLM SDK already uses, so every call to OpenAI, Anthropic, Gemini, or any OpenAI-compatible provider is budgeted, breaker-protected, and metered with the provider's real token counts — including streaming.
client := rg.WrapClient(&http.Client{})
openai := openai.NewClient(option.WithHTTPClient(client))Built for agents
Every AI gateway makes agents discover limits by hitting 429s. RateGuard answers before the request leaves the process: five MCP tools with peek semantics let any agent — Claude Code, Cursor, or your own — ask "can I make this call?" without consuming budget. See Agents & MCP.
Capabilities
| Capability | What it means |
|---|---|
| Outbound spend tracking | Wrap http.Client/fetch/httpx — real token usage from JSON and SSE streaming responses, metered into budgets. |
| Agent pre-flight (MCP) | 5 MCP tools + a zero-dependency stdio server. Querying never consumes budget. |
| Token budgets | Hourly / daily / monthly caps on LLM tokens. Hard-stop or soft-stop. |
| Loop detection | SHA-256 payload fingerprinting halts runaway agent loops. |
| Provider fallback | Automatic failover across OpenAI-compatible providers with credential isolation. |
| Circuit breakers | Per-provider outbound, per-upstream inbound. Closed → open → half-open. |
| GenAI observability | OpenTelemetry gen_ai.* spans per the official semantic conventions, plus Prometheus /metrics. |
| Guardrails | PII and prompt-injection detection wired into the middleware — violations return 422. |
How it compares
| RateGuard | express-rate-limit | LiteLLM | Kong | |
|---|---|---|---|---|
| Multi-language | ✅ Go + Node + Python | ❌ JS only | ❌ Python only | ❌ |
| Zero infrastructure | ✅ Middleware | ✅ | ❌ Proxy required | ❌ Gateway |
| In-process outbound tracking | ✅ Client wrapper | ❌ | ❌ Proxy only | ❌ |
| Agent pre-flight (MCP) | ✅ 5 tools + stdio | ❌ | ❌ | ❌ |
| Agent loop detection | ✅ | ❌ | ❌ | ❌ |
| LLM token budgets | ✅ | ❌ | ✅ | ❌ |
| GenAI OTel conventions | ✅ | ❌ | ❌ | ❌ |
| Open source | ✅ MIT | ✅ | ✅ | Partial |
Ready? Install it in one line →