Get started

What is RateGuard?

RateGuard is middleware that makes every LLM call transparent. Drop it into your app and every token consumed, every rate limit hit, every circuit breaker trip becomes a traceable event — with zero infrastructure. Three SDKs (Go, Node.js, Python), identical behavior, one API.

Every other rate limiting tool was built for REST APIs. RateGuard was built for the LLM era — where a single request can consume 100,000 tokens, streaming responses span minutes, and your provider bill depends on how well you control it.

Not a proxy

RateGuard runs inside your application process. No gateway, no extra service, no added latency, no new attack surface. Your API keys never leave your app.

The two jobs it does

1. Guard the door (inbound). Classic rate limiting middleware for your own API — token bucket algorithm, per-tenant and per-route, with presets tuned for LLM workloads.

2. Guard the money (outbound). Real LLM spend happens on outbound calls. RateGuard wraps the HTTP client your LLM SDK already uses, so every call to OpenAI, Anthropic, Gemini, or any OpenAI-compatible provider is budgeted, breaker-protected, and metered with the provider's real token counts — including streaming.

client := rg.WrapClient(&http.Client{})
openai := openai.NewClient(option.WithHTTPClient(client))

Built for agents

Every AI gateway makes agents discover limits by hitting 429s. RateGuard answers before the request leaves the process: five MCP tools with peek semantics let any agent — Claude Code, Cursor, or your own — ask "can I make this call?" without consuming budget. See Agents & MCP.

Capabilities

Capability	What it means
Outbound spend tracking	Wrap `http.Client`/`fetch`/`httpx` — real token usage from JSON and SSE streaming responses, metered into budgets.
Agent pre-flight (MCP)	5 MCP tools + a zero-dependency stdio server. Querying never consumes budget.
Token budgets	Hourly / daily / monthly caps on LLM tokens. Hard-stop or soft-stop.
Loop detection	SHA-256 payload fingerprinting halts runaway agent loops.
Provider fallback	Automatic failover across OpenAI-compatible providers with credential isolation.
Circuit breakers	Per-provider outbound, per-upstream inbound. Closed → open → half-open.
GenAI observability	OpenTelemetry `gen_ai.*` spans per the official semantic conventions, plus Prometheus `/metrics`.
Guardrails	PII and prompt-injection detection wired into the middleware — violations return 422.

How it compares

	RateGuard	express-rate-limit	LiteLLM	Kong
Multi-language	✅ Go + Node + Python	❌ JS only	❌ Python only	❌
Zero infrastructure	✅ Middleware	✅	❌ Proxy required	❌ Gateway
In-process outbound tracking	✅ Client wrapper	❌	❌ Proxy only	❌
Agent pre-flight (MCP)	✅ 5 tools + stdio	❌	❌	❌
Agent loop detection	✅	❌	❌	❌
LLM token budgets	✅	❌	✅	❌
GenAI OTel conventions	✅	❌	❌	❌
Open source	✅ MIT	✅	✅	Partial

Ready? Install it in one line →