Rate limit your API
The inbound middleware guards your own endpoints with the token bucket algorithm, with buckets that can be scoped per tenant, per route, and per upstream. Adapters exist for the frameworks you already use, and the semantics are identical across all of them.
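For readers new to the algorithm, here is a minimal sketch of a token bucket in Go. It is not rateguard's implementation: the `TokenBucket` type, its fields, and the use of plain seconds for time are all assumptions made to keep the example short and deterministic.

```go
package main

import (
	"fmt"
	"math"
)

// TokenBucket is an illustrative type, not part of rateguard's API.
type TokenBucket struct {
	Capacity float64 // maximum burst size, in tokens
	Rate     float64 // refill rate, in tokens per second
	tokens   float64 // tokens currently available (may be fractional)
	last     float64 // time of the last refill, in seconds
}

// NewTokenBucket returns a bucket that starts full at time now.
func NewTokenBucket(capacity, rate, now float64) *TokenBucket {
	return &TokenBucket{Capacity: capacity, Rate: rate, tokens: capacity, last: now}
}

// Allow refills the bucket for the time elapsed since the last call,
// capped at Capacity, then spends one token if at least one is available.
func (b *TokenBucket) Allow(now float64) bool {
	if elapsed := now - b.last; elapsed > 0 {
		b.tokens = math.Min(b.Capacity, b.tokens+elapsed*b.Rate)
		b.last = now
	}
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	b := NewTokenBucket(2, 1, 0) // burst of 2, refills 1 token per second
	for _, t := range []float64{0, 0, 0, 0.5, 1} {
		fmt.Printf("t=%.1fs allowed=%v\n", t, b.Allow(t))
	}
}
```

The first two requests at t=0 spend the initial burst, the next two are denied while the bucket refills, and the request at t=1 succeeds once a full token has accumulated.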
Middleware adapters
rg := rateguard.New(rateguard.Config{Preset: "standard"})
// net/http
http.Handle("/", rg.HTTPMiddleware(myHandler))
// chi
r := chi.NewRouter()
r.Use(rg.ChiMiddleware())
// Prometheus
http.Handle("/metrics", rg.Metrics())
What a denial looks like
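Denied requests get a 429 with a computed Retry-After: retry_after = ceil((1.0 − tokens) / rps) × 1000ms, where tokens is the bucket's current (fractional) token count and rps is its refill rate in tokens per second. Clients that honor Retry-After, as most LLM SDKs do, back off for exactly as long as needed, which avoids thundering-herd retries.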
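The formula above can be sketched in a few lines of Go. The function name `retryAfterMillis` is hypothetical, and the early return for a bucket that already holds a full token is an assumption; only the arithmetic comes from the formula itself.

```go
package main

import (
	"fmt"
	"math"
)

// retryAfterMillis evaluates ceil((1.0 - tokens) / rps) * 1000 ms.
// tokens is the bucket's current (fractional) token count and rps is the
// refill rate in tokens per second. The caller must pass rps > 0.
func retryAfterMillis(tokens, rps float64) int64 {
	if tokens >= 1 {
		return 0 // assumption: a full token is available, so no wait is needed
	}
	seconds := math.Ceil((1.0 - tokens) / rps)
	return int64(seconds) * 1000
}

func main() {
	fmt.Println(retryAfterMillis(0, 2))      // empty bucket, 2 tokens per second
	fmt.Println(retryAfterMillis(0.25, 0.5)) // 0.75 tokens short at 0.5 tokens per second
}
```

Because the ceiling is taken before the multiplication, the result is always a whole number of seconds, which matches the integer delay-seconds form of the HTTP Retry-After header.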
Scoping
Set TenantID, RouteID, and UpstreamID in the config to partition buckets. Pass a Redis client for distributed limiting across replicas; without one, limiting is process-local.
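To illustrate what partitioning means, the sketch below keeps an independent budget per (tenant, route, upstream) combination in a process-local map. The `Scope` and `Registry` types are hypothetical, not rateguard's API; the field names simply mirror the config fields above, and a plain non-refilling counter stands in for a real bucket to keep the focus on the keying.

```go
package main

import (
	"fmt"
	"sync"
)

// Scope identifies one bucket. Hypothetical type; the field names mirror
// the config fields described above.
type Scope struct {
	TenantID, RouteID, UpstreamID string
}

// Registry holds one budget per Scope. It is process-local, like limiting
// without a Redis client.
type Registry struct {
	mu        sync.Mutex
	burst     int
	remaining map[Scope]int
}

// NewRegistry returns a registry whose scopes each start with burst units.
func NewRegistry(burst int) *Registry {
	return &Registry{burst: burst, remaining: make(map[Scope]int)}
}

// Take spends one unit from the scope's budget, creating the budget on
// first use. It reports whether the request is allowed.
func (r *Registry) Take(s Scope) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	left, seen := r.remaining[s]
	if !seen {
		left = r.burst
	}
	if left == 0 {
		return false
	}
	r.remaining[s] = left - 1
	return true
}

func main() {
	r := NewRegistry(1)
	acme := Scope{TenantID: "acme", RouteID: "/v1/chat", UpstreamID: "upstream-a"}
	globex := Scope{TenantID: "globex", RouteID: "/v1/chat", UpstreamID: "upstream-a"}
	fmt.Println(r.Take(acme))   // first request for acme
	fmt.Println(r.Take(acme))   // acme's budget is now spent
	fmt.Println(r.Take(globex)) // globex draws from its own budget
}
```

Using a comparable struct as the map key means that changing any one of the three identifiers selects a different bucket, so one tenant exhausting its budget does not affect another.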
Wired extras
The middleware chain also runs guardrails against request bodies (violations → 422) and loop detection when agents send X-Sequence-Depth (loops → 429 loop_detected). Every decision is observable at /metrics and as events.