
Rate limit your API

The inbound middleware guards your own endpoints with the token bucket algorithm, and buckets can be scoped per tenant, per route, and per upstream. Adapters exist for common Go HTTP stacks (for example net/http and chi), and the limiting semantics are identical in each.

Middleware adapters

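```go
// Create the limiter from a named preset; each adapter below is a method on rg.
rg := rateguard.New(rateguard.Config{Preset: "standard"})

// net/http: wrap an existing http.Handler (myHandler).
http.Handle("/", rg.HTTPMiddleware(myHandler))

// chi: register the limiter as router-wide middleware.
r := chi.NewRouter()
r.Use(rg.ChiMiddleware())

// Prometheus: expose limiter metrics for scraping (a handler, not a middleware).
http.Handle("/metrics", rg.Metrics())
```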

What a denial looks like

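Denied requests get a 429 with a computed Retry-After:

retry_after = ceil((1.0 − tokens) / rps) × 1000ms

Here tokens is the bucket's current (fractional) token count and rps is its refill rate, so (1.0 − tokens) / rps is the time until one full token is available. The ceiling rounds that up to a whole second, because the Retry-After header carries an integer number of seconds; a client that waits that long finds at least one token. Clients that honor Retry-After, including the common LLM SDKs, therefore wait just long enough instead of retrying immediately, which avoids thundering-herd retries.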

Scoping

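Set TenantID, RouteID, and UpstreamID in the config to partition buckets along those dimensions. Pass a Redis client to share bucket state across replicas (distributed limiting). Without one, limiting is process-local: each replica keeps its own buckets, so behind a load balancer the aggregate limit can reach the configured rate multiplied by the number of replicas.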

Wired extras

The middleware chain also runs guardrails against request bodies (violations return 422) and, when agents send an X-Sequence-Depth header, loop detection (detected loops return 429 with loop_detected). Every decision is observable as metrics at /metrics and as events.