Presets

RateGuard ships with 8 presets covering workloads from local development to multi-agent systems. Set a preset once, then override any field. The same presets exist in all three SDKs with identical values.

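| Preset | RPS | Burst | Tokens/hr | Tokens/day | Tokens/mo | Mode | Use case |
|---|---|---|---|---|---|---|---|
| dev | 10 | 20 | 1K | 10K | 100K | hard-stop | Local development |
| standard | 100 | 200 | 10K | 100K | 1M | hard-stop | Production APIs |
| high-throughput | 1,000 | 2,000 | 100K | 1M | 10M | hard-stop | High-volume services |
| streaming-llm | 200 | 500 | 500K | 5M | 500M | soft-stop | Real-time LLM streaming |
| agent-orchestrator | 500 | 1,000 | 1M | 10M | 1B | soft-stop | Multi-agent systems |
| llm-heavy | 500 | 1,000 | 250K | 2.5M | 250M | soft-stop | LLM-intensive apps |
| mcp-server | 30 | 60 | 50K | 500K | 50M | hard-stop | MCP tool servers |
| strict-upstream-protection | 50 | 75 | 5K | 20K | 2M | hard-stop | Fragile upstreams |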

Aliases

Friendly aliases resolve to the same presets: free → dev, starter → standard, pro → high-throughput, business/enterprise → llm-heavy, streaming → streaming-llm, agent/multi-agent → agent-orchestrator, mcp → mcp-server.
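
The alias list can be read as a plain lookup from alias to canonical preset name. The sketch below illustrates that reading in Go; `aliases` and `resolvePreset` are hypothetical names chosen for this example, not part of the RateGuard API, and the SDKs may resolve aliases differently.

```go
package main

import "fmt"

// aliases maps each friendly alias to its canonical preset name,
// following the alias list above. Hypothetical; not the SDK's own table.
var aliases = map[string]string{
	"free":        "dev",
	"starter":     "standard",
	"pro":         "high-throughput",
	"business":    "llm-heavy",
	"enterprise":  "llm-heavy",
	"streaming":   "streaming-llm",
	"agent":       "agent-orchestrator",
	"multi-agent": "agent-orchestrator",
	"mcp":         "mcp-server",
}

// resolvePreset returns the canonical preset for an alias, or the
// input unchanged when it is not a known alias.
func resolvePreset(name string) string {
	if canonical, ok := aliases[name]; ok {
		return canonical
	}
	return name
}

func main() {
	fmt.Println(resolvePreset("enterprise")) // llm-heavy
	fmt.Println(resolvePreset("standard"))   // standard
}
```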

Override any field

Start from a preset, then override exactly what differs for your workload:

rg := rateguard.New(rateguard.Config{
    Preset:             "streaming-llm",
    RequestsPerSecond:  300,        // override preset RPS
    TokenBudgetPerHour: 750_000,    // override token budget
})
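
One way to picture the override behavior is as a field-by-field merge in which unset fields fall back to the preset. The sketch below assumes that a zero value means "not overridden"; the `limits` type and the `merge` and `orDefault` helpers are hypothetical and only illustrate the idea, using the streaming-llm values from the table above.

```go
package main

import "fmt"

// limits is a hypothetical, trimmed-down view of a preset's fields.
type limits struct {
	RequestsPerSecond  int
	Burst              int
	TokenBudgetPerHour int
}

// orDefault returns override when it is set (non-zero), else preset.
func orDefault(override, preset int) int {
	if override != 0 {
		return override
	}
	return preset
}

// merge applies non-zero override fields on top of the preset values.
// Assumption: a zero field means "not overridden".
func merge(preset, override limits) limits {
	return limits{
		RequestsPerSecond:  orDefault(override.RequestsPerSecond, preset.RequestsPerSecond),
		Burst:              orDefault(override.Burst, preset.Burst),
		TokenBudgetPerHour: orDefault(override.TokenBudgetPerHour, preset.TokenBudgetPerHour),
	}
}

func main() {
	streamingLLM := limits{RequestsPerSecond: 200, Burst: 500, TokenBudgetPerHour: 500_000}
	got := merge(streamingLLM, limits{RequestsPerSecond: 300, TokenBudgetPerHour: 750_000})
	fmt.Printf("%+v\n", got) // {RequestsPerSecond:300 Burst:500 TokenBudgetPerHour:750000}
}
```

Under this assumption, Burst keeps the preset value of 500 because the override leaves it unset.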

hard-stop vs soft-stop

hard-stop rejects once the budget is exhausted. soft-stop queues instead of rejecting — the right default for streaming and agent workloads where a hard 429 mid-conversation is worse than a short wait. Details in Token budgets.

The algorithm underneath

All three SDKs use the token bucket algorithm, a long-established approach that is also used by widely deployed gateways and proxies such as Envoy and AWS API Gateway:

tokens = min(burst, tokens + elapsed × rps)
Allow:  tokens >= 1.0 → consume 1
Deny:   retry_after = ceil((1.0 - tokens) / rps) × 1000ms

Every limiter also implements Peek — a non-consuming pre-flight query. That single design decision is what makes agent-native behavior possible; see Agents & MCP.
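
To make "non-consuming" concrete, the sketch below contrasts a peek with a consuming call on a deliberately tiny stand-in limiter. The `limiter` type, the `Take` method, and the `Peek(n int) bool` signature are assumptions for illustration; the actual Peek signature in the SDKs may differ.

```go
package main

import "fmt"

// limiter is a tiny stand-in: a fixed pool of tokens with no refill,
// just enough to show the difference between Peek and Take.
type limiter struct{ tokens int }

// Peek reports whether n tokens are available without consuming them.
func (l *limiter) Peek(n int) bool { return l.tokens >= n }

// Take consumes n tokens if available and reports whether it did.
func (l *limiter) Take(n int) bool {
	if !l.Peek(n) {
		return false
	}
	l.tokens -= n
	return true
}

func main() {
	l := &limiter{tokens: 3}

	// A caller planning a two-token operation can check the budget first.
	canRun := l.Peek(2)
	fmt.Println(canRun, l.tokens) // true 3: nothing was consumed

	took := l.Take(2)
	fmt.Println(took, l.tokens) // true 1

	canRun = l.Peek(2)
	fmt.Println(canRun, l.tokens) // false 1
}
```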