
Presets

RateGuard ships with 8 presets that cover common workloads, from local development to multi-agent systems. Set a preset once, then override any field. The same presets exist in all three SDKs with identical values.

Preset	RPS	Burst	Tokens/hr	Tokens/day	Tokens/mo	Mode	Use case
`dev`	10	20	1K	10K	100K	hard-stop	Local development
`standard`	100	200	10K	100K	1M	hard-stop	Production APIs
`high-throughput`	1,000	2,000	100K	1M	10M	hard-stop	High-volume services
`streaming-llm`	200	500	500K	5M	500M	soft-stop	Real-time LLM streaming
`agent-orchestrator`	500	1,000	1M	10M	1B	soft-stop	Multi-agent systems
`llm-heavy`	500	1,000	250K	2.5M	250M	soft-stop	LLM-intensive apps
`mcp-server`	30	60	50K	500K	50M	hard-stop	MCP tool servers
`strict-upstream-protection`	50	75	5K	20K	2M	hard-stop	Fragile upstreams

Aliases

Friendly aliases resolve to the same presets: free→dev, starter→standard, pro→high-throughput, business/enterprise→llm-heavy, streaming→streaming-llm, agent/multi-agent→agent-orchestrator, mcp→mcp-server.
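Alias resolution can be pictured as a plain lookup table. The sketch below is an illustration only: `aliases` and `resolvePreset` are hypothetical names, not part of the RateGuard API, and the behavior for unknown names (returned unchanged) is an assumption. Only the alias pairs themselves come from the list above.

```go
package main

import "fmt"

// aliases maps friendly names to canonical preset names.
// The pairs mirror the documented alias list; the map itself is illustrative.
var aliases = map[string]string{
	"free":        "dev",
	"starter":     "standard",
	"pro":         "high-throughput",
	"business":    "llm-heavy",
	"enterprise":  "llm-heavy",
	"streaming":   "streaming-llm",
	"agent":       "agent-orchestrator",
	"multi-agent": "agent-orchestrator",
	"mcp":         "mcp-server",
}

// resolvePreset returns the canonical preset name for an alias.
// Assumption: names that are not aliases are returned unchanged.
func resolvePreset(name string) string {
	if canonical, ok := aliases[name]; ok {
		return canonical
	}
	return name
}

func main() {
	fmt.Println(resolvePreset("pro"))        // high-throughput
	fmt.Println(resolvePreset("enterprise")) // llm-heavy
	fmt.Println(resolvePreset("standard"))   // standard (already canonical)
}
```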

Override any field

Start from a preset, then override exactly what differs for your workload:

rg := rateguard.New(rateguard.Config{
    Preset:             "streaming-llm",
    RequestsPerSecond:  300,        // override preset RPS
    TokenBudgetPerHour: 750_000,    // override token budget
})
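One way to think about the override step is as a field-by-field merge over the preset's values. The following is a minimal sketch under a stated assumption: a zero value means "not set, keep the preset value". The `Limits` type and `merge` function are hypothetical and trimmed to three fields; how RateGuard merges internally is not specified here. The `streaming-llm` numbers are taken from the preset table.

```go
package main

import "fmt"

// Limits is a hypothetical, trimmed-down view of a preset's fields.
type Limits struct {
	RequestsPerSecond  int
	Burst              int
	TokenBudgetPerHour int
}

// merge returns base with every non-zero field of override applied.
// Assumption: a zero value means "not set, keep the preset value".
func merge(base, override Limits) Limits {
	out := base
	if override.RequestsPerSecond != 0 {
		out.RequestsPerSecond = override.RequestsPerSecond
	}
	if override.Burst != 0 {
		out.Burst = override.Burst
	}
	if override.TokenBudgetPerHour != 0 {
		out.TokenBudgetPerHour = override.TokenBudgetPerHour
	}
	return out
}

func main() {
	// streaming-llm values from the preset table: 200 RPS, burst 500, 500K tokens/hr.
	streamingLLM := Limits{RequestsPerSecond: 200, Burst: 500, TokenBudgetPerHour: 500_000}

	got := merge(streamingLLM, Limits{RequestsPerSecond: 300, TokenBudgetPerHour: 750_000})
	fmt.Printf("%+v\n", got) // {RequestsPerSecond:300 Burst:500 TokenBudgetPerHour:750000}
}
```

A consequence of the zero-means-unset assumption is that a field cannot be overridden to zero; a design that needs that would typically use pointer fields or functional options instead.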

hard-stop vs soft-stop

hard-stop rejects once the budget is exhausted. soft-stop queues instead of rejecting — the right default for streaming and agent workloads where a hard 429 mid-conversation is worse than a short wait. Details in Token budgets.

The algorithm underneath

All three SDKs use the token bucket algorithm, the same well-established approach used by Kong, Envoy, and AWS:

tokens = min(burst, tokens + elapsed × rps)
Allow:  tokens >= 1.0 → consume 1
Deny:   retry_after = ceil((1.0 - tokens) / rps) × 1000ms

Every limiter also implements Peek — a non-consuming pre-flight query. That single design decision is what makes agent-native behavior possible; see Agents & MCP.
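The distinction between a consuming check and a non-consuming one can be shown with a deliberately tiny stand-in. This sketch leaves out refill entirely and uses hypothetical signatures; the actual shape of `Peek` in each SDK may differ.

```go
package main

import "fmt"

// limiter is a minimal stand-in with no refill, used only to contrast
// Peek and Allow. The method signatures are assumptions of this sketch.
type limiter struct {
	tokens float64
}

// Peek reports whether a request would be allowed right now.
// It never changes the limiter's state.
func (l *limiter) Peek() bool {
	return l.tokens >= 1.0
}

// Allow consumes one token when at least one is available.
func (l *limiter) Allow() bool {
	if l.tokens < 1.0 {
		return false
	}
	l.tokens--
	return true
}

func main() {
	l := &limiter{tokens: 1}

	fmt.Println(l.Peek(), l.Peek()) // true true (peeking twice costs nothing)
	fmt.Println(l.Allow())          // true (the single token is consumed)
	fmt.Println(l.Peek())           // false
}
```

Under concurrency a Peek result is advisory: another caller may consume the token between Peek and Allow, so callers should still handle a denial from Allow.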