# RateGuard: AI-Native Rate Limiting SDK

> Go, Node.js, Python middleware. Outbound LLM spend tracking (wrap your SDK's HTTP client), pre-flight rate limit queries for AI agents (MCP), token budgets, GenAI OTel, circuit breakers, provider fallback, loop detection, guardrails. No proxy needed.

## Outbound spend tracking (the headline)

Wrap the HTTP client your LLM SDK already uses to get real token usage (JSON + SSE streaming) metered into budgets, per-provider circuit breakers, and OpenAI-compatible fallback:

- Go: `client := rg.WrapClient(&http.Client{})` → `openai.NewClient(option.WithHTTPClient(client))`
- Node: `new OpenAI({ fetch: rg.wrapFetch() })`
- Python: `OpenAI(http_client=rg.wrap_httpx_client())` (httpx lazy import, zero runtime deps kept)

Detects 16 OpenAI-compatible hosts + Anthropic, Gemini, Vertex, Azure OpenAI, AWS Bedrock, self-hosted vLLM/llama.cpp. Enforce mode synthesizes provider-native 429/503 (SDK retry logic just works); observe mode only meters.

## Quick Start

### Go

```go
import rateguard "github.com/varbees/rateguard/packages/sdk-go"

rg := rateguard.New(rateguard.Config{Preset: "streaming-llm"})
http.Handle("/metrics", rg.Metrics())

// Expose limits to MCP clients (Claude Code, Cursor, custom agents):
_ = rg.ServeMCP(ctx, os.Stdin, os.Stdout)
```

### Node.js

```ts
import { RateGuard } from '@varbees/rateguard-node';

const rg = new RateGuard({ preset: 'streaming-llm' });
const tools = rg.mcpTools(); // 5 MCP tools, peek semantics
```

### Python

```python
from rateguard import RateGuard

rg = RateGuard(preset="streaming-llm")
tools = rg.mcp_tools()  # 5 MCP tools, peek semantics
```

## Architecture

RateGuard is MIDDLEWARE (runs inside your app), not a proxy or gateway. Three SDKs, identical behavior across Go/Node/Python.

## Core Algorithm (all 3 SDKs)

Token Bucket (RFC standard, same as Kong/Envoy/AWS):

`tokens = min(burst, tokens + elapsed × rps)`

Every limiter also implements Peek, a non-consuming pre-flight query.
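The refill formula and the Peek idea can be pictured with a small standalone sketch. This is not RateGuard's implementation: the `TokenBucket` class, its `peek`/`allow` method names, and the injectable `clock` parameter are all hypothetical, chosen only to illustrate how a non-consuming query can share the refill math with the consuming path.

```python
import time


class TokenBucket:
    """Illustrative token bucket with a non-consuming peek (hypothetical API)."""

    def __init__(self, rps, burst, clock=time.monotonic):
        self.rps = float(rps)        # refill rate, tokens per second
        self.burst = float(burst)    # bucket capacity
        self.tokens = float(burst)   # start with a full bucket
        self.clock = clock           # injectable clock (assumption, for testing)
        self.last = clock()          # time of the last state update

    def _level(self, now):
        # tokens = min(burst, tokens + elapsed × rps)
        elapsed = max(0.0, now - self.last)
        return min(self.burst, self.tokens + elapsed * self.rps)

    def peek(self, cost=1.0):
        """Return True if `cost` tokens are available; never changes state."""
        return self._level(self.clock()) >= cost

    def allow(self, cost=1.0):
        """Refill, then consume `cost` tokens if enough are available."""
        now = self.clock()
        self.tokens = self._level(now)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage: ask first, spend only if the answer is yes.
bucket = TokenBucket(rps=10, burst=20)
if bucket.peek():
    bucket.allow()
```

In this sketch `peek` followed by `allow` is not atomic, so a concurrent caller could take the token in between; a pre-flight answer is best treated as advisory, with `allow` remaining the authoritative check.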
## 8 Presets

| Preset | RPS | Tokens/hr | Mode |
|---|---|---|---|
| dev | 10 | 1K | hard-stop |
| standard | 100 | 10K | hard-stop |
| high-throughput | 1000 | 100K | hard-stop |
| streaming-llm | 200 | 500K | soft-stop |
| agent-orchestrator | 500 | 1M | soft-stop |
| llm-heavy | 500 | 250K | soft-stop |
| mcp-server | 30 | 50K | hard-stop |
| strict-upstream-protection | 50 | 5K | hard-stop |

## Features

- MCP tools (5, all SDKs): get_rate_limit_state, get_token_budget, get_circuit_breaker_state, check_loop, list_limits. Peek semantics: querying never consumes budget
- MCP stdio server (Go): zero-dependency JSON-RPC 2.0, plugs into any MCP client config
- Loop detection: SHA-256 payload fingerprinting + max sequence depth, LRU-bounded; wired into Go middleware via X-Sequence-Depth header (429 loop_detected)
- Token budgets (hour/day/month, hard-stop or soft-stop); estimate-based reservations (Go: EstimatedTokensPerRequest) keep concurrency high under hard-stop
- Circuit breakers (closed → open → half-open)
- GenAI OTel per semconv: span name "{operation} {model}", gen_ai.usage.input_tokens/output_tokens, low-cardinality error.type; public API rg.StartGenAICall → span.End with auto cost estimation, TTFT/TPOT
- 14-model pricing table, verified against provider pricing pages
- Content guardrails: PII, prompt injection (5 vectors), token/length limits; wired into Go middleware (422)
- Prometheus /metrics endpoint (stdlib only) with live counters: requests, rate limit hits, budget exhaustion, breaker trips, tokens consumed, loop stats
- IETF RateLimit-* response headers (draft-ietf-httpapi-ratelimit-headers)
- Events/webhooks for every request (Go)
- Redis distributed limiter: atomic Lua GCRA, read-only peek script (Go)

## Tests

All 3 SDKs green, 155 total:

- `go test ./...` (61 passing)
- `bun run test` (46 passing)
- `pytest -q` (48 passing)

## Docs

- AGENTS.md: full agent contract with architecture, domain types, rules, commands
- ARCHITECTURE.md: positioning vs Datadog/Kong/Cloudflare
- docs/API_REFERENCE.md: all config, adapters, provider chain, guardrails
- docs/GENAI_OBSERVABILITY.md: OTel integration, model pricing, streaming
- docs/RELEASE_NOTES.md: changelog

## vs Competition

The only in-process SDK where agents can ask "can I make this call?" before spending tokens, across Go, Node, and Python. Nobody else combines multi-language SDKs + MCP pre-flight tools + LLM token budgets + GenAI OTel + circuit breakers + loop detection + guardrails in one package.
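To make the "can I make this call?" question concrete, here is a rough sketch of a pre-flight check against an hourly token budget. It is not RateGuard's API: the `TokenBudget` class, its `peek`/`record` methods, and the fixed-window reset are hypothetical. It also assumes one plausible reading of the two modes, namely that hard-stop refuses a call that would exceed the budget while soft-stop lets it through and reports the overage.

```python
import time


class TokenBudget:
    """Illustrative fixed-window token budget (hypothetical, not RateGuard's code)."""

    def __init__(self, limit, window_seconds=3600, mode="hard-stop",
                 clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.mode = mode             # "hard-stop" or "soft-stop" (assumed semantics)
        self.clock = clock           # injectable clock (assumption, for testing)
        self.used = 0
        self.window_start = clock()

    def _used_now(self, now):
        # Usage counts as zero once the window has elapsed; state is not touched.
        if now - self.window_start >= self.window_seconds:
            return 0
        return self.used

    def peek(self, estimated_tokens):
        """Pre-flight query: reports the decision without consuming budget."""
        remaining = max(0, self.limit - self._used_now(self.clock()))
        fits = estimated_tokens <= remaining
        return {
            "allowed": fits or self.mode == "soft-stop",
            "remaining": remaining,
            "over_budget": not fits,
        }

    def record(self, actual_tokens):
        """Meter real usage after the call, starting a new window if needed."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.used = 0
        self.used += actual_tokens


# Usage: an agent checks before spending, then reports what it actually used.
budget = TokenBudget(limit=500_000, mode="soft-stop")
if budget.peek(estimated_tokens=2_000)["allowed"]:
    budget.record(actual_tokens=1_850)
```

Keeping `peek` read-only is the point of the design: an agent can poll it as often as it likes while planning, and only `record` moves the counter.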