Budgets vs Rate Limits: Which Control to Reach For
Both return 429, but they solve different problems. Here is a decision guide for when to use a budget cap, when to use a rate limit, and why you almost always want both.
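To make "why you almost always want both" concrete, here is a minimal Python sketch. It assumes a budget cap is a running spend total for one period and a rate limit is a rolling 60-second request count (RPM). The class names, the `check` helper, and the dollar and RPM figures are hypothetical and chosen for illustration only.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BudgetCap:
    """Caps total spend for one budget period (hypothetical: USD, single period)."""
    limit_usd: float
    spent_usd: float = 0.0

    def allow(self, cost_usd: float) -> bool:
        # Reject any call that would push cumulative spend past the cap.
        if self.spent_usd + cost_usd > self.limit_usd:
            return False
        self.spent_usd += cost_usd
        return True


@dataclass
class RateLimit:
    """Caps velocity: at most `rpm` requests in any rolling 60-second window."""
    rpm: int
    stamps: deque = field(default_factory=deque)

    def allow(self, now_s: float) -> bool:
        # Forget requests that have aged out of the 60 s window.
        while self.stamps and now_s - self.stamps[0] >= 60.0:
            self.stamps.popleft()
        if len(self.stamps) >= self.rpm:
            return False
        self.stamps.append(now_s)
        return True


def check(budget: BudgetCap, limit: RateLimit, now_s: float, cost_usd: float) -> Optional[str]:
    """Return None if the call may proceed, else the name of the control that said no.

    In this sketch a call that clears the rate limit uses up a slot even if
    the budget then rejects it.
    """
    if not limit.allow(now_s):
        return "rate_limit"
    if not budget.allow(cost_usd):
        return "budget"
    return None


# Scenario 1: a burst of 100 cheap calls inside one second.
budget, limit = BudgetCap(limit_usd=50.0), RateLimit(rpm=60)
burst = [check(budget, limit, now_s=i / 100, cost_usd=0.01) for i in range(100)]

# Scenario 2: a slow trickle of expensive calls, one every 10 seconds.
budget, limit = BudgetCap(limit_usd=50.0), RateLimit(rpm=60)
trickle = [check(budget, limit, now_s=i * 10.0, cost_usd=5.0) for i in range(20)]
```

In the burst, the rate limit rejects 40 of the 100 calls while only about $0.60 has been spent, so the budget never notices. In the trickle, six calls per minute never trip the rate limit, and the budget cap is what stops the eleventh call once $50 is gone. Each control catches the failure the other one misses.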
A 429 and a 402 mean different things and need different client logic. Here is how to handle rate limits, budget caps, and out-of-credits responses gracefully — with backoff, not blind retries.
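A client-side sketch of "backoff, not blind retries" follows, written in Python. It assumes that a 402 signals an out-of-credits condition, which no amount of waiting will fix, and that a 429 may carry a `Retry-After` delay in seconds. The `send` callable, which returns `(status, retry_after_s)`, along with the exception names and default delays, is hypothetical and stands in for whatever HTTP client you actually use.

```python
import random
import time
from typing import Callable, Optional


class OutOfCredits(Exception):
    """Raised on 402: retrying cannot help until the balance is topped up."""


class GaveUp(Exception):
    """Raised when 429s persist after the maximum number of attempts."""


def call_with_backoff(
    send: Callable[[], tuple[int, Optional[float]]],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` (hypothetical transport returning (status, retry_after_s))."""
    for attempt in range(max_attempts):
        status, retry_after_s = send()
        if status == 402:
            # Out of credits: fail fast instead of hammering the gateway.
            raise OutOfCredits("balance exhausted; top up before retrying")
        if status != 429:
            return status
        if attempt == max_attempts - 1:
            break  # no point sleeping before giving up
        if retry_after_s is not None:
            # Prefer the server's own hint when it sends one.
            delay = retry_after_s
        else:
            # Exponential backoff with full jitter, capped at max_delay_s.
            delay = random.uniform(0.0, min(max_delay_s, base_delay_s * 2 ** attempt))
        sleep(delay)
    raise GaveUp(f"still getting 429 after {max_attempts} attempts")
```

A budget cap also answers 429, so a capped key in this sketch uses up its bounded attempts and surfaces `GaveUp` instead of retrying forever. The `sleep` parameter is injectable so the retry schedule can be tested without real waiting.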
Rate limits cap velocity, not total spend — and they're a security boundary, not a knob. Here is how RPM/TPM limits work per key, team, and org, and why the caller can never override them.
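The Python sketch below shows one way RPM and TPM limits could be enforced at several scopes at once. The assumptions are these: each scope (key, team, org) keeps its own rolling 60-second window; a request is admitted only if every configured scope has headroom for both one more request and its token count; and the token count is an estimate known before the call. `Window`, `ScopedLimiter`, and the `("key", id)` scope layout are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Window:
    """Rolling 60-second window of (timestamp, tokens) events for one scope."""
    rpm: int
    tpm: int
    events: deque = field(default_factory=deque)

    def _trim(self, now_s: float) -> None:
        # Drop events older than 60 seconds.
        while self.events and now_s - self.events[0][0] >= 60.0:
            self.events.popleft()

    def fits(self, now_s: float, tokens: int) -> bool:
        self._trim(now_s)
        used_tokens = sum(t for _, t in self.events)
        return len(self.events) < self.rpm and used_tokens + tokens <= self.tpm

    def record(self, now_s: float, tokens: int) -> None:
        self.events.append((now_s, tokens))


class ScopedLimiter:
    """Server-side RPM/TPM limits keyed by ("key" | "team" | "org", id)."""

    def __init__(self, limits: dict[tuple[str, str], tuple[int, int]]):
        # Limits come from gateway configuration only; no request field can change them.
        self._windows = {scope: Window(rpm, tpm) for scope, (rpm, tpm) in limits.items()}

    def admit(self, key: str, team: str, org: str, tokens: int, now_s: float) -> bool:
        scopes = [("key", key), ("team", team), ("org", org)]
        windows = [self._windows[s] for s in scopes if s in self._windows]
        # All-or-nothing: reject unless every scope has headroom, then record in all.
        if not all(w.fits(now_s, tokens) for w in windows):
            return False
        for w in windows:
            w.record(now_s, tokens)
        return True


limiter = ScopedLimiter({
    ("key", "k1"): (2, 1000),       # 2 RPM, 1,000 TPM for one key
    ("team", "t1"): (10, 1500),     # the team shares 1,500 TPM
    ("org", "o1"): (100, 100_000),
})
```

Note that `admit` takes no parameter through which a caller could raise its own limits. The only way to change them is to change the limiter's configuration. A key can be stopped by its own limit or by a shared team or org limit, even while its own window still has room.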
Before an LLM call leaves the gateway it clears four independent limits: credit balance, budget cap, rate limit, and guardrails. Here is what each one checks and why they are separate.
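Here is a minimal Python sketch of the four pre-flight checks as separate functions. Each function looks only at the request and its own state, and the first rejection wins. The checks run in the order the text lists them, which is an assumption rather than a confirmed gateway order. All state dictionaries, the `RPM_LIMIT` value, and the placeholder guardrail rule are hypothetical. The sketch also leaves out debiting credit and budget after the call and resetting the per-minute counter.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    key: str
    est_cost_usd: float
    prompt: str


# A check inspects the request (plus its own state) and returns a reason or None.
Check = Callable[[Request], Optional[str]]

# Hypothetical per-key state; a real gateway would load this from storage.
credit_balance_usd = {"k1": 10.0}
budget_remaining_usd = {"k1": 2.0}
requests_this_minute = {"k1": 0}
RPM_LIMIT = 60


def credit_check(req: Request) -> Optional[str]:
    ok = credit_balance_usd.get(req.key, 0.0) >= req.est_cost_usd
    return None if ok else "insufficient credit"


def budget_check(req: Request) -> Optional[str]:
    ok = budget_remaining_usd.get(req.key, 0.0) >= req.est_cost_usd
    return None if ok else "budget cap reached"


def rate_check(req: Request) -> Optional[str]:
    used = requests_this_minute.get(req.key, 0)
    if used >= RPM_LIMIT:
        return "rate limit exceeded"
    requests_this_minute[req.key] = used + 1  # count this request against the minute
    return None


def guardrail_check(req: Request) -> Optional[str]:
    # Placeholder rule standing in for real content policies.
    return "prompt blocked by policy" if "BLOCKED" in req.prompt else None


CHECKS: list[tuple[str, Check]] = [
    ("credit_balance", credit_check),
    ("budget_cap", budget_check),
    ("rate_limit", rate_check),
    ("guardrails", guardrail_check),
]


def preflight(req: Request) -> Optional[str]:
    """Run each check in order; the first rejection wins. None means the call may leave."""
    for name, check in CHECKS:
        reason = check(req)
        if reason is not None:
            return f"{name}: {reason}"
    return None
```

Keeping the checks separate means a pass from one tells you nothing about the others. A key with plenty of credit can still hit its budget cap, and a request within budget can still be rate-limited or blocked by a guardrail. Each check can also be tuned or replaced without touching the rest.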