The Four Ceilings Every LLM Request Passes
Before an LLM call leaves the gateway it clears four independent limits: credit balance, budget cap, rate limit, and guardrails. Here is what each one checks and why they are separate.
People often collapse "spend control" into a single setting. In practice, an LLM gateway enforces four separate controls on every request, each answering a different question and each able to reject a call on its own. Knowing how they differ is what separates a budget that protects you from one that surprises you.
This is the pre-flight checklist NemoRouter runs before a request is ever forwarded to a provider.
The four controls at a glance
| Ceiling | Question it answers | Scope | Rejects with |
|---|---|---|---|
| Credit balance | Is there money in the account at all? | Org | 402 |
| Budget cap | Has this scope spent its allowance? | Org / team / key | 429 |
| Rate limit | Too many requests/tokens too fast? | Key / team / org | 429 |
| Guardrails | Is the content allowed? | Key > team > org | 400-class block |
Conceptually they run in parallel, and each is independent: passing one says nothing about the others. A request with plenty of credits can still be blocked by a budget cap; a request under budget can still be rate-limited; a request under every numeric limit can still be stopped by a guardrail.
Ceiling 1 — Credit balance: is there money?
Credits are the fuel. Before anything else, the gateway reserves the call's estimated cost against the org's balance (see reserve-and-settle). If the balance can't cover the reservation, the request returns 402 Payment Required — not 429. The distinction matters: 402 means "top up or enable auto-topup," while 429 means "you have money, but a limit stopped you."
A balance can never go negative, so this ceiling is the hard floor under everything else.
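The reserve step described above can be sketched roughly as follows. This is a minimal illustration, not NemoRouter's implementation: `OrgLedger`, `InsufficientCredits`, `reserve`, and `settle` are hypothetical names, and capping the final charge at the reserved amount is an assumption made here so the sketch keeps the balance non-negative.

```python
from dataclasses import dataclass
from decimal import Decimal


class InsufficientCredits(Exception):
    """Hypothetical error that a gateway would surface as HTTP 402."""
    status_code = 402


@dataclass
class OrgLedger:
    balance: Decimal                    # credits the org currently holds
    reserved: Decimal = Decimal("0")    # credits held for in-flight calls

    @property
    def available(self) -> Decimal:
        return self.balance - self.reserved

    def reserve(self, estimated_cost: Decimal) -> None:
        # Reject up front if the balance cannot cover the estimate.
        if estimated_cost > self.available:
            raise InsufficientCredits("top up or enable auto-topup")
        self.reserved += estimated_cost

    def settle(self, estimated_cost: Decimal, actual_cost: Decimal) -> None:
        # Release the hold, then charge the real cost.
        # Assumption: the charge is capped at the reservation, which
        # guarantees the balance never drops below zero in this sketch.
        self.reserved -= estimated_cost
        self.balance -= min(actual_cost, estimated_cost)
```

A failed or blocked call would release its hold by settling with an actual cost of zero, so nothing is charged unless the call succeeds.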
Ceiling 2 — Budget caps: has this scope spent its allowance?
A budget is a spend allowance attached to a scope and a window. You can have credits and still be blocked, because the team or key has hit the cap you set for it.
balance: $4,000 available ← Ceiling 1 is fine
key cap: $50 / day, $50 spent ← Ceiling 2 blocks → 429
This is what makes budgets a safety system rather than a billing readout: the cap is the most-restrictive line the request falls under, across org, team, and key. The tightest one wins. (Full walkthrough in How to set hard spend limits.)
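One way to picture "the tightest one wins" is a check that looks at every budget the request falls under and tests the one with the least room left. The sketch below is illustrative only: `Budget` and `blocking_budget` are hypothetical names, and comparing the estimated cost against the remaining allowance is an assumption about how a cap would be evaluated.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Budget:
    scope: str       # "org", "team", or "key"
    cap: Decimal     # allowance for the current window
    spent: Decimal   # spend so far in the current window

    @property
    def remaining(self) -> Decimal:
        return self.cap - self.spent


def blocking_budget(budgets: list[Budget], estimated_cost: Decimal) -> Optional[Budget]:
    """Return the tightest budget if it cannot cover the cost, else None."""
    if not budgets:
        return None
    # If the budget with the least headroom can cover the call, all can.
    tightest = min(budgets, key=lambda b: b.remaining)
    return tightest if estimated_cost > tightest.remaining else None


budgets = [
    Budget("org", cap=Decimal("5000"), spent=Decimal("1000")),
    Budget("key", cap=Decimal("50"), spent=Decimal("50")),
]
blocked_by = blocking_budget(budgets, Decimal("0.02"))
```

Here `blocked_by` is the key-level budget, which a gateway would translate into a 429 even though the org still has plenty of room.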
Ceiling 3 — Rate limits: too fast?
Budgets cap total dollars over a window. Rate limits cap velocity — requests per minute (RPM) and tokens per minute (TPM). They protect against bursty abuse and runaway loops that a daily dollar cap wouldn't catch until it's too late.
Rate limits are deliberately not overridable per request — they're a security boundary, not a knob the caller can turn. A leaked key throttled at 60 RPM can do far less damage than the same key uncapped, regardless of the budget behind it.
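To make the velocity idea concrete, here is a small sliding-window RPM limiter. It is a sketch under stated assumptions: the article does not say which algorithm NemoRouter uses, and `SlidingWindowRpm` and its injectable `clock` parameter are hypothetical names chosen for illustration. A TPM limit could follow the same shape by summing token counts instead of counting requests.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowRpm:
    """Allow at most `limit` requests in any trailing 60-second window."""

    def __init__(self, limit: int, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.clock = clock          # injectable so the limiter is easy to test
        self._stamps = deque()      # timestamps of recently accepted requests

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the 60-second window.
        while self._stamps and now - self._stamps[0] >= 60.0:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False            # a gateway would answer 429 here
        self._stamps.append(now)
        return True
```

The limit is fixed when the limiter is constructed and `allow()` takes no arguments, which mirrors the point that a caller cannot raise it per request.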
Ceiling 4 — Guardrails: is the content allowed?
The first three ceilings are about quantity. Guardrails are about content. On every request (when enabled), the gateway can:
- detect and redact PII before it reaches a provider,
- block known prompt-injection patterns,
- enforce keyword blocklists,
- apply content-safety policy.
Guardrails resolve with key > team > org precedence: the most specific scope wins, so a stricter per-key policy overrides a looser org default. A request that is perfectly affordable and perfectly within rate can still be stopped here — because cost says nothing about safety. (Deep dive: guardrails on every request.)
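The key > team > org precedence can be sketched as a lookup that returns the first policy it finds, starting from the most specific scope. The names `GuardrailPolicy`, `resolve_policy`, and `violates_blocklist` are hypothetical, and treating the winning policy as a whole-policy override (rather than merging fields across scopes) is an assumption of this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GuardrailPolicy:
    blocked_keywords: frozenset


def resolve_policy(
    key_policy: Optional[GuardrailPolicy],
    team_policy: Optional[GuardrailPolicy],
    org_policy: Optional[GuardrailPolicy],
) -> Optional[tuple]:
    """Return (scope, policy) for the most specific scope that defines one."""
    candidates = (("key", key_policy), ("team", team_policy), ("org", org_policy))
    for scope, policy in candidates:
        if policy is not None:
            return scope, policy
    return None  # no guardrails configured at any scope


def violates_blocklist(policy: GuardrailPolicy, prompt: str) -> bool:
    # Case-insensitive substring match; real guardrails would do far more.
    text = prompt.lower()
    return any(word.lower() in text for word in policy.blocked_keywords)


org_default = GuardrailPolicy(frozenset({"internal-only"}))
strict_key = GuardrailPolicy(frozenset({"internal-only", "project-x"}))
scope, policy = resolve_policy(strict_key, None, org_default)
```

With these inputs the key-level policy is selected, so a prompt mentioning "project-x" is blocked even though the org default would have let it through.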
Why keep them separate?
It's tempting to imagine one "limit" that does everything. But each ceiling fails in a different way and protects a different stakeholder:
- Credits protect us from serving unpaid inference.
- Budgets protect the customer's invoice from a runaway scope.
- Rate limits protect the platform from velocity abuse.
- Guardrails protect the data and the model from unsafe content.
Collapsing them would mean a single threshold has to encode four unrelated risks — and the moment you tune it for one, you mis-tune it for the other three. Keeping them independent means each can be set correctly, and a request only ships when it clears all four.
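As a rough picture of "clears all four," the sketch below chains four independent checks and stops at the first rejection. Everything in it is hypothetical: the `Rejection` type, the request dictionary fields, and the check functions are invented for illustration. Running the credit check first follows the article; the order of the other three is an assumption, as is using 400 as the concrete status for a guardrail block.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rejection:
    ceiling: str
    status: int


Check = Callable[[dict], Optional[Rejection]]


def credit_check(req: dict) -> Optional[Rejection]:
    if req["estimated_cost"] > req["balance"]:
        return Rejection("credit balance", 402)
    return None


def budget_check(req: dict) -> Optional[Rejection]:
    if req["estimated_cost"] > req["budget_remaining"]:
        return Rejection("budget cap", 429)
    return None


def rate_check(req: dict) -> Optional[Rejection]:
    if req["requests_this_minute"] >= req["rpm_limit"]:
        return Rejection("rate limit", 429)
    return None


def guardrail_check(req: dict) -> Optional[Rejection]:
    prompt = req["prompt"].lower()
    if any(word in prompt for word in req["blocklist"]):
        return Rejection("guardrails", 400)  # assumed status, see lead-in
    return None


def preflight(req: dict, checks: list[Check]) -> Optional[Rejection]:
    """Return the first rejection, or None if every ceiling is cleared."""
    for check in checks:
        rejection = check(req)
        if rejection is not None:
            return rejection
    return None


CHECKS = [credit_check, budget_check, rate_check, guardrail_check]
```

Because each check reads only its own inputs, any one of them can be tuned or replaced without touching the other three.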
Putting it together
Every NemoRouter request runs this gauntlet before it touches a provider, and the cost is only ever settled after a successful, allowed call. The result is a gateway where "it's within budget" and "it's safe to send" are answered by the right control, every time — free on every tier. Start at the docs or pricing.