People often collapse "spend control" into a single setting. In practice, an LLM gateway enforces four separate controls on every request, each answering a different question and each able to reject a call on its own. Knowing which control does what separates a budget that protects you from one that surprises you.

This is the pre-flight checklist NemoRouter runs before a request is ever forwarded to a provider.

The four controls at a glance

| Ceiling | Question it answers | Scope | Rejects with |
| --- | --- | --- | --- |
| Credit balance | Is there money in the account at all? | Org | `402` |
| Budget cap | Has this scope spent its allowance? | Org / team / key | `429` |
| Rate limit | Too many requests/tokens too fast? | Key / team / org | `429` |
| Guardrails | Is the content allowed? | Key > team > org | `400`-class block |

Conceptually the four checks sit side by side, and each is independent: passing one says nothing about the others. A request with plenty of credits can still be blocked by a budget cap; a request under budget can still be rate-limited; a request under every numeric limit can still be stopped by a guardrail.
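The independence of the four checks can be sketched as a short pipeline. This is a minimal illustration, not NemoRouter's implementation: `RequestContext`, `Rejection`, and every field name are hypothetical, the guardrail status is shown as a plain `400` to stand in for the "400-class" block, and the order of the last three checks is an assumption (the article only says credits are checked first).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rejection:
    status: int   # HTTP status returned to the caller
    ceiling: str  # which control rejected the request


@dataclass
class RequestContext:
    # Hypothetical inputs chosen for this sketch.
    estimated_cost: float
    credit_balance: float
    budget_cap: float
    budget_spent: float
    rpm_limit: int
    requests_this_minute: int
    content_allowed: bool


def check_credits(ctx: RequestContext) -> Optional[Rejection]:
    if ctx.credit_balance < ctx.estimated_cost:
        return Rejection(402, "credit balance")
    return None


def check_budget(ctx: RequestContext) -> Optional[Rejection]:
    if ctx.budget_spent >= ctx.budget_cap:
        return Rejection(429, "budget cap")
    return None


def check_rate(ctx: RequestContext) -> Optional[Rejection]:
    if ctx.requests_this_minute >= ctx.rpm_limit:
        return Rejection(429, "rate limit")
    return None


def check_guardrails(ctx: RequestContext) -> Optional[Rejection]:
    if not ctx.content_allowed:
        return Rejection(400, "guardrails")  # assumed stand-in for "400-class"
    return None


# Credits first, per the article; the remaining order is an assumption.
CHECKS: List[Callable[[RequestContext], Optional[Rejection]]] = [
    check_credits, check_budget, check_rate, check_guardrails,
]


def preflight(ctx: RequestContext) -> Optional[Rejection]:
    """Return the first rejection, or None if the request clears all four."""
    for check in CHECKS:
        rejection = check(ctx)
        if rejection is not None:
            return rejection
    return None
```

In this sketch, a context with a $4,000 balance but a $50 cap that is fully spent is rejected by `check_budget` with `429`, even though `check_credits` passed.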

Ceiling 1 — Credit balance: is there money?

Credits are the fuel. Before anything else, the gateway reserves the call's estimated cost against the org's balance (see reserve-and-settle). If the balance can't cover the reservation, the request returns `402 Payment Required`, not `429`. The distinction matters: `402` means "top up or enable auto-topup," while `429` means "you have money, but a limit stopped you."

A balance can never go negative, so this ceiling is the hard floor under everything else.
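A rough sketch of the reserve step and the non-negative invariant follows. `CreditLedger`, `InsufficientCredits`, and the `settle` signature are hypothetical names for illustration, and the sketch assumes the estimate is an upper bound on the actual cost, which the article does not state.

```python
from decimal import Decimal


class InsufficientCredits(Exception):
    """Maps to HTTP 402 Payment Required in this sketch."""
    status = 402


class CreditLedger:
    def __init__(self, balance: Decimal) -> None:
        self.balance = balance          # spendable credits
        self.reserved = Decimal("0")    # held for in-flight calls

    def reserve(self, estimate: Decimal) -> None:
        # Refuse up front rather than letting the balance go negative.
        if estimate > self.balance:
            raise InsufficientCredits("balance cannot cover the reservation")
        self.balance -= estimate
        self.reserved += estimate

    def settle(self, estimate: Decimal, actual: Decimal) -> None:
        # Assumption: estimates are upper bounds, so settling only refunds.
        if actual > estimate:
            raise ValueError("sketch assumes actual cost <= estimate")
        self.reserved -= estimate
        self.balance += estimate - actual  # return the unused part
```

Because `reserve` refuses any estimate larger than the balance and `settle` only ever refunds, the balance in this sketch never drops below zero.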

Ceiling 2 — Budget caps: has this scope spent its allowance?

A budget is a spend allowance attached to a scope and a window. You can have credits and still be blocked, because the team or key has hit the cap you set for it.

balance:  $4,000 available   ← Ceiling 1 is fine
key cap:  $50 / day, $50 spent  ← Ceiling 2 blocks → 429

This is what makes budgets a safety system rather than a billing readout: the effective cap is the most restrictive one that applies to the request across org, team, and key. The tightest one wins. (Full walkthrough in How to set hard spend limits.)
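"The tightest one wins" can be illustrated with a few lines. The `Budget` type and both helper functions are hypothetical; the org and team figures below are invented for the example, while the key cap mirrors the $50 / day case above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Budget:
    scope: str    # "org", "team", or "key"
    cap: float    # allowance for the current window, in dollars
    spent: float  # spend so far in the window

    @property
    def remaining(self) -> float:
        return max(self.cap - self.spent, 0.0)


def binding_budget(budgets: List[Budget]) -> Budget:
    """The budget with the least headroom is the one that governs."""
    return min(budgets, key=lambda b: b.remaining)


def budgets_allow(budgets: Iterable[Budget]) -> bool:
    """A request passes only if no applicable scope is exhausted."""
    return all(b.spent < b.cap for b in budgets)
```

With an org budget of $1,000 ($200 spent), a team budget of $300 ($120 spent), and a key budget of $50 ($50 spent), `binding_budget` picks the key and `budgets_allow` returns `False`, regardless of how much headroom the org and team still have.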

Ceiling 3 — Rate limits: too fast?

Budgets cap total dollars over a window. Rate limits cap velocity — requests per minute (RPM) and tokens per minute (TPM). They protect against bursty abuse and runaway loops that a daily dollar cap wouldn't catch until it's too late.

Rate limits are deliberately not overridable per request — they're a security boundary, not a knob the caller can turn. A leaked key throttled at 60 RPM can do far less damage than the same key uncapped, regardless of the budget behind it.
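One common way to enforce an RPM limit is a sliding window of request timestamps. The sketch below is an assumed technique, not a description of how NemoRouter implements it; `RpmLimiter` is a hypothetical name, and TPM would need a similar structure that sums token counts instead of counting requests.

```python
from collections import deque
from typing import Deque


class RpmLimiter:
    """Sliding-window request counter for a single key (illustrative)."""

    def __init__(self, rpm_limit: int, window_seconds: float = 60.0) -> None:
        self.rpm_limit = rpm_limit
        self.window = window_seconds
        self._stamps: Deque[float] = deque()  # timestamps of accepted requests

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) >= self.rpm_limit:
            return False  # would map to HTTP 429
        self._stamps.append(now)
        return True
```

The limit is fixed when the limiter is constructed and `allow` takes no override argument, which reflects the point above: the caller cannot loosen the limit from inside a request.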

Ceiling 4 — Guardrails: is the content allowed?

The first three ceilings are about quantity. Guardrails are about content. On every request (when enabled), the gateway can:

detect and redact PII before it reaches a provider,
block known prompt-injection patterns,
enforce keyword blocklists,
apply content-safety policy.

Guardrails resolve with key > team > org precedence: the most specific scope wins, so a stricter per-key policy overrides a looser org default. A request that is perfectly affordable and perfectly within rate can still be stopped here — because cost says nothing about safety. (Deep dive: guardrails on every request.)
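The key > team > org precedence can be sketched as a simple lookup. This assumes the most specific policy replaces the broader ones wholesale rather than merging field by field, which the article does not specify; the policy shape (a dict with a `blocked_keywords` list) and both function names are hypothetical.

```python
from typing import Dict, List, Optional

Policy = Dict[str, List[str]]


def resolve_policy(
    key_policy: Optional[Policy],
    team_policy: Optional[Policy],
    org_policy: Optional[Policy],
) -> Optional[Policy]:
    """Return the most specific policy that is set: key, then team, then org."""
    for policy in (key_policy, team_policy, org_policy):
        if policy is not None:
            return policy
    return None


def violates_blocklist(prompt: str, policy: Policy) -> bool:
    """Case-insensitive substring match against the policy's keyword list."""
    text = prompt.lower()
    return any(word.lower() in text for word in policy.get("blocked_keywords", []))
```

Here a key-level policy that blocks a keyword takes effect even when the org default has an empty blocklist, and a key with no policy of its own falls back to the team policy, then the org policy.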

Why keep them separate?

It's tempting to imagine one "limit" that does everything. But each ceiling fails in a different way and protects a different stakeholder:

Credits protect us from serving unpaid inference.
Budgets protect the customer's invoice from a runaway scope.
Rate limits protect the platform from velocity abuse.
Guardrails protect the data and the model from unsafe content.

Collapsing them would mean a single threshold has to encode four unrelated risks — and the moment you tune it for one, you mis-tune it for the other three. Keeping them independent means each can be set correctly, and a request only ships when it clears all four.

Putting it together

Every NemoRouter request runs this gauntlet before it touches a provider, and the cost is only ever settled after a successful, allowed call. The result is a gateway where "it's within budget" and "it's safe to send" are answered by the right control, every time — free on every tier. Start at the docs or pricing.

