
The Four Ceilings Every LLM Request Passes

Before an LLM call leaves the gateway it clears four independent limits: credit balance, budget cap, rate limit, and guardrails. Here is what each one checks and why they are separate.

Nemo Team · 8 min read

People often collapse "spend control" into a single setting. In practice, an LLM gateway enforces four separate controls on every request, each answering a different question and each able to reject a call on its own. Knowing which control does what is what separates a budget that protects you from one that surprises you.

This is the pre-flight checklist NemoRouter runs before a request is ever forwarded to a provider.

The four controls at a glance

Ceiling | Question it answers | Scope | Rejects with
Credit balance | Is there money in the account at all? | Org | 402
Budget cap | Has this scope spent its allowance? | Org / team / key | 429
Rate limit | Too many requests/tokens too fast? | Key / team / org | 429
Guardrails | Is the content allowed? | Key > team > org | 400-class block

Conceptually they run side by side, and each is independent: passing one says nothing about the others. A request with plenty of credits can still be blocked by a budget cap; a request under budget can still be rate-limited; a request under every numeric limit can still be stopped by a guardrail.
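To make the independence concrete, here is a minimal sketch of a pre-flight check in Python. It is not NemoRouter's code: the `Request` fields, the function names, and the order of the last three checks are assumptions made for illustration, and `400` stands in for the unspecified "400-class" guardrail status.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Request:
    estimated_cost: float    # dollars the call is expected to cost
    balance: float           # org credit balance
    budget_remaining: float  # allowance left under the tightest cap
    within_rate_limit: bool  # outcome of the RPM/TPM check
    content_allowed: bool    # outcome of the guardrail scan


@dataclass
class Rejection:
    ceiling: str
    status: int  # HTTP status returned to the caller


# Each check reads only its own inputs and knows nothing about the others.
def check_credits(req: Request) -> Optional[Rejection]:
    if req.balance < req.estimated_cost:
        return Rejection("credit balance", 402)
    return None


def check_budget(req: Request) -> Optional[Rejection]:
    if req.budget_remaining < req.estimated_cost:
        return Rejection("budget cap", 429)
    return None


def check_rate(req: Request) -> Optional[Rejection]:
    return None if req.within_rate_limit else Rejection("rate limit", 429)


def check_guardrails(req: Request) -> Optional[Rejection]:
    # 400 is a placeholder for the "400-class" block (assumption).
    return None if req.content_allowed else Rejection("guardrails", 400)


CHECKS: List[Callable[[Request], Optional[Rejection]]] = [
    check_credits, check_budget, check_rate, check_guardrails,
]


def preflight(req: Request) -> Optional[Rejection]:
    """Return the first rejection, or None if all four ceilings are cleared."""
    for check in CHECKS:
        rejection = check(req)
        if rejection is not None:
            return rejection
    return None
```

With a $4,000 balance but an exhausted cap, `preflight` returns the budget rejection with status 429, even though the credit check passed.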

Ceiling 1 — Credit balance: is there money?

Credits are the fuel. Before anything else, the gateway reserves the call's estimated cost against the org's balance (see reserve-and-settle). If the balance can't cover the reservation, the request returns 402 Payment Required — not 429. The distinction matters: 402 means "top up or enable auto-topup," while 429 means "you have money, but a limit stopped you."

A balance can never go negative, so this ceiling is the hard floor under everything else.
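A reserve-then-settle flow along these lines could look like the sketch below. The class and method names are hypothetical, and it assumes the actual cost never exceeds the reserved estimate; how a real gateway handles an underestimate is not covered here.

```python
from decimal import Decimal


class InsufficientCredits(Exception):
    """Maps to HTTP 402 Payment Required."""
    status = 402


class CreditAccount:
    def __init__(self, balance: str) -> None:
        self.available = Decimal(balance)
        self.reserved = Decimal("0")

    def reserve(self, estimate: str) -> Decimal:
        """Hold the estimated cost, or refuse if the balance cannot cover it."""
        amount = Decimal(estimate)
        if amount > self.available:
            raise InsufficientCredits("top up or enable auto-topup")
        self.available -= amount
        self.reserved += amount
        return amount

    def settle(self, reservation: Decimal, actual: str) -> None:
        """Charge the actual cost and hand back the unused part of the hold."""
        cost = Decimal(actual)
        if cost > reservation:
            # Assumption of this sketch: estimates are upper bounds.
            raise ValueError("actual cost exceeds reservation")
        self.reserved -= reservation
        self.available += reservation - cost

    def release(self, reservation: Decimal) -> None:
        """Return the whole hold when the call was blocked or failed."""
        self.reserved -= reservation
        self.available += reservation
```

Because `reserve` refuses any hold larger than `available`, the balance cannot drop below zero no matter how many requests are in flight.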

Ceiling 2 — Budget caps: has this scope spent its allowance?

A budget is a spend allowance attached to a scope and a window. You can have credits and still be blocked, because the team or key has hit the cap you set for it.

balance:  $4,000 available   ← Ceiling 1 is fine
key cap:  $50 / day, $50 spent  ← Ceiling 2 blocks → 429

This is what makes budgets a safety system rather than a billing readout: the cap that applies is the most restrictive one the request falls under across org, team, and key. The tightest one wins. (Full walkthrough in How to set hard spend limits.)
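The "tightest one wins" rule can be sketched as picking the cap with the least remaining allowance. The `Budget` type and the choice to compare remaining allowance against the estimated cost are assumptions of this example, not a description of NemoRouter's internals; the dollar figures other than the $50 key cap are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Budget:
    scope: str    # "org", "team", or "key"
    limit: float  # allowance for the current window, in dollars
    spent: float  # spend so far in the current window

    @property
    def remaining(self) -> float:
        return max(self.limit - self.spent, 0.0)


def budget_allows(
    budgets: List[Budget], estimated_cost: float
) -> Tuple[bool, Optional[Budget]]:
    """Return (allowed, binding_cap); the binding cap has the least room left."""
    if not budgets:
        return True, None
    binding = min(budgets, key=lambda b: b.remaining)
    return binding.remaining >= estimated_cost, binding


caps = [
    Budget("org", limit=5000, spent=1000),
    Budget("team", limit=500, spent=120),
    Budget("key", limit=50, spent=50),  # the exhausted key cap from above
]
allowed, binding = budget_allows(caps, estimated_cost=0.10)
```

Here `allowed` is `False` and `binding.scope` is `"key"`: the org and team caps have room, but the exhausted key cap decides the outcome.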

Ceiling 3 — Rate limits: too fast?

Budgets cap total dollars over a window. Rate limits cap velocity — requests per minute (RPM) and tokens per minute (TPM). They protect against bursty abuse and runaway loops that a daily dollar cap wouldn't catch until it's too late.

Rate limits are deliberately not overridable per request — they're a security boundary, not a knob the caller can turn. A leaked key throttled at 60 RPM can do far less damage than the same key uncapped, regardless of the budget behind it.
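One common way to enforce RPM and TPM together is a sliding window, sketched below. The source does not say which algorithm the gateway uses, so treat the class name and the windowing strategy as assumptions. Note that the limits are fixed at construction and `allow` takes no override argument, which mirrors the idea that callers cannot relax them per request.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class SlidingWindowLimiter:
    def __init__(
        self,
        rpm: int,
        tpm: int,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rpm = rpm
        self.tpm = tpm
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)
        self.tokens_in_window = 0

    def allow(self, tokens: int) -> bool:
        """Admit the request only if both RPM and TPM stay within limits."""
        now = self.clock()
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, old_tokens = self.events.popleft()
            self.tokens_in_window -= old_tokens
        if len(self.events) >= self.rpm:
            return False  # too many requests
        if self.tokens_in_window + tokens > self.tpm:
            return False  # too many tokens
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True
```

A rejected request is not recorded, so a caller that keeps retrying while throttled does not extend its own penalty in this sketch.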

Ceiling 4 — Guardrails: is the content allowed?

The first three ceilings are about quantity. Guardrails are about content. On every request (when enabled), the gateway can:

  • detect and redact PII before it reaches a provider,
  • block known prompt-injection patterns,
  • enforce keyword blocklists,
  • apply content-safety policy.

Guardrails resolve with key > team > org precedence: the most specific scope wins, so a stricter per-key policy overrides a looser org default. A request that is perfectly affordable and perfectly within rate can still be stopped here — because cost says nothing about safety. (Deep dive: guardrails on every request.)
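As a rough illustration of key > team > org precedence, the sketch below picks the most specific policy that is defined and applies two of the checks listed above. The `Policy` fields, the whole-policy (rather than field-by-field) override, and the deliberately simple email regex are all assumptions for the example; real PII detection is far broader.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# Illustrative only: matches simple email addresses, nothing else.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass(frozen=True)
class Policy:
    redact_pii: bool = False
    blocklist: FrozenSet[str] = frozenset()


def resolve(
    key: Optional[Policy], team: Optional[Policy], org: Optional[Policy]
) -> Optional[Policy]:
    """Most specific scope wins: key, then team, then org."""
    for policy in (key, team, org):
        if policy is not None:
            return policy
    return None


def apply(policy: Optional[Policy], prompt: str) -> Tuple[bool, str]:
    """Return (allowed, prompt_to_forward)."""
    if policy is None:
        return True, prompt
    lowered = prompt.lower()
    if any(word in lowered for word in policy.blocklist):
        return False, prompt  # blocked before reaching a provider
    if policy.redact_pii:
        prompt = EMAIL.sub("[REDACTED_EMAIL]", prompt)
    return True, prompt
```

For instance, a key policy with `redact_pii=True` takes precedence over a permissive org default, so a prompt containing an email address is forwarded with the address replaced by `[REDACTED_EMAIL]`.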

Why keep them separate?

It's tempting to imagine one "limit" that does everything. But each ceiling fails in a different way and protects a different stakeholder:

  • Credits protect us from serving unpaid inference.
  • Budgets protect the customer's invoice from a runaway scope.
  • Rate limits protect the platform from velocity abuse.
  • Guardrails protect the data and the model from unsafe content.

Collapsing them would mean a single threshold has to encode four unrelated risks — and the moment you tune it for one, you mis-tune it for the other three. Keeping them independent means each can be set correctly, and a request only ships when it clears all four.

Putting it together

Every NemoRouter request runs this gauntlet before it touches a provider, and the cost is only ever settled after a successful, allowed call. The result is a gateway where "it's within budget" and "it's safe to send" are answered by the right control, every time — free on every tier. Start at the docs or pricing.

Written by Nemo Team. Engineering, product, and company posts from the Nemo Router team: code-first, cost-honest, no vendor-marketing fluff.