Budgets

Spend control, enforced at the database.

Set spending limits at every level: organization, team, and individual API key. Every request follows a reserve + settle pattern, so an over-budget call gets a clean 402 and never a partial debit. The database refuses to write a negative balance.

budget enforcement · live

API key sk-nemo-...x9y0

Org limit: $500.00 / mo
Team (Engineering): $250.00 / mo
Key limit: $100.00 / mo
Spent this month: $45.20
Reservations open: 8,421
Released (errors): 13
Negative-balance attempts: 0
Verdict: allow

org > team > key · reserve + settle · 402 cleanly · $0.00 drift

Ledger drift target
$0.00

24h parity check: sum(transactions) == balance

Spend pattern
Reserve + settle

Atomic mutations under advisory locks

Over-budget response
402

Never a partial debit, never an overage

Cap scopes
Org · Team · Key

Hierarchical with explicit override

Capabilities

Cost controls at every level

Per-key, per-team, and per-org spending limits with hard and soft thresholds. Alerts at 50/80/100%, RPM/TPM rate limits stop runaways, and auto-topup keeps production online.

Per-key budget controls

Set maximum spend on individual virtual keys — daily, monthly, or absolute. When a key hits its limit, requests are blocked instantly with a clean 402; no surprise charges, no overruns.

  • Max-spend (absolute, $/day, $/month) per key
  • Independent prod / staging / experiment keys
  • Per-key revocation: kill a leaked key in one click
  • Reservations released the moment a request errors

Per-org budgets, hierarchical

Budget limits cascade from organization to team to virtual key. Child limits never exceed parent limits. The result: a single dashboard view of where every dollar of spend goes.

  • Org > Team > Key inheritance with explicit override
  • Real-time aggregation across child keys
  • Per-team spend visible separately for cost-allocation
  • Owner / Admin / Member RLS-scoped read access

Atomic reserve + settle

Every LLM request reserves credits before forwarding. After the response returns, the actual cost is settled and excess released. Failures release the full reservation. The database refuses to write a negative balance.

  • reserve_credits → forward → settle_credits
  • Failure path always calls release_reservation
  • Postgres advisory locks for atomic mutations
  • No partial debit on errors — 402 returned cleanly
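
One way to make a database refuse a negative balance is a CHECK constraint on the balance column. The sketch below is illustrative only: it uses SQLite from the Python standard library as a stand-in for Postgres, and the `balances` table, its columns, and the `debit` helper are hypothetical names, not the product's schema.

```python
import sqlite3

# Hypothetical schema: one balance row per org, stored as integer cents.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balances ("
    " org_id TEXT PRIMARY KEY,"
    " cents INTEGER NOT NULL CHECK (cents >= 0))"
)
conn.execute("INSERT INTO balances VALUES ('acme', 500)")
conn.commit()


def debit(org_id: str, cents: int) -> bool:
    """Try to debit; return False if the database rejects the write."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE balances SET cents = cents - ? WHERE org_id = ?",
                (cents, org_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed: the debit would go below zero.
        return False


def balance(org_id: str) -> int:
    row = conn.execute(
        "SELECT cents FROM balances WHERE org_id = ?", (org_id,)
    ).fetchone()
    return row[0]
```

With a starting balance of 500 cents, `debit("acme", 300)` succeeds and a second `debit("acme", 300)` is rejected, leaving the balance at 200. The point of the design is that the refusal lives in the storage layer, so application bugs cannot bypass it.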

RPM + TPM rate limits

Cap requests-per-minute and tokens-per-minute at any scope. Prevent runaway loops from burning credits, smooth out bursty workloads, and protect downstream provider rate limits.

  • RPM: requests per minute (key, team, org)
  • TPM: tokens per minute (key, team, org)
  • 429 returned when limits hit; reservations not consumed
  • Per-tier hard ceilings to prevent provider rate-limit spillover
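
As a rough illustration of how RPM and TPM caps can be checked together, here is a minimal sliding-window sketch. The `MinuteLimiter` class, its method names, and the way the retry hint is computed are assumptions made for this example; the page does not describe the actual algorithm.

```python
import math
from collections import deque


class MinuteLimiter:
    """Sliding 60-second window over (timestamp, tokens) events."""

    def __init__(self, rpm: int, tpm: int):
        self.rpm = rpm
        self.tpm = tpm
        self.events = deque()  # (timestamp_seconds, tokens), oldest first

    def check(self, now: float, tokens: int):
        # Drop events that have left the 60-second window.
        while self.events and self.events[0][0] <= now - 60:
            self.events.popleft()
        used_tokens = sum(t for _, t in self.events)
        over_rpm = len(self.events) + 1 > self.rpm
        over_tpm = used_tokens + tokens > self.tpm
        if over_rpm or over_tpm:
            # Hint: capacity frees up when the oldest event expires.
            if self.events:
                wait = math.ceil(self.events[0][0] + 60 - now)
            else:
                wait = 60
            return 429, {"Retry-After": str(max(1, wait))}
        self.events.append((now, tokens))
        return 200, {}
```

A rejected request is never appended to the window, which mirrors the idea that a 429 does not consume anything.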

Budget alerts at 50 / 80 / 100%

Get notified when spending hits configurable thresholds. Multi-channel (email, Slack, Teams, webhook). Per-org thresholds with hysteresis to prevent flapping.

  • 50% / 80% / 100% configurable per scope
  • Multi-channel: email · Slack · Teams · webhook
  • Hysteresis prevents flapping on near-threshold spend
  • Cleared automatically when balance drops below alert level
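
Hysteresis here means an alert fires at its threshold but only clears once usage falls some margin below it, so spend hovering around 80% does not produce a stream of notifications. The sketch below assumes a fixed clear margin of 5 percentage points; the `ThresholdAlert` class and that margin are hypothetical, chosen only to show the idea.

```python
from typing import Optional


class ThresholdAlert:
    """Fire at threshold_pct; clear only below threshold_pct - clear_margin_pct."""

    def __init__(self, threshold_pct: float, clear_margin_pct: float = 5.0):
        self.threshold = threshold_pct
        self.clear_below = threshold_pct - clear_margin_pct  # assumed margin
        self.active = False

    def update(self, spent: float, limit: float) -> Optional[str]:
        pct = 100.0 * spent / limit
        if not self.active and pct >= self.threshold:
            self.active = True
            return "fire"
        if self.active and pct < self.clear_below:
            self.active = False
            return "clear"
        return None  # no state change, so no notification
```

For an 80% alert, moving from 80% to 79% and back to 81% produces a single "fire" and nothing else; the alert clears only once usage is under 75%.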

Auto-topup

Automatic credit replenishment when balance drops below a threshold. Stay online without manual intervention. Stripe charges your card and credits land in seconds.

  • Threshold + amount fully configurable
  • Stripe-backed; webhook idempotent on event id
  • Hard cap prevents runaway topup-loops
  • Audit trail entry per topup with actor + IP
Reserve + Settle

The credit pattern, in four atomic steps

Every LLM request goes through reserve → forward → settle (or release on failure). Postgres advisory locks make every mutation atomic; the database refuses to write a negative balance.

Per-request credit lifecycle

  1. Reserve

    estimate + hold

    Conservative estimate held against the balance.

  2. Forward

    Nemo Backend → in-process router

    Provider call with auth + guardrails in path.

  3. Settle

    x-nemo-response-cost

    Deduct actual cost; release the unused reservation.

  4. Release

    on error / timeout

    Failure path returns the full reservation.

Cost on settle comes from the x-nemo-response-cost header — the provider’s number, never reconstructed by us. Failures always call release_reservation; denied requests cost zero credits.
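
The lifecycle above can be sketched in a few lines. This is a simplified in-memory model, not the production code: `reserve_credits`, `settle_credits`, `release_reservation`, and the `x-nemo-response-cost` header are named on this page, but their signatures, the integer-cents header value, the `threading.Lock` standing in for a Postgres advisory lock, and the 502 returned on provider failure are all assumptions made for illustration.

```python
import threading
import uuid


class InsufficientCredits(Exception):
    """Raised when a reservation would exceed the balance (maps to 402)."""


class Ledger:
    def __init__(self, balance_cents: int):
        self.balance = balance_cents
        self.held = {}  # reservation id -> cents held
        self._lock = threading.Lock()  # stand-in for an advisory lock

    def reserve_credits(self, estimate: int) -> str:
        with self._lock:
            if estimate > self.balance:
                raise InsufficientCredits()
            self.balance -= estimate
            rid = uuid.uuid4().hex
            self.held[rid] = estimate
            return rid

    def settle_credits(self, rid: str, actual: int) -> None:
        with self._lock:
            held = self.held.pop(rid)
            # Assumption: the estimate is an upper bound, so cap the charge.
            charged = min(actual, held)
            self.balance += held - charged  # release the unused part

    def release_reservation(self, rid: str) -> None:
        with self._lock:
            self.balance += self.held.pop(rid)  # return the full hold


def handle_request(ledger: Ledger, estimate: int, call_provider) -> int:
    try:
        rid = ledger.reserve_credits(estimate)
    except InsufficientCredits:
        return 402  # denied before any provider call; nothing charged
    try:
        headers = call_provider()
        cost = int(headers["x-nemo-response-cost"])  # assumed integer cents
    except Exception:
        ledger.release_reservation(rid)
        return 502  # assumed status for a provider failure
    ledger.settle_credits(rid, cost)
    return 200
```

Starting from 1000 cents, a request that reserves 300 and settles at 120 leaves 880; a request whose provider call raises leaves the balance unchanged, and so does one that is denied with 402.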

Over-Budget Response

402 cleanly, never an overage

Money safety

No partial debits, no leftover reservations

When a request would breach a budget at any scope, the API returns a clean 402 before forwarding to the provider: the reservation is released, no LLM call is made, and no credits are charged. 429 is reserved for rate-limit overruns and includes a Retry-After header.

  • 402 = budget exceeded; reservation released, no provider call
  • 429 = RPM/TPM exceeded; retry-after header included
  • reserve_credits is the gate: the call cannot proceed past it
  • release_reservation called on every failure path (4xx/5xx/timeout/circuit-break)
  • Daily ledger parity check: sum(transactions) == balance
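
The parity check named in the last bullet is easy to express: sum the signed ledger entries and compare against the stored balance. A minimal sketch follows, assuming transactions are available as signed `Decimal` amounts (credits positive, debits negative); the `ledger_drift` helper is a hypothetical name.

```python
from decimal import Decimal


def ledger_drift(transactions, balance: Decimal) -> Decimal:
    """Return balance minus the sum of signed transaction amounts.

    A result of zero means sum(transactions) == balance.
    """
    return balance - sum(transactions, Decimal("0"))
```

For entries of +50.00, -4.80, and -0.40 and a stored balance of 44.80, the drift is exactly zero. `Decimal` is used here because binary floats cannot represent most cent values exactly, which would make an equality check unreliable.
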
enforcement · last 60s

Live budget signals

Reservations opened: 8,421
Settled w/ provider cost: 8,408
402 (budget): 12
429 (rate-limit): 7
Released (errors / timeouts): 13
Negative-balance attempts: 0
Ledger drift (24h): $0.00

advisory lock · 402 clean · 429 retry-after · $0.00 drift

Hierarchy

Org > Team > Key with inheritance

Inheritance

Child limits never exceed parent limits

Budgets cascade from organization to team to individual virtual key. The org sets the hard ceiling; teams divide it into sub-pools; individual keys carve from the team allocation. Real-time aggregation across child keys means the dashboard always reflects current spend.

  • Org-level: the hard ceiling for the entire account
  • Team-level: cost allocation across product / engineering / etc.
  • Key-level: per-environment caps (prod / staging / experiments)
  • Same UUID flows through every scope — no mapping table
  • Owner / Admin / Member roles enforced via RLS policies
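
To make the cascade concrete, here is a small sketch of a key → team → org check: a request passes only if every scope on the path has headroom. The `Scope` dataclass and `first_blocking_scope` function are hypothetical names for illustration, and the validation in `__post_init__` encodes the "child limits never exceed parent limits" rule as this example understands it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scope:
    name: str
    limit: float
    spent: float = 0.0
    parent: Optional["Scope"] = None

    def __post_init__(self):
        # A child limit may equal, but never exceed, its parent's limit.
        if self.parent is not None and self.limit > self.parent.limit:
            raise ValueError(f"{self.name}: limit exceeds parent {self.parent.name}")


def first_blocking_scope(key: Scope, cost: float) -> Optional[str]:
    """Walk key -> team -> org; return the first scope the cost would breach."""
    scope: Optional[Scope] = key
    while scope is not None:
        if scope.spent + cost > scope.limit:
            return scope.name
        scope = scope.parent
    return None  # every scope has headroom
```

With an org at $320 / $500, a team at $180 / $250, and a key at $140 / $200, a $50 request passes every scope, while a $65 request is blocked at the key.
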
cap hierarchy · acme inc.

Spend rollup

Org (Acme Inc.): $320 / $500
↳ Team Engineering: $180 / $250
↳ key prod-api: $140 / $200
↳ key staging: $28 / $50
↳ Team Marketing: $95 / $150
↳ key prod-blog: $95 / $150
RLS policies enforced: 4 / 4

org · team · key · inheritance · RLS-scoped

Auto-Topup

Stay online — without manual intervention

Set a minimum balance threshold and a topup amount. When your balance drops below the threshold, Stripe charges your card and the credits land in seconds. Hard cap prevents runaway topup-loops; every topup is audit-logged.

Set the trigger

Threshold (e.g. balance < $10) and topup amount (e.g. +$50). Configure once; revoke any time.

Stripe charges the card

Stripe webhook is idempotent on event ID. Failed payment surfaces as a 402 with retry guidance.

Credits land — audited

Audit-trail entry per topup with actor + source IP. Hard cap prevents runaway topup-loops.
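
The two safety properties above, idempotency on the event ID and a hard cap, can be sketched without any payment SDK. This is an in-memory illustration only: the `TopupState` class, its fields, and the interpretation of the hard cap as a total topup ceiling are assumptions, and no real Stripe API call is shown.

```python
class TopupState:
    """Amounts are integer cents. All names here are hypothetical."""

    def __init__(self, balance: int, threshold: int, amount: int, cap: int):
        self.balance = balance
        self.threshold = threshold  # trigger when balance < threshold
        self.amount = amount        # credits added per topup
        self.cap = cap              # assumed ceiling on total topups
        self.topped_up = 0
        self.seen_event_ids = set()

    def should_topup(self) -> bool:
        below = self.balance < self.threshold
        within_cap = self.topped_up + self.amount <= self.cap
        return below and within_cap

    def on_payment_succeeded(self, event_id: str) -> bool:
        """Apply a payment webhook once; return False for a duplicate delivery."""
        if event_id in self.seen_event_ids:
            return False  # already processed, so credit nothing
        self.seen_event_ids.add(event_id)
        self.balance += self.amount
        self.topped_up += self.amount
        return True
```

Delivering the same event ID twice credits the balance once, and once the cap is reached `should_topup()` returns False even if the balance is below the threshold.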

FAQ

Common budget questions

Reserve + settle · 402 cleanly · $0 drift target

Cost control as an architectural choice — not a feature

Sign up, set a budget, ship. The database refuses to overspend; you don’t have to babysit it.