Budget Controls

Set spending limits and rate controls

Nemo Router provides granular budget and rate controls to prevent runaway costs and enforce usage policies. Set spending limits and rate controls at the organization, team, or individual key level.

Budgets page at /[organization]/budgets — current spend vs limit and a Create Budget action

Budget Types

Nemo Router supports two types of controls:

| Control Type | What It Limits | Units |
|---|---|---|
| Spending Budgets | Total dollar spend over a time period | USD ($) |
| Rate Limits | Request and token throughput | RPM (requests/min), TPM (tokens/min) |

Both can be applied at multiple scopes and work together. A request is blocked if either the budget or rate limit is exceeded.

Spending Budgets

Spending budgets cap how much an organization, team, user, or API key can spend over a defined period.

Creating a Budget

  1. Navigate to the Budgets page in the dashboard
  2. Click Create Budget
  3. Configure the budget:

     | Field | Description | Example |
     |---|---|---|
     | Budget Name | Descriptive label | "Backend API — Monthly" |
     | Max Spend | Maximum dollar amount | $500.00 |
     | Duration | Budget reset period | Monthly, Weekly, Daily, or Total |

  4. Assign the budget to a key or team

Budget Durations

| Duration | Behavior |
|---|---|
| Daily | Resets at midnight UTC each day |
| Weekly | Resets at midnight UTC each Monday |
| Monthly | Resets on the 1st of each month at midnight UTC |
| Total | Never resets; a hard lifetime cap |
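
The reset schedule above can be sketched in a few lines of Python. This is an illustration only, not Nemo Router's implementation: the function name `next_reset` and the lowercase duration strings are assumptions made for the example, and the input is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_reset(duration: str, now: datetime) -> Optional[datetime]:
    """Return the next reset instant in UTC, or None for a Total budget.

    `duration` uses hypothetical lowercase labels: daily, weekly, monthly, total.
    `now` must be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if duration == "daily":
        return midnight + timedelta(days=1)
    if duration == "weekly":
        # weekday() is 0 on Monday, so this is the number of days to the next Monday.
        return midnight + timedelta(days=7 - midnight.weekday())
    if duration == "monthly":
        if midnight.month == 12:
            return midnight.replace(year=midnight.year + 1, month=1, day=1)
        return midnight.replace(month=midnight.month + 1, day=1)
    if duration == "total":
        return None  # a lifetime cap never resets
    raise ValueError(f"unknown duration: {duration!r}")
```

For a request made on Wednesday, 15 May 2024 at 13:30 UTC, this gives 16 May for a daily budget, Monday 20 May for a weekly budget, and 1 June for a monthly budget, all at 00:00 UTC.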

Budget Scope

Budgets can be scoped to different levels:

| Scope | Description | Use Case |
|---|---|---|
| Organization | Caps total spend across the whole org | A backstop on the entire account |
| Team | Limits spend for all keys in a team | Department-level budgets |
| User | Caps a single member's spend within the org | Per-person allowances |
| API Key | Limits spend for a single key | Per-environment or per-service limits |

Setting a budget at the organization, team, or user scope is restricted to org owners and admins (scope-ownership enforcement); key-level budgets can be set by the key's owner.

When a budget in Block mode is exceeded, requests are rejected. The status code depends on the scope:

  • Organization / team / user budgets return 402 with code budget_exceeded.
  • Per-key budgets return 429 with code key_budget_exceeded.

For example, an exceeded team budget returns a body like this:

{
  "error": "Team budget 'Data Science — Monthly' (monthly) exceeded: $2000.12 used of $2000.00 limit.",
  "code": "budget_exceeded",
  "request_id": "req_..."
}
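
Because per-key budget blocks share status 429 with rate limits, client code may want to tell the two apart before deciding whether to retry. The sketch below is one way to do that from the status code and the parsed JSON body. The function names and the returned labels are made up for this example; it reads `code` from the top level (as in the budget error above) and falls back to a nested `error.code` (as in the rate-limit error shown later on this page).

```python
from typing import Any, Mapping, Optional


def error_code(body: Mapping[str, Any]) -> Optional[str]:
    """Read the error code from either a flat or a nested error body."""
    code = body.get("code")
    if code is None and isinstance(body.get("error"), Mapping):
        code = body["error"].get("code")
    return code


def classify_block(status: int, body: Mapping[str, Any]) -> str:
    """Map a rejected response to a coarse reason (labels are hypothetical)."""
    code = error_code(body)
    if status == 402 and code == "budget_exceeded":
        return "scope_budget"  # organization, team, or user budget
    if status == 429 and code == "key_budget_exceeded":
        return "key_budget"    # per-key budget
    if status == 429:
        return "rate_limit"    # RPM or TPM limit
    return "other"
```

A `rate_limit` result is worth retrying after a short wait. A `key_budget` or `scope_budget` result will most likely keep failing until the budget resets or someone raises it, so backing off for a few seconds is unlikely to help.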

Budget Examples

Limit a staging key to $50/month:

  • Budget Name: "Staging Monthly"
  • Max Spend: $50.00
  • Duration: Monthly
  • Assign to: staging-api-key

Cap the data science team at $2,000/month:

  • Budget Name: "Data Science Monthly"
  • Max Spend: $2,000.00
  • Duration: Monthly
  • Assign to: Team "Data Science"

Hard cap for a proof-of-concept:

  • Budget Name: "POC Total Budget"
  • Max Spend: $100.00
  • Duration: Total (never resets)
  • Assign to: poc-demo-key

Threshold Alerts

A budget can notify you before it hits the hard cap. Configure soft thresholds (for example 50%, 80%, and 100% of the limit) and each one dispatches an alert to your connected channels when crossed — so heavy spend surfaces while you can still act on it, not only when requests start getting blocked. Wire the destinations (email, Slack, Teams, webhook) under Manage → Logging → Alert Channels. The hard cap still enforces the block; the thresholds are the early-warning layer on top.
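
To make "crossed" concrete, here is a minimal sketch of how threshold crossings could be detected as spend increases. It assumes thresholds are fractions of the limit and that a threshold counts as crossed when spend moves from below it to at or above it. Nemo Router's actual alerting logic may differ, and `crossed_thresholds` is a name invented for this example.

```python
from typing import List, Sequence


def crossed_thresholds(
    prev_spend: float,
    new_spend: float,
    limit: float,
    thresholds: Sequence[float] = (0.5, 0.8, 1.0),
) -> List[float]:
    """Return the thresholds crossed when spend moves from prev_spend to new_spend."""
    return [t for t in sorted(thresholds) if prev_spend < t * limit <= new_spend]
```

With a $500 limit, a jump from $200 to $420 crosses the 50% and 80% thresholds in one step, while a later move from $420 to $430 crosses none. Comparing against the previous spend is what keeps each threshold from alerting again on every request.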

Enforcement Modes

Each budget chooses what happens at the cap. The mode lets you decide between a hard stop and a softer signal:

| Mode | At the cap | Use case |
|---|---|---|
| Block | Requests are rejected with 402 (org/team/user) or 429 (key). | The hard backstop against runaway spend. The default for production caps. |
| Warn | An alert fires to your channels; requests keep flowing. | Visibility without interruption: track overage while a team sorts out its budget. |
| Throttle | An alert fires and the budget is flagged for rate reduction; requests are not hard-blocked. | A middle ground for high-traffic keys you don't want to fully cut off. |

Soft budget vs. hard cap

A soft budget is an early-warning threshold below the hard cap — crossing it sends an alert but never blocks. The max budget is the hard cap that the enforcement mode acts on. Set a soft budget at, say, 75% of your max so you hear about heavy spend before the mode kicks in.
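
The interaction between the soft budget, the max budget, and the enforcement mode can be summarized as a small decision function. This is a sketch under stated assumptions, not the gateway's code: the `Budget` and `Decision` types, the lowercase mode and scope strings, the alert labels, and the choice to treat spend equal to the max as "at the cap" are all illustrative.

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional


@dataclass
class Budget:
    scope: str                           # "organization", "team", "user", or "key"
    mode: str                            # "block", "warn", or "throttle"
    max_spend: float                     # the hard cap the mode acts on
    soft_budget: Optional[float] = None  # early-warning threshold below the cap


class Decision(NamedTuple):
    allow: bool
    status: Optional[int]  # HTTP status when the request is rejected
    alert: Optional[str]   # hypothetical alert label, if one fires


def evaluate(budget: Budget, spend: float) -> Decision:
    """Decide what happens to a request given current spend (illustrative)."""
    if spend >= budget.max_spend:  # assumption: reaching the cap counts as "at the cap"
        if budget.mode == "block":
            status = 429 if budget.scope == "key" else 402
            return Decision(False, status, None)
        # Warn and Throttle both alert and let the request through;
        # Throttle additionally flags the budget for rate reduction.
        return Decision(True, None, f"{budget.mode}_over_cap")
    if budget.soft_budget is not None and spend >= budget.soft_budget:
        return Decision(True, None, "soft_budget_crossed")  # alert only, never blocks
    return Decision(True, None, None)
```

For a team budget with a $2,000 max and a $1,500 soft budget in Block mode, spend of $1,600 yields an alert with the request allowed, and spend of $2,000.12 yields a 402 rejection.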

Rate Limits

Rate limits control the throughput of requests and tokens per minute.

Rate Limit Types

| Limit | Description | Enforcement |
|---|---|---|
| RPM (Requests Per Minute) | Maximum number of API requests per minute | Returns 429 when exceeded |
| TPM (Tokens Per Minute) | Maximum number of tokens processed per minute | Returns 429 when exceeded |

How Rate Limits Work

Rate limits use a sliding window. When a limit is exceeded, the API returns a 429 Too Many Requests response with a Retry-After header:

{
  "error": {
    "message": "Rate limit exceeded. Retry after 12 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
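
For intuition about sliding-window behavior, the sketch below implements a simple sliding-window log for an RPM limit. The page does not say which sliding-window variant Nemo Router uses, so treat this as one possible reading: the class name, the explicit `now` argument (seconds, passed in so the example is deterministic), and the way the retry delay is derived are all assumptions.

```python
import math
from collections import deque
from typing import Deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any trailing `window_seconds` interval."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self.events: Deque[float] = deque()  # timestamps of accepted requests

    def try_acquire(self, now: float) -> int:
        """Return 0 if the request is accepted, else whole seconds to wait."""
        cutoff = now - self.window
        # Drop timestamps that have slid out of the window.
        while self.events and self.events[0] <= cutoff:
            self.events.popleft()
        if len(self.events) < self.limit:
            self.events.append(now)
            return 0
        # The oldest accepted request leaves the window at events[0] + window.
        return math.ceil(self.events[0] + self.window - now)
```

With a limit of 2 RPM and requests accepted at t=0s and t=10s, a third request at t=48s is rejected with a 12-second wait, because the request from t=0s only leaves the window at t=60s.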

Rate Limit Scope

Rate limits can be set at the key level. Higher-tier plans include higher default rate limits:

| Tier | Default RPM | Default TPM |
|---|---|---|
| Pay As You Go | Standard | Standard |
| Tier 2 | Enhanced | Enhanced |
| Tier 3 | Premium | Premium |
| Enterprise | Custom SLAs | Custom SLAs |

Custom rate limits set on a specific key override the tier defaults.

Handling Rate Limits in Code

Implement exponential backoff when you receive a 429 response:

Python:

import time
from openai import OpenAI, RateLimitError

# Point the OpenAI SDK at Nemo Router.
client = OpenAI(
    api_key="sk-nemo-your-key-here",
    base_url="https://api.nemorouter.ai/v1",
)

def call_with_retry(messages, max_retries=3):
    """Call the chat endpoint, backing off exponentially on 429 responses."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
        except RateLimitError:
            # Out of attempts: surface the error to the caller.
            if attempt == max_retries - 1:
                raise
            wait_time = 2 ** attempt  # 1s, 2s, ... doubling on each attempt
            print(f"Rate limited. Retrying in {wait_time}s...")
            time.sleep(wait_time)

TypeScript:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.NEMOROUTER_API_KEY,
  baseURL: "https://api.nemorouter.ai/v1",
  maxRetries: 3, // The OpenAI SDK handles retries automatically
});
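
Since 429 responses carry a Retry-After header, a retry loop can prefer the server's hint over a fixed schedule. The helper below is a sketch of that idea. `retry_delay` and its parameters are hypothetical names, and it assumes the header holds a number of seconds; any other form falls back to capped exponential backoff.

```python
import math
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[str],
    base: float = 1.0,
    cap: float = 30.0,
) -> float:
    """Seconds to wait before the next attempt (attempt is zero-based)."""
    if retry_after is not None:
        try:
            seconds = float(retry_after)
        except ValueError:
            seconds = math.nan  # not a plain number of seconds; ignore it
        if math.isfinite(seconds) and seconds >= 0:
            return seconds  # honor the server's hint as given
    # Fallback: 1s, 2s, 4s, ... capped so the wait never grows without bound.
    return min(base * 2 ** attempt, cap)
```

In the Python retry loop above, the header value should be readable from the caught exception as `e.response.headers.get("retry-after")`, assuming the SDK exposes the underlying HTTP response on the error. Adding a small random jitter to the fallback delay is a common refinement when many clients retry at once.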

Credits and Budgets

It's important to understand how credits and budgets interact:

  • Credits are your account's payment balance. When you buy $100 in credits, you have $100 to spend on API calls.
  • Budgets are spending limits that restrict how much of those credits a specific key or team can consume.

For example:

  • Your organization has $1,000 in credits
  • Team A has a $300/month budget
  • Team B has a $500/month budget
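  • The two budgets total $800, so $200 of the balance is not covered by any team cap

Budgets do not reserve credits; they only cap spend. All keys draw from the same credit balance, so a key without a budget can consume credits before either team reaches its cap. If Team A spends $300 and Team B spends $500, the remaining $200 is available to any key that has not hit a budget of its own.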
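
The difference between capping and reserving is easiest to see in a toy model. The sketch below is not how Nemo Router accounts for spend: `Account`, `charge`, and the owner labels are invented for the example, amounts are whole dollars, and the rule that a charge fails when credits cannot cover it is an assumption.

```python
from typing import Dict, Optional


class Account:
    """Toy model: one shared credit balance plus optional per-owner caps."""

    def __init__(self, credits: int, caps: Dict[str, int]) -> None:
        self.credits = credits             # shared balance, in dollars
        self.caps = caps                   # owner -> budget cap
        self.spent: Dict[str, int] = {}    # owner -> spend so far

    def charge(self, owner: str, amount: int) -> bool:
        used = self.spent.get(owner, 0)
        cap: Optional[int] = self.caps.get(owner)
        if cap is not None and used >= cap:
            return False  # budget already exhausted
        if amount > self.credits:
            return False  # shared balance cannot cover the charge
        self.credits -= amount
        self.spent[owner] = used + amount
        return True


acct = Account(1000, {"team-a": 300, "team-b": 500})
first = acct.charge("unbudgeted-key", 900)  # True: no cap applies to this key
second = acct.charge("team-a", 300)         # False: only $100 of credits remain
```

Team A's $300 budget did not set aside $300 for it. Once the unbudgeted key consumed $900, Team A could spend at most the $100 left in the shared balance.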

Monitoring Budgets

Cost report at /[organization]/advanced/cost — spend over time with breakdowns by key, team, and model

Track budget usage from the dashboard:

  • Budgets page — View all budgets with current spend vs. limit
  • Budgets report at /[organization]/advanced/budgets — The enforcement view (see below)
  • Cost report at /[organization]/advanced/cost — Drill into spend by key, team, model, or time period
  • API Keys page — See per-key spend at a glance

Spend Limits & Enforcement Panel

The Advanced → Budgets report opens with a live enforcement panel so you can see, at a glance, how close you are to your limits and whether anything is being blocked right now:

  • Org spend limit — A meter showing current spend against your org-wide cap (the account backstop). When several org caps exist, the tightest one is shown, since it binds first.
  • Top team budget — Your largest team allocation with its live utilization.
  • Blocked by enforcement — How many budgets are rejecting requests right now. Every figure on this panel is the same spend the gateway enforces on, so the dashboard reflects exactly what your callers experience.
  • Enforcement modes — A breakdown of how many budgets use Block, Warn, and Throttle.

Below the panel, the burn table lists every budget with its utilization bar and an Enforcement badge — Blocking when it's actively rejecting requests, Over · alerting when a warn/throttle budget is past its cap, or its configured mode otherwise.
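
The badge rules for the burn table can be written down as a short function. This is a sketch of the behavior described above, not the dashboard's code: the function names and lowercase mode strings are assumptions, as is treating spend equal to the cap as over the cap.

```python
def utilization_pct(spend: float, cap: float) -> float:
    """Utilization shown on the bar, as a percentage rounded to one decimal."""
    return round(100 * spend / cap, 1)


def enforcement_badge(mode: str, spend: float, cap: float) -> str:
    """Pick the Enforcement badge for one budget row (illustrative)."""
    over = spend >= cap  # assumption: reaching the cap counts as over
    if over and mode == "block":
        return "Blocking"
    if over and mode in ("warn", "throttle"):
        return "Over · alerting"
    return mode.capitalize()  # below the cap: show the configured mode
```

A Warn budget at $120 of a $100 cap shows "Over · alerting", while the same budget at $40 simply shows "Warn".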

No historical block count

Budget blocks happen at request pre-flight, before the call reaches a provider — so a blocked request produces no spend log. The dashboard therefore shows budgets that are blocking now rather than a historical count of rejected requests. To see request-level outcomes, use the Request Logs under Reports → Observability.

Best Practices

  • Set budgets on all production keys — Even generous ones. They're a safety net against bugs that make runaway API calls.
  • Use daily budgets for development — Catch issues early with $10-20/day limits on dev keys.
  • Use total budgets for POCs — Hard-cap spend on proof-of-concept projects.
  • Monitor the Cost report (/[organization]/advanced/cost) weekly — Look for unexpected spend spikes before they become expensive.
  • Separate high-volume and low-volume keys — Don't let a batch pipeline share a key with an interactive chat interface.
