Budget Controls

Set spending limits and rate controls

NemoRouter provides granular budget and rate controls to prevent runaway costs and enforce usage policies. Set spending limits and rate controls at the organization, team, or individual key level.

Budget Types

NemoRouter supports two types of controls:

| Control Type | What It Limits | Units |
| --- | --- | --- |
| Spending Budgets | Total dollar spend over a time period | USD ($) |
| Rate Limits | Request and token throughput | RPM (requests/min), TPM (tokens/min) |

Both can be applied at multiple scopes and work together. A request is blocked if either the budget or rate limit is exceeded.
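The either-or rule can be sketched as a small decision function. This is an illustration only: `KeyState` and `check_request` are hypothetical names, and both the order of the checks and the `>=` comparisons are assumptions, not NemoRouter's documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyState:
    """Hypothetical snapshot of one key's usage and limits."""
    spend_usd: float                 # spend so far in the current budget period
    max_spend_usd: Optional[float]   # None means no spending budget applies
    requests_last_minute: int        # requests seen in the current rate window
    rpm_limit: Optional[int]         # None means no RPM limit applies


def check_request(state: KeyState) -> int:
    """Return the HTTP status a new request would get under these assumptions."""
    # Assumption: the budget is checked first and counts as exceeded at spend >= limit.
    if state.max_spend_usd is not None and state.spend_usd >= state.max_spend_usd:
        return 402
    # Assumption: the rate limit counts as exceeded once the window is full.
    if state.rpm_limit is not None and state.requests_last_minute >= state.rpm_limit:
        return 429
    return 200
```

Either condition alone is enough to block the request; passing one check does not bypass the other.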

Spending Budgets

Spending budgets cap how much a key or team can spend over a defined period.

Creating a Budget

  1. Navigate to the Budgets page in the dashboard
  2. Click Create Budget
  3. Configure the budget:
| Field | Description | Example |
| --- | --- | --- |
| Budget Name | Descriptive label | "Backend API — Monthly" |
| Max Spend | Maximum dollar amount | $500.00 |
| Duration | Budget reset period | Monthly, Weekly, Daily, or Total |

  4. Assign the budget to a key or team

Budget Durations

| Duration | Behavior |
| --- | --- |
| Daily | Resets at midnight UTC each day |
| Weekly | Resets at midnight UTC each Monday |
| Monthly | Resets on the 1st of each month at midnight UTC |
| Total | Never resets; a hard lifetime cap |
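To estimate when a blocked key will start working again, the reset rules above can be computed client-side. The sketch below assumes the lowercase duration names `daily`, `weekly`, `monthly`, and `total`; `next_reset` is a hypothetical helper, not part of any NemoRouter SDK.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_reset(duration: str, now: datetime) -> Optional[datetime]:
    """Return the next reset time in UTC, or None for a budget that never resets.

    `now` must be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if duration == "daily":
        return midnight + timedelta(days=1)
    if duration == "weekly":
        # weekday() is 0 on Monday, so on a Monday the next reset is 7 days away.
        return midnight + timedelta(days=7 - midnight.weekday())
    if duration == "monthly":
        if midnight.month == 12:
            return midnight.replace(year=midnight.year + 1, month=1, day=1)
        return midnight.replace(month=midnight.month + 1, day=1)
    if duration == "total":
        return None
    raise ValueError(f"unknown duration: {duration!r}")
```

For example, calling `next_reset("weekly", datetime.now(timezone.utc))` returns the upcoming Monday at 00:00 UTC.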

Budget Scope

Budgets can be scoped to different levels:

| Scope | Description | Use Case |
| --- | --- | --- |
| API Key | Limits spend for a single key | Per-environment or per-service limits |
| Team | Limits spend for all keys in a team | Department-level budgets |

When a budget is exceeded, all requests using the associated key or team keys return a 402 Payment Required error:

```json
{
  "error": {
    "message": "Budget exceeded. Current spend: $500.12 / $500.00 limit.",
    "type": "budget_error",
    "code": "budget_exceeded"
  }
}
```
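Unlike a rate limit, a budget error does not clear after a short wait, so clients should generally stop retrying and alert instead. A minimal sketch of detecting this case from a raw HTTP response follows; `is_budget_exceeded` is a hypothetical helper, and it assumes the error body always has the shape shown above.

```python
import json


def is_budget_exceeded(status_code: int, body: str) -> bool:
    """Return True if the response looks like the budget error shown above."""
    if status_code != 402:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON, so it cannot be the documented error shape.
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    return isinstance(error, dict) and error.get("code") == "budget_exceeded"
```

A caller might use it to skip the retry loop entirely and surface the message to an operator, since the key stays blocked until the budget resets or its limit is raised.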

Budget Examples

Limit a staging key to $50/month:

  • Budget Name: "Staging Monthly"
  • Max Spend: $50.00
  • Duration: Monthly
  • Assign to: staging-api-key

Cap the data science team at $2,000/month:

  • Budget Name: "Data Science Monthly"
  • Max Spend: $2,000.00
  • Duration: Monthly
  • Assign to: Team "Data Science"

Hard cap for a proof-of-concept:

  • Budget Name: "POC Total Budget"
  • Max Spend: $100.00
  • Duration: Total (never resets)
  • Assign to: poc-demo-key

Rate Limits

Rate limits control the throughput of requests and tokens per minute.

Rate Limit Types

| Limit | Description | Enforcement |
| --- | --- | --- |
| RPM (Requests Per Minute) | Maximum number of API requests per minute | Returns 429 when exceeded |
| TPM (Tokens Per Minute) | Maximum number of tokens processed per minute | Returns 429 when exceeded |

How Rate Limits Work

Rate limits use a sliding window. When a limit is exceeded, the API returns a 429 Too Many Requests response with a Retry-After header:

```json
{
  "error": {
    "message": "Rate limit exceeded. Retry after 12 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
```
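To make the sliding-window idea concrete, here is a minimal client-side sketch that tracks request timestamps over the last 60 seconds. It is one common way to implement a sliding window (a timestamp log) and is not necessarily how NemoRouter enforces limits server-side; `SlidingWindowLimiter` is a hypothetical class.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` in any rolling `window_seconds` interval."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock          # injectable so the limiter can be tested
        self._timestamps = deque()   # send times of requests still inside the window

    def try_acquire(self) -> float:
        """Record a request and return 0.0 if allowed, else the seconds to wait."""
        now = self._clock()
        # Drop timestamps that have slid out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return 0.0
        # The window is full: the caller must wait until the oldest entry expires.
        return self.window_seconds - (now - self._timestamps[0])
```

Calling `try_acquire()` before each request and sleeping for any nonzero return value keeps a single process under its RPM limit, which avoids most 429 responses in the first place.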

Rate Limit Scope

Rate limits can be set at the key level. Higher-tier plans include higher default rate limits:

| Tier | Default RPM | Default TPM |
| --- | --- | --- |
| Pay As You Go | Standard | Standard |
| Tier 2 | Enhanced | Enhanced |
| Tier 3 | Premium | Premium |
| Enterprise | Custom SLAs | Custom SLAs |

Custom rate limits set on a specific key override the tier defaults.
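The override rule can be expressed as a simple fallback. In this sketch, `RateLimits` and `effective_limits` are hypothetical names, the numbers in the usage note are made up for illustration, and resolving RPM and TPM independently is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RateLimits:
    rpm: Optional[int] = None  # None means "not set at this level"
    tpm: Optional[int] = None


def effective_limits(tier: RateLimits, key: RateLimits) -> RateLimits:
    """Use the key's custom value where one is set, otherwise the tier default."""
    return RateLimits(
        rpm=key.rpm if key.rpm is not None else tier.rpm,
        tpm=key.tpm if key.tpm is not None else tier.tpm,
    )
```

With an illustrative tier default of `RateLimits(rpm=60, tpm=100_000)` and a key configured as `RateLimits(rpm=10)`, the key would be limited to 10 RPM while keeping the tier's TPM default.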

Handling Rate Limits in Code

Implement exponential backoff when you receive a 429 response:

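Python:

```python
import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key="sk-nemo-your-key-here",
    base_url="https://api.nemorouter.ai/v1",
    max_retries=0,  # turn off the SDK's built-in retries so only the loop below retries
)

def call_with_retry(messages, max_retries=3):
    """Call chat completions, backing off whenever a 429 is returned."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise  # out of attempts; let the caller handle the error
            try:
                # Prefer the server's Retry-After hint (assumed to be in seconds).
                wait_time = float(e.response.headers["retry-after"])
            except (KeyError, ValueError):
                wait_time = 2 ** attempt  # fallback backoff: 1s, then 2s
            print(f"Rate limited. Retrying in {wait_time}s...")
            time.sleep(wait_time)
```

JavaScript:

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.NEMOROUTER_API_KEY,
  baseURL: "https://api.nemorouter.ai/v1",
  maxRetries: 3, // The OpenAI SDK handles retries automatically
});
```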

Credits and Budgets

Credits and budgets are separate mechanisms that work together:

  • Credits are your account's payment balance. When you buy $100 in credits, you have $100 to spend on API calls.
  • Budgets are spending limits that restrict how much of those credits a specific key or team can consume.

For example:

  • Your organization has $1,000 in credits
  • Team A has a $300/month budget
  • Team B has a $500/month budget
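  • The remaining $200 is not covered by either team budget

Budgets do not reserve credits; they only cap spend. All keys draw from the same shared balance, so if Team A spends $300 and Team B spends $500, the remaining $200 is still available to any key that has not hit its own budget. The reverse also holds: if other keys drain the balance first, a team can run out of credits before it reaches its cap.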
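The cap-without-reservation behavior can be illustrated with a toy model of the example above. `Account` and `charge` are hypothetical and simplified: this sketch rejects a charge before it would cross a cap, whereas the sample error earlier in this guide suggests the real service blocks requests once spend has passed the limit.

```python
from typing import Dict


class Account:
    """Toy model: one shared credit balance plus per-team spending caps."""

    def __init__(self, credits: float, caps: Dict[str, float]):
        self.credits = credits
        self.caps = caps                     # owner -> max spend; owners not listed are uncapped
        self.spent: Dict[str, float] = {}    # owner -> spend so far

    def charge(self, owner: str, amount: float) -> bool:
        """Deduct `amount` if both the owner's cap and the shared balance allow it."""
        spent = self.spent.get(owner, 0.0)
        cap = self.caps.get(owner)
        if cap is not None and spent + amount > cap:
            return False  # would exceed the owner's budget
        if amount > self.credits:
            return False  # shared balance is too low, regardless of any cap
        self.credits -= amount
        self.spent[owner] = spent + amount
        return True


account = Account(credits=1000.0, caps={"team_a": 300.0, "team_b": 500.0})
account.charge("unbudgeted-key", 800.0)   # allowed: no cap, balance drops to 200
account.charge("team_a", 250.0)           # rejected: under the 300 cap, but only 200 left
account.charge("team_a", 200.0)           # allowed: uses the last of the balance
```

Team A's $300 budget did not guarantee it $300 of credits; the uncapped key consumed most of the balance first.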

Monitoring Budgets

Track budget usage from the dashboard:

  • Budgets page — View all budgets with current spend vs. limit
  • Analytics page — Drill into spend by key, team, model, or time period
  • API Keys page — See per-key spend at a glance

Best Practices

  • Set budgets on all production keys — Even generous ones. They're a safety net against bugs that make runaway API calls.
  • Use daily budgets for development — Catch issues early with $10-20/day limits on dev keys.
  • Use total budgets for POCs — Hard-cap spend on proof-of-concept projects.
  • Monitor the Analytics page weekly — Look for unexpected spend spikes before they become expensive.
  • Separate high-volume and low-volume keys — Don't let a batch pipeline share a key with an interactive chat interface.

Next Steps