Budget Controls
Set spending limits and rate controls
Nemo Router provides granular budget and rate controls to prevent runaway costs and enforce usage policies. Set spending limits and rate controls at the organization, team, or individual key level.
![Budgets page at /[organization]/budgets — current spend vs limit and a Create Budget action](/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fdashboard-budgets.bd5dc09e.jpg&w=3840&q=75)
Budget Types
Nemo Router supports two types of controls:
| Control Type | What It Limits | Units |
|---|---|---|
| Spending Budgets | Total dollar spend over a time period | USD ($) |
| Rate Limits | Request and token throughput | RPM (requests/min), TPM (tokens/min) |
Both can be applied at multiple scopes and work together. A request is rejected if it exceeds a rate limit or if a budget in Block mode (see Enforcement Modes) is over its cap.
Spending Budgets
Spending budgets cap how much a key or team can spend over a defined period.
Creating a Budget
- Navigate to the Budgets page in the dashboard
- Click Create Budget
- Configure the budget:
| Field | Description | Example |
|---|---|---|
| Budget Name | Descriptive label | "Backend API — Monthly" |
| Max Spend | Maximum dollar amount | $500.00 |
| Duration | Budget reset period | Monthly, Weekly, Daily, or Total |
- Assign the budget to a key or team
Budget Durations
| Duration | Behavior |
|---|---|
| Daily | Resets at midnight UTC each day |
| Weekly | Resets at midnight UTC each Monday |
| Monthly | Resets on the 1st of each month at midnight UTC |
| Total | Never resets — a hard lifetime cap |
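The reset rules in the table can be sketched in a few lines of Python. This is a minimal illustration of the schedule as described, not the gateway's implementation; the function name `next_reset` and the lowercase duration strings are assumptions made for the example, and `now` is expected to be a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_reset(duration: str, now: datetime) -> Optional[datetime]:
    """Return the next reset instant in UTC, or None for a Total budget."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if duration == "daily":
        return midnight + timedelta(days=1)
    if duration == "weekly":
        # weekday() is 0 for Monday, so this is the number of days to the next Monday.
        return midnight + timedelta(days=7 - now.weekday())
    if duration == "monthly":
        # Roll over to January of the next year when the current month is December.
        year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
        return midnight.replace(year=year, month=month, day=1)
    if duration == "total":
        return None  # a lifetime cap never resets
    raise ValueError(f"unknown duration: {duration!r}")
```

For example, at 13:30 UTC on Wednesday 2024-05-15 the weekly budget would next reset at 00:00 UTC on Monday 2024-05-20, and the monthly budget on 2024-06-01.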
Budget Scope
Budgets can be scoped to different levels:
| Scope | Description | Use Case |
|---|---|---|
| Organization | Caps total spend across the whole org | A backstop on the entire account |
| Team | Limits spend for all keys in a team | Department-level budgets |
| User | Caps a single member's spend within the org | Per-person allowances |
| API Key | Limits spend for a single key | Per-environment or per-service limits |
Setting a budget at the organization, team, or user scope is restricted to org owners and admins (scope-ownership enforcement); key-level budgets can be set by the key's owner.
When a budget is exceeded, requests are blocked. The status code depends on the scope:
- Organization / team / user budgets return 402 with code budget_exceeded.
- Per-key budgets return 429 with code key_budget_exceeded.
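Because a per-key budget block and a rate limit both arrive as 429, a client may want to tell them apart before retrying. The sketch below is one way to do that, assuming the two body shapes shown in the samples on this page (a flat `code` field for budget errors, a nested `error.code` for rate-limit errors). The helper names `error_code` and `should_retry` are hypothetical, and treating budget blocks as non-retryable is an assumption: a short backoff will not help until the budget resets or is raised.

```python
from typing import Any, Dict, Optional


def error_code(body: Dict[str, Any]) -> Optional[str]:
    """Read the error code from either response shape shown on this page."""
    error = body.get("error")
    if isinstance(error, dict):  # nested shape, as in the rate-limit sample
        return error.get("code")
    return body.get("code")  # flat shape, as in the budget sample


def should_retry(status: int, body: Dict[str, Any]) -> bool:
    """Retry rate-limit 429s only; budget blocks are treated as final."""
    code = error_code(body)
    if status == 402 or code in ("budget_exceeded", "key_budget_exceeded"):
        return False
    return status == 429
```

A blocked request returns a body like the following sample: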
{
"error": "Team budget 'Data Science — Monthly' (monthly) exceeded: $2000.12 used of $2000.00 limit.",
"code": "budget_exceeded",
"request_id": "req_..."
}
Budget Examples
Limit a staging key to $50/month:
- Budget Name: "Staging Monthly"
- Max Spend: $50.00
- Duration: Monthly
- Assign to: staging-api-key
Cap the data science team at $2,000/month:
- Budget Name: "Data Science Monthly"
- Max Spend: $2,000.00
- Duration: Monthly
- Assign to: Team "Data Science"
Hard cap for a proof-of-concept:
- Budget Name: "POC Total Budget"
- Max Spend: $100.00
- Duration: Total (never resets)
- Assign to: poc-demo-key
Threshold Alerts
A budget can notify you before it hits the hard cap. Configure soft thresholds (for example 50%, 80%, and 100% of the limit) and each one dispatches an alert to your connected channels when crossed — so heavy spend surfaces while you can still act on it, not only when requests start getting blocked. Wire the destinations (email, Slack, Teams, webhook) under Manage → Logging → Alert Channels. The hard cap still enforces the block; the thresholds are the early-warning layer on top.
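To make the threshold behavior concrete, here is a small sketch of how crossings could be detected when spend is updated. It is an illustration under stated assumptions, not the product's alerting logic: the function name `newly_crossed` is hypothetical, thresholds are expressed as fractions of the limit, and a threshold counts as crossed once spend reaches it.

```python
from typing import List, Sequence


def newly_crossed(prev_spend: float, new_spend: float, limit: float,
                  thresholds: Sequence[float] = (0.5, 0.8, 1.0)) -> List[float]:
    """Return the thresholds that this spend update moved past, lowest first."""
    return [t for t in sorted(thresholds)
            if prev_spend < t * limit <= new_spend]
```

With a $100 limit, a jump from $40 to $85 crosses both the 50% and 80% thresholds at once, so both would be reported; a later move from $85 to $90 crosses nothing. Comparing against the previous spend is what keeps each threshold from firing again on every request.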
Enforcement Modes
Each budget chooses what happens at the cap. The mode lets you decide between a hard stop and a softer signal:
| Mode | At the cap | Use case |
|---|---|---|
| Block | Requests are rejected — 402 (org/team/user) or 429 (key). | The hard backstop against runaway spend. The default for production caps. |
| Warn | An alert fires to your channels; requests keep flowing. | Visibility without interruption — track overage while a team sorts out budget. |
| Throttle | An alert fires and the budget is flagged for rate reduction; requests are not hard-blocked. | A middle ground for high-traffic keys you don't want to fully cut off. |
Soft budget vs. hard cap
A soft budget is an early-warning threshold below the hard cap — crossing it sends an alert but never blocks. The max budget is the hard cap that the enforcement mode acts on. Set a soft budget at, say, 75% of your max so you hear about heavy spend before the mode kicks in.
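The interaction between the soft budget, the hard cap, and the three modes can be summarized as a decision function. This is a sketch of the rules described above, not the gateway's code. `Decision`, `decide`, and `flag_throttle` are hypothetical names, the `>=` comparison at the cap is an assumption, and the block branch reports only what the mode table states (a rejection), leaving any alerting to the thresholds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    allow: bool              # does the request go through?
    alert: bool              # is the budget in an alerting state?
    status: Optional[int]    # HTTP status when rejected, else None
    flag_throttle: bool = False  # budget flagged for rate reduction


def decide(spend: float, max_budget: float, mode: str, scope: str,
           soft_budget: Optional[float] = None) -> Decision:
    """Apply the enforcement mode at the hard cap, and the soft budget below it."""
    if spend >= max_budget:
        if mode == "block":
            # Per-key budgets reject with 429; org/team/user budgets with 402.
            return Decision(False, False, 429 if scope == "key" else 402)
        if mode == "warn":
            return Decision(True, True, None)
        if mode == "throttle":
            return Decision(True, True, None, flag_throttle=True)
        raise ValueError(f"unknown mode: {mode!r}")
    if soft_budget is not None and spend >= soft_budget:
        return Decision(True, True, None)  # early warning only, never blocks
    return Decision(True, False, None)
```

For instance, a key at $80 of a $100 max with a $75 soft budget is allowed through in an alerting state, whatever its mode; the mode only matters once spend reaches the max.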
Rate Limits
Rate limits control the throughput of requests and tokens per minute.
Rate Limit Types
| Limit | Description | Enforcement |
|---|---|---|
| RPM (Requests Per Minute) | Maximum number of API requests per minute | Returns 429 when exceeded |
| TPM (Tokens Per Minute) | Maximum number of tokens processed per minute | Returns 429 when exceeded |
How Rate Limits Work
Rate limits use a sliding window. When a limit is exceeded, the API returns a 429 Too Many Requests response with a Retry-After header:
{
"error": {
"message": "Rate limit exceeded. Retry after 12 seconds.",
"type": "rate_limit_error",
"code": "rate_limit_exceeded"
}
}
Rate Limit Scope
Rate limits can be set at the key level. Higher-tier plans include higher default rate limits:
| Tier | Default RPM | Default TPM |
|---|---|---|
| Pay As You Go | Standard | Standard |
| Tier 2 | Enhanced | Enhanced |
| Tier 3 | Premium | Premium |
| Enterprise | Custom SLAs | Custom SLAs |
Custom rate limits set on a specific key override the tier defaults.
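For intuition about the sliding window mentioned under How Rate Limits Work, here is a minimal in-memory sketch of one common variant, the sliding-window log. The page does not specify which variant the gateway uses, so treat this purely as an illustration; the class name `SlidingWindowLimiter` and its parameters are assumptions. It counts requests (RPM); a TPM limiter would track token counts in the same way.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any trailing `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.stamps: Deque[float] = deque()  # admission times, oldest first

    def try_acquire(self) -> float:
        """Return 0.0 if the request is admitted, else seconds until a slot frees."""
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return 0.0
        # The oldest admitted request determines when capacity returns.
        return self.window - (now - self.stamps[0])
```

The non-zero return value plays the role of the Retry-After hint: it is the time until the oldest request in the window expires.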
Handling Rate Limits in Code
Implement exponential backoff when you receive a 429 response:
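import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key="sk-nemo-your-key-here",
    base_url="https://api.nemorouter.ai/v1",
    max_retries=0,  # turn off the SDK's built-in retries so the loop below is the only retry layer
)


def call_with_retry(messages, max_retries=3):
    """Call the chat API, backing off exponentially on 429 responses."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the 429 to the caller
            # Prefer the server's Retry-After hint (in seconds); otherwise wait 1s, then 2s.
            retry_after = e.response.headers.get("retry-after", "")
            wait_time = int(retry_after) if retry_after.isdigit() else 2 ** attempt
            print(f"Rate limited. Retrying in {wait_time}s...")
            time.sleep(wait_time)

In JavaScript, the OpenAI SDK's built-in retries cover the same case:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.NEMOROUTER_API_KEY,
  baseURL: "https://api.nemorouter.ai/v1",
  maxRetries: 3, // The OpenAI SDK handles retries automatically
});
Credits and Budgets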
It's important to understand how credits and budgets interact:
- Credits are your account's payment balance. When you buy $100 in credits, you have $100 to spend on API calls.
- Budgets are spending limits that restrict how much of those credits a specific key or team can consume.
For example:
- Your organization has $1,000 in credits
- Team A has a $300/month budget
- Team B has a $500/month budget
- $200 in credits remains unbudgeted (available to keys without budgets)
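Budgets do not reserve credits; they only cap spend. If Team A spends $300 and Team B spends $500, the remaining $200 is available to any key. Conversely, because nothing is reserved, keys without budgets can draw on more than $200, which leaves Teams A and B with less than their caps would allow.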
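The example can be expressed as a small pre-flight check. This is a sketch only: the function name `may_proceed` is hypothetical, and it assumes requests stop once the credit balance reaches zero, which this page does not state explicitly.

```python
from typing import Dict


def may_proceed(team: str, credits_left: float,
                spent: Dict[str, float], budgets: Dict[str, float]) -> bool:
    """Allow a request only if credits remain and the team is under its cap (if any)."""
    if credits_left <= 0:
        return False  # assumption: an empty balance stops all keys
    cap = budgets.get(team)  # None means the team has no budget
    return cap is None or spent.get(team, 0.0) < cap


budgets = {"A": 300.0, "B": 500.0}
```

With $200 of credits left and both teams at their caps, `may_proceed("A", 200.0, {"A": 300.0, "B": 500.0}, budgets)` is false while an unbudgeted team `"C"` is still allowed. If unbudgeted keys had instead used up the whole balance, Team A would be stopped at zero credits even with only $50 of its $300 spent, which is what "budgets do not reserve credits" means in practice.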
Monitoring Budgets
![Cost report at /[organization]/advanced/cost — spend over time with breakdowns by key, team, and model](/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fdashboard-analytics.66439072.jpg&w=3840&q=75)
Track budget usage from the dashboard:
- Budgets page: view all budgets with current spend vs. limit
- Budgets report at /[organization]/advanced/budgets: the enforcement view (see below)
- Cost report at /[organization]/advanced/cost: drill into spend by key, team, model, or time period
- API Keys page: see per-key spend at a glance
Spend Limits & Enforcement Panel
The Advanced → Budgets report opens with a live enforcement panel so you can see, at a glance, how close you are to your limits and whether anything is being blocked right now:
- Org spend limit — A meter showing current spend against your org-wide cap (the account backstop). When several org caps exist, the tightest one is shown, since it binds first.
- Top team budget — Your largest team allocation with its live utilization.
- Blocked by enforcement — How many budgets are rejecting requests right now. Every figure on this panel is the same spend the gateway enforces on, so the dashboard reflects exactly what your callers experience.
- Enforcement modes — A breakdown of how many budgets use Block, Warn, and Throttle.
Below the panel, the burn table lists every budget with its utilization bar and an Enforcement badge — Blocking when it's actively rejecting requests, Over · alerting when a warn/throttle budget is past its cap, or its configured mode otherwise.
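The badge rule in the previous paragraph amounts to a small lookup. The sketch below restates it in Python for clarity; the function name `enforcement_badge`, the lowercase mode strings, and the `>=` test for "past its cap" are assumptions made for the example.

```python
def enforcement_badge(mode: str, spend: float, cap: float) -> str:
    """Derive the burn-table badge from a budget's mode and current utilization."""
    over = spend >= cap
    if over and mode == "block":
        return "Blocking"  # actively rejecting requests
    if over and mode in ("warn", "throttle"):
        return "Over · alerting"  # past the cap, but requests still flow
    return mode.capitalize()  # under the cap: show the configured mode
```

A warn budget at $120 of a $100 cap would show "Over · alerting", while the same budget at $10 would simply show "Warn".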
No historical block count
Budget blocks happen at request pre-flight, before the call reaches a provider — so a blocked request produces no spend log. The dashboard therefore shows budgets that are blocking now rather than a historical count of rejected requests. To see request-level outcomes, use the Request Logs under Reports → Observability.
Best Practices
- Set budgets on all production keys: even a generous limit is a safety net against bugs that cause runaway API calls.
- Use daily budgets for development: catch issues early with $10-20/day limits on dev keys.
- Use total budgets for POCs: hard-cap spend on proof-of-concept projects.
- Monitor the Cost report (/[organization]/advanced/cost) weekly: look for unexpected spend spikes before they become expensive.
- Separate high-volume and low-volume keys: don't let a batch pipeline share a key with an interactive chat interface.
Next Steps
- Organization Setup — Configure your organization
- Team Management — Create teams and assign budgets
- Authentication — Create and manage API keys