Budget Controls
Set spending limits and rate controls
NemoRouter provides granular budget and rate controls to prevent runaway costs and enforce usage policies. Set spending limits and rate controls at the organization, team, or individual key level.
Budget Types
NemoRouter supports two types of controls:
| Control Type | What It Limits | Units |
|---|---|---|
| Spending Budgets | Total dollar spend over a time period | USD ($) |
| Rate Limits | Request and token throughput | RPM (requests/min), TPM (tokens/min) |
Both can be applied at multiple scopes and work together. A request is blocked if either the budget or rate limit is exceeded.
Spending Budgets
Spending budgets cap how much a key or team can spend over a defined period.
Creating a Budget
- Navigate to the Budgets page in the dashboard
- Click Create Budget
- Configure the budget:
| Field | Description | Example |
|---|---|---|
| Budget Name | Descriptive label | "Backend API — Monthly" |
| Max Spend | Maximum dollar amount | $500.00 |
| Duration | Budget reset period | Monthly, Weekly, Daily, or Total |
- Assign the budget to a key or team
Budget Durations
| Duration | Behavior |
|---|---|
| Daily | Resets at midnight UTC each day |
| Weekly | Resets at midnight UTC each Monday |
| Monthly | Resets on the 1st of each month at midnight UTC |
| Total | Never resets — a hard lifetime cap |
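The reset rules in the table can be sketched as a small helper, which may be useful for showing users when a budget will next reset. This is only an illustration of the documented behavior, not NemoRouter code: the function name `next_reset` and the use of the duration labels as string arguments are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_reset(now: datetime, duration: str) -> Optional[datetime]:
    """Return the next reset instant in UTC, or None for a Total budget.

    `now` must be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if duration == "Daily":
        return midnight + timedelta(days=1)
    if duration == "Weekly":
        # weekday() is 0 for Monday, so this always lands on the next Monday.
        return midnight + timedelta(days=7 - now.weekday())
    if duration == "Monthly":
        if now.month == 12:
            return midnight.replace(year=now.year + 1, month=1, day=1)
        return midnight.replace(month=now.month + 1, day=1)
    if duration == "Total":
        return None  # a lifetime cap never resets
    raise ValueError(f"unknown duration: {duration!r}")
```

For example, on Tuesday 31 December 2024 at 15:30 UTC, a Daily and a Monthly budget would both reset at 2025-01-01 00:00 UTC, and a Weekly budget at 2025-01-06 00:00 UTC.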
Budget Scope
Budgets can be scoped to different levels:
| Scope | Description | Use Case |
|---|---|---|
| API Key | Limits spend for a single key | Per-environment or per-service limits |
| Team | Limits spend for all keys in a team | Department-level budgets |
When a budget is exceeded, all requests using the associated key or team keys return a 402 Payment Required error:

    {
      "error": {
        "message": "Budget exceeded. Current spend: $500.12 / $500.00 limit.",
        "type": "budget_error",
        "code": "budget_exceeded"
      }
    }

Budget Examples
Limit a staging key to $50/month:
- Budget Name: "Staging Monthly"
- Max Spend: $50.00
- Duration: Monthly
- Assign to: staging-api-key
Cap the data science team at $2,000/month:
- Budget Name: "Data Science Monthly"
- Max Spend: $2,000.00
- Duration: Monthly
- Assign to: Team "Data Science"
Hard cap for a proof-of-concept:
- Budget Name: "POC Total Budget"
- Max Spend: $100.00
- Duration: Total (never resets)
- Assign to: poc-demo-key
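When a budget like the ones above runs out, client code may want to tell the 402 budget error apart from other failures, since retrying is unlikely to help until the budget resets or its limit is raised. A minimal sketch, assuming the response body has exactly the shape shown earlier; the helper name `is_budget_exceeded` is hypothetical:

```python
import json


def is_budget_exceeded(status_code: int, body: str) -> bool:
    """True when a response matches the documented 402 budget_exceeded error."""
    if status_code != 402:
        return False
    try:
        return json.loads(body)["error"]["code"] == "budget_exceeded"
    except (ValueError, KeyError, TypeError):
        # Not JSON, or not shaped like the documented error body.
        return False
```

A caller could use this to skip retries and alert whoever owns the budget instead.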
Rate Limits
Rate limits control the throughput of requests and tokens per minute.
Rate Limit Types
| Limit | Description | Enforcement |
|---|---|---|
| RPM (Requests Per Minute) | Maximum number of API requests per minute | Returns 429 when exceeded |
| TPM (Tokens Per Minute) | Maximum number of tokens processed per minute | Returns 429 when exceeded |
How Rate Limits Work
Rate limits use a sliding window. When a limit is exceeded, the API returns a 429 Too Many Requests response with a Retry-After header:

    {
      "error": {
        "message": "Rate limit exceeded. Retry after 12 seconds.",
        "type": "rate_limit_error",
        "code": "rate_limit_exceeded"
      }
    }

Rate Limit Scope
Rate limits can be set at the key level. Higher-tier plans include higher default rate limits:
| Tier | Default RPM | Default TPM |
|---|---|---|
| Pay As You Go | Standard | Standard |
| Tier 2 | Enhanced | Enhanced |
| Tier 3 | Premium | Premium |
| Enterprise | Custom SLAs | Custom SLAs |
Custom rate limits set on a specific key override the tier defaults.
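To make the sliding-window idea from How Rate Limits Work concrete, here is a minimal sketch of the general technique applied to an RPM-style limit. It is not NemoRouter's server-side implementation, which this page does not describe; the class and method names are hypothetical. A client could use something like it to pace its own requests and stay under a key's limit.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` events in any trailing `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        self._events = deque()  # timestamps of accepted events, oldest first

    def try_acquire(self, now: float) -> float:
        """Return 0.0 if the event is accepted, else seconds until a slot frees."""
        # Drop timestamps that have aged out of the window.
        while self._events and now - self._events[0] >= self.window:
            self._events.popleft()
        if len(self._events) < self.limit:
            self._events.append(now)
            return 0.0
        # The oldest accepted event leaves the window at this many seconds from now.
        return self._events[0] + self.window - now
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the sketch easy to test. Unlike a fixed per-minute counter, the window moves with each request, so capacity frees up gradually as older requests age out.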
Handling Rate Limits in Code
Implement exponential backoff when you receive a 429 response:

    import time

    from openai import OpenAI, RateLimitError

    client = OpenAI(
        api_key="sk-nemo-your-key-here",
        base_url="https://api.nemorouter.ai/v1",
        max_retries=0,  # turn off the SDK's built-in retries; the loop below handles them
    )


    def call_with_retry(messages, max_retries=3):
        """Call the chat completions endpoint, backing off exponentially on 429s."""
        for attempt in range(max_retries):
            try:
                return client.chat.completions.create(
                    model="gpt-4o",
                    messages=messages,
                )
            except RateLimitError:
                if attempt == max_retries - 1:
                    raise  # out of attempts: surface the 429 to the caller
                wait_time = 2 ** attempt  # 1s, 2s, 4s, ...
                print(f"Rate limited. Retrying in {wait_time}s...")
                time.sleep(wait_time)

In Node.js, the OpenAI SDK can handle the retries for you:

    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: process.env.NEMOROUTER_API_KEY,
      baseURL: "https://api.nemorouter.ai/v1",
      maxRetries: 3, // the OpenAI SDK retries 429 responses automatically
    });

Credits and Budgets
Credits and budgets are separate mechanisms that work together:
- Credits are your account's payment balance. When you buy $100 in credits, you have $100 to spend on API calls.
- Budgets are spending limits that restrict how much of those credits a specific key or team can consume.
For example:
- Your organization has $1,000 in credits
- Team A has a $300/month budget
- Team B has a $500/month budget
- $200 in credits remains unbudgeted (available to keys without budgets)
Budgets do not reserve credits; they only cap spend. If Team A spends $300 and Team B spends $500, both teams are at their caps, and the remaining $200 can be spent by any key that has no budget or is still under its budget.
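The arithmetic can be sketched as follows. This assumes the credit balance shrinks as any key spends and that a budgeted key is limited by whichever runs out first, its budget or the organization's credits. The function name `remaining_for_key` is hypothetical and not part of any NemoRouter API.

```python
from typing import Optional


def remaining_for_key(
    credits: float, budget_limit: Optional[float], budget_spend: float = 0.0
) -> float:
    """Dollars a key can still spend, capped by org credits and its budget."""
    if budget_limit is None:
        # No budget on this key: only the credit balance applies.
        return max(credits, 0.0)
    return max(min(credits, budget_limit - budget_spend), 0.0)


# After Team A spends $300 and Team B spends $500 of $1,000 in credits:
remaining_for_key(200, 300, 300)  # Team A: 0, its cap is reached
remaining_for_key(200, None)      # a key without a budget: 200

# Budgets do not reserve credits: if unbudgeted keys spend $900 first,
# Team A can use only the $100 that is left, despite its $300 budget.
remaining_for_key(100, 300, 0)    # 100
```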
Monitoring Budgets
Track budget usage from the dashboard:
- Budgets page — View all budgets with current spend vs. limit
- Analytics page — Drill into spend by key, team, model, or time period
- API Keys page — See per-key spend at a glance
Best Practices
- Set budgets on all production keys — Even generous ones. They're a safety net against bugs that make runaway API calls.
- Use daily budgets for development — Catch issues early with $10-20/day limits on dev keys.
- Use total budgets for POCs — Hard-cap spend on proof-of-concept projects.
- Monitor the Analytics page weekly — Look for unexpected spend spikes before they become expensive.
- Separate high-volume and low-volume keys — Don't let a batch pipeline share a key with an interactive chat interface.
Next Steps
- Organization Setup — Configure your organization
- Team Management — Create teams and assign budgets
- Authentication — Create and manage API keys