One key, no provider config
Every use case below starts the same way: one NemoRouter key, the OpenAI-compatible endpoint, and the model catalog. No provider accounts, no key vault, no per-model SDK.
RAG pipelines, AI agents, support bots, coding assistants, document processors, voice apps — they all need routing, guardrails, cost control, and observability. Nemo Router gives you all four behind one key across 20+ models.
Same gateway, different jobs
models on Google Vertex AI
Guardrails, budgets, observability — never gated
p50 added latency — LLM time dominates
On Tier 3 — undercuts the 5% standard
Each page below is tailored to how that workload actually runs — which Nemo capabilities matter, and how the request flow looks end to end.
RAG pipelines
A RAG pipeline calls two model families — embeddings to index and query, a chat model to synthesize the answer. Route both through one endpoint, cache the repeats, and track cost per pipeline stage.
AI agents
Agents fan out dozens of LLM calls per task. Auto-failover keeps a long run alive when a provider degrades, per-agent budgets cap spend, and every tool call lands in the request log for audit.
Customer support bots
Support traffic carries real customer data and hostile inputs. Guardrails redact PII and block prompt injection on every request; versioned prompt templates keep tone consistent across the fleet.
Code generation
Coding tools mix fast autocomplete with deep reasoning. Pick the model per task from the catalog, route latency-sensitive calls to the quickest endpoint, and cap spend with per-key, per-seat budgets.
Document processing
Document workloads run vision-capable models over scans and PDFs in large batches. Tag-filtered routing keeps every request on a vision model, and cost tracking attributes spend per batch job.
Voice & realtime AI
Voice apps live or die on latency. Latency-based routing steers each turn to the quickest healthy endpoint, fallbacks keep the conversation alive, and every turn is logged for observability.
The workloads differ, but the gateway underneath is the same. These three guarantees hold whether you ship a RAG bot or a voice agent.
Every use case below starts the same way: one NemoRouter key, the OpenAI-compatible endpoint, and the model catalog. No provider accounts, no key vault, no per-model SDK.
PII redaction, prompt-injection detection, and per-org / per-team / per-key budgets apply to every workload — RAG, agents, or voice. You opt out, never in.
LiteLLM reports the real cost of every call; Nemo logs the strategy, model, latency, and result. Slice spend by pipeline, agent, batch job, or seat.
One key. One bill. Every workload.
Sign up, paste your virtual key, change the base URL. Routing, guardrails, budgets, and observability are unlocked on every plan.