$5 free credits when you sign up
Use Cases

Build it on one managed gateway.

RAG pipelines, AI agents, support bots, coding assistants, document processors, voice apps — they all need routing, guardrails, cost control, and observability. Nemo Router gives you all four behind one key across 20+ models.

nemo · one key, every workload

Same gateway, different jobs

RAG pipelineembed + chat
Agent runtimefallback chains
Support botPII redacted
Doc processingvision · batch
Voice assistantlatency-routed
Provider confignone
one keyone billno BYOK
One key
20+

models on Google Vertex AI

Every feature
All tiers

Guardrails, budgets, observability — never gated

Gateway overhead
95 ms

p50 added latency — LLM time dominates

Platform fee
0%

On Tier 3 — undercuts the 5% standard

Pick your workload

Six use cases, one gateway

Each page below is tailored to how that workload actually runs — which Nemo capabilities matter, and how the request flow looks end to end.

RAG pipelines

Retrieval-augmented generation, one key for embed + chat

A RAG pipeline calls two model families — embeddings to index and query, a chat model to synthesize the answer. Route both through one endpoint, cache the repeats, and track cost per pipeline stage.

  • Embed + chat routing
  • Response caching
  • Per-pipeline cost tracking
Explore rag pipelines

AI agents

Autonomous agents that stay up and stay in budget

Agents fan out dozens of LLM calls per task. Auto-failover keeps a long run alive when a provider degrades, per-agent budgets cap spend, and every tool call lands in the request log for audit.

  • Fallback chains
  • Per-agent budgets
  • Tool-call observability
Explore ai agents

Customer support bots

Support bots with PII redaction and injection defense built in

Support traffic carries real customer data and hostile inputs. Guardrails redact PII and block prompt injection on every request; versioned prompt templates keep tone consistent across the fleet.

  • PII redaction
  • Injection defense
  • Versioned prompt templates
Explore customer support bots

Code generation

Coding assistants with model choice and per-seat budgets

Coding tools mix fast autocomplete with deep reasoning. Pick the model per task from the catalog, route latency-sensitive calls to the quickest endpoint, and cap spend with per-key, per-seat budgets.

  • Model catalog choice
  • Latency routing
  • Per-seat budgets
Explore code generation

Document processing

Extract, classify, and summarize documents at volume

Document workloads run vision-capable models over scans and PDFs in large batches. Tag-filtered routing keeps every request on a vision model, and cost tracking attributes spend per batch job.

  • Vision-capable models
  • Batch-friendly routing
  • Per-job cost tracking
Explore document processing

Voice & realtime AI

Low-latency routing for voice and realtime assistants

Voice apps live or die on latency. Latency-based routing steers each turn to the quickest healthy endpoint, fallbacks keep the conversation alive, and every turn is logged for observability.

  • Latency-based routing
  • Turn-level observability
  • Model catalog
Explore voice & realtime ai
The common foundation

What every use case shares

The workloads differ, but the gateway underneath is the same. These three guarantees hold whether you ship a RAG bot or a voice agent.

One key, no provider config

Every use case below starts the same way: one NemoRouter key, the OpenAI-compatible endpoint, and the model catalog. No provider accounts, no key vault, no per-model SDK.

Guardrails and budgets, always on

PII redaction, prompt-injection detection, and per-org / per-team / per-key budgets apply to every workload — RAG, agents, or voice. You opt out, never in.

Cost and logs you can attribute

LiteLLM reports the real cost of every call; Nemo logs the strategy, model, latency, and result. Slice spend by pipeline, agent, batch job, or seat.

One key. One bill. Every workload.

Start with the use case that fits your build

Sign up, paste your virtual key, change the base URL. Routing, guardrails, budgets, and observability are unlocked on every plan.