What happens to a live conversation if a provider degrades?

The fallback chain retries the next provider transparently. A 5xx or timeout on the primary triggers the next link mid-conversation — the turn still completes and the caller does not hear an error.

Can I see latency for each turn of a voice conversation?

Yes. Every turn lands in the request log with the model, latency, token counts, and cost. Latency metrics let you watch p50 and p99 per model so you can spot a slow turn before it affects callers.

Does Nemo Router handle speech-to-text and text-to-speech?

Nemo Router is the LLM hop in a voice pipeline — the reasoning turn between transcription and synthesis. Your speech-to-text and text-to-speech layers stay yours; the gateway gives that LLM turn low-latency routing, failover, and observability.

Use Case · Voice & Realtime

Voice assistants that answer fast — every turn.

A voice app lives or dies on latency. Nemo Router steers each conversational turn to the quickest healthy endpoint, keeps the line alive with transparent failover, and logs every turn for observability.

voice-turn · conversation 4c1d

One turn of a live conversation

Route strategy: latency-based
Endpoint chosen: quickest healthy
Streaming: token-by-token
Gateway overhead: ~95 ms p50
Mid-call failover: transparent
Turn logged: p50 / p99

low-latency · failover-safe · observable

Gateway overhead: ~95 ms p50
Each turn: Latency-routed
Mid-call failure: Failed over
Turn-level: Observable

Why Nemo for voice

Speed, resilience, and a turn-level trail

Voice is the least forgiving LLM workload — every turn is on a clock and a caller is listening. Nemo Router gives that turn fast routing, transparent failover, and observability.

Latency-based routing

Silence on the line is the failure mode. Each conversational turn is steered to the quickest healthy endpoint using a live latency signal. The gateway adds ~95 ms at p50, so LLM time dominates the turn.
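To make "quickest healthy endpoint" concrete, here is a minimal sketch of latency-based selection. It is an illustration only, not Nemo Router's actual implementation: the `Endpoint` type, the deployment names, and the use of an exponentially weighted moving average as the "live latency signal" are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str                # hypothetical deployment label
    healthy: bool            # outcome of the most recent health check
    ewma_latency_ms: float   # smoothed recent latency (assumed signal)


def update_latency(endpoint: Endpoint, sample_ms: float, alpha: float = 0.2) -> None:
    """Fold a new latency sample into the endpoint's moving average."""
    endpoint.ewma_latency_ms = alpha * sample_ms + (1 - alpha) * endpoint.ewma_latency_ms


def pick_endpoint(endpoints: list[Endpoint]) -> Endpoint:
    """Return the healthy endpoint with the lowest smoothed latency."""
    candidates = [e for e in endpoints if e.healthy]
    if not candidates:
        raise RuntimeError("no healthy endpoint available")
    return min(candidates, key=lambda e: e.ewma_latency_ms)


endpoints = [
    Endpoint("deployment-a", healthy=True, ewma_latency_ms=420.0),
    Endpoint("deployment-b", healthy=True, ewma_latency_ms=310.0),
    Endpoint("deployment-c", healthy=False, ewma_latency_ms=150.0),
]
print(pick_endpoint(endpoints).name)  # deployment-b
```

The unhealthy deployment is skipped even though it has the lowest recorded latency, which is the point of combining the health check with the latency signal.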

Failover mid-conversation

A provider blip during a live call cannot be a dead-air moment. The fallback chain retries transparently — the turn completes and the caller hears a response, not an error.
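The fallback behaviour can be pictured with a small sketch. This is a simplified model of a fallback chain, not the gateway's code: `ProviderError`, `complete_with_fallback`, and the two stand-in providers are hypothetical names used only for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple


class ProviderError(Exception):
    """Stands in for a 5xx response from a provider."""


def complete_with_fallback(
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, reply) from the first success."""
    last_error: Optional[Exception] = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except (ProviderError, TimeoutError) as exc:
            last_error = exc  # move on to the next link in the chain
    raise RuntimeError("all providers in the chain failed") from last_error


def flaky_primary(prompt: str) -> str:
    raise ProviderError("503 from primary")


def healthy_secondary(prompt: str) -> str:
    return f"reply to: {prompt}"


used, reply = complete_with_fallback(
    [("primary", flaky_primary), ("secondary", healthy_secondary)],
    "What are your opening hours?",
)
print(used, "->", reply)
```

The caller of `complete_with_fallback` only sees the successful reply; the primary's failure is absorbed inside the loop, which mirrors the "caller hears a response, not an error" behaviour described above.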

Turn-level observability

Every turn lands in the request log with model, latency, tokens, and cost — p50 / p99 per model surfaces a slow tail before callers feel it.
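As a rough illustration of why p99 matters alongside p50, the sketch below computes both per model from a handful of turn records. The record shape (`model`, `latency_ms`) and the sample values are assumptions made for the example; the real request log schema may differ.

```python
import math
from collections import defaultdict


def percentile(sorted_values: list[float], pct: float) -> float:
    """Nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def latency_summary(turns: list[dict]) -> dict[str, dict[str, float]]:
    """Group turn latencies by model and report p50 and p99 for each."""
    by_model: dict[str, list[float]] = defaultdict(list)
    for turn in turns:
        by_model[turn["model"]].append(turn["latency_ms"])
    return {
        model: {
            "p50": percentile(sorted(values), 50),
            "p99": percentile(sorted(values), 99),
        }
        for model, values in by_model.items()
    }


# Hypothetical turn records; one slow turn hides behind a healthy median.
turns = [
    {"model": "model-a", "latency_ms": 300.0},
    {"model": "model-a", "latency_ms": 320.0},
    {"model": "model-a", "latency_ms": 310.0},
    {"model": "model-a", "latency_ms": 1900.0},
    {"model": "model-b", "latency_ms": 450.0},
]
summary = latency_summary(turns)
print(summary["model-a"])  # {'p50': 310.0, 'p99': 1900.0}
```

Here the median for `model-a` looks fine while the p99 exposes the 1.9-second turn, which is the slow tail a caller would actually notice.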

Model catalog for realtime

Pick the model with the latency profile you need and swap as faster models ship, with no SDK change. Streaming is proxied with no hot-path buffering.

How it works

A conversational turn, end to end

Nemo Router is the LLM hop between transcription and synthesis. The turn routes for latency, streams back token-by-token, and lands in the log with p50 / p99 metrics.

Voice turn flow

1. Caller speaks (speech-to-text upstream). Your speech layer transcribes the turn to text.
2. Turn request (POST /v1/chat/completions). The transcript becomes a streaming chat request.
3. Latency route (quickest healthy endpoint). Live latency signal picks the fastest model deployment.
4. Stream the reply (token-by-token). Tokens flow to your text-to-speech with no buffering.
5. Turn logged (latency metrics). Model, latency, and cost per turn, with p50 / p99 tracked.

Your speech-to-text and text-to-speech layers stay yours. Nemo Router gives the reasoning turn in between low-latency routing, failover, and a logged record — 97+ models to choose from.

The code

Stream the reasoning turn

A voice turn is a streaming chat request: transcript in, tokens out to your speech layer. This snippet comes from the same SDK examples the playground uses; enable streaming and tokens flow as they are generated.

Install: `pip install openai`

```python
# Cache: enabled (org default). Pass nemo_cache: false to skip.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["NEMOROUTER_API_KEY"],
    base_url="https://api.nemorouter.ai/v1",
)

response = client.chat.completions.create(
    model="gemini-2.5-flash-lite",
    temperature=1,
    max_tokens=1024,
    top_p=1,
    messages=[
        {"role": "user", "content": "Hello! What models do you support?"},
    ],
    extra_body={
        # "nemo_cache": False,  # Uncomment to skip cache
    },
)

print(response.choices[0].message.content)
```

Set stream: true (`stream=True` in the Python SDK) and Nemo Router proxies the token stream with no hot-path buffering.
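A streaming variant of the request might look like the sketch below. It reuses the base URL, environment variable, and model name from the snippet above and the OpenAI SDK's standard `stream=True` iteration. The `sentence_chunks` helper is a hypothetical client-side choice, not part of Nemo Router: many text-to-speech engines sound more natural when fed whole sentences, so it groups token deltas before handing them on. Its sentence detection is deliberately naive (it splits on `.`, `!`, and `?`).

```python
import os
from typing import Iterable, Iterator


def sentence_chunks(deltas: Iterable[str]) -> Iterator[str]:
    """Group streamed text deltas into sentence-sized chunks for a TTS engine."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        while True:
            # Position of the first sentence-ending character, or -1 if none yet.
            cut = next((i for i, ch in enumerate(buffer) if ch in ".!?"), -1)
            if cut == -1:
                break
            yield buffer[: cut + 1].strip()
            buffer = buffer[cut + 1 :]
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the stream ends


def stream_turn(transcript: str) -> Iterator[str]:
    """Yield text deltas for one conversational turn sent through the gateway."""
    from openai import OpenAI  # imported lazily so sentence_chunks has no dependency

    client = OpenAI(
        api_key=os.environ["NEMOROUTER_API_KEY"],
        base_url="https://api.nemorouter.ai/v1",
    )
    stream = client.chat.completions.create(
        model="gemini-2.5-flash-lite",
        messages=[{"role": "user", "content": transcript}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no text (for example, role or finish markers).
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

In use, a loop such as `for sentence in sentence_chunks(stream_turn(transcript)):` would pass each sentence to your text-to-speech layer as soon as it is complete. If your speech engine accepts raw token deltas, you can skip `sentence_chunks` and forward the output of `stream_turn` directly.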

FAQ

Common voice & realtime questions

Fast turns, no dead air

Build a voice assistant that never leaves the caller waiting

Latency-based routing, transparent failover, and turn-level observability — all unlocked on every plan.

Nemo Router is the LLM hop — pair it with any speech-to-text and text-to-speech stack.