Chat Completions
Send messages and receive AI-generated responses
The Chat Completions endpoint is the primary way to interact with LLMs through Nemo Router. It is fully compatible with the OpenAI Chat Completions API, so existing code and SDKs written for OpenAI work with Nemo Router once you change the base URL and API key.

Endpoint
POST https://api.nemorouter.ai/v1/chat/completions
Request Headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | `Bearer <NEMOROUTER_API_KEY>` |
| Content-Type | Yes | application/json |
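Without an SDK, a call is a plain HTTPS POST carrying these two headers and a JSON body. The following minimal sketch uses only the Python standard library to assemble such a request; `build_request` is a hypothetical helper, and the request is built but not sent.

```python
import json
import os
import urllib.request

API_URL = "https://api.nemorouter.ai/v1/chat/completions"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a Chat Completions POST with the two required headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    os.environ.get("NEMOROUTER_API_KEY", "sk-nemo-your-key-here"),
    {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]},
)

# Sending it requires network access, for example:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```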
Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | n/a | The model to use (e.g., gpt-4o, claude-sonnet-4-20250514, gemini-2.5-pro) |
| messages | array | Yes | n/a | Array of message objects with role and content |
| temperature | number | No | 1.0 | Sampling temperature between 0 and 2. Lower values are more deterministic. |
| max_tokens | integer | No | Model default | Maximum number of tokens to generate |
| top_p | number | No | 1.0 | Nucleus sampling parameter. Alternative to temperature. |
| n | integer | No | 1 | Number of completions to generate |
| stream | boolean | No | false | Whether to stream the response using server-sent events |
| stop | string or array | No | null | Sequences where the model will stop generating |
| presence_penalty | number | No | 0 | Penalize tokens based on whether they appear in the text so far (-2.0 to 2.0) |
| frequency_penalty | number | No | 0 | Penalize tokens based on their frequency in the text so far (-2.0 to 2.0) |
| response_format | object | No | null | Set to {"type": "json_object"} for JSON mode |
| seed | integer | No | null | Seed for deterministic generation (best effort) |
| tools | array | No | null | List of tool/function definitions for function calling |
| tool_choice | string or object | No | "auto" | Controls which tool the model calls |
| user | string | No | null | Stable identifier for the downstream consumer behind this call (e.g., your user's opaque ID, a tenant ID, or a project ID). Surfaces on the dashboard's API Consumers page for per-consumer spend, tags, and rate-limit overrides. Send an opaque ID, never PII. Max 200 characters. |
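The documented ranges can be checked client-side so mistakes surface before the server answers with 400 Bad Request. A minimal sketch, assuming you only want to enforce the limits stated in the table above; `validate_chat_params` is a hypothetical helper, not part of any SDK, and the server remains the authority on what it accepts.

```python
def validate_chat_params(params: dict) -> list[str]:
    """Return the problems found in a request body; an empty list means all checks passed."""
    problems = []
    for name in ("model", "messages"):
        if name not in params:
            problems.append(f"missing required parameter: {name}")

    temperature = params.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        problems.append("temperature must be between 0 and 2")

    for name in ("presence_penalty", "frequency_penalty"):
        value = params.get(name)
        if value is not None and not -2.0 <= value <= 2.0:
            problems.append(f"{name} must be between -2.0 and 2.0")

    user = params.get("user")
    if user is not None and len(user) > 200:
        problems.append("user must be at most 200 characters")
    return problems


ok = validate_chat_params({"model": "gpt-4o", "messages": [], "temperature": 0.7})
bad = validate_chat_params({"model": "gpt-4o", "temperature": 2.5, "user": "x" * 201})
print(ok)
print(bad)
```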
Message Object
Each message in the messages array has the following structure:
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | One of system, user, assistant, or tool |
| content | string | Yes | The message content |
| name | string | No | Optional name for the message author |
| tool_calls | array | No | Tool calls made by the assistant (for assistant role) |
| tool_call_id | string | No | ID of the tool call being responded to (for tool role) |
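The tool_calls and tool_call_id fields tie a function-calling round trip together: each tool message answers a call the assistant requested earlier in the same messages array. The sketch below shows such a conversation plus a hypothetical `find_message_problems` checker. It assumes the OpenAI message format, in which an assistant reply that only carries tool_calls has null content; the call ID and tool result are made-up values.

```python
import json

VALID_ROLES = {"system", "user", "assistant", "tool"}


def find_message_problems(messages: list[dict]) -> list[str]:
    """Check roles and tool-call linkage; returns an empty list when nothing is wrong."""
    problems = []
    requested_call_ids = set()
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            problems.append(f"message {i}: invalid role {role!r}")
            continue
        # Record the IDs of tool calls the assistant asked for.
        for call in msg.get("tool_calls") or []:
            requested_call_ids.add(call["id"])
        # A tool message must answer a tool call that appeared earlier.
        if role == "tool" and msg.get("tool_call_id") not in requested_call_ids:
            problems.append(f"message {i}: tool_call_id does not match an earlier tool call")
    return problems


# Hypothetical round trip: the assistant requests get_weather, then the tool result is sent back.
conversation = [
    {"role": "user", "content": "What is the weather in San Francisco?"},
    {
        "role": "assistant",
        "content": None,  # OpenAI-format replies that only carry tool_calls have null content
        "tool_calls": [
            {
                "id": "call_abc123",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"location": "San Francisco, CA"}),
                },
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_abc123", "content": json.dumps({"temp_f": 61})},
]

print(find_message_problems(conversation))  # prints []
```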
Nemo Router Extra Fields
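Nemo Router accepts additional platform-specific fields in the request body. With the OpenAI Python SDK, pass them through the extra_body argument, which adds them to the top level of the JSON body; in raw HTTP requests, include them as top-level fields.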
| Field | Type | Description |
|---|---|---|
| nemo_guardrail_ids | array | Run only these specific guardrail IDs (overrides org defaults) |
| nemo_cache | boolean | Set to false to skip the cache for this request |
| nemo_prompt_template_id | string | Apply a saved prompt template |
| nemo_prompt_variables | object | Variables to inject into the prompt template |
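To make the extra_body mechanism concrete, the sketch below builds the JSON payload that results when these fields are merged into an ordinary request. It assumes the documented behavior of the OpenAI Python SDK's extra_body (extra keys are added to the request's JSON body). The guardrail ID, template ID, and variables are hypothetical placeholders, not values that exist in your organization.

```python
import json

# Standard OpenAI-compatible parameters.
body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
}

# Nemo Router extra fields; every ID and variable here is a hypothetical placeholder.
nemo_fields = {
    "nemo_cache": False,                      # bypass the cache for this request only
    "nemo_guardrail_ids": ["gr_pii_filter"],  # run only this guardrail
    "nemo_prompt_template_id": "tmpl_support_summary",
    "nemo_prompt_variables": {"ticket_id": "T-1001"},
}

# The raw HTTP payload carries the extra fields alongside the standard ones.
payload = {**body, **nemo_fields}
print(json.dumps(payload, indent=2))

# Equivalent SDK call (needs the openai package and network access):
# client.chat.completions.create(model=body["model"], messages=body["messages"],
#                                extra_body=nemo_fields)
```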
Example Request
cURL
```bash
curl https://api.nemorouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $NEMOROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that provides concise answers."
      },
      {
        "role": "user",
        "content": "Explain quantum computing in 3 sentences."
      }
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
```
Python
```python
import os

from openai import OpenAI

# Read the key from the environment rather than hard-coding it.
client = OpenAI(
    api_key=os.environ["NEMOROUTER_API_KEY"],
    base_url="https://api.nemorouter.ai/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in 3 sentences."},
    ],
    temperature=0.7,
    max_tokens=256,
)

print(response.choices[0].message.content)
```
Node.js
```javascript
// Top-level await requires an ES module (for example, a .mjs file).
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.NEMOROUTER_API_KEY,
  baseURL: "https://api.nemorouter.ai/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain quantum computing in 3 sentences." },
  ],
  temperature: 0.7,
  max_tokens: 256,
});

console.log(response.choices[0].message.content);
```
Response
```json
{
  "id": "chatcmpl-abc123def456",
  "object": "chat.completion",
  "created": 1709251200,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits (qubits) that can exist in multiple states simultaneously through superposition..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 30,
    "completion_tokens": 85,
    "total_tokens": 115
  }
}
```
Response Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the completion |
| object | string | Always chat.completion |
| created | integer | Unix timestamp of creation |
| model | string | The model used for the completion |
| choices | array | Array of completion choices |
| choices[].index | integer | Index of the choice |
| choices[].message | object | The generated message |
| choices[].finish_reason | string | stop, length, tool_calls, or content_filter |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Tokens in the prompt |
| usage.completion_tokens | integer | Tokens in the completion |
| usage.total_tokens | integer | Total tokens used |
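In practice, finish_reason is the field worth branching on: length means the output hit the token limit and may be cut off, and tool_calls means the model is asking you to run a function. A minimal sketch that reads the example response shown earlier on this page (message text shortened); `summarize_completion` and its output keys are hypothetical names chosen for illustration.

```python
import json

# The example response from this page, with the message text shortened.
raw = """
{
  "id": "chatcmpl-abc123def456",
  "object": "chat.completion",
  "created": 1709251200,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Quantum computing uses quantum bits (qubits)..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 30, "completion_tokens": 85, "total_tokens": 115}
}
"""


def summarize_completion(response: dict) -> dict:
    """Pull out the fields most callers need from a chat.completion object."""
    choice = response["choices"][0]
    reason = choice["finish_reason"]
    return {
        "text": choice["message"].get("content"),
        # "length": generation stopped at the token limit, so the text may be incomplete.
        "truncated": reason == "length",
        # "tool_calls": the model wants a tool run (see message.tool_calls).
        "needs_tool": reason == "tool_calls",
        "filtered": reason == "content_filter",
        "total_tokens": response["usage"]["total_tokens"],
    }


summary = summarize_completion(json.loads(raw))
print(summary["text"], summary["total_tokens"])
```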
Nemo Router Response Headers
Nemo Router adds observability headers to every response:
| Header | Description |
|---|---|
| x-nemo-request-id | Unique request ID (always present); quote it in support tickets |
| x-nemo-request-cost | Cost of this request in USD |
| x-nemo-latency-ms | Gateway latency for this request in milliseconds |
| x-nemo-org-id | Your organization ID |
| x-nemo-key-alias | The key alias used for this request |
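These headers can be collected for logging or cost tracking. The sketch below assumes the cost and latency headers hold plain decimal strings, which this page does not state explicitly; `nemo_metadata` is a hypothetical helper and the header values are made up. HTTP header names are case-insensitive, so the helper lowercases them first.

```python
from collections.abc import Mapping


def nemo_metadata(headers: Mapping[str, str]) -> dict:
    """Collect Nemo Router observability headers; header names are case-insensitive."""
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_float(name: str):
        # Assumes a plain decimal string such as "0.00042"; None if the header is absent.
        value = lowered.get(name)
        return float(value) if value is not None else None

    return {
        "request_id": lowered.get("x-nemo-request-id"),
        "cost_usd": as_float("x-nemo-request-cost"),
        "latency_ms": as_float("x-nemo-latency-ms"),
        "org_id": lowered.get("x-nemo-org-id"),
        "key_alias": lowered.get("x-nemo-key-alias"),
    }


# Hypothetical header values, for illustration only.
meta = nemo_metadata({
    "X-Nemo-Request-Id": "req_123",
    "X-Nemo-Request-Cost": "0.00042",
    "X-Nemo-Latency-Ms": "38",
})
print(meta)
```

With the OpenAI Python SDK, `client.chat.completions.with_raw_response.create(...)` returns an object whose `.headers` can be passed to a helper like this and whose `.parse()` returns the usual completion object.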
Streaming
Set stream: true to receive the response as server-sent events (SSE):
```bash
curl https://api.nemorouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $NEMOROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a short poem."}],
    "stream": true
  }'
```
Each SSE event contains a chunk of the response:
```
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1709251200,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"In"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1709251200,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":" the"},"finish_reason":null}]}

data: [DONE]
```
Function Calling
Nemo Router supports OpenAI-compatible function calling with models that support it:
```bash
curl https://api.nemorouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $NEMOROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the weather in San Francisco?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City and state, e.g. San Francisco, CA"
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
```
Error Responses
| Status | Description |
|---|---|
| 400 Bad Request | Invalid request body or unsupported parameter. Also returned with guardrail_blocked when a guardrail rejects the request. |
| 401 Unauthorized | Invalid or missing API key |
| 402 Payment Required | Insufficient credit balance |
| 429 Too Many Requests | Rate limit exceeded (RPM or TPM) |
| 503 Service Unavailable | credit_check_failed: the credit system is temporarily unavailable |
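Only 429 and 503 describe temporary conditions; 400, 401, and 402 need a change on your side (a different request, a valid key, more credit), so retrying the identical request will not help. The sketch below is one hypothetical retry policy built on that split, using capped exponential backoff with optional jitter; the names and delay values are illustrative, not a Nemo Router feature. The OpenAI Python SDK also retries some failures on its own, controlled by its max_retries client option.

```python
import random

# Statuses from the table above that describe temporary conditions.
RETRYABLE_STATUSES = {429, 503}


def is_retryable(status: int) -> bool:
    """400, 401 and 402 need a change on the caller's side; retrying unchanged won't help."""
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0, jitter: bool = False) -> float:
    """Seconds to wait before retry number `attempt` (0-based): base * 2**attempt, capped."""
    delay = min(cap, base * 2 ** attempt)
    # Full jitter spreads retries from many clients over [0, delay].
    return random.uniform(0, delay) if jitter else delay


print([backoff_delay(a) for a in range(5)])  # prints [0.5, 1.0, 2.0, 4.0, 8.0]
```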
Next Steps
- Embeddings API — Generate vector embeddings
- Models API — List available models
- Python SDK — Full Python integration guide
- Budget Controls — Set spending limits