API Reference
Chat Completions
Send messages and receive AI-generated responses
The Chat Completions endpoint is the primary way to interact with LLMs through NemoRouter. It's fully compatible with the OpenAI Chat Completions API, so any existing code or SDK that works with OpenAI will work with NemoRouter.

Endpoint
POST https://api.nemorouter.ai/v1/chat/completions

Request Headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <NEMOROUTER_API_KEY> |
| Content-Type | Yes | application/json |
Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | The model to use (e.g., gpt-4o, claude-4-sonnet, gemini-2.5-pro) |
| messages | array | Yes | — | Array of message objects with role and content |
| temperature | number | No | 1.0 | Sampling temperature between 0 and 2. Lower values are more deterministic. |
| max_tokens | integer | No | Model default | Maximum number of tokens to generate |
| top_p | number | No | 1.0 | Nucleus sampling parameter. Alternative to temperature. |
| n | integer | No | 1 | Number of completions to generate |
| stream | boolean | No | false | Whether to stream the response using server-sent events |
| stop | string or array | No | null | Sequences where the model will stop generating |
| presence_penalty | number | No | 0 | Penalizes tokens based on whether they appear in the text so far (-2.0 to 2.0) |
| frequency_penalty | number | No | 0 | Penalizes tokens based on their frequency in the text so far (-2.0 to 2.0) |
| response_format | object | No | null | Set to {"type": "json_object"} for JSON mode |
| seed | integer | No | null | Seed for deterministic generation (best effort) |
| tools | array | No | null | List of tool/function definitions for function calling |
| tool_choice | string or object | No | "auto" | Controls which tool the model calls |
Message Object
Each message in the messages array has the following structure:
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | One of system, user, assistant, or tool |
| content | string | Yes | The message content |
| name | string | No | Optional name for the message author |
| tool_calls | array | No | Tool calls made by the assistant (for assistant role) |
| tool_call_id | string | No | ID of the tool call being responded to (for tool role) |
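In code, a multi-turn conversation is just a list of these objects in order. A minimal sketch in Python (the build_messages helper is illustrative, not part of any SDK):

```python
# Illustrative helper: assemble a messages array matching the table above.
# Roles are one of: system, user, assistant, tool.

def build_messages(system_prompt, turns):
    """turns is a list of (role, content) tuples in conversation order."""
    messages = [{"role": "system", "content": system_prompt}]
    for role, content in turns:
        messages.append({"role": role, "content": content})
    return messages

messages = build_messages(
    "You are a helpful assistant.",
    [
        ("user", "What is 2 + 2?"),
        ("assistant", "4."),
        ("user", "And doubled?"),
    ],
)
```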
NemoRouter Extra Fields
NemoRouter supports additional fields via extra_body for platform-specific features:
| Field | Type | Description |
|---|---|---|
| nemo_guardrail_ids | array | Run only these specific guardrail IDs (overrides org defaults) |
| nemo_cache | boolean | Set to false to skip cache for this request |
| nemo_prompt_template_id | string | Apply a saved prompt template |
| nemo_prompt_variables | object | Variables to inject into the prompt template |
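With the OpenAI Python SDK, these fields go through the extra_body keyword argument on create(). A sketch (the guardrail and template IDs below are placeholders, not real resources):

```python
# Sketch: NemoRouter extra fields, passed via the OpenAI SDK's extra_body
# parameter. All IDs and variable values here are placeholders.
extra_body = {
    "nemo_guardrail_ids": ["gr_pii"],             # run only this guardrail
    "nemo_cache": False,                          # skip the cache for this call
    "nemo_prompt_template_id": "tpl_support",     # apply a saved template
    "nemo_prompt_variables": {"customer": "Ada"}, # fill template variables
}

# client.chat.completions.create(
#     model="gpt-4o",
#     messages=[{"role": "user", "content": "Hi"}],
#     extra_body=extra_body,
# )
```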
Example Request
cURL
```bash
curl https://api.nemorouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $NEMOROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that provides concise answers."
      },
      {
        "role": "user",
        "content": "Explain quantum computing in 3 sentences."
      }
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
```

Python
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-nemo-your-key-here",
    base_url="https://api.nemorouter.ai/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in 3 sentences."},
    ],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Node.js
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.NEMOROUTER_API_KEY,
  baseURL: "https://api.nemorouter.ai/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain quantum computing in 3 sentences." },
  ],
  temperature: 0.7,
  max_tokens: 256,
});

console.log(response.choices[0].message.content);
```

Response
```json
{
  "id": "chatcmpl-abc123def456",
  "object": "chat.completion",
  "created": 1709251200,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits (qubits) that can exist in multiple states simultaneously through superposition..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 30,
    "completion_tokens": 85,
    "total_tokens": 115
  }
}
```

Response Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the completion |
| object | string | Always chat.completion |
| created | integer | Unix timestamp of creation |
| model | string | The model used for the completion |
| choices | array | Array of completion choices |
| choices[].index | integer | Index of the choice |
| choices[].message | object | The generated message |
| choices[].finish_reason | string | stop, length, tool_calls, or content_filter |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Tokens in the prompt |
| usage.completion_tokens | integer | Tokens in the completion |
| usage.total_tokens | integer | Total tokens used |
NemoRouter Response Headers
NemoRouter adds observability headers to every response:
| Header | Description |
|---|---|
| x-nemo-org-id | Your organization ID |
| x-nemo-key-alias | The key alias used for this request |
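The OpenAI Python SDK exposes raw response headers through its with_raw_response wrapper; a sketch of pulling out the NemoRouter headers (the sample values below are illustrative):

```python
# Sketch: extract NemoRouter observability headers from a response's header
# mapping. The sample dict stands in for real response headers.
def nemo_headers(headers):
    return {k: headers.get(k) for k in ("x-nemo-org-id", "x-nemo-key-alias")}

# With the OpenAI Python SDK the raw headers are available via:
#   raw = client.chat.completions.with_raw_response.create(...)
#   info = nemo_headers(raw.headers)
#   completion = raw.parse()  # the usual parsed completion object

sample = {"x-nemo-org-id": "org_123", "x-nemo-key-alias": "prod-backend"}
print(nemo_headers(sample))
```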
Streaming
Set stream: true to receive the response as server-sent events (SSE):
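Client libraries handle stream reassembly for you (the Python SDK yields chunk objects when stream=True), but the raw events are easy to parse directly. A minimal sketch, pure string handling with no network call, using sample lines like those shown below:

```python
import json

def collect_sse_text(lines):
    """Reassemble assistant text from chat.completion.chunk SSE lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive or comment lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content") or "")
    return "".join(parts)

events = [
    'data: {"choices":[{"index":0,"delta":{"content":"In"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" the"},"finish_reason":null}]}',
    "data: [DONE]",
]
print(collect_sse_text(events))  # "In the"
```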
```bash
curl https://api.nemorouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $NEMOROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a short poem."}],
    "stream": true
  }'
```

Each SSE event contains a chunk of the response:
```
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1709251200,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"In"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1709251200,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":" the"},"finish_reason":null}]}

data: [DONE]
```

Function Calling
NemoRouter supports OpenAI-compatible function calling with models that support it:
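When the model decides to call a tool, the assistant message carries tool_calls instead of text, and your code executes the tool and sends the result back as a tool-role message. A sketch of that round trip (get_weather here is a local stub, and the message shape is abbreviated):

```python
import json

def get_weather(location):
    # Local stub standing in for a real weather lookup.
    return {"location": location, "temperature_c": 18}

def handle_tool_calls(assistant_message, messages):
    """Execute each requested tool and append results as tool messages."""
    messages.append(assistant_message)
    for call in assistant_message["tool_calls"]:
        args = json.loads(call["function"]["arguments"])
        result = get_weather(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],  # ties the result to the request
            "content": json.dumps(result),
        })
    return messages  # send back to the model for its final answer

# Abbreviated shape of an assistant tool-call message:
assistant_message = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": "{\"location\": \"San Francisco, CA\"}"},
    }],
}
```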
```bash
curl https://api.nemorouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $NEMOROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the weather in San Francisco?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City and state, e.g. San Francisco, CA"
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
```

Error Responses
| Status | Description |
|---|---|
| 400 Bad Request | Invalid request body or unsupported parameter |
| 401 Unauthorized | Invalid or missing API key |
| 402 Payment Required | Insufficient credit balance |
| 429 Too Many Requests | Rate limit exceeded (RPM or TPM) |
| 500 Internal Server Error | Server-side error — retry with backoff |
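The retry guidance for 429 and 500 can be implemented as a simple exponential-backoff loop. A sketch (send_request is a placeholder for your actual HTTP call, returning a status code and body):

```python
import random
import time

RETRYABLE = {429, 500}  # retry these per the table above

def with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Retry on 429/500 with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        status, body = send_request()
        if status not in RETRYABLE:
            return status, body
        # Delay doubles each attempt; jitter avoids synchronized retries.
        delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        time.sleep(delay)
    return status, body  # give up after max_retries attempts
```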
Next Steps
- Embeddings API — Generate vector embeddings
- Models API — List available models
- Python SDK — Full Python integration guide
- Budget Controls — Set spending limits