Chat Completions

The Chat Completions endpoint is the primary way to interact with LLMs through NemoRouter. It follows the OpenAI Chat Completions API request and response format. Existing OpenAI SDKs and code can typically be reused by pointing the base URL at NemoRouter and supplying a NemoRouter API key.

Public playground — try chat completions interactively

Endpoint

POST https://api.nemorouter.ai/v1/chat/completions

Request Headers

Header	Required	Description
`Authorization`	Yes	`Bearer <NEMOROUTER_API_KEY>`
`Content-Type`	Yes	`application/json`

Request Body

Parameter	Type	Required	Default	Description
`model`	string	Yes	—	The model to use (e.g., `gpt-4o`, `claude-4-sonnet`, `gemini-2.5-pro`)
`messages`	array	Yes	—	Array of message objects with `role` and `content`
`temperature`	number	No	1.0	Sampling temperature between 0 and 2. Lower values are more deterministic.
`max_tokens`	integer	No	Model default	Maximum number of tokens to generate
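`top_p`	number	No	1.0	Nucleus sampling: the model samples only from the smallest set of tokens whose cumulative probability reaches `top_p`. Generally adjust this or `temperature`, not both.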
`n`	integer	No	1	Number of completions to generate
`stream`	boolean	No	false	Whether to stream the response using server-sent events
`stop`	string or array	No	null	Sequences where the model will stop generating
`presence_penalty`	number	No	0	Penalize tokens based on whether they appear in the text so far (-2.0 to 2.0)
`frequency_penalty`	number	No	0	Penalize tokens based on frequency in the text so far (-2.0 to 2.0)
`response_format`	object	No	null	Set to `{"type": "json_object"}` for JSON mode
`seed`	integer	No	null	Seed for deterministic generation (best effort)
`tools`	array	No	null	List of tool/function definitions for function calling
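`tool_choice`	string or object	No	`"auto"` (when `tools` is provided)	Controls whether and which tool the model calls: `"none"`, `"auto"`, or an object naming a specific function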
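The ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch of that idea. `build_chat_request` is a hypothetical helper, not part of any NemoRouter or OpenAI SDK. It omits parameters you leave unset so the API applies its documented defaults.

```python
# Hypothetical helper: builds a Chat Completions request body and checks the
# documented ranges locally. Not part of any NemoRouter or OpenAI SDK.
from typing import Any, Optional


def build_chat_request(
    model: str,
    messages: list[dict[str, Any]],
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    presence_penalty: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    **extra: Any,
) -> dict[str, Any]:
    """Return a JSON-serializable body, omitting parameters left as None."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must contain at least one message")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    for name, value in (("presence_penalty", presence_penalty),
                        ("frequency_penalty", frequency_penalty)):
        if value is not None and not -2.0 <= value <= 2.0:
            raise ValueError(f"{name} must be between -2.0 and 2.0")
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")

    body: dict[str, Any] = {"model": model, "messages": messages}
    optional = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "presence_penalty": presence_penalty,
        "frequency_penalty": frequency_penalty,
        **extra,  # e.g. stream, seed, stop, tools
    }
    # Leaving a parameter out lets the API apply its documented default.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```

For example, `build_chat_request("gpt-4o", [{"role": "user", "content": "Hi"}], temperature=0.7, max_tokens=256)` produces a body with just `model`, `messages`, `temperature`, and `max_tokens`. A value outside the documented range raises `ValueError` before any network call.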

Message Object

Each message in the messages array has the following structure:

Field	Type	Required	Description
`role`	string	Yes	One of `system`, `user`, `assistant`, or `tool`
`content`	string or null	Yes	The message content. May be `null` for an `assistant` message that only carries `tool_calls`.
`name`	string	No	Optional name for the message author
`tool_calls`	array	No	Tool calls made by the assistant (for `assistant` role)
`tool_call_id`	string	No	ID of the tool call being responded to (for `tool` role)
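The rules in the message table can be sketched as a local check. `validate_messages` is a hypothetical helper. It assumes the OpenAI message conventions this endpoint follows:
- `tool` messages need a `tool_call_id`.
- Only `assistant` messages carry `tool_calls`.
- `content` may be null only for an assistant message that carries `tool_calls`.

```python
# Hypothetical client-side check mirroring the message table; assumes
# OpenAI-style message rules. Not part of any NemoRouter SDK.
from typing import Any

VALID_ROLES = {"system", "user", "assistant", "tool"}


def validate_messages(messages: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems found; an empty list means the messages look valid."""
    problems: list[str] = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            problems.append(f"messages[{i}]: unknown role {role!r}")
            continue
        if role == "tool" and not msg.get("tool_call_id"):
            problems.append(f"messages[{i}]: tool messages need tool_call_id")
        if role != "assistant" and msg.get("tool_calls"):
            problems.append(f"messages[{i}]: only assistant messages carry tool_calls")
        # An assistant message that only contains tool_calls may have null content.
        if msg.get("content") is None and not (role == "assistant" and msg.get("tool_calls")):
            problems.append(f"messages[{i}]: content is required")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets you report every malformed message in a long conversation at once.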

NemoRouter Extra Fields

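NemoRouter supports additional request fields for platform-specific features. With the OpenAI Python SDK, pass them in the `extra_body` argument, which merges them into the JSON request body. In raw HTTP requests such as cURL, add them as top-level fields alongside `model` and `messages`: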

Field	Type	Description
`nemo_guardrail_ids`	array	Run only these specific guardrail IDs (overrides org defaults)
`nemo_cache`	boolean	Set to `false` to skip cache for this request
`nemo_prompt_template_id`	string	Apply a saved prompt template
`nemo_prompt_variables`	object	Variables to inject into the prompt template
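With the OpenAI Python SDK you would write something like `client.chat.completions.create(model="gpt-4o", messages=..., extra_body={"nemo_cache": False})`. The SDK adds the `extra_body` keys to the top level of the JSON body. The sketch below builds that merged body with only the standard library, so you can see what actually goes over the wire. `with_nemo_fields` is a hypothetical helper. The template ID and variables are placeholders, not real NemoRouter resources.

```python
import json
from typing import Any


def with_nemo_fields(body: dict[str, Any], **nemo_fields: Any) -> dict[str, Any]:
    """Merge NemoRouter extra fields into a request body as top-level keys,
    which is what the OpenAI Python SDK does with `extra_body`."""
    unknown = [k for k in nemo_fields if not k.startswith("nemo_")]
    if unknown:
        raise ValueError(f"not NemoRouter extra fields: {unknown}")
    return {**body, **nemo_fields}  # returns a new dict; `body` is unchanged


base = {"model": "gpt-4o", "messages": [{"role": "user", "content": "Summarize this ticket."}]}
payload = with_nemo_fields(
    base,
    nemo_cache=False,  # skip the cache for this request
    nemo_prompt_template_id="tmpl_support_summary",  # placeholder template ID
    nemo_prompt_variables={"product": "Router"},  # placeholder variables
)
print(json.dumps(payload, indent=2))
```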

Example Request

cURL

curl https://api.nemorouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $NEMOROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that provides concise answers."
      },
      {
        "role": "user",
        "content": "Explain quantum computing in 3 sentences."
      }
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

Python

import os

from openai import OpenAI

# Read the key from the environment instead of hard-coding it.
client = OpenAI(
    api_key=os.environ["NEMOROUTER_API_KEY"],
    base_url="https://api.nemorouter.ai/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in 3 sentences."},
    ],
    temperature=0.7,
    max_tokens=256,
)

print(response.choices[0].message.content)

Node.js

// Top-level await requires an ES module (e.g. a .mjs file or "type": "module" in package.json).
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.NEMOROUTER_API_KEY,
  baseURL: "https://api.nemorouter.ai/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain quantum computing in 3 sentences." },
  ],
  temperature: 0.7,
  max_tokens: 256,
});

console.log(response.choices[0].message.content);

Response

{
  "id": "chatcmpl-abc123def456",
  "object": "chat.completion",
  "created": 1709251200,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits (qubits) that can exist in multiple states simultaneously through superposition..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 30,
    "completion_tokens": 85,
    "total_tokens": 115
  }
}

Response Fields

Field	Type	Description
`id`	string	Unique identifier for the completion
`object`	string	Always `chat.completion`
`created`	integer	Unix timestamp of creation
`model`	string	The model used for the completion
`choices`	array	Array of completion choices
`choices[].index`	integer	Index of the choice
`choices[].message`	object	The generated message
`choices[].finish_reason`	string	`stop`, `length`, `tool_calls`, or `content_filter`
`usage`	object	Token usage statistics
`usage.prompt_tokens`	integer	Tokens in the prompt
`usage.completion_tokens`	integer	Tokens in the completion
`usage.total_tokens`	integer	Total tokens used
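A short sketch of reading the fields in the table from a parsed response. It uses the example response above, trimmed. `summarize_completion` is a hypothetical helper. Flagging `finish_reason == "length"` is a common way to detect that `max_tokens` cut an answer short.

```python
import json
from typing import Any

# The example response above, trimmed.
raw = """
{
  "id": "chatcmpl-abc123def456",
  "object": "chat.completion",
  "model": "gpt-4o",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Quantum computing uses quantum bits (qubits)..."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 30, "completion_tokens": 85, "total_tokens": 115}
}
"""


def summarize_completion(resp: dict[str, Any]) -> dict[str, Any]:
    """Pull out the fields most callers need from a chat.completion object."""
    choice = resp["choices"][0]
    usage = resp.get("usage", {})
    return {
        "text": choice["message"].get("content"),
        "finish_reason": choice["finish_reason"],
        # "length" means the token limit cut the answer off.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": usage.get("total_tokens", 0),
    }


summary = summarize_completion(json.loads(raw))
print(summary)
```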

NemoRouter Response Headers

NemoRouter adds observability headers to every response:

Header	Description
`x-nemo-org-id`	Your organization ID
`x-nemo-key-alias`	The key alias used for this request
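The OpenAI Python SDK exposes response headers through `client.chat.completions.with_raw_response.create(...)`. Its return value has a `.headers` mapping and a `.parse()` method for the completion itself. The sketch below assumes you already have such a header mapping. It collects the `x-nemo-*` headers case-insensitively, because HTTP header names are case-insensitive. `nemo_headers` is a hypothetical helper, and the header values in the usage example are invented.

```python
from collections.abc import Mapping


def nemo_headers(headers: Mapping[str, str]) -> dict[str, str]:
    """Pick out NemoRouter's x-nemo-* headers, matching names case-insensitively."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower().startswith("x-nemo-")
    }


# Invented values, for illustration only.
print(nemo_headers({
    "X-Nemo-Org-Id": "org_123",
    "Content-Type": "application/json",
    "x-nemo-key-alias": "prod-backend",
}))
```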

Streaming

Set `"stream": true` to receive the response as server-sent events (SSE):

curl https://api.nemorouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $NEMOROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a short poem."}],
    "stream": true
  }'

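Each SSE event is a `data:` line carrying a `chat.completion.chunk` object. The incremental text is in `choices[].delta.content`, and the stream ends with `data: [DONE]`: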

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1709251200,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"In"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1709251200,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":" the"},"finish_reason":null}]}

data: [DONE]
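The OpenAI SDKs parse this stream for you when you pass `stream=True`. If you consume the raw SSE lines yourself, a minimal sketch looks like the following. `collect_stream_text` is a hypothetical helper. It follows only the first choice (`index` 0) and stops at `[DONE]`.

```python
import json
from collections.abc import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta content from Chat Completions SSE lines until [DONE]."""
    parts: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment / keep-alive lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end of stream
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # only follow the first choice when n > 1
            content = (choice.get("delta") or {}).get("content")
            if content:  # the first chunk may carry only the role
                parts.append(content)
    return "".join(parts)
```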

Function Calling

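NemoRouter supports OpenAI-compatible function calling on models that support tools. Define the available functions in `tools`. When the model decides to call one, the response has `finish_reason` set to `tool_calls`, and the requested calls are listed in `choices[0].message.tool_calls`: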

curl https://api.nemorouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $NEMOROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the weather in San Francisco?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City and state, e.g. San Francisco, CA"
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
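The request above only gets the model to *ask* for a tool call. Under OpenAI-compatible semantics, your code runs the function itself and then sends a follow-up request. That request contains the assistant message plus one `tool` message per call, each carrying the matching `tool_call_id`. The sketch below handles one round of this offline:
- `get_weather` is a hypothetical stub that returns fake data.
- The assistant message is hand-written in OpenAI's response shape, with an illustrative call ID.
- `function.arguments` arrives as a JSON string, so it is decoded before the call.

```python
import json
from collections.abc import Callable
from typing import Any


def get_weather(location: str) -> dict[str, Any]:
    # Hypothetical stub; a real implementation would call a weather service.
    return {"location": location, "forecast": "sunny"}


TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def tool_messages(assistant_message: dict[str, Any]) -> list[dict[str, Any]]:
    """Run each requested tool and build the messages to append to the conversation."""
    followups: list[dict[str, Any]] = [assistant_message]  # echo the assistant turn first
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments are a JSON string
        followups.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return followups


# Shape of choices[0].message when finish_reason is "tool_calls" (values illustrative).
assistant = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"location\": \"San Francisco, CA\"}"},
    }],
}

messages = [{"role": "user", "content": "What is the weather in San Francisco?"}]
messages += tool_messages(assistant)
# `messages` is now ready to send back (with the same `tools`) in a second request.
```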

Error Responses

Status	Description
`400 Bad Request`	Invalid request body or unsupported parameter
`401 Unauthorized`	Invalid or missing API key
`402 Payment Required`	Insufficient credit balance
`429 Too Many Requests`	Rate limit exceeded (requests per minute or tokens per minute); retry with backoff
`500 Internal Server Error`	Server-side error — retry with backoff
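A minimal sketch of retrying with exponential backoff and full jitter follows. It rests on these assumptions:
- Only `429` and `500` are retried. `400`, `401`, and `402` are returned at once, because resending the same request will not fix them.
- Any `Retry-After` header is ignored, since this page does not document one.
- `send_with_retries` and the `send` callable are hypothetical. `send` stands in for whatever function issues the request and returns `(status_code, body)`.

```python
import random
import time
from collections.abc import Callable

RETRYABLE = {429, 500}


def send_with_retries(
    send: Callable[[], tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, str]:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE:
            break
        # Full jitter: wait a random time in [0, base_delay * 2^(attempt-1)], capped.
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, delay))
        status, body = send()
    return status, body
```

Randomizing the delay keeps many clients that were rate-limited at the same moment from retrying in lockstep. Injecting `sleep` also makes the function easy to test without waiting.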

Next Steps

Embeddings API — Generate vector embeddings
Models API — List available models
Python SDK — Full Python integration guide
Budget Controls — Set spending limits
