# Embeddings

Generate vector embeddings for text.

The Embeddings endpoint generates vector representations of text that can be used for semantic search, clustering, classification, and retrieval-augmented generation (RAG).

## Endpoint

`POST https://api.nemorouter.ai/v1/embeddings`

## Request Headers
| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes | `Bearer <NEMOROUTER_API_KEY>` |
| `Content-Type` | Yes | `application/json` |
## Request Body

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `model` | string | Yes | — | The embedding model to use (e.g., `text-embedding-3-small`, `text-embedding-3-large`) |
| `input` | string or array | Yes | — | The text to embed. Can be a single string or an array of strings. |
| `encoding_format` | string | No | `"float"` | Format of the returned embeddings: `"float"` or `"base64"` |
| `dimensions` | integer | No | Model default | The number of dimensions for the output embedding (supported by some models) |
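When `encoding_format` is `"base64"`, each embedding arrives as a base64 string instead of a float array, which is more compact over the wire. A minimal decoding sketch, assuming the bytes are packed as little-endian 32-bit floats (the convention used by OpenAI-compatible embedding APIs):

```python
import base64
import struct

def decode_embedding(b64: str) -> list[float]:
    """Decode a base64-encoded embedding into a list of floats.

    Assumes little-endian float32 packing (4 bytes per value).
    """
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip check with values that are exact in float32
vec = [0.25, -0.5, 1.0]
encoded = base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode()
print(decode_embedding(encoded))  # → [0.25, -0.5, 1.0]
```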
## Example Request

### cURL

```bash
curl https://api.nemorouter.ai/v1/embeddings \
  -H "Authorization: Bearer $NEMOROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "NemoRouter is an enterprise LLM gateway."
  }'
```

### Batch Embeddings
Generate embeddings for multiple texts in a single request:

```bash
curl https://api.nemorouter.ai/v1/embeddings \
  -H "Authorization: Bearer $NEMOROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": [
      "NemoRouter is an enterprise LLM gateway.",
      "One API key for every major LLM provider.",
      "Built-in guardrails and budget controls."
    ]
  }'
```

### Python
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-nemo-your-key-here",
    base_url="https://api.nemorouter.ai/v1",
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="NemoRouter is an enterprise LLM gateway.",
)

embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
```

### Node.js
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.NEMOROUTER_API_KEY,
  baseURL: "https://api.nemorouter.ai/v1",
});

const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: "NemoRouter is an enterprise LLM gateway.",
});

const embedding = response.data[0].embedding;
console.log(`Embedding dimensions: ${embedding.length}`);
```

## Custom Dimensions
Some models (like `text-embedding-3-small` and `text-embedding-3-large`) support reducing the output dimensions:

```python
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="NemoRouter is an enterprise LLM gateway.",
    dimensions=256,  # Reduce from the default 3072 to 256
)
```

Reducing dimensions saves storage and speeds up similarity searches, with a small trade-off in accuracy.
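For models that do not accept the `dimensions` parameter, a common fallback is to truncate the vector client-side and re-normalize it to unit length so dot products still behave as cosine similarities. A sketch of that approach (not NemoRouter-specific):

```python
import numpy as np

def truncate_embedding(vec, dims):
    """Keep the first `dims` components of an embedding and re-normalize
    to unit length so dot products remain valid cosine similarities."""
    v = np.asarray(vec, dtype=np.float32)[:dims]
    return v / np.linalg.norm(v)

# Demo with a synthetic 3072-dim vector standing in for a real embedding
full = np.random.default_rng(0).normal(size=3072)
short = truncate_embedding(full, 256)
print(short.shape)  # → (256,), with norm ≈ 1.0
```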
## Response

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.006929283,
        -0.005336422,
        0.024047505,
        -0.003570588,
        0.013664783
      ]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```

## Response Fields
| Field | Type | Description |
|---|---|---|
| `object` | string | Always `"list"` |
| `data` | array | Array of embedding objects |
| `data[].object` | string | Always `"embedding"` |
| `data[].index` | integer | Index of the input text this embedding corresponds to |
| `data[].embedding` | array | The embedding vector (array of floats) |
| `model` | string | The model used |
| `usage` | object | Token usage statistics |
| `usage.prompt_tokens` | integer | Number of tokens in the input |
| `usage.total_tokens` | integer | Total tokens processed |
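Because each item in `data` carries an `index`, batch results can always be matched back to their inputs by that field rather than by list position. A sketch using plain dicts in place of SDK response objects (results are typically returned in order, but the explicit sort makes the mapping robust):

```python
inputs = ["doc a", "doc b", "doc c"]

# Simulated response items, deliberately out of order
data = [
    {"index": 2, "embedding": [0.3]},
    {"index": 0, "embedding": [0.1]},
    {"index": 1, "embedding": [0.2]},
]

# Sort by index so position i corresponds to inputs[i]
ordered = sorted(data, key=lambda item: item["index"])
pairs = {inputs[item["index"]]: item["embedding"] for item in ordered}
print(pairs["doc b"])  # → [0.2]
```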
## Available Embedding Models

| Model | Provider | Dimensions | Description |
|---|---|---|---|
| `text-embedding-3-small` | OpenAI | 1536 | Fast, cost-effective embeddings |
| `text-embedding-3-large` | OpenAI | 3072 | Higher accuracy, supports custom dimensions |
| `text-embedding-ada-002` | OpenAI | 1536 | Legacy model, still widely used |
NemoRouter supports additional embedding models from other providers. Use the Models API to list all available embedding models.
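A small helper can pick the embedding models out of a model listing. The name-based filter below is an assumption for illustration (it presumes embedding model IDs contain the substring `"embedding"`), not a guarantee of the API:

```python
def filter_embedding_models(model_ids):
    """Return the subset of model IDs that look like embedding models.

    Assumes (illustratively) that embedding model IDs contain "embedding";
    check the Models API metadata for an authoritative answer.
    """
    return [m for m in model_ids if "embedding" in m]

ids = ["gpt-4o", "text-embedding-3-small", "text-embedding-3-large"]
print(filter_embedding_models(ids))  # → ['text-embedding-3-small', 'text-embedding-3-large']
```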
## Use Cases

### Semantic Search

Generate embeddings for your document corpus and query to find semantically similar content:
```python
import numpy as np

# Embed your documents
docs = [
    "NemoRouter provides a unified LLM gateway.",
    "API keys are created from the dashboard.",
    "Guardrails protect against harmful content.",
]
doc_embeddings = client.embeddings.create(
    model="text-embedding-3-small",
    input=docs,
)

# Embed the search query
query_embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I create an API key?",
)

# Compare with cosine similarity
query_vec = np.array(query_embedding.data[0].embedding)
for i, doc_emb in enumerate(doc_embeddings.data):
    doc_vec = np.array(doc_emb.embedding)
    similarity = np.dot(query_vec, doc_vec) / (
        np.linalg.norm(query_vec) * np.linalg.norm(doc_vec)
    )
    print(f"Doc {i}: {similarity:.4f} — {docs[i]}")
```

### RAG (Retrieval-Augmented Generation)
Combine embeddings with chat completions for grounded responses:

```python
# 1. Embed the query
query = "How do budget controls work?"
query_emb = client.embeddings.create(
    model="text-embedding-3-small",
    input=query,
)

# 2. Retrieve relevant documents (from your vector store)
relevant_docs = search_vector_store(query_emb.data[0].embedding)

# 3. Generate a grounded response
context = "\n".join(relevant_docs)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"Answer based on this context:\n{context}"},
        {"role": "user", "content": query},
    ],
)
```

## Error Responses
| Status | Description |
|---|---|
| `400 Bad Request` | Invalid input or unsupported model |
| `401 Unauthorized` | Invalid or missing API key |
| `402 Payment Required` | Insufficient credit balance |
| `429 Too Many Requests` | Rate limit exceeded |
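A `429` usually clears after a short wait, so clients commonly retry with exponential backoff. A generic sketch of that pattern (the `RateLimited` exception and the flaky demo function are illustrative stand-ins, not part of any SDK):

```python
import time

class RateLimited(Exception):
    """Stand-in for an SDK rate-limit error (illustrative only)."""

def with_backoff(fn, max_attempts=4, base_delay=0.01):
    """Call fn(), retrying on RateLimited with exponential backoff.

    Waits base_delay * 2**attempt between attempts; re-raises after
    the final attempt fails.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Demo: fails twice with a simulated 429, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited()
    return "ok"

result = with_backoff(flaky)
print(result)  # → ok
```

In production, cap the total retry budget and surface the error to the caller once it is exhausted.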
## Next Steps
- Chat Completions API — Send messages and get AI responses
- Models API — List all available models
- Python SDK — Full Python integration guide