# Embeddings

Generate vector embeddings for text.

The Embeddings endpoint generates vector representations of text that can be used for semantic search, clustering, classification, and retrieval-augmented generation (RAG).

## Endpoint

`POST https://api.nemorouter.ai/v1/embeddings`

## Request Headers
| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes | `Bearer <NEMOROUTER_API_KEY>` |
| `Content-Type` | Yes | `application/json` |
## Request Body

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `model` | string | Yes | — | The embedding model to use (e.g., `text-embedding-3-small`, `text-embedding-3-large`) |
| `input` | string or array | Yes | — | The text to embed. Can be a single string or an array of strings. |
| `encoding_format` | string | No | `"float"` | Format of the returned embeddings: `"float"` or `"base64"` |
| `dimensions` | integer | No | Model default | The number of dimensions for the output embedding (supported by some models) |
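When `encoding_format` is `"base64"`, each embedding arrives as a base64 string instead of a float array, which is more compact over the wire. A minimal decoding sketch, assuming the bytes are packed as little-endian 32-bit floats (the convention used by OpenAI-compatible embedding APIs):

```python
import base64
import struct

def decode_embedding(b64: str) -> list[float]:
    """Decode a base64-encoded embedding into a list of floats.

    Assumes little-endian float32 packing (4 bytes per value).
    """
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip check with values that are exact in float32
vec = [0.25, -0.5, 1.0]
encoded = base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode()
print(decode_embedding(encoded))  # → [0.25, -0.5, 1.0]
```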
## Example Request

### cURL

```bash
curl https://api.nemorouter.ai/v1/embeddings \
  -H "Authorization: Bearer $NEMOROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "NemoRouter is an enterprise LLM gateway."
  }'
```

### Batch Embeddings
Generate embeddings for multiple texts in a single request:

```bash
curl https://api.nemorouter.ai/v1/embeddings \
  -H "Authorization: Bearer $NEMOROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": [
      "NemoRouter is an enterprise LLM gateway.",
      "One API key for every major LLM provider.",
      "Built-in guardrails and budget controls."
    ]
  }'
```

### Python
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-nemo-your-key-here",
    base_url="https://api.nemorouter.ai/v1",
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="NemoRouter is an enterprise LLM gateway.",
)

embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
```

### Node.js
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.NEMOROUTER_API_KEY,
  baseURL: "https://api.nemorouter.ai/v1",
});

const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: "NemoRouter is an enterprise LLM gateway.",
});

const embedding = response.data[0].embedding;
console.log(`Embedding dimensions: ${embedding.length}`);
```

## Custom Dimensions
Some models (like `text-embedding-3-small` and `text-embedding-3-large`) support reducing the output dimensions:

```python
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="NemoRouter is an enterprise LLM gateway.",
    dimensions=256,  # Reduce from the default 3072 to 256
)
```

Reducing dimensions saves storage and speeds up similarity searches, with a small trade-off in accuracy.
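For models that do not accept the `dimensions` parameter, a common fallback is to truncate the vector client-side and re-normalize it to unit length so dot products still behave as cosine similarities. A sketch of that approach (not NemoRouter-specific):

```python
import numpy as np

def truncate_embedding(vec, dims):
    """Keep the first `dims` components of an embedding and re-normalize
    to unit length so dot products remain valid cosine similarities."""
    v = np.asarray(vec, dtype=np.float32)[:dims]
    return v / np.linalg.norm(v)

# Demo with a synthetic 3072-dim vector standing in for a real embedding
full = np.random.default_rng(0).normal(size=3072)
short = truncate_embedding(full, 256)
print(short.shape)  # → (256,), with norm ≈ 1.0
```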
## Response

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.006929283,
        -0.005336422,
        0.024047505,
        -0.003570588,
        0.013664783
      ]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```

## Response Fields
| Field | Type | Description |
|---|---|---|
| `object` | string | Always `"list"` |
| `data` | array | Array of embedding objects |
| `data[].object` | string | Always `"embedding"` |
| `data[].index` | integer | Index of the input text this embedding corresponds to |
| `data[].embedding` | array | The embedding vector (array of floats) |
| `model` | string | The model used |
| `usage` | object | Token usage statistics |
| `usage.prompt_tokens` | integer | Number of tokens in the input |
| `usage.total_tokens` | integer | Total tokens processed |
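Because each item in `data` carries an `index`, batch results can always be matched back to their inputs by that field rather than by list position. A sketch using plain dicts in place of SDK response objects (results are typically returned in order, but the explicit sort makes the mapping robust):

```python
inputs = ["doc a", "doc b", "doc c"]

# Simulated response items, deliberately out of order
data = [
    {"index": 2, "embedding": [0.3]},
    {"index": 0, "embedding": [0.1]},
    {"index": 1, "embedding": [0.2]},
]

# Sort by index so position i corresponds to inputs[i]
ordered = sorted(data, key=lambda item: item["index"])
pairs = {inputs[item["index"]]: item["embedding"] for item in ordered}
print(pairs["doc b"])  # → [0.2]
```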
## Available Embedding Models

| Model | Provider | Dimensions | Description |
|---|---|---|---|
| `text-embedding-3-small` | OpenAI | 1536 | Fast, cost-effective embeddings |
| `text-embedding-3-large` | OpenAI | 3072 | Higher accuracy, supports custom dimensions |
| `text-embedding-ada-002` | OpenAI | 1536 | Legacy model, still widely used |
NemoRouter supports additional embedding models from other providers. Use the Models API to list all available embedding models.
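A small helper can pick the embedding models out of a model listing. The name-based filter below is an assumption for illustration (it presumes embedding model IDs contain the substring `"embedding"`), not a guarantee of the API:

```python
def filter_embedding_models(model_ids):
    """Return the subset of model IDs that look like embedding models.

    Assumes (illustratively) that embedding model IDs contain "embedding";
    check the Models API metadata for an authoritative answer.
    """
    return [m for m in model_ids if "embedding" in m]

ids = ["gpt-4o", "text-embedding-3-small", "text-embedding-3-large"]
print(filter_embedding_models(ids))  # → ['text-embedding-3-small', 'text-embedding-3-large']
```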
## Use Cases

### Semantic Search

Generate embeddings for your document corpus and query to find semantically similar content:
```python
import numpy as np

# Embed your documents
docs = [
    "NemoRouter provides a unified LLM gateway.",
    "API keys are created from the dashboard.",
    "Guardrails protect against harmful content.",
]
doc_embeddings = client.embeddings.create(
    model="text-embedding-3-small",
    input=docs,
)

# Embed the search query
query_embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I create an API key?",
)

# Compare with cosine similarity
query_vec = np.array(query_embedding.data[0].embedding)
for i, doc_emb in enumerate(doc_embeddings.data):
    doc_vec = np.array(doc_emb.embedding)
    similarity = np.dot(query_vec, doc_vec) / (
        np.linalg.norm(query_vec) * np.linalg.norm(doc_vec)
    )
    print(f"Doc {i}: {similarity:.4f} — {docs[i]}")
```

### RAG (Retrieval-Augmented Generation)
Combine embeddings with chat completions for grounded responses:

```python
# 1. Embed the query
query = "How do budget controls work?"
query_emb = client.embeddings.create(
    model="text-embedding-3-small",
    input=query,
)

# 2. Retrieve relevant documents (from your vector store)
relevant_docs = search_vector_store(query_emb.data[0].embedding)

# 3. Generate a grounded response
context = "\n".join(relevant_docs)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"Answer based on this context:\n{context}"},
        {"role": "user", "content": query},
    ],
)
```

## Error Responses
| Status | Description |
|---|---|
| `400 Bad Request` | Invalid input or unsupported model |
| `401 Unauthorized` | Invalid or missing API key |
| `402 Payment Required` | Insufficient credit balance |
| `429 Too Many Requests` | Rate limit exceeded |
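A `429` usually clears after a short wait, so clients commonly retry with exponential backoff. A generic sketch of that pattern (the `RateLimited` exception and the flaky demo function are illustrative stand-ins, not part of any SDK):

```python
import time

class RateLimited(Exception):
    """Stand-in for an SDK rate-limit error (illustrative only)."""

def with_backoff(fn, max_attempts=4, base_delay=0.01):
    """Call fn(), retrying on RateLimited with exponential backoff.

    Waits base_delay * 2**attempt between attempts; re-raises after
    the final attempt fails.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Demo: fails twice with a simulated 429, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited()
    return "ok"

result = with_backoff(flaky)
print(result)  # → ok
```

In production, cap the total retry budget and surface the error to the caller once it is exhausted.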
## Next Steps
- Chat Completions API — Send messages and get AI responses
- Models API — List all available models
- Python SDK — Full Python integration guide