v1.7 — Model catalog with leaderboard rankings
Browse every registered model on one page. Sort by latency, cost, or quality. Switch models with a one-line env var change — your code stays exactly the same.
You shouldn't need to read a provider's release notes to know which model is fastest today. v1.7 ships the model catalog — every model we route to, on one searchable, sortable page.
What's on the page
/{org}/models lists every model registered in the router right now. For each model, the page shows:
- Provider (Google Vertex AI, OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, …)
- Mode (chat / embedding / image / video / audio / rerank / OCR)
- Context window + max output tokens
- Price per 1M input / output tokens (cache-discounted variants shown separately)
- Live p50 / p95 latency measured from real traffic — not synthetic benchmarks
- Quality rank for chat models, scored against a fixed eval set
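To turn the per-1M-token prices in the list above into a per-request estimate, the arithmetic looks roughly like this. It's a minimal sketch: the Pricing class, the estimate_cost helper, and the prices are made up for illustration and are not real catalog values or part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per 1M tokens (placeholder values, not real prices)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (
        input_tokens / 1_000_000 * pricing.input_per_million
        + output_tokens / 1_000_000 * pricing.output_per_million
    )


# Hypothetical prices, chosen only to make the arithmetic easy to follow.
example = Pricing(input_per_million=0.50, output_per_million=2.00)
cost = estimate_cost(example, input_tokens=200_000, output_tokens=50_000)
print(f"${cost:.4f}")  # $0.2000
```

Cache-discounted variants would need their own rate; this sketch ignores them.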
Sort by any column, filter by capability (vision, code, long-context, structured-output), or search by name. Click any model to see the JSON that /v1/models returns for it, so what your SDK looks up matches what the dashboard shows.
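If you'd rather do the same filtering and sorting in a script, the logic is small. The sketch below assumes you already have catalog rows as dictionaries; the field names (id, mode, capabilities, p95_ms) and all values are hypothetical and may not match the actual /v1/models payload.

```python
# Hypothetical catalog rows; field names and numbers are illustrative only.
CATALOG = [
    {"id": "model-a", "mode": "chat", "capabilities": ["vision", "code"], "p95_ms": 900},
    {"id": "model-b", "mode": "chat", "capabilities": ["code"], "p95_ms": 400},
    {"id": "model-c", "mode": "embedding", "capabilities": [], "p95_ms": 120},
]


def filter_and_sort(rows, capability, sort_key):
    """Keep rows that advertise `capability`, ordered ascending by `sort_key`."""
    matching = [row for row in rows if capability in row["capabilities"]]
    return sorted(matching, key=lambda row: row[sort_key])


fastest_code_models = filter_and_sort(CATALOG, capability="code", sort_key="p95_ms")
print([row["id"] for row in fastest_code_models])  # ['model-b', 'model-a']
```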
ROUTER_STATS as source of truth
The catalog reads from a generated ROUTER_STATS constant, the same constant the landing page hero uses to render "18 models live". When we add a model, we update one place and every surface reflects it automatically. No hardcoded "18" anywhere. (We've audited this; see feedback_litellm_live_default.md.)
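The pattern is roughly the following. This is a sketch of the idea only: the shape of ROUTER_STATS shown here and the hero_text and catalog_rows helpers are assumptions for illustration, not the actual generated code.

```python
# Hypothetical shape for the generated constant; the real ROUTER_STATS may differ.
ROUTER_STATS = {
    "models": ["model-a", "model-b", "model-c"],
}


def hero_text(stats):
    """Landing page copy, derived from the constant instead of a hardcoded count."""
    return f"{len(stats['models'])} models live"


def catalog_rows(stats):
    """Catalog listing, read from the same constant as the hero copy."""
    return sorted(stats["models"])


print(hero_text(ROUTER_STATS))     # 3 models live
print(catalog_rows(ROUTER_STATS))  # ['model-a', 'model-b', 'model-c']
```

Because both functions read the same constant, adding a model to ROUTER_STATS changes the count and the listing together.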
Switching models
You don't need new code to try a new model. Your SDK call already takes a model field, so switching means changing that one value:
# Was:
response = client.chat.completions.create(model="gemini-2.5-flash", ...)
# Now try:
response = client.chat.completions.create(model="gemini-2.5-pro", ...)
Same key, same endpoint, same billing flow. The catalog tells you what's available; your code chooses.
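To make the switch a one-line env var change with no code edits at all, one option is to read the model name from the environment. This is a minimal sketch: the variable name ROUTER_MODEL and the resolve_model helper are hypothetical, so pick whatever name fits your deployment.

```python
import os

# Fallback used when the (hypothetical) ROUTER_MODEL variable is unset or blank.
DEFAULT_MODEL = "gemini-2.5-flash"


def resolve_model(env=None) -> str:
    """Return the model name from the environment, or the default."""
    env = os.environ if env is None else env
    return env.get("ROUTER_MODEL", "").strip() or DEFAULT_MODEL


# In your existing call, pass the resolved name instead of a literal:
# response = client.chat.completions.create(model=resolve_model(), ...)
print(resolve_model({"ROUTER_MODEL": "gemini-2.5-pro"}))  # gemini-2.5-pro
print(resolve_model({}))                                  # gemini-2.5-flash
```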
Public listing
A public, unauthenticated copy lives at /models — useful for procurement reviews, RFPs, and the "do you support X model" question that lands in every sales call.