$5 free credits when you sign up
Analytics

See where every dollar goes — and where every millisecond goes.

Deep-dive spend, token, and latency analytics across every model, team, key, and tag. Cost is read directly from provider headers; the dashboard updates in real time.

analytics · last 30 days

Spend overview

gemini-2.5-pro$234.00 · 45%
claude-3.5-sonnet$156.00 · 30%
gpt-4o$89.00 · 17%
gemini-2.5-flash$42.00 · 8%
Total spend$521.00
Δ vs last month+12%
real-timeprovider-pricedledger-parity
Spend visibility
Real-time

Within seconds of completion

Latency reporting
p50 / p95 / p99

Per model, per key, per tag

Cost source
Provider headers

x-nemo-response-cost — never estimated

Ledger drift target
$0.00

Daily reported-vs-ledger parity check

Capabilities

Every metric, one dashboard

Real-time spend by model, team, key, and customer. Tag rollups, time-range filters, latency percentiles, and CSV export — all read directly from provider pricing headers, with no instrumentation in your code.

Spend by model, team, key, customer

See exactly how much each model costs and which slice of your traffic it serves. Compare providers side-by-side, find the models eating the budget, and shift volume in the dashboard — not in a spreadsheet.

  • Per-model spend with month-over-month deltas
  • Per-key spend with budget-cap proximity
  • Per-team spend with budget enforcement
  • Per-customer spend for end-user billing

Spend-by-tag rollups

Pass arbitrary tags on each request — team, project, feature, environment — and group spend by any tag combination. Works with the unmodified OpenAI metadata field.

  • Multi-tag intersection (`team:eng` AND `feature:search`)
  • Tag schema is yours — no fixed dimensions
  • Cardinality enforced server-side to keep dashboards fast
  • Tag groups exportable to CSV with one click

p50 / p95 / p99 latency

Latency percentiles per model, key, tag, and time window. Spot p99 regressions on a specific provider before they hit your users; investigate with single-request drill-down in observability.

  • Time-to-first-token + total latency, both reported
  • Streaming-aware percentiles
  • Per-provider breakdowns — find the slow path
  • Linked to request logs for one-click drill-down

Time-range filtering

Slice every report by hourly, daily, weekly, or monthly windows. Spot usage spikes, compare week-over-week, and chart monthly trends. The same time-range applies across every analytics view.

  • Hourly, daily, weekly, monthly granularity
  • Custom range with calendar picker
  • Time-zone aware — your org timezone, not UTC
  • Persists across the Usage Explorer + Analytics Overview

Token breakdown — input vs output

See prompt vs completion tokens per model. Identify high-input models where prompt-engineering pays off, and high-output models where max-tokens caps move the needle.

  • Prompt + completion token rollups per model
  • Cached-input token tracking where supported
  • Per-key token-usage charts
  • Avg input/output tokens per request, with trend

CSV export — every report

Export any analytics view to CSV with one click. Pipe it into Looker, Tableau, Metabase, or your finance system. Raw per-request logs are available through the observability API for deeper joins.

  • Spend breakdowns, usage rollups, token counts, tag groups
  • Server-side export for large ranges (no row caps in the UI)
  • Schema documented in the OpenAPI spec
  • BI-tool-friendly column names (snake_case)
Cost Integrity

Cost integrity by design — what you see equals what you paid

The number on the dashboard is the number on the provider invoice. Cost flows from `x-nemo-response-cost` into the request log, into the credit ledger, and into the analytics rollup — same value, same UUID, same row.

Provider-priced

Cost is read, never computed — by us or by your code

The Nemo Intelligent Proxy Router owns cost calculation. We read `x-nemo-response-cost` from the response headers and write that exact value to the credit ledger and the analytics rollup. There is no second source of truth, no estimation, no rounding.

  • Single source of truth — `x-nemo-response-cost`
  • Every cost number on the dashboard maps to one ledger row
  • No client-side estimation; no manual price tables
  • Daily reported-vs-ledger parity check targets $0.00 drift
cost-integrity probe · 24h

Reported vs ledger

Requests settled w/ provider cost8 408 / 8 421
Released (errors / timeouts)13
Reported spend$521.00
Credit-ledger debit$521.00
Drift$0.00
Negative-balance attempts0
provider-pricedreserve+settleparity-checked
Latency

Latency percentiles where you can act on them

p50 / p95 / p99

Time-to-first-token and total latency, per dimension

The Usage Explorer reports both time-to-first-token and total request latency at p50/p95/p99 — per model, per key, per tag, per time window. Streaming-aware percentiles mean a streaming response gets a fair p95 number, not an inflated one.

  • Time-to-first-token (TTFT) and total latency tracked separately
  • Per-provider breakdowns — find the slow link
  • Streaming-aware: TTFT is honest, not last-byte
  • Click any percentile to drill into the underlying request log
latency · gemini-2.5-flash · 24h

Percentiles by metric

TTFT · p50180 ms
TTFT · p95420 ms
TTFT · p99780 ms
Total · p50980 ms
Total · p951 240 ms
Total · p992 100 ms
per-modelper-tagstreaming-aware
Dimensions

Pivot every report by tag, time, or dimension

Spend-by-tag

Tag schema is yours — no fixed dimensions

Pass arbitrary tags on each request via the standard metadata field. Group, filter, and pivot in the dashboard by any combination. The same tag schema flows from request → analytics → CSV export.

  • Multi-tag intersection in every chart
  • Cardinality enforced server-side (so dashboards stay fast)
  • Per-tag spend, latency, error-rate rollups
  • Same tags exportable to CSV in every report view
spend-by-tag · last 7d

Top tag groups

team:engineering$312.40
project:chatbot$198.20
feature:search$87.60
env:prod$521.00
customer:cust_456$8.91
Time rangeWeekly
multi-tagpivot-ablecsv-export
The reported-vs-ledger parity check is the chart finance asks for first. $0.00 drift over a 30-day window is what closed our procurement review.

Head of FinOps

Series-C SaaS, ~200 engineers

Cost per Request

Track cost-per-request trends — and act before the bill arrives

Per-request average

Spot regressions when prompts get longer or models get pricier

A rising cost-per-request is the earliest signal that a prompt has bloated, a model has been swapped, or a feature is using more tokens than expected. The dashboard charts the trend and surfaces deltas at the model and tag level.

  • Cost-per-request trend by model and tag
  • Token-per-request trend in the same chart
  • Deltas vs last week / last month
  • Linked to the prompts page so you can A/B-test the fix
cost-per-request · last 7d

Per-request economics

Avg cost / request$0.023
Δ vs last week-8%
Avg input tokens1 280
Avg output tokens340
Cache hit rate12%
trendtoken-decomposedA/B-linked
FAQ

Common analytics questions

Stop reconciling spreadsheets

Real-time spend, latency, and ledger parity — on every plan

Sign up, send your first request, and the analytics dashboard fills in. No instrumentation, no schemas to define, no overnight ETL to wait for.