Why We Built on LiteLLM Open Source Instead of Forking
Engineering

How a pinned-version dependency and a separate Nemo Backend give us zero merge conflicts, instant upstream updates, and clean service boundaries.

Nemo Team · 7 min read
engineering · open-source · architecture

The Fork Temptation

When you build a product on top of an open-source project, the easiest path feels like forking. You clone the repo, start modifying files, and within a week you have custom features that work exactly how you want. The problems start in week two.

The upstream project ships a security patch. You need it, but your fork has diverged. You spend a day resolving merge conflicts. Next month, a major feature release drops — new provider support, performance improvements, bug fixes. Merging becomes a multi-day effort. Within six months, your fork is effectively a separate project that shares history with the original but cannot easily absorb its improvements.

We have seen this pattern destroy engineering velocity at multiple companies. That is why NemoRouter takes a fundamentally different approach.

The Read-Only Dependency

LiteLLM is a pinned-version dependency, declared in our backend's pyproject.toml and locked to an exact release. The rule is absolute: we never fork, never patch, never modify upstream source. It is an ordinary dependency, treated like any other library we pull from a package registry.

Bumping to a new upstream release is a single command:

poetry add 'litellm@X.Y.Z'

Because we have zero local modifications, there are zero merge conflicts. Every upstream release — security patches, new providers, performance improvements — lands cleanly. We regenerate our TypeScript types from the updated schema and move on.

For day-to-day development, we keep a read-only reference clone of the upstream source outside the main repository so engineers can grep and read the code locally, but nothing in that clone is ever committed or installed as an editable package.

Where Our Code Lives

If we cannot modify LiteLLM, where do our enterprise features go? Into a separate service that sits in the request path in front of LiteLLM.

Frontend (:3001) --> Nemo Backend (:8090, LiteLLM imported as a library) --> Providers

The Nemo Backend is a FastAPI service that intercepts every request. Before handing off to LiteLLM, it applies organization-level configuration: guardrails, prompt templates, A/B test routing, credit reservations, and cost tracking metadata. After LiteLLM returns a response, Nemo Backend settles the credit reservation and adds observability headers.
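The reserve-call-settle flow around LiteLLM can be sketched roughly as follows. This is a sketch in TypeScript for consistency with the other snippet in this post (the real Nemo Backend is Python/FastAPI), and the helper names (reserve, settle, release, callLiteLLM) are illustrative, not NemoRouter's actual API:

```typescript
// Shape of a downstream call that returns LiteLLM-style headers.
type Handler = (payload: object) => Promise<{
  headers: Record<string, string>;
  body: object;
}>;

interface CreditLedger {
  reserve(orgId: string): string;                       // hold credits, return reservation id
  settle(orgId: string, res: string, cost: number): void;
  release(orgId: string, res: string): void;            // free the hold on failure
}

async function handleCompletion(
  orgId: string,
  payload: object,
  callLiteLLM: Handler,
  credits: CreditLedger,
) {
  // Pre-processing: reserve credits before LiteLLM sees the request.
  const reservation = credits.reserve(orgId);
  try {
    // Hand off to LiteLLM, which does routing, load balancing, failover.
    const response = await callLiteLLM(payload);
    // Post-processing: settle against the cost LiteLLM computed.
    const cost = parseFloat(response.headers["x-litellm-response-cost"]);
    credits.settle(orgId, reservation, cost);
    return response;
  } catch (err) {
    credits.release(orgId, reservation);
    throw err;
  }
}
```

The key property is that LiteLLM sits entirely inside the `callLiteLLM` boundary: everything enterprise-specific happens before or after that call, never inside it.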

This separation means LiteLLM does what it does best — provider routing, load balancing, and failover — while our service handles everything enterprise: multi-tenancy, billing, safety, and prompt management.

Sacred Fields and API Contracts

Building on top of LiteLLM means respecting its data model. We treat certain LiteLLM field names as sacred: model_group, custom_llm_provider, prompt_tokens, spend. Our frontend types must match exactly. When upstream changes a schema, we adapt our types — never the other way around.
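For illustration, a frontend type that mirrors those field names might look like this. This is a minimal sketch, not our actual generated types, which cover far more fields:

```typescript
// Field names are LiteLLM's, reproduced verbatim: we adapt to
// upstream's schema, never rename on our side.
interface SpendLogRow {
  model_group: string;          // LiteLLM's routing group name
  custom_llm_provider: string;  // e.g. "openai", "anthropic"
  prompt_tokens: number;
  spend: number;                // cost as computed by LiteLLM
}
```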

The same principle applies to cost tracking. LiteLLM computes request cost and returns it in the x-litellm-response-cost header. We read that value. We never compute cost independently because divergent cost calculations would create reconciliation nightmares.

// We read LiteLLM's cost, never compute our own
// (fall back to "0" if the header is ever absent, to avoid settling NaN)
const cost = parseFloat(response.headers["x-litellm-response-cost"] ?? "0");
await settleCredits(orgId, reservationId, cost);

The Same-UUID Architecture

Integration between services is simplified by a deliberate architectural choice: the organization UUID is the same everywhere. When a user signs up and an organization is created in Supabase, that exact UUID becomes the organization_id in LiteLLM. No mapping tables, no sync jobs, no entity translation layer.

This means a single query can join data across both schemas because they share the same database with the same identifier. It eliminates an entire class of bugs around identity resolution and cross-service consistency.

Lessons Learned

After months of operating this architecture, several patterns have proven their value:

Upstream sync is a non-event. We pull upstream changes weekly. It takes minutes, not days. New provider support and bug fixes arrive automatically.

Service boundaries prevent scope creep. When a feature request arrives, the question is always: does this belong in the proxy layer or in LiteLLM? If LiteLLM already handles it, we configure it rather than reimplementing it.

Read-only discipline requires enforcement. Our backend depends on LiteLLM through a pinned package version, not a fork or a subtree. The only way to change the LiteLLM version is to bump the pin, which makes accidental patching impossible.

Type generation bridges the gap. After every upstream bump, we regenerate TypeScript types from the LiteLLM database schema. This keeps the frontend in lockstep with whatever LiteLLM ships, without manual type maintenance.

When to Fork, When Not To

Forking makes sense when you are building something fundamentally different from the upstream project. If you need to change core routing logic, rewrite the provider abstraction, or alter the data model, a fork is honest.

But if your value-add is a layer on top — multi-tenancy, billing, safety, observability — a read-only dependency with a separate service is almost always the better choice. You get upstream improvements for free, your code is cleanly separated, and your engineering team spends time building product instead of resolving merge conflicts.

Written by Nemo Team. Engineering, product, and company posts from the NemoRouter team — code-first, cost-honest, no vendor-marketing fluff.