Engineering

Mounting LiteLLM In-Process: One Fewer Network Hop

Running LiteLLM as a separate service adds a network hop, a port to harden, and a process to operate. Here is why NemoRouter mounts it in-process as an ASGI app — and the trade-off that came with it.

Nemo Team · 9 min read

For a while, NemoRouter ran LiteLLM the way most people do: as a separate service on its own port, with our backend calling it over HTTP. It worked. It also meant every LLM request crossed an extra network boundary, we operated two processes instead of one, and there was a port on the network whose only job was to talk to the process right next to it. In 2026 we mounted LiteLLM in-process as an ASGI app inside the backend. This post is why, and the honest trade-off it introduced.

The separate-service shape

BEFORE
client → Nemo Backend (:8090) ──HTTP──► LiteLLM (:4000) → providers

                          a network hop between two processes
                          on the same host, every request

The backend did auth, credit reservation, and guardrails, then made an HTTP call to LiteLLM on :4000 for the actual provider dispatch. That hop bought us nothing — both processes lived on the same host — but cost us latency, a second process to deploy and monitor, and an extra listening port to secure.
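The post doesn't show the old handoff code. This stdlib-only sketch reproduces the shape of the BEFORE hop on loopback, so the per-request cost is concrete. `FakeProxyHandler`, `forward_over_http`, `run_once`, and the `/chat/completions` path are hypothetical, and an OS-assigned port stands in for :4000.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeProxyHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the separate LiteLLM service (":4000" in the diagram)."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        reply = json.dumps({"model": payload.get("model"), "dispatched": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, format, *args):
        pass  # silence per-request logging in the sketch


def forward_over_http(port: int, payload: dict) -> dict:
    """The BEFORE handoff: serialize the request, cross a loopback socket, parse the reply."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/chat/completions",  # illustrative path
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read())


def run_once(payload: dict) -> dict:
    """Start the fake service on an OS-assigned port, make one hop, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), FakeProxyHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return forward_over_http(server.server_address[1], payload)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `run_once({"model": "example-model"})` returns `{"model": "example-model", "dispatched": True}`. It gets there only after JSON encoding, a TCP connect, HTTP parsing on both sides, and JSON decoding, and that work repeats on every request between two processes on the same host.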

The in-process shape

AFTER
client → Nemo Backend (:8090, LiteLLM mounted as ASGI sub-app) → providers

                  handoff is a function call, not a network request

LiteLLM is a pinned PyPI dependency, and we mount its ASGI app inside our FastAPI backend. The handoff from our pre-flight (auth + reserve credits + guardrails) to LiteLLM's dispatch is now an in-process ASGI call, not an HTTP round trip. There is no :4000 anymore.
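The post doesn't show the mount itself. In a FastAPI backend, attaching another ASGI app under a path prefix is done with Starlette's `app.mount(...)`. The stdlib-only sketch below spells out the idea in simplified form so the "function call, not a network request" claim is visible. `litellm_stub`, `mount`, `call_in_process`, and the `/llm` prefix are hypothetical. A real Starlette mount also handles websocket scopes, and its exact `path`/`root_path` handling varies by version.

```python
import asyncio
import json


async def litellm_stub(scope, receive, send):
    """Hypothetical stand-in for the mounted LiteLLM ASGI app: echoes path and model."""
    body = b""
    while True:
        message = await receive()
        body += message.get("body", b"")
        if not message.get("more_body", False):
            break
    payload = json.loads(body or b"{}")
    reply = json.dumps({"path": scope["path"], "model": payload.get("model")}).encode()
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"application/json")]})
    await send({"type": "http.response.body", "body": reply})


async def not_found(scope, receive, send):
    await send({"type": "http.response.start", "status": 404, "headers": []})
    await send({"type": "http.response.body", "body": b"not found"})


def mount(prefix, sub_app, fallback):
    """Simplified prefix mount: matching requests are handed to sub_app by awaiting it."""
    async def app(scope, receive, send):
        path = scope.get("path", "")
        if scope["type"] == "http" and (path == prefix or path.startswith(prefix + "/")):
            child = dict(scope)  # shallow copy so the parent scope is untouched
            child["root_path"] = scope.get("root_path", "") + prefix
            child["path"] = path[len(prefix):] or "/"
            await sub_app(child, receive, send)  # the whole handoff: one await, no socket
        else:
            await fallback(scope, receive, send)
    return app


backend = mount("/llm", litellm_stub, not_found)  # "/llm" prefix is hypothetical


async def call_in_process(app, path, payload):
    """Drive an ASGI app directly with an in-memory receive/send pair."""
    scope = {"type": "http", "method": "POST", "path": path, "root_path": "",
             "headers": [(b"content-type", b"application/json")]}
    request_body = json.dumps(payload).encode()
    sent = []

    async def receive():
        return {"type": "http.request", "body": request_body, "more_body": False}

    async def send(message):
        sent.append(message)

    await app(scope, receive, send)
    status = sent[0]["status"]
    body = b"".join(m.get("body", b"") for m in sent[1:])
    return status, body


def request_in_process(path, payload):
    """Synchronous convenience wrapper for trying the sketch."""
    return asyncio.run(call_in_process(backend, path, payload))
```

`request_in_process("/llm/chat/completions", {"model": "m"})` returns status 200, and the sub-app sees the path `/chat/completions`. Nothing between the two apps was written to a socket. Paths outside the prefix, including look-alikes such as `/llmx`, fall through to the 404 handler.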

What we gained

                    Separate service             In-process mount
Handoff             HTTP round trip              ASGI function call
Processes to run    2                            1
Ports to harden     :8090 + :4000                :8090
Deploy units        2 (version skew possible)    1 (always in sync)
Latency             + hop per request            none added

The deploy-units point is subtle but real: two services can drift to incompatible versions; one process can't disagree with itself. Mounting the dependency means the gateway and the proxy ship as a single, always-consistent unit.

The backend is still always in the path

In-process doesn't mean the frontend talks to LiteLLM. Every request still enters through the Nemo Backend, which does auth, credit reservation, and guardrails before handing off to the mounted LiteLLM. The architecture boundary is unchanged — we removed a network hop, not a control point.
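The post lists the pre-flight steps but not their code. As a sketch under stated assumptions, the wrapper below puts auth, credit reservation, and a guardrail in front of an ASGI app so that no request reaches the dispatch stub without passing all three. The in-memory `VALID_KEYS`/`BALANCES` stores, the one-credit reservation, the keyword guardrail, and every function name are hypothetical.

```python
import asyncio
import json

# Hypothetical pre-flight state (in-memory for the sketch; real systems use a database).
VALID_KEYS = {"sk-virtual-123": "team-a"}
BALANCES = {"team-a": 10}
BLOCKED_TERMS = ("ignore previous instructions",)


class Rejected(Exception):
    def __init__(self, status, reason):
        super().__init__(reason)
        self.status = status
        self.reason = reason


def preflight(headers, body):
    """Auth, then reserve a credit, then guardrails; raise Rejected to stop the request."""
    auth = headers.get(b"authorization", b"").decode()
    team = VALID_KEYS.get(auth.removeprefix("Bearer "))
    if team is None:
        raise Rejected(401, "invalid key")
    if BALANCES[team] < 1:
        raise Rejected(402, "insufficient credits")
    BALANCES[team] -= 1  # reserve one credit up front (settlement after dispatch not shown)
    prompt = json.loads(body or b"{}").get("prompt", "").lower()
    if any(term in prompt for term in BLOCKED_TERMS):
        BALANCES[team] += 1  # release the reservation when a guardrail blocks
        raise Rejected(400, "blocked by guardrail")
    return team


def guarded(inner):
    """Wrap an ASGI app so nothing reaches it without passing pre-flight."""
    async def app(scope, receive, send):
        chunks = []
        while True:
            message = await receive()
            chunks.append(message.get("body", b""))
            if not message.get("more_body", False):
                break
        body = b"".join(chunks)
        try:
            preflight(dict(scope.get("headers", [])), body)
        except Rejected as exc:
            await send({"type": "http.response.start", "status": exc.status, "headers": []})
            await send({"type": "http.response.body", "body": exc.reason.encode()})
            return
        replayed = False

        async def replay():
            # Hand the already-read body to the inner app exactly once.
            nonlocal replayed
            if not replayed:
                replayed = True
                return {"type": "http.request", "body": body, "more_body": False}
            return await receive()

        await inner(scope, replay, send)
    return app


async def dispatch_stub(scope, receive, send):
    """Hypothetical stand-in for the mounted LiteLLM dispatch."""
    await receive()
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"dispatched"})


backend = guarded(dispatch_stub)


async def call(app, key, prompt):
    scope = {"type": "http", "method": "POST", "path": "/chat/completions",
             "headers": [(b"authorization", f"Bearer {key}".encode())]}
    body = json.dumps({"prompt": prompt}).encode()
    sent = []

    async def receive():
        return {"type": "http.request", "body": body, "more_body": False}

    async def send(message):
        sent.append(message)

    await app(scope, receive, send)
    return sent[0]["status"]


def send_request(key, prompt):
    """Synchronous convenience wrapper for trying the sketch."""
    return asyncio.run(call(backend, key, prompt))
```

`send_request("sk-virtual-123", "hello")` returns 200 and leaves one credit reserved. A bad key returns 401 before any credit is touched, and a guardrail hit returns 400 after releasing its reservation. Because the wrapper and the mounted app share a process, the control point stays in front of dispatch without a network hop between them.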

The honest trade-off: the master key

Mounting LiteLLM in-process meant the backend process now holds the LITELLM_MASTER_KEY, because the mounted LiteLLM reads it at startup for admin-token validation and encryption salts. The master key therefore lives in the backend's environment, where before it lived only behind the separate service. That is a real expansion of blast radius: compromise the backend and the master key is in scope.

We accepted it deliberately, with an offset. We removed an entire service process to harden (:4000 is gone), shrinking the attack surface in one dimension while the master key grew it in another. And the thing that actually matters — customer LLM traffic — still uses only virtual keys, which are rate-limited, budgeted, and revocable. The master key never touches customer inference; it's management-only. We wrote the trade-off down rather than pretending the mount was free, because security decisions you don't document are security decisions you forget you made.
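The split the post relies on (master key for management, virtual keys for customer inference) can be enforced as an explicit check. The following is a minimal sketch, assuming routes are classified as either "management" or "inference". `VirtualKey`, `authorize`, the sliding-window limiter, and the value of `MASTER_KEY` are all hypothetical and are not LiteLLM's own key-management API.

```python
import hmac
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical value; in the post the real one is LITELLM_MASTER_KEY in the backend's environment.
MASTER_KEY = "sk-master-example"


@dataclass
class VirtualKey:
    """Customer-facing key: budgeted, rate-limited, revocable (fields are illustrative)."""
    key: str
    budget_remaining: float
    max_requests_per_minute: int
    revoked: bool = False
    recent: List[float] = field(default_factory=list)  # request times within the last 60 s


def is_master(presented: str) -> bool:
    # Constant-time comparison so response timing doesn't leak the master key.
    return hmac.compare_digest(presented.encode(), MASTER_KEY.encode())


def authorize(route_kind: str, presented: str, keys: Dict[str, VirtualKey],
              now: Optional[float] = None) -> str:
    """Return "ok" or a "forbidden: ..." reason. route_kind is "management" or "inference"."""
    now = time.monotonic() if now is None else now
    if route_kind == "management":
        return "ok" if is_master(presented) else "forbidden: management requires the master key"
    # Inference routes: the master key is refused outright, even though it is valid.
    if is_master(presented):
        return "forbidden: master key is management-only"
    vk = keys.get(presented)
    if vk is None or vk.revoked:
        return "forbidden: unknown or revoked key"
    if vk.budget_remaining <= 0:
        return "forbidden: budget exhausted"
    vk.recent = [t for t in vk.recent if now - t < 60]  # sliding one-minute window
    if len(vk.recent) >= vk.max_requests_per_minute:
        return "forbidden: rate limited"
    vk.recent.append(now)
    return "ok"
```

The design choice worth noting is that a valid master key is rejected on inference routes instead of merely being unnecessary there. As a result, customer traffic can never run on the master key, even by mistake. Budget deduction is left to the billing path and is not shown here.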

Why a pinned dependency, not a fork

Mounting an upstream project in-process only works if you control its version precisely. LiteLLM is pinned to an exact version in our dependency manifest; bumping it is a deliberate, tested step, not an automatic drift. We read the upstream source for reference but don't fork it — forking would mean inheriting maintenance of a fast-moving project forever. Pinning gives us the in-process benefits with a clean upgrade path. (More on why we build on LiteLLM OSS.)
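The post doesn't describe how the pin is enforced. One inexpensive guard, sketched here as an assumption and not as NemoRouter's actual tooling, is a startup check that refuses to boot if the manifest line isn't an exact `==` pin or if the installed LiteLLM differs from it. `parse_exact_pin`, `check_pin`, and `check_installed` are hypothetical. `importlib.metadata.version` is the standard-library call that reports an installed distribution's version.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# name[extras]==version [; marker] [# comment]; wildcards and ranges deliberately don't match.
_EXACT_PIN = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)(\[[^\]]*\])?\s*==\s*"
    r"([A-Za-z0-9][A-Za-z0-9.+!_-]*)\s*(?:[;#].*)?$"
)


def parse_exact_pin(line: str) -> Optional[Tuple[str, str]]:
    """Return (normalized_name, version) for an exact pin, or None for anything looser."""
    match = _EXACT_PIN.match(line)
    if match is None:
        return None
    name = re.sub(r"[-_.]+", "-", match.group(1)).lower()  # PEP 503 name normalization
    return name, match.group(3)


def check_pin(requirement_line: str, installed_version: str) -> None:
    """Fail loudly if the manifest line isn't an exact pin or doesn't match what's installed."""
    pin = parse_exact_pin(requirement_line)
    if pin is None:
        raise RuntimeError(f"expected an exact '==' pin, got {requirement_line!r}")
    name, pinned = pin
    # Plain string comparison: stricter than PEP 440 equality (e.g. "1.0" vs "1.0.0").
    if installed_version != pinned:
        raise RuntimeError(f"{name} {installed_version} is installed, manifest pins {pinned}")


def check_installed(requirement_line: str) -> None:
    """Startup hook: compare the pin against the version importlib.metadata reports."""
    pin = parse_exact_pin(requirement_line)
    if pin is None:
        raise RuntimeError(f"expected an exact '==' pin, got {requirement_line!r}")
    check_pin(requirement_line, metadata.version(pin[0]))
```

At startup the backend could call `check_installed("litellm==1.2.3")` (the version is illustrative) before mounting the app. A drifted environment then fails fast and never serves traffic on an untested LiteLLM.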

The takeaway

A separate service for a dependency that lives on the same host is a network hop and an operational unit you may not need. Mounting LiteLLM in-process as an ASGI app removed both — one process, one port, no added latency, always-consistent versions — at the cost of the backend now holding the master key, which we offset by deleting a whole service to harden and keeping customer traffic on virtual keys only. The lesson generalizes: question the hops that exist only because of how you happened to deploy, not because the architecture needs them.

Written by Nemo Team. Engineering, product, and company posts from the Nemo Router team: code-first, cost-honest, no vendor-marketing fluff.
