crv ← writing

WHY I'M BUILDING QUINFERENCE

ASHWIN VISWATMULA · APRIL 23, 2026 · 3 min read

Quinference started as a hackathon bet: that the problem of routing inference traffic across heterogeneous GPU clouds is structurally identical to the problem of routing marketing treatments across heterogeneous customer segments — and that the tooling should look the same.

This is the short version of the thesis. More to come.

The observation

Marketing orchestration, stripped to its primitives:

Now inference orchestration, stripped to its primitives:

These are the same problem.

The bet

If the problems are structurally the same, the interfaces probably should be too. Quinference borrows the campaign-orchestration mental model directly:

Once you set the problem up this way, a lot of the hard-won intuition from marketing operations transfers. Holdouts become A/B routing tests. Budget forecasting becomes cost simulation. Preflight checks become provider health checks.

Why it matters

The naive version of inference routing is “send everything to the cheapest provider that can meet my latency.” In practice that breaks down for the same reasons naive channel routing breaks down:

  1. Availability risk. Providers go down. If 100% of your workload is on one provider, your app is down with them.
  2. Price volatility. Spot prices move. An allocation that’s optimal today is mispriced tomorrow.
  3. Workload heterogeneity. A voice agent and a batch summarizer should not route the same way, even at the same token volume.
  4. Model heterogeneity. Dense and MoE models have totally different GPU affinities and TTFT profiles.

The interesting work is in the optimization layer that respects all of this. The dashboard is there to make the decisions legible — to you and, eventually, to an agent doing the optimization for you.

Why now

Two reasons.

First: the GPU market is one of the most heterogeneous infrastructure markets in a long time. The pricing gap between the cheapest and most expensive providers for the same workload is routinely 3–5×. Very few teams are set up to exploit that.

Second: the marketing-tech playbook for exactly this kind of optimization already exists. It just needs to be ported. That’s what Quinference is.