Project · 2026

Quinference

An AI inference orchestration dashboard that routes LLM requests across heterogeneous GPU clouds, optimizing for cost, latency, and availability — the marketing campaign metaphor, applied to compute.

Stack Next.js · TypeScript · Tailwind · Recharts

Role Hackathon prototype


Inference is becoming a supply-and-demand problem. GPU clouds are heterogeneous — different providers, different hardware (H100, H200, B200, MI300X), different pricing, different availability tiers. Models have their own shape: dense vs. MoE, different TTFT (time-to-first-token) profiles, different quantization tradeoffs. Workloads have their own constraints: a voice agent needs sub-200ms TTFT; batch summarization doesn’t.
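A minimal sketch of how that heterogeneity might be modeled — all names, prices, and latency numbers here are illustrative placeholders, not the dashboard's actual schema or real provider data:

```typescript
// Illustrative data model: providers offer GPUs at a price, models have a
// measured TTFT per GPU, and workloads carry a latency budget.
type Gpu = "H100" | "H200" | "B200" | "MI300X";

interface ProviderOffer {
  provider: string;
  gpu: Gpu;
  usdPerGpuHour: number; // hypothetical on-demand price
  available: boolean;
}

interface ModelProfile {
  model: string;
  ttftMs: Record<Gpu, number>; // hypothetical time-to-first-token per GPU
}

interface Workload {
  name: string;
  ttftBudgetMs: number; // e.g. a voice agent needs sub-200ms TTFT
}

// A provider/model pair is feasible for a workload if the provider is up
// and the model's TTFT on that GPU fits the workload's latency budget.
function feasible(offer: ProviderOffer, profile: ModelProfile, w: Workload): boolean {
  return offer.available && profile.ttftMs[offer.gpu] <= w.ttftBudgetMs;
}
```

The same check works for both ends of the spectrum: a voice agent's tight budget rules out most pairs, while batch summarization accepts nearly anything that is online.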

Quinference treats this like campaign orchestration. Inference requests are campaigns, GPU providers are channels, models are treatments, and latency and cost budgets are pacing rules. The dashboard shows a provider registry, a model registry, and a workload × provider × model allocation matrix. A cost simulator and a “what-if” panel (provider outages, spot spikes, new model releases) let you preview the impact of routing decisions before committing to them.
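The routing idea behind the allocation matrix can be sketched in a few lines — pick the cheapest route whose latency fits the budget, and treat a what-if scenario (say, a provider outage) as a filter applied before routing. The route shapes and numbers below are hypothetical:

```typescript
// A candidate route: one cell of the workload × provider × model matrix.
interface Route {
  provider: string;
  model: string;
  usdPerMTokens: number; // illustrative blended cost per million tokens
  ttftMs: number;        // illustrative expected time-to-first-token
}

// Pick the cheapest route within the latency budget; `down` is a what-if
// scenario listing providers to treat as unavailable.
function pickRoute(
  routes: Route[],
  ttftBudgetMs: number,
  down: Set<string> = new Set()
): Route | undefined {
  return routes
    .filter((r) => !down.has(r.provider) && r.ttftMs <= ttftBudgetMs)
    .sort((a, b) => a.usdPerMTokens - b.usdPerMTokens)[0];
}
```

Simulating an outage is then just rerunning `pickRoute` with that provider in the `down` set and diffing the cost and latency of the new answer against the old one.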

It’s a hackathon-grade prototype — backed by real public pricing from CoreWeave, Lambda, and others — but the underlying metaphor is the point. The next wave of infrastructure problems looks a lot like the ones marketers have been solving for a decade.

← all projects