experiment·2025·creator

tiny-llm-router

Routes requests across multiple LLM providers with cost and latency guards. Drop-in OpenAI-compatible API.


overview

A thin Python service that sits in front of multiple LLM providers and decides which one to call based on cost, latency, and per-route quality requirements. The API is OpenAI-compatible, so existing clients work without any code changes.
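The core routing decision can be sketched as a small pure function: pick the cheapest provider that satisfies the route's latency guard and fits the remaining budget, with a fallback when nothing qualifies. Every name and number below is an illustrative assumption, not the service's actual API or real pricing:

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    cost_per_1k_tokens: float  # USD, illustrative figures only
    p95_latency_ms: float

# Hypothetical provider table for the sketch.
PROVIDERS = [
    Provider("budget-batch", cost_per_1k_tokens=0.0003, p95_latency_ms=3000),
    Provider("fast-small", cost_per_1k_tokens=0.0005, p95_latency_ms=300),
    Provider("frontier", cost_per_1k_tokens=0.03, p95_latency_ms=2000),
]

def pick_provider(max_latency_ms: float, budget_remaining: float,
                  est_tokens: int = 1000) -> Provider:
    """Cheapest provider that meets the latency guard and fits the budget."""
    candidates = [
        p for p in PROVIDERS
        if p.p95_latency_ms <= max_latency_ms
        and (p.cost_per_1k_tokens * est_tokens / 1000) <= budget_remaining
    ]
    if not candidates:
        # No provider satisfies both guards: fall back to the cheapest
        # option overall rather than failing the request outright.
        return min(PROVIDERS, key=lambda p: p.cost_per_1k_tokens)
    return min(candidates, key=lambda p: p.cost_per_1k_tokens)
```

A tight latency budget forces the fast (slightly pricier) model; a relaxed one lets the cheapest batch-style provider win.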

It started as a way to stop sending cheap, easily routed prompts to expensive models, and ended up as a small but useful piece of infrastructure I now run in front of every side project that touches an LLM.

key features

  • OpenAI-compatible REST API — drop-in replacement
  • Provider routing by cost, latency, or model capability
  • Streaming token responses preserved end-to-end
  • Redis-backed request cache with semantic dedup
  • Per-route budget caps with automatic fallback
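The semantic-dedup idea behind the cache can be illustrated with an in-memory stand-in for Redis: embed the prompt, and reuse a cached response when cosine similarity to a previously seen prompt clears a threshold. The toy character-frequency embedding and every name here are hypothetical; a real deployment would use a proper embedding model and Redis keys:

```python
import hashlib
import math

# In-memory stand-in for the Redis cache: (embedding, response) pairs.
_cache = []

def _toy_embed(text):
    # Stand-in embedding: letter-frequency vector. The real service would
    # call an embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - 97] += 1.0
    return vec

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def cached_completion(prompt, threshold=0.95):
    emb = _toy_embed(prompt)
    for cached_emb, response in _cache:
        if _cosine(emb, cached_emb) >= threshold:
            return response  # semantic hit: skip the provider call entirely
    # Cache miss: stand-in for the upstream LLM call.
    response = "response-to:" + hashlib.sha1(prompt.encode()).hexdigest()[:8]
    _cache.append((emb, response))
    return response
```

Near-duplicate prompts (case or punctuation changes) hit the cache, while genuinely different prompts fall through to the provider.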

built with

Python · FastAPI · Redis · OpenAI SDK