tiny-llm-router
Routes requests across multiple LLM providers with cost and latency guards. Drop-in OpenAI-compatible API.
overview
A thin Python service that sits in front of multiple LLM providers and decides which one to call based on cost, latency, and per-route quality requirements. The API is OpenAI-compatible, so existing clients work without any code changes.
It started as a way to stop overpaying for cheap, easily routable prompts and grew into a small but useful piece of infrastructure I now run in front of every side project that touches an LLM.
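The routing decision described above can be sketched as a simple constraint filter followed by a cost sort. This is an illustrative sketch, not the project's actual code: the provider names, prices, and the `pick_provider` helper are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    cost_per_1k_tokens: float  # USD, hypothetical figures
    p95_latency_ms: float
    capabilities: set

# Hypothetical provider table for illustration only; real pricing varies.
PROVIDERS = [
    Provider("cheap-small", 0.0005, 900, {"chat"}),
    Provider("fast-mid", 0.002, 300, {"chat", "tools"}),
    Provider("premium-large", 0.01, 600, {"chat", "tools", "vision"}),
]

def pick_provider(required: set, max_latency_ms: float) -> Provider:
    """Pick the cheapest provider that meets the route's capability
    and latency requirements."""
    eligible = [
        p for p in PROVIDERS
        if required <= p.capabilities and p.p95_latency_ms <= max_latency_ms
    ]
    if not eligible:
        raise LookupError("no provider satisfies the route constraints")
    return min(eligible, key=lambda p: p.cost_per_1k_tokens)
```

A plain chat route with a loose latency budget lands on the cheapest model, while a tool-calling route with a tight latency cap is forced onto a faster, pricier one.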
key features
- OpenAI-compatible REST API — drop-in replacement
- Provider routing by cost, latency, or model capability
- Streaming token responses preserved end-to-end
- Redis-backed request cache with semantic dedup
- Per-route budget caps with automatic fallback
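The budget-cap bullet above can be illustrated with a small spend tracker: once a route exhausts its cap, requests fall back to a cheaper model. A minimal sketch under assumed names (`RouteBudget`, the model identifiers) that do not come from the project itself:

```python
class RouteBudget:
    """Per-route spend tracker: serve the primary model until the
    budget cap is reached, then fall back to the cheaper model."""

    def __init__(self, cap_usd: float, primary: str, fallback: str):
        self.cap_usd = cap_usd
        self.spent = 0.0
        self.primary = primary
        self.fallback = fallback

    def choose(self) -> str:
        # Under the cap: route to the primary model; over it: fall back.
        return self.primary if self.spent < self.cap_usd else self.fallback

    def record(self, cost_usd: float) -> None:
        # Called after each completed request with its actual cost.
        self.spent += cost_usd

route = RouteBudget(cap_usd=1.00, primary="premium-large", fallback="cheap-small")
```

In the real service this state would live in Redis so it survives restarts and is shared across workers; the in-memory version above just shows the control flow.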
built with
Python · FastAPI · Redis · OpenAI SDK