experiment·2025·creator

tiny-llm-router

Routes requests across multiple LLM providers with cost and latency guards. Drop-in OpenAI-compatible API.


overview

A thin Python service that sits in front of multiple LLM providers and decides which one to call based on cost, latency, and per-route quality requirements. The API is OpenAI-compatible, so existing clients work without any code changes.
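The core routing decision can be sketched as a small pure function: pick the cheapest provider that satisfies the route's latency guard and fits the remaining budget, with a fallback when nothing qualifies. Every name and number below is an illustrative assumption, not the service's actual API or real pricing:

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    cost_per_1k_tokens: float  # USD, illustrative figures only
    p95_latency_ms: float

# Hypothetical provider table for the sketch.
PROVIDERS = [
    Provider("budget-batch", cost_per_1k_tokens=0.0003, p95_latency_ms=3000),
    Provider("fast-small", cost_per_1k_tokens=0.0005, p95_latency_ms=300),
    Provider("frontier", cost_per_1k_tokens=0.03, p95_latency_ms=2000),
]

def pick_provider(max_latency_ms: float, budget_remaining: float,
                  est_tokens: int = 1000) -> Provider:
    """Cheapest provider that meets the latency guard and fits the budget."""
    candidates = [
        p for p in PROVIDERS
        if p.p95_latency_ms <= max_latency_ms
        and (p.cost_per_1k_tokens * est_tokens / 1000) <= budget_remaining
    ]
    if not candidates:
        # No provider satisfies both guards: fall back to the cheapest
        # option overall rather than failing the request outright.
        return min(PROVIDERS, key=lambda p: p.cost_per_1k_tokens)
    return min(candidates, key=lambda p: p.cost_per_1k_tokens)
```

A tight latency budget forces the fast (slightly pricier) model; a relaxed one lets the cheapest batch-style provider win.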

It started as a way to stop sending cheap, easily routed prompts to expensive models, and ended up as a small but useful piece of infrastructure I now run in front of every side project that touches an LLM.

key features

  • OpenAI-compatible REST API — drop-in replacement
  • Provider routing by cost, latency, or model capability
  • Streaming token responses preserved end-to-end
  • Redis-backed request cache with semantic dedup
  • Per-route budget caps with automatic fallback
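The semantic-dedup idea behind the cache can be illustrated with an in-memory stand-in for Redis: embed the prompt, and reuse a cached response when cosine similarity to a previously seen prompt clears a threshold. The toy character-frequency embedding and every name here are hypothetical; a real deployment would use a proper embedding model and Redis keys:

```python
import hashlib
import math

# In-memory stand-in for the Redis cache: (embedding, response) pairs.
_cache = []

def _toy_embed(text):
    # Stand-in embedding: letter-frequency vector. The real service would
    # call an embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - 97] += 1.0
    return vec

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def cached_completion(prompt, threshold=0.95):
    emb = _toy_embed(prompt)
    for cached_emb, response in _cache:
        if _cosine(emb, cached_emb) >= threshold:
            return response  # semantic hit: skip the provider call entirely
    # Cache miss: stand-in for the upstream LLM call.
    response = "response-to:" + hashlib.sha1(prompt.encode()).hexdigest()[:8]
    _cache.append((emb, response))
    return response
```

Near-duplicate prompts (case or punctuation changes) hit the cache, while genuinely different prompts fall through to the provider.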

built with

Python · FastAPI · Redis · OpenAI SDK