<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>LM Studio on Thushan Fernando</title>
    <link>https://www.thushanfernando.com/tags/lm-studio/</link>
    <description>Recent content in LM Studio on Thushan Fernando</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <managingEditor>Thushan Fernando</managingEditor>
    <webMaster>Thushan Fernando</webMaster>
    <lastBuildDate>Sun, 03 May 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://www.thushanfernando.com/tags/lm-studio/feed.xml" rel="self" type="application/rss+xml"/>
    <item>
      <title>Olla: A Smart Load Balancer for your LLM Infrastructure</title>
      <link>https://www.thushanfernando.com/2026/05/olla-a-smart-load-balancer-for-your-llm-infrastructure/</link>
      <pubDate>Sun, 03 May 2026 00:00:00 +0000</pubDate>
      <author>Thushan Fernando</author>
      <guid>https://www.thushanfernando.com/2026/05/olla-a-smart-load-balancer-for-your-llm-infrastructure/</guid>
      <description>&lt;p&gt;When you start running local LLMs at home or at work, you usually start with one box and one &lt;a href="https://ollama.com"&gt;Ollama&lt;/a&gt; install. Things are simple; you&amp;rsquo;re thinking this AI thing isn&amp;rsquo;t half bad, and it runs on my own tin. Then you add another box, maybe an old gaming rig with a spare GPU, then a Mac with &lt;a href="https://lmstudio.ai/"&gt;LM Studio&lt;/a&gt; for the MLX-friendly models, then a Linux box running &lt;a href="https://github.com/vllm-project/vllm"&gt;vLLM&lt;/a&gt; for serving things at scale. Suddenly you&amp;rsquo;ve got four URLs, four model lists, four different API quirks, and no way to know which one&amp;rsquo;s hot or cold.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>