<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>LM Studio on Thushan Fernando</title>
    <link>https://www.thushanfernando.com/tags/lm-studio/</link>
    <description>Recent content in LM Studio on Thushan Fernando</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <managingEditor>Thushan Fernando</managingEditor>
    <webMaster>Thushan Fernando</webMaster>
    <lastBuildDate>Sun, 03 May 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://www.thushanfernando.com/tags/lm-studio/feed.xml" rel="self" type="application/rss+xml"/>
    <item>
      <title>Olla: A Smart Load Balancer for your LLM Infrastructure</title>
      <link>https://www.thushanfernando.com/2026/05/olla-a-smart-load-balancer-for-your-llm-infrastructure/</link>
      <pubDate>Sun, 03 May 2026 00:00:00 +0000</pubDate>
      <author>Thushan Fernando</author>
      <guid>https://www.thushanfernando.com/2026/05/olla-a-smart-load-balancer-for-your-llm-infrastructure/</guid>
      <description>&lt;p&gt;When you start running local LLMs at home or at work, you usually start with one box and one &lt;a href="https://ollama.com"&gt;Ollama&lt;/a&gt; install. Things are simple; you&amp;rsquo;re thinking this AI thing isn&amp;rsquo;t half bad, and it runs on my own tin. Then you add another box, maybe an old gaming rig with a spare GPU, then a Mac with &lt;a href="https://lmstudio.ai/"&gt;LM Studio&lt;/a&gt; for the MLX-friendly models, then a Linux box running &lt;a href="https://github.com/vllm-project/vllm"&gt;vLLM&lt;/a&gt; for serving things at scale. Suddenly you&amp;rsquo;ve got four URLs, four model lists, four different API quirks, and no way to know which one&amp;rsquo;s hot or cold.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>