I stopped paying for ChatGPT and switched to a local LLM that runs on my laptop

I’ve been really getting into local LLMs lately, to the point where I’ve even tried running them on my phone. My AI subscriptions are only going to get more expensive over time, and if they don’t raise prices, they’ll probably just nerf the token limits instead.

So I figured it was time to actually commit to local inference and build a setup that works, rather than waiting until I'm forced to. I think my laptop setup is finally in place.


LM Studio runs my models faster than Ollama

Faster is better, duh

I already have Ollama set up and working, and I’m keeping it that way. When I need a local model connected to something external, like Claude Code, Ollama is what I use. It just integrates better with other tools and services in my experience, and I don’t see a reason to change that part of the workflow.

But for just sitting down and chatting with a model, LM Studio is better. It’s a proper desktop app. You open it, find a model, download it, and start chatting. The whole thing takes a few minutes, and you don’t need to touch a terminal.

I know Ollama also has a desktop app, but it has never felt like the project's main focus; connecting to other apps has always been Ollama's biggest pull. Even on raw performance, I get more tokens per second in LM Studio, which pretty much settles the conversation for me.

LM Studio vs. Ollama tokens per second. Credit: Raghav Sethi/MakeUseOf

Model discovery is another area where LM Studio pulls ahead. You can filter by parameter count, quantization, and use case, and you can see the download size before committing. Comparing a few different 8B models side by side inside the app is a lot more practical than jumping between browser tabs.

LM Studio has a plugin system too. The DuckDuckGo plugin lets the model pull live search results before responding, which fixes the most common complaint about local models being out of date. There’s a Wikipedia plugin as well. Neither of these is a game changer, but they make the experience more practical day to day.

As I mentioned before, where LM Studio falls short is as background infrastructure for other apps; for that, Ollama is still what I reach for. The two run on different default ports, so they coexist fine on the same machine. Ollama sits in the background, connected to Claude Code, while LM Studio is what I open when I want to actually chat with a model.
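If you want to sanity-check that the two really do coexist, both expose an OpenAI-compatible chat endpoint on their default ports: 11434 for Ollama and 1234 for LM Studio's local server. Here's a minimal sketch that queries each in turn; it assumes both servers are running, and the model names are placeholders for whatever you actually have downloaded:

```python
# Hit Ollama and LM Studio back to back on their default ports.
# Assumes both servers are running; model names are placeholders.
import requests

PROMPT = "In one sentence, what is quantization?"

for name, base_url, model in [
    ("Ollama", "http://localhost:11434/v1", "llama3.1:8b"),
    ("LM Studio", "http://localhost:1234/v1", "google/gemma-3n-e4b"),
]:
    resp = requests.post(
        f"{base_url}/chat/completions",
        json={"model": model, "messages": [{"role": "user", "content": PROMPT}]},
        timeout=120,
    )
    resp.raise_for_status()
    print(f"{name}: {resp.json()['choices'][0]['message']['content']}")
```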

The models I am running on my MacBook right now

A 4B model is actually… good?

A local LLM in LM Studio running Gemma 3n. Credit: Amir Bohlooli/MUO

Picking your first local LLM is much easier than you might think. There are only two things worth understanding before you choose a model: parameter count and quantization.

The number before the “B” stands for billions of parameters. More parameters generally means a more capable model, but also a heavier one. A 4B model loads quickly and runs well on basically any modern Mac. A 12B model works on 16GB but leaves less room for longer conversations.

Quantization is how the model's weights get compressed to use less memory. The most common format is Q4_K_M, which drops precision down to roughly 4 bits per weight. That sounds like it should hurt quality a lot, but modern quantization is good enough that you won't notice for most tasks, and compared to the original 16-bit weights, you save around 75% of the memory.
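The back-of-the-envelope math is simple: weight memory is roughly parameters times bits per weight, divided by 8. A small sketch, treating Q4_K_M as averaging just under 5 bits per weight (the exact figure varies by quant, and context memory comes on top):

```python
# Rough weight-memory estimates for common model sizes.
# These are approximations; KV cache and runtime overhead come on top.

def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for params in (4, 8, 12):
    fp16 = weight_memory_gb(params, 16)   # unquantized half precision
    q4 = weight_memory_gb(params, 4.8)    # Q4_K_M averages just under 5 bits
    print(f"{params}B: fp16 ~{fp16:.1f} GB, Q4_K_M ~{q4:.1f} GB")

# 4B:  fp16 ~8.0 GB,  Q4_K_M ~2.4 GB
# 12B: fp16 ~24.0 GB, Q4_K_M ~7.2 GB
# On a 16GB Mac with roughly 12-13GB usable, a 12B Q4 quant fits,
# but leaves less headroom for a long context window.
```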

A MacBook running DeepSeek-R1 locally with a monitor in the background. Credit: Raghav Sethi/MakeUseOf

With 16GB of unified memory on a MacBook Air or Pro, you’re in a good spot for local AI right now. macOS takes a few GB for itself, so you’re working with around 12 to 13GB for the model and context. That’s enough to run something decent without constantly hitting memory pressure.

Right now, the model I'm spending most of my time with is Gemma 3n E4B, Google's newer efficiency-focused multimodal model. It loads comfortably on 16GB, runs at a decent speed, and handles images as well as text. Being able to drop a screenshot into the chat is something I use more than I expected.
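Image input also works through LM Studio's local server, not just the chat window. A hedged sketch using the standard OpenAI-style vision message format, assuming the server is running and a vision-capable model is loaded (the model identifier is a placeholder):

```python
# Send a screenshot to a vision-capable model via LM Studio's local server.
# Assumes the server is running on the default port 1234 and the loaded
# model supports images; the model identifier below is a placeholder.
import base64
import requests

with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "google/gemma-3n-e4b",  # placeholder identifier
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this screenshot show?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```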

While Gemma 3n has been pretty good for me, I'd also recommend trying other models, particularly the Qwen family. A lot of it comes down to your hardware and what you're actually using the model for, so it's worth experimenting a bit before settling on one.

What about when my laptop isn’t enough?

Use your own cloud!

A Mac mini on a table. Credit: Raghav Sethi/MakeUseOf

The MacBook handles most things, but not everything. For bigger models, longer context windows, and running multiple tasks at once, I've built a small private AI server at home. The main machine is a Mac mini with more unified memory than the laptop. Ollama runs on it, which means I can load larger models that simply wouldn't fit on the MacBook.

I primarily run a quant of Qwen3 Coder 30B on it for coding, and it's been working well enough that it can handle most of the tasks I use Sonnet for. Tailscale is what connects everything together. It sets up a private encrypted network across all my devices, so my MacBook and my phone can reach the Mac mini as if they were all on the same local network, no matter where I actually am.

I can be out somewhere and still connect back to the bigger models running at home, without opening any ports on my router or exposing anything to the public internet.
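On the client side, this amounts to pointing at the Mac mini's Tailscale hostname instead of localhost. A minimal sketch: "mac-mini" is a hypothetical tailnet name (MagicDNS resolves it from any device on the tailnet), the model tag is a placeholder, and the server needs OLLAMA_HOST=0.0.0.0 set so Ollama listens beyond localhost:

```python
# Query the home server's Ollama instance over Tailscale.
# "mac-mini" is a hypothetical MagicDNS hostname on the tailnet;
# the model name is a placeholder for whichever quant is loaded.
import requests

resp = requests.post(
    "http://mac-mini:11434/api/generate",
    json={
        "model": "qwen3-coder:30b",  # placeholder model tag
        "prompt": "Explain the difference between a mutex and a semaphore.",
        "stream": False,  # return one JSON object instead of a stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```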

The other thing this setup does is take third-party services out of the picture entirely. No API keys, no usage limits, no subscription price going up. The models are on hardware I own, the network is private, and nothing about it depends on what any particular company decides to do next.


We are heading in the right direction

The goal was to get rid of as many AI subscriptions as possible, and I've made a lot of progress. Have I fully done it? No. I still use Claude for really complex things, and I don't see that changing anytime soon. But if you had told me six months ago that I'd be this far with local models running on my own hardware, I would have said you were joking.

The pace at which these models are improving is hard to overstate. What feels like a decent local setup today would have seemed unrealistic not long ago. Who knows where things will be in another year; maybe local inference just becomes the default for most people. I wouldn't rule it out.

LM Studio

OS: Windows, macOS, Linux
Developer: Element Labs
Price model: Free

LM Studio is a desktop app that lets you download, manage, and run large language models directly on your computer, with a built-in chat interface included.



