Tiled Hacker news on React Router

April 2026 TLDR Setup for Ollama and Gemma 4 26B on a Mac mini

254 points - today at 9:35 AM

Source

Aurornis
today at 2:44 PM
If this is your first time using open weight models right after release, know that there are always bugs in the early implementations and even quantizations.
Every project races to have support on launch day so they don’t lose users, but the output you get may not be correct. There are already several problems being discovered in tokenizer implementations and quantizations may have problems too if they use imatrix.
So you’re going to see a lot of “I tried it but it sucks because it can’t even do tool calls” and other reports about how the models don’t work at all in the coming weeks from people who don’t realize they were using broken implementations.
If you want to try cutting edge open models you need to be ready to constantly update your inference engine and check your quantization for updates and re-download when it’s changed. The mad rush to support it on launch day means everything gets shipped as soon as it looks like it can produce output tokens, not when it’s tested to be correct.
jasonriddle
today at 6:17 PM
Slightly off topic, but question for folks.
I'm hoping to replace coding with Claude Sonnet 4.5 with a model with an open source or open weights model. Are any of the models on Ollama.com cloud offering (https://ollama.com/search?c=cloud) or any of the models on OpenRouter.ai a close replacement? I know that no model right now matches the full performance and capabilities of Claude Sonnet 4.5, but I want to know how close I can get and with which model(s).
If there is a model you say can replace it, talk about how long you have been using it for, and using what harness (Claude code, opencode, etc), and some strengths and weakness you have noticed. I'm not interested in what benchmarks say, I want to hear about real world use from programmers using these models.
neo_doom
today at 4:04 PM
Huge Claude user here… can someone help me set some realistic expectations if I bought a Mac mini and spun one up? I use Claude primarily for dev work and Home Lab projects. Are the open models good enough to run locally and replace the Claude workload? Or am I better off with my $20/mo Claude subscription?
milchek
today at 1:06 PM
I tested briefly with a MacBook Pro m4 with 36gb. Run in LM Studio with open code as the frontend and it failed over and over on tool calls. Switched back to qwen. Anyone else on similar setup have better luck?
OkGoDoIt
today at 7:07 PM
Sorry for being off topic, but why can’t I open this without being logged into GitHub? I thought gists are either completely private or publicly accessible. Are they no longer publicly accessible?
spencer-p
today at 4:41 PM
Weird that the steps are for "Gemma 4 12b", which does not exist, and then switches to 26b midway through.
There's also a step to verify that it doesn't fit on the GPU with ollama ps showing "14%/86% CPU/GPU". Doesn't this mean you'll have really bad performance?
anonyfox
today at 2:07 PM
M5 air here with 32gb ram and 10/10 cores. Anyone got some luck with mlx builds on oMLX so far? Not at my machine right now and would love to know if these models already work including tool calling
aetherspawn
today at 12:28 PM
Which harness (IDE) works with this if any? Can I use it for local coding right now?
kristopolous
today at 3:27 PM
Are you getting tool call and multimodal working? I don't see it in the quantized unsloth ggufs...
easygenes
today at 10:40 AM
Why is ollama so many people’s go-to? Genuinely curious, I’ve tried it but it feels overly stripped down / dumbed down vs nearly everything else I’ve used.
Lately I’ve been playing with Unsloth Studio and think that’s probably a much better “give it to a beginner” default.
boutell
today at 11:38 AM
Last night I had to install the VO.20 pre-release of ollama to use this model. So I'm wondering if these instructions are accurate.
redrove
today at 10:25 AM
There is virtually no reason to use Ollama over LM Studio or the myriad of other alternatives.
Ollama is slower and they started out as a shameless llama.cpp ripoff without giving credit and now they “ported” it to Go which means they’re just vibe code translating llama.cpp, bugs included.
logicallee
today at 12:06 PM
In case someone would like to know what these are like on this hardware, I tested Gemma 4 32b (the ~20 GB model, the largest Gemma model Google published) and Gemma 4 gemma4:e4b (the ~10 GB model) on this exact setup (Mac Mini M4 with 24 GB of RAM using Ollama), I livestreamed it:
https://www.youtube.com/live/G5OVcKO70ns
The ~10 GB model is super speedy, loading in a few seconds and giving responses almost instantly. If you just want to see its performance, it says hello around the 2 minute mark in the video (and fast!) and the ~20 GB model says hello around 5 minutes 45 seconds in the video. You can see the difference in their loading times and speed, which is a substantial difference. I also had each of them complete a difficult coding task, they both got it correct but the 20 GB model was much slower. It's a bit too slow to use on this setup day to day, plus it would take almost all the memory. The 10 GB model could fit comfortably on a Mac Mini 24 GB with plenty of RAM left for everything else, and it seems like you can use it for small-size useful coding tasks.
zachperkel
today at 3:33 PM
how many TPS does a build like this achieve on gemma 4 26b?
renewiltord
today at 2:41 PM
Just told Claude to sort it out and it ran it. 26 tok/s on the Mac mini I use for personal claw type program. Unusable for local agent but it’s okay.
robotswantdata
today at 10:57 AM
Why are you using Ollama? Just use llama.cpp
brew install llama.cpp
use the inbuilt CLI, Server or Chat interface. + Hook it up to any other app
techpulselab
today at 4:13 PM
[dead]
aplomb1026
today at 5:31 PM
[dead]
jiusanzhou
today at 4:15 PM
[dead]
volume_tech
today at 1:11 PM
[dead]
kanehorikawa
today at 3:45 PM
[dead]
greenstevester
today at 9:35 AM
[flagged]
mark_l_watson
today at 1:58 PM
The article has a few good tips for using Ollama. Perhaps it should note that the Gemma 4 models are not really trained for strong performance with coding agents like OpenCode, Claude Code, pi, etc. The Gemma 4 models are excellent for applications requiring tool use, data extraction to JSON, etc. I asked Gemini Pro about this earlier and Gemini Pro recommended qwen 3.5 models specifically for coding, and backed that up with interesting material on training. This makes sense, and is something that I do: use strong models to build effective applications using small efficient models.

April 2026 TLDR Setup for Ollama and Gemma 4 26B on a Mac mini

Aurornis

colechristensen

embedding-shape

alfiedotwtf

embedding-shape

kamranjon

Aurornis

vardalab

jasonriddle

dimgl

jasonriddle

scottcha

jasonriddle

MrScruff

neo_doom

NietTim

MrScruff

hamdingers

alfiedotwtf

milchek

internet101010

Aurornis

jasonjmcghee

abroadwin

jasonjmcghee

abroadwin

OkGoDoIt

OkGoDoIt

spencer-p

Schiendelman

anonyfox

Yukonv

anonyfox

smith7018

aetherspawn

lambda

kristopolous

kristopolous

easygenes

diflartle

flux3125

ryandrake

polotics

easygenes

xenophonf

vonneumannstan

ekianjo

the_lucifer

linolevan

alfiedotwtf

brcmthrowaway

jrm4

the_lucifer

ekianjo

DiabloD3

wolvoleo

boutell

redrove

alifeinbinary

gen6acd60af

zozbot234

jrm4

faitswulff

DiabloD3

ffsm8

u8080

ffsm8

beanjuiceII

meltyness

iLoveOncall

DiabloD3

simondotau

lousken

jedisct1

walthamstow

logicallee

dminik

logicallee

zachperkel