Tiled Hacker news on React Router

iPhone 17 Pro Demonstrated Running a 400B LLM

324 points - today at 2:30 PM

https://xcancel.com/anemll/status/2035901335984611412

Source

firstbabylonian
today at 3:01 PM
> SSD streaming to GPU
Is this solution based on what Apple describes in their 2023 paper 'LLM in a flash' [1]?
1: https://arxiv.org/abs/2312.11514
johnwhitman
today at 6:18 PM
The heat problem is going to be the real constraint here. I've been running smaller models locally for some internal tooling at work and even those make my MacBook sound like a jet engine after twenty minutes. A 400B model on a phone seems like a great way to turn your pocket into a hand warmer, even with MoE routing. The unified memory is clever but physics still applies.
CrzyLngPwd
today at 7:06 PM
I had a dream that everyone had super intelligent AIs in their pockets, and yet all they did was doomscroll and catfish...shortly before everything was destroyed.
lainproliant
today at 7:14 PM
This reminds me of how excited people were to get models running locally when llama.c first hit.
andix
today at 6:01 PM
My iPad Air with M2 can run local LLMs rather well. But it gets ridiculously hot within seconds and starts throttling.
yalogin
today at 5:52 PM
Apple’s unified memory architecture plays a huge part in this. This will trigger a large scale rearchitecture of mobile hardware across the board. I am sure they are already underway.
I understand this is for a demo but do we really need a 400B model in the mobile? A 10B model would do fine right? What do we miss with a pared down one?
illwrks
today at 6:52 PM
I installed Termux on an old Android phone last week (running LineageOS), and then using Termux installed Ollama and a small model. It ran terribly, but it did run.
cj00
today at 3:08 PM
It’s 400B but it’s mixture of experts so how many are active at any time?
causal
today at 3:29 PM
Run an incredible 400B parameters on a handheld device.
0.6 t/s, wait 30 seconds to see what these billions of calculations get us:
"That is a profound observation, and you are absolutely right ..."
_air
today at 3:48 PM
This is awesome! How far away are we from a model of this capability level running at 100 t/s? It's unclear to me if we'll see it from miniaturization first or from hardware gains
r4m18612
today at 5:32 PM
Impressive. Running a 400B model on-device, even at low throughput, is pretty wild.
redwood
today at 5:29 PM
It will be funny if we go back to lugging around brick-size batteries with us everywhere!
skiing_crawling
today at 6:20 PM
I can't understand why this is a surprise to anyone. An iphone is still a computer, of course it can run any model that fits in storage albiet very slowly. The implementation is impressive I guess but I don't see how this is a novel capability. And for 0.6t/s, its not a cost efficient hardware for doing it. The iphone can also render pixar movies if you let it run long enough, mine bitcoin with a pathetic hashrate, and do weather simulations but not in time for the forecast to be relevant.
dv_dt
today at 4:48 PM
CPU, memory, storage, time tradeoffs rediscovered by AI model developers. There is something new here, add GPU to the trade space.
russellbeattie
today at 4:30 PM
I have some macro opinions about Apple - not sure if I'm correct, but tell me what you think.
Apple has always seen RAM as an economic advantage for their platform: Make the development effort to ensure that the OS and apps work well with minimal memory and save billions every year in hardware costs. In 2026, iPhones still come with 8Gb of RAM, Pro/Max come with 12Gb.
The problem is that AI (ML/LLM training and inference) are areas where you can't get around the need for copious amounts of fast working memory. (Thus the critical shortage of RAM at the moment as AI data centers consume as many memory chips as possible.)
Unless there's something I don't know (which is more than possible) Apple can't code their way around this problem, nor create specialized SoCs with ML cores that obviate the need for lots and lots of RAM.
So, it's going to be interesting whether they accept this reality and we start seeing the iPhones in the future with 16Gb, 32Gb or more as standard in order to make AI performant. And if they give up on adding AI to the billions of iPhones with minimal RAM already out there.
As a side note, 8Gb of RAM hasn't been enough for a decade. It prevents basic tasks like keeping web tabs live in the background. My pet peeve is having just a few websites open, and having the page refresh when swapping between them because of aggressive memory management.
To me, Apple's obvious strength is pushing AI to the edge as much as possible. While other companies are investing in massive data centers which will have millions of chips that will be outdated within the next couple years, Apple will be able to incrementally improve their ML/AI features by running on the latest and greatest chips every year. Apple has a huge advantage in that they can design their chips with a mega high speed bus, which is just as important as the quantity of RAM.
But all that depends on Apple's willingness to accept that RAM isn't an area they can skimp on any more, and I'm not sure they will.
Sorry for the brain dump. I'd love to be educated on this in case I'm totally off base.
1970-01-01
today at 6:19 PM
"400 bytes should be enough for anybody"
ashwinnair99
today at 2:57 PM
A year ago this would have been considered impossible. The hardware is moving faster than anyone's software assumptions.
HardCodedBias
today at 6:01 PM
The power draw is going to be crazy (today).
Practical LLMs on mobile devices are at least a few years away.
pier25
today at 3:37 PM
https://xcancel.com/anemll/status/2035901335984611412
Yanko_11
today at 6:01 PM
[dead]
jlhawn
today at 6:06 PM
[dead]
aplomb1026
today at 5:32 PM
[dead]
jee599
today at 3:23 PM
[dead]
literoldolphin
today at 6:13 PM
[dead]
anemll
today at 2:30 PM
[flagged]
rwaksmunski
today at 3:19 PM
Apple might just win the AI race without even running in it. It's all about the distribution.
simopa
today at 2:57 PM
It's crazy to see a 400B model running on an iPhone. But moving forward, as the information density and architectural efficiency of smaller models continue to increase, getting high-quality, real-time inference on mobile is going to become trivial.

iPhone 17 Pro Demonstrated Running a 400B LLM

firstbabylonian

simonw

anemll

Yukonv

anemll

superjan

zozbot234

Aurornis

zozbot234

MillionOClock

zozbot234

Aurornis

zozbot234

Aurornis

zozbot234

jnovek

simonw

foobiekr

manmal

johnwhitman

jgraham

MasterScrat

noboostforyou

Sparkle-san

mordechai9000

alterom

croisillon

zozbot234

CrzyLngPwd

SecretDreams

cindyllm

lainproliant

andix

yalogin

Aurornis

alwillis

Aurornis

happyopossum

root_axis

geek_at

refulgentis

illwrks

Aachen

cj00

simonw

thecopy

Aurornis

freedomben

zozbot234

anemll

jnovek

Hasslequest

anshumankmr

causal

intrasight

bartread

patapong

zozbot234

staticman2

whyenot

AnonymousPlanet

thinkingtoilet

patapong

ctxc

GTP

RuslanL

xeyownt

ep103

lesam

xg15

WarmWash

baal80spam

WarmWash

timcobb

otikik

winwang

Terretta

tombert

actusual