Apple M3 Ultra

941 points - yesterday at 1:59 PM

Source
  • cxie

    yesterday at 3:28 PM

    512GB of unified memory is truly breaking new ground. I was wondering when Apple would overcome memory constraints, and now we're seeing a half-terabyte of unified memory. This is incredibly practical for running large AI models locally ("600 billion parameters"), and Apple's approach of integrating this much efficient memory on a single chip is fascinating compared to NVIDIA's solutions. I'm curious how this design of "fusing" two M3 Max chips performs in terms of heat dissipation and power consumption, though.

      • FloatArtifact

        yesterday at 3:51 PM

        They didn't increase the memory bandwidth; you get the same bandwidth that was already available on the M2 Studio. Yes, of course you can get 512 gigabytes of uRAM for 10 grand.

        The question is whether an LLM will run with usable performance at that scale. The point is that there are diminishing returns: even with enough uRAM, the same memory bandwidth caps you, despite the new chip's increased processing speed for AI.

        So there must be a min-max performance ratio between memory bandwidth and the size of the memory pool, in relation to the processing power.

          • lhl

            yesterday at 6:54 PM

            Since no one has specifically answered your question yet: yes, you should be able to get usable performance. A Q4_K_M GGUF of DeepSeek-R1 is 404GB. This is a 671B MoE that "only" has 37B activations per pass. You'd probably expect in the ballpark of 20-30 tok/s for text generation (depending on how much of the memory bandwidth can actually be utilized).

            From my napkin math, the M3 Ultra's TFLOPS figure is still relatively low (around 43 FP16 TFLOPS?), but it should be more than enough to handle bs=1 token generation (which should be well under 10 FLOPs/byte for inference). Now, as far as its prefill/prompt-processing speed goes... well, that's another matter.
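
            A rough sketch of that napkin math (illustrative assumptions, not measured numbers: ~37B active parameters per token, ~4.5 effective bits/weight for Q4_K_M, 819 GB/s theoretical bandwidth, of which 50-80% might be achievable in practice):

```python
# bs=1 decode speed estimate: each generated token must stream the
# active weights from memory once, so memory bandwidth sets the ceiling.
def decode_tokens_per_sec(active_params_b, bits_per_weight, mbw_gbs, efficiency):
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    effective_bw = mbw_gbs * 1e9 * efficiency
    return effective_bw / bytes_per_token

# ~37B active params (DeepSeek-R1 MoE), ~4.5 bits/weight, 819 GB/s
low = decode_tokens_per_sec(37, 4.5, 819, 0.5)   # pessimistic utilization
high = decode_tokens_per_sec(37, 4.5, 819, 0.8)  # optimistic utilization
print(f"{low:.0f}-{high:.0f} tok/s")  # lands in the 20-30 tok/s ballpark
```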

              • lynguist

                yesterday at 11:07 PM

                I actually think it’s not a coincidence and they specifically built this M3 Ultra for DeepSeek R1 4-bit. They also highlight in their press release that they tested it with 600B class LLMs (DeepSeek R1 without referring to it by name). And they specifically did not stop at 256 GB RAM to make this happen. Maybe I’m reading too much into it.

                  • tgma

                    today at 6:02 AM

                    Pretty sure this has absolutely nothing to do with DeepSeek, or even local LLMs at large, which have been a thing for a while and an obvious use case ever since the original Llama leak and llama.cpp came around.

                    Fact is, Mac Pros in the Intel days supported 1.5TB of RAM in some configurations[1], and six years ago that was the expectation of their high-end customer base. They needed to address the gap for those customers, so they would have shipped such a product regardless; local LLMs are the cherry on top. DeepSeek in particular almost certainly had nothing to do with it. They will still need to double the RAM supported by their SoC to get there, perhaps in a Mac Pro or a different quad-Max-glued chip.

                    [1]: https://support.apple.com/en-us/101639

                      • saagarjha

                        today at 7:15 AM

                        The thing that people are excited about here is unified memory that the GPU can address. Mac Pro had discrete GPUs with their own memory.

                          • water9

                            today at 7:38 AM

                            Intel integrated graphics technically also used unified memory, with the standard DRAM.

                              • kergonath

                                today at 9:05 AM

                                Those also have terrible performance and worse bandwidth. I am not sure they are really relevant, to be honest.

                                • McDaveNZ

                                  today at 8:18 AM

                                  Did the Xeons in the Mac Pro even have integrated graphics?

                      • kmacdough

                        today at 2:17 AM

                        That, or it's the luckiest coincidence! In all seriousness, Apple is fairly consistent about not pushing specs that don't matter, and >256GB is simply unnecessary for most other common workloads. Factors like memory bandwidth, core count, and power/heat have a higher impact.

                        That said, I doubt it was explicitly for R1, but rather based on where the industry was a few years ago, when GPT-3's 175B was SOTA but the industry was still looking larger. "As much memory as possible" is the name of the game for AI in a way that's not true for other workloads. It may not stay true for AI forever, either.

                        • nightski

                          today at 4:00 AM

                          $10k to run a 4 bit quantized model. Ouch.

                            • water9

                              today at 7:40 AM

                              The M4 MacBook Pro with 128GB can run a 32B-parameter model at 8-bit quantization just fine.

                              • vaxman

                                today at 6:40 AM

                                and low TOPS, but da novices will go deep in debt for dem. (Apple employee up/down voters, please --I have a lifetime of experience and your company might have failed w/o me and my million dollar pos at juuust the right time, m'kay?)

                                  • vaxman

                                    today at 8:56 AM

                                    M3 Ultra is 17-36 TOPS; GB10 (Project DIGITS) is more like 270 TOPS. Two Project DIGITS units will cost about $4K less than an M3 Ultra Studio. This is a fact. GB10 is "only" 128GB @ ~500GB/s tho.

                                    Downvote away, Cupertino Tea drinkers!

                                      • 1R053

                                        today at 9:39 AM

                                        are you comparing the same models? How did you calculate the TOPS for M3 Ultra?

                            • a1o

                              yesterday at 11:53 PM

                              Any ideas on power consumption? I wonder how much power that would use. It looks like it would be more efficient than anything else that currently exists.

                            • forrestthewoods

                              yesterday at 11:52 PM

                              I don’t think you understand hardware timelines if you think this product had literally anything to do with DeepSeek.

                                • reitzensteinm

                                  today at 5:23 AM

                                  Chip? Yes. Product? Not necessarily...

                                  It's not completely out of the question that the 512gb version of M3 Ultra was built for their internal Apple silicon servers powering Private Compute Cloud, but not intended for consumer release, until a compelling use case suddenly arrived.

                                  I don't _think_ this is what happened, but I wouldn't go as far as to call it impossible.

                                    • jahewson

                                      today at 9:15 AM

                                      That's absurd. Fabbing custom silicon is not something anybody does for a few thousand internal servers; the unit economics simply don't work. Plus, Apple is using OpenAI to provide its larger models anyway, so the need never even existed.

                                      • forrestthewoods

                                        today at 6:07 AM

                                        DeepSeek R1 came out Jan 20.

                                        Literally impossible.

                                          • reitzensteinm

                                            today at 6:32 AM

                                            The scenario is that the 512gb M3 Ultra was validated for the Mac Studio, and in volume production for their servers, but a business decision was made to not offer more than a 256gb SKU for Mac Studio.

                                            I don't think this happened, but it's absolutely not "literally impossible". Engineering takes time, artificial segmentation can be changed much more quickly.

                                              • forrestthewoods

                                                today at 7:54 AM

                                                From “internal only” to “delivered to customers” in 6 weeks is literally impossible.

                                    • bustling-noose

                                      today at 2:14 AM

                                      My thoughts too. This product was in the pipeline maybe 2-3 years ago. Maybe with LLMs getting popular a year ago they tried to fit more memory, but it's almost impossible to do that so close to a launch, especially when the memory is fused on rather than a module you can swap.

                                • khana

                                  today at 8:08 AM

                                  [dead]

                              • drited

                                yesterday at 8:32 PM

                                I would be curious what context window size could be expected when generating a ballpark 20-30 tokens per second with DeepSeek-R1 Q4 on this hardware.

                              • valine

                                yesterday at 4:26 PM

                                Probably helps that models like DeepSeek are mixture-of-experts. Having all weights in VRAM means you don’t have to unload/reload. Memory bandwidth usage should be limited to the 37B active parameters.

                                  • FloatArtifact

                                    yesterday at 4:32 PM

                                    > Probably helps that models like DeepSeek are mixture-of-experts. Having all weights in VRAM means you don’t have to unload/reload. Memory bandwidth usage should be limited to the 37B active parameters.

                                    "Memory bandwidth usage should be limited to the 37B active parameters."

                                    Can someone do a deep dive on the above quote? I understand that having the entire model loaded into RAM helps with response times. However, I don't quite understand the relationship between memory bandwidth and the active parameters.

                                    Context window?

                                    How much of the model can actively be processed, despite being fully loaded into memory, given the memory bandwidth?

                                      • valine

                                        yesterday at 4:41 PM

                                        With a mixture of experts model you only need to read a subset of the weights from memory to compute the output of each layer. The hidden dimensions are usually smaller as well so that reduces the size of the tensors you write to memory.
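
                                        A back-of-the-envelope illustration of the saving (the quantization figure is an assumption for illustration, not a spec):

```python
# MoE decode reads only the router-selected experts' weights per token,
# even though the full model must stay resident in memory.
total_params_b = 671   # total parameters, billions (DeepSeek-R1)
active_params_b = 37   # parameters activated per token
bits_per_weight = 4.5  # assumed ~Q4_K_M effective quantization

bytes_total = total_params_b * 1e9 * bits_per_weight / 8
bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8

print(f"resident in RAM: {bytes_total / 1e9:.0f} GB")
print(f"read per token:  {bytes_per_token / 1e9:.1f} GB")
print(f"bandwidth saving vs. a dense 671B model: {bytes_total / bytes_per_token:.1f}x")
```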

                                          • bick_nyers

                                            yesterday at 5:30 PM

                                            Just to add onto this point, you expect different experts to be activated for every token, so not having all of the weights in fast memory can still be quite slow as you need to load/unload memory every token.

                                              • valine

                                                yesterday at 6:25 PM

                                                Probably better to be moving things from fast memory to faster memory than from slow disk to fast memory.

                                            • ein0p

                                              yesterday at 5:38 PM

                                              What people who haven't actually worked with this stuff in practice don't realize is that the above statement only holds for batch size 1, sequence size 1. For processing the prompt you will need to read all the weights (which isn't a problem, because prefill is compute-bound; that, in turn, is a problem on a weak machine like this Mac or the "EPYC build" someone else mentioned). Even for inference, a batch size greater than 1 (more than one inference at a time) or a sequence size greater than 1 (speculative decoding) could require you to read the entire model, repeatedly. MoE is beneficial, but there's a lot of nuance here, which people usually miss.
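
                                              One way to see the bs=1 caveat is arithmetic intensity, i.e. FLOPs per byte of weights read (a simplified sketch with illustrative numbers; it optimistically assumes tokens in flight reuse the same weights, which MoE routing undermines in practice):

```python
# Arithmetic intensity of the weight matmuls: weights are read from
# memory once and reused across every token currently in flight.
def arithmetic_intensity(tokens_in_flight, active_params_b, bits_per_weight):
    flops = 2 * active_params_b * 1e9 * tokens_in_flight  # ~2 FLOPs/weight/token
    bytes_read = active_params_b * 1e9 * bits_per_weight / 8
    return flops / bytes_read

# bs=1, seq=1 decode: ~3.6 FLOPs/byte, heavily memory-bound
print(arithmetic_intensity(1, 37, 4.5))
# prefill of a 2048-token prompt: thousands of FLOPs/byte, compute-bound
print(arithmetic_intensity(2048, 37, 4.5))
```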

                                                • valine

                                                  yesterday at 6:20 PM

                                                  No one should be buying this for batch inference obviously.

                                                  I remember right after OpenAI announced GPT-3, I had a conversation with someone where we tried to predict how long it would be before GPT-3 could run on a home desktop. This Mac Studio has enough VRAM to run the full 175B-parameter GPT-3 at 16-bit precision, and I think that’s pretty cool.

                                                  • Der_Einzige

                                                    yesterday at 6:33 PM

                                                    No one who is using this for home use cares about anything except batch size 1 sequence size 1.

                                                      • ein0p

                                                        yesterday at 8:30 PM

                                                        What if you're doing bulk inference? The efficiency and throughput of bs=1 s=1 is truly abysmal.

                                                          • saagarjha

                                                            today at 7:18 AM

                                                            People want to talk to their computer, not service requests for a thousand users.

                                                    • rfoo

                                                      yesterday at 6:08 PM

                                                      For decode, MoE is nice for either bs=1 (decoding for a single user), or bs=<very large> (do EP to efficiently serve a large amount of users).

                                                      Anything in between suffers.

                                                      • doctorpangloss

                                                        yesterday at 5:49 PM

                                                        Sure, nuance.

                                                        This is why Apple makes so much fucking money: people will craft the wildest narratives about how they’re going to use this thing. It’s part of the aesthetics of spending $10,000. For every person who wants a solution to the problem of running a 400b+ parameter neural network, there are 19 who actually want an exciting experience of buying something, which is what Apple really makes. It has more in common with a Birkin bag than a server.

                                                          • jonfromsf

                                                            yesterday at 11:53 PM

                                                            Birkin bags appreciate in value. This is more like a Lexus. It's a well-crafted luxury good that will depreciate relatively slowly.

                                                              • hot_gril

                                                                today at 1:21 AM

                                                                Computers don't usually depreciate slowly

                                                                  • km3r

                                                                    today at 3:00 AM

                                                                    Relatively, as in a Mac or a Lexus will depreciate slower than other computers/cars.

                                                            • DevKoala

                                                              yesterday at 11:07 PM

                                                              This is true. Not sure why you are getting downvoted. I say this as someone who ordered a maxed out model. I know I will never have a need to run a model locally, I just want to know I can.

                                                                • ein0p

                                                                  today at 4:02 AM

                                                                  I run Mistral Large locally on two A6000's, in 4 bits. It's nice, but $10K in GPUs buys a lot of subscriptions. Plus some of the strongest LLMs are now free (Grok, DeepSeek) for web use.

                                                                    • DevKoala

                                                                      today at 4:41 AM

                                                                      I hear you. I make these decisions for a public company.

                                                                      When engineers tell me they want to run models on the cloud, I tell them they are free to play with it, but that isn’t a project going into the roadmap. OpenAI/Anthropic and others are much cheaper in terms of token/dollar thanks to economies of scale.

                                                                      There is still value in running your own models for privacy reasons, however, and that’s why I pay attention to efforts to reduce the cost of running models locally or in your own cloud.

                                                              • ein0p

                                                                today at 4:12 AM

                                                                Pretty much. In addition, PyTorch on the Mac is abysmally bad. As is Jax. Idk why Apple doesn't implement proper support, seems important. There's MLX which is pretty good, but you can't really port the entire ecosystem of other packages to MLX this far along in the game. Apple's best bet to credibly sell this as "AI hardware" is to make PyTorch support on the Mac excellent. Right now, as far as AI workloads are concerned, this is only suitable for Ollama.

                                            • diggan

                                              yesterday at 4:33 PM

                                              > The question is if an LLM will run with usable performance at that scale?

                                              This is the big question to have answered. Many people claim Apple can now reliably be used as an ML workstation, but from the benchmark numbers I've seen, the models may fit in memory, yet the tok/sec performance is so slow that it doesn't feel worth it compared to running on NVIDIA hardware.

                                              Although it'd be expensive as hell to get 512GB of VRAM from NVIDIA today, maybe moves like this from Apple could push prices down at least a little bit.

                                                • johnmaguire

                                                  yesterday at 4:50 PM

                                                  It is much slower than nVidia, but for a lot of personal-use LLM scenarios, it's very workable. And it doesn't need to be anywhere near as fast considering it's really the only viable (affordable) option for private, local inference, besides building a server like this, which is no faster: https://news.ycombinator.com/item?id=42897205

                                                    • bastardoperator

                                                      yesterday at 5:43 PM

                                                      It's fast enough for me to cancel monthly AI services on a mac mini m4 max.

                                                        • diggan

                                                          yesterday at 5:49 PM

                                                          Could you maybe share a lightweight benchmark with the exact model (+ quantization, if you're using it), the runtime, the settings used, and how many tokens/second you're getting? Or just a log of an entire run with the stats, if you're using something like llama.cpp, LMDesktop, or ollama?

                                                          Also, it would be neat if you could say which AI services you were subscribed to; there is a huge difference between a paid Claude subscription and the OpenAI Pro subscription, for example, both in cost and in the quality of responses.

                                                          • jamesy0ung

                                                            today at 7:44 AM

                                                            I presume you're using the Pro, not the Max.

                                                            Anyways, what ram config, and what model are you using?

                                                            • staticman2

                                                              yesterday at 6:03 PM

                                                              Smaller, dumber models are faster than bigger, slower ones.

                                                              What model do you find fast enough and smart enough?

                                                                • Matl

                                                                  yesterday at 7:42 PM

                                                                  Not OP, but I am finding the DeepSeek R1 distill of Qwen 2.5 32B to be a good speed/smartness ratio on the M4 Pro Mac Mini.

                                                                    • a1o

                                                                      yesterday at 11:55 PM

                                                                      How much RAM?

                                                              • lostmsu

                                                                yesterday at 6:45 PM

                                                                Hm, five years of the AI services cost half of the minimal M4 Max configuration, which can barely run a severely lobotomized LLaMA 70B. And they provide significantly better models.

                                                                  • Matl

                                                                    yesterday at 7:40 PM

                                                                    Sure, with something like Kagi you even get many models to choose from for a relatively low price, but not everybody likes to send over their codebase and documents to OpenAI.

                                                                    • nomel

                                                                      yesterday at 7:08 PM

                                                                      It's probably much worse than that, with the falling prices of compute.

                                                                  • fetus8

                                                                    yesterday at 5:49 PM

                                                                    How much RAM are you running on?

                                                            • hangonhn

                                                              yesterday at 5:30 PM

                                                              Do we know if it's slower because the hardware is not as well suited to the task, or is it mostly a software issue, i.e. the code hasn't been optimized to run on Apple Silicon?

                                                                • titzer

                                                                  yesterday at 5:37 PM

                                                                  AFAICT the neural engine has accelerators for CNNs and integer math, but not the exact tensor operations in popular LLM transformer architectures that are well-supported in GPUs.

                                                                    • woadwarrior01

                                                                      yesterday at 8:16 PM

                                                                      The neural engine is perfectly capable of accelerating matmuls. It's just that autoregressive decoding in single-batch LLM inference is memory-bandwidth constrained, so there are no performance benefits to using the ANE for LLM inference (although there's a huge power-efficiency benefit). And the only way to use the neural engine is via CoreML. Using the GPU with MLX or MPS is often easier.

                                                                      • kridsdale1

                                                                        yesterday at 7:33 PM

                                                                        I have to assume they’re doing something like that in the lab for 4 years from now.

                                                                    • azinman2

                                                                      yesterday at 7:31 PM

                                                                      Memory bandwidth is the issue

                                                                • cxie

                                                                  yesterday at 3:59 PM

                                                                  Guess what? I'm on a mission to completely max out all 512GB of mem...maybe by running DeepSeek on it. Pure greed!

                                                                    • swivelmaster

                                                                      yesterday at 4:55 PM

                                                                      You could always just open a few Chrome tabs


                                                                        • ksec

                                                                          today at 12:00 AM

                                                                          It may not match Firefox for hundreds or thousands of tabs, but Chrome has gotten a lot more memory-efficient since around 2022.

                                                                          • DidYaWipe

                                                                            yesterday at 10:56 PM

                                                                            [flagged]

                                                                              • umanwizard

                                                                                today at 2:29 AM

                                                                                I downvote all Reddit-style memes, jokes, reference humor, catchphrases, and so on. It’s low-effort content that doesn’t fit the vibe of HN and actively makes the site worse for its intended purpose.

                                                                                • ksec

                                                                                  yesterday at 11:58 PM

                                                                                  > Edit: WTF, someone downvoted "Enjoy the upvotes?" Pathetic.

                                                                                  You should read the HN posting guidelines if you want to understand why. Although I'd guess in this case it was mostly someone's fat-thumbed downvote.

                                                                          • petepete

                                                                            today at 8:30 AM

                                                                            Give Cities Skylines 2 a try.

                                                                            • gustomksimus25

                                                                              yesterday at 4:22 PM

                                                                              [dead]

                                                                          • bob1029

                                                                            yesterday at 7:06 PM

                                                                            > The question is if a llm will run with usable performance at that scale?

                                                                            For the self-attention mechanism, memory bandwidth requirements scale ~quadratically with the sequence length.
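
                                                                             A toy model of that scaling (the layer/head counts are illustrative, loosely DeepSeek-sized, not exact figures):

```python
# Naive self-attention forms an n x n score matrix per head per layer,
# so the FLOPs (and the memory traffic behind them) grow ~quadratically in n.
def attention_flops(n, layers=61, heads=128, head_dim=128):
    # QK^T plus scores*V: two (n x d)(d x n)-shaped matmuls per head
    return layers * heads * (2 * 2 * n * n * head_dim)

for n in (1024, 2048, 4096):
    print(n, f"{attention_flops(n):.3e}")
# doubling the sequence length roughly quadruples the attention cost
```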

                                                                              • kridsdale1

                                                                                yesterday at 7:36 PM

                                                                                Someone has got to be working on a better method than that. Hundreds of billions are at stake.

                                                                            • deepGem

                                                                              yesterday at 8:02 PM

                                                                              Any idea what the SRAM-to-uRAM ratio is on these new GPUs? If they have meaningfully more SRAM than the Hopper GPUs, it could lead to meaningful speedups in large-model training.

                                                                              If they didn't increase the memory bandwidth, then 512GB will enable longer context lengths, and that's about it, right? No speedups.

                                                                              For any speedup, you may need some new variant of FlashAttention-3, or something along similar lines, purpose-built for Apple GPUs.

                                                                                • astrange

                                                                                  today at 12:53 AM

                                                                                  I don't know what you mean by s and u, but there is only one kind of memory in the machine, that's what unified memory means.

                                                                                    • saagarjha

                                                                                      today at 7:21 AM

                                                                                      I assume they mean SRAM versus unified (D)RAM?

                                                                              • TheRealPomax

                                                                                yesterday at 5:42 PM

                                                                                Yeah, they did: the M4 has a max memory bandwidth of 546GB/s, and the M3 Ultra bumps that up to a max of 819GB/s.

                                                                                (And the 512GB version is $4,000 more rather than $10,000; that's still worth mocking, but it's nowhere near as much.)

                                                                                  • okanesen

                                                                                    yesterday at 5:50 PM

                                                                                    Not that dramatic of an increase actually - the M2 Max already had 400GB/s and M2 Ultra 800GB/s memory bandwidth, so the M3 Ultra's 819GB/s is just a modest bump. Though the M4's additional 146GB/s is indeed a more noticeable improvement.

                                                                                      • choilive

                                                                                        yesterday at 6:03 PM

                                                                                        Also should note that 800/819GB/s of memory bandwidth is actually VERY usable for LLMs. Consider that a 4090 is just a hair above 1000GB/s
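A rough sanity check on that (my own napkin math, not from the thread): bandwidth-bound token generation has to stream every active weight from memory once per token, so an upper bound on decode speed is just bandwidth divided by active-weight bytes. The parameter count and quantization figures below are illustrative assumptions.

```python
# Napkin math: bandwidth-bound decode ceiling (illustrative assumptions).
def decode_ceiling_tok_s(bandwidth_gb_s: float,
                         active_params_billions: float,
                         bytes_per_param: float) -> float:
    """Upper bound on tokens/sec when memory bandwidth is the bottleneck:
    each generated token streams every active weight from memory once."""
    bytes_per_token = active_params_billions * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# DeepSeek-R1-style MoE: ~37B active params, ~0.56 bytes/param at Q4_K_M
print(decode_ceiling_tok_s(819, 37, 0.56))   # M3 Ultra: ~40 tok/s ceiling
print(decode_ceiling_tok_s(1008, 37, 0.56))  # 4090-class bandwidth: ~49 tok/s
```

Real-world numbers land well below the ceiling (hence the 20-30 tok/s estimate upthread), since no machine sustains 100% of its rated bandwidth.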

                                                                                          • hereonout2

                                                                                            yesterday at 6:59 PM

                                                                                            Does it work like that though at this larger scale? 512GB of VRAM would be across multiple NVIDIA cards, so the bandwidth and access is parallelized.

                                                                                            But here it looks more of a bottleneck from my (admittedly naive) understanding.

                                                                                              • choilive

                                                                                                yesterday at 7:12 PM

                                                                                                For inference the bandwidth is generally not parallelized because the weights need to go through the model layer by layer. The most common model splitting method is done by assigning each GPU a subset of the LLM layers and it doesn't take much bandwidth to send model weights via PCIE to the next GPU.

                                                                                                  • manmal

                                                                                                    yesterday at 9:19 PM

                                                                                                    My understanding is that the GPU must still load its assigned layer from VRAM into registers and L2 cache for every token, because those aren’t large enough to hold a significant portion. So naively, for a 24GB layer, you’d need to move up to 24GB for every token.
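To put hypothetical numbers on that (all sizes below are illustrative round numbers, not from any model card): per token, each GPU re-reads its resident weights from local VRAM, while the hop to the next GPU is only one hidden-state vector — which is why the local reads, not the PCIe link, are the bottleneck.

```python
# Sketch: per-token traffic in layer-split (pipeline) inference.
# All sizes are hypothetical round numbers, not from any model card.
def per_token_traffic(resident_weights_gb: float,
                      hidden_dim: int,
                      bytes_per_activation: int = 2):
    """Returns (bytes read from local VRAM, bytes shipped to the next GPU)."""
    local_read = resident_weights_gb * 1e9          # weights streamed from VRAM
    handoff = hidden_dim * bytes_per_activation     # one fp16 hidden-state vector
    return local_read, handoff

local, handoff = per_token_traffic(24, 8192)
print(local / handoff)   # local VRAM reads exceed inter-GPU traffic by ~10^6x
```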


                                                                              • RataNova

                                                                                today at 9:06 AM

                                                                                It's a game changer for sure.... 512GB of unified memory really pushes the envelope, especially for running complex AI models locally. That said, the real test will be in how well the dual-chip design handles heat and power efficiency

                                                                                • sudoshred

                                                                                  yesterday at 11:00 PM

                                                                                  Agree. Finally I can have several hundred browser tabs open simultaneously with no performance degradation.

                                                                                    • protocolture

                                                                                      yesterday at 11:03 PM

                                                                                      Well at least 20

                                                                                        • Dban1

                                                                                          today at 8:06 AM

                                                                                          New update just came in, make that 15

                                                                                  • resters

                                                                                    yesterday at 11:08 PM

                                                                                    The same thing could be designed with greater memory bandwidth, and so it's just a matter of time (for NVIDIA) until Apple decides to compete.

                                                                                    • samstave

                                                                                      yesterday at 7:53 PM

                                                                                      "unified memory"

                                                                                      funny that people think this is so new, when CRAY had Global Heap eons ago...

                                                                                        • webworker

                                                                                          yesterday at 9:58 PM

                                                                                          The real hardware needed for artificial intelligence wasn't NVIDIA, it was a CRAY XMP from 1982 all along

                                                                                            • samstave

                                                                                              yesterday at 11:28 PM

                                                                                              When I was with Mirantis, I flew to Austin, TX to meet a client in a nondescript multi-tenant office building...

                                                                                              We walked in and, getting our bearings, came upon a CRAY office. WTF?!

                                                                                              I tried the doors, locked - and it was clearly empty... but damn did I want to steal their office door signage.

                                                                                          • hot_gril

                                                                                            yesterday at 10:05 PM

                                                                                            It's new for mainstream PCs to have it.

                                                                                              • pjmlp

                                                                                                today at 7:29 AM

                                                                                                Nope, it was common in 8 and 16 bit home computers, and in respect to PCs themselves graphics memory was mapped into the main memory until the arrival of 3D dedicated cards.

                                                                                                And even with 3D, integrated GPUs have existed for years.

                                                                                                • TylerE

                                                                                                  today at 7:29 AM

                                                                                                  New for performance machines maybe. I remember "integrated graphics" when that meant some shitty co-processor and 16 or 32MB of semi-reserved system RAM.

                                                                                              • ddtaylor

                                                                                                yesterday at 8:02 PM

                                                                                                Why did it take so long for us to get here?

                                                                                                  • wmf

                                                                                                    yesterday at 11:42 PM

                                                                                                    Laptops have had unified memory for ten years or more. For desktops very few apps benefit from unified memory.

                                                                                                    • RachelF

                                                                                                      yesterday at 9:41 PM

                                                                                                      Some possible reasons:

                                                                                                      1. Until recently, RAM amount was something the end user liked to configure, so little market demand.

                                                                                                      2. Technically, building such a large system on a chip or collection of chiplets was not possible.

                                                                                                      3. RAM speed wasn't a bottleneck for most tasks; it was IO or CPU. LLMs changed this.

                                                                                                        • hot_gril

                                                                                                          yesterday at 10:08 PM

                                                                                                          M1 came out before the LLM rush, though

                                                                                                            • wtallis

                                                                                                              today at 2:45 AM

                                                                                                              The M1 is in a product segment where discrete GPUs have been gone for decades, in favor of integrated graphics that shares one pool of RAM with the CPU. The better question to ask is why Apple kept using that unified memory design even when moving up to larger chips like the M1 Max and M1 Ultra.

                                                                                                                • MBCook

                                                                                                                  today at 5:01 AM

                                                                                                                  The GPU is built into the same physical die as the CPU.

                                                                                                                  So if you wanted to give it a second ram pool you would have to add an entire second memory interface just for the on-die GPU.

                                                                                                                  Now all you’ve done is make it more complicated, slower because now you have to move things between the two pools, and gained what exactly?

                                                                                                                  I think it was a very clear and obvious decision to make. It’s an outgrowth of how the base chips were designed, and it turned out to be extremely handy for some things. Plus, since all their modern devices now work this way, that probably simplifies the software.

                                                                                                                  I’m not saying it’s genius foresight, but it certainly worked out rather well. There’s nothing stopping them from supporting discrete GPUs too if they wanted to. They just clearly don’t.

                                                                                                                  • RachelF

                                                                                                                    today at 6:10 AM

                                                                                                                    I'd guess that they inherited it from the iPhone chips. It was nice and fast and also makes Apple a lot of profit as no third party RAM is possible.

                                                                                                                    • hot_gril

                                                                                                                      today at 7:22 AM

                                                                                                                      They put the M1 into the desktops too

                                                                                                                  • philistine

                                                                                                                    yesterday at 11:14 PM

                                                                                                                    Apple has always liked to integrate as much as possible on the same chip. It was only natural that they would come to this conclusion, with the improved perf the cherry on top.

                                                                                                                      • hot_gril

                                                                                                                        yesterday at 11:32 PM

                                                                                                                        Well also these chips originated in phones, where they kinda had to integrate it. And the quicker RAM and disk access are pretty nice.

                                                                                                            • baby_souffle

                                                                                                              yesterday at 8:12 PM

                                                                                                              Just a guess, but fabricating this can't be easy. Yield is probably higher if you have less memory per chip.

                                                                                                                • astrange

                                                                                                                  today at 12:55 AM

                                                                                                                  It's regular memory on separate chips.

                                                                                                      • PeterStuer

                                                                                                        yesterday at 6:08 PM

                                                                                                        Is this on-chip memory? From the 800GB/s I would guess more likely a 512-bit bus (8-channel) to DDR5 modules. Doing it on a quad channel would just about be possible, but really be pushing the envelope. Still a nice thing.

                                                                                                        As for practicality, which mainstream applications would benefit from this much memory paired with nice but relatively mid compute? At this price-point (14K for a fully specced system), would you prefer it over e.g. a couple of NVIDIA Project DIGITS (assuming that arrives on time and for around the announced 3K price-point)?

                                                                                                          • MBCook

                                                                                                            today at 5:03 AM

                                                                                                            Unless something has changed, it's on package, but not on the same die.

                                                                                                            • zitterbewegung

                                                                                                              yesterday at 6:40 PM

                                                                                                              NVIDIA project DIGITS has 128 GB LPDDR5x coherent unified system memory at a 273 Gb/s memory bus speed.

                                                                                                                • bangaladore

                                                                                                                  yesterday at 6:54 PM

                                                                                                                  It would be 273 GB/s (gigabytes, not gigabits). But in reality we don't know the bandwidth. Some ex employee said 500 GB/s.

                                                                                                                  Your source is a Reddit post in which they try to match the size to existing chips, without realizing that it's very likely that NVIDIA is using custom memory here produced by Micron, like Apple uses custom memory chips.

                                                                                                          • ProAm

                                                                                                            today at 5:37 AM

                                                                                                            This is just Apple disrespecting their customer base.

                                                                                                            • tempest_

                                                                                                              yesterday at 5:02 PM

                                                                                                              Nvidia has had the Grace Hoppers for a while now. Is this not like that?

                                                                                                                • ykl

                                                                                                                  yesterday at 5:53 PM

                                                                                                                  This is cheap compared to GB200, which has a street price of >$70k for just the chip alone if you can even get one. Also GB200 technically has only 192GB per GPU and access to more than that happens over NVLink/RDMA, whereas here it’s just one big flat pool of unified memory without any tiered access topology.

                                                                                                                    • rbanffy

                                                                                                                      yesterday at 6:11 PM

                                                                                                                      We finally encountered the situation where an Apple computer is cheaper than its competition ;-)

                                                                                                                      All joking aside, I don't think Apples are that expensive compared to similar high-end gear. I don't think there is any other compact desktop computer with half a terabyte of RAM accessible to the GPU.

                                                                                                                        • nightski

                                                                                                                          today at 5:51 AM

                                                                                                                          I mean expensive relative to who, Nvidia? Both are enjoying little to no competition in their respective niche and are using that monopoly power to extract massive margins. I have no doubt it could be much cheaper if there was actual competition in the market.

                                                                                                                          Fortunately it seems like AMD is finally catching on and working towards producing a viable competitor to the M series chips.

                                                                                                                          • kridsdale1

                                                                                                                            yesterday at 7:43 PM

                                                                                                                            And yet all that cash still just goes to TSMC


                                                                                                                  • dheera

                                                                                                                    yesterday at 4:23 PM

                                                                                                                    It will cost 4X what it costs to get 512GB on an x86 server motherboard.

                                                                                                                      • valine

                                                                                                                        yesterday at 4:27 PM

                                                                                                                        What would it cost to get 512GB of VRAM on an Nvidia card? That’s the real comparison.

                                                                                                                          • zitterbewegung

                                                                                                                            yesterday at 4:49 PM

                                                                                                                            Since the GH200 has over a terabyte of VRAM at $343,000, and the H100 has 80GB, that makes it $195,993 for a bit over 512GB of VRAM. You could beat the price of the Apple M3 Ultra with an AMD EPYC build.

                                                                                                                              • treesciencebot

                                                                                                                                yesterday at 11:38 PM

                                                                                                                                GH200 is nowhere near the $343,000 number. You can get a single-server order for around 45k (with inception discount). If you are buying bulk, it goes down to sub-30k ish. This comes with an H100's performance and an insane amount of high-bandwidth memory.

                                                                                                                                  • wmf

                                                                                                                                    yesterday at 11:45 PM

                                                                                                                                    They probably meant 8xH200 for $343,000 which is in the ballpark.

                                                                                                                                      • zitterbewegung

                                                                                                                                        today at 12:43 AM

                                                                                                                                        Yes, this is what I meant, since 8 would cover 512GB of RAM.

                                                                                                                            • dheera

                                                                                                                              yesterday at 4:35 PM

                                                                                                                              Apples to oranges. NVIDIA cards have an order of magnitude more horsepower for compute than this thing. A B100 has 8 TB/s of memory bandwidth, 10 times more than this. If NVIDIA made a card with 512GB of HBM I'd expect it to cost $150K.

                                                                                                                              The compute and memory bandwidth of the M3 Ultra is more in-line with what you'd get from a Xeon or Epyc/Threadripper CPU on a server motherboard; it's just that the x86 "way" of doing things is usually to attach a GPU for way more horsepower rather than squeezing it out of the CPU.

                                                                                                                              This will be good for local LLM inference, but not so much for training.

                                                                                                                                • pklausler

                                                                                                                                  yesterday at 5:18 PM

                                                                                                                                  This prompts an "old guy anecdote"; forgive me.

                                                                                                                                  When I was much younger, I got to work on compilers at Cray Computer Corp., which was trying to bring the Cray-3 to market. (This was basically a 16-CPU Cray-2 implemented with GaAs parts; it never worked reliably.)

                                                                                                                                  Back then, HPC performance was measured in mere megaflops. And although the Cray-2 had peak performance of nearly 500MF/s/CPU, it was really hard to attain, since its memory bandwidth was just 250M words/s/CPU (2GB/s/CPU); so you had to have lots of operand re-use to not be memory-bound. The Cray-3 would have had more bandwidth, but it was split between loads and stores, so it was still quite a ways away from the competing Cray X-MP/Y-MP/C-90 architecture, which could load two words per clock, store one, and complete an add and a multiply.

                                                                                                                                  So I asked why the Cray-3 didn't have more read bandwidth to/from memory, and got a lesson from the answer that has stuck. You could actually see how much physical hardware in that machine was devoted to the CPU/memory interconnect, since the case was transparent -- there was a thick nest of tiny blue & white twisted wire pairs between the modules, and the stacks of chips on each CPU devoted to the memory system were a large proportion of the total. So the memory and the interconnect constituted a surprising (to me) majority of the machine. Having more floating-point performance in the CPUs than the memory could sustain meant that the memory system was oversubscribed, and that meant that more of the machine was kept fully utilized. (Or would have been, had it worked...)

                                                                                                                                  In short, don't measure HPC systems with just flops. Measure the effective bandwidth over large data, and make sure that the flops are high enough to keep it utilized.
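The same lesson can be phrased as a roofline-style "machine balance" number: the FLOPs a chip can issue per byte of memory traffic. Any kernel with lower arithmetic intensity than that is bandwidth-bound. A sketch using spec-sheet figures mentioned elsewhere in this thread (the M3 Ultra TFLOPs figure is napkin math from upthread, not an official spec):

```python
# Roofline machine balance: peak flops per byte of memory bandwidth.
def machine_balance(peak_tflops: float, bandwidth_tb_s: float) -> float:
    """Kernels with arithmetic intensity below this value are memory-bound."""
    return peak_tflops / bandwidth_tb_s

print(machine_balance(67, 3.35))    # H100 non-tensor fp32: ~20 flops/byte
print(machine_balance(43, 0.819))   # M3 Ultra fp16 (napkin numbers): ~52 flops/byte

# bs=1 LLM decode does ~2 flops per parameter while reading ~2 bytes (fp16):
# ~1 flop/byte of arithmetic intensity, far below either balance point,
# which is why token generation is bandwidth-bound on both machines.
```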

                                                                                                                                    • musicale

                                                                                                                                      today at 5:49 AM

                                                                                                                                      > so you had to have lots of operand re-use to not be memory-bound

                                                                                                                                      Looking at Nvidia's spec sheet, an H100 SXM can do 989 tf32 teraflops (or 67 non-tensor core fp32 teraflops?) and 3.35 TB/s memory (HBM) bandwidth, so ... similar problem?

                                                                                                                                        • pklausler

                                                                                                                                          today at 6:29 AM

                                                                                                                                          There is caching today.

                                                                                                                                      • worthless-trash

                                                                                                                                        today at 2:09 AM

                                                                                                                                        That is a great story. Please never hesitate to drop these in.

                                                                                                                                        Do you have a blog?

                                                                                                                                    • LeifCarrotson

                                                                                                                                      yesterday at 5:15 PM

                                                                                                                                      Yep, it's apples to oranges. But sometimes you want apples, and sometimes you want oranges, so it's all good!

                                                                                                                                      There's a wide spectrum of potential requirements between memory capacity, memory bandwidth, compute speed, compute complexity, and compute parallelism. In the past, a few GB was adequate for tasks that we assigned to the GPU, you had enough storage bandwidth to load the relevant scene into memory and generate framebuffers, but now we're running different workloads. Conversely, a big database server might want its entire contents to be resident in many sticks of ECC DIMMs for the CPU, but only needed a couple dozen x86-64 threads. And if your workload has many terabytes or petabytes of content to work with, there are network file systems with entirely different bandwidth targets for entire racks of individual machines to access that data at far slower rates.

                                                                                                                                      There's a lot of latency between the needs of programmers and the development and shipping of hardware to satisfy those needs, I'm just happy we have a new option on that spectrum somewhere in the middle of traditional CPUs and traditional GPUs.

                                                                                                                                      As you say, if Nvidia made a 512 GB card it would cost $150k, but this costs an order of magnitude less than that. Even high-end consumer cards like a 5090 have 16x less memory than this does (average enthusiasts on desktops have maybe 8 GB) and just over double the bandwidth (1.7 TB/s).

                                                                                                                                      Also, nit pick FTA:

                                                                                                                                      > Starting at 96GB, it can be configured up to 512GB, or over half a terabyte.

                                                                                                                                      512 GB is exactly half of a terabyte, which is 1024 GB. It's too late for hard drives - the marketing departments have redefined storage to use multipliers of 1000 and invented "tebibytes" - but in memory we still work with powers of two. Please.
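
That arithmetic is easy to check; a quick sketch (the SI conversion at the end is the marketing convention the parent objects to):

```python
GIB, TIB = 2**30, 2**40          # binary units: the convention RAM uses

mem = 512 * GIB                  # 512 GiB of unified memory
print(mem / TIB)                 # 0.5 -> exactly half a (binary) terabyte
print(mem / 1e12)                # ~0.55 -> "over half" only if a terabyte is 10^12 bytes
```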

                                                                                                                                        • dheera

                                                                                                                                          yesterday at 5:18 PM

                                                                                                                                          Sure, if you want to do training get an NVIDIA card. My point is that it's not worth comparing either Mac or CPU x86 setup to anything with NVIDIA in it.

                                                                                                                                          For inference setups, my point is that instead of paying $10000-$15000 for this Mac you could build an x86 system for <$5K (Epyc processor, 512GB-768GB RAM in 8-12 channels, server mobo) that does the same thing.

                                                                                                                                          The "+$4000" for 512GB on the Apple configurator would be "+$1000" outside the Apple world.

                                                                                                                                            • KingOfCoders

                                                                                                                                              yesterday at 7:03 PM

                                                                                                                                              But this is how it wonderfully works. +$4000 does two things: 1. Makes Apple very, very rich. 2. Makes people think this is better than a $10k EPYC. Win-win for Apple. Once you have convinced people that you are the best, a higher price just makes them think you are even better.

                                                                                                                                              • MBCook

                                                                                                                                                today at 5:14 AM

                                                                                                                                                > The "+$4000" for 512GB on the Apple configurator would be "+$1000" outside the Apple world.

                                                                                                                                                That requires an otherwise equivalent PC to exist. I haven’t seen anyone name a PC with a half-TB of unified memory in this thread.

                                                                                                                                                Yeah it’s $4k. Yeah that’s nuts. But it’s the only game in town like that. If the replacement is a $40k setup from Nvidia or whatever that’s a bargain.

                                                                                                                                                • kombine

                                                                                                                                                  yesterday at 11:21 PM

                                                                                                                                                  An X86 server comparable in performance to M3 Ultra will likely be a few times more energy hungry, no?

                                                                                                                                              • egorfine

                                                                                                                                                yesterday at 5:39 PM

                                                                                                                                                > we still work with powers of two. Please.

                                                                                                                                                We do. Common people don't. It's easier to write "over half a terabyte" than explain (again) to millions of people what the power of two is.

                                                                                                                                                  • johnklos

                                                                                                                                                    yesterday at 7:31 PM

                                                                                                                                                    Anyone who calls 512 gigs "over half a terabyte" is bullshitting. No, thank you.

                                                                                                                                                      • egorfine

                                                                                                                                                        yesterday at 8:02 PM

                                                                                                                                                        Wasn't me.


                                                                                                                                          • bick_nyers

                                                                                                                                            yesterday at 5:32 PM

                                                                                                                                            About $12k when Project Digits comes out.

                                                                                                                                              • MBCook

                                                                                                                                                today at 5:08 AM

                                                                                                                                                Apple is shipping today. No future promises.

                                                                                                                                                • valine

                                                                                                                                                  yesterday at 6:26 PM

                                                                                                                                                  That will only have 128GB of unified memory

                                                                                                                                                    • dragonwriter

                                                                                                                                                      yesterday at 6:33 PM

                                                                                                                                                      128GB for $3K; per the announcement, their ConnectX networking allows two Project Digits devices to be plugged into each other and work together as one device, giving you 256GB for $6k. AFAIK, existing frameworks can split models across devices as well; hence, presumably, the upthread suggestion that Project Digits would provide 512GB for $12k, though arguably that last step is cheating.

                                                                                                                                                        • justincormack

                                                                                                                                                          yesterday at 7:58 PM

                                                                                                                                                          The reason Nvidia only talks about two machines over the network is, I think, that they only have one network port, so beyond that you'd need to add the cost of a switch.

                                                                                                                                                            • bick_nyers

                                                                                                                                                              yesterday at 8:13 PM

                                                                                                                                                              If you want to split tensorwise yes. Layerwise splits could go over Ethernet.

                                                                                                                                                              I would be interested to see how feasible hybrid approaches would be, e.g. connect each pair up directly via ConnectX and then connect the sets together via Ethernet.

                                                                                                                                            • smith7018

                                                                                                                                              yesterday at 4:26 PM

                                                                                                                                              You can build an x86 machine that can fully run DeepSeek R1 with 512GB VRAM for ~$2,500?

                                                                                                                                                • ta988

                                                                                                                                                  yesterday at 4:37 PM

                                                                                                                                                  You will have to explain to me how.

                                                                                                                                                • hbbio

                                                                                                                                                  yesterday at 5:07 PM

                                                                                                                                                  How would you compare the tok/sec between this setup and the M3 Max?

                                                                                                                                                    • aurareturn

                                                                                                                                                      yesterday at 6:06 PM

                                                                                                                                                      3.5 - 4.5 tokens/s on the $2,000 AMD Epyc setup. Deepseek 671b q4.

                                                                                                                                                      The AMD Epyc build is severely bandwidth and compute constrained.

                                                                                                                                                      ~40 tokens/s on M3 Ultra 512GB by my calculation.
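
Those figures follow from a simple bandwidth-bound model: every generated token streams the active weights from memory once, so decode speed tops out at bandwidth divided by bytes per token. A sketch, where the bits-per-weight and efficiency values are assumptions (and a compute-constrained Epyc will land well below its bandwidth ceiling):

```python
def tokens_per_sec(bandwidth_gbs, active_params_b, bits_per_weight=4.5, efficiency=0.6):
    """Rough bandwidth-bound decode speed for an MoE model."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return efficiency * bandwidth_gbs * 1e9 / bytes_per_token

# DeepSeek-R1: 671B total parameters, but only ~37B active per token
print(tokens_per_sec(819, 37))   # M3 Ultra, ~819 GB/s: ~24 tok/s at these assumptions
print(tokens_per_sec(460, 37))   # 12-channel DDR5 Epyc: ~13 tok/s bandwidth ceiling
```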

                                                                                                                                                        • wolfgangK

                                                                                                                                                          today at 1:19 AM

                                                                                                                                                          IMO, it would be more interesting to have a 3-way comparison of price/performance between DeepSeek 671b running on :

                                                                                                                                                          1. M3 Ultra 512GB
                                                                                                                                                          2. AMD Epyc (which gen? AVX512 and DDR5 might make a difference in both performance and cost; Gen 4 or Gen 5 get 8 or 9 t/s: https://github.com/ggml-org/llama.cpp/discussions/11733)
                                                                                                                                                          3. AMD Epyc + 4090 or 5090 running KTransformers (over 10 t/s decode? https://github.com/kvcache-ai/ktransformers/blob/main/doc/en...)

                                                                                                                                                          • hbbio

                                                                                                                                                            yesterday at 8:08 PM

                                                                                                                                                            Thanks!

                                                                                                                                                            If the M3 can run 24/7 without overheating it's a great deal to run agents. Especially considering that it should run only using 350W... so roughly $50/mo in electricity costs.
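
That monthly estimate checks out; a quick sketch (the $0.20/kWh rate is an assumption):

```python
watts = 350                       # sustained draw, per the comment above
hours = 24 * 30                   # one month, running 24/7
kwh = watts / 1000 * hours        # 252 kWh per month
rate = 0.20                       # assumed electricity price in $/kWh
print(f"${kwh * rate:.0f}/mo")    # roughly $50/mo
```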

                                                                                                                                                              • aenis

                                                                                                                                                                today at 4:40 AM

                                                                                                                                                                Out of curiosity, if you dont mind: what kind of an agent would you run 24/7 locally?

                                                                                                                                                                I'd assume this thing peaks at 350W (or whatever) but idles at around 40w tops?

                                                                                                                                                                  • MBCook

                                                                                                                                                                    today at 5:17 AM

                                                                                                                                                                    I’m guessing they might be thinking of long training jobs, as opposed to model use in an end product of some sort.

                                                                                                                                                            • sgt

                                                                                                                                                              yesterday at 6:37 PM

                                                                                                                                                              What kind of Nvidia-based rig would one need to achieve 40 tokens/sec on Deepseek 671b? And how much would it cost?

                                                                                                                                                                • aurareturn

                                                                                                                                                                  yesterday at 6:56 PM

                                                                                                                                                                  Around 5x Nvidia A100 80GB can fit 671b Q4. $50k just for the GPUs and likely much more when including cooling, power, motherboard, CPU, system RAM, etc.

                                                                                                                                                                    • sgt

                                                                                                                                                                      yesterday at 8:17 PM

                                                                                                                                                                      So the M3 Ultra is amazing value then. And from what I could tell, an equivalent AMD Epyc would still be so constrained that we're talking 4-5 tokens/s. Is this a fair assumption?

                                                                                                                                                                        • adgjlsfhk1

                                                                                                                                                                          today at 2:01 AM

                                                                                                                                                                          No. The advantage of Epyc is you get 12 channels of RAM, so it should be ~6x faster than a consumer CPU.
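
The ~6x comes straight from channel count, since per-channel bandwidth is the same. A sketch assuming DDR5-4800 and a 64-bit (8-byte) channel:

```python
def ddr5_bandwidth_gbs(channels, mt_s=4800, bytes_per_transfer=8):
    # bandwidth = channels * million transfers/s * bytes per transfer
    return channels * mt_s * bytes_per_transfer / 1000

print(ddr5_bandwidth_gbs(12))    # 12-channel server Epyc: 460.8 GB/s
print(ddr5_bandwidth_gbs(2))     # consumer dual-channel: 76.8 GB/s, 6x less
```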

                                                                                                                                                                            • sgt

                                                                                                                                                                              today at 6:27 AM

                                                                                                                                                                              I realize that but apparently people are still getting very low tokens/sec on Epyc. Why is that? I don't get it, as on paper it should be fast.

                                                                                                                                                                          • Aeolun

                                                                                                                                                                            yesterday at 10:18 PM

                                                                                                                                                                            The Epyc would only set you back $2000 though, so it’s only a slightly worse price/return.

                                                                                                                                                                        • SkiFire13

                                                                                                                                                                          yesterday at 9:01 PM

                                                                                                                                                                          How many tokens/s would that be though?

                                                                                                                                                                            • sgt

                                                                                                                                                                              today at 6:46 AM

                                                                                                                                                              That's what I'm trying to get to. Looking to set up a rig, and AMD Epyc seems reasonable, but I'd rather go Mac if it gives many more tokens per second. It does sound like the Mac with M3 Ultra will easily give 40 tokens/s, whereas the Epyc is just too internally constrained, giving 4-5 tokens/s. But I'd like someone to confirm that instead of buying the HW and finding out myself. :)

                                                                                                                                                          • AnthonBerg

                                                                                                                                                            yesterday at 5:35 PM

                                                                                                                                                            That’s not going to yield the same bandwidth or memory latency though, right?

                                                                                                                                                              • rbanffy

                                                                                                                                                                yesterday at 6:15 PM

                                                                                                                                                                You'd need a chip with 8 memory channels. 16 DIMM slots, IIRC.

                                                                                                                                                            • matt-p

                                                                                                                                                              yesterday at 5:18 PM

                                                                                                                                                              Not really like for like.

                                                                                                                                              The pricing isn't as insane as you'd think: 96 to 256GB is $1,500, which isn't 'cheap', but it could be worse.

                                                                                                                                              All in, $5,500 gets you an Ultra with 256GB memory, 28 CPU cores, 60 GPU cores, and 10Gb networking. I think you'd be hard pushed to build a server for less.

                                                                                                                                                                • kllrnohj

                                                                                                                                                                  yesterday at 6:05 PM

                                                                                                                                                                  5,500 easily gets me either vastly more CPU cores if I care more about that or a vastly faster GPU if I care more about that. Or for both a 9950x + 5090 (assuming you can actually find one in stock) is ~$3000 for the pair + motherboard, leaving a solid $2500 for whatever amount of RAM, storage, and networking you desire.

                                                                                                                                                                  The M3 strikes a very particular middle ground for AI of lots of RAM but a significantly slower GPU which nothing else matches, but that also isn't inherently the right balance either. And for any other workloads, it's quite expensive.

                                                                                                                                                                    • seanmcdirmid

                                                                                                                                                                      yesterday at 6:09 PM

                                                                                                                                                                      You'll need a couple of 32GB 5090s to run a quantized 70B model, maybe 4 to run a 70b model without quantization, forget about anything larger than that. A huge model might run slow on a M3 Ultra, but at least you can run it all.

                                                                                                                                                      I have an M3 Max (the non-binned one), and I feel like 64GB or 96GB is within the realm of enabling LLMs that run reasonably fast on it (it is also a laptop, so I can do things on planes or trips). I thought about the Ultra: with 128GB on a top-line M3 Ultra, the models that you could fit into memory would run fairly fast. With 512GB, you could run the bigger models, but not very quickly, so maybe not much point (at least for my use cases).
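
The card counts above follow from weights-only VRAM math (a sketch; KV cache and activations add on top, so real requirements are somewhat higher):

```python
def weight_vram_gb(params_b, bits_per_weight):
    # billions of params * bits each / 8 bits per byte = GB of weights
    return params_b * bits_per_weight / 8

print(weight_vram_gb(70, 4))     # 4-bit 70B: 35 GB -> fits on two 32 GB cards
print(weight_vram_gb(70, 16))    # fp16 70B: 140 GB -> needs around five 32 GB cards
```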

                                                                                                                                                                      • matt-p

                                                                                                                                                                        yesterday at 6:47 PM

                                                                                                                                                        That config would also use about 10x the power, and you still wouldn't be able to run a model over 32GB, whereas the Studio can easily cope with a 70B Llama and has plenty of space to grow.

                                                                                                                                                        I think it actually is perfect for local inference in a way that that build, or any other PC build in this price range, wouldn't be.

                                                                                                                                                                          • kllrnohj

                                                                                                                                                                            yesterday at 6:50 PM

                                                                                                                                                                            The M3 Ultra studio also wouldn't be able to run path traced Cyberpunk at all no matter how much RAM it has. Workloads other than local inference LLMs exist, you know :) After all, if the only thing this was built to do was run LLMs then they wouldn't have bothered adding so many CPU cores or video engines. CPU cores (along with networking) being 2 of the specs highlighted by the person I was responding to, so they were obviously valuing more than just LLM use cases.

                                                                                                                                                                              • dagmx

                                                                                                                                                                                today at 1:14 AM

                                                                                                                                                                Bad game example, because Cyberpunk with ray tracing is coming to macOS and will run on this.

                                                                                                                                                                                • kridsdale1

                                                                                                                                                                                  yesterday at 7:42 PM

                                                                                                                                                                                  The core customer market for this thing remains Video Editors. That’s why they talk about simultaneous 8K encoding streams.

                                                                                                                                                                                  Apple’s Pro segment has been video editors since the 90s.

                                                                                                                                                                              • hot_gril

                                                                                                                                                                                today at 1:18 AM

                                                                                                                                                                                Well that's what (s)he meant, the Mac Studio fits the AI use case but not other ones so much.

                                                                                                                                                                            • jltsiren

                                                                                                                                                                              today at 2:52 AM

                                                                                                                                                                              Consumer hardware is cheap, if 192 GB of RAM is enough for you. But if you want to go beyond that, the Mac Studio is very competitively priced. A minimal Threadripper workstation with 256 GB is ~$7400 from Puget Systems. If you increase the memory to 512 GB, the price goes up to ~$10900. Mostly because 128 GB modules are about as expensive as what Apple charges for RAM. A Threadripper Pro workstation can use cheaper 8x64 GB for the same capacity, but because the base system is more expensive, you'll end up paying ~$11600.
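                                                                                                                                                                              The per-gigabyte math behind that comparison is easy to sketch (all figures are the approximate prices quoted above, not official list prices):

```python
# Rough $/GB-of-RAM comparison using the approximate prices quoted above.
# The Mac Studio figure assumes the ~$10k / 512 GB config discussed upthread.
systems = {
    "Mac Studio M3 Ultra, 512 GB": (10_000, 512),
    "Threadripper (Puget), 256 GB": (7_400, 256),
    "Threadripper (Puget), 512 GB": (10_900, 512),
    "Threadripper Pro, 8x64 GB": (11_600, 512),
}

for name, (price_usd, ram_gb) in systems.items():
    print(f"{name}: ${price_usd / ram_gb:.2f}/GB")
```

                                                                                                                                                                              At these numbers the 512 GB Mac Studio actually comes out cheapest per gigabyte of RAM, which is the point about it being competitively priced at that capacity.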

                                                                                                                                                                              • TylerE

                                                                                                                                                                                today at 7:36 AM

                                                                                                                                                                                The Mac almost fits in the palm of your hand, and runs, if not silently, practically so. It doesn't draw excessive power or generate noticeable heat.

                                                                                                                                                                                None of those will be true for any PC/Nvidia build.

                                                                                                                                                                                It's hard to put a price on quality of life.

                                                                                                                                                                    • TheRealPomax

                                                                                                                                                                      yesterday at 5:39 PM

                                                                                                                                                                      I think the other big thing is that the base model finally starts at a normal amount of memory for a production machine. You can't get less than 96GB. Although an extra $4000 for the 512GB model seems Tim Apple levels of ridiculous. There is absolutely no way the difference costs anywhere near that much at the fab.

                                                                                                                                                                      And the storage solution still makes no sense of course; a machine like this should start at 4TB for $0 extra, 8TB for $500 more, and 16TB for $1000 more. Not start at a useless 1TB, with the 8TB version costing an extra $2400 and the 16TB a truly idiotic $4600. If Sabrent can make and sell 8TB M.2 NVMe drives for $1000, SoC storage should set you back half that, not over double that.

                                                                                                                                                                        • jjtheblunt

                                                                                                                                                                          yesterday at 5:59 PM

                                                                                                                                                                          > There is absolutely no way that the different costs anywhere near that much at the fab.

                                                                                                                                                                          price premium probably, but chip lithography errors (thus, yields) at the huge memory density might be partially driving up the cost for huge memory.

                                                                                                                                                                            • wtallis

                                                                                                                                                                              today at 5:57 AM

                                                                                                                                                                              > but chip lithography errors (thus, yields) at the huge memory density might be partially driving up the cost for huge memory.

                                                                                                                                                                              Apple's not having TSMC fab a massive die full of memory. They're buying a bunch of small dies of commodity memory and putting them in a package with a pair of large compute dies. How many of those small commodity memory dies they use has nothing to do with yield.

                                                                                                                                                                              • MBCook

                                                                                                                                                                                today at 5:07 AM

                                                                                                                                                                                This is also a niche product. The number they sell is going to be very tiny compared to the base model MacBook, let alone the iPhone.

                                                                                                                                                                                Apple absolutely loves to gouge for upgrades, but the chips in this have got to be expensive. I almost wonder if the absolute base model of this machine has noticeably lower margins than a normal Apple product because of that. But they expect/know that most everyone who buys one is going to spec it up.

                                                                                                                                                                                • TheRealPomax

                                                                                                                                                                                  yesterday at 6:17 PM

                                                                                                                                                                                  It's Apple, price premium is a given.

                                                                                                                                                                          • bigyabai

                                                                                                                                                                            yesterday at 3:32 PM

                                                                                                                                                                            For enterprise markets, this is table stakes. A lot of datacenter customers will probably ignore this release altogether since there isn't a high-bandwidth option for systems interconnect.

                                                                                                                                                                              • pavlov

                                                                                                                                                                                yesterday at 3:34 PM

                                                                                                                                                                                The Mac Studio isn’t meant for data centers anyway? It’s a small and silent desktop form factor — in every respect the opposite of a design you’d want to put in a rack.

                                                                                                                                                                                A long time ago Apple had a rackmount server called Xserve, but there’s no sign that they’re interested in updating that for the AI age.

                                                                                                                                                                                  • bigyabai

                                                                                                                                                                                    yesterday at 3:36 PM

                                                                                                                                                                                    It's the Ultra chip, the same one that goes into the rackmount Mac Pro. I don't think there's much confusion as to who this is for.

                                                                                                                                                                                    > there’s no sign that they’re interested in updating that for the AI age.

                                                                                                                                                                                    https://security.apple.com/blog/private-cloud-compute/

                                                                                                                                                                                      • wtallis

                                                                                                                                                                                        yesterday at 4:30 PM

                                                                                                                                                                                        The rackmount Mac Pro is for A/V studios, not datacenters.

                                                                                                                                                                                          • phillco

                                                                                                                                                                                            yesterday at 5:14 PM

                                                                                                                                                                                            Don't forget CI/CD farms for iOS builds, although I think it's much more cost-effective to just make Minis or Studios work, despite their nonstandard form factor

                                                                                                                                                                                              • kridsdale1

                                                                                                                                                                                                yesterday at 7:44 PM

                                                                                                                                                                                                Google and Facebook have vast fleets of Minis in custom chassis for this purpose.

                                                                                                                                                                                        • pavlov

                                                                                                                                                                                          yesterday at 3:42 PM

                                                                                                                                                                                          I genuinely forgot the Mac Pro still exists. It’s been so long since I even saw one.

                                                                                                                                                                                          And I’ve had every previous Mac tower design since 1999: G4, G5, the excellent dual Xeon, the horrible black trash can. But Apple Silicon delivers so much punch in the Studio form factor, the old school Pro has become very niche.

                                                                                                                                                                                          Edit - looks like the new M3 Ultra is only available in Mac Studio anyway? So the existence of the Pro is moot here.

                                                                                                                                                                                            • choilive

                                                                                                                                                                                              yesterday at 7:02 PM

                                                                                                                                                                                              Never understood the hate on the trash can. Isn't the Mac Studio basically the same idea as the trash can, but even less upgradeable?

                                                                                                                                                                                                • pavlov

                                                                                                                                                                                                  yesterday at 7:35 PM

                                                                                                                                                                                                  The Mac Studio hit a sweet spot in 2023 that the trash can Mac Pro couldn't ten years earlier. It's mostly thanks to the high integration of Apple Silicon and improved device availability and speed of Thunderbolt.

                                                                                                                                                                                                  The 2013 Mac Pro was stuck forever with its original choice of Intel CPU and AMD GPU. And it was unfortunately prone to overheating due to these same components.

                                                                                                                                                                                                    • pjmlp

                                                                                                                                                                                                      today at 7:32 AM

                                                                                                                                                                                                      Folks who want to keep the customisation aspect of the Mac Pro hardly see it that way.

                                                                                                                                                                                                      In fact, a very famous podcaster is still holding on to his.

                                                                                                                                                                                                      • wtallis

                                                                                                                                                                                                        yesterday at 10:12 PM

                                                                                                                                                                                                        The trash can also suffered from hitting the market right around when the industry gave up on making dual-GPU work.

                                                                                                                                                                                                          • MBCook

                                                                                                                                                                                                            today at 5:21 AM

                                                                                                                                                                                                            Yep. It was designed for CPU grunt, and came out right when people swapped to wanting tons of GPU grunt.

                                                                                                                                                                                                            The cooling solution wasn’t designed for huge GPUs. So it couldn’t really be upgraded in ways most people wanted.

                                                                                                                                                                                                        • TylerE

                                                                                                                                                                                                          today at 7:37 AM

                                                                                                                                                                                                          The Studio also hits a sweet spot for home users like me that want tons of IO and no built in input devices.

                                                                                                                                                                                              • Alupis

                                                                                                                                                                                                yesterday at 3:57 PM

                                                                                                                                                                                                Outside of extremely niche use cases, who is racking apple products in 2025?

                                                                                                                                                                                                  • nordsieck

                                                                                                                                                                                                    yesterday at 3:59 PM

                                                                                                                                                                                                    There's MacMiniVault (nee MacMiniColo) https://www.macminivault.com/

                                                                                                                                                                                                    Not sure if they count as niche or not.

                                                                                                                                                                                                    • kube-system

                                                                                                                                                                                                      yesterday at 6:00 PM

                                                                                                                                                                                                      Every provider who offers MacOS in the cloud.

                                                                                                                                                                                                        • Alupis

                                                                                                                                                                                                          yesterday at 10:22 PM

                                                                                                                                                                                                          So MacOS is still not allowed to be virtualized per the EULA? Wow if that's true...

                                                                                                                                                                                                            • kube-system

                                                                                                                                                                                                              yesterday at 10:30 PM

                                                                                                                                                                                                              MacOS is permitted to be virtualized... as long as the host is a Mac. :)

                                                                                                                                                                                                      • wpm

                                                                                                                                                                                                        yesterday at 4:34 PM

                                                                                                                                                                                                        AWS

                                                                                                                                                                                                        • waveringana

                                                                                                                                                                                                          yesterday at 4:22 PM

                                                                                                                                                                                          GitHub for their macOS runners (pretty sure they're M1 Minis)

                                                                                                                                                                                                  • alwillis

                                                                                                                                                                                                    yesterday at 7:18 PM

                                                                                                                                                                                                    Apple recently announced they’re building a new plant in Texas to produce servers. Yes, they need servers for their Private Compute Cloud used by Apple Intelligence, but it doesn’t only need to be for that.

                                                                                                                                                                                                    From https://www.apple.com/newsroom/2025/02/apple-will-spend-more...

                                                                                                                                                                                                    As part of its new U.S. investments, Apple will work with manufacturing partners to begin production of servers in Houston later this year. A 250,000-square-foot server manufacturing facility, slated to open in 2026, will create thousands of jobs.

                                                                                                                                                                                                • phonon

                                                                                                                                                                                                  yesterday at 3:47 PM

                                                                                                                                                                                                  Thunderbolt 5 can do bi-directional 80 Gbps....and Mac Studio Ultra has 6 ports...

                                                                                                                                                                                                    • cibyr

                                                                                                                                                                                                      yesterday at 4:48 PM

                                                                                                                                                                                                      That's still not even competitive with 100G Ethernet on a per-port basis. An overall bandwidth of 480 Gbps pales in comparison with, for example, the 3200 Gbps you get with a P5 instance on EC2.
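                                                                                                                                                                                      The aggregate-bandwidth claim checks out with simple arithmetic (the 3200 Gbps P5 figure is AWS's advertised instance network bandwidth, as quoted above):

```python
# Aggregate Thunderbolt 5 bandwidth on the Mac Studio vs the
# datacenter interconnects mentioned in this thread.
tb5_per_port_gbps = 80   # bidirectional TB5, per the comment above
tb5_ports = 6            # ports on the Mac Studio Ultra

tb5_total_gbps = tb5_per_port_gbps * tb5_ports
print(f"TB5 aggregate: {tb5_total_gbps} Gbps")                          # 480 Gbps

print(f"One TB5 port vs one 100GbE port: {tb5_per_port_gbps / 100:.0%}")       # 80%
print(f"TB5 aggregate vs EC2 P5 (3200 Gbps): {tb5_total_gbps / 3200:.0%}")     # 15%
```

                                                                                                                                                                                      So even using all six ports, the Studio reaches about 15% of a P5 instance's interconnect bandwidth, and a single port falls short of one 100GbE link.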

                                                                                                                                                                                                        • phonon

                                                                                                                                                                                                          yesterday at 10:29 PM

                                                                                                                                                                                                          A 3 year reservation of a P5 is over a million dollars though? Not sure how that's comparable....

                                                                                                                                                                                                          • nyrikki

                                                                                                                                                                                                            yesterday at 5:17 PM

                                                                                                                                                                                            To add to this, GPU servers like Supermicro's have a 400GbE port per GPU, plus more for the CPU.

                                                                                                                                                                                                            • kridsdale1

                                                                                                                                                                                                              yesterday at 7:45 PM

                                                                                                                                                                                                              Cost competitive though?

                                                                                                                                                                                                      • PaulHoule

                                                                                                                                                                                                        yesterday at 3:35 PM

                                                                                                                                                                                        That article says you can connect them through Thunderbolt 5 somehow to form clusters.

                                                                                                                                                                                                          • burnerthrow008

                                                                                                                                                                                                            yesterday at 4:23 PM

                                                                                                                                                                                                            I wonder if that’s something new, or just the same virtual network interface that’s been around since the TB1 days (a new network interface appears when you connect two Macs with a TB cable)

                                                                                                                                                                                                              • jauntywundrkind

                                                                                                                                                                                                                today at 12:13 AM

                                                                                                                                                                                                                It's the same host-to-host USB network, I believe.

                                                                                                                                                                                                                I'm super interested in the clustering capability. At launch people said they were only getting like 11Gbps from their TB4 drive arrays, which was really way less than expected.

                                                                                                                                                                                                                Apple does kind of advertise that each TB port has its own controller, which gives me hope that whatever one port can do, six ports can do 6x better.

                                                                                                                                                                                                                AMD's Strix Halo victory feels much more shallow today. Eventually 48GB or 64GB sticks will probably expand Strix Halo to 192GB, then 256GB. But Strix Halo is super IO-starved, with basically a desktop's worth of IO and no easy way to do host-to-host, and Apple absolutely understands that the use of a chip is bounded by what it can connect to. 6x TB5, if even half true, will be utterly outstanding.

                                                                                                                                                                                                                It's been so, so cool to see Non-Transparent Bridging atop Thunderbolt, so one host can act like a device. Since it's PCIe, that hypothetically would allow amazing RDMA over TB. USB4 mandates host-to-host networking, but I have no idea how it is implemented and I suspect it's nowhere near as close to the metal.

                                                                                                                                                                                                                  • PaulHoule

                                                                                                                                                                                                                    today at 1:39 AM

                                                                                                                                                                                                                    In 2017 I was working for a company that was trying to develop foundation models, and I was developing a framework for training what were then large neural networks [1] and other models.

                                                                                                                                                                                                                    It was "yet another mac-oriented startup" but I had them get me an Alienware laptop because I could get one with a 1070 mobile card that meant I could train on my laptop whereas the data sci's had to do everything on our DGX-1. [2]

                                                                                                                                                                                                                    Today it is the other way around, the Mac Studio looks like the best AI development workstation you can get.

                                                                                                                                                                                                                    [1] I was really partial to a character-level CNN model we had

                                                                                                                                                                                                                    [2] CEO presented next to Jensen Huang at a NVIDIA conference, his favorite word was "incredible". I thought it was "incredible" when I heard they got bought by Nike, but it was true.

                                                                                                                                                                                                                • PaulHoule

                                                                                                                                                                                                                  yesterday at 4:39 PM

                                                                                                                                                                                                                  Well already it is faster than GigE...

                                                                                                                                                                                                                  https://arstechnica.com/gadgets/2013/10/os-x-10-9-brings-fas...

                                                                                                                                                                                                                  Thunderbolt is PCIe-based, and I could imagine it being extended to do what https://en.wikipedia.org/wiki/Compute_Express_Link and https://en.wikipedia.org/wiki/InfiniBand do.

                                                                                                                                                                                                          • spiderfarmer

                                                                                                                                                                                                            yesterday at 3:38 PM

                                                                                                                                                                                                            You can use Thunderbolt 5 interconnect (80Gbps) to run LLMs distributed across 4 or 5 Mac Studios.

                                                                                                                                                                                                              • atwrk

                                                                                                                                                                                                                yesterday at 4:14 PM

                                                                                                                                                                                                                But 80Gbit/s is way slower than even regular dual channel RAM, or am I missing something here? That would mean the LLM would be excruciatingly slow. You could get an old EPYC for a fraction of that price and have more performance.

                                                                                                                                                                                                                  • wmf

                                                                                                                                                                                                                    yesterday at 4:59 PM

                                                                                                                                                                                                                    The weights don't go over the network so performance is OK.

                                                                                                                                                                                                                      • atwrk

                                                                                                                                                                                                                        yesterday at 6:25 PM

                                                                                                                                                                                                                        If I'm not mistaken, each token produced roughly equals the whole model in memory transfers (the exception being MoE models). That's why memory bandwidth is so important in the first place, or not?

                                                                                                                                                                                                                          • wmf

                                                                                                                                                                                                                            yesterday at 9:12 PM

                                                                                                                                                                                                                            My understanding is that if you can store 1/Nth of the weights in RAM on each of the N nodes then there's no need to send the weights over the network.
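(Editor's aside: this checks out on a napkin. With each node holding 1/Nth of the layers, only the activation vector for the current token has to cross the Thunderbolt link per forward pass; the weights stay local. A quick sketch, using assumed figures for illustration: a DeepSeek-R1-like hidden size of 7168, fp16 activations, and TB5 at the thread's 80 Gbit/s.)

```python
# Napkin math: with layer-sharded (pipeline-parallel) weights, only the
# per-token activation vector crosses the inter-node link, not the model.
# All figures below are assumptions, not measurements.

HIDDEN_DIM = 7168        # assumed hidden size (DeepSeek-R1-class model)
BYTES_PER_ACT = 2        # fp16 activations
LINK_GBPS = 80           # Thunderbolt 5, per the thread

activation_bytes = HIDDEN_DIM * BYTES_PER_ACT        # bytes per token hop
link_bytes_per_s = LINK_GBPS / 8 * 1e9               # 10 GB/s

# Ceiling on tokens/s imposed by the link alone (one hop per token):
link_tok_ceiling = link_bytes_per_s / activation_bytes

print(f"{activation_bytes / 1024:.1f} KiB per token hop")
print(f"link-imposed ceiling: ~{link_tok_ceiling:,.0f} tok/s")
```

On these assumptions each hop is ~14 KiB, so the 80 Gbit/s link is orders of magnitude away from being the bottleneck; memory bandwidth on each node dominates.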

                                                                                                                                                                                                                • whimsicalism

                                                                                                                                                                                                                  yesterday at 3:40 PM

                                                                                                                                                                                                                  why would you ever want to do that remains an open question

                                                                                                                                                                                                                    • aurareturn

                                                                                                                                                                                                                      yesterday at 3:53 PM

                                                                                                                                                                                                                      Probably some kind of local LLM server. 1TB of 1.6 TB/s memory if you link 2 together. $20k total. Half the price of a single Blackwell chip.

                                                                                                                                                                                                                        • whimsicalism

                                                                                                                                                                                                                          yesterday at 3:59 PM

                                                                                                                                                                                                                          with a vanishingly small fraction of flops and a small fraction of memory bandwidth

                                                                                                                                                                                                                            • aurareturn

                                                                                                                                                                                                                              yesterday at 4:26 PM

                                                                                                                                                                                                                              It's good enough to run whatever local model you want. 2x 80-core GPUs are no joke. Linking them together gives it effectively 1.6 TB/s of bandwidth and 1TB of total memory.

                                                                                                                                                                                                                              You can run the full Deepseek 671b q8 model at 40 tokens/s. Q4 model at 80 tokens/s. 37B active params at a time because R1 is MoE.

                                                                                                                                                                                                                              Linking 2 of these together lets you run a model more capable (R1) than GPT-4o at a comfortable speed at home. That was simply fantasy a year ago.
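(Editor's aside: the 40/80 tok/s figures are consistent with a bandwidth-bound estimate. A quick sketch, assuming 37B active parameters per token, 819 GB/s of memory bandwidth per M3 Ultra, and the optimistic case where tensor parallelism lets both linked machines stream their half of the weights concurrently.)

```python
# Napkin math: MoE token generation is roughly bound by streaming the
# *active* parameters from memory once per token. Assumed figures only.

ACTIVE_PARAMS = 37e9     # DeepSeek-R1 active params per token
BW_PER_NODE = 819e9      # M3 Ultra memory bandwidth, bytes/s
NODES = 2                # two Mac Studios linked over TB5

for name, bytes_per_param in [("q8", 1.0), ("q4", 0.5)]:
    bytes_per_token = ACTIVE_PARAMS * bytes_per_param
    # Optimistic ceiling: both nodes stream weights in parallel.
    tok_s = NODES * BW_PER_NODE / bytes_per_token
    print(f"{name}: ~{tok_s:.0f} tok/s upper bound")
```

This yields roughly 44 tok/s at q8 and 89 tok/s at q4 as upper bounds, so the 40/80 numbers above amount to assuming near-perfect bandwidth utilization.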

                                                                                                                                                                                                                              • burnerthrow008

                                                                                                                                                                                                                                today at 3:06 AM

                                                                                                                                                                                                                                > with a vanishingly small fraction of flops and a small fraction of memory bandwidth

                                                                                                                                                                                                                                Is it though?

                                                                                                                                                                                                                                Wikipedia says [1] an M3 Max can do 14 TFLOPS of FP32, so an M3 Ultra ought to do 28 TFLOPS. nVidia claims [2] a Blackwell GPU does 80 TFLOPs of FP32. So M3 Ultra is 1/3 the speed of a Blackwell.

                                                                                                                                                                                                                                Calling that "a vanishingly small fraction" seems like a bit of an exaggeration.

                                                                                                                                                                                                                                I mean, by that metric, a single Blackwell GPU only has "a vanishingly small fraction" of the memory of an M3 Ultra. And the M3 Ultra is only burning "a vanishingly small fraction" of a Blackwell's electrical power.

                                                                                                                                                                                                                                nVidia likes throwing around numbers like "20 petaFLOPs" for FP4, but that's not real floating point... it's just 1990's-vintage uLaw/aLaw integer math.

                                                                                                                                                                                                                                [1] https://en.wikipedia.org/wiki/Apple_silicon#Comparison_of_M-...

                                                                                                                                                                                                                                [2] https://resources.nvidia.com/en-us-blackwell-architecture/da...

                                                                                                                                                                                                                                Edit: Further, most (all?) of the TFLOPs numbers you see on nVidia datasheets for "Tensor FLOPs" have a little asterisk next to them saying they are "effective" TFLOPs using the sparsity feature, where half the elements of the matrix multiplication are zeroed.

                                                                                                                                                                                                            • amelius

                                                                                                                                                                                                              yesterday at 4:47 PM

                                                                                                                                                                                                              Why does it matter if you can run the LLM locally, if you're still running it on someone else's locked down computing platform?

                                                                                                                                                                                                                • PeterStuer

                                                                                                                                                                                                                  yesterday at 4:58 PM

                                                                                                                                                                                                                  Running locally, your data is not sent outside of your security perimeter off to a remote data center.

                                                                                                                                                                                                                  If you are going to argue that the OS or even below that the hardware could be compromised to still enable exfiltration, that is true, but it is a whole different ballgame from using an external SaaS no matter what the service guarantees.

                                                                                                                                                                                                          • jjuliano

                                                                                                                                                                                                            today at 9:27 AM

                                                                                                                                                                                                            Currently, Docker does not support Metal GPUs.

                                                                                                                                                                                                            When running LLMs on Docker with an Apple M3 or M4 chip, they will operate in CPU mode regardless of the chip's class, as Docker only supports Nvidia and Radeon GPUs.

                                                                                                                                                                                                            If you're developing LLMs on Docker, consider getting a Framework laptop with an Nvidia or Radeon GPU instead.

                                                                                                                                                                                                            Source: I develop an AI agent framework that runs LLMs inside Docker on an M3 Max (https://kdeps.com).

                                                                                                                                                                                                            • InTheArena

                                                                                                                                                                                                              yesterday at 2:28 PM

                                                                                                                                                                                                              Whoa. M3 instead of M4. I wonder if this was basically binning, but I thought I had read somewhere that the interposer that enabled this for the M1 chips was not available.

                                                                                                                                                                                                              That Said, 512GB of unified ram with access to the NPU is absolutely a game changer. My guess is that Apple developed this chip for their internal AI efforts, and are now at the point where they are releasing it publicly for others to use. They really need a 2U rack form for this though.

                                                                                                                                                                                                              This hardware is really being held back by the operating system at this point.

                                                                                                                                                                                                                • exabrial

                                                                                                                                                                                                                  yesterday at 3:22 PM

                                                                                                                                                                                                                  If Apple supported Linux (headless) natively, and we could rack M4 Pros, I absolutely would use them in our colo.

                                                                                                                                                                                                                  The CPUs have zero competition in terms of speed and memory bandwidth. I'm still blown away that no other company has been able to produce ARM server chips that can compete.

                                                                                                                                                                                                                    • hedora

                                                                                                                                                                                                                      yesterday at 3:47 PM

                                                                                                                                                                                                                      The last I checked, AMD was outperforming Apple perf/dollar on the high end, though they were close on perf/watt for the TDPs where their parts overlapped.

                                                                                                                                                                                                                      I’d be curious to know if this changes that. It’d take a lot more than doubling cores to take out the very high power AMD parts, but this might squeeze them a bit.

                                                                                                                                                                                                                      Interestingly, AMD has also been investing heavily in unified RAM. I wonder if they have / plan an SoC that competes 1:1 with this. (Most of the parts I’m referring to are set up for discrete graphics.)

                                                                                                                                                                                                                        • aurareturn

                                                                                                                                                                                                                          yesterday at 3:54 PM

                                                                                                                                                                                                                          The M4 Pro is 56% faster in ST performance against AMD’s new Strix Halo while being 3.6x more efficient.

                                                                                                                                                                                                                          Source: https://www.notebookcheck.net/AMD-Ryzen-AI-Max-395-Analysis-...

                                                                                                                                                                                                                          Cinebench 2024 results.

                                                                                                                                                                                                                            • hedora

                                                                                                                                                                                                                              yesterday at 4:11 PM

                                                                                                                                                                                                                              That’s a laptop part, so it makes different tradeoffs.

                                                                                                                                                                                                                              Somewhere on the internet there is a tdp wattage vs performance x-y plot. There’s a pareto optimal region where all the apple and amd parts live. Apple owns low tdp, AMD owns high tdp. They duke it out in the middle. Intel is nowhere close to the line.

                                                                                                                                                                                                                              I’d guess someone has made one that includes datacenter ARM, but I’ve never seen it.

                                                                                                                                                                                                                                • tomrod

                                                                                                                                                                                                                                  yesterday at 5:45 PM

                                                                                                                                                                                                                                  > tdp wattage vs performance x-y plot

                                                                                                                                                                                                                                  This?

                                                                                                                                                                                                                                  https://www.videocardbenchmark.net/power_performance.html#sc...

                                                                                                                                                                                                                                    • echoangle

                                                                                                                                                                                                                                      yesterday at 7:31 PM

                                                                                                                                                                                                                                      That’s GPUs, not CPUs


                                                                                                                                                                                                                                      • aurareturn

                                                                                                                                                                                                                                        yesterday at 4:19 PM

                                                                                                                                                                                                                                        High TDP? You mean server-grade CPUs? Apple doesn't make those.

                                                                                                                                                                                                                                          • derefr

                                                                                                                                                                                                                                            yesterday at 4:33 PM

                                                                                                                                                                                                                                            True, but these "Ultra" chips do target the same niche as (some) high-TDP chips.

                                                                                                                                                                                                                                            Workstations (like the Mac Studio) have traditionally been a space where "enthusiast"-grade consumer parts (think Threadripper) and actual server parts competed. The owner of a workstation didn't usually care about their machine's TDP; they just cared that it could chew through their workloads as quickly as possible. But, unlike an actual server, workstations didn't need the super-high core count required for multitenant parallelism; and would go idle for long stretches — thus benefitting (though not requiring) more-efficient power management that could drive down baseline TDP.

                                                                                                                                                                                                                                              • aurareturn

                                                                                                                                                                                                                                                yesterday at 4:45 PM

                                                                                                                                                                                                                                                 Oh, you mean Threadripper. I thought you were talking about Epyc.

                                                                                                                                                                                                                                                Anyway, I don't think it's comparable really. This thing comes with a fat GPU, NPU, and unified memory. Threadripper is just a CPU.

                                                                                                                                                                                                                                                  • mort96

                                                                                                                                                                                                                                                    yesterday at 9:12 PM

                                                                                                                                                                                                                                                    The GPU and NPU shouldn't be consuming power when not in use. Why shouldn't we compare M3 Ultra to Threadripper?

                                                                                                                                                                                                                                            • diggan

                                                                                                                                                                                                                                              yesterday at 4:35 PM

                                                                                                                                                                                                                                              Isn't the rack-mounted Mac Pro supposedly "server-grade" (https://www.apple.com/shop/buy-mac/mac-pro/rack)?

                                                                                                                                                                                                                                              At least judging by the mounts, they want them to be used that way, even though the CPU might not fit with the de facto industry label for "server-grade".

                                                                                                                                                                                                                                                • jjcob

                                                                                                                                                                                                                                                  today at 6:44 AM

                                                                                                                                                                                                                                                  The rack mount Mac Pro doesn't really make sense for a data center. It's 5U high, which is much too big for a data center. It doesn't have standard server features like redundant power supplies.

                                                                                                                                                                                                                                                  The only use case I can think of is for audio workstations, where people have lots of rack mount equipment, so you can have everything including the computer in the rack. But even for that use case it's quite big.

                                                                                                                                                                                                                                                  • aurareturn

                                                                                                                                                                                                                                                    yesterday at 4:46 PM

                                                                                                                                                                                                                                                    Server grade CPUs. I thought he was referring to Epyc CPUs.

                                                                                                                                                                                                                                                • yxhuvud

                                                                                                                                                                                                                                                  yesterday at 7:11 PM

                                                                                                                                                                                                                                                   It also includes gaming machines. Of course, Apple doesn't make those either.

                                                                                                                                                                                                                                                  • refulgentis

                                                                                                                                                                                                                                                    yesterday at 5:45 PM

                                                                                                                                                                                                                                                    > You mean server-grade CPUs? Apple doesn't make those.

                                                                                                                                                                                                                                                    Right.

                                                                                                                                                                                                                                                     It is coming up because we're in a thread about using them as server CPUs (cf. "colo", "2U" in the OP and the OP's child), and the person you're replying to is making the same point you are.

                                                                                                                                                                                                                                                     For years now, people have commented "these are the best chips, I'd replace all chips with them."

                                                                                                                                                                                                                                                    Then someone points out perf/watt is not perf.

                                                                                                                                                                                                                                                    Then someone else points out some M-series is much faster than a random CPU.

                                                                                                                                                                                                                                                    And someone else points out that the random CPU is not a top performing CPU.

                                                                                                                                                                                                                                                    And someone else points out M-series are optimized for perf/watt and it'd suck if it wasn't.

                                                                                                                                                                                                                                                    I love my MacBook, the M-series has no competitors in the case it's designed for.

                                                                                                                                                                                                                                                    I'd just prefer, at this point, that we can skip long threads rehashing it.

                                                                                                                                                                                                                                                    It's a great chip. It's not the fastest, and it's better for that. We want perf/watt in our mobile devices. There's fundamental, well-understood, engineering tradeoffs that imply being great at that necessitates the existence of faster processors.

                                                                                                                                                                                                                                                      • aurareturn

                                                                                                                                                                                                                                                        yesterday at 5:55 PM

                                                                                                                                                                                                                                                         > It's a great chip. It's not the fastest,
                                                                                                                                                                                                                                                        It has the world's fastest single thread.

                                                                                                                                                                                                                                                          • pjmlp

                                                                                                                                                                                                                                                            today at 7:36 AM

                                                                                                                                                                                                                                                             Maybe it is, maybe not. UNIX and Windows server software has been multithreaded / multi-process for decades; we want tons of threads and processes, not a single one.

                                                                                                                                                                                                                                                            • refulgentis

                                                                                                                                                                                                                                                              yesterday at 6:00 PM

                                                                                                                                                                                                                                                               I can't quite tell what's going on here; earlier, you seemed to be clear -- cf. "Apple doesn't make server-grade CPUs"

                                                                                                                                                                                                                                                                • aurareturn

                                                                                                                                                                                                                                                                  yesterday at 6:18 PM

                                                                                                                                                                                                                                                                  Correct. But their M4 line has the fastest single thread performance in the world.

                                                                                                                                                                                                                                                                    • nameequalsmain

                                                                                                                                                                                                                                                                      yesterday at 7:08 PM

                                                                                                                                                                                                                                                                      According to what source? Passmark says otherwise[1]. The fastest Intel CPUs have both a higher single thread and multi thread score in that test.

                                                                                                                                                                                                                                                                      [1] https://www.cpubenchmark.net/singleThread.html

                                                                                                                                                                                                                                                                        • orangecat

                                                                                                                                                                                                                                                                          today at 12:03 AM

                                                                                                                                                                                                                                                                          Passmark shows the M4 chips slower than the M3, so something weird is going on. Geekbench 6 has the M4 well ahead of Intel and AMD, with the M3 about 25% slower like you'd expect: https://www.cpu-monkey.com/en/cpu_benchmark-geekbench_6_sing...

                                                                                                                                                                                                                                                                          • aurareturn

                                                                                                                                                                                                                                                                            today at 3:46 AM

                                                                                                                                                                                                                                                                            Passmark is an outdated benchmark not optimized for Arm.

                                                                                                                                                                                                                                                                            • worthless-trash

                                                                                                                                                                                                                                                                              today at 2:21 AM

                                                                                                                                                                                                                                                                              I keep seeing people repeat incorrect rhetoric about Apple hardware, like this example.

                                                                                                                                                                                                                                                                              There are nice things that Apple has, but as you can see there is significant reality warping going on.

                                                                                                                                                                                                                                                                              Why does it persist?

                                                                                                                                                                                                                                                                                • worthless-trash

                                                                                                                                                                                                                                                                                  today at 10:03 AM

                                                                                                                                                                                                                                                                                   Never change, HN; can't criticise Apple here.

                                                                                                                                                                                                                                                                          • refulgentis

                                                                                                                                                                                                                                                                            yesterday at 7:09 PM

                                                                                                                                                                                                                                                                            Well, no, right?

                                                                                                                                                                                                                                                                            The M4 Max had great, I would argue the best at time of release, single core results on Geekbench.

                                                                                                                                                                                                                                                                            That is a different claim from M4 line has the top single thread performance in the world.

                                                                                                                                                                                                                                                                            I'm curious:

                                                                                                                                                                                                                                                                            You're signalling both that you understand the fundamental tradeoff ("Apple doesn't make server-grade CPUs") and that you are talking about something else (follow-up with M4 family has top single-thread performance)

                                                                                                                                                                                                                                                                            What drives that? What's the other thing you're hoping to communicate?

                                                                                                                                                                                                                                                                            If you are worried that if you leave it at "Apple doesn't make server-grade CPUs", that people will think M4s aren't as great as they are, this is a technical-enough audience, I think we'll understand :) It doesn't come across as denigrating the M-series, but as understanding a fundamental, physically-based, tradeoff.

                                                                                                                                                                                                                                                            • hedora

                                                                                                                                                                                                                                                              yesterday at 4:47 PM

                                                                                                                                                                                                                                                               Indeed. The M3 Ultra is in the midrange where they duke it out. Similarly, for its niche, the iPhone CPU was better than AMD's low end processors.

                                                                                                                                                                                                                                                               Anyway, the Apple config in the article costs about 5x more than a comparable low end AMD server with 512GB of RAM, but adds an NPU. AMD has NPUs in lower end stuff; not sure about this TDP range.

                                                                                                                                                                                                                                                                • lukevp

                                                                                                                                                                                                                                                                  yesterday at 11:54 PM

                                                                                                                                                                                                                                                                  How is that comparable? On-package RAM is lower latency and higher bandwidth and also much more expensive than external DDR5 sticks.

                                                                                                                                                                                                                                                  • nick_

                                                                                                                                                                                                                                                    yesterday at 4:01 PM

                                                                                                                                                                                                                                                    Same. I'm not sure what to make of the various claims. I personally defer to this table in general: https://www.cpubenchmark.net/power_performance.html.

                                                                                                                                                                                                                                                    I'm not sure how those benchmarks translate to common real world use cases.

                                                                                                                                                                                                                                                • PaulHoule

                                                                                                                                                                                                                                                  yesterday at 4:00 PM

                                                                                                                                                                                                                                                  If I read this right, the r8g.48xlarge at AMZN [1] has 192 cores and 1536GB which exceeds the M3 Ultra in some metrics.

                                                                                                                                                                                                                                                  It reminds me of the 1990s when my old school was using Sun machines based on the 68k series and later SPARC and we were blown away with the toaster-sized HP PA RISC machine that was used for student work for all the CS classes.

                                                                                                                                                                                                                                                  Then Linux came out and it was clear the 386 trashed them all in terms of value and as we got the 486 and 586 and further generations, the Intel architecture trashed them in every respect.

                                                                                                                                                                                                                                                  The story then was that Intel was making more parts than anybody else so nobody else could afford to keep up the investment.

                                                                                                                                                                                                                                                  The same is happening with parts for phones and TSMC's manufacturing dominance -- and today with chiplets you can build up things like the M3 Ultra out of smaller parts.

                                                                                                                                                                                                                                                  [1] https://aws.amazon.com/ec2/instance-types/r8g/

                                                                                                                                                                                                                                                    • hedora

                                                                                                                                                                                                                                                      yesterday at 4:17 PM

                                                                                                                                                                                                                                                       In fairness, the Sun and DEC boxes I used back then (up to about 1999) could hold their own against Intel machines.

                                                                                                                                                                                                                                                      Then, one day, we built a 5-machine AMD Athlon XP Linux cluster for $2000 ($400/machine) that beat all the Unix and Windows server hardware by at least 10x on $/perf.

                                                                                                                                                                                                                                                      It’s nice that we have more than one viable cpu vendor these days, though it seems like there’s only one viable fab company.

                                                                                                                                                                                                                                                        • PaulHoule

                                                                                                                                                                                                                                                          yesterday at 4:24 PM

                                                                                                                                                                                                                                                          In 1998-1999 I had a DEC Alpha on my desktop that was really impressive, it was a 64-bit machine a few years before you could get a 64-bit Athlon.

                                                                                                                                                                                                                                                            • relistan

                                                                                                                                                                                                                                                              today at 8:10 AM

                                                                                                                                                                                                                                                              In 1998 I somehow got my hands on a DEC OEM 21164 533 MHz board for cheap. PCs were nowhere near that performance at the time. It mounted in a regular PC case. A friend helped me get the power supply working (there was, I think, one wire to solder somewhere). Equipped with an ASUS SCSI card, a DEC Ethernet card, and an Elsa GLoria Synergy, it was a full machine. I ran Digital Unix at home on my desk on that for quite a few years. Wish I had kept it for old times' sake.

                                                                                                                                                                                                                                                              One thing I remember about Alpha though was how bad the output from gcc was. Then DEC released a version of their own compilers that was command line compatible with gcc. That changed everything for open source stuff.

                                                                                                                                                                                                                                                              • winocm

                                                                                                                                                                                                                                                                yesterday at 9:54 PM

                                                                                                                                                                                                                                                                The Alpha architecture was 64-bit from the very beginning (though the amount of addressable virtual memory and physical memory depends on the processor implementation).

                                                                                                                                                                                                                                                                I think it goes something like:

                                                                                                                                                                                                                                                                  - 2106x/EV4: 34-bit physical, 43-bit virtual
                                                                                                                                                                                                                                                                  - 21164/EV5: 40-bit physical, 43-bit virtual
                                                                                                                                                                                                                                                                  - 21264/EV6: 44-bit physical, 48-bit virtual
                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                The EV6 is a bit quirky as it is 43-bit by default, but can use 48-bits when I_CTL<VA_48> or VA_CTL<VA_48> is set. (the distinction of the registers is for each access type, i.e: instruction fetch versus data load/store)

                                                                                                                                                                                                                                                                The 21364/EV7 likely has the same characteristics as EV6, but the hardware reference manual seems to have been lost to time...

                                                                                                                                                                                                                                                                  • PaulHoule

                                                                                                                                                                                                                                                                    yesterday at 10:46 PM

                                                                                                                                                                                                                                                                    My understanding is that the VAX from Digital was the mother of all "32-bit" architectures, built to replace the dead-end PDP-11 (a 64 KB user address space, so not really much better than an Apple ][) and the PDP-10/20 (36-bit words were awkward after the 8-bit byte took over the industry). The 68k and 386 protected mode were imitations of the VAX.

                                                                                                                                                                                                                                                                    Digital struggled with the microprocessor transition because they didn't want to kill their cash cow minicomputers with microcomputer-based replacements. They went with the 64-bit Alpha because they wanted to rule the high end in the CMOS age. And they did, for a little while. But the mass market caught up.

                                                                                                                                                                                                                                                                      • winocm

                                                                                                                                                                                                                                                                        today at 1:37 AM

                                                                                                                                                                                                                                                                        Sounds about right.

                                                                                                                                                                                                                                                                        VMS is the only OS (that I know of) that uses all 4 processor privilege modes.

                                                                                                                                                                                                                                                                        Side note: The 21064 has such bizarre IPR mappings, the read values have lots of bits scrambled around compared to their write counterparts. This is likely a hardware design decision affecting the programmer's model, if I had to guess.

                                                                                                                                                                                                                                                                • hedora

                                                                                                                                                                                                                                                                  yesterday at 4:30 PM

                                                                                                                                                                                                                                                                  Yeah.

                                                                                                                                                                                                                                                                  For what we needed, five 32 bit address spaces was enough DRAM. The individual CPU parts were way more than 20% as fast, and the 100Mbit switch was good enough.

                                                                                                                                                                                                                                                                  (The data basically fit in ram, so network transport time to load a machine was bounded by 4GiB / 8MiB / sec = 500 seconds. Also, the hard disks weren’t much faster than network back then.)
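
                                                                                                                                                                                                                                                                  The parenthetical arithmetic above can be checked directly (a quick sketch; the 8 MiB/s figure is taken from the comment as the usable payload rate of a 100 Mbit link):

```python
# Time to fill one machine's RAM over the network, per the estimate above.
ram_bytes = 4 * 2**30          # 4 GiB: a full 32-bit address space of data
link_bytes_per_s = 8 * 2**20   # ~8 MiB/s usable on a 100 Mbit Ethernet link

seconds = ram_bytes / link_bytes_per_s
print(seconds)  # 512.0, i.e. roughly the "500 seconds" quoted
```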

                                                                                                                                                                                                                                                          • nsteel

                                                                                                                                                                                                                                                            yesterday at 4:56 PM

                                                                                                                                                                                                                                                            It seems Graviton 4 CPUs have 12 channels of DDR5-5600, i.e. ~540 GB/s of main memory bandwidth for the CPU to use. The M3 Ultra has 64 channels of LPDDR5-6400, i.e. ~800 GB/s of memory bandwidth for the CPU or the GPU to use. So the M3 Ultra has way fewer (CPU) cores, but way more memory bandwidth. Depends what you're doing.
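
                                                                                                                                                                                                                                                            As a sanity check, both figures follow from channels × bus width × transfer rate (a rough sketch; the assumed widths are 64 bits per DDR5 channel and 16 bits per LPDDR5 channel):

```python
def peak_bandwidth_gbs(channels: int, width_bytes: int, mts: int) -> float:
    """Peak memory bandwidth in GB/s: channels * bytes per transfer * MT/s."""
    return channels * width_bytes * mts / 1000

graviton4 = peak_bandwidth_gbs(channels=12, width_bytes=8, mts=5600)  # DDR5-5600
m3_ultra = peak_bandwidth_gbs(channels=64, width_bytes=2, mts=6400)   # LPDDR5-6400

print(graviton4)  # 537.6 GB/s, the ~540 GB/s quoted
print(m3_ultra)   # 819.2 GB/s, the ~800 GB/s quoted
```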

                                                                                                                                                                                                                                                        • rbanffy

                                                                                                                                                                                                                                                          yesterday at 6:31 PM

                                                                                                                                                                                                                                                          > The CPUs have zero competition in terms of speed, memory bandwidth.

                                                                                                                                                                                                                                                          Maybe not at the same power consumption, but I'm sure mid-range Xeons and EPYCs mop the floor with the M3 Ultra in CPU performance. What the M3 Ultra has that nobody else comes close to is a decent GPU next to a pool of half a terabyte of RAM.

                                                                                                                                                                                                                                                          • hoppp

                                                                                                                                                                                                                                                            yesterday at 3:47 PM

                                                                                                                                                                                                                                                            What about serviceability? Do these come with a soldered-in SSD? That would be an issue for server use; it's too expensive to throw the whole machine away for a broken SSD.

                                                                                                                                                                                                                                                              • galad87

                                                                                                                                                                                                                                                                yesterday at 4:02 PM

                                                                                                                                                                                                                                                                No, the SSD isn't soldered, it has got one or two removable modules: https://everymac.com/systems/apple/mac-studio/mac-studio-faq...

                                                                                                                                                                                                                                                                • gjsman-1000

                                                                                                                                                                                                                                                                  yesterday at 3:57 PM

                                                                                                                                                                                                                                                                  Nah, in many businesses, everything is on a schedule. For desktop computers, a common cycle is 4 years. For servers, maybe a little longer, but not by much. After that date arrives, it’s liquidate everything and rebuild.

                                                                                                                                                                                                                                                                  Having things consistently work is much cheaper than down days caused by your ancient equipment. Apple’s SSDs will make it to 5 years no problem - and more likely, 10-15 years.

                                                                                                                                                                                                                                                                    • hedora

                                                                                                                                                                                                                                                                      yesterday at 4:26 PM

                                                                                                                                                                                                                                                                      At my last N jobs, companies built high end server farms and carefully specced all the hardware. Then they looked at SSD specs and said “these are all fine”.

                                                                                                                                                                                                                                                                      Fast forward 2 years: The $50-$250K machines have a 100% drive failure rate, and some poor bastard has to fly from data center to data center to swap the $60 drive for a $120 one, then re-rack and re-image each machine.

                                                                                                                                                                                                                                                                      Anyway, soldering a decent SSD to the motherboard would actually improve reliability at all those places.

                                                                                                                                                                                                                                                                        • olyjohn

                                                                                                                                                                                                                                                                          yesterday at 5:33 PM

                                                                                                                                                                                                                                                                          What does soldering it to the board have to do with reliability?

                                                                                                                                                                                                                                                                          If they were soldered onto those systems you talk about, all those would have had to be replaced instead of just having the drive swapped out and re-imaged.

                                                                                                                                                                                                                                                                            • wtallis

                                                                                                                                                                                                                                                                              yesterday at 10:34 PM

                                                                                                                                                                                                                                                                              I think the implication was that a soldered SSD doesn't give the customer as much chance to pick the wrong SSD. But it's still possible for the customer to have a different use case in mind than the OEM did when the OEM is picking what SSD to include.

                                                                                                                                                                                                                                                                          • choilive

                                                                                                                                                                                                                                                                            yesterday at 6:05 PM

                                                                                                                                                                                                                                                            What company was spec'ing out a six-figure machine just to put in a consumer-class SSD?

                                                                                                                                                                                                                                                                • icecube123

                                                                                                                                                                                                                                                                  yesterday at 5:41 PM

                                                                                                                                                                                                                                                                  Yeah, I've been thinking about this for a few years. The M-series chips would sell into data centers like crazy if Apple went after that market, especially if they created a server-tuned chip. It could probably be their 2nd-biggest product line behind the iPhone. The performance and efficiency are awesome. It would be neat to see some web serving and database benchmarks to really know.

                                                                                                                                                                                                                                                                    • kridsdale1

                                                                                                                                                                                                                                                                      yesterday at 7:48 PM

                                                                                                                                                                                                                                                                      TSMC couldn’t make enough at the leading node in addition to all the iPhone chips Apple has to sell. There’s a physical throughput limit. That’s why this isn’t M4.

                                                                                                                                                                                                                                                                  • notpushkin

                                                                                                                                                                                                                                                                    yesterday at 3:28 PM

                                                                                                                                                                                                                                                                    Asahi is a thing. For headless usage it’s pretty much ready to go already.

                                                                                                                                                                                                                                                                      • EgoIncarnate

                                                                                                                                                                                                                                                                        yesterday at 3:41 PM

                                                                                                                                                                                                                                                                        M3 support in Asahi is still heavily WIP. It doesn't even have display support, Ethernet, or WiFi yet; I think it's only serial over USB. Without any GPU or ANE support, it's not very useful for AI stuff. https://asahilinux.org/docs/M3-Series-Feature-Support/

                                                                                                                                                                                                                                                                          • notpushkin

                                                                                                                                                                                                                                                                            today at 8:09 AM

                                                                                                                                                                                                                                                                            Hmm, this page links to an out-of-tree ANE module (same as on M1/M2 I believe). No GPU support is a bummer, though.

                                                                                                                                                                                                                                                                            On the other hand, you can do without display support if you’re only using it as a server. And I think USB Ethernet dongles might work for the time being?

                                                                                                                                                                                                                                                                        • criddell

                                                                                                                                                                                                                                                                          yesterday at 3:30 PM

                                                                                                                                                                                                                                                                          The Asahi maintainer resigned recently. What that means for the future only time will tell. I probably wouldn't want to make a big investment in it right now.

                                                                                                                                                                                                                                                                            • seabrookmx

                                                                                                                                                                                                                                                                              yesterday at 3:42 PM

                                                                                                                                                                                                                                                                              Your wording makes it sound like it was a one-man show. Asahi has a really strong contributor base, new leadership[1], and the backing of Fedora via the Asahi Fedora Remix. While Hector resigning is a loss, I don't think it's a death knell for the project.

                                                                                                                                                                                                                                                                              [1]: https://asahilinux.org/2025/02/passing-the-torch/

                                                                                                                                                                                                                                                                                • hoppp

                                                                                                                                                                                                                                                                                  yesterday at 3:50 PM

                                                                                                                                                                                                                                                                  He was the lead developer and a very prominent figure. I think it probably boils down to funding the new development.

                                                                                                                                                                                                                                                                                  • whimsicalism

                                                                                                                                                                                                                                                                                    yesterday at 3:53 PM

                                                                                                                                                                                                                                                                                    it was pretty close to a one man show

                                                                                                                                                                                                                                                                                      • skyyler

                                                                                                                                                                                                                                                                                        yesterday at 4:18 PM

                                                                                                                                                                                                                                                                                        On what grounds do you make this statement?

                                                                                                                                                                                                                                                                                        My understanding is there are dozens of people working on it.

                                                                                                                                                                                                                                                                                          • whimsicalism

                                                                                                                                                                                                                                                                                            yesterday at 4:27 PM

                                                                                                                                                                                                                                                                                            e: I'm removing this comment because on reflection I think it is probably some form of doxxing and being right on the internet isn't that important.

                                                                                                                                                                                                                                                                                              • ArchOversight

                                                                                                                                                                                                                                                                                                yesterday at 4:57 PM

                                                                                                                                                                                                                                                                                                You believe that Hector Martin is also Asahi Lina?

                                                                                                                                                                                                                                                                                                https://bsky.app/profile/lina.yt

                                                                                                                                                                                                                                                                                                https://github.com/AsahiLina

                                                                                                                                                                                                                                                                                                  • notpushkin

                                                                                                                                                                                                                                                                                                    today at 8:50 AM

                                                                                                                                                                                                                                                                                                    Regardless of whether this is true, there are a lot of other Asahi developers. This fact alone is immaterial.

                                                                                                                                                                                                                                                                                                    • raydev

                                                                                                                                                                                                                                                                                                      yesterday at 5:27 PM

                                                                                                                                                                                                                                                                                                      I thought this was confirmed a couple years ago.

                                                                                                                                                                                                                                                                                                      • whimsicalism

                                                                                                                                                                                                                                                                                                        yesterday at 4:59 PM

                                                                                                                                                                                                                                                                                                        e: snip

                                                                                                                                                                                                                                                                                    • surajrmal

                                                                                                                                                                                                                                                                                      yesterday at 3:47 PM

                                                                                                                                                                                                                                                                                      You make it sound like there was only one.

                                                                                                                                                                                                                                                                                  • lynndotpy

                                                                                                                                                                                                                                                                                    yesterday at 3:39 PM

                                                                                                                                                                                                                                                                                    Not at all for M3 or M4. Support is for M2 and M1 currently.

                                                                                                                                                                                                                                                                                    • WD-42

                                                                                                                                                                                                                                                                                      yesterday at 4:18 PM

                                                                                                                                                                                                                                                                                      It’s only a thing for the M1. Asahi is a Sisyphean effort to keep up with new hardware and the outlook is pretty grim at the moment.

                                                                                                                                                                                                                                                                                      Apple’s whole m.o. is to take FOSS software, repackage it and sell it. They don’t want people using it directly.

                                                                                                                                                                                                                                                                                  • Thaxll

                                                                                                                                                                                                                                                                                    yesterday at 7:31 PM

                                                                                                                                                                                                                                                                                    Apple does not make server CPUs; they make consumer low-wattage CPUs, which is very different.

                                                                                                                                                                                                                                                                                    FYI Apple runs Linux in their DC, so no Apple hardware in their own servers.

                                                                                                                                                                                                                                                                                      • alwillis

                                                                                                                                                                                                                                                                                        yesterday at 7:57 PM

                                                                                                                                                                                                                                                                                        > Apple does not make server CPUs, they make consumer low W CPUs, it's very different.

                                                                                                                                                                                                                                                                                        This is silly. Given the performance per watt, the M series would be great in a data center. As you all know, electricity for running the servers and cooling for the servers are the two biggest ongoing costs for a data center; the M series requires less power and runs more efficiently than the average Intel or AMD-based server.

                                                                                                                                                                                                                                                                                        > FYI Apple runs Linux in their DC, so no Apple hardware in their own servers.

                                                                                                                                                                                                                                                                                        That's certainly no longer the case. Apple announced their Private Cloud Compute [1] initiative: Apple-designed servers running Apple Silicon to support Apple Intelligence functions that can't run on-device.

                                                                                                                                                                                                                                                                                        BTW, Apple just announced a $500 billion investment [2] in US-based manufacturing, including a 250,000-square-foot facility to make servers. Yes, these will obviously be for their Private Cloud Compute servers, but they don't have to be only for that purpose.

                                                                                                                                                                                                                                                                                        From the press release:

                                                                                                                                                                                                                                                                                        As part of its new U.S. investments, Apple will work with manufacturing partners to begin production of servers in Houston later this year. A 250,000-square-foot server manufacturing facility, slated to open in 2026, will create thousands of jobs.

                                                                                                                                                                                                                                                                                        Previously manufactured outside the U.S., the servers that will soon be assembled in Houston play a key role in powering Apple Intelligence, and are the foundation of Private Cloud Compute, which combines powerful AI processing with the most advanced security architecture ever deployed at scale for AI cloud computing. The servers bring together years of R&D by Apple engineers, and deliver the industry-leading security and performance of Apple silicon to the data center.

                                                                                                                                                                                                                                                                                        Teams at Apple designed the servers to be incredibly energy efficient, reducing the energy demands of Apple data centers — which already run on 100 percent renewable energy. As Apple brings Apple Intelligence to customers across the U.S., it also plans to continue expanding data center capacity in North Carolina, Iowa, Oregon, Arizona, and Nevada.

                                                                                                                                                                                                                                                                                        [1]: https://security.apple.com/blog/private-cloud-compute/

                                                                                                                                                                                                                                                                                        [2]: https://www.apple.com/newsroom/2025/02/apple-will-spend-more...

                                                                                                                                                                                                                                                                                    • astrange

                                                                                                                                                                                                                                                                                      today at 1:01 AM

                                                                                                                                                                                                                                                                                      The interesting difference between x86 and ARM is security, not performance, btw.

                                                                                                                                                                                                                                                                                      • Apofis

                                                                                                                                                                                                                                                                                        yesterday at 6:22 PM

                                                                                                                                                                                                                                                                                        Doesn't macOS support these things? I'm sure Apple runs these in their data centers somehow?

                                                                                                                                                                                                                                                                                    • klausa

                                                                                                                                                                                                                                                                                      yesterday at 2:34 PM

                                                                                                                                                                                                                                                                                      >I had read somewhere that the interposer that enabled this for the M1 chips was not available.

                                                                                                                                                                                                                                                                                      With all my love and respect for "Apple rumors" writers; this was always "I read five blogposts about CPU design and now I'm an expert!" territory.

                                                                                                                                                                                                                                                                                      The speculation was based on the M3 Max die shots not having the interposer visible, which... implies basically nothing about whether it _could have_ been supported in an M3 Ultra configuration; as evidenced by the announcement today.

                                                                                                                                                                                                                                                                                        • sroussey

                                                                                                                                                                                                                                                                                          yesterday at 3:44 PM

                                                                                                                                                                                                                                                                                          I’m guessing it’s not really an M3.

                                                                                                                                                                                                                                                                                          No M3 has thunderbolt 5.

                                                                                                                                                                                                                                                                                          This is a new chip with M3 marketing. I’d expect this from Intel, not Apple.

                                                                                                                                                                                                                                                                                            • klausa

                                                                                                                                                                                                                                                                                              yesterday at 3:50 PM

                                                                                                                                                                                                                                                                                              Baseline M4 doesn't have Thunderbolt 5 either; only the Pro/Max variants do.

                                                                                                                                                                                                                                                                                              The press-release even calls TB5 out: >Each Thunderbolt 5 port is supported by its own custom-designed controller directly on the chip.

                                                                                                                                                                                                                                                                                              Given that they're doing the same on A-series chips (A18 Pro with 10Gbps USB-C; A18 with USB 2.0); I imagine it's just relatively simple to swap the I/O blocks around and they're doing this for cost and/or product segmentation reasons.

                                                                                                                                                                                                                                                                                                • sroussey

                                                                                                                                                                                                                                                                                                  yesterday at 7:43 PM

                                                                                                                                                                                                                                                                                                  Which means this is a whole new chip. It may be M3 based, but with added interposer support and new thunderbolt stuff.

                                                                                                                                                                                                                                                                                                  Which, at this point, why not just use M4 as a base?

                                                                                                                                                                                                                                                                                                    • klausa

                                                                                                                                                                                                                                                                                                      yesterday at 8:14 PM

                                                                                                                                                                                                                                                                                                      >Which, at this point, why not just use M4 as a base?

                                                                                                                                                                                                                                                                                                      I imagine that making those chips is quite a bit more involved than just taking the files for M3 Max, and copy-pasting them twice into a new project.

                                                                                                                                                                                                                                                                                                      I imagine it just takes more time to design/verify/produce them; especially given they're not selling very many of them, so they're probably not super-high-priority projects.

                                                                                                                                                                                                                                                                                                      • kridsdale1

                                                                                                                                                                                                                                                                                                        yesterday at 7:50 PM

                                                                                                                                                                                                                                                                                                        Could be that M4 requires a different TSMC fab that is at full production doing iPhones.

                                                                                                                                                                                                                                                                                                          • operatingthetan

                                                                                                                                                                                                                                                                                                            yesterday at 7:59 PM

                                                                                                                                                                                                                                                                                                            Or they are saving the M4 Ultra name for later on ...

                                                                                                                                                                                                                                                                                                • hinkley

                                                                                                                                                                                                                                                                                                  yesterday at 7:14 PM

                                                                                                                                                                                                                                                                                                  TB 5 seems like the sort of thing you could 'slap on' to a beefy enough chip.

                                                                                                                                                                                                                                                                                                  Or the sort of thing you put onto a successor when you had your fingers crossed that the spec and hardware would be finalized in time for your product launch, but the fucking committee went into paralysis again at the last moment, and now your product has to ship 4 months before you can put TB5 hardware on shelves. So you put your TB4 circuitry on a chip that has the bandwidth to handle TB5, and you wait for the sequel.

                                                                                                                                                                                                                                                                                                    • kridsdale1

                                                                                                                                                                                                                                                                                                      yesterday at 7:51 PM

                                                                                                                                                                                                                                                                                                      Sounds like you’ve seen some things.

                                                                                                                                                                                                                                                                                                        • hinkley

                                                                                                                                                                                                                                                                                                          yesterday at 11:16 PM

                                                                                                                                                                                                                                                                                                          The world is full of features that didn't make the cutoff for launch date. I believe there's one or two of these publicly known in Apple's history, but it's an old tale.

                                                                                                                                                                                                                                                                                          • stego-tech

                                                                                                                                                                                                                                                                                            yesterday at 3:30 PM

                                                                                                                                                                                                                                                                                            > This hardware is really being held back by the operating system at this point.

                                                                                                                                                                                                                                                                                             It really is. Even if they themselves won't bring back their old XServe OS variant, I'd really appreciate it if they at least partnered with a Linux or BSD (good callout, ryao) dev to bring a server OS to the hardware stack. The consumer OS, while still better (to my subjective tastes) than Windows, is increasingly hampered by bloat and cruft that make it untenable for production server workloads.

                                                                                                                                                                                                                                                                                            A server OS that just treats the underlying hardware like a hypervisor would, making the various components attachable or shareable to VMs and Containers on top, would make these things incredibly valuable in smaller datacenters or Edge use cases. Having an on-prem NPU with that much RAM would be a godsend for local AI acceleration among a shared userbase on the LAN.

                                                                                                                                                                                                                                                                                              • ryao

                                                                                                                                                                                                                                                                                                yesterday at 3:34 PM

                                                                                                                                                                                                                                                                                                Given shared heritage, I would expect to see Apple work with FreeBSD before I would expect Apple to work with Linux.

                                                                                                                                                                                                                                                                                                  • hedora

                                                                                                                                                                                                                                                                                                    yesterday at 3:53 PM

                                                                                                                                                                                                                                                                                                    I heard OpenBSD has been working for a while.

                                                                                                                                                                                                                                                                                                    I’m continually surprised Apple doesn’t just donate something like 0.1% of their software development budget to proton and the asahi projects. It’d give them a big chunk of the gaming and server markets pretty much overnight.

                                                                                                                                                                                                                                                                                                    I guess they’re too busy adding dark patterns that re-enable siri and apple intelligence instead.

                                                                                                                                                                                                                                                                                                      • ryao

                                                                                                                                                                                                                                                                                                        today at 1:36 AM

                                                                                                                                                                                                                                                                                                        The server markets want ECC and Apple has largely stopped shipping ECC. I don’t recall seeing any assurances that the ARM Mac Pro had ECC.

                                                                                                                                                                                                                                                                                                    • stego-tech

                                                                                                                                                                                                                                                                                                      yesterday at 3:47 PM

                                                                                                                                                                                                                                                                                                      You are technically correct (the best kind of correct). I’m just a filthy heathen who lumps the BSDs and Linux distros under “Linux” as an incredibly incorrect catchall for casual discourse.

                                                                                                                                                                                                                                                                                                  • hinkley

                                                                                                                                                                                                                                                                                                    yesterday at 7:16 PM

                                                                                                                                                                                                                                                                                                    I miss the XServe almost as much as I miss the Airport Extreme.

                                                                                                                                                                                                                                                                                                      • stego-tech

                                                                                                                                                                                                                                                                                                        yesterday at 9:12 PM

                                                                                                                                                                                                                                                                                                        I feel like Apple and Ubiquiti have a missed collaboration opportunity on the latter point, especially with the latter's recent UniFi Express unit. It feels like pairing Ubiquiti's kit with Apple's Homekit could benefit both, by making it easier for Homekit users to create new VLANs specifically for Homekit devices, thereby improving security - with Apple dubbing the term, say, "Secure Device Network" or some marketingspeak to make it easier for average consumers to understand. An AppleTV unit could even act as a limited CloudKey for UniFi devices like Access Points, or UniFi Cameras to connect/integrate as Homekit Cameras.

                                                                                                                                                                                                                                                                                                        Don't get me wrong, I wouldn't use that feature (I prefer self-hosting it all myself), but for folks like my family members, it'd be a killer addition to the lineup that makes my life supporting them much easier.

                                                                                                                                                                                                                                                                                                • kokada

                                                                                                                                                                                                                                                                                                  yesterday at 3:17 PM

                                                                                                                                                                                                                                                                                                  > This hardware is really being held back by the operating system at this point.

                                                                                                                                                                                                                                                                                                   Apple could either create a 2U rack server and support Linux on it (and I mean Apple supporting it, not hobbyists), or ship a headless build of Darwin for that hardware. In the latter case we probably wouldn't have much software available (though I am sure people would eventually start porting software to it; MacPorts and Homebrew already exist, and I am sure they could be adapted to run on that platform).

                                                                                                                                                                                                                                                                                                  But Apple is also not interested in that market, so this will probably never happen.

                                                                                                                                                                                                                                                                                                    • ewzimm

                                                                                                                                                                                                                                                                                                      yesterday at 3:35 PM

                                                                                                                                                                                                                                                                                                      There has to be someone at Apple with a contact at IBM that could make Fedora Apple Remix happen. It may not be on-brand, but this is a prime opportunity to make the competition look worse. File it under Community projects at https://opensource.apple.com/projects

                                                                                                                                                                                                                                                                                                    • pjmlp

                                                                                                                                                                                                                                                                                                      today at 7:38 AM

                                                                                                                                                                                                                                                                                                      Apple was once in the server market, they decided a few times actually, that isn't where they want to be.

                                                                                                                                                                                                                                                                                                      • alwillis

                                                                                                                                                                                                                                                                                                        yesterday at 8:01 PM

                                                                                                                                                                                                                                                                                                        I wouldn't be so sure about that.

                                                                                                                                                                                                                                                                                                        https://news.ycombinator.com/item?id=43271486

                                                                                                                                                                                                                                                                                                        • naikrovek

                                                                                                                                                                                                                                                                                                          yesterday at 3:31 PM

                                                                                                                                                                                                                                                                                                          > But Apple is also not interested in that market, so this will probably never happen.

                                                                                                                                                                                                                                                                                                          they're just a tiny company with shareholders who are really tired of never earning back their investments. give 'em a break. I mean they're still so small that they must protect themselves by requiring that macs be used for publishing iPhone and iPad applications.

                                                                                                                                                                                                                                                                                                            • hnaccount_rng

                                                                                                                                                                                                                                                                                                              yesterday at 4:28 PM

                                                                                                                                                                                                                                                                                               Not to get in the way of good snark or anything. But... Apple isn't _requiring_ that everyone use macOS on their systems. You do have to bring your own engineering effort to actually make another OS run, though. And so far Asahi is the only such effort that I'm aware of (there were alternatives in the very beginning, but they didn't even get to M2, right?)

                                                                                                                                                                                                                                                                                                                • jorams

                                                                                                                                                                                                                                                                                                                  yesterday at 8:19 PM

                                                                                                                                                                                                                                                                                                                  Note that they said (emphasis mine):

                                                                                                                                                                                                                                                                                                                  > they're still so small that they must protect themselves by requiring that macs be used for publishing iPhone and iPad applications.

                                                                                                                                                                                                                                                                                                                  They're not talking about Apple's silicon as a target, but as a development platform.

                                                                                                                                                                                                                                                                                                                  • thesuitonym

                                                                                                                                                                                                                                                                                                                    yesterday at 6:54 PM

                                                                                                                                                                                                                                                                                                                    > But you have to bring your own engineering effort to actually make another OS run.

                                                                                                                                                                                                                                                                                                     I mean, that's usually how it works though. When IBM launched the PS/2, they didn't support anything other than PC-DOS and OS/2; Microsoft had to make MS-DOS work for it (they did get some support from IBM, but not much), and the 386BSD and Linux communities brought the engineering effort without IBM's involvement.

                                                                                                                                                                                                                                                                                                                    When Apple was making Motorola Macs, they may have given Be a little help, but didn't support any other OSes that appeared. Same with PowerPC.

                                                                                                                                                                                                                                                                                                                    All of the support for alternative OSes has always come from the community, whether that's volunteers or a commercial interest with cash to burn. Why should that change for Apple silicon?

                                                                                                                                                                                                                                                                                                                      • orangecat

                                                                                                                                                                                                                                                                                                                        today at 12:16 AM

                                                                                                                                                                                                                                                                                                         > When Apple was making Motorola Macs, they may have given Be a little help, but didn't support any other OSes that appeared. Same with PowerPC.

                                                                                                                                                                                                                                                                                                                        Apple briefly supported a Linux distribution on PowerPC Macs: https://en.wikipedia.org/wiki/MkLinux.

                                                                                                                                                                                                                                                                                                                          • pjmlp

                                                                                                                                                                                                                                                                                                                            today at 7:40 AM

                                                                                                                                                                                                                                                                                                             However, to be more accurate, that was from an era when, given the way management was going, there might not have been an Apple today.

                                                                                                                                                                                                                                                                                                        • GeekyBear

                                                                                                                                                                                                                                                                                                          yesterday at 8:29 PM

                                                                                                                                                                                                                                                                                                          I also wondered about binning, so I pulled together how heavily Apple's Max chips were binned in shipping configurations.

                                                                                                                                                                                                                                                                                                          M1 Max - 24 to 32 GPU cores

                                                                                                                                                                                                                                                                                                          M2 Max - 30 to 38 GPU cores

                                                                                                                                                                                                                                                                                                          M3 Max - 30 to 40 GPU cores

                                                                                                                                                                                                                                                                                                          M4 Max - 32 to 40 GPU cores
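                                                                                                                                                                                                                                                                                                           Putting those bins in percentage terms (a quick sketch over the figures above; reading the low bin as a full die with cores fused off is an assumption, not something Apple confirms):

```python
# GPU core counts for the lowest and highest shipping bin of each Max chip,
# taken from the list above.
max_bins = {
    "M1 Max": (24, 32),
    "M2 Max": (30, 38),
    "M3 Max": (30, 40),
    "M4 Max": (32, 40),
}

for chip, (low, full) in max_bins.items():
    disabled = full - low
    pct = 100 * low / full
    print(f"{chip}: lowest bin ships {low}/{full} GPU cores "
          f"({disabled} disabled, {pct:.0f}% of the full part)")
```

So the lowest bin ranges from 75% of the full part (M1/M3 Max) up to 80% (M4 Max).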

                                                                                                                                                                                                                                                                                                          I also looked up the announcement dates for the Max and the Ultra variant in each generation.

                                                                                                                                                                                                                                                                                                          M1 Max - October 18, 2021

                                                                                                                                                                                                                                                                                                          M1 Ultra - March 8, 2022

                                                                                                                                                                                                                                                                                                          M2 Max - January 17, 2023

                                                                                                                                                                                                                                                                                                          M2 Ultra - June 5, 2023

                                                                                                                                                                                                                                                                                                          M3 Max - October 30, 2023

                                                                                                                                                                                                                                                                                                          M3 Ultra - March 12, 2025

                                                                                                                                                                                                                                                                                                          M4 Max - October 30, 2024

                                                                                                                                                                                                                                                                                                          > My guess is that Apple developed this chip for their internal AI efforts

                                                                                                                                                                                                                                                                                                          As good a guess as any, given the additional delay between the M3 Max and Ultra being made available to the public.

                                                                                                                                                                                                                                                                                                            • jonplackett

                                                                                                                                                                                                                                                                                                              yesterday at 8:47 PM

                                                                                                                                                                                                                                                                                                              I’m missing the point. What is it you’re concluding from these dates?

                                                                                                                                                                                                                                                                                                                • GeekyBear

                                                                                                                                                                                                                                                                                                                  yesterday at 10:08 PM

                                                                                                                                                                                                                                                                                                                  I was referring to the additional year of delay between the M3 Max and M3 Ultra announcements when compared to the M1 and M2 generations.

                                                                                                                                                                                                                                                                                                                  The theory that the M3 Ultra was being produced, but diverted for internal use makes as much sense as any theory I've seen.

                                                                                                                                                                                                                                                                                                                  It makes at least as much sense as the "TSMC had difficulty producing enough defect free M3 Max chips" theory.

                                                                                                                                                                                                                                                                                                          • AlchemistCamp

                                                                                                                                                                                                                                                                                                            yesterday at 3:20 PM

                                                                                                                                                                                                                                                                                                            Keep in mind the minimum configuration that has 512GB of unified RAM is $9,499.

                                                                                                                                                                                                                                                                                                              • stego-tech

                                                                                                                                                                                                                                                                                                                yesterday at 3:37 PM

I cannot express how dirt cheap that price point is for what's on offer, especially when you're comparing it to rackmount servers. By the time you've shoehorned in an nVidia GPU and all that RAM, you're easily looking at 5x that MSRP; sure, you get proper redundancy and extendable storage for that added cost, but now you also need redundant UPSes and have local storage to manage instead of centralized SANs or NASes.

                                                                                                                                                                                                                                                                                                                For SMBs or Edge deployments where redundancy isn't as critical or budgets aren't as large, this is an incredibly compelling offering...if Apple actually had a competent server OS to layer on top of that hardware, which it does not.

                                                                                                                                                                                                                                                                                                                If they did, though...whew, I'd be quaking in my boots if I were the usual Enterprise hardware vendors. That's a damn frightening piece of competition.

                                                                                                                                                                                                                                                                                                                  • kllrnohj

                                                                                                                                                                                                                                                                                                                    yesterday at 4:09 PM

                                                                                                                                                                                                                                                                                                                    > By the time you've shoehorned in an nVidia GPU and all that RAM, you're easily looking at 5x that MSRP

                                                                                                                                                                                                                                                                                                                    That nvidia GPU setup will actually have the compute grunt to make use of the RAM, though, which this M3 Ultra probably realistically doesn't. After all, if the only thing that mattered was RAM then the 2TB you can shove into an Epyc or Xeon would already be dominating the AI industry. But they aren't, because it isn't. It certainly hits at a unique combination of things, but whether or not that's maximally useful for the money is a completely different story.
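To put rough numbers on the compute-vs-bandwidth point: batch-size-1 token generation is approximately memory-bandwidth-bound, so decode speed is bounded by usable bandwidth divided by bytes read per token. A hedged sketch (the ~800 GB/s figure for the M3 Ultra, the 37B activated parameters for DeepSeek-R1's MoE, and the 0.6 bandwidth-efficiency factor are all assumptions, not measurements):

```python
# Napkin math: bs=1 decoding reads every activated weight once per token,
# so tok/s ≈ realized memory bandwidth / bytes touched per token.

def est_tok_per_s(bandwidth_gb_s, active_params_b, bytes_per_param, efficiency=0.6):
    """Estimate decode speed for a (possibly MoE) model.

    bandwidth_gb_s:  peak memory bandwidth in GB/s (M3 Ultra: ~800, assumed)
    active_params_b: parameters touched per token, in billions
                     (DeepSeek-R1 MoE: ~37B activated, not the full 671B)
    bytes_per_param: ~0.5 for a 4-bit quant like Q4_K_M
    efficiency:      fraction of peak bandwidth realized in practice (assumed)
    """
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 * efficiency / bytes_per_token

print(round(est_tok_per_s(800, 37, 0.5), 1))  # ≈ 25.9 tok/s at these assumptions
```

That lands in the same 20-30 tok/s ballpark quoted upthread, and it also shows why an Epyc with 2TB of DDR doesn't dominate: a dozen channels of DDR5 delivers a fraction of this bandwidth, so the same formula yields single-digit tok/s.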

                                                                                                                                                                                                                                                                                                                      • stego-tech

                                                                                                                                                                                                                                                                                                                        yesterday at 8:44 PM

                                                                                                                                                                                                                                                                                                                        You're forgetting what Apple's been baking into their silicon for (nearly? over?) a decade: the Neural Processing Unit (NPU), now called the "Neural Engine". That's their secret sauce that makes their kit more competitive for endpoint and edge inference than standard x86 CPUs. It's why I can get similarly satisfying performance on my old M1 Pro Macbook Pro with a scant 16GB of memory as I can on my 10900k w/ 64GB RAM and an RTX 3090 under the hood. Just to put these two into context, I ran the latest version of LM Studio with the deepseek-r1-distill-llama-8b model @ Q8_0, both with the exact same prompt and maximally offloaded onto hardware acceleration and memory, with a context window that was entirely empty:

                                                                                                                                                                                                                                                                                                                          Write me an AWS CloudFormation file that does the following:
                                                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                          * Deploys an Amazon Kubernetes Cluster
                                                                                                                                                                                                                                                                                                                          * Deploys Busybox in the namespace "Test1", including creating that Namespace
                                                                                                                                                                                                                                                                                                                          * Deploys a second Busybox in the namespace "Test3", including creating that Namespace
                                                                                                                                                                                                                                                                                                                          * Creates a PVC for 60GB of storage
                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                        The M1Pro laptop with 16GB of Unified Memory:

                                                                                                                                                                                                                                                                                                                          * 21.28 seconds for "Thinking"
                                                                                                                                                                                                                                                                                                                          * 0.22s to the first token
                                                                                                                                                                                                                                                                                                                          * 18.65 tokens/second over 1484 tokens in its responses
                                                                                                                                                                                                                                                                                                                          * 1m:23s from sending the input to completion of the output
                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                        The 10900k CPU, with 64GB of RAM and a full-fat RTX 3090 GPU in it:

                                                                                                                                                                                                                                                                                                                          * 10.88 seconds for "thinking"
                                                                                                                                                                                                                                                                                                                          * 0.04s to first token
                                                                                                                                                                                                                                                                                                                          * 58.02 tokens/second over 1905 tokens in its responses
                                                                                                                                                                                                                                                                                                                          * 0m:34s from sending the input to completion of the output
                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                        Same model, same loader, different architectures and resources. This is why a lot of the AI crowd are on Macs: their chip designs, especially the Neural Engine and GPUs, allow quite competent edge inference while sipping comparative thimbles of energy. It's why if I were all-in on LLMs or leveraged them for work more often (which I intend to, given how I'm currently selling my generalist expertise to potential employers), I'd be seriously eyeballing these little Mac Studios for their local inference capabilities.

                                                                                                                                                                                                                                                                                                                          • kllrnohj

                                                                                                                                                                                                                                                                                                                            yesterday at 9:19 PM

                                                                                                                                                                                                                                                                                                                            Uh.... I must be missing something here, because you're hyping up Apple's NPU only to show it getting absolutely obliterated by the equally old 3090? Your 10900K having 64gb of RAM is also irrelevant here...

                                                                                                                                                                                                                                                                                                                              • stego-tech

                                                                                                                                                                                                                                                                                                                                yesterday at 9:38 PM

You're missing the bigger picture by getting bogged down in technical details. To an end user, the difference between thirty seconds and ninety seconds is often irrelevant for things like AI, where they expect a delay while it "thinks". When taken in that context, you're now comparing a 14" laptop running off its battery, to a desktop rig gulping down ~500W according to my UPS, for a mere 66% reduction in runtime for a single query at the expense of 5x the power draw.

                                                                                                                                                                                                                                                                                                                                Sure, the desktop machine performs better, as would a datacenter server jam-packed full of Blackwell GPUs, but that's not what's exciting about Apple's implementation. It's the efficiency of it all, being able to handle modern models on comparatively "weaker" hardware most folks would dismiss outright. That's the point I was trying to make.
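The efficiency claim can be sanity-checked as energy per query, i.e. average draw times wall time. The ~500W figure is the UPS reading above; the ~40W for an M1 Pro under sustained load is an assumption, not a measurement:

```python
# Hedged back-of-envelope: energy per query = average draw (W) x wall time (s).
# 500W is from the UPS reading in the comment; ~40W for the M1 Pro under
# load is an assumed figure for illustration.

def joules(watts, seconds):
    return watts * seconds

desktop = joules(500, 34)  # RTX 3090 rig: 0m:34s end to end
laptop = joules(40, 83)    # M1 Pro: 1m:23s end to end

print(desktop, laptop, round(desktop / laptop, 1))  # 17000 J vs 3320 J, ~5.1x
```

Under these assumptions the desktop finishes ~2.4x faster but burns ~5x the energy per response, which is the efficiency argument in one line.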

                                                                                                                                                                                                                                                                                                                                  • kllrnohj

                                                                                                                                                                                                                                                                                                                                    yesterday at 10:49 PM

We're talking about the M3 Ultra here, which is also wall-powered and also expensive. Nobody is interested in dropping upwards of $10,000 on a Mac Studio to get "okay" performance just because an unrelated product is battery powered. Similarly, saving a few bucks on electricity while tripling the much, much more expensive engineer time spent waiting on results is foolish.

                                                                                                                                                                                                                                                                                                                                    Also Apple isn't unique in having an NPU in a laptop. Fucking everyone does at this point.

                                                                                                                                                                                                                                                                                                                                      • stego-tech

                                                                                                                                                                                                                                                                                                                                        today at 1:01 AM

It almost feels like you're deliberately missing the forest for the trees, in order to fit some argument that I'm not quite able to suss out here.

                                                                                                                                                                                                                                                                                                                                        The point is that, in terms of practical usage, the M3 Ultra is uniquely competent and highly affordable in a sea of enterprise technology that is decidedly not. I tried to demonstrate why I'm excited about it by pointing out the similar performance of a battery-powered, four-year-old laptop and a quite gargantuan gaming PC that's pulling over 500W from the wall, as an example of what several years of additional refinements and improvements to the architecture was expected to bring.

                                                                                                                                                                                                                                                                                                                                        The point is that it's affordable, more flexible in deployment, and more efficient than similarly-specced datacenter servers specifically designed for inference. For the cost of a single decked-out Dell or HP rackmount server, I can have five of these Mac Studios with M3 Ultra chips - and without the need for substantial cooling, noise isolation, or other datacenter necessities. If the marketing copy is even in the same ballpark as actual performance, that's easily enough inference to serve an office of fifty to a hundred people or more, depending on latency tolerances; if you don't mind "queuing" work (like CurrentCo does with their internal Agents), one of those is likely enough for a hundred users.
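A rough way to reason about the "fifty to a hundred people" figure (every input here is an assumption for illustration): if one box sustains ~25 tok/s of generation, a typical interactive user consumes a few hundred tokens per minute while active, and each user is only actively generating a small fraction of the time, then:

```python
def concurrent_users(box_tok_per_s, tokens_per_user_per_min=300, duty_cycle=0.1):
    """Estimate how many office users one box can serve.

    Assumes each user is actively generating only ~10% of the time
    (duty_cycle) and consumes tokens_per_user_per_min while active.
    All inputs are illustrative assumptions, not benchmarks.
    """
    demand_tok_per_s = tokens_per_user_per_min / 60 * duty_cycle
    return int(box_tok_per_s / demand_tok_per_s)

print(concurrent_users(25))  # 50 users at these assumptions
```

Queuing work instead of serving interactively (a near-100% duty cycle for the box, but latency-tolerant users) is what stretches one machine toward the hundred-user figure.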

                                                                                                                                                                                                                                                                                                                                        That's the excitement. That's the point. It's not the fastest, it's not the cheapest, it's just the most balanced.

                                                                                                                                                                                                                                                                                                                                        • kiratp

                                                                                                                                                                                                                                                                                                                                          yesterday at 11:31 PM

                                                                                                                                                                                                                                                                                                                                          10K doesn’t get you 512 GB of VRAM in Nvidia land.

                                                                                                                                                                                                                                                                                                                                            • sgt101

                                                                                                                                                                                                                                                                                                                                              today at 9:50 AM

                                                                                                                                                                                                                                                                                                                                              indeed, it does not.

I am thinking that 7 A100s would be the lowest-price option for that, and that would be around $80k with good discounts.
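The card count checks out: A100s top out at 80 GB of HBM each, so matching 512 GB takes at least:

```python
import math

# Minimum A100 count to reach 512 GB of VRAM at 80 GB per card.
print(math.ceil(512 / 80))  # 7
```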

                                                                                                                                                                                                                                                                                                                          • rbanffy

                                                                                                                                                                                                                                                                                                                            yesterday at 6:46 PM

                                                                                                                                                                                                                                                                                                                            Had the M3 GPU been much wider, it would be constrained by the memory bandwidth. It might still have an advantage over Nvidia competitors in that it has 512GB accessible to it and will need to push less memory across socket boundaries.

                                                                                                                                                                                                                                                                                                                            It all depends on the workload you want to run.

                                                                                                                                                                                                                                                                                                                        • AlchemistCamp

                                                                                                                                                                                                                                                                                                                          yesterday at 4:08 PM

                                                                                                                                                                                                                                                                                                                          It's not quite an apples to apples comparison, no pun intended. I guess we'll see how it sells.

                                                                                                                                                                                                                                                                                                                          • cubefox

                                                                                                                                                                                                                                                                                                                            yesterday at 3:56 PM

                                                                                                                                                                                                                                                                                                                            I assume there is a very good reason why AMD and Intel aren't releasing a similar product.

                                                                                                                                                                                                                                                                                                                              • stego-tech

                                                                                                                                                                                                                                                                                                                                yesterday at 4:31 PM

                                                                                                                                                                                                                                                                                                                                From my outsider perspective, it's pretty straightforward why they don't.

                                                                                                                                                                                                                                                                                                                                In Intel's case, there's ample coverage of the company's lack of direction and complacency on existing hardware, even as their competitors ate away at their moat, year after year. AMD with their EPYC chips taking datacenter share, Apple moving to in-house silicon for their entire product line, Qualcomm and Microsoft partnering with ongoing exploration of ARM solutions. A lack of competency in leadership over that time period has annihilated their lead in an industry they used to single-handedly dictate, and it's unlikely they'll recover that anytime soon. So in a sense, Intel cannot make a similar product, in a timely manner, that competes in this segment.

                                                                                                                                                                                                                                                                                                                                As for AMD, it's a bit more complicated. They're seeing pleasant success in their CPU lineup, and have all but thrown in the towel on higher-end GPUs. The industry has broadly rallied around CUDA instead of OpenCL or other alternatives, especially in the datacenter, and AMD realizes it's a fool's errand to try and compete directly there when it's a monopoly in practice. Instead of squandering capital to compete, they can just continue succeeding and working on their own moat in the areas they specialize in - mid-range GPUs for work and gaming, CPUs targeting consumers and datacenters, and APUs finding their way into game consoles, handhelds, and other consumer devices or Edge compute systems.

                                                                                                                                                                                                                                                                                                                                And that's just getting into the specifics of those two companies. The reality is that any vendor who hasn't already unveiled their own chips or accelerators is coming in at what's perceived to be the "top" of the bubble or market. They'd lack the capital or moat to really build themselves up as a proper competitor, and are more likely to just be acquired in the current regulatory environment (or lack thereof) for a quick payout to shareholders. There's a reason why the persistent rumor of Qualcomm purchasing part or whole of Intel just won't die: the x86 market is rather stagnant, churning out mediocre improvements YoY at growing pricepoints, while ARM and RISC chips continue to innovate on modern manufacturing processes and chip designs. The growth is not in x86, but a juggernaut like Qualcomm would be an ideal buyer for a "dying" or "completed" business like Intel's, where the only thing left to do is constantly iterate for diminishing returns.

                                                                                                                                                                                                                                                                                                                                  • kridsdale1

                                                                                                                                                                                                                                                                                                                                    yesterday at 7:55 PM

                                                                                                                                                                                                                                                                                                                                    Well said.

                                                                                                                                                                                                                                                                                                                        • 42lux

                                                                                                                                                                                                                                                                                                                          yesterday at 3:23 PM

                                                                                                                                                                                                                                                                                                                          Still cheap if the only thing you look for is vram.

                                                                                                                                                                                                                                                                                                                            • adgjlsfhk1

                                                                                                                                                                                                                                                                                                                              today at 2:09 AM

                                                                                                                                                                                                                                                                                                                              This chip has 0GB of VRAM. It has 8-channel LPDDR5.

                                                                                                                                                                                                                                                                                                                                • 42lux

                                                                                                                                                                                                                                                                                                                                  today at 9:13 AM

                                                                                                                                                                                                                                                                                                                                  Fun at parties and stuff.

                                                                                                                                                                                                                                                                                                                          • baq

                                                                                                                                                                                                                                                                                                                            yesterday at 3:41 PM

                                                                                                                                                                                                                                                                                                                            This is a ‘shut up and take my money’ price, it’ll fly off the shelves.

                                                                                                                                                                                                                                                                                                                            • nsteel

                                                                                                                                                                                                                                                                                                                              yesterday at 3:28 PM

                                                                                                                                                                                                                                                                                                                              And how is it only ÂŁ9,699.00!! Does that dollar price include sales tax or are Brits finally getting a bargain?

                                                                                                                                                                                                                                                                                                                                • kgwgk

                                                                                                                                                                                                                                                                                                                                  yesterday at 9:04 PM

                                                                                                                                                                                                                                                                                                                                  What's the bargain?

                                                                                                                                                                                                                                                                                                                                  There is also "parity" in other products like a MacBook Pro from ÂŁ1,599 / $1,599 or an iPhone 16 from ÂŁ799 / $799. ÂŁ9,699 / $9,499 is worse than that!

                                                                                                                                                                                                                                                                                                                                  • vr46

                                                                                                                                                                                                                                                                                                                                    yesterday at 3:37 PM

                                                                                                                                                                                                                                                                                                                                    The US prices never include state sales tax IIRC. Maybe we're finally getting some parity.

                                                                                                                                                                                                                                                                                                                                      • seanmcdirmid

                                                                                                                                                                                                                                                                                                                                        yesterday at 5:33 PM

                                                                                                                                                                                                                                                                                                                                        You could always buy one at an Apple Store without sales tax (e.g. Portland, Oregon). But they might not have that one in stock...

                                                                                                                                                                                                                                                                                                                                    • mastax

                                                                                                                                                                                                                                                                                                                                      yesterday at 3:38 PM

                                                                                                                                                                                                                                                                                                                                      Tariffs perhaps?

                                                                                                                                                                                                                                                                                                                                  • DrBenCarson

                                                                                                                                                                                                                                                                                                                                    yesterday at 3:33 PM

                                                                                                                                                                                                                                                                                                                                    Cheap relative to the alternatives

                                                                                                                                                                                                                                                                                                                                    • jread

                                                                                                                                                                                                                                                                                                                                      yesterday at 3:53 PM

                                                                                                                                                                                                                                                                                                                                      $8549 with 1TB storage

                                                                                                                                                                                                                                                                                                                                        • rbanffy

                                                                                                                                                                                                                                                                                                                                          yesterday at 6:46 PM

                                                                                                                                                                                                                                                                                                                                          It can connect to external storage easily.

                                                                                                                                                                                                                                                                                                                                  • hedora

                                                                                                                                                                                                                                                                                                                                    yesterday at 4:05 PM

                                                                                                                                                                                                                                                                                                                                    Other than the NPU, it’s not really a game changer; here’s a 512GB AMD deepseek build for $2000:

                                                                                                                                                                                                                                                                                                                                    https://digitalspaceport.com/how-to-run-deepseek-r1-671b-ful...

                                                                                                                                                                                                                                                                                                                                      • aurareturn

                                                                                                                                                                                                                                                                                                                                        yesterday at 4:49 PM

                                                                                                                                                                                                                                                                                                                                        > between 4.25 to 3.5 TPS (tokens per second) on the Q4 671b full model.

                                                                                                                                                                                                                                                                                                                                        3.5 - 4.25 tokens/s. You're torturing yourself, especially with a reasoning model.

                                                                                                                                                                                                                                                                                                                                        This will run it at around 40 tokens/s based on a rough calculation: Q4 quant, 37B active parameters.

                                                                                                                                                                                                                                                                                                                                        5x higher price for 10x higher performance.
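The "40 tokens/s" figure above follows from memory bandwidth divided by the bytes of active weights read per token. A minimal sketch, assuming ~819 GB/s bandwidth, 37B active parameters for the MoE, and ~4.5 effective bits per weight for a Q4_K_M quant (all rough assumptions):

```python
# Napkin math for bandwidth-bound token generation at bs=1.
# All three inputs are assumptions, not measured figures.

bandwidth = 819e9           # assumed memory bandwidth, bytes/s
active_params = 37e9        # active parameters per token (MoE)
bytes_per_param = 4.5 / 8   # Q4_K_M is roughly 4.5 bits per weight

bytes_per_token = active_params * bytes_per_param  # weights read per token
tok_s = bandwidth / bytes_per_token

print(f"~{tok_s:.0f} tok/s upper bound")  # ~39 tok/s
```

This is an upper bound: it ignores KV-cache reads, activation traffic, and imperfect bandwidth utilization, so real throughput lands somewhat below it.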

                                                                                                                                                                                                                                                                                                                                          • hinkley

                                                                                                                                                                                                                                                                                                                                            yesterday at 7:19 PM

                                                                                                                                                                                                                                                                                                                                            Also you don't have to deal with Windows. Which people who do not understand Apple are very skilled at not noticing.

                                                                                                                                                                                                                                                                                                                                            If you've ever used git, svn, or an IDE side by side on corporate Windows versus Apple I don't know why you would ever go back.

                                                                                                                                                                                                                                                                                                                                              • brailsafe

                                                                                                                                                                                                                                                                                                                                                yesterday at 8:26 PM

                                                                                                                                                                                                                                                                                                                                                It's also just clearly a powerful and interesting tinkering project, which there are valid arguments for, but this can just chill out on your desk as an elegant general productivity machine. What it wouldn't do that the tinkering project could do is be upgraded, act as a powerful gaming pc, or cause migraines from constant fan noise.

                                                                                                                                                                                                                                                                                                                                                The custom build would work great though, even more so in a server room, and it also reveals by comparison how excessively Apple prices its components.

                                                                                                                                                                                                                                                                                                                                                • hatthew

                                                                                                                                                                                                                                                                                                                                                  yesterday at 7:59 PM

                                                                                                                                                                                                                                                                                                                                                  Is there a reason one couldn't use linux?

                                                                                                                                                                                                                                                                                                                                                  • bigyabai

                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:59 PM

                                                                                                                                                                                                                                                                                                                                                    The PC doesn't have to run Windows either. Strictly speaking, professional applications see macOS support as an Apple-sanctioned detriment.

                                                                                                                                                                                                                                                                                                                                                    > If you've ever used git, svn, or an IDE side by side

                                                                                                                                                                                                                                                                                                                                                    I still reach for Windows, even though it's a dogshit OS. I would rather use WSL to write and deploy a single app, as opposed to doing my work in a Linux VM or (god forbid) writing and debugging multiple versions just to support my development runtime. If I'm going to use an ad-encumbered commercial service-slop OS, I might as well pick the one that doesn't actively block my work.

                                                                                                                                                                                                                                                                                                                                            • flakiness

                                                                                                                                                                                                                                                                                                                                              yesterday at 4:20 PM

                                                                                                                                                                                                                                                                                                                                              The low energy use can be a game changer if you live in a crappy apartment with limited power capacity. I gave up my big GPU box dream because of that.

                                                                                                                                                                                                                                                                                                                                          • hajile

                                                                                                                                                                                                                                                                                                                                            yesterday at 7:58 PM

                                                                                                                                                                                                                                                                                                                                            One of the leakers who got this Mac Studio right claims Apple is reserving the M4 ultra for the Mac Pro to differentiate the products a bit more.

                                                                                                                                                                                                                                                                                                                                            • hinkley

                                                                                                                                                                                                                                                                                                                                              yesterday at 7:11 PM

                                                                                                                                                                                                                                                                                                                                              Given that the M1 Ultra and M2 Ultra also exist, I'd expect either straight binning, or two designs that share mostly the same core design, just with more cores and a few extra features.

                                                                                                                                                                                                                                                                                                                                              I love Apple but they love to speak in half truths in product launches. Are they saying the M3 Ultra is their first Thunderbolt 5 computer? I don't recall seeing any previous announcements.

                                                                                                                                                                                                                                                                                                                                                • kridsdale1

                                                                                                                                                                                                                                                                                                                                                  yesterday at 7:53 PM

                                                                                                                                                                                                                                                                                                                                                  M4 Pro MacBook and Mini have TB5.

                                                                                                                                                                                                                                                                                                                                                    • hinkley

                                                                                                                                                                                                                                                                                                                                                      yesterday at 11:16 PM

                                                                                                                                                                                                                                                                                                                                                      So it's lying by omission. Sounds about right.

                                                                                                                                                                                                                                                                                                                                                      (NB: I've been long on AAPL since $7 a share but I'm also allergic to bullshit)

                                                                                                                                                                                                                                                                                                                                              • intrasight

                                                                                                                                                                                                                                                                                                                                                yesterday at 5:49 PM

                                                                                                                                                                                                                                                                                                                                                It certainly is held back and that is unfortunate. But if you can run your workloads on this amazing machine, then that's a lot of compute for the buck.

                                                                                                                                                                                                                                                                                                                                                I assume that there's a community of developers focusing on leveraging this hardware instead of complaining about the operating system.

                                                                                                                                                                                                                                                                                                                                                • darthrupert

                                                                                                                                                                                                                                                                                                                                                  yesterday at 2:53 PM

                                                                                                                                                                                                                                                                                                                                                  Yeah, if only Apple at least semi-supported Linux, their computers would have no competition.

                                                                                                                                                                                                                                                                                                                                                    • dwedge

                                                                                                                                                                                                                                                                                                                                                      yesterday at 3:04 PM

                                                                                                                                                                                                                                                                                                                                                      I've been buying and using MBP for 6 or 7 years now, and just assumed I could run Linux on one if I wanted to. I just spent a couple of days trying to get a 2018 MBP working with Linux and found out [edit to clarify] that my other ARM MBP basically won't work.

                                                                                                                                                                                                                                                                                                                                                      I just want a break from MacOS, I'll be buying a Thinkpad and will probably never come back. This isn't my moaning, I understand it's their market, but if their hardware supported Linux (especially dual booting) or Docker native, I'd probably be buying Apple for the next decade and now I just won't be.

                                                                                                                                                                                                                                                                                                                                                        • officeplant

                                                                                                                                                                                                                                                                                                                                                          yesterday at 4:05 PM

                                                                                                                                                                                                                                                                                                                                                          Loved my M1 mini, loved my M2 Air. I've moved on to a 2024 HP EliteBook with an AMD R7 8840U, a 1TB replaceable NVMe drive, and 32GB of socketed DDR5. It's a 14in laptop with a serviceable enough 1920x1200 matte screen: $800 and a 3-hour drive to the nearest Microcenter. I gave Apple another try (I refused Apple from 2009-2020 because of the Nvidia-era issues), and I just can't stomach living off piles of external drives anymore to make up for the lackluster storage space on the affordable units.

                                                                                                                                                                                                                                                                                                                                                          The HP EliteBook was on Ubuntu's list of compatible tested laptops and came in hundreds of dollars less than a Thinkpad. Most of the comparably priced, on-sale T14s I could find were crap Intel-spec'd ones.

                                                                                                                                                                                                                                                                                                                                                          Months in, I don't regret it at all, and Linux support has been fantastic even for a fairly new Ryzen chip without the latest kernel (I stick to LTS releases of most distros). Shoving in 4TB of NVMe storage and 96GB of DDR5, should I feel the need to upgrade, would still put me only around $1300 invested in this machine.

                                                                                                                                                                                                                                                                                                                                                            • brailsafe

                                                                                                                                                                                                                                                                                                                                                              yesterday at 8:30 PM

                                                                                                                                                                                                                                                                                                                                                              Surely you're using that thing as a laptop in a minority of cases though; it looks like you basically just bought specs. That's fine, but if that's all you want, then rather than giving a Mac a reasonable go against the alternatives, you were really exploring a fundamental difference in how you value technology products, which is quite a different battle.

                                                                                                                                                                                                                                                                                                                                                              • dwedge

                                                                                                                                                                                                                                                                                                                                                                yesterday at 4:36 PM

                                                                                                                                                                                                                                                                                                                                                                I'm not really moaning about the cost or lack of upgradability. I mean, I don't like it but at least you know what you're getting into. I just always assumed Linux as a backup was an option, and more and more OSX is annoying me (last 2 or 3 days it keeps dropping bluetooth for 30 seconds) and more and more I just find the interface distracting. Plus whether it works with external displays over USB C is a crapshoot.

                                                                                                                                                                                                                                                                                                                                                                I'll miss the battery life of the M1 chips, and I'm going to have to re-learn how to type (CTRL instead of ALT, fn rarely being on the left, I use fn+left instead of CTRL A in terminals) but otherwise, I think I'm done.

                                                                                                                                                                                                                                                                                                                                                            • creddit

                                                                                                                                                                                                                                                                                                                                                              yesterday at 3:08 PM

                                                                                                                                                                                                                                                                                                                                                              > trying to get a 2018 MBP working with Linux and found out ARM basically doesn't work.

                                                                                                                                                                                                                                                                                                                                                              Since the M series of ARM processors didn’t come out until 2020, that would make a lot of sense.

                                                                                                                                                                                                                                                                                                                                                                • dwedge

                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:12 PM

                                                                                                                                                                                                                                                                                                                                                                  Two separate laptops, I could have been clearer. I have an old 2018 I wanted to try it on, and my daily is M2 that would have been next.

                                                                                                                                                                                                                                                                                                                                                              • least

                                                                                                                                                                                                                                                                                                                                                                yesterday at 4:12 PM

                                                                                                                                                                                                                                                                                                                                                                I think the only laptops you won't find weird Linux issues with are from smaller manufacturers dedicated to shipping them, like the KDE laptop or System76. Every other hardware manufacturer, including those that ship laptops with Linux preinstalled, probably has weird hardware incompatibilities because they don't fully customize their SKUs with Linux support in mind.

                                                                                                                                                                                                                                                                                                                                                                Not that I'm discouraging you from switching or anything. If Linux is what you want/need, there are definitely better laptops than a MacBook for that purpose. It's just that weird incompatibilities and having to fight the operating system on random issues are, at least in my experience, normal when using a Linux laptop. Even my T480, which has overall excellent compatibility, isn't trouble-free.

                                                                                                                                                                                                                                                                                                                                                                  • dwedge

                                                                                                                                                                                                                                                                                                                                                                    yesterday at 4:21 PM

                                                                                                                                                                                                                                                                                                                                                                    Something like the brightness buttons not working, or sleep being a little erratic, is ok. Missing wifi drivers, bluetooth issues, and the audio and keyboard not working are not ok. Apple going backwards in terms of supporting Linux is not something I'm ok with.

                                                                                                                                                                                                                                                                                                                                                                      • least

                                                                                                                                                                                                                                                                                                                                                                        yesterday at 4:33 PM

                                                                                                                                                                                                                                                                                                                                                                        There are wifi drivers; you just have to install them separately because the machines use Broadcom chips and the driver is a proprietary blob. The other things do work, but they require special packages, and you'll need an external keyboard while installing. It's a pain to install, for sure, but it's not insurmountably difficult.

                                                                                                                                                                                                                                                                                                                                                                        Apple Silicon chips are arguably more compatible with Asahi Linux [1], but that's largely in thanks to the hard work of Marcan, who's stepped down as project lead from the project [2].

                                                                                                                                                                                                                                                                                                                                                                        Overall I still think the right choice is to find a laptop better suited for the purpose of running linux on it, just something that requires more careful consideration than people think. Framework laptops, which seem well suited since ideologically it meshes well with linux users, can be a pain to set up as well.

                                                                                                                                                                                                                                                                                                                                                                        [1] https://asahilinux.org/

                                                                                                                                                                                                                                                                                                                                                                        [2] https://marcan.st/2025/02/resigning-as-asahi-linux-project-l...

                                                                                                                                                                                                                                                                                                                                                                          • dwedge

                                                                                                                                                                                                                                                                                                                                                                            yesterday at 4:39 PM

                                                                                                                                                                                                                                                                                                                                                                            I know there are wifi and keyboard drivers, because the live boot and installer environments work with them, but then once installed they're gone. I know it's not insurmountable, and 10 years ago I'd have done it, but I spent a few hours and got sick of it. I agree with you that it's probably better to get another laptop.

                                                                                                                                                                                                                                                                                                                                                                • dghlsakjg

                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:09 PM

                                                                                                                                                                                                                                                                                                                                                                  A 2018 MacBook would be an intel x86 chip. It’s incredibly easy to get Linux running on that machine.

                                                                                                                                                                                                                                                                                                                                                                    • dwedge

                                                                                                                                                                                                                                                                                                                                                                      yesterday at 3:14 PM

                                                                                                                                                                                                                                                                                                                                                                      Getting Linux running wasn't difficult. But Mint lost audio (everything else worked), the specialised Mint kernel lost both audio and wifi, and Arch lost both wifi and the onboard keyboard.

                                                                                                                                                                                                                                                                                                                                                                      I'm sure with tinkering I could eventually get it working, but I'm well past the point of wanting to tinker with hardware and drivers to get Linux working.

                                                                                                                                                                                                                                                                                                                                                                      • goosedragons

                                                                                                                                                                                                                                                                                                                                                                        yesterday at 3:31 PM

                                                                                                                                                                                                                                                                                                                                                                        Because of the T2 chip it's actually pretty annoying. Mainline kernels I think are still missing keyboard and trackpad support for those models. Plus a host of other issues.

                                                                                                                                                                                                                                                                                                                                                                        • sunshowers

                                                                                                                                                                                                                                                                                                                                                                          yesterday at 5:18 PM

                                                                                                                                                                                                                                                                                                                                                                          No, there's a bunch of MBP generations in the middle that just never got any Linux attention.

                                                                                                                                                                                                                                                                                                                                                                      • LordIllidan

                                                                                                                                                                                                                                                                                                                                                                        yesterday at 3:10 PM

                                                                                                                                                                                                                                                                                                                                                                        2018 MBP is Intel unless you're referring to the T2 chip?

                                                                                                                                                                                                                                                                                                                                                                          • dwedge

                                                                                                                                                                                                                                                                                                                                                                            yesterday at 3:16 PM

I could have written it more clearly. I have both; Intel was the first attempt, and when I was struggling to get it running without losing at least one of wifi, audio, and the onboard keyboard, and read that ARM was worse, I gave up. Even the best combination I had (no audio, but everything else working) would kill Bluetooth after a while if wifi was connected on 2.4GHz. I don't like their hardware enough to fight with it.


                                                                                                                                                                                                                                                                                                                                                                      • carlosjobim

                                                                                                                                                                                                                                                                                                                                                                        yesterday at 3:18 PM

                                                                                                                                                                                                                                                                                                                                                                        No competition among the Linux userbase - which is a client segment that you want to avoid at all costs.

                                                                                                                                                                                                                                                                                                                                                                    • jagged-chisel

                                                                                                                                                                                                                                                                                                                                                                      yesterday at 2:43 PM

                                                                                                                                                                                                                                                                                                                                                                      > This hardware is really being held back by the operating system at this point.

                                                                                                                                                                                                                                                                                                                                                                      Please elucidate.

                                                                                                                                                                                                                                                                                                                                                                        • diggan

                                                                                                                                                                                                                                                                                                                                                                          yesterday at 2:46 PM

                                                                                                                                                                                                                                                                                                                                                                          https://news.ycombinator.com/item?id=43243075 ("Apple's Software Quality Crisis" - 1134 comments)

                                                                                                                                                                                                                                                                                                                                                                          ^ has a lot of elaborations on this subject

                                                                                                                                                                                                                                                                                                                                                                            • internetter

                                                                                                                                                                                                                                                                                                                                                                              yesterday at 3:08 PM

This is more about "average" end-user software, not the type of software that would be running on a machine like this. Yes, their applications fell off, but if you're paying for 512GB of RAM, Apple Notes being slow isn't the bottleneck.

                                                                                                                                                                                                                                                                                                                                                                                • diggan

                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:26 PM

                                                                                                                                                                                                                                                                                                                                                                                  Lack of focus on quality of software affects all types of workloads, not just consumer-oriented or professional-oriented in isolation.

                                                                                                                                                                                                                                                                                                                                                                                    • internetter

                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 5:44 PM

                                                                                                                                                                                                                                                                                                                                                                                      > Lack of focus on quality of software affects all types of workloads, not just consumer-oriented or professional-oriented in isolation.

The apps are developed by different teams. macOS apps are containerized. Saying macOS's performance is hindered by Notes.app is like saying that Windows is hindered by Paint.exe. Notes.app is just a default[0]

                                                                                                                                                                                                                                                                                                                                                                                      [0]: though, I dislike saying this because I always feel like I need to mention that even Notes links against a hilarious amount of private APIs that could easily be exposed to other developers but... aren't.

                                                                                                                                                                                                                                                                                                                                                                                      • gjsman-1000

                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 3:41 PM

                                                                                                                                                                                                                                                                                                                                                                                        Nah, if I ever wrote an article about the software crisis on the Linux desktop, there’d be flames here making Apple’s issues look small.

                                                                                                                                                                                                                                                                                                                                                                                          • diggan

                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 3:43 PM

                                                                                                                                                                                                                                                                                                                                                                                            It'd be an interesting flame war in the comments, if nothing else, go for it! I'm happy to give plenty of concrete evidence why Linux is more suitable for professionals than macOS is in 2025 :)

                                                                                                                                                                                                                                                                                                                                                                                              • hedora

                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 4:52 PM

                                                                                                                                                                                                                                                                                                                                                                                                Try copy pasting bash snippets between any linux text editor and terminal.

                                                                                                                                                                                                                                                                                                                                                                                                Now try the same with notes on a mac. Notes mangles the punctuation and zsh is not bash.
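The mangling described here is typographic substitution: Notes' smart-quotes feature silently swaps straight quotes and double hyphens for curly quotes and dashes, which the shell then rejects. A minimal Python sketch of undoing it before pasting (the character map is illustrative, not exhaustive):

```python
# Characters that "smart" text substitution commonly injects into pasted
# shell snippets, mapped back to their ASCII originals.
SMART_TO_ASCII = {
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote
    "\u2013": "--",  # en dash, often produced from a typed double hyphen
}

def unsmart(snippet: str) -> str:
    """Undo typographic substitutions so a pasted command parses again."""
    for smart, ascii_equiv in SMART_TO_ASCII.items():
        snippet = snippet.replace(smart, ascii_equiv)
    return snippet

# A command as it comes back out of a smart-quoting editor:
mangled = "curl \u2013fsSL \u201chttps://example.com\u201d"
print(unsmart(mangled))  # curl --fsSL "https://example.com"
```

To the eye the mangled and fixed versions look nearly identical, which is exactly why this failure mode is so irritating to debug.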

                                                                                                                                                                                                                                                                                                                                                                                                • internetter

                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 5:39 PM

Omg, I despise the fact that there are n competing GUI standards on Linux, with zero visual consistency.

                                                                                                                                                                                                                                                                                                                                                                                                  I love diversity in websites, and apps for that matter, but this isn't diversity, it is the uncanny valley between bespoke graphic design and homogeneity.

                                                                                                                                                                                                                                                                                                                                                                                                  Say what you want about SwiftUI, but it makes consistent, good looking apps. Unless something has changed, GTK is a usability disaster.

And that's before I get into how much both X11 and Wayland suck equally.

                                                                                                                                                                                                                                                                                                                                                                                                  There's so much I miss about Linux, but there's so much I don't

                                                                                                                                                                                                                                                                                                                                                                                              • knowitnone

                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 6:08 PM

If you do write something, please separate enterprise, developer, end user, and embedded/RT, because they all have different requirements.

                                                                                                                                                                                                                                                                                                                                                                                                • WD-42

                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 4:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                  People are paying the richest company in the world for their software crisis on Linux.


                                                                                                                                                                                                                                                                                                                                                                                    • InTheArena

                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 2:47 PM

No native Docker support, no headless management options (enterprise strength), limited QoS management, lack of robust Python support (out of the box), and an interactive-user-focused security model.

                                                                                                                                                                                                                                                                                                                                                                                        • pmarreck

                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 3:20 PM

                                                                                                                                                                                                                                                                                                                                                                                          > lack of robust python support

                                                                                                                                                                                                                                                                                                                                                                                          There is no such thing. Tell me, which combination of the 15+ virtual environments, dependency management and Python version managers would you use? And how would you prevent "project collision" (where one Python project bumps into another one and one just stops working)? Example: SSL library differences across projects is a notorious culprit.

                                                                                                                                                                                                                                                                                                                                                                                          Python is garbage and I don't understand why people put up with this crap unless you seriously only run ONE SINGLE Python project at a time and do not care what else silently breaks. Having to run every Python app in its own Docker image (which is the only real solution to this, if you don't want to learn Nix, which you really should, because it is better thanks to determinism... but entails its own set of issues) is not a reasonable compromise.

                                                                                                                                                                                                                                                                                                                                                                                          Was so glad when the Elixir guys came out with this recently, to at least be able to use Python, but in a very controlled, not-insane way: https://dashbit.co/blog/running-python-in-elixir-its-fine

                                                                                                                                                                                                                                                                                                                                                                                            • tomn

                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 5:21 PM

This is incoherent to me. Your complaints are about packaging, but the Elixir wrapper doesn't deal with that in any way -- it just wraps uv, which you could use without Elixir.

                                                                                                                                                                                                                                                                                                                                                                                              What am I missing?

                                                                                                                                                                                                                                                                                                                                                                                              Also, typically when people say things like

                                                                                                                                                                                                                                                                                                                                                                                              > Tell me, which combination of the 15+ virtual environments, dependency management and Python version managers

                                                                                                                                                                                                                                                                                                                                                                                              It means they have been trapped in a cycle of thinking "just one more tool will surely solve my problem", instead of realising that the tools _are_ the problem, and if you just use the official methods (virtualenv and pip from a stock python install), things mostly just work.

                                                                                                                                                                                                                                                                                                                                                                                                • kstrauser

                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 6:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                  I agree. Python certainly had its speedbumps, but it's utterly manageable today and has been for years and years. It seems like people get hung up on there not being 1 official way to do things, but I think that's been great, too: the competition gave us nice things like Poetry and UV. The odds are slim that a Rust tool would've been accepted as the official Python.org-supplied system, but now we have it.

                                                                                                                                                                                                                                                                                                                                                                                                  There are reasons to want something more featureful than plain pip. Even without them, pip+virtualenv has been completely usable for, what, 15 years now?

                                                                                                                                                                                                                                                                                                                                                                                              • mapcars

                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 3:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                > at least be able to use Python, but in a very controlled, not-insane way

That's funny: about 10 years ago I started my career at a startup that had Python business logic running under Erlang (via a custom connector), which handled supervision and task distribution, and it looked insane to me at the time.

Even today I think it can be useful, but it's very hard to maintain, and containers are a good-enough way to handle Python.

                                                                                                                                                                                                                                                                                                                                                                                                • kstrauser

                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:50 PM

Virtualenv’s been a thing for many years; it’s built into Python, and it solves all that without adding additional tooling.
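A minimal sketch of what that looks like with nothing but the tooling that ships with Python (the `.venv` directory name is just convention):

```shell
# Create a project-local environment using only the stdlib venv module,
# then confirm the environment's interpreter resolves to it.
python3 -m venv .venv
.venv/bin/python -c 'import sys; print(sys.prefix)'
```

No third-party installer involved; the environment is just a directory you can delete and recreate.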

                                                                                                                                                                                                                                                                                                                                                                                                  And if you’re genuinely asking, everything’s converging toward uv. If you pick only one, use that and be done with it.

                                                                                                                                                                                                                                                                                                                                                                                                    • hedora

                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 4:57 PM

                                                                                                                                                                                                                                                                                                                                                                                                      I’ve been using virtualenv for a decade, and we use uv at work.

                                                                                                                                                                                                                                                                                                                                                                                                      Neither fixed anything. They just make it slightly less painful to deal with python scripts’ constant bitrot.

                                                                                                                                                                                                                                                                                                                                                                                                      They also make python uniquely difficult to dockerize.

                                                                                                                                                                                                                                                                                                                                                                                                        • kstrauser

                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 6:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                          That's so completely, diametrically opposite of my experience with both that I can't help but wonder how it ended up there.

                                                                                                                                                                                                                                                                                                                                                                                                          > They also make python uniquely difficult to dockerize.

                                                                                                                                                                                                                                                                                                                                                                                                            RUN pip install uv && uv sync
                                                                                                                                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                                                                                          Tada, done. No, seriously. That's the whole invocation.
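For context, a hypothetical complete Dockerfile built around that one-liner might look like this (base image, lockfile layout, and the `myapp` module name are assumptions, not from the thread):

```dockerfile
FROM python:3.12-slim
WORKDIR /app

# Copy only the dependency metadata first so the install layer is cached.
COPY pyproject.toml uv.lock ./
RUN pip install uv && uv sync --frozen

# Then copy the application code itself.
COPY . .
CMD ["uv", "run", "python", "-m", "myapp"]
```

`--frozen` tells uv to install exactly what the lockfile says rather than re-resolving.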

                                                                                                                                                                                                                                                                                                                                                                                                  • simonw

                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                    uv

                                                                                                                                                                                                                                                                                                                                                                                                    (Not saying Apple should bundle that, but it's the best current answer to running many different Python projects without using something like Docker)

                                                                                                                                                                                                                                                                                                                                                                                                    • whimsicalism

                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 3:54 PM

These are solved problems now; check back in. uv is now the standard.

                                                                                                                                                                                                                                                                                                                                                                                                  • flats

                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:03 PM

                                                                                                                                                                                                                                                                                                                                                                                                    I feel you on a lot of this! But out of the box Python support? Does anybody actually want that? It’s pretty darn quick & straightforward to get a Python environment up & running on MacOS. Maybe I’m misunderstanding what you mean here.

                                                                                                                                                                                                                                                                                                                                                                                                      • p_ing

                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 3:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                        No one would want OOTB Python support. You'd be stuck on a version you didn't want to use.

                                                                                                                                                                                                                                                                                                                                                                                                          • hedora

                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 5:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                            I want it. That way, like code I write in any other language, it’ll run reliably on other people’s machines a few years from now.

                                                                                                                                                                                                                                                                                                                                                                                                            I avoid writing python, so I’m usually the “other people” in that sentence.

                                                                                                                                                                                                                                                                                                                                                                                                              • fauigerzigerk

                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                >it’ll run reliably on other people’s machines a few years from now

                                                                                                                                                                                                                                                                                                                                                                                                                That's optimistic. What if the system Python gets upgraded? For some reason, Python libraries tend to be super picky about the Python versions they support (not just Python 2 vs 3).
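That pickiness is encoded in packaging metadata as an interpreter range (e.g. `Requires-Python >=3.9,<3.13`). A tiny sketch of the check, with made-up bounds, shows why a system Python upgrade can silently fall outside a library's supported window:

```python
def in_range(version, lo, hi):
    """Return True if a (major, minor) interpreter version falls in [lo, hi)."""
    return lo <= version < hi

# Hypothetical library supporting Python >=3.9,<3.13:
print(in_range((3, 12), (3, 9), (3, 13)))  # True: still supported
print(in_range((3, 13), (3, 9), (3, 13)))  # False: one OS upgrade later, unsupported
```

Pinning the interpreter per project (venv, uv, etc.) sidesteps exactly this.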

                                                                                                                                                                                                                                                                                                                                                                                                    • kstrauser

                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 3:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                      1. I run Docker and Podman on my Macs.

2. If you mean MDM, there are several good options. Screen sharing and SSH are built in.

                                                                                                                                                                                                                                                                                                                                                                                                      3. In what sense?

                                                                                                                                                                                                                                                                                                                                                                                                      4. `uv python install whatever` is infinitely better than upgrading on the OS vendor’s schedule.

                                                                                                                                                                                                                                                                                                                                                                                                      5. What does that affect?

                                                                                                                                                                                                                                                                                                                                                                                                        • thecosmicfrog

                                                                                                                                                                                                                                                                                                                                                                                                          today at 1:19 AM

                                                                                                                                                                                                                                                                                                                                                                                                          Thanks for introducing me to `uv`. I've been looking for a tool that integrates pip and pyenv/virtualenv.

                                                                                                                                                                                                                                                                                                                                                                                                          • devmor

                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 3:29 PM

                                                                                                                                                                                                                                                                                                                                                                                                            >I run Docker and Podman on my Macs.

The same way Windows users run them: in a Linux VM.

                                                                                                                                                                                                                                                                                                                                                                                                            You don't get real on-hardware containerization.

                                                                                                                                                                                                                                                                                                                                                                                                              • kstrauser

                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 3:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                Ah, I see what you’re saying. Basically, Darwin doesn’t support cgroups, so Docker runs Linux in a VM to get that.

                                                                                                                                                                                                                                                                                                                                                                                                                  • devmor

                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:45 PM

                                                                                                                                                                                                                                                                                                                                                                                                                    I don't think it supports userland namespaces either, which is another important part of container isolation.

                                                                                                                                                                                                                                                                                                                                                                                                                      • saagarjha

                                                                                                                                                                                                                                                                                                                                                                                                                        today at 7:47 AM

                                                                                                                                                                                                                                                                                                                                                                                                                        It has partial namespaces support for the iOS simulator.

                                                                                                                                                                                                                                                                                                                                                                                                                • naikrovek

                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:38 PM

                                                                    Surprisingly, Windows containers on Windows are not run in a VM. Well, not necessarily; they can be.

                                                                                                                                                                                                                                                                                                                                                                                                                  It is definitely odd that Macs have no native container support, though, especially when you learn that Windows does.

                                                                                                                                                                                                                                                                                                                                                                                                                    • p_ing

                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 4:29 PM

                                                                                                                                                                                                                                                                                                                                                                                                                      Well, Windows (in a form) is the hypervisor for the Azure infrastructure. Azure Web Sites when run as Windows/IIS are Windows containers. Makes sense.

                                                                                                                                                                                                                                                                                                                                                                                                                      Honestly I don't know what XNU/Darwin is good for. It doesn't do anything especially well compared to *BSD, Linux, and NT.

                                                                                                                                                                                                                                                                                                                                                                                                                        • hedora

                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 5:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                          Its async i/o APIs are best in class (i.e., compatible with BSD, and not Linux’s epoll tire fire).

                                                                                                                                                                                                                                                                                                                                                                                                                          Not disagreeing though.

                                                                                                                                                                                                                                                                                                                                                                                                                      • devmor

                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 3:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                        That is an important point, I didn't really think of it since I've never had a reason to use Windows containers.

                                                                                                                                                                                                                                                                                                                                                                                                                          • naikrovek

                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:30 PM

                                                                                                                                                                                                                                                                                                                                                                                                                            that's ok, no one thinks of windows, and fewer people than that would ever use a windows container.

                                                                                                                                                                                                                                                                                                                                                                                                                • mschuster91

                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:24 PM

                                                                                                                                                                                                                                                                                                                                                                                                                  > 1. I run Docker and Podman on my Macs.

                                                                    That's using a Linux VM. The idea people are asking about is native process isolation. Yes, you'd have to rebuild Docker containers on some sort of (small) macOS base layer plus Homebrew/MacPorts, but hey. Being able to run Node.js or PHP, with their thousands of small files, natively would be a game-changer for performance.

                                                                                                                                                                                                                                                                                                                                                                                                                    • hedora

                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 5:57 PM

                                                                        Also, if it were possible to containerize macOS, or even do an unattended VM installation, then it'd be possible for Apple to automatically regression test their stuff.

                                                                                                                                                                                                                                                                                                                                                                                                              • bredren

                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 2:54 PM

                                                                                                                                                                                                                                                                                                                                                                                                                >lack of robust python support (out of the box)

                                                                                                                                                                                                                                                                                                                                                                                                                What would robust python support oob look like?

                                                                                                                                                                                                                                                                                                                                                                                                                  • FergusArgyll

                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:27 PM

                                                                                                                                                                                                                                                                                                                                                                                                                    uv pre-installed! /s

                                                                                                                                                                                                                                                                                                                                                                                                                • duped

                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:39 PM

                                                                                                                                                                                                                                                                                                                                                                                                                  > No native docker support

                                                                                                                                                                                                                                                                                                                                                                                                                  Honest question: why do you want this in MacOS? Do you understand what docker does? (it's fundamentally a linux technology, unless you are asking for user namespaces and chroot w/o SIP on MacOS, but that doesn't make sense since the app sandbox exists).

                                                                                                                                                                                                                                                                                                                                                                                                                  MacOS doesn't have the fundamental ecosystem problems that beget the need for docker.

                                                                                                                                                                                                                                                                                                                                                                                                                  If the answer is "I want to run docker containers because I have them" then use orbstack or run linux through the virtualization framework (not Docker desktop). It's remarkably fast.

                                                                                                                                                                                                                                                                                                                                                                                                                    • egorfine

                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 5:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                      > why do you want this in MacOS?

                                                                        I have a small rack-mounted rendering farm built from Mac minis, which outperform everything in the Intel world, even hardware an order of magnitude more expensive.

                                                                        I've run macOS on my personal and development computers for over a decade, and I've used Linux on the server side since its inception.

                                                                        My experience: running server-side macOS is such a PITA it's not even funny. It may even pretend it has SSH while in fact the SSH server is only available on good days, and only after Remote Desktop has logged in at least once. launchd makes you crave systemd. Etc, etc.

                                                                                                                                                                                                                                                                                                                                                                                                                      So, about docker. I would absolutely love to run my app in a containerized environment on a Mac in order to not touch the main OS.
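For readers who haven't fought launchd: a launchd job is declared as a property-list file rather than a systemd unit. The sketch below is a minimal, hypothetical daemon definition (the `com.example.renderworker` label and binary path are placeholders, not anything from this thread) showing the rough equivalents of `ExecStart=` and `Restart=always`:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Unique reverse-DNS label, roughly a systemd unit name -->
    <key>Label</key>
    <string>com.example.renderworker</string>
    <!-- Program and arguments, roughly ExecStart= -->
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/renderworker</string>
        <string>--listen</string>
        <string>8080</string>
    </array>
    <!-- Restart on exit, roughly Restart=always -->
    <key>KeepAlive</key>
    <true/>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>
```

On modern macOS this would be dropped into /Library/LaunchDaemons and loaded with `launchctl bootstrap system /Library/LaunchDaemons/com.example.renderworker.plist` (paths and label illustrative).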

                                                                                                                                                                                                                                                                                                                                                                                                                        • mannyv

                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 6:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                          Funny, I ran a bunch of Mac minis in colo for over a decade with no problems. Maybe you have a config problem?

                                                                                                                                                                                                                                                                                                                                                                                                                          Of course, I had a LOM/KVM and redundant networking etc. They were substantially more reliable than the Dell equipment that I used in my day job for sure.

                                                                                                                                                                                                                                                                                                                                                                                                                          • duped

                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:56 PM

                                                                                                                                                                                                                                                                                                                                                                                                                            What would a containerization environment on MacOS give you that you don't already have? Like concretely - what does containerization mean in the context of a MacOS user space?

                                                                                                                                                                                                                                                                                                                                                                                                                            In Linux, it means something very specific: a user/mount/pid/network namespace, overlayfs to provide a rootfs, chroot to pivot to the new root to do your work, and port forwarding between the host/guest systems.

                                                                                                                                                                                                                                                                                                                                                                                                                            On MacOS I don't know what containerization means short of virtualization. But you have virtualization on MacOS already, so why not use that?
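The Linux primitives listed above can be sketched in a few lines of shell. This is a minimal, hypothetical illustration using `unshare` from util-linux on a kernel with unprivileged user namespaces enabled; it is the skeleton a container runtime builds on, not how Docker itself is invoked:

```shell
#!/bin/sh
# Create new user, mount, and PID namespaces; map the current user to
# root inside the namespace; then remount /proc so process listings see
# only the namespaced processes. A real runtime would additionally pivot
# into an overlayfs root and wire up network port forwarding.
unshare --user --map-root-user --mount --pid --fork sh -c '
  mount -t proc proc /proc                 # private /proc for the new PID namespace
  echo "uid inside namespace: $(id -u)"    # mapped to root (uid 0) inside
  ps -o pid,comm                           # only the namespaced processes are visible
'
```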

                                                                                                                                                                                                                                                                                                                                                                                                                        • raydev

                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 5:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                          > MacOS doesn't have the fundamental ecosystem problems that beget the need for docker.

                                                                                                                                                                                                                                                                                                                                                                                                                          Anyone wanting to run and manage their own suite of Macs to build multiple massive iOS and Mac apps at scale, for dozens or hundreds or thousands of developers deploying their changes.

                                                                          xcodebuild is by far the most obvious "needs native for max perf" tool, but there are a few others that require macOS. And if you have multiple repos and apps, you may need many different versions of the same tools to build everything.

                                                                                                                                                                                                                                                                                                                                                                                                                          Sounds like a perfect use case for native containers.

                                                                                                                                                                                                                                                                                                                                                                                                                          • jeffhuys

                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 3:57 PM

                                                                                                                                                                                                                                                                                                                                                                                                                            Docker Desktop now offers an option to use the virtualization framework, and works pretty well. But you're still constantly running a VM because "docker is how devs work now right?". I agree with your comment.

                                                                                                                                                                                                                                                                                                                                                                                                                    • e40

                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 3:02 PM

                                                                                                                                                                                                                                                                                                                                                                                                                      I torrent things from two different hosts on my gigabit network. The macos stack literally cannot handle the full bandwidth I have. It fails and the machine needs to be rebooted to fix it. It’s not pretty on the way into this state, either. Other remote connections to the computer are unreliable. On Linux, running the same app in a docker container works perfectly. Transmission is the app.

                                                                                                                                                                                                                                                                                                                                                                                                                        • kstrauser

                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 3:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                          I get nearly 10Gbps from my NAS to my Mac Studio. It absolutely can handle that bandwidth. It may not handle that specific client well for unrelated reasons.

                                                                                                                                                                                                                                                                                                                                                                                                                            • egorfine

                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 6:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                              Bandwidth, yes. Connections count, no.

                                                                                                                                                                                                                                                                                                                                                                                                                          • petecooper

                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 3:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                            >Transmission is the app.

                                                                                                                                                                                                                                                                                                                                                                                                                            Former Transmission user here.

                                                                                                                                                                                                                                                                                                                                                                                                                            I realise you didn't ask, but you might find some improvements in qBittorrent.

                                                                                                                                                                                                                                                                                                                                                                                                                              • jeffhuys

                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 4:01 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                I went to Transmission years and years ago because it's just simple. It has all the options if you need them, but no HUUUGE interface with RSS feeds, 10001 stats about your download, categories, tags, etc etc etc.

                                                                                                                                                                                                                                                                                                                                                                                                                                Transmission is just a small, floating window with your downloads. Click for more. It fits in the macOS vibe. But I'm a person that fully adopted the original macOS "way of working" - kicked the full-screen habit I had in windows and never felt better.

                                                                                                                                                                                                                                                                                                                                                                                                                                Can I ask, why would you go FROM Transmission to qBittorrent?

                                                                                                                                                                                                                                                                                                                                                                                                                                  • petecooper

                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 4:17 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                    >why would you go FROM Transmission to qBittorrent?

                                                                                                                                                                                                                                                                                                                                                                                                                                     In my case: some torrents wouldn't find known-good seeds in Transmission but worked fine in qBittorrent; there's reasonable (but not perfect) support for libtorrent 2.0 in qBittorrent; my download speeds and overall responsiveness are anecdotally better in qBittorrent; and I make use of some of the nitty-gritty settings in qBittorrent.

                                                                                                                                                                                                                                                                                                                                                                                                                                      • jeffhuys

                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 4:32 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                        Well there's a list of good reasons! Thanks for answering. I haven't had any problems with finding seeds, and no need for libtorrent but now I know how to fix that when I do encounter those situations.

                                                                                                                                                                                                                                                                                                                                                                                                                                • e40

                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 4:34 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                  The Linux version, in a container no less, handles the entire gigabit bandwidth.

                                                                                                                                                                                                                                                                                                                                                                                                                                   And let's be clear, it wasn't the app that had problems: the Apple Remote Desktop connection to the machine failed when speeds got above 40MB/s, and the network interface stopped working around 80MB/s.

                                                                                                                                                                                                                                                                                                                                                                                                                                  I think Transmission works perfectly fine. I've been using it for 10+ years with no issues at all on Linux.

                                                                                                                                                                                                                                                                                                                                                                                                                                  I forgot to mention this is a Mac mini/Intel (2018).

                                                                                                                                                                                                                                                                                                                                                                                                                                  • jihadjihad

                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:21 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                    I haven't had any issue running BiglyBT on my M1 MacBook, granted I don't run it all day every day but everything runs plenty fast for my needs (25-30 MB/s for well-seeded torrents).

                                                                                                                                                                                                                                                                                                                                                                                                                            • drumttocs8

                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 3:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                              To expatiate with perspicuity:

                                                                                                                                                                                                                                                                                                                                                                                                                              The Apple ecosystem is a walled garden.

                                                                                                                                                                                                                                                                                                                                                                                                                          • jmyeet

                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 3:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                            I've been looking at the potential for Apple to make really interesting LLM hardware. Their unified memory model could be a real game-changer because NVidia really forces market segmentation by limiting memory.

                                                                                                                                                                                                                                                                                                                                                                                                                             It's worth adding that the M3 Ultra has 819GB/s of memory bandwidth [1]. For comparison, the RTX 5090 has 1800GB/s [2]. That's still less, but the M4 Mac Minis sit at 120-300GB/s, and memory bandwidth limits token throughput, so 819GB/s is a vast improvement.

                                                                                                                                                                                                                                                                                                                                                                                                                            For $9500 you can buy a M3 Ultra Mac Studio with 512GB of unified memory. I think that has massive potential.

                                                                                                                                                                                                                                                                                                                                                                                                                            [1]: https://www.apple.com/mac-studio/specs/

                                                                                                                                                                                                                                                                                                                                                                                                                            [2]: https://www.nvidia.com/en-us/geforce/graphics-cards/50-serie...
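                                                                                                                                                                                                                                                                                                                                                                                                                             The bandwidth point above can be turned into napkin math: for single-stream decoding, every active weight has to be streamed from memory once per token, so memory bandwidth sets a hard ceiling on tokens/s. A minimal sketch (the figures are assumptions for illustration: ~0.5 bytes/parameter for 4-bit quantized weights, and the ~37B active parameters of a DeepSeek-R1-style MoE mentioned upthread):

```python
# Rough, bandwidth-bound estimate of single-stream LLM decode speed.
# Assumption: each generated token streams all active weights from memory once,
# so tokens/s <= bandwidth / bytes_of_active_weights.

def max_tokens_per_sec(bandwidth_gb_s: float,
                       active_params_billions: float,
                       bytes_per_param: float) -> float:
    """Upper bound on decode tokens/s, ignoring compute and KV-cache traffic."""
    bytes_per_token = active_params_billions * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# M3 Ultra (819 GB/s), 37B active params at ~0.5 bytes/param (4-bit):
print(round(max_tokens_per_sec(819, 37, 0.5), 1))  # ~44 tokens/s ceiling

# Same model on a 120 GB/s M4-class machine:
print(round(max_tokens_per_sec(120, 37, 0.5), 1))  # ~6.5 tokens/s ceiling
```

This ignores prompt processing, KV-cache reads, and how much of peak bandwidth is actually achievable, so real throughput will be lower; it only shows why 819GB/s vs. 120-300GB/s matters so much for local inference.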

                                                                                                                                                                                                                                                                                                                                                                                                                            • Teever

                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 2:49 PM

                                                                                                                                                                                                                                                                                                                                                                                                                              FTFA

                                                                                                                                                                                                                                                                                                                                                                                                                              > Apple’s custom-built UltraFusion packaging technology uses an embedded silicon interposer that connects two M3 Max dies across more than 10,000 signals, providing over 2.5TB/s of low-latency interprocessor bandwidth, and making M3 Ultra appear as a single chip to software.

                                                                                                                                                                                                                                                                                                                                                                                                                                • InTheArena

                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 2:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                  I RTFA, RMFP

                                                                                                                                                                                                                                                                                                                                                                                                                                  The comment was that the press had reported that the interposer wasn't available. This obviously uses some form of interposer, so the question is if the press missed it, or Apple has something new.

                                                                                                                                                                                                                                                                                                                                                                                                                            • behnamoh

                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 2:45 PM

                                                                                                                                                                                                                                                                                                                                                                                                                              > My guess is that Apple developed this chip for their internal AI efforts

                                                                                                                                                                                                                                                                                                                                                                                                                              what internal AI efforts?

                                                                                                                                                                                                                                                                                                                                                                                                                               Apple Intelligence is bonkers, and Apple's MLX framework remains a hobby project for Apple

                                                                                                                                                                                                                                                                                                                                                                                                                                • InTheArena

                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 2:52 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                  Apple stated that they were deploying their own hardware for next generation Siri. My thesis is that this is the hardware they developed.

                                                                                                                                                                                                                                                                                                                                                                                                                                  If so, this is hardly a hobby project.

                                                                                                                                                                                                                                                                                                                                                                                                                                  It may not be effective, but there is serious cash behind this.

                                                                                                                                                                                                                                                                                                                                                                                                                                  • Spooky23

                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                    They’re taking a different and more difficult path of integrating AI with existing apps and workflows.

                                                                                                                                                                                                                                                                                                                                                                                                                                     It’s their spin on the Google strategy of providing services to their enterprise GCP customers. I think we’ll see more out of them long term.

                                                                                                                                                                                                                                                                                                                                                                                                                                    • gatienboquet

                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 2:53 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                      They use their own M chips for AI. They are far more advanced on AI than the majority of companies.

                                                                                                                                                                                                                                                                                                                                                                                                                                      They are using OpenAI for now, but in a couple of months they will own the full value chain.

                                                                                                                                                                                                                                                                                                                                                                                                                                        • behnamoh

                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 7:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                          we’ve heard that claim for the past three years, but every effort of theirs points to the opposite. Don’t get me wrong, I would love for Apple Intelligence to be smart enough on my iPhone and on my Mac, but honestly, the current version is a complete disappointment.

                                                                                                                                                                                                                                                                                                                                                                                                                                            • DrBenCarson

                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 8:30 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                              Apple are working on the hard problems of making AI useful (call them “agents”), not AGI

                                                                                                                                                                                                                                                                                                                                                                                                                                              1. Small models running locally with well-established tool interfaces (“app intents”)

                                                                                                                                                                                                                                                                                                                                                                                                                                              2. Large models running in a bespoke cloud that can securely and quickly load all relevant tokens from a device before running inference

                                                                                                                                                                                                                                                                                                                                                                                                                                              No AI lab is even close to what Apple is trying to deliver in the next ~12 months

                                                                                                                                                                                                                                                                                                                                                                                                                                                • behnamoh

                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 11:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                  if that were the case, then it would definitely help Apple Intelligence if the iPhone and Mac had more RAM. But the base MacBook Pro Apple announced a while ago had 8GB of RAM, and even the Pro versions of the iPhone have 8GB, whereas 12GB, 16GB, or even more RAM is common in Android devices, which helps users run relatively large language models on-device.

                                                                                                                                                                                                                                                                                                                                                                                                                                      • DrBenCarson

                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 3:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                        Apple have been putting ML models running on their own silicon into production for far longer than any of their competitors. They publish some of the most innovative ML research

                                                                                                                                                                                                                                                                                                                                                                                                                                        They also own distribution to the wealthiest and most influential people in the world

                                                                                                                                                                                                                                                                                                                                                                                                                                        Don’t get lost in recency bias

                                                                                                                                                                                                                                                                                                                                                                                                                                • ksec

                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 2:26 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                  The previous M2 Ultra topped out at 192GB of memory (or 128GB for the Pro and some other M3 models), which I think is plenty for 99.9% of professional tasks.

                                                                                                                                                                                                                                                                                                                                                                                                                                  They've now bumped it to 512GB, along with an insane price tag of $9,499 for the 512GB Mac Studio. I am pretty sure this is some AI gold rush.

                                                                                                                                                                                                                                                                                                                                                                                                                                    • InTheArena

                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 2:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                      Every single AI shop on the planet is trying to figure out whether there is enough compute to make this a reasonable AI path. If the answer is yes, that $10k is an absolute bargain.

                                                                                                                                                                                                                                                                                                                                                                                                                                        • ZeroTalent

                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 9:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                          No, because there is no CUDA. We have fast and cheap alternatives to NVIDIA, but they do not have CUDA. This is why NVIDIA has 90% margins on its hardware.

                                                                                                                                                                                                                                                                                                                                                                                                                                            • jauntywundrkind

                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 12:07 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                              CUDA is simply not important for modern vLLM and many, many others. DeepSeek V3 works great on SGLang. https://www.amd.com/en/developer/resources/technical-article...

                                                                                                                                                                                                                                                                                                                                                                                                                                              Can you do absolutely everything? No. But most models will run or retrain fine now without CUDA. This premise keeps getting recycled from the past, even as that past has grown ever more distant.

                                                                                                                                                                                                                                                                                                                                                                                                                                          • Spooky23

                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 3:17 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                            > that 10k is a absolute bargain

                                                                                                                                                                                                                                                                                                                                                                                                                                            The higher-end NVidia workstation boxes won’t run well off normal 20-amp circuits. So you need to move them to a computer room (whoops, ripped those out already) or spend months getting dedicated circuits run to office spaces.

                                                                                                                                                                                                                                                                                                                                                                                                                                              • magnetometer

                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 3:58 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                Didn't really think about this before, but that seems to be mainly an issue in Northern / Central America and Japan. In Germany, for example, typical household plugs are 16A at 230V.

                                                                                                                                                                                                                                                                                                                                                                                                                                                • someothherguyy

                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 6:14 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                  In the US, normal circuits aren't always 20A, especially in residential buildings, where they are more commonly 15A in bedrooms and offices.

                                                                                                                                                                                                                                                                                                                                                                                                                                                  https://en.wikipedia.org/wiki/NEMA_connector

                                                                                                                                                                                                                                                                                                                                                                                                                                                    • theturtle32

                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 8:58 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                      While technically true, NEMA 5-15R receptacles are rated for use on 20A circuits, and receptacle circuits are almost always 20A, in modern construction at least. Older builds may not be, of course.

                                                                                                                                                                                                                                                                                                                                                                                                                                                      That said, if your load is going to be a continuous load drawing 80% of the rated amperage, it really should be a NEMA 5-20 plug and receptacle, the one where one of the prongs is horizontal instead of vertical. Swapping out the receptacle for one that accepts a NEMA 5-20P plug is like $5.

                                                                                                                                                                                                                                                                                                                                                                                                                                                      If you are going to actually run such a load on a 20A circuit with multiple receptacles, you will want to make sure you're not plugging anything substantial into any of the other receptacles on that circuit. A couple LED lights are fine. A microwave or kettle, not so much.
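                                                                                                                                                                                                                                                                                                                                                                                                                                                      That 80% continuous-load rule is easy to sanity-check with napkin math (assuming the US nominal 120V; the function name here is just for illustration):

```python
# Sketch of the NEC 80% continuous-load rule: a continuous load
# should draw no more than 80% of the breaker's rated amperage.

def continuous_load_watts(breaker_amps: float, volts: float = 120.0) -> float:
    """Max continuous draw allowed on a circuit, in watts."""
    return breaker_amps * 0.8 * volts

print(continuous_load_watts(15))  # 1440.0 W on a 15A circuit
print(continuous_load_watts(20))  # 1920.0 W on a 20A circuit
```

                                                                                                                                                                                                                                                                                                                                                                                                                                                      So a workstation drawing well over ~1.4kW sustained is already past what a 15A circuit should carry continuously, which is why the 5-20 receptacle swap comes up.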

                                                                                                                                                                                                                                                                                                                                                                                                                                                        • andrewmcwatters

                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 9:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                          > and circuits for receptacles are almost always 20A circuits, in modern construction at least.

                                                                                                                                                                                                                                                                                                                                                                                                                                                          This is not true. Standard builds (a majority) still use 15-amp circuits where 20-amp is not required by NEC.

                                                                                                                                                                                                                                                                                                                                                                                                                                                      • hervature

                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 8:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                        To clarify, the circuit is almost always 20A, with 15A used for lighting. However, the outlets themselves are almost always 15A because you put multiple outlets on a single circuit. You'll see very few 20A outlets (which have a T-shaped prong) in residential.

                                                                                                                                                                                                                                                                                                                                                                                                                                                • 827a

                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                  Is this actually true? Were people doing this with the 192GB of the M2 Ultra?

                                                                                                                                                                                                                                                                                                                                                                                                                                                  I'm curious to learn how AI shops actually do model development, if anyone has experience there. What I imagined was: it's all in the "cloud" (or their own infra), and the local machine doesn't matter. And if it did matter, the Nvidia software stack is too important, especially given that a 512GB M3 Ultra config costs $10,000+.

                  • DrBenCarson

                    yesterday at 3:47 PM

                    You’re largely correct for training models.

                    Where this hardware shines is inference (aka developing products on top of the models themselves).

                      • 827a

                        yesterday at 4:13 PM

                        True. But with Project Digits supposedly around the corner, which supposedly costs $3,000 and supports ConnectX and runs Blackwell; what's the over-under on just buying two of those at about half the price of one maxed M3 Ultra Mac Studio?

                          • DrBenCarson

                            yesterday at 8:23 PM

                            And how much VRAM will Project Digits have?

                              • 827a

                                today at 4:20 AM

                                128GB each, so two would have 256GB.

                                It's half that of a max-spec Mac Studio, but also half the price and eight times faster memory speed. Realistically, which open-source LLMs does 512GB over 256GB of memory unlock? My understanding is that the true bleeding-edge ones like R1 won't even handle 512GB well, especially with the anemic memory speed.
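
A quick way to answer the "what does 512GB unlock" question is napkin math on quantized model footprints. A sketch in Python, where the ~4.5 bits/weight figure (typical of Q4_K_M-style GGUF quants) and the ~15% headroom for KV cache and runtime are assumptions:

```python
# Rough quantized-model footprints vs. unified-memory budgets.
def q4_size_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate on-disk/in-memory size of a ~4-bit quant, in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

models = {"Llama-3.1-70B": 70, "Llama-3.1-405B": 405, "DeepSeek-R1 (671B MoE)": 671}
for name, params in models.items():
    size = q4_size_gb(params)
    # Leave ~15% headroom for KV cache, OS, and runtime overhead.
    fits_256 = size < 256 * 0.85
    fits_512 = size < 512 * 0.85
    print(f"{name}: ~{size:.0f} GB  fits in 256 GB: {fits_256}  fits in 512 GB: {fits_512}")
```

By this estimate a 70B quant fits either box, while 405B-class dense models and the R1 MoE are exactly the tier that 512GB unlocks over 256GB; whether they run at a usable speed is a separate (bandwidth) question.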

                                  • seanmcdirmid

                                    today at 4:23 AM

                                    We really should see what happens when Project Digits is finally released. Also, I would love it if NVIDIA decided to get into the CPU/GPU + unified memory space.

                                    I can't imagine the M3 Ultra doing well on a model that loads into ~500G, but they should be a blast on 70b models (well, twice as fast as my M3 Max at least) or even a heavily quantized 400b model.
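
Batch-size-1 token generation is mostly memory-bandwidth-bound, so you can sketch a decode-speed ceiling as bandwidth divided by bytes read per token. A rough estimate in Python, where the ~800 GB/s M3 Ultra bandwidth, 60% achievable efficiency, and ~4.5 bits/weight quant are all assumptions:

```python
# Bandwidth-bound decode ceiling: tok/s ~= effective bandwidth / bytes touched per token.
def decode_tok_s(active_params_b: float, bw_gb_s: float,
                 bits_per_weight: float = 4.5, efficiency: float = 0.6) -> float:
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bw_gb_s * 1e9 * efficiency / bytes_per_token

# DeepSeek-R1 is a 671B MoE with ~37B active params per token, so once the
# weights fit in memory it decodes roughly like a ~37B dense model.
print(f"R1 (37B active): ~{decode_tok_s(37, 800):.0f} tok/s")
print(f"Dense 70B:       ~{decode_tok_s(70, 800):.0f} tok/s")
```

Note this only bounds decode; prefill/prompt processing is compute-bound and scales with the chip's TFLOPs instead.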

                  • internetter

                    yesterday at 3:10 PM

                    No AI shop is buying Macs to use as servers. Apple should really release a server macOS distribution, maybe even rackable M-series chips. I believe they have one internally.

                      • jerjerjer

                        yesterday at 7:34 PM

                        Why would any business pay the Apple Tax for a backend server product?

                      • NorwegianDude

                        yesterday at 3:26 PM

                        Not much to figure out. It's 2x M4 Max, so you need 100 of these to match the TOPS of even a single consumer card like the RTX 5090.

                          • jeffhuys

                            yesterday at 3:32 PM

                            Sure, but then you have models like DeepSeek (400GB) that won't fit on a consumer card.

                              • NorwegianDude

                                yesterday at 4:17 PM

                                True. But an AI shop doesn't care about that. They get more performance for the money by going for multiple Nvidia GPUs. I have 512 GB of RAM on my PC too, with 8 memory channels, but it's not like it's usable for AI workloads. It's nice to have large amounts of RAM, but increasing the batch size during training isn't going to help when compute is the bottleneck.

                          • wpm

                            yesterday at 5:39 PM

                            It's 2x M3 Max

                          • alberth

                            yesterday at 6:33 PM

                            > It's 2x M4 Max

                            Not exactly, though.

                            This can have 512GB of unified memory; 2x M4 Max can only have 128GB total (64GB each).

                          • DrBenCarson

                            yesterday at 3:46 PM

                            Now do VRAM

                          • HPsquared

                            yesterday at 2:55 PM

                            LLMs easily use a lot of RAM, and these systems are MUCH, MUCH cheaper (though slower) than a GPU setup with the equivalent RAM.

                            A 4-bit quantization of Llama-3.1 405b, for example, should fit nicely.

                              • MR4D

                                today at 5:55 AM

                                Remember, that RAM is also VRAM, so 1/2 terabyte of VRAM ain’t cheap. By comparison, Apple is a downright bargain!

                                  • tobyhinloopen

                                    today at 9:17 AM

                                    It doesn't have the bandwidth of dedicated GPU VRAM.

                              • segmondy

                                yesterday at 3:49 PM

                                The question will be how it performs. I suspect DeepSeek and Llama 405B demonstrated the need for larger memory. Right now folks could build an Epyc system with that much RAM or more to run DeepSeek at about 6 tokens/sec for a fraction of that cost. However, not everyone is a tinkerer, so there's a market for this among those who don't want to be bothered. You say "AI gold rush" like it's a bad thing; it's not.
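
The ~6 tokens/sec Epyc figure is consistent with bandwidth-bound napkin math. A sketch in Python, where the channel count, DDR5-4800 speed, and 30% sustained-efficiency figure are assumptions:

```python
# Why a DDR5 Epyc box lands around ~6 tok/s on DeepSeek-R1 at ~4-bit quant.
channels = 12                          # e.g. one-socket Genoa: 12 x DDR5-4800 (assumed)
gb_s_per_channel = 38.4                # DDR5-4800: 4800 MT/s * 8 bytes/transfer
peak_bw = channels * gb_s_per_channel  # ~460 GB/s theoretical
effective_bw = peak_bw * 0.3           # CPU inference rarely sustains more (assumed)
bytes_per_token = 37e9 * 4.5 / 8       # ~37B active params at ~4.5 bits/weight
print(f"~{effective_bw * 1e9 / bytes_per_token:.1f} tok/s")  # prints ~6.6 tok/s
```

The same arithmetic explains why adding more RAM without more bandwidth hits diminishing returns: capacity decides what loads, bandwidth decides how fast it decodes.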

                                  • bloppe

                                    yesterday at 4:59 PM

                                    Big question: does the $10k price already reflect Trump's tariffs on China? Or will the price rise further still?

                                      • dwighttk

                                        yesterday at 2:29 PM

                                        Maybe 0.1% of tasks need this much RAM, so why are they charging so much?

                                          • cjbgkagh

                                            yesterday at 2:46 PM

                                            I don't need 512GB of RAM, but the moment I do I'm certain I'll have bigger things to worry about than a $10K price tag.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • almostgotcaught

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 9:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          This is Pascal's wager written in terms of ... RAM. The original didn't make sense and neither does this iteration.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • cjbgkagh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 10:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I would still wait until I need it before buying it


                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • pier25

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 2:34 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Because the minority that needs that much RAM can't work without it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        In the media composing world they use huge orchestral templates with hundreds and hundreds of tracks with millions of samples loaded into memory.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • agloe_dreams

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 2:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Because the .1% is who will buy it? I mean, yeah, supply and demand. High demand in a niche with no supply currently means large margins.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I don't think anyone commercially offers nearly this much unified memory or NPU/GPUs with anything near 512GB of memory.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • madeofpalk

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 2:53 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Maybe because .1% of tasks need this RAM, it attracts a .1% price tag

                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • Sharlin

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 4:01 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              It enables the use of giant AI models on a personal computer. Might not run too fast though. But at least it's possible at all.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • tobyhinloopen

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 9:17 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  What is stopping us from running these models on a PC with 512GB RAM?
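The short answer upthread is memory bandwidth: at bs=1, decode speed is roughly bandwidth-bound. A hedged napkin-math sketch, using assumed (not official) numbers: ~819 GB/s for the M3 Ultra, ~90 GB/s for a typical dual-channel DDR5 desktop, and the ~37B active params at ~4.5 bits/param (Q4_K_M) for the MoE mentioned earlier in the thread:

```python
# Napkin math: bs=1 token generation is roughly memory-bandwidth-bound,
# since every active weight must be streamed from RAM once per token.
# All numbers below are assumptions, not measured figures.
M3_ULTRA_BW_GBS = 819      # assumed M3 Ultra memory bandwidth, GB/s
DDR5_DUAL_CH_GBS = 90      # assumed dual-channel DDR5 desktop, GB/s
ACTIVE_PARAMS = 37e9       # MoE active params per token (from upthread)
BITS_PER_PARAM = 4.5       # rough Q4_K_M average

bytes_per_token = ACTIVE_PARAMS * BITS_PER_PARAM / 8  # ~21 GB per token
for name, bw in [("M3 Ultra", M3_ULTRA_BW_GBS), ("PC DDR5", DDR5_DUAL_CH_GBS)]:
    tok_s = bw * 1e9 / bytes_per_token  # theoretical upper bound
    print(f"{name}: ~{tok_s:.0f} tok/s upper bound")
```

Under these assumptions the unified-memory box gets a ~39 tok/s ceiling while a plain DDR5 PC with 512GB tops out around ~4 tok/s, before any real-world utilization losses.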

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • regularfry

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 2:33 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                The narrower the niche, the more you can charge.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Spooky23

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  With all things semiconductor, low volume = higher cost (and margin).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  The people who need the crazy resource can tie it to some need that costs more. You’d spend like $10k running a machine with similar capabilities in AWS in a month.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • A4ET8a8uTh0_v2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 2:34 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     I think the answer is because they can (there is a market for it). The benefit to a crazy person like me is that with this addition, I might be able to grab the 128GB version at a lower price.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • znpy

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 2:34 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       because they know there will be a large number of people who don't need this much RAM but will buy it anyway.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • rewtraw

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 2:34 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         because that's how much it's worth

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • internetter

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 3:14 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             It's not, though. For consumer computers somewhere in the $1k-4k range there's nothing better. But for the price of 512GB of RAM you could buy that plus a crazy CPU plus 2x 5090s by building your own. The market fit is "needs power; needs/wants macOS; has no budget," which is incredibly niche. In terms of raw compute output there's absolutely no chance this is providing bang for buck.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • kjreact

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 3:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                2x 5090s would only give you 64GB of memory to work with re:LLM workloads, which is what people are talking about in this thread. The 512GB of system RAM you’re referring to would not be useful in this context. Apple’s unified memory architecture is the part you’re missing.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • DrBenCarson

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:49 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  How much VRAM do you get on those 2x 5090s?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  How much would it cost to get up to 512gb?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • jeffhuys

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:34 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Do you understand that it's UNIFIED RAM, so it doubles as vRAM? I would love to know what computer you can build for <10k with 0.5TB of VRAM.
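The cost comparison in this subthread can be sketched with some rough arithmetic, assuming 32GB per RTX 5090 and roughly its ~$2,000 MSRP (street prices run higher):

```python
# Rough cost to match 512GB of (V)RAM with discrete GPUs.
# Assumed numbers: RTX 5090 = 32GB VRAM at ~$2,000 MSRP.
GPU_VRAM_GB = 32
GPU_MSRP_USD = 2000
TARGET_GB = 512

gpus_needed = -(-TARGET_GB // GPU_VRAM_GB)  # ceiling division
gpu_cost = gpus_needed * GPU_MSRP_USD
print(f"{gpus_needed} GPUs, ~${gpu_cost:,} in cards alone")
```

That works out to 16 cards and roughly $32k before power, chassis, and interconnect, which is the point being made about the $10k unified-memory configuration.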

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • lauritz

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 2:34 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       They're updating the Studio to M3 Ultra now, so the M4 Ultra can presumably go directly into the Mac Pro at WWDC? Interesting timing. Maybe they'll change the form factor of the Mac Pro, too?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Additionally, I would assume this is a very low-volume product, so it being on N3B isn't a dealbreaker. At the same time, these chips must be very expensive to make, so tying them with luxury-priced RAM makes some kind of sense.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • lauritz

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 5:00 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Interestingly, Apple apparently confirmed to a French website that M4 lacks the interconnect required to make an "Ultra" [0][1], so contrary to what I originally thought, they maybe won't make this after all? I'll take this report with a grain of salt, but apparently it's coming directly from Apple.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Makes it even more puzzling what they are doing with the M2 Mac Pro.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          [0] https://www.numerama.com/tech/1919213-m4-max-et-m3-ultra-let...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          [1] More context on Macrumors: https://www.macrumors.com/2025/03/05/apple-confirms-m4-max-l...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • layer8

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:00 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Apple says that not every generation will get an “Ultra” variant: https://arstechnica.com/apple/2025/03/apple-announces-m3-ult...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • agloe_dreams

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 3:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              My understanding was that Apple wanted to figure out how to build systems with multi-SOCs to replace the Ultra chips. The way it is currently done means that the Max chips need to be designed around the interconnect. Theoretically speaking, a multi-SOC setup could also scale beyond two chips and into a wider set of products.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • rbanffy

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 6:53 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Ultra is already two big M3 chips coupled through an interposer. Apple is curiously not going the way of chiplets like the big CPU crowd is.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • aurareturn

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 6:37 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I'm not sure multi-SoC is feasible, because making two GPUs appear to the OS as one big GPU is very hard if the SoCs are physically separate.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • raydev

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 5:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Honestly I don't think we'll see the M4 Ultra at all this year. That they introduced the Studio with an M3 Ultra tells me M4 Ultras are too costly or they don't have capacity to build them.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  And anyway, I think the M2 Mac Pro was Apple asking customers "hey, can you do anything interesting with these PCIe slots? because we can't think of anything outside of connectivity expansion really"

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  RIP Mac Pro unless they redesign Apple Silicon to allow for upgradeable GPUs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • jsheard

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 2:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    > Maybe they'll change the form factor of the Mac Pro, too?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Either that or kill the Mac Pro altogether; the current iteration is such a half-assed design, and such blatantly terrible value compared to the Studio, that it feels like an end-of-the-road product just meant to tide PCIe users over until they can migrate everything to Thunderbolt.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    They recycled a design meant to accommodate multiple beefy GPUs even though GPUs are no longer supported, so most of the cooling and power delivery is vestigial. Plus, the PCIe expansion was quietly downgraded: Apple Silicon doesn't have a ton of PCIe lanes, so the slots are heavily oversubscribed with PCIe switches.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • lauritz

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 2:46 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I agree. Nonetheless, I agree with Siracusa that the Mac Pro makes sense as a "halo car" in the Mac lineup.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I just find it interesting that you can currently buy a M2 Ultra Mac Pro that is weaker than the Mac Studio (for a comparable config) at a higher price. I guess it "remains a product in their lineup" and we'll hear more about it later.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Additionally: If they wanted to scrap it down the road, why would they do this now?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • madeofpalk

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 2:56 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            The current Mac Pro is not a "halo car". It's a large USB-A dongle for a Mac Studio.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • crowcroft

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 2:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Agree with this, and it doesn't seem like it's a priority for Apple to bring the kind of expandability back any time soon.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Maybe they can bring back the trash can.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • jsheard

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 3:08 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Isn't the Mac Studio the new trash can? I can't think of how a non-expandable Mac Pro could be meaningfully different to the Studio unless they introduce an even bigger chip above the Ultra.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • xp84

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 5:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  > Mac Studio the new trash can?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Indeed, and tbh it really commits even more to the non-expandability that the Trashcan's designers seemed to be going for. After all, the Trashcan at least had replaceable RAM and storage. The Mac Studio has proprietary storage modules for no reason aside from Apple's convenience/profits (and of course the 'integrated' RAM which I'll charitably assume was done for altruistic reasons because of how it's "shared.")

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  The difference is that today users are accepting modern Macs where they rejected the Trashcan. I think it's because Apple's practices have become more widespread anyway*, and certain parts of the strategy like the RAM thing at least have upsides. That, and the thermals are better because the Trashcan's thermal design was not fit for purpose.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  * I was trying to fix a friend's nice Lenovo laptop recently -- it turned out to just have some bad RAM, but when we opened it up we found it was soldered :(

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • crowcroft

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:11 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Oh yeah, I wasn't clear: I just meant bring back the design. Agree the Studio basically is the trash can.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • pier25

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 2:45 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I've always maintained that the M2 Mac Pro was really a dev kit for manufacturers of PCI parts. It's such a meaningless product otherwise.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • toasterlovin

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 5:17 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  IMO they had plans for a Mac Pro chip that didn’t work out, so they released the M2 version to let their Mac Pro customers know that they’re still committed to the product in the Apple Silicon era.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • newsclues

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 8:49 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                The Mac Pro could exist as a PCIe expansion slot storage case that accepts a logic board from the frequently updated consumer models. Or multiple Mac Studio logic boards all in one case with your expansion cards all working together.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • TheTxT

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 2:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            512GB unified memory is absolutely wild for AI stuff! Compared to how many NVIDIA GPUs you would need, the pricing looks almost reasonable.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • InTheArena

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 2:33 PM

                                                 A server with 512GB of high-bandwidth, GPU-addressable RAM is probably a six-figure expenditure. If memory is your constraint, this is absolutely the server for you.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                (sorry, should have specified that the NPU and GPU cores need to access that ram and have reasonable performance). I specified it above, but people didn't read that :-)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • Numerlor

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 2:36 PM

                                                     A basic brand-new server can easily do 512GB. Not as fast as soldered memory, but it should run maybe mid-to-high five figures.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • la_oveja

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 2:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        5 figures? can be done in 6k https://x.com/carrigmat/status/1884244369907278106

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • InTheArena

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 2:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            That's CPU only memory, not high bandwidth, and not addressable by the GPU.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • KeplerBoy

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 3:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                addressable is a weird choice of words here.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                CUDA has had managed memory for a long time now. You absolutely can address the entire host memory from your GPU. It will fetch it, if it's needed. Not fast, but addressable.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • p_ing

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 4:32 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Windows has been doing this since what... the AGP era? Though this is a function of the ISA rather than the OS.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • jeffbee

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 2:55 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  There isn't anything particularly high-bandwidth about Apple's DDR5 implementation, either. They just have a lot of channels, which is why I compared it to a 24-channel EPYC system. I agree that their integrated GPU architecture hits a unique design point that you don't get from nvidia, who prefer to ship smaller amounts of very different kinds of memory. Apple's architecture may be more suited to some workloads but it hasn't exactly grabbed the machine learning market.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • buildbot

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 4:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      M3 Ultra has 819GB/s, and a single epyc cpu with 12 channels has 460GB/s. As far as I know, llama.cpp and friends don’t scale across multiple sockets so you can’t use a dual socket Turin system to match the M3 Ultra.

                                                                       Also, 32GB DDR5 RDIMMs are ~$200 each, so that's ~$5K for 24 right there. Then you need two CPUs at ~$1K each for the cheapest option, plus a motherboard for another ~$1K. So for ~$8K (more, once you add a case, power supply, and cooling!), you get a system with about half the memory bandwidth, much higher power consumption, and a much larger footprint.
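For what it's worth, the parts arithmetic in that estimate works out roughly as follows (prices are the ballpark figures from the comment, not real quotes):

```python
# Rough parts estimate for a dual-socket EPYC build with 768GB of RAM.
# All prices are the comment's ballpark assumptions, not actual quotes.
rdimm_price, rdimm_count = 200, 24   # 32GB DDR5 RDIMMs, 12 channels per socket
cpu_price, cpu_count = 1_000, 2      # cheapest 12-channel EPYCs
motherboard = 1_000

total = rdimm_price * rdimm_count + cpu_price * cpu_count + motherboard
memory_gb = 32 * rdimm_count

print(f"{memory_gb} GB for ~${total:,}")  # 768 GB for ~$7,800
```

Case, PSU, and cooling push it past $8K, before considering power draw.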

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • Rohansi

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 1:33 AM

                                                                           The bandwidth difference likely doesn't matter, though. Benchmarks of Apple Silicon show that compute becomes the bottleneck well before the bandwidth runs out, even when fully loading all CPU cores, the GPU, etc.
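For single-stream LLM decoding specifically, the usual back-of-envelope puts the bandwidth ceiling at roughly bandwidth divided by bytes of weights read per token; whether real hardware gets near that ceiling is exactly the compute question raised here. A sketch with illustrative numbers (the 37B-active, 4-bit, 70%-efficiency figures are assumptions, not measurements):

```python
# Hedged roofline sketch: single-stream decoding is usually treated as
# memory-bound, so tokens/s <= usable bandwidth / bytes read per token.
def tokens_per_sec(bw_gbs: float, active_params_b: float,
                   bytes_per_param: float, efficiency: float = 0.7) -> float:
    """Bandwidth-bound token rate, assuming every active weight is read once per token."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bw_gbs * 1e9 * efficiency / bytes_per_token

# e.g. a 4-bit MoE with ~37B active parameters on an 819 GB/s machine:
print(round(tokens_per_sec(819, 37, 0.5), 1))  # ~31.0 tok/s ceiling
```

If measured throughput falls well short of that ceiling, compute (or software) is the limiter rather than bandwidth.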

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • adrian_b

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 5:43 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Partial correction, an Epyc CPU with 12 channels has 576 GB/s, i.e. DDR5-6000 x 768 bits. That is 70% of the Apple memory bandwidth, but with possibly much more memory (768 GB in your example).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            You do not need 2 CPUs. If however you use 2 CPUs, then the memory bandwidth doubles, to 1152 GB/s, exceeding Apple by 40% in memory bandwidth. The cost of the memory would be about the same, by using 16 GB modules, but the MB would be more expensive and the second CPU would add to the price.
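As a quick check of the figures in this subthread (peak numbers, assuming standard 64-bit DDR5 channels):

```python
# Peak DDR5 bandwidth: channels * transfers/s * bytes per transfer.
def ddr5_bandwidth_gbs(channels: int, mts: int, bits_per_channel: int = 64) -> float:
    """Peak bandwidth in GB/s for `channels` DDR5 channels at `mts` MT/s."""
    return channels * mts * (bits_per_channel / 8) / 1_000

one_socket = ddr5_bandwidth_gbs(12, 6000)  # 576.0 GB/s, 12ch DDR5-6000
two_sockets = 2 * one_socket               # 1152.0 GB/s aggregate
m3_ultra = 819                             # GB/s, Apple's quoted figure

print(one_socket, two_sockets, round(two_sockets / m3_ultra, 2))
```

So one socket is ~70% of the M3 Ultra figure, and two sockets are ~1.4x it, with the NUMA caveats discussed below this.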

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • buildbot

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 6:39 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Ah, I didn’t realize they’d upped the memory bandwidth to DDR5-6000 (vs 4800), thanks for the correction!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                The memory bandwidth does not double, I believe. See this random issue for a graph that has single/dual socket measurements, there is essentially no difference: https://github.com/abetlen/llama-cpp-python/issues/1098

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Perhaps this is incorrect now, but I also know with 2x 4090s you don’t get higher tokens per second than 1x 4090 with llama.cpp, just more memory capacity.

                                                                                 (All of this applies only to llama.cpp; I have no experience with other software or how its memory bandwidth may scale across sockets.)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • adrian_b

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:33 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    The memory bandwidth does double, but in order to exploit it the program must be written and executed with care in the memory placement, taking into account NUMA, so that the cores should access mostly memory attached to the closest memory controller and not memory attached to the other socket.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    With a badly organized program, the performance can be limited not by the memory bandwidth, which is always exactly double for a dual-socket system, but by the transfers on the inter-socket links.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Moreover, your link is about older Intel Xeon Sapphire Rapids CPUs, with inferior memory interfaces and with more quirks in memory optimization.
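A toy model of the NUMA effect described here; the inter-socket link figure below is a made-up assumption for illustration, since usable link bandwidth varies by platform and configuration:

```python
# Toy NUMA model: total throughput T is capped by the local memory
# controllers (2 * local_bw) AND by the inter-socket link, since
# remote_fraction * T must fit through the link.
def effective_bw(local_bw: float, link_bw: float, remote_fraction: float) -> float:
    if remote_fraction == 0.0:
        return 2 * local_bw  # perfectly NUMA-aware placement
    return min(2 * local_bw, link_bw / remote_fraction)

local_bw = 576.0  # GB/s per socket (12ch DDR5-6000)
link_bw = 200.0   # GB/s, ASSUMED usable inter-socket bandwidth

print(effective_bw(local_bw, link_bw, 0.0))  # NUMA-aware: 1152.0
print(effective_bw(local_bw, link_bw, 0.5))  # half remote: 400.0
```

With naive interleaved placement the link, not the DIMMs, sets the ceiling, which is consistent with the flat dual-socket llama.cpp numbers linked above.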

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • buildbot

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:58 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Yes, I believe in theory a correctly written program could scale across sockets, depending on the problem at hand.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        But where is your data? For llama.cpp? For whatever dual socket CPU system you want. That’s all I am claiming.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • adrian_b

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 8:33 PM

Googling for what you asked immediately turned up this discussion:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            https://github.com/ggml-org/llama.cpp/discussions/11733

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            about the scaling of llama.cpp and DeepSeek on some dual-socket AMD systems.

While it was rather tricky, after many experiments they obtained nearly double the single-socket speed on two sockets, especially on AMD Turin.

However, if you look at the actual benchmark data, the result must be well below what is really possible: their test AMD Turin system (named P1 there) had only two thirds of its memory channels populated, so performance limited by memory bandwidth could be increased by 50%, and it used 16-core CPUs, so performance limited by computation could be increased roughly 10x.
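A rough way to sanity-check what bandwidth-bound decode speed should look like: tokens/s is approximately effective bandwidth divided by bytes read per token. The numbers below are illustrative assumptions (~37B active parameters for a DeepSeek-R1-class MoE, ~0.6 bytes/param for a Q4-style quant, and a guessed fraction of peak bandwidth actually achieved):

```python
# Napkin estimate of bandwidth-bound decode speed for an MoE model.
# Every decoded token must stream the active weights through memory once.
def decode_toks_per_s(bw_gb_s, active_params_b, bytes_per_param, efficiency):
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return efficiency * bw_gb_s * 1e9 / bytes_per_token

print(decode_toks_per_s(819, 37, 0.6, 0.7))   # M3 Ultra-class bandwidth
print(decode_toks_per_s(1152, 37, 0.6, 0.7))  # dual-socket EPYC-class bandwidth
```

Under these assumptions both systems land in the 20-40 tok/s ballpark, consistent with the estimates upthread; the dual-socket figure only materializes if NUMA placement actually delivers the combined bandwidth.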

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • buildbot

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 11:57 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Cool, I didn’t find that one! Thanks.

A single 192-core Epyc is $11k by itself, so I'd probably go for the simpler integrated M3 Ultra solution.


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • aurareturn

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 6:39 PM

CPUs typically don't have enough compute. You'll hit a compute bottleneck before a bandwidth bottleneck if the model is large enough.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Time to first token, context length, and tokens/s are significantly inferior on CPUs when dealing with larger models even if the bandwidth is the same.
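The compute-vs-bandwidth question can be framed in roofline terms: single-stream decode does roughly 2 FLOPs per weight per token, so its arithmetic intensity is about 2 / bytes-per-param FLOPs per byte; whichever is smaller between that and the machine's FLOPs-per-byte balance sets the bottleneck. A hedged sketch with illustrative figures:

```python
# Roofline check: is single-stream (bs=1) decode compute- or bandwidth-bound?
# Simplification: ~2 FLOPs per weight per token, weights streamed once per token.
def bottleneck(peak_tflops, bw_gb_s, bytes_per_param=0.6):
    machine_balance = peak_tflops * 1e12 / (bw_gb_s * 1e9)  # FLOPs per byte
    intensity = 2 / bytes_per_param                          # FLOPs per byte
    return "bandwidth-bound" if intensity < machine_balance else "compute-bound"

print(bottleneck(43, 819))  # M3 Ultra-ish FP16 figure mentioned upthread
print(bottleneck(1, 460))   # a hypothetical slower CPU on the same bandwidth
```

This is why the same bandwidth can still leave a weak CPU compute-bound, and why prefill (much higher arithmetic intensity than decode) hurts CPUs far more than token generation does.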

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • adrian_b

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:45 PM

One big server CPU can have a computational capability similar to a mid-range desktop NVIDIA GPU.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      When used for ML/AI applications, a consumer GPU has much better performance per dollar.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Nevertheless, when it is desired to use much more memory than in a desktop GPU, a dual-socket server can have higher memory bandwidth than most desktop GPUs, i.e. more than an RTX 4090, and a computational capability that for FP32 could exceed an RTX 4080, but it would be slower for low-precision data where the NVIDIA tensor cores can be used.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • kiratp

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 12:18 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Nobody is using FP32 for AI.

INT8, INT4, FP8, and soon FP4.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • adrian_b

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 8:22 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              True, but I have compared the FP32 used in graphics computations because for that the throughput information is easily available.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Both CPUs (with the BF16 instructions and with the VNNI instructions for INT8 inference) and the GPUs have a higher throughput for lower precision data types than for FP32, but the exact acceleration factors are hard to find.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              The Intel server CPUs have the advantage vs. AMD that they also have the AMX matrix instructions, which are intended to compete for inference applications with the NVIDIA tensor cores, but the Intel CPUs are much more expensive for a number of cores big enough to be competitive with GPUs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • kiratp

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 12:16 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Now compare the FLOPs

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • Numerlor

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 3:40 PM

Ah, seems like I remembered the price for a higher-tier CPU, which can cost $6k on its own.

Thinking about it, you can get a decent 256GB on consumer platforms now too, but the speed will be a bit crap and you'd need to make sure the platform fully supports ECC UDIMMs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • energy123

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 3:23 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                What is the memory bandwidth to the CPU cores? Is it competitive with 8-channel DDR5 servers for non-GPU compute?
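For a back-of-envelope comparison, peak DDR5 bandwidth is just transfer rate (MT/s) × 8 bytes per transfer × channel count, against the M3 Ultra's advertised 819 GB/s:

```python
# Peak theoretical DDR5 bandwidth: each transfer moves 8 bytes per channel.
def ddr5_bw_gb_s(mt_per_s, channels):
    return mt_per_s * 8 * channels / 1000  # GB/s

print(ddr5_bw_gb_s(5600, 8))   # 8-channel DDR5-5600 server: 358.4 GB/s
print(ddr5_bw_gb_s(6000, 12))  # 12-channel DDR5-6000 EPYC socket: 576 GB/s
```

So an 8-channel DDR5 server sits well below the M3 Ultra's figure; it takes a 12-channel socket (or two) to match or exceed it, and real sustained bandwidth will be somewhat below these theoretical peaks.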


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • jeffbee

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 2:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    That doesn't sound right. The marginal cost of +768GB of DDR5 ECC memory in an EPYC system is < $5k.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • InTheArena

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 2:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        GPU accessible RAM.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • adrian_b

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:07 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            In a dual-socket EPYC system, the memory bandwidth is higher than in this Apple system by 40% (i.e. 1152 GB/s), and the memory capacity can be many times higher.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Like another poster said, 768 GB of ECC RDIMM DDR5-6000 costs around $5000.

Any program whose performance is limited by memory bandwidth, as is frequently the case for inference, will run significantly faster in such an EPYC server than in the Apple system, even when running on the CPU.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Even for computationally-limited programs, the difference between server CPUs and consumer GPUs is not great. One Epyc CPU may have about the same number of FP32 execution units as an RTX 4070, while running at a higher clock frequency (but it lacks the tensor units of an NVIDIA GPU, which can greatly accelerate the execution, where applicable).
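A rough sanity check for the bandwidth-bound case: single-stream decode speed is bounded above by memory bandwidth divided by the bytes read per token. A napkin sketch, using assumed (not measured) figures: 819 GB/s for the M3 Ultra, 1152 GB/s for a dual-socket 12-channel DDR5-6000 EPYC, and DeepSeek-R1's ~37B active parameters at the ~0.6 bytes/param implied by a 404 GB Q4_K_M quant of 671B params:

```python
# Upper bound for single-stream decode: tokens/s <= bandwidth / bytes per token.
# All numbers are assumptions for napkin math, not measurements.

ACTIVE_PARAMS = 37e9               # DeepSeek-R1 MoE: ~37B parameters activated per token
BYTES_PER_PARAM = 404e9 / 671e9    # ~0.60 bytes/weight (404 GB quant of 671B params)
bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM

for name, bw in [("M3 Ultra (819 GB/s)", 819e9), ("2x EPYC DDR5-6000 (1152 GB/s)", 1152e9)]:
    print(f"{name}: <= {bw / bytes_per_token:.0f} tok/s")
```

Real throughput lands well below these ceilings (KV-cache reads, sub-100% bandwidth utilization), which is consistent with the 20-50 tok/s estimates elsewhere in the thread.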

                  • sgt

                    today at 8:24 AM

                    Anecdotal, but it seems like the big EPYC rigs are getting very low tokens per second, and not even consistently. They are strained, whereas the M3 Ultra can likely sustain 40-50 tokens/s based on previous stats.

                    I'd like to see some proper benchmarking on this, but it looks like the Apple systems might just be extremely good value if you want to run the large DeepSeek model.

                  • aurareturn

                    yesterday at 6:40 PM

                      Any program whose performance is limited by memory bandwidth, as is frequently the case for inference, will run significantly faster on such an EPYC server than on the Apple system, even when running on the CPU.

                    Source on this? CPUs would be very compute constrained.

                      • adrian_b

                        yesterday at 8:15 PM

                        According to Apple, the GPU of the M3 Ultra has 80 graphics cores, which should mean 10240 FP32 execution units, the same as an NVIDIA RTX 4080 Super.

                        However, Apple does not say anything about the GPU clock frequency, which I assume is significantly lower than NVIDIA's.

                        In comparison, a dual-socket AMD Turin can have up to 12288 FP32 execution units, i.e. 20% more than the Apple GPU.

                        Moreover, the clock frequency of the AMD CPUs must be much higher than that of the Apple GPU, so the AMD system is likely to be at least twice as fast as the M3 Ultra GPU for some graphics workloads.

                        I do not know what facilities exist in the Apple GPU for accelerating computations with low-precision data types, like the tensor cores of NVIDIA GPUs.

                        While for graphics applications big server CPUs are actually less compute-constrained than almost all consumer GPUs (except the RTX 4090/5090), GPUs can be faster for ML/AI applications that use low-precision data types, but this is not at all certain for the Apple GPU.

                        Even if the Apple GPU happens to be faster for some low-precision data type, the difference cannot be great.

                        However, a server that would beat the Apple M3 Ultra GPU computationally would cost much more than $10k, because it would need CPUs with many cores.

                        If the goal is only to have a system with 50% more memory and 40% more memory bandwidth than the Apple system, that can be done at a $10k price.

                        While such a system would become compute-constrained more often than the Apple GPU, it would still win every time memory is the bottleneck.
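For scale, unit counts like these convert to peak throughput as lanes × 2 FLOPs per FMA × clock. The clock figures below are illustrative assumptions only (Apple does not publish its GPU frequency, and sustained AVX-512 clocks vary by SKU and load):

```python
# Peak FP32 throughput ~= FP32 lanes * 2 (FMA = mul+add) * clock.
# Clocks here are assumed for illustration, not published specifications.

def peak_tflops(fp32_lanes: int, clock_ghz: float) -> float:
    return fp32_lanes * 2 * clock_ghz / 1000.0

print(f"M3 Ultra GPU (10240 lanes @ ~1.4 GHz): ~{peak_tflops(10240, 1.4):.0f} TFLOPS")
print(f"2x EPYC Turin (12288 lanes @ ~3.0 GHz): ~{peak_tflops(12288, 3.0):.0f} TFLOPS")
```

Under these assumed clocks the dual-socket CPU comes out roughly 2.5x ahead on paper, which is where an "at least twice as fast" estimate comes from, though sustained clocks under full-width vector load are typically lower.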

                          • aurareturn

                            today at 1:55 AM

                            No one is using FP64 for AI inference.

                              • adrian_b

                                today at 8:30 AM

                                I have not said a word about FP64.

                                I have just compared FP32 computational capabilities, i.e. what is used for graphics, between the Apple M3 Ultra GPU and AMD server CPUs, because these numbers are easily available and they show the relative scale of the two.

                                Both GPUs and server CPUs have greater throughput for lower-precision data (the CPUs have instructions for BF16 and INT8 inference), but the exact acceleration factors are hard to find, and it is difficult to estimate speeds without access to such systems for benchmarking.

                  • numpad0

                    yesterday at 3:18 PM

                    Moot point if tok/s benchmark results are the same or worse.

                      • kjreact

                        yesterday at 4:04 PM

                        Are the benchmarks worse? Running LLMs in system memory is rather painful. I am having a hard time finding benchmarks for running large models in system memory. Can you point me to the benchmarks you’re referring to?

                      • DrBenCarson

                        yesterday at 3:50 PM

                        Not moot if you care about producing those tokens with the largest available models.

              • behnamoh

                yesterday at 2:46 PM

                Except that you cannot run multiple language models on Apple Silicon in parallel.

                  • kevin42

                    yesterday at 3:20 PM

                    I'm curious why not. I am running a few different models on my Mac Studio. I'm using llama.cpp, and it performs amazingly fast for the $7k I spent.

                      • behnamoh

                        yesterday at 8:55 PM

                        I said in parallel.

                          • saagarjha

                            today at 7:53 AM

                            Surely you can run smaller models together.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • jeroenhd

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 3:02 PM

If you're going to overhaul your entire AI workflow to use a different API anyway, surely the AMD Instinct accelerator cards make more sense. They're expensive, but also a lot faster, and you don't have to deal with making your code work on macOS.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • wmf

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 5:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Doesn't AMD Instinct cost >$50K for 512GB?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • codedokode

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 3:09 PM

I don't think the API has much value, because writing software is free while hardware for ML is super expensive.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • internetter

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    > writing software is free

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    says who? NVIDIA has essentially entrenched themselves thanks to CUDA

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • knowitnone

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:21 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I'd like to hire you to write free software


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • chakintosh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:39 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  14k for a maxed out Mac Studio

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • bustling-noose

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 2:22 AM

I wonder if Apple needs to reconsider Xserve. While Apple probably has some kind of server infrastructure team, building server products out of its own hardware and software sounds like something it could explore. The app ecosystem coupled with Apple servers, offered in the cloud or as hardware you could buy, would be a very interesting service business to get into. Apple's App Store needs better apps given how capable the hardware now is, especially with iPads using M chips. A cloud-backed hardware and software service designed specifically for the app ecosystem sounds very tempting.

The hardware has evolved faster than the software at Apple. It's usually the opposite at most tech companies, where hardware is unable to keep up with software.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • mrtksn

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 2:37 PM

Let's say you want the absolute max memory (512GB) to run AI models, and you're OK with plugging in a drive to archive your model weights. Then you can get this for a little shy of $10K. What a dream machine.

Compared to Nvidia's Project DIGITS, which is supposed to cost $3K and be available "soon", you can get a spec-matching 128GB & 4TB version of this Mac for about $4,700. The difference is that you can actually get it in a week, and it will run macOS (no idea how much performance difference to expect).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I can't wait to see someone testing the full DeepSeek model on this, maybe this would be the first little companion AI device that you can fully own and can do whatever you like with it, hassle-free.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • bloomingkales

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 2:39 PM

There's an argument that replaceable PC parts are what you want at that price point, but Apple usually provides multi-year durability on its machines. An Apple AI brick should last a while.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • NightlyDev

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 3:02 PM

The full DeepSeek R1 model needs more than 512GB of memory; the weights alone are 720GB. You can run a quantized version on it, but not the full model.
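
The fit-in-memory check here is simple arithmetic: parameter count times bits per parameter, divided by 8 (ignoring KV cache and runtime overhead). A rough sketch, with `model_size_gb` as a hypothetical helper and the effective bits-per-weight figures being approximations:

```python
def model_size_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate weight size in GB; ignores KV cache and runtime overhead."""
    return params_billions * bits_per_param / 8

# DeepSeek R1: 671B parameters
print(model_size_gb(671, 8))    # 671.0 GB at 8-bit: does not fit in 512GB
print(model_size_gb(671, 4.5))  # ~377 GB at ~4.5 bits/weight: fits
```

This is why a ~4-bit quant (the Q4_K_M GGUF mentioned upthread is 404GB) fits in 512GB while the full-precision weights do not.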

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • summarity

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 11:35 PM

You can chain multiple Mac Studios using exo for inference; you'd "only" need two of these. There's a bottleneck in the interconnect speed over TB5, but that may not matter as much for a MoE model.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • behnamoh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 2:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          > I can't wait to see someone testing the full DeepSeek model on this

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          at 819 GB per second bandwidth, the experience would be terrible

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • coder543

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 3:01 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              DeepSeek-R1 only has 37B active parameters.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              A back of the napkin calculation: 819GB/s / 37GB/tok = 22 tokens/sec.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Realistically, you’ll have to run quantized to fit inside of the 512GB limit, so it could be more like 22GB of data transfer per token, which would yield 37 tokens per second as the theoretical limit.

It is likely going to be very usable. As other people have pointed out, the Mac Studio is also not the only option at this price point, but it is neat that it is an option.
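
The napkin math above is just memory bandwidth divided by bytes read per generated token (for a MoE model, only the active parameters count). A sketch using the thread's numbers, with `tokens_per_sec` as a hypothetical helper and the quantized bytes-per-parameter figure an assumption:

```python
def tokens_per_sec(bandwidth_gb_s: float, active_params_b: float,
                   bytes_per_param: float) -> float:
    """Bandwidth-bound upper limit on token generation speed (tok/s)."""
    return bandwidth_gb_s / (active_params_b * bytes_per_param)

# DeepSeek-R1: 37B active params; M3 Ultra: 819 GB/s
print(tokens_per_sec(819, 37, 1.0))  # ~22 tok/s at 8 bits/param
print(tokens_per_sec(819, 37, 0.6))  # ~37 tok/s at ~4.8 bits/param
```

This is only a ceiling: real throughput depends on how much of the theoretical bandwidth the inference stack actually sustains, and prompt processing is compute-bound rather than bandwidth-bound.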

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • mrtksn

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 2:51 PM

How many t/s would you expect? I think I feel perfectly fine when it's over 50.

Also, people have figured out a way to run these things in parallel easily. The device is pretty small; for someone who doesn't mind the price tag, stacking 2-3 of them wouldn't be that bad.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • yk

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:06 PM

I think I've seen 800 GB/s memory bandwidth, so a q4 quant of a 400B model should be about 4 t/s if memory-bound.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • behnamoh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 2:55 PM

I know you’re referring to the exolabs app, but the t/s is really not that good. It uses Thunderbolt instead of NVLink.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • bearjaws

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 4:00 PM

Not sure why you are being downvoted; we already know the performance numbers due to memory bandwidth constraints on the M4 Max chips, and they would apply here as well.

Going from 525 GB/s to 1000 GB/s will double the TPS at best, which is still quite low for large LLMs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • lanceflt

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 6:50 PM

DeepSeek R1 (full, Q1) runs at 14 t/s on an M2 Ultra, so this should be around 20 t/s.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • teleforce

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 11:04 PM

Thunderbolt 5 (TB 5) is pretty handy: you can have a very thin and lightweight laptop and still get access to an external GPU (eGPU) via TB 5 if needed [1]. Now you can have your cake (lightweight laptop) and eat it too (potent GPU).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              [1] Asus just announced the world’s first Thunderbolt 5 eGPU:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              https://www.theverge.com/24336135/asus-thunderbolt-5-externa...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • ben-schaaf

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 11:11 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Except that you're stuck with macOS, so there aren't any drivers for NVIDIA, AMD or Intel GPUs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • iamtheworstdev

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 11:29 PM

and that no one is developing games for macOS.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • rafram

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 2:55 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          (1) That’s obviously not actually true.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          (2) “No one” is developing games for Linux either, but the Steam Deck works great. Why? Wine, which you can run on macOS too.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • smilebot

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 3:50 AM

Valve supports games on macOS.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • emp_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 5:12 AM

eGPU has a ton of issues on macOS - I've used it for years, and on Apple Silicon it's probably much worse - but let me give a shout-out to the amazing (somewhat new) High Performance screen sharing mode added in Sonoma.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      When I connect to my Mac Studio via Macbook I can select that mode, then change the Displays setting to Dynamic Resolution and then my 'thin client':

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      - Is fullscreen using the entire 16:10 Macbook screen

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      - Gets 60 fps low latency performance (including on actual games)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      - Transfers audio, I can attend meetings in this mode

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      - Blanks the host Mac Studio screen

All things that were impossible via VNC. RDP is much better, but this new High Performance screen sharing is even more powerful.

A thin, lightweight laptop that remotes into a loaded machine has always been my idea of high mobility, instead of suffering through a laptop running everything locally. This works over LTE as well with some firewall setup.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • wpm

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 12:18 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Apple Silicon does not work with eGPU.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • VVilhelmsen

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 7:42 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Who is this made for? Who needs a personal computer this powerful? Not trying to be funny - it's a genuine question.

Gamers don't generally use a Mac because of the lack of games, and I'm guessing those who are really into LLMs use Linux for the flexibility. Video editing can be done on much cheaper hardware.

Very rich LLM enthusiasts who want to try out a Mac?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • eric-burel

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 7:47 AM

I don't think people into LLMs necessarily use Linux; most devs I see around use a Mac, and I think I'll buy one and move off Ubuntu if I start using LLMs more seriously. The performance is the key selling point, as many professional use cases benefit from running LLMs locally.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • VVilhelmsen

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 8:08 AM

I used to run Ubuntu with i3 and recently switched to a MacBook Air. I thought I would hate it coming from i3, but honestly, using Stage Manager + tmux & nvim, with BetterTouchTool for keybinds, it feels just as effective. The fact that fullscreen applications get their own desktop is nice too.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • c0deR3D

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 2:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        When will Apple silicon natively support OSes such as Linux? Apple seems reluctant to release detailed technical reference manuals for its M-series SoCs, which makes running Linux natively on Apple silicon challenging.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • bigyabai

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 2:53 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Probably never. We don't have official Linux support for the iPhone or iPad; I wouldn't hold out hope for Apple to change their tune.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • dylan604

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 2:56 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                That makes sense to me though. If you don’t run iOS, you don’t have App Store and that means a loss of revenue.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • bigyabai

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:00 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Right. Same goes for macOS and all of its convenient software services. Apple might stand to sell more units with a friendlier stance towards Linux, but unless it sells more Apple One subscriptions or increases hardware margins on the Mac, I doubt Cook would consider it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    If you sit around expecting selflessness from Apple you will waste an enormous amount of time, trust me.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • AndroTux

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 3:01 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      If you don't run macOS, you don't have Apple iCloud Drive, Music, Fitness, Arcade, TV+ and News and that means a loss of revenue.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • dylan604

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 3:24 PM

                                                                                                                                                                                                                                                                                                                                                                                          As I replied elsewhere here, I do not run any Apple services on my Mac hardware. I do on my iDevices, but that's a different topic. Again, I could be the edge case.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • bigyabai

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 3:30 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              > I do not run any Apple Services on my Mac hardware

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Not even OCSP?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • dylan604

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:39 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I have no idea what that is, so ???

                                                                                                                                                                                                                                                                                                                                                                                                  But if you're being pedantic: I meant Apple SaaS requiring monthly payments, or any other form of using something from Apple where I give them money beyond the purchase of their hardware.

                                                                                                                                                                                                                                                                                                                                                                                                  If you're talking about background services that are part of macOS, then you're being intentionally obtuse to the point, and you know it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • jobs_throwaway

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 4:01 PM

                                                                                                                                                                                                                                                                                                                                                                                        You lose out on revenue from people who require OS freedom, though.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • orangecat

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 4:46 PM

                                                                                                                                                                                                                                                                                                                                                                                            All seven of them. I kid; I have a lot of sympathy for that position, but as a practical matter, running Linux VMs on an M4 works great. You even get GPU acceleration.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • cpfleming

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:20 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  https://asahilinux.org/

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • dylan604

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 2:55 PM

                                                                                                                                                                                                                                                                                                                                                                                    That’s what’s weird to me too. It’s not like they would lose sales of macOS, as it is given away with the hardware. So if someone wants to buy Apple hardware to run Linux, it has no negative effect on AAPL.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • bigfishrunning

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 2:59 PM

                                                                                                                                                                                                                                                                                                                                                                                        Except the Linux users won't be buying Apple software, from the App Store or elsewhere. They won't subscribe to iCloud.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • dylan604

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 3:03 PM

                                                                                                                                                                                                                                                                                                                                                                                            I have Mac hardware and have spent $0 through the Mac App Store. I do not use iCloud on it either. I do on iDevices, though. I must be an edge case.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • xp84

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 5:58 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                All of us on HN are basically edge cases. The main target market of Macs is super dependent on Apple service subscriptions.

                                                                                                                                                                                                                                                                                                                                                                                                Maybe that's why they ship with insultingly small SSDs by default: as people's photo libraries, Desktop, and Documents folders fill up, Apple can "fix your problem" by selling you the iCloud/Apple One plan to offload most of the stuff to live only in iCloud.

                                                                                                                                                                                                                                                                                                                                                                                                Either they spend the $400 up front to get two notches up on the SSD upgrade, to match what a reasonable device would come with, or they spend that $400 at $10 a month over the 40-month likely lifetime of the computer. Apple wins either way.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • c0deR3D

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:46 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Same here.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • cosmic_cheese

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 3:07 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Those buying the hardware to run Linux also aren’t writing software for macOS to help make the platform more attractive.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • dylan604

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:26 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    There are a large number of macOS users that are not app software devs. There's a large base of creative users that couldn't code their way out of a wet paper bag, yet spend lots of money on Mac hardware.

This forum loses track of the world outside its echo chamber.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • cosmic_cheese

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 3:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I’m among them, even if creative works aren’t my bread and butter (I’m a dev with a bit of an artistic bent).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        That said, attracting creative users also adds value to the platform by creating demand for creative software for macOS, which keeps existing packages for macOS maintained and brings new ones on board every so often.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • dylan604

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 3:42 PM

I'm a mix of both; however, my dev time doesn't go into macOS or iDevice apps. My dev work is still focused on creative/media workflows, and I still get photo/video work. I don't even use Xcode beyond running the CLI command that installs the command-line tools.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • jeroenhd

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:07 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  While I don't think Apple wants to change course from its services-oriented profit model, surely someone within Apple has run the calculations for a server-oriented M3/M4 device. They're not far behind server CPUs in terms of performance while running a lot cooler AND having accelerated amd64 support, which Ampere lacks.

Whatever the profit margin on a Mac Studio is these days, surely improving non-consumer options becomes profitable at some point if you start selling them by the thousands to data centers.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • amelius

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 6:46 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                But then they'd have to open up their internal documentation of their silicon, which could possibly be a legal disaster (patents).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • re-thc

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:10 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  > So if someone wants to buy Apple hardware to run Linux, it does not have a negative affect to AAPL

It does: support costs. How do you prove whether it's a hardware failure or a software one? What should they do, say it "unofficially" supports Linux? People would still try to get support, and eventually they'd have to test it themselves.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • dylan604

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 3:22 PM

Apple has already been in this spot. With the trash-can Mac Pro, there was an issue with DaVinci Resolve under OS X at the time where the GPU was causing render issues. If you then rebooted into Windows via Boot Camp on the exact same hardware and opened the exact same Resolve project with the exact same footage, the render errors disappeared. Apple blamed Resolve, DaVinci blamed the GPU drivers, and the GPU vendor blamed Apple.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • re-thc

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 4:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          > Apple has already been in this spot.

Has been. That's the important part: past tense. Maybe that's the point: they gave up on it, acknowledging the extra costs and issues.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • k8sToGo

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 3:24 PM

We used to have Boot Camp, though.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • dylan604

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 3:27 PM

There you go using logical arguments in an emotional, illogical debate.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • WillAdams

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:15 PM

Is it not an option to run Darwin? What would Linux offer that Darwin would not?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • internetter

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 3:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Darwin is a terrible server operating system. Even getting a process to run at server boot reliably is a nightmare.
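(Context for readers: on Darwin the supported way to run something at boot is a launchd daemon described by a property list; there is no systemd or classic init. A minimal sketch, where the label, binary path, and filename are hypothetical:)

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Hypothetical daemon; install to /Library/LaunchDaemons/ and load with:
         sudo launchctl bootstrap system /Library/LaunchDaemons/com.example.myserver.plist -->
    <key>Label</key>
    <string>com.example.myserver</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/myserver</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
```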

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • kbolino

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 10:46 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I don't think Darwin has been directly distributed in bootable binary format for many years now. And, as far as I know, it has never been made available in that format for Apple silicon.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • _alex_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:06 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  apple keeps talking about the Neural Engine. Does anything actually use it? Seems like all the current LLM and Stable Diffusion packages (including MLX) use the GPU.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • dcchambers

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 2:24 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Historically no, Ollama and the like have only used the CPU+GPU.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      That said, there are efforts being made to use the NPU. See: https://github.com/Anemll/Anemll - you can now run small models directly on your Apple Silicon Mac's NPU.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      It doesn't give better performance but it's massively more power efficient than using the GPU.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • gield

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 3:20 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Face ID, taking pictures, Siri, ARKit, voice-to-text transcription, face recognition and OCR in photos, noise filtering, ...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • cubefox

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 4:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            These have been possible in much smaller smartphone chips for years.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • stouset

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 5:45 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Possible != energy efficient, which is important for mobile devices.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • cubefox

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:45 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     If the energy efficiency of things like Face ID was really so bad that you would need a more efficient M3 Ultra, how come Face ID was integrated into smartphones years ago, apparently without a significant negative impact on battery life?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • xu_ituairo

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 4:57 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        FaceID was just one example they gave (which is probably faster and more energy efficient now).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         Image recognition, OCR, AR and more are applications of the NPU that didn't exist at all on older iPhones because they would have been too intensive for the chips and batteries.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • KerrAvon

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 12:12 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          You seem to be arguing with a strawman here -- who said you need an M3 Ultra for energy efficient Face ID?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • cubefox

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 12:59 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              "stouset" implied that those are merely possible but not energy efficient on the older mobile hardware.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • KerrAvon

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 12:11 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Yes, they have.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    > September 12, 2017; 7 years ago

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    https://en.wikipedia.org/wiki/Apple_A11#Neural_Engine

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • anentropic

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 4:52 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Yeah I agree.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The Neural Engine is useful for a bunch of Apple features, but seems weirdly useless for any LLM stuff... been wondering if they'd address it on any of these upcoming products. AI is so hyped right now that it seems odd they have a specialised processor that doesn't get used for the kind of AI people are actually doing. I can see in the latest release:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              > Mac Studio is a powerhouse for AI, capable of running large language models (LLMs) with over 600 billion parameters entirely in memory, thanks to its advanced GPU

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              https://www.apple.com/newsroom/2025/03/apple-unveils-new-mac...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              i.e. LLMs still run on the GPU not the NPU
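
[Editor's note] As a rough sanity check on that "over 600 billion parameters entirely in memory" claim, here is some napkin math showing why the 512 GB of unified memory is the enabling factor. The bytes-per-parameter figures are common quantization sizes (fp16, 8-bit, 4-bit), not Apple's published numbers:

```python
# Napkin math: does a ~600B-parameter model fit in 512 GB of unified memory?
# Assumed sizes per parameter for common quantization levels (not Apple specs).
PARAMS = 600e9          # ~600 billion parameters
MEM_GB = 512            # M3 Ultra max unified memory

BYTES_PER_PARAM = {
    "fp16": 2.0,        # full half-precision weights
    "q8":   1.0,        # 8-bit quantization
    "q4":   0.5,        # 4-bit quantization (e.g. typical GGUF Q4 variants)
}

for name, bpp in BYTES_PER_PARAM.items():
    size_gb = PARAMS * bpp / 1e9
    verdict = "fits" if size_gb < MEM_GB else "does not fit"
    print(f"{name}: ~{size_gb:.0f} GB -> {verdict} in {MEM_GB} GB")
```

So only a ~4-bit quantization of a 600B model fits (~300 GB, leaving headroom for KV cache and the OS), which is consistent with the 404 GB Q4_K_M DeepSeek-R1 figure mentioned upthread.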

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • aurareturn

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 5:53 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  On the iPhone, it runs on the NPU.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • raydev

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 5:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            I know it's basically nitpicking competing luxury sports cars at this point, but I am very bothered that existing benchmarks for the M3 show single core perf that is approximately 70% of M4 single core perf.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            I feel like I should be able to spend all my money to both get the fastest single core performance AND all the cores and available memory, but Apple has decided that we need to downgrade to "go wide". Annoying.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • xp84

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 5:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                > both get the fastest single core performance AND all the cores

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I'm a major Apple skeptic myself, but hasn't there always been a tradeoff between "fastest single core" vs "lots of cores" (and thus best multicore)?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                For instance, I remember when you could buy an iMac with an i9 or whatever, with a higher clock speed and faster single core, or you could buy an iMac Pro with a Xeon with more cores, but the iMac (non-Pro) would beat it in a single core benchmark. Note: Though I used Macs as the example due to the simple product lines, I thought this was pretty much universal among all modern computers.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • raydev

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 5:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    > hasn't there always been a tradeoff between "fastest single core" vs "lots of cores" (and thus best multicore)?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Not in the Apple Silicon line. The M2 Ultra has the same single core performance as the M2 Max and Pro. No benchmarks for the M3 Ultra yet but I'm guessing the same vs M3 Max and Pro.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • xp84

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 5:53 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Okay, good to know. Interesting change then.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • LPisGood

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 8:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            I think the traditional reason for this is that other chips use complex scheduling logic to expose more logical cores than physical cores. This costs single-threaded speed but lets more threads run concurrently.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • submeta

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 2:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I am confused. I got an M4 with 64 GB RAM. Did I buy something from the future? :) So why an M3 now, and not an M4 Ultra?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • seanmcdirmid

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:08 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  It took them a while to develop their Ultra chip, and this is what they had ready. I'm sure they're working on the M4 Ultra; they're just slow at it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I bought a refurbished M3 Max to run LLMs (can only go up to 70B with 4-bit quant), and it is only slightly slower than the more expensive M4 Max.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • opan

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 3:45 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Haven't the Max/Ultra type chips always come much later, close to when the next number of standard chips came out? M2 Max was not available when M2 launched, for example.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • SirMaster

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 4:56 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          An Ultra has never before come out after the next-gen base model, let alone after the next-gen Pro/Max models.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          M1: November 10, 2020

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          M1 Pro: October 18, 2021

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          M1 Max: October 18, 2021

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          M1 Ultra: March 8, 2022

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          -------------------------

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          M2: June 6, 2022

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          M2 Pro: January 17, 2023

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          M2 Max: January 17, 2023

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          M2 Ultra: June 5, 2023

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          -------------------------

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          M3: October 30, 2023

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          M3 Pro: October 30, 2023

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          M3 Max: October 30, 2023

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          -------------------------

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          M4: May 7, 2024

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          M4 Pro: October 30, 2024

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          M4 Max: October 30, 2024

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          -------------------------

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          M3 Ultra: March 5, 2025

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • ellisv

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 10:08 PM

I'd also point out that there was a rather awkward situation with the M1/M2 chips where lower-end devices were getting newer chips before the higher-end devices. For example, the 14- and 16-inch MacBooks Pro didn't get an M2-series chip until about 6 months after the 13- and 15-inch MacBooks Air. This left some professionals and power users frustrated.

The M3 Ultra might perform as well as the M4 Max - I haven't seen benchmarks yet - but most people expect the newest series to be in the highest-end devices.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • kridsdale1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 8:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                So about a year and a half delay for Ultra, but the M2 was an anomaly.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • rjeli

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 3:56 PM

Wow, incredible. I told myself I'd stop waffling and just buy the next 800 GB/s mini or studio to come out, so I guess I'm getting this.

Not sure how much storage to get. I was floating the idea of getting less storage and hooking it up to a TB5 NAS array of 2.5" SSDs; 10-20 TB for models + datasets + my media library would be nice. Any recommendations for the best enclosure for that?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • kridsdale1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 8:14 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          It depends on your bandwidth needs.

I also want to build the thing you want. There are no multi-SSD M.2 TB5 bays yet. I made one that holds 4 drives (16 TB) over TB3, and even there the underlying drives are far faster than the cable.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          My stuff is in OWC Express 4M2.
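The "drives are far faster than the cable" point can be sketched with rough napkin math. The per-drive and usable-link figures below are assumptions for a typical PCIe 3.0 NVMe drive on a TB3 enclosure, not measurements:

```python
# Rough check: aggregate NVMe bandwidth vs. a Thunderbolt 3 link.
# Assumptions: TB3 is 40 Gb/s total, with roughly 22 Gb/s usable for
# PCIe data; each NVMe drive sustains about 3 GB/s.

TB3_DATA_GBPS = 22      # Gb/s of usable PCIe data on a 40 Gb/s TB3 link
NVME_DRIVE_GBS = 3.0    # GB/s per drive (typical PCIe 3.0 NVMe)
drives = 4

link_gbs = TB3_DATA_GBPS / 8            # convert Gb/s to GB/s (~2.75)
aggregate_gbs = NVME_DRIVE_GBS * drives  # 12 GB/s across 4 drives

print(f"TB3 usable: ~{link_gbs:.2f} GB/s, drives combined: ~{aggregate_gbs:.0f} GB/s")
assert aggregate_gbs > link_gbs  # the cable, not the drives, is the bottleneck
```

With these numbers the four drives could push roughly 4x what the link can carry, which is why a TB5 version of such a bay is appealing.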

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • perfmode

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 8:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Are you running RAID?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • kridsdale1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 5:35 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Yes RAID 5.
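For a 4-drive RAID 5 array like the one described above, one drive's worth of capacity goes to parity. A minimal sketch (the 4 x 4 TB split is an assumption matching the 16 TB total):

```python
# RAID 5 usable capacity: (n - 1) drives' worth of space,
# with tolerance for a single drive failure.

def raid5_usable_tb(drive_tb, n_drives):
    assert n_drives >= 3, "RAID 5 needs at least 3 drives"
    return drive_tb * (n_drives - 1)

print(raid5_usable_tb(4, 4))  # -> 12 (TB usable out of 16 TB raw)
```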

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • narrator

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 10:08 PM

Not to rain on the Apple parade, but cloud video editing with models running on H100s that can edit videos based on prompts is going to be vastly more productive than anything running locally. This will be useful for local development with the big DeepSeek models, though. Not sure it's worth the investment unless DeepSeek is close to the capability of cloud models, or privacy concerns overwhelm everything.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • joshhart

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 8:17 PM

This is pretty exciting. Now an organization could produce an open-weights mixture-of-experts model that has 8-15B active parameters but could still be 500B+ parameters total, and it could be run locally with INT4 quantization with very fast performance. DeepSeek R1 is a similar model, but with over 30B active parameters, which makes it a little slow.

I do not have a good sense of how well quality scales with narrow MoEs, but even if we get something like Llama 3.3 70B in quality at only 8B active parameters, people could do a ton locally.
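A rough way to see why a low active-parameter count matters: token generation on a bandwidth-bound machine is capped at roughly memory bandwidth divided by the bytes read per token, and an MoE only reads its routed experts each step. A minimal sketch, assuming INT4 is about 0.5 bytes/param and an 800 GB/s machine (both are ballpark assumptions, not measurements):

```python
# Napkin math: bandwidth-bound decode speed ceiling for an MoE model.
# Only active parameters are read per token, so tok/s ~ bandwidth / bytes per token.

def decode_tokens_per_sec(active_params_b, bytes_per_param, mem_bw_gbs):
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return mem_bw_gbs * 1e9 / bytes_per_token

# 8B active params at INT4 on an 800 GB/s machine
print(round(decode_tokens_per_sec(8, 0.5, 800)))   # -> 200 tok/s ceiling
# ~37B active (DeepSeek R1-class) at INT4 on the same machine
print(round(decode_tokens_per_sec(37, 0.5, 800)))  # -> 43 tok/s ceiling
```

These are upper bounds (real utilization is lower, and prompt processing is compute-bound), but they show why dropping from ~37B to ~8B active parameters is such a big deal for local use.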

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • alok-g

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Two questions for the fellow HNers:

1. What are the various average-joe use cases (as opposed to researchers, etc.) for running powerful AI models locally vs. just using cloud AI? Privacy is of course a benefit, but by itself it may not justify upgrades for an average user. Or are we expecting new innovation to lead to much more proliferation of AI and use cases that will make running locally more feasible?

2. With the amount of memory used jumping up, would there be significant growth for companies making memory? If so, which ones would be best positioned?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Thanks.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • theshrike79

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                For 1: censorship

A local model will do anything you ask, as far as it "knows" about the topic. It doesn't need to please investors or be afraid of bad press.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                LM Studio + a group of select models from huggingface and you can do whatever you want.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                For generic coding assistance and knowledge, online services are still better quality.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • zamalek

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 7:25 PM

I don't think there's a huge local use case yet, if you're happy with the subscription cost and the privacy tradeoff. Give it maybe 2 years and someone will probably invent something that seriously benefits from local inference. I'm anticipating inference for home appliances (something Mac mini-sized that plugs into your router), but that's based on what would make logical sense for consumers, not what consumers will actually fall for.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Apple seems to be using LPDDR, but HBM will also likely be a key tech. SK Hynix and Samsung are the most reputable for both.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • alok-g

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:34 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Thanks.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      >> Apple seems to be using LPDDR, but HBM will also likely be a key tech. SK Hynix and Samsung are the most reputable for both.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      So not much Micron? Any US based stocks to invest in? :-)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • zamalek

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 8:23 PM

                                          I forgot about Micron, absolutely. TSMC is the supplier for all of these, so you're covering both memory and compute if that's your strategy (the risk is that US TSMC is over-provisioning manufacturing based on the pandemic hardware boom).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • alok-g

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 9:00 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Thanks!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • christiangenco

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 7:07 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    IMO it's all about privacy. Perhaps also availability if the main LLM providers start pulling shenanigans but it seems like that's not going to be a huge problem with how many big players are in the space.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I think a great use case for this would be in a company that doesn't want all of their employees sending LLM queries about what they're working on outside the company. Buy one or two of these and give everybody a client to connect to it and hey presto you've got a secure private LLM everybody in the company can use while keeping data private.
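A minimal sketch of that setup, assuming the box runs an OpenAI-compatible server such as llama.cpp's `llama-server` or Ollama (the internal hostname, model name, and helper functions here are hypothetical, not part of any comment above):

```python
import json
import urllib.request

# Hypothetical on-prem endpoint; llama.cpp's `llama-server` and Ollama
# both expose an OpenAI-compatible /v1/chat/completions route.
BASE_URL = "http://llm.internal:8080"

def build_chat_request(prompt, model="deepseek-r1"):
    # Everything in this payload stays on the company network.
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask_local_llm(prompt):
    # Send the prompt to the shared in-house box instead of a cloud API.
    req = urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the wire format matches the hosted APIs, existing client tooling can usually be pointed at the internal URL with no other changes.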

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • chamomeal

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:21 PM

                                            I’ll add to this that while I couldn’t care less about OpenAI seeing my general coding questions, I wouldn’t run actually important data through ChatGPT.

                                            With a local model, I could toss anything in there. Database query outputs, private keys, stuff like that. This’ll probably become more relevant as we give LLMs broader use over certain systems.

                                            Like right now I still mostly just type or paste stuff into ChatGPT. But what about when I have a little database copilot that needs to read query results, and maybe even run its own subset of queries like schema checks? Or some open source computer-use type thingy needs to click around in all sorts of places I don’t want OpenAI going, like my .env or my bash profile? That’s the kind of thing I’d only use a local model for.
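One way to make that cloud-vs-local split concrete is a simple routing gate: prompts that look like they contain secrets never leave the machine. This is an illustrative sketch (the patterns and policy are assumptions, not any provider's API):

```python
import re

# Patterns that suggest a prompt contains material that should only
# ever be shown to a local model. Deliberately incomplete; a real
# deployment would extend this list.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"(?i)\b(api[_-]?key|password|secret)\s*[=:]"),
]

def must_stay_local(prompt: str) -> bool:
    # True if the prompt matches any secret-looking pattern,
    # in which case it should be routed to the local model only.
    return any(p.search(prompt) for p in SECRET_PATTERNS)
```

A copilot could call `must_stay_local` on each chunk of context (query results, file contents) before deciding whether a hosted API is ever allowed to see it.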

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • user3939382

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 7:26 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Hopefully homomorphic encryption can solve this rather than building a new hardware layer everywhere.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • lvturner

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 4:21 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Risk of shut-out.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I'm in Hong Kong, I can't even subscribe to OpenAI or Claude directly, though granted this doesn't so much apply to the already "open" models

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • JadedBlueEyes

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 9:07 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          One important one that I haven't seen mentioned is simply working without an internet connection. It was quite important for me when I was using AI whilst travelling through the countryside, where there is very limited network access.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • piotrpdev

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 7:10 PM

                                                1. Lower latency for real-time tasks, e.g. transcription + translation?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • archagon

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 7:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I don’t currently use AIs, but if I did, they would be local. Simply put: I can’t build my professional career around tools that I do not own.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • alok-g

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 7:30 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  >> ... around tools that I do not own.

                                                      That just depends on how much trust you have in the providers you use. Or do you do your own electricity generation?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • globular-toast

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 7:29 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      In my country things like electricity and water supply are considered a right and a supplier has to go to court to get a supply shut off. Unfortunately we don't yet consider an internet connection in the same way, despite the government essentially requiring it these days.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • archagon

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 7:32 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        That's quite a reductio ad absurdum. No, I don't generate my own electricity (though I could). But I don't use tools for work that can change out from under me at any moment, or that can increase 10x in price on a corporate whim.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • hobofan

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 7:37 PM

                                                                And why would that require running AI models locally? You can be in essentially full control by using open source (/open weight) models (DeepSeek etc.) running on exchangeable cloud providers that are as replaceable as your electricity provider.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • archagon

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Sure, I guess you can do that as long as you use an open weight model. (Offline support is a nice perk, however.)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • alok-g

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 7:39 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              We align.

I tend to do the same thing. I do not consider myself a good representative of the average user, though.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • globular-toast

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 8:03 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Even then you're still dependent on someone training new models on sketchy data sources that they don't own.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • divan

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 5:53 PM

The 512GB unified-memory configuration costs $9,500, if anyone is wondering.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • ferguess_k

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:05 PM

Ah, if only we could have the hardware and the freedom to install a good Linux distro on top of it. How is Asahi? Is it good enough? I assume that since Asahi is focused on Apple hardware, it has an easier time figuring out drivers and the like?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • bigyabai

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 3:08 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        > How is Asahi?

For M3 and M4 machines, hardware support is pretty derelict: https://asahilinux.org/docs/M3-Series-Feature-Support/

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • ferguess_k

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 3:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Thanks, looks like even M1 support has some gaps:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            https://asahilinux.org/docs/M1-Series-Feature-Support/#table...

I assume anything that isn't marked "linux-asahi" is not supported -- and anything marked WIP isn't either.

Wish I had the skills to help them. By targeting just one family of hardware, I think Asahi has a better chance of success.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • bigyabai

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 3:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                It's just not an easy task. I can't help but compare it to the Nouveau project spending years of effort to reverse-engineer just a few GPU designs. Then Nvidia changed their software and hardware architecture, and things went from "relatively hopeful" to "there is no chance" overnight.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • ferguess_k

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:41 PM

I agree, it's a lot of work, plus Apple is definitely not going to help with the project. Maybe an alternative is something like Framework -- find some good-enough hardware and support it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • crest

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 2:33 PM

Too bad it lacks even the streaming-mode SVE2 found in M4 cores. If only Apple would ship a full SVE2 implementation to put pressure on ARM to make it non-optional, so AArch64 isn't effectively restricted to NEON for SIMD.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • vlovich123

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 2:44 PM

This is for AI, which is going to benefit more from using Metal / the NPU than from SIMD.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • bigyabai

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 2:57 PM

Sure, but larger models that fit in that 512GB memory are going to take a long time to tokenize/detokenize without hardware-accelerated BLAS.
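For a rough sense of scale on the prompt-processing side: napkin math, using the ~37B active parameters and ~43 FP16 TFLOPs figures floated upthread (both are assumptions, and this ignores utilization losses and attention costs):

```python
# Rough prefill-cost estimate: prompt processing is compute-bound,
# needing roughly 2 FLOPs per active parameter per prompt token.
# All figures below are assumptions from upthread, not measurements.
active_params = 37e9    # assumed active params per token (MoE)
fp16_flops = 43e12      # assumed M3 Ultra FP16 throughput, FLOPs/s
prompt_tokens = 4096

flops = 2 * active_params * prompt_tokens
seconds = flops / fp16_flops  # at 100% utilization; real-world is slower
print(f"~{seconds:.1f} s to process a {prompt_tokens}-token prompt")
```

So even a best case is on the order of seconds for a few thousand prompt tokens, which is why prefill speed is the sticking point.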

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • danieldk

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Why would you need BLAS for tokenization/detokenization? Pretty much everyone still uses BBPE which amounts to iteratively applying merges.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  (Maybe I'm missing something here.)
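To make "iteratively applying merges" concrete, here is a toy sketch of greedy BPE merging; the merge table and function name are illustrative, not from any real tokenizer:

```python
# Toy BPE tokenization: repeatedly merge the adjacent pair with the
# highest-priority (lowest-index) rule until no rule applies.
def bpe_tokenize(word, merges):
    # merges: list of pairs in priority order, e.g. [("l", "o"), ("lo", "w")]
    tokens = list(word)
    ranks = {pair: i for i, pair in enumerate(merges)}
    while True:
        # find the adjacent pair with the best (lowest) merge rank
        best = None
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            if pair in ranks and (best is None or ranks[pair] < ranks[best[1]]):
                best = (i, pair)
        if best is None:
            return tokens
        i, pair = best
        tokens = tokens[:i] + [pair[0] + pair[1]] + tokens[i + 2:]

print(bpe_tokenize("lower", [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]))
# → ['lower']
```

Nothing here is a matrix multiply, which is the point: tokenization is table lookups and list surgery, not BLAS territory.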

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • ryao

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Tokenization/detokenization does not use BLAS.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • stouset

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 5:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Hell I’m just sitting here hoping the future M5 adopts SVE. Not even SVE2.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • iambateman

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 2:24 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            People who know more than me: they’re talking a lot about RAM and not much about GPU.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Do you expect this will be able to handle AI workloads well?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            All I’ve heard for the past two years is how important a beefy GPU is. Curious if that holds true here too.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • gatienboquet

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 2:43 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                LLMs are primarily "memory-bound" rather than "compute-bound" during normal use.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                The model weights (billions of parameters) must be loaded into memory before you can use them.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Think of it like this: Even with a very fast chef (powerful CPU/GPU), if your kitchen counter (VRAM) is too small to lay out all the ingredients, cooking becomes inefficient or impossible.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Processing power still matters for speed once everything fits in memory, but it's secondary to having enough VRAM in the first place.
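The kitchen analogy maps onto a simple back-of-envelope calculation: at batch size 1, every active weight must be streamed from memory once per generated token, so memory bandwidth sets a hard ceiling on decode speed. A rough sketch (the ~800GB/s bandwidth, 4-bit quantization, and parameter counts are illustrative assumptions, not benchmarks):

```python
# Back-of-envelope decode ceiling: batch-1 generation must stream every
# active weight from memory once per token, so bandwidth caps tokens/sec.
# All numbers below are illustrative assumptions, not measurements.

def decode_tok_s_ceiling(active_weight_bytes: float, bandwidth_bytes_s: float) -> float:
    """Upper bound on tokens/second from memory bandwidth alone
    (ignores KV-cache traffic, activations, and scheduling overhead)."""
    return bandwidth_bytes_s / active_weight_bytes

BANDWIDTH = 800e9          # ~800 GB/s unified memory (assumed)
Q4_BYTES_PER_PARAM = 0.5   # ~4-bit quantization

# Dense 70B model: all 70B params are active for every token.
dense_70b = decode_tok_s_ceiling(70e9 * Q4_BYTES_PER_PARAM, BANDWIDTH)

# MoE like DeepSeek-R1: 671B total params, but only ~37B active per token.
moe_r1 = decode_tok_s_ceiling(37e9 * Q4_BYTES_PER_PARAM, BANDWIDTH)

print(f"dense 70B ceiling: ~{dense_70b:.0f} tok/s")    # ~23
print(f"MoE (37B active) ceiling: ~{moe_r1:.0f} tok/s")  # ~43
```

Real throughput lands well below these ceilings, but the arithmetic shows why an MoE with few active parameters can feel fast even when the full model fills hundreds of gigabytes.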

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • whimsicalism

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Transformers are typically memory-bandwidth bound during decoding. This chip is going to have a much worse memory b/w than the nvidia chips.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    My guess is that these chips could be compute-bound though given how little compute capacity they have.
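One way to sanity-check which bound bites is the roofline ratio: compare the chip's FLOPs-per-byte "machine balance" against the kernel's arithmetic intensity. A sketch using the ballpark spec figures floated in this thread (~43 FP16 TFLOPs and ~819GB/s for the M3 Ultra; treat all of them as assumptions):

```python
# Roofline sanity check: a kernel is bandwidth-bound when its arithmetic
# intensity (FLOPs per byte moved) falls below the chip's machine balance.
# Spec numbers are ballpark figures from this thread, not official specs.

def machine_balance(tflops: float, bandwidth_gb_s: float) -> float:
    """FLOPs the chip can execute per byte it can move from memory."""
    return (tflops * 1e12) / (bandwidth_gb_s * 1e9)

m3_ultra = machine_balance(43, 819)     # ~52 FLOPs/byte
h100_sxm = machine_balance(989, 3350)   # ~295 FLOPs/byte

# Batch-1 decoding does roughly 2 FLOPs (multiply + add) per weight
# streamed, i.e. an intensity of only a few FLOPs/byte, so decoding is
# bandwidth-bound on both chips. Prefill intensity grows with the number
# of prompt tokens processed at once, so long prompts flip the workload
# to compute-bound, where the raw FLOPs gap dominates.
print(f"M3 Ultra balance: {m3_ultra:.1f} FLOPs/byte")
print(f"H100 balance:     {h100_sxm:.1f} FLOPs/byte")
```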

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • Gracana

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 3:54 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        It's pretty close. A 3090 or 4090 has about 1TB/s of memory bandwidth, while the top Apple chips have a bit over 800GB/s. Where you'll see a big difference is in prompt processing. Without the compute power of a pile of GPUs, chewing through long prompts, code, documents etc is going to be slower.
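To put a rough number on the prompt-processing gap: prefill costs about 2 FLOPs per active parameter per prompt token, so it is compute-bound and scales directly with the FLOPs deficit. A sketch with assumed figures (37B active parameters, 50% utilization, ~43 vs ~989 FP16 TFLOPs; attention FLOPs ignored):

```python
# Rough prefill-time estimate: processing a prompt costs about
# 2 FLOPs per active parameter per prompt token (attention ignored).
# The TFLOPs and utilization figures are assumptions for illustration.

def prefill_seconds(active_params: float, prompt_tokens: int,
                    tflops: float, utilization: float = 0.5) -> float:
    flops = 2 * active_params * prompt_tokens
    return flops / (tflops * 1e12 * utilization)

PROMPT = 8000   # e.g. a long code or document prompt
ACTIVE = 37e9   # MoE active parameters per token (assumed)

m3_ultra = prefill_seconds(ACTIVE, PROMPT, 43)    # ~27.5 s
h100     = prefill_seconds(ACTIVE, PROMPT, 989)   # ~1.2 s

print(f"M3 Ultra prefill: ~{m3_ultra:.0f}s, H100: ~{h100:.1f}s")
```

Same bandwidth class, wildly different prefill times: that is the "chewing through long prompts" gap in concrete terms.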

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • whimsicalism

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 3:58 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Nobody in industry is using a 4090; they are using H100s, which have roughly 3TB/s. Apple also doesn’t have any equivalent to NVLink.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            I agree that compute is likely to become the bottleneck for these new Apple chips, given they have only something like ~0.1% of the FLOPs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • Gracana

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 5:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I chose the 3090/4090 because it seems to me that this machine could be a replacement for a workstation or a homelab rig at a similar price point, but not a $100-250k server in a datacenter. It's not really surprising or interesting that the datacenter GPUs are superior.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                FWIW I went the route of "bunch of GPUs in a desktop case" because I felt having the compute oomph was worth it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • _zoltan_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 5:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  4.8TB/s on H200, 8TB/s on B200, pretty insane.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • Gracana

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 5:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      That’s wild, somehow I hadn’t seen the B200 specs before now. I wish I could have even a fraction of that!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • gatienboquet

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 4:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            VRAM capacity is the initial gatekeeper, then bandwidth becomes the limiting factor.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • whimsicalism

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 4:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I suspect that compute might actually be the limiter for these chips before bandwidth, but I'm not certain.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • cubefox

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 4:10 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              > Transformers are typically memory-bandwidth bound during decoding.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Not in the case of language models, which are typically bound by memory size rather than bandwidth.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • whimsicalism

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 4:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  nope

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • cubefox

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 10:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I assume even this one won't run on an RTX 5090 due to constrained memory size: https://news.ycombinator.com/item?id=43270843

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • whimsicalism

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 10:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          sure on consumer GPUs but that is not what is constraining the model inference in most actual industry setups. technically even then, you are CPU-GPU memory bandwidth bound more than just GPU memory, although that is maybe splitting hairs

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • cubefox

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 11:27 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Why are industry setups considered actual while others are not?
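The bandwidth-vs-size dispute upthread reduces to napkin math: if decoding reads every active weight once per token, memory bandwidth divided by bytes-per-token gives a hard ceiling on generation speed. A sketch of that estimate, where the figures are assumptions (roughly 819 GB/s for the M3 Ultra, and a ~4.5-bit quant of a 37B-active-parameter MoE like the DeepSeek-R1 example upthread):

```python
# Napkin math: decode speed ceiling when token generation is
# memory-bandwidth bound (every active weight read once per token).
# All numbers below are assumptions, not measurements.

BANDWIDTH_GBS = 819          # assumed M3 Ultra memory bandwidth, GB/s
ACTIVE_PARAMS = 37e9         # MoE active parameters per token
BYTES_PER_PARAM = 4.5 / 8    # ~Q4_K_M-style quantization

bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM      # ~20.8 GB read per token
ceiling_tok_s = BANDWIDTH_GBS * 1e9 / bytes_per_token  # bandwidth-bound ceiling

print(f"~{ceiling_tok_s:.0f} tok/s theoretical ceiling")
```

Real throughput lands well under the ceiling (KV-cache reads, attention compute, imperfect bandwidth utilization), which is consistent with the 20-30 tok/s ballpark quoted upthread.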

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • qwertox

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 2:34 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        A beefy GPU which can't hold models in VRAM is of very limited use. You'll see 16 GB of VRAM on gamer Nvidia cards, the RTX 5090 being an exception with 32 GB VRAM. The professional cards have around 96 GB of VRAM.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        The thing with these Apple chips is that they have unified memory, where CPU and GPU use the same memory chips, which means that you can load huge models into RAM (no longer VRAM, because that doesn't exist on those devices). And while Apple's integrated GPU isn't as powerful as an Nvidia GPU, it is powerful enough for non-professional workloads and has the huge benefit of access to lots of memory.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • _zoltan_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 5:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            which professional card has 96GB of VRAM?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • simlevesque

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 2:27 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        What's more important isn't how beefy it is, it's how much memory it has.

These use unified memory. The M3 Ultra with 512GB has as much VRAM as sixteen 5090s.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • lynndotpy

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 2:26 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          VRAM is what takes a model from "can not run at all" to "can run" (even if slowly), hence the emphasis.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • vlovich123

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 2:46 PM

No, with limited VRAM you can offload the model partially or split it across CPU and GPU. And since the CPU has swap, you could run the absolute largest model. It's just really, really slow.
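Why a partial split gets slow can be sketched with the same bandwidth-bound reasoning: per-token time is dominated by whatever fraction of the weights lives in the slowest memory tier. The bandwidth figures here are illustrative assumptions (a GDDR-class card vs. dual-channel DDR4), not measurements:

```python
# Sketch: effective decode speed when a model is split across GPU VRAM
# and CPU DRAM. Assumes each token reads every weight once and tiers
# are read sequentially. Numbers are illustrative assumptions.

MODEL_GB = 40              # e.g. a 70B model at ~4-bit
GPU_BW, CPU_BW = 1000, 50  # GB/s: modern GPU VRAM vs dual-channel DDR4

def tok_per_s(gpu_fraction):
    # Time per token = time to stream each tier's share of the weights.
    t = (MODEL_GB * gpu_fraction / GPU_BW
         + MODEL_GB * (1 - gpu_fraction) / CPU_BW)
    return 1 / t

for f in (1.0, 0.75, 0.25):
    print(f"{f:.0%} on GPU -> {tok_per_s(f):.1f} tok/s")
```

Even with three quarters of the weights in VRAM, the CPU-resident quarter dominates the per-token time, so throughput collapses toward the DRAM-only rate.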

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • jeffhuys

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 4:07 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Really, really, really, really, really, REALLY REALLY slow.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • Espressosaurus

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 4:46 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    The difference between Deepseek-r1:70b (edit: actually 32b) running on an M4 Pro (48 GB unified RAM, 14 CPU cores, 20 GPU cores) and on an AMD box (64 GB DDR4, 16 core 5950X, RTX 3080 with 10 GB of RAM) is more than a factor of 2.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    The M4 pro was able to answer the test prompt twice--once on battery and once on mains power--before the AMD box was able to finish processing.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    The M4's prompt parsing took significantly longer, but token generation was significantly faster.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Having the memory to the cores that matter makes a big difference.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • vlovich123

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 5:18 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        You're adding detail that's not relevant to anything I said. I was saying this statement:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        > VRAM is what takes a model from "can not run at all" to "can run" (even if slowly), hence the emphasis.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Is false. Regardless of how much VRAM you have, if the criteria is "can run even if slowly", all machines can run all models because you have swap. It's unusably slow but that's not what OP was claiming the difference is.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • Espressosaurus

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 5:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             The criterion for purchase, for anybody actually trying to use it, is "runs slowly but acceptably" vs. "runs so slowly as to be unusable".

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            My memory is wrong, it was the 32b. I'm running the 70b against a similar prompt and the 5950X is probably going to take over an hour for what the M4 managed in about 7 minutes.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            edit: an hour later and the 5950 isn't even done thinking yet. Token generation is generously around 1 token/s.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            edit edit: final statistics. M4 Pro managing 4 tokens/s prompt eval, 4.8 tokens/s token generation. 5950X managing 150 tokens/s prompt eval, and 1 token/s generation.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Perceptually I can live with the M4's performance. It's a set prompt, do something else, come back sort of thing. The 5950/RTX3080's is too slow to be even remotely usable with the 70b parameter model.
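The gap in those reported numbers is easy to feel in wall-clock terms. A quick back-of-the-envelope, assuming a hypothetical ~2,000-token response (the response length is an assumption, not from the thread):

```python
# Rough arithmetic on the reported generation rates: how long a
# hypothetical 2,000-token response takes at each speed.
response_tokens = 2000   # assumed response length
m4_rate = 4.8            # tokens/s, M4 Pro (reported above)
cpu_rate = 1.0           # tokens/s, 5950X/RTX 3080 (reported above)

m4_minutes = response_tokens / m4_rate / 60
cpu_minutes = response_tokens / cpu_rate / 60
print(f"M4 Pro: ~{m4_minutes:.0f} min, 5950X: ~{cpu_minutes:.0f} min")
# → M4 Pro: ~7 min, 5950X: ~33 min
```

At ~7 minutes you can fire off a prompt and come back; at ~33+ minutes per answer the workflow falls apart, which matches the perception described above.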

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • vlovich123

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:45 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I don't disagree. I'm just taking OP at the literal statement they made.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • dartos

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 2:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  You can say the same about GPU clock speed as well


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • matwood

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 8:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 I was able to run and use a ~24 GB DeepSeek distill on an M1 Max with 64 GB of RAM. It wasn't speedy, but it was usable. I imagine the M3/M4s are much faster, especially on smaller, more task-specific models.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • Retr0id

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 2:26 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  When it comes to LLMs in particular, it comes down to memory size+bandwidth more than anything else.
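A rough sketch of why bandwidth dominates token generation: each decoded token has to stream (roughly) the model's active weights from memory, so bandwidth divided by bytes touched per token gives an upper bound on tokens/s. The figures below are ballpark assumptions (~819 GB/s for the M3 Ultra, ~37B active parameters for a DeepSeek-style MoE at ~4-bit), not measured values:

```python
# Decode-speed upper bound: tokens/s <= bandwidth / bytes_per_token.
bandwidth_gb_s = 819        # assumed M3 Ultra memory bandwidth (GB/s)
active_params = 37e9        # assumed active params per token (MoE)
bytes_per_param = 0.5       # ~4-bit quantization

bytes_per_token = active_params * bytes_per_param     # ~18.5 GB/token
max_tok_s = bandwidth_gb_s * 1e9 / bytes_per_token
print(f"decode upper bound: ~{max_tok_s:.0f} tok/s")  # ~44 tok/s
```

Real throughput lands below this bound (KV-cache traffic, imperfect bandwidth utilization), which is consistent with the 20-30 tok/s ballpark mentioned upthread.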

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • tempodox

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 2:57 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                I could salivate over the hardware no end, if only Apple software (including the OS) weren't that shoddy.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • screye

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 4:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   How does the 512 GB of unified memory compare with 8xA100s? ($15/hr rentals)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  If it is equivalent, then the machine pays for itself in 300 hours. That's incredible value.
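The payback arithmetic, spelled out. Note that the machine cost below is simply the value implied by the comment's 300-hour figure; substitute the actual configured price, which scales the break-even point linearly:

```python
# Break-even vs. renting: hours = machine_cost / rental_rate.
rental_rate = 15.0       # $/hr for 8xA100 (from the comment)
machine_cost = 4500.0    # assumed: the price implied by "300 hours"

breakeven_hours = machine_cost / rental_rate
print(f"pays for itself after ~{breakeven_hours:.0f} rental hours")
# → pays for itself after ~300 rental hours
```

The comparison only holds if the workload is genuinely equivalent; as the reply below notes, the A100s have far higher memory bandwidth, so per-hour they do much more work.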

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • awestroke

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 4:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      A100 has 10x or so higher mem bandwidth

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • tap-snap-or-nap

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 9:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   All this hardware, but I don't know how best to utilize it because 1) I am not a pro, and 2) the apps aren't as helpful at making complex jobs easier, which is what old Apple used to do really well.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • DidYaWipe

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 11:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    How do people feel about the value of the M3 Ultra vs. the M4 Max for general computing, assuming that you max out the RAM on the M4 version of the Studio?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • 827a

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        today at 3:59 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         The kinds of workloads that could truly leverage the M2 Ultra over the M2 Max were vanishingly small. When comparing the M3 Ultra to the M4 Max, that number gets even smaller, because the M4 Max will have ~15% higher single-core perf. The insane memory available on the M3 Ultra is its only interesting capability, but it's still not big enough to run the largest open-source LLMs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Hot take: You can tie yourself into six knots trying to spin a yarn about why the M3 Ultra spec is super awesome for some AI use-case, meanwhile you could buy a Mac Mini and like 200 million GPT-4o tokens for the cost of this machine that can't even run R1.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • LeoPanthera

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 4:57 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             I suspect most people running LLMs locally are unable to use the big cloud models for either legal or ethical reasons. If you could use GPT-4, you would; it's just not that expensive.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • dlachausse

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 2:24 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Interesting that they’re releasing M3 Ultra after the M4 Macs have already shipped.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I wonder if the plan is to only release Ultras for odd number generations.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • jmull

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 2:41 PM

I'm guessing it's more because "Ultra" versions, which "fuse" multiple chips, take significant additional engineering work. So we might expect an Ultra M4 next year, possibly after the non-Ultra M5s are released.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • pier25

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 2:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            They released the M2 Ultra

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • dlachausse

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 2:29 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Good point, I forgot about that. Maybe it just got really delayed in production.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • ryao

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 2:46 PM

Reportedly Apple is using its own silicon in data centers to run "Apple Intelligence" and other things like machine translation in Safari. I suspect that the initial supply was sent to Apple's data centers.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • _alex_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 2:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              m2 ultra tho

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • epolanski

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 7:04 PM

Can anybody ELI5 why there aren't multi-GPU builds to run LLMs locally?

It feels like one should be able to build a good machine for $3–4k, if not less, with six 16GB mid-level gaming GPUs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • snitty

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:08 PM

Reddit's LocalLLaMA has a lot of these. 3090s are pretty popular for the purpose. But they're not trivial to build and run at home. Among other issues, you're drawing >1kW for the GPUs alone if you have four of them at 100% usage.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • risho

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 7:26 PM

6 × 16GB is still nowhere near 512GB of VRAM. On top of that, the monster you create requires hyper-specific server-grade hardware, will be huge and loud, and will pull enough power to trip a circuit breaker. I'm sure most people would rather pay a premium to get several times the RAM in a power-sipping device you can hold in the palm of your hand.
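The napkin math behind this comparison can be sketched quickly. GPU count, VRAM per card, and per-card board power are assumptions for a hypothetical mid-range build, not measured figures:

```python
# Rough comparison of a hypothetical 6x 16GB GPU build against a
# 512GB unified-memory machine. Per-card power (250W) is an assumed
# typical board power for a mid-range gaming card under load.
num_gpus = 6
vram_per_gpu_gb = 16
power_per_gpu_w = 250

total_vram_gb = num_gpus * vram_per_gpu_gb      # aggregate VRAM
total_gpu_power_w = num_gpus * power_per_gpu_w  # GPUs alone, at full load

print(f"aggregate VRAM:  {total_vram_gb} GB")    # 96 GB
print(f"GPU power draw:  {total_gpu_power_w} W") # 1500 W
```

96GB is less than a fifth of 512GB, and 1.5kW for the GPUs alone (before CPU, fans, and PSU losses) is already past what many household circuits can comfortably supply continuously.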

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • Sharlin

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 3:57 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                > it can be configured up to 512GB, or over half a terabyte.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Hah, I see what they did there.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • kridsdale1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 8:17 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    If they added 1 byte, it counts.


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • wewewedxfgdf

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 8:37 PM

Computers these days: the more appealing, exciting, and desirable, the higher the price climbs into the stratosphere.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    $9499

Whatever happened to competition in computing?

Computing hardware competition used to be cut-throat, drop-dead, knife-fight, last-man-standing brutal. Now it's just a massive gold-rush cash grab.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • niek_pas

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 8:50 PM

The Macintosh Plus, released in 1986, cost $2,600 at the time, or about $7,460 adjusted for inflation.
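The inflation adjustment implied by those two figures works out to roughly a 2.87× multiplier (a quick sanity check on the quoted numbers, not an independent CPI lookup):

```python
# Back out the inflation multiplier implied by the figures above:
# $2,600 in 1986 vs. the quoted ~$7,460 in today's dollars.
price_1986 = 2600
price_today = 7460

multiplier = price_today / price_1986
print(f"implied inflation multiplier: {multiplier:.2f}x")  # ~2.87x
```

That puts the 1986 Macintosh Plus in the same real-dollar neighborhood as a mid-spec Mac Studio today.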

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • bigyabai

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 9:08 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            It even came with an official display! Nowadays that's a $1,600-$6,000 accessory, depending on whether you own a VESA mount.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • WXLCKNO

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 12:32 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          You take the top price of the top-of-the-line newest pro chip Apple produces and then make this argument?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • Underphil

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 4:26 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            You're thinking about this as if the average Joe is interested in this. They're not. The tech folk salivating in this discussion are a rounding error when it comes to computing. The vast majority has no need for this.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • bustling-noose

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 2:26 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              The MacBook Air with the M4 chip, 16GB of RAM, and an amazing display and camera is just $999. During the back-to-school promotion that will happen soon, it's $899 with free AirPods. That's really great value given how good the hardware is.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • hu3

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 8:49 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                It doesn't even run Linux properly.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Could cost half of that and it would still be uninteresting for my use cases.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                For AI, on-demand cloud processing is magnitudes better in speed and software compatibility anyway.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • dcchambers

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 2:16 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    There are legitimate use cases for local LLMs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    For example, I'll happily feed my entire directory of private notes/diary entries into an LLM running offline on my laptop. I would never do that with someone else's LLM running in the cloud.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • bredren

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 2:58 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Apart from enabling a 120Hz update to the XDR Pro, does TB5 offer a viable pathway for eGPUs on Apple Silicon MacBooks?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              This is a cool computer, but not something I'd want to lug around.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • mohsen1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:34 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  For AI stuff, 120GB/s is not really useful...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • gigatexal

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 10:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                16TB storage, 512GB RAM, M3 Ultra: $15k+ USD. Wow.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Did they say why there’s not an m4 ultra?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • aurareturn

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 4:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  You can run the full Deepseek 671b q4 model at 40 tokens/s. 37B active params at a time because R1 is MoE.
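The 40 tok/s figure is consistent with the usual bandwidth-bound napkin math for MoE decode: each generated token must stream every active weight once, so token rate is capped by memory bandwidth divided by active model bytes. A minimal sketch, assuming (not stated in the comment) the M3 Ultra's 819GB/s peak unified memory bandwidth and ~0.5 bytes per parameter for q4 quantization:

```python
# Napkin math: bandwidth-bound decode speed for a MoE model.
# Assumed figures: 819 GB/s M3 Ultra memory bandwidth, 37B active
# params for DeepSeek R1, ~0.5 bytes/param at 4-bit quantization.

BANDWIDTH_GBS = 819      # unified memory bandwidth, GB/s (assumed)
ACTIVE_PARAMS = 37e9     # active parameters per token (MoE)
BYTES_PER_PARAM = 0.5    # ~4-bit (q4) quantization

def max_tokens_per_second(bandwidth_gbs, active_params, bytes_per_param):
    """Upper bound on decode speed: every active weight is read
    once per token, so tok/s <= bandwidth / active model bytes."""
    active_bytes_gb = active_params * bytes_per_param / 1e9
    return bandwidth_gbs / active_bytes_gb

print(round(max_tokens_per_second(BANDWIDTH_GBS, ACTIVE_PARAMS, BYTES_PER_PARAM), 1))
# → 44.3
```

That ~44 tok/s is a theoretical ceiling; real decode speeds land somewhat below it, which is roughly where the 40 tok/s claim sits.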

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • KingOfCoders

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:08 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      In another of your comments it was "by my calculation". Now it's just fact?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • aurareturn

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 1:58 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Easiest calculation.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • KingOfCoders

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 5:48 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Me pulling numbers out of my hat, easiest calculation too.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • aurareturn

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  today at 6:34 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Troll.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • KingOfCoders

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 7:43 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Exactly. Sub-species Apple fanboy.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • martin_a

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 8:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    > Apple today announced M3 Ultra, the highest-performing chip it has ever created

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Well, duh, it would be a shame if you made a step backwards, wouldn't it? I hate that stupid phrase...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • datadeft

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      today at 12:01 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > Apple today announced M3 Ultra, the highest-performing chip it has ever created

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I thought it was a few weeks ago when the M4 Max came out.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • apatheticonion

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          today at 12:40 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          God I wish Linux ran on Apple Silicon (with first class hardware support).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • walterbell

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              today at 8:19 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              We need more mainline Linux devs to work on Apple Silicon feature gaps, instead of expecting tiny AsahiLinux to roll support up the mountain alone.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • johntitorjr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 2:32 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Lots of AI HW is focused on RAM (512GB!). I have a cost-sensitive application that needs speed (300+ TOPS), but only 1GB of RAM. Are there any HW companies focused on that space?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • jms55

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                today at 6:14 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Like others have said, basically traditional GPUs (RTX 40/50 series in particular, 20/30 series have much weaker tensor cores).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                In terms of software, recent NVIDIA and AMD research has focused on fast evaluation of small ~4 layer MLPs using FP8 weights for things like denoising, upscaling, radiance caching, and texture and material BRDF compression/decompression.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                NVIDIA has just put out some new graphics API extensions and samples/demos for loading a chunk of neural net weights and performing inference from within a shader.
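A minimal sketch of the kind of tiny MLP this comment describes, with a crude simulated FP8-style (E4M3-like) weight quantization. The layer sizes, quantization scheme, and class names here are illustrative assumptions, not NVIDIA's actual shader-side implementation:

```python
import numpy as np

def fake_fp8_e4m3(w, max_abs=448.0):
    # Crude per-tensor symmetric quantization mimicking the FP8 E4M3 range:
    # scale into [-448, 448], round to ~3 mantissa bits, scale back.
    scale = np.max(np.abs(w)) / max_abs
    q = w / scale
    exp = np.floor(np.log2(np.maximum(np.abs(q), 1e-9)))
    step = 2.0 ** (exp - 3)          # quantization grid ~3 mantissa bits wide
    q = np.round(q / step) * step
    return q * scale

class TinyMLP:
    """~4-layer MLP of the scale used in denoising/radiance-caching demos."""
    def __init__(self, dims=(32, 64, 64, 64, 3), seed=0):
        rng = np.random.default_rng(seed)
        self.weights = [fake_fp8_e4m3(rng.standard_normal((i, o)) * 0.1)
                        for i, o in zip(dims, dims[1:])]

    def __call__(self, x):
        for w in self.weights[:-1]:
            x = np.maximum(x @ w, 0.0)   # ReLU hidden layers
        return x @ self.weights[-1]      # linear output layer

mlp = TinyMLP()
out = mlp(np.ones((1, 32), dtype=np.float32))
print(out.shape)  # (1, 3)
```

A net this small (a few tens of KB of weights) fits in on-chip memory, which is what makes in-shader inference plausible at all.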

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • xyzsparetimexyz

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Isn't that just any discrete (Nvidia, AMD) GPU?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • NightlyDev

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:15 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Most recent GPUs will do. An older RTX 4070 is over 400 TOPS, the new RTX 5070 is around 1000 TOPS, and the RTX 5090 is around 3600 TOPS.
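For context on where marketed figures like these come from, peak TOPS is roughly 2 ops (multiply + add) per MAC per clock, and headline numbers usually assume 2:1 structured sparsity. The tensor-core count, MAC rate, and clock below are approximate, illustrative values for an Ada-class card, not exact specs:

```python
# Back-of-envelope peak INT8 TOPS: 2 ops (multiply + add) per MAC per cycle.
# Marketed figures typically assume 2:1 structured sparsity (2x dense rate).
def peak_tops(tensor_cores, macs_per_core_per_clk, boost_ghz, sparsity=2):
    dense = 2 * tensor_cores * macs_per_core_per_clk * boost_ghz / 1e3  # TOPS
    return dense * sparsity

# Illustrative RTX-4070-like numbers: 184 tensor cores,
# ~256 INT8 MACs/core/clock, ~2.48 GHz boost.
print(round(peak_tops(184, 256, 2.48)))  # 467
```

That lands in the same ballpark as the "over 400 TOPS" figure above; halve it if the workload can't use structured sparsity.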

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • johntitorjr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 4:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Yeah, that's basically where I'm at with options. Not ideal for a cost sensitive application.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • Havoc

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 3:02 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Tenstorrent Grayskull cards might be a fit. Think they’re not entirely plug and play though

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • stefan_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 3:54 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Just buy any gaming card? Even something like the Jetson AGX Orin boasts 275 TOPS (but they add in all kind of different subsystems to reach that number).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • johntitorjr

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 4:20 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            The Jetson is interesting!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Can you elaborate on how the TOPS value is inflated? What GPU would be the equivalent of the Jetson AGX Orin?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • stefan_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 5:56 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                The problem with the TOPS is that they add in ~100 TOPS from the "Deep Learning Accelerator" coprocessors, but they have a lot of awkward limitations on what they can do (and software support is terrible). The GPU is an Ampere generation, but there is no strict consumer GPU equivalent.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • wewewedxfgdf

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 8:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      The good news is that AMD and Intel are both in good positions to develop similar products.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • gpapilion

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 3:18 PM

I think this will eventually morph into Apple's server fleet. This, in conjunction with the AI server factory they are opening, makes a lot of sense.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • nottorp

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 2:20 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          > support for more than half a terabyte of unified memory

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Soldered?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • universenz

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 2:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Is there a single Apple SoC where they’ve provided removable ram? Not that I can recall.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • danpalmer

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 2:24 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Is there even an existing replaceable memory standard that would meet the current needs of Apple's "Unified Memory" architecture? I'm not an expert but I'd suspect probably not. The bus probably looks a lot more like VRAM on GPUs, and I've never seen a GPU with replaceable RAM.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • jsheard

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 2:28 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      CAMM2 could kinda work, but each module is only 128-bit so I think the furthest you could possibly push it is a 512-bit M Max equivalent with CAMM2 modules north, east, west and south of the SOC. There just isn't room to put eight modules right next to the SOC for a 1024-bit bus like the M Ultra.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • eigenspace

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 2:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Framework said that when they built a Strix Halo machine, AMD assigned an engineer to work with them on seeing if there's a way to get CAMM2 memory working with it, and after a bunch of back and forth it was decided that CAMM2 still made the traces too long to maintain proper signal integrity due to the 256 bit interface.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          These machines have a 512 bit interface, so presumably even worse.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • zamadatix

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 3:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Current (individual, not counting dual socketed) AMD Epyc CPUs have 576 GB/s over a 768 bit bus using socketed DIMMs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • eigenspace

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:59 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  My understanding is that works out due to the lower clock speeds of those RAM modules though right?

It's getting that bandwidth by going very wide across very many channels, rather than trying to push a gigantic amount of bandwidth through only a few channels.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • zamadatix

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 4:45 PM

Yeah, "channels" are just a roundabout way to say "wider bus", and you can't get much past 128 GB/s of memory bandwidth without leaning heavily into a very wide bus (i.e. more than the "standard" 128 bit we're used to on consumer x86), regardless of who's making the chip. Looking at it from the bus width perspective:

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      - The AI Max+ 395 is a 256 bit bus ("4 channels") of 8000 MHz instead of 128 bits ("2 channels") of 16000 MHz because you can't practically get past 9000 MHz in a consumer device, even if you solder the RAM, at the moment. Max capacity 128 GB.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      - 5th Gen Epyc is a 768 bit bus ("12 channels") of 6000 MHz because that lets you use a standard socketed setup. Max capacity 6 TB.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      - M3 Ultra is a 1024 bit bus ("16 channels") of "~6266 MHz" as it's 2x the M3 Max (which is 512 bits wide) and we know the final bandwidth is ~800 GB/s. Max capacity 512 GB.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Note: "Channels" is in quotes because the number of bits per channel isn't actually the same per platform (and DDR5 is actually 2x32 bit channels per DIMM instead of 1x64 per DIMM like older DDR... this kind of shit is why just looking at the actual bit width is easier :p).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      So really the frequencies aren't that different even though these are completely different products across completely different segments. The overwhelming factor is bus width (channels) and the rest is more or less design choice noise from the perspective of raw performance.
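The bandwidth numbers above are just bytes-per-transfer times transfer rate. A quick sketch of that arithmetic, using the figures quoted in this thread (note this reads the quoted "MHz" values as MT/s, per the usual DDR marketing convention):

```python
def peak_bandwidth_gb_s(bus_width_bits: int, mt_per_s: float) -> float:
    """Peak bandwidth in decimal GB/s: bytes per transfer x transfers per second."""
    return (bus_width_bits / 8) * mt_per_s * 1e6 / 1e9

# Bus widths and transfer rates as quoted above.
platforms = {
    "AI Max+ 395":  (256, 8000),   # 256-bit bus @ 8000 MT/s
    "5th Gen Epyc": (768, 6000),   # 768-bit bus @ 6000 MT/s
    "M3 Ultra":     (1024, 6266),  # 1024-bit bus @ ~6266 MT/s
}

for name, (bits, rate) in platforms.items():
    print(f"{name}: {peak_bandwidth_gb_s(bits, rate):.0f} GB/s")
# AI Max+ 395:  256 GB/s
# 5th Gen Epyc: 576 GB/s
# M3 Ultra:    ~802 GB/s
```

Which lines up with the point: the M3 Ultra's ~800 GB/s comes almost entirely from the 1024-bit bus, not from clocking the memory faster than anyone else.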

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • jsheard

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 2:46 PM

Yeah, but AMD's memory controllers are really finicky. That might have been more of a Strix Halo issue than a CAMM2 issue.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • eigenspace

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 4:02 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Entirely possible. Obviously Apple wouldn't have been interested in letting you upgrade the RAM even if it was doable.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I'd love to have more points of comparison available, but Strix Halo is the most analogous chip to an M-series chip on the market right now from a memory point of view, so it's hard to really know anything.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I very much hope CAMM2 or something else can be made to work with a Strix-like setup in the future, but I have my doubts.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • nottorp

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 2:34 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I thought so too when they launched the M1, but I soon got corrected.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          The memory bus is the same as for modules, it's just very short. The higher end SoCs have more memory bandwidth because the bus is wider (i.e. more modules in parallel).

You could blame DDR5 (who thought a speed negotiation that can take over a minute at boot was a good idea?), but I blame the obsession with thinness and the ability to overcharge your customers.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          > I've never seen a GPU with replaceable RAM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I still have one :) It's an ISA Trident TVGA 8900 that I personally upgraded from 512k VRAM to one full megabyte!!!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • hoseja

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 2:33 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            It's really unfortunate that GPUs aren't fully customizable daughterboards, isn't it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • klausa

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 2:30 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      It's not soldered, it's _on the package_ with the SoC.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • eigenspace

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 2:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          It is _not_ on die. It's soldered onto the package.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          There's a good reason it's soldered, i.e. the wide memory interface and huge bandwidth mean that the extra trace lengths needed for an upgradable RAM slot would screw up the memory timings too much, but there's no need to make false claims like saying it's on-die.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • sschueller

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 3:00 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              > RAM slot would screw up the memory timings

Existing ones, possibly, but why not build something that lets you snap in a BGA package just like we snap in CPUs on full-sized PC mainboards?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • eigenspace

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:57 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  The longer traces are the problem. They want these modules as physically close as possible to the CPU to make the timings work out and maintain signal integrity.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  It's the same reason nobody sells GPUs that have user upgradable non-soldered GDDR VRAM modules.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • georgeburdell

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 2:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Probably on package at best

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • klausa

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 2:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Right, yes, sorry for imprecise language!

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • riidom

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 5:50 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Thanks for clarifying

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • ZekeSulastin

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 2:27 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Not even Framework has escaped from soldered RAM for this kind of thing.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • simlevesque

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 2:21 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            As are all Apple M devices.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • reaperducer

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 2:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Soldered?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Figure out a way to make it unified without also soldering it, and you'll be a billionaire.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Or are you just grinding a tired, 20-year-old axe?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • rsynnott

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 2:36 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  _That_, in itself, wouldn't be that difficult, and there are shared-memory setups that do use modular memory. Where you'd really run into trouble is making it _fast_; this is very, very high bandwidth memory.
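A rough sketch of why bandwidth is the hard part. The figures below are assumed public specs, not from this thread: a 1024-bit soldered LPDDR5 bus (roughly what an M-series Ultra uses) versus a dual-channel (128-bit) socketed DDR5 desktop setup.

```python
# Napkin math: peak memory bandwidth = bus width (bytes) * transfer rate.
# Assumed figures: 1024-bit LPDDR5-6400 (soldered, on-package) vs.
# 128-bit DDR5-5600 (two socketed DIMM channels).
def bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: int) -> float:
    """Peak bandwidth in GB/s from bus width and transfers/sec."""
    return bus_width_bits / 8 * transfer_rate_mt_s / 1000

soldered_wide_bus = bandwidth_gb_s(1024, 6400)  # ~819 GB/s
socketed_dimms = bandwidth_gb_s(128, 5600)      # ~90 GB/s

print(f"soldered 1024-bit LPDDR5: {soldered_wide_bus:.0f} GB/s")
print(f"socketed dual-channel DDR5: {socketed_dimms:.0f} GB/s")
```

The ~9x gap comes almost entirely from bus width, and routing a 1024-bit bus to removable modules (signal integrity, trace length, pin count) is exactly where socketed designs fall down.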

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • jonjojojon

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 2:32 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Like all Intel/AMD integrated graphics that use the system's RAM as VRAM?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • varispeed

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 2:30 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  You know that memory can be "easily" de-soldered and soldered at home?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  The issue is availability of chips, and most likely you have to know which components to change so the new memory is recognised. For instance, that could mean changing a resistor to a different value or bridging certain pads.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • A4ET8a8uTh0_v2

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 2:37 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      This viewpoint is interesting. It is not exactly inaccurate, but it does appear to be missing a point. Soldering in itself is a valuable and useful skill, but I can't say you can just get in and start de-soldering willy-nilly, as opposed to opening a box and upgrading RAM by plopping stuff in a designated spot.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      What if both are an issue?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • varispeed

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 6:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Do you know that "plopping stuff in a designated spot" can also be out of reach for some people? I know plenty who would give their computer to a tech to do the upgrade for them, even if they are shown in person how to do all the steps. Soldering is just one step (albeit a fairly big one) above that. But the fact that this can be done at home with fairly inexpensive tools means a tech person with reasonable skill could do it, so such an upgrade could be available in any computer/phone repair shop if the parts were available. What I am trying to say is that soldering is not the barrier.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • 827a

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 3:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Very curiously: They upgraded the Mac Studio but not the Mac Pro today.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • crowcroft

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 7:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Kinda curious to see how many tok/sec it can crush. Could be a fun way to host AI apps.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • tuananh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:27 PM

    But is it actually usable for anything if it's too slow?

    Does anyone have a ballpark number for how many tokens per second we can get with this?
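    For a ballpark: single-stream token generation is usually memory-bandwidth-bound, since every active weight has to be streamed from memory once per generated token. A rough sketch, using assumed numbers (800 GB/s for the M3 Ultra, DeepSeek-R1's ~37B active MoE parameters, ~4.5 bits/weight for a Q4_K_M quant, and a guessed bandwidth utilization factor):

    ```python
    # Napkin math for memory-bandwidth-bound decode speed.
    # All inputs are assumptions, not measurements.

    def tokens_per_second(bandwidth_gbs, active_params_b, bits_per_weight, utilization=0.6):
        """Each token read streams every active weight from memory once."""
        gb_per_token = active_params_b * bits_per_weight / 8  # GB read per token
        return bandwidth_gbs * utilization / gb_per_token

    # 800 GB/s, 37B active params, ~4.5 bits/weight (Q4_K_M-ish)
    print(round(tokens_per_second(800, 37, 4.5), 1))  # roughly 23 tok/s
    ```

    That lines up with the ~20-30 tok/s estimates elsewhere in the thread; prompt processing (prefill) is compute-bound and is a separate question.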

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • datadrivenangel

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 2:21 PM

      It's unclear what devices this will be in outside of the Mac Studio. Also, most of the comparisons were with M1 and M2 chips, not M4.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • dlachausse

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 2:27 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          It is a bit misleading to do that, but in fairness to Apple, almost nobody is upgrading to this from an M4 Mac, so those are probably more useful comparisons.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • reaperducer

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 2:25 PM

            > most of the comparisons were with M1 and M2 chips, not M4.

            Is anyone other than a vanishingly small number of hardcore hobbyists going to upgrade from an M4 to an M4 Ultra?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • nordsieck

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 2:35 PM

              > Is anyone other than a vanishingly small number of hardcore hobbyists going to upgrade from an M4 to an M4 Ultra?

              I expect that the two biggest buyers of the M4 Ultra will be people who want to run LLMs locally, and professionals who want the highest-performance machine they can get but are wedded to Mac-only software.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • bredren

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 2:51 PM

                Anecdotally, and reasonable criticisms of the release aside, OpenAI's GPT-4.5 introduction video was presented from a hard-to-miss Apple laptop.

                It is reasonable to say many folks in the field prefer to work on Mac hardware.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • dangoodmanUT

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 8:52 PM

      800GB/s and 512GB of unified RAM is going to go stupid for LLMs

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • pier25

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 2:31 PM

      So weird that they released the Mac Studio with an M4 Max and an M3 Ultra.

      Why? Do they have too many M3 chips in stock?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • bigfishrunning

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 3:01 PM

          The M4 Max is faster; the M3 Ultra supports more unified memory. So pick whichever meets your requirements.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • pier25

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Yes but why not release an M4 Ultra?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • wpm

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 5:43 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Because the M4 architecture doesn't have the interconnects needed to fuse two Max SoCs together.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • desertmonad

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 2:27 PM

      Time to upgrade the M1 Ultra, I guess! The M1 Ultra has been pretty good with DeepSeek locally.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • _alex_

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:16 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    what flavor of deepseek are you running? what kind of performance are you seeing?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • ozten

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 5:05 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  We've come a long way since beowulf clusters of smart toasters.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • FloatArtifact

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:17 PM

So the question is whether the M1/M2 Ultra was limited by GPU/NPU compute or by memory bandwidth at this point.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I'm curious what instruction sets may have been included with the M3 chip that the other two lack for AI.

So far the candidates seem to be NVIDIA DIGITS, the Framework Desktop, and the 64GB M1 / 128GB M2/M3 Studio/Ultra.

The GPU market isn't competitive enough for the amount of VRAM needed. I was hoping for a Battlemage GPU model with 24GB that would be reasonably priced and available.

As for the Framework Desktop and similar devices, I think a second generation will be significantly better than what's currently on offer today. Rationale below...

For a max-spec processor with RAM at $2,000, this seems like a decent deal given today's market. However, this might age very fast, for three reasons.

Reason 1: LPDDR6 may debut in the next year or two; this could bring massive improvements to memory bandwidth and capacity for soldered-on memory.

LPDDR6 vs LPDDR5:
- Data bus width: 24 bits vs 16 bits
- Burst length: 24 vs 16
- Memory bandwidth: up to 38.4 GB/s vs up to 6.7 GB/s

- CAMM RAM may or may not maintain signal integrity as memory bandwidth increases. Until I see it implemented for an AI use-case in a cost-effective manner, I am skeptical.
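(Those bandwidth figures follow from bus width times per-pin data rate. A minimal sketch; the 12.8 Gbps/pin LPDDR6 rate is an assumption chosen to reproduce the 38.4 GB/s figure above, not a confirmed spec:)

```python
def channel_bandwidth_gbps(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak per-channel bandwidth in GB/s: pins * Gbps-per-pin / 8 bits-per-byte."""
    return bus_width_bits * pin_rate_gbps / 8

# LPDDR6's wider 24-bit channel at an assumed 12.8 Gbps/pin:
print(channel_bandwidth_gbps(24, 12.8))  # ≈ 38.4 GB/s
# LPDDR5's 16-bit channel at an assumed 6.4 Gbps/pin:
print(channel_bandwidth_gbps(16, 6.4))   # ≈ 12.8 GB/s
```

Total system bandwidth then scales with how many channels the SoC wires up, which is where Apple's wide unified-memory buses come from.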

Reason 2: It's a laptop chip with limited PCIe lanes and a reduced power envelope. Theoretically, a desktop chip could have better performance, more lanes, and be socketable (although I don't think I've seen a socketed CPU with soldered RAM).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Reason 3: In addition, what does hardware look like being repurposed in the future compared to alternatives?

- Unlike desktop or server counterparts, which can have higher CPU core counts and PCIe/IO expansion, this processor and its motherboard are limited when it comes to repurposing later down the line as a server to self-host software besides AI. I suppose it could be turned into an overkill NAS with ZFS and a single HBA controller card in a new case.

- Buying into the Framework Desktop is pretty limited by the form factor. The next generation might be able to include a fully populated x16 slot and a 10G NIC. That seems about it if they're going to maintain their backward-compatibility philosophy given the case form factor.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • anArbitraryOne

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 11:10 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Can't wait to run asahi on it

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • ummonk

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 4:41 PM

Is the Mac Pro dead, or are they waiting for an M4 Ultra to refresh it?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • mrcwinn

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            today at 1:31 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            The Mac Pro died a long, long time ago my friend.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • minton

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 8:52 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          + $4,000 to bump to 512GB from 96GB.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • mlboss

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 6:43 PM

$14K with 512GB memory and 16TB storage

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • maverwa

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 7:22 PM

I cannot believe I’m saying this, but: for Apple, that’s rather cheap. Threadripper boxes with that amount of memory do not come a lot cheaper. Considering Apple's pricing for memory in its other devices, $4K for the 96GB-to-512GB upgrade is a bargain.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • jltsiren

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 10:40 PM

It's not that much cheaper than with earlier comparable models. Apple memory prices have been $25/GB for the base and Pro chips and $12.5/GB for the Max and Ultra chips. With the new Studios, we get $12.5/GB until 128 GB and $9.375/GB beyond that.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    If you configure a Threadripper workstation at Puget Systems, memory price seems to be ~$6/GB. Except if you use 128 GB modules, which are almost $10/GB. You can get 768 GB for a Threadripper Pro cheaper than 512 GB for a Threadripper, but the base cost of a Pro system is much higher.
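That tiered pricing is easy to sanity-check. A minimal sketch (the tier boundaries and $/GB rates are taken from the figures above; the function name is just illustrative):

```python
def upgrade_cost(base_gb: float, target_gb: float) -> float:
    """USD cost to upgrade unified memory on the new Studios,
    assuming $12.5/GB up to 128 GB and $9.375/GB beyond that."""
    def total(gb: float) -> float:
        first_tier = min(gb, 128) * 12.5      # first 128 GB
        second_tier = max(gb - 128, 0) * 9.375  # everything past 128 GB
        return first_tier + second_tier
    return total(target_gb) - total(base_gb)

print(upgrade_cost(96, 512))  # 4000.0 -- matches the $4K upgrade mentioned upthread
```

Reassuringly, the 96GB-to-512GB upgrade comes out to exactly $4,000 under these rates.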

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • fintechie

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 3:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              IMO this is a bigger blow to the AI big boys than Deepseek's release. This is massive for local inference. Exciting times ahead for open source AI.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • kcb

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 4:34 PM

The market for local inference and $10k+ Macs is not nearly significant enough to affect the big boys.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • whimsicalism

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 4:02 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    it is absolutely not

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • bigyabai

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 7:00 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I don't think you understand what the "AI big boys" are in the market for.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • ballooney

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    today at 2:52 AM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I'm from the dark ages and am interested in this for non-AI things like CFD. What is the state of SDK support for these chips? Is there a nice rust or C++ library that abstracts the hardware and lets you just do very big Matrix multiplications?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • m3kw9

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 6:43 PM

Instantly HIPAA-compliant high-end models running locally.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • gatienboquet

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 2:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        No benchmarks yet for the LLMs :(

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • cynicalpeace

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 3:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Can someone explain what it would take for Apple to overtake NVIDIA as the preferred solution for AI shops?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          This is my understanding (probably incorrect in some places)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          1. NVIDIA's big advantage is that they design the hardware (chips) and software (CUDA). But Apple also designs the hardware (chips) and software (Metal and MacOS).

2. CUDA has native support in AI libraries like PyTorch and TensorFlow, so it works extra well during training and inference. It seems Metal is well supported by PyTorch, but not well supported by TensorFlow.

3. NVIDIA hardware runs under Linux rather than macOS, making it easier in general to rack servers.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • bigyabai

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 3:58 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              It's still boiling down to hardware and software differences.

In terms of hardware - Apple designs their GPUs for graphics workloads, whereas Nvidia has a decades-old lead on optimizing for general-purpose compute. They've gotten really good at pipelining and keeping their raster performance competitive while also accelerating AI and ML. Meanwhile, Apple is directing most of their performance to just the raster stuff. They could pivot to an Nvidia-style design, but that would be pretty unprecedented (even if a seemingly correct decision).

And then there's CUDA. It's not really appropriate to compare it to Metal, both in feature scope and ease of use. CUDA has expansive support for AI/ML primitives and deeply integrated tensor/SM compute. Metal does boast some compute features, but you're expected to write most of the support yourself in the form of compute shaders. This is a pretty radical departure from the pre-rolled, almost "cargo cult" CUDA mentality.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              The Linux shtick matters a tiny bit, but it's mostly a matter of convenience. If Apple hardware started getting competitive, there would be people considering the hardware regardless of the OS it runs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • cynicalpeace

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 4:13 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  > keeping their raster performance competitive while also accelerating AI and ML. Meanwhile, Apple is directing most of their performance to just the raster stuff. They could pivot to an Nvidia-style design, but that would be pretty unprecedented (even if a seemingly correct decision).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Isn't Apple also focusing on the AI stuff? How has it not already made that decision? What would prevent Apple from making that decision?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  > Metal does boast some compute features, but you're expected to write most of the support yourself in the form of compute shaders. This is a pretty radical departure from the pre-rolled, almost "cargo cult" CUDA mentality.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Can you give an example of where Metal wants you to write something yourself whereas CUDA is pre-rolled?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • bigyabai

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 8:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > Isn't Apple also focusing on the AI stuff?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Yes, but not with their GPU architecture. Apple's big bet was on low-power NPU hardware, assuming the compute cost of inference would go down as the field progressed. This was the wrong bet - LLMs and other AIs have scaled up better than they scaled down.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > How has it not already made that decision? What would prevent Apple from making that decision?

                                                                    I mean, for one, Apple is famously stubborn. They're the last ones to admit they're wrong whenever they make a mistake; presumably, admitting that the NPU is wasted silicon would be a mea culpa for their AI stance. It's also easier to wait for a new generation of Apple Silicon to overhaul the architecture than to drive a generational split as soon as the problem is identified.

                                                                    As for what's preventing them, I don't think there's anything insurmountable. But logically it might not make sense to adopt Nvidia's strategy even if it's better. Apple can't necessarily block Nvidia from buying the same nodes they get from TSMC, so they'd have to out-design Nvidia if they wanted to compete on their merits. Even then, since Apple doesn't support OpenCL it's not guaranteed that they would replace CUDA. It would just be another proprietary runtime for vendors to choose from.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > Can you give an example of where Metal wants you to write something yourself whereas CUDA is pre-rolled?

                                                                    Not exhaustively, no. Some of them are performance-optimized kernels like cuSPARSE, some others are primitive sets like cuDNN, and others yet are graph and signal processing libraries with built-out support for industrial applications.
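A minimal sketch of the "pre-rolled" point, assuming a working PyTorch install: on NVIDIA hardware this one-line convolution dispatches to cuDNN's hand-tuned kernels without the caller ever writing GPU code, whereas comparable coverage under Metal has historically meant writing your own compute shaders.

```python
import torch
import torch.nn.functional as F

# On a CUDA build of PyTorch, conv2d routes to cuDNN's tuned kernels;
# the user never writes a GPU kernel. The call works the same on CPU,
# which is what this sketch exercises.
x = torch.randn(1, 3, 32, 32)   # NCHW input
w = torch.randn(8, 3, 3, 3)     # 8 output channels, 3x3 kernel
y = F.conv2d(x, w, padding=1)   # "same" padding keeps 32x32
print(tuple(y.shape))           # (1, 8, 32, 32)
```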

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      To Apple's credit, they've definitely started hardware-accelerating the important stuff like FFT and ray tracing. But Nvidia still has a decade of lead time that Apple spent shopping around with AMD for other solutions. The head-start CUDA has is so great that I don't think Apple can seriously respond unless the executives light a fire under their ass to make some changes. It will be an "immovable rock versus an unstoppable force" decision for Apple's board of directors.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • universenz

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 2:21 PM

                                                          96GB on the baseline M3 Ultra model, with a max of 512GB! Looks like they’re leaning in hard with the AI crowd.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • okamiueru

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 2:23 PM

                                                            Don't know what the prior extreme Apple is alluding to here. But Apple marketing is what it is.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • ntqvm

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 3:23 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Disappointing announcement. M4 brings a significant uplift over M3, and the ST performance of the M3 Ultra will be significantly worse than the M4 Max.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Even for its intended AI audience, the ISA additions in M4 brought significant uplift.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Are they waiting to put M4 Ultra into the Mac Pro?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • xedrac

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 11:57 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Now let me run Linux on it natively without having to jump through hoops. That would be something...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • ein0p

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 5:45 PM

                                                                  That's all nice, but if they are to be considered a serious AI hardware player, they will need to invest in better support for their hardware in deep learning frameworks such as PyTorch and JAX. Currently the support is rather poor and not suitable for any serious work.
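A minimal sketch of what that support gap looks like in practice, assuming a recent PyTorch build with the MPS (Metal) backend compiled in; `pick_device` is a hypothetical helper, not a PyTorch API:

```python
import torch

def pick_device() -> torch.device:
    # Hypothetical helper: prefer Apple's Metal (MPS) backend when present.
    # Many ops still lack MPS kernels and either fall back to CPU or raise,
    # which is the "rather poor" support described above.
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.randn(4, 4, device=device)
y = x @ x  # basic matmuls work on MPS; op coverage thins out beyond that
print(y.device.type, tuple(y.shape))
```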

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • perfmode

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 5:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      32 core, 512GB RAM, 8TB SSD

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      please take my money now

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • api

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 3:22 PM

Half a terabyte could run 8-bit quantized versions of some of those full-size Llama and DeepSeek models. Looking forward to seeing some benchmarks on that.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • zamadatix

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 3:36 PM

DeepSeek would need roughly Q5-level quantization to fit.
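The fit question comes down to simple arithmetic: parameters times bits per weight. A quick sketch, using approximate effective bits-per-weight figures for common GGUF quant types (the exact numbers vary slightly by model):

```python
# Does a quantized DeepSeek-R1-class model (~671B params) fit in 512 GB
# of unified memory? Bits-per-weight values are approximate GGUF averages.
PARAMS = 671e9

def weight_gb(bits_per_weight: float) -> float:
    """Approximate weight size in GB at a given quantization width."""
    return PARAMS * bits_per_weight / 8 / 1e9

for name, bits in [("Q8_0", 8.5), ("Q5_K_M", 5.7), ("Q4_K_M", 4.8)]:
    size = weight_gb(bits)
    fits = "fits" if size < 512 else "does not fit"
    print(f"{name}: ~{size:.0f} GB -> {fits} in 512 GB")
```

At ~8.5 effective bits, Q8_0 lands around 713 GB and doesn't fit; Q5_K_M (~478 GB) squeezes in with a little room left for KV cache, which matches the "Q5-ish" estimate above and the ~404 GB Q4_K_M figure mentioned earlier in the thread.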

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • xyst

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 2:43 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          I might like Apple again if the SoC could be sold separately and opened up. It would be interesting to see a PC with Asahi or Windows running on Apple’s chips.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • daft_pink

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 3:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Really? M4 Max or M3 Ultra instead of M4 Ultra?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • aurareturn

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 4:21 PM

With an M3 Ultra going into the Mac Studio, Apple could differentiate it from the Mac Pro, which could then get the M4 Ultra. Right now, the Mac Studio and Mac Pro oddly both use the M2 Ultra and offer the same overall performance.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                https://x.com/markgurman/status/1896972586069942738

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • behnamoh

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 2:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              819GB/s bandwidth...

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              what's the point of 512GB RAM for LLMs on this Mac Studio if the speed is painfully slow?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              it's as if Apple doesn't want to compete with Nvidia... this is really disappointing in a Mac Studio. FYI: M2 Ultra already has 800GB/s bandwidth
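Whether 819 GB/s is "painfully slow" can be estimated directly: at batch size 1, each generated token has to stream the active weights from memory, so bandwidth divided by bytes-per-token gives a decode ceiling. A rough sketch (parameter counts and quant widths are assumptions, and real utilization is below the theoretical peak):

```python
# Rough decode-speed ceiling for bs=1 text generation:
# tok/s <= memory_bandwidth / bytes_of_active_weights_per_token
def max_tok_per_s(bandwidth_gbs: float, active_params_b: float,
                  bits_per_weight: float) -> float:
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gbs * 1e9 / bytes_per_token

# ~800 GB/s class chip; DeepSeek-R1 is an MoE with ~37B active params/token.
print(max_tok_per_s(800, 37, 4.8))  # MoE at ~Q4: roughly 36 tok/s ceiling
print(max_tok_per_s(800, 70, 8.0))  # dense 70B at 8-bit: ~11 tok/s ceiling
```

So for an MoE like DeepSeek-R1 the bandwidth is arguably fine for single-user generation; it's dense large models (and prompt processing) where the gap to Nvidia shows.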

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • gatienboquet

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 2:51 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  NVIDIA RTX 4090: ~1,008 GB/s

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  NVIDIA RTX 4080: ~717 GB/s

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  AMD Radeon RX 7900 XTX: ~960 GB/s

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  AMD Radeon RX 7900 XT: ~800 GB/s

How's that slow exactly?

You can have 10,000,000 GB/s, but without enough VRAM it's useless.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • ttul

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 3:12 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I have a 4090 and, out of curiosity, I looked up the FLOPS in comparison with Apple chips.

                Nvidia RTX 4090 (Ada Lovelace)

                FP32: Approximately 82.6 TFLOPS

                FP16: When using its 4th-generation Tensor Cores in FP16 mode with FP32 accumulation, it can deliver roughly 165.2 TFLOPS (in non-tensor mode, the FP16 rate is similar to FP32).

                FP8: The Ada architecture introduces support for an FP8 format; using this mode (again with FP32 accumulation), the RTX 4090 can achieve roughly 330.3 TFLOPS (or about 660.6 TOPS, depending on how you count operations).

                Apple M1 Ultra (the previous-generation top-end Apple chip)

                FP32: Around 15.9 TFLOPS (as reported in various benchmarks)

                FP16: By similar scaling, FP16 performance would be roughly double that value, approximately 31.8 TFLOPS (an estimate based on common patterns in Apple's GPU designs)

                FP8: Like the M3 family, the M1 Ultra does not support a dedicated FP8 precision mode.

                So a $2000 Nvidia 4090 gives you about 5x the FLOPS, but with far less high-speed RAM (24GB vs. 512GB from Apple in the new M3 Ultra). The RAM bandwidth on the Nvidia card is over 1 TB/s, compared with 800 GB/s for Apple Silicon.

                Apple is catching up here and I am very keen for them to continue doing so! Anything that knocks Nvidia down a notch is good for humanity.
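A quick sketch of why the bandwidth gap matters more than the 5x FLOPS gap for single-stream LLM decoding. This is napkin math only, using the approximate figures quoted above (~43 FP16 TFLOPS / 800 GB/s for the M3 Ultra, ~165 TFLOPS / ~1008 GB/s for the 4090) and the rule of thumb that each weight streamed contributes about two FLOPs per token:

```python
# Roofline-style sketch: for batch-size-1 LLM decoding, every active weight is
# streamed from memory once per token and contributes ~2 FLOPs (multiply+add),
# so the memory-bandwidth ceiling is hit long before the compute ceiling.
# All figures are rough approximations from the thread, not measured specs.

def decode_ceilings(tflops, bw_gb_s, active_params_b, bytes_per_param):
    """Return (compute-bound tok/s, memory-bound tok/s) upper bounds."""
    flops_per_tok = active_params_b * 1e9 * 2
    bytes_per_tok = active_params_b * 1e9 * bytes_per_param
    return tflops * 1e12 / flops_per_tok, bw_gb_s * 1e9 / bytes_per_tok

# DeepSeek-R1 at ~4-bit quantization (0.5 bytes/weight), 37B active params:
m3_compute, m3_memory = decode_ceilings(43, 800, 37, 0.5)
rtx_compute, rtx_memory = decode_ceilings(165, 1008, 37, 0.5)
# On both chips the memory bound (~43 and ~54 tok/s) sits far below the
# compute bound, so the FLOPS advantage barely shows up in decoding speed.
```

(The 4090 column is hypothetical here, since a ~404GB model cannot fit in 24GB of VRAM; the point is only that both chips would be bandwidth-bound, not compute-bound, at batch size 1.)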

                  • bigyabai

                    yesterday at 7:23 PM

                    > Anything that knocks Nvidia down a notch is good for humanity.

                    I don't love Nvidia a whole lot, but I can't understand where this sentiment comes from. Apple abandoned their partnership with Nvidia, tried to support their own CUDA alternative with blackjack and hookers (OpenCL), abandoned that, and began rolling out a proprietary replacement.

                    CUDA sucks for the average Joe, but Apple abandoned any chance of taking the high road when they cut ties with Khronos. Apple doesn't want better AI infrastructure for humanity; they envy the control Nvidia wields and want it for themselves. Metal versus CUDA is the type of competition where no matter who wins, humanity loses. Bring back OpenCL, then we'll talk about net positives again.

          • whimsicalism

            yesterday at 3:48 PM

            h100 sxm - 3TB/s

            vram is not really the limiting factor for serious actors in this space

              • gatienboquet

                yesterday at 4:09 PM

                If my grandmother had wheels, she'd be a bicycle

      • aurareturn

        yesterday at 4:23 PM

        What's the point of 512GB RAM for LLMs on this Mac Studio if the speed is painfully slow?

        You can fit the entire Deepseek 671B q4 into this computer and get 41 tokens/s because it's an MoE model.
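A rough sanity check on that figure, assuming decoding is purely bandwidth-bound, using the 404GB Q4_K_M file size and 37B-of-671B active fraction mentioned upthread, and the M3 Ultra's advertised ~819 GB/s peak bandwidth:

```python
# Napkin check of the ~41 tok/s claim. Assumptions: the 404GB Q4_K_M file and
# 671B/37B total/active params from upthread, ~819 GB/s peak bandwidth, and a
# memory-bound decode (tok/s <= bandwidth / bytes streamed per token).
total_params = 671e9
active_params = 37e9
file_bytes = 404e9
bandwidth = 819e9                                  # bytes/s

bits_per_weight = file_bytes / total_params * 8    # ~4.8 effective bits
bytes_per_token = active_params * (file_bytes / total_params)
tok_s_ceiling = bandwidth / bytes_per_token        # ~37 tok/s upper bound
```

That lands in the high 30s, the same ballpark as 41 tok/s; the exact number depends on the effective bits-per-weight assumed and on how much of the peak bandwidth decoding actually sustains in practice.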

          • KingOfCoders

            yesterday at 7:09 PM

            Your comments went from

            "40 tokens/s by my calculations"

            to

            "40 tokens/s"

            to

            "41 tokens/s"

            Is there a die involved in "your calculations"?

              • aurareturn

                today at 1:59 AM

                41 was when I learned it has a little over 800GB/s.

                Doesn't matter. It's all theoretical, because no one has publicly tested one.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • chvid

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 2:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Now make a data center version.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • varjag

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 2:22 PM

Call me a unit fundamentalist, but calling 512GB "over half a terabyte of memory" irks me to no end.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • klausa

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 2:24 PM

It's over half a _tera_byte; exactly half a _tebi_byte, if you wanna be a fundamentalist.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • varjag

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 2:32 PM

It is exactly the opposite. Every computer architecture in production addresses memory in powers of two.

SI has no business in memory-size nomenclature, as it is not derived from fundamental physical units. The whole klownbyte change was pushed through by hard-drive marketers in the 1990s.
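For concreteness, a quick sketch of the gap between the two readings at this size:

```python
# SI gigabyte vs. binary gibibyte at 512 "gigs".
GB, GiB = 10**9, 2**30

print(512 * GiB - 512 * GB)     # 37755813888 extra bytes in the binary reading
print(512 * GiB / (512 * GB))   # 1.073741824, i.e. ~7.4% larger
```

The ~7.4% discrepancy grows with each prefix step (kilo ~2.4%, mega ~4.9%, giga ~7.4%, tera ~10%).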

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • umanwizard

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 2:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  > Every computer architecture in production addresses memory in the powers of two.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  What does it mean to "address memory in powers of two" ? There are certainly machines with non-power-of-two memory quantities; 96 GiB is common for example.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  > The whole klownbyte change was pushed through by hard drive marketers in 1990s.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  The metric prefixes based on powers of 10 have been around since the 1790s.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • varjag

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 4:34 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > What does it mean to "address memory in powers of two" ? There are certainly machines with non-power-of-two memory quantities; 96 GiB is common for example.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I challenge you to show me any SKU from any memory manufacturer that has a power of 10 capacity. Or a CPU whose address space is a power of 10. This is an unavoidable artefact of using a binary address bus.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > The metric prefixes based on powers of 10 have been around since the 1790s.

And the Babylonians used powers of 60; what gives?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • kstrauser

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:32 PM

*bibytes are a practical joke played on computer scientists by the salespeople to make it sound like we’re drunk. “Tell us more about your mebibytes, Fred,” *elbows colleague*, “listen to this”.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    If Donald Knuth and Gordon Bell say we use base-2 for RAM, that’s good enough for me.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • jltsiren

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 11:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      It's more complicated than that. Data storage sizes are not connected to fundamental physical units, but data transfer rates are. Things get annoying when a 1 MB/s connection cannot transfer a megabyte in a second.
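A minimal illustration of that mismatch, assuming an exactly 1 MB/s (SI) link:

```python
# Time for a 1 MB/s (10^6 bytes/s) link to move one mebibyte (2^20 bytes).
MB, MiB = 10**6, 2**20

seconds = MiB / MB
print(seconds)   # 1.048576 — about 5% longer than "a second"
```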

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • esafak

        yesterday at 2:42 PM

        Do SSD companies do the same thing? We ought to go back to referring to storage capacity in powers of two.

          • jl6

            yesterday at 2:59 PM

            SSDs have added weirdness like 3-bit TLC cells and overprovisioning. The usable storage size of an SSD is typically not an exact power of 10 or 2.

      • kissiel

        yesterday at 3:02 PM

        You're nitpicking, but then you use lowercase b for a byte ;)
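The powers-of-ten vs powers-of-two gap being nitpicked in this subthread can be sketched numerically; a minimal illustration (the helper name is hypothetical, not from the thread):

```python
# Decimal (SI) vs binary (IEC) storage units: a drive sold as "512 GB"
# holds 512 * 10**9 bytes, but a tool reporting in GiB (2**30 bytes)
# shows a smaller-looking number.
def gb_to_gib(gigabytes: float) -> float:
    """Convert decimal gigabytes (10^9 bytes) to gibibytes (2^30 bytes)."""
    return gigabytes * 10**9 / 2**30

print(f"512 GB = {gb_to_gib(512):.1f} GiB")  # 512 GB = 476.8 GiB
```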

      • transcriptase

        yesterday at 2:26 PM

        Perhaps they’re including the CPU cache and rounding down for brevity.

  • 1attice

    yesterday at 5:36 PM

    Now with Ultra-class backdoors? https://news.ycombinator.com/item?id=43003230

      • saagarjha

        today at 8:05 AM

        It's unlikely that was a backdoor.


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • mythz

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 2:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Ultra disappointing, they waited 2 years just to push out a single gen bump, even my last year's iPad Pro runs M4.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • heeton

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 2:25 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      For AI workflows that's quite a lot cheaper than the alternative in GPUs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • mythz

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 2:34 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Yeah, the VRAM option is good (if it performs well); it's just sad we'd have to drop 10K to access it tied to a prev-gen M3 when they'll likely have the M5 by the end of the year.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Hard to drop that much cash on an outdated chip.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • giancarlostoro

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 2:38 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    At 9 grand I would certainly hope that they support the device software-wise longer than they supported my 2017 MacBook Air. I see no reason to be forced to cough up 10 grand to Apple essentially every 7 years; that's ridiculous.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • ldng

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 9:22 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Well, a shame for Apple; a lot of the rest of the world is going to boycott American products after such a level of treacherousness.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • NorwegianDude

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 2:56 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        The memory amount is fantastic, the memory bandwidth is half decent (~800 GB/s), and the compute capabilities are terrible (36 TOPS).

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        For comparison, a single consumer card like the RTX 5090 has only 32 GB of memory, but 1792 GB/s of memory bandwidth and 3593 TOPS of compute.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        The use cases will be limited. While you can't run a 600B model directly like Apple says (because you need more memory for that), you can run a quantized version, but it will be very slow unless it's a MoE architecture.
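The "slow unless it's MoE" point follows from token generation being roughly memory-bandwidth-bound: each generated token has to stream every *active* weight through memory once, so dense 600B models pay for all 600B parameters per token while an MoE like DeepSeek-R1 only pays for its ~37B activated parameters. A rough sketch of that napkin math (the bandwidth figure and ~0.5 bytes/parameter for 4-bit quantization are illustrative assumptions, not benchmarks):

```python
# Upper bound on decode speed for a bandwidth-bound LLM:
# tok/s <= memory_bandwidth / bytes_read_per_token.

def tokens_per_sec(bandwidth_gbs: float, active_params_b: float,
                   bytes_per_param: float = 0.5) -> float:
    """Ceiling on tokens/sec: every active weight is read once per token.

    bytes_per_param = 0.5 approximates a 4-bit quantization (e.g. Q4).
    """
    gb_per_token = active_params_b * bytes_per_param
    return bandwidth_gbs / gb_per_token

BW = 800  # GB/s, roughly the M3 Ultra figure quoted above

dense_600b = tokens_per_sec(BW, 600)  # all 600B params read per token
moe_37b = tokens_per_sec(BW, 37)      # MoE: only ~37B active per token

print(f"dense 600B ceiling: {dense_600b:.1f} tok/s")  # ~2.7 tok/s
print(f"MoE 37B-active ceiling: {moe_37b:.1f} tok/s")  # ~43 tok/s
```

The ~43 tok/s MoE ceiling is consistent with the 20-30 tok/s estimate upthread once real-world bandwidth utilization is factored in.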

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • dagmx

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 3:09 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            You're comparing two different things.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            The compute figure you’re quoting for the M3 Ultra is the neural engine alone, not including the GPU.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            I expect the GPU here will be behind a 5090 for compute but not by the unrelated numbers you’re quoting. After all, the 5090 alone is multiple times the wattage of this SoC.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • bigyabai

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 3:14 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                > After all, the 5090 alone is multiple times the wattage of this SoC.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                FWIW, normalizing the wattages (or even underclocking the GPU) will still give you an Nvidia advantage most days. Apple's GPU designs are closer to AMD's than to Nvidia's, which means they omit a lot of AI accelerators to focus on raster performance that is less relevant for LLMs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Yes, the GPU is faster than the NPU. But Apple's GPU designs haven't traditionally put their competitors out of a job.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • dagmx

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:24 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    M2 Ultra is ~250W (averaging various reports since Apple don’t publish) for the entire SoC.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    5090 is 575W without the CPU.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    You’d have to cut the Nvidia card’s power to a quarter and then find a comparable CPU to normalize the wattage for an actual comparison.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    I agree that Apple GPUs aren’t putting the dedicated GPU companies in danger on the benchmarks, but they’re also not really targeting it? They’re in completely different zones on too many fronts to really compare.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • bigyabai

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 3:35 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Well, select your hardware of choice and see for yourself then: https://browser.geekbench.com/opencl-benchmarks

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        > but they’re also not really targeting it?

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        That's fine, but it's not an excuse to ignore the power/performance ratio.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • dagmx

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 3:44 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            But I’m not ignoring the power/performance ratio? If anything, you are doing that by handwaving away the difference.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Give me a comparable system build where the NVIDIA GPU + any CPU of your choice is running at the same wattage as an M2 Ultra, and outperforms it on average. You’d get 150W for the GPU and 150W for the CPU.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Again, you can’t really compare the two. They’re inherently different systems unless you only care about singular metrics.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • NorwegianDude

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 3:46 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  No, I'm not. I'm comparing the TOPS of the M3 Ultra and the tensor cores of the RTX 5090.

If not, what is the TOPS figure for the GPU, and why isn't Apple talking about it if there is more performance hidden somewhere? Apple states 18 TOPS for the M3 Max. And why do you think Apple added the Neural Engine, if not to accelerate compute?

The power draw is quite a bit higher, but it's still far more efficient, because the performance is so much higher.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • dagmx

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 3:53 PM

The ANE and tensor cores are not comparable though. One is meant for low-cost inference, while the other is meant for accelerating training.

If you squint they look the same, but so do the microcontroller on a GPU and a full-blown CPU. They serve fundamentally different purposes, with different architectures and scales of use.

The ANE can't even really be used directly. Apple heavily restricts access to it, via the CoreML APIs, to inference only. It's only usable for smaller, lightweight models.

If you're comparing to the tensor cores, you really need to compare against the GPU, which is what Apple's ML frameworks (such as MLX) use for training etc.

It will still be behind the NVIDIA GPU, but not by anywhere near the same margin.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • NorwegianDude

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 4:47 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          So now the TOPS are not comparable because M3 is much slower than an Nvidia GPU? That's not how comparisons work.

My numbers are correct: the M3 Ultra has around 1% of the TOPS performance of an RTX 5090.

Comparing against the GPU would look even worse for Apple. Do you think Apple added the Neural Engine just for fun? This is exactly what the Neural Engine is there for.
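For what it's worth, the "around 1%" figure roughly checks out as back-of-envelope arithmetic, under two assumptions not stated in the thread: that the M3 Ultra's ANE is simply two M3 Max engines (2 × the 18 TOPS quoted above), and that the RTX 5090 comparison uses NVIDIA's advertised ~3,352 "AI TOPS" marketing number (a sparse, low-precision figure):

```python
# Back-of-envelope check of the "~1%" claim, using the thread's numbers.
# Assumed figures (not verified against spec sheets here):
#   - M3 Ultra ANE ≈ 2 × 18 TOPS (two fused M3 Max dies)
#   - RTX 5090 ≈ 3,352 "AI TOPS" (NVIDIA's advertised sparse figure)
m3_ultra_tops = 2 * 18
rtx_5090_tops = 3352

ratio = m3_ultra_tops / rtx_5090_tops
print(f"{ratio:.1%}")  # → 1.1%
```

Note the two numbers are measured at different precisions and sparsity settings, so this is an apples-to-oranges ratio at best.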

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • dagmx

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 4:58 PM

You're completely missing the point. The ANE is not equivalent as a component to the tensor cores. It has nothing to do with a comparison of TOPS, but with what they're intended for.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Try and use the ANE in the same way you would use the tensor cores. Hint: you can’t, because the hardware and software will actively block you.

They're meant for fundamentally different use cases and power loads. Even Apple's own ML frameworks don't use the ANE for anything except inference.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • llm_nerd

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 4:20 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            >The ANE and tensor cores are not comparable though

They're both built to do the most common computation in AI (both training and inference): the multiply-accumulate of matrices, A * B + C. The ANE is far more limited because Apple decided to spend a lot less silicon on it, focusing on low-power inference of quantized models. It is fantastically useful for on-device things like many of the photo features (e.g. subject detection, text extraction, etc.).
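The operation being described is just A · B + C over matrices; a minimal NumPy sketch of it, purely for illustration (this is not how the ANE or tensor cores are actually programmed):

```python
import numpy as np

# The fused multiply-accumulate at the heart of both tensor cores and the
# ANE: D = A @ B + C. Shown here in float32; real accelerators run this
# over tiles, usually in low-precision formats such as FP16 or INT8.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8)).astype(np.float32)
B = rng.standard_normal((8, 4)).astype(np.float32)
C = rng.standard_normal((4, 4)).astype(np.float32)

D = A @ B + C   # one MAC step; accelerators chain millions of these
print(D.shape)  # → (4, 4)
```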

And yes, you need to use CoreML to access it because it's so limited. In the future Apple will absolutely, with 100% certainty, make an ANE that is as flexible and powerful as tensor cores, and the reason they force you through CoreML is that it will automatically switch to using it: today you submit a job to CoreML and, for many models, it will opt to use the CPU/GPU instead, or a combination thereof. It's an elegant, forward-thinking design. Their AI performance and credibility will greatly improve when they do.
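The dispatch behaviour being described (the runtime, not the caller, picks where a job runs) can be caricatured as a scheduler that routes a graph to the most power-efficient backend that supports all of its ops. A toy Python sketch, emphatically not CoreML's real algorithm; the op names and eligibility rules are made up for illustration:

```python
# Toy model of compute-unit dispatch: route to the ANE only when every op
# in the graph is supported there and the model is quantized; otherwise
# fall back to GPU, then CPU. All rules here are hypothetical.
def dispatch(op_types, quantized):
    ANE_SUPPORTED = {"conv", "matmul", "relu"}   # hypothetical subset
    if quantized and set(op_types) <= ANE_SUPPORTED:
        return "ANE"        # cheap, low-power path for small supported models
    if "matmul" in op_types:
        return "GPU"        # heavier or partially unsupported graphs
    return "CPU"

print(dispatch(["conv", "relu"], quantized=True))         # → ANE
print(dispatch(["matmul", "custom_op"], quantized=True))  # → GPU
print(dispatch(["custom_op"], quantized=False))           # → CPU
```

The point of the design is that the same CoreML submission keeps working as Apple's hardware changes; the runtime just starts routing more jobs to the ANE.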

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            >you really need to compare against the GPU

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            From a raw performance perspective, the ANE is capable of more matrix multiply/accumulates than the GPU is on Apple Silicon, it's just limited to types and contexts that make it unsuitable for training, or even for many inference tasks.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • llm_nerd

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 3:19 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Using the NPU numbers grossly overstates the AI performance of the Apple Silicon hardware, so they're actually giving Apple the benefit of the doubt.

Most AI training and inference (including generative AI) is bound by large-scale matrix MACs. That's why nvidia fills their devices with enormous numbers of tensor cores and Apple / Qualcomm et al are adding NPUs, filling largely the same gap. Except nvidia's are not only an order of magnitude (or more) more performant, they're also massively more flexible (in supported types and applications) and usable for both training and inference, while Apple's is useful only for a limited set of inference tasks (due to architecture and type limits).

Apple could put the effort in and make something actually competitive with nvidia, but this isn't it.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • dagmx

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 3:48 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Care to share the TOPs numbers for the Apple GPUs and show how this would “grossly overstate” the numbers?

Apple won’t compete with NVIDIA, I’m not arguing that. But your opening line only makes sense if you can back up the numbers and show that the GPU performance is lower than the ANE TOPS.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • llm_nerd

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 4:01 PM

Tensor / neural cores are very easy to benchmark and yield a precise number because they do a single well-defined thing at a large scale. GPU numbers, by contrast, are less commonly published and much more use-specific.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              However the M2 Ultra GPU is estimated, with every bit of compute power working together, at about 26 TOPS.
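As a back-of-the-envelope check on a figure in that range, one can multiply the commonly cited (assumed, not Apple-confirmed) M2 Ultra GPU specs: 76 cores, 128 FP32 ALUs per core, roughly 1.4 GHz, and 2 FLOPs per fused multiply-add:

```python
# Napkin math for peak GPU throughput, using assumed M2 Ultra figures.
cores = 76            # GPU cores (top-spec M2 Ultra)
alus_per_core = 128   # FP32 ALUs per core (assumed)
clock_hz = 1.4e9      # ~1.4 GHz (assumed)
flops_per_fma = 2     # a fused multiply-add counts as 2 FLOPs

tflops = cores * alus_per_core * clock_hz * flops_per_fma / 1e12
print(round(tflops, 1))  # prints 27.2
```

That lands at roughly 27 TFLOPS FP32, in the same ballpark as the ~26 figure above (the TFLOPS-vs-TOPS unit caveat raised below still applies).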

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • dagmx

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 4:04 PM

Could you provide a link for that TOPS count? (And specifically TOPS with comparable unit sizes, since NVIDIA and Apple did not use the same units until recently.)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  The only similar number I can find is for TFLOPS vs TOPS

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Again I’m not saying the GPU will be comparable to an NVIDIA one, but that the comparison point isn’t sensible in the comments I originally replied to.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  • Havoc

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    yesterday at 3:00 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    >36 Tops

That's going to be the NPU specifically. Pretty much nothing on the LLM front seems to use NPUs at this stage (Copilot Snapdragon laptops aside), so I'm not sure the low number is a problem.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • llm_nerd

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 3:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I do think people are going a little overboard with all the commentary about AI in this discussion, and you rightly cite some of the empirical reasons. People are trying to rationalize convincing themselves to buy one of these, but they're deluding themselves.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      It's nice that these devices have loads of memory, but they don't have remotely the necessary level of compute to be competitive in the AI space. As a fun thing to run a local LLM as a hobbyist, sure, but this presents zero threat to nvidia.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Apple hardware is irrelevant in the AI space, outside of making YouTube "I ran a quantized LLM on my 128GB Mac Mini" type content for clicks, and this release doesn't change that.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Looks like a great desktop chip though.

It would be nice if nvidia could start giving their less expensive offerings more memory, though they're currently in the realm Intel was 15 years ago, thinking that their biggest competition is themselves.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      • BonoboIO

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        yesterday at 3:00 PM

A factor of 100 faster in compute. Wow.

                                                                                                        It will be interesting when somebody upgrades the RAM of the 5090 like they did with the 4090s.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • bilbo0s

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 3:31 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            They’re a bit confused and not comparing the same compute.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Pretty sure they’re comparing Nvidia’s gpu to Apple’s npu.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              • NorwegianDude

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                yesterday at 4:35 PM

                                                                                                                I'm not confused at all. Those are the real numbers. Feel free to provide anything that suggests the TOPS of the GPU in M chips is higher than that of the dedicated hardware. But you can't, because it's not true. If you think Apple added the Neural Engine just for fun, then I don't know what to tell you.

                                                                                                                You have a fundamental flaw in your understanding of how both chips work. Not using the tensor cores would be slower, and the same goes for Apple's Neural Engine. Both numbers are for the dedicated hardware each vendor has implemented for maximum performance on this task.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • moondev

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 2:39 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > support for more than half a terabyte of unified memory — the most ever in a personal computer

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      AMD Ryzen Threadripper PRO 3995WX released over four years ago and supports 2TB (64c/128t)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      > Take your workstation's performance to the next level with the AMD Ryzen Threadripper PRO 3995WX 2.7 GHz 64-Core sWRX8 Processor. Built using the 7nm Zen Core architecture with the sWRX8 socket, this processor is designed to deliver exceptional performance for professionals such as artists, architects, engineers, and data scientists. Featuring 64 cores and 128 threads with a 2.7 GHz base clock frequency, a 4.2 GHz boost frequency, and 256MB of L3 cache, this processor significantly reduces rendering times for 8K videos, high-resolution photos, and 3D models. The Ryzen Threadripper PRO supports up to 128 PCI Express 4.0 lanes for high-speed throughput to compatible devices. It also supports up to 2TB of eight-channel ECC DDR4 memory at 3200 MHz to help efficiently run and multitask demanding applications.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • Shank

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 2:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          > unified memory

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          So unified memory means that the memory is accessible to the GPU and the CPU in a shared pool. AMD does not have that.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • aaronmdjones

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          yesterday at 3:04 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          > It also supports up to 2TB of eight-channel ECC DDR4 memory at 3200 MHz (sic) to help efficiently run and multitask demanding applications.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          8 channels at 3200 MT/s (1600 MHz) is only 204.8 GB/sec; less than a quarter of what the M3 Ultra can do. It's also not GPU-addressable, meaning it's not actually unified memory at all.
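                                                                                                          The napkin math behind that figure can be sketched as follows (the M3 Ultra bandwidth is Apple's quoted 819 GB/s; the DDR4 numbers follow from 8 channels of 64-bit DDR4-3200):

```python
# Peak bandwidth of 8-channel DDR4-3200: each channel moves 3200 MT/s,
# 8 bytes (64 bits) per transfer.
CHANNELS = 8
TRANSFERS_PER_S = 3200e6   # 3200 MT/s (clock is 1600 MHz, double data rate)
BYTES_PER_TRANSFER = 8     # 64-bit channel width

ddr4_gbps = CHANNELS * TRANSFERS_PER_S * BYTES_PER_TRANSFER / 1e9
print(f"8-channel DDR4-3200: {ddr4_gbps:.1f} GB/s")  # 204.8 GB/s

M3_ULTRA_GBPS = 819  # Apple's quoted figure for the M3 Ultra
print(f"M3 Ultra advantage: {M3_ULTRA_GBPS / ddr4_gbps:.1f}x")  # ~4.0x
```

                                                                                                          So the Threadripper platform's peak is indeed right around a quarter of the M3 Ultra's quoted bandwidth.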

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          • JamesSwift

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            yesterday at 2:42 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            > unified memory

                                                                                                            It's a very specific claim that isn't comparing itself to DIMMs.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            • lowercased

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              yesterday at 2:41 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              I don't think that's "unified memory" though.


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                • ryao

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  yesterday at 2:40 PM

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  I suspect that they do not consider workstations to be personal computers.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    • agloe_dreams

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      yesterday at 2:55 PM

                                                                                                                      No, the comment misunderstood the difference between CPU memory and unified memory. This can dedicate 500GB of high-bandwidth memory to the GPU, roughly 3.5x that of an H200.
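                                                                                                                      That multiple checks out against the H200's published 141 GB of HBM3e (the ~500 GB figure assumes most of the 512 GB unified pool can be handed to the GPU):

```python
# GPU-addressable memory comparison. 141 GB is the published H200 HBM3e
# capacity; 500 GB is an assumed GPU allocation out of the 512 GB
# unified pool, per the comment above.
m3_ultra_gpu_gb = 500
h200_gb = 141

print(f"{m3_ultra_gpu_gb / h200_gb:.1f}x")  # ~3.5x
```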