bluegatty
today at 4:31 AM
You wasted all of your commentary on snark and sadly unfunny humour, and yet still managed to add nothing.
Groq is more performant for the growing categories of inference-based tasks. Nvidia's advantage in inference depends on bulk/batch processing, which will make up a smaller category over time, in relative terms.
The future of AI silicon is inference, and the cost structure of AI data centres is constrained by the current necessity of 'high GPU utilization'; otherwise, the cost/amortization of the chips doesn't work out.
That cost structure is a limitation of Nvidia's architecture.
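To make the amortization point concrete, here is a rough sketch of how the hardware cost per token scales with utilization. The function name and every figure in it (hourly cost, peak throughput) are made-up assumptions for illustration, not measurements of any Nvidia or Groq system.

```python
def cost_per_million_tokens(hourly_cost_usd, peak_tokens_per_sec, utilization):
    """Amortized hardware cost (USD) per million tokens at a given utilization.

    utilization is the fraction of peak throughput actually served, in (0, 1].
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = peak_tokens_per_sec * utilization * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical figures, for illustration only.
HOURLY_COST_USD = 4.0         # assumed amortized cost of one accelerator per hour
PEAK_TOKENS_PER_SEC = 2000.0  # assumed peak batched throughput

for u in (0.9, 0.5, 0.1):
    cost = cost_per_million_tokens(HOURLY_COST_USD, PEAK_TOKENS_PER_SEC, u)
    print(f"utilization {u:.0%}: ${cost:.2f} per 1M tokens")
```

Under this simple model the cost per token is inversely proportional to utilization, so a chip that sits at 10% utilization costs ten times as much per token as the same chip kept fully busy.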
Groq serves a lot faster, and without the limiting batching requirement, which opens up the arrangements common in most classical hosting scenarios, i.e. without necessarily needing high utilization.
Groq has bespoke hardware, no CUDA, and obviously much lower memory density, and they don't have the deep distribution networks and leverage over TSMC that Nvidia has. But pound for pound, were we able to 'fire up a server' for our inference needs, it would be Groq, not Nvidia, that we'd turn to.
Were they not a late market entrant facing those barriers to entry, they'd be gigantic.
Is Groq still using 6 racks to serve Llama3-70B, or is that old news?