The point is not that tokenization is irrelevant; it's that the transformer architecture _requires_ information-dense inputs, which it gets by compressing the input space from raw characters to subwords. Give it something like raw audio or raw video frames, and its capabilities bottom out dramatically. That's why even today's SOTA transformer models heavily preprocess media inputs, even going as far as lightweight frame-importance sampling to extract the "best" parts of a video.
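To make the frame-importance idea concrete, here's a minimal sketch using one simple heuristic (my own toy example, not any particular model's pipeline): score each frame by how much it changed from the previous one, then keep only the top-k highest-scoring frames. Real systems use far more sophisticated scorers, but the shape of the trick is the same.

```python
import numpy as np

def sample_important_frames(frames: np.ndarray, k: int) -> np.ndarray:
    """Keep the k frames that differ most from their predecessor.

    frames: (num_frames, height, width, channels) uint8 array.
    A crude proxy for "importance": mean absolute pixel change.
    Returns the selected frames in their original temporal order.
    """
    # Per-frame score: how much did the picture change since the last frame?
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0)).mean(axis=(1, 2, 3))
    scores = np.concatenate([[np.inf], diffs])  # always keep the first frame
    # Indices of the k highest-scoring frames, restored to temporal order.
    top = np.sort(np.argsort(scores)[-k:])
    return frames[top]

# Toy usage: 120 random "frames", keep the 8 most informative.
video = np.random.randint(0, 256, size=(120, 64, 64, 3), dtype=np.uint8)
kept = sample_important_frames(video, k=8)
print(kept.shape)  # (8, 64, 64, 3)
```

A 120-frame clip becomes 8 dense samples before the model ever sees a pixel: compression doing the work the architecture can't.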
In the future, all of these tricks may seem quaint. "Why don't you just pass the raw bits of the camera feed straight to the model layers?" we might ask.