
SepLLM: Accelerate LLMs by Compressing One Segment into One Separator

32 points - last Monday at 1:27 PM

  • kevmo314

    today at 4:05 AM

    This paper seems like it misses the forest for the trees. The analysis is certainly interesting and the proposal sounds viable, sort of like a sliding window attention with a little more history.

    But if it is true that the separators contribute the most to the attention scores, wouldn't that imply that the tokenization scheme can be improved? Introducing a compression scheme seems like patching around that problem, rather than having the model naturally produce a more random attention distribution.

    • xp84

      today at 5:30 AM

      Or, put another way:

      "Why waste time say lot token when few token do trick?"

      -Kevin Malone