
High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction

14 points - last Sunday at 11:35 AM

  • vivahir215

    last Sunday at 11:50 AM

    Interesting approach. Curious about the latency tradeoff: OLS + SVD are much heavier than Top-K. Have you benchmarked end-to-end inference latency?
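The NumPy sketch below makes the cost gap in the question concrete. It is a minimal illustration built on several assumptions and is not the method from the linked article. `top_k_select`, `low_rank_summary` and `best_time` are hypothetical helpers. The dot-product scoring rule, the way OLS fits summary values onto the SVD coordinates, and the cache sizes are all illustrative guesses. Top-K here is one matrix-vector product plus an O(n) partial sort. A thin SVD of an n×d key matrix costs O(n·d·min(n, d)) before the least-squares solve even starts.

```python
import time

import numpy as np


def top_k_select(keys: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k cached tokens with the largest query-key score.

    Cost: one (n, d) @ (d,) product plus an O(n) partial sort.
    """
    scores = keys @ query                       # (n,) dot-product relevance scores
    idx = np.argpartition(scores, -k)[-k:]      # k largest scores, unordered
    return np.sort(idx)                         # restore original token order


def low_rank_summary(keys: np.ndarray, values: np.ndarray, r: int):
    """Hypothetical SVD + OLS summary of an (n, d) KV cache down to r slots.

    Thin SVD costs O(n * d * min(n, d)); the least-squares fit adds O(n * r^2).
    """
    u, s, vt = np.linalg.svd(keys, full_matrices=False)
    coeffs = u[:, :r] * s[:r]                   # (n, r) token coordinates in the rank-r basis
    summary_keys = vt[:r]                       # (r, d) basis rows used as compressed keys
    # OLS: choose (r, d_v) summary values W minimizing ||coeffs @ W - values||_F.
    summary_values, *_ = np.linalg.lstsq(coeffs, values, rcond=None)
    return summary_keys, summary_values, coeffs


def best_time(fn, *args, repeats: int = 5) -> float:
    """Best-of-N wall-clock time in seconds for fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k, r = 4096, 128, 256, 32             # arbitrary illustrative sizes
    keys = rng.standard_normal((n, d))
    values = rng.standard_normal((n, d))
    query = rng.standard_normal(d)
    print(f"top-k select: {best_time(top_k_select, keys, query, k) * 1e3:.2f} ms")
    print(f"SVD + OLS:    {best_time(low_rank_summary, keys, values, r) * 1e3:.2f} ms")
```

Absolute numbers depend on hardware and the BLAS build, so the sketch only shows where the cost comes from. An end-to-end answer would still require measuring decode latency with the summarization step running inside the generation loop.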

      • jchandra

        last Sunday at 11:57 AM

        [dead]

    • jchandra

      last Sunday at 11:36 AM

      [dead]