Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA

22 points - today at 7:38 PM

Source
  • yu3zhou4

    today at 8:39 PM

    The README is, in my opinion (author here), the most interesting part. I wrote it to help others build a useful mental model, so that you can recreate the project yourself without even needing to read my code.

    • nazgulsenpai

      today at 8:41 PM

      I love the documentation formatted in lessons. I can't wait to read through it.