Show HN: Tiny-vLLM – high-performance LLM inference engine in C++ and CUDA
22 points - today at 7:38 PM
Author here. The README is, in my opinion, the most interesting part: I wrote it to help others build a useful mental model so they can recreate the project themselves, without even needing to read my code.
nazgulsenpai
today at 8:41 PM
I love the documentation formatted in lessons. I can't wait to read through it.