An LLM inference engine written from scratch in C++ and CUDA, without libraries: https://andrewkchan.dev/posts/yalm.html