Going deep on the layer below the model: LLM serving engines, KV-cache and attention internals, and GPU kernels, all built from scratch.
- jvoltci.github.io: the climb, and the log
- Mosaic: my open course on AI systems, ML compilers, and inference (7 tracks)
- LinkedIn
- Currently building: a from-scratch LLM inference engine (mini-vLLM). Benchmarks soon.





