fate (@madhav1)
2025-11-17 | ❤️ 610 | 🔁 37
Day 6 of CUDA programming
- vectorized SMEM GEMM with 2D blocktiling on H100
- throughput: ~26 TFLOPs (~3.6% of cuBLAS) https://x.com/madhav1/status/1990256457057747152/photo/1

Quoted tweet
fate (@madhav1)
Day 5 of CUDA programming
- SMEM GEMM with 2D blocktiling on H100
- throughput: ~25 TFLOPs (~3.5% of cuBLAS) https://t.co/WwuZUd60Wq
