How to optimize a CUDA matmul kernel for cuBLAS-like performance (2022)

103 points | by mpweiher a month ago

33 comments