CUDA-L2: Surpassing cuBLAS performance for matrix multiplication through RL

95 points | by dzign 9 hours ago

9 comments