This is super cool! Especially matrix mult getting similar or better perf than cuBLAS! If anyone is interested in other kernels like swiglu, geglu, RMS layernorm, I coded some at https://github.com/unslothai/unsloth/tree/main/unsloth/kerne...
I’m working on an inference platform that allows tokens to be appended to the context after some tokens have already been generated. If there are other sequences in the batch, they’ll have to be padded. Currently this means I can’t use FlashAttention, because it doesn’t support arbitrary masks/padding masks… can ThunderKittens help me?
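For concreteness, here’s the kind of mask I need. A minimal sketch in plain PyTorch (shapes and lengths made up), using SDPA’s arbitrary boolean mask, which is exactly what knocks me off the flash path today:

```python
import torch
import torch.nn.functional as F

# Toy shapes: batch of 2, 8 heads, max length 16, head dim 64.
B, H, L, D = 2, 8, 16, 64
q = torch.randn(B, H, L, D, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Sequence 0 uses the full 16 tokens; sequence 1 got padded up to 16
# after tokens were appended to the other sequence in the batch.
valid_lens = torch.tensor([16, 12], device="cuda")
key_is_valid = torch.arange(L, device="cuda")[None, :] < valid_lens[:, None]  # (B, L)
attn_mask = key_is_valid[:, None, None, :]  # broadcasts over heads and queries

# With an arbitrary boolean mask, PyTorch routes around the flash kernel
# to a backend that supports masking.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
```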
CUDA + ThunderKittens 4.5 hour tutorial
https://www.youtube.com/watch?v=xcpEl0cGCC4
so, these are hand-optimized primitives for specific models of NVIDIA GPUs? do you still have to make launch/scheduling decisions to maximize occupancy? how does this approach scale to other target devices with specialized instruction sets and different architectures?
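by launch/scheduling decisions I mean things like sizing the grid off the SM count. a hedged sketch in plain PyTorch (nothing ThunderKittens-specific, and the multiplier is made up):

```python
import torch

props = torch.cuda.get_device_properties(0)
num_sms = props.multi_processor_count
# Illustrative heuristic only: oversubscribe each SM a little so the
# hardware scheduler can hide memory latency. The right multiplier is
# workload- and kernel-dependent.
blocks_per_sm = 4
grid_size = num_sms * blocks_per_sm
print(f"{props.name}: {num_sms} SMs -> launch {grid_size} blocks")
```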
How easy is it to run on older GPUs (think 1080 Tis)? I ask because torch.compile refuses to support them, and that alone makes things much slower.
The other issue is that Pascal cards don't have tensor cores, so they're much slower than cards that do. You could try Unsloth for 2x faster Llama fine-tuning - someone got P40s and P100s working. Although I would suggest upgrading to at least the RTX 20-series.
The project is very much focused on maxing out the tensor cores, and since older GPUs don’t have them, it’s not where the project shines.
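e.g. a quick way to check whether a card has them at all (tensor cores arrived with Volta, compute capability 7.0; a 1080 Ti is Pascal, 6.1):

```python
import torch

major, minor = torch.cuda.get_device_capability(0)
# Tensor cores first shipped with Volta (compute capability 7.0);
# a 1080 Ti reports 6.1, so this is False there.
has_tensor_cores = (major, minor) >= (7, 0)
print(f"compute capability {major}.{minor}, tensor cores: {has_tensor_cores}")
```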
> torch.compile
torch.compile is a PyTorch 2.0 feature and has nothing to do with handwritten CUDA kernels
> How easy is it to run on older GPUs
this is a torch C++ extension:
https://github.com/HazyResearch/ThunderKittens/blob/8daffc9c...
so you're going to have the exact same issue (whatever issue you're having)
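i.e. it gets JIT-built with nvcc like any other torch extension. roughly this shape (file/module names here are made up for illustration; ThunderKittens' real build is in the repo linked above):

```python
from torch.utils.cpp_extension import load

# Illustrative names only. The key point: nvcc compiles for whatever
# target arch you pass, entirely independent of torch.compile.
ext = load(
    name="my_kernel_ext",                # hypothetical module name
    sources=["my_kernel.cu"],            # hypothetical CUDA source
    extra_cuda_cflags=["-arch=sm_61"],   # e.g. Pascal / 1080 Ti
)
```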
hi! We're the devs - we're planning the livestream for 1pm, and we'll post the link here, on Twitter, and in the Discord tonight
I hate to be that guy, but Metal support?
coming!
I don't want to use the Platform Formerly Known as Twitter, but does anyone have a way to get the link to their livestream tomorrow?
Simran Arora: "Join us for a livestream this Thursday, Halloween/Diwali, and join our channel on the GPU Mode Discord server to hang out with us/get involved:"
https://discord.com/login?redirect_to=%2Fchannels%2F11894982...
Livestream link (come ask questions!): https://youtube.com/live/IAwLzkldxUk?feature=share
Thanks!