VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO

231 points | by timhigins 10 hours ago

99 comments