It’s interesting how little press MiniMax M3 gets, given it outperforms DeepSeek V4 Pro, previously the SOTA for open models. Meanwhile GLM has been in the news daily.
It is strange, huh? But the hype cycles around these models often ignore good contenders. Xiaomi's MiMo-V2.5 Pro was doing really well and didn't get much hype either.
I wonder if multiple attempts at the opossum would produce better results.
If we didn’t have the previous example, I would interpret this as pretty solid evidence that labs were training on the Pelican “benchmark”.
I just can’t imagine a model dropping so significantly from one version to the next on such a silly task.
Related:
GLM-5.2 is the new leading open weights model on Artificial Analysis
https://news.ycombinator.com/item?id=48567759