Why averaging LLM benchmark scores is fundamentally broken

1 point | by testofschool 6 hours ago

1 comment