Show HN: Agent-skills-eval – Test whether Agent Skills improve outputs

20 points | by darkrishabh 4 hours ago

3 comments

ssgodderidge 14 minutes ago
The example model in the documentation is 4o-mini; you might want to update that to a more recent model.
As an aside, 4o-mini came out months before Agent Skills were released… I'm curious how well it decides whether to load skills in the first place.
ianhxu 10 minutes ago
How do you iterate on the judge prompt? Is there an auto rater?
egeozcan an hour ago
Are there any published results gathered using this?