20 points | by darkrishabh 4 hours ago
3 comments
The example model in the documentation is 4o-mini; you might want to update that to a more recent model.
As an aside, 4o-mini came out months before agent skills were released… I’m curious how well it does at choosing to load skills in the first place.
How do you iterate on the judge prompt? Is there an auto rater?
Are there any published results gathered using this?