DeepSeek: Inference-Time Scaling for Generalist Reward Modeling

163 points | by tim_sw 7 days ago

36 comments