Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers

106 points | by matt_d 10 hours ago

82 comments