
Analysis Questions SWE-Bench Improvements vs Real-World Merge Rates

Paper | Large Models | 2026-03-13 09:44:21

Summary

A critical analysis examines whether improvements on the SWE-bench benchmark translate into higher merge rates in production, and suggests a disconnect between benchmark performance and real-world utility.

Impact Analysis

This highlights growing concern about benchmark gaming and the need for realistic evaluation metrics. Organizations should complement benchmark scores with internal testing. The analysis may also drive new benchmarks designed to correlate with production outcomes.
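
One way an organization might complement benchmark scores with internal testing is to track what fraction of model-generated pull requests actually get merged, then check whether models rank the same way by merge rate as by benchmark score. The sketch below is a minimal illustration only: the record format, the model names, and every number are hypothetical placeholders, not data from the analysis.

```python
from collections import defaultdict


def merge_rates(pr_records):
    """Return {model: merged / total} from (model, was_merged) records."""
    totals = defaultdict(int)
    merged = defaultdict(int)
    for model, was_merged in pr_records:
        totals[model] += 1
        if was_merged:
            merged[model] += 1
    return {model: merged[model] / totals[model] for model in totals}


def same_ranking(benchmark_scores, rates):
    """True if models sort in the same order by benchmark score and by merge rate."""
    models = sorted(set(benchmark_scores) & set(rates))
    by_score = sorted(models, key=lambda m: benchmark_scores[m], reverse=True)
    by_rate = sorted(models, key=lambda m: rates[m], reverse=True)
    return by_score == by_rate


# Hypothetical internal review log: (model, was the pull request merged?)
records = [
    ("model_a", True), ("model_a", False), ("model_a", False), ("model_a", False),
    ("model_b", True), ("model_b", True), ("model_b", False), ("model_b", False),
]
# Hypothetical benchmark scores (fraction of benchmark tasks resolved)
scores = {"model_a": 0.60, "model_b": 0.45}

rates = merge_rates(records)
print(rates)
print(same_ranking(scores, rates))
```

In this made-up data the model with the higher benchmark score has the lower merge rate, so the two rankings disagree. A real comparison would need far more pull requests per model and a consistent definition of "merged".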

Source