
benchmarks safety · New
SWE-Bench
Benchmark evaluating LLMs on resolving real GitHub software issues.
weight 0.0 · Open Source · Launched 2026-06-07
💸 No earnings reported yet
What it is
SWE-bench is a benchmark that tests language models on resolving real-world software engineering issues pulled from GitHub repositories. It is widely treated as the de facto standard for measuring coding-agent capability.
How AI plugs in
Models are evaluated by resolving real GitHub issues end to end: each generated patch is applied to the repository and scored on whether the repository's own tests pass.
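The pass/fail scoring described above can be sketched roughly as follows. This is a minimal illustration, not the official evaluation harness: the function names `is_resolved` and `resolved_rate` and the shape of the test-status dictionary are assumptions made for this example. The split into "fail to pass" tests (those the fix should turn green) and "pass to pass" tests (those that must stay green) follows the field names used in the SWE-bench dataset.

```python
from typing import Dict, List


def is_resolved(
    test_status: Dict[str, bool],
    fail_to_pass: List[str],
    pass_to_pass: List[str],
) -> bool:
    """Return True if a patched repository counts as resolved.

    test_status maps a test name to True (passed) or False (failed)
    after the model's patch was applied. This dictionary shape is an
    assumption for the sketch, not the harness's real report format.
    """
    # An instance with no target tests cannot demonstrate a fix.
    if not fail_to_pass:
        return False
    required = list(fail_to_pass) + list(pass_to_pass)
    # A test missing from the report is treated as a failure.
    return all(test_status.get(name, False) for name in required)


def resolved_rate(outcomes: List[bool]) -> float:
    """Fraction of instances resolved; 0.0 for an empty list."""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


# Hypothetical results for two issues.
first = is_resolved(
    {"test_fix": True, "test_existing": True},
    fail_to_pass=["test_fix"],
    pass_to_pass=["test_existing"],
)
second = is_resolved(
    {"test_fix": True, "test_existing": False},  # patch broke an old test
    fail_to_pass=["test_fix"],
    pass_to_pass=["test_existing"],
)
print(resolved_rate([first, second]))  # prints 0.5
```

Scoring is all-or-nothing per issue in this sketch: a patch that fixes the target behavior but breaks a previously passing test does not count as resolved.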
