AERIOXFLUX

SWE-Bench

Benchmark evaluating LLMs on resolving real GitHub software issues.

weight 0.0 | Open Source | Launched 2026-06-07


What it is

SWE-bench is a benchmark that tests language models on resolving real-world software engineering issues pulled from GitHub repositories. It is widely treated as the de facto standard for measuring coding-agent capability.

How AI plugs in

SWE-bench evaluates models by having them resolve real GitHub issues end to end: the model generates a patch, and the score reflects whether that patch passes the repository's own tests.
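
The pass/fail scoring idea can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's actual harness: the function names, the test names, and the split into `fail_to_pass` (tests the fix should turn green) and `pass_to_pass` (tests that must keep passing) are assumptions made for the example, and the sample results are invented.

```python
from typing import Dict, Iterable, List


def is_resolved(test_results: Dict[str, bool],
                fail_to_pass: Iterable[str],
                pass_to_pass: Iterable[str]) -> bool:
    """Return True if every required test passed after the patch was applied.

    Assumption: a test missing from test_results counts as a failure.
    """
    required = list(fail_to_pass) + list(pass_to_pass)
    return all(test_results.get(name, False) for name in required)


def resolve_rate(outcomes: List[bool]) -> float:
    """Fraction of issues resolved; 0.0 for an empty list."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


# Hypothetical results for two issues (illustrative data, not real benchmark output).
issue_a = is_resolved(
    {"test_bug_fixed": True, "test_existing_behavior": True},
    fail_to_pass=["test_bug_fixed"],
    pass_to_pass=["test_existing_behavior"],
)
issue_b = is_resolved(
    # The patch fixes the bug but breaks a previously passing test.
    {"test_bug_fixed": True, "test_existing_behavior": False},
    fail_to_pass=["test_bug_fixed"],
    pass_to_pass=["test_existing_behavior"],
)

print(resolve_rate([issue_a, issue_b]))  # 0.5
```

Under this sketch, a patch that fixes the reported bug but breaks existing behavior does not count as resolved, which is why both groups of tests are checked together.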

