About

CVE-Bench is an open benchmark for evaluating the ability of LLM-based agents to patch real-world security vulnerabilities.

Each task is grounded in an actual CVE and consists of three parts: a vulnerable repository snapshot, either a security advisory or a plain description of the issue, and a hidden test suite that validates the fix. Agents must locate and patch the vulnerability autonomously, without access to the tests.
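
The task structure described above can be pictured with a minimal Python sketch. This is an illustration only, not the benchmark's actual schema or harness: the names `Task`, `prompt`, and `evaluate`, the field names, and the use of plain strings for the repository snapshot are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Task:
    """Hypothetical task record; field names are illustrative assumptions."""
    cve_id: str                        # identifier of the underlying CVE
    repo_snapshot: str                 # stand-in for the vulnerable repository state
    advisory: Optional[str] = None     # security advisory, if the task provides one
    description: Optional[str] = None  # plain description of the issue otherwise

    def prompt(self) -> str:
        """Return the text shown to the agent: the advisory if present, else the description."""
        text = self.advisory if self.advisory is not None else self.description
        if text is None:
            raise ValueError("a task needs either an advisory or a description")
        return text


def evaluate(
    task: Task,
    agent: Callable[[str, str], str],
    hidden_tests: Callable[[str], bool],
) -> bool:
    """Run the agent on the snapshot and prompt, then check its patch with the hidden tests.

    The agent receives only the snapshot and the prompt; the tests are never passed to it.
    """
    patched_snapshot = agent(task.repo_snapshot, task.prompt())
    return hidden_tests(patched_snapshot)
```

The point of the sketch is the separation of concerns: the agent callable only ever sees the snapshot and the advisory or description, while the hidden tests are applied afterwards to whatever the agent returns.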

The benchmark is maintained by Giovanni Gatti Pinheiro, an independent researcher. Contributions and task submissions are welcome; see the repository for details.