Introducing AutoPatchBench: A Benchmark for AI-Powered Security Fixes

We are introducing AutoPatchBench, a benchmark for the automated repair of vulnerabilities identified through fuzzing. By providing a standardized benchmark, AutoPatchBench enables researchers and …

Click to view the original at engineering.fb.com

Hasnain says:

Great work by some great folks here; gonna bookmark this for later re-reading.

“In some instances, the LLM resorted to “cheating” by producing patches that superficially resolved the issue without addressing the underlying problem. This can occur when the generator modifies or removes code in a way that prevents the crash from occurring, but does not actually fix the root cause of the issue. We observed that cheating happens more frequently when we request the LLM to retry within the same trajectory. A potential solution to this could be to empower the LLM to say “I cannot fix it,” which may come with a tradeoff with success rate. However, note that most of the cheating was caught in the verification step, highlighting the utility of differential testing.”
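The quoted passage leans on differential testing to flag "cheating" patches. The sketch below is not AutoPatchBench's actual verification pipeline, just a minimal illustration of the idea under assumed file names (`target_original`, `target_patched`, the crash PoC, and a small benign seed corpus are all hypothetical): confirm the crash no longer reproduces, then check that the patched binary still behaves identically to the original on benign inputs, since divergence hints the patch deleted or bypassed logic rather than fixing the root cause.

```python
import subprocess

# Hypothetical paths; stand-ins for whatever the real harness provides.
ORIGINAL_BIN = "./target_original"
PATCHED_BIN = "./target_patched"
CRASH_INPUT = "crash-poc.bin"
BENIGN_CORPUS = ["seed1.bin", "seed2.bin", "seed3.bin"]


def run(binary, input_path, timeout=10):
    """Run the fuzz target on one input file; return (exit_code, stdout)."""
    with open(input_path, "rb") as f:
        proc = subprocess.run(
            [binary], stdin=f, capture_output=True, timeout=timeout
        )
    return proc.returncode, proc.stdout


def verify_patch():
    # 1. The crashing input must no longer crash the patched binary.
    crash_rc, _ = run(PATCHED_BIN, CRASH_INPUT)
    if crash_rc < 0:  # negative return code => process killed by a signal
        return "patch rejected: crash still reproduces"

    # 2. Differential check: benign inputs should behave the same on both
    #    binaries; divergence suggests removed functionality, not a real fix.
    for seed in BENIGN_CORPUS:
        if run(ORIGINAL_BIN, seed) != run(PATCHED_BIN, seed):
            return f"patch suspicious: behavior diverges on {seed}"

    return "patch passes differential verification"


if __name__ == "__main__":
    print(verify_patch())
```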

Posted on 2025-04-30T06:23:32+0000