Author: Bas van Schaik
Engineering and security teams typically have two main requirements when they consider security tools: effectiveness at detecting vulnerabilities (i.e. few false negatives) and a low false positive rate. That’s exactly what the OpenSSF CVE Benchmark measures.
The OpenSSF CVE Benchmark tooling and data are open source and available on GitHub.
A new approach to benchmarking
The benchmark addresses two problems that security teams face today when assessing security tools. First, rather than using synthetic test code, the OpenSSF CVE Benchmark uses real historical CVEs. Using this approach, security tools are tested on real codebases that contain real vulnerabilities. Second, by also analyzing the patched version of every codebase, the tools’ false positive rates can be established more accurately.
For each of the over 200 CVEs in the dataset, the CVE Benchmark determines:
- Is a tool able to detect the vulnerability, or does it produce a false negative?
- Does it recognize the patch, or does it produce a false positive on the patched code?
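The two questions above can be sketched as a simple per-CVE scoring step. This is an illustrative sketch, not the benchmark's actual code; the interface and field names here are assumptions made for clarity.

```typescript
// Hypothetical per-CVE scoring logic (names are illustrative, not the
// benchmark's real API). A tool is run twice: once on the vulnerable
// revision of the codebase, once on the patched revision.
interface CveResult {
  cveId: string;
  detectedInVulnerable: boolean; // tool flagged the vulnerable revision
  flaggedInPatched: boolean;     // tool still flags the patched revision
}

interface Score {
  truePositive: boolean;  // vulnerability detected
  falseNegative: boolean; // vulnerability missed
  falsePositive: boolean; // patch not recognized
}

function score(result: CveResult): Score {
  return {
    truePositive: result.detectedInVulnerable,
    falseNegative: !result.detectedInVulnerable,
    falsePositive: result.flaggedInPatched,
  };
}

// Ideal outcome: the tool detects the vulnerability and goes quiet on the patch.
const example: CveResult = {
  cveId: "CVE-XXXX-NNNN", // placeholder identifier
  detectedInVulnerable: true,
  flaggedInPatched: false,
};
console.log(score(example));
```

Running each tool on both revisions is what lets the benchmark report false positives at all: a finding that survives the patch is, by construction, noise.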
Currently, one of the most common methods of assessing the efficacy of a security tool is to run it on a codebase containing synthetic vulnerabilities. For each popular programming language, there exist multiple such codebases, maintained to varying degrees. Most of these “benchmark codebases” or “test suites” are worked on by volunteers, while others are published by large organizations (like NIST’s SAMATE initiative, which maintains the Juliet test suite).
Creating codebases with synthetic vulnerabilities is challenging. In many cases, the test suites don’t resemble real codebases: every odd-looking snippet of code is almost guaranteed to be a valid result. This is far from true for real-world code. Requesting that tool vendors optimize their analysis capabilities for such codebases is counterproductive: a tool that performs well on a synthetic codebase is by no means guaranteed to perform well on real-world code.
In fact, real vulnerabilities are often the result of a complex interplay between an application’s own code and its dependencies. For example, user-controlled data might enter a web application through a web framework. It might then flow through the application’s own functions and data structures, to eventually end up in a templating framework. It is close to impossible for a group of volunteers (or a government agency) to keep a benchmark codebase or test suite continually up to date with the sheer number of popular frameworks and libraries, let alone their latest versions.
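The source-to-sink flow described above can be made concrete with a deliberately minimal, self-contained sketch. No real web or templating framework is used here; the three functions merely stand in for the layers named in the paragraph.

```typescript
// Self-contained illustration of a cross-layer taint flow (stand-in
// functions, not a real framework). Attacker-controlled data enters
// through a "framework" layer, passes through application code, and
// reaches an unescaped "templating" sink.

// Stand-in for a web framework handing the application user input.
function frameworkInput(): string {
  return '<img src=x onerror="alert(1)">'; // attacker-controlled value
}

// Application code shuffles the value through its own data structures...
function buildViewModel(name: string): { greeting: string } {
  return { greeting: name };
}

// ...and a stand-in templating engine interpolates it without escaping.
function renderTemplate(model: { greeting: string }): string {
  return `<p>Hello ${model.greeting}</p>`; // unescaped interpolation: XSS sink
}

const html = renderTemplate(buildViewModel(frameworkInput()));
console.log(html.includes("onerror")); // payload survives end-to-end: true
```

Detecting this requires connecting a source in one library to a sink in another, across application code in between, which is exactly the kind of whole-program flow that synthetic single-file test cases rarely exercise.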
Evaluating a tool’s false positive rate is also critical: if a tool produces too many false positives, engineers will lose excessive time to triage and eventually stop using the tool altogether. Unfortunately, it is hard to gauge a tool’s false positive rate by running it on synthetic benchmark codebases. These codebases don’t ship with patches for their vulnerabilities, making it impossible to ascertain whether a tool would recognize such a patch, or flag up a false positive instead.
We need your help!
The OpenSSF CVE Benchmark is a community project that was initiated by the OpenSSF’s Security Tooling working group, and we would love your help!
The benchmark framework currently provides integrations for three different security tools: ESLint, nodejsscan, and CodeQL. Developing an integration for an additional security tool is straightforward; it typically requires fewer than 200 lines of code. We invite everyone to contribute their integrations back to our open source repository!
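To give a feel for why an integration stays small, here is a hypothetical sketch of what one might look like. The class and field names below are assumptions for illustration only, not the benchmark's actual driver interface; see the GitHub repository for the real one.

```typescript
// Hypothetical tool-integration sketch (illustrative names, not the
// benchmark's real API). An integration mainly has to do two things:
// (1) run the tool against a checked-out codebase, and
// (2) normalize the tool's output into a common finding format.
interface Finding {
  file: string;
  line: number;
  ruleId: string;
}

abstract class ToolDriver {
  abstract run(checkoutDir: string): Promise<Finding[]>;
}

class FakeLintDriver extends ToolDriver {
  async run(checkoutDir: string): Promise<Finding[]> {
    // A real driver would spawn the tool as a subprocess and parse its
    // JSON or SARIF output; we return a canned finding to show the
    // normalization step only.
    return [{ file: `${checkoutDir}/index.js`, line: 42, ruleId: "no-eval" }];
  }
}

new FakeLintDriver()
  .run("/tmp/cve-checkout")
  .then((findings) => console.log(findings.length)); // 1
```

Because the benchmark supplies the CVE checkouts and the scoring, an integration reduces to this run-and-normalize step, which is why a couple of hundred lines usually suffice.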