By David Korczynski and Adam Korczynski, Ada Logics
In this blog post, we present background information and updates on Fuzz Introspector, which is a tool focused on assisting the way software projects are being fuzzed and is developed in a collaboration between OpenSSF and Google’s OSS-Fuzz.
Fuzz Introspector is an open source (repository) tool that at its core provides insights and suggestions for improvements on how a given project is being fuzzed. Additionally, several tools have been built around Fuzz Introspector to support more specific use cases, such as a web application for macro-level insights about the fuzzing of many hundreds of open source projects and also an auto fuzzing harness generation tool providing automatic fuzzer generation. We will give a brief introduction to the core of Fuzz Introspector and then proceed to highlight four areas where Fuzz Introspetor has recently had many updates. Finally, we will provide conclusions and discuss future work.
Fuzz Introspector core
At the core, Fuzz Introspector is a tool that helps fuzzer developers understand their fuzzer’s performance, limitations, gaps and any potential blockers. Fuzz Introspector aggregates data about the fuzzing process for some software under analysis, such as the fuzzers’ coverage, hit frequency, entry point functions, complexity of code and so on. The goal of this is to present a birds-eye view of the fuzzing set up for the given software under analysis. In this sense, Fuzz Introspector aims to improve the fuzzing experience of a project by guiding on whether you should:
- Introduce a new fuzzing harness to a given project.
- Modify the existing harnesses to improve their quality.
This information can be digested in both an HTML report that nicely presents the findings in a visual manner or in computer-readable formats for further processing.
Under the hood, Fuzz Introspector relies on static and dynamic program analysis. The static program analysis is based on control-flow analysis to understand the flow of the software under analysis, as well as the fuzzing harness’s flow. The dynamic analysis comes down to code coverage understanding. The data produced by these two analyses are then consumed and analyzed by the Fuzz Introspector core, in order to extract higher-level insights useful for the fuzzer developer. Three example uses of Fuzz Introspector are rapidly understanding the high-level status of the fuzzing of a project, identifying fuzz blockers, and developing ideas for how to improve the fuzzing of a project.
Rapidly understanding the high-level status of the fuzzing of a project is a central feature of Fuzz Introspector. Fuzz Introspector does this by presenting the aggregated data about the software under analysis ito highlight the important bits, showing both positive and negative aspects of the software under analysis. For example, Fuzz Introspector contains data about all functions in the target software under analysis, such as reachability data, code coverage, accumulated complexity and more. This can be used to, e.g., find the most complex functions with the lowest code coverage simply by sorting the columns of the function table. Some of this overview information can be summarized in simple graphics, as shown below, while others need e.g. tables to show vast amounts of data. Fuzz Introspector also provides visual overviews in the form of bitmaps showing the code coverage relative to the control-flow of a given fuzzer. Please see this walk-through for a more in-depth example of this.
Identifying fuzz blockers is a central use case for Fuzz Introspector. The aim of a fuzzing harness is to explore an underlying codebase by continuously mutating the input to the fuzzing harness. A fuzzing harness has a large potential code reach from a control-flow perspective, but this code can only be reached if certain conditions are met by the input to the harness. The purpose of the fuzzing engine is to mutate the input so these conditions are met. However, it’s not always possible for the fuzzing engine to easily generate these inputs, or that it may not be possible for it to do this due to some programmatic limitations in the fuzzing harness. In these circumstances there may exist a discrepancy between what the fuzzer developer understands as possibly reachable versus what is actually being covered at runtime. Fuzz Introspector provides capabilities to identify the areas where this discrepancy is large, thus highlighting where the fuzzer is not achieving its full potential since it’s not executing code that it’s intended to do. Fuzz Introspector can do this down to the code level by identifying specific branches that are causing these blockers and also pinpoint the exact source code location. Additionally, Fuzz Introspector can visualize the control-flow of a fuzzer in combination with the code coverage achieved by the fuzzer. An example of this is shown below, showing the control-flow presented in a depth-first search manner overlaid with code coverage information. This can be used to visually identify large gaps of code execution that should be theoretically possible to guide the developer on how to adjust the fuzzer to get it to cover the specific code parts. This is achieved using a combination of static and dynamic code analysis.
Developing ideas for how to improve the fuzzing of a project is another central feature of Fuzz Introspector. This is done by presenting both explicit suggestions, such as new functions targets, and also guidelines on how to identify gaps through the data show by Fuzz Introspector, such as:
- Finding large functions with low code converge.
- Find functions that have large undiscovered complexity.
- Find the most complex functions in the target code.
In general, the ability to suggest improvements comes down to finding the optimal areas for fuzzing that are not yet being fuzzed in the underlying codebase. This is done by prioritizing and filtering in all of the code data extracted from the dynamic analysis, and then matching this with the code coverage information which shows the ground truth of what the fuzzers achieve. This combination of information then needs to be narrowed down to a few high-value targets that have potential for yielding results when fuzzed. This guide has many more details on how to do this.
There are several documented case studies on how features from Fuzz Introspector have been used to improve projects’ fuzzing set up. For example, bzip2 achieved 100% static reachability, Liblouis improving coverage from 20% to 80% and file improving function code coverage from 45% to 89%. For further case studies see this document.
Recent updates in Fuzz Introspector
In this section we outline recent updates and upgrades in Fuzz Introspector. We will go through four areas in total.
Unifying support for C/C++/Python/Java
In recent months we have improved support for multiple target languages, and Fuzz Introspector now has unified support for C, C++, Java and Python. Fuzz Introspector supports analysis of all these languages and can, for each of these languages, help developers and security researchers rapidly identify how to improve fuzzing. The difficulties in reaching this is in combining the underlying static analysis, which is different for each of the languages, the dynamic code coverage collection which is also different for each language and representing the data in a unified manner.
Automatic fuzzing harness generation
An interesting problem in the fuzzing world is how to improve scalability of generating harnesses. We have created a set of techniques for doing auto-generation of fuzzing harnesses by way of Fuzz Introspector. This is currently being developed for Python and Java projects, but will also be applied on C/C++ projects later this year. To date, we have run initial analyses on this for a set of roughly 500 Python projects and successfully generated fuzzers for 280 of these, with 150 of these fuzzers having successfully explored multiple code paths during an initial 10 second trial run, and for around 80 of these the edge coverage gained was over 100. To further experimentation efforts, 24 of these have been integrated into Google’s open source fuzzing service OSS-Fuzz and we can confirm that several bugs have been found from the automatically generated fuzzers. In the next few months we intend on scaling up this activity to assist many more projects in adopting fuzzing.
Macro-level insights into open source fuzzing
We have developed a front-end that shows Fuzz Introspector output for many projects and can display it in a unified manner to give macro-level insights for many projects together. This macro-level overview is available by way of OSS-Fuzz, which displays the data for almost all OSS-Fuzz projects on https://introspector.oss-fuzz.com. This website can be used to
- Observe the fuzzing status for many open source security critical projects;
- Search for arbitrary function names from a given OSS project and check if it’s being fuzzed (if it’s in the set of projects fuzzed by OSS-Fuzz);
- Provide insights into how a project can improve its fuzzing without the need for being an expert in the source code of the fuzzer.
The high level goal of this work is to make it a central location for introspection of open source fuzzing, showing how well a given open source projects use fuzzing as well as making it easy to identify potential gaps. This can, for example, be used by users of open source software to quickly identify if the code they use has been security tested by fuzzing. The web application also contains historical fuzzing data, making it easy to track progress of the fuzzing of a given project, such as that done by the Liblouis project as described here. The figure below shows a snippet of the overview data from the OSS-Fuzz integration of the web application.
A recent direction in fuzzing is to create sanitizers that help identify higher-level vulnerabilities. To assist on this matter, we have developed static analysis heuristics that can identify various bug heuristics, e.g. if there are potential areas in the target project with possible code injection. Fuzz Introspector then reports and gives insights into whether these parts of the code are being fuzzed, if they are statically reached, and so on. In the event the particular code parts are not being fuzzed then Fuzz Introspector can provide guidance as to how to do this.
Conclusions and future work
Fuzz Introspector is a tool for optimizing the way software is fuzzed. In collaboration with OSS-Fuzz, data produced by Fuzz Introspector is available for almost a thousand security critical projects by way of a publicly available web application. The central goal of Fuzz Introspector is to enable developers and security researchers to optimize the way open source projects are being fuzzed. In this blog post we have highlighted how Fuzz Introspector achieves this, and in particular identified some of the most important features of Fuzz Introspector. We use these features ourselves on a daily basis.
In the short-term we hope to improve Fuzz Introspector, in particular its automatic generation of fuzzing harnesses and the web application for macro-level insights. Additionally, there are many Golang projects with fuzzing infrastructure in place; we are hoping to add support for Golang projects in Fuzz Introspector in the near future. Finally, the primary purpose of Fuzz Introspector is to improve how open source projects use fuzzing, and to this end we are looking to continue applying Fuzz Introspector ourselves to improve the fuzzing of open source projects.
About the Authors
David Korczynski is a security researcher at Ada Logics, where he leads program analysis and security automation efforts. This includes developing tools for security automation and also applying these tools in a practical manner. He’s a maintainer of Fuzz Introspector and an avid contributor in the open source fuzzing space. David holds a PhD in Computer Science from University of Oxford where his research focused on automation of reverse engineering malicious software.
Adam Korczynski is a security engineer at Ada Logics, where he leads Cloud Native security efforts. He focuses on security assessments, threat modeling and software assurance, often combining the three to carry out holistic security engagements. Adam has performed security audits of many projects in the open source landscape, and has security advisories across a wide range of projects, including Golang, Containerd and Helm.