Hack to the Future: The Impact and Legacy of the DARPA AIxCC Challenge

By Helen Woeste

AIxCC Competition Background & Results: 

In 2023, DARPA announced a two-year competition called the Artificial Intelligence Cyber Challenge (AIxCC) with the goal of safeguarding open source software used in critical infrastructure throughout America. The intent was to hasten the development of open source AI tooling that can help developers find and fix bugs in live software at minimal cost. Open source is a drastically underfunded and under-resourced form of infrastructure. It therefore presents an exciting, practical target and opportunity for the research and development of AI in cybersecurity. Additionally, open source’s publicly observable code is ideal for competition and collaboration.

AIxCC was run in collaboration with ARPA-H and supported with contributions from Anthropic, Google, Microsoft, and OpenAI, with additional consulting around open source provided by the Linux Foundation and the Open Source Security Foundation (OpenSSF). This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The competition consisted of two rounds, the Semifinal Competition (ASC) and the Final Competition (AFC), with cash prizes distributed from a pot of $30,500,000. For the ASC, 42 team submissions were accepted across two tracks: the Open Track and the Small Business Track, which required an additional technical paper submission. The top seven teams moved forward to the AFC, which was set up to mimic a real-world CI/CD pipeline. The scoring algorithm was also designed to highlight behaviors that would make the competing systems more useful to developers. At the conclusion of the AFC, the top three teams were Team Atlanta, Trail of Bits, and Theori.

For the AIxCC competition, real open source projects were selected, and their code was forked and then modified to insert artificial bugs for the Cyber Reasoning Systems (CRS) to discover and fix. However, during the competition, the CRSs discovered several potential real bugs alongside the artificial ones. This raised the question of how to triage the findings and manage the resolution of fixes in the projects. OpenSSF engaged the Open Source Technology Improvement Fund (OSTIF), a third-party open source security organization, to help close out the bugs identified during the AIxCC competition.

OSTIF selected the team at Ada Logics for their extensive experience working with open source fuzzing, bug verification, and disclosure. With a list of potential bugs identified through the course of the competition, Ada Logics was tasked with securely submitting verified issues, ensuring that anything reported to open source project maintainers was a proven bug. The Ada Logics team was able to reproduce and confirm twenty-seven issues after multiple rounds of testing and continued coordination between AIxCC competitors, collaborators, and contributors. CRS teams, including Team Atlanta, Team Buttercup, Team FuzzingBrain, Team Shellphish, Team Theori, Team 42-b3yond-6ug, and Team Lacrosse, worked together with Kudu Dynamics and the OpenSSF and continued to meet with OSTIF about the disclosures, to ensure that the testing of each reported issue and the resulting disclosure decision were accurate.

It was of utmost importance that every real bug detected during the competition was verified before the project maintainer was alerted. This sets the competition’s reports apart from the low-quality reports plaguing open source maintainers today. In several cases, CRS-generated patches were submitted alongside the bug reports as an offering to project maintainers looking to resolve the finding quickly. Additionally, feedback was gathered from the projects about their experience as a target in the competition and about the disclosure procedure that followed.

The Findings:

Teams discovered twenty-seven candidate real-world issues during the competition, and OSTIF engineers were ultimately able to replicate all of the draft bugs. The affected projects were cURL, shadowsocks-libev, healthcare-data-harmonization, hertzbeat, little-cms, and mongoose. Once the bugs were identified, the hard work of fixing them began, putting CRS tooling to work on the second half of its double duty to find and fix security issues.

However, some of the findings did not rise to the level of a security concern, for various reasons. Some issues were fixed by code changes in the projects during the period between the competition and when engineers reproduced them. Others were outside of the threat model of the project and did not meet the criteria needed for a fix to be incorporated into the project (for example, the Apache POI project threat model states “Expect any type of Exception when processing documents,” making any exception-based findings non-issues). One issue had actually already been found by OSS-Fuzz, but the project hadn’t fixed it yet.

Ultimately, the Cyber Reasoning Systems found many valid issues in this competition, and interesting findings were both discovered and fixed. Further, some projects had introduced fixes before the bugs were reported. This is likely because the AIxCC teams submitted their fuzzing harnesses to the projects before triage had finished, and those harnesses rediscovered the same bugs. One significant lesson learned from this is that cyber reasoning systems may benefit from doing self-triage when discovering potential issues, by checking against the project’s documentation and understanding the types of issues that the project accepts as security bugs that need to be addressed.
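The self-triage idea can be illustrated with a minimal Python sketch. Everything here is hypothetical: the `Finding` and `ThreatModel` structures, the issue-kind labels, and the decision strings are illustrative assumptions, not part of any CRS from the competition. The sketch only shows the order of checks suggested by the findings above: drop anything the project's threat model excludes, drop known duplicates, and pass the rest to human verification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A candidate issue reported by a CRS (hypothetical structure)."""
    project: str
    kind: str                    # e.g. "uncaught-exception" (illustrative label)
    already_known: bool = False  # e.g. already reported by an existing fuzzer


@dataclass(frozen=True)
class ThreatModel:
    """Issue kinds a project documents as out of scope (hypothetical structure)."""
    project: str
    out_of_scope_kinds: frozenset


def self_triage(finding, threat_model):
    """Return a triage decision for a finding before any report is sent."""
    if finding.project != threat_model.project:
        raise ValueError("threat model does not belong to the finding's project")
    if finding.kind in threat_model.out_of_scope_kinds:
        return "drop: outside the project's threat model"
    if finding.already_known:
        return "drop: duplicate of a known issue"
    return "escalate: send to human verification"


# Assumed encoding of a threat model that expects exceptions during parsing.
poi_model = ThreatModel("apache-poi", frozenset({"uncaught-exception"}))
print(self_triage(Finding("apache-poi", "uncaught-exception"), poi_model))
```

In practice the hard part is building the threat model itself from project documentation; the sketch assumes that step has already produced a simple set of excluded issue kinds.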

Conclusion & Looking Forward:

The AIxCC program was a massive undertaking by dozens of organizations, all working to contribute back to open source security in a meaningful way using novel AI tooling. The competition was mindfully designed and carried out, with attention given towards the open source projects and maintainers, the wide variety of competitors and interests, and the impact of the competition itself on the industry all the way down to the maintainers. 

OpenSSF is the home for extended collaboration on these new open source tools through its newly formed Cyber Reasoning Systems Special Interest Group. OSS-CRS and FuzzingBrain, two open source projects that emerged from the competition, are now hosted at OpenSSF in the Linux Foundation. A third tool has applied and been accepted into OpenSSF, with a few steps remaining before the official transition. The group aims to foster the development and adoption of these tools, and to establish best practices that help projects use CRSs effectively and responsibly.

This work is already producing real results. For example, FuzzingBrain has since turned its AI-assisted fuzzing system on the broader open source ecosystem, discovering sixty-two vulnerabilities across twenty-six projects, from CUPS and Apache Avro to Ghidra and OpenLDAP, with forty-three confirmed by maintainers and thirty-six already patched upstream. 42-b3yond-6ug has expanded its CRS to uncover twelve vulnerabilities in the Linux kernel and related components, plus ten zero-day vulnerabilities in userspace projects including Eclipse Mosquitto and OpenLDAP. The team is also developing a platform to support more efficient training and evaluation of models and agents, with a release expected soon. Using OSS-CRS, Team Atlanta discovered twenty-five vulnerabilities across sixteen projects spanning a broad range of software including PHP, U-Boot, memcached, and Apache Ignite 3. Of those, nine have been fixed and eight more have been confirmed with fixes in progress.

The future of AI assisting maintainers in finding and fixing security vulnerabilities is bright. The challenges raised by the AIxCC competition already have solutions being developed in open source, such as LLM-based tools that build threat models by looking at the data-flow of projects, and AI agents that triage findings against threat models and documentation before reporting issues. As these tools all continue to develop, they will harmonize into reliable solutions that maintainers can use to elevate their security with far less effort than today.

Our gratitude to the folks at Ada Logics for triaging the potential bugs and working hard to reproduce the issues so maintainers didn’t have to, OpenSSF for trusting us to bring together all of the stakeholders to work on the issues together, DARPA and ARPA-H for holding the AIxCC competition and sponsoring this work, the teams that built the Cyber Reasoning Systems for the competition, Kudu Dynamics for their support in confirming the findings, and all of the maintainers that worked with us to resolve the issues.

OpenSSF and OSTIF will continue to support this kind of work by serving as human connectors between CRS tools and open source communities. The goal is to help triage and validate vulnerability reports and proposed patches before they reach maintainers, ensuring findings are accurate, actionable, and respectful of maintainers’ time.

Organizing a competition of this scale on behalf of open source maintainers and end users takes both enormous collaboration and individual effort. Understanding the communities involved and building lightweight programs that shield maintainers from headaches while strengthening security is the best possible outcome for the ecosystem. It took everyone coming together to make this happen, and ongoing efforts will bring valuable, low-cost, low-maintenance tools to everyone and make us all safer.

As AI moves forward at breakneck speed, innovative work like this highlights how you can move fast and build things together for a better tomorrow. 

Author Bio

Helen Woeste joined OSTIF in 2023, coming from a decade of work experience in the restaurant and hospitality industries. With a passion (and degree) for writing and governance structures, Woeste quickly transitioned into an operations and communications role in technology. 

The views, opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

Distribution Statement “A” (Approved for Public Release, Distribution Unlimited)

From AIxCC to OpenSSF: Welcoming OSS-CRS to Advance AI Driven Open Source Security

By Jeff Diecks

Artificial intelligence is changing how we approach software security. Open source is at the center of that shift.

Over the past year, DARPA’s Artificial Intelligence Cyber Challenge (AIxCC) showed that cyber reasoning systems (CRS) can go beyond finding vulnerabilities. These systems can analyze code, confirm issues, and generate patches. This brings us closer to a future where security is more automated and scalable.

When the competition ended, one question remained. How do we take these breakthroughs and make them usable in the real world?

Today, we are taking an important step forward.

The Open Source Security Foundation (OpenSSF) is welcoming OSS-CRS as a new open source project under the AI / ML Security Working Group.

OSS-CRS emerged from AIxCC and is a standard orchestration framework for building and running LLM-based autonomous bug-finding and bug-fixing systems.

The open framework is designed to make CRS practical outside of the AIxCC environment. During the competition, teams built powerful systems that were released as open source. However, many of them depended on the competition infrastructure, which made them difficult to reuse or extend. OSS-CRS addresses that gap.

OSS-CRS features include:

  • Standard CRS Interface: OSS-CRS defines a unified interface for CRS development. Build your CRS once following the development guide, and run it across different environments (local, Azure, …) without any modification.
  • Effortless Targeting: Run any CRS against projects in OSS-Fuzz format. If your project is compatible with OSS-Fuzz, OSS-CRS can orchestrate CRSs against it out of the box.
  • Ensemble Multiple CRSs: Compose and run multiple CRSs together in a single campaign to combine their strengths and maximize bug-finding and bug-fixing coverage.
  • Resource Control: Manage CPU limits and LLM budgets per CRS to keep costs and resources in check.
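To make the resource-control idea concrete, here is a minimal Python sketch of a per-CRS LLM spending cap. It is not OSS-CRS code or configuration: the `LlmBudget` class, the `BudgetExceeded` exception, and the dollar-based accounting are all assumptions made for illustration. It shows only the basic mechanism of refusing a request that would push one CRS past its allotted budget.

```python
class BudgetExceeded(Exception):
    """Raised when a CRS would spend more than its allotted LLM budget."""


class LlmBudget:
    """Tracks LLM spend for one CRS against a fixed cap (hypothetical helper)."""

    def __init__(self, crs_name, limit_usd):
        if limit_usd < 0:
            raise ValueError("limit_usd must be non-negative")
        self.crs_name = crs_name
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd):
        """Record a request's cost and return the remaining budget.

        The request is refused, and nothing is recorded, if it would
        exceed the cap.
        """
        if cost_usd < 0:
            raise ValueError("cost_usd must be non-negative")
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"{self.crs_name}: {cost_usd} would exceed the {self.limit_usd} cap"
            )
        self.spent_usd += cost_usd
        return self.limit_usd - self.spent_usd


# One budget per CRS, so an expensive system cannot starve the others.
budgets = {name: LlmBudget(name, 10.0) for name in ("crs-a", "crs-b")}
remaining = budgets["crs-a"].charge(4.0)
```

Keeping a separate budget object per CRS mirrors the "per CRS" wording of the feature; how the real framework meters and enforces limits is not described here.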

Read the OSS-CRS research paper: https://doi.org/10.48550/arXiv.2603.08566

From Competition to Community

The move of OSS-CRS into OpenSSF marks a clear transition from research and competition to open collaboration and long term development.

OpenSSF provides a neutral home where projects like OSS-CRS can grow. Contributors can work together to improve the tools, validate results, and support adoption across the ecosystem.

OSS-CRS is already producing real results. Using OSS-CRS, Team Atlanta discovered twenty-five vulnerabilities across sixteen projects spanning a broad range of software including PHP, U-Boot, memcached, and Apache Ignite 3.

OpenSSF will continue to support this important work by providing human connectors between CRS tools and open source communities. The goal is to help triage and validate vulnerability reports and proposed patches before they reach maintainers, ensuring findings are accurate, actionable, and respectful of maintainers’ time.

Recent research from the OSS-CRS team validates the necessity of having a human in the loop. The team manually reviewed a set of 630 AI-generated patches and found 20-40% of the patches to be semantically incorrect. The incorrect patches pass all automated validation but are actually wrong, a dangerous failure mode that only manual review can catch.

A key benefit of the OSS-CRS project is its Ensemble feature. The Ensemble feature enhances accuracy and reliability by combining patches from multiple CRS approaches and using a selection process to pick the one most likely to be correct. The research showed this approach consistently matches or outperforms the best single component in reducing semantic incorrectness, which is hard to eliminate at the single-agent level. This collaboration of systems helps produce more robust results for open source defenders.
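As a rough illustration of selecting one patch from several CRSs, the Python sketch below uses a simple agreement vote. This is an assumed stand-in, not the selection process OSS-CRS actually uses, which is not described here. The `CandidatePatch` structure and its fields are hypothetical. The sketch discards patches that fail automated validation, then prefers the patch text that the most CRSs independently produced.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePatch:
    """One patch proposed by one CRS (hypothetical structure)."""
    crs_name: str
    diff: str                # patch text
    passes_validation: bool  # result of automated build and test checks


def select_patch(candidates):
    """Pick the most commonly proposed diff among validated candidates.

    Returns None when no candidate passes validation. Ties go to the
    diff that appears first in the input.
    """
    valid = [c for c in candidates if c.passes_validation]
    if not valid:
        return None
    counts = Counter(c.diff.strip() for c in valid)
    # most_common orders equal counts by first occurrence.
    best_diff, _ = counts.most_common(1)[0]
    return next(c for c in valid if c.diff.strip() == best_diff)


candidates = [
    CandidatePatch("crs-a", "check length before copy", True),
    CandidatePatch("crs-b", "return early on null", True),
    CandidatePatch("crs-c", "check length before copy", True),
    CandidatePatch("crs-d", "remove the function", False),
]
chosen = select_patch(candidates)
print(chosen.crs_name)
```

Agreement between systems is only a heuristic. A patch that several CRSs agree on and that passes validation can still be semantically wrong, which is why the selected patch would still go to human review.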

Get Involved

With projects like OSS-CRS, OpenSSF will continue to support AI-driven security work to help turn innovation into practical outcomes for open source.

We offer several options to get involved including:

Author Bio

Jeff Diecks is a Senior Technical Program Manager at The Linux Foundation. He has more than two decades of experience in technology and communications with a diverse background in operations, project management and executive leadership. A participant in open source since 1999, he’s delivered digital products and applications for universities, sports leagues, state governments, global media companies and non-profits.