Summary
In this final episode of our AI Cyber Challenge (AIxCC) series, CRob and Jeff Diecks wrap up the journey from DARPA’s groundbreaking two-year competition to the exciting collaborative phase happening now. Discover how winning teams are taking their AI-powered vulnerability detection systems into the real world, finding actual bugs in projects like the Linux kernel and CUPS. Learn about the innovative OSS-CRS project that aims to create a standard infrastructure for mixing and matching the best components from different systems, and hear valuable lessons about how to responsibly introduce AI-generated security findings to open source maintainers. The competition may be over, but the real work—and collaboration—is just beginning.
This episode is part 4 of a four-part series on AIxCC:
Conversation Highlights
00:00 – Welcome and Introduction to AIxCC
01:37 – OpenSSF’s AI Security Mission: Two Lenses
03:54 – Competition Highlights: What the Teams Discovered
07:43 – Real-World Impact: From Research to Production
10:44 – Lessons Learned: Working with Open Source Maintainers
13:13 – OSS-CRS: Building a Standard Infrastructure
14:29 – Breaking Down Walls: Post-Competition Collaboration
15:39 – How to Get Involved
Episode Links
Transcript
CRob (00:09.408)
Welcome, welcome, welcome to What’s in the SOSS, the OpenSSF’s podcast where I get to talk to the most amazing people on the planet that are either involved in or on the outskirts of open source software and open source security. Today, we have a treat. We get to talk to one of my dear friends and teammates, Jeff, and we’re gonna dive into a topic that I really don’t know a lot about today.
So Jeff, why don’t you introduce yourself to the audience and kind of describe what you do for the foundation.
Jeff Diecks (00:44.686)
Yeah, thanks, CRob. And hello, I’m Jeff Diecks. I’m a technical project manager with OpenSSF. And I’ve been involved in open source for 20-plus years now. Goodness. And I am OpenSSF’s lead on the AI Cyber Challenge program that we work on. And CRob is sort of telling you the truth. He’s been on the three episodes prior to this where he’s learned plenty about AIxCC, but we’re here to talk a little bit more about it and wrap up the series today.
CRob (01:17.582)
Yeah, these words you use, AI, that isn’t something I hear a lot about. Wink. Could you maybe recap for us what the OpenSSF is doing around AI security? And then maybe give us a brief recap of AIxCC.
Jeff Diecks (01:37.028)
Yeah, for sure. So OpenSSF in the world of AI, we have our AI/ML Security Working Group that looks at security and AI through kind of two lenses. The first is AI for security, which is what we’ll be talking about here today: projects that help you use AI to improve the security of projects. And the second is security for AI, which is securing this whole new world of AI things with all the lessons we’ve learned about securing software. AI is software too, and it needs securing. We have a whole suite of projects and work that focuses on that too. Specific to AIxCC, again, it’s the AI Cyber Challenge. It was a two-year competition run by DARPA and ARPA-H. If you’re hearing this episode first, I encourage you to go back to the first episode in this series with Andrew Carney from DARPA and ARPA-H for an overview of the program. And then we got into some good conversations with a couple of the team leads from some of the winning teams. But the purpose of the competition was to use AI and develop new systems to both find and fix vulnerabilities in open source software that’s important to our critical infrastructure. An interesting part about the competition: it was written into the rules that any of the competitors accepting prize money were obligated to release their software as open source.
CRob (03:07.214)
Nice. That’s awesome. Well, yeah, again, I say that a bit tongue-in-cheek, but there has been quite a lot of activity, whether it’s in the working group or specific to the follow-ons to the AIxCC competition, which is what we’re here about today: to kind of put a bow on these conversations and help encourage the community to engage and go forward. So we talked to a couple of the teams, and we talked to Andrew, who kind of gave us an overview of the program.
From your perspective and your engagement with the community, we have a new cyber reasoning system special interest group within the foundation. So what have the teams been up to since the August close of the competition?
Jeff Diecks (03:54.414)
Yeah, there’s really two parts to this, and I’ve had the great honor of meeting with and speaking with a lot of the teams and learning about what they’ve been doing. We first started with just conversations about, you know, their experiences with the competition and what they learned, similar to the couple of episodes that we did. And what was really interesting is, you know, for every single team, there’s something of value that came out of their system. They each excelled in, you know, at least one specific area.
They were all finding bugs in real-world software. Just a couple of highlights from some of the other teams. Team Theori, which was among the three winners, the third-place winner, had a unique approach. Their system, unlike the others, did not use fuzzing. It used pure LLM AI. So just an interesting variation, and there’s potential there for that system to be super flexible, because it doesn’t come with some of the requirements of projects already being set up with fuzzing. So it’ll be interesting to see what becomes of their system there. And then in a couple of other cases, it was really interesting: teams that have systems that are extremely capable, but for one reason or another,
CRob (05:08.142)
Yeah.
Jeff Diecks (05:18.096)
there was maybe a specific part of their system that just didn’t work well with the scoring mechanisms of the competition itself. So we had a team that was one of the best, most effective at generating patches for the issues that it found. But as these things go, there was kind of a late change in the architecture of their system during the competition, and the part of their system that was supposed to submit all these patches that got generated into the competition for scoring didn’t function correctly and didn’t submit everything. So they didn’t get credit for all their great work. But as we’ll talk about in a little bit, it’s still a capable system, and now we can use it not for a competition but for real stuff that’s potentially even more valuable. There was another team that was great at generating proofs of vulnerability.
CRob (06:13.87)
Mm-hmm.
Jeff Diecks (06:15.328)
But they had made some assumptions based on the competition infrastructure, and the system just didn’t perform well within the confines of the competition. But what was interesting about the architecture of their system was that they would generate a potential result that they thought might have been a finding from part of their system. And then they would submit that potential result out to several other LLMs and have them feed back
CRob (06:24.718)
Mm-hmm.
Jeff Diecks (06:45.176)
a verdict on whether they thought it might be effective or not. We kind of made the joke that it was like polling to get eight out of nine dentists to agree before deciding on the submission. So those are some highlights from the competition. But you asked about what the teams have been up to in recent months. So that’s been really interesting. DARPA has kind of extended the incentives from their program, and they’re offering
CRob (06:48.846)
Hmm.
Jeff Diecks (07:13.552)
incentives and rewards for the teams now taking these systems, using them in the real world against real open source projects, and demonstrating that they’re effective there. And if they can demonstrate that, they earn additional reward money, which is encouraging the adoption and transition of this research into real-world usage. So we’ve had some interesting findings and a few examples there.
CRob (07:27.374)
Awesome.
Jeff Diecks (07:43.248)
We’ve got Team 42 that we’ve been working with. They’ve focused a lot on the Linux kernel, where their system seems to be very effective, specifically with some of the out-of-tree subsystems. They’ve found and reported several bugs and had some of them accepted, including accepted patches. And actually, later this week we’ve scheduled a consultation for that team with a kernel maintainer
to give them feedback, help their research move forward, and offer any guidance they can on how to make their system more effective. So we’re looking forward to that conversation.
CRob (08:23.36)
Excellent.
So today we have a mixture of projects that are being donated. They’re all open source, but some of them are being donated here, for example. Where do we go from here? What are the group’s thoughts about these cyber reasoning systems broadly, and what other interests or ideas are floating around to keep up the momentum?
Jeff Diecks (08:52.91)
Yeah, there’s a few things. So one, OpenSSF is involved with the teams, and we’ve formed, as you mentioned, the special interest group, the Cyber Reasoning System Special Interest Group, as the kind of continued home from the competition for all the teams to keep collaborating and working together. Some interesting developments so far: we’ve had the teams present to one another
some of their work with real open source projects, examples of bugs they’ve found, the process they’ve been following, and how their submissions have been received by open source projects. So for example, Team Fuzzing Brain, who has donated their CRS for OpenSSF to host and support.
They’ve been working against a bunch of projects, but specifically they shared some examples of their work with the CUPS project, where they found some bugs, reported them, had some patches accepted, and gotten great feedback from the CUPS maintainers, who are very appreciative of their work, both finding bugs and submitting patches, but also helping to generate and expand the fuzzing
CRob (09:56.408)
God.
Jeff Diecks (10:17.028)
harness coverage of the project itself, which the systems are pretty capable of. So we’ve been learning a lot about the reporting process, because it’s one thing to have these capable systems, but we’re in a world where, you know, you and I and everybody else are a bit skeptical of just, you know, pure AI things, right? So we’re working our way through and kind of learning from one another about
CRob (10:19.82)
very nice.
Jeff Diecks (10:44.078)
what’s the best way to keep humans in the mix, how are projects receiving these things, and what are some lessons learned. So for example, we had a conversation in a SIG meeting where we were talking about the patch submission process, and some of the projects were kind of reacting that it was perhaps a bit too aggressive to just go ahead and introduce yourself by submitting a patch from the system
CRob (10:46.51)
Mm-hmm.
Jeff Diecks (11:12.784)
right into the pull request queue of a project. And the group suggested that maybe for the next go-round, a more polite way to introduce yourself is by opening an issue, reporting how it was found, what was found, and all the supporting information, and then attaching a patch to be considered, versus just, hey, here’s a PR, and by the way, it came from AI.
So, some interesting…
CRob (11:43.022)
Right. And we’ve heard a lot of feedback from upstream about their disinterest in that approach.
Jeff Diecks (11:50.772)
Yeah, well, that was a big focus of the scoring of the competition itself. That was among the feedback that we gave consistently for a couple of years: make sure you’re incentivizing the development of systems that don’t make life more difficult for maintainers, but hopefully make things easier, and think about how these things will be received, not just technical capabilities.
But you mentioned, you know, donated projects, and the one that I think is of real interest and, you know, for folks to follow along with: Team Atlanta led the development of a project that we’re calling OSS-CRS, and bundled in with it, they intend to have something called CRS Benchmark.
And what these are for: it’s intended to be a standard infrastructure for building, running, and evaluating all these CRSs, and being able to kind of mix and match and use different parts of different ones for, you know, kind of a combined solution.
So, you know, if we think of a future where we’ve got, you know, a system like we talked about that’s most effective at generating patches, but we’ve got a different one that’s best at finding vulnerabilities.
CRob (12:58.755)
Wow.
Jeff Diecks (13:13.488)
And the hope is that through this standard interface, folks can leverage and kind of fine-tune things to get the best performance and the best results out of a combination of systems, rather than just relying upon a single one. So you can just think of the way it’s intended to run: if you just imagine yourself at a command-line prompt, you issue an OSS-bugfind-CRS build command,
then give it a configuration and a project, a compatible project, and that’ll build a system to run against. And then you can issue the same thing, OSS-bugfind-CRS run, with a config, a project, and the name of a harness, and it’ll go off and do its thing. So again, you’re specifying which configurations you want to use, which subsystems you want it to
CRob (14:01.666)
Mm-hmm.
Jeff Diecks (14:14.01)
pull from. So they’ve got an interesting roadmap. You know, we’re talking with them and, you know, hoping to bring our community and perspective to help support that project and its development, you know, and adoption into the real world.
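To make that concrete, the workflow Jeff describes would look roughly like this at a command line. The exact command name, subcommands, and argument order below are approximations of his spoken description, not confirmed OSS-CRS syntax:

    # Build a chosen CRS configuration against a compatible project
    oss-bugfind-crs build <config> <project>

    # Run the built system against that project, targeting a specific fuzzing harness
    oss-bugfind-crs run <config> <project> <harness>

The idea, as he describes it, is that the same interface could point at components from different teams, so the configuration you pass in, rather than a rewrite, determines which system does the finding and which does the patching.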
CRob (14:29.176)
And I remember us talking around DEF CON last year. And I think the competition and the prize money are great. That was very exciting. But I’m most excited about this phase we’re in now, where the ethical wall that was up between the teams during the competition has come down. Now they’re actually able to talk and collaborate and share ideas. I’m really excited to see the community come together, helping support these students and these ideas.
Jeff Diecks (14:58.916)
Yeah, and that’s been the interesting part, and part of why it’s taken us a bit to get this whole podcast series out. Through the course of the competition, for competition integrity reasons, we were advising the competition organizers, but we weren’t interacting with the teams themselves. So we had to go through a whole process after the finals to introduce ourselves to all of the individual teams and let them know that we’re here and about, and, you know, the things we offer to help
support them in the further development of their systems. So that’s been an interesting few months of lots of great conversations and seeing these teams come together within our working group and special interest group.
CRob (15:39.842)
So you’ve inspired me. I sure would like to know more about how to get involved. How can I do that?
Jeff Diecks (15:41.488)
Ha
Well, if your Monday afternoons at 1 p.m. Eastern are free, we have two different meetings that basically are in that time slot on alternating weeks. Our full AI/ML Security Working Group meets on Mondays at 1 p.m. Eastern time on a biweekly basis. And on the alternating weeks, the Cyber Reasoning System Special Interest Group meets in that time slot. You can find them.
CRob (16:09.326)
Peace.
Jeff Diecks (16:11.844)
You know, both of those meeting series are on our community calendar at openssf.org/calendar.
CRob (16:18.958)
Well, I want to thank you for helping shepherd and guide the folks in the competitions. We’re seeing some great results come out of this. And I’m really excited to see what our community and these amazing students come up with to further the use of AI to help improve security. Yeah. And with that, we’ll say this is a wrap.
Jeff Diecks (16:42.308)
Sounds good. Thanks, CRob, and we’ll see you in the meetings.
CRob (16:48.911)
I for one welcome our new robot overlords and I wish everybody a happy open sourcing. Have a great day.