OpenSSF Podcast

Feb 09

What’s in the SOSS? Podcast #51 – S3E3 AIxCC Part 1 – From Skepticism to Success: The AI Cyber Challenge (AIxCC) with Andrew Carney

By OpenSSF Podcast

Summary

This episode of What’s in the SOSS features Andrew Carney from DARPA and ARPA-H, discussing the groundbreaking AI Cyber Challenge (AIxCC). The competition was designed to create autonomous systems capable of finding and patching vulnerabilities in open source software, a crucial effort given the pervasive nature of open source in the tech ecosystem. Carney shares insights into the two-year journey, highlighting the initial skepticism from experts that ultimately turned into belief, and reveals the surprising efficiency of the competing teams, who collectively found over 80% of inserted vulnerabilities and patched nearly 70%, with remarkably low compute costs. The discussion concludes with a look at the next steps: integrating these cyber reasoning systems into the open source community to support maintainers and supercharge automated patching in development workflows.

This episode is part 1 of a four-part series on AIxCC:


Conversation Highlights

00:00 – Introduction and Guest Welcome
00:59 – Guest Background: Andrew Carney’s Role at DARPA/ARPA-H
02:20 – Overview of the AI Cyber Challenge (AIxCC)
03:48 – Competition History and Structure
04:44 – The Value of Skepticism and Surprising Learnings
07:11 – Surprising Efficiency and Low Compute Costs
08:15 – Major Competition Highlights and Results
13:09 – What’s Next: Integrating Cyber Reasoning Systems into Open Source
16:55 – A Favorite Tale of “Robots Gone Bad”
18:37 – Call to Action and Closing Thoughts


Transcript

Intro music & intro clip (00:00)

CRob (00:23)
Welcome, welcome, welcome to What’s in the SOSS, the OpenSSF podcast where I talk to people that are in and around the amazing world of open source software, open source software security and AI security. I have a really amazing guest today, Andrew.

He was one of the leaders who helped oversee this amazing AI competition we’re going to talk about. So let me start off: Andrew, welcome to the show. Thanks for being here.

Andrew Carney (00:57)
Thank you so much for having me, CRob. Really appreciate it.

CRob (00:59)
Yeah, so for our audience that might not be as familiar with you as I am, could you tell us a little bit about yourself, where you work, and what types of problems you’re trying to solve?

Andrew Carney (01:12)
Yeah, I’m a vulnerability researcher. That’s been the core of my career for the last 20 years, and part of that has had me at DARPA. Now I’m at DARPA and ARPA-H, where I work on cybersecurity research problems focused on national defense and/or health care. That’s the space I’ve been living in for the past few years.

CRob (01:28)
That’s an interesting collaboration between those two worlds.

Andrew Carney (01:43)
Yeah, I think the vulnerability research and reverse engineering community is pretty tight, pretty small. A lot of folks across lots of different industries and sectors have similar problems that we’re able to help with. So it’s exciting to see how folks in finance, the automotive industry, or the energy sector all deal with similar-ish problems, but at different scales and with different flavors of concerns.

CRob (02:20)
That’s awesome. As I mentioned, we were introduced through the AIxCC competition. For our audience that might not be as familiar, could you give us an overview of AIxCC, the competition, and why you felt this effort was so important that we’ve spent years working through it?

Andrew Carney (02:42)
Absolutely. AIxCC is a competition to create autonomous systems that can find and patch vulnerabilities in source code. A big part of this competition was focusing on open source software, because of how critical it is across our tech ecosystem. It really is sort of the font of all software.

And so DARPA and ARPA-H and other partners across the federal government, we saw this kind of need to support the open source community and also leverage kind of new technologies on the scene like LLMs. So how do we take these new technologies and apply them in a very principled way to help solve this massive problem? And working with the Linux Foundation and OpenSSF has been a huge piece of that as well. So I really appreciate everything you guys have done throughout the competition.

CRob (03:41)
Thank you.

CRob (03:48)
Could you give us a little history: when did the competition start, and how was it structured?

Andrew Carney (03:54)
Yeah. The competition was announced at Black Hat in August of 2023, and it was structured into two main sections: a qualifying event at DEF CON in 2024, and our final event at this past DEF CON, in August 2025. Throughout that two-year period, we designed the competition to keep pushing the competitors ahead of wherever the current models and agentic technologies were. Whatever bar those technologies set, we continued to push the competitors past it. So it’s been a really dynamic competition, because that technology has continued to evolve.

CRob (04:44)
I have to say, when I initially heard about the competition, and I’ve been doing cybersecurity a very long time, I was very skeptical about what the results would be. Not to bury the lede, so to speak, but I was very surprised by the results you all shared with the world this summer in Las Vegas. We’ll get to that in a minute. This competition went on over multiple years, and as it progressed, could you share what you learned that surprised you, what you didn’t expect when this all kicked off?

Andrew Carney (05:21)
Yeah, I think so. There have been a lot of surprises along the way. And I’ll also say that skepticism, especially from informed experts, is a really good sign for a DARPA challenge. For a lot of projects at DARPA generally, if you’re waffling between “this is insanely hard and there’s no way we’ll be successful” and “there’s an easy solution to this,” if you’re constantly in that space of uncertainty, thinking “no, this is really, really hard,” and you’re getting skepticism from people who know a lot about this space, for us, that’s fuel. It tells us there’s a question to answer here. That really was part of what drove us. Even competitors that ended up making it to finals were themselves skeptical while they were competing.

I love that. We want to try to do really hard things, and criticism helps us improve. That’s super beneficial.

CRob (06:33)
Yeah, it was. I’ve had the opportunity to talk with many of the teams, and now we’re in the post-competition phase, where we’re starting to figure out how to share the results with the upstream projects and how to build communities around these tools. You assembled a really amazing group of folks in these competitive teams, some super top-notch minds.

Again, you made me a believer. I really do believe that AI has a place and can legitimately offer some real value to the world in this space.

Andrew Carney (07:11)
Yeah, I think one of the biggest surprises for me was the efficiency. A lot of times, especially with DARPA programs, we expect that technical miracles will come with a pretty hefty price tag, and then you’ll have to find a way to scale down, to economize, to make that technology more useful and more widely distributable.

With AIxCC, we found the teams pushing so hard on the core research questions, but at the same time, woven into that, they were using their resources efficiently. Even the competition results themselves were pleasantly surprising in terms of the compute costs for these systems to run. We’re talking tens to hundreds of dollars per vulnerability discovered or patch emitted, which is really quite amazing.

CRob (08:15)
Yeah. Could you give us some highlights of what the competition discovered and what the competitors achieved?

Andrew Carney (08:24)
Yeah. I think when we’re trying to tackle these really challenging research questions, we’re examining them from all angles and being extremely critical of our own approach as well as the competitors’ approaches. Initially, back in August of 2024, we had this amazing proof-of-life moment where the teams demonstrated, with only a few hundred dollars in total compute budget, that they were able to analyze large open source projects and find real issues. One of the teams found a real issue in SQLite that we disclosed to the maintainers at the time. And they found it, once again, with this very limited compute budget, across multiple millions of lines of code in these projects. So that was the “OK, there’s a there there” moment: there’s something here, and we can keep pushing. That was a really exciting moment for everyone. Then, over the following year, up to August 2025, we had a series of non-scoring events where the teams were given challenges that looked very similar to what we’d give them for finals, with an increasing level of scale and difficulty.

You can think of these as extreme integration events: we were still giving the teams hundreds of thousands or millions of lines of code, and eight to 12 hours per task, and seeing what they could do. This was important to ensure that the final competition went off without a hitch, and also because the models they were leveraging continued to evolve and change.

It was really exciting. In that process, the teams found and disclosed hundreds of vulnerabilities and produced hundreds of potential patches that they offered up to maintainers of the projects they were doing their own internal development on. It was exciting to see that the SQLite bug wasn’t a fluke and that the teams could perform consistently. As we pushed them to move further and faster and deal with more complex code, they were able to adapt and find a way forward.

CRob (11:02)
That’s awesome. I know it was a long journey for you, the team, and all the support folks, but is there any particular moment over the course of the competition that makes you smile when you reflect on it?

Andrew Carney (11:20)
Oh, man, so many. I think there’s an equal number of those smiling moments and premature gray hairs that the team and I have created. But here’s one of the big moments. There were a number of outstanding experts in the field, on social media and in talks, who were very skeptical in the way they talked about AI-powered program analysis. Near the end, leading up to semifinals, we had this lovely moment where the Google Project Zero and Google DeepMind teams penned a blog post saying they were inspired by one of the teams’ discoveries, the SQLite bug. That was huge, I think, both for that team and for the competition as a whole. After that, we saw people’s opinions change. People who were, like I said, top-tier experts in the field changed their perspective pretty drastically, and that was a helpful signal for us that we were being successful. Converting a critic, I think, is one of the best victories you can have, because now they can be a collaborator, right? We can still spar over different perspectives or ideas, but now we’re working together. That’s very exciting.

CRob (13:09)
That’s awesome. So what’s next? The hard work of the competition is over, and now we’re in the after-action phase, where we’re trying to integrate all this great work and get these projects out to the world to use. From your perspective, or DARPA’s, or the competition’s, what’s next for you?

Andrew Carney (13:29)
Yeah. One of the biggest challenges with DARPA programs is that when you’re successful, sometimes you have that technological miracle, that accomplishment, and maybe the world’s not entirely ready for it yet. Or maybe additional development needs to happen to get it into the real world. With AIxCC, we made the competition as realistic as possible. The automated systems, these cyber reasoning systems, were given bug reports, patch diffs, and the same artifacts we would consume and review as human developers. We modeled all the tasks very closely on the real things we would want these systems to do, and they demonstrated incredible performance. Collectively, the teams were able to find over 80% of the vulnerabilities that we had synthetically inserted, and they patched nearly 70% of those vulnerabilities. That patching piece is so critical. What we didn’t want to do was create systems that made open source maintainers’ lives more problematic.

CRob (14:54)
Thank you.

Andrew Carney (14:56)
We wanted to demonstrate: this is a reachable bug, and here’s a candidate patch. In the months after the competition, we’ve incentivized the teams, beyond the original prize money, to go out into the open source community and support open source maintainers with their tools. We’ve had folks come back and document in their reports that the patch they suggested to a maintainer was nearly identical to what the maintainer actually committed. And those reports are coming in daily. So we have this constant feed of engagement, and the tools are obviously still being improved and developed, but it’s really exciting to see. So when I think about what’s next, we’re already in the “what’s next”: getting the technology out there and using government funding to support open source maintainers wherever we can, especially if their code is part of widely used applications or used in critical infrastructure. That’s where we find ourselves now. And we’re thinking a lot about how we supercharge that effort.

The federal government supports a lot of actively used open source projects, right? We’ve been working with partner agencies across the federal government to make sure we’re supporting the existing programs when we find them. And where we see a gap, we’re figuring out what it would take to fill that gap for the communities that could use more support.

CRob (16:55)
So, on a slightly different note: we’re both technologists and we love the field, and as I was going through this journey on the sidelines with you all, I was reflecting. Do you have a favorite tale of robots gone bad? Like Terminator’s Skynet, HAL 9000, or the Butlerian Jihad?

Andrew Carney (17:22)
You know, I don’t know that this is my favorite, but it is one of the most recent ones I’ve read. There’s a series called Dungeon Crawler Carl, and it’s been really entertaining reading. The tension between the primal AIs and the corporations that rely on those independent entities, but are also constantly trying to rein them in, has been really interesting to watch as the narrative evolves.

CRob (18:08)
I’ve always enjoyed science fiction and fantasy’s ability to hold a mirror up to society and put these questions in a safe space, where you can think about 1984 and Big Brother or other things, but it’s just on paper or on your iPad or whatever. It’s a nice experiment over there, and we don’t want that happening here.

Andrew Carney (18:29)
Yes, yes. Fiction as thought experiment, right?

CRob (18:37)
Right, exactly. So as we wind down, do you have a particular call to action or anything you want to highlight to the audience that they should maybe investigate a little further or participate in?

Andrew Carney (18:50)
Yeah, I think a big one is that we would love for open source maintainers to reach out to us directly at aixcc@darpa.mil. That’s the email address our team uses. We’ve been looking for more maintainers to connect with, so that if we can provide resources to them, one, they’re right-sized for the challenges those maintainers are having (or that maintainer, right? Sometimes it’s just one person), and two, we’re engaging with them in the way they would prefer to be engaged with. We want to be helpful help, not unhelpful help. So that’s a big one. And more generally, I would love to see more patching added into the vulnerability research lifecycle. I think there are so many opportunities for commercial and open source tools that have that discovery capability, where that’s really their big selling point. Now, with AIxCC and the technology the competitors open sourced themselves, since all of their systems were open sourced after the competition, there’s real potential that I don’t think we’ve seen realized the way it could be. So I would love to see more of that automated patching added to tools and development workflows.

CRob (20:29)
I’ll say my personal favorite experience out of all this: during the competition, there was an ethical wall up between you, your administrators, us, and the different competition teams. But the minute the competition was over, we’ve observed the competitors looking at each other’s work, asking each other questions, and collaborating. I’m so excited to see what comes next, now that all these smart people have proven themselves. They’ve found kindred spirits, and they’re going to start working together on even more amazing things.

Andrew Carney (21:07)
Absolutely. We’re expecting a state-of-knowledge paper with all the teams as authors. That’s something they’ve organized independently, to your point. And I cannot wait to see what they come out with collaboratively.

CRob (21:23)
Yeah. And for anyone interested in learning more or potentially interacting directly with some of these competition experts, whether they’re in academia or industry: as part of our AI/ML working group, the OpenSSF is sponsoring a cyber reasoning special interest group that we created specifically for the competition and all the competitors, to have public discussions and collaboration around these things. We would invite everybody to show up, listen, participate as they feel comfortable, and learn.

Well, Andrew, to you and the whole DARPA and ARPA-H team, and everyone who was involved in the competition: thank you. Thank you to our competitors. We’re actually going to have a series of podcasts talking to the individual competitors, learning a little about the unique flavors and challenges each team had. But thank you for sponsoring this and delivering something I think is going to have a ton of utility and value for the ecosystem.

Andrew Carney (21:47)
Thank you for working with us on this journey and we definitely look forward to more collaboration in the future.

CRob (21:54)
Well, and with that, we’ll wrap it up. I just want to tell everybody happy open sourcing. We’ll talk to you soon.

Oct 07


What’s in the SOSS? Podcast #41 – S2E18 The Remediation Revolution: How AI Agents Are Transforming Open Source Security with John Amaral of Root.io

By Jeff Diecks Podcast

Summary

In this episode of What’s in the SOSS, CRob sits down with John Amaral from Root.io to explore the evolving landscape of open source security and vulnerability management. They discuss how AI and LLM technologies are revolutionizing the way we approach security challenges, from the shift away from traditional “scan and triage” methodologies to an emerging “fix first” approach powered by agentic systems. John shares insights on the democratization of coding through AI tools, the unique security challenges of containerized environments versus traditional VMs, and how modern developers can leverage AI as a “pair programmer” and security analyst. The conversation covers the transition from “shift left” to “shift out” security practices and offers practical advice for open source maintainers looking to enhance their security posture using AI tools.


Conversation Highlights

00:25 – Welcome and introductions
01:05 – John’s open source journey and Root.io’s Slim Toolkit project
02:24 – How application development has evolved over 20 years
05:44 – The shift from engineering rigor to accessible coding with AI
08:29 – Balancing AI acceleration with security responsibilities
10:08 – Traditional vs. containerized vulnerability management approaches
13:18 – Leveraging AI and ML for modern vulnerability management
16:58 – The coming “remediation revolution” and fix-first approach
18:24 – Why “shift left” security isn’t working for developers
19:35 – Using AI as a cybernetic programming and analysis partner
20:02 – Call to action: Start using AI tools for security today
22:00 – Closing thoughts and wrap-up


Transcript

Intro Music & Promotional clip (00:00)

CRob (00:25)
Welcome, welcome, welcome to What’s in the SOSS, the OpenSSF’s podcast where I talk to upstream maintainers, industry professionals, educators, academics, and researchers all about the amazing world of upstream open source security and software supply chain security.

Today, we have a real treat. We have John from Root.io with us here, and we’re going to be talking a little bit about some of the new, air quotes, “cutting edge” things going on in the space of containers and AI security. But before we jump into it, John, could you share a little bit with the audience about how you got into open source and what you’re doing upstream?

John (01:05)
First of all, great to be here. Thank you so much for taking the time at Black Hat to have a conversation. I really appreciate it. Open source, really great topic. I love it. I’ve been doing stuff with open source for quite some time. How did I get into it? I’m a builder. I make things. I make software; I’ve been writing software. Folks can’t see me, but I’m gray and have no hair and all that sort of thing. I’ve been doing this a while. It’s been a great journey and a pleasure in my life to work with software in a way that democratizes it and gets it out there. I’ve taken a special interest in security for a long time, 20 years of working in cybersecurity. It’s a problem that’s been near and dear to me since the day my first floppy disk got corrupted, and I’ve been on a mission to fix that ever since. My open source journey has been diverse. My company, Root.io, we are the maintainers of an open source project called Slim Toolkit, a pretty popular open source project about security and containers. It’s been my personal goal, and my latest company’s goal, to really try to help make open source secure for the masses.

CRob (02:24)
Excellent. That is an excellent vision and direction to take things. I feel we’re a very similar age and maybe came up in semi-related paths. From your perspective, how have you seen application development transmogrify over the last 20 or so years? What has gotten better? What might’ve gotten a little worse?

John (02:51)
Twenty years is a big time frame when talking about modern open source software. I remember when Linux first came out, and I was playing with it. I actually ported it to a single-board computer as one of my jobs as an engineer back in the day, which was super fun. Of course, we’ve seen what happened by making software available to folks: it’s become the foundation of everything.

Andreessen said software would eat the world, and the teeth were open source. Open source really made software available, and now 95 or more percent of everything we touch and do is open source software. I’ll add that, in the grand scheme of things, it’s been tremendously secure, especially projects like Linux. We’re really splitting hairs, but security problems are real. We’ve seen the proliferation of open source and the proliferation of repos with things like GitHub. And today, the proliferation of tooling, the ability to build software, and now to build software with AI is simply exponentiating the rate at which we can do things. Good people who build software for the right reasons can do things. Bad people who do things for bad reasons can do things. And it’s an arms race.

And I think it’s really benefiting software development, society, and software builders, with these tremendously powerful tools to do the things they want. As a person at my point in my career arc, today I feel like I can write code at a better rate than I probably ever have. I’ve always been hands-on-keyboard, but I feel rejuvenated. I’ve become a businessperson in my life and built companies, and I didn’t always have the time, or maybe even the moment, to do coding at the level I’d like. Today I’m banging out projects like I was 25, or even better.

But at the same time that we’re getting all this leverage universally, we’ve also noticed an impending security risk where, yeah, we can find vulnerabilities and generate them faster than ever. LLMs aren’t quite good at secure coding yet. I think they will be. But attackers are also using them for exploits: really, as soon as a vulnerability is disclosed, or even minutes later, they’re writing exploits that can target it. I love the fact that the pace and the leverage are high, and I think the world’s going to do great things with it, the world of open source folks like us. At the same time, we’ve got to be more diligent and even better at defending.

CRob (05:44)
Right. I heard an interesting statement yesterday where folks were talking about software engineering as a discipline that’s maybe 40 to 60 years old, and engineering was the core noun there. These engineers were trained; they had a certain rigor. They might not have always enjoyed security, but they were engineers, and there was a certain elegance to the code. People were much like artists: they took a lot of pride in their work, and you could understand what the code was doing. Today, especially in the last several years with the influx of AI tools, it’s a blessing and a curse that anybody can be a developer. Not just people who used to do it and didn’t have time, who now get to scratch that itch. Now anyone can write code, and they may not necessarily have the same rigor and discipline that comes from most engineering trades.

John (06:42)
I’m going to guess, and I think it’s not walking too far out on a limb, that you probably coded on systems at some point in your life where you had a very small amount of memory to work with. You knew every line of code in the system. There might have been a shim operating system or something small. I wrote embedded systems early in my career, and we knew everything: every line of code, the elegance, the efficiency, and the speed of it. We were very close to the CPU, very close to the hardware. It was slow building things because you had to handcraft everything, but it was very curated and very beautiful, so to speak. I find beauty in those things. You’re exactly right. I think I started to see this change around the time the JVM, the Java Virtual Machine, came along, where garbage collection meant you didn’t have to worry about memory management.

Then, progressively, levels of abstraction have risen to make coding faster and easier and to give it more power, and that’s great. We’ve built a lot more systems, bigger systems, and open source helps. But now literally anyone who can speak cogently can describe what they want and get a system. And I look at the code my LLMs produce. I know what good code looks like; our team is really good at engineering, right? And I think, hmm, how did it think to do it that way? Then we go back and tell it what we want, and you can massage it with some words. It’s really dangerous, and if you don’t know how to look for security problems, that’s even more dangerous. Exactly: the level of abstraction is so high that people aren’t really curating code the way they might need to in order to build secure, production-grade systems.

CRob (08:29)
Especially if you’re creating software with the intention of somebody else using it, probably in a business, and you’re not really thinking about all the extra steps you need to take to help protect yourself and your downstream.

John (08:44)
Yeah, yeah. I think it’s an evolution, right? I think of these AI systems we’re working with as maybe second graders when it comes to professional code authoring. They can produce a lot of good stuff, right? But it’s really up to the user to discern what’s usable.

And we can get to prototypes very quickly, which I think is hugely powerful, because it lets us iterate and develop. In my company, we use AI coding techniques for everything, but nothing gets into production, into customer hands, that isn’t highly vetted and highly reviewed. So the creation part goes much faster. The review part is still done by a human.

CRob (09:33)
Well, that’s good. Human on the loop is important.

John (09:35)
It is.

CRob (09:36)
So let’s change the topic slightly and talk a little more about vulnerability management. From your perspective, thinking about traditional brick-and-mortar organizations, what key differences do you see between someone who is more data center, server, and VM focused and the new generation of cloud native, where we have containers and cloud?

What are some of the differences you see in managing your security profile and your vulnerabilities there?

John (10:08)
Yeah, so I’ll start out with a general statement about vulnerability management. In general, the current methodologies I observe today are pretty traditional.

It’s scan, it’s inventory. What do I have for software? Let’s just focus on software. What do I have? Do I know what it is or not? Do I have a full inventory of it? Then you scan it and get a laundry list of vulnerabilities, with some false positives and false negatives. Then I’ve got this long list, and the typical pattern is triage: which are more important than others, and which can I explain away? Then there’s a cycle of remediation (hopefully; a lot of times not), where you’re cycling work back to the engineering organization or to whoever is in charge of doing the remediation. This is a very big loop, mostly starting and ending with long lists of vulnerabilities that still need to be addressed and risk managed, right? It doesn’t really matter whether you’re doing VMs, traditional software, or containerized software. That’s the status quo, I would say, for the average company doing vulnerability management. And the remediation part ends up being fractional work, meaning you mostly just don’t have time to get to it all, and it becomes a big tax on the development team. Because in software, it’s very difficult for DevSec teams to fix it when, in the end, it’s actually a coding problem.

In the traditional VM world, I’d say the potential impact and the velocity at which things move are quite different compared to containerized environments, where you have Kubernetes and other kinds of orchestration systems that can literally proliferate containers everywhere, in a place where infrastructure as code is the norm. I’d just say that the risk surface in these containerized environments is much more vast and oftentimes less understood, whereas traditional VMs still follow a pretty prescriptive deployment pattern. So I think in the end, the more prolific you can be with deploying code, the more likely you’ll have this massive risk surface, and containers are so portable and easy to produce that they’re everywhere. You can pull them down from Docker Hub, and these things are full of vulnerabilities, and they’re sitting on people’s desks. They’re sitting in staging areas or sitting in production. So proliferation is vast. And I think that, in conjunction with really high vulnerability reporting rates, really high code production rates, vast consumption of open source, and then exploits at AI speed, we’re seeing this kind of almost explosive moment in vulnerability management risk.

CRob (13:18)
So machine intelligence, which has now transformed into artificial intelligence, has been around for several decades, but it seems like most recently, over the last two to four years, it has been exponentially accelerating. We have this whole spectrum of things: AI, ML, LLMs, GenAI, and now agentic AI and MCP servers.

So kind of looking at all these different technologies, what recommendations do you have for organizations that are looking to try to manage their vulnerabilities and potentially leveraging some of this new intelligence, these new capabilities?

John (13:58)
Yeah, it’s amazing, the rate of change of these kinds of things.

CRob (14:02)
It’s crazy.

John (14:03)
I think there’s a massively accelerating, kind of exponentially accelerating feedback loop because once you have LLMs that can do work, they can help you evolve the systems that they manifest faster and faster and faster. It’s a flywheel effect. And that is where we’re going to get all this leverage in LLMs. At Root, we build an agentic platform that does vulnerability patching at scale. We’re trying to achieve sort of an open source scale level of that.

And I only mention that because I believe that, not just us but from an industry perspective, we’re rapidly evolving the capability, through agentic systems based on modern LLMs, to really understand and modify code at scale. There’s a lot of investment going in from all the major players, whether it’s Google or Anthropic or OpenAI, to make these LLM systems really good at understanding and generating code. At heart, most vulnerabilities today are a coding problem. You have vulnerable code.

And so, we’ve been able to exploit those coding capabilities to turn it into an expert security engineer and maintainer of any software system. And so I think what we’re on the verge of is what I’ll call the remediation revolution. I mentioned that the status quo is typically inventory, scan, list, triage, do your best. That’s a scan-first mode, I’ll call it, where mostly you’re just trying to get a comprehensive list of the vulnerabilities you have. It’s going to get flipped on its head with this kind of technique, where it’s going to be just fix everything first. There’ll be outliers. There’ll be things that are kind of technically impossible to fix for a while. For instance, there could be a disclosure, but you really don’t know how it works. You don’t have CWEs. You don’t have all the things yet. So you can’t really know yet.

That gap will close very quickly once you know what code base it’s in and you understand it, maybe through a POC or something like that. But I think we’re gonna enter into the remediation revolution of vulnerability management, where at least for third-party open source code, most of it will be fixed a priori.

Now, zero days will start to happen faster, there’ll be all the things, there’ll be a long tail on this, and certainly probably things we can’t even imagine yet. But generally, I think vulnerability management as we know it will enter into this phase of fix first. And I think that’s really exciting, because in the end it creates a lot of work for teams to manage those lists and to deal with the re-engineering cycle. It’s basically latent rework that you have to do, and you don’t really know what’s coming. I think that can go away, which is exciting because it frees up security practitioners and engineers to focus on, I’d say, more meaningful problems and fewer toil problems. And that’s good for software.

CRob (17:08)
It’s good for the security engineers.

John (17:09)
Correct.

CRob (17:10)
It’s good for the developers.

John (17:11)
It’s really good for developers. I think generally the shift left revolution in software really didn’t work the way people thought. Shifting that work left has two major frictions. One is that it shifts new work to engineering teams who are already maximally busy.

CRob (17:29)
Correct.

John (17:29)
I didn’t have time to do a lot of other things when I was an engineer. And the second is that software engineers aren’t security engineers. They really don’t like the work and maybe aren’t good at the work. So what we really want is to not have that work land on their plate. I think we’re entering into an age, and this is a general statement for software, where software as a service and the idea of shift left are really going to be replaced with what I call shift out: if you can have an agentic system do the work for you, especially if it’s work that is toilsome, difficult, low value, or even just security maintenance, right? A lot of this work is hard. Patching things is hard, especially for an engineer who doesn’t know the code. If you can make that work go away and make it secure, and agents can do that for you, I think there’s higher value work for engineers to be doing.

CRob (18:24)
Well, and especially with the trend in open source, where people are assembling and composing apps instead of creating them from whole cloth. It’s a very rare engineer indeed that’s going to understand every piece of code that’s in there.

John (18:37)
And they don’t. I don’t think it’s feasible. I don’t know anyone, except the folks who write Node, for instance, who knows how Node works internally. They don’t know. And if there’s a vulnerability down there, some of that stuff’s really esoteric. You have to know how that code works to fix it. Luckily, as I said, existing LLM systems, with agents powering or exploiting them, are really good at understanding big code bases. They have almost a perfect memory for how the code fits together. Humans don’t, and it takes a long time to learn this code.

CRob (19:11)
Yeah, absolutely. I’ve been leveraging AI in my practice, and there are certain specific tasks AI does very well. It’s great at analyzing large pools of data and providing you lists and kind of pointers and hints. It’s not so great at making things up on its own, but generally it’s an expert system. It’s nice to have a buddy there to assist you.

John (19:35)
It’s a pair programmer for me, and it’s a pair data analyst for you, and that’s how you use it. I think that’s perfect. We effectively have become cybernetic organisms, our organic capabilities augmented with this really powerful tool. I think it’s going to keep getting more and more excellent at the tasks that we need offloaded.

CRob (19:54)
That’s great. As we’re wrapping up here, do you have any closing thoughts or a call to action for the audience?

John (20:02)
Call to action for the audience: again, vulnerability management and the security of open source are a passion play for me. A couple of things, cut from the same cloth. Again, I think we’re entering an age where security and vulnerability management can be disrupted. For anyone who’s struggling with high-effort work and that never-ending list, help is on the way, and there are techniques you can use with open source projects that can get you started. Just for instance, researching vulnerabilities: if you’re not using LLMs for that, you should start tomorrow. It is an amazing buddy for digging in and understanding how things work and what these exploits and risks are. There’s tooling like mine and others’ out there that you can use to really take a lot of effort away from vulnerability management. For any open source maintainers out there, I think you can start using these programming tools as pair programmers and security analysts. And they’re pretty good. If you just learn some prompting techniques, you can probably secure your code at a level you hadn’t before. They’re pretty good at figuring out where your security weaknesses are and telling you what to do about them. I think just these things can probably enhance open source security tremendously.

CRob (24:40)
That would be amazing to help kind of offload some of that burden from our maintainers and let them work on that excellent…

John (21:46)
Threat modeling, for instance: they’re actually pretty good at it. Yeah. Which is amazing. So start using the tools and make them your friend. And even if you don’t want to use them as a pair programmer, certainly use them as an adjunct SecOps engineer.

CRob (22:00)
Well, excellent. John from Root.io. I really appreciate you coming in here, sharing your vision and your wisdom with the audience. Thanks for showing up.

John (22:10)
Pleasure was mine. Thank you so much for having me.

CRob (22:12)
And thank you everybody. That is a wrap. Happy open sourcing everybody. We’ll talk to you soon.
