Brian Fox on AI Security & Open Source Sustainability

Summary

In this inaugural episode of Big Thoughts, Open Sources, host CRob sits down with Brian Fox, Co-founder and CTO of Sonatype, to discuss the friction between rapid AI adoption and foundational software security. Brian shares insights from the 11th annual State of the Software Supply Chain Report, revealing the emergence of “slop squatting” and the high frequency of AI models recommending non-existent or vulnerable dependencies. The conversation explores how the Model Context Protocol (MCP) could revolutionize developer compliance and why the industry must fund the critical infrastructure supporting our trillion-dollar open source ecosystem.

Listen on Apple Podcasts Listen on Spotify Listen on Overcast Listen on Pocket Casts

Conversation Highlights

00:23 – Welcome: Big Thoughts, Open Sources inaugural episode.
01:01 – Brian Fox’s journey: Apache Maven, Sonatype, and OpenSSF.
02:53 – The critical role of Maven Central in the software supply chain.
03:26 – Decades of security trends: The persistent “Log4Shell” pattern.
05:34 – The “Tribal Knowledge” problem for AI agents.
07:06 – State of the Software Supply Chain Report: AI recommending made-up code versions.
08:09 – Explaining “Slop Squatting” and AI hallucinations.
10:03 – Model Context Protocol (MCP): Turning security tools into AI expert systems.
13:42 – Do not ignore 60 years of software engineering “physics”.
15:11 – The “Vulcan Mind Meld”: Injecting governance data into AI agents.
17:19 – Risks, rewards, and the need for ML SecOps discipline.
19:30 – “Inefficient code is still inefficient code”: Lessons from cloud migrations.
21:01 – Building an “AI-native SDLC” with upfront security.
24:18 – The sustainability crisis: Secure open source builds are not free.
27:17 – Conclusion: Funding open source infrastructure (8 trillion dollars of value).

Episode Links

Transcript

Crob (00:23)
Welcome, welcome, welcome to Big Thoughts, Open Sources, the OpenSSF’s new podcast. We’re gonna dive a little more deeply in with some of the amazing community members, subject matter experts, and thought leaders within open source, cybersecurity, and high technology. Today in our inaugural episode, I’m very pleased to welcome a friend of the show, Brian Fox from Sonotype. How you doing, Brian?

Brian Fox (00:47)
I’m doing well, how are you?

Crob (00:48)
Excellent, we’re super glad to have you today. So maybe just for our audience members that are unfamiliar with your work, could you maybe talk a little bit about how you got into open source and kind of what you specialize in in this amazing ecosystem?

Brian Fox (01:01)
How I got an open source, that’s a long conversation. geez, all the way back in 2002, 2003, I suppose, is when I really, really got involved. I had done some dabbling and some other things before that, but I got involved around that time in Apache Maven. I started writing some plugins.

They’re pretty popular plugins, people still use them these days. And those ultimately got pulled into the Apache project, the official project. I kind stowawayed and came in as a committer. A few years later, I joined up with some other folks that were also working on Maven and we co-founded Sonatype. It’s been 19 years now.

CRob (1:45)
Wow, that’s awesome.

Brian Fox (1:46)
Yeah, and so then I was ultimately the…the chair of the Apache Maven project for a long time. still an Apache member of the foundation. And then more recently, even though it’s been a while, what, four or five years now, we joined the OpenSSF. I’ve been on the Governing Board with you for a while. I’m also on the Governing Board of FINOS, which is the financial open source.

And for the last couple years, also been on the Singapore Monetary Authority’s Cyber
Experts Group. Yeah, that’s fun And so, you know, I’ve spent a lot of time focused on those things. One of the things that Sonatype does for the community we run Maven Central, right? Which for people that don’t know that’s where all the world’s open source Java components come from.

CRob

Yeah, it’s kind of sounds like a big job It is a big job running critical infrastructure for all that kind of stuff And so, you know over the years that’s given us really interesting insights into what’s going on with the supply chain so, you know, that’s kind of that’s sort of what led us to the path that brought me to OpenSSF and all those other things.

CRob (2:53)
Yeah, you and your team have been amazing participants and contributors to our community and just kind of even putting aside all the work with Maven. Just your kind of participation in our working groups and our efforts has been amazing. Yeah, thank you. So today I think you wanted to talk about a topic a lot of people probably haven’t heard about. This little thing called AI. I have a hard time spelling that.

Brian Fox (3:16)
Right?

CRob (3:20)
Let’s just set the stage. What are you thinking about? What do we want to have a conversation around AI about?

Brian Fox (3:26)
Yeah, I think so if we back up a little bit, right? So it was probably around 2011, 2012, I suppose. We started looking at some of the trends that we were seeing within the Maven central downloads. We were seeing things like the most popular version of a crypto library was the one that had a level 10 vulnerability.

fixed and patched years before, but that everybody was still using the vulnerable version. The log for J, log for shell pattern has existed basically forever. It’s not actually new. And so that led us down the path to start doing different things to help our customers A, understand what open source they were using. Way back then, nobody knew. They were like, we’re not using open source. What they really meant was, I don’t think I’m using Linux in open office. They didn’t understand.

that their developers were pulling in all these components. And so the problem space back then was helping them have visibility and then providing data and controls to help them better govern their choices. So we’ve always been trying to help expedite and make it more efficient for developers to make better choices. And so it’s interesting to see this development of AI and all of the kind of things that have come along with it. So that got me thinking, you know, what?

When we started out to build some of the stuff that we built for our customers, my focus at that time was to make it possible to do the analysis in real time so that it wasn’t the case that, we’re just going to do all our stuff and then we’re going to run a compliance scan at the end of the week or end of the month or something. So we were very focused on, it needs to be able to be run every single bill all the time. We need to be able to provide guidelines so that they don’t have to ask the legal team and wait six weeks for an answer, or the security team, right?

We were trying to capture those roles, or those rules, into the system so that they could make better choices in real time. And that was a big thing that organizations needed to be able to scale and become efficient. When you start dealing with thousands of developers, tens of thousands or millions of applications, the tribal knowledge problem kind of falls apart.

CRob (5:33)
Absolutely.

Brian (5:34)
Right? And so you start thinking about what happens with AI, and if you don’t have that stuff in an automated, you know, coded kind of way, how do you feed that to an agent? The agent’s not hanging out with you at lunch. It doesn’t get an onboarding session where we say things like, you know what, we never use an LGPL dependency because we ship our code. Or, you know, we only fix vulnerabilities five and above. Or, you know, whatever the policy may be, those things sometimes can be shared among developers.

CRob (6:02)
Right. and it plays into kind of the classic problem with engineering – is most engineers I’ve met don’t like doing documentation. And with AI entering the chat room becoming this accelerant, it’s making decisions based off of knowledge or lack thereof. if you don’t have your security policy documented, it even goes back to thinking about the early days of Kubernetes.

Where it was a big mental shift for people to have that software defined network inside. And that helped, I think, a lot of organizations get better discipline and rigor where you had less mysterious outages. Because the firewall guy in the back end said, I didn’t do anything, but try test it again.

Brian Fox (6:46)
Right, right. Yeah, for sure. And that’s kind of what we’re seeing now. We’re seeing a lot of that with the, not just with agents. mean, agents are sort of like the next big step and not everybody’s
fully there yet. Some people are dabbling with it. But even just AI assisted coding, you’re seeing the same problem that you come in and you say, hey, I want a new feature. And it just grabs whatever statistically likely thing dependency is going to be in there. We’ve done some studies. We recently released the state of the software supply chain report. It’s a great report. Yeah, thank you. This was our 11th year. We just published it last month. And we did a deep dive on AI recommendations, you know, and we found that 30 % of the time the models were recommending made up versions.

CRob (7:35)
What?

Brian Fox (7:36)
Yeah, just making them up. You know, so it’s kind of shocking. In the real world, you know, if you’ve got a tool, that’s one of those things that fails fast, right? Like it picks a version that doesn’t exist, the thing goes and it immediately blows up and then, you know, Claude or whatever you’re using will go, whoops, and it’ll fix it. So it’s kind of funny, burns some tokens, but the downsides aren’t huge.

If the agent randomly picks a terrible dependency or a very old one that does in fact exist, I would argue that’s worse because there’s no fail fast in that scenario.

CRob (8:09)
Well, you also have the whole problem with slop squatting. Where the models seem to, regardless of what vendor provider you’re getting it from, they seem to fairly consistently suggest the wrong dependencies, kind of like typo squatting.

And so now the bad guys have recognized this kind of fairly consistent behavior and they’re uploading malicious packages with those bad names so that you don’t break the build because it can find what needs.

Brian Fox (8:33)
So instead of it failing fast, it fails fast by grabbing a back door or something. Exactly, that’s exactly right. That’s what slop squatting is what they call it now. Yeah, and so those are some of the challenges that we observe and you kind of take it to the extreme where now you potentially have less sophisticated developers, not classically trained developers using these tools, and they don’t know what they don’t know.

They wouldn’t necessarily stop to say, hey, I want you to now be a security expert and do an assessment of the code you just created. Like somebody who knows better will do that. But if you’ve not lived through the pain that you and I have lived, you wouldn’t think about that. And so on average, these things are going to potentially toss away a lot of the learnings that we’ve known for so many years.

CRob (09:21)
And that’s been a chronic challenge, trying to get the tribal knowledge instantiated, trying to help people make those right decisions. And the AI tools are amazing productivity and efficiency savers, but they are bringing in, as you said, classically untrained professionals that they are not a software engineer. They don’t understand how a system should be architected, or they don’t understand kind of the app sec best practices that help secure the foundation of everything and not let the world fall apart.

Brian Fox (9:59)
The interesting thing is I think they can be if prompted correctly.

CRob (10:02)
Yes.

Brian Fox (10:03)
Right? And that’s where some of the knowledge gap comes in. And I think, what was it last summer, Anthropic released the MCP model control protocol, right? Which is, I’ve spent a lot of time thinking about that pretty deeply and looking at all the tools. And I wrote about this. I think that there’s a high likelihood that we see a lot of the tools we use in software, in the SDLC today, moving more towards providing their capabilities as subject matter experts in “a thing” to an AI agent via MCP.

So I think that, for me, is pretty exciting for a number of reasons. It means, as a tool vendor, I don’t have to create a plug-in for IntelliJ and one for Eclipse and one for VS Code. As an example, MCP can be the same thing for I don’t care what tool you’re using, because it’s interacting with me via this standard API. And I’m kind of talking to it in more or less English prompts. So my ability to deliver the value that we have into whatever tool you feel like using today, and they change every week, is pretty cool.

And I would also argue that the ability to insert that information and to potentially roll out the root prompting that all of the developers are using in these capabilities is better. You’re going to get potentially better compliance than you do today. One of the things I struggled with forever was we created an IDE plugin for our capabilities that it demoed amazingly well. It showed, hey, this dependency has vulnerabilities, or license, or would make recommendations. It was great. But developers just didn’t want to install more plugins. They just weren’t using it, right?

So while it demoed well, the actual usage of it was very low for compliance reasons. That’s a thing we struggle with. Every tool vendor struggles with that. But if you were able to insert that same information into an MCP capability and the company rolls out a root prompt that says something like, hey, every time you’re choosing a new project or a new dependency or trying to assess a dependency, use this MCP server to get up-to-date real-time information, it’s more or less going to do that every time. Right.

CRob (12:13)
Yeah, and I think back to like when I was a baby cyber person going studying for my CISSP, there was a lot of talk in the exam materials about expert systems, which is exactly what I think a best case scenario with these tools can be. It’s you’re expert. I don’t have to necessarily have this expertise. That’s right. But thinking about it takes a lot of knowledge to craft these expert systems. Let’s talk about how some of these models have been trained on potentially less than expert data.

Brian Fox (12:43)
Right, and that’s just, think, the inevitable nature that the frontier models have been trained on, you know, all the stuff that they can find.

CRob (12:49)
The internet.

Brian Fox (12:49)
On the internet, good information, bad information, people talking about terrible dependencies a lot might statistically make that more of recommendation, right? And I think that can be okay as long as you’re plugging in the models that have real data. The things that we’ve seen, you know, when we assess the models is that like I said, they make up versions, they pick old versions arbitrarily, they don’t know about anything newer than when they were last trained, which means both new versions and also vulnerabilities that might be an older version.

So they’re inadvertently recommending, and it’s not even a recommendation really, it’s just using it, right? It’s putting it in there and writing the code around it. Imagine picking Spring, right? It’s just going to go, I’m going to write a Spring app and I’m going to use all Spring 5.

And then when you probe it, then it’s like, oh, sorry, I have to do two major framework updates. You almost have to throw it away and start over. And so if you’re able to plug the right data in up front, you don’t have all of that waste. And again, if you have people who don’t know to prompt it to ask about the latest versions, you can insert that underneath the hood. I think that’s what’s really cool.

Brian Fox (13:59)
But what we’re seeing currently, I kind of wrote about this a little bit too, that I feel like we’re throwing out all the lessons of the past. We’re talking about situations where whole tools, SAST is under fire right now, right? Because when all the code can be just completely generated, what’s the problem with SAST? But I do think that we’ve learned a lot of things over the years if we can figure out how to plug those capabilities into what’s being generated.

I think we can bring all of that forward with us. But the entire SDLC is going to have to adapt to that. It’s not going to be sufficient to say, I’ve got a bunch of developers over here. They’re doing AI assisted development. And then later, we’re going to run a bunch of SAS and produce legacy reports. That’s not going to work. The information has to be fed directly into the AI capabilities up front.

CRob (14:51)
And it’s the classic problem we’ve always had, where security historically is the the last thing done, addressed, it’s bolted on at the end in a lot of cases. And just this AI tooling and just the velocity it has is a huge accelerant for the sins of our past we’ve never actually addressed.

Brian Fox (15:11)
Absolutely, but it also provides the Vulcan mind meld if you want to think of it. You now have that opportunity to plug that right into what the agent is thinking about in the moment. You can’t do that with the humans, but you can do that with the agent. And that’s what I think is potentially exciting about this.

Where I described it recently at a summit, we’re sort of in a bootstrap situation, though, right? Like, we don’t have all of those capabilities. Organizations haven’t rolled them all out. And so we’re sort of in this weird situation, one foot on the boat, one foot on the dock, and it’s not going to end well as we’re going through it. And worse, there are people that are afraid of the MCP protocol. So I hear lots of organizations say, we just block it completely.

Yeah. It’s a little hard to argue that that’s not a reasonable place to start because of the nature of what’s happening. We saw just the other day the latest version of Shai-Hulud came out. Did you see this? And they used MCP capabilities as data exfiltration. And I’m like, come on, guys. There’s so much power in this, but now you’re making it like a bad thing. So I think the industry and the tools and all of that are going to have to work through governance of the MCP capabilities, sanitization inspection of the MCP capabilities just like we’ve seen. So it’s sort of one of these things like when you’ve been around long enough you can recognize the patterns. It’s new and exciting but also the pattern rhymes with a bunch of stuff we’ve done before and what frustrates me is that like everybody charges ahead so fast they just feel like it’s all new it’s all different it’s like yes but let’s not forget everything we’ve learned over the last 60 years of software engineering because the physics is still the same.

CRob (16:50)
Well, and that’s where so our AI / ML working group wrote a paper around ML SecOps. And the paper was really interesting. I recommend the audience check it out. But it was they talked about classic techniques that are assisted and are helpful with AI development. And then it talked about some gaps where we have things like are not documented policies that are kind of a hindrance and something that’s an opportunity in the future to try to get addressed.

Brian Fox (17:18)
Yeah.

CRob (17:19)
But…I’m of two minds about my friends, our new robot overlords, in that it can be extremely helpful, but I don’t see a lot of people reconsidering those lessons of the past of software engineering. To say this is all brand new and totally different, like, well, you’ve got different GPU accelerators and dedicated cores to do things.

And now with this like agentic and ADA architecture where things are more highly distributed, yeah, that’s new twists, but it’s not brand new. We’ve done networking. We’ve done composite applications for decades.

Brian Fox (18:01)
Right. It’s the same thing, you know, we saw when, you know, we were like, oh, everything should be serverless or let’s go to the micro architecture, micro architecture, micro service architecture is going to solve everything until it doesn’t. Right.

Or, you know, that’s no problem, we’ll just put it in the cloud because I can just infinitely scale my machines, right? So I see the same pattern all again, that we sort of say, yes, but this time is different because insert new technology, and then we realize, yes, but everything we know is still true. And that’s what I think we’re sort of grappling with right now as we go through this. What is absolutely true is that, you know, the AI capabilities, the agents, all these things are making everything happen so much faster. That can be good.

can also be bad. If you’ve forgotten all the lessons of the past, you’re just going to create a ton of crap much faster than you could before. And by the time you realize it, it might be too late.

CRob (18:57)
I’m familiar with a lot of enterprises that were going through a digital transformation journey, trying to update their heritage software to newer things and to the cloud to get that scalability and cost efficiency. But a lot of organizations didn’t take that journey, didn’t learn from lessons from the past.

they just crammed what they had out in the cloud, and then a month later they get this giant bill and they’re shocked and confused, or they didn’t understand that this thing wasn’t architected for zero trust, and they’re leaking data everywhere.

Brian Fox (19:30)
Right, right. Or that, or just even the performance reasons why you were excited to infinitely scale, sure, but somebody’s not excited to infinitely pay a bigger bill. Inefficient code is still inefficient code, right? And that’s what I think we’re gonna see with… with AI capabilities is just going to happen faster. And without humans in the loop, it provides less opportunities for us to course correct, which is why I’ve been taking a step back and thinking about how do we do that? How does it make sense? I think for some of the stuff that we’ve been doing as a business, it’s really exciting because we have built up really interesting, unique data sets based on being able to see everything going on with Maven Central. We’ve long had Nexus, the repository manager that’s out there.

We have hundreds of thousands of instances. Those things are proxying for enterprises, not just Maven, but NPM, Nougat, Python, all the things. And so that gives us visibility into other ecosystems so we can understand what’s going on, what’s commonly used in enterprises, these kinds of things. And so all of that data can be fed now directly, like I said, the Vulcan mind meld directly into these tools. And it makes it a lot easier.

So in some ways, when we sort this out, and people become less afraid of MCP capabilities, we can directly inject a stream of high quality data to make all of those things better. But, before businesses can really leverage that, they have to get out of the experimentation phase. They have to roll that out. And these things are kind of interrelated. What we see is that organizations are afraid to let developers just go with AI assisted development because it’s not governed, because they can’t govern it.

And those are echoes of what I saw firsthand during the early days of open source. Like I said in the beginning, people said, we’re not using it. And then I’d tell them, yes, you are. And then their reaction was like, well, just shut it all out. It’s like, right, you can’t do anything. So the reaction that some enterprises have right now of like, we’re just not going to do anything with AI, is just setting themselves up to be left behind.

The right answer is to do it thoughtfully and use tools to help them make better decisions.

CRob (21:43)
So reflecting back, mentioned in your report that you have some guidance for people around AI. What would the top two or three things, if somebody’s thinking about moving more aggressively in this AI direction, what can they take away and do immediately or start thinking about?

Brian Fox (21:01)
Yeah, I mean, think the biggest thing is humans like to…try to take the old patterns and just adopt it to the new new technologies like we were talking about take an inefficient architecture and throw it in a cloud It’s gonna fix everything. No, it’s not and I think that’s true of Let’s call it the AI SDLC right an AI native SDLC Might resemble a normal SDLC, but it should be designed differently, right?

You know trying to do the checks and balances after the fact is even worse than it was with humans You need to think about providing that information upfront so that you get the value in the creation of the code and not try to chase it out. You need to be able to think about how all of these things can be done in parallel with agents, breaking these things down. what I would say is, don’t just try to do what you’re doing today and use AI to do it. Take a step back and really assess how can you adopt this.

It’s sort of like the conversations we were having in the board today about developers, maintainers are getting overwhelmed with AI slop. It’s true. A reaction is to stop allowing that to be contributed, just dismiss everything AI. That’s not a good answer. A better answer is let’s figure out how to help them use AI tools to be able to keep up with that, right? Because that’s what it’s good for. It can review and assess the patches faster than the maintainers and then provide sort of a first pass filter, if you will, right?

But that requires thinking outside of the box. Don’t just try to keep doing what you’re doing and try to keep up with it. Think about how you judo move that into something that makes more sense for your organization.

CRob (23:42)
And this skirts along another kind of project you’re passionate about, sustainability and funding. It is one thing to try to admonish the developers, why aren’t you using AI? But there are real costs involved around this. And, just to say, well, you should use the tool that doesn’t help them when there’s no funding. They don’t have access to infrastructure to be able to do these things. And that’s like, think, it touches on your passion project around trying to help get the package repositories more sustainably funded.

Brian Fox (24:18)
That’s right. Yeah, I mean, if you take a step back and you think about open source when probably you started, certainly when I started, what that really meant was you were donating your time. And you were sharing your thoughts, and you were sharing your words via code. And that was in a time when it was perfectly acceptable. In fact, it was the only choice that you built things and you shipped them off of your laptop.

There was when the Apple MacBook Air launched, the first one. That launched with a version of Maven on it that was signed by my key, my personal key, that was built on my personal laptop. So everybody that bought the launch version of the MacBook Air had my signed code on it. That’s kind of cool.

But also kind of scary, right, when you think about it. Like, what if my laptop was compromised? And that’s the world we live in today. Fortunately, in 2009 or whatever it was, that was a little bit more remote of a chance. And so everybody thinks like, well, that’s crazy. You wouldn’t do that anymore. So what does it mean today? It means you have to have certified builds. Usually that means it’s running in the cloud, and it’s attested to, and all these kinds of things. And that’s not free.

Like I can’t donate that, I’m not a hyperscaler. Most open source gets that infrastructure donated by these big companies, but there’s a lot of opportunity for abuse, right? And these types of things. it’s just, at the end of the day, it’s not free. So the cost of producing open source is not free anymore. It’s not just donating my time with equipment and internet access I already have, right? That’s the big difference. And I think people don’t really recognize that and now fast forward to what we’re just talking about AI the obvious answer to deal with AI, you know Piled on PRS is to have AI assistance help.

Who’s gonna pay for that? It’s literally not free. It costs electricity, last time I checked we still pay for our electricity Regardless

CRob (26:12)
Electricity, water…

Brian Fox (26:14)
Right all of these things, right? These are very…they have very real implications. They’re just not free and so There’s no good answer to that. How does that get aligned? How do we…how do we continue to create open source software that can power all of these industries in a world where it’s not just somebody donating their time and thoughts? There are no good answers. But we’re working towards trying to align that. Because the bulk of open source software, certainly in our world, in these areas, is being consumed by organizations that are selling for-profit software, more or less.

There’s definitely a lot of hobbyists and stuff like that the biggest consumers from our repositories are all the giant companies. I’ve named the top 100. You would know every single one of them. And I’m sure that’s true for all the registries. So there has to be an answer in there. I don’t know the stat off the top of my head, but the Linux Foundation does the census, right? And it’s billions of dollars of economic value that open source creates. Eight billion? Nine billion?

CRob (27:16)
Trillion.

Brian Fox (27:17)
Oh, it’s a trillion now?

CRob (27:17)
It’s eight (8) trillion, I believe.

Brian Fox ()
Eight (8) Trillion dollars worth of economic value being produced by open source…1 % of that would pay for a lot of that infrastructure, and then a whole bunch more. And so I think that’s what ultimately we have to figure out how to balance. AI just makes that worse, because it moves the bar even further.

CRob (27:39)
Interesting conversation. Any final thoughts you want our listeners and viewers to take away?

Brian Fox (27:46)
Well, certainly go take a look at The State of the Software Supply Chain Report.

CRob (27:51)
Great report.

Brian Fox (27:52))
sonatype.com/SSCR Certainly, I’ve also written a number of blogs. You can find those at our website as well. That deep dive, kind of all these topics we touched on here. Yeah.

CRob (28:02)
Excellent. We’ll put some links as we do our summary. So Brian, thank you for our inaugural episode of Big thoughts, Open Sources. I think this was an amazing conversation that we’re gonna continue to be adding onto and reconsidering in the coming weeks and months.

Brian Fox (28:20)
Yeah, thanks for having me kick it off in Napa.

CRob (28:25)
Thank you. Well, I hope everybody stays cyber safe and sound. We’ll talk to you soon.

What’s in the SOSS? Podcast #58 – S3E10 Big Thoughts, Open Sources: Beyond the Hype: Brian Fox on Securing the Agentic Future of Open Source

Summary

Conversation Highlights

Episode Links

Transcript

We envision a future where OSS is universally trusted, secure, and reliable. Join us in making open source more secure.

Get the latest announcements, event info, and the community news in your inbox