What’s in the SOSS? Podcast #9 – Sonatype’s Brian Fox and the Perplexing Phenomenon of Downloading Known Vulnerabilities

July 16, 2024 | August 5th, 2024 | Podcast

Summary

Brian Fox is Co-founder and Chief Technology Officer at Sonatype, bringing over 28 years of hands-on experience driving software development for organizations of all sizes, from startups to large enterprises.

A recognized figure in the Apache Maven ecosystem and a longstanding member of the Apache Software Foundation, Brian has played a crucial role in creating popular plugins like the maven-dependency-plugin and maven-enforcer-plugin. His leadership includes overseeing Maven Central, the world’s largest repository of open-source Java components, which recently surpassed a trillion downloads annually.

As a Governing Board member for the Open Source Security Foundation, Brian actively contributes to advancing cybersecurity. Working with other industry leaders, he helped create The Open Source Consumption Manifesto, urging organizations to elevate their awareness of the Open Source Software (OSS) components they use.

Conversation Highlights

  • 00:57 Brian shares his background
  • 03:56 The confusing trend of downloading assets on Maven Central with known vulnerabilities
  • 08:16 How this trend continues in other repos
  • 11:08 Brian and CRob discuss Log4Shell
  • 16:54 Brian answers CRob’s rapid-fire questions
  • 18:46 Brian’s advice for up-and-coming security professionals
  • 19:50 Brian’s call to action

Transcript

Brian Fox soundbite (00:01)
The customer is not gonna accept as an excuse, like, why did you get hacked? Well, my direct dependencies were fine, it’s those indirect ones that were bad. Like, nobody cares, right? Like if you buy a pie from a bakery and you get sick, it’s not acceptable for them to go, well it was my sugar provider that did a bad job, here. 

CRob (00:18)
Hello everybody, welcome to What’s in the SOSS? I’m CRob. I do security stuff on the internet. I’m also a community leader within the Open Source Security Foundation, and I’m one of the hosts of this amazing podcast. Today I’ve got my friend Brian Fox. He’s the CTO and co-founder of Sonatype and he is an amazing open source contributor and he’s a great collaborator with us and a lot of industry efforts. So I want to say, Brian, welcome.

Brian Fox (00:49)
Yeah, thanks for having me.

CRob (00:50)
Could you maybe share a little bit about your origin story? How you got into open source and kind of the stuff you’re into today?

Brian Fox (00:57)
Oh, sure, my origin story. That’s a long one, but short version is I dabbled in open source before it was I think called open source in college actually, some Perl CGI open source stuff way, way back in the day. But I would say my first real introduction into like durable open source, let’s say, I had dabbled with a JSP library at SourceForge. And I don’t really count that, but you can see they’re all connected in a straight line. I got involved with Apache Maven super early in the 2.0 alpha days. I wrote some plugins. I was basically doing development management during the day, had some folks working for me, and we were trying to transform our builds from Ant to Maven and move from CVS, I guess it was, to Subversion.

Yeah, I’m dating myself a little bit. And we were running into some issues, and so because I didn’t have much opportunity to code during the day, I was coding at night. So I would go home and start working on fixing bugs and writing plugins that the folks working for me would then use the next day to move forward. And so I wrote some pretty popular plugins, the Dependency plugin and the Enforcer plugin, that came from the Codehaus project. Codehaus was sort of an adjacent open source forge, if you will, and those plugins eventually got pulled into Apache Maven proper. And I came along with them as a committer, was later a project management committee member, and for a while after we started Sonatype was even the chair of that project.

And for historical reasons that, you know, we could have a whole podcast on, the Maven Central repository has always been a separate project from the Apache Maven software base. And basically, a handful of us, and then later Sonatype, have been the stewards of that all along. So, I know where all the bodies are buried and have lots of war stories from open source Java and Maven Central in general, for the last 20-plus years, I suppose.

CRob (03:03)
So I’m a little bit of a data nerd and I know you share that kind of passion for data and numbers. Your organization, Sonatype, puts out an annual report, well, many annual reports, but the one I’m most interested in is the State of Open Source Supply Chain Report. That’s one I’ve referenced for many, many years. It’s pretty amazing. And I want to kind of talk about some of your findings from the most recent one. Maybe that’ll get us into this topic. But you know, when I was looking at the 2023 report, I noticed that you had many statements, one of which was you noted that 96% of all vulnerable downloads from Maven Central had known fixes available. What’s up with that? Why are folks still downloading known vulnerable packages when there’s a fix available?

Brian Fox (03:56)
Yeah, yeah, this has been my soapbox, as you know, for a couple of years. 2022 was the first year we published that stat. Sadly, it was unchanged in 2023. It did not get better. What does this stat mean exactly? Take a thing that is in Maven Central. And I think, off the top of my head, I’m not certain, around 12% of the things in Central have a known vulnerability. But they’re skewed, obviously, towards more popular versions. So when those things are downloaded, looking at the point in time that it was downloaded, was there already a fix for that vulnerability available? So in other words, it’s not a case of this vulnerability is out there and people are using it because it’s not fixed.

After Log4Shell, which I’m sure we’ll get into a bit more, you know, we saw a lot of people talking about like open source should do a better job of fixing their stuff. And it’s like, well, when you sit where I am and you look at that and you go, well, wait a minute, when these things are being consumed from the repository, 96% of the time, the fix already is available. So what does that tell you? It tells you that the consuming organizations are not choosing to update for the fix. It shouldn’t be a 0% stat, right? There’s always going to be vulnerabilities that in certain contexts just don’t apply, and so you can continue to use the thing, or you have other mitigating controls.
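To make the stat Brian describes concrete (this is an illustration, not part of the conversation): for each download of a vulnerable version, ask whether a fixed release already existed on the day of the download. The sketch below assumes a made-up component name, made-up dates, and a hypothetical `FIX_RELEASED` table; it is not how Sonatype computes the published figure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Download:
    component: str
    version: str
    day: date


# Hypothetical data: for each vulnerable version, the day a fixed release
# became available (None means no fix has been published yet).
FIX_RELEASED = {
    ("example-lib", "1.0"): date(2022, 1, 10),
    ("example-lib", "1.1"): None,
}


def share_with_fix_available(downloads):
    """Fraction of vulnerable downloads made when a fix already existed."""
    vulnerable = [d for d in downloads if (d.component, d.version) in FIX_RELEASED]
    if not vulnerable:
        return 0.0
    avoidable = 0
    for d in vulnerable:
        fix_day = FIX_RELEASED[(d.component, d.version)]
        if fix_day is not None and fix_day <= d.day:
            avoidable += 1
    return avoidable / len(vulnerable)


downloads = [
    Download("example-lib", "1.0", date(2022, 3, 1)),  # fix already out
    Download("example-lib", "1.0", date(2022, 1, 5)),  # downloaded before the fix
    Download("example-lib", "1.1", date(2022, 3, 1)),  # vulnerable, no fix yet
    Download("example-lib", "2.0", date(2022, 3, 1)),  # not vulnerable, ignored
    Download("example-lib", "1.0", date(2022, 6, 1)),  # fix already out
]

print(share_with_fix_available(downloads))  # 2 of 4 vulnerable downloads -> 0.5
```

Non-vulnerable downloads are left out of the denominator, which matches the way the stat is phrased: it is a share of vulnerable downloads, not of all downloads.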

But 96% is crazy. And the fact that it doesn’t change. And so why does that happen? There’s lots of reasons. And I’ve been recently in blog posts and articles exploring more the psychological side of things. I think there’s, humans are wired to procrastinate and hope that the future will be better. And I think there’s an element of that, that people look at it and go, well, you know, we’ve lived with these things so long, you know, I’m afraid to do an update. Dr. Wheeler and I have kind of toyed with the idea that API changes introduce a bunch of these problems, or at least the fear and the history of API breakages have caused people to be afraid to make an update. 

And so there’s lots of little reasons that cause organizations to just sit there and not update and therefore continue to consume these known vulnerable things. And so I’ll ask the obvious question because when I point this stat out, everybody turns around and asks the same question, which is, Brian, why do you make vulnerable versions available for download in Maven Central at all? Right? And so, you know, part of that is precedent and history: Maven Central is known for its stability, and we don’t allow authors to just up and decide to unpublish something. For those of you that remember left-pad and NPM years ago, it was a little bit of a tit-for-tat, and a maintainer removed a package and it broke like the internet, right? And so for reasons like that, we don’t allow things to just up and disappear.

Most of the time, like I said, not everything is universally applicable. So I use an analogy. For some people, peanut butter can kill them. They are deathly allergic to peanut butter. I can go into the store right now, I buy peanuts, buy peanut butter, buy things with peanut butter in them. Why? Because I happen to, fortunately, not be allergic to peanut butter. I do not have that vulnerability, or at least that vulnerability does not apply to me. So like that analogy, why should we take every single component down just because it has a vulnerability and make it impossible to reproduce old things, break things that don’t need to be broken? 

And I draw a distinct line between a vulnerability and a known malicious package. A known malicious package is like saying we’re going to continue to sell salmonella-tainted food because some people luckily don’t get affected by it. Like, no, that’s very, very different, right? So salmonella components, yeah, they’re coming out immediately. Peanut butter, they should be labeled, they should be understood so that people that have vulnerabilities can know about it, right? And so that’s why we don’t just disappear components merely because there’s a vulnerability.

CRob (08:04)
You have a lot of observability into Maven Central. Do you feel that you know the pattern you’re seeing does that probably exist in other repositories as well? Do you think it’s similar? 

Brian Fox (08:16)
I know it does. Yeah, I know it does. How do I wanna phrase this? There are similar problems. They don’t have the same root in the origin of it. So, for example, Maven has a strong namespace in the group ID, right? So typically that would be the reverse DNS of your company, just like Java package standards. We didn’t invent it. We just followed the Java standard. And when somebody comes to Maven Central, we validate that they control that domain and that namespace, or alternatively they can use their project URL at GitHub as an example, right? And we do the validation, and that’s so somebody can’t show up to the repository and pretend to be Eclipse and start publishing things. You can’t pretend to be Apache and start publishing things.
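As a rough illustration of the reverse-DNS convention mentioned here (not part of the conversation): a group ID is expected to start with the publisher’s domain written backwards. The sketch below only checks that a group ID sits under a domain that has already been verified by some other means; the function names are hypothetical, and this is not Maven Central’s actual validation process.

```python
def expected_group_prefix(domain: str) -> str:
    """Reverse a DNS name into a Java-style namespace, e.g. 'example.org' -> 'org.example'."""
    labels = [label for label in domain.lower().strip(".").split(".") if label]
    return ".".join(reversed(labels))


def group_id_matches_domain(group_id: str, verified_domain: str) -> bool:
    """True if group_id equals the reversed domain or is nested under it."""
    prefix = expected_group_prefix(verified_domain)
    group_id = group_id.lower()
    # Require a dot boundary so 'org.examplefake' does not pass for 'example.org'.
    return group_id == prefix or group_id.startswith(prefix + ".")


print(group_id_matches_domain("org.example.tools", "example.org"))  # True
print(group_id_matches_domain("org.examplefake", "example.org"))    # False
print(group_id_matches_domain("org.apache.maven", "example.org"))   # False
```

The dot-boundary check matters: without it, a lookalike namespace that merely begins with the same characters would slip through.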

That’s an important piece of this. The other piece of it is that Maven prefers stability over auto-updates. So it prefers staying on a version number. So it’s sort of an anti-pattern in Maven land to say, just get me the latest version of this thing all the time. So that choice, that design choice, has unintentionally created the issue that I talked about, that people aren’t upgrading, right? That’s an unintended side effect of that. But now let’s look at what’s happening on ecosystems like NPM, Python, Ruby, where the tool has made the choice that we are always gonna check and fetch the latest version unless I’m told not to, right? 
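The trade-off between pinning and always fetching the latest can be sketched in a few lines (an illustration, not part of the conversation). The `resolve` function and the version list are hypothetical, and real build tools use richer version-range syntax than the single `"latest"` keyword assumed here.

```python
def parse(version: str) -> tuple:
    """Turn '2.17.1' into (2, 17, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(requested: str, published: list) -> str:
    """'latest' floats to the newest published version; anything else is an exact pin."""
    if requested == "latest":
        return max(published, key=parse)
    if requested not in published:
        raise LookupError(f"version {requested} is not published")
    return requested


published = ["2.14.1", "2.15.0", "2.17.1"]

print(resolve("2.14.1", published))  # pinned build stays on 2.14.1
print(resolve("latest", published))  # floating build picks 2.17.1

# A new release appears: the pinned build does not move, the floating one does.
published.append("2.20.0")
print(resolve("2.14.1", published))  # still 2.14.1
print(resolve("latest", published))  # now 2.20.0
```

The pinned build is reproducible but never picks up a fix on its own; the floating build picks up fixes automatically, and anything else that gets published, just as quickly.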

That’s why you have the concept of a lock file to lock it down because if you don’t lock it, it’s going to fetch it. And also, unfortunately, those ecosystems also don’t have a strong namespace enforcement. So what happened starting around 2017, the attackers figured this out and they start publishing components into the repository that have confusingly similar names, basically a typosquat of a name. And since there’s no namespace enforcement, it’s very hard for consumers to tell the difference between am I getting the legit component or not?
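One simple way to flag a confusingly similar name, sketched here as an illustration (not part of the conversation), is to compare a new package name against well-known names using edit distance. The threshold, the name list, and the function names are assumptions; production detection relies on many more signals than spelling.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def possible_typosquats(candidate: str, known_names, max_distance: int = 1):
    """Known names that the candidate is suspiciously close to (but not equal to)."""
    return [
        name for name in known_names
        if name != candidate and edit_distance(candidate, name) <= max_distance
    ]


known = ["left-pad", "lodash"]
print(possible_typosquats("leftpad", known))  # ['left-pad']
print(possible_typosquats("lodash", known))   # [] because the name is an exact match
```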

And you couple that with the fact that the tool is going to auto-update these versions all the time, which means if somebody were to compromise an actual legitimate JavaScript project and put a fake component up there — and this has happened many times — the consumers will fetch it almost immediately. So you’ve created this situation where it’s easy for the attackers to put stuff in the repo and everybody will consume it almost immediately so you have a ready willing audience, which means you’re seeing on those ecosystems this massive stat we talk about, 350,000 known malicious component attacks in the last couple of years and it’s been doubling year over year. 

That’s only happening in those ecosystems. So we don’t have this problem in Maven because there’s not the auto-update and there is the namespace. But we’re having the massive attacks on these other ones because they do, right? So it’s sort of like these design decisions lead to different unintended consequences, I guess I would say.

CRob (11:08)
So let’s let’s talk about a specific example. Our friend Log4Shell that ruined many, many IT folks holidays in December a year or two back. I have an old stat: there were almost 250,000 known downloads of the malicious packages of Log4j which was about 29% of all the downloads at that time.  How is it that people are still grabbing that known malicious package? Why is that with such a widely known vulnerability?

Brian Fox (11:45)
(Laughs) Yeah, I wish I knew, honestly. So I’m looking at the latest stats and anybody listening to the podcast, you can see them yourself. We have a resource center that’s been up since 2021. It’s at sonatype.com/log4j. So you can see those stats yourself. As we sit here right now, we’re looking at 405 million downloads since it was announced and, worse, the last seven-day average is that 35% of those are of the vulnerable versions. And so it’s actually gone backwards in the last year. We were down at one point to, I think it was in October when we published the report, it was in the twenties and it took multiple years to get there, which was really disappointing, but it slid backwards for reasons that I have not been able to explain.

And so, I also use this one sort of as the perfect bellwether for the reasons that you outlined, right? That it’s fairly easy to exploit. It’s fairly publicized. Everybody knows about it. And it was sort of one of these situations very early on that every vendor was getting asked by their customers, did you remediate this? And they would have to explain why. 

And so it created this situation that it was easier to just upgrade than it was to explain why you weren’t upgrading. So like I said before, I wouldn’t expect the uptake of every vulnerability to be 100% updates. In Log4Shell, I think that is closer to true, and so we should expect to see the uptake here to be, you know, closer to 80, 90, 95%, I would expect. I’ve done some research recently to look into the origin of this because, you know, people will often try to justify it like, it’s just a few people with broken builds that are running all the time, right? Because it does seem incredible.

I’ve looked into the vulnerable downloads within the last month and they’re coming from millions of IPs worldwide. You can’t pin it down to, this is like a provisioning script that’s running on EC2 or something like this, right? Or some serverless configuration that might be happening in a Lambda, you know, weird things like that. No, it’s all over the map. 

And when I’ve looked at like the cohort of dependencies, when I’ve taken the IP numbers that are fetching these vulnerable versions and I look at the cohort of dependencies to see like, is it one application everybody’s building or something weird? Is it one place that’s causing this as a transitive dependency? I’ve not been able to put my finger on it. It is all over the map, which just tells me it’s kind of what I suspected in the beginning. It’s just average usage and folks for whatever reason just freaking haven’t upgraded. (Laughs)

So it’s really disappointing because if it were any of the other things I mentioned, we could go attack that at its source and have a big impact. Instead, it’s messaging like this to get people to pay attention. And this is, it kind of intersects the SBOM area. If you can’t produce an SBOM, kind of implies you don’t know what’s inside your software. If you don’t know what’s inside your software, why should we expect that you’re doing a good job of updating it?

CRob (15:493)
So thinking about solutioning this a bit, do you have any advice for what developers or downstream consumers can do as they’re ingesting open source components to avoid this continual downloading of known vulnerable packages?

Brian Fox (15:15)
Yeah, I mean, tooling has existed for a long time to help you understand your entire transitive hull of your application. You know, Sonatype’s been providing this for, what year is it now? 14 years. (Laughs) So this is not a new thing. And you know, in the world, the regulated world, we’re seeing legislation all over the place from the US government, from the European Union, PCI standards, everybody’s pushing towards, you need to be able to produce an SBOM. And this is part of the reason why, that it’s no longer acceptable to just blindly ignore what’s inside your software. And we saw for years organizations were only focused on the direct dependencies, the things their developers use, and ignored the transitive dependencies.

But there’s more transitive dependencies than direct ones. So your odds of pulling in something bad are even higher further down. It’s quite literally the iceberg. 70% of it is coming from these open source transitive dependencies. You need to have visibility into that because your customer is not going to accept as an excuse, like, why did you get hacked? Well, my direct dependencies were fine. It’s those indirect ones that were bad. Nobody cares. If you buy a pie from a bakery and you get sick, it’s not acceptable for them to go, well, it was my sugar provider that did a bad job here. Like, I’m sorry, you know, you still created this thing and sold it to me. You’re still responsible.
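The direct-versus-transitive distinction can be illustrated with a small graph walk (an illustration, not part of the conversation). The dependency graph and all component names below are invented; the point is only that the full set an application ships is found by following dependencies of dependencies, not by reading the top-level list.

```python
from collections import deque

# Hypothetical dependency graph: each key depends directly on the listed names.
DEPENDS_ON = {
    "my-app": ["web-framework", "logging-facade"],
    "web-framework": ["json-parser", "logging-facade"],
    "logging-facade": ["logging-core"],
    "json-parser": [],
    "logging-core": [],
}


def transitive_dependencies(root: str, graph: dict) -> set:
    """Everything reachable from root by following dependency edits breadth-first."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already visited; also protects against cycles
        seen.add(name)
        queue.extend(graph.get(name, []))
    return seen


direct = set(DEPENDS_ON["my-app"])
everything = transitive_dependencies("my-app", DEPENDS_ON)
indirect = everything - direct

print(sorted(direct))    # ['logging-facade', 'web-framework']
print(sorted(indirect))  # ['json-parser', 'logging-core']
```

In this toy graph, half of what the application ships never appears in its own dependency list, which is the part an audit of direct dependencies alone would miss.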

And I think that ultimately that’s what it’s gonna take: some form of liability reform, to the point where organizations are actually responsible for the outcomes, is needed to change this behavior.

CRob (16:54)
Well, I think this is gonna be the first of an intermediary step in an ongoing conversation. And hopefully we can help raise that awareness and get folks focused in on this unique problem. Let’s move on to the rapid fire section of the interview here. Got a couple questions and we’re gonna see what your thoughts are. First question, Brian, spicy or mild food?

Brian Fox (17:20)
Depends on the day, but let’s go with spicy today.

CRob (17:23)
(Sound effect: “Oh, that’s spicy!”) Nice. Being a developerologist as your background, what’s your preference? Vi or Emacs?

Brian Fox (17:34)
Oh, Vi.

CRob (17:35)
Huzzah! What’s your favorite type of whiskey?

Brian Fox (17:41)
Ooh, ooh. I don’t know. I like variety. How’s that? I like to explore new ones. I have one that they make here — actually, it’s in Vermont, just over the border — they make it from maple syrup. And that one is really unique. So I’m gonna go with that. How’s that?

CRob (17:58)
All right, that sounds like a delight. And our final, most controversial question, tabs or spaces?

Brian Fox (18:06)
(Sighs) Spaces.

CRob (18:09)
Spaces, very good. There are no wrong answers, but that’s interesting how that causes a lot of controversy.

Brian Fox (18:17)
Yeah, you’re not going to ask about, you know, Allman versus K&R? I mean, that always is related to tabs and spaces.

CRob (18:24)
(Laughter) Maybe in season two.

Brian Fox (18:28)
Yeah!

CRob (18:30)
Well, thanks for playing along. (Sound effect: That’s so sweet!) As we close out, what advice do you have for someone that’s thinking about entering this field, whether it’s being an open source developer or getting into the field of cybersecurity? What advice do you have for newcomers?

Brian Fox (18:46)
I think my advice is always the same. Pick something that you’re interested in, whether it’s robotics or build software, whatever, and go find an open source project and try to get involved. It’s a great way to learn a whole bunch of skills. I mean, to be able to learn how to write better code, but also how to navigate, you know, the inevitable politics when you have more than, you know, two humans involved. The communication skills, the politics, the collaboration. 

I think you can learn a lot from open source, especially if you’re, you know, let’s say in high school or college and you don’t necessarily have the ability to get a job here. You can go get some real-world experience through open source projects, and there’s open source for basically everything. So, you know, if you’re into rockets or planes, whatever, go find something. It’s out there. It’s even easier today than it was, you know, 20 years ago, right? And that would be my advice.

CRob (19:44)
And finally, do you have a call to action for our listeners to help kind of inspire them?

Brian Fox (19:50)
If your organization can’t immediately assess — if I were to tell you about a new vulnerability right now that you never heard of, if you can’t immediately assess — if you’re even using that component anywhere in your organization or further, you can’t quickly produce the list of applications that are using that, then you’re basically powerless to respond to this next problem, right? That situation I just described is what most of the world went through in, what, December 9th, 2021, when Log4Shell dropped on them. 

We have studies, it was in the 2022 report, and I think we probably repeated it in ’23, that showed organizations that had tooling in place remediated their portfolio of thousands of applications within days, versus other organizations that spent six months doing triage. So if you don’t have a system that can immediately tell you, are you using this component anywhere, any version of it, and you can’t get to the point of saying, we are using this precise version in these applications, then you need to solve that problem immediately. If not only to prepare for the inevitable next thing, then to prepare for the legislation that is going to require you to have the SBOMs.
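The two questions posed here, "are we using this component anywhere?" and "which applications use this precise version?", amount to a lookup over an inventory of what each application contains (an illustration, not part of the conversation). The inventory below is invented, with hypothetical application names; in practice it would be built from SBOMs or a composition-analysis tool rather than typed by hand.

```python
# Hypothetical inventory: application name -> {component name: version in use}.
INVENTORY = {
    "billing-service": {"log4j-core": "2.14.1", "jackson-databind": "2.13.0"},
    "search-api": {"log4j-core": "2.17.1"},
    "report-tool": {"commons-text": "1.9"},
}


def apps_using(component: str, inventory: dict, version: str = None) -> dict:
    """Map each application that contains the component to the version it uses.

    With version=None any version counts; otherwise only exact matches are returned.
    """
    hits = {}
    for app, components in inventory.items():
        found = components.get(component)
        if found is not None and (version is None or found == version):
            hits[app] = found
    return hits


# "Are we using this component anywhere, any version of it?"
print(apps_using("log4j-core", INVENTORY))
# {'billing-service': '2.14.1', 'search-api': '2.17.1'}

# "Which applications use this precise version?"
print(apps_using("log4j-core", INVENTORY, version="2.14.1"))
# {'billing-service': '2.14.1'}
```

The lookup itself is trivial; the hard part, and the point being made, is having an accurate inventory to run it against before the next vulnerability is announced.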

And I would add, if you have solved that, then you need to be looking at these intentionally malicious problems because those require different solutions. They’re not going to show up in SBOMs. They’re trickier because they’re attacking the developer as soon as they download them. So if you think you have your inventory under control, you need to ask yourselves, how would I know if a developer downloaded one of these typosquatted components that may have dropped a backdoor or might have had custom code that simply exfiltrated data directly in the open source? Your traditional scanners are not going to pick this up. So I guess that’s two things depending on where you sit on that spectrum.

CRob (21:39)
Well, excellent. I appreciate you have some amazing advice and those are some interesting things to think about and act on. As always, I really appreciate your partnership and your ongoing contributions to help make the whole ecosystem better, Brian. You’re an amazing partner.

Brian Fox (21:53)
Yeah. Likewise. Thanks, CRob, for inviting me.

CRob (21:55)
Cheers!

Announcer (21:56)
Thank you for listening to What’s in the SOSS? An OpenSSF podcast. Be sure to subscribe to our series of conversations on Spotify, Apple, Amazon or wherever you get your podcasts. And to keep up to date on the Open Source Security Foundation community, join us online at OpenSSF.org/getinvolved. We’ll talk to you next time on What’s in the SOSS?