What’s in the SOSS? Podcast #11 – Google’s Andrew Pollock and Addressing Open Source Vulnerabilities

August 13, 2024 | Podcast

Summary

Andrew Pollock is a Senior Software Engineer at Google, currently working on https://osv.dev. With a background as an Enterprise Security Engineer, he has extensive experience in large-scale Linux Systems Administration and GCP Security. Andrew is passionate about the human factors in security, focusing on scalable solutions, great user experiences and self-service opportunities. He has primarily worked in Linux/Unix environments as a Site Reliability Engineer or Security Engineer, with a strong interest in process improvement and automation.

Conversation Highlights

  • 00:52 – Andrew shares his background as a “mid-90s data nerd”
  • 02:31 – Managing vulnerabilities in the open source ecosystem
  • 03:57 – How to navigate inconsistent metadata
  • 06:26 – The challenge of source attribution
  • 07:54 – The rapid-fire round
  • 09:15 – Andrew’s advice to open source developers
  • 10:22 – Andrew’s call to action to developers

Transcript

Andrew Pollock soundbite (00:01)
The beautiful thing about open source is it’s open to all, it’s accessible. You can play around with things, you can break it, you can fix it. It’s fairly approachable. Most of the larger projects have vibrant communities around them that are fairly welcoming and inclusive. 

CRob (00:18)
Hello everybody, I’m CRob. I do security stuff on the internet and I also am a community leader within the OpenSSF, the Open Source Security Foundation. One of the really cool things I get to do as part of the OpenSSF is to host What’s in the SOSS?, where I talk to amazing people from across the open source ecosystem. And today I have my friend Andrew Pollock from Google. G’day, sir.

Andrew Pollock (00:41)
G’day, CRob, how you doing?

CRob (00:43)
I’m doing wonderful. Thank you for asking. Andrew, could you maybe give the audience a little bit of your backstory, your origin? How did you get into open source and what are you doing today?

Andrew Pollock (00:52)
Yeah, so I’m a computer nerd of the mid-90s. I grew up in the MS-DOS era and out of high school, I started studying an information technology degree at uni and I was also working as a mainframe operator at the Brisbane City Council where I also encountered Unix and found Linux in its early existence to be just way cooler than DOS. My Linux distro of choice was Debian and I quite enjoyed using Debian and aspired to become part of that project, and so I joined Debian in the early noughties. I became a Debian developer and started participating in the Debian project proper. And that was sort of my ground level sort of involvement with open source. And I’m still a Debian project member today, but nowhere near as active as I would like to be.

CRob (01:47)
And what are you doing today within the open source ecosystem?

Andrew Pollock (01:50)
Today I work on OSV, which is a couple of different things, right? So the part that intersects with the OpenSSF is the OSV schema. And the part that I more actively work on is osv.dev, which is an open source implementation of a database using the OSV schema to aggregate and enrich those records.

CRob (02:16)
Very nice. Well, I know amongst other things, you’re a big data guy. So let’s talk about vulnerability data and why it’s important. Could you maybe explain some of the key kind of data points that we need to effectively manage vulnerabilities in this wacky ecosystem?

Andrew Pollock (02:31)
Yeah, you know, this is all about the data. So I really enjoy looking at the data in aggregate form and zooming out and looking at challenges with using that in bulk. So OSV provides vulnerability information about software packages at the source code level predominantly and it also provides Linux package vulnerability information as well.

And the thing that we need to be able to convey that information accurately is a good package name and good version information because we want to address the use case of vulnerability scanning, first and foremost, and then vulnerability remediation for things that are detected. So step one, to be able to identify what’s vulnerable, you need to know, you need to have a package name that’s meaningful within the ecosystem that you’re sort of wanting to operate on. And then you need to, at a minimum, know what version you need to be beyond to not be impacted by the vulnerability.
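As an illustration of the point above (not something said in the episode), here is a minimal Python sketch of how a package name plus a fixed version lets a scanner decide whether an installed version is affected. The record is made up and only loosely shaped like an OSV entry; the identifier, the package name, and the helper functions are hypothetical, and versions are assumed to be plain dotted integers.

```python
# Simplified illustration: the record below is made up, and versions are
# assumed to be plain dotted integers (real ecosystems have richer rules).

def parse_version(text):
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical record shaped like an OSV entry: a package name that is
# meaningful within an ecosystem, plus the version that fixes the issue.
RECORD = {
    "id": "EXAMPLE-0000-0001",  # made-up identifier
    "affected": [
        {
            "package": {"ecosystem": "PyPI", "name": "example-package"},
            "ranges": [
                {
                    "type": "ECOSYSTEM",
                    "events": [{"introduced": "0"}, {"fixed": "1.4.2"}],
                }
            ],
        }
    ],
}


def is_affected(record, ecosystem, name, version):
    """Return True if the installed package version falls in an affected range."""
    installed = parse_version(version)
    for affected in record["affected"]:
        package = affected["package"]
        if (package["ecosystem"], package["name"]) != (ecosystem, name):
            continue
        for version_range in affected["ranges"]:
            introduced = None
            for event in version_range["events"]:
                if "introduced" in event:
                    introduced = parse_version(event["introduced"])
                elif "fixed" in event and introduced is not None:
                    if introduced <= installed < parse_version(event["fixed"]):
                        return True
                    introduced = None
            # An "introduced" event with no later "fixed" means still affected.
            if introduced is not None and installed >= introduced:
                return True
    return False
```

Both inputs matter here: without a name that matches within the ecosystem the record is never consulted, and without a fixed version there is nothing to compare the installed version against.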

CRob (03:43)
I bet you see, let’s call it mildly, some inconsistencies across our amazing ecosystem. Could you maybe speak to some of the challenges that you see within vulnerability metadata today?

Andrew Pollock (03:57)
OSV’s got multiple different data sources today, and a lot of them come from language ecosystems, which have some sort of curation within themselves so they’re fairly internally consistent with their packages because they know their own sort of backyard. Where I started looking more broadly was the problem space of C and C++ software, which doesn’t sort of have a centralized repository of vulnerability information. So we had to look at the CVE space for that, and we started pulling CVEs from the NVD to try and figure out which ones of them related to software that was not being covered by OSV-native vulnerability information.

And that’s where — me as a data nerd — just lost my mind because things were very, very inconsistent. So the challenges in that space are around naming. As I said, knowing what the package is to be able to identify it and, to a lesser degree, versioning as well because there’s no consistent versioning scheme when you go from one random open source project to another. And not all open source projects even follow any particular release practice or versioning scheme. So you don’t know anything about a string that is being called a version.

CRob (05:31)
Can you maybe just give us some insight to how you might try to handle some of those problems?

Andrew Pollock (05:35)
I would obviously advocate for projects as part of their maturation process to adopt release management practices that would include things like a formal versioning scheme. SemVer is the one that immediately springs to mind. It’s a standardized format. You can make some pretty clear assumptions about how that one walks and talks. CalVer is another one that I know of. So that’s really, really helpful for reasoning about a project.
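To make the "clear assumptions" point concrete (this is an added illustration, not part of the conversation), here is a minimal Python sketch that parses only the core MAJOR.MINOR.PATCH form of SemVer and compares versions numerically. The function names are hypothetical, and pre-release and build metadata are deliberately not handled.

```python
import re

# Core SemVer shape only (MAJOR.MINOR.PATCH); pre-release and build
# metadata are deliberately left out to keep the sketch short.
SEMVER_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_semver(text):
    """Return (major, minor, patch), or None if the string is not core SemVer."""
    match = SEMVER_CORE.fullmatch(text)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def is_fixed(installed, fixed_in):
    """True if `installed` is at or beyond the version carrying the fix."""
    installed_version = parse_semver(installed)
    fixed_version = parse_semver(fixed_in)
    if installed_version is None or fixed_version is None:
        raise ValueError("cannot reason about a non-SemVer version string")
    return installed_version >= fixed_version
```

With a known scheme, `is_fixed("1.10.0", "1.9.0")` can be answered numerically, whereas a plain string comparison would order `"1.10.0"` before `"1.9.0"`. A string such as `"release-7b"` simply fails to parse, which is the situation described earlier where nothing can be assumed about a string that is being called a version.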

CRob (06:08)
So can you maybe talk to what you and the others in the OSV project are attempting to solve within the open source ecosystem? You know, how do you find these authoritative sources? Like how do you know what the right source is to try to make some of these attributions?

Andrew Pollock (06:26)
For the more formalized languages, we don’t have to find the source. The source sort of defines itself, right? So for your Pythons and your Gos and your Rusts and those sorts of things, they have a curated data source and they’ve chosen to supply that data natively in the OSV format and that’s great, right? So where we’re having to do that synthesis ourselves is where things get trickier. And so the challenge that we have is firstly determining what is the authoritative source for a particular piece of software. So sort of 90% of the time that’s attributing it back to a GitHub repo. And then figuring out what the versions are in that repo.

So we’ve got a real mix where there’s somebody in the business of providing accurate authoritative data, and then one where we’re having to do some inferences about it ourselves. And for the ones where we’re doing the inferences ourselves, we just have to look at the existing vulnerability data that’s available to us, so the CVE records and other data sources like the NVD CPE dictionary, and look for hints as to what the repository might be. And that’s either metadata in the CPE dictionary or metadata on the CVE itself. So reference URLs typically.
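As a rough illustration of the kind of inference described above (an added sketch, not the approach osv.dev actually uses), the Python below guesses a repository from the reference URLs attached to a vulnerability record. The function name, the example URLs, and the short exclusion list are all hypothetical, and only GitHub-hosted references are considered.

```python
from collections import Counter
from urllib.parse import urlparse

# Top-level GitHub paths that are not repository owners. Illustrative only
# and far from complete.
NON_REPO_PREFIXES = {"advisories"}


def guess_repository(reference_urls):
    """Guess the most likely GitHub repository from a list of reference URLs."""
    candidates = Counter()
    for url in reference_urls:
        parsed = urlparse(url)
        if parsed.netloc.lower() not in ("github.com", "www.github.com"):
            continue
        parts = [part for part in parsed.path.split("/") if part]
        if len(parts) < 2 or parts[0].lower() in NON_REPO_PREFIXES:
            continue  # not enough path to identify an owner and a repository
        owner, repo = parts[0], parts[1]
        if repo.endswith(".git"):
            repo = repo[: -len(".git")]
        candidates[f"https://github.com/{owner.lower()}/{repo.lower()}"] += 1
    if not candidates:
        return None
    # The repository mentioned by the most references is the best guess.
    return candidates.most_common(1)[0][0]
```

A commit link and an issue link under the same owner and repository both vote for that repository, while references on other hosts are ignored. A result of `None` means the references gave no hint at all.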

CRob (07:54)
Well, I think it’s amazing work that you all are doing for the community. I think it’s very helpful for researchers and downstream and also community members. So I really appreciate it. Let’s move on to the rapid-fire part of our talk today. First off, spicy or mild food?

Andrew Pollock (08:12)
I’m on the milder end of things. I value my taste buds and my sense of taste, so I don’t like to destroy them by too spicy stuff.

CRob (08:20)
Very nice. What’s your favorite whiskey?

Andrew Pollock (08:23)
Ooh, well funnily enough I just talked about taste bud destruction, but Laphroaig.

CRob (08:29)
Laphroaig. Very nice. Yummy. So, being a developer by background: Vi or Emacs?

Andrew Pollock (08:37)
Oh, Vi all the way.

CRob (08:39)
Yes! Another fan. And finally, tabs or spaces, sir?

Andrew Pollock (08:44)
Uhh, spaces.

CRob (08:46)
Spaces. All right. Any rationale why?

Andrew Pollock (08:49)
I think I like my consistency, right? And so if you have tabs, you’re at the whims of what the editor’s tab spacing is, whereas if they’re spaces, they’re spaces. It’s unequivocal.

CRob (09:02)
Very nice. Well, and as we wrap up, what advice would you give somebody that’s kind of coming into this space? They want to become a new open source developer or contribute to a project, or they even want to get into cybersecurity. What advice do you have for those folks?

Andrew Pollock (09:15)
Well, my career background is I’m very self-taught, and I’d like to think that that’s still a feasible career path for people today. And the beautiful thing about open source is it’s open to all, it’s accessible. You can tinker with things as a hobbyist, you can tinker with things in your own time, you can play around with things, you can break it, you can fix it. It’s fairly approachable in a sort of incremental way. And most of the larger projects have vibrant communities around them that are fairly welcoming and inclusive. 

And so stay curious, experiment, learn by breaking things and putting them back together, I think that’s a great way of learning. I’m an experiential learner myself, so I think that’s a great way to learn. I think open source is a fabulous sort of ground level way to get involved.

CRob (10:08)
That’s awesome. I think that’s excellent advice for somebody looking to get into this crazy space we all live in. Closing out, do you have any call to action, anything you want to try to inspire our listeners to get into or help out with?

Andrew Pollock (10:22)
Yeah, given that I’m spending a lot of time looking at vulnerabilities in the aggregate, my sort of call to action to developers is to think about the bigger picture when you take a dependency in code that you’re writing because that’s normally how known vulnerabilities come into your code that you’re working on. So the OpenSSF has some really awesome best practices guides around evaluating a dependency or a project before you might want to take it on as a dependency. 

And we’ve got some other teams in the Google open source security team that are doing a lot of dependency analysis type insights. So it’s very easy to just take a dependency on board to solve a problem. But if you don’t sort of look at the entire graph of dependencies that are behind that, sometimes you’re actually sort of taking on quite a liability. So that would be my sort of piece of advice to those developing in the open source space: sometimes you might be better off just reimplementing some functionality yourself rather than grabbing something that solves the need.
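To show what looking at the entire graph can mean in practice (an added illustration with a made-up graph, not data from any real project), this Python sketch walks a dependency graph breadth-first and collects everything a single direct dependency pulls in. The package names and the function name are hypothetical.

```python
from collections import deque

# Hypothetical dependency graph: each package maps to its direct dependencies.
DEPENDENCY_GRAPH = {
    "my-app": ["http-client"],
    "http-client": ["tls-lib", "url-parser"],
    "tls-lib": ["crypto-core"],
    "url-parser": [],
    "crypto-core": [],
}


def transitive_dependencies(graph, root):
    """Return every package reachable from `root`, excluding `root` itself."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        package = queue.popleft()
        if package in seen:
            continue
        seen.add(package)
        queue.extend(graph.get(package, []))
    seen.discard(root)  # guard against cycles that lead back to the root
    return seen
```

In this made-up graph, `my-app` declares one direct dependency but ends up relying on four packages, each of which could carry a known vulnerability.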

CRob (11:32)
Yeah, that’s an unknown: you don’t know who made it, or how, and what’s inside it.

Andrew Pollock (11:37)
Could be an iceberg.

CRob (11:39)
Exactly. Well, Andrew, I really thank you for your time, everything you do for the community, your time today. Thank you all and have a great day.

Announcer (11:47)
Thank you for listening to What’s in the SOSS? An OpenSSF podcast. Be sure to subscribe to our series of conversations on Spotify, Apple, Amazon or wherever you get your podcasts. And to keep up to date on the Open Source Security Foundation community, join us online at OpenSSF.org/getinvolved. We’ll talk to you next time on What’s in the SOSS?