
What’s in the SOSS? Podcast #14 – CoSAI, OpenSSF and the Interesting Intersection of Secure AI and Open Source

September 10, 2024 | Podcast

Summary

Omkhar is joined by Dave LaBianca, security engineering director at Google, Mihai Maruseac, member of the Google Open Source Security Team, and Jay White, security principal program manager at Microsoft. Dave and Jay are on the Project Governing Board for the Coalition for Secure AI (CoSAI), an alliance of industry leaders, researchers and developers dedicated to enhancing the security of AI implementations. Additionally, Jay and Mihai are leads on the OpenSSF AI/ML Security Working Group. In this conversation, they dig into CoSAI’s goals and the potential partnership with the OpenSSF.

Conversation Highlights

  • 00:57 – Guest introductions
  • 01:56 – Dave and Jay offer insight into why CoSAI was necessary
  • 05:16 – Jay and Mihai explain the complementary nature of OpenSSF’s AI/ML Security Working Group and CoSAI
  • 07:21 – Mihai digs into the importance of proving model provenance
  • 08:50 – Dave shares his thoughts on future CoSAI/OpenSSF collaborations
  • 11:13 – Jay, Dave and Mihai answer Omkhar’s rapid-fire questions
  • 14:12 – The guests offer their advice to those entering the field today and their call to action for listeners

Transcript

Jay White soundbite (00:01)
We are always talking about building these tentacles that spread out from the AI/ML security working group and the OpenSSF. And how can we spread out across the other open source communities that are out there trying to tackle the same problem but from different angles? This is the right moment, the right time and we’re the right people to tackle it. 

Omkhar Arasaratnam (00:18)
Hi everyone, and welcome to What’s in the SOSS? I’m your host Omkhar Arasaratnam. I’m also the general manager of the OpenSSF. And today we’ve got a fun episode for y’all. We have not one, not two, but three friends on to talk about CoSAI, OpenSSF AI and ML, how they can be complementary, what they do together, how they will be focusing on different areas and what we have ahead in the exciting world of security and AI/ML. So to begin things, I’d like to turn to my friend David LaBianca. David, can you let the audience know what you do?

Dave LaBianca (00:57)
Yep, hey, so I’m David LaBianca. I’m at Google and I’m a security engineering director there and I do nowadays a lot of work in the secure AI space.

Omkhar Arasaratnam (01:06)
Thanks so much, David. Moving along to my friend Jay White. Jay, can you tell the audience what you do?

Jay White (01:16)
I’m Jay White. I work at Microsoft. I’m a security principal program manager. I cover the gamut across open source security strategy, supply chain security strategy, and AI security strategy.

Omkhar Arasaratnam (01:23)
Thank you, Jay. And last but not least, my good friend Mihai. Mihai, can you tell us a little bit about yourself and what you do?

Mihai Maruseac (01:30)
Hello, I am at Google and I’m working on the Secure AI Framework, mostly on model signing and supply chain integrity for models. And together with Jay, I co-lead the OpenSSF AI working group.

Omkhar Arasaratnam (01:43)
Amazing. Thank you so much and welcome to one and all. It is a pleasure to have you here. So to kick things off, who’d like to tell the audience a little bit about CoSAI, the goals, why did we need another forum?

Dave LaBianca (01:56)
I can definitely jump in on that one, Omkhar. I think it’s a great question. What we saw since, you know, ChatGPT becoming a big moment was a lot of new questions, queries, inbounds to a whole bunch of the founders of CoSAI surrounding, hey, how are you doing this securely? How do I do this securely? What are the lessons learned? How do I avoid the mistakes that you guys bumped into to get to your secure point today?

And as we all saw this groundswell of questions and need and desire for better insight, we saw a couple things really happening. One is we had an opportunity to really work towards democratizing the access to the information required, the intelligence required, the best practices required to secure AI. And then everybody can execute to their heart’s content at whatever level they’re able to, but it’s not about not knowing how to do it. So we knew that that was our goal. 

And then why another forum? It was because there’s amazing work going on in very precise domains. OpenSSF is an example, but also great work in Frontier Model Forum, in OWASP, in Cloud Security Alliance, on different aspects of what you do around AI security. And the gap we saw was, well, where’s the glue? Where’s the meta program that tells you how you use all these elements together? How do you address this if you’re an ENG director or a CTO looking at the risk to your company from the security elements of AI?

How do you approach tying together classical systems and AI systems when you’re thinking about secure software development, supply chain security? How do you build claims out of these things? So the intent here was, well, how do we make the ecosystem better by filling that gap, that meta gap, and then really working hand in hand with all of the different forums that are going to go deep in particular areas? And then wherever possible, you’ll figure out how we fill any other gaps we identify as we go along.

Omkhar Arasaratnam (04:00)
That’s great. Jay, Mihai, anything to add there?

Jay White (04:02)
Nothing to add, just a bit of a caveat. When David and I spoke way back early on in this year, I was extremely excited about it because as he said, what’s the glue that brings it all together? And you know, up to that point, Mihai and I had already started the AI/ML Security Working Group under the OpenSSF. We’re sitting here thinking about security from the standpoint of well, what’s happening now with these open large language models? How are we creating security apparatus around these models? How is that tying into the broader supply chain security apparatus? And what ways can we think about how to do that kind of stuff? 

And then of course, when I met David, I said, man, this is phenomenal. We are always talking about building these tentacles, right? The tentacles that spread out from the AI/ML security workgroup in the OpenSSF. How can we spread out across the other open source communities that are out there trying to tackle the same problem but from different angles? So, this is the right moment, the right time and we’re the right people to tackle it.

Omkhar Arasaratnam (05:01)
That’s a great summary, Jay. It takes a village for sure. Now we have two of the OpenSSF AI work group leads on the podcast today. So, I mean, how does this relate to the work that we’re doing there, guys? Sounds very complementary, but could you add more color?

Jay White (05:17)
The way that we think about this is well, let’s start with the data. Let’s start with the models and see how we can build some sort of guidance or guideline or spec around how we sign models and how we think about model transparency. And then of course, bringing on a SIG, the model signing SIG, which actually built code. We have an API, a working API that right now we’re taking the next steps towards trying to market. I’ll let Mihai talk about that a little bit further. As a look forward into this conversation, I sit in both CoSAI and AI/ML Security Working Group. So, when we get to that level of discussion, the tie-in is amazing. But Mihai, please talk about the technical stuff that we got going on.

Mihai Maruseac (06:03)
We have two main technical approaches that we have tackled so far in the working group and they are very related. So one is model signing and the other one is trying to move forward on some way of registering the supply chain for a model. So provenance, like SLSA provenance or similar. And I’m saying that they are both related because in the end, in both of these, we need to identify a model by its hash, so we need to compute the digest of the model.

As we worked on model signing, we discovered that simply hashing it as a blob on disk is going to be very bad because it’s going to take a lot of time. The model is large. So we are investigating different approaches to make hashing efficient. And they can be reused both in model signing and in provenances and any other statement that we can make about the AI supply chain. They would all be based on the same hashing scheme for models.
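
One way to picture the approach Mihai describes is to hash a model file by file instead of as one giant blob, then hash the resulting manifest. The sketch below is only an illustration under stated assumptions: `hash_file` and `hash_model_dir` are hypothetical names, SHA-256 and the manifest layout are arbitrary choices, and none of this is the working group's actual hashing scheme or API.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so a large file never sits fully in memory


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of one file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            digest.update(chunk)
    return digest.hexdigest()


def hash_model_dir(model_dir: Path, workers: int = 4) -> str:
    """Hash every file under model_dir in parallel, then hash the sorted manifest.

    Hypothetical helper: the manifest format here is an assumption made for
    illustration only.
    """
    files = sorted(p for p in model_dir.rglob("*") if p.is_file())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        file_digests = list(pool.map(hash_file, files))
    manifest = hashlib.sha256()
    for path, file_digest in zip(files, file_digests):
        # Bind each digest to its relative path so renaming a file changes the result.
        rel = path.relative_to(model_dir).as_posix()
        manifest.update(f"{rel}\0{file_digest}\n".encode("utf-8"))
    return manifest.hexdigest()
```

Because the per-file digests are independent, they can be computed concurrently and cached, and the same final digest could then serve as the model identifier in both a signature and a provenance statement.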

Omkhar Arasaratnam (07:02)
And Mihai, maybe to drill into that a little bit for the folks listening in the audience. So, we can use various hashing techniques to prove provenance, but what does that solve? Why is it that we want provenance of our models? What does that allow us to better reason over?

Mihai Maruseac (07:21)
Yeah, so there are two main categories that we can solve with the hashing of a model. One is making sure that the model has not been tampered with between training and using it in production. We have seen cases where model hubs got compromised and so on. So we can detect all of these compromises before we load the model. The other scenario is trying to determine a path from the model that gets used into an application to the model that got trained or to the data sets that have been used for training. 
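
The first scenario, detecting tampering before a model is loaded, can be sketched as a digest check that runs ahead of any deserialization. This is a minimal illustration, not the API of any real tool: `verify_before_load` and `TamperedModelError` are hypothetical names, and the expected digest is assumed to come from a trusted source such as a verified signature.

```python
import hashlib
import hmac
from pathlib import Path


class TamperedModelError(Exception):
    """Raised when a model file does not match its expected digest."""


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def verify_before_load(path: Path, expected_hex: str) -> Path:
    """Return path only if its digest matches expected_hex; otherwise raise.

    Hypothetical helper: expected_hex is assumed to have been obtained from a
    trusted channel, for example a signature that was already verified.
    """
    actual = sha256_of(path)
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(actual, expected_hex.lower()):
        raise TamperedModelError(f"{path.name}: digest mismatch")
    return path
```

A caller would hand the returned path to its model loader only after this check succeeds, so a file swapped on a compromised hub is rejected before any of its contents are parsed.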

When you train a model or when you have a training pipeline, you don’t just take the model from the first training job and put it directly into the application. In general, there are multiple steps, fine-tuning the model, combining multiple models, or you might do quantization. You might transform a model from one format to another.

For example, right now, a lot of people are moving from pickle formats to safetensors formats. So each of these steps should be recorded. In case there is some compromise in the environment, you will be able to detect it via the provenance.
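
The second scenario, tracing a deployed model back through every recorded step, can be sketched as a walk over digest-linked records. Everything here is an assumption for illustration: `Step` and `verify_lineage` are hypothetical names, the record layout is invented, and a real provenance format such as SLSA carries far more detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One recorded transformation, identified by the digests it consumed and produced."""
    name: str                       # e.g. "fine-tune", "quantize", "convert-format"
    input_digests: tuple[str, ...]  # several inputs when models are combined
    output_digest: str


def verify_lineage(deployed: str, steps: list[Step], trusted_roots: set[str]) -> list[str]:
    """Walk back from the deployed digest to trusted roots.

    Returns the names of the steps visited, starting from the deployed model.
    Raises ValueError if any digest is neither a trusted root nor the output
    of a recorded step, which is the gap a compromise would leave behind.
    """
    produced_by = {s.output_digest: s for s in steps}
    visited: list[str] = []
    seen: set[str] = set()
    pending = [deployed]
    while pending:
        digest = pending.pop()
        if digest in seen:
            continue
        seen.add(digest)
        if digest in trusted_roots:
            continue  # reached a dataset or base model we already trust
        step = produced_by.get(digest)
        if step is None:
            raise ValueError(f"no recorded step produced {digest}")
        visited.append(step.name)
        pending.extend(step.input_digests)
    return visited


# Usage with placeholder digests standing in for real hashes.
pipeline = [
    Step("train", ("dataset",), "m0"),
    Step("fine-tune", ("m0",), "m1"),
    Step("quantize", ("m1",), "m2"),
    Step("convert-format", ("m2",), "m3"),
]
lineage = verify_lineage("m3", pipeline, trusted_roots={"dataset"})
```

If the quantization record were missing, the walk from `m3` would stop at `m2` with an error, which is the kind of break in the chain that provenance is meant to surface.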

Omkhar Arasaratnam (08:27)
Got it. David, I know that your focus has been more on the leadership of CoSAI, but certainly, it’s not the first time you and I have spoken about OpenSSF. I’m curious as you look across the work at OpenSSF, if there’s other opportunities where we may be able to collaborate in CoSAI and how you see that collaboration evolving in the future.

Dave LaBianca (08:50)
I think it’s a great question. We have three work streams. One of them is fundamentally around AI software supply chain, secure software development frameworks. The other two are preparing the defender and AI security governance. But the beginning, the inception of this conversation about CoSAI wanting to do something in this AI secure supply chain space, was conversations with Mihai and others at Google, and Jay, and realizing that there were actually lots of opportunities here. 

You know, one of them that was really straightforward from everybody was, hey, nobody’s really just looking for a provenance statement when you’re a CTO, CIO, director or the like. They want a claim. There’s something they’re trying to prove to the outside world or at least state to the outside world. How you compose all of those elements together, especially when it’s not just a model, it’s your entire application set that you’re binding this to. 

It’s the way you did fine-tuning or the way you’re using large token prompts, pulling it all together and being able to make that claim. There needs to be guidance and best practices and things that you can start from so that you don’t have to figure this all out yourself. So that was one key area. 

Another area was there’s really truly amazing efforts going on in OpenSSF in this element of the AI space on provenance. One of the things that we feel that a group like CoSAI can really help with is collaborating on what are those additional not technical bits of how you prove or create the provenance statement, but what are the other things that over time a practitioner would want to see in a provenance statement or be able to be attested to with a provenance statement so that the provenance statement actually ties more closely to a claim in the future? You know, things like, hey, should we have to state what geography the training data was allowed for as part of a statement as you go forward? Things like that. 

So bringing that AI expertise, that ecosystem expertise around things that people want to do with these solutions. And then working with and collaborating with OpenSSF on what does that mean? How do you actually use that? Do you use that in a provenance statement? We see that that’s the type of amazing opportunity, especially because we have really wonderful overlap. Having Jay and Mihai as part of this and all of the other board members that are doing OpenSSF, we see this really great opportunity for collaboration. There’s always that possibility that teams bump heads on things, but like the idea that we’re all working towards the same mission, the same goal, it should be good challenges as we get to the weird edges of this in the future.

Omkhar Arasaratnam (11:13)
And that’s certainly one of the biggest benefits of open source that we can all collaborate. We all bring our points and perspectives to a particular problem area. So at this part of the show, we go through three rapid-fire questions. This is the first time we’ve had three guests on. So I’m going to iterate through each of y’all. Feel free to elaborate. As with any of these answers, I’m going to give you a couple choices. But a valid answer is no, Omkhar, actually, it’s choice number X and here’s why. Fair warning: the first question I’m gonna make judgment and the first question is spicy or mild food? Jay White, let’s begin with you.

Jay White (11:53)
You know what? Somewhere in the middle. I like some things spicy. I like some things mild. I like my salsa spicy, but I’m not a fan of spicy wings.

Omkhar Arasaratnam (12:02)
I mean that was a politically correct statement. Let’s see if Mihai has something a little more deterministic.

Mihai Maruseac (12:08)
I am kind of similar to Jay, except on the other side. I like the spicy wings, but the mild salsa.

Omkhar Arasaratnam (12:15)
Oh man, you guys, you guys, you guys need to run for office. You’re just trying to please the whole crowd. Dave from your days on Wall Street, I know you can be much more black and white with, are you a spicy guy or a mild guy? 

Dave LaBianca (12:29)
Always spicy. 

Omkhar Arasaratnam (12:30)
Always spicy. See, that’s why Dave and I are going to hang out and have dinner next week. All right. Moving into another controversial topic: text editors, Vim, VS Code, or Emacs? Let’s start with Mihai.

Mihai Maruseac (12:43)
I use Vim everywhere for no matter what I’m writing.

Dave LaBianca (12:45)
I mean, it’s the only right answer. I mean, there’s only one right answer in that list and Mihai just said it. So, I mean, like that’s easy.

Omkhar Arasaratnam (12:51)
Absolutely. How about you, Jay? What are you going to weigh in with? 

Jay White (12:54)
It’s Vim. It’s the only answer.

Omkhar Arasaratnam (12:59)
The last controversial question, and we’ll start with Mr. LaBianca for this. Tabs or spaces?

Dave LaBianca (13:09)
That we even have to have this argument is more of the fun of it. It’s got to be spaces. It’s got to be spaces. Otherwise somebody’s controlling your story with tabs. And like, I don’t want that. I want the flexibility. I want to know that I’m using three or I’m using four. It’s spaces.

Omkhar Arasaratnam (13:23)
There’s a statement about control embedded in there somewhere. Mihai, how about you?

Mihai Maruseac (13:28)
I prefer to configure a formatter and then just use what the formatter says.

Omkhar Arasaratnam (13:34)
Ahh, make the computer make things consistent. I like that. Jay?

Jay White (13:38)
I’m spaces. I’m with David. I had to use a typewriter early on in life.

Omkhar Arasaratnam (13:42)
Hahaha. I got it.

Dave LaBianca (13:47)
I have those same scars, Jay.

Jay White (13:48)
Hahaha. Yeah!

Omkhar Arasaratnam (13:52)
So the last part of our podcast is a bit of a reflection and a call to action. So I’m going to give each of you two questions. The first question is going to be what advice do you have for somebody entering our field today? And the second question will be a call to action for our listeners. So Mihai, I’m going to start with you, then we’ll go to Jay and wrap up with Mr. LaBianca. Mihai, what advice do you have for somebody entering our field today?

Mihai Maruseac (14:12)
So I think right now the field is evolving very, very fast, but that shouldn’t be treated as a blocker or a reason to panic. There is a firehose of papers on arXiv, a firehose of new model formats and so on, but most of them have the same basis and once you understand the basis it will be easier to understand the rest.

Omkhar Arasaratnam (14:35)
Thanks, Mihai. What’s your call to action for our listeners?

Mihai Maruseac (14:39)
I think the principal call to action would be to get involved in any of the forums where we are talking about AI and security. It doesn’t matter which one, start with one and then from there expand as time allows to all of the other ones.

Omkhar Arasaratnam (14:52)
Jay, what advice do you have for somebody entering our field today?

Jay White (14:56)
Fundamentals, fundamentals, fundamentals. I can’t stress it enough. Do not start running. Start crawling. Take a look at, you know, what was old because what was old is what is new again, and the perspective of what is old is lost on today’s engineer for automation and what’s new. So, your perspective might be very welcomed, especially if you concentrate on the fundamentals. So, anyone coming in today, become a master of the fundamentals, easiest path in and that way your conversation will start there and then everyone else who’s a bit more advanced will plus you up immediately because respect is given to those fundamentals.

Omkhar Arasaratnam (15:39)
Completely agree. Fundamentals are fundamentals for a reason. And what is your call to action for our listeners?

Jay White (15:46)
So, my call to action is going to be a little different. I’m going to tackle this making a statement for everyone but targeting underrepresented communities because I also co-chair the DE&I working group inside of OpenSSF as well. I feel like this is an excellent opportunity not just for me to tackle it from this standpoint in terms of the AI/ML security working group but also for CoSAI as well. Look, just walk into the room. I don’t care whether you sit as a fly on the wall for a couple of minutes or whether you open your mouth to speak. Get in the room, be seen in the room. If you don’t know anything say, hey guys, I’m here, I’m interested, I don’t know much, but I want to learn. And be open, be ready to drink from the firehose, be ready to roll up your sleeves and get started. And that’s for the people in the underrepresented community.

And for everyone I would say, I would generally say the same thing. These rooms are free. The game is free. This is free game we’re giving you. Come in and get this free game and hold us accountable. And the merging of information security in general, cybersecurity down into this great AI engine that’s spinning back up again. The merging of these worlds are colliding at such a rapid pace, it’s an excellent time to get in and not know anything. Because guess what? Nine times out of ten, the people in there with the most to talk about don’t know anything either. They’re just talking and talking and talking until they say something right. So, get in and be ready and open to receive.

Omkhar Arasaratnam (17:28)
Those are extremely welcoming words and that outreach is welcome and amazing. Wrapping things up, Mr. LaBianca, what advice do you have for somebody entering our field today?

Dave LaBianca (17:41)
So I think honestly, it’s gotta be leaning into where Jay was going. For me, foundational security is the way you don’t speed run the last 40 years of vulnerabilities in your product, right? And whether you like the academic approach of reading and you want to go read Ross Anderson’s Security Engineering, rest in peace, whether you want to find it yourself, but there is so much knowledge out there that’s hidden in silos. 

And this doesn’t just go for people who are starting their career. Twenty years in, if you’ve never looked at what the information warfare side of the house has found or what signals intelligence has found, like if you haven’t looked across the lines and seen all the other ways these systems go wrong, you’re still working from a disadvantage. So it’s that foundational element and learn the history of it. You don’t have to worry about some of that stuff anymore, but knowing how you got here and why it’s happening now is so critical to the story.

And then especially with AI, please, please don’t forget that, yes, there’s an ooh shiny of the AI and it’s got new security problems, but making sure you’ve prevented access to your host, that you’ve got really strong authorization between systems, that you know why you’re using it and what your data is. Like these things are fundamental, and then you can worry about the more serious newer threats in these domains. But if you don’t do that stuff, really, really hard to catch up.

Omkhar Arasaratnam (18:56)
Words of wisdom. I completely agree. And your call to action for our listeners, Dave?

Dave LaBianca (19:02)
I’m gonna tie it together, both Jay and Mihai, because I think both stories are super important. CoSAI is based on this idea of we really wanna democratize security intelligence and knowledge around securing AI. You can’t do that when it’s a single voice in the room, when it’s just Google or just tech companies in the US or just take your pick on what that just is. So my call to action is one, our work streams are starting, please lean in or anybody’s workstreams. It doesn’t have to be CoSAI’s workstreams. Lean in, bring your voice to the table because we need the different viewpoints in the room. We need the 10th person in the room going, wait, but I need a cost-effective solution that works on this low-end device in this region. Nobody can fix that if we don’t make sure that folks are in the room. 

Yes, you have to hold us accountable to make sure we make that space as Jay was saying, but we then also need those voices in the room. And regardless of where you come from, we need that contribution and that community building because there’s no way you can ever pick up anything whether it’s from Microsoft or Anthropic or IBM, Intel or Google and then say that’s gonna work for me. You need those diverse inputs, especially on the security side, right? They’ll let you go, OK, well, my company thinks about it or my entity thinks about it this way and I need to then figure out how I find solutions that build to it. So I think, you know, get involved, bring your voice to the table and help us all see the things that we’re missing that we’re kind of blind to because of either, you know, we work at a big tech company or whatever.

Omkhar Arasaratnam (20:29)
Thanks so much, Dave. As we close out, I’d love a quick, how should folks that are interested get involved with CoSAI as well as the AI/ML workgroup at the OpenSSF? Maybe we’ll start with Mihai. Mihai, if somebody’s interested in getting involved at the OpenSSF AI/ML workgroup, where do they go to? How do they start?

Mihai Maruseac (20:49)
So join a meeting and we can go from there. The meeting is on the OpenSSF calendar every other Monday. 

Omkhar Arasaratnam (20:55)
For anyone that’s looking for where to check that out go to openssf.org, and it’s all up there on the home page. Dave if folks want to get involved with CoSAI, per your generous invitation, where can they go to learn more and how can they get plugged into those work groups that are spinning up?

Dave LaBianca (21:11)
So first things first, go to one word coalitionforsecureai.org. That gets you your starting points around how to find us and how to see where we go. Look at our archives of our email lists. Look at our GitHub repo that shows what we’ve currently published around our governance and our controlling rules. And then in September, look out for the calls to action from our work streams around participation and the rest. And then it’ll be exactly as Mihai said. Join us. Come troll on a GitHub repo, whatever you want to do, but find a way to get engaged because we’d love to have you there.

Omkhar Arasaratnam (21:40)
Amazing. Jay, anything to add in terms of new folks looking to join either project as you span both?

Jay White (21:45)
I’m accessible all over the place. You can find me on LinkedIn. But find me, find Mihai, find David, and contact us directly. We are more than happy to usher you in and bring you in, especially once the workstreams spin up in the Coalition for Secure AI. But Mihai and I are there every other Monday at 10 A.M. Pacific Standard Time. 

Omkhar Arasaratnam (22:08)
All right, well, thanks so much, guys. Really appreciate you joining, and I look forward to y’all helping to secure and democratize secure AI for everyone.

Dave LaBianca (22:16)
Hey, Omkhar, thank you for having us.

Omkhar Arasaratnam (22:18)
It’s a pleasure.

Announcer (22:19)
Thank you for listening to What’s in the SOSS? An OpenSSF podcast. Be sure to subscribe to our series of conversations on Spotify, Apple, Amazon or wherever you get your podcasts. And to keep up to date on the Open Source Security Foundation community, join us online at openssf.org/getinvolved. We’ll talk to you next time on What’s in the SOSS?