Tag

Open Source Security

What’s in the SOSS? Podcast #55 – S3E7 The Gemara Project: GRC Engineering Model for Automated Risk Assessment

By Podcast

Summary

Hannah Braswell and Jenn Power, Security Engineers from Red Hat and contributors to the OpenSSF, join host Sally Cooper to discuss the Gemara project. Gemara, an acronym for GRC Engineering Model for Automated Risk Assessment, is a seven-layer logical model that aims to solve the problem of incompatibility in the GRC (Governance, Risk, and Compliance) stack. By outlining a separation of concerns, the project seeks to enable engineers to build secure and compliant systems without needing to be compliance experts. The speakers explain how Gemara grew organically to seven layers and is leveraged by other open source initiatives like the OpenSSF Security Baseline and FINOS Common Cloud Controls. They also touch on the ecosystem of tools being built, including CUE schemas and a Go SDK, and how new people can get involved.

Conversation Highlights

00:00 Welcome music + promo clip
00:22 Introductions
02:17 What is Gemara and what problem does it address?
03:58 Why do we need a model for GRC engineering?
05:50 The seven-layer structure of Gemara
07:40 How Gemara connects to other open source projects
10:14 Tools available to help with Gemara model adoption
11:39 How to get involved in the Gemara projects
13:59 Rapid Fire
16:03 Closing thoughts and call to action

Transcript

Sally Cooper (00:22)
Hello, hello, and welcome to What’s in the SOSS, where we talk to amazing people that make up the open source ecosystem. These are developers, security engineers, maintainers, researchers, and all manner of contributors that help make open source secure. I’m Sally, and today I have the pleasure of being joined by two fantastic security engineers from Red Hat. We have Hannah and Jenn.

Thank you both so much for joining me today and to get us started, can you tell us a little bit about yourselves and the work that you do at Red Hat? I’ll start with Jenn.

Jenn Power (00:58)
Sure. I am Jenn Power. I’m a principal product security engineer at Red Hat. My whole life is compliance automation, let’s say that. And outside of Red Hat, I participate in the OpenSSF Orbit Working Group, and I’m also a maintainer of the Gemara project.

Sally Cooper (01:18)
Amazing. Thank you, Jenn and Hannah. How about you? Hi.

Hannah Braswell (01:21)
Hey, Sally. Thanks for the nice introduction. I’m Hannah Braswell, and I’m an associate product security engineer at Red Hat. And I work with Jenn on the same team. And I primarily focus on compliance automation and enablement for compliance analysts to actually take advantage of that automation. Then within the OpenSSF, I’m involved in the Gemara project. I’m the community manager there. And then

I’m kind of a fly on the wall at a lot of the community meetings, whether it be the Gemara meeting or the Orbit Working Group. I like to go to a lot of them.

Sally Cooper (02:01)
We love to hear that. I heard Orbit Working Group from both of you. That’s exciting. And I also really want to dive into the Gemara project. So before we do dive into those details, let’s make sure that everyone’s starting from the same place. So for listeners who are hearing about Gemara for the first time, what is Gemara and what problem is it designed to address?

Jenn Power (02:23)
Sure, I can start there. It’s actually secretly an acronym. So it stands for GRC Engineering Model for Automated Risk Assessment. That’s kind of a mouthful, so we just shorten it to Gemara. And the official description I’ll give, and then I can go into a little bit more of a relatable example, is that it provides a logical model for describing categories of compliance activities, how they interact,

And it has schemas to enable automated interoperability between them. So what does that mean? I think if we anchor this in an analogy, we could call Gemara the OSI model for the GRC stack. In fact, that was one of the primary inspirations for the categorical layers of Gemara. And Gemara also happens to have seven categorical layers, just like the OSI model.

So if you think about it in networking, if I want to send an email, I don’t have to understand like packet routing. I can just send my email. So in GRC, we can’t really do that today. We have security engineers that also have to be compliance experts to be successful. And so with Gemara, we want to outline the separation of concerns within the GRC stack to make sure that like each specialist can contain their complexity in their own layer while allowing them to exchange information with different specialists completing activities in different layers.

So if I could give one takeaway, we want to make it so engineers can build secure and compliant systems without having to understand the nuance of every single compliance framework out there.

Sally Cooper (04:14)
I love that. So we have a baseline now. Let’s talk about the problem and get a little bit deeper into that. So Gemara is responding to a problem that you touched upon. Why do we need a model for GRC engineering and what incompatibility issue are you trying to solve? If you could go a little deeper.

Jenn Power (04:34)
Sure. So I think sharing resources in GRC is just really hard today. Sharing content, sharing tools, the tools and content just don’t work together today, if I could say that. So engineers are typically having to reinterpret security controls. They’re having to create a lot of glue code to make sure that a tool like a GRC system and a vulnerability scanner can actually talk to each other.

So we’re trying to solve that incompatibility issue on the technical side. But this is also a human problem. And I think that’s kind of the sneakiest part about it. A lot of times, we’re not even saying the same things when we use the same terms. And so that’s another thing that we’re trying to solve within the Gemara project.

This one comes up all the time. Take the word policy. If you say that to an engineer, you’re thinking immediately, policy as code, like a rego file or something you’re going to use with your policy engine. But if you’re talking to someone in the compliance space, they’re thinking like, this is like my 40 page document that outlines my organizational objectives. So we created definitions within the Gemara project to go along with the model to solve the human problem while we’re also trying to solve the technical problem.

Sally Cooper (06:05)
That’s interesting. Okay, I heard you say something about a seven-layer structure. Can you tell me why you chose a seven-layer structure for Gemara?

Jenn Power (06:17)
So this actually stemmed from an initiative under the CNCF called the Automated Governance Maturity Model. And that started as four concepts actually, policy, evaluation, enforcement, and audit. And that established the initial kind of lexicon that the project had been using.

And it initially got some adoption in the ecosystem, specifically in projects under the Linux Foundation, like FINOS Common Cloud Controls (CCC) and the Open Source Project Security Baseline (OSPS Baseline). And through the application of that lexicon, we found that there needed to be more granularity within that policy layer. So it expanded to two new layers called guidance and controls.

And I didn’t mention that we were creating a white paper yet, but we do have a white paper. And through the creation of that white paper, which Eddie Knight did so much work to create that initial draft of, we actually found that we were missing a layer. That became the seventh layer, something we had called sensitive activities, sandwiched in the middle of the Gemara layers. And so with that, we kind of organically grew to seven layers. So that, I think, is the origin story of how the layers got to seven.

Sally Cooper (07:54)
I love that. And you’re really talking about how Gemara is not built in isolation, that you’re working with other open source projects. For example, you mentioned Baseline and the FINOS Common Cloud Controls. Can you tell me how Gemara connects to those projects?

Hannah Braswell (08:09)
Yeah. So in terms of Gemara connecting to the other open source projects, the first thing that comes to mind is really the CRA because of how prominent it is right now and just the future of its impact. And I really think that Gemara is going to be a catalyst for open source projects in general that are in need of some kind of mechanism to, you know, implement security controls and align with potential compliance requirements.

And the good thing about Gemara is that you don’t have to be a compliance expert to make sure that your open source project is secure. And so I would say that the OSPS Baseline is a great example of Gemara’s layer two, because it provides a set of security controls that engineers can actually implement. So in that case, other projects can reuse the baseline controls and then fit them to their needs.

And I think it also goes to say that, anyone that is actually building a tool they want to sell or distribute in the European Union market that’s using the open source components, they’re gonna have to think about what’s in scope and having something like the OSPS Baseline to understand how to effectively assess your open source components and their risks is really, really valuable and just gonna be super useful. And then in terms of the FINOS Common Cloud Controls, I think that’s

Also another great example, just in terms of the use case and implementation of Gemara, because they have their core catalog, which has its own definitions of threats and controls that’s then imported to their technology specific catalogs. And yeah, so that’s a great implementation within the financial sector.

And then where we’re trying to expand the ecosystem for Gemara is the Cloud Native Security Controls catalog refresh. And that’s actually an initiative that Jenn is leading. I’ve done a few contributions to it, but it’s essentially an effort to take the controls catalog that currently exists as a spreadsheet and make it available as a Gemara layer one machine-readable guidance document. So Gemara is really connecting to projects that are all great to have on your radar, especially with the CRA coming up.

Sally Cooper (10:26)
Wow, that sounds great. But I’m just thinking about our listeners. They’re probably wondering, like, what does this look like in practice? And I’m curious if there are any tools available to help with the Gemara model adoption.

Jenn Power (10:39)
So we’re actually working on an ecosystem of tools. So we want to bridge that theory that we’re creating within the Gemara white paper to things that are actually implementable just to make sure that you don’t have to start from scratch if you’re trying to implement the Gemara model.

So we have a couple tools within the ecosystem. One would be our implementation of the model. We’re using CUE schemas to allow users to create the models in YAML. For instance, if you wanted to create your layer two document, you would write it in YAML and use our CUE schemas to validate that your document is in fact a Gemara-compliant document. And then we’re also building SDKs. Right now we have a Go SDK, so you can build tooling around the programmatic access and manipulation of Gemara documents. A tool in the ecosystem that’s using this currently is a tool called Privateer that automates the layer five evaluations.

Sally Cooper (11:47)
Wow, that’s great. And of course, none of this works without the people. So I know you mentioned a few of them. How can new people get involved in the Gemara project?

Hannah Braswell (11:58)
So anyone that’s new and interested in getting involved in the Gemara project, my first piece of advice would just be to jump in a community meeting and listen in on what’s happening. I know I started out just by joining those meetings and I, you know, I didn’t necessarily have much to say, but I appreciated the culture and the open discussion, just like bouncing ideas back and forth off of one another.

And there were also plenty of times when I joined a community meeting, still trying to understand the project, while some kind of group opinion was being formed. I think it’s perfectly fine to say, you know, I don’t have the information right now. I don’t have an opinion. I’m still trying to learn about the project. But if something piques your interest and you want to contribute, then volunteer for it or show you’re interested, because people are not going to forget about your willingness to step up and be part of the community.

But I started joining those meetings before we rolled out the white paper, so let me revise my first piece of advice. I’d really suggest reading the white paper first, because it describes the problem and the trajectory of the project so well, and in a really clear way that I think is super important context for anyone that wants to start contributing. And from there, I mean, I’m the community manager, but I started with small contributions

that ended up supporting the community in terms of documentation and some other aspects of the project I was excited about and that I could contribute to. So I really think the contributions are dependent on what you’re interested in. And even if there’s some difference in opinion and perspective or background, all of that can make a huge difference for the community and anything from documentation to code or discussion and collaboration will count as valid contribution and effort. So I’d say to anyone that’s wanting to join the Gemara community and start contributing, I think you should just find an area that truly interests you and makes you excited and get involved.

Sally Cooper (14:02)
Oh, that’s great. Well, thanks so much. And before we wrap, we’re going to do rapid, rapid fire. So I hope you’re ready because this is the fun part. No overthinking, no explanations, just the first instinct, okay, that comes to you. And I’m going to bounce. Yes, exactly. I’m going to bounce back and forth and ask you both some questions. Ready?

Jenn Power (14:17)
I’ll close my eyes then.

Sally Cooper (14:25)
Okay, Hannah, you’re up first. Star Wars or Star Trek?

Hannah Braswell (14:29)
Star Wars.

Sally Cooper (14:30)
Nice, I love it.
And Jenn, same question, Star Wars or Star Trek?

Jenn Power (14:35)
Star Wars.

Sally Cooper (14:36)
Okay, we’re all friends here.
Okay, back to Hannah, coffee or tea?

Hannah Braswell (14:42)
Definitely coffee.

Sally Cooper (14:43)
Yay, cheers. That’s solid.
Jenn, morning person or night owl?

Jenn Power (14:49)
Night Owl.

Sally Cooper (14:50)
Ohh that tracks. Hannah, beach vacation or mountains?

Hannah Braswell (14:56)
Hmm beach vacation.

Sally Cooper (14:58)
Nice choice. Jenn, books or movies?

Jenn Power (15:02)
Movies.

Sally Cooper (15:03)
Nice. All right, last round. Hannah, favorite open source mascot?

Hannah Braswell (15:08)
Oh…Zarf. I think that looks like an axolotl. I used to be obsessed with axolotls. And I mean, ever since I saw that, I was like, that’s the mascot.

Sally Cooper (15:18)
I love Zarf too. Cool. Okay. That’s a really strong pick.
Jenn, I’m going to give you the same question. Favorite open source mascot?

Jenn Power (15:26)
I actually love the OpenSSF goose. I think it’s so cute.

Sally Cooper (15:30)
Teehee, Honk, he’s the best. Okay, let’s bring it home. Hannah, sweet or savory?

Hannah Braswell (15:38)
Savory.

Sally Cooper (15:39)
Interesting. Okay, and Jenn? Spicy or mild?

Jenn Power (15:46)
Mild. I can’t handle any spice. I’m a baby.

Sally Cooper (15:51)
Love it. That’s amazing. Well, thank you both so much for playing along. And as we wind things down, do you have any other calls to action for our audience? If someone’s listening and they want to learn more or get involved, what is the best next step for them?

Jenn Power (16:05)
I would say read the white paper. We are looking for feedback on it and that is really a way to understand the philosophy and the architectural goals of Gemara. And if you’re looking to just like, hey I want to learn GRC, that’s a good first step. So I think that’s what I would say.

Sally Cooper (16:28)
Fantastic. Hannah, Jenn, thank you so much for your time today and for the work you’re doing for the open source security community. We appreciate you both. And to everyone listening, happy open sourcing and that’s a wrap.

Gemara

Introducing the Gemara Model

By Blog, Guest Blog

By Eddie Knight, Hannah Braswell, and Jenn Power 

Software development has reached a point where traditional Governance, Risk, and Compliance (GRC) can no longer keep up. Compliance activities often exist only as a separate administrative layer, making it difficult for organizations to prove that security measures are in place long after the work is complete.

To tackle this problem head on, the industry has seen the rise of GRC Engineering and related topics such as policy-as-code or compliance-as-code. Yet, there have been massive alignment gaps pertaining to interoperability between tools, teams, and organizations. At the core, the industry suffers from split-brain attempts to cover related problems without standardizing on philosophies, language, or data schemas.

To enable a global standardization effort by beginning with philosophical alignment, we are excited to announce the publication of Gemara: A Governance, Risk, and Compliance Engineering Model for Automated Risk Assessment.

What’s Inside?

This model provides a structure designed to categorize compliance activities and define their functional interactions. These are activities which are inherent to governance and have existed in practice, but lacked a unified engineering architecture with predictable points of exchange. By decomposing these activities into discrete layers, the model facilitates standardized documentation, shared language, and creates a basis for collaborative maintenance of common resources.

The model stems from the CNCF’s Automated Governance Maturity Model. It also incorporates lessons from prior art, such as NIST’s OSCAL, the FINOS Common Cloud Controls project, and the OpenSSF’s Open Source Project Security Baseline.

Just as the OSI Model gave us a common language for networking, Gemara provides a seven-layer architecture, detailing separation of concerns for the GRC stack:

  • The Definition Layers (1-3): These layers define what “good security” actually looks like for an organization.
  • The Pivot Point (4): This is where policy requirements meet real-world operational activities.
  • The Measurement Layers (5-7): These cover the techniques used to evaluate, enforce, and audit how well you’re sticking to those security definitions.

This structure ensures every stakeholder (and tool) has a clear place in the system. For teams looking to treat GRC as an engineering discipline rather than a checklist, the Gemara model offers a practical way forward.

Join Us

The Gemara Project is an open source initiative stewarded by the OpenSSF with founding maintainers from Sonatype, Red Hat, and more.

  • Learn about the model [Link]
  • Explore the schemas and SDKs available on GitHub [Link]
  • Join the ORBIT Working Group [Link]
  • Explore OpenSSF Membership [Link]

About the Authors

Jenn Power is a Principal Product Security Engineer at Red Hat where she leads upstream collaboration and cross-industry initiatives centered on automated governance and security data standardization. She serves as a Tech Lead for CNCF TAG Security and Compliance, a member of the ORBIT Working Group, and a maintainer of the OpenSSF Gemara project.

 

Hannah Braswell is an Associate Product Security Engineer at Red Hat, where she focuses on compliance automation and developing enablement tooling for compliance analysts. With a B.S. in Computer Engineering from NC State University, she brings a deep background in microarchitecture and embedded systems to her work in the open-source ecosystem. Hannah currently serves as the Community Manager for the OpenSSF Gemara project, driving collaboration and security enablement across the community.

 

Eddie Knight is a Software and Cloud Engineer with a background in banking technology. When he isn’t playing with his 3-year-old son, he combines his passion and job duties by working to improve the security of open source software. Eddie currently helps lead several security and compliance initiatives across the CNCF, OpenSSF, and FINOS.

Case Study: Defending the Open Source Supply Chain in a New Regulatory Era

By Blog, Case Studies, EU Cyber Resilience Act

How Red Hat and OpenSSF are translating regulatory mandates into scalable open source community practices

Challenge

The European Union Cyber Resilience Act (CRA) introduces legally binding cybersecurity requirements for products with digital elements (including software) placed on the EU market. While designed to bolster digital safety, these requirements relied on standards historically shaped by proprietary software assumptions.

For Red Hat, whose products rely on thousands of upstream open source components, the risk was clear. If CRA standards failed to reflect the reality of how open source is built, the resulting compliance hurdles could increase cost and legal uncertainty for the enterprise while placing an unsustainable administrative burden on voluntary community maintainers.

As Red Hat Security Communities Lead Roman Zhukov, along with fellow Red Hatters from Product Security and Public Policy (Jaroslav Reznik, Pavel Hruza, and James Lovegrove), shared insights from their work on the CRA standards:

“Working on traditional industry standardization ‘behind closed doors’ started as a big challenge for us, upstream-minded people, who used to openly share and collaborate on all the work that we do. But that was important. Because if those standards didn’t reflect how open source actually works, there would be a real risk of imposing corporate-level liability on the community, because of persistent compliance pressure by enterprise adopters.” 

Solution

As a Premier Member of the OpenSSF, Red Hat transitioned from collaboration to leadership, engaging with the European Commission to advocate for a clear understanding of open source development methods and helping shape CRA standards, policy, and implementation guidance.

Through OpenSSF and direct participation in European standards bodies, Red Hat has helped advance open source development practices into CRA standards and technical guidelines, including: 

  • Hardened development lifecycles: Advancing expectations that respect community workflows
  • SBOM and Vulnerability handling: Streamlining how data is shared across the supply chain
  • Supply chain integrity: Promoting frameworks that can verify security without slowing innovation

Red Hat also championed OpenSSF frameworks, such as the OpenSSF Security Baseline and SLSA, as essential reference points for industry preparing for CRA compliance.

Together, these efforts provided regulators and manufacturers with practical, community-vetted guidance for implementing CRA requirements. This helps shift the responsibility back to manufacturers and stewards through consistent data discovery rather than placing the burden of evidence upon voluntary communities.

Red Hat’s Portfolio Security Architect Emily Fox expanded on her thoughts regarding stewardship and shared responsibility under the CRA:

“True stewardship shields open source creators from legislative burden. We don’t ask maintainers to become commercial suppliers; we step in to absorb the complexity, turning commercial compliance mandates into engagement opportunities that drive real security for everyone.”

Results

Red Hat’s leadership within OpenSSF helped deliver ecosystem-wide impact:

  • Standardization Alignment: State-of-the-art secure development practices were incorporated into EU CRA technical guidelines
  • Framework Recognition: The OpenSSF Security Baseline and SLSA are now recognized as reference frameworks for development
  • Reduced Friction: Lowered compliance barriers across thousands of upstream open source components
  • Increased Confidence: Bolstered regulator and enterprise trust in open source maturity

Why This Matters

Open source software underpins 90% of modern technology stacks. By leading through OpenSSF, Red Hat helped the CRA reinforce shared responsibility and practical security improvements rather than shifting administrative weight onto open source maintainers.

About

Roman Zhukov is a cybersecurity expert, engineer, and leader with over 17 years of hands-on experience securing complex systems and software products at scale. At Red Hat, Roman leads open source security strategy, upstream collaboration, and cross-industry initiatives focused on building trusted ecosystems. He is an active contributor to open source security and co-chair of the OpenSSF Global Cyber Policy WG.

 

Emily Fox is a visionary security leader whose sustained contributions have profoundly shaped both internal company strategy and the broader open source industry. With over 15 years of experience, she has consistently operated at the intersection of deep technical expertise and strategic leadership, driving critical initiatives in cloud native security, software supply chain integrity, post-quantum cryptography, and zero trust architecture at top-tier organizations including Red Hat, Apple, and the National Security Agency. Her career is marked by a rare ability to not only architect complex, cutting-edge solutions but also to lead global communities, influence industry standards, and mentor the next generation of technologists.

Getting an OpenSSF Baseline Badge with the Best Practices Badge System

By Blog

By David A. Wheeler

Many open source software (OSS) projects aim to develop software securely and want an easy way to communicate their security posture to others.

Overview

The OpenSSF developed the Open Source Project Security Baseline (OSPS Baseline) to act as a “minimum definition of requirements for a project relative to its maturity level”. It’s a three-level checklist (baseline-1 through baseline-3) to help OSS projects improve their security. The OpenSSF Best Practices Badge Program now supports the baseline criteria, making it easier for OSS projects to determine what they’ve already accomplished and what remains. OSS projects can then display their badge on their web pages, demonstrating what they’ve accomplished and making it easy for potential users to learn more.

This post introduces how to earn an OpenSSF baseline badge through the OpenSSF Best Practices Badge System.

Getting Started with the Best Practices Badge Program

First, visit https://www.bestpractices.dev. The site currently supports nine locales, and this URL automatically redirects you to your preferred language (e.g., https://www.bestpractices.dev/en for English).

Click on “Login” to add information. You can use your GitHub account to log in. Most users prefer this method. You must grant permission during your first visit. You can also create an account specifically for the site.

Click on the “Projects” tab to see projects currently pursuing badges, then click either the “+ Add” tab or the “Add New Project” button. The “New badge” form allows you to enter your project’s repository URL and/or home page URL. You can also decide whether to begin with the “metal” series or the “baseline” series. The baseline series is a focused checklist that includes only MUST security requirements and draws in part from global cybersecurity regulations and frameworks. The metal series is a larger set of criteria that includes suggestions and quality issues impacting security derived in part from the experiences of secure Free/Libre and Open Source Software (FLOSS) projects. Both focus on security, and we encourage projects to eventually complete both; simply choose a starting point. For the purposes of this blog post, we’ll assume you chose the “baseline” series.

When you click on “Submit Project”, the system assigns a unique numeric ID to the project. The system will pause to examine the repository and attempt to automatically determine the answers to various questions. For many, this automation can save a lot of time. Once that’s done, you’ll see a form to update project information. Information highlighted in yellow with the robot symbol 🤖 indicates data entered by automation. We recommend double-checking automation results for accuracy.

Understanding and Completing the Baseline Criteria

You can now fill in the form. Each criterion can be “?” (unknown), “N/A” (not applicable), Unmet, or Met. By default, each is marked “?” (unknown). As you identify more and more items that are Met (or N/A), the % completion bar will increase. We’ve intentionally gamified this; when you reach 100% in baseline-1, you’ve earned a baseline-1 badge. You can also provide justification text; we recommend including it (even when it’s not required) to help others understand the project’s current status. Badge claims are mostly self-assertions. In some cases, automation can override false claims. The answers given are presented for public scrutiny, incentivizing correct answers.

The form shows the criterion requirements; click “show details” for more information. For example, baseline-1 criterion OSPS-AC-01.01 requires that, “When a user attempts to read or modify a sensitive resource in the project’s authoritative repository, the system MUST require the user to complete a multi-factor authentication process.” Any project hosted on GitHub automatically meets this requirement. GitHub has required multi-factor authentication since March 2023, and the system automatically fills in the required information. Not all projects are hosted on GitHub. Those projects must ensure they meet this criterion.

When you’re done, you can select “Save and Continue” or “Save and Exit” to save your work to the website. The “Save and Continue” option not only lets you continue, but also reruns automations to fill in currently unknown information.

The Best Practices Badge site currently supports version v2025.10.10, but it will soon integrate the recently released v2026.02.19. New requirements will initially appear as “future” criteria, allowing maintainers to address updates without losing their current badge status. There is no reason to wait; projects should begin the process now, as the system will provide ample time to adapt to new criteria.

Displaying Your Baseline Badge

Once you’ve met the baseline-1 criteria, you can add some code to your repository to show off your badge. The site shows the code to add, and it follows the usual badge conventions. For example, in Markdown you would add this:

[![OpenSSF Baseline](https://www.bestpractices.dev/projects/ID/baseline)]
(https://www.bestpractices.dev/projects/ID)

If you’ve earned the baseline-1 badge, this Markdown code would show an image like this:

Advanced Integrations and Automation Options

We support various mechanisms to rapidly get information in and out of the badge system (replace “ID” with the project’s numerical ID), for example:

  • Project’s information (JSON): https://www.bestpractices.dev/projects/ID.json
  • Project’s baseline badge (SVG) https://www.bestpractices.dev/projects/ID/baseline
  • Proposed edit values: https://www.bestpractices.dev/projects/ID/SECTION/edit?PROPOSALS where PROPOSALS is &-separated key=value pairs. This highlights those proposals with a robot icon, so you can review them before accepting them. For example, in section “baseline-1” you can use the proposal “osps_ac_01_01_status=met” to propose setting the status of OSPS-AC-01.01 to “Met”. For more information, see the documentation on automation proposals.
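As a minimal sketch of working with these endpoints (the project ID 12345 and the single criterion key below are purely illustrative), the URLs can be assembled programmatically before opening them in a browser or passing them to an HTTP client:

```python
from urllib.parse import urlencode

BASE = "https://www.bestpractices.dev/projects"

def project_json_url(project_id: int) -> str:
    """URL that returns the project's information as JSON."""
    return f"{BASE}/{project_id}.json"

def proposal_url(project_id: int, section: str, proposals: dict) -> str:
    """Build an edit URL that pre-fills criterion statuses as automation
    proposals (shown with a robot icon for review before accepting)."""
    query = urlencode(proposals)  # &-separated key=value pairs
    return f"{BASE}/{project_id}/{section}/edit?{query}"

# Hypothetical project 12345, proposing OSPS-AC-01.01 as "Met" in baseline-1
url = proposal_url(12345, "baseline-1", {"osps_ac_01_01_status": "met"})
print(url)
# https://www.bestpractices.dev/projects/12345/baseline-1/edit?osps_ac_01_01_status=met
```

Fetching the JSON URL with any HTTP client then gives you the project’s current answers for scripting or reporting.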

You can also include a “.bestpractices.json” file in the repository that contains proposed values for a badge. If present, these values will be treated as automation results and highlighted during editing so users can decide whether or not to accept them. The .bestpractices.json documentation provides more details.
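For instance, a minimal .bestpractices.json might look like the following. This is a hypothetical sketch: the justification key and its text are illustrative, and the exact schema is defined by the .bestpractices.json documentation.

```json
{
  "osps_ac_01_01_status": "Met",
  "osps_ac_01_01_justification": "Hosted on GitHub, which has required multi-factor authentication since March 2023."
}
```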

Why the Baseline Badge Matters

Our goal is to help OSS projects identify next steps to improve security and provide clear guidance. These capabilities help projects demonstrate measurable progress.

If you maintain an OSS project, visit https://www.bestpractices.dev and start working on a badge. If you use OSS, support those projects on which you depend as they strengthen their security practices.

About the Author

Dr. David A. Wheeler is an expert on developing secure software and on open source software.  He created the Open Source Security Foundation (OpenSSF) courses “Developing Secure Software” (LFD121) and “Understanding the EU Cyber Resilience Act (CRA)” (LFEL1001), and is completing creation of the OpenSSF course “Secure AI/ML-Driven Software Development” (LFEL1012).  His other contributions include “Fully Countering Trusting Trust through Diverse Double-Compiling (DDC)”. He is the Director of Open Source Supply Chain Security at the Linux Foundation and teaches a graduate course in developing secure software at George Mason University (GMU).

What’s in the SOSS? Podcast #53 – S3E5 AIxCC Part 3 – Buttercup’s Hybrid Approach: Trail of Bits’ Journey to Second Place in AIxCC

By Podcast

Summary

In the third episode of our AI Cyber Challenge (AIxCC) series, CRob sits down with Michael Brown, Principal Security Engineer at Trail of Bits, to discuss their runner-up cyber reasoning system, Buttercup. Michael shares how their team took a hybrid approach – combining large language models with conventional software analysis tools like fuzzers – to create a system that exceeded even their own expectations. Learn how Trail of Bits made Buttercup fully open source and accessible to run on a laptop, their commitment to ongoing maintenance with prize winnings, and why they believe AI works best when applied to small, focused problems rather than trying to solve everything at once.

This episode is part 3 of a four-part series on AIxCC.

Conversation Highlights

00:04 – Introduction & Welcome
00:12 – About Trail of Bits & Open Source Commitment
03:16 – Buttercup: Second Place in AIxCC
04:20 – The Hybrid Approach Strategy
06:45 – From Skeptic to Believer
09:28 – Surprises & Vindication During Competition
11:36 – Multi-Agent Patching Success
14:46 – Post-Competition Plans
15:26 – Making Buttercup Run on a Laptop
18:22 – The Giant Check & DEF CON
18:59 – How to Access Buttercup on GitHub
21:37 – Enterprise Deployment & Community Support
22:23 – Closing Remarks

Transcript

CRob (00:04.328)
And next up, we’re talking to Michael Brown from Trail of Bits. Michael, welcome to What’s in the SOSS.

Michael Brown (ToB) (00:10.688)
Hey, thanks for having me. I appreciate being here.

CRob (00:12.7)
We love having you. So maybe could you describe a little bit about your organization you’re coming from, Trail of Bits, and maybe share a little insight into what your open source origin story is.

Michael Brown (ToB) (00:23.756)
Yeah, sure. So Trail of Bits is a small business. We’re a security R&D firm. We’ve been in existence since about 2012. I’ve personally been with the company about four years plus. I work there within our research and engineering department. I’m a principal security engineer, and I also lead our AI/ML security research team. So Trail of Bits, we do quite a bit of government research. We also work for commercial clients.

And one of the common threads in all of the work that we do, not just government, not just commercial, is that we try to make it as public as we possibly can. So for example, sometimes, you know, we work on sensitive research programs for the government and they don’t let us make it public. Sometimes our commercial clients don’t want to publicize the results of every security audit, but to the maximum extent that our clients allow us to, we make our tools, we make our findings, we make them open source. And we’re really big believers in

that the work that we do should be a rising tide that raises all ships when it comes to the security posture for the critical infrastructure that we all depend upon, whether we’re working on hobbies at home and whether we’re building things for large organizations, all that stuff.

CRob (01:37.32)
love it. And how did you get into open source?

Michael Brown (ToB) (01:42.146)
Honestly, I’ve just kind of always been there. So realistically, you know, the open source community is where a lot of the research tools that I started my research career with came from. That’s where you found them. So I started off a bit in academia. I got my undergrad in computer science and then went and did something completely different for eight years. And then when I kind of

Uh, you know, for context, I joined the military. I flew helicopters for like eight years and did basically nothing in computing. But as I was starting to get out of the army, I, you know, was getting married, about to have kids. I kind of decided I wanted to be, you know, around the house a little bit more often. So, um, you know, I started getting a master’s degree at Georgia Tech. They’re offering it online. And then after I did that, I went to go, um, do a PhD there and also work for their applied research arm, the Georgia Tech Research Institute.

So a lot of the work that I was doing was, you know, cutting-edge work on software analysis, compilers, and AI/ML. And a lot of the stuff that I built, the tools that I did my research on, they came from the open source community. They were tools that were open sourced as part of the publication process for academic work. They were made publicly available and open source by companies like Trail of Bits before I came to work with them as the result of government research projects.

So, honestly, I guess I don’t really have much of an origin story for when I got there. I kind of just landed there when I started my career in security research and just stayed.

CRob (03:16.814)
Everybody has a different journey that gets us here. And interestingly enough, you mentioned our friends at Georgia Tech, which was a peer competitor of yours in the AIxCC competition, which you all on the Trail of Bits team competed in. I believe your project was called Buttercup, and you came in second place. You had some amazing results with your work. So maybe could you tell us a little bit about the…

Michael Brown (ToB) (03:33.741)
Yeah, that’s correct.

CRob (03:43.15)
What you did as part of the AIxCC competition, and kind of how your team approached this.

Michael Brown (ToB) (03:51.022)
Yeah. So, um, you know, at the risk of sounding a bit like a hipster, um, I’ve been working at the intersection of software security, compilers, software analysis, and AI/ML for, you know, basically, um, almost my entire, uh, career as a research scientist. So, you know, dating back to the earliest program I worked on for DARPA, back in 2019. And, um, so this was before the large language model was the predominant form of the technology or kind of became synonymous with AI. So.

CRob (04:04.719)
Mm.

Michael Brown (ToB) (04:20.792)
For a long time, I’ve been working and trying to understand how we can apply techniques from AI/ML modeling to security problems and doing the problem formulation to make sure that we’re applying that in an intelligent way where we’re going to get good, solid results that actually generalize and scale. So as the large language model came out, we started recognizing that certain problems within the security domain are good for large language models, but a lot of them aren’t.

When the AI Cyber Challenge came around, I was the lead designer, along with my co-designer, Ian Smith. And, you know, when we sat down and made the original concept for what became Buttercup, we always took an approach where we were going to use the best problem-solving technique for the subproblem at hand. So when we approached this giant elephant of a problem, we did what you do when you have an elephant and you’ve got to eat it: eat it one bite at a time.

So for each bite we took a look at it and said, okay, you know, we have like these five or six things that we have to do really, really well to win this competition. What’s the best way to solve each of these five or six things? And then the rest of it became an engineering challenge to chain them together. Our approach is very much a hybrid approach. This was a similar approach taken by the first place winners at Georgia Tech, which, by the way, if you’ve got to be beat by anybody, being beat by your alma mater takes a little bit of the sting out of it. So, you know, we came in first and second place. It’s funny, I actually have another Georgia Tech PhD alumnus

CRob (05:33.832)
You

Michael Brown (ToB) (05:42.926)
on my team who worked on Buttercup. So Georgia Tech is very well represented in the AI cyber challenge. So yeah, we’ve always had a hybrid approach. The winning team had a hybrid approach. So we used AI where it was useful. We used conventional software analysis techniques where they were useful. And we put together something that ultimately performed really, really well and exceeded even my expectations.

CRob (05:45.458)
That’s awesome.

CRob (06:07.56)
I can say, as I mentioned in previous talks, I was initially skeptical about the value that could be derived from this type of work. But the results that you and the other competitors delivered were absolutely stunning. You have converted me into a believer now; I think AI absolutely has a very positive role to play, both in the research space, but also kind of the vulnerability and operations management space.

Looking at your Buttercup, what is unique about your approach with the cyber reasoning system?

Michael Brown (ToB) (06:45.39)
Yeah, so it’s funny you say that we converted you. I kind of had to convert myself along the way. There was a time in this competition where I thought, you know, this whole thing was going to kind of be reliant on AI too much and was going to fall on its face. And then, you know, at that point, I’d be able to say, like, see, I told you, you can’t use LLMs for everything. But then it turns out, you know, as we got through there, we used LLMs for two critical areas and they worked much better than I thought they would. I thought they would work pretty well, but they ended up working to a much better degree than I thought they actually would. So, you know, what makes Buttercup unique?

CRob (06:49.852)
Yeah.

CRob (07:00.678)
You

Michael Brown (ToB) (07:15.69)
is that, like I said, we take a hybrid approach. We use AI/ML for very good problems that are well-suited for AI/ML. And what I mean by that is when we employ large language models, we use them on small subproblems for which we have a lot of context. We have tools that we can install for the large language model to use to ensure that it creates valid outputs, and outputs that can carry on to the next stage with a high degree of confidence that they’re correct.

CRob (07:30.076)
Mm-hmm.

CRob (07:43.912)
Mm-hmm.

Michael Brown (ToB) (07:45.934)
And then in the places where we have to create, or sorry, in one of the places where we have to use conventional software analysis tools, those areas are very amenable to the conventional analysis. So, you know what I mean by this? A good example is, for example, we needed to produce a proof of vulnerability. We have to have a crashing test case to show that when we claim a vulnerability exists in a system, we can prove through reproduction that it actually exists. Large language models aren’t great

at finding these crashing test cases just by asking it to look at the code and say, hey, what’s going to crash this? They don’t do very well at that. They also don’t do well at generating an input that will even get you to a particular point in a program. But fuzzers do a great job of this. So we use the fuzzer to do this. But one of the things about fuzzers is they kind of take a long time. They’re also more generally aimed at finding bugs, not necessarily vulnerabilities.

CRob (08:36.808)
Mm-hmm.

Michael Brown (ToB) (08:42.702)
So we used an AI/ML, or large language model based, accelerator, a seed generator, to help us generate inputs that were going to guide the fuzzer to either saturate the fuzzing harnesses that existed for these programs more quickly, or help us find and shake loose more crashing inputs that correspond to vulnerabilities as opposed to bugs. And those things really, really helped us deal with some of the short analysis and short

processing windows that we encountered in the AI Cyber Challenge. So it was really a matter of using conventional tools, but making them work better with AI, or using AI for problems like generating software patches, for which there really aren’t great conventional software analysis tools to do that.

CRob (09:28.018)
So as you were going through the competition, which went through multiple rounds, was there anything that surprised you or that you learned? Again, you said your opinion changed on using AI. What were maybe some of the moments that generated that?

Michael Brown (ToB) (09:45.226)
Yeah, so there, I mean, there were a couple of them. I’ll start with one where I can pat myself on the back and I’ll finish with one where I was kind of surprised. So first, we had a couple of moments that were really kind of vindicating as we went through this. Our opinion going into this was that with large language models, you couldn’t just throw the whole problem at them and expect it to be successful. So going into this, there were a lot of things that we did

CRob (09:49.405)
hehe

Michael Brown (ToB) (10:14.774)
two years ago when we first started out. And, you know, two years ago is like five lifetimes when it comes to the development of AI systems now. So there were some things that we did that didn’t exist before, that became industry standard by the time we finished the competition. So things like putting your LLM queries or your LLM prompts in a workflow that includes like validation with tools or the ability to use tools.

CRob (10:29.298)
Mm-hmm.

Michael Brown (ToB) (10:43.062)
That was something that was not mainstream when we first started out, but that was something that we kind of built custom into Buttercup when it came particularly to patching. And then also using a multi-agent approach. You know, a lot of the hype around AI is that, you know, you just ask it anything and it gives you the answer. You know, we’re asking a lot of AI when we say, here’s a program, tell me what vulnerabilities exist, prove they exist, and then fix them for me.

And also don’t make a mistake anywhere along the way. It’s way too much to ask. We found this particularly with patching. Back then, multi-agent systems, or even agentic systems or multi-agentic systems, were unheard of. We were still using ChatGPT 3.5, still very much like chatbot interactions, web browser interactions.

CRob (11:16.564)
Yeah.

Michael Brown (ToB) (11:36.438)
Integration into tools was certainly less widespread. So we had seen some very early work on arXiv about solving complex problems with multiple agents, so breaking the problem down for it. And we used this, and our patcher ended up being incredibly good. It was our most important and our biggest success on the project. I really want to shout out Ricardo Charon, who’s the lead developer for our patching agent

CRob (11:47.976)
Mm-hmm.

Michael Brown (ToB) (12:06.414)
or rather, our patching system, for both the semifinals and finals in the AIxCC. He did an incredible job and we really built something that, like I said, I regard as our biggest success. So, you know, sure enough, as we go through this two-year competition, now all of a sudden, you know, multi-agentic systems, multi-agentic tool-enabled systems, they’re all the rage. This is how we’re solving these challenging problems. And a lot of this problem breakdown stuff has now made its way, baked into the models, into the newer thinking and reasoning models from

Anthropic and OpenAI, respectively. You can give it these large complicated problems and it will first try to break them down before trying to solve them. So, you know, we were building all that stuff into our system before it came about, so that’s an area where, you know, like I said, we learned along the way that we had the right approach from the beginning, and it’s really easy to go back and say that’s what we learned, that we were right. So on the other side of this, I will have to say, you know, I’ll reiterate, I was really surprised at how well

CRob (12:53.639)
Mm-hmm.

Michael Brown (ToB) (13:04.11)
language models were able to do some of the tasks we asked them to do. Part of it’s how we approached the problem. We didn’t ask too much of it. And I think that’s part of the reason why the large language models were successful. An area that I thought was going to be much more challenging was patching. But it turned out to be an area where, to a certain degree, this is kind of an easier version of the problem in general, because open source software, which were the targets of the AI Cyber Challenge, they’re ingested into the training

CRob (13:08.924)
Mm.

Michael Brown (ToB) (13:31.404)
data for all of these large language models. So the models do have some a priori familiarity with the targets. So when we give it a chunk of vulnerable code from a given program, it’s not the first time it’s seen the code. But still, they did an amazing job actually generating useful patches. The patch rate that I expected personally to see was much lower than the actual patch rate that we had, both in the semifinals and in the finals. So even in that first year window,

CRob (13:33.64)
Mm.

Michael Brown (ToB) (13:58.63)
I was really kind of blown away with how well the models were doing at code generation tasks, particularly small, focused code generation tasks. So I think large language models are kind of getting a bad rap right now when it comes to, like, you know, trying to vibe code entire applications. People are like, gosh, this code is slop. It’s terrible. It’s full of bugs and stuff. Well, you did also ask it to build the whole thing. You know, if I asked a junior developer to build a whole thing, they would probably also put together some

CRob (14:07.366)
Yeah.

CRob (14:17.233)
Yeah.

CRob (14:26.258)
Yeah.

Michael Brown (ToB) (14:26.71)
and gross stuff. But when I ask a junior developer to give me a bug fix, much like the large language model when I ask it for a more constrained version of the problem, they tend to do a better job because there’s just fewer moving parts. So yeah, those are the two things I took away. One where, like I said, I get to pat myself on the back, and another that was actually surprising.

CRob (14:46.012)
That’s awesome. That’s amazing. So now that the competition is over, what does the team plan to do next beyond this competition?

Michael Brown (ToB) (14:57.098)
Yeah, so I mean, look, we spent a lot of our time over the last two years. I wouldn’t quite say blood, I don’t think anyone bled over this, but we certainly had some tears. We certainly had a lot of anxiety. You know, we put a lot of ourselves into Buttercup. And so, you know, we want people to use it. So to that end, Buttercup is fully available and fully open source. You know, DARPA made it a contingency of participating in the competition that

CRob (15:09.917)
Mm-hmm.

Michael Brown (ToB) (15:26.892)
you had to make the code that you submitted to the semifinals and the finals open source. So we did that along with all of our other competitors, but we actually took it one step further. So the code that we submitted to the finals is great. It’s awesome, but it runs at scale. It used $40,000 of a $130,000, I think, total budget. And it ran across like an Azure subscription that had multiple nodes,

countless replicated containers. This is not something that everyone can use, and we want everyone to use it. So actually in the month after we submitted our final version of the CRS, but before DEF CON occurred, where we found out that we won, we spent a month making a version of Buttercup that’s decoupled from DARPA’s competition infrastructure. So it runs entirely standalone on its own, but more importantly, we scaled it down so it’ll run on a laptop.

CRob (16:18.696)
Mm-hmm.

Michael Brown (ToB) (16:25.154)
We left all of the hooks. We left all of the infrastructure to scale it back up if you want. So the idea now is that if you go to trailofbits.com/buttercup, you can learn about the tool. We have links to our GitHub repositories where it’s available, and you can go download Buttercup on your laptop and run it right now. And if you’ve got an API key that’ll let you spend a hundred dollars, we can run a demo to show you that we can find and patch a vulnerability live.

CRob (16:51.496)
That’s easy.

Michael Brown (ToB) (16:53.164)
Yeah, so anyone can do this right now. So if you’re an organization that wants to use Buttercup, you can also use the hooks that we left in to scale it up to the size of your organization and the budget that you have, and you can run it at scale on your own software targets. So even for users beyond the open source community, we want this to be used on closed source code too. So yeah, you asked what we’re gonna do with it afterward. We made it open source, and we want people to use it.

And even on top of that, we don’t want it to bit rot. So we actually are going to retain a pretty significant portion of our winnings, of our $3 million prize, and we’re going to use it for ongoing maintenance. So we’re maintaining it. We’ve had people submit PRs that we’ve accepted. They’re tiny, you know, it’s only been out for like a month, but, you know, we’ve also made quite a few updates to the public version of Buttercup afterwards. So it’s actively maintained.

There’s money behind it; the company’s putting its money where its mouth is. We’re actively maintaining it. The people who built it are part of the people who are maintaining it. We are taking contributions from the community, and we hope they help us maintain it as well. And yeah, we’ve made it so anyone can use it. I think we’ve taken it about as far as we can possibly go in terms of reducing the barriers to adoption to the absolute minimum level for people to use Buttercup and leverage AI to help them find and patch vulnerabilities at scale.

CRob (18:16.716)
I love that approach. Thank you for doing that. How did you fit the giant check through the teller window?

Michael Brown (ToB) (18:22.574)
Fortunately, that check was a novelty, and we did not actually have a problem larger than the AIxCC itself to solve afterward, which was getting paid. So yeah, we did have the comically large check, you know, taped up in our booth at the AIxCC village at DEF CON, and it certainly attracted quite a few photographs from passersby.

CRob (18:26.716)
Ha ha ha!

CRob (18:31.964)
Yeah.

CRob (18:37.864)
Mm-hmm.

Michael Brown (ToB) (18:47.736)
I don’t know. I think if you get on, like, social media or whatever and you look up AIxCC, if you see anything, there’s probably lots of pictures of me throwing up a big smile and two thumbs up underneath the check that random people took.

CRob (18:59.464)
So you mentioned that Buttercup is all open source now. So if someone was interested in checking it out or possibly even contributing, where would they go do that?

Michael Brown (ToB) (19:07.564)
Yeah, so we have a GitHub organization. We’re Trail of Bits. You can find Buttercup there. You can also find our public archives of the old versions of Buttercup. So if you’re interested in, like, the code that actually won the competitions, you can see what got us from the semifinals to the finals. You can see what won us second place in the finals. And you can also download and use the version that’s actively maintained that’ll run on your laptop. And all three of them are there. The repository name is just Buttercup.

We are not the only people who love The Princess Bride, so there are other repositories named Buttercup on GitHub. So you might have to sift a little bit, but yeah, basically github.com/trailofbits/buttercup, I think, is like 85% of the URL there. I don’t have it memorized, but yeah, you can find it publicly available, along with a lot of other tools that Trail of Bits has made over the years. So we encourage you to check some of those out as well. A lot of those are still actively maintained.

CRob (19:39.036)
That’s what it was.

Michael Brown (ToB) (20:03.72)
They have a lot of community support. Believe it or not, last I counted it was something like 1,250 stars, and Buttercup is only like our fifth most popular tool that Trail of Bits has created. So, you know, we were quite notable for creating some binary lifting tools that are up there. We also have some other tools that we’ve created recently for parser security, analysis like that, like Graphtage.

And then also some kind of more conventional security tools, like Algo VPN, still rank above Buttercup. So as awesome as Buttercup is, it’s like only the fifth coolest tool that we’ve made, as voted on by the community. So check out the other stuff while you’re there too. Believe it or not, Buttercup isn’t our most popular offering.

CRob (20:51.56)
That’s a pretty awesome statement to be able to say: that’s only our fifth most important tool.

Michael Brown (ToB) (20:53.966)
Yeah.

Michael Brown (ToB) (20:58.444)
I don’t know, you know, like personally, I’m kind of hoping that maybe we move up a few notches after people get time to, like, go find it and, you know, star it. But, you know, we’ve made some other really significant and really awesome contributions to the community, even outside of the AI Cyber Challenge. So I want to really, really stress, all of that stuff is open source. We, you know, we aren’t just doing this because we have to. We actually care about the open source community. We want to secure the software infrastructure. We want people to use the tool and secure the software before they, you know,

get it out there, so that we tackle this kind of untackable problem of securing this massive ecosystem of code.

CRob (21:37.606)
Michael, thank you to Trail of Bits and your whole team for all the work you do, including the competition runner-up Buttercup, which did an amazing job by itself. Thank you for all your work, and thank you for joining us today.

Michael Brown (ToB) (21:52.802)
Yeah, thanks for having me. You know, one last thing to shout out there. If you’re an organization and you’re looking to employ Buttercup within your organization, don’t be bashful about reaching out to us and asking about use cases for deploying within your organization. You know, we’re happy to help out there. That’s probably an area that we focus on a little bit less, in terms of getting this out the door for average folks or individuals to use. So we’re definitely interested in helping to make sure Buttercup gets used.

Like I said, reach out to us, talk to us if you’re interested in Buttercup, we want to hear.

CRob (22:23.44)
Love it. All right. Have a great day.

Michael Brown (ToB) (22:25.678)
All right, thanks a lot.