Tag

PyPi

Detecting Malicious Packages using the OSV API

By Blog, Guest Blog

By Nigel Douglas

By now a bunch of people in the OpenSSF community might already be aware of the Malicious Packages repository, but are you using it as part of your day-to-day software supply chain security?

The OpenSSF Malicious Packages repo is the first open source system for collecting and publishing cross-ecosystem reports of malicious packages – such as dependency and manifest confusion attacks, typosquatting, offensive security tooling, protestware and more.

In the past months we have seen a rise in targeted attacks on open source upstream registries like npm and PyPI – most notably Axios and LiteLLM. These compromised, misleading or outright malicious open source software packages are the focus for this project. A centralised source-of-truth repository for shared intelligence helps the open source community understand the complete range of threats, but ultimately to prevent developers consuming software dependencies that are essentially just backdoors in your codebase.

The reports in the Malicious Packages repo use the Open Source Vulnerability (OSV) format. OSV was, as the name suggests, originally created for classifying open source software packages in JSON-formatted output for known vulnerabilities, fix availability and other security advisory information. By using the OSV format for malicious packages it is possible to make use of existing integrations, including the OSV.dev API, the osv-scanner tool, deps.dev, and build your own tools on top of these open source data sources.

Getting up and running with the API

A good place to start is understanding how malicious packages or malware is classified in OSV. Similar to how vulnerabilities start with “CVE-” (ie: CVE-2025-3248), malicious packages start with “MAL-” (ie: MAL-2025-6812). You can simply curl the existing vulns endpoint for api.osv.dev, but instead of using a CVE ID, use the Malicious Packages ID. 

curl -s "https://api.osv.dev/v1/vulns/MAL-2025-6812" | jq .

While the above command does return a bunch of information about a specific malicious package record, it would assume you already knew what the malicious package ID was in the first place. A more common use-case for the API is to look for a specific package name/version and the associated open source upstream source (ie: npm) to see if there’s a malicious package record associated with it.

curl -s -d   '{"package": {"name": "axios", "ecosystem": "npm"}}'   "https://api.osv.dev/v1/query" | \

  jq '.vulns[] | select(.id | startswith("MAL-"))'

Or in the case of the Axios compromise, there were two different affected versions. Rather than scanning each version separately, you can use the querbybatch endpoint to handle multiple packages, versions and even ecosystems. In the case of MAL-2026-2307, both package versions carry the same malicious package ID.

curl -s -d \

  '{"queries": [

    {"version": "1.4.1", "package": {"name": "axios", "ecosystem": "npm"}},

    {"version": "0.30.4", "package": {"name": "axios", "ecosystem": "npm"}}

  ]}' \

  "https://api.osv.dev/v1/querybatch" | jq .

Building a custom Kubernetes Scanner

I came up with a simple osv-kubernetes.py scanner. The thought process here is that I could create a simple python-based Kubernetes deployment manifest. This pod has a list of Python packages in the filesystem of the pod, as seen when I run the pip list command.

So, I proceeded to create a fake python library (rather than downloading an actual malicious software package). I mean, the package name and version were real, but I fabricated the entire content of the package. It’s a totally dummy package – as you can see from the below echo commands. Let’s see if our custom osv-kubernetes scanner script will pick it up.

So, we created a fake typosquatted Python package. “Reuests” instead of the legitimate “Requests” library. All versions of the typosquatted Reuests library are tracked under MAL-2022-7441. While this is a simple experiment, it takes us beyond the manual process of scanning each library name and version, and automates it by piping the output of the pip list command into the API query. There are many ways that users can use the OSV API, this was purely an experiment for Kubernetes workloads.

Use OSV-Scanner

While there are certainly use-cases for building your own custom scanners, like what we did with the Kubernetes pod scanner earlier, I would recommend using the official OSV-Scanner to find existing vulnerabilities and malicious code injection affecting your project’s dependencies. OSV-Scanner provides the officially supported frontend to the OSV database and CLI interface to OSV-Scalibr that connects a project’s list of dependencies with the vulnerabilities that affect them.

In the below scenario, I used syft to create a simple Software Bill of Materials (SBOM) in JSON output based on an existing Python requirements.txt file. As we found out earlier, the OSV API is entirely JSON-structured, so we wouldn’t scan unstructured .txt files. The most common file to scan would be the SBOM or lock files (ie: osv-scanner –lockfile=package-lock.json).

syft packages requirements.txt -o cyclonedx-json=sbom.cdx.json
osv-scanner -L sbom.cdx.json

As you can see from the screenshot, the CycloneDX SBOM is successfully sourced. The packages LiteLLM and requests were correctly identified as being from the PyPI ecosystem since the Python requirement.txt file was converted into SBOM. As well as having multiple security advisories related to an upstream compromise, LiteLLM was corrected marked as malicious – MAL-2026-2144.

Again, this process is good and all, but you really need to integrate it into the CI/CD process. The OSV-Scanner Github Action leverages the malicious packages repository and the OSV-Scanner CLI tool to track and notify you of known malicious packages across the existing languages and ecosystems. The most common workflow for Github triggers a scan with each pull request and will only report new instances of malware introduced through the PR. The Github Action compares a scan of the target branch to a scan of the feature branch, and will fail if there are new vulnerabilities or malicious packages introduced through the feature branch. Alternatively, this process can be achieved on Scheduled Scans using a cron job.

Moving towards best practices

I say this a lot, but in light of the recent axios@1.14.1 compromise, please make sure you always commit your npm project with the package-lock.json file. It is the only version-locking enforcement mechanism that exists in npm today. Developers should be using npm ci instead of blindly using npm install on Javascript libraries sourced from npm. The npm ci command will only work if a package-lock.json file exists. These lockfiles can also be easily scanned, as seen with osv-scanner. 

Likewise, if you need to update or pull new packages from open source registries like npmjs.com, it’s also worth using the –min-release-age flag (available since npm v11.10.0) to make sure you only install updates, which are at least 3 days old (ie: npm install –min-release-age=3). Most open source malicious packages end up getting classified by OSV.dev within the first 3 days, so configuring a cooldown period is perfect to help prevent consumption of unknown or new variants of malware campaigns.

You can literally hardcode this setting (min-release-age=7) into your .npmrc file. There will always be more malicious actors attacking popular npm and PyPI packages in the future. Thankfully, most will get caught in the first 24 hours, in part due to the fantastic work going on within the OpenSSF Malicious Package packages project. I’m not trying to say that the Javascript (npm) and Python (PyPI) ecosystems are broken by design, but we certainly cannot apply blind trust to the software supply chain.

Get Involved: Help Us Secure the Ecosystem

The strength of the OSV project lies in its community. You can help protect the open source landscape by:

  • Reporting Threats: If you encounter a malicious package, report it to the OpenSSF Malicious Packages repository.
  • Contributing: Help us improve the database by contributing to the OSV project or integrating the API into your own security tooling.

About the Author

Nigel Douglas is the Head of Developer Relations at Cloudsmith. He champions Cloudsmith’s developer ecosystem by creating compelling educational content, engaging with developer communities, and promoting software supply chain security best practices. Nigel helps build and shape the DevOps community through events, tutorials, and innovative programs.

Open Infrastructure is Not Free: A Joint Statement on Sustainable Stewardship

By Blog

An Open Letter from the Stewards of Public Open Source Infrastructure

Over the past two decades, open source has revolutionized the way software is developed. Every modern application, whether written in Java, JavaScript, Python, Rust, PHP, or beyond, depends on public package registries like Maven Central, PyPI, crates.io, Packagist and open-vsx to retrieve, share, and validate dependencies. These registries have become foundational digital infrastructure – not just for open source, but for the global software supply chain.

Beyond package registries, open source projects also rely on essential systems for building, testing, analyzing, deploying, and distributing software. These also include content delivery networks (CDNs) that offer global reach and performance at scale, along with donated (usually cloud) computing power and storage to support them.

And yet, for all their importance, most of these systems operate under a dangerously fragile premise: They are often maintained, operated, and funded in ways that rely on goodwill, rather than mechanisms that align responsibility with usage.

Despite serving billions (perhaps even trillions) of downloads each month (largely driven by commercial-scale consumption), many of these services are funded by a small group of benefactors. Sometimes they are supported by commercial vendors, such as Sonatype (Maven Central), GitHub (npm) or Microsoft (NuGet). At other times, they are supported by nonprofit foundations that rely on grants, donations, and sponsorships to cover their maintenance, operation, and staffing.

Regardless of the operating model, the pattern remains the same: a small number of organizations absorb the majority of infrastructure costs, while the overwhelming majority of large-scale users, including commercial entities that generate demand and extract economic value, consume these services without contributing to their sustainability

Modern Expectations, Real Infrastructure

Not long ago, maintaining an open source project meant uploading a tarball from your local machine to a website. Today, expectations are very different:

  • Dependency resolution and distribution must be fast, reliable, and global.
  • Publishing must be verifiable, signed, and immutable.
  • Continuous integration (CI) pipelines expect deterministic builds with zero downtime.
  • Security tooling expects an immediate response from public registries.
  • Governments and enterprises demand continuous monitoring, traceability, and auditability of systems.
  • New regulatory requirements, such as the EU Cyber Resilience Act (CRA), are further increasing compliance obligations and documentation demands, adding overhead for already resource-constrained ecosystems.
  • Infrastructure must be responsive to other types of attacks, such as spam and increased supply chain attacks involving malicious components that need to be removed.

These expectations come with real costs in developer time, bandwidth, computing power, storage, CDN distribution, operational, and emergency response support. Yet, across ecosystems, most organizations that benefit from these services do not contribute financially, leaving a small group of stewards to carry the burden.

Automated CI systems, large-scale dependency scanners, and ephemeral container builds, which are often operated by companies, place enormous strain on infrastructure. These commercial-scale workloads often run without caching, throttling, or even awareness of the strain they impose. The rise of Generative and Agentic AI is driving a further explosion of machine-driven, often wasteful automated usage, compounding the existing challenges. 

The illusion of “free and infinite” infrastructure encourages wasteful usage.

Proprietary Software distribution

In many cases, public registries are now used to distribute not only open source libraries but also proprietary software, often as binaries or software development kits (SDKs) packaged as dependencies. These projects may have an open source license, but they are not functional except as part of a paid product or platform. 

For the publisher, this model is efficient. It provides the reliability, performance, and global reach of public infrastructure without having to build or maintain it. In effect, public registries have become free global CDNs for commercial vendors.

We don’t believe this is inherently wrong. In fact, it’s somewhat understandable and speaks to the power of the open source development model. Public registries offer speed, global availability, and a trusted distribution infrastructure already used by their target users, making it sensible for commercial publishers to gravitate toward them. However, it is essential to acknowledge that this was not the original intention of these systems. Open source packaging ecosystems were created to support the distribution of open, community-driven software, not as a general-purpose backend for proprietary product delivery. If these registries are now serving both roles, and doing so at a massive scale, that’s fine. But it also means it’s time to bring expectations and incentives into alignment.

Commercial-scale use without commercial-scale support is unsustainable.

Moving Towards Sustainability

Open source infrastructure cannot be expected to operate indefinitely on unbalanced generosity. The real challenge is creating sustainable funding models that scale with usage, rather than relying on informal and inconsistent support. 

There is a difference between:

  • Operating sustainably, and
  • Functioning without guardrails, with no meaningful link between usage and responsibility.

Today, that distinction is often blurred. Open source infrastructure, whether backed by companies or community-led foundations, faces rising demands, fueled by enterprise-scale consumption, without reliable mechanisms to scale funding accordingly. Documented examples demonstrate how this imbalance drives ecosystem costs, highlighting the real-world consequences of an illusion that all usage is free and unlimited.

For foundations in particular, this challenge can be especially acute. Many are entrusted with running critical public services, yet must do so through donor funding, grants, and time-limited sponsorships. This makes long-term planning difficult and often limits their ability to invest proactively in staffing, supply chain security, availability, and scalability. Meanwhile, many of these repositories are experiencing exponential growth in demand, while the growth in sponsor support is at best linear, posing a challenge to the financial stability of the nonprofit organizations managing them.

At the same time, the long-standing challenge of maintainer funding remains unresolved. Despite years of experiments and well-intentioned initiatives, most maintainers of critical projects still receive little or no sustained support, leaving them to shoulder enormous responsibility in their personal time. In many cases, these same underfunded projects are supported by the very foundations already carrying the burden of infrastructure costs. In others, scarce funds are diverted to cover the operational and staffing needs of the infrastructure itself.

If we were able to bring greater balance and alignment between usage and funding of open source infrastructure, it would not only strengthen the resilience of the systems we all depend on, but it would also free up existing investments, giving foundations more room to directly support the maintainers who form the backbone of open source.

Billion-dollar ecosystems cannot stand on foundations built of goodwill and unpaid weekends.

What Needs to Change

It is time to adopt practical and sustainable approaches that better align usage with costs. While each ecosystem will adopt the approaches that make the most sense in its own context, the need for action is universal. These are the areas where action should be investigated:

  • Commercial and institutional partnerships that help fund infrastructure in proportion to usage or in exchange for strategic benefits.
  • Tiered access models that maintain openness for general and individual use while providing scaled performance or reliability options for high-volume consumers.
  • Value-added capabilities that commercial entities might find valuable, such as usage statistics.

These are not radical ideas. They are practical, commonsense measures already used in other shared systems, such as Internet bandwidth and cloud computing. They keep open infrastructure accessible while promoting responsibility at scale.

Sustainability is not about closing access; it’s about keeping the doors open and investing for the future.

This Is a Shared Resource and a Shared Responsibility

We are proud to operate the infrastructure and systems that power the open source ecosystem and modern software development. These systems serve developers in every field, across every industry, and in every region of the world.

But their sustainability cannot continue to rely solely on a small group of donors or silent benefactors. We must shift from a culture of invisible dependence to one of balanced and aligned investments.

This is not (yet) a crisis. But it is a critical inflection point.

If we act now to evolve our models, creating room for participation, partnership, and shared responsibility, we can maintain the strength, stability, and accessibility of these systems for everyone.

Without action, the foundation beneath modern software will give way. With action — shared, aligned, and sustained — we can ensure these systems remain strong, secure, and open to all.

How You Can Help

While each ecosystem may adopt different approaches, there are clear ways for organizations and individuals to begin engaging now:

  • Show Up and Learn: Connect with the foundations and organizations that maintain the infrastructure you depend on. Understand their operational realities, funding models, and needs.
  • Align Usage with Responsibility: If your organization is a high-volume consumer, review your practices. Implement caching, reduce redundant traffic, and engage with stewards on how you can contribute proportionally.
  • Build With Care: If you create build tools, frameworks, or security products, consider how your defaults and behaviors impact public infrastructure. Reduce unnecessary requests, make proxy usage easier, and document best practices so your users can minimize their footprint.
  • Become a Financial Partner: Support foundations and projects directly, through membership, sponsorship, or by employing maintainers. Predictable funding enables proactive investment in security and scalability.

Awareness is important, but awareness alone is not enough. These systems will only remain sustainable if those who benefit most also share in their support.

What’s Next

This open letter serves as a starting point, not a finish. As stewards of this shared infrastructure, we will continue to work together with foundations, governments, and industry partners to turn principles into practice. Each ecosystem will pursue the models that make sense in its own context, but all share the same direction: aligning responsibility with usage to ensure resilience.

Future changes may take various forms, ranging from new funding partnerships to revised usage policies to expanded collaboration with governments and enterprises. What matters most is that the status quo cannot hold.

We invite you to engage with us in this work: learn from the communities that maintain your dependencies, bring forward ideas, and be prepared for a world where sustainability is not optional but expected.

Signed by

Alpha-Omega

Continuous Delivery Foundation (CDF) 

Eclipse Foundation (Open VSX)

OpenJS Foundation

Open Source Security Foundation (OpenSSF)

Packagist (Composer)

Perl and Raku Foundation

Python Software Foundation (PyPI)

Ruby Central

Rust Foundation (crates.io)

Sonatype (Maven Central)

Organizational signatures indicate endorsement by the listed entity. Additional organizations may be added over time.

Acknowledgments: We thank the contributors from the above organizations and the broader community for their review and input.