By Caleb Brown, Google Open Source Security Team, and Jossef Harush Kadouri, Software Supply Chain Security at Checkmarx
Today, the OpenSSF Package Analysis team is excited to announce the launch of our Malicious Packages repository, the first open source system for collecting and publishing cross-ecosystem reports of malicious packages.
This repository is a response to the rising incidence of attacks that include malicious open source packages. For example, earlier this year, the Lazarus Group (a prolific North Korean state-backed hacking group) targeted the blockchain and cryptocurrency sectors. The group used sophisticated methods, including deceptive npm packages, to compromise various software supply chains. A centralized repository for shared intelligence could have alerted the community to the attack sooner and helped the open source community understand the complete range of threats. Our hope is for the Malicious Packages repository to be this kind of resource.
What is a malicious package?
A malicious package is a form of malware that is delivered as an open source package and published to a package repository, such as PyPI or npm. Unlike vulnerable code, which has unintentional weaknesses that can be exploited, malicious code is intentionally designed to harm or compromise its victims. Malicious packages are used to attack the developers or companies that unwittingly install and run them. These packages can be used for attacks such as gaining unauthorized access, leaking private information, consuming computing resources, or even destroying or damaging data. Most endpoint antivirus software does not prevent these attack flows.
The Package Analysis project was created to find such malicious packages as soon as possible by downloading, installing and executing packages from popular open source package repositories as they are published. As the packages are run, we capture the executed commands and analyze network traffic. A set of rules is then applied against the observed behavior to decide if the package is acting maliciously. If it is, a report is generated and published to the new Malicious Packages repository.
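To make the rule-matching step concrete, here is a minimal sketch of applying rules to observed behavior. It is not the Package Analysis implementation: the `Observation` shape, the rule names, and the `ALLOWED_HOSTS` list are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical allowlist of hosts a package is expected to contact during install.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org", "registry.npmjs.org"}


@dataclass
class Observation:
    """Behavior captured while installing and running one package (hypothetical shape)."""
    package: str
    commands: List[str] = field(default_factory=list)
    contacted_hosts: List[str] = field(default_factory=list)


# Hypothetical rules: each name maps to a predicate over an Observation.
RULES = {
    "reads-ssh-keys": lambda o: any(".ssh/id_" in c for c in o.commands),
    "pipes-download-to-shell": lambda o: any(
        ("curl " in c or "wget " in c) and ("| sh" in c or "| bash" in c)
        for c in o.commands
    ),
    "contacts-unexpected-host": lambda o: any(
        h not in ALLOWED_HOSTS for h in o.contacted_hosts
    ),
}


def matched_rules(obs: Observation) -> List[str]:
    """Return the sorted names of all rules that the observation triggers."""
    return sorted(name for name, rule in RULES.items() if rule(obs))


def is_malicious(obs: Observation) -> bool:
    """Flag a package when at least one rule matches (a deliberately simple policy)."""
    return bool(matched_rules(obs))
```

A real system would likely weigh rules differently and tolerate benign matches. The single-match policy here only keeps the sketch short.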
A unified system
Currently, each open source package repository has its own approach to handling malicious packages. When a malicious package is reported by the community, it is common for the package repository’s security team to remove the package and its associated metadata. Unfortunately, these actions often occur without any public record. Discovering what malicious packages exist requires piecing together data from many disparate public sources or relying on proprietary threat intelligence feeds.
The Malicious Packages repository helps fill the data gap by creating a public database that aggregates reports of malicious packages discovered in open source repositories. This database can be used to stop malicious dependencies from moving through CI/CD pipelines, refine detection engines, scan environments for malicious packages and block their use, or accelerate incident response.
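As one illustration of the CI/CD use case, a pipeline step could compare a project's dependencies against a list of known malicious packages and fail the build on a match. The sketch below assumes the list has already been loaded into a set of `(ecosystem, name)` pairs. The package names are placeholders, and lowercasing names is a simplification rather than each ecosystem's exact normalization rule.

```python
from typing import Iterable, List, Set, Tuple

Dependency = Tuple[str, str]  # (ecosystem, package name)

# Placeholder entries; a real gate would load these from the published reports.
KNOWN_MALICIOUS: Set[Dependency] = {
    ("npm", "example-malicious-pkg"),
    ("pypi", "example-typosquat"),
}


def blocked_dependencies(
    dependencies: Iterable[Dependency], known_malicious: Set[Dependency]
) -> List[Dependency]:
    """Return the dependencies that appear in the known-malicious set."""
    return [
        (ecosystem, name)
        for ecosystem, name in dependencies
        if (ecosystem.lower(), name.lower()) in known_malicious
    ]


def gate(dependencies: Iterable[Dependency], known_malicious: Set[Dependency]) -> int:
    """Print any blocked dependencies and return a process exit code (0 = pass)."""
    blocked = blocked_dependencies(dependencies, known_malicious)
    for ecosystem, name in blocked:
        print(f"BLOCKED: {ecosystem}/{name} is reported as malicious")
    return 1 if blocked else 0
```

In a pipeline, the return value of `gate` would typically be passed to `sys.exit` so that a match stops the build.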
The reports in the Malicious Packages repository use the Open Source Vulnerability (OSV) format. OSV is a JSON format used for specifying vulnerabilities in open source projects. By using the OSV format for malicious packages, it is possible to make use of existing integrations, including the osv.dev API, the osv-scanner tool, and deps.dev. The OSV format is also extensible, allowing additional data, such as indicators of compromise or classification data, to be recorded.
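Because the reports are plain OSV records, they can be fetched and filtered with a few lines of code. The sketch below builds a request for the osv.dev `v1/query` endpoint and picks malicious-package reports out of the returned records. It assumes those reports can be recognized by a `MAL-` ID prefix, which should be checked against the repository's documentation. The helper names are illustrative only.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_query(name, ecosystem, version=None):
    """Build the JSON body for an osv.dev package query."""
    query = {"package": {"name": name, "ecosystem": ecosystem}}
    if version is not None:
        query["version"] = version
    return query


def query_osv(name, ecosystem, version=None, timeout=10):
    """POST the query to osv.dev and return the list of matching OSV records."""
    body = json.dumps(build_query(name, ecosystem, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response).get("vulns", [])


def malicious_reports(records, id_prefix="MAL-"):
    """Keep only records whose ID starts with the assumed malicious-package prefix."""
    return [r for r in records if r.get("id", "").startswith(id_prefix)]


def affected_packages(record):
    """List (ecosystem, name) pairs from an OSV record's `affected` entries."""
    return [
        (a["package"]["ecosystem"], a["package"]["name"])
        for a in record.get("affected", [])
        if "package" in a
    ]
```

For example, `malicious_reports(query_osv("some-package", "npm"))` would return any malicious-package reports for that name, where `some-package` is a placeholder.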
The Malicious Packages repository is an open, community-driven project. To ensure the database has good coverage and maintains a high standard of quality, community members are encouraged to contribute new reports, amend and annotate existing reports, or even dispute reports. The repository also supports automated continuous ingestion for trusted researchers to feed data to the repository quickly and efficiently.
The repository already has over 15,000 reports of malicious packages, with current data being sourced from the OpenSSF Package Analysis project, Checkmarx security, and exports of malicious packages tracked by GitHub.
Get involved
Looking to the future, the Package Analysis team hopes to continue to grow the database by collecting more reports through contributions from security researchers. The team also hopes to enable contributors to enrich the reports with additional data about the malicious packages, such as indicators of compromise and attack classifications, by defining an extension to the OSV schema specifically for malicious packages. This additional data can help researchers identify trends and specific bad actors, rather than focusing solely on individual malicious packages, resulting in improved general countermeasures.
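To show how enriched reports could surface trends, the sketch below groups reports by a shared indicator of compromise. The extension described above had not been defined at the time of writing, so the layout used here (an `iocs` object under OSV's `database_specific` field) is purely an assumption, as are the field names.

```python
from collections import defaultdict


def group_by_indicator(reports, kind="domains"):
    """Map each indicator value to the IDs of reports that share it.

    Assumes a hypothetical layout: report["database_specific"]["iocs"][kind]
    is a list of indicator strings. Indicators seen in only one report are dropped.
    """
    groups = defaultdict(list)
    for report in reports:
        iocs = report.get("database_specific", {}).get("iocs", {})
        for value in iocs.get(kind, []):
            groups[value].append(report["id"])
    return {value: sorted(ids) for value, ids in groups.items() if len(ids) > 1}
```

Reports that share a domain or IP address may point to a single campaign or actor, which is the kind of trend the enriched data is meant to expose.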
The data can also be made more accessible to consumers by documenting how to use the report data and by building new integrations with tools and services. Our goal is for the data to be easily accessible, so the community can rapidly counter attacks.
Want to get involved? Please check out our project’s contribution guidelines, where you can learn how to contribute packages or code. Our team is available to chat in the OpenSSF Package Analysis Slack channel, or you can join the OpenSSF Securing Critical Projects Working Group for further discussion about the project.
About the Authors
Caleb Brown
Caleb Brown is a Senior Software Engineer on the Google Open Source Security Team, with over 20 years of industry experience. He is currently focused on building open source solutions for understanding the behavior of open source packages, and finding and reporting malicious packages.
Jossef Harush Kadouri
Jossef loves contributing to the open source community, and he’s ranked in the top 1% on Stack Overflow. In 2020, Jossef co-founded Dustico, a software supply chain security company acquired by Checkmarx in 2021, and he previously worked for several cybersecurity companies. Jossef and his research team hunt for software supply chain attackers and keep the ecosystem safe.