By Devashri Datta, Independent Researcher, Software Supply Chain Security
Third-party notices (TPNs) are documents distributed to users that list the open source third-party software components included in a product, along with key licensing information. If you have ever bought a TV or router, you have probably seen one. TPNs are among the most widely distributed and least understood artifacts in modern software supply chains, yet they were never designed for the complexity, scale, and velocity of today’s software ecosystem.
Inside nearly every appliance, firmware image, SaaS platform, and enterprise distribution, the same pattern persists: a long, unstructured PDF is expected to represent the full scope of open source license compliance.
As software systems have scaled, TPNs have quietly become a critical but increasingly fragile pillar of license compliance. They are now failing technically, operationally, and structurally under the demands of modern development and distribution.
This article examines why TPNs are breaking and outlines what the ecosystem must do next, drawing on a large-scale analysis of real-world TPN documents and the development of an automated framework for extracting information directly from them. While traditionally viewed as compliance artifacts, TPNs also represent an underutilized source of security-relevant intelligence. In many real-world scenarios where Software Bills of Materials (SBOMs) are incomplete, unavailable, or restricted, TPNs may provide the only observable evidence of component usage. This positions TPNs as a critical input to software supply chain security workflows, including vulnerability management, third-party risk assessment, and incident response.
The Hidden Reality: TPNs Are the Supply Chain’s Last Mile
Despite advances in Software Bill of Materials (SBOM) formats such as SPDX and CycloneDX, TPNs remain:
- The only compliance artifact that many vendors publicly distribute
- The only artifact available to customers or regulators for proprietary systems
- The only verifiable attribution record when source code and SBOMs are inaccessible
SBOMs provide structured visibility into software components, but their completeness depends on the generation methods and the availability of build-time data. In practice, SBOMs may not consistently capture full transitive dependencies or runtime-resolved components. In some cases, additional components and licensing details may appear in downstream artifacts such as third-party notices (TPNs), though these are typically not integrated into SBOM analysis pipelines. SBOM availability also varies across organizations and products and may not always be accessible to end users or external stakeholders due to policy or regulatory interpretation. Regulatory frameworks such as the EU Cyber Resilience Act (CRA) are evolving, and expectations around SBOM scope and disclosure remain subject to interpretation. As a result, relying solely on SBOM data may not provide complete visibility into whether a product contains a specific vulnerable component, depending on SBOM completeness and related artifact availability.
In practice, TPNs often serve as the last mile of compliance visibility, bridging internal software composition and external disclosure.
However, TPNs were never designed to operate at the scale or complexity of today’s supply chains.
Security Blind Spot in Software Supply Chains
While SBOMs and software composition analysis (SCA) tools have improved visibility during development, they assume access to structured or source-level data. In contrast, TPNs often represent the only externally available artifact in downstream consumption environments such as embedded systems, firmware, and proprietary SaaS distributions.
This creates a structural blind spot in software supply chain security: security teams are frequently forced to make risk decisions without machine-readable component intelligence. As a result, vulnerability exposure, dependency risk, and third-party software usage often remain partially or completely unobservable at the point of consumption.
Why the TPN Ecosystem Is Breaking
PDFs Are an Anti-Pattern for Machine-Readable Compliance
Most TPNs are distributed as large, heterogeneous PDFs containing:
- Multi-column layouts
- OCR artifacts and noise
- Inconsistent license formatting
- Duplicated or truncated license text
Beyond these layout problems, TPNs often omit component identifiers and specific version numbers.
PDFs are optimized for display, not structured data. As a result, extracting meaningful compliance information programmatically is extremely difficult.
Existing Compliance Tools Don’t Address the Problem
Current tools such as FOSSology, ScanCode, and ORT are designed to analyze source code or binaries—not TPN documents. Yet in many real-world scenarios, especially audits or vendor reviews, TPNs are the only artifact available.
This creates a fundamental gap: The most widely distributed compliance artifact is the least analyzable.
Inconsistent Generation Pipelines Lead to Data Drift
TPNs are generated through highly variable processes:
- Custom scripts
- Proprietary internal tooling
- Manual aggregation from legacy systems
- Partial or outdated SBOM exports
As a result, even TPNs from the same organization can vary significantly across releases, introducing inconsistencies, omissions, and misalignment with actual dependencies.
Scale Has Outpaced Human Review
Modern TPNs often span hundreds of pages across multiple license families and components.
Manual review has become increasingly impractical due to:
- Repetitive license text
- Poorly structured component mappings
- Lack of contextual metadata
- Hidden obligations within large text blocks
Compliance teams are effectively being asked to analyze documents at a scale that exceeds human capability.
Proposed Contribution: TPN-to-Security Intelligence Framework
This work introduces a systematic framework for transforming Third-Party Notices (TPNs) from unstructured compliance artifacts into structured security intelligence inputs. The framework addresses a critical gap in software supply chain security: the absence of machine-readable component visibility in downstream and vendor-distributed environments.
Unlike traditional software composition analysis tools that rely on source code, build artifacts, or SBOMs, this approach operates on TPNs as a primary data source. It enables the extraction, classification, and interpretation of software components and license obligations from highly unstructured documents.
The key contribution of this work is the demonstration that TPNs can be operationalized into actionable security intelligence for:
- Vulnerability exposure identification when SBOMs are unavailable
- Third-party risk assessment using externally visible artifacts
- Incident response prioritization based on inferred component usage
- Governance and compliance enforcement through structured outputs
Breaking the Logjam: Toward Automated License Intelligence
To address this systemic gap, I developed an automated end-to-end framework that treats TPNs as primary compliance artifacts, rather than secondary documentation.
The approach enables structured extraction and interpretation of license intelligence directly from unstructured documents. While TPNs may lack some information, they still provide valuable signals. For example, even without version identifiers, knowing that a product includes a component can be very valuable (e.g., when asking “which products contain a version of log4j that might be vulnerable to this attack?”).
Structured Extraction from Unstructured PDFs
Using normalization, segmentation, and page-level reconstruction, the system identifies and extracts coherent license blocks even from highly inconsistent documents.
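As a minimal sketch of what normalization and segmentation can look like on text already extracted from a PDF, consider the following. It is not the framework’s actual implementation: the heading patterns in `BLOCK_START` and the function names `normalize` and `segment` are assumptions made for illustration, and page-level reconstruction is left out.

```python
import re
import unicodedata

# Hypothetical heading patterns that often open a new license block.
# A real TPN corpus would need a much longer, curated list.
BLOCK_START = re.compile(
    r"^(?:MIT License|Apache License|Mozilla Public License|BSD \d-Clause"
    r"|GNU (?:LESSER |AFFERO )?GENERAL PUBLIC LICENSE)",
    re.IGNORECASE,
)


def normalize(raw: str) -> str:
    """Undo common PDF-extraction damage in plain text."""
    text = unicodedata.normalize("NFKC", raw)     # fold ligatures such as "fi"
    # Rejoin words split across lines. Caveat: this also merges words that
    # were legitimately hyphenated at a line end (e.g. "third-" / "party").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    text = re.sub(r"[ \t]+", " ", text)           # collapse spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)        # cap runs of blank lines
    return text.strip()


def segment(text: str) -> list[str]:
    """Split normalized text into blocks, starting a new one at each heading."""
    blocks, current = [], []
    for line in text.split("\n"):
        if BLOCK_START.match(line.strip()) and current:
            blocks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        blocks.append("\n".join(current).strip())
    return [b for b in blocks if b]
```

Heading-driven splitting is the simplest possible segmentation strategy. Documents that omit license headings, or repeat them inside running text, would need additional signals such as page boundaries or component names.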
License Identification and Classification
A hybrid approach combining rule-based methods and fuzzy matching maps extracted text into meaningful license categories:
- Permissive
- Weak copyleft
- Strong copyleft
- Proprietary
- Public domain
- Content licenses
- Unknown
This approach achieves, in my testing:
- 92–96% accuracy for permissive licenses
- 85–90% accuracy for copyleft detection
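The idea of layering rules over fuzzy matching can be sketched with the Python standard library alone. The sketch below is an illustration under stated assumptions, not the classifier that produced the figures above: the regular expressions, the two reference openings, the 0.85 threshold, and the function name `classify` are all hypothetical, and only a subset of the categories is covered.

```python
import re
from difflib import SequenceMatcher

# Rule layer: explicit license names or identifiers, checked in order.
# Weak copyleft comes first so "LGPL" is never read as "GPL".
RULES = [
    (re.compile(r"\bLGPL|GNU LESSER GENERAL PUBLIC LICENSE"
                r"|Mozilla Public License|\bMPL-\d", re.I), "weak copyleft"),
    (re.compile(r"\bAGPL|\bGPL|GNU (?:AFFERO )?GENERAL PUBLIC LICENSE", re.I),
     "strong copyleft"),
    (re.compile(r"\bMIT License\b|Apache License|BSD \d-Clause", re.I),
     "permissive"),
    (re.compile(r"public domain|\bCC0\b|Unlicense", re.I), "public domain"),
]

# Fuzzy layer: opening sentences of well-known license texts.
REFERENCES = [
    ("Permission is hereby granted, free of charge, to any person obtaining "
     "a copy of this software and associated documentation files",
     "permissive"),
    ("Redistribution and use in source and binary forms, with or without "
     "modification, are permitted provided that the following conditions "
     "are met", "permissive"),
]


def _squash(text: str) -> str:
    """Lowercase and collapse whitespace so line wrapping does not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def classify(block: str, threshold: float = 0.85) -> str:
    """Return a license category for a text block, or "unknown"."""
    for pattern, category in RULES:
        if pattern.search(block):
            return category
    head = _squash(block)
    best_category, best_score = "unknown", 0.0
    for reference, category in REFERENCES:
        ref = _squash(reference)
        # Compare the reference against an equally long prefix of the block.
        matcher = SequenceMatcher(None, ref, head[: len(ref)], autojunk=False)
        score = matcher.ratio()
        if score > best_score:
            best_category, best_score = category, score
    return best_category if best_score >= threshold else "unknown"
```

The fuzzy layer exists for blocks with no recognizable name or identifier, for example OCR-damaged text in which "Permission" has become "Perrnission". Anything below the threshold stays "unknown" rather than being forced into a category.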
Risk Interpretation
Each component is evaluated for compliance risk based on obligations such as:
- Attribution requirements
- Redistribution conditions
- Copyleft scope
- Source disclosure obligations
- Ambiguous or unidentified licenses
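One way to make such an evaluation concrete is a lookup from license category to obligation flags, with a coarse risk tier derived from them. The mapping, the tier names, and the `assess` function below are illustrative assumptions, not the framework’s actual scoring and not a legal interpretation of any license.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obligations:
    attribution: bool
    source_disclosure: bool
    copyleft_scope: str  # "none", "file/library", or "derivative work"


# Hypothetical, deliberately coarse mapping from category to obligations.
OBLIGATIONS = {
    "permissive": Obligations(True, False, "none"),
    "weak copyleft": Obligations(True, True, "file/library"),
    "strong copyleft": Obligations(True, True, "derivative work"),
    "public domain": Obligations(False, False, "none"),
}


def assess(component: str, category: str) -> dict:
    """Attach obligations and a coarse risk tier to one component."""
    ob = OBLIGATIONS.get(category)
    if ob is None:
        # Unknown or ambiguous licenses are routed to a human reviewer.
        return {"component": component, "category": category,
                "risk": "review",
                "reason": "ambiguous or unidentified license"}
    if ob.copyleft_scope == "derivative work":
        risk = "high"
    elif ob.source_disclosure:
        risk = "medium"
    else:
        risk = "low"
    return {"component": component, "category": category, "risk": risk,
            "attribution": ob.attribution,
            "source_disclosure": ob.source_disclosure,
            "copyleft_scope": ob.copyleft_scope}
```

Keeping unidentified licenses in a separate "review" tier, rather than scoring them low or high, reflects the last item in the list above: ambiguity is itself a risk signal.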
Visualization and Machine-Readable Outputs
The framework produces:
- Interactive dashboards
- Structured datasets
- Outputs compatible with governance workflows and SBOM pipelines
This demonstrates that meaningful compliance intelligence can be derived even from the most constrained artifact available. This closes a long-standing visibility gap in the software supply chain.
Security Implications of TPN Breakdown
The failure of TPNs is not only a compliance problem—it has direct consequences for software supply chain security. When TPNs are inconsistent, unstructured, or incomplete, they reduce the ability of downstream stakeholders to:
- Identify exposure to known vulnerable components
- Trace dependency relationships in third-party software
- Perform accurate third-party risk assessments
- Respond quickly to emerging vulnerabilities in production systems
This makes TPN degradation a security visibility problem, not just a documentation inefficiency.
What the Ecosystem Needs Next
TPN failures are not isolated inefficiencies. They represent a structural weakness in how the global software supply chain communicates compliance.
Addressing this requires coordinated effort across standards, tooling, and ecosystem alignment.
Standardized, Machine-Readable TPN Formats
The ecosystem needs formats beyond PDFs, such as:
- A standard TPN-JSON format
- SPDX-aligned TPN profiles
These would enable structured, interoperable compliance disclosures.
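To give a sense of how small such a format could be, here is a sketch of one possible TPN-JSON record and its round trip. No such standard exists today; the class `TpnEntry`, its field names, and the top-level `product` and `entries` keys are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TpnEntry:
    component: str
    license_id: str                  # assumed to hold an SPDX license identifier
    version: Optional[str] = None    # frequently missing in real TPNs
    copyright_notice: Optional[str] = None


def to_tpn_json(product: str, entries: list[TpnEntry]) -> str:
    """Serialize a product's notices into a hypothetical TPN-JSON document."""
    doc = {"product": product, "entries": [asdict(e) for e in entries]}
    return json.dumps(doc, indent=2, sort_keys=True)


def from_tpn_json(payload: str) -> list[TpnEntry]:
    """Parse the same hypothetical document back into entries."""
    doc = json.loads(payload)
    return [TpnEntry(**entry) for entry in doc["entries"]]
```

Making `version` optional is a deliberate choice in this sketch: a format that rejected entries without versions could not represent most of today’s notices.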
One possible longer-term solution is to embed machine-readable data (such as an SBOM in SPDX format or a TPN in JSON format) within the PDFs, creating a “hybrid PDF”. The PDF format already permits adding internal files (called “attached files”). LibreOffice already supports generating PDFs that embed the source document, allowing people to keep their existing process for exchanging display PDFs while also including machine-readable data. Tools that can quickly extract those embedded files, and complain when they are not present, could speed deployment of this approach. However, while the approach has promise, it does not help with existing documents, which do not embed this information.
Improved Support for Dependency Analysis
Unsurprisingly, many improvements for handling dependencies could help in processing TPNs, SBOMs, and many other related formats.
It would help to have shared reference corpora for license matching, because accurate license detection requires:
- Canonical license datasets
- Variant and legacy license mappings
- Community-maintained reference corpora
This would significantly improve consistency across tools and organizations.
In addition, there should be open APIs for information on licensing. Standard APIs should support:
- License extraction
- Component-to-license mapping
- Obligation and risk interpretation
This would enable interoperability between vendors, auditors, and regulators.
Integration Between SBOM and TPN Pipelines
Today, SBOMs and TPNs exist in disconnected workflows. Yet in many cases, TPNs provide the only information available about product components.
A unified pipeline would:
- Eliminate duplication
- Reduce inconsistencies
- Ensure alignment between internal and external disclosures
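A first step toward such a pipeline is simply comparing the component names in each artifact. The sketch below assumes both lists have already been extracted; the name normalization in `canonical` is a simplification, and the function names are hypothetical.

```python
import re


def canonical(name: str) -> str:
    """Lowercase and drop separators so 'Log4j_Core' equals 'log4j-core'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def reconcile(sbom_components: list[str], tpn_components: list[str]) -> dict:
    """Report which components appear in both artifacts or in only one."""
    sbom = {canonical(n): n for n in sbom_components}
    tpn = {canonical(n): n for n in tpn_components}
    return {
        "in_both": sorted(sbom[k] for k in sbom.keys() & tpn.keys()),
        "sbom_only": sorted(sbom[k] for k in sbom.keys() - tpn.keys()),
        "tpn_only": sorted(tpn[k] for k in tpn.keys() - sbom.keys()),
    }
```

Entries in `tpn_only` are components disclosed externally but absent from the SBOM, while `sbom_only` entries may indicate missing attribution. Either list is a prompt for review rather than proof of an error, since name matching alone can miss renamed or vendored components.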
Related Work
Prior efforts across the software supply chain ecosystem have focused on improving license detection and SBOM generation during development and build phases. However, these approaches often assume access to source code or structured metadata, leaving a visibility gap when Third‑Party Notices (TPNs) are the only available compliance artifact.
Related work on automating TPN analysis demonstrates how unstructured compliance documents can be transformed into machine‑readable license intelligence suitable for governance and audit workflows. Supporting datasets for compliance governance and SBOM alignment are described in:
Datta, D., **TPN Compliance Dataset for Software Supply Chain Governance**, Zenodo, 2025.
https://doi.org/10.5281/zenodo.19152619
Framework:
https://doi.org/10.5281/zenodo.19099831
Security Workflow Integration Model
The proposed framework reframes TPNs as an input layer in modern software supply chain security workflows. Rather than treating TPNs as static compliance documentation, they can be operationalized into structured security intelligence pipelines.
The extracted data can be integrated into:
- Vulnerability management systems (to identify exposed components when SBOMs are missing)
- Third-party risk management (TPRM) platforms (to assess supplier software risk)
- Incident response workflows (to rapidly evaluate exposure after CVE disclosures)
- DevSecOps pipelines (to enforce policy-based controls on software composition)
This positions TPN analysis as a bridge between compliance documentation and operational security decision-making.
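The incident response case can be illustrated with a small lookup over components extracted from TPNs. This is a sketch under stated assumptions: the inventory structure, the product names in the usage example, and the function `exposed_products` are hypothetical.

```python
def exposed_products(inventory: dict[str, list[str]], keyword: str) -> list[str]:
    """Return products whose TPN-derived component list mentions the keyword.

    Matching is by case-insensitive substring because TPNs often lack exact
    identifiers and versions, so the result is a list of candidates to
    investigate, not confirmed exposure.
    """
    needle = keyword.lower()
    return sorted(
        product
        for product, components in inventory.items()
        if any(needle in component.lower() for component in components)
    )


# Usage with a made-up inventory:
inventory = {
    "gateway-firmware": ["BusyBox", "OpenSSL", "zlib"],
    "admin-console": ["Apache Log4j Core", "Jackson Databind"],
    "billing-service": ["log4j-api", "Guava"],
}
candidates = exposed_products(inventory, "log4j")
```

Here `candidates` contains the two products that list a Log4j component. Without version data, each one still needs follow-up with the vendor or a binary inspection before it can be treated as affected.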
Conclusion: The Future Requires Fixing TPNs
Third-party notices (TPNs) were originally designed as simple attribution mechanisms and ways to declare licenses to recipients (as required by many licenses). Today, they are expected to support audits, transparency, regulatory compliance, and supply chain security.
But they are still delivered as static documents that do not scale.
TPNs are not failing because organizations lack intent; they are failing because the ecosystem has outgrown the tools and formats upon which it relies.
If we want a more transparent, auditable, and trustworthy software supply chain, TPNs must evolve into structured, machine-readable, and interoperable artifacts.
The next phase of open source security will not be defined solely by SBOMs or scanning tools, but by how effectively we solve the last mile of compliance visibility.
Fixing TPNs is an important step toward a more reliable and verifiable software ecosystem.
Acknowledgments
The author acknowledges David A. Wheeler and Sally Cooper for their insightful feedback and helpful discussions during the development of this work.
Resources
The open source implementation of the prototype described in this post, including parsing logic, license-classification rules, and the interactive dashboard, is available on GitHub for anyone interested in exploring or extending the approach:
https://github.com/devashridatta-dotcom/tpn-automation
Community feedback and contributions are welcome.
Author Bio
Devashri Datta is an AI and software supply chain security researcher and enterprise security architect focused on software supply chain security, DevSecOps automation, and security governance at scale. Datta’s research areas include SBOM governance, vulnerability intelligence (VEX), Third-Party Notice (TPN) analysis, AI-assisted risk modeling, and security exception management in cloud-native environments under compliance frameworks such as SOC 2, ISO 27001, and FedRAMP.




