Earlier this year, as a 10th grader, Nathan Naveen gave a talk about the OpenSSF Criticality Score at Open Source Summit North America. Here, Nathan takes a look at why understanding tools like the Criticality Score is a valuable skill for anyone involved in open source contributions, no matter your age.
By Nathan Naveen
In a world where open source projects fuel innovation and drive the digital revolution, their importance cannot be overstated. But how do we measure the significance of a project? Is it by the number of contributors? The frequency of commits? Or perhaps the organizations backing it? Enter the OpenSSF Criticality Score, a tool that unlocks the secrets of open source projects’ criticality. Join me as we dive into the depths of this scoring system and uncover its potential, its limits, and its ability to reshape the way we evaluate and prioritize open source endeavors.
What is the OpenSSF Criticality Score?
To begin understanding the complexities, let’s clarify the purpose of the Criticality Score. Essentially, it is a scoring system that assesses the relative importance of an open source project based on various signals and weights. These signals may include the number of contributors, commit frequency, and the number of organizations contributing to a project. Weights represent the assigned value of importance for each signal. By combining the weights with the signal values using an algorithm, the project’s score can be calculated.
Under the hood, the Criticality Score employs an algorithm that considers various “weights” assigned to these signals, which an end-user can adjust. The more weight a signal has, the more it influences the final score. For instance, if you assign a higher weight to the number of contributors than to other signals, projects with more contributors will have a higher criticality score.
The algorithm has three variables: a_i, S_i, and T_i. Here a_i is the weight of the i’th signal, S_i is the value of the i’th signal, and T_i is the threshold of the i’th signal. The threshold is the value at which a signal’s contribution is capped, so a signal at or above its threshold counts in full. To simplify the equation, we can substitute X_i for the part of it that depends only on S_i and T_i.
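For reference, the formula given in the Criticality Score project's documentation (credited there to Rob Pike) can be written as below. The weight is written as a_i to match this post; the project documentation uses the Greek letter alpha for the same quantity. The second line shows the substitution of X_i.

```latex
C_{\text{project}} = \frac{1}{\sum_i a_i} \sum_i a_i \, \frac{\log(1 + S_i)}{\log(1 + \max(S_i, T_i))}

X_i = \frac{\log(1 + S_i)}{\log(1 + \max(S_i, T_i))}
\quad\Longrightarrow\quad
C_{\text{project}} = \frac{\sum_i a_i X_i}{\sum_i a_i}
```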
Now the equation looks like a weighted arithmetic mean. A weighted arithmetic mean is an average in which each value counts in proportion to its weight, instead of every value counting equally. Here, we are averaging the X_i values, with a_i as the weights.
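As a minimal sketch of how a weighted mean differs from a plain average, the Python snippet below averages two values twice: once with equal weights and once with the first value weighted three times as heavily. The function name `weighted_mean` and the numbers are made up for illustration and are not part of the Criticality Score tool.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical X_i values (already scaled to the range 0..1).
x = [0.8, 0.2]

plain = weighted_mean(x, [1, 1])     # equal weights: an ordinary average
weighted = weighted_mean(x, [3, 1])  # first value counts three times as much

print(f"plain={plain:.2f} weighted={weighted:.2f}")
```

With equal weights the result sits halfway between the two values; raising the first weight pulls the result toward the first value.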
To help understand, let us take a quick example repository. In this example, we are only calculating the score with two signals, contributor count and commit frequency.
Since we only have two signals (contributor count and commit frequency), and since summation is just shorthand for adding multiple values together, the sums expand into two terms each.
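Written out for the two signals, with index 1 standing for contributor count and index 2 for commit frequency (the numbering is an arbitrary choice made here), the weighted mean of the X_i values becomes:

```latex
C_{\text{project}} = \frac{a_1 X_1 + a_2 X_2}{a_1 + a_2}
```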
Now we know we just have to solve for this simplified version.
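The two-signal calculation can be sketched in Python as follows. Every number here (signal values, thresholds, and weights) is hypothetical and chosen so the logarithms come out to round fractions; the helper `normalized_signal` is an illustrative name, not a function from the Criticality Score tool. It assumes X_i is the log-scaled ratio used in the project's published formula.

```python
import math


def normalized_signal(value, threshold):
    """Return X_i = log(1 + S_i) / log(1 + max(S_i, T_i)), a value in [0, 1].

    Assumes the threshold is positive, so the denominator is never zero.
    """
    return math.log(1 + value) / math.log(1 + max(value, threshold))


# Hypothetical repository: all values below are made up for illustration.
contributor_count, contributor_threshold, contributor_weight = 99, 9999, 2
commit_frequency, commit_threshold, commit_weight = 9, 999, 1

# log(100) / log(10000) = 1/2
x_contributors = normalized_signal(contributor_count, contributor_threshold)
# log(10) / log(1000) = 1/3
x_commits = normalized_signal(commit_frequency, commit_threshold)

# Weighted mean of the two X_i values.
score = (contributor_weight * x_contributors + commit_weight * x_commits) / (
    contributor_weight + commit_weight
)
print(f"{score:.4f}")  # prints 0.4444
```

Because each X_i lies between 0 and 1 and the weights here are positive, the resulting score also lies between 0 and 1.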
To install Criticality Score, download the latest release. It will have the following to calculate the criticality score:
A Real-World Example
To gain a better understanding of this concept, let’s walk through a real-world demonstration. The Criticality Score tool was utilized to evaluate the dependencies of several OpenSSF projects. The process involved the following steps:
- Getting Signal Data: The first step was to extract the signal data from the projects. The input is a file of dependencies, parsed into GitHub repo URLs, which I have called parsed.txt in this demo. This was done using the command:
criticality_score -scoring-disable -depsdev-disable -format csv -out signals.csv parsed.txt
- Calculating the Score: The next step was to calculate the Criticality Score from our signal data. We can first calculate the score using the Criticality Score’s default weights, original_pike.yml. This was done using the command:
cat signals.csv | scorer -config config/scorer/original_pike.yml - > score.csv
This command calculates the Criticality Score using the default weights specified in that configuration file.
The results were intriguing. The most critical projects included ‘moby/moby’ with a score of 0.83385 and ‘prometheus/prometheus’ with 0.80515. On the other hand, ‘remyoudompheng/go-dbus’ and ‘bugsnag/osext’ scored 0.13531 and 0.15976, respectively, making them the least critical projects in this set.
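Picking out the highest and lowest scores from a results file can be sketched as below. This is only an illustration: the column names `repo` and `score` are hypothetical placeholders (check the header row of your own score.csv for the real names), and the sample rows simply reuse the four scores quoted above instead of reading the actual file.

```python
import csv
import io


def rank_by_score(csv_file, repo_column, score_column):
    """Return (repo, score) pairs sorted from highest to lowest score."""
    rows = csv.DictReader(csv_file)
    pairs = [(row[repo_column], float(row[score_column])) for row in rows]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Stand-in for score.csv; the column names are assumptions for this sketch.
sample = """repo,score
bugsnag/osext,0.15976
moby/moby,0.83385
remyoudompheng/go-dbus,0.13531
prometheus/prometheus,0.80515
"""

ranked = rank_by_score(io.StringIO(sample), "repo", "score")
for repo, score in ranked:
    print(f"{score:.5f}  {repo}")
```

To run it against a real file, pass an open file object in place of the `io.StringIO` wrapper and supply the actual column names.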
The Criticality Score algorithm is highly customizable, allowing for weights to be adjusted according to your preferences. To demonstrate this, the Criticality Score was recalculated with modified weights.
Specifically, the weight for commit frequency was increased from 1 to 10 in the configuration file, creating a new configuration named pike_modified.yml. This meant that projects with higher commit frequencies would have a higher Criticality Score.
The Criticality Score was then recalculated using the modified weights with the command:
cat signals.csv | scorer -config config/scorer/pike_modified.yml - > score_modified.csv
This resulted in different scores for the projects.
The most critical projects with the modified weights were ‘moby/moby’, with a score of 0.69588, and ‘hashicorp/consul’, with 0.66780. The least critical projects remained the same, but their scores dropped even further. This experiment demonstrates the direct influence that weight modification has on the final Criticality Score.
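The effect of raising one weight can be sketched with two hypothetical projects: one with many contributors but few commits, and one with few contributors but many commits. The project names, signal values, thresholds, and the contributor weight below are all made up for illustration; only the change of the commit frequency weight from 1 to 10 mirrors the experiment above. The scoring function assumes the log-scaled weighted mean from the project's published formula and is not the tool's own code.

```python
import math


def criticality(signals, weights, thresholds):
    """Weighted mean of log-scaled signals, each capped at its threshold."""
    weighted_sum = sum(
        weights[name]
        * math.log(1 + signals[name])
        / math.log(1 + max(signals[name], thresholds[name]))
        for name in weights
    )
    return weighted_sum / sum(weights.values())


# Illustrative thresholds and hypothetical projects.
thresholds = {"contributor_count": 5000, "commit_frequency": 1000}
projects = {
    "many-contributors": {"contributor_count": 999, "commit_frequency": 9},
    "many-commits": {"contributor_count": 9, "commit_frequency": 999},
}

default_weights = {"contributor_count": 2, "commit_frequency": 1}
modified_weights = {"contributor_count": 2, "commit_frequency": 10}

default = {n: criticality(s, default_weights, thresholds) for n, s in projects.items()}
modified = {n: criticality(s, modified_weights, thresholds) for n, s in projects.items()}

for name in projects:
    print(f"{name}: default={default[name]:.3f} modified={modified[name]:.3f}")
```

With the default-style weights the contributor-heavy project ranks first; once commit frequency is weighted at 10, the commit-heavy project overtakes it, even though neither project's signal data changed.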
A Note on the Limitations of the Criticality Score
While the Criticality Score provides a valuable perspective on the activity and engagement around open source projects, it’s important to understand its limitations. The Criticality Score primarily measures activity, not necessarily criticality. A high score often indicates significant importance, but a low score doesn’t mean a project isn’t critical: many critical projects score low because they have fewer contributors or less frequent commits, and some of them are inactive yet widely depended upon. The Criticality Score is one of many signals that can help identify critical software. Therefore, while it’s a useful tool, it should be used as part of a larger process to assess the importance of open source projects.
A Broader Perspective on Assessing Open Source Projects
As we explore tools like the Criticality Score, it’s essential to remember that no single tool provides a complete picture. The beauty of the Criticality Score lies in its flexibility and the ability to adjust weights according to what matters most to you. However, many other systems also use a weighting system, and the complexity of the Criticality Score’s system does not necessarily make it superior. It’s a piece of the puzzle, valuable for providing insights into project activity and contributing to a broader assessment of a project’s importance. As we continue to develop and refine tools for evaluating open source projects, it’s crucial to consider multiple viewpoints and approaches to get the most accurate results.
The journey with the OpenSSF Criticality Score has been one of learning and exploration. Delving into the intricacies of this tool has been a joy, and this blog post aims to help readers understand its significance and inner workings. Understanding tools like the Criticality Score is a valuable skill for anyone involved in open source contributions. So, don’t hesitate. Start exploring the Criticality Score for your projects today!
About the Author
Nathan Naveen, an 11th grader, is passionate about algorithms and has solved over 1100 LeetCode problems. He is actively involved in coding competitions, contributes to open source projects, and is a jiu-jitsu practitioner. Recently he also spoke about the Criticality Score at Open Source Summit North America 2023. To learn more about Nathan, please visit his LeetCode profile, GitHub profile, and his blog.