Abstract
Website tracking methods such as third-party cookies and browser fingerprints enable companies to secretly collect user data for advertising and profiling. While some data collection brings convenience, unconsented and excessive tracking threatens privacy. This paper explains how third-party cookies transfer information between websites and how fingerprinting uniquely identifies devices. To limit tracking, the paper recommends changing Chrome browser settings, using the Startpage search engine, and installing the uBlock Origin extension. However, overly aggressive blocking risks breaking websites, and fingerprinting is necessary for online security. Thus, individuals must balance privacy protections with functionality loss. Overall, while no solution fully prevents tracking, informed users can utilize available tools to improve personal data control given the constraints of the modern web.
Introduction
When you shop for items online, have you ever found advertisement banners on other websites that display related content and nudge you to browse more? This isn’t magic; this is website tracking at work. Websites employ a wide array of tracking methods, such as deploying cookies and utilizing fingerprinting. Companies then use the collected data to create personalized advertisements and sell it to data brokers.
While companies try to limit users’ ability to prevent tracking, there is a solution: browser hardening. By circumventing a website’s tracking mechanisms, users can effectively reduce website-level tracking. For example, users can modify the browser’s settings, change the default search engine, and install privacy-focused extensions. More detailed measures that offer further protection include routing connections through private DNS servers and blocking third-party cookies. While not always practical, browser hardening allows technologically inclined users to limit data collection. However, systemic solutions beyond the individual level are still needed to address the website tracking problem.
Terminologies
It is necessary to distinguish three easily confused terms: website, browser, and search engine. A website is a collection of web pages that share the same domain. For example, “google.com” is a website, and web pages that have the same domain include “google.com/mail” (Gmail) and “google.com/meet” (Google Meet). A browser is software that can access websites; examples include Google Chrome, Microsoft Edge, and Safari. A search engine, such as Google, Microsoft Bing, or DuckDuckGo, displays a list of websites that are relevant to a search term. This article refers to “Google” as the search engine, and “Google Chrome” as the browser.
Figure 1. Parts of a URL. This essay refers to the top- and second-level domains together as the “domain”. [1]
How Do Cookies Work?
Third-party cookies are commonly used to record a user’s activity on the Internet. In general, cookies are small text files sent by a website and stored in a browser. They can save most types of text: punctuation, English letters, and numbers. An instance of cookie usage is the “remember me” checkbox on login pages: many pages use cookies to store the user’s information, such as the username or a token that keeps them signed in. In the image below, __cfuid represents a user’s ID that a website can recognize, along with other information (_ga, _gid) that the website needs to work. As such, most websites (besides simple ones like resume websites) rely on cookies to function.
Figure 2. An example of a cookie: plain text that concatenates information and is sent through a browser [2]
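As a rough illustration of the plain-text format in Figure 2, the sketch below splits a cookie string into name-value pairs with Python’s standard http.cookies module. The cookie names come from the text above; the values and the parse_cookie_header helper are made up for this example.

```python
from http.cookies import SimpleCookie
from typing import Dict

# Hypothetical cookie string: names are from the text, values are invented.
RAW_COOKIE_HEADER = "__cfuid=d8a1f3; _ga=GA1.2.1234; _gid=GA1.2.5678"


def parse_cookie_header(header: str) -> Dict[str, str]:
    """Split a 'name=value; name=value' cookie string into a dictionary."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}


cookies = parse_cookie_header(RAW_COOKIE_HEADER)
for name, value in cookies.items():
    print(f"{name} -> {value}")
```

Each pair is only text, which is why a website can read back an ID such as __cfuid on a later visit and recognize the same browser.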
Third-party cookies are cookies not owned by the website one is visiting. Say a person visits “amazon.com”: all cookies stored under “amazon.com” are first-party cookies, and those stored under any other domain are third-party cookies. First-party cookies are often needed: from helping the user stay logged in to saving items in a shopping cart, they are vital for websites to function [3]. Websites often create first-party cookies to store information that changes often.
If a website uses a service from an advertisement company, the ad company can easily set a cookie through the website: in the diagram below, when Site A connects to the ad server, the ad server sends back a cookie that the browser stores under the ad server’s domain. The ad server’s cookie can record the user’s behavior (e.g., browsing history) on Site A and return it to the ad server. When the user visits Site B, if Site B uses the same advertising service as Site A, the ad server recognizes the same cookie, so the ads shown on Site B can draw on the user’s behavior on Site A. In most cases, third-party cookies are used for advertising [4]. For example, when a user looks up Thanksgiving presents on a shopping website, they may start seeing Thanksgiving present ads on other websites. As soon as Thanksgiving is over and the user starts shopping for other items, the ad banners will update across websites that share the same advertisement provider.
Figure 3. A demonstration of how third-party cookies can “transfer” information from one site onto another [5]
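The flow in Figure 3 can be sketched as a small simulation. This is a simplified model under stated assumptions, not how any real ad network is implemented: the AdServer and Browser classes, the “uid” cookie name, and the “ads.example” domain are all hypothetical.

```python
import uuid
from collections import defaultdict


class AdServer:
    """Hypothetical ad server that recognizes browsers by a cookie it sets."""

    def __init__(self):
        self.profiles = defaultdict(list)  # user ID -> list of (site, page)

    def serve_ad(self, cookie_jar, site, page):
        # Reuse the ID stored in the browser, or issue a new one.
        user_id = cookie_jar.get("uid")
        if user_id is None:
            user_id = uuid.uuid4().hex
            cookie_jar["uid"] = user_id
        self.profiles[user_id].append((site, page))
        return list(self.profiles[user_id])


class Browser:
    """Hypothetical browser that keeps one cookie jar per domain."""

    def __init__(self, block_third_party_cookies=False):
        self.block_third_party_cookies = block_third_party_cookies
        self.cookie_jars = defaultdict(dict)

    def visit(self, site, page, ad_server, ad_domain="ads.example"):
        # With blocking on, the ad server never sees a stored cookie.
        if self.block_third_party_cookies:
            jar = {}
        else:
            jar = self.cookie_jars[ad_domain]
        return ad_server.serve_ad(jar, site, page)


ad_server = AdServer()
browser = Browser()
browser.visit("site-a.example", "/thanksgiving-presents", ad_server)
profile = browser.visit("site-b.example", "/news", ad_server)
print(profile)  # the ad server's view now spans both sites
```

Creating the browser with block_third_party_cookies=True makes every visit look like a new visitor to the ad server, which is the effect the blocking setting aims for in this model.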
What is Fingerprinting?
Fingerprinting refers to linking data to a browser using a set of parameters. If a computer has Google Chrome and Microsoft Edge, each browser will have its own fingerprint. A browser stores necessary information, such as time zone, language, and software version. These settings may seem insignificant, but when dozens are pieced together, they form a unique “fingerprint” that identifies a particular browser [8].
Websites can fingerprint a browser in many ways. One technique is canvas fingerprinting, which has the browser draw hidden graphics and records the rendered result. Canvas fingerprinting is based on how the same content is rendered differently on different devices, as those with different screen sizes and graphics cards will display the same webpage differently. There are more methods such as WebGL fingerprinting, but in short, all the techniques combined can collect enough information to identify 74% of desktop browsers [10], [11]. Fingerprinting was initially developed to block account takeover attempts, in which an attacker tries to log into a user’s account from a different device, but is now widely used for advertising [9].
Figure 4. How fingerprinting creates a unique identifier for a browser by collecting its attributes [11]
To advertisers, fingerprinting is advantageous over cookies. First, it is not as strictly regulated as cookies, as advertisers do not need to notify users of fingerprinting. Next, cookies can be deleted from a device, while fingerprint data cannot, as the data are stored remotely on a server. If an ad server relies solely on cookies, that information is gone when the user deletes all their cookies unless the server copies and stores the data [8]. Notably, a browser fingerprint is extremely accessible and can be recorded even in Incognito mode or Tor [11].
Fingerprinting poses severe privacy concerns in personalized advertising. Imagine a user in New York City who uses Google Chrome on a Windows 10 laptop. Their browser has a specific combination of settings: English as the language, Eastern Time Zone (UTC-5), and the latest Chrome version. Additionally, they may have a particular set of browser extensions. When they visit a website, the site can collect this information and combine it to create a unique fingerprint for their browser. This persistent identification allows websites and advertisers to track users’ online behavior and gather data about their interests without their explicit consent.
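To make the idea concrete, here is a minimal sketch of how the attributes in the example above could be combined into one identifier. It assumes a simple approach (hashing a canonical JSON encoding with SHA-256); real fingerprinting scripts collect many more signals, and the fingerprint function and the attribute names are illustrative only.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a set of browser attributes into a single stable identifier."""
    # Sorting keys makes the result independent of collection order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Attribute values follow the New York City example; extension names are made up.
new_york_user = {
    "browser": "Chrome",
    "os": "Windows 10",
    "language": "en-US",
    "timezone": "UTC-5",
    "extensions": sorted(["example-dark-mode", "example-pdf-viewer"]),
}

# The same browser with one setting changed yields a different identifier.
other_user = dict(new_york_user, timezone="UTC-8")

print(fingerprint(new_york_user))
print(fingerprint(other_user))
```

Because nothing is stored on the device, clearing cookies does not change the output; only changing the underlying attributes does.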
Why Should You Care?
Personalized interactions may violate one’s privacy. For example, a one-million-dollar Netflix contest in 2006 challenged contestants to develop the best algorithm to predict users’ movie ratings. Netflix’s dataset contained the subscriber ID, the movie title, the year of release, and the subscriber rating; only 16 days later, two University of Texas researchers matched users in the dataset to their IMDb profiles [12]. This example shows how one website’s data can be cross-referenced with another’s to build a user profile; in this case, the two sources were Netflix and IMDb.
The next reason is much more serious: what if you are considered an unsuitable job candidate or an insurance risk only because of your online interactions? As recruiting websites record your browser fingerprint, they can purchase a data broker’s database and look for a matching fingerprint from other websites. Once a fingerprint is matched, other cookies and fingerprints can be gathered. Have you looked up bankruptcy lawyers? A banking firm may label you with poor credit. Have you visited websites that suggest you may be gay? Employers that are hostile to gay employees may reject you [13], [14]. Although these inferred profiles may be wildly inaccurate, since multiple people may have similar browser profiles, businesses can choose to trust these fingerprints and discriminate against you for things that you have not done.
To further illustrate the potential consequences of personalized interactions, consider the hypothetical case of John. John is an avid user of a fitness tracking app that records his daily routines, heart rate, and sleep patterns. Unbeknownst to John, the fitness app sells his data to a data broker, which sells the data to a health insurance company, which then uses the information to determine that John is at a higher risk for certain health conditions. As a result, John’s insurance premiums may increase significantly, even though he has never been diagnosed.
In real life, refusing to offer loans or insurance to those in certain neighborhoods is called “redlining”; online, it is called “weblining,” which is legal and extremely difficult to prove. None of this is conspiracy: the Digital Advertising Alliance, a non-profit organization that advocates for responsible digital advertising, named fingerprinting “an adverse determination of a consumer’s eligibility for employment, credit standing, [and] health care treatment” [15]. If these reasons convince you to protect your digital privacy, it is time to take action.
Change These Settings if You Use Chrome
At 64% of the browser market share, Chrome is the most popular browser. As some users may not want to copy all their bookmarks and settings to another browser, changing these Chrome settings can greatly improve one’s privacy [16].
- Privacy and Security > Third-Party Cookies > Block third-party cookies
Given that Google is experimenting with blocking third-party cookies in Q1 2024 [17], web developers are adapting their sites to function without third-party cookies. At the time of writing, it is safe to enable this setting and block third-party cookies.
- Privacy and Security > Site settings
Check under “Permissions,” which lists the sites that have access to your location, camera, and microphone data. Revoke access for any websites that do not need such permissions.
- You and Google > Sync and Google services > Other Google Services
Disable “Help improve Chrome’s features and performance,” “Make searches and browsing better,” and “Improve search suggestions,” as these settings allow Google to collect your analytics, which may contain your browsing history, usage time, and more.
- Privacy and Security > Security > Use secure DNS
Select the “Custom” option, and fill in https://dns.quad9.net/dns-query. In short, a secure DNS (Domain Name System) server encrypts the lookups that translate domain names into IP addresses, so that others on the network cannot read them, and some providers also filter content. Quad9 is a privacy-focused DNS provider that blocks malicious web domains [18].
Figure 5. Screenshots of Google Chrome settings one can change to harden their browser with minimal effort (on Windows)
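For readers curious what a secure DNS lookup looks like, the sketch below builds the URL that a client could request from the Quad9 address given above, following the DNS-over-HTTPS GET format defined in RFC 8484. It only constructs the request and does not contact any server; the helper names are hypothetical, and Chrome’s own implementation may encode its lookups differently.

```python
import base64
import struct

QUAD9_DOH = "https://dns.quad9.net/dns-query"  # address from the setting above


def build_dns_query(hostname: str) -> bytes:
    """Build a minimal DNS query message asking for an A (IPv4) record."""
    # Header: ID 0, flags 0x0100 (recursion desired), one question, no records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    question = b""
    for label in hostname.rstrip(".").split("."):
        encoded = label.encode("ascii")
        question += bytes([len(encoded)]) + encoded  # length-prefixed label
    question += b"\x00"  # root label ends the name
    question += struct.pack("!HH", 1, 1)  # QTYPE = A, QCLASS = IN
    return header + question


def doh_get_url(hostname: str, endpoint: str = QUAD9_DOH) -> str:
    """Encode the query as unpadded base64url in the 'dns' parameter."""
    message = build_dns_query(hostname)
    encoded = base64.urlsafe_b64encode(message).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"


print(doh_get_url("www.example.com"))
```

The request travels over HTTPS, so an observer on the same network sees a connection to the DNS provider but not which domain was looked up.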
In terms of privacy, there are better browser options such as Brave, Firefox, and Mullvad. Regardless, this section is aimed at those who want to keep Google Chrome but still reduce their exposure to unconsented data collection.
Google.com without Google – Startpage
Google records all of a device’s activity, such as its location and IP address, even in Incognito mode [19]. Yet many private search engines have a usability problem: their search results are not as accurate as Google’s. For users who recently moved to private search engines, Google’s accuracy is hard to sacrifice in exchange for “privacy”. However, there is now a solution, as the search engine Startpage offers the precision of Google search without compromising privacy.
Startpage acts as a middleman between the user and Google. When one searches with Startpage, it first encrypts the search query. Even U.S. agencies will not see the search content, as only non-U.S. administrators manage the servers. Furthermore, according to Startpage, they are not required to retain personal data in their servers’ location, so they have no data to turn over in the first place [20], [21]. Next, Startpage removes any identifying information, such as IP address and geolocation. As such, when Startpage sends the search query to Google, Google cannot identify the original user. Google then returns search results to Startpage’s servers, which encrypt the results and send them back to be decrypted on the user’s device [22].
Figure 6. A brief display of Startpage’s search process and how it guarantees anonymity
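The middleman step can be pictured with a short sketch. This is not Startpage’s actual implementation: the SearchRequest class, the list of identifying headers, and the proxy address are assumptions chosen only to show the idea of forwarding a query without the user’s details.

```python
from dataclasses import dataclass, field

# Assumed set of headers that could identify a user; not an official list.
IDENTIFYING_HEADERS = {"cookie", "user-agent", "referer", "x-forwarded-for"}


@dataclass
class SearchRequest:
    query: str
    ip_address: str
    headers: dict = field(default_factory=dict)


def anonymize(request: SearchRequest, proxy_ip: str = "203.0.113.10") -> SearchRequest:
    """Return the request a middleman would forward: same query, no user details."""
    safe_headers = {
        name: value
        for name, value in request.headers.items()
        if name.lower() not in IDENTIFYING_HEADERS
    }
    # The upstream search engine sees the proxy's address, not the user's.
    return SearchRequest(query=request.query, ip_address=proxy_ip, headers=safe_headers)


original = SearchRequest(
    query="bankruptcy lawyers near me",
    ip_address="198.51.100.23",
    headers={"Cookie": "uid=abc123", "Accept-Language": "en-US"},
)
forwarded = anonymize(original)
print(forwarded)
```

In this model the search engine still receives the exact query text, which is why the results match, but it has nothing that ties the query to a particular person.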
To illustrate the benefits of using Startpage, consider a journalist investigating a sensitive topic, who needs to gather accurate information without revealing their identity. By using Startpage, the journalist can access Google’s precise search results while maintaining their anonymity, as Startpage encrypts the search query and removes any identifying information before sending it to Google. This way, the journalist can obtain the necessary information without leaving a digital trail that could compromise their investigation or put them at risk.
Amend What Browsers Miss – Extensions
Extensions are small programs that add functionality to a browser. Arguably the most vital extension in the privacy sphere is uBlock Origin. For one, uBlock Origin lets users choose among different “blocking modes,” which trade off blocking as many intrusive advertisement servers as possible against the effort of manually testing which domains should and should not be blocked [23].
Figure 7. A diagram that demonstrates the efficiency of different uBlock Origin blocking modes [23]
The main distinctions among blocking modes are the number of filter lists and whether third-party scripts are allowed. Filter lists contain domains known to track users or collect data; as such, when more filter lists are applied, more domains are blacklisted, providing stronger protection. Still, as many websites require third-party scripts to function (these are different from third-party cookies, as shown in the following diagram), using stricter blocking modes may break sites. In other words, even though stricter blocking modes automatically filter out more domains that track users, they will inevitably block some domains that a site needs to function, and the user must unblock such domains manually.
Figure 8. The difference between third-party scripts and third-party cookies
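The trade-off between blocking modes can be sketched as a small decision function. This is a simplified model, not uBlock Origin’s real matching engine: the BlockingMode class, the example domains, and the rule that a “domain” is the last two labels of a hostname (as defined under Figure 1) are assumptions for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class BlockingMode:
    filter_list: frozenset           # domains known to track users
    block_third_party_scripts: bool  # stricter modes turn this on


def base_domain(url: str) -> str:
    """Top- and second-level domain of a URL (a deliberate simplification)."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def should_block(mode: BlockingMode, page_url: str, request_url: str,
                 resource_type: str) -> bool:
    # Rule 1: anything on the filter list is blocked.
    if base_domain(request_url) in mode.filter_list:
        return True
    # Rule 2: strict modes also block scripts from any other domain.
    is_third_party = base_domain(page_url) != base_domain(request_url)
    return (mode.block_third_party_scripts
            and resource_type == "script"
            and is_third_party)


relaxed = BlockingMode(frozenset({"tracker.example"}), False)
strict = BlockingMode(frozenset({"tracker.example"}), True)

page = "https://news.example.org/article"
player = "https://cdn.videoplayer.example/player.js"  # needed by the page
print(should_block(relaxed, page, player, "script"))
print(should_block(strict, page, player, "script"))
```

In this model the strict mode blocks the harmless video player along with the tracker, which mirrors the site breakage described above; the user would then unblock that domain by hand.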
Why Is Browser Hardening Not Default?
Many technical aspects of browser hardening require a transition period. For example, Google estimates that web developers need a year to stop relying on third-party cookies [17]. This estimate demonstrates the prevalence of third-party cookies, and it is unrealistic for mainstream browsers to stop supporting them before web developers completely remove them.
In addition, some websites will not function properly if all fingerprinting attributes are blocked. One example is “responsiveness,” or the ability of a website to display elements in a user-friendly manner across different devices [24]. Specifically, a website must know the screen resolution and available hardware to render content. Creating a browser that disables all such attributes will indeed prevent fingerprinting, but will also keep website elements from displaying in their intended layout. As an aside, there is a fingerprinting-blocking technique called “letterboxing” in Firefox configurations that simulates commonly used screen resolutions by adding padding around the website’s borders [25], but it may heavily degrade user experience.
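The idea behind letterboxing can be sketched in a few lines. The step sizes below (200 pixels wide, 100 pixels tall) are assumptions for illustration and may not match what a given Firefox version uses; the letterbox function itself is hypothetical.

```python
def letterbox(width: int, height: int,
              width_step: int = 200, height_step: int = 100) -> dict:
    """Round a window size down to a common size and pad the remainder."""
    # Fall back to the real size if the window is smaller than one step.
    reported_width = (width // width_step) * width_step or width
    reported_height = (height // height_step) * height_step or height
    return {
        "reported": (reported_width, reported_height),
        "padding": (width - reported_width, height - reported_height),
    }


# Two differently sized windows end up reporting the same dimensions.
print(letterbox(1366, 768))
print(letterbox(1300, 720))
```

The padding is the cost: part of the window shows an empty margin instead of content, which is the degraded experience mentioned above.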
Take the case of a user who installs uBlock Origin. When the user visits a news website, the website might rely on certain scripts or elements to display its content properly. If uBlock Origin blocks these scripts due to its strict settings, the user may be unable to read the article or navigate the site effectively. To address this issue, the user would need to manually whitelist the website or adjust the extension’s settings to allow the necessary scripts. This example demonstrates how aggressive browser hardening can lead to a compromised user experience.
Outlook
Unfortunately, browser hardening may still be required. Although laws such as the California Consumer Privacy Act (CCPA) and the European Union’s General Data Protection Regulation (GDPR) force websites to notify users of cookie usage, they do not regulate fingerprinting. One reason is that fingerprinting is necessary in some contexts. For instance, fingerprinting blocks account takeover attempts: once an account is bound to a device’s fingerprint, login attempts on another device with a different fingerprint may raise security flags. Fingerprinting also labels suspicious connections, such as those possibly made from a virtual machine or through a proxy, which is vital to banking websites [9].
Furthermore, there are countless tracking methods, with third-party cookies and fingerprinting merely two of them; many tracking methods are also too covert to be detected reliably [26]. Even assuming that comprehensive legislation can be established in certain regions, it is almost impossible to enforce it universally, as servers based in different regions must comply with their own local legislation as well.
The best a user can do is to stay informed and vigilant about browser hardening and learn their preferences through continuous tweaking. While enhanced privacy protections may break website functionality and degrade user experience initially, each individual must weigh these tradeoffs against threats to their personal data. From altering browser settings to minimizing collected data by changing search engines and using tracker-blocking extensions, there are many steps all users can take to protect their privacy in the current online ecosystem.
Further Readings
- Firefox and Brave-specific settings: https://www.privacyguides.org/en/desktop-browsers/
- Complete uBlock Origin guide: https://github.com/gorhill/uBlock/wiki
- Secure DNS (Domain Name System): https://us.norton.com/blog/privacy/secure-dns (ignore the Norton advertisements and read only the explanations)
Multimedia Applications
- Firefox hardening guide 1: https://www.youtube.com/watch?v=F7-bW2y6lcI
- Firefox hardening guide 2: https://www.youtube.com/watch?v=Fr8UFJzpNls
- Chromium-based hardening guide: https://www.youtube.com/watch?v=2EQuLIVXXMc
Contact: wangwarr@usc.edu | phone available upon request
Works Cited
[1] “Parts of a URL: A Short Guide.” Accessed: May 07, 2024. [Online]. Available: https://blog.hubspot.com/marketing/parts-url
[2] “Requests – Working with Cookies.” Accessed: Feb. 17, 2024. [Online]. Available: https://www.tutorialspoint.com/requests/requests_working_with_cookies.htm
[3] “All you need to know about third-party cookies.” Accessed: Jan. 22, 2024. [Online]. Available: https://cookie-script.com/all-you-need-to-know-about-third-party-cookies.html
[4] “What is Third-Party Ad Serving?,” All About Cookies. Accessed: Feb. 03, 2024. [Online]. Available: https://allaboutcookies.org/ad-serving
[5] “Blocking Third-Party Hands from the Cookie Jar,” CSS-Tricks. Accessed: Jan. 22, 2024. [Online]. Available: https://css-tricks.com/blocking-third-party-hands-from-the-cookie-jar/
[6] M. Graham and J. Elias, “How Google’s $150 billion advertising business works,” CNBC. Accessed: Jan. 22, 2024. [Online]. Available: https://www.cnbc.com/2021/05/18/how-does-google-make-money-advertising-business-breakdown-.html
[7] “Google Ads – Get Customers and Sell More with Online Advertising.” Accessed: May 07, 2024. [Online]. Available: https://ads.google.com/intl/en_us/home/
[8] P. Townshend, “Browser fingerprinting: Everything you need to know,” SmartFrame. Accessed: Jan. 22, 2024. [Online]. Available: https://smartframe.io/blog/browser-fingerprinting-everything-you-need-to-know/
[9] T. Kadar, “What is Browser Fingerprinting & How Does it Work?,” SEON. Accessed: Jan. 22, 2024. [Online]. Available: https://seon.io/resources/browser-fingerprinting/
[10] P. Hraška, “We collected 500,000 browser fingerprints. Here is what we found.,” Slido developers blog. Accessed: Jan. 22, 2024. [Online]. Available: https://medium.com/slido-dev-blog/we-collected-500-000-browser-fingerprints-here-is-what-we-found-82c319464dc9
[11] “The Top Browser Fingerprinting Techniques Explained,” Fingerprint. Accessed: Jan. 22, 2024. [Online]. Available: https://fingerprint.com/blog/browser-fingerprinting-techniques/
[12] “5 data breaches: From embarrassing to deadly – Netflix accidentally reveals rental histories (1) – CNNMoney.com.” Accessed: Feb. 03, 2024. [Online]. Available: https://money.cnn.com/galleries/2010/technology/1012/gallery.5_data_breaches/index.html
[13] D. Tynan, “Four reasons why you should worry about online tracking (and advertising isn’t one of them),” Computerworld. Accessed: Feb. 03, 2024. [Online]. Available: https://www.computerworld.com/article/2710565/four-reasons-why-you-should-worry-about-online-tracking–and-advertising-isn-t-one-of-them-.html
[14] “The good, the bad and the ugly sides of data tracking.” Accessed: Feb. 03, 2024. [Online]. Available: https://internethealthreport.org/2018/the-good-the-bad-and-the-ugly-sides-of-data-tracking/
[15] “DigitalAdvertisingAlliance.org | DAA Expands Self-Regulatory Framework Beyond OBA.” Accessed: Feb. 03, 2024. [Online]. Available: https://digitaladvertisingalliance.org/blog/daa-expands-self-regulatory-framework-beyond-oba
[16] “Privacy Respecting Web Browsers for PC and Mac – Privacy Guides.” Accessed: Jan. 22, 2024. [Online]. Available: https://www.privacyguides.org/en/desktop-browsers/
[17] “Prepare for phasing out third-party cookies | Privacy Sandbox,” Google for Developers. Accessed: Jan. 22, 2024. [Online]. Available: https://developers.google.com/privacy-sandbox/3pcd
[18] “Quad9 | A public and free DNS service for a better security and privacy,” Quad9. Accessed: Jan. 22, 2024. [Online]. Available: https://quad9.net/
[19] “Does google search keep log of search history while being in incognito mode (without account) – Google Search Community.” Accessed: Jan. 22, 2024. [Online]. Available: https://support.google.com/websearch/thread/210775248/does-google-search-keep-log-of-search-history-while-being-in-incognito-mode-without-account?hl=en
[20] “Where are your servers located, and how does this impact government/legal requirements to record, share, or hand over data?,” Startpage Support. Accessed: Jan. 22, 2024. [Online]. Available: https://support.startpage.com/hc/en-us/articles/4616037775892-Where-are-your-servers-located-and-how-does-this-impact-government-legal-requirements-to-record-share-or-hand-over-data
[21] “Startpage’s EuroPriSe audit,” Startpage Support. Accessed: Jan. 22, 2024. [Online]. Available: https://support.startpage.com/hc/en-us/articles/4455288188820-Startpage-s-EuroPriSe-audit
[22] Startpage, “How does Startpage’s Private Search Engine work?,” Startpage.com Blog. Accessed: Jan. 22, 2024. [Online]. Available: https://www.startpage.com/privacy-please/startpage-articles/how-does-startpages-private-search-engine-work
[23] “Blocking mode · gorhill/uBlock Wiki.” Accessed: Jan. 23, 2024. [Online]. Available: https://github.com/gorhill/uBlock/wiki/Blocking-mode
[24] “Viewport meta tag – HTML: HyperText Markup Language | MDN.” Accessed: Jan. 23, 2024. [Online]. Available: https://developer.mozilla.org/en-US/docs/Web/HTML/Viewport_meta_tag
[25] “Firefox to add Tor Browser anti-fingerprinting technique called letterboxing,” ZDNET. Accessed: Jan. 23, 2024. [Online]. Available: https://www.zdnet.com/article/firefox-to-add-tor-browser-anti-fingerprinting-technique-called-letterboxing/
[26] “JavaScript Bundlers: In-Depth Guide,” Snipcart. Accessed: Feb. 03, 2024. [Online]. Available: https://snipcart.com/blog/javascript-module-bundler