Online privacy is a huge problem for end users now. Our data is being appropriated and even stolen by corporations like Facebook, Google, Microsoft and even Apple. These corporations are obliged to share that data with the US government, which uses it and shares it as it pleases. It’s one thing when websites spy on you. What’s worse is that many web browsers actively participate and profit in the data aggregation stakes. Google, in particular, has played a leading role here with Chrome, which is effectively (il)legal spyware.
Browsers should be our software and should be neutral. We haven’t paid for browsers since Microsoft destroyed Netscape by making Internet Explorer as good as Navigator 4 and free. There have been some high spots over the years: The Omni Group built and sold the independent OmniWeb for many years on OS X. Opera has had its ups and downs in terms of protecting privacy, as has Firefox.
What we are interested in now is how the current crowd of browsers protects our privacy. All of the browsers are free, and none has a revenue model which does not depend on collecting money from ad companies, data miners or Google/Bing (for the search box). Privacy goes against their interests, as browser developers make more money when we have less privacy. On the other hand, if the browser developer cannot at least maintain the illusion of privacy, many users will avoid that browser, and without users there’s no data and no revenue.
Browsers phone home
Indeed, browsers do phone home, as documented in a comprehensive study by Douglas Leith of Trinity College Dublin, which focuses on the privacy risks associated with back-end data exchange.
Leith’s study sheds light on several aspects of these risks. It puts under academic scrutiny the data transmitted on browser startup and the data transmitted by search autocomplete, and it evaluates the privacy risks of popular back-end services. The browsers included in the study are Google Chrome, Mozilla Firefox, Apple Safari, MS Edge, and Brave.
The potential problem arises from the use of identifiers that allow tracking of an IP address over time, and from the sharing of details of visited web pages. For better understanding, Leith divides identifiers into four categories based on their persistence.
- Ephemeral identifiers link a handful of transmissions and are then reset.
- Session identifiers persist during the interval between restarts.
- Browser instance identifiers are created when the browser is first installed and persist until it is uninstalled; they are therefore durable across restarts.
- Device identifiers are derived from device details such as the serial number or hardware UUID. These identifiers persist across browser re-installs.
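As a rough illustration of how these four categories relate to each other, here is a minimal Python sketch. The names `Persistence` and `classify` and the three boolean flags are assumptions made for this example; they do not come from Leith's study, which classifies identifiers by observing real traffic.

```python
from enum import Enum


class Persistence(Enum):
    """The four identifier categories, ordered from least to most durable."""
    EPHEMERAL = 1         # reset after a handful of transmissions
    SESSION = 2           # reset when the browser restarts
    BROWSER_INSTANCE = 3  # reset by a fresh browser install
    DEVICE = 4            # survives browser re-installs


def classify(survives_restart: bool, survives_reinstall: bool,
             spans_whole_session: bool = True) -> Persistence:
    """Classify an observed identifier by the events it survives (hypothetical helper)."""
    if survives_reinstall:
        return Persistence.DEVICE
    if survives_restart:
        return Persistence.BROWSER_INSTANCE
    if spans_whole_session:
        return Persistence.SESSION
    return Persistence.EPHEMERAL


# An identifier that survives a restart but not a re-install
print(classify(survives_restart=True, survives_reinstall=False).name)  # BROWSER_INSTANCE
```

The further down the list an identifier sits, the longer a back-end server can tie requests to the same user.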
Mobile vs Desktop Versions
A very interesting finding of the study is that the mobile versions of browsers are generally much less private than their desktop counterparts. While Brave is deemed the most private browser, with its desktop and mobile versions testing equally well, the mobile versions of Chrome, Firefox, and Safari provide the user with significantly less privacy.
The situation on desktop is dire enough. Leith explains:
Chrome, Firefox and Safari all tag requests with identifiers that are linked to the browser instance (i.e. which persist across browser restarts but are reset upon a fresh browser install). All three share details of web pages visited with backend servers. This happens via the search autocomplete feature, which sends web addresses to backend servers in realtime as they are typed.
The consequences are rather scary:
The default settings are for Chrome to send addresses to Google servers, Firefox to Google servers and Safari to Google and Apple servers. In general, this function shares everything a user enters in the top bar, not just URLs. For example, if users accidently type (or cut and paste) passwords or other secrets in the top bar then these will also be shared.
Alongside the sent web addresses, desktop Safari uses ephemeral identifiers, Chrome tags them with a persistent identifier, allowing them to be linked together, while Firefox sends no additional identifiers. However, Firefox uses identifiers in its Telemetry transmitted to Mozilla. While both telemetry and search autocomplete can be disabled, Firefox also maintains an open websocket for push notifications. It is linked to a unique identifier and this feature, Leith warns, cannot be easily disabled.
More worrying, though, are the mobile versions. Firefox transmits the Google advertisingId and AndroidId to two third-party web sites: app.adjust.com and clients.google.com. Both identifiers are long-lived, persisting across browser re-installs. Mobile Firefox also sends device identifiers to the analytics service api.leanplum.com.
Unlike its desktop version, mobile Chrome prefetches content from third-party domains, some of which set cookies. Moreover, the iOS version of Chrome makes connections to app-measurement.com and firebaseinstallations.googleapis.com which carry browser instance identifiers.
In his study, Leith investigates each browser’s behavior on first startup, on page navigation, on restart, and while typing a URL. Let’s take a closer look.
Google Chrome
On first startup, according to the study, Google Chrome makes a number of network connections to various servers at Google-registered domains before the user clicks on the permission prompt displayed in the initial popup window. Although this is concerning, Leith concludes, no identifiers or personal information are transmitted to Google yet. However, when the “start google chrome” button appears (after allowing telemetry), a batch of connections follows, mostly to the CRX server checking for updates to Chrome extensions, and a device_id value set by the browser is sent in a call to accounts.google.com. The device_id value changes after fresh installs.
After a URL is pasted into the top bar, a request is generated to www.google.com/complete/search with the URL details passed as a parameter along with psi and sugkey, which are identifier-like quantities. Sugkey is an identifier tied to Chrome itself rather than to particular instances of it. The psi value changes across fresh restarts, so it can act like an identifier of an instance of Chrome. According to Leith:
This behaviour is reproducible across multiple fresh installs and indicates that user browsing history is by default communicated to Google.
Subsequent closing and reopening results in requests which contain data of a persistent-identifier nature. The first is a request to accounts.google.com/ListAccounts, which transmits a cookie that acts like a persistent identifier. The second request goes to www.google.com/async/newtab_ogb, which sends an x-client-data header.
Lastly, search autocomplete sends text to www.google.com as it’s typed, resulting in 19 requests. The psi value, which acts as a persistent identifier, is included in each request header, allowing the requests to be tied together.
Mozilla Firefox
After a fresh install and first start of Firefox, numerous connections are made to Mozilla-registered domains, and during startup three identifiers are transmitted to Mozilla. First, impression_id and client_id, randomized values set by the browser, are sent to incoming.telemetry.mozilla.org. Secondly, a uaid value, set by the server, is sent to Firefox by push.services.mozilla.com via a web socket. All three values persist across browser restarts but change across fresh reinstalls.
Once the startup is complete, Leith tests the top bar of the browser by pasting a URL into it (http://leith.ie/nothingtosee.html). This action, however, generates no extraneous connections.
Finally, the browser was closed and reopened. The closure initiated a data transmission to incoming.telemetry.mozilla.org, tagged with the persistent client_id identifier and a sessionID value. The sessionID changes between restarts, but the new sessionID values can be easily linked back to the old ones.
All in all, four identifiers are used in the communication with Mozilla domains:
- The client_id and impression_id values, which are set by the browser and persist across restarts.
- The sessionID value, which changes but can be linked back, since both the old and the new values are sent together in a telemetry handover message.
- The uaid value, which is set by the server and persists across browser restarts.
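To see why a session identifier that changes on every restart still does not prevent linking, consider this minimal Python sketch. It assumes, purely for illustration, that each handover message can be reduced to a pair (old session ID, new session ID) and that the pairs arrive in chronological order; the function name `link_sessions` and the sample IDs are hypothetical.

```python
def link_sessions(handovers):
    """Group session IDs into chains, given (previous_id, new_id) pairs in time order."""
    chain_of = {}  # session id -> first session id of the chain it belongs to
    for prev_id, new_id in handovers:
        root = chain_of.setdefault(prev_id, prev_id)
        chain_of[new_id] = root

    chains = {}
    for session_id, root in chain_of.items():
        chains.setdefault(root, []).append(session_id)
    return list(chains.values())


# Two browsers, each restarting and announcing old and new IDs together
observed = [("s1", "s2"), ("x1", "x2"), ("s2", "s3")]
print(link_sessions(observed))  # [['s1', 's2', 's3'], ['x1', 'x2']]
```

Anyone who receives both values in the same message can rebuild the full history of sessions, so the rotating value behaves much like a persistent one.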
Leith concludes that his observations are consistent with Firefox telemetry documentation and the concerning thing is not the content of these requests, but the fact that they carry the client IP address as metadata.
When it comes to data transmitted by search autocomplete, Firefox is less intrusive than its competitors. It sends text to www.google.com as it’s typed, which results in 4 requests, compared to 19 in the case of Chrome or 32 for Safari. Moreover, no identifiers are included in the requests to Google.
Apple Safari
On a fresh startup, Safari makes connections to several websites to prefetch content from wikipedia, twitter, yahoo, google, facebook, tripadvisor, yelp, linkedin and weather.com, along with ad trackers such as scorecardresearch.com, googlesyndication.com, googletagservices, moatads.com and perfectmarket.com. That is an awful lot, to say the least. As Leith summarizes:
Safari defaults to a choice of start page that leaks information to third parties and allows them to cache prefetched content without any user consent.
Furthermore, most of the connections to the aforementioned websites respond with multiple set-cookie headers although, as Leith explains, these cookies are not resent in later requests to the prefetched pages. On the other hand, the sites can embed identifiers within the prefetched html/javascript, which can then be passed as parameters in requests generated by clicking on the displayed icon.
After the startup, the research continues with a URL pasted into the top bar of the browser, which results in an extra connection to configuration.apple.com; this connection, however, sends no identifiers. Additionally, unlike in the case of Firefox, no data were transmitted on closing the browser. Reopening, though, results in a request that transmits an X-CloudKit-UserID header value, which appears to be a persistent identifier across browser restarts.
In conclusion, apart from the starting page, Leith states:
Safari otherwise appears to be quite a quiet browser, making no extraneous network connections itself in these tests and transmitting no persistent identifiers. However, allied processes make connections which appear unnecessary.
The worse part, though, comes with the data transmitted by search autocomplete. Safari sends text to both Google and Apple, resulting in a total of 32 requests: 7 requests go to clients1.google.com and 25 requests to api-glb-dub.smoot.apple.com. When it comes to Google, no identifiers are included. However, the requests to api-glb-dub.smoot.apple.com include several identifiers: X-Apple-GeoMetadata, which remains unchanged across fresh browser installs; X-Apple-UserGuid, which changes across fresh installs but remains constant across restarts; and X-Apple-GeoSession, whose header values also remain constant across browser restarts.
Microsoft Edge
After initial startup, Edge connects to several domains administered by Microsoft, namely microsoft.com, msn.com, and bing.com, as well as the ad tracking domain scorecardresearch.com. In general, the startup process of Edge is terribly invasive. It sends several session identifiers and, most worrisome of all, it transmits the hardware UUID (universally unique identifier), which is a strong, enduring identifier, unique to the device, that never changes. Another troubling issue is the Bing autocomplete API, which transmits even pasted URLs to www.bing.com and thus shares the browsing history with Microsoft’s Bing service. Yet another leak of users’ browsing history comes when Edge transmits the URL to nav.smartscreen.microsoft.com.
But this is not where it ends. After closing and reopening the browser, a set of connections is made, including a second transmission of the device hardware UUID to self.events.data.microsoft.com.
According to the study, Edge is by far the least private of the browsers examined, ranking even worse than Google’s Chrome. Given Microsoft’s deep history of partnership with the NSA, these results should be unsurprising. The current colourful leadership is just a different dress on the same handmaiden to the NSA.
Brave
On the other hand, Leith considers Brave the most private of the browsers inspected. His conclusion is overtly optimistic:
In summary, we do not find Brave making any use of identifiers allowing tracking by backend servers of IP address over time, and no sharing of the details of web pages visited with backend servers.
Regarding search autocomplete, which apparently causes a lot of trouble, Brave has it disabled by default. It also blocks ads.
Used “out of the box” with its default settings we found that Brave did not use identifiers allowing tracking of IP address over time, and did not share details of web pages visited with backend servers. In this regard we found Brave to be by far the most private of the browsers studied.
To sum this up, Leith’s study includes a table of data shared by browsers with back-end servers:
Browser | First Startup | Page Navigation | Restart | Typing URL |
---|---|---|---|---|
Brave | | | | |
Chrome | I,C | I,U | I,C | I,U |
Chrome Android | I,C,P | I,U | I,C | I,U |
Chrome iOS | I,C,P,F,T | I,U | I,C | I,U |
Firefox | I,T | | I,T | U |
Firefox Mobile | I,T,D | | I,T | U |
Safari | P | | I | U |
Edge | I,H,C,T | C,U | I,H,T | U |
Edge Mobile | I,D,C,T,P | C,U | I,D,T | U |
I=instance identifier (reset by a fresh browser install), D=device identifier (requires factory reset of device to change), H=hardware identifier (cannot be changed by user), C=cookie, P=prefetch of popular web pages e.g. youtube (the web sites can embed identifiers within the prefetched pages/javascript), F=firebase, T=telemetry, U=url (and so user browsing history). – as explained in the study.
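For readers who find the letter codes hard to scan, a small Python helper can expand a table cell into plain words. The `LEGEND` dictionary simply restates the key above; the function name `decode` is an assumption made for this sketch.

```python
# Legend restated from the study's key; "decode" is a hypothetical helper name.
LEGEND = {
    "I": "instance identifier",
    "D": "device identifier",
    "H": "hardware identifier",
    "C": "cookie",
    "P": "prefetch of popular web pages",
    "F": "firebase",
    "T": "telemetry",
    "U": "url (browsing history)",
}


def decode(cell: str) -> list[str]:
    """Expand a comma-separated cell such as 'I,H,C,T' into readable labels."""
    return [LEGEND[code.strip()] for code in cell.split(",") if code.strip()]


# Edge on first startup, as listed in the table
print(decode("I,H,C,T"))
# ['instance identifier', 'hardware identifier', 'cookie', 'telemetry']
```

An empty cell, as in the Brave row, decodes to an empty list.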
Additional concerns
Despite Brave’s good ranking in the study, several controversies associated with its BAT (Basic Attention Token) cryptocurrency have arisen, as Scott Ikeda argues in his article. This problem doesn’t violate users’ privacy, but it’s worth mentioning all the same, since Brave had seemingly set its standards a bit too high for such misconduct.
The problematic part is that Brave was auto-completing URLs of cryptocurrency sites with affiliate links without notifying users.
“Affiliate programs are a means of monetizing a website. In this case, a special URL is provided to an affiliate to promote a partner’s services. When someone signs up via that distinct URL, the affiliate gets some sort of payment for it,” he explains.
According to US as well as EU regulations, affiliates ought to disclose their relationship with the advertiser when links of this nature are posted. However, Brave violated these regulations with its referral scheme by automatically redirecting users to sites with Brave’s referral code when the URL of certain cryptocurrency sites was entered. Brave was earning affiliate commissions by redirecting search queries to Binance, Coinbase, Ledger and Trezor, among others.
It was spotted by @cryptonator1337 on Twitter: Brave hijacks links and inserts affiliate codes.
If you’re using Brave and try to go to the Binance crypto exchange, Brave hijacks the Binance link you typed in, and autofills with its own affiliate code.
On a positive note, Brave’s founder Eich apologized for this issue once it was disclosed by users and promised not to do it again, although he tried to justify the practice by stating that it had been visible in the open source code of the browser for months. Brave has to make money somewhere, and it looks like this is where Eich decided to make money. Frankly, a Brave affiliate ID tacked onto non-persistent links is not really a privacy issue. Whether it’s an ethical issue is up to each reader to decide.
Firefox represents another browser of choice when it comes to privacy, since it’s adjustable and, most of all, it belongs to neither Google nor Microsoft. However, Firefox has introduced two potentially troublesome features: DNS over HTTPS and the Firefox proxy extension (Firefox Private Network). The interesting part is that both of these features rely on the US-based CDN company Cloudflare, which is why Sven Taylor articulates his concerns in his article. First of all, with Firefox Private Network, only traffic through Firefox is encrypted, and all of that traffic is routed through Cloudflare. Sven explicitly states that:
As disclosed in the respective privacy policies, Cloudflare will be logging your source IP address and the sites you visit. Mozilla is also recording technical, interaction, and registration data.
DNS over HTTPS also relies on Cloudflare infrastructure; thus, it makes Cloudflare the central processing point for all DNS requests in Firefox by default.
When it comes to Safari, Google researchers have found vulnerabilities in its ITP mechanism, described in the study Information Leaks via Safari’s Intelligent Tracking Prevention. Sure, Google might not be the most trusted company, but the points they presented sound reasonable. After all, if Google finds vulnerabilities in a competitor’s product, quite naturally they’d take advantage of it to improve their bad rep. According to the paper, several privacy risks arise from the way ITP works. Since its machine learning is based on individual browsing patterns, ITP introduces settings into Safari that can be modified and detected by any web page on the internet. The paper claims that this capability can be a gateway for a host of severe attacks by websites, such as:
- Revealing domains on the ITP list
- Identifying individual visited websites
- Creating persistent fingerprint via ITP pinning
- Forcing a domain onto the ITP list
- Cross-site search attacks using ITP
As we can see, the protection by Apple that was meant to defend us on the internet, turns out to be really dangerous in itself.
As already noted from Leith’s study, Safari leaks a lot of data. As Beau Carnes, a teacher and software developer at FreeCodeCamp.org, elaborates in his article, while you are using Safari your IP address is sent to Google or Tencent, depending on your country. What’s most disturbing about this is the fact that Tencent is closely tied to the Chinese government. The feature is known as Fraudulent Website Warning, and here’s what Apple’s spokesman said to The Register about it:
When the feature is enabled, Safari checks the website URL against lists of known websites and displays a warning if the URL the user is visiting is suspected of fraudulent conduct like phishing. To accomplish this task, Safari receives a list of websites known to be malicious from Google, and for devices with their region code set to mainland China, it receives a list from Tencent.
In addition, he claims that the actual URL of a website is never shared with the safe browsing provider. Carnes argues, though, that your IP address is shared instead of the URL. He concludes that even though Fraudulent Website Warning can be turned off in the options, the feature is enabled by default, so the user must intentionally opt out.
Apple’s “Fraudulent Website Warning” uses the Google-developed Safe Browsing. But Safari is not the only browser that relies on Google Safe Browsing, which is not that safe after all. Matthew Green, a cryptographer and professor at Johns Hopkins University, elaborates on this topic in his article.
To protect users from phishing and malware websites, Google came up with the Lookup API, which sent full URLs, as well as the IP address, to Google to be checked against the list of potentially malicious websites. Since this feature raised reasonable privacy concerns, Google presented the Update API, which is at first glance more privacy-friendly and which, Green explains in his article, works like this:
- Google first computes the SHA256 hash of each unsafe URL in its database, and truncates each hash down to a 32-bit prefix to save space.
- Google sends the database of truncated hashes down to your browser.
- Each time you visit a URL, your browser hashes it and checks if its 32-bit prefix is contained in your local database.
- If the prefix is found in the browser’s local copy, your browser now sends the prefix to Google’s servers, which ship back a list of all full 256-bit hashes of the matching URLs, so your browser can check for an exact match.
This means a malicious provider will have many “bites at the apple” (no pun intended) in order to de-anonymize that user. A user who browses many related websites will gradually leak details about their browsing history to the provider, assuming the provider is malicious and can link the requests.
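The prefix scheme Green describes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Google's implementation: real Safe Browsing clients canonicalize URLs and hash several host and path combinations per URL, which is omitted here. The function names, the example URLs and the `fake_server` stand-in for the provider are all hypothetical.

```python
import hashlib

PREFIX_BYTES = 4  # a 32-bit prefix


def url_hash(url: str) -> bytes:
    """Full 256-bit SHA256 hash of a URL (no canonicalization in this sketch)."""
    return hashlib.sha256(url.encode("utf-8")).digest()


def build_prefix_db(unsafe_urls):
    """What the provider ships to the browser: truncated hashes only."""
    return {url_hash(u)[:PREFIX_BYTES] for u in unsafe_urls}


def check_url(url, local_prefixes, fetch_full_hashes):
    """Return (is_unsafe, prefix_revealed_to_provider or None)."""
    full = url_hash(url)
    prefix = full[:PREFIX_BYTES]
    if prefix not in local_prefixes:
        return False, None                    # nothing leaves the browser
    candidates = fetch_full_hashes(prefix)    # the provider learns this prefix
    return full in candidates, prefix


# Hypothetical provider-side data and a stand-in for the network call
unsafe = ["http://malware.example/payload"]
server_hashes = [url_hash(u) for u in unsafe]


def fake_server(prefix):
    return {h for h in server_hashes if h.startswith(prefix)}


db = build_prefix_db(unsafe)
print(check_url("http://malware.example/payload", db, fake_server)[0])  # True
print(check_url("http://benign.example/", db, fake_server)[0])          # False
```

The second element of the returned tuple is the privacy-relevant part: every time it is not `None`, the provider has received one more prefix that it can try to link with earlier ones.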
The Google Safe Browsing service has been adopted by most major web browsers: Chrome, Chromium, Safari and Opera. According to Gerbet’s study, Google Safe Browsing has been criticized because each request to the API sends a cookie identifying the client, which is the same cookie used by other Google services (1). Firefox and Chromium, in order to improve privacy, have isolated the Safe Browsing cookie from others. From a privacy point of view, though, the most intrusive Safe Browsing service might be MS SmartScreen, used by MS Edge, which unlike the Update API sends the full URL, not just a hash prefix. Microsoft retains a full record of every URL you visit attached to you, just as Google maintains a full record of every search you make.
Google’s business model evidently depends on heavy online marketing, which collides with rising privacy concerns and even legislation, such as GDPR. To overcome these issues, Google has come up with a supposedly perfect compromise called FLoC (Federated Learning of Cohorts), which is expected to satisfy the needs of advertisers and simultaneously act in a privacy-friendly manner. FLoC is a model currently being tested in the Chrome browser. Like any other Google-invented “privacy” feature, FLoC isn’t as benign as they claim.
So far, targeted marketing has relied on cookies associated with each user individually. Thus, advertisers were able to target ads by obtaining users’ browsing history. To deflect the harsh criticism or even legal consequences of this, Google decided to replace the individualised cookie with a “cohort” identifier, which represents a group of individuals with similar interests. But does this sly solution to Google’s problems really preserve users’ privacy? As Mozilla CTO Eric Rescorla explains in a Mozilla blog post, which stems from his study, it seems we should stay alert.
Despite the large cohorts (consisting of thousands of users), individuals sharing a given cohort ID could still be tracked. A certain amount of additional information could help trackers narrow down the number of users quickly. Is the user speaking English or French? What browser are they using? Is it Mac or Windows they operate on? What are their specific interests? Answers to these questions could help the process of singling out an individual user.
Furthermore, the cohort ID is not constant; it changes on an approximately weekly basis. If a tracker uses other information to link up user visits over time, it can use the combination of cohort IDs in the first, second, third week etc. and in this way individualize the user.
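The effect of combining cohort IDs over several weeks can be illustrated with a toy Python sketch. It takes an all-seeing view of who is in which cohort, which a real tracker would not have; the point is only to show how fast the set of users consistent with a sequence of cohort IDs shrinks. The user names, cohort labels and the function `narrow_candidates` are made up for this example.

```python
def narrow_candidates(weekly_cohorts, observed_ids):
    """Intersect the memberships of the cohorts one visitor reported, week by week."""
    candidates = None
    for members_by_cohort, cohort_id in zip(weekly_cohorts, observed_ids):
        members = set(members_by_cohort[cohort_id])
        candidates = members if candidates is None else candidates & members
    return candidates if candidates is not None else set()


# Toy population of six users, reshuffled into new cohorts each week
weeks = [
    {"A": {"u1", "u2", "u3", "u4"}, "B": {"u5", "u6"}},
    {"C": {"u1", "u2", "u5"}, "D": {"u3", "u4", "u6"}},
    {"E": {"u1", "u3", "u5"}, "F": {"u2", "u4", "u6"}},
]

# A visitor whose visits were linked over three weeks reported A, then C, then E
print(narrow_candidates(weeks, ["A", "C", "E"]))  # {'u1'}
```

Each cohort on its own hides the user in a crowd, but the sequence of cohorts acts as a fingerprint once visits can be linked across weeks.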
The security and privacy researcher Lukasz Olejnik found that FLoC even leaks whether the user is browsing in private mode or not, as a result of a design bug.
Concerning privacy design, he considers a number of pivotal questions. Which sites should be included in the computation? Google promised not to include sites with sensitive content, but then “sensitive” has to be defined. Other questions, he argues, concern the minimum and maximum number of websites considered, as well as their competition and nature, whether subpages are included, and the frequency of ID updates.
Lastly, Google will not run FLoC testing in Europe because of the strict privacy regulations. Obviously, Federated Learning of Cohorts, no matter how sophisticated, couldn’t satisfy the legal demands of GDPR and the ePrivacy Directive, because the tracked users are in fact identifiable, which EU legislation frowns upon, to say the least. I believe this insight only proves the nightmare Google represents for privacy, given that their most elaborate scheme to meet the privacy regulations cannot be deployed in Europe (yet).
Who’s the most private browser?
All things considered, of the major web browsers only Brave and Firefox (the latter after a few tweaks) can be considered private. Of course, there are several smaller browsers which might be a good option, but they may lack feature parity, since the development teams working on them are considerably smaller. Another issue might be that these alternatives to the popular browsers have not been studied much.
The main consideration in choosing an alternative, though, should be whether the source is open. In the case of completely free and open source browsers, like IceCat by the GNU Project or ungoogled-Chromium, privacy enthusiasts can identify potential problems by scrutinizing the source code, making the public aware of potential pitfalls.
Having said that, if you plan on sticking to one of the major browsers nonetheless, it’s important to opt out of tracking in the browser settings.
Footnotes
(1). For a more comprehensive understanding of the inner workings of Google Safe Browsing, see the study by Gerbet et al., A Privacy Analysis of Google and Yandex Safe Browsing, or the study by H. Cui et al., PPSB: An Open and Flexible Platform for Privacy-Preserving Safe Browsing.
The presence of orphan prefixes is very difficult to justify. Moreover, the behavior of a browser on these prefixes is not consistent. Some of the orphan prefixes are considered as false positives by Yandex while others are declared as true positives….The presence of large number orphans for Yandex proves that it is possible to include any arbitrary prefix in the blacklists. (Gerbet. p. 20)
After all, both studies as well as Green’s article agree on the vulnerabilities of the Update API: whenever multiple prefixes corresponding to a URL are sent to the servers, the URL can be re-identified. Thus, Google and Yandex (which provides Yandex Safe Browsing) have an effective tool for building a tracking system.
At Foliovision, we’d prefer to see no communication of URLs to the mothership when using Safe Browsing. The list should be static and local even if encoded. This does mean malware creators would be able to discover the full list of blacklisted URLs. It doesn’t really matter, as bad actors can easily check the status of any set of URLs which interest them, manually or systematically with automated tools. It would mean that end users could confidently use Safe Browsing without worrying that visiting URLs considered hostile or undesirable by the Five Eyes security establishment would red flag their accounts and browsing for deeper scrutiny.
For anyone suggesting this is paranoia or that Western democracies or Google would never do such things, do not forget that a recent CIA Director lied to Congress. If they lie to Congress (a crime), you can be sure they will lie to you and me. It would behoove “normal” sceptics to read through some of The Snowden Archive. Basically, when Edward Snowden revealed what he knew about NSA activities from his time in the machine, every conspiracy theory about the surveillance state was revealed to be not theory but everyday practice.
This is not to say the engineers who built Google Safe Browsing had bad intentions. Most of the scientists who originally attempted to harness atomic energy had no intention of blowing up hundreds of thousands of citizens in Hiroshima and Nagasaki. It’s mostly the commissars who follow the engineers who turn invention to evil.
The most private way to use Google Safe Browsing would be to maintain a browser with Google Safe Browsing disabled for highly political browsing. There are two issues with this:
- creating the alternative browser and maintaining it
- remembering to switch to the alternative browser
For those who lack such careful digital hygiene, there is Brave, which maintains its own copy of the Google Safe Browsing database:
Additionally, instead of connecting to Google, Brave’s version of Safe Browsing connects to a Brave-run server which doesn’t keep any logs or store your IP address. Learn more about the design of Safe Browsing. This is a really important safety feature, so it’s on by default, but you can turn it off.
Brave is less likely to report directly to the NSA than Google. That is, until it reaches critical mass. The way the GSB service is set up makes it a fundamentally weak link when trying to improve privacy while at the same time advancing security. GSB aggregation centres in privacy-protective jurisdictions would be the shortest path, as they would leave the NSA to root through the records of tens or hundreds of thousands of users to try to determine who visited what site, similar to safety in crowds. One has to trust the intermediary not to keep logs and to refuse to collaborate with authorities.
Image by Enrique Meseguer.