Chrome's hidden X-Browser-Validation header reverse engineered

URL: github.com
9 comments

Dug into chrome.dll and figured out how the x-browser-validation header is generated. Full write up and PoC code here: https://github.com/dsekz/chrome-x-browser-validation-header

Why do you think Chrome bothers with these extra headers? Anti-spoofing, bot detection, integrity or something else?

> Why do you think Chrome bothers with these extra headers? Anti-spoofing, bot detection, integrity or something else?

Bot detection. It's a menace to literally everyone. Not to piss anyone off, but if you haven't dealt with it, you don't have anything of value to scrape or get access to.

Bullshit. You don't have anything of value either. Scrapers will ram through _anything_, and figure out if it's useful later.

Making it easier to reject "unapproved" or "unsupported" browsers and take away user freedom. Trying to make it harder for other browsers to compete.

That can be done already based on User-Agent, though. Other browsers don't spoof their agent strings to look like Chrome, and never have (or, they do, but only in the sense that everyone still claims to be Mozilla). And browsers have always (for obvious reasons) been very happy to identify themselves correctly to backend sites.

The purpose here is surely to detect sophisticated spoofing by non-user-browser software, like crawlers and robots. Robots are in fact required by the net's Geneva Convention equivalent to identify themselves and respect limitations, but obviously many don't.

I have a hard time understanding robot detection as an issue of "user freedom" or "browser competition".

>I have a hard time understanding robot detection as an issue of "user freedom" or "browser competition".

The big one is that running a browser other than Chrome (or Safari) could come to mean endless captchas, degrading the experience. "Chrome doesn't have as many captchas" is a pretty good hook.

> I have a hard time understanding robot detection as an issue of "user freedom" or "browser competition".

In the name of robot detection, you can lock down devices, require device attestation, prevent users from running non-standard devices/OS/software, and prevent them from accessing websites (CloudFlare dislikes non-Chrome browsers and hates non-standard browsers, ReCaptcha locks you out if you're not on Chrome-like/Safari/Firefox). Web Environment Integrity[1] is also a good example of where robot detection ends up affecting the end user.

[1] https://en.wikipedia.org/wiki/Web_Environment_Integrity

Seems like they are using these headers only for google.com requests.

Yes, I think it is part of their multi-level testing for new version rollouts. In addition to all the internal unit and performance tests, they want an extra level of verification that weird things aren't happening in the wild.

Is it not likely that it protects against AI bot Llama?

I have two questions:

1. Do I understand correctly that the validation header is individual for each installation?

2. Is this header only in Google Chrome or also in Chromium?

>1. Do I understand correctly that the validation header is individual for each installation?

I'm not sure how you got that impression. It's generated from fixed constants.

https://github.com/dsekz/chrome-x-browser-validation-header?...

It's still not clear to me because it's called the default API key. And for me, default means that this is normally overwritten. And if overwritten, during build or during install? That's what I'm asking myself.

This should be somewhat alarming to anyone who already knows about WEI.

I wonder if "x-browser-copyright" is an attempt at trying to use the legal system to stifle competition and further their monopoly. If so, have they not heard of Sega v. Accolade ?

I'm a bit amused that they're using SHA-1. Why not MD5, CRC32, or (as the dumb security scanners would recommend) even SHA256?

I am also alarmed. Google has to split off its development of both Chrome and Android now, this crazy vertical integration is akin to a private company building and owning both the roads AND the cars. Sure, you can build other cars, but we just need to verify that your tires are safe before you can drive on OUR roads. It's fine as long as you build your car on our complete frame, you can still choose whatever color you like! Also, the car has ads.

Ok but The Road is the internet, how much of that does google/alphabet actually own?

All of YouTube. The vast majority of email. All sources of revenue for ad-funded sites, basically, except for those ads pushed by Meta in their respective walled gardens. They are also the gatekeepers deciding what parts of the internet the users actually see, and they continuously work towards preventing people from actually visiting other sites by siphoning off information and keeping users on Google (AMP, AI summaries). The whole Play Store ecosystem is a walled garden which pretends to be open by building on an ostensibly open source OS but adding strict integrity checks on top, which gives Google the ultimate power to decide what is allowed to run on people's phones.

They don't have to own the servers and the pipes if they own all the clients, sources of revenue, distribution platforms and financial transaction systems.

The rest of your list is unrealistic, but I had to react to at least this one:

> The vast majority of email.

Not even close, less than a third in reality

I agree that Google should be cut down, but if that's done then the other tech giants should be too, otherwise we're just trading one master for another.

Even less than a third is absolutely massive on the scale of a protocol like E-mail.

Oh, I am not saying they're not a gigantic provider. I'm saying less than a third is very far from "the vast majority", and exaggeration and misinformation help no one's case, whether on purpose or due to lack of knowledge.

I would shy away from calling them a majority myself, but it’s a fair point.

Remember that email involves at least two parties. It doesn’t matter if I use a non-Google provider, I still have to follow all of Google’s email rules, or email will be useless to me because I wouldn’t be able to send mail to Gmail or Google Workspace users.

In a practical sense, Google have very direct control over almost all email.

They're probably the biggest provider in existence.

I myself would bet on Microsoft.

How much is "the vast majority"? I would say that one third of something global with a potentially infinite number of providers, when the second player is probably a fraction of that, is already a pretty big majority.

I don't know exactly where to draw the line on "the vast majority," but surely it must be higher than the bar for a simple majority, which is "more than half." If you want to describe something in the lead but under the 50% mark, the word you're looking for is "plurality."

> They don't have to own the servers and the pipes if they own all the clients, sources of revenue, distribution platforms and financial transaction systems.

They don't own all sources of revenue. Even on their major media platform, revenue gets siphoned off by companies like Patreon. It is all a charade and not everyone is enamoured by that.

> how much of that does google/alphabet actually own?

A ton. They got shares in a bunch of submarine cables, their properties (YouTube, Maps, Google Search) make up a wide share of Internet traffic, they are via Google Search the chief traffic source for most if not all websites, they own a large CDN as well as one of the three dominant hyperscalers...

> I wonder if "x-browser-copyright" is an attempt at trying to use the legal system to stifle competition and further their monopoly. If so, have they not heard of Sega v. Accolade ?

My first thought was the Nintendo logo used for Gameboy game attestation.

I wonder what a court would make of the copyright header. What original work is copyright being claimed for here? The HTTP request? If I used Chrome to POST this comment, would Google be claiming copyright over the POST request?

SHA-1 is a head-scratcher for sure.

I can only assume it's the flawed logic that it's "reasonably secure, but shorter than sha256". Flawed because SHA1 is broken, and SHA256 is faster on most hardware, and you can just truncate your SHA256 output if you really want it to be shorter.
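To illustrate the truncation point, here is a minimal Python sketch that cuts a SHA-256 digest down to SHA-1's 20-byte length. The function name `truncated_sha256` and the default length are made up for this example; nothing here is taken from Chrome.

```python
import hashlib


def truncated_sha256(data: bytes, length: int = 20) -> bytes:
    """Return the first `length` bytes of the SHA-256 digest of `data`.

    The default of 20 bytes matches the size of a SHA-1 digest.
    The name and default are illustrative assumptions.
    """
    if not 1 <= length <= 32:
        raise ValueError("length must be between 1 and 32 bytes")
    return hashlib.sha256(data).digest()[:length]


if __name__ == "__main__":
    sample = b"example input"
    # Both values print as 40 hex characters (20 bytes).
    print(hashlib.sha1(sample).hexdigest())
    print(truncated_sha256(sample).hex())
```

The output is the same size as SHA-1, so it fits anywhere a SHA-1 digest would.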

SHA-1 is broken for being used in digital signature algorithms or for any other application that requires collision resistance.

There are a lot of applications for which collision resistance is irrelevant and for which the use of SHA-1 is fine, for instance in some random number generators.

On the CPUs where I have tested this (with hardware instructions for both hashes, e.g. some Ryzen and some Aarch64), SHA-1 is faster than SHA-256, though the difference is not great.

In this case, collision resistance appears irrelevant. There is no point in finding other strings that will produce the same validation hash. The correct input strings can be obtained by reverse engineering anyway, which has been done by the author. Here the hash was used just for slight obfuscation.

The perf difference between SHA1 and SHA256 was marginal on the systems I tested (3950x, M1 Pro), which makes SHA256 a no-brainer to me if you're just picking between those two (collision resistance is nice to have even if you "don't need it").

You're right that collision resistance doesn't really matter here, but there's a fair chance SHA1 will end up deprecated or removed from whatever cryptography library you're using for it, at some point in the future.

There’s also the downside of every engineer you onboard spending time raising the same concern, and being trained to ignore it. You want engineers to raise red flags when they see SHA-1!

Sometimes something that looks wrong is bad even if it’s technically acceptable.

> have they not heard of Sega v. Accolade ?

My mind went here immediately as well, but some details are subtly different. For example being a remote service instead of a locally-executed copy of software, Google could argue that they are materially relying on such representation to provide any service at all. Or that without access to the service's code, someone cannot prove this string is required in order to interoperate. It also wouldn't be the first time the current Supreme Court took advantage of slightly differing details as an excuse to reject longstanding precedent in favor of fascism.

And even if it falls under fair use in the US, they could still have a case in some other relevant market. The world is a big place

I have to imagine Google added these headers to make it easier for them to identify agentic requests vs human requests. What angers me is that this is yet another signal that can be used to uniquely fingerprint users.

It doesn't really meaningfully increase the fingerprinting surface. As the OP mentioned the hash is generated from constants that are the same for all chrome builds. The only thing it really does is help distinguish chrome from other chromium forks (eg. edge or brave), but there's already enough proprietary bits inside chrome that you can easily tell it apart.

> The only thing it really does is help distinguish chrome from other chromium forks (eg. edge or brave)

You could already do that with the user agent string. What this does is distinguish between Chrome and something else pretending to be Chrome. Like, say, a Firefox user who is spoofing a Chrome user agent on a site that blocks, or reduces functionality for, the Firefox user agent.

Plenty of bots pretend to be Chrome via user agent, but if you look closely are actually running Headless Chromium. This is a very useful signal for fraud and abuse prevention.

Let's ignore for the moment that this has been reverse engineered.

If they only look at this header, then legitimate users using non-chrome browsers will get treated as bots.

If these headers are only used for Chrome user agents, then it would be easy to bypass by using headless Chromium with a user agent that spoofs Firefox or Safari.

> This is a very useful signal for fraud and abuse prevention.

Like people spoofing the Chrome UA in Firefox to avoid artificial performance degradation inflicted by Google on their websites...

I'm more concerned that whether intentional or not this will probably cause problems for users who use non-chrome browsers. Like say slowing down requests that don't have this header, responding with different content, etc.

User-agent discrimination has been happening for literally decades at this point, but you're right that this could make things worse.

User-agent discrimination is tolerable when it's Joe Webmaster doing it out of ignorance. It is not acceptable if it is being used by a company leveraging their dominant position in one market to gain an advantage over its competitors in another market. It's not acceptable even if it's not said company's expressed intent to do so but merely a "happy accident" that is getting "overlooked".

Indeed, even for those who require a round of mental gymnastics before they concede that monopolies are, like, "bad" or whatever, GP points out precisely how this would constitute "consumer harm".

Tell that to Google intentionally slowing down Firefox even without ad blocking. (I'm talking about them using the fallback for web components instead, not the slowdowns when ads don't load.)

Why would they think this was a good idea after losing the chrome anti-trust trial? I don't know what the intended purpose is, but I can see several ways this could be used in an anti-competitive way, although now that it has been reverse engineered, an extension could spoof it. On the other hand, I wonder if they intend to claim the header is a form of DRM and such spoofing is a DMCA violation...

> after losing the chrome anti-trust trial?

There hasn't been such a trial.

x-browser-copyright seems like an attempt at something similar to the Gameboy's nintendo-logo DRM (wherein cartridges are required to have the nintendo logo bitmap before they can boot, so any unlicensed carts would be trademark infringement)

http://en.wikipedia.org/wiki/Sega_Enterprises_Ltd._v._Accola... is the legal precedent that says trying to do that won't work, but then again maybe Google thinks it's invincible and can do whatever it wants after it ironically defeated Oracle in a case about interoperability and copyright.

Even if they can't defend it legally, it costs them ~nothing to add the header and it could still act as a deterrent.

Apple famously does this with this word soup in their SMC chips, and proceeded to bankrupt a company that sold Hackintoshes and shipped it in their EFI: https://en.wikipedia.org/wiki/Psystar_Corporation

    Our hard work
    by these words guarded
    please don't steal

    (c) Apple Computer Inc

Though one could argue that they would have probably bankrupted them anyway even if they hadn't done that.

>an extension could spoof it

not if they make it dynamic somehow (e.g. include current day in hash). Then with MV3 changes that prevent dynamic header manipulation there is no way for an extension to spoof it.
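A purely hypothetical sketch of the "include current day in hash" idea, in Python. Nothing in the write-up says Chrome does this. The constants, the function name `daily_validation_header`, the concatenation order, and the Base64 encoding are all assumptions made for illustration.

```python
import base64
import hashlib
from datetime import date

# Hypothetical placeholders, not real Chrome constants.
API_KEY = "EXAMPLE_API_KEY"
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"


def daily_validation_header(api_key: str, user_agent: str, day: date) -> str:
    """Hash the fixed inputs together with an ISO date (YYYY-MM-DD).

    Hypothetical scheme: the value changes whenever `day` changes.
    """
    material = f"{api_key}{user_agent}{day.isoformat()}"
    digest = hashlib.sha1(material.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    print(daily_validation_header(API_KEY, USER_AGENT, date.today()))
```

Under this assumed scheme a header value copied into a static rule stops matching the next day, which is the point being made about extensions. Anyone who knows the inputs can still recompute it outside the browser.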

> Then with MV3 changes that prevent dynamic header manipulation

That doesn't apply to Firefox

Fair, I was considering chrome headless since firefox users are already served google captchas more often.

FYI: Google enterprise workspace admins can enable policies which, e.g., restrict login to google.com properties to Chrome browsers only.

I wonder if this header is connected in some way to that feature.

Seems unnecessary.

The same policies also offer the ability to force-install an official Google "Endpoint Verification" chrome extension which validates browser/OS integrity using Enterprise Chrome Extension APIs ("chrome.enterprise") [0] only available in force-installed enterprise extensions.

FWIW, in my years of managing enterprise chrome deployments, I haven't come across the feature to force people to use Chrome (there are a lot of settings, maybe I've missed this one). But, there definitely is the ability to prevent users from mixing their work and non-work gmail accounts in the same chrome profile.

[0] https://developer.chrome.com/docs/extensions/reference/api/e...

Edit: Okay, maybe one hole in my logic is the first-sign in experience. When signing into google for the first time in a new chrome browser, the force-installed extension wouldn't be there yet. Although Google could hypothetically still allow the login initially, but then abort/cancel the sign in process as part of the login flow if the extension doesn't sync and install (indicating non-chrome use).

In my current job we do have the force-Chrome setting enabled. I can't log in to Gmail through any other browser, nor can I use SSO login to GitHub via Google.

I think it’s difficult to argue that Google doesn’t have the right and capability to build their own private internet, I just also think they’d like to make the entire internet their own private internet, and do away with the public internet, and I’d really prefer they not do that.

So this is basically hidden client attestation?

Not really. It's just an API key + the user agent. There is no mechanism to detect the browser hasn't been tampered with. If you wanted to do that you'd at least include a hash over the browser binary, or better yet the in-memory application binary.
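A minimal Python sketch of what "API key + the user agent" could look like, using the SHA-1 mentioned elsewhere in the thread. The key and user agent below are placeholders, not the real constants. The concatenation order and the Base64 output encoding are assumptions; the linked write-up is the authority on the actual details.

```python
import base64
import hashlib

# Hypothetical placeholders, not the constants found in chrome.dll.
API_KEY = "EXAMPLE_API_KEY"
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36"
)


def validation_header(api_key: str, user_agent: str) -> str:
    """SHA-1 over api_key + user_agent, returned as Base64 text.

    The order of the inputs and the Base64 encoding are assumptions.
    """
    digest = hashlib.sha1((api_key + user_agent).encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    print(validation_header(API_KEY, USER_AGENT))
```

Both inputs are fixed strings, so any client that knows them produces the same value. Nothing in the computation measures the browser binary, which is why this is not attestation.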

If you were using a user agent spoofing extension couldn't this be used to guess your "real" UA?

And why should anyone with a sane mind (except for Googlers) allow this kind of validation bs to exist?

At this point I am fully convinced that Google is abusing Chrome's dominant position to push their own agenda and shape the whole Internet the way they want. Privacy sandbox, manifest v3, you name it.

Sadly nobody can do anything about it so far. We'll have to wait and see the outcome of the antitrust trial.