How to Avoid Bot Detection: Client Checks
This section is all about the wonderful extensibility of JavaScript and the browser.
Client checks are run by a JavaScript library before any sensitive information is shown. Lots of metrics covering browser performance, screen size and graphics rendering are compiled into a nice little package and sent off to a validation server.
So here’s a bot detection script out in the wild. Looks pretty neat, right?
The code is obfuscated, but with a little reverse engineering we can get something that looks a bit more like this (function names added by me; the implementation is just decoded).
After digging through a few of these scripts, we can group the checks into a few categories:
- Browser property existence
- Math operation performance
- Font rendering
- System information (screen size, graphics card name, etc.)
- GPU rendering
How to masquerade browser properties
Detection scripts want to know if you’re a real browser, and if you don’t know better, there are lots of obvious giveaways.
Browser automation tools tend to leave properties behind, some of them so on-page scripts can reference the controller. Properties like navigator.webdriver, which the WebDriver spec has browsers set to true while they're under automation, are a clear indication that this is a scraping operation.
So how do we “remove” properties? Well, a wonderful thing about JavaScript is the ability to create getters/setters.
Object.defineProperty(navigator, 'webdriver', {get: () => false})
Tada! navigator.webdriver now reports false, the same value a browser that isn't being automated reports.
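Defining the getter directly on the object, as in the one-liner above, adds an own property that a script can spot with `Object.getOwnPropertyNames(navigator)`. A minimal sketch of a quieter variant, assuming the property is an accessor on the prototype (WebIDL attributes such as `webdriver` live on `Navigator.prototype`), redefines it there instead. `spoofGetter` is a hypothetical helper and `FakeNavigator` is a stand-in so the sketch runs outside a browser; in a page you would call `spoofGetter(Navigator.prototype, "webdriver", false)` before any detection script runs.

```javascript
// Hypothetical helper: replace an accessor's getter on a prototype so every
// instance reports `value`, without adding an own property to the instance.
function spoofGetter(proto, name, value) {
  const original = Object.getOwnPropertyDescriptor(proto, name);
  Object.defineProperty(proto, name, {
    get: () => value,
    // keep the original configurable/enumerable flags
    configurable: original ? original.configurable : true,
    enumerable: original ? original.enumerable : true,
  });
}

// Stand-in for the browser's Navigator interface (assumption for this sketch).
class FakeNavigator {
  get webdriver() {
    return true; // what an automated browser reports
  }
}

const fakeNavigator = new FakeNavigator();
spoofGetter(FakeNavigator.prototype, "webdriver", false);

console.log(fakeNavigator.webdriver); // false
console.log(Object.getOwnPropertyNames(fakeNavigator).length); // 0
```

This only closes the own-property giveaway. The replacement getter is still an ordinary function, so calling `toString()` on it shows its source rather than `[native code]`, which a thorough script can check.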
How to pass math operation performance checks
Detection scripts will often run something like the following:
for(let i = 0; i < 1000000; i++) Math.cos(Math.random());
They do this for a few reasons:
- Scrapers are cost sensitive and want to use small VMs
- VMs typically don’t have GPUs, so browsers are forced into CPU rendering, which creates a bottleneck that can be detected in JavaScript
- Regular consumers typically do have GPUs and sufficient CPU to run these math operations without much lag.
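From the detector's side, the loop above is really a timing probe. A minimal sketch, assuming the script simply measures wall-clock time with `performance.now()` (available in browsers and as a global in Node 16+). What threshold a real script treats as "too slow" isn't known here, so none is shown; `timeMathLoop` is a hypothetical name.

```javascript
// Sketch of a math-performance probe: time a burst of trig calls.
function timeMathLoop(iterations) {
  const start = performance.now();
  let sink = 0; // accumulate results so the loop does observable work
  for (let i = 0; i < iterations; i++) {
    sink += Math.cos(Math.random());
  }
  const elapsedMs = performance.now() - start;
  return { elapsedMs, sink };
}

const { elapsedMs } = timeMathLoop(1000000);
console.log(`1e6 Math.cos calls took ${elapsedMs.toFixed(1)} ms`);
```

A slow, contended VM shows up as a larger `elapsedMs`, which is the signal the bullets above describe.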
So, how do you get around this problem? Use a larger VM, one with a GPU. In my experience, the GPU (even a small one) makes a HUGE difference to CPU load.
You can get fairly cheap GPU instances by using AWS WorkSpaces (something around 30/mo at the time of writing).
How to bypass font rendering checks
If you were playing defence and wanted to check which fonts were present on a system, how would you do it? Any ideas?
Well, I’ll give you a hint. Look at the width of these two W’s
Yep, it’s that simple. Throw a bunch of W’s, or other wide characters, into a div, set the font-family and measure how wide it is. If the width matches what we’re expecting, the font must be installed. If the width matches the default or baseline measurement, the font isn't installed. Here’s some code I found in the wild that did this (annotated for clarity).
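The core comparison can be sketched as follows. This is a minimal version of the general technique, not the script from the wild: `detectFonts`, `measureWidthInDom`, the test string, and the 72px size are all illustrative choices. Width measurement is passed in as a function so the logic can be exercised without a DOM.

```javascript
// Width-based font detection (illustrative sketch).
// A wide, varied test string makes width differences between fonts obvious.
const TEST_STRING = "WWWWWWWWWWmmmmmmmmmmlli";
// Generic fallback families to compare against.
const BASELINES = ["monospace", "serif", "sans-serif"];

// Returns the candidate fonts whose rendering differs from a fallback.
// `measureWidth(text, fontFamily)` must return the rendered width in px.
function detectFonts(candidates, measureWidth) {
  const baseline = {};
  for (const base of BASELINES) {
    baseline[base] = measureWidth(TEST_STRING, base);
  }
  // A missing font makes the browser fall back to `base`, so the width
  // matches the baseline exactly; any difference means it is installed.
  return candidates.filter((font) =>
    BASELINES.some(
      (base) => measureWidth(TEST_STRING, `"${font}", ${base}`) !== baseline[base]
    )
  );
}

// Browser-only measurement via offsetWidth (not called outside a page).
function measureWidthInDom(text, fontFamily) {
  const span = document.createElement("span");
  span.style.fontSize = "72px";
  span.style.fontFamily = fontFamily;
  span.textContent = text;
  document.body.appendChild(span);
  const width = span.offsetWidth;
  span.remove();
  return width;
}
```

In a page this would be called as `detectFonts(["Arial", "Liberation Sans"], measureWidthInDom)`. Note that `measureWidthInDom` reads `offsetWidth`, which is exactly the property the override later in this section targets.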
So… how do we get around this? There’s an easy way and a hard way. Let’s see both:
First, the easy way to get around this is to have the fonts installed. Head over to your local package manager, find the fonts and get them fired up. On Linux there are some fun things in /etc/fonts/conf.d, fc-cache and fc-list that will get you started on the right path.
But… the defensive team has thought of this. And they see a way to expose us.
They can test for fonts that SHOULDN’T be present too. For example, if we’re on a Windows system, Liberation and DejaVu fonts shouldn’t be installed. Those are Linux default fonts, and there’s a 0.00001% chance of a Windows user having them installed.
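The consistency check the defense can layer on top is simple to sketch. The `UNEXPECTED_FONTS` table below is hypothetical, seeded only with the Liberation/DejaVu example above, and `"Win32"` is the value Windows browsers report for `navigator.platform`.

```javascript
// Fonts that should NOT show up on the platform the browser claims.
// Illustrative table built from the example above, not a vetted list.
const UNEXPECTED_FONTS = {
  Win32: ["Liberation Sans", "Liberation Serif", "DejaVu Sans", "DejaVu Serif"],
};

// Returns the detected fonts that contradict the claimed platform.
function suspiciousFonts(claimedPlatform, installedFonts) {
  const unexpected = UNEXPECTED_FONTS[claimedPlatform] || [];
  return installedFonts.filter((font) => unexpected.includes(font));
}

console.log(suspiciousFonts("Win32", ["Arial", "DejaVu Sans"])); // [ 'DejaVu Sans' ]
```

Any non-empty result means the font list and the claimed platform disagree, which is exactly the tell that installing extra fonts can create.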
So that brings us to the second way around this. Did you see that detection script I posted above? Did you see how they’re measuring the size of the elements? It’s with HTMLElement.offsetWidth. Perfect: it’s something from JavaScript-land that we can override.
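const originalOffsetWidth = Object.getOwnPropertyDescriptor(window.HTMLElement.prototype, "offsetWidth").get;
Object.defineProperty(window.HTMLElement.prototype, "offsetWidth", {
get () {
// our font-detection-avoiding logic goes here
// everything else falls through to the real measurement
return originalOffsetWidth.call(this);
}
});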
Look familiar?
Yep, we can override the offsetWidth getter. When we see strange fonts on the HTMLElement, we can use a lookup table of character widths to calculate what the width would be if that font were installed and, more importantly, what the width would be if that font were NOT installed.
So… that’s font rendering.
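The lookup-table idea above can be sketched as a pure function. Everything here is an assumption for illustration: `spoofedWidth`, the shape of `widthTables` (advance width in px per character at the test font size, plus a `default` entry), and the rule that the first family the profile claims is installed wins. Real text shaping also applies kerning and ligatures, which this ignores. Inside the `offsetWidth` getter you would feed it `this.textContent` and `getComputedStyle(this).fontFamily`.

```javascript
// Generic families are always "available" in a browser.
const GENERIC_FAMILIES = ["serif", "sans-serif", "monospace"];

// Compute the width a string WOULD have, given which fonts the spoofed
// profile claims are installed, instead of measuring real rendering.
function spoofedWidth(text, fontFamily, profileFonts, widthTables, fallback) {
  // Split the CSS font-family list and strip surrounding quotes.
  const families = fontFamily
    .split(",")
    .map((f) => f.trim().replace(/^["']|["']$/g, ""));
  // Walk the list like the browser does: first available family wins.
  const chosen =
    families.find(
      (f) => widthTables[f] && (profileFonts.includes(f) || GENERIC_FAMILIES.includes(f))
    ) || fallback;
  const table = widthTables[chosen];
  let width = 0;
  for (const ch of text) {
    width += table[ch] ?? table.default;
  }
  return Math.round(width); // offsetWidth is an integer
}

const exampleTables = { monospace: { default: 10 }, "Fancy Font": { W: 15, default: 8 } };
console.log(spoofedWidth("WW", '"Fancy Font", monospace', ["Fancy Font"], exampleTables, "monospace")); // 30
console.log(spoofedWidth("WW", '"Fancy Font", monospace', [], exampleTables, "monospace")); // 20
```

The same function answers both questions from the paragraph above: include a font in `profileFonts` to fake its presence, or leave it out to fake its absence.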
Avoiding system information detection (screen size, graphics card name, etc.)
Bot detection scripts also check the system properties that the browser exposes. You’re going to be surprised at how much data your browser exposes. Here are the top 40 items (there are at least 200 individual parameters; I could go into a lot more depth about all the fun tricks and turns in here):
- JavaScript version support (ES version)
- browser extension list
- screen.availWidth
- screen.availHeight
- screen.width
- screen.height
- screen.availLeft
- screen.availTop
- screen.logicalXDPI
- screen.logicalYDPI
- screenLeft
- screenTop
- clientWidth
- clientHeight
- display-mode (fullscreen, standalone, minimal-ui, browser)
- navigator.language
- navigator.product
- Function.prototype.bind
- DeviceOrientationEvent
- DeviceMotionEvent
- TouchEvent
- spawn
- chrome
- XMLHttpRequest
- XDomainRequest
- Buffer
- PointerEvent
- ActiveXObject
- screenY
- screenX
- navigator.vibrate
- getBattery
- bluetooth
- sendBeacon
- webkitTemporaryStorage
- screen.colorDepth
- navigator.platform
- navigator.buildID
- window.MediaKeyStatusMap
- GPU Version
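For a sense of what the detector does with the list above, here is a minimal sketch that reads a handful of those signals. It is not taken from any real script; `collectSignals` and its field names are hypothetical, and the global object is passed in as `win` so it can run against a fake object. In a page it would be `collectSignals(window)`, with the result serialized and sent to the validation server.

```javascript
// Gather a few of the listed signals from a global object (illustrative).
function collectSignals(win) {
  const nav = win.navigator || {};
  const scr = win.screen || {};
  return {
    screen: [scr.width, scr.height, scr.availWidth, scr.availHeight, scr.colorDepth],
    language: nav.language,
    platform: nav.platform,
    // Existence checks: the mere presence of these globals says something.
    hasChrome: "chrome" in win,
    hasActiveX: "ActiveXObject" in win, // old Internet Explorer
    hasBuffer: "Buffer" in win, // a Node.js global, unexpected in a real page
    hasTouch: "TouchEvent" in win,
    hasVibrate: typeof nav.vibrate === "function",
  };
}
```

Every field is read through an ordinary property access or an `in` check, which is why the `Object.defineProperty` trick that follows works on most of them.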
So, how do we get around this?
It’s the same trick as before:
Object.defineProperty(obj, "<prop>", {
get () {
// our custom return value
}
});
Managing Browser Profiles
It can get difficult to manage all these properties. I ended up using a repository that had all the properties and reasonable combinations (e.g. common screen sizes that Macs support). Then I built a randomizer to combine these properties into a profile and inject it into my sessions.
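The profile approach above can be sketched like this. The key design choice is picking a whole profile at once rather than randomizing each field separately, so values stay mutually consistent (no Mac screen size paired with a Windows platform string). `PROFILES`, `pickProfile`, and `applyProfile` are hypothetical, and the numbers are placeholders rather than a vetted dataset.

```javascript
// Whole, internally consistent bundles; values are illustrative placeholders.
const PROFILES = [
  { platform: "Win32", screen: { width: 1920, height: 1080, availWidth: 1920, availHeight: 1040 } },
  { platform: "Win32", screen: { width: 1366, height: 768, availWidth: 1366, availHeight: 728 } },
  { platform: "MacIntel", screen: { width: 1440, height: 900, availWidth: 1440, availHeight: 875 } },
];

// `rng` defaults to Math.random; pass a seeded one for reproducible sessions.
function pickProfile(rng = Math.random) {
  return PROFILES[Math.floor(rng() * PROFILES.length)];
}

// Apply a profile with getter overrides. In a page, `targets` would be
// { navigator, screen }.
function applyProfile(targets, profile) {
  const define = (obj, name, value) =>
    Object.defineProperty(obj, name, { get: () => value, configurable: true });
  define(targets.navigator, "platform", profile.platform);
  for (const [name, value] of Object.entries(profile.screen)) {
    define(targets.screen, name, value);
  }
}
```

Injecting the profile means running `applyProfile({ navigator, screen }, pickProfile())` in each session before the page's own scripts execute.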
GPU Rendering
There’s actually a way that bot detection scripts can figure out what type of GPU you have, or at least what graphics libraries your browser is using: SVG rasterizing.
When you make a nice drop shadow, some semi-transparent text or another complex rendering situation, your GPU/graphics library needs to make some decisions. It needs to decide how to merge layers, when to apply rounding, and at what resolution to perform the operation. Different GPUs and libraries do this differently.
So, the defense makes a very simple test: render a semi-transparent object intersecting with another object (SVG/canvas) and export it to a rasterized image (JPG/PNG). Images rasterized on the same GPU/library will be identical. Ones rasterized on other systems will be ever so slightly different.
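A minimal sketch of the kind of probe described above, using a 2D canvas. The shapes, colors, and the choice of a 32-bit FNV-1a hash are arbitrary illustrations; real scripts differ. The canvas is passed in, so in a page you would call `canvasFingerprint(document.createElement("canvas"))`.

```javascript
// 32-bit FNV-1a hash of a string.
function fnv1a32(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i); // UTF-16 code unit; data URLs are ASCII
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // as unsigned 32-bit
}

// Draw overlapping semi-transparent shapes plus shadowed text, export to PNG,
// and hash the result. Tiny blending differences change the hash.
function canvasFingerprint(canvas) {
  const ctx = canvas.getContext("2d");
  ctx.fillStyle = "rgba(255, 0, 0, 0.5)";
  ctx.fillRect(10, 10, 100, 60);
  ctx.fillStyle = "rgba(0, 0, 255, 0.5)";
  ctx.beginPath();
  ctx.arc(90, 50, 40, 0, Math.PI * 2);
  ctx.fill();
  ctx.shadowBlur = 8;
  ctx.shadowColor = "rgba(0, 0, 0, 0.6)";
  ctx.font = "24px sans-serif";
  ctx.fillText("Fingerprint", 20, 100);
  return fnv1a32(canvas.toDataURL("image/png"));
}
```

The hash is what gets compared server-side: identical GPU/library stacks produce the same value, while others differ.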
How can we get around this? Well, we can’t really, not without a whole lot of C++ programming and some custom browser code. I don’t have time for that.
But what we can do is give it a really weird result and hopefully throw the detector off. If we randomize the image being rendered, the output will look nothing like what the detector expects, and hopefully they’ll label us as a “person worried about privacy”. There are user-land extensions that do this for consumers; it's a known way to protect yourself from fingerprinting.
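The randomizing idea can be sketched as a function that nudges pixel values. This is an assumption about one way to do it (adding ±1 noise to the RGB channels of RGBA pixel data), not a description of any particular extension, and hooking it into `toDataURL()` or `getImageData()` is left out. `addCanvasNoise` is a hypothetical name.

```javascript
// Nudge each RGB channel by -1, 0, or +1 so the exported image (and its hash)
// changes while looking identical to a human. `pixels` is RGBA data, e.g.
// the `.data` of an ImageData object (a Uint8ClampedArray).
function addCanvasNoise(pixels, rng = Math.random) {
  for (let i = 0; i < pixels.length; i += 4) {
    for (let c = 0; c < 3; c++) { // R, G, B; alpha is left alone
      const delta = Math.floor(rng() * 3) - 1; // -1, 0, or +1
      pixels[i + c] += delta; // Uint8ClampedArray clamps to 0..255
    }
  }
  return pixels;
}
```

Using a fresh random value per call means each render hashes differently, so the result won't match any known GPU/library signature.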
Fun fact: Apple’s iOS devices all expose the same rendering fingerprint, so you cannot tell the difference between a late-model iPhone and an old one with this check. Privacy features, neat!
Summary
That covers client checks: the checks a detection script runs in the browser to try to identify scrapers and block them. The detection script sends these results back to the server, which either unlocks or blacklists the session.
Intro: How I bypassed military-grade bot detection software on popular ecommerce sites
Part 1: How to bypass the Server Bot Check