Google Analytics Bot Traffic – Identifying and Removing
Google Analytics has had a bot problem for some time, and the various bots involved lead some site owners to second-guess their data. Bot traffic can inflate your traffic numbers and make many of your metrics meaningless if you don’t filter it out.
Some of these bots are easy to clean up. Almost two years ago, Google introduced a filter that can be applied to the views in an account to remove known bot traffic. If you haven’t checked that box, you should do it now — you can find it in your view settings, called “Bot Filtering”.
For most sites, this filter will likely remove the majority of bot traffic.
If you are still having issues with bots inflating your traffic numbers in Google Analytics, a great and very comprehensive resource can be found here:
http://help.analyticsedge.com/spam-filter/definitive-guide-to-removing-google-analytics-spam/
If you have worked through AnalyticsEdge’s guide and still have issues, then the following guide may help.
The Humanoid Bots
The Problem
A little over a year ago, a bot network of sorts started affecting Google Analytics, and continues to affect sites (primarily eCommerce sites).
This bot problem is capable of sending 50,000-100,000+ sessions a month. It is all direct traffic.
What makes this bot problem so serious is the scale of the traffic, and the fact that none of the solutions above fix it. The bot traffic looks human-like and continually changes its parameters, so once you remove one bot, another pops up. To see this in action, view the following chart:
At the start of January, one bot was removed, then there was a new bot in town at the end of January. It was partially removed at the start of February, then a new one showed up.
Identifying
This traffic has some similarities, at least in the data we’ve seen: the bots are generally on outdated versions of Chrome and Flash.
Spotting them can be easy, but they can blend in and get past you for some time. For example:
This traffic is just for one month. You can see the issues with the top browser version:
- It is outdated – Chrome 43.0.2357.130 was released almost a year ago.
- The traffic has a high bounce rate.
- The traffic has a high % of New Sessions.
- The traffic accounts for only a very small fraction of the site’s total Transactions.
The only surefire secondary dimension we can use to remove this data, while retaining as much real-user data as possible, is the Flash Version.
Let’s take a look at the Flash Versions:
We can see that there are only three non-bot user groups here, and they are the ones that have sent transactions. The rest (or a very high majority) of the traffic is from bots, with the most coming from Chrome 43.0.2357.130 and Flash version (not set).
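The warning signs above can be turned into a rough screening check. The following Python sketch is only an illustration: the row values are made up, the field names are hypothetical, and the thresholds are assumptions you would need to tune against your own data. It flags browser/Flash version combinations that have a lot of sessions, a high bounce rate, a high % of New Sessions, and almost no Transactions.

```python
from dataclasses import dataclass


@dataclass
class VersionRow:
    """One row of a Browser Version x Flash Version report (hypothetical shape)."""
    browser_version: str
    flash_version: str
    sessions: int
    new_sessions: int
    bounces: int
    transactions: int


def looks_like_bot(row, min_sessions=1000, min_bounce_rate=0.90,
                   min_new_rate=0.90, max_conversion_rate=0.0005):
    """Return True when a row shows all of the warning signs at once.

    The thresholds are assumptions for illustration, not recommended values.
    """
    if row.sessions < min_sessions:
        return False  # too little traffic to judge either way
    bounce_rate = row.bounces / row.sessions
    new_rate = row.new_sessions / row.sessions
    conversion_rate = row.transactions / row.sessions
    return (bounce_rate >= min_bounce_rate
            and new_rate >= min_new_rate
            and conversion_rate <= max_conversion_rate)


# Made-up sample rows, not real report data.
rows = [
    VersionRow("43.0.2357.130", "(not set)", 60000, 59500, 58800, 2),
    VersionRow("43.0.2357.130", "11.5 r502", 15000, 14900, 14700, 0),
    VersionRow("49.0.2623.112", "21.0 r0", 22000, 9000, 8000, 410),
]

suspects = [r for r in rows if looks_like_bot(r)]
for r in suspects:
    print(r.browser_version, "|", r.flash_version, "|", r.sessions, "sessions")
```

A check like this only narrows down where to look. Whether a flagged group really is bot traffic is still a call you make by looking at the report itself.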
Removing the Bot Traffic
This traffic could be excluded using Advanced Segments, but that is not what we want. We really just don’t want to see the data at all, because of how much it is influencing the overall values. Not everyone is going to apply an Advanced Segment, and sometimes you may forget to, in which case you’d be looking at incorrect data.
What we want is a way to exclude this traffic with a Filter.
Sometimes you can get away with just a simple exclude Filter based on a single factor, either the Browser Version or the Flash Version.
For sites with Revenue (which this is most likely affecting), you’ll need to make the call: Will this single exclusion filter remove too much Revenue?
Sometimes the answer is no, and you can add a filter like this for Flash version 11.5:
The likelihood of any real users being on Flash version 11.5 is very small.
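One way to make that call is to work out what share of sessions and Revenue a single-field exclusion would remove before you create the filter. The sketch below assumes you have exported the Flash Version report into simple rows; the numbers and field names are invented for illustration only.

```python
def exclusion_impact(rows, field, value):
    """Share of sessions and revenue removed by excluding rows where field == value."""
    total_sessions = sum(r["sessions"] for r in rows)
    total_revenue = sum(r["revenue"] for r in rows)
    matched = [r for r in rows if r[field] == value]
    removed_sessions = sum(r["sessions"] for r in matched)
    removed_revenue = sum(r["revenue"] for r in matched)
    return {
        "sessions_removed_pct": 100 * removed_sessions / total_sessions if total_sessions else 0.0,
        "revenue_removed_pct": 100 * removed_revenue / total_revenue if total_revenue else 0.0,
    }


# Invented sample data: sessions and revenue per Flash Version.
flash_report = [
    {"flash_version": "(not set)", "sessions": 60000, "revenue": 120.0},
    {"flash_version": "11.5 r502", "sessions": 15000, "revenue": 0.0},
    {"flash_version": "21.0 r0", "sessions": 25000, "revenue": 4880.0},
]

impact_11_5 = exclusion_impact(flash_report, "flash_version", "11.5 r502")
impact_not_set = exclusion_impact(flash_report, "flash_version", "(not set)")
print(impact_11_5)
print(impact_not_set)
```

In this made-up data, excluding the 11.5 version removes sessions but no Revenue, so a single-field filter is safe. Excluding (not set) would also remove some Revenue, which is the case where a multi-field exclusion is worth the extra setup.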
If you are wary of a blanket exclusion, then you’ll want to use multiple fields for your exclusion.
For this exclusion to work, it’ll require a two-step (multi-condition) filter:
Step 1 – An Exclusion Filter, which excludes the traffic you flag in Step 2
Step 2 – An Advanced Filter, which specifies the two Dimensions/Fields to match
Setting up the Exclusion Filter
Below is an image of how you would set up the exclusion filter. It is pretty simple:
Set the Filter Field to “Custom Field 1” and type in a filter pattern (it can be anything, as long as it matches the value you output in the second step).
Setting up the Advanced Filter
To set up your advanced filter, make a note of the Chrome version(s) and the Flash version(s) you want to block. These will be added to the Filter.
The following image shows a setup specifying one Chrome version, and multiple Flash versions.
The key part of this filter is that, when both conditions match, it outputs a value to a Custom Field. In this case, we set that value to “Exclude” so that it is picked up by the filter we created in the first step.
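If you are matching several versions at once, it is easy to get the pattern wrong by hand, since the dots and parentheses in version strings are special characters in a regular expression. The helper below is a small sketch that builds an exact-match pattern from a list of literal versions. It assumes the filter fields accept a regular expression, and the Flash version strings are example values, not ones taken from the screenshots.

```python
import re

# Characters escaped here because they have special meaning in a regular expression.
REGEX_SPECIALS = set(".^$*+?()[]{}|\\")


def escape_literal(value):
    """Escape regex special characters so the value is matched literally."""
    return "".join("\\" + ch if ch in REGEX_SPECIALS else ch for ch in value)


def exact_match_pattern(values):
    """Build a pattern that matches any one of the given values exactly."""
    escaped = [escape_literal(v) for v in values]
    if len(escaped) == 1:
        return "^" + escaped[0] + "$"
    return "^(" + "|".join(escaped) + ")$"


browser_pattern = exact_match_pattern(["43.0.2357.130"])
flash_pattern = exact_match_pattern(["11.5 r502", "(not set)"])

print(browser_pattern)  # ^43\.0\.2357\.130$
print(flash_pattern)    # ^(11\.5 r502|\(not set\))$

# Quick sanity checks before pasting the patterns into the filter.
print(bool(re.search(flash_pattern, "(not set)")))         # True
print(bool(re.search(browser_pattern, "43.0.2357.1301")))  # False
```

Anchoring with ^ and $ keeps the pattern from also matching longer version strings that merely contain the one you want to block.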
With these two filters set up, you’ll be able to remove the bot traffic while retaining real users (and more of their Transaction data, for better-quality data).
You can now continually create new Advanced Filters to exclude this bot traffic, as long as you continue to specify the same Custom Field defined in Step 1. If your site is like our clients’ sites, you’ll be revisiting this at least once a month.
Of Note: Keep an eye on your filter order. If your Advanced Filter is placed after your Exclusion Filter, it will not work. This is likely to happen, as you’ll need to keep moving your Exclusion Filter below each new Advanced Filter you create.
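To see why the order matters, here is a simplified Python model of the two filters. It is not how Google Analytics is implemented; it only assumes that filters run one after another on each hit and that the Custom Field is a scratch value passed from one filter to the next. The hits, field names, and Flash version strings are made up.

```python
import re

BROWSER_PATTERN = r"^43\.0\.2357\.130$"
FLASH_PATTERN = r"^(11\.5 r502|\(not set\))$"


def advanced_filter(hit):
    """Step 2: when BOTH fields match, write "Exclude" to Custom Field 1."""
    if (re.search(BROWSER_PATTERN, hit.get("browser_version", ""))
            and re.search(FLASH_PATTERN, hit.get("flash_version", ""))):
        return dict(hit, custom_field_1="Exclude")
    return hit


def exclusion_filter(hit):
    """Step 1: drop any hit whose Custom Field 1 matches "Exclude"."""
    if re.search("Exclude", hit.get("custom_field_1", "")):
        return None
    return hit


def apply_filters(hit, filters):
    """Run the filters in order; a filter returning None drops the hit."""
    for f in filters:
        hit = f(hit)
        if hit is None:
            return None
    return hit


hits = [
    {"label": "bot", "browser_version": "43.0.2357.130", "flash_version": "11.5 r502"},
    {"label": "real user, old Chrome", "browser_version": "43.0.2357.130", "flash_version": "18.0 r0"},
    {"label": "real user", "browser_version": "49.0.2623.112", "flash_version": "21.0 r0"},
]

correct_order = [advanced_filter, exclusion_filter]  # flag first, then exclude
wrong_order = [exclusion_filter, advanced_filter]    # exclude runs before anything is flagged

kept_correct = [h["label"] for h in hits if apply_filters(h, correct_order) is not None]
kept_wrong = [h["label"] for h in hits if apply_filters(h, wrong_order) is not None]

print(kept_correct)  # ['real user, old Chrome', 'real user']
print(kept_wrong)    # ['bot', 'real user, old Chrome', 'real user']
```

In the wrong order, the bot hit is still flagged, but only after the Exclusion Filter has already run, so nothing removes it. The model also shows why requiring both fields to match keeps the real user on the old Chrome version in your data.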
Last Note: Always keep a View that is not filtered, so you can verify you’re not removing too much traffic (or in case you mess up and have to reference that raw data).
I haven’t seen much information on using filters this way, so I hope you find this helpful, and if you have any questions or see any issues, please feel free to comment.