If you use Google Analytics, especially if you have a small site, you may notice referral traffic from weird domains with 1 pageview, 0 time on site, 0 conversions and 100% bounce rate. This is garbage traffic and it comes from domains like:
This is what's called Google Analytics referral spam. The folks behind these services, like social-buttons.xyz, rack up page views with bots to appear at the top of your referral analytics. There are any number of reasons someone might do this, ranging from annoying to malicious. I err on the side of malicious, but regardless, their spam is annoying and it's screwing up our data. Let's get rid of it.
The quick and easy way: Segmenting
We are going to use Google Analytics' segmentation feature to filter out our garbage traffic. If you have never created a segment before, this will be a good introduction to doing more in-depth analysis by isolating your data.
This will not alter your underlying data the way a "filter" would (GA's terminology can be confusing). The downside to this approach is that you will need to apply this segment every time you run a report, which may or may not be a big deal to you.
Steps to Creating the Spam Traffic Segment
Open up Google Analytics and go to Acquisition > All Traffic > Referrals. Click on "Add A Segment" above the chart.
A dropdown will appear with a whole list of things to choose from. Each of these items is a different variable you can segment your data by. For example, "Bounced Sessions" would show you all of the sessions that ended with a "bounce". This would allow you to try to find patterns in this type of traffic.
Click + New Segment.
Now you'll see a form to create a new segment.
- Enter Name - Name it whatever you'd like, but make it understandable. I'm going to go with "Exclude Bot Traffic".
- Click Traffic Source
- Select "does not match regex" and enter a regex that matches all of the crappy domains. If you aren't familiar with regexes, copy and paste all of the domains you want to exclude, combining them with a pipe (|). For example:
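If you'd rather generate the pattern than hand-type it, here is a minimal Python sketch. Only social-buttons.xyz comes from this post; the other two domains are made-up placeholders, and `build_exclusion_pattern` is a hypothetical helper name. It escapes the dots so they match literally and joins the domains with a pipe, without leaving a trailing one.

```python
# Hypothetical blocklist: only social-buttons.xyz is from this post.
# The ".example" entries are placeholders; swap in the names from
# your own referral report.
spam_domains = [
    "social-buttons.xyz",
    "spam-referrer.example",
    "free-traffic.example",
]


def build_exclusion_pattern(domains):
    """Join domains into one pipe-separated regex.

    Dots are escaped so "." matches a literal dot, and blank entries
    are dropped so the pattern never contains an empty alternative.
    """
    cleaned = [d.strip().lower() for d in domains if d.strip()]
    return "|".join(d.replace(".", r"\.") for d in cleaned)


pattern = build_exclusion_pattern(spam_domains)
print(pattern)
# social-buttons\.xyz|spam-referrer\.example|free-traffic\.example
```

Paste the printed pattern into the regex box of the segment.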
You should see the summary on the right-hand side change based on your filter.
Once happy, hit save and then apply. Boom, now all of our spam traffic has disappeared and things look a little more normal. Using the segment, you can now compare the two line charts to see how much of your traffic was bots:
The domains that these spammers use will likely change over time, so you will need to go in and update your regex every now and then.
Another downside of this approach is that if you have a decent amount of data, there is a good chance your reports will use sampling and you'll still get some bad data in there.
Next, we're going to use an approach that scales better with Google Analytics.
The slightly harder way, using a Google Analytics "View" and Filters
My preferred approach would be to create a separate Google Analytics "View", using a series of Filters.
We are going to apply the same logic we used for the segment, but this time permanently ignore data from these spammy domains.
Make sure you name it something descriptive that makes sense, like "clean data" or "without bs referrals".
Next, go to your view. Click on "Filters", then "Add Filter".
Now, following the directions from the official documentation we can add a filter to exclude this traffic from Google Analytics entirely.
If you know the IP addresses of any traffic sources you want to exclude, you can use a pre-defined filter. If you don't have the addresses, you can grab the domain names from your spammy referral report.
- Select Custom Filter. Select "Exclude" under Filter type.
- Set the Filter field to
- Enter the domains you'd like to exclude. If you'd like to exclude multiple domains, you can use a regex. For example:
This will now remove traffic from these sources and permanently affect the data for the view. Don't make a dumb mistake and apply this to your primary view without testing.
Note: When using a regex in a filter, make sure you test it before applying it. Don't end the regex with a pipe (|), which will cause all referral sources to be excluded.
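To see why the trailing pipe is dangerous, here is a rough Python sketch. Python's `re` module is not the engine Google Analytics uses, so treat this as an illustration of the idea rather than an exact simulation. The pattern, the sample sources, and the `excluded` and `is_safe` helpers are all made up for the example. A trailing pipe adds an empty alternative, and an empty alternative matches any string.

```python
import re

# Example pattern; spam-referrer.example is a placeholder domain.
good = r"social-buttons\.xyz|spam-referrer\.example"
bad = good + "|"  # trailing pipe adds an empty alternative

sources = ["social-buttons.xyz", "google.com", "news.ycombinator.com"]


def excluded(pattern, candidates):
    """Return the sources the pattern matches, i.e. would exclude."""
    return [s for s in candidates if re.search(pattern, s)]


def is_safe(pattern):
    """A pattern that matches the empty string matches everything."""
    return re.search(pattern, "") is None


print(excluded(good, sources))  # ['social-buttons.xyz']
print(excluded(bad, sources))   # all three sources
print(is_safe(good), is_safe(bad))  # True False
```

Running a check like `is_safe` against your pattern, plus a few referrers you know are legitimate, is a cheap test before saving the filter.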
Hit save. Now apply this filter to the view that we just created.
Both of these approaches only solve the problem on the Google Analytics side. These bots might affect your site in other ways, and it might behoove you to prevent them from hitting your site altogether. Depending on your setup, you could exclude those IP addresses or domains in nginx or Apache. If your DNS allows you to block IP addresses, that might work as well.
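If your site happens to run on a Python WSGI app, the same idea can be sketched at the application layer. This is not nginx or Apache config, just an illustration of blocking by referrer under that assumption. The `BlockSpamReferrers` class and the pattern are hypothetical, and it only helps with bots that actually request your pages and send a `Referer` header.

```python
import re
from urllib.parse import urlparse

# Placeholder blocklist; matches the listed domains and their subdomains.
SPAM_REFERRERS = re.compile(
    r"(^|\.)(social-buttons\.xyz|spam-referrer\.example)$"
)


class BlockSpamReferrers:
    """WSGI middleware that answers 403 when the Referer host is blocked."""

    def __init__(self, app, pattern=SPAM_REFERRERS):
        self.app = app
        self.pattern = pattern

    def __call__(self, environ, start_response):
        referer = environ.get("HTTP_REFERER", "")
        # hostname is None for missing or malformed referrers
        host = (urlparse(referer).hostname or "").lower()
        if host and self.pattern.search(host):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Anything else passes through to the wrapped app untouched.
        return self.app(environ, start_response)
```

You would wrap your existing app once at startup, for example `app = BlockSpamReferrers(app)`.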