Block Bad Bots - New Security Feature from KeyCDN
KeyCDN is always looking for ways to improve its service, so we are excited to announce a new security feature: the ability to block bad bots. It stops bad bots, spiders, and scrapers from crawling your CDN assets, which saves you bandwidth costs. The feature is now available to all customers and can be enabled from the KeyCDN dashboard. No more bots draining your credits!
Bad bots
When it comes to the web, there are good bots and bad bots. An example of a good bot is Googlebot, Google's web crawler, which crawls new content and adds it to Google's search index. An example of a bad bot is Cheesebot. Bad bots can include spiders, crawlers, and scrapers. They are not always malicious, but most of the time there is no need for them to crawl your site. They consume your CDN bandwidth, take up server resources, and in some cases steal your content.
Typically you can block bad bots with your robots.txt file (which you can edit from the KeyCDN dashboard). However, not all robots honor this file, which means blocking has to be enforced at the server level.
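Why robots.txt is only advisory becomes clear from the crawler's side: the check runs inside the bot's own code, so a bot that skips it is never stopped. The following is a minimal Python sketch using the standard library's robots.txt parser. The robots.txt content, the `is_allowed` helper, and the example.com URL are assumptions made for illustration and do not come from KeyCDN.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask Cheesebot to stay away, allow everyone else.
ROBOTS_TXT = """\
User-agent: Cheesebot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is the check a well-behaved crawler performs on itself.
    Nothing on the server forces a crawler to run it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    asset = "https://example.com/images/logo.png"  # hypothetical asset URL
    for agent in ("Googlebot", "Cheesebot"):
        print(agent, "allowed:", is_allowed(agent, asset))
```

A polite crawler runs a check like this before each request. A bad bot simply never does, which is why the block has to be enforced by the server.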
Block Bad Bots feature
KeyCDN uses a comprehensive list of known bad bots and blocks them based on their User-Agent string. We have run this in our own environment for a while, and we wanted to open it up to all KeyCDN customers so that everyone can benefit from it.
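To illustrate what blocking by User-Agent string means in general, here is a minimal Python sketch. It is not KeyCDN's implementation: the three names are just the example agents used in this article, and `is_bad_bot` is a hypothetical helper.

```python
import re
from typing import Optional

# Hypothetical blocklist built from the example names in this article.
# KeyCDN's real list is not shown here.
BAD_BOT_NAMES = ["agent1", "Cheesebot", "Catall Spider"]

# One case-insensitive pattern that matches any listed name anywhere
# in the User-Agent string. re.escape keeps the names literal.
BAD_BOT_PATTERN = re.compile(
    "|".join(re.escape(name) for name in BAD_BOT_NAMES),
    re.IGNORECASE,
)


def is_bad_bot(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent contains a blocklisted name."""
    if not user_agent:
        # In this sketch a missing or empty User-Agent is not treated as a bad bot.
        return False
    return BAD_BOT_PATTERN.search(user_agent) is not None
```

Because the match is a plain substring test on a header the client controls, a bot that changes its User-Agent will not be caught by this kind of check.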
451 HTTP error status code
When a blocked bot hits our edge servers, a 451 HTTP error status code is returned.
The HTTP 451 status code was approved by the IESG on December 18, 2015 and is intended to be used when resource access is denied for legal reasons (e.g. censorship or government-mandated blocked access). We chose 451 over 403, 404, or 405 because those codes commonly come up when troubleshooting, and we thought it best to keep bot blocks separate from them.
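As a rough illustration of the behavior described above, the following Python sketch is a tiny WSGI application that answers blocked User-Agents with 451 and everything else with 200. It is an assumption-laden toy, not KeyCDN's edge code: the `app` function, the blocklist, and the response bodies are all hypothetical.

```python
from http import HTTPStatus

# Hypothetical blocklist (lowercase), taken from this article's example names.
BLOCKED_SUBSTRINGS = ("agent1", "cheesebot", "catall spider")


def app(environ, start_response):
    """Minimal WSGI app: 451 for blocked User-Agents, 200 otherwise."""
    user_agent = environ.get("HTTP_USER_AGENT", "").lower()
    if any(name in user_agent for name in BLOCKED_SUBSTRINGS):
        status = HTTPStatus.UNAVAILABLE_FOR_LEGAL_REASONS  # 451
        body = b"Blocked\n"
    else:
        status = HTTPStatus.OK
        body = b"OK\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response(f"{status.value} {status.phrase}", headers)
    return [body]


if __name__ == "__main__":
    # Serve on localhost for manual testing; the port is arbitrary.
    from wsgiref.simple_server import make_server

    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

Using a dedicated status code for bot blocks means they can be counted in access logs without being mixed up with ordinary 403 or 404 responses.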
Blocking bad bots on your origin server
Using the new feature above will only block bad bots on your CDN assets. You can also block bad bots from accessing your origin server. To block multiple User-Agent strings at once on Apache, you could add the following to your .htaccess file:
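RewriteEngine On
# Match any listed name anywhere in the User-Agent, case-insensitively ([NC]).
# The pattern is quoted because it contains a space.
RewriteCond %{HTTP_USER_AGENT} "(agent1|Cheesebot|Catall Spider)" [NC]
# Answer matching requests with 403 Forbidden ([F]).
RewriteRule .* - [F,L]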
Alternatively, you can use the BrowserMatchNoCase directive:
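# Set the "bots" environment variable when the User-Agent matches.
BrowserMatchNoCase "agent1" bots
BrowserMatchNoCase "Cheesebot" bots
BrowserMatchNoCase "Catall Spider" bots
# Apache 2.2 access-control syntax; Apache 2.4 needs mod_access_compat for it.
Order Allow,Deny
Allow from ALL
Deny from env=bots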
And here is an example on Nginx:
# ~* matches case-insensitively; the pattern is quoted because it contains a space.
if ($http_user_agent ~* "(agent1|Cheesebot|Catall Spider)") {
    return 403;
}
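Once a rule like the ones above is in place, it is worth checking that it behaves as intended. The following Python sketch sends a request with a chosen User-Agent and reports the status code that comes back. The helper names `build_probe` and `status_for_user_agent` and the origin URL are hypothetical; substitute your own origin.

```python
import urllib.error
import urllib.request


def build_probe(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for_user_agent(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for this User-Agent."""
    request = build_probe(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is what we want to see.
        return error.code


if __name__ == "__main__":
    origin = "https://origin.example.com/"  # hypothetical origin URL
    for agent in ("Mozilla/5.0", "Cheesebot/1.0"):
        print(agent, "->", status_for_user_agent(origin, agent))
```

With the origin rules from this section, a blocked agent should receive 403, while a request for a CDN asset with the Block Bad Bots feature enabled should receive 451.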
If you are running a popular CMS, there are also extensions and plugins that can block bots. See our security guides:
- Complete Guide on WordPress Security
- Complete Guide on Drupal Security
- Complete Guide on Joomla Security
Summary
KeyCDN is committed to giving you further ways to decrease your bandwidth costs while adding security. We are excited to open up this new feature to the public. If you have any questions, please feel free to comment below.