Keeping your digital property optimised, secure, and user-friendly is vital to its success. However, internet bots and web crawlers often lurk in the shadows, creating significant challenges for businesses.
It is essential to understand the multifaceted impact these automated agents have on our online operations. While some bots serve legitimate purposes, many can be detrimental, posing risks to business efficiency, security, and user experience.
PERFORMANCE AND RESOURCE DRAIN
To begin with, bots' high-frequency requests place a considerable strain on server resources. While human visitors interact with a website organically, bots can request resources rapidly and repetitively, increasing server load. This can result in slower response times, reduced website performance and, potentially, downtime. It remains, to this day, one of the key challenges this business faces.
The excessive consumption of bandwidth by these automated programs can also inflate hosting costs. For businesses relying on cloud-based services, this means increased cloud service fees, which directly impact operational budgets. Ensuring a consistently fast and reliable user experience becomes challenging when significant portions of system resources are being monopolised by non-human traffic.
SKEWED WEB ANALYTICS
Website analytics play a crucial role in guiding business decisions. Metrics around user behaviour, traffic sources, and conversion rates help tailor marketing strategies, improve user experience, and enhance overall site effectiveness. Bots, however, can distort these metrics significantly.
For example, a web crawler that mimics human behaviour and uses a browser-based user agent can inflate visitor counts, skew bounce rates, and affect session durations. This muddied data makes it difficult to discern genuine user trends from artificial activity, leading to misguided business decisions. Inaccurate data can disrupt campaigns, hinder targeted marketing efforts, and ultimately, degrade the user experience.
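As a rough illustration of how much noise this adds, the sketch below splits requests in a standard combined-format access log into "probably human" and "probably bot" counts by looking for a few crawler tokens in the user-agent field. The log path and the token list are placeholders only; most analytics platforms provide their own, far more thorough bot-filtering options.

```python
import re

# Illustrative only: a few crawler tokens to look for in the user-agent field.
# Real deployments would use a maintained list (see the blocklist later in this post).
BOT_TOKENS = ["bot", "crawler", "spider", "python-requests", "scrapy"]
BOT_PATTERN = re.compile("|".join(re.escape(t) for t in BOT_TOKENS), re.IGNORECASE)

def is_probable_bot(user_agent: str) -> bool:
    """Return True when the user-agent string contains a known crawler token."""
    return bool(BOT_PATTERN.search(user_agent or ""))

def count_requests(log_path: str) -> tuple[int, int]:
    """Split requests in a combined-format access log into human vs. bot counts."""
    humans = bots = 0
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            # In the combined log format the user agent is the last quoted field.
            fields = line.rsplit('"', 2)
            user_agent = fields[1] if len(fields) >= 2 else ""
            if is_probable_bot(user_agent):
                bots += 1
            else:
                humans += 1
    return humans, bots

if __name__ == "__main__":
    humans, bots = count_requests("access.log")  # hypothetical log path
    print(f"human requests: {humans}, bot requests: {bots}")
```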
SECURITY RISKS
Bots are not just a nuisance for performance but also a significant security threat. Malicious bots can execute various harmful activities such as scraping content, launching DDoS attacks, and probing for vulnerabilities to exploit. Content scraping, a practice where bots copy website content illicitly, can lead to intellectual property theft and diminish the value of original content by disseminating it elsewhere without consent.
DDoS attacks orchestrated by botnets aim to overwhelm a website’s infrastructure, rendering it inaccessible to legitimate users. This not only causes immediate revenue loss but also damages the brand’s reputation.
On top of this, bots can be used for credential stuffing attacks, where stolen login credentials are used to gain unauthorised access to user accounts. This compromises user data security and could lead to legal repercussions under data protection regulations.
AD FRAUD
For businesses that rely on advertising as a crucial part of their revenue, bots represent a potent threat by engaging in ad fraud. Malicious bots can simulate genuine ad clicks and views, draining advertising budgets without providing any real business value. This not only affects the ad-performance metrics but also undermines the trust between advertisers and ad platforms.
The result is a lose-lose situation where advertisers lose money on fraudulent clicks, and legitimate website owners face diminished trust and credibility with their ad partners.
COMPETITIVE DISADVANTAGE
Bots employed by competitors to spy on your site’s pricing, product listings, and other strategic data can undermine competitive advantages. By harvesting vast quantities of data from your site, competitors can react swiftly with counterstrategies designed to outmanoeuvre your business. This can erode market share and stymie growth efforts.
LISTING THE BOTS TO CREATE A BLOCKLIST
Below is a list (from experience) of what we consider bad bots in terms of resource usage. This isn't our full list, merely the bots that create resource usage issues for the platform. Your list may differ, as there are reasons to enable some of the bots we've listed here:
- Amazonbot
- Applebot-Extended
- Barkrowler
- BLEXBot
- Bytespider
- CCBot
- ChatGPT-User
- ClaudeBot
- Claude-Web
- DataForSeoBot
- Diffbot
- DotBot
- Expanse
- FacebookBot
- FriendlyCrawler
- GPTBot
- Image2dataset
- ImagesiftBot
- IonCrawl
- ISSCyberRiskCrawler
- Meta-ExternalAgent
- MJ12Bot
- omgilibot
- Orbbot
- peer39_crawler
- PerplexityBot
- PetalBot
- python-requests
- Scrapy
- SeekportBot
- Timpibot
- VelenPublicWebCrawler
- Zoominfobot
Worth noting on this list is the lack of SEO monitoring bots. We have removed many of those bots from the list, as we’ve found that they do (for the most part) respect any rate limiting rules you put in place.
A fuller list of known bots can be found on Wikipedia. If you see one in your log files and it's becoming a resource issue, that is a great resource for understanding more about it.
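To put the list above to work, one lightweight first step is asking these crawlers to stay away via robots.txt. Well-behaved bots honour it; malicious ones generally do not, which is why the firewall and rate-limiting measures below still matter. The sketch that follows simply prints a Disallow block for each named agent; the selection shown is abbreviated, so extend it to match your own blocklist.

```python
# A trimmed selection of user agents from the list above; extend it to suit your own platform.
BLOCKED_AGENTS = [
    "Amazonbot", "Bytespider", "CCBot", "ChatGPT-User", "ClaudeBot",
    "GPTBot", "MJ12Bot", "PetalBot", "python-requests", "Scrapy",
]

def build_robots_txt(agents: list[str]) -> str:
    """Build a robots.txt body that disallows the whole site for each named agent."""
    blocks = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    # Everyone else (including the search engines you want to keep) remains allowed.
    blocks.append("User-agent: *\nDisallow:")
    return "\n\n".join(blocks) + "\n"

if __name__ == "__main__":
    print(build_robots_txt(BLOCKED_AGENTS))
```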
MITIGATION STRATEGIES
Understanding the risks associated with bots and crawlers is the first step toward mitigation. Implementing sophisticated bot management solutions can help differentiate between legitimate and malicious bots, thereby reducing their negative impacts.
- Robust Firewalls and Rate Limiting: Deploying advanced web application firewalls (WAFs) and setting rate limits on server requests can curb the excessive load caused by bots. We use rate limiting quite often on this site, as AI bots from the leading LLM providers seek to harvest our knowledge silos (a simple rate-limiting sketch follows after this list).
- CAPTCHAs and User Verification: Utilising CAPTCHAs can challenge suspected bots, although this must be balanced to avoid frustrating real users. Adding multi-factor authentication for logins can also thwart bot-driven credential stuffing attacks on forms and registration processes. The latest CAPTCHA implementations run on every page of your property without requiring any user action; this can and does cause problems with speed optimisation, but it is worth considering.
- Behavioural Analysis: Implementing systems that distinguish between human and bot behaviour through AI-driven analytics can enhance security measures. Speed of access is one simple signal: humans browse at an irregular, human pace, while bots mostly operate in rapid, repetitive patterns.
- IP Blacklisting and Honeypots: Blocking IP addresses known for malicious activity and setting up honeypots to trap and analyse bots can further reduce harmful traffic.
- Engaging with Reputable Bots: Not all bots are detrimental. Engaging with and allowing bots from reputable organisations (like Google and Bing) ensures that your site remains visible and accessible while controlling the impact.
- Nuclear approach: Block entire IP ranges and/or nations from accessing the property. Many local e-commerce-focused businesses do this, as it also helps them protect against fraud (a sketch of range-based blocking follows below).
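For the rate limiting mentioned above, the outline below shows the idea in its simplest form: a per-IP token bucket that rejects requests once a client exceeds a set rate. It is a sketch only; in practice this usually lives in the WAF, CDN, or web server itself, and the thresholds here are arbitrary examples rather than recommendations.

```python
import time
from collections import defaultdict

# Arbitrary illustrative limits: each client may make 5 requests per second,
# with short bursts of up to 10 tolerated.
RATE_PER_SECOND = 5.0
BURST_CAPACITY = 10.0

class TokenBucketLimiter:
    """Per-client token bucket: tokens refill continuously, each request spends one."""

    def __init__(self, rate: float = RATE_PER_SECOND, capacity: float = BURST_CAPACITY):
        self.rate = rate
        self.capacity = capacity
        self.tokens = defaultdict(lambda: capacity)   # client IP -> remaining tokens
        self.last_seen = {}                           # client IP -> time of last request

    def allow(self, client_ip: str) -> bool:
        """Return True if this request is within the limit, False if it should be rejected."""
        now = time.monotonic()
        elapsed = now - self.last_seen.get(client_ip, now)
        self.last_seen[client_ip] = now
        # Refill tokens based on elapsed time, capped at the burst capacity.
        self.tokens[client_ip] = min(self.capacity, self.tokens[client_ip] + elapsed * self.rate)
        if self.tokens[client_ip] >= 1.0:
            self.tokens[client_ip] -= 1.0
            return True
        return False

# Usage: call limiter.allow(ip) for each incoming request and return HTTP 429 when it is False.
limiter = TokenBucketLimiter()
```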
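And for the nuclear option, range-based blocking is straightforward with Python's standard ipaddress module, as sketched below. The networks shown are placeholder test ranges, not a recommendation of what to block; country-level blocking relies on a GeoIP database that maps nations to address ranges, which is outside the scope of this sketch.

```python
import ipaddress

# Placeholder ranges for illustration only; real deployments would load these
# from a GeoIP database or a maintained threat-intelligence feed.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # TEST-NET-3, used here as a stand-in
    ipaddress.ip_network("198.51.100.0/24"),  # TEST-NET-2, used here as a stand-in
]

def is_blocked(client_ip: str) -> bool:
    """Return True when the client address falls inside any blocked range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)

if __name__ == "__main__":
    print(is_blocked("203.0.113.42"))  # True: inside a blocked range
    print(is_blocked("192.0.2.10"))    # False: outside every blocked range
```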
While internet bots and crawlers are an inevitable part of the digital ecosystem, the challenges they pose to website businesses are profound. From draining resources and skewing analytics to posing security threats and enabling ad fraud, bots can significantly disrupt business operations.
Businesses must approach bot management with a combination of strategic foresight and technical precision, leveraging advanced solutions to mitigate risks and protect their digital assets.