Cloudflare Fights Back Against AI Website Scrapers

Stop AI from stealing your content! Cloudflare's new tool helps website owners fight back against aggressive data scraping.

Stop AI From Scraping Your Website
The battle for web content heats up! Cloudflare takes aim at AI companies scraping data for large language models.


Cloudflare, a major internet security company and content delivery network provider, is taking a stand against AI companies that scrape websites for content. The move aims to curb the harvesting of data, legally or illegally, by generative AI companies to train large language models.

Cloudflare has launched a new, free tool that specifically blocks bots used by AI companies from scraping website content. The tool is available to all Cloudflare customers, including those on free plans. "This feature will continuously adapt," Cloudflare says, "as we identify new methods used by bots to scrape the web for model training."

Interestingly, Cloudflare's data reveals that 85.2% of its customers have chosen to block even those AI bots that properly identify themselves. This suggests a strong desire among website owners to control how their content is used by AI models.

Cloudflare identified the most active AI scraping bots over the past year. ByteDance's "Bytespider" bot attempted to access 40% of the websites under Cloudflare's protection, while OpenAI's "GPTBot" attempted to access 35%. These two bots, along with "Amazonbot" and "ClaudeBot", make up the top four AI scrapers by request volume on Cloudflare's network.
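For readers curious what blocking a self-identifying bot looks like in principle, here is a minimal sketch. It assumes that each bot named above announces itself with a matching token in its User-Agent header. The function name, token list, and sample strings are hypothetical illustrations, and this is not Cloudflare's detection method.

```python
# Hypothetical sketch: flag requests whose User-Agent names a known AI crawler.
# The token list is taken from the bots named in this article; real
# User-Agent strings vary, and this is NOT how Cloudflare's tool works.
AI_BOT_TOKENS = ("bytespider", "gptbot", "amazonbot", "claudebot")


def is_declared_ai_bot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a known AI bot token."""
    ua = user_agent.lower()
    return any(token in ua for token in AI_BOT_TOKENS)


if __name__ == "__main__":
    # Illustrative header values, not captured traffic.
    samples = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (X11; Linux x86_64) Firefox/127.0",
    ]
    for ua in samples:
        verdict = "block" if is_declared_ai_bot(ua) else "allow"
        print(f"{ua} -> {verdict}")
```

A check like this only catches bots that identify themselves honestly. A scraper that sends a browser-like User-Agent passes straight through, which is why header matching alone cannot stop undeclared scraping.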

While Cloudflare's move is significant, fully stopping AI scraping remains a challenge. The rapid development of AI models has led some companies to skirt or outright break scraping rules. In one recent case, Perplexity AI was accused of scraping content without permission.

Cloudflare recognizes the difficulty and acknowledges that AI companies may adapt to evade detection. The company plans to "keep watch and add more bot blocks" while evolving its machine learning models. This ongoing effort aims to create a safer online space where content creators retain control over their work and how it is used by AI.

