Cloudflare accuses AI startup Perplexity of bypassing anti-scraping rules

This isn’t the first time Perplexity has been under scrutiny. Last year, the startup was accused of plagiarizing content.

By Storyboard18 | Aug 5, 2025 9:20 AM

Cloudflare has accused AI startup Perplexity of scraping websites that had explicitly opted out, raising new concerns about how AI companies gather online content. The internet infrastructure provider said it observed Perplexity bypassing standard protections and hiding its activity to access restricted web pages.

According to Cloudflare’s research team, Perplexity ignored robots.txt directives, a web standard that tells bots which pages they should not crawl, and disguised its identity by switching user-agent strings and changing the autonomous system numbers (ASNs) its traffic came from. This tactic allowed the company to appear as a regular browser user rather than a known web crawler.
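To illustrate why switching user-agent strings matters, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the crawler name `ExampleBot`, the browser-style user-agent string, and the URL are all hypothetical and are not taken from Cloudflare's report; the sketch only shows that a rule aimed at a declared crawler name does not apply to a request that presents itself as an ordinary browser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site blocks one declared crawler by name
# and allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical page on the site.
URL = "https://example.com/article"

# Hypothetical browser-style user-agent string (Chrome on macOS).
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36"
)


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit this user agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# The declared crawler name matches the Disallow rule.
declared = is_allowed("ExampleBot", URL)

# The same request sent with a browser-style string falls under the
# wildcard group, so the name-based rule no longer applies.
disguised = is_allowed(BROWSER_UA, URL)

print(f"declared crawler allowed: {declared}")
print(f"browser-style agent allowed: {disguised}")
```

Because robots.txt is advisory and matched on the name a client chooses to send, it only restrains crawlers that identify themselves honestly. That is why network operators fall back on other signals to recognize automated traffic.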

“This activity was observed across tens of thousands of domains and millions of requests per day,” Cloudflare stated. The company said it used machine learning and network signals to detect and fingerprint Perplexity’s activity.

AI models rely heavily on large-scale data scraped from the internet. While web publishers have attempted to protect their content through robots.txt files or direct bot-blocking rules, enforcement has proven difficult. Cloudflare says it began investigating after receiving complaints from customers whose websites were allegedly being scraped by Perplexity despite explicit restrictions.

Perplexity, for its part, has denied the claims. Spokesperson Jesse Dwyer dismissed the blog post as a “sales pitch” and said the screenshots provided did not show actual content access. In a follow-up email to TechCrunch, Dwyer claimed the crawler named in the Cloudflare post “isn’t even ours.”

Cloudflare, however, doubled down, saying it confirmed the behavior through controlled testing. It also accused Perplexity of impersonating Google Chrome on macOS when its declared user-agent was blocked, a move the company sees as an attempt to evade detection.

As a result, Cloudflare has removed Perplexity’s bots from its list of verified crawlers and introduced new methods to block their access. The company recently launched a marketplace that allows website owners to charge AI scrapers and released a free tool to block bots used to train AI models. CEO Matthew Prince has been vocal about the risks AI poses to publishers, saying it threatens the internet’s ad-driven business model.

This isn’t the first time Perplexity has been under scrutiny. Last year, the startup was accused of plagiarizing content. With Cloudflare now joining the list of critics, the spotlight on Perplexity’s data practices is likely to intensify.

First Published on Aug 5, 2025 9:19 AM
