Why Reddit is suing Perplexity and others over industrial-scale content scraping

Content scraping is the process of using automated tools to extract data from websites, often in bulk.

By Storyboard18 | Oct 23, 2025 1:36 PM

Reddit has taken legal action against artificial intelligence startup Perplexity AI and three other companies (Oxylabs, AWMProxy, and SerpApi), accusing them of unauthorised data scraping to train AI systems. Filed in a New York federal court, the lawsuit alleges that these companies bypassed Reddit’s protections to extract vast amounts of user-generated content from thousands of subreddit communities. Reddit describes this activity as part of an “industrial-scale data laundering economy.” The case highlights the tension between AI innovation and content ownership.

Reddit argues that its content is highly sought after by AI developers because it provides high-quality, human-generated text, which is critical for training AI models like Perplexity’s answer engine. According to Reddit’s chief legal officer, Ben Lee, this “arms race” for content has fuelled a market where data is harvested on a massive scale, often without the consent of content creators.

This case is the latest in a growing wave of disputes between content owners and AI developers. Reddit had previously filed a similar lawsuit against Anthropic, which remains ongoing. In Perplexity’s defence, the startup claims its methods are principled and responsible, rejecting allegations of unlawful scraping and asserting its commitment to openness and factual AI responses.

What is content scraping?

Content scraping is the process of using automated tools to extract data from websites, often in bulk. While it can have legitimate applications — such as price comparison or search indexing — it can also be exploited to steal content, violate copyrights, or harvest personal information. For platforms like Reddit, scraping can divert traffic, harm server performance, and undermine the rights of creators whose posts are used without permission.

Reddit’s lawsuit claims that Perplexity, despite receiving a cease-and-desist notice last year, amplified its use of Reddit content, citing it forty times more frequently in AI-generated responses. Reddit is seeking monetary damages and a court order preventing Perplexity from using its data, raising important questions about how AI companies source human-generated content responsibly.

As AI firms race to improve models, platforms like Reddit are increasingly taking a stand to protect intellectual property and the communities that create the data powering these technologies. The outcome of this lawsuit could set a precedent for how AI developers access publicly available content and respect copyright in the AI era.

First Published on Oct 23, 2025 2:32 PM
