Cloudflare claims Perplexity circumvented website scraping blocks

Perplexity faces renewed scrutiny over data use after Cloudflare accused it of bypassing website security measures.

Perplexity is accused of using hidden tactics to collect web content despite being blocked, sparking fresh debate over AI data scraping ethics.

Cloudflare has accused AI startup Perplexity of ignoring websites’ explicit instructions not to scrape their content.

According to the internet infrastructure company, Perplexity disguised its identity and used technical workarounds to bypass restrictions set out in robots.txt files, which tell bots which pages they may and may not access.
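For readers unfamiliar with how robots.txt works, the following is a minimal sketch of the check a well-behaved crawler is expected to run before fetching a page. It uses Python's standard `urllib.robotparser` module; the crawler name `ExampleBot`, the rules and the URLs are hypothetical and are not taken from Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ExampleBot is barred from the whole site,
# every other bot is barred only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_fetch("ExampleBot", "https://example.com/articles/1"))
    print(may_fetch("OtherBot", "https://example.com/articles/1"))
    print(may_fetch("OtherBot", "https://example.com/private/report"))
```

The file is advisory: nothing in the protocol enforces the answer, so the rules only work when a crawler identifies itself honestly and chooses to comply.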

The behaviour was reportedly detected after multiple Cloudflare customers complained about unauthorised scraping attempts.

Instead of respecting these rules, Cloudflare claims Perplexity changed its bots’ user-agent string to make them appear as a Google Chrome browser on macOS and switched its network identifiers to avoid detection.
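To illustrate why a changed user agent matters, here is a rough sketch of a naive server-side filter that blocks requests by matching a declared crawler name in the `User-Agent` header. The token `examplebot`, the function name and both header values are assumptions made for illustration; this is not Cloudflare's detection method, which the company says relies on machine learning and network analysis.

```python
# Hypothetical list of crawler name tokens a site owner wants to block.
BLOCKED_AGENT_TOKENS = ("examplebot",)


def blocked_by_user_agent(headers: dict) -> bool:
    """Return True if the User-Agent header contains a blocked crawler token."""
    user_agent = headers.get("User-Agent", "").lower()
    return any(token in user_agent for token in BLOCKED_AGENT_TOKENS)


# A crawler that declares itself openly (hypothetical string).
declared = {"User-Agent": "Mozilla/5.0 (compatible; ExampleBot/1.0)"}

# A request presenting a Chrome-on-macOS style string (illustrative value).
browser_like = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    )
}

if __name__ == "__main__":
    print(blocked_by_user_agent(declared))
    print(blocked_by_user_agent(browser_like))
```

Because the header is supplied by the client, a filter of this kind stops only crawlers that identify themselves, which is why operators tend to add other signals beyond the declared name.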

The company says these tactics were seen across tens of thousands of domains and millions of daily requests, and that it used machine learning and network analysis to identify the activity.

Perplexity has denied the allegations, calling Cloudflare’s report a ‘sales pitch’ and disputing that the bot named in the findings belongs to the company. Cloudflare has since removed Perplexity’s bots from its verified list and introduced new blocking measures.

The dispute arises as Cloudflare intensifies its efforts to grant website owners greater control over AI crawlers. Last month, it launched a marketplace enabling publishers to charge AI firms for scraping, alongside free tools to block unauthorised data collection.

Perplexity has previously faced criticism over content use, with outlets such as Wired accusing it of plagiarism in 2024.
