Content Signals Policy by Cloudflare lets websites signal data use preferences
New Cloudflare tool lets publishers control whether content can be indexed, used as AI input, or used to train AI models.

Cloudflare has announced the launch of its Content Signals Policy, a new extension to robots.txt that allows websites to express their preferences for how their data is used after it has been accessed. The policy is designed to help creators keep their content open while discouraging misuse by data scrapers and AI model trainers.
The new tool enables website owners to specify, in a machine-readable format, whether they permit search indexing, AI input, or AI model training. Operators can set each signal to ‘yes,’ ‘no,’ or leave it blank to indicate no stated preference, giving them fine-grained control over each type of use.
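For illustration, a robots.txt entry carrying these signals might look like the sketch below, with a Content-Signal line listing each preference; the directive name and signal keys shown here follow Cloudflare’s published examples and should be treated as illustrative rather than definitive.

    # Illustrative default: allow search indexing, block AI training
    # (ai-input is omitted, meaning no stated preference)
    Content-Signal: search=yes, ai-train=no

    User-Agent: *
    Allow: /

Leaving a signal such as ai-input out of the line corresponds to expressing no preference for that use, consistent with the blank option described above.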
Cloudflare says the policy tackles the free-rider problem, where scraped content is reused without credit. With bot traffic set to surpass human traffic by 2029, the company calls for clear, standard rules to protect creators and keep the web open.
Customers already using Cloudflare’s managed robots.txt will have the policy automatically applied, with a default setting that allows search but blocks AI training. Sites without a robots.txt file can opt in to publish the human-readable policy text and add their own preferences when ready.
Cloudflare emphasises that content signals are not enforcement mechanisms but a means of communicating expectations. It is releasing the policy under a CC0 licence to encourage broad adoption and is working with standards bodies to ensure the rules are recognised across the industry.