Misconfigured database triggered global Cloudflare failure, CEO says
Faulty metadata from ClickHouse updates pushed Cloudflare’s software beyond its limits.
Cloudflare says its global outage on 18 November was caused by an internal configuration error, not a cyberattack. CEO Matthew Prince apologised to users after a permissions update to a ClickHouse cluster generated a malformed feature file that caused systems worldwide to crash.
The oversized file exceeded a hard limit in Cloudflare’s routing software, triggering failures across its global edge. Intermittent recoveries during the first hours of the incident led engineers to suspect a possible attack, as the network randomly stabilised when a non-faulty file propagated.
Confusion intensified when Cloudflare’s externally hosted status page briefly became inaccessible, raising fears of coordinated targeting. The root cause was later traced to metadata duplication from an unexpected database source, which doubled the number of machine-learning features in the file.
The outage affected Cloudflare’s CDN, security layers, and ancillary services, including Turnstile, Workers KV, and Access. Some legacy proxies kept limited traffic moving, but bot scores and authentication systems malfunctioned, causing elevated latencies and blocked requests.
Engineers halted the propagation of the faulty file by mid-afternoon and restored a clean version before restarting affected systems. Prince called it Cloudflare’s most serious failure since 2019 and said lessons learned will guide major improvements to the company’s infrastructure resilience.
Would you like to learn more about AI, tech, and digital diplomacy? If so, ask our Diplo chatbot!
