Analysis of the CrowdStrike and Microsoft cyber failure

Blue screen of death for the Microsoft failure

On 19 July 2024, a blue screen of death appeared on many computers running Microsoft Windows. In Australia, banks and Qantas operations suffered major failures. As the working day began on Friday around the world, the computer systems of airports, banks, hospitals, and companies started failing like dominoes. Flights were delayed as airport computer systems stopped in Singapore, Hong Kong, India, Europe, and the USA. Manchester United and the Dutch Ministry of Foreign Affairs were among those affected by the cyber failure.

This global cyber failure was triggered by a routine update to CrowdStrike’s security software, which affected Microsoft’s Windows operating system.

While we will closely monitor the latest developments in the CrowdStrike cyber failure, this page analyses the wider relevance and impact of the incident.

Vulnerability from overreliance on single-point solutions

Current cyber systems are highly complex. In this case, the weakest link was a software update, and it triggered a major failure. The sheer complexity of interconnected services and servers makes it difficult to identify points of failure. At the same time, this case has highlighted the overreliance of numerous organisations on single-point IT solutions. All impacted organisations were running the same software, which exposed a vulnerability in their cyber-resilience strategies. The incident underscores the importance of a global conversation about how such IT solutions are maintained and updated.

Supply chain security dilemma

With a constantly evolving threat landscape and the increasing complexity of the digital products we use, we are told to update our software frequently in order to stay secure. However, this case highlighted that updates can themselves be the root cause of security challenges. As more software opts for automatic updates, a new area of potential vulnerability has emerged: the rapidly evolving supply chain.

Failure without cyberattack

Cybersecurity is often associated with cyberattacks, and experts have previously anticipated global IT outages resulting from the activity of malicious actors. That remains a major reason for concern. But, as this incident shows, our computer systems can go down without any malicious intent, simply as a result of faulty processes. A security feature turned out to create security challenges: the underlying cause seems to be an update to the kernel-level driver that CrowdStrike uses to protect Windows computers. After “numerous reports of blue screen of death errors on Windows hosts,” CrowdStrike identified the issue and rolled back the problematic update, but this does not appear to help machines that have already been affected.

Focus on critical infrastructure

While global media focused on AI risks, the actual risk proved much more immediate and mundane: in this case, a routine update. Countries and companies must focus on critical infrastructure and critical information infrastructure, from dealing with complex systems and supply chains to protecting submarine cables and critical points of failure of the modern internet. This incident also puts trust in digital infrastructure at risk, leading to increased scrutiny and demand for more robust, resilient systems, especially in critical sectors.

Need for international response

An update on a CrowdStrike server affected systems worldwide and also highlighted the dependence of numerous digital systems in critical sectors on a single provider. This may require owners and operators of critical infrastructure, in both the public and private sectors, to diversify their third-party service providers where possible and, certainly, to enhance their cyber resilience. The challenge, though, is that this would require international action, which may be difficult to ensure in the current geopolitical environment.

The good news is that there are international instruments to be used in such situations. Among the 11 UN cyber norms agreed upon by states within the UN Group of Governmental Experts (GGE) and endorsed by all UN Member States within the UN Open-Ended Working Group (OEWG), several are particularly relevant, for instance the norms that call for protecting critical infrastructure and ensuring supply chain integrity. The Geneva Dialogue on Responsible Behaviour in Cyberspace, established by Switzerland and implemented by Diplo with the support of several partners, addresses the implementation of these norms and highlights the challenges that different non-state stakeholders face in implementing them. The results of a regular dialogue with representatives of the private sector, academia, civil society, and the technical community from different countries are published in the Geneva Manual, which offers comprehensive guidance on implementation by non-state stakeholders. The first edition focuses on the implementation of the norms, including those on supply chain security. This year the Geneva Dialogue discusses the implementation of norms related to critical infrastructure protection, and the next chapter of the Manual will focus on this topic.