Advanced 365 Limited

07/26/2024 | News release | Distributed by Public on 07/26/2024 04:32

Lessons Learned for IT Resilience Post-CrowdStrike

Organisations around the globe were thrown into disarray last week due to the major IT outages caused by a configuration update released on 19th July by the security vendor CrowdStrike. The outage grounded flights, disrupted hospitals, and knocked media outlets offline, impacting over 674,620 direct customer relationships of CrowdStrike and Microsoft, with over 49 million people affected indirectly.

But what should we do about this? And are the architectural patterns that protect us against this type of failure actually worth the operational headaches needed to maintain them? In this blog we'll dissect the CrowdStrike incident and look at the lessons to be learned for organisations seeking to safeguard themselves against future disruptions.

Understanding the CrowdStrike Outage

To preface: excluding Android, which dominates smartphones and IoT (Internet of Things) devices, Microsoft holds the largest share of the operating system market. Additionally, CrowdStrike states in a 2022 report that it holds a 17.7% share of the 'Endpoint Security Market'. Wherever Microsoft systems and the CrowdStrike Falcon platform intersected, there appeared to be an impact, or at least a risk of one.

The update resulted in the dreaded 'Blue Screen of Death' appearing on affected systems, disrupting IT systems across the world. It was released on Friday 19th July at 04:09 UTC, and the fix package followed just over an hour later at 05:27 UTC. For many Microsoft-based systems with auto-update enabled for the Falcon platform, however, this was too late: impacted systems could not boot and were therefore unable to download the fixed CrowdStrike package remotely.

Furthermore, the remediation after this point was technical enough that it would not typically be possible for anyone outside dedicated IT support teams.

Consequences of the Update

The faulty CrowdStrike update didn't just impact organisations directly; it triggered a cascade of effects across interconnected platforms and service providers. Even if an organisation or its direct support wasn't affected, those in the extended support chain could have experienced issues, resulting in widespread disruptions overall.

Finally, although this is still likely to be the "worst cyber event in history", it could have been much worse. Replace "cyber event" with "cyber security event" and we can only try to imagine the impact, especially if a malicious threat actor had also targeted the unaffected operating systems within the same environment.

The MITRE ATT&CK framework's 'Groups' section details Advanced Persistent Threats (APTs), including large criminal organisations and government agencies. The latter often aim for significant impact and disruption to support their objectives. The recent CrowdStrike incident highlights this issue further, showcasing a clear divide between Western and Eastern powers in their use of technology, exacerbated by recent US sanctions and China's drive for self-reliant cyber security. Consequently, APTs from both regions have experienced a wide-reaching cyber event predominantly isolated to areas of geopolitical interest.

The Case for System Heterogeneity

The takeaways from this cyber event are likely to focus on architectural resilience, and potentially on reviving previously sidelined architectural concepts. One that springs to mind is system heterogeneity, an umbrella term for how much the systems within a group vary from one another.

Just as species in nature (whether plant or animal) gain greater resistance to impactful viruses through diversity, diverse computer systems and networks are more resilient against software failures like the one on 19th July. However, this increased diversity also adds complexity and often conflicts with modern platform-based architectures designed to reduce configuration-based vulnerabilities.

For example, CrowdStrike's Falcon Sensor, the root of much recent pain for many IT teams, collects security telemetry data and provides other security functionality. To prevent similar failures in the future, organisations could deploy multiple collection platforms within a single environment and aggregate their output into a single unified view. If one platform fails, as it did on 19th July, only systems using the affected software would be impacted, preserving some level of operational capability.
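To make the idea concrete, the following is a minimal Python sketch rather than a real deployment. It assumes three hypothetical collection platforms (`platform_a`, `platform_b`, `platform_c`) spread evenly across a small fleet, and models the "unified view" as a simple dictionary of host status. None of these names come from CrowdStrike or any other vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    platform: str  # hypothetical label for the collection platform on this host


def assign_platforms(host_names, platforms):
    """Spread hosts across the available platforms in round-robin order."""
    return [Host(n, platforms[i % len(platforms)]) for i, n in enumerate(host_names)]


def unified_view(hosts, failed_platforms):
    """Aggregate every platform's hosts into one status view."""
    return {
        h.name: "down" if h.platform in failed_platforms else "reporting"
        for h in hosts
    }


def surviving_fraction(view):
    """Share of the fleet that is still reporting after a platform failure."""
    reporting = sum(1 for status in view.values() if status == "reporting")
    return reporting / len(view)


# Illustrative fleet: six hosts, three hypothetical platforms.
hosts = assign_platforms(
    [f"host-{i}" for i in range(6)],
    ["platform_a", "platform_b", "platform_c"],
)
view = unified_view(hosts, failed_platforms={"platform_a"})
print(view)
print(surviving_fraction(view))
```

In this toy fleet a failure of one platform takes down only the hosts assigned to it, while the rest keep reporting. The cost is the one the article describes: three platforms to license, configure, and patch instead of one.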

Extreme applications of this concept could even go as far as allocating end users within the same team different operating systems - Linux, macOS, and Windows - to maintain operational resilience per organisational unit. However, this may be an excessive measure that vastly increases complexity. Companies are therefore likely to review their current level of heterogeneity and take a risk-based approach to this architectural concept.
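As a rough illustration of that extreme case, the Python sketch below allocates operating systems to a team in rotation and then checks who could keep working if one operating system became unbootable. The team members and the allocation rule are invented for the example; a real decision would rest on the risk-based review described above.

```python
def allocate_os(team_members, operating_systems):
    """Assign operating systems to team members in rotation (illustrative rule)."""
    return {
        member: operating_systems[i % len(operating_systems)]
        for i, member in enumerate(team_members)
    }


def members_still_working(allocation, failed_os):
    """Return the members whose operating system is not the failed one."""
    return [member for member, os_name in allocation.items() if os_name != failed_os]


# Hypothetical five-person team.
team = ["alice", "bob", "carol", "dave", "erin"]
allocation = allocate_os(team, ["Windows", "macOS", "Linux"])
working = members_still_working(allocation, failed_os="Windows")
print(allocation)
print(working)
```

Here three of the five team members remain operational after a Windows-only failure, at the price of supporting three desktop platforms within a single team.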

Future-Proofing Cyber Security

The recent CrowdStrike incident serves as a crucial reminder of the vulnerabilities inherent in heavily centralised IT systems. As organisations seek to future-proof their cyber security measures, embracing system heterogeneity may provide a pathway to enhanced resilience against similar disruptions. By diversifying their technological landscapes and carefully assessing the risk versus complexity trade-offs, businesses can better position themselves to withstand unforeseen challenges. Ultimately, proactive adaptation and vigilance will be essential in navigating an increasingly complex digital environment, ensuring that operations remain robust and secure in the face of potential threats.

-

OneAdvanced provide end-to-end managed security services, with round-the-clock proactive security. Get in touch today to discuss how we can help.