Kevin Korte is President of Univention North America, making sure you stay in control of your data, your company and your future.
The year 2024 has brought a significant increase in cloud outages and service disruptions. A single day in July saw a widespread outage of Microsoft’s cloud services, followed by a faulty update to CrowdStrike’s security platform that disabled more than eight million Windows devices worldwide.
Taken together, the two incidents can safely be considered the biggest IT failure in history: they grounded thousands of flights all over the world, forced hospitals to cancel scheduled procedures and stop admitting new patients, and shut down supermarkets. According to one insurer’s preliminary analysis (via CNN), the CrowdStrike outage caused more than $5 billion in direct losses for Fortune 500 companies alone.
As we navigate the aftermath of these disruptions, it helps to treat them as a learning opportunity. I see three fault lines that have been laid bare: failures in risk management, gaps in business continuity and a lack of funding for IT. Now is a good moment to step back, learn from these mistakes and emerge stronger.
Risk Management Starts At The Very Top
Risk management is one of the core functions of the board of directors at any company. Yet today, only a minority of board members have any technology background. Likewise, few boards have dedicated technology committees.
As a result, many companies are unprepared to handle large-scale IT outages even on the most basic level. When an IT outage shuts down a hospital, for instance, and the facility does not have enough pens and paper ready to maintain rudimentary charts, business continuity planning has utterly failed.
If you run an airline, a computer outage that causes a weeklong cascade of cancellations and delays means that the company misjudged its inherent risks. It is especially problematic if your competitors manage to recover significantly faster and with fewer upset customers. The one laggard in the industry that wasn’t prepared pays a hefty price: it attracts more scrutiny, squanders large amounts of goodwill and invites regulatory action or even fines.
Such cybersecurity and IT management incidents point to a larger and deeper problem in risk management. Boards today understand many of the legal, operational and financial risks a company faces. Information technology, however, is still treated as an auxiliary service. This is a serious mistake because in our hyperconnected world, most business operations depend on IT, from sales to inventory management.
Risk management has to start at the top and always include corporate IT. Otherwise, companies leave themselves vulnerable to the next round of outages and cybersecurity incidents on more than one level. Why should middle management take IT seriously if the top brass doesn’t?
Cloud Service Risks Call For Mitigation Strategies
For on-premises services, IT administrators typically understand which mitigating actions they can take or prepare for. Uninterruptible power supplies and backup generators deal with power problems, cellular connections can substitute for fiber optic cables, and consistent backups secure data in case of a server outage. It’s not rocket science. There are plenty of books, manuals and exams out there to learn how to build resilient systems.
However, cloud services and software-as-a-service offerings do not undergo the same rigorous planning. We rely on service level agreements (SLAs) to judge the risks, and trust in the brand name behind an SLA too often replaces real outage planning. After all, the whole purpose of outsourcing a service to one of the leading providers is that we don’t have to deal with the details of service availability.
It’s true that the IT department can leave many technical details to its service providers and their SLAs, but doing so introduces a new risk factor. What happens when the provider or the connection to it goes down? Unless we understand and mitigate these new risks, we should ask ourselves whether using cloud services and IT outsourcing is really less hassle than hosting something on site. The recent past indicates that too many companies ignore the risks instead of trying to understand them.
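To make the question about the provider and the connection concrete, here is a minimal back-of-the-envelope sketch in Python. It converts an availability percentage into hours of permitted downtime per year and then combines a provider with the connection needed to reach it. The availability figures are hypothetical and not taken from any real SLA, the function names are made up for illustration, and the calculation assumes the two links fail independently, which is a simplification.

```python
# Illustrative only: the availability figures below are hypothetical,
# not taken from any real provider's SLA.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def allowed_downtime_hours(availability: float) -> float:
    """Hours per year a service may be down while still meeting its target."""
    return (1.0 - availability) * HOURS_PER_YEAR


def serial_availability(*availabilities: float) -> float:
    """Availability of a chain in which every link must be up at once.

    Assumes the links fail independently of each other (a simplification).
    """
    combined = 1.0
    for availability in availabilities:
        combined *= availability
    return combined


if __name__ == "__main__":
    provider = 0.999    # hypothetical SaaS provider target ("three nines")
    connection = 0.995  # hypothetical internet uplink to that provider

    chain = serial_availability(provider, connection)
    print(f"provider alone:    {allowed_downtime_hours(provider):.1f} h/year")
    print(f"provider + uplink: {allowed_downtime_hours(chain):.1f} h/year")
```

Under these assumed numbers, the provider alone is allowed roughly nine hours of downtime a year, while the provider and the uplink together allow more than 52 hours. The headline SLA figure describes only one link in the chain the business actually depends on.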
Why IT Departments Are Desperate For Adequate Funding And Staffing
Above all, we should reexamine whether the IT department has adequate funding for its mission-critical role in our organizations. If IT departments had the staffing and budgets to test security updates quickly, some of the latest issues might have had less impact.
Instead, we face an ever-increasing staff shortage. Discussions about budgets, working conditions and contributions are lacking. We are fed vague hopes of new AI tools while pulling stressful all-nighters to deal with yet another attack.
Our companies, our entire economy and society run on computers. Without adequate funding for IT expertise, each service we introduce will become a new risk factor. The problem is exacerbated when the same overworked people are expected both to react to current issues and to plan how to prevent future ones.
If We Neglect IT, We Are On Course For A Digital ‘Cancel Culture’
Over the past few months, it has become apparent that our ways of managing IT and services are no longer working. If airlines and hospitals fail once computers and servers go down, technology is no longer an adjacent service. As a result, we have to give the services that IT departments use the proper attention, risk assessments and budgets they need. We cannot continue outsourcing risk assessments like we outsource the risks themselves.
Nobody likes to be stuck at an airport staring at a screen full of canceled flights or have the hospital cancel a medical procedure. However, this could be our new reality unless we change how we think about and handle IT.
Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.