A black swan event…

On the heels of yet another potential black swan event.

The failed assassination attempt on a former U.S. president and the events of the last week are clearly, starkly different.

But they are similar in that they could have had dramatically different outcomes and have left more questions than answers.

I was on the road with my team over the last few days researching artificial intelligence and the explosion in data center build-out. Seeing is believing.

It ended up being a terrible time to travel, given the software problems caused by cybersecurity company CrowdStrike and Microsoft.

Ironically, Microsoft’s stock is trading higher than it was last Thursday before the news was out. CrowdStrike’s is down almost 20%.

I’m sure most of us have heard something about the calamity of the last few days, and some of us had the misfortune of being impacted by the chaos that occurred with airlines around the world. If you’re one of them, I feel your pain.

My team and I gave up on a couple of flights on Friday. We instead decided to drive the eight hours between Omaha, Nebraska, and Milwaukee, Wisconsin, to keep to our schedule.

In the end, it was a good decision. Our flight was canceled, and we never would have made it on time.

The media’s reporting on the event has been superficial. I get it. Software, software programming, and cybersecurity can be complex topics.

But something very interesting took place.

And I can all but promise you that what you’re about to read, you won’t find anywhere else.

BSoD

What happened, what we saw, was the blue screen of death (BSoD) across the world.

It showed up at airlines, hotels, hospitals, governments, and a seemingly endless number of industries. The common factor: all of them were running Microsoft Windows-based operating systems with cybersecurity software from CrowdStrike.

The cause? CrowdStrike pushed out a software update to all its customers globally that resulted in the blue screen of death, a screen that indicates that Microsoft Windows has crashed.

The very first BSoD was in 1993 on the Windows NT operating system. It was actually the frequency of Windows crashes that was the catalyst for me switching to Mac operating systems more than 25 years ago. I haven’t seen a BSoD since, until this weekend.

I took this picture in the Detroit airport (DTW). Screens similar to this were everywhere around the world.

This singular event has already become the largest IT outage in history. That explains why CrowdStrike is suffering.

But CrowdStrike is not the only party at fault.

The media widely reported that the file that CrowdStrike pushed out to its Windows-based customers was the cause of the problems. That’s not untrue. After all, had the update not been sent, the IT outage wouldn’t have occurred.

It was originally believed that CrowdStrike had pushed out a faulty software driver to its Windows-based customers. A software driver is software that runs in what is called kernel mode, and the driver allows a software application – like CrowdStrike’s – to work with protected data and/or hardware devices.

Running in kernel mode means privileged access, so any software that has that access can theoretically wreak havoc on an operating system. Faulty code running in kernel mode can crash a Windows system. And that’s what happened.

Only in this case, CrowdStrike hadn’t pushed out the faulty driver.

It was already there. It had been pre-installed by CrowdStrike some time ago.

The Backdoor Was Left Open

The similarities are striking…

Just like the Secret Service left the single most obvious staging point for a sniper unguarded – despite having identified the sniper as suspicious an hour before the attempt, having scoped the sniper with a rifle 10 minutes before it, and then having allowed the sniper time to fire several shots before taking action – someone at CrowdStrike had installed the equivalent of a backdoor to Microsoft’s Windows.

And somehow, Microsoft didn’t identify it.

It was the data update file CrowdStrike pushed out that “activated” the faulty driver and crashed the Windows-based machines. And only a certain kind of data file – written in the right way – would have triggered the faulty driver.
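To make that idea concrete, here is a minimal Python sketch of how code with a latent flaw can sit quietly until one particular data file arrives. It is purely illustrative: the function names and the "name,priority" file format are hypothetical, and this is not CrowdStrike's code or its actual file format.

```python
# Hypothetical illustration only: not CrowdStrike's code or file format.

def parse_rule(line: str) -> tuple[str, int]:
    """Latent bug: assumes every rule line is exactly 'name,priority'."""
    fields = line.split(",")
    return fields[0], int(fields[1])  # IndexError if the second field is missing


def load_rules(data: str) -> list[tuple[str, int]]:
    """Parse every non-empty line; one malformed line brings the loader down."""
    return [parse_rule(line) for line in data.splitlines() if line]


def load_rules_safely(data: str) -> list[tuple[str, int]]:
    """Same loader, but a malformed file is rejected instead of crashing."""
    try:
        return load_rules(data)
    except (IndexError, ValueError):
        return []  # fall back to loading no new rules


good_file = "block_pipe,1\nscan_memory,2"
bad_file = "block_pipe,1\nscan_memory"  # second line is missing a field
```

The loader works for as long as every file is well formed, so the flaw can go unnoticed. The first malformed file exposes it. In an ordinary program, that is one crashed application. In kernel mode, the equivalent fault takes down the whole machine.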

And it gets worse…

Contrary to media reports, the faulty driver is also installed on Mac and Linux machines. It’s just that CrowdStrike only pushed out the data files to Windows machines.

Had CrowdStrike pushed out the data file to all Linux machines as well, the chaos would have been even more widespread.

To fully grasp the calamity that would have ensued had this happened, I highly recommend readers catch up on another backdoor recently discovered on the Linux OS in April of this year.

I covered this incredible story – and one software engineer’s incredible luck at coincidentally discovering the backdoor – right here in Outer Limits – The World Just Barely Avoided a Doomsday.

While CrowdStrike has been blamed for this massive IT outage, it’s not the only party at fault. I would argue that Microsoft is equally to blame.

This event raises some important questions, mainly:

  • Why did CrowdStrike pre-install a faulty driver on its customers’ systems? It has been there for months, or possibly years.

  • Why did CrowdStrike push out a data file to set off the faulty driver and crash its Windows-based customers’ machines?

  • Why didn’t CrowdStrike test the new data file in a controlled environment before pushing it out to all of its Windows-based customers?

  • Why did CrowdStrike decide to push the update out in one go to everyone, instead of doing it in controlled batches?

  • Why hadn’t Microsoft identified the faulty driver that had been pre-installed on its customers’ machines?

  • Was CrowdStrike itself compromised, perhaps infiltrated, by a nation-state… and did it unknowingly push out a “trigger” to set off the pre-installed file?
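For readers who want to see what "controlled batches" means in practice, here is a minimal Python sketch of a staged rollout. Everything in it is an assumption made for illustration: the function name, the batch sizes (1%, 10%, then everyone), and the 2% failure threshold are hypothetical and do not describe CrowdStrike's actual deployment process.

```python
# Hypothetical sketch of a staged ("canary") rollout; all names are illustrative.
from typing import Callable, Sequence


def staged_rollout(
    machines: Sequence[str],
    apply_update: Callable[[str], bool],  # returns True if the machine stays healthy
    batch_fractions: Sequence[float] = (0.01, 0.10, 1.0),
    max_failure_rate: float = 0.02,
) -> list[str]:
    """Update machines in growing batches and stop at the first unhealthy batch.

    Returns the machines that received the update.
    """
    updated: list[str] = []
    for fraction in batch_fractions:
        # Cumulative number of machines that should be updated after this stage.
        target = max(1, int(len(machines) * fraction))
        batch = machines[len(updated):target]
        if not batch:
            continue
        failures = sum(1 for machine in batch if not apply_update(machine))
        updated.extend(batch)
        if failures / len(batch) > max_failure_rate:
            break  # halt: the remaining machines never receive the update
    return updated
```

With 1,000 machines and an update that crashes every one of them, this sketch stops after the first batch of 10. The other 990 machines are never touched.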

It’s important to note that CrowdStrike (CRWD) is a $64 billion publicly traded company.

It’s one of the most successful cybersecurity companies in history.

And we know Microsoft (MSFT) is worth more than $3 trillion.

These are both extremely well-staffed, profitable, best-in-class software companies. It is very hard for me to believe that this event was just sheer incompetence and that the company “just forgot” to test the data update before it sent it.

I’ve worked with software for most of my career as a high-tech executive, and there are extensive quality and testing processes in place, designed precisely to avoid a situation like this.

The software and cybersecurity industries always test in a “sandbox” to ensure that updates work properly and don’t crash systems. Always.

And yet, we’re supposed to believe that this was just a stupid mistake?

I don’t think so.

A Dry Run?

There’s an alternative line of thinking…

Lina Khan, the Chair of the Federal Trade Commission (FTC), used the event as an opportunity to highlight corporate concentration risk, as well as what she believes is a lack of resiliency in cloud providers.

Source: X

She’s not wrong. Concentration risk is real, especially when we’re referring to Windows, Linux (which is open source), and Android OS for mobile phones.

And there certainly is a need for more resiliency. We explored that topic as it relates to the vulnerability of subsea cables to espionage last Wednesday in The Bleeding Edge – The World’s Defense Against Subsea Espionage.

But was there concentration risk in a cloud provider like CrowdStrike? The CrowdStrike update impacted about 8.5 million Windows-based computers around the world. While that might sound like a lot, it represents less than 1% of all Windows machines.
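A quick back-of-the-envelope check puts that percentage in perspective. The snippet below uses only the two figures cited above (8.5 million machines, less than 1%); the variable names are my own, and the result is an implied lower bound, not an official count of Windows machines.

```python
# Back-of-the-envelope check using only the two figures cited above.
affected_machines = 8_500_000   # Windows machines hit by the update
share_upper_bound_pct = 1       # "less than 1%" of all Windows machines

# If 8.5 million is under 1% of the total, the total must exceed this number.
implied_minimum_base = affected_machines * 100 // share_upper_bound_pct

print(f"{implied_minimum_base:,}")  # 850,000,000
```

In other words, the "less than 1%" figure implies an installed base of more than 850 million Windows machines.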

Also interesting is that it was regulators, specifically the European Union, that forced Microsoft years ago to allow cybersecurity players to have access to the Windows kernel so there could be fair competition in cybersecurity to protect Windows machines.

Even more interesting is that CrowdStrike caused a similar event on a small subset of Linux servers in April of this year. One would have thought that if this was a bug discovered on Linux machines, it would have been double-checked for Windows as well.

It raises the question again…

Was April just a dry run? And was the latest event designed to send a clear message from one party to another?