The Supercomputing Goalposts Have Moved

Jeff Brown | Jan 23, 2025 | The Bleeding Edge | 5 min read


Overshadowed by the latest developments in artificial intelligence (AI) this month…

The commissioning and deployment of El Capitan, the world’s most powerful supercomputer (classical supercomputer, that is), barely made headlines.

Source: Lawrence Livermore National Laboratory

Built and operated at the Lawrence Livermore National Laboratory in Livermore, CA, El Capitan is only the third supercomputing system to achieve exascale performance, overtaking the Frontier supercomputer at the Oak Ridge National Laboratory in Tennessee.

El Capitan was built for the National Nuclear Security Administration (NNSA) and is capable of performing more than 2 quintillion (2 × 10^18) floating point operations per second (FLOPS).

It has been verified at a performance of 1.742 exaFLOPs, which is what makes it an exascale supercomputer, with a peak performance of 2.79 exaFLOPs. This represents a significant performance increase over Frontier, which operates at 1.353 exaFLOPs.

It is literally the most powerful supercomputer in history, and yet, its commissioning barely evoked a yawn.

Soooo Many Chips

The technology enabling El Capitan’s record performance comes almost entirely from Advanced Micro Devices (AMD): 44,544 of its MI300A accelerated processing units (APUs), which combine AMD’s bleeding-edge CPUs and GPUs.

A Single MI300A APU with Multi-Chip Architecture

In total, there are 11,039,616 combined CPU and GPU cores that make up El Capitan. (Note: each CPU and GPU has multiple cores)
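For readers who like to check the math, here is a minimal Python sketch that puts the quoted figures in perspective. It assumes the verified and peak numbers above are directly comparable, and that dividing the verified figure evenly across all 44,544 APUs is a fair back-of-the-envelope estimate. The function names are made up for illustration.

```python
# Figures quoted in the article
VERIFIED_EXAFLOPS = 1.742  # verified benchmark performance
PEAK_EXAFLOPS = 2.79       # theoretical peak performance
APU_COUNT = 44_544         # number of MI300A APUs

EXA = 10**18
TERA = 10**12


def fraction_of_peak(verified_exaflops: float, peak_exaflops: float) -> float:
    """Share of the theoretical peak that the verified run reached."""
    return verified_exaflops / peak_exaflops


def teraflops_per_apu(verified_exaflops: float, apu_count: int) -> float:
    """Verified throughput per APU, assuming an even split (a simplification)."""
    return verified_exaflops * EXA / apu_count / TERA


if __name__ == "__main__":
    share = fraction_of_peak(VERIFIED_EXAFLOPS, PEAK_EXAFLOPS)
    per_apu = teraflops_per_apu(VERIFIED_EXAFLOPS, APU_COUNT)
    print(f"Verified performance is {share:.1%} of peak")
    print(f"Roughly {per_apu:.1f} teraFLOPS per APU")
```

By this rough estimate, the verified run reached a bit over 60% of peak, or on the order of 39 teraFLOPS per APU.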

AMD Chairman and CEO Lisa Su (on right) | Source: Lawrence Livermore National Laboratory

In addition to AMD’s semiconductor technology, the other “secret” supplier to El Capitan is Micron Technology (MU). Its high bandwidth memory (HBM3) is equally critical to achieving El Capitan’s remarkable performance.

Micron’s technology is “hidden” because it is not sold separately – its HBM3 memory is sold to AMD and incorporated directly into AMD’s MI300A APUs.

The primary purpose of El Capitan will be for modeling and simulation for national security, as well as for the use of machine learning for materials discovery and inertial confinement fusion research – a popular approach to nuclear fusion.

But we’ve reached a strange point where despite the incredible advancements in supercomputer technology, hardly anyone is talking about them.

So what’s going on?

Sliding Into Obscurity

The world entered the Exascale Computing Era in May 2022, when the Frontier supercomputer was commissioned.

It was a paradigm shift at the time, as exascale computing had been an aspiration for decades. After all, without the computational horsepower, extremely complex problems couldn’t be tackled.

(Before the exascale era of computing came the petascale era. An easy way to think about the paradigm shift is that exascale computing is 1,000 times faster and more powerful than petascale computing.)

At the moment, there are only three exascale computers on the planet. All three are in the U.S. And yet, while El Capitan’s recent commissioning is a massive deal, it received almost no coverage.

Why not?

The era of generative AI and hyperscale AI data centers (AI factories) began when OpenAI released its first commercial version of ChatGPT at the end of November 2022, only months after Frontier came online. It has completely overshadowed the most impressive advancements in classical supercomputing.

It’s almost as if these classical supercomputers have lost their relevance.

The most complex problems are now delegated to the realm of various forms of neural networks and reinforcement learning. A classical supercomputer couldn’t have been used to accurately predict the structure of all of life’s proteins, DNA, RNA, ligands, and all of their interactions – which is what Google’s DeepMind division achieved in May of 2024.

RNA Modifying Protein 8AW3 Structure | Source: Google DeepMind

The problem isn’t the lack of horsepower in classical supercomputers. The problem is their architecture. Classical supercomputers process tasks one after the other, sequentially. You can have a bunch of CPUs and run them in parallel, which improves performance, but each CPU has the same architectural limitations.

An AI data center at its heart uses GPUs and other AI-specific semiconductors that are designed for advanced parallel processing.

That architecture is what makes GPUs so well-suited for deep learning and neural networks. A classical supercomputer simply couldn’t handle the complex calculations and workloads involved with training a neural network in any reasonable amount of time. It’s simply too inefficient.

The scale and raw computational capacity of these AI-specific hyperscale data centers are so far beyond the world’s most powerful classical supercomputer, they’re just not in the same league.

They’re not even in the same universe.

It almost feels as if Aurora, Frontier, and El Capitan are from a past generation of computing – almost relics the moment they’re turned on…

El Capitan came at a price of $600 million (not billion) for what is now the world’s most powerful (classical) supercomputer.

Compare that with the hundreds of billions that the AI industry spent just last year building out AI factories in the U.S. alone. El Capitan came at a cost of less than 0.25% of what was spent on AI data centers in a single year.

What’s harder to believe? How insignificant classical supercomputers have become… or the level of investment in artificial intelligence in this very near-term race to create artificial general intelligence (AGI)?

Signposts Are Everywhere

Just yesterday, we learned about the world’s first $500 billion AI deal in The Bleeding Edge – The First $500 Billion AI Deal.

That’s almost 1,000 times the cost of El Capitan (about 833 times, to be more precise). For a single AI data center network – Project Stargate!
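The cost comparisons are simple division, and a short Python sketch makes them easy to verify. It uses only the dollar figures quoted in this issue. The 0.25% share is taken at face value to back out the annual AI data center spend it would imply, and the function names are made up for illustration.

```python
# Dollar figures quoted in the article
EL_CAPITAN_COST_USD = 600e6  # $600 million
STARGATE_DEAL_USD = 500e9    # $500 billion


def cost_multiple(larger_usd: float, smaller_usd: float) -> float:
    """How many times larger one price tag is than another."""
    return larger_usd / smaller_usd


def implied_total_spend(cost_usd: float, share: float) -> float:
    """Total spend implied if cost_usd is exactly `share` of it."""
    return cost_usd / share


if __name__ == "__main__":
    multiple = cost_multiple(STARGATE_DEAL_USD, EL_CAPITAN_COST_USD)
    total = implied_total_spend(EL_CAPITAN_COST_USD, 0.0025)
    print(f"Project Stargate is about {multiple:.0f}x the cost of El Capitan")
    print(f"A 0.25% share implies about ${total / 1e9:.0f} billion in spend")
```

In other words, the “less than 0.25%” claim holds as long as annual AI data center spending came in above roughly $240 billion.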

No, not all the money has been secured yet for the project. But that’s not unusual. Even if Project Stargate had $500 billion in cash, it couldn’t possibly deploy even 80% of it right now. There wouldn’t be enough GPUs available to purchase.

These hyperscale data centers take time to build and commission. But that timeframe isn’t measured in years like El Capitan. It’s measured in months.

Some 10 of 20 Project Stargate data centers are already under construction in Abilene, Texas, each about 500,000 square feet in size – more than 11 football fields.

There is such a sense of urgency right now. Not only is there a multi-trillion-dollar monetary incentive, but the industry can “feel it.” The breakthroughs are coming so quickly now. Every month.

The entire world may be asleep, but for those of us in the know, we can feel the next breakthrough as if it’s at our fingertips. Whatever problems we’re trying to solve, the answer is so close.

And the hints of AGI are already present. The signposts are everywhere.

And the path forward is marked as clearly as the yellow brick road.

Jeff

