Scaling AI – Big and Small

Jeff Brown | Aug 6, 2024 | Bleeding Edge | 7 min read

It’s the greatest prize in history…

The creation of an artificial general intelligence (AGI).

Yesterday, we spent some time studying the field of competition.

The major players are Microsoft, OpenAI, Meta, Alphabet (Google), xAI (Elon Musk), Apple, Anthropic, and I’d throw in a little-known but no longer small competitor, Mistral AI.

For anyone who missed that discussion, you can catch up here in The Bleeding Edge – The Field for Large Language Models.

This competition is not for the faint of heart. The ticket to enter the race starts at 100,000 high-powered GPUs to train these foundational models and $1 billion in costs to do so. It only goes up from there.

And yes, more capital means more computing… and a chance to beat the competition in this all-out arms race.

It’s not just about making a ton of money from their large language model product offerings (in the form of subscription, API, or licensing fees), either. While the economic incentives are what justify the capital spend and investment, the race is also highly ideological.

The Incredible Power of Rewriting Reality

The winner, or winners, will have incredible power.

The power to rewrite history, science, and biology in whatever way they prefer it to be… not necessarily as it actually is.

After all, a large language model (LLM) like Google’s Gemini 1.5, Meta’s Llama 3.1 405B, or OpenAI’s GPT-4o feels like an all-knowing magical black box to most of us. We tell it what we need… and seconds later, sometimes milliseconds, the answer just appears.

We got a taste of how this could work this past February when Google announced to the world the highly anticipated release of its foundational LLM, Gemini, version 1.5.

[Image: Gemini-generated depictions of Vikings]

The historical likeness of the Vikings – the seafaring Norse people from Scandinavia who raided, traded, and settled in various parts of Europe from the 8th to 11th centuries – is not represented in these images.

Does anybody really care? It will be quite hard for the general population to even know that the answers are heavily biased. In fact, they have been programmed by humans to be so. After all, garbage in, garbage out.

That’s why this competition is so important. It’s not just about creating the largest productivity-enhancing technology in human history. It’s also about whether or not our foundation of knowledge will be real and factual… or made up.

The implications are staggering.

Every month, we see a new model released…

Each new model brings major improvements: a step ahead of the pack and a new approach that takes us one step closer to the prize. It’s this tangible, material progress that drives billions more in capital into the field.

An entire industry and complete ecosystem is being built from the dirt up, in real time. The sprouts came out of the ground so quickly… and they are maturing before our eyes as if it’s all happening in a single growing season.

But I don’t think most people understand what’s at stake – aside from a new suite of cool tools that help us generate a college essay, an email, an original photo, or some computer code on the spot.

Today’s LLMs are already writing legal agreements, conducting patent infringement searches, writing complex software code, developing websites, interacting with the internet, and even discovering how life’s 200 million plus proteins fold.

There’s so much progress, and it’s worth a fortune. It’s like a massive greenfield of opportunity: any legitimate team with expertise in AI can get funding. And the talent is in flux.

Just yesterday, OpenAI’s president and two other key employees left OpenAI, arguably the leader in the race to AGI right now. Not the kind of horse we’d expect the talent to jump off of. One went to competitor Anthropic, and I’m confident the other two will pop up somewhere soon.

What does that mean for the future of OpenAI? Will it be able to maintain its lead?

And amid this frenetic competition to advance large language models, there is something small happening.

More specifically, small large-language models.

An Answer to Scaling AI?

I know this sounds counterintuitive. After all, the game in artificial intelligence (AI) is “go big or go home,” isn’t it?

But something we touched on yesterday was a hint.

We can have the greatest, highest-performing LLM, like GPT-4o, but if it is very expensive to operate, it just won’t scale to be used by billions of people.

That’s where smaller LLMs come into play.

After all, most of us don’t need an all-knowing, all-capable AI for the narrow range of tasks we regularly perform. Whether it’s a task at work or one at home, we usually don’t need an advanced AI that does just about everything. We need one that does a few things well.

And that’s precisely why the same companies that are building large LLMs with hundreds of billions, even trillions of parameters, have also been in a race to build small LLMs.

These models are typically referred to as “cost-efficient” models, designed with both performance and cost of operation in mind. And that means scale.

The most recent industry development for small LLMs was the launch by OpenAI of GPT-4o-mini. The name says it all.

[Chart: Performance vs. cost to operate for small LLMs – Source: Artificial Analysis]

As we can see above, we have a field of small LLMs, and their performance is graphed against their cost to operate.

With most charts, the “best” technology or company is in the upper right quadrant. But in the above chart, the ideal place is in the upper left. This equates to high performance and low cost.

OpenAI’s release of GPT-4o-mini was a big deal because it stands out as the highest-performing small LLM and also one of the cheapest models. GPT-4o-mini even outperformed GPT-4 (OpenAI’s previous LLM) on some benchmarks.

GPT-4o-mini scores 82% on the Massive Multitask Language Understanding (MMLU) benchmark, which covers a wide range of tasks and knowledge. It is priced at less than half the cost of Anthropic’s Claude 3 Haiku and Google’s Gemini 1.5 Flash models, and at just a fraction of the cost of Meta’s Llama 3.

And compared to past frontier large language models, it is an order of magnitude cheaper.
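
To put that in perspective, here’s a minimal back-of-the-envelope sketch in Python. The traffic profile and per-million-token prices below are illustrative assumptions – not figures from the chart above or from any provider’s price list – but they show how quickly the gap compounds at scale.

```python
# A back-of-the-envelope sketch of why "cost-efficient" models matter at scale.
# All prices and traffic numbers below are illustrative assumptions, not quotes
# from the chart or from any provider's price list.

def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Estimate monthly API spend (USD) for a given traffic profile."""
    per_request = (input_tokens * price_in_per_m
                   + output_tokens * price_out_per_m) / 1_000_000
    return per_request * requests_per_day * 30

# Hypothetical workload: 1 million requests/day, ~500 input / ~300 output tokens each.
traffic = dict(requests_per_day=1_000_000, input_tokens=500, output_tokens=300)

# Assumed prices in USD per million tokens (small model vs. frontier model).
small = monthly_cost(**traffic, price_in_per_m=0.15, price_out_per_m=0.60)
frontier = monthly_cost(**traffic, price_in_per_m=5.00, price_out_per_m=15.00)

print(f"Small model:    ~${small:,.0f}/month")     # roughly $7,650
print(f"Frontier model: ~${frontier:,.0f}/month")  # roughly $210,000
```

Even with these rough assumptions, the same workload that costs a few thousand dollars a month on a small model runs into the hundreds of thousands on a frontier model. Multiply that across billions of users, and the case for small models makes itself.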

Our instincts probably suggest that we’d be giving up a lot of functionality with these new small models. But that’s not the case.

GPT-4o-mini supports text and real-time computer vision as inputs, with support for audio, image, and video coming soon.

That means, for fractions of a penny, we’ll be able to speak with our AIs, show them our world using our smartphones’ cameras, and provide them with images and video. And they will be able to speak with us, write to us, and provide us with images and even video.
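
For a sense of how simple that looks in practice, here’s a minimal sketch of a text-plus-image request to GPT-4o-mini using the OpenAI Python SDK. The image URL is a hypothetical placeholder, and it assumes your API key is set in the OPENAI_API_KEY environment variable.

```python
# A minimal sketch of a multimodal request to a small model via the OpenAI Python SDK.
# Assumes the OPENAI_API_KEY environment variable is set; the image URL below is a
# hypothetical placeholder used only for illustration.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # the small, cost-efficient model discussed above
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this photo, and what should I do about it?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo-from-my-phone.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The call pattern is the same whether the model is a frontier giant or a mini model; only the model name and the bill change.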

Some critics have suggested that this race for cost efficiency is a race to the bottom. They’re completely wrong.

And they miss the point entirely.

The Industry Shift Towards Inference

The industry is proactively getting ahead of what the market needs.

AI isn’t a one-size-fits-all technology.

There will be smaller models that specialize in scheduling, design, history, process automation, construction, materials… you name it. They will be designed and optimized for specific tasks while maintaining powerful natural language processing capabilities.

This isn’t a race to the bottom; it’s a race to scale.

AI isn’t something that’s being developed just for the top 10% of the global population. It’s being developed to reach the world’s connected population… which is more than 5 billion people.

And this isn’t just about developing cost-efficient software. There is also a new breed of AI-specific semiconductor companies building chips designed to run these models – a workload known as inference.

One of those companies recently filed confidentially for its IPO. It’s a company that I’ve been following for years. And in a few days, I’ll be sharing more details about how investors can prepare for that event – you can get ready by going here.

Counterintuitive or not, something big and small is happening in the world of artificial intelligence.

Does that mean that we’re nearing the end or closer to the beginning? I’ll let you decide.

But I’ll leave you with this…

Microsoft just spent $19 billion last quarter on capital expenditures (CapEx). Meta increased its CapEx forecast for the year to between $37 and $40 billion. Amazon has already spent $30.5 billion in the first half of this year and forecasted it will spend more in the second half.

Current forecasts for the data center CapEx spend in 2028 are almost $200 billion… just for the top five U.S. hyperscale tech companies.

I’ve never seen growth like this…

And AI is a foundational technology, a fabric, that is being woven into every industry. It will accelerate innovation, efficiencies, breakthroughs, and economic growth in all sectors.

So who cares if a bunch of institutions and hedge funds get their hands stuck in the cookie jar on the yen carry trade? They used too much leverage, and they got caught.

It’s just a blip of volatility that won’t get in the way of either big or small breakthroughs in artificial intelligence.

