Chain of Thought

Jeff Brown | Sep 17, 2024 | Bleeding Edge | 7 min read

Editor’s Note: We’ve been working on something exciting

We’ve been closely following the rise of artificial intelligence and how, as it advances, it’s going to reshape so many industries – making them smarter, more efficient, and better than before…

And now that we’re reaching a tipping point in AI, we believe in moving full steam ahead into an AI-powered future… Which is why Jeff and the team have been working tirelessly to launch our AI project.

It’s something Jeff calls the Perceptron.

You’ll be hearing more about it over the next handful of days. In fact, Jeff’s going live next Wednesday, September 25, at 8 p.m. ET to reveal exactly how the Perceptron works and how it can detect shifts in one of the most unpredictable markets – the crypto market.

You can go here to sign up to attend.


Chain of Thought
By Jeff Brown, Editor, The Bleeding Edge

The rumors had been swirling for months…

The brain drain at OpenAI would stifle the company’s ability to innovate.

But this was flat-out wrong.

It was just the naysayers, detractors, and “safety” pontificators stating as “fact” what they hoped to be true rather than the actual truth.

Yesterday, we explored the latest developments at OpenAI and its most recent breakthrough – named “o1.”

But we only just scratched the surface of o1.

The Leap in o1

In today’s Bleeding Edge, we’ll dig a bit deeper…

We want to better understand why there was such a large performance increase over GPT-4o. And we’ll try to better understand what this all means, as we look to the near future.

We’ll start with a chart we reviewed in yesterday’s Bleeding Edge – Smarter Than Human PhDs.

The panel on the far right of the chart shows that both o1 and o1-preview (orange and pink) significantly outperformed expert humans (lime green) in the GPQA analysis.

Source: OpenAI

As a reminder, GPQA is a fairly new benchmark for AI. It stands for the “Graduate-Level Google-Proof Questions and Answers” benchmark.

The test is a dataset of complex questions in biology, physics, and chemistry that require domain expertise to answer correctly and are hard to answer even with the help of a search engine like Google.

Highly skilled human non-experts are only able to achieve a score of 34% even with the use of Google. GPT-4 was only able to achieve 39% accuracy, and GPT-4o demonstrated just 56%.

In contrast, o1 and o1-preview were able to achieve a score of 78% compared to the expert human level at just 69.7%.

And on the far left, we can see a radical improvement in competition math scores, specifically in the AIME (American Invitational Mathematics Examination), which is an extremely hard 15-question, three-hour test given to only the top high school mathematics students in the country.

GPT-4o failed miserably at the AIME test.

But how about o1?

o1 scored a remarkable 12.5 out of 15 questions (83%).

The results were just as striking in the middle chart, shown above, for a competition software coding challenge.

GPT-4o struggled, only achieving a score in the 11th percentile.

But o1 was impressive… achieving a score in the 89th percentile.

Complex math, challenging software programming, and Ph.D.-level science questions are all tasks that require advanced reasoning skills, something large language models (LLMs) have historically struggled with.

Until now.

Chain of Thought

So what happened at OpenAI? The company is clearly not in trouble. And it just delivered a major breakthrough on the road to artificial general intelligence (AGI).

OpenAI used a classic machine learning technique known as reinforcement learning to augment its use of neural network technology.

Reinforcement learning is similar to a human trial-and-error process: the software program (the AI) is rewarded when its actions lead toward an optimal outcome.

Through this process, the software “learns” which paths or actions help achieve a better outcome, and those are “remembered” and reinforced, hence “reinforcement learning.”

OpenAI highlighted how its reinforcement learning algorithm uses a chain-of-thought approach to learning. Basically, this process learns from mistakes (i.e. actions that result in suboptimal outcomes) and refines its problem-solving strategies.
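To make that concrete, here is a minimal sketch of reinforcement learning – not OpenAI’s actual training setup, just a toy epsilon-greedy “bandit” with made-up reward numbers. The agent tries actions, gets noisy rewards, and reinforces the actions that pay off:

```python
import random

def train_bandit(true_rewards, episodes=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy reinforcement learning on a multi-armed bandit.

    Actions that yield higher rewards get 'reinforced': their estimated
    values rise, so the agent chooses them more and more often.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rewards)  # learned value of each action
    counts = [0] * len(true_rewards)

    for _ in range(episodes):
        if rng.random() < epsilon:                     # explore a random action
            action = rng.randrange(len(true_rewards))
        else:                                          # exploit the best-known action
            action = max(range(len(true_rewards)), key=lambda a: estimates[a])
        reward = true_rewards[action] + rng.gauss(0, 0.1)  # noisy feedback
        counts[action] += 1
        # incremental average: nudge the estimate toward the observed reward
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates

est = train_bandit([0.2, 0.5, 0.9])
print(max(range(3), key=lambda a: est[a]))  # the agent learns action 2 pays best
```

The “remembering” in the newsletter’s description corresponds to the running value estimates: suboptimal actions keep low estimates and get chosen less, which is the trial-and-error loop in miniature.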

Also, shown in the charts below, OpenAI noted that increasing the “train-time compute” and the “test-time compute” resulted in massive improvements in the AI’s performance.

Source: OpenAI

Said more simply, when OpenAI used more computational power for its reinforcement learning software, it resulted in a higher-performing model. And the same was true when running the model (i.e. inference).
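The test-time compute effect can be illustrated with a toy sketch – the solver, its 60% accuracy, and the sample counts here are all hypothetical, and this is not OpenAI’s method. The idea is simply that sampling a noisy solver several times per question and taking a majority vote converts extra compute into accuracy:

```python
import random
from collections import Counter

def noisy_solver(rng, correct=42, p_correct=0.6):
    """A toy model that answers a question correctly only 60% of the time."""
    if rng.random() < p_correct:
        return correct
    return rng.randrange(100)  # otherwise, a scattered wrong guess

def answer_with_compute(samples, trials=2000, seed=0):
    """Accuracy when we spend more test-time compute: sample the solver
    `samples` times per question and take a majority vote."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        votes = Counter(noisy_solver(rng) for _ in range(samples))
        if votes.most_common(1)[0][0] == 42:
            hits += 1
    return hits / trials

print(answer_with_compute(1))   # ~0.6 with a single sample per question
print(answer_with_compute(15))  # much higher with 15 samples per question
```

The correct answer keeps getting the same votes while the wrong guesses scatter, so the majority vote wins far more often than any single sample does – a simple picture of why “thinking longer” at inference time pays off.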

Regular Bleeding Edge readers know where I’m going with this…

The Key to Increased Performance

When OpenAI used more computational power to give o1 time to “think” (i.e. test time compute), the performance radically changed.

This is a topic that we’ve explored extensively in both Outer Limits and The Bleeding Edge. The more computational power that is thrown at training an AI, the more powerful and useful it becomes.

o1’s dramatically improved results, which were just made public, will only add fuel to the fire that’s already raging in the industry.

The charts above encapsulate why the massive investment will continue in the race to AGI. It is the concrete, demonstrable truth of AI research and development.

Continued improvements in the software algorithms, combined with larger and larger amounts of computational power, exponentially improve the performance of an artificial intelligence.

  • More racks
  • More servers
  • More GPUs
  • More CPUs
  • More fiber optic cabling
  • More cooling
  • More power
  • Bigger data centers

Simple.

And that’s precisely why Amazon (AMZN), Apple (AAPL), Meta (META), and Alphabet (GOOGL) spent $52.9 billion on artificial intelligence in just the second quarter of this year alone.

That’s $52.9 billion across just four companies, and that’s after the four spent about $44 billion in the first quarter.

OpenAI isn’t in trouble as a company. Its latest o1 improved in all benchmark metrics except for one, compared to GPT-4o.

Source: OpenAI

It’s worth noting that the only category without an improvement was AP English Language.

That’s by design: the primary focus of the new model was improving its reasoning capabilities, which has historically been a very challenging problem.

And to reinforce a point that I made in yesterday’s Bleeding Edge, these advancements weren’t made irresponsibly – the model improvements didn’t come at the cost of greater risk and danger.

Quite the opposite.

Inbuilt Safety

As OpenAI continues to make performance improvements in the model, it continues to make improvements in the overall safety of the model, as shown below.

And this just makes sense…

Why would the free market optimize over the long run for an artificial intelligence that can be harmful and dangerous? There’s not much of a market for a dangerous AI, and we’ve already seen the market rise against AI models that have attempted to rewrite history or hallucinate and produce inaccurate responses.

The far greater danger comes from factions that wish to intentionally train an AI on factually inaccurate information. Garbage in, garbage out, after all. And subjects like mathematics are not a matter of perspective; they are precise, with each equation having one correct answer.

When an AI is in control of a vehicle and making a left-hand turn through traffic, it is solving for a precise turning radius – and avoiding any obstacles – using mathematical equations.
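As a toy illustration of that kind of calculation – using the standard kinematic bicycle-model approximation, with a made-up wheelbase and steering angle:

```python
import math

def turning_radius(wheelbase_m, steer_angle_deg):
    """Kinematic bicycle-model approximation of a vehicle's turning
    radius for a given steering angle: R = L / tan(delta)."""
    return wheelbase_m / math.tan(math.radians(steer_angle_deg))

# A car with a 2.8 m wheelbase steering 20 degrees into a left turn:
print(round(turning_radius(2.8, 20), 2))  # ≈ 7.69 m
```

There is exactly one radius that satisfies the geometry – the kind of single-correct-answer math an AI driver must get right every time.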

This is what the market wants and will pay for. It won’t pay for danger, and it won’t pay for failure. AI companies are optimizing for success. They are optimizing for their products to have the maximum amount of utility. It’s the corporate version of reinforcement learning.

The detractors and decels worked hard to frame OpenAI’s leadership as irresponsible and “growth at all costs.” That framing was never provable – and it was simply wrong.

OpenAI and others can make improvements in both performance and safety without slowing down their innovation.

The more we work with this new technology, and develop and improve it, the better we will be able to both identify and defend against its misuse. And we won’t be giving free license to the bad actors to run away with the technology, unfettered.

Something incredible is happening right now. And we have a front-row seat.

The Cost of Synthetic Intelligence

An entirely new form of synthetic intelligence is being invented.

And while today’s LLMs tend to “think” for a period – measured in seconds – future versions will be able to “think” for days, weeks, and even months on end to determine the optimal way to solve complex problems.

That’s the significance of the chain of thought. Being able to iterate and problem-solve without being prompted is the key to self-directed research and development, something that an AGI will be capable of doing.

This latest OpenAI version o1 leaped ahead of the entire field of LLMs when given the Mensa Norway IQ test as shown below.

Its score of around 120 is not only the best of all the LLMs but also stands significantly above the average human IQ.

Source: Tracking AI

This is what OpenAI was able to achieve with roughly $10 billion of investment.

Just imagine the exponential improvement that we’ll see after $100 billion in investment. It’s coming fast for those of us who are watching.

So it was no surprise that a few days ago, the news came out that OpenAI is raising $6.5 billion at a $150 billion valuation.

Keep in mind that its last valuation was $29 billion from a round that closed in April 2023.

It’s smart money behind the deal, as well.

Venture capital firm Thrive Capital is leading the round with a $1 billion investment, private equity giant Tiger Global is joining the round, and Microsoft, NVIDIA, and even Apple are reportedly in talks to join the round, as well.

They all see the path forward. And to get there, there is no shortcut. It will require more, much more investment in computational power. Hundreds of billions more.

They clearly all know something that most people don’t. AGI is worth a whole lot more than $150 billion.


Want more stories like this one?

The Bleeding Edge is the only free newsletter that delivers daily insights and information from the high-tech world, along with topics and trends relevant to investing.