The release of Grok 3 from xAI on February 17 came as quite a shock to the entire industry.
None of the major players building frontier AI models had been expecting much at all from the upstart. After all, xAI had been at it for less than a year.
But xAI’s “impossible” feat of assembling and commissioning 200,000 NVIDIA GPUs in a mere 214 days, together with its unique software architecture (which we explored in yesterday’s Bleeding Edge, The Everything App), resulted in a state-of-the-art frontier model that outperformed other leading models on various performance metrics.
That left the rest of the industry scrambling. After all, what xAI released is its Grok 3 “beta” version.
Nearly six weeks have already passed, and Grok 3 is performing noticeably better. Who knows what xAI has running in the Colossus “lab” in Memphis, Tennessee, that we haven’t seen yet…
Last week, it appeared that Google stepped up to the challenge.
It released an “experimental” version of its latest agentic AI model – Gemini 2.5 Pro Experimental.
To generate more widespread interest, Google has made Gemini 2.5 Pro Experimental available for free to try out – it just applies rate limits (restricting free query usage) so that those who are paying for Gemini Advanced have a higher-performance product.
Anyone curious to try it out can go here to the Google AI Studio to experiment with this latest frontier AI reasoning model. It’s definitely impressive.
By its own measure, Gemini 2.5 Pro Experimental is now ahead of the industry in several metrics.
AI Benchmark Comparison | Source: Google
Above we can see that Gemini 2.5 Pro leads the industry in Humanity’s Last Exam (a fairly new benchmark to assess artificial general intelligence capabilities), Aider Polyglot (for software code editing), and MMMU (for visual reasoning).
Anthropic’s Claude 3.7 Sonnet is still ahead on the SWE-bench Verified benchmark for agentic coding, which Grok 3 hasn’t been tested on yet.
But Grok 3, when allowed to “think,” still retains the industry lead on the GPQA Diamond benchmark – the graduate-level, Google-proof biology, physics, and chemistry exam.
And the same is true for the AIME 2024 and AIME 2025 mathematics benchmarks. Grok 3 also dominates in the software code generation benchmark LiveCodeBench v5.
Despite Grok 3 being out since February 17, many of the benchmarks and tests being run and publicized continue to leave Grok 3 out of the analysis.
Hallucinations Benchmark | Source: Lech Mazur
Above is a hallucinations benchmark designed to measure how much a model hallucinates when answering prompts. (Note: Farther right on the scale is better.) Gemini 2.5 Pro Experimental came out on top. However, Grok 3 isn’t even considered in the analysis. I suspect it would come out on top.
Pretending that xAI and Grok aren’t a “thing” won’t slow down their progress one bit, though.
And the progress that the team at Google DeepMind has been making is still impressive. It only announced Gemini 2.0 as its first agentic AI model in December, and here we are with a major release in March.
That gives us an indication of how much faster this race is moving.
While some of this might seem esoteric, here are a few examples to demonstrate how powerful this technology has become:
The short video below is what Gemini created from the following prompt:
Demonstrate electricity and magnetism using a simple example like a solenoid. Create an animated Three.js scene to depict a charged copper coil and the associated magnetic field. Use three/tsl, sprites, bloom, etc… HTML file.
Source: @renderfiction
Worth noting is that to create this kind of visual, Gemini had to write software code to represent the physics of electricity and magnetism. This is generative AI, not a screen capture of something that exists somewhere else on the internet.
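For a sense of what that physics looks like in code, here is a minimal Python sketch. It is not the code Gemini generated. It only illustrates the textbook formula for the field inside an ideal, long solenoid: B = μ0 × (N / L) × I. The function name and the sample values (500 turns, 0.25 meters, 2 amps) are assumptions chosen for illustration.

```python
import math

# Vacuum permeability in tesla-meters per ampere (classical textbook value).
MU_0 = 4 * math.pi * 1e-7


def solenoid_field(turns: int, length_m: float, current_a: float) -> float:
    """Return the field magnitude (in tesla) inside an ideal, long solenoid.

    Uses B = mu_0 * (N / L) * I, which ignores edge effects and assumes
    the coil is much longer than it is wide.
    """
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    turns_per_meter = turns / length_m
    return MU_0 * turns_per_meter * current_a


if __name__ == "__main__":
    # Hypothetical coil: 500 turns over 0.25 m, carrying 2 A.
    field_tesla = solenoid_field(turns=500, length_m=0.25, current_a=2.0)
    print(f"{field_tesla * 1e3:.3f} mT")
```

An animated scene like the one in the video would layer rendering, field-line geometry, and visual effects on top of relationships like this one.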
Another interesting example is that of a user who uploaded a hand-drawn image to Gemini, asked Gemini to create a 3D representation of it, then prompted Gemini to write software code for computer-aided design to accurately represent the object – optimized for 3D printing – and then sent it to the 3D printer to bring it to life.
Source: X @xf1280
Start to finish, this took a few minutes plus 3D printing time for the object. No technical skills required whatsoever.
Or how about this: The team at Google literally built its Google AI Studio functional website from a sketch. Here’s what the original website sketch looked like:
Source: Google DeepMind, @anibaddepudi
And here is what Gemini 2.5 Pro came up with:
Source: Google DeepMind, @anibaddepudi
Absolutely ridiculous. And all done in a matter of minutes.
These are simple examples, but useful to quickly understand the radical productivity boosts that are possible using an AI reasoning model and agentic AI.
Not only does it allow workers to get things done in a small fraction of the time, but it also allows normal users – without any technical skills at all – to do things that would have otherwise been impossible for them.
Obviously, this kind of generative AI requires a lot of computational horsepower and electricity. Just imagine what it will be like when more than a billion people are leveraging this kind of technology every day…
But looking at all this, one has to wonder…
How is it that Google can offer this kind of valuable technology for free, albeit with performance and usage limitations?
As much as Alphabet (GOOGL) would like for us to think that it is being magnanimous – willing to eat the cost and use up a portion of its $95+ billion in cash for our benefit – that’s not the case.
All we need to do to understand how Google pays for its costs of computation is to remember that Google is, first and foremost, an advertising company.
Data Collected by Popular Chatbots | Source: Surfshark
Roughly 85% of Google’s total revenues are derived from data surveillance and collection… and selling access to that data to advertisers.
Shown above are the kinds of data collected by popular AI models. It’s no surprise that Google’s Gemini is at the very top, collecting 22 out of a possible 35 data types.
And Grok is near the bottom… only collecting data from three categories: contact info, identifiers, and diagnostics. Identifiers are things like IP addresses, the kind of device we’re using, user IDs, session data from websites, and even consented personal data (most don’t know that they have “consented”).
China-based DeepSeek, as we know, collects user content and usage data from the phone… and sends that data back to China.
It’s important to be well-informed on data collection, business models, and privacy issues when using this technology. Ultimately, it comes down to each individual’s comfort level and sensitivity to privacy matters.
At this stage, we’ve crossed the Rubicon. There’s no turning back. We can expect at least one major announcement concerning a frontier AI reasoning model and agentic AI every couple of weeks.
I know it might feel like drinking from a firehose right now, but we’re going to do our best to help keep Bleeding Edge readers up to speed. And I encourage everyone to experiment with these models. It’s one thing to read about it, but it’s another thing to experiment and experience just how powerful this technology really is.
Have fun, stay ahead, and lean in…
Jeff
P.S. As I wrote in yesterday’s Bleeding Edge, I’m predicting that Elon Musk’s xAI will become the first company in the world to develop an artificial general intelligence (AGI), and it will happen within 12 months.
Right now, xAI is still a private company… which means it’s almost impossible for everyday investors to claim a stake. But I’ve uncovered a way to get in on the biggest AI project of the century – xAI’s Project Colossus.
If you want to learn more about what I discovered outside of Memphis, Tennessee, at the xAI Colossus lab, you can go here for more info.
I get into what’s going on onsite… xAI’s position as the leader in the race to artificial general intelligence… and how you can become a “partner” in Elon Musk’s Project Colossus.
Just go here for more details.
The Bleeding Edge is the only free newsletter that delivers daily insights and information from the high-tech world as well as topics and trends relevant to investments.