General-purpose robotics has long been elusive in the industry.
In many ways, it has been the grand challenge of robotics.
Robotic systems have historically been trained to perform very specific tasks. The process of doing so is time-consuming and doesn’t scale well. It simply takes too long to individually program each desired task. This is a key reason why the adoption of robotic systems has been relatively slow – a linear, rather than exponential, rate of adoption.
For example, the adoption of robotics in factories around the world has been on a steady, consistent trend over the last decade.
Source: International Federation of Robotics
On average, the adoption of robotics has increased around 10% every year. There are now more than 4 million industrial robots in use, which actually isn’t much at all. These complex robotic systems require programming and careful calibration to perform specific tasks.
There’s one exception, however.
Elon Musk and his team at Tesla broke the mold and did something that most experts said was impossible.
They designed a general-purpose robotics system capable of learning from – and operating entirely with – vision.
Tesla and Musk did something that was completely the opposite of what the rest of the industry had been doing.
It’s easy to “see it” now that we’ve had the benefit of time. But years ago, nearly everyone found it impossible to believe.
Because this “vision-based” system started with a car.
I’ve been asserting since early 2018 that Tesla’s EVs were merely robots on wheels – and that Tesla was the most important artificial intelligence company in the world – even back then.
Tesla developed a neural network – its full self-driving (FSD) AI – which is today trained on billions of miles of real-world driving data.
The result is something that most of the media and even academic institutions don’t speak about openly.
Tesla’s FSD AI is capable of general-purpose navigation and driving of cars, trucks, and now even humanoid robots (Optimus), much to the chagrin of Musk’s critics.
What enabled the rapid training of Tesla’s neural network?
It was the collection of real-world data from the eight exterior cameras inconspicuously installed on each of Tesla’s electric vehicles (i.e., the cars’ “eyes”). It’s worth noting that each camera can see as far as 250 meters, and together they give each car 360 degrees of visibility… something impossible for us humans.
About 9 billion miles were collected from Teslas driving in Autopilot mode – a simpler mode of autonomy typically used on highways.
That data became the foundation for developing Tesla’s full self-driving software.
It’s also worth noting that in his “Master Plan, Part Deux” from July 2016, Musk wrote that Tesla’s fleet was collecting over 3 million miles of real-world driving data per day. He suggested that worldwide regulatory approval for full, Level 5 autonomy might require around 6 billion miles (10 billion kilometers).
By the end of December 2024, however, Tesla had collected almost 3 billion miles on FSD.
By my own calculations, I am confident that Tesla has well exceeded 4 billion miles by now… and will reach 6 billion full self-driving miles no later than May this year.
Source: Tesla
As we can see from the chart above, Tesla’s data collection isn’t linear. It is exponential growth, and it’s the secret to why Tesla’s FSD software has been improving at such a rapid rate.
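For readers who want to see how that compounding plays out, here’s a minimal back-of-the-envelope sketch. The starting point – roughly 3 billion cumulative FSD miles at the end of December 2024 – comes from above; the monthly mileage and its growth rate are placeholder assumptions of mine, used only to illustrate why exponential accumulation closes the gap to 6 billion miles so much faster than a linear trend would.

```python
# Back-of-the-envelope extrapolation of Tesla's cumulative FSD miles.
# The ~3 billion starting point (end of December 2024) comes from the article;
# the monthly mileage and growth rate below are placeholder assumptions of mine,
# shown only to illustrate how exponential accumulation behaves.

def months_to_reach(target_billions: float,
                    start_billions: float = 3.0,         # cumulative FSD miles, Dec 2024
                    monthly_add_billions: float = 0.45,  # assumed miles added in month 1
                    monthly_growth: float = 1.15) -> int: # assumed month-over-month growth
    """Count the months until cumulative miles cross `target_billions`."""
    cumulative, added, months = start_billions, monthly_add_billions, 0
    while cumulative < target_billions:
        cumulative += added
        added *= monthly_growth   # compounding: each month adds more than the last
        months += 1
    return months

print(months_to_reach(4.0))  # months after Dec 2024 to pass 4 billion miles (3 here)
print(months_to_reach(6.0))  # months after Dec 2024 to pass 6 billion miles (5 here)
```

Under these illustrative inputs the 4-billion mark falls in early spring and the 6-billion mark about five months after December 2024 – the point being the shape of the curve, not the precise month.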
The entire industry knows that this can be done. Tesla has proven it.
And now, the rest of the industry is rapidly adopting similar approaches.
Just last week, Alphabet’s Google DeepMind division announced Gemini Robotics – a new model built on its Gemini 2.0 large language model (LLM) and designed for general-purpose robotics applications.
Gemini 2.0 handles multimodal inputs, including text, audio, images, and video. In the past, Google DeepMind has used the Gemini models to produce text, images, or short video outputs.
Now, the team at DeepMind has constructed a new vision-language-action (VLA) model that is built upon Gemini 2.0… which is capable of what DeepMind calls “embodied reasoning,” something that I have long referred to as manifested AI.
The purpose of this new model is to enable general-purpose robotics. It is designed to run on and control things like robotic arms, grippers, or humanoid robots.
And because it was built on a multimodal LLM, it can hold natural language conversations and respond to verbal prompts.
Shown below is an example where the robot is given a verbal prompt to “pick up the basketball and slam dunk it.”
Source: Google DeepMind
While the above video may not appear to be that remarkable, what is remarkable is that the robot had never before practiced or been trained to dunk a ball.
The Gemini Robotics software just “knew” – from its prior related training – what a basketball and a basketball net looked like, and it inferred the correct action that it needed to take.
The Gemini Robotics AI – and its real-time reasoning capabilities – is well demonstrated in the short video below. The AI was given a verbal prompt to “put the grapes in the clear container.”
Source: Google DeepMind
As we can see in the video above, the human is moving around the clear container, trying to make it difficult for the AI to complete its task. But the robotic arm, which is running the Gemini Robotics AI, has no problem tracking the clear container and completing the task.
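A simple way to picture what’s happening under the hood: a vision-language-action model runs in a closed loop, re-observing the scene before every small action it takes. The sketch below is purely illustrative – the model call and the robot interface are hypothetical stand-ins, not DeepMind’s actual API – but it shows why a system built this way can keep tracking a container even as a human moves it.

```python
# Purely illustrative sketch of a vision-language-action (VLA) control loop.
# `query_vla_model` and `Robot` are hypothetical stand-ins -- this is NOT
# DeepMind's API -- but the shape of the loop is the key idea: on every tick
# the model sees fresh camera frames plus the standing instruction and emits
# the next low-level action, which is why it can track a moving target.

from dataclasses import dataclass
from typing import List

@dataclass
class Action:
    joint_deltas: List[float]   # small joint-angle adjustments for this tick
    gripper_closed: bool        # whether to close the gripper
    done: bool                  # model signals the task is complete

def query_vla_model(images: List[bytes], instruction: str) -> Action:
    """Hypothetical stand-in for a VLA model call (vision + language in, action out)."""
    # A real system would run the multimodal model here; we just return a no-op.
    return Action(joint_deltas=[0.0] * 7, gripper_closed=False, done=True)

class Robot:
    """Hypothetical stand-in for a robot-arm interface."""
    def capture_images(self) -> List[bytes]:
        return [b""]             # camera frames would go here
    def apply(self, action: Action) -> None:
        pass                     # joint commands would be sent to the hardware here

def run_task(robot: Robot, instruction: str, max_steps: int = 200) -> None:
    # Closed loop: re-observe the scene before every action, so if a human
    # moves the target (as in the grapes demo), the next action adjusts.
    for _ in range(max_steps):
        action = query_vla_model(robot.capture_images(), instruction)
        robot.apply(action)
        if action.done:
            break

run_task(Robot(), "put the grapes in the clear container")
```

The contrast with traditional robotics is that none of the task logic is hand-programmed – the instruction and the camera frames are all the loop needs.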
And it has no problem at all with fine motor skills. Just look at the two robotic arms working in concert to fold paper to make an origami fox.
Source: Google DeepMind
How are these robots getting so good, so quickly?
It comes down to how they were trained.
The key enabler of this kind of general-purpose robotics was a large and varied training data set – both real-world and synthetic images and video.
This is what enables the Gemini Robotics AI – and others – to understand the task given and how to achieve it.
And knowing that, it can practice each task until it masters it.
Shown below is a series of images of Apptronik’s humanoid robot running Gemini 2.0 packing a lunch bag for school.
Source: Google DeepMind
Google DeepMind chose to partner with Apptronik to build the next generation of humanoid robots, powered by Gemini 2.0 of course. Apptronik announced the partnership in February, and I wrote about it and its implications in The Bleeding Edge – A Scarecrow No Longer.
The implications of these latest developments should be obvious. With general-purpose artificial intelligence (AI) capable of “thinking” and reasoning, the mastery of skills – any skill – will happen at an exponential pace.
Alphabet (GOOGL) and the rest of the industry have had to adapt quickly after seeing the incredible progress Tesla has made with vision-based systems – and with applying them to its Optimus humanoid robot.
We’re about to witness a breathtaking increase in both the manufacturing and adoption of general-purpose robots.
We’re at the inflection point of exponential growth.
Ever wonder what it looks like to ship a humanoid robot? How about 16…
Factories and logistics companies around the world will soon be receiving deliveries, just like the one shown below.
Figure 02 Robots Headed Out for Deployment | Source: Figure AI
Jeff
The Bleeding Edge is the only free newsletter that delivers daily insights and information from the high-tech world as well as topics and trends relevant to investments.