AI Is Ready for “Computer Use”

Jeff Brown | Oct 23, 2024 | Bleeding Edge | 6 min read


Not a month goes by without at least one major product release of a foundational artificial intelligence (AI) model.

It’s hard to believe that advancements are happening this quickly. Not only is the pace incredible, but the material improvements in each successive release are impressive.

This continues to drive record levels of investment. If each new release were only 1% or 2% better than the previous one, progress would be slow and investment would be limited.

The reality is the opposite. We’re experiencing large leaps in progress toward the ultimate goal of artificial general intelligence (AGI).

These Are Not Incremental Improvements

Last month, the big release came from OpenAI with the OpenAI o1-preview.

The o1-preview is an early release of OpenAI’s new large language model (LLM). It is designed for vast improvements in reasoning, resulting in big jumps in benchmark scores in math, software coding, and PhD-level science. We explored this release in The Bleeding Edge – Smarter Than Human PhDs.

The month before, in August, Elon Musk and his team at xAI released Grok-2. It’s also a frontier LLM designed for strong reasoning capabilities. The performance of this release was no less impressive – even more so considering that xAI is the newest entrant to the race of building frontier models.

And yesterday, Anthropic announced the release of its “upgraded” Claude 3.5 Sonnet model and a new model, Claude 3.5 Haiku. To some, this announcement might appear to be an incremental improvement over previous models. But it’s not.

Anthropic Claude Models’ Benchmark Performance | Source: Anthropic

As we can see above, Claude 3.5 Sonnet’s benchmark performance improvements are impressive across the board. And its score for “Code” (software programming) – shown on the third row – is best-in-class at 93.7%. In this category, Anthropic continues to lead the industry.

Anyone who thinks AI isn’t really being used for anything significant yet may be surprised to learn that about 97% of all software programmers in the U.S. are using generative AI tools to assist with programming.

The release of Claude 3.5 Haiku is also significant because it has achieved the performance of Anthropic’s prior largest frontier model, Claude 3 Opus, at the same low cost of inference as the prior Haiku model. This is relevant for mass market adoption. It’s ideal to have a highly performant model capable of being run at low cost.

But neither of these product announcements was the biggest news.

An AI With Agency

Those with a keen eye might have noticed the last two rows in the chart above: “Agentic coding” and “Agentic tool use.”

The topic of agentic AI is something I’ve been writing about as one of the biggest trends for this year. We explored this topic in July in The Bleeding Edge – The Agentic AI Undercurrent and most recently again this Monday.

Agentic AI refers to an AI having agency. It is given the authority or directive to solve a problem or complete a task through a series of steps. An agentic workflow is quite different from that of an LLM like ChatGPT. It is an iterative process, where an agentic AI uses a more humanlike workflow to accomplish a task.
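To make the idea of an iterative workflow concrete, here is a minimal sketch in Python. It is a toy illustration, not Anthropic's implementation. The tool functions, the field names, and the rule-based `choose_next_tool` are all hypothetical stand-ins; in a real agent, the model itself would decide which step to take next.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, str]
Tool = Callable[[State], State]


# Hypothetical tools. Each one adds a single piece of information to the state.
def lookup_name(state: State) -> State:
    return {**state, "name": "Acme Corp"}


def lookup_email(state: State) -> State:
    return {**state, "email": "ap@acme.example"}


TOOLS: Dict[str, Tool] = {"name": lookup_name, "email": lookup_email}


def choose_next_tool(state: State, required: List[str]) -> Optional[str]:
    """Stand-in for the model's decision: pick the first field still missing."""
    for field_name in required:
        if field_name not in state:
            return field_name
    return None  # nothing missing, so the task is complete


def run_agent(required: List[str], max_steps: int = 10) -> Tuple[State, List[str]]:
    """Loop: check what is missing, act, observe the new state, repeat."""
    state: State = {}
    trace: List[str] = []
    for _ in range(max_steps):
        tool_name = choose_next_tool(state, required)
        if tool_name is None:
            break
        state = TOOLS[tool_name](state)  # act, then observe the updated state
        trace.append(tool_name)
    return state, trace


final_state, trace = run_agent(["name", "email"])
print(trace)
print(final_state)
```

The point of the sketch is the loop. A single prompt to a chatbot produces one answer, while an agent keeps checking its progress and choosing the next step until the task is done or it hits a step limit.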

The breakthrough that Anthropic announced is a public beta release of its new agentic AI capabilities with Claude 3.5 Sonnet. Anthropic refers to this new functionality as “computer use.”

The name might sound mundane, but the capability is not. It enables us humans to direct our AI to interact with computers in the same way that we do.

Claude 3.5 Sonnet can now “look” at a computer screen, move a cursor, click on buttons, type text, and fill out fields to transact on the internet.
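The look, move, click, and type cycle can be sketched with a simulated screen. This is only an illustration under loud assumptions: `FakeScreen`, its methods, and `fill_form` are invented for this example and are not part of Anthropic's API. A real system would work from actual screenshots and send real mouse and keyboard events.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FakeScreen:
    """A hypothetical stand-in for a real screen: named input boxes at fixed positions."""
    layout: Dict[str, Tuple[int, int]]  # field name -> (x, y) of its input box
    values: Dict[str, str] = field(default_factory=dict)
    cursor: Tuple[int, int] = (0, 0)
    focused: Optional[str] = None

    def screenshot(self) -> Dict[str, str]:
        # "Look" at the screen: report the current text in every field.
        return {name: self.values.get(name, "") for name in self.layout}

    def move(self, x: int, y: int) -> None:
        self.cursor = (x, y)

    def click(self) -> None:
        # Focus whichever field sits exactly under the cursor, if any.
        self.focused = next(
            (name for name, pos in self.layout.items() if pos == self.cursor), None
        )

    def type_text(self, text: str) -> None:
        if self.focused is not None:
            self.values[self.focused] = self.values.get(self.focused, "") + text


def fill_form(screen: FakeScreen, data: Dict[str, str], max_steps: int = 20) -> int:
    """Repeat look -> move -> click -> type until no known field is empty."""
    steps = 0
    while steps < max_steps:
        seen = screen.screenshot()
        empty = [n for n, v in seen.items() if v == "" and data.get(n)]
        if not empty:
            break
        target = empty[0]
        screen.move(*screen.layout[target])
        screen.click()
        screen.type_text(data[target])
        steps += 1
    return steps


screen = FakeScreen(layout={"company": (10, 20), "email": (10, 60)})
steps_taken = fill_form(screen, {"company": "Acme Corp", "email": "ap@acme.example"})
print(steps_taken, screen.values)
```

Each pass through the loop starts with a fresh look at the screen, so the agent acts on what is actually there rather than on what it assumes it typed earlier.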

I hope that this “computer use” capability brings all sorts of useful applications to mind. Just imagine how often we have to type in the same information over and over again with different vendors or retailers, a process that should be automated.

Anthropic provided an enterprise example in which someone needs to fill out a vendor form before doing business with that vendor. The user simply prompted Claude, shown below, with instructions on what actions to take.

After that, Claude had to access the company database and customer relationship management (CRM) software to collect the data needed to complete the task.

In the short clip below, all movement of the cursor and the typing in fields is automated by Claude 3.5 Sonnet:

An agentic AI has the ability to evolve from being just software code. It becomes empowered to interact with the real world – in this case, to do just about anything that we can do on the internet.

Referring back to the earlier benchmark chart, the new Claude 3.5 Sonnet scored 69.2% on web-based retail applications and 46% on airline applications.

As a reminder, these scores are just for Claude’s “computer use” functionality in beta mode. It is an early release to allow developers to poke and prod and provide feedback to Anthropic.

One thing I’m certain of: the performance numbers will be significantly higher when Anthropic makes this “computer use” capability generally available with its frontier models.

This is a huge deal.

The Economic Incentive for Mass Adoption

If Claude can search and sift through retail websites and purchase goods and services on our behalf, the convenience will be incredible.

With a simple prompt, written or spoken, we’ll be able to tell our AI agent what we need it to find and purchase, and it can work as long as it needs to in order to complete the task. That will potentially save us hours of effort and thousands of dollars.

The same goes for airlines and travel. An agentic AI will have the skills of the very best travel agent, and it will know our detailed preferences. I gave an example of this in a previous issue of The Bleeding Edge:

  • Find the best travel schedule for me to be in Tokyo for meetings on Monday and Tuesday, Seoul for meetings on Wednesday afternoon, Taipei for meetings first thing on Friday morning, and to return to New York in time for a birthday party Saturday morning. Travel should be in business class, and seat preference is window seats, meal preference is chicken or fish for flights. Book travel once schedule is finalized.

This is a powerful and exciting development. And it’s just one of an infinite number of optimizations we will be empowered to make in our lives.

These kinds of capabilities will shave off hours of mundane and repetitive processes and workflow every day. And it will free up our personal time for things that we would much rather be doing.

This is game-changing.

It’s no surprise that companies like Asana (ASAN) and DoorDash (DASH) have already started experimenting with the technology. Individuals will use it to save time and frustration at work and home. And corporations will use it to improve productivity and integrate it into customer-facing software applications.

After all, when we remove friction in business transactions, business improves. There is a clear economic incentive to adopt this technology.

And when we consider this technology in light of the world of Web3 – enabled by blockchain technology – the applications become even more interesting…

Unlike the internet technology that we use today, blockchain networks have economics built directly into their protocols.

Native tokens for each blockchain will make it even easier for an agentic AI to transact autonomously on the internet. And the AI becomes the solution for abstracting away the complexity of transacting with decentralized networks.

Anthropic’s “computer use” may just be in beta mode, but it won’t take years before this technology is widely available. This will happen in a matter of months. And others will follow quickly.

This is the last critical step toward us all having our own personalized agentic AI assistants.

Regards,

Jeff

