Having long been a grand challenge of computer science, the Turing test waits patiently for a machine intelligence to pass its clever game…

Created by Alan Turing in 1950 – decades before modern computers reached the public – the test was designed to determine whether a machine – an AI – could be indistinguishable from a human.

The construct of the test is simple. A human evaluator holds conversations with two unseen parties, one being a machine – an AI – and the other being a human. The conversations take place only by text.

If the evaluator can’t consistently tell the difference between a human and an AI, then the Turing test has been passed.

It seems straightforward, and we’d probably think that it should have been passed already.

But no AI has yet claimed the prize.

Or perhaps one has…

Time to Pass Turing’s Test

Every couple of years, there tends to be talk that the Turing test has been passed.

But each time, it soon turned out that it hadn’t been.

The last time this happened was in June 2022, when an engineer at Google claimed not only that the company’s AI had passed the test, but that it was conscious and sentient.

It was an exciting moment because large language models (LLMs) were advancing quickly. OpenAI already had GPT-3, which was far from perfect but showed great potential. And the much more secretive Google had been working on its LaMDA LLM.

It was a moment when many in the industry thought it just might be possible, which is why there was such excitement. And with so little information coming out of Google, there was that much more room for speculation.

After a few weeks, though, the excitement passed. The consensus became that the AI was certainly intelligent and capable, but not humanlike enough to pass the Turing test. And definitely not sentient.

But given the developments of the last 12 months and the recent release of OpenAI’s GPT-4o multi-modal large language model, I’ve been a bit surprised that the topic of the Turing test hasn’t resurfaced.

It feels like it’s time…

Which is why I was so excited to review a recent paper by two scientists from the University of California San Diego.

Can You ID This AI?

The title sums it up perfectly: “People cannot distinguish GPT-4 from a human in a Turing test.”

The paper’s Turing test was structured around a human interrogator, who would converse with one of four possible “witnesses.”

There were 500 participants in the trial. Some served as human witnesses, whose job was to convince their interrogators that they were human. Each interrogator was randomly assigned to converse with one of the four witness types: a human, GPT-4, GPT-3.5, or ELIZA (the last three being AIs).

The user interface for the Turing test was like a familiar messaging application. Excerpts of some of the conversations are shown below…


Well? How did you do?

I’ll give you a hint: Only one of the above chats is human. Can you guess which? (Answers at the end for readers inclined to participate.)

The results of the actual trial were pretty incredible.

GPT-4 was judged to be human by human interrogators 54% of the time… GPT-3.5 was judged to be human 50% of the time… And the much less sophisticated ELIZA model – which was intended only as a baseline – was judged human only 22% of the time.

Think about that. The majority of the human interrogators thought that they were talking to a human… when they were actually speaking with an AI – GPT-4.
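A quick back-of-the-envelope check helps put that 54% in context. The sketch below is my own, not from the paper, and it assumes roughly 100 GPT-4 games (the article doesn’t give the per-condition count). It asks: if interrogators were purely guessing, how surprising would 54 “human” verdicts out of 100 be?

```python
from math import comb

def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of every
    outcome at least as unlikely as observing k successes in n trials."""
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = probs[k]
    return sum(q for q in probs if q <= observed * (1 + 1e-9))

# Assumption: ~100 GPT-4 games, of which 54 were judged "human".
# Under pure guessing, each verdict is a fair coin flip (p = 0.5).
p_value = binom_two_sided_p(54, 100)
print(f"p-value vs. 50% guessing: {p_value:.3f}")
```

The resulting p-value is well above the conventional 0.05 threshold: under these assumptions, 54 out of 100 is statistically indistinguishable from coin-flipping. That’s precisely what makes the result striking – interrogators could not reliably do better than chance at spotting the AI.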

Is that it? Are we there?

Not so fast…

The Big Trophy

The nuance of the test was that the conversations were limited to just five minutes. That’s not a lot of time to interrogate the witness and form an opinion about whether it is human or machine.

With that said, it was still a useful exercise. And one that definitely demonstrates a significant truth: We’re on the cusp of radical change.

And arguably the most interesting data that came out of the research regarded humans conversing with other humans. 

In the trial, human interrogators recognized that they were communicating with a human only 67% of the time.

Put another way: 33% of the time, interrogators judged a real human to be an AI.

They couldn’t tell a human was a human. Sounds crazy, I know.

I believe that this is heavily influenced by the general awareness that AI technology has advanced so much that humanlike conversation is expected from the leading LLMs.

The reality is that, for most of us, it would be hard to tell the difference when conversing with GPT-4, or any of the newer models available, in this kind of randomized, controlled trial.

So naturally, the real question is – what about OpenAI’s GPT-4o, Google’s Gemini 1.5 Pro, Anthropic’s Claude 3.5 Sonnet, or xAI’s Grok 1.5? 

I’d bet the results would be even more impressive. The technology is probably good enough to hold up over longer conversations as well. These models could arguably pass the Turing test.

Sure, skilled interrogators and computer scientists – given enough time – would likely be able to tell the difference. But for most people, it would be too difficult.

So why hasn’t it been done yet?

To put it bluntly: It’s not the prize the industry is working towards. 

The Turing test is just a game.

The industry, meanwhile, is racing towards lifelike artificial general intelligence (AGI). That’s the trophy that all the big players want to hoist.

Ridiculously Humanlike Speech

To that end, OpenAI has started to quietly release an alpha version of its advanced voice mode.

OpenAI is clearly still in testing mode. But the direction is unmistakable. And Advanced Voice Mode is expected to roll out to all users later this year.

The new improvements will result in more natural conversations with human emotion and tone. And the ability to turn on our camera and share our surroundings with the AI tells us that the model is multi-modal, capable of “seeing” and understanding the real world.

For anyone who would like to hear how incredible the natural language of the AI sounds today, with emotion and tone, just click here to hear a one-minute clip of the AI telling a story.

Better yet, the AI inserts – in real-time – sound effects to bring the story to life.

It’s nearly impossible to tell machine versus human already. We don’t need a Turing test to tell us that.

And based on what’s happening right now, the Turing test won’t need to be limited to a chat window.

Before the end of the year, it will be possible to run the test using speech rather than text. Why not have the interrogator and the witness speak over the phone instead?

Emotion, tone, and speech cadence are what make us human. And rather than chasing a test held in a chat window, the industry is manifesting AI in a way that feels natural and comfortable to us humans.

So natural, in fact, that we won’t be able to tell the difference.


Results:

A: AI (GPT-4)
B: Human
C: AI (GPT-3.5)
D: AI (ELIZA)