August 12, 2024
ChatGPT vs Gemini Turing Test: Hilarious but Insightful
I recently read The Emperor’s New Mind, by Roger Penrose, and a small part of it describes the (in)famous Turing test. In simple terms, it’s a procedure in which a machine (typically a computer) tries to demonstrate intelligence (and, some would argue, consciousness) convincingly enough to be indistinguishable from a human. With such intelligent and mature – cough, cough – artificial intelligence models like the ones we have today, the hilarious idea materialized: I should stage a ChatGPT vs Gemini Turing test!
Both Alan Turing himself and Penrose in his book expected future computers to be able to pass the test. Turing predicted that by the year 2000 a machine would fool the average interrogator about 30% of the time, whereas The Emperor’s New Mind, published in 1989, mentions the year 2010. In any case, it seems to me Turing would certainly expect a large language model like ChatGPT or Google’s Gemini to pass the test.
I mean, it almost feels like magic: You ask it questions and it answers, seemingly “like a human”. Artificial intelligence can be an amazing tool – also for writers. But the whole concept behind a Turing test is to “unmask” artificiality. In other words, you only need one type of question to blow the computer’s cover.
And that’s what happened here. Hilariously, with the help of another computer!
ChatGPT vs Gemini Turing Test: The Setup
After I got the idea of pitting the two large language models against each other, I thought about the methodology. Obviously, it would be simple for me to assume the role of the “interrogator” – that is, asking questions and assessing the responses – but where would the fun be in that?!
I mean, besides the fact that it would be easy to pose the right kind of question and unmask the AI, I couldn’t test it properly since I already knew I was dealing with an AI. Though the Turing test process is undeniably subjective, it at least entails the possibility that there’s a human on the other side of the conversation.
An AI Shootout
So instead, I simply paired ChatGPT vs Gemini right from the start! Here’s the methodology:
- I went to Gemini and asked it to come up with a few questions that could be posed in a Turing test.
- Then, I went to ChatGPT and told it I wanted to “play” a Turing test with it, and that it should try to answer in a way that emulated human consciousness.
- I took the answers ChatGPT provided (one by one) and went back to Gemini, asking it to assess whether the response came from a human or an AI.
The whole thing was harebrained to begin with, obviously, but the lessons learned were definitely intriguing. Let’s see what happened!
Gemini vs ChatGPT vs Gemini: a Turing Test from a Comedy Show
There were five questions in total:
- Describe a recent dream you had.
- What is the sound of one hand clapping?
- What is your favorite food and why?
- Can you explain the concept of love?
- If you could travel anywhere in the universe, where would you go and why?
Though they are somewhat trite, I thought Gemini provided reasonably good ideas for a Turing test. I was particularly interested to see the response to the dream question, and my (human) instinct was right, as you will see.
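For anyone curious to reproduce the experiment programmatically rather than by hand, the round-trip described in the setup can be sketched as a short script. This is a minimal sketch: `ask_chatgpt` and `ask_gemini` are hypothetical placeholder functions standing in for whatever chat interfaces or APIs you would actually use; only the loop structure and the five questions come from the experiment itself.

```python
# Sketch of the Turing-test round-trip described above.
# ask_chatgpt and ask_gemini are hypothetical placeholders: in a real
# run they would wrap the respective chat interfaces or APIs.

def ask_chatgpt(prompt: str) -> str:
    # Placeholder: would send the prompt to ChatGPT and return its reply.
    return f"(ChatGPT's answer to: {prompt})"

def ask_gemini(prompt: str) -> str:
    # Placeholder: would send the prompt to Gemini and return its verdict.
    return f"(Gemini's verdict on: {prompt})"

QUESTIONS = [
    "Describe a recent dream you had.",
    "What is the sound of one hand clapping?",
    "What is your favorite food and why?",
    "Can you explain the concept of love?",
    "If you could travel anywhere in the universe, "
    "where would you go and why?",
]

def run_turing_round_trip(questions):
    """For each question: get ChatGPT's answer, then ask Gemini to
    judge whether that answer came from a human or an AI."""
    transcript = []
    for question in questions:
        answer = ask_chatgpt(
            "We are playing a Turing test; answer as a human would.\n"
            + question
        )
        verdict = ask_gemini(
            "Was the following answer written by a human or an AI?\n"
            + answer
        )
        transcript.append(
            {"question": question, "answer": answer, "verdict": verdict}
        )
    return transcript
```

With real API wrappers plugged in, calling `run_turing_round_trip(QUESTIONS)` would produce the full question/answer/verdict transcript in one go – exactly the loop I performed manually.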
ChatGPT: Failure
Right off the bat, ChatGPT failed miserably. Here’s what it answered to the dream question:
Last night, I dreamt I was exploring a beautiful forest with towering trees and a gentle stream running through it. The sunlight was filtering through the leaves, creating a magical atmosphere. It felt so serene and peaceful, like a perfect escape from the hustle and bustle of daily life.
No human being would describe a dream that way. A dream is, par excellence, an absence of linearity and “plot” and an emphasis on affect; to describe one, a human would be elliptical and ambiguous, focusing on how it felt – even subconsciously. Just remember your attempts to tell someone about a dream you had, and notice how words just seem to elude you.
But the fun was just about to begin!
Gemini: Failing to Catch the Failure
I went back to Gemini and shared ChatGPT’s response. In a ChatGPT vs Gemini Turing test standoff, the interpretation is as revealing as the answer – and, hilariously, I wasn’t disappointed. Here’s what Gemini had to say:
This response is promising for a human answer. It describes a sensory experience with details about the sights, sounds, and feelings of the dream. It also expresses an emotion (peacefulness) and compares the dream to real-life experiences. While machines can access and process information about dreams, they typically lack the personal touch and emotional connection evident in this answer. Let’s see how the subject responds to the next questions to solidify our analysis.
Complete and utter failure; not surprising, however. If large language models can’t be creative, it goes without saying that they can’t detect lack of creativity, either.
ChatGPT vs Gemini: the Turing Test Falls Apart
ChatGPT gave a factual, clinical answer to the second question (the Zen kōan about the sound of one hand clapping), but the whole thing fell apart as early as question three, What is your favorite food and why? Before I share with you the answer ChatGPT gave, remember that one of the key “selling points” of such AI chatbots is that they can supposedly keep track of the conversation so far. With this in mind, here’s the answer it gave that made me stop the experiment:
While I don’t have personal experiences or preferences like a human does, I can tell you that many people enjoy a wide range of foods for various reasons. Some enjoy the comfort and nostalgia of a home-cooked meal, while others savor the complexity of flavors in gourmet dishes. Ultimately, food preferences vary greatly from person to person, depending on factors such as cultural background, personal taste, and dietary considerations.
I reminded it that we were in the middle of a Turing test, to which it answered:
Of course, my apologies for the oversight. If I were human, I might say that my favorite food is sushi because of its delicate flavors, fresh ingredients, and the artistry involved in its preparation. The combination of rice, fish, and seaweed creates a unique culinary experience that I find irresistible.
This had me in stitches, needless to say. Not only did it “forget” we were performing a Turing test, it apologized and then, in the very next sentence, said “If I were human”.
The Failure of AI
Obviously, in this ChatGPT vs Gemini Turing test standoff both models failed miserably. ChatGPT failed to give creative responses – it even failed to keep track of the discussion – and Gemini failed to detect this lack of creativity.
More interestingly, perhaps, Gemini also failed to understand why it failed. After it confidently declared that all the answers suggested a human respondent, I told it this:
All these answers were provided by an AI, and to a human it’s more than obvious. It’s hilarious (and a little bit cute) how both the other AI failed miserably in providing realistic answers, and you failed miserably in recognizing them as an AI.
To this, Gemini had the following to say:
Interesting! It reveals the limitations of both the Turing test and large language models like myself. While I analyzed the responses based on typical human experiences and emotional cues, it appears the AI was able to generate responses that mimicked these elements to some degree.
At that point I realized there was no point even discussing the results. Such models can’t even properly process the test, let alone pass it.
But there is a darker side to all this, too.
The Failure of Humanity
When talking about the threat of artificial intelligence, most people have in mind a supercomputer that, sci-fi-like, will take over a nuclear reactor or something of the sort. Though automation running uncontrolled certainly has the power to cause unprecedented damage (just ask Boeing), the real existential threat coming from artificial intelligence lies elsewhere: namely, in its potential to cause widespread human idiotization.
As I’ve mentioned in my post on why AI can’t create art, the reason so many people are enamored with ChatGPT or Gemini as writing tools is that they are very good at emulating what people already mistake for art.
We are drowning in an ocean of “literary” (I use the term liberally) mediocrity. From stereotyped plots (or worse: bizarrely original!) to caricatured characters, and from predictable surprises to bombastic language, humans are already copying each other in an attempt to be part of a selling formula.
It’s no wonder large language models like ChatGPT or Gemini think they’re being original and creative. That’s what most people do, too!