April 4, 2025 16 mins

Summary of https://arxiv.org/pdf/2503.23674

Researchers Cameron R. Jones and Benjamin K. Bergen investigated whether advanced large language models (LLMs) can pass the standard three-party Turing test. Their study involved human interrogators conversing with both a human and an AI, then judging which was human.

The findings indicate that GPT-4.5, when prompted to adopt a persona, was identified as human significantly more often than the actual human participant, marking the first empirical evidence of an AI passing this rigorous version of the test.

While other models like LLaMa-3.1 showed some human-like qualities, only the persona-prompted models consistently fooled human judges. The study also explored the strategies used by interrogators and the implications of these results for our understanding of AI capabilities and their societal impact.

  • This study provides the first empirical evidence that a Large Language Model (LLM), specifically GPT-4.5 when prompted to adopt a humanlike persona, can pass a standard three-party Turing test. In the study, GPT-4.5-PERSONA was judged as human 73% of the time, significantly more often than the actual human participants.
  • Prompting with a humanlike persona was crucial for success. While GPT-4.5-PERSONA and LLAMA-PERSONA were not reliably distinguished from humans, the same models without the persona prompt (GPT-4.5-NO-PERSONA and LLAMA-NO-PERSONA) performed significantly worse: in the undergraduate study, they were judged to be human at rates significantly below chance.
  • The study compared the performance of several AI systems, including GPT-4.5, LLaMa-3.1-405B, GPT-4o, and ELIZA. The baseline models, GPT-4o-NO-PERSONA and ELIZA, had significantly lower win rates, indicating that interrogators could generally distinguish them from humans. This suggests the interrogators were not simply guessing randomly.
  • The research indicates that interrogators often relied on social, emotional, and linguistic cues rather than traditional measures of knowledge and reasoning when trying to distinguish between humans and AI. Interestingly, providing strange prompts or using "jailbreaks" was the most effective strategy for interrogators, while asking about the weather or human experiences was least effective.
  • The findings have significant social and economic implications, suggesting that contemporary LLMs could potentially substitute for humans in short conversations, raising concerns about deception, misinformation, and the potential undermining of real human interaction. The study also found that general knowledge about LLMs and frequent chatbot interaction did not consistently improve participants' ability to distinguish AI from humans.
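
To make "significantly above chance" and "significantly below chance" concrete: in a three-party test the interrogator picks one of two witnesses, so pure guessing would label the AI as human about 50% of the time. The sketch below checks a win rate against that 50% baseline with an exact two-sided binomial test. The trial counts are hypothetical placeholders chosen only to illustrate the arithmetic (the summary above does not report them), and this is not necessarily the statistical analysis the authors used.

```python
from math import comb


def binomial_two_sided_p(wins: int, trials: int, chance: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely
    than the observed one under the null hypothesis of `chance`.
    """
    pmf = [
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[wins]
    # Small relative tolerance so ties are counted despite float rounding.
    total = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# HYPOTHETICAL counts, for illustration only (not taken from the study).
TRIALS = 100
examples = {
    "persona-style win rate (73 of 100)": 73,
    "no-persona-style win rate (36 of 100)": 36,
    "exactly chance (50 of 100)": 50,
}

for label, wins in examples.items():
    rate = wins / TRIALS
    p_value = binomial_two_sided_p(wins, TRIALS)
    direction = "above" if rate > 0.5 else "below" if rate < 0.5 else "at"
    print(f"{label}: win rate {rate:.0%} ({direction} chance), p = {p_value:.2g}")
```

A win rate near 50% means interrogators could not tell the AI from the human; a rate well above 50% means they picked the AI as the human more often than the real person, and a rate well below 50% means they could usually spot the AI.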