Nikki: Riddle Experiment

The Riddle Experiment

Two groups try to solve a Black Story behind a screen. Only one group is alive.

By Nikki Rademaker, Linthe van Rooij, and Yanna Smid

Overview

This project was made for the course Non-Human Cognition. Together with Yanna Smid and Linthe van Rooij, I researched how GPT-4 solves Black Stories riddles and how its way of reasoning compares to that of humans. Black Stories are riddles that describe mysterious and often dark scenarios. The game requires at least two players. One player knows the full story and reads a brief, cryptic description of its ending out loud. The other players try to uncover the full story by asking yes-or-no questions, which are answered solely with 'yes', 'no', 'false assumption', or 'not relevant'. We wanted to see if GPT-4 could do this just as well as people.

The Study

The study examines the problem-solving abilities of GPT-4 by comparing its performance and strategy in solving riddles from the game Black Stories with those of human participants. We selected 12 riddles and altered their details so that the AI would not recognize them. GPT-4 interacted through Python using the OpenAI API. Human participants used WhatsApp, which gave them the same chat-like setting. Both groups were given the same riddles and asked questions until they solved the mystery. We measured how many questions they needed, how many hints were given, and how long it took. Furthermore, we analyzed and compared the way they asked questions.
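To make the setup more concrete, here is a minimal sketch of how one riddle session could be logged. It is not our actual experiment code. The names `run_session`, `SessionLog`, `ask_question`, `answer_question`, and `is_solved` are hypothetical, and so is the convention that a hint is a reply starting with `hint:`. In the study, the role of `ask_question` was played by GPT-4 through the OpenAI API or by a human on WhatsApp; here it is just any function that returns the next question.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# The four answers the riddle host is allowed to give.
ALLOWED_ANSWERS = {"yes", "no", "false assumption", "not relevant"}

Transcript = List[Tuple[str, str]]


@dataclass
class SessionLog:
    """Measurements for one riddle: questions, hints, and elapsed time."""
    questions: int = 0
    hints: int = 0
    seconds: float = 0.0
    solved: bool = False
    transcript: Transcript = field(default_factory=list)


def run_session(
    ask_question: Callable[[Transcript], str],
    answer_question: Callable[[str], str],
    is_solved: Callable[[Transcript], bool],
    max_questions: int = 50,
) -> SessionLog:
    """Run one question-and-answer loop and record the measurements.

    Assumption: a reply starting with "hint:" counts as a hint; every
    other reply must be one of the four allowed answers.
    """
    log = SessionLog()
    start = time.perf_counter()
    while log.questions < max_questions:
        question = ask_question(log.transcript)
        answer = answer_question(question)
        if answer.startswith("hint:"):
            log.hints += 1
        elif answer not in ALLOWED_ANSWERS:
            raise ValueError(f"Unexpected answer: {answer!r}")
        log.questions += 1
        log.transcript.append((question, answer))
        if is_solved(log.transcript):
            log.solved = True
            break
    log.seconds = time.perf_counter() - start
    return log
```

Passing the solver in as a plain function keeps the loop identical for both groups, so the question count, hint count, and duration are collected in the same way regardless of who is asking.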

Findings

There was no significant difference in how well the two groups performed. GPT-4 was often faster and more focused, but it missed the bigger picture. Humans were more flexible but sometimes got stuck. GPT-4 repeated details and stayed on the same path longer. Humans changed direction more easily and asked broader questions. Human participants also tended to get frustrated when they got stuck, which influenced their approach.

Poster, Presentation, and Publication

We showed our work at the Non-Human Cognition exhibition and won best poster. You can download both versions of the poster here.

We have written two papers together with Tessa Verhoef, which have been accepted by the Computational Linguistics in the Netherlands journal (CLIN) and the Cognitive Science Society journal (CogSci). Furthermore, Linthe van Rooij and I presented the poster at the 34th Meeting of Computational Linguistics in Leiden (August 2024). Together with Tessa Verhoef, I also presented our poster at the Annual Meeting of the Cognitive Science Society in San Francisco (August 2025).


MSc Creative Intelligence & Technology, Leiden University (LIACS),
Non-Human Cognition, experiments conducted March - April 2024.