Two groups try to solve a Black Story behind a screen. Only one group is alive.
By Nikki Rademaker, Linthe van Rooij, and Yanna Smid
This study was conducted for the course Non-Human Cognition. Together with Yanna Smid and Linthe van Rooij, I researched how GPT-4 solves Black Stories riddles and how its way of reasoning compares to that of humans. Black Stories are riddles that describe mysterious and often dark scenarios, and the game requires at least two players. One player, who knows the full story, reads aloud a brief, cryptic description of how the story ends. The other players try to uncover the full story by asking yes-or-no questions, which are answered solely with "yes", "no", "false assumption", or "not relevant". We wanted to see whether GPT-4 could do this just as well as human participants.
An example of such a riddle can be seen in the images below.
The study examined the problem-solving abilities of GPT-4 by comparing its performance and strategy in solving riddles from the game Black Stories with those of human participants. We selected 12 riddles and modified their details to avoid recognition by the AI. GPT-4 interacted through Python using the OpenAI API, while the human participants used WhatsApp, which created the same chat-like setting. Both groups were given the same riddles and asked questions until they solved the mystery. We measured how many questions they needed, how many hints were given, and how long it took. Furthermore, we analyzed and compared the way they asked questions.
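To make the setup more concrete, here is a minimal sketch of how one riddle session could be logged. It is not the code used in the study. The names `run_session`, `SessionLog`, `next_question`, and `game_master`, as well as the "hint:" prefix for hints, are assumptions made for illustration. In a setup like ours, `next_question` would wrap a call to GPT-4 through the OpenAI API; in this sketch it is a scripted stand-in so the example runs offline, and the riddle text is a placeholder.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# The four answers the game master may give, as described above.
ALLOWED_ANSWERS = {"yes", "no", "false assumption", "not relevant"}
# Hypothetical convention for this sketch: hints start with this prefix.
HINT_PREFIX = "hint:"


@dataclass
class SessionLog:
    questions: int = 0      # number of questions asked
    hints: int = 0          # number of hints given
    seconds: float = 0.0    # wall-clock duration of the session
    solved: bool = False
    transcript: List[Tuple[str, str]] = field(default_factory=list)


def run_session(
    riddle: str,
    next_question: Callable[[List[Tuple[str, str]]], str],
    game_master: Callable[[str], Tuple[str, bool]],
    max_questions: int = 50,
) -> SessionLog:
    """Run one question-and-answer session and record the three measures."""
    log = SessionLog()
    history = [("riddle", riddle)]  # context handed to the questioner
    start = time.monotonic()
    while not log.solved and log.questions < max_questions:
        question = next_question(history)
        log.questions += 1
        reply, log.solved = game_master(question)
        if reply.startswith(HINT_PREFIX):
            log.hints += 1
        elif reply not in ALLOWED_ANSWERS:
            raise ValueError(f"unexpected reply: {reply!r}")
        history.append((question, reply))
        log.transcript.append((question, reply))
    log.seconds = time.monotonic() - start
    return log


def demo() -> SessionLog:
    # Scripted stand-ins for the questioner and the game master.
    scripted = iter(["Was it an accident?", "Was anyone else involved?",
                     "Was the weather the cause?"])
    replies = {
        "Was it an accident?": ("yes", False),
        "Was anyone else involved?": ("hint: think about the weather", False),
        "Was the weather the cause?": ("yes", True),
    }
    return run_session("(placeholder riddle text)",
                       lambda history: next(scripted),
                       lambda question: replies[question])
```

Calling `demo()` returns a log with the number of questions, the number of hints, and the elapsed time for the scripted session. The same loop could be reused for a human participant by having `next_question` read typed input instead.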
There was no significant difference in overall performance between the groups. GPT-4 was often faster and more focused, but it missed the bigger picture. Humans were more flexible but sometimes got stuck. GPT-4 repeated details and stayed on the same path longer, whereas humans changed direction more easily and asked broader questions. Human participants also sometimes became frustrated when they got stuck, which influenced their approach.
You can find the latest version of the research poster here.
We have written two papers together with Tessa Verhoef for the Computational Linguistics in the Netherlands journal (CLIN) and the Cognitive Science Society proceedings (CogSci). Furthermore, the poster was presented by Linthe van Rooij and me at the 34th Meeting of Computational Linguistics in Leiden (August 2024). I also presented our poster together with Tessa Verhoef at the Annual Meeting of the Cognitive Science Society in San Francisco (August 2025).