Two groups try to solve a Black Story behind a screen. Only one group is alive.
By Nikki Rademaker, Linthe van Rooij, and Yanna Smid
This project was made for the course Non-Human Cognition. Together with Yanna Smid and Linthe van Rooij, I researched how GPT-4 solves Black Stories riddles and how its reasoning compares to that of humans. Black Stories are riddles that describe mysterious and often dark scenarios. The game requires at least two players. One player knows the full story and reads out a brief, cryptic description of its ending. The other players try to reconstruct what happened by asking yes-or-no questions, which the first player may answer only with 'yes', 'no', 'false assumption', or 'not relevant'. Solving a riddle therefore depends on asking the right questions. We wanted to see whether GPT-4 could do this as well as people.
The study examines the problem-solving abilities of GPT-4 by comparing its performance and strategy in solving Black Stories riddles with those of human participants. We selected 12 riddles and altered their details so that GPT-4 would not recognize them. GPT-4 was accessed from Python through the OpenAI API, while human participants played via WhatsApp, so both groups worked in a similar chat-like setting. Both groups received the same riddles and asked questions until they solved the mystery. We measured how many questions they needed, how many hints they were given, and how long they took. We also analyzed and compared the kinds of questions they asked.
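To make the three measurements concrete, here is a minimal sketch of how a single riddle session could be logged: each question is stored with its answer (restricted to the four permitted replies), hints are counted, and the elapsed time is recorded. This is not the code used in the study; `SessionLog`, `run_session`, the `max_questions` cap, and the callables passed in are hypothetical. In a GPT-4 run, `next_question` might wrap a call to the `openai` package's `client.chat.completions.create(...)`, and for a human player it would be whatever they typed in the chat. Both are left abstract here so the sketch runs without network access.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# The four answers the storyteller is allowed to give, per the game rules.
ALLOWED_ANSWERS = {"yes", "no", "false assumption", "not relevant"}


@dataclass
class SessionLog:
    """Hypothetical record of one riddle session: Q&A pairs, hints, timing."""
    questions: List[Tuple[str, str]] = field(default_factory=list)
    hints: int = 0
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None

    def record(self, question: str, answer: str) -> None:
        # Normalize and reject anything outside the four permitted answers.
        answer = answer.strip().lower()
        if answer not in ALLOWED_ANSWERS:
            raise ValueError(f"invalid answer: {answer!r}")
        self.questions.append((question, answer))

    def give_hint(self) -> None:
        self.hints += 1

    def finish(self) -> None:
        self.end = time.monotonic()

    @property
    def num_questions(self) -> int:
        return len(self.questions)

    @property
    def duration(self) -> float:
        # Seconds elapsed; measured up to now if the session is still open.
        end = self.end if self.end is not None else time.monotonic()
        return end - self.start


def run_session(
    next_question: Callable[[List[Tuple[str, str]]], str],
    answer: Callable[[str], str],
    is_solved: Callable[[str], bool],
    max_questions: int = 50,  # hypothetical cap, not taken from the study
) -> SessionLog:
    """Ask questions until the story is solved or the question budget runs out."""
    log = SessionLog()
    while log.num_questions < max_questions:
        # The player sees the full history of questions and answers so far.
        question = next_question(log.questions)
        log.record(question, answer(question))
        if is_solved(question):
            break
    log.finish()
    return log
```

Hints would be logged by calling `log.give_hint()` whenever the storyteller steps in. The rule for when to give a hint is left out because the text does not describe it.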
We found no significant difference in how well the two groups performed. GPT-4 was often faster and more focused, but it missed the bigger picture: it repeated details and stayed on the same line of questioning for longer. Humans were more flexible, changed direction more easily, and asked broader questions, though they sometimes got stuck. When that happened, human participants also tended to get frustrated, which influenced their approach.
We showed our work at the Non-Human Cognition exhibition, where it won the best poster award. Later, we presented the poster at the CLIN 34 conference. You can download the poster here.
Together with Tessa Verhoef, we have since written two papers to share our results more widely. We submitted the research to the Computational Linguistics in the Netherlands Journal (CLIN) and to Cognitive Science (CSJ). Both papers have been provisionally accepted, and we are working toward their publication.