AI is only human
Generative AI models are trained on data generated by humans, at least originally. By now, there is so much AI slop around that the whole model training thing may become circular, with AI trained on output generated by AI. But if AI is trained on human output, what do you think will happen if you test it for the behavioural biases that many humans display? Yup, AI has the same biases that many humans have, but in many instances it is also more rational than humans are.
A team of researchers let different genAI models answer a battery of tests that have historically been used to demonstrate behavioural biases in humans:
Diminishing sensitivity to gains
Loss aversion
Subjective probability weighting
Narrow framing
Ambiguity aversion
Hyperbolic discounting
Then there are seven belief-based questions on things like sample size neglect, base rate neglect, the gambler’s fallacy, anchoring, etc. I won’t explain all these biases; you can Google them or ask ChatGPT, Claude or Gemini to explain them to you.
When you give these tests to an AI model, it will either answer rationally or the way many humans do. Or it might get all confused and give you some weird, unclassifiable answer. Below is the share of each type of answer you get from the different models on the six preference-based tasks (left-hand column) and the seven belief-based questions (right-hand column).
Proportion of AI responses
Source: Bini et al. (2026)
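To make the chart concrete: every answer gets sorted into one of three buckets (rational, human-like, or other), and the chart shows the share of each bucket per model and task type. The sketch below does that tally in Python. The model names come from the text, but the individual answers, the bucket labels and the helper name `proportions` are hypothetical, made up for illustration; they are not the data or code from Bini et al.

```python
from collections import Counter

# Hypothetical classified answers: (model, task type, answer category).
# These rows are invented for illustration and are NOT the study's data.
responses = [
    ("GPT", "preference", "human-like"),
    ("GPT", "preference", "rational"),
    ("GPT", "belief", "rational"),
    ("Llama", "preference", "rational"),
    ("Llama", "belief", "other"),
]

CATEGORIES = ("rational", "human-like", "other")


def proportions(responses, model, task_type):
    """Share of each answer category for one model on one task type."""
    cats = [c for m, t, c in responses if m == model and t == task_type]
    total = len(cats)
    if total == 0:
        # No answers recorded for this model/task combination.
        return {cat: 0.0 for cat in CATEGORIES}
    counts = Counter(cats)
    return {cat: counts[cat] / total for cat in CATEGORIES}


print(proportions(responses, "GPT", "preference"))
```

With these made-up rows, GPT's preference answers split 50/50 between rational and human-like, and the shares for each model and task type always add up to one.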
Note a couple of things. First, belief-based questions typically involve maths and statistics, and genAI models tend to be really good at maths, so they give the rational answer almost all the time.
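Why maths helps here: a belief-based question usually has one answer you can compute. Take base rate neglect. The sketch below uses the classic taxicab problem from the behavioural literature as an illustration (it is not necessarily the question Bini et al. used): 15% of cabs in town are Blue, 85% are Green, and a witness who identifies colours correctly 80% of the time says the cab was Blue. Bayes' rule gives the probability that it really was Blue. The function name `posterior` is my own.

```python
def posterior(prior, hit_rate, false_alarm_rate):
    """P(hypothesis | evidence) via Bayes' rule.

    prior: base rate of the hypothesis (share of Blue cabs)
    hit_rate: P(witness says Blue | cab is Blue)
    false_alarm_rate: P(witness says Blue | cab is Green)
    """
    true_positives = prior * hit_rate
    false_positives = (1 - prior) * false_alarm_rate
    return true_positives / (true_positives + false_positives)


# Taxicab problem: 15% Blue cabs, witness is right 80% of the time.
p_blue = posterior(prior=0.15, hit_rate=0.80, false_alarm_rate=0.20)
print(round(p_blue, 2))  # 0.41
```

The rational answer is about 41%, well below the witness's 80% accuracy that many people intuitively anchor on. A model that is good at maths just needs to spot that this is a Bayes problem and run the numbers.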
Second, when it comes to expressing preferences for one option or another, all they can rely on is what they have learned from the output of other humans. And that is where human-like behaviour comes in. In particular, the most popular genAI models, GPT, Claude and Gemini, show clear human-like biases in many of the tasks they were asked to perform.
The one big outlier is Llama, which tends to give more rational answers in preference-based tasks but more human-like answers in belief-based tasks. I wonder what happens there…


