Date: 2025-03-13

Generative AI is being adopted faster than any other major technology we have seen before. The St. Louis Fed reported in September 2024 that two years after the launch of the first mass market generative AI tool, 40% of Americans use it at home or at work. It took the internet five years to get to the same penetration level. The question is whether AI tools make us better at different tasks.

Adoption rate of computers, the internet and generative AI

Source: St. Louis Fed

A new article in Nature Human Behaviour examined 370 results published in 106 studies to see whether AI can compete with humans and whether a collaboration between humans and AI is even better than humans or machines alone.

AI optimists say that AI tools can improve human performance and have maximum impact when humans and AI collaborate. AI pessimists say that AI is going to be so much better than humans that it will completely replace humans sooner or later.

News flash: At the moment, it looks like the AI pessimists are right.

The big chart at the end of this post provides a comprehensive overview of the change in task performance of humans who use AI tools vs. two different benchmarks. The right-hand chart shows the improvement (and it almost always is a significant improvement) of humans who use AI tools vs. humans without the help of AI.

Across all 370 results, the average effect size, measured as Hedges’ g, is a pretty hefty 0.63. For the uninitiated, Hedges’ g measures the performance difference between two setups relative to the standard deviation of performance between subjects or trials. As a rule of thumb, a Hedges’ g below 0.2 is hardly noticeable, while a Hedges’ g of around 0.5 is large enough to significantly improve your life in terms of time saved or increased quality of output. Very large effects are in the order of 0.75 or higher.
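For readers who want to see what goes into that number, here is a minimal sketch of how Hedges’ g can be computed for two groups. It uses the textbook definition (difference in means divided by the pooled standard deviation, times a small-sample correction), not necessarily the exact procedure used in the article. The function name `hedges_g` and the scores are made up for illustration.

```python
import math
from statistics import mean, variance


def hedges_g(treatment, control):
    """Standardised mean difference between two groups (Hedges' g)."""
    n1, n2 = len(treatment), len(control)
    # Pooled standard deviation, built from the two sample variances
    pooled_sd = math.sqrt(
        ((n1 - 1) * variance(treatment) + (n2 - 1) * variance(control))
        / (n1 + n2 - 2)
    )
    # Cohen's d: difference in means in units of the pooled standard deviation
    d = (mean(treatment) - mean(control)) / pooled_sd
    # Common approximation of the small-sample bias correction
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


# Hypothetical task scores, invented purely for illustration
with_ai = [72, 75, 78, 80, 85]
without_ai = [70, 71, 74, 77, 78]
print(round(hedges_g(with_ai, without_ai), 2))
```

A positive value means the first group (here, humans with AI) scored higher on average. The correction factor matters mostly for small samples and approaches 1 as the groups get larger.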

The improvement from AI augmentation is particularly large for numerical tasks (0.91) but also quite large for creative tasks (0.52) and decision tasks (0.65). The bottom two rows show, however, that the key to improved performance is integration. When labour was divided between the humans and the AI, the gains were much smaller than when the AI was integrated with the task with no separation of labour.

Unfortunately, comparing AI-augmented human performance with human performance ex AI is a bit of a skewed benchmark. It’s like comparing a stock/bond portfolio against a benchmark of only bonds. You outperform most of the time simply because equities generally have higher returns than bonds.

Besides, the real challenge we white collar workers face is whether AI is going to replace us simply because it is so much better (and cheaper and doesn’t go on vacations, etc.) than we are. This is where the left-hand chart comes in.

It compares the combination of humans and AI with the better of AI or humans. If you look at the second and third rows of the chart, you see that out of the 370 experiments, AI performed better than humans alone in 249 experiments while humans did better in 121 experiments. And that is as of mid-2023 and ignores all the advances made since then.

Side note: If you want to know how good the cutting edge of generative AI models has become, read this excellent article by Ethan Mollick. Honestly, it will blow your mind.

If AI is already better than humans in two out of three cases, can the combination of humans and AI give us even better results?

Unfortunately not. The combination of humans and AI dilutes the power of AI and is on average slightly worse than the better of AI or humans (Hedges’ g of -0.23). When AI alone is already better than humans alone, the performance reduction of the human-AI team vs. AI alone is meaningful (-0.54). It is only when humans alone are better than AI alone that the combination of humans and AI creates an even better output.
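To make the two benchmarks concrete, here is a small sketch that scores a human-AI team against humans alone and against the better of humans or AI. The function name, the labels "augmentation" and "synergy", and all scores are assumptions chosen for illustration. The gaps are in raw score points rather than Hedges’ g, so the numbers are not comparable to the effect sizes quoted above.

```python
def compare_to_benchmarks(human, ai, human_ai):
    """Return the team's gap vs. each benchmark for one task (raw score points)."""
    augmentation = human_ai - human      # team vs. humans alone
    synergy = human_ai - max(human, ai)  # team vs. the better of humans or AI
    return augmentation, synergy


# Hypothetical task where AI alone beats humans alone
print(compare_to_benchmarks(human=70, ai=85, human_ai=80))  # (10, -5)

# Hypothetical task where humans alone beat AI alone
print(compare_to_benchmarks(human=85, ai=70, human_ai=90))  # (5, 5)
```

In the first made-up case the team beats humans alone but still trails the AI working on its own, which is the pattern described above. In the second, the team beats both benchmarks.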

From the point of view of a business, it thus seems best to test, task by task, whether the AI is better than the humans in your company. If the AI wins, replace the humans with machines. If the humans win, give them some AI tools to use. And then repeat this experiment regularly to take advantage of technological progress in AI to replace more and more of your workforce with machines. Good times…

Combination of human and AI vs. better of human and AI (left) or human (right) benchmark

Source: Vaccaro et al. (2024)