Generative AI is being adopted faster than any other major technology we have seen before. The St. Louis Fed reported in September 2024 that two years after the launch of the first mass market generative AI tool, 40% of Americans use it at home or at work. It took the internet five years to get to the same penetration level. The question is whether AI tools make us better at different tasks.
Adoption rate of computers, the internet and generative AI
Source: St. Louis Fed
A new article in Nature Human Behaviour examined 370 results published in 106 studies to see whether AI can compete with humans and if a collaboration between humans and AI is even better than humans or machines alone.
AI optimists say that AI tools can improve human performance and have maximum impact when humans and AI collaborate. AI pessimists say that AI is going to be so much better than humans that it will completely replace humans sooner or later.
News flash: At the moment, it looks like the AI pessimists are right.
The big chart at the end of this post provides a comprehensive overview of the change in task performance of humans who use AI tools vs. two different benchmarks. The right-hand chart shows the improvement (and it is almost always a significant improvement) of humans who use AI tools vs. humans without the help of AI.
Across all 370 results, the average effect size, measured as Hedges’ g, is a pretty hefty 0.63. For the uninitiated, Hedges’ g measures the performance difference between two setups relative to the standard deviation of performance between subjects or trials. As a rule of thumb, a Hedges’ g below 0.2 is hardly noticeable, while a Hedges’ g of around 0.5 is large enough to significantly improve your life in terms of time saved or increased quality of output. Very large effects are in the order of 0.75 or higher.
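For readers who want to see the arithmetic, here is a minimal sketch of how Hedges’ g can be computed for two groups of scores. It uses the common textbook form (difference in means divided by the pooled standard deviation, times an approximate small-sample correction) and is not necessarily the exact procedure used in the study. The function name and the toy scores are made up for illustration.

```python
import math
from statistics import mean, variance


def hedges_g(treatment, control):
    """Approximate Hedges' g for two independent samples.

    Assumption: the standard pooled-SD formula with the usual
    approximate small-sample correction J = 1 - 3 / (4 * df - 1).
    """
    n1, n2 = len(treatment), len(control)
    df = n1 + n2 - 2  # degrees of freedom of the pooled variance

    # Pooled standard deviation from the two sample variances.
    pooled_sd = math.sqrt(
        ((n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)) / df
    )

    # Cohen's d: raw mean difference in units of the pooled SD.
    d = (mean(treatment) - mean(control)) / pooled_sd

    # The correction shrinks d slightly to reduce bias in small samples.
    correction = 1 - 3 / (4 * df - 1)
    return d * correction


# Hypothetical scores: humans with AI tools vs. humans without.
with_ai = [2, 4, 6]
without_ai = [1, 3, 5]
print(round(hedges_g(with_ai, without_ai), 2))  # 0.4
```

In this toy example the groups differ by one point on average, the pooled standard deviation is two points, and the small-sample correction is 0.8, which gives a g of 0.4.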
The improvement from AI augmentation is particularly large for numerical tasks (0.91) but also quite large for creative tasks (0.52) and decision tasks (0.65). The bottom two rows show, however, that the key to improved performance is integration. When labour was divided between the humans and the AI, the gains were much smaller than when the AI was integrated into the task with no separation of labour.
Unfortunately, comparing AI-augmented human performance with human performance ex AI is a bit of a skewed benchmark. It’s like comparing a stock/bond portfolio against a benchmark of only bonds. You outperform most of the time simply because equities generally have higher returns than bonds.
Besides, the real challenge we white collar workers face is whether AI is going to replace us simply because it is so much better (and cheaper and doesn’t go on vacations, etc.) than we are. This is where the left-hand chart comes in.
It compares the combination of humans and AI with the better of AI or humans. If you look at the second and third rows of the chart, you see that out of the 370 experiments, AI performed better than humans alone in 249 experiments while humans did better in 121 experiments. And that is as of mid-2023 and ignores all the advances made since then.
Side note: If you want to know how good the cutting edge of generative AI models has become, read this excellent article by Ethan Mollick. Honestly, it will blow your mind.
If AI is already better than humans in two out of three cases, can the combination of humans and AI give us even better results?
Unfortunately not. The combination of humans and AI dilutes the power of AI and is on average slightly worse than the better of AI or humans (Hedges’ g of -0.23). When AI alone is already better than humans alone, the performance reduction of the human-AI team vs. AI alone is meaningful (-0.54). It is only when humans alone are better than AI alone that the combination of humans and AI creates an even better output.
From the point of view of a business, it thus seems best to test which tasks in the company the AI performs better than humans, replace humans with machines where the AI wins, and give humans some AI tools to use where the humans win. And then repeat this experiment regularly to take advantage of technological progress in AI to replace more and more of your workforce with machines. Good times…
Combination of human and AI vs. better of human and AI (left) or human (right) benchmark
Source: Vaccaro et al. (2024)
This text presents a compelling yet somewhat one-sided view of the impact of generative AI on human work and productivity. While it draws on empirical data from a Nature Human Behaviour study and uses a structured argument, there are several areas where its reasoning can be challenged or at least contextualized more carefully.
Strengths:
Strong Empirical Basis – The text cites a large-scale meta-analysis (370 studies) and uses Hedges’ g, a recognized statistical measure, to support its claims about AI’s performance. This gives its conclusions a quantitative foundation rather than relying on anecdotal evidence.
Clear and Engaging Style – The writing is direct, engaging, and persuasive, making complex data more accessible to a general audience.
Recognition of AI’s Strengths – It effectively highlights the increasing dominance of AI in both numerical and creative tasks and acknowledges that AI-augmented human performance often exceeds human performance alone.
Weaknesses and Critique:
Overgeneralization of AI Superiority – The text states that "AI pessimists are right" because AI outperforms humans in 249 out of 370 experiments. However, this conclusion lacks nuance. The fact that AI is superior in certain tasks does not mean it will fully replace humans in those domains. AI’s capabilities depend on task complexity, interpretability, ethical considerations, and human trust in automated systems—factors that the analysis does not address.
Narrow Focus on Efficiency Over Other Considerations – The text assumes that business decisions about AI adoption should be based solely on performance metrics. However, human-AI collaboration may have benefits beyond immediate efficiency gains, such as improving explainability, fairness, and user trust in AI systems. These qualitative factors are ignored.
Flawed Comparison in Human-AI Collaboration Analysis – The assertion that human-AI collaboration "dilutes the power of AI" and is "on average slightly worse than the better of AI or humans" is misleading. The phrasing implies that hybrid collaboration is inherently inefficient, but this could depend on how AI is integrated. If collaboration is poorly designed or lacks proper training, suboptimal results are expected. Additionally, the -0.23 Hedges’ g difference is small and does not justify an absolute dismissal of human-AI collaboration.
Dismissive Attitude Toward Workforce Displacement – The final paragraph bluntly suggests that businesses should "replace humans with machines" whenever AI is better. This techno-deterministic approach ignores ethical, legal, and social concerns about automation. A more balanced perspective would consider job redesign, upskilling, and AI governance frameworks rather than outright replacement.
Missing Considerations:
Task Suitability: Not all tasks can or should be automated. AI lacks reasoning, moral judgment, and adaptability in unpredictable situations.
Ethical and Social Costs: Job displacement has significant social consequences. The piece does not address how businesses should manage transitions for workers.
Long-Term Viability: The assumption that AI will continually replace human labor at an accelerating rate may not hold if regulatory, economic, or technical constraints slow adoption.
Conclusion:
While the text presents a persuasive case for AI’s rapid adoption and superiority in many tasks, it leans too heavily on a binary "AI vs. humans" framework rather than exploring more nuanced scenarios where AI and human expertise complement each other. A more balanced approach would acknowledge AI's limitations, the importance of human oversight, and the broader economic and ethical implications of mass automation.
^The preceding was written entirely by ChatGPT. I think our role as humans will increasingly be to sit by and watch the robots fight it out. https://despair.com/products/motivation .
**My Point of View:**
I have been a heavy user of AI since the release of the first version of ChatGPT.
At the moment, I consider Claude by Anthropic to be the best on the market (I have the Pro subscription).
If someone were to ask me what the real value of AI is today, I would make some distinctions:
Many of the jobs that are progressively disappearing—and will continue to disappear—would have done so even without AI, simply through a better organization of workflows via digitization.
In Italy, we’ve had electronic invoicing for five years now, which was an avant-garde move in Europe.
A simple XML file allows all data to flow into an interchange system and automatically generates corporate accounting by consolidating all collected information.
Let me repeat: you don’t need AI for this; traditional IT systems suffice.
What has been missing until now is the political will to take the necessary steps to reach the logical endpoint, which would result in cutting a large number of jobs.
One such step would be the elimination of cash, which could solve a series of issues in places like Italy—but let’s be honest, also in the USA, Russia, and Germany, with their less-than-transparent dealings in laundering hubs like Cyprus.
But we’re getting there anyway, at a steady pace.
In short: with or without AI, many white-collar workers would have lost their jobs—or never found them in the first place—with the same skill sets that once allowed them to sit comfortably at a desk.
Now, about the actual use of AI:
It works great in the field of entertainment—the kind of entertainment that often rhymes with time-wasting for the younger generations.
And it works exceptionally well as an assistant for those who already know what they’re doing.
The key point is that AI can write you a program or solve a math problem, but it doesn’t know what to do with its results. For now, that still requires human intervention.
P.S. Text translated from Italian using Alibaba AI "QWEN".