As natural language processing technology advances, more and more hedge funds use NLP-based sentiment analysis to make split-second decisions on stocks and other assets. Now, two researchers from the University of Zurich have shown how these algorithms can be manipulated with an adversarial attack.
Aysun Can Turetken and Markus Leippold used the Financial PhraseBank and other widely used training datasets to teach ChatGPT and BERT how to interpret the sentiment in typical phrases and expressions used in financial news. The result is two small language models specialised in reading financial news and classifying it into positive, neutral, or negative sentiment.
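For readers who want a feel for what such a specialised sentiment model does, here is a minimal sketch in Python using the Hugging Face transformers library. The checkpoint name ProsusAI/finbert is an assumption standing in for the researchers' own fine-tuned models, which as far as I know are not public; the headlines are my own illustrative examples.

```python
# Minimal sketch of a financial sentiment classifier.
# Assumes the Hugging Face transformers library and a FinBERT-style
# checkpoint (ProsusAI/finbert is a stand-in for the paper's models).
from transformers import pipeline

classifier = pipeline("text-classification", model="ProsusAI/finbert")

headlines = [
    "Quarterly revenue rose 12% on strong demand.",
    "The company reiterated its full-year guidance.",
    "Margins contracted as input costs surged.",
]

for text in headlines:
    result = classifier(text)[0]  # e.g. {'label': 'positive', 'score': 0.97}
    print(f"{result['label']:>8}  {result['score']:.2f}  {text}")
```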
Then they tried to ‘hack’ these automated sentiment models by changing the sentences ever so slightly. If done right, these small edits flipped the assessment of the language models from neutral to positive, from neutral to negative, or even from positive to negative. The chart below shows the average results for their financial GPT model across the three datasets used.
Share of assessments changed due to hack
Source: Turetken and Leippold (2025)
Typically, the hacks flipped the sentiment assessment in roughly 40% of cases and reduced its accuracy by about 20%. In other words, trading algorithms that rely on such models to make money would see a large decline in profitability, and potentially a complete loss of it.
If you want to know how big, or rather how small, these changes in sentences need to be to trigger a different assessment by the sentiment models, here are some examples.
Examples of identified vulnerabilities
Source: Turetken and Leippold (2025)
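To make the mechanics concrete, here is a toy version of the kind of probe the researchers describe: take a sentence, swap in a near-synonym for one word at a time, and check whether the predicted sentiment flips. The classifier, the synonym list, and the example sentence are my own assumptions for illustration; the paper's actual attack method is more sophisticated.

```python
# Toy word-substitution probe: does a tiny edit flip the sentiment label?
# The classifier and the synonym list are assumptions for illustration;
# the paper's actual adversarial attack may work differently.
from transformers import pipeline

classifier = pipeline("text-classification", model="ProsusAI/finbert")

# Hypothetical near-synonym swaps an attacker (or a careless editor) might try.
SWAPS = {
    "rose": ["edged up", "climbed"],
    "fell": ["slipped", "eased"],
    "missed": ["trailed", "came in below"],
}

def label(text: str) -> str:
    """Return the predicted sentiment label for a piece of text."""
    return classifier(text)[0]["label"]

def probe(sentence: str) -> None:
    """Print every single-word swap that changes the predicted label."""
    original = label(sentence)
    for word, alternatives in SWAPS.items():
        if word not in sentence:
            continue
        for alt in alternatives:
            variant = sentence.replace(word, alt, 1)
            flipped = label(variant)
            if flipped != original:
                print(f"'{sentence}' [{original}] -> '{variant}' [{flipped}]")

probe("The company missed analyst estimates for the quarter.")
```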
In practice, it is difficult to ‘hack’ a hedge fund algorithm that uses these natural language sentiment models, because such algorithms are typically fed real-life Twitter data or news feeds from official sources. Manipulating the models would therefore require breaking into the data feeds of these hedge funds.
More practically, though, what this experiment shows is that these algorithm-driven models are incredibly sensitive to small changes in the language used by a journalist at Bloomberg, the Wall Street Journal, the FT, etc. The room for error is large, and I think this is something many investors currently don’t appreciate enough.
And then we must take into account that more and more financial news is created by large language models themselves. Nowadays, if you read a Bloomberg news article on the terminal, you automatically get an AI summary of the text at the top. And as far as I can tell, nobody is checking the content of these AI summaries before they are published. The result is that AI natural language models are increasingly reacting to AI-written news items. And if that doesn’t go horribly wrong at some point, I will eat my hair.
Dan Davies’ most recent Substack post was on the GENIUS Act, which introduces stablecoins into the banking system. These lines from his post stuck with me. We seem to be introducing trust problems everywhere at the moment. Tim Harford recently wrote a piece for the Financial Times on the value of boring systems. We are wildly underestimating the value of boring these days.
It’s a ubiquitous problem in banking – just trusting your counterparty isn’t enough, you have to be confident you can trust everyone they trust, reasonably confident that you can trust everyone who everyone they trust trusts, and even the fourth and fifth degrees are potentially material.
One thing that I've always observed is that investors say that stocks either "go up" or "go down" ... and if the move is especially pronounced, they might add "a lot".
In the financial press, stocks only do two things: "Plunge", or "soar".
I wonder how AI algorithms are going to parse out that dichotomy ;-)